From mistersheik at gmail.com Sat Oct 1 14:07:25 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 1 Oct 2016 11:07:25 -0700 (PDT) Subject: [Python-ideas] Conditional context manager Message-ID: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> I'm just throwing this idea out there to get feedback. Sometimes, I want to conditionally enter a context manager. This simplest (afaik) way of doing that is: with ExitStack() as stack: if condition: cm = stack.enter_context(cm_function()) suite() I suggest a more compact notation: with cm_function() as cm if condition: suite() I'm not sure that this is possible within the grammar. (For some reason with with_expr contains '"as" expr' rather than '"as" NAME'? I realize this comes up somewhat rarely. I use context managers a lot, and it comes up maybe 1 in 5k lines of code. For some extensions of this notation, an else clause could bind a value to cm in the case that condition is false. Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Oct 1 15:42:18 2016 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 1 Oct 2016 20:42:18 +0100 Subject: [Python-ideas] Conditional context manager In-Reply-To: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> References: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> Message-ID: <4a370b9d-b8de-761c-a7bc-92c272c34340@mrabarnett.plus.com> On 2016-10-01 19:07, Neil Girdhar wrote: > I'm just throwing this idea out there to get feedback. > > Sometimes, I want to conditionally enter a context manager. This > simplest (afaik) way of doing that is: > > with ExitStack() as stack: > if condition: > cm = stack.enter_context(cm_function()) > suite() > > I suggest a more compact notation: > > with cm_function() as cm if condition: > suite() > > I'm not sure that this is possible within the grammar. (For some reason > with with_expr contains '"as" expr' rather than '"as" NAME'? > > I realize this comes up somewhat rarely. I use context managers a lot, > and it comes up maybe 1 in 5k lines of code. > > For some extensions of this notation, an else clause could bind a value > to cm in the case that condition is false. > If you defined a null context manager, you could then write: with (cm_function() if condition else cm_null()) as cm: suite() Do you need 'cm' itself? Its type changes depending on the condition, so I don't see how it could be useful. If it's not needed, then that shortens a little to: with cm_function() if condition else cm_null(): suite() From mistersheik at gmail.com Sat Oct 1 16:01:34 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 01 Oct 2016 20:01:34 +0000 Subject: [Python-ideas] Conditional context manager In-Reply-To: <4a370b9d-b8de-761c-a7bc-92c272c34340@mrabarnett.plus.com> References: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> <4a370b9d-b8de-761c-a7bc-92c272c34340@mrabarnett.plus.com> Message-ID: FYI: There is a null context manager: ExitStack(). On Sat, Oct 1, 2016 at 3:43 PM MRAB wrote: > On 2016-10-01 19:07, Neil Girdhar wrote: > > I'm just throwing this idea out there to get feedback. > > > > Sometimes, I want to conditionally enter a context manager. 
This > > simplest (afaik) way of doing that is: > > > > with ExitStack() as stack: > > if condition: > > cm = stack.enter_context(cm_function()) > > suite() > > > > I suggest a more compact notation: > > > > with cm_function() as cm if condition: > > suite() > > > > I'm not sure that this is possible within the grammar. (For some reason > > with with_expr contains '"as" expr' rather than '"as" NAME'? > > > > I realize this comes up somewhat rarely. I use context managers a lot, > > and it comes up maybe 1 in 5k lines of code. > > > > For some extensions of this notation, an else clause could bind a value > > to cm in the case that condition is false. > > > If you defined a null context manager, you could then write: > > with (cm_function() if condition else cm_null()) as cm: > suite() > > Do you need 'cm' itself? Its type changes depending on the condition, so > I don't see how it could be useful. > > If it's not needed, then that shortens a little to: > > with cm_function() if condition else cm_null(): > suite() > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dcu3O1qaC3E/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at 2sn.net Sat Oct 1 16:02:02 2016 From: python at 2sn.net (Alexander Heger) Date: Sun, 2 Oct 2016 07:02:02 +1100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: <5ee6a817-bc68-06f6-3ecf-75fe0c7a4139@gmail.com> <57EAE656.1070103@canterbury.ac.nz> <1475254377.680408.742129865.62D71D9D@webmail.messagingengine.com> Message-ID: > > > I think wasting of indentation levels for a single logical block should > be > > avoided if possible to make the code more legible, otherwise one hits the > > suggested line length limit too fast - suppose this is now inside a > method, > > you already lose at least 8 char ... > > Hence generators, which allow the nested loops to be readily factored > out into a named operation. > > def iter_interesting_triples(seq1, seq2, seq3): > for x in seq1: > if p1(x): > for y in seq2: > if p2(x, y): > for z in seq3: > if p3(x, y, z): > yield x, y, z > > for x, y, z in iter_interesting_triples(seq1, seq2, seq3): > f(x, y, z) > This is an elegant solution, but I think it makes the code less clear if one has to loop up the definition of the generator. I any case, irrespective of limits or being dispensable, I think it would be more consistent for the language to allow the same syntax for "for" loops as is allowed in comprehensions. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Sat Oct 1 16:11:14 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 1 Oct 2016 21:11:14 +0100 Subject: [Python-ideas] Conditional context manager In-Reply-To: References: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> Message-ID: Resending, because Google Groups messes up replying to the list :-( On 1 October 2016 at 21:09, Paul Moore wrote: > On 1 October 2016 at 19:07, Neil Girdhar wrote: >> Sometimes, I want to conditionally enter a context manager. This simplest >> (afaik) way of doing that is: >> >> with ExitStack() as stack: >> if condition: >> cm = stack.enter_context(cm_function()) >> suite() >> >> I suggest a more compact notation: >> >> with cm_function() as cm if condition: >> suite() > > This sounds like exactly the sort of situation ExitStack was designed > for. I'm not sure the situation is common enough to need dedicated > syntax beyond that. I actually find the ExitStack version easier to > understand, as the condition is more visible when it's not tucked away > at the end of the with statement. > > If compactness is that important, you could refactor the code to use a > custom context manager that encapsulates the "cm_function if condition > else nothing" pattern. > > Paul From rosuav at gmail.com Sat Oct 1 18:45:41 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 2 Oct 2016 09:45:41 +1100 Subject: [Python-ideas] Conditional context manager In-Reply-To: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> References: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> Message-ID: On Sun, Oct 2, 2016 at 5:07 AM, Neil Girdhar wrote: > I suggest a more compact notation: > > with cm_function() as cm if condition: > suite() > The simplest way would be to make a conditional version of the context manager. @contextlib.contextmanager def maybe_cm(state): if state: with cm_function() as cm: yield cm else: yield None I believe that'll work. ChrisA From ncoghlan at gmail.com Sun Oct 2 03:46:19 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 2 Oct 2016 17:46:19 +1000 Subject: [Python-ideas] Fwd: Conditional context manager In-Reply-To: References: <80744782-bbdf-48e4-ac52-aac00e1d3af3@googlegroups.com> Message-ID: Forwarding to the list, since I took the broken Google Group cc out of the reply list, but didn't added the proper one. ---------- Forwarded message ---------- From: Nick Coghlan Date: 2 October 2016 at 17:45 Subject: Re: [Python-ideas] Conditional context manager To: Neil Girdhar On 2 October 2016 at 04:07, Neil Girdhar wrote: > I'm just throwing this idea out there to get feedback. > > Sometimes, I want to conditionally enter a context manager. This simplest > (afaik) way of doing that is: > > with ExitStack() as stack: > if condition: > cm = stack.enter_context(cm_function()) > suite() > > I suggest a more compact notation: > > with cm_function() as cm if condition: > suite() As Chris notes, this is typically going to be better handled by creating an *un*conditional CM that encapsulates the optionality so you don't need to worry about it at the point of use. 
If you wanted a generic version of that, then the stack creation and cm creation can be combined into a single context manager: @contextmanager def optional_cm(condition, cm_factory, *args, **kwds): stack = ExitStack() cm = None with stack: if condition: cm = stack.enter_context(cm_factory(*args, **kwds)) yield stack, cm However, even simpler than both this and Chris's maybe_cm() example is the plain ExitStack-as-the-null-context-manager function approach already covered in the contextlib docs: https://docs.python.org/3/library/contextlib.html#simplifying-support-for-single-optional-context-managers Applying that approach to this particular pattern looks like: def optional_cm(condition, cm_factory, *args, **kwds): if condition: return cm_factory(*args, **kwds) return ExitStack() Resulting in: with optional_cm(condition, cm_function): suite() which is fine for a construct that is uncommon in general, but may be popular in a particular code base. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rene at stranden.com Sun Oct 2 09:26:07 2016 From: rene at stranden.com (Rene Nejsum) Date: Sun, 2 Oct 2016 15:26:07 +0200 Subject: [Python-ideas] async objects Message-ID: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Having followed Yury Selivanov's (yselivanov.ml at gmail.com) proposals to add async/await to Python (PEP 492 Coroutines with async and await syntax and PEP 525 Asynchronous Generators) and especially the discussion about PEP 530: Asynchronous Comprehensions, I would like to add some concerns about the direction Python is taking on this. As Sven R. Kunze (srkunze at mail.de) mentions, there is a risk of having to double a lot of methods/functions to have an Async implementation. Just look at the mess in .NET when Microsoft introduced async/await in their library, a huge number of functions had to be implemented with an Async version of each member. Definitely not the DRY principle. While I think parallelism and concurrency are very important features in a language, I feel the direction Python is taking right now is getting too complicated, being difficult to understand and implement correctly. I thought it might be worth looking at using async at a higher level. Instead of making methods, generators and lists async, why not make the object itself async? Meaning that the method call (message to object) is async. Example: class SomeClass(object): def some_method(self): return 42 o = async SomeClass() # Indicating that the user wants an async version of the object r = o.some_method() # Will implicitly be an async/await 'wrapped' method no matter impl. # Here other code could execute, until the result (r) is referenced print r I think the above code is easier to implement, use and understand, while it handles some of the use cases handled by defining a lot of methods as async/await. I have made a small implementation called PYWORKS (https://github.com/pylots/pyworks), somewhat based on the idea above. PYWORKS has been used in several real-world implementations and seems to be fairly easy for developers to understand and use. br /Rene PS. This is my first post to python-ideas, please be gentle :-) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From g.rodola at gmail.com Mon Oct 3 10:52:18 2016 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Mon, 3 Oct 2016 16:52:18 +0200 Subject: [Python-ideas] async objects In-Reply-To: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: Independently from what the proposed solution is, I think you raised a very valid concern: the DRY principle. Right now the stdlib has tons of client network libraries which do not support the new async model. As such, library vendors will have to rewrite them by using the new syntax and provide an "aftplib", "ahttplib" etc. and release them as third-party libs hosted on PYPI. This trend is already happening as we speak: https://github.com/python/asyncio/wiki/ThirdParty#clients It would be awesome if somehow the Python stdlib itself would provide some mechanism to make the existent "batteries" able to run asynchronously so that, say, ftplib or httplib can be used with asyncio as the base IO loop and at the same time maintain the same existent API. Gevent tried to do the same thing with http://www.gevent.org/gevent.monkey.html As for *how* to do that, I'm sorry to say that I really have no idea. It's a complicated issue, but I think it's good that this has been raised. On Sun, Oct 2, 2016 at 3:26 PM, Rene Nejsum wrote: > Having followed Yury Selivanov yselivanov.ml at gmail.com proposal to add > async/await to Python (PEP 492 Coroutines with async and await syntax and > (PEP 525 Asynchronous Generators) and and especially the discussion about > PEP 530: Asynchronous Comprehensions I would like to add some concerns > about the direction Python is taking on this. > > As Sven R. Kunze srkunze at mail.de mentions the is a risk of having to > double a lot of methods/functions to have an Async implementation. Just > look at the mess in .NET when Microsoft introduced async/await in their > library, a huge number of functions had to be implemented with a Async > version of each member. Definitely not the DRY principle. > > While I think parallelism and concurrency are very important features in a > language, I feel the direction Python is taking right now is getting to > complicated, being difficult to understand and implement correct. > > I thought it might be worth to look at using async at a higher level. > Instead of making methods, generators and lists async, why not make the > object itself async? Meaning that the method call (message to object) is > async > > Example: > > class SomeClass(object): > def some_method(self): > return 42 > > o = async SomeClass() # Indicating that the user want?s an async version > of the object > r = o.some_method() # Will implicit be a async/await ?wrapped? method > no matter impl. > # Here other code could execute, until the result (r) is referenced > print r > > I think above code is easier to implement, use and understand, while it > handles some of the use cases handled by defining a lot of methods as > async/await. > > I have made a small implementation called PYWORKS ( > https://github.com/pylots/pyworks), somewhat based on the idea above. > PYWORKS has been used in several real world implementation and seams to be > fairly easy for developers to understand and use. > > br > /Rene > > PS. 
This is my first post to python-ideas, please be gentle :-) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Oct 3 11:15:04 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 3 Oct 2016 16:15:04 +0100 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: On 3 October 2016 at 15:52, Giampaolo Rodola' wrote: > Independently from what the proposed solution is, I think you raised a very > valid concern: the DRY principle. > Right now the stdlib has tons of client network libraries which do not > support the new async model. > As such, library vendors will have to rewrite them by using the new syntax > and provide an "aftplib", "ahttplib" etc. and release them as third-party > libs hosted on PYPI. This trend is already happening as we speak: > https://github.com/python/asyncio/wiki/ThirdParty#clients > It would be awesome if somehow the Python stdlib itself would provide some > mechanism to make the existent "batteries" able to run asynchronously so > that, say, ftplib or httplib can be used with asyncio as the base IO loop > and at the same time maintain the same existent API. > Gevent tried to do the same thing with > http://www.gevent.org/gevent.monkey.html > As for *how* to do that, I'm sorry to say that I really have no idea. It's a > complicated issue, but I think it's good that this has been raised. There's https://sans-io.readthedocs.io/ which proposes an approach to solving this issue. Paul From kaiser.yann at gmail.com Mon Oct 3 11:46:21 2016 From: kaiser.yann at gmail.com (Yann Kaiser) Date: Mon, 03 Oct 2016 15:46:21 +0000 Subject: [Python-ideas] async objects In-Reply-To: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: The way I see it, the great thing about async/await as opposed to threading is that it is explicit about when execution will "take a break" from your function or resume into it. This is made clear and readable through the use of `await` keywords. Your proposal unfortunately goes directly against this idea of explicitness. You won't know what function will need to be fed into an event loop or not. You won't know where your code is going to lose or gain control. On Sun, Oct 2, 2016, 14:26 Rene Nejsum wrote: > Having followed Yury Selivanov yselivanov.ml at gmail.com proposal to add > async/await to Python (PEP 492 Coroutines with async and await syntax and > (PEP 525 Asynchronous Generators) and and especially the discussion about > PEP 530: Asynchronous Comprehensions I would like to add some concerns > about the direction Python is taking on this. > > As Sven R. Kunze srkunze at mail.de mentions the is a risk of having to > double a lot of methods/functions to have an Async implementation. Just > look at the mess in .NET when Microsoft introduced async/await in their > library, a huge number of functions had to be implemented with a Async > version of each member. Definitely not the DRY principle. 
> > While I think parallelism and concurrency are very important features in a > language, I feel the direction Python is taking right now is getting to > complicated, being difficult to understand and implement correct. > > I thought it might be worth to look at using async at a higher level. > Instead of making methods, generators and lists async, why not make the > object itself async? Meaning that the method call (message to object) is > async > > Example: > > class SomeClass(object): > def some_method(self): > return 42 > > o = async SomeClass() # Indicating that the user want?s an async version > of the object > r = o.some_method() # Will implicit be a async/await ?wrapped? method > no matter impl. > # Here other code could execute, until the result (r) is referenced > print r > > I think above code is easier to implement, use and understand, while it > handles some of the use cases handled by defining a lot of methods as > async/await. > > I have made a small implementation called PYWORKS ( > https://github.com/pylots/pyworks), somewhat based on the idea above. > PYWORKS has been used in several real world implementation and seams to be > fairly easy for developers to understand and use. > > br > /Rene > > PS. This is my first post to python-ideas, please be gentle :-) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Yann Kaiser kaiser.yann at gmail.com yann.kaiser at efrei.net +33 6 51 64 01 89 https://github.com/epsy -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Oct 3 11:59:32 2016 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 3 Oct 2016 16:59:32 +0100 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: <59e2bb05-79d5-883c-481d-b1a7daa20657@mrabarnett.plus.com> On 2016-10-03 16:46, Yann Kaiser wrote: > The way I see it, the great thing about async/await as opposed to > threading is that it is explicit about when execution will "take a > break" from your function or resume into it. This is made clear and > readable through the use of `await` keywords. > > Your proposal unfortunately goes directly against this idea of > explicitness. You won't know what function will need to be fed into an > event loop or not. You won't know where your code is going to lose or > gain control. > Could we turn this around the other way and allow the use of 'await' in both cases, checking at runtime whether it needs to behave asynchronously or not? > On Sun, Oct 2, 2016, 14:26 Rene Nejsum > wrote: > > Having followed Yury Selivanov yselivanov.ml > at gmail.com proposal to add async/await to > Python (PEP 492 Coroutines with async and await syntax and (PEP 525 > Asynchronous Generators) and and especially the discussion about > PEP 530: Asynchronous Comprehensions I would like to add some > concerns about the direction Python is taking on this. > > As Sven R. Kunze srkunze at mail.de mentions the is > a risk of having to double a lot of methods/functions to have an > Async implementation. Just look at the mess in .NET when Microsoft > introduced async/await in their library, a huge number of functions > had to be implemented with a Async version of each member. > Definitely not the DRY principle. 
> > While I think parallelism and concurrency are very important > features in a language, I feel the direction Python is taking right > now is getting to complicated, being difficult to understand and > implement correct. > > I thought it might be worth to look at using async at a higher > level. Instead of making methods, generators and lists async, why > not make the object itself async? Meaning that the method call > (message to object) is async > > Example: > > class SomeClass(object): > def some_method(self): > return 42 > > o = async SomeClass() # Indicating that the user want?s an async > version of the object > r = o.some_method() # Will implicit be a async/await ?wrapped? > method no matter impl. > # Here other code could execute, until the result (r) is referenced > print r > > I think above code is easier to implement, use and understand, while > it handles some of the use cases handled by defining a lot of > methods as async/await. > > I have made a small implementation called PYWORKS > (https://github.com/pylots/pyworks), somewhat based on the idea > above. PYWORKS has been used in several real world implementation > and seams to be fairly easy for developers to understand and use. > > br > /Rene > > PS. This is my first post to python-ideas, please be gentle :-) > From python at lucidity.plus.com Mon Oct 3 18:18:36 2016 From: python at lucidity.plus.com (Erik) Date: Mon, 3 Oct 2016 23:18:36 +0100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: Message-ID: Hi, On 11/09/16 10:36, Dominik Gresch wrote: > So I asked myself if a syntax as follows would be possible: > > for i in range(10) if i != 5: > body I've read the thread and I understand the general issues with making the condition part of the expression. However, what if this wasn't part of changing the expression syntax but changing the declarative syntax instead to remove the need for a newline and indent after the colon? I'm fairly sure this will have been suggested and shot down in the past, but I couldn't find any obvious references so I'll say it (again?). The expression suggested could be spelled: for i in range(10): if i != 5: body So, if a colon followed by another suite is equivalent to the same construct but without the INDENT (and then the corresponding DEDENT unwinds up to the point of the first keyword) then we get something that's pretty much as succinct as Dominik suggested. Of course, we then might get: for i in myweirdobject: if i != 5: while foobar(i) > 10: while frob(i+1) < 99: body ... which is hideous. But is it actually _likely_? E. From rene at stranden.com Mon Oct 3 18:26:30 2016 From: rene at stranden.com (Rene Nejsum) Date: Tue, 4 Oct 2016 00:26:30 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: Hi Yann/ > On 03 Oct 2016, at 17:46, Yann Kaiser wrote: > > The way I see it, the great thing about async/await as opposed to threading is that it is explicit about when execution will "take a break" from your function or resume into it. This is made clear and readable through the use of `await` keywords. The way I read this argument, a parallel could be ?the great thing about alloc/free is that it is explicit about when allocation will happen?, but I believe that the more control you can leave to the runtime the better. > Your proposal unfortunately goes directly against this idea of explicitness. You won't know what function will need to be fed into an event loop or not. 
You won't know where your code is going to lose or gain control. I believe that you should be able to code concurrent code, without being to explicit about it, but let the runtime handle low-level timing, as long as you know your code will execute in the intended order. br /Rene > > On Sun, Oct 2, 2016, 14:26 Rene Nejsum > wrote: > Having followed Yury Selivanov yselivanov.ml at gmail.com proposal to add async/await to Python (PEP 492 Coroutines with async and await syntax and (PEP 525 Asynchronous Generators) and and especially the discussion about PEP 530: Asynchronous Comprehensions I would like to add some concerns about the direction Python is taking on this. > > As Sven R. Kunze srkunze at mail.de mentions the is a risk of having to double a lot of methods/functions to have an Async implementation. Just look at the mess in .NET when Microsoft introduced async/await in their library, a huge number of functions had to be implemented with a Async version of each member. Definitely not the DRY principle. > > While I think parallelism and concurrency are very important features in a language, I feel the direction Python is taking right now is getting to complicated, being difficult to understand and implement correct. > > I thought it might be worth to look at using async at a higher level. Instead of making methods, generators and lists async, why not make the object itself async? Meaning that the method call (message to object) is async > > Example: > > class SomeClass(object): > def some_method(self): > return 42 > > o = async SomeClass() # Indicating that the user want?s an async version of the object > r = o.some_method() # Will implicit be a async/await ?wrapped? method no matter impl. > # Here other code could execute, until the result (r) is referenced > print r > > I think above code is easier to implement, use and understand, while it handles some of the use cases handled by defining a lot of methods as async/await. > > I have made a small implementation called PYWORKS (https://github.com/pylots/pyworks ), somewhat based on the idea above. PYWORKS has been used in several real world implementation and seams to be fairly easy for developers to understand and use. > > br > /Rene > > PS. This is my first post to python-ideas, please be gentle :-) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- > Yann Kaiser > kaiser.yann at gmail.com > yann.kaiser at efrei.net > +33 6 51 64 01 89 > https://github.com/epsy -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 3 20:09:21 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 4 Oct 2016 09:09:21 +0900 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> Rene Nejsum writes: > I believe that you should be able to code concurrent code, without > being to explicit about it, but let the runtime handle low-level > timing, as long as you know your code will execute in the intended > order. Isn't "concurrent code whose order of execution you know" an oxymoron? 
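To make the intent concrete, the ordering Rene is describing can be sketched with just the standard library: wrap an object in a proxy that sends every method call to a single worker thread and hands back a concurrent.futures.Future. Calls on one wrapped object still run in the order they were written, other code keeps running in the meantime, and the caller only blocks when it actually touches the result. This is only an illustrative sketch, not how PYWORKS is implemented, and the AsyncProxy name is invented for the example:

    import concurrent.futures

    class AsyncProxy:
        # Hypothetical helper: every method call on the wrapped object is
        # submitted to one worker thread and returns a Future immediately.
        def __init__(self, obj):
            self._obj = obj
            self._worker = concurrent.futures.ThreadPoolExecutor(max_workers=1)

        def __getattr__(self, name):
            method = getattr(self._obj, name)
            def call(*args, **kwargs):
                return self._worker.submit(method, *args, **kwargs)
            return call

    class SomeClass:
        def some_method(self):
            return 42

    o = AsyncProxy(SomeClass())  # stands in for the proposed "o = async SomeClass()"
    r = o.some_method()          # returns at once; the call runs on the worker thread
    print(r.result())            # blocks only here, when the value is needed

Ordering is per object here, because all calls on one proxy share a single worker; the concurrency comes from having many such objects (plus the main thread) active at the same time.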
From anthony at xtfx.me Mon Oct 3 20:48:40 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Mon, 3 Oct 2016 19:48:40 -0500 Subject: [Python-ideas] async objects In-Reply-To: <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> Message-ID: On Oct 3, 2016 7:09 PM, "Stephen J. Turnbull" < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > > Rene Nejsum writes: > > > I believe that you should be able to code concurrent code, without > > being to explicit about it, but let the runtime handle low-level > > timing, as long as you know your code will execute in the intended > > order. > > Isn't "concurrent code whose order of execution you know" an oxymoron? They are referring to the synchronous nature of any independent control state. Whether it's a thread, a coroutine, a continuation, or whatever else doesn't really matter much. When a thing runs concurrently along side other things, it's still synchronous with respect to itself regardless of how many context switches occur before completion. Such things only need mechanisms to synchronize in order to cooperate. People want to know how they are suppose to write unified, non-insane-and-ugly code in this a/sync python 2/3 world we now find ourselves in. I've been eagerly watching this thread for the answer, thus far to no avail. Sans-io suggests we write bite-sized synchronous code that can be driven by a/sync consumers. While this is all well and good, how does one write said consuming library for both I/O styles without duplication? The answer seems to be "write everything you ever wanted as async and throw some sync wrappers around it". Which means all the actual code I write will be peppered with async and await keywords. In Go I can spawn a new control state (goroutine) at any time against any function. This is clear in the code. In Erlang I can spawn a new control state (Erlang process) at any time and it's also clear. Erlang is a little different because it will preempt me, but the point is I am simply choosing a target function to run in a new context. Gevent and even threading module is another example of this pattern. In all reality you don't typically need many suspension points other than around I/O, and occasionally heavy CPU, so I think folks are struggling to understand (I admit, myself included) why the runtime doesn't want to be more help and instead punts back to the developer. -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Oct 3 23:31:22 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Oct 2016 13:31:22 +1000 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: Message-ID: On 4 October 2016 at 08:18, Erik wrote: > The expression suggested could be spelled: > > for i in range(10): if i != 5: > body > > So, if a colon followed by another suite is equivalent to the same construct > but without the INDENT (and then the corresponding DEDENT unwinds up to the > point of the first keyword) then we get something that's pretty much as > succinct as Dominik suggested. What's the pay-off though? The ultimate problem with deeply nested code isn't the amount of vertical whitespace it takes up - it's the amount of working memory it requires in the brain of a human trying to read it. 
"This requires a lot of lines and a lot of indentation" is just an affordance at time of writing that reminds the code author of the future readability problem they're creating for themselves. Extracting named chunks solves the underlying readability problem by reducing the working memory demand in reading the code (assuming the chunks are well named, so the reader can either make a useful guess about the purpose of the extracted piece without even looking at its documentation, or at least remember what it does after looking it up the first time they encounter it). By contrast, eliminating the vertical whitespace without actually reducing the level of nesting is merely hiding the readability problem without actually addressing it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Mon Oct 3 17:32:17 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 04 Oct 2016 10:32:17 +1300 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: <57F2CE61.4030308@canterbury.ac.nz> Yann Kaiser wrote: > The way I see it, the great thing about async/await as opposed to > threading is that it is explicit about when execution will "take a > break" from your function or resume into it. Another thing is that async/await tasks are very lightweight compared to OS threads, so you can afford to have a large number of them active at once. Rene's approach seems to be based on ordinary threads, so it would not have this property. -- Greg From ncoghlan at gmail.com Tue Oct 4 00:05:38 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Oct 2016 14:05:38 +1000 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> Message-ID: On 4 October 2016 at 10:48, C Anthony Risinger wrote: > In Go I can spawn a new control state (goroutine) at any time against any > function. This is clear in the code. In Erlang I can spawn a new control > state (Erlang process) at any time and it's also clear. Erlang is a little > different because it will preempt me, but the point is I am simply choosing > a target function to run in a new context. Gevent and even threading module > is another example of this pattern. Right, this thread is more about "imperative shell, asynchronous execution", than it is event driven servers. http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html and the code at https://bitbucket.org/ncoghlan/misc/src/default/tinkering/background_tasks.py gives an example of doing that with "schedule_coroutine", "run_in_foreground" and "call_in_background" helpers to drive the event loop. > In all reality you don't typically need many suspension points other than > around I/O, and occasionally heavy CPU, so I think folks are struggling to > understand (I admit, myself included) why the runtime doesn't want to be > more help and instead punts back to the developer. Because the asynchronous features are mostly developed by folks working on event driven servers, and the existing synchronous APIs are generally fine if you're running from a synchronous shell. 
That leads to the following calling models being reasonably well-standardised: - non-blocking synchronous from anywhere: just call it - blocking synchronous from synchronous: just call it - asynchronous from asynchronous: use await - blocking synchronous from asynchronous: use "loop.run_in_executor()" on the event loop The main arguable aspect there is "loop.run_in_executor()" being part of the main user facing API, rather than offering a module level `asyncio.call_in_background` helper function. What's not well-defined are the interfaces for calling into asynchronous code from synchronous code. The most transparent interface for that is gevent and the underlying greenlet support, which implement that at the C stack layer, allowing arbitrary threads to be suspended at arbitrary points. This doesn't give you any programming model benefits, it's just a lighter weight form of operating system level pre-emptive threading (see http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programming.html#a-bit-of-background-info for more on that point). The next most transparent would be to offer a more POSIX-like shell experience, with the concepts of foreground and background jobs, and the constraint that the background jobs scheduled in the current thread only run while a foreground task is active. As far as I know, the main problems that can currently arise with that latter approach are when you attempt to run something in the foreground, but the event loop in the current thread is already running. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Tue Oct 4 00:19:17 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 4 Oct 2016 15:19:17 +1100 Subject: [Python-ideas] async objects In-Reply-To: <57F2CE61.4030308@canterbury.ac.nz> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <57F2CE61.4030308@canterbury.ac.nz> Message-ID: On Tue, Oct 4, 2016 at 8:32 AM, Greg Ewing wrote: > Yann Kaiser wrote: >> >> The way I see it, the great thing about async/await as opposed to >> threading is that it is explicit about when execution will "take a break" >> from your function or resume into it. > > > Another thing is that async/await tasks are very lightweight > compared to OS threads, so you can afford to have a large > number of them active at once. > > Rene's approach seems to be based on ordinary threads, so > it would not have this property. That keeps getting talked about, but one thing I've never seen is any sort of benchmark showing (probably per operating system) how many concurrent requests you need to have before threads become unworkable. Maybe calculate it in milli-Wikipedias, on the basis that English Wikipedia is a well-known site and has some solid stats published [1]. Of late, it's been seeing about 8e9 hits per month, or about 3000/second. So one millipedia would be three page requests per second. A single-core CPU running a simple and naive Python web application can probably handle several millipedias without any concurrency at all. (I would define "handle" as "respond to requests without visible queueing", which would mean about 50ms, for a guess - could maybe go to 100 or even 250, but then people might start noticing the slowness.) Once you're handling more requests than you can handle on a single thread, you need concurrency, but threading will do you fine for a while. Then at some further mark, threading is no longer sufficient, and you need something lighter-weight, such as asyncio. 
But has anyone ever measured what those two points are? ChrisA [1] http://reportcard.wmflabs.org/ From python-ideas at shalmirane.com Tue Oct 4 00:38:06 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 3 Oct 2016 21:38:06 -0700 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: Message-ID: <20161004043805.GB13643@kundert.designers-guide.com> In my experience it is exceptions and inconsistencies that consume 'working memory in the brain of humans'. By eliminating the distinction between list comprehensions and for loops we would be making the language simpler by eliminating an inconsistency. Furthermore, I do not believe it is valid to discard a potentially good idea simply because if taken to extreme it might result in ugly code. With that justification one could reject most ideas. The fact is, that in many cases this idea would result in cleaner, more compact code. We should be content to offer a language in which it is possible to express complex ideas cleanly and simply, and trust our users to use the language appropriately. For example, it was suggested that one could simplify a multi-level loop by moving the multiple levels of for loop into a separate function that acts as generator. And that is a nice idea, but when writing it, the writing of the generator function represents a speed bump. Whereas writing something like the following is simple, compact, quick, and obvious. There is no reason why it should not be allowed even though it might not always be the best approach to use: for i in range(5) for j in range(5) for k in range(5): ... And I would really like to be able to write loops of the form: for item in items if item is not None: ... It is something I do all the time, and it would be nice if it did not consume two levels on indentation. -Ken On Tue, Oct 04, 2016 at 01:31:22PM +1000, Nick Coghlan wrote: > On 4 October 2016 at 08:18, Erik wrote: > > The expression suggested could be spelled: > > > > for i in range(10): if i != 5: > > body > > > > So, if a colon followed by another suite is equivalent to the same construct > > but without the INDENT (and then the corresponding DEDENT unwinds up to the > > point of the first keyword) then we get something that's pretty much as > > succinct as Dominik suggested. > > What's the pay-off though? The ultimate problem with deeply nested > code isn't the amount of vertical whitespace it takes up - it's the > amount of working memory it requires in the brain of a human trying to > read it. "This requires a lot of lines and a lot of indentation" is > just an affordance at time of writing that reminds the code author of > the future readability problem they're creating for themselves. > > Extracting named chunks solves the underlying readability problem by > reducing the working memory demand in reading the code (assuming the > chunks are well named, so the reader can either make a useful guess > about the purpose of the extracted piece without even looking at its > documentation, or at least remember what it does after looking it up > the first time they encounter it). > > By contrast, eliminating the vertical whitespace without actually > reducing the level of nesting is merely hiding the readability problem > without actually addressing it. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rene at stranden.com Tue Oct 4 01:25:19 2016 From: rene at stranden.com (Rene Nejsum) Date: Tue, 4 Oct 2016 07:25:19 +0200 Subject: [Python-ideas] async objects In-Reply-To: <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> Message-ID: <2E9ED61C-004E-4B6A-8BE7-7DEBB83383A4@stranden.com> > On 04 Oct 2016, at 02:09, Stephen J. Turnbull wrote: > > Rene Nejsum writes: >> I believe that you should be able to code concurrent code, without >> being to explicit about it, but let the runtime handle low-level >> timing, as long as you know your code will execute in the intended >> order. > > Isn't "concurrent code whose order of execution you know" an oxymoron? You are right, I should have been more specific. What I meant was that I don't need code filled with async/await; I don't care where it blocks, as long as it (the specific code block I am looking at) runs in the order I wrote it :-) br /Rene > > From rosuav at gmail.com Tue Oct 4 01:26:58 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 4 Oct 2016 16:26:58 +1100 Subject: [Python-ideas] async objects In-Reply-To: <2E9ED61C-004E-4B6A-8BE7-7DEBB83383A4@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <2E9ED61C-004E-4B6A-8BE7-7DEBB83383A4@stranden.com> Message-ID: On Tue, Oct 4, 2016 at 4:25 PM, Rene Nejsum wrote: >> On 04 Oct 2016, at 02:09, Stephen J. Turnbull wrote: >> >> Rene Nejsum writes: >>> I believe that you should be able to code concurrent code, without >>> being to explicit about it, but let the runtime handle low-level >>> timing, as long as you know your code will execute in the intended >>> order. >> >> Isn't "concurrent code whose order of execution you know" an oxymoron? > > You are right, I should have been more specific. What I meant was that I don't need code filled with async/await; I don't care where it blocks, as long as it (the specific code block I am looking at) runs in the order I wrote it :-) > Then you want threads. Easy! ChrisA From rene at stranden.com Tue Oct 4 01:37:20 2016 From: rene at stranden.com (Rene Nejsum) Date: Tue, 4 Oct 2016 07:37:20 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> Message-ID: > On 04 Oct 2016, at 02:48, C Anthony Risinger wrote: > > On Oct 3, 2016 7:09 PM, "Stephen J. Turnbull" < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > > > > Rene Nejsum writes: > > > > > I believe that you should be able to code concurrent code, without > > > being to explicit about it, but let the runtime handle low-level > > > timing, as long as you know your code will execute in the intended > > > order. > > > > Isn't "concurrent code whose order of execution you know" an oxymoron? > > They are referring to the synchronous nature of any independent control state. Whether it's a thread, a coroutine, a continuation, or whatever else doesn't really matter much. When a thing runs concurrently along side other things, it's still synchronous with respect to itself regardless of how many context switches occur before completion.
Such things only need mechanisms to synchronize in order to cooperate. > I agree 100%. Ideally I think a language (would love it to be Python) should permit many (millions) of what we know as coroutines and then have as many threads as the CPU has cores to execute these coroutines, but I do not think you as a programmer should be especially aware of this as you code. (Just like GC handles your alloc/free, the runtime should handle your "concurrency") > People want to know how they are suppose to write unified, non-insane-and-ugly code in this a/sync python 2/3 world we now find ourselves in. I've been eagerly watching this thread for the answer, thus far to no avail. > Agree > Sans-io suggests we write bite-sized synchronous code that can be driven by a/sync consumers. While this is all well and good, how does one write said consuming library for both I/O styles without duplication? > > The answer seems to be "write everything you ever wanted as async and throw some sync wrappers around it". Which means all the actual code I write will be peppered with async and await keywords. > Have a look at the examples in David Beazley's curio; he is one of the most knowledgeable Python people I have met, but that code is almost impossible to read and understand. > In Go I can spawn a new control state (goroutine) at any time against any function. This is clear in the code. In Erlang I can spawn a new control state (Erlang process) at any time and it's also clear. Erlang is a little different because it will preempt me, but the point is I am simply choosing a target function to run in a new context. Gevent and even threading module is another example of this pattern. > Having thought some more about it, I think that putting async in front of the object could be kind of like a channel in Go and other languages? > In all reality you don't typically need many suspension points other than around I/O, and occasionally heavy CPU, so I think folks are struggling to understand (I admit, myself included) why the runtime doesn't want to be more help and instead punts back to the developer. > Well put, we are definitely on the same page here, thank you. br /Rene > -- > > C Anthony > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rene at stranden.com Tue Oct 4 03:17:32 2016 From: rene at stranden.com (Rene Nejsum) Date: Tue, 4 Oct 2016 09:17:32 +0200 Subject: [Python-ideas] async objects In-Reply-To: <57F2CE61.4030308@canterbury.ac.nz> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <57F2CE61.4030308@canterbury.ac.nz> Message-ID: > On 03 Oct 2016, at 23:32, Greg Ewing wrote: > > Yann Kaiser wrote: >> The way I see it, the great thing about async/await as opposed to threading is that it is explicit about when execution will "take a break" from your function or resume into it. > > Another thing is that async/await tasks are very lightweight > compared to OS threads, so you can afford to have a large > number of them active at once. > > Rene's approach seems to be based on ordinary threads, so > it would not have this property. My implementation is, but it should not (have to) be; it only reflects my limited ability and time :-) The programmer should not need to be aware of where concurrency is achieved through coroutines or threads, ideally there should be one OS thread per core in the CPU running many (millions) of coroutines?
br /Rene > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rene at stranden.com Tue Oct 4 03:30:29 2016 From: rene at stranden.com (Rene Nejsum) Date: Tue, 4 Oct 2016 09:30:29 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <2E9ED61C-004E-4B6A-8BE7-7DEBB83383A4@stranden.com> Message-ID: > On 04 Oct 2016, at 07:26, Chris Angelico wrote: > > On Tue, Oct 4, 2016 at 4:25 PM, Rene Nejsum wrote: >>> On 04 Oct 2016, at 02:09, Stephen J. Turnbull wrote: >>> >>> Rene Nejsum writes: >>>> I believe that you should be able to code concurrent code, without >>>> being to explicit about it, but let the runtime handle low-level >>>> timing, as long as you know your code will execute in the intended >>>> order. >>> >>> Isn't "concurrent code whose order of execution you know" an oxymoron? >> >> You are right, I should have been more specific. What I meant was that I don't need code filled with async/await; I don't care where it blocks, as long as it (the specific code block I am looking at) runs in the order I wrote it :-) >> > > Then you want threads. Easy! Well, yes and no. In other languages (Java/C#) where I have implemented concurrent objects ala PYWORKS it works pretty well, as long as you have less than maybe 10,000 threads. But, in Python (CPython2 on a multicore CPU) threads do not work! The GIL makes it impossible to have for example 100 threads sending messages between each other (See the Ring example in PYWORKS), that's one reason why it would be interesting to have some kind of concurrency support built into the Python runtime. Today I see all kinds of tricks and workarounds to get around the GIL. Ranging from starting several Python interpreters to difficult-to-read code using yield (now async/await), but when you have seen much more elegant support (Go, Erlang, maybe even ABCL) you kind of wish this could be added to your own favourite language. br /Rene > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Oct 4 03:50:20 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 4 Oct 2016 16:50:20 +0900 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> Message-ID: <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > What's not well-defined are the interfaces for calling into > asynchronous code from synchronous code. I don't understand the relevance to the content of the thread. As I understand the main point, Sven and Rene don't believe that [the kind of] async code [they want to write] should need any keywords; just start the event loop and invoke functions, and that somehow automatically DTRTs. (I.e., AFAICS the intent is to unify generators and coroutines despite the insistence of Those Who Have Actually Implemented Stuff that generator != coroutine.) N.B.
As I understand it, although Rene uses the async keyword when invoking the constructor, this could be just as well done with a factory function since he speaks of "wrapping" the object. And his example is in your "just call it" category: nonblocking synchronous code. That doesn't help me understand what he's really trying to do. His PyWorks project is documented as implementing the "actor" model, but async is more general than that AFAICS, and on the other hand I can't see how you can guarantee that a Python function won't modify global state. So OK, I can see that a performant implementation of the actor pattern (don't we have this in multiprocessing somewhere?) with a nice API (that's harder :-) and documented restrictions on what you can do in there might be a candidate for stdlib, but I don't see how it's related to the "async(io)" series of PEPs, which are specifically about interleaving arbitrary amounts of suspension in a Python program (which might manipulate global state, but we want to do it in a way such that we know that code between suspension points executes "atomically" from the point of view of other coroutines). Anthony also objects to the keywords, ie, that he'll need to pepper his "dual-purpose" code with "async" and "await". Again, AFAICS that implies that he doesn't see a need to distinguish async from lazy (coroutine from generator), since AFAICS you'd break the world if you changed the semantics of "def foo" to "async def foo". So if you're going to write async/await-style code, you're going to have to swallow the keywords. Am I just missing something? From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Oct 4 03:56:12 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 4 Oct 2016 16:56:12 +0900 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <20161004043805.GB13643@kundert.designers-guide.com> References: <20161004043805.GB13643@kundert.designers-guide.com> Message-ID: <22515.24732.345487.286855@turnbull.sk.tsukuba.ac.jp> These are my opinions; I don't claim any authority for them. I just don't find the proposed syntax as obvious and unambiguous as you do, and would like to explain why that is so. Ken Kundert writes: > In my experience it is exceptions and inconsistencies that consume 'working > memory in the brain of humans'. By eliminating the distinction between list > comprehensions and for loops we would be making the language simpler by > eliminating an inconsistency. I don't think of a comprehension as a for loop, I think of it as setbuilder notation (although of course I realize that since lists are sequences it has to be a for loop under the hood). So the claimed inconsistency here doesn't bother me. I realize it bothers a lot of people, but the proposed syntax is not obvious to me (ambiguous and inconsistent in its own way). > [T]he writing of the generator function represents a speed bump. It used to be, for me, but it really isn't any more. Perhaps you might get used to it if you tried it. Harder to argue: the fact that Guido and Nick (inter alia) consider it good style to use named functions makes that point a hard sell (ie, you don't need to convince me, you need to convince them). > Whereas writing something like the following is simple, compact, > quick, and obvious. There is no reason why it should not be allowed > even though it might not always be the best approach to use: > > for i in range(5) for j in range(5) for k in range(5): > ... 
To me, that is visually ambiguous with for i in (range(5) for j in (range(5) for k in range(5))): ... although syntactically the genexp requires the parentheses (and in fact is almost nonsensical!) I could easily see myself forgetting the parentheses (something I do frequently) when I *do* want to use a genexp (something I do frequently), with more or less hilarious results. As already mentioned: for i, j, k in itertools.product(range(5), range(5), range(5)): ... To me that is much clearer, because it expresses the rectangular shape of the i, j, k space. I would also stumble on for i in range(5) for j in range(i + 1): ... at least the first few times I saw it. Based on the English syntax of "for" (not to mention the genexp syntax), I would expect for j in range(i + 1) for i in range(5): ... If itertools.product is the wrong tool, then the loop bodies are presumably complex enough to deserve new indent levels. Note that simple filters like non_nil (see below) can easily be used, as long as the resulting set is still a product. > And I would really like to be able to write loops of the form: > > for item in items if item is not None: > ... def non_nil(items): return (item for item in items if item is not None) for item in non_nil(items): ... I think that's very readable, so the only reason why that 2-line function needs to be syntax that I can see is your distaste for defining functions, and that of other Python programmers who think like you. From p.f.moore at gmail.com Tue Oct 4 04:39:58 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 4 Oct 2016 09:39:58 +0100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <22515.24732.345487.286855@turnbull.sk.tsukuba.ac.jp> References: <20161004043805.GB13643@kundert.designers-guide.com> <22515.24732.345487.286855@turnbull.sk.tsukuba.ac.jp> Message-ID: On 4 October 2016 at 08:56, Stephen J. Turnbull wrote: > These are my opinions; I don't claim any authority for them. I just > don't find the proposed syntax as obvious and unambiguous as you do, > and would like to explain why that is so. > > Ken Kundert writes: > > > In my experience it is exceptions and inconsistencies that consume 'working > > memory in the brain of humans'. By eliminating the distinction between list > > comprehensions and for loops we would be making the language simpler by > > eliminating an inconsistency. > > I don't think of a comprehension as a for loop, I think of it as > setbuilder notation (although of course I realize that since lists are > sequences it has to be a for loop under the hood). So the claimed > inconsistency here doesn't bother me. I realize it bothers a lot of > people, but the proposed syntax is not obvious to me (ambiguous and > inconsistent in its own way). > > > [T]he writing of the generator function represents a speed bump. > > It used to be, for me, but it really isn't any more. Perhaps you > might get used to it if you tried it. Harder to argue: the fact that > Guido and Nick (inter alia) consider it good style to use named > functions makes that point a hard sell (ie, you don't need to convince > me, you need to convince them). > > > Whereas writing something like the following is simple, compact, > > quick, and obvious. There is no reason why it should not be allowed > > even though it might not always be the best approach to use: > > > > for i in range(5) for j in range(5) for k in range(5): > > ... > > To me, that is visually ambiguous with > > for i in (range(5) for j in (range(5) for k in range(5))): > ... 
> > although syntactically the genexp requires the parentheses (and in > fact is almost nonsensical!) I could easily see myself forgetting the > parentheses (something I do frequently) when I *do* want to use a > genexp (something I do frequently), with more or less hilarious > results. As already mentioned: > > for i, j, k in itertools.product(range(5), range(5), range(5)): > ... > > To me that is much clearer, because it expresses the rectangular shape > of the i, j, k space. I would also stumble on > > for i in range(5) for j in range(i + 1): > ... > > at least the first few times I saw it. Based on the English syntax of > "for" (not to mention the genexp syntax), I would expect > > for j in range(i + 1) for i in range(5): > ... > > If itertools.product is the wrong tool, then the loop bodies are > presumably complex enough to deserve new indent levels. Note that > simple filters like non_nil (see below) can easily be used, as long as > the resulting set is still a product. > > > And I would really like to be able to write loops of the form: > > > > for item in items if item is not None: > > ... > > def non_nil(items): > return (item for item in items if item is not None) > > for item in non_nil(items): > ... > > I think that's very readable, so the only reason why that 2-line > function needs to be syntax that I can see is your distaste for > defining functions, and that of other Python programmers who think > like you. Again this is just personal opinion, but I agree 100% with everything Stephen said. It *is* a stumbling block to get used to writing generator functions like non_nil() above, but it's also a worthwhile learning experience. And yes, it's somewhat inconvenient to do so if you're working in the standard Python REPL, but designing language features around REPL usage isn't (IMO) the right choice. And if you really need a better way of handling that sort of refactoring in an interactive environment, tools like the Jupyter notebook are probably what you're looking for. (Trying to collapse multiple clauses into one line/statement is something Perl was famous for, and it's in many ways quite an attractive feature. But IMO it directly contributes to Perl's reputation for unreadability, because it *does* get used too much, whether you think it will or not - one person's "nicely compact" is another person's "obfuscated". So I'm glad that Python's design avoids encouraging that style - even though I do occasionally remember fondly my Perl one-liners :-)) Paul From ncoghlan at gmail.com Tue Oct 4 07:30:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Oct 2016 21:30:36 +1000 Subject: [Python-ideas] async objects In-Reply-To: <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Message-ID: On 4 October 2016 at 17:50, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > What's not well-defined are the interfaces for calling into > > asynchronous code from synchronous code. > > I don't understand the relevance to the content of the thread. 
Given the schedule_coroutine/run_in_foreground distinction, it's relatively easy (for a given definition of easy) to write a proxy object that would make the following work: class SomeClass(object): def some_sync_method(self): return 42 async def some_async_method(self): await asyncio.sleep(3) return 42 o = auto_schedule(SomeClass()) # Indicating that the user wants an async version of the object r1 = o.some_sync_method() # Automatically run in a background thread r2 = o.some_async_method() # Automatically scheduled as a coroutine print(run_in_foreground(r1)) print(run_in_foreground(r2)) It's not particularly useful for an actual event driven server, but it should be entirely do-able for the purposes of providing a common interface over blocking and non-blocking APIs. What it *doesn't* do, and what you need greenlet for, is making that common interface look like you're using plain old synchronous C threads. If folks really want to do that, that's fine - they just need to add gevent/greenlet as a dependency, just as the folks that don't like the visibly object-oriented nature of the default unittest and logging APIs will often opt for third party alternative APIs that share some of the same underlying infrastructure. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From arek.bulski at gmail.com Tue Oct 4 07:32:49 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Tue, 4 Oct 2016 13:32:49 +0200 Subject: [Python-ideas] Float nan equality In-Reply-To: References: Message-ID: I had a bug where nan floats failed to compare equal because there seems to be more than one nan value and comparison seems to be binary based. How about we make float eq test if both are math. Isnan? -- Arkadiusz Bulski -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 4 07:37:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Oct 2016 21:37:04 +1000 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <20161004043252.GA13643@kundert.designers-guide.com> References: <20161004043252.GA13643@kundert.designers-guide.com> Message-ID: On 4 October 2016 at 14:32, Ken Kundert wrote: > For example, it was suggested that one could simplify a multi-level loop by > moving the multiple levels of for loop into a separate function that acts as > generator. And that is a nice idea, but when writing it, the writing of the > generator function represents a speed bump. Whereas writing something like the > following is simple, compact, quick, and obvious. There is no reason why it > should not be allowed even though it might not always be the best approach to > use: > > for i in range(5) for j in range(5) for k in range(5): > ... > > And I would really like to be able to write loops of the form: > > for item in items if item is not None: > ... > > It is something I do all the time, and it would be nice if it did not consume > two levels on indentation. And when you add the "else" clause that's supported by both "for" and "if", what does that mean in the abbreviated form? for item in items if item is not None: ... else: # ??? Or is the implicit proposal that this form be special cased to disallow the "else" clause? Comprehensions don't have that concern, as they don't support "else" clauses at all. Cheers, Nick. 
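To make that ambiguity concrete, the two readings someone could plausibly take are both legal Python today and do different things (purely illustrative):

    items = [1, None, 2, None]

    # Reading 1: the "else" belongs to the "for"
    # (it runs only if the loop is never left via "break").
    for item in items:
        if item is not None:
            print("processing", item)
    else:
        print("loop finished without break")

    # Reading 2: the "else" belongs to the nested "if"
    # (it runs once for every item that IS None).
    for item in items:
        if item is not None:
            print("processing", item)
        else:
            print("skipping a None")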
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Oct 4 07:43:03 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Oct 2016 21:43:03 +1000 Subject: [Python-ideas] Float nan equality In-Reply-To: References: Message-ID: On 4 October 2016 at 21:32, Arek Bulski wrote: > I had a bug where nan floats failed to compare equal because there seems to > be more than one nan value and comparison seems to be binary based. "NaN != NaN" is part of the definition of IEEE 754 floats: https://en.wikipedia.org/wiki/NaN#Floating_point That's why it returns False even if you compare a specific NaN instance with itself: >>> x = float("nan") >>> x == x False If you need a kinda-like-NaN value that provides reflexive equality, then Python's None singleton is a better fit. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dmoisset at machinalis.com Tue Oct 4 08:03:05 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Tue, 4 Oct 2016 13:03:05 +0100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: <20161004043252.GA13643@kundert.designers-guide.com> Message-ID: Something else that may look confusing can be a break statement; in a for i in range(5) for j in range(5) for k in range(5): ... break does it break the inner "k" loop, going to the next "j" (as it would happen with 3 nested loops), or does it end the whole for statement? Similar question with "continue" On 4 October 2016 at 12:37, Nick Coghlan wrote: > On 4 October 2016 at 14:32, Ken Kundert wrote: > > For example, it was suggested that one could simplify a multi-level loop > by > > moving the multiple levels of for loop into a separate function that > acts as > > generator. And that is a nice idea, but when writing it, the writing of > the > > generator function represents a speed bump. Whereas writing something > like the > > following is simple, compact, quick, and obvious. There is no reason why > it > > should not be allowed even though it might not always be the best > approach to > > use: > > > > for i in range(5) for j in range(5) for k in range(5): > > ... > > > > And I would really like to be able to write loops of the form: > > > > for item in items if item is not None: > > ... > > > > It is something I do all the time, and it would be nice if it did not > consume > > two levels on indentation. > > And when you add the "else" clause that's supported by both "for" and > "if", what does that mean in the abbreviated form? > > for item in items if item is not None: > ... > else: > # ??? > > Or is the implicit proposal that this form be special cased to > disallow the "else" clause? > > Comprehensions don't have that concern, as they don't support "else" > clauses at all. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Tue Oct 4 09:20:02 2016 From: random832 at fastmail.com (Random832) Date: Tue, 04 Oct 2016 09:20:02 -0400 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: <20161004043252.GA13643@kundert.designers-guide.com> Message-ID: <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> On Tue, Oct 4, 2016, at 07:37, Nick Coghlan wrote: > And when you add the "else" clause that's supported by both "for" and > "if", what does that mean in the abbreviated form? > > for item in items if item is not None: > ... > else: > # ??? > > Or is the implicit proposal that this form be special cased to > disallow the "else" clause? I think it's obvious that it would be on the outermost construct (i.e. the one that would still be at the same indentation level fully expanded). The *real* question is what "break" should do. I think it should likewise break from the outermost for-loop (but "continue" should still continue the innermost one), but this does mean that it's not mechanically identical to the "equivalent" nested loops [it would, however, make it mechanically identical to the "generator and single loop" form] From ncoghlan at gmail.com Tue Oct 4 09:36:11 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Oct 2016 23:36:11 +1000 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> References: <20161004043252.GA13643@kundert.designers-guide.com> <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> Message-ID: On 4 October 2016 at 23:20, Random832 wrote: > On Tue, Oct 4, 2016, at 07:37, Nick Coghlan wrote: >> And when you add the "else" clause that's supported by both "for" and >> "if", what does that mean in the abbreviated form? >> >> for item in items if item is not None: >> ... >> else: >> # ??? >> >> Or is the implicit proposal that this form be special cased to >> disallow the "else" clause? > > I think it's obvious that it would be on the outermost construct (i.e. > the one that would still be at the same indentation level fully > expanded). But would that interpretation be obvious to folks that aren't yet aware that you can have "else" clauses on loops? (Folks can be *years* into using Python before they first encounter that, whether in real code or in a "Did you know about Python?" snippet) > The *real* question is what "break" should do. I think it should > likewise break from the outermost for-loop (but "continue" should still > continue the innermost one), but this does mean that it's not > mechanically identical to the "equivalent" nested loops [it would, > however, make it mechanically identical to the "generator and single > loop" form] Or we could stick with the status quo where limiting the keyword chaining to the expression form naturally avoids all of these awkward interactions with other statement level constructs. Cheers, Nick. 
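For illustration, the status-quo spelling keeps those interactions unambiguous, because the chaining happens inside an expression and break/continue/else only ever see a single loop (a sketch with throwaway data):

    items = [1, None, 2, 3, None, 4]
    for item in (x for x in items if x is not None):
        if item == 2:
            continue       # unambiguously: move on to the next surviving item
        if item == 3:
            break          # unambiguously: leave the only loop there is
        print(item)
    else:
        print("no break")  # unambiguously attached to that same loop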
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From random832 at fastmail.com Tue Oct 4 10:22:54 2016 From: random832 at fastmail.com (Random832) Date: Tue, 04 Oct 2016 10:22:54 -0400 Subject: [Python-ideas] Suggestion: Clear screen command for the REPL In-Reply-To: References: <809a59bf-1fb5-77e0-029b-bdafc2ffaa79@googlemail.com> <4f80e18e-afda-4165-84d0-6b403e8121ea@googlegroups.com> <20160929020428.GU22471@ando.pearwood.info> Message-ID: <1475590974.3173615.745440953.3667C59C@webmail.messagingengine.com> On Wed, Sep 28, 2016, at 23:36, Chris Angelico wrote: > On Thu, Sep 29, 2016 at 12:04 PM, Steven D'Aprano > wrote: > > (Also, it seems a shame that Ctrl-D is EOF in Linux and Mac, but Windows > > is Ctrl-Z + Return. Can that be standardized to Ctrl-D everywhere?) > > Sadly, I suspect not. If you're running in the default Windows > terminal emulator (the one a normal user will get by invoking > cmd.exe), you're running under a lot of restrictions, and I believe > one of them is that you can't get Ctrl-D without an enter. Well, we could read _everything_ in character-at-a-time mode, and implement our own line editing. In effect, that's what readline is doing. The main consequence of reading everything in character-at-a-time mode is that we'd have to implement everything ourselves, and the line editing you get *without* doing it yourself is somewhat nicer on Windows than on Linux (it supports cursor movement, inserting characters, and history). On Wed, Sep 28, 2016, at 23:41, ????? wrote: > "Bash on Ubuntu on windows" responds to CTRL+D just fine. I don't really > know how it works, but it looks like it is based on the Windows terminal > emulator. It runs inside it, but it's using the "Windows Subsystem for Linux", which (I assume) reads character-at-a-time and feeds it to a Unix-like terminal driver, (which Bash then has incidentally also put in character-at-a-time mode by using readline - to see what you get on WSL *without* doing this, try running "cat" under bash.exe) From srkunze at mail.de Tue Oct 4 11:06:29 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 4 Oct 2016 17:06:29 +0200 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> References: <20161004043252.GA13643@kundert.designers-guide.com> <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> Message-ID: On 04.10.2016 15:20, Random832 wrote: > The *real* question is what "break" should do. I think it should > likewise break from the outermost for-loop (but "continue" should still > continue the innermost one), but this does mean that it's not > mechanically identical to the "equivalent" nested loops [it would, > however, make it mechanically identical to the "generator and single > loop" form] To me, a for loop starts with a "for" and ends with a ":". I wouldn't mind the ability of more "for"s or "if"s in between. I would skip over them anyway while reading. 
Technically, I agree with you as it matches my intuition: for blaa foo blah blaaaa blubber babble: break # go outside continue # go to next item else: # no break Cheers, Sven From guido at python.org Tue Oct 4 11:31:12 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 4 Oct 2016 08:31:12 -0700 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Oct 3, 2016 at 10:37 PM, Rene Nejsum wrote: > Ideally I think a language (would love it to be Python) should > permit many (millions) of what we know as coroutines and then have as many > threads as the CPU have cores to execute this coroutines, but I do not thing > you as a programmer should be especially aware of this as you code. (Just > like GC handles your alloc/free, the runtime should handle your > ?concurrency?) There's a problem with this model (of using all CPUs to run coroutines), since when you have two coroutines that can run in unspecified order but update the same datastructure, the current coroutine model *promises* that they will not run in parallel -- they may only alternate running if they use `await`. This promise implies that you can update the datastructure without worrying about locking as long as you don't use `await` in the middle. (IOW it's non-pre-emptive scheduling.) If you were to change the model to allow multiple coroutines being executed in parallel on multiple CPUs, such coroutines would have to use locks locks, and then you have all the problems of threading back in your coroutines! (There might be other things too, but there's no wait to avoid a fundamental change in the concurrency model.) Basically you're asking for Go's concurrency model -- it's nice in some ways, but asyncio wasn't made to do that, and I'm not planning to change it (let's wait for a GIL-free Python 4 first). I'm still trying to figure out my position on the other points of discussion here -- keep discussing! -- --Guido van Rossum (python.org/~guido) From mertz at gnosis.cx Tue Oct 4 11:36:08 2016 From: mertz at gnosis.cx (David Mertz) Date: Tue, 4 Oct 2016 08:36:08 -0700 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <20161004043805.GB13643@kundert.designers-guide.com> References: <20161004043805.GB13643@kundert.designers-guide.com> Message-ID: In my mind, these proposed complications of the 'for' loop would *introduce* inconsistency, NOT reduce it. It's simple to remember that suites nest statements while comprehensions are expressions on single (logical) lines. Adding more edge cases to blue the distinction makes cognitive load higher. On Oct 3, 2016 9:38 PM, "Ken Kundert" wrote: > In my experience it is exceptions and inconsistencies that consume 'working > memory in the brain of humans'. By eliminating the distinction between list > comprehensions and for loops we would be making the language simpler by > eliminating an inconsistency. > > Furthermore, I do not believe it is valid to discard a potentially good > idea > simply because if taken to extreme it might result in ugly code. With that > justification one could reject most ideas. The fact is, that in many cases > this > idea would result in cleaner, more compact code. We should be content to > offer > a language in which it is possible to express complex ideas cleanly and > simply, > and trust our users to use the language appropriately. 
> > For example, it was suggested that one could simplify a multi-level loop by > moving the multiple levels of for loop into a separate function that acts > as > generator. And that is a nice idea, but when writing it, the writing of the > generator function represents a speed bump. Whereas writing something like > the > following is simple, compact, quick, and obvious. There is no reason why it > should not be allowed even though it might not always be the best approach > to > use: > > for i in range(5) for j in range(5) for k in range(5): > ... > > And I would really like to be able to write loops of the form: > > for item in items if item is not None: > ... > > It is something I do all the time, and it would be nice if it did not > consume > two levels on indentation. > > -Ken > > On Tue, Oct 04, 2016 at 01:31:22PM +1000, Nick Coghlan wrote: > > On 4 October 2016 at 08:18, Erik wrote: > > > The expression suggested could be spelled: > > > > > > for i in range(10): if i != 5: > > > body > > > > > > So, if a colon followed by another suite is equivalent to the same > construct > > > but without the INDENT (and then the corresponding DEDENT unwinds up > to the > > > point of the first keyword) then we get something that's pretty much as > > > succinct as Dominik suggested. > > > > What's the pay-off though? The ultimate problem with deeply nested > > code isn't the amount of vertical whitespace it takes up - it's the > > amount of working memory it requires in the brain of a human trying to > > read it. "This requires a lot of lines and a lot of indentation" is > > just an affordance at time of writing that reminds the code author of > > the future readability problem they're creating for themselves. > > > > Extracting named chunks solves the underlying readability problem by > > reducing the working memory demand in reading the code (assuming the > > chunks are well named, so the reader can either make a useful guess > > about the purpose of the extracted piece without even looking at its > > documentation, or at least remember what it does after looking it up > > the first time they encounter it). > > > > By contrast, eliminating the vertical whitespace without actually > > reducing the level of nesting is merely hiding the readability problem > > without actually addressing it. > > > > Cheers, > > Nick. > > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Tue Oct 4 11:42:23 2016 From: mertz at gnosis.cx (David Mertz) Date: Tue, 4 Oct 2016 08:42:23 -0700 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> References: <20161004043252.GA13643@kundert.designers-guide.com> <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> Message-ID: On Oct 4, 2016 6:20 AM, "Random832" wrote: > > for item in items if item is not None: > > ... > > else: > > # ??? > > I think it's obvious that it would be on the outermost construct (i.e. 
> the one that would still be at the same indentation level fully > expanded). I think it's obvious it would be the innermost construct... Or at least very plausible. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Oct 4 11:49:02 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 5 Oct 2016 02:49:02 +1100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: <20161004043252.GA13643@kundert.designers-guide.com> <1475587202.3159752.745413905.727E8DE8@webmail.messagingengine.com> Message-ID: On Wed, Oct 5, 2016 at 2:42 AM, David Mertz wrote: > On Oct 4, 2016 6:20 AM, "Random832" wrote: >> > for item in items if item is not None: >> > ... >> > else: >> > # ??? > >> >> I think it's obvious that it would be on the outermost construct (i.e. >> the one that would still be at the same indentation level fully >> expanded). > > I think it's obvious it would be the innermost construct... Or at least very > plausible. My reading of this is that the loop consists of a single filtered iteration, ergo break/continue/else are as if the loop used a generator: # for item in items if item is not None: for item in (item for item in items if item is not None): These two would be semantically equivalent, and the first one has the advantage of not sounding like the Cheshire Cat as Alice entered 'Machinations'. << Time to jump in time to jump through time.... I'm dizzy. >> ChrisA From steve at pearwood.info Tue Oct 4 12:07:42 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 5 Oct 2016 03:07:42 +1100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: Message-ID: <20161004160740.GF22471@ando.pearwood.info> On Tue, Oct 04, 2016 at 01:31:22PM +1000, Nick Coghlan wrote: > By contrast, eliminating the vertical whitespace without actually > reducing the level of nesting is merely hiding the readability problem > without actually addressing it. +1 Extra newlines are cheap. Writing for x in expression: if condition: block is a simple, clean idiom that is easy to understand, avoids a number of pitfalls (where do you put the elif or else if you need one?), and only costs one extra line and one extra indent. If you have so many indents that this is a problem, that's a code smell and you ought to think more closely about what you are doing. There's another variation that saves an indent for the cost of one more line: for x in expression: if not condition: continue block In contrast, comprehensions are a single expression and are expected to usually be written in one line, although that's often hard to do without very long lines. They cannot include elif or else clauses, so avoid that particular pitfall. But the "very long line" problem shows that they are too dense: simple examples look fine: [x+1 for x in seq if cond] but in practice, they're often much longer with a much higher density of code: [default if obj is None else obj.method(arg) for (obj, count) in zip(values, counts) if count > 1] Some compromise on the optimal level of readability and code density is allowed: that's the price we pay in order to squeeze everything into a single expression. But that is not something we ought to copy when we have the luxury of a suite of statements. 
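Unrolled into statements, that last example is exactly the kind of thing being referred to; this uses the same placeholder names as the comprehension above, just spread over a suite:

    result = []
    for obj, count in zip(values, counts):
        if count > 1:
            if obj is None:
                result.append(default)
            else:
                result.append(obj.method(arg))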
-- Steve From guido at python.org Tue Oct 4 12:15:31 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 4 Oct 2016 09:15:31 -0700 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Oct 4, 2016 at 4:30 AM, Nick Coghlan wrote: > class SomeClass(object): > def some_sync_method(self): > return 42 > async def some_async_method(self): > await asyncio.sleep(3) > return 42 > > o = auto_schedule(SomeClass()) # Indicating that the user wants an > async version of the object > r1 = o.some_sync_method() # Automatically run in a background thread > r2 = o.some_async_method() # Automatically scheduled as a coroutine > print(run_in_foreground(r1)) > print(run_in_foreground(r2)) So maybe r1 and r2 are just concurrent.futures.Futures, and run_in_foreground(r) wraps r.result(). And auto_schedule() is a proxy that turns all method calls into async calls with a (concurrent) Future to wait for the result. There's an event loop somewhere that sits idle except when you call run_in_foreground() on somethong; it's only used for the async methods, since the sync methods run in a background thread (pool, I hope). Or perhaps r2 is an asyncio.Future and run_in_foreground(r2) wraps loop.run_until_complete(r2). I suppose the event loop should also be activated when waiting for r1, so maybe r1 should be an asyncio Future that wraps a concurrent Future (using asyncio.wrap_future(), which can do just that thing). Honestly it feels like many things can go wrong with this API model, esp. you haven't answered what should happen when a method of SomeClass (either a synchronous one or an async one) calls run_in_foreground() on something -- or, more likely, calls some harmless-looking function that calls another harmless-looking function that calls run_in_foreground(). At that point you have pre-emptive scheduling back in play (or your coroutines may be blocked unnecessarily) and I think you have nothing except a more complicated API to work with threads. I think I am ready to offer a counterproposal where the event loop runs in one thread and synchronous code runs in another thread and we give the synchronous code a way to synchronously await a coroutine or an asyncio.Future. This can be based on asyncio.run_coroutine_threadsafe(), which takes a coroutine or an asyncio.Future and returns a concurrent Future. (It also takes a loop, and it assumes that loop runs in a different thread. I think it should assert that.) The main feature of my counterproposal as I see it is that async code should not call back into synchronous code, IOW once you are writing coroutines, you have to use the coroutine API for everything you do. And if something doesn't have a coroutine API, you run it in a background thread using loop.run_in_executor(). So either you buy into the async way of living and it's coroutines all the way down from there, no looking back -- or you stay on the safe side of the fence, and you interact with coroutines only using a very limited "remote manipulator" API. The two don't mix any better than that. -- --Guido van Rossum (python.org/~guido) From srkunze at mail.de Tue Oct 4 12:40:58 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Tue, 4 Oct 2016 18:40:58 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Message-ID: On 04.10.2016 13:30, Nick Coghlan wrote: > What it *doesn't* do, and what you need greenlet for, is making that > common interface look like you're using plain old synchronous C > threads. > > If folks really want to do that, that's fine - they just need to add > gevent/greenlet as a dependency, just as the folks that don't like the > visibly object-oriented nature of the default unittest and logging > APIs will often opt for third party alternative APIs that share some > of the same underlying infrastructure. Maybe, this is all a big misunderstanding. asyncio is incompatible with regular execution flow and it's **always blocking**. However, asyncio is perceived by some of us (including me) as a shiny alternative to processes and threads but really isn't. I remember doing this survey on python-ideas (results here: https://srkunze.blogspot.de/2016/02/concurrency-in-python.html) but I get the feeling that we still miss something. My impression is that asyncio shall be used for something completely different than dropping off things into a background worker. But looking at the cooking example given by Steve Dower (cf. blog post), at other explanations, at examples in the PEPs, it just seems to me that his analogy could have been made with threads and processes as well. At its core (the ASYNC part), asyncio is quite similar to threads and processes. But its IO-part seem to drive some (design) decisions that don't go well with the existing mental model of many developers. Even PEP-reviewers are fooled by simple asyncio examples. Why? Because they forget to spawn an eventloop. "async def and await" are just useless without an eventloop. And maybe that's what's people frustration is about. They want the ASYNC part without worrying about the IO part. Furthermore, adding 2 (TWO) new keywords to a language has such an immense impact. Especially when those people are told "the barrier for new keywords is quite high!!". So, these new keywords must mean something. I think what would help here are concrete answers to: 0) Is asyncio a niche feature only be used for better IO? 1) What is the right way of integrating asyncio into existing code? 2) How do we intend to solve the DRY-principle issue? If the answer is "don't use asyncio", that's a fine result but honestly I think it would be just insane to assume that we got all these features, all this work and all those duplicated functions all for nothing. I can't believe that. So, I am still looking for a reasonable use-case of asyncio in our environment. Cheers, Sven From srkunze at mail.de Tue Oct 4 12:40:28 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 4 Oct 2016 18:40:28 +0200 Subject: [Python-ideas] async objects In-Reply-To: <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Message-ID: <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> On 04.10.2016 09:50, Stephen J. 
Turnbull wrote: > As I understand the main point, Sven and Rene don't believe that [the > kind of] async code [they want to write] should need any keywords; > just start the event loop and invoke functions, and that somehow > automatically DTRTs. [reading my name second time] I don't think that's actually what I wanted here. One simple keyword should have sufficed just like golang did. So, the developer gets a way to decide whether or not he needs it blocking or nonblocking **when using a function**. He doesn't need to decide it **when writing the function**. You might wonder why this is relevant. DRY principle has been mentioned but there's more to it. Only the caller **can decide** whether it needs to wait or not. Why? Because, the caller works WITH the result of the called function (whatever results means to you). The caller is (what Nick probably would call) the orchestrator, as it has the knowledge about the relation and interaction between domain-specific function calls. As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures" but it allows to avoid that what Ren? and Anthony objects about. Cheers, Sven From rosuav at gmail.com Tue Oct 4 13:38:04 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 5 Oct 2016 04:38:04 +1100 Subject: [Python-ideas] async objects In-Reply-To: <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> Message-ID: On Wed, Oct 5, 2016 at 3:40 AM, Sven R. Kunze wrote: > I don't think that's actually what I wanted here. One simple keyword should > have sufficed just like golang did. So, the developer gets a way to decide > whether or not he needs it blocking or nonblocking **when using a > function**. He doesn't need to decide it **when writing the function**. The only way to do that is to write *every* function as async, and then if you want it blocking, you immediately wait for it. In other words, you write everything asynchronously. ChrisA From eryksun at gmail.com Tue Oct 4 14:47:30 2016 From: eryksun at gmail.com (eryk sun) Date: Tue, 4 Oct 2016 18:47:30 +0000 Subject: [Python-ideas] Suggestion: Clear screen command for the REPL In-Reply-To: <1475590974.3173615.745440953.3667C59C@webmail.messagingengine.com> References: <809a59bf-1fb5-77e0-029b-bdafc2ffaa79@googlemail.com> <4f80e18e-afda-4165-84d0-6b403e8121ea@googlegroups.com> <20160929020428.GU22471@ando.pearwood.info> <1475590974.3173615.745440953.3667C59C@webmail.messagingengine.com> Message-ID: On Tue, Oct 4, 2016 at 2:22 PM, Random832 wrote: > On Wed, Sep 28, 2016, at 23:36, Chris Angelico wrote: >> On Thu, Sep 29, 2016 at 12:04 PM, Steven D'Aprano >> wrote: >> > (Also, it seems a shame that Ctrl-D is EOF in Linux and Mac, but Windows >> > is Ctrl-Z + Return. Can that be standardized to Ctrl-D everywhere?) >> >> Sadly, I suspect not. If you're running in the default Windows >> terminal emulator (the one a normal user will get by invoking >> cmd.exe), you're running under a lot of restrictions, and I believe >> one of them is that you can't get Ctrl-D without an enter. > > Well, we could read _everything_ in character-at-a-time mode, and > implement our own line editing. In effect, that's what readline is > doing. 
3.6+ switched to calling ReadConsoleW, which allows using a 32-bit control mask to indicate which ASCII control codes should terminate a read. The control character is left in the input string, so it's possible to define custom behavior for multiple control characters. Here's a basic ctypes example of how this feature works. In each case, after calling ReadConsoleW I enter "spam" and then type a control character to terminate the read. import sys import msvcrt import ctypes kernel32 = ctypes.WinDLL('kernel32', use_last_error=True) ReadConsoleW = kernel32.ReadConsoleW CTRL_MASK = 2 ** 32 - 1 # all ctrl codes hin = msvcrt.get_osfhandle(sys.stdin.fileno()) buf = (ctypes.c_wchar * 10)(*('-' * 10)) pn = (ctypes.c_ulong * 1)() ctl = (ctypes.c_ulong * 4)(16, 0, CTRL_MASK, 0) >>> # Ctrl+2 or Ctrl+@ (i.e. NUL) ... ret = ReadConsoleW(hin, buf, 10, pn, ctl); print() spam >>> buf[:] 'spam\x00-----' >>> # Ctrl+D ... ret = ReadConsoleW(hin, buf, 10, pn, ctl); print() spam >>> buf[:] 'spam\x04-----' >>> # Ctrl+[ ... ret = ReadConsoleW(hin, buf, 10, pn, ctl); print() spam >>> buf[:] 'spam\x1b-----' This could be used to implement Ctrl+D and Ctrl+L support in PyOS_Readline. Supporting Ctrl+L to work like GNU readline wouldn't be a trivial one-liner, but it's doable. It has to clear the screen and also write the input (except the Ctrl+L) back to the input buffer. > The main consequence of reading everything in character-at-a-time mode > is that we'd have to implement everything ourselves, and the line > editing you get *without* doing it yourself is somewhat nicer on Windows > than on Linux (it supports cursor movement, inserting characters, and > history). Line-input mode also supports F7 for a history popup window to select a previous command; Ctrl+F to search the screen text; text selection (e.g. shift+arrows or Ctrl+A); copy/paste via Ctrl+C and Ctrl+V (or Ctrl+Insert and Shift+Insert); and parameterized input aliases ($1-$9 and $* for parameters). https://technet.microsoft.com/en-us/library/mt427362 https://technet.microsoft.com/en-us/library/cc753867 >> "Bash on Ubuntu on windows" responds to CTRL+D just fine. I don't really >> know how it works, but it looks like it is based on the Windows terminal >> emulator. > > It runs inside it, but it's using the "Windows Subsystem for Linux", > which (I assume) reads character-at-a-time and feeds it to a Unix-like > terminal driver, (which Bash then has incidentally also put in > character-at-a-time mode by using readline - to see what you get on WSL > *without* doing this, try running "cat" under bash.exe) Let's take a look at how WSL modifies the console's global state. Here's a simple function to print the console's input and output modes and codepages, which we can call in the background to monitor the console state: def report(): hin = msvcrt.get_osfhandle(0) hout = msvcrt.get_osfhandle(1) modeIn = (ctypes.c_ulong * 1)() modeOut = (ctypes.c_ulong * 1)() kernel32.GetConsoleMode(hin, modeIn) kernel32.GetConsoleMode(hout, modeOut) cpIn = kernel32.GetConsoleCP() cpOut = kernel32.GetConsoleOutputCP() print('\nmodeIn=%x, modeOut=%x, cpIn=%d, cpOut=%d' % (modeIn[0], modeOut[0], cpIn, cpOut)) def monitor(): report() t = threading.Timer(10, monitor, ()) t.start() >>> monitor(); subprocess.call('bash.exe') modeIn=f7, modeOut=3, cpIn=437, cpOut=437 ... 
modeIn=2d8, modeOut=f, cpIn=65001, cpOut=65001 See the following page for a description of the mode flags: https://msdn.microsoft.com/en-us/library/ms686033 The output mode changed from 0x3 to 0xf, enabling DISABLE_NEWLINE_AUTO_RETURN (0x8) ENABLE_VIRTUAL_TERMINAL_PROCESSING (0x4) The input mode changed from 0xf7 to 0x2d8, enabling ENABLE_VIRTUAL_TERMINAL_INPUT (0x200) ENABLE_WINDOW_INPUT (0x8, probably for SIGWINCH) and disabling ENABLE_INSERT_MODE (0x20) ENABLE_ECHO_INPUT (0x4) ENABLE_LINE_INPUT (0x2) ENABLE_PROCESSED_INPUT (0x1) So you're correct that it's basically using a raw read, except it's also translating some input keys to VT100 sequences. If you Ctrl+Break out of WSL, don't plan to reuse the console for regular Windows console programs. You could reset the modes and codepages, but it'll simpler to just open a new console. Here's an example of the VT100 sequences for the arrow keys after breaking out of WSL: C:\>^[[A^[[B^[[C^[[D WSL also changes the input and output codepages to 65001 (UTF-8). It hasn't done anything to fix the console's broken support for non-ASCII input when using UTF-8. But instead of getting an empty read (i.e. EOF) like what we see in this case with the cooked read used by Windows Python, WSL's raw read simply strips out non-ASCII input. That's simply brilliant. /s From nathan12343 at gmail.com Tue Oct 4 16:52:27 2016 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Tue, 4 Oct 2016 15:52:27 -0500 Subject: [Python-ideas] Improve error message when missing 'self' in method definition Message-ID: Hi all, Recently pypy received a patch that improves the error message one gets when 'self' is missing in a method's signature: https://mail.python.org/pipermail/pypy-dev/2016-September/014678.html Here are the commits that implement the change in pypy: https://bitbucket.org/pypy/pypy/commits/all?search=branch(better-error-missing-self) I'm curious whether a similar improvement would also be received well in CPython. In particular, this guides one to the correct solution for a common programming mistake made by newcomers (and even not-so-newcomers). Here is a case study that found this was a common source of errors for newcomers: http://dl.acm.org/citation.cfm?id=2960327 -Nathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Tue Oct 4 17:31:42 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 4 Oct 2016 17:31:42 -0400 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: <7544aeeb-6126-5ab9-ecdd-a57d1d9ec6f3@gmail.com> On 2016-10-04 4:52 PM, Nathan Goldbaum wrote: > Hi all, > > Recently pypy received a patch that improves the error message one gets > when 'self' is missing in a method's signature: > > https://mail.python.org/pipermail/pypy-dev/2016-September/014678.html > > Here are the commits that implement the change in pypy: > > https://bitbucket.org/pypy/pypy/commits/all?search=branch(better-error-missing-self) > > I'm curious whether a similar improvement would also be received well in > CPython. In particular, this guides one to the correct solution for a > common programming mistake made by newcomers (and even not-so-newcomers). +1 on the idea. 
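For anyone who hasn't hit it, the mistake being targeted looks something like the sketch below; the exact wording pypy settled on is in the linked commits, so the hint described here is only a paraphrase:

    class Greeter:
        def greet(name):          # oops: 'self' was left out of the signature
            return "Hello, " + name

    Greeter().greet("world")
    # CPython currently raises:
    #   TypeError: greet() takes 1 positional argument but 2 were given
    # The pypy patch appends a hint to its version of this message,
    # along the lines of "did you forget 'self' in the definition?"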
Yury From python-ideas at shalmirane.com Wed Oct 5 00:09:40 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Tue, 4 Oct 2016 21:09:40 -0700 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <20161004160740.GF22471@ando.pearwood.info> References: <20161004160740.GF22471@ando.pearwood.info> Message-ID: <20161005040940.GA23968@kundert.designers-guide.com> On Wed, Oct 05, 2016 at 03:07:42AM +1100, Steven D'Aprano wrote: > > Extra newlines are cheap. Writing > The cost is paid in newlines *and* extra levels of indentation. Why isn't it the programmer that is writing the code the best person to decide what is best? -Ken From ethan at stoneleaf.us Wed Oct 5 00:11:52 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 04 Oct 2016 21:11:52 -0700 Subject: [Python-ideas] xfork [was Re: async objects] In-Reply-To: <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> Message-ID: <57F47D88.6050503@stoneleaf.us> On 10/04/2016 09:40 AM, Sven R. Kunze wrote: > As a result of past discussions, I wrote the module "xfork" > which basically does this "golang goroutine" stuff. It's just > a thin wrapper around "futures" but it allows to avoid that > what Ren? and Anthony objects about. Looks cool! Thanks! -- ~Ethan~ From ncoghlan at gmail.com Wed Oct 5 01:46:22 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 5 Oct 2016 15:46:22 +1000 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Message-ID: On 5 October 2016 at 02:15, Guido van Rossum wrote: > Honestly it feels like many things can go wrong with this API model, > esp. you haven't answered what should happen when a method of > SomeClass (either a synchronous one or an async one) calls > run_in_foreground() on something -- or, more likely, calls some > harmless-looking function that calls another harmless-looking function > that calls run_in_foreground(). At that point you have pre-emptive > scheduling back in play (or your coroutines may be blocked > unnecessarily) and I think you have nothing except a more complicated > API to work with threads. Yeah, that's the main reason I haven't gone beyond this as a toy idea - there are so many ways to get yourself in trouble if you don't already understand the internal details. > I think I am ready to offer a counterproposal where the event loop > runs in one thread and synchronous code runs in another thread and we > give the synchronous code a way to synchronously await a coroutine or > an asyncio.Future. This can be based on > asyncio.run_coroutine_threadsafe(), which takes a coroutine or an > asyncio.Future and returns a concurrent Future. (It also takes a loop, > and it assumes that loop runs in a different thread. I think it should > assert that.) 
Oh, that makes a lot more sense, as we'd end up with a situation where async code gets used in one of two ways: - asynchronous main thread (the typical way it gets used now) - synchronous thread with a linked asynchronous helper thread The key differences between the latter and a traditional thread pool is that there'd only be the *one* helper thread for any given synchronous thread, and as long as the parent thread keeps its hands off any shared data structures while coroutines are running, you can still rely on async/await to interleave access to data structures shared by the coroutines. > The main feature of my counterproposal as I see it is that async code > should not call back into synchronous code, IOW once you are writing > coroutines, you have to use the coroutine API for everything you do. > And if something doesn't have a coroutine API, you run it in a > background thread using loop.run_in_executor(). > > So either you buy into the async way of living and it's coroutines all > the way down from there, no looking back -- or you stay on the safe > side of the fence, and you interact with coroutines only using a very > limited "remote manipulator" API. The two don't mix any better than > that. +1 I considered suggesting that the "remote manipulator" API could be spelled "await expr", but after starting to write that idea up, realised it was likely a recipe for hard-to-debug problems when folks forget to add the "async" declaration to a coroutine definition. So that would instead suggest 2 module level functions in asyncio: * call_in_background(coroutine_or_callable, *args, **kwds): - creates the helper thread if it doesn't already exist, stores a reference in a thread local variable - schedules coroutines directly in the helper thread's event loop - schedules other callables in the helper thread's executor - returns an asyncio.Future instance - perhaps lets the EventLoopPolicy override this default behaviour? * wait_for_result: - blocking call that waits for asyncio.Future.result() to be ready Using "call_in_background" from a coroutine would be OK, but somewhat redundant (as if a coroutine is already running, you could just use the current thread's event loop instead). Using "wait_for_result" from a coroutine would be inappropriate, as with any other blocking call. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rene at stranden.com Wed Oct 5 02:49:41 2016 From: rene at stranden.com (Rene Nejsum) Date: Wed, 5 Oct 2016 08:49:41 +0200 Subject: [Python-ideas] async objects In-Reply-To: <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> Message-ID: <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> > On 04 Oct 2016, at 18:40, Sven R. Kunze wrote: > > On 04.10.2016 09:50, Stephen J. Turnbull wrote: >> As I understand the main point, Sven and Rene don't believe that [the >> kind of] async code [they want to write] should need any keywords; >> just start the event loop and invoke functions, and that somehow >> automatically DTRTs. > [reading my name second time] > > > I don't think that's actually what I wanted here. One simple keyword should have sufficed just like golang did. So, the developer gets a way to decide whether or not he needs it blocking or nonblocking **when using a function**. 
He doesn't need to decide it **when writing the function**.
I agree, that's why I proposed to put the async keyword in when creating the object, saying in this instance I want asynchronous communication with the object.
> You might wonder why this is relevant. DRY principle has been mentioned but there's more to it. Only the caller **can decide** whether it needs to wait or not. Why? Because, the caller works WITH the result of the called function (whatever results means to you). The caller is (what Nick probably would call) the orchestrator, as it has the knowledge about the relation and interaction between domain-specific function calls.
+1
> As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures" but it allows to avoid that what René and Anthony objects about.
I had a look at xfork, and really like it. It is implemented much like the lower level of PYWORKS and PYWORKS could build on xfork instead. I think that the "model" of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL). In the ideal case it should be up to the runtime implementation (CPython, PyPy, Jython, IronPython etc.) how the asynchronous behaviour is implemented (greenlets, threads, roll-your-own, etc.)
br /Rene
> > Cheers, > Sven > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/
From p.f.moore at gmail.com Wed Oct 5 03:31:49 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 5 Oct 2016 08:31:49 +0100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <20161005040940.GA23968@kundert.designers-guide.com> References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> Message-ID:
On 5 October 2016 at 05:09, Ken Kundert wrote: > On Wed, Oct 05, 2016 at 03:07:42AM +1100, Steven D'Aprano wrote: >> >> Extra newlines are cheap. Writing >> > > The cost is paid in newlines *and* extra levels of indentation. No extra indentation if you use "if not condition: continue" or refactor the condition into a custom iterable. Both of which have already been mentioned here as ways of achieving the desired result without a language change.
> Why isn't it the programmer that is writing the code the best person to decide > what is best?
Because the programmer writing the code isn't going to write and maintain the changes to the CPython/Jython/PyPy codebases, write the tests and documentation, support the questions that come from other users, etc...? More seriously, that argument could apply to *any* proposal. "Let the user decide whether to use the feature or not, and just add it". However, not all features get added precisely because someone has to make a cost/benefit judgement on any proposal and the people who do that are the CPython core devs. Discussion on this list is about thrashing out convincing arguments that will persuade the core devs - which is one of the reasons a lot of the core devs hang out here, to provide a sounding board on whether arguments are convincing or not. "Make the feature available and let the user decide if they want to use it" isn't a convincing argument. At best it could be a small part of a larger argument. It's countered by "does it make the language harder to teach having multiple ways of doing things?", "what about edge cases?"
(in this case, trailing elses have been mentioned), "is there a well-known and easy workaround?", "no other languages (apart from Perl) seem to have this feature", ... and those issues need to be addressed in a full proposal. Paul From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Oct 5 05:20:34 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 5 Oct 2016 18:20:34 +0900 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <20161005040940.GA23968@kundert.designers-guide.com> References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> Message-ID: <22516.50658.518426.23244@turnbull.sk.tsukuba.ac.jp> Ken Kundert writes: > Why isn't it the programmer that is writing the code the best > person to decide what is best? Aside from what Paul said, there's a reason why this proposal is unlikely to attract support from senior devs. Python language design and style guides take the position that most code is read far more often than it is written. Unless there is an overriding advantage, recognized by the senior developers, Python will protect the reader by preferring syntax that keeps each control flow line simple in preference to saving the writer keystrokes, and even levels of indentation. Eg, there's no question in my mind that for i in range(m): for j in range (n): for k in range (p): m_out(i, k) += m_in1(i, j) * m_in2(j, k) is easier to read[1] than for i in range(m) for j in range (n) for k in range (p): m_out(i, k) += m_in1(i, j) * m_in2(j, k) despite costing two lines and two levels of indentation. YMMV, of course, but I suspect most senior devs will disagree with you. Footnotes: [1] It's also less painstaking to fix the bug. From steve at pearwood.info Wed Oct 5 07:19:11 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 5 Oct 2016 22:19:11 +1100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <20161005040940.GA23968@kundert.designers-guide.com> References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> Message-ID: <20161005111911.GH22471@ando.pearwood.info> On Tue, Oct 04, 2016 at 09:09:40PM -0700, Ken Kundert wrote: > On Wed, Oct 05, 2016 at 03:07:42AM +1100, Steven D'Aprano wrote: > > > > Extra newlines are cheap. Writing > > > > The cost is paid in newlines *and* extra levels of indentation. You've quoted me out of context -- I did also refer to extra indentation being cheap. At the point that it isn't any more, it is a code smell and you (that's generic you, not just you personally) should think hard about how the design of your code. > Why isn't it the programmer that is writing the code the best person to decide > what is best? Have you *seen* the quality of code written by the average coder? And remember, fifty percent of coders are worse than that. I jest, but only a bit. For better or worse, of course every programmer can set their own style, within the constraints of the language. If they cannot bear the language contraints, they're free to use a different language, or design their own. Anyone can be "the best person to decide" for their own private language. All languages have their own style, of what is or isn't allowed, what's encouraged and what's discouraged, and their own idiomatic way of doing things. The syntax constraints of the language depend on the language designer, not the programmers who use it. 
For some languages, those constraints are set by those who are appointed to sit on a standards board, usually driven by the corporations with the deepest pockets. Python, it is Guido and the core developers who set the boundaries of what coding styles can work in Python, and while the community can influence that, it doesn't control it. It isn't a wild free-for-all where every programmer is "the best person to decide". Some people might think that moving closer towards a Perl-ish one-liner culture by allowing (say): for x in seq for y in items if cond: block makes Python better ("saves some lines! saves some indents!"), but to those who like the discipline and structure of Python's existing loop syntax, this will make Python significantly worse. No decision can please everybody. -- Steve From rene at stranden.com Wed Oct 5 07:29:36 2016 From: rene at stranden.com (Rene Nejsum) Date: Wed, 5 Oct 2016 13:29:36 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Message-ID: > On 04 Oct 2016, at 18:15, Guido van Rossum wrote: > > On Tue, Oct 4, 2016 at 4:30 AM, Nick Coghlan wrote: >> class SomeClass(object): >> def some_sync_method(self): >> return 42 >> async def some_async_method(self): >> await asyncio.sleep(3) >> return 42 >> >> o = auto_schedule(SomeClass()) # Indicating that the user wants an >> async version of the object >> r1 = o.some_sync_method() # Automatically run in a background thread >> r2 = o.some_async_method() # Automatically scheduled as a coroutine >> print(run_in_foreground(r1)) >> print(run_in_foreground(r2)) > > So maybe r1 and r2 are just concurrent.futures.Futures, and > run_in_foreground(r) wraps r.result(). And auto_schedule() is a proxy > that turns all method calls into async calls with a (concurrent) > Future to wait for the result. There's an event loop somewhere that > sits idle except when you call run_in_foreground() on somethong; it's > only used for the async methods, since the sync methods run in a > background thread (pool, I hope). Or perhaps r2 is an asyncio.Future > and run_in_foreground(r2) wraps loop.run_until_complete(r2). I suppose > the event loop should also be activated when waiting for r1, so maybe > r1 should be an asyncio Future that wraps a concurrent Future (using > asyncio.wrap_future(), which can do just that thing). > > Honestly it feels like many things can go wrong with this API model, > esp. you haven't answered what should happen when a method of > SomeClass (either a synchronous one or an async one) calls > run_in_foreground() on something -- or, more likely, calls some > harmless-looking function that calls another harmless-looking function > that calls run_in_foreground(). At that point you have pre-emptive > scheduling back in play (or your coroutines may be blocked > unnecessarily) and I think you have nothing except a more complicated > API to work with threads. I am a little out on deep water here, but I think that if an object instance was guaranteed - by Python runtime - to run in one coroutine/thread and only the message passing of method call and return values was allowed to pass between coroutine/thread context, then at least all local instance variable reference would be fine? 
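To make that idea concrete, here is a minimal sketch of the confinement I mean in plain Python (purely illustrative; this is not the actual pyworks or xfork API): every method call on a proxy becomes a message on a queue, a single worker thread owns the wrapped object, and the caller gets a Future back to wait on if and when it chooses.

    import queue
    import threading
    from concurrent.futures import Future

    class ActorProxy:
        # The wrapped object lives on exactly one worker thread; callers only
        # exchange messages (method name, arguments) and Futures with it.
        def __init__(self, obj):
            self._obj = obj
            self._mailbox = queue.Queue()
            threading.Thread(target=self._run, daemon=True).start()

        def _run(self):
            while True:
                fut, name, args, kwargs = self._mailbox.get()
                try:
                    fut.set_result(getattr(self._obj, name)(*args, **kwargs))
                except Exception as exc:
                    fut.set_exception(exc)

        def __getattr__(self, name):
            def call(*args, **kwargs):
                fut = Future()
                self._mailbox.put((fut, name, args, kwargs))
                return fut
            return call

Since only that one thread ever touches the instance, its attribute accesses need no locking, and the caller decides whether to block by calling .result() on the returned Future.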
> I think I am ready to offer a counterproposal where the event loop > runs in one thread and synchronous code runs in another thread and we > give the synchronous code a way to synchronously await a coroutine or > an asyncio.Future. This can be based on > asyncio.run_coroutine_threadsafe(), which takes a coroutine or an > asyncio.Future and returns a concurrent Future. (It also takes a loop, > and it assumes that loop runs in a different thread. I think it should > assert that.) > > The main feature of my counterproposal as I see it is that async code > should not call back into synchronous code, IOW once you are writing > coroutines, you have to use the coroutine API for everything you do. > And if something doesn't have a coroutine API, you run it in a > background thread using loop.run_in_executor(). > > So either you buy into the async way of living and it's coroutines all > the way down from there, no looking back -- or you stay on the safe > side of the fence, and you interact with coroutines only using a very > limited "remote manipulator" API. The two don't mix any better than > that. Maybe not, but I am hoping for something better :-) > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From sylvain.desodt at gmail.com Wed Oct 5 08:16:45 2016 From: sylvain.desodt at gmail.com (Sylvain Desodt) Date: Wed, 5 Oct 2016 14:16:45 +0200 Subject: [Python-ideas] Improve error message when missing 'self' in method definition Message-ID: Hi all, A bit of shameless self-promotion but in case anyone interested, a while ago, I had started to work on a project to improve error message. In case anyone's interested, you can found everything at: https://github.com/SylvainDe/DidYouMean-Python . It can be invoked in different ways, one of them being a hook. For instance, you'd get something like: >>> import didyoumean_api >>> didyoumean_api.didyoumean_enablehook() >>> math.pi Traceback (most recent call last): File "", line 1, in NameError: name 'math' is not defined*. Did you mean to import math first?* There is still a lot to be done (and the main thing would be to make it pip installable) but it may be useful if the improved error messages do not make it to the CPython interpreter. Regards Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Oct 5 09:11:50 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 5 Oct 2016 22:11:50 +0900 Subject: [Python-ideas] async objects In-Reply-To: <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: <22516.64534.772885.995952@turnbull.sk.tsukuba.ac.jp> Rene Nejsum writes: > On 04 Oct 2016, at 18:40, Sven R. Kunze wrote: > > I don't think that's actually what I wanted here. One simple > > keyword should have sufficed just like golang did. So, the > > developer gets a way to decide whether or not he needs it > > blocking or nonblocking **when using a function**. He doesn't > > need to decide it **when writing the function**. 
> > I agree, I don't believe it's true, but suppose it is. *You don't need syntactic support* (a keyword) for it. Do you? It can all be done conveniently and readably with functions, as you have proved yourself with pyworks and Sven has with xfork, not to forget greenlets and gevent. No? You could argue that coroutines don't require syntax (keywords) either, but some Very Smart People disagree. I don't understand PEP 492's implementation well, but pretty clearly there are blockers to allowing ordinary __next__ methods doing async calls. There's also the issue mentioned in PEP 3153 that generators don't fit the notion of (self-actuated) producers "pushing" values into other code; they're really about having values pulled out of them. So PEPs 3156 and 492 are actual extensions to Python's capabilities for compact, readable expression of [a specific idiom/model of] asynchronous execution. They aren't intended for all possible models, just to help with one that is important to a fairly large class of Python programmers. > I think that the model of doing async should be defined in the > Python language/runtime (like in Go, Erlang, ABCL) . Why be restrictive? Python already supports many models of concurrency, pretty much filling the space (parallel execution vs. coroutines, shared-state vs. isolated, cooperative vs. preemptive, perhaps there are other dimensions). Why go backward from where we already are? From srkunze at mail.de Wed Oct 5 09:27:16 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 5 Oct 2016 15:27:16 +0200 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <22516.50658.518426.23244@turnbull.sk.tsukuba.ac.jp> References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> <22516.50658.518426.23244@turnbull.sk.tsukuba.ac.jp> Message-ID: <7055ab3b-f215-6b8e-cf5a-95de19c0fc3b@mail.de> On 05.10.2016 11:20, Stephen J. Turnbull wrote: > Eg, there's no question in my mind that > > for i in range(m): > for j in range (n): > for k in range (p): > m_out(i, k) += m_in1(i, j) * m_in2(j, k) > > is easier to read[1] than > > for i in range(m) for j in range (n) for k in range (p): > m_out(i, k) += m_in1(i, j) * m_in2(j, k) > > despite costing two lines and two levels of indentation. YMMV, of > course, but I suspect most senior devs will disagree with you. I agree with you on this when it comes to long-living production code. For small scripts this is still useful. Not everybody writes huge programs, which needs to adhere to style guides and QA. Cheers, Sven From p.f.moore at gmail.com Wed Oct 5 09:40:39 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 5 Oct 2016 14:40:39 +0100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: <7055ab3b-f215-6b8e-cf5a-95de19c0fc3b@mail.de> References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> <22516.50658.518426.23244@turnbull.sk.tsukuba.ac.jp> <7055ab3b-f215-6b8e-cf5a-95de19c0fc3b@mail.de> Message-ID: On 5 October 2016 at 14:27, Sven R. Kunze wrote: > For small scripts this is still useful. Not everybody writes huge programs, > which needs to adhere to style guides and QA. Sure. But convenience in small scripts and the REPL typically isn't a good enough argument to justify a language change. It's (to an extent) a point in favour of the proposal, but nobody's debating that. The problem is that we aren't seeing any *other* arguments in favour. 
And specifically not ones that provide benefits (or at least no disadvantages - "you don't need to use it" isn't enough, someone will be bound to try to and it'll have to be dealt with, hopefully in code review but maybe in maintenance) to large-scale production code, which is probably the vast majority of Python usage[1]. Let's take "it helps interactive use and quick scripts" as a given, and move on. Any other benefits? Readability has been demonstrated as subjective, so let's skip that. Paul [1] Although these days, data analysis / interactive exploration may be a growing proportion... From ncoghlan at gmail.com Wed Oct 5 12:06:21 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Oct 2016 02:06:21 +1000 Subject: [Python-ideas] async objects In-Reply-To: <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: On 5 October 2016 at 16:49, Rene Nejsum wrote: >> On 04 Oct 2016, at 18:40, Sven R. Kunze wrote: >> I don't think that's actually what I wanted here. One simple keyword should have sufficed just like golang did. So, the developer gets a way to decide whether or not he needs it blocking or nonblocking **when using a function**. He doesn't need to decide it **when writing the function**. > > I agree, that?s why i proposed to put the async keyword in when creating the object, saying in this instance I want asynchronous communication with the object. OK, I think there may be a piece of foundational knowledge regarding runtime design that's contributing to the confusion here. Python's core runtime model is the C runtime model: threads (with a local stack and access to a global process heap) and processes (which contain a heap and one or more threads). Anything else we do (whether it's generators, coroutines, or some other form of paused execution like callback management) gets layered on top of that runtime model. When folks ask questions like "Why can't Python be more like Go?", "Why can't Python be more like Erlang?", or "Why can't Python be more like Rust?" and get a negative response, it's usually because there's an inherent conflict between the C runtime model and whatever piece of the Go/Erlang/Rust runtime model we want to steal. So the "async" keyword in "async def", "async for" and "async with" is essentially a marker saying "This is not a C-like runtime concept anymore!" (The closest C-ish equivalent I'm aware of would be Apple's Grand Central Dispatch in Objective-C and that shows many of the async/await characteristics also seen in Python and C#: https://www.raywenderlich.com/60749/grand-central-dispatch-in-depth-part-1 ) Go (as with Erlang before it) avoided these problems by not providing C-equivalent functions in the first place. Accordingly, *every* normal function defined in Go can also be used as a goroutine, rather than needing to be a distinct type - their special case is defining functions that interoperate with external C libraries. Python (along with other languages built on the C runtime model like C# and Objective-C) doesn't have that luxury - we need to distinguish coroutines from regular functions, since we can't just handle them according to the underlying C runtime model any more. 
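To make the distinction concrete, a small illustration using nothing but stock asyncio:

    import asyncio

    def sync_answer():
        return 42                 # runs to completion on the calling thread, C-style

    async def async_answer():
        await asyncio.sleep(1)    # a suspension point the C runtime has no concept of
        return 42

    print(sync_answer())          # 42
    coro = async_answer()         # executes nothing yet: just builds a coroutine object
    print(asyncio.get_event_loop().run_until_complete(coro))  # something must drive it

Calling the async version doesn't run its body at all; some scheduler has to resume it at each await point, and that scheduler is exactly what the traditional C runtime model doesn't provide.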
Guido's idea of a shadow thread to let synchronous threads run coroutines without needing to actually run a foreground event loop should provide a manageable way of getting the two runtime models (traditional C and asynchronous coroutines) to play nicely together in a single application, and has the virtue of being something folks can readily experiment with for themselves before we commit to anything specific in the standard library (since all the building blocks of thread local storage, event loop management, and inter-thread message passing primitives are already available). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Oct 5 12:26:21 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Oct 2016 02:26:21 +1000 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> <22516.50658.518426.23244@turnbull.sk.tsukuba.ac.jp> <7055ab3b-f215-6b8e-cf5a-95de19c0fc3b@mail.de> Message-ID: On 5 October 2016 at 23:40, Paul Moore wrote: > On 5 October 2016 at 14:27, Sven R. Kunze wrote: >> For small scripts this is still useful. Not everybody writes huge programs, >> which needs to adhere to style guides and QA. > > Sure. But convenience in small scripts and the REPL typically isn't a > good enough argument to justify a language change. A useful rule of thumb: if a proposed syntax change would be accompanied by a PEP 8 addition that says "Never use this in code you expect anyone else to read or have to maintain", it's not going to be approved. If a change requires a PEP 8 update *at all*, that's a significant mark against it, since it provides clear evidence that the addition is increasing the cognitive burden of the language by requiring devs to make syntactic decisions that aren't related to how they're modeling the particular problem they're trying to solve. For purely local use, folks also have a lot more freedom to adopt Python supersets that may limit code shareability, but make what they write more amenable to them personally without losing access to the rest of the Python ecosystem. Hylang shows that that freedom goes at least as far as "I'd really prefer to be writing in LISP". Project Jupyter's "!" notation and xon.sh both show that it's possible to integrate easier access to the system shell into a Python-like language. Compared to those, locally modifying the token stream to inject ": INDENT" pairs when the if and for keywords are encountered between an opening "for" keyword and a closing ":" keyword would be a relatively straightforward change that only impacted folks that decided they preferred that particular flavour of Abbreviated Python to the regular version. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Wed Oct 5 13:13:52 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 5 Oct 2016 18:13:52 +0100 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> <22516.50658.518426.23244@turnbull.sk.tsukuba.ac.jp> <7055ab3b-f215-6b8e-cf5a-95de19c0fc3b@mail.de> Message-ID: On 5 October 2016 at 17:26, Nick Coghlan wrote: > Compared to those, locally modifying the token stream to inject ": > INDENT" pairs when the if and for keywords are encountered between an > opening "for" keyword and a closing ":" keyword would be a relatively > straightforward change that only impacted folks that decided they > preferred that particular flavour of Abbreviated Python to the regular > version. It's also worth noting that the obvious response "but I don't want to have to run a preprocessor against my code" is another indication that this isn't solving a significant enough problem to warrant a language change. Again, this isn't a hard and fast rule, but it is a useful rule of thumb - how much effort are you willing to go to to get this feature without it being built in? That's one of the reasons "it should be made into a module on PyPI" is a useful counter to proposals for new stdlib functions. It's also worth looking at the cases where things get added despite not going via that route - sometimes "being built in" is an important benefit of itself. But typically that's because people are encouraged to use built in facilities, so guiding beginners (or not-so-beginners) into good practice is important. In this case, it's far from clear that the feature is actually good practice. Paul From lisaroach14 at gmail.com Wed Oct 5 13:17:35 2016 From: lisaroach14 at gmail.com (Lisa Roach) Date: Wed, 5 Oct 2016 10:17:35 -0700 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: +1 I've definitely seen a lot of new users make this error, an improved message could go a long way. On Tue, Oct 4, 2016 at 1:52 PM, Nathan Goldbaum wrote: > Hi all, > > Recently pypy received a patch that improves the error message one gets > when 'self' is missing in a method's signature: > > https://mail.python.org/pipermail/pypy-dev/2016-September/014678.html > > Here are the commits that implement the change in pypy: > > https://bitbucket.org/pypy/pypy/commits/all?search= > branch(better-error-missing-self) > > I'm curious whether a similar improvement would also be received well in > CPython. In particular, this guides one to the correct solution for a > common programming mistake made by newcomers (and even not-so-newcomers). > > Here is a case study that found this was a common source of errors for > newcomers: > > http://dl.acm.org/citation.cfm?id=2960327 > > -Nathan > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Wed Oct 5 13:29:43 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 5 Oct 2016 18:29:43 +0100 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: On 5 October 2016 at 18:17, Lisa Roach wrote: > +1 > > I've definitely seen a lot of new users make this error, an improved message > could go a long way. I'm not a new user by any means, and I still regularly make this mistake. Because I've got the experience, I recognise the error when I see it, but that's not much help for people who haven't already made the mistake hundreds of times :-) +1 on improving the message. Paul From stephanh42 at gmail.com Wed Oct 5 14:09:17 2016 From: stephanh42 at gmail.com (Stephan Houben) Date: Wed, 5 Oct 2016 20:09:17 +0200 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: +? Another long-time user here who occasionally still makes this mistake. Stephan 2016-10-05 19:29 GMT+02:00 Paul Moore : > On 5 October 2016 at 18:17, Lisa Roach wrote: > > +1 > > > > I've definitely seen a lot of new users make this error, an improved > message > > could go a long way. > > I'm not a new user by any means, and I still regularly make this > mistake. Because I've got the experience, I recognise the error when I > see it, but that's not much help for people who haven't already made > the mistake hundreds of times :-) > > +1 on improving the message. > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From desmoulinmichel at gmail.com Wed Oct 5 14:23:21 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Wed, 5 Oct 2016 20:23:21 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> Message-ID: <35bdc9e7-9af6-bf7c-fe59-4d218b969527@gmail.com> On the other hand, await / async is a fantastic interface to unify all concurrent paradigms and asyncio already provide a bridge with threads and subprocess. So it kinda make sense. Le 04/10/2016 ? 18:40, Sven R. Kunze a ?crit : > On 04.10.2016 13:30, Nick Coghlan wrote: >> What it *doesn't* do, and what you need greenlet for, is making that >> common interface look like you're using plain old synchronous C >> threads. >> >> If folks really want to do that, that's fine - they just need to add >> gevent/greenlet as a dependency, just as the folks that don't like the >> visibly object-oriented nature of the default unittest and logging >> APIs will often opt for third party alternative APIs that share some >> of the same underlying infrastructure. > > Maybe, this is all a big misunderstanding. > > asyncio is incompatible with regular execution flow and it's **always > blocking**. However, asyncio is perceived by some of us (including me) > as a shiny alternative to processes and threads but really isn't. I > remember doing this survey on python-ideas (results here: > https://srkunze.blogspot.de/2016/02/concurrency-in-python.html) but I > get the feeling that we still miss something. 
> > My impression is that asyncio shall be used for something completely > different than dropping off things into a background worker. But looking > at the cooking example given by Steve Dower (cf. blog post), at other > explanations, at examples in the PEPs, it just seems to me that his > analogy could have been made with threads and processes as well. > > At its core (the ASYNC part), asyncio is quite similar to threads and > processes. But its IO-part seem to drive some (design) decisions that > don't go well with the existing mental model of many developers. Even > PEP-reviewers are fooled by simple asyncio examples. Why? Because they > forget to spawn an eventloop. "async def and await" are just useless > without an eventloop. And maybe that's what's people frustration is > about. They want the ASYNC part without worrying about the IO part. > > Furthermore, adding 2 (TWO) new keywords to a language has such an > immense impact. Especially when those people are told "the barrier for > new keywords is quite high!!". So, these new keywords must mean something. > > > I think what would help here are concrete answers to: > > 0) Is asyncio a niche feature only be used for better IO? > 1) What is the right way of integrating asyncio into existing code? > 2) How do we intend to solve the DRY-principle issue? > > If the answer is "don't use asyncio", that's a fine result but honestly > I think it would be just insane to assume that we got all these > features, all this work and all those duplicated functions all for > nothing. I can't believe that. So, I am still looking for a reasonable > use-case of asyncio in our environment. > > Cheers, > Sven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From desmoulinmichel at gmail.com Wed Oct 5 14:27:46 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Wed, 5 Oct 2016 20:27:46 +0200 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: +1. Python does need better error messages. This and the recent new import exception will really help. Will feature freeze prevent this to get into 3.6 if some champion it? I also really like https://github.com/SylvainDe/DidYouMean-Python and as a trainer, will use it in my next training sessions. Le 05/10/2016 ? 20:09, Stephan Houben a ?crit : > +? > > Another long-time user here who occasionally still makes this mistake. > > Stephan > > 2016-10-05 19:29 GMT+02:00 Paul Moore >: > > On 5 October 2016 at 18:17, Lisa Roach > wrote: > > +1 > > > > I've definitely seen a lot of new users make this error, an improved message > > could go a long way. > > I'm not a new user by any means, and I still regularly make this > mistake. Because I've got the experience, I recognise the error when I > see it, but that's not much help for people who haven't already made > the mistake hundreds of times :-) > > +1 on improving the message. 
> Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From rosuav at gmail.com Wed Oct 5 14:34:12 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 6 Oct 2016 05:34:12 +1100 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: On Thu, Oct 6, 2016 at 5:27 AM, Michel Desmoulin wrote: > +1. Python does need better error messages. This and the recent new import > exception will really help. > > Will feature freeze prevent this to get into 3.6 if some champion it? > Given that it's not changing semantics at all, just adding info/hints to an error message, it could well be added in a point release. +1 on any feature that helps people to debug code. This doesn't look overly spammy or anything, and it's easy for someone coming from C++ to forget to include that key parameter. ChrisA From nathan12343 at gmail.com Wed Oct 5 14:50:44 2016 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 5 Oct 2016 13:50:44 -0500 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: On Wed, Oct 5, 2016 at 1:27 PM, Michel Desmoulin wrote: > +1. Python does need better error messages. This and the recent new import > exception will really help. > > Will feature freeze prevent this to get into 3.6 if some champion it? > Speaking of, I'm not much of a C hacker, and messing with CPython internals is a little daunting. If anyone wants to take this on, you have my blessing. I also may take a shot at implementing this idea in the next couple weeks when I have some time. > > I also really like https://github.com/SylvainDe/DidYouMean-Python and as > a trainer, will use it in my next training sessions. > > Le 05/10/2016 ? 20:09, Stephan Houben a ?crit : > >> +? >> >> Another long-time user here who occasionally still makes this mistake. >> >> Stephan >> >> 2016-10-05 19:29 GMT+02:00 Paul Moore > >: >> >> On 5 October 2016 at 18:17, Lisa Roach > > wrote: >> > +1 >> > >> > I've definitely seen a lot of new users make this error, an >> improved message >> > could go a long way. >> >> I'm not a new user by any means, and I still regularly make this >> mistake. Because I've got the experience, I recognise the error when I >> see it, but that's not much help for people who haven't already made >> the mistake hundreds of times :-) >> >> +1 on improving the message. 
>> Paul >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Oct 5 14:55:35 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 5 Oct 2016 14:55:35 -0400 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: On 2016-10-05 2:50 PM, Nathan Goldbaum wrote: > On Wed, Oct 5, 2016 at 1:27 PM, Michel Desmoulin > wrote: > >> +1. Python does need better error messages. This and the recent new import >> exception will really help. >> >> Will feature freeze prevent this to get into 3.6 if some champion it? >> > Speaking of, I'm not much of a C hacker, and messing with CPython internals > is a little daunting. If anyone wants to take this on, you have my > blessing. I also may take a shot at implementing this idea in the next > couple weeks when I have some time. It would help if you could create an issue and write exhaustive unittests (or at least specifying how exactly the patch should work for all corner cases). Someone with the knowledge of CPython internals will later add the missing C code to the patch. Yury From levkivskyi at gmail.com Wed Oct 5 15:02:50 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 5 Oct 2016 21:02:50 +0200 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: On 5 October 2016 at 20:55, Yury Selivanov wrote: > > Speaking of, I'm not much of a C hacker, and messing with CPython internals >> is a little daunting. If anyone wants to take this on, you have my >> blessing. I also may take a shot at implementing this idea in the next >> couple weeks when I have some time. >> > > It would help if you could create an issue and write exhaustive unittests > (or at least specifying how exactly the patch should work for all corner > cases). Someone with the knowledge of CPython internals will later add the > missing C code to the patch. > > Yury > > I agree with Yury here. There are corner cases (like what to do with classmethods etc). If behaviour for all of them are specified, it would be quite straightforward to implement this. -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Oct 5 15:03:08 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Wed, 5 Oct 2016 21:03:08 +0200 Subject: [Python-ideas] async objects In-Reply-To: <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: On 05.10.2016 08:49, Rene Nejsum wrote: > >> As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures" but it allows avoiding what René and Anthony object to. > I had a look at xfork, and really like it. It is implemented much like the lower level of PYWORKS and PYWORKS could build on xfork instead. Thanks. :) > I think that the "model" of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL). In the ideal case it should be up to the runtime implementation (CPython, PyPy, Jython, IronPython etc.) how the asynchronous behaviour is implemented (greenlets, threads, roll-your-own, etc.) That's the way I see it as well. The Python language is extremely high-level. So, I guess in most cases, most people would just use the default implementation. Cheers, Sven From srkunze at mail.de Wed Oct 5 15:06:17 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 5 Oct 2016 21:06:17 +0200 Subject: [Python-ideas] async objects In-Reply-To: <35bdc9e7-9af6-bf7c-fe59-4d218b969527@gmail.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <35bdc9e7-9af6-bf7c-fe59-4d218b969527@gmail.com> Message-ID: <382eb6b5-02a5-951c-fd80-0bdc57b6c400@mail.de> On 05.10.2016 20:23, Michel Desmoulin wrote: > On the other hand, await / async is a fantastic interface to unify all > concurrent paradigms and asyncio already provide a bridge with threads > and subprocess. So it kinda make sense. Almost, if it didn't require duplicate pieces of code. But maybe we are wrong and there won't be any duplication. Cheers, Sven From srkunze at mail.de Wed Oct 5 15:06:51 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 5 Oct 2016 21:06:51 +0200 Subject: [Python-ideas] xfork [was Re: async objects] In-Reply-To: <57F47D88.6050503@stoneleaf.us> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <57F47D88.6050503@stoneleaf.us> Message-ID: <92eb7004-190a-8007-51d8-9a5bfe63c227@mail.de> On 05.10.2016 06:11, Ethan Furman wrote: > On 10/04/2016 09:40 AM, Sven R. Kunze wrote: > >> As a result of past discussions, I wrote the module "xfork" >> which basically does this "golang goroutine" stuff. It's just >> a thin wrapper around "futures" but it allows avoiding >> what René and Anthony object to. > > Looks cool! Thanks! You're welcome. :) Cheers, Sven From elazarg at gmail.com Wed Oct 5 15:15:08 2016 From: elazarg at gmail.com (אלעזר) Date: Wed, 05 Oct 2016 19:15:08 +0000 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: Isn't it possible to implement it as a pure Python exception hook?
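Something along these lines, say (a rough sketch only; a real hook would inspect the traceback to confirm the failing call was a method on an instance, rather than just matching the message text):

    import sys

    def missing_self_hook(exc_type, exc, tb):
        sys.__excepthook__(exc_type, exc, tb)
        if exc_type is TypeError and "positional argument" in str(exc):
            print("Did you forget 'self' in the method definition?",
                  file=sys.stderr)

    sys.excepthook = missing_self_hook

It would only help for uncaught exceptions, though, whereas the PyPy change improves the TypeError message itself at the point it is raised.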
On Wed, Oct 5, 2016 at 10:04 PM Ivan Levkivskyi wrote: > > On 5 October 2016 at 20:55, Yury Selivanov > wrote: > > > Speaking of, I'm not much of a C hacker, and messing with CPython internals > is a little daunting. If anyone wants to take this on, you have my > blessing. I also may take a shot at implementing this idea in the next > couple weeks when I have some time. > > > It would help if you could create an issue and write exhaustive unittests > (or at least specifying how exactly the patch should work for all corner > cases). Someone with the knowledge of CPython internals will later add the > missing C code to the patch. > > Yury > > > I agree with Yury here. There are corner cases (like what to do with > classmethods etc). If behaviour for all of them are specified, it would be > quite straightforward to implement this. > > -- > Ivan > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Oct 5 15:20:10 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 5 Oct 2016 21:20:10 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: On 05.10.2016 18:06, Nick Coghlan wrote: > [runtime matters] I think I understand your point. I also hope that others and me could provide you with our perspective. We see Python not as a C-like runtime but as an abstract modelling language. I know that it's different from the point of view of CPython internals, however from the outside Python suggests to be much more than a simple wrapper around C. Just two different perspectives. Unfortunately, your runtime explanations still don't address the DRY issue. :-/ > Guido's idea of a shadow thread to let synchronous threads run > coroutines without needing to actually run a foreground event loop > should provide a manageable way of getting the two runtime models > (traditional C and asynchronous coroutines) to play nicely together in > a single application, and has the virtue of being something folks can > readily experiment with for themselves before we commit to anything > specific in the standard library (since all the building blocks of > thread local storage, event loop management, and inter-thread message > passing primitives are already available). I needed to think about this further when Guido mentioned it. But I like it now. If you check https://github.com/srkunze/fork/tree/asyncio , I already started working on integrating asyncio into xfork at long time ago. But I still couldn't wrap my mind around it and it stalled. But IIRC, I would have implemented a shadow thread solution as well. So, if his idea goes into the stdlib first, I welcome it even more as it would do the heavy lifting for me. xfork would then be just a common interface to threads, processes and coroutines. 
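For the coroutine side, the shadow thread part really is only a few lines on top of what the stdlib already ships. A sketch of the idea (not what xfork does today):

    import asyncio
    import threading

    _loop = asyncio.new_event_loop()
    threading.Thread(target=_loop.run_forever, daemon=True).start()

    def run_in_foreground(coro):
        # Block the calling (synchronous) thread until the coroutine,
        # driven by the background event loop, has produced its result.
        return asyncio.run_coroutine_threadsafe(coro, _loop).result()

    async def answer():
        await asyncio.sleep(0.1)
        return 42

    print(run_in_foreground(answer()))  # plain synchronous code, no visible event loop

Everything here is existing machinery (new_event_loop, run_coroutine_threadsafe, concurrent.futures); the open question is just what the blessed spelling of it should be.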
Cheers, Sven From ethan at stoneleaf.us Wed Oct 5 15:51:47 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 05 Oct 2016 12:51:47 -0700 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: <57F559D3.7090901@stoneleaf.us> On 10/05/2016 12:20 PM, Sven R. Kunze wrote: > On 05.10.2016 18:06, Nick Coghlan wrote: >> Guido's idea of a shadow thread to let synchronous threads run >> coroutines without needing to actually run a foreground event >> loop should provide a manageable way of getting the two runtime >> models (traditional C and asynchronous coroutines) to play >> nicely together in a single application, and has the virtue of >> being something folks can readily experiment with for themselves >> before we commit to anything specific in the standard library >> (since all the building blocks of thread local storage, event >> loop management, and inter-thread message passing primitives are >> already available). > > I needed to think about this further when Guido mentioned it. But > I like it now. > > If you check https://github.com/srkunze/fork/tree/asyncio , I > already started working on integrating asyncio into xfork at long > time ago. But I still couldn't wrap my mind around it and it > stalled. But IIRC, I would have implemented a shadow thread > solution as well. So, if his idea goes into the stdlib first, I > welcome it even more as it would do the heavy lifting for me. xfork > would then be just a common interface to threads, processes and > coroutines. At this point I'm willing to bet that you (Sven) are closest to actually having a shadow thread thingy that actually works. Maybe some other asyncio folks would be willing to help you develop it? -- ~Ethan~ From rene at stranden.com Wed Oct 5 16:28:02 2016 From: rene at stranden.com (Rene Nejsum) Date: Wed, 5 Oct 2016 22:28:02 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> > On 05 Oct 2016, at 21:20, Sven R. Kunze wrote: > > On 05.10.2016 18:06, Nick Coghlan wrote: >> [runtime matters] > > I think I understand your point. > > I also hope that others and me could provide you with our perspective. We see Python not as a C-like runtime but as an abstract modelling language. I know that it's different from the point of view of CPython internals, however from the outside Python suggests to be much more than a simple wrapper around C. Just two different perspectives. Excellent point. For me CPython, Jython, IronPython, PyPy are the same (99.9%) and the important part is Python the language. For a long time I tested PYWORKS against all implementations and was happy that it ran on all. Clearly, for others CPython (incl. runtime and C-bindings) is the reality and the others are far from the same, especially because of the missing C-integration. But are the runtimes for Python and Erlang that fundamentally different? Is it Python's tight integration with C that is the big difference?
When I first read about the async idea, I initially expected that it would be some stackless like additions to Python. My wish for Python was an addition to the language the allowed an easy an elegant concurrent model on the language level. Ideally a Python program with 1000 async objects parsing a 10TB XML in-memory file, should run twice as fast on a 8-core CPU, compared to a 4-core ditto. > Unfortunately, your runtime explanations still don't address the DRY issue. :-/ > >> Guido's idea of a shadow thread to let synchronous threads run >> coroutines without needing to actually run a foreground event loop >> should provide a manageable way of getting the two runtime models >> (traditional C and asynchronous coroutines) to play nicely together in >> a single application, and has the virtue of being something folks can >> readily experiment with for themselves before we commit to anything >> specific in the standard library (since all the building blocks of >> thread local storage, event loop management, and inter-thread message >> passing primitives are already available). > > I needed to think about this further when Guido mentioned it. But I like it now. > > If you check https://github.com/srkunze/fork/tree/asyncio , I already started working on integrating asyncio into xfork at long time ago. But I still couldn't wrap my mind around it and it stalled. But IIRC, I would have implemented a shadow thread solution as well. So, if his idea goes into the stdlib first, I welcome it even more as it would do the heavy lifting for me. xfork would then be just a common interface to threads, processes and coroutines. xfork (as pyworks) implements a proxy object, which ?almost? behaves like the real object, but it is still a proxy. If fork (or spawn, chan, async, whatever.) was a part of the language it would be more clean. br /Rene > > Cheers, > Sven > From p.f.moore at gmail.com Wed Oct 5 17:02:52 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 5 Oct 2016 22:02:52 +0100 Subject: [Python-ideas] async objects In-Reply-To: <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> Message-ID: On 5 October 2016 at 21:28, Rene Nejsum wrote: > But, are the runtimes for Python and Erlang that fundamentally different? Is it Python?s tight integration with C that is the big difference? I don't know *that* much about Erlang, but Python's model is that of a single shared address space with (potentially multiple) threads of code running, having access to that address space. Erlang's model is that of multiple threads of execution (processes) that are isolated from each other (they have independent address spaces). That's a pretty fundamental difference, and gets right to the heart of why async is fundamentally different in the two languages. It also shows in Erlang's C FFI, which as I understand it is to have the C code isolated in a separate "process", and the user's program communicating with it through channels. As far as I can see, that's a direct consequence of the fact that you couldn't safely expect to call a C function (with its direct access to the whole address space) direct from an Erlang process. 
Python's model is very similar to C (and Java, and C#/.net, and many other "traditional" languages [1]). That's not "to make it easier to call C functions", it's just because it was a familiar and obvious model to use, known to work well, when Python was first developed. The fact that it made calling C from Python easy was a side effect - one that helped make Python as useful and popular as it is today, but nevertheless simply a side effect of the model. Paul [1] And actual computer hardware, which isn't a coincidence :-) From greg.ewing at canterbury.ac.nz Wed Oct 5 17:34:24 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 06 Oct 2016 10:34:24 +1300 Subject: [Python-ideas] if-statement in for-loop In-Reply-To: References: <20161004160740.GF22471@ando.pearwood.info> <20161005040940.GA23968@kundert.designers-guide.com> <22516.50658.518426.23244@turnbull.sk.tsukuba.ac.jp> <7055ab3b-f215-6b8e-cf5a-95de19c0fc3b@mail.de> Message-ID: <57F571E0.8030303@canterbury.ac.nz> Paul Moore wrote: > It's also worth noting that the obvious response "but I don't want to > have to run a preprocessor against my code" is another indication that > this isn't solving a significant enough problem to warrant a language > change. There are valid reasons for disliking preprocessors other than "I can't be bothered running it". For example, errors tend to get reported with reference to the post-processed code rather than the original source, making debugging difficult. -- Greg From greg.ewing at canterbury.ac.nz Wed Oct 5 17:40:49 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 06 Oct 2016 10:40:49 +1300 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> Message-ID: <57F57361.2080001@canterbury.ac.nz> Paul Moore wrote: > I don't know *that* much about Erlang, but Python's model is that of a > single shared address space with (potentially multiple) threads of > code running, having access to that address space. I don't know much about Erlang either, but from what I gather, it's a functional language. That removes a lot of potential problems with concurrency right from the beginning. You can't have trouble with mutation of shared state if you can't mutate state in the first place. :-) -- Greg From steve at pearwood.info Wed Oct 5 19:34:48 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 6 Oct 2016 10:34:48 +1100 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: <20161005233448.GI22471@ando.pearwood.info> On Wed, Oct 05, 2016 at 09:02:50PM +0200, Ivan Levkivskyi wrote: > I agree with Yury here. There are corner cases (like what to do with > classmethods etc). If behaviour for all of them are specified, it would be > quite straightforward to implement this. I don't know... there's a lot of corner cases and I don't think we can improve them all. Here's the suggested exception from PyPy: TypeError: f() takes exactly 1 argument (2 given). Did you forget 'self' in the function definition? What happens if f() takes a mix of positional and keyword arguments? What if it takes arbitrary positional arguments? There are also classmethods, staticmethods, and any arbitrary descriptor. 
And don't forget ordinary functions too. How will this affect them? Before accepting this patch, I think that we need to ensure that it improves at least *some* cases (not necessarily all) while it does not make any of the remaining cases worse. I wonder whether an alternate approach might be better. Instead of trying to guess whether the method signature is wrong when the method is called, maybe the default metaclass (type) could introspect the namespace, inspect each callable in the namespace, and raise a warning (not an error) if the first argument is not `self` (for regular methods) or `cls` (for classmethods). It's not that self is mandatory, but it's conventional, and if we're going to guess that the name ought to be `self` at method call time, maybe we should guess that the name should be `self` when we build the class. -- Steve From yselivanov.ml at gmail.com Wed Oct 5 19:43:40 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 5 Oct 2016 19:43:40 -0400 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: <20161005233448.GI22471@ando.pearwood.info> References: <20161005233448.GI22471@ando.pearwood.info> Message-ID: <11797741-8703-edfd-054a-611d7085e172@gmail.com> On 2016-10-05 7:34 PM, Steven D'Aprano wrote: > On Wed, Oct 05, 2016 at 09:02:50PM +0200, Ivan Levkivskyi wrote: > >> >I agree with Yury here. There are corner cases (like what to do with >> >classmethods etc). If behaviour for all of them are specified, it would be >> >quite straightforward to implement this. > I don't know... there's a lot of corner cases and I don't think we can > improve them all. > > Here's the suggested exception from PyPy: > > TypeError: f() takes exactly 1 argument (2 given). Did you forget > 'self' in the function definition? > > > What happens if f() takes a mix of positional and keyword arguments? > What if it takes arbitrary positional arguments? > > There are also classmethods, staticmethods, and any arbitrary > descriptor. And don't forget ordinary functions too. How will this > affect them? We can implement this only for bound methods. Yury From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Oct 6 01:15:49 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 6 Oct 2016 14:15:49 +0900 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > Python's core runtime model is the C runtime model: threads (with a > local stack and access to a global process heap) and processes (which > contain a heap and one or more threads). Anything else we do (whether > it's generators, coroutines, or some other form of paused execution > like callback management) gets layered on top of that runtime model. > When folks ask questions like "Why can't Python be more like Go?", > "Why can't Python be more like Erlang?", or "Why can't Python be more > like Rust?" and get a negative response, it's usually because there's > an inherent conflict between the C runtime model and whatever piece of > the Go/Erlang/Rust runtime model we want to steal. 
How can there be a conflict between Python implementing the C runtime model *itself* which says "you can do anything anywhere anytime", and some part of Python implementing the more restricted models that allow safe concurrency? If you can do anything, well, you can voluntarily submit to compiler discipline to a restricted set. No? So it must be that the existing constructions (functions, for, with) that need an "async" marker have an implementation that is itself unsafe. This need is not being explained very well. What is also not being explained is what would be lost by simply using the "safe" implementations generated by the async versions everywhere. These may be hard to explain, and I know you, Yury, and Guido are very busy. But it's frustrating for all to see this go around in a circle: "it's like it is because it has to be that way, so that's the way it is". There's also the question of "is async/await really a language feature, or is it patching up a deficiency in the CPython implementation that other implementations don't necessarily have?" (which has been brought up before, in less contentious terms). > So the "async" keyword in "async def", "async for" and "async with" is > essentially a marker saying "This is not a C-like runtime concept > anymore!" That's understood, of course. The question that isn't being answered well is "why can't that non-C-like runtime concept be like Go or Erlang or Rust?" Or, less obtusely, "what exactly is the 'async' runtime concept, and why is it preferred to the concepts implemented by Go or Erlang or Rust or gevent or greenlets or Stackless?" I guess the answer to "why not Stackless?" is buried in the archives for Python-Dev somewhere, but I need to get back to $DAYJOB, maybe I'll look it up later. From rene at stranden.com Thu Oct 6 02:43:50 2016 From: rene at stranden.com (Rene Nejsum) Date: Thu, 6 Oct 2016 08:43:50 +0200 Subject: [Python-ideas] async objects In-Reply-To: <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> Message-ID: <6BDAA71B-29C4-43FC-AC8C-B1877C9EFD6C@stranden.com> > On 06 Oct 2016, at 07:15, Stephen J. Turnbull wrote: > > Nick Coghlan writes: > >> Python's core runtime model is the C runtime model: threads (with a >> local stack and access to a global process heap) and processes (which >> contain a heap and one or more threads). Anything else we do (whether >> it's generators, coroutines, or some other form of paused execution >> like callback management) gets layered on top of that runtime model. >> When folks ask questions like "Why can't Python be more like Go?", >> "Why can't Python be more like Erlang?", or "Why can't Python be more >> like Rust?" and get a negative response, it's usually because there's >> an inherent conflict between the C runtime model and whatever piece of >> the Go/Erlang/Rust runtime model we want to steal. > > How can there be a conflict between Python implementing the C runtime > model *itself* which says "you can do anything anywhere anytime", and > some part of Python implementing the more restricted models that allow > safe concurrency? If you can do anything, well, you can voluntarily > submit to compiler discipline to a restricted set. No? 
So it must be > that the existing constructions (functions, for, with) that need an > "async" marker have an implementation that is itself unsafe. This > need is not being explained very well. What is also not being > explained is what would be lost by simply using the "safe" > implementations generated by the async versions everywhere. Agree, well put. The Erlang runtime (VM) is also written in C, so anything should be possible. I do not advocate that Python should be a ?new? Erlang or Go, just saying that since we are introducing some level of concurrency in Python that we look at some of the elegant ways others have achieved this and try to implement something like that in Python. > These may be hard to explain, and I know you, Yury, and Guido are very > busy. But it's frustrating for all to see this go around in a circle: > "it's like it is because it has to be that way, so that's the way it is?. I understand that there is a lot of backwards compatibility, especially in regards to the Python/C interface, but I think that it is possible to find an elegant solution to this. > There's also the question of "is async/await really a language > feature, or is it patching up a deficiency in the CPython > implementation that other implementations don't necessarily have?" > (which has been brought up before, in less contentious terms). > >> So the "async" keyword in "async def", "async for" and "async with" is >> essentially a marker saying "This is not a C-like runtime concept >> anymore!" > > That's understood, of course. The question that isn't being answered > well is "why can't that non-C-like runtime concept be like Go or > Erlang or Rust?" Or, less obtusely, "what exactly is the 'async' > runtime concept, and why is it preferred to the concepts implemented > by Go or Erlang or Rust or gevent or greenlets or Stackless?? This would be very interesting to understand. > I guess the answer to "why not Stackless?" is buried in the archives > for Python-Dev somewhere, but I need to get back to $DAYJOB, maybe > I'll look it up later. I will try to look for that, I have some time on my hands, not sure I have have the %BRAINSKILL, but never the less? br /Rene From njs at pobox.com Thu Oct 6 02:52:04 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 5 Oct 2016 23:52:04 -0700 Subject: [Python-ideas] async objects In-Reply-To: <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> Message-ID: On Wed, Oct 5, 2016 at 1:28 PM, Rene Nejsum wrote: > When I first read about the async idea, I initially expected that it would be some stackless like additions to Python. My wish for Python was an addition to the language the allowed an easy an elegant concurrent model on the language level. Ideally a Python program with 1000 async objects parsing a 10TB XML in-memory file, should run twice as fast on a 8-core CPU, compared to a 4-core ditto. I think there's two fundamentally different layers getting conflated here, which is really confusing the issue. Layer 1 is the user API for concurrency. At this layer, there are two major options in current Python. 
The first option is the "implicit interleaving" model provided by classic threads, stackless, gevent, goroutines, etc., where as a user you write regular "serial" code + some calls to thread spawning primitives, and then the runtime magically arranges for multiple pieces of "serial" code to run in some kind of concurrent/parallel fashion. One downside of this approach is that because the runtime gets to arbitrarily decide how to interleave the execution of these different pieces of code, it can be difficult for the user to reason about interactions between them. So this motivated the second option for user APIs: the "explicit interleaving" model where as a user you annotate your code with some sort of marker saying where it's willing to be suspended (Python uses the "await" keyword), and then the runtime is restricted to only running one piece of code at a time, and only switching between them at these explicitly marked points. (The canonical reference on this is https://glyph.twistedmatrix.com/2014/02/unyielding.html) (I like to think about this as opt-out concurrency vs opt-in concurrency: the first model is concurrent by default except where you explicitly use a mutex; the second is serial by default except where you explicitly use "await".) So that's the user API level. Then there's Layer 2, the strategies that the runtime underneath uses to implement whichever semantics are in play. There are a lot of options here -- in particular, within the "implicit interleaving" model Python has existing production-ready implementations using OS level threads with a GIL (CPython's threading module), clever C stack manipulation tricks on a single OS level thread (gevent), OS level threads without a GIL (Jython's threading module), etc., etc. Picking between these is an implementation trade-off, not a language-level semantics trade-off -- from the point of view of the user API, they're pretty much interchangeable. ...And in principle you could also use any of these options to implement the "explicit interleaving" approach. For example, each coroutine could get assigned its own OS level thread, and then to get the 'await' semantics you could have a shared global lock that gets dropped when entering an 'await' and then re-acquired afterwards. This would be silly and inefficient compared to what asyncio actually does (it uses a single thread, like gevent), so no-one would do this. But my point is that at the user API level, again, these are just implementation details -- this would be a valid way to implement the async/await semantics. So what can we conclude from all this? First, if your goal is to write code that gets faster when you add more CPU cores, then that means you're looking for a particular implementation strategy: you want OS level threads, and no GIL. One way to do this would be to keep the Python language semantics the same, while modifying CPython's implementation to remove the GIL. This turns out to be really hard :-). But Jython demonstrates that the existing APIs are sufficient to make it possible -- the difficulties are in the CPython implementation, not in the language, so that's where it would need to be fixed. If someone wants to push this forward probably the thing to do is to see how Larry's "gilectomy" project is doing and help it along. Another strategy would be to come up with some new user API that can be added to the language, and whose semantics are more amenable to no-GIL-multithreading. 
There are lots of somewhat nascent ideas out there -- IIRC Eric's been thinking about using subinterpreters to add shared-nothing threads (versus the shared-everything threads which Python currently supports -- shared nothing is what Erlang does), there's Armin's experiments with STM in PyPy, there's PyParallel, etc. Nick has a good summary: http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html#what-does-the-future-look-like-for-exploitation-of-multiple-cores-in-python But -- and this is the main point I've been leading up to -- async/await is *not* the new user-level API that you're looking for. Async/await were created to enable the "explicitly interleaved" style of programming, which as we saw above effectively takes the GIL and promotes it to becoming an explicit part of the user API, instead of an implementation detail of the runtime. This is the one and only reason async/await exist -- if you don't want to explicitly control where your code can switch "threads" and be guaranteed that no other code is running at the same time, then there is no reason to use async/await. So I think the objection to async/await on the grounds that they clutter up the code is based on a misunderstanding of what they're for. It wasn't that we created these keywords to solve some implementation problem and then inflicted them on users. It's exactly the other way around. *If* you as a user want to add some explicit annotations to your code to control how parallel execution can be interleaved, *then* there has to be some keywords to write those annotations, and that's what async/await are. And OTOH if you *don't* want to have markers in your code to explicitly control interleaving -- if you prefer the "implicit interleaving" style -- then async/await are irrelevant and you shouldn't use them, you should use threading/gevent/whatever. -n -- Nathaniel J. Smith -- https://vorpus.org From greg.ewing at canterbury.ac.nz Thu Oct 6 03:45:42 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 06 Oct 2016 20:45:42 +1300 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> Message-ID: <57F60126.40103@canterbury.ac.nz> Nathaniel Smith wrote: > It wasn't that we created these keywords to solve some > implementation problem and then inflicted them on users. I disagree -- looking at the history of how we ended up with async/await, it looks to me like this is exactly what *did* happen. First we had generators. Then 'yield from' was invented to (among other things) leverage them as a way of getting lightweight threads. Then 'await' was introduced as a nicer way to spell 'yield from' when using it for that purpose. Saying that 'await' is good for you because it makes the suspension points visible seems to me a rationalisation after the fact. It was something that emerged from the implementation, not a prior design requirement. 
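(For readers who haven't followed that history, here is a rough side-by-side sketch of the two spellings -- the function names and delays are made up, and it assumes a 3.4/3.5-era asyncio where asyncio.coroutine still exists:

    import asyncio

    # The 3.4-era spelling: a generator-based coroutine, with the
    # suspension point written as 'yield from'.
    @asyncio.coroutine
    def fetch_old(delay):
        yield from asyncio.sleep(delay)   # suspension point
        return "done"

    # The 3.5 spelling: a native coroutine, same suspension point,
    # now written as 'await'.
    async def fetch_new(delay):
        await asyncio.sleep(delay)        # suspension point
        return "done"

    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(fetch_old(0.1)))
    print(loop.run_until_complete(fetch_new(0.1)))

The marker for the suspension point is there in both spellings; only its name changed, which is the point being made above.)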
-- Greg From njs at pobox.com Thu Oct 6 05:15:59 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 6 Oct 2016 02:15:59 -0700 Subject: [Python-ideas] async objects In-Reply-To: <57F60126.40103@canterbury.ac.nz> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> Message-ID: On Thu, Oct 6, 2016 at 12:45 AM, Greg Ewing wrote: > Nathaniel Smith wrote: >> >> It wasn't that we created these keywords to solve some >> implementation problem and then inflicted them on users. > > > I disagree -- looking at the history of how we > ended up with async/await, it looks to me like > this is exactly what *did* happen. > > First we had generators. Then 'yield from' was > invented to (among other things) leverage them as > a way of getting lightweight threads. Then 'await' > was introduced as a nicer way to spell 'yield from' > when using it for that purpose. > > Saying that 'await' is good for you because it > makes the suspension points visible seems to me > a rationalisation after the fact. It was something > that emerged from the implementation, not a > prior design requirement. I wasn't trying to write a detailed account of the development, as much as try to capture some essential features. Myth, not history :-). In the final design, the one and only thing that distinguishes async/await from gevent is that in the former the suspension points are visible, and in the latter they aren't. I don't really believe that it's an accident that people put a lot of effort into creating async/await in this way at a time when gevent already existed and was widely used in production, and we have historical documents like Glyph's blog arguing for visible yield points as a motivation for async/await, but... even if you think it *was* an accident, it hardly matters at this point. The core distinguishing feature between async/await and gevent is the visibility of suspension points, so it might as well be the case that async/await is designed for exactly those people who want visible suspension points. (And I didn't say await or visible suspension points are necessarily "good for you" -- obviously the implicit and explicit interleaving approaches have trade-offs you'll have to judge for yourself. But there are some people in some situations who want explicit interleaving, and async/await is there for them.) -n -- Nathaniel J. Smith -- https://vorpus.org From ncoghlan at gmail.com Thu Oct 6 06:27:45 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Oct 2016 20:27:45 +1000 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: On 6 October 2016 at 05:20, Sven R. Kunze wrote: > On 05.10.2016 18:06, Nick Coghlan wrote: >> >> [runtime matters] > > > I think I understand your point. > > I also hope that others and me could provide you with our perspective. We > see Python not as a C-like runtime but as an abstract modelling language.
I > know that it's different from the point of view of CPython internals, > however from the outside Python suggests to be much more than a simple > wrapper around C. Just two different perspectives. It's not a question that's up for debate - as a point of factual history, Python's runtime model is anchored in the C runtime model, and this pervades the entire language design. Simply wishing that Python's core runtime design was other than it is doesn't make it so. We can diverge from that base model when we decide there's sufficient benefit in doing so (e.g. the object model, the import system, the numeric tower, exception handling, lexical closures, generators, generators-as-coroutines, context management, native coroutines), but whatever we decide to do still needs to be expressible in terms of underlying operating system provided C primitives, or CPython can't implement it (and if CPython can't implement a feature as the reference implementation, that feature can't become part of the language definition). Postponing the point at which folks are confronted by those underlying C-level constraints is often an admirable goal, though - the only thing that isn't possible without fundamentally changing the language is getting rid of them entirely. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Oct 6 07:34:28 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Oct 2016 21:34:28 +1000 Subject: [Python-ideas] async objects In-Reply-To: <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> Message-ID: On 6 October 2016 at 15:15, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > Python's core runtime model is the C runtime model: threads (with a > > local stack and access to a global process heap) and processes (which > > contain a heap and one or more threads). Anything else we do (whether > > it's generators, coroutines, or some other form of paused execution > > like callback management) gets layered on top of that runtime model. > > When folks ask questions like "Why can't Python be more like Go?", > > "Why can't Python be more like Erlang?", or "Why can't Python be more > > like Rust?" and get a negative response, it's usually because there's > > an inherent conflict between the C runtime model and whatever piece of > > the Go/Erlang/Rust runtime model we want to steal. > > How can there be a conflict between Python implementing the C runtime > model *itself* which says "you can do anything anywhere anytime", and > some part of Python implementing the more restricted models that allow > safe concurrency? Anything is possible in C, but not everything is readily supportable :) When you design a new language and runtime from scratch, you get to set new rules and expectations if you want to do that. Ericsson did it with Erlang and BEAM (the reference Erlang VM) by declaring "Everything's an Actor in the 'Actor Model' sense, and Actors can send messages to each other's mailboxes". That pushes you heavily towards application designs where each "process" is a Finite State Machine with state changes triggered by external events, or by messages from other processes. 
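(As a concrete, if toy, illustration of that mailbox style -- this is just the pattern expressed with one thread and one queue per actor, not how BEAM actually works, and the names are invented:

    import queue
    import threading

    class Actor:
        """Toy actor: one worker thread, one mailbox, state that is only
        ever touched from inside the actor's own message loop."""

        def __init__(self):
            self.mailbox = queue.Queue()
            self._total = 0              # private state, by convention only
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

        def send(self, message):
            self.mailbox.put(message)    # the only supported way in

        def _run(self):
            while True:
                message = self.mailbox.get()
                if message is None:      # shutdown request
                    break
                self._total += message   # state change driven by a message

    counter = Actor()
    for n in range(10):
        counter.send(n)
    counter.send(None)

The important difference is that BEAM *enforces* that isolation -- there is no shared heap for other actors to reach into -- whereas in Python the isolation is purely a convention layered on top of shared-memory threads.)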
If BEAM had been published as open source a decade earlier than it eventually was, I suspect the modern computing landscape would look quite different from the way it does today. Google did something similar with Golang and goroutines by declaring that Communicating Sequential Processes would be their core concurrency primitive rather than C's shared memory threading. By contrast, Python, C++, Java, C#, Objective-C all retained C's core thread-based "private stack, shared heap" concurrency model, which later expanded to also include thread local heap storage. Rust actually retains this core "private stack, private heap, shared heap" model, but changes the management of data ownership to avoid the messy problems that arise in practice when using the "everything is accessible to every thread by default" model. > If you can do anything, well, you can voluntarily > submit to compiler discipline to a restricted set. No? So it must be > that the existing constructions (functions, for, with) that need an > "async" marker have an implementation that is itself unsafe. Correct (for a given definition of unsafe): in normal operation, CPython uses the *C stack* to manage the Python frame stack, so when you descend into a new function call in CPython, you're also using up more C level stack space. This means that when CPython throws RecursionError, what it's actually aiming to prevent is a C level segfault arising from running out of stack space to manage frames: $ ./python -X faulthandler Python 3.6.0b1+ (3.6:b995b1f52975, Sep 22 2016, 01:19:04) [GCC 6.1.1 20160621 (Red Hat 6.1.1-3)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> def f(): f() ... >>> f() Traceback (most recent call last): File "", line 1, in File "", line 1, in f File "", line 1, in f File "", line 1, in f [Previous line repeated 995 more times] RecursionError: maximum recursion depth exceeded >>> import sys >>> sys.setrecursionlimit(int(1e5)) >>> f() Fatal Python error: Segmentation fault Current thread 0x00007fe977a7c700 (most recent call first): File "", line 1 in f File "", line 1 in f File "", line 1 in f [] ... Segmentation fault (core dumped) Loops, with statements and other magic method invocations all work that way - they make a C level call to the magic method implementation which may end up running a new invocation of the eval loop to evaluate the bytecode of a magic method implementation that's written in Python. The pay-off that CPython gets from this is that we get to delegate 99.9% of the work for supporting different CPU architectures to C compiler developers, and we get a lot of capabilities "for free" when it comes to stack management. The downside is that C runtimes don't officially support swapping out the stack of the current thread with new contents. It's *possible* to do that (hence Stackless and gevent), but you're on your own when it comes to debugging it when it breaks. That makes it a good candidate for an opt-in "expert users only" capability - folks that decide gevent is the right answer for their needs can adopt it if they want to (perhaps restricting their choice of target platform and C extension modules as a result), while we (as in the CPython core devs) don't need to keep custom stack manipulation code working on all the platforms where CPython is supported and with all the custom C extension modules that are out there. > This > need is not being explained very well. 
What is also not being > explained is what would be lost by simply using the "safe" > implementations generated by the async versions everywhere. The two main problems with that idea are speed and extension module compatibility. The speed aspect is simply that we have more than 4 decades behind us of CPU designers and compiler developers making C code run fast. CPython uses that raw underlying speed to offer a lot of runtime flexibility with a relatively simple implementation while still being "fast enough" for many use cases. Even then, function calls are still notoriously slow, and await invocations tend to be slower still. The extension module compatibility problem is simply that whereas you can emulate a normal Python function just by writing a normal C function, emulating a Python coroutine involves implementing the coroutine protocol. That's possible, but it's a lot more complicated, and even if you implemented a standard wrapper, you'd be straight back to the speed problem. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Oct 6 07:50:49 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Oct 2016 21:50:49 +1000 Subject: [Python-ideas] async objects In-Reply-To: <57F60126.40103@canterbury.ac.nz> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> Message-ID: On 6 October 2016 at 17:45, Greg Ewing wrote: > Saying that 'await' is good for you because it > makes the suspension points visible seems to me > a rationalisation after the fact. It was something > that emerged from the implementation, not a > prior design requirement. I'd say it emerged from most folks still not grasping generators-as-coroutines a decade after PEP 342, and asynchronous IO in general ~15 years after Twisted was first released. When a language usage pattern is supported for that long, but folks still don't grok how it might benefit them, you have a UX problem, and one of the ways to address it is to take the existing pattern and give it dedicated syntax, which is exactly what PEP 492 did. Dedicated syntax at least dramatically lowers the barrier to *recognition* of the coroutine design pattern when it's being used, and can help with explaining it as well (since the overlap with other concepts in the language becomes a hidden implementation detail rather than being an essential part of the user experience). The shadow thread idea will hopefully prove successful in addressing the last major rough spot in the UX, which is the ability to easily integrate asynchronous components into an otherwise synchronous application. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From filipp at bakanov.su Thu Oct 6 09:45:01 2016 From: filipp at bakanov.su (Filipp Bakanov) Date: Thu, 6 Oct 2016 16:45:01 +0300 Subject: [Python-ideas] Add "equal" builtin function Message-ID: For now there are many usefull builtin functions like "any", "all", etc. I'd like to propose a new builtin function "equal". It should accept iterable, and return True if all items in iterable are the same or iterable is emty. 
That's quite popular problem, there is a discussion of how to perform it on stackoverflow ( http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-are-identical) - all suggestions are either slow or not very elegant. What do you think about it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Oct 6 10:01:36 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 6 Oct 2016 15:01:36 +0100 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: Message-ID: On 6 October 2016 at 14:45, Filipp Bakanov wrote: > For now there are many usefull builtin functions like "any", "all", etc. I'd > like to propose a new builtin function "equal". It should accept iterable, > and return True if all items in iterable are the same or iterable is emty. > That's quite popular problem, there is a discussion of how to perform it on > stackoverflow > (http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-are-identical) > - all suggestions are either slow or not very elegant. > What do you think about it? It's not a problem I've needed to solve often, if at all (in real-world code). But even if we assume it is worth having as a builtin, what would you propose as the implementation? The stackoverflow discussion highlights a lot of approaches, all with their own trade-offs. One problem with a builtin is that it would have to work on all iterables, which is likely to preclude a number of the faster solutions (which rely on the argument being an actual list). It's an interesting optimisation problem, and the discussion gives some great insight into how to micro-optimise an operation like this, but I'd question whether it needs to be a language/stdlib feature. Paul From elazarg at gmail.com Thu Oct 6 10:45:11 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 06 Oct 2016 14:45:11 +0000 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: Message-ID: It is a real problem. People are used to write `seq == [1, 2, 3]` and it passes unnoticed (even with type checkers) that if seq changes to e.g. a tuple, it will cause subtle bugs. It is inconvenient to write `len(seq) == 3 and seq == [1, 2, 3]` and people often don't notice the need to write it. (I'd like to note that it makes sense for this operation to be written as *iter1 == *lst although it requires a significant change to the language, so a Sequence.equal function makes sense) Elazar On Thu, Oct 6, 2016 at 5:02 PM Paul Moore wrote: > On 6 October 2016 at 14:45, Filipp Bakanov wrote: > > For now there are many usefull builtin functions like "any", "all", etc. > I'd > > like to propose a new builtin function "equal". It should accept > iterable, > > and return True if all items in iterable are the same or iterable is > emty. > > That's quite popular problem, there is a discussion of how to perform it > on > > stackoverflow > > ( > http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-are-identical > ) > > - all suggestions are either slow or not very elegant. > > What do you think about it? > > It's not a problem I've needed to solve often, if at all (in > real-world code). But even if we assume it is worth having as a > builtin, what would you propose as the implementation? The > stackoverflow discussion highlights a lot of approaches, all with > their own trade-offs. 
One problem with a builtin is that it would have > to work on all iterables, which is likely to preclude a number of the > faster solutions (which rely on the argument being an actual list). > > It's an interesting optimisation problem, and the discussion gives > some great insight into how to micro-optimise an operation like this, > but I'd question whether it needs to be a language/stdlib feature. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjoerdjob at sjoerdjob.com Thu Oct 6 10:43:09 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Thu, 6 Oct 2016 16:43:09 +0200 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: Message-ID: <20161006144309.GA13170@sjoerdjob.com> On Thu, Oct 06, 2016 at 03:01:36PM +0100, Paul Moore wrote: > On 6 October 2016 at 14:45, Filipp Bakanov wrote: > > For now there are many usefull builtin functions like "any", "all", etc. I'd > > like to propose a new builtin function "equal". It should accept iterable, > > and return True if all items in iterable are the same or iterable is emty. > > That's quite popular problem, there is a discussion of how to perform it on > > stackoverflow > > (http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-are-identical) > > - all suggestions are either slow or not very elegant. > > What do you think about it? > > It's not a problem I've needed to solve often, if at all (in > real-world code). But even if we assume it is worth having as a > builtin, what would you propose as the implementation? The > stackoverflow discussion highlights a lot of approaches, all with > their own trade-offs. One problem with a builtin is that it would have > to work on all iterables, which is likely to preclude a number of the > faster solutions (which rely on the argument being an actual list). > > It's an interesting optimisation problem, and the discussion gives > some great insight into how to micro-optimise an operation like this, > but I'd question whether it needs to be a language/stdlib feature. > > Paul I've needed it several times, but can't really remember what for anymore, which makes me think it's not really that important. A motivating reason for adding it to the builtins would be that it can be written in C instead of Python, and hence be a lot faster. The single slowest solution is actually the fastest when the difference is detected very soon (case s3), all others are `O(n)` and not `O(first-mismatch)`. Though, that means it could also be written in C and provided to PyPI, at the cost of asking others to install an extra package. From sjoerdjob at sjoerdjob.com Thu Oct 6 10:52:46 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Thu, 6 Oct 2016 16:52:46 +0200 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: Message-ID: <20161006145246.GB13170@sjoerdjob.com> On Thu, Oct 06, 2016 at 02:45:11PM +0000, ????? wrote: > It is a real problem. People are used to write `seq == [1, 2, 3]` and it > passes unnoticed (even with type checkers) that if seq changes to e.g. a > tuple, it will cause subtle bugs. It is inconvenient to write `len(seq) == > 3 and seq == [1, 2, 3]` and people often don't notice the need to write it. 
> > (I'd like to note that it makes sense for this operation to be written as > > *iter1 == *lst > > although it requires a significant change to the language, so a > Sequence.equal function makes sense) > > Elazar > I think you're mistaken about the suggestion. It's not about a function def equal(it1: Iterable, it2: Iterable) -> bool: but about a function def equal(it: Iterable) -> bool: . From elazarg at gmail.com Thu Oct 6 10:56:22 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 06 Oct 2016 14:56:22 +0000 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: <20161006145246.GB13170@sjoerdjob.com> References: <20161006145246.GB13170@sjoerdjob.com> Message-ID: On Thu, Oct 6, 2016 at 5:53 PM Sjoerd Job Postmus wrote: > On Thu, Oct 06, 2016 at 02:45:11PM +0000, ????? wrote: > > It is a real problem. People are used to write `seq == [1, 2, 3]` and it > > passes unnoticed (even with type checkers) that if seq changes to e.g. a > > tuple, it will cause subtle bugs. It is inconvenient to write `len(seq) > == > > 3 and seq == [1, 2, 3]` and people often don't notice the need to write > it. > > > > (I'd like to note that it makes sense for this operation to be written as > > > > *iter1 == *lst > > > > although it requires a significant change to the language, so a > > Sequence.equal function makes sense) > > > > Elazar > > > > I think you're mistaken about the suggestion. You are right of course. Sorry. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Oct 6 11:23:57 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 7 Oct 2016 02:23:57 +1100 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: Message-ID: <20161006152357.GJ22471@ando.pearwood.info> On Thu, Oct 06, 2016 at 04:45:01PM +0300, Filipp Bakanov wrote: > For now there are many usefull builtin functions like "any", "all", etc. > I'd like to propose a new builtin function "equal". It should accept > iterable, and return True if all items in iterable are the same or iterable > is emty. > That's quite popular problem, there is a discussion of how to perform it on > stackoverflow ( > http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-are-identical) > - all suggestions are either slow or not very elegant. I haven't checked the link, but just off the top of my head, how's this? def all_equal(iterable): it = iter(iterable) sentinel = object() first = next(it, sentinel) return all(x == first for x in it) I think that's neat, elegant, fast, and short enough that I don't mind writing it myself when I need it (although I wouldn't mind adding it to my own personal toolbox). +0.3 to adding it the standard library. +0.1 to adding it to built-ins -0.1 on adding it to built-ins under the name "equal", as that will confuse too many people. -- Steve From ethan at stoneleaf.us Thu Oct 6 11:39:33 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 06 Oct 2016 08:39:33 -0700 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: Message-ID: <57F67035.7020608@stoneleaf.us> On 10/06/2016 06:45 AM, Filipp Bakanov wrote: > For now there are many usefull builtin functions like "any", "all", > etc. I'd like to propose a new builtin function "equal". It should > accept iterable, and return True if all items in iterable are the > same or iterable is emty. 
> > That's quite popular problem, there is a discussion of how to > perform it on stackoverflow - all suggestions are either slow > or not very elegant. > > What do you think about it? I don't know if it's common enough to warrant being a built-in, but I know I've needed it several times, and wrote my own. -- ~Ethan~ From rosuav at gmail.com Thu Oct 6 11:42:15 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 7 Oct 2016 02:42:15 +1100 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: <20161006152357.GJ22471@ando.pearwood.info> References: <20161006152357.GJ22471@ando.pearwood.info> Message-ID: On Fri, Oct 7, 2016 at 2:23 AM, Steven D'Aprano wrote: > +0.3 to adding it the standard library. > > +0.1 to adding it to built-ins > > -0.1 on adding it to built-ins under the name "equal", as that will > confuse too many people. I'll go further: -0.5 on adding to built-ins. +0.5 on adding it to itertools or the itertools recipes. ChrisA From vxgmichel at gmail.com Thu Oct 6 11:55:48 2016 From: vxgmichel at gmail.com (Vincent Michel) Date: Thu, 6 Oct 2016 17:55:48 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> Message-ID: 2016-10-06 13:50 GMT+02:00 Nick Coghlan : > The shadow thread idea will hopefully prove successful in addressing > the last major rough spot in the UX, which is the ability to easily > integrate asynchronous components into an otherwise synchronous > application. > That's my opinion as well. If I had to run asyncio coroutines from synchronous code, I'd probably take advantage of the Executor interface defined by concurrent.futures. Executors handle resource management through a context manager interface, which is a good way to start and clean after the shadow thread. Also, the submit method returns a concurrent.futures.Future, i.e. the standard for accessing an asynchronous result from synchronous code. Here's a simple implementation: https://gist.github.com/vxgmichel/d16e66d1107a369877f6ef7e646ac2e5 If this is not enough, (say one wants to write a synchronous API to an asynchronous library), then it simply is a matter of instantiating the executor once in the module and wrap all the coroutines to expose with executor.submit and Future.result. This might provide an acceptable answer to the DRY thing that has been mentioned a few times, though I'm not convinced it is such a problematic issue (at least nothing that sans-io already addresses in the first place). -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Oct 6 12:20:18 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 6 Oct 2016 12:20:18 -0400 Subject: [Python-ideas] async objects In-Reply-To: <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> Message-ID: On 2016-10-06 1:15 AM, Stephen J. 
Turnbull wrote: > These may be hard to explain, and I know you, Yury, and Guido are very > busy. But it's frustrating for all to see this go around in a circle: > "it's like it is because it has to be that way, so that's the way it is". To add to what Nick said. I myself would want to use a time machine to help design CPython runtime to allow Golang style jof concurrency (although Golang has its own bag of problems). Unfortunately there is no time machine, and implementing that in CPython today would be an impossibly hard and long task. To start, no matter how exactly you want to approach this, it would require us to do a *complete rewrite* of CPython internals. This is so complex that we wouldn't be able to even estimate how long it would take us. This would be a far more significant change than Python 2->3. BTW in the process of doing that, we would have to completely redesign the C API, which would effectively kill the entire numpy/scipy ecosystem. If someone disagrees with this, I invite them to go ahead and write a PEP (please!) On the other hand, async/await and non-blocking IO make it possible to write highly concurrent network applications. Even languages with good support of threading, such as C#, have async/await [sic!]. Even Rust users want them, and will likely add them in the language or std lib. Even C++ might have coroutines soon. Why? Because Rust and C# can't "just" implement actors model. Because threads are hard and deadlocks and code that is hard to reason about. Because threads can't scale as good as non-blocking IO. We probably could implement actors if we decided to merge Stackless or use greenlets in the core. Anyone who looked at/debugged the implementation of greenlets would say it's a bad idea. And gevent is available for those who want to use them anyways. In the end, async/await is the only *practical* solution for a language like Python. Yes, it's a bit harder to design libraries that support both synchronous and asynchronous APIs, but there's a way: separate your protocol parsing from IO. When done properly, it's easier to write unittests and it's a no-brainer to add support for different IO models. Yury From ethan at stoneleaf.us Thu Oct 6 12:27:19 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 06 Oct 2016 09:27:19 -0700 Subject: [Python-ideas] async objects In-Reply-To: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> Message-ID: <57F67B67.3080504@stoneleaf.us> Interestingly, this just showed up on Python List: On 10/06/2016 05:09 AM, Frank Millman wrote: > > I have used itertools.groupby before, and I love it. I used it > to process a csv file and 'break' on change of a particular > field. It worked very well. > > Now I want to use it to process a database table. I can select > the rows in the desired sequence with no problem. However, I > am using asyncio, so I am reading the rows asynchronously. > > My 'reader' class has __aiter__() and __anext__() defined. > > If I pass the reader to groupby, I get the error message > 'object is not iterable'. > > Before I spend hours trying to figure it out, can anyone > confirm if this is doable at all, or is groupby not designed > for this. Is adapting the groupby recipe in the docs and adding async support (and thus duplicating code) the only way at this point? Will there be a better way in the future? 
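To make the question concrete, the kind of adaptation I have in mind is roughly this (an untested sketch: it relies on the PEP 525 async generators arriving in 3.6 -- on 3.5 it would have to be spelled as a class with __aiter__/__anext__ -- and unlike itertools.groupby it materialises each group as a list):

    async def agroupby(aiterable, key=None):
        """Async analogue of itertools.groupby for objects that
        implement __aiter__/__anext__.

        Yields (key, group) pairs, with each group collected into a
        list rather than the lazy sub-iterators groupby produces.
        """
        keyfunc = key if key is not None else (lambda value: value)
        sentinel = object()
        current_key = sentinel
        group = []
        async for item in aiterable:
            item_key = keyfunc(item)
            if current_key is not sentinel and item_key != current_key:
                yield current_key, group
                group = []
            current_key = item_key
            group.append(item)
        if current_key is not sentinel:
            yield current_key, group

    # Hypothetical usage with a 'reader' like the one described above,
    # assuming each row is indexable and the break field is row[0]:
    async def report(reader):
        async for break_value, rows in agroupby(reader, key=lambda row: row[0]):
            print(break_value, len(rows), "rows")

Workable, but it duplicates the groupby logic by hand, which is exactly the duplication mentioned above.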
-- ~Ethan~ From filipp at bakanov.su Thu Oct 6 13:09:07 2016 From: filipp at bakanov.su (Filipp Bakanov) Date: Thu, 6 Oct 2016 20:09:07 +0300 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: <20161006152357.GJ22471@ando.pearwood.info> Message-ID: Seems like itertools recipes already have "all_equal" function. What do you think about moving it from recipes to itertools? I suggest a C implementation with optimisations for builtin collections. 2016-10-06 18:42 GMT+03:00 Chris Angelico : > On Fri, Oct 7, 2016 at 2:23 AM, Steven D'Aprano > wrote: > > +0.3 to adding it the standard library. > > > > +0.1 to adding it to built-ins > > > > -0.1 on adding it to built-ins under the name "equal", as that will > > confuse too many people. > > I'll go further: -0.5 on adding to built-ins. +0.5 on adding it to > itertools or the itertools recipes. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Thu Oct 6 13:20:36 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 06 Oct 2016 17:20:36 +0000 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: <20161006152357.GJ22471@ando.pearwood.info> Message-ID: The name might be a little confusing; it can be understood as comparing two sequences, so passing two sequences may seem reasonable to a reviewer. Elazar ?????? ??? ??, 6 ????' 2016, 20:15, ??? Filipp Bakanov ?: > Seems like itertools recipes already have "all_equal" function. What do > you think about moving it from recipes to itertools? I suggest a C > implementation with optimisations for builtin collections. > > 2016-10-06 18:42 GMT+03:00 Chris Angelico : > > On Fri, Oct 7, 2016 at 2:23 AM, Steven D'Aprano > wrote: > > +0.3 to adding it the standard library. > > > > +0.1 to adding it to built-ins > > > > -0.1 on adding it to built-ins under the name "equal", as that will > > confuse too many people. > > I'll go further: -0.5 on adding to built-ins. +0.5 on adding it to > itertools or the itertools recipes. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Thu Oct 6 14:52:05 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 6 Oct 2016 19:52:05 +0100 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: <20161006144309.GA13170@sjoerdjob.com> References: <20161006144309.GA13170@sjoerdjob.com> Message-ID: On 06/10/2016 15:43, Sjoerd Job Postmus wrote: > On Thu, Oct 06, 2016 at 03:01:36PM +0100, Paul Moore wrote: >> On 6 October 2016 at 14:45, Filipp Bakanov wrote: >>> For now there are many usefull builtin functions like "any", "all", etc. I'd >>> like to propose a new builtin function "equal". It should accept iterable, >>> and return True if all items in iterable are the same or iterable is emty. 
>>> That's quite popular problem, there is a discussion of how to perform it on >>> stackoverflow >>> (http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-are-identical) >>> - all suggestions are either slow or not very elegant. >>> What do you think about it? >> >> It's not a problem I've needed to solve often, if at all (in >> real-world code). But even if we assume it is worth having as a >> builtin, what would you propose as the implementation? The >> stackoverflow discussion highlights a lot of approaches, all with >> their own trade-offs. One problem with a builtin is that it would have >> to work on all iterables, which is likely to preclude a number of the >> faster solutions (which rely on the argument being an actual list). >> >> It's an interesting optimisation problem, and the discussion gives >> some great insight into how to micro-optimise an operation like this, >> but I'd question whether it needs to be a language/stdlib feature. >> >> Paul > > I've needed it several times, but can't really remember what for > anymore, which makes me think it's not really that important. > A motivating reason for adding it to the builtins would be that it can > be written in C instead of Python, and hence be a lot faster. This should be on the bug tracker as "release blocker" as we clearly need something that is fast that isn't that important. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From brenbarn at brenbarn.net Thu Oct 6 15:06:02 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Thu, 06 Oct 2016 12:06:02 -0700 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: <57F6A09A.70908@brenbarn.net> On 2016-10-06 03:27, Nick Coghlan wrote: > It's not a question that's up for debate - as a point of factual > history, Python's runtime model is anchored in the C runtime model, > and this pervades the entire language design. Simply wishing that > Python's core runtime design was other than it is doesn't make it so. That may be true, but the limitation there is Python's core runtime model, not C's. As you say, Python's runtime model is historically anchored in C, but that doesn't mean C's runtime model itself directly constrains Python's. As others have mentioned, there are plenty of other languages that are themselves written in C but have different runtime models. The constraint is not compatibility with the C runtime model, but backward compatibility with Python's own earlier decisions about its own runtime model. This may sound like an academic point, but I just want to mention it because, as you say later, hiding C from the Python programmer is often an admirable goal. I would go so far as to say it is almost always an admirable goal. The Python runtime isn't going to suddenly change, but we can make smart decisions about incremental changes in a way that, over time, allows it to drift further from the C model, rather than adding more and more tethers linking it more tightly to the C model. 
> Postponing the point at which folks are confronted by those underlying > C-level constraints is often an admirable goal, though - the only > thing that isn't possible without fundamentally changing the language > is getting rid of them entirely. Sure. But over the long term, almost anything is possible. As I said above, my own opinion is that hiding C from Python users is almost always a good thing. I (and I think many other people) use Python because I like Python. If I liked C I would use C. To the extent that Python allows C to constrain it (or, more specifically, allows the nature of C to constrain people who are only writing Python code), it limits its ability to evolve in a way that frees users from the things they don't like about C. This is kind of tangential to the current issue about async. To be honest I am quite ignorant of how async/await will help or hurt me as a Python user. As you say, certain constraints are unavoidable. (We don't have to use C's runtime model, but we do have to be able to write our runtime model in C.) But I think it's good, when thinking about these features, to think how they will constrain future language development versus opening it up. If, for instance, people start using async/await and old-school generator-send-style coroutines become unused, it will be easier to deprecate generator-send in the distant future. On the flip side, I would hate to see decisions made that result in lots of Python code that "bakes in" specific runtime model assumptions, making it more difficult to leave those assumptions behind in the future. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From p.f.moore at gmail.com Thu Oct 6 15:15:04 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 6 Oct 2016 20:15:04 +0100 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: <20161006152357.GJ22471@ando.pearwood.info> Message-ID: On 6 October 2016 at 18:09, Filipp Bakanov wrote: > Seems like itertools recipes already have "all_equal" function. What do you > think about moving it from recipes to itertools? I suggest a C > implementation with optimisations for builtin collections. Interestingly, the recipe given there was not mentioned in the stackoverflow thread. Testing it against Steven's example given above: recipe.py: from itertools import groupby def all_equal_1(iterable): "Returns True if all the elements are equal to each other" g = groupby(iterable) return next(g, True) and not next(g, False) def all_equal_2(iterable): it = iter(iterable) sentinel = object() first = next(it, sentinel) return all(x == first for x in it) Results: > # Itertools recipe, all different >py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_1(x)" .................... Median +- std dev: 596 ns +- 10 ns > # Itertools recipe, all the same >py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_1(y)" .................... Median +- std dev: 7.17 us +- 0.05 us > # Steven's recipe, all different >py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_2(x)" .................... Median +- std dev: 998 ns +- 12 ns > # Steven's recipe, all the same >py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_2(y)" .................... 
Median +- std dev: 84.3 us +- 0.9 us So the itertools recipe is just under twice as fast for all-different values, and over 10 times faster if all the values are the same. The moral here is probably to check the itertools recipes, they are really well coded. If you really feel that it's worth promoting this recipe to an actual itertools function, you should probably create a tracker item for it, aimed at Python 3.7, with a patch implementing it. My feeling is that Raymond (who's in charge of the itertools module) won't think it's worth including - he's typically very cautious about adding itertools unless they have proven broad value. But that's just my guess, and the only way to know for sure is to ask. BTW, given that this *is* already an itertools recipe, it seems clear to me that the only reasonable place to put it if it does go into core Python would be the itertools module. Paul From greg.ewing at canterbury.ac.nz Thu Oct 6 19:12:14 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 07 Oct 2016 12:12:14 +1300 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> Message-ID: <57F6DA4E.6000108@canterbury.ac.nz> Nathaniel Smith wrote: > The core distinguishing feature between > async/await and gevent is the visibility of suspension points, so it > might as well be the case that async/await is designed for exactly > those people who want visible suspension points. They're not quite independent axes, though. Gevent is based on greenlet, which relies on some slightly dubious tricks at the C level and doesn't play well with some external libraries. As far as I know, there's no current alternative that's just as efficient and portable as asyncio but without the extra keywords. If you want the full benefits of asyncio, you're forced to accept explicit suspension points. -- Greg From mistersheik at gmail.com Thu Oct 6 19:19:17 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 6 Oct 2016 16:19:17 -0700 (PDT) Subject: [Python-ideas] str(slice(10)) should return "slice(10)" Message-ID: <18c9fb30-566a-4213-a066-7ea5c9f5c44e@googlegroups.com> Currently str(slice(10)) returns "slice(None, 10, None)" If the start and step are None, consider not emitting them. Similarly slice(None) is rendered slice(None, None, None). When you're printing a lot of slices, it's a lot of extra noise. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Thu Oct 6 19:28:22 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 07 Oct 2016 12:28:22 +1300 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> Message-ID: <57F6DE16.6080701@canterbury.ac.nz> Nick Coghlan wrote: > The pay-off that CPython gets from this is that we get to delegate > 99.9% of the work for supporting different CPU architectures to C > compiler developers, and we get a lot of capabilities "for free" when > it comes to stack management. One of the main benefits is that it's very easy for external code to make callbacks to Python code. The original implementation of Stackless decoupled the eval stack from the C stack, but at the expense of making the API for calling external C code much less straightforward. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 6 19:40:50 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 07 Oct 2016 12:40:50 +1300 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> Message-ID: <57F6E102.4030604@canterbury.ac.nz> Nick Coghlan wrote: > When a language usage pattern is supported for that long, but folks > still don't grok how it might benefit them, you have a UX problem, and > one of the ways to address it is to take the existing pattern and give > it dedicated syntax, which is exactly what PEP 492 did. However, it was just replacing one way of explicitly marking suspension points ("yield from") with another ("await"). The fact that suspension points are explicitly marked was driven by the implementation from the beginning. When I first proposed "yield from" as an aid to using generators as coroutines, my intention was always to eventually replace it with something else. PEP 3152 was my proposal for what the something else might be. I initially regarded it as a wart that it still required a special syntax for suspendable calls, and felt the need to apologise for that. I was totally surprised when people said they actually *liked* the idea of explicit suspension points. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 6 19:50:50 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 07 Oct 2016 12:50:50 +1300 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <22517.56837.392194.462904@turnbull.sk.tsukuba.ac.jp> Message-ID: <57F6E35A.70801@canterbury.ac.nz> Yury Selivanov wrote: > To start, no matter how exactly you want to approach this, it would > require us to do a *complete rewrite* of CPython internals. This is so > complex that we wouldn't be able to even estimate how long it would take > us. 
You could ask the author of Stackless -- he did exactly that quite a while back. -- Greg From njs at pobox.com Fri Oct 7 02:42:25 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 6 Oct 2016 23:42:25 -0700 Subject: [Python-ideas] async objects In-Reply-To: <57F6DA4E.6000108@canterbury.ac.nz> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> <57F6DA4E.6000108@canterbury.ac.nz> Message-ID: On Thu, Oct 6, 2016 at 4:12 PM, Greg Ewing wrote: > Nathaniel Smith wrote: >> >> The core distinguishing feature between >> async/await and gevent is the visibility of suspension points, so it >> might as well be the case that async/await is designed for exactly >> those people who want visible suspension points. > > > They're not quite independent axes, though. Gevent is based > on greenlet, which relies on some slightly dubious tricks at > the C level and doesn't play well with some external libraries. > > As far as I know, there's no current alternative that's just > as efficient and portable as asyncio but without the extra > keywords. If you want the full benefits of asyncio, you're > forced to accept explicit suspension points. I'd be interested to hear more about this. gevent/greenlet don't seem to have an official "list of supported platforms" that I can find, but I can't find concrete examples of unsupported platforms either. Are we talking like, HPUX-on-MIPS or...? And obviously there are always going to be some cases that are better supported by either one tool or another, but as we've seen getting external libraries to play well with asyncio is also pretty non-trivial (exactly because of those explicit suspension points!), and my impression was that for now gevent actually had a larger ecosystem. For folks who prefer the gevent API, is it really easier to port libraries to asyncio than to port them to gevent? -n -- Nathaniel J. Smith -- https://vorpus.org From lkb.teichmann at gmail.com Fri Oct 7 04:07:36 2016 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Fri, 7 Oct 2016 10:07:36 +0200 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio Message-ID: Hi list, I am currently developing a Python library based on asyncio. Unfortunately, not all users of my library have much experience with asynchronous programming, so they often try to use blocking functions. I thought it would be a good idea if we could somehow flag blocking functions in the standard library, such that they issue a warning (or even raise an exception) if they are used in an asyncio context. For functions implemented in Python, a simple decorator should do the job. For functions implemented in C, things get a bit more complex. Thinking about it, I realized that currently the best indicator for a C function to block is that it releases the GIL. There are some false positives, like a read with O_NONBLOCK set, in which case we need a way to opt out, but in general it could be a good idea that releasing the GIL triggers a warning in an asyncio environment. 
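For the pure-Python case, the decorator I have in mind would be something along these lines (an untested sketch -- the name "blocking" is made up, and since it leans on asyncio.get_event_loop() and loop.is_running(), it only catches calls made from the thread that is actually running the event loop):

    import asyncio
    import functools
    import warnings

    def blocking(func):
        """Mark func as blocking: complain if it is called while an
        asyncio event loop is running in the current thread."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                loop = asyncio.get_event_loop()
            except RuntimeError:
                loop = None              # no event loop in this thread
            if loop is not None and loop.is_running():
                warnings.warn(
                    "blocking call to %s() while the event loop is running"
                    % func.__qualname__,
                    RuntimeWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper

    @blocking
    def read_config(path):
        with open(path) as f:            # ordinary, blocking file I/O
            return f.read()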
Greetings Martin From ncoghlan at gmail.com Fri Oct 7 10:55:28 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 8 Oct 2016 00:55:28 +1000 Subject: [Python-ideas] Add "equal" builtin function In-Reply-To: References: Message-ID: On 6 October 2016 at 23:45, Filipp Bakanov wrote: > For now there are many useful builtin functions like "any", "all", etc. I'd > like to propose a new builtin function "equal". It should accept an iterable, > and return True if all items in iterable are the same or the iterable is empty. If the items are hashable, you can already just dump them in a set: len(set(iterable)) <= 1 If they're not hashable or you want to exit ASAP on larger inputs, you'll want an algorithm that works the same way any/all do: def all_same(iterable): itr = iter(iterable) try: first = next(itr) except StopIteration: return True return all(x == first for x in itr) (Checking the SO question, both of those are given in the first answer) If you know you have a sequence, you can also do: not seq or all(x == seq[0] for x in seq) Exactly which of those options makes sense is going to depend on what format your data is in, and what other operations you're planning to do with it - without a context of use in the SO question, it sounds more like someone seeking help with their algorithms and data structures homework than it does a practical programming problem. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Oct 7 11:16:29 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 7 Oct 2016 08:16:29 -0700 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: References: Message-ID: On Fri, Oct 7, 2016 at 1:07 AM, Martin Teichmann wrote: > I am currently developing a Python library based on asyncio. > Unfortunately, not all users of my library have much experience with > asynchronous programming, so they often try to use blocking functions. > > I thought it would be a good idea if we could somehow flag blocking > functions in the standard library, such that they issue a warning (or > even raise an exception) if they are used in an asyncio context. For > functions implemented in Python, a simple decorator should do the job. > > For functions implemented in C, things get a bit more complex. > Thinking about it, I realized that currently the best indicator for a > C function to block is that it releases the GIL. There are some false > positives, like a read with O_NONBLOCK set, in which case we need a > way to opt out, but in general it could be a good idea that releasing > the GIL triggers a warning in an asyncio environment. That implementation idea seems iffy -- it feels like it would be a lot of work to pull it off. Releasing the GIL is done at an extremely low level and it's not clear how you'd even raise an exception at that point. Maybe a simpler approach would be to write a linter that checks for a known list of common blocking functions, and anything that calls those automatically gets the same property? 
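Very roughly, the "spot the calls" half of that could be a small ast walk; the set of flagged names below is purely illustrative, and propagating the "blocking" property up the call graph is the part a real linter would still have to add:

import ast
import sys

BLOCKING_NAMES = {"sleep", "urlopen", "getaddrinfo"}  # example names only

class BlockingCallFinder(ast.NodeVisitor):
    def visit_Call(self, node):
        # covers both foo(...) and module.foo(...)
        name = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
        if name in BLOCKING_NAMES:
            print("line %d: possible blocking call to %r" % (node.lineno, name))
        self.generic_visit(node)

with open(sys.argv[1]) as f:
    BlockingCallFinder().visit(ast.parse(f.read()))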
-- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Fri Oct 7 11:18:37 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 8 Oct 2016 01:18:37 +1000 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> <57F6DA4E.6000108@canterbury.ac.nz> Message-ID: On 7 October 2016 at 16:42, Nathaniel Smith wrote: > For folks who prefer the > gevent API, is it really easier to port libraries to asyncio than to > port them to gevent? It's definitely *not* easier, as gevent lets you suspend execution inside arbitrary CPython magic method calls. That's why you can still use SQL Alchemy's ORM layer with gevent - greenlet can swap the stack even with the extra C call frames on there. If you're running in vanilla CPython (or recent non-Windows versions of PyPy2), on a relatively mainstream architecture like x86_64 or ARM, then gevent/greenlet will be fine as an applications synchronous/asyncrhonous bridge. However, if you're running in a context that embeds CPython inside a larger application (e.g. mod_wsgi inside Apache), then gevent's assumptions about how the C thread states are managed may be wrong, and hence you may be in for some "interesting" debugging sessions. The same goes for any library that implements callbacks that end up executing a greenlet switch when they weren't expecting it (e.g. while holding a threading lock - that will protect you from other OS threads, but not from other greenlets in the same thread) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Fri Oct 7 12:52:27 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 7 Oct 2016 12:52:27 -0400 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: References: Message-ID: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> On 2016-10-07 11:16 AM, Guido van Rossum wrote: > Maybe a simpler approach would be to write a linter that checks for a > known list of common blocking functions, and anything that calls those > automatically gets the same property? What if somebody uses logging module and logs to a file? I think this is something that linters can't infer (how logging is configured). One way to solve this would be to monkeypatch the io and os modules (gevent does that, so it's possible) to issue a warning when it's used in an asyncio context. This can be done as a module on PyPI. Another way would be to add some kind of IO tracing hooks to CPython. Yury From g.rodola at gmail.com Fri Oct 7 13:31:17 2016 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Fri, 7 Oct 2016 19:31:17 +0200 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> References: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> Message-ID: On Fri, Oct 7, 2016 at 6:52 PM, Yury Selivanov wrote: > On 2016-10-07 11:16 AM, Guido van Rossum wrote: > > Maybe a simpler approach would be to write a linter that checks for a >> known list of common blocking functions, and anything that calls those >> automatically gets the same property? >> > > What if somebody uses logging module and logs to a file? 
I think this is > something that linters can't infer (how logging is configured). > > One way to solve this would be to monkeypatch the io and os modules > (gevent does that, so it's possible) to issue a warning when it's used in > an asyncio context. This can be done as a module on PyPI. > > Another way would be to add some kind of IO tracing hooks to CPython. How about something like this? http://www.tornadoweb.org/en/stable/ioloop.html#tornado.ioloop.IOLoop.set_blocking_signal_threshold -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Oct 7 13:34:25 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 7 Oct 2016 10:34:25 -0700 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> References: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> Message-ID: On Fri, Oct 7, 2016 at 9:52 AM, Yury Selivanov wrote: > On 2016-10-07 11:16 AM, Guido van Rossum wrote: > >> Maybe a simpler approach would be to write a linter that checks for a >> known list of common blocking functions, and anything that calls those >> automatically gets the same property? > > What if somebody uses logging module and logs to a file? I think this is > something that linters can't infer (how logging is configured). And depending on your use case this may be acceptable. > One way to solve this would be to monkeypatch the io and os modules (gevent > does that, so it's possible) to issue a warning when it's used in an asyncio > context. This can be done as a module on PyPI. > > Another way would be to add some kind of IO tracing hooks to CPython. Honestly before writing a lot of code here I'd like to hear more from Martin about the spread of mistakes he's observed among his users. -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Fri Oct 7 14:32:46 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 7 Oct 2016 14:32:46 -0400 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: References: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> Message-ID: <5c4c9b31-d693-f70e-95fa-fe0cfb19b7c3@gmail.com> On 2016-10-07 1:31 PM, Giampaolo Rodola' wrote: > On Fri, Oct 7, 2016 at 6:52 PM, Yury Selivanov > wrote: > >> On 2016-10-07 11:16 AM, Guido van Rossum wrote: >> >> Maybe a simpler approach would be to write a linter that checks for a >>> known list of common blocking functions, and anything that calls those >>> automatically gets the same property? >>> >> What if somebody uses logging module and logs to a file? I think this is >> something that linters can't infer (how logging is configured). >> >> One way to solve this would be to monkeypatch the io and os modules >> (gevent does that, so it's possible) to issue a warning when it's used in >> an asyncio context. This can be done as a module on PyPI. >> >> Another way would be to add some kind of IO tracing hooks to CPython. > > How about something like this? > http://www.tornadoweb.org/en/stable/ioloop.html#tornado.ioloop.IOLoop.set_blocking_signal_threshold > Yes, we already have a similar mechanism in asyncio -- loop.slow_callback_duration property that is used in debug mode. The thing it isn't really precise, as you can have a lot of relatively fast blocking calls that harm performance, but complete faster than slow_callback_duration. 
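For reference, the existing knob looks like this (set_debug() and slow_callback_duration are real asyncio APIs; the threshold value here is arbitrary and the snippet is just a usage sketch):

import asyncio

loop = asyncio.get_event_loop()
loop.set_debug(True)                # enable asyncio's debug checks
loop.slow_callback_duration = 0.05  # seconds; the default is 0.1

# In debug mode, any callback or Task step that runs longer than
# slow_callback_duration gets logged as a warning by asyncio.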
Yury From jcrmatos at gmail.com Fri Oct 7 13:19:06 2016 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Matos?=) Date: Fri, 7 Oct 2016 10:19:06 -0700 (PDT) Subject: [Python-ideas] pip enhancements: check for dependent packages before uninstalling and store installation date to allow listing by it Message-ID: <4972eb0e-612b-4766-a07c-9d109090bacb@googlegroups.com> Hello, I believe it would be helpful if pip checked if there are any dependent packages before uninstalling a package. If there were, it should: 1. Warn the user by listing the dependent packages; 2. Require a specific option, eg. --alldeps to uninstall all dependent packages before uninstalling the specified package; 3. Require a specific option, eg. --force to continue with the uninstall w/o uninstalling the dependent packages (which could be dangerous, but it's the current behaviour and may be useful in specific situations). This is similar to what happens with package managers in Linux. This should also work on venvs. By storing the installation date of all packages, it should allow the list command to be ordered by it. Best regards, JM -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Oct 7 19:03:09 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 07 Oct 2016 16:03:09 -0700 Subject: [Python-ideas] pip enhancements: check for dependent packages before uninstalling and store installation date to allow listing by it In-Reply-To: <4972eb0e-612b-4766-a07c-9d109090bacb@googlegroups.com> References: <4972eb0e-612b-4766-a07c-9d109090bacb@googlegroups.com> Message-ID: <57F829AD.1080901@stoneleaf.us> On 10/07/2016 10:19 AM, Jo?o Matos wrote: > I believe it would be helpful if pip checked if there are any dependent > packages before uninstalling a package. Seems like a good idea, but you'll need to suggest it at distutils-sig at python.org as that is where pip is developed. -- ~Ethan~ From jcrmatos at gmail.com Fri Oct 7 19:05:37 2016 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=c3=a3o_Matos?=) Date: Sat, 8 Oct 2016 00:05:37 +0100 Subject: [Python-ideas] pip enhancements: check for dependent packages before uninstalling and store installation date to allow listing by it In-Reply-To: <57F829AD.1080901@stoneleaf.us> References: <4972eb0e-612b-4766-a07c-9d109090bacb@googlegroups.com> <57F829AD.1080901@stoneleaf.us> Message-ID: Hello, Ok, did that. Thanks, JM On 08-10-2016 00:03, Ethan Furman wrote: > On 10/07/2016 10:19 AM, Jo?o Matos wrote: > >> I believe it would be helpful if pip checked if there are any dependent >> packages before uninstalling a package. > > Seems like a good idea, but you'll need to suggest it at > > distutils-sig at python.org > > as that is where pip is developed. > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From victor.stinner at gmail.com Sat Oct 8 10:50:47 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 8 Oct 2016 16:50:47 +0200 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: <5c4c9b31-d693-f70e-95fa-fe0cfb19b7c3@gmail.com> References: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> <5c4c9b31-d693-f70e-95fa-fe0cfb19b7c3@gmail.com> Message-ID: It seems different. 
It looks like Tornado uses an alarm and SIGALRM, whereas asyncio only checks elapsed time and so is unable to interrupt a blocked function. Victor Le 7 oct. 2016 20:33, "Yury Selivanov" a ?crit : > > > On 2016-10-07 1:31 PM, Giampaolo Rodola' wrote: > >> On Fri, Oct 7, 2016 at 6:52 PM, Yury Selivanov >> wrote: >> >> On 2016-10-07 11:16 AM, Guido van Rossum wrote: >>> >>> Maybe a simpler approach would be to write a linter that checks for a >>> >>>> known list of common blocking functions, and anything that calls those >>>> automatically gets the same property? >>>> >>>> What if somebody uses logging module and logs to a file? I think this >>> is >>> something that linters can't infer (how logging is configured). >>> >>> One way to solve this would be to monkeypatch the io and os modules >>> (gevent does that, so it's possible) to issue a warning when it's used in >>> an asyncio context. This can be done as a module on PyPI. >>> >>> Another way would be to add some kind of IO tracing hooks to CPython. >>> >> >> How about something like this? >> http://www.tornadoweb.org/en/stable/ioloop.html#tornado.iolo >> op.IOLoop.set_blocking_signal_threshold >> >> > Yes, we already have a similar mechanism in asyncio -- > loop.slow_callback_duration property that is used in debug mode. The thing > it isn't really precise, as you can have a lot of relatively fast blocking > calls that harm performance, but complete faster than > slow_callback_duration. > > Yury > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From me at jeltef.nl Sat Oct 8 15:26:13 2016 From: me at jeltef.nl (Jelte Fennema) Date: Sat, 8 Oct 2016 21:26:13 +0200 Subject: [Python-ideas] PEP8 dictionary indenting addition Message-ID: I have an idea to improve indenting guidelines for dictionaries for better readability: If a value in a dictionary literal is placed on a new line, it should have (or at least be allowed to have) a n additional hanging indent. Below is an example: mydict = {'mykey': 'a very very very very very long value', 'secondkey': 'a short value', 'thirdkey': 'a very very very ' 'long value that continues on the next line', } As opposed to this IMHO much less readable version: mydict = {'mykey': 'a very very very very very long value', 'secondkey': 'a short value', 'thirdkey': 'a very very very ' 'long value that continues on the next line', } As you can see it is much harder in the second version to distinguish between keys and values. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gvanrossum at gmail.com Sat Oct 8 16:08:39 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat, 8 Oct 2016 13:08:39 -0700 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: Message-ID: Makes sense, maybe you can send a PR to the Python/peps repo? --Guido (mobile) On Oct 8, 2016 12:27 PM, "Jelte Fennema" wrote: > I have an idea to improve indenting guidelines for dictionaries for better > readability: If a value in a dictionary literal is placed on a new line, it > should have (or at least be allowed to have) a n additional hanging indent. 
> > Below is an example: > > mydict = {'mykey': > 'a very very very very very long value', > 'secondkey': 'a short value', > 'thirdkey': 'a very very very ' > 'long value that continues on the next line', > } > > > As opposed to this IMHO much less readable version: > > mydict = {'mykey': > 'a very very very very very long value', > 'secondkey': 'a short value', > 'thirdkey': 'a very very very ' > 'long value that continues on the next line', > } > > As you can see it is much harder in the second version to distinguish > between keys and values. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From me at jeltef.nl Sat Oct 8 16:23:38 2016 From: me at jeltef.nl (Jelte Fennema) Date: Sat, 8 Oct 2016 22:23:38 +0200 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: Message-ID: Alright, I'll make one when I have some time in the near future. On 8 Oct 2016 10:08 pm, "Guido van Rossum" wrote: > Makes sense, maybe you can send a PR to the Python/peps repo? > > --Guido (mobile) > > On Oct 8, 2016 12:27 PM, "Jelte Fennema" wrote: > >> I have an idea to improve indenting guidelines for dictionaries for >> better readability: If a value in a dictionary literal is placed on a new >> line, it should have (or at least be allowed to have) a n additional >> hanging indent. >> >> Below is an example: >> >> mydict = {'mykey': >> 'a very very very very very long value', >> 'secondkey': 'a short value', >> 'thirdkey': 'a very very very ' >> 'long value that continues on the next line', >> } >> >> >> As opposed to this IMHO much less readable version: >> >> mydict = {'mykey': >> 'a very very very very very long value', >> 'secondkey': 'a short value', >> 'thirdkey': 'a very very very ' >> 'long value that continues on the next line', >> } >> >> As you can see it is much harder in the second version to distinguish >> between keys and values. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gvanrossum at gmail.com Sat Oct 8 17:26:29 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat, 8 Oct 2016 14:26:29 -0700 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: Message-ID: Might also send something to pystylechecker at the same time. --Guido (mobile) On Oct 8, 2016 1:23 PM, "Jelte Fennema" wrote: > Alright, I'll make one when I have some time in the near future. > > On 8 Oct 2016 10:08 pm, "Guido van Rossum" wrote: > >> Makes sense, maybe you can send a PR to the Python/peps repo? >> >> --Guido (mobile) >> >> On Oct 8, 2016 12:27 PM, "Jelte Fennema" wrote: >> >>> I have an idea to improve indenting guidelines for dictionaries for >>> better readability: If a value in a dictionary literal is placed on a new >>> line, it should have (or at least be allowed to have) a n additional >>> hanging indent. 
>>> >>> Below is an example: >>> >>> mydict = {'mykey': >>> 'a very very very very very long value', >>> 'secondkey': 'a short value', >>> 'thirdkey': 'a very very very ' >>> 'long value that continues on the next line', >>> } >>> >>> >>> As opposed to this IMHO much less readable version: >>> >>> mydict = {'mykey': >>> 'a very very very very very long value', >>> 'secondkey': 'a short value', >>> 'thirdkey': 'a very very very ' >>> 'long value that continues on the next line', >>> } >>> >>> As you can see it is much harder in the second version to distinguish >>> between keys and values. >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gvanrossum at gmail.com Sat Oct 8 18:02:01 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat, 8 Oct 2016 15:02:01 -0700 Subject: [Python-ideas] pip enhancements: check for dependent packages before uninstalling and store installation date to allow listing by it In-Reply-To: References: <4972eb0e-612b-4766-a07c-9d109090bacb@googlegroups.com> <57F829AD.1080901@stoneleaf.us> Message-ID: Better, try the pip tracker at https://github.com/pypa/pip/issues --Guido (mobile) On Oct 7, 2016 4:06 PM, "Jo?o Matos" wrote: > Hello, > > Ok, did that. > > Thanks, > > JM > > > On 08-10-2016 00:03, Ethan Furman wrote: > >> On 10/07/2016 10:19 AM, Jo?o Matos wrote: >> >> I believe it would be helpful if pip checked if there are any dependent >>> packages before uninstalling a package. >>> >> >> Seems like a good idea, but you'll need to suggest it at >> >> distutils-sig at python.org >> >> as that is where pip is developed. >> >> -- >> ~Ethan~ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Oct 8 20:25:29 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 9 Oct 2016 11:25:29 +1100 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: Message-ID: <20161009002527.GM22471@ando.pearwood.info> On Sat, Oct 08, 2016 at 09:26:13PM +0200, Jelte Fennema wrote: > I have an idea to improve indenting guidelines for dictionaries for better > readability: If a value in a dictionary literal is placed on a new line, it > should have (or at least be allowed to have) a n additional hanging indent. 
> > Below is an example: > > mydict = {'mykey': > 'a very very very very very long value', > 'secondkey': 'a short value', > 'thirdkey': 'a very very very ' > 'long value that continues on the next line', > } Looks good to me, except that my personal preference for the implicit string concatenation (thirdkey) is to move the space to the following line, and (if possible) align the parts: mydict = {'mykey': 'a very very very very very long value', 'secondkey': 'a short value', 'thirdkey': 'a very very very' ' long value that continues on the next line', } (And also align the closing brace with the opening brace.) Really long lines like thirdkey are ugly no matter what you do, but I find that the leading space stands out more than the trailing space, and makes it more obvious that something out of the ordinary is going on. Very few string literals start with a leading space, so when I see one, I know to look more closely. In your example, I find that I don't even notice the trailing space unless I read the string very carefully. -- Steve From p.f.moore at gmail.com Sun Oct 9 07:43:03 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 9 Oct 2016 12:43:03 +0100 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: <20161009002527.GM22471@ando.pearwood.info> References: <20161009002527.GM22471@ando.pearwood.info> Message-ID: On 9 October 2016 at 01:25, Steven D'Aprano wrote: > On Sat, Oct 08, 2016 at 09:26:13PM +0200, Jelte Fennema wrote: >> I have an idea to improve indenting guidelines for dictionaries for better >> readability: If a value in a dictionary literal is placed on a new line, it >> should have (or at least be allowed to have) a n additional hanging indent. >> >> Below is an example: >> >> mydict = {'mykey': >> 'a very very very very very long value', >> 'secondkey': 'a short value', >> 'thirdkey': 'a very very very ' >> 'long value that continues on the next line', >> } > > Looks good to me, except that my personal preference for the implicit > string concatenation (thirdkey) is to move the space to the > following line, and (if possible) align the parts: The proposed approach looks good to me, but I'm a strong believe that when you get to situations this complex, the overriding rule should always be "use your judgement". For thirdkey, it's quite possible I'd advise splitting the value out into a named variable. I'd probably lay this out as # Less indent needed for keys, so thirdkey fits better in this case mydict = { 'mykey': 'a very very very very very long value', 'secondkey': 'a short value', 'thirdkey': 'a very very very long value that continues on the next line', } Or # Move the troublesome value out into a named variable val3 = 'a very very very long value that continues on the next line' mydict = { 'mykey': 'a very very very very very long value', 'secondkey': 'a short value', 'thirdkey': val3, } or if I *really* had to split val3, I might go for # Triple-quote/backslash lets you start at the left margin. # And I personally find space-backslash more noticeable than space-quote # Space on the second line is often semantically less consistent (depends on the value) val3 = '''\ a very very very long value that \ continues on the next line''' or even # Just give up on trying to make it a constant val3 = ' '.join([ "a very very very long value that", "continues on the next line" ]) There's also the option of simply giving up on the line length guideline for this one value, and putting the constant on one long line. 
Lots of possibilities, and which one I'd go for depends on context and the actual content of a (non-toy) example. Paul From lkb.teichmann at gmail.com Mon Oct 10 05:59:11 2016 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Mon, 10 Oct 2016 11:59:11 +0200 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: References: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> Message-ID: Hi, > Honestly before writing a lot of code here I'd like to hear more from > Martin about the spread of mistakes he's observed among his users. Over the weekend, I tried to classify the mistakes I found. Most of the time, it's something like "I'm just doing a quick lookup on the database, that shouldn't be a problem". For people coming from a threading background, these are indeed fast operations, they don't consider such calls as blocking. In the end, it all boils down to some read operation down in some non-asyncio code. This is why I got my idea to flag such calls. Unfortunately, I realized that it is nearly impossible to tell whether a read call is blocking or not. We would need to know whether the file descriptor we read from was created as non-blocking, or whether it was an actual file, and how fast the file storage is for this file (SSD: maybe fine, Network: too slow, magnetic disk: dunno). All of this is unfortunately not a Python issue, but an issue for the underlying operating system. So I guess I have to tell my users to program carefully and think about what they're reading from. No automatic detection of problems seems to be possible, at least not easily. Greetings Martin From rosuav at gmail.com Mon Oct 10 06:36:42 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 10 Oct 2016 21:36:42 +1100 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: References: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> Message-ID: On Mon, Oct 10, 2016 at 8:59 PM, Martin Teichmann wrote: > We would need to know whether the file descriptor we > read from was created as non-blocking, or whether it was an actual > file, and how fast the file storage is for this file (SSD: maybe fine, > Network: too slow, magnetic disk: dunno). All of this is unfortunately > not a Python issue, but an issue for the underlying operating system. Probably not worth trying to categorize those reads by source. However, one important feature would be: coming from cache, or actually waiting for content? With pipes and sockets, this is a very significant difference, and if you've done a peek() or select() to find that there is content there, a read() should be perfectly legal, even in an asyncio world. ChrisA From njs at pobox.com Mon Oct 10 11:32:19 2016 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 10 Oct 2016 08:32:19 -0700 Subject: [Python-ideas] Flagging blocking functions not to be used with asyncio In-Reply-To: References: <8d5342a4-6f5b-9545-11d4-25b3613631c1@gmail.com> Message-ID: On Mon, Oct 10, 2016 at 2:59 AM, Martin Teichmann wrote: > This is why I got my idea to flag such calls. Unfortunately, I > realized that it is nearly impossible to tell whether a read call is > blocking or not. We would need to know whether the file descriptor we > read from was created as non-blocking, or whether it was an actual > file, and how fast the file storage is for this file (SSD: maybe fine, > Network: too slow, magnetic disk: dunno). All of this is unfortunately > not a Python issue, but an issue for the underlying operating system. 
Yeah, it really doesn't help that a synchronous network query to a remote SSD-backed database can easily be lower latency than a synchronous local disk read to spinning media, yet the fact that we have async network APIs but no standard async disk APIs means that we would inevitably find ourselves warning about the former case while letting the latter one pass silently... -n -- Nathaniel J. Smith -- https://vorpus.org From vashek at gmail.com Mon Oct 10 20:43:36 2016 From: vashek at gmail.com (=?UTF-8?B?VsOhY2xhdiBEdm/FmcOhaw==?=) Date: Tue, 11 Oct 2016 02:43:36 +0200 Subject: [Python-ideas] suppressing exception context when it is not relevant Message-ID: I'm aware of "raise ... from None" (from PEP 415). However, how can I achieve that same effect (of suppressing the "During handling of the above exception, another exception occurred" message) without having control over the code that is executed from the except clause? I thought that sys.exc_clear() could be used for this, but that function doesn't exist in Python 3 anymore. Why would I want this? I have some simple caching code that looks like (simplified): try: value = cache_dict[key] except KeyError: value = some_api.get_the_value_via_web_service_call(key) cache_dict[key] = value When there's an exception in the API call, the output will be something like this: Traceback (most recent call last): File ..., line ..., in ... KeyError: '...' During handling of the above exception, another exception occurred: Traceback (most recent call last): File ..., line ..., in ... some_api.TheInterestingException: ... But I find this misleading, as the original KeyError is not really an error at all. I could of course avoid the situation by changing the try/except (EAFP) into a test for the key's presence (LBYL) but that's not very Pythonic and less thread-friendly (not that the above is thread-safe as is, but that's beside the point). Also, yes, I could instead subclass dict and implement __missing__, but that's only a solution for this particular case. The problem (if you agree it's a problem) occurs any time an exception is not actually an error, but rather a condition that just happens to be indicated by an exception. It's unreasonable to expect all code in some_api to change their raise X to raise X from None (and it wouldn't even make sense in all cases). Is there a clean solution to avoid the unwanted exception chain in the error message? If not, would it make sense to re-introduce sys.exc_clear() for this purpose? (I originally asked about this here: http://stackoverflow. com/questions/30235516/how-to-suppress-displaying-the- parent-exception-the-cause-for-subsequent-excep but find the answer unappealing.) Vashek -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Oct 10 20:53:42 2016 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 11 Oct 2016 01:53:42 +0100 Subject: [Python-ideas] suppressing exception context when it is not relevant In-Reply-To: References: Message-ID: On 2016-10-11 01:43, V?clav Dvo??k wrote: > I'm aware of "raise ... from None" (from PEP 415). However, how can I > achieve that same effect (of suppressing the "During handling of the > above exception, another exception occurred" message) without having > control over the code that is executed from the except clause? I thought > that sys.exc_clear() could be used for this, but that function doesn't > exist in Python 3 anymore. > > Why would I want this? 
I have some simple caching code that looks like > (simplified): > > try: > value = cache_dict[key] > except KeyError: > value = some_api.get_the_value_via_web_service_call(key) > cache_dict[key] = value > > > When there's an exception in the API call, the output will be something > like this: > > Traceback (most recent call last): > File ..., line ..., in ... > KeyError: '...' > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File ..., line ..., in ... > some_api.TheInterestingException: ... > > > But I find this misleading, as the original KeyError is not really an > error at all. I could of course avoid the situation by changing the > try/except (EAFP) into a test for the key's presence (LBYL) but that's > not very Pythonic and less thread-friendly (not that the above is > thread-safe as is, but that's beside the point). Also, yes, I could > instead subclass dict and implement __missing__, but that's only a > solution for this particular case. The problem (if you agree it's a > problem) occurs any time an exception is not actually an error, but > rather a condition that just happens to be indicated by an exception. > > It's unreasonable to expect all code in some_api to change their raise > X to raise X from None (and it wouldn't even make sense in all cases). > Is there a clean solution to avoid the unwanted exception chain in the > error message? > > If not, would it make sense to re-introduce sys.exc_clear() for this > purpose? > > (I originally asked about this > here: http://stackoverflow.com/questions/30235516/how-to-suppress-displaying-the-parent-exception-the-cause-for-subsequent-excep > but > find the answer unappealing.) > You could use a sentinel instead: MISSING = object() value = cache_dict.get(key, MISSING) if value is MISSING: value = some_api.get_the_value_via_web_service_call(key) cache_dict[key] = value From python at lucidity.plus.com Mon Oct 10 21:00:40 2016 From: python at lucidity.plus.com (Erik) Date: Tue, 11 Oct 2016 02:00:40 +0100 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: <20161009002527.GM22471@ando.pearwood.info> Message-ID: <8ba0f2cc-15a1-3baf-3274-a67c646769aa@lucidity.plus.com> On 09/10/16 12:43, Paul Moore wrote: > I'd probably lay this out as > > # Less indent needed for keys, so thirdkey fits better in this case > mydict = { > 'mykey': 'a very very very very very long value', > 'secondkey': 'a short value', > 'thirdkey': > 'a very very very long value that continues on the next line', > } +1 from me on this general style of layout. Why associate the indentation level with the name of the identifier being bound? Treat the opening parenthesis as beginning a "suite" of indented key/value pairs in the same way as a colon introduces an indented suite of statements in other constructs. It may not be part of the formal syntax, but it's consistent with other constructs in the language that _are_ defined by the formal syntax. E. 
From tim.peters at gmail.com Mon Oct 10 23:02:12 2016 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 10 Oct 2016 22:02:12 -0500 Subject: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: [please restrict follow-ups to python-ideas] Let's not get hung up on meta-discussion here - I always thought "massive clusterf**k" was a precise technical term anyway ;-) While timing certainly needs to be done more carefully, it's obvious to me that this approach _should_ pay off significantly when it applies. Comparisons are extraordinarily expensive in Python, precisely because of the maze of test-and-branch code it requires just to figure out which bottom-level comparison function to invoke each time. That's why I spent months of my life (overall) devising a sequence of sorting algorithms for Python that reduced the number of comparisons needed. Note that when Python's current sort was adopted in Java, they still kept a quicksort variant for "unboxed" builtin types. The adaptive merge sort incurs many overheads that often cost more than they save unless comparisons are in fact very expensive compared to the cost of pointer copying (and in Java comparison of unboxed types is cheap). Indeed, for native numeric types, where comparison is dirt cheap, quicksort generally runs faster than mergesort despite that the former does _more_ comparisons (because mergesort does so much more pointer-copying). I had considered something "like this" for Python 2, but didn't pursue it because comparison was defined between virtually any two types (34 < [1], etc), and people were careless about that (both by design and by accident). In Python 3, comparison "blows up" for absurdly mixed types, so specializing for homogeneously-typed lists is a more promising idea on the face of it. The comparisons needed to determine _whether_ a list's objects have a common type is just len(list)-1 C-level pointer comparisons, and so goes fast. So I expect that, when it applies, this would speed even sorting an already-ordered list with at least 2 elements. For a mixed-type list with at least 2 elements, it will always be pure loss. But (a) I expect such lists are uncommon (and especially uncommon in Python 3); and (b) a one-time scan doing C-level pointer comparisons until finding a mismatched type is bound to be a relatively tiny cost compared to the expense of all the "rich comparisons" that follow. So +1 from me on pursuing this. Elliot, please: - Keep this on python-ideas. python-dev is for current issues in Python development, not for speculating about changes. - Open an issue on the tracker: https://bugs.python.org/ - At least browse the info for developers: https://docs.python.org/devguide/ - Don't overlook Lib/test/sortperf.py. As is, it should be a good test of what your approach so far _doesn't_ help, since it sorts only lists of floats (& I don't think you're special-casing them). If the timing results it reports aren't significantly hurt (and I expect they won't be), then add specialization for floats too and gloat about the speedup :-) - I expect tuples will also be worth specializing (complex sort keys are often implemented as tuples). Nice start! :-) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From elliot.gorokhovsky at gmail.com Mon Oct 10 23:29:28 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 11 Oct 2016 03:29:28 +0000 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: Thanks for looking at this! That's why I spent months of my life (overall) devising a sequence of sorting algorithms for Python that reduced the number of comparisons needed. Yes, that's why I think this is so cool: for a couple dozen lines of code, we can get (at least for some cases, according to my questionable benchmarks) the kinds of massive improvements you had to use actual computer science to achieve (as opposed to mere hackery). Note that when Python's current sort was adopted in Java, they still kept a quicksort variant for "unboxed" builtin types. The adaptive merge sort incurs many overheads that often cost more than they save unless comparisons are in fact very expensive compared to the cost of pointer copying (and in Java comparison of unboxed types is cheap). Indeed, for native numeric types, where comparison is dirt cheap, quicksort generally runs faster than mergesort despite that the former does _more_ comparisons (because mergesort does so much more pointer-copying). Ya, I think this may be a good approach for floats: if the list is all floats, just copy all the floats into a seperate array, use the standard library quicksort, and then construct a sorted PyObject* array. Like maybe set up a struct { PyObject* payload, float key } type of deal. This wouldn't work for strings (unicode is scary), and probably not for ints (one would have to check that all the ints are within C long bounds). Though on the other hand perhaps this would be too expensive? I had considered something "like this" for Python 2, but didn't pursue it because comparison was defined between virtually any two types (34 < [1], etc), and people were careless about that (both by design and by accident). In Python 3, comparison "blows up" for absurdly mixed types, so specializing for homogeneously-typed lists is a more promising idea on the face of it. The comparisons needed to determine _whether_ a list's objects have a common type is just len(list)-1 C-level pointer comparisons, and so goes fast. So I expect that, when it applies, this would speed even sorting an already-ordered list with at least 2 elements. That's what my crude benchmarks indicate... when I applied my sort to a list of 1e7 ints with a float tacked on the end, my sort actually ended up being a bit faster over several trials (which I attribute to PyObject_RichCompare == Py_True being faster than PyObject_RichCompareBool == 1, apologies for any typos in that code). For a mixed-type list with at least 2 elements, it will always be pure loss. But (a) I expect such lists are uncommon (and especially uncommon in Python 3); and (b) a one-time scan doing C-level pointer comparisons until finding a mismatched type is bound to be a relatively tiny cost compared to the expense of all the "rich comparisons" that follow. So +1 from me on pursuing this. Elliot, please: - Keep this on python-ideas. python-dev is for current issues in Python development, not for speculating about changes. 
- Open an issue on the tracker: https://bugs.python.org/ OK - At least browse the info for developers: https://docs.python.org/devguide/ Ya, I'm working on setting this up as a patch in the hg repo as opposed to an extension module to make benchmarking cleaner/more sane. - Don't overlook Lib/test/sortperf.py. As is, it should be a good test of what your approach so far _doesn't_ help, since it sorts only lists of floats (& I don't think you're special-casing them). If the timing results it reports aren't significantly hurt (and I expect they won't be), then add specialization for floats too and gloat about the speedup :-) Ya, I mean they aren't special-cased, but homogenous lists of floats still fit in the tp->rich_compare case, which still bypasses the expensive PyObject_RichCompare. I'll guess I'll see when I implement this as a patch and can run it on sortperf.py. - I expect tuples will also be worth specializing (complex sort keys are often implemented as tuples). I'm not sure what you mean here... I'm looking at the types of lo.keys, not of saved_ob_item (I think I said that earlier in this thread by mistake actually). So if someone is passing tuples and using itemgetter to extract ints or strings or whatever, the current code will work fine; lo.keys will be scalar types. Unless I misunderstand you here. I mean, when would lo.keys actually be tuples? Nice start! :-) Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Oct 10 23:31:25 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 11 Oct 2016 14:31:25 +1100 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: <22524.23684.863380.593596@turnbull.sk.tsukuba.ac.jp> References: <22524.23684.863380.593596@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Oct 11, 2016 at 2:29 PM, Stephen J. Turnbull wrote: > Chris Angelico writes: > > > Given that it's not changing semantics at all, just adding info/hints > > to an error message, it could well be added in a point release. > > But it does change semantics, specifically for doctests. Blah, forgot about doctests. Guess that's off the cards for a point release, then, but still, shouldn't be a big deal for 3.7. ChrisA From rosuav at gmail.com Mon Oct 10 23:37:51 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 11 Oct 2016 14:37:51 +1100 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On Tue, Oct 11, 2016 at 2:29 PM, Elliot Gorokhovsky wrote: > Ya, I think this may be a good approach for floats: if the list is all > floats, just copy all the floats into a seperate array, use the standard > library quicksort, and then construct a sorted PyObject* array. Like maybe > set up a struct { PyObject* payload, float key } type of deal. Not quite sure what you mean here. What is payload, what is key? Are you implying that the original float objects could be destroyed and replaced with others of equal value? Python (unlike insurance claims) guarantees that you get back the exact same object as you started with. 
ChrisA From elliot.gorokhovsky at gmail.com Mon Oct 10 23:41:35 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 11 Oct 2016 03:41:35 +0000 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: Oh no, the idea here is just you would copy over the floats associated with the PyObject* and keep them in an array of such structs, so that we know which PyObject* are associated with which floats. Then after the standard library quicksort sorts them you would copy the PyObject* into the list. So you sort the PyObject* keyed by the floats. Anyway, I think the copying back and forth would probably be too expensive, it's just an idea. Also, I apologize for the formatting of my last email, I didn't realize Inbox would mess up the quoting like that. I'll ensure I use plain-text quotes from now on. On Mon, Oct 10, 2016 at 9:38 PM Chris Angelico wrote: > On Tue, Oct 11, 2016 at 2:29 PM, Elliot Gorokhovsky > wrote: > > Ya, I think this may be a good approach for floats: if the list is all > > floats, just copy all the floats into a seperate array, use the standard > > library quicksort, and then construct a sorted PyObject* array. Like > maybe > > set up a struct { PyObject* payload, float key } type of deal. > > Not quite sure what you mean here. What is payload, what is key? Are > you implying that the original float objects could be destroyed and > replaced with others of equal value? Python (unlike insurance claims) > guarantees that you get back the exact same object as you started > with. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Oct 10 23:48:52 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 11 Oct 2016 14:48:52 +1100 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On Tue, Oct 11, 2016 at 2:41 PM, Elliot Gorokhovsky wrote: > Oh no, the idea here is just you would copy over the floats associated with > the PyObject* and keep them in an array of such structs, so that we know > which PyObject* are associated with which floats. Then after the standard > library quicksort sorts them you would copy the PyObject* into the list. So > you sort the PyObject* keyed by the floats. Anyway, I think the copying back > and forth would probably be too expensive, it's just an idea. It also wouldn't work if you have more than one object with the same value. >>> x = 1.0 >>> y = 2.0/2 >>> x is y False >>> l = [x, y, x] >>> l.sort() >>> assert l[0] is x >>> assert l[1] is y >>> assert l[2] is x Python's sort is stable, so the three elements of the list (being all equal) must remain in the same order. ChrisA From elliot.gorokhovsky at gmail.com Mon Oct 10 23:51:55 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 11 Oct 2016 03:51:55 +0000 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: It would still be stable. 
You would copy over {{x,1.0},{y,1.0},{x,1.0}}, and as long as a stable sort is used you would get out the same array, using the cmp function left->key < right->key. Then you would go in order, copying back [x,y,x]. On Mon, Oct 10, 2016 at 9:49 PM Chris Angelico wrote: > On Tue, Oct 11, 2016 at 2:41 PM, Elliot Gorokhovsky > wrote: > > Oh no, the idea here is just you would copy over the floats associated > with > > the PyObject* and keep them in an array of such structs, so that we know > > which PyObject* are associated with which floats. Then after the standard > > library quicksort sorts them you would copy the PyObject* into the list. > So > > you sort the PyObject* keyed by the floats. Anyway, I think the copying > back > > and forth would probably be too expensive, it's just an idea. > > It also wouldn't work if you have more than one object with the same value. > > >>> x = 1.0 > >>> y = 2.0/2 > >>> x is y > False > >>> l = [x, y, x] > >>> l.sort() > >>> assert l[0] is x > >>> assert l[1] is y > >>> assert l[2] is x > > Python's sort is stable, so the three elements of the list (being all > equal) must remain in the same order. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcgoble3 at gmail.com Mon Oct 10 23:52:48 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Tue, 11 Oct 2016 03:52:48 +0000 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On Mon, Oct 10, 2016 at 11:30 PM Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > - I expect tuples will also be worth specializing (complex sort keys are > often implemented as tuples). > > I'm not sure what you mean here... I'm looking at the types of lo.keys, > not of saved_ob_item (I think I said that earlier in this thread by mistake > actually). So if someone is passing tuples and using itemgetter to extract > ints or strings or whatever, the current code will work fine; lo.keys will > be scalar types. Unless I misunderstand you here. I mean, when would > lo.keys actually be tuples? > If someone wanted to sort, e.g., a table (likely a list of tuples) by multiple columns at once, they might pass the key function as `itemgetter(3, 4, 5)`, meaning to sort by "column" (actually item) 3, then columns 4 and then 5 as tiebreakers. This itemgetter will return a new tuple of three items, that tuple being the key to sort by. Since tuples sort by the first different item, in this theoretical example the result of sort() will be exactly what the user wanted: a table sorted by three columns at once. A practical example of such a use case is sorting by last name first and then by first name where two people have the same last name. Assuming a list of dicts in this case, the key function passed to sort() would simply be `itemgetter('lastname", "firstname")`, which returns a tuple of two items to use as the key. So yes, there are perfectly valid use cases for tuples as keys. -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 10 23:29:08 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Tue, 11 Oct 2016 12:29:08 +0900 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: <22524.23684.863380.593596@turnbull.sk.tsukuba.ac.jp> Chris Angelico writes: > Given that it's not changing semantics at all, just adding info/hints > to an error message, it could well be added in a point release. But it does change semantics, specifically for doctests. I seem to recall that that is considered a blocker for this kind of change in a maintenance-only branch. In the end that's probably up to the RM, but I would be mildly against it. FWIW YMMV of course. From guido at python.org Tue Oct 11 00:15:20 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Oct 2016 21:15:20 -0700 Subject: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On Mon, Oct 10, 2016 at 7:56 PM, Elliot Gorokhovsky wrote: > So here's a simple attempt at taking lots of measurements just using > time.time() with lists of ints. The results are great, if they are valid > (which I leave to you to judge); even for lists with just one element, it's > 16% faster! But that's suspicious in itself -- since no comparisons are needed to sort a 1-element list, if it's still faster, there must be something else you're doing (or not doing) that's affecting the time measured. I wonder if it's the method lookup that's is slower than the entire call duration? That explains why s[:1] == 'x' is faster than s.startswith('x'), for example. A simple nit on your test code: calling time() twice per iteration could also affect things. I would just call time() once before and once after the innermost for-loops. (IIRC timeit tries to compensate for the cost of the loop itself by measuring an empty loop, but that's got its own set of problems.) Anyway, you should ignore me and listen to Tim, so I'll shut up now. -- --Guido van Rossum (python.org/~guido) From elliot.gorokhovsky at gmail.com Tue Oct 11 00:22:04 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 11 Oct 2016 04:22:04 +0000 Subject: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On Mon, Oct 10, 2016 at 10:15 PM Guido van Rossum wrote: > But that's suspicious in itself -- since no comparisons are needed to > > sort a 1-element list, if it's still faster, there must be something > > else you're doing (or not doing) that's affecting the time measured. > Oh, ya. Duh. So that's weird... I would very much to figure out what causes that, actually. I don't think method calling has anything to do with it, since I'm subclassing list (could be wrong though). Perhaps it has to do with the fact that my sort method is compiled on my laptop while my python is a distributed binary? I will be able to rule that out when I implement this as a patch instead of an extension module and test my own build. Anyway, thanks for looking at all this, I will post on the bug tracker and on here once I have something more mature; this feedback has been very useful. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Oct 11 00:49:28 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 11 Oct 2016 14:49:28 +1000 Subject: [Python-ideas] suppressing exception context when it is not relevant In-Reply-To: References: Message-ID: On 11 October 2016 at 10:43, V?clav Dvo??k wrote: > But I find this misleading, as the original KeyError is not really an error > at all. I could of course avoid the situation by changing the try/except > (EAFP) into a test for the key's presence (LBYL) but that's not very > Pythonic and less thread-friendly (not that the above is thread-safe as is, > but that's beside the point). Also, yes, I could instead subclass dict and > implement __missing__, but that's only a solution for this particular case. > The problem (if you agree it's a problem) occurs any time an exception is > not actually an error, but rather a condition that just happens to be > indicated by an exception. > > It's unreasonable to expect all code in some_api to change their raise X to > raise X from None (and it wouldn't even make sense in all cases). Is there a > clean solution to avoid the unwanted exception chain in the error message? Yes, you can restructure the code so you're not doing further work in the exception handler, and instead do the work after the try/except block finishes and the exception context is cleared automatically: value = MISSING = object() try: value = cache_dict[key] except KeyError: pass if value is MISSING: value = some_api.get_the_value_via_web_service_call(key) cache_dict[key] = value (This is the general case of MRAB's answer, as the try/except KeyError/pass pattern above is what dict.get() implements) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Tue Oct 11 01:20:56 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 11 Oct 2016 18:20:56 +1300 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: <57FC76B8.6030103@canterbury.ac.nz> Elliot Gorokhovsky wrote: > if the list is all > floats, just copy all the floats into a seperate array, use the standard > library quicksort, and then construct a sorted PyObject* array. My question would be whether sorting list of just floats (or where the keys are just floats) is common enough to be worth doing this. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 11 01:31:05 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 11 Oct 2016 18:31:05 +1300 Subject: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: <57FC7919.9020503@canterbury.ac.nz> Elliot Gorokhovsky wrote: > I will be able to rule that out > when I implement this as a patch instead of an extension module and test > my own build. You could test it against a locally built Python without having to go that far. -- Greg From srkunze at mail.de Tue Oct 11 03:30:06 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 11 Oct 2016 09:30:06 +0200 Subject: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: <3497414f-c7d6-65a0-7606-b197ac824059@mail.de> On 11.10.2016 05:02, Tim Peters wrote: > Let's not get hung up on meta-discussion here - I always thought > "massive clusterf**k" was a precise technical term anyway ;-) I thought so as well. 
;) http://www.urbandictionary.com/define.php?term=clusterfuck Cheers, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From tarek at ziade.org Tue Oct 11 08:09:49 2016 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek=20Ziad=E9?=) Date: Tue, 11 Oct 2016 14:09:49 +0200 Subject: [Python-ideas] Adding full() to collections.deque Message-ID: <1476187789.4014609.752298193.7F77D50F@webmail.messagingengine.com> Hey, When creating deque instances using a value for maxlen, it would be nice to have a .full() method like what Queue provides, so one may do: my_deque = deque(maxlen=300) if my_deque.full(): do_something() instead of doing: if len(my_deque) == my_deque.maxlen: do_something() If people think it's a good idea, I can add a ticket in the tracker and try to provide a patch for the collections module maintainer. If this was already talked about, or is a bad idea, sorry! :) Cheers Tarek -- Tarek Ziad? | coding: https://ziade.org | running: https://foule.es | twitter: @tarek_ziade From srkunze at mail.de Tue Oct 11 08:41:34 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 11 Oct 2016 14:41:34 +0200 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles Message-ID: Hey python-ideas, on django-developers, an intriguing idea appeared: https://groups.google.com/d/msg/django-developers/4bntzg1HwwY/HHHjbDnLBQAJ """ It seems to me that the default `method.__bool__` is undesirable in Jinja2 templates. I do not know Jinja2 well enough, but maybe they could benefit from a patch where `if`-statements give a warning/error when the expression is a callable (with the default `FunctionType.__bool__`? This would solve the issue not just for the methods you mention, but more in general. [Or maybe Python itself should have that warning/error?] """ Background: Django implemented form.is_valid as a function. During development, people fall into the trap of believing it's a property or boolean attribute. That's usually not big deal but can take substantial amount of time when writing non trivial code among which reside following innocuous-looking lines: if obj.has_special_property: # will always # be executed What do you think about that Python emitting an warning/error as described above? Cheers, Sven From p.f.moore at gmail.com Tue Oct 11 09:02:59 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 11 Oct 2016 14:02:59 +0100 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: References: Message-ID: On 11 October 2016 at 13:41, Sven R. Kunze wrote: > maybe they could benefit > from a patch where `if`-statements give a warning/error when the expression > is a callable (with the default `FunctionType.__bool__`? [...] > What do you think about that Python emitting an warning/error as described > above? Interesting idea. There may be some issues - consider an object that may optionally have a handler method handle_event, and you want to call that method if it exists: handler = getattr(obj, 'handle_event', None) if handler: # prepare arguments handler(args) That could would break (technically, it would produce an incorrect warning) with this change. I do think that the scenario you described is a valid one - and there's no obvious "better name". The stdlib module pathlib uses the same pattern "my_path.is_absolute()", and IIRC I've made the mistake you described (although I don't recall any major trauma, so the problem was probably fixed relatively quickly). 
I'm not sure: Pros: - Catches an annoying and potentially hard to spot bug Cons: - Would trigger on certain reasonable coding patterns that aren't an error - IMO, "false positives" in warnings are very annoying, particularly in Python where they are runtime rather than compile-time, and so affect the end user (if they aren't fixed in development) Paul From mar77i at mar77i.ch Tue Oct 11 08:42:54 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Tue, 11 Oct 2016 14:42:54 +0200 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: Message-ID: Hello list I love the "new" unpacking generalisations as of pep448. And I found myself using them rather regularly, both with lists and dict. Today I somehow expected that [*foo for foo in bar] was equivalent to itertools.chain(*[foo for foo in bar]), which it turned out to be a SyntaxError. The dict equivalent of the above might then be something along the lines of {**v for v in dict_of_dicts.values()}. In case the values (which be all dicts) are records with the same keys, one might go and prepend the keys with their former keys using { **dict( ("{}_{}".format(k, k_sub), v_sub) for k_sub, v_sub in v.items() ) for k, v in dict_of_dicts.items() } Was anyone able to follow me through this? cheers! mar77i From ericsnowcurrently at gmail.com Tue Oct 11 09:38:23 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 11 Oct 2016 07:38:23 -0600 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: References: Message-ID: On Tue, Oct 11, 2016 at 7:02 AM, Paul Moore wrote: > Interesting idea. There may be some issues - consider an object that > may optionally have a handler method handle_event, and you want to > call that method if it exists: > > handler = getattr(obj, 'handle_event', None) > if handler: > # prepare arguments > handler(args) This isn't a problem if you test for None: "if handler is not None". I was going to concede that there are probably other cases where that doesn't help, but can't think of any. In what case would a callable also evaluate to False, or you have a collection of arbitrary mixed values and you are testing their truthiness? So perhaps always using a None test is the correct answer if we add the warning. That said, I have no illusions that I know all possible use cases... :) > > That could would break (technically, it would produce an incorrect > warning) with this change. > > I do think that the scenario you described is a valid one - and > there's no obvious "better name". The stdlib module pathlib uses the > same pattern "my_path.is_absolute()", and IIRC I've made the mistake > you described (although I don't recall any major trauma, so the > problem was probably fixed relatively quickly). FWIW, I do the same thing from time to time and I usually only catch it when writing unit tests. I'm +1 to a warning as proposed. -eric From elazarg at gmail.com Tue Oct 11 09:50:29 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Tue, 11 Oct 2016 13:50:29 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: Message-ID: I thought about it a lot recently. Specifically on your proposal, and in general. Unpacking expression can have a much more uniform treatment in the language, as an expression with special "bare tuple" type - like tuple, but "without the braces". 
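For concreteness, here is a quick sketch of how the existing pieces behave and what the proposed comprehension form would mean (the sample data below is made up; this is only an illustration, not a proposal of its own):

    from itertools import chain

    bar = [[1, 2], [3, 4], [5]]

    # PEP 448 already allows unpacking inside displays and calls:
    combined = [0, *bar[0], *bar[1]]        # [0, 1, 2, 3, 4]

    # ...but [*foo for foo in bar] is currently a SyntaxError.  The
    # flattening it is expected to mean is spelled today as either:
    flat = list(chain.from_iterable(bar))   # [1, 2, 3, 4, 5]
    flat2 = [x for foo in bar for x in foo] # same result

    # The dict analogue of {**v for v in dict_of_dicts.values()}:
    dict_of_dicts = {"a": {"x": 1}, "b": {"y": 2}}
    merged = {}
    for v in dict_of_dicts.values():
        merged.update(v)                    # {'x': 1, 'y': 2}
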
It also gives mental explanation for the conditional expression, where "a if cond" is an unpack expression whose value is "*[a]" if cond hold, and "*[]" otherwise. without context, this is an error. But with an else, "a if cond else b" becomes "*[] else b" which evaluates to b. The result is exactly like today, but gives the ability to build conditional elements in a list literal: x = [foo(), bar() if cond, goo()] y = [1, bar()?, 3] x is a list of 2 elements or three elements, depending of the truthness of cond. y is a list of 2 elements or three elements, depending on whether bar() is None. It also opens the gate for None-coercion operator (discussed recently), where "x?" is replaced with "*[x if x is None]". If operations on this expression are mapped into the elements, "x?.foo" becomes "*[x.foo if x is None]" which is "x.foo" if x is not None, and "*[]" otherwise. It is similar to except-expression, but without actual explicit exception handling, and thus much more readable. Elazar On Tue, Oct 11, 2016 at 4:08 PM Martti K?hne wrote: > Hello list > > I love the "new" unpacking generalisations as of pep448. And I found > myself using them rather regularly, both with lists and dict. > Today I somehow expected that [*foo for foo in bar] was equivalent to > itertools.chain(*[foo for foo in bar]), which it turned out to be a > SyntaxError. > The dict equivalent of the above might then be something along the > lines of {**v for v in dict_of_dicts.values()}. In case the values > (which be all dicts) are records with the same keys, one might go and > prepend the keys with their former keys using > { > **dict( > ("{}_{}".format(k, k_sub), v_sub) > for k_sub, v_sub in v.items() > ) for k, v in dict_of_dicts.items() > } > Was anyone able to follow me through this? > > cheers! > mar77i > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Oct 11 09:59:47 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 11 Oct 2016 14:59:47 +0100 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: References: Message-ID: On 11 October 2016 at 14:38, Eric Snow wrote: > On Tue, Oct 11, 2016 at 7:02 AM, Paul Moore wrote: >> Interesting idea. There may be some issues - consider an object that >> may optionally have a handler method handle_event, and you want to >> call that method if it exists: >> >> handler = getattr(obj, 'handle_event', None) >> if handler: >> # prepare arguments >> handler(args) > > This isn't a problem if you test for None: "if handler is not None". > I was going to concede that there are probably other cases where that > doesn't help, but can't think of any. In what case would a callable > also evaluate to False, or you have a collection of arbitrary mixed > values and you are testing their truthiness? So perhaps always using > a None test is the correct answer if we add the warning. That said, I > have no illusions that I know all possible use cases... :) Certainly. But the whole point of the warning is to catch people who do the wrong thing. All I'm saying is that the new warning would cause problems for people who omit a (currently unnecessary) "is None" check in the process of protecting people who forget whether an attribute is a method or a property. 
Which group do we prefer to help? As I said, I find false positives with warnings particularly annoying, so personally I may find it less acceptable to ask people with working code to tighten it up than others do. It's a trade-off (and although "warns on correct code" is very close to a backward compatibility break[1], I *do* think it's worth considering here - it's just that my personal feelings are mixed). Paul [1] Technically it *is* a break. But I'm willing to give a little wiggle room to the argument that "it's only a warning". From steve at pearwood.info Tue Oct 11 10:00:20 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Oct 2016 01:00:20 +1100 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: References: Message-ID: <20161011140019.GQ22471@ando.pearwood.info> On Tue, Oct 11, 2016 at 02:41:34PM +0200, Sven R. Kunze wrote: > Hey python-ideas, > > on django-developers, an intriguing idea appeared: > https://groups.google.com/d/msg/django-developers/4bntzg1HwwY/HHHjbDnLBQAJ > > """ > It seems to me that the default `method.__bool__` is undesirable in > Jinja2 templates. I do not know Jinja2 well enough, but maybe they could > benefit from a patch where `if`-statements give a warning/error when the > expression is a callable (with the default `FunctionType.__bool__`? > This would solve the issue not just for the methods you mention, but > more in general. That should be easy enough to do as a custom descriptor. But I would not like to see the default function or method __bool__ raise a warning. Consider processing a sequence of functions/methods, skipping any which are None: for func in callables: if func is not None: func(some_arg) I often written code like that. Now imagine that somebody reasons that since all functions and methods are truthy, and None if falsey, we can write the code as: for func in callables: if func: func(some_arg) That's perfectly reasonable code too, and it should be purely a matter of taste whether you prefer that or the first version. But with this suggestion, we get flooded by spurious warnings. So I think this is something that Django/Jinja2 should implement for its own methods that need it, it should not be a general feature of all Python functions/methods. -- Steve From ethan at stoneleaf.us Tue Oct 11 11:34:22 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 11 Oct 2016 08:34:22 -0700 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: <20161011140019.GQ22471@ando.pearwood.info> References: <20161011140019.GQ22471@ando.pearwood.info> Message-ID: <57FD067E.5000101@stoneleaf.us> On 10/11/2016 07:00 AM, Steven D'Aprano wrote: > On Tue, Oct 11, 2016 at 02:41:34PM +0200, Sven R. Kunze wrote: >> on django-developers, an intriguing idea appeared: >> https://groups.google.com/d/msg/django-developers/4bntzg1HwwY/HHHjbDnLBQAJ >> >> """ >> It seems to me that the default `method.__bool__` is undesirable in >> Jinja2 templates. I do not know Jinja2 well enough, but maybe they could >> benefit from a patch where `if`-statements give a warning/error when the >> expression is a callable (with the default `FunctionType.__bool__`? >> This would solve the issue not just for the methods you mention, but >> more in general. > > That should be easy enough to do as a custom descriptor. > > But I would not like to see the default function or method __bool__ > raise a warning. [...] 
> So I think this is something that Django/Jinja2 should implement for its > own methods that need it, it should not be a general feature of all > Python functions/methods. Agreed. Python is a /general/-purpose programming language. We should not make changes to help one subset of users when those changes will harm another subset (and being flooded with false positives is harmful) -- particularly when easy customization is already available. -- ~Ethan~ From erik.m.bray at gmail.com Tue Oct 11 11:40:03 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Tue, 11 Oct 2016 17:40:03 +0200 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: <20161009002527.GM22471@ando.pearwood.info> References: <20161009002527.GM22471@ando.pearwood.info> Message-ID: On Sun, Oct 9, 2016 at 2:25 AM, Steven D'Aprano wrote: > On Sat, Oct 08, 2016 at 09:26:13PM +0200, Jelte Fennema wrote: >> I have an idea to improve indenting guidelines for dictionaries for better >> readability: If a value in a dictionary literal is placed on a new line, it >> should have (or at least be allowed to have) a n additional hanging indent. >> >> Below is an example: >> >> mydict = {'mykey': >> 'a very very very very very long value', >> 'secondkey': 'a short value', >> 'thirdkey': 'a very very very ' >> 'long value that continues on the next line', >> } > > Looks good to me, except that my personal preference for the implicit > string concatenation (thirdkey) is to move the space to the > following line, and (if possible) align the parts: > mydict = {'mykey': > 'a very very very very very long value', > 'secondkey': 'a short value', > 'thirdkey': 'a very very very' > ' long value that continues on the next line', > } Heh--not to bikeshed, but my personal preference is to leave the trailing space on the first line. This is because by the time I've started a new line (and possibly have spent time fussing with indentation for the odd cases that my editor doesn't get quite right) I'll have forgotten that I need to start the line with a space :) Best, Erik From guido at python.org Tue Oct 11 11:44:17 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 11 Oct 2016 08:44:17 -0700 Subject: [Python-ideas] Adding full() to collections.deque In-Reply-To: <1476187789.4014609.752298193.7F77D50F@webmail.messagingengine.com> References: <1476187789.4014609.752298193.7F77D50F@webmail.messagingengine.com> Message-ID: Isn't the problem that you don't know if it's still full on the next line? After all you supposedly have a multi-threaded app here, otherwise why bother with any of that? Or maybe you can describe the real-world use case where you wanted this in more detail? Without much more evidence I can't support such a change. On Tue, Oct 11, 2016 at 5:09 AM, Tarek Ziad? wrote: > > Hey, > > When creating deque instances using a value for maxlen, it would be nice > to have a .full() method like what Queue provides, so one may do: > > my_deque = deque(maxlen=300) > > if my_deque.full(): > do_something() > > instead of doing: > > if len(my_deque) == my_deque.maxlen: > do_something() > > > If people think it's a good idea, I can add a ticket in the tracker and > try to provide a patch for the collections module maintainer. > If this was already talked about, or is a bad idea, sorry! :) > > Cheers > Tarek > -- > > Tarek Ziad? 
| coding: https://ziade.org | running: https://foule.es | > twitter: @tarek_ziade > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Tue Oct 11 12:06:50 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Oct 2016 02:06:50 +1000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: Message-ID: On 11 October 2016 at 23:50, ????? wrote: > I thought about it a lot recently. Specifically on your proposal, and in > general. Unpacking expression can have a much more uniform treatment in the > language, as an expression with special "bare tuple" type - like tuple, but > "without the braces". That's a recipe for much deeper confusion, as it would make "*expr" and "*expr," semantically identical >>> *range(3), (0, 1, 2) As things stand, the above makes tuple expansion the same as any other expression: you need a comma to actually make it a tuple. If you allow a bare "*" to imply the trailing comma, then it immediately becomes confusing when you actually *do* have a comma present, as the "*" no longer implies a new tuple, it gets absorbed into the surrounding one. That's outright backwards incompatible with the status quo once you take parentheses into account: >>> (*range(3)), (0, 1, 2) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rymg19 at gmail.com Tue Oct 11 12:08:26 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 11 Oct 2016 11:08:26 -0500 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: <20161009002527.GM22471@ando.pearwood.info> Message-ID: On Oct 11, 2016 10:40 AM, "Erik Bray" wrote: > > On Sun, Oct 9, 2016 at 2:25 AM, Steven D'Aprano wrote: > > On Sat, Oct 08, 2016 at 09:26:13PM +0200, Jelte Fennema wrote: > >> I have an idea to improve indenting guidelines for dictionaries for better > >> readability: If a value in a dictionary literal is placed on a new line, it > >> should have (or at least be allowed to have) a n additional hanging indent. > >> > >> Below is an example: > >> > >> mydict = {'mykey': > >> 'a very very very very very long value', > >> 'secondkey': 'a short value', > >> 'thirdkey': 'a very very very ' > >> 'long value that continues on the next line', > >> } > > > > Looks good to me, except that my personal preference for the implicit > > string concatenation (thirdkey) is to move the space to the > > following line, and (if possible) align the parts: > > mydict = {'mykey': > > 'a very very very very very long value', > > 'secondkey': 'a short value', > > 'thirdkey': 'a very very very' > > ' long value that continues on the next line', > > } > > Heh--not to bikeshed, but my personal preference is to leave the > trailing space on the first line. This is because by the time I've > started a new line (and possibly have spent time fussing with > indentation for the odd cases that my editor doesn't get quite right) > I'll have forgotten that I need to start the line with a space :) Until you end up with like 20 merge conflicts because some editors strip trailing whitespace... 
> > Best, > Erik > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Tue Oct 11 12:16:45 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 11 Oct 2016 16:16:45 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: Warning: the contents of this message may be dangerous for readers with heart conditions. So today I looked at PyFloat_RichCompare. I had been scared initially because it was so complicated, I was worried I would miss some important error check or something if I special cased it. So I took the following approach: I replaced all statements of the form PyLong_Check(w) with 0, and all statements of the form PyFloat_Check(w) with 1. Because that's the whole point; we're cutting out type checks. I then cut out all blocks preceded by if (0). So it's definitely safe. And it turns out that if you do that, PyFloat_RichCompare becomes ONE LINE (after we set op=Py_LT)!!! Just return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) == 0 ? Py_False : Py_True; So I thought, wow, this will give some nice numbers! But I underestimated the power of this optimization. You have no idea. It's crazy. Since we're dealing with floats, I used Tim Peter's benchmark: Lib/test/sortperf.py, just modifying one function: #Old function Tim Peters wrote: def doit_fast(L): t0 = time.perf_counter() L.fastsort() t1 = time.perf_counter() print("%6.2f" % (t1-t0), end=' ') flush() #My function: def doit(L): F = FastList(L) f0 = time.perf_counter() F.fastsort() f1 = time.perf_counter() F = FastList(L) t0 = time.perf_counter() F.sort() t1 = time.perf_counter() print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ') flush() So the benchmarking here is valid. I didn't write it. All I did was modify it to print percent improvement instead of sort time. The numbers below are percent improvement of my sort vs default sort (just clone my repo and run python sortperf.py to verify): i 2**i *sort \sort /sort 3sort +sort %sort ~sort =sort !sort 15 32768 44.11% 54.69% 47.41% 57.61% 50.17% 75.24% 68.03% 65.16% 82.16% 16 65536 14.14% 75.38% 63.44% 56.56% 67.99% 66.19% 50.72% 61.55% 61.87% 17 131072 73.54% 60.52% 60.97% 61.63% 52.55% 49.84% 68.68% 84.12% 65.90% 18 262144 54.19% 55.34% 54.67% 54.13% 55.62% 52.88% 69.30% 74.86% 72.66% 19 524288 55.12% 53.32% 53.77% 54.27% 52.97% 53.53% 67.55% 76.60% 78.56% 20 1048576 55.05% 51.09% 60.05% 50.69% 62.98% 50.20% 66.24% 71.47% 61.40% This is just insane. This is crazy. I didn't write this benchmark, OK, this is a valid benchmark. Tim Peters wrote it. And except for one trial, they all show more than 50% improvement. Some in the 70s and 80s. This is crazy. This is so cool. I just wanted to share this with you guys. I'll submit a patch to bugs.python.org soon; I just have to write a special case comparison for tuples and then I'm basically done. This is so cool!!!!!!!!! 50% man!!!!! Crazy!!!!! 
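(For anyone who wants to see the shape of the check all of this hinges on: the specialization only applies when a cheap pre-scan shows the list is homogeneous. A rough sketch of that pre-scan, as an illustration only and not the code in the repo, looks like:

    /* len(list)-1 pointer comparisons decide whether a specialized
     * compare is safe to use at all. */
    static int
    list_elements_share_type(PyObject **items, Py_ssize_t n)
    {
        Py_ssize_t i;
        PyTypeObject *t;

        if (n < 2)
            return 1;
        t = Py_TYPE(items[0]);
        for (i = 1; i < n; i++) {
            if (Py_TYPE(items[i]) != t)
                return 0;   /* mixed types: keep using PyObject_RichCompare */
        }
        return 1;           /* homogeneous: the fast path is safe */
    }

If that scan finds a mismatch, sorting falls back to the usual rich-comparison path, so mixed-type lists only pay for the scan itself.)
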
Elliot On Mon, Oct 10, 2016 at 9:02 PM Tim Peters wrote: > [please restrict follow-ups to python-ideas] > > Let's not get hung up on meta-discussion here - I always thought "massive > clusterf**k" was a precise technical term anyway ;-) > > While timing certainly needs to be done more carefully, it's obvious to me > that this approach _should_ pay off significantly when it applies. > Comparisons are extraordinarily expensive in Python, precisely because of > the maze of test-and-branch code it requires just to figure out which > bottom-level comparison function to invoke each time. That's why I spent > months of my life (overall) devising a sequence of sorting algorithms for > Python that reduced the number of comparisons needed. > > Note that when Python's current sort was adopted in Java, they still kept > a quicksort variant for "unboxed" builtin types. The adaptive merge sort > incurs many overheads that often cost more than they save unless > comparisons are in fact very expensive compared to the cost of pointer > copying (and in Java comparison of unboxed types is cheap). Indeed, for > native numeric types, where comparison is dirt cheap, quicksort generally > runs faster than mergesort despite that the former does _more_ comparisons > (because mergesort does so much more pointer-copying). > > I had considered something "like this" for Python 2, but didn't pursue it > because comparison was defined between virtually any two types (34 < [1], > etc), and people were careless about that (both by design and by > accident). In Python 3, comparison "blows up" for absurdly mixed types, so > specializing for homogeneously-typed lists is a more promising idea on the > face of it. > > The comparisons needed to determine _whether_ a list's objects have a > common type is just len(list)-1 C-level pointer comparisons, and so goes > fast. So I expect that, when it applies, this would speed even sorting an > already-ordered list with at least 2 elements. > > For a mixed-type list with at least 2 elements, it will always be pure > loss. But (a) I expect such lists are uncommon (and especially uncommon in > Python 3); and (b) a one-time scan doing C-level pointer comparisons until > finding a mismatched type is bound to be a relatively tiny cost compared to > the expense of all the "rich comparisons" that follow. > > So +1 from me on pursuing this. > > Elliot, please: > > - Keep this on python-ideas. python-dev is for current issues in Python > development, not for speculating about changes. > > - Open an issue on the tracker: https://bugs.python.org/ > > - At least browse the info for developers: > https://docs.python.org/devguide/ > > - Don't overlook Lib/test/sortperf.py. As is, it should be a good test of > what your approach so far _doesn't_ help, since it sorts only lists of > floats (& I don't think you're special-casing them). If the timing results > it reports aren't significantly hurt (and I expect they won't be), then add > specialization for floats too and gloat about the speedup :-) > > - I expect tuples will also be worth specializing (complex sort keys are > often implemented as tuples). > > Nice start! :-) > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From davisein at gmail.com Tue Oct 11 12:21:45 2016 From: davisein at gmail.com (David Navarro) Date: Tue, 11 Oct 2016 18:21:45 +0200 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: <57FD067E.5000101@stoneleaf.us> References: <20161011140019.GQ22471@ando.pearwood.info> <57FD067E.5000101@stoneleaf.us> Message-ID: One option would be to decorate those functions and provide an implementation to __bool__ or __nonzero__ which raises an exception. Something like this In [1]: def a(): pass In [2]: def r(): raise RuntimeError('Do not forget to call this') In [3]: a.__bool__ = r In [4]: if a: pass I don't have an environment to test if this is possible. This would allow marking with a decorator functions that might be misleading or that are a common source of issues for new users. -- David Navarro On 11 October 2016 at 17:34, Ethan Furman wrote: > On 10/11/2016 07:00 AM, Steven D'Aprano wrote: > >> On Tue, Oct 11, 2016 at 02:41:34PM +0200, Sven R. Kunze wrote: >> > > on django-developers, an intriguing idea appeared: >>> https://groups.google.com/d/msg/django-developers/4bntzg1Hww >>> Y/HHHjbDnLBQAJ >>> >>> """ >>> It seems to me that the default `method.__bool__` is undesirable in >>> Jinja2 templates. I do not know Jinja2 well enough, but maybe they could >>> benefit from a patch where `if`-statements give a warning/error when the >>> expression is a callable (with the default `FunctionType.__bool__`? >>> This would solve the issue not just for the methods you mention, but >>> more in general. >>> >> >> That should be easy enough to do as a custom descriptor. >> >> But I would not like to see the default function or method __bool__ >> raise a warning. >> > > [...] > > So I think this is something that Django/Jinja2 should implement for its >> own methods that need it, it should not be a general feature of all >> Python functions/methods. >> > > Agreed. Python is a /general/-purpose programming language. We should not > make changes to help one subset of users when those changes will harm > another subset (and being flooded with false positives is harmful) -- > particularly when easy customization is already available. > > -- > ~Ethan~ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- David Navarro Estruch -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 11 12:35:19 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Oct 2016 02:35:19 +1000 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: References: Message-ID: On 11 October 2016 at 23:59, Paul Moore wrote: > Certainly. But the whole point of the warning is to catch people who > do the wrong thing. All I'm saying is that the new warning would cause > problems for people who omit a (currently unnecessary) "is None" check > in the process of protecting people who forget whether an attribute is > a method or a property. Which group do we prefer to help? I don't think there's a technicality here: new warnings on valid code count as a backwards compatibility break, as some folks run their test suites under "-Werror". That means existing correct code wins by default. However, writing a "@predicatemethod" descriptor that folks could use if they wanted to warn about that particular case may make sense. 
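A minimal sketch of what such a descriptor could look like (the spelling below is just an illustration; the name, the warning text and the Form example are all made up):

    import warnings

    class _BoundPredicate:
        # Callable wrapper whose truth value warns instead of being
        # silently true.
        def __init__(self, func, instance):
            self._func = func
            self._instance = instance

        def __call__(self, *args, **kwargs):
            return self._func(self._instance, *args, **kwargs)

        def __bool__(self):
            warnings.warn("%s used in a boolean context; did you mean "
                          "to call it?" % self._func.__name__,
                          stacklevel=2)
            return True

    class predicatemethod:
        # Descriptor: attribute access returns the warning-aware wrapper.
        def __init__(self, func):
            self._func = func

        def __get__(self, instance, owner=None):
            if instance is None:
                return self._func
            return _BoundPredicate(self._func, instance)

    class Form:
        @predicatemethod
        def is_valid(self):
            return False

    form = Form()
    if form.is_valid:        # warns instead of silently taking the branch
        pass
    form.is_valid()          # a normal call still works

Because the warning lives on the wrapper type rather than on functions in general, it only fires for methods whose authors opted in, which sidesteps the false-positive concern raised earlier in the thread.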
That's already possible for API designers that choose to do it, so I don't see a strong reason to add it to the standard library at this point. (Especially since type inference engines are often going to be able to pick this particular problem up statically) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tarek at ziade.org Tue Oct 11 12:40:33 2016 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek=20Ziad=E9?=) Date: Tue, 11 Oct 2016 18:40:33 +0200 Subject: [Python-ideas] Adding full() to collections.deque In-Reply-To: References: <1476187789.4014609.752298193.7F77D50F@webmail.messagingengine.com> Message-ID: <1476204033.4077414.752603833.3F6A46A7@webmail.messagingengine.com> Ah! I havn't thought about that because in my use case the deque is append-only so once it's full it discards older elements. I guess what I really need is a regular FIFO Queue Thanks for the feedback ! -- Tarek Ziad? | coding: https://ziade.org | running: https://foule.es | twitter: @tarek_ziade On Tue, Oct 11, 2016, at 05:44 PM, Guido van Rossum wrote: > Isn't the problem that you don't know if it's still full on the next > line? After all you supposedly have a multi-threaded app here, > otherwise why bother with any of that? Or maybe you can describe the > real-world use case where you wanted this in more detail? Without much > more evidence I can't support such a change. > > On Tue, Oct 11, 2016 at 5:09 AM, Tarek Ziad? wrote: > > > > Hey, > > > > When creating deque instances using a value for maxlen, it would be nice > > to have a .full() method like what Queue provides, so one may do: > > > > my_deque = deque(maxlen=300) > > > > if my_deque.full(): > > do_something() > > > > instead of doing: > > > > if len(my_deque) == my_deque.maxlen: > > do_something() > > > > > > If people think it's a good idea, I can add a ticket in the tracker and > > try to provide a patch for the collections module maintainer. > > If this was already talked about, or is a bad idea, sorry! :) > > > > Cheers > > Tarek > > -- > > > > Tarek Ziad? | coding: https://ziade.org | running: https://foule.es | > > twitter: @tarek_ziade > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Tue Oct 11 12:49:44 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Oct 2016 02:49:44 +1000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On 12 October 2016 at 02:16, Elliot Gorokhovsky wrote: > So I thought, wow, this will give some nice numbers! But I underestimated > the power of this optimization. You have no idea. It's crazy. > This is just insane. This is crazy. Not to take away from the potential for speed improvements (which do indeed seem interesting), but I'd ask that folks avoid using mental health terms to describe test results that we find unbelievable. There are plenty of other adjectives we can use, and a text-based medium like email gives us a chance to proofread our posts before we send them. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Tue Oct 11 14:22:15 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 11 Oct 2016 19:22:15 +0100 Subject: [Python-ideas] warn/error when using a method as boolean in ifs/whiles In-Reply-To: References: <20161011140019.GQ22471@ando.pearwood.info> <57FD067E.5000101@stoneleaf.us> Message-ID: On 11 October 2016 at 17:21, David Navarro wrote: > Something like this > > In [1]: def a(): pass > In [2]: def r(): raise RuntimeError('Do not forget to call this') > In [3]: a.__bool__ = r > In [4]: if a: pass > > I don't have an environment to test if this is possible. This would allow > marking with a decorator functions that might be misleading or that are a > common source of issues for new users. It would need to be somewhat more complex (the above doesn't work as it stands, because __bool__ is only recognised as a method on a class, not an attribute of a function). But it's doable, either manually (make a a class with __call__ and __bool__) or semi-automatically (a decorator that creates a class for which __call__ is defined as running the decorated function, and __bool__ raises a warning). Paul From p.f.moore at gmail.com Tue Oct 11 14:30:25 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 11 Oct 2016 19:30:25 +0100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On 11 October 2016 at 17:49, Nick Coghlan wrote: > On 12 October 2016 at 02:16, Elliot Gorokhovsky > wrote: >> So I thought, wow, this will give some nice numbers! But I underestimated >> the power of this optimization. You have no idea. It's crazy. >> This is just insane. This is crazy. > > Not to take away from the potential for speed improvements (which do > indeed seem interesting), but I'd ask that folks avoid using mental > health terms to describe test results that we find unbelievable. There > are plenty of other adjectives we can use, and a text-based medium > like email gives us a chance to proofread our posts before we send > them. I'd also suggest toning down the rhetoric a bit (all-caps title, "the contents of this message may be dangerous for readers with heart conditions" etc. Your results do seem good, but it's a little hard to work out what you actually did, and how your results were produced, through the hype. It'll be much better when someone else has a means to reproduce your results to confirm them. In all honestly, people have been working on Python's performance for a long time now, and I'm more inclined to think that a 50% speedup is a mistake rather than an opportunity that's been missed for all that time. I'd be happy to be proved wrong, but for now I'm skeptical. Please continue working on this - I'd love my skepticism to be proved wrong! Paul From mafagafogigante at gmail.com Tue Oct 11 15:13:03 2016 From: mafagafogigante at gmail.com (Bernardo Sulzbach) Date: Tue, 11 Oct 2016 16:13:03 -0300 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: I honestly appreciate your work on this. However, don't write as if you were trying to sell something to the people on the mailing list. "INSANE FLOAT PERFORMANCE!!!" 
seems the title of a bad YouTube video or something lower than that, while you are just trying to tell us about "Relevant performance improvement when sorting floating point numbers", a subject which is much clearer, less intrusive, and more professional than the one you used. Thanks for your efforts, nonetheless. -- Bernardo Sulzbach http://www.mafagafogigante.org/ mafagafogigante at mafagafogigante.org From elliot.gorokhovsky at gmail.com Tue Oct 11 16:58:07 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 11 Oct 2016 20:58:07 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: So I got excited here. And the reason why is that I got those numbers *on Tim's benchmark*. When I got these kinds of numbers on my benchmarks, I figured there was probably a problem with they way I was timing, and certainly the gains couldn't be as extreme as they suggested. But this is on a benchmark that's already in the codebase! Here is a detailed explanation of how to reproduce my results, and the circumstances that would lead them to be invalid: ****************************************** To reproduce, just activate a virtualenv, and then clone https://github.com/embg/python-fast-listsort.git. Then python setup.py install and python sortperf.py. Now let's look at what sortperf.py does and how it relates to Tim's benchmark at Lib/test/sortperf.py. If you diff the two, you'll find I made three changes: 1. I added an import, "import fastlist". This obviously would not make sorting twice faster. 2. I changed the way it formats the output: I changed "fmt = ("%2s %7s" + " %7s"*len(cases))" to "fmt = ("%2s %7s" + " %6s"*len(cases))". Again irrelevant. 3. I changed the timing function #from this def doit_fast(L): t0 = time.perf_counter() L.fastsort() t1 = time.perf_counter() print("%6.2f" % (t1-t0), end=' ') flush() #to this def doit(L): F = FastList(L) f0 = time.perf_counter() F.fastsort() f1 = time.perf_counter() F = FastList(L) t0 = time.perf_counter() F.sort() t1 = time.perf_counter() print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ') flush() ******************************************* So what we've shown is that (1) if you trust the existing sorting benchmark and (2) if my modification to doit() doesn't mess anything up (I leave this up to you to judge), then the measurements are as valid. Which is a pretty big deal (50% !!!!!!!), hence my overexcitement. **************************************** Now I'd like to respond to responses (the one I'm thinking of was off-list so I don't want to quote it) I've gotten questioning how it could be possible for such a small optimization, bypassing the typechecks, to possibly have such a large effect, even in theory. Here's my answer: Let's ignore branch prediction and cache for now and just look at a high level. The cost of sorting is related to the cost of a single comparison, because the vast majority of our time (let's say certainly at least 90%, depending on the list) is spent in comparisons. So let's look at the cost of a comparison. Without my optimization, comparisons for floats (that's what this benchmark looks at) go roughly like this: 1. Test type of left and right for PyObject_RichCompare (which costs two pointer dereferences) and compare them. "3 ops" (quotes because counting ops like this is pretty hand-wavy). "2 memory accesses". 2. Get the address of the float compare method from PyFloat_Type->tp_richcompare. "1 op". "1 memory access". 3. 
Call the function whose address we just got. "1 op". "Basically 0 memory accesses because we count the stack stuff in that 1 op". 4. Test type of left and right again in PyFloat_RichCompare and compare both of them to PyFloat_Type. "4 ops". "2 memory accesses". 5. Get floats from the PyObject* by calling PyFloat_AS_DOUBLE or whatever. "2 ops". "2 memory accesses". 6. Compare the floats and return. "2 ops". Now let's tally the "cost" (sorry for use of quotes here, just trying to emphasize this is an intuitive, theoretical explanation for the numbers which doesn't take into account the hardware): "13 ops, 7 memory accesses". Here's what it looks like in my code: 1. Call PyFloat_AS_DOUBLE on left and right. "2 ops". "2 memory acceses". 2. Compare the floats and return. "2 ops". Tally: "4 ops, 2 memory accesses". Now you can argue branch prediction alleviates a lot of this cost, since we're taking the same branches every time. But note that, branch prediction or not, we still have to do all of those memory acceses, and since they're pointers to places all over memory, they miss the cache basically every time (correct me if I'm wrong). So memory-wise, we really are doing something like a 7:2 ratio, and op-wise, perhaps not as bad because of branch prediction, but still, 13:4 is probably bad no matter what's going on in the hardware. Now consider that something like 90% of our time is spent in those steps. Are my numbers really that unbelievable? Thanks for everything, looking forward to writing this up as a nice latex doc with graphs and perf benchmarks and all the other rigorous goodies, as well as a special case cmp func for homogeneous tuples and a simple patch file, Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Tue Oct 11 18:30:48 2016 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Tue, 11 Oct 2016 17:30:48 -0500 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: Message-ID: On Wednesday, October 5, 2016, Yury Selivanov wrote: > > > On 2016-10-05 2:50 PM, Nathan Goldbaum wrote: > >> On Wed, Oct 5, 2016 at 1:27 PM, Michel Desmoulin < >> desmoulinmichel at gmail.com> >> wrote: >> >> +1. Python does need better error messages. This and the recent new import >>> exception will really help. >>> >>> Will feature freeze prevent this to get into 3.6 if some champion it? >>> >>> Speaking of, I'm not much of a C hacker, and messing with CPython >> internals >> is a little daunting. If anyone wants to take this on, you have my >> blessing. I also may take a shot at implementing this idea in the next >> couple weeks when I have some time. >> > > It would help if you could create an issue and write exhaustive unittests > (or at least specifying how exactly the patch should work for all corner > cases). Someone with the knowledge of CPython internals will later add the > missing C code to the patch. Good idea. I will make an attempt at this later this week, starting with the tests that were added to pypy. For now I will focus on bound methods. > > Yury > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From elliot.gorokhovsky at gmail.com Tue Oct 11 20:25:16 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 00:25:16 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: To answer your question: I special-case unicode (strings), ints, and floats. I am working on special-casing tuples (can even be different types, just need homogeneity column-wise). The best speedups will be tuples of floats: it'll bypass three layers of useless checks. If I run it without special-casing floats (just using tp->rich_compare) I only get like a 5-10% speedup. I'm working on rigorous benchmarks for all this stuff, will post a pdf along with the patch once it's all done. But it's certainly <10%. However, part of this is because my special case is really low-level; for strings I've actually found the opposite, using tp->richcompare gives me almost the same results as my special case compare, since it still has to PyUnicode_READY the strings (or whatever it's called). Regarding generalization: the general technique for special-casing is you just substitute all type checks with 1 or 0 by applying the type assumption you're making. That's the only way to guarantee it's safe and compliant. Elliot On Tue, Oct 11, 2016, 5:19 PM Jim J. Jewett wrote: > Excellent. > I'm surprised cache didn't save more, but less surprised than I was ... I > hadn't realized that you were skipping the verifications in > PyFloat_RichCompare as well. Does that generalize to other element types > without exposing too much of the per-type internals to list.sort? > > Oh ... and I appreciate your not quoting private email as a general > courtesy, but I hereby give you permission if it was mine that was private. > [Though I think your summary was better than a quote anyhow.] > > -jJ > > On Oct 11, 2016 4:58 PM, "Elliot Gorokhovsky" < > elliot.gorokhovsky at gmail.com> wrote: > > So I got excited here. And the reason why is that I got those numbers *on > Tim's benchmark*. When I got these kinds of numbers on my benchmarks, I > figured there was probably a problem with they way I was timing, and > certainly the gains couldn't be as extreme as they suggested. But this is > on a benchmark that's already in the codebase! > > > Here is a detailed explanation of how to reproduce my results, and the > circumstances that would lead them to be invalid: > > ****************************************** > > To reproduce, just activate a virtualenv, and then clone > https://github.com/embg/python-fast-listsort.git. Then python setup.py > install and python sortperf.py. > > > Now let's look at what sortperf.py does and how it relates to Tim's > benchmark at Lib/test/sortperf.py. If you diff the two, you'll find I made > three changes: > > > 1. I added an import, "import fastlist". This obviously would not make > sorting twice faster. > > > 2. I changed the way it formats the output: I changed "fmt = ("%2s %7s" + > " %7s"*len(cases))" to "fmt = ("%2s %7s" + " %6s"*len(cases))". Again > irrelevant. > > > 3. 
I changed the timing function > > #from this > > > def doit_fast(L): > t0 = time.perf_counter() > L.fastsort() > t1 = time.perf_counter() > print("%6.2f" % (t1-t0), end=' ') > flush() > > > > #to this > > > def doit(L): > F = FastList(L) > f0 = time.perf_counter() > F.fastsort() > f1 = time.perf_counter() > F = FastList(L) > t0 = time.perf_counter() > F.sort() > t1 = time.perf_counter() > print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ') > flush() > > > ******************************************* > > So what we've shown is that (1) if you trust the existing sorting > benchmark and (2) if my modification to doit() doesn't mess anything up (I > leave this up to you to judge), then the measurements are as valid. Which > is a pretty big deal (50% !!!!!!!), hence my overexcitement. > > **************************************** > > > Now I'd like to respond to responses (the one I'm thinking of was off-list > so I don't want to quote it) I've gotten questioning how it could be > possible for such a small optimization, bypassing the typechecks, to > possibly have such a large effect, even in theory. Here's my answer: > > Let's ignore branch prediction and cache for now and just look at a high > level. The cost of sorting is related to the cost of a single comparison, > because the vast majority of our time (let's say certainly at least 90%, > depending on the list) is spent in comparisons. So let's look at the cost > of a comparison. > > Without my optimization, comparisons for floats (that's what this > benchmark looks at) go roughly like this: > > 1. Test type of left and right for PyObject_RichCompare (which costs two > pointer dereferences) and compare them. "3 ops" (quotes because counting > ops like this is pretty hand-wavy). "2 memory accesses". > > 2. Get the address of the float compare method from > PyFloat_Type->tp_richcompare. "1 op". "1 memory access". > > 3. Call the function whose address we just got. "1 op". "Basically 0 > memory accesses because we count the stack stuff in that 1 op". > > 4. Test type of left and right again in PyFloat_RichCompare and compare > both of them to PyFloat_Type. "4 ops". "2 memory accesses". > > 5. Get floats from the PyObject* by calling PyFloat_AS_DOUBLE or whatever. > "2 ops". "2 memory accesses". > > 6. Compare the floats and return. "2 ops". > > Now let's tally the "cost" (sorry for use of quotes here, just trying to > emphasize this is an intuitive, theoretical explanation for the numbers > which doesn't take into account the hardware): > "13 ops, 7 memory accesses". > > Here's what it looks like in my code: > > 1. Call PyFloat_AS_DOUBLE on left and right. "2 ops". "2 memory acceses". > > 2. Compare the floats and return. "2 ops". > > Tally: "4 ops, 2 memory accesses". > > Now you can argue branch prediction alleviates a lot of this cost, since > we're taking the same branches every time. But note that, branch prediction > or not, we still have to do all of those memory acceses, and since they're > pointers to places all over memory, they miss the cache basically every > time (correct me if I'm wrong). So memory-wise, we really are doing > something like a 7:2 ratio, and op-wise, perhaps not as bad because of > branch prediction, but still, 13:4 is probably bad no matter what's going > on in the hardware. > > Now consider that something like 90% of our time is spent in those steps. > Are my numbers really that unbelievable? 
> > Thanks for everything, looking forward to writing this up as a nice latex > doc with graphs and perf benchmarks and all the other rigorous goodies, as > well as a special case cmp func for homogeneous tuples and a simple patch > file, > > Elliot > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Oct 11 22:24:47 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 11 Oct 2016 22:24:47 -0400 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: <20161009002527.GM22471@ando.pearwood.info> Message-ID: On 10/11/2016 12:08 PM, Ryan Gonzalez wrote: > On Oct 11, 2016 10:40 AM, "Erik Bray" > > wrote: >> >> On Sun, Oct 9, 2016 at 2:25 AM, Steven D'Aprano > > wrote: >> > On Sat, Oct 08, 2016 at 09:26:13PM +0200, Jelte Fennema wrote: >> >> I have an idea to improve indenting guidelines for dictionaries for > better >> >> readability: If a value in a dictionary literal is placed on a new > line, it >> >> should have (or at least be allowed to have) a n additional hanging > indent. >> >> >> >> Below is an example: >> >> >> >> mydict = {'mykey': >> >> 'a very very very very very long value', >> >> 'secondkey': 'a short value', >> >> 'thirdkey': 'a very very very ' >> >> 'long value that continues on the next line', >> >> } >> > >> > Looks good to me, except that my personal preference for the implicit >> > string concatenation (thirdkey) is to move the space to the >> > following line, and (if possible) align the parts: >> > mydict = {'mykey': >> > 'a very very very very very long value', >> > 'secondkey': 'a short value', >> > 'thirdkey': 'a very very very' >> > ' long value that continues on the next line', >> > } >> >> Heh--not to bikeshed, but my personal preference is to leave the >> trailing space on the first line. This is because by the time I've >> started a new line (and possibly have spent time fussing with >> indentation for the odd cases that my editor doesn't get quite right) >> I'll have forgotten that I need to start the line with a space :) I agree that the first version of the example, with space after 'very', before the quote, is better. I also put '\n' at the end of literals to be auto-joined, rather than at the beginning. > Until you end up with like 20 merge conflicts because some editors strip > trailing whitespace... A space within a string literal is not trailing whitespace and will not be stripped. -- Terry Jan Reedy From tjreedy at udel.edu Tue Oct 11 22:59:01 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 11 Oct 2016 22:59:01 -0400 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On 10/11/2016 2:30 PM, Paul Moore wrote: > On 11 October 2016 at 17:49, Nick Coghlan wrote: >> On 12 October 2016 at 02:16, Elliot Gorokhovsky >> wrote: >>> So I thought, wow, this will give some nice numbers! But I underestimated >>> the power of this optimization. You have no idea. It's crazy. >>> This is just insane. This is crazy. >> >> Not to take away from the potential for speed improvements (which do >> indeed seem interesting), but I'd ask that folks avoid using mental >> health terms to describe test results that we find unbelievable. There >> are plenty of other adjectives we can use, and a text-based medium >> like email gives us a chance to proofread our posts before we send >> them. 
> > I'd also suggest toning down the rhetoric a bit (all-caps title, "the > contents of this message may be dangerous for readers with heart > conditions" etc. I triple the motion. In general, all caps = spam or worse and I usually don't even open such posts. Elliot, to me, all caps means IGNORE ME. I suspect this is not what you want. > Your results do seem good, but it's a little hard to > work out what you actually did, and how your results were produced, > through the hype. It'll be much better when someone else has a means > to reproduce your results to confirm them. In all honestly, people > have been working on Python's performance for a long time now, and I'm > more inclined to think that a 50% speedup is a mistake rather than an > opportunity that's been missed for all that time. I'd be happy to be > proved wrong, but for now I'm skeptical. I'm not, in the same sense, even though Elliot suggested that we should be ;-). His key insight is that if all members of a list have the same type (which is a common 'special case'), then we can replace the general, somewhat convoluted, rich-comparison function, containing at least two type checks, with a faster special-case comparison function without any type checks. Since Python floats wrap machine doubles, I expect that float may have the greatest speedup. > Please continue working on this - I'd love my skepticism to be proved wrong! It may be the case now that sorting a list of all floats is faster than a mixed list of ints and floats. I expect that it definitely will be with a float comparison function. -- Terry Jan Reedy From tim.peters at gmail.com Tue Oct 11 23:06:19 2016 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 11 Oct 2016 22:06:19 -0500 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: [Elliot Gorokhovsky] > Warning: the contents of this message may be dangerous for readers with > heart conditions. It appears some people are offended by your exuberance. Ignore them ;-) > ... > And it turns out that if you do that, PyFloat_RichCompare becomes ONE LINE > (after we set op=Py_LT)!!! Just Right. And this is why floats will likely be the best case for your approach: all the overheads of finding "the right" bottom-level comparison are weighed against what's basically a single machine instruction to do the actual work of comparing two floats. > return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) == 0 ? Py_False : > Py_True; In real life, this is better written as: return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False; That is, the comparison to 0 was an artificial complication left over from simplifying the code. However, that code is buggy, in a way that may not show up for a long time. Keeping track of reference counts is crucial. It needs to be more like: PyObject *res: res = (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False; Py_INCREF(res); return res; > So I thought, wow, this will give some nice numbers! But I underestimated > the power of this optimization. You have no idea. It's crazy. It's in the ballpark of what I expected :-) > Since we're dealing with floats, I used Tim Peter's benchmark: > Lib/test/sortperf.py, just modifying one function: It's actually Guido's file, although I probably changed it much more recently than he did. As I explained in a different msg, it's _not_ a good benchmark. 
It's good at distinguishing among "really really good", "not a huge change", and "really really bad", but that's all. At the start, such gross distinctions are all anyone should care about. One thing I'd like to see: do exactly the same, but comment out your float specialization. It's important to see whether sortperf.py says "not a huge change" then (which I expect, but it needs confirmation). That is, it's not enough if some cases get 4x faster if it's also that other cases get much slower. > ... > #My function: > def doit(L): > F = FastList(L) > f0 = time.perf_counter() > F.fastsort() > f1 = time.perf_counter() > F = FastList(L) > t0 = time.perf_counter() > F.sort() > t1 = time.perf_counter() > print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ') > flush() Numbers in benchmarks always need to be explained, because they're only clear to the person who wrote the code. From your code, you're essentially computing (old_time - new_time) / old_time * 100.0 but in an obfuscated way. So long as the new way is at least as fast, the only possible values are between 0% and 100%. That's important to note. Other people, e.g., measure these things by putting "new_time" in the denominator. Or just compute a ratio, like old_time / new_time. They way you're doing it, e.g., "75%" means the new way took only a quarter of the time of the old way - it means new_time is 75% smaller than old_time. I'm comfortable with that, but it's not the _clearest_ way to display timing differences so dramatic; old_time / new_time would be 4.0 in this case, which is easier to grasp at a glance. > So the benchmarking here is valid. No, it sucks ;-) But it's perfectly adequate for what I wanted to see from it :-) > I didn't write it. All I did was modify > it to print percent improvement instead of sort time. The numbers below are > percent improvement of my sort vs default sort (just clone my repo and run > python sortperf.py to verify): > > i 2**i *sort \sort /sort 3sort +sort %sort ~sort =sort !sort > 15 32768 44.11% 54.69% 47.41% 57.61% 50.17% 75.24% 68.03% 65.16% > 82.16% > 16 65536 14.14% 75.38% 63.44% 56.56% 67.99% 66.19% 50.72% 61.55% > 61.87% > 17 131072 73.54% 60.52% 60.97% 61.63% 52.55% 49.84% 68.68% 84.12% > 65.90% > 18 262144 54.19% 55.34% 54.67% 54.13% 55.62% 52.88% 69.30% 74.86% > 72.66% > 19 524288 55.12% 53.32% 53.77% 54.27% 52.97% 53.53% 67.55% 76.60% > 78.56% > 20 1048576 55.05% 51.09% 60.05% 50.69% 62.98% 50.20% 66.24% 71.47% > 61.40% If it _didn't_ suck, all the numbers in a column would be about the same :-) A meta-note: when Guido first wrote sortperf.py, machines - and Python! - were much slower. Now sorting 2**15 elements goes so fast that this gross timing approach is especially only good for making the grossest distinctions. On my box today, even the slowest case (*sort on 2**20 elements) takes under half a second. That's "why" the numbers in each column are much more stable across the last 3 rows than across the first 3 rows - the cases in the first 3 rows take so little time that any timing glitch can skew them badly. Note that you can pass arguments to sortperf.py to check any range of power-of-2 values you like, but if you're aware of the pitfalls the output as-is is fine for me. > This is just insane. This is crazy. Yet nevertheless wholly expected ;-) > ... > This is so cool. It is! Congratulations :-) From ncoghlan at gmail.com Tue Oct 11 23:56:58 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Oct 2016 13:56:58 +1000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! 
In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On 12 October 2016 at 06:58, Elliot Gorokhovsky wrote: > So I got excited here. And the reason why is that I got those numbers *on > Tim's benchmark*. When I got these kinds of numbers on my benchmarks, I > figured there was probably a problem with the way I was timing, and > certainly the gains couldn't be as extreme as they suggested. But this is on > a benchmark that's already in the codebase! Thanks for the clearer write-up - this is indeed very cool, and it's wonderful to see that the new assumptions permitted by Python 3 getting stricter about cross-type ordering comparisons may lead to speed-ups for certain common kinds of operations (i.e. sorting lists where the sorting keys are builtin immutable types). Once you get to the point of being able to do performance measurements on a CPython build with a modified list.sort() implementation, you'll want to take a look at the modern benchmark suite in https://github.com/python/performance Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ram at rachum.com Wed Oct 12 05:26:02 2016 From: ram at rachum.com (Ram Rachum) Date: Wed, 12 Oct 2016 12:26:02 +0300 Subject: [Python-ideas] Expose condition._waiters Message-ID: Hi guys, I'm writing some code that uses `threading.Condition` and I found that I want to access condition._waiters. I want to do it in two different parts of my code for two different reasons: 1. When shutting down the thread that manages the condition, I want to be sure that there are no waiters on the condition, so I check whether `condition._waiters` is empty before exiting, otherwise I'll let them finish and only then exit. 2. When I do notify_all, I actually want to do as many notify actions as needed until there's a full round of notify_all in which none of the conditions for any of the waiters have been met. Only then do I want my code to continue. (It's because these waiters are waiting for resources that I'm giving them, each wanting a different number of resources, and I want to be sure that all of them are starved before I get more resources for them.) Do you think it'll be a good idea to add non-private functionality like that to threading.Condition? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Oct 12 06:16:22 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Oct 2016 21:16:22 +1100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: <20161012101621.GR22471@ando.pearwood.info> On Wed, Oct 12, 2016 at 12:25:16AM +0000, Elliot Gorokhovsky wrote: > Regarding generalization: the general technique for special-casing is you > just substitute all type checks with 1 or 0 by applying the type assumption > you're making. That's the only way to guarantee it's safe and compliant. I'm confused -- I don't understand how *removing* type checks can possibly guarantee the code is safe and compliant. It's all very well and good when you are running tests that meet your type assumption, but what happens if they don't? If I sort a list made up of (say) mixed int and float (possibly including subclasses), does your "all type checks are 1 or 0" sort segfault? If not, why not? Where's the safety coming from?
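For instance -- and this is only a sketch of the kind of input I have in mind, not something I have actually run against your branch -- all of these are legal today:

    sorted([3, 2.5, True])     # int, float and bool (a subclass of int) mixed together
    sorted([10**50, 1.5])      # an int too big for a machine word, compared with a float

so presumably whatever fast path you install has to notice the mixture (or the subclass) and fall back to the current, fully general behaviour rather than treating everything as the first element's type.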
By the way, your emails in this thread have reminded me of a quote from the late Sir Terry Pratchett's novel "Maskerade" (the odd spelling is intentional): "What sort of person," said Salzella patiently, "sits down and *writes* a maniacal laugh? And all those exclamation marks, you notice? Five? A sure sign of someone who wears his underpants on his head." :-) -- Steve From p.f.moore at gmail.com Wed Oct 12 07:35:23 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 12 Oct 2016 12:35:23 +0100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: <20161012101621.GR22471@ando.pearwood.info> References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On 12 October 2016 at 11:16, Steven D'Aprano wrote: > On Wed, Oct 12, 2016 at 12:25:16AM +0000, Elliot Gorokhovsky wrote: > >> Regarding generalization: the general technique for special-casing is you >> just substitute all type checks with 1 or 0 by applying the type assumption >> you're making. That's the only way to guarantee it's safe and compliant. > > I'm confused -- I don't understand how *removing* type checks can > possibly guarantee the code is safe and compliant. > > It's all very well and good when you are running tests that meet your > type assumption, but what happens if they don't? If I sort a list made > up of (say) mixed int and float (possibly including subclasses), does > your "all type checks are 1 or 0" sort segfault? If not, why not? > Where's the safety coming from? My understanding is that the code does a pre-check that all the elements of the list are the same type (float, for example). This is a relatively quick test (O(n) pointer comparisons). If everything *is* a float, then an optimised comparison routine that skips all the type checks and goes straight to a float comparison (single machine op) can be used. Because there are more than O(n) comparisons done in a typical sort, this is a win. And because the type checks needed in rich comparison are much more expensive than a pointer check, it's a *big* win. What I'm *not* quite clear on is why Python 3's change to reject comparisons between unrelated types makes this optimisation possible. Surely you have to check either way? It's not that it's a particularly important question - if the optimisation works, it's not that big a deal what triggered the insight. It's just that I'm not sure if there's some other point that I've not properly understood. Paul From srkunze at mail.de Wed Oct 12 09:58:25 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 12 Oct 2016 15:58:25 +0200 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: Message-ID: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> Hi Martti, On 11.10.2016 14:42, Martti Kühne wrote: > Hello list > > I love the "new" unpacking generalisations as of pep448. And I found > myself using them rather regularly, both with lists and dict. > Today I somehow expected that [*foo for foo in bar] was equivalent to > itertools.chain(*[foo for foo in bar]), which it turned out to be a > SyntaxError. > The dict equivalent of the above might then be something along the > lines of {**v for v in dict_of_dicts.values()}. In case the values > (which be all dicts) are records with the same keys, one might go and > prepend the keys with their former keys using > { > **dict( > ("{}_{}".format(k, k_sub), v_sub) > for k_sub, v_sub in v.items() > ) for k, v in dict_of_dicts.items() > } > Was anyone able to follow me through this?
Reading PEP448 it seems to me that it's already been considered: https://www.python.org/dev/peps/pep-0448/#variations The reason for not-inclusion were about concerns about acceptance because of "strong concerns about readability" but also received "mild support". I think your post strengthens the support given that you "expected it to just work". This shows at least to me that the concerns about readability/understandability are not justified much. Personally, I find inclusion of */** expansion for comprehensions very natural. It would again strengthen the meaning of */** for unpacking which I am also in favor of. Cheers, Sven From ncoghlan at gmail.com Wed Oct 12 11:09:29 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Oct 2016 01:09:29 +1000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On 12 October 2016 at 21:35, Paul Moore wrote: > What I'm *not* quite clear on is why Python 3's change to reject > comparisons between unrelated types makes this optimisation possible. > Surely you have to check either way? It's not that it's a particularly > important question - if the optimisation works, it's not that big a > deal what triggered the insight. It's just that I'm not sure if > there's some other point that I've not properly understood. It's probably more relevant that cmp() went away, since that simplified the comparison logic to just PyObject_RichCompareBool, without the custom comparison function path. It *might* have still been possible to do something like this in the Py2 code (since the main requirement is to do the pre-check for consistent types if the first object is of a known type with an optimised fast path), but I don't know anyone that actually *likes* adding new special cases to already complex code and trying to figure out how to test whether or not they've broken anything :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tim.peters at gmail.com Wed Oct 12 11:19:05 2016 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 12 Oct 2016 10:19:05 -0500 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: [Paul Moore] > My understanding is that the code does a per-check that all the > elements of the list are the same type (float, for example). This is a > relatively quick test (O(n) pointer comparisons). If everything *is* a > float, then an optimised comparison routine that skips all the type > checks and goes straight to a float comparison (single machine op). That matches my understanding. > Because there are more than O(n) comparisons done in a typical sort, > this is a win. If the types are in fact all the same, it should be a win even for n==2 (at n < 2 no comparisons are done; at n==2 exactly 1 comparison is done): one pointer compare + go-straight-to-C-float-"x And because the type checks needed in rich comparison And layers of function calls. > are much more expensive than a pointer check, it's a *big* win. Bingo :-) > What I'm *not* quite clear on is why Python 3's change to reject > comparisons between unrelated types makes this optimisation possible. It doesn't. It would also apply in Python 2. I simply expect the optimization will pay off more frequently in Python 3 code. 
For example, in Python 2 I used to create lists with objects of wildly mixed types, and sort them merely to bring objects of the same type next to each other. Things "like that" don't work at all in Python 3. > Surely you have to check either way? It's not that it's a particularly > important question - if the optimisation works, it's not that big a > deal what triggered the insight. It's just that I'm not sure if > there's some other point that I've not properly understood. Well, either your understanding is fine, or we're both confused :-) From ncoghlan at gmail.com Wed Oct 12 11:41:30 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Oct 2016 01:41:30 +1000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> Message-ID: On 12 October 2016 at 23:58, Sven R. Kunze wrote: > Reading PEP448 it seems to me that it's already been considered: > https://www.python.org/dev/peps/pep-0448/#variations > > The reason for not-inclusion were about concerns about acceptance because of > "strong concerns about readability" but also received "mild support". I > think your post strengthens the support given that you "expected it to just > work". This shows at least to me that the concerns about > readability/understandability are not justified much. Readability isn't about "Do some people guess the same semantics for what it would mean?", as when there are only a few plausible interpretations, all the possibilities are going to get a respectable number of folks picking them as reasonable behaviour. Instead, readability is about: - Do people consistently guess the *same* interpretation? - Is that interpretation consistent with other existing uses of the syntax? - Is it more readily comprehensible than existing alternatives, or is it brevity for brevity's sake? This particular proposal fails on the first question (as too many people would expect it to mean the same thing as either "[*expr, for expr in iterable]" or "[*(expr for expr in iterable)]"), but it fails on the other two grounds as well. In most uses of *-unpacking it's adding entries to a comma-delimited sequence, or consuming entries in a comma delimited sequence (the commas are optional in some cases, but they're still part of the relevant contexts). The expansions removed the special casing of functions, and made these capabilities generally available to all sequence definition operations. Comprehensions and generator expressions, by contrast, dispense with the comma delimited format entirely, and instead use a format inspired by mathematical set builder notation (just modified to use keywords and Python expressions rather than symbols and mathematical expressions): https://en.wikipedia.org/wiki/Set-builder_notation#Sets_defined_by_a_predicate However, set builder notation doesn't inherently include the notion of flattening lists-of-lists. Instead, that's a *consumption* operation that happens externally after the initial list-of-lists has been built, and that's exactly how it's currently spelled in Python: "itertools.chain.from_iterable(subiter for subiter in iterable)". Regards, Nick. 
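P.S. For concreteness, a rough sketch of that existing spelling on a small list-of-lists (nothing new here, just the current idiom; the names data/sub/x are only illustrative):

    from itertools import chain

    data = [[1, 2], [3], [4, 5]]
    list(chain.from_iterable(sub for sub in data))   # [1, 2, 3, 4, 5]
    # or, without itertools:
    [x for sub in data for x in sub]                 # [1, 2, 3, 4, 5]

so the flattening step stays visible as its own operation rather than being folded into the comprehension syntax itself.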
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Wed Oct 12 11:42:26 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Oct 2016 02:42:26 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: Message-ID: <20161012154224.GT22471@ando.pearwood.info> On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti K?hne wrote: > Hello list > > I love the "new" unpacking generalisations as of pep448. And I found > myself using them rather regularly, both with lists and dict. > Today I somehow expected that [*foo for foo in bar] was equivalent to > itertools.chain(*[foo for foo in bar]), which it turned out to be a > SyntaxError. To me, that's a very strange thing to expect. Why would you expect that unpacking items in a list comprehension would magically lead to extra items in the resulting list? I don't think that makes any sense. Obviously we could program list comprehensions to act that way if we wanted to, but that would not be consistent with the ordinary use of list comprehensions. It would introduce a special case of magical behaviour that people will have to memorise, because it doesn't follow logically from the standard list comprehension design. The fundamental design principle of list comps is that they are equivalent to a for-loop with a single append per loop: [expr for t in iterable] is equivalent to: result = [] for t in iterable: result.append(expr) If I had seen a list comprehension with an unpacked loop variable: [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] I never in a million years would expect that running a list comprehension over a three-item sequence would magically expand to six items: [1, 'a', 2, 'b', 3, 'c'] I would expect that using the unpacking operator would give some sort of error, or *at best*, be a no-op and the result would be: [(1, 'a'), (2, 'b'), (3, 'c')] append() doesn't take multiple arguments, hence a error should be the most obvious result. But if not an error, imagine the tuple unpacked to two arguments 1 and 'a' (on the first iteration), then automatically packed back into a tuple (1, 'a') just as you started with. I think it is a clear, obvious and, most importantly, desirable property of list comprehensions with a single loop that they cannot be longer than the initial iterable that feeds them. They might be shorter, if you use the form [expr for t in iterable if condition] but they cannot be longer. So I'm afraid I cannot understand what reasoning lead you to expect that unpacking would apply this way. Wishful thinking perhaps? -- Steve From tim.peters at gmail.com Wed Oct 12 11:43:08 2016 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 12 Oct 2016 10:43:08 -0500 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: [Nick Coghlan] > It's probably more relevant that cmp() went away, since that > simplified the comparison logic to just PyObject_RichCompareBool, > without the custom comparison function path. Well, the current sort is old by now, and was written for Python 2. But it did anticipate that rich comparisons were the future, and deliberately restricted itself to using only "<" (Py_LT) comparisons. So, same as now, only the "<" path needed to be examined. 
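(A quick pure-Python way to see that, for anyone who hasn't stared at the implementation: a class that defines nothing but __lt__ sorts fine --

    class OnlyLT:
        def __init__(self, x):
            self.x = x
        def __lt__(self, other):
            return self.x < other.x

    sorted([OnlyLT(3), OnlyLT(1), OnlyLT(2)])   # works; the sort only ever asks "<"

-- so a per-type fast path only has to worry about that one path.)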
> It *might* have still been possible to do something like this in the > Py2 code (since the main requirement is to do the pre-check for > consistent types if the first object is of a known type with an > optimised fast path), It shouldn't really matter whether it's a known type. For any type, if it's known that all the objects are of that type, that type's tp_richcompare slot can be read up once, and if non-NULL used throughout. That would save several levels of function call per comparison during the sort; although that's not factor-of-3-speedup potential, it should still be a significant win. > but I don't know anyone that actually *likes* adding new special cases > to already complex code and trying to figure out how to test whether > or not they've broken anything :) A nice thing about this one is that special cases are a one-time thing at the start, and don't change anything in the vast bulk of the current sorting code. So when it breaks, it should be pretty easy to figure out why ;-) From steve at pearwood.info Wed Oct 12 12:04:19 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Oct 2016 03:04:19 +1100 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: <20161009002527.GM22471@ando.pearwood.info> Message-ID: <20161012160419.GU22471@ando.pearwood.info> On Tue, Oct 11, 2016 at 10:24:47PM -0400, Terry Reedy wrote: > >>Heh--not to bikeshed, but my personal preference is to leave the > >>trailing space on the first line. This is because by the time I've > >>started a new line (and possibly have spent time fussing with > >>indentation for the odd cases that my editor doesn't get quite right) > >>I'll have forgotten that I need to start the line with a space :) > > I agree that the first version of the example, with space after 'very', > before the quote, is better. I used to think the same, until I got sick and tired of having my code output strings like: a very very verylong value that continues on the next line I learned the hard way that if I don't put the breaking space at the beginning of the next fragment, I probably wouldn't put it at the end of the previous fragment either. YMMV, I'm just reporting what works for me. -- Steve From enguerrand.pelletier at gmail.com Wed Oct 12 12:06:51 2016 From: enguerrand.pelletier at gmail.com (Enguerrand Pelletier) Date: Wed, 12 Oct 2016 18:06:51 +0200 Subject: [Python-ideas] Add a method to get the subset of a dictionnary. In-Reply-To: References: Message-ID: <0945e829-6936-9bb9-5d4f-5c85ef01fd69@gmail.com> Hi all, It always bothered me to write something like this when I want to strip keys from a dictionary in Python: a = {"foo": 1, "bar": 2, "baz": 3, "foobar": 42} interesting_keys = ["foo", "bar", "baz"] b = {k: v for k, v in a.items() if k in interesting_keys} Wouldn't it be nice to have syntactic sugar such as: b = a.subset(interesting_keys) I find this version more elegant/explicit. But maybe this feature is not "worth it" Cheers ! From elazarg at gmail.com Wed Oct 12 12:11:55 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Wed, 12 Oct 2016 16:11:55 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161012154224.GT22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: Steve, you only need to allow multiple arguments to append(), then it makes perfect sense. On Wed, Oct 12, 2016 at 18:43,
Steven D'Aprano ?< steve at pearwood.info>: > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti K?hne wrote: > > Hello list > > > > I love the "new" unpacking generalisations as of pep448. And I found > > myself using them rather regularly, both with lists and dict. > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > SyntaxError. > > To me, that's a very strange thing to expect. Why would you expect that > unpacking items in a list comprehension would magically lead to extra > items in the resulting list? I don't think that makes any sense. > > Obviously we could program list comprehensions to act that way if we > wanted to, but that would not be consistent with the ordinary use of > list comprehensions. It would introduce a special case of magical > behaviour that people will have to memorise, because it doesn't follow > logically from the standard list comprehension design. > > The fundamental design principle of list comps is that they are > equivalent to a for-loop with a single append per loop: > > [expr for t in iterable] > > is equivalent to: > > result = [] > for t in iterable: > result.append(expr) > > > If I had seen a list comprehension with an unpacked loop variable: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > > I never in a million years would expect that running a list > comprehension over a three-item sequence would magically expand to six > items: > > [1, 'a', 2, 'b', 3, 'c'] > > > I would expect that using the unpacking operator would give some sort > of error, or *at best*, be a no-op and the result would be: > > [(1, 'a'), (2, 'b'), (3, 'c')] > > > append() doesn't take multiple arguments, hence a error should be the > most obvious result. But if not an error, imagine the tuple unpacked to > two arguments 1 and 'a' (on the first iteration), then automatically > packed back into a tuple (1, 'a') just as you started with. > > I think it is a clear, obvious and, most importantly, desirable property > of list comprehensions with a single loop that they cannot be longer > than the initial iterable that feeds them. They might be shorter, if you > use the form > > [expr for t in iterable if condition] > > but they cannot be longer. > > So I'm afraid I cannot understand what reasoning lead you to > expect that unpacking would apply this way. Wishful thinking > perhaps? > > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Oct 12 12:32:12 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 12 Oct 2016 18:32:12 +0200 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> Message-ID: <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> On 12.10.2016 17:41, Nick Coghlan wrote: > This particular proposal fails on the first question (as too many > people would expect it to mean the same thing as either "[*expr, for > expr in iterable]" or "[*(expr for expr in iterable)]") So, my reasoning would tell me: where have I seen * so far? *args and **kwargs! [...] is just the list constructor. So, putting those two pieces together is quite simple. I expect that Martti's reasoning was similar. 
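To spell that out with a tiny example (the first line is today's Python, the second is only the *proposed* reading, so treat it as a sketch rather than working syntax):

    [*(1, 2), *(3, 4)]                       # today: [1, 2, 3, 4]
    [*pair for pair in [(1, 2), (3, 4)]]     # proposed: [1, 2, 3, 4], i.e. what chain(*...) gives
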
Furthermore, your two "interpretations" would yield the very same result as [expr for expr in iterable] which doesn't match with my experience with Python so far; especially when it comes to special characters. They must mean something. So, a simple "no-op" would not match my expectations. > but it fails on the other two grounds as well. Here I disagree with you. We use *args all the time, so we know what * does. I don't understand why this should not work in between brackets [...]. Well, it works in between [...] sometimes but not always, to be precise. And that's the problem, I guess. > In most uses of *-unpacking it's adding entries to a comma-delimited > sequence, or consuming entries in a comma delimited sequence (the > commas are optional in some cases, but they're still part of the > relevant contexts). The expansions removed the special casing of > functions, and made these capabilities generally available to all > sequence definition operations. I don't know what you mean by comma-delimited sequence. There are no commas. It's just a list of entries. * adds entries to this list. (At least from my point of view.) > Comprehensions ... [are] inspired by mathematical set builder notation. Exactly. Inspired. I don't see any reason not to extend on this idea to make it more useful. > "itertools.chain.from_iterable(subiter for subiter in iterable)". I have to admit that need to read that twice to get what it does. But that might just be me. Cheers, Sven From waultah at gmail.com Wed Oct 12 12:33:08 2016 From: waultah at gmail.com (Riley Banks) Date: Wed, 12 Oct 2016 17:33:08 +0100 Subject: [Python-ideas] Add a method to get the subset of a dictionnary. In-Reply-To: <0945e829-6936-9bb9-5d4f-5c85ef01fd69@gmail.com> References: <0945e829-6936-9bb9-5d4f-5c85ef01fd69@gmail.com> Message-ID: Looks like it was discussed before: https://mail.python.org/pipermail/python-ideas/2012-January/013252.html From cory at lukasa.co.uk Wed Oct 12 13:08:58 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 12 Oct 2016 18:08:58 +0100 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <794933BF-8D1F-4DFA-AD92-A3DBF5274902@stranden.com> <57F60126.40103@canterbury.ac.nz> <57F6DA4E.6000108@canterbury.ac.nz> Message-ID: <57EC3929-2D42-4A5A-84BA-E57E322F35EF@lukasa.co.uk> > On 7 Oct 2016, at 16:18, Nick Coghlan wrote: > > However, if you're running in a context that embeds CPython inside a > larger application (e.g. mod_wsgi inside Apache), then gevent's > assumptions about how the C thread states are managed may be wrong, > and hence you may be in for some "interesting" debugging sessions. The > same goes for any library that implements callbacks that end up > executing a greenlet switch when they weren't expecting it (e.g. while > holding a threading lock - that will protect you from other OS > threads, but not from other greenlets in the same thread) I can speak to this. 
It's been my professional experience with gevent that choosing to obtain concurrency by using gevent as opposed to explicit async was a trade-off: we replaced a large amount of drudge work in writing a codebase with async/await pervasively throughout it with a smaller amount of dramatically (10x to 100x) more intellectually challenging debugging work when unstated assumptions regarding thread-safety and concurrent access were violated. For many developers these trade-offs are sensible and reasonable, but we should all remember that there are costs and advantages of most kinds of runtime model. I'm happier to have a language that lets me do all of these things than one that chooses one for me and says "that ought to be good enough for everyone". Cory From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Oct 12 13:40:59 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 13 Oct 2016 02:40:59 +0900 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: <20161012160419.GU22471@ando.pearwood.info> References: <20161009002527.GM22471@ando.pearwood.info> <20161012160419.GU22471@ando.pearwood.info> Message-ID: <22526.30123.849597.644722@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I learned the hard way that if I don't put the breaking space at > the beginning of the next fragment, I probably wouldn't put it at > the end of the previous fragment either. The converse applies in my case, so that actually doesn't matter to me. When I don't put it in, I don't put it in anywhere. What does matter to me is that I rarely make spelling errors (including typos) or omit internal spaces. That means I can get away with not reading strings carefully most of the time, and I don't. But omitted space at the joins of a continued string is frequent, and frequently caught when I'm following skimming down a suite to the next syntactic construct. But spaces at end never will be. Ie, space-at-beginning makes for more effective review for me. YMMV. From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Oct 12 13:49:03 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 13 Oct 2016 02:49:03 +0900 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: <22526.30607.281492.842122@turnbull.sk.tsukuba.ac.jp> Elazar writes: > Steve, you only need to allow multiple arguments to append(), then it makes > perfect sense. No, because that would be explicit. Here it's implicit and ambiguous. Specifically, it requires guessing "operator associativity". That is something people have different intuitions about. > > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti Kühne wrote: > > > Hello list > > > > > > I love the "new" unpacking generalisations as of pep448. And I found > > > myself using them rather regularly, both with lists and dict. > > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > > SyntaxError. Which is what I myself would expect, same as *(1, 2) + 3 is a SyntaxError. I could see Nick's interpretation that *foo in such a context would actually mean (*foo,) (i.e., it casts iterables to tuples). I would certainly want [i, j for i, j in [[1, 2], [3, 4]]] to evaluate to [(1, 2), (3, 4)] (if it weren't a SyntaxError). > > To me, that's a very strange thing to expect.
Why would you expect that > > unpacking items in a list comprehension would magically lead to extra > > items in the resulting list? I don't think that makes any sense. Well, that's what it does in display syntax for sequences. If you think of a comprehension as a "macro" that expands to display syntax, makes some sense. But as you and Nick point out, comprehensions are real operations, not macros which implicitly construct displays, then evaluate them to get the actual sequence. > > Wishful thinking perhaps? That was unnecessary. I know sometimes I fall into the trap of thinking there really ought to be concise syntax for a "simple" idea, and then making one up rather than looking it up. From mertz at gnosis.cx Wed Oct 12 15:22:11 2016 From: mertz at gnosis.cx (David Mertz) Date: Wed, 12 Oct 2016 12:22:11 -0700 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161012154224.GT22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: I've followed this discussion some, and every example given so far completely mystifies me and I have no intuition about what they should mean. On Oct 12, 2016 8:43 AM, "Steven D'Aprano" wrote: > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti K?hne wrote: > > Hello list > > > > I love the "new" unpacking generalisations as of pep448. And I found > > myself using them rather regularly, both with lists and dict. > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > SyntaxError. > > To me, that's a very strange thing to expect. Why would you expect that > unpacking items in a list comprehension would magically lead to extra > items in the resulting list? I don't think that makes any sense. > > Obviously we could program list comprehensions to act that way if we > wanted to, but that would not be consistent with the ordinary use of > list comprehensions. It would introduce a special case of magical > behaviour that people will have to memorise, because it doesn't follow > logically from the standard list comprehension design. > > The fundamental design principle of list comps is that they are > equivalent to a for-loop with a single append per loop: > > [expr for t in iterable] > > is equivalent to: > > result = [] > for t in iterable: > result.append(expr) > > > If I had seen a list comprehension with an unpacked loop variable: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > > I never in a million years would expect that running a list > comprehension over a three-item sequence would magically expand to six > items: > > [1, 'a', 2, 'b', 3, 'c'] > > > I would expect that using the unpacking operator would give some sort > of error, or *at best*, be a no-op and the result would be: > > [(1, 'a'), (2, 'b'), (3, 'c')] > > > append() doesn't take multiple arguments, hence a error should be the > most obvious result. But if not an error, imagine the tuple unpacked to > two arguments 1 and 'a' (on the first iteration), then automatically > packed back into a tuple (1, 'a') just as you started with. > > I think it is a clear, obvious and, most importantly, desirable property > of list comprehensions with a single loop that they cannot be longer > than the initial iterable that feeds them. They might be shorter, if you > use the form > > [expr for t in iterable if condition] > > but they cannot be longer. 
> > So I'm afraid I cannot understand what reasoning lead you to > expect that unpacking would apply this way. Wishful thinking > perhaps? > > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Wed Oct 12 15:38:57 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Wed, 12 Oct 2016 19:38:57 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: What is the intuition behind [1, *x, 5]? The starred expression is replaced with a comma-separated sequence of its elements. The trailing comma Nick referred to is there, with the rule that [1,, 5] is the same as [1, 5]. All the examples follow this intuition, IIUC. Elazar ?????? ??? ??, 12 ????' 2016, 22:22, ??? David Mertz ?: > I've followed this discussion some, and every example given so far > completely mystifies me and I have no intuition about what they should mean. > > On Oct 12, 2016 8:43 AM, "Steven D'Aprano" wrote: > > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti K?hne wrote: > > Hello list > > > > I love the "new" unpacking generalisations as of pep448. And I found > > myself using them rather regularly, both with lists and dict. > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > SyntaxError. > > To me, that's a very strange thing to expect. Why would you expect that > unpacking items in a list comprehension would magically lead to extra > items in the resulting list? I don't think that makes any sense. > > Obviously we could program list comprehensions to act that way if we > wanted to, but that would not be consistent with the ordinary use of > list comprehensions. It would introduce a special case of magical > behaviour that people will have to memorise, because it doesn't follow > logically from the standard list comprehension design. > > The fundamental design principle of list comps is that they are > equivalent to a for-loop with a single append per loop: > > [expr for t in iterable] > > is equivalent to: > > result = [] > for t in iterable: > result.append(expr) > > > If I had seen a list comprehension with an unpacked loop variable: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > > I never in a million years would expect that running a list > comprehension over a three-item sequence would magically expand to six > items: > > [1, 'a', 2, 'b', 3, 'c'] > > > I would expect that using the unpacking operator would give some sort > of error, or *at best*, be a no-op and the result would be: > > [(1, 'a'), (2, 'b'), (3, 'c')] > > > append() doesn't take multiple arguments, hence a error should be the > most obvious result. But if not an error, imagine the tuple unpacked to > two arguments 1 and 'a' (on the first iteration), then automatically > packed back into a tuple (1, 'a') just as you started with. > > I think it is a clear, obvious and, most importantly, desirable property > of list comprehensions with a single loop that they cannot be longer > than the initial iterable that feeds them. They might be shorter, if you > use the form > > [expr for t in iterable if condition] > > but they cannot be longer. 
> > So I'm afraid I cannot understand what reasoning lead you to > expect that unpacking would apply this way. Wishful thinking > perhaps? > > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Wed Oct 12 16:26:55 2016 From: mertz at gnosis.cx (David Mertz) Date: Wed, 12 Oct 2016 13:26:55 -0700 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 12:38 PM, ????? wrote: > What is the intuition behind [1, *x, 5]? The starred expression is > replaced with a comma-separated sequence of its elements. > I've never actually used the `[1, *x, 5]` form. And therefore, of course, I've never taught it either (I teach Python for a living nowadays). I think that syntax already perhaps goes too far, actually; but I can understand it relatively easily by analogy with: a, *b, c = range(10) But the way I think about or explain either of those is "gather the extra items from the sequence." That works in both those contexts. In contrast: >>> *b = range(10) SyntaxError: starred assignment target must be in a list or tuple Since nothing was assigned to a non-unpacked variable, nothing is "extra items" in the same sense. So failure feels right to me. I understand that "convert an iterable to a list" is conceptually available for that line, but we already have `list(it)` around, so it would be redundant and slightly confusing. What seems to be wanted with `[*foo for foo in bar]` is basically just `flatten(bar)`. The latter feels like a better spelling, and the recipes in itertools docs give an implementation already (a one-liner). We do have a possibility of writing this: >>> [(*stuff,) for stuff in [range(-5,-1), range(5)]] [(-5, -4, -3, -2), (0, 1, 2, 3, 4)] That's not flattened, as it should not be. But it is very confusing to have `[(*stuff) for stuff in ...]` behave differently than that. It's much more natural?and much more explicit?to write: >>> [item for seq in [range(-5,-1), range(5)] for item in seq] [-5, -4, -3, -2, 0, 1, 2, 3, 4] -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Oct 12 16:31:17 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 12 Oct 2016 15:31:17 -0500 Subject: [Python-ideas] Add a method to get the subset of a dictionnary. In-Reply-To: References: <0945e829-6936-9bb9-5d4f-5c85ef01fd69@gmail.com> Message-ID: That discussion seemed to mostly just conclude that dicts shouldn't have all set operations, and then it kind of just dropped off. No one really argued the subset part. -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. 
http://kirbyfan64.github.io/ On Oct 12, 2016 11:33 AM, "Riley Banks" wrote: > Looks like it was discussed before: > https://mail.python.org/pipermail/python-ideas/2012-January/013252.html > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Oct 12 16:35:27 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 12 Oct 2016 21:35:27 +0100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: On 12 October 2016 at 20:22, David Mertz wrote: > I've followed this discussion some, and every example given so far > completely mystifies me and I have no intuition about what they should mean. Same here. On 12 October 2016 at 20:38, ????? wrote: > What is the intuition behind [1, *x, 5]? The starred expression is replaced > with a comma-separated sequence of its elements. > > The trailing comma Nick referred to is there, with the rule that [1,, 5] is > the same as [1, 5]. > > All the examples follow this intuition, IIUC. But intuition is precisely that - it's not based on rules, but on people's instinctive understanding. When evaluating whether something is intuitive, the *only* thing that matters is what people tell you they do or don't understand by a given construct. And in this case, people have been expressing differing interpretations, and confusion. That says "not intuitive" loud and clear to me. And yes, I find [1, *x, 5] intuitive. And I can't tell you why I find it OK, but I find {**x for x in d.items()} non-intuitive. But just because I can't explain it doesn't mean it's not true, or you can "change my mind" about how I feel. Paul From elazarg at gmail.com Wed Oct 12 16:37:38 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Wed, 12 Oct 2016 20:37:38 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 11:26 PM David Mertz wrote: > On Wed, Oct 12, 2016 at 12:38 PM, ????? wrote: > > What is the intuition behind [1, *x, 5]? The starred expression is > replaced with a comma-separated sequence of its elements. > > I've never actually used the `[1, *x, 5]` form. And therefore, of course, > I've never taught it either (I teach Python for a living nowadays). I > think that syntax already perhaps goes too far, actually; but I can > understand it relatively easily by analogy with: > > a, *b, c = range(10) > > It's not exactly "analogy" as such - it is the dual notion. Here you are using the "destructor" (functional terminology) but we are talking about "constructors". But nevermind. > But the way I think about or explain either of those is "gather the extra > items from the sequence." That works in both those contexts. In contrast: > > >>> *b = range(10) > SyntaxError: starred assignment target must be in a list or tuple > > Since nothing was assigned to a non-unpacked variable, nothing is "extra > items" in the same sense. So failure feels right to me. I understand that > "convert an iterable to a list" is conceptually available for that line, > but we already have `list(it)` around, so it would be redundant and > slightly confusing. 
> > But that's not a uniform treatment. It might have good reasons from readability point of view, but it is an explicit exception for the rule. The desired behavior would be equivalent to b = tuple(range(10)) and yes, there are Two Ways To Do It. I would think it should have been prohibited by PEP-8 and not by the compiler. Oh well. What seems to be wanted with `[*foo for foo in bar]` is basically just > `flatten(bar)`. The latter feels like a better spelling, and the recipes > in itertools docs give an implementation already (a one-liner). > > We do have a possibility of writing this: > > >>> [(*stuff,) for stuff in [range(-5,-1), range(5)]] > [(-5, -4, -3, -2), (0, 1, 2, 3, 4)] > > That's not flattened, as it should not be. But it is very confusing to > have `[(*stuff) for stuff in ...]` behave differently than that. It's much > more natural?and much more explicit?to write: > > >>> [item for seq in [range(-5,-1), range(5)] for item in seq] > [-5, -4, -3, -2, 0, 1, 2, 3, 4] > > The distinction between (x) and (x,) is already deep in the language. It has nothing to do with this thread >>> [1, *([2],), 3] [1, [2], 3] >>> [1, *([2]), 3] [1, 2, 3] So there. Just like in this proposal. Elazar. -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Oct 12 16:39:17 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 12 Oct 2016 22:39:17 +0200 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: <0a2c8587-fb02-3776-feca-d5a51e68062e@mail.de> On 12.10.2016 21:38, ????? wrote: > > What is the intuition behind [1, *x, 5]? The starred expression is > replaced with a comma-separated sequence of its elements. > > The trailing comma Nick referred to is there, with the rule that [1,, > 5] is the same as [1, 5]. > I have to admit that I have my problems with this "comma-separated sequence" idea. For me, lists are just collections of items. There are no commas involved. I also think that thinking about commas here complicates the matter. What * does, it basically plugs in the items from the starred expression into its surroundings: [*[1,2,3]] = [1,2,3] Let's plug in two lists into its surrounding list: [*[1,2,3], *[1,2,3]] = [1,2,3,1,2,3] So, as the thing goes, it looks like as if * could just work anywhere inside those brackets: [*[1,2,3] for _ in range(3)] = [*[1,2,3], *[1,2,3], *[1,2,3]] = [1,2,3,1,2,3,1,2,3] I have difficulties to understand the problem of understanding the syntax. The * and ** variants just flow naturally whereas the "chain" equivalent is bit "meh". Cheers, Sven From elazarg at gmail.com Wed Oct 12 16:39:41 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Wed, 12 Oct 2016 20:39:41 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: To be honest, I don't have a clear picture of what {**x for x in d.items()} should be. But I do have such picture for dict(**x for x in many_dictionaries) Elazar ?On Wed, Oct 12, 2016 at 11:37 PM ???????? wrote:? > On Wed, Oct 12, 2016 at 11:26 PM David Mertz wrote: > > On Wed, Oct 12, 2016 at 12:38 PM, ????? wrote: > > What is the intuition behind [1, *x, 5]? The starred expression is > replaced with a comma-separated sequence of its elements. > > I've never actually used the `[1, *x, 5]` form. 
And therefore, of course, > I've never taught it either (I teach Python for a living nowadays). I > think that syntax already perhaps goes too far, actually; but I can > understand it relatively easily by analogy with: > > a, *b, c = range(10) > > > It's not exactly "analogy" as such - it is the dual notion. Here you are > using the "destructor" (functional terminology) but we are talking about > "constructors". But nevermind. > > > But the way I think about or explain either of those is "gather the extra > items from the sequence." That works in both those contexts. In contrast: > > >>> *b = range(10) > SyntaxError: starred assignment target must be in a list or tuple > > Since nothing was assigned to a non-unpacked variable, nothing is "extra > items" in the same sense. So failure feels right to me. I understand that > "convert an iterable to a list" is conceptually available for that line, > but we already have `list(it)` around, so it would be redundant and > slightly confusing. > > > But that's not a uniform treatment. It might have good reasons from > readability point of view, but it is an explicit exception for the rule. > The desired behavior would be equivalent to > > b = tuple(range(10)) > > and yes, there are Two Ways To Do It. I would think it should have been > prohibited by PEP-8 and not by the compiler. Oh well. > > What seems to be wanted with `[*foo for foo in bar]` is basically just > `flatten(bar)`. The latter feels like a better spelling, and the recipes > in itertools docs give an implementation already (a one-liner). > > We do have a possibility of writing this: > > >>> [(*stuff,) for stuff in [range(-5,-1), range(5)]] > [(-5, -4, -3, -2), (0, 1, 2, 3, 4)] > > That's not flattened, as it should not be. But it is very confusing to > have `[(*stuff) for stuff in ...]` behave differently than that. It's much > more natural?and much more explicit?to write: > > >>> [item for seq in [range(-5,-1), range(5)] for item in seq] > [-5, -4, -3, -2, 0, 1, 2, 3, 4] > > > The distinction between (x) and (x,) is already deep in the language. It > has nothing to do with this thread > > >>> [1, *([2],), 3] > [1, [2], 3] > >>> [1, *([2]), 3] > [1, 2, 3] > > So there. Just like in this proposal. > > Elazar. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Wed Oct 12 17:19:13 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 21:19:13 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 9:20 AM Tim Peters wrote: > > What I'm *not* quite clear on is why Python 3's change to reject > > comparisons between unrelated types makes this optimisation possible. > > It doesn't. It would also apply in Python 2. I simply expect the > optimization will pay off more frequently in Python 3 code. For > example, in Python 2 I used to create lists with objects of wildly > mixed types, and sort them merely to bring objects of the same type > next to each other. Things "like that" don't work at all in Python 3. > > > > Surely you have to check either way? It's not that it's a particularly > > important question - if the optimisation works, it's not that big a > > deal what triggered the insight. It's just that I'm not sure if > > there's some other point that I've not properly understood. > Yup. 
Actually, the initial version of this work was with Python 2. What happened was this: I had posted earlier something along the lines of "hey everybody let's radix sort strings instead of merge sort because that will be more fun ok". And everyone wrote me back "no please don't are you kidding". Tim Peters wrote back "try it but just fyi it's not gonna work". So I set off to try it. I had never used the C API before, but luckily I found some Python 2 documentation that gives an example of subclassing list, so I was able to mostly just copy-paste to get a working list extension module. I then copied over the implementation of listsort. My first question was how expensive python compares are vs C compares. And since python 2 has PyString_AS_STRING, which just gives you a char* pointer to a C string, I went in and replaced PyObject_RichCompareBool with strcmp and did a simple benchmark. And I was just totally blown away; it turns out you get something like a 40-50% improvement (at least on my simple benchmark). So that was the motivation for all this. Actually, if I wrote this for python 2, I might be able to get even better numbers (at least for strings), since we can't use strcmp in python 3. (Actually, I've heard UTF-8 strings are strcmp-able, so maybe if we go through and verify all the strings are UTF-8 we can strcmp them? I don't know enough about how PyUnicode stuff works to do this safely). My string special case currently just bypasses the typechecks and goes to unicode_compare(), which is still wayyy overkill for the common case of ASCII or Latin-1 strings, since it uses a for loop to go through and check characters, and strcmp uses compiler magic to do it in like, negative time or something. I even PyUnicode_READY the strings before comparing; I'm not sure if that's really necessary, but that's how PyUnicode_Compare does it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Wed Oct 12 17:26:28 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 21:26:28 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On Tue, Oct 11, 2016 at 9:56 PM Nick Coghlan wrote: > Once you get to the point of being able to do performance measurements on > a CPython build with a modified list.sort() implementation, you'll > want to take a look at the modern benchmark suite in > https://github.com/python/performance > Yup, that's the plan. I'm going to implement optimized compares for tuples, then implement this as a CPython build, and then run benchmark suites and write some rigorous benchmarks using perf/timeit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Wed Oct 12 17:33:11 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Wed, 12 Oct 2016 23:33:11 +0200 Subject: [Python-ideas] Proposal for default character representation Message-ID: Hello all, I want to share my thoughts about syntax improvements regarding character representation in Python. I am new to the list so if such a discussion or a PEP exists already, please let me know. So in short: Currently Python uses hexadecimal notation for characters for input and output. For example, let's take a unicode string "абв.txt" (a file named with the first three Cyrillic letters). Now printing it we get: u'\u0430\u0431\u0432.txt' So one sees that we have hex numbers here.
Same is for typing in the strings which obviously also uses hex. Same is for some parts of the Python documentation, especially those about unicode strings. PROPOSAL: 1. Remove all hex notation from printing functions, typing, documention. So for printing functions leave the hex as an "option", for example for those who feel the need for hex representation, which is strange IMO. 2. Replace it with decimal notation, in this case e.g: u'\u0430\u0431\u0432.txt' becomes u'\u1072\u1073\u1074.txt' and similarly for other cases where raw bytes must be printed/inputed So to summarize: make the decimal notation standard for all cases. I am not going to go deeper, such as what digit amount (leading zeros) to use, since it's quite secondary decision. MOTIVATION: 1. Hex notation is hardly readable. It was not designed with readability in mind, so for reading it is not appropriate system, at least with the current character set, which is a mix of digits and letters (curious who was that wize person who invented such a set?). 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, I hope no need to explain why. So that's it, in short. Feel free to discuss and comment. Regards, Mikhail From elliot.gorokhovsky at gmail.com Wed Oct 12 17:35:26 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 21:35:26 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 5:36 AM Paul Moore wrote: > On 12 October 2016 at 11:16, Steven D'Aprano wrote: > > On Wed, Oct 12, 2016 at 12:25:16AM +0000, Elliot Gorokhovsky wrote: > > > >> Regarding generalization: the general technique for special-casing is > you > >> just substitute all type checks with 1 or 0 by applying the type > assumption > >> you're making. That's the only way to guarantee it's safe and compliant. > > > > I'm confused -- I don't understand how *removing* type checks can > > possible guarantee the code is safe and compliant. > > > > It's all very well and good when you are running tests that meet your > > type assumption, but what happens if they don't? If I sort a list made > > up of (say) mixed int and float (possibly including subclasses), does > > your "all type checks are 1 or 0" sort segfault? If not, why not? > > Where's the safety coming from? > > My understanding is that the code does a pre-check that all the > elements of the list are the same type (float, for example). This is a > relatively quick test (O(n) pointer comparisons). Yes, that's correct. I'd like to emphasize that I'm not "*removing* type checks" -- I'm checking them in advance, and then substituting in the values I already know are correct. To put it rigorously: there are expressions of the form PyWhatever_Check. I can be eager or lazy about how I calculate these. The current implementation is lazy: it waits until the values are actually called for before calculating them. This is expensive, because they are called for many, many times. My implementation is eager: I calculate all the values in advance, and then if they all happen to be the same, I plug in that value (1 or 0 as the case may be) wherever it appears in the code. If they don't happen to all be the same, like for "mixed int and float", then I just don't change anything and use the default implementation. 
The code for this is really very simple:

    int keys_are_all_same_type = 1;
    PyTypeObject* key_type = lo.keys[0]->ob_type;
    for (i = 0; i < saved_ob_size; i++){
        if (lo.keys[i]->ob_type != key_type){
            keys_are_all_same_type = 0;
            break;
        }
    }
    if (keys_are_all_same_type){
        if (key_type == &PyUnicode_Type)
            compare_function = unsafe_unicode_compare;
        else if (key_type == &PyLong_Type)
            compare_function = unsafe_long_compare;
        else if (key_type == &PyFloat_Type)
            compare_function = unsafe_float_compare;
        else
            compare_function = key_type->tp_richcompare;
    }
    else {
        compare_function = PyObject_RichCompare;
    }

Those unsafe_whatever* functions are derived by substituting in, like I said, the known values for the typechecks (known since keys_are_all_same_type=1 and key_type = whatever) in the existing implementations of the compare functions. Hope everything is clear now! Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Oct 12 17:39:14 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 12 Oct 2016 14:39:14 -0700 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 2:19 PM, Elliot Gorokhovsky wrote: [...] > So that was the motivation for all this. Actually, if I wrote this for > python 2, I might be able to get even better numbers (at least for strings), > since we can't use strcmp in python 3. (Actually, I've heard UTF-8 strings > are strcmp-able, so maybe if we go through and verify all the strings are > UTF-8 we can strcmp them? I don't know enough about how PyUnicode stuff > works to do this safely). My string special case currently just bypasses the > typechecks and goes to unicode_compare(), which is still wayyy overkill for > the common case of ASCII or Latin-1 strings, since it uses a for loop to go > through and check characters, and strcmp uses compiler magic to do it in > like, negative time or something. I even PyUnicode_READY the strings before > comparing; I'm not sure if that's really necessary, but that's how > PyUnicode_Compare does it. It looks like PyUnicode_Compare already has a special case to use memcmp when both of the strings fit into latin1: https://github.com/python/cpython/blob/cfc517e6eba37f1bd61d57bf0dbece9843bff9c8/Objects/unicodeobject.c#L10855-L10860 I suppose the for loops that are used for multibyte strings could potentially be sped up with SIMD or something, but that gets complicated fast, and modern compilers might even be doing it already. -n -- Nathaniel J. Smith -- https://vorpus.org From spencerb21 at live.com Wed Oct 12 17:39:16 2016 From: spencerb21 at live.com (Spencer Brown) Date: Wed, 12 Oct 2016 21:39:16 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> , Message-ID: The semantics seem fairly obvious if you treat it as changing the method calls. For lists, * uses .extend() instead of .append(). Sets use * for .update() instead of .add(). Dicts use ** for .update() instead of __setitem__. In that case x should be a mapping (or iterable of pairs maybe), and all pairs in that should be added to the dict. In generator expressions * means yield from instead of just yield. The ** in dicts is needed to distinguish between set and dict comprehensions, since it doesn't use a colon. Spencer On 13 Oct. 2016, at 6:41 am, אלעזר
> wrote: To be honest, I don't have a clear picture of what {**x for x in d.items()} should be. But I do have such picture for dict(**x for x in many_dictionaries) Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Oct 12 17:41:20 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Oct 2016 08:41:20 +1100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Thu, Oct 13, 2016 at 8:19 AM, Elliot Gorokhovsky wrote: > > My first question was how expensive python compares are vs C compares. And > since python 2 has PyString_AS_STRING, which just gives you a char* pointer > to a C string, I went in and replaced PyObject_RichCompareBool with strcmp > and did a simple benchmark. And I was just totally blown away; it turns out > you get something like a 40-50% improvement (at least on my simple > benchmark). > > So that was the motivation for all this. Actually, if I wrote this for > python 2, I might be able to get even better numbers (at least for strings), > since we can't use strcmp in python 3. (Actually, I've heard UTF-8 strings > are strcmp-able, so maybe if we go through and verify all the strings are > UTF-8 we can strcmp them? I don't know enough about how PyUnicode stuff > works to do this safely). I'm not sure what you mean by "strcmp-able"; do you mean that the lexical ordering of two Unicode strings is guaranteed to be the same as the byte-wise ordering of their UTF-8 encodings? I don't think that's true, but then, I'm not entirely sure how Python currently sorts strings. Without knowing which language the text represents, it's not possible to sort perfectly. https://en.wikipedia.org/wiki/Collation#Automated_collation """ Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in German dictionaries the word ökonomisch comes between offenbar and olfaktorisch, while Turkish dictionaries treat o and ö as different letters, placing oyun before öbür. """ Which means these lists would already be considered sorted, in their respective languages:

rosuav at sikorsky:~$ python3
Python 3.7.0a0 (default:a78446a65b1d+, Sep 29 2016, 02:01:55)
[GCC 6.1.1 20160802] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> sorted(["offenbar", "ökonomisch", "olfaktorisch"])
['offenbar', 'olfaktorisch', 'ökonomisch']
>>> sorted(["oyun", "öbür", "parıldıyor"])
['oyun', 'parıldıyor', 'öbür']

So what's Python doing? Is it a codepoint ordering? ChrisA From elliot.gorokhovsky at gmail.com Wed Oct 12 17:43:35 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 21:43:35 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 3:39 PM Nathaniel Smith wrote: > It looks like PyUnicode_Compare already has a special case to use > memcmp when both of the strings fit into latin1: > Wow! That's great! I didn't even try reading through unicode_compare, because I felt I might miss some subtle detail that would break everything. But ya, that's great! Since surely latin1 is the most common use case. So I'll just add a latin1 check in the check-loop, and then I'll have two unsafe_unicode_compare functions.
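A quick pure-Python sanity check of the assumption behind that memcmp fast path -- namely that byte-wise comparison of the 1-byte (latin-1 range) representation gives the same answer as Python's codepoint-by-codepoint comparison. This is only an illustrative sketch, not part of the patch:

>>> import itertools
>>> words = ['abc', 'Abc', 'abd', 'ab', '\xe9tude', 'zoo', '\xff', '']
>>> all((a < b) == (a.encode('latin-1') < b.encode('latin-1'))
...     for a, b in itertools.permutations(words, 2))
True

It holds because latin-1 maps code points 0-255 straight onto byte values 0-255, so the raw buffers sort in exactly the same order as the code points; with the 2- and 4-byte representations the platform's byte order gets in the way, which is presumably why the existing fast path is limited to the 1-byte case.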
I felt bad about not being able to get the same kind of string performance I had gotten with python2, so this is nice. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Wed Oct 12 17:45:38 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 21:45:38 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: > So what's Python doing? Is it a codepoint ordering? > ...ya...how is the python interpreter supposed to know what language strings are in? There is a unique ordering of unicode strings defined by the unicode standard, AFAIK. If you want to sort by natural language ordering, see here: https://pypi.python.org/pypi/natsort -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Wed Oct 12 17:48:15 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 12 Oct 2016 23:48:15 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: Message-ID: <57FEAF9F.5020103@egenix.com> On 12.10.2016 23:33, Mikhail V wrote: > Hello all, > > I want to share my thoughts about syntax improvements regarding > character representation in Python. > I am new to the list so if such a discussion or a PEP exists already, > please let me know. > > So in short: > > Currently Python uses hexadecimal notation > for characters for input and output. > For example let's take a unicode string "???.txt" > (a file named with first three Cyrillic letters). > > Now printing it we get: > > u'\u0430\u0431\u0432.txt' Hmm, in Python3, I get: >>> s = "???.txt" >>> s '???.txt' > So one sees that we have hex numbers here. > Same is for typing in the strings which obviously also uses hex. > Same is for some parts of the Python documentation, > especially those about unicode strings. > > PROPOSAL: > 1. Remove all hex notation from printing functions, typing, > documention. > So for printing functions leave the hex as an "option", > for example for those who feel the need for hex representation, > which is strange IMO. > 2. Replace it with decimal notation, in this case e.g: > > u'\u0430\u0431\u0432.txt' becomes > u'\u1072\u1073\u1074.txt' > > and similarly for other cases where raw bytes must be printed/inputed > So to summarize: make the decimal notation standard for all cases. > I am not going to go deeper, such as what digit amount (leading zeros) > to use, since it's quite secondary decision. > > MOTIVATION: > 1. Hex notation is hardly readable. It was not designed with readability > in mind, so for reading it is not appropriate system, at least with the > current character set, which is a mix of digits and letters (curious who > was that wize person who invented such a set?). > 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, > I hope no need to explain why. > > So that's it, in short. > Feel free to discuss and comment. The hex notation for \uXXXX is a standard also used in many other programming languages, it's also easier to parse, so I don't think we should change this default. Take e.g. >>> s = "\u123456" >>> s '?56' With decimal notation, it's not clear where to end parsing the digit notation. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 12 2016) >>> Python Projects, Coaching and Consulting ... 
http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From tomuxiong at gmail.com Wed Oct 12 17:50:36 2016 From: tomuxiong at gmail.com (Thomas Nyberg) Date: Wed, 12 Oct 2016 17:50:36 -0400 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: Message-ID: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> On 10/12/2016 05:33 PM, Mikhail V wrote: > Hello all, Hello! New to this list so not sure if I can reply here... :) > > Now printing it we get: > > u'\u0430\u0431\u0432.txt' > By "printing it", do you mean "this is the string representation"? I would presume printing it would show characters nicely rendered. Does it not for you? > > and similarly for other cases where raw bytes must be printed/inputed > So to summarize: make the decimal notation standard for all cases. > I am not going to go deeper, such as what digit amount (leading zeros) > to use, since it's quite secondary decision. Since when was decimal notation "standard"? It seems to be quite the opposite. For unicode representations, byte notation seems standard. > MOTIVATION: > 1. Hex notation is hardly readable. It was not designed with readability > in mind, so for reading it is not appropriate system, at least with the > current character set, which is a mix of digits and letters (curious who > was that wize person who invented such a set?). This is an opinion. I should clarify that for many cases I personally find byte notation much simpler. In this case, I view it as a toss up though for something like utf8-encoded text I would had it if I saw decimal numbers and not bytes. > 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, > I hope no need to explain why. Still not sure which "mixing" you refer to. > > So that's it, in short. > Feel free to discuss and comment. > > Regards, > Mikhail Cheers, Thomas From njs at pobox.com Wed Oct 12 17:51:21 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 12 Oct 2016 14:51:21 -0700 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: The comparison methods on Python's str are codepoint-by-codepoint. A neat fact about UTF-8 is that bytewise comparisons on UTF-8 are equivalent to doing codepoint comparisons. But this isn't relevant to Python's str, because Python's str never uses UTF-8. -n On Wed, Oct 12, 2016 at 2:45 PM, Elliot Gorokhovsky wrote: > >> So what's Python doing? Is it a codepoint ordering? > > > ...ya...how is the python interpreter supposed to know what language strings > are in? There is a unique ordering of unicode strings defined by the unicode > standard, AFAIK. > If you want to sort by natural language ordering, see here: > https://pypi.python.org/pypi/natsort > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Nathaniel J. 
Smith -- https://vorpus.org From tjreedy at udel.edu Wed Oct 12 17:52:30 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 12 Oct 2016 17:52:30 -0400 Subject: [Python-ideas] Add a method to get the subset of a dictionnary. In-Reply-To: <0945e829-6936-9bb9-5d4f-5c85ef01fd69@gmail.com> References: <0945e829-6936-9bb9-5d4f-5c85ef01fd69@gmail.com> Message-ID: On 10/12/2016 12:06 PM, Enguerrand Pelletier wrote: > Hi all, > > It always bothered me to write something like this when i want to strip > keys from a dictionnary in Python: > > a = {"foo": 1, "bar": 2, "baz": 3, "foobar": 42} > interesting_keys = ["foo", "bar", "baz"] If the keys are hashable, this should be a set. > b = {k, v for k,v in a.items() if k in interesting_keys} Test code before posting. The above is a set comprehension creating a set of tupes. For a dict, 'k, v' must be 'k:v'. > Wouldn't it be nice to have a syntactic sugar such as: > > b = a.subset(interesting_keys) It is pretty rare for the filter condition to be exactly 'key in explicit_keys'. If it is, one can directly construct the dict from a and explict_keys. b = {k:a[k] for k in interesting_keys} The syntactic sugar wrapping this would save 6 keypresses. Interesting_keys can be any iterable. To guarantee no KeyErrors, add 'if k in a'. -- Terry Jan Reedy From elliot.gorokhovsky at gmail.com Wed Oct 12 17:57:58 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 21:57:58 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > But this isn't relevant to Python's str, because Python's str never uses > UTF-8. > Really? I thought in python 3, strings are all unicode... so what encoding do they use, then? -------------- next part -------------- An HTML attachment was scrubbed... URL: From danilo.bellini at gmail.com Wed Oct 12 17:58:46 2016 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Wed, 12 Oct 2016 18:58:46 -0300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <57FEAF9F.5020103@egenix.com> References: <57FEAF9F.5020103@egenix.com> Message-ID: I'm -1 on this. Just type "0431 unicode" on your favorite search engine. U+0431 is the codepoint, not whatever digits 0x431 has in decimal. That's a tradition and something external to Python. As a related concern, I think using decimal/octal on raw data is a terrible idea (e.g. On Linux, I always have to re-format the "cmp -l" to really grasp what's going on, changing it to hexadecimal). Decimal notation is hardly readable when we're dealing with stuff designed in base 2 (e.g. due to the visual separation of distinct bytes). How many people use "hexdump" (or any binary file viewer) with decimal output instead of hexadecimal? I agree that mixing representations for the same abstraction (using decimal in some places, hexadecimal in other ones) can be a bad idea. Actually, that makes me believe "decimal unicode codepoint" shouldn't ever appear in string representations. -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Wed Oct 12 18:03:59 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 12 Oct 2016 18:03:59 -0400 Subject: [Python-ideas] Add a method to get the subset of a dictionnary. In-Reply-To: References: <0945e829-6936-9bb9-5d4f-5c85ef01fd69@gmail.com> Message-ID: On 10/12/2016 5:52 PM, Terry Reedy wrote: > On 10/12/2016 12:06 PM, Enguerrand Pelletier wrote: >> b = {k, v for k,v in a.items() if k in interesting_keys} > > Test code before posting. The above is a set comprehension creating a > set of tupes. I should have followed my own advice. The above is a SyntaxError until 'k,v' is wrapped in parens, '(k,v)'. -- Terry Jan Reedy From p.f.moore at gmail.com Wed Oct 12 18:07:28 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 12 Oct 2016 23:07:28 +0100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On 12 October 2016 at 22:57, Elliot Gorokhovsky wrote: > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: >> >> But this isn't relevant to Python's str, because Python's str never uses >> UTF-8. > > > Really? I thought in python 3, strings are all unicode... so what encoding > do they use, then? They are stored internally as arrays of code points, 1-byte (0-255) if all code points fit in that range, otherwise 2-byte or if needed 4 byte. See PEP 393 (https://www.python.org/dev/peps/pep-0393/) for details. Paul From alexander.belopolsky at gmail.com Wed Oct 12 18:08:40 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 12 Oct 2016 18:08:40 -0400 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > >> But this isn't relevant to Python's str, because Python's str never uses >> UTF-8. >> > > Really? I thought in python 3, strings are all unicode... so what encoding > do they use, then? > No encoding is used. The actual code points are stored as integers of the same size. If all code points are less than 256, they are stored as 8-bit integers (bytes). If some code points are greater or equal to 256 but less than 65536, they are stored as 16-bit integers and so on. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Oct 12 18:11:19 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Oct 2016 09:11:19 +1100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Thu, Oct 13, 2016 at 8:51 AM, Nathaniel Smith wrote: > The comparison methods on Python's str are codepoint-by-codepoint. Thanks, that's what I wasn't sure of. ChrisA From elliot.gorokhovsky at gmail.com Wed Oct 12 18:14:11 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 22:14:11 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: Ah. That makes a lot of sense, actually. Anyway, so then Latin1 strings are memcmp-able, and others are not. 
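A rough way to see those 1-, 2- and 4-byte storage widths from pure Python, on CPython at least (getsizeof numbers are implementation details, so only the per-character ratio is meaningful here):

>>> import sys
>>> for ch in 'a', '\u0430', '\U0001f600':
...     s = ch * 10000
...     print(hex(ord(ch)), round((sys.getsizeof(s) - sys.getsizeof(ch)) / 10000))
...
0x61 1
0x430 2
0x1f600 4

The jump from 1 to 2 to 4 bytes per character tracks the highest code point in the string, which is exactly the 'kind' a pre-sort type scan would have to look at before deciding that a memcmp-style comparison is safe.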
That's fine; I'll just add a check for that (I think there are already helper functions for this) and then have two special-case string functions. Thanks! On Wed, Oct 12, 2016 at 4:08 PM Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky < > elliot.gorokhovsky at gmail.com> wrote: > > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > > But this isn't relevant to Python's str, because Python's str never uses > UTF-8. > > > Really? I thought in python 3, strings are all unicode... so what encoding > do they use, then? > > > No encoding is used. The actual code points are stored as integers of the > same size. If all code points are less than 256, they are stored as 8-bit > integers (bytes). If some code points are greater or equal to 256 but less > than 65536, they are stored as 16-bit integers and so on. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Wed Oct 12 18:25:03 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 12 Oct 2016 18:25:03 -0400 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161012101621.GR22471@ando.pearwood.info> Message-ID: On 10/12/2016 5:57 PM, Elliot Gorokhovsky wrote: > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith > wrote: > > But this isn't relevant to Python's str, because Python's str never > uses UTF-8. > > > Really? I thought in python 3, strings are all unicode... They are ... > so what encoding do they use, then? Since 3.3, essentially ascii, latin1, utf-16 without surrogates (ucs2), or utf-32, depending on the hightest codepoint. This is the 'kind' field. If we go this route, I suspect that optimizing string sorting will take some experimentation. If the initial item is str, it might be worthwhile to record the highest 'kind' during the type scan, so that strncmp can be used if all are ascii or latin-1. -- Terry Jan Reedy From tjreedy at udel.edu Wed Oct 12 18:27:56 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 12 Oct 2016 18:27:56 -0400 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: <22526.30123.849597.644722@turnbull.sk.tsukuba.ac.jp> References: <20161009002527.GM22471@ando.pearwood.info> <20161012160419.GU22471@ando.pearwood.info> <22526.30123.849597.644722@turnbull.sk.tsukuba.ac.jp> Message-ID: On 10/12/2016 1:40 PM, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I learned the hard way that if I don't put the breaking space at > > the beginning of the next fragment, I probably wouldn't put it at > > the end of the previous fragment either. > > The converse applies in my case, so that actually doesn't matter to > me. When I don't put it in, I don't put it in anywhere. > > What does matter to me is that I rarely make spelling errors > (including typos) or omit internal spaces. That means I can get away > with not reading strings carefully most of the time, and I don't. But > omitted space at the joins of a continued string is frequent, and > frequently caught when I'm following skimming down a suite to the next > syntactic construct. But spaces at end never will be. > > Ie, space-at-beginning makes for more effective review for me. YMMV. I think that PEP 8 should not recommend either way. -- Terry Jan Reedy From alexander.belopolsky at gmail.com Wed Oct 12 18:34:14 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 12 Oct 2016 18:34:14 -0400 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! 
In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 6:14 PM, Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > so then Latin1 strings are memcmp-able, and others are not. No. Strings of the same kind are "memcmp-able" regardless of their kind. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Wed Oct 12 18:41:05 2016 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 12 Oct 2016 23:41:05 +0100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161012101621.GR22471@ando.pearwood.info> Message-ID: On 2016-10-12 23:34, Alexander Belopolsky wrote: > > On Wed, Oct 12, 2016 at 6:14 PM, Elliot Gorokhovsky > > wrote: > > so then Latin1 strings are memcmp-able, and others are not. > > > No. Strings of the same kind are "memcmp-able" regardless of their kind. > Surely that's true only if they're big-endian. From njs at pobox.com Wed Oct 12 18:41:54 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 12 Oct 2016 15:41:54 -0700 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 3:34 PM, Alexander Belopolsky wrote: > > On Wed, Oct 12, 2016 at 6:14 PM, Elliot Gorokhovsky > wrote: >> >> so then Latin1 strings are memcmp-able, and others are not. > > > No. Strings of the same kind are "memcmp-able" regardless of their kind. I don't think this is true on little-endian systems. -n -- Nathaniel J. Smith -- https://vorpus.org From greg.ewing at canterbury.ac.nz Wed Oct 12 18:44:59 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 11:44:59 +1300 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: <57FEBCEB.4070800@canterbury.ac.nz> Paul Moore wrote: > What I'm *not* quite clear on is why Python 3's change to reject > comparisons between unrelated types makes this optimisation possible. I think the idea was that it's likely to be *useful* a higher proportion of the time, because Python 3 programmers have to be careful that the types they're sorting are compatible. I'm not sure how true that is -- just because you *could* sort lists containing a random selection of types in Python 2 doesn't necessarily mean it was done often. -- Greg From mikhailwas at gmail.com Wed Oct 12 19:06:04 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 13 Oct 2016 01:06:04 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <57FEAF9F.5020103@egenix.com> References: <57FEAF9F.5020103@egenix.com> Message-ID: Forgot to reply to all, duping my mesage... On 12 October 2016 at 23:48, M.-A. Lemburg wrote: > Hmm, in Python3, I get: > >>>> s = "???.txt" >>>> s > '???.txt' I posted output with Python2 and Windows 7 BTW , In Windows 10 'print' won't work in cmd console at all by default with unicode but thats another story, let us not go into that. I think you get my idea right, it is not only about printing. > The hex notation for \uXXXX is a standard also used in many other > programming languages, it's also easier to parse, so I don't > think we should change this default. 
In programming literature it is used often, but let me point out that decimal is THE standard and is much much better standard in sence of readability. And there is no solid reason to use 2 standards at the same time. > > Take e.g. > >>>> s = "\u123456" >>>> s > '?56' > > With decimal notation, it's not clear where to end parsing > the digit notation. How it is not clear if the digit amount is fixed? Not very clear what did you mean. From mikhailwas at gmail.com Wed Oct 12 19:09:58 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 13 Oct 2016 01:09:58 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: On 12 October 2016 at 23:58, Danilo J. S. Bellini wrote: > Decimal notation is hardly > readable when we're dealing with stuff designed in base 2 (e.g. due to the > visual separation of distinct bytes). Hmm what keeps you from separateting the logical units to be represented each by a decimal number? like 001 023 255 ... Do you really think this is less readable than its hex equivalent? Then you are probably working with hex numbers only, but I doubt that. > I agree that mixing representations for the same abstraction (using decimal > in some places, hexadecimal in other ones) can be a bad idea. "Can be"? It is indeed a horrible idea. Also not only for same abstraction but at all. > makes me believe "decimal unicode codepoint" shouldn't ever appear in string > representations. I use this site to look the chars up: http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html PS: that is rather peculiar, three negative replies already but with no strong arguments why it would be bad to stick to decimal only, only some "others do it so" and "tradition" arguments. The "base 2" argument could work at some grade but if stick to this criteria why not speak about octal/quoternary/binary then? Please note, I am talking only about readability _of the character set_ actually. And it is not including your habit issues, but rather is an objective criteria for using this or that character set. And decimal is objectively way more readable than hex standard character set, regardless of how strong your habits are. From mikhailwas at gmail.com Wed Oct 12 19:13:12 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 13 Oct 2016 01:13:12 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> Message-ID: On 12 October 2016 at 23:50, Thomas Nyberg wrote: > Since when was decimal notation "standard"? Depends on what planet do you live. I live on planet Earth. And you? > opposite. For unicode representations, byte notation seems standard. How does this make it a good idea? Consider unicode table as an array with glyphs. Now the index of the array is suddenly represented in some obscure character set. How this index is other than index of any array or natural number? Think about it... >> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, >> I hope no need to explain why. > > Still not sure which "mixing" you refer to. Still not sure? These two words in brackets. Mixing those two systems. From elliot.gorokhovsky at gmail.com Wed Oct 12 19:20:48 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 12 Oct 2016 23:20:48 +0000 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! 
In-Reply-To: References: <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 4:26 PM Terry Reedy wrote: > I suspect that optimizing string sorting > will take some experimentation. If the initial item is str, it might be > worthwhile to record the highest 'kind' during the type scan, so that > strncmp can be used if all are ascii or latin-1. > My thoughts exactly. One other optimization along these lines: the reason ints don't give quite as shocking results as floats is that comparisons are a bit more expensive: one first has to check that the int would fit in a c long before comparing; if not, then a custom procedure has to be used. However, in practice ints being sorted are almost always smaller in absolute value than 2**32 or whatever. So I think, just as it might pay off to check for latin-1 and use strcmp, it may also pay off to check for fits-in-a-C-long and use a custom function for that case as well, since the performance would be precisely as awesome as the float performance that started this thread: comparisons would just be the cost of pointer dereference plus the cost of C long comparison, i.e. the minimum possible cost. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Oct 12 19:29:43 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Oct 2016 10:29:43 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> Message-ID: <20161012232943.GV22471@ando.pearwood.info> On Wed, Oct 12, 2016 at 06:32:12PM +0200, Sven R. Kunze wrote: > On 12.10.2016 17:41, Nick Coghlan wrote: > >This particular proposal fails on the first question (as too many > >people would expect it to mean the same thing as either "[*expr, for > >expr in iterable]" or "[*(expr for expr in iterable)]") > > So, my reasoning would tell me: where have I seen * so far? *args and > **kwargs! And multiplication. And sequence unpacking. > [...] is just the list constructor. Also indexing: dict[key] or sequence[item or slice]. The list constructor would be either list(...) or possibly list.__new__. [...] is either a list display: [1, 2, 3, 4] or a list comprehension. They are not the same thing, and they don't work the same way. The only similarity is that they use [ ] as delimiters, just like dict and sequence indexing. That doesn't mean that you can write: mydict[x for x in seq if condition] Not everything with [ ] is the same. > So, putting those two pieces together is quite simple. I don't see that it is simple at all. I don't see any connection between function *args and list comprehension loop variables. > Furthermore, your two "interpretations" would yield the very same result > as [expr for expr in iterable] which doesn't match with my experience > with Python so far; especially when it comes to special characters. They > must mean something. So, a simple "no-op" would not match my expectations. Just because something would otherwise be a no-op doesn't mean that it therefore has to have some magical meaning. Python has a few no-ops which are allowed, or required, by syntax but don't do anything. pass (x) # same as just x +1 # no difference between literals +1 and 1 -0 func((expr for x in iterable)) # redundant parens for generator expr There may be more. > >but it fails on the other two grounds as well. 
> > Here I disagree with you. We use *args all the time, so we know what * > does. I don't understand why this should not work in between brackets [...]. By this logic, *t should work... everywhere? while *args: try: raise *args except *args: del *args That's not how Python works. Just because syntax is common, doesn't mean it has to work everywhere. We cannot write: for x in import math: ... even though importing is common. *t doesn't work as the expression inside a list comprehension because that's not how list comps work. To make it work requires making this a special case and mapping [expr for t in iterable] to a list append, while [*expr for t in iterable] gets mapped to a list extend. Its okay to want that as a special feature, but understand what you are asking for: you're not asking for some restriction to be lifted, which will then automatically give you the functionality you expect. You're asking for new functionality to be added. Sequence unpacking inside list comprehensions as a way of flattening a sequence is completely new functionality which does not logically follow from the current semantics of comprehensions. > >In most uses of *-unpacking it's adding entries to a comma-delimited > >sequence, or consuming entries in a comma delimited sequence (the > >commas are optional in some cases, but they're still part of the > >relevant contexts). The expansions removed the special casing of > >functions, and made these capabilities generally available to all > >sequence definition operations. > > I don't know what you mean by comma-delimited sequence. There are no > commas. It's just a list of entries. * adds entries to this list. (At > least from my point of view.) Not all points of view are equally valid. -- Steve From steve at pearwood.info Wed Oct 12 19:34:40 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Oct 2016 10:34:40 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: <20161012233440.GW22471@ando.pearwood.info> On Wed, Oct 12, 2016 at 04:11:55PM +0000, ????? wrote: > Steve, you only need to allow multiple arguments to append(), then it makes > perfect sense. I think you're missing a step. What will multiple arguments given to append do? There are two obvious possibilities: - collect all the arguments into a tuple, and append the tuple; - duplicate the functionality of list.extend neither of which appeals to me. -- Steve From elazarg at gmail.com Wed Oct 12 19:48:26 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Wed, 12 Oct 2016 23:48:26 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161012233440.GW22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161012233440.GW22471@ando.pearwood.info> Message-ID: On Thu, Oct 13, 2016 at 2:35 AM Steven D'Aprano wrote: > On Wed, Oct 12, 2016 at 04:11:55PM +0000, ????? wrote: > > > Steve, you only need to allow multiple arguments to append(), then it > makes > > perfect sense. > > I think you're missing a step. What will multiple arguments given to > append do? There are two obvious possibilities: > > - collect all the arguments into a tuple, and append the tuple; > > - duplicate the functionality of list.extend > > > neither of which appeals to me. > The latter, of course. Similar to max(). Not unheard of. Elazar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Wed Oct 12 19:50:30 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Oct 2016 10:50:30 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V wrote: > On 12 October 2016 at 23:58, Danilo J. S. Bellini > wrote: > >> Decimal notation is hardly >> readable when we're dealing with stuff designed in base 2 (e.g. due to the >> visual separation of distinct bytes). > > Hmm what keeps you from separateting the logical units to be represented each > by a decimal number? like 001 023 255 ... > Do you really think this is less readable than its hex equivalent? > Then you are probably working with hex numbers only, but I doubt that. Way WAY less readable, and I'm comfortable working in both hex and decimal. >> I agree that mixing representations for the same abstraction (using decimal >> in some places, hexadecimal in other ones) can be a bad idea. > "Can be"? It is indeed a horrible idea. Also not only for same abstraction > but at all. > >> makes me believe "decimal unicode codepoint" shouldn't ever appear in string >> representations. > I use this site to look the chars up: > http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html You're the one who's non-standard here. Most of the world uses hex for Unicode codepoints. http://unicode.org/charts/ HTML entities permit either decimal or hex, but other than that, I can't think of any common system that uses decimal for Unicode codepoints in strings. > PS: > that is rather peculiar, three negative replies already but with no strong > arguments why it would be bad to stick to decimal only, only some > "others do it so" and "tradition" arguments. "Others do it so" is actually a very strong argument. If all the rest of the world uses + to mean addition, and Python used + to mean subtraction, it doesn't matter how logical that is, it is *wrong*. Most of the world uses U+201C or "\u201C" to represent a curly double quote; if you us 0x93, you are annoyingly wrong, and if you use 8220, everyone has to do the conversion from that to 201C. Yes, these are all differently-valid standards, but that doesn't make it any less annoying. > Please note, I am talking only about readability _of the character > set_ actually. > And it is not including your habit issues, but rather is an objective > criteria for using this or that character set. > And decimal is objectively way more readable than hex standard character set, > regardless of how strong your habits are. How many decimal digits would you use to denote a single character? Do you have to pad everything to seven digits (\u0000034 for an ASCII quote)? And if not, how do you mark the end? This is not "objectively more readable" if the only gain is "no A-F" and the loss is "unpredictable length". ChrisA From mikhailwas at gmail.com Wed Oct 12 21:56:59 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 13 Oct 2016 03:56:59 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: On 13 October 2016 at 01:50, Chris Angelico wrote: > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V wrote: >> On 12 October 2016 at 23:58, Danilo J. S. Bellini >> wrote: >> >>> Decimal notation is hardly >>> readable when we're dealing with stuff designed in base 2 (e.g. due to the >>> visual separation of distinct bytes). 
>> >> Hmm what keeps you from separateting the logical units to be represented each >> by a decimal number? like 001 023 255 ... >> Do you really think this is less readable than its hex equivalent? >> Then you are probably working with hex numbers only, but I doubt that. > > Way WAY less readable, and I'm comfortable working in both hex and decimal. Please don't mix the readability and personal habit, which previuos repliers seems to do as well. Those two things has nothing to do with each other. If you are comfortable with old roman numbering system this does not make it readable. And I am NOT comfortable with hex, as well as most people would be glad to use single notation. But some of them think that they are cool because they know several numbering notations ;) But I bet few can actually understand which is more readable. > You're the one who's non-standard here. Most of the world uses hex for > Unicode codepoints. No I am not the one, many people find it silly to use different notations for same thing - index of the element, and they are very right about that. I am not silly, I refuse to use it and luckily I can. Also I know that decimal is more readable than hex so my choice is supportend by the understanding and not simply refusing. > >> PS: >> that is rather peculiar, three negative replies already but with no strong >> arguments why it would be bad to stick to decimal only, only some >> "others do it so" and "tradition" arguments. > > "Others do it so" is actually a very strong argument. If all the rest > of the world uses + to mean addition, and Python used + to mean > subtraction, it doesn't matter how logical that is, it is *wrong*. This actually supports my proposal perfectly, if everyone uses decimal why suddenly use hex for same thing - index of array. I don't see how your analogy contradicts with my proposal, it's rather supporting it. > quote; if you us 0x93, you are annoyingly wrong, Please don't make personal assessments here, I can use whatever I want, moreover I find this notation as silly as using different measurement systems without any reason and within one activity, and in my eyes this is annoyingly wrong and stupid, but I don't call nobody here stupid. But I do want that you could abstract yourself from your habit for a while and talk about what would be better for the future usage. > everyone has to do the conversion from that to 201C. Nobody need to do ANY conversions if use decimal, and as said everything is decimal: numbers, array indexes, ord() function returns decimal, you can imagine more examples so it is not only more readable but also more traditional. > How many decimal digits would you use to denote a single character? for text, three decimal digits would be enough for me personally, and in long perspective when the world's alphabetical garbage will dissapear, two digits would be ok. > you have to pad everything to seven digits (\u0000034 for an ASCII > quote)? Depends on case, for input - some separator, or padding is also ok, I don't have problems with both. For printing obviously don't show leading zeros, but rather spaces. But as said I find this Unicode only some temporary happening, it will go to history in some future and be used only to study extinct glyphs. 
Mikhail From brenbarn at brenbarn.net Wed Oct 12 22:18:11 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Wed, 12 Oct 2016 19:18:11 -0700 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: <57FEEEE3.7050109@brenbarn.net> On 2016-10-12 18:56, Mikhail V wrote: > Please don't mix the readability and personal habit, which previuos > repliers seems to do as well. Those two things has nothing > to do with each other. You keep saying this, but it's quite incorrect. The usage of decimal notation is itself just a convention, and the only reason it's easy for you (and for many other people) is because you're used to it. If you had grown up using only hexadecimal or binary, you would find decimal awkward. There is nothing objectively better about base 10 than any other place-value numbering system. Decimal is just a habit. Now, it's true that base-10 is at this point effectively universal across human societies, and that gives it a certain claim to primacy. But base-16 (along with base 2) is also quite common in computing contexts. Saying we should dump hex notation because everyone understands decimal is like saying that all signs in Prague should only be printed in English because there are more English speakers in the entire world than Czech speakers. But that ignores the fact that there are more Czech speakers *in Prague*. Likewise, decimal may be more common as an overall numerical notation, but when it comes to referring to Unicode code points, hexadecimal is far and away more common. Just look at the Wikipedia page for Unicode, which says: "Normally a Unicode code point is referred to by writing "U+" followed by its hexadecimal number." That's it. You'll find the same thing on unicode.org. The unicode code point is hardly even a number in the usual sense; it's just a label that identifies the character. If you have an issue with using hex to represent unicode code points, your issue goes way beyond Python, and you need to take it up with the Unicode consortium. (Good luck with that.) -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From rosuav at gmail.com Wed Oct 12 22:24:50 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Oct 2016 13:24:50 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: On Thu, Oct 13, 2016 at 12:56 PM, Mikhail V wrote: > But as said I find this Unicode only some temporary happening, > it will go to history in some future and be > used only to study extinct glyphs. And what will we be using instead? Morbid curiosity trumping a plonking, for the moment. ChrisA From rymg19 at gmail.com Wed Oct 12 22:33:33 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 12 Oct 2016 21:33:33 -0500 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: Message-ID: On Oct 12, 2016 4:33 PM, "Mikhail V" wrote: > > Hello all, > > *snip* > > PROPOSAL: > 1. Remove all hex notation from printing functions, typing, > documention. > So for printing functions leave the hex as an "option", > for example for those who feel the need for hex representation, > which is strange IMO. > 2. 
Replace it with decimal notation, in this case e.g: > > u'\u0430\u0431\u0432.txt' becomes > u'\u1072\u1073\u1074.txt' > > and similarly for other cases where raw bytes must be printed/inputed > So to summarize: make the decimal notation standard for all cases. > I am not going to go deeper, such as what digit amount (leading zeros) > to use, since it's quite secondary decision. > If decimal notation isn't used for parsing, only for printing, it would be confusing as heck, but using it for both would break a lot of code in subtle ways (the worst kind of code breakage). > MOTIVATION: > 1. Hex notation is hardly readable. It was not designed with readability > in mind, so for reading it is not appropriate system, at least with the > current character set, which is a mix of digits and letters (curious who > was that wize person who invented such a set?). The Unicode standard. I agree that hex is hard to read, but the standard uses it to refer to the code points. It's great to be able to google code points and find the characters easily, and switching to decimal would screw it up. And I've never seen someone *need* to figure out the decimal version from the hex before. It's far more likely to google the hex #. TL;DR: I think this change would induce a LOT of short-term issues, despite it being up in the air if there's any long-term gain. So -1 from me. > 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, > I hope no need to explain why. > Indeed, you don't. :) > So that's it, in short. > Feel free to discuss and comment. > > Regards, > Mikhail > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan (????) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Oct 12 22:35:15 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 12 Oct 2016 21:35:15 -0500 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: On Oct 12, 2016 9:25 PM, "Chris Angelico" wrote: > > On Thu, Oct 13, 2016 at 12:56 PM, Mikhail V wrote: > > But as said I find this Unicode only some temporary happening, > > it will go to history in some future and be > > used only to study extinct glyphs. > > And what will we be using instead? > Emoji, of course! What else? > Morbid curiosity trumping a plonking, for the moment. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan (????) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Wed Oct 12 22:44:25 2016 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 13 Oct 2016 03:44:25 +0100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: <8a7ce00d-0c73-d850-188b-b97e16262ad1@mrabarnett.plus.com> On 2016-10-13 00:50, Chris Angelico wrote: > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V wrote: >> On 12 October 2016 at 23:58, Danilo J. S. Bellini >> wrote: >> >>> Decimal notation is hardly >>> readable when we're dealing with stuff designed in base 2 (e.g. due to the >>> visual separation of distinct bytes). >> >> Hmm what keeps you from separateting the logical units to be represented each >> by a decimal number? like 001 023 255 ... >> Do you really think this is less readable than its hex equivalent? >> Then you are probably working with hex numbers only, but I doubt that. > > Way WAY less readable, and I'm comfortable working in both hex and decimal. > >>> I agree that mixing representations for the same abstraction (using decimal >>> in some places, hexadecimal in other ones) can be a bad idea. >> "Can be"? It is indeed a horrible idea. Also not only for same abstraction >> but at all. >> >>> makes me believe "decimal unicode codepoint" shouldn't ever appear in string >>> representations. >> I use this site to look the chars up: >> http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html > > You're the one who's non-standard here. Most of the world uses hex for > Unicode codepoints. > > http://unicode.org/charts/ > > HTML entities permit either decimal or hex, but other than that, I > can't think of any common system that uses decimal for Unicode > codepoints in strings. > >> PS: >> that is rather peculiar, three negative replies already but with no strong >> arguments why it would be bad to stick to decimal only, only some >> "others do it so" and "tradition" arguments. > > "Others do it so" is actually a very strong argument. If all the rest > of the world uses + to mean addition, and Python used + to mean > subtraction, it doesn't matter how logical that is, it is *wrong*. > Most of the world uses U+201C or "\u201C" to represent a curly double > quote; if you us 0x93, you are annoyingly wrong, and if you use 8220, > everyone has to do the conversion from that to 201C. Yes, these are > all differently-valid standards, but that doesn't make it any less > annoying. > >> Please note, I am talking only about readability _of the character >> set_ actually. >> And it is not including your habit issues, but rather is an objective >> criteria for using this or that character set. >> And decimal is objectively way more readable than hex standard character set, >> regardless of how strong your habits are. > > How many decimal digits would you use to denote a single character? Do > you have to pad everything to seven digits (\u0000034 for an ASCII > quote)? And if not, how do you mark the end? This is not "objectively > more readable" if the only gain is "no A-F" and the loss is > "unpredictable length". > Well, Perl doesn't have \u or \U; instead it has extended \x, so you can write, say, \x{201C}. Still in hex, though, as nature intended! 
:-) From vgr255 at live.ca Wed Oct 12 22:49:16 2016 From: vgr255 at live.ca (Emanuel Barry) Date: Thu, 13 Oct 2016 02:49:16 +0000 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: > From: Mikhail V > Sent: Wednesday, October 12, 2016 9:57 PM > Subject: Re: [Python-ideas] Proposal for default character representation Hello, and welcome to Python-ideas, where only a small portion of ideas go further, and where most newcomers that wish to improve the language get hit by the reality bat! I hope you enjoy your stay :) > On 13 October 2016 at 01:50, Chris Angelico wrote: > > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V > wrote: > > > > Way WAY less readable, and I'm comfortable working in both hex and > decimal. > > Please don't mix the readability and personal habit, which previuos > repliers seems to do as well. Those two things has nothing > to do with each other. If you are comfortable with old roman numbering > system this does not make it readable. > And I am NOT comfortable with hex, as well as most people would > be glad to use single notation. > But some of them think that they are cool because they know several > numbering notations ;) But I bet few can actually understand which is more > readable. I'll turn your argument around: Not being comfortable with hex does not make it unreadable; it's a matter of habit (as Brendan pointed out in his separate reply). > > You're the one who's non-standard here. Most of the world uses hex for > > Unicode codepoints. > No I am not the one, many people find it silly to use different notations > for same thing - index of the element, and they are very right about that. > I am not silly, I refuse to use it and luckily I can. Also I know that decimal > is more readable than hex so my choice is supportend by the > understanding and not simply refusing. Unicode code points are represented using hex notation virtually everywhere I ever saw it. Your Unicode-code-points-as-decimal website was a new discovery for me (and, I presume, many others on this list). Since it's widely used in the world, going against that effectively makes you non-standard. That doesn't mean it's necessarily a bad thing, but it does mean that your chances (or anyone's chances) of actually changing that are equal to zero (and this isn't some gross exaggeration), > > > >> PS: > >> that is rather peculiar, three negative replies already but with no strong > >> arguments why it would be bad to stick to decimal only, only some > >> "others do it so" and "tradition" arguments. > > > > "Others do it so" is actually a very strong argument. If all the rest > > of the world uses + to mean addition, and Python used + to mean > > subtraction, it doesn't matter how logical that is, it is *wrong*. > > This actually supports my proposal perfectly, if everyone uses decimal > why suddenly use hex for same thing - index of array. I don't see how > your analogy contradicts with my proposal, it's rather supporting it. I fail to see your point here. Where is that "everyone uses decimal"? Unless you stopped talking about representation in strings (which seems likely, as you're talking about indexing?), everything is represented as hex. > But I do want that you could abstract yourself from your habit for a while > and talk about what would be better for the future usage. I'll be that guy and tell you that you need to step back from your own idea for a while and consider your proposal and the current state of things. 
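(As a concrete point of reference, nothing stops anyone from viewing the ordinals in decimal today; the hex form is only the default repr and the convention of the charts. A rough interactive sketch of standard CPython 3 behaviour, reusing the Cyrillic example from the original post:

    >>> s = '\u0430\u0431\u0432'
    >>> s
    'абв'
    >>> ['U+{:04X}'.format(ord(c)) for c in s]
    ['U+0430', 'U+0431', 'U+0432']
    >>> [ord(c) for c in s]
    [1072, 1073, 1074]

The last line is exactly the decimal view being asked for, with no change to the language.)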
I'll also take the opportunity to reiterate that there is virtually no chance to change this behaviour. This doesn't, however, prevent you or anyone from talking about the topic, either for fun, or for finding other (related or otherwise) areas of interest that you think might be worth investigating further. A lot of threads actually branch off in different topics that came up when discussing, and that are interesting enough to pursue on their own. > > everyone has to do the conversion from that to 201C. > > Nobody need to do ANY conversions if use decimal, > and as said everything is decimal: numbers, array indexes, > ord() function returns decimal, you can imagine more examples > so it is not only more readable but also more traditional. You're mixing up more than just one concept here: - Integer literals; I assume this is what you meant, and you seem to forget (or maybe you didn't know, in which case here's to learning something new!) that 0xff is perfectly valid syntax, and store the integer with the value of 255 in base 10. - Indexing, and that's completely irrelevant to the topic at hand (also see above bullet point). - ord() which returns an integer (which can be interpreted in any base!), and that's both an argument for and against this proposal; the "against" side is actually that decimal notation has no defined boundary for when to stop (and before you argue that it does, I'll point out that the separations, e.g. grouping by the thousands, are culture-driven and not an international standard). There's actually a precedent for this in Python 2 with the \x escape (need I remind anyone why Python 3 was created again? :), but that's exactly a stone in the "don't do that" camp, instead of the other way around. > > How many decimal digits would you use to denote a single character? > > for text, three decimal digits would be enough for me personally, > and in long perspective when the world's alphabetical garbage will > dissapear, two digits would be ok. You seem to have misunderstood the question - in "\u00123456", there is no ambiguity that this is a string consisting of 5 characters; the first one is '\u0012', the second one is '3', the third one is '4', the fourth one is '5', and the last one is '6'. In the string (using \d as a hypothetical escape method; regex gurus can go read #27364 ;) "\d00123456", how many characters does this contain? It's decimal, so should the escape grab the first 5 digits? Or 6 maybe? You tell me. > > you have to pad everything to seven digits (\u0000034 for an ASCII > > quote)? > > Depends on case, for input - > some separator, or padding is also ok, > I don't have problems with both. For printing obviously don't show > leading zeros, but rather spaces. No leading zeros? That means you don't have a fixed number of digits, and your string is suddenly very ambiguous (also see my point above). > But as said I find this Unicode only some temporary happening, > it will go to history in some future and be > used only to study extinct glyphs. Unicode, a temporary happening? Well, strictly speaking, nobody can know that, but I'd expect that it's going to, someday, be *the* common standard. I'm not bathed in illusion, though. > Mikhail All in all, that's a pretty interesting idea. 
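(For reference, the fixed four-digit width of \u is what resolves the ambiguity in the example above; a quick sketch of current CPython behaviour:

    >>> s = "\u00123456"
    >>> len(s)
    5
    >>> list(s)
    ['\x12', '3', '4', '5', '6']

A decimal escape would need either a fixed seven-digit width (code points run up to 1114111) or an explicit terminator to get the same unambiguous parse.)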
However, it has no chance of happening, because a lot of code would break, Python would deviate from the rest of the world, this wouldn't be backwards compatible (and another backwards-incompatible major release isn't happening; the community still hasn't fully caught up with the one 8 years ago), and it would be unintuitive to anyone who's done computer programming before (or after, or during, or anytime). I do see some bits worth pursuing in your idea, though, and I encourage you to keep going! As I said earlier, Python-ideas is a place where a lot of ideas are born and die, and that shouldn't stop you from trying to contribute. Python is 25 years old, and a bunch of stuff is there just for backwards compatibility; these kind of things can't get changed easily. The older (older by contribution period, not actual age) contributors still active don't try to fix what's not broken (to them). Newcomers, such as you, are a breath of fresh air to the language, and what helps make it thrive even more! By bringing new, uncommon ideas, you're challenging the status quo and potentially changing it for the best. But keep in mind that, with no clear consensus, the status quo always wins a stalemate. I hope that makes sense! Cheers, Emanuel From mikhailwas at gmail.com Thu Oct 13 01:46:38 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 13 Oct 2016 07:46:38 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <57FEEEE3.7050109@brenbarn.net> References: <57FEAF9F.5020103@egenix.com> <57FEEEE3.7050109@brenbarn.net> Message-ID: On 13 October 2016 at 04:18, Brendan Barnwell wrote: > On 2016-10-12 18:56, Mikhail V wrote: >> >> Please don't mix the readability and personal habit, which previuos >> repliers seems to do as well. Those two things has nothing >> to do with each other. > > > You keep saying this, but it's quite incorrect. The usage of > decimal notation is itself just a convention, and the only reason it's easy > for you (and for many other people) is because you're used to it. If you > had grown up using only hexadecimal or binary, you would find decimal > awkward. Exactly, but this is not called "readability" but rather "acquired ability to read" or simply habit, which does not reflect the "readability" of the character set itself. > There is nothing objectively better about base 10 than any other > place-value numbering system. Sorry to say, but here you are totally wrong. Not to treat you personally for your fallacy, that is quite common among those who are not familiar with the topic, but you should consider some important points: --- 1. Each taken character set has certain grade of readability which depends solely on the form of its units (aka glyphs). 2. Linear string representation is superior to anything else (spiral, arc, etc.) 3. There exist glyphs which provide maximal readability, those are particular glyphs with particular constant form, and this form is absolutely independent from the encoding subject. 4. According to my personal studies (which does not mean it must be accepted or blindly believed in, but I have solid experience in this area and acting quite successful in it) the amount of this glyphs is less then 10, namely I am by 8 glyphs now. 5. Main measured parameter which reflects the readability (somewhat indirect however) is the pair-wize optical collision of each character pair of the set. This refers somewhat to legibility, or differentiation ability of glyphs. 
--- Less technically, you can understand it better if you think of your own words "There is nothing objectively better about base 10 than any other place-value numbering system." If this could be ever true than you could read with characters that are very similar to each other or something messy as good as with characters which are easily identifyable, collision resistant and optically consistent. But that is absurd, sorry. For numbers obviously you don't need so many character as for speech encoding, so this means that only those glyphs or even a subset of it should be used. This means anything more than 8 characters is quite worthless for reading numbers. Note that I can't provide here the works currently so don't ask me for that. Some of them would be probably available in near future. Your analogy with speech and signs is not correct because speech is different but numbers are numbers. But also for different speech, same character set must be used namely the one with superior optical qualities, readability. > Saying we should dump hex notation because everyone understands decimal is > like saying that all signs in Prague should only be printed in English We should dump hex notation because currently decimal is simply superiour to hex, just like Mercedes is superior to Lada, aand secondly, because it is more common for ALL people, so it is 2:0 for not using such notation. With that said, I am not against base-16 itself in the first place, but rather against the character set which is simply visually inconsistent and not readable. Someone just took arabic digits and added first latin letters to it. It could be forgiven for a schoolboy's exercises in drawing but I fail to understand how it can be accepted as a working notation for medium supposed to be human readable. Practically all this notation does, it reduces the time before you as a programmer become visual and brain impairments. > Just look at the Wikipedia page for Unicode, which says: "Normally a > Unicode code point is referred to by writing "U+" followed by its > hexadecimal number." That's it. Yeah that's it. And it sucks and migrated to coding standard, sucks twice. If a new syntax/standard is decided, there'll be only positive sides of using decimal vs hex. So nobody'll be hurt, this is only the question of remaking current implementation and is proposed only as a long-term theoretical improvement. > it's just > a label that identifies the character. Ok, but if I write a string filtering in Python for example then obviously I use decimal everywhere to compare index ranges, etc. so what is the use for me of that label? Just redundant conversions back and forth. Makes me sick actually. From greg.ewing at canterbury.ac.nz Thu Oct 13 01:53:10 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 18:53:10 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: <57FF2146.9090406@canterbury.ac.nz> Mikhail V wrote: > And decimal is objectively way more readable than hex standard character set, > regardless of how strong your habits are. That depends on what you're trying to read from it. I can look at a hex number and instantly get a mental picture of the bit pattern it represents. I can't do that with decimal numbers. This is the reason hex exists. It's used when the bit pattern represented by a number is more important to know than its numerical value. This is the case with Unicode code points. 
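(A small sketch of what that buys you in practice; the emoji code point is just an arbitrary astral-plane example. The plane and row of a code point can be read straight off the hex digits, while the decimal form hides them behind an arithmetic exercise:

    >>> cp = ord('\U0001F600')
    >>> cp
    128512
    >>> '{:06X}'.format(cp)
    '01F600'
    >>> cp >> 16                 # plane (the leading hex digits)
    1
    >>> (cp >> 8) & 0xFF         # row within the plane, i.e. 0xF6
    246

Reading "01F600" as plane 01, row F6 needs no computation at all.)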
Their numerical value is irrelevant, but the bit pattern conveys useful information, such as which page and plane it belongs to, whether it fits in 1 or 2 bytes, etc. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 13 02:02:35 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 19:02:35 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> Message-ID: <57FF237B.8090702@canterbury.ac.nz> Mikhail V wrote: > Consider unicode table as an array with glyphs. You mean like this one? http://unicode-table.com/en/ Unless I've miscounted, that one has the characters arranged in rows of 16, so it would be *harder* to look up a decimal index in it. -- Greg From mikhailwas at gmail.com Thu Oct 13 02:18:25 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 13 Oct 2016 08:18:25 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: On 13 October 2016 at 04:49, Emanuel Barry wrote: >> From: Mikhail V >> Sent: Wednesday, October 12, 2016 9:57 PM >> Subject: Re: [Python-ideas] Proposal for default character representation > > Hello, and welcome to Python-ideas, where only a small portion of ideas go > further, and where most newcomers that wish to improve the language get hit > by the reality bat! I hope you enjoy your stay :) Hi, thanks! I enjoy the conversation indeed , never had so much interesting in a discussion actually! > >> On 13 October 2016 at 01:50, Chris Angelico wrote: >> > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V >> wrote: >> > >> > Way WAY less readable, and I'm comfortable working in both hex and >> decimal. >> >> Please don't mix the readability and personal habit, which previuos >> repliers seems to do as well. Those two things has nothing >> to do with each other. If you are comfortable with old roman numbering >> system this does not make it readable. >> And I am NOT comfortable with hex, as well as most people would >> be glad to use single notation. >> But some of them think that they are cool because they know several >> numbering notations ;) But I bet few can actually understand which is more >> readable. > > I'll turn your argument around: Not being comfortable with hex does not make > it unreadable; it's a matter of habit (as Brendan pointed out in his > separate reply). Matter of habit does not reflect the readability, see my last reply to Brandan. It is quite precise engeneering. And readability it is kind of serious stuff especially if you decide for programming carreer. Young people underestimate it and for oldies it is too late when they realize it :) And Python is all about readability and I like it. As for your other points, I'll need to read it with fresh head tomorrow, Of course I don't believe this would all suddenly happen with Python, or other programming language, it is just an idea anyway. And I do want to learn more actually. Especially want to see some example where it would be really beneficial to use hex, either technically (some low level binary related stuff?) or regarding comprehension, which is to my knowledge hardly possible. > - Indexing, and that's completely irrelevant to the topic at hand (also see > above bullet point). Eee how would I find if the character lies in certain range? With index here I meant it's numeric value, I just called it index for some reason, I don't know why. So its a table - value and corresponding glyph. 
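(One possible answer to the range question, as a sketch: ord() already gives a plain integer, and the comparison is the same whichever base the literals are written in; the hex spelling just happens to match the block boundaries as the charts publish them:

    >>> c = '\u0434'
    >>> 0x0400 <= ord(c) <= 0x04FF     # Cyrillic block, as the charts give it
    True
    >>> 1024 <= ord(c) <= 1279         # the identical test in decimal
    True

No conversion happens in either case; both pairs of literals denote the same integers.)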
Just consieder analogy: I make an 3d array, first index is my value, and 2nd 3rd is image pixels, so simply image stack. Why on earth would I use for 1st index some other literals than decimal. Did you see much code written with hex literals? Some low level things probably ... > - ord() which returns an integer (which can be interpreted in any base!), Yes so my idea is to stick to other notations than hex. for low level bit manipulation obviously two-character notation should be used, so again I fail to see something... Mikhail From mikhailwas at gmail.com Thu Oct 13 02:42:14 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 13 Oct 2016 08:42:14 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <57FF237B.8090702@canterbury.ac.nz> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> Message-ID: On 13 October 2016 at 08:02, Greg Ewing wrote: > Mikhail V wrote: >> >> Consider unicode table as an array with glyphs. > > > You mean like this one? > > http://unicode-table.com/en/ > > Unless I've miscounted, that one has the characters > arranged in rows of 16, so it would be *harder* to > look up a decimal index in it. > > -- > Greg Nice point finally, I admit, although quite minor. Where the data implies such pagings or alignment, the notation should be (probably) more binary-oriented. But: you claim to see bit patterns in hex numbers? Then I bet you will see them much better if you take binary notation (2 symbols) or quaternary notation (4 symbols), I guarantee. And if you take consistent glyph set for them also you'll see them twice better, also guarantee 100%. So not that the decimal is cool, but hex sucks (too big alphabet) and _the character set_ used for hex optically sucks. That is the point. On the other hand why would unicode glyph table which is to the biggest part a museum of glyphs would be necesserily paged in a binary-friendly manner and not in a decimal friendly manner? But I am not saying it should or not, its quite irrelevant for this particular case I think. Mikhail From rosuav at gmail.com Thu Oct 13 02:58:07 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Oct 2016 17:58:07 +1100 Subject: [Python-ideas] INSANE FLOAT PERFORMANCE!!! In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> <20161012101621.GR22471@ando.pearwood.info> Message-ID: On Thu, Oct 13, 2016 at 5:17 PM, Stephen J. Turnbull wrote: > Chris Angelico writes: > > > I'm not sure what you mean by "strcmp-able"; do you mean that the > > lexical ordering of two Unicode strings is guaranteed to be the same > > as the byte-wise ordering of their UTF-8 encodings? > > This is definitely not true for the Han characters. In Japanese, the > most commonly used lexical ordering is based on the pronunciation, > meaning that there are few characters (perhaps none) in common use > that has a unique place in lexical ordering (most individual > characters have multiple pronunciations, and even many whole personal > names do). Yeah, and even just with Latin-1 characters, you have (a) non-ASCII characters that sort between ASCII characters, and (b) characters that have different meanings in different languages, and should be sorted differently. So lexicographical ordering is impossible in a generic string sort. ChrisA From mal at egenix.com Thu Oct 13 04:18:30 2016 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 13 Oct 2016 10:18:30 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: <57FF4356.1070104@egenix.com> On 13.10.2016 01:06, Mikhail V wrote: > On 12 October 2016 at 23:48, M.-A. Lemburg wrote: >> The hex notation for \uXXXX is a standard also used in many other >> programming languages, it's also easier to parse, so I don't >> think we should change this default. > > In programming literature it is used often, but let me point out that > decimal is THE standard and is much much better standard > in sence of readability. And there is no solid reason to use 2 standards > at the same time. I guess it's a matter of choosing the right standard for the right purpose. For \uXXXX and \UXXXXXXXX the intention was to be able to represent a Unicode code point using its standard Unicode ordinal representation and since the standard uses hex for this, it's quite natural to use the same here. >> Take e.g. >> >>>>> s = "\u123456" >>>>> s >> '?56' >> >> With decimal notation, it's not clear where to end parsing >> the digit notation. > > How it is not clear if the digit amount is fixed? Not very clear what > did you mean. Unicode code points have ordinals from the range [0, 1114111], so it's not clear where to stop parsing the decimal representation and continue to interpret the literal as regular string, since I suppose you did not intend everyone to have to write \u0000010 just to get a newline code point to avoid the ambiguity. PS: I'm not even talking about the breakage such a change would cause. This discussion is merely about the pointing out how things got to be how they are now. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 13 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From srkunze at mail.de Thu Oct 13 04:37:35 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 13 Oct 2016 10:37:35 +0200 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161012232943.GV22471@ando.pearwood.info> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> Message-ID: On 13.10.2016 01:29, Steven D'Aprano wrote: > On Wed, Oct 12, 2016 at 06:32:12PM +0200, Sven R. Kunze wrote: >> >> So, my reasoning would tell me: where have I seen * so far? *args and >> **kwargs! > And multiplication. Multiplication with only a single argument? Come on. > And sequence unpacking. We are on the right side of the = if any and not no the left side. > >> [...] is just the list constructor. > Also indexing: dict[key] or sequence[item or slice]. There's no name in front of [. So, I cannot be an index either. Nothing else matches (in my head) and I also don't see any ambiguities. YMMV. I remember a new co-worker, I taught how to use *args and **kwargs. It was unintuitive to him on the first time as well. 
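(Since the * spellings keep being compared, a minimal sketch of what PEP 448 already allows on Python 3.5+ versus the spelling proposed in this thread; the proposed form appears only in a comment because it is a SyntaxError today:

    >>> a, b = [1, 2], [3, 4]
    >>> [*a, *b]                # PEP 448: unpacking inside a list display
    [1, 2, 3, 4]
    >>> f = lambda *args: args
    >>> f(*a, *b)               # and inside a call
    (1, 2, 3, 4)
    >>> # [*t for t in (a, b)]  # the proposed spelling; currently rejected

The question is whether that third spelling should be allowed to mean an analogous unpacking into the comprehension's result.)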
About the list constructor: we construct a list by writing [a,b,c] or by writing [b for b in bs]. The end result is a list and that matters from the end developer's point of view, no matter how fancy words you choose for it. Cheers, Sven From greg.ewing at canterbury.ac.nz Thu Oct 13 04:43:39 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 21:43:39 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: <57FF493B.5040306@canterbury.ac.nz> Mikhail V wrote: > Did you see much code written with hex literals? From /usr/include/sys/fcntl.h: /* * File status flags: these are used by open(2), fcntl(2). * They are also used (indirectly) in the kernel file structure f_flags, * which is a superset of the open/fcntl flags. Open flags and f_flags * are inter-convertible using OFLAGS(fflags) and FFLAGS(oflags). * Open/fcntl flags begin with O_; kernel-internal flags begin with F. */ /* open-only flags */ #define O_RDONLY 0x0000 /* open for reading only */ #define O_WRONLY 0x0001 /* open for writing only */ #define O_RDWR 0x0002 /* open for reading and writing */ #define O_ACCMODE 0x0003 /* mask for above modes */ /* * Kernel encoding of open mode; separate read and write bits that are * independently testable: 1 greater than the above. * * XXX * FREAD and FWRITE are excluded from the #ifdef KERNEL so that TIOCFLUSH, * which was documented to use FREAD/FWRITE, continues to work. */ #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) #define FREAD 0x0001 #define FWRITE 0x0002 #endif #define O_NONBLOCK 0x0004 /* no delay */ #define O_APPEND 0x0008 /* set append mode */ #ifndef O_SYNC /* allow simultaneous inclusion of */ #define O_SYNC 0x0080 /* synch I/O file integrity */ #endif #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) #define O_SHLOCK 0x0010 /* open with shared file lock */ #define O_EXLOCK 0x0020 /* open with exclusive file lock */ #define O_ASYNC 0x0040 /* signal pgrp when data ready */ #define O_FSYNC O_SYNC /* source compatibility: do not use */ #define O_NOFOLLOW 0x0100 /* don't follow symlinks */ #endif /* (_POSIX_C_SOURCE && !_DARWIN_C_SOURCE) */ #define O_CREAT 0x0200 /* create if nonexistant */ #define O_TRUNC 0x0400 /* truncate to zero length */ #define O_EXCL 0x0800 /* error if already exists */ #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) #define O_EVTONLY 0x8000 /* descriptor requested for event notifications only */ #endif #define O_NOCTTY 0x20000 /* don't assign controlling terminal */ #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) #define O_DIRECTORY 0x100000 #define O_SYMLINK 0x200000 /* allow open of a symlink */ #endif #ifndef O_DSYNC /* allow simultaneous inclusion of */ #define O_DSYNC 0x400000 /* synch I/O data integrity */ #endif -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 13 04:47:03 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 21:47:03 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FEEEE3.7050109@brenbarn.net> Message-ID: <57FF4A07.8020208@canterbury.ac.nz> Mikhail V wrote: > I am not against base-16 itself in the first place, > but rather against the character set which is simply visually > inconsistent and not readable. Now you're talking about inventing new characters, or at least new glyphs for existing ones, and persuading everyone to use them. 
That's well beyond the scope of what Python can achieve! -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 13 04:55:19 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 21:55:19 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FEEEE3.7050109@brenbarn.net> Message-ID: <57FF4BF7.6010000@canterbury.ac.nz> Mikhail V wrote: > Ok, but if I write a string filtering in Python for example then > obviously I use decimal everywhere to compare index ranges, etc. > so what is the use for me of that label? Just redundant > conversions back and forth. I'm not sure what you mean by that. If by "index ranges" you're talking about the numbers you use to index into the string, they have nothing to do with character codes, so you can write them in whatever base is most convenient for you. If you have occasion to write a literal representing a character code, there's nothing to stop you writing it in hex to match the way it's shown in a repr(), or in published Unicode tables, etc. I don't see a need for any conversions back and forth. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 13 05:03:50 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 22:03:50 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: <57FF4DF6.9010404@canterbury.ac.nz> Mikhail V wrote: > Eee how would I find if the character lies in certain range? >>> c = "\u1235" >>> if "\u1230" <= c <= "\u123f": ... print("Boo!") ... Boo! -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 13 05:24:51 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Oct 2016 22:24:51 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> Message-ID: <57FF52E3.3060309@canterbury.ac.nz> Mikhail V wrote: > But: you claim to see bit patterns in hex numbers? Then I bet you will > see them much better if you take binary notation (2 symbols) or quaternary > notation (4 symbols), I guarantee. Nope. The meaning of 0xC001 is much clearer to me than 1100000000000001, because I'd have to count the bits very carefully in the latter to distinguish it from, e.g. 6001 or 18001. The bits could be spaced out: 1100 0000 0000 0001 but that just takes up even more room to no good effect. I don't find it any faster to read -- if anything, it's slower, because my eyes have to travel further to see the whole thing. Another point -- a string of hex digits is much easier for me to *remember* if I'm transliterating it from one place to another. Not only because it's shorter, but because I can pronounce it. "Cee zero zero one" is a lot easier to keep in my head than "one one zero zero zero zero zero zero zero zero zero zero zero zero zero one"... by the time I get to the end, I've already forgotten how it started! > And if you take consistent glyph set for them > also you'll see them twice better, also guarantee 100%. When I say "instantly", I really do mean *instantly*. I fail to see how a different glyph set could reduce the recognition time to less than zero. 
-- Greg From cory at lukasa.co.uk Thu Oct 13 06:05:33 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 13 Oct 2016 11:05:33 +0100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <57FF493B.5040306@canterbury.ac.nz> References: <57FEAF9F.5020103@egenix.com> <57FF493B.5040306@canterbury.ac.nz> Message-ID: > On 13 Oct 2016, at 09:43, Greg Ewing wrote: > > Mikhail V wrote: >> Did you see much code written with hex literals? > > From /usr/include/sys/fcntl.h: > Backing Greg up for a moment, hex literals are extremely common in any code that needs to work with binary data, such as network programming or fine data structure manipulation. For example, consider the frequent requirement to mask out certain bits of a given integer (e.g., keep the low 24 bits of a 32 bit integer). Here are a few ways to represent that: integer & 0x00FFFFFF # Hex integer & 16777215 # Decimal integer & 0o77777777 # Octal integer & 0b111111111111111111111111 # Binary Of those four, hexadecimal has the advantage of being both extremely concise and clear. The octal representation is infuriating because one octal digit refers to *three* bits, which means that there is a non-whole number of octal digits in a byte (that is, one byte with all bits set is represented by 0o377). This causes problems both with reading comprehension and with most other common tasks. For example, moving from 0xFF to 0xFFFF (or 255 to 65535, also known as setting the next most significant byte to all 1) is represented in octal by moving from 0o377 to 0o177777. This is not an obvious transition, and I doubt many programmers could do it from memory in any representation but hex or binary. Decimal is no clearer. Programmers know how to represent certain bit patterns from memory in decimal simply because they see them a lot: usually they can do the all 1s case, and often the 0 followed by all 1s case (255 and 128 for one byte, 65535 and 32767 for two bytes, and then increasingly few programmers know the next set). But trying to work out what mask to use for setting only bits 15 and 14 is tricky in decimal, while in hex it's fairly easy (in hex it's 0xC000, in decimal it's 49152). Binary notation seems like the solution, but note the above case: the only way to work out how many bits are being masked out is to count them, and there can be quite a lot. IIRC there's some new syntax coming for binary literals that would let us represent them as 0b1111_1111_1111_1111, which would help the readability case, but it's still substantially less dense and loses clarity for many kinds of unusual bit patterns. Additionally, as the number of bits increases life gets really hard: masking out certain bits of a 64-bit number requires a literal that's at least 66 characters long, not including the underscores that would add another 15 underscores for a literal that is 81 characters long (more than the PEP8 line width recommendation). That starts getting unwieldy fast, while the hex representation is still down at 18 characters. Hexadecimal has the clear advantage that each character wholly represents 4 bits, and the next 4 bits are independent of the previous bits. That's not true of decimal or octal, and while it's true of binary it costs a fourfold increase in the length of the representation. It's definitely not as intuitive to the average human being, but that's ok: it's a specialised use case, and we aren't requiring that all human beings learn this skill.
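(The specific numbers above are easy to sanity-check interactively; a quick sketch, with 0xDEADBEEF as an arbitrary test value:

    >>> 0x00FFFFFF == 16777215 == 0o77777777 == 0b111111111111111111111111
    True
    >>> mask = 0xC000               # bits 15 and 14
    >>> mask, bin(mask)
    (49152, '0b1100000000000000')
    >>> hex(0xDEADBEEF & 0x00FFFFFF)
    '0xadbeef'

Whichever base the literals are written in, the values are identical; the argument is purely about which spelling a reader can verify at a glance.)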
This is a very long argument to suggest that your argument against hexadecimal literals (namely, that they use 16 glyphs as opposed to the 10 glyphs used in decimal) is an argument that is too simple to be correct. Different collections of glyphs are clearer in different contexts. For example, decimal numerals can be represented using 10 glyphs, while the English language requires 26 glyphs plus punctuation. But I don't think you're seriously proposing we should swap from writing English using the larger glyph set to writing it in decimal representation of ASCII bytes. Given this, I think the argument that says that the Unicode consortium said "write the number in hex" is good enough for me. Cory From rosuav at gmail.com Thu Oct 13 06:43:01 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Oct 2016 21:43:01 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FF493B.5040306@canterbury.ac.nz> Message-ID: On Thu, Oct 13, 2016 at 9:05 PM, Cory Benfield wrote: > Binary notation seems like the solution, but note the above case: the only way to work out how many bits are being masked out is to count them, and there can be quite a lot. IIRC there's some new syntax coming for binary literals that would let us represent them as 0b1111_1111_1111_1111, which would help the readability case, but it's still substantially less dense and loses clarity for many kinds of unusual bit patterns. > And if you were to write them like this, you would start to read them in blocks of four - effectively, treating each underscore-separated unit as a glyph, despite them being represented with four characters. Fortunately, just like with Hangul characters, we have a transformation that combines these multi-character glyphs into single characters. We call it 'hexadecimal'. ChrisA From steve at pearwood.info Thu Oct 13 10:04:38 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 01:04:38 +1100 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: References: <22524.23684.863380.593596@turnbull.sk.tsukuba.ac.jp> Message-ID: <20161013140437.GX22471@ando.pearwood.info> On Tue, Oct 11, 2016 at 02:31:25PM +1100, Chris Angelico wrote: > On Tue, Oct 11, 2016 at 2:29 PM, Stephen J. Turnbull > wrote: > > Chris Angelico writes: > > > > > Given that it's not changing semantics at all, just adding info/hints > > > to an error message, it could well be added in a point release. > > > > But it does change semantics, specifically for doctests. > > Blah, forgot about doctests. Guess that's off the cards for a point > release, then, but still, shouldn't be a big deal for 3.7. Error messages are not part of Python's public API. We should be able to change error messages at any time, including point releases. Nevertheless, we shouldn't abuse that right. If it's only a change to the error message, and not a functional change, then maybe we can add it to the next 3.6 beta or rc. But it's probably not worth backporting it to older versions.
-- Steve From steve at pearwood.info Thu Oct 13 10:10:01 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 01:10:01 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> Message-ID: <20161013141001.GY22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 10:37:35AM +0200, Sven R. Kunze wrote: > On 13.10.2016 01:29, Steven D'Aprano wrote: > >On Wed, Oct 12, 2016 at 06:32:12PM +0200, Sven R. Kunze wrote: > >> > >>So, my reasoning would tell me: where have I seen * so far? *args and > >>**kwargs! > >And multiplication. > > Multiplication with only a single argument? Come on. You didn't say anything about a single argument. Your exact words are shown above: "where have I seen * so far?". I'm pretty sure you've seen * used for multiplication. I also could have mentioned regexes, globs, and exponentiation. I cannot respond to your intended meaning, only to what you actually write. Don't blame the reader if you failed to communicate clearly and were misunderstood. [...] > About the list constructor: we construct a list by writing [a,b,c] or by > writing [b for b in bs]. The end result is a list I construct lists using all sorts of ways: list(argument) map(func, sequence) zip(a, b) file.readlines() dict.items() os.listdir('.') sorted(values) and so on. Should I call them all "list constructors" just because they return a list? No, I don't think so. Constructor has a specific meaning, and these are not all constructors -- and neither are list comprehensions. > and that matters from > the end developer's point of view, no matter how fancy words you choose > for it. These "fancy words" that you dismiss are necessary technical terms. Precision in language is something we should aim for, not dismiss as unimportant. List comprehensions and list displays have different names, not to show off our knowledge of "fancy terms", but because they are different things which just happen to both return lists. Neither of them are what is commonly called a constructor. List displays are, in some senses, like a literal; list comprehensions are not, and are better understood as list builders, a process which builds a list. Emphasis should be on the *process* part: a comprehension is syntactic sugar for building a list using for-loop, not for a list display or list constructor. The bottom line is that when you see a comprehension (list, set or dict) or a generator expression, you shouldn't think of list displays, but of a for-loop. That's one of the reasons why the analogy with argument unpacking fails: it doesn't match what comprehensions *actually* are. -- Steve From steve at pearwood.info Thu Oct 13 10:25:51 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 01:25:51 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: <20161013142551.GZ22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote: > > How many decimal digits would you use to denote a single character? > > for text, three decimal digits would be enough for me personally, Well, if it's enough for you, why would anyone need more? > and in long perspective when the world's alphabetical garbage will > dissapear, two digits would be ok. Are you serious? 
Talking about "alphabetical garbage" like that makes you seem to be an ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even 7-bit ASCII has more than 100 characters (128). -- Steve From elazarg at gmail.com Thu Oct 13 10:31:47 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 13 Oct 2016 14:31:47 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013141001.GY22471@ando.pearwood.info> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> Message-ID: On Thu, Oct 13, 2016 at 5:10 PM Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 10:37:35AM +0200, Sven R. Kunze wrote: > > About the list constructor: we construct a list by writing [a,b,c] or by > > writing [b for b in bs]. The end result is a list > > I construct lists using all sorts of ways: > I think there is a terminology problem here (again). "Constructor" in OOP has a specific meaning, and "constructor" in functional terminology has a slightly different meaning. I guess Sven uses the latter terminology because pattern matching is the dual of the constructor - it is a "destructor" - and it feels appropriate, although admittedly confusing. In this terminology, map(), zip() etc. are definitely not constructors. there is only one "constructor" (list()), and there are functions that may use it as their implementation detail. In a way, [1, 2, 3] is just a syntactic shorthand for list construction, so it is reasonable to a call it a constructor. This terminology is not a perfect fit into the object-oriented world of Python, but it is very helpful in discussion of patterns how to apply them uniformly, since they were pretty much invented in the functional world (ML, I think, and mathematics). One only needs to be aware of the two different meaning, and qualify if needed, so that we won't get lost in terminology arguments again. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Oct 13 10:32:19 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 13 Oct 2016 16:32:19 +0200 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013141001.GY22471@ando.pearwood.info> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> Message-ID: <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> On 13.10.2016 16:10, Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 10:37:35AM +0200, Sven R. Kunze wrote: >> Multiplication with only a single argument? Come on. > You didn't say anything about a single argument. Your exact words are > shown above: "where have I seen * so far?". I'm pretty sure you've seen > * used for multiplication. I also could have mentioned regexes, globs, > and exponentiation. > > I cannot respond to your intended meaning, only to what you actually > write. Don't blame the reader if you failed to communicate clearly and > were misunderstood. Steven, please. You seemed to struggle to understand the notion of the [*....] construct and many people (not just me) here tried their best to explain their intuition to you. But now it seems you don't even try to come behind the idea and instead try hard not to understand the help offered. 
If you don't want help or don't really want to understand the proposal, that's fine but please, do us a favor and don't divert the thread with nitpicking nonsensical details (like multiplication) and waste everybody's time. The context of the proposal is about lists/dicts and the */** unpacking syntax. So, I actually expect you to put every post here into this very context. Discussions without context don't make much sense. So, I won't reply further. Best, Sven From rosuav at gmail.com Thu Oct 13 10:50:36 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 14 Oct 2016 01:50:36 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <20161013142551.GZ22471@ando.pearwood.info> References: <57FEAF9F.5020103@egenix.com> <20161013142551.GZ22471@ando.pearwood.info> Message-ID: On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote: >> and in long perspective when the world's alphabetical garbage will >> dissapear, two digits would be ok. > Talking about "alphabetical garbage" like that makes you seem to be an > ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even > 7-bit ASCII has more than 100 characters (128). Solution: Abolish most of the control characters. Let's define a brand new character encoding with no "alphabetical garbage". These characters will be sufficient for everyone: * [2] Formatting characters: space, newline. Everything else can go. * [8] Digits: 01234567 * [26] Lower case Latin letters a-z * [2] Vital social media characters: # (now officially called "HASHTAG"), @ * [2] Can't-type-URLs-without-them: colon, slash (now called both "SLASH" and "BACKSLASH") That's 40 characters that should cover all the important things anyone does - namely, Twitter, Facebook, and email. We don't need punctuation or capitalization, as they're dying arts and just make you look pretentious. I might have missed a few critical characters, but it should be possible to fit it all within 64, which you can then represent using two digits from our newly-restricted set; octal is better than decimal, as it needs less symbols. (Oh, sorry, so that's actually "50" characters, of which "32" are the letters. And we can use up to "100" and still fit within two digits.) Is this the wrong approach, Mikhail? Perhaps we should go the other way, then, and be *inclusive* of people who speak other languages. Thanks to Unicode's rich collection of characters, we can represent multiple languages in a single document; see, for instance, how this uses four languages and three entirely distinct scripts: http://youtu.be/iydlR_ptLmk Turkish and French both use the Latin script, but have different characters. Alphabetical garbage, or accurate representations of sounds and words in those languages? Python 3 gives the world's languages equal footing. This is a feature, not a bug. It has consequences, including that arbitrary character entities could involve up to seven decimal digits or six hex (although for most practical work, six decimal or five hex will suffice). Those consequences are a trivial price to pay for uniting the whole internet, as opposed to having pockets of different languages, like we had up until the 90s. 
ChrisA From tomuxiong at gmail.com Thu Oct 13 11:04:47 2016 From: tomuxiong at gmail.com (Thomas Nyberg) Date: Thu, 13 Oct 2016 11:04:47 -0400 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> Message-ID: <6ce9561b-88c1-7f29-8c45-e3a924ca2270@gmail.com> On 10/12/2016 07:13 PM, Mikhail V wrote: > On 12 October 2016 at 23:50, Thomas Nyberg wrote: >> Since when was decimal notation "standard"? > Depends on what planet do you live. I live on planet Earth. And you? If you mean that decimal notation is the standard used for _counting_ by people, then yes of course that is standard. But decimal notation certainly is not standard in this domain. >> opposite. For unicode representations, byte notation seems standard. > How does this make it a good idea? > Consider unicode table as an array with glyphs. > Now the index of the array is suddenly represented in some > obscure character set. How this index is other than index of any > array or natural number? Think about it... Hexadecimal notation is hardly "obscure", but yes I understand that fewer people understand it than decimal notation. Regardless, byte notation seems standard for unicode and unless you can convince the unicode community at large to switch, I don't think it makes any sense for python to switch. Sometimes it's better to go with the flow even if you don't want to. >>> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, >>> I hope no need to explain why. >> >> Still not sure which "mixing" you refer to. > > Still not sure? These two words in brackets. Mixing those two systems. > There is not mixing for unicode in python; it displays as hexadecimal. Decimal is used in other places though. So if by "mixing" you mean python should not use the standard notations of subdomains when working with those domains, then I would totally disagree. The language used in different disciplines is and has always been variable. Until that's no longer true it's better to stick with convention than add inconsistency which will be much more confusing in the long-term than learning the idiosyncrasies of a specific domain (in this case the use of hexadecimal in the unicode world). Cheers, Thomas From mar77i at mar77i.ch Thu Oct 13 10:34:49 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Thu, 13 Oct 2016 16:34:49 +0200 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: On Wed, Oct 12, 2016 at 5:41 PM, Nick Coghlan wrote: > However, set builder notation doesn't inherently include the notion of > flattening lists-of-lists. Instead, that's a *consumption* operation > that happens externally after the initial list-of-lists has been > built, and that's exactly how it's currently spelled in Python: > "itertools.chain.from_iterable(subiter for subiter in iterable)". 
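(For concreteness, a short sketch of the two spellings that already exist for that flattening step, using the tuple example from elsewhere in the thread:

    >>> from itertools import chain
    >>> pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
    >>> list(chain.from_iterable(pairs))
    [1, 'a', 2, 'b', 3, 'c']
    >>> [x for t in pairs for x in t]       # nested-loop comprehension
    [1, 'a', 2, 'b', 3, 'c']

The proposal is essentially asking for a third spelling of this same result.)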
On Wed, Oct 12, 2016 at 5:42 PM, Steven D'Aprano wrote: > The fundamental design principle of list comps is that they are > equivalent to a for-loop with a single append per loop: > > [expr for t in iterable] > > is equivalent to: > > result = [] > for t in iterable: > result.append(expr) > > > If I had seen a list comprehension with an unpacked loop variable: > > [t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > As it happens, python does have an external consumption operation that happens externally with an iteration implied: for t in iterable: yield t For your example [t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] that would mean: for t in [(1, 'a'), (2, 'b'), (3, 'c')]: yield t And accordingly, for the latter case [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] it would be: for item in [(1, 'a'), (2, 'b'), (3, 'c')]: for t in item: yield t cheers! mar77i From p.f.moore at gmail.com Thu Oct 13 11:18:10 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 13 Oct 2016 16:18:10 +0100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> Message-ID: On 13 October 2016 at 15:32, Sven R. Kunze wrote: > Steven, please. You seemed to struggle to understand the notion of the > [*....] construct and many people (not just me) here tried their best to > explain their intuition to you. And yet, the fact that it's hard to explain your intuition to others (Steven is not the only one who's finding this hard to follow) surely implies that it's merely that - personal intuition - and not universal understanding. The *whole point* here is that not everyone understands the proposed notation the way the proposers do, and it's *hard to explain* to those people. Blaming the people who don't understand does not support the position that this notation should be added to the language. Rather, it reinforces the idea that the new proposal is hard to teach (and consequently, it may be a bad idea for Python). Paul From elazarg at gmail.com Thu Oct 13 11:28:27 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 13 Oct 2016 15:28:27 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> Message-ID: On Thu, Oct 13, 2016 at 6:19 PM Paul Moore wrote: > On 13 October 2016 at 15:32, Sven R. Kunze wrote: > > Steven, please. You seemed to struggle to understand the notion of the > > [*....] construct and many people (not just me) here tried their best to > > explain their intuition to you. > > And yet, the fact that it's hard to explain your intuition to others > (Steven is not the only one who's finding this hard to follow) surely > implies that it's merely that - personal intuition - and not universal > understanding. > > I fail to see this implication. Perhaps you mean that the universal understanding is hard to get, intuitively. And trying to explain them is the way to figure out howw hard can this difficulty be overcome. 
> The *whole point* here is that not everyone understands the proposed > notation the way the proposers do, and it's *hard to explain* to those > people. Blaming the people who don't understand does not support the > position that this notation should be added to the language. Rather, > it reinforces the idea that the new proposal is hard to teach (and > consequently, it may be a bad idea for Python). > > It may also suggest that there are currently two ways to understand the *[...] construct, and only one of them can be generalized to lead the new proposal. So people that are *used* to the other way may have harder time than people coming with a clean slate. So it might or might not be hard to teach. (I'm not saying that's necessarily the case) I will be happy to understand that "other way" that is harder to generalize; I think this discussion may be fruitful in making these different understandings explicit. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Oct 13 12:48:48 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 03:48:48 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> Message-ID: <20161013164845.GA22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 03:28:27PM +0000, ????? wrote: > It may also suggest that there are currently two ways to understand the > *[...] construct, This thread is about allowing sequence unpacking as the internal expression of list comprehensions: [ *(expr) for x in iterable ] It isn't about unpacking lists: *[...] so I don't see what relevance your comment has. There may be two or three or ten or 100 ways to (mis)understand list comprehensions in Python, but only one of them is the correct way. List comprehensions are (roughly) syntactic sugar for: result = [] for x in iterable: result.append(expression) Any other understanding of them is incorrect. Now if people wish to make an argument for changing the meaning of comprehensions so that the suggested internal unpacking makes sense, then by all means try making that argument! That's absolutely fine. In the past, I've tried a similar thing: I argued for a variant list comprehension that halts early: [expr for x in iterable while condition] (as found in at least one other language), but had that knocked back because it doesn't fit the existing list comprehension semantics. I wasn't able to convince people that the value of this new comprehension was worth breaking the existing semantics of comprehensions. Maybe you will be able to do better than me. But understand that: [*(expr) for x in iterable] also fails to fit the existing list comprehension semantics. To make it work requires changing the meaning of Python list comps. It isn't enough to just deny the existing meaning and insist that your own personal meaning is correct. 
-- Steve From steve at pearwood.info Thu Oct 13 12:55:46 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 03:55:46 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: <20161013165546.GB22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 04:34:49PM +0200, Martti Kühne wrote: > > If I had seen a list comprehension with an unpacked loop variable: > > > > [t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] Marttii, somehow you have lost the leading * when quoting me. What I actually wrote was: [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > As it happens, python does have an external consumption operation that > happens externally with an iteration implied: > > for t in iterable: > yield t If you replace the t with *t, you get a syntax error: py> def gen(): ... for t in [(1, 'a'), (2, 'b'), (3, 'c')]: ... yield *t File "<stdin>", line 3 yield *t ^ SyntaxError: invalid syntax Even if it was allowed, what would it mean? It could only mean "unpack the sequence t, and collect the values into a tuple; then yield the tuple". > For your example [t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] that would mean: > > for t in [(1, 'a'), (2, 'b'), (3, 'c')]: > yield t > > And accordingly, for the latter case [*t for t in [(1, 'a'), (2, 'b'), > (3, 'c')]] it would be: > > for item in [(1, 'a'), (2, 'b'), (3, 'c')]: > for t in item: > yield t No it wouldn't. Where does the second for loop come from? The list comprehension shown only has one loop, not nested loops. -- Steve From ned at nedbatchelder.com Thu Oct 13 13:45:49 2016 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 13 Oct 2016 13:45:49 -0400 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> Message-ID: On 10/13/16 2:42 AM, Mikhail V wrote: > On 13 October 2016 at 08:02, Greg Ewing wrote: >> Mikhail V wrote: >>> Consider unicode table as an array with glyphs. >> >> You mean like this one? >> >> http://unicode-table.com/en/ >> >> Unless I've miscounted, that one has the characters >> arranged in rows of 16, so it would be *harder* to >> look up a decimal index in it. >> >> -- >> Greg > Nice point finally, I admit, although quite minor. Where > the data implies such pagings or alignment, the notation > should be (probably) more binary-oriented. > But: you claim to see bit patterns in hex numbers? Then I bet you will > see them much better if you take binary notation (2 symbols) or quaternary > notation (4 symbols), I guarantee. And if you take consistent glyph set for them > also you'll see them twice better, also guarantee 100%. > So not that the decimal is cool, > but hex sucks (too big alphabet) and _the character set_ used for hex > optically sucks. > That is the point. > On the other hand why would unicode glyph table which is to the > biggest part a museum of glyphs would be necesserily > paged in a binary-friendly manner and not in a decimal friendly > manner? But I am not saying it should or not, its quite irrelevant > for this particular case I think. You continue to overlook the fact that Unicode codepoints are conventionally presented in hexadecimal, including in the page you linked us to. This is the convention. It makes sense to stick to the convention.
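As a small illustration of that convention in Python itself (standard library only; the choice of character is arbitrary):

    import unicodedata

    ch = '\u0414'                 # CYRILLIC CAPITAL LETTER DE
    print(hex(ord(ch)))           # 0x414 -- the 04xx block is Cyrillic
    print(ascii(ch))              # '\u0414' -- reprs already use hex, not decimal
    print(unicodedata.name(ch))   # CYRILLIC CAPITAL LETTER DE
    print(ord(ch))                # 1044 -- the decimal value hides the block structure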
When I see a numeric representation of a character, there are only two things I can do with it: look it up in a reference someplace, or glean some meaning from it directly. For looking things up, please remember that all Unicode references use hex numbering. Looking up a character by decimal numbers is simply more difficult than looking it up by hex numbers. For gleaning meaning directly, please keep in mind that Unicode is fundamentally structured around pages of 256 code points, organized into planes of 256 pages. The very structure of how code points are allocated and grouped is based on a hexadecimal-friendly system. The blocks of codepoints are aligned on hexadecimal boundaries: http://www.fileformat.info/info/unicode/block/index.htm . When I see \u0414, I know it is a Cyrillic character because it is in block 04xx. It simply doesn't make sense to present Unicode code points in anything other than hex. --Ned. From mar77i at mar77i.ch Thu Oct 13 14:15:36 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Thu, 13 Oct 2016 20:15:36 +0200 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013165546.GB22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> Message-ID: On Thu, Oct 13, 2016 at 6:55 PM, Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 04:34:49PM +0200, Martti Kühne wrote: > >> > If I had seen a list comprehension with an unpacked loop variable: >> > >> > [t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > Martti, somehow you have lost the leading * when quoting me. What I > actually wrote was: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > Sorry for misquoting you. Can I fix my name, though? Also, this mail was too long in my outbox so the context was lost on it. I reiterate it, risking that I would annoy some, but to be absolutely clear. > >> As it happens, python does have an external consumption operation that >> happens externally with an iteration implied: >> > > If you replace the t with *t, you get a syntax error: > I meant that statement in context of the examples which were brought up: the occurrence of a list comprehension inside an array has the following effect: 1) [ ..., [expr for t in iterable] ] is equivalent to: def expr_long(iterable, result): result.append(iterable) return result expr_long(iterable, [ ..., ]) so, if you make the case for pep448, you might arrive at the following: 2) [ ..., *[expr for expr in iterable] ] which would be, if I'm typing it correctly, equivalent to what resembles an external collection: def expr_star(list_comp, result): result.extend(list(list_comp)) return result expr_star(iterable, [ ..., ]) Having this in mind, the step to making: [ ..., [*expr for expr in iterable], ] from: def expr_insidestar(iterable, result): for expr in iterable: result.extend(expr) return result does not appear particularly far-fetched, at least not to me and a few people on this list. cheers! mar77i From mertz at gnosis.cx Thu Oct 13 14:37:17 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 13 Oct 2016 11:37:17 -0700 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> Message-ID: Exactly with Paul!
As I mentioned, I teach software developers and scientists Python for a living. I get paid a lot of money to do that, and have a good sense of what learners can easily understand and not (I've also written hundred of articles and a few books about Python). The people I write for and teach are educated, smart, and generally have familiarity with multiple programming languages. In my opinion, this new construct?if added to the language?would be difficult to teach, and most of my students would get it wrong most of the time. Yes, I understand the proposed semantics. It is not *intuitive* to me, but I could file the rule about the behavior if I had to. But if I were forced to teach it, it would always be "Here's a Python wart to look out for if you see it in other code... you should not ever use it yourself." On Thu, Oct 13, 2016 at 8:18 AM, Paul Moore wrote: > On 13 October 2016 at 15:32, Sven R. Kunze wrote: > > Steven, please. You seemed to struggle to understand the notion of the > > [*....] construct and many people (not just me) here tried their best to > > explain their intuition to you. > > And yet, the fact that it's hard to explain your intuition to others > (Steven is not the only one who's finding this hard to follow) surely > implies that it's merely that - personal intuition - and not universal > understanding. > > The *whole point* here is that not everyone understands the proposed > notation the way the proposers do, and it's *hard to explain* to those > people. Blaming the people who don't understand does not support the > position that this notation should be added to the language. Rather, > it reinforces the idea that the new proposal is hard to teach (and > consequently, it may be a bad idea for Python). > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Thu Oct 13 14:50:17 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 13 Oct 2016 11:50:17 -0700 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013165546.GB22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> Message-ID: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > Another problem with this is that it is very hard to generalize to the case where the item included in a comprehension is a transformation on iterated values. E.g. what does this do? [math.exp(*t) for t in [(1,2),(3,4)]] Maybe that somehow magically gets us: [2.7182, 7.38905, 20.0855, 54.5981] Or maybe the syntax would be: [*math.exp(t) for t in [(1,2),(3,4)]] Neither of those follows conventional Python semantics for function calling or sequence unpacking. So maybe that remains a type error or syntax error. But then we exclude a very common pattern of using comprehensions to create collections of *transformed* data, not simply of filtered data. 
In contrast, either of these are unambiguous and obvious: [math.exp(t) for t in flatten([(1,2),(3,4)])] Or: [math.exp(n) for t in [(1,2),(3,4)] for n in t] Obviously, picking math.exp() is arbitrary and any unary function would be the same issue. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Oct 13 15:46:11 2016 From: random832 at fastmail.com (Random832) Date: Thu, 13 Oct 2016 15:46:11 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> Message-ID: <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> On Thu, Oct 13, 2016, at 14:50, David Mertz wrote: > Neither of those follows conventional Python semantics for function > calling > or sequence unpacking. So maybe that remains a type error or syntax > error. But then we exclude a very common pattern of using comprehensions > to create collections of *transformed* data, not simply of filtered data. [*map(math.exp, t) for t in [(1, 2), (3, 4)]] [*(math.exp(x) for x in t) for t in [(1, 2), (3, 4)]] I think "excluding" is a bit of a strong word - just because something doesn't address a mostly unrelated need doesn't mean it doesn't have any merit in its own right. Not every proposal is going to do everything. I think the key is that the person originally asking this thought of *x as a generalized "yield from x"-ish thing, for example: "a, *b, c" becomes "def f(): yield a; yield from b; yield c;" [a, *b, c] == list(f()) (a, *b, c) == tuple(f()) so, under a similar 'transformation', "*foo for foo in bar" likewise becomes "def f(): for foo in bar: yield from foo" bar = [(1, 2), (3, 4)] (*(1, 2), *(3, 4)) == == tuple(f()) [*(1, 2), *(3, 4)] == == list(f()) > In contrast, either of these are unambiguous and obvious: > > [math.exp(t) for t in flatten([(1,2),(3,4)])] > > Or: > > [math.exp(n) for t in [(1,2),(3,4)] for n in t] > > Obviously, picking math.exp() is arbitrary and any unary function would > be > the same issue. > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From random832 at fastmail.com Thu Oct 13 15:51:57 2016 From: random832 at fastmail.com (Random832) Date: Thu, 13 Oct 2016 15:51:57 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> Message-ID: <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> On Thu, Oct 13, 2016, at 15:46, Random832 wrote: > so, under a similar 'transformation', "*foo for foo in bar" likewise > becomes "def f(): for foo in bar: yield from foo" > > bar = [(1, 2), (3, 4)] > (*(1, 2), *(3, 4)) == == tuple(f()) > [*(1, 2), *(3, 4)] == == list(f()) I accidentally hit ctrl-enter while copying and pasting, causing my message to go out while my example was less thorough than intended and containing syntax errors. It was intended to read as follows: ..."*foo for foo in bar" likewise becomes def f(): for foo in bar: yield from foo a, b = (1, 2), (3, 4) bar = [a, b] (*a, *b) == (1, 2, 3, 4) == tuple(f()) # tuple(*foo for foo in bar) [*a, *b] == [1, 2, 3, 4] == list(f()) # [*foo for foo in bar] From sjoerdjob at sjoerdjob.com Thu Oct 13 16:40:19 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Thu, 13 Oct 2016 22:40:19 +0200 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013165546.GB22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> Message-ID: <20161013204019.GE13170@sjoerdjob.com> After having followed this thread for a while, it occured to me that the reason that the idea is confusing, is because the spelling is confusing. I think the suggested spelling (`*`) is the confusing part. If it were to be spelled `from ` instead, it would be less confusing. Consider this: g = (f(t) for t in iterable) is "merely" sugar for def gen(): for t in iterable: yield f(t) g = gen() Likewise, l = [f(t) for t in iterable] can be seen as sugar for def gen(): for t in iterable: yield f(t) l = list(gen()) Now the suggested spelling l = [*f(t) for t in iterable] is very confusing, from what I understand: what does the `*` even mean here. However, consider the following spelling: l = [from f(t) for t in iterable] To me, it does not seem far-fetched that this would mean: def gen(): for t in iterable: yield from f(t) l = list(gen()) It follows the "rule" quite well: given a generator display, everything before the first "for" gets placed after "yield ", and all the `for`/`if`s are expanded to suites. Now I'm not sure if I'm a fan of the idea, but I think that at least the `from `-spelling is less confusing than the `*`-spelling. (Unless I totally misunderstood what the `*`-spelling was about, given how confusing it supposedly is. Maybe it confused me.) On Fri, Oct 14, 2016 at 03:55:46AM +1100, Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 04:34:49PM +0200, Martti K?hne wrote: > > > > If I had seen a list comprehension with an unpacked loop variable: > > > > > > [t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > Marttii, somehow you have lost the leading * when quoting me. 
What I > actually wrote was: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > > > As it happens, python does have an external consumption operation that > > happens externally with an iteration implied: > > > > for t in iterable: > > yield t > > If you replace the t with *t, you get a syntax error: > > > py> def gen(): > ... for t in [(1, 'a'), (2, 'b'), (3, 'c')]: > ... yield *t > File "", line 3 > yield *t > ^ > SyntaxError: invalid syntax > > Even if it was allowed, what would it mean? It could only mean "unpack > the sequence t, and collect the values into a tuple; then yield the > tuple". > > > > > For your example [t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] that would mean: > > > > for t in [(1, 'a'), (2, 'b'), (3, 'c')]: > > yield t > > > > And accordingly, for the latter case [*t for t in [(1, 'a'), (2, 'b'), > > (3, 'c')]] it would be: > > > > for item in [(1, 'a'), (2, 'b'), (3, 'c')]: > > for t in item: > > yield t > > No it wouldn't. Where does the second for loop come from? The list > comprehension shown only has one loop, not nested loops. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Thu Oct 13 16:42:04 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 13 Oct 2016 21:42:04 +0100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: On 13 October 2016 at 20:51, Random832 wrote: > On Thu, Oct 13, 2016, at 15:46, Random832 wrote: >> so, under a similar 'transformation', "*foo for foo in bar" likewise >> becomes "def f(): for foo in bar: yield from foo" >> >> bar = [(1, 2), (3, 4)] >> (*(1, 2), *(3, 4)) == == tuple(f()) >> [*(1, 2), *(3, 4)] == == list(f()) > > > I accidentally hit ctrl-enter while copying and pasting, causing my > message to go out while my example was less thorough than intended and > containing syntax errors. It was intended to read as follows: > > ..."*foo for foo in bar" likewise becomes > > def f(): > for foo in bar: > yield from foo > > a, b = (1, 2), (3, 4) > bar = [a, b] > (*a, *b) == (1, 2, 3, 4) == tuple(f()) # tuple(*foo for foo in bar) > [*a, *b] == [1, 2, 3, 4] == list(f()) # [*foo for foo in bar] I remain puzzled. Given the well-documented and understood transformation: [fn(x) for x in lst if cond] translates to result = [] for x in lst: if cond: result.append(fn(x)) please can you explain how to modify that translation rule to incorporate the suggested syntax? Personally, I'm not even sure any more that I can *describe* the suggested syntax. Where in [fn(x) for x in lst if cond] is the * allowed? fn(*x)? *fn(x)? Only as *x with a bare variable, but no expression? Only in certain restricted types of construct which aren't expressions but are some variation on an unpacking construct? We've had a lot of examples. 
I think it's probably time for someone to describe the precise syntax (as BNF, like the syntax in the Python docs at https://docs.python.org/3.6/reference/expressions.html#displays-for-lists-sets-and-dictionaries and following sections) and semantics (as an explanation of how to rewrite any syntactically valid display as a loop). It'll have to be done in the end, as part of any implementation, so why not now? Paul From elazarg at gmail.com Thu Oct 13 16:47:56 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 13 Oct 2016 20:47:56 +0000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: On Thu, Oct 13, 2016 at 11:42 PM Paul Moore wrote: > I remain puzzled. > > Given the well-documented and understood transformation: > > [fn(x) for x in lst if cond] > > translates to > > result = [] > for x in lst: > if cond: > result.append(fn(x)) > > please can you explain how to modify that translation rule to > incorporate the suggested syntax? > if you allow result.append(1, 2, 3) to mean result.extend([1,2,3]) # which was discussed before result = [] for x in lst: if cond: result.append(*fn(x)) Or simply use result.extend([*fn(x)]) Personally, I'm not even sure any more that I can *describe* the > suggested syntax. Where in [fn(x) for x in lst if cond] is the * > allowed? fn(*x)? *fn(x)? Only as *x with a bare variable, but no > expression? Only in certain restricted types of construct which aren't > expressions but are some variation on an unpacking construct? > > The star is always exactly at the place that should "handle" it. which means [*(fn(x)) for x in lst if cond]. fn(x) must be iterable as always. > We've had a lot of examples. I think it's probably time for someone to > describe the precise syntax (as BNF, like the syntax in the Python > docs at > https://docs.python.org/3.6/reference/expressions.html#displays-for-lists-sets-and-dictionaries > and following sections) and semantics (as an explanation of how to > rewrite any syntactically valid display as a loop). It'll have to be > done in the end, as part of any implementation, so why not now? > > I will be happy to do so, and will be happy to work with anyone else interested. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Oct 13 16:48:09 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 13 Oct 2016 21:48:09 +0100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013204019.GE13170@sjoerdjob.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> Message-ID: On 13 October 2016 at 21:40, Sjoerd Job Postmus wrote: > However, consider the following spelling: > > l = [from f(t) for t in iterable] > > To me, it does not seem far-fetched that this would mean: > > def gen(): > for t in iterable: > yield from f(t) > l = list(gen()) Thank you. This is the type of precise definition I was asking for in my previous post (your timing was superb!) 
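For concreteness, a short runnable sketch of the quoted desugaring; the `from`-comprehension line itself is hypothetical, everything else is current Python (the function name f and the sample data are only illustrative):

    iterable = [(1, 2), (3, 4)]

    def f(t):
        return [x * 10 for x in t]

    # Proposed (hypothetical) spelling:  l = [from f(t) for t in iterable]
    # Its stated meaning, which is already valid Python:
    def gen():
        for t in iterable:
            yield from f(t)

    l = list(gen())
    assert l == [10, 20, 30, 40]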
I'm not sure I *like* the proposal, but I need to come up with some reasonable justification for my feeling, whereas for previous proposals the "I don't understand what you're suggesting" was the overwhelming feeling, and stifled any genuine discussion of merits or downsides. Paul PS I can counter a suggestion of using *f(t) rather than from f(t) in the above, by saying that it adds yet another meaning to the already heavily overloaded * symbol. The suggestion of "from" avoids this as "from" only has a few meanings already. (You can agree or disagree with my view, but at least we're debating the point objectively at that point!) From mistersheik at gmail.com Thu Oct 13 16:48:58 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 13 Oct 2016 13:48:58 -0700 (PDT) Subject: [Python-ideas] Add sorted (ordered) containers In-Reply-To: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> References: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> Message-ID: <852a9619-e69f-42fc-bdce-8a98bad5d4cc@googlegroups.com> Related: Nick posted an excellent answer to this question here: http://stackoverflow.com/questions/5953205/why-are-there-no-sorted-containers-in-pythons-standard-libraries On Thursday, October 13, 2016 at 4:36:39 PM UTC-4, ???? ????????? wrote: > > I mean mutable containers that are always sorted when iterating over them. > > See http://bugs.python.org/issue28433 > > for example: > > * SortedSet (sorted unique elements, implemented using (rb?)tree instead > of hash) > * SortedList (sorted elements, the same as SortedSet, but without > uniquiness constraint) - actually a (rb?)tree, not a list (i.e. not an > array) > * SortedDict (sorted by key when interating) - like C++'s ordered_map > > There are many implementations in the net, like: > > https://bitbucket.org/bcsaller/rbtree > http://newcenturycomputers.net/projects/rbtree.html > https://sourceforge.net/projects/pyavl > http://www.grantjenks.com/docs/sortedcontainers > https://github.com/tailhook/sortedsets > https://pypi.python.org/pypi/skiplist > > and also in pip: > > pip3 search sorted | grep -Ei '[^a-z]sorted' > > I think it should be one standardized implementation of such containers in > CPython. > > For example, C++ has both ordered_map and unorderd_map. > > Instead of trees, implementation may use SkipList structure, but this is > just implementation details. > > Such structres imply fast insertion and deletion, ability to iterate, and > also memory efficiency. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Oct 13 16:59:54 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 13 Oct 2016 21:59:54 +0100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: On 13 October 2016 at 21:47, ????? wrote: > if you allow result.append(1, 2, 3) to mean result.extend([1,2,3]) # which > was discussed before I don't (for the reasons raised before). But thank you for your explanation, it clarifies what you were proposing. And it does so within the *current* uses of the * symbol, which is good. But: 1. I'm not keen on extending append's meaning to overlap with extend's like this. 2. 
Your proposal does not generalise to generator expressions, set displays (without similarly modifying the set.add() method) or dictionary displays. 3. *fn(x) isn't an expression, and yet it *looks* like it should be, and in the current syntax, an expression is required in that position. To me, that suggests it would be hard to teach. [1] You can of course generalise Sjoerd's "from" proposal and then just replace "from" with "*" throughout. That avoids your requirement to change append, but at the cost of the translation no longer being a parallel to an existing use of "*". Paul [1] On a purely personal note, I'd say it's confusing, but I don't want to go back to subjective arguments, so I only note that here as an opinion, not an argument. From mistersheik at gmail.com Thu Oct 13 16:30:45 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 13 Oct 2016 13:30:45 -0700 (PDT) Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161012154224.GT22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> Message-ID: <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> First of all: +1 to Sven's very well-expressed support of the proposal, and +1 to Nick's very well-explained reasons for rejecting it. As one of the main implementers of PEP 448, I have always liked this, but I suggested that we leave this out when there was opposition since there's no rush for it. Regarding Steven's example, like Sven, I also see it this way: [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] should mean: [*(1, 'a'), *(2, 'b'), *(3, 'c')]] Which coincides with what the OP is asking for. At the end of this discussion it might be good to get a tally of how many people think the proposal is reasonable and logical. I imagine people will be asking this same question next year and the year after, and so it will be good to see if as familiarity with PEP 448 expands, more people will find this intuitive and logical. >From a CPython implementation standpoint, we specifically blocked this code path, and it is only a matter of unblocking it if we want to support this. Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Oct 13 17:06:00 2016 From: random832 at fastmail.com (Random832) Date: Thu, 13 Oct 2016 17:06:00 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: <1476392760.3241939.755285929.70F6BFAC@webmail.messagingengine.com> On Thu, Oct 13, 2016, at 16:42, Paul Moore wrote: > I remain puzzled. > > Given the well-documented and understood transformation: > > [fn(x) for x in lst if cond] > > translates to > > result = [] > for x in lst: > if cond: > result.append(fn(x)) > > please can you explain how to modify that translation rule to > incorporate the suggested syntax? In this case * would change this to result.extend (or +=) just as result = [a, *b, c] is equivalent to: result = [] result.append(a) result.extend(b) result.append(c) result = [*x for x in lst if cond] would become: result = [] for x in lst: if cond: result.extend(x) I used yield from as my original example to include generator expressions, which should also support this. 
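A quick runnable check of that translation; only the starred comprehension itself is hypothetical, and the sample data is arbitrary:

    lst = [(1, 2), (), (3,), (4, 5)]

    # Proposed (hypothetical) spelling:  result = [*x for x in lst if x]
    # Stated translation, which runs today:
    result = []
    for x in lst:
        if x:
            result.extend(x)
    assert result == [1, 2, 3, 4, 5]

    # The same flattening is already expressible with a nested comprehension:
    assert result == [y for x in lst if x for y in x]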
> Personally, I'm not even sure any more that I can *describe* the > suggested syntax. Where in [fn(x) for x in lst if cond] is the * > allowed? fn(*x)? This already has a meaning, so it's obviously "allowed", but not in a way relevant to this proposal. The elements of x are passed to fn as arguments rather than being inserted into the list. Ultimately the meaning is the same. > *fn(x)? Only as *x with a bare variable, but no expression? Both of these would be allowed. Any expression would be allowed, but at runtime its value must be iterable, the same as other places that you can use *x. From elazarg at gmail.com Thu Oct 13 17:06:42 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 13 Oct 2016 21:06:42 +0000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: On Thu, Oct 13, 2016 at 11:59 PM Paul Moore wrote: > On 13 October 2016 at 21:47, ????? wrote: > > if you allow result.append(1, 2, 3) to mean result.extend([1,2,3]) # > which > > was discussed before > > I don't (for the reasons raised before). But thank you for your > explanation, it clarifies what you were proposing. And it does so > within the *current* uses of the * symbol, which is good. But: > > 1. I'm not keen on extending append's meaning to overlap with extend's > like this. > 2. Your proposal does not generalise to generator expressions, set > displays (without similarly modifying the set.add() method) or > dictionary displays. > 3. *fn(x) isn't an expression, and yet it *looks* like it should be, > and in the current syntax, an expression is required in that position. > To me, that suggests it would be hard to teach. [1] > > You can of course generalise Sjoerd's "from" proposal and then just > replace "from" with "*" throughout. That avoids your requirement to > change append, but at the cost of the translation no longer being a > parallel to an existing use of "*". > > I think it is an unfortunate accident of syntax, the use of "yield from foo()" instead of "yield *foo()". These "mean" the same: a syntactic context that directly handles iterable as repetition, (with some guarantees regarding exceptions etc.). Alternatively, we could be writing [1, 2, from [3, 4], 5, 6]. Whether it is "from x" or "*x" is just an accident. In my mind. As you said, the proposal should be written in a much more formal way, so that it could be evaluated without confusion. I completely agree. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Thu Oct 13 17:09:12 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 13 Oct 2016 21:09:12 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> References: <20161012154224.GT22471@ando.pearwood.info> <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> Message-ID: On Fri, Oct 14, 2016 at 12:06 AM Neil Girdhar wrote: > Regarding Steven's example, like Sven, I also see it this way: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > should mean: > > [*(1, 'a'), *(2, 'b'), *(3, 'c')]] > > Which coincides with what the OP is asking for. 
> > >From a CPython implementation standpoint, we specifically blocked this code > path, and it is only a matter of unblocking it if we want to support this. > > This is *very, very* not surprising. And should be stressed. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From socketpair at gmail.com Thu Oct 13 16:36:39 2016 From: socketpair at gmail.com (=?UTF-8?B?0JzQsNGA0Log0JrQvtGA0LXQvdCx0LXRgNCz?=) Date: Thu, 13 Oct 2016 13:36:39 -0700 (PDT) Subject: [Python-ideas] Add sorted (ordered) containers Message-ID: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> I mean mutable containers that are always sorted when iterating over them. See http://bugs.python.org/issue28433 for example: * SortedSet (sorted unique elements, implemented using (rb?)tree instead of hash) * SortedList (sorted elements, the same as SortedSet, but without uniquiness constraint) - actually a (rb?)tree, not a list (i.e. not an array) * SortedDict (sorted by key when interating) - like C++'s ordered_map There are many implementations in the net, like: https://bitbucket.org/bcsaller/rbtree http://newcenturycomputers.net/projects/rbtree.html https://sourceforge.net/projects/pyavl http://www.grantjenks.com/docs/sortedcontainers https://github.com/tailhook/sortedsets https://pypi.python.org/pypi/skiplist and also in pip: pip3 search sorted | grep -Ei '[^a-z]sorted' I think it should be one standardized implementation of such containers in CPython. For example, C++ has both ordered_map and unorderd_map. Instead of trees, implementation may use SkipList structure, but this is just implementation details. Such structres imply fast insertion and deletion, ability to iterate, and also memory efficiency. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Oct 13 16:46:34 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 13 Oct 2016 13:46:34 -0700 (PDT) Subject: [Python-ideas] Suggestion: Deprecate metaclasses that are not instances of type Message-ID: Background: I asked a stackoverflow question here . The Python documentation is very confusing to me. It says that: if an explicit metaclass is given and it is not an instance of type, then it is used directly as the metaclass This seems to suggest that in this case, the "explicit metaclass" does not need to be "subtype of all of these candidate metaclasses" as it would in the third case. (This is not true.) Also, providing a callable as a metaclass doesn't seem to be any more flexible, readable, or powerful than providing an instance of type. Therefore, I suggest that we deprecate the second case and replace the entire section (3.3.3.2) of the documentation to say: "The metaclass of a class definition is selected from the explicitly specified metaclass (if any) and the metaclasses (i.e. type(cls)) of all specified base classes. The most derived metaclass is one which is a subtype of all of these candidate metaclasses. If none of the candidate metaclasses meets that criterion, then the class definition will fail with TypeError. If provided, the explicit metaclass must be an instance of type()." -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Thu Oct 13 17:30:49 2016 From: random832 at fastmail.com (Random832) Date: Thu, 13 Oct 2016 17:30:49 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> On Thu, Oct 13, 2016, at 16:59, Paul Moore wrote: > I don't (for the reasons raised before). But thank you for your > explanation, it clarifies what you were proposing. And it does so > within the *current* uses of the * symbol, which is good. But: > > 1. I'm not keen on extending append's meaning to overlap with extend's > like this. I think the "append(*x)" bit was just a flourish to try to explain it in terms of the current use of * since you don't seem to understand it any other way, rather than an actual proposal to actually change the append method. > 2. Your proposal does not generalise to generator expressions, set > displays (without similarly modifying the set.add() method) or > dictionary displays. Basically it would make the following substitutions in the conventional "equivalent loops" generator yield => yield from list append => extend set add => update dict __setitem__ => update dict comprehensions would need to use **x - {*x for x in y} would be a set comprehension. > 3. *fn(x) isn't an expression, and yet it *looks* like it should be, > and in the current syntax, an expression is required in that position. > To me, that suggests it would be hard to teach. [1] I can think of another position an expression used to be required in: Python 3.5.2 >>> [1, *(2, 3), 4] [1, 2, 3, 4] Python 2.7.11 >>> [1, *(2, 3), 4] File "", line 1 [1, *(2, 3), 4] ^ SyntaxError: invalid syntax Was that hard to teach? Maybe. But it's a bit late to object now, and every single expression on the right hand side in my examples below already has a meaning. 
Frankly, I don't see why the pattern isn't obvious [and why people keep assuming there will be a new meaning of f(*x) as if it doesn't already have a meaning] Lists, present: [x for x in [a, b, c]] == [a, b, c] [f(x) for x in [a, b, c]] == [f(a), f(b), f(c)] [f(*x) for x in [a, b, c]] == [f(*a), f(*b), f(*c)] [f(**x) for x in [a, b, c]] == [f(**a), f(**b), f(**c)] Lists, future: [*x for x in [a, b, c]] == [*a, *b, *c] [*f(x) for x in [a, b, c]] == [*f(a), *f(b), *f(c)] [*f(*x) for x in [a, b, c]] == [*f(*a), *f(*b), *f(*c)] [*f(**x) for x in [a, b, c]] == [*f(**a), *f(**b), *f(**c)] Sets, present: {x for x in [a, b, c]} == {a, b, c} {f(x) for x in [a, b, c]} == {f(a), f(b), f(c)} {f(*x) for x in [a, b, c]} == {f(*a), f(*b), f(*c)} {f(**x) for x in [a, b, c]} == {f(**a), f(**b), f(**c)} Sets, future: {*x for x in [a, b, c]} == {*a, *b, *c} {*f(x) for x in [a, b, c]} == {*f(a), *f(b), *f(c)} {*f(*x) for x in [a, b, c]} == {*f(*a), *f(*b), *f(*c)} {*f(**x) for x in [a, b, c]} == {*f(**a), *f(**b), *f(**c)} Dicts, future: {**x for x in [a, b, c]} == {**a, **b, **c} {**f(x) for x in [a, b, c]} == {**f(a), **f(b), **f(c)} {**f(*x) for x in [a, b, c]} == {**f(*a), **f(*b), **f(*c)} {**f(**x) for x in [a, b, c]} == {**f(**a), **f(**b), **f(**c)} From steve at pearwood.info Thu Oct 13 17:40:49 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 08:40:49 +1100 Subject: [Python-ideas] Suggestion: Deprecate metaclasses that are not instances of type In-Reply-To: References: Message-ID: <20161013214048.GC22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 01:46:34PM -0700, Neil Girdhar wrote: > If provided, the explicit metaclass must be an instance of > type()." -1 for pointless breakage. The metaclass has always been permitted to be any callable. You haven't given any good reason for gratuitously changing this. -- Steve From storchaka at gmail.com Thu Oct 13 17:52:55 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 14 Oct 2016 00:52:55 +0300 Subject: [Python-ideas] Add sorted (ordered) containers In-Reply-To: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> References: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> Message-ID: On 13.10.16 23:36, ???? ????????? wrote: > I think it should be one standardized implementation of such containers > in CPython. > > For example, C++ has both ordered_map and unorderd_map. > > Instead of trees, implementation may use SkipList structure, but this is > just implementation details. > > Such structres imply fast insertion and deletion, ability to iterate, > and also memory efficiency. I recommend to read thorough review articles written by Andrew Barnert: http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html http://stupidpythonideas.blogspot.com/2014/04/sortedcontainers.html From elazarg at gmail.com Thu Oct 13 18:08:50 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 13 Oct 2016 22:08:50 +0000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> Message-ID: Trying to restate the proposal, somewhat more formal following Random832 and Paul's suggestion. 
I only speak about the single star. --- *The suggested change of syntax:* comprehension ::= starred_expression comp_for *Semantics:* (In the following, f(x) must always evaluate to an iterable) 1. List comprehension: result = [*f(x) for x in iterable if cond] Translates to result = [] for x in iterable: if cond: result.extend(f(x)) 2. Set comprehension: result = {*f(x) for x in iterable if cond} Translates to result = set() for x in iterable: if cond: result.update(f(x)) 3. Generator expression: (*f(x) for x in iterable if cond) Translates to for x in iterable: if cond: yield from f(x) Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Oct 13 18:15:26 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 09:15:26 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013204019.GE13170@sjoerdjob.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> Message-ID: <20161013221525.GD22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 10:40:19PM +0200, Sjoerd Job Postmus wrote: > Likewise, > > l = [f(t) for t in iterable] > > can be seen as sugar for > > def gen(): > for t in iterable: > yield f(t) > l = list(gen()) But that is *not* how list comprehensions are treated today. Perhaps they should be? https://docs.python.org/3.6/reference/expressions.html#displays-for-lists-sets-and-dictionaries (Aside: earlier I contrasted "list display" from "list comprehension". In fact according to the docs, a comprehension is a kind of display, a special case of display. Nevertheless, my major point still holds: a list display like [1, 2, 3] is not the same as a list comprehension like [a+1 for a in (0, 1, 2)].) There may be some conceptual benefits to switching to a model where list/set/dict displays are treated as list(gen_expr) etc. But that still leaves the question of what "yield *t" is supposed to mean? Consider the analogy with f(*t), where t = (a, b, c). We *don't* have: f(*t) is equivalent to f(a); f(b); f(c) So why would yield *t give us this? yield a; yield b; yield c By analogy with the function call syntax, it should mean: yield (a, b, c) That is, t is unpacked, then repacked to a tuple, then yielded. > Now the suggested spelling > > l = [*f(t) for t in iterable] > > is very confusing, from what I understand: what does the `*` even mean > here. Indeed. The reader may be forgiven for thinking that this is yet another unrelated and arbitrary use of * to join the many other uses: - mathematical operator; - glob and regex wild-card; - unpacking; - import all - and now yield from > However, consider the following spelling: > > l = [from f(t) for t in iterable] > > To me, it does not seem far-fetched that this would mean: > > def gen(): > for t in iterable: > yield from f(t) > l = list(gen()) Now we're starting to move towards a reasonable proposal. It still requires a conceptual shift in how list comprehensions are documented, but at least now the syntax is no longer so arbitrary. -- Steve From mistersheik at gmail.com Thu Oct 13 18:22:04 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 13 Oct 2016 15:22:04 -0700 (PDT) Subject: [Python-ideas] Add sorted (ordered) containers In-Reply-To: References: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> Message-ID: <008db9ff-6f4e-49c3-932a-1da888e729f5@googlegroups.com> Those are great articles. 
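Spelling the analogy out with code that runs today (the function names are only illustrative):

    def f(*args):
        return args

    t = ('a', 'b', 'c')

    # f(*t) is one call receiving three arguments, not three separate calls:
    assert f(*t) == ('a', 'b', 'c')

    # The nearest legal spelling of "yield *t" today yields a single tuple:
    def g():
        yield (*t,)
    assert list(g()) == [('a', 'b', 'c')]

    # Getting one value per item requires an explicit "yield from" (or an inner loop):
    def h():
        yield from t
    assert list(h()) == ['a', 'b', 'c']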
One thing that Andrew does recommend would be to standardize the interface to the sorted containers, and add them to collections.abc as SortedDict, and SortedSet. I recently switched from blist to sortedcontainers and it would be nice to have these standardized going forward. Another reason to standardize them is for use with the new type checking. Best, Neil On Thursday, October 13, 2016 at 5:54:09 PM UTC-4, Serhiy Storchaka wrote: > > On 13.10.16 23:36, ???? ????????? wrote: > > I think it should be one standardized implementation of such containers > > in CPython. > > > > For example, C++ has both ordered_map and unorderd_map. > > > > Instead of trees, implementation may use SkipList structure, but this is > > just implementation details. > > > > Such structres imply fast insertion and deletion, ability to iterate, > > and also memory efficiency. > > I recommend to read thorough review articles written by Andrew Barnert: > > > http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html > > http://stupidpythonideas.blogspot.com/2014/04/sortedcontainers.html > > > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Oct 13 19:16:47 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 13 Oct 2016 23:16:47 +0000 Subject: [Python-ideas] Suggestion: Deprecate metaclasses that are not instances of type In-Reply-To: <20161013214048.GC22471@ando.pearwood.info> References: <20161013214048.GC22471@ando.pearwood.info> Message-ID: That's fair. However, the documentation should at least be repaired by replacing section 3.3.3.2 with: "The metaclass of a class definition is selected from the explicitly specified metaclass (if any) and the metaclasses (i.e. type(cls)) of all specified base classes. The most derived metaclass is one which is a subtype of all of these candidate metaclasses. If none of the candidate metaclasses meets that criterion, then the class definition will fail with TypeError. If provided, the explicit metaclass must be a callable accepting the positional arguments (name, bases, _dict)." This is because something happened along the way and Objects/typeobject.c: type_new no longer coincides with Lib/types.py:new_class. The Python version conditionally calls _calculate_meta whereas the C version calls it unconditionally. I consider the C implementation to be the "correct" version. Best, Neil On Thu, Oct 13, 2016 at 5:41 PM Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 01:46:34PM -0700, Neil Girdhar wrote: > > > If provided, the explicit metaclass must be an instance of > > type()." > > -1 for pointless breakage. > > The metaclass has always been permitted to be any callable. You haven't > given any good reason for gratuitously changing this. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/wrHDM0SOIqE/unsubscribe. 
> To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Oct 13 19:52:11 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 13 Oct 2016 23:52:11 +0000 Subject: [Python-ideas] Suggestion: Deprecate metaclasses that are not instances of type In-Reply-To: References: <20161013214048.GC22471@ando.pearwood.info> Message-ID: Bug: http://bugs.python.org/issue28437 On Thu, Oct 13, 2016 at 7:15 PM Neil Girdhar wrote: > That's fair. However, the documentation should at least be repaired by > replacing section 3.3.3.2 with: > > "The metaclass of a class definition is selected from the explicitly > specified metaclass (if any) and the metaclasses (i.e. type(cls)) of all > specified base classes. The most derived metaclass is one which is a > subtype of all of these candidate metaclasses. If none of the candidate > metaclasses meets that criterion, then the class definition will fail with > TypeError. If provided, the explicit metaclass must be a callable accepting > the positional arguments (name, bases, _dict)." > > This is because something happened along the way and Objects/typeobject.c: > type_new no longer coincides with Lib/types.py:new_class. The Python > version conditionally calls _calculate_meta whereas the C version calls it > unconditionally. I consider the C implementation to be the "correct" > version. > > Best, > > Neil > > On Thu, Oct 13, 2016 at 5:41 PM Steven D'Aprano > wrote: > > On Thu, Oct 13, 2016 at 01:46:34PM -0700, Neil Girdhar wrote: > > > If provided, the explicit metaclass must be an instance of > > type()." > > -1 for pointless breakage. > > The metaclass has always been permitted to be any callable. You haven't > given any good reason for gratuitously changing this. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/wrHDM0SOIqE/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From grant.jenks at gmail.com Thu Oct 13 20:25:20 2016 From: grant.jenks at gmail.com (Grant Jenks) Date: Thu, 13 Oct 2016 17:25:20 -0700 Subject: [Python-ideas] Fwd: Add sorted (ordered) containers In-Reply-To: References: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> Message-ID: On Thu, Oct 13, 2016 at 1:36 PM, ???? ????????? wrote: > > I mean mutable containers that are always sorted when iterating over them. > > See http://bugs.python.org/issue28433 > > for example: > > * SortedSet (sorted unique elements, implemented using (rb?)tree instead of hash) > * SortedList (sorted elements, the same as SortedSet, but without uniquiness constraint) - actually a (rb?)tree, not a list (i.e. 
not an array) > * SortedDict (sorted by key when interating) - like C++'s ordered_map Can you share more about your use cases for these containers? What are you making? Nick Coghlan gave an answer to this question on StackOverflow at http://stackoverflow.com/a/5958960/232571 The answer kind of boils down to "there should be one obvious way to do it" and existing Python features like lists, sorted, bisect, and heapq cover many use cases. I wrote the answer that is now the second highest rated for that question. I've noticed that the upvotes have been accumulating at a slightly higher rate than Nick's answer. I think that reflects an increase in interest and maybe gradual tide change of opinion. > There are many implementations in the net, like: > > http://www.grantjenks.com/docs/sortedcontainers That's my project. I am the primary developer of the SortedContainers project. You may also be interested in the [SortedCollections](http://www.grantjenks.com/docs/sortedcollections/) module which builds atop SortedContainers with data types like ValueSortedDict and ItemSortedDict. Because it's pure-Python, SortedContainers offers a lot of opportunity for extension/customization. That's also made it easier for the API to adapt/expand over time. > I think it should be one standardized implementation of such containers in CPython. > > Instead of trees, implementation may use SkipList structure, but this is just implementation details. > > Such structres imply fast insertion and deletion, ability to iterate, and also memory efficiency. I gave a talk at PyCon 2016 about Python Sorted Collections[1] that's worth watching. The first third discusses about six different implementations with different strengths and weaknesses. The choice of data type is more than implementation details. One of the biggest issues is the various tradeoffs of data types like blists, rbtrees, etc. I have been meaning to discuss sorted collections further with Raymond Hettinger (author of the Python collections module). We spoke after the PyCon talk and wanted to continue the conversation. But I had a busy summer and just a few weeks ago welcomed my first son into the world. So realistically that conversation won't happen until 2017. [1]: http://www.grantjenks.com/docs/sortedcontainers/pycon-2016-talk.html > I recommend to read thorough review articles written by Andrew Barnert: > > http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html > > http://stupidpythonideas.blogspot.com/2014/04/sortedcontainers.html One of Andrew Barnert's conclusions is that SortedContainers could not scale. I did a pretty rigorous performance analysis and benchmarking at http://www.grantjenks.com/docs/sortedcontainers/performance-scale.html Short answer: I scaled SortedContainers up through ten billion elements, well past the memory limits of most machines. From steve at pearwood.info Thu Oct 13 21:04:07 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 12:04:07 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> Message-ID: <20161014010407.GG22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 08:15:36PM +0200, Martti K?hne wrote: > Can I fix my name, though? I don't understand what you mean. Your email address says your name is Martti K?hne. Is that incorrect? [...] 
> I meant that statement in context of the examples which were brought up: > the occurrence of a list comprehension inside an array have the > following effect: > > 1) [ ..., [expr for t in iterable] ] > > is equivalent to: > > def expr_long(iterable, result): > result.append(iterable) > return result > > expr_long(iterable, [ ..., ]) The good thing about this example is that it is actual runnable code that we can run to see if they are equivalent. They are not equivalent. py> def expr_long(iterable, result): ... result.append(iterable) ... return result ... py> iterable = (100, 200, 300) py> a = [..., [2*x for x in iterable]] py> b = expr_long(iterable, [...]) py> a == b False py> print(a, b) [Ellipsis, [200, 400, 600]] [Ellipsis, (100, 200, 300)] For this to work, you have to evaluate the list comprehension first, then pass the resulting list to be appended to the result. I don't think this is very insightful. All you have demonstrated is that a list display [a, b, c, ...] is equivalent to: result = [] for x in [a, b, c, ...]: result.append(x) except that you have written it in a slightly functional form. > so, if you make the case for pep448, you might arrive at the following: > > 2) [ ..., *[expr for expr in iterable] ] That syntax already works (in Python 3.5): py> [1, 2, 3, *[x+1 for x in (100, 200, 300)], 4, 5] [1, 2, 3, 101, 201, 301, 4, 5] > which would be, if I'm typing it correctly, equivalent to, what > resembles an external collection: > > def expr_star(list_comp, result): > result.extend(list(list_comp)) > return result > > expr_star(iterable, [ ..., ]) > > Having this in mind, the step to making: > > [ ..., [*expr for expr in iterable], ] > > from: > > def expr_insidestar(iterable, result): > for expr in iterable: > result.extend(expr) > return result > > does not appear particularly far-fetched, at least not to me and a few > people on this list. But you don't have [..., list_comp, ] you just have the list comp. You are saying: (1) List displays [a, b, c, d, ...] are like this; (2) we can sensibly extend that to the case [a, b, *c, d, ...] I agree with (1) and (2). But then you have a leap: (3) therefore [*t for t in iterable] should mean this. There's a huge leap between the two. To even begin to make sense of this, you have to unroll the list comprehension into a list display. But that's not very helpful: [expr for t in iterable] Would you rather see that explained as: [expr, expr, expr, expr, ...] or as this? result = [] for t in iterable: result.append(expr) The second form, the standard, documented explanation for comprehensions, also applies easily to more complex examples: [expr for t in iter1 for u in iter2 for v in iter3 if condition] result = [] for t in iter1: for u in iter2: for v in iter3: if condition: result.append(expr) -- Steve From python at mrabarnett.plus.com Thu Oct 13 23:18:40 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 14 Oct 2016 04:18:40 +0100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161014010407.GG22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161014010407.GG22471@ando.pearwood.info> Message-ID: <01d916de-8b61-c6d1-4efc-649902ce7572@mrabarnett.plus.com> On 2016-10-14 02:04, Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 08:15:36PM +0200, Martti K?hne wrote: > >> Can I fix my name, though? > > I don't understand what you mean. Your email address says your name is > Martti K?hne. 
Is that incorrect? > [snip] You wrote "Marttii" and he corrected it when he quoted you in his reply. From mehaase at gmail.com Thu Oct 13 23:20:14 2016 From: mehaase at gmail.com (Mark E. Haase) Date: Thu, 13 Oct 2016 23:20:14 -0400 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: (Replying to multiple posts in this thread) Guido van Rossum: > Another problem is PEP 505 -- it > is full of discussion but its specification is unreadable due to the > author's idea to defer the actual choice of operators and use a > strange sequence of unicode characters instead. Hi, I wrote PEP-505. I'm sorry that it's unreadable. The choice of emoji as operators was supposed to be a blatant joke. I'd be happy to submit a new version that is ASCII. Or make any other changes that would facilitate making a decision on the PEP. As I recall, the thread concluded with Guido writing, "I'll have to think about this," or something to that effect. I had hoped that the next step could be a survey where we could gauge opinions on the various possible spellings. I believe this was how PEP-308 was handled, and that was a very similar proposal to this one. Most of the discussion on list was really centered around the fact that nobody like the proposed ?? or .? spellings, and nobody could see around that fact to consider whether the feature itself was intrinsically valuable. (This is why the PEP doesn't commit to a syntax.) Also, as unfortunate side effect of a miscommunication, about 95% of the posts on this PEP were written _before_ I submitted a complete draft and so most of the conversation was arguing about a straw man. David Mertz: > > The idea is that we can easily have both "regular" behavior and None > coalescing just by wrapping any objects in a utility class... and WITHOUT > adding ugly syntax. I might have missed some corners where we would want > behavior wrapped, but those shouldn't be that hard to add in principle. > The biggest problem with a wrapper in practice is that it has to be unwrapped before it can be passed to any other code that doesn't know how to handle it. E.g. if you want to JSON encode an object, you need to unwrap all of the NullCoalesce objects because the json module wouldn't know what to do with them. The process of wrapping and unwrapping makes the resulting code more verbose than any existing syntax. > How much of the time is a branch of the None check a single fallback value > or attribute access versus how often a suite of statements within the > not-None branch? > > I definitely check for None very often also. I'm curious what the > breakdown is in code I work with. > There's a script in the PEP-505 repo that can you help you identify code that could be written with the proposed syntax. (It doesn't identify blocks that would not be affected, so this doesn't completely answer your question.) https://github.com/mehaase/pep-0505/blob/master/find-pep505.py The PEP also includes the results of running this script over the standard library. On Sat, Sep 10, 2016 at 1:26 PM, Guido van Rossum wrote: > The way I recall it, we arrived at the perfect syntax (using ?) and > semantics. The issue was purely strong hesitation about whether > sprinkling ? all over your code is too ugly for Python, and in the end > we couldn't get agreement on *that*. 
Another problem is PEP 505 -- it > is full of discussion but its specification is unreadable due to the > author's idea to defer the actual choice of operators and use a > strange sequence of unicode characters instead. > > If someone wants to write a new, *short* PEP that defers to PEP 505 > for motivation etc. and just writes up the spec for the syntax and > semantics we'll have a better starting point. IMO the key syntax is > simply one for accessing attributes returning None instead of raising > AttributeError, so that e.g. `foo?.bar?.baz` is roughly equivalent to > `foo.bar.baz if (foo is not None and foo.bar is not None) else None`, > except evaluating foo and foo.bar only once. > > On Sat, Sep 10, 2016 at 10:14 AM, Random832 > wrote: > > On Sat, Sep 10, 2016, at 12:48, Stephen J. Turnbull wrote: > >> I forget if Guido was very sympathetic to null-coalescing operators, > >> given somebody came up with a good syntax. > > > > As I remember the discussion, I thought he'd more or less conceded on > > the use of ? but there was disagreement on how to implement it that > > never got resolved. Concerns like, you can't have a?.b return None > > because then a?.b() isn't callable, unless you want to use a?.b?() for > > this case, or some people wanted to have "a?" [where a is None] return a > > magic object whose attribute/call/getitem would give no error, but that > > would have to keep returning itself and never actually return None for > > chained operators. > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Oct 13 23:32:49 2016 From: random832 at fastmail.com (Random832) Date: Thu, 13 Oct 2016 23:32:49 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013221525.GD22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> Message-ID: <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> On Thu, Oct 13, 2016, at 18:15, Steven D'Aprano wrote: > Consider the analogy with f(*t), where t = (a, b, c). We *don't* have: > > f(*t) is equivalent to f(a); f(b); f(c) I don't know where this "analogy" is coming from. f(*t) == f(a, b, c) [*t] == [a, b, c] {*t} == {a, b, c} All of this is true *today*. t, u, v = (a, b, c), (d, e, f), (g, h, i) f(*t, *u, *v) == f(a, b, c, d, e, f, g, h, i) [*t, *u, *v] == [a, b, c, d, e, f, g, h, i] > > is very confusing, from what I understand: what does the `*` even mean > > here. > > Indeed. The reader may be forgiven for thinking that this is yet another > unrelated and arbitrary use of * to join the many other uses: How is it arbitrary? > - mathematical operator; > - glob and regex wild-card; > - unpacking; This is unpacking. It unpacks the results into the destination. There's a straight line from [*t, *u, *v] to [*x for x in (t, u, v)]. What's surprising is that it doesn't work now. 
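To spell it out with a quick interpreter session (the last line is the
spelling under discussion, not something that runs today):

>>> t, u, v = (1, 2), (3, 4), (5, 6)
>>> [*t, *u, *v]             # already legal under PEP 448
[1, 2, 3, 4, 5, 6]
>>> [*x for x in (t, u, v)]  # the proposed form -- currently a SyntaxError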
I think last month we even had someone who didn't know about 'yield from' propose 'yield *x' for exactly this feature. It is intuitive - it is a straight-line extension of the unpacking syntax. > - import all > - and now yield from From mikhailwas at gmail.com Fri Oct 14 01:21:48 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Fri, 14 Oct 2016 07:21:48 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <57FF52E3.3060309@canterbury.ac.nz> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > #define O_RDONLY 0x0000 /* open for reading only */ > #define O_WRONLY 0x0001 /* open for writing only */ > #define O_RDWR 0x0002 /* open for reading and writing */ > #define O_ACCMODE 0x0003 /* mask for above modes */ Good example. But it is not an average high level code of course. Example works again only if we for some reason follow binary segmentation which is bound to low level functionality, in this case 8 bit grouping. > If you have occasion to write a literal representing a > character code, there's nothing to stop you writing it > in hex to match the way it's shown in a repr(), or in > published Unicode tables, etc. >>> c = "\u1235" >>> if "\u1230" <= c <= "\u123f": > I don't see a need for any conversions back and forth. I'll explain what I mean with an example. This is an example which I would make to support my proposal. Compare: if "\u1230" <= c <= "\u123f": and: o = ord (c) if 100 <= o <= 150: So yours is a valid code but for me its freaky, and surely I stick to the second variant. You said, I can better see in which unicode page I am by looking at hex ordinal, but I hardly need it, I just need to know one integer, namely where some range begins, that's it. Furthermore this is the code which would an average programmer better read and maintain. So it is the question of maintainability (+1). Secondly, for me it is the question of being able to type in and correct these decimal values: look, if I make a mistake, typo, or want to expand the range by some value I need to make summ and substract operation in my head to progress with my code effectively. Obviously nobody operates good with two notations in head simultanosly, so I will complete my code without extra effort. Is it clear now what I mean by conversions back and forth? This example alone actually explains my whole point very well, I feel however like being misunderstood or so. >> I am not against base-16 itself in the first place, >> but rather against the character set which is simply visually >> inconsistent and not readable. >Now you're talking about inventing new characters, or >at least new glyphs for existing ones, and persuading >everyone to use them. That's well beyond the scope of >what Python can achieve! Yes ideally one uses other glyphs for base-16 it does not however mean that one must use new invented glyphs. In standard ASCII there are enough glyphs that would work way better together, but it is too late anyway, should be better decided at the time of standard declaration. Got to love it. > The meaning of 0xC001 is much clearer to me than > 1100000000000001, because I'd have to count the bits very > carefully in the latter to distinguish it from, e.g. > 6001 or 18001. > The bits could be spaced out: > 1100 0000 0000 0001 > but that just takes up even more room to no good effect. 
> I don't find it any faster to read -- if anything, it's Greg, I feel somehow that you are an open minded person and I value this. You also can understand quite good how you read. What you refer to here is the brevity of the word Indeed there is some degrade of readability if the word is too big, or a font is set to big size, so you brake it, one step towards better. And now I'll explain you some further magic regarding the binary representation. If you find free time you can experiment a bit. So what is so peculiar about bitstring actually? Bitstring unlike higher bases can be treated as an abscence/presence of the signal, which is not possible for higher bases, literally binary string can be made almost "analphabetic" if one could say so. Consider such notation: instead of 1100 0000 0000 0001 you write ??-? ---- ---- ---? (NOTE: of course if you read this in non monospaced font you will not see it correct, I should make screenshots which I will do in a while) Note that I choose this letter not accidentally, this letter is similar to one of glyphs with peak readability. The unset value simply would be a stroke. So I take only one letter. ??-? ---- ---- ---? ---? ---? --?- -?-- --?- ---- ---- ---? ---- ---- --?- ---? -??- ??-- ---- ---- ---- ---- ---- ---- --?? ---- ?--- ---? -?-- --?? ---- ---? So the digits need not be equal-weighted as in higher bases. What does it bring? Simple: you can downscale the strings, so a 16-bit value would be ~60 pixels wide (for 96dpi display) without legibility loss, so it compensate the "too wide to scan" issue. And don't forget to make enough linespacing. Other benefits of binary string obviously: - nice editing feautures like bitshifting - very interesting cognitive features, (it becomes more noticable however if you train to work with it) ... So there is a whole bunch of good effects. Understand me right, I don't have reason not to believe you that you don't see any effect, but you should always remember that this can be simply caused by your habit. So if you are more than 40 years old (sorry for some familiarity) this can be really strong issue and unfortunately hardly changeable. It is not bad, it is natural thing, it is with everyone so. > When I say "instantly", I really do mean *instantly*. > I fail to see how a different glyph set could reduce > the recognition time to less than zero. It is not about speed, it is about brain load. Chinese can read their hieroglyphs fast, but the cognition load on the brain is 100 times higher than current latin set. I know people who can read bash scripts fast, but would you claim that bash syntax can be any good compared to Python syntax? > Another point -- a string of hex digits is much easier > for me to *remember* Could be, I personally can remember numbers in the above mentioned notation fotographically, opposed to decimal, where I also tend to speak it out to remember better, that is interesting, more of psychology however. Everyone is unique however in this sense. Already noted, another good alternative for 8bit aligned data will be quoternary notation, it is 2x more compact and can be very legible due to few glyphs, it is also possible to emulate it with existing chars. 
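As a rough sketch of what I mean by quaternary (just an illustration with a
throwaway helper, not a proposal for real syntax), a 16-bit value in base 4
takes half the digits of the binary form:

def to_base4(n, width=8):
    # render n as fixed-width base-4 digits (illustration only)
    digits = []
    for _ in range(width):
        digits.append(str(n % 4))
        n //= 4
    return ''.join(reversed(digits))

print(to_base4(0xC001))   # 30000001            (8 digits for 16 bits)
print(bin(0xC001))        # 0b1100000000000001  (16 digits)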
Mikhail From greg.ewing at canterbury.ac.nz Fri Oct 14 01:23:32 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 18:23:32 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013165546.GB22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> Message-ID: <58006BD4.2000109@canterbury.ac.nz> Steven D'Aprano wrote: > py> def gen(): > ... for t in [(1, 'a'), (2, 'b'), (3, 'c')]: > ... yield *t > File "", line 3 > yield *t > ^ > SyntaxError: invalid syntax > > Even if it was allowed, what would it mean? It could only mean "unpack > the sequence t, and collect the values into a tuple; then yield the > tuple". To maintain the identity list(*x for x in y) == [*x for x in y] it would be necessary for the *x in (*x for x in y) to expand to "yield from x". -- Greg From greg.ewing at canterbury.ac.nz Fri Oct 14 01:32:44 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 18:32:44 +1300 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> Message-ID: <58006DFC.8040009@canterbury.ac.nz> David Mertz wrote: > it would always be "Here's a Python wart to look out > for if you see it in other code... you should not ever use it yourself." Do you currently tell them the same thing about the use of * in a list display? -- Greg From mertz at gnosis.cx Fri Oct 14 01:42:23 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 13 Oct 2016 22:42:23 -0700 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <58006DFC.8040009@canterbury.ac.nz> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> <20161012232943.GV22471@ando.pearwood.info> <20161013141001.GY22471@ando.pearwood.info> <62effc3e-c0a2-59b1-75b0-760079a1bae5@mail.de> <58006DFC.8040009@canterbury.ac.nz> Message-ID: I've never used nor taught a * in a list display. I don't think they seem so bad, but it's a step down a slippery slope towards forms that might as well be Perl. On Oct 13, 2016 10:33 PM, "Greg Ewing" wrote: > David Mertz wrote: > >> it would always be "Here's a Python wart to look out for if you see it in >> other code... you should not ever use it yourself." >> > > Do you currently tell them the same thing about the use > of * in a list display? > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Oct 14 01:54:50 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Oct 2016 16:54:50 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> Message-ID: <20161014055448.GI22471@ando.pearwood.info> On Fri, Oct 14, 2016 at 07:21:48AM +0200, Mikhail V wrote: > I'll explain what I mean with an example. 
> This is an example which I would make to > support my proposal. Compare: > > if "\u1230" <= c <= "\u123f": For an English-speaker writing that, I'd recommend: if "\N{ETHIOPIC SYLLABLE SA}" <= c <= "\N{ETHIOPIC SYLLABLE SHWA}": ... which is a bit verbose, but that's the price you pay for programming with text in a language you don't read. If you do read Ethiopian, then you can simply write: if "?" <= c <= "?": ... which to a literate reader of Ethiopean, is no harder to understand than the strange and mysterious rotated and reflected glyphs used by Europeans: if "d" <= c <= "p": ... (Why is "double-u" written as vv (w) instead of uu?) > and: > > o = ord (c) > if 100 <= o <= 150: Which is clearly not the same thing, and better written as: if "d" <= c <= "\x96": ... > So yours is a valid code but for me its freaky, > and surely I stick to the second variant. > You said, I can better see in which unicode page > I am by looking at hex ordinal, but I hardly > need it, I just need to know one integer, namely > where some range begins, that's it. > Furthermore this is the code which would an average > programmer better read and maintain. No, the average programmer is MUCH more skillful than that. Your standard for what you consider "average" seems to me to be more like "lowest 20%". [...] > I feel however like being misunderstood or so. Trust me, we understand you perfectly. You personally aren't familiar or comfortable with hexadecimal, Unicode code points, or programming standards which have been in widespread use for at least 35 years, and probably more like 50, but rather than accepting that this means you have a lot to learn, you think you can convince the rest of the world to dumb-down and de-skill to a level that you are comfortable with. And that eventually the entire world will standardise on just 100 characters, which you think is enough for all communication, maths and science. Good luck with that last one. Even if you could convince the Chinese and Japanese to swap to ASCII, I'd like to see you pry the emoji out of the young folk's phones. [...] > It is not about speed, it is about brain load. > Chinese can read their hieroglyphs fast, but > the cognition load on the brain is 100 times higher > than current latin set. Citation required. -- Steve From greg.ewing at canterbury.ac.nz Fri Oct 14 01:57:23 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 18:57:23 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> Message-ID: <580073C3.1030004@canterbury.ac.nz> Random832 wrote: > [*map(math.exp, t) for t in [(1, 2), (3, 4)]] > > [*(math.exp(x) for x in t) for t in [(1, 2), (3, 4)]] Or more simply, [math.exp(x) for t in [(1, 2), (3, 4)] for x in t] I think this brings out an important point. While it would be nice to allow * unpacking in comprehensions for consistency with displays, it's not strictly necessary, since you can always get the same effect with another level of looping. So it comes down to whether you think added conistency, plus maybe some efficiency gains in some cases, are worth making the change. 
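In other words, (*x for x in y) would have to behave like this explicit
generator (a sketch of the intended expansion, written in current syntax):

def unpacking_gen(y):
    for x in y:
        yield from x    # what *x would have to mean here

y = [(1, 2), (3, 4), (5, 6)]
assert list(unpacking_gen(y)) == [1, 2, 3, 4, 5, 6]  # what both spellings should give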
-- 
Greg

From greg.ewing at canterbury.ac.nz  Fri Oct 14 02:00:21 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Oct 2016 19:00:21 +1300
Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension
In-Reply-To: <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com>
References: <20161012154224.GT22471@ando.pearwood.info>
 <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com>
Message-ID: <58007475.9010306@canterbury.ac.nz>

Neil Girdhar wrote:
> At the end of this discussion it might be good to get a tally of how
> many people think the proposal is reasonable and logical.

I think it's reasonable and logical.

-- 
Greg

From mikhailwas at gmail.com  Fri Oct 14 02:05:40 2016
From: mikhailwas at gmail.com (Mikhail V)
Date: Fri, 14 Oct 2016 08:05:40 +0200
Subject: [Python-ideas] Proposal for default character representation
In-Reply-To: <57FF4356.1070104@egenix.com>
References: <57FEAF9F.5020103@egenix.com> <57FF4356.1070104@egenix.com>
Message-ID:

On 13 October 2016 at 10:18, M.-A. Lemburg wrote:

> I suppose you did not intend everyone to have to write
> \u0000010 just to get a newline code point to avoid the
> ambiguity.

Ok there are different usage cases. So in short without going into detail,
for example if I need to type in a unicode string literal in ASCII editor
I would find such notation replacement beneficial for me:

u'\u0430\u0431\u0432.txt'  -->  u"{1072}{1073}{1074}.txt"

Printing could be the same I suppose. I use Python 2.7.
And printing so with numbers instead of non-ASCII would help me see
where I have non-ASCII chars. But I think the print behavior must be
easily configurable.

Any critics on it? Besides not following the unicode consortium.
Also I would not even mind fixed width 7-digit decimals actually.
Ugly but still for me better than hex.

Mikhail

From greg.ewing at canterbury.ac.nz  Fri Oct 14 02:06:12 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Oct 2016 19:06:12 +1300
Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension
In-Reply-To: <20161013204019.GE13170@sjoerdjob.com>
References: <20161012154224.GT22471@ando.pearwood.info>
 <20161013165546.GB22471@ando.pearwood.info>
 <20161013204019.GE13170@sjoerdjob.com>
Message-ID: <580075D4.9050807@canterbury.ac.nz>

Sjoerd Job Postmus wrote:
> I think the suggested spelling (`*`) is the confusing part. If it were
> to be spelled `from ` instead, it would be less confusing.

Are you suggesting this spelling just for generator
comprehensions, or for list comprehensions as well?
What about dict comprehensions?

-- 
Greg

From greg.ewing at canterbury.ac.nz  Fri Oct 14 02:15:35 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Oct 2016 19:15:35 +1300
Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension
In-Reply-To:
References: <20161012154224.GT22471@ando.pearwood.info>
 <20161013165546.GB22471@ando.pearwood.info>
 <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com>
 <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com>
Message-ID: <58007807.8010504@canterbury.ac.nz>

Paul Moore wrote:
> please can you explain how to modify that translation rule to
> incorporate the suggested syntax?

It's quite simple: when there's a '*', replace 'append'
with 'extend':

[*fn(x) for x in lst if cond]

expands to

result = []
for x in lst:
    if cond:
        result.extend(fn(x))

The people thinking that you should just stick the '*x' in
as an argument to append() are misunderstanding the nature
of the expansion.
You can't do that, because the current expansion is based on the assumption that the thing being substituted is an expression, and '*x' is not a valid expression on its own. A new rule is needed to handle that case. And I'm the one who *invented* that expansion, so I get to say what it means. :-) -- Greg From jcgoble3 at gmail.com Fri Oct 14 02:21:54 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Fri, 14 Oct 2016 02:21:54 -0400 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <20161014055448.GI22471@ando.pearwood.info> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <20161014055448.GI22471@ando.pearwood.info> Message-ID: On Fri, Oct 14, 2016 at 1:54 AM, Steven D'Aprano wrote: >> and: >> >> o = ord (c) >> if 100 <= o <= 150: > > Which is clearly not the same thing, and better written as: > > if "d" <= c <= "\x96": > ... Or, if you really want to use ord(), you can use hex literals: o = ord(c) if 0x64 <= o <= 0x96: ... From greg.ewing at canterbury.ac.nz Fri Oct 14 02:54:07 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 19:54:07 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: <5800810F.5080200@canterbury.ac.nz> Paul Moore wrote: > Where in [fn(x) for x in lst if cond] is the * > allowed? fn(*x)? *fn(x)? Obviously you're *allowed* to put fn(*x), because that's already a valid function call, but the only *new* place we're talking about, and proposing new semantics for, is in front of the expression representing items to be added to the list, i.e. [*fn(x) for ...] > I think it's probably time for someone to > describe the precise syntax (as BNF, like the syntax in the Python > docs at https://docs.python.org/3.6/reference/expressions.html#displays-for-lists-sets-and-dictionaries Replace comprehension ::= expression comp_for with comprehension ::= (expression | "*" expression) comp_for > and semantics (as an explanation of how to > rewrite any syntactically valid display as a loop). The expansion of the "*" case is the same as currently except that 'append' is replaced by 'extend' in a list comprehension, 'yield' is replaced by 'yield from' in a generator comprehension. If we decided to also allow ** in dict comprehensions, then the expansion would use 'update'. -- Greg From mikhailwas at gmail.com Fri Oct 14 03:02:50 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Fri, 14 Oct 2016 09:02:50 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FF493B.5040306@canterbury.ac.nz> Message-ID: On 13 October 2016 at 12:05, Cory Benfield wrote: > > integer & 0x00FFFFFF # Hex > integer & 16777215 # Decimal > integer & 0o77777777 # Octal > integer & 0b111111111111111111111111 # Binary > > The octal representation is infuriating because one octal digit refers to *three* bits Correct, makes it not so nice looking and 8-bit-paradigm friendly. Does not make it however bad option in general and according to my personal suppositions and works on glyph development the optimal set is exactly of 8 glyphs. > Decimal is no clearer. 
In same alignment problematic context, yes, correct. > Binary notation seems like the solution, ... Agree with you, see my last reply to Greg for more thoughts on bitstrings and quoternary approach. > IIRC there?s some new syntax coming for binary literals > that would let us represent them as 0b1111_1111_1111_1111 Very good. Healthy attitude :) > less dense and loses clarity for many kinds of unusual bit patterns. Not very clear for me what is exactly there with patterns. > Additionally, as the number of bits increases life gets really hard: > masking out certain bits of a 64-bit number requires Self the editing of such a BITmask in hex notation makes life hard. Editing it in binary notation makes life easier. > a literal that?s at least 66 characters long, Length is a feature of binary, though it is not major issue, see my ideas on it in reply to Greg > Hexadecimal has the clear advantage that each character wholly represents 4 bits, This advantage is brevity, but one need slightly less brevity to make it more readable. So what do you think about base 4 ? > This is a very long argument to suggest that your > argument against hexadecimal literals > (namely, that they use 16 glyphs as opposed > to the 10 glyphs used in decimal) > is an argument that is too simple to be correct. I didn't understood this sorry :))) Youre welcome to ask more if youre intersted in this. > Different collections of glyphs are clearer in different contexts. How much different collections and how much different contexts? > while the english language requires 26 glyphs plus punctuation. Does not *require*, but of course 8 glyphs would not suffice to effectively read the language, so one finds a way to extend the glyph set. Roughly speaking 20 letters is enough, but this is not exact science. And it is quite hard science. > But I don?t think you?re seriously proposing we should > swap from writing English using the larger glyph set > to writing it in decimal representation of ASCII bytes. I didn't understand this sentence :) In general I think we agree on many points, thank you for the input! Cheers, Mikhail From greg.ewing at canterbury.ac.nz Fri Oct 14 03:04:11 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 20:04:11 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> Message-ID: <5800836B.2090600@canterbury.ac.nz> Paul Moore wrote: > PS I can counter a suggestion of using *f(t) rather than from f(t) in > the above, by saying that it adds yet another meaning to the already > heavily overloaded * symbol. We've *already* given it that meaning in non-comprehension list displays, though, so we're not really adding any new meanings for it -- just allowing it to have that meaning in a place where it's currently disallowed. Something I've just noticed -- the Language Reference actually defines both ordinary list displays and list comprehensions as "displays", and says that a display can contain either a comprehension or an explicit list of values. It has to go out of its way a bit to restrict the * form to non-comprehensions. 
-- Greg From greg.ewing at canterbury.ac.nz Fri Oct 14 03:07:22 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 20:07:22 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: <5800842A.6030109@canterbury.ac.nz> Paul Moore wrote: > 3. *fn(x) isn't an expression, and yet it *looks* like it should be ... > To me, that suggests it would be hard to teach. It's not an expression in any of the other places it's used, either. Is it hard to to teach in those cases as well? -- Greg From greg.ewing at canterbury.ac.nz Fri Oct 14 03:11:57 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 20:11:57 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> Message-ID: <5800853D.4020307@canterbury.ac.nz> ????? wrote: > I think it is an unfortunate accident of syntax, the use of "yield from > foo()" instead of "yield *foo()". I think that was actually discussed back when yield-from was being thrashed out, but as far as I remember we didn't have * in list displays then, so the argument for it was weaker. If we had, it might have been given more serious consideration. -- Greg From sjoerdjob at sjoerdjob.com Fri Oct 14 03:26:39 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Fri, 14 Oct 2016 09:26:39 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FF4356.1070104@egenix.com> Message-ID: <20161014072639.GF13170@sjoerdjob.com> On Fri, Oct 14, 2016 at 08:05:40AM +0200, Mikhail V wrote: > Any critics on it? Besides not following the unicode consortium. Besides the other remarks on "tradition", I think this is where a big problem lies: We should not deviate from a common standard (without very good cause). There are cases where a language does good by deviating from the common standard. There are also cases where it is bad to deviate. Almost all current programming languages understand unicode, for instance: * C: http://en.cppreference.com/w/c/language/escape * C++: http://en.cppreference.com/w/cpp/language/escape * JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types#Using_special_characters_in_strings and that were only the first 3 I tried. They all use `\u` followed by 4 hexadecimal digits. You may not like the current standard. You may think/know/... it to be suboptimal for human comprehension. However, what you are suggesting is a very costly change. A change where --- I believe --- Python should not take the lead, but also should not be afraid to follow if other programming languages start to change. I would suggest that this is a change that might be best proposed to the unicode consortium itself, instead of going to (just) a programming language. It'd be interesting to see whether or not you can convince the unicode consortium that 8 symbols will be enough. 
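To make the comparison concrete, the shared notation looks like this in an
interpreter session (Python shown here; C, C++ and JavaScript string
literals accept the same \u escape with 4 hexadecimal digits):

>>> ord("\u0430")            # 4 hex digits, same notation as the languages above
1072
>>> "\u0430" == "\N{CYRILLIC SMALL LETTER A}"   # Python also accepts a named form
True
>>> hex(1072)
'0x430'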
From greg.ewing at canterbury.ac.nz Fri Oct 14 03:29:28 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 20:29:28 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161013221525.GD22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> Message-ID: <58008958.403@canterbury.ac.nz> Steven D'Aprano wrote: So why would yield *t give us this? > > yield a; yield b; yield c > > By analogy with the function call syntax, it should mean: > > yield (a, b, c) This is a false analogy, because yield is not a function. >>However, consider the following spelling: >> >> l = [from f(t) for t in iterable] That sentence no verb! In English, 'from' is a preposition, so one expects there to be a verb associated with it somewhere. We currently have 'from ... import' and 'yield from'. But 'from f(t) for t in iterable' ... do what? -- Greg From sjoerdjob at sjoerdjob.com Fri Oct 14 03:33:11 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Fri, 14 Oct 2016 09:33:11 +0200 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <580075D4.9050807@canterbury.ac.nz> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <580075D4.9050807@canterbury.ac.nz> Message-ID: <20161014073311.GG13170@sjoerdjob.com> On Fri, Oct 14, 2016 at 07:06:12PM +1300, Greg Ewing wrote: > Sjoerd Job Postmus wrote: > >I think the suggested spelling (`*`) is the confusing part. If it were > >to be spelled `from ` instead, it would be less confusing. > > Are you suggesting this spelling just for generator > comprehensions, or for list comprehensions as well? > What about dict comprehensions? For both generator, list and set comprehensions it makes sense, I think. For dict comprehensions: not so much. That in itself is already sign enough that probably the */** spelling would make more sense, while also allowing the `yield *foo` alternative to `yield from foo`. But what would be the meaning of `yield **foo`? Would that be `yield *foo.items()`? I have no idea. From songofacandy at gmail.com Fri Oct 14 03:40:12 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 14 Oct 2016 16:40:12 +0900 Subject: [Python-ideas] Show more info when `python -vV` Message-ID: When reporting issue to some project and want to include python version in the report, python -V shows very limited information. $ ./python.exe -V Python 3.6.0b2+ sys.version is more usable, but it requires one liner. $ ./python.exe -c 'import sys; print(sys.version)' 3.6.0b2+ (3.6:86a1905ea28d+, Oct 13 2016, 17:58:37) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] How about `python -vV` shows sys.version? perl -V is very verbose and it's helpful to be included in bug report. Some of them are useful and worth enough to include in `python -vV`. 
$ perl -V Summary of my perl5 (revision 5 version 18 subversion 2) configuration: Platform: osname=darwin, osvers=15.0, archname=darwin-thread-multi-2level uname='darwin osx219.apple.com 15.0 darwin kernel version 15.0.0: fri may 22 22:03:51 pdt 2015; root:xnu-3216.0.0.1.11~1development_x86_64 x86_64 ' config_args='-ds -e -Dprefix=/usr -Dccflags=-g -pipe -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=cc' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-arch i386 -arch x86_64 -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -fstack-protector', optimize='-Os', cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -fstack-protector' ccversion='', gccversion='4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc -mmacosx-version-min=10.11.3', ldflags ='-arch i386 -arch x86_64 -fstack-protector' libpth=/usr/lib /usr/local/lib libs= perllibs= libc=, so=dylib, useshrplib=true, libperl=libperl.dylib gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-arch i386 -arch x86_64 -bundle -undefined dynamic_lookup -fstack-protector' Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_DONT_CREATE_GVSV PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF USE_REENTRANT_API Locally applied patches: /Library/Perl/Updates/ comes before system perl directories installprivlib and installarchlib points to the Updates directory Built under darwin Compiled at Aug 11 2015 04:22:26 @INC: /Library/Perl/5.18/darwin-thread-multi-2level /Library/Perl/5.18 /Network/Library/Perl/5.18/darwin-thread-multi-2level /Network/Library/Perl/5.18 /Library/Perl/Updates/5.18.2 /System/Library/Perl/5.18/darwin-thread-multi-2level /System/Library/Perl/5.18 /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level /System/Library/Perl/Extras/5.18 . -- INADA Naoki From mikhailwas at gmail.com Fri Oct 14 03:53:07 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Fri, 14 Oct 2016 09:53:07 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <20161013142551.GZ22471@ando.pearwood.info> Message-ID: On 13 October 2016 at 16:50, Chris Angelico wrote: > On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano wrote: >> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote: >>> and in long perspective when the world's alphabetical garbage will >>> dissapear, two digits would be ok. >> Talking about "alphabetical garbage" like that makes you seem to be an >> ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even >> 7-bit ASCII has more than 100 characters (128). This is sort of rude. Are you from unicode consortium? 
> Solution: Abolish most of the control characters. Let's define a brand > new character encoding with no "alphabetical garbage". These > characters will be sufficient for everyone: > > * [2] Formatting characters: space, newline. Everything else can go. > * [8] Digits: 01234567 > * [26] Lower case Latin letters a-z > * [2] Vital social media characters: # (now officially called "HASHTAG"), @ > * [2] Can't-type-URLs-without-them: colon, slash (now called both > "SLASH" and "BACKSLASH") > > That's 40 characters that should cover all the important things anyone > does - namely, Twitter, Facebook, and email. We don't need punctuation > or capitalization, as they're dying arts and just make you look > pretentious. I might have missed a few critical characters, but it > should be possible to fit it all within 64, which you can then > represent using two digits from our newly-restricted set; octal is > better than decimal, as it needs less symbols. (Oh, sorry, so that's > actually "50" characters, of which "32" are the letters. And we can > use up to "100" and still fit within two digits.) > > Is this the wrong approach, Mikhail? This is sort of correct approach. We do need punctuation however. And one does not need of course to make it too tight. So 8-bit units for text is excellent and enough space left for experiments. > Perhaps we should go the other > way, then, and be *inclusive* of people who speak other languages. What keeps people from using same characters? I will tell you what - it is local law. If you go to school you *have* to write in what is prescribed by big daddy. If youre in europe or America, you are more lucky. And if you're in China you'll be punished if you want some freedom. So like it or not, learn hieroglyphs and become visually impaired in age of 18. > Thanks to Unicode's rich collection of characters, we can represent > multiple languages in a single document; Can do it without unicode in 8-bit boundaries with tagged text, just need fonts for your language, of course if your local charset is less than 256 letters. This is how it was before unicode I suppose. BTW I don't get it still what such revolutionary advantages has unicode compared to tagged text. > script, but have different characters. Alphabetical garbage, or > accurate representations of sounds and words in those languages? Accurate with some 50 characters is more than enough. Mikhail From mistersheik at gmail.com Fri Oct 14 03:51:18 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 14 Oct 2016 07:51:18 +0000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161014073311.GG13170@sjoerdjob.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <580075D4.9050807@canterbury.ac.nz> <20161014073311.GG13170@sjoerdjob.com> Message-ID: Here's an interesting idea regarding yield **x: Right now a function containing any yield returns a generator. Therefore, it works like a generator expression, which is the lazy version of a list display. lists can only contain elements x and unpackings *x. Therefore, it would make sense to only have "yield x" and "yield *xs" (currently spelled "yield from xs") If one day, there was a motivation to provide a lazy version of a dict display, then such a function would probably have "yield key: value" or "yield **d". Such a lazy dictionary is the map stage of the famous mapreduce algorithm. 
It might not make sense in single processor python, but it might in distributed Python. Best, Neil On Fri, Oct 14, 2016 at 3:34 AM Sjoerd Job Postmus wrote: > On Fri, Oct 14, 2016 at 07:06:12PM +1300, Greg Ewing wrote: > > Sjoerd Job Postmus wrote: > > >I think the suggested spelling (`*`) is the confusing part. If it were > > >to be spelled `from ` instead, it would be less confusing. > > > > Are you suggesting this spelling just for generator > > comprehensions, or for list comprehensions as well? > > What about dict comprehensions? > > For both generator, list and set comprehensions it makes sense, I think. > For dict comprehensions: not so much. That in itself is already sign > enough that probably the */** spelling would make more sense, while also > allowing the `yield *foo` alternative to `yield from foo`. But what > would be the meaning of `yield **foo`? Would that be `yield > *foo.items()`? I have no idea. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/ROYNN7a5VAc/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Oct 14 04:18:27 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 14 Oct 2016 19:18:27 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <20161013142551.GZ22471@ando.pearwood.info> Message-ID: On Fri, Oct 14, 2016 at 6:53 PM, Mikhail V wrote: > On 13 October 2016 at 16:50, Chris Angelico wrote: >> On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano wrote: >>> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote: >>>> and in long perspective when the world's alphabetical garbage will >>>> dissapear, two digits would be ok. >>> Talking about "alphabetical garbage" like that makes you seem to be an >>> ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even >>> 7-bit ASCII has more than 100 characters (128). > > This is sort of rude. Are you from unicode consortium? No, he's not. He just knows a thing or two. >> Solution: Abolish most of the control characters. Let's define a brand >> new character encoding with no "alphabetical garbage". These >> characters will be sufficient for everyone: >> >> * [2] Formatting characters: space, newline. Everything else can go. >> * [8] Digits: 01234567 >> * [26] Lower case Latin letters a-z >> * [2] Vital social media characters: # (now officially called "HASHTAG"), @ >> * [2] Can't-type-URLs-without-them: colon, slash (now called both >> "SLASH" and "BACKSLASH") >> >> That's 40 characters that should cover all the important things anyone >> does - namely, Twitter, Facebook, and email. We don't need punctuation >> or capitalization, as they're dying arts and just make you look >> pretentious. 
I might have missed a few critical characters, but it >> should be possible to fit it all within 64, which you can then >> represent using two digits from our newly-restricted set; octal is >> better than decimal, as it needs less symbols. (Oh, sorry, so that's >> actually "50" characters, of which "32" are the letters. And we can >> use up to "100" and still fit within two digits.) >> >> Is this the wrong approach, Mikhail? > > This is sort of correct approach. We do need punctuation however. > And one does not need of course to make it too tight. > So 8-bit units for text is excellent and enough space left for experiments. ... okay. I'm done arguing. Go do some translation work some time. Here, have a read of some stuff I've written before. http://rosuav.blogspot.com/2016/09/case-sensitivity-matters.html http://rosuav.blogspot.com/2015/03/file-systems-case-insensitivity-is.html http://rosuav.blogspot.com/2014/12/unicode-makes-life-easy.html >> Perhaps we should go the other >> way, then, and be *inclusive* of people who speak other languages. > > What keeps people from using same characters? > I will tell you what - it is local law. If you go to school you *have* to > write in what is prescribed by big daddy. If youre in europe or America, you are > more lucky. And if you're in China you'll be punished if you > want some freedom. So like it or not, learn hieroglyphs > and become visually impaired in age of 18. Never mind about China and its political problems. All you need to do is move around Europe for a bit and see how there are more sounds than can be usefully represented. Turkish has a simple system wherein the written and spoken forms have direct correspondence, which means they need to distinguish eight fundamental vowels. How are you going to spell those? Scandinavian languages make use of letters like "?" (called "A with ring" in English, but identified by its sound in Norwegian, same as our letters are - pronounced "Aww" or "Or" or "Au" or thereabouts). To adequately represent both Turkish and Norwegian in the same document, you *need* more letters than our 26. >> Thanks to Unicode's rich collection of characters, we can represent >> multiple languages in a single document; > > Can do it without unicode in 8-bit boundaries with tagged text, > just need fonts for your language, of course if your > local charset is less than 256 letters. No, you can't. Also, you shouldn't. It makes virtually every text operation impossible: you can't split and rejoin text without tracking the encodings. Go try to write a text editor under your scheme and see how hard it is. > This is how it was before unicode I suppose. > BTW I don't get it still what such revolutionary > advantages has unicode compared to tagged text. It's not tagged. That's the huge advantage. >> script, but have different characters. Alphabetical garbage, or >> accurate representations of sounds and words in those languages? > > Accurate with some 50 characters is more than enough. Go build a chat room or something. Invite people to enter their names. Now make sure you're courteous enough to display those names to people. Try doing that without Unicode. I'm done. None of this belongs on python-ideas - it's getting pretty off-topic even for python-list, and you're talking about modifying Python 2.7 which is a total non-starter anyway. 
ChrisA From cory at lukasa.co.uk Fri Oct 14 04:18:29 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Fri, 14 Oct 2016 09:18:29 +0100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <20161013142551.GZ22471@ando.pearwood.info> Message-ID: <239B978B-4829-477F-9236-812FD9051595@lukasa.co.uk> > On 14 Oct 2016, at 08:53, Mikhail V wrote: > > What keeps people from using same characters? > I will tell you what - it is local law. If you go to school you *have* to > write in what is prescribed by big daddy. If youre in europe or America, you are > more lucky. And if you're in China you'll be punished if you > want some freedom. So like it or not, learn hieroglyphs > and become visually impaired in age of 18. So you know, for the future, I think this comment is going to be the one that causes most of the people who were left to disengage with this discussion. The many glyphs that exist for writing various human languages are not inefficiency to be optimised away. Further, I should note that most places to not legislate about what character sets are acceptable to transcribe their languages. Indeed, plenty of non-romance-language-speakers have found ways to transcribe their languages of choice into the limited 8-bit character sets that the Anglophone world propagated: take a look at Arabish for the best kind of example of this behaviour, where "???? ???? ??? ???????? ?? ?????????" will get rendered as "el gaw 3amel eh elnaharda f eskendereya?? But I think you?re in a tiny minority of people who believe that all languages should be rendered in the same script. I can think of only two reasons to argue for this: 1. Dealing with lots of scripts is technologically tricky and it would be better if we didn?t bother. This is the anti-Unicode argument, and it?s a weak argument, though it has the advantage of being internally consistent. 2. There is some genuine harm caused by learning non-ASCII scripts. Your paragraph suggest that you really believe that learning to write in Kanji (logographic system) as opposed to Katagana (alphabetic system with 48 non-punctuation characters) somehow leads to active harm (your phrase was ?become visually impaired?). I?m afraid that you?re really going to need to provide one hell of a citation for that, because that?s quite an extraordinary claim. Otherwise, I?m afraid I have to say ????????. Cory From rosuav at gmail.com Fri Oct 14 04:26:21 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 14 Oct 2016 19:26:21 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <239B978B-4829-477F-9236-812FD9051595@lukasa.co.uk> References: <57FEAF9F.5020103@egenix.com> <20161013142551.GZ22471@ando.pearwood.info> <239B978B-4829-477F-9236-812FD9051595@lukasa.co.uk> Message-ID: On Fri, Oct 14, 2016 at 7:18 PM, Cory Benfield wrote: > The many glyphs that exist for writing various human languages are not inefficiency to be optimised away. Further, I should note that most places to not legislate about what character sets are acceptable to transcribe their languages. Indeed, plenty of non-romance-language-speakers have found ways to transcribe their languages of choice into the limited 8-bit character sets that the Anglophone world propagated: take a look at Arabish for the best kind of example of this behaviour, where "???? ???? ??? ???????? ?? ?????????" will get rendered as "el gaw 3amel eh elnaharda f eskendereya?? 
> I've worked with transliterations enough to have built myself a dedicated translit tool. It's pretty straight-forward to come up with something you can type on a US-English keyboard (eg "a\o" for "?", and "d\-" for "?"), and in some cases, it helps with visual/audio synchronization, but nobody would ever claim that it's the best way to represent that language. https://github.com/Rosuav/LetItTrans/blob/master/25%20languages.srt > But I think you?re in a tiny minority of people who believe that all languages should be rendered in the same script. I can think of only two reasons to argue for this: > > 1. Dealing with lots of scripts is technologically tricky and it would be better if we didn?t bother. This is the anti-Unicode argument, and it?s a weak argument, though it has the advantage of being internally consistent. > 2. There is some genuine harm caused by learning non-ASCII scripts. #1 does carry a decent bit of weight, but only if you start with the assumption that "characters are bytes". If you once shed that assumption (and the related assumption that "characters are 16-bit numbers"), the only weight it carries is "right-to-left text is hard"... and let's face it, that *is* hard, but there are far, far harder problems in computing. Oh wait. Naming things. In Hebrew. That's hard. ChrisA From storchaka at gmail.com Fri Oct 14 04:26:20 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 14 Oct 2016 11:26:20 +0300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <20161013142551.GZ22471@ando.pearwood.info> Message-ID: On 13.10.16 17:50, Chris Angelico wrote: > Solution: Abolish most of the control characters. Let's define a brand > new character encoding with no "alphabetical garbage". These > characters will be sufficient for everyone: > > * [2] Formatting characters: space, newline. Everything else can go. > * [8] Digits: 01234567 > * [26] Lower case Latin letters a-z > * [2] Vital social media characters: # (now officially called "HASHTAG"), @ > * [2] Can't-type-URLs-without-them: colon, slash (now called both > "SLASH" and "BACKSLASH") > > That's 40 characters that should cover all the important things anyone > does - namely, Twitter, Facebook, and email. We don't need punctuation > or capitalization, as they're dying arts and just make you look > pretentious. https://en.wikipedia.org/wiki/DEC_Radix-50 From desmoulinmichel at gmail.com Fri Oct 14 05:16:01 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Fri, 14 Oct 2016 11:16:01 +0200 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> Message-ID: Regarding all those examples: Le 14/10/2016 ? 00:08, ????? a ?crit : > Trying to restate the proposal, somewhat more formal following Random832 > and Paul's suggestion. > > I only speak about the single star. > --- > > *The suggested change of syntax:* > > comprehension ::= starred_expression comp_for > > *Semantics:* > > (In the following, f(x) must always evaluate to an iterable) > > 1. List comprehension: > > result = [*f(x) for x in iterable if cond] > > Translates to > > result = [] > for x in iterable: > if cond: > result.extend(f(x)) > > 2. 
Set comprehension:
>
>     result = {*f(x) for x in iterable if cond}
>
> Translates to
>
>     result = set()
>     for x in iterable:
>         if cond:
>             result.update(f(x))

Please note that we already have a way to do those. E.G:

result = [*f(x) for x in iterable if cond]

can currently be expressed as:

 >>> iterable = range(10)
 >>> f = lambda x: [x] * x
 >>> [y for x in iterable if x % 2 == 0 for y in f(x)]
[2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8]


Now I do like the new extension syntax. I find it more natural, and more
readable:

 >>> [*f(x) for x in iterable if x % 2 == 0]

But it's not a missing feature, it's really just a (rather nice)
syntactic improvement.

From elazarg at gmail.com  Fri Oct 14 05:27:28 2016
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Fri, 14 Oct 2016 09:27:28 +0000
Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list
 comprehension
In-Reply-To:
References: <20161012154224.GT22471@ando.pearwood.info>
 <20161013165546.GB22471@ando.pearwood.info>
 <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com>
 <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com>
 <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com>
Message-ID:

On Fri, 14 Oct 2016 at 12:19, Michel Desmoulin <desmoulinmichel at gmail.com> wrote:

> Regarding all those examples:
>
> On 14/10/2016 at 00:08, Elazar wrote:
> > Trying to restate the proposal, somewhat more formally, following Random832
> > and Paul's suggestion.
> >
> > I only speak about the single star.
> > ---
> >
> > *The suggested change of syntax:*
> >
> >     comprehension ::= starred_expression comp_for
> >
> > *Semantics:*
> >
> > (In the following, f(x) must always evaluate to an iterable)
> >
> > 1. List comprehension:
> >
> >     result = [*f(x) for x in iterable if cond]
> >
> > Translates to
> >
> >     result = []
> >     for x in iterable:
> >         if cond:
> >             result.extend(f(x))
> >
> > 2. Set comprehension:
> >
> >     result = {*f(x) for x in iterable if cond}
> >
> > Translates to
> >
> >     result = set()
> >     for x in iterable:
> >         if cond:
> >             result.update(f(x))
>
> Please note that we already have a way to do those. E.G:
>
> result = [*f(x) for x in iterable if cond]
>
> can currently be expressed as:
>
> >>> iterable = range(10)
> >>> f = lambda x: [x] * x
> >>> [y for x in iterable if x % 2 == 0 for y in f(x)]
> [2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8]
>
>
> Now I do like the new extension syntax. I find it more natural, and more
> readable:
>
> >>> [*f(x) for x in iterable if x % 2 == 0]
>
> But it's not a missing feature, it's really just a (rather nice)
> syntactic improvement.
>

It is about lifting restrictions from an existing syntax. That this
behavior is being *explicitly disabled* in the implementation is strong
evidence, in my mind. (There are more restrictions, but I was asked not to
divert this thread with them, which makes sense.)

Elazar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From greg.ewing at canterbury.ac.nz Fri Oct 14 05:36:35 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Oct 2016 22:36:35 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> Message-ID: <5800A723.9050806@canterbury.ac.nz> Mikhail V wrote: > if "\u1230" <= c <= "\u123f": > > and: > > o = ord (c) > if 100 <= o <= 150: Note that, if need be, you could also write that as if 0x64 <= o <= 0x96: > So yours is a valid code but for me its freaky, > and surely I stick to the second variant. The thing is, where did you get those numbers from in the first place? If you got them in some way that gives them to you in decimal, such as print(ord(c)), there is nothing to stop you from writing them as decimal constants in the code. But if you got them e.g. by looking up a character table that gives them to you in hex, you can equally well put them in as hex constants. So there is no particular advantage either way. > You said, I can better see in which unicode page > I am by looking at hex ordinal, but I hardly > need it, I just need to know one integer, namely > where some range begins, that's it. > Furthermore this is the code which would an average > programmer better read and maintain. To a maintainer who is familiar with the layout of the unicode code space, the hex representation of a character is likely to have some meaning, whereas the decimal representation will not. So for that person, using decimal would make the code *harder* to maintain. To a maintainer who doesn't have that familiarity, it makes no difference either way. So your proposal would result in a *decrease* of maintainability overall. > if I make a mistake, typo, or want to expand the range > by some value I need to make summ and substract > operation in my head to progress with my code effectively. > Is it clear now what I mean by > conversions back and forth? Yes, but in my experience the number of times I've had to do that kind of arithmetic with character codes is very nearly zero. And when I do, I'm more likely to get the computer to do it for me than work out the numbers and then type them in as literals. I just don't see this as being anywhere near being a significant problem. > In standard ASCII > there are enough glyphs that would work way better > together, Out of curiosity, what glyphs do you have in mind? > ??-? ---- ---- ---? > > you can downscale the strings, so a 16-bit > value would be ~60 pixels wide Yes, you can make the characters narrow enough that you can take 4 of them in at once, almost as though they were a single glyph... at which point you've effectively just substituted one set of 16 glyphs for another. Then you'd have to analyse whether the *combined* 4-element glyphs were easier to disinguish from each other than the ones they replaced. Since the new ones are made up of repetitions of just two elements, whereas the old ones contain a much more varied set of elements, I'd be skeptical about that. BTW, your choice of ? because of its "peak readibility" seems to be a case of taking something out of context. The readability of a glyph can only be judged in terms of how easy it is to distinguish from other glyphs. Here, the only thing that matters is distinguishing it from the other symbol, so something like "|" would perhaps be a better choice. 
||-| ---- ---- ---| > So if you are more > than 40 years old (sorry for some familiarity) > this can be really strong issue and unfortunately > hardly changeable. Sure, being familiar with the current system means that it would take me some effort to become proficient with a new one. What I'm far from convinced of is that I would gain any benefit from making that effort, or that a fresh person would be noticeably better off if they learned your new system instead of the old one. At this point you're probably going to say "Greg, it's taken you 40 years to become that proficient in hex. Someone learning my system would do it much faster!" Well, no. When I was about 12 I built a computer whose only I/O devices worked in binary. From the time I first started toggling programs into it to the time I had the whole binary/hex conversion table burned into my neurons was maybe about 1 hour. And I wasn't even *trying* to memorise it, it just happened. > It is not about speed, it is about brain load. > Chinese can read their hieroglyphs fast, but > the cognition load on the brain is 100 times higher > than current latin set. Has that been measured? How? This one sets off my skepticism alarm too, because people that read Latin scripts don't read them a letter at a time -- they recognise whole *words* at once, or at least large chunks of them. The number of English words is about the same order of magnitude as the number of Chinese characters. > I know people who can read bash scripts > fast, but would you claim that bash syntax can be > any good compared to Python syntax? For the things that bash was designed to be good for, yes, it can. Python wins for anything beyond very simple programming, but bash wasn't designed for that. (The fact that some people use it that way says more about their dogged persistence in the face of adversity than it does about bash.) I don't doubt that some sets of glyphs are easier to distinguish from each other than others. But the letters and digits that we currently use have already been pretty well optimised by scribes and typographers over the last few hundred years, and I'd be surprised if there's any *major* room left for improvement. Mixing up letters and digits is certainly jarring to many people, but I'm not sure that isn't largely just because we're so used to mentally categorising them into two distinct groups. Maybe there is some objective difference that can be measured, but I'd expect it to be quite small compared to the effect of these prior "habits" as you call them. -- Greg From rosuav at gmail.com Fri Oct 14 05:41:27 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 14 Oct 2016 20:41:27 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <5800A723.9050806@canterbury.ac.nz> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> Message-ID: On Fri, Oct 14, 2016 at 8:36 PM, Greg Ewing wrote: >> I know people who can read bash scripts >> fast, but would you claim that bash syntax can be >> any good compared to Python syntax? > > > For the things that bash was designed to be good for, > yes, it can. Python wins for anything beyond very > simple programming, but bash wasn't designed for that. > (The fact that some people use it that way says more > about their dogged persistence in the face of > adversity than it does about bash.) 
And any time I look at a large and complex bash script and say "this needs to be a Python script" or "this would be better done in Pike" or whatever, I end up missing the convenient syntax of piping one thing into another. Shell scripting languages are the undisputed kings of process management. ChrisA From p.f.moore at gmail.com Fri Oct 14 05:48:52 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 14 Oct 2016 10:48:52 +0100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <5800810F.5080200@canterbury.ac.nz> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <5800810F.5080200@canterbury.ac.nz> Message-ID: On 14 October 2016 at 07:54, Greg Ewing wrote: >> I think it's probably time for someone to >> describe the precise syntax (as BNF, like the syntax in the Python >> docs at >> https://docs.python.org/3.6/reference/expressions.html#displays-for-lists-sets-and-dictionaries > > > Replace > > comprehension ::= expression comp_for > > with > > comprehension ::= (expression | "*" expression) comp_for > >> and semantics (as an explanation of how to >> rewrite any syntactically valid display as a loop). > > > The expansion of the "*" case is the same as currently except > that 'append' is replaced by 'extend' in a list comprehension, > 'yield' is replaced by 'yield from' in a generator > comprehension. Thanks. That does indeed clarify. Part of my confusion was that I'm sure I'd seen someone give an example along the lines of [(x, *y, z) for ...] which *doesn't* conform to the above syntax. OTOH, it is currently valid syntax, just not an example of *this* proposal (that's part of why all this got very confusing). So now I understand what's being proposed, which is good. I don't (personally) find it very intuitive, although I'm completely capable of using the rules given to establish what it means. In practical terms, I'd be unlikely to use or recommend it - not because of anything specific about the proposal, just because it's "confusing". I would say the same about [(x, *y, z) for ...]. IMO, combining unpacking and list (or other types of) comprehensions leads to obfuscated code. Each feature is fine in isolation, but over-enthusiastic use of the ability to combine them harms readability. So I'm now -0 on this proposal. There's nothing *wrong* with it, and I now see how it can be justified as a generalisation of current rules. But I can't think of any real-world case where using the proposed syntax would measurably improve code maintainability or comprehensibility. Paul From random832 at fastmail.com Fri Oct 14 07:56:29 2016 From: random832 at fastmail.com (Random832) Date: Fri, 14 Oct 2016 07:56:29 -0400 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <20161014055448.GI22471@ando.pearwood.info> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <20161014055448.GI22471@ando.pearwood.info> Message-ID: <1476446189.3360709.755895993.1AF2087A@webmail.messagingengine.com> On Fri, Oct 14, 2016, at 01:54, Steven D'Aprano wrote: > Good luck with that last one. Even if you could convince the Chinese and > Japanese to swap to ASCII, I'd like to see you pry the emoji out of the > young folk's phones. 
This is actually probably the one part of this proposal that *is* feasible. While encoding emoji as a single character each makes sense for a culture that already uses thousands of characters; before they existed the English-speaking software industry already had several competing "standards" emerging for encoding them as sequences of ASCII characters. From rosuav at gmail.com Fri Oct 14 08:19:23 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 14 Oct 2016 23:19:23 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <1476446189.3360709.755895993.1AF2087A@webmail.messagingengine.com> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <20161014055448.GI22471@ando.pearwood.info> <1476446189.3360709.755895993.1AF2087A@webmail.messagingengine.com> Message-ID: On Fri, Oct 14, 2016 at 10:56 PM, Random832 wrote: > On Fri, Oct 14, 2016, at 01:54, Steven D'Aprano wrote: >> Good luck with that last one. Even if you could convince the Chinese and >> Japanese to swap to ASCII, I'd like to see you pry the emoji out of the >> young folk's phones. > > This is actually probably the one part of this proposal that *is* > feasible. While encoding emoji as a single character each makes sense > for a culture that already uses thousands of characters; before they > existed the English-speaking software industry already had several > competing "standards" emerging for encoding them as sequences of ASCII > characters. :-) ChrisA From gjcarneiro at gmail.com Fri Oct 14 09:14:05 2016 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 14 Oct 2016 14:14:05 +0100 Subject: [Python-ideas] PEP 505 -- None-aware operators Message-ID: Sorry if I missed the boat, but only just now saw this PEP. Glancing through the PEP, I don't see mentioned anywhere the SQL alternative of having a coalesce() function: https://www.postgresql.org/docs/9.6/static/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL In Python, something like this: def coalesce(*args): for arg in args: if arg is not None: return arg return None Just drop it into builtins, and voila. No need for lengthy discussions about which operator to use because IMHO it needs no operator. Sure, it's not as sexy as a fancy new operator, nor as headline grabbing, but it is pretty useful. Just my 2p. -- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Fri Oct 14 09:19:37 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Fri, 14 Oct 2016 13:19:37 +0000 Subject: [Python-ideas] PEP 505 -- None-aware operators In-Reply-To: References: Message-ID: On Fri, Oct 14, 2016 at 4:14 PM Gustavo Carneiro wrote: > Sorry if I missed the boat, but only just now saw this PEP. > > Glancing through the PEP, I don't see mentioned anywhere the SQL > alternative of having a coalesce() function: > https://www.postgresql.org/docs/9.6/static/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL > > In Python, something like this: > > def coalesce(*args): > for arg in args: > if arg is not None: > return arg > return None > > Just drop it into builtins, and voila. No need for lengthy discussions > about which operator to use because IMHO it needs no operator. > > Sure, it's not as sexy as a fancy new operator, nor as headline grabbing, > but it is pretty useful. 
> > This has the downside of not being short-circuit - arguments to the function are evaluated eagerly. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From gjcarneiro at gmail.com Fri Oct 14 09:37:42 2016 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 14 Oct 2016 14:37:42 +0100 Subject: [Python-ideas] PEP 505 -- None-aware operators In-Reply-To: References: Message-ID: On 14 October 2016 at 14:19, ????? wrote: > On Fri, Oct 14, 2016 at 4:14 PM Gustavo Carneiro > wrote: > >> Sorry if I missed the boat, but only just now saw this PEP. >> >> Glancing through the PEP, I don't see mentioned anywhere the SQL >> alternative of having a coalesce() function: https://www. >> postgresql.org/docs/9.6/static/functions-conditional. >> html#FUNCTIONS-COALESCE-NVL-IFNULL >> >> In Python, something like this: >> >> def coalesce(*args): >> for arg in args: >> if arg is not None: >> return arg >> return None >> >> Just drop it into builtins, and voila. No need for lengthy discussions >> about which operator to use because IMHO it needs no operator. >> >> Sure, it's not as sexy as a fancy new operator, nor as headline grabbing, >> but it is pretty useful. >> >> > This has the downside of not being short-circuit - arguments to the > function are evaluated eagerly. > I see. short-circuiting is nice to have, sure. But even without it, it's still useful IMHO. If you are worried about not evaluating an argument, then you can just do a normal if statement instead, for the rare cases where this is important: result = arg1 if result is None: result = compute_something() At the very least I would suggest mentioning a simple coalesce() function in the alternatives section of the PEP. coalesce function: Pros: 1. Familiarity, similar to existing function in SQL; 2. No new operator required; Cons: 1. Doesn't short-circuit the expressions; 2. Slightly more verbose than an operator; Thanks. -- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Fri Oct 14 09:46:04 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 14 Oct 2016 09:46:04 -0400 Subject: [Python-ideas] PEP 505 -- None-aware operators In-Reply-To: References: Message-ID: On Oct 14, 2016 9:14 AM, "Gustavo Carneiro" wrote: > > Sorry if I missed the boat, but only just now saw this PEP. > > Glancing through the PEP, I don't see mentioned anywhere the SQL alternative of having a coalesce() function: https://www.postgresql.org/docs/9.6/static/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL > > In Python, something like this: > > def coalesce(*args): > for arg in args: > if arg is not None: > return arg > return None > > Just drop it into builtins, and voila. No need for lengthy discussions about which operator to use because IMHO it needs no operator. > > Sure, it's not as sexy as a fancy new operator, nor as headline grabbing, but it is pretty useful. That function is for a different purpose. It selects the first non-null value from a flat collection. The coalesce operators are for traveling down a reference tree, and shortcutting out without an exception if the path ends. For example: return x?.a?.b?.c instead of: if x is None: return None if x.a is None: return None if x.a.b is None: return None return x.a.b.c You can use try-catch, but you might catch an irrelevant exception. 
try: return x.a.b.c except AttributeError: return None If `x` is an int, `x.a` will throw an AttributeError even though `x` is not None. A function for the above case is: def coalesce(obj, *names): for name in names: if obj is None: return None obj = getattr(obj, name) return obj return coalesce(x, 'a', 'b', 'c') See this section for some examples: https://www.python.org/dev/peps/pep-0505/#behavior-in-other-languages (The PEP might need more simple examples. The Motivating Examples are full chunks of code from real libraries, so they're full of distractions.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Fri Oct 14 10:23:41 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 14 Oct 2016 10:23:41 -0400 Subject: [Python-ideas] PEP 505 -- None-aware operators In-Reply-To: References: Message-ID: My mistake. You're talking about the ?? operator, and I'm thinking about the null-aware operators. You say short-circuiting would be nice to have, but short-circuiting is what people want it for. As for using `if-else`, that's listed as an alternative here: https://www.python.org/dev/peps/pep-0505/#ternary-operator The coalesce operator has the semantic advantage ("expressiveness"?): you are saying what you want to do, rather than how you do it. Making a function is one way to get semantic advantage, but you can't do that if you want short-circuiting. The question on the table is whether the semantic advantage is worth the cost of a new operator. That's a value question, so it's not gonna be easy to answer it with objective observations. (Not that I'm suggesting anything, but some languages have custom short-circuiting, via macros or lazy arg evalation. That's be using a missile to hammer in a nail.) On Oct 14, 2016 9:46 AM, "Franklin? Lee" wrote: > > On Oct 14, 2016 9:14 AM, "Gustavo Carneiro" wrote: > > > > Sorry if I missed the boat, but only just now saw this PEP. > > > > Glancing through the PEP, I don't see mentioned anywhere the SQL alternative of having a coalesce() function: https://www.postgresql.org/docs/9.6/static/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL > > > > In Python, something like this: > > > > def coalesce(*args): > > for arg in args: > > if arg is not None: > > return arg > > return None > > > > Just drop it into builtins, and voila. No need for lengthy discussions about which operator to use because IMHO it needs no operator. > > > > Sure, it's not as sexy as a fancy new operator, nor as headline grabbing, but it is pretty useful. > > That function is for a different purpose. It selects the first non-null value from a flat collection. The coalesce operators are for traveling down a reference tree, and shortcutting out without an exception if the path ends. > > For example: > return x?.a?.b?.c > instead of: > if x is None: return None > if x.a is None: return None > if x.a.b is None: return None > return x.a.b.c > > You can use try-catch, but you might catch an irrelevant exception. > try: > return x.a.b.c > except AttributeError: > return None > If `x` is an int, `x.a` will throw an AttributeError even though `x` is not None. > > A function for the above case is: > def coalesce(obj, *names): > for name in names: > if obj is None: > return None > obj = getattr(obj, name) > return obj > > return coalesce(x, 'a', 'b', 'c') > > See this section for some examples: > https://www.python.org/dev/peps/pep-0505/#behavior-in-other-languages > > (The PEP might need more simple examples. 
The Motivating Examples are full chunks of code from real libraries, so they're full of distractions.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From gjcarneiro at gmail.com Fri Oct 14 10:24:34 2016 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 14 Oct 2016 15:24:34 +0100 Subject: [Python-ideas] PEP 505 -- None-aware operators In-Reply-To: References: Message-ID: On 14 October 2016 at 14:46, Franklin? Lee wrote: > On Oct 14, 2016 9:14 AM, "Gustavo Carneiro" wrote: > > > > Sorry if I missed the boat, but only just now saw this PEP. > > > > Glancing through the PEP, I don't see mentioned anywhere the SQL > alternative of having a coalesce() function: https://www. > postgresql.org/docs/9.6/static/functions-conditional. > html#FUNCTIONS-COALESCE-NVL-IFNULL > > > > In Python, something like this: > > > > def coalesce(*args): > > for arg in args: > > if arg is not None: > > return arg > > return None > > > > Just drop it into builtins, and voila. No need for lengthy discussions > about which operator to use because IMHO it needs no operator. > > > > Sure, it's not as sexy as a fancy new operator, nor as headline > grabbing, but it is pretty useful. > > That function is for a different purpose. It selects the first non-null > value from a flat collection. The coalesce operators are for traveling down > a reference tree, and shortcutting out without an exception if the path > ends. > > For example: > return x?.a?.b?.c > >From what I can read in the PEP, it attacks 3 different problems at once: 1. The " null -coalescing" operator is a binary operator that returns its > left operand if it is not null . Otherwise it returns its right operand. > 2. The " null -aware member access" operator accesses an instance member > only if that instance is non- null . Otherwise it returns null . (This is > also called a "safe navigation" operator.) > 3. The " null -aware index access" operator accesses an element of a > collection only if that collection is non- null . Otherwise it returns null > . (This is another type of "safe navigation" operator.) I am proposing a coalesce() function as alternative for (solely) problem 1, while you are talking about problem 2. I do believe problems 2 and 3 are interesting too, and of course coalesce() does not work for them, they do need their own operators. Sorry, I was a bit confused by the PEP attacking 3 (related) problems at once. Thanks. -- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... URL: From mehaase at gmail.com Fri Oct 14 10:28:20 2016 From: mehaase at gmail.com (Mark E. Haase) Date: Fri, 14 Oct 2016 10:28:20 -0400 Subject: [Python-ideas] PEP 505 -- None-aware operators In-Reply-To: References: Message-ID: On Fri, Oct 14, 2016 at 9:37 AM, Gustavo Carneiro wrote: > > I see. short-circuiting is nice to have, sure. > > But even without it, it's still useful IMHO. > It's worth mentioning that SQL's COALESCE is usually (always?) short circuiting: https://www.postgresql.org/docs/9.5/static/functions-conditional.html https://msdn.microsoft.com/en-us/library/ms190349.aspx Given the debate about the utility of coalescing and the simplicity of writing the function yourself, I doubt the standard library will accept it. Most people here will tell you that such a utility belongs on PyPI. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Fri Oct 14 10:33:09 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 15 Oct 2016 00:33:09 +1000 Subject: [Python-ideas] Improve error message when missing 'self' in method definition In-Reply-To: <20161013140437.GX22471@ando.pearwood.info> References: <22524.23684.863380.593596@turnbull.sk.tsukuba.ac.jp> <20161013140437.GX22471@ando.pearwood.info> Message-ID: On 14 October 2016 at 00:04, Steven D'Aprano wrote: > Error messages are not part of Python's public API. We should be able to > change error messages at any time, including point releases. > > Nevertheless, we shouldn't abuse that right. If it's only a change to > the error message, and not a functional change, then maybe we can add it > to the next 3.6 beta or rc. But its probably not worth backporting it to > older versions. My working assumptions for this: - students will move to the latest Python relatively quickly* -> changes aimed at newcomers can just go in the next feature release - production systems migrate slowly* -> changes aimed at making obscure failures easier to debug go into maintenance releases Neither is a hard-and-fast rule, but they're my default starting points. Cheers, Nick. *"quickly" and "slowly" are truly relative here - Python 2.6 is still pretty widely supported and used for production services, but if students are learning on anything other than Python 3.5, it's likely to be 2.7. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Oct 14 11:23:41 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 15 Oct 2016 01:23:41 +1000 Subject: [Python-ideas] Add sorted (ordered) containers In-Reply-To: <852a9619-e69f-42fc-bdce-8a98bad5d4cc@googlegroups.com> References: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> <852a9619-e69f-42fc-bdce-8a98bad5d4cc@googlegroups.com> Message-ID: On 14 October 2016 at 06:48, Neil Girdhar wrote: > Related: > > Nick posted an excellent answer to this question here: > http://stackoverflow.com/questions/5953205/why-are-there-no-sorted-containers-in-pythons-standard-libraries Ah, so this thread is why I've been getting SO notifications for that answer :) While I think that was a decent answer for its time (as standardising things too early can inhibit community experimentation - there was almost 12 years between Twisted's first release in 2002 and asyncio's provisional inclusion in the standard library in Python 3.4), I also think the broader context has changed enough that the question may be worth revisiting for Python 3.7 (in particular, the prospect that it may be possible to provide this efficiently without having to add a large new chunk of C code to maintain). However, given that Grant has already been discussing the possibility directly with Raymond as the collections module maintainer though, there's probably not a lot we can add to that discussion here, since the key trade-off is between: - helping folks that actually need a sorted container implementation find one that works well with typical memory architectures in modern CPUs - avoiding confusing folks that *don't* need a sorted container with yet another group of possible data structures to consider in the standard library *That* part of my original SO answer hasn't changed, it's just not as clearcut a decision from a maintainability perspective when we're talking about efficient and relatively easy to explain pure Python implementations. Cheers, Nick. 
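P.S. To give the "relatively easy to explain pure Python implementation"
point a concrete shape for anyone who hasn't looked at this space before,
here's a rough sketch built on the bisect module. This is purely
illustrative - it is not the actual sortedcontainers code, which (as I
understand it) splits the data across bounded sublists to keep inserts
cheap and cache-friendly - but it shows why the basic idea is so
approachable:

    import bisect

    class SimpleSortedList:
        """Illustrative only: a list kept permanently in sorted order."""

        def __init__(self, iterable=()):
            self._items = sorted(iterable)

        def add(self, value):
            # O(log n) to find the insertion point, O(n) to shift items
            bisect.insort(self._items, value)

        def __contains__(self, value):
            i = bisect.bisect_left(self._items, value)
            return i < len(self._items) and self._items[i] == value

        def __getitem__(self, index):
            return self._items[index]

        def __iter__(self):
            return iter(self._items)

        def __len__(self):
            return len(self._items)

Keeping that insertion cost tolerable for large containers is where the
real engineering effort goes, which is exactly the efficiency trade-off
mentioned above.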
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Oct 14 11:34:18 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 15 Oct 2016 01:34:18 +1000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> References: <76da8696-8ea9-0285-f2b7-e012fdd933da@mail.de> <69ede3ff-e130-83a6-9667-99f22a91822b@mail.de> Message-ID: On 13 October 2016 at 02:32, Sven R. Kunze wrote: > Here I disagree with you. We use *args all the time, so we know what * does. > I don't understand why this should not work in between brackets [...]. It does work between brackets: >>> [*range(3)] [0, 1, 2] It doesn't work as part of the comprehension syntax, and that's the case for function calls as well: >>> f(*range(i) for i in range(3)) File "", line 1 f(*range(i) for i in range(3)) ^ SyntaxError: invalid syntax >>> [*range(i) for i in range(3)] File "", line 1 SyntaxError: iterable unpacking cannot be used in comprehension (With the less helpful error message in the function call case just being due to the vagaries of CPython's parser and compiler implementation, where things that don't even parse are just reported as "invalid syntax", while problems detected later don't have the helpful pointer to where in the line parsing failed, but do get a better explanation of what actually went wrong) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Oct 14 11:46:42 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 14 Oct 2016 08:46:42 -0700 Subject: [Python-ideas] PEP 505 -- None-aware operators In-Reply-To: References: Message-ID: On Fri, Oct 14, 2016 at 6:37 AM, Gustavo Carneiro wrote: > I see. short-circuiting is nice to have, sure. No. Short-circuiting is the entire point of the proposed operators. -- --Guido van Rossum (python.org/~guido) From tritium-list at sdamon.com Fri Oct 14 12:09:19 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Fri, 14 Oct 2016 12:09:19 -0400 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: <110a01d22635$523c9570$f6b5c050$@hotmail.com> For all intents and purposes other than debugging C (for cpython, rpython for pypy, java for jython, .NET for IronPython... you get the idea), the extra details are unnecessary to debug most problems. Most of the time it is sufficient to know what major, minor, and patchlevel you are using. You only really need to know the commit hash and compiler if you are sending a bug report about the C... and since you know when you are doing that... I don't think its uncalled for to have the one liner. > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of INADA Naoki > Sent: Friday, October 14, 2016 3:40 AM > To: python-ideas > Subject: [Python-ideas] Show more info when `python -vV` > > When reporting issue to some project and want to include > python version in the report, python -V shows very limited information. > > $ ./python.exe -V > Python 3.6.0b2+ > > sys.version is more usable, but it requires one liner. > > $ ./python.exe -c 'import sys; print(sys.version)' > 3.6.0b2+ (3.6:86a1905ea28d+, Oct 13 2016, 17:58:37) > [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] > > How about `python -vV` shows sys.version? > > > perl -V is very verbose and it's helpful to be included in bug report. 
> Some of them are useful and worth enough to include in `python -vV`. > > $ perl -V > Summary of my perl5 (revision 5 version 18 subversion 2) configuration: > > Platform: > osname=darwin, osvers=15.0, archname=darwin-thread-multi-2level > uname='darwin osx219.apple.com 15.0 darwin kernel version 15.0.0: > fri may 22 22:03:51 pdt 2015; > root:xnu-3216.0.0.1.11~1development_x86_64 x86_64 ' > config_args='-ds -e -Dprefix=/usr -Dccflags=-g -pipe -Dldflags= > -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none > -Dcc=cc' > hint=recommended, useposix=true, d_sigaction=define > useithreads=define, usemultiplicity=define > useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef > use64bitint=define, use64bitall=define, uselongdouble=undef > usemymalloc=n, bincompat5005=undef > Compiler: > cc='cc', ccflags ='-arch i386 -arch x86_64 -g -pipe -fno-common > -DPERL_DARWIN -fno-strict-aliasing -fstack-protector', > optimize='-Os', > cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing > -fstack-protector' > ccversion='', gccversion='4.2.1 Compatible Apple LLVM 7.0.0 > (clang-700.0.59.1)', gccosandvers='' > intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 > d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 > ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', > lseeksize=8 > alignbytes=8, prototype=define > Linker and Libraries: > ld='cc -mmacosx-version-min=10.11.3', ldflags ='-arch i386 -arch > x86_64 -fstack-protector' > libpth=/usr/lib /usr/local/lib > libs= > perllibs= > libc=, so=dylib, useshrplib=true, libperl=libperl.dylib > gnulibc_version='' > Dynamic Linking: > dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' ' > cccdlflags=' ', lddlflags='-arch i386 -arch x86_64 -bundle > -undefined dynamic_lookup -fstack-protector' > > > Characteristics of this binary (from libperl): > Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS > PERL_DONT_CREATE_GVSV > PERL_HASH_FUNC_ONE_AT_A_TIME_HARD > PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP > PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL > USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES > USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE > USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF > USE_REENTRANT_API > Locally applied patches: > /Library/Perl/Updates/ comes before system perl directories > installprivlib and installarchlib points to the Updates directory > Built under darwin > Compiled at Aug 11 2015 04:22:26 > @INC: > /Library/Perl/5.18/darwin-thread-multi-2level > /Library/Perl/5.18 > /Network/Library/Perl/5.18/darwin-thread-multi-2level > /Network/Library/Perl/5.18 > /Library/Perl/Updates/5.18.2 > /System/Library/Perl/5.18/darwin-thread-multi-2level > /System/Library/Perl/5.18 > /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.18 > . > > -- > INADA Naoki > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From guido at python.org Fri Oct 14 12:10:34 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 14 Oct 2016 09:10:34 -0700 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: I actually think the spelling is the main stumbling block. 
The intrinsic value of the behavior is clear, it's finding an acceptable
spelling that holds back the proposal.

I propose that the next phase of the process should be to pick the
best operator for each sub-proposal. Then we can decide which of the
sub-proposals we actually want in the language, based on a combination
of how important the functionality is and how acceptable we find the
spelling.

--Guido

On Thu, Oct 13, 2016 at 8:20 PM, Mark E. Haase wrote:
> (Replying to multiple posts in this thread)
>
> Guido van Rossum:
>>
>> Another problem is PEP 505 -- it
>> is full of discussion but its specification is unreadable due to the
>> author's idea to defer the actual choice of operators and use a
>> strange sequence of unicode characters instead.
>
>
> Hi, I wrote PEP-505. I'm sorry that it's unreadable. The choice of emoji as
> operators was supposed to be a blatant joke. I'd be happy to submit a new
> version that is ASCII. Or make any other changes that would facilitate
> making a decision on the PEP.
>
> As I recall, the thread concluded with Guido writing, "I'll have to think
> about this," or something to that effect. I had hoped that the next step
> could be a survey where we could gauge opinions on the various possible
> spellings. I believe this was how PEP-308 was handled, and that was a very
> similar proposal to this one.
>
> Most of the discussion on list was really centered around the fact that
> nobody liked the proposed ?? or .? spellings, and nobody could see around
> that fact to consider whether the feature itself was intrinsically valuable.
> (This is why the PEP doesn't commit to a syntax.) Also, as an unfortunate side
> effect of a miscommunication, about 95% of the posts on this PEP were
> written _before_ I submitted a complete draft and so most of the
> conversation was arguing about a straw man.
>
> David Mertz:
>>
>> The idea is that we can easily have both "regular" behavior and None
>> coalescing just by wrapping any objects in a utility class... and WITHOUT
>> adding ugly syntax. I might have missed some corners where we would want
>> behavior wrapped, but those shouldn't be that hard to add in principle.
>
>
> The biggest problem with a wrapper in practice is that it has to be
> unwrapped before it can be passed to any other code that doesn't know how to
> handle it. E.g. if you want to JSON encode an object, you need to unwrap all
> of the NullCoalesce objects because the json module wouldn't know what to do
> with them. The process of wrapping and unwrapping makes the resulting code
> more verbose than any existing syntax.
>>
>> How much of the time is a branch of the None check a single fallback value
>> or attribute access versus how often a suite of statements within the
>> not-None branch?
>>
>> I definitely check for None very often also. I'm curious what the
>> breakdown is in code I work with.
>
> There's a script in the PEP-505 repo that can help you identify code
> that could be written with the proposed syntax. (It doesn't identify blocks
> that would not be affected, so this doesn't completely answer your
> question.)
>
> https://github.com/mehaase/pep-0505/blob/master/find-pep505.py
>
> The PEP also includes the results of running this script over the standard
> library.
>
> On Sat, Sep 10, 2016 at 1:26 PM, Guido van Rossum wrote:
>>
>> The way I recall it, we arrived at the perfect syntax (using ?) and
>> semantics. The issue was purely strong hesitation about whether
>> sprinkling ?
all over your code is too ugly for Python, and in the end >> we couldn't get agreement on *that*. Another problem is PEP 505 -- it >> is full of discussion but its specification is unreadable due to the >> author's idea to defer the actual choice of operators and use a >> strange sequence of unicode characters instead. >> >> If someone wants to write a new, *short* PEP that defers to PEP 505 >> for motivation etc. and just writes up the spec for the syntax and >> semantics we'll have a better starting point. IMO the key syntax is >> simply one for accessing attributes returning None instead of raising >> AttributeError, so that e.g. `foo?.bar?.baz` is roughly equivalent to >> `foo.bar.baz if (foo is not None and foo.bar is not None) else None`, >> except evaluating foo and foo.bar only once. >> >> On Sat, Sep 10, 2016 at 10:14 AM, Random832 >> wrote: >> > On Sat, Sep 10, 2016, at 12:48, Stephen J. Turnbull wrote: >> >> I forget if Guido was very sympathetic to null-coalescing operators, >> >> given somebody came up with a good syntax. >> > >> > As I remember the discussion, I thought he'd more or less conceded on >> > the use of ? but there was disagreement on how to implement it that >> > never got resolved. Concerns like, you can't have a?.b return None >> > because then a?.b() isn't callable, unless you want to use a?.b?() for >> > this case, or some people wanted to have "a?" [where a is None] return a >> > magic object whose attribute/call/getitem would give no error, but that >> > would have to keep returning itself and never actually return None for >> > chained operators. >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) From njs at pobox.com Fri Oct 14 12:39:49 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 14 Oct 2016 09:39:49 -0700 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: <110a01d22635$523c9570$f6b5c050$@hotmail.com> References: <110a01d22635$523c9570$f6b5c050$@hotmail.com> Message-ID: On Fri, Oct 14, 2016 at 9:09 AM, wrote: > For all intents and purposes other than debugging C (for cpython, rpython > for pypy, java for jython, .NET for IronPython... you get the idea), the > extra details are unnecessary to debug most problems. Most of the time it > is sufficient to know what major, minor, and patchlevel you are using. You > only really need to know the commit hash and compiler if you are sending a > bug report about the C... and since you know when you are doing that... I > don't think its uncalled for to have the one liner. The compiler information generally reveals the OS as well (if only accidentally), and the OS is often useful information. -n -- Nathaniel J. 
Smith -- https://vorpus.org From gjcarneiro at gmail.com Fri Oct 14 13:50:27 2016 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 14 Oct 2016 18:50:27 +0100 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: For what it's worth, I like the C# syntax with question marks. It is probably more risky (breaks more code) to introduce a new keyword than a new symbol as operator. If we have to pick a symbol, it's less confusing if we pick something another language already uses. There is no shame in copying from other languages. Many of them copy ideas from Python as well ;-) Thanks. On 14 October 2016 at 17:10, Guido van Rossum wrote: > I actually think the spelling is the main stumbling block. The > intrinsic value of the behavior is clear, it's finding an acceptable > spelling that hold back the proposal. > > I propose that the next phase of the process should be to pick the > best operator for each sub-proposal. Then we can decide which of the > sub-proposals we actually want in the language, based on a combination > of how important the functionality is and how acceptable we find the > spelling. > > --Guido > > On Thu, Oct 13, 2016 at 8:20 PM, Mark E. Haase wrote: > > (Replying to multiple posts in this thread) > > > > Guido van Rossum: > >> > >> Another problem is PEP 505 -- it > >> is full of discussion but its specification is unreadable due to the > >> author's idea to defer the actual choice of operators and use a > >> strange sequence of unicode characters instead. > > > > > > Hi, I wrote PEP-505. I'm sorry that it's unreadable. The choice of emoji > as > > operators was supposed to be a blatant joke. I'd be happy to submit a new > > version that is ASCII. Or make any other changes that would facilitate > > making a decision on the PEP. > > > > As I recall, the thread concluded with Guido writing, "I'll have to think > > about this," or something to that effect. I had hoped that the next step > > could be a survey where we could gauge opinions on the various possible > > spellings. I believe this was how PEP-308 was handled, and that was a > very > > similar proposal to this one. > > > > Most of the discussion on list was really centered around the fact that > > nobody like the proposed ?? or .? spellings, and nobody could see around > > that fact to consider whether the feature itself was intrinsically > valuable. > > (This is why the PEP doesn't commit to a syntax.) Also, as unfortunate > side > > effect of a miscommunication, about 95% of the posts on this PEP were > > written _before_ I submitted a complete draft and so most of the > > conversation was arguing about a straw man. > > > > David Mertz: > >> > >> The idea is that we can easily have both "regular" behavior and None > >> coalescing just by wrapping any objects in a utility class... and > WITHOUT > >> adding ugly syntax. I might have missed some corners where we would > want > >> behavior wrapped, but those shouldn't be that hard to add in principle. > > > > > > The biggest problem with a wrapper in practice is that it has to be > > unwrapped before it can be passed to any other code that doesn't know > how to > > handle it. E.g. if you want to JSON encode an object, you need to unwrap > all > > of the NullCoalesce objects because the json module wouldn't know what > to do > > with them. 
The process of wrapping and unwrapping makes the resulting > code > > more verbose than any existing syntax. > >> > >> How much of the time is a branch of the None check a single fallback > value > >> or attribute access versus how often a suite of statements within the > >> not-None branch? > >> > >> I definitely check for None very often also. I'm curious what the > >> breakdown is in code I work with. > > > > There's a script in the PEP-505 repo that can you help you identify code > > that could be written with the proposed syntax. (It doesn't identify > blocks > > that would not be affected, so this doesn't completely answer your > > question.) > > > > https://github.com/mehaase/pep-0505/blob/master/find-pep505.py > > > > The PEP also includes the results of running this script over the > standard > > library. > > > > On Sat, Sep 10, 2016 at 1:26 PM, Guido van Rossum > wrote: > >> > >> The way I recall it, we arrived at the perfect syntax (using ?) and > >> semantics. The issue was purely strong hesitation about whether > >> sprinkling ? all over your code is too ugly for Python, and in the end > >> we couldn't get agreement on *that*. Another problem is PEP 505 -- it > >> is full of discussion but its specification is unreadable due to the > >> author's idea to defer the actual choice of operators and use a > >> strange sequence of unicode characters instead. > >> > >> If someone wants to write a new, *short* PEP that defers to PEP 505 > >> for motivation etc. and just writes up the spec for the syntax and > >> semantics we'll have a better starting point. IMO the key syntax is > >> simply one for accessing attributes returning None instead of raising > >> AttributeError, so that e.g. `foo?.bar?.baz` is roughly equivalent to > >> `foo.bar.baz if (foo is not None and foo.bar is not None) else None`, > >> except evaluating foo and foo.bar only once. > >> > >> On Sat, Sep 10, 2016 at 10:14 AM, Random832 > >> wrote: > >> > On Sat, Sep 10, 2016, at 12:48, Stephen J. Turnbull wrote: > >> >> I forget if Guido was very sympathetic to null-coalescing operators, > >> >> given somebody came up with a good syntax. > >> > > >> > As I remember the discussion, I thought he'd more or less conceded on > >> > the use of ? but there was disagreement on how to implement it that > >> > never got resolved. Concerns like, you can't have a?.b return None > >> > because then a?.b() isn't callable, unless you want to use a?.b?() for > >> > this case, or some people wanted to have "a?" [where a is None] > return a > >> > magic object whose attribute/call/getitem would give no error, but > that > >> > would have to keep returning itself and never actually return None for > >> > chained operators. 
> >> > _______________________________________________ > >> > Python-ideas mailing list > >> > Python-ideas at python.org > >> > https://mail.python.org/mailman/listinfo/python-ideas > >> > Code of Conduct: http://python.org/psf/codeofconduct/ > >> > >> > >> > >> -- > >> --Guido van Rossum (python.org/~guido) > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at realpath.org Fri Oct 14 13:52:20 2016 From: sebastian at realpath.org (Sebastian Krause) Date: Fri, 14 Oct 2016 19:52:20 +0200 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: (Nathaniel Smith's message of "Fri, 14 Oct 2016 09:39:49 -0700") References: <110a01d22635$523c9570$f6b5c050$@hotmail.com> Message-ID: Nathaniel Smith wrote: > The compiler information generally reveals the OS as well (if only > accidentally), and the OS is often useful information. But in which situation would you really need to call Python from outside to find out which OS you're on? Sebastian From rosuav at gmail.com Fri Oct 14 14:21:03 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 15 Oct 2016 05:21:03 +1100 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: <110a01d22635$523c9570$f6b5c050$@hotmail.com> Message-ID: On Sat, Oct 15, 2016 at 4:52 AM, Sebastian Krause wrote: > Nathaniel Smith wrote: >> The compiler information generally reveals the OS as well (if only >> accidentally), and the OS is often useful information. > > But in which situation would you really need to call Python from > outside to find out which OS you're on? It's an easy way to gather info. Example: rosuav at sikorsky:~$ python3 -Wall Python 3.7.0a0 (default:897fe8fa14b5+, Oct 15 2016, 03:27:56) [GCC 6.1.1 20160802] on linux Type "help", "copyright", "credits" or "license" for more information. >>> "C:\Users\Demo" File "", line 1 SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape >>> "C:\Documents\Demo" sys:1: DeprecationWarning: invalid escape sequence '\D' sys:1: DeprecationWarning: invalid escape sequence '\D' 'C:\\Documents\\Demo' Just by copying and pasting the header, I tell every reader what kind of system I'm running this on. Sure, I could tell you that I'm running Debian Stretch, and I could tell you that I've compiled Python from tip, but the header says all that and in a way that is permanently valid. ChrisA From mahmoud at hatnote.com Fri Oct 14 15:09:32 2016 From: mahmoud at hatnote.com (Mahmoud Hashemi) Date: Fri, 14 Oct 2016 12:09:32 -0700 Subject: [Python-ideas] Add sorted (ordered) containers In-Reply-To: References: <28b36987-3eb2-491f-ac7f-63282644e5e9@googlegroups.com> <852a9619-e69f-42fc-bdce-8a98bad5d4cc@googlegroups.com> Message-ID: I'm all for adding more featureful data structures. 
At the risk of confusing Nick's folks, I think it's possible to do even better than Sorted/Ordered for many collections. In my experience, the simple Ordered trait alone was not enough of a feature improvement over the simpler builtin, leading me to implement an OrderedMultiDict, for instance. Another, more cogent example would be Boltons' IndexedSet: http://boltons.readthedocs.io/en/latest/setutils.html It's a normal MutableSet, with almost all the same time complexities, except that you can do indexed_set[0] to get the first-added item, etc. Sometimes it helps to think of it as a kind of UniqueList. If we're going for more featureful containers, I say go all-in! Mahmoud On Fri, Oct 14, 2016 at 8:23 AM, Nick Coghlan wrote: > On 14 October 2016 at 06:48, Neil Girdhar wrote: > > Related: > > > > Nick posted an excellent answer to this question here: > > http://stackoverflow.com/questions/5953205/why-are- > there-no-sorted-containers-in-pythons-standard-libraries > > Ah, so this thread is why I've been getting SO notifications for that > answer :) > > While I think that was a decent answer for its time (as standardising > things too early can inhibit community experimentation - there was > almost 12 years between Twisted's first release in 2002 and asyncio's > provisional inclusion in the standard library in Python 3.4), I also > think the broader context has changed enough that the question may be > worth revisiting for Python 3.7 (in particular, the prospect that it > may be possible to provide this efficiently without having to add a > large new chunk of C code to maintain). > > However, given that Grant has already been discussing the possibility > directly with Raymond as the collections module maintainer though, > there's probably not a lot we can add to that discussion here, since > the key trade-off is between: > > - helping folks that actually need a sorted container implementation > find one that works well with typical memory architectures in modern > CPUs > - avoiding confusing folks that *don't* need a sorted container with > yet another group of possible data structures to consider in the > standard library > > *That* part of my original SO answer hasn't changed, it's just not as > clearcut a decision from a maintainability perspective when we're > talking about efficient and relatively easy to explain pure Python > implementations. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Oct 14 20:18:10 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 11:18:10 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <1476446189.3360709.755895993.1AF2087A@webmail.messagingengine.com> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <20161014055448.GI22471@ando.pearwood.info> <1476446189.3360709.755895993.1AF2087A@webmail.messagingengine.com> Message-ID: <20161015001809.GJ22471@ando.pearwood.info> On Fri, Oct 14, 2016 at 07:56:29AM -0400, Random832 wrote: > On Fri, Oct 14, 2016, at 01:54, Steven D'Aprano wrote: > > Good luck with that last one. 
Even if you could convince the Chinese and > > Japanese to swap to ASCII, I'd like to see you pry the emoji out of the > > young folk's phones. > > This is actually probably the one part of this proposal that *is* > feasible. While encoding emoji as a single character each makes sense > for a culture that already uses thousands of characters; before they > existed the English-speaking software industry already had several > competing "standards" emerging for encoding them as sequences of ASCII > characters. It really isn't feasible to use emoticons instead of emoji, not if you're serious about it. To put it bluntly, emoticons are amateur hour. Emoji implemented as dedicated code points are what professionals use. Why do you think phone manufacturers are standardising on dedicated code points instead of using emoticons? Anyone who has every posted (say) source code on IRC, Usenet, email or many web forums has probably seen unexpected smileys in the middle of their code (false positives). That's because some sequence of characters is being wrongly interpreted as an emoticon by the client software. The more emoticons you support, the greater the chance this will happen. A concrete example: bash code in Pidgin (IRC) will often show unwanted smileys. The quality of applications can vary greatly: once the false emoticon is displayed as a graphic, you may not be able to copy the source code containing the graphic and paste it into a text editor unchanged. There are false negatives as well as false positives: if your :-) happens to fall on the boundary of a line, and your software breaks the sequence with a soft line break, instead of seeing the smiley face you expected, you might see a line ending with :- and a new line starting with ). It's hard to use punctuation or brackets around emoticons without risking them being misinterpreted as an invalid or different sequence. If you are serious about offering smileys, snowmen and piles of poo to your users, you are much better off supporting real emoji (dedicated Unicode characters) instead of emoticons. It is much easier to support ? than :-) and you don't need any special software apart from fonts that support the emoji you care about. -- Steve From greg.ewing at canterbury.ac.nz Fri Oct 14 20:42:34 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 15 Oct 2016 13:42:34 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <20161015001809.GJ22471@ando.pearwood.info> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <20161014055448.GI22471@ando.pearwood.info> <1476446189.3360709.755895993.1AF2087A@webmail.messagingengine.com> <20161015001809.GJ22471@ando.pearwood.info> Message-ID: <58017B7A.3000909@canterbury.ac.nz> Steven D'Aprano wrote: > That's because some sequence of characters > is being wrongly interpreted as an emoticon by the client software. The only thing wrong here is that the client software is trying to interpret the emoticons. Emoticons are for *humans* to interpret, not software. Subtlety and cleverness is part of their charm. If you blatantly replace them with explicit images, you crush that. And don't even get me started on *animated* emoji... 
-- Greg From steve at pearwood.info Fri Oct 14 21:58:07 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 12:58:07 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <01d916de-8b61-c6d1-4efc-649902ce7572@mrabarnett.plus.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161014010407.GG22471@ando.pearwood.info> <01d916de-8b61-c6d1-4efc-649902ce7572@mrabarnett.plus.com> Message-ID: <20161015015806.GK22471@ando.pearwood.info> On Fri, Oct 14, 2016 at 04:18:40AM +0100, MRAB wrote: > On 2016-10-14 02:04, Steven D'Aprano wrote: > >On Thu, Oct 13, 2016 at 08:15:36PM +0200, Martti K?hne wrote: > > > >>Can I fix my name, though? > > > >I don't understand what you mean. Your email address says your name is > >Martti K?hne. Is that incorrect? > > > [snip] > > You wrote "Marttii" and he corrected it when he quoted you in his reply. Ah, so I did! I'm sorry Martti, I read over my comment half a dozen times and couldn't see the doubled "i". My apologies. -- Steven From mehaase at gmail.com Fri Oct 14 23:09:14 2016 From: mehaase at gmail.com (Mark E. Haase) Date: Fri, 14 Oct 2016 23:09:14 -0400 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: On Fri, Oct 14, 2016 at 12:10 PM, Guido van Rossum wrote: > I propose that the next phase of the process should be to pick the > best operator for each sub-proposal. Then we can decide which of the > sub-proposals we actually want in the language, based on a combination > of how important the functionality is and how acceptable we find the > spelling. > > --Guido > > I just submitted an updated PEP that removes the emoijs and some other cruft. How I can help with this next phase? Is a survey a good idea or a bad idea? -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Oct 14 22:38:08 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 13:38:08 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> Message-ID: <20161015023808.GL22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 05:30:49PM -0400, Random832 wrote: > Frankly, I don't see why the pattern isn't obvious *shrug* Maybe your inability to look past your assumptions and see things from other people's perspective is just as much a blind spot as our inability to see why you think the pattern is obvious. We're *all* having difficulty in seeing things from the other side's perspective here. Let me put it this way: as far as I am concerned, sequence unpacking is equivalent to manually replacing the sequence with its items: t = (1, 2, 3) [100, 200, *t, 300] is equivalent to replacing "*t" with "1, 2, 3", which gives us: [100, 200, 1, 2, 3, 300] That's nice, simple, it makes sense, and it works in sufficiently recent Python versions. 
It applies to function calls and assignments: func(100, 200, *t) # like func(100, 200, 1, 2, 3) a, b, c, d, e = 100, 200, *t # like a, b, c, d, e = 100, 200, 1, 2, 3 although it doesn't apply when the star is on the left hand side: a, b, *x, e = 1, 2, 3, 4, 5, 6, 7 That requires a different model for starred names, but *that* model is similar to its role in function parameters: def f(*args). But I digress. Now let's apply that same model of "starred expression == expand the sequence in place" to a list comp: iterable = [t] [*t for t in iterable] If you do the same manual replacement, you get: [1, 2, 3 for t in iterable] which isn't legal since it looks like a list display [1, 2, ...] containing invalid syntax. The only way to have this make sense is to use parentheses: [(1, 2, 3) for t in iterable] which turns [*t for t in iterable] into a no-op. Why should the OP's complicated, hard to understand (to many of us) interpretation take precedence over the simple, obvious, easy to understand model of sequence unpacking that I describe here? That's not a rhetorical question. If you have a good answer, please share it. But I strongly believe that on the evidence of this thread, [a, b, *t, d] is easy to explain, teach and understand, while: [*t for t in iterable] will be confusing, hard to teach and understand except as "magic syntax" -- it works because the interpreter says it works, not because it follows from the rules of sequence unpacking or comprehensions. It might as well be spelled: [ MAGIC!!!! HAPPENS!!!! HERE!!!! t for t in iterable] except it is shorter. Of course, ultimately all syntax is "magic", it all needs to be learned. There's nothing about + that inherently means plus. But we should strongly prefer to avoid overloading the same symbol with distinct meanings, and * is one of the most heavily overloaded symbols in Python: - multiplication and exponentiation - wildcard imports - globs, regexes - collect arguments and kwargs - sequence unpacking - collect unused elements from a sequence and maybe more. This will add yet another special meaning: - expand the comprehension ("extend instead of append"). If we're going to get this (possibly useful?) functionality, I'd rather see an explicit flatten() builtin, or see it spelled: [from t for t in sequence] which at least is *obviously* something magical, than yet another magic meaning to the star operator. Its easy to look it up in the docs or google for it, and doesn't look like Perlish line noise. -- Steve From guido at python.org Fri Oct 14 23:36:11 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 14 Oct 2016 20:36:11 -0700 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: I'm not usually swayed by surveys -- Python is not a democracy. Maybe a bunch of longer examples would help all of us see the advantages of the proposals. On Fri, Oct 14, 2016 at 8:09 PM, Mark E. Haase wrote: > On Fri, Oct 14, 2016 at 12:10 PM, Guido van Rossum wrote: >> >> I propose that the next phase of the process should be to pick the >> best operator for each sub-proposal. Then we can decide which of the >> sub-proposals we actually want in the language, based on a combination >> of how important the functionality is and how acceptable we find the >> spelling. 
>> >> --Guido >> > > > I just submitted an updated PEP that removes the emojis and some other > cruft. > > How can I help with this next phase? Is a survey a good idea or a bad idea? -- --Guido van Rossum (python.org/~guido) From leewangzhong+python at gmail.com Sat Oct 15 00:15:08 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 15 Oct 2016 00:15:08 -0400 Subject: [Python-ideas] Optimizing list.sort() by checking type in advance In-Reply-To: References: <20161011000859.GP22471@ando.pearwood.info> Message-ID: On Mon, Oct 10, 2016 at 11:29 PM, Elliot Gorokhovsky wrote: >> Note that when Python's current sort was adopted in Java, they still kept >> a quicksort variant for "unboxed" builtin types. The adaptive merge sort >> incurs many overheads that often cost more than they save unless comparisons >> are in fact very expensive compared to the cost of pointer copying (and in >> Java comparison of unboxed types is cheap). Indeed, for native numeric >> types, where comparison is dirt cheap, quicksort generally runs faster than >> mergesort despite that the former does _more_ comparisons (because mergesort >> does so much more pointer-copying). > > > Ya, I think this may be a good approach for floats: if the list is all > floats, just copy all the floats into a separate array, use the standard > library quicksort, and then construct a sorted PyObject* array. Like maybe > set up a struct { PyObject* payload, float key } type of deal. This wouldn't > work for strings (unicode is scary), and probably not for ints (one would > have to check that all the ints are within C long bounds). Though on the > other hand perhaps this would be too expensive? I happened onto a page talking about float radix sort, and thought of this thread. Here it is: http://stereopsis.com/radix.html The author claimed an 8x speedup, though the test was done nearly fifteen years ago. I was unsure about posting publicly, because it's not as if an even faster float sort would help decide whether specialized sorts are worth adding to CPython. I'm posting for history. From ncoghlan at gmail.com Sat Oct 15 00:30:58 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 15 Oct 2016 14:30:58 +1000 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: <110a01d22635$523c9570$f6b5c050$@hotmail.com> Message-ID: On 15 October 2016 at 03:52, Sebastian Krause wrote: > Nathaniel Smith wrote: >> The compiler information generally reveals the OS as well (if only >> accidentally), and the OS is often useful information. > > But in which situation would you really need to call Python from > outside to find out which OS you're on? Folks don't always realise that the nominal version reported by redistributors isn't necessarily exactly the same as the upstream release bearing that version number. This discrepancy is most obvious with LTS Linux releases that don't automatically rebase their supported Python builds to new maintenance releases, and instead selectively backport changes that they or their customers need. This means that it isn't always sufficient to know that someone is running "Python on CentOS 6" (for example) - we sometimes need to know which *build* of Python they're running, as if a problem can't be reproduced with a recent from-source upstream build, it may be due to redistributor specific patches, or it may just be that there's an already implemented fix upstream that the redistributor hasn't backported yet. 
So +1 from me for making "python -vV" a shorthand for "python -c 'import sys; print(sys.version)'". Since older versions won't support it, it won't help much in the near term (except as a reminder to ask for "sys.version" in cases where it may be relevant), but it should become a useful support helper given sufficient time. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Oct 15 02:10:57 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 15 Oct 2016 16:10:57 +1000 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: On 15 October 2016 at 13:36, Guido van Rossum wrote: > I'm not usually swayed by surveys -- Python is not a democracy. Maybe > a bunch of longer examples would help all of us see the advantages of > the proposals. Having been previously somewhere between -1 and -0, I've been doing a lot more data mining and analysis work lately, which has been enough to shift me to at least +0 and potentially even higher when it comes to the utility of adding these operators (more on that below). = Pragmatic aspects = Regarding the spelling details, my current preferences are as follows: * None-coalescing operator: x ?or y * None-severing operator: x ?and y * None-coalescing augmented assignment: x ?= y * None-severing attribute access: x?.attr * None-severing subscript lookup: x?[expr] (The PEP currently only covers the "or?" and "and?" operator spelling suggestions, but the latter three suggestions are the same as those in the current PEP draft) My rationale for this preference is that it means that "?" is consistently a pseudo-operator that accepts an expression on the left and another binary operator (from a carefully restricted subset) on the right, and the combination is a new short-circuiting binary operation based on "LHS is not None". The last three operations can be defined in terms of the first two (with the usual benefit of avoiding repeated evaluation of the subexpression): * None-coalescing augmented assignment: x = x ?or y * None-severing attribute access: x ?and x.attr * None-severing subscript lookup: x ?and x[expr] The first two can then be defined in terms of equivalent if/else statements containing an "x is not None" clause: * None-coalescing operator: x if x is not None else y * None-severing operator: y if x is not None else x Importantly, the normal logical and/or can be expanded in terms of if/else in exactly the same way, only using "bool(x)" instead of "x is not None": * Logical or: x if x else y * Logical and: y if x else x = Language design philosophy aspects = Something I think is missing from the current PEP is a high level explanation of the *developer problem* that these operators solve - while the current PEP points to other languages as precedent, that just prompts the follow on question "Well, why did *they* add them, and does their rationale also apply to Python?". 
Even the current motivating examples don't really cover this, as they're quite tactical in nature ("Here is how this particular code is improved by the proposed change"), rather than explaining the high level user benefit ("What has changed in the surrounding technology environment that makes us think this is a user experience design problem worth changing the language definition to help address *now* even though Python has lived happily without these operators for 25+ years?") With conditional expressions, we had the clear driver that folks were insisting on using (and teaching!) the "and/or" hack as a workaround, and introducing bugs into their code as a result, whereas we don't have anything that clear-cut for this proposal (using "or" for None-coalescing doesn't seem to be anywhere near as popular as "and/or" used to be as an if/else equivalent). My point of view on that is that one of the biggest computing trends in recent years is the rise of "semi-structured data", where you're passing data around in either JSON-compatible data structures, or comparable structures mapped to instances and attributes, and all signs point to that being a permanent state change in the world of programming rather than merely being a passing fad. The world itself is fuzzy and ambiguous, and learning to work effectively with semi-structured data better reflects that ambiguity rather than forcing a false precision for the sake of code simplification. When you're working in that kind of context, encountering "None" is typically a shorthand for "This entire data subtree is missing, so don't try to do anything with it", but if it *isn't* None, you can safely assume that all the mandatory parts of that data segment will be present (no matter how deeply nested they are). To help explain that, it would be useful to mention not only the corresponding operators in other languages, but also the changes in data storage practices, like PostgreSQL's native support for JSON document storage and querying ( https://www.postgresql.org/docs/9.4/static/functions-json.html ) as well as the emergence/resurgence of hierarchical document storage techniques and new algorithms for working with them. However, it's also the case that where we *do* have a well understood and nicely constrained problem, it's still better to complain loudly when data is unexpectedly missing, rather than subjecting ourselves to the pain of having to deal with detecting problems with our data far away from where we introduced those problems. A *lot* of software still falls into that category, especially custom software written to meet the needs of one particular organisation. My current assumption is that those of us that now regularly need to deal with semi-structured data are thinking "Yes, these additions are obviously beneficial and improve Python's expressiveness, if we can find an acceptable spelling". Meanwhile, folks dealing primarily with entirely structured or entirely unstructured data are scratching their heads and asking "What's the big deal? How could it ever be worth introducing more line noise into the language just to make this kind of code easier to write?" Even the PEP's title is arguably a problem on that front - "None-aware operators" is a proposed *solution* to the problem of making semi-structured data easier to work with in Python, and hence reads like a solution searching for a problem to folks that don't regularly encounter these issues themselves. 
Framing the problem that way also provides a hint on how we could *document* these operations in the language reference in a readily comprehensible way: "Operators for working with semi-structured data" Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From levkivskyi at gmail.com Sat Oct 15 02:39:05 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sat, 15 Oct 2016 08:39:05 +0200 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: On 15 Oct 2016 08:11, "Nick Coghlan" wrote: > > On 15 October 2016 at 13:36, Guido van Rossum wrote: > > I'm not usually swayed by surveys -- Python is not a democracy. Maybe > > a bunch of longer examples would help all of us see the advantages of > > the proposals. > > Having been previously somewhere between -1 and -0, I've been doing a > lot more data mining and analysis work lately, which has been enough > to shift me to at least +0 and potentially even higher when it comes > to the utility of adding these operators (more on that below). > It is a real pleasure to read Nick's posts, and here he says _exactly_ what I wanted to say, but in a much clearer way than I could. (Disclaimer: I am working with semi-structured data most of the time) -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Oct 15 03:34:01 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 15 Oct 2016 10:34:01 +0300 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On 14.10.16 10:40, INADA Naoki wrote: > When reporting issue to some project and want to include > python version in the report, python -V shows very limited information. > > $ ./python.exe -V > Python 3.6.0b2+ > > sys.version is more usable, but it requires one liner. > > $ ./python.exe -c 'import sys; print(sys.version)' > 3.6.0b2+ (3.6:86a1905ea28d+, Oct 13 2016, 17:58:37) > [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] > > How about `python -vV` shows sys.version? Are there precedents for combining verbose and version options in other programs? PyPy just outputs sys.version for the --version option. $ pypy -V Python 2.7.10 (5.4.1+dfsg-1~ppa1~ubuntu16.04, Sep 06 2016, 23:11:39) [PyPy 5.4.1 with GCC 5.4.0 20160609] I think it would not be a large breakage if new releases of CPython started outputting extended version information by default.
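As a point of comparison while the flag is discussed, here is a minimal sketch, assuming only the standard library, of how the same details can be gathered today on any existing release; the particular fields printed are illustrative rather than part of the proposal:

    # print the details usually wanted in a bug report
    import platform
    import sys

    print(sys.version)                        # full build string, like pypy -V
    print(platform.python_implementation())   # CPython, PyPy, ...
    print(platform.platform())                # OS / kernel details
    print(sys.executable)                     # which interpreter binary ran

Until something like `python -vV` exists, python -c "import sys; print(sys.version)" remains the portable one-liner.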
From random832 at fastmail.com Sat Oct 15 04:12:12 2016 From: random832 at fastmail.com (Random832) Date: Sat, 15 Oct 2016 04:12:12 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015023808.GL22471@ando.pearwood.info> References: <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> <20161015023808.GL22471@ando.pearwood.info> Message-ID: <1476519132.435953.756722153.1102EB79@webmail.messagingengine.com> On Fri, Oct 14, 2016, at 22:38, Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 05:30:49PM -0400, Random832 wrote: > > > Frankly, I don't see why the pattern isn't obvious > > *shrug* > > Maybe your inability to look past your assumptions and see things from > other people's perspective is just as much a blind spot as our inability > to see why you think the pattern is obvious. We're *all* having > difficulty in seeing things from the other side's perspective here. > > Let me put it this way: as far as I am concerned, sequence unpacking is > equivalent to manually replacing the sequence with its items: And as far as I am concerned, comprehensions are equivalent to manually creating a sequence/dict/set consisting of repeating the body of the comprehension to the left of "for" with the iteration variable[s] replaced in turn with each actual value. > t = (1, 2, 3) > [100, 200, *t, 300] > > is equivalent to replacing "*t" with "1, 2, 3", which gives us: > > [100, 200, 1, 2, 3, 300] I don't understand why it's not _just as simple_ to say: t = ('abc', 'def', 'ghi') [*x for x in t] is equivalent to replacing "x" in "*x" with, each in turn, 'abc', 'def', and 'ghi', which gives us: [*'abc', *'def', *'ghi'] just like [f(x) for x in t] would give you [f('abc'), f('def'), f('ghi')] > That's nice, simple, it makes sense, and it works in sufficiently recent > Python versions. That last bit is not an argument - every new feature works in sufficiently recent python versions. The only difference for this proposal (provided it is approved) is that the sufficiently recent python versions simply don't exist yet. From steve at pearwood.info Sat Oct 15 04:00:10 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 19:00:10 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> Message-ID: <20161015080009.GM22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 11:32:49PM -0400, Random832 wrote: > On Thu, Oct 13, 2016, at 18:15, Steven D'Aprano wrote: > > Consider the analogy with f(*t), where t = (a, b, c). We *don't* have: > > > > f(*t) is equivalent to f(a); f(b); f(c) > > I don't know where this "analogy" is coming from. I'm explicitly saying that we DON'T have that behaviour with function calls. f(*t) is NOT expanded to f(a), f(b), f(c). I even emphasised the "don't" part of my sentence above. 
And yet, this proposal wants to expand [*t for t in iterable] into the equivalent of: result = [] for t in iterable: a, b, c = *t result.append(a) result.append(b) result.append(c) Three separate calls to append, analogous to three separate calls to f(). The point I am making is that this proposed change is *not* analogous to the way sequence unpacking works in other contexts. I'm sorry if I wasn't clear enough. [...] > > Indeed. The reader may be forgiven for thinking that this is yet another > > unrelated and arbitrary use of * to join the many other uses: > > How is it arbitrary? It is arbitrary because the suggested use of *t in list comprehensions has no analogy to the use of *t in other contexts. As far as I can see, this is not equivalent to the way sequence (un)packing works on *either* side of assignment. It's not equivalent to the way sequence unpacking works in function calls, or in list displays. It's this magical syntax which turns a virtual append() into extend(): # [t for t in iterable] result = [] for t in iterable: result.append(t) # but [*t for t in iterable] result = [] for t in iterable: result.extend(t) or, if you prefer, keep the append but magical add an extra for-loop: # [*t for t in iterable] result = [] for t in iterable: for x in t: result.append(x) > > - mathematical operator; > > - glob and regex wild-card; > > - unpacking; > > This is unpacking. It unpacks the results into the destination. If it were unpacking as it is understood today, with no other changes, it would be a no-op. (To be technical, it would convert whatever iterable t is into a tuple.) I've covered that in an earlier post: if you replace *t with the actual items of t, you DON'T get: result = [] for t in iterable: a, b, c = *t # assuming t has three items, as per above result.append(a) result.append(b) result.append(c) as desired, but: result = [] for t in iterable: a, b, c = *t result.append((a, b, c)) which might as well be a no-op. To make this work, the "unpacking operator" needs to do more than just unpack. It has to either change append into extend, or equivalently, add an extra for loop into the list comprehension. > There's a straight line from [*t, *u, *v] to [*x for x in (t, u, v)]. > What's surprising is that it doesn't work now. I'm not surprised that it doesn't work. I expected that it wouldn't work. When I first saw the suggestion, I thought "That can't possibly be meaningful, it should be an error." Honestly Random832, I cannot comprehend how you see this as a straightforward obvious extension from existing behaviour. To me, this is nothing like the existing behaviour, and it contradicts the way sequence unpacking works everywhere else. I do not understand the reasoning you use to conclude that this is a straight-line extension to the current behaviour. Nothing I have seen in any of this discussion justifies that claim to me. I don't know what you are seeing that I cannot see. My opinion is that you're seeing things that aren't there -- I expect that your opinion is that I'm blind. > I think last month we even had someone who didn't know about 'yield > from' propose 'yield *x' for exactly this feature. It is intuitive - it > is a straight-line extension of the unpacking syntax. Except for all the folks who have repeatedly said that it is counter-intuitive, that it is a twisty, unexpected, confusing path from the existing behaviour to this proposal. 
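To make the behaviour under discussion concrete, here is a small sketch, using the sample data shown, of the result the proposal's supporters want [*t for t in iterable] to produce, written only with spellings that already work today:

    from itertools import chain

    iterable = [(1, 2, 3), (4, 5, 6)]

    flat1 = [x for t in iterable for x in t]       # nested-loop comprehension
    flat2 = list(chain.from_iterable(iterable))    # itertools equivalent
    # both give [1, 2, 3, 4, 5, 6]

The disagreement in this thread is not about whether that result is useful, but about whether the starred spelling is a natural way to ask for it.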
-- Steve From steve at pearwood.info Sat Oct 15 04:09:58 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 19:09:58 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <58008958.403@canterbury.ac.nz> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> Message-ID: <20161015080958.GN22471@ando.pearwood.info> On Fri, Oct 14, 2016 at 08:29:28PM +1300, Greg Ewing wrote: > Steven D'Aprano wrote: > > So why would yield *t give us this? > > > > yield a; yield b; yield c > > > >By analogy with the function call syntax, it should mean: > > > > yield (a, b, c) > > This is a false analogy, because yield is not a function. Neither are list comprehensions or sequence unpacking in the context of assignment: a, b, c = *t Not everything is a function. What's your point? As far as I can see, in *every* other use of sequence unpacking, *t is conceptually replaced by a comma-separated sequence of items from t. If the starred item is on the left-hand side of the = sign, we might call it "sequence packing" rather than unpacking, and it operates to collect unused items, just like *args does in function parameter lists. Neither of these are even close to what the proposed [*t for t in iterable] will do. > >>However, consider the following spelling: > >> > >> l = [from f(t) for t in iterable] > > That sentence no verb! > > In English, 'from' is a preposition, so one expects there > to be a verb associated with it somewhere. We currently > have 'from ... import' and 'yield from'. > > But 'from f(t) for t in iterable' ... do what? *shrug* I'm not married to this suggestion. It could be written [MAGIC!!! HAPPENS!!! HERE!!! t for t in iterable] if you prefer. The suggestion to use "from" came from Sjoerd Job Postmus, not me. -- Steve From steve at pearwood.info Sat Oct 15 04:18:21 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 19:18:21 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <580075D4.9050807@canterbury.ac.nz> <20161014073311.GG13170@sjoerdjob.com> Message-ID: <20161015081821.GO22471@ando.pearwood.info> On Fri, Oct 14, 2016 at 07:51:18AM +0000, Neil Girdhar wrote: > Here's an interesting idea regarding yield **x: > > Right now a function containing any yield returns a generator. Therefore, > it works like a generator expression, which is the lazy version of a list > display. lists can only contain elements x and unpackings *x. Therefore, > it would make sense to only have "yield x" and "yield *xs" (currently > spelled "yield from xs") No, there's no "therefore" about it. "yield from x" is not the same as "yield *x". *x is conceptually equivalent to replacing "*x" with a comma-separated sequence of individual items from x. Given x = (1, 2, 3): f(*x) is like f(1, 2, 3) [100, 200, *x, 300] is like [100, 200, 1, 2, 3, 300] a, b, c, d = 100, *x is like a, b, c, d = 100, 1, 2, 3 Now replace "yield *x" with "yield 1, 2, 3". Conveniently, that syntax already works: py> def gen(): ... yield 1, 2, 3 ... py> it = gen() py> next(it) (1, 2, 3) "yield *x" should not be the same as "yield from x". 
Yielding a starred expression currently isn't allowed, but if it were allowed, it would be pointless: it would be the same as unpacking x, then repacking it into a tuple. Either that, or we would have yet another special meaning for * unrelated to the existing meanings. -- Steve From random832 at fastmail.com Sat Oct 15 04:42:13 2016 From: random832 at fastmail.com (Random832) Date: Sat, 15 Oct 2016 04:42:13 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015080009.GM22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> Message-ID: <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> On Sat, Oct 15, 2016, at 04:00, Steven D'Aprano wrote: > > This is unpacking. It unpacks the results into the destination. > > If it were unpacking as it is understood today, with no other changes, > it would be a no-op. (To be technical, it would convert whatever > iterable t is into a tuple.) If that were true, it would be a no-op everywhere. > I've covered that in an earlier post: if > you replace *t with the actual items of t, you DON'T get: Replacing it _with the items_ is not the same thing as replacing it _with a sequence containing the items_, and you're trying to pull a fast one by claiming it is by using the fact that the "equivalent loop" (which is and has always been a mere fiction, not a real transformation actually performed by the interpreter) happens to use a sequence of tokens that would cause a tuple to be created if a comma appears in the relevant position. > To make this work, the "unpacking operator" needs to do more than just > unpack. It has to either change append into extend, Yes, that's what unpacking does. In every context where unpacking means anything at all, it does something to arrange for the sequence's elements to be included "unbracketed" in the context it's being ultimately used in. It's no different from changing BUILD_LIST (equivalent to creating an empty list and appending each item) to BUILD_LIST_UNPACK (equivalent to creating an empty list and extending with each item). Imagine that we were talking about ordinary list displays, and for some reason had developed a tradition of explaining them in terms of "equivalent" code the way we do for comprehensions. x = [a, b, c] is equivalent to: x = list() x.append(a) x.append(b) x.append(c) So now if we replace c with *c [where c == [d, e]], must we now say this? x = list() x.append(a) x.append(b) x.append(d, e) Well, that's just not valid at all. Clearly we must reject this ridiculous notion of allowing starred expressions within list displays, because we _can't possibly_ change the transformation to accommodate the new feature. 
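For anyone who wants to check the bytecode claim directly, a quick sketch using the dis module (the exact opcodes vary by version; on the 3.5/3.6 interpreters being discussed, a starred element in a list display is expected to show BUILD_LIST_UNPACK, while a plain display shows BUILD_LIST):

    import dis

    dis.dis("[1, 2, *c]")   # list display with a starred element
    dis.dis("[1, 2, 3]")    # plain list display, for comparison

dis.dis() compiles the string without running it, so the undefined name c is not a problem here.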
From steve at pearwood.info Sat Oct 15 04:36:30 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 19:36:30 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <58006BD4.2000109@canterbury.ac.nz> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <58006BD4.2000109@canterbury.ac.nz> Message-ID: <20161015083629.GP22471@ando.pearwood.info> On Fri, Oct 14, 2016 at 06:23:32PM +1300, Greg Ewing wrote: > To maintain the identity > > list(*x for x in y) == [*x for x in y] > > it would be necessary for the *x in (*x for x in y) to expand > to "yield from x". Oh man, you're not even trying to be persuasive any more. You're just assuming the result that you want, then declaring that it is "necessary". :-( I have a counter proposal: suppose *x is expanded to the string literal "Nope!". Then, given y = (1, 2, 3) (say): list(*x for x in y) gives ["Nope!", "Nope!", "Nope!"], and [*x for x in y] also gives ["Nope!", "Nope!", "Nope!"]. Thus the identity is kept, and your claim of "necessity" is disproven. We already know what *x should expand to: nearly everywhere else, *x is conceptually replaced by a comma-separated sequence of the items of x. That applies to function calls, sequence unpacking and list displays. The only exceptions I can think of are *args parameters in function parameter lists, and sequence packing on the left side of an assignment, both of which work in similar fashions. But not this proposal: it wouldn't work like either of the above, hence it would be yet another unrelated use of the * operator for some special meaning. -- Steve From steve at pearwood.info Sat Oct 15 04:53:37 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 19:53:37 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> References: <20161012154224.GT22471@ando.pearwood.info> <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> Message-ID: <20161015085337.GQ22471@ando.pearwood.info> On Thu, Oct 13, 2016 at 01:30:45PM -0700, Neil Girdhar wrote: > From a CPython implementation standpoint, we specifically blocked this code > path, and it is only a matter of unblocking it if we want to support this. I find that difficult to believe. The suggested change seems like it should be much bigger than just removing a block. Can you point us to the relevant code? In any case, it isn't really the difficulty of implementation that is being questioned. Many things are easy to implement, but we still don't do them. The real questions here are: (1) Should we overload list comprehensions as sugar for a flatten() function? (2) If so, should we spell that [*t for t in iterable]? Actually the answer to (1) should be "we already do". 
We just spell it: [x for t in iterable for x in t] -- Steve From steve at pearwood.info Sat Oct 15 05:01:40 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 20:01:40 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <58017B7A.3000909@canterbury.ac.nz> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <20161014055448.GI22471@ando.pearwood.info> <1476446189.3360709.755895993.1AF2087A@webmail.messagingengine.com> <20161015001809.GJ22471@ando.pearwood.info> <58017B7A.3000909@canterbury.ac.nz> Message-ID: <20161015090140.GR22471@ando.pearwood.info> On Sat, Oct 15, 2016 at 01:42:34PM +1300, Greg Ewing wrote: > Steven D'Aprano wrote: > >That's because some sequence of characters > >is being wrongly interpreted as an emoticon by the client software. > > The only thing wrong here is that the client software > is trying to interpret the emoticons. > > Emoticons are for *humans* to interpret, not software. > Subtlety and cleverness is part of their charm. If you > blatantly replace them with explicit images, you crush > that. Heh :-) I agree with you. But so long as people want, or at least phone and software developers think people want, graphical smiley faces and dancing paperclips and piles of poo, then emoticons are a distictly more troublesome way of dealing with them. -- Steve From mar77i at mar77i.ch Sat Oct 15 05:55:52 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Sat, 15 Oct 2016 11:55:52 +0200 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension Message-ID: On Sat, Oct 15, 2016 at 10:09 AM, Steven D'Aprano wrote: > Not everything is a function. What's your point? > > As far as I can see, in *every* other use of sequence unpacking, *t is > conceptually replaced by a comma-separated sequence of items from t. If > the starred item is on the left-hand side of the = sign, we might call > it "sequence packing" rather than unpacking, and it operates to collect > unused items, just like *args does in function parameter lists. > You brush over the fact that *t is not limited to a replacement by a comma-separated sequence of items from t, but *t is actually a replacement by that comma-separated sequence of items from t INTO an external context. For func(*t) to work, all the elements of t are kind of "leaked externally" into the function argument list's context, and for {**{'a': 1, 'b': 2, ...}} the inner dictionary's items are kind of "leaked externally" into the outer's context. You can think of the */** operators as a promotion from append to extend, but another way to see this is as a promotion from yield to yield from. So if you want to instead of append items to a comprehension, as is done with [yield_me for yield_me in iterator], you can see this new piece as a means to [*yield_from_me for yield_from_me in iterator]. FWIW, I think it's a bit confusing that yield needs a different keyword if these asterisk operators already have this outspoken promotion effect. Besides, [*thing for thing in iterable_of_iters if cond] has this cool potential for the existing any() and all() builtins for cond, where a decision can be made based on the composition of the in itself iterable thing. cheers! 
mar77i From mar77i at mar77i.ch Sat Oct 15 05:25:01 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Sat, 15 Oct 2016 11:25:01 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> Message-ID: On Sat, Oct 15, 2016 at 10:09 AM, Steven D'Aprano wrote: > Not everything is a function. What's your point? > > As far as I can see, in *every* other use of sequence unpacking, *t is > conceptually replaced by a comma-separated sequence of items from t. If > the starred item is on the left-hand side of the = sign, we might call > it "sequence packing" rather than unpacking, and it operates to collect > unused items, just like *args does in function parameter lists. > You brush over the fact that *t is not limited to a replacement by a comma-separated sequence of items from t, but *t is actually a replacement by that comma-separated sequence of items from t INTO an external context. For func(*t) to work, all the elements of t are kind of "leaked externally" into the function argument list's context, and for {**{'a': 1, 'b': 2, ...}} the inner dictionary's items are kind of "leaked externally" into the outer's context. You can think of the */** operators as a promotion from append to extend, but another way to see this is as a promotion from yield to yield from. So if you want to instead of append items to a comprehension, as is done with [yield_me for yield_me in iterator], you can see this new piece as a means to [*yield_from_me for yield_from_me in iterator]. Therefore I think it's a bit confusing that yield needs a different keyword if these asterisk operators already have this intuitive promotion effect. Besides, [*thing for thing in iterable_of_iters if cond] has this cool potential for the existing any() and all() builtins for cond, where a decision can be made based on the composition of the in itself iterable thing. cheers! mar77i From srkunze at mail.de Sat Oct 15 06:20:36 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 15 Oct 2016 12:20:36 +0200 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: <4fdd52d8-aa6c-0e54-65bc-7457d5225a91@mail.de> On 15.10.2016 08:10, Nick Coghlan wrote: > However, it's also the case that where we *do* have a well understood > and nicely constrained problem, it's still better to complain loudly > when data is unexpectedly missing, rather than subjecting ourselves to > the pain of having to deal with detecting problems with our data far > away from where we introduced those problems. A *lot* of software > still falls into that category, especially custom software written to > meet the needs of one particular organisation. Definitely true. Stricter rules are similar to "fail early", "no errors should pass silently" and the like. This stance is conveyed by Python as long as I know it. > My current assumption is that those of us that now regularly need to > deal with semi-structured data are thinking "Yes, these additions are > obviously beneficial and improve Python's expressiveness, if we can > find an acceptable spelling". 
Meanwhile, folks dealing primarily with > entirely structured or entirely unstructured data are scratching their > heads and asking "What's the big deal? How could it ever be worth > introducing more line noise into the language just to make this kind > of code easier to write?" That's where I like to see a common middle ground between those two sides of the table. I need to work with both sides for years now. In my experience, it's best to avoid semi-structured data at all to keep the code simple. As we all know and as you described, the world isn't perfect and I can only agree. However, what served us best in recent years, is to keep the "semi-" out of the inner workings of our codebase. So, handling "semi-" at the system boundary proved to be a reliable way of not breaking everything and of keeping our devs sane. I am unsure how to implement such solution, whether via PEP8 or via the proposal's PEP. It somehow reminds me of the sans-IO idea where the core logic should be simple/linear code and the difficult/problematic issues are solved at the systems boundary. This said, let me put it differently by using an example. I can find None-aware operators very useful at the outermost function/methods of a process/library/class/module: class FanzyTool: def __init__(self, option1=None, option2=None, ...): # what happens when option6 and option7 are None # and it only matters when option 3 is not None # but when ... Internal function/methods/modules/classes and even processes/threads should have a clear, non-wishy-washy way of input and output (last but not least also to do unit-testing on relatively sane level). def _append_x(self, s): return s + 'x' # strawman operation Imagine, that s is something important to be passed around many times inside of "FanzyTool". The whole process usually makes no sense at all, when s is None. And having each internal method checking for None is getting messy fast. I hope we can also convey this issue properly when we find an appropriate syntax. > Even the PEP's title is arguably a problem on that front - "None-aware > operators" is a proposed *solution* to the problem of making > semi-structured data easier to work with in Python, and hence reads > like a solution searching for a problem to folks that don't regularly > encounter these issues themselves. > > Framing the problem that way also provides a hint on how we could > *document* these operations in the language reference in a readily > comprehensible way: "Operators for working with semi-structured data" That's indeed an extraordinarily good title as it describes best what we intend it to be used for (common usage scenarios). 
+1 Regards, Sven From greg.ewing at canterbury.ac.nz Sat Oct 15 06:29:21 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 15 Oct 2016 23:29:21 +1300 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015023808.GL22471@ando.pearwood.info> References: <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <1476394249.616618.755305009.1D45836F@webmail.messagingengine.com> <20161015023808.GL22471@ando.pearwood.info> Message-ID: <58020501.80403@canterbury.ac.nz> Steven D'Aprano wrote: > t = (1, 2, 3) > iterable = [t] > [*t for t in iterable] > > If you do the same manual replacement, you get: > > [1, 2, 3 for t in iterable] Um, no, you need to also *remove the for loop*, otherwise you get complete nonsense, whether * is used or not. Let's try a less degenerate example, both ways. iterable = [1, 2, 3] [t for t in iterable] To expand that, we replace t with each of the values generated by the loop and put commas between them: [1, 2, 3] Now with the star: iterable = [(1, 2, 3), (4, 5, 6), (7, 8, 9)] [*t for t in iterable] Replace *t with each of the sequence generated by the loop, with commas between: [1,2,3 , 4,5,6 , 7,8,9] > Maybe your inability to look past your assumptions and see things from > other people's perspective is just as much a blind spot as our inability > to see why you think the pattern is obvious. It's obvious that you're having difficulty seeing what we're seeing, but I don't know how to explain it any more clearly, I'm sorry. -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 15 06:36:28 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 15 Oct 2016 23:36:28 +1300 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> Message-ID: <580206AC.1060203@canterbury.ac.nz> Martti K?hne wrote: > You brush over the fact that *t is not limited to a replacement by a > comma-separated sequence of items from t, but *t is actually a > replacement by that comma-separated sequence of items from t INTO an > external context. Indeed. In situations where there isn't any context for the interpretation of *, it's not allowed. For example: >>> x = *(1, 2, 3) File "", line 1 SyntaxError: can't use starred expression here But >>> x = 1, *(2, 3) >>> x (1, 2, 3) The * is allowed there because it's already in a context where a comma-separated list has meaning. 
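A compact sketch of the contexts being alluded to here, where a starred expression is already legal on a 3.5+ interpreter because the surrounding syntax accepts a comma-separated list (expected results in the comments):

    t = (2, 3)
    [1, *t]             # list display      -> [1, 2, 3]
    (1, *t)             # tuple display     -> (1, 2, 3)
    {1, *t}             # set display       -> {1, 2, 3}
    print(1, *t)        # call arguments    -> prints: 1 2 3
    first, *rest = t    # assignment target -> first == 2, rest == [3]

In each case there is an enclosing construct that says what the expanded items become part of, which is exactly the context a bare starred expression lacks.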
-- Greg From steve at pearwood.info Sat Oct 15 06:38:17 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 21:38:17 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> Message-ID: <20161015103815.GS22471@ando.pearwood.info> On Sat, Oct 15, 2016 at 04:42:13AM -0400, Random832 wrote: > On Sat, Oct 15, 2016, at 04:00, Steven D'Aprano wrote: > > > This is unpacking. It unpacks the results into the destination. > > > > If it were unpacking as it is understood today, with no other changes, > > it would be a no-op. (To be technical, it would convert whatever > > iterable t is into a tuple.) > > If that were true, it would be a no-op everywhere. That's clearly not the case. x = (1, 2, 3) [100, 200, *x, 300] If you do it the way I say, and replace *x with the individual items of x, you get this: [100, 200, 1, 2, 3, 300] which conveniently happens to be what you already get in Python. You claim that if I were write it should be a no-op -- that doesn't follow. Why would it be a no-op? I've repeatedly shown the transformation to use, and it clearly does what I say it should. How could it not? > > I've covered that in an earlier post: if > > you replace *t with the actual items of t, you DON'T get: > > Replacing it _with the items_ is not the same thing as replacing it > _with a sequence containing the items_, I don't think I ever used the phrasing "a sequence containing the items". I think that's *your* phrase, not mine. I may have said "with the sequence of items" or words to that effect. These two phrases do have different meanings: x = (1, 2, 3) [100, 200, *x, 300] # Replace *x with "a sequence containing items of x" [100, 200, [1, 2, 3], 300] # Replace *x with "the sequence of items of x" [100, 200, 1, 2, 3, 300] Clearly they have different meanings. I'm confident that I've always made it clear that I'm referring to the second, not the first, but I'm only human and if I've been unclear or used the wrong phrasing, my apologies. But nit-picking about the exact phrasing used aside, it is clear that expanding the *t in a list comprehension: [*t for t in iterable] to flatten the iterable cannot be analogous to this. Whatever explanation you give for why *t expands the list comprehension, it cannot be given in terms of replacing *t with the items of t. There has to be some magic to give it the desired special behaviour. > and you're trying to pull a fast > one by claiming it is by using the fact that the "equivalent loop" > (which is and has always been a mere fiction, not a real transformation > actually performed by the interpreter) happens to use a sequence of > tokens that would cause a tuple to be created if a comma appears in the > relevant position. I don't know what "equivalent loop" you are accusing me of misusing. The core developers have made it absolutely clear that changing the fundamental equivalence of list comps as syntactic sugar for: result = [] for t in iterable: result.append(t) is NOT NEGOTIABLE. 
(That is much to my disappointment -- I would love to introduce a "while" version of list comps to match the "if" version, but that's not an option.) So regardless of whether it is a fiction or an absolute loop, Python's list comprehensions are categorically limited to behave equivalently to the loop above (modulo scope, temporary variables, etc). If you want to change that -- change the append to an extend, for example -- you need to make a case for that change which is strong enough to overcome Guido's ruling. (Maybe Guido will be willing to bend his ruling to allow extend as well.) There are three ways to get the desired flatten() behaviour from a list comp. One way is to explicitly add a second loop, which has the benefit of already working: [x for t in iterable for x in t] Another is to swap out the append for an extend: [*t for t in iterable] # real or virtual transformation, it doesn't matter result = [] for t in iterable: result.extend(t) And the third is to keep the append but insert an extra virtual loop: # real or virtual transformation, it still doesn't matter result = [] for t in iterable: for x in t: result.append(x) Neither of the second or third suggestions match the equivalent loop form given above. Neither the second nor third is an obvious extension of the way sequence unpacking works in other contexts. [...] > Imagine that we were talking about ordinary list displays, and for some > reason had developed a tradition of explaining them in terms of > "equivalent" code the way we do for comprehensions. > > x = [a, b, c] is equivalent to: > x = list() > x.append(a) > x.append(b) > x.append(c) > > So now if we replace c with *c [where c == [d, e]], must we now say > this? > x = list() > x.append(a) > x.append(b) > x.append(d, e) > > Well, that's just not valid at all. Right. And if we had a tradition of saying that list displays MUST be equivalent to the unpacked sequence of appends, then sequence unpacking inside a list display would be prohibited. But we have no such tradition, and sequence unpacking inside the list really is an obvious straight line extrapolation from (say) sequence unpacking inside a function call. Fortunately, we have a *different* tradition when it comes to list displays, and no ruling that *c must turn into append with multiple arguments. Our explanation of [a, b, *c] occurs at an earlier level: replace the *c with the items of c: c = [d, e] [a, b, *c] ==> [a, b, d, e] And there is no problem. Unfortuantely for you, none of this is the case for list comps. We DO have a tradition and a BDFL ruling that list comps are strictly equivalent to a loop with append. And the transformation of *t for the items of t (I don't care if it is a real transformation in the implementation, or only a fictional transformation) cannot work in a list comp. Let's make the number of items of t explicit so we don't have to worry about variable item counts: [*t for t in iterable] # t has three items [a, b, c for (a, b, c) in iterable] That's a syntax error. To avoid the syntax error, we need parentheses: [(a, b, c) for (a, b, c) in iterable] and that's a no-op. So we're back to my first response to this thread: why on earth would you expect *t in a list comprehension to flatten the iterable? It should be either an error, or a no-op. > Clearly we must reject this > ridiculous notion of allowing starred expressions within list displays, > because we _can't possibly_ change the transformation to accommodate the > new feature. Of course we can. 
I've repeatedly said we can do anything we want. If we want, we can have *t in a list comprehension be sugar for importing the sys module, or erasing your home directory. What we can't say is that "erasing your home directory" is an obvious straight-line extrapolation from existing uses of the star operator. There's nothing obvious here: this thread is proof that whatever connection (if any) between the two is non-obvious, twisted, even strange and bizarre. I have never argued against this suggested functionality: flattening iterables is obviously a useful thing to do. But: - we can already use a list comp to flatten: [x for t in iterable for x in t] - there's no obvious or clear connection between the *t in the suggested syntax and existing uses of the star operator; it might as well be spelled [magic!!!! t for t in iterable] for all the relevance sequence unpacking has; - if anyone can explain the connection they see, I'm listening; (believe me, I am *trying to understand* -- but none of the given explanations for a connection hold up as far as I am concerned) - even if we agree that there is a connection, this thread is categorical proof that it is not obvious: it has taken DOZENS of emails to (allegedly) get the message across; - if we do get syntactic sugar for flatten(), why does it have to overload the star operator for yet another meaning? Hence my earlier questions: do we really need this, and if so, does it have to be spelled *t? Neither of those questions are obviously answered with a "Yes". -- Steve From steve at pearwood.info Sat Oct 15 06:48:40 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Oct 2016 21:48:40 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <580206AC.1060203@canterbury.ac.nz> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> Message-ID: <20161015104839.GT22471@ando.pearwood.info> On Sat, Oct 15, 2016 at 11:36:28PM +1300, Greg Ewing wrote: > Indeed. In situations where there isn't any context for > the interpretation of *, it's not allowed. You mean like in list comprehensions? Are you now supporting my argument that starring the list comprehension expression isn't meaningful? Not if star is defined as sequence unpacking in the usual way. If you want to invent a new meaning for * to make this work, to join all the other special case magic meanings for the * symbol, that's another story. > For example: > > >>> x = *(1, 2, 3) > File "", line 1 > SyntaxError: can't use starred expression here > > But > > >>> x = 1, *(2, 3) > >>> x > (1, 2, 3) > > The * is allowed there because it's already in a context > where a comma-separated list has meaning. Oh look, just like now: py> iterable = [(1, 'a'), (2, 'b')] py> [(100, *t) for t in iterable] [(100, 1, 'a'), (100, 2, 'b')] Hands up anyone who expected to flatten the iterable and get [100, 1, 'a', 100, 2, 'b'] instead? Anyone? No? Take out the (100, ...) and just leave the *t, and why should it be different? It is my position that: (quote) there isn't any context for the interpretation of * for [*t for t in iterable]. Writing that is the list comp equivalent of writing x = *t. -- Steve From mal at egenix.com Sat Oct 15 07:50:14 2016 From: mal at egenix.com (M.-A. 
Lemburg) Date: Sat, 15 Oct 2016 13:50:14 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <20161013142551.GZ22471@ando.pearwood.info> Message-ID: <580217F6.8010808@egenix.com> On 14.10.2016 10:26, Serhiy Storchaka wrote: > On 13.10.16 17:50, Chris Angelico wrote: >> Solution: Abolish most of the control characters. Let's define a brand >> new character encoding with no "alphabetical garbage". These >> characters will be sufficient for everyone: >> >> * [2] Formatting characters: space, newline. Everything else can go. >> * [8] Digits: 01234567 >> * [26] Lower case Latin letters a-z >> * [2] Vital social media characters: # (now officially called >> "HASHTAG"), @ >> * [2] Can't-type-URLs-without-them: colon, slash (now called both >> "SLASH" and "BACKSLASH") >> >> That's 40 characters that should cover all the important things anyone >> does - namely, Twitter, Facebook, and email. We don't need punctuation >> or capitalization, as they're dying arts and just make you look >> pretentious. > > https://en.wikipedia.org/wiki/DEC_Radix-50 And then we store Python identifiers in a single 64-bit word, allow at most 20 chars per identifier and use the remaining bits for cool things like type information :-) Not a bad idea, really. But then again: even microbits support Unicode these days, so apparently there isn't much need for such memory footprint optimizations anymore. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 15 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From mar77i at mar77i.ch Sat Oct 15 07:58:18 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Sat, 15 Oct 2016 13:58:18 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015104839.GT22471@ando.pearwood.info> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> Message-ID: On Sat, Oct 15, 2016 at 12:48 PM, Steven D'Aprano wrote: > Oh look, just like now: > > py> iterable = [(1, 'a'), (2, 'b')] > py> [(100, *t) for t in iterable] > [(100, 1, 'a'), (100, 2, 'b')] > > Hands up anyone who expected to flatten the iterable and get > > [100, 1, 'a', 100, 2, 'b'] > > instead? Anyone? No? > I don't know whether that should be provocating or beside the poinnt. It's probably both. You're putting two expectations on the same example: first, you make the reasonable expectation that results in [(100, 1, 'a'), (100, 2, 'b')], and then you ask whether anyone expected [100, 1, 'a', 100, 2, 'b'], but don't add or remove anything from the same example. Did you forget to put a second example using the new notation in there? 
Then you'd have to spell it out and start out with [*(100, *t) for t in iterable]. And then you can ask who expected [100, 1, 'a', 100, 2, 'b']. Which is what this thread is all about. cheers! mar77i From jcrmatos at gmail.com Sat Oct 15 08:46:56 2016 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Matos?=) Date: Sat, 15 Oct 2016 05:46:56 -0700 (PDT) Subject: [Python-ideas] Exception to the closing brace/bracket/parenthesis indentation for multi-line constructs rule in PEP8 for multi-line function definitions Message-ID: Hello, In the Code lay-out\Indentation section of PEP8 it is stated that " The closing brace/bracket/parenthesis on multi-line constructs may either line up under the first non-whitespace character of the last line of list, as in: my_list = [ 1, 2, 3, 4, 5, 6, ] result = some_function_that_takes_arguments( 'a', 'b', 'c', 'd', 'e', 'f', ) or it may be lined up under the first character of the line that starts the multi-line construct, as in: my_list = [ 1, 2, 3, 4, 5, 6, ] result = some_function_that_takes_arguments( 'a', 'b', 'c', 'd', 'e', 'f', ) " however, right before that location, there are several examples that do not comply, like these: " # Aligned with opening delimiter. foo = long_function_name(var_one, var_two, var_three, var_four) # More indentation included to distinguish this from the rest. def long_function_name( var_one, var_two, var_three, var_four): print(var_one) # Hanging indents should add a level. foo = long_function_name( var_one, var_two, var_three, var_four) " That should be corrected but it isn't the main point of this topic. Assuming that a multi-line function definition is considered a multi-line construct, I would like to propose an exception to the closing brace/bracket/parenthesis indentation for multi-line constructs rule in PEP8. I my view all multi-line function definitions should only be allowed options "usual" and "acceptable" shown below, due to better readability. I present 3 examples (usual, acceptable, horrible) where only the last 2 comply with the current existing rule: def do_something(parameter_one, parameter_two, parameter_three, parameter_four, parameter_five, parameter_six, parameter_seven, last_parameter): """Do something.""" pass def do_something(parameter_one, parameter_two, parameter_three, parameter_four, parameter_five, parameter_six, parameter_seven, last_parameter ): """Do something.""" pass def do_something(parameter_one, parameter_two, parameter_three, parameter_four, parameter_five, parameter_six, parameter_seven, last_parameter ): """Do something.""" pass The same 3 examples in the new 3.5 typing style: def do_something(parameter_one: List[str], parameter_two: List[str], parameter_three: List[str], parameter_four: List[str], parameter_five: List[str], parameter_six: List[str], parameter_seven: List[str], last_parameter: List[str]) -> bool: """Do something.""" pass def do_something(parameter_one: List[str], parameter_two: List[str], parameter_three: List[str], parameter_four: List[str], parameter_five: List[str], parameter_six: List[str], parameter_seven: List[str], last_parameter: List[str] ) -> bool: """Do something.""" pass def do_something(parameter_one: List[str], parameter_two: List[str], parameter_three: List[str], parameter_four: List[str], parameter_five: List[str], parameter_six: List[str], parameter_seven: List[str], last_parameter: List[str] ) -> bool: """Do something.""" pass Best regards, JM -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Sat Oct 15 09:06:16 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 15 Oct 2016 14:06:16 +0100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <5800810F.5080200@canterbury.ac.nz> Message-ID: On 14 October 2016 at 10:48, Paul Moore wrote: > On 14 October 2016 at 07:54, Greg Ewing wrote: >>> I think it's probably time for someone to >>> describe the precise syntax (as BNF, like the syntax in the Python >>> docs at >>> https://docs.python.org/3.6/reference/expressions.html#displays-for-lists-sets-and-dictionaries >> >> >> Replace >> >> comprehension ::= expression comp_for >> >> with >> >> comprehension ::= (expression | "*" expression) comp_for >> >>> and semantics (as an explanation of how to >>> rewrite any syntactically valid display as a loop). >> >> >> The expansion of the "*" case is the same as currently except >> that 'append' is replaced by 'extend' in a list comprehension, >> 'yield' is replaced by 'yield from' in a generator >> comprehension. [...] > So now I understand what's being proposed, which is good. I don't > (personally) find it very intuitive, although I'm completely capable > of using the rules given to establish what it means. In practical > terms, I'd be unlikely to use or recommend it - not because of > anything specific about the proposal, just because it's "confusing". I > would say the same about [(x, *y, z) for ...]. Thinking some more about this, is it not true that [ *expression for var in iterable ] is the same as [ x for var in iterable for x in expression ] ? If so, then this proposal adds no new expressiveness, merely a certain amount of "compactness". Which isn't necessarily a bad thing, but it's clearly controversial whether the compact version is more readable / "intuitive" in this case. Given the lack of any clear improvement, I'd be inclined to think that "explicit is better than implicit" applies here, and reject the new proposal. Paul. From mikhailwas at gmail.com Sat Oct 15 09:06:48 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Sat, 15 Oct 2016 15:06:48 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <5800A723.9050806@canterbury.ac.nz> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> Message-ID: On 14 October 2016 at 11:36, Greg Ewing wrote: >but bash wasn't designed for that. >(The fact that some people use it that way says more >about their dogged persistence in the face of >adversity than it does about bash.) I can not judge what bash is good for, since I never tried to learn it. But it *looks* indeed frightening. First feeling is OMG, I must close this and never see again. Also I can only hard imagine that special purpose of some language can ignore readability, even if it is assembler or whatever, it can be made readable without much effort. So I just look for some other solution for same task, let it be 10 times more code. > So for that > person, using decimal would make the code *harder* > to maintain. > To a maintainer who doesn't have that familiarity, > it makes no difference either way. 
That is because that person from beginning (blindly) follows the convention. So my intention of course was not to find out if the majority does or not, but rather which one of two makes more sence *initially*, just trying to imagine that we can decide. To be more precise, if you were to choose between two options: 1. use hex for the glyph index and use hex for numbers (e.g. some arbitrary value like screen coordinates) 2. use decimal for both cases. I personally choose option 2. Probably nothing will convince me that option 1. will be better, all the more I don't believe that anything more than base-8 makes much sense for readable numbers. Just little bit dissapointed that others again and again speak of convention. >I just >don't see this as being anywhere near being a >significant problem. I didn't mean that, it is just slightly annoys me. >> In standard ASCII >> there are enough glyphs that would work way better >> together, >Out of curiosity, what glyphs do you have in mind? If I were to decide, I would look into few options here: 1. Easy option which would raise less further questions is to take 16 first lowercase letters. 2. Better option would be to choose letters and possibly other glyphs to build up a more readable set. E.g. drop "c" letter and leave "e" due to their optical collision, drop some other weak glyphs, like "l" "h". That is of course would raise many further questions, like why you do you drop this glyph and not this and so on so it will surely end up in quarrel. Here lies another problem - non-constant width of letters, but this is more the problem of fonts and rendering, so adresses IDE and editors problematics. But as said I won't recommend base 16 at all. >> ??-? ---- ---- ---? >> >> you can downscale the strings, so a 16-bit >> value would be ~60 pixels wide > Yes, you can make the characters narrow enough that > you can take 4 of them in at once, almost as though > they were a single glyph... at which point you've > effectively just substituted one set of 16 glyphs No no. I didn't mean to shrink them till they melt together. The structure is still there, only that with such notation you don't need to keep the glyph so big as with many-glyph systems. >for another. Then you'd have to analyse whether the >*combined* 4-element glyphs were easier to disinguish >from each other than the ones they replaced. Since >the new ones are made up of repetitions of just two >elements, whereas the old ones contain a much more >varied set of elements, I'd be skeptical about that. I get your idea and this a very good point. Seems you have experience in such things? Currently I don't know for sure if such approach more effective or less than others and for what case. But I can bravely claim that it is better than *any* hex notation, it just follows from what I have here on paper on my table, namely that it is physically impossible to make up highly effective glyph system of more than 8 symbols. You want more only if really *need* more glyphs. And skepticism should always be present. One thing however especially interests me, here not only the differentiation of glyph comes in play, but also positional principle which helps to compare and it can be beneficaial for specific cases. So you can clearly see if one number is two times bigger than other for example. And of course, strictly speaking those bit groups are not glyphs, you can call them of course so, but this is just rhetorics. So one could call all english written words also glyphs but they are not really. 
But I get your analogy, this is how the tests should be made. >BTW, your choice of ? because of its "peak readibility" >seems to be a case of taking something out of context. >The readability of a glyph can only be judged in terms >of how easy it is to distinguish from other glyphs. True and false. Each single taken glyph has a specific structure and put alone it has optical qualities. This is somewhat quite complicated and hardly describable by words, but anyway, only tests can tell what is better. In this case it is still 2 glyphs or better say one and a half glyph. And indeed you can distinguish them really good since they have different mass. > Here, the only thing that matters is distinguishing it > from the other symbol, so something like "|" would > perhaps be a better choice. > ||-| ---- ---- ---| I can get your idea, although not really correct statement, see above. A vertical stab is hardly a good glyph, actually quite a bad one. Such notation will cause quite uncomfortable effect on eyes, and there are many things here. Less technically, here is a rule: - a good glyph has a structure, and the boundary of the glyph is a proportional form (like a bulb) (not your case) - vertical gaps/sheers inside these boundaries are bad (your case). One can't always do without them, but vertical ones are much worse than horizontal. - too primitive glyph structure is bad (your case) So a stab is good only as some punctuation sign. For this exact reason such letters, as "l", "L", "i" are bad ones, especially their sans-serif variants. And *not* in the first place because they collide with other glyphs. This is somewhat non obvious. One should understand of course that I just took the standard symbols that only try to mimic the correct representation. So if sometime you will play around with bitstrings, here are the ASCII-only variants which are best working: -y-y ---y -yy- -y-- -o-o ---o -oo- -o-- -k-k ---k -kk- -k-- -s-s ---s -ss- -s-- No need to say that these will be way, way better than "01" notation which is used as standard. If you read a lot numbers you should have noticed how unpleasant is to scan through 010101 > What I'm far from convinced of is that I would gain any > benefit from making that effort, or that a fresh person > would be noticeably better off if they learned your new > system instead of the old one. "far from convinced" sounds quite positive however :) it is never too late. I heard from Captain Crunch https://en.wikipedia.org/wiki/John_Draper That he was so tired of C syntax that he finally switched to Python for some projects. I can imagine how unwanted this can be in age. All depends on tasks that one often does. If say, imagine you'll read binaries for a long time, in one of notations I proposed above (like "-y-- --y-" for example) and after that try to switch back to "0100 0010" notation, I bet you will realize that better. Indeed learning new notation for numbers is quite easy, it is only some practice. And with base-2 you don't need learn at all, just can switch to other notation and use straight away. >> It is not about speed, it is about brain load. >> Chinese can read their hieroglyphs fast, but >> the cognition load on the brain is 100 times higher >> than current latin set. >Has that been measured? How? I don't think it is measurable at all. That is my opinion, and 100 just shows that I think it is very stressfull, also due to lot of meaning disambiguation that such system can cause. 
I also heard pesonal complains from chinese young students, they all had problems with vision already in early years, but I cannot support it oficially. So just imagine: if take for truth, max number of effective glyphs is 8. and hieroglyphs are *all* printed in same sized box! how would this provide efficient reading, and if you've seen chinese books, they all printed with quite small font. I am not very sentimental person but somehow feel sorry for people, one doesn't deserve it. You know, I become friends with one chinese girl, she loves to read and eager to learn and need to always carry pair of goglles with her everywhere. Somehow sad I become now writing it, she is so sweet young girl... And yes in this sence one can say that this cognition load can be measured. You go to universities in China and count those with vision problems. >I don't doubt that some sets of glyphs are easier to >distinguish from each other than others. But the That sounds good, this is not so often that one realizes that :) Most people would say "it's just matter of habit" >letters and digits that we currently use have already >been pretty well optimised by scribes and typographers >over the last few hundred years, and I'd be surprised >if there's any *major* room left for improvement. Here I would slightly disagree First, *Digits* are not optimised for anything, they are are just a heritage from ancient time. They have some minimal readability, namely "2" is not bad, others are quite poor. Second, *small latin letters* are indeed well fabricated. However don't have an illusion that someone cared much about their optimisation in last 1000 years. If you are skeptical about that, take a look at this http://daten.digitale-sammlungen.de/~db/bsb00003258/images/index.html?seite=320 If believe (there are skeptics who do not believe) that this dates back end of 10th century, so we have an interesting picture here, You see that this is indeed very similar to what you read now, somewhat optimised of course, but without much improvements. Actually in some cases there is even some degradation: now we have "pbqd" letters, which are just rotation and reflection of each other, which is no good. Strictly speaking you can use only one of these 4 glyphs. And in last 500 hundred years there was zero modifications. How much improvent can be made is hard question. According to my results, indeed the peak readability forms are similar to certain small latin letters, But I would say quite significant improvement could be made. But this is not really measurable. Mikhail From rosuav at gmail.com Sat Oct 15 10:27:28 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 16 Oct 2016 01:27:28 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> Message-ID: On Sun, Oct 16, 2016 at 12:06 AM, Mikhail V wrote: > But I can bravely claim that it is better than *any* > hex notation, it just follows from what I have here > on paper on my table, namely that it is physically > impossible to make up highly effective glyph system > of more than 8 symbols. You should go and hang out with jmf. Both of you have made bold assertions that our current system is physically/mathematically impossible, despite the fact that *it is working*. Neither of you can cite any actual scientific research to back your claims. Bye bye. 
ChrisA From mar77i at mar77i.ch Sat Oct 15 10:47:35 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Sat, 15 Oct 2016 16:47:35 +0200 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <1476387971.2836793.755191849.2DDCEFE1@webmail.messagingengine.com> <1476388317.2839650.755221057.2AE6967D@webmail.messagingengine.com> <5800810F.5080200@canterbury.ac.nz> Message-ID: On Sat, Oct 15, 2016 at 3:06 PM, Paul Moore wrote: > is the same as > > [ x for var in iterable for x in expression ] > correction, that would be: [var for expression in iterable for var in expression] you are right, though. List comprehensions are already stackable. TIL. cheers! mar77i From random832 at fastmail.com Sat Oct 15 13:17:21 2016 From: random832 at fastmail.com (Random832) Date: Sat, 15 Oct 2016 13:17:21 -0400 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015103815.GS22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> Message-ID: <1476551841.534485.756969481.7EAD0A55@webmail.messagingengine.com> On Sat, Oct 15, 2016, at 06:38, Steven D'Aprano wrote: > > Replacing it _with the items_ is not the same thing as replacing it > > _with a sequence containing the items_, > > I don't think I ever used the phrasing "a sequence containing the > items". I think that's *your* phrase, not mine. It's not your phrasing, it's the actual semantic content of your claim that it would have to wrap them in a tuple. > The core developers have made it absolutely clear that changing the > fundamental equivalence of list comps as syntactic sugar for: > > result = [] > for t in iterable: > result.append(t) > > > is NOT NEGOTIABLE. I've never heard of this. It certainly never came up in this discussion. And it was negotiable just fine when they got rid of the leaked loop variable. > (That is much to my disappointment -- I would love to > introduce a "while" version of list comps to match the "if" version, but > that's not an option.) > > So regardless of whether it is a fiction or an absolute loop, Python's > list comprehensions are categorically limited to behave equivalently to > the loop above (modulo scope, temporary variables, etc). See, there it is. Why are *those* things that are allowed to be differences, but this (which could be imagined as "result += [t]" if you _really_ need a single statement where the left-hand clause is substituted in, or otherwise) is not? 
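(To make the substitution concrete -- a quick sketch with a made-up input, nothing more:

    iterable = [(1, 'a'), (2, 'b')]

    result = []
    for t in iterable:
        result += [t]       # same effect as result.append(t): [(1, 'a'), (2, 'b')]

    result = []
    for t in iterable:
        result += [*t]      # what [*t for t in iterable] would mean: [1, 'a', 2, 'b']

The second loop is literally the first with the comprehension's element expression dropped into the list on the right-hand side; no new kind of transformation is needed.)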
From elazarg at gmail.com Sat Oct 15 13:33:27 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Sat, 15 Oct 2016 17:33:27 +0000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015103815.GS22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> Message-ID: On Sat, Oct 15, 2016 at 1:49 PM Steven D'Aprano wrote: ... > And the transformation of *t for the items of t (I don't care if it is a > real transformation in the implementation, or only a fictional > transformation) cannot work in a list comp. Let's make the number of > items of t explicit so we don't have to worry about variable item > counts: > > [*t for t in iterable] # t has three items > [a, b, c for (a, b, c) in iterable] > > > That's a syntax error. To avoid the syntax error, we need parentheses: > > [(a, b, c) for (a, b, c) in iterable] > > and that's a no-op. You are confusing here two distinct roles of the parenthesis: disambiguation as in "(1 + 2) * 2", and tuple construction as in (1, 2, 3). This overload is the reason that (1) is not a 1-tuple and we must write (1,). You may argue that this overloading causes confusion and make this construct hard to understand, but please be explicit about that; even if <1, 2,3 > was the syntax for tuples, the expansion was still [(a, b, c) for (a, b, c) in iterable] Since no tuple is constructed here. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Oct 15 13:36:05 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 16 Oct 2016 04:36:05 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> Message-ID: On Sun, Oct 16, 2016 at 4:33 AM, ????? wrote: > You are confusing here two distinct roles of the parenthesis: disambiguation > as in "(1 + 2) * 2", and tuple construction as in (1, 2, 3). This overload > is the reason that (1) is not a 1-tuple and we must write (1,). Parentheses do not a tuple make. Commas do. 1, 2, 3, # three-element tuple 1, 2, # two-element tuple 1, # one-element tuple The only time that a tuple requires parens is when it's the empty tuple, (). 
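A quick interpreter check, in case anyone wants it spelled out (plain CPython 3 here):

    >>> t = 1, 2, 3
    >>> type(t)
    <class 'tuple'>
    >>> (1) == 1        # parentheses alone: just a parenthesized int
    True
    >>> (1,)            # the trailing comma is what makes the 1-tuple
    (1,)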
ChrisA From mistersheik at gmail.com Sat Oct 15 13:38:15 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 15 Oct 2016 17:38:15 +0000 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015085337.GQ22471@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> <20161015085337.GQ22471@ando.pearwood.info> Message-ID: On Sat, Oct 15, 2016 at 5:01 AM Steven D'Aprano wrote: > On Thu, Oct 13, 2016 at 01:30:45PM -0700, Neil Girdhar wrote: > > > From a CPython implementation standpoint, we specifically blocked this > code > > path, and it is only a matter of unblocking it if we want to support > this. > > I find that difficult to believe. The suggested change seems like it > should be much bigger than just removing a block. Can you point us to > the relevant code? > > The Grammar specifies: dictorsetmaker: ( ((test ':' test | '**' expr) (comp_for | (',' (test ':' test | '**' expr))* [','])) | ((test | star_expr) (comp_for | (',' (test | star_expr))* [','])) ) In ast.c, you can find: if (is_dict) { ast_error(c, n, "dict unpacking cannot be used in " "dict comprehension"); return NULL; } res = ast_for_dictcomp(c, ch); and ast_for_dictcomp supports dict unpacking. Similarly: if (elt->kind == Starred_kind) { ast_error(c, ch, "iterable unpacking cannot be used in comprehension"); return NULL; } comps = ast_for_comprehension(c, CHILD(n, 1)); and ast_for_comprehensions supports iterable unpacking. In any case, it isn't really the difficulty of implementation that is > being questioned. Many things are easy to implement, but we still > don't do them. If it doesn't matter, why bring it up? > The real questions here are: > > (1) Should we overload list comprehensions as sugar for a flatten() > function? > > (2) If so, should we spell that [*t for t in iterable]? > > > Actually the answer to (1) should be "we already do". We just spell it: > > [x for t in iterable for x in t] > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/ROYNN7a5VAc/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Sat Oct 15 13:38:32 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Sat, 15 Oct 2016 17:38:32 +0000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> Message-ID: On Sat, Oct 15, 2016 at 8:36 PM Chris Angelico wrote: > On Sun, Oct 16, 2016 at 4:33 AM, ????? 
wrote: > > You are confusing here two distinct roles of the parenthesis: > disambiguation > > as in "(1 + 2) * 2", and tuple construction as in (1, 2, 3). This > overload > > is the reason that (1) is not a 1-tuple and we must write (1,). > > Parentheses do not a tuple make. Commas do. > > 1, 2, 3, # three-element tuple > 1, 2, # two-element tuple > 1, # one-element tuple > > And what [1, 2, 3] means? It's very different from [(1,2,3)]. Python explicitly allow 1, 2, 3 to mean tuple in certain contexts, I agree. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Oct 15 13:44:49 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 16 Oct 2016 04:44:49 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> Message-ID: On Sun, Oct 16, 2016 at 4:38 AM, ????? wrote: > On Sat, Oct 15, 2016 at 8:36 PM Chris Angelico wrote: >> >> On Sun, Oct 16, 2016 at 4:33 AM, ????? wrote: >> > You are confusing here two distinct roles of the parenthesis: >> > disambiguation >> > as in "(1 + 2) * 2", and tuple construction as in (1, 2, 3). This >> > overload >> > is the reason that (1) is not a 1-tuple and we must write (1,). >> >> Parentheses do not a tuple make. Commas do. >> >> 1, 2, 3, # three-element tuple >> 1, 2, # two-element tuple >> 1, # one-element tuple >> > And what [1, 2, 3] means? It's very different from [(1,2,3)]. > > Python explicitly allow 1, 2, 3 to mean tuple in certain contexts, I agree. > Square brackets create a list. I'm not sure what you're not understanding, here. The comma does have other meanings in other contexts (list/dict/set display, function parameters), but outside of those, it means "create tuple". ChrisA From elazarg at gmail.com Sat Oct 15 13:48:44 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Sat, 15 Oct 2016 17:48:44 +0000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> Message-ID: On Sat, Oct 15, 2016 at 8:45 PM Chris Angelico wrote: > On Sun, Oct 16, 2016 at 4:38 AM, ????? wrote: > > On Sat, Oct 15, 2016 at 8:36 PM Chris Angelico wrote: > >> > >> On Sun, Oct 16, 2016 at 4:33 AM, ????? wrote: > >> > You are confusing here two distinct roles of the parenthesis: > >> > disambiguation > >> > as in "(1 + 2) * 2", and tuple construction as in (1, 2, 3). This > >> > overload > >> > is the reason that (1) is not a 1-tuple and we must write (1,). > >> > >> Parentheses do not a tuple make. Commas do. > >> > >> 1, 2, 3, # three-element tuple > >> 1, 2, # two-element tuple > >> 1, # one-element tuple > >> > > And what [1, 2, 3] means? It's very different from [(1,2,3)]. 
> > > > Python explicitly allow 1, 2, 3 to mean tuple in certain contexts, I > agree. > > > > Square brackets create a list. I'm not sure what you're not understanding, > here. > > The comma does have other meanings in other contexts (list/dict/set > display, function parameters), but outside of those, it means "create > tuple". > On a second thought you may decide whether the rule is tuple and there are exceptions, or the other way around. The point was, conceptual expansion does not "fail" just because there is an overloading in the meaning of the tokens ( and ). It might make it harder to understand, though. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Sat Oct 15 14:15:18 2016 From: prometheus235 at gmail.com (Nick Timkovich) Date: Sat, 15 Oct 2016 13:15:18 -0500 Subject: [Python-ideas] Heap data type, the revival Message-ID: I once again had a use for heaps, and after rewrapping the heapq.heap* methods for the umpteenth time, figured I'd try my hand at freezing off that wart. Some research turned up an older thread by Facundo Batista (https://mail.python.org/pipermail/python-ideas/2009-April/004173.html), but it seems like interest petered out. I shoved an initial pass at a spec, implementation, and tests (robbed from /Lib/test/test_heapq.py mostly) into a repo at https://github.com/nicktimko/heapo My spec is basically: 1. Provide all existing heapq.heap* functions provided by the heapq module as methods with identical semantics 2. Provide limited magic methods to the underlying heap structure a. __len__ to see how big it is, also for boolean'ing b. __iter__ to allow reading out to something else (doesn't consume elements) 3. Add peek method to show, but not consume, lowest heap value 4. Allow custom comparison/key operation (to be implemented/copy-pasted) Open Questions * Should __init__ shallow-copy the list or leave that up to the caller? Less memory if the heap object just co-opts it, but user might accidentally reuse the reference and ruin the heap. If we make our own list then it's easier to just suck in any arbitrary iterable. * How much should the underlying list be exposed? Is there a use case for __setitem__, __delitem__? * Should there be a method to alter the priority of elements while preserving the heap invariant? Daniel Stutzbach mentioned dynamically increasing/decreasing priority of some list elements...but I'm inclined to let that be a later addition. * Add some iterable method to consume the heap in an ordered fashion? Cheers, Nick From songofacandy at gmail.com Sat Oct 15 14:36:46 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 16 Oct 2016 03:36:46 +0900 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: > > Are there precedences of combining verbose and version options in other > programs? > No, I was just afraid about other programs rely on format of python -V. > PyPy just outputs sys.version for the --version option. > > $ pypy -V > Python 2.7.10 (5.4.1+dfsg-1~ppa1~ubuntu16.04, Sep 06 2016, 23:11:39) > [PyPy 5.4.1 with GCC 5.4.0 20160609] > > I think it would not be large breakage if new releases of CPython become > outputting extended version information by default. > I like it if it's OK. Does anyone against this? From srkunze at mail.de Sat Oct 15 16:26:44 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Sat, 15 Oct 2016 22:26:44 +0200 Subject: [Python-ideas] Heap data type, the revival In-Reply-To: References: Message-ID: <597ec8cb-68a9-17eb-4662-a38865b41b24@mail.de> On 15.10.2016 20:15, Nick Timkovich wrote: > I once again had a use for heaps, and after rewrapping the heapq.heap* > methods for the umpteenth time, figured I'd try my hand at freezing > off that wart. We re-discussed this in the beginning of 2016 and xheap https://pypi.python.org/pypi/xheap was one outcome. ;) In the course of doing so, some performance improvements were also discovered and some peculiarities of Python lists are discussed. See here https://mail.python.org/pipermail/python-list/2016-January/702568.html and here https://mail.python.org/pipermail/python-list/2016-March/704339.html Cheers, Sven From mikhailwas at gmail.com Sat Oct 15 17:05:19 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Sat, 15 Oct 2016 23:05:19 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> Message-ID: On 15 October 2016 at 16:27, Chris Angelico wrote: > On Sun, Oct 16, 2016 at 12:06 AM, Mikhail V wrote: >> But I can bravely claim that it is better than *any* >> hex notation, it just follows from what I have here >> on paper on my table, namely that it is physically >> impossible to make up highly effective glyph system >> of more than 8 symbols. > > You should go and hang out with jmf. Both of you have made bold > assertions that our current system is physically/mathematically Who is jmf? > impossible, despite the fact that *it is working*. Neither of you can > cite any actual scientific research to back your claims. No, please don't ask that. I have enough work in real life. And you tend to understand too literally my words here. Mikhail From prometheus235 at gmail.com Sat Oct 15 17:19:10 2016 From: prometheus235 at gmail.com (Nick Timkovich) Date: Sat, 15 Oct 2016 16:19:10 -0500 Subject: [Python-ideas] Heap data type, the revival In-Reply-To: <597ec8cb-68a9-17eb-4662-a38865b41b24@mail.de> References: <597ec8cb-68a9-17eb-4662-a38865b41b24@mail.de> Message-ID: Features and speed are good, but I'm interested in getting something with the basic features into the Standard Library so it's just there. Not having done that before and bit clueless, I'm wanting to learn that slightly less-technical procedure. What are the steps to make that happen? On Sat, Oct 15, 2016 at 3:26 PM, Sven R. Kunze wrote: > On 15.10.2016 20:15, Nick Timkovich wrote: >> >> I once again had a use for heaps, and after rewrapping the heapq.heap* >> methods for the umpteenth time, figured I'd try my hand at freezing >> off that wart. > > > We re-discussed this in the beginning of 2016 and xheap > https://pypi.python.org/pypi/xheap was one outcome. ;) In the course of > doing so, some performance improvements were also discovered and some > peculiarities of Python lists are discussed. 
> > See here > https://mail.python.org/pipermail/python-list/2016-January/702568.html > and here > https://mail.python.org/pipermail/python-list/2016-March/704339.html > > Cheers, > Sven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From srkunze at mail.de Sat Oct 15 17:21:54 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 15 Oct 2016 23:21:54 +0200 Subject: [Python-ideas] Heap data type, the revival In-Reply-To: References: <597ec8cb-68a9-17eb-4662-a38865b41b24@mail.de> Message-ID: <6be4e716-ec48-d025-c9ab-33383cd8ae10@mail.de> On 15.10.2016 23:19, Nick Timkovich wrote: > Features and speed are good, but I'm interested in getting something > with the basic features into the Standard Library so it's just there. > Not having done that before and bit clueless, I'm wanting to learn > that slightly less-technical procedure. What are the steps to make > that happen? As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower than the original it won't happen".
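(For context, the kind of thin wrapper being discussed is roughly the following -- a minimal sketch only, with made-up names, not an API anyone has agreed on:

    import heapq

    class Heap:
        """Thin object wrapper around the heapq functions (sketch only)."""

        def __init__(self, iterable=()):
            self._items = list(iterable)   # shallow copy; one of Nick's open questions
            heapq.heapify(self._items)

        def push(self, item):
            heapq.heappush(self._items, item)

        def pop(self):
            return heapq.heappop(self._items)

        def peek(self):
            return self._items[0]          # smallest item, not consumed

        def __len__(self):
            return len(self._items)

        def __iter__(self):
            return iter(self._items)       # internal heap order; does not consume

Whether that buys enough over calling heapq directly to justify the indirection is exactly the point of contention.)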
Cheers, Sven From greg.ewing at canterbury.ac.nz Sat Oct 15 19:48:36 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 16 Oct 2016 12:48:36 +1300 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161015104839.GT22471@ando.pearwood.info> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> Message-ID: <5802C054.5020103@canterbury.ac.nz> Steven D'Aprano wrote: > Are you now supporting my argument that starring the list comprehension > expression isn't meaningful? The context it's in (a form of list display) has a clear meaning for a comma-separated list of values, so there is a reasonable interpretation that it *could* be given. > py> iterable = [(1, 'a'), (2, 'b')] > py> [(100, *t) for t in iterable] > [(100, 1, 'a'), (100, 2, 'b')] The * there is in the context of constructing a tuple, not the list into which the tuple is placed. The difference is the same as the difference between these: >>> t = (10, 20) >>> [1, (2, *t), 3] [1, (2, 10, 20), 3] >>> [1, 2, *t, 3] [1, 2, 10, 20, 3] -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 15 20:58:08 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 16 Oct 2016 13:58:08 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> Message-ID: <5802D0A0.1040401@canterbury.ac.nz> Mikhail V wrote: > Also I can only hard imagine that special purpose > of some language can ignore readability, Readability is not something absolute that stands on its own. It depends a great deal on what is being expressed. > even if it is assembler or whatever, > it can be made readable without much effort. You seem to be focused on a very narrow aspect of readability, i.e. fine details of individual character glyphs. That's not what we mean when we talk about readability of programs. > So I just look for some other solution for same task, > let it be 10 times more code. Then it will take you 10 times longer to write, and will on average contain 10 times as many bugs. Is that really worth some small, probably mostly theoretical advantage at the typographical level? > That is because that person from beginning > (blindly) follows the convention. What you seem to be missing is that there are *reasons* for those conventions. They were not arbitrary choices. Ultimately they can be traced back to the fact that our computers are built from two-state electronic devices. That's definitely not an arbitrary choice -- there are excellent physical reasons for it. Base 10, on the other hand, *is* an arbitrary choice. Due to an accident of evolution, we ended up with 10 easily accessible appendages for counting on, and that made its way into the counting system that is currently the most widely used by everyday people. So, if anything, *you're* the one who is "blindly following tradition" by wanting to use base 10. > 2. Better option would be to choose letters and > possibly other glyphs to build up a more readable > set. E.g. drop "c" letter and leave "e" due to > their optical collision, drop some other weak glyphs, > like "l" "h". 
That is of course would raise > many further questions, like why you do you drop this > glyph and not this and so on so it will surely end up in quarrel. Well, that's the thing. If there were large, objective, easily measurable differences between different possible sets of glyphs, then there would be no room for such arguments. The fact that you anticipate such arguments suggests that any differences are likely to be small, hard to measure and/or subjective. > But I can bravely claim that it is better than *any* > hex notation, it just follows from what I have here > on paper on my table, I think "on paper" is the important thing here. I suspect you are looking at the published results from some study or other and greatly overestimating the size of the effects compared to other considerations. -- Greg From brenbarn at brenbarn.net Sat Oct 15 21:07:44 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Sat, 15 Oct 2016 18:07:44 -0700 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FEEEE3.7050109@brenbarn.net> Message-ID: <5802D2E0.8000303@brenbarn.net> On 2016-10-12 22:46, Mikhail V wrote: > For numbers obviously you don't need so many character as for > speech encoding, so this means that only those glyphs or even a subset > of it should be used. This means anything more than 8 characters > is quite worthless for reading numbers. > Note that I can't provide here the works currently > so don't ask me for that. Some of them would be probably > available in near future. It's pretty clear to me by this point that your argument has no rational basis, so I'm regarding this thread as a dead end. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From steve at pearwood.info Sat Oct 15 21:05:57 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 16 Oct 2016 12:05:57 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <5802C054.5020103@canterbury.ac.nz> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> Message-ID: <20161016010552.GU22471@ando.pearwood.info> On Sun, Oct 16, 2016 at 12:48:36PM +1300, Greg Ewing wrote: > Steven D'Aprano wrote: > >Are you now supporting my argument that starring the list comprehension > >expression isn't meaningful? > > The context it's in (a form of list display) has a clear > meaning for a comma-separated list of values, so there > is a reasonable interpretation that it *could* be given. This thread is a huge, multi-day proof that people do not agree that this is a "reasonable" interpretation. > >py> iterable = [(1, 'a'), (2, 'b')] > >py> [(100, *t) for t in iterable] > >[(100, 1, 'a'), (100, 2, 'b')] > > The * there is in the context of constructing a tuple, > not the list into which the tuple is placed. Right: the context of the star is meaningful. We all agree that *t in a list display [a, b, c, ...] is meaningful; same for tuples; same for function calls; same for sequence unpacking for assignment. What is not meaningful (except as a Perlish line-noise special case to be memorised) is *t as the list comprehension expression. 
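(Current CPython agrees, for what it's worth -- roughly this, on the 3.5 I have here:

    >>> [*t for t in [(1, 'a'), (2, 'b')]]
      File "<stdin>", line 1
    SyntaxError: iterable unpacking cannot be used in comprehension
)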
I've never disputed that we could *assert* that *t in a list comp means "flatten". We could assert that it means anything we like. But it doesn't follow from the usual meaning of sequence unpacking anywhere else -- that's why it is currently a SyntaxError, and that's why people reacted with surprise at the OP who assumed that *t would magically flatten his iterable. Why would you assume that? It makes no sense to me -- that's not how sequence unpacking works in any other context, it isn't how list comprehensions work. Right from the beginning I called this "wishful thinking", and *nothing* since then has changed my mind. This proposal only makes even a little bit of sense if you imagine list comprehensions [*t for a in it1 for b in it2 for c in it3 ... for t in itN] completely unrolled into a list display: [*t, *t, *t, *t, ... ] but who does that? Why would you reason about your list comps like that? If you think about list comps as we're expected to think of them -- as list builders equivalent to a for-loop -- the use of *t there is invalid. Hence it is a SyntaxError. You want a second way to flatten your iterables? A cryptic, mysterious, Perlish line-noise way? Okay, fine, but don't pretend it is sequence unpacking -- in the context of a list comprehension, sequence unpacking doesn't make sense, it is invalid. Call it something else: the new "flatten" operator: [^t for t in iterable] for example, which magically adds an second invisible for-loop to your list comps: # expands to for t in iterable: for x in t: result.append(x) Because as your own email inadvertently reinforces, if sequence unpacking made sense in the context of a list comprehension, it would already be allowed rather than a SyntaxError: it is intentionally prohibited because it doesn't make sense in the context of list comps. -- Steve From steve at pearwood.info Sat Oct 15 21:10:01 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 16 Oct 2016 12:10:01 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> Message-ID: <20161016011001.GV22471@ando.pearwood.info> On Sun, Oct 16, 2016 at 04:36:05AM +1100, Chris Angelico wrote: > On Sun, Oct 16, 2016 at 4:33 AM, ????? wrote: > > You are confusing here two distinct roles of the parenthesis: disambiguation > > as in "(1 + 2) * 2", and tuple construction as in (1, 2, 3). This overload > > is the reason that (1) is not a 1-tuple and we must write (1,). > > Parentheses do not a tuple make. Commas do. > > 1, 2, 3, # three-element tuple > 1, 2, # two-element tuple > 1, # one-element tuple > > The only time that a tuple requires parens is when it's the empty tuple, (). Or to disambiguate a tuple from some other comma-separated syntax. Hence why you need the parens here: [(b, a) for a,b in sequence] -- Steve From steve.dower at python.org Sat Oct 15 22:10:51 2016 From: steve.dower at python.org (Steve Dower) Date: Sat, 15 Oct 2016 19:10:51 -0700 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: FWIW, Python 3.6 should print this in the console just fine. 
Feel free to upgrade whenever you're ready. Cheers, Steve -----Original Message----- From: "Mikhail V" Sent: ?10/?12/?2016 16:07 To: "M.-A. Lemburg" Cc: "python-ideas at python.org" Subject: Re: [Python-ideas] Proposal for default character representation Forgot to reply to all, duping my mesage... On 12 October 2016 at 23:48, M.-A. Lemburg wrote: > Hmm, in Python3, I get: > >>>> s = "???.txt" >>>> s > '???.txt' I posted output with Python2 and Windows 7 BTW , In Windows 10 'print' won't work in cmd console at all by default with unicode but thats another story, let us not go into that. I think you get my idea right, it is not only about printing. > The hex notation for \uXXXX is a standard also used in many other > programming languages, it's also easier to parse, so I don't > think we should change this default. In programming literature it is used often, but let me point out that decimal is THE standard and is much much better standard in sence of readability. And there is no solid reason to use 2 standards at the same time. > > Take e.g. > >>>> s = "\u123456" >>>> s > '?56' > > With decimal notation, it's not clear where to end parsing > the digit notation. How it is not clear if the digit amount is fixed? Not very clear what did you mean. _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Oct 15 22:35:55 2016 From: mertz at gnosis.cx (David Mertz) Date: Sat, 15 Oct 2016 19:35:55 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> Message-ID: On Oct 15, 2016 6:42 PM, "Steven D'Aprano" wrote: > doesn't make sense, it is invalid. Call it something else: the new > "flatten" operator: > > [^t for t in iterable] > > for example, which magically adds an second invisible for-loop to your list comps: This thread is a lot of work to try to save 8 characters in the spelling of `flatten(it)`. Let's just use the obvious and intuitive spelling. We really don't need to be Perl. Folks who want to write Perl have a perfectly good interpreter available already. The recipes in itertools give a nice implementation: def flatten(listOfLists): "Flatten one level of nesting" return chain.from_iterable(listOfLists) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Sat Oct 15 22:55:52 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 16 Oct 2016 13:55:52 +1100 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161016011001.GV22471@ando.pearwood.info> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> <20161016011001.GV22471@ando.pearwood.info> Message-ID: On Sun, Oct 16, 2016 at 12:10 PM, Steven D'Aprano wrote: > On Sun, Oct 16, 2016 at 04:36:05AM +1100, Chris Angelico wrote: >> On Sun, Oct 16, 2016 at 4:33 AM, ????? wrote: >> > You are confusing here two distinct roles of the parenthesis: disambiguation >> > as in "(1 + 2) * 2", and tuple construction as in (1, 2, 3). This overload >> > is the reason that (1) is not a 1-tuple and we must write (1,). >> >> Parentheses do not a tuple make. Commas do. >> >> 1, 2, 3, # three-element tuple >> 1, 2, # two-element tuple >> 1, # one-element tuple >> >> The only time that a tuple requires parens is when it's the empty tuple, (). > > Or to disambiguate a tuple from some other comma-separated syntax. Hence > why you need the parens here: > > [(b, a) for a,b in sequence] Yes, in the same way that other operators can need to be disambiguated. You need to say (1).bit_length() because otherwise "1." will be misparsed. You need parens to say x = (yield 5) + 2, else it'd yield 7. But that's not because a tuple fundamentally needs parentheses. ChrisA From alireza.rafiei94 at gmail.com Sun Oct 16 00:29:46 2016 From: alireza.rafiei94 at gmail.com (Alireza Rafiei) Date: Sat, 15 Oct 2016 21:29:46 -0700 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed Message-ID: Hi all, I have a list called count_list which contains tuples like below: [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4), > ('london', 2), ('falling', 4), ('my', 1)] I want to sort it based on the second parameter in descending order and the tuples with the same second parameter must be sorted based on their first parameter, in alphabetically ascending order. So the ideal output is: [('down', 4), ('falling', 4), ('bridge', 2), ('is', 2), ('london', 2), > ('fair', 1), ('lady', 1), ('my', 1)] What I ended up doing is: count_list = sorted(count_list, > key=lambda x: (x[1], map(lambda x: -x, map(ord, > x[0]))), > reverse=True) which works. Now my solution is very specific to structures like [(str, int)] where all strs are lower case and besides ord makes it to be both limited in use and also more difficult to add extra levels of sorting. The main problem is that reverse argument takes only a boolean and applies to the whole list after sorting in finished. I couldn't think of any other way (besides mapping negative to ord values of x[0]) to say reverse on the first level but not reverse on the second level. Something like below would be ideal: count_list = sorted(count_list, > key=lambda x: (x[1], x[0]), > reverse=(True, False)) Does such a way similar to above exist? If not, how useful would it be to implement it? *P.S.* It's my first time on a mailing list. I apologize before hand if such a thing has already been discussed or even there exist a way which already achieves that. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sun Oct 16 00:44:51 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 16 Oct 2016 17:44:51 +1300 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161016010552.GU22471@ando.pearwood.info> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> Message-ID: <580305C3.7000009@canterbury.ac.nz> Steven D'Aprano wrote: > This thread is a huge, multi-day proof that people do not agree that > this is a "reasonable" interpretation. So far I've seen one very vocal person who disgrees, and maybe one other who isn't sure. > This proposal only makes even a little > bit of sense if you imagine list comprehensions > > [*t for a in it1 for b in it2 for c in it3 ... for t in itN] > > completely unrolled into a list display: > > [*t, *t, *t, *t, ... ] > > but who does that? Why would you reason about your list comps like that? Many people do, and it's a perfectly valid way to think about them. They're meant to admit a declarative reading; that's the reason they exist in the first place. The expansion in terms of for-loops and appends is just *one* way to describe the current semantics. It's not written on stone tablets brought down from a mountain. Any other way of thinking about it that gives the same result is equally valid. > magically adds an second invisible for-loop to your > list comps: You might as well say that the existing * in a list display magically inserts a for-loop into it. You can think of it that way if you want, but you don't have to. > it is intentionally > prohibited because it doesn't make sense in the context of list comps. I don't know why it's currently prohibited. You would have to ask whoever put that code in, otherwise you're just guessing about the motivation. -- Greg From rosuav at gmail.com Sun Oct 16 00:46:25 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 16 Oct 2016 15:46:25 +1100 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: On Sun, Oct 16, 2016 at 3:29 PM, Alireza Rafiei wrote: > What I ended up doing is: > >> count_list = sorted(count_list, >> key=lambda x: (x[1], map(lambda x: -x, map(ord, >> x[0]))), >> reverse=True) > > > which works. Now my solution is very specific to structures like [(str, > int)] where all strs are lower case and besides ord makes it to be both > limited in use and also more difficult to add extra levels of sorting. Interesting. Personally, I would invert this; if you're sorting by an integer and a string, negate the integer, and keep the string as-is. If that doesn't work, a custom class might help. # untested class Record: reverse = False, True, True, False, True def __init__(data): self.data = data def __lt__(self, other): for v1, v2, rev in zip(self.data, other.data, self.reverse): if v1 < v2: return rev if v2 > v1: return not rev return False This is broadly similar to how tuple.__lt__ works, allowing you to flip the logic of whichever ones you like. 
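(A caveat on that untested sketch: as posted, __init__ is missing `self`, the second `if` tests the same condition as the first, and the returned booleans are flipped. A corrected variant -- the Record name and the per-field flags below are just illustrative, adapted to the (word, count) data from the original post -- might look like:

class Record:
    reverse = (True, False)        # per field: descending, then ascending

    def __init__(self, data):
        self.data = data

    def __lt__(self, other):       # sorted()/list.sort() only need __lt__
        for v1, v2, rev in zip(self.data, other.data, self.reverse):
            if v1 != v2:
                return v1 > v2 if rev else v1 < v2
        return False

data = [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2),
        ('down', 4), ('london', 2), ('falling', 4), ('my', 1)]
# count descending first, then word ascending:
print(sorted(data, key=lambda t: Record((t[1], t[0]))))

which prints the ordering asked for.)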
ChrisA From rosuav at gmail.com Sun Oct 16 00:47:31 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 16 Oct 2016 15:47:31 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <580305C3.7000009@canterbury.ac.nz> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On Sun, Oct 16, 2016 at 3:44 PM, Greg Ewing wrote: > Steven D'Aprano wrote: > >> This thread is a huge, multi-day proof that people do not agree that this >> is a "reasonable" interpretation. > > > So far I've seen one very vocal person who disgrees, and > maybe one other who isn't sure. > And what you're NOT seeing is a whole lot of people (myself included) who have mostly glazed over, unsure what is and isn't reasonable, and not clear enough on either side of the debate to weigh in. (Or not even clear what the two sides are.) ChrisA From mistersheik at gmail.com Sun Oct 16 01:14:54 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 16 Oct 2016 05:14:54 +0000 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161016010552.GU22471@ando.pearwood.info> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> Message-ID: On Sat, Oct 15, 2016 at 9:42 PM Steven D'Aprano wrote: > On Sun, Oct 16, 2016 at 12:48:36PM +1300, Greg Ewing wrote: > > Steven D'Aprano wrote: > > >Are you now supporting my argument that starring the list comprehension > > >expression isn't meaningful? > > > > The context it's in (a form of list display) has a clear > > meaning for a comma-separated list of values, so there > > is a reasonable interpretation that it *could* be given. > > This thread is a huge, multi-day proof that people do not agree that > this is a "reasonable" interpretation. > > > > >py> iterable = [(1, 'a'), (2, 'b')] > > >py> [(100, *t) for t in iterable] > > >[(100, 1, 'a'), (100, 2, 'b')] > > > > The * there is in the context of constructing a tuple, > > not the list into which the tuple is placed. > > Right: the context of the star is meaningful. We all agree that *t in a > list display [a, b, c, ...] is meaningful; same for tuples; same for > function calls; same for sequence unpacking for assignment. > > What is not meaningful (except as a Perlish line-noise special case to > be memorised) is *t as the list comprehension expression. > > I've never disputed that we could *assert* that *t in a list comp means > "flatten". We could assert that it means anything we like. But it > doesn't follow from the usual meaning of sequence unpacking anywhere > else -- that's why it is currently a SyntaxError, and that's why people > reacted with surprise at the OP who assumed that *t would magically > flatten his iterable. Why would you assume that? 
It makes no sense to me > -- that's not how sequence unpacking works in any other context, it > isn't how list comprehensions work. > > Right from the beginning I called this "wishful thinking", and *nothing* > since then has changed my mind. This proposal only makes even a little > bit of sense if you imagine list comprehensions > > [*t for a in it1 for b in it2 for c in it3 ... for t in itN] > > completely unrolled into a list display: > > [*t, *t, *t, *t, ... ] > > but who does that? Why would you reason about your list comps like that? > If you think about list comps as we're expected to think of them -- as > list builders equivalent to a for-loop -- the use of *t there is > invalid. Hence it is a SyntaxError. > > You want a second way to flatten your iterables? A cryptic, mysterious, > Perlish line-noise way? Okay, fine, but don't pretend it is sequence > unpacking -- in the context of a list comprehension, sequence unpacking > doesn't make sense, it is invalid. Call it something else: the new > "flatten" operator: > > [^t for t in iterable] > > for example, which magically adds an second invisible for-loop to your > list comps: > > # expands to > for t in iterable: > for x in t: > result.append(x) > > Because as your own email inadvertently reinforces, if sequence > unpacking made sense in the context of a list comprehension, it would > already be allowed rather than a SyntaxError: it is intentionally > prohibited because it doesn't make sense in the context of list comps. > Whoa, hang on a second there. It is intentionally prohibited because Joshua Landau (who helped a lot with writing and implementing the PEP) and I felt like there was going to be a long debate and we wanted to get PEP 448 checked in. If it "didn't make sense" as you say, then we would have said so in the PEP. Instead, Josh wrote: This was met with a mix of strong concerns about readability and mild support. In order not to disadvantage the less controversial aspects of the PEP, this was not accepted with the rest of the proposal. I don't remember who it was who had those strong concerns (maybe you?) But that's why we didn't include it. Best, Neil > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/ROYNN7a5VAc/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mertz at gnosis.cx Sun Oct 16 01:08:40 2016 From: mertz at gnosis.cx (David Mertz) Date: Sat, 15 Oct 2016 22:08:40 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <580305C3.7000009@canterbury.ac.nz> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On Oct 15, 2016 9:45 PM, "Greg Ewing" wrote: > > Steven D'Aprano wrote: > >> This thread is a huge, multi-day proof that people do not agree that this is a "reasonable" interpretation. > > > So far I've seen one very vocal person who disgrees, and > maybe one other who isn't sure. In case it wasn't entirely clear, I strongly and vehemently opposed this unnecessary new syntax. It is confusing, bug prone, and would be difficult to teach. Or am I that very vocal person? I was thinking your meant Steven. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Oct 16 02:01:03 2016 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 16 Oct 2016 01:01:03 -0500 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: [Alireza Rafiei ] > I have a list called count_list which contains tuples like below: > > > [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4), > > ('london', 2), ('falling', 4), ('my', 1)] > > > I want to sort it based on the second parameter in descending order and the > tuples with the same second parameter must be sorted based on their first > parameter, in alphabetically ascending order. So the ideal output is: > > > [('down', 4), ('falling', 4), ('bridge', 2), ('is', 2), ('london', 2), > > ('fair', 1), ('lady', 1), ('my', 1)] I'd like to suggest doing something simple instead, such that data = [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4), ('london', 2), ('falling', 4), ('my', 1)] from operator import itemgetter multisort(data, [# primary key is 2nd element, reversed (itemgetter(1), True), # secondary key is 1st element, natural (itemgetter(0), False)]) import pprint pprint.pprint(data) prints the output you want. It's actually very easy to do this, but the cost is that it requires doing a full sort for _each_ field you want to sort on. Because Python's sort is stable, it's sufficient to merely sort on the least-significant key first, and then sort again on each key in turn through the most-significant. There's another subtlety that makes this work: > ... > The main problem is that reverse argument takes only a boolean and applies > to the whole list after sorting in finished. Luckily, that's not how `reverse` actually works. It _first_reverses the list, then sorts, and then reverses the list again. The result is that items that compare equal _retain_ their original order, where just reversing the sorted list would invert their order. That's why, in your example above, after first sorting on the string component in natural order we get (in part) [[('down', 4), ('falling', 4), ...] and then reverse-sorting on the integer portion _leaves_ those tuples in the same order. 
That's essential for this decomposition of the problem to work. With that background, here's the implementation: def multisort(xs, specs): for key, reverse in reversed(specs): xs.sort(key=key, reverse=reverse) That's all it takes! And it accepts any number of items in `specs`. Before you worry that it's "too slow", time it on real test data. `.sort()` is pretty zippy, and this simple approach allows using simple key functions. More importantly, it's much easier on your brain ;-) From ncoghlan at gmail.com Sun Oct 16 03:21:55 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 16 Oct 2016 17:21:55 +1000 Subject: [Python-ideas] Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476551841.534485.756969481.7EAD0A55@webmail.messagingengine.com> References: <20161012154224.GT22471@ando.pearwood.info> <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <1476415969.1384439.755561833.74F36D92@webmail.messagingengine.com> <20161015080009.GM22471@ando.pearwood.info> <1476520933.440585.756726737.0F14CD86@webmail.messagingengine.com> <20161015103815.GS22471@ando.pearwood.info> <1476551841.534485.756969481.7EAD0A55@webmail.messagingengine.com> Message-ID: On 16 October 2016 at 03:17, Random832 wrote: > On Sat, Oct 15, 2016, at 06:38, Steven D'Aprano wrote: >> > Replacing it _with the items_ is not the same thing as replacing it >> > _with a sequence containing the items_, >> >> I don't think I ever used the phrasing "a sequence containing the >> items". I think that's *your* phrase, not mine. > > It's not your phrasing, it's the actual semantic content of your claim > that it would have to wrap them in a tuple. > >> The core developers have made it absolutely clear that changing the >> fundamental equivalence of list comps as syntactic sugar for: >> >> result = [] >> for t in iterable: >> result.append(t) >> >> >> is NOT NEGOTIABLE. > > I've never heard of this. It certainly never came up in this discussion. It's not negotiable, and that most recently came up just a few weeks ago: https://mail.python.org/pipermail/python-ideas/2016-September/042602.html > And it was negotiable just fine when they got rid of the leaked loop > variable. We wrapped it in a nested function scope without otherwise changing the syntactic equivalence, and only after generator expressions already introduced the notion of using a nested scope (creating Python 2's scoping discrepancy between "[x for x in iterable]" and "list(x for x in iterable)", which was eliminated in Python 3). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From alireza.rafiei94 at gmail.com Sun Oct 16 03:35:29 2016 From: alireza.rafiei94 at gmail.com (Alireza Rafiei) Date: Sun, 16 Oct 2016 00:35:29 -0700 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: Awesome! Thanks for the thorough explanation. On Sat, Oct 15, 2016 at 11:01 PM, Tim Peters wrote: > [Alireza Rafiei ] > > I have a list called count_list which contains tuples like below: > > > > > [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4), > > > ('london', 2), ('falling', 4), ('my', 1)] > > > > > > I want to sort it based on the second parameter in descending order and > the > > tuples with the same second parameter must be sorted based on their first > > parameter, in alphabetically ascending order. 
So the ideal output is: > > > > > [('down', 4), ('falling', 4), ('bridge', 2), ('is', 2), ('london', 2), > > > ('fair', 1), ('lady', 1), ('my', 1)] > > I'd like to suggest doing something simple instead, such that > > data = [('bridge', 2), ('fair', 1), ('lady', 1), > ('is', 2), ('down', 4), ('london', 2), > ('falling', 4), ('my', 1)] > > from operator import itemgetter > multisort(data, [# primary key is 2nd element, reversed > (itemgetter(1), True), > # secondary key is 1st element, natural > (itemgetter(0), False)]) > import pprint > pprint.pprint(data) > > prints the output you want. > > It's actually very easy to do this, but the cost is that it requires > doing a full sort for _each_ field you want to sort on. Because > Python's sort is stable, it's sufficient to merely sort on the > least-significant key first, and then sort again on each key in turn > through the most-significant. There's another subtlety that makes > this work: > > > ... > > The main problem is that reverse argument takes only a boolean and > applies > > to the whole list after sorting in finished. > > Luckily, that's not how `reverse` actually works. It _first_reverses > the list, then sorts, and then reverses the list again. The result is > that items that compare equal _retain_ their original order, where > just reversing the sorted list would invert their order. That's why, > in your example above, after first sorting on the string component in > natural order we get (in part) > > [[('down', 4), ('falling', 4), ...] > > and then reverse-sorting on the integer portion _leaves_ those tuples > in the same order. That's essential for this decomposition of the > problem to work. > > With that background, here's the implementation: > > def multisort(xs, specs): > for key, reverse in reversed(specs): > xs.sort(key=key, reverse=reverse) > > That's all it takes! And it accepts any number of items in `specs`. > Before you worry that it's "too slow", time it on real test data. > `.sort()` is pretty zippy, and this simple approach allows using > simple key functions. More importantly, it's much easier on your > brain ;-) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Oct 16 06:28:14 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 16 Oct 2016 11:28:14 +0100 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: On 16 October 2016 at 08:35, Alireza Rafiei wrote: > Awesome! Thanks for the thorough explanation. Thank you for the interesting suggestion that prompted the explanation. I don't know about others, but I know that I often forget ways to use the tools already at our disposal, so threads like this are a useful reminder. 
(And welcome to the mailing list - hopefully your stay will be pleasant :-)) Paul From steve at pearwood.info Sun Oct 16 07:23:46 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 16 Oct 2016 22:23:46 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161012154224.GT22471@ando.pearwood.info> <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> <20161015085337.GQ22471@ando.pearwood.info> Message-ID: <20161016112345.GX22471@ando.pearwood.info> On Sat, Oct 15, 2016 at 05:38:15PM +0000, Neil Girdhar wrote: > In ast.c, you can find: > > if (is_dict) { > ast_error(c, n, "dict unpacking cannot be used in " > "dict comprehension"); > return NULL; > } > res = ast_for_dictcomp(c, ch); [...] Thanks for the pointer. > > In any case, it isn't really the difficulty of implementation that > > is being questioned. Many things are easy to implement, but we still > > don't do them. > > If it doesn't matter, why bring it up? I never said it doesn't matter. I brought it up because I'm curious. Because if it turns out that it actually is *difficult* to implement, that would be a point in my favour that *t doesn't naturally apply in list comps. And on the other hand, if it is *easy* to implement, that's a hint that perhaps I'm missing something and there is some natural interpretation of *t in a list comp that I've missed. Perhaps. Just because I have a strong opinion doesn't mean I'm not willing to consider the possibility that I'm wrong. -- Steve From steve at pearwood.info Sun Oct 16 07:41:03 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 16 Oct 2016 22:41:03 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> Message-ID: <20161016114103.GY22471@ando.pearwood.info> On Sun, Oct 16, 2016 at 05:14:54AM +0000, Neil Girdhar wrote: [Steven (me), refering to Greg] > > Because as your own email inadvertently reinforces, if sequence > > unpacking made sense in the context of a list comprehension, it would > > already be allowed rather than a SyntaxError: it is intentionally > > prohibited because it doesn't make sense in the context of list comps. > > > Whoa, hang on a second there. It is intentionally prohibited because > Joshua Landau (who helped a lot with writing and implementing the PEP) and > I felt like there was going to be a long debate and we wanted to get PEP > 448 checked in. > > If it "didn't make sense" as you say, then we would have said so in the > PEP. Instead, Josh wrote: > > This was met with a mix of strong concerns about readability and mild > support. In order not to disadvantage the less controversial aspects of the > PEP, this was not accepted with the rest of the proposal. Okay, interesting, and thanks for the correction. > I don't remember who it was who had those strong concerns (maybe you?) But > that's why we didn't include it. I'm pretty sure it wasn't me. I don't recall being involved at all with any discussions about PEP 448, and a quick search has failed to come up with anything relevant. I think I sat that one out. -- Steve From srkunze at mail.de Sun Oct 16 08:34:58 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Sun, 16 Oct 2016 14:34:58 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On 16.10.2016 07:08, David Mertz wrote: > > In case it wasn't entirely clear, I strongly and vehemently opposed > this unnecessary new syntax. It is confusing, bug prone, and would be > difficult to teach. > As this discussion won't come to an end, I decided to consult my girlfriend. I started with (btw. she learned basic Python to solve some math quizzes): """ Let's install a list in another one. >>> meine_liste [2, 3, 4, 5] >>> ['a', meine_liste, 'b'] ['a', [2, 3, 4, 5], 'b'] Maybe, we want to remove the brackets. >>> ['a', *meine_liste, 'b'] ['a', 2, 3, 4, 5, 'b'] Now, the problem of the discussion is the following: >>> [(i,i,i) for i in range(4)] [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)] Let's remove these inner parentheses again. >>> [*(i,i,i) for i in range(4)] File "", line 1 SyntaxError: iterable unpacking cannot be used in comprehension Some guy wanted to remove that restriction. """ I said a teacher contributed to the discussion and he finds this too complicated and confusing and does not even teach * in list displays at all. Her reaction was hilarious: "Whom does he teach? Children?" Me: "What? No, everybody I think. Why?" She: "It's easy enough to remember what the star does." She also asked what would the alternative would look like. I wrote: """ >>> from itertools import chain >>> list(chain.from_iterable((i,i,i) for i in range(4))) [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3] """ Her reaction was like: "That's supposed to be easy to remember? I find the star easier." In the end, she also added: "Not everybody drives a car but they still exist." Cheers, Sven PS: off to the weekend. She's already complaint that I should spend less time inside my mailbox and more with her. ;) From levkivskyi at gmail.com Sun Oct 16 09:02:55 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sun, 16 Oct 2016 15:02:55 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On 16 October 2016 at 06:47, Chris Angelico wrote: > On Sun, Oct 16, 2016 at 3:44 PM, Greg Ewing > wrote: > > Steven D'Aprano wrote: > > > >> This thread is a huge, multi-day proof that people do not agree that > this > >> is a "reasonable" interpretation. > > > > > > So far I've seen one very vocal person who disgrees, and > > maybe one other who isn't sure. > > > > And what you're NOT seeing is a whole lot of people (myself included) > who have mostly glazed over, unsure what is and isn't reasonable, and > not clear enough on either side of the debate to weigh in. 
(Or not > even clear what the two sides are.) > > +1 There are lots of arguments of whether the new syntax is readable or not etc., but not so many arguments why we need this and what kind of problems it would solve. What I have learned from this megathread is that the syntax [*foo for foo in bar] is proposed as a replacement for a one-liner itertools.chain(*[foo for foo in bar]). I do not have any strong opinion on this, because I simply do not use such constructs frequently (if ever). -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Oct 16 09:13:49 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 16 Oct 2016 22:13:49 +0900 Subject: [Python-ideas] PEP8 dictionary indenting addition In-Reply-To: References: <20161009002527.GM22471@ando.pearwood.info> <20161012160419.GU22471@ando.pearwood.info> <22526.30123.849597.644722@turnbull.sk.tsukuba.ac.jp> Message-ID: <22531.32013.479766.192630@turnbull.sk.tsukuba.ac.jp> Terry Reedy writes: > On 10/12/2016 1:40 PM, Stephen J. Turnbull wrote: > > Ie, space-at-beginning makes for more effective review for me. YMMV. > > I think that PEP 8 should not recommend either way. Oops, sorry, I forgot that was what we were talking about (subject notwithstanding. :-( I told you I don't read strings!) Is there room in PEP 8 for mention of pros and cons in "YMMV" cases? Steve From mertz at gnosis.cx Sun Oct 16 10:40:39 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 16 Oct 2016 07:40:39 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On Sun, Oct 16, 2016 at 5:34 AM, Sven R. Kunze wrote: > On 16.10.2016 07:08, David Mertz wrote: > >> In case it wasn't entirely clear, I strongly and vehemently opposed this >> unnecessary new syntax. It is confusing, bug prone, and would be difficult >> to teach. > > > "Whom does he teach? Children?" > Me: "What? No, everybody I think. Why?" > She: "It's easy enough to remember what the star does." > As I've said, the folks I teach are mostly working scientists with doctorates in scientific fields and years of programming experience in languages other than Python. For example, rocket scientists at NASA. Now I admit that I don't specifically know how quickly they would pick up on something I've never taught them. But I've written enough teaching materials and articles and books and I have a certain intuition. The way you explained the special case you built up is pretty good, but it's very tailored to making that specific case plausible, and is not general. > She also asked what would the alternative would look like. I wrote: > """ > >>> from itertools import chain > >>> list(chain.from_iterable((i,i,i) for i in range(4))) > [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3] > """ > Her reaction was like: "That's supposed to be easy to remember? I find the > star easier." > That is an absolutely terrible construct, obviously. I have to pause and think a while myself to understand what it does. 
It also answers a very different use case than the one that has been mostly discussed in this thread. A much better spelling is: >>> [i for i in range(4) for _ in range(3)] [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3] For some related uses, itertools.repeat() is very useful. But the case that has been discussed far more often is like this: >>> listOfLists = [[1,2,3], [4,5,6,7], [8,9]] >>> flatten(listOfLists) The very long argument is that somehow that would be easier to spell as: >>> [*i for i in listOfLists] It's just not easier. And there are contrary intuitions that will occur to many people based on the several other related-but-different uses of * for packing/unpacking in other contexts. Also, this simplest case might be teachable, but the more general "exactly where can I use that star in a comprehensions" will be far harder to explain plausibly. What's the pattern here? >>> [(*i,) for i in listOfLists] [(1, 2, 3), (4, 5, 6, 7), (8, 9)] >>> [(i,*i) for i in listOfLists] [([1, 2, 3], 1, 2, 3), ([4, 5, 6, 7], 4, 5, 6, 7), ([8, 9], 8, 9)] >>> [(*i,*i) for i in listOfLists] [(1, 2, 3, 1, 2, 3), (4, 5, 6, 7, 4, 5, 6, 7), (8, 9, 8, 9)] >>> [*(i,) for i in listOfLists] # ... no clear intuition here ... >>> [*(*i) for i in listOfLists # ... even more confusing ... Yes! I know those first few are actually doing something different than the proposed new syntax. But explaining that to my rocket scientists or your girlfriend in a consistent and accurate way would be a huge challenge. It would mostly come down to "don't do that, it's too confusing." -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Sun Oct 16 11:02:49 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Sun, 16 Oct 2016 17:02:49 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <5802D0A0.1040401@canterbury.ac.nz> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> <5802D0A0.1040401@canterbury.ac.nz> Message-ID: On 16 October 2016 at 02:58, Greg Ewing wrote: >> even if it is assembler or whatever, >> it can be made readable without much effort. > > > You seem to be focused on a very narrow aspect of > readability, i.e. fine details of individual character > glyphs. That's not what we mean when we talk about > readability of programs. In this discussion yes, but layout aspects can be also improved and I suppose special purpose of language does not always dictate the layout of code, it is up to you who can define that also. And glyphs is not very narrow aspect, it is one of the fundamental aspects. Also it is much harder to develop than good layout, note that. >> That is because that person from beginning >> (blindly) follows the convention. > > What you seem to be missing is that there are > *reasons* for those conventions. They were not > arbitrary choices. Exactly, and in case of hex notation I fail to see how my proposal with using letters instead of what we have now, could be overseen at the time of decision. There must *very* solid reason for digits+letters against my variant, wonder what is it. Hope not that mono-width reason. 
And basic readability principles is somewhat that was clear for people 2000 years ago already. > > So, if anything, *you're* the one who is "blindly > following tradition" by wanting to use base 10. Yes because when I was a child I learned it everywhere for everything, others too. As said I don't defend usage of base-10 as you can already note from my posts. > >> 2. Better option would be to choose letters and >> >> possibly other glyphs to build up a more readable >> set. E.g. drop "c" letter and leave "e" due to >> their optical collision, drop some other weak glyphs, >> like "l" "h". That is of course would raise >> many further questions, like why you do you drop this >> glyph and not this and so on so it will surely end up in quarrel. > > > Well, that's the thing. If there were large, objective, > easily measurable differences between different possible > sets of glyphs, then there would be no room for such > arguments. Those things cannot be easiliy measured, if at all, it requires a lot of tests and huge amount of time, you cannot plug measure device to the brain to precisely measure the load. In this case the only choice is to trust most experienced people who show the results which worked for them better and try self to implement and compare. Not saying you have special reason to trust me personally. > > The fact that you anticipate such arguments suggests > that any differences are likely to be small, hard > to measure and/or subjective. > >> But I can bravely claim that it is better than *any* >> hex notation, it just follows from what I have here >> on paper on my table, > > > I think "on paper" is the important thing here. I > suspect you are looking at the published results from > some study or other and greatly overestimating the > size of the effects compared to other considerations. If you try to google that particular topic you'll see that there is zero related published material, there are tons of papers on readability, but zero concrete proposals or any attempts to develop something real. That is the thing. I would look in results if there was something. In my case I am looking at what I've achieved during years of my work on it and indeed there some interesting things there. Not that I am overestimating the role of it, but indeed it can really help in many cases, e.g, like in my example with bitstrings. Last but not the least, I am not a "paper ass" in any case, I try to keep only experimantal work where possible. Mikhail From toddrjen at gmail.com Sun Oct 16 11:16:05 2016 From: toddrjen at gmail.com (Todd) Date: Sun, 16 Oct 2016 11:16:05 -0400 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FEEEE3.7050109@brenbarn.net> Message-ID: On Thu, Oct 13, 2016 at 1:46 AM, Mikhail V wrote: > Practically all this notation does, it reduces the time > before you as a programmer > become visual and brain impairments. > > Even if you were right that your approach is somehow inherently easier, it is flat-out wrong that other approaches lead to "brain impairment". On the contrary, it is well-established that challenging the brain prevents or at least delays brain impairment. And it also makes no sense that it would cause visual impairment, either. Comparing glyphs is a higher-level task in the brain, it has little to do with your eyes. 
All your eyes detect are areas of changing contrast, any set of lines and curves, not even glyphs, is functionally identical at that level (and even at medium-level brain regions). The size of the glyphs can make a difference, but not the number of available ones. On the contrary, having more glyphs increases the information density of text, reducing the amount of reading you have to do to get the same information. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mariatta.wijaya at gmail.com Sun Oct 16 11:41:53 2016 From: mariatta.wijaya at gmail.com (Mariatta Wijaya) Date: Sun, 16 Oct 2016 08:41:53 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: >Her reaction was hilarious: > >"Whom does he teach? Children?" I sense mockery in your email, and it does not conform to the PSF code of conduct. Please read the CoC before posting in this mailing list. The link is available at the bottom of every python mailing list email. https://www.python.org/psf/codeofconduct/ I don't find teaching children is a laughing matter, neither is the idea of children learning to code. In Canada, we have initiatives like Girls Learning Code and Kids Learning Code. I mentored in a couple of those events and the students are girls aged 8-14. They surprised me with their abilities to learn. I would suggest looking for such mentoring opportunities in your area to gain appreciation with this regard. Thanks. (Sorry to derail everyone from the topic of list comprehension. Please continue!) -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Sun Oct 16 12:37:57 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 16 Oct 2016 17:37:57 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On 16/10/2016 16:41, Mariatta Wijaya wrote: >>Her reaction was hilarious: >> >>"Whom does he teach? Children?" > > I sense mockery in your email, and it does not conform to the PSF code > of conduct. Please read the CoC before posting in this mailing list. The > link is available at the bottom of every python mailing list > email.https://www.python.org/psf/codeofconduct/ > > I don't find teaching children is a laughing matter, neither is the idea > of children learning to code. > In Canada, we have initiatives like Girls Learning Code and Kids > Learning Code. I mentored in a couple of those events and the students > are girls aged 8-14. They surprised me with their abilities to learn. I > would suggest looking for such mentoring opportunities in your area to > gain appreciation with this regard. 
> Thanks. > (Sorry to derail everyone from the topic of list comprehension. Please > continue!) > The RUE was allowed to insult the community for years and got away with it. I'm autistic, stepped across the line, and got hammered. Hypocrisy at its best. Even funnier, the BDFL has asked for my advice in recent weeks with respect to the bug tracker. I've replied, giving the bare minimum that I feel I can give within the circumstances. Yours most disgustingly. Mark Lawrence. From prometheus235 at gmail.com Sun Oct 16 13:02:48 2016 From: prometheus235 at gmail.com (Nick Timkovich) Date: Sun, 16 Oct 2016 12:02:48 -0500 Subject: [Python-ideas] Heap data type, the revival In-Reply-To: <6be4e716-ec48-d025-c9ab-33383cd8ae10@mail.de> References: <597ec8cb-68a9-17eb-4662-a38865b41b24@mail.de> <6be4e716-ec48-d025-c9ab-33383cd8ae10@mail.de> Message-ID: Functions are great; I'm a big fan of functions. That said, the group of heapq.heap* functions are literally OOP without using that "class" word. As far as flexibility, what is the use of the those functions on non-heap structures? On Sat, Oct 15, 2016 at 4:21 PM, Sven R. Kunze wrote: > On 15.10.2016 23:19, Nick Timkovich wrote: >> >> Features and speed are good, but I'm interested in getting something >> with the basic features into the Standard Library so it's just there. >> Not having done that before and bit clueless, I'm wanting to learn >> that slightly less-technical procedure. What are the steps to make >> that happen? > > > As I said, it has been discussed and the consensus so far was: "not > everything needs to be a class if it does not provide substantial benefit" + > "functions are more flexible" + "if it's slower that the original it won't > happen". > > Cheers, > Sven From jeanpierreda at gmail.com Sun Oct 16 13:40:46 2016 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 16 Oct 2016 10:40:46 -0700 Subject: [Python-ideas] Heap data type, the revival In-Reply-To: <6be4e716-ec48-d025-c9ab-33383cd8ae10@mail.de> References: <597ec8cb-68a9-17eb-4662-a38865b41b24@mail.de> <6be4e716-ec48-d025-c9ab-33383cd8ae10@mail.de> Message-ID: > As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower that the original it won't happen". (These) functions are less flexible here. heapq forbids the use of anything except lists, for some reason. They would be *better* as list methods, because then array.array could implement them, and code could accept some arbitrary mutable sequence and transform it into a heap -- but instead, lists are required. xheap has a similar problem -- for some reason it subclasses list, which is bad practice in OOP, for good reason. Aside from making it impossible to use with array.array, it also e.g. makes it too easy to violate the heap invariant -- one of the big benefits of using a heap interface could have been making that impossible whenever your mutations originate from the heap object). The main thing I've always wanted from heapq is the ability to specify a key. This is a lot easier with a class: x = heapq.Heap(..., key=len) x.pop() vs (hypothetical, because heapq doesn't have this feature): x =... heapq.heapify(x, key=len) heapq.heappop(x, key=len) # Don't ever forget key=len unless you want to waste a few hours debugging. Classes would be more convenient and less dangerous as soon as you start adding features like this. +1 to classes. 
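(To make that concrete -- purely a sketch, since as noted heapq has no such class today and the Heap name/API here is made up -- a key-aware wrapper over the existing functions can be small, storing (key(item), counter, item) entries so the key is computed once per item and ties never fall back to comparing the items themselves:

import heapq
import itertools

class Heap:
    def __init__(self, iterable=(), *, key=lambda x: x):
        self._key = key
        self._counter = itertools.count()     # tie-breaker for equal keys
        self._data = [(key(x), next(self._counter), x) for x in iterable]
        heapq.heapify(self._data)

    def push(self, item):
        heapq.heappush(self._data,
                       (self._key(item), next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._data)[-1]

    def __len__(self):
        return len(self._data)

h = Heap(['kiwi', 'fig', 'watermelon'], key=len)
h.push('apple')
print(h.pop())    # fig -- the shortest string comes out first

Nothing in that sketch needs the underlying list to be exposed, which is relevant to the copy/exposure questions below.)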
Replying to OP: > * Should __init__ shallow-copy the list or leave that up to the > caller? Less memory if the heap object just co-opts it, but user might > accidentally reuse the reference and ruin the heap. If we make our own > list then it's easier to just suck in any arbitrary iterable. Leave it up to the caller. The caller can just as easily call list(...) as you can, and might have good reasons to want to mutate the existing thing. That said, as a safety thing, it might be reasonable to create a new list by default but provide a special factory function/class to build an instance that wraps the sequence without copying. e.g.: heapq.Heap(x) vs heapq.HeapWrapper(x)). > * How much should the underlying list be exposed? Is there a use case > for __setitem__, __delitem__? If you allow the caller to keep hold of the original list, then they can always mutate it through that reference if they need to. If you don't allow the caller to keep the original list, but you support the list interface, then you've lost much of the safety you were trying to keep by not reusing references. -- Devin From mertz at gnosis.cx Sun Oct 16 13:47:04 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 16 Oct 2016 10:47:04 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: Actually, I agree with Marietta. I don't care whatsoever about mocking me, which was a certain element of it. I have thick skin and am confident in these conversations. The part that was probably over the line was mocking children who learn to program or those who teach them. That's a huge and great job. I know I would not have the skill to teach children effectively. Adults with technical expertise are much easier for me. That said, thank you Mark for your empirical research with a test subject. Best, David On Oct 16, 2016 9:39 AM, "Mark Lawrence via Python-ideas" < python-ideas at python.org> wrote: > On 16/10/2016 16:41, Mariatta Wijaya wrote: > >> Her reaction was hilarious: >>> >>> "Whom does he teach? Children?" >>> >> >> I sense mockery in your email, and it does not conform to the PSF code >> of conduct. Please read the CoC before posting in this mailing list. The >> link is available at the bottom of every python mailing list >> email.https://www.python.org/psf/codeofconduct/ >> >> I don't find teaching children is a laughing matter, neither is the idea >> of children learning to code. >> In Canada, we have initiatives like Girls Learning Code and Kids >> Learning Code. I mentored in a couple of those events and the students >> are girls aged 8-14. They surprised me with their abilities to learn. I >> would suggest looking for such mentoring opportunities in your area to >> gain appreciation with this regard. >> Thanks. >> (Sorry to derail everyone from the topic of list comprehension. Please >> continue!) >> >> > The RUE was allowed to insult the community for years and got away with > it. I'm autistic, stepped across the line, and got hammered. Hypocrisy at > its best. 
Even funnier, the BDFL has asked for my advice in recent weeks > with respect to the bug tracker. I've replied, giving the bare minimum > that I feel I can give within the circumstances. > > Yours most disgustingly. > > Mark Lawrence. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Oct 16 13:48:34 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 16 Oct 2016 10:48:34 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: ... actually, thank you Sven (but Mark also. And all the contributors to the discussion, even those I disagree with). On Oct 16, 2016 10:47 AM, "David Mertz" wrote: > Actually, I agree with Marietta. I don't care whatsoever about mocking me, > which was a certain element of it. I have thick skin and am confident in > these conversations. > > The part that was probably over the line was mocking children who learn to > program or those who teach them. That's a huge and great job. I know I > would not have the skill to teach children effectively. Adults with > technical expertise are much easier for me. > > That said, thank you Mark for your empirical research with a test subject. > > Best, David > > On Oct 16, 2016 9:39 AM, "Mark Lawrence via Python-ideas" < > python-ideas at python.org> wrote: > >> On 16/10/2016 16:41, Mariatta Wijaya wrote: >> >>> Her reaction was hilarious: >>>> >>>> "Whom does he teach? Children?" >>>> >>> >>> I sense mockery in your email, and it does not conform to the PSF code >>> of conduct. Please read the CoC before posting in this mailing list. The >>> link is available at the bottom of every python mailing list >>> email.https://www.python.org/psf/codeofconduct/ >>> >>> I don't find teaching children is a laughing matter, neither is the idea >>> of children learning to code. >>> In Canada, we have initiatives like Girls Learning Code and Kids >>> Learning Code. I mentored in a couple of those events and the students >>> are girls aged 8-14. They surprised me with their abilities to learn. I >>> would suggest looking for such mentoring opportunities in your area to >>> gain appreciation with this regard. >>> Thanks. >>> (Sorry to derail everyone from the topic of list comprehension. Please >>> continue!) >>> >>> >> The RUE was allowed to insult the community for years and got away with >> it. I'm autistic, stepped across the line, and got hammered. Hypocrisy at >> its best. Even funnier, the BDFL has asked for my advice in recent weeks >> with respect to the bug tracker. I've replied, giving the bare minimum >> that I feel I can give within the circumstances. >> >> Yours most disgustingly. >> >> Mark Lawrence. 
>> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Sun Oct 16 15:26:06 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Sun, 16 Oct 2016 21:26:06 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FEEEE3.7050109@brenbarn.net> Message-ID: On 16 October 2016 at 17:16, Todd wrote: >Even if you were right that your approach is somehow inherently easier, >it is flat-out wrong that other approaches lead to "brain impairment". >On the contrary, it is well-established that challenging >the brain prevents or at least delays brain impairment. My phrasing "impairment" is of course somewhat exaggeration. It cannot be compared to harm due to smoking for example. However it also known that many people who do big amount of information processing and intensive reading are subject to earlier loss of the vision sharpness. And I feel it myself. How exactly this happens to the eye itself is not clear for me. One my supposition is that during the reading there is very intensive two-directional signalling between eye and brain. So generally you are correct, the eye is technically a camera attached to the brain and simply sends pictures at some frequency to the brain. But I would tend to think that it is not so simple actually. You probably have heard sometimes users who claim something like: "this text hurts my eyes" For example if you read non-antialiased text and with too high contrast, you'll notice that something is indeed going wrong with your eyes. This can happen probably because the brain starts to signal the eye control system "something is wrong, stop doing it" Since your eye cannot do anything with wrong contrast on your screen and you still need to continue reading, this happens again and again. This can cause indeed unwanted processes and overtiredness of muscles inside the eye. So in case of my examle with Chinese students, who wear goggles more frequently, this would probaly mean that they could "recover" if they just stop reading a lot. "challenging the brain prevents or at least delays brain" Yes but I hardly see connection with this case, I would probably recommend to make some creative exercises, like drawing or solving puzzles for this purpose. But if I propose reading books in illegible font than I would be wrong in any case. > And it also makes no sense that it would cause visual impairment, either. > Comparing glyphs is a higher-level task in the brain, > it has little to do with your eyes. You forget about that whith illegible font or wrong contrast for example you *do* need to do more concentrarion, This causes again your eye to try harder to adopt to the information you see, reread, which again affects your lens and eye movements. Anyway, how do you think then this earlier vision loss happens? You'd say I fantasise? 
Mikhail From mikhailwas at gmail.com Sun Oct 16 16:15:53 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Sun, 16 Oct 2016 22:15:53 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> Message-ID: On 16 October 2016 at 04:10, Steve Dower wrote: >> I posted output with Python2 and Windows 7 >> BTW , In Windows 10 'print' won't work in cmd console at all by default >> with unicode but thats another story, let us not go into that. >> I think you get my idea right, it is not only about printing. > FWIW, Python 3.6 should print this in the console just fine. Feel free to > upgrade whenever you're ready. > > Cheers, > Steve Thanks, that is good, sure I'll do that since I need that right now (a lot of work with Cyrillic data). Mikhail From greg.ewing at canterbury.ac.nz Sun Oct 16 17:23:49 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Oct 2016 10:23:49 +1300 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> <5802D0A0.1040401@canterbury.ac.nz> Message-ID: <5803EFE5.4040701@canterbury.ac.nz> Mikhail V wrote: > Those things cannot be easiliy measured, if at all, If you can't measure something, you can't be sure it exists at all. > In my case I am looking at what I've achieved > during years of my work on it and indeed there some > interesting things there. Have you *measured* anything, though? Do you have any feel for how *big* the effects you're talking about are? > There must *very* solid reason > for digits+letters against my variant, wonder what is it. The reasons only have to be *very* solid if there are *very* large advantages to the alternative you propose. My conjecture is that the advantages are actually extremely *small* by comparison. To refute that, you would need to provide some evidence to the contrary. Here are some reasons in favour of the current system: * At the point where most people learn to program, they are already intimately familiar with reading, writing and pronouncing letters and digits. * It makes sense to use 0-9 to represent the first ten digits, because they have the same numerical value. * Using letters for the remaining digits, rather than punctuation characters, makes sense because we're already used to thinking of them as a group. * Using a consecutive sequence of letters makes sense because we're already familiar with their ordering. * In the absence of any strong reason otherwise, we might as well take them from the beginning of the alphabet. Yes, those are all based on "habits", but they're habits shared by everyone, just like the base 10 that you have a preference for. You would have to provide some strong evidence that it's worth disregarding them and using your system instead. 
-- Greg From steve at pearwood.info Sun Oct 16 18:40:01 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 17 Oct 2016 09:40:01 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: <20161016224000.GZ22471@ando.pearwood.info> On Sun, Oct 16, 2016 at 03:02:55PM +0200, Ivan Levkivskyi wrote: > What I have learned from this megathread is that the syntax [*foo for foo > in bar] > is proposed as a replacement for a one-liner itertools.chain(*[foo for foo > in bar]). If people take away nothing else from this thread, it should be that flattening an iterable is as easy as: [x for t in iterable for x in t] which corresponds neatly to: for t in iterable: for x in t: result.append(x) -- Steve From steve at pearwood.info Sun Oct 16 18:55:41 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 17 Oct 2016 09:55:41 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: <20161016225541.GA22471@ando.pearwood.info> On Sun, Oct 16, 2016 at 02:34:58PM +0200, Sven R. Kunze wrote: > As this discussion won't come to an end, I decided to consult my girlfriend. [...] > >>> [(i,i,i) for i in range(4)] > [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)] Did you remember to tell your girlfriend that a critical property of the "??? for i in range(4)" construct is that it generates one value per loop? It loops four times, so it generates exactly four values (in this case, each value is a bracketed term). > Let's remove these inner parentheses again. > > >>> [*(i,i,i) for i in range(4)] > File "", line 1 > SyntaxError: iterable unpacking cannot be used in comprehension It loops four times, so it must generate four values. What's the star supposed to do? Turn four loops into twelve? -- Steve From toddrjen at gmail.com Sun Oct 16 20:01:42 2016 From: toddrjen at gmail.com (Todd) Date: Sun, 16 Oct 2016 20:01:42 -0400 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <57FEAF9F.5020103@egenix.com> <57FEEEE3.7050109@brenbarn.net> Message-ID: On Sun, Oct 16, 2016 at 3:26 PM, Mikhail V wrote: > One my supposition is that during the reading there is > very intensive two-directional signalling between eye and > brain. So generally you are correct, the eye is technically > a camera attached to the brain and simply sends pictures > at some frequency to the brain. > But I would tend to think that it is not so simple actually. > You probably have heard sometimes users who claim something like: > "this text hurts my eyes" > For example if you read non-antialiased text and with too > high contrast, you'll notice that something is indeed going wrong > with your eyes. > This can happen probably because the brain starts to signal > the eye control system "something is wrong, stop doing it" > Since your eye cannot do anything with wrong contrast on > your screen and you still need to continue reading, this > happens again and again. 
This can cause indeed unwanted > processes and overtiredness of muscles inside the eye. > The downards-projecting signals from the brain to the eye are heavily studied. In fact I have friends who specialize in studying those connections specifically. They simply don't behave the way you are describing. You are basing your claims on the superiority of certain sorts of glyphs on conjecture about how the brain works, conjecture that goes against what the evidence says about how the brain actually processes visual information. Yes, the quality of the glyphs can make a big difference. There is no indication, however, that the number of possible glyphs can. > And it also makes no sense that it would cause visual impairment, either. > > Comparing glyphs is a higher-level task in the brain, > > it has little to do with your eyes. > > You forget about that whith illegible font or wrong contrast > for example you *do* need to do more concentrarion, > This causes again your eye to try harder to adopt > to the information you see, reread, which again > affects your lens and eye movements. > I don't want to imply bad faith on your part, but you cut off an important part of what I said: "The size of the glyphs can make a difference, but not the number of available ones. On the contrary, having more glyphs increases the information density of text, reducing the amount of reading you have to do to get the same information." Badly-antialised text can be a problem from that standpoint too. But again, none of this has anything whatsoever to do with the number of glyphs, which is your complaint. Again, I don't want to imply bad faith, but the argument you are making now is completely different than the argument I was addressing. I don't disagree that bad text quality or too much reading can hurt your eyes. On the contrary, I said explicitly that it can. The claim of yours that I was addressing is that having too many glyphs can hurt your eyes or brain, which doesn't match with anything we know about how the eyes or brain work. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Oct 16 20:23:03 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 17 Oct 2016 11:23:03 +1100 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> <5802D0A0.1040401@canterbury.ac.nz> Message-ID: <20161017002302.GB22471@ando.pearwood.info> On Sun, Oct 16, 2016 at 05:02:49PM +0200, Mikhail V wrote: > In this discussion yes, but layout aspects can be also > improved and I suppose special purpose of > language does not always dictate the layout of > code, it is up to you who can define that also. > And glyphs is not very narrow aspect, it is > one of the fundamental aspects. Also > it is much harder to develop than good layout, note that. This discussion is completely and utterly off-topic for this mailing list. If you want to discuss changing the world to use your own custom character set for all human communication, you should write a blog or a book. It is completely off-topic for Python: we're interested in improving the Python programming language, not yet another constructed language or artifical alphabet: https://en.wikipedia.org/wiki/Shavian_alphabet If you're interested in this, there is plenty of prior art. 
See for example: Esperanto, Ido, Volap?k, Interlingua, Lojban. But don't discuss it here. -- Steve From mikhailwas at gmail.com Sun Oct 16 22:48:52 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Mon, 17 Oct 2016 04:48:52 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <5803EFE5.4040701@canterbury.ac.nz> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> <5802D0A0.1040401@canterbury.ac.nz> <5803EFE5.4040701@canterbury.ac.nz> Message-ID: On 16 October 2016 at 23:23, Greg Ewing wrote: >> Those things cannot be easiliy measured, if at all, >If you can't measure something, you can't be sure >it exists at all. What do you mean I can't be sure? I am fully functional, mentally healthy man :) >Have you *measured* anything, though? Do you have >any feel for how *big* the effects you're talking >about are? For what case, of course. So the difference for "0010 0011" and "--k- --kk" I can feel indeed big difference. Literally, I can read the latter clearly even I close my left eye and *fully* defocus my right eye. That is indeed a big difference and tells a lot. I suppose for disabled people this would be the only chance to see anything there. Currently I experiment myself and of course I plan to do it with experimental subjects. I plan one survey session in the end of November. But indeed this is very off-topic. So feel free to mail me, if anything. So back to hex notation, which is still not so off-topic I suppose. >>There must *very* solid reason >>for digits+letters against my variant, wonder what is it. >The reasons only have to be *very* solid if there >are *very* large advantages to the alternative you >propose. My conjecture is that the advantages are First ,I am the opinion that *initial* decision in such a case must be supported by solid reasons and not just like, "hey, John has already written them in such a manner, lets take it!". Second, I totally disagree that there always must be *very* large advantages for new standards, if we would follow such principle, we would still use cuneiform for writing or bash-like syntax, since everytime when someone proposes a slight improvement, there would be somebody who says : "but the new is not *that much* better than old!". Actually in many cases it is better when it is evolving - everybody is aware. > Here are some reasons in favour of the current > system: > * At the point where most people learn to program, > they are already intimately familiar with reading, > writing and pronouncing letters and digits. > * It makes sense to use 0-9 to represent the first > ten digits, because they have the same numerical > value. So you mean they start to learn hex and see numbers and think like: ooo it looks like a number, not so scary. So less time to learn, yes, +1 (less pain now, more pain later) But if I am an adult intelligent man, I understand that there are only ten digits and I will despite need to extend the set and they *all* should be optically consequent and good readable. And what is a good readable set with >=16 glyphs? Small letters! Somewhat from the height of my current knowledge, since I know that digits anyway not very good readable. > * Using a consecutive sequence of letters makes > sense because we're already familiar with their > ordering. 
I actually proposed consecutive, but that does not make much difference: being familiar with ordering of the alphabet will have next to zero influence on the reading of numbers encoded with letters, it is just an illusion that it will, since the letter is a symbol, if I see "z" I don't think of 26. More probably, the weight of the glyph could play some role, that means less the weight - less the number. Mikhail From mikhailwas at gmail.com Sun Oct 16 22:55:27 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Mon, 17 Oct 2016 04:55:27 +0200 Subject: [Python-ideas] Proposal for default character representation In-Reply-To: <20161017002302.GB22471@ando.pearwood.info> References: <255d452c-429b-d2cc-5b16-d632b3185e9e@gmail.com> <57FF237B.8090702@canterbury.ac.nz> <57FF52E3.3060309@canterbury.ac.nz> <5800A723.9050806@canterbury.ac.nz> <5802D0A0.1040401@canterbury.ac.nz> <20161017002302.GB22471@ando.pearwood.info> Message-ID: On 17 October 2016 at 02:23, Steven D'Aprano wrote: > On Sun, Oct 16, 2016 at 05:02:49PM +0200, Mikhail V wrote: > >> In this discussion yes, but layout aspects can be also >> improved and I suppose special purpose of >> language does not always dictate the layout of >> code, it is up to you who can define that also. >> And glyphs is not very narrow aspect, it is >> one of the fundamental aspects. Also >> it is much harder to develop than good layout, note that. > > This discussion is completely and utterly off-topic for this mailing > list. If you want to discuss changing the world to use your own custom > character set for all human communication, you should write a blog or a > book. It is completely off-topic for Python: we're interested in > improving the Python programming language, not yet another constructed > language or artifical alphabet: You're right, I was just answering the questions so it came to other thing somehow. BTW, among others we have discussed bitstring representation. So if you work with those, for example if you model cryptography algorithms or similar things in Python, this could help you for example to debug your programs and generally one could interpret it as, say, how is about adding an extra notation for this sake. And if you noticed this is not really about my glyphs, but lies in ASCII. So actually it is me who tried to turn it back to on-topic. Mikhail From ncoghlan at gmail.com Sun Oct 16 23:33:42 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 17 Oct 2016 13:33:42 +1000 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On 16 October 2016 at 04:36, INADA Naoki wrote: [Serhiy wrote] >> >> Are there precedences of combining verbose and version options in other >> programs? >> > > No, I was just afraid about other programs rely on format of python -V. That would be my concern as well - while I can't *name* any specific projects that use "python -V" to extract version info (e.g. for filename generation based on MAJOR.MINOR), it's still the obvious thing to call if you need that info and aren't already writing in Python yourself. >> I think it would not be large breakage if new releases of CPython become >> outputting extended version information by default. > > I like it if it's OK. > Does anyone against this? I think adding "verbose version" is a good idea, with a clear and reasonably obvious meaning. While it *is* a little unusual to implement it that way, I don't think that's sufficient reason to break with the established output format for the plain "-V". Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Sun Oct 16 23:40:01 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 17 Oct 2016 14:40:01 +1100 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On Mon, Oct 17, 2016 at 2:33 PM, Nick Coghlan wrote: > While it *is* a little unusual to implement it that way, I don't think > that's sufficient reason to break with the established output format > for the plain "-V". Seems reasonable. Minor point: I'd be forever having to check whether it's -vV, -Vv, or -VV - particularly as I often find myself using "python -v", and groaning at the spew of spam as interactive Python starts up in verbose mode. Can all three be added, maybe? Then -Vv and -vV are "verbose version", and -VV is "version, and more so" (in the same way that -qq is more quiet than q, or -gg is more debuggy than -g). ChrisA From ncoghlan at gmail.com Mon Oct 17 00:06:27 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 17 Oct 2016 14:06:27 +1000 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <580305C3.7000009@canterbury.ac.nz> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On 16 October 2016 at 14:44, Greg Ewing wrote: > Steven D'Aprano wrote: > >> This thread is a huge, multi-day proof that people do not agree that this >> is a "reasonable" interpretation. > > So far I've seen one very vocal person who disgrees, and > maybe one other who isn't sure. "Language design by whoever shouts the loudest and the longest" is a terrible idea, and always has been. It's why "Has Guido already muted the thread?" is a useful metric for whether or not people are wasting their time in an unproductive argument (I don't know if he's muted this particular thread by now, but I'd be genuinely surprised if he hasn't) Remember that what we're arguing about is that existing instances of: [x for subiterable in iterable for x in subiterable] or: list(itertools.chain.from_iterable(iterable)) would be easier to read and maintain if they were instead written as: [*subiter for subiter in iterable] That's the bar people have to reach - if we're going to add a 3rd spelling for something that already has two spellings, then a compelling argument needs to be presented that the new spelling is *always* preferable to the existing ones, *not* merely "some people already agree that this 3rd spelling should mean the same thing as the existing spellings". The only proposal in this thread that has come close to reaching that bar is David Mertz's proposal to reify single level flattening as a flatten() builtin: [x for x in flatten(iterable)] or, equivalently: list(flatten(iterable)) Then the only thing that folks need to learn is that Python's builtin "flatten()" is a *non-recursive* operation that consistently flattens one layer of iterables with no special casing (not even of strings or bytes-like objects). > Many people do, and it's a perfectly valid way to think > about them. They're meant to admit a declarative reading; > that's the reason they exist in the first place. 
> > The expansion in terms of for-loops and appends is just > *one* way to describe the current semantics. It's not > written on stone tablets brought down from a mountain. > Any other way of thinking about it that gives the same > result is equally valid. This is why I brought up mathematical set builder notation early in the thread, and requested that people present prior art for this proposal from that domain. It's the inspiration for comprehensions, so if a proposal to change comprehensions: - can't be readily explained in terms of their syntactic sugar for Python statements - can't be readily explained in terms of mathematical set builder notation then it's on incredibly shaky ground. >> magically adds an second invisible for-loop to your list comps: > > You might as well say that the existing * in a list > display magically inserts a for-loop into it. You can > think of it that way if you want, but you don't have > to. > >> it is intentionally >> prohibited because it doesn't make sense in the context of list comps. > > I don't know why it's currently prohibited. You would > have to ask whoever put that code in, otherwise you're > just guessing about the motivation. No need to guess, PEP 448 says why they're prohibited: https://www.python.org/dev/peps/pep-0448/#variations "This was met with a mix of strong concerns about readability and mild support. " Repeatedly saying "All of you people who find it unreadable are wrong, it's perfectly readable to *me*" does nothing except exacerbate the readability concerns, as folks who find it intuitive will use it over the more explicit existing alternatives, creating exactly the readability and maintainability problem we're worried about. Cryptic syntactic abbreviations are sometimes worthwhile when they're summarising something that can't otherwise be expressed easily in the form of an expression, but that isn't the case here - the existing alternatives are already expressions, and one of them already doesn't require any imports. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Oct 17 00:21:51 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 17 Oct 2016 14:21:51 +1000 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On 17 October 2016 at 13:40, Chris Angelico wrote: > On Mon, Oct 17, 2016 at 2:33 PM, Nick Coghlan wrote: >> While it *is* a little unusual to implement it that way, I don't think >> that's sufficient reason to break with the established output format >> for the plain "-V". > > Seems reasonable. Minor point: I'd be forever having to check whether > it's -vV, -Vv, or -VV If we use the normal verbose flag, then both "-vV" and "-Vv" will work, since options can be provided in any order. I don't think it makes sense to also allow "-VV" - we're not requesting the version twice, we're asking for more verbose version information. Since "-v" is already a counted option, we're also free to expand it to give even more info the more verbose we ask it to be (although initially I think pursuing just Inada-san's main suggestion of matching the REPL header makes sense) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Mon Oct 17 01:51:59 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 17 Oct 2016 16:51:59 +1100 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On Mon, Oct 17, 2016 at 3:21 PM, Nick Coghlan wrote: > On 17 October 2016 at 13:40, Chris Angelico wrote: >> On Mon, Oct 17, 2016 at 2:33 PM, Nick Coghlan wrote: >>> While it *is* a little unusual to implement it that way, I don't think >>> that's sufficient reason to break with the established output format >>> for the plain "-V". >> >> Seems reasonable. Minor point: I'd be forever having to check whether >> it's -vV, -Vv, or -VV > > If we use the normal verbose flag, then both "-vV" and "-Vv" will > work, since options can be provided in any order. That's a good start, at least. > I don't think it makes sense to also allow "-VV" - we're not > requesting the version twice, we're asking for more verbose version > information. It's not as far-fetched as you might think - if "vv" means "more verbose", and "qq" means "more quiet", then "VV" means "more version info". It's a form of word doubling for emphasis, a tradition that dates back at least as far as ancient Hebrew, and has some currency in English. And if -VV has no other meaning, it's not going to hurt to accept it as an alias for -Vv, right? Remember, this option isn't only for the expert - it's also for the novice, who might be typing at the dictation of someone else ("can you tell me what python -vV says, please? -- no, that's one capital V and one small v -- no no, not both capital, just one"). But if it can't be done, so be it. At least with them being independent flags, the order doesn't matter. > Since "-v" is already a counted option, we're also free > to expand it to give even more info the more verbose we ask it to be > (although initially I think pursuing just Inada-san's main suggestion > of matching the REPL header makes sense) Sure, I guess. Not sure what -Vvv would mean, but okay. The same could easily be done with -VVV though, just by making -V a counted option. Logic could simply be: if Version: # Count verbosity based on -v and/or -V Version += verbose if Version >= 3: print("info for -Vvv") if Version >= 2: print(sys.version) if Version == 1: # subsumed into the above print(sys.version.split(" ")[0]) ChrisA From ncoghlan at gmail.com Mon Oct 17 02:02:47 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 17 Oct 2016 16:02:47 +1000 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On 17 October 2016 at 15:51, Chris Angelico wrote: > On Mon, Oct 17, 2016 at 3:21 PM, Nick Coghlan wrote: >> On 17 October 2016 at 13:40, Chris Angelico wrote: >>> On Mon, Oct 17, 2016 at 2:33 PM, Nick Coghlan wrote: >>>> While it *is* a little unusual to implement it that way, I don't think >>>> that's sufficient reason to break with the established output format >>>> for the plain "-V". >>> >>> Seems reasonable. Minor point: I'd be forever having to check whether >>> it's -vV, -Vv, or -VV >> >> If we use the normal verbose flag, then both "-vV" and "-Vv" will >> work, since options can be provided in any order. > > That's a good start, at least. > >> I don't think it makes sense to also allow "-VV" - we're not >> requesting the version twice, we're asking for more verbose version >> information. 
> > It's not as far-fetched as you might think - if "vv" means "more > verbose", and "qq" means "more quiet", then "VV" means "more version > info". I'm fine with making "-V" itself a counted option, and hence supporting -VV *instead of* -vV. The only approach I'm not OK with is allowing both -VV *and* the mixed-case form to request more detailed version information. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From songofacandy at gmail.com Mon Oct 17 02:02:29 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 17 Oct 2016 15:02:29 +0900 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: >> Since "-v" is already a counted option, we're also free >> to expand it to give even more info the more verbose we ask it to be >> (although initially I think pursuing just Inada-san's main suggestion >> of matching the REPL header makes sense) > > Sure, I guess. Not sure what -Vvv would mean, but okay. The same could > easily be done with -VVV though, just by making -V a counted option. > Logic could simply be: > Fortunately, it's a counting option already. In Modules/main.c: case 'v': Py_VerboseFlag++; break; ... case 'V': version++; break; ... if (version) { printf("Python %s\n", PY_VERSION); return 0; } So change is easy: diff -r 0b29adb5c804 Modules/main.c --- a/Modules/main.c Mon Oct 17 06:14:48 2016 +0300 +++ b/Modules/main.c Mon Oct 17 15:00:26 2016 +0900 @@ -512,7 +512,12 @@ Py_Main(int argc, wchar_t **argv) return usage(0, argv[0]); if (version) { - printf("Python %s\n", PY_VERSION); + if (version >= 2) { // or if (version >= 2 || Py_VerboseFlag) { + printf("Python %s\n", Py_GetVersion()); + } + else { + printf("Python %s\n", PY_VERSION); + } return 0; } $ ./python.exe -V Python 3.6.0b2+ $ ./python.exe -VV Python 3.6.0b2+ (3.6:0b29adb5c804+, Oct 17 2016, 15:00:12) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] $ ./python.exe -VVV Python 3.6.0b2+ (3.6:0b29adb5c804+, Oct 17 2016, 15:00:12) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] -- INADA Naoki From rosuav at gmail.com Mon Oct 17 02:04:29 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 17 Oct 2016 17:04:29 +1100 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On Mon, Oct 17, 2016 at 5:02 PM, Nick Coghlan wrote: > I'm fine with making "-V" itself a counted option, and hence > supporting -VV *instead of* -vV. > > The only approach I'm not OK with is allowing both -VV *and* the > mixed-case form to request more detailed version information. Okay. I'd have no problem with that. It's easy enough to ask people to capitalize them both. Definite +1 from me. ChrisA From rosuav at gmail.com Mon Oct 17 02:06:05 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 17 Oct 2016 17:06:05 +1100 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: On Mon, Oct 17, 2016 at 5:02 PM, INADA Naoki wrote: > $ ./python.exe -V > Python 3.6.0b2+ > > $ ./python.exe -VV > Python 3.6.0b2+ (3.6:0b29adb5c804+, Oct 17 2016, 15:00:12) > [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] LGTM. What's the view on backporting this to 2.7.x? We're still a good few years away from its death, and it'd be helpful if recent 2.7s could give this info too. 
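(For reference, and as far as I can tell, the string the patch prints via Py_GetVersion() is the same one already exposed as sys.version, so anyone who needs the fuller output today - including on 2.7 - can approximate it with something like:

    $ python -c "import sys; print(sys.version)"
    3.6.0b2+ (3.6:0b29adb5c804+, Oct 17 2016, 15:00:12)
    [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)]

i.e. the proposed -VV output minus the "Python " prefix.)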
ChrisA From songofacandy at gmail.com Mon Oct 17 02:18:38 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 17 Oct 2016 15:18:38 +0900 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: (Added python-dev in CC list, because there are enough +1 already). On Mon, Oct 17, 2016 at 3:06 PM, Chris Angelico wrote: > On Mon, Oct 17, 2016 at 5:02 PM, INADA Naoki wrote: >> $ ./python.exe -V >> Python 3.6.0b2+ >> >> $ ./python.exe -VV >> Python 3.6.0b2+ (3.6:0b29adb5c804+, Oct 17 2016, 15:00:12) >> [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] > > LGTM. > > What's the view on backporting this to 2.7.x? We're still a good few > years away from its death, and it'd be helpful if recent 2.7s could > give this info too. > > ChrisA I want to add it at least Python 3.6. Because one reason I want to propose this is I can't see exact Python version (commit id) for "nightly" or "3.6-dev" on Travis-CI test. But Python 3.6 is beta stage already. If we apply rule strictly, it should be added only in default branch (Python 3.7). So, what version can I add this? a. Only Python 3.7+ b. (beta) Python 3.6+ c. (maintenance) Python 2.7 and Python 3.5+ -- INADA Naoki From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 17 12:11:10 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 18 Oct 2016 01:11:10 +0900 Subject: [Python-ideas] Show more info when `python -vV` In-Reply-To: References: Message-ID: <22532.63518.329562.553102@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > If we use the normal verbose flag, then both "-vV" and "-Vv" will > work, since options can be provided in any order. +0.5 for some such option, +1 for "-VV" or "-V -V" I for one would likely make that mistake (using "-VV" instead of "-vV") a lot. "python -V" is the first thing I do on somebody else's system when helping them debug, but I rarely use "python -v" in anger. I suspect that I wouldn't use "python -Vv" [sic] often enough to remember it. From random832 at fastmail.com Mon Oct 17 12:11:46 2016 From: random832 at fastmail.com (Random832) Date: Mon, 17 Oct 2016 12:11:46 -0400 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> On Mon, Oct 17, 2016, at 00:06, Nick Coghlan wrote: > Remember that what we're arguing about is that existing instances of: > > [x for subiterable in iterable for x in subiterable] > > or: > > list(itertools.chain.from_iterable(iterable)) > > would be easier to read and maintain if they were instead written as: > > [*subiter for subiter in iterable] > > That's the bar people have to reach - if we're going to add a 3rd > spelling for something that already has two spellings, then a > compelling argument needs to be presented that the new spelling is > *always* preferable to the existing ones, Nothing is *always* preferable. That's an impossible bar for any feature that is already in python to have reached. 
Case in point - neither of the two spellings that you just gave is always preferable to the other. >*not* merely "some people > already agree that this 3rd spelling should mean the same thing as the > existing spellings". It's the same bar that [a, b, *c, d] had to reach over list(itertools.chain((a, b), c, (d,))). You've also artificially constructed the examples to make the proposal look worse - there is only one layer of the comprehension so adding a second one doesn't look so bad. Meanwhile with the itertools.chain, the fact that it's just "*subiter" rather than [*some_function(item) for item in iterable] allows your chain example to be artificially short, it'd have to be list(itertools.chain.from_iterable(some_function(item) for item in iterable)) > The only proposal in this thread that has come close to reaching that > bar is David Mertz's proposal to reify single level flattening as a > flatten() builtin: > > [x for x in flatten(iterable)] Once again, this alleged simplicity relies on the chosen example "x for x" rather than "f(x) for x" - this one doesn't even put the use of flatten in the right place to be generalized to the more complex cases. You'd need list(flatten(f(x) for x in iterable)) And this is where the "flatten" proposal fails - there's no way to use it alongside the list comprehension construct - either you can use a list comprehension, or you have to use the list constructor with a generator expression and flatten. > Repeatedly saying "All of you people who find it unreadable are wrong, > it's perfectly readable to *me*" Honestly, it goes beyond just being "wrong". The repeated refusal to even acknowledge any equivalence between [...x... for x in [a, b, c]] and [...a..., ...b..., ...c...] truly makes it difficult for me to accept some people's _sincerity_. The only other interpretation I see as possible is if they _also_ think [a, *b, c] is unreadable (something hinted at with the complaint that this is "hard to teach" because "in the current syntax, an expression is required in that position." something that was obviously also true of all the other places that unpacking generalization was added]) and are fighting a battle they already lost. > does nothing except exacerbate the > readability concerns, as folks who find it intuitive will use it over > the more explicit existing alternatives, creating exactly the > readability and maintainability problem we're worried about. Cryptic > syntactic abbreviations are sometimes worthwhile when they're > summarising something that can't otherwise be expressed easily in the > form of an expression, but that isn't the case here - the existing > alternatives are already expressions, and one of them already doesn't > require any imports. From rene at stranden.com Mon Oct 17 12:36:48 2016 From: rene at stranden.com (Rene Nejsum) Date: Mon, 17 Oct 2016 18:36:48 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> Message-ID: <1A812A64-8C11-4A10-9103-927790EB0CA8@stranden.com> Regarding the Python C-runtime and async, I just had a good talk with Kresten Krab at Trifork. He implemented ?Erjang? the Java implementation of the Erlang VM (www.erjang.org ). Doing this he had access to the Erlang (C) VM. 
It turns out that the Erlang VM and the Python VM have a lot of similarities; the differences are more in the language than in the VM.

Differences between the Erlang VM and Python related to async are:

1) Most variables in Erlang are immutable, making it easier to have coroutines.
2) Coroutines are built into Erlang using the "spawn" keyword, leaving the specific implementation to the VM, but never implemented with OS threads.
3) All coroutines have their own heap and stack (initially 200 bytes), but can grow as needed.
4) Coroutines are managed in a "ready-queue", from which the VM thread executes the next ready job. Each job gets 2000 "instructions" (or runs until it blocks on IO), and then the next coroutine is executed.

Because of this, when multicore CPUs entered the game, it was quite easy to change the Erlang VM to add a thread per core to pull from the ready-queue. This makes an Erlang program run (almost) twice as fast every time the number of cores is doubled!

Given this, I am still convinced that:

obj = async SomeObject()

should be feasible, even though there will be some "golang"-like issues about shared data, but there could be several ways to handle this.

br /Rene

> On 05 Oct 2016, at 18:06, Nick Coghlan wrote: > > On 5 October 2016 at 16:49, Rene Nejsum wrote: >>> On 04 Oct 2016, at 18:40, Sven R. Kunze wrote: >>> I don't think that's actually what I wanted here. One simple keyword should have sufficed just like golang did. So, the developer gets a way to decide whether or not he needs it blocking or nonblocking **when using a function**. He doesn't need to decide it **when writing the function**. >> >> I agree, that's why i proposed to put the async keyword in when creating the object, saying in this instance I want asynchronous communication with the object. > > OK, I think there may be a piece of foundational knowledge regarding > runtime design that's contributing to the confusion here. > > Python's core runtime model is the C runtime model: threads (with a > local stack and access to a global process heap) and processes (which > contain a heap and one or more threads). Anything else we do (whether > it's generators, coroutines, or some other form of paused execution > like callback management) gets layered on top of that runtime model. > When folks ask questions like "Why can't Python be more like Go?", > "Why can't Python be more like Erlang?", or "Why can't Python be more > like Rust?" and get a negative response, it's usually because there's > an inherent conflict between the C runtime model and whatever piece of > the Go/Erlang/Rust runtime model we want to steal. > > So the "async" keyword in "async def", "async for" and "async with" is > essentially a marker saying "This is not a C-like runtime concept > anymore!" (The closest C-ish equivalent I'm aware of would be Apple's > Grand Central Dispatch in Objective-C and that shows many of the > async/await characteristics also seen in Python and C#: > https://www.raywenderlich.com/60749/grand-central-dispatch-in-depth-part-1 > ) > > Go (as with Erlang before it) avoided these problems by not providing > C-equivalent functions in the first place. Accordingly, *every* normal > function defined in Go can also be used as a goroutine, rather than > needing to be a distinct type - their special case is defining > functions that interoperate with external C libraries.
Python (along > with other languages built on the C runtime model like C# and > Objective-C) doesn't have that luxury - we need to distinguish > coroutines from regular functions, since we can't just handle them > according to the underlying C runtime model any more. > > Guido's idea of a shadow thread to let synchronous threads run > coroutines without needing to actually run a foreground event loop > should provide a manageable way of getting the two runtime models > (traditional C and asynchronous coroutines) to play nicely together in > a single application, and has the virtue of being something folks can > readily experiment with for themselves before we commit to anything > specific in the standard library (since all the building blocks of > thread local storage, event loop management, and inter-thread message > passing primitives are already available). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Oct 17 13:32:19 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 18 Oct 2016 04:32:19 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> Message-ID: <20161017173219.GC22471@ando.pearwood.info> On Mon, Oct 17, 2016 at 12:11:46PM -0400, Random832 wrote: > Honestly, it goes beyond just being "wrong". The repeated refusal to > even acknowledge any equivalence between [...x... for x in [a, b, c]] > and [...a..., ...b..., ...c...] truly makes it difficult for me to > accept some people's _sincerity_. While we're talking about people being insincere, how about if you take a look at your own comments? This "repeated refusal" that you accuse us (opponents of this proposal) of is more of a rhetorical fiction than an actual reality. Paul, David and I have all acknowledged the point you are trying to make. I won't speak for Paul or David, but speaking for myself, it isn't that I don't understand the point you're trying to make, but that I do not understand why you think that point is meaningful or desirable. I have acknowledged that starring the expression in a list comprehension makes sense if you think of the comprehension as a fully unrolled list display: [*expr, *expr *expr, *expr, ...] What I don't believe is: (1) that the majority of Python programmers (or even a large minority) regularly and consistently think of comprehensions as syntactic sugar for a completely unrolled list display; rather, I expect that they usually think of them as sugar for a for-loop; (2) that we should encourage people to think of comprehensions as sugar for a completely unrolled list display rather than a for-loop; (3) that we should accept syntax which makes no sense in the context of a for-loop-with-append (i.e. the thing which comprehensions are sugar for). But if we *do* accept this syntax, then I believe that we should drop the pretense that it is a natural extension of sequence unpacking in the context of a for-loop-with-append (i.e. 
list comprehensions) and accept that it will be seen by people as a magical "flatten" operator. And, in my opinion, rightly so: the semantic distance between *expr in a list comp and the level of explanation where it makes sense is so great that thinking of it as just special syntax for flattening is the simplest way of looking at it. So, yet again for emphasis: I see what you mean about unrolling the list comprehension into a list display. But I believe that's not a helpful way to think about list comprehensions. The way we should be thinking about them is as for-loops with append, and in *that* context, sequence unpacking doesn't make sense. In a list comprehension, we expect the invariant that the number of items produced will equal the number of loops performed. (Less if there are any "if" clauses.) There is one virtual append per loop. You cannot get the behaviour you want without breaking that invariant: either the append has to be replaced by extend, or you have so insert an extra loop into your mental picture of comprehensions. Yet again, for emphasis: I understand that you don't believe that invariant is important, or at least you are willing to change it. But drop the pretense that this is an obvious extension to the well- established behaviour of list comprehensions and sequence unpacking. If you think you can convince people (particularly Guido) that this flattening behaviour is important enough to give up the invariant "one append per loop", then by all means try. For all I know, Guido might agree with you and love this idea! But while you're accusing us of refusing to acknowledge the point you make about unrolling the loop to a list display (what I maintain is an unhelpful and non-obvious way of thinking about this), you in turn seem to be refusing to acknowledge the points we have made. This isn't a small change: it requires not insignificant changes to people's understanding of what list comprehension syntax means and does. -- Steve From random832 at fastmail.com Mon Oct 17 13:49:22 2016 From: random832 at fastmail.com (Random832) Date: Mon, 17 Oct 2016 13:49:22 -0400 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161017173219.GC22471@ando.pearwood.info> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: <1476726562.888642.758686169.52B9C868@webmail.messagingengine.com> On Mon, Oct 17, 2016, at 13:32, Steven D'Aprano wrote: > This isn't a small change: it requires not > insignificant changes to people's understanding of what list > comprehension syntax means and does. Only if their understanding is limited to a sequence of tokens that it supposedly expands to [except for all the little differences like whether a variable actually exists] - like your argument that it should just convert to a tuple because "yield x, y" happens to yield a tuple - rather than actual operations with real semantic meaning. 
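To make the operations concrete (a rough sketch of the semantics under discussion, not anyone's official specification), the difference between today's comprehension and the proposed starred form is just append versus extend in the equivalent loop:

    pairs = [(1, 2), (3, 4)]

    # current semantics of [t for t in pairs]: exactly one append per iteration
    result = []
    for t in pairs:
        result.append(t)     # -> [(1, 2), (3, 4)]

    # semantics proposed for [*t for t in pairs]: extend instead of append
    flattened = []
    for t in pairs:
        flattened.extend(t)  # -> [1, 2, 3, 4]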
From brenbarn at brenbarn.net Mon Oct 17 14:16:03 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Mon, 17 Oct 2016 11:16:03 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161017173219.GC22471@ando.pearwood.info> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: <58051563.7010904@brenbarn.net> On 2016-10-17 10:32, Steven D'Aprano wrote: > In a list comprehension, we expect the invariant that the number of > items produced will equal the number of loops performed. (Less if there > are any "if" clauses.) There is one virtual append per loop. You cannot > get the behaviour you want without breaking that invariant: either the > append has to be replaced by extend, or you have so insert an extra loop > into your mental picture of comprehensions. > > Yet again, for emphasis: I understand that you don't believe that > invariant is important, or at least you are willing to change it. But > drop the pretense that this is an obvious extension to the well- > established behaviour of list comprehensions and sequence unpacking. It seems to me that this difference is fundamental. The entire point of this type of generalization is to break that invariant and allow the number of elements in the result list to vary independently of the number of iterations in the comprehension. It seems that a lot of this thread is talking at cross purposes, because the specifics of the syntax don't matter if you insist on that invariant. For instance, there's been a lot of discussion about whether this use of * is or isn't parallel to argument unpacking or assignment unpacking, or whether it's "intuitive" to some people or all people. But none of that matters if you insist on this invariant. If you insist on this invariant, no syntax will be acceptable; what is at issue is the semantics of enlarging the resulting list by more than one element. Now, personally, I don't insist on that invariant. I would certainly like to be able to do more general things in a list comprehension, and many times I have been irritated by the fact that the one-item-per-loop invariant exists. I'm not sure whether I'm in favor of this particular syntax, but I'd like to be able to do the kind of things it allows. But doing them inherently requires breaking the invariant you describe. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From srkunze at mail.de Mon Oct 17 14:18:42 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 17 Oct 2016 20:18:42 +0200 Subject: [Python-ideas] Heap data type, the revival In-Reply-To: References: <597ec8cb-68a9-17eb-4662-a38865b41b24@mail.de> <6be4e716-ec48-d025-c9ab-33383cd8ae10@mail.de> Message-ID: <6e32d010-2510-95d6-901a-e951ce186a1e@mail.de> On 16.10.2016 19:02, Nick Timkovich wrote: > Functions are great; I'm a big fan of functions. That said, the group > of heapq.heap* functions are literally OOP without using that "class" > word. As far as flexibility, what is the use of the those functions on > non-heap structures? IIRC the statement wasn't about "non-heap structures". 
It was about, "I need a heap which does something special and subclassing might not solve it. So, I run my own implementation using those functions". On the other hand, I can fully understand the need for general-purpose oo-heap implementation. That why I put xheap on github/PyPI. Best, Sven From elazarg at gmail.com Mon Oct 17 14:20:48 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 17 Oct 2016 18:20:48 +0000 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <58051563.7010904@brenbarn.net> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <58051563.7010904@brenbarn.net> Message-ID: But the proposal has explicit syntax that point the reader to the fact that the invariant doesn't hold. Same as other unpacking occurences: [x, *y] The invariant does not hold. And that's explicit. Elazar ?????? ??? ??, 17 ????' 2016, 21:16, ??? Brendan Barnwell ?< brenbarn at brenbarn.net>: > On 2016-10-17 10:32, Steven D'Aprano wrote: > > In a list comprehension, we expect the invariant that the number of > > items produced will equal the number of loops performed. (Less if there > > are any "if" clauses.) There is one virtual append per loop. You cannot > > get the behaviour you want without breaking that invariant: either the > > append has to be replaced by extend, or you have so insert an extra loop > > into your mental picture of comprehensions. > > > > Yet again, for emphasis: I understand that you don't believe that > > invariant is important, or at least you are willing to change it. But > > drop the pretense that this is an obvious extension to the well- > > established behaviour of list comprehensions and sequence unpacking. > > It seems to me that this difference is fundamental. The entire > point > of this type of generalization is to break that invariant and allow the > number of elements in the result list to vary independently of the > number of iterations in the comprehension. > > It seems that a lot of this thread is talking at cross purposes, > because the specifics of the syntax don't matter if you insist on that > invariant. For instance, there's been a lot of discussion about whether > this use of * is or isn't parallel to argument unpacking or assignment > unpacking, or whether it's "intuitive" to some people or all people. > But none of that matters if you insist on this invariant. If you insist > on this invariant, no syntax will be acceptable; what is at issue is the > semantics of enlarging the resulting list by more than one element. > > Now, personally, I don't insist on that invariant. I would > certainly > like to be able to do more general things in a list comprehension, and > many times I have been irritated by the fact that the one-item-per-loop > invariant exists. I'm not sure whether I'm in favor of this particular > syntax, but I'd like to be able to do the kind of things it allows. But > doing them inherently requires breaking the invariant you describe. > > -- > Brendan Barnwell > "Do not follow where the path may lead. Go, instead, where there is no > path, and leave a trail." 
> --author unknown > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Oct 17 14:32:24 2016 From: mertz at gnosis.cx (David Mertz) Date: Mon, 17 Oct 2016 11:32:24 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On Sun, Oct 16, 2016 at 9:06 PM, Nick Coghlan wrote: > Remember that what we're arguing about is that existing instances of: > > [x for subiterable in iterable for x in subiterable] > > [...] would be easier to read and maintain if they were instead written as: > > [*subiter for subiter in iterable] > > That's the bar people have to reach - if we're going to add a 3rd > spelling for something that already has two spellings > > The only proposal in this thread that has come close to reaching that > bar is David Mertz's proposal to reify single level flattening as a > flatten() builtin: > > [x for x in flatten(iterable)] > > or, equivalently: > > list(flatten(iterable)) > I don't think I'd actually propose a builtin for this. For me, including the recipe that is in the itertools documentation into a function in the module would be plenty. Writing "from itertools import flatten" is not hard. > Then the only thing that folks need to learn is that Python's builtin > "flatten()" is a *non-recursive* operation that consistently flattens > one layer of iterables with no special casing (not even of strings or > bytes-like objects). > Actually, I think that if we added `flatten()` to itertools, I'd like a more complex implementation that had a signature like: def flatten(it, levels=1): # Not sure the best implementation for clever use of other itertools ... I'm not quite sure how one would specify "flatten all the way down." In practice, `sys.maxsize` is surely large enough; but semantically it feels weird to use an arbitrary large number to mean "infinitely if necessary." -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Mon Oct 17 14:29:33 2016 From: brett at python.org (Brett Cannon) Date: Mon, 17 Oct 2016 18:29:33 +0000 Subject: [Python-ideas] please try to keep things civil (was: unpacking generalisations for list comprehension) In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On Sun, 16 Oct 2016 at 09:39 Mark Lawrence via Python-ideas < python-ideas at python.org> wrote: > On 16/10/2016 16:41, Mariatta Wijaya wrote: > >>Her reaction was hilarious: > >> > >>"Whom does he teach? Children?" > > > > I sense mockery in your email, and it does not conform to the PSF code > > of conduct. Please read the CoC before posting in this mailing list. The > > link is available at the bottom of every python mailing list > > email.https://www.python.org/psf/codeofconduct/ > > > > I don't find teaching children is a laughing matter, neither is the idea > > of children learning to code. > > In Canada, we have initiatives like Girls Learning Code and Kids > > Learning Code. I mentored in a couple of those events and the students > > are girls aged 8-14. They surprised me with their abilities to learn. I > > would suggest looking for such mentoring opportunities in your area to > > gain appreciation with this regard. > > Thanks. > > (Sorry to derail everyone from the topic of list comprehension. Please > > continue!) > > > > The RUE was allowed to insult the community for years and got away with > it. What is the "RUE"? > I'm autistic, stepped across the line, and got hammered. Hypocrisy > at its best. While some of us know your background, Mark, not everyone on this list does as people join at different times, so please try to give the benefit of the doubt to people. Marietta obviously takes how children are reflected personally and was trying to point out that fact. I don't think she meant for the CoC reference to come off as threatening, just to back up why she was taking the time out to speak up that she was upset by what was said. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Oct 17 14:38:45 2016 From: mertz at gnosis.cx (David Mertz) Date: Mon, 17 Oct 2016 11:38:45 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> Message-ID: On Mon, Oct 17, 2016 at 9:11 AM, Random832 wrote: > Once again, this alleged simplicity relies on the chosen example "x for > x" rather than "f(x) for x" - this one doesn't even put the use of > flatten in the right place to be generalized to the more complex cases. 
> You'd need list(flatten(f(x) for x in iterable)) > What you're saying is EXACTLY 180 deg reversed from the truth. It's *precisely* because it doesn't need the extra complication that `flatten()` is more flexible and powerful. I have no idea what your example is meant to do, but the actual correspondence is: [f(x) for x in flatten(it)] Under my proposed "more flexible recursion levels" idea, it could even be: [f(x) for x in flatten(it, levels=3)] There would simply be NO WAY to get that out of the * comprehension syntax at all. But a decent flatten() function gets all the flexibility. > Honestly, it goes beyond just being "wrong". The repeated refusal to > even acknowledge any equivalence between [...x... for x in [a, b, c]] > and [...a..., ...b..., ...c...] truly makes it difficult for me to > accept some people's _sincerity_. > I am absolutely sincere in disliking and finding hard-to-teach this novel use of * in comprehensions. Yours, David... P.S. It's very artificial to assume user are unable to use 'from itertools import chain' to try to make chain() seem more cumbersome than it is. Likewise, I would like flatten() in itertools, but I assume the usual pattern would be importing the function itself. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Oct 17 14:39:05 2016 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 17 Oct 2016 11:39:05 -0700 Subject: [Python-ideas] async objects In-Reply-To: <1A812A64-8C11-4A10-9103-927790EB0CA8@stranden.com> References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <1A812A64-8C11-4A10-9103-927790EB0CA8@stranden.com> Message-ID: The problem is that if your goal is to make a practical proposal, it's not enough to look at Python-the-language. You're absolutely right, AFAICT there's nothing stopping someone from making a nice implementation of Python-the-language that has erlang-style cheap shared-nothing threads with some efficient message-passing mechanism. But! It turns out that unless your new implementation supports the CPython C API, then it's almost certainly not viable as a mainstream CPython alternative, because there's this huge huge pile of libraries that have been written against that C API. You're not competing against CPython, you're competing against CPython+thousands of libraries that you don't have and that your users expect. And unfortunately, it turns out that the C API locks in a bunch of the implementation assumptions (refcounting, the GIL, use of the C stack, poor support for isolation between different interpreter states, ...) that you were trying to get away from. I mean, in many ways it's a good problem to have, that our current ecosystem is just so attractive that it's hard to compete with! (Though a pessimist could point out that this difficulty with competing with yourself is exactly what tends to eventually undermine incumbents -- cf. the innovator's dilemma.) And it's "just" a matter of implementation, not Python-the-language itself. 
But the bottom line is: this is *the* core problem that you have to grapple with if you want to make any radical improvements in the Python runtime and have people actually use them. -n On Mon, Oct 17, 2016 at 9:36 AM, Rene Nejsum wrote: > Regarding the Python C-runtime and async, I just had a good talk with > Kresten Krab at Trifork. He implemented "Erjang" the Java implementation of > the Erlang VM (www.erjang.org). Doing this he had access to the Erlang (C) > VM. > > It turns out that the Erlang VM and the Python VM has a lot of similarities > and the differences are more in the language, than in the VM > > Differences between the Erlang VM and Python related to async are: > > 1) Most variables in Erlang are immutable > Making it easier to have coroutines > > 2) coroutines are built into the Erlang using the "spawn" keyword > Leaving the specific implementation to the VM, but never implemented with OS > threads. > > 3) All coroutines have their own heap and stack (initially 200 bytes), but > can grow as needed > > 4) coroutines are managed in "ready-queue", from which the VM thread > executes the next ready job > Each job gets 2000 "instructions" (or until IO block) and the next coroutine > is executed > > Because of this, when multicore CPUs entered the game, it was quite easy to > change the Erlang VM to add a thread per core to pull from the ready-queue. > This makes an Erlang program run twice as fast (almost) every time the > number of cores are doubled! > > Given this, I am still convinced that: > > obj = async SomeObject() > > should be feasible, even though there will be some "golang" like issues > about shared data, but there could be several ways to handle this. > > br > /Rene > > > On 05 Oct 2016, at 18:06, Nick Coghlan wrote: > > On 5 October 2016 at 16:49, Rene Nejsum wrote: > > On 04 Oct 2016, at 18:40, Sven R. Kunze wrote: > I don't think that's actually what I wanted here. One simple keyword should > have sufficed just like golang did. So, the developer gets a way to decide > whether or not he needs it blocking or nonblocking **when using a > function**. He doesn't need to decide it **when writing the function**. > > > I agree, that's why i proposed to put the async keyword in when creating the > object, saying in this instance I want asynchronous communication with the > object. > > > OK, I think there may be a piece of foundational knowledge regarding > runtime design that's contributing to the confusion here. > > Python's core runtime model is the C runtime model: threads (with a > local stack and access to a global process heap) and processes (which > contain a heap and one or more threads). Anything else we do (whether > it's generators, coroutines, or some other form of paused execution > like callback management) gets layered on top of that runtime model. > When folks ask questions like "Why can't Python be more like Go?", > "Why can't Python be more like Erlang?", or "Why can't Python be more > like Rust?" and get a negative response, it's usually because there's > an inherent conflict between the C runtime model and whatever piece of > the Go/Erlang/Rust runtime model we want to steal. > > So the "async" keyword in "async def", "async for" and "async with" is > essentially a marker saying "This is not a C-like runtime concept > anymore!"
(The closest C-ish equivalent I'm aware of would be Apple's > Grand Central Dispatch in Objective-C and that shows many of the > async/await characteristics also seen in Python and C#: > https://www.raywenderlich.com/60749/grand-central-dispatch-in-depth-part-1 > ) > > Go (as with Erlang before it) avoided these problems by not providing > C-equivalent functions in the first place. Accordingly, *every* normal > function defined in Go can also be used as a goroutine, rather than > needing to be a distinct type - their special case is defining > functions that interoperate with external C libraries. Python (along > with other languages built on the C runtime model like C# and > Objective-C) doesn't have that luxury - we need to distinguish > coroutines from regular functions, since we can't just handle them > according to the underlying C runtime model any more. > > Guido's idea of a shadow thread to let synchronous threads run > coroutines without needing to actually run a foreground event loop > should provide a manageable way of getting the two runtime models > (traditional C and asynchronous coroutines) to play nicely together in > a single application, and has the virtue of being something folks can > readily experiment with for themselves before we commit to anything > specific in the standard library (since all the building blocks of > thread local storage, event loop management, and inter-thread message > passing primitives are already available). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Nathaniel J. Smith -- https://vorpus.org From mertz at gnosis.cx Mon Oct 17 14:48:23 2016 From: mertz at gnosis.cx (David Mertz) Date: Mon, 17 Oct 2016 11:48:23 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161017173219.GC22471@ando.pearwood.info> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: On Mon, Oct 17, 2016 at 10:32 AM, Steven D'Aprano wrote: > But if we *do* accept this syntax, then I believe that we should drop > the pretense that it is a natural extension of sequence unpacking in the > context of a for-loop-with-append (i.e. list comprehensions) and accept > that it will be seen by people as a magical "flatten" operator. [...] > So, yet again for emphasis: I see what you mean about unrolling the list > comprehension into a list display. But I believe that's not a helpful > way to think about list comprehensions. Moreover, this "magical flatten" operator will crash in bad ways that a regular flatten() will not. I.e. this is fine (if strange): >>> three_inf = (count(), count(), count()) >>> comp = (x for x in flatten(three_inf)) >>> next(comp) 0 >>> next(comp) 1 It's hard to see how that won't blow up under the new syntax (i.e. generally for all infinite sequences). Try running this, for example: >>> a, *b = count() Syntactically valid... but doesn't terminate. 
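For reference, the laziness that makes the flatten() spelling safe here is easy to get. A rough sketch of the helper being discussed -- built on chain.from_iterable; the levels argument is the hypothetical extension floated earlier in the thread, not an existing itertools API:

    from itertools import chain, count, islice

    def flatten(iterable, levels=1):
        # Lazily strip 'levels' layers of nesting; no special-casing of
        # strings or bytes is attempted in this sketch.
        for _ in range(levels):
            iterable = chain.from_iterable(iterable)
        return iterable

    # Nothing is consumed eagerly, so infinite inputs are fine:
    list(islice(flatten([count(), count(), count()]), 5))   # [0, 1, 2, 3, 4]

Each level is just another lazy chain, so the generator comprehension above keeps working unchanged.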
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Mon Oct 17 14:53:21 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 17 Oct 2016 18:53:21 +0000 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: On Mon, Oct 17, 2016 at 9:49 PM David Mertz wrote: ... > Moreover, this "magical flatten" operator will crash in bad ways that a > regular flatten() will not. I.e. this is fine (if strange): > > >>> three_inf = (count(), count(), count()) > >>> comp = (x for x in flatten(three_inf)) > >>> next(comp) > 0 > >>> next(comp) > 1 > > It's hard to see how that won't blow up under the new syntax (i.e. > generally for all infinite sequences). > > The proposed semantics replace the asterisk with a "yield from" in a generator, so it should work just fine. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Oct 17 15:35:41 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 17 Oct 2016 21:35:41 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> Message-ID: <5606a57a-1dca-a562-1a1c-7c665b8f3aa5@mail.de> On 17.10.2016 20:38, David Mertz wrote: > Under my proposed "more flexible recursion levels" idea, it could even > be: > > [f(x) for x in flatten(it, levels=3)] > > There would simply be NO WAY to get that out of the * comprehension > syntax at all. But a decent flatten() function gets all the flexibility. I see what you are trying to do here and I appreciate it. Just one thought from my practical experience: I haven't had a single usage for levels > 1. levels==1 is basically * which I have at least one example for. Maybe, that relates to the fact that we asked our devs to use names (as in attributes or dicts) instead of deeply nested list/tuple structures. Do you think it would make sense to start a new thread just for the sake of readability? > Honestly, it goes beyond just being "wrong". The repeated refusal to > even acknowledge any equivalence between [...x... for x in [a, b, c]] > and [...a..., ...b..., ...c...] truly makes it difficult for me to > accept some people's _sincerity_. > > > I am absolutely sincere in disliking and finding hard-to-teach this > novel use of * in comprehensions. You are consistent at least. 
You don't teach * in list displays, no matter if regular lists or comprehensions. +1 > P.S. It's very artificial to assume user are unable to use 'from > itertools import chain' to try to make chain() seem more cumbersome > than it is. I am sorry but it is cumbersome. Regards, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Oct 17 16:06:22 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 17 Oct 2016 22:06:22 +0200 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: On 16.10.2016 09:35, Alireza Rafiei wrote: > Awesome! Thanks for the thorough explanation. Indeed. I also didn't know about that detail of reversing. :) Amazing. (Also welcome to the list, Alireza.) > > def multisort(xs, specs): > for key, reverse in reversed(specs): > xs.sort(key=key, reverse=reverse) > > That's all it takes! And it accepts any number of items in `specs`. > Before you worry that it's "too slow", time it on real test data. > `.sort()` is pretty zippy, and this simple approach allows using > simple key functions. More importantly, it's much easier on your > brain ;-) > > @Tim Do you think that simple solution could have a chance to be added to stdlib somehow (with the possibility of speeding it up in the future)? Regards, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Oct 17 16:12:43 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 21:12:43 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161017173219.GC22471@ando.pearwood.info> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: On 17 October 2016 at 18:32, Steven D'Aprano wrote: > On Mon, Oct 17, 2016 at 12:11:46PM -0400, Random832 wrote: > >> Honestly, it goes beyond just being "wrong". The repeated refusal to >> even acknowledge any equivalence between [...x... for x in [a, b, c]] >> and [...a..., ...b..., ...c...] truly makes it difficult for me to >> accept some people's _sincerity_. > > While we're talking about people being insincere, how about if you take > a look at your own comments? This "repeated refusal" that you accuse us > (opponents of this proposal) of is more of a rhetorical fiction than an > actual reality. Paul, David and I have all acknowledged the point you > are trying to make. I won't speak for Paul or David, but speaking for > myself, it isn't that I don't understand the point you're trying to > make, but that I do not understand why you think that point is > meaningful or desirable. For my part: 1. I've acknowledged that equivalence. As well as the fact that the proposal (specifically, as explained formally by Greg) is understandable and a viable possible extension. 2. I don't find the "interpolation" equivalence a *good* way of interpreting list comprehensions, any more than I think that loops should be explained by demonstrating how to unroll them. 3. I've even explicitly revised my position on the proposal from -1 to -0 (although I'm tending back towards -1, if I'm honest...). 4. 
Whether you choose to believe me or not, I've sincerely tried to understand the proposal, but I pretty much had to insist on a formal definition of syntax and semantics before I got an explanation that I could follow. However: 1. I'm tired of hearing that the syntax is "obvious". This whole thread proves otherwise, and I've yet to hear anyone from the "obvious" side of the debate acknowledge that. 2. Can someone summarise the *other* arguments for the proposal? I'm genuinely struggling to recall what they are (assuming they exist). It feels like I'm hearing nothing more than "it's obvious what this does, it's obvious that it's needed and the people saying it isn't are wrong". That may well not be the truth, but *it's the impression I'm getting*. I've tried to take a step back and summarise my side of the debate a couple of times now. I don't recall seeing anyone doing the same from the other side (Greg's summarised the proposal, but I don't recall anyone doing the same with the justification for it). 3. The fact is that the proposed behaviour was *specifically* blocked, *precisely* because of strong concerns that it would cause readability issues and only had "mild" support. I'm not hearing any reason to change that decision (sure, there are a few people here offering something stronger than "mild" support, but it's only a few voices, and they are not addressing the readability concerns at all). There was no suggestion in the PEP that this decision was expected to be revisited later. Maybe there was an *intention* to do so, but the PEP didn't state it. I'd suggest that this fact alone implies that the people proposing this change need to write a new PEP for it, but honestly I don't think the way the current discussion has gone suggests that there's any chance of putting together a persuasive PEP, much less a consensus decision. And finally, no-one has even *tried* to explain why we need a third way of expressing this construction. Nick made this point, and basically got told that his condition was too extreme. He essentially got accused of constructing an impossible test. And yet it's an entirely fair test, and one that's applied regularly to proposals - and many *do* pass the test. It's worth noting here that we have had no real-world use cases, so the common approach of demonstrating real code, and showing how the proposal improves it, is not available. Also, there's no evidence that this is a common need, and so it's not clear to what extent any sort of special language support is warranted. We don't (as far as I know, and no-one's provided evidence otherwise) see people routinely writing workarounds for this construct. We don't hear of trainers saying that pupils routinely try to do this, and are surprised when it doesn't work (I'm specifically talking about students *deducing* this behaviour, not being asked if they think it's reasonable once explained). These are all arguments that have been used in the past to justify new syntax (and so reach Nick's "bar"). And we've had a special-case function (flatten) proposed to cover the most common cases (taking the approach of the 80-20 rule) - but the only response to that proposal has been "but it doesn't cover ". If it didn't cover a demonstrably common real-world problem, that would be a different matter - but anyone can construct cases that aren't covered by *any* given proposal. That doesn't prove anything. I don't see any signs of progress here. 
And I'm pretty much at the point where I'm losing interest in having the same points repeated at me over and over, as if repetition and volume will persuade me. Sorry. Paul From random832 at fastmail.com Mon Oct 17 16:22:12 2016 From: random832 at fastmail.com (Random832) Date: Mon, 17 Oct 2016 16:22:12 -0400 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> Message-ID: <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> On Mon, Oct 17, 2016, at 14:38, David Mertz wrote: > What you're saying is EXACTLY 180 deg reversed from the truth. It's > *precisely* because it doesn't need the extra complication that > `flatten()` > is more flexible and powerful. I have no idea what your example is meant > to do, but the actual correspondence is: > > [f(x) for x in flatten(it)] No, it's not. For a more concrete example: [*range(x) for x in range(4)] [*(),*(0,),*(0,1),*(0,1,2)] [0, 0, 1, 0, 1, 2] There is simply no way to get there by using flatten(range(4)). The only way flatten *without* a generator expression can serve the same use cases as this proposal is for comprehensions of the *exact* form [*x for x in y]. For all other cases you'd need list(flatten(...generator expression without star...)). From p.f.moore at gmail.com Mon Oct 17 16:26:27 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 21:26:27 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <5606a57a-1dca-a562-1a1c-7c665b8f3aa5@mail.de> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <5606a57a-1dca-a562-1a1c-7c665b8f3aa5@mail.de> Message-ID: On 17 October 2016 at 20:35, Sven R. Kunze wrote: > P.S. It's very artificial to assume user are unable to use 'from itertools > import chain' to try to make chain() seem more cumbersome than it is. > > I am sorry but it is cumbersome. Imports are a fundamental part of Python. How are they "cumbersome"? Is it cumbersome to have to import sys to get access to argv? To import re to use regular expressions? To import subprocess to run an external program? Importing the features you use (and having an extensive standard library of tools you might want, but which don't warrant being built into the language) is, to me, a basic feature of Python. Certainly having to add an import statement is extra typing. But terseness was *never* a feature of Python. In many ways, a resistance to overly terse (I could say "Perl-like") constructs is one of the defining features of the language - and certainly, it's one that drew me to Python, and one that I value. 
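To put a number on that "extra typing" for the toy example given earlier in this thread, the spelling that works today is one import plus one call (a sketch using the range-of-ranges example rather than real code):

    from itertools import chain

    list(chain.from_iterable(range(x) for x in range(4)))
    # [0, 0, 1, 0, 1, 2] -- the same result the proposed
    # [*range(x) for x in range(4)] is intended to produce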
Paul From p.f.moore at gmail.com Mon Oct 17 16:27:32 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 21:27:32 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> Message-ID: On 17 October 2016 at 21:22, Random832 wrote: > For a more concrete example: > > [*range(x) for x in range(4)] > [*(),*(0,),*(0,1),*(0,1,2)] > [0, 0, 1, 0, 1, 2] > > There is simply no way to get there by using flatten(range(4)). The only > way flatten *without* a generator expression can serve the same use > cases as this proposal is for comprehensions of the *exact* form [*x for > x in y]. For all other cases you'd need list(flatten(...generator > expression without star...)). Do you have a real-world example of needing this? Paul From random832 at fastmail.com Mon Oct 17 16:30:34 2016 From: random832 at fastmail.com (Random832) Date: Mon, 17 Oct 2016 16:30:34 -0400 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: <1476736234.923614.758866977.3D87D66C@webmail.messagingengine.com> On Mon, Oct 17, 2016, at 16:12, Paul Moore wrote: > And finally, no-one has even *tried* to explain why we need a third > way of expressing this construction. Nick made this point, and > basically got told that his condition was too extreme. He essentially > got accused of constructing an impossible test. And yet it's an > entirely fair test, and one that's applied regularly to proposals - > and many *do* pass the test. As the one who made that accusation, my objection was specifically to the word "always" - which was emphasized - and which is something that I don't believe is actually a component of the test that is normally applied. His words, specifically, were "a compelling argument needs to be presented that the new spelling is *always* preferable to the existing ones" List comprehensions themselves aren't even always preferable to loops. From p.f.moore at gmail.com Mon Oct 17 16:31:48 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 21:31:48 +0100 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: On 17 October 2016 at 21:06, Sven R. Kunze wrote: > Do you think that simple solution could have a chance to be added to stdlib > somehow (with the possibility of speeding it up in the future)? You could submit a doc patch to add an explanation of this technique to the list.sort function. 
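To illustrate, such a doc example could be as short as this (the data is invented purely for illustration; multisort() is the two-line helper quoted earlier in the thread, and the technique works because list.sort is guaranteed stable, so the least significant key is sorted first and the most significant key last):

    data = [('apple', 3), ('pear', 1), ('apple', 1), ('pear', 2)]
    specs = [(lambda r: r[0], False),   # primary key: name, ascending
             (lambda r: r[1], True)]    # secondary key: count, descending
    multisort(data, specs)
    # data is now [('apple', 3), ('apple', 1), ('pear', 2), ('pear', 1)]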
I doubt it's worth a builtin for a 2-line function. Paul From srkunze at mail.de Mon Oct 17 16:33:32 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 17 Oct 2016 22:33:32 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: <938d7a83-945d-479f-09df-4c6feeff2a0a@mail.de> On 17.10.2016 22:12, Paul Moore wrote: > 4. Whether you choose to believe me or not, I've sincerely tried to > understand the proposal [...] I think you did and I would like others to follow your example. > 2. Can someone summarise the *other* arguments for the proposal? I for one think it's just restriction lifting. If that doesn't suffice, that's okay. > It's worth noting here that we have had > no real-world use cases, so the common approach of demonstrating real > code, and showing how the proposal improves it, is not available. Sorry? You know, I am all for real-world code and I also delivered: https://mail.python.org/pipermail/python-ideas/2016-October/043030.html If it doesn't meet your standards of real-world code, okay. I meets mine. > Also, there's no evidence that this is a common need, and so it's not > clear to what extent any sort of special language support is > warranted. We don't (as far as I know, and no-one's provided evidence > otherwise) see people routinely writing workarounds for this > construct. I do. Every usage of chain.from_iterable is that, well, "workaround". Workaround is too hard I think. It's more of an inconvenience. > We don't hear of trainers saying that pupils routinely try > to do this, and are surprised when it doesn't work (I'm specifically > talking about students *deducing* this behaviour, not being asked if > they think it's reasonable once explained). That's fair. As we see it, trainers deliberately choose to omit some language features they personally feel uncomfortable with. So, yes, if there were trainers who routinely reported this, that would be a strong argument for it. However, the absence of this signal, is not an argument against it IMHO. > I don't see any signs of progress here. And I'm pretty much at the > point where I'm losing interest in having the same points repeated at > me over and over, as if repetition and volume will persuade me. Sorry. Same here. The discussion is inconclusive. I think it's best to drop it for the time being. Best, Sven From elazarg at gmail.com Mon Oct 17 16:33:29 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 17 Oct 2016 20:33:29 +0000 Subject: [Python-ideas] unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: On Mon, Oct 17, 2016 at 11:13 PM Paul Moore wrote: ... > 2. Can someone summarise the *other* arguments for the proposal? I'm > genuinely struggling to recall what they are (assuming they exist). 
My own argument was uniformity: allowing starred expression in other places, and I claim that the None-aware operator "come out" naturally from this uniformity. I understand that uniformity is not held high as far as decision making goes, and I was also kindly asked not to divert this thread, so I did not repeat it, but the argument is there. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Oct 17 16:35:21 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 21:35:21 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476736234.923614.758866977.3D87D66C@webmail.messagingengine.com> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <1476736234.923614.758866977.3D87D66C@webmail.messagingengine.com> Message-ID: On 17 October 2016 at 21:30, Random832 wrote: > On Mon, Oct 17, 2016, at 16:12, Paul Moore wrote: >> And finally, no-one has even *tried* to explain why we need a third >> way of expressing this construction. Nick made this point, and >> basically got told that his condition was too extreme. He essentially >> got accused of constructing an impossible test. And yet it's an >> entirely fair test, and one that's applied regularly to proposals - >> and many *do* pass the test. > > As the one who made that accusation, my objection was specifically to > the word "always" - which was emphasized - and which is something that I > don't believe is actually a component of the test that is normally > applied. His words, specifically, were "a compelling argument needs to > be presented that the new spelling is *always* preferable to the > existing ones" > > List comprehensions themselves aren't even always preferable to loops. Sigh. And no-one else in this debate has ever used exaggerated language. I have no idea if Nick would reject an argument that had any exceptions at all, but I don't think it's unreasonable to ask that people at least *try* to formulate an argument that demonstrates that the two existing ways we have are inferior to the proposal. Stating that you're not even willing to try is hardly productive. Paul From srkunze at mail.de Mon Oct 17 16:43:56 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 17 Oct 2016 22:43:56 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <5606a57a-1dca-a562-1a1c-7c665b8f3aa5@mail.de> Message-ID: On 17.10.2016 22:26, Paul Moore wrote: > Certainly having to add an import statement is extra typing. But > terseness was *never* a feature of Python. 
In many ways, a resistance > to overly terse (I could say "Perl-like") constructs is one of the > defining features of the language - and certainly, it's one that drew > me to Python, and one that I value. I am completely with you on this one, Paul. The statement about "cumbersomeness" was specific to this whole issue. Of course, importing feature-rich pieces from the stdlib is really cool. It was more the missed ability to do the same with list comprehensions of what is possible with list displays today. List displays feature * without importing anything fancy from the stdlib. Nevermind, it seems we need to wait longer for this issue to come up again and maybe again to solve it eventually. Best, Sven From srkunze at mail.de Mon Oct 17 16:48:11 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 17 Oct 2016 22:48:11 +0200 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: <1597b75b-906b-dd99-4e1e-8acb6af346a0@mail.de> On 17.10.2016 22:31, Paul Moore wrote: > On 17 October 2016 at 21:06, Sven R. Kunze wrote: >> Do you think that simple solution could have a chance to be added to stdlib >> somehow (with the possibility of speeding it up in the future)? > You could submit a doc patch to add an explanation of this technique > to the list.sort function. Is the github repo ready? If so, I will do. > I doubt it's worth a builtin for a 2-line function. Not this 2-line function alone indeed. As my note about speeding it up in the future goes, I thought about an interface which allows people to do easy multisort BUT with the possibility of further optimization by the CPython or other Python implementations. Cheers, Sven From p.f.moore at gmail.com Mon Oct 17 16:49:33 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 21:49:33 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <938d7a83-945d-479f-09df-4c6feeff2a0a@mail.de> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <938d7a83-945d-479f-09df-4c6feeff2a0a@mail.de> Message-ID: On 17 October 2016 at 21:33, Sven R. Kunze wrote: > On 17.10.2016 22:12, Paul Moore wrote: >> >> 4. Whether you choose to believe me or not, I've sincerely tried to >> understand the proposal [...] > > I think you did and I would like others to follow your example. >> >> 2. Can someone summarise the *other* arguments for the proposal? > > I for one think it's just restriction lifting. If that doesn't suffice, > that's okay. Thank you. You're correct that was mentioned. I infer from the responses that it isn't sufficient, but I should have noted it explicitly. Elazar also mentioned consistency, which I had also forgotten. He noted in his comment (and I agree) that consistency isn't a compelling argument in itself. I'd generalise that point and say that theoretical arguments are typically considered secondary to real-world requirements. >> It's worth noting here that we have had >> no real-world use cases, so the common approach of demonstrating real >> code, and showing how the proposal improves it, is not available. > > Sorry? 
You know, I am all for real-world code and I also delivered: > https://mail.python.org/pipermail/python-ideas/2016-October/043030.html > > If it doesn't meet your standards of real-world code, okay. I meets mine. Apologies. I had completely missed that example. Personally, I'd be inclined to argue that you shouldn't try so hard to build the list you want to return in a single statement. You can build the return value using a loop. Or maybe even write a function designed to filter Postgres result sets, which might even be reusable in other parts of your program. It's not my place to tell you how to redesign your code, or to insist that you have to use a particular style, but if I were writing that code, I wouldn't look for an unpacking syntax. >> Also, there's no evidence that this is a common need, and so it's not >> clear to what extent any sort of special language support is >> warranted. We don't (as far as I know, and no-one's provided evidence >> otherwise) see people routinely writing workarounds for this >> construct. > > I do. Every usage of chain.from_iterable is that, well, "workaround". > Workaround is too hard I think. It's more of an inconvenience. If this proposal had been described as "a syntax to replace chain.from_iterable", then it might have been received differently. I doubt it would have succeeded even so, but people would have understood the use case better. For my part, I find the name chain.from_iterable to be non-obvious. But if I needed to use it a lot (I don't!) I'd be more likely to simply come up with a better name, and rename it. Naming is *hard*, but it's worthwhile. One problem (IMO) of the "propose some syntax" approach is that it avoids the issue of thinking up a good name, by throwing symbols at the problem. (And you can't google for a string of symbols...) >> We don't hear of trainers saying that pupils routinely try >> to do this, and are surprised when it doesn't work (I'm specifically >> talking about students *deducing* this behaviour, not being asked if >> they think it's reasonable once explained). > > That's fair. As we see it, trainers deliberately choose to omit some > language features they personally feel uncomfortable with. So, yes, if there > were trainers who routinely reported this, that would be a strong argument > for it. However, the absence of this signal, is not an argument against it > IMHO. No-one is looking for arguments *against* the proposal. Like it or not "status quo wins" is the reality. People need to look for arguments in favour of the proposal. >> I don't see any signs of progress here. And I'm pretty much at the >> point where I'm losing interest in having the same points repeated at >> me over and over, as if repetition and volume will persuade me. Sorry. > > > Same here. The discussion is inconclusive. I think it's best to drop it for > the time being. Thanks for the reasoned response. 
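As an aside, the renaming approach really is a one-liner -- the name "flatten" below is just one possible choice, not something the stdlib provides, and the data is a placeholder:

    from itertools import chain
    flatten = chain.from_iterable

    rows = [[1, 2], [3], [4, 5]]
    flat = [x * 10 for x in flatten(rows)]   # [10, 20, 30, 40, 50]

after which the comprehension form reads about as compactly as the proposed starred syntax.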
Paul From p.f.moore at gmail.com Mon Oct 17 16:50:59 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 21:50:59 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <5606a57a-1dca-a562-1a1c-7c665b8f3aa5@mail.de> Message-ID: On 17 October 2016 at 21:43, Sven R. Kunze wrote: > The statement about "cumbersomeness" was specific to this whole issue. Of > course, importing feature-rich pieces from the stdlib is really cool. It was > more the missed ability to do the same with list comprehensions of what is > possible with list displays today. List displays feature * without importing > anything fancy from the stdlib. In your other post you specifically mentioned itertools.chain.from_iterable. I'd have to agree with you that this specific name feels clumsy to me as well. But I'd argue for finding a better name, not replacing the function with syntax :-) Cheers, Paul From breamoreboy at yahoo.co.uk Mon Oct 17 17:28:55 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 17 Oct 2016 22:28:55 +0100 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: On 17/10/2016 21:31, Paul Moore wrote: > On 17 October 2016 at 21:06, Sven R. Kunze wrote: >> Do you think that simple solution could have a chance to be added to stdlib >> somehow (with the possibility of speeding it up in the future)? > > You could submit a doc patch to add an explanation of this technique > to the list.sort function. I doubt it's worth a builtin for a 2-line > function. > > Paul How about changing https://wiki.python.org/moin/HowTo/Sorting ? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From p.f.moore at gmail.com Mon Oct 17 17:53:45 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 17 Oct 2016 22:53:45 +0100 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: On 17 October 2016 at 22:28, Mark Lawrence via Python-ideas wrote: > How about changing https://wiki.python.org/moin/HowTo/Sorting ? Good point. Better still, https://docs.python.org/3.6/howto/sorting.html Paul From michael at mdupont.com Mon Oct 17 18:11:45 2016 From: michael at mdupont.com (Michael duPont) Date: Mon, 17 Oct 2016 18:11:45 -0400 Subject: [Python-ideas] Conditional Assignment in If Statement Message-ID: <187A0737-994F-4744-BFF0-D3EC320FE4A3@mdupont.com> In the spirit of borrowing from other languages, there?s a particular bit of functionality from Swift that I?ve really wanted to have in Python. To preface, Swift uses var and let (static) when variables are created. It also supports optionals which allows a variable to be either some value or nil (Swift?s version of None). 
This enables the following syntax: if let foo = get_foo() { bar(foo) } In words: if the value returned by get_foo() is not nil, assign it to foo and enter the if block. The variable foo is static and only available within the scope of the if block. The closest thing we have in Python is: foo = get_foo() if foo is not None: bar(foo) However, foo is still available outside the scope of the if block presumably never to be referenced again. We could add ?del foo? to remove it from our outer scope, but this is extra code. What does everyone think about: if foo = get_foo(): bar(foo) as a means to replace: foo = get_foo() if not foo: bar(foo) del foo Might there be some better syntax or a different keyword? I constantly run into this sort of use case. Michael duPont From rosuav at gmail.com Mon Oct 17 18:18:14 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 18 Oct 2016 09:18:14 +1100 Subject: [Python-ideas] Conditional Assignment in If Statement In-Reply-To: <187A0737-994F-4744-BFF0-D3EC320FE4A3@mdupont.com> References: <187A0737-994F-4744-BFF0-D3EC320FE4A3@mdupont.com> Message-ID: On Tue, Oct 18, 2016 at 9:11 AM, Michael duPont wrote: > What does everyone think about: > > if foo = get_foo(): > bar(foo) > > as a means to replace: > > foo = get_foo() > if not foo: > bar(foo) > del foo > > Might there be some better syntax or a different keyword? I constantly run into this sort of use case. I'm pretty sure that syntax is never going to fly, for a variety of reasons (to see most of them, just read up a C style guide). But this syntax has been proposed now and then, analogously with the 'with' statement: if get_foo() as foo: bar(foo) Be careful of your definitions, though. You've said these as equivalent: if foo = get_foo(): bar(foo) foo = get_foo() if foo is not None: bar(foo) foo = get_foo() if not foo: bar(foo) del foo There are three quite different conditions here. Your last two are roughly opposites of each other; but also, most people would expect "if foo = get_foo()" to be the same condition as "if get_foo()", which is not the same as "if get_foo() is not None". The semantics most likely to be accepted would be for "if get_foo() as foo:" to use the standard boolification rules of Python (and then make 'foo' available in both 'if' and 'else' blocks). Would you support that? If so, check out some of the previous threads on the subject - this is far from the first time it's been discussed, and most likely won't be the last. ChrisA From rene at stranden.com Mon Oct 17 18:18:08 2016 From: rene at stranden.com (Rene Nejsum) Date: Tue, 18 Oct 2016 00:18:08 +0200 Subject: [Python-ideas] async objects In-Reply-To: References: <7E14E7AB-5D0D-42D8-900F-398DA4E2483E@stranden.com> <22514.62257.683740.935528@turnbull.sk.tsukuba.ac.jp> <22515.24380.374120.973484@turnbull.sk.tsukuba.ac.jp> <53233b2e-5fa4-3c00-f7e4-c6521c833e4e@mail.de> <2155DE2D-E854-43AE-A7FC-62525FBF7EAA@stranden.com> <1A812A64-8C11-4A10-9103-927790EB0CA8@stranden.com> Message-ID: Your are right about the importance of Python C API, it often goes under my radar. For the past 20 years I have only used it a couple of times (to integrate Python into some existing C-code) therefore it is not as much in focus as it should be and definiatly are by others. I get your innovators dilemma all to well, just look at Python 3 and the time it took us to shift from 2. 
But, watching Larry Hastings talk on his awesome gilectomy project, it was my understanding that he at least saw it as a possibility to do a backward compatible extension of the C-API for his GIL removal project. As I understand he proposes that the Python runtime should check whether a given C-lib has been upgraded to support non-GIL, if not run it as an old version. I am not sure how much it will take in this case, but i thought ?hey, if Larry Hastings is removing the GIL and proposing an extension to the C-api, at least it can be done? :-) /Rene > On 17 Oct 2016, at 20:39, Nathaniel Smith wrote: > > The problem is that if your goal is to make a practical proposal, it's > not enough to look at Python-the-language. You're absolutely right, > AFAICT there's nothing stopping someone from making a nice > implementation of Python-the-language that has erlang-style cheap > shared-nothing threads with some efficient message-passing mechanism. > > But! It turns out that unless your new implementation supports the > CPython C API, then it's almost certainly not viable as a mainstream > CPython alternative, because there's this huge huge pile of libraries > that have been written against that C API. You're not competing > against CPython, you're competing against CPython+thousands of > libraries that you don't have and that your users expect. And > unfortunately, it turns out that the C API locks in a bunch of the > implementation assumptions (refcounting, the GIL, use of the C stack, > poor support for isolation between different interpreter states, ...) > that you were trying to get away from. > > I mean, in many ways it's a good problem to have, that our current > ecosystem is just so attractive that it's hard to compete with! > (Though a pessimist could point out that this difficulty with > competing with yourself is exactly what tends to eventually undermine > incumbents -- cf. the innovator's dilemma.) And it's "just" a matter > of implementation, not Python-the-language itself. But the bottom line > is: this is *the* core problem that you have to grapple with if you > want to make any radical improvements in the Python runtime and have > people actually use them. > > -n > > On Mon, Oct 17, 2016 at 9:36 AM, Rene Nejsum wrote: >> Regarding the Python C-runtime and async, I just had a good talk with >> Kresten Krab at Trifork. He implemented ?Erjang? the Java implementation of >> the Erlang VM (www.erjang.org). Doing this he had access to the Erlang (C) >> VM. >> >> It turn?s out that the Erlang VM and the Python VM has a lot of similarities >> and the differences are more in the language, than in the VM >> >> Differences between the Erlang VM and Python related to async are: >> >> 1) Most variables in Erlang are immutable >> Making it easier to have coroutines >> >> 2) coroutines are built into the Erlang using the ?spawn? keyword >> Leaving the specific implementation to the VM, but never implemented with OS >> threads. >> >> 3) All coroutines have their own heap and stack (initially 200 bytes), but >> can grow as needed >> >> 4) coroutines are managed in ?ready-queue?, from which the VM thread >> executes the next ready job >> Each job gets 2000 ?instructions? (or until IO block) and the next coroutine >> is executed >> >> Because of this, when multicore CPU?s entered the game, it was quite easy to >> change the Erlang VM to add a thread per core to pull from the ready-queue. >> This makes an Erlang program run twice as fast (almost) every time the >> number of cores are doubled! 
>> >> Given this, I am still convinced that: >> >> obj = async SomeObject() >> >> should be feasible, even though there will be some ?golang? like issues >> about shared data, but there could be several ways to handle this. >> >> br >> /Rene >> >> >> On 05 Oct 2016, at 18:06, Nick Coghlan wrote: >> >> On 5 October 2016 at 16:49, Rene Nejsum wrote: >> >> On 04 Oct 2016, at 18:40, Sven R. Kunze wrote: >> I don't think that's actually what I wanted here. One simple keyword should >> have sufficed just like golang did. So, the developer gets a way to decide >> whether or not he needs it blocking or nonblocking **when using a >> function**. He doesn't need to decide it **when writing the function**. >> >> >> I agree, that?s why i proposed to put the async keyword in when creating the >> object, saying in this instance I want asynchronous communication with the >> object. >> >> >> OK, I think there may be a piece of foundational knowledge regarding >> runtime design that's contributing to the confusion here. >> >> Python's core runtime model is the C runtime model: threads (with a >> local stack and access to a global process heap) and processes (which >> contain a heap and one or more threads). Anything else we do (whether >> it's generators, coroutines, or some other form of paused execution >> like callback management) gets layered on top of that runtime model. >> When folks ask questions like "Why can't Python be more like Go?", >> "Why can't Python be more like Erlang?", or "Why can't Python be more >> like Rust?" and get a negative response, it's usually because there's >> an inherent conflict between the C runtime model and whatever piece of >> the Go/Erlang/Rust runtime model we want to steal. >> >> So the "async" keyword in "async def", "async for" and "async with" is >> essentially a marker saying "This is not a C-like runtime concept >> anymore!" (The closest C-ish equivalent I'm aware of would be Apple's >> Grand Central Dispatch in Objective-C and that shows many of the >> async/await characteristics also seen in Python and C#: >> https://www.raywenderlich.com/60749/grand-central-dispatch-in-depth-part-1 >> ) >> >> Go (as with Erlang before it) avoided these problems by not providing >> C-equivalent functions in the first place. Accordingly, *every* normal >> function defined in Go can also be used as a goroutine, rather than >> needing to be a distinct type - their special case is defining >> functions that interoperate with external C libraries. Python (along >> with other languages built on the C runtime model like C# and >> Objective-C) doesn't have that luxury - we need to distinguish >> coroutines from regular functions, since we can't just handle them >> according to the underlying C runtime model any more. >> >> Guido's idea of a shadow thread to let synchronous threads run >> coroutines without needing to actually run a foreground event loop >> should provide a manageable way of getting the two runtime models >> (traditional C and asynchronous coroutines) to play nicely together in >> a single application, and has the virtue of being something folks can >> readily experiment with for themselves before we commit to anything >> specific in the standard library (since all the building blocks of >> thread local storage, event loop management, and inter-thread message >> passing primitives are already available). >> >> Cheers, >> Nick. 
>> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Nathaniel J. Smith -- https://vorpus.org From brett at python.org Mon Oct 17 14:29:44 2016 From: brett at python.org (Brett Cannon) Date: Mon, 17 Oct 2016 18:29:44 +0000 Subject: [Python-ideas] Civility on this mailing list Message-ID: Based on some emails I read in the " unpacking generalisations for list comprehension", I feel like I need to address this entire list about its general behaviour. If you don't follow me on Twitter you may not be aware that I am taking the entire month of October off from volunteering any personal time on Python for my personal well-being (this reply is being done on work time for instance). This stems from my wife pointing out that I had been rather stressed in July and August outside of work in relation to my Python volunteering (having your weekends ruined is never fun). That stress stemmed primarily from two rather bad interactions I had to contend with on the issue track in July and August ... and this mailing list. When I have talked to people about this mailing list it's often referred to by others as the "wild west" of Python development discussions (if you're not familiar with US culture, that turn of phrase basically means "anything goes"). To me that is not a compliment. When I created this list with Titus the goal was to provide a safe place where people could bring up ideas for Python where people could quickly provide basic feedback so people could know whether there was any chance that python-dev would consider the proposal. This was meant to be a win for proposers by not feeling like they were wasting python-dev's time and a win for python-dev by keeping that list focused on the development of Python and not fielding every idea that people want to propose. And while this list has definitely helped with the cognitive load on python-dev, it has not always provided a safe place for people to express ideas. I have seen people completely dismiss people's expertise and opinion. There has been name calling and yelling at people (which is always unnecessary). There have been threads that have completely derailed itself and gone entirely off-topic. IOW I would not hold this mailing list up as an example of the general discourse that I experience elsewhere within the community. Now I realize that we are all human beings coming from different cultural backgrounds and lives. We all have bad days and may not take the time to stop and think about what we are typing before sending it, leading to emails that are worded in a way that can be hurtful to others. It's also easy to forget that various cultures views things differently and so that can lead to people "reading between the lines" a lot and picking up things that were never intended. There are 1,031 people on this mailing list from around the world and it's easy to forget that e.g. Canadian humour may not translate well to Ukrainian culture (or something). 
What this means is it's okay to *nicely* say that something bothered you, but also try to give people the benefit of the doubt as you don't know what their day had been like before they wrote that email (I personally don't like the "just mute the thread" approach to dealing with bad actors when the muting is silent as that doesn't help new people who join this mailing list and the first email they see is someone being rude that everyone else didn't see because they muted the thread days ago). As for the off-topic threads, please remember there are 1,031 people on this mailing list (this doesn't count people reading through gmane or Google Groups). Being extremely generous and assuming every person on this list only spends 10 seconds deciding if they care about your email, that's still nearly 3 hours of cumulative time spent on your email. So please be cognisant when you reply, and if you want to have an off-topic conversation, please take it off-list. And finally, as one of the list administrators I am in a position of power when it comes to the rules of this list and the CoC. While I'm one of the judges on when someone has violated the CoC, I purposefully try not to play the role of police to avoid bias and abuse of power. What that means is that I never personally lodge a CoC complaint against anyone. That means that if you feel someone is being abusive here you cannot rely on list admins noticing and doing something about it. If you feel someone has continuously been abusive on this list and violating the CoC then you must email the list admins about it if you wish to see action taken (all communications are kept private among the admins). Now I'm not asking people to email us on every small infraction (as I said above, try to give everyone a break knowing we all have bad days), but if you notice a pattern then you need to speak up if you would like to see something change. When I started my month off I thought that maybe if I only read this mailing list once a week that the frequency would be low enough that I could handle the stress of being both a participant and admin who is ultimately responsible for the behaviour here, but I'm afraid that isn't going to cut it. What I don't think people realize is that I don't take my responsibility as admin lightly; any time anyone acts rudely I take it personally like I somehow failed by letting the atmosphere and discourse on this list become what it is. Because of this I'm afraid I need to mute this mailing list for the rest of my vacation from volunteering in the Python community after I send this email. I personally hope people do take the time to read this email and reflect upon how they conduct themselves on this mailing list -- and maybe on other lists as well -- so that when I attempt to come back in November I don't have to permanent stop being a participant on this list and simply become an admin for this list to prevent complete burn-out for me in the Python community (and I know this last sentence sounds dramatic, but I'm being serious; the irony of receiving the Frank Willison award the same year I'm having to contemplate fundamentally shifting how I engage with the community to not burn out is not lost on me). -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barry at python.org Mon Oct 17 18:45:56 2016 From: barry at python.org (Barry Warsaw) Date: Mon, 17 Oct 2016 18:45:56 -0400 Subject: [Python-ideas] Null coalescing operator References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: <20161017184556.52514b4b@anarchist> On Oct 15, 2016, at 04:10 PM, Nick Coghlan wrote: >Having been previously somewhere between -1 and -0, I've been doing a >lot more data mining and analysis work lately, which has been enough >to shift me to at least +0 and potentially even higher when it comes >to the utility of adding these operators (more on that below). I'm sympathetic to (some of) the goals of PEP 505, as these issues do occasionally annoy me. But I'm not entirely convinced they are common enough or annoying enough to warrant special syntax, and I *really* dislike the introduction of a ? operator for these purposes. I'm also concerned about adopting too much generality muddling up what I think should be a narrowly targeted improvement to readability. The other thing to note is that, while I often use ternary operators for this now, checking against None isn't always the sole conditional. E.g. self.chain = (chain if chain is None or IChain.providedBy(chain) else config.chains[chain]) That being said, null-aware member access (NAMA) would be pretty handy occasionally. I'm less sure about the other forms. For me, the biggest benefit of NAMA is the short-circuiting of chained attribute access. I don't like the operator syntax because I find it less readable (harder for the eye to pick out), and because it isn't a completely obvious operation. But also because I generally want to chase the attributes all-or-nothing. For example, foo.bar.baz.qux but only if all the intermediary attributes resolve to non-Nones. I don't want to have to write foo.?bar.?baz.?qux I tried playing around with new keywords such as 'when' and 'unless', which seem a little nice although not a perfect fit. thing = foo.bar.baz.qux unless None thing = unless None then foo.bar.baz.qux thing = when foo.bar.baz.qux thing = foo.bar.baz.qux when not None I do like the idea of a keyword more than an operator, and disagree that a new keyword can't be introduced until Python 4. That's why we have __future__! Anyway, that's my $0.02. I trust Guido to DTPT (do the Pythonic thing :), even if that means rejecting the PEP. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From greg.ewing at canterbury.ac.nz Mon Oct 17 18:56:35 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 18 Oct 2016 11:56:35 +1300 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161017173219.GC22471@ando.pearwood.info> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: <58055723.7060102@canterbury.ac.nz> Steven D'Aprano wrote: > What I don't believe is: > > (1) that the majority of Python programmers (or even a large minority) > regularly and consistently think of comprehensions as syntactic sugar > for a completely unrolled list display; rather, I expect that they > usually think of them as sugar for a for-loop; You don't have to believe that, because thinking about it as a for-loop works equally well. Without the star, it means "insert each of these things into a list". With the star, it means "unpack each of these things into a list". > In a list comprehension, we expect the invariant that the number of > items produced will equal the number of loops performed. There's a corresponding invariant for list displays: the number of items produced is equal to the number of expressions in the display. But that doesn't hold when the display includes unpacking, for obvious reasons. For the same reasons, we shouldn't expect it to hold for comprehensions with unpacking. -- Greg From rene at stranden.com Mon Oct 17 19:10:12 2016 From: rene at stranden.com (Rene Nejsum) Date: Tue, 18 Oct 2016 01:10:12 +0200 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: References: Message-ID: Dear Brett/ I have been reading the python-idea archive from time to time over the past years and I joined the list about a month ago to promote my ?crazy? async object idea. I did fear the response to a newcomer with an unlikely idea, but I must say the *everyone* has been extremely nice, writing often long answer to discussions and trying to understand where I?m coming from with this idea. And it definitely made me try to think a little extra before sending responses ? I did also raise an eye-brow when reading some of the comments in the thread you mentioned, they seam a little out of touch with my experience on other threads here. Hope some time off will do you good, my best advice to you and others is something that have helped me, in similar situations is the old saying ?Other peoples opinion of you, are none of your business? :-) It took me some years to get it, but now it helps me every time i get worked up about something another person says to me or about me. best /Rene > On 17 Oct 2016, at 20:29, Brett Cannon wrote: > > > Based on some emails I read in the " unpacking generalisations for list comprehension", I feel like I need to address this entire list about its general behaviour. > > If you don't follow me on Twitter you may not be aware that I am taking the entire month of October off from volunteering any personal time on Python for my personal well-being (this reply is being done on work time for instance). 
This stems from my wife pointing out that I had been rather stressed in July and August outside of work in relation to my Python volunteering (having your weekends ruined is never fun). That stress stemmed primarily from two rather bad interactions I had to contend with on the issue track in July and August ... and this mailing list. > > When I have talked to people about this mailing list it's often referred to by others as the "wild west" of Python development discussions (if you're not familiar with US culture, that turn of phrase basically means "anything goes"). To me that is not a compliment. When I created this list with Titus the goal was to provide a safe place where people could bring up ideas for Python where people could quickly provide basic feedback so people could know whether there was any chance that python-dev would consider the proposal. This was meant to be a win for proposers by not feeling like they were wasting python-dev's time and a win for python-dev by keeping that list focused on the development of Python and not fielding every idea that people want to propose. > > And while this list has definitely helped with the cognitive load on python-dev, it has not always provided a safe place for people to express ideas. I have seen people completely dismiss people's expertise and opinion. There has been name calling and yelling at people (which is always unnecessary). There have been threads that have completely derailed itself and gone entirely off-topic. IOW I would not hold this mailing list up as an example of the general discourse that I experience elsewhere within the community. > > Now I realize that we are all human beings coming from different cultural backgrounds and lives. We all have bad days and may not take the time to stop and think about what we are typing before sending it, leading to emails that are worded in a way that can be hurtful to others. It's also easy to forget that various cultures views things differently and so that can lead to people "reading between the lines" a lot and picking up things that were never intended. There are 1,031 people on this mailing list from around the world and it's easy to forget that e.g. Canadian humour may not translate well to Ukrainian culture (or something). What this means is it's okay to nicely say that something bothered you, but also try to give people the benefit of the doubt as you don't know what their day had been like before they wrote that email (I personally don't like the "just mute the thread" approach to dealing with bad actors when the muting is silent as that doesn't help new people who join this mailing list and the first email they see is someone being rude that everyone else didn't see because they muted the thread days ago). > > As for the off-topic threads, please remember there are 1,031 people on this mailing list (this doesn't count people reading through gmane or Google Groups). Being extremely generous and assuming every person on this list only spends 10 seconds deciding if they care about your email, that's still nearly 3 hours of cumulative time spent on your email. So please be cognisant when you reply, and if you want to have an off-topic conversation, please take it off-list. > > And finally, as one of the list administrators I am in a position of power when it comes to the rules of this list and the CoC. While I'm one of the judges on when someone has violated the CoC, I purposefully try not to play the role of police to avoid bias and abuse of power. 
What that means is that I never personally lodge a CoC complaint against anyone. That means that if you feel someone is being abusive here you cannot rely on list admins noticing and doing something about it. If you feel someone has continuously been abusive on this list and violating the CoC then you must email the list admins about it if you wish to see action taken (all communications are kept private among the admins). Now I'm not asking people to email us on every small infraction (as I said above, try to give everyone a break knowing we all have bad days), but if you notice a pattern then you need to speak up if you would like to see something change. > > When I started my month off I thought that maybe if I only read this mailing list once a week that the frequency would be low enough that I could handle the stress of being both a participant and admin who is ultimately responsible for the behaviour here, but I'm afraid that isn't going to cut it. What I don't think people realize is that I don't take my responsibility as admin lightly; any time anyone acts rudely I take it personally like I somehow failed by letting the atmosphere and discourse on this list become what it is. Because of this I'm afraid I need to mute this mailing list for the rest of my vacation from volunteering in the Python community after I send this email. I personally hope people do take the time to read this email and reflect upon how they conduct themselves on this mailing list -- and maybe on other lists as well -- so that when I attempt to come back in November I don't have to permanent stop being a participant on this list and simply become an admin for this list to prevent complete burn-out for me in the Python community (and I know this last sentence sounds dramatic, but I'm being serious; the irony of receiving the Frank Willison award the same year I'm having to contemplate fundamentally shifting how I engage with the community to not burn out is not lost on me). > > -Brett > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmludo at gmail.com Mon Oct 17 19:10:09 2016 From: gmludo at gmail.com (Ludovic Gasc) Date: Tue, 18 Oct 2016 01:10:09 +0200 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: References: Message-ID: Hi Brett, +10 for the code of conduct, first step to help people to improve their behaviour themselves. Maybe the situation might be the result that Python is more and more mainstream: like a start-up that grows too much to integrate correctly new people hired, we might face to the same issue, without the money incentive to motivate people to work together. I've no magic suggestion to improve the situation, it's the responsibility of each participant to do an introspection about his own behaviour. My personal tip to have a better public behaviour on mailing-lists: When I feel to have internal emotions about a discussion, I try now to wait at least one day to answer, to sleep before to reread and to send my response. It isn't a silver bullet, especially with a provocative discussion, but, at least, I've the feeling that it's better for everybody, including me, to reduce the escalation effect. I don't know you, but I hope the situation will be better for you in the future, each person in the community is important. 
Have a nice week. -- Ludovic Gasc (GMLudo) http://www.gmludo.eu/ 2016-10-17 20:29 GMT+02:00 Brett Cannon : > > Based on some emails I read in the " unpacking generalisations for list > comprehension", I feel like I need to address this entire list about its > general behaviour. > > If you don't follow me on Twitter you may not be aware that I am taking > the entire month of October off from volunteering any personal time on > Python for my personal well-being (this reply is being done on work time > for instance). This stems from my wife pointing out that I had been rather > stressed in July and August outside of work in relation to my Python > volunteering (having your weekends ruined is never fun). That stress > stemmed primarily from two rather bad interactions I had to contend with on > the issue track in July and August ... and this mailing list. > > When I have talked to people about this mailing list it's often referred > to by others as the "wild west" of Python development discussions (if > you're not familiar with US culture, that turn of phrase basically means > "anything goes"). To me that is not a compliment. When I created this list > with Titus the goal was to provide a safe place where people could bring up > ideas for Python where people could quickly provide basic feedback so > people could know whether there was any chance that python-dev would > consider the proposal. This was meant to be a win for proposers by not > feeling like they were wasting python-dev's time and a win for python-dev > by keeping that list focused on the development of Python and not fielding > every idea that people want to propose. > > And while this list has definitely helped with the cognitive load on > python-dev, it has not always provided a safe place for people to express > ideas. I have seen people completely dismiss people's expertise and > opinion. There has been name calling and yelling at people (which is always > unnecessary). There have been threads that have completely derailed itself > and gone entirely off-topic. IOW I would not hold this mailing list up as > an example of the general discourse that I experience elsewhere within the > community. > > Now I realize that we are all human beings coming from different cultural > backgrounds and lives. We all have bad days and may not take the time to > stop and think about what we are typing before sending it, leading to > emails that are worded in a way that can be hurtful to others. It's also > easy to forget that various cultures views things differently and so that > can lead to people "reading between the lines" a lot and picking up things > that were never intended. There are 1,031 people on this mailing list from > around the world and it's easy to forget that e.g. Canadian humour may not > translate well to Ukrainian culture (or something). What this means is it's > okay to *nicely* say that something bothered you, but also try to give > people the benefit of the doubt as you don't know what their day had been > like before they wrote that email (I personally don't like the "just mute > the thread" approach to dealing with bad actors when the muting is silent > as that doesn't help new people who join this mailing list and the first > email they see is someone being rude that everyone else didn't see because > they muted the thread days ago). > > As for the off-topic threads, please remember there are 1,031 people on > this mailing list (this doesn't count people reading through gmane or > Google Groups). 
Being extremely generous and assuming every person on this > list only spends 10 seconds deciding if they care about your email, that's > still nearly 3 hours of cumulative time spent on your email. So please be > cognisant when you reply, and if you want to have an off-topic > conversation, please take it off-list. > > And finally, as one of the list administrators I am in a position of power > when it comes to the rules of this list and the CoC. While I'm one of the > judges on when someone has violated the CoC, I purposefully try not to play > the role of police to avoid bias and abuse of power. What that means is > that I never personally lodge a CoC complaint against anyone. That means > that if you feel someone is being abusive here you cannot rely on list > admins noticing and doing something about it. If you feel someone has > continuously been abusive on this list and violating the CoC then you must > email the list admins about it if you wish to see action taken (all > communications are kept private among the admins). Now I'm not asking > people to email us on every small infraction (as I said above, try to give > everyone a break knowing we all have bad days), but if you notice a pattern > then you need to speak up if you would like to see something change. > > When I started my month off I thought that maybe if I only read this > mailing list once a week that the frequency would be low enough that I > could handle the stress of being both a participant and admin who is > ultimately responsible for the behaviour here, but I'm afraid that isn't > going to cut it. What I don't think people realize is that I don't take my > responsibility as admin lightly; any time anyone acts rudely I take it > personally like I somehow failed by letting the atmosphere and discourse on > this list become what it is. Because of this I'm afraid I need to mute this > mailing list for the rest of my vacation from volunteering in the Python > community after I send this email. I personally hope people do take the > time to read this email and reflect upon how they conduct themselves on > this mailing list -- and maybe on other lists as well -- so that when I > attempt to come back in November I don't have to permanent stop being a > participant on this list and simply become an admin for this list to > prevent complete burn-out for me in the Python community (and I know this > last sentence sounds dramatic, but I'm being serious; the irony of > receiving the Frank Willison award the same year I'm having to contemplate > fundamentally shifting how I engage with the community to not burn out is > not lost on me). > > -Brett > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Mon Oct 17 19:17:12 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 18 Oct 2016 12:17:12 +1300 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> Message-ID: <58055BF8.9030606@canterbury.ac.nz> David Mertz wrote: > >>> three_inf = (count(), count(), count()) > >>> comp = (x for x in flatten(three_inf)) > >>> next(comp) > 0 > >>> next(comp) > 1 > > It's hard to see how that won't blow up under the new syntax (i.e. > generally for all infinite sequences). It won't blow up, because * in a generator expression would be equivalent to yield-from. -- Greg From steve at pearwood.info Mon Oct 17 19:35:13 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 18 Oct 2016 10:35:13 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <58051563.7010904@brenbarn.net> References: <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <58051563.7010904@brenbarn.net> Message-ID: <20161017233513.GD22471@ando.pearwood.info> On Mon, Oct 17, 2016 at 11:16:03AM -0700, Brendan Barnwell wrote: > Now, personally, I don't insist on that invariant. I would > certainly like to be able to do more general things in a list > comprehension, I hear you, because I too would like to introduce a variant comprehension that uses a while instead of if. So don't think I'm not sympathetic. But that's not an option, and given the official position on comprehensions, I don't think this should be either. Officially, list comprehensions are not a replacement for general for-loops. Python is not Perl, where we encourage people to write one-liners, nor is it Haskell, where everything is an expression. If you want to do "more general things", use a real for-loop. Comprehensions are targetted at a narrow but important and common set of use-cases. > and many times I have been irritated by the fact that the > one-item-per-loop invariant exists. I'm not sure whether I'm in favor of > this particular syntax, but I'd like to be able to do the kind of things it > allows. But doing them inherently requires breaking the invariant you > describe. That last point is incorrect. 
You already can do the kind of things this thread is about: [*t for t in iterable] # proposed syntax: flatten can be written as: [x for t in iterable for x in t] If you want to apply a function to each element of t: [func(x) for x in [*t for t in iterable]] # proposed syntax can be written today: [func(x) for t in iterable for x in t] If you want to apply a function or expression to t first: [*func(t) for t in iterable] # proposed syntax [*expression for t in iterable] # proposed syntax this too can be written today: [x for t in iterable for x in func(t)] [x for t in iterable for x in expression] You might have an opinion on whether it is better to have an explicit extra loop (more verbose, but less magical) or special syntax (more compact, but more magical), but I don't think this proposal adds anything that cannot be done today. -- Steve From steve at pearwood.info Mon Oct 17 20:49:47 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 18 Oct 2016 11:49:47 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <938d7a83-945d-479f-09df-4c6feeff2a0a@mail.de> References: <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <938d7a83-945d-479f-09df-4c6feeff2a0a@mail.de> Message-ID: <20161018004946.GE22471@ando.pearwood.info> On Mon, Oct 17, 2016 at 10:33:32PM +0200, Sven R. Kunze wrote: > Sorry? You know, I am all for real-world code and I also delivered: > https://mail.python.org/pipermail/python-ideas/2016-October/043030.html Your example shows the proposed: [*(language, text) for language, text in fulltext_tuples if language == 'english'] which can be written as: [x for language, text in fulltext_tuples for x in (language, text) if language == 'english'] which is only ten characters longer. To me, though, there's simply no nice way of writing this: the repetition of "language, text" reads poorly regardless of whether there is a star or no star. If I were doing this more than once, I'd be strongly inclined to invest in a simple helper function to make this more readable: def filter_and_flatten(language, fulltext): for lang, text in fulltext: if lang == language: yield lang yield text filter_and_flatten('english', fulltext_tuples) In some ways, list comprehensions are a trap: their convenience and ease of use for the easy cases lure us into using them when we ought to be using a generator. But that's just my opinion. -- Steve From michael at mdupont.com Mon Oct 17 18:53:09 2016 From: michael at mdupont.com (Michael duPont) Date: Mon, 17 Oct 2016 22:53:09 +0000 Subject: [Python-ideas] Conditional Assignment in If Statement In-Reply-To: References: <187A0737-994F-4744-BFF0-D3EC320FE4A3@mdupont.com> Message-ID: It was not my intention to declare those to be similar, just as a furthering train of thought. I agree that using "as" is a much more Pythonic syntax. I'm sure there was (and will be) some discussion as to whether it should operate like "if foo:" or "if foo is not None:". I'll look a bit further into the archives than I did to find previous discussions. 
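One way to approximate the desired if-scoped binding today, without new syntax, is a small capturing helper. The following is only a sketch; `Capture`, `get_foo()` and `bar()` are made-up names used purely for illustration:

    class Capture:
        """Remember the last value passed through, so it can be tested and then reused."""
        def __call__(self, value):
            self.value = value
            return value

    def get_foo():
        return 42        # stand-in for whatever really produces the value

    def bar(foo):
        print(foo)       # stand-in consumer

    c = Capture()
    if c(get_foo()):     # ordinary truth test, same as "if get_foo():"
        bar(c.value)     # the captured value is available here

This avoids the separate assignment statement, but the helper object itself still lives in the enclosing scope, so it only papers over the scoping concern rather than solving it.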
For now, I'm a fan of: if get_foo() as foo: bar(foo) to replace the "if foo:" version: foo = get_foo() if foo: bar(foo) del foo On Mon, Oct 17, 2016 at 6:18 PM Chris Angelico wrote: > On Tue, Oct 18, 2016 at 9:11 AM, Michael duPont > wrote: > > What does everyone think about: > > > > if foo = get_foo(): > > bar(foo) > > > > as a means to replace: > > > > foo = get_foo() > > if not foo: > > bar(foo) > > del foo > > > > Might there be some better syntax or a different keyword? I constantly > run into this sort of use case. > > I'm pretty sure that syntax is never going to fly, for a variety of > reasons (to see most of them, just read up a C style guide). But this > syntax has been proposed now and then, analogously with the 'with' > statement: > > if get_foo() as foo: > bar(foo) > > Be careful of your definitions, though. You've said these as equivalent: > > if foo = get_foo(): > bar(foo) > > foo = get_foo() > if foo is not None: > bar(foo) > > foo = get_foo() > if not foo: > bar(foo) > del foo > > There are three quite different conditions here. Your last two are > roughly opposites of each other; but also, most people would expect > "if foo = get_foo()" to be the same condition as "if get_foo()", which > is not the same as "if get_foo() is not None". The semantics most > likely to be accepted would be for "if get_foo() as foo:" to use the > standard boolification rules of Python (and then make 'foo' available > in both 'if' and 'else' blocks). Would you support that? If so, check > out some of the previous threads on the subject - this is far from the > first time it's been discussed, and most likely won't be the last. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Oct 17 22:17:13 2016 From: mertz at gnosis.cx (David Mertz) Date: Mon, 17 Oct 2016 19:17:13 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> Message-ID: > > For a more concrete example: > > [*range(x) for x in range(4)] > [*(),*(0,),*(0,1),*(0,1,2)] > [0, 0, 1, 0, 1, 2] > As Paul or someone pointed out, that's a fairly odd thing to do. It's the first time that use case has been mentioned in this thread. It's true you've managed to construct something that isn't done by flatten(). I would have had to think a while to see what you meant by the original if you haven't provided the intermediate interpretations. 
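For what it's worth, a one-level flatten built from itertools does reproduce that particular result; a quick check, using only the standard library:

    from itertools import chain

    # chain.from_iterable joins the inner ranges end to end, one level deep
    result = list(chain.from_iterable(range(x) for x in range(4)))
    assert result == [0, 0, 1, 0, 1, 2]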
Of course, it's *really simple* to spell that in a natural way with existing syntax that isn't confusing like yours: [x for end in range(4) for x in range(end)] There is no possible way to construct something that would use the proposed syntax that can't be expressed more naturally with a nested loop... because it's just confusing syntax sugar for exactly that. Your example looks like some sort of interview quiz question to see if someone knows obscure and unusual syntax. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Oct 17 22:33:46 2016 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 17 Oct 2016 21:33:46 -0500 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: [Sven R. Kunze ] > Indeed. I also didn't know about that detail of reversing. :) Amazing. (Also > welcome to the list, Alireza.) It follows from what the docs say, although I'd agree it may be helpful if the docs explicitly spelled out this consequence (that reverse=True also preserves the original order of equal elements - as the docs say, it's not that the _list_ "is reversed", is that "list elements are sorted as if each comparison were reversed"). > Do you think that simple solution could have a chance to be added to stdlib > somehow (with the possibility of speeding it up in the future)? Well, the sorting "how to" already explains the basic idea. The `.sort()` docs also explain that stability "is helpful for sorting in multiple passes (for example, sort by department, then by salary grade)". I suspect I know too much about this to be of any use in judging what's missing ;-) Speeding it wouldn't be easy - or usually necessary. The obvious "improvement" would do it all in a single `.sort()` invocation. But calling back into Python code to do fancy, multi-step comparisons is so expensive that I expect it would take a large N for saving some additional worst-case O(N*log(N)) sorting steps to repay the cost. If you'd like to play with that, here's a different `multisort()` implementation. Again `specs` is a list of (key_function, True-for-reverse) tuples, most-significant key first. 
And, again, no assumptions are made about what key functions return, and the sort continues to guarantee that only "<" comparisons are made: def _sorter(specs): keyfuncs, reversers = zip(*specs) class Wrapper(object): def __init__(self, obj): self.keys = tuple(f(obj) for f in keyfuncs) def __lt__(x, y): for a, b, r in zip(x.keys, y.keys, reversers): if a < b: return not r if b < a: return r return False # all the keys are equal return Wrapper def multisort(xs, specs): xs.sort(key=_sorter(specs)) From random832 at fastmail.com Mon Oct 17 22:50:55 2016 From: random832 at fastmail.com (Random832) Date: Mon, 17 Oct 2016 22:50:55 -0400 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> Message-ID: <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> On Mon, Oct 17, 2016, at 22:17, David Mertz wrote: > > [*range(x) for x in range(4)] > > As Paul or someone pointed out, that's a fairly odd thing to do. I agree with the specific example of it being an odd thing to do with range, it was just an attempt to illustrate with a concrete example. > It's the first time that use case has been mentioned in this thread. I think that in general the "body involves a subexpression returning an iterable other than the bare loop variable" has been covered before, though it might not have been clear at all times that that was what was being discussed. Frankly, I think it's rare that something of the form "x for x ..." is best written with a comprehension in the first place, and the same would be true for "*x for x..." so I didn't like that some of the translations being discussed only work well for that case. > Of course, it's *really simple* to spell that in a natural way with > existing syntax that isn't confusing like yours: > > [x for end in range(4) for x in range(end)] I feel like I should be honest about something else - I'm always a little bit confused by the ordering for comprehensions involving multiple clauses. For me, it's the fact that: [[a for a in b] for b in ['uvw', 'xyz']] == [['u', 'v', 'w'], ['x', 'y', 'z']] which makes me want to write: [a for a in b for b in ['uvw', 'xyz']] but that's an error, and it actually needs to be [a for b in ['uvw', 'xyz'] for a in b] == ['u', 'v', 'w', 'x', 'y', 'z'] So when this talk of readability issues comes up and the recommended alternative is something that I don't really find readable, it's frustrating. To me this proposal is something that would allow for more things to be expressed without resorting to multi-loop comprehensions. > There is no possible way to construct something that would use the > proposed syntax that can't be expressed more naturally with a nested > loop... because it's just confusing syntax sugar for exactly that. > > Your example looks like some sort of interview quiz question to see if > someone knows obscure and unusual syntax. 
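One mnemonic that can help with the clause order: the for/if clauses of a comprehension appear left to right in exactly the same order as the equivalent nested statements, with only the result expression moved to the front. A small sketch of the correspondence:

    # [a for b in ['uvw', 'xyz'] for a in b] corresponds clause for clause to:
    result = []
    for b in ['uvw', 'xyz']:
        for a in b:
            result.append(a)

    assert result == ['u', 'v', 'w', 'x', 'y', 'z']
    assert result == [a for b in ['uvw', 'xyz'] for a in b]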
From mertz at gnosis.cx Mon Oct 17 23:32:21 2016 From: mertz at gnosis.cx (David Mertz) Date: Mon, 17 Oct 2016 20:32:21 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> Message-ID: On Mon, Oct 17, 2016 at 7:50 PM, Random832 wrote: > On Mon, Oct 17, 2016, at 22:17, David Mertz wrote: > > > [*range(x) for x in range(4)] > > > > As Paul or someone pointed out, that's a fairly odd thing to do. > > I agree with the specific example of it being an odd thing to do with > range, it was just an attempt to illustrate with a concrete example. > It's also easy to construct examples where the hypothetical * syntax can't handle a requirement. E.g. flatten() with levels>1 (yes, of course you can find some way to nest more loops or more comprehensions-within-comprehensions to make it work in some way that still uses the * by force). I feel like I should be honest about something else - I'm always a > little bit confused by the ordering for comprehensions involving > multiple clauses. Me too! I get the order of nested loops in comprehensions wrong about 25% of the time. Then it's a NameError, and I fix it. This is a lot of why I like a utility function like `flatten()` that is pretty much self-documenting. Perhaps a couple other itertools helpers would be nice. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 18 02:10:06 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 18 Oct 2016 16:10:06 +1000 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476726562.888642.758686169.52B9C868@webmail.messagingengine.com> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <1476726562.888642.758686169.52B9C868@webmail.messagingengine.com> Message-ID: On 18 October 2016 at 03:49, Random832 wrote: > On Mon, Oct 17, 2016, at 13:32, Steven D'Aprano wrote: >> This isn't a small change: it requires not >> insignificant changes to people's understanding of what list >> comprehension syntax means and does. 
> > Only if their understanding is limited to a sequence of tokens that it > supposedly expands to [except for all the little differences like > whether a variable actually exists] Hi, I contributed the current list comprehension implementation (when refactoring it for Python 3 to avoid leaking the iteration variable, as requested in PEP 3100 [1]), and "comprehensions are syntactic sugar for a series of nested for and if statements" is precisely my understanding of how they work, and what they mean. It is also how they are frequently explained to new Python users. Directly insulting me and many of the educators who do so much to bring new users to Python by calling our understanding of a construct I implemented (and that you apparently love using) limited, is *not* doing your cause any favours, and is incredibly inappropriate behaviour for this list. Regards, Nick. [1] https://www.python.org/dev/peps/pep-3100/#core-language -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brenbarn at brenbarn.net Tue Oct 18 02:12:13 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Mon, 17 Oct 2016 23:12:13 -0700 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161017233513.GD22471@ando.pearwood.info> References: <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <58051563.7010904@brenbarn.net> <20161017233513.GD22471@ando.pearwood.info> Message-ID: <5805BD3D.6080800@brenbarn.net> On 2016-10-17 16:35, Steven D'Aprano wrote: >> >and many times I have been irritated by the fact that the >> >one-item-per-loop invariant exists. I'm not sure whether I'm in favor of >> >this particular syntax, but I'd like to be able to do the kind of things it >> >allows. But doing them inherently requires breaking the invariant you >> >describe. > That last point is incorrect. You already can do the kind of things this > thread is about: > > [*t for t in iterable] # proposed syntax: flatten > > can be written as: > > [x for t in iterable for x in t] Right, but by "doing those kinds of things" I mean doing them more in a more conise way without an extra level of iteration. (You can "do multiplication" by adding repeatedly, but it's still nice to have multiplication as an operation.) -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From greg.ewing at canterbury.ac.nz Tue Oct 18 02:23:38 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 18 Oct 2016 19:23:38 +1300 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161018004946.GE22471@ando.pearwood.info> References: <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <938d7a83-945d-479f-09df-4c6feeff2a0a@mail.de> <20161018004946.GE22471@ando.pearwood.info> Message-ID: <5805BFEA.3060909@canterbury.ac.nz> Steven D'Aprano wrote: > Your example shows the proposed: > > [*(language, text) for language, text in fulltext_tuples if language == 'english'] > > which can be written as: > > [x for language, text in fulltext_tuples for x in (language, text) if language == 'english'] > > which is only ten characters longer. To me, though, there's simply no > nice way of writing this: the repetition of "language, text" reads > poorly regardless of whether there is a star or no star. I think the ugliness of this particular example has roots in the fact that a tuple rather than an object with named fields is being used, which is going to make *any* piece of code that touches it a bit awkward. If it were a namedtuple, for example, you could write [*t for t in fulltext_tuples if t.language == 'english'] or [x for t in fulltext_tuples if t.language == 'english' for x in t] The latter is a bit unsatisfying, because we are having to make up an arbitrary name 'x' to stand for an element of t. Even though the two elements of t have quite different roles, we can't use names that reflect those roles. Because of that, to my eyes the version with * makes it easier to see what is going on. -- Greg From ncoghlan at gmail.com Tue Oct 18 02:31:25 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 18 Oct 2016 16:31:25 +1000 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> Message-ID: On 18 October 2016 at 13:32, David Mertz wrote: > On Mon, Oct 17, 2016 at 7:50 PM, Random832 wrote: >> I feel like I should be honest about something else - I'm always a >> little bit confused by the ordering for comprehensions involving >> multiple clauses. > > Me too! I get the order of nested loops in comprehensions wrong about 25% of > the time. Then it's a NameError, and I fix it. > > This is a lot of why I like a utility function like `flatten()` that is > pretty much self-documenting. Perhaps a couple other itertools helpers > would be nice. 
This is also one of the main reasons that named generator expression pipelines can sometimes be easier to read than nested comprehensions: incrementally_increasing_ranges = (range(end) for end in itertools.count()) flatten = itertools.chain.from_iterable incrementally_increasing_cycles = flatten(incrementally_increasing_ranges()) Forcing ourselves to come up with a name for the series of values produced by the outer iteration then makes that name available as documentation of our intent for future readers of the code. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Tue Oct 18 02:40:38 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 18 Oct 2016 19:40:38 +1300 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> Message-ID: <5805C3E6.9000505@canterbury.ac.nz> Random832 wrote: > For me, it's the fact that: > [[a for a in b] for b in ['uvw', 'xyz']] == [['u', 'v', 'w'], ['x', 'y', > 'z']] > which makes me want to write: > [a for a in b for b in ['uvw', 'xyz']] You're not alone! Lately I've been becoming convinced that this is the way we should have done it right back at the beginning. But it's far too late to change it now, sadly. Our only hope would be to introduce a new syntax, maybe [a for a in b; for b in ['uvw', 'xyz']] Inserting the semicolons mightn't be such a bad idea, because when doing the reversal it would be necessary to keep any 'if' clauses together with their preceding 'for' clause. So if we wrote [a for a in b if cond(a); for b in things] we could say the rule is that you split at the semicolons and then reverse the clauses. -- Greg From mar77i at mar77i.ch Tue Oct 18 03:37:23 2016 From: mar77i at mar77i.ch (=?UTF-8?Q?Martti_K=C3=BChne?=) Date: Tue, 18 Oct 2016 09:37:23 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> Message-ID: > I feel like I should be honest about something else - I'm always a > little bit confused by the ordering for comprehensions involving > multiple clauses. 
For me, it's the fact that: > [[a for a in b] for b in ['uvw', 'xyz']] == [['u', 'v', 'w'], ['x', 'y', > 'z']] > which makes me want to write: > [a for a in b for b in ['uvw', 'xyz']] > but that's an error, and it actually needs to be > [a for b in ['uvw', 'xyz'] for a in b] == ['u', 'v', 'w', 'x', 'y', 'z'] > > So when this talk of readability issues comes up and the recommended > alternative is something that I don't really find readable, it's > frustrating. To me this proposal is something that would allow for more > things to be expressed without resorting to multi-loop comprehensions. > Thinking about it, though, I ended up exactly where you are now, except that I then thought about where an item would be known, and it seemed to me, yes, an item would be more likely known *after* looping over it rather than before: [bi for bi in before for before in iterable] # why should "before" exist before it is looped over? correctly: [bi for before in iterable for bi in before] it doesn't, it should be declared to the right hand side and only the result is kept over at the left hand edge. On a similar note, with if expressions the picture might look different: [bi for bi in before for before in iterable if before[0] < 3] # ... is bi filtered now or not? correctly: [bi for before in iterable if before[0] < 3 for bi in before] it is filtered very clearly this way. cheers! mar77i From dmoisset at machinalis.com Tue Oct 18 04:01:23 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Tue, 18 Oct 2016 09:01:23 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> Message-ID: On 17 October 2016 at 21:22, Random832 wrote: > > No, it's not. > > For a more concrete example: > > [*range(x) for x in range(4)] > [*(),*(0,),*(0,1),*(0,1,2)] > [0, 0, 1, 0, 1, 2] > > There is simply no way to get there by using flatten(range(4)). the equivalent flatten for that is: flatten(range(x) for x in range(4)) ; flatten has no magic so will not replace a piece of code with two range calls (like your example) for code with one. I see some mention that flatten does not cover all cases; but correct me if I'm wrong with this statement: Any case of [*<expr> for <var> in <seq>] could be replaced with flatten(<expr> for <var> in <seq>). Where flatten is defined as def flatten(it): return (x for subit in it for x in subit) (there is a slight difference where I'm making flatten iterable instead of a list) What perhaps was confusing is that in the case where <expr> and <var> are the same, you can also write flatten(<seq>). So, for me, this feature is something that could be covered with a (new) function with no new syntax required. All you have to learn is that instead of [*...] you use flatten(...) Am I wrong?
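To make that concrete, here is a quick sanity check against the example quoted above (just an illustrative sketch; the expected values are the ones Random832 listed):

    import itertools

    def flatten(it):
        return (x for subit in it for x in subit)

    assert list(flatten(range(x) for x in range(4))) == [0, 0, 1, 0, 1, 2]
    # the same thing, spelled with the stdlib helper:
    assert list(itertools.chain.from_iterable(range(x) for x in range(4))) == [0, 0, 1, 0, 1, 2]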
I keep reading people on both sides saying "flatten is not enough in all cases", and I can't find a counterexample (even for 1 level flatten which is what I used here) PS: or alternatively, flatten = lambda it: list(itertools.chain.from_iterable(it)) # :) PPS: or if you prefer to work with iterators, flatten = itertools.chain.from_iterable -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Oct 18 04:15:27 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 18 Oct 2016 09:15:27 +0100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> Message-ID: On 18 October 2016 at 07:31, Nick Coghlan wrote: > > Forcing ourselves to come up with a name for the series of values > produced by the outer iteration then makes that name available as > documentation of our intent for future readers of the code. This is a key point, that is frequently missed when people propose new "shorter" syntax, or constructs that "reduce indentation levels". Make no mistake, coming up with good names is *hard* (and I have enormous respect for the (often unrecognised) people who come up with intuitive APIs and names for functions). So it's very easy to be tempted by "concise" constructs that offer the option of not naming an operation. But as someone who spends 99% of his time doing maintenance programming, be sure that the people who support your code will thank you for spending time naming your abstractions (and avoiding using constructs like the one proposed here). Paul From random832 at fastmail.com Tue Oct 18 11:23:18 2016 From: random832 at fastmail.com (Random832) Date: Tue, 18 Oct 2016 11:23:18 -0400 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> Message-ID: <1476804198.1148692.759756673.4E51174B@webmail.messagingengine.com> On Tue, Oct 18, 2016, at 04:01, Daniel Moisset wrote: > I see some mention that flatten does not cover all cases; but correct > me if I'm wrong with this statement: > > Any case of [* for in ] could be replaced with > flatten( for in ).
Where flatten is defined as > > def flatten(it): > return [x for for subit in it for x in subit] That is correct - though flatten as previously discussed did not return a list, so list(flatten(...)) was required, though I suppose you could use [*flatten(...)] - my point was that [especially with the list constructor] this is significantly more verbose than the proposed syntax. From rob.cliffe at btinternet.com Tue Oct 18 18:08:20 2016 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 18 Oct 2016 23:08:20 +0100 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: <5805C3E6.9000505@canterbury.ac.nz> References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> Message-ID: On 18/10/2016 07:40, Greg Ewing wrote: > Random832 wrote: >> For me, it's the fact that: >> [[a for a in b] for b in ['uvw', 'xyz']] == [['u', 'v', 'w'], ['x', 'y', >> 'z']] >> which makes me want to write: >> [a for a in b for b in ['uvw', 'xyz']] > > You're not alone! Lately I've been becoming convinced that > this is the way we should have done it right back at the > beginning. Me too. When I first got to grips with the order of loops in a list comprehension, I found it counter-intuitive and jarring. I still have to remind myself each time. I guess different people have different mental models, and may feel differently. The best way I can think of to explain why I feel this way is: If the syntax were [ x for x in alist for alist in list-of-lists ] there is a smooth (or rather, monotonic) gradation from the smallest object (x) to the next biggest object (alist) to the biggest object (list-of-lists), which IMHO is easier to follow. Each object is conceptually zero or one steps from its neighbour. But with the actual syntax [ x for alist in list-of-lists for x in alist ] there is a conceptual hiatus after "x" ("what on earth are alist and list-of-lists, and what have they got do to with x?"). This would be even more marked with more than 2 loops: we jump from the lowest level object to the two highest level objects, and it all seems like a disorienting non-sequitur until the very last loop "joins the dots". You have to identify the "x" at the beginning with the "x" near (but not at!) the end. Instead of (ideally, if not always in practice) reading the expression from left-to-right in one go, your eyes are forced to jump around in order for your brain to assimilate it. A possible alternative syntax might be to follow more closely the for-loop syntax, e.g. [ for alist in list-of-lists: for x in alist: x ] Here the "conceptual jump" between each object and the next is either 1 or 2, which for me makes this a "second best" option. But at least the "conceptual jump" is bounded (regardless of the number of loops), and this syntax has the advantage of familiarity. > But it's far too late to change it now, sadly. Indeed. 
:-( But if I were ruler of the world and could have my own wish-list for Python 4, this (as per the first example) would be on it. Best wishes Rob Cliffe From njs at pobox.com Wed Oct 19 00:38:32 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Oct 2016 21:38:32 -0700 Subject: [Python-ideas] Deterministic iterator cleanup Message-ID: Hi all, I'd like to propose that Python's iterator protocol be enhanced to add a first-class notion of completion / cleanup. This is mostly motivated by thinking about the issues around async generators and cleanup. Unfortunately even though PEP 525 was accepted I found myself unable to stop pondering this, and the more I've pondered the more convinced I've become that the GC hooks added in PEP 525 are really not enough, and that we'll regret it if we stick with them, or at least with them alone :-/. The strategy here is pretty different -- it's an attempt to dig down and make a fundamental improvement to the language that fixes a number of long-standing rough spots, including async generators. The basic concept is relatively simple: just adding a '__iterclose__' method that 'for' loops call upon completion, even if that's via break or exception. But, the overall issue is fairly complicated + iterators have a large surface area across the language, so the text below is pretty long. Mostly I wrote it all out to convince myself that there wasn't some weird showstopper lurking somewhere :-). For a first pass discussion, it probably makes sense to mainly focus on whether the basic concept makes sense? The main rationale is at the top, but the details are there too for those who want them. Also, for *right* now I'm hoping -- probably unreasonably -- to try to get the async iterator parts of the proposal in ASAP, ideally for 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal like this, which I apologize for -- though async generators are provisional in 3.6, so at least in theory changing them is not out of the question.) So again, it might make sense to focus especially on the async parts, which are a pretty small and self-contained part, and treat the rest of the proposal as a longer-term plan provided for context. The comparison to PEP 525 GC hooks comes right after the initial rationale. Anyway, I'll be interested to hear what you think! -n ------------------ Abstract ======== We propose to extend the iterator protocol with a new ``__(a)iterclose__`` slot, which is called automatically on exit from ``(async) for`` loops, regardless of how they exit. This allows for convenient, deterministic cleanup of resources held by iterators without reliance on the garbage collector. This is especially valuable for asynchronous generators. Note on timing ============== In practical terms, the proposal here is divided into two separate parts: the handling of async iterators, which should ideally be implemented ASAP, and the handling of regular iterators, which is a larger but more relaxed project that can't start until 3.7 at the earliest. But since the changes are closely related, and we probably don't want to end up with async iterators and regular iterators diverging in the long run, it seems useful to look at them together. Background and motivation ========================= Python iterables often hold resources which require cleanup. 
For example: ``file`` objects need to be closed; the `WSGI spec `_ adds a ``close`` method on top of the regular iterator protocol and demands that consumers call it at the appropriate time (though forgetting to do so is a `frequent source of bugs `_); and PEP 342 (based on PEP 325) extended generator objects to add a ``close`` method to allow generators to clean up after themselves. Generally, objects that need to clean up after themselves also define a ``__del__`` method to ensure that this cleanup will happen eventually, when the object is garbage collected. However, relying on the garbage collector for cleanup like this causes serious problems in at least two cases: - In Python implementations that do not use reference counting (e.g. PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet many situations require *prompt* cleanup of resources. Delayed cleanup produces problems like crashes due to file descriptor exhaustion, or WSGI timing middleware that collects bogus times. - Async generators (PEP 525) can only perform cleanup under the supervision of the appropriate coroutine runner. ``__del__`` doesn't have access to the coroutine runner; indeed, the coroutine runner might be garbage collected before the generator object. So relying on the garbage collector is effectively impossible without some kind of language extension. (PEP 525 does provide such an extension, but it has a number of limitations that this proposal fixes; see the "alternatives" section below for discussion.) Fortunately, Python provides a standard tool for doing resource cleanup in a more structured way: ``with`` blocks. For example, this code opens a file but relies on the garbage collector to close it:: def read_newline_separated_json(path): for line in open(path): yield json.loads(line) for document in read_newline_separated_json(path): ... and recent versions of CPython will point this out by issuing a ``ResourceWarning``, nudging us to fix it by adding a ``with`` block:: def read_newline_separated_json(path): with open(path) as file_handle: # <-- with block for line in file_handle: yield json.loads(line) for document in read_newline_separated_json(path): # <-- outer for loop ... But there's a subtlety here, caused by the interaction of ``with`` blocks and generators. ``with`` blocks are Python's main tool for managing cleanup, and they're a powerful one, because they pin the lifetime of a resource to the lifetime of a stack frame. But this assumes that someone will take care of cleaning up the stack frame... and for generators, this requires that someone ``close`` them. In this case, adding the ``with`` block *is* enough to shut up the ``ResourceWarning``, but this is misleading -- the file object cleanup here is still dependent on the garbage collector. The ``with`` block will only be unwound when the ``read_newline_separated_json`` generator is closed. If the outer ``for`` loop runs to completion then the cleanup will happen immediately; but if this loop is terminated early by a ``break`` or an exception, then the ``with`` block won't fire until the generator object is garbage collected. The correct solution requires that all *users* of this API wrap every ``for`` loop in its own ``with`` block:: with closing(read_newline_separated_json(path)) as genobj: for document in genobj: ... 
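As a toy illustration of the timing difference (the ``print`` below stands in for real cleanup work such as closing a file)::

    from contextlib import closing

    def toy_gen():
        try:
            yield 1
            yield 2
        finally:
            print("cleanup ran")

    for x in toy_gen():
        break
    # "cleanup ran" appears whenever the abandoned generator happens to be
    # collected -- immediately on CPython, arbitrarily later on PyPy

    with closing(toy_gen()) as g:
        for x in g:
            break
    # "cleanup ran" appears right here, deterministically, when the with
    # block exits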
This gets even worse if we consider the idiom of decomposing a complex pipeline into multiple nested generators:: def read_users(path): with closing(read_newline_separated_json(path)) as gen: for document in gen: yield User.from_json(document) def users_in_group(path, group): with closing(read_users(path)) as gen: for user in gen: if user.group == group: yield user In general if you have N nested generators then you need N+1 ``with`` blocks to clean up 1 file. And good defensive programming would suggest that any time we use a generator, we should assume the possibility that there could be at least one ``with`` block somewhere in its (potentially transitive) call stack, either now or in the future, and thus always wrap it in a ``with``. But in practice, basically nobody does this, because programmers would rather write buggy code than tiresome repetitive code. In simple cases like this there are some workarounds that good Python developers know (e.g. in this simple case it would be idiomatic to pass in a file handle instead of a path and move the resource management to the top level), but in general we cannot avoid the use of ``with``/``finally`` inside of generators, and thus dealing with this problem one way or another. When beauty and correctness fight then beauty tends to win, so it's important to make correct code beautiful. Still, is this worth fixing? Until async generators came along I would have argued yes, but that it was a low priority, since everyone seems to be muddling along okay -- but async generators make it much more urgent. Async generators cannot do cleanup *at all* without some mechanism for deterministic cleanup that people will actually use, and async generators are particularly likely to hold resources like file descriptors. (After all, if they weren't doing I/O, they'd be generators, not async generators.) So we have to do something, and it might as well be a comprehensive fix to the underlying problem. And it's much easier to fix this now when async generators are first rolling out, then it will be to fix it later. The proposal itself is simple in concept: add a ``__(a)iterclose__`` method to the iterator protocol, and have (async) ``for`` loops call it when the loop is exited, even if this occurs via ``break`` or exception unwinding. Effectively, we're taking the current cumbersome idiom (``with`` block + ``for`` loop) and merging them together into a fancier ``for``. This may seem non-orthogonal, but makes sense when you consider that the existence of generators means that ``with`` blocks actually depend on iterator cleanup to work reliably, plus experience showing that iterator cleanup is often a desireable feature in its own right. Alternatives ============ PEP 525 asyncgen hooks ---------------------- PEP 525 proposes a `set of global thread-local hooks managed by new ``sys.{get/set}_asyncgen_hooks()`` functions `_, which allow event loops to integrate with the garbage collector to run cleanup for async generators. In principle, this proposal and PEP 525 are complementary, in the same way that ``with`` blocks and ``__del__`` are complementary: this proposal takes care of ensuring deterministic cleanup in most cases, while PEP 525's GC hooks clean up anything that gets missed. But ``__aiterclose__`` provides a number of advantages over GC hooks alone: - The GC hook semantics aren't part of the abstract async iterator protocol, but are instead restricted `specifically to the async generator concrete type `_. 
If you have an async iterator implemented using a class, like:: class MyAsyncIterator: async def __anext__(): ... then you can't refactor this into an async generator without changing its semantics, and vice-versa. This seems very unpythonic. (It also leaves open the question of what exactly class-based async iterators are supposed to do, given that they face exactly the same cleanup problems as async generators.) ``__aiterclose__``, on the other hand, is defined at the protocol level, so it's duck-type friendly and works for all iterators, not just generators. - Code that wants to work on non-CPython implementations like PyPy cannot in general rely on GC for cleanup. Without ``__aiterclose__``, it's more or less guaranteed that developers who develop and test on CPython will produce libraries that leak resources when used on PyPy. Developers who do want to target alternative implementations will either have to take the defensive approach of wrapping every ``for`` loop in a ``with`` block, or else carefully audit their code to figure out which generators might possibly contain cleanup code and add ``with`` blocks around those only. With ``__aiterclose__``, writing portable code becomes easy and natural. - An important part of building robust software is making sure that exceptions always propagate correctly without being lost. One of the most exciting things about async/await compared to traditional callback-based systems is that instead of requiring manual chaining, the runtime can now do the heavy lifting of propagating errors, making it *much* easier to write robust code. But, this beautiful new picture has one major gap: if we rely on the GC for generator cleanup, then exceptions raised during cleanup are lost. So, again, with ``__aiterclose__``, developers who care about this kind of robustness will either have to take the defensive approach of wrapping every ``for`` loop in a ``with`` block, or else carefully audit their code to figure out which generators might possibly contain cleanup code. ``__aiterclose__`` plugs this hole by performing cleanup in the caller's context, so writing more robust code becomes the path of least resistance. - The WSGI experience suggests that there exist important iterator-based APIs that need prompt cleanup and cannot rely on the GC, even in CPython. For example, consider a hypothetical WSGI-like API based around async/await and async iterators, where a response handler is an async generator that takes request headers + an async iterator over the request body, and yields response headers + the response body. (This is actually the use case that got me interested in async generators in the first place, i.e. this isn't hypothetical.) If we follow WSGI in requiring that child iterators must be closed properly, then without ``__aiterclose__`` the absolute most minimalistic middleware in our system looks something like:: async def noop_middleware(handler, request_header, request_body): async with aclosing(handler(request_body, request_body)) as aiter: async for response_item in aiter: yield response_item Arguably in regular code one can get away with skipping the ``with`` block around ``for`` loops, depending on how confident one is that one understands the internal implementation of the generator. But here we have to cope with arbitrary response handlers, so without ``__aiterclose__``, this ``with`` construction is a mandatory part of every middleware. 
``__aiterclose__`` allows us to eliminate the mandatory boilerplate and an extra level of indentation from every middleware:: async def noop_middleware(handler, request_header, request_body): async for response_item in handler(request_header, request_body): yield response_item So the ``__aiterclose__`` approach provides substantial advantages over GC hooks. This leaves open the question of whether we want a combination of GC hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since the vast majority of generators are iterated over using a ``for`` loop or equivalent, ``__aiterclose__`` handles most situations before the GC has a chance to get involved. The case where GC hooks provide additional value is in code that does manual iteration, e.g.:: agen = fetch_newline_separated_json_from_url(...) while True: document = await type(agen).__anext__(agen) if document["id"] == needle: break # doesn't do 'await agen.aclose()' If we go with the GC-hooks + ``__aiterclose__`` approach, this generator will eventually be cleaned up by GC calling the generator ``__del__`` method, which then will use the hooks to call back into the event loop to run the cleanup code. If we go with the no-GC-hooks approach, this generator will eventually be garbage collected, with the following effects: - its ``__del__`` method will issue a warning that the generator was not closed (similar to the existing "coroutine never awaited" warning). - The underlying resources involved will still be cleaned up, because the generator frame will still be garbage collected, causing it to drop references to any file handles or sockets it holds, and then those objects's ``__del__`` methods will release the actual operating system resources. - But, any cleanup code inside the generator itself (e.g. logging, buffer flushing) will not get a chance to run. The solution here -- as the warning would indicate -- is to fix the code so that it calls ``__aiterclose__``, e.g. by using a ``with`` block:: async with aclosing(fetch_newline_separated_json_from_url(...)) as agen: while True: document = await type(agen).__anext__(agen) if document["id"] == needle: break Basically in this approach, the rule would be that if you want to manually implement the iterator protocol, then it's your responsibility to implement all of it, and that now includes ``__(a)iterclose__``. GC hooks add non-trivial complexity in the form of (a) new global interpreter state, (b) a somewhat complicated control flow (e.g., async generator GC always involves resurrection, so the details of PEP 442 are important), and (c) a new public API in asyncio (``await loop.shutdown_asyncgens()``) that users have to remember to call at the appropriate time. (This last point in particular somewhat undermines the argument that GC hooks provide a safe backup to guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called correctly then I *think* it's possible for generators to be silently discarded without their cleanup code being called; compare this to the ``__aiterclose__``-only approach where in the worst case we still at least get a warning printed. This might be fixable.) All this considered, GC hooks arguably aren't worth it, given that the only people they help are those who want to manually call ``__anext__`` yet don't want to manually call ``__aiterclose__``. But Yury disagrees with me on this :-). And both options are viable. 
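For reference, the ``aclosing()`` wrapper used in the examples above is just the async analogue of ``contextlib.closing()``; a minimal sketch, assuming only that the wrapped object has an ``aclose()`` coroutine method, would be::

    class aclosing:
        def __init__(self, aiterator):
            self._aiterator = aiterator

        async def __aenter__(self):
            return self._aiterator

        async def __aexit__(self, *exc_info):
            # run the iterator's cleanup in the caller's context, whether
            # the block exited normally or via an exception
            await self._aiterator.aclose()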
Always inject resources, and do all cleanup at the top level ------------------------------------------------------------ It was suggested on python-dev (XX find link) that a pattern to avoid these problems is to always pass resources in from above, e.g. ``read_newline_separated_json`` should take a file object rather than a path, with cleanup handled at the top level:: def read_newline_separated_json(file_handle): for line in file_handle: yield json.loads(line) def read_users(file_handle): for document in read_newline_separated_json(file_handle): yield User.from_json(document) with open(path) as file_handle: for user in read_users(file_handle): ... This works well in simple cases; here it lets us avoid the "N+1 ``with`` blocks problem". But unfortunately, it breaks down quickly when things get more complex. Consider if instead of reading from a file, our generator was reading from a streaming HTTP GET request -- while handling redirects and authentication via OAUTH. Then we'd really want the sockets to be managed down inside our HTTP client library, not at the top level. Plus there are other cases where ``finally`` blocks embedded inside generators are important in their own right: db transaction management, emitting logging information during cleanup (one of the major motivating use cases for WSGI ``close``), and so forth. So this is really a workaround for simple cases, not a general solution. More complex variants of __(a)iterclose__ ----------------------------------------- The semantics of ``__(a)iterclose__`` are somewhat inspired by ``with`` blocks, but context managers are more powerful: ``__(a)exit__`` can distinguish between a normal exit versus exception unwinding, and in the case of an exception it can examine the exception details and optionally suppress propagation. ``__(a)iterclose__`` as proposed here does not have these powers, but one can imagine an alternative design where it did. However, this seems like unwarranted complexity: experience suggests that it's common for iterables to have ``close`` methods, and even to have ``__exit__`` methods that call ``self.close()``, but I'm not aware of any common cases that make use of ``__exit__``'s full power. I also can't think of any examples where this would be useful. And it seems unnecessarily confusing to allow iterators to affect flow control by swallowing exceptions -- if you're in a situation where you really want that, then you should probably use a real ``with`` block anyway. Specification ============= This section describes where we want to eventually end up, though there are some backwards compatibility issues that mean we can't jump directly here. A later section describes the transition plan. Guiding principles ------------------ Generally, ``__(a)iterclose__`` implementations should: - be idempotent, - perform any cleanup that is appropriate on the assumption that the iterator will not be used again after ``__(a)iterclose__`` is called. In particular, once ``__(a)iterclose__`` has been called then calling ``__(a)next__`` produces undefined behavior. And generally, any code which starts iterating through an iterable with the intention of exhausting it, should arrange to make sure that ``__(a)iterclose__`` is eventually called, whether or not the iterator is actually exhausted. Changes to iteration -------------------- The core proposal is the change in behavior of ``for`` loops. 
Given this Python code:: for VAR in ITERABLE: LOOP-BODY else: ELSE-BODY we desugar to the equivalent of:: _iter = iter(ITERABLE) _iterclose = getattr(type(_iter), "__iterclose__", lambda _iter: None) try: traditional-for VAR in _iter: LOOP-BODY else: ELSE-BODY finally: _iterclose(_iter) where the "traditional-for statement" here is meant as a shorthand for the classic 3.5-and-earlier ``for`` loop semantics. Besides the top-level ``for`` statement, Python also contains several other places where iterators are consumed. For consistency, these should call ``__iterclose__`` as well using semantics equivalent to the above. This includes: - ``for`` loops inside comprehensions - ``*`` unpacking - functions which accept and fully consume iterables, like ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and others. Changes to async iteration -------------------------- We also make the analogous changes to async iteration constructs, except that the new slot is called ``__aiterclose__``, and it's an async method that gets ``await``\ed. Modifications to basic iterator types ------------------------------------- Generator objects (including those created by generator comprehensions): - ``__iterclose__`` calls ``self.close()`` - ``__del__`` calls ``self.close()`` (same as now), and additionally issues a ``ResourceWarning`` if the generator wasn't exhausted. This warning is hidden by default, but can be enabled for those who want to make sure they aren't inadvertently relying on CPython-specific GC semantics. Async generator objects (including those created by async generator comprehensions): - ``__aiterclose__`` calls ``self.aclose()`` - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been called, since this probably indicates a latent bug, similar to the "coroutine never awaited" warning. QUESTION: should file objects implement ``__iterclose__`` to close the file? On the one hand this would make this change more disruptive; on the other hand people really like writing ``for line in open(...): ...``, and if we get used to iterators taking care of their own cleanup then it might become very weird if files don't. New convenience functions ------------------------- The ``itertools`` module gains a new iterator wrapper that can be used to selectively disable the new ``__iterclose__`` behavior:: # QUESTION: I feel like there might be a better name for this one? class preserve: def __init__(self, iterable): self._it = iter(iterable) def __iter__(self): return self def __next__(self): return next(self._it) def __iterclose__(self): # Swallow __iterclose__ without passing it on pass Example usage (assuming that file objects implement ``__iterclose__``):: with open(...) as handle: # Iterate through the same file twice: for line in itertools.preserve(handle): ... handle.seek(0) for line in itertools.preserve(handle): ... The ``operator`` module gains two new functions, with semantics equivalent to the following:: def iterclose(it): if hasattr(type(it), "__iterclose__"): type(it).__iterclose__(it) async def aiterclose(ait): if hasattr(type(ait), "__aiterclose__"): await type(ait).__aiterclose__(ait) These are particularly useful when implementing the changes in the next section: __iterclose__ implementations for iterator wrappers --------------------------------------------------- Python ships a number of iterator types that act as wrappers around other iterators: ``map``, ``zip``, ``itertools.accumulate``, ``csv.reader``, and others.
These iterators should define a ``__iterclose__`` method which calls ``__iterclose__`` in turn on their underlying iterators. For example, ``map`` could be implemented as:: class map: def __init__(self, fn, *iterables): self._fn = fn self._iters = [iter(iterable) for iterable in iterables] def __iter__(self): return self def __next__(self): return self._fn(*[next(it) for it in self._iters]) def __iterclose__(self): for it in self._iters: operator.iterclose(it) In some cases this requires some subtlety; for example, ```itertools.tee`` `_ should not call ``__iterclose__`` on the underlying iterator until it has been called on *all* of the clone iterators. Example / Rationale ------------------- The payoff for all this is that we can now write straightforward code like:: def read_newline_separated_json(path): for line in open(path): yield json.loads(line) and be confident that the file will receive deterministic cleanup *without the end-user having to take any special effort*, even in complex cases. For example, consider this silly pipeline:: list(map(lambda key: key.upper(), doc["key"] for doc in read_newline_separated_json(path))) If our file contains a document where ``doc["key"]`` turns out to be an integer, then the following sequence of events will happen: 1. ``key.upper()`` raises an ``AttributeError``, which propagates out of the ``map`` and triggers the implicit ``finally`` block inside ``list``. 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the map object. 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator comprehension object. 4. This injects a ``GeneratorExit`` exception into the generator comprehension body, which is currently suspended inside the comprehension's ``for`` loop body. 5. The exception propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__`` on the generator object representing the call to ``read_newline_separated_json``. 6. This injects an inner ``GeneratorExit`` exception into the body of ``read_newline_separated_json``, currently suspended at the ``yield``. 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__()`` on the file object. 8. The file object is closed. 9. The inner ``GeneratorExit`` resumes propagating, hits the boundary of the generator function, and causes ``read_newline_separated_json``'s ``__iterclose__()`` method to return successfully. 10. Control returns to the generator comprehension body, and the outer ``GeneratorExit`` continues propagating, allowing the comprehension's ``__iterclose__()`` to return successfully. 11. The rest of the ``__iterclose__()`` calls unwind without incident, back into the body of ``list``. 12. The original ``AttributeError`` resumes propagating. (The details above assume that we implement ``file.__iterclose__``; if not then add a ``with`` block to ``read_newline_separated_json`` and essentially the same logic goes through.) Of course, from the user's point of view, this can be simplified down to just: 1. ``int.upper()`` raises an ``AttributeError`` 2. The file object is closed. 3. The ``AttributeError`` propagates out of ``list`` So we've accomplished our goal of making this "just work" without the user having to think about it.
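For comparison, getting the same prompt-cleanup guarantee today requires cooperation at both ends -- the ``with`` block inside the generator *and* an explicit ``closing()`` at the call site. A sketch, reusing the names from the example above::

    import json
    from contextlib import closing

    def read_newline_separated_json(path):
        with open(path) as file_handle:
            for line in file_handle:
                yield json.loads(line)

    with closing(read_newline_separated_json(path)) as docs:
        list(map(lambda key: key.upper(),
                 (doc["key"] for doc in docs)))
    # if the AttributeError fires, unwinding the with block closes the
    # generator, which in turn closes the file -- but only because we
    # remembered to write both pieces by hand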
Transition plan =============== While the majority of existing ``for`` loops will continue to produce identical results, the proposed changes will produce backwards-incompatible behavior in some cases. Example:: def read_csv_with_header(lines_iterable): lines_iterator = iter(lines_iterable) for line in lines_iterator: column_names = line.strip().split("\t") break for line in lines_iterator: values = line.strip().split("\t") record = dict(zip(column_names, values)) yield record This code used to be correct, but after this proposal is implemented will require an ``itertools.preserve`` call added to the first ``for`` loop. [QUESTION: currently, if you close a generator and then try to iterate over it then it just raises ``Stop(Async)Iteration``, so code the passes the same generator object to multiple ``for`` loops but forgets to use ``itertools.preserve`` won't see an obvious error -- the second ``for`` loop will just exit immediately. Perhaps it would be better if iterating a closed generator raised a ``RuntimeError``? Note that files don't have this problem -- attempting to iterate a closed file object already raises ``ValueError``.] Specifically, the incompatibility happens when all of these factors come together: - The automatic calling of ``__(a)iterclose__`` is enabled - The iterable did not previously define ``__(a)iterclose__`` - The iterable does now define ``__(a)iterclose__`` - The iterable is re-used after the ``for`` loop exits So the problem is how to manage this transition, and those are the levers we have to work with. First, observe that the only async iterables where we propose to add ``__aiterclose__`` are async generators, and there is currently no existing code using async generators (though this will start changing very soon), so the async changes do not produce any backwards incompatibilities. (There is existing code using async iterators, but using the new async for loop on an old async iterator is harmless, because old async iterators don't have ``__aiterclose__``.) In addition, PEP 525 was accepted on a provisional basis, and async generators are by far the biggest beneficiary of this PEP's proposed changes. Therefore, I think we should strongly consider enabling ``__aiterclose__`` for ``async for`` loops and async generators ASAP, ideally for 3.6.0 or 3.6.1. For the non-async world, things are harder, but here's a potential transition path: In 3.7: Our goal is that existing unsafe code will start emitting warnings, while those who want to opt-in to the future can do that immediately: - We immediately add all the ``__iterclose__`` methods described above. - If ``from __future__ import iterclose`` is in effect, then ``for`` loops and ``*`` unpacking call ``__iterclose__`` as specified above. - If the future is *not* enabled, then ``for`` loops and ``*`` unpacking do *not* call ``__iterclose__``. But they do call some other method instead, e.g. ``__iterclose_warning__``. - Similarly, functions like ``list`` use stack introspection (!!) to check whether their direct caller has ``__future__.iterclose`` enabled, and use this to decide whether to call ``__iterclose__`` or ``__iterclose_warning__``. - For all the wrapper iterators, we also add ``__iterclose_warning__`` methods that forward to the ``__iterclose_warning__`` method of the underlying iterator or iterators. - For generators (and files, if we decide to do that), ``__iterclose_warning__`` is defined to set an internal flag, and other methods on the object are modified to check for this flag. 
If they find the flag set, they issue a ``PendingDeprecationWarning`` to inform the user that in the future this sequence would have led to a use-after-close situation and the user should use ``preserve()``. In 3.8: - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning`` In 3.9: - Enable the ``__future__`` unconditionally and remove all the ``__iterclose_warning__`` stuff. I believe that this satisfies the normal requirements for this kind of transition -- opt-in initially, with warnings targeted precisely to the cases that will be affected, and a long deprecation cycle. Probably the most controversial / risky part of this is the use of stack introspection to make the iterable-consuming functions sensitive to a ``__future__`` setting, though I haven't thought of any situation where it would actually go wrong yet... Acknowledgements ================ Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for helpful discussion on earlier versions of this idea. -- Nathaniel J. Smith -- https://vorpus.org From desmoulinmichel at gmail.com Wed Oct 19 07:29:42 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Wed, 19 Oct 2016 13:29:42 +0200 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: References: Message-ID: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> +1. I read many disagreements, and people being rude and unprofessional on occasions, but nothing that would make me have a bad day, even when I was the target of it. I feel like people are really getting hyper sensitive about communications. While I do prefer talking to calm rational people with a friendly tone, I acknowledge this is not always the case and it's ok if somebody goes overboard from time to time. We are not living in a perfect world, and spending a lot of effort trying to smooth everything out seems overkill to me. On 18/10/2016 at 01:10, Rene Nejsum wrote: > Dear Brett/ > > I have been reading the python-idea archive from time to time over the > past years and I joined the list about a month ago to promote my "crazy" > async object idea. I did fear the response to a newcomer with an > unlikely idea, but I must say the *everyone* has been extremely nice, > writing often long answer to discussions and trying to understand where > I'm coming from with this idea. And it definitely made me try to think a > little extra before sending responses ... > > I did also raise an eye-brow when reading some of the comments in the > thread you mentioned, they seam a little out of touch with my experience > on other threads here. > > Hope some time off will do you good, my best advice to you and others is > something that have helped me, in similar situations is the old saying > "Other peoples opinion of you, are none of your business" :-) It took > me some years to get it, but now it helps me every time i get worked up > about something another person says to me or about me. > > best > /Rene > > > >> On 17 Oct 2016, at 20:29, Brett Cannon > > wrote: >> >> >> Based on some emails I read in the "unpacking generalisations for >> list comprehension", I feel like I need to address this entire list >> about its general behaviour. >> >> If you don't follow me on Twitter you may not be aware that I am >> taking the entire month of October off from volunteering any personal >> time on Python for my personal well-being (this reply is being done on >> work time for instance).
This stems from my wife pointing out that I >> had been rather stressed in July and August outside of work in >> relation to my Python volunteering (having your weekends ruined is >> never fun). That stress stemmed primarily from two rather bad >> interactions I had to contend with on the issue track in July and >> August ... and this mailing list. >> >> When I have talked to people about this mailing list it's often >> referred to by others as the "wild west" of Python development >> discussions (if you're not familiar with US culture, that turn of >> phrase basically means "anything goes"). To me that is not a >> compliment. When I created this list with Titus the goal was to >> provide a safe place where people could bring up ideas for Python >> where people could quickly provide basic feedback so people could know >> whether there was any chance that python-dev would consider the >> proposal. This was meant to be a win for proposers by not feeling like >> they were wasting python-dev's time and a win for python-dev by >> keeping that list focused on the development of Python and not >> fielding every idea that people want to propose. >> >> And while this list has definitely helped with the cognitive load on >> python-dev, it has not always provided a safe place for people to >> express ideas. I have seen people completely dismiss people's >> expertise and opinion. There has been name calling and yelling at >> people (which is always unnecessary). There have been threads that >> have completely derailed itself and gone entirely off-topic. IOW I >> would not hold this mailing list up as an example of the general >> discourse that I experience elsewhere within the community. >> >> Now I realize that we are all human beings coming from different >> cultural backgrounds and lives. We all have bad days and may not take >> the time to stop and think about what we are typing before sending it, >> leading to emails that are worded in a way that can be hurtful to >> others. It's also easy to forget that various cultures views things >> differently and so that can lead to people "reading between the lines" >> a lot and picking up things that were never intended. There are 1,031 >> people on this mailing list from around the world and it's easy to >> forget that e.g. Canadian humour may not translate well to Ukrainian >> culture (or something). What this means is it's okay to *nicely* say >> that something bothered you, but also try to give people the benefit >> of the doubt as you don't know what their day had been like before >> they wrote that email (I personally don't like the "just mute the >> thread" approach to dealing with bad actors when the muting is silent >> as that doesn't help new people who join this mailing list and the >> first email they see is someone being rude that everyone else didn't >> see because they muted the thread days ago). >> >> As for the off-topic threads, please remember there are 1,031 people >> on this mailing list (this doesn't count people reading through gmane >> or Google Groups). Being extremely generous and assuming every person >> on this list only spends 10 seconds deciding if they care about your >> email, that's still nearly 3 hours of cumulative time spent on your >> email. So please be cognisant when you reply, and if you want to have >> an off-topic conversation, please take it off-list. >> >> And finally, as one of the list administrators I am in a position of >> power when it comes to the rules of this list and the CoC. 
While I'm >> one of the judges on when someone has violated the CoC, I purposefully >> try not to play the role of police to avoid bias and abuse of power. >> What that means is that I never personally lodge a CoC complaint >> against anyone. That means that if you feel someone is being abusive >> here you cannot rely on list admins noticing and doing something about >> it. If you feel someone has continuously been abusive on this list and >> violating the CoC then you must email the list admins about it if you >> wish to see action taken (all communications are kept private among >> the admins). Now I'm not asking people to email us on every small >> infraction (as I said above, try to give everyone a break knowing we >> all have bad days), but if you notice a pattern then you need to speak >> up if you would like to see something change. >> >> When I started my month off I thought that maybe if I only read this >> mailing list once a week that the frequency would be low enough that I >> could handle the stress of being both a participant and admin who is >> ultimately responsible for the behaviour here, but I'm afraid that >> isn't going to cut it. What I don't think people realize is that I >> don't take my responsibility as admin lightly; any time anyone acts >> rudely I take it personally like I somehow failed by letting the >> atmosphere and discourse on this list become what it is. Because of >> this I'm afraid I need to mute this mailing list for the rest of my >> vacation from volunteering in the Python community after I send this >> email. I personally hope people do take the time to read this email >> and reflect upon how they conduct themselves on this mailing list -- >> and maybe on other lists as well -- so that when I attempt to come >> back in November I don't have to permanent stop being a participant on >> this list and simply become an admin for this list to prevent complete >> burn-out for me in the Python community (and I know this last sentence >> sounds dramatic, but I'm being serious; the irony of receiving the >> Frank Willison award the same year I'm having to contemplate >> fundamentally shifting how I engage with the community to not burn out >> is not lost on me). >> >> -Brett >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From vxgmichel at gmail.com Wed Oct 19 07:24:50 2016 From: vxgmichel at gmail.com (Vincent Michel) Date: Wed, 19 Oct 2016 13:24:50 +0200 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: <58075802.3080402@gmail.com> Thanks Nathaniel for this great proposal. As I went through your mail, I realized all the comments I wanted to make were already covered in later paragraphs. And I don't think there's a single point I disagree with. I don't have a strong opinion about the synchronous part of the proposal. I actually wouldn't mind the disparity between asynchronous and synchronous iterators if '__aiterclose__' were to be accepted and '__iterclose__' rejected. However, I would like very much to see the asynchronous part happening in python 3.6. 
I can add another example for the reference: aioreactive (a fresh implementation of Rx for asyncio) is planning to handle subscriptions to a producer using a context manager: https://github.com/dbrattli/aioreactive#subscriptions-are-async-iterables async with listen(xs) as ys: async for x in ys: do_something(x) Like the proposal points out, this happens in the *user* code. With '__aiterclose__', the former example could be simplified as: async for x in listen(xs): do_something(x) Or even better: async for x in xs: do_something(x) Cheers, /Vincent On 10/19/2016 06:38 AM, Nathaniel Smith wrote: > Hi all, > > I'd like to propose that Python's iterator protocol be enhanced to add > a first-class notion of completion / cleanup. > > This is mostly motivated by thinking about the issues around async > generators and cleanup. Unfortunately even though PEP 525 was accepted > I found myself unable to stop pondering this, and the more I've > pondered the more convinced I've become that the GC hooks added in PEP > 525 are really not enough, and that we'll regret it if we stick with > them, or at least with them alone :-/. The strategy here is pretty > different -- it's an attempt to dig down and make a fundamental > improvement to the language that fixes a number of long-standing rough > spots, including async generators. > > The basic concept is relatively simple: just adding a '__iterclose__' > method that 'for' loops call upon completion, even if that's via break > or exception. But, the overall issue is fairly complicated + iterators > have a large surface area across the language, so the text below is > pretty long. Mostly I wrote it all out to convince myself that there > wasn't some weird showstopper lurking somewhere :-). For a first pass > discussion, it probably makes sense to mainly focus on whether the > basic concept makes sense? The main rationale is at the top, but the > details are there too for those who want them. > > Also, for *right* now I'm hoping -- probably unreasonably -- to try to > get the async iterator parts of the proposal in ASAP, ideally for > 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal > like this, which I apologize for -- though async generators are > provisional in 3.6, so at least in theory changing them is not out of > the question.) So again, it might make sense to focus especially on > the async parts, which are a pretty small and self-contained part, and > treat the rest of the proposal as a longer-term plan provided for > context. The comparison to PEP 525 GC hooks comes right after the > initial rationale. > > Anyway, I'll be interested to hear what you think! > > -n > > ------------------ > > Abstract > ======== > > We propose to extend the iterator protocol with a new > ``__(a)iterclose__`` slot, which is called automatically on exit from > ``(async) for`` loops, regardless of how they exit. This allows for > convenient, deterministic cleanup of resources held by iterators > without reliance on the garbage collector. This is especially valuable > for asynchronous generators. > > > Note on timing > ============== > > In practical terms, the proposal here is divided into two separate > parts: the handling of async iterators, which should ideally be > implemented ASAP, and the handling of regular iterators, which is a > larger but more relaxed project that can't start until 3.7 at the > earliest. 
But since the changes are closely related, and we probably > don't want to end up with async iterators and regular iterators > diverging in the long run, it seems useful to look at them together. > > > Background and motivation > ========================= > > Python iterables often hold resources which require cleanup. For > example: ``file`` objects need to be closed; the `WSGI spec > `_ adds a ``close`` method > on top of the regular iterator protocol and demands that consumers > call it at the appropriate time (though forgetting to do so is a > `frequent source of bugs > `_); > and PEP 342 (based on PEP 325) extended generator objects to add a > ``close`` method to allow generators to clean up after themselves. > > Generally, objects that need to clean up after themselves also define > a ``__del__`` method to ensure that this cleanup will happen > eventually, when the object is garbage collected. However, relying on > the garbage collector for cleanup like this causes serious problems in > at least two cases: > > - In Python implementations that do not use reference counting (e.g. > PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet > many situations require *prompt* cleanup of resources. Delayed cleanup > produces problems like crashes due to file descriptor exhaustion, or > WSGI timing middleware that collects bogus times. > > - Async generators (PEP 525) can only perform cleanup under the > supervision of the appropriate coroutine runner. ``__del__`` doesn't > have access to the coroutine runner; indeed, the coroutine runner > might be garbage collected before the generator object. So relying on > the garbage collector is effectively impossible without some kind of > language extension. (PEP 525 does provide such an extension, but it > has a number of limitations that this proposal fixes; see the > "alternatives" section below for discussion.) > > Fortunately, Python provides a standard tool for doing resource > cleanup in a more structured way: ``with`` blocks. For example, this > code opens a file but relies on the garbage collector to close it:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > for document in read_newline_separated_json(path): > ... > > and recent versions of CPython will point this out by issuing a > ``ResourceWarning``, nudging us to fix it by adding a ``with`` block:: > > def read_newline_separated_json(path): > with open(path) as file_handle: # <-- with block > for line in file_handle: > yield json.loads(line) > > for document in read_newline_separated_json(path): # <-- outer for loop > ... > > But there's a subtlety here, caused by the interaction of ``with`` > blocks and generators. ``with`` blocks are Python's main tool for > managing cleanup, and they're a powerful one, because they pin the > lifetime of a resource to the lifetime of a stack frame. But this > assumes that someone will take care of cleaning up the stack frame... > and for generators, this requires that someone ``close`` them. > > In this case, adding the ``with`` block *is* enough to shut up the > ``ResourceWarning``, but this is misleading -- the file object cleanup > here is still dependent on the garbage collector. The ``with`` block > will only be unwound when the ``read_newline_separated_json`` > generator is closed. 
If the outer ``for`` loop runs to completion then > the cleanup will happen immediately; but if this loop is terminated > early by a ``break`` or an exception, then the ``with`` block won't > fire until the generator object is garbage collected. > > The correct solution requires that all *users* of this API wrap every > ``for`` loop in its own ``with`` block:: > > with closing(read_newline_separated_json(path)) as genobj: > for document in genobj: > ... > > This gets even worse if we consider the idiom of decomposing a complex > pipeline into multiple nested generators:: > > def read_users(path): > with closing(read_newline_separated_json(path)) as gen: > for document in gen: > yield User.from_json(document) > > def users_in_group(path, group): > with closing(read_users(path)) as gen: > for user in gen: > if user.group == group: > yield user > > In general if you have N nested generators then you need N+1 ``with`` > blocks to clean up 1 file. And good defensive programming would > suggest that any time we use a generator, we should assume the > possibility that there could be at least one ``with`` block somewhere > in its (potentially transitive) call stack, either now or in the > future, and thus always wrap it in a ``with``. But in practice, > basically nobody does this, because programmers would rather write > buggy code than tiresome repetitive code. In simple cases like this > there are some workarounds that good Python developers know (e.g. in > this simple case it would be idiomatic to pass in a file handle > instead of a path and move the resource management to the top level), > but in general we cannot avoid the use of ``with``/``finally`` inside > of generators, and thus dealing with this problem one way or another. > When beauty and correctness fight then beauty tends to win, so it's > important to make correct code beautiful. > > Still, is this worth fixing? Until async generators came along I would > have argued yes, but that it was a low priority, since everyone seems > to be muddling along okay -- but async generators make it much more > urgent. Async generators cannot do cleanup *at all* without some > mechanism for deterministic cleanup that people will actually use, and > async generators are particularly likely to hold resources like file > descriptors. (After all, if they weren't doing I/O, they'd be > generators, not async generators.) So we have to do something, and it > might as well be a comprehensive fix to the underlying problem. And > it's much easier to fix this now when async generators are first > rolling out, then it will be to fix it later. > > The proposal itself is simple in concept: add a ``__(a)iterclose__`` > method to the iterator protocol, and have (async) ``for`` loops call > it when the loop is exited, even if this occurs via ``break`` or > exception unwinding. Effectively, we're taking the current cumbersome > idiom (``with`` block + ``for`` loop) and merging them together into a > fancier ``for``. This may seem non-orthogonal, but makes sense when > you consider that the existence of generators means that ``with`` > blocks actually depend on iterator cleanup to work reliably, plus > experience showing that iterator cleanup is often a desireable feature > in its own right. 
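To make the garbage-collector dependence described above concrete, here is a small, self-contained script -- not part of the proposal text, and using a hypothetical filename -- where ``gc.collect()`` stands in for whatever eventually triggers collection on a non-refcounting implementation:

    import gc
    import json

    def read_newline_separated_json(path):
        with open(path) as file_handle:
            for line in file_handle:
                yield json.loads(line)

    gen = read_newline_separated_json("docs.jsonl")  # hypothetical file
    for document in gen:
        break  # early exit: the generator is still suspended at its yield,
               # so the with block inside it has not exited and the file
               # is still open

    del gen       # on CPython, the refcount drop finalizes the generator here
    gc.collect()  # on PyPy or Jython, cleanup waits for something like this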
> > > Alternatives > ============ > > PEP 525 asyncgen hooks > ---------------------- > > PEP 525 proposes a `set of global thread-local hooks managed by new > ``sys.{get/set}_asyncgen_hooks()`` functions > `_, which > allow event loops to integrate with the garbage collector to run > cleanup for async generators. In principle, this proposal and PEP 525 > are complementary, in the same way that ``with`` blocks and > ``__del__`` are complementary: this proposal takes care of ensuring > deterministic cleanup in most cases, while PEP 525's GC hooks clean up > anything that gets missed. But ``__aiterclose__`` provides a number of > advantages over GC hooks alone: > > - The GC hook semantics aren't part of the abstract async iterator > protocol, but are instead restricted `specifically to the async > generator concrete type `_. > If you have an async iterator implemented using a class, like:: > > class MyAsyncIterator: > async def __anext__(): > ... > > then you can't refactor this into an async generator without > changing its semantics, and vice-versa. This seems very unpythonic. > (It also leaves open the question of what exactly class-based async > iterators are supposed to do, given that they face exactly the same > cleanup problems as async generators.) ``__aiterclose__``, on the > other hand, is defined at the protocol level, so it's duck-type > friendly and works for all iterators, not just generators. > > - Code that wants to work on non-CPython implementations like PyPy > cannot in general rely on GC for cleanup. Without ``__aiterclose__``, > it's more or less guaranteed that developers who develop and test on > CPython will produce libraries that leak resources when used on PyPy. > Developers who do want to target alternative implementations will > either have to take the defensive approach of wrapping every ``for`` > loop in a ``with`` block, or else carefully audit their code to figure > out which generators might possibly contain cleanup code and add > ``with`` blocks around those only. With ``__aiterclose__``, writing > portable code becomes easy and natural. > > - An important part of building robust software is making sure that > exceptions always propagate correctly without being lost. One of the > most exciting things about async/await compared to traditional > callback-based systems is that instead of requiring manual chaining, > the runtime can now do the heavy lifting of propagating errors, making > it *much* easier to write robust code. But, this beautiful new picture > has one major gap: if we rely on the GC for generator cleanup, then > exceptions raised during cleanup are lost. So, again, with > ``__aiterclose__``, developers who care about this kind of robustness > will either have to take the defensive approach of wrapping every > ``for`` loop in a ``with`` block, or else carefully audit their code > to figure out which generators might possibly contain cleanup code. > ``__aiterclose__`` plugs this hole by performing cleanup in the > caller's context, so writing more robust code becomes the path of > least resistance. > > - The WSGI experience suggests that there exist important > iterator-based APIs that need prompt cleanup and cannot rely on the > GC, even in CPython. For example, consider a hypothetical WSGI-like > API based around async/await and async iterators, where a response > handler is an async generator that takes request headers + an async > iterator over the request body, and yields response headers + the > response body. 
(This is actually the use case that got me interested > in async generators in the first place, i.e. this isn't hypothetical.) > If we follow WSGI in requiring that child iterators must be closed > properly, then without ``__aiterclose__`` the absolute most > minimalistic middleware in our system looks something like:: > > async def noop_middleware(handler, request_header, request_body): > async with aclosing(handler(request_body, request_body)) as aiter: > async for response_item in aiter: > yield response_item > > Arguably in regular code one can get away with skipping the ``with`` > block around ``for`` loops, depending on how confident one is that one > understands the internal implementation of the generator. But here we > have to cope with arbitrary response handlers, so without > ``__aiterclose__``, this ``with`` construction is a mandatory part of > every middleware. > > ``__aiterclose__`` allows us to eliminate the mandatory boilerplate > and an extra level of indentation from every middleware:: > > async def noop_middleware(handler, request_header, request_body): > async for response_item in handler(request_header, request_body): > yield response_item > > So the ``__aiterclose__`` approach provides substantial advantages > over GC hooks. > > This leaves open the question of whether we want a combination of GC > hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since > the vast majority of generators are iterated over using a ``for`` loop > or equivalent, ``__aiterclose__`` handles most situations before the > GC has a chance to get involved. The case where GC hooks provide > additional value is in code that does manual iteration, e.g.:: > > agen = fetch_newline_separated_json_from_url(...) > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > # doesn't do 'await agen.aclose()' > > If we go with the GC-hooks + ``__aiterclose__`` approach, this > generator will eventually be cleaned up by GC calling the generator > ``__del__`` method, which then will use the hooks to call back into > the event loop to run the cleanup code. > > If we go with the no-GC-hooks approach, this generator will eventually > be garbage collected, with the following effects: > > - its ``__del__`` method will issue a warning that the generator was > not closed (similar to the existing "coroutine never awaited" > warning). > > - The underlying resources involved will still be cleaned up, because > the generator frame will still be garbage collected, causing it to > drop references to any file handles or sockets it holds, and then > those objects's ``__del__`` methods will release the actual operating > system resources. > > - But, any cleanup code inside the generator itself (e.g. logging, > buffer flushing) will not get a chance to run. > > The solution here -- as the warning would indicate -- is to fix the > code so that it calls ``__aiterclose__``, e.g. by using a ``with`` > block:: > > async with aclosing(fetch_newline_separated_json_from_url(...)) as agen: > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > > Basically in this approach, the rule would be that if you want to > manually implement the iterator protocol, then it's your > responsibility to implement all of it, and that now includes > ``__(a)iterclose__``. 
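One small note on the examples above: they use an ``aclosing()`` helper that is not in the standard library as of 3.6, so code written today has to supply it. A minimal sketch of such a helper, mirroring ``contextlib.closing`` (an illustration, not part of the proposal):

    class aclosing:
        """Async analogue of contextlib.closing: await thing.aclose() on exit."""
        def __init__(self, thing):
            self.thing = thing

        async def __aenter__(self):
            return self.thing

        async def __aexit__(self, exc_type, exc_value, traceback):
            # Returning None means exceptions from the body still propagate.
            await self.thing.aclose()

It would be used exactly as in the middleware example: ``async with aclosing(handler(...)) as aiter: ...``.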
> > GC hooks add non-trivial complexity in the form of (a) new global > interpreter state, (b) a somewhat complicated control flow (e.g., > async generator GC always involves resurrection, so the details of PEP > 442 are important), and (c) a new public API in asyncio (``await > loop.shutdown_asyncgens()``) that users have to remember to call at > the appropriate time. (This last point in particular somewhat > undermines the argument that GC hooks provide a safe backup to > guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called > correctly then I *think* it's possible for generators to be silently > discarded without their cleanup code being called; compare this to the > ``__aiterclose__``-only approach where in the worst case we still at > least get a warning printed. This might be fixable.) All this > considered, GC hooks arguably aren't worth it, given that the only > people they help are those who want to manually call ``__anext__`` yet > don't want to manually call ``__aiterclose__``. But Yury disagrees > with me on this :-). And both options are viable. > > > Always inject resources, and do all cleanup at the top level > ------------------------------------------------------------ > > It was suggested on python-dev (XX find link) that a pattern to avoid > these problems is to always pass resources in from above, e.g. > ``read_newline_separated_json`` should take a file object rather than > a path, with cleanup handled at the top level:: > > def read_newline_separated_json(file_handle): > for line in file_handle: > yield json.loads(line) > > def read_users(file_handle): > for document in read_newline_separated_json(file_handle): > yield User.from_json(document) > > with open(path) as file_handle: > for user in read_users(file_handle): > ... > > This works well in simple cases; here it lets us avoid the "N+1 > ``with`` blocks problem". But unfortunately, it breaks down quickly > when things get more complex. Consider if instead of reading from a > file, our generator was reading from a streaming HTTP GET request -- > while handling redirects and authentication via OAUTH. Then we'd > really want the sockets to be managed down inside our HTTP client > library, not at the top level. Plus there are other cases where > ``finally`` blocks embedded inside generators are important in their > own right: db transaction management, emitting logging information > during cleanup (one of the major motivating use cases for WSGI > ``close``), and so forth. So this is really a workaround for simple > cases, not a general solution. > > > More complex variants of __(a)iterclose__ > ----------------------------------------- > > The semantics of ``__(a)iterclose__`` are somewhat inspired by > ``with`` blocks, but context managers are more powerful: > ``__(a)exit__`` can distinguish between a normal exit versus exception > unwinding, and in the case of an exception it can examine the > exception details and optionally suppress propagation. > ``__(a)iterclose__`` as proposed here does not have these powers, but > one can imagine an alternative design where it did. > > However, this seems like unwarranted complexity: experience suggests > that it's common for iterables to have ``close`` methods, and even to > have ``__exit__`` methods that call ``self.close()``, but I'm not > aware of any common cases that make use of ``__exit__``'s full power. > I also can't think of any examples where this would be useful. 
And it > seems unnecessarily confusing to allow iterators to affect flow > control by swallowing exceptions -- if you're in a situation where you > really want that, then you should probably use a real ``with`` block > anyway. > > > Specification > ============= > > This section describes where we want to eventually end up, though > there are some backwards compatibility issues that mean we can't jump > directly here. A later section describes the transition plan. > > > Guiding principles > ------------------ > > Generally, ``__(a)iterclose__`` implementations should: > > - be idempotent, > - perform any cleanup that is appropriate on the assumption that the > iterator will not be used again after ``__(a)iterclose__`` is called. > In particular, once ``__(a)iterclose__`` has been called then calling > ``__(a)next__`` produces undefined behavior. > > And generally, any code which starts iterating through an iterable > with the intention of exhausting it, should arrange to make sure that > ``__(a)iterclose__`` is eventually called, whether or not the iterator > is actually exhausted. > > > Changes to iteration > -------------------- > > The core proposal is the change in behavior of ``for`` loops. Given > this Python code:: > > for VAR in ITERABLE: > LOOP-BODY > else: > ELSE-BODY > > we desugar to the equivalent of:: > > _iter = iter(ITERABLE) > _iterclose = getattr(type(_iter), "__iterclose__", lambda: None) > try: > traditional-for VAR in _iter: > LOOP-BODY > else: > ELSE-BODY > finally: > _iterclose(_iter) > > where the "traditional-for statement" here is meant as a shorthand for > the classic 3.5-and-earlier ``for`` loop semantics. > > Besides the top-level ``for`` statement, Python also contains several > other places where iterators are consumed. For consistency, these > should call ``__iterclose__`` as well using semantics equivalent to > the above. This includes: > > - ``for`` loops inside comprehensions > - ``*`` unpacking > - functions which accept and fully consume iterables, like > ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and > others. > > > Changes to async iteration > -------------------------- > > We also make the analogous changes to async iteration constructs, > except that the new slot is called ``__aiterclose__``, and it's an > async method that gets ``await``\ed. > > > Modifications to basic iterator types > ------------------------------------- > > Generator objects (including those created by generator comprehensions): > - ``__iterclose__`` calls ``self.close()`` > - ``__del__`` calls ``self.close()`` (same as now), and additionally > issues a ``ResourceWarning`` if the generator wasn't exhausted. This > warning is hidden by default, but can be enabled for those who want to > make sure they aren't inadverdantly relying on CPython-specific GC > semantics. > > Async generator objects (including those created by async generator > comprehensions): > - ``__aiterclose__`` calls ``self.aclose()`` > - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been > called, since this probably indicates a latent bug, similar to the > "coroutine never awaited" warning. > > QUESTION: should file objects implement ``__iterclose__`` to close the > file? On the one hand this would make this change more disruptive; on > the other hand people really like writing ``for line in open(...): > ...``, and if we get used to iterators taking care of their own > cleanup then it might become very weird if files don't. 
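To make the protocol concrete for hand-written iterators (as opposed to generators), here is a sketch of a class-based line reader that owns a file and implements the proposed slot; the class name and details are purely illustrative:

    class LineReader:
        """Iterates over the lines of a file and owns its cleanup."""
        def __init__(self, path):
            self._file = open(path)

        def __iter__(self):
            return self

        def __next__(self):
            line = self._file.readline()
            if not line:
                self.__iterclose__()
                raise StopIteration
            return line

        def __iterclose__(self):
            # Idempotent, per the guiding principles above --
            # file.close() is already safe to call more than once.
            self._file.close()

Under the proposal, ``for line in LineReader(path): ... break`` would close the file at the ``break``, without the caller writing any ``with`` block.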
> > > New convenience functions > ------------------------- > > The ``itertools`` module gains a new iterator wrapper that can be used > to selectively disable the new ``__iterclose__`` behavior:: > > # QUESTION: I feel like there might be a better name for this one? > class preserve(iterable): > def __init__(self, iterable): > self._it = iter(iterable) > > def __iter__(self): > return self > > def __next__(self): > return next(self._it) > > def __iterclose__(self): > # Swallow __iterclose__ without passing it on > pass > > Example usage (assuming that file objects implements ``__iterclose__``):: > > with open(...) as handle: > # Iterate through the same file twice: > for line in itertools.preserve(handle): > ... > handle.seek(0) > for line in itertools.preserve(handle): > ... > > The ``operator`` module gains two new functions, with semantics > equivalent to the following:: > > def iterclose(it): > if hasattr(type(it), "__iterclose__"): > type(it).__iterclose__(it) > > async def aiterclose(ait): > if hasattr(type(ait), "__aiterclose__"): > await type(ait).__aiterclose__(ait) > > These are particularly useful when implementing the changes in the next section: > > > __iterclose__ implementations for iterator wrappers > --------------------------------------------------- > > Python ships a number of iterator types that act as wrappers around > other iterators: ``map``, ``zip``, ``itertools.accumulate``, > ``csv.reader``, and others. These iterators should define a > ``__iterclose__`` method which calls ``__iterclose__`` in turn on > their underlying iterators. For example, ``map`` could be implemented > as:: > > class map: > def __init__(self, fn, *iterables): > self._fn = fn > self._iters = [iter(iterable) for iterable in iterables] > > def __iter__(self): > return self > > def __next__(self): > return self._fn(*[next(it) for it in self._iters]) > > def __iterclose__(self): > for it in self._iters: > operator.iterclose(it) > > In some cases this requires some subtlety; for example, > ```itertools.tee`` > `_ > should not call ``__iterclose__`` on the underlying iterator until it > has been called on *all* of the clone iterators. > > > Example / Rationale > ------------------- > > The payoff for all this is that we can now write straightforward code like:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > and be confident that the file will receive deterministic cleanup > *without the end-user having to take any special effort*, even in > complex cases. For example, consider this silly pipeline:: > > list(map(lambda key: key.upper(), > doc["key"] for doc in read_newline_separated_json(path))) > > If our file contains a document where ``doc["key"]`` turns out to be > an integer, then the following sequence of events will happen: > > 1. ``key.upper()`` raises an ``AttributeError``, which propagates out > of the ``map`` and triggers the implicit ``finally`` block inside > ``list``. > 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the > map object. > 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator > comprehension object. > 4. This injects a ``GeneratorExit`` exception into the generator > comprehension body, which is currently suspended inside the > comprehension's ``for`` loop body. > 5. 
The exception propagates out of the ``for`` loop, triggering the > ``for`` loop's implicit ``finally`` block, which calls > ``__iterclose__`` on the generator object representing the call to > ``read_newline_separated_json``. > 6. This injects an inner ``GeneratorExit`` exception into the body of > ``read_newline_separated_json``, currently suspended at the ``yield``. > 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop, > triggering the ``for`` loop's implicit ``finally`` block, which calls > ``__iterclose__()`` on the file object. > 8. The file object is closed. > 9. The inner ``GeneratorExit`` resumes propagating, hits the boundary > of the generator function, and causes > ``read_newline_separated_json``'s ``__iterclose__()`` method to return > successfully. > 10. Control returns to the generator comprehension body, and the outer > ``GeneratorExit`` continues propagating, allowing the comprehension's > ``__iterclose__()`` to return successfully. > 11. The rest of the ``__iterclose__()`` calls unwind without incident, > back into the body of ``list``. > 12. The original ``AttributeError`` resumes propagating. > > (The details above assume that we implement ``file.__iterclose__``; if > not then add a ``with`` block to ``read_newline_separated_json`` and > essentially the same logic goes through.) > > Of course, from the user's point of view, this can be simplified down to just: > > 1. ``int.upper()`` raises an ``AttributeError`` > 1. The file object is closed. > 2. The ``AttributeError`` propagates out of ``list`` > > So we've accomplished our goal of making this "just work" without the > user having to think about it. > > > Transition plan > =============== > > While the majority of existing ``for`` loops will continue to produce > identical results, the proposed changes will produce > backwards-incompatible behavior in some cases. Example:: > > def read_csv_with_header(lines_iterable): > lines_iterator = iter(lines_iterable) > for line in lines_iterator: > column_names = line.strip().split("\t") > break > for line in lines_iterator: > values = line.strip().split("\t") > record = dict(zip(column_names, values)) > yield record > > This code used to be correct, but after this proposal is implemented > will require an ``itertools.preserve`` call added to the first ``for`` > loop. > > [QUESTION: currently, if you close a generator and then try to iterate > over it then it just raises ``Stop(Async)Iteration``, so code the > passes the same generator object to multiple ``for`` loops but forgets > to use ``itertools.preserve`` won't see an obvious error -- the second > ``for`` loop will just exit immediately. Perhaps it would be better if > iterating a closed generator raised a ``RuntimeError``? Note that > files don't have this problem -- attempting to iterate a closed file > object already raises ``ValueError``.] > > Specifically, the incompatibility happens when all of these factors > come together: > > - The automatic calling of ``__(a)iterclose__`` is enabled > - The iterable did not previously define ``__(a)iterclose__`` > - The iterable does now define ``__(a)iterclose__`` > - The iterable is re-used after the ``for`` loop exits > > So the problem is how to manage this transition, and those are the > levers we have to work with. 
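For concreteness, the fixed version of the example above would look something like this, assuming the proposed ``itertools.preserve`` wrapper (a sketch, not part of the proposal text):

    import itertools

    def read_csv_with_header(lines_iterable):
        lines_iterator = iter(lines_iterable)
        # preserve() stops the first loop from calling __iterclose__ on the
        # shared iterator, so the second loop can continue consuming it.
        for line in itertools.preserve(lines_iterator):
            column_names = line.strip().split("\t")
            break
        for line in lines_iterator:
            values = line.strip().split("\t")
            record = dict(zip(column_names, values))
            yield record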
> > First, observe that the only async iterables where we propose to add > ``__aiterclose__`` are async generators, and there is currently no > existing code using async generators (though this will start changing > very soon), so the async changes do not produce any backwards > incompatibilities. (There is existing code using async iterators, but > using the new async for loop on an old async iterator is harmless, > because old async iterators don't have ``__aiterclose__``.) In > addition, PEP 525 was accepted on a provisional basis, and async > generators are by far the biggest beneficiary of this PEP's proposed > changes. Therefore, I think we should strongly consider enabling > ``__aiterclose__`` for ``async for`` loops and async generators ASAP, > ideally for 3.6.0 or 3.6.1. > > For the non-async world, things are harder, but here's a potential > transition path: > > In 3.7: > > Our goal is that existing unsafe code will start emitting warnings, > while those who want to opt-in to the future can do that immediately: > > - We immediately add all the ``__iterclose__`` methods described above. > - If ``from __future__ import iterclose`` is in effect, then ``for`` > loops and ``*`` unpacking call ``__iterclose__`` as specified above. > - If the future is *not* enabled, then ``for`` loops and ``*`` > unpacking do *not* call ``__iterclose__``. But they do call some other > method instead, e.g. ``__iterclose_warning__``. > - Similarly, functions like ``list`` use stack introspection (!!) to > check whether their direct caller has ``__future__.iterclose`` > enabled, and use this to decide whether to call ``__iterclose__`` or > ``__iterclose_warning__``. > - For all the wrapper iterators, we also add ``__iterclose_warning__`` > methods that forward to the ``__iterclose_warning__`` method of the > underlying iterator or iterators. > - For generators (and files, if we decide to do that), > ``__iterclose_warning__`` is defined to set an internal flag, and > other methods on the object are modified to check for this flag. If > they find the flag set, they issue a ``PendingDeprecationWarning`` to > inform the user that in the future this sequence would have led to a > use-after-close situation and the user should use ``preserve()``. > > In 3.8: > > - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning`` > > In 3.9: > > - Enable the ``__future__`` unconditionally and remove all the > ``__iterclose_warning__`` stuff. > > I believe that this satisfies the normal requirements for this kind of > transition -- opt-in initially, with warnings targeted precisely to > the cases that will be effected, and a long deprecation cycle. > > Probably the most controversial / risky part of this is the use of > stack introspection to make the iterable-consuming functions sensitive > to a ``__future__`` setting, though I haven't thought of any situation > where it would actually go wrong yet... > > > Acknowledgements > ================ > > Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for > helpful discussion on earlier versions of this idea. > From oscar.j.benjamin at gmail.com Wed Oct 19 07:33:18 2016 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 19 Oct 2016 12:33:18 +0100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On 17 October 2016 at 09:08, Nathaniel Smith wrote: > Hi all, Hi Nathaniel. 
I'm just reposting what I wrote on pypy-dev (as requested) but under the assumption that you didn't substantially alter your draft - I apologise if some of the quoted text below has already been edited. > Always inject resources, and do all cleanup at the top level > ------------------------------------------------------------ > > It was suggested on python-dev (XX find link) that a pattern to avoid > these problems is to always pass resources in from above, e.g. > ``read_newline_separated_json`` should take a file object rather than > a path, with cleanup handled at the top level:: I suggested this and I still think that it is the best idea. > def read_newline_separated_json(file_handle): > for line in file_handle: > yield json.loads(line) > > def read_users(file_handle): > for document in read_newline_separated_json(file_handle): > yield User.from_json(document) > > with open(path) as file_handle: > for user in read_users(file_handle): > ... > > This works well in simple cases; here it lets us avoid the "N+1 > problem". But unfortunately, it breaks down quickly when things get > more complex. Consider if instead of reading from a file, our > generator was processing the body returned by an HTTP GET request -- > while handling redirects and authentication via OAUTH. Then we'd > really want the sockets to be managed down inside our HTTP client > library, not at the top level. Plus there are other cases where > ``finally`` blocks embedded inside generators are important in their > own right: db transaction management, emitting logging information > during cleanup (one of the major motivating use cases for WSGI > ``close``), and so forth. I haven't written the kind of code that you're describing so I can't say exactly how I would do it. I imagine though that helpers could be used to solve some of the problems that you're referring to though. Here's a case I do know where the above suggestion is awkward: def concat(filenames): for filename in filenames: with open(filename) as inputfile: yield from inputfile for line in concat(filenames): ... It's still possible to safely handle this use case by creating a helper though. fileinput.input almost does what you want: with fileinput.input(filenames) as lines: for line in lines: ... Unfortunately if filenames is empty this will default to sys.stdin so it's not perfect but really I think introducing useful helpers for common cases (rather than core language changes) should be considered as the obvious solution here. Generally it would have been better if the discussion for PEP 525 has focussed more on helping people to debug/fix dependence on __del__ rather than trying to magically fix broken code. > New convenience functions > ------------------------- > > The ``itertools`` module gains a new iterator wrapper that can be used > to selectively disable the new ``__iterclose__`` behavior:: > > # XX FIXME: I feel like there might be a better name for this one? > class protect(iterable): > def __init__(self, iterable): > self._it = iter(iterable) > > def __iter__(self): > return self > > def __next__(self): > return next(self._it) > > def __iterclose__(self): > # Swallow __iterclose__ without passing it on > pass > > Example usage (assuming that file objects implements ``__iterclose__``):: > > with open(...) as handle: > # Iterate through the same file twice: > for line in itertools.protect(handle): > ... > handle.seek(0) > for line in itertools.protect(handle): > ... 
It would be much simpler to reverse this suggestion and say let's introduce a helper that selectively *enables* the new behaviour you're proposing i.e.: for line in itertools.closeafter(open(...)): ... if not line.startswith('#'): break # <--------------- file gets closed here Then we can leave (async) for loops as they are and there are no backward compatbility problems etc. -- Oscar From oscar.j.benjamin at gmail.com Wed Oct 19 07:39:26 2016 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 19 Oct 2016 12:39:26 +0100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On 19 October 2016 at 12:33, Oscar Benjamin wrote: > >> New convenience functions >> ------------------------- >> >> The ``itertools`` module gains a new iterator wrapper that can be used >> to selectively disable the new ``__iterclose__`` behavior:: >> >> # XX FIXME: I feel like there might be a better name for this one? >> class protect(iterable): >> def __init__(self, iterable): >> self._it = iter(iterable) >> >> def __iter__(self): >> return self >> >> def __next__(self): >> return next(self._it) >> >> def __iterclose__(self): >> # Swallow __iterclose__ without passing it on >> pass >> >> Example usage (assuming that file objects implements ``__iterclose__``):: >> >> with open(...) as handle: >> # Iterate through the same file twice: >> for line in itertools.protect(handle): >> ... >> handle.seek(0) >> for line in itertools.protect(handle): >> ... > > It would be much simpler to reverse this suggestion and say let's > introduce a helper that selectively *enables* the new behaviour you're > proposing i.e.: > > for line in itertools.closeafter(open(...)): > ... > if not line.startswith('#'): > break # <--------------- file gets closed here Looking more closely at this I realise that there is no way to implement closeafter like this without depending on closeafter.__del__ to do the closing. So actually this is not a solution to the problem at all. Sorry for the noise there! -- Oscar From p.f.moore at gmail.com Wed Oct 19 08:24:37 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 19 Oct 2016 13:24:37 +0100 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> References: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> Message-ID: On 19 October 2016 at 12:29, Michel Desmoulin wrote: > I feel like people are really getting hyper sensitive about communications. > While I do prefer talking to calm rational people with a friendly tone, I > acknowledge this is not always the case and it's ok if somebody go overboard > from time to time. It's certainly far better to avoid taking offense at comments that are made, wherever possible. People do make mistakes, and do get overexcited at times. > We are not living in a perfect world, and spending a lot of effort trying to > smooth everything out seems overkill to me. However, it's *not* OK for people to assume that it's up to the reader to not take offense. We may not be living in a perfect world, but we are living in a world where we get to deal with people from a lot more backgrounds, cultures, and perspectives than most of us are used to. To say nothing of the fact that on the web, it's easy to *forget* that people we interact with aren't "just like us". This is a huge privilege, but also makes it very easy to make mistakes. It does no harm (far from it!) for us to put some effort into trying to consider our readers' point of view when writing. 
And the beauty of email is that we have the *time* to stop, think, and make sure our words say what we want, without offending people. So +0.5 on people not being too quick to take offense at what they read. But +1 on people putting time into not *causing* offense. And a big thank you to all the people (admins, people working on things like CoCs, etc) like Brett who take time to remind us all to treat each other civilly and considerately. Paul From mistersheik at gmail.com Wed Oct 19 03:38:34 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 19 Oct 2016 00:38:34 -0700 (PDT) Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> This is a very interesting proposal. I just wanted to share something I found in my quick search: http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration Could you explain why the accepted answer there doesn't address this issue? class Parse(object): """A generator that iterates through a file""" def __init__(self, path): self.path = path def __iter__(self): with open(self.path) as f: yield from f Best, Neil On Wednesday, October 19, 2016 at 12:39:34 AM UTC-4, Nathaniel Smith wrote: > > Hi all, > > I'd like to propose that Python's iterator protocol be enhanced to add > a first-class notion of completion / cleanup. > > This is mostly motivated by thinking about the issues around async > generators and cleanup. Unfortunately even though PEP 525 was accepted > I found myself unable to stop pondering this, and the more I've > pondered the more convinced I've become that the GC hooks added in PEP > 525 are really not enough, and that we'll regret it if we stick with > them, or at least with them alone :-/. The strategy here is pretty > different -- it's an attempt to dig down and make a fundamental > improvement to the language that fixes a number of long-standing rough > spots, including async generators. > > The basic concept is relatively simple: just adding a '__iterclose__' > method that 'for' loops call upon completion, even if that's via break > or exception. But, the overall issue is fairly complicated + iterators > have a large surface area across the language, so the text below is > pretty long. Mostly I wrote it all out to convince myself that there > wasn't some weird showstopper lurking somewhere :-). For a first pass > discussion, it probably makes sense to mainly focus on whether the > basic concept makes sense? The main rationale is at the top, but the > details are there too for those who want them. > > Also, for *right* now I'm hoping -- probably unreasonably -- to try to > get the async iterator parts of the proposal in ASAP, ideally for > 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal > like this, which I apologize for -- though async generators are > provisional in 3.6, so at least in theory changing them is not out of > the question.) So again, it might make sense to focus especially on > the async parts, which are a pretty small and self-contained part, and > treat the rest of the proposal as a longer-term plan provided for > context. The comparison to PEP 525 GC hooks comes right after the > initial rationale. > > Anyway, I'll be interested to hear what you think! 
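For what it's worth, the ``Parse`` pattern above runs into exactly the situation described in the proposal's "Background and motivation" section: the ``with`` block lives inside the generator that ``__iter__`` returns, so it only completes when that generator is finalized. A quick sketch, with a hypothetical path:

    lines = iter(Parse("data.txt"))  # creates the generator; opens nothing yet
    first = next(lines)              # the with block starts; the file is now open
    # Stop iterating here. Nothing in this code ever closes the file: that
    # happens only when the generator object is finalized, which is prompt
    # on CPython (refcounting) but arbitrarily delayed on PyPy or Jython.

So the pattern moves the ``with`` inside a generator, but an early-exiting caller still has to ``close()`` that generator (or rely on the GC), which is the gap ``__iterclose__`` is meant to fill.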
> > -n > > ------------------ > > Abstract > ======== > > We propose to extend the iterator protocol with a new > ``__(a)iterclose__`` slot, which is called automatically on exit from > ``(async) for`` loops, regardless of how they exit. This allows for > convenient, deterministic cleanup of resources held by iterators > without reliance on the garbage collector. This is especially valuable > for asynchronous generators. > > > Note on timing > ============== > > In practical terms, the proposal here is divided into two separate > parts: the handling of async iterators, which should ideally be > implemented ASAP, and the handling of regular iterators, which is a > larger but more relaxed project that can't start until 3.7 at the > earliest. But since the changes are closely related, and we probably > don't want to end up with async iterators and regular iterators > diverging in the long run, it seems useful to look at them together. > > > Background and motivation > ========================= > > Python iterables often hold resources which require cleanup. For > example: ``file`` objects need to be closed; the `WSGI spec > `_ adds a ``close`` method > on top of the regular iterator protocol and demands that consumers > call it at the appropriate time (though forgetting to do so is a > `frequent source of bugs > `_); > > and PEP 342 (based on PEP 325) extended generator objects to add a > ``close`` method to allow generators to clean up after themselves. > > Generally, objects that need to clean up after themselves also define > a ``__del__`` method to ensure that this cleanup will happen > eventually, when the object is garbage collected. However, relying on > the garbage collector for cleanup like this causes serious problems in > at least two cases: > > - In Python implementations that do not use reference counting (e.g. > PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet > many situations require *prompt* cleanup of resources. Delayed cleanup > produces problems like crashes due to file descriptor exhaustion, or > WSGI timing middleware that collects bogus times. > > - Async generators (PEP 525) can only perform cleanup under the > supervision of the appropriate coroutine runner. ``__del__`` doesn't > have access to the coroutine runner; indeed, the coroutine runner > might be garbage collected before the generator object. So relying on > the garbage collector is effectively impossible without some kind of > language extension. (PEP 525 does provide such an extension, but it > has a number of limitations that this proposal fixes; see the > "alternatives" section below for discussion.) > > Fortunately, Python provides a standard tool for doing resource > cleanup in a more structured way: ``with`` blocks. For example, this > code opens a file but relies on the garbage collector to close it:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > for document in read_newline_separated_json(path): > ... > > and recent versions of CPython will point this out by issuing a > ``ResourceWarning``, nudging us to fix it by adding a ``with`` block:: > > def read_newline_separated_json(path): > with open(path) as file_handle: # <-- with block > for line in file_handle: > yield json.loads(line) > > for document in read_newline_separated_json(path): # <-- outer for loop > ... > > But there's a subtlety here, caused by the interaction of ``with`` > blocks and generators. 
``with`` blocks are Python's main tool for > managing cleanup, and they're a powerful one, because they pin the > lifetime of a resource to the lifetime of a stack frame. But this > assumes that someone will take care of cleaning up the stack frame... > and for generators, this requires that someone ``close`` them. > > In this case, adding the ``with`` block *is* enough to shut up the > ``ResourceWarning``, but this is misleading -- the file object cleanup > here is still dependent on the garbage collector. The ``with`` block > will only be unwound when the ``read_newline_separated_json`` > generator is closed. If the outer ``for`` loop runs to completion then > the cleanup will happen immediately; but if this loop is terminated > early by a ``break`` or an exception, then the ``with`` block won't > fire until the generator object is garbage collected. > > The correct solution requires that all *users* of this API wrap every > ``for`` loop in its own ``with`` block:: > > with closing(read_newline_separated_json(path)) as genobj: > for document in genobj: > ... > > This gets even worse if we consider the idiom of decomposing a complex > pipeline into multiple nested generators:: > > def read_users(path): > with closing(read_newline_separated_json(path)) as gen: > for document in gen: > yield User.from_json(document) > > def users_in_group(path, group): > with closing(read_users(path)) as gen: > for user in gen: > if user.group == group: > yield user > > In general if you have N nested generators then you need N+1 ``with`` > blocks to clean up 1 file. And good defensive programming would > suggest that any time we use a generator, we should assume the > possibility that there could be at least one ``with`` block somewhere > in its (potentially transitive) call stack, either now or in the > future, and thus always wrap it in a ``with``. But in practice, > basically nobody does this, because programmers would rather write > buggy code than tiresome repetitive code. In simple cases like this > there are some workarounds that good Python developers know (e.g. in > this simple case it would be idiomatic to pass in a file handle > instead of a path and move the resource management to the top level), > but in general we cannot avoid the use of ``with``/``finally`` inside > of generators, and thus dealing with this problem one way or another. > When beauty and correctness fight then beauty tends to win, so it's > important to make correct code beautiful. > > Still, is this worth fixing? Until async generators came along I would > have argued yes, but that it was a low priority, since everyone seems > to be muddling along okay -- but async generators make it much more > urgent. Async generators cannot do cleanup *at all* without some > mechanism for deterministic cleanup that people will actually use, and > async generators are particularly likely to hold resources like file > descriptors. (After all, if they weren't doing I/O, they'd be > generators, not async generators.) So we have to do something, and it > might as well be a comprehensive fix to the underlying problem. And > it's much easier to fix this now when async generators are first > rolling out, then it will be to fix it later. > > The proposal itself is simple in concept: add a ``__(a)iterclose__`` > method to the iterator protocol, and have (async) ``for`` loops call > it when the loop is exited, even if this occurs via ``break`` or > exception unwinding. 
Effectively, we're taking the current cumbersome > idiom (``with`` block + ``for`` loop) and merging them together into a > fancier ``for``. This may seem non-orthogonal, but makes sense when > you consider that the existence of generators means that ``with`` > blocks actually depend on iterator cleanup to work reliably, plus > experience showing that iterator cleanup is often a desireable feature > in its own right. > > > Alternatives > ============ > > PEP 525 asyncgen hooks > ---------------------- > > PEP 525 proposes a `set of global thread-local hooks managed by new > ``sys.{get/set}_asyncgen_hooks()`` functions > `_, which > allow event loops to integrate with the garbage collector to run > cleanup for async generators. In principle, this proposal and PEP 525 > are complementary, in the same way that ``with`` blocks and > ``__del__`` are complementary: this proposal takes care of ensuring > deterministic cleanup in most cases, while PEP 525's GC hooks clean up > anything that gets missed. But ``__aiterclose__`` provides a number of > advantages over GC hooks alone: > > - The GC hook semantics aren't part of the abstract async iterator > protocol, but are instead restricted `specifically to the async > generator concrete type `_. > If you have an async iterator implemented using a class, like:: > > class MyAsyncIterator: > async def __anext__(): > ... > > then you can't refactor this into an async generator without > changing its semantics, and vice-versa. This seems very unpythonic. > (It also leaves open the question of what exactly class-based async > iterators are supposed to do, given that they face exactly the same > cleanup problems as async generators.) ``__aiterclose__``, on the > other hand, is defined at the protocol level, so it's duck-type > friendly and works for all iterators, not just generators. > > - Code that wants to work on non-CPython implementations like PyPy > cannot in general rely on GC for cleanup. Without ``__aiterclose__``, > it's more or less guaranteed that developers who develop and test on > CPython will produce libraries that leak resources when used on PyPy. > Developers who do want to target alternative implementations will > either have to take the defensive approach of wrapping every ``for`` > loop in a ``with`` block, or else carefully audit their code to figure > out which generators might possibly contain cleanup code and add > ``with`` blocks around those only. With ``__aiterclose__``, writing > portable code becomes easy and natural. > > - An important part of building robust software is making sure that > exceptions always propagate correctly without being lost. One of the > most exciting things about async/await compared to traditional > callback-based systems is that instead of requiring manual chaining, > the runtime can now do the heavy lifting of propagating errors, making > it *much* easier to write robust code. But, this beautiful new picture > has one major gap: if we rely on the GC for generator cleanup, then > exceptions raised during cleanup are lost. So, again, with > ``__aiterclose__``, developers who care about this kind of robustness > will either have to take the defensive approach of wrapping every > ``for`` loop in a ``with`` block, or else carefully audit their code > to figure out which generators might possibly contain cleanup code. > ``__aiterclose__`` plugs this hole by performing cleanup in the > caller's context, so writing more robust code becomes the path of > least resistance. 
> > - The WSGI experience suggests that there exist important > iterator-based APIs that need prompt cleanup and cannot rely on the > GC, even in CPython. For example, consider a hypothetical WSGI-like > API based around async/await and async iterators, where a response > handler is an async generator that takes request headers + an async > iterator over the request body, and yields response headers + the > response body. (This is actually the use case that got me interested > in async generators in the first place, i.e. this isn't hypothetical.) > If we follow WSGI in requiring that child iterators must be closed > properly, then without ``__aiterclose__`` the absolute most > minimalistic middleware in our system looks something like:: > > async def noop_middleware(handler, request_header, request_body): > async with aclosing(handler(request_body, request_body)) as aiter: > async for response_item in aiter: > yield response_item > > Arguably in regular code one can get away with skipping the ``with`` > block around ``for`` loops, depending on how confident one is that one > understands the internal implementation of the generator. But here we > have to cope with arbitrary response handlers, so without > ``__aiterclose__``, this ``with`` construction is a mandatory part of > every middleware. > > ``__aiterclose__`` allows us to eliminate the mandatory boilerplate > and an extra level of indentation from every middleware:: > > async def noop_middleware(handler, request_header, request_body): > async for response_item in handler(request_header, request_body): > yield response_item > > So the ``__aiterclose__`` approach provides substantial advantages > over GC hooks. > > This leaves open the question of whether we want a combination of GC > hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since > the vast majority of generators are iterated over using a ``for`` loop > or equivalent, ``__aiterclose__`` handles most situations before the > GC has a chance to get involved. The case where GC hooks provide > additional value is in code that does manual iteration, e.g.:: > > agen = fetch_newline_separated_json_from_url(...) > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > # doesn't do 'await agen.aclose()' > > If we go with the GC-hooks + ``__aiterclose__`` approach, this > generator will eventually be cleaned up by GC calling the generator > ``__del__`` method, which then will use the hooks to call back into > the event loop to run the cleanup code. > > If we go with the no-GC-hooks approach, this generator will eventually > be garbage collected, with the following effects: > > - its ``__del__`` method will issue a warning that the generator was > not closed (similar to the existing "coroutine never awaited" > warning). > > - The underlying resources involved will still be cleaned up, because > the generator frame will still be garbage collected, causing it to > drop references to any file handles or sockets it holds, and then > those objects's ``__del__`` methods will release the actual operating > system resources. > > - But, any cleanup code inside the generator itself (e.g. logging, > buffer flushing) will not get a chance to run. > > The solution here -- as the warning would indicate -- is to fix the > code so that it calls ``__aiterclose__``, e.g. 
by using a ``with`` > block:: > > async with aclosing(fetch_newline_separated_json_from_url(...)) as > agen: > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > > Basically in this approach, the rule would be that if you want to > manually implement the iterator protocol, then it's your > responsibility to implement all of it, and that now includes > ``__(a)iterclose__``. > > GC hooks add non-trivial complexity in the form of (a) new global > interpreter state, (b) a somewhat complicated control flow (e.g., > async generator GC always involves resurrection, so the details of PEP > 442 are important), and (c) a new public API in asyncio (``await > loop.shutdown_asyncgens()``) that users have to remember to call at > the appropriate time. (This last point in particular somewhat > undermines the argument that GC hooks provide a safe backup to > guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called > correctly then I *think* it's possible for generators to be silently > discarded without their cleanup code being called; compare this to the > ``__aiterclose__``-only approach where in the worst case we still at > least get a warning printed. This might be fixable.) All this > considered, GC hooks arguably aren't worth it, given that the only > people they help are those who want to manually call ``__anext__`` yet > don't want to manually call ``__aiterclose__``. But Yury disagrees > with me on this :-). And both options are viable. > > > Always inject resources, and do all cleanup at the top level > ------------------------------------------------------------ > > It was suggested on python-dev (XX find link) that a pattern to avoid > these problems is to always pass resources in from above, e.g. > ``read_newline_separated_json`` should take a file object rather than > a path, with cleanup handled at the top level:: > > def read_newline_separated_json(file_handle): > for line in file_handle: > yield json.loads(line) > > def read_users(file_handle): > for document in read_newline_separated_json(file_handle): > yield User.from_json(document) > > with open(path) as file_handle: > for user in read_users(file_handle): > ... > > This works well in simple cases; here it lets us avoid the "N+1 > ``with`` blocks problem". But unfortunately, it breaks down quickly > when things get more complex. Consider if instead of reading from a > file, our generator was reading from a streaming HTTP GET request -- > while handling redirects and authentication via OAUTH. Then we'd > really want the sockets to be managed down inside our HTTP client > library, not at the top level. Plus there are other cases where > ``finally`` blocks embedded inside generators are important in their > own right: db transaction management, emitting logging information > during cleanup (one of the major motivating use cases for WSGI > ``close``), and so forth. So this is really a workaround for simple > cases, not a general solution. > > > More complex variants of __(a)iterclose__ > ----------------------------------------- > > The semantics of ``__(a)iterclose__`` are somewhat inspired by > ``with`` blocks, but context managers are more powerful: > ``__(a)exit__`` can distinguish between a normal exit versus exception > unwinding, and in the case of an exception it can examine the > exception details and optionally suppress propagation. > ``__(a)iterclose__`` as proposed here does not have these powers, but > one can imagine an alternative design where it did. 
> > However, this seems like unwarranted complexity: experience suggests > that it's common for iterables to have ``close`` methods, and even to > have ``__exit__`` methods that call ``self.close()``, but I'm not > aware of any common cases that make use of ``__exit__``'s full power. > I also can't think of any examples where this would be useful. And it > seems unnecessarily confusing to allow iterators to affect flow > control by swallowing exceptions -- if you're in a situation where you > really want that, then you should probably use a real ``with`` block > anyway. > > > Specification > ============= > > This section describes where we want to eventually end up, though > there are some backwards compatibility issues that mean we can't jump > directly here. A later section describes the transition plan. > > > Guiding principles > ------------------ > > Generally, ``__(a)iterclose__`` implementations should: > > - be idempotent, > - perform any cleanup that is appropriate on the assumption that the > iterator will not be used again after ``__(a)iterclose__`` is called. > In particular, once ``__(a)iterclose__`` has been called then calling > ``__(a)next__`` produces undefined behavior. > > And generally, any code which starts iterating through an iterable > with the intention of exhausting it, should arrange to make sure that > ``__(a)iterclose__`` is eventually called, whether or not the iterator > is actually exhausted. > > > Changes to iteration > -------------------- > > The core proposal is the change in behavior of ``for`` loops. Given > this Python code:: > > for VAR in ITERABLE: > LOOP-BODY > else: > ELSE-BODY > > we desugar to the equivalent of:: > > _iter = iter(ITERABLE) > _iterclose = getattr(type(_iter), "__iterclose__", lambda: None) > try: > traditional-for VAR in _iter: > LOOP-BODY > else: > ELSE-BODY > finally: > _iterclose(_iter) > > where the "traditional-for statement" here is meant as a shorthand for > the classic 3.5-and-earlier ``for`` loop semantics. > > Besides the top-level ``for`` statement, Python also contains several > other places where iterators are consumed. For consistency, these > should call ``__iterclose__`` as well using semantics equivalent to > the above. This includes: > > - ``for`` loops inside comprehensions > - ``*`` unpacking > - functions which accept and fully consume iterables, like > ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and > others. > > > Changes to async iteration > -------------------------- > > We also make the analogous changes to async iteration constructs, > except that the new slot is called ``__aiterclose__``, and it's an > async method that gets ``await``\ed. > > > Modifications to basic iterator types > ------------------------------------- > > Generator objects (including those created by generator comprehensions): > - ``__iterclose__`` calls ``self.close()`` > - ``__del__`` calls ``self.close()`` (same as now), and additionally > issues a ``ResourceWarning`` if the generator wasn't exhausted. This > warning is hidden by default, but can be enabled for those who want to > make sure they aren't inadverdantly relying on CPython-specific GC > semantics. > > Async generator objects (including those created by async generator > comprehensions): > - ``__aiterclose__`` calls ``self.aclose()`` > - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been > called, since this probably indicates a latent bug, similar to the > "coroutine never awaited" warning. 
> > QUESTION: should file objects implement ``__iterclose__`` to close the > file? On the one hand this would make this change more disruptive; on > the other hand people really like writing ``for line in open(...): > ...``, and if we get used to iterators taking care of their own > cleanup then it might become very weird if files don't. > > > New convenience functions > ------------------------- > > The ``itertools`` module gains a new iterator wrapper that can be used > to selectively disable the new ``__iterclose__`` behavior:: > > # QUESTION: I feel like there might be a better name for this one? > class preserve(iterable): > def __init__(self, iterable): > self._it = iter(iterable) > > def __iter__(self): > return self > > def __next__(self): > return next(self._it) > > def __iterclose__(self): > # Swallow __iterclose__ without passing it on > pass > > Example usage (assuming that file objects implements ``__iterclose__``):: > > with open(...) as handle: > # Iterate through the same file twice: > for line in itertools.preserve(handle): > ... > handle.seek(0) > for line in itertools.preserve(handle): > ... > > The ``operator`` module gains two new functions, with semantics > equivalent to the following:: > > def iterclose(it): > if hasattr(type(it), "__iterclose__"): > type(it).__iterclose__(it) > > async def aiterclose(ait): > if hasattr(type(ait), "__aiterclose__"): > await type(ait).__aiterclose__(ait) > > These are particularly useful when implementing the changes in the next > section: > > > __iterclose__ implementations for iterator wrappers > --------------------------------------------------- > > Python ships a number of iterator types that act as wrappers around > other iterators: ``map``, ``zip``, ``itertools.accumulate``, > ``csv.reader``, and others. These iterators should define a > ``__iterclose__`` method which calls ``__iterclose__`` in turn on > their underlying iterators. For example, ``map`` could be implemented > as:: > > class map: > def __init__(self, fn, *iterables): > self._fn = fn > self._iters = [iter(iterable) for iterable in iterables] > > def __iter__(self): > return self > > def __next__(self): > return self._fn(*[next(it) for it in self._iters]) > > def __iterclose__(self): > for it in self._iters: > operator.iterclose(it) > > In some cases this requires some subtlety; for example, > ```itertools.tee`` > `_ > should not call ``__iterclose__`` on the underlying iterator until it > has been called on *all* of the clone iterators. > > > Example / Rationale > ------------------- > > The payoff for all this is that we can now write straightforward code > like:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > and be confident that the file will receive deterministic cleanup > *without the end-user having to take any special effort*, even in > complex cases. For example, consider this silly pipeline:: > > list(map(lambda key: key.upper(), > doc["key"] for doc in read_newline_separated_json(path))) > > If our file contains a document where ``doc["key"]`` turns out to be > an integer, then the following sequence of events will happen: > > 1. ``key.upper()`` raises an ``AttributeError``, which propagates out > of the ``map`` and triggers the implicit ``finally`` block inside > ``list``. > 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the > map object. > 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator > comprehension object. > 4. 
This injects a ``GeneratorExit`` exception into the generator
>    comprehension body, which is currently suspended inside the
>    comprehension's ``for`` loop body.
> 5. The exception propagates out of the ``for`` loop, triggering the ``for``
>    loop's implicit ``finally`` block, which calls ``__iterclose__`` on the
>    generator object representing the call to
>    ``read_newline_separated_json``.
> 6. This injects an inner ``GeneratorExit`` exception into the body of
>    ``read_newline_separated_json``, currently suspended at the ``yield``.
> 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop,
>    triggering the ``for`` loop's implicit ``finally`` block, which calls
>    ``__iterclose__()`` on the file object.
> 8. The file object is closed.
> 9. The inner ``GeneratorExit`` resumes propagating, hits the boundary of
>    the generator function, and causes ``read_newline_separated_json``'s
>    ``__iterclose__()`` method to return successfully.
> 10. Control returns to the generator comprehension body, and the outer
>     ``GeneratorExit`` continues propagating, allowing the comprehension's
>     ``__iterclose__()`` to return successfully.
> 11. The rest of the ``__iterclose__()`` calls unwind without incident, back
>     into the body of ``list``.
> 12. The original ``AttributeError`` resumes propagating.
>
> (The details above assume that we implement ``file.__iterclose__``; if not
> then add a ``with`` block to ``read_newline_separated_json`` and
> essentially the same logic goes through.)
>
> Of course, from the user's point of view, this can be simplified down to
> just:
>
> 1. ``int.upper()`` raises an ``AttributeError``
> 2. The file object is closed.
> 3. The ``AttributeError`` propagates out of ``list``
>
> So we've accomplished our goal of making this "just work" without the user
> having to think about it.
>
>
> Transition plan
> ===============
>
> While the majority of existing ``for`` loops will continue to produce
> identical results, the proposed changes will produce
> backwards-incompatible behavior in some cases. Example::
>
>     def read_csv_with_header(lines_iterable):
>         lines_iterator = iter(lines_iterable)
>         for line in lines_iterator:
>             column_names = line.strip().split("\t")
>             break
>         for line in lines_iterator:
>             values = line.strip().split("\t")
>             record = dict(zip(column_names, values))
>             yield record
>
> This code used to be correct, but after this proposal is implemented will
> require an ``itertools.preserve`` call added to the first ``for`` loop.
>
> [QUESTION: currently, if you close a generator and then try to iterate over
> it then it just raises ``Stop(Async)Iteration``, so code that passes the
> same generator object to multiple ``for`` loops but forgets to use
> ``itertools.preserve`` won't see an obvious error -- the second ``for``
> loop will just exit immediately. Perhaps it would be better if iterating a
> closed generator raised a ``RuntimeError``? Note that files don't have this
> problem -- attempting to iterate a closed file object already raises
> ``ValueError``.]
>
> Specifically, the incompatibility happens when all of these factors come
> together:
>
> - The automatic calling of ``__(a)iterclose__`` is enabled
> - The iterable did not previously define ``__(a)iterclose__``
> - The iterable does now define ``__(a)iterclose__``
> - The iterable is re-used after the ``for`` loop exits
>
> So the problem is how to manage this transition, and those are the levers
> we have to work with.
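For concreteness, the ``read_csv_with_header`` example above would then need to look something like the following sketch. Note that ``itertools.preserve`` is the wrapper proposed in "New convenience functions" and does not exist in current Python:

import itertools

def read_csv_with_header(lines_iterable):
    lines_iterator = iter(lines_iterable)
    # preserve() swallows __iterclose__, so breaking out of this loop no
    # longer closes lines_iterator, and the second loop can keep reading.
    for line in itertools.preserve(lines_iterator):
        column_names = line.strip().split("\t")
        break
    for line in lines_iterator:
        values = line.strip().split("\t")
        record = dict(zip(column_names, values))
        yield record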
> > First, observe that the only async iterables where we propose to add > ``__aiterclose__`` are async generators, and there is currently no > existing code using async generators (though this will start changing > very soon), so the async changes do not produce any backwards > incompatibilities. (There is existing code using async iterators, but > using the new async for loop on an old async iterator is harmless, > because old async iterators don't have ``__aiterclose__``.) In > addition, PEP 525 was accepted on a provisional basis, and async > generators are by far the biggest beneficiary of this PEP's proposed > changes. Therefore, I think we should strongly consider enabling > ``__aiterclose__`` for ``async for`` loops and async generators ASAP, > ideally for 3.6.0 or 3.6.1. > > For the non-async world, things are harder, but here's a potential > transition path: > > In 3.7: > > Our goal is that existing unsafe code will start emitting warnings, > while those who want to opt-in to the future can do that immediately: > > - We immediately add all the ``__iterclose__`` methods described above. > - If ``from __future__ import iterclose`` is in effect, then ``for`` > loops and ``*`` unpacking call ``__iterclose__`` as specified above. > - If the future is *not* enabled, then ``for`` loops and ``*`` > unpacking do *not* call ``__iterclose__``. But they do call some other > method instead, e.g. ``__iterclose_warning__``. > - Similarly, functions like ``list`` use stack introspection (!!) to > check whether their direct caller has ``__future__.iterclose`` > enabled, and use this to decide whether to call ``__iterclose__`` or > ``__iterclose_warning__``. > - For all the wrapper iterators, we also add ``__iterclose_warning__`` > methods that forward to the ``__iterclose_warning__`` method of the > underlying iterator or iterators. > - For generators (and files, if we decide to do that), > ``__iterclose_warning__`` is defined to set an internal flag, and > other methods on the object are modified to check for this flag. If > they find the flag set, they issue a ``PendingDeprecationWarning`` to > inform the user that in the future this sequence would have led to a > use-after-close situation and the user should use ``preserve()``. > > In 3.8: > > - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning`` > > In 3.9: > > - Enable the ``__future__`` unconditionally and remove all the > ``__iterclose_warning__`` stuff. > > I believe that this satisfies the normal requirements for this kind of > transition -- opt-in initially, with warnings targeted precisely to > the cases that will be effected, and a long deprecation cycle. > > Probably the most controversial / risky part of this is the use of > stack introspection to make the iterable-consuming functions sensitive > to a ``__future__`` setting, though I haven't thought of any situation > where it would actually go wrong yet... > > > Acknowledgements > ================ > > Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for > helpful discussion on earlier versions of this idea. > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From toddrjen at gmail.com Wed Oct 19 11:07:16 2016 From: toddrjen at gmail.com (Todd) Date: Wed, 19 Oct 2016 11:07:16 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> References: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> Message-ID: On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar wrote: > This is a very interesting proposal. I just wanted to share something I > found in my quick search: > > http://stackoverflow.com/questions/14797930/python- > custom-iterator-close-a-file-on-stopiteration > > Could you explain why the accepted answer there doesn't address this issue? > > class Parse(object): > """A generator that iterates through a file""" > def __init__(self, path): > self.path = path > > def __iter__(self): > with open(self.path) as f: > yield from f > > > Best, > > Neil > > I think the difference is that this new approach guarantees cleanup the exact moment the loop ends, no matter how it ends. If I understand correctly, your approach will do cleanup when the loop ends only if the iterator is exhausted. But if someone zips it with a shorter iterator, uses itertools.islice or something similar, breaks the loop, returns inside the loop, or in some other way ends the loop before the iterator is exhausted, the cleanup won't happen when the iterator is garbage collected. And for non-reference-counting python implementations, when this happens is completely unpredictable. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Oct 19 11:51:37 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 19 Oct 2016 11:51:37 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: I'm -1 on the idea. Here's why: 1. Python is a very dynamic language with GC and that is one of its fundamental properties. This proposal might make GC of iterators more deterministic, but that is only one case. For instance, in some places in asyncio source code we have statements like this: "self = None". Why? When an exception occurs and we want to save it (for instance to log it), it holds a reference to the Traceback object. Which in turn references frame objects. Which means that a lot of objects in those frames will be alive while the exception object is alive. So in asyncio we go to great lengths to avoid unnecessary runs of GC, but this is an exception! Most of Python code out there today doesn't do this sorts of tricks. And this is just one example of how you can have cycles that require a run of GC. It is not possible to have deterministic GC in real life Python applications. This proposal addresses only *one* use case, leaving 100s of others unresolved. IMO, while GC-related issues can be annoying to debug sometimes, it's not worth it to change the behaviour of iteration in Python only to slightly improve on this. 2. This proposal will make writing iterators significantly harder. Consider 'itertools.chain'. We will have to rewrite it to add the proposed __iterclose__ method. The Chain iterator object will have to track all of its iterators, call __iterclose__ on them when it's necessary (there are a few corner cases). Given that this object is implemented in C, it's quite a bit of work. And we'll have a lot of objects to fix. We can probably update all iterators in standard library (in 3.7), but what about third-party code? 
It will take many years until you can say with certainty that most of Python code supports __iterclose__ / __aiterclose__. 3. This proposal changes the behaviour of 'for' and 'async for' statements significantly. To do partial iteration you will have to use a special builtin function to guard the iterator from being closed. This is completely non-obvious to any existing Python user and will be hard to explain to newcomers. 4. This proposal only addresses iteration with 'for' and 'async for' statements. If you iterate using a 'while' loop and 'next()' function, this proposal wouldn't help you. Also see the point #2 about third-party code. 5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a very similar fashion to synchronous generators. There is an API to help Python to call event loop to finalize AGs. asyncio in 3.6 (and other event loops in the near future) already uses this API to ensure that *all AGs in a long-running program are properly finalized* while it is being run. There is an extra loop method (`loop.shutdown_asyncgens`) that should be called right before stopping the loop (exiting the program) to make sure that all AGs are finalized, but if you forget to call it the world won't end. The process will end and the interpreter will shutdown, maybe issuing a couple of ResourceWarnings. No exception will pass silently in the current PEP 525 implementation. And if some AG isn't properly finalized a warning will be issued. The current AG finalization mechanism must stay even if this proposal gets accepted, as it ensures that even manually iterated AGs are properly finalized. 6. If this proposal gets accepted, I think we shouldn't introduce it in any form in 3.6. It's too late to implement it for both sync- and async-generators. Implementing it only for async-generators will only add cognitive overhead. Even implementing this only for async-generators will (and should!) delay 3.6 release significantly. 7. To conclude: I'm not convinced that this proposal fully solves the issue of non-deterministic GC of iterators. It cripples iteration protocols to partially solve the problem for 'for' and 'async for' statements, leaving manual iteration unresolved. It will make it harder to write *correct* (async-) iterators. It introduces some *implicit* context management to 'for' and 'async for' statements -- something that IMO should be done by user with an explicit 'with' or 'async with'. Yury From random832 at fastmail.com Wed Oct 19 12:38:51 2016 From: random832 at fastmail.com (Random832) Date: Wed, 19 Oct 2016 12:38:51 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> On Wed, Oct 19, 2016, at 11:51, Yury Selivanov wrote: > I'm -1 on the idea. Here's why: > > > 1. Python is a very dynamic language with GC and that is one of its > fundamental properties. This proposal might make GC of iterators more > deterministic, but that is only one case. There is a huge difference between wanting deterministic GC and wanting cleanup code to be called deterministically. We're not talking about memory usage here. 
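To make Yury's point 2 concrete, here is a rough pure-Python sketch of the bookkeeping a ``__iterclose__``-aware ``chain`` would need under the proposal. This is illustrative only; the real ``itertools.chain`` is implemented in C and has additional corner cases such as ``chain.from_iterable``:

class chain:
    """Illustrative sketch only, not the real itertools.chain."""

    def __init__(self, *iterables):
        self._pending = list(iterables)
        self._current = None

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._current is None:
                if not self._pending:
                    raise StopIteration
                self._current = iter(self._pending.pop(0))
            try:
                return next(self._current)
            except StopIteration:
                # An exhausted source also has to be closed under the proposal.
                self._close(self._current)
                self._current = None

    @staticmethod
    def _close(it):
        close = getattr(type(it), "__iterclose__", None)
        if close is not None:
            close(it)

    def __iterclose__(self):
        # Close the source we are part-way through; whether sources that were
        # never even started should be closed too is one of the corner cases.
        if self._current is not None:
            self._close(self._current)
            self._current = None
        self._pending.clear()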
From yselivanov.ml at gmail.com Wed Oct 19 12:43:42 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 19 Oct 2016 12:43:42 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: <40e2a9cc-2801-ba80-7da7-9a550442bf0d@gmail.com> On 2016-10-19 12:38 PM, Random832 wrote: > On Wed, Oct 19, 2016, at 11:51, Yury Selivanov wrote: >> I'm -1 on the idea. Here's why: >> >> >> 1. Python is a very dynamic language with GC and that is one of its >> fundamental properties. This proposal might make GC of iterators more >> deterministic, but that is only one case. > There is a huge difference between wanting deterministic GC and wanting > cleanup code to be called deterministically. We're not talking about > memory usage here. > I understand, but both topics are closely tied together. Cleanup code can be implemented in some __del__ method of some non-iterator object. This proposal doesn't address such cases, it focuses only on iterators. My point is that it's not worth it to *significantly* change iteration (protocols and statements) in Python to only *partially* address the issue. Yury From mistersheik at gmail.com Wed Oct 19 13:08:24 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 19 Oct 2016 17:08:24 +0000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> Message-ID: On Wed, Oct 19, 2016 at 11:08 AM Todd wrote: > On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar > wrote: > > This is a very interesting proposal. I just wanted to share something I > found in my quick search: > > > http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration > > Could you explain why the accepted answer there doesn't address this issue? > > class Parse(object): > """A generator that iterates through a file""" > def __init__(self, path): > self.path = path > > def __iter__(self): > with open(self.path) as f: > yield from f > > > Best, > > Neil > > > I think the difference is that this new approach guarantees cleanup the > exact moment the loop ends, no matter how it ends. > > If I understand correctly, your approach will do cleanup when the loop > ends only if the iterator is exhausted. But if someone zips it with a > shorter iterator, uses itertools.islice or something similar, breaks the > loop, returns inside the loop, or in some other way ends the loop before > the iterator is exhausted, the cleanup won't happen when the iterator is > garbage collected. And for non-reference-counting python implementations, > when this happens is completely unpredictable. > > -- > I don't see that. The "cleanup" will happen when collection is interrupted by an exception. This has nothing to do with garbage collection either since the cleanup happens deterministically when the block is ended. If this is the only example, then I would say this behavior is already provided and does not need to be added. > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/5xdf0WF1WyY/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/5xdf0WF1WyY/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Oct 19 14:11:53 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Oct 2016 11:11:53 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> Message-ID: On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar wrote: > > > On Wed, Oct 19, 2016 at 11:08 AM Todd wrote: >> >> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar >> wrote: >>> >>> This is a very interesting proposal. I just wanted to share something I >>> found in my quick search: >>> >>> >>> http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration >>> >>> Could you explain why the accepted answer there doesn't address this >>> issue? >>> >>> class Parse(object): >>> """A generator that iterates through a file""" >>> def __init__(self, path): >>> self.path = path >>> >>> def __iter__(self): >>> with open(self.path) as f: >>> yield from f BTW it may make this easier to read if we notice that it's essentially a verbose way of writing: def parse(path): with open(path) as f: yield from f >> >> I think the difference is that this new approach guarantees cleanup the >> exact moment the loop ends, no matter how it ends. >> >> If I understand correctly, your approach will do cleanup when the loop >> ends only if the iterator is exhausted. But if someone zips it with a >> shorter iterator, uses itertools.islice or something similar, breaks the >> loop, returns inside the loop, or in some other way ends the loop before the >> iterator is exhausted, the cleanup won't happen when the iterator is garbage >> collected. And for non-reference-counting python implementations, when this >> happens is completely unpredictable. >> >> -- > > > I don't see that. The "cleanup" will happen when collection is interrupted > by an exception. This has nothing to do with garbage collection either > since the cleanup happens deterministically when the block is ended. If > this is the only example, then I would say this behavior is already provided > and does not need to be added. I think there might be a misunderstanding here. Consider code like this, that breaks out from the middle of the for loop: def use_that_generator(): for line in parse(...): if found_the_line_we_want(line): break # -- mark -- do_something_with_that_line(line) With current Python, what will happen is that when we reach the marked line, then the for loop has finished and will drop its reference to the generator object. At this point, the garbage collector comes into play. 
On CPython, with its reference counting collector, the garbage collector will immediately collect the generator object, and then the generator object's __del__ method will restart 'parse' by having the last 'yield' raise a GeneratorExit, and *that* exception will trigger the 'with' block's cleanup. But in order to get there, we're absolutely depending on the garbage collector to inject that GeneratorExit. And on an implementation like PyPy that doesn't use reference counting, the generator object will become collect*ible* at the marked line, but might not actually be collect*ed* for an arbitrarily long time afterwards. And until it's collected, the file will remain open. 'with' blocks guarantee that the resources they hold will be cleaned up promptly when the enclosing stack frame gets cleaned up, but for a 'with' block inside a generator then you still need something to guarantee that the enclosing stack frame gets cleaned up promptly! This proposal is about providing that thing -- with __(a)iterclose__, the end of the for loop immediately closes the generator object, so the garbage collector doesn't need to get involved. Essentially the same thing happens if we replace the 'break' with a 'raise'. Though with exceptions, things can actually get even messier, even on CPython. Here's a similar example except that (a) it exits early due to an exception (which then gets caught elsewhere), and (b) the invocation of the generator function ended up being kind of long, so I split the for loop into two lines with a temporary variable: def use_that_generator2(): it = parse("/a/really/really/really/really/really/really/really/long/path") for line in it: if not valid_format(line): raise ValueError() def catch_the_exception(): try: use_that_generator2() except ValueError: # -- mark -- ... Here the ValueError() is raised from use_that_generator2(), and then caught in catch_the_exception(). At the marked line, use_that_generator2's stack frame is still pinned in memory by the exception's traceback. And that means that all the local variables are also pinned in memory, including our temporary 'it'. Which means that parse's stack frame is also pinned in memory, and the file is not closed. With the __(a)iterclose__ proposal, when the exception is thrown then the 'for' loop in use_that_generator2() immediately closes the generator object, which in turn triggers parse's 'with' block, and that closes the file handle. And then after the file handle is closed, the exception continues propagating. So at the marked line, it's still the case that 'it' will be pinned in memory, but now 'it' is a closed generator object that has already relinquished its resources. -n -- Nathaniel J. Smith -- https://vorpus.org From rosuav at gmail.com Wed Oct 19 14:13:19 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 20 Oct 2016 05:13:19 +1100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: On Thu, Oct 20, 2016 at 3:38 AM, Random832 wrote: > On Wed, Oct 19, 2016, at 11:51, Yury Selivanov wrote: >> I'm -1 on the idea. Here's why: >> >> >> 1. Python is a very dynamic language with GC and that is one of its >> fundamental properties. This proposal might make GC of iterators more >> deterministic, but that is only one case. > > There is a huge difference between wanting deterministic GC and wanting > cleanup code to be called deterministically. 
We're not talking about
> memory usage here.

Currently, iterators get passed around casually - you can build on them,
derive from them, etc, etc, etc. If you change the 'for' loop to explicitly
close an iterator, will you also change 'yield from'? What about other forms
of iteration? Will the iterator be closed when it runs out normally?

This proposal is to iterators what 'with' is to open files and other
resources. I can build on top of an open file fairly easily:

@contextlib.contextmanager
def file_with_header(fn):
    with open(fn, "w") as f:
        f.write("Header Row")
        yield f

def main():
    with file_with_header("asdf") as f:
        """do stuff"""

I create a context manager based on another context manager, and I have a
guarantee that the end of the main() 'with' block is going to properly close
the file.

Now, what happens if I do something similar with an iterator?

def every_second(it):
    try:
        next(it)
    except StopIteration:
        return
    for value in it:
        yield value
        try:
            next(it)
        except StopIteration:
            break

This will work, because it's built on a 'for' loop. What if it's built on a
'while' loop instead?

def every_second_broken(it):
    try:
        while True:
            next(it)
            yield next(it)
    except StopIteration:
        pass

Now it *won't* correctly call the end-of-iteration function, because there's
no 'for' loop. This is going to either (a) require that EVERY consumer of an
iterator follow this new protocol, or (b) introduce a ton of edge cases.

ChrisA

From p.f.moore at gmail.com  Wed Oct 19 14:38:28 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 19 Oct 2016 19:38:28 +0100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To: 
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com>
Message-ID: 

On 19 October 2016 at 19:13, Chris Angelico wrote:
> Now it *won't* correctly call the end-of-iteration function, because
> there's no 'for' loop. This is going to either (a) require that EVERY
> consumer of an iterator follow this new protocol, or (b) introduce a
> ton of edge cases.

Also, unless I'm misunderstanding the proposal, there's a fairly major
compatibility break. At present we have:

>>> lst = [1,2,3,4]
>>> it = iter(lst)
>>> for i in it:
...     if i == 2: break

>>> for i in it:
...     print(i)
3
4
>>>

With the proposed behaviour, if I understand it, "it" would be closed after
the first loop, so resuming "it" for the second loop wouldn't work. Am I
right in that? I know there's a proposed itertools function to bring back
the old behaviour, but it's still a compatibility break. And code like
this, that partially consumes an iterator, is not uncommon.

Paul

From toddrjen at gmail.com  Wed Oct 19 15:08:21 2016
From: toddrjen at gmail.com (Todd)
Date: Wed, 19 Oct 2016 15:08:21 -0400
Subject: [Python-ideas] Python multi-dimensional array constructor
Message-ID: 

I have been thinking about how to go about having a multidimensional array
constructor in python. I know that Python doesn't have a built-in
multidimensional array class and won't for the foreseeable future. However,
some projects have come up with their own ways of making it simpler to
create such arrays compared to the current somewhat verbose approach, and
it might even be possible (although I think highly unlikely) for Python to
provide a hook for third-party libraries to tie into the sort of syntax
here. So I felt it might be worthwhile to get my thoughts on the topic in a
central location for future use.
If this sort of thing doesn't interest you I won't be offended if you stop
reading now, and I apologize if it is considered off-topic for this ML.

The problem is finding an operator that isn't already being used, wouldn't
conflict with existing rules, wouldn't break existing code, but that would
still be at least clearer and more concise than the current syntax. The
notation I came up with uses "[|" and "|]". I picked this for 4 reasons.
First, it isn't currently valid python syntax. Second, it is clearly
connected with the list constructor "[ ]". Third, it is reminiscent of the
"| |" symbols used for matrices in mathematics. Fourth, "{| |}" and "(| |)"
could be used for similar data structures (such as "{| |}" for labeled
arrays like in pandas).

Here is an example of how it would be used for a 1D array:

a = [| 0, 1, 2 |]

Compared to the current approach:

a = np.ndarray([0, 1, 2])

It isn't much simpler (although it is considerably shorter). However, this
new syntax becomes much clearer (in my opinion) when dealing with a higher
number of dimensions (more on that at the end).

For a 2D array, you would use two vertical bars as a dimension separator
"||" (multiple vertical bars are also not valid python syntax):

a = [| 0, 1, 2 || 3, 4, 5 |]

Or, on multiple lines (whitespace is ignored):

a = [| 0, 1, 2 ||
       3, 4, 5 |]

b = [| 0, 1, 2 |
     | 3, 4, 5 |]

You can also create a 2D row array by combining the two:

a = [|| 0, 1, 2 ||]

For higher dimensions, you can just put more lines together:

a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]

b = [||| 0, 1, 2 || 3, 4, 5
     ||| 6, 7, 8 || 9, 10, 11 |||]

c = [||| 0, 1, 2 |
       | 3, 4, 5 |
       |
       | 6, 7, 8 |
       | 9, 10, 11 |||]

A 3D row vector would just be:

a = [||| 0, 1, 2 |||]

A 3D column vector would be:

a = [||| 0 || 1 || 2 |||]

b = [||| 0
      || 1
      || 2 |||]

A 3D depth vector would be:

a = [||| 0 ||| 1 ||| 2 |||]

b = [||| 0
     ||| 1
     ||| 2 |||]

The rule for the number of dimensions is just the highest-specified
dimension. So these are equivalent:

a = [| 0, 1, 2 || 3, 4, 5 |]
b = [|| 0, 1, 2 || 3, 4, 5 ||]

This also means you would only strictly need to set the dimensions at one
end. That means these are equivalent, although the second and third case
should be discouraged:

a = [|| 0, 1, 2 ||]
b = [| 0, 1, 2 ||]
c = [|| 0, 1, 2 |]

As I said earlier, whitespace would not be significant. These would all be
equivalent, but the fourth and fifth approaches would be discouraged as
unclear. I would also discourage the third approach, since I think the
whitespace at the beginning and end is important to avoid confusing, for
example "[|2" with "[12".

a = [| 0, 1 || 2, 3 |]

b = [| 0, 1 |
     | 2, 3 |]

c = [|0, 1||2, 3|]

d = [| 0, 1 | | 2, 3 |]

e = [ |0,1| |2,3| ]

At least in my opinion, this sort of approach really shines when making
higher-dimensional arrays.
These would all be equivalent (the | at the beginning and end are just to make it easier to align indentation, they aren't required): a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||] b = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 | | -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 | | | -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 | | -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 | || | -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 | | 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 | | | 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 | | -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||] Compared to the current approach: a = np.ndarray([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134], [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]], [[-103, 50, -89, -12, 28, -12, 119, -131, -73, 21], [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]], [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145], [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]], [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60], [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]]) I think both of the new examples are considerably clearer than the current approach. Does anyone have any questions or thoughts? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Oct 19 15:11:40 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 19 Oct 2016 12:11:40 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: <5807C56C.9060502@stoneleaf.us> On 10/19/2016 11:38 AM, Paul Moore wrote: > Also, unless I'm misunderstanding the proposal, there's a fairly major > compatibility break. At present we have: > >>>> lst = [1,2,3,4] >>>> it = iter(lst) >>>> for i in it: > ... if i == 2: break > >>>> for i in it: > ... print(i) > 3 > 4 >>>> > > With the proposed behaviour, if I understand it, "it" would be closed > after the first loop, so resuming "it" for the second loop wouldn't > work. Am I right in that? I know there's a proposed itertools function > to bring back the old behaviour, but it's still a compatibility break. > And code like this, that partially consumes an iterator, is not > uncommon. Agreed. I like the idea in general, but this particular break feels like a deal-breaker. I'd be okay with not having break close the iterator, and either introducing a 'break_and_close' type of keyword or some other way of signalling that we will not be using the iterator any more so go ahead and close it. Does that invalidate, or take away most of value of, the proposal? -- ~Ethan~ From njs at pobox.com Wed Oct 19 15:21:30 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Oct 2016 12:21:30 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore wrote: > On 19 October 2016 at 19:13, Chris Angelico wrote: >> Now it *won't* correctly call the end-of-iteration function, because >> there's no 'for' loop. 
This is going to either (a) require that EVERY >> consumer of an iterator follow this new protocol, or (b) introduce a >> ton of edge cases. > > Also, unless I'm misunderstanding the proposal, there's a fairly major > compatibility break. At present we have: > >>>> lst = [1,2,3,4] >>>> it = iter(lst) >>>> for i in it: > ... if i == 2: break > >>>> for i in it: > ... print(i) > 3 > 4 >>>> > > With the proposed behaviour, if I understand it, "it" would be closed > after the first loop, so resuming "it" for the second loop wouldn't > work. Am I right in that? I know there's a proposed itertools function > to bring back the old behaviour, but it's still a compatibility break. > And code like this, that partially consumes an iterator, is not > uncommon. Right -- did you reach the "transition plan" section? (I know it's wayyy down there.) The proposal is to hide this behind a __future__ at first + a mechanism during the transition period to catch code that depends on the old behavior and issue deprecation warnings. But it is a compatibility break, yes. -n -- Nathaniel J. Smith -- https://vorpus.org From toddrjen at gmail.com Wed Oct 19 15:21:41 2016 From: toddrjen at gmail.com (Todd) Date: Wed, 19 Oct 2016 15:21:41 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: On Wed, Oct 19, 2016 at 2:38 PM, Paul Moore wrote: > On 19 October 2016 at 19:13, Chris Angelico wrote: > > Now it *won't* correctly call the end-of-iteration function, because > > there's no 'for' loop. This is going to either (a) require that EVERY > > consumer of an iterator follow this new protocol, or (b) introduce a > > ton of edge cases. > > Also, unless I'm misunderstanding the proposal, there's a fairly major > compatibility break. At present we have: > > >>> lst = [1,2,3,4] > >>> it = iter(lst) > >>> for i in it: > ... if i == 2: break > > >>> for i in it: > ... print(i) > 3 > 4 > >>> > > With the proposed behaviour, if I understand it, "it" would be closed > after the first loop, so resuming "it" for the second loop wouldn't > work. Am I right in that? I know there's a proposed itertools function > to bring back the old behaviour, but it's still a compatibility break. > And code like this, that partially consumes an iterator, is not > uncommon. > > Paul > I may very well be misunderstanding the purpose of the proposal, but that is not how I saw it being used. I thought of it being used to clean up things that happened in the loop, rather than clean up the iterator itself. This would allow the iterator to manage events that occurred in the body of the loop. So it would be more like this scenario: >>> lst = objiterer([obj1, obj2, obj3, obj4]) >>> it = iter(lst) >>> for i, _ in zip(it, [1, 2]): ... b = i.some_method() >>> for i in it: ... c = i.other_method() >>> In this case, objiterer would do some cleanup related to obj1 and obj2 in the first loop and some cleanup related to obj3 and obj4 in the second loop. There would be no backwards-compatibility break, the method would be purely opt-in and most typical iterators wouldn't need it. However, in this case perhaps it might be better to have some method that is called after every loop, no matter how the loop is terminated (break, continue, return). This would allow the cleanup to be done every loop rather than just at the end. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tomuxiong at gmx.com Wed Oct 19 15:24:15 2016 From: tomuxiong at gmx.com (Thomas Nyberg) Date: Wed, 19 Oct 2016 15:24:15 -0400 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> Personally I like the way that numpy does it now better (even for multidimensional arrays). Being able to index into the different sub dimension using just [] iteratively matches naturally with the data structure itself in my mind. This may also just be my fear of change though... > Here is an example of how it would be used for a 1D array: > > a = [| 0, 1, 2 |] > > Compared to the current approach: > > a = np.ndarray([0, 1, 2]) What would the syntax do if you don't have numpy installed? Is the syntax tied to numpy or could other libraries make use of it? Cheers, Thomas From njs at pobox.com Wed Oct 19 15:33:57 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Oct 2016 12:33:57 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: On Wed, Oct 19, 2016 at 12:21 PM, Nathaniel Smith wrote: > On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore wrote: >> On 19 October 2016 at 19:13, Chris Angelico wrote: >>> Now it *won't* correctly call the end-of-iteration function, because >>> there's no 'for' loop. This is going to either (a) require that EVERY >>> consumer of an iterator follow this new protocol, or (b) introduce a >>> ton of edge cases. >> >> Also, unless I'm misunderstanding the proposal, there's a fairly major >> compatibility break. At present we have: >> >>>>> lst = [1,2,3,4] >>>>> it = iter(lst) >>>>> for i in it: >> ... if i == 2: break >> >>>>> for i in it: >> ... print(i) >> 3 >> 4 >>>>> >> >> With the proposed behaviour, if I understand it, "it" would be closed >> after the first loop, so resuming "it" for the second loop wouldn't >> work. Am I right in that? I know there's a proposed itertools function >> to bring back the old behaviour, but it's still a compatibility break. >> And code like this, that partially consumes an iterator, is not >> uncommon. > > Right -- did you reach the "transition plan" section? (I know it's > wayyy down there.) The proposal is to hide this behind a __future__ at > first + a mechanism during the transition period to catch code that > depends on the old behavior and issue deprecation warnings. But it is > a compatibility break, yes. I should also say, regarding your specific example, I guess it's an open question whether we would want list_iterator.__iterclose__ to actually do anything. It could flip the iterator to a state where it always raises StopIteration, or RuntimeError, or it could just be a no-op that allows iteration to continue normally afterwards. list_iterator doesn't have a close method right now, and it certainly can't "close" the underlying list (whatever that would even mean), so I don't think there's a strong expectation that it should do anything in particular. The __iterclose__ contract is that you're not supposed to call __next__ afterwards, so there's no real rule about what happens if you do. And there aren't strong conventions right now about what happens when you try to iterate an explicitly closed iterator -- files raise an error, generators just act like they were exhausted. So there's a few options that all seem more-or-less reasonable and I don't know that it's very important which one we pick. -n -- Nathaniel J. 
Smith -- https://vorpus.org From toddrjen at gmail.com Wed Oct 19 15:53:43 2016 From: toddrjen at gmail.com (Todd) Date: Wed, 19 Oct 2016 15:53:43 -0400 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> References: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> Message-ID: On Wed, Oct 19, 2016 at 3:24 PM, Thomas Nyberg wrote: > Personally I like the way that numpy does it now better (even for > multidimensional arrays). Being able to index into the different sub > dimension using just [] iteratively matches naturally with the data > structure itself in my mind. This may also just be my fear of change > though... > > I agree, that is one of the reasons this is still using "[ ]". You can think of the "|" as more of a dimension delimiter. Also keep in mind that tuples and dicts still use [ ] for indexing even though their constructor doesn't use [ ]. > Here is an example of how it would be used for a 1D array: >> >> a = [| 0, 1, 2 |] >> >> Compared to the current approach: >> >> a = np.ndarray([0, 1, 2]) >> > > What would the syntax do if you don't have numpy installed? Is the syntax > tied to numpy or could other libraries make use of it? > That would depend on the implementation, which is another issue I hesitate to even discuss up because it is a huge can of worms. The most plausible implementation in my mind would be for projects like IPython, Spyder, Sage, or some array-oriented language that compiles to Python to have their own hooks that would replace this sort syntax with "np.ndarray" behind-the-scenes. A less likely scenario that occurred to me would be for the Python interpreter to provide some sort of hook to allow a class to be registered as the handler for this syntax. So someone could register numpy, dask, dynd, or whatever they wanted as the handler. If nothing was registered using the syntax would raise an exception (perhaps NameError or some new exception). With equivalent "(| |)" and "{| |}" syntax you could conceivably register three packages. I figured perhaps "[| |]" would be used for your primary array class (which currently would pretty much always be a ndarray), and there could be a more dict-like "{| |}" syntax that could be used for pandas or xarray, leaving "(| |)" for a more special-purpose library of your choosing. But that would be a convention, it would be entirely up to the user. Behind-the-scenes, this syntax would be converted to nested tuples or lists (or maybe dicts for "{| |}") and passed to the constructor or a special classmethod for the registered class to handle however it sees fit. There are all sorts of questions and corner cases for this hook approach though. Could people change the registered handler after it is set? At what points during script executation would setting the handler be allowed, at any point or only near the beginning? Would "{| |}" use the same syntax or some sort of dict-like syntax? If dict-like, would there be separate dict-like and set-like syntaxes, resulting in four handlers? Or would list-like and dict-like syntaxes be allowed in all cases, and handlers would need to deal with getting lists/tuples or dicts (even if handling was simply raising a TypeError)? Does the hook provide lists or tuples? Does the data get fed directly to the constructor or to a special class method? If the former, how do classes identify themselves as being able to act as handlers, or should users be allowed to register any class? 
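Purely to illustrate the registration idea above, here is one possible shape for such a hook. Every name in this sketch (``set_array_handler``, ``__array_literal__``, ``build_array_literal``) is invented, and nothing like this currently exists in Python or NumPy; the parser is assumed to lower the literal to nested lists, as described above:

_array_handler = None

def set_array_handler(cls):
    # Register the class whose classmethod receives the parsed literal.
    global _array_handler
    if not hasattr(cls, "__array_literal__"):
        raise TypeError("handler must define __array_literal__")
    _array_handler = cls

def build_array_literal(nested):
    # What the interpreter would call after lowering [| ... |] to nested lists.
    if _array_handler is None:
        raise NameError("no handler registered for array literals")
    return _array_handler.__array_literal__(nested)

import numpy as np

class NumpyHandler:
    @classmethod
    def __array_literal__(cls, nested):
        return np.array(nested)

set_array_handler(NumpyHandler)
# [| 0, 1, 2 || 3, 4, 5 |] would then be handled as:
print(build_array_literal([[0, 1, 2], [3, 4, 5]]))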
There are so many questions I don't have good answers to I don't feel comfortable proposing this sort of hook approach as something that should actually be implemented. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joejev at gmail.com Wed Oct 19 15:55:28 2016 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 19 Oct 2016 15:55:28 -0400 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> References: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> Message-ID: You could add or prototype this with quasiquotes ( http://quasiquotes.readthedocs.io/en/latest/). You just need to be able to parse the body of your expression as a string into an array. Here is a quick example with a parser that only accepts 2d arrays: ``` # coding: quasiquotes import numpy as np from quasiquotes import QuasiQuoter @object.__new__ class array(QuasiQuoter): def quote_expr(self, expr, frame, col_offset): return np.array([ eval('[%s]' % d, frame.f_globals, frame.f_locals) for d in expr.split('||') ]) def f(): a = 1 b = 2 c = 3 return [$array| a, b, c || 4, 5, 6 |] if __name__ == '__main__': print(f()) ``` Personally I am not sold on replacing `[` and `]` with `|` because I like that you can visually see where dimensions are closed. On Wed, Oct 19, 2016 at 3:24 PM, Thomas Nyberg wrote: > Personally I like the way that numpy does it now better (even for > multidimensional arrays). Being able to index into the different sub > dimension using just [] iteratively matches naturally with the data > structure itself in my mind. This may also just be my fear of change > though... > > Here is an example of how it would be used for a 1D array: >> >> a = [| 0, 1, 2 |] >> >> Compared to the current approach: >> >> a = np.ndarray([0, 1, 2]) >> > > What would the syntax do if you don't have numpy installed? Is the syntax > tied to numpy or could other libraries make use of it? > > Cheers, > Thomas > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Wed Oct 19 15:56:17 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Wed, 19 Oct 2016 12:56:17 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: <5807CFE1.8030808@brenbarn.net> On 2016-10-19 12:21, Nathaniel Smith wrote: >> >Also, unless I'm misunderstanding the proposal, there's a fairly major >> >compatibility break. At present we have: >> > >>>>> >>>>lst = [1,2,3,4] >>>>> >>>>it = iter(lst) >>>>> >>>>for i in it: >> >... if i == 2: break >> > >>>>> >>>>for i in it: >> >... print(i) >> >3 >> >4 >>>>> >>>> >> > >> >With the proposed behaviour, if I understand it, "it" would be closed >> >after the first loop, so resuming "it" for the second loop wouldn't >> >work. Am I right in that? I know there's a proposed itertools function >> >to bring back the old behaviour, but it's still a compatibility break. >> >And code like this, that partially consumes an iterator, is not >> >uncommon. > > Right -- did you reach the "transition plan" section? (I know it's > wayyy down there.) 
The proposal is to hide this behind a __future__ at > first + a mechanism during the transition period to catch code that > depends on the old behavior and issue deprecation warnings. But it is > a compatibility break, yes. To me this makes the change too hard to swallow. Although the issues you describe are real, it doesn't seem worth it to me to change the entire semantics of for loops just for these cases. There are lots of for loops that are not async and/or do not rely on resource cleanup. This will change how all of them work, just to fix something that sometimes is a problem for some resource-wrapping iterators. Moreover, even when the iterator does wrap a resource, sometimes I want to be able to stop and resume iteration. It's not uncommon, for instance, to have code using the csv module that reads some rows, pauses to make a decision (e.g., to parse differently depending what header columns are present, or skip some number of rows), and then resumes. This would increase the burden of updating code to adapt to the new breakage (since in this case the programmer would likely have to, or at least want to, think about what is going on rather than just blindly wrapping everything with protect() ). -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From toddrjen at gmail.com Wed Oct 19 16:10:20 2016 From: toddrjen at gmail.com (Todd) Date: Wed, 19 Oct 2016 16:10:20 -0400 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> Message-ID: On Wed, Oct 19, 2016 at 3:55 PM, Joseph Jevnik wrote: > You could add or prototype this with quasiquotes (http://quasiquotes. > readthedocs.io/en/latest/). You just need to be able to parse the body of > your expression as a string into an array. Here is a quick example with a > parser that only accepts 2d arrays: > > ``` > # coding: quasiquotes > > import numpy as np > from quasiquotes import QuasiQuoter > > > @object.__new__ > class array(QuasiQuoter): > def quote_expr(self, expr, frame, col_offset): > return np.array([ > eval('[%s]' % d, frame.f_globals, frame.f_locals) > for d in expr.split('||') > ]) > > > def f(): > a = 1 > b = 2 > c = 3 > return [$array| a, b, c || 4, 5, 6 |] > > > if __name__ == '__main__': > print(f()) > ``` > Interesting project, thanks! If there is any actual interest in this that might be a good way to prototype it. > Personally I am not sold on replacing `[` and `]` with `|` because I like > that you can visually see where dimensions are closed. > > Yes, that issue occurred to me. But assuming a rectangular matrix, I had trouble coming up with a good example that is clearer than what you could do with this syntax. For simple arrays it isn't needed, and complicated arrays are large so picking out the "[" and "]" becomes visually harder at least for me. Do you have a specific example that you think would be clearer than what is possible with this syntax? Of course that is more of an issue with jagged arrays, but numpy doesn't support those and I am not aware of any plans to add them (dynd is another story). Also keep in mind that this would supplement the existing approach? it doesn't replace it. np.ndarray() would stay around just like list() stays around for cases where it makes sense. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mistersheik at gmail.com Wed Oct 19 16:14:19 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 19 Oct 2016 20:14:19 +0000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> Message-ID: On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith wrote: > On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar > wrote: > > > > > > On Wed, Oct 19, 2016 at 11:08 AM Todd wrote: > >> > >> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar > >> wrote: > >>> > >>> This is a very interesting proposal. I just wanted to share something > I > >>> found in my quick search: > >>> > >>> > >>> > http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration > >>> > >>> Could you explain why the accepted answer there doesn't address this > >>> issue? > >>> > >>> class Parse(object): > >>> """A generator that iterates through a file""" > >>> def __init__(self, path): > >>> self.path = path > >>> > >>> def __iter__(self): > >>> with open(self.path) as f: > >>> yield from f > > BTW it may make this easier to read if we notice that it's essentially > a verbose way of writing: > > def parse(path): > with open(path) as f: > yield from f > > >> > >> I think the difference is that this new approach guarantees cleanup the > >> exact moment the loop ends, no matter how it ends. > >> > >> If I understand correctly, your approach will do cleanup when the loop > >> ends only if the iterator is exhausted. But if someone zips it with a > >> shorter iterator, uses itertools.islice or something similar, breaks the > >> loop, returns inside the loop, or in some other way ends the loop > before the > >> iterator is exhausted, the cleanup won't happen when the iterator is > garbage > >> collected. And for non-reference-counting python implementations, when > this > >> happens is completely unpredictable. > >> > >> -- > > > > > > I don't see that. The "cleanup" will happen when collection is > interrupted > > by an exception. This has nothing to do with garbage collection either > > since the cleanup happens deterministically when the block is ended. If > > this is the only example, then I would say this behavior is already > provided > > and does not need to be added. > > I think there might be a misunderstanding here. Consider code like > this, that breaks out from the middle of the for loop: > > def use_that_generator(): > for line in parse(...): > if found_the_line_we_want(line): > break > # -- mark -- > do_something_with_that_line(line) > > With current Python, what will happen is that when we reach the marked > line, then the for loop has finished and will drop its reference to > the generator object. At this point, the garbage collector comes into > play. On CPython, with its reference counting collector, the garbage > collector will immediately collect the generator object, and then the > generator object's __del__ method will restart 'parse' by having the > last 'yield' raise a GeneratorExit, and *that* exception will trigger > the 'with' block's cleanup. But in order to get there, we're > absolutely depending on the garbage collector to inject that > GeneratorExit. And on an implementation like PyPy that doesn't use > reference counting, the generator object will become collect*ible* at > the marked line, but might not actually be collect*ed* for an > arbitrarily long time afterwards. And until it's collected, the file > will remain open. 
'with' blocks guarantee that the resources they hold > will be cleaned up promptly when the enclosing stack frame gets > cleaned up, but for a 'with' block inside a generator then you still > need something to guarantee that the enclosing stack frame gets > cleaned up promptly! > Yes, I understand that. Maybe this is clearer. This class adds an iterclose to any iterator so that when iteration ends, iterclose is automatically called: def my_iterclose(): print("Closing!") class AddIterclose: def __init__(self, iterable, iterclose): self.iterable = iterable self.iterclose = iterclose def __iter__(self): try: for x in self.iterable: yield x finally: self.iterclose() try: for x in AddIterclose(range(10), my_iterclose): print(x) if x == 5: raise ValueError except: pass > > This proposal is about providing that thing -- with __(a)iterclose__, > the end of the for loop immediately closes the generator object, so > the garbage collector doesn't need to get involved. > > Essentially the same thing happens if we replace the 'break' with a > 'raise'. Though with exceptions, things can actually get even messier, > even on CPython. Here's a similar example except that (a) it exits > early due to an exception (which then gets caught elsewhere), and (b) > the invocation of the generator function ended up being kind of long, > so I split the for loop into two lines with a temporary variable: > > def use_that_generator2(): > it = > parse("/a/really/really/really/really/really/really/really/long/path") > for line in it: > if not valid_format(line): > raise ValueError() > > def catch_the_exception(): > try: > use_that_generator2() > except ValueError: > # -- mark -- > ... > > Here the ValueError() is raised from use_that_generator2(), and then > caught in catch_the_exception(). At the marked line, > use_that_generator2's stack frame is still pinned in memory by the > exception's traceback. And that means that all the local variables are > also pinned in memory, including our temporary 'it'. Which means that > parse's stack frame is also pinned in memory, and the file is not > closed. > > With the __(a)iterclose__ proposal, when the exception is thrown then > the 'for' loop in use_that_generator2() immediately closes the > generator object, which in turn triggers parse's 'with' block, and > that closes the file handle. And then after the file handle is closed, > the exception continues propagating. So at the marked line, it's still > the case that 'it' will be pinned in memory, but now 'it' is a closed > generator object that has already relinquished its resources. > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Oct 19 16:16:52 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 20 Oct 2016 07:16:52 +1100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> Message-ID: On Thu, Oct 20, 2016 at 7:14 AM, Neil Girdhar wrote: > class AddIterclose: > > def __init__(self, iterable, iterclose): > self.iterable = iterable > self.iterclose = iterclose > > def __iter__(self): > try: > for x in self.iterable: > yield x > finally: > self.iterclose() Can this be simplified down to a generator? 
def AddIterclose(iterable, iterclose): try: yield from iterable finally: iterclose() ChrisA From yselivanov.ml at gmail.com Wed Oct 19 16:33:46 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 19 Oct 2016 16:33:46 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: <76f44dc9-1b03-313c-22a4-f5ba4baf4999@gmail.com> On 2016-10-19 3:33 PM, Nathaniel Smith wrote: >>>>>> lst = [1,2,3,4] >>>>>> >>>>>it = iter(lst) >>>>>> >>>>>for i in it: >>> >>... if i == 2: break >>> >> >>>>>> >>>>>for i in it: >>> >>... print(i) >>> >>3 >>> >>4 >>>>>> >>>>> >>> >> >>> >>With the proposed behaviour, if I understand it, "it" would be closed >>> >>after the first loop, so resuming "it" for the second loop wouldn't >>> >>work. Am I right in that? I know there's a proposed itertools function >>> >>to bring back the old behaviour, but it's still a compatibility break. >>> >>And code like this, that partially consumes an iterator, is not >>> >>uncommon. >> > >> >Right -- did you reach the "transition plan" section? (I know it's >> >wayyy down there.) The proposal is to hide this behind a __future__ at >> >first + a mechanism during the transition period to catch code that >> >depends on the old behavior and issue deprecation warnings. But it is >> >a compatibility break, yes. > I should also say, regarding your specific example, I guess it's an > open question whether we would want list_iterator.__iterclose__ to > actually do anything. It could flip the iterator to a state where it > always raises StopIteration, or RuntimeError, or it could just be a > no-op that allows iteration to continue normally afterwards. Making 'for' loop to behave differently for built-in containers (i.e. make __iterclose__ a no-op for them) will only make this whole thing even more confusing. It has to be consistent: if you partially iterate over *anything* without wrapping it with `preserve()`, it should always close the iterator. Yury From mikhailwas at gmail.com Wed Oct 19 16:35:08 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Wed, 19 Oct 2016 22:35:08 +0200 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: On 19 October 2016 at 21:08, Todd wrote: > > a = np.ndarray([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134], > [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]], > [[-103, 50, -89, -12, 28, -12, 119, -131, -73, 21], > [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]], > [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145], > [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]], > [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60], > [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]]) > > I think both of the new examples are considerably clearer than the current > approach. > > Does anyone have any questions or thoughts? My 5 cents here. When I am dealing with such arrays, the only *good* solution which comes to my mind is to find or develop a nice GUI application which will allow me to use all powers of mouse/keyboard for navigation through data and switching between dimensions, and editing them in an effective way. Anything in a text mode editor for this task will be probably pointless, both for editing and reading such arrays. And indeed it is frustrating and error prone at times. 
Mikhail From matt at getpattern.com Wed Oct 19 16:47:11 2016 From: matt at getpattern.com (Matt Gilson) Date: Wed, 19 Oct 2016 13:47:11 -0700 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> Message-ID: FWIW, you probably _don't_ want to use `ndarray` directly. Normally, you want to use the `np.array` factory function... >>> import numpy as np >>> a = np.ndarray([0, 1, 2]) >>> a array([], shape=(0, 1, 2), dtype=float64) Aside from that, my main problem with this proposal is that it seems to only be relevant when used in third party code. There _is_ some precedence for this (for example rich comparisons and the matrix multiplication operator) -- However, these are all _operators_ so third party code can hook into it using the provided hook methods. This proposal is different in that it _isn't_ proposing an operator, so there isn't any object on which to define a magic hook method. I think that it was mentioned that it might be possible for a user to _register_ a callable that would then be used when this syntax was envoked -- But having a global setting like that leads to contention. What if I want to use this syntax with `np.ndarray` but some other third party code (that I want to use _with_ numpy_ tries to hook into the syntax as well? All of a sudden, my script stops working as soon as I import a new third party module. I _do_ think that this might be a valid proposal for some of the more domain specific python variants (e.g. IPython) which have a pre-processing layer on top of the rest of the language. It might be worth trying to float this idea in one of their ideas mailing lists/issue trackers. On Wed, Oct 19, 2016 at 1:10 PM, Todd wrote: > On Wed, Oct 19, 2016 at 3:55 PM, Joseph Jevnik wrote: > >> You could add or prototype this with quasiquotes ( >> http://quasiquotes.readthedocs.io/en/latest/). You just need to be able >> to parse the body of your expression as a string into an array. Here is a >> quick example with a parser that only accepts 2d arrays: >> >> ``` >> # coding: quasiquotes >> >> import numpy as np >> from quasiquotes import QuasiQuoter >> >> >> @object.__new__ >> class array(QuasiQuoter): >> def quote_expr(self, expr, frame, col_offset): >> return np.array([ >> eval('[%s]' % d, frame.f_globals, frame.f_locals) >> for d in expr.split('||') >> ]) >> >> >> def f(): >> a = 1 >> b = 2 >> c = 3 >> return [$array| a, b, c || 4, 5, 6 |] >> >> >> if __name__ == '__main__': >> print(f()) >> ``` >> > > Interesting project, thanks! If there is any actual interest in this that > might be a good way to prototype it. > > >> Personally I am not sold on replacing `[` and `]` with `|` because I like >> that you can visually see where dimensions are closed. >> >> > Yes, that issue occurred to me. But assuming a rectangular matrix, I had > trouble coming up with a good example that is clearer than what you could > do with this syntax. For simple arrays it isn't needed, and complicated > arrays are large so picking out the "[" and "]" becomes visually harder at > least for me. Do you have a specific example that you think would be > clearer than what is possible with this syntax? > > Of course that is more of an issue with jagged arrays, but numpy doesn't > support those and I am not aware of any plans to add them (dynd is another > story). > > Also keep in mind that this would supplement the existing approach? it > doesn't replace it. 
np.ndarray() would stay around just like list() stays > around for cases where it makes sense. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- [image: pattern-sig.png] Matt Gilson // SOFTWARE ENGINEER E: matt at getpattern.com // P: 603.892.7736 We?re looking for beta testers. Go here to sign up! -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Oct 19 17:02:18 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Oct 2016 14:02:18 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: Hi Yury, Thanks for the detailed comments! Replies inline below. On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov wrote: > I'm -1 on the idea. Here's why: > > > 1. Python is a very dynamic language with GC and that is one of its > fundamental properties. This proposal might make GC of iterators more > deterministic, but that is only one case. > > For instance, in some places in asyncio source code we have statements like > this: "self = None". Why? When an exception occurs and we want to save it > (for instance to log it), it holds a reference to the Traceback object. > Which in turn references frame objects. Which means that a lot of objects > in those frames will be alive while the exception object is alive. So in > asyncio we go to great lengths to avoid unnecessary runs of GC, but this is > an exception! Most of Python code out there today doesn't do this sorts of > tricks. > > And this is just one example of how you can have cycles that require a run > of GC. It is not possible to have deterministic GC in real life Python > applications. This proposal addresses only *one* use case, leaving 100s of > others unresolved. Maybe I'm misunderstanding, but I think those 100s of other cases where you need deterministic cleanup are why 'with' blocks were invented, and in my experience they work great for that. Once you get in the habit, it's very easy and idiomatic to attach a 'with' to each file handle, socket, etc., at the point where you create it. So from where I stand, it seems like those 100s of unresolved cases actually are resolved? The problem is that 'with' blocks are great, and generators are great, but when you put them together into the same language there's this weird interaction that emerges, where 'with' blocks inside generators don't really work for their intended purpose unless you're very careful and willing to write boilerplate. Adding deterministic cleanup to generators plugs this gap. Beyond that, I do think it's a nice bonus that other iterables can take advantage of the feature, but this isn't just a random "hey let's smush two constructs together to save a line of code" thing -- iteration is special because it's where generator call stacks and regular call stacks meet. > IMO, while GC-related issues can be annoying to debug sometimes, it's not > worth it to change the behaviour of iteration in Python only to slightly > improve on this. > > 2. This proposal will make writing iterators significantly harder. Consider > 'itertools.chain'. We will have to rewrite it to add the proposed > __iterclose__ method. The Chain iterator object will have to track all of > its iterators, call __iterclose__ on them when it's necessary (there are a > few corner cases). 
Given that this object is implemented in C, it's quite a > bit of work. And we'll have a lot of objects to fix. When you say "make writing iterators significantly harder", is it fair to say that you're thinking mostly of what I'm calling "iterator wrappers"? For most day-to-day iterators, it's pretty trivial to either add a close method or not; the tricky cases are when you're trying to manage a collection of sub-iterators. itertools.chain is a great challenge / test case here, because I think it's about as hard as this gets :-). It took me a bit to wrap my head around, but I think I've got it, and that it's not so bad actually. Right now, chain's semantics are: # copied directly from the docs def chain(*iterables): for it in iterables: for element in it: yield element In a post-__iterclose__ world, the inner for loop there will already handle closing each iterators as its finished being consumed, and if the generator is closed early then the inner for loop will also close the current iterator. What we need to add is that if the generator is closed early, we should also close all the unprocessed iterators. The first change is to replace the outer for loop with a while/pop loop, so that if an exception occurs we'll know which iterables remain to be processed: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element ... Now, what do we do if an exception does occur? We need to call iterclose on all of the remaining iterables, but the tricky bit is that this might itself raise new exceptions. If this happens, we don't want to abort early; instead, we want to continue until we've closed all the iterables, and then raise a chained exception. Basically what we want is: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element finally: try: operators.iterclose(iter(iterables[0])) finally: try: operators.iterclose(iter(iterables[1])) finally: try: operators.iterclose(iter(iterables[2])) finally: ... but of course that's not valid syntax. Fortunately, it's not too hard to rewrite that into real Python -- but it's a little dense: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element # This is equivalent to the nested-finally chain above: except BaseException as last_exc: for iterable in iterables: try: operators.iterclose(iter(iterable)) except BaseException as new_exc: if new_exc.__context__ is None: new_exc.__context__ = last_exc last_exc = new_exc raise last_exc It's probably worth wrapping that bottom part into an iterclose_all() helper, since the pattern probably occurs in other cases as well. (Actually, now that I think about it, the map() example in the text should be doing this instead of what it's currently doing... I'll fix that.) This doesn't strike me as fundamentally complicated, really -- the exception chaining logic makes it look scary, but basically it's just the current chain() plus a cleanup loop. I believe that this handles all the corner cases correctly. Am I missing something? And again, this strikes me as one of the worst cases -- the vast majority of iterators out there are not doing anything nearly this complicated with subiterators. > We can probably update all iterators in standard library (in 3.7), but what > about third-party code? It will take many years until you can say with > certainty that most of Python code supports __iterclose__ / __aiterclose__. 
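A minimal sketch of the iterclose_all() helper mentioned in the chain() rewrite above. Nothing here is in today's stdlib: __iterclose__ and operators.iterclose() exist only in the proposal, so this version falls back to a getattr() lookup and simply chains any exceptions raised while closing:

```python
def iterclose_all(iterables, last_exc=None):
    """Close every iterable in turn, chaining exceptions raised along the way."""
    for iterable in iterables:
        try:
            # In the proposal this would be operators.iterclose(iter(iterable));
            # today we can only look the method up and skip it if absent.
            closer = getattr(iter(iterable), "__iterclose__", None)
            if closer is not None:
                closer()
        except BaseException as new_exc:
            if new_exc.__context__ is None:
                new_exc.__context__ = last_exc
            last_exc = new_exc
    if last_exc is not None:
        raise last_exc
```

With something like this available, the except clause in the chain() sketch above would presumably reduce to a single iterclose_all(iterables, last_exc=last_exc) call.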
Adding support to itertools, toolz.itertoolz, and generators (which are the most common way to implement iterator wrappers) will probably take care of 95% of uses, but yeah, there's definitely a long tail that will take time to shake out. The (extremely tentative) transition plan has __iterclose__ as opt-in until 3.9, so that's about 3.5 years from now. __aiterclose__ is a different matter of course, since there are very very few async iterator wrappers in the wild, and in general I think most people writing async iterators are watching async/await-related language developments very closely. > 3. This proposal changes the behaviour of 'for' and 'async for' statements > significantly. To do partial iteration you will have to use a special > builtin function to guard the iterator from being closed. This is > completely non-obvious to any existing Python user and will be hard to > explain to newcomers. It's true that it's non-obvious to existing users, but that's true of literally every change that we could ever make :-). That's why we have release notes, deprecation warnings, enthusiastic blog posts, etc. For newcomers... well, it's always difficult for those of us with more experience to put ourselves back in the mindset, but I don't see why this would be particularly difficult to explain? for loops consume their iterator; if you don't want that then here's how you avoid it. That's no more difficult to explain than what an iterator is in the first place, I don't think, and for me at least it's a lot easier to wrap my head around than the semantics of else blocks on for loops :-). (I always forget how those work.) > 4. This proposal only addresses iteration with 'for' and 'async for' > statements. If you iterate using a 'while' loop and 'next()' function, this > proposal wouldn't help you. Also see the point #2 about third-party code. True. If you're doing manual iteration, then you are still responsible for manual cleanup (if that's what you want), just like today. This seems fine to me -- I'm not sure why it's an objection to this proposal :-). > 5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a > very similar fashion to synchronous generators. There is an API to help > Python to call event loop to finalize AGs. asyncio in 3.6 (and other event > loops in the near future) already uses this API to ensure that *all AGs in a > long-running program are properly finalized* while it is being run. > > There is an extra loop method (`loop.shutdown_asyncgens`) that should be > called right before stopping the loop (exiting the program) to make sure > that all AGs are finalized, but if you forget to call it the world won't > end. The process will end and the interpreter will shutdown, maybe issuing > a couple of ResourceWarnings. There is no law that says that the interpreter always shuts down after the event loop exits. We're talking about a fundamental language feature here, it shouldn't be dependent on the details of libraries and application shutdown tendencies :-(. > No exception will pass silently in the current PEP 525 implementation. Exceptions that occur inside a garbage-collected iterator will be printed to the console, or possibly logged according to whatever the event loop does with unhandled exceptions. And sure, that's better than nothing, if someone remembers to look at the console/logs. But they *won't* be propagated out to the containing frame, they can't be caught, etc. That's a really big difference. 
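A small, self-contained demonstration of that last point under current CPython semantics: an exception raised while a generator is being finalized by the garbage collector is only printed ("Exception ignored in: <generator ...>" on stderr) and never propagates, so surrounding code cannot catch it. Toy names only:

```python
def gen():
    try:
        yield 1
    finally:
        raise RuntimeError("cleanup failed")

try:
    g = gen()
    next(g)
    del g                 # cleanup runs via __del__/GC, not via a for loop
except RuntimeError:
    print("caught")       # never reached: the error goes to stderr instead
```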
> And if some AG isn't properly finalized a warning will be issued. This actually isn't true of the code currently in asyncio master -- if the loop is already closed (either manually by the user or by its __del__ being called) when the AG finalizer executes, then the AG is silently discarded: https://github.com/python/asyncio/blob/e3fed68754002000be665ad1a379a747ad9247b6/asyncio/base_events.py#L352 This isn't really an argument against the mechanism though, just a bug you should probably fix :-). I guess it does point to my main dissatisfaction with the whole GC hook machinery, though. At this point I have spent many, many hours tracing through the details of this catching edge cases -- first during the initial PEP process, where there were a few rounds of revision, then again the last few days when I first thought I found a bunch of bugs that turned out to be spurious because I'd missed one line in the PEP, plus one real bug that you already know about (the finalizer-called-from-wrong-thread issue), and then I spent another hour carefully reading through the code again with PEP 442 open alongside once I realized how subtle the resurrection and cyclic reference issues are here, and now here's another minor bug for you. At this point I'm about 85% confident that it does actually function as described, or that we'll at least be able to shake out any remaining weird edge cases over the next 6-12 months as people use it. But -- and I realize this is an aesthetic reaction as much as anything else -- this all feels *really* unpythonic to me. Looking at the Zen, the phrases that come to mind are "complicated", and "If the implementation is hard to explain, ...". The __(a)iterclose__ proposal definitely has its complexity as well, but it's a very different kind. The core is incredibly straightforward: "there is this method, for loops always call it". That's it. When you look at a for loop, you can be extremely confident about what's going to happen and when. Of course then there's the question of defining this method on all the diverse iterators that we have floating around -- I'm not saying it's trivial. But you can take them one at a time, and each individual case is pretty straightforward. > The current AG finalization mechanism must stay even if this proposal gets > accepted, as it ensures that even manually iterated AGs are properly > finalized. Like I said in the text, I don't find this very persuasive, since if you're manually iterating then you can just as well take manual responsibility for cleaning things up. But I could live with both mechanisms co-existing. > 6. If this proposal gets accepted, I think we shouldn't introduce it in any > form in 3.6. It's too late to implement it for both sync- and > async-generators. Implementing it only for async-generators will only add > cognitive overhead. Even implementing this only for async-generators will > (and should!) delay 3.6 release significantly. I certainly don't want to delay 3.6. I'm not as convinced as you that the async-generator code alone is so complicated that it would force a delay, but if it is then 3.6.1 is also an option worth considering. > 7. To conclude: I'm not convinced that this proposal fully solves the issue > of non-deterministic GC of iterators. It cripples iteration protocols to > partially solve the problem for 'for' and 'async for' statements, leaving > manual iteration unresolved. It will make it harder to write *correct* > (async-) iterators. 
It introduces some *implicit* context management to > 'for' and 'async for' statements -- something that IMO should be done by > user with an explicit 'with' or 'async with'. The goal isn't to "fully solve the problem of non-deterministic GC of iterators". That would require magic :-). The goal is to provide tools so that when users run into this problem, they have viable options to solve it. Right now, we don't have those tools, as evidenced by the fact that I've basically never seen code that does this "correctly". We can tell people that they should be using explicit 'with' on every generator that might contain cleanup code, but they don't and they won't, and as a result their code quality is suffering on several axes (portability across Python implementations, 'with' blocks inside generators that don't actually do anything except spuriously hide ResourceWarnings, etc.). Adding __(a)iterclose__ to (async) for loops makes it easy and convenient to do the right thing in common cases; and in the less-usual case where you want to do manual iteration, then you can and should use a manual 'with' block too. The proposal is not trying to replace 'with' blocks :-). As for implicitness, eh. If 'for' is defined to mean 'iterate and then close', then that's what 'for' means. If we make the change then there won't be anything more implicit about 'for' calling __iterclose__ than there is about 'for' calling __iter__ or __next__. Definitely this will take some adjustment for those who are used to the old system, but sometimes that's the price of progress ;-). -n -- Nathaniel J. Smith -- https://vorpus.org From mistersheik at gmail.com Wed Oct 19 16:29:54 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 19 Oct 2016 13:29:54 -0700 (PDT) Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <7ebae09d-2dbe-4969-b09c-4e2296ff3b51@googlegroups.com> Message-ID: <52c6c4ff-272c-4a26-8da1-ee592ed4b3cb@googlegroups.com> Ohhh, sorry, you want __iterclose__ to happen when iteration is terminated by a break statement as well? Okay, I understand, and that's fair. However, I would rather that people be explicit about when they're iterating (use the iteration protocol) and when they're managing a resource (use a context manager). Trying to figure out where the context manager should go automatically (which is what it sounds like the proposal amounts to) is too difficult to get right, and when you get it wrong you close too early, and then what's the user supposed to do? Suppress the early close with an even more convoluted notation? If there is a problem with people iterating over things without a generator, my suggestion is to force them to use the generator. For example, don't make your object iterable: make the value yielded by the context manager iterable. Best, Neil (On preview, Re: Chris Angelico's refactoring of my code, nice!!) On Wednesday, October 19, 2016 at 4:14:32 PM UTC-4, Neil Girdhar wrote: > > > > On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith wrote: > >> On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar >> wrote: >> > >> > >> > On Wed, Oct 19, 2016 at 11:08 AM Todd wrote: >> >> >> >> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar >> >> wrote: >> >>> >> >>> This is a very interesting proposal. 
I just wanted to share >> something I >> >>> found in my quick search: >> >>> >> >>> >> >>> >> http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration >> >>> >> >>> Could you explain why the accepted answer there doesn't address this >> >>> issue? >> >>> >> >>> class Parse(object): >> >>> """A generator that iterates through a file""" >> >>> def __init__(self, path): >> >>> self.path = path >> >>> >> >>> def __iter__(self): >> >>> with open(self.path) as f: >> >>> yield from f >> >> BTW it may make this easier to read if we notice that it's essentially >> a verbose way of writing: >> >> def parse(path): >> with open(path) as f: >> yield from f >> >> >> >> >> I think the difference is that this new approach guarantees cleanup the >> >> exact moment the loop ends, no matter how it ends. >> >> >> >> If I understand correctly, your approach will do cleanup when the loop >> >> ends only if the iterator is exhausted. But if someone zips it with a >> >> shorter iterator, uses itertools.islice or something similar, breaks >> the >> >> loop, returns inside the loop, or in some other way ends the loop >> before the >> >> iterator is exhausted, the cleanup won't happen when the iterator is >> garbage >> >> collected. And for non-reference-counting python implementations, >> when this >> >> happens is completely unpredictable. >> >> >> >> -- >> > >> > >> > I don't see that. The "cleanup" will happen when collection is >> interrupted >> > by an exception. This has nothing to do with garbage collection either >> > since the cleanup happens deterministically when the block is ended. If >> > this is the only example, then I would say this behavior is already >> provided >> > and does not need to be added. >> >> I think there might be a misunderstanding here. Consider code like >> this, that breaks out from the middle of the for loop: >> >> def use_that_generator(): >> for line in parse(...): >> if found_the_line_we_want(line): >> break >> # -- mark -- >> do_something_with_that_line(line) >> >> With current Python, what will happen is that when we reach the marked >> line, then the for loop has finished and will drop its reference to >> the generator object. At this point, the garbage collector comes into >> play. On CPython, with its reference counting collector, the garbage >> collector will immediately collect the generator object, and then the >> generator object's __del__ method will restart 'parse' by having the >> last 'yield' raise a GeneratorExit, and *that* exception will trigger >> the 'with' block's cleanup. But in order to get there, we're >> absolutely depending on the garbage collector to inject that >> GeneratorExit. And on an implementation like PyPy that doesn't use >> reference counting, the generator object will become collect*ible* at >> the marked line, but might not actually be collect*ed* for an >> arbitrarily long time afterwards. And until it's collected, the file >> will remain open. 'with' blocks guarantee that the resources they hold >> will be cleaned up promptly when the enclosing stack frame gets >> cleaned up, but for a 'with' block inside a generator then you still >> need something to guarantee that the enclosing stack frame gets >> cleaned up promptly! >> > > Yes, I understand that. Maybe this is clearer. 
This class adds an > iterclose to any iterator so that when iteration ends, iterclose is > automatically called: > > def my_iterclose(): > print("Closing!") > > > class AddIterclose: > > def __init__(self, iterable, iterclose): > self.iterable = iterable > self.iterclose = iterclose > > def __iter__(self): > try: > for x in self.iterable: > yield x > finally: > self.iterclose() > > > try: > for x in AddIterclose(range(10), my_iterclose): > print(x) > if x == 5: > raise ValueError > except: > pass > > > >> >> This proposal is about providing that thing -- with __(a)iterclose__, >> the end of the for loop immediately closes the generator object, so >> the garbage collector doesn't need to get involved. >> >> Essentially the same thing happens if we replace the 'break' with a >> 'raise'. Though with exceptions, things can actually get even messier, >> even on CPython. Here's a similar example except that (a) it exits >> early due to an exception (which then gets caught elsewhere), and (b) >> the invocation of the generator function ended up being kind of long, >> so I split the for loop into two lines with a temporary variable: >> >> def use_that_generator2(): >> it = >> parse("/a/really/really/really/really/really/really/really/long/path") >> for line in it: >> if not valid_format(line): >> raise ValueError() >> >> def catch_the_exception(): >> try: >> use_that_generator2() >> except ValueError: >> # -- mark -- >> ... >> >> Here the ValueError() is raised from use_that_generator2(), and then >> caught in catch_the_exception(). At the marked line, >> use_that_generator2's stack frame is still pinned in memory by the >> exception's traceback. And that means that all the local variables are >> also pinned in memory, including our temporary 'it'. Which means that >> parse's stack frame is also pinned in memory, and the file is not >> closed. >> >> With the __(a)iterclose__ proposal, when the exception is thrown then >> the 'for' loop in use_that_generator2() immediately closes the >> generator object, which in turn triggers parse's 'with' block, and >> that closes the file handle. And then after the file handle is closed, >> the exception continues propagating. So at the marked line, it's still >> the case that 'it' will be pinned in memory, but now 'it' is a closed >> generator object that has already relinquished its resources. >> >> -n >> >> -- >> Nathaniel J. Smith -- https://vorpus.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Oct 19 17:08:16 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Oct 2016 14:08:16 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <76f44dc9-1b03-313c-22a4-f5ba4baf4999@gmail.com> References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <76f44dc9-1b03-313c-22a4-f5ba4baf4999@gmail.com> Message-ID: On Wed, Oct 19, 2016 at 1:33 PM, Yury Selivanov wrote: > On 2016-10-19 3:33 PM, Nathaniel Smith wrote: > >>>>>>> lst = [1,2,3,4] >>>>>>> >>>>>it = iter(lst) >>>>>>> >>>>>for i in it: >>>> >>>> >>... if i == 2: break >>>> >> >>>>>>> >>>>>>> >>>>>for i in it: >>>> >>>> >>... print(i) >>>> >>3 >>>> >>4 >>>>>>> >>>>>>> >>>>> >>>> >>>> >> >>>> >>With the proposed behaviour, if I understand it, "it" would be closed >>>> >>after the first loop, so resuming "it" for the second loop wouldn't >>>> >>work. Am I right in that? I know there's a proposed itertools function >>>> >>to bring back the old behaviour, but it's still a compatibility break. 
>>>> >>And code like this, that partially consumes an iterator, is not >>>> >>uncommon. >>> >>> > >>> >Right -- did you reach the "transition plan" section? (I know it's >>> >wayyy down there.) The proposal is to hide this behind a __future__ at >>> >first + a mechanism during the transition period to catch code that >>> >depends on the old behavior and issue deprecation warnings. But it is >>> >a compatibility break, yes. >> >> I should also say, regarding your specific example, I guess it's an >> open question whether we would want list_iterator.__iterclose__ to >> actually do anything. It could flip the iterator to a state where it >> always raises StopIteration, or RuntimeError, or it could just be a >> no-op that allows iteration to continue normally afterwards. > > > Making 'for' loop to behave differently for built-in containers (i.e. make > __iterclose__ a no-op for them) will only make this whole thing even more > confusing. > > It has to be consistent: if you partially iterate over *anything* without > wrapping it with `preserve()`, it should always close the iterator. You're probably right. My gut is leaning the same way, I'm just hesitant to commit because I haven't thought about it for long. But I do stand by the claim that this is probably not *that* important either way :-). -n -- Nathaniel J. Smith -- https://vorpus.org From toddrjen at gmail.com Wed Oct 19 17:50:45 2016 From: toddrjen at gmail.com (Todd) Date: Wed, 19 Oct 2016 17:50:45 -0400 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> Message-ID: On Wed, Oct 19, 2016 at 4:47 PM, Matt Gilson wrote: > FWIW, you probably _don't_ want to use `ndarray` directly. Normally, you > want to use the `np.array` factory function... > > >>> import numpy as np > >>> a = np.ndarray([0, 1, 2]) > >>> a > array([], shape=(0, 1, 2), dtype=float64) > > Aside from that, my main problem with this proposal is that it seems to > only be relevant when used in third party code. There _is_ some precedence > for this (for example rich comparisons and the matrix multiplication > operator) -- However, these are all _operators_ so third party code can > hook into it using the provided hook methods. This proposal is different > in that it _isn't_ proposing an operator, so there isn't any object on > which to define a magic hook method. I think that it was mentioned that it > might be possible for a user to _register_ a callable that would then be > used when this syntax was envoked -- But having a global setting like that > leads to contention. What if I want to use this syntax with `np.ndarray` > but some other third party code (that I want to use _with_ numpy_ tries to > hook into the syntax as well? All of a sudden, my script stops working as > soon as I import a new third party module. > Yes, this should definitely not be a default import of a package for exactly that reason, and it should be local to the module in which it was invoked. The most likely way I saw it working was that the user would have to explicitly invoke the hook, rather than it happening by another module on import. It would happen in the module namespace, so it would be impossible for imports to invoke it, and your use of it wouldn't affect the use of it in other modules you import. This seemed to me the approach that is safest, most reliable, and least likely to cause confusion, unexpected behavior, and unexpected breakage down the road. 
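In the meantime, the flavour of the notation can be approximated with an ordinary, explicitly imported helper rather than a hook. This is purely illustrative and not numpy API; parse_2d is a hypothetical name for a tiny parser that turns a "||"-separated string into a nested list that np.array() could consume:

```python
# Hypothetical helper: approximate the proposed 2-D "||" notation with a
# plain string parser that builds a nested (rectangular) list of floats.
def parse_2d(text):
    rows = [[float(cell) for cell in row.split(",")] for row in text.split("||")]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows must all have the same length")
    return rows

print(parse_2d("1, 2, 3 || 4, 5, 6"))   # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Of course, wrapping that in np.array() is exactly the kind of boilerplate the syntax proposal is trying to remove, so this is only a stop-gap sketch.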
If it happened at import, then having two modules invoke the hook would probably need to be an exception or first-come-first-serve. But I think requiring the user to manually invoke it would be better. But as I said there are a lot of other problems with this approach so I don't consider it particularly likely. > I _do_ think that this might be a valid proposal for some of the more > domain specific python variants (e.g. IPython) which have a pre-processing > layer on top of the rest of the language. It might be worth trying to > float this idea in one of their ideas mailing lists/issue trackers. > I do see this being the most likely scenario ultimately. I am pretty sure Sage already does its own ndarray handling, and I recall talk about doing it Spyder although I don't know if anything came of it. I will probably bring this up there at some point, but as I said this is the central location for Python ideas, so I thought having it here was important. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Oct 19 17:52:34 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 19 Oct 2016 17:52:34 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: <74ca605c-8775-72fc-b0e8-7f7bcc396df4@gmail.com> Nathaniel, On 2016-10-19 5:02 PM, Nathaniel Smith wrote: > Hi Yury, > > Thanks for the detailed comments! Replies inline below. NP! > > On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov wrote: >> I'm -1 on the idea. Here's why: >> >> >> 1. Python is a very dynamic language with GC and that is one of its >> fundamental properties. This proposal might make GC of iterators more >> deterministic, but that is only one case. >> >> For instance, in some places in asyncio source code we have statements like >> this: "self = None". Why? When an exception occurs and we want to save it >> (for instance to log it), it holds a reference to the Traceback object. >> Which in turn references frame objects. Which means that a lot of objects >> in those frames will be alive while the exception object is alive. So in >> asyncio we go to great lengths to avoid unnecessary runs of GC, but this is >> an exception! Most of Python code out there today doesn't do this sorts of >> tricks. >> >> And this is just one example of how you can have cycles that require a run >> of GC. It is not possible to have deterministic GC in real life Python >> applications. This proposal addresses only *one* use case, leaving 100s of >> others unresolved. > Maybe I'm misunderstanding, but I think those 100s of other cases > where you need deterministic cleanup are why 'with' blocks were > invented, and in my experience they work great for that. Once you get > in the habit, it's very easy and idiomatic to attach a 'with' to each > file handle, socket, etc., at the point where you create it. So from > where I stand, it seems like those 100s of unresolved cases actually > are resolved? Not all code can be written with 'with' statements, see my example with 'self = None' in asyncio. Python code can be quite complex, involving classes with __del__ that do some cleanups etc. Fundamentally, you cannot make GC of such objects deterministic. IOW I'm not convinced that if we implement your proposal we'll fix 90% (or even 30%) of cases where non-deterministic and postponed cleanup is harmful. 
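To make the 'self = None' example concrete, here is a rough sketch (class and attribute names invented for illustration) of the cycle it breaks: the stored exception's traceback keeps the raising frame alive, the frame's 'self' keeps the object alive, and the object keeps the exception alive, so without the assignment the object can only be reclaimed by a cyclic-GC pass:

```python
import gc

class Connection:
    def __del__(self):
        print("Connection finalized")

    def fail(self):
        try:
            raise OSError("boom")
        except OSError as exc:
            self.last_error = exc   # exc.__traceback__ -> this frame -> self
        finally:
            self = None             # break the frame -> self edge (the asyncio idiom)

gc.disable()                        # leave only reference counting in play
conn = Connection()
conn.fail()
del conn                            # prints "Connection finalized" immediately;
                                    # without "self = None" it would not print here at all
```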
> The problem is that 'with' blocks are great, and generators are great, > but when you put them together into the same language there's this > weird interaction that emerges, where 'with' blocks inside generators > don't really work for their intended purpose unless you're very > careful and willing to write boilerplate. > > Adding deterministic cleanup to generators plugs this gap. Beyond > that, I do think it's a nice bonus that other iterables can take > advantage of the feature, but this isn't just a random "hey let's > smush two constructs together to save a line of code" thing -- > iteration is special because it's where generator call stacks and > regular call stacks meet. Yes, I understand that your proposal really improves some things. OTOH it undeniably complicates the iteration protocol and requires a long period of deprecations, teaching users and library authors new semantics, etc. We only now begin to see Python 3 gaining traction. I don't want us to harm that by introducing another set of things to Python 3 that are significantly different from Python 2. DeprecationWarnings/future imports don't excite users either. >> IMO, while GC-related issues can be annoying to debug sometimes, it's not >> worth it to change the behaviour of iteration in Python only to slightly >> improve on this. >> >> 2. This proposal will make writing iterators significantly harder. Consider >> 'itertools.chain'. We will have to rewrite it to add the proposed >> __iterclose__ method. The Chain iterator object will have to track all of >> its iterators, call __iterclose__ on them when it's necessary (there are a >> few corner cases). Given that this object is implemented in C, it's quite a >> bit of work. And we'll have a lot of objects to fix. > When you say "make writing iterators significantly harder", is it fair > to say that you're thinking mostly of what I'm calling "iterator > wrappers"? For most day-to-day iterators, it's pretty trivial to > either add a close method or not; the tricky cases are when you're > trying to manage a collection of sub-iterators. Yes, mainly iterator wrappers. You'll also will need to educate users to refactor (more on that below) their __del__ methods to __(a)iterclose__ in 3.6. > > itertools.chain is a great challenge / test case here, because I think > it's about as hard as this gets :-). It took me a bit to wrap my head > around, but I think I've got it, and that it's not so bad actually. Now imagine that being applied throughout the stdlib, plus some of it will have to be implemented in C. I'm not saying it's impossible, I'm saying that it will require additional effort for CPython and ecosystem. [..] > >> 3. This proposal changes the behaviour of 'for' and 'async for' statements >> significantly. To do partial iteration you will have to use a special >> builtin function to guard the iterator from being closed. This is >> completely non-obvious to any existing Python user and will be hard to >> explain to newcomers. > It's true that it's non-obvious to existing users, but that's true of > literally every change that we could ever make :-). That's why we have > release notes, deprecation warnings, enthusiastic blog posts, etc. We don't often change the behavior of basic statements like 'for', if ever. > > For newcomers... well, it's always difficult for those of us with more > experience to put ourselves back in the mindset, but I don't see why > this would be particularly difficult to explain? 
for loops consume > their iterator; if you don't want that then here's how you avoid it. > That's no more difficult to explain than what an iterator is in the > first place, I don't think, and for me at least it's a lot easier to > wrap my head around than the semantics of else blocks on for loops > :-). (I always forget how those work.) A lot of code that you find on stackoverflow etc will be broken. Porting code from Python2/<3.6 will be challenging. People are still struggling to understand 'dict.keys()'-like views in Python 3. > >> 4. This proposal only addresses iteration with 'for' and 'async for' >> statements. If you iterate using a 'while' loop and 'next()' function, this >> proposal wouldn't help you. Also see the point #2 about third-party code. > True. If you're doing manual iteration, then you are still responsible > for manual cleanup (if that's what you want), just like today. This > seems fine to me -- I'm not sure why it's an objection to this > proposal :-). Right now we can implement the __del__ method to cleanup iterators. And it works for both partial iteration and cases where people forgot to close the iterator explicitly. With you proposal, to achieve the same (and make the code compatible with new for-loop semantics), users will have to implement both __iterclose__ and __del__. > >> 5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a >> very similar fashion to synchronous generators. There is an API to help >> Python to call event loop to finalize AGs. asyncio in 3.6 (and other event >> loops in the near future) already uses this API to ensure that *all AGs in a >> long-running program are properly finalized* while it is being run. >> >> There is an extra loop method (`loop.shutdown_asyncgens`) that should be >> called right before stopping the loop (exiting the program) to make sure >> that all AGs are finalized, but if you forget to call it the world won't >> end. The process will end and the interpreter will shutdown, maybe issuing >> a couple of ResourceWarnings. > There is no law that says that the interpreter always shuts down after > the event loop exits. We're talking about a fundamental language > feature here, it shouldn't be dependent on the details of libraries > and application shutdown tendencies :-(. It's not about shutting down the interpreter or exiting the process. The majority of async applications just run the loop until they exit. The point of PEP 525 and how the finalization is handled in asyncio is that AGs will be properly cleaned up for the absolute majority of time (while the loop is running). [..] >> And if some AG isn't properly finalized a warning will be issued. > This actually isn't true of the code currently in asyncio master -- if > the loop is already closed (either manually by the user or by its > __del__ being called) when the AG finalizer executes, then the AG is > silently discarded: > https://github.com/python/asyncio/blob/e3fed68754002000be665ad1a379a747ad9247b6/asyncio/base_events.py#L352 > > This isn't really an argument against the mechanism though, just a bug > you should probably fix :-). I don't think it's a bug. When the loop is closed, the hook will do nothing, so the asynchronous generator will be cleaned up by the interpreter. If it has an 'await' expression in its 'finally' statement, the interpreter will issue a warning. I'll add a comment explaining this. > > I guess it does point to my main dissatisfaction with the whole GC > hook machinery, though. 
At this point I have spent many, many hours > tracing through the details of this catching edge cases -- first > during the initial PEP process, where there were a few rounds of > revision, then again the last few days when I first thought I found a > bunch of bugs that turned out to be spurious because I'd missed one > line in the PEP, plus one real bug that you already know about (the > finalizer-called-from-wrong-thread issue), and then I spent another > hour carefully reading through the code again with PEP 442 open > alongside once I realized how subtle the resurrection and cyclic > reference issues are here, and now here's another minor bug for you. Yes, I agree it's not an easy thing to digest. Good thing is that asyncio has a reference implementation of PEP 525 support, so people can learn from it. I'll definitely add more comments to make the code easier to read. > > At this point I'm about 85% confident that it does actually function > as described, or that we'll at least be able to shake out any > remaining weird edge cases over the next 6-12 months as people use it. > But -- and I realize this is an aesthetic reaction as much as anything > else -- this all feels *really* unpythonic to me. Looking at the Zen, > the phrases that come to mind are "complicated", and "If the > implementation is hard to explain, ...". > > The __(a)iterclose__ proposal definitely has its complexity as well, > but it's a very different kind. The core is incredibly > straightforward: "there is this method, for loops always call it". > That's it. When you look at a for loop, you can be extremely confident > about what's going to happen and when. Of course then there's the > question of defining this method on all the diverse iterators that we > have floating around -- I'm not saying it's trivial. But you can take > them one at a time, and each individual case is pretty > straightforward. The __(a)iterclose__ semantics is clear. What's not clear is how much harm changing the semantics of for-loops will do (and how to quantify the amount of good :)) [..] >> 7. To conclude: I'm not convinced that this proposal fully solves the issue >> of non-deterministic GC of iterators. It cripples iteration protocols to >> partially solve the problem for 'for' and 'async for' statements, leaving >> manual iteration unresolved. It will make it harder to write *correct* >> (async-) iterators. It introduces some *implicit* context management to >> 'for' and 'async for' statements -- something that IMO should be done by >> user with an explicit 'with' or 'async with'. > The goal isn't to "fully solve the problem of non-deterministic GC of > iterators". That would require magic :-). The goal is to provide tools > so that when users run into this problem, they have viable options to > solve it. Right now, we don't have those tools, as evidenced by the > fact that I've basically never seen code that does this "correctly". > We can tell people that they should be using explicit 'with' on every > generator that might contain cleanup code, but they don't and they > won't, and as a result their code quality is suffering on several axes > (portability across Python implementations, 'with' blocks inside > generators that don't actually do anything except spuriously hide > ResourceWarnings, etc.). Perhaps we should focus on teaching people that using 'with' statements inside (async-) generators is a bad idea. What you should do instead is to have a 'with' statement wrapping the code that uses the generator. 
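As a simple sketch of that recommendation (names invented): keep the with statement in the caller, next to the code that drives the generator, and let the generator merely borrow the resource.

```python
def stripped_lines(f):
    # The generator does not own the file; it only iterates over it.
    for line in f:
        yield line.rstrip("\n")

def first_matching(path, needle):
    with open(path) as f:              # the caller owns the resource
        for line in stripped_lines(f):
            if needle in line:
                return line            # leaving the with block closes f promptly
    return None
```

Whether this style can replace every 'with' inside a generator is, of course, exactly the point under dispute in this thread.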
Yury From njs at pobox.com Wed Oct 19 18:01:29 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Oct 2016 15:01:29 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: On Wed, Oct 19, 2016 at 11:13 AM, Chris Angelico wrote: > On Thu, Oct 20, 2016 at 3:38 AM, Random832 wrote: >> On Wed, Oct 19, 2016, at 11:51, Yury Selivanov wrote: >>> I'm -1 on the idea. Here's why: >>> >>> >>> 1. Python is a very dynamic language with GC and that is one of its >>> fundamental properties. This proposal might make GC of iterators more >>> deterministic, but that is only one case. >> >> There is a huge difference between wanting deterministic GC and wanting >> cleanup code to be called deterministically. We're not talking about >> memory usage here. > > Currently, iterators get passed around casually - you can build on > them, derive from them, etc, etc, etc. If you change the 'for' loop to > explicitly close an iterator, will you also change 'yield from'? Oh good point -- 'yield from' definitely needs a mention. Fortunately, I think it's pretty easy: the only way the child generator in a 'yield from' can be aborted early is if the parent generator is aborted early, so the semantics you'd want are that iff the parent generator is closed, then the child generator is also closed. 'yield from' already implements those semantics :-). So the only remaining issue is what to do if the child iterator completes normally, and in this case I guess 'yield from' probably should call '__iterclose__' at that point, like the equivalent for loop would. > What > about other forms of iteration? Will the iterator be closed when it > runs out normally? The iterator is closed if someone explicitly closes it, either by calling the method by hand, or by passing it to a construct that calls that method -- a 'for' loop without preserve(...), etc. Obviously any given iterator's __next__ method could decide to do whatever it wants when it's exhausted normally, including executing its 'close' logic, but there's no magic that causes __iterclose__ to be called here. The distinction between exhausted and exhausted+closed is useful: consider some sort of file-wrapping iterator that implements __iterclose__ as closing the file. Then this exhausts the iterator and then closes the file: for line in file_wrapping_iter: ... and this also exhausts the iterator, but since __iterclose__ is not called, it doesn't close the file, allowing it to be re-used: for line in preserve(file_wrapping_iter): ... OTOH there is one important limitation to this, which is that if you're implementing your iterator by using a generator, then generators in particular don't provide any way to distinguish between exhausted and exhausted+closed (this is just how generators already work, nothing to do with this proposal). Once a generator has been exhausted, its close() method becomes a no-op. > This proposal is to iterators what 'with' is to open files and other > resources. I can build on top of an open file fairly easily: > > @contextlib.contextmanager > def file_with_header(fn): > with open(fn, "w") as f: > f.write("Header Row") > yield f > > def main(): > with file_with_header("asdf") as f: > """do stuff""" > > I create a context manager based on another context manager, and I > have a guarantee that the end of the main() 'with' block is going to > properly close the file. Now, what happens if I do something similar > with an iterator? 
> > def every_second(it): > try: > next(it) > except StopIteration: > return > for value in it: > yield value > try: > next(it) > except StopIteration: > break BTW, it's probably easier to read this way :-): def every_second(it): for i, value in enumerate(it): if i % 2 == 1: yield value > This will work, because it's built on a 'for' loop. What if it's built > on a 'while' loop instead? > > def every_second_broken(it): > try: > while True: > nextIit) > yield next(it) > except StopIteration: > pass > > Now it *won't* correctly call the end-of-iteration function, because > there's no 'for' loop. This is going to either (a) require that EVERY > consumer of an iterator follow this new protocol, or (b) introduce a > ton of edge cases. Right. If the proposal is accepted then a lot (I suspect the vast majority) of iterator consumers will automatically DTRT because they're already using 'for' loops or whatever; for those that don't, they'll do whatever they're written to do, and that might or might not match what users have come to expect. Hence the transition period, ResourceWarnings and DeprecationWarnings, etc. I think the benefits are worth it, but there certainly is a transition cost. -n -- Nathaniel J. Smith -- https://vorpus.org From p.f.moore at gmail.com Wed Oct 19 18:07:56 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 19 Oct 2016 23:07:56 +0100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: On 19 October 2016 at 20:21, Nathaniel Smith wrote: > On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore wrote: >> On 19 October 2016 at 19:13, Chris Angelico wrote: >>> Now it *won't* correctly call the end-of-iteration function, because >>> there's no 'for' loop. This is going to either (a) require that EVERY >>> consumer of an iterator follow this new protocol, or (b) introduce a >>> ton of edge cases. >> >> Also, unless I'm misunderstanding the proposal, there's a fairly major >> compatibility break. At present we have: >> >>>>> lst = [1,2,3,4] >>>>> it = iter(lst) >>>>> for i in it: >> ... if i == 2: break >> >>>>> for i in it: >> ... print(i) >> 3 >> 4 >>>>> >> >> With the proposed behaviour, if I understand it, "it" would be closed >> after the first loop, so resuming "it" for the second loop wouldn't >> work. Am I right in that? I know there's a proposed itertools function >> to bring back the old behaviour, but it's still a compatibility break. >> And code like this, that partially consumes an iterator, is not >> uncommon. > > Right -- did you reach the "transition plan" section? (I know it's > wayyy down there.) The proposal is to hide this behind a __future__ at > first + a mechanism during the transition period to catch code that > depends on the old behavior and issue deprecation warnings. But it is > a compatibility break, yes. I missed that you propose phasing this in, but it doesn't really alter much, I think the current behaviour is valuable and common, and I'm -1 on breaking it. It's just too much of a fundamental change to how loops and iterators interact for me to be comfortable with it - particularly as it's only needed for a very specific use case (none of my programs ever use async - why should I have to rewrite my loops with a clumsy extra call just to cater for a problem that only occurs in async code?) 
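For reference, the "clumsy extra call" would presumably look something like the wrapper below. This is a hedged sketch only: preserve() and __iterclose__ exist nowhere today, they are part of the proposal. The wrapper passes iteration through untouched but makes closing a no-op, so a for loop could not shut down the underlying iterator:

```python
class preserve:
    """Wrap an iterator so that a for loop cannot close it (proposal sketch)."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        pass   # deliberately leave the wrapped iterator open
```

Under the proposed semantics, "for i in preserve(it): ..." would then leave "it" resumable for a second loop, which is exactly the partially-consuming pattern shown earlier in the thread.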
IMO, and I'm sorry if this is controversial, there's a *lot* of new language complexity that's been introduced for the async use case, and it's only the fact that it can be pretty much ignored by people who don't need or use async features that makes it acceptable (the "you don't pay for what you don't use" principle). The problem with this proposal is that it doesn't conform to that principle - it has a direct, negative impact on users who have no interest in async. Paul From robertc at robertcollins.net Wed Oct 19 18:41:28 2016 From: robertc at robertcollins.net (Robert Collins) Date: Thu, 20 Oct 2016 11:41:28 +1300 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: Hey Nathaniel - I like the intent here, but I think perhaps it would be better if the problem is approached differently. Seems to me that making *generators* have a special 'you are done now' interface is special casing, which usually makes things harder to learn and predict; and that more the net effect is that all loop constructs will need to learn about that special case, whether looping over a list, a generator, or whatever. Generators already have a well defined lifecycle - but as you say its not defined consistently across Python VM's. The language has no guarantees about when finalisation will occur :(. The PEP 525 aclose is a bit awkward itself in this way - but unlike regular generators it does have a reason, which is that the language doesn't define an event loop context as a built in thing - so finalisation can't reliably summon one up. So rather than adding a special case to finalise objects used in one particular iteration - which will play havoc with break statements, can we instead look at making escape analysis a required part of the compiler: the borrow checker in rust is getting pretty good at managing a very similar problem :). I haven't fleshed out exactly what would be entailed, so consider this a 'what if' and YMMV :). -Rob On 19 October 2016 at 17:38, Nathaniel Smith wrote: > Hi all, > > I'd like to propose that Python's iterator protocol be enhanced to add > a first-class notion of completion / cleanup. > > This is mostly motivated by thinking about the issues around async > generators and cleanup. Unfortunately even though PEP 525 was accepted > I found myself unable to stop pondering this, and the more I've > pondered the more convinced I've become that the GC hooks added in PEP > 525 are really not enough, and that we'll regret it if we stick with > them, or at least with them alone :-/. The strategy here is pretty > different -- it's an attempt to dig down and make a fundamental > improvement to the language that fixes a number of long-standing rough > spots, including async generators. > > The basic concept is relatively simple: just adding a '__iterclose__' > method that 'for' loops call upon completion, even if that's via break > or exception. But, the overall issue is fairly complicated + iterators > have a large surface area across the language, so the text below is > pretty long. Mostly I wrote it all out to convince myself that there > wasn't some weird showstopper lurking somewhere :-). For a first pass > discussion, it probably makes sense to mainly focus on whether the > basic concept makes sense? The main rationale is at the top, but the > details are there too for those who want them. > > Also, for *right* now I'm hoping -- probably unreasonably -- to try to > get the async iterator parts of the proposal in ASAP, ideally for > 3.6.0 or 3.6.1. 
(I know this is about the worst timing for a proposal > like this, which I apologize for -- though async generators are > provisional in 3.6, so at least in theory changing them is not out of > the question.) So again, it might make sense to focus especially on > the async parts, which are a pretty small and self-contained part, and > treat the rest of the proposal as a longer-term plan provided for > context. The comparison to PEP 525 GC hooks comes right after the > initial rationale. > > Anyway, I'll be interested to hear what you think! > > -n > > ------------------ > > Abstract > ======== > > We propose to extend the iterator protocol with a new > ``__(a)iterclose__`` slot, which is called automatically on exit from > ``(async) for`` loops, regardless of how they exit. This allows for > convenient, deterministic cleanup of resources held by iterators > without reliance on the garbage collector. This is especially valuable > for asynchronous generators. > > > Note on timing > ============== > > In practical terms, the proposal here is divided into two separate > parts: the handling of async iterators, which should ideally be > implemented ASAP, and the handling of regular iterators, which is a > larger but more relaxed project that can't start until 3.7 at the > earliest. But since the changes are closely related, and we probably > don't want to end up with async iterators and regular iterators > diverging in the long run, it seems useful to look at them together. > > > Background and motivation > ========================= > > Python iterables often hold resources which require cleanup. For > example: ``file`` objects need to be closed; the `WSGI spec > `_ adds a ``close`` method > on top of the regular iterator protocol and demands that consumers > call it at the appropriate time (though forgetting to do so is a > `frequent source of bugs > `_); > and PEP 342 (based on PEP 325) extended generator objects to add a > ``close`` method to allow generators to clean up after themselves. > > Generally, objects that need to clean up after themselves also define > a ``__del__`` method to ensure that this cleanup will happen > eventually, when the object is garbage collected. However, relying on > the garbage collector for cleanup like this causes serious problems in > at least two cases: > > - In Python implementations that do not use reference counting (e.g. > PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet > many situations require *prompt* cleanup of resources. Delayed cleanup > produces problems like crashes due to file descriptor exhaustion, or > WSGI timing middleware that collects bogus times. > > - Async generators (PEP 525) can only perform cleanup under the > supervision of the appropriate coroutine runner. ``__del__`` doesn't > have access to the coroutine runner; indeed, the coroutine runner > might be garbage collected before the generator object. So relying on > the garbage collector is effectively impossible without some kind of > language extension. (PEP 525 does provide such an extension, but it > has a number of limitations that this proposal fixes; see the > "alternatives" section below for discussion.) > > Fortunately, Python provides a standard tool for doing resource > cleanup in a more structured way: ``with`` blocks. 
For example, this > code opens a file but relies on the garbage collector to close it:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > for document in read_newline_separated_json(path): > ... > > and recent versions of CPython will point this out by issuing a > ``ResourceWarning``, nudging us to fix it by adding a ``with`` block:: > > def read_newline_separated_json(path): > with open(path) as file_handle: # <-- with block > for line in file_handle: > yield json.loads(line) > > for document in read_newline_separated_json(path): # <-- outer for loop > ... > > But there's a subtlety here, caused by the interaction of ``with`` > blocks and generators. ``with`` blocks are Python's main tool for > managing cleanup, and they're a powerful one, because they pin the > lifetime of a resource to the lifetime of a stack frame. But this > assumes that someone will take care of cleaning up the stack frame... > and for generators, this requires that someone ``close`` them. > > In this case, adding the ``with`` block *is* enough to shut up the > ``ResourceWarning``, but this is misleading -- the file object cleanup > here is still dependent on the garbage collector. The ``with`` block > will only be unwound when the ``read_newline_separated_json`` > generator is closed. If the outer ``for`` loop runs to completion then > the cleanup will happen immediately; but if this loop is terminated > early by a ``break`` or an exception, then the ``with`` block won't > fire until the generator object is garbage collected. > > The correct solution requires that all *users* of this API wrap every > ``for`` loop in its own ``with`` block:: > > with closing(read_newline_separated_json(path)) as genobj: > for document in genobj: > ... > > This gets even worse if we consider the idiom of decomposing a complex > pipeline into multiple nested generators:: > > def read_users(path): > with closing(read_newline_separated_json(path)) as gen: > for document in gen: > yield User.from_json(document) > > def users_in_group(path, group): > with closing(read_users(path)) as gen: > for user in gen: > if user.group == group: > yield user > > In general if you have N nested generators then you need N+1 ``with`` > blocks to clean up 1 file. And good defensive programming would > suggest that any time we use a generator, we should assume the > possibility that there could be at least one ``with`` block somewhere > in its (potentially transitive) call stack, either now or in the > future, and thus always wrap it in a ``with``. But in practice, > basically nobody does this, because programmers would rather write > buggy code than tiresome repetitive code. In simple cases like this > there are some workarounds that good Python developers know (e.g. in > this simple case it would be idiomatic to pass in a file handle > instead of a path and move the resource management to the top level), > but in general we cannot avoid the use of ``with``/``finally`` inside > of generators, and thus dealing with this problem one way or another. > When beauty and correctness fight then beauty tends to win, so it's > important to make correct code beautiful. > > Still, is this worth fixing? Until async generators came along I would > have argued yes, but that it was a low priority, since everyone seems > to be muddling along okay -- but async generators make it much more > urgent. 
Async generators cannot do cleanup *at all* without some > mechanism for deterministic cleanup that people will actually use, and > async generators are particularly likely to hold resources like file > descriptors. (After all, if they weren't doing I/O, they'd be > generators, not async generators.) So we have to do something, and it > might as well be a comprehensive fix to the underlying problem. And > it's much easier to fix this now when async generators are first > rolling out, then it will be to fix it later. > > The proposal itself is simple in concept: add a ``__(a)iterclose__`` > method to the iterator protocol, and have (async) ``for`` loops call > it when the loop is exited, even if this occurs via ``break`` or > exception unwinding. Effectively, we're taking the current cumbersome > idiom (``with`` block + ``for`` loop) and merging them together into a > fancier ``for``. This may seem non-orthogonal, but makes sense when > you consider that the existence of generators means that ``with`` > blocks actually depend on iterator cleanup to work reliably, plus > experience showing that iterator cleanup is often a desireable feature > in its own right. > > > Alternatives > ============ > > PEP 525 asyncgen hooks > ---------------------- > > PEP 525 proposes a `set of global thread-local hooks managed by new > ``sys.{get/set}_asyncgen_hooks()`` functions > `_, which > allow event loops to integrate with the garbage collector to run > cleanup for async generators. In principle, this proposal and PEP 525 > are complementary, in the same way that ``with`` blocks and > ``__del__`` are complementary: this proposal takes care of ensuring > deterministic cleanup in most cases, while PEP 525's GC hooks clean up > anything that gets missed. But ``__aiterclose__`` provides a number of > advantages over GC hooks alone: > > - The GC hook semantics aren't part of the abstract async iterator > protocol, but are instead restricted `specifically to the async > generator concrete type `_. > If you have an async iterator implemented using a class, like:: > > class MyAsyncIterator: > async def __anext__(): > ... > > then you can't refactor this into an async generator without > changing its semantics, and vice-versa. This seems very unpythonic. > (It also leaves open the question of what exactly class-based async > iterators are supposed to do, given that they face exactly the same > cleanup problems as async generators.) ``__aiterclose__``, on the > other hand, is defined at the protocol level, so it's duck-type > friendly and works for all iterators, not just generators. > > - Code that wants to work on non-CPython implementations like PyPy > cannot in general rely on GC for cleanup. Without ``__aiterclose__``, > it's more or less guaranteed that developers who develop and test on > CPython will produce libraries that leak resources when used on PyPy. > Developers who do want to target alternative implementations will > either have to take the defensive approach of wrapping every ``for`` > loop in a ``with`` block, or else carefully audit their code to figure > out which generators might possibly contain cleanup code and add > ``with`` blocks around those only. With ``__aiterclose__``, writing > portable code becomes easy and natural. > > - An important part of building robust software is making sure that > exceptions always propagate correctly without being lost. 
One of the > most exciting things about async/await compared to traditional > callback-based systems is that instead of requiring manual chaining, > the runtime can now do the heavy lifting of propagating errors, making > it *much* easier to write robust code. But, this beautiful new picture > has one major gap: if we rely on the GC for generator cleanup, then > exceptions raised during cleanup are lost. So, again, with > ``__aiterclose__``, developers who care about this kind of robustness > will either have to take the defensive approach of wrapping every > ``for`` loop in a ``with`` block, or else carefully audit their code > to figure out which generators might possibly contain cleanup code. > ``__aiterclose__`` plugs this hole by performing cleanup in the > caller's context, so writing more robust code becomes the path of > least resistance. > > - The WSGI experience suggests that there exist important > iterator-based APIs that need prompt cleanup and cannot rely on the > GC, even in CPython. For example, consider a hypothetical WSGI-like > API based around async/await and async iterators, where a response > handler is an async generator that takes request headers + an async > iterator over the request body, and yields response headers + the > response body. (This is actually the use case that got me interested > in async generators in the first place, i.e. this isn't hypothetical.) > If we follow WSGI in requiring that child iterators must be closed > properly, then without ``__aiterclose__`` the absolute most > minimalistic middleware in our system looks something like:: > > async def noop_middleware(handler, request_header, request_body): > async with aclosing(handler(request_body, request_body)) as aiter: > async for response_item in aiter: > yield response_item > > Arguably in regular code one can get away with skipping the ``with`` > block around ``for`` loops, depending on how confident one is that one > understands the internal implementation of the generator. But here we > have to cope with arbitrary response handlers, so without > ``__aiterclose__``, this ``with`` construction is a mandatory part of > every middleware. > > ``__aiterclose__`` allows us to eliminate the mandatory boilerplate > and an extra level of indentation from every middleware:: > > async def noop_middleware(handler, request_header, request_body): > async for response_item in handler(request_header, request_body): > yield response_item > > So the ``__aiterclose__`` approach provides substantial advantages > over GC hooks. > > This leaves open the question of whether we want a combination of GC > hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since > the vast majority of generators are iterated over using a ``for`` loop > or equivalent, ``__aiterclose__`` handles most situations before the > GC has a chance to get involved. The case where GC hooks provide > additional value is in code that does manual iteration, e.g.:: > > agen = fetch_newline_separated_json_from_url(...) > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > # doesn't do 'await agen.aclose()' > > If we go with the GC-hooks + ``__aiterclose__`` approach, this > generator will eventually be cleaned up by GC calling the generator > ``__del__`` method, which then will use the hooks to call back into > the event loop to run the cleanup code. 
> > If we go with the no-GC-hooks approach, this generator will eventually > be garbage collected, with the following effects: > > - its ``__del__`` method will issue a warning that the generator was > not closed (similar to the existing "coroutine never awaited" > warning). > > - The underlying resources involved will still be cleaned up, because > the generator frame will still be garbage collected, causing it to > drop references to any file handles or sockets it holds, and then > those objects's ``__del__`` methods will release the actual operating > system resources. > > - But, any cleanup code inside the generator itself (e.g. logging, > buffer flushing) will not get a chance to run. > > The solution here -- as the warning would indicate -- is to fix the > code so that it calls ``__aiterclose__``, e.g. by using a ``with`` > block:: > > async with aclosing(fetch_newline_separated_json_from_url(...)) as agen: > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > > Basically in this approach, the rule would be that if you want to > manually implement the iterator protocol, then it's your > responsibility to implement all of it, and that now includes > ``__(a)iterclose__``. > > GC hooks add non-trivial complexity in the form of (a) new global > interpreter state, (b) a somewhat complicated control flow (e.g., > async generator GC always involves resurrection, so the details of PEP > 442 are important), and (c) a new public API in asyncio (``await > loop.shutdown_asyncgens()``) that users have to remember to call at > the appropriate time. (This last point in particular somewhat > undermines the argument that GC hooks provide a safe backup to > guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called > correctly then I *think* it's possible for generators to be silently > discarded without their cleanup code being called; compare this to the > ``__aiterclose__``-only approach where in the worst case we still at > least get a warning printed. This might be fixable.) All this > considered, GC hooks arguably aren't worth it, given that the only > people they help are those who want to manually call ``__anext__`` yet > don't want to manually call ``__aiterclose__``. But Yury disagrees > with me on this :-). And both options are viable. > > > Always inject resources, and do all cleanup at the top level > ------------------------------------------------------------ > > It was suggested on python-dev (XX find link) that a pattern to avoid > these problems is to always pass resources in from above, e.g. > ``read_newline_separated_json`` should take a file object rather than > a path, with cleanup handled at the top level:: > > def read_newline_separated_json(file_handle): > for line in file_handle: > yield json.loads(line) > > def read_users(file_handle): > for document in read_newline_separated_json(file_handle): > yield User.from_json(document) > > with open(path) as file_handle: > for user in read_users(file_handle): > ... > > This works well in simple cases; here it lets us avoid the "N+1 > ``with`` blocks problem". But unfortunately, it breaks down quickly > when things get more complex. Consider if instead of reading from a > file, our generator was reading from a streaming HTTP GET request -- > while handling redirects and authentication via OAUTH. Then we'd > really want the sockets to be managed down inside our HTTP client > library, not at the top level. 
Plus there are other cases where > ``finally`` blocks embedded inside generators are important in their > own right: db transaction management, emitting logging information > during cleanup (one of the major motivating use cases for WSGI > ``close``), and so forth. So this is really a workaround for simple > cases, not a general solution. > > > More complex variants of __(a)iterclose__ > ----------------------------------------- > > The semantics of ``__(a)iterclose__`` are somewhat inspired by > ``with`` blocks, but context managers are more powerful: > ``__(a)exit__`` can distinguish between a normal exit versus exception > unwinding, and in the case of an exception it can examine the > exception details and optionally suppress propagation. > ``__(a)iterclose__`` as proposed here does not have these powers, but > one can imagine an alternative design where it did. > > However, this seems like unwarranted complexity: experience suggests > that it's common for iterables to have ``close`` methods, and even to > have ``__exit__`` methods that call ``self.close()``, but I'm not > aware of any common cases that make use of ``__exit__``'s full power. > I also can't think of any examples where this would be useful. And it > seems unnecessarily confusing to allow iterators to affect flow > control by swallowing exceptions -- if you're in a situation where you > really want that, then you should probably use a real ``with`` block > anyway. > > > Specification > ============= > > This section describes where we want to eventually end up, though > there are some backwards compatibility issues that mean we can't jump > directly here. A later section describes the transition plan. > > > Guiding principles > ------------------ > > Generally, ``__(a)iterclose__`` implementations should: > > - be idempotent, > - perform any cleanup that is appropriate on the assumption that the > iterator will not be used again after ``__(a)iterclose__`` is called. > In particular, once ``__(a)iterclose__`` has been called then calling > ``__(a)next__`` produces undefined behavior. > > And generally, any code which starts iterating through an iterable > with the intention of exhausting it, should arrange to make sure that > ``__(a)iterclose__`` is eventually called, whether or not the iterator > is actually exhausted. > > > Changes to iteration > -------------------- > > The core proposal is the change in behavior of ``for`` loops. Given > this Python code:: > > for VAR in ITERABLE: > LOOP-BODY > else: > ELSE-BODY > > we desugar to the equivalent of:: > > _iter = iter(ITERABLE) > _iterclose = getattr(type(_iter), "__iterclose__", lambda: None) > try: > traditional-for VAR in _iter: > LOOP-BODY > else: > ELSE-BODY > finally: > _iterclose(_iter) > > where the "traditional-for statement" here is meant as a shorthand for > the classic 3.5-and-earlier ``for`` loop semantics. > > Besides the top-level ``for`` statement, Python also contains several > other places where iterators are consumed. For consistency, these > should call ``__iterclose__`` as well using semantics equivalent to > the above. This includes: > > - ``for`` loops inside comprehensions > - ``*`` unpacking > - functions which accept and fully consume iterables, like > ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and > others. 
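(A rough way to see what the desugaring above amounts to is to write it as a plain helper in today's Python. The __iterclose__ name is the proposal's -- no built-in type defines it yet, so on current Python the getattr simply finds nothing and the close step is a no-op; the point is only to show where the call sits.)

    def iterclose(it):
        # stand-in for the proposed operator.iterclose()
        closer = getattr(type(it), "__iterclose__", None)
        if closer is not None:
            closer(it)

    def for_each(iterable, body):
        # approximates the proposed 'for' semantics: the body runs once per
        # item, and __iterclose__ runs however the loop is left -- normal
        # exhaustion or an exception out of the body
        _iter = iter(iterable)
        try:
            for item in _iter:
                body(item)
        finally:
            iterclose(_iter)

    for_each(range(3), print)   # 0, 1, 2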
> > > Changes to async iteration > -------------------------- > > We also make the analogous changes to async iteration constructs, > except that the new slot is called ``__aiterclose__``, and it's an > async method that gets ``await``\ed. > > > Modifications to basic iterator types > ------------------------------------- > > Generator objects (including those created by generator comprehensions): > - ``__iterclose__`` calls ``self.close()`` > - ``__del__`` calls ``self.close()`` (same as now), and additionally > issues a ``ResourceWarning`` if the generator wasn't exhausted. This > warning is hidden by default, but can be enabled for those who want to > make sure they aren't inadverdantly relying on CPython-specific GC > semantics. > > Async generator objects (including those created by async generator > comprehensions): > - ``__aiterclose__`` calls ``self.aclose()`` > - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been > called, since this probably indicates a latent bug, similar to the > "coroutine never awaited" warning. > > QUESTION: should file objects implement ``__iterclose__`` to close the > file? On the one hand this would make this change more disruptive; on > the other hand people really like writing ``for line in open(...): > ...``, and if we get used to iterators taking care of their own > cleanup then it might become very weird if files don't. > > > New convenience functions > ------------------------- > > The ``itertools`` module gains a new iterator wrapper that can be used > to selectively disable the new ``__iterclose__`` behavior:: > > # QUESTION: I feel like there might be a better name for this one? > class preserve(iterable): > def __init__(self, iterable): > self._it = iter(iterable) > > def __iter__(self): > return self > > def __next__(self): > return next(self._it) > > def __iterclose__(self): > # Swallow __iterclose__ without passing it on > pass > > Example usage (assuming that file objects implements ``__iterclose__``):: > > with open(...) as handle: > # Iterate through the same file twice: > for line in itertools.preserve(handle): > ... > handle.seek(0) > for line in itertools.preserve(handle): > ... > > The ``operator`` module gains two new functions, with semantics > equivalent to the following:: > > def iterclose(it): > if hasattr(type(it), "__iterclose__"): > type(it).__iterclose__(it) > > async def aiterclose(ait): > if hasattr(type(ait), "__aiterclose__"): > await type(ait).__aiterclose__(ait) > > These are particularly useful when implementing the changes in the next section: > > > __iterclose__ implementations for iterator wrappers > --------------------------------------------------- > > Python ships a number of iterator types that act as wrappers around > other iterators: ``map``, ``zip``, ``itertools.accumulate``, > ``csv.reader``, and others. These iterators should define a > ``__iterclose__`` method which calls ``__iterclose__`` in turn on > their underlying iterators. 
For example, ``map`` could be implemented > as:: > > class map: > def __init__(self, fn, *iterables): > self._fn = fn > self._iters = [iter(iterable) for iterable in iterables] > > def __iter__(self): > return self > > def __next__(self): > return self._fn(*[next(it) for it in self._iters]) > > def __iterclose__(self): > for it in self._iters: > operator.iterclose(it) > > In some cases this requires some subtlety; for example, > ```itertools.tee`` > `_ > should not call ``__iterclose__`` on the underlying iterator until it > has been called on *all* of the clone iterators. > > > Example / Rationale > ------------------- > > The payoff for all this is that we can now write straightforward code like:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > and be confident that the file will receive deterministic cleanup > *without the end-user having to take any special effort*, even in > complex cases. For example, consider this silly pipeline:: > > list(map(lambda key: key.upper(), > doc["key"] for doc in read_newline_separated_json(path))) > > If our file contains a document where ``doc["key"]`` turns out to be > an integer, then the following sequence of events will happen: > > 1. ``key.upper()`` raises an ``AttributeError``, which propagates out > of the ``map`` and triggers the implicit ``finally`` block inside > ``list``. > 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the > map object. > 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator > comprehension object. > 4. This injects a ``GeneratorExit`` exception into the generator > comprehension body, which is currently suspended inside the > comprehension's ``for`` loop body. > 5. The exception propagates out of the ``for`` loop, triggering the > ``for`` loop's implicit ``finally`` block, which calls > ``__iterclose__`` on the generator object representing the call to > ``read_newline_separated_json``. > 6. This injects an inner ``GeneratorExit`` exception into the body of > ``read_newline_separated_json``, currently suspended at the ``yield``. > 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop, > triggering the ``for`` loop's implicit ``finally`` block, which calls > ``__iterclose__()`` on the file object. > 8. The file object is closed. > 9. The inner ``GeneratorExit`` resumes propagating, hits the boundary > of the generator function, and causes > ``read_newline_separated_json``'s ``__iterclose__()`` method to return > successfully. > 10. Control returns to the generator comprehension body, and the outer > ``GeneratorExit`` continues propagating, allowing the comprehension's > ``__iterclose__()`` to return successfully. > 11. The rest of the ``__iterclose__()`` calls unwind without incident, > back into the body of ``list``. > 12. The original ``AttributeError`` resumes propagating. > > (The details above assume that we implement ``file.__iterclose__``; if > not then add a ``with`` block to ``read_newline_separated_json`` and > essentially the same logic goes through.) > > Of course, from the user's point of view, this can be simplified down to just: > > 1. ``int.upper()`` raises an ``AttributeError`` > 1. The file object is closed. > 2. The ``AttributeError`` propagates out of ``list`` > > So we've accomplished our goal of making this "just work" without the > user having to think about it. 
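(The ``itertools.tee`` subtlety mentioned above deserves a sketch of its own: the clones have to keep a shared count so that the source's ``__iterclose__`` only fires once the *last* clone has been closed. Something along these lines -- purely illustrative, not how ``tee`` is actually implemented, and ``__iterclose__`` itself is still only proposed:)

    import collections

    class _TeeClone:
        def __init__(self, shared):
            self._shared = shared
            self._buffer = collections.deque()
            self._closed = False
            shared["buffers"].append(self._buffer)

        def __iter__(self):
            return self

        def __next__(self):
            if not self._buffer:
                item = next(self._shared["source"])  # StopIteration propagates
                for buf in self._shared["buffers"]:
                    buf.append(item)
            return self._buffer.popleft()

        def __iterclose__(self):
            if self._closed:
                return
            self._closed = True
            self._shared["open"] -= 1
            if self._shared["open"] == 0:
                close = getattr(type(self._shared["source"]), "__iterclose__", None)
                if close is not None:
                    close(self._shared["source"])

    def tee_sketch(iterable, n=2):
        shared = {"source": iter(iterable), "buffers": [], "open": n}
        return tuple(_TeeClone(shared) for _ in range(n))

A real implementation would also have to cope with clones that are garbage collected without ever being closed, but the shared refcount is the essential shape.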
> > > Transition plan > =============== > > While the majority of existing ``for`` loops will continue to produce > identical results, the proposed changes will produce > backwards-incompatible behavior in some cases. Example:: > > def read_csv_with_header(lines_iterable): > lines_iterator = iter(lines_iterable) > for line in lines_iterator: > column_names = line.strip().split("\t") > break > for line in lines_iterator: > values = line.strip().split("\t") > record = dict(zip(column_names, values)) > yield record > > This code used to be correct, but after this proposal is implemented > will require an ``itertools.preserve`` call added to the first ``for`` > loop. > > [QUESTION: currently, if you close a generator and then try to iterate > over it then it just raises ``Stop(Async)Iteration``, so code the > passes the same generator object to multiple ``for`` loops but forgets > to use ``itertools.preserve`` won't see an obvious error -- the second > ``for`` loop will just exit immediately. Perhaps it would be better if > iterating a closed generator raised a ``RuntimeError``? Note that > files don't have this problem -- attempting to iterate a closed file > object already raises ``ValueError``.] > > Specifically, the incompatibility happens when all of these factors > come together: > > - The automatic calling of ``__(a)iterclose__`` is enabled > - The iterable did not previously define ``__(a)iterclose__`` > - The iterable does now define ``__(a)iterclose__`` > - The iterable is re-used after the ``for`` loop exits > > So the problem is how to manage this transition, and those are the > levers we have to work with. > > First, observe that the only async iterables where we propose to add > ``__aiterclose__`` are async generators, and there is currently no > existing code using async generators (though this will start changing > very soon), so the async changes do not produce any backwards > incompatibilities. (There is existing code using async iterators, but > using the new async for loop on an old async iterator is harmless, > because old async iterators don't have ``__aiterclose__``.) In > addition, PEP 525 was accepted on a provisional basis, and async > generators are by far the biggest beneficiary of this PEP's proposed > changes. Therefore, I think we should strongly consider enabling > ``__aiterclose__`` for ``async for`` loops and async generators ASAP, > ideally for 3.6.0 or 3.6.1. > > For the non-async world, things are harder, but here's a potential > transition path: > > In 3.7: > > Our goal is that existing unsafe code will start emitting warnings, > while those who want to opt-in to the future can do that immediately: > > - We immediately add all the ``__iterclose__`` methods described above. > - If ``from __future__ import iterclose`` is in effect, then ``for`` > loops and ``*`` unpacking call ``__iterclose__`` as specified above. > - If the future is *not* enabled, then ``for`` loops and ``*`` > unpacking do *not* call ``__iterclose__``. But they do call some other > method instead, e.g. ``__iterclose_warning__``. > - Similarly, functions like ``list`` use stack introspection (!!) to > check whether their direct caller has ``__future__.iterclose`` > enabled, and use this to decide whether to call ``__iterclose__`` or > ``__iterclose_warning__``. > - For all the wrapper iterators, we also add ``__iterclose_warning__`` > methods that forward to the ``__iterclose_warning__`` method of the > underlying iterator or iterators. 
> - For generators (and files, if we decide to do that), > ``__iterclose_warning__`` is defined to set an internal flag, and > other methods on the object are modified to check for this flag. If > they find the flag set, they issue a ``PendingDeprecationWarning`` to > inform the user that in the future this sequence would have led to a > use-after-close situation and the user should use ``preserve()``. > > In 3.8: > > - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning`` > > In 3.9: > > - Enable the ``__future__`` unconditionally and remove all the > ``__iterclose_warning__`` stuff. > > I believe that this satisfies the normal requirements for this kind of > transition -- opt-in initially, with warnings targeted precisely to > the cases that will be effected, and a long deprecation cycle. > > Probably the most controversial / risky part of this is the use of > stack introspection to make the iterable-consuming functions sensitive > to a ``__future__`` setting, though I haven't thought of any situation > where it would actually go wrong yet... > > > Acknowledgements > ================ > > Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for > helpful discussion on earlier versions of this idea. > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From yselivanov.ml at gmail.com Wed Oct 19 18:57:52 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 19 Oct 2016 18:57:52 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: <48ffe815-2b91-7701-6b6d-a19cea87557f@gmail.com> On 2016-10-19 6:07 PM, Paul Moore wrote: > I missed that you propose phasing this in, but it doesn't really alter > much, I think the current behaviour is valuable and common, and I'm -1 > on breaking it. It's just too much of a fundamental change to how > loops and iterators interact for me to be comfortable with it - > particularly as it's only needed for a very specific use case (none of > my programs ever use async - why should I have to rewrite my loops > with a clumsy extra call just to cater for a problem that only occurs > in async code?) If I understand Nathaniel's proposal, fixing 'async for' isn't the only motivation. Moreover, async generators aren't that different from sync generators in terms of finalization. Yury From chris.barker at noaa.gov Wed Oct 19 19:48:04 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 19 Oct 2016 16:48:04 -0700 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: a few thoughts: On Wed, Oct 19, 2016 at 12:08 PM, Todd wrote: > I have been thinking about how to go about having a multidimensional array > constructor in python. I know that Python doesn't have a built-in > multidimensional array class and won't for the foreseeable future. > no but it does have buffers and memoryviews and the extended buffer protocol supports "strided" data -- i.e. multi-dimensional arrays. So it would be nice to have SOME simple ndarray object in the standard library that would wrap such buffers -- it would be nice for working with image data, interacting with numpy arrays, etc. 
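(For what it's worth, the strided part already exists in the stdlib: a plain memoryview can be reshaped without copying. A small illustration, using only array and memoryview from the standard library:)

    from array import array

    buf = array('b', range(12))                   # 12 signed bytes, 0..11
    m = memoryview(buf).cast('b', shape=[3, 4])   # a 3x4 strided view, no copy

    print(m.shape)      # (3, 4)
    print(m[2, 3])      # 11 -- multi-dimensional indexing works
    print(m.tolist())   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

What's missing is everything layered on top of that view, which is exactly the point of the next paragraph.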
The "trick" is that once you have the container, you want some functionality -- so you add indexing and slicing -- natch. Then maybe some simple math? then.... eventually, you are trying to put all of numpy into the stdlib, and we already know we don't want to do that. Though I still think a simple container that only supports indexing and slicing would be lovely. That all being said: a = [| 0, 1, 2 || 3, 4, 5 |] > I really don't see the advantage of that over: a = [[0, 1, 2],[3, 4, 5]] really I don't -- and I'm a heavy numpy user, so I write a lot of those! If there is a problem with the current options (and I'm not convinced there is) it's that it in'st a literal for multidimensional array, but rather a literal for a bunch of nested lists -- the list themselves are created, and so are all the "boxed" values in the array -- only to be pulled out and unboxed to be put in the array. However, this is only for literals -- if your data are large, then they are not going to be in literals, but rather read form a file or something, so this is really not much of a limitation. However, if you really don't like it, then you can pass a string to aconfsturctor function instead: a = arr_from_string(" | 0, 1, 2 || 3, 4, 5 | ") yeah, you need to type the extra quotes, but that's not much. NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment. b = [| 0, 1, 2 | > | 3, 4, 5 |] > b = [[ 0, 1, 2 ], [ 3, 4, 5 ]] You can also create a 2D row array by combining the two: > > a = [|| 0, 1, 2 ||] > a = [[ 0, 1, 2 ]] or is it: [[[ 0, 1, 2 ]]] (I can't tell, so maybe your syntax is not so clear??? > For higher dimensions, you can just put more lines together: > > a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||] > > b = [||| 0, 1, 2 > || 3, 4, 5 > ||| 6, 7, 8 > || 9, 10, 11 > |||] > I have no idea what that means! > c = [||| 0, 1, 2 | > | 3, 4, 5 | > | > | 6, 7, 8 | > | 9, 10, 11 |||] > > > A 3D row vector would just be: > > a = [||| 0, 1, 2 |||] > > A 3d column vector would be: > > a = [||| 0 || 1 || 2 |||] > > b = [||| 0 > || 1 > || 2 > |||] > > A 3D depth vector would be: > > a = [||| 0 ||| 1 ||| 2 |||] > > b = [||| 0 > ||| 1 > ||| 2 > |||] > nor these.... > At least in my opinion, this sort of approach really shines when making > higher-dimensional arrays. These would all be equivalent (the | at the > beginning and end are just to make it easier to align indentation, they > aren't required): > > a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 > || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 > ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 > || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 > |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 > || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 > ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 > || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 > ||||] > it does seem that you are saving some typing when you have high-dim arrays, but I really dont see the readability here. > > I think both of the new examples are considerably clearer than the current > approach. > not to me :-( but anyway, the way to more this kind of thing forward is to use it as a new format in an existing lib (like numpy, by passing it as a big string. IF folks like it and start using it, then there is room for a conversation. But I doubt (and I wouldn't support) that anyone would put a literal into python for an object that doesn't exist in python... 
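(To make the "pass it as a big string" route concrete, here is a rough sketch of such a parser. The name parse_bars and all of its details are made up for illustration; it only handles simple, well-formed input, and the nested lists it returns could then be handed to np.array or anything else:)

    import re

    def parse_bars(text):
        # "| 0, 1, 2 || 3, 4, 5 |"  ->  [[0, 1, 2], [3, 4, 5]]
        inner = text.strip().strip('[]')
        runs = re.findall(r'\|+', inner)
        depth = max((len(r) for r in runs), default=1)
        inner = inner.strip('| \t\n')

        def split(chunk, nbars):
            if nbars < 2:
                return [int(x) for x in chunk.split(',') if x.strip()]
            parts = re.split(r'(?<!\|)\|{%d}(?!\|)' % nbars, chunk)
            return [split(p, nbars - 1) for p in parts if p.strip(' |\t\n')]

        return split(inner, depth)

    print(parse_bars("| 0, 1, 2 || 3, 4, 5 |"))
    # [[0, 1, 2], [3, 4, 5]] -- deeper nestings follow the same bar-counting rule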
-CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Wed Oct 19 20:13:24 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 19 Oct 2016 20:13:24 -0400 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker wrote: > > > However, if you really don't like it, then you can pass a string to aconfsturctor function instead: > > a = arr_from_string(" | 0, 1, 2 || 3, 4, 5 | ") > > yeah, you need to type the extra quotes, but that's not much. > > NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment. You are probably thinking of the numpy.matrix constructor: >>> a = np.matrix('1 2; 3 4') >>> print(a) [[1 2] [3 4]] See . -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Oct 19 20:32:54 2016 From: toddrjen at gmail.com (Todd) Date: Wed, 19 Oct 2016 20:32:54 -0400 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker wrote: > a few thoughts: > > On Wed, Oct 19, 2016 at 12:08 PM, Todd wrote: > >> I have been thinking about how to go about having a multidimensional >> array constructor in python. I know that Python doesn't have a built-in >> multidimensional array class and won't for the foreseeable future. >> > > no but it does have buffers and memoryviews and the extended buffer > protocol supports "strided" data -- i.e. multi-dimensional arrays. So it > would be nice to have SOME simple ndarray object in the standard library > that would wrap such buffers -- it would be nice for working with image > data, interacting with numpy arrays, etc. > > The "trick" is that once you have the container, you want some > functionality -- so you add indexing and slicing -- natch. Then maybe some > simple math? then.... eventually, you are trying to put all of numpy into > the stdlib, and we already know we don't want to do that. > > Though I still think a simple container that only supports indexing and > slicing would be lovely. > > That all being said: > > a = [| 0, 1, 2 || 3, 4, 5 |] >> > > I really don't see the advantage of that over: > > a = [[0, 1, 2],[3, 4, 5]] > > really I don't -- and I'm a heavy numpy user, so I write a lot of those! > > If there is a problem with the current options (and I'm not convinced > there is) it's that it in'st a literal for multidimensional array, but > rather a literal for a bunch of nested lists -- the list themselves are > created, and so are all the "boxed" values in the array -- only to be > pulled out and unboxed to be put in the array. > > But as you said, that is not a multidimensional array. We aren't comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = [[0, 1, 2],[3, 4, 5]]", we are comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = np.array([[0, 1, 2],[3, 4, 5]])". That is a bigger difference. > However, this is only for literals -- if your data are large, then they > are not going to be in literals, but rather read form a file or something, > so this is really not much of a limitation. 
> Even if your original data is large, I often need smaller areas when processing, for example for broadcasting or as arguments to processing functions. > > However, if you really don't like it, then you can pass a string to > aconfsturctor function instead: > > a = arr_from_string(" | 0, 1, 2 || 3, 4, 5 | ") > > yeah, you need to type the extra quotes, but that's not much. > Then you need an even longer function call. Again, that defeats the purpose of having a literal, which is to make the syntax more concise. > > NOTE: I'm pretty sure numpy has something like this already, for folks > that like the MATLAB style -- though I can't find it at the moment. > It is: r_[[0, 1, 2], [3, 4, 5] But this uses indexing behind the scenes, meaning your data is created as an index then needs to be converted to a list later. This adds considerable overhead. I just tested it and it was somewhere around 20 times slower than "np.array()" in the test. > > b = [| 0, 1, 2 | >> | 3, 4, 5 |] >> > > b = [[ 0, 1, 2 ], > [ 3, 4, 5 ]] > > > No? this is the equivalent of: b = np.array([[ 0, 1, 2 ], [ 3, 4, 5 ]]) The whole point of this is to avoid the "np.array" call. > You can also create a 2D row array by combining the two: >> >> a = [|| 0, 1, 2 ||] >> > > a = [[ 0, 1, 2 ]] or is it: [[[ 0, 1, 2 ]]] > > (I can't tell, so maybe your syntax is not so clear??? > I am not clear where the ambiguity lies? Count the number of "|" symbols. > > >> For higher dimensions, you can just put more lines together: >> >> a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||] >> >> b = [||| 0, 1, 2 >> || 3, 4, 5 >> ||| 6, 7, 8 >> || 9, 10, 11 >> |||] >> > > I have no idea what that means! > ||| is the delimiter for the third dimension, || is the delimiter for the second dimension. It is like how newline is used as a delimeter for the second dimension in CSV files. So it is equivalent to: b = np.array([[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]) > > >> At least in my opinion, this sort of approach really shines when making >> higher-dimensional arrays. These would all be equivalent (the | at the >> beginning and end are just to make it easier to align indentation, they >> aren't required): >> >> a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 >> || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 >> ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 >> || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 >> |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 >> || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 >> ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 >> || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 >> ||||] >> > > It does seem that you are saving some typing when you have high-dim > arrays, but I really dont see the readability here. > If you are used to counting braces, perhaps. But imagine someone who is just starting out. How do you describe how to determine what dimension is being split? "It is one more than total number of sequential left braces and left parentheses" vs ?it is the number of vertical lines". Add to that having to deal with both left and right braces rather than a single delimiter adds a lot of visual noise. There is a reason we use commas rather than, say ">,<" as a delimiter in lists, it is easier to deal with a single kind of symbol rather than three (or potentially five in the current case). > > > but anyway, the way to more this kind of thing forward is to use it as a > new format in an existing lib (like numpy, by passing it as a big string. 
> IF folks like it and start using it, then there is room for a conversation. > The big problem with that is that having to wrap it as a string and pass it to a function in the numpy namespace loses much of the advantage from having a literal to begin with. > > But I doubt (and I wouldn't support) that anyone would put a literal into > python for an object that doesn't exist in python... > > Yes, I understand that. But some projects are already doing that on their own. I think having a way for them to do it without losing the list constructor (which is the approach currently being taken) would be a benefit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Wed Oct 19 21:04:32 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Thu, 20 Oct 2016 01:04:32 +0000 Subject: [Python-ideas] Py_SIZE of PyLongs Message-ID: A quick note: I'm working on a special-case compare function for bounded integers for the sort stuff. By looking at the implementation, I figured out that Py_SIZE of a long is the sign times the number of digits (...right?). Before looking at the implementation, though, I had looked for this info in the docs, and I couldn't find it anywhere. Since Py_SIZE is public, I think the documentation should make clear what it returns for PyLongs, for example somewhere on the "Integer Objects" page. Apologies if this is specified somewhere else in the docs and I just couldn't find it. Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomuxiong at gmx.com Wed Oct 19 21:24:30 2016 From: tomuxiong at gmx.com (Thomas Nyberg) Date: Wed, 19 Oct 2016 21:24:30 -0400 Subject: [Python-ideas] Py_SIZE of PyLongs In-Reply-To: References: Message-ID: On 10/19/2016 09:04 PM, Elliot Gorokhovsky wrote: > A quick note: > > I'm working on a special-case compare function for bounded integers for > the sort stuff. By looking at the implementation, I figured out that > Py_SIZE of a long is the sign times the number of digits (...right?). > Before looking at the implementation, though, I had looked for this info > in the docs, and I couldn't find it anywhere. Since Py_SIZE is public, I > think the documentation should make clear what it returns for PyLongs, > for example somewhere on the "Integer Objects" page. Apologies if this > is specified somewhere else in the docs and I just couldn't find it. > > Elliot I don't think this is right. https://github.com/python/cpython/blob/master/Include/object.h#L119 https://docs.python.org/3/c-api/structures.html#c.Py_SIZE https://docs.python.org/3/c-api/structures.html#c.PyVarObject It returns the `ob_size` fields of a PyVarObject. I think this has to do with objects with variable sizes like lists. PyLongs are not PyVarObjects because they have no notion of length. Why would a long be stored as a sequence of digits instead of a (say) 64 bit integer as 8 bytes? Cheers, Thomas From tjreedy at udel.edu Wed Oct 19 22:07:18 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 19 Oct 2016 22:07:18 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On 10/19/2016 12:38 AM, Nathaniel Smith wrote: > I'd like to propose that Python's iterator protocol be enhanced to add > a first-class notion of completion / cleanup. With respect the the standard iterator protocol, a very solid -1 from me. (I leave commenting specifically on __aiterclose__ to Yury.) 1. 
I consider the introduction of iterables and the new iterator protocol in 2.2 and their gradual replacement of lists in many situations to be the greatest enhancement to Python since 1.3 (my first version). They are, to me, they one of Python's greatest features and the minimal nature of the protocol an essential part of what makes them great. 2. I think you greatly underestimate the negative impact, just as we did with changing str is bytes to str is unicode. The change itself, embodied in for loops, will break most non-trivial programs. You yourself note that there will have to be pervasive changes in the stdlib just to begin fixing the breakage. 3. Though perhaps common for what you do, the need for the change is extremely rare in the overall Python world. Iterators depending on an external resource are rare (< 1%, I would think). Incomplete iteration is also rare (also < 1%, I think). And resources do not always need to releases immediately. 4. Previous proposals to officially augment the iterator protocol, even with optional methods, have been rejected, and I think this one should be too. a. Add .__len__ as an option. We added __length_hint__, which an iterator may implement, but which is not part of the iterator protocol. It is also ignored by bool(). b., c. Add __bool__ and/or peek(). I posted a LookAhead wrapper class that implements both for most any iterable. I suspect that the is rarely used. > def read_newline_separated_json(path): > with open(path) as file_handle: # <-- with block > for line in file_handle: > yield json.loads(line) One problem with passing paths around is that it makes the receiving function hard to test. I think functions should at least optionally take an iterable of lines, and make the open part optional. But then closing should also be conditional. If the combination of 'with', 'for', and 'yield' do not work together, then do something else, rather than changing the meaning of 'for'. Moving responsibility for closing the file from 'with' to 'for', makes 'with' pretty useless, while overloading 'for' with something that is rarely needed. This does not strike me as the right solution to the problem. > for document in read_newline_separated_json(path): # <-- outer for loop > ... If the outer loop determines when the file should be closed, then why not open it there? What fails with try: lines = open(path) gen = read_newline_separated_json(lines) for doc in gen: do_something(doc) finally: lines.close # and/or gen.throw(...) to stop the generator. -- Terry Jan Reedy From elliot.gorokhovsky at gmail.com Wed Oct 19 22:46:18 2016 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Thu, 20 Oct 2016 02:46:18 +0000 Subject: [Python-ideas] Py_SIZE of PyLongs In-Reply-To: References: Message-ID: It's in the code. See longobject.c: Py_SIZE(v) = ndigits*sign; You can also see Py_SIZE(v) used on PyLongs all over the place in longobject.c, for example: v = (PyLongObject *)vv; i = Py_SIZE(v); Just do a ctrl-f for Py_SIZE(v) in longobject.c. Like I said, by looking in the implementation I was able to figure out that Py_SIZE is interpreted as the sign times the number of digits (unless I'm missing something), but this should be in the docs IMO. On Wed, Oct 19, 2016 at 7:24 PM Thomas Nyberg wrote: > On 10/19/2016 09:04 PM, Elliot Gorokhovsky wrote: > > A quick note: > > > > I'm working on a special-case compare function for bounded integers for > > the sort stuff. 
By looking at the implementation, I figured out that > > Py_SIZE of a long is the sign times the number of digits (...right?). > > Before looking at the implementation, though, I had looked for this info > > in the docs, and I couldn't find it anywhere. Since Py_SIZE is public, I > > think the documentation should make clear what it returns for PyLongs, > > for example somewhere on the "Integer Objects" page. Apologies if this > > is specified somewhere else in the docs and I just couldn't find it. > > > > Elliot > > I don't think this is right. > > > https://github.com/python/cpython/blob/master/Include/object.h#L119 > https://docs.python.org/3/c-api/structures.html#c.Py_SIZE > https://docs.python.org/3/c-api/structures.html#c.PyVarObject > > It returns the `ob_size` fields of a PyVarObject. I think this has to do > with objects with variable sizes like lists. PyLongs are not > PyVarObjects because they have no notion of length. > > Why would a long be stored as a sequence of digits instead of a (say) 64 > bit integer as 8 bytes? > > Cheers, > Thomas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Oct 19 23:00:04 2016 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 19 Oct 2016 22:00:04 -0500 Subject: [Python-ideas] Py_SIZE of PyLongs In-Reply-To: References: Message-ID: [Elliot Gorokhovsky ] > I'm working on a special-case compare function for bounded integers for the > sort stuff. By looking at the implementation, I figured out that Py_SIZE of > a long is the sign times the number of digits (...right?). > ... Please ignore the other reply you got - they clearly aren't familiar with the code. The details are explained in Include/longintrepr.h. In short, an integer _is_ based on PyVarObject. Py_SIZE is a macro that merely extracts (or allows to set) the ob_size member, the sign of the int is stored as the sign of ob_size (which is really an abuse of ob_size's intended meaning), and the number of "digits" is the absolute value of ob_size. And ob_size is 0 if and only if the int is 0. Note that the number of bits per digit varies across platforms. That too is all explained in longintrepr.h. From steve at pearwood.info Wed Oct 19 23:31:47 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 20 Oct 2016 14:31:47 +1100 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: <20161020033145.GG22471@ando.pearwood.info> On Wed, Oct 19, 2016 at 03:08:21PM -0400, Todd wrote: [taking your later comment out of the order it was written] > If this sort of thing doesn't interest you I won't be offended if you stop > reading now, and I apologize if it is considered off-topic for this ML. No problem Todd, we shouldn't be offended by ideas, and this is definitely on-topic. > I have been thinking about how to go about having a multidimensional array > constructor in python. I know that Python doesn't have a built-in > multidimensional array class and won't for the foreseeable future. Generally speaking, Python doesn't invent syntax just on the off-chance that it will come in handy, nor does it typically invent operators for third-party libraries to use if they have no use in the built-ins. 
I'm only aware of two exceptions to this, and both were added for numpy: extended slicing seq[start:end:step] and matrix multiplication A @ B. Extended slicing now is used by the built-ins, but originally it was added specifically for numpy. However, in both cases, the suggestion came from the numpy developers themselves, and they had a specific, concrete need for the feature. Both features were solutions to real problems found by numpy users. I wasn't around when extended slicing was added, but matrix multiplication is an excellent example of a well-researched, well-written PEP: http://python.org/dev/peps/pep-0465/ Whereas your suggestion seems more like a solution in search of a problem. You've come up with syntax for building arrays, but you don't seem to know which, if any, array will use this; nor do you seem to have identified an actual problem with the existing solution used by numpy (apart from calling them "somewhat verbose"). > The problem is finding an operator that isn't already being used, wouldn't > conflict with existing rules, wouldn't break existing code, but that would > still be at clearer and and more concise than the current syntax. Just a brief note on terminology: you're not describing an operator, you're describing a "display" syntax: delimiters used to build a type such as tuple, list or dict. I still think of them as "list literals" etc, [1, 2, 3, 4] for example, even though technically they are not necessary literals (i.e. known at compile-time) and officially they are called "list displays" etc. > The notation I came up with uses "[|" and "|]". I picked this for 4 > reasons. First, it isn't currently valid python syntax. Second, it is > clearly connected with the list constructor "[ ]". Third, it is > reminiscent of the "? ?" symbols used for matrices in mathematics. Sometimes used for matrices. Its more common to use a multiple-line version of [ ] which is, of course, hard to type in a regular editor :-) See examples of matricies here: http://mathworld.wolfram.com/Matrix.html Moving on to the multi-dimensional examples you give: > For a 2D array, you would use two vertical bars as a dimension separator > "||" (multiple vertical bars are also not valid python syntax): > > a = [| 0, 1, 2 || 3, 4, 5 |] > > Or, on multiple lines (whitespace is ignored): > > a = [| 0, 1, 2 || > 3, 4, 5 |] To me, that looks decidedly strange. The | symbol has the disadvantage that you cannot tell which is opening a row and which is closing a row. The above looks like: - first row: opened with a single bar, closed with two bars; - second row: no opening delimiter at all, closed with a single bar. I think that you have to compete with existing syntax for nested lists. The lowest common denominator for any array is to use nested lists and a function call. Nested lists can be easily converted into *any* array type you like, rather than picking out one, and only one, array type for special treatment. If Python had a built-in array type, then maybe this would be justified, but it doesn't, and isn't likely to get one: lists fill the role that arrays do in most other languages. There is an array type in the standard library, array.array, but its not built-in and not important enough to be built-in or to get special syntax of its own. And I'm not sure that numpy users miss the ability to write multi-dimensional arrays using syntax instead of a function call. 
Normally they would want the ability to specify a type and an order (rows first, like C, or columns first, like Fortran), and I think that for multi-dimensional arrays it is more usual and simpler to write out the values in a linear array and tell the array constructor to re-arrange them. Trying to write out a visual representation of anything with more than two dimensions is cumbersome when you are limited to the flat plan of a text file. Consider: [[[1, 2], [3, 4]], [[5, 6], [7, 8]]] If your editor can highlight matching brackets, its quite easy to see where each row and plane begins and ends. Whereas your suggested syntax looks to me like a whole bunch of confusing lines. I cannot even work out what are the dimensions of this example: > b = [||| 0, 1, 2 > || 3, 4, 5 > ||| 6, 7, 8 > || 9, 10, 11 > |||] although if I sit and stare at it for a while I might guess... 4*3? If I already know it is meant to be 3D, then I might be able to work out that the extra bar means something, and guess 2*3*2, but I really wouldn't want to bet my sanity on understanding what those lines mean. (Especially since, later on, the exact number and placement of lines is optional.) What's the rule for when to use triple bars ||| and when to use double bars || or a single bar | ? It's a mystery to me. At least with matching left and right delimiters [ ] I can match them up to see where they begin and end. > The rule for the number of dimensions is just the highest-specified > dimension. So these are equivalent: > > a = [| 0, 1, 2 || > 3, 4, 5 |] > > b = [|| 0, 1, 2 || > 3, 4, 5 ||] Okay, now I'm completely lost. Doesn't the first example with a single vertical bar | mean that it is a 1D array? What's the "highest-specified dimension"? Are you suggesting that we have to count vertical bars to work out the dimension? > This also means you would only strictly need to set the dimensions at one > end. That means these are equivalent, although the second and third case > should be discouraged: > > a = [|| 0, 1, 2 ||] > > b = [| 0, 1, 2 ||] > > c = [|| 0, 1, 2 |] This strikes me as a HUGE bug magnet. More like a bug black hole actually, sucking in bugs from all through the universe and inserting them into your arrays... *wink* Effectively, what you are saying is that *as an intentional feature*, a stray | accidentally inserted into your array will not cause a syntax error, but will instead increase the number of dimensions of the array. So instead of having a 17*10*30 array as you expected, you have a 1*17*10*30 or 17*10*30*1 array, which may or may not fail deep in your code with some more or less unexpected and hard to diagnose error. This (anti-)feature also makes syntax highlighting of matching bars impossible, instead of merely fiendishly difficult. Since it isn't an error for the bars not to match, you can't even count the bars to work out which ones are supposed to match. You have to somehow intuit or guess what the dimensions of the array are supposed to be, then reason backwards to see whether the right number of bars are in the right places to be compatible with those dimensions, and if not, your guess of the dimensions might be wrong... or not. > At least in my opinion, this sort of approach really shines when making > higher-dimensional arrays. You should compare your approach to that of mathematicians and other programming languages. Mathematicians don't really use multi-dimensional arrays. 
They have vectors, which are 1D, and matrices which are 2D, then they have tensors which confuse me, but they don't seem to use anything which corresponds to a simple higher-dimension analog of matrices. Tensors come close, but they don't seem to have anything like matrix-notation for tensors. (Given that tensors are often infinite dimensional, I'm hardly surprised.) Matlab has syntax for 2D arrays, which can be expanded into 3D: A = [1 2; 3 4]; A(:,:,2) = [5 6; 7 8] R has an array function: > array(1:8, c(2,2,2)) , , 1 [,1] [,2] [1,] 1 3 [2,] 2 4 , , 2 [,1] [,2] [1,] 5 7 [2,] 6 8 Differences in ordering (row first or column first) aside, they are equivalent to Python's: [[[1, 2], [3, 4]], [[5, 6], [7, 8]], ] My HP-48 calculator uses square brackets for matrixes, with the convenience that in the calculator interface I only need to close the first pair of brackets: 2D: I can enter the keystrokes: [[1 2] 3 4 to get the 2D matrix: [[ 1 2 ] [ 3 4 ]] but it has no support for 3D arrays. Here's how C# does it: https://msdn.microsoft.com/en-us/library/2yd9wwz4.aspx > a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 > || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 > ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 > || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 > |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 > || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 > ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 > || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 > ||||] I wouldn't even want to guess what dimensions that is supposed to be. 10 columns, because I can count them, but everything else is a mystery. > Compared to the current approach: > > a = np.ndarray([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134], > [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]], > [[-103, 50, -89, -12, 28, -12, 119, -131, -73, 21], > [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]], > [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145], > [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]], > [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60], > [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]]) But that's easy! Look at the nested brackets. The opening sequence tells you that there are four dimensions: [[[[ I can count the ten columns (and if I align them, I can visually verify that each row has exactly ten columns). Looking at the nested lists, I see: [[[[ten columns], [ten columns]], so that's two rows by ten, then continuing: [2 x 10]], which closes another layer, so that's 2 items in the third dimension, then when have another dimension: [2 x 10 x 2]] and the array is closed, giving us in total: 2 x 10 x 2 x 2 In my opinion anyone trying to write out a single 4D array like this is opening themselves up to a hiding for nothing, even with clear nesting and matching open/close delimiters. Since we don't have 4D text files, it's better to write: L = [48, 11, 141, 13, -60, -37, 58, -52, -29, 134, -6, 96, -66, 137, -59, -147, -118, -104, -123, -7, -103, 50, -89, -12, 28, -12, 119, -131, -73, 21, -58, 105, 25, -138, -106, -118, -29, -49, -63, -56, -43, -34, 101, -115, 41, 121, 3, -117, 101, -145, 100, -128, 76, 128, -113, -90, 52, -91, -72, -15, 22, -65, -118, 134, -58, 55, -73, -118, -53, -60, -85, -136, 83, -66, -35, -117, -71, 115, -56, 133] assert len(L) == 2*10*2*2 arr = array(L, dim=(2,10,2,2)) or something similar, and let the array constructor resize as needed. 
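For concreteness, here is a minimal sketch of that "something similar"
using numpy as it exists today (np.array plus reshape; the dim= keyword
above is only a placeholder, and the dtype/order arguments below are
just the knobs you would typically want to control):

    import numpy as np

    # write the values out linearly, then tell the constructor how to
    # arrange them
    L = list(range(2*10*2*2))
    arr = np.array(L, dtype=float).reshape(2, 10, 2, 2)  # order='C' or 'F'
    print(arr.shape)   # (2, 10, 2, 2)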
-- Steve From greg.ewing at canterbury.ac.nz Thu Oct 20 03:00:12 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 20 Oct 2016 20:00:12 +1300 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> Message-ID: <58086B7C.7010806@canterbury.ac.nz> Matt Gilson wrote: > I think that > it was mentioned that it might be possible for a user to _register_ a > callable that would then be used when this syntax was envoked -- But > having a global setting like that leads to contention. I think for that to fly it would have to be a per-module thing. Then each module using the syntax would be able to choose the meaning of it. A simple way to do this would be for the compiler to translate it into something like __array__([[[ ... ]]]) and then you would just define __array__ appropriately, e.g. from numpy import array as __array__ Personally I'm not very enthusiastic about the whole thing, though. I don't find the new syntax to be much of an improvement, if any. Certainly nowhere near enough to be worth adding syntax. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 20 03:20:59 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 20 Oct 2016 20:20:59 +1300 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: <5808705B.2050102@canterbury.ac.nz> Todd wrote: > ||| is the delimiter for the third dimension, || is the delimiter for > the second dimension. This seems a bit inconsistent. It appears the rule is "n vertical bars is the delimiter for the nth dimension". By that rule, the delimiter for the first dimension should be a single vertical bar, but instead it's a comma. Also, it's not very clear why when you have a 2D array with two rows you write [| 1,2,3 || 4,5,6 |] i.e. with *one* vertical bar at each end, but when there is only one row you write [|| 1,2,3 ||] i.e. with *two* vertical bars at each end. -- Greg From mertz at gnosis.cx Thu Oct 20 10:15:09 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 20 Oct 2016 07:15:09 -0700 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: <58086B7C.7010806@canterbury.ac.nz> References: <50ca4991-54bc-f7c6-d4a6-8fef5361f8c6@gmx.com> <58086B7C.7010806@canterbury.ac.nz> Message-ID: I find the proposed syntax worse than the existing square brackets. The way the NumPy does a repr of an array is a good model of clarity, and it's correct current Python (except for larger arrays where visual ellipses are used). On Oct 20, 2016 12:01 AM, "Greg Ewing" wrote: > Matt Gilson wrote: > >> I think that it was mentioned that it might be possible for a user to >> _register_ a callable that would then be used when this syntax was envoked >> -- But having a global setting like that leads to contention. >> > > I think for that to fly it would have to be a per-module > thing. Then each module using the syntax would be able > to choose the meaning of it. > > A simple way to do this would be for the compiler to > translate it into something like > > __array__([[[ ... ]]]) > > and then you would just define __array__ appropriately, > e.g. > > from numpy import array as __array__ > > Personally I'm not very enthusiastic about the whole > thing, though. I don't find the new syntax to be much of > an improvement, if any. Certainly nowhere near enough > to be worth adding syntax. 
> > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Oct 20 14:01:10 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 20 Oct 2016 20:01:10 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> Message-ID: <917e646c-fc97-969b-e118-3c23b289fb09@mail.de> On 18.10.2016 10:01, Daniel Moisset wrote: > So, for me, this feature is something that could be covered with a > (new) function with no new syntax required. All you have to learn is > that instead of [*...] you use flatten(...) The main motivation is not "hey we need yet another way for xyz" but "could we remove that inconsistency?". You wrote "[*...]"; which already works. It just that it does not work for all kinds of "..." . I hope that makes the motivation clearer. I for one don't need yet another function or function cascade to make things work. "list(itertools.chain.from_iterable(...))" just works fine for all kinds of "...". Cheers, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Oct 20 14:03:25 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 20 Oct 2016 20:03:25 +0200 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> Message-ID: <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> On 19.10.2016 00:08, Rob Cliffe wrote: > >> But it's far too late to change it now, sadly. > Indeed. :-( But if I were ruler of the world and could have my own > wish-list for Python 4, this (as per the first example) would be on it. I don't see no reason why we can't make it. Personally, I also dislike this behavior. Cheers, Sven From srkunze at mail.de Thu Oct 20 14:21:03 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Thu, 20 Oct 2016 20:21:03 +0200 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <5805BFEA.3060909@canterbury.ac.nz> References: <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <938d7a83-945d-479f-09df-4c6feeff2a0a@mail.de> <20161018004946.GE22471@ando.pearwood.info> <5805BFEA.3060909@canterbury.ac.nz> Message-ID: On 18.10.2016 08:23, Greg Ewing wrote: > If it were a namedtuple, for example, you could write > > [*t for t in fulltext_tuples if t.language == 'english'] > > or > > [x for t in fulltext_tuples if t.language == 'english' for x in t] > > The latter is a bit unsatisfying, because we are having to > make up an arbitrary name 'x' to stand for an element of t. > Even though the two elements of t have quite different roles, > we can't use names that reflect those roles. It's an intriguing idea to use namedtuples but in this case one should not over-engineer. What I dislike most are the names of "fulltext_tuple", "x", "t". If I were to use it, I think my coworkers would tar and feather me. ;) This is one of the cases where it makes absolutely no sense to invent artificial names for the sake of naming. I can name a lot of (internal) examples where we tried really hard at inventing named concepts which make absolutely no sense half a year later even to those who invented them. Repeatedly, in the same part of the code. Each newly named concept introduces another indirection. Thus, we always need to find a middle ground between naming and using language features, so I (personally) would be grateful for this particular feature. :) > Because of that, to my eyes the version with * makes it easier > to see what is going on. That's a very nice phrase: "makes it easier to see what is going on". I need to remember that. Cheers, Sven From random832 at fastmail.com Thu Oct 20 15:09:29 2016 From: random832 at fastmail.com (Random832) Date: Thu, 20 Oct 2016 15:09:29 -0400 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <1476726562.888642.758686169.52B9C868@webmail.messagingengine.com> Message-ID: <1476990569.1777101.762331657.5682E8F1@webmail.messagingengine.com> On Tue, Oct 18, 2016, at 02:10, Nick Coghlan wrote: > Hi, I contributed the current list comprehension implementation (when > refactoring it for Python 3 to avoid leaking the iteration variable, > as requested in PEP 3100 [1]), and "comprehensions are syntactic sugar > for a series of nested for and if statements" is precisely my > understanding of how they work, and what they mean. But it's simply not true. It has never been true and it will never be true. Something is not "syntactic sugar" if it doesn't compile to the exact same sequence of operations as the thing it is supposedly syntactic sugar for. 
It's a useful teaching tool (though you've eventually got to teach the differences), but claiming that it's "syntactic sugar" - and "non-negotiably" so, at that - implies that it is a literal transformation. I said "limited" in reference to the specific claim - which was not yours - that since "yield a, b" yields a tuple, "yield *x" [and therefore (*x for...)] ought to also yield a tuple, and I stand by it. It's the same kind of simplistic understanding that should lead one to believe that not only the loop variable but also the "result" temporary ought to exist after the comprehension is executed. I was being entirely serious in saying that this is like objecting to normal unpacking on the grounds that an ordinary list display should be considered syntactic sugar for an unrolled sequence of append calls. In both cases, the equivalence is not exact, and there should be room to at least discuss things that would merely require an additional rule to be added (or changed, technically making it "result += [...]" would cover both cases) to the transformation - a transformation which already results in three different statements depending on whether it is a list comprehension, a set comprehension, or a generator expression (and a fourth if you count dict comprehensions, though that's a different syntax too) - rather than simply declaring them "not negotiable". Declaring this "not negotiable" was an incredibly hostile dismissal of everyone else's position. Especially when what's being proposed wouldn't invalidate the concept, it would just change the exact details of what the transformation is. Which is more than can be said for not leaking the variable. From python at 2sn.net Thu Oct 20 16:39:05 2016 From: python at 2sn.net (Alexander Heger) Date: Fri, 21 Oct 2016 07:39:05 +1100 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> References: <20161013165546.GB22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> Message-ID: For me the current behaviour does not seem unreasonable as it resembles the order in which you write out loops outside a comprehension except that the expression for generated values is provided first. On 21 October 2016 at 05:03, Sven R. Kunze wrote: > On 19.10.2016 00:08, Rob Cliffe wrote: > >> >> But it's far too late to change it now, sadly. >>> >> Indeed. :-( But if I were ruler of the world and could have my own >> wish-list for Python 4, this (as per the first example) would be on it. >> > > I don't see no reason why we can't make it. > > Personally, I also dislike this behavior. > > > Cheers, > Sven > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Thu Oct 20 17:31:51 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 21 Oct 2016 08:31:51 +1100 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476990569.1777101.762331657.5682E8F1@webmail.messagingengine.com> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <1476726562.888642.758686169.52B9C868@webmail.messagingengine.com> <1476990569.1777101.762331657.5682E8F1@webmail.messagingengine.com> Message-ID: On Fri, Oct 21, 2016 at 6:09 AM, Random832 wrote: > On Tue, Oct 18, 2016, at 02:10, Nick Coghlan wrote: >> Hi, I contributed the current list comprehension implementation (when >> refactoring it for Python 3 to avoid leaking the iteration variable, >> as requested in PEP 3100 [1]), and "comprehensions are syntactic sugar >> for a series of nested for and if statements" is precisely my >> understanding of how they work, and what they mean. > > But it's simply not true. It has never been true and it will never be > true. Something is not "syntactic sugar" if it doesn't compile to the > exact same sequence of operations as the thing it is supposedly > syntactic sugar for. It's a useful teaching tool (though you've > eventually got to teach the differences), but claiming that it's > "syntactic sugar" - and "non-negotiably" so, at that - implies that it > is a literal transformation. But it is. There are two caveats to the transformation: firstly, it's done in a nested function (as of Py3), and secondly, the core operations are done with direct opcodes rather than looking up the ".append" method; but other than that, yes, it's exactly the same. Here's the disassembly (in 2.7, to avoid the indirection of the nested function): >>> def f1(x): ... return [ord(ch) for ch in x] ... >>> dis.dis(f1) 2 0 BUILD_LIST 0 3 LOAD_FAST 0 (x) 6 GET_ITER >> 7 FOR_ITER 18 (to 28) 10 STORE_FAST 1 (ch) 13 LOAD_GLOBAL 0 (ord) 16 LOAD_FAST 1 (ch) 19 CALL_FUNCTION 1 22 LIST_APPEND 2 25 JUMP_ABSOLUTE 7 >> 28 RETURN_VALUE >>> def f2(x): ... ret = [] ... for ch in x: ... ret.append(ord(ch)) ... return ret ... >>> dis.dis(f2) 2 0 BUILD_LIST 0 3 STORE_FAST 1 (ret) 3 6 SETUP_LOOP 33 (to 42) 9 LOAD_FAST 0 (x) 12 GET_ITER >> 13 FOR_ITER 25 (to 41) 16 STORE_FAST 2 (ch) 4 19 LOAD_FAST 1 (ret) 22 LOAD_ATTR 0 (append) 25 LOAD_GLOBAL 1 (ord) 28 LOAD_FAST 2 (ch) 31 CALL_FUNCTION 1 34 CALL_FUNCTION 1 37 POP_TOP 38 JUMP_ABSOLUTE 13 >> 41 POP_BLOCK 5 >> 42 LOAD_FAST 1 (ret) 45 RETURN_VALUE >>> Okay, so what exactly is going on here? Looks to me like there's some optimization happening in the list comp, but you can see that the same code is being emitted. It's not *perfectly* identical, but that's mainly because CPython doesn't take advantage of the fact that 'ret' was initialized to a list - it still does the full "look up 'append', then call it" work. I'm not sure why SETUP_LOOP exists in the full version and not the comprehension, but I believe it's to do with the break and continue keywords, which can't happen inside a comprehension. So, again, it's optimizations that are possible in the comprehension, but otherwise, the code is identical. 
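For those who would rather not read bytecode, here is roughly what the
3.x comprehension amounts to -- an approximation of the compiler's
expansion (the helper name is invented), not a literal transcription of
it:

    def f1(x):
        # approximate 3.x expansion of: [ord(ch) for ch in x]
        def _listcomp(iterator):
            result = []                    # BUILD_LIST
            for ch in iterator:            # FOR_ITER / STORE_FAST
                result.append(ord(ch))     # LIST_APPEND, minus the attribute lookup
            return result
        return _listcomp(iter(x))          # the comprehension runs in its own scope

    print(f1("abc"))   # [97, 98, 99]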
Maybe "syntactic sugar" is pushing it a bit, but there's no fundamental difference between the two. Imagine if an optimizing compiler could (a) notice that there's no use of break/continue, and (b) do some static type analysis to see that 'ret' is always a list (and not a subclass thereof), and optimize the multi-line version. At that point, the two forms would look almost, or maybe completely, identical. So I'd support the "syntactic sugar" label here. Why is this discussion still on python-ideas? Shouldn't it be on python-demanding-explanations-for-status-quo by now? ChrisA From chris.barker at noaa.gov Thu Oct 20 18:08:54 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 20 Oct 2016 15:08:54 -0700 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: On Wed, Oct 19, 2016 at 5:32 PM, Todd wrote: > If there is a problem with the current options (and I'm not convinced >> there is) it's that it in'st a literal for multidimensional array, but >> rather a literal for a bunch of nested lists -- the list themselves are >> created, and so are all the "boxed" values in the array -- only to be >> pulled out and unboxed to be put in the array. >> >> > But as you said, that is not a multidimensional array. We aren't > comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = [[0, 1, 2],[3, 4, 5]]", > we are comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = np.array([[0, 1, > 2],[3, 4, 5]])". That is a bigger difference. > Well then, you have mixed two proposals here: 1) a literal syntax for nd arrays -- that is not going to fly if there is NO ndarray object builtin to python. I kinda think there should be, though even then there need not be a literal for it (see Decimal). So I'd say -- get an nd array object into the standard library first, then we can talk about the literal 2) what the syntax should be for such a literal. OK, in this case, suggested that the way to hash that out is to start out with passing a string to a function that constructs the array -- then you could try things out without any additions to the language or the libraries -- it could be a stand-alone module that extends numpy: from ndarray_literal import nda my array = nda('||| 3, 4, 5 || 6, 7, 8 |||') (is that legal in your notation -- I honestly am not sure) and yes, that still requires typing "nda('", which you are trying to avoid. But honestly, I really have written a lot of numpy code, and writing: np.array( ..... ) does not bother me at all. IF I did support a literal, it would be so that the object could be constructed immediately rather than by creating other python objects first (liss, usually), and then making an array from that. If you do want to push the syntax idea further, I'd suggest going to the numpy list and seeing what folks there think. But as I can't help myself. It's clear from the other posts on the list here that others find your proposed syntax as confusing as I do. but maybe it could be made more clear. Taking a page from MATLAB: 1 2 3; 4 5 6 is a 2x3 2-d array. no in MATLAB, there only used to be matrixes, so this was pretty nice, but a bit hard to extend to multiple dimensions. But the principle is handy: one delimter for the first dimension,l a nother one for the second, etc.. we probably dont want to go with trying colons, and ! and who knows what else, so I liek your idea. 
a 2-d array: 1 | 2 | 3 || 4 | 5 | 6 (Or better) 1 | 2 | 3 || 4 | 5 | 6 a 3d array: 0 | 1 | 2 | 3 || 4 | 5 | 6 | 7 || 8 | 9 | 10 | 11 ||| 12 | 13 | 14 | 15|| 16 | 17 | 18 | 19|| 20 | 21 | 22 | 23|| Notes: 1) guess how I wrote that? I did: np.arange(24).reshape((2,3,4)) and edited the results -- making the point that maybe the current state of affairs is not so bad... 2) These are delimiters, rather than brackets -- so don't go at the beginning and optional at the end (like commas in python) 3) It doesn't use commas as all, as having a consistent system is clearer 4) Whitespace is insignificant as with the rest of Python -- though you want to be able to use a line ending as insignificant whitespace, so this may have to wrapped in a parentheses, or something to actually use it -- though a non-issue if it's a string Hmm -- about point (3), maybe use only commas: 0, 1, 2, 3,, 4, 5, 6, 7,, 8, 9, 10, 11,,, 12, 13, 14, 15,, 16, 17, 18, 19,, 20, 21, 22, 23 That would be more consistent with the rest of python, and multiple commas in a row are currently a syntax error. Even if your original data is large, I often need smaller areas when > processing, for example for broadcasting or as arguments to processing > functions. > sure I do hard-coded arrays all teh time -- but not big ones, and I don't think I've ever needed more than 2D and certainly not more than 3D. and not large enough that performance matters. It is: >> > > r_[[0, 1, 2], [3, 4, 5] > no, that's a shorthand for "row stack" -- and really not much better than the array() call, except a few less characters I meant the np.matrix() function that Alexander pointed out -- which is only really there to make folks coming from MATLAB happier...(and it makes a Matrix object, which you likely don't want). The point was that it's easy to make such a beast for your new syntax to try it out b = np.array([[ 0, 1, 2 ], > [ 3, 4, 5 ]]) > > The whole point of this is to avoid the "np.array" call. > again, trying to separate out the idea of a literal, from the syntax of the literal. but thinking now, Python already uses (), [], {}, and < and > -- so I don't think there are any more brackets. but why not just use commas with square brackets: 2Darr = [1, 2, 3,, 4, 5, 6] maybe too subtle? Yes, I understand that. But some projects are already doing that on their > own. I think having a way for them to do it without losing the list > constructor (which is the approach currently being taken) would be a > benefit. > huh? is anyone actually overriding the list constructor?? multiple dims apart (my [ and ,, example shows that you can do that with the current syntax) this is kind of like adding Decimal -- there is another type, but does it need a literal? I have maybe 90% of the code I write with an: import numpy as np at the top -- so yes, I kind a would like a literal, but it's really a pretty small deal -- at least once I got used to it after using MATLAB for years. I'd ask folks that have been using numpy for along time -- would this really help? One more problem -- with the addition of the @ operator, there have not been any use cases in the stdlib, but it is an operator, and Python already has a mechanism for operator overloading. As far as I know, every python literal maps to a SINGLE type -- so creating a literal for a non existent type makes no sense at all. -CHB -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Oct 20 19:43:31 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 21 Oct 2016 12:43:31 +1300 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> Message-ID: <580956A3.6030205@canterbury.ac.nz> Alexander Heger wrote: > For me the current behaviour does not seem unreasonable as it resembles > the order in which you write out loops outside a comprehension That's true, but the main reason for having comprehensions syntax in the first place is so that it can be read declaratively -- as a description of the list you want, rather than a step-by-step sequence of instructions for building it up. If you have to stop and mentally transform it into nested for-statements, that very purpose is undermined. -- Greg From njs at pobox.com Fri Oct 21 02:03:11 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 20 Oct 2016 23:03:11 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: On Wed, Oct 19, 2016 at 3:07 PM, Paul Moore wrote: > On 19 October 2016 at 20:21, Nathaniel Smith wrote: >> On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore wrote: >>> On 19 October 2016 at 19:13, Chris Angelico wrote: >>>> Now it *won't* correctly call the end-of-iteration function, because >>>> there's no 'for' loop. This is going to either (a) require that EVERY >>>> consumer of an iterator follow this new protocol, or (b) introduce a >>>> ton of edge cases. >>> >>> Also, unless I'm misunderstanding the proposal, there's a fairly major >>> compatibility break. At present we have: >>> >>>>>> lst = [1,2,3,4] >>>>>> it = iter(lst) >>>>>> for i in it: >>> ... if i == 2: break >>> >>>>>> for i in it: >>> ... print(i) >>> 3 >>> 4 >>>>>> >>> >>> With the proposed behaviour, if I understand it, "it" would be closed >>> after the first loop, so resuming "it" for the second loop wouldn't >>> work. Am I right in that? I know there's a proposed itertools function >>> to bring back the old behaviour, but it's still a compatibility break. >>> And code like this, that partially consumes an iterator, is not >>> uncommon. >> >> Right -- did you reach the "transition plan" section? (I know it's >> wayyy down there.) The proposal is to hide this behind a __future__ at >> first + a mechanism during the transition period to catch code that >> depends on the old behavior and issue deprecation warnings. But it is >> a compatibility break, yes. > > I missed that you propose phasing this in, but it doesn't really alter > much, I think the current behaviour is valuable and common, and I'm -1 > on breaking it. 
It's just too much of a fundamental change to how > loops and iterators interact for me to be comfortable with it - > particularly as it's only needed for a very specific use case (none of > my programs ever use async - why should I have to rewrite my loops > with a clumsy extra call just to cater for a problem that only occurs > in async code?) > > IMO, and I'm sorry if this is controversial, there's a *lot* of new > language complexity that's been introduced for the async use case, and > it's only the fact that it can be pretty much ignored by people who > don't need or use async features that makes it acceptable (the "you > don't pay for what you don't use" principle). The problem with this > proposal is that it doesn't conform to that principle - it has a > direct, negative impact on users who have no interest in async. Oh, goodness, no -- like Yury said, the use cases here are not specific to async at all. I mean, none of the examples are async even :-). The motivation here is that prompt (non-GC-dependent) cleanup is a good thing for a variety of reasons: determinism, portability across Python implementations, proper exception propagation, etc. async does add yet another entry to this list, but I don't the basic principle is controversial. 'with' blocks are a whole chunk of extra syntax that were added to the language just for this use case. In fact 'with' blocks weren't even needed for the functionality -- we already had 'try/finally', they just weren't ergonomic enough. This use case is so important that it's had multiple rounds of syntax directed at it before async/await was even a glimmer in C#'s eye :-). BUT, currently, 'with' and 'try/finally' have a gap: if you use them inside a generator (async or not, doesn't matter), then they often fail at accomplishing their core purpose. Sure, they'll execute their cleanup code whenever the generator is cleaned up, but there's no ergonomic way to clean up the generator. Oops. I mean, you *could* respond by saying "you should never use 'with' or 'try/finally' inside a generator" and maybe add that as a rule to your style manual and linter -- and some people in this thread have suggested more-or-less that -- but that seems like a step backwards. This proposal instead tries to solve the problem of making 'with'/'try/finally' work and be ergonomic in general, and it should be evaluated on that basis, not on the async/await stuff. The reason I'm emphasizing async generators is that they effect the timeline, not the motivation: - PEP 525 actually does add async-only complexity to the language (the new GC hooks). It doesn't affect non-async users, but it is still complexity. And it's possible that if we have iterclose, then we don't need the new GC hooks (though this is still an open discussion :-)). If this is true, then now is the time to act, while reverting the GC hooks change is still a possibility; otherwise, we risk the situation where we add iterclose later, decide that the GC hooks no longer provide enough additional value to justify their complexity... but we're stuck with them anyway. - For synchronous iteration, the need for a transition period means that the iterclose proposal will take a few years to provide benefits. For asynchronous iteration, it could potentially start providing benefits much sooner -- but there's a very narrow window for that, before people start using async generators and backwards compatibility constraints kick in. If we delay a few months then we'll probably have to delay a few years. 
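To make the 'with'-inside-a-generator gap described above concrete,
here is a tiny synchronous sketch of today's behaviour (the file name
and helpers are made up for illustration):

    def lines(path):
        with open(path) as f:        # cleanup tied to the generator's lifetime
            for line in f:
                yield line

    def first_line(path):
        for line in lines(path):
            return line              # abandons the generator mid-iteration

    with open("demo.txt", "w") as f: # hypothetical demo file
        f.write("one\ntwo\n")
    print(first_line("demo.txt"))
    # The 'with' block inside lines() only runs when the abandoned
    # generator is finalized: promptly under CPython's refcounting, at
    # some arbitrary later point under a different GC -- exactly the
    # portability problem.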
...that said, I guess there is one way that async/await directly affected my motivation here, though it's not what you think :-). async/await have gotten me experimenting with writing network servers, and let me tell you, there is nothing that focuses the mind on correctness and simplicity like trying to write a public-facing asynchronous network server. You might think "oh well if you're trying to do some fancy rocket science and this is a feature for rocket scientists then that's irrelevant to me", but that's actually not what I mean at all. The rocket science part is like, trying to run through all possible execution orders of the different callbacks in your head, or to mentally simulate what happens if a client shows up that writes at 1 byte/second. When I'm trying to do that,then the last thing I want is be distracted by also trying to figure out boring mechanical stuff like whether or not the language is actually going to execute my 'finally' block -- yet right now that's a question that actually cannot be answered without auditing my whole source code! And that boring mechanical stuff is still boring mechanical stuff when writing less terrifying code -- it's just that I'm so used to wasting a trickle of cognitive energy on this kind of thing it that normally I don't notice it so much. And, also, regarding the "clumsy extra call": the preserve() call isn't just arbitrary clumsiness -- it's a signal that hey, you're turning off a safety feature. Now the language won't take care of this cleanup for you, so it's your responsibility. Maybe you should think about how you want to handle that. Of course your decision could be "whatever, this is a one-off script, the GC is good enough". But it's probably worth the ~0.5 seconds of thought to make that an active, conscious decision, because they aren't all one-off scripts. -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Fri Oct 21 02:37:25 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 20 Oct 2016 23:37:25 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On Wed, Oct 19, 2016 at 7:07 PM, Terry Reedy wrote: > On 10/19/2016 12:38 AM, Nathaniel Smith wrote: > >> I'd like to propose that Python's iterator protocol be enhanced to add >> a first-class notion of completion / cleanup. > > > With respect the the standard iterator protocol, a very solid -1 from me. > (I leave commenting specifically on __aiterclose__ to Yury.) > > 1. I consider the introduction of iterables and the new iterator protocol in > 2.2 and their gradual replacement of lists in many situations to be the > greatest enhancement to Python since 1.3 (my first version). They are, to > me, they one of Python's greatest features and the minimal nature of the > protocol an essential part of what makes them great. Minimalism for its own sake isn't really a core Python value, and in any case the minimalism ship has kinda sailed -- we effectively already have send/throw/close as optional parts of the protocol (they're most strongly associated with generators, but you're free to add them to your own iterators and e.g. yield from will happily work with that). This proposal is basically "we formalize and start automatically calling the 'close' methods that are already there". > 2. I think you greatly underestimate the negative impact, just as we did > with changing str is bytes to str is unicode. The change itself, embodied > in for loops, will break most non-trivial programs. 
You yourself note that > there will have to be pervasive changes in the stdlib just to begin fixing > the breakage. The long-ish list of stdlib changes is about enabling the feature everywhere, not about fixing backwards incompatibilities. It's an important question though what programs will break and how badly. To try and get a better handle on it I've been playing a bit with an instrumented version of CPython that logs whenever the same iterator is passed to multiple 'for' loops. I'll write up the results in more detail, but the summary so far is that there seem to be ~8 places in the stdlib that would need preserve() calls added, and ~3 in django. Maybe 2-3 hours and 1 hour of work respectively to fix? It's not a perfect measure, and the cost certainly isn't zero, but it's at a completely different order of magnitude than the str changes. Among other things, this is a transition that allows for gradual opt-in via a __future__, and fine-grained warnings pointing you at what you need to fix, neither of which were possible for str->unicode. > 3. Though perhaps common for what you do, the need for the change is > extremely rare in the overall Python world. Iterators depending on an > external resource are rare (< 1%, I would think). Incomplete iteration is > also rare (also < 1%, I think). And resources do not always need to > releases immediately. This could equally well be an argument that the change is fine -- e.g. if you're always doing complete iteration, or just iterating over lists and stuff, then it literally doesn't affect you at all either way... > 4. Previous proposals to officially augment the iterator protocol, even with > optional methods, have been rejected, and I think this one should be too. > > a. Add .__len__ as an option. We added __length_hint__, which an iterator > may implement, but which is not part of the iterator protocol. It is also > ignored by bool(). > > b., c. Add __bool__ and/or peek(). I posted a LookAhead wrapper class that > implements both for most any iterable. I suspect that the is rarely used. > > >> def read_newline_separated_json(path): >> with open(path) as file_handle: # <-- with block >> for line in file_handle: >> yield json.loads(line) > > > One problem with passing paths around is that it makes the receiving > function hard to test. I think functions should at least optionally take an > iterable of lines, and make the open part optional. But then closing should > also be conditional. Sure, that's all true, but this is the problem with tiny documentation examples :-). The point here was to explain the surprising interaction between generators and with blocks in the simplest way, not to demonstrate the ideal solution to the problem of reading newline-separated JSON. Everything you want is still doable in a post-__iterclose__ world -- in particular, if you do for doc in read_newline_separated_json(lines_generator()): ... then both iterators will be closed when the for loop exits. But if you want to re-use the lines_generator, just write: it = lines_generator() for doc in read_newline_separated_json(preserve(it)): ... for more_lines in it: ... > If the combination of 'with', 'for', and 'yield' do not work together, then > do something else, rather than changing the meaning of 'for'. Moving > responsibility for closing the file from 'with' to 'for', makes 'with' > pretty useless, while overloading 'for' with something that is rarely > needed. This does not strike me as the right solution to the problem. 
> >> for document in read_newline_separated_json(path): # <-- outer for loop >> ... > > > If the outer loop determines when the file should be closed, then why not > open it there? What fails with > > try: > lines = open(path) > gen = read_newline_separated_json(lines) > for doc in gen: do_something(doc) > finally: > lines.close > # and/or gen.throw(...) to stop the generator. Sure, that works in this trivial case, but they aren't all trivial :-). See the example from my first email about a WSGI-like interface where response handlers are generators: in that use case, your suggestion that we avoid all resource management inside generators would translate to: "webapps can't open files". (Or database connections, proxy requests, ... or at least, can't hold them open while streaming out response data.) Or sticking to concrete examples, here's a toy-but-plausible generator where the put-the-with-block-outside strategy seems rather difficult to implement: # Yields all lines in all files in 'directory' that contain the substring 'needle' def recursive_grep(directory, needle): for dirpath, _, filenames in os.walk(directory): for filename in filenames: with open(os.path.join(dirpath, filename)) as file_handle: for line in file_handle: if needle in line: yield line -n -- Nathaniel J. Smith -- https://vorpus.org From steve at pearwood.info Fri Oct 21 03:12:19 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 21 Oct 2016 18:12:19 +1100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> Message-ID: <20161021071219.GH22471@ando.pearwood.info> On Thu, Oct 20, 2016 at 11:03:11PM -0700, Nathaniel Smith wrote: > The motivation here is that prompt (non-GC-dependent) cleanup is a > good thing for a variety of reasons: determinism, portability across > Python implementations, proper exception propagation, etc. async does > add yet another entry to this list, but I don't the basic principle is > controversial. Perhaps it should be. The very first thing you say is "determinism". Hmmm. As we (or at least, some of us) move towards more async code, more threads or multi- processing, even another attempt to remove the GIL from CPython which will allow people to use threads with less cost, how much should we really value determinism? That's not a rhetorical question -- I don't know the answer. Portability across Pythons... if all Pythons performed exactly the same, why would we need multiple implementations? The way I see it, non-deterministic cleanup is the cost you pay for a non-reference counting implementation, for those who care about the garbage collection implementation. (And yes, ref counting is garbage collection.) [...] > 'with' blocks are a whole chunk of extra syntax that > were added to the language just for this use case. In fact 'with' > blocks weren't even needed for the functionality -- we already had > 'try/finally', they just weren't ergonomic enough. This use case is so > important that it's had multiple rounds of syntax directed at it > before async/await was even a glimmer in C#'s eye :-). > > BUT, currently, 'with' and 'try/finally' have a gap: if you use them > inside a generator (async or not, doesn't matter), then they often > fail at accomplishing their core purpose. Sure, they'll execute their > cleanup code whenever the generator is cleaned up, but there's no > ergonomic way to clean up the generator. Oops. How often is this *actually* a problem in practice? 
On my system, I can open 1000+ files as a regular user. I can't even
comprehend opening a tenth of that as an ordinary application, although
I can imagine that if I were writing a server application things would
be different. But then I don't expect to write server applications in
quite the same way as I do quick scripts or regular user applications.

So it seems to me that a leaked file handle or two normally shouldn't
be a problem in practice. They'll be freed when the script or
application closes, and in the meantime, you have hundreds more
available. 90% of the time, using `with file` does exactly what we
want, and the times it doesn't (because we're writing a generator that
isn't closed promptly) 90% of those times it doesn't matter.

So (it seems to me) that you're talking about changing the behaviour
of for-loops to suit only a small proportion of cases: maybe 10% of
10%.

It is not uncommon to pass an iterator (such as a generator) through a
series of filters, each processing only part of the iterator:

    it = generator()
    header = collect_header(it)
    body = collect_body(it)
    tail = collect_tail(it)

Is it worth disrupting this standard idiom? I don't think so.

-- 
Steve


From steve at pearwood.info  Fri Oct 21 05:53:45 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 21 Oct 2016 20:53:45 +1100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To:
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com>
Message-ID: <20161021095344.GI22471@ando.pearwood.info>

You know, I'm actually starting to lean towards this proposal and away
from my earlier objections...

On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:

> I should also say, regarding your specific example, I guess it's an
> open question whether we would want list_iterator.__iterclose__ to
> actually do anything. It could flip the iterator to a state where it
> always raises StopIteration,

That seems like the most obvious.

[...]
> The __iterclose__ contract is that you're not supposed
> to call __next__ afterwards, so there's no real rule about what
> happens if you do.

If I recall correctly, in your proposal you use language like
"behaviour is undefined". I don't like that language, because it
sounds like undefined behaviour in C, which is something to be avoided
like the plague. I hope I don't need to explain why, but for those who
may not understand the dangers of "undefined behaviour" as per the C
standard, you can start here:

https://randomascii.wordpress.com/2014/05/19/undefined-behavior-can-format-your-drive/

So let's make it clear that what we actually mean is not C-ish
undefined behaviour, where the compiler is free to open a portal to
the Dungeon Dimensions or use Guido's time machine to erase code that
executes before the undefined code:

https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=633/

but rather ordinary, standard "implementation-dependent behaviour". If
you call next() on a closed iterator, you'll get whatever the iterator
happens to do when it is closed. That will be *recommended* to raise
whatever error is appropriate to the iterator, but not enforced.

That makes it just like the part of the iterator protocol that says
that once an iterator raises StopIteration, it should always raise
StopIteration. Those that don't are officially called "broken", but
they are allowed and you can write one if you want to.
Shorter version:

- calling next() on a closed iterator is expected to be an error of
  some sort, often RuntimeError, but the iterator is free to use a
  different error if that makes sense (e.g. closed files);

- if your own iterator classes break that convention, they will be
  called "broken", but nobody will stop you from writing such "broken"
  iterators.

-- 
Steve


From p.f.moore at gmail.com  Fri Oct 21 06:03:51 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 21 Oct 2016 11:03:51 +0100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To:
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com>
Message-ID:

On 21 October 2016 at 07:03, Nathaniel Smith wrote:
> Oh, goodness, no -- like Yury said, the use cases here are not
> specific to async at all. I mean, none of the examples are async even
> :-).
[...]

Ah, I follow now. Sorry for the misunderstanding, I'd skimmed a bit
more than I realised I had.

However, it still feels to me that the code I currently write doesn't
need this feature, and I'm therefore unclear as to why it's
sufficiently important to warrant a backward compatibility break. It's
quite possible that I've never analysed my code well enough to
*notice* that there's a problem. Or that I rely on CPython's GC
behaviour without realising it. Also, it's honestly very rare that I
need deterministic cleanup, as opposed to guaranteed cleanup - running
out of file handles, for example, isn't really a problem I encounter.

But it's also possible that it's a code design difference. You use the
example (from memory, sorry if this is slightly different to what you
wrote):

    def filegen(filename):
        with open(filename) as f:
            for line in f:
                yield line

    # caller
    for line in filegen(name):
        ...

I wouldn't normally write a function like that - I'd factor it
differently, with the generator taking an open file (or file-like
object) and the caller opening the file:

    def filegen(fd):
        for line in fd:
            yield line

    # caller
    with open(filename) as fd:
        for line in filegen(fd):
            ...

With that pattern, there's no issue. And the filegen function is more
generic, as it can be used with *any* file-like object (a StringIO,
for testing, for example).

> And, also, regarding the "clumsy extra call": the preserve() call
> isn't just arbitrary clumsiness -- it's a signal that hey, you're
> turning off a safety feature. Now the language won't take care of this
> cleanup for you, so it's your responsibility. Maybe you should think
> about how you want to handle that. Of course your decision could be
> "whatever, this is a one-off script, the GC is good enough". But it's
> probably worth the ~0.5 seconds of thought to make that an active,
> conscious decision, because they aren't all one-off scripts.

Well, if preserve() did mean just that, then that would be OK. I'd
never use it, as I don't care about deterministic cleanup, so it makes
no difference to me if it's on or off. But that's not the case - in
fact, preserve() means "give me the old Python 3.5 behaviour", and
(because deterministic cleanup isn't important to me) that's a vague
and unclear distinction. So I don't know whether my code is affected
by the behaviour change and I have to guess at whether I need
preserve().

What I think is needed here is a clear explanation of how this
proposal affects existing code that *doesn't* need or care about
cleanup.
The example that's been mentioned is

with open(filename) as f:
    for line in f:
        if is_end_of_header(line):
            break
        process_header(line)
    for line in f:
        process_body(line)

and similar code that relies on being able to part-process an iterator in a for loop, and then have a later loop pick up where the first left off.

Most users of iterators and generators probably have little understanding of GeneratorExit, closing generators, etc. And that's a good thing - it's why iterators in Python are so useful. So the proposal needs to explain how it impacts that sort of user, in terms that they understand. It's a real pity that the explanation isn't "you can ignore all of this, as you aren't affected by the problem it's trying to solve" - that's what I was getting at.

At the moment, the take home message for such users feels like it's "you might need to scatter preserve() around your code, to avoid the behaviour change described above, which you glazed over because it talked about all that coroutiney stuff you don't understand" :-)

Paul

From p.f.moore at gmail.com Fri Oct 21 06:07:46 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 21 Oct 2016 11:07:46 +0100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To: <20161021095344.GI22471@ando.pearwood.info>
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021095344.GI22471@ando.pearwood.info>
Message-ID:

On 21 October 2016 at 10:53, Steven D'Aprano wrote:
> On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
>
>> I should also say, regarding your specific example, I guess it's an
>> open question whether we would want list_iterator.__iterclose__ to
>> actually do anything. It could flip the iterator to a state where it
>> always raises StopIteration,
>
> That seems like the most obvious.

So - does this mean "unless you understand what preserve() does, you're OK to not use it and your code will continue to work as before"? If so, then I'd be happy with this.

But I genuinely don't know (without going rummaging through docs) what that statement means in any practical sense.

Paul

From steve at pearwood.info Fri Oct 21 06:29:01 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 21 Oct 2016 21:29:01 +1100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To: <74ca605c-8775-72fc-b0e8-7f7bcc396df4@gmail.com>
References: <74ca605c-8775-72fc-b0e8-7f7bcc396df4@gmail.com>
Message-ID: <20161021102901.GJ22471@ando.pearwood.info>

On Wed, Oct 19, 2016 at 05:52:34PM -0400, Yury Selivanov wrote:

> IOW I'm not convinced that if we implement your proposal we'll fix 90%
> (or even 30%) of cases where non-deterministic and postponed cleanup is
> harmful.

Just because something doesn't solve ALL problems doesn't mean it isn't worth doing. Reference counting doesn't solve the problem of cycles, but Python worked really well for many years even though cycles weren't automatically broken. Then a second GC was added, but it didn't solve the problem of cycles with __del__ finalizers. And recently (a year or two ago) there was an improvement that made the GC better able to deal with such cases -- but I expect that there are still edge cases where objects aren't collected.

Had people said "garbage collection doesn't solve all the edge cases, therefore it's not worth doing" where would we be?

I don't know how big a problem the current lack of deterministic GC of resources opened in generators actually is.
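For anyone wondering what "resources opened in generators" looks like in practice, here is a minimal sketch of the pattern under discussion (the file name is invented):

def lines(path):
    with open(path) as f:    # the file is only closed when the generator is finalised
        for line in f:
            yield line

it = lines("data.txt")
first = next(it)             # generator is now suspended inside the 'with' block
# From here on, *when* the file gets closed depends on the garbage collector,
# unless we close the generator explicitly:
it.close()                   # deterministic cleanup; runs the 'with' block's exit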
I guess that users of CPython will have *no idea*, because most of the time the ref counter will cleanup quite early. But not all Pythons are CPython, and despite my earlier post, I now think I've changed my mind and support this proposal. One reason for this is that I thought hard about my own code where I use the double-for-loop idiom: for x in iterator: if cond: break ... # later for y in iterator: # same iterator ... and I realised: (1) I don't do this *that* often; (2) when I do, it really wouldn't be that big a problem for me to guard against auto-closing: for x in protect(iterator): if cond: break ... (3) if I need to write hybrid code that runs over multiple versions, that's easy too: try: from itertools import protect except ImportError: def protect(it): return it > Yes, mainly iterator wrappers. You'll also will need to educate users > to refactor (more on that below) their __del__ methods to > __(a)iterclose__ in 3.6. Couldn't __(a)iterclose__ automatically call __del__ if it exists? Seems like a reasonable thing to inherit from object. > A lot of code that you find on stackoverflow etc will be broken. "A lot"? Or a little? Are you guessing, or did you actually count it? If we are worried about code like this: it = iter([1, 2, 3]) a = list(it) # currently b will be [], with this proposal it will raise RuntimeError b = list(it) we can soften the proposal's recommendation that iterators raise RuntimeError on calling next() when they are closed. I've suggested that "whatever exception makes sense" should be the rule. Iterators with no resources to close can simply raise StopIteration instead. That will preserve the current behaviour. > Porting > code from Python2/<3.6 will be challenging. People are still struggling > to understand 'dict.keys()'-like views in Python 3. I spend a lot of time on the tutor and python-list mailing lists, and a little bit of time on Reddit /python, and I don't think I've ever seen anyone struggle with those. I'm sure it happens, but I don't think it happens often. After all, for the most common use-case, there's no real difference between Python 2 and 3: for key, value in mydict.items(): ... [...] > With you proposal, to achieve the same (and make the code compatible > with new for-loop semantics), users will have to implement both > __iterclose__ and __del__. As I ask above, couldn't we just inherit a default __(a)iterclose__ from object that looks like this? def __iterclose__(self): finalizer = getattr(type(self), '__del__', None) if finalizer: finalizer(self) I know it looks a bit funny for non-iterables to have an iterclose method, but they'll never actually be called. [...] > The __(a)iterclose__ semantics is clear. What's not clear is how much > harm changing the semantics of for-loops will do (and how to quantify > the amount of good :)) The "easy" way to find out (easy for those who aren't volunteering to do the work) is to fork Python, make the change, and see what breaks. I suspect not much, and most of the breakage will be easy to fix. As for the amount of good, this proposal originally came from PyPy. I expect that CPython users won't appreciate it as much as PyPy users, and Jython/IronPython users when they eventually support Python 3.x. 
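For concreteness, the protect()/preserve() guard mentioned above could be little more than a pass-through wrapper whose __iterclose__ does nothing -- a sketch only, since the hook itself is just a proposal and doesn't exist today:

class protect:
    """Wrap an iterator so a for-loop's implicit close becomes a no-op."""
    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        pass  # deliberately do NOT close the underlying iterator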
--
Steve

From steve at pearwood.info Fri Oct 21 07:23:52 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 21 Oct 2016 22:23:52 +1100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To:
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com>
Message-ID: <20161021112352.GL22471@ando.pearwood.info>

On Fri, Oct 21, 2016 at 11:03:51AM +0100, Paul Moore wrote:

> At the moment, the take home message for such users feels like it's
> "you might need to scatter preserve() around your code, to avoid the
> behaviour change described above, which you glazed over because it
> talked about all that coroutiney stuff you don't understand" :-)

I now believe that's not necessarily the case. I think that the message should be:

- If your iterator class has a __del__ or close method, then you need to read up on __(a)iterclose__.

- If you iterate over open files twice, then all you need to remember is that the file will be closed when you exit the first loop. To avoid that auto-closing behaviour, use itertools.preserve().

- Iterating over lists, strings, tuples, dicts, etc. won't change, since they don't have __del__ or close() methods.

I think that covers all the cases the average Python code will care about.

--
Steve

From steve at pearwood.info Fri Oct 21 07:13:45 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 21 Oct 2016 22:13:45 +1100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To:
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021095344.GI22471@ando.pearwood.info>
Message-ID: <20161021111345.GK22471@ando.pearwood.info>

On Fri, Oct 21, 2016 at 11:07:46AM +0100, Paul Moore wrote:
> On 21 October 2016 at 10:53, Steven D'Aprano wrote:
> > On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
> >
> >> I should also say, regarding your specific example, I guess it's an
> >> open question whether we would want list_iterator.__iterclose__ to
> >> actually do anything. It could flip the iterator to a state where it
> >> always raises StopIteration,
> >
> > That seems like the most obvious.

I've changed my mind -- I think maybe it should do nothing, and preserve the current behaviour of lists.

I'm now more concerned with keeping current behaviour as much as possible than creating some sort of consistent error condition for all iterators. Consistency is over-rated, and we already have inconsistency here: file iterators behave differently from list iterators, because they can be closed:

py> f = open('/proc/mdstat', 'r')
py> a = list(f)
py> b = list(f)
py> len(a), len(b)
(20, 0)
py> f.close()
py> c = list(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.

We don't need to add a close() to list iterators just so they are consistent with files. Just let __iterclose__ be a no-op.

> So - does this mean "unless you understand what preserve() does,
> you're OK to not use it and your code will continue to work as
> before"? If so, then I'd be happy with this.

Almost.

Code like this will behave exactly the same as it currently does:

for x in it:
    process(x)
y = list(it)

If it is a file object, the second call to list() will raise ValueError; if it is a list_iterator, or generator, etc., y will be an empty list. That part (I think) shouldn't change.

What *will* change is code that partially processes the iterator in two different places. A simple example:

py> it = iter([1, 2, 3, 4, 5, 6])
py> for x in it:
...     if x == 4: break
...
py> for x in it:
...     print(x)
...
5
6

This *may* change. With this proposal, the first loop will "close" the iterator when you exit from the loop. For a list, there's no finaliser, no __del__ to call, so we can keep the current behaviour and nobody will notice any difference.

But if `it` is a file iterator instead of a list iterator, the file will be closed when you exit the first for-loop, and the second loop will raise ValueError. That will be different.

The fix here is simple: protect the first call from closing:

for x in itertools.preserve(it):  # preserve, protect, whatever
    ...

Or, if `it` is your own class, give it a __iterclose__ method that does nothing.

This is a backwards-incompatible change, so I think we would need to do this:

(1) In Python 3.7, we introduce a __future__ directive:

    from __future__ import iterclose

to enable the new behaviour. (Remember, future directives apply on a module-by-module basis.)

(2) Without the directive, we keep the old behaviour, except that warnings are raised if something will change.

(3) Then in 3.8 iterclose becomes the default, the warnings go away, and the new behaviour just happens.

If that's too fast for people, we could slow it down:

(1) Add the future directive to Python 3.7;

(2) but no warnings by default (you have to opt-in to the warnings with an environment variable, or command-line switch).

(3) Then in 3.8 the warnings are on by default;

(4) And the iterclose behaviour doesn't become standard until 3.9.

That means if this change worries you, you can ignore it until you migrate to 3.8 (which won't be production-ready until about 2020 or so), and don't have to migrate your code until 3.9, which will be a year or two later. But early adopters can start targeting the new functionality from 3.7 if they like.

I don't think there's any need for a __future__ directive for aiterclose, since there's not enough backwards-incompatibility to care about. (I think, but don't mind if people disagree.) That can happen starting in 3.7, and when people complain that their synchronous generators don't have deterministic garbage collection like their asynchronous ones do, we can point them at the future directive.

Bottom line is: at first I thought this was a scary change that would break too much code. But now I think it won't break much, and we can ease into it really slowly over two or three releases. So I think that the cost is probably low. I'm still not sure on how great the benefit will be, but I'm leaning towards a +1 on this.

--
Steve

From p.f.moore at gmail.com Fri Oct 21 09:35:19 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 21 Oct 2016 14:35:19 +0100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To: <20161021112352.GL22471@ando.pearwood.info>
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021112352.GL22471@ando.pearwood.info>
Message-ID:

On 21 October 2016 at 12:23, Steven D'Aprano wrote:
> On Fri, Oct 21, 2016 at 11:03:51AM +0100, Paul Moore wrote:
>
>> At the moment, the take home message for such users feels like it's
>> "you might need to scatter preserve() around your code, to avoid the
>> behaviour change described above, which you glazed over because it
>> talked about all that coroutiney stuff you don't understand" :-)
>
> I now believe that's not necessarily the case. I think that the message
> should be:
>
> - If your iterator class has a __del__ or close method, then you need
>   to read up on __(a)iterclose__.
>
> - If you iterate over open files twice, then all you need to remember is
>   that the file will be closed when you exit the first loop. To avoid
>   that auto-closing behaviour, use itertools.preserve().
>
> - Iterating over lists, strings, tuples, dicts, etc. won't change, since
>   they don't have __del__ or close() methods.
>
> I think that covers all the cases the average Python code will care
> about.

OK, that's certainly a lot less scary. Some thoughts remain, though:

1. You mention files. Presumably (otherwise what would be the point of the change?) there will be other iterables that change similarly. There's no easy way to know in advance.

2. Cleanup protocols for iterators are pretty messy now - __del__, close, __iterclose__, __aiterclose__. What's the chance 3rd party implementers get something wrong?

3. What about generators? If you write your own generator, you don't control the cleanup code. The example:

def mygen(name):
    with open(name) as f:
        for line in f:
            yield line

is a good example - don't users of this generator need to use preserve() in order to be able to do partial iteration? And yet how would the writer of the generator know to document this? And if it isn't documented, how does the user of the generator know preserve is needed? (There's a sketch of what the caller's side would presumably look like after this message.)

My feeling is that this proposal is a relatively significant amount of language churn, to solve a relatively niche problem, and furthermore one that is actually only a problem to non-CPython implementations[1]. My instincts are that we need to back off on the level of such change, to give users a chance to catch their breath. We're not at the level of where we need something like the language change moratorium (PEP 3003) but I don't think it would do any harm to give users a chance to catch their breath after the wave of recent big changes (async, typing, path protocol, f-strings, funky unpacking, Windows build and installer changes, ...).

To put this change in perspective - we've lived without it for many years now, can we not wait a little while longer?

From another message:

> Bottom line is: at first I thought this was a scary change that would
> break too much code. But now I think it won't break much, and we can
> ease into it really slowly over two or three releases. So I think that
> the cost is probably low. I'm still not sure on how great the benefit
> will be, but I'm leaning towards a +1 on this.

And yet, it still seems to me that it's going to force me to change (maybe not much, but some of) my existing code, for absolutely zero direct benefit, as I don't personally use or support PyPy or any other non-CPython implementations. Don't forget that PyPy still doesn't even implement Python 3.5 - so no-one benefits from this change until PyPy supports Python 3.8, or whatever version this becomes the default in.

It's very easy to misuse an argument like this to block *any* sort of change, and that's not my intention here - but I am trying to understand what the real-world issue is here, and how (and when!) this proposal would allow people to write code to fix that problem. At the moment, it feels like:

* The problem is file handle leaks in code running under PyPy
* The ability to fix this will come in around 4 years (random guess as to when PyPy implements Python 3.8, plus an assumption that the code needing to be fixed can immediately abandon support for all earlier versions of PyPy).

Any other cases seem to me to be theoretical at the moment. Am I being unfair in this assessment? (It feels like I might be, but I can't be sure how).
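To spell out what that would presumably look like for callers -- preserve() being the proposed helper, not anything that exists today, and the file name being made up:

from itertools import preserve   # proposed API; does not exist in any released Python

def mygen(name):
    with open(name) as f:
        for line in f:
            yield line

gen = mygen("data.txt")
for line in preserve(gen):       # opt out of the implicit close at loop exit
    if not line.strip():
        break                    # stop at the first blank line
for line in gen:                 # pick up where the first loop left off
    print(line, end="")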
Paul [1] As I understand it. CPython's refcounting GC makes this a non-issue, correct? From ncoghlan at gmail.com Fri Oct 21 10:26:10 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Oct 2016 00:26:10 +1000 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> References: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> Message-ID: On 19 October 2016 at 21:29, Michel Desmoulin wrote: > +1. > > I read many disagreements, and people being rude and unprofessional on > occasions, but nothing that would make me have a bad day, even when I was > the target of it. > > I feel like people are really getting hyper sensitive about communications. As Paul says, assuming good intent is highly desirable, but at the same time, we need to fully appreciate as post authors that python-ideas is a shared communications channel where hundreds of other people are offering us their time and attention for the shared purpose of attempting to ensure that future versions of Python offer an even better programming environment for all kinds of programmers. Respecting that means learning to somehow balance the interests of kids taking their first steps into programming with MicroPython on the micro:bit, adults picking up Python as a possible first step in pursuing a career change, professional web service developers wringing every last ounce of performance out of PyPy that they can, scientists & analysts trying to make sense of their data sets in CPython, system administrators and infrastructure engineers automating their daily activities, etc, etc. Sure, many of us are mainly here because we'd like to make future versions of Python better for ourselves as individuals, but the sheer scope of Python's existing adoption means we're all operating under the basic constraint that even unambiguously positive changes impose non-trivial costs on the rest of the ecosystem, as other implementations need to be updated, books, course notes, and other educational materials require changes, and every current Pythonista gains a new thing to keep in mind where they're wondering which features they can and can't rely on when targeting particular versions. Even the consequences for future Pythonistas aren't unambiguously good, as they'll not only gain a new capability that they'll learn in newer versions, and then have to unlearn when maintaining software written to also run on older versions, but will also often receive confusing advice from folks that first learned an earlier version of Python, and may not have kept up with all of the changes in more recent versions. This constraint is exacerbated by the fact that we're still in the midst of the Python 3 migration, where many of our current users still have software compatibility hurdles between them and their ability to benefit from the work being done on the Python 3 series. This all means that with "post your language design ideas for collaborative critique" being an inherently conflict prone activity, and with "No, that's not a good fit for Python" being such a common answer, it takes a lot of collective effort for us to ensure that this remains a primarily positive and rewarding experience both for folks posting suggestions for change, and for folks reviewing and commenting on those suggestions. 
In practice, this mainly boils down to attempting to follow the guidelines: - Don't make people regret posting their idea (however much we personally dislike it) - Be willing to graciously accept 'No' for an answer when posting a suggestion for change - Remember that fixing problems just for ourselves and folks that choose to adopt our solution is a perfectly fine option - we don't necessarily have to change the default characteristics of the entire language ecosystem - Remember that even if something we vehemently consider "wrong" makes it into the reference implementation, the language does have a design policy that allows us to correct design mistakes after a suitable deprecation period, and we also each personally have the option of advocating for updates to the coding styles on the projects we participate in to prohibit use of the features we consider problematic Cheers, Nick. P.S. Given the existence of the constraints discussed above, folks may then be curious as to why we have a brainstorming list at all, given that the default answer is almost always going to be "No", with folks being encouraged to instead find a way to use the existing flexibility in the language and interpreter design to solve the problem to their own satisfaction in a 3rd party module or even a language variant (with MacroPy and Hylang being a couple of significant examples of folks taking that path for ideas that would never be accepted into the default implementation). The reason it's useful to have a brainstorming list where folks may be told "That's a bad idea for Python", but should never be told "You shouldn't have suggested that", is that sometimes someone will challenge a longstanding assumption that isn't actually true anymore, or an accident of implementation that imposes an unnecessary stumbling block for new users. In those cases, the net future benefit to the overall ecosystem may be judged significant enough to be worth the costs of adjusting to it. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Oct 21 11:03:45 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Oct 2016 01:03:45 +1000 Subject: [Python-ideas] Fwd: Fwd: Fwd: unpacking generalisations for list comprehension In-Reply-To: <1476990569.1777101.762331657.5682E8F1@webmail.messagingengine.com> References: <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <20161017173219.GC22471@ando.pearwood.info> <1476726562.888642.758686169.52B9C868@webmail.messagingengine.com> <1476990569.1777101.762331657.5682E8F1@webmail.messagingengine.com> Message-ID: On 21 October 2016 at 05:09, Random832 wrote: > On Tue, Oct 18, 2016, at 02:10, Nick Coghlan wrote: >> Hi, I contributed the current list comprehension implementation (when >> refactoring it for Python 3 to avoid leaking the iteration variable, >> as requested in PEP 3100 [1]), and "comprehensions are syntactic sugar >> for a series of nested for and if statements" is precisely my >> understanding of how they work, and what they mean. > > But it's simply not true. It has never been true and it will never be > true. Something is not "syntactic sugar" if it doesn't compile to the > exact same sequence of operations as the thing it is supposedly > syntactic sugar for. 
We don't need to guess about this, since we can consult the language reference and see how comprehension semantics are specified for language implementors: https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries

Firstly, container displays are broken out into two distinct kinds:

    For constructing a list, a set or a dictionary Python provides special
    syntax called "displays", each of them in two flavors:

    - either the container contents are listed explicitly, or
    - they are computed via a set of looping and filtering instructions,
      called a comprehension.

Secondly, the meaning of the clauses in comprehensions is spelled out a little further down:

    The comprehension consists of a single expression followed by at least
    one for clause and zero or more for or if clauses. In this case, the
    elements of the new container are those that would be produced by
    considering each of the for or if clauses a block, nesting from left to
    right, and evaluating the expression to produce an element each time the
    innermost block is reached.

We can also go back and read the design PEPs that added these features to the language:

* List comprehensions: https://www.python.org/dev/peps/pep-0202/
* Generator expressions: https://www.python.org/dev/peps/pep-0289/

PEP 202 defined the syntax in terms of its proposed behaviour rather than a syntactic expansion, with the only reference to the nesting equivalence being this BDFL pronouncement:

    - The form [... for x... for y...] nests, with the last index varying
      fastest, just like nested for loops.

PEP 289, by contrast, fully spells out the implied generator definition that was used to guide the implementation of generator expressions in the code generator:

    g = (tgtexp for var1 in exp1 if exp2 for var2 in exp3 if exp4)

is equivalent to:

    def __gen(bound_exp):
        for var1 in bound_exp:
            if exp2:
                for var2 in exp3:
                    if exp4:
                        yield tgtexp
    g = __gen(iter(exp1))
    del __gen

When I implemented the comprehension index variable hiding for Python 3.0, the final version was the one where I blended those two definitions to end up with the situation where:

    data = [(var1, var2) for var1 in exp1 if exp2 for var2 in exp3 if exp4]

is now equivalent to:

    def __comprehension(bound_exp):
        __hidden_var = []
        for var1 in bound_exp:
            if exp2:
                for var2 in exp3:
                    if exp4:
                        __hidden_var.append((var1, var2))
        return __hidden_var
    data = __comprehension(iter(exp1))
    del __comprehension

While it's pretty dated now (I wrote it circa 2.5 as part of a draft book manuscript that was never published), if you'd like to learn more about this, you may want to check out the section on "Initialising standard containers" in http://svn.python.org/view/sandbox/trunk/userref/ODF/Chapter02_StatementsAndExpressions.odt

Cheers,
Nick.
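A quick, self-contained way to check that equivalence (and the non-leaking loop variable) with made-up values:

exp1, exp3 = range(4), "ab"

data = [(var1, var2) for var1 in exp1 if var1 % 2 == 0 for var2 in exp3]

def __comprehension(bound_exp):
    __hidden_var = []
    for var1 in bound_exp:
        if var1 % 2 == 0:
            for var2 in exp3:
                __hidden_var.append((var1, var2))
    return __hidden_var

assert data == __comprehension(iter(exp1))
assert 'var1' not in dir()   # the index variable does not leak in Python 3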
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Fri Oct 21 11:07:02 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 21 Oct 2016 16:07:02 +0100 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: References: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> Message-ID: On 21 October 2016 at 15:26, Nick Coghlan wrote: > - Remember that even if something we vehemently consider "wrong" makes > it into the reference implementation, the language does have a design > policy that allows us to correct design mistakes after a suitable > deprecation period, and we also each personally have the option of > advocating for updates to the coding styles on the projects we > participate in to prohibit use of the features we consider problematic This one, I think is particularly relevant to long-time participants (I know it applies to me). It's very easy to become strongly defensive out of fear that a chorus of enthusiasm will push something through that you disagree with. It's worth people (me!) remembering that there's a large "silent majority" of people who don't participate on python-ideas, but who have a strong influence on whether proposals get accepted. Also, the tracker and the PEP process do a great job of sanity checking proposals. So there's no need for people to feel like they have to be the lone defender of the language against wild proposals. > The reason it's useful to have a > brainstorming list where folks may be told "That's a bad idea for > Python", but should never be told "You shouldn't have suggested that", > is that sometimes someone will challenge a longstanding assumption > that isn't actually true anymore, or an accident of implementation > that imposes an unnecessary stumbling block for new users. In those > cases, the net future benefit to the overall ecosystem may be judged > significant enough to be worth the costs of adjusting to it. I wonder. Would there be value in adding a sign-up email to the list (supported by a posting of that email to the list, to catch existing contributors) that set out some of the basic principles of how changes are judged for inclusion in Python? We could cover things like: * The fact that the default answer is typically "no", along with an overview of the reasons *why* the status quo wins by default. * Some of the simple "rules of thumb" like "not every 2-line function should be a builtin. * Basic reminders that Python is used by a very diverse set of users, and proposals that are only beneficial for a limited group need to be weighed against the disruption to the majority who get no benefit. * The above comment, that we welcome ideas because it's important that we don't stagnate and having assumptions challenged is valuable, even if the bar for getting such ideas accepted is (necessarily) high. Maybe even make it a regular informational posting, if it seems that a reminder would be useful. It's possible that this would come across as too bureaucratic for new users, though, so I'm not sure... 
Paul From yselivanov.ml at gmail.com Fri Oct 21 11:08:57 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 21 Oct 2016 11:08:57 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <20161021102901.GJ22471@ando.pearwood.info> References: <74ca605c-8775-72fc-b0e8-7f7bcc396df4@gmail.com> <20161021102901.GJ22471@ando.pearwood.info> Message-ID: <37886575-dace-a8df-ca1d-50ead5e93748@gmail.com> On 2016-10-21 6:29 AM, Steven D'Aprano wrote: > On Wed, Oct 19, 2016 at 05:52:34PM -0400, Yury Selivanov wrote: [..] >> With you proposal, to achieve the same (and make the code compatible >> with new for-loop semantics), users will have to implement both >> __iterclose__ and __del__. > As I ask above, couldn't we just inherit a default __(a)iterclose__ from > object that looks like this? > > def __iterclose__(self): > finalizer = getattr(type(self), '__del__', None) > if finalizer: > finalizer(self) > > > I know it looks a bit funny for non-iterables to have an iterclose > method, but they'll never actually be called. No, we can't call __del__ from __iterclose__. Otherwise we'd break even more code that this proposal already breaks: for i in iter: ... iter.something() # <- this would be call after iter.__del__() [..] > As for the amount of good, this proposal originally came from PyPy. I > expect that CPython users won't appreciate it as much as PyPy users, and > Jython/IronPython users when they eventually support Python 3.x. AFAIK the proposal came "for" PyPy, not "from". And the issues Nathaniel tries to solve do also exist in CPython. It's only a question if changing 'for' statement and iteration protocol is worth the trouble. Yury From yselivanov.ml at gmail.com Fri Oct 21 11:15:25 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 21 Oct 2016 11:15:25 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <20161021111345.GK22471@ando.pearwood.info> References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021095344.GI22471@ando.pearwood.info> <20161021111345.GK22471@ando.pearwood.info> Message-ID: On 2016-10-21 7:13 AM, Steven D'Aprano wrote: > Consistency is over-rated, and we already have inconsistency > here: file iterators behave differently from list iterators, because > they can be closed: This is **very** arguable :) Yury From gjcarneiro at gmail.com Fri Oct 21 11:19:23 2016 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 21 Oct 2016 16:19:23 +0100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <37886575-dace-a8df-ca1d-50ead5e93748@gmail.com> References: <74ca605c-8775-72fc-b0e8-7f7bcc396df4@gmail.com> <20161021102901.GJ22471@ando.pearwood.info> <37886575-dace-a8df-ca1d-50ead5e93748@gmail.com> Message-ID: Personally, I hadn't realised we had this problem in asyncio until now. Does this problem happen in asyncio at all? Or does asyncio somehow work around it by making sure to always explicitly destroy the frames of all coroutine objects, as long as someone waits on each task? On 21 October 2016 at 16:08, Yury Selivanov wrote: > > > On 2016-10-21 6:29 AM, Steven D'Aprano wrote: > >> On Wed, Oct 19, 2016 at 05:52:34PM -0400, Yury Selivanov wrote: >> > [..] > >> With you proposal, to achieve the same (and make the code compatible >>> with new for-loop semantics), users will have to implement both >>> __iterclose__ and __del__. >>> >> As I ask above, couldn't we just inherit a default __(a)iterclose__ from >> object that looks like this? 
>> >> def __iterclose__(self): >> finalizer = getattr(type(self), '__del__', None) >> if finalizer: >> finalizer(self) >> >> >> I know it looks a bit funny for non-iterables to have an iterclose >> method, but they'll never actually be called. >> > > No, we can't call __del__ from __iterclose__. Otherwise we'd > break even more code that this proposal already breaks: > > > for i in iter: > ... > iter.something() # <- this would be call after iter.__del__() > > [..] > >> As for the amount of good, this proposal originally came from PyPy. I >> expect that CPython users won't appreciate it as much as PyPy users, and >> Jython/IronPython users when they eventually support Python 3.x. >> > > AFAIK the proposal came "for" PyPy, not "from". And the > issues Nathaniel tries to solve do also exist in CPython. It's > only a question if changing 'for' statement and iteration protocol > is worth the trouble. > > Yury > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Oct 21 11:42:50 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Oct 2016 01:42:50 +1000 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: References: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> Message-ID: On 22 October 2016 at 01:07, Paul Moore wrote: > I wonder. Would there be value in adding a sign-up email to the list > (supported by a posting of that email to the list, to catch existing > contributors) that set out some of the basic principles of how changes > are judged for inclusion in Python? We could cover things like: > > * The fact that the default answer is typically "no", along with an > overview of the reasons *why* the status quo wins by default. > * Some of the simple "rules of thumb" like "not every 2-line function > should be a builtin. > * Basic reminders that Python is used by a very diverse set of users, > and proposals that are only beneficial for a limited group need to be > weighed against the disruption to the majority who get no benefit. > * The above comment, that we welcome ideas because it's important that > we don't stagnate and having assumptions challenged is valuable, even > if the bar for getting such ideas accepted is (necessarily) high. > > Maybe even make it a regular informational posting, if it seems that a > reminder would be useful. > > It's possible that this would come across as too bureaucratic for new > users, though, so I'm not sure... We have a bit of that kind of content in the developer guide (although I don't think we have anything written down anywhere regarding "Usage scenarios to keep in mind"): * https://docs.python.org/devguide/langchanges.html#langchanges * https://docs.python.org/devguide/faq.html#suggesting-changes Those could potentially be linked from the python-ideas list overview at https://mail.python.org/mailman/listinfo/python-ideas for folks that hit the mailing list sign-up page directly, rather than encountering the Developer Guide first. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From srkunze at mail.de Fri Oct 21 13:33:00 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Fri, 21 Oct 2016 19:33:00 +0200 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: <580956A3.6030205@canterbury.ac.nz> References: <20161013165546.GB22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> <580956A3.6030205@canterbury.ac.nz> Message-ID: <260ae941-347f-2de5-e0dd-ce93b2eea845@mail.de> On 21.10.2016 01:43, Greg Ewing wrote: > Alexander Heger wrote: >> For me the current behaviour does not seem unreasonable as it >> resembles the order in which you write out loops outside a comprehension > > That's true, but the main reason for having comprehensions > syntax in the first place is so that it can be read > declaratively -- as a description of the list you want, > rather than a step-by-step sequence of instructions for > building it up. > > If you have to stop and mentally transform it into nested > for-statements, that very purpose is undermined. > Exactly. Sven From srkunze at mail.de Fri Oct 21 14:12:43 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 21 Oct 2016 20:12:43 +0200 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: On 19.10.2016 21:08, Todd wrote: > > a= [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 > || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 > ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 > || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 > |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 > || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 > ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 > || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 > ||||] > > b = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 | > | -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 | > | > | -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 | > | -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 | > || > | -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 | > | 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 | > | > | 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 | > | -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||] > > > Compared to the current approach: > > a = np.ndarray([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134], > [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]], > [[-103, 50, -89, -12, 28, -12, 119, -131, -73, 21], > [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]], > [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145], > [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]], > [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60], > [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]]) > > I think both of the new examples are considerably clearer than the > current approach. > > Does anyone have any questions or thoughts? Honestly, all three look the same to me. Confusing, large, heavy. I'm sorry. :( Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Fri Oct 21 14:16:12 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Fri, 21 Oct 2016 20:16:12 +0200 Subject: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed In-Reply-To: References: Message-ID: <2cc1c004-bf44-7240-2b5b-b2376f804bbf@mail.de> On 17.10.2016 23:53, Paul Moore wrote: > On 17 October 2016 at 22:28, Mark Lawrence via Python-ideas > wrote: >> How about changing https://wiki.python.org/moin/HowTo/Sorting ? > Good point. Better still, https://docs.python.org/3.6/howto/sorting.html Don't know what the real difference between those two are and how to change them but yes. I think tweaking "Sort Stability and Complex Sorts" and/or adding some topic ("Multisort") in between is a good idea. Best, Sven From srkunze at mail.de Fri Oct 21 14:26:33 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 21 Oct 2016 20:26:33 +0200 Subject: [Python-ideas] Conditional Assignment in If Statement In-Reply-To: <187A0737-994F-4744-BFF0-D3EC320FE4A3@mdupont.com> References: <187A0737-994F-4744-BFF0-D3EC320FE4A3@mdupont.com> Message-ID: On 18.10.2016 00:11, Michael duPont wrote: > What does everyone think about: > > if foo = get_foo(): > bar(foo) > > as a means to replace: > > foo = get_foo() > if not foo: > bar(foo) > del foo > > Might there be some better syntax or a different keyword? I constantly run into this sort of use case. Before really understanding what you need here I have some questions: 1) What does real-world code look like here exactly? 2) Why do you need foo to be deleted after the if? 3) Do you need this in interactive sessions, short-lived code or maintained code? Cheers, Sven From breamoreboy at yahoo.co.uk Fri Oct 21 15:13:21 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 21 Oct 2016 20:13:21 +0100 Subject: [Python-ideas] please try to keep things civil In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: On 17/10/2016 19:29, Brett Cannon wrote: > > > On Sun, 16 Oct 2016 at 09:39 Mark Lawrence via Python-ideas > > wrote: > > On 16/10/2016 16:41, Mariatta Wijaya wrote: > >>Her reaction was hilarious: > >> > >>"Whom does he teach? Children?" > > > > I sense mockery in your email, and it does not conform to the PSF code > > of conduct. Please read the CoC before posting in this mailing > list. The > > link is available at the bottom of every python mailing list > > email.https://www.python.org/psf/codeofconduct/ > > > > I don't find teaching children is a laughing matter, neither is > the idea > > of children learning to code. > > In Canada, we have initiatives like Girls Learning Code and Kids > > Learning Code. I mentored in a couple of those events and the students > > are girls aged 8-14. They surprised me with their abilities to > learn. I > > would suggest looking for such mentoring opportunities in your area to > > gain appreciation with this regard. > > Thanks. > > (Sorry to derail everyone from the topic of list comprehension. Please > > continue!) > > > > The RUE was allowed to insult the community for years and got away with > it. > > What is the "RUE"? 
Not what, who, the Resident Unicode Expert who spent two years spewing his insults at the entire Python community until the moderators finally woke up, did their job, and kicked him into touch.

>
> I'm autistic, stepped across the line, and got hammered. Hypocrisy
> at its best.
>
>
> While some of us know your background, Mark, not everyone on this list
> does as people join at different times, so please try to give the
> benefit of the doubt to people. Marietta obviously takes how children
> are reflected personally and was trying to point out that fact. I don't
> think she meant for the CoC reference to come off as threatening, just
> to back up why she was taking the time out to speak up that she was
> upset by what was said.
>

This list is irrelevant. The PSF has to be consistent across all of its lists. Who the hell is Marietta, I don't recall a single post from her in 16 years of using Python?

--
Hell I'm confused, why the hell do I bother???

From ronan.lamy at gmail.com Fri Oct 21 15:20:55 2016
From: ronan.lamy at gmail.com (Ronan Lamy)
Date: Fri, 21 Oct 2016 20:20:55 +0100
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To:
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021112352.GL22471@ando.pearwood.info>
Message-ID: <907d5861-60c6-2162-c144-f2b3414f7b0f@gmail.com>

On 21/10/16 14:35, Paul Moore wrote:
>
> [1] As I understand it. CPython's refcounting GC makes this a
> non-issue, correct?

Wrong. Any guarantee that you think the CPython GC provides goes out of the window as soon as you have a reference cycle. Refcounting does not actually make GC deterministic, it merely hides the problem away from view.

For instance, on CPython 3.5, running this code:

#%%%%%%%%%
class some_resource:
    def __enter__(self):
        print("Open resource")
        return 42
    def __exit__(self, *args):
        print("Close resource")

def some_iterator():
    with some_resource() as s:
        yield s

def main():
    it = some_iterator()
    for i in it:
        if i == 42:
            print("The answer is", i)
            break
    print("End loop")

    # later ...
    try:
        1/0
    except ZeroDivisionError as e:
        exc = e

main()
print("Exit")
#%%%%%%%%%%

produces:

Open resource
The answer is 42
End loop
Exit
Close resource

What happens is that 'exc' holds a cyclic reference back to the main() frame, which prevents it from being destroyed when the function exits, and that frame, in turn, holds a reference to the iterator, via the local variable 'it'. And so, the iterator remains alive, and the resource unclosed, until the next garbage collection.

From ethan at stoneleaf.us Fri Oct 21 15:54:49 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 21 Oct 2016 12:54:49 -0700
Subject: [Python-ideas] please try to keep things civil
In-Reply-To:
References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz>
Message-ID: <580A7289.3030001@stoneleaf.us>

On 10/21/2016 12:13 PM, Mark Lawrence via Python-ideas wrote:

> This list is irrelevant. The PSF has to be consistent across all of its lists.

This list is not irrelevant, and yes *volunteers who moderate* should be consistent.

> Who the hell is Marietta, I don't recall a single post from her in 16 years of using Python?
She is a fellow Pythonista, and you owe her an apology. -- ~Ethan~ P.S. Should you decide to bring my own stupidity a few months of ago of being insulting, you should also take note that I took responsibility for it and apologized myself. From ned at nedbatchelder.com Fri Oct 21 16:04:20 2016 From: ned at nedbatchelder.com (Ned Batchelder) Date: Fri, 21 Oct 2016 16:04:20 -0400 Subject: [Python-ideas] please try to keep things civil In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161013204019.GE13170@sjoerdjob.com> <20161013221525.GD22471@ando.pearwood.info> <58008958.403@canterbury.ac.nz> <20161015080958.GN22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> Message-ID: <89c8f0af-c4a7-aafd-84c7-06d4bff65a40@nedbatchelder.com> On 10/21/16 3:13 PM, Mark Lawrence via Python-ideas wrote: > On 17/10/2016 19:29, Brett Cannon wrote: >> >> >> On Sun, 16 Oct 2016 at 09:39 Mark Lawrence via Python-ideas >> > > wrote: >> >> On 16/10/2016 16:41, Mariatta Wijaya wrote: >> >>Her reaction was hilarious: >> >> >> >>"Whom does he teach? Children?" >> > >> > I sense mockery in your email, and it does not conform to the >> PSF code >> > of conduct. Please read the CoC before posting in this mailing >> list. The >> > link is available at the bottom of every python mailing list >> > email.https://www.python.org/psf/codeofconduct/ >> > >> > I don't find teaching children is a laughing matter, neither is >> the idea >> > of children learning to code. >> > In Canada, we have initiatives like Girls Learning Code and Kids >> > Learning Code. I mentored in a couple of those events and the >> students >> > are girls aged 8-14. They surprised me with their abilities to >> learn. I >> > would suggest looking for such mentoring opportunities in your >> area to >> > gain appreciation with this regard. >> > Thanks. >> > (Sorry to derail everyone from the topic of list comprehension. >> Please >> > continue!) >> > >> >> The RUE was allowed to insult the community for years and got >> away with >> it. >> >> What is the "RUE"? > > Not what, who, the Resident Unicode Expert who spent two years spewing > his insults at the entire Python community until the moderators > finally woke up, did their job, and kicked him into touch. The person in question was insisting that Python's Unicode handling was broken, but could never produce an example of broken behavior, only surprising (to him) timings and memory usage. He jumped into threads about Unicode for years, making the same unsubstantiated claims. Mark, I know you viewed "Python is broken" as an insult to the maintainers, but others do not. He certainly never singled out any individual for abuse. >> >> I'm autistic, stepped across the line, and got hammered. >> Hypocrisy >> at its best. You were chastised for making direct personal insults, for example, as you are doing to Marietta below. >> >> >> While some of us know your background, Mark, not everyone on this list >> does as people join at different times, so please try to give the >> benefit of the doubt to people. Marietta obviously takes how children >> are reflected personally and was trying to point out that fact. I don't >> think she meant for the CoC reference to come off as threatening, just >> to back up why she was taking the time out to speak up that she was >> upset by what was said. >> > > This list is irrelevant. 
The PSF has to be consistent across all of > its lists. Who the hell is Marietta, I don't recall a single post from > her in 16 years of using Python? > The fact that you haven't seen a post from Marietta before is irrelevant. She is new to this list, and has been traveling in different Python circles than you have. That doesn't make her less of a member of this community, and it doesn't diminish her point. Your vitriol towards her diminishes yours. --Ned. From yselivanov.ml at gmail.com Fri Oct 21 12:56:16 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 21 Oct 2016 12:56:16 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <74ca605c-8775-72fc-b0e8-7f7bcc396df4@gmail.com> <20161021102901.GJ22471@ando.pearwood.info> <37886575-dace-a8df-ca1d-50ead5e93748@gmail.com> Message-ID: <1f6a7a34-4cea-5018-187f-9955ad53e13b@gmail.com> On 2016-10-21 11:19 AM, Gustavo Carneiro wrote: > Personally, I hadn't realised we had this problem in asyncio until now. > > Does this problem happen in asyncio at all? Or does asyncio somehow work > around it by making sure to always explicitly destroy the frames of all > coroutine objects, as long as someone waits on each task? No, I think asyncio code is free of the problem this proposal is trying to address. We might have some "problem" in 3.6 when people start using async generators more often. But I think it's important for us to teach people to manage the associated resources from the outside of the generator (i.e. don't put 'async with' or 'with' inside the generator's body; instead, wrap the code that uses the generator with 'async with' or 'with'). Yury > > On 21 October 2016 at 16:08, Yury Selivanov wrote: > >> >> On 2016-10-21 6:29 AM, Steven D'Aprano wrote: >> >>> On Wed, Oct 19, 2016 at 05:52:34PM -0400, Yury Selivanov wrote: >>> >> [..] >> >>> With you proposal, to achieve the same (and make the code compatible >>>> with new for-loop semantics), users will have to implement both >>>> __iterclose__ and __del__. >>>> >>> As I ask above, couldn't we just inherit a default __(a)iterclose__ from >>> object that looks like this? >>> >>> def __iterclose__(self): >>> finalizer = getattr(type(self), '__del__', None) >>> if finalizer: >>> finalizer(self) >>> >>> >>> I know it looks a bit funny for non-iterables to have an iterclose >>> method, but they'll never actually be called. >>> >> No, we can't call __del__ from __iterclose__. Otherwise we'd >> break even more code that this proposal already breaks: >> >> >> for i in iter: >> ... >> iter.something() # <- this would be call after iter.__del__() >> >> [..] >> >>> As for the amount of good, this proposal originally came from PyPy. I >>> expect that CPython users won't appreciate it as much as PyPy users, and >>> Jython/IronPython users when they eventually support Python 3.x. >>> >> AFAIK the proposal came "for" PyPy, not "from". And the >> issues Nathaniel tries to solve do also exist in CPython. It's >> only a question if changing 'for' statement and iteration protocol >> is worth the trouble. 
>> Yury
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov Fri Oct 21 16:59:34 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 21 Oct 2016 13:59:34 -0700
Subject: [Python-ideas] Deterministic iterator cleanup
In-Reply-To: <20161021071219.GH22471@ando.pearwood.info>
References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info>
Message-ID:

On Fri, Oct 21, 2016 at 12:12 AM, Steven D'Aprano wrote:

> Portability across Pythons... if all Pythons performed exactly the same,
> why would we need multiple implementations? The way I see it,
> non-deterministic cleanup is the cost you pay for a non-reference
> counting implementation, for those who care about the garbage collection
> implementation. (And yes, ref counting is garbage collection.)
>

Hmm -- and yet "with" was added, and I can't imagine that its largest use-case is with ( ;-) ) open:

with open(filename, mode) as my_file:
    ....
    ....

And yet for years I happily counted on reference counting to close my files, and was particularly happy with:

data = open(filename, mode).read()

I really liked that that file got opened, read, and closed and cleaned up right off the bat.

And then context managers were introduced. And it seems to me there is a consensus in the Python community that we all should be using them when working on files, and I myself have finally started routinely using them, and teaching newbies to use them -- which is kind of a pain, 'cause I want to have them do basic file reading stuff before I explain what a "context manager" is.

Anyway, my point is that the broader Python community really has been pretty consistent about making it easy to write code that will work the same way (maybe not with the same performance) across python implementations. And specifically with deterministic resource management.

On my system, I can open 1000+ files as a regular user. I can't even
> comprehend opening a tenth of that as an ordinary application, although
> I can imagine that if I were writing a server application things would
> be different.

well, what you can imagine isn't really the point -- I've bumped into that darn open file limit in my work, which was not a server application (though it was some pretty serious number crunching...). And I'm sure I'm not alone.

OK, to be fair that was a poorly designed library, not an issue with determinism of resource management (though designing the lib well WOULD depend on that)

But then I don't expect to write server applications in
> quite the same way as I do quick scripts or regular user applications.
>

Though data analysts DO write "quick scripts" that might need to do things like access 100s of files...

> So it seems to me that a leaked file handler or two normally shouldn't
> be a problem in practice. They'll be freed when the script or
> application closes, and in the meantime, you have hundreds more
> available.
90% of the time, using `with file` does exactly what we want, > and the times it doesn't (because we're writing a generator that isn't > closed promptly) 90% of those times it doesn't matter. that was the case with "with file" from the beginning -- particularly on cPython. And yet we all thought it was a great idea. > So (it seems to > me) that you're talking about changing the behaviour of for-loops to > suit only a small proportion of cases: maybe 10% of 10%. > I don't see what the big overhead is here. for loops would get a new feature, but it would only be used by the objects that chose to implement it. So no huge change. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Oct 21 14:39:59 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 21 Oct 2016 11:39:59 -0700 Subject: [Python-ideas] Python multi-dimensional array constructor In-Reply-To: References: Message-ID: <580A60FF.7020904@stoneleaf.us> On 10/19/2016 12:08 PM, Todd wrote: > At least in my opinion, this sort of approach really shines when making > higher-dimensional arrays. These would all be equivalent (the | at the > beginning and end are just to make it easier to align indentation, they > aren't required): > > a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 > || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 > ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 > || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 > |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 > || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 > ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 > || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 > ||||] > > b = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 | > | -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 | > | > | -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 | > | -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 | > || > | -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 | > | 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 | > | > | 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 | > | -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||] > > > Compared to the current approach: > > a = np.ndarray([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134], > [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]], > [[-103, 50, -89, -12, 28, -12, 119, -131, -73, 21], > [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]], > [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145], > [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]], > [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60], > [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]]) > > I think both of the new examples are considerably clearer than the current approach. > > Does anyone have any questions or thoughts? Optional, semi-meaningless, not-really-an-operator markings? The current approach I could at least figure out if I had to -- yours is confusing. You have done a good job explaining what you mean, but what to you is clear is to me, and others, incomprehensible. 
-- ~Ethan~ From p.f.moore at gmail.com Fri Oct 21 18:53:53 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 21 Oct 2016 23:53:53 +0100 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: On 21 October 2016 at 21:59, Chris Barker wrote: >> So (it seems to >> me) that you're talking about changing the behaviour of for-loops to >> suit only a small proportion of cases: maybe 10% of 10%. > > > I don't see what the big overhead is here. for loops would get a new > feature, but it would only be used by the objects that chose to implement > it. So no huge change. But the point is that the feature *would* affect people who don't need it. That's what I'm struggling to understand. I keep hearing "most code won't be affected", but then discussions about how we ensure that people are warned of where they need to add preserve() to their existing code to get the behaviour they already have. (And, of course, they need to add an "if we're on older pythons, define a no-op version of preserve() backward compatibility wrapper if they want their code to work cross version). I genuinely expect preserve() to pretty much instantly appear on people's lists of "python warts", and that bothers me. But I'm reaching the point where I'm just saying the same things over and over, so I'll bow out of this discussion now. I remain confused, but I'm going to have to trust that the people who have got a handle on the issue have understood the point I'm making, and have it covered. Paul From amit.mixie at gmail.com Fri Oct 21 18:48:30 2016 From: amit.mixie at gmail.com (Amit Green) Date: Fri, 21 Oct 2016 18:48:30 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: NOTE: This is my first post to this mailing list, I'm not really sure how to post a message, so I'm attempting a reply-all. I like Nathaniel's idea for __iterclose__. I suggest the following changes to deal with a few of the complex issues he discussed. 1. Missing __iterclose__, or a value of none, works as before, no changes. 2. An iterator can be used in one of three ways: A. 'for' loop, which will call __iterclose__ when it exits B. User controlled, in which case the user is responsible to use the iterator inside a with statement. C. Old style. The user is responsible for calling __iterclose__ 3. An iterator keeps track of __iter__ calls, this allows it to know when to cleanup. The two key additions, above, are: #2B. User can use iterator with __enter__ & __exit cleanly. #3. By tracking __iter__ calls, it makes complex user cases easier to handle. Specification ============= An iterator may implement the following method: __iterclose__. A missing method, or a value of None is allowed. When the user wants to control the iterator, the user is expected to use the iterator with a with clause. The core proposal is the change in behavior of ``for`` loops. Given this Python code: for VAR in ITERABLE: LOOP-BODY else: ELSE-BODY we desugar to the equivalent of: _iter = iter(ITERABLE) _iterclose = getattr(_iter, '__iterclose__', None) if _iterclose is none: traditional-for VAR in _iter: LOOP-BODY else: ELSE-BODY else: _stop_exception_seen = False try: traditional-for VAR in _iter: LOOP-BODY else: _stop_exception_seen = True ELSE-BODY finally: if not _stop_exception_seen: _iterclose(_iter) The test for 'none' allows us to skip the setup of a try/finally clause. 
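For experimentation, the expansion above can be approximated today with a small helper (a rough sketch only; the name 'run_for' is made up, the loop body is passed as a callable, and '__iterclose__' is of course hypothetical until iterators actually grow it):

    def run_for(iterable, loop_body):
        # Rough emulation of the proposed expansion: __iterclose__ is called
        # when the loop is left early (an exception out of loop_body), but
        # skipped when the iterator ran to completion on its own.
        _iter = iter(iterable)
        _iterclose = getattr(_iter, '__iterclose__', None)  # bound method or None
        if _iterclose is None:
            for item in _iter:
                loop_body(item)
            return
        exhausted = False
        try:
            for item in _iter:
                loop_body(item)
            exhausted = True
        finally:
            if not exhausted:
                _iterclose()
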
Also we don't bother to call __iterclose__ if the iterator threw StopIteration at us.

Modifications to basic iterator types
=====================================

An iterator will implement something like the following:

    _cleanup      - Private function, does the following:
                        _enter_count = _iter_count = -1
                        Do any necessary cleanup, release resources, etc.
                    NOTE: Is also called internally by the iterator,
                          before throwing StopIteration

    _iter_count   - Private value, starts at 0.

    _enter_count  - Private value, starts at 0.

    __iter__      - if _iter_count >= 0:
                        _iter_count += 1
                    return self

    __iterclose__ - if _iter_count is 0:
                        if _enter_count is 0:
                            _cleanup()
                    elif _iter_count > 0:
                        _iter_count -= 1

    __enter__     - if _enter_count >= 0:
                        _enter_count += 1
                    return self

    __exit__      - if _enter_count > 0:
                        _enter_count -= 1
                        if _enter_count is _iter_count is 0:
                            _cleanup()

The suggestions on _iter_count & _enter_count are just examples; internal details can differ (and better error handling).

Examples:
=========

NOTE: Examples are given using xrange() or [1, 2, 3, 4, 5, 6, 7] for simplicity. For real use, the iterator would have resources such as open files it needs to close on cleanup.

1. Simple example:

    for v in xrange(7):
        print v

   Creates an iterator with an _iter_count of 0. The iterator exits normally (by raising StopIteration), so we don't bother to call __iterclose__.

2. Break example:

    for v in [1, 2, 3, 4, 5, 6, 7]:
        print v
        if v == 3:
            break

   Creates an iterator with an _iter_count of 0. The iterator exits after generating 3 numbers; we then call __iterclose__ & the iterator does any necessary cleanup.

3. Convert example #2 to print the next value:

    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
        for v in seven:
            print v
            if v == 3:
                break
        print 'Next value is: ', seven.next()

   This will print:

    1
    2
    3
    Next value is: 4

   How this works:
   1. We create an iterator named seven (by calling list.__iter__).
   2. We call seven.__enter__
   3. The for loop calls: seven.next() 3 times, and then calls: seven.__iterclose__
      Since the _enter_count is 1, the iterator does not do cleanup yet.
   4. We call seven.next()
   5. We call seven.__exit__. The iterator does its cleanup now.

4. More complicated example:

    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
        for v in seven:
            print v
            if v == 1:
                for v in seven:
                    print 'stolen: ', v
                    if v == 3:
                        break
            if v == 5:
                break
        for v in seven:
            print v * v

   This will print:

    1
    stolen: 2
    stolen: 3
    4
    5
    36
    49

   How this works:
   1. Same as #3 above; cleanup is done by the __exit__.

5. Alternate way of doing #4.

    seven = iter([1, 2, 3, 4, 5, 6, 7])
    for v in seven:
        print v
        if v == 1:
            for v in seven:
                print 'stolen: ', v
                if v == 3:
                    break
        if v == 5:
            break
    for v in seven:
        print v * v
        break               # Different from #4
    seven.__iterclose__()

   This will print:

    1
    stolen: 2
    stolen: 3
    4
    5
    36

   How this works:
   1. We create an iterator named seven.
   2. The for loops all call seven.__iter__, causing _iter_count to increment.
   3. The for loops all call seven.__iterclose__ on exit, decrementing _iter_count.
   4. The user calls the final __iterclose__, which closes the iterator.

   NOTE: Method #5 is NOT recommended; the 'with' syntax is better. However, something like itertools.zip could call __iterclose__ during cleanup.

Change to iterators
===================

All Python iterators would need to add __iterclose__ (possibly with a value of None), __enter__, & __exit__.

Third-party iterators that do not implement __iterclose__ cannot be used in a with clause. A new function could be added to itertools, something like:

    with itertools.with_wrapper(third_party_iterator) as x:
        ...
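A rough sketch of what such a wrapper might look like (purely illustrative; the name 'with_wrapper' and the __iterclose__ protocol itself are hypothetical at this point):

    class with_wrapper:
        # Illustrative sketch of the suggested itertools helper.
        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self._it

        def __enter__(self):
            return self._it

        def __exit__(self, exc_type, exc_value, traceback):
            # Forward the close request if the wrapped iterator supports it.
            iterclose = getattr(type(self._it), '__iterclose__', None)
            if iterclose is not None:
                iterclose(self._it)
            return False
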
The 'with_wrapper' would attempt to call __iterclose__ when its __exit__ function is called. On Wed, Oct 19, 2016 at 12:38 AM, Nathaniel Smith wrote: > Hi all, > > I'd like to propose that Python's iterator protocol be enhanced to add > a first-class notion of completion / cleanup. > > This is mostly motivated by thinking about the issues around async > generators and cleanup. Unfortunately even though PEP 525 was accepted > I found myself unable to stop pondering this, and the more I've > pondered the more convinced I've become that the GC hooks added in PEP > 525 are really not enough, and that we'll regret it if we stick with > them, or at least with them alone :-/. The strategy here is pretty > different -- it's an attempt to dig down and make a fundamental > improvement to the language that fixes a number of long-standing rough > spots, including async generators. > > The basic concept is relatively simple: just adding a '__iterclose__' > method that 'for' loops call upon completion, even if that's via break > or exception. But, the overall issue is fairly complicated + iterators > have a large surface area across the language, so the text below is > pretty long. Mostly I wrote it all out to convince myself that there > wasn't some weird showstopper lurking somewhere :-). For a first pass > discussion, it probably makes sense to mainly focus on whether the > basic concept makes sense? The main rationale is at the top, but the > details are there too for those who want them. > > Also, for *right* now I'm hoping -- probably unreasonably -- to try to > get the async iterator parts of the proposal in ASAP, ideally for > 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal > like this, which I apologize for -- though async generators are > provisional in 3.6, so at least in theory changing them is not out of > the question.) So again, it might make sense to focus especially on > the async parts, which are a pretty small and self-contained part, and > treat the rest of the proposal as a longer-term plan provided for > context. The comparison to PEP 525 GC hooks comes right after the > initial rationale. > > Anyway, I'll be interested to hear what you think! > > -n > > ------------------ > > Abstract > ======== > > We propose to extend the iterator protocol with a new > ``__(a)iterclose__`` slot, which is called automatically on exit from > ``(async) for`` loops, regardless of how they exit. This allows for > convenient, deterministic cleanup of resources held by iterators > without reliance on the garbage collector. This is especially valuable > for asynchronous generators. > > > Note on timing > ============== > > In practical terms, the proposal here is divided into two separate > parts: the handling of async iterators, which should ideally be > implemented ASAP, and the handling of regular iterators, which is a > larger but more relaxed project that can't start until 3.7 at the > earliest. But since the changes are closely related, and we probably > don't want to end up with async iterators and regular iterators > diverging in the long run, it seems useful to look at them together. > > > Background and motivation > ========================= > > Python iterables often hold resources which require cleanup. 
For > example: ``file`` objects need to be closed; the `WSGI spec > `_ adds a ``close`` method > on top of the regular iterator protocol and demands that consumers > call it at the appropriate time (though forgetting to do so is a > `frequent source of bugs > >`_); > and PEP 342 (based on PEP 325) extended generator objects to add a > ``close`` method to allow generators to clean up after themselves. > > Generally, objects that need to clean up after themselves also define > a ``__del__`` method to ensure that this cleanup will happen > eventually, when the object is garbage collected. However, relying on > the garbage collector for cleanup like this causes serious problems in > at least two cases: > > - In Python implementations that do not use reference counting (e.g. > PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet > many situations require *prompt* cleanup of resources. Delayed cleanup > produces problems like crashes due to file descriptor exhaustion, or > WSGI timing middleware that collects bogus times. > > - Async generators (PEP 525) can only perform cleanup under the > supervision of the appropriate coroutine runner. ``__del__`` doesn't > have access to the coroutine runner; indeed, the coroutine runner > might be garbage collected before the generator object. So relying on > the garbage collector is effectively impossible without some kind of > language extension. (PEP 525 does provide such an extension, but it > has a number of limitations that this proposal fixes; see the > "alternatives" section below for discussion.) > > Fortunately, Python provides a standard tool for doing resource > cleanup in a more structured way: ``with`` blocks. For example, this > code opens a file but relies on the garbage collector to close it:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > for document in read_newline_separated_json(path): > ... > > and recent versions of CPython will point this out by issuing a > ``ResourceWarning``, nudging us to fix it by adding a ``with`` block:: > > def read_newline_separated_json(path): > with open(path) as file_handle: # <-- with block > for line in file_handle: > yield json.loads(line) > > for document in read_newline_separated_json(path): # <-- outer for loop > ... > > But there's a subtlety here, caused by the interaction of ``with`` > blocks and generators. ``with`` blocks are Python's main tool for > managing cleanup, and they're a powerful one, because they pin the > lifetime of a resource to the lifetime of a stack frame. But this > assumes that someone will take care of cleaning up the stack frame... > and for generators, this requires that someone ``close`` them. > > In this case, adding the ``with`` block *is* enough to shut up the > ``ResourceWarning``, but this is misleading -- the file object cleanup > here is still dependent on the garbage collector. The ``with`` block > will only be unwound when the ``read_newline_separated_json`` > generator is closed. If the outer ``for`` loop runs to completion then > the cleanup will happen immediately; but if this loop is terminated > early by a ``break`` or an exception, then the ``with`` block won't > fire until the generator object is garbage collected. > > The correct solution requires that all *users* of this API wrap every > ``for`` loop in its own ``with`` block:: > > with closing(read_newline_separated_json(path)) as genobj: > for document in genobj: > ... 
> > This gets even worse if we consider the idiom of decomposing a complex > pipeline into multiple nested generators:: > > def read_users(path): > with closing(read_newline_separated_json(path)) as gen: > for document in gen: > yield User.from_json(document) > > def users_in_group(path, group): > with closing(read_users(path)) as gen: > for user in gen: > if user.group == group: > yield user > > In general if you have N nested generators then you need N+1 ``with`` > blocks to clean up 1 file. And good defensive programming would > suggest that any time we use a generator, we should assume the > possibility that there could be at least one ``with`` block somewhere > in its (potentially transitive) call stack, either now or in the > future, and thus always wrap it in a ``with``. But in practice, > basically nobody does this, because programmers would rather write > buggy code than tiresome repetitive code. In simple cases like this > there are some workarounds that good Python developers know (e.g. in > this simple case it would be idiomatic to pass in a file handle > instead of a path and move the resource management to the top level), > but in general we cannot avoid the use of ``with``/``finally`` inside > of generators, and thus dealing with this problem one way or another. > When beauty and correctness fight then beauty tends to win, so it's > important to make correct code beautiful. > > Still, is this worth fixing? Until async generators came along I would > have argued yes, but that it was a low priority, since everyone seems > to be muddling along okay -- but async generators make it much more > urgent. Async generators cannot do cleanup *at all* without some > mechanism for deterministic cleanup that people will actually use, and > async generators are particularly likely to hold resources like file > descriptors. (After all, if they weren't doing I/O, they'd be > generators, not async generators.) So we have to do something, and it > might as well be a comprehensive fix to the underlying problem. And > it's much easier to fix this now when async generators are first > rolling out, then it will be to fix it later. > > The proposal itself is simple in concept: add a ``__(a)iterclose__`` > method to the iterator protocol, and have (async) ``for`` loops call > it when the loop is exited, even if this occurs via ``break`` or > exception unwinding. Effectively, we're taking the current cumbersome > idiom (``with`` block + ``for`` loop) and merging them together into a > fancier ``for``. This may seem non-orthogonal, but makes sense when > you consider that the existence of generators means that ``with`` > blocks actually depend on iterator cleanup to work reliably, plus > experience showing that iterator cleanup is often a desireable feature > in its own right. > > > Alternatives > ============ > > PEP 525 asyncgen hooks > ---------------------- > > PEP 525 proposes a `set of global thread-local hooks managed by new > ``sys.{get/set}_asyncgen_hooks()`` functions > `_, which > allow event loops to integrate with the garbage collector to run > cleanup for async generators. In principle, this proposal and PEP 525 > are complementary, in the same way that ``with`` blocks and > ``__del__`` are complementary: this proposal takes care of ensuring > deterministic cleanup in most cases, while PEP 525's GC hooks clean up > anything that gets missed. 
But ``__aiterclose__`` provides a number of > advantages over GC hooks alone: > > - The GC hook semantics aren't part of the abstract async iterator > protocol, but are instead restricted `specifically to the async > generator concrete type `_. > If you have an async iterator implemented using a class, like:: > > class MyAsyncIterator: > async def __anext__(): > ... > > then you can't refactor this into an async generator without > changing its semantics, and vice-versa. This seems very unpythonic. > (It also leaves open the question of what exactly class-based async > iterators are supposed to do, given that they face exactly the same > cleanup problems as async generators.) ``__aiterclose__``, on the > other hand, is defined at the protocol level, so it's duck-type > friendly and works for all iterators, not just generators. > > - Code that wants to work on non-CPython implementations like PyPy > cannot in general rely on GC for cleanup. Without ``__aiterclose__``, > it's more or less guaranteed that developers who develop and test on > CPython will produce libraries that leak resources when used on PyPy. > Developers who do want to target alternative implementations will > either have to take the defensive approach of wrapping every ``for`` > loop in a ``with`` block, or else carefully audit their code to figure > out which generators might possibly contain cleanup code and add > ``with`` blocks around those only. With ``__aiterclose__``, writing > portable code becomes easy and natural. > > - An important part of building robust software is making sure that > exceptions always propagate correctly without being lost. One of the > most exciting things about async/await compared to traditional > callback-based systems is that instead of requiring manual chaining, > the runtime can now do the heavy lifting of propagating errors, making > it *much* easier to write robust code. But, this beautiful new picture > has one major gap: if we rely on the GC for generator cleanup, then > exceptions raised during cleanup are lost. So, again, with > ``__aiterclose__``, developers who care about this kind of robustness > will either have to take the defensive approach of wrapping every > ``for`` loop in a ``with`` block, or else carefully audit their code > to figure out which generators might possibly contain cleanup code. > ``__aiterclose__`` plugs this hole by performing cleanup in the > caller's context, so writing more robust code becomes the path of > least resistance. > > - The WSGI experience suggests that there exist important > iterator-based APIs that need prompt cleanup and cannot rely on the > GC, even in CPython. For example, consider a hypothetical WSGI-like > API based around async/await and async iterators, where a response > handler is an async generator that takes request headers + an async > iterator over the request body, and yields response headers + the > response body. (This is actually the use case that got me interested > in async generators in the first place, i.e. this isn't hypothetical.) 
> If we follow WSGI in requiring that child iterators must be closed > properly, then without ``__aiterclose__`` the absolute most > minimalistic middleware in our system looks something like:: > > async def noop_middleware(handler, request_header, request_body): > async with aclosing(handler(request_body, request_body)) as aiter: > async for response_item in aiter: > yield response_item > > Arguably in regular code one can get away with skipping the ``with`` > block around ``for`` loops, depending on how confident one is that one > understands the internal implementation of the generator. But here we > have to cope with arbitrary response handlers, so without > ``__aiterclose__``, this ``with`` construction is a mandatory part of > every middleware. > > ``__aiterclose__`` allows us to eliminate the mandatory boilerplate > and an extra level of indentation from every middleware:: > > async def noop_middleware(handler, request_header, request_body): > async for response_item in handler(request_header, request_body): > yield response_item > > So the ``__aiterclose__`` approach provides substantial advantages > over GC hooks. > > This leaves open the question of whether we want a combination of GC > hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since > the vast majority of generators are iterated over using a ``for`` loop > or equivalent, ``__aiterclose__`` handles most situations before the > GC has a chance to get involved. The case where GC hooks provide > additional value is in code that does manual iteration, e.g.:: > > agen = fetch_newline_separated_json_from_url(...) > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > # doesn't do 'await agen.aclose()' > > If we go with the GC-hooks + ``__aiterclose__`` approach, this > generator will eventually be cleaned up by GC calling the generator > ``__del__`` method, which then will use the hooks to call back into > the event loop to run the cleanup code. > > If we go with the no-GC-hooks approach, this generator will eventually > be garbage collected, with the following effects: > > - its ``__del__`` method will issue a warning that the generator was > not closed (similar to the existing "coroutine never awaited" > warning). > > - The underlying resources involved will still be cleaned up, because > the generator frame will still be garbage collected, causing it to > drop references to any file handles or sockets it holds, and then > those objects's ``__del__`` methods will release the actual operating > system resources. > > - But, any cleanup code inside the generator itself (e.g. logging, > buffer flushing) will not get a chance to run. > > The solution here -- as the warning would indicate -- is to fix the > code so that it calls ``__aiterclose__``, e.g. by using a ``with`` > block:: > > async with aclosing(fetch_newline_separated_json_from_url(...)) as > agen: > while True: > document = await type(agen).__anext__(agen) > if document["id"] == needle: > break > > Basically in this approach, the rule would be that if you want to > manually implement the iterator protocol, then it's your > responsibility to implement all of it, and that now includes > ``__(a)iterclose__``. 
> > GC hooks add non-trivial complexity in the form of (a) new global > interpreter state, (b) a somewhat complicated control flow (e.g., > async generator GC always involves resurrection, so the details of PEP > 442 are important), and (c) a new public API in asyncio (``await > loop.shutdown_asyncgens()``) that users have to remember to call at > the appropriate time. (This last point in particular somewhat > undermines the argument that GC hooks provide a safe backup to > guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called > correctly then I *think* it's possible for generators to be silently > discarded without their cleanup code being called; compare this to the > ``__aiterclose__``-only approach where in the worst case we still at > least get a warning printed. This might be fixable.) All this > considered, GC hooks arguably aren't worth it, given that the only > people they help are those who want to manually call ``__anext__`` yet > don't want to manually call ``__aiterclose__``. But Yury disagrees > with me on this :-). And both options are viable. > > > Always inject resources, and do all cleanup at the top level > ------------------------------------------------------------ > > It was suggested on python-dev (XX find link) that a pattern to avoid > these problems is to always pass resources in from above, e.g. > ``read_newline_separated_json`` should take a file object rather than > a path, with cleanup handled at the top level:: > > def read_newline_separated_json(file_handle): > for line in file_handle: > yield json.loads(line) > > def read_users(file_handle): > for document in read_newline_separated_json(file_handle): > yield User.from_json(document) > > with open(path) as file_handle: > for user in read_users(file_handle): > ... > > This works well in simple cases; here it lets us avoid the "N+1 > ``with`` blocks problem". But unfortunately, it breaks down quickly > when things get more complex. Consider if instead of reading from a > file, our generator was reading from a streaming HTTP GET request -- > while handling redirects and authentication via OAUTH. Then we'd > really want the sockets to be managed down inside our HTTP client > library, not at the top level. Plus there are other cases where > ``finally`` blocks embedded inside generators are important in their > own right: db transaction management, emitting logging information > during cleanup (one of the major motivating use cases for WSGI > ``close``), and so forth. So this is really a workaround for simple > cases, not a general solution. > > > More complex variants of __(a)iterclose__ > ----------------------------------------- > > The semantics of ``__(a)iterclose__`` are somewhat inspired by > ``with`` blocks, but context managers are more powerful: > ``__(a)exit__`` can distinguish between a normal exit versus exception > unwinding, and in the case of an exception it can examine the > exception details and optionally suppress propagation. > ``__(a)iterclose__`` as proposed here does not have these powers, but > one can imagine an alternative design where it did. > > However, this seems like unwarranted complexity: experience suggests > that it's common for iterables to have ``close`` methods, and even to > have ``__exit__`` methods that call ``self.close()``, but I'm not > aware of any common cases that make use of ``__exit__``'s full power. > I also can't think of any examples where this would be useful. 
And it > seems unnecessarily confusing to allow iterators to affect flow > control by swallowing exceptions -- if you're in a situation where you > really want that, then you should probably use a real ``with`` block > anyway. > > > Specification > ============= > > This section describes where we want to eventually end up, though > there are some backwards compatibility issues that mean we can't jump > directly here. A later section describes the transition plan. > > > Guiding principles > ------------------ > > Generally, ``__(a)iterclose__`` implementations should: > > - be idempotent, > - perform any cleanup that is appropriate on the assumption that the > iterator will not be used again after ``__(a)iterclose__`` is called. > In particular, once ``__(a)iterclose__`` has been called then calling > ``__(a)next__`` produces undefined behavior. > > And generally, any code which starts iterating through an iterable > with the intention of exhausting it, should arrange to make sure that > ``__(a)iterclose__`` is eventually called, whether or not the iterator > is actually exhausted. > > > Changes to iteration > -------------------- > > The core proposal is the change in behavior of ``for`` loops. Given > this Python code:: > > for VAR in ITERABLE: > LOOP-BODY > else: > ELSE-BODY > > we desugar to the equivalent of:: > > _iter = iter(ITERABLE) > _iterclose = getattr(type(_iter), "__iterclose__", lambda: None) > try: > traditional-for VAR in _iter: > LOOP-BODY > else: > ELSE-BODY > finally: > _iterclose(_iter) > > where the "traditional-for statement" here is meant as a shorthand for > the classic 3.5-and-earlier ``for`` loop semantics. > > Besides the top-level ``for`` statement, Python also contains several > other places where iterators are consumed. For consistency, these > should call ``__iterclose__`` as well using semantics equivalent to > the above. This includes: > > - ``for`` loops inside comprehensions > - ``*`` unpacking > - functions which accept and fully consume iterables, like > ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and > others. > > > Changes to async iteration > -------------------------- > > We also make the analogous changes to async iteration constructs, > except that the new slot is called ``__aiterclose__``, and it's an > async method that gets ``await``\ed. > > > Modifications to basic iterator types > ------------------------------------- > > Generator objects (including those created by generator comprehensions): > - ``__iterclose__`` calls ``self.close()`` > - ``__del__`` calls ``self.close()`` (same as now), and additionally > issues a ``ResourceWarning`` if the generator wasn't exhausted. This > warning is hidden by default, but can be enabled for those who want to > make sure they aren't inadverdantly relying on CPython-specific GC > semantics. > > Async generator objects (including those created by async generator > comprehensions): > - ``__aiterclose__`` calls ``self.aclose()`` > - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been > called, since this probably indicates a latent bug, similar to the > "coroutine never awaited" warning. > > QUESTION: should file objects implement ``__iterclose__`` to close the > file? On the one hand this would make this change more disruptive; on > the other hand people really like writing ``for line in open(...): > ...``, and if we get used to iterators taking care of their own > cleanup then it might become very weird if files don't. 
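(As a concrete illustration of the guiding principles quoted above -- an idempotent __iterclose__ that releases the iterator's resource -- a user-defined iterator might look roughly like the following. This is only a sketch written against the proposed protocol, not part of the proposal itself:)

    import json

    class JSONLinesIterator:
        # Sketch: an iterator that owns a file and follows the proposed protocol.
        def __init__(self, path):
            self._file = open(path)

        def __iter__(self):
            return self

        def __next__(self):
            line = self._file.readline()
            if not line:
                self.__iterclose__()
                raise StopIteration
            return json.loads(line)

        def __iterclose__(self):
            # Idempotent: safe to call more than once.
            if not self._file.closed:
                self._file.close()
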
> > > New convenience functions > ------------------------- > > The ``itertools`` module gains a new iterator wrapper that can be used > to selectively disable the new ``__iterclose__`` behavior:: > > # QUESTION: I feel like there might be a better name for this one? > class preserve(iterable): > def __init__(self, iterable): > self._it = iter(iterable) > > def __iter__(self): > return self > > def __next__(self): > return next(self._it) > > def __iterclose__(self): > # Swallow __iterclose__ without passing it on > pass > > Example usage (assuming that file objects implements ``__iterclose__``):: > > with open(...) as handle: > # Iterate through the same file twice: > for line in itertools.preserve(handle): > ... > handle.seek(0) > for line in itertools.preserve(handle): > ... > > The ``operator`` module gains two new functions, with semantics > equivalent to the following:: > > def iterclose(it): > if hasattr(type(it), "__iterclose__"): > type(it).__iterclose__(it) > > async def aiterclose(ait): > if hasattr(type(ait), "__aiterclose__"): > await type(ait).__aiterclose__(ait) > > These are particularly useful when implementing the changes in the next > section: > > > __iterclose__ implementations for iterator wrappers > --------------------------------------------------- > > Python ships a number of iterator types that act as wrappers around > other iterators: ``map``, ``zip``, ``itertools.accumulate``, > ``csv.reader``, and others. These iterators should define a > ``__iterclose__`` method which calls ``__iterclose__`` in turn on > their underlying iterators. For example, ``map`` could be implemented > as:: > > class map: > def __init__(self, fn, *iterables): > self._fn = fn > self._iters = [iter(iterable) for iterable in iterables] > > def __iter__(self): > return self > > def __next__(self): > return self._fn(*[next(it) for it in self._iters]) > > def __iterclose__(self): > for it in self._iters: > operator.iterclose(it) > > In some cases this requires some subtlety; for example, > ```itertools.tee`` > `_ > should not call ``__iterclose__`` on the underlying iterator until it > has been called on *all* of the clone iterators. > > > Example / Rationale > ------------------- > > The payoff for all this is that we can now write straightforward code > like:: > > def read_newline_separated_json(path): > for line in open(path): > yield json.loads(line) > > and be confident that the file will receive deterministic cleanup > *without the end-user having to take any special effort*, even in > complex cases. For example, consider this silly pipeline:: > > list(map(lambda key: key.upper(), > doc["key"] for doc in read_newline_separated_json(path))) > > If our file contains a document where ``doc["key"]`` turns out to be > an integer, then the following sequence of events will happen: > > 1. ``key.upper()`` raises an ``AttributeError``, which propagates out > of the ``map`` and triggers the implicit ``finally`` block inside > ``list``. > 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the > map object. > 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator > comprehension object. > 4. This injects a ``GeneratorExit`` exception into the generator > comprehension body, which is currently suspended inside the > comprehension's ``for`` loop body. > 5. 
The exception propagates out of the ``for`` loop, triggering the > ``for`` loop's implicit ``finally`` block, which calls > ``__iterclose__`` on the generator object representing the call to > ``read_newline_separated_json``. > 6. This injects an inner ``GeneratorExit`` exception into the body of > ``read_newline_separated_json``, currently suspended at the ``yield``. > 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop, > triggering the ``for`` loop's implicit ``finally`` block, which calls > ``__iterclose__()`` on the file object. > 8. The file object is closed. > 9. The inner ``GeneratorExit`` resumes propagating, hits the boundary > of the generator function, and causes > ``read_newline_separated_json``'s ``__iterclose__()`` method to return > successfully. > 10. Control returns to the generator comprehension body, and the outer > ``GeneratorExit`` continues propagating, allowing the comprehension's > ``__iterclose__()`` to return successfully. > 11. The rest of the ``__iterclose__()`` calls unwind without incident, > back into the body of ``list``. > 12. The original ``AttributeError`` resumes propagating. > > (The details above assume that we implement ``file.__iterclose__``; if > not then add a ``with`` block to ``read_newline_separated_json`` and > essentially the same logic goes through.) > > Of course, from the user's point of view, this can be simplified down to > just: > > 1. ``int.upper()`` raises an ``AttributeError`` > 1. The file object is closed. > 2. The ``AttributeError`` propagates out of ``list`` > > So we've accomplished our goal of making this "just work" without the > user having to think about it. > > > Transition plan > =============== > > While the majority of existing ``for`` loops will continue to produce > identical results, the proposed changes will produce > backwards-incompatible behavior in some cases. Example:: > > def read_csv_with_header(lines_iterable): > lines_iterator = iter(lines_iterable) > for line in lines_iterator: > column_names = line.strip().split("\t") > break > for line in lines_iterator: > values = line.strip().split("\t") > record = dict(zip(column_names, values)) > yield record > > This code used to be correct, but after this proposal is implemented > will require an ``itertools.preserve`` call added to the first ``for`` > loop. > > [QUESTION: currently, if you close a generator and then try to iterate > over it then it just raises ``Stop(Async)Iteration``, so code the > passes the same generator object to multiple ``for`` loops but forgets > to use ``itertools.preserve`` won't see an obvious error -- the second > ``for`` loop will just exit immediately. Perhaps it would be better if > iterating a closed generator raised a ``RuntimeError``? Note that > files don't have this problem -- attempting to iterate a closed file > object already raises ``ValueError``.] > > Specifically, the incompatibility happens when all of these factors > come together: > > - The automatic calling of ``__(a)iterclose__`` is enabled > - The iterable did not previously define ``__(a)iterclose__`` > - The iterable does now define ``__(a)iterclose__`` > - The iterable is re-used after the ``for`` loop exits > > So the problem is how to manage this transition, and those are the > levers we have to work with. 
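(For the ``read_csv_with_header`` example above, the migration would presumably be a one-line change -- wrapping only the first loop so that the ``break`` there no longer closes the shared iterator. A sketch, assuming ``itertools.preserve`` exists as proposed:)

    def read_csv_with_header(lines_iterable):
        lines_iterator = iter(lines_iterable)
        # Wrapped so that breaking out of this loop leaves the iterator open.
        for line in itertools.preserve(lines_iterator):
            column_names = line.strip().split("\t")
            break
        for line in lines_iterator:
            values = line.strip().split("\t")
            record = dict(zip(column_names, values))
            yield record
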
> > First, observe that the only async iterables where we propose to add > ``__aiterclose__`` are async generators, and there is currently no > existing code using async generators (though this will start changing > very soon), so the async changes do not produce any backwards > incompatibilities. (There is existing code using async iterators, but > using the new async for loop on an old async iterator is harmless, > because old async iterators don't have ``__aiterclose__``.) In > addition, PEP 525 was accepted on a provisional basis, and async > generators are by far the biggest beneficiary of this PEP's proposed > changes. Therefore, I think we should strongly consider enabling > ``__aiterclose__`` for ``async for`` loops and async generators ASAP, > ideally for 3.6.0 or 3.6.1. > > For the non-async world, things are harder, but here's a potential > transition path: > > In 3.7: > > Our goal is that existing unsafe code will start emitting warnings, > while those who want to opt-in to the future can do that immediately: > > - We immediately add all the ``__iterclose__`` methods described above. > - If ``from __future__ import iterclose`` is in effect, then ``for`` > loops and ``*`` unpacking call ``__iterclose__`` as specified above. > - If the future is *not* enabled, then ``for`` loops and ``*`` > unpacking do *not* call ``__iterclose__``. But they do call some other > method instead, e.g. ``__iterclose_warning__``. > - Similarly, functions like ``list`` use stack introspection (!!) to > check whether their direct caller has ``__future__.iterclose`` > enabled, and use this to decide whether to call ``__iterclose__`` or > ``__iterclose_warning__``. > - For all the wrapper iterators, we also add ``__iterclose_warning__`` > methods that forward to the ``__iterclose_warning__`` method of the > underlying iterator or iterators. > - For generators (and files, if we decide to do that), > ``__iterclose_warning__`` is defined to set an internal flag, and > other methods on the object are modified to check for this flag. If > they find the flag set, they issue a ``PendingDeprecationWarning`` to > inform the user that in the future this sequence would have led to a > use-after-close situation and the user should use ``preserve()``. > > In 3.8: > > - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning`` > > In 3.9: > > - Enable the ``__future__`` unconditionally and remove all the > ``__iterclose_warning__`` stuff. > > I believe that this satisfies the normal requirements for this kind of > transition -- opt-in initially, with warnings targeted precisely to > the cases that will be effected, and a long deprecation cycle. > > Probably the most controversial / risky part of this is the use of > stack introspection to make the iterable-consuming functions sensitive > to a ``__future__`` setting, though I haven't thought of any situation > where it would actually go wrong yet... > > > Acknowledgements > ================ > > Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for > helpful discussion on earlier versions of this idea. > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Fri Oct 21 23:22:04 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 21 Oct 2016 20:22:04 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: <580ADB5C.6000007@stoneleaf.us> On 10/21/2016 03:48 PM, Amit Green wrote: > NOTE: This is my first post to this mailing list, I'm not really sure > how to post a message, so I'm attempting a reply-all. Seems to have worked! :) > I like Nathaniel's idea for __iterclose__. > > I suggest the following changes to deal with a few of the complex issues > he discussed. Your examples are interesting, but they don't seem to address the issue of closing down for loops that are using generators when those loops exit early: ----------------------------- def some_work(): with some_resource(): for widget in resource: yield widget for pane in some_work(): break: # what happens here? ----------------------------- How does your solution deal with that situation? Or are you saying that this would be closed with your modifications, and if I didn't want the generator to be closed I would have to do: ----------------------------- with some_work() as temp_gen: for pane in temp_gen: break: for another_pane in temp_gen: # temp_gen is still alive here ----------------------------- In other words, instead using the preserve() function, we would use a with statement? -- ~Ethan~ From njs at pobox.com Fri Oct 21 23:45:43 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 21 Oct 2016 20:45:43 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On Fri, Oct 21, 2016 at 3:48 PM, Amit Green wrote: > NOTE: This is my first post to this mailing list, I'm not really sure > how to post a message, so I'm attempting a reply-all. > > I like Nathaniel's idea for __iterclose__. > > I suggest the following changes to deal with a few of the complex issues > he discussed. > > 1. Missing __iterclose__, or a value of none, works as before, > no changes. > > 2. An iterator can be used in one of three ways: > > A. 'for' loop, which will call __iterclose__ when it exits > > B. User controlled, in which case the user is responsible to use the > iterator inside a with statement. > > C. Old style. The user is responsible for calling __iterclose__ > > 3. An iterator keeps track of __iter__ calls, this allows it to know > when to cleanup. > > > The two key additions, above, are: > > #2B. User can use iterator with __enter__ & __exit cleanly. > > #3. By tracking __iter__ calls, it makes complex user cases easier > to handle. These are interesting ideas! A few general comments: - I don't think we want the "don't bother to call __iterclose__ on exhaustion" functionality --it's actually useful to be able to distinguish between # closes file_handle for line in file_handle: ... and # leaves file_handle open for line in preserve(file_handle): ... To be able to distinguish these cases, it's important that the 'for' loop always call __iterclose__ (which preserve() might then cancel out). - I think it'd be practically difficult and maybe too much magic to add __enter__/__exit__/nesting-depth counts to every iterator implementation. But, the idea of using a context manager for repeated partial iteration is a great idea :-). How's this for a simplified version that still covers the main use cases? 
@contextmanager def reuse_then_close(it): # TODO: come up with a better name it = iter(it) try: yield preserve(it) finally: iterclose(it) with itertools.reuse_then_close(some_generator(...)) as it: for obj in it: ... # still open here, because our reference to the iterator is wrapped in preserve(...) for obj in it: ... # but then closed here, by the 'with' block -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Sat Oct 22 00:25:28 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 21 Oct 2016 21:25:28 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <20161021102901.GJ22471@ando.pearwood.info> References: <74ca605c-8775-72fc-b0e8-7f7bcc396df4@gmail.com> <20161021102901.GJ22471@ando.pearwood.info> Message-ID: On Fri, Oct 21, 2016 at 3:29 AM, Steven D'Aprano wrote: > As for the amount of good, this proposal originally came from PyPy. Just to be clear, I'm not a PyPy dev, and the PyPy devs' contribution here was mostly to look over a draft I circulated and to agree that it seemed like something that'd be useful to them. -n -- Nathaniel J. Smith -- https://vorpus.org From rainventions at gmail.com Sat Oct 22 01:17:58 2016 From: rainventions at gmail.com (Ryan Birmingham) Date: Sat, 22 Oct 2016 01:17:58 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython Message-ID: Hello everyone, I want to start small and ask about smart/curly quote marks (? vs "). Although most languages do not support these characters as quotation marks, I believe that cPython should, if possible. I'm willing to write the patch, of course, but I wanted to ask about this change, if it has come up before, and if there are any compatibility issues that I'm not seeing here. Thank you, -Ryan Birmingham -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sat Oct 22 01:34:51 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 21 Oct 2016 22:34:51 -0700 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: Message-ID: <580AFA7B.6020603@stoneleaf.us> On 10/21/2016 10:17 PM, Ryan Birmingham wrote: > I want to start small and ask about smart/curly quote marks (? vs "). > Although most languages do not support these characters as quotation > marks, I believe that cPython should, if possible. I'm willing to write > the patch, of course, but I wanted to ask about this change, if it has > come up before, and if there are any compatibility issues that I'm not > seeing here. What is the advantage of supporting them? New behavior, or just more possible quotes characters? -- ~Ethan~ From rainventions at gmail.com Sat Oct 22 01:45:25 2016 From: rainventions at gmail.com (Ryan Birmingham) Date: Sat, 22 Oct 2016 01:45:25 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <580AFA7B.6020603@stoneleaf.us> References: <580AFA7B.6020603@stoneleaf.us> Message-ID: I was thinking of using them only as possibly quotes characters, as students and beginners seem to have difficulties due to this quote-mismatch error. That OSX has smart quotes enabled by default makes this a worthwhile consideration, in my opinion. -Ryan Birmingham On 22 October 2016 at 01:34, Ethan Furman wrote: > On 10/21/2016 10:17 PM, Ryan Birmingham wrote: > > I want to start small and ask about smart/curly quote marks (? vs "). >> Although most languages do not support these characters as quotation >> marks, I believe that cPython should, if possible. 
I'm willing to write >> the patch, of course, but I wanted to ask about this change, if it has >> come up before, and if there are any compatibility issues that I'm not >> seeing here. >> > > What is the advantage of supporting them? New behavior, or just more > possible quotes characters? > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcgoble3 at gmail.com Sat Oct 22 02:13:35 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Sat, 22 Oct 2016 06:13:35 +0000 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <580AFA7B.6020603@stoneleaf.us> Message-ID: Interesting idea. +1 from me; probably can be as simple as just having the tokenizer interpret curly quotes as the ASCII (straight) version of itself (in other words, " and the two curly versions of that would all produce the same token, and same for single quotes, eliminating any need for additional changes further down the chain). This would help with copying and pasting code snippets from a source that may have auto-formatted the quotes without the original author realizing it. On Sat, Oct 22, 2016 at 1:46 AM Ryan Birmingham wrote: > I was thinking of using them only as possibly quotes characters, as > students and beginners seem to have difficulties due to this quote-mismatch > error. That OSX has smart quotes enabled by default makes this a worthwhile > consideration, in my opinion. > > -Ryan Birmingham > > On 22 October 2016 at 01:34, Ethan Furman wrote: > > On 10/21/2016 10:17 PM, Ryan Birmingham wrote: > > I want to start small and ask about smart/curly quote marks (? vs "). > Although most languages do not support these characters as quotation > marks, I believe that cPython should, if possible. I'm willing to write > the patch, of course, but I wanted to ask about this change, if it has > come up before, and if there are any compatibility issues that I'm not > seeing here. > > > What is the advantage of supporting them? New behavior, or just more > possible quotes characters? > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Oct 22 02:35:14 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 22 Oct 2016 17:35:14 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: Message-ID: <20161022063513.GN22471@ando.pearwood.info> On Sat, Oct 22, 2016 at 01:17:58AM -0400, Ryan Birmingham wrote: > Hello everyone, > > I want to start small and ask about smart/curly quote marks (? vs "). Which curly quotes are you going to support? There's Dutch, of course: ??? ??? But how about ? ? - English ??? ??? - French ? ? ? ??? - Swiss ??? ??? - Hebrew ??? ??? - Hungarian ??? ??? - Icelandic ??? ??? - Japanese ??? ??? - Polish ??? ??? ??? - Swedish ??? ??? ??? ??? to mention only a few. 
I think it would be unfair to all the non-Dutch programmers if we only supported Dutch quotation marks, but as you can see, supporting the full range of internationalised curly quotes is difficult. > Although most languages do not support these characters as quotation marks, > I believe that cPython should, if possible. You say "most" -- do you know which programming languages support typographical quotation marks for strings? It would be good to see a survey of which languages support this feature, and how they cope with the internationalisation problem. I think this is likely to be just too hard. There's a reason why programming has standardized on the lowest common denominator for quotation marks '' "" and occasionally `` as well. -- Steve From rainventions at gmail.com Sat Oct 22 02:49:09 2016 From: rainventions at gmail.com (Ryan Birmingham) Date: Sat, 22 Oct 2016 02:49:09 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <20161022063513.GN22471@ando.pearwood.info> References: <20161022063513.GN22471@ando.pearwood.info> Message-ID: The quotes I intended in this email are just “ ‘ ” , and ’ where the encoding is appropriate. Internationalization was not the intent of this. I do believe that you have a good point with supporting common quotes in other languages, but I believe that such a change would be large enough to consider a PEP. I am aware that there are other unicode characters, even in English with the Quotation_Mark character property, but this proposed change aims to solve the problem caused when editors, mail clients, web browsers, and operating systems over-zealously replacing straight quotes with these typographical characters. -Ryan Birmingham On 22 October 2016 at 02:35, Steven D'Aprano wrote: > On Sat, Oct 22, 2016 at 01:17:58AM -0400, Ryan Birmingham wrote: > > Hello everyone, > > > > I want to start small and ask about smart/curly quote marks (? vs "). > > Which curly quotes are you going to support? There's Dutch, of course: > > ??? ??? > > But how about ? ? > > - English ??? ??? > > - French ? ? ? ??? > > - Swiss ??? ??? > > - Hebrew ??? ??? > > - Hungarian ??? ??? > > - Icelandic ??? ??? > > - Japanese ??? ??? > > - Polish ??? ??? ??? > > - Swedish ??? ??? ??? ??? > > to mention only a few. I think it would be unfair to all the non-Dutch > programmers if we only supported Dutch quotation marks, but as you can > see, supporting the full range of internationalised curly quotes is > difficult. > > > > Although most languages do not support these characters as quotation > marks, > > I believe that cPython should, if possible. > > You say "most" -- do you know which programming languages support > typographical quotation marks for strings? It would be good to see a > survey of which languages support this feature, and how they cope with > the internationalisation problem. > > I think this is likely to be just too hard. There's a reason why > programming has standardized on the lowest common denominator for > quotation marks '' "" and occasionally `` as well. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Sat Oct 22 03:16:37 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 22 Oct 2016 18:16:37 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <580AFA7B.6020603@stoneleaf.us> Message-ID: <20161022071637.GO22471@ando.pearwood.info> On Sat, Oct 22, 2016 at 06:13:35AM +0000, Jonathan Goble wrote: > Interesting idea. +1 from me; probably can be as simple as just having the > tokenizer interpret curly quotes as the ASCII (straight) version of itself > (in other words, " and the two curly versions of that would all produce the > same token, and same for single quotes, eliminating any need for additional > changes further down the chain). There's a lot more than two. At least nineteen (including the ASCII ones): ?????"'???????????? > This would help with copying and pasting > code snippets from a source that may have auto-formatted the quotes without > the original author realizing it. Personally, I think that we should not encourage programmers to take a lazy, slap-dash attitude to coding. Precision is important to programmers, and there is no limit to how imprecise users can be. Should we also guard against people accidentally using prime marks or ornaments (dingbats): ?????? ?????? as well? If not, what makes them different from other accidents of careless programmers? I don't think we should be trying to guess what programmers mean, nor do I think that we should be encouraging programmers to use word processors for coding. Use the right tool for the right job, and even Notepad is better for the occasional programmer than Microsoft Office or LibreOffice. Programming is hard, requiring precision and care, and we don't do beginners any favours by making it easy for them to be imprecise and careless. I would be happy to see improved error messages for smart quotes: py> s = ?abcd? File "", line 1 s = ?abcd? ^ SyntaxError: invalid character in identifier (especially in IDLE), but I'm very dubious about the idea of using typographical quote marks for strings. At the very least, Python should not lead the way here. Let some other language experiment with this first, and see what happens. Python is a mature, established language, not an experimental language. Of course, there's nothing wrong with doing an experimental branch of Python supporting this feature, to see what happens. But that doesn't mean we should impose it as an official language rule. -- Steve From rosuav at gmail.com Sat Oct 22 03:17:54 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 22 Oct 2016 18:17:54 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> Message-ID: On Sat, Oct 22, 2016 at 5:49 PM, Ryan Birmingham wrote: > this proposed change aims to solve the problem caused when editors, mail > clients, web browsers, and operating systems over-zealously replacing > straight quotes with these typographical characters. > A programming editor shouldn't mangle your quotes, and a word processor sucks for editing code anyway, so I'd rule those out. When does an operating system change your quotes? It's really just mail and web where these kinds of issues happen. Any web site that's actually designed for code is, like a programmer's editor, going to be quote-safe; and it's not hard to configure a mail client to not mess with you. How strong is this use-case, really? 
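(If someone really does get bitten by a paste from a word processor, a few lines of Python outside the language are enough to clean it up. A rough, purely illustrative sketch, covering only the common English typographic marks:

    # Hypothetical helper: map the usual "smart" quotes back to ASCII
    # before feeding pasted text to the interpreter.
    CURLY_TO_ASCII = str.maketrans({
        '\u2018': "'", '\u2019': "'",   # single curly quotes
        '\u201c': '"', '\u201d': '"',   # double curly quotes
    })

    def normalise_pasted_code(text):
        return text.translate(CURLY_TO_ASCII)

That seems like a better home for this kind of fix-up than the tokenizer itself.)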
ChrisA From rainventions at gmail.com Sat Oct 22 03:36:16 2016 From: rainventions at gmail.com (Ryan Birmingham) Date: Sat, 22 Oct 2016 03:36:16 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <20161022071637.GO22471@ando.pearwood.info> References: <580AFA7B.6020603@stoneleaf.us> <20161022071637.GO22471@ando.pearwood.info> Message-ID: Per the comments in this thread, I believe that a better error message for this case would be a reasonable way to fix the use case around this issue. It can be difficult to notice that your quotes are curved if you don't know that's what you're looking for. -Ryan Birmingham On 22 October 2016 at 03:16, Steven D'Aprano wrote: > On Sat, Oct 22, 2016 at 06:13:35AM +0000, Jonathan Goble wrote: > > Interesting idea. +1 from me; probably can be as simple as just having > the > > tokenizer interpret curly quotes as the ASCII (straight) version of > itself > > (in other words, " and the two curly versions of that would all produce > the > > same token, and same for single quotes, eliminating any need for > additional > > changes further down the chain). > > There's a lot more than two. At least nineteen (including the ASCII > ones): ?????"'???????????? > > > > This would help with copying and pasting > > code snippets from a source that may have auto-formatted the quotes > without > > the original author realizing it. > > Personally, I think that we should not encourage programmers to take a > lazy, slap-dash attitude to coding. Precision is important to > programmers, and there is no limit to how imprecise users can be. Should > we also guard against people accidentally using prime marks or ornaments > (dingbats): > > ?????? ?????? > > as well? If not, what makes them different from other accidents of > careless programmers? > > I don't think we should be trying to guess what programmers mean, nor do > I think that we should be encouraging programmers to use word processors > for coding. Use the right tool for the right job, and even Notepad is > better for the occasional programmer than Microsoft Office or > LibreOffice. Programming is hard, requiring precision and care, and we > don't do beginners any favours by making it easy for them to be > imprecise and careless. > > I would be happy to see improved error messages for smart quotes: > > py> s = ?abcd? > File "", line 1 > s = ?abcd? > ^ > SyntaxError: invalid character in identifier > > (especially in IDLE), but I'm very dubious about the idea of using > typographical quote marks for strings. At the very least, Python should > not lead the way here. Let some other language experiment with this > first, and see what happens. Python is a mature, established language, > not an experimental language. > > Of course, there's nothing wrong with doing an experimental branch of > Python supporting this feature, to see what happens. But that doesn't > mean we should impose it as an official language rule. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at 2sn.net Sat Oct 22 03:50:52 2016 From: python at 2sn.net (Alexander Heger) Date: Sat, 22 Oct 2016 18:50:52 +1100 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: <260ae941-347f-2de5-e0dd-ce93b2eea845@mail.de> References: <20161013165546.GB22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> <580956A3.6030205@canterbury.ac.nz> <260ae941-347f-2de5-e0dd-ce93b2eea845@mail.de> Message-ID: > > >> For me the current behaviour does not seem unreasonable as it resembles >>> the order in which you write out loops outside a comprehension >>> >> >> That's true, but the main reason for having comprehensions >> syntax in the first place is so that it can be read >> declaratively -- as a description of the list you want, >> rather than a step-by-step sequence of instructions for >> building it up. >> >> If you have to stop and mentally transform it into nested >> for-statements, that very purpose is undermined. >> > Exactly. Well, an argument that was often brought up on this forum is that Python should do things consistently, and not in one way in one place and in another way in another place, for the same thing. Here it is about the order of loop execution. The current behaviour in comprehension is that is ts being done the same way as in nested for loops. Which is easy enough to remember. Same way, everywhere. -Alexander -------------- next part -------------- An HTML attachment was scrubbed... URL: From simonmarkholland at gmail.com Sat Oct 22 04:34:23 2016 From: simonmarkholland at gmail.com (Simon Mark Holland) Date: Sat, 22 Oct 2016 15:34:23 +0700 Subject: [Python-ideas] Easily remove characters from a string. Message-ID: Having researched this as heavily as I am capable with limited experience, I would like to suggest a Python 3 equivalent to string.translate() that doesn't require a table as input. Maybe in the form of str.stripall() or str.replaceall(). My reasoning is that while it is currently possible to easily strip() preceding and trailing characters, and even replace() individual characters from a string, to replace more than one characters from anywhere within the string requires (i believe) at its simplest a command like this : some_string.translate(str.maketrans('','','0123456789')) In Python 2.* however we could say ... some_string.translate(None, '0123456789') My proposal is that if strip() and replace() are important enough to receive modules, then the arguably more common operation (in terms of programming tutorials, if not mainstream development) of just removing all instances of specified numbers, punctuation, or even letters etc from a list of characters should also. I wholeheartedly admit that there are MANY other ways to do this (including RegEx and List Comprehensions), as listed in the StackOverflow answer below. However the same could be said for replace() and strip(). 
http://stackoverflow.com/questions/22187233/how-to-delete-all-instances-of-a-character-in-a-string-in-python This is my first suggestion and welcome any and all feedback, even if this is a silly idea I would really like to know why it is. I have not seen discussion of this before, but if there is such a discussion I would welcome being directed to it. Thank you for your time. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sat Oct 22 07:09:38 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 22 Oct 2016 12:09:38 +0100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> Message-ID: On 22 October 2016 at 08:17, Chris Angelico wrote: > On Sat, Oct 22, 2016 at 5:49 PM, Ryan Birmingham wrote: >> this proposed change aims to solve the problem caused when editors, mail >> clients, web browsers, and operating systems over-zealously replacing >> straight quotes with these typographical characters. >> > > A programming editor shouldn't mangle your quotes, and a word > processor sucks for editing code anyway, so I'd rule those out. When > does an operating system change your quotes? It's really just mail and > web where these kinds of issues happen. Any web site that's actually > designed for code is, like a programmer's editor, going to be > quote-safe; and it's not hard to configure a mail client to not mess > with you. > > How strong is this use-case, really? While I agree that it's important for new programmers to learn precision, there are a lot of environments where smart quotes get accidentally inserted into code. * Pasting code into MS Word documents for reference (even if you then format the code as visibly code, the smart quote translation has already happened). That's remarkably common in the sorts of environments I deal in, where code gets quoted in documents, and then later copied out to be reused. * Tutorial/example material prepared by non-programmers, again using tools that are too "helpful" in auto-converting to smart quotes. So in my experience this problem is pretty common. However, I view it as a chance to teach correct use of quotes in programming, rather than something to gloss over or "do what I mean" with. -1 from me. Paul From rosuav at gmail.com Sat Oct 22 07:19:22 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 22 Oct 2016 22:19:22 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> Message-ID: On Sat, Oct 22, 2016 at 10:09 PM, Paul Moore wrote: > While I agree that it's important for new programmers to learn > precision, there are a lot of environments where smart quotes get > accidentally inserted into code. > > * Pasting code into MS Word documents for reference (even if you then > format the code as visibly code, the smart quote translation has > already happened). That's remarkably common in the sorts of > environments I deal in, where code gets quoted in documents, and then > later copied out to be reused. One of my students remarked that she had a lot of trouble trying to maintain a notes file, because she couldn't decide whether to use a word processor (with a spell checker) or a code editor (with automatic indentation and syntax highlighting). 
Still, I think the solution would be to have code editors grow facilities for working with text, rather than word processors grow facilities for working with code, or programming languages grow features for coping with word processors. > * Tutorial/example material prepared by non-programmers, again using > tools that are too "helpful" in auto-converting to smart quotes. Definite learning moment for the person preparing the tutorial. If you were writing a tutorial for Russian speakers and just wrote everything using the Latin alphabet, nobody would say "we should teach Russian people to use the alphabet that my editor uses"; code has its own rules, and if you're writing about code, you should learn how to write it appropriately. > So in my experience this problem is pretty common. However, I view it > as a chance to teach correct use of quotes in programming, rather than > something to gloss over or "do what I mean" with. > > -1 from me. Agreed. Maybe the upshot of this will be a python-list thread recommending some editors that handle both code and screed well - that would be a worthwhile thread IMO. ChrisA From srkunze at mail.de Sat Oct 22 12:01:11 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 22 Oct 2016 18:01:11 +0200 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> <580956A3.6030205@canterbury.ac.nz> <260ae941-347f-2de5-e0dd-ce93b2eea845@mail.de> Message-ID: On 22.10.2016 09:50, Alexander Heger wrote: > Well, an argument that was often brought up on this forum is that > Python should do things consistently, and not in one way in one place > and in another way in another place, for the same thing. Like * in list displays? ;-) > Here it is about the order of loop execution. The current behaviour > in comprehension is that is ts being done the same way as in nested > for loops. It still would. Just read it from right to left. The order stays the same. > Which is easy enough to remember. Same way, everywhere. I am sorry but many disagree with you on this thread. I still don't understand why the order needs to be one-way anyway. Comprehensions are a declarative construct, so it should be possible to mix those "for .. in .."s up in an arbitrary order. Cheers, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 22 12:02:50 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Oct 2016 02:02:50 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On 20 October 2016 at 07:02, Nathaniel Smith wrote: > The first change is to replace the outer for loop with a while/pop > loop, so that if an exception occurs we'll know which iterables remain > to be processed: > > def chain(*iterables): > try: > while iterables: > for element in iterables.pop(0): > yield element > ... > > Now, what do we do if an exception does occur? We need to call > iterclose on all of the remaining iterables, but the tricky bit is > that this might itself raise new exceptions. 
If this happens, we don't > want to abort early; instead, we want to continue until we've closed > all the iterables, and then raise a chained exception. Basically what > we want is: > > def chain(*iterables): > try: > while iterables: > for element in iterables.pop(0): > yield element > finally: > try: > operators.iterclose(iter(iterables[0])) > finally: > try: > operators.iterclose(iter(iterables[1])) > finally: > try: > operators.iterclose(iter(iterables[2])) > finally: > ... > > but of course that's not valid syntax. Fortunately, it's not too hard > to rewrite that into real Python -- but it's a little dense: > > def chain(*iterables): > try: > while iterables: > for element in iterables.pop(0): > yield element > # This is equivalent to the nested-finally chain above: > except BaseException as last_exc: > for iterable in iterables: > try: > operators.iterclose(iter(iterable)) > except BaseException as new_exc: > if new_exc.__context__ is None: > new_exc.__context__ = last_exc > last_exc = new_exc > raise last_exc > > It's probably worth wrapping that bottom part into an iterclose_all() > helper, since the pattern probably occurs in other cases as well. > (Actually, now that I think about it, the map() example in the text > should be doing this instead of what it's currently doing... I'll fix > that.) At this point your code is starting to look a whole lot like the code in contextlib.ExitStack.__exit__ :) Accordingly, I'm going to suggest that while I agree the problem you describe is one that genuinely emerges in large production applications and other complex systems, this particular solution is simply far too intrusive to be accepted as a language change for Python - you're talking a fundamental change to the meaning of iteration for the sake of the relatively small portion of the community that either work on such complex services, or insist on writing their code as if it might become part of such a service, even when it currently isn't. Given that simple applications vastly outnumber complex ones, and always will, I think making such a change would be a bad trade-off that didn't come close to justifying the costs imposed on the rest of the ecosystem to adjust to it. A potentially more fruitful direction of research to pursue for 3.7 would be the notion of "frame local resources", where each Python level execution frame implicitly provided a lazily instantiated ExitStack instance (or an equivalent) for resource management. Assuming that it offered an "enter_frame_context" function that mapped to "contextlib.ExitStack.enter_context", such a system would let us do things like: from frame_resources import enter_frame_context def readlines_1(fname): return enter_frame_context(open(fname)).readlines() def readlines_2(fname): return [*enter_frame_context(open(fname))] def readlines_3(fname): return [line for line in enter_frame_context(open(fname))] def iterlines_1(fname): yield from enter_frame_context(open(fname)) def iterlines_2(fname): for line in enter_frame_context(open(fname)): yield line def iterlines_3(fname): f = enter_frame_context(open(fname)) while True: try: yield next(f) except StopIteration: pass to indicate "clean up this file handle when this frame terminates, regardless of the GC implementation used by the interpreter". 
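(As a point of comparison, the observable behaviour of the non-generator cases can be approximated today with an ordinary decorator and contextlib.ExitStack - a rough sketch only, with the helper names invented purely for illustration:

    import contextlib
    import functools

    def uses_frame_resources(func):
        # Close everything registered through the injected callable when
        # the call - our stand-in for the frame - terminates.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with contextlib.ExitStack() as stack:
                return func(stack.enter_context, *args, **kwargs)
        return wrapper

    @uses_frame_resources
    def readlines_with_cleanup(enter_frame_context, fname):
        return enter_frame_context(open(fname)).readlines()

The real proposal differs in that the registration point would be implicit in the frame rather than spelled as a decorator, and it would also need to cover the generator variants above.)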
Such a feature already gets you a long way towards the determinism you want, as frames are already likely to be cleaned up deterministically even in Python implementations that don't use automatic reference counting - the bit that's non-deterministic is cleaning up the local variables referenced *from* those frames. And then further down the track, once such a system had proven its utility, *then* we could talk about expanding the iteration protocol to allow for implicit registration of iterable cleanup functions as frame local resources. With the cleanup functions not firing until the *frame* exits, then the backwards compatibility break would be substantially reduced (for __main__ module code there'd essentially be no compatibility break at all, and similarly for CPython local variables), and the level of impact on language implementations would also be much lower (reduced to supporting the registration of cleanup functions with frame objects, and executing those cleanup functions when the frame terminates) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From srkunze at mail.de Sat Oct 22 12:12:26 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 22 Oct 2016 18:12:26 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <580AFA7B.6020603@stoneleaf.us> <20161022071637.GO22471@ando.pearwood.info> Message-ID: <70296c33-32e0-b046-9faf-7e6dec99c3e5@mail.de> +1 from me for the idea of a more useful error message (if possible). On 22.10.2016 09:36, Ryan Birmingham wrote: > Per the comments in this thread, I believe that a better error message > for this case would be a reasonable way to fix the use case around this > issue. > It can be difficult to notice that your quotes are curved if you don't > know that's what you're looking for. > > -Ryan Birmingham > > On 22 October 2016 at 03:16, Steven D'Aprano > wrote: > > On Sat, Oct 22, 2016 at 06:13:35AM +0000, Jonathan Goble wrote: > > Interesting idea. +1 from me; probably can be as simple as just having the > > tokenizer interpret curly quotes as the ASCII (straight) version of itself > > (in other words, " and the two curly versions of that would all produce the > > same token, and same for single quotes, eliminating any need for additional > > changes further down the chain). > > There's a lot more than two. At least nineteen (including the ASCII > ones): ?????"'???????????? > > > > This would help with copying and pasting > > code snippets from a source that may have auto-formatted the quotes without > > the original author realizing it. > > Personally, I think that we should not encourage programmers to take a > lazy, slap-dash attitude to coding. Precision is important to > programmers, and there is no limit to how imprecise users can be. Should > we also guard against people accidentally using prime marks or ornaments > (dingbats): > > ?????? ?????? > > as well? If not, what makes them different from other accidents of > careless programmers? > > I don't think we should be trying to guess what programmers mean, nor do > I think that we should be encouraging programmers to use word processors > for coding. Use the right tool for the right job, and even Notepad is > better for the occasional programmer than Microsoft Office or > LibreOffice. Programming is hard, requiring precision and care, and we > don't do beginners any favours by making it easy for them to be > imprecise and careless. 
> > I would be happy to see improved error messages for smart quotes: > > py> s = ?abcd? > File "", line 1 > s = ?abcd? > ^ > SyntaxError: invalid character in identifier > > (especially in IDLE), but I'm very dubious about the idea of using > typographical quote marks for strings. At the very least, Python should > not lead the way here. Let some other language experiment with this > first, and see what happens. Python is a mature, established language, > not an experimental language. > > Of course, there's nothing wrong with doing an experimental branch of > Python supporting this feature, to see what happens. But that doesn't > mean we should impose it as an official language rule. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From ncoghlan at gmail.com Sat Oct 22 12:17:13 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Oct 2016 02:17:13 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: On 22 October 2016 at 06:59, Chris Barker wrote: > And then context managers were introduced. And it seems to be there is a > consensus in the Python community that we all should be using them when > working on files, and I myself have finally started routinely using them, > and teaching newbies to use them -- which is kind of a pain, 'cause I want > to have them do basic file reading stuff before I explain what a "context > manager" is. This is actually a case where style guidelines would ideally differ between between scripting use cases (let the GC handle it whenever, since your process will be terminating soon anyway) and library(/framework/application) development use cases (promptly clean up after yourself, since you don't necessarily know your context of use). However, that script/library distinction isn't well-defined in computing instruction in general, and most published style guides are written by library/framework/application developers, so students and folks doing ad hoc scripting tend to be the recipients of a lot of well-meaning advice that isn't actually appropriate for them :( Cheers, Nick. 
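P.S. In concrete terms, the stylistic split is just the difference between these two purely illustrative snippets:

    # ad hoc scripting style: let interpreter/process exit clean up
    print(open("data.txt").read())

    # library/application style: clean up promptly and deterministically
    def read_data(path):
        with open(path) as f:
            return f.read()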
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From anthony at xtfx.me Sat Oct 22 12:22:11 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Sat, 22 Oct 2016 11:22:11 -0500 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <580206AC.1060203@canterbury.ac.nz> <20161015104839.GT22471@ando.pearwood.info> <5802C054.5020103@canterbury.ac.nz> <20161016010552.GU22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> <580956A3.6030205@canterbury.ac.nz> <260ae941-347f-2de5-e0dd-ce93b2eea845@mail.de> Message-ID: On Oct 22, 2016 2:51 AM, "Alexander Heger" wrote: >>> >>> >>>> For me the current behaviour does not seem unreasonable as it resembles the order in which you write out loops outside a comprehension >>> >>> >>> That's true, but the main reason for having comprehensions >>> syntax in the first place is so that it can be read >>> declaratively -- as a description of the list you want, >>> rather than a step-by-step sequence of instructions for >>> building it up. >>> >>> If you have to stop and mentally transform it into nested >>> for-statements, that very purpose is undermined. >> >> Exactly. > > > Well, an argument that was often brought up on this forum is that Python should do things consistently, and not in one way in one place and in another way in another place, for the same thing. Here it is about the order of loop execution. The current behaviour in comprehension is that is ts being done the same way as in nested for loops. Which is easy enough to remember. Same way, everywhere. A strict interpretation by this logic would also require the [x ...] part to be at the end, like [... x] since that's how it would look in a nested for loop (inside deepest loop). I personally agree with what many others have said, in that comprehension order is not intuitive as is. I still page fault about it after many years of using. Is there a way to move the expression bit to the end in a backcompat way? It might be a completely different syntax though (I think both colons and semicolons were suggested). FWIW, Erlang/Elixir (sorry after 6 years python this is what I do now!) does it the same way as python: >>> [{X, Y} || X <- [1,2,3], Y <- [a,b]]. [{1,a},{1,b},{2,a},{2,b},{3,a},{3,b}] Here X is the outer loop. I think the confusion stems from doing it both ways at the same time. We retain the for loop order but then hoist the expression to the top. Ideally we'd either not do that, or reverse the for loop order. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 22 12:32:12 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Oct 2016 02:32:12 +1000 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <580AFA7B.6020603@stoneleaf.us> <20161022071637.GO22471@ando.pearwood.info> Message-ID: On 22 October 2016 at 17:36, Ryan Birmingham wrote: > Per the comments in this thread, I believe that a better error message for > this case would be a reasonable way to fix the use case around this issue. > It can be difficult to notice that your quotes are curved if you don't know > that's what you're looking for. 
Looking for particular Unicode confusables when post-processing SyntaxErrors seems like a reasonable idea to me - that's how we ended up implementing the heuristic that reports "Missing parenthesis in call to print" when folks attempt to run Python 2 code under Python 3. At the moment, tokenizer and parser errors are some of the most beginner-hostile ones we offer, since we don't have any real context when raising them - it's just a naive algorithm saying "This isn't the text I expected to see next". By contrast, later in the code generation pipeline, we have more information about what the user was trying to do, and can usually offer better errors. What Guido pointed out when I was working on the "print" heuristic is that we actually get a second go at this: the *exception constructor* usually has access to the text that the tokenizer or parser couldn't handle, and since it isn't on the critical performance path for anything, we can afford to invest some time in looking for common kinds of errors and try to nudge folks in a better direction when we think they've tripped over one of them. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dwblas at gmail.com Sat Oct 22 15:45:45 2016 From: dwblas at gmail.com (David B) Date: Sat, 22 Oct 2016 12:45:45 -0700 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: Message-ID: I would use list comprehension even if there were some other way to translate as it is straight forward. On 10/22/16, Simon Mark Holland wrote: > Having researched this as heavily as I am capable with limited experience, > I would like to suggest a Python 3 equivalent to string.translate() that > doesn't require a table as input. Maybe in the form of str.stripall() or > str.replaceall(). > > My reasoning is that while it is currently possible to easily strip() > preceding and trailing characters, and even replace() individual characters > from a string, to replace more than one characters from anywhere within the > string requires (i believe) at its simplest a command like this : > > some_string.translate(str.maketrans('','','0123456789')) > > In Python 2.* however we could say ... > > some_string.translate(None, '0123456789') > > My proposal is that if strip() and replace() are important enough to > receive modules, then the arguably more common operation (in terms of > programming tutorials, if not mainstream development) of just removing all > instances of specified numbers, punctuation, or even letters etc from a > list of characters should also. > > I wholeheartedly admit that there are MANY other ways to do this (including > RegEx and List Comprehensions), as listed in the StackOverflow answer > below. However the same could be said for replace() and strip(). > http://stackoverflow.com/questions/22187233/how-to-delete-all-instances-of-a-character-in-a-string-in-python > > This is my first suggestion and welcome any and all feedback, even if this > is a silly idea I would really like to know why it is. I have not seen > discussion of this before, but if there is such a discussion I would > welcome being directed to it. > > Thank you for your time. > Simon > -- With the simplicity of true nature, there shall be no desire. Without desire, one's original nature will be at peace. And the world will naturally be in accord with the right Way. 
Tao Te Ching From rosuav at gmail.com Sat Oct 22 18:00:14 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 23 Oct 2016 09:00:14 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <580AFA7B.6020603@stoneleaf.us> <20161022071637.GO22471@ando.pearwood.info> Message-ID: On Sun, Oct 23, 2016 at 3:32 AM, Nick Coghlan wrote: > Looking for particular Unicode confusables when post-processing > SyntaxErrors seems like a reasonable idea to me - that's how we ended > up implementing the heuristic that reports "Missing parenthesis in > call to print" when folks attempt to run Python 2 code under Python 3. > +1, big time. There are a few tricks you can easily teach people ("syntax error on line X might actually be on the previous line"), but the more that the language can help with, the better. ChrisA From greg.ewing at canterbury.ac.nz Sat Oct 22 19:47:01 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 23 Oct 2016 12:47:01 +1300 Subject: [Python-ideas] Order of loops in list comprehension In-Reply-To: References: <20161013165546.GB22471@ando.pearwood.info> <580305C3.7000009@canterbury.ac.nz> <1476720706.868186.758552937.4279A8A8@webmail.messagingengine.com> <1476735732.922212.758850089.49DEED7B@webmail.messagingengine.com> <1476759055.2940910.759150673.08556D71@webmail.messagingengine.com> <5805C3E6.9000505@canterbury.ac.nz> <990853e3-922e-1d0e-2c42-2505ca7f97ba@mail.de> <580956A3.6030205@canterbury.ac.nz> <260ae941-347f-2de5-e0dd-ce93b2eea845@mail.de> Message-ID: <580BFA75.9060700@canterbury.ac.nz> C Anthony Risinger wrote: > Erlang/Elixir (sorry after 6 years python this is what I do now!) > does it the same way as python: > > >>> [{X, Y} || X <- [1,2,3], Y <- [a,b]]. > [{1,a},{1,b},{2,a},{2,b},{3,a},{3,b}] > > Here X is the outer loop. > > I think the confusion stems from doing it both ways at the same time. If the semicolon syntax I suggested were available, you'd be able to choose either order, and maybe even mix them in the one comprehension. Not sure if that's a good thing or not... -- Greg From tjreedy at udel.edu Sat Oct 22 22:51:02 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 22 Oct 2016 22:51:02 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <20161022071637.GO22471@ando.pearwood.info> References: <580AFA7B.6020603@stoneleaf.us> <20161022071637.GO22471@ando.pearwood.info> Message-ID: On 10/22/2016 3:16 AM, Steven D'Aprano wrote: > I would be happy to see improved error messages for smart quotes: > py> s = ?abcd? > File "", line 1 > s = ?abcd? > ^ > SyntaxError: invalid character in identifier The above *is* the improved (and regressed) 3.6 version ;-) In 3.5.2 (on Windows): >>> s = ?abcd? File "", line 1 s = `abcd' ^ SyntaxError: invalid syntax (Mangling of the echoed code line is Windows specific.) The improvement is the more specific error message. The regression is the placement of the caret at the end instead of under the initial '?'. To verify that Python is not actually pointing at '?', remove it. >>> s = ?abcd File "", line 1 s = ?abcd ^ SyntaxError: invalid character in identifier (recent 3.6 changes in encodings used on Windows removes code mangling in this echoed line.) > (especially in IDLE), What do you have in mind? Patches would be considered. I will continue this in response to Nick's post about 9 hours ago. 
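(For the record, the kind of check being discussed is small. A purely illustrative sketch, not a patch:

    SMART_QUOTES = '\u2018\u2019\u201c\u201d'

    def smart_quote_hint(err):
        # err is a SyntaxError; err.text is the offending source line.
        if err.text and any(ch in SMART_QUOTES for ch in err.text):
            return err.msg + " - the line contains 'smart' quotes; use ' or \" instead"
        return err.msg

Whether something like that belongs in CPython itself or only in IDLE is a separate question.)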
-- Terry Jan Reedy From tjreedy at udel.edu Sat Oct 22 22:51:50 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 22 Oct 2016 22:51:50 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <580AFA7B.6020603@stoneleaf.us> <20161022071637.GO22471@ando.pearwood.info> Message-ID: On 10/22/2016 12:32 PM, Nick Coghlan wrote: > On 22 October 2016 at 17:36, Ryan Birmingham wrote: >> Per the comments in this thread, I believe that a better error message for >> this case would be a reasonable way to fix the use case around this issue. >> It can be difficult to notice that your quotes are curved if you don't know >> that's what you're looking for. > > Looking for particular Unicode confusables when post-processing > SyntaxErrors seems like a reasonable idea to me - that's how we ended > up implementing the heuristic that reports "Missing parenthesis in > call to print" when folks attempt to run Python 2 code under Python 3. > > At the moment, tokenizer and parser errors are some of the most > beginner-hostile ones we offer, since we don't have any real context > when raising them - it's just a naive algorithm saying "This isn't the > text I expected to see next". By contrast, later in the code > generation pipeline, we have more information about what the user was > trying to do, and can usually offer better errors. > > What Guido pointed out when I was working on the "print" heuristic is > that we actually get a second go at this: the *exception constructor* > usually has access to the text that the tokenizer or parser couldn't > handle, and since it isn't on the critical performance path for > anything, we can afford to invest some time in looking for common > kinds of errors and try to nudge folks in a better direction when we > think they've tripped over one of them. (Continuing my response to Steven saying "improved error messages ... (especially in IDLE)") IDLE compiles()s and exec()s user code within separate try-except blocks, the latter usually being in a separate processes. Runtime tracebacks and exceptions are sent back to IDLE's Shell to be printed just as in a console (except for colorizing). Compile errors are handled differently. Tracebacks are tossed after extracting the file, line, and column (the last from the ^ marker). The latter are used to tag text with a red background. For shell input, the exception is printed normally. For editor input, it is displayed in a messagebox over the editor window. My point is that IDLE already intercepts exceptions and, for SyntaxErrors, does simple modifications (hopefully enhancements) *in Python*. So it could be an easy place to prototype, in Python, more advanced enhancements. Experimental enhancements could be made optional, and could supplement rather than replace the original message. They could also be added and modified in bugfix releases. I will say more about explaining exceptions better in another post. -- Terry Jan Reedy From ncoghlan at gmail.com Sat Oct 22 23:22:54 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Oct 2016 13:22:54 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: On 23 October 2016 at 02:17, Nick Coghlan wrote: > On 22 October 2016 at 06:59, Chris Barker wrote: >> And then context managers were introduced. 
And it seems to be there is a >> consensus in the Python community that we all should be using them when >> working on files, and I myself have finally started routinely using them, >> and teaching newbies to use them -- which is kind of a pain, 'cause I want >> to have them do basic file reading stuff before I explain what a "context >> manager" is. > > This is actually a case where style guidelines would ideally differ > between between scripting use cases (let the GC handle it whenever, > since your process will be terminating soon anyway) and > library(/framework/application) development use cases (promptly clean > up after yourself, since you don't necessarily know your context of > use). > > However, that script/library distinction isn't well-defined in > computing instruction in general, and most published style guides are > written by library/framework/application developers, so students and > folks doing ad hoc scripting tend to be the recipients of a lot of > well-meaning advice that isn't actually appropriate for them :( Pondering this overnight, I realised there's a case where folks using Python primarily as a scripting language can still run into many of the resource management problems that arise in larger applications: IPython notebooks, where the persistent kernel can keep resources alive for a surprisingly long time in the absence of a reference counting GC. Yes, they have the option of just restarting the kernel (which many applications don't have), but it's still a nicer user experience if we can help them avoid having those problems arise in the first place. This is likely mitigated in practice *today* by IPython users mostly being on CPython for access to the Scientific Python stack, but we can easily foresee a future where the PyPy community have worked out enough of their NumPy compatibility and runtime redistribution challenges that it becomes significantly more common to be using notebooks against Python kernels that don't use automatic reference counting. I'm significantly more amenable to that as a rationale for pursuing non-syntactic approaches to local resource management than I am the notion of pursuing it for the sake of high performance application development code. Chris, would you be open to trying a thought experiment with some of your students looking at ways to introduce function-scoped deterministic resource management *before* introducing with statements? 
Specifically, I'm thinking of a progression along the following lines:

    # Cleaned up whenever the interpreter gets around to cleaning up the function locals
    def readlines_with_default_resource_management(fname):
        return open(fname).readlines()

    # Cleaned up on function exit, even if the locals are still referenced from an exception traceback
    # or the interpreter implementation doesn't use a reference counting GC
    from local_resources import function_resource

    def readlines_with_declarative_cleanup(fname):
        return function_resource(open(fname)).readlines()

    # Cleaned up at the end of the with statement
    def readlines_with_imperative_cleanup(fname):
        with open(fname) as f:
            return f.readlines()

The idea here is to change the requirement for new developers from "telling the interpreter what to *do*" (which is the situation we have for context managers) to "telling the interpreter what we *want*" (which is for it to link a managed resource with the lifecycle of the currently running function call, regardless of interpreter implementation details).

Under that model, Inada-san's recent buffer snapshotting proposal would effectively be an optimised version of the one liner:

    def snapshot(data, limit, offset=0):
        return bytes(function_resource(memoryview(data))[offset:limit])

The big refactoring benefit that this feature would offer over with statements is that it doesn't require a structural change to the code - it's just wrapping an existing expression in a new function call that says "clean this up promptly when the function terminates, even if it's still part of a reference cycle, or we're not using a reference counting GC".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From simonmarkholland at gmail.com  Sat Oct 22 23:44:47 2016
From: simonmarkholland at gmail.com (Simon Mark Holland)
Date: Sun, 23 Oct 2016 10:44:47 +0700
Subject: [Python-ideas] Easily remove characters from a string.
In-Reply-To:
References:
Message-ID:

Understood, and I agree. I have seen someone make a similar argument for using RegEx. Here are my main points...

1) Speed - Built-ins are faster.
2) Standardisation - It is a common task that has MANY ways of being completed.
3) Frequent Task - It is to my mind as useful as str.strip() or str.replace() .. perhaps a lesser point ...
4) Batteries Included - In this case Python 3 is more obtuse than Python 2 in a task which often showcases Python's ease of use. (see 'Programming Foundations with Python's' secret message lesson for an example.)

Those on this list are the least likely to want this functionality, because each of us could solve this quickly in many different ways, but that doesn't mean we should. It is the tasks we don't think about that I believe often eat up cycles. Like I said, even if this is a bad idea I would like to fully grok why.

Thank you all for your time.

On 23 October 2016 at 02:45, David B wrote: > I would use list comprehension even if there were some other way to > translate as it is straight forward. > > On 10/22/16, Simon Mark Holland wrote: > > Having researched this as heavily as I am capable with limited > experience, > > I would like to suggest a Python 3 equivalent to string.translate() that > > doesn't require a table as input. Maybe in the form of str.stripall() or > > str.replaceall().
> > > > My reasoning is that while it is currently possible to easily strip() > > preceding and trailing characters, and even replace() individual > characters > > from a string, to replace more than one characters from anywhere within > the > > string requires (i believe) at its simplest a command like this : > > > > some_string.translate(str.maketrans('','','0123456789')) > > > > In Python 2.* however we could say ... > > > > some_string.translate(None, '0123456789') > > > > My proposal is that if strip() and replace() are important enough to > > receive modules, then the arguably more common operation (in terms of > > programming tutorials, if not mainstream development) of just removing > all > > instances of specified numbers, punctuation, or even letters etc from a > > list of characters should also. > > > > I wholeheartedly admit that there are MANY other ways to do this > (including > > RegEx and List Comprehensions), as listed in the StackOverflow answer > > below. However the same could be said for replace() and strip(). > > http://stackoverflow.com/questions/22187233/how-to- > delete-all-instances-of-a-character-in-a-string-in-python > > > > This is my first suggestion and welcome any and all feedback, even if > this > > is a silly idea I would really like to know why it is. I have not seen > > discussion of this before, but if there is such a discussion I would > > welcome being directed to it. > > > > Thank you for your time. > > Simon > > > > > -- > With the simplicity of true nature, there shall be no desire. > Without desire, one's original nature will be at peace. > And the world will naturally be in accord with the right Way. Tao Te Ching > -- Simon Holland BA Hons Medan, Indonesia -------------------- Mobile : +62 81 26055297 Fax : +62 81 6613280 [image: Twitter] [image: LinkedIn] [image: YouTube] [image: Google Talk] -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Sun Oct 23 08:25:03 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 23 Oct 2016 14:25:03 +0200 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: Message-ID: <580CAC1F.8040809@egenix.com> On 22.10.2016 10:34, Simon Mark Holland wrote: > Having researched this as heavily as I am capable with limited experience, > I would like to suggest a Python 3 equivalent to string.translate() that > doesn't require a table as input. Maybe in the form of str.stripall() or > str.replaceall(). > > My reasoning is that while it is currently possible to easily strip() > preceding and trailing characters, and even replace() individual characters > from a string, to replace more than one characters from anywhere within the > string requires (i believe) at its simplest a command like this : > > some_string.translate(str.maketrans('','','0123456789')) > > In Python 2.* however we could say ... > > some_string.translate(None, '0123456789') > > My proposal is that if strip() and replace() are important enough to > receive modules, then the arguably more common operation (in terms of > programming tutorials, if not mainstream development) of just removing all > instances of specified numbers, punctuation, or even letters etc from a > list of characters should also. > > I wholeheartedly admit that there are MANY other ways to do this (including > RegEx and List Comprehensions), as listed in the StackOverflow answer > below. However the same could be said for replace() and strip(). 
> http://stackoverflow.com/questions/22187233/how-to-delete-all-instances-of-a-character-in-a-string-in-python > > This is my first suggestion and welcome any and all feedback, even if this > is a silly idea I would really like to know why it is. I have not seen > discussion of this before, but if there is such a discussion I would > welcome being directed to it. Could you perhaps give a use case for what you have in mind ? I usually go straight to the re module for anything that's non-trivial in terms of string manipulation, or use my mxTextTools for more complex stuff. re.sub() would be the natural choice for replacing multiple chars or removing multiple chars in one go. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 23 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From danilo.bellini at gmail.com Sun Oct 23 10:57:10 2016 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Sun, 23 Oct 2016 12:57:10 -0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions Message-ID: The idea is to let generator expressions and list/set comprehensions have a clean syntax to access its last output. That would allow them to be an alternative syntax to the scan higher-order function [1] (today implemented in the itertools.accumulate function), which leads to an alternative way to write a fold/reduce. It would be nice to have something like: >>> last(abs(prev - x) for x in [3, 4, 5] from prev = 2) 2 instead of a reduce: >>> from functools import reduce >>> reduce(lambda prev, x: abs(prev - x), [3, 4, 5], 2) 2 or an imperative approach: >>> prev = 2 >>> for x in [3, 4, 5]: ... prev = abs(prev - x) >>> prev 2 or getting the last from accumulate: >>> from itertools import accumulate >>> list(accumulate([2, 3, 4, 5], lambda prev, x: abs(prev - x)))[-1] 2 or... >>> [prev for prev in [2] ... for x in [3, 4, 5] ... for prev in [abs(prev - x)] ... ][-1] 2 Actually, I already wrote a solution for something similar to that: PyScanPrev [2]. I'm using bytecode manipulation to modify the generator expression and set/list comprehensions semantics to create a "scan", but it has the limitation of using only code with a valid syntax as the input, so I can't use "from" inside a generator expression / list comprehension. The solution was to put the first output into the iterable and define the "prev" name elsewhere: >>> last(abs(prev - x) for x in [2, 3, 4, 5]) 2 That line works with PyScanPrev (on Python 3.4 and 3.5) when defined in a function with a @enable_scan("prev") decorator. That was enough to create a "test suite" of doctest-based examples that shows several scan use cases [2]. This discussion started in a Brazilian list when someone asked how she could solve a simple uppercase/lowercase problem [3]. The goal was to alternate the upper/lower case of a string while neglecting the chars that doesn't apply (i.e., to "keep the state" when the char isn't a letter). 
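As a baseline, one way to write that alternation today is a scan with itertools.accumulate, something like this sketch (the helper name and the seeding trick are just for illustration):

    from itertools import accumulate, chain

    def alternate_case(text):
        # State is (should_upper, output_char); non-letters keep the state.
        def step(state, char):
            upper, _ = state
            if char.isalpha():
                upper = not upper
                char = char.upper() if upper else char.lower()
            return (upper, char)
        pairs = accumulate(chain([(False, '')], text), step)
        return ''.join(char for _, char in pairs)

It works, but the state threading is hidden inside the step function instead of being visible in the expression itself.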
After the discussion, I wrote the PyScanPrev package, and recently I've added this historical "alternate" function as the "conditional toggling" example [4]. Then I ask, can Python include that "scan" access to the last output in its list/set/dict comprehension and generator expression syntax? There are several possible applications for the scan itself as well as for the fold/reduce (signal processing, control theory, physics, economics, etc.), some of them I included as PyScanPrev examples. Some friends (people who like control engineering and/or signal processing) liked the "State-space model" example, where I included a "leaking bucket-spring-damper" simulation using the scan-enabled generator expressions [5]. About the syntax, there are several ideas on how that can be written. Given a "prev" identifier, a "target" identifier, an input "iterable" and an optional "start" value (and perhaps an optional "echo_start", which I assume True by default), some of them are: [func(prev, target) for target in iterable from prev = start] [func(prev, target) for target in iterable] -> prev = start [func(prev, target) for target in iterable] -> prev as start [func(prev, target) for target in iterable] from prev = start [func(prev, target) for target in iterable] from prev as start [func(prev, target) for target in iterable] with prev as start prev = start -> [func(prev, target) for target in iterable] prev(start) -> [func(prev, target) for target in iterable] [func(prev, target) for prev -> target in start -> iterable] [prev = start -> func(prev, target) for target in iterable] # With ``start`` being the first value of the iterable, i.e., # iterable = prepend(start, data) [func(prev, target) for target in iterable from prev] [func(prev, target) for target in iterable] -> prev [func(prev, target) for target in iterable] from prev prev -> [func(prev, target) for target in iterable] Before writing PyScanPrev, in [6] (Brazilian Portuguese) I used stackfull [7] to implement that idea, an accumulator example using that library is: >>> from stackfull import push, pop, stack >>> [push(pop() + el if stack() else el) for el in range(5)] [0, 1, 3, 6, 10] >>> list(itertools.accumulate(range(5))) [0, 1, 3, 6, 10] There are more I can say (e.g. the pyscanprev.scan function has a "start" value and an "echo_start" keyword argument, resources I missed in itertools.accumulate) but the links below already have a lot of information. [1] https://en.wikipedia.org/wiki/Prefix_sum#Scan_higher_order_function [2] https://pypi.python.org/pypi/pyscanprev [3] https://groups.google.com/forum/#!topic/grupy-sp/wTIj6G5_5S0 [4] https://github.com/danilobellini/pyscanprev/blob/v0.1.0/examples/conditional-toggling.rst [5] https://github.com/danilobellini/pyscanprev/blob/v0.1.0/examples/state-space.rst [6] https://groups.google.com/forum/#!topic/grupy-sp/UZp-lVSWK1s [7] https://pypi.python.org/pypi/stackfull -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Oct 23 11:09:15 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 23 Oct 2016 08:09:15 -0700 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: Message-ID: What is `last(inf_iter)`. E.g `last(count())`. 
To me, the obvious spelling is: for last in it: pass doSomething(last) This makes it clear that is the users job to make sure `it` terminates. There's no general way to get the last item without looking through all the earlier ones. On Oct 23, 2016 7:58 AM, "Danilo J. S. Bellini" wrote: > The idea is to let generator expressions and list/set comprehensions have > a clean syntax to access its last output. That would allow them to be an > alternative syntax to the scan higher-order function [1] (today implemented > in the itertools.accumulate function), which leads to an alternative way > to write a fold/reduce. It would be nice to have something like: > > >>> last(abs(prev - x) for x in [3, 4, 5] from prev = 2) > 2 > > instead of a reduce: > > >>> from functools import reduce > >>> reduce(lambda prev, x: abs(prev - x), [3, 4, 5], 2) > 2 > > or an imperative approach: > > >>> prev = 2 > >>> for x in [3, 4, 5]: > ... prev = abs(prev - x) > >>> prev > 2 > > or getting the last from accumulate: > > >>> from itertools import accumulate > >>> list(accumulate([2, 3, 4, 5], lambda prev, x: abs(prev - x)))[-1] > 2 > > or... > > >>> [prev for prev in [2] > ... for x in [3, 4, 5] > ... for prev in [abs(prev - x)] > ... ][-1] > 2 > > Actually, I already wrote a solution for something similar to that: > PyScanPrev [2]. I'm using bytecode manipulation to modify the generator > expression and set/list comprehensions semantics to create a "scan", but > it has the limitation of using only code with a valid syntax as the input, > so I can't use "from" inside a generator expression / list comprehension. > The solution was to put the first output into the iterable and define the > "prev" name elsewhere: > > >>> last(abs(prev - x) for x in [2, 3, 4, 5]) > 2 > > That line works with PyScanPrev (on Python 3.4 and 3.5) when defined in a > function with a @enable_scan("prev") decorator. That was enough to create > a "test suite" of doctest-based examples that shows several scan use cases > [2]. > > This discussion started in a Brazilian list when someone asked how she > could solve a simple uppercase/lowercase problem [3]. The goal was to > alternate the upper/lower case of a string while neglecting the chars that > doesn't apply (i.e., to "keep the state" when the char isn't a letter). After > the discussion, I wrote the PyScanPrev package, and recently I've added > this historical "alternate" function as the "conditional toggling" > example [4]. > > Then I ask, can Python include that "scan" access to the last output in > its list/set/dict comprehension and generator expression syntax? There are > several possible applications for the scan itself as well as for the > fold/reduce (signal processing, control theory, physics, economics, etc.), > some of them I included as PyScanPrev examples. Some friends (people who > like control engineering and/or signal processing) liked the "State-space > model" example, where I included a "leaking bucket-spring-damper" > simulation using the scan-enabled generator expressions [5]. > > About the syntax, there are several ideas on how that can be written. 
> Given a "prev" identifier, a "target" identifier, an input "iterable" and > an optional "start" value (and perhaps an optional "echo_start", which I > assume True by default), some of them are: > > [func(prev, target) for target in iterable from prev = start] > [func(prev, target) for target in iterable] -> prev = start > [func(prev, target) for target in iterable] -> prev as start > [func(prev, target) for target in iterable] from prev = start > [func(prev, target) for target in iterable] from prev as start > [func(prev, target) for target in iterable] with prev as start > prev = start -> [func(prev, target) for target in iterable] > prev(start) -> [func(prev, target) for target in iterable] > [func(prev, target) for prev -> target in start -> iterable] > [prev = start -> func(prev, target) for target in iterable] > > # With ``start`` being the first value of the iterable, i.e., > # iterable = prepend(start, data) > [func(prev, target) for target in iterable from prev] > [func(prev, target) for target in iterable] -> prev > [func(prev, target) for target in iterable] from prev > prev -> [func(prev, target) for target in iterable] > > Before writing PyScanPrev, in [6] (Brazilian Portuguese) I used stackfull > [7] to implement that idea, an accumulator example using that library is: > > >>> from stackfull import push, pop, stack > >>> [push(pop() + el if stack() else el) for el in range(5)] > [0, 1, 3, 6, 10] > >>> list(itertools.accumulate(range(5))) > [0, 1, 3, 6, 10] > > There are more I can say (e.g. the pyscanprev.scan function has a "start" > value and an "echo_start" keyword argument, resources I missed in > itertools.accumulate) but the links below already have a lot of information. > > [1] https://en.wikipedia.org/wiki/Prefix_sum#Scan_higher_order_function > [2] https://pypi.python.org/pypi/pyscanprev > [3] https://groups.google.com/forum/#!topic/grupy-sp/wTIj6G5_5S0 > [4] https://github.com/danilobellini/pyscanprev/blob/ > v0.1.0/examples/conditional-toggling.rst > [5] https://github.com/danilobellini/pyscanprev/blob/ > v0.1.0/examples/state-space.rst > [6] https://groups.google.com/forum/#!topic/grupy-sp/UZp-lVSWK1s > [7] https://pypi.python.org/pypi/stackfull > > -- > Danilo J. S. Bellini > --------------- > "*It is not our business to set up prohibitions, but to arrive at > conventions.*" (R. Carnap) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danilo.bellini at gmail.com Sun Oct 23 11:28:51 2016 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Sun, 23 Oct 2016 13:28:51 -0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: Message-ID: > > What is `last(inf_iter)`. E.g `last(count())`. The "last" is just a helper function that gets the last value of an iterable. On sequences, it can be written to get the item at index -1 to avoid traversing it. Using it on endless iterables makes no sense. This makes it clear that is the users job to make sure `it` terminates. If one call "last" for something that doesn't terminate, an "endless" iterable, well, it's pretty obvious that it won't "end" nicely. It's not the Python job to solve the Entscheidungsproblem. If you call "sorted" on endless iterables, it would behave like "last", doesn't it? 
The whole point of this idea is the scan as a generator expression or list/set comprehension that can access the previous iteration output. Reduce/fold is just the last value of a scan, and the scan is still defined when there's no "last value". -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Oct 23 11:37:07 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 23 Oct 2016 08:37:07 -0700 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: Message-ID: Of course. But if you want last(), why not just spell the utility function as I did? I.e. as a function: def last(it): for item in it: pass return item That works fine for any iteratable (including a list, array, etc), whether or not it's a reduction/accumulation. On Oct 23, 2016 8:29 AM, "Danilo J. S. Bellini" wrote: > What is `last(inf_iter)`. E.g `last(count())`. > > The "last" is just a helper function that gets the last value of an > iterable. On sequences, it can be written to get the item at index -1 to > avoid traversing it. Using it on endless iterables makes no sense. > > This makes it clear that is the users job to make sure `it` terminates. > > If one call "last" for something that doesn't terminate, an "endless" > iterable, well, it's pretty obvious that it won't "end" nicely. It's not > the Python job to solve the Entscheidungsproblem. If you call "sorted" on > endless iterables, it would behave like "last", doesn't it? > > The whole point of this idea is the scan as a generator expression or > list/set comprehension that can access the previous iteration output. > Reduce/fold is just the last value of a scan, and the scan is still defined > when there's no "last value". > > -- > Danilo J. S. Bellini > --------------- > "*It is not our business to set up prohibitions, but to arrive at > conventions.*" (R. Carnap) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Oct 23 11:37:34 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Oct 2016 02:37:34 +1100 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: Message-ID: <20161023153733.GP22471@ando.pearwood.info> On Sat, Oct 22, 2016 at 03:34:23PM +0700, Simon Mark Holland wrote: > Having researched this as heavily as I am capable with limited experience, > I would like to suggest a Python 3 equivalent to string.translate() that > doesn't require a table as input. Maybe in the form of str.stripall() or > str.replaceall(). stripall() would not be appropriate: "strip" refers to removing from the front and end of the string, not the middle, and str.strip() already implements a "strip all" functionality: py> '+--+*abcd+-*xyz-*+-'.strip('*+-') 'abcd+-*xyz' But instead of a new method, why not fix translate() to be more user- friendly? Currently, it takes two method calls to delete characters using translate: table = str.maketrans('', '', '*+-.!?') newstring = mystring.translate(table) That's appropriate when you have a big translation table which you are intending to use many times, but its a bit clunky for single, one-off uses. Maybe we could change the API of translate to something like this: def translate(self, *args): if len(args) == 1: # Same as the existing behaviour. 
table = args[0] elif len(args) == 3: table = type(self).maketrans(*args) else: raise TypeError('too many or not enough arguments') ... Then we could write: newstring = mystring.translate('', '', '1234567890') to delete the digits. So we could fix this... but should we? Is this *actually* a problem that needs fixing, or are we just adding unnecessary complexity? > My reasoning is that while it is currently possible to easily strip() > preceding and trailing characters, and even replace() individual characters > from a string, Stripping from the front and back is a very common operation; in my experience, replacing is probably half as common, maybe even less. But deleting is even less common. > My proposal is that if strip() and replace() are important enough to > receive modules, then the arguably more common operation (in terms of > programming tutorials, if not mainstream development) of just removing all > instances of specified numbers, punctuation, or even letters etc from a > list of characters should also. I think the reason that deleting characters is common in tutorials is that it is a simple, easy, obvious task that can be programmed by a beginner in just a few lines. I don't think it is actually something that people need to do very often, outside of exercises. -- Steve From steve at pearwood.info Sun Oct 23 11:42:41 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Oct 2016 02:42:41 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: Message-ID: <20161023154240.GQ22471@ando.pearwood.info> On Sun, Oct 23, 2016 at 08:37:07AM -0700, David Mertz wrote: > Of course. But if you want last(), why not just spell the utility function > as I did? I.e. as a function: > > def last(it): > for item in it: > pass > return item > > That works fine for any iteratable (including a list, array, etc), whether > or not it's a reduction/accumulation. That's no good, because it consumes the iterator. Yes, you get the last value, but you actually needed to do work on all the previous values too. -- Steve From mertz at gnosis.cx Sun Oct 23 11:47:12 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 23 Oct 2016 08:47:12 -0700 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <20161023154240.GQ22471@ando.pearwood.info> References: <20161023154240.GQ22471@ando.pearwood.info> Message-ID: Consuming the iterator is *necessary* to get the last item. There's no way around that. Obviously, you could itertools.tee() it first if you don't mind the cache space. But there cannot be a generic "jump to the end" of an iterator without being destructive. On Oct 23, 2016 8:43 AM, "Steven D'Aprano" wrote: > On Sun, Oct 23, 2016 at 08:37:07AM -0700, David Mertz wrote: > > Of course. But if you want last(), why not just spell the utility > function > > as I did? I.e. as a function: > > > > def last(it): > > for item in it: > > pass > > return item > > > > That works fine for any iteratable (including a list, array, etc), > whether > > or not it's a reduction/accumulation. > > That's no good, because it consumes the iterator. Yes, you get > the last value, but you actually needed to do work on all the > previous values too. 
> > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danilo.bellini at gmail.com Sun Oct 23 11:58:26 2016 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Sun, 23 Oct 2016 13:58:26 -0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: Message-ID: > > Of course. But if you want last(), why not just spell the utility function > as I did? [...] > I'm not against a general "last", I just said the main idea of this thread is the access to the previous iteration output in a list/set/dict comprehension or generator expression. > Actually, your code is similar to the reference implementation I wrote for PyScanPrev, the main difference is that my "last" raises a StopIteration on an empty input instead of an UnboundLocalError: https://github.com/danilobellini/pyscanprev/blob/v0.1.0/pyscanprev.py#L148 When the input is a sequence, it should be optimized to get the item at the index -1. That works fine for any iteratable (including a list, array, etc), whether > or not it's a reduction/accumulation. > Lists and arrays don't need to be traversed. Consuming the iterator is *necessary* to get the last item. There's no way > around that. > Not if there's enough information to create the last value. Perhaps on the it = iter(range(9999999)) one can get 2 values (call next(it) twice) and use its __length_hint__ to create the last value. But I think only sequences should have such an optimization, not iterators. -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Oct 23 11:59:20 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Oct 2016 02:59:20 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: Message-ID: <20161023155920.GR22471@ando.pearwood.info> On Sun, Oct 23, 2016 at 12:57:10PM -0200, Danilo J. S. Bellini wrote: > The idea is to let generator expressions and list/set comprehensions have a > clean syntax to access its last output. That would allow them to be an > alternative syntax to the scan higher-order function [1] (today implemented > in the itertools.accumulate function), which leads to an alternative way to > write a fold/reduce. It would be nice to have something like: [cut suggested syntax] > instead of a reduce: [cut four existing ways to solve the problem] Why do we need a FIFTH way to solve this problem? What you are describing is *exactly* the use case for a reduce or fold function. Why add special magic syntax to make comprehensions do even more? Not everything needs to be a one liner. It's okay to import reduce to do a reduce. Its okay to write an actual for-loop. > Actually, I already wrote a solution for something similar to that: > PyScanPrev [2]. Ah, that makes SIX existing solutions. Do we need a seventh? -- Steve From danilo.bellini at gmail.com Sun Oct 23 12:10:42 2016 From: danilo.bellini at gmail.com (Danilo J. S. 
Bellini) Date: Sun, 23 Oct 2016 14:10:42 -0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <20161023155920.GR22471@ando.pearwood.info> References: <20161023155920.GR22471@ando.pearwood.info> Message-ID: > > Ah, that makes SIX existing solutions. Do we need a seventh? It might have dozens of solutions, perhaps an infinity of solutions. Brainfuck and assembly can be used, or even Turing machine instructions... But there should be one, and preferably only one, OBVIOUS way to do it. Readability counts. Reduce lost the built-in status on Python 3. Lambdas lost the decomposing arguments like "lambda (a, b), c: a + b * c". Using a for loop section inside a generator expression or list/set/dict comprehension allows decomposing the arguments and doesn't need a function to be imported. Actually, itertools.accumulate and functools.reduce have their parameters reversed, and accumulate doesn't have a "start" parameter. Actually, the example I give in this thread is about a fold/reduce, trying to show it's way simpler than the other solutions. I didn't paste here any scan use case because I sent links with several use cases; should I paste their contents here? The PyScanPrev link (https://github.com/danilobellini/pyscanprev) has several use case examples (including some just for a comparison with other possible solutions) and even has a full rationale for this idea. -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Oct 23 12:22:42 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 23 Oct 2016 09:22:42 -0700 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> Message-ID: On Oct 23, 2016 9:12 AM, "Danilo J. S. Bellini" wrote: Actually, itertools.accumulate and functools.reduce have their parameters reversed, and accumulate doesn't have a "start" parameter.

def accumulate2(fn=operator.add, it, start=None):
    if start is not None:
        it = itertools.chain([start], it)
    return itertools.accumulate(it, fn)

I would have preferred this signature to start with, but it's easy to wrap. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Oct 23 12:22:42 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 23 Oct 2016 09:22:42 -0700 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: Message-ID: Sure, a better last() should try to index into it[-1] first as an efficiency. And there are lots of iterators where the last element is predictable without looking through all the prior items. I know the last item of itertools.repeat(7, sys.maxsize) without having to loop for hours. But the general case is that you need to get all the head elements to determine last(). If the main idea is "to access the previous iteration" then we already have itertools.accumulate() for exactly that purpose. If you want something that behaves a little differently from accumulate, writing generator functions is really easy. On Oct 23, 2016 8:59 AM, "Danilo J. S. Bellini" wrote: > Of course. But if you want last(), why not just spell the utility function >> as I did? [...]
>> > I'm not against a general "last", I just said the main idea of this thread > is the access to the previous iteration output in a list/set/dict > comprehension or generator expression. > >> Actually, your code is similar to the reference implementation I wrote > for PyScanPrev, the main difference is that my "last" raises a > StopIteration on an empty input instead of an UnboundLocalError: > https://github.com/danilobellini/pyscanprev/blob/v0.1.0/pyscanprev.py#L148 > When the input is a sequence, it should be optimized to get the item at > the index -1. > > That works fine for any iteratable (including a list, array, etc), whether >> or not it's a reduction/accumulation. >> > Lists and arrays don't need to be traversed. > > Consuming the iterator is *necessary* to get the last item. There's no way >> around that. >> > Not if there's enough information to create the last value. Perhaps on the > it = iter(range(9999999)) one can get 2 values (call next(it) twice) and > use its __length_hint__ to create the last value. But I think only > sequences should have such an optimization, not iterators. > > -- > Danilo J. S. Bellini > --------------- > "*It is not our business to set up prohibitions, but to arrive at > conventions.*" (R. Carnap) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danilo.bellini at gmail.com Sun Oct 23 12:38:52 2016 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Sun, 23 Oct 2016 14:38:52 -0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> Message-ID: > > I would have preferred this signature to start with, but it's easy to > wrap. > Indeed, but a default value for the first argument requires a default value for all arguments. It's a syntax error, but I agree a "range-like" signature like that would be better. My reference scan implementation (that's how I thought itertools.accumulate should be): https://github.com/danilobellini/pyscanprev/blob/v0.1.0/pyscanprev.py#L171 A new "functools.scan" with a signature like the one from the link above would be nice, but it would overlap with itertools.accumulate in some sense. The advantages would be: 1 - The scan signature and the functools.reduce signature are the same (the function as the first parameter, like map/filter) 2 - The module, functools, is the same that has the reduce function -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Oct 23 13:21:43 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 23 Oct 2016 18:21:43 +0100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> Message-ID: On 23 October 2016 at 17:10, Danilo J. S. Bellini wrote: >> Ah, that makes SIX existing solutions. Do we need a seventh? > > It might have dozens of solutions, perhaps an infinity of solutions. > Brainfuck and assembly can be used, or even turing machine instructions... > > But there should be one, and preferably only one, OBVIOUS way to do it. > Readability counts. Sure, but you haven't explained why your proposal is more obvious than any of the other six. Built in does not equate to obvious. 
More obvious is often to have a dedicated tool, in a module designed to provide tools in that particular area. That's partially why reduce got moved to the functools module (another part is the fact that Guido doesn't find functional-style approaches that "obvious" - and what's obvious to a Dutchman is the benchmark here :-)) I'm not against powerful "windowed function" capabilities - my background is SQL, and windowed functions in SQL have even more power than the sort of thing we're talking about here. But I wouldn't call them "obvious" - at least not based on the number of times I get to do explanations of them to colleagues, or the number of tutorials on them I see. So the idea seems fine to me, but I'd very definitely class it as an "advanced" feature, and typically that sort of feature in Python is handled in a library. > Reduce lost the built-in status on Python 3. Lambdas lost the decomposing > arguments like "lambda (a, b), c: a + b * c". Which can be interpreted as evidence that this type of approach is not considered a core feature. In general, I like the idea, but I don't think it fits well in Python in its proposed form. Paul From steve at pearwood.info Sun Oct 23 19:22:32 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Oct 2016 10:22:32 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023154240.GQ22471@ando.pearwood.info> Message-ID: <20161023232230.GT22471@ando.pearwood.info> On Sun, Oct 23, 2016 at 08:47:12AM -0700, David Mertz wrote: > Consuming the iterator is *necessary* to get the last item. There's no way > around that. > > Obviously, you could itertools.tee() it first if you don't mind the cache > space. But there cannot be a generic "jump to the end" of an iterator > without being destructive. Right. But you're missing the point of Danilo's proposal. He isn't asking for a function to "jump to the end" of an iterator. Look at his example. The word "last" is a misnomer: he seems to me talking about having a special variable in comprehensions that holds the *previous* value of the loop variable, with special syntax to set its FIRST value, before the loop is entered. So "last" is a misleading name, unless you understand it as "last seen" rather than "very last, at the end". So given an iterator [1, 2, 4, 8], and an initial value of -1, we would see something like this: [(previous, this) for this in [1, 2, 4, 8] with previous as -1] # or some other syntax returns: [(-1, 1), (1, 2), (2, 4), (4, 8)] So a dedicated function that does nothing but scan to the end of the iterator and return the last/final value seen is no alternative to his proposal. -- Steve From rosuav at gmail.com Sun Oct 23 19:33:45 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 24 Oct 2016 10:33:45 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <20161023232230.GT22471@ando.pearwood.info> References: <20161023154240.GQ22471@ando.pearwood.info> <20161023232230.GT22471@ando.pearwood.info> Message-ID: On Mon, Oct 24, 2016 at 10:22 AM, Steven D'Aprano wrote: > Right. But you're missing the point of Danilo's proposal. He isn't > asking for a function to "jump to the end" of an iterator. Look at his > example. The word "last" is a misnomer: he seems to me talking > about having a special variable in comprehensions that holds the > *previous* value of the loop variable, with special syntax to set its > FIRST value, before the loop is entered. 
So "last" is a misleading name, > unless you understand it as "last seen" rather than "very last, at the > end". > Sounds like the PostgreSQL "lag" function [1]. Perhaps that's a better name? Conceptually, what you have is another iteration point that lags behind where you currently are. ChrisA [1] https://www.postgresql.org/docs/current/static/functions-window.html From steve at pearwood.info Sun Oct 23 19:38:49 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Oct 2016 10:38:49 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <20161023232230.GT22471@ando.pearwood.info> References: <20161023154240.GQ22471@ando.pearwood.info> <20161023232230.GT22471@ando.pearwood.info> Message-ID: <20161023233849.GU22471@ando.pearwood.info> On Mon, Oct 24, 2016 at 10:22:32AM +1100, Steven D'Aprano wrote: > On Sun, Oct 23, 2016 at 08:47:12AM -0700, David Mertz wrote: > > Consuming the iterator is *necessary* to get the last item. There's no way > > around that. > > > > Obviously, you could itertools.tee() it first if you don't mind the cache > > space. But there cannot be a generic "jump to the end" of an iterator > > without being destructive. > > Right. But you're missing the point of Danilo's proposal. Ah, actually it may be that I have misunderstood Danilo's proposal, because his example does include BOTH a suggestion of new magic syntax for retrieving the *previous* loop value inside a comprehension AND what seems to be a new built-in(?) function last() which seems to do exactly what you suggest: jump right to the end of an iterable and return the final value. My apologies for the confusion. -- Steve From mertz at gnosis.cx Sun Oct 23 20:12:19 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 23 Oct 2016 20:12:19 -0400 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <20161023232230.GT22471@ando.pearwood.info> References: <20161023154240.GQ22471@ando.pearwood.info> <20161023232230.GT22471@ando.pearwood.info> Message-ID: On Sun, Oct 23, 2016 at 4:22 PM, Steven D'Aprano wrote: > Right. But you're missing the point of Danilo's proposal. He isn't > asking for a function to "jump to the end" of an iterator. Look at his > example. The word "last" is a misnomer: he seems to me talking > about having a special variable in comprehensions that holds the > *previous* value of the loop variable, with special syntax to set its > FIRST value, before the loop is entered. OK... but that's exactly itertools.accumulate (or maybe a thin wrapper around it I like showed earlier in the thread). I'm not sure Danilo was clear in what he's proposing. In the thread he suggested that he wanted to special case to indexing on sequences, which doesn't seem to make sense for your meaning. It feels like there might be a case here for a new function in itertools that makes use of the last-seen item in an iterable, then combines it somehow with the current item. I'm not sure the spelling, but it definitely sounds like a function to me, not a need for new syntax. I've only rarely had that specific need. That said, here's a function I use in teaching to show some of what you can do with combining iterators, especially using itertools: def item_with_total(iterable): "Generically transform a stream of numbers into a pair of (num, running_sum)" s, t = tee(iterable) yield from zip(t, accumulate(s)) This might not be *exactly* what Danilo wants, but it's a similar concept. 
I wrap together an iterator (including an infinite one) with an accumulation. This just takes the default `operator.add` function for accumulate(), but it could take a function argument easily enough. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Oct 23 20:29:41 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Oct 2016 11:29:41 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> Message-ID: <20161024002939.GV22471@ando.pearwood.info> On Sun, Oct 23, 2016 at 02:10:42PM -0200, Danilo J. S. Bellini wrote: > > > > Ah, that makes SIX existing solutions. Do we need a seventh? > > It might have dozens of solutions, perhaps an infinity of solutions. > Brainfuck and assembly can be used, or even turing machine instructions... No, those are *implementations*. You can implement your solution in any language you like. For integration with Python, any of C, Fortran, Rust, Julia, Cython and (of course) pure Python are proven to work well. Using Turing Machine instructions requires some sort of TM compiler... good luck with that. But you cut out the most important part of my post. You've given lots of existing solutions. Why aren't they satisfactory? Even if somebody wants to write in a functional style, the reduce() solution you show seems perfectly clean and conventional to anyone used to functional code: from functools import reduce reduce(lambda prev, x: abs(prev - x), [3, 4, 5], 2) returns 2. What is wrong with this solution? That is the obvious solution for somebody looking for a functional style: something called reduce or fold. And there's no harm in needing to import reduce. Not every function has to be a built-in. Whereas your suggestion needs TWO new features: new syntax: (abs(prev - x) for x in [3, 4, 5] from prev = 2) plus a new built-in function last() which extracts the final value from an iterator. That means you will risk encouraging people to wastefully generate a large container of unneeded and unwanted intermediate values: last([abs(prev - x) for x in range(100000) from prev = 2]) which will generate a list 100000 items long just to extract the final one. reduce() is better, that is exactly what reduce() is designed for. > But there should be one, and preferably only one, OBVIOUS way to do it. > Readability counts. Right. And the obvious way is the imperative approach (only this time I will use a better variable name): py> result = 2 py> for x in [3, 4, 5]: ... result = abs(result - x) ... py> result 2 For those who think in functional programming terms, reduce() is the obvious way. Also, I feel that your proposal could have been explained better. I felt overloaded by the sheer mass of different alternatives, and mislead by your use of the name "prev" for something that I see now on more careful reading is *not* the previous value of the loop variable (as I first understood) but the running calculation result. In fairness I am sick and if I were well I may have been able to keep this straight in my head, but calling the variable "prev" is actively misleading. 
I was mislead, and (I think) Chris who just suggested this was similar to the SQL "lag" function may have been mislead as well. (Or perhaps he was just mislead by me, in which case, sorry Chris!) -- Steve From rosuav at gmail.com Sun Oct 23 20:38:52 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 24 Oct 2016 11:38:52 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <20161024002939.GV22471@ando.pearwood.info> References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> Message-ID: On Mon, Oct 24, 2016 at 11:29 AM, Steven D'Aprano wrote: > In fairness I am sick and if I were well I may have been able to keep > this straight in my head, but calling the variable "prev" is actively > misleading. I was mislead, and (I think) Chris who just suggested this > was similar to the SQL "lag" function may have been mislead as well. (Or > perhaps he was just mislead by me, in which case, sorry Chris!) All of the above are possible. I'm skimming the thread, not reading it in full, and I'm a bit lost as to the point of the proposal, so it's entirely possible that lag() is unrelated. But if it is indeed just reduce(), then it's even simpler. ChrisA From danilo.bellini at gmail.com Mon Oct 24 01:11:01 2016 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Mon, 24 Oct 2016 03:11:01 -0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> Message-ID: The proposal is mostly about scan/accumulate. Reduce/fold is a "corollary", as it's just the last value of a scan. The idea is to have a way of using the previous iteration output inside a list comprehension (and anything alike). That is, to make them recursive. last([abs(prev - x) for x in range(100000) from prev = 2]) > Why not [abs(prev - x) for x in range(100000) from prev = 2][-1]? How about list(some_iterable)[-1]? Probably a "last" function would avoid these. But the "last" is a pretty easy function to write. This proposal is about the list comprehension syntax (and other things alike). The "last" function and the "scan" functions can be seen as secondary proposals, the main point is a syntax to access to the previous iteration output value inside a list comprehension. For example, a product: >>> [prev * k for k in [5, 2, 4, 3] from prev = 1] [1, 5, 10, 40, 120] That makes sense for me, and seem simpler than: >>> from itertools import accumulate, chain >>> list(accumulate(chain([1], [5, 2, 4, 3]), lambda prev, k: prev * k)) [1, 5, 10, 40, 120] Which is still simpler than using reduce >>> from functools import reduce >>> list(reduce(lambda hist, k: hist + [hist[-1] * k], [5, 2, 4, 3], [1])) [1, 5, 10, 40, 120] The first is explicit. The imperative approach for that would be much more like the reduce than the scan, as "hist" is the result. >>> hist = [1] >>> for k in [5, 2, 4, 3]: ... prev = hist[-1] ... hist.append(prev * k) >>> hist [1, 5, 10, 40, 120] The very idea of prefering these approaches instead of the proposal sounds strange to me. What is simpler on them, the McCabe complexity? Number of tokens? Number of repeated tokens? AST tree height? AFAIK, GvR prefers the list comprehension syntax instead of using the map/filter higher order functions. 
He even said somewhere that a reduce can be written as list comprehension, and it wasn't obvious for me that a "3-for-sections" list comprehension repeating a target variable name would be a valid Python code, and that's required to get a recursive list comprehension. What I'm proposing is to allow a list comprehension syntax to behave like itertools.accumulate without the "3-for-sections" kludge. The rationale for the proposal is here: https://github.com/danilobellini/pyscanprev/tree/v0.1.0#the-world-without-this-package-rationale On Haskell, the above example would be: Prelude> scanl (*) 1 [5, 2, 4, 3] [1,5,10,40,120] And that's what I'm trying to write as a list comprehension. Some months ago, thinking on how I could write this proposal, I started writing PyScanPrev. Among the examples I wrote on PyScanPrev, there are use cases on: - maths - physics - economics - string processing - signal processing - control engineering - dynamic / time-varying model simulation - gray code generation So I can say that's not niche/specific. The most sophisticated scan example I wrote is this one to plot the trajectory of a "leaking bucket-spring-damper" system: https://github.com/danilobellini/pyscanprev/blob/v0.1.0/examples/state-space.rst Lag and windowing seem unrelated, something that would be solved with itertools.tee, zip or perhaps a collections.deque and a function (and I remember of doing so on AudioLazy). 2016-10-23 22:38 GMT-02:00 Chris Angelico : > On Mon, Oct 24, 2016 at 11:29 AM, Steven D'Aprano > wrote: > > In fairness I am sick and if I were well I may have been able to keep > > this straight in my head, but calling the variable "prev" is actively > > misleading. I was mislead, and (I think) Chris who just suggested this > > was similar to the SQL "lag" function may have been mislead as well. (Or > > perhaps he was just mislead by me, in which case, sorry Chris!) > > All of the above are possible. I'm skimming the thread, not reading it > in full, and I'm a bit lost as to the point of the proposal, so it's > entirely possible that lag() is unrelated. > > But if it is indeed just reduce(), then it's even simpler. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From desmoulinmichel at gmail.com Mon Oct 24 09:34:27 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Mon, 24 Oct 2016 15:34:27 +0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <20161023155920.GR22471@ando.pearwood.info> References: <20161023155920.GR22471@ando.pearwood.info> Message-ID: +1, especially given that reduce is not something you use very often. You loop and you filter everyday, but you definitely don't need the cumulative result of a sequence everyday. Python already have good any, all, sum and string concatenation stories so most of the FP usual suspect are taken care of. And remember that even when we do have something missing that we use often, it's not always enough to convince Guido to change the language for it. 
E.G, we have an old and recurrent debate about adding a keyword to assign a temporary calculation in comprehension list so that: [x[0].upper() for x in stuff() if x[0].upper()] can become: [x[0].upper() as word for x in stuff() if word] (and many other variants) All went to a dead end. So if you want to add the accumulate feature to the syntax, you better have a VERY GOOD reason. Le 23/10/2016 ? 17:59, Steven D'Aprano a ?crit : > On Sun, Oct 23, 2016 at 12:57:10PM -0200, Danilo J. S. Bellini wrote: >> The idea is to let generator expressions and list/set comprehensions have a >> clean syntax to access its last output. That would allow them to be an >> alternative syntax to the scan higher-order function [1] (today implemented >> in the itertools.accumulate function), which leads to an alternative way to >> write a fold/reduce. It would be nice to have something like: > > [cut suggested syntax] > >> instead of a reduce: > > [cut four existing ways to solve the problem] > > Why do we need a FIFTH way to solve this problem? What you are > describing is *exactly* the use case for a reduce or fold function. Why > add special magic syntax to make comprehensions do even more? > > Not everything needs to be a one liner. It's okay to import reduce to do > a reduce. Its okay to write an actual for-loop. > >> Actually, I already wrote a solution for something similar to that: >> PyScanPrev [2]. > > Ah, that makes SIX existing solutions. Do we need a seventh? > > > From desmoulinmichel at gmail.com Mon Oct 24 11:21:23 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Mon, 24 Oct 2016 17:21:23 +0200 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: Message-ID: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> Le 22/10/2016 ? 10:34, Simon Mark Holland a ?crit : > Having researched this as heavily as I am capable with limited > experience, I would like to suggest a Python 3 equivalent to > string.translate() that doesn't require a table as input. Maybe in the > form of str.stripall() or str.replaceall(). > > My reasoning is that while it is currently possible to easily strip() > preceding and trailing characters, and even replace() individual > characters from a string, to replace more than one characters from > anywhere within the string requires (i believe) at its simplest a > command like this : > > some_string.translate(str.maketrans('','','0123456789')) > > In Python 2.* however we could say ... > > some_string.translate(None, '0123456789') > > My proposal is that if strip() and replace() are important enough to > receive modules, then the arguably more common operation (in terms of > programming tutorials, if not mainstream development) of just removing > all instances of specified numbers, punctuation, or even letters etc > from a list of characters should also. > > I wholeheartedly admit that there are MANY other ways to do this > (including RegEx and List Comprehensions), as listed in the > StackOverflow answer below. However the same could be said for > replace() and strip(). 
This actually could be implemented directly in str.replace() without breaking the API by accepting: "stuff".replace('a', '') "stuff".replace(('a', 'b', 'c'), '') "stuff".replace(('a', 'b', 'c'), ('?', '*', '')) A pure Python implementation looks like this: https://github.com/Tygs/ww/blob/dev/src/ww/wrappers/strings.py#L229 (this implementation also allow regexes, which is not what you want for the builtin replace(), however, as it would break the performances expectations) I often had the use case of needing to strip many strings so I would +1 for having a nice and easy way to do it. From desmoulinmichel at gmail.com Mon Oct 24 11:41:44 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Mon, 24 Oct 2016 17:41:44 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <580AFA7B.6020603@stoneleaf.us> <20161022071637.GO22471@ando.pearwood.info> Message-ID: <9a0d6c58-d9c7-a7ca-da9b-e91915bcd61f@gmail.com> +1. It's easier to implement, safer, and will educate. It has a real added value. Le 22/10/2016 ? 09:36, Ryan Birmingham a ?crit : > Per the comments in this thread, I believe that a better error message > for this case would be a reasonable way to fix the use case around this > issue. > It can be difficult to notice that your quotes are curved if you don't > know that's what you're looking for. > > -Ryan Birmingham > > On 22 October 2016 at 03:16, Steven D'Aprano > wrote: > > On Sat, Oct 22, 2016 at 06:13:35AM +0000, Jonathan Goble wrote: > > Interesting idea. +1 from me; probably can be as simple as just having the > > tokenizer interpret curly quotes as the ASCII (straight) version of itself > > (in other words, " and the two curly versions of that would all produce the > > same token, and same for single quotes, eliminating any need for additional > > changes further down the chain). > > There's a lot more than two. At least nineteen (including the ASCII > ones): ?????"'???????????? > > > > This would help with copying and pasting > > code snippets from a source that may have auto-formatted the quotes without > > the original author realizing it. > > Personally, I think that we should not encourage programmers to take a > lazy, slap-dash attitude to coding. Precision is important to > programmers, and there is no limit to how imprecise users can be. Should > we also guard against people accidentally using prime marks or ornaments > (dingbats): > > ?????? ?????? > > as well? If not, what makes them different from other accidents of > careless programmers? > > I don't think we should be trying to guess what programmers mean, nor do > I think that we should be encouraging programmers to use word processors > for coding. Use the right tool for the right job, and even Notepad is > better for the occasional programmer than Microsoft Office or > LibreOffice. Programming is hard, requiring precision and care, and we > don't do beginners any favours by making it easy for them to be > imprecise and careless. > > I would be happy to see improved error messages for smart quotes: > > py> s = ?abcd? > File "", line 1 > s = ?abcd? > ^ > SyntaxError: invalid character in identifier > > (especially in IDLE), but I'm very dubious about the idea of using > typographical quote marks for strings. At the very least, Python should > not lead the way here. Let some other language experiment with this > first, and see what happens. Python is a mature, established language, > not an experimental language. 
> > Of course, there's nothing wrong with doing an experimental branch of > Python supporting this feature, to see what happens. But that doesn't > mean we should impose it as an official language rule. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From chris.barker at noaa.gov Mon Oct 24 13:16:03 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 24 Oct 2016 10:16:03 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: On Sat, Oct 22, 2016 at 9:17 AM, Nick Coghlan wrote: > This is actually a case where style guidelines would ideally differ > between between scripting use cases ... and > library(/framework/application) development use cases > Hmm -- interesting idea -- and I recall Guido bringing something like this up on one of these lists not too long ago -- "scripting" use cases really are different that "systems programming" However, that script/library distinction isn't well-defined in > computing instruction in general, no it's not -- except in the case of "scripting languages" vs. "systems languages" -- you can go back to the classic Ousterhout paper: https://www.tcl.tk/doc/scripting.html But Python really is suitable for both use cases, so tricky to know how to teach. And my classes, at least, have folks with a broad range of use-cases in mind, so I can't choose one way or another. And, indeed, there is no small amount of code (and coder) that starts out as a quicky script, but ends up embedded in a larger system down the road. And (another and?) one of the great things ABOUT Python is that is IS suitable for such a broad range of use-cases. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Oct 24 13:32:22 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 24 Oct 2016 10:32:22 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: On Sat, Oct 22, 2016 at 8:22 PM, Nick Coghlan wrote: > Pondering this overnight, I realised there's a case where folks using > Python primarily as a scripting language can still run into many of > the resource management problems that arise in larger applications: > IPython notebooks > This is likely mitigated in practice *today* by IPython users mostly > being on CPython for access to the Scientific Python stack, sure -- though there is no reason that Jupyter notebooks aren't really useful to all sort of non-data-crunching tasks. It's just that that's the community it was born in. I can imagine they would be great for database exploration/management, for instance. 
Chris, would you be open to trying a thought experiment with some of your students looking at ways to introduce function-scoped > deterministic resource management *before* introducing with > statements? At first thought, talking about this seems like it would just confuse newbies even MORE. Most of my students really want simple examples they can copy and then change for their specific use case. But I do have some pretty experienced developers (new to Python, but not programming) in my classes, too, that I might be able to bring this up with. # Cleaned up whenever the interpreter gets around to cleaning up > the function locals > def readlines_with_default_resource_management(fname): > return open(fname).readlines() > > # Cleaned up on function exit, even if the locals are still > referenced from an exception traceback > # or the interpreter implementation doesn't use a reference counting GC > from local_resources import function_resource > > def readlines_with_declarative_cleanup(fname): > return function_resource(open(fname)).readlines() > > # Cleaned up at the end of the with statement > def readlines_with_imperative_cleanup(fname): > with open(fname) as f: > return f.readlines() > > The idea here is to change the requirement for new developers from > "telling the interpreter what to *do*" (which is the situation we have > for context managers) to "telling the interpreter what we *want*" > (which is for it to link a managed resource with the lifecycle of the > currently running function call, regardless of interpreter > implementation details) > I can see that, but I'm not sure newbies will -- it either case, you have to think about what you want -- which is the complexity I'm trying to avoid at this stage. Until much later, when I get into weak references, I can pretty much tell people that python will take care of itself with regards to resource management. That's what context mangers are for, in fact. YOU can use: with open(...) as infile: ..... Without needing to know what actually has to be "cleaned up" about a file. In the case of files, it's a close() call, simple enough (in the absence of Exceptions...), but with a database connection or something, it could be a lot more complex, and it's nice to know that it will simply be taken care of for you by the context manager. The big refactoring benefit that this feature would offer over with > statements is that it doesn't require a structural change to the code > - it's just wrapping an existing expression in a new function call > that says "clean this up promptly when the function terminates, even > if it's still part of a reference cycle, or we're not using a > reference counting GC". hmm -- that would be simpler in one sense, but wouldn't it require a new function to be defined for everything you might want to do this with? rather than the same "with" syntax for everything? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Mon Oct 24 13:39:16 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Mon, 24 Oct 2016 19:39:16 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() Message-ID: Hello all, I would be happy to see a somewhat more general and user friendly version of string.translate function. 
It could work this way: string.newtranslate(file_with_table, Drop=True, Dec=True) So the parameters: 1. "file_with_table" : a text file with table in following format: #[In] [Out] 97 {65} 98 {66} 99 {67} 100 {} ... 110 {110} Notes: All values are decimal or hex (to switch between parsing format use Dec parameter) As it turned out from my last discussion, majority prefers hex notation, so I am not in mainstream with my decimal notation here, but both should be supported. Empty [Out] value {} means that the character will be deleted. 2. "Drop = True" this will set the default behavior for those values which are NOT in the table. For Drop = True: all values not defined in table set to [out] = {}, and be deleted. For Drop=False: all values not defined in table set [out] = [in], so those remain as is. 3. Dec= True : parsing format Decimal/hex. I use decimal everywhere. Further thoughts: for 8-bit strings this should be simple to implement I think. For 16-bit of course there is issue of memory usage for lookup tables, but the gurus could probably optimise it. E.g. at the parsing stage it is not necessary to build the lookup table for whole 16-bit range of course, but take only values till the largest ordinal present in the table file. About the format of table file: I suppose many users would want also to define characters directly, I am not sure if it is really needed, but if so, additional brackets or escape char could be used, like this for example: a {A} \98 {\66} \99 {\67} but as said I don't like very much the idea and would be OK for me to use numeric values only. So approximately I see it. Feel free to share thoughts or criticise. Mikhail From chris.barker at noaa.gov Mon Oct 24 13:39:02 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 24 Oct 2016 10:39:02 -0700 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> Message-ID: On Sat, Oct 22, 2016 at 4:09 AM, Paul Moore wrote: > there are a lot of environments where smart quotes get > accidentally inserted into code. > > * Tutorial/example material prepared by non-programmers, again using > tools that are too "helpful" in auto-converting to smart quotes. > indeed -- I once id a whole set of python class slides in LaTeX -- really nice format, etc.... but in teh process from LaTeX to PDF, I ended up with stuff that looked like Code, but if you copy and pasted it the quotes were wrong -- but only sometimes -- I got pretty used to fixing it, but still was symied once in a while,a nd it was pretty painful for my students... I think the "better error message" option is the way to go, however. At least until we all have better Unicode support in all our tools.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Oct 24 13:44:02 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 24 Oct 2016 10:44:02 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: my thought on this: If you need translate() you probably can write the code to parse a text file, and then you can use whatever format you want. This seems a very special case to build into the stdlib. 
-CHB On Mon, Oct 24, 2016 at 10:39 AM, Mikhail V wrote: > Hello all, > > I would be happy to see a somewhat more general and user friendly > version of string.translate function. > It could work this way: > string.newtranslate(file_with_table, Drop=True, Dec=True) > > So the parameters: > > 1. "file_with_table" : a text file with table in following format: > > #[In] [Out] > > 97 {65} > 98 {66} > 99 {67} > 100 {} > ... > 110 {110} > > > Notes: > All values are decimal or hex (to switch between parsing format use > Dec parameter) > As it turned out from my last discussion, majority prefers hex notation, > so I am not in mainstream with my decimal notation here, but both > should be supported. > Empty [Out] value {} means that the character will be deleted. > > 2. "Drop = True" this will set the default behavior for those values > which are NOT in the table. > > For Drop = True: all values not defined in table set to [out] = {}, > and be deleted. > > For Drop=False: all values not defined in table set [out] = [in], so > those remain as is. > > 3. Dec= True : parsing format Decimal/hex. I use decimal everywhere. > > > Further thoughts: for 8-bit strings this should be simple to implement > I think. For 16-bit of course > there is issue of memory usage for lookup tables, but the gurus could > probably optimise it. > E.g. at the parsing stage it is not necessary to build the lookup > table for whole 16-bit range of course, > but take only values till the largest ordinal present in the table file. > > About the format of table file: I suppose many users would want also > to define characters directly, I am not sure > if it is really needed, but if so, additional brackets or escape char > could be used, like this for example: > > a {A} > \98 {\66} > \99 {\67} > > but as said I don't like very much the idea and would be OK for me to > use numeric values only. > > So approximately I see it. > Feel free to share thoughts or criticise. > > > Mikhail > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Oct 24 13:48:08 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 24 Oct 2016 10:48:08 -0700 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> References: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> Message-ID: On Mon, Oct 24, 2016 at 8:21 AM, Michel Desmoulin wrote: > This actually could be implemented directly in str.replace() without > breaking the API by accepting: > > "stuff".replace('a', '') > "stuff".replace(('a', 'b', 'c'), '') > "stuff".replace(('a', 'b', 'c'), ('?', '*', '')) > +1 -- I have found I Need to do this often enough that I've wondered why it's not there. making three calls to replace() isn't too bad, but is klunky and has performance issues. -CHB -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainventions at gmail.com Mon Oct 24 13:50:58 2016 From: rainventions at gmail.com (Ryan Birmingham) Date: Mon, 24 Oct 2016 13:50:58 -0400 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: I also believe that using a text file would not be the best solution; using a dictionary, other data structure, or anonomyous function would make more sense than having a specially formatted file. On Oct 24, 2016 13:45, "Chris Barker" wrote: > my thought on this: > > If you need translate() you probably can write the code to parse a text > file, and then you can use whatever format you want. > > This seems a very special case to build into the stdlib. > > -CHB > > > > > On Mon, Oct 24, 2016 at 10:39 AM, Mikhail V wrote: > >> Hello all, >> >> I would be happy to see a somewhat more general and user friendly >> version of string.translate function. >> It could work this way: >> string.newtranslate(file_with_table, Drop=True, Dec=True) >> >> So the parameters: >> >> 1. "file_with_table" : a text file with table in following format: >> >> #[In] [Out] >> >> 97 {65} >> 98 {66} >> 99 {67} >> 100 {} >> ... >> 110 {110} >> >> >> Notes: >> All values are decimal or hex (to switch between parsing format use >> Dec parameter) >> As it turned out from my last discussion, majority prefers hex notation, >> so I am not in mainstream with my decimal notation here, but both >> should be supported. >> Empty [Out] value {} means that the character will be deleted. >> >> 2. "Drop = True" this will set the default behavior for those values >> which are NOT in the table. >> >> For Drop = True: all values not defined in table set to [out] = {}, >> and be deleted. >> >> For Drop=False: all values not defined in table set [out] = [in], so >> those remain as is. >> >> 3. Dec= True : parsing format Decimal/hex. I use decimal everywhere. >> >> >> Further thoughts: for 8-bit strings this should be simple to implement >> I think. For 16-bit of course >> there is issue of memory usage for lookup tables, but the gurus could >> probably optimise it. >> E.g. at the parsing stage it is not necessary to build the lookup >> table for whole 16-bit range of course, >> but take only values till the largest ordinal present in the table file. >> >> About the format of table file: I suppose many users would want also >> to define characters directly, I am not sure >> if it is really needed, but if so, additional brackets or escape char >> could be used, like this for example: >> >> a {A} >> \98 {\66} >> \99 {\67} >> >> but as said I don't like very much the idea and would be OK for me to >> use numeric values only. >> >> So approximately I see it. >> Feel free to share thoughts or criticise. >> >> >> Mikhail >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Oct 24 14:02:00 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 24 Oct 2016 11:02:00 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: On Mon, Oct 24, 2016 at 10:50 AM, Ryan Birmingham wrote: > I also believe that using a text file would not be the best solution; > using a dictionary, > actually, now that you mention it -- .translate() already takes a dict, so if youw ant to put your translation table in a text file, you can use a dict literal to do it: # contents of file: > { 32: 95, > 105: 64, 115: 36, } then use it: s.translate(ast.literal_eval(open("trans_table.txt").read())) now all you need is a tiny little utility function: def translate_from_file(s, filename): return s.translate(ast.literal_eval(open(filename).read())) :-) -Chris > > > > other data structure, or anonomyous function would make more sense than > having a specially formatted file. > > On Oct 24, 2016 13:45, "Chris Barker" wrote: > >> my thought on this: >> >> If you need translate() you probably can write the code to parse a text >> file, and then you can use whatever format you want. >> >> This seems a very special case to build into the stdlib. >> >> -CHB >> >> >> >> >> On Mon, Oct 24, 2016 at 10:39 AM, Mikhail V wrote: >> >>> Hello all, >>> >>> I would be happy to see a somewhat more general and user friendly >>> version of string.translate function. >>> It could work this way: >>> string.newtranslate(file_with_table, Drop=True, Dec=True) >>> >>> So the parameters: >>> >>> 1. "file_with_table" : a text file with table in following format: >>> >>> #[In] [Out] >>> >>> 97 {65} >>> 98 {66} >>> 99 {67} >>> 100 {} >>> ... >>> 110 {110} >>> >>> >>> Notes: >>> All values are decimal or hex (to switch between parsing format use >>> Dec parameter) >>> As it turned out from my last discussion, majority prefers hex notation, >>> so I am not in mainstream with my decimal notation here, but both >>> should be supported. >>> Empty [Out] value {} means that the character will be deleted. >>> >>> 2. "Drop = True" this will set the default behavior for those values >>> which are NOT in the table. >>> >>> For Drop = True: all values not defined in table set to [out] = {}, >>> and be deleted. >>> >>> For Drop=False: all values not defined in table set [out] = [in], so >>> those remain as is. >>> >>> 3. Dec= True : parsing format Decimal/hex. I use decimal everywhere. >>> >>> >>> Further thoughts: for 8-bit strings this should be simple to implement >>> I think. For 16-bit of course >>> there is issue of memory usage for lookup tables, but the gurus could >>> probably optimise it. >>> E.g. at the parsing stage it is not necessary to build the lookup >>> table for whole 16-bit range of course, >>> but take only values till the largest ordinal present in the table file. 
>>> >>> About the format of table file: I suppose many users would want also >>> to define characters directly, I am not sure >>> if it is really needed, but if so, additional brackets or escape char >>> could be used, like this for example: >>> >>> a {A} >>> \98 {\66} >>> \99 {\67} >>> >>> but as said I don't like very much the idea and would be OK for me to >>> use numeric values only. >>> >>> So approximately I see it. >>> Feel free to share thoughts or criticise. >>> >>> >>> Mikhail >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Oct 24 14:32:09 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 24 Oct 2016 19:32:09 +0100 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: On 24 October 2016 at 18:39, Mikhail V wrote: > I would be happy to see a somewhat more general and user friendly > version of string.translate function. > It could work this way: > string.newtranslate(file_with_table, Drop=True, Dec=True) Using a text file seems very odd. But regardless, this could *easily* be published on PyPI, and then if it gained enough users be proposed for the stdlib. I don't think there's anything like sufficient value to warrant "fast-tracking" something like this direct to the stdlib. And real-world use via PyPI would very quickly establish whether the unusual "pass a file with a translation table in it" design was acceptable to users. Paul From mikhailwas at gmail.com Mon Oct 24 16:30:22 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Mon, 24 Oct 2016 22:30:22 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: On 24 October 2016 at 20:02, Chris Barker wrote: > On Mon, Oct 24, 2016 at 10:50 AM, Ryan Birmingham > wrote: >> >> I also believe that using a text file would not be the best solution; >> using a dictionary, > > > actually, now that you mention it -- .translate() already takes a dict, so > if youw ant to put your translation table in a text file, you can use a dict > literal to do it: > > # contents of file: > > > { > 32: 95, > > 105: 64, > 115: 36, > } > > then use it: > > s.translate(ast.literal_eval(open("trans_table.txt").read())) > > now all you need is a tiny little utility function: > > def translate_from_file(s, filename): > return s.translate(ast.literal_eval(open(filename).read())) > > > :-) > > -Chris > Yes making special file format is not a good option I agree. 
Also of course it does not make sense to read it every time if translate
is called in a loop with the same table. So it was merely a sketch of the
behaviour.
But how would you, with the current translate function, drop all characters
that are not in the table? So I can pass [deletechars] to the function, but
this seems not very convenient to me -- very often I want to drop them *all*,
excluding some particular values. This for example
is needed for filtering out all non-standard characters from paths, etc.
So in other words, there should be an option to control this behavior.
Probably I am missing something here, but I didn't find such a solution
for translate(), and that is the main point of the proposal actually.
It is all the same as translate(), but with this extension it can cover
many more usage cases.


Mikhail
From chris.barker at noaa.gov  Mon Oct 24 16:54:58 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 24 Oct 2016 13:54:58 -0700
Subject: [Python-ideas] More user-friendly version for string.translate()
In-Reply-To:
References:
Message-ID:

On Mon, Oct 24, 2016 at 1:30 PM, Mikhail V wrote:

> But how would you with current translate function drop all characters
> that are not in the table?

that is another question altogether, and one for a different list, actually.

I don't know a way to do "remove every character except these", but I
expect there is a way to do that efficiently with Python strings.

you could probably (ab)use the codecs module, though.

If there really is no way to do it, then you might have a feature worth
pursuing, but be prepared with use-cases!

The only use-case I've had for that sort of thing is when I want only ASCII
-- but I can use the ascii codec for that :-)

This for example

> is needed for filtering out all non-standard characters from paths, etc.

You'd usually want to replace those with something, rather than remove them
entirely, yes?

-CHB

> So in other words, there should be an option to control this behavior.
> Probably I am missing something here, but I didn't find such solution
> for translate() and that is main point of proposal actually.
> It is all the same as translate() but with this extension it can cover
> much more usage cases.
>
>
> Mikhail
>

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From p.f.moore at gmail.com  Mon Oct 24 17:10:04 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 24 Oct 2016 22:10:04 +0100
Subject: [Python-ideas] More user-friendly version for string.translate()
In-Reply-To:
References:
Message-ID:

On 24 October 2016 at 21:54, Chris Barker wrote:
> I don't know a way to do "remove every character except these", but someone
> I expect there is a way to do that efficiently with Python strings.

It's easy enough with the re module:

>>> re.sub('[^0-9]', '', 'ab0c2m3g5')
'0235'

Possibly because there's a lot of good Python builtins that allow you
to avoid the re module when *not* needed, it's easy to forget it in
the cases where it does pretty much exactly what you want, or can be
persuaded to do so with much less difficulty than rolling your own
solution (I know I'm guilty of that...).
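(For completeness, the same thing without re is also only a few lines -- an
untested sketch, with a made-up helper name:

    def keep_only(s, allowed):
        # drop every character of s that is not in `allowed`
        allowed = set(allowed)
        return ''.join(ch for ch in s if ch in allowed)

    keep_only('ab0c2m3g5', '0123456789')   # -> '0235'

but the re version states the intent more directly.)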
Paul From mikhailwas at gmail.com Mon Oct 24 17:31:11 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Mon, 24 Oct 2016 23:31:11 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: On 24 October 2016 at 22:54, Chris Barker wrote: > On Mon, Oct 24, 2016 at 1:30 PM, Mikhail V wrote: >> >> But how would you with current translate function drop all characters >> that are not in the table? > > > that is another question altogether, and one for a different list, actually. > > I don't know a way to do "remove every character except these", but someone > I expect there is a way to do that efficiently with Python strings. > > you could probably (ab)use the codecs module, though. > > If there really is no way to do it, then you might have feature worth > pursuing, but be prepared with use-cases! > > The only use-case I've had for that sort of this is when I want only ASCII > -- but I can uses the ascii codec for that :-) > >> This for example >> is needed for filtering out all non-standard characters from paths, etc. > > > You'd usually want to replace those with something, rather than remove them > entirely, yes? Just a pair of usage cases which I was facing in my practice: 1. Imagine I perform some admin tasks in a company with very different users who also tend to name the files as they wish. So only God knows what can be there in filenames. And I know foe example that there can be Cyrillic besides ASCII their. So I just define a table like: { 1072: 97 1073: 98 1074: 99 ... [which localizes Cyrillic into ASCII] ... 97:97 98:98 99:99 ... [those chars that are OK, leave them] } Then I use os.walk() and os.rename() and voila! the file system regains it virginity in one simple script. 2. Say I have a multi-lingual file or whatever, I want to filter out some unwanted characters so I can do it similarly. Mikhail From rosuav at gmail.com Mon Oct 24 17:56:07 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 25 Oct 2016 08:56:07 +1100 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> Message-ID: On Tue, Oct 25, 2016 at 4:48 AM, Chris Barker wrote: > On Mon, Oct 24, 2016 at 8:21 AM, Michel Desmoulin > wrote: >> >> This actually could be implemented directly in str.replace() without >> breaking the API by accepting: >> >> "stuff".replace('a', '') >> "stuff".replace(('a', 'b', 'c'), '') >> "stuff".replace(('a', 'b', 'c'), ('?', '*', '')) > > > +1 -- I have found I Need to do this often enough that I've wondered why > it's not there. > > making three calls to replace() isn't too bad, but is klunky and has > performance issues. And it may not be semantically identical. In the examples above, three separate replace calls would work, but a syntax like this ought to be capable of an exchange - "aabbccdd".replace(('b', 'd'), ('d', 'b')) == "aaddccbb". ChrisA From rymg19 at gmail.com Mon Oct 24 18:07:13 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Mon, 24 Oct 2016 17:07:13 -0500 Subject: [Python-ideas] Showing qualified names when a function call fails Message-ID: I personally find it kind of annoying when you have code like this: x = A(1, B(2, 3)) and Python's error message looks like this: TypeError: __init__() takes 1 positional argument but 2 were given It doesn't give much of a clue to which `__init__` is being called. At all. 
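A tiny reproduction of the problem (the class names here are made up, but any
two classes give the same message):

    class A:
        def __init__(self):   # takes only self
            pass

    class B:
        def __init__(self):   # also takes only self
            pass

    A(1)   # TypeError: __init__() takes 1 positional argument but 2 were given
    B(1)   # exactly the same message -- nothing says whether A or B raised it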
The idea: when showing the function name in an error like this, show the fully qualified name, like: TypeError: A.__init__() takes 1 positional argument but 2 were given This would be MUCH more helpful! Another related change would be to do the same thing in tracebacks: Traceback (most recent call last): File "", line 1, in File "", line 2, in __init__ AssertionError to: Traceback (most recent call last): File "", line 1, in File "", line 2, in MyClass.__init__ AssertionError which could make it easier to find where exactly an error originated. -- Ryan (????) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From neatnate at gmail.com Mon Oct 24 18:11:48 2016 From: neatnate at gmail.com (Nathan Schneider) Date: Mon, 24 Oct 2016 18:11:48 -0400 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> Message-ID: On Mon, Oct 24, 2016 at 5:56 PM, Chris Angelico wrote: > On Tue, Oct 25, 2016 at 4:48 AM, Chris Barker > wrote: > > On Mon, Oct 24, 2016 at 8:21 AM, Michel Desmoulin > > wrote: > >> > >> This actually could be implemented directly in str.replace() without > >> breaking the API by accepting: > >> > >> "stuff".replace('a', '') > >> "stuff".replace(('a', 'b', 'c'), '') > >> "stuff".replace(('a', 'b', 'c'), ('?', '*', '')) > > > > > > +1 -- I have found I Need to do this often enough that I've wondered why > > it's not there. > > > > making three calls to replace() isn't too bad, but is klunky and has > > performance issues. > > And it may not be semantically identical. In the examples above, three > separate replace calls would work, but a syntax like this ought to be > capable of an exchange - "aabbccdd".replace(('b', 'd'), ('d', 'b')) == > "aaddccbb". > What would be the expected behavior of "aabbccdd".replace(('a', 'aa'), ('x', 'y'))? It's not obvious to me whether longer replacement strings ('aa') or earlier replacement strings ('a') should take priority. Or is the proposal to only support this for replacements of single characters? Nathan > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Mon Oct 24 18:17:20 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Tue, 25 Oct 2016 00:17:20 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: On 24 October 2016 at 23:10, Paul Moore wrote: > On 24 October 2016 at 21:54, Chris Barker wrote: >> I don't know a way to do "remove every character except these", but someone >> I expect there is a way to do that efficiently with Python strings. > > It's easy enough with the re module: > >>>> re.sub('[^0-9]', '', 'ab0c2m3g5') > '0235' > > Possibly because there's a lot of good Python builtins that allow you > to avoid the re module when *not* needed, it's easy to forget it in > the cases where it does pretty much exactly what you want, or can be > persuaded to do so with much less difficulty than rolling your own > solution (I know I'm guilty of that...). > > Paul Thanks, this would solve the task of course. 
However for example in the case in my last example (filenames) this would require: - Write a function to construct the expression for "all except given" characters from my table. This could be easy I believe, but still another task. Then: 1. Apply translate() with my table to the string. 2. Apply re.sub() to the string. I usually start using RE when I want to find/replace words or patterns, but not translate/filter the characters directly. So since there is already an "inclusive" translate() then probably having an "exclusive" one is not a bad idea. I believe it is something very similar in implementation, so instead of appending next character which is not in the table, it simply does nothing. Mikhail From greg.ewing at canterbury.ac.nz Mon Oct 24 18:24:27 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 25 Oct 2016 11:24:27 +1300 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> Message-ID: <580E8A1B.1000404@canterbury.ac.nz> There was a discussion about this a while ago. From what I remember, the conclusion reached was that there are too many degrees of freedom to be able to express reduction operations in a comprehension-like way that's any clearer than just using reduce() or writing out the appropriate loops. -- Greg From rosuav at gmail.com Mon Oct 24 18:54:29 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 25 Oct 2016 09:54:29 +1100 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> Message-ID: On Tue, Oct 25, 2016 at 9:11 AM, Nathan Schneider wrote: > What would be the expected behavior of "aabbccdd".replace(('a', 'aa'), ('x', > 'y'))? It's not obvious to me whether longer replacement strings ('aa') or > earlier replacement strings ('a') should take priority. I'm actually not sure, so I would look at prior art. But in any case, this is a question you can't even ask until replace() accepts multiple arguments. Hence I'm +1 on the notion of simultaneous replacements being supported. ChrisA From chris.barker at noaa.gov Mon Oct 24 20:29:50 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 24 Oct 2016 17:29:50 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: <-3835296038052368971@unknownmsgid> >> >>>>> re.sub('[^0-9]', '', 'ab0c2m3g5') >> '0235' >> >> Possibly because there's a lot of good Python builtins that allow you >> to avoid the re module when *not* needed, it's easy to forget it in >> the cases where it does pretty much exactly what you want, There is a LOT of overhead to figuring out how to use the re module. I've always though t it had it's place, but it sure seems like overkill for something this seemingly simple. If (a big if) removing "all but these" was a common use case, it would be nice to have a way to do it with string methods. This is a classic case of: Put it on PyPi, and see how much interest it garners. -CHB From chris.barker at noaa.gov Mon Oct 24 20:33:50 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 24 Oct 2016 17:33:50 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: <-5894375791951498087@unknownmsgid> > Just a pair of usage cases which I was facing in my practice: > So I just define a table like: > { > 1072: 97 > 1073: 98 > 1074: 99 > ... 
> [which localizes Cyrillic into ASCII] > ... > 97:97 > 98:98 > 99:99 > ... > [those chars that are OK, leave them] > } > > Then I use os.walk() and os.rename() and voila! the file system > regains it virginity > in one simple script. This sounds like a perfect use case for str.translate() as it is. > 2. Say I have a multi-lingual file or whatever, I want to filter out > some unwanted > characters so I can do it similarly. Filtering out is different-- but I would think that you would want replace, rather than remove. If you wanted names to all comply with a given encoding (ascii or Latin-1, or...), then encoding/decoding (with error set to replace) would do nicely. -CHB > > > Mikhail From chris.barker at noaa.gov Mon Oct 24 20:37:29 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 24 Oct 2016 17:37:29 -0700 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: References: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> Message-ID: <-2137551868925949077@unknownmsgid> > On Oct 24, 2016, at 3:54 PM, Chris Angelico wrote: > . But in any case, > this is a question you can't even ask until replace() accepts multiple > arguments. Hence I'm +1 on the notion of simultaneous replacements > being supported. Agreed -- there are a lot of edge cases to work out, and there is not one way to define the API, but if folks think it's a good idea, we can hash those out. If anyone decides to take this on, be prepared for a lot of bike shedding! -CHB > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 24 21:59:49 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 25 Oct 2016 10:59:49 +0900 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: <22542.48277.333896.349836@turnbull.sk.tsukuba.ac.jp> Chris Barker wrote: > Nick Coghlan wrote: >> Chris, would you be open to trying a thought experiment with some of >> your students looking at ways to introduce function-scoped >> deterministic resource management *before* introducing with >> statements? I'm with Chris, I think: this seems inappropriate to me. A student has to be rather sophisticated to understand resource management at all in Python. Eg, generators and closures can hang on to resources between calls, yet there's no syntactic marker at the call site. >> The idea here is to change the requirement for new developers from >> "telling the interpreter what to *do*" (which is the situation we have >> for context managers) to "telling the interpreter what we *want*" >> (which is for it to link a managed resource with the lifecycle of the >> currently running function call, regardless of interpreter >> implementation details) I think this attempt at a distinction is spurious. On the syntactic side, with open("file") as f: results = read_and_process_lines(f) the with statement effectively links management of the file resource to the lifecycle of read_and_process_lines. (Yes, I know what you mean by "link" -- will "new developers"?) On the semantic side, constructs like closures and generators (which they may be cargo- culting!) 
mean that it's harder to link resource management to (syntactic) function calls than a new developer might think. (Isn't that Nathaniel's motivation for the OP?) And then there's the loop that may not fully consume an iterator problem: that must be explicitly decided -- the question for language designers is which of "close generators on loop exit" or "leave generators open on loop exit" should be marked with explicit syntax -- and what if you've got two generators involved, and want different decisions for both? Chris: > I can see that, but I'm not sure newbies will -- it either case, > you have to think about what you want -- which is the complexity > I'm trying to avoid at this stage. Indeed. > Until much later, when I get into weak references, I can pretty > much tell people that python will take care of itself with regards > to resource management. I hope you phrase that very carefully. Python takes care of itself, but does not take care of the use case. That's the programmer's responsibility. In a very large number of use cases, including the novice developer's role in a large project, that is a distinction that makes no difference. But the "close generators on loop exit" (or maybe not!) use case makes it clear that in general the developer must explicitly manage resources. > That's what context mangers are for, in fact. YOU can use: > > with open(...) as infile: > ..... > > Without needing to know what actually has to be "cleaned up" about > a file. In the case of files, it's a close() call, simple enough > (in the absence of Exceptions...), but with a database connection > or something, it could be a lot more complex, and it's nice to know > that it will simply be taken care of for you by the context > manager. But somebody has to write that context manager. I suppose in the organizational context imagined here, it was written for the project by the resource management wonk in the group, and the new developer just cargo-cults it at first. > > The big refactoring benefit that this feature would offer over > > with statements is that it doesn't require a structural change to > > the code - it's just wrapping an existing expression in a new > > function call that says "clean this up promptly when the function > > terminates, even if it's still part of a reference cycle, or > > we're not using a reference counting GC". > > hmm -- that would be simpler in one sense, but wouldn't it require > a new function to be defined for everything you might want to do > this with? rather than the same "with" syntax for everything? Even if it can be done with a single "ensure_cleanup" function, Python isn't Haskell. I think context management deserves syntax to mark it. After all, from the "open and read one file" scripting standpoint, there's really not a difference between f = open("file") process(f) and with open("file") as f: process(f) (see "taking care of Python ~= taking care of use case" above). But the with statement and indentation clearly mark the call to process as receiving special treatment. As Chris says, the developer doesn't need to know anything but that the object returned by the with expression participates "appropriately" in the context manager protocol (which she may think of as the "with protocol"!, ie, *magic*) and gets the "special treatment" it needs. So (for me) this is full circle: "with" context management is what we need, but it interacts poorly with stateful "function" calls -- and that's what Nathaniel proposes to deal with. 
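To make the generator case concrete, here is the kind of (untested) sketch I
have in mind -- contextlib.closing is one existing way to tie a generator's
cleanup to a scope explicitly:

    from contextlib import closing

    def read_lines(path):
        # the generator, not the caller's frame, holds the file open
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")

    # If the loop breaks early, the file stays open until the generator
    # happens to be collected; wrapping it in closing() makes the cleanup
    # explicit and deterministic.
    with closing(read_lines("example.txt")) as lines:
        for line in lines:
            if line.startswith("#"):
                break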
From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 24 22:00:30 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 25 Oct 2016 11:00:30 +0900 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> Message-ID: <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Chris Barker writes: > I think the "better error message" option is the way to go, > however. At least until we all have better Unicode support in all > our tools.... I don't think "better Unicode support" helps with confusables in programming languages that value TOOWTDI. OK, we already have 4 kinds of quoting in Python which suggests that TOOWTDI doesn't apply to quoting, but I think that's a bit naive. Given the frequency with which quotes appear in strings, and the fact that English quotation marks can't nest but rarely need to nest more than once, use of both "" and '' with identical semantics to make one level of nesting convenient and readable was plausible. The use of triple quotes for block quoting again has arguments for it. You can think that these were experiments with "meh" results[1], but I don't think it's appropriate to say that therefore TOOWTDI doesn't apply to quote marks. As a general rule, I think use of confusables in new syntax (eg, double curly quotes = f"") runs into "Syntax shall not look like grit on Tim's screen". OTOH, better Unicode support should (cautiously) be used to support new operators and syntax subject to TOOWDTI and other considerations of Pythonicity. Footnotes: [1] Personally, I immediately liked the triple quotes, because the (Emacs) Lisp convention of allowing literal newline characters in all strings caused a number of small annoyances. I also quickly evolved a personal convention where single quotes indicate "string as protocol constant" (eg, where today we'd use enums), while double quotes indicate "arbitrary text content". But those are both obviously YMMV evaluations. From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 24 21:38:44 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 25 Oct 2016 10:38:44 +0900 Subject: [Python-ideas] Civility on this mailing list In-Reply-To: References: <3702cac3-f59c-75d9-281c-6edb40ed4592@gmail.com> Message-ID: <22542.47012.212764.539348@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > P.S. Given the existence of the constraints discussed above, folks may > then be curious as to why we have a brainstorming list at all, given > that the default answer is almost always going to be "No", Besides providing a place that encourages discussion of ideas from out of the blue that just might be evolutionary steps forward, it also provides a place where language design principles can be discussed and illustrated in the context of concrete proposals, and an archive of those discussions. I realize that it's a significant amount of effort to find the discussions where principles are enunciated and elaborated, and don't have a good solution to propose to those who prefer not to spend the effort. But the resource is there. From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 24 22:26:43 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Tue, 25 Oct 2016 11:26:43 +0900 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> Message-ID: <22542.49891.835655.410788@turnbull.sk.tsukuba.ac.jp> Danilo J. S. Bellini writes: > >>> [prev * k for k in [5, 2, 4, 3] from prev = 1] > [1, 5, 10, 40, 120] > Among the examples I wrote on PyScanPrev, there are use cases on: > - maths > - physics > - economics As a practicing economist, I wonder what use cases you're referring to. I can't think of any use cases where if one previous value is useful, having all previous values available (ie, an arbitrary temporal structure, at the modeler's option) isn't vastly more useful. This means that in modern econometrics, for example, simple procedures like Cochrane-Orcutt (which handles one previous value of the dependent variable in a single-equation regression) are subsumed in ARIMA and VAR estimation, which generalize the number of equations and/or the number of lags to greater than one. BTW, numerical accuracy considerations often mean you don't want to use the compact "for ... in ... if ..." expression syntax anyway, as accuracy can often be greatly improved with appropriate reordering of values in the series. Even "online" regression algorithms, where you might think to write ( updated_model(datum, prev) for datum in sensor_data() from prev = something ) 'prev' need to refer not to the previous value of 'datum', but to the previous value of 'updated_model()' (since you need a sufficient statistic for all previous data). And 'prev' as currently conceived is just plain useless for any long period moving average, etc. So in the end, even if there are plausible use cases for quick and dirty code, an experienced implementer wouldn't use them anyway as more powerful tools are likely to be immediately to hand. From steve at pearwood.info Mon Oct 24 22:37:05 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 25 Oct 2016 13:37:05 +1100 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: Message-ID: <20161025023704.GD15983@ando.pearwood.info> On Mon, Oct 24, 2016 at 07:39:16PM +0200, Mikhail V wrote: > Hello all, > > I would be happy to see a somewhat more general and user friendly > version of string.translate function. > It could work this way: > string.newtranslate(file_with_table, Drop=True, Dec=True) That's an interesting concept for "user friendly". Apart from functions that are actually designed to read files of a particular format, can you think of any built-in functions that take a file as argument? This is how you would use this "user friendly version of translate": path = '/tmp/table' # hope no other program is using it... with open(path, 'w') as f: f.write('97 {65}\n') f.write('98 {66}\n') f.write('99 {67}\n') with open(path, 'r') as f: new_string = old_string.newtranslate(f, False, True) Compared to the existing solution: new_string = old_string.translate(str.maketrans('abc', 'ABC')) Mikhail, I appreciate that you have many ideas and want to share them, but try to think about how those ideas would work. The Python standard library is full of really well-designed programming interfaces. You can learn a lot by thinking "what existing function is this like? how does that existing function work?". str.translate and str.maketrans already exist. 
Look at how maketrans builds a translation table: it can take either two equal length strings, and maps characters in one to the equivalent character in the other: str.maketrans('abc', 'ABC') Or it can take a mapping (usually a dict) that maps either characters or ordinal numbers to a new string (not just a single character, but an arbitrary string) or ordinal numbers. str.maketrans({'a': 'A', 98: 66, 0x63: 0x:43}) (or None, to delete them). Note the flexibility: you don't need to specify ahead of time whether you are specifying the ordinal value as a decimal, hex, octal or binary value. Any expression that evaluates to a string or a int within the legal range is valid. That's a good programming interface. Could it be better? Perhaps. I've suggested that maybe translate could automatically call maketrans if given more than one argument. Maybe there's an easier way to just delete unwanted characters. Perhaps there could be a way to say "any character not in the translation table should be dropped". These are interesting questions. > Further thoughts: for 8-bit strings this should be simple to implement > I think. I doubt that these new features will be added to bytes as well as strings. For 8-bits byte strings, it is easy enough to generate your own translation and deletion tables -- there are only 256 values to consider. > For 16-bit of course > there is issue of memory usage for lookup tables, but the gurus could > probably optimise it. There are no 16-bit strings. Unicode is a 21-bit encoding, usually encoded as either fixed-width sequence of 4-byte code units (UTF-32) or a variable-width sequence of 2-byte (UTF-16) or 1-byte (UTF-8) code units. But it absolutely is not a "16-bit string". [...] > but as said I don't like very much the idea and would be OK for me to > use numeric values only. I think you are very possibly the only Python programmer in the world who thinks that writing decimal ordinal values is more user-friendly than writing the actual character itself. I know I would much rather see $, ? or ? than 36, 960 or 9556. -- Steve From sjoerdjob at sjoerdjob.com Tue Oct 25 02:57:41 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Tue, 25 Oct 2016 08:57:41 +0200 Subject: [Python-ideas] Easily remove characters from a string. In-Reply-To: <-2137551868925949077@unknownmsgid> References: <53b8b5ce-6171-b1b6-31da-b870cdddbb7c@gmail.com> <-2137551868925949077@unknownmsgid> Message-ID: <20161025065741.GJ13170@sjoerdjob.com> On Mon, Oct 24, 2016 at 05:37:29PM -0700, Chris Barker - NOAA Federal wrote: > > On Oct 24, 2016, at 3:54 PM, Chris Angelico wrote: > > > . But in any case, > > this is a question you can't even ask until replace() accepts multiple > > arguments. Hence I'm +1 on the notion of simultaneous replacements > > being supported. > > Agreed -- there are a lot of edge cases to work out, and there is not > one way to define the API, but if folks think it's a good idea, we can > hash those out. > > If anyone decides to take this on, be prepared for a lot of bike shedding! Regarding prior art, I think that the PHP ``strtr`` function is a good example: http://php.net/manual/en/function.strtr.php Especially with regards to the ``replace_pairs`` argument: If given two arguments, the second should be an array in the form array('from' => 'to', ...). The return value is a string where all the occurrences of the array keys have been replaced by the corresponding values. The longest keys will be tried first. 
Once a substring has been replaced, its new value will not be searched again. This is one I have sometimes used when writing a mini template language, where `{{ username }}` had to be replaced. In contrast to other ways, ``strtr`` gives a one-pass garantuee, which means that it was safe against hypothetical attacks where one would add a template-string to one of the values. From danilo.bellini at gmail.com Tue Oct 25 03:18:46 2016 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Tue, 25 Oct 2016 05:18:46 -0200 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <22542.49891.835655.410788@turnbull.sk.tsukuba.ac.jp> References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> <22542.49891.835655.410788@turnbull.sk.tsukuba.ac.jp> Message-ID: > > As a practicing economist, I wonder what use cases you're referring > to. I can't think of any use cases where if one previous value is > useful, having all previous values available (ie, an arbitrary > temporal structure, at the modeler's option) isn't vastly more useful. > Well, see the itertools.accumulate examples yourself then, the ones at docs.python.org... We can start with something really simple like interest rates or uniform series, but... before arguing here, please convince other people to update the Wikipedia: "Recurrence relations, especially linear recurrence relations, are used extensively in both theoretical and empirical economics." https://en.wikipedia.org/wiki/Recurrence_relation#Economics So in the end, even if there are plausible use cases for quick and dirty > code, [...] > The proposal isn't about quick and dirty code. The State-space example includes a general linear time-varying MIMO simulation implementation trying to keep the syntax as similar as possible to the control theory engineers are used to. Also, my goal when I was looking for a scan syntax to solve the conditional toggling example was to make it cleaner. If you aren't seeing the examples I wrote, I wonder what are you writing about. There was a discussion about this a while ago. And where's the link? [...]. From what > I remember, the conclusion reached was that there are too > many degrees of freedom to be able to express reduction > operations in a comprehension-like way that's any clearer > I don't know if that's a conclusion from any other thread, but that's wrong. The only extra "freedom" required a way to access the previous output (or "accumulator", "memory", "state"... you can use the name you prefer, but they're all the same). How many parameters does itertools.scan have? And map/filter? I can't talk about a discussion I didn't read, it would be unfair, disrespectful. Perhaps that discussion was about an specific proposal and not about the requirements to express a scan/fold. This proposal should be renamed to "recursive list comprehension". 3 words, and it's a complete description of what I'm talking about. For people from a functional programming background, that's about an alternative syntax to write the scan higher order function. Forget the word "reduce", some people here seem to have way too much taboo with that word, and I know there are people who would prefer a higher McCabe complexity just to avoid it. Perhaps there are people who prefer masochist rituals instead of using "reduce", who knows? Who cares? I like reduce, but I'm just proposing a cleaner syntax for recursive list comprehensions, and "reduce" isn't the general use case for that. 
On the contrary, "reduce" is just the specific scenario where only the last value matters. [...] you better have a VERY GOOD reason. I spent months writing PyScanPrev, mainly the examples in several reStructuredText files, not because I was forcing that, but because there are way too many use cases for it, and I know those examples aren't exhaustive. https://pypi.python.org/pypi/pyscanprev But talking about "good reasons" reminds me of the "annotations"! A function that uses annotations for one library can't use annotations for another library unless their annotation values are the same, but if one package/framework needs "type information" from your function parameters/result and another package/framework collects "documentation" from it, there's no way to get that working together. Something like "x : int = 2" makes the default assignment seem like an assignment to the annotation, and there's even a new token "->" for annotations. When I saw Python annotations at first I though it was a joke, now I know it's something serious with [mutually incompatible] libraries/packages using them. I strongly agree that everything should need a good reason, but I wrote a lot about the scan use cases and no one here seem to have read what I wrote, and the only reason that matters seem to be a kind of social status, not really "reason". I probably wrote way more reasons for that proposal than annotations could ever have. But if no one seem to care enough to read, then why should I insist? That's like my pprint bugfix patch some months ago, was it applied? AFAIK not even core developers giving +1 was enough for it to be applied. This maillist isn't very inviting... but I hope some of you at least try to read the rationale and the examples. -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Oct 25 03:19:57 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 25 Oct 2016 00:19:57 -0700 (PDT) Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: Message-ID: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> I was thinking of posting something like the first suggestion myself. Both would be a great additions. On Monday, October 24, 2016 at 6:10:52 PM UTC-4, Ryan Gonzalez wrote: > > I personally find it kind of annoying when you have code like this: > > > x = A(1, B(2, 3)) > > > and Python's error message looks like this: > > > TypeError: __init__() takes 1 positional argument but 2 were given > > > It doesn't give much of a clue to which `__init__` is being called. At all. > > The idea: when showing the function name in an error like this, show the > fully qualified name, like: > > > TypeError: A.__init__() takes 1 positional argument but 2 were given > > > This would be MUCH more helpful! > > > Another related change would be to do the same thing in tracebacks: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in __init__ > AssertionError > > > to: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in MyClass.__init__ > AssertionError > > > which could make it easier to find where exactly an error originated. > > -- > Ryan (????) > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. 
> http://kirbyfan64.github.io/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 25 03:53:45 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Oct 2016 17:53:45 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: On 25 October 2016 at 03:16, Chris Barker wrote: > On Sat, Oct 22, 2016 at 9:17 AM, Nick Coghlan wrote: > >> >> This is actually a case where style guidelines would ideally differ >> between between scripting use cases ... and >> library(/framework/application) development use cases > > > Hmm -- interesting idea -- and I recall Guido bringing something like this > up on one of these lists not too long ago -- "scripting" use cases really > are different that "systems programming" > >> However, that script/library distinction isn't well-defined in >> computing instruction in general, > > no it's not -- except in the case of "scripting languages" vs. "systems > languages" -- you can go back to the classic Ousterhout paper: > > https://www.tcl.tk/doc/scripting.html > > But Python really is suitable for both use cases, so tricky to know how to > teach. Steven Lott was pondering the same question a few years back (regarding his preference for teaching procedural programming before any other paradigms), so I had a go at articulating the general idea: http://www.curiousefficiency.org/posts/2011/08/scripting-languages-and-suitable.html The main paragraph is still pretty unhelpful though, since I handwave away the core of the problem as "the art of software design": """A key part of the art of software design is learning how to choose an appropriate level of complexity for the problem at hand - when a problem calls for a simple script, throwing an entire custom application at it would be overkill. On the other hand, trying to write complex applications using only scripts and no higher level constructs will typically lead to an unmaintainable mess.""" Cheers, Nick. P.S. I'm going to stop now since we're getting somewhat off-topic, but I wanted to highlight this excellent recent article on the challenges of determining the level of "suitable complexity" for any given software engineering problem: https://hackernoon.com/how-to-accept-over-engineering-for-what-it-really-is-6fca9a919263#.k4nqzjl52 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Tue Oct 25 03:23:38 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 25 Oct 2016 00:23:38 -0700 (PDT) Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> Message-ID: Also, for something like this: In [1]: class A: ...: pass ...: In [2]: A(x=2) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in () ----> 1 A(x=2) TypeError: object() takes no parameters It would be nice to say TypeError: object() takes no parameters, but keyword argument "x" given. I understand that the value of "x" might not have a __repr__ method, but the key name has to be string, so this should be easily doable at least for extra keyword arguments? For positional arguments, maybe just print how many were passed to object? 
Knowing the key name would have helped me with debugging. Usually, I print(kwargs) somewhere up the inheritance chain and run my program again. On Tuesday, October 25, 2016 at 3:19:57 AM UTC-4, Neil Girdhar wrote: > > I was thinking of posting something like the first suggestion myself. > Both would be a great additions. > > On Monday, October 24, 2016 at 6:10:52 PM UTC-4, Ryan Gonzalez wrote: >> >> I personally find it kind of annoying when you have code like this: >> >> >> x = A(1, B(2, 3)) >> >> >> and Python's error message looks like this: >> >> >> TypeError: __init__() takes 1 positional argument but 2 were given >> >> >> It doesn't give much of a clue to which `__init__` is being called. At >> all. >> >> The idea: when showing the function name in an error like this, show the >> fully qualified name, like: >> >> >> TypeError: A.__init__() takes 1 positional argument but 2 were given >> >> >> This would be MUCH more helpful! >> >> >> Another related change would be to do the same thing in tracebacks: >> >> >> Traceback (most recent call last): >> File "", line 1, in >> File "", line 2, in __init__ >> AssertionError >> >> >> to: >> >> >> Traceback (most recent call last): >> File "", line 1, in >> File "", line 2, in MyClass.__init__ >> AssertionError >> >> >> which could make it easier to find where exactly an error originated. >> >> -- >> Ryan (????) >> [ERROR]: Your autotools build scripts are 200 lines longer than your >> program. Something?s wrong. >> http://kirbyfan64.github.io/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 25 04:16:34 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Oct 2016 18:16:34 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> Message-ID: On 25 October 2016 at 03:32, Chris Barker wrote: > On Sat, Oct 22, 2016 at 8:22 PM, Nick Coghlan wrote: >> The big refactoring benefit that this feature would offer over with >> statements is that it doesn't require a structural change to the code >> - it's just wrapping an existing expression in a new function call >> that says "clean this up promptly when the function terminates, even >> if it's still part of a reference cycle, or we're not using a >> reference counting GC". > > hmm -- that would be simpler in one sense, but wouldn't it require a new > function to be defined for everything you might want to do this with? rather > than the same "with" syntax for everything? Nope, hence the references to contextlib.ExitStack: https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack That's a tool for dynamic manipulation of context managers, so even today you can already write code like this: >>> @with_resource_manager ... def example(rm, *, msg=None, exc=None): ... rm.enter_context(cm()) ... rm.callback(print, "Deferred callback") ... if msg is not None: print(msg) ... if exc is not None: raise exc ... >>> example(msg="Normal return") Enter CM Normal return Deferred callback Exit CM >>> example(exc=RuntimeError("Exception thrown")) Enter CM Deferred callback Exit CM Traceback (most recent call last): ... RuntimeError: Exception thrown The setup code to support it is just a few lines of code: >>> import functools >>> from contextlib import ExitStack >>> def with_resource_manager(f): ... @functools.wraps(f) ... def wrapper(*args, **kwds): ... with ExitStack() as rm: ... 
return f(rm, *args, **kwds) ... return wrapper ... Plus the example context manager definition: >>> from contextlib import contextmanager >>> @contextmanager ... def cm(): ... print("Enter CM") ... try: ... yield ... finally: ... print("Exit CM") ... So the gist of my proposal (from an implementation perspective) is that if we give frame objects an ExitStack instance (or an operational equivalent) that can be created on demand and will be cleaned up when the frame exits (regardless of how that happens), then we can define an API for adding "at frame termination" callbacks (including making it easy to dynamically add context managers to that stack) without needing to define your own scaffolding for that feature - it would just be a natural part of the way frame objects work. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Oct 25 04:33:35 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Oct 2016 18:33:35 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <22542.48277.333896.349836@turnbull.sk.tsukuba.ac.jp> References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> <22542.48277.333896.349836@turnbull.sk.tsukuba.ac.jp> Message-ID: On 25 October 2016 at 11:59, Stephen J. Turnbull wrote: > On the semantic side, > constructs like closures and generators (which they may be cargo- > culting!) mean that it's harder to link resource management to > (syntactic) function calls than a new developer might think. (Isn't > that Nathaniel's motivation for the OP?) This is my read of Nathaniel's motivation as well, and hence my proposal: rather than trying to auto-magically guess when a developer intended for their resource management to be linked to the current executing frame (which requires fundamentally changing how iteration works in a way that breaks the world, and still doesn't solve the problem in general), I'm starting to think that we instead need a way to let them easily say "This resource, the one I just created or have otherwise gained access to? Link its management to the lifecycle of the currently running function or frame, so it gets cleaned up when it finishes running". Precisely *how* a particular implementation did that resource management would be up to the particular Python implementation, but one relatively straightforward way would be to use contextlib.ExitStack under the covers, and then when the frame finishes execution have a check that goes: - did the lazily instantiated ExitStack instance get created during frame execution? - if yes, close it immediately, thus reclaiming all the registered resources The spelling of the *surface* API though is something I'd need help from educators in designing - my problem is that I already know all the moving parts and how they fit together (hence my confidence that something like this would be relatively easy to implement, at least in CPython, if we decided we wanted to do it), but I *don't* know what kinds for terms could be used in the API if we wanted to make it approachable to relative beginners. My initial thought would be to offer: from local_resources import function_resource and: from local_resources import frame_resource Where the only difference between the two is that the first one would complain if you tried to use it outside a normal function body, while the second would be usable anywhere (function, class, module, generator, coroutine). 
Both would accept and automatically enter context managers as input, as if you'd wrapped the rest of the frame body in a with statement. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From desmoulinmichel at gmail.com Tue Oct 25 09:20:24 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Tue, 25 Oct 2016 15:20:24 +0200 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> Message-ID: <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Thos are great idea. I love the current trend of trying to give better error messages. Another pet peave of mine: TypeError not providing at least 'raiser' 'accepted type', 'given value' and 'given value type'. E.G, int() does an ok job: >>> int(foo) ValueError: invalid literal for int() with base 10: '1O' raiser: int() accept type: a base 10 literal value given: 1O But it could be improved by givin the type of the give value. Indeed in that case, I got a string composed of one and the letter O, but looks like the number 10. Some students can struggle with those. list, set and tuple less not as good: >>> tuple(foo) TypeError: 'int' object is not iterable No raiser, no value given. It's hard to find out what's the problem is. The biggest issue here is that if you have a long line with tuple() in the middle, yuou need to know the problem comes from tuple. Another problem is that many people don't know what iterable means. A better error message would be: TypeError: tuple() only accept iterables (any object you can use a for loop on). But it received '1', which is of type . Some things deserve a big explanation to solve the problem. It would be nice to add a link to official tutorial in the documentation. E.G, encoding is a big one: In [8]: b'?' + '?' File "", line 1 b'?' + '?' ^ SyntaxError: bytes can only contain ASCII literal characters. This is not helpful to somebody unaware of the difference between text and bytes. Possible solution: In [8]: b'?' + '?' File "", line 1 b'?' + '?' ^ SyntaxError: You cannnot concatenate bytes (b'?...') with a string ('?...'). Learn more about fixing this error at https://doc.python.org/errors/7897978 Of course, the repr will often need to be shorten but a short repr is better than none. Should we make a PEP with all of those ? Le 25/10/2016 ? 09:23, Neil Girdhar a ?crit : > Also, for something like this: > > In [1]: class A: > ...: pass > ...: > > In [2]: A(x=2) > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > in () > ----> 1 A(x=2) > > TypeError: object() takes no parameters > > It would be nice to say TypeError: object() takes no parameters, but > keyword argument "x" given. I understand that the value of "x" might > not have a __repr__ method, but the key name has to be string, so this > should be easily doable at least for extra keyword arguments? For > positional arguments, maybe just print how many were passed to object? > Knowing the key name would have helped me with debugging. Usually, I > print(kwargs) somewhere up the inheritance chain and run my program again. > > On Tuesday, October 25, 2016 at 3:19:57 AM UTC-4, Neil Girdhar wrote: > > I was thinking of posting something like the first suggestion > myself. Both would be a great additions. 
> > On Monday, October 24, 2016 at 6:10:52 PM UTC-4, Ryan Gonzalez wrote: > > I personally find it kind of annoying when you have code like this: > > > x = A(1, B(2, 3)) > > > and Python's error message looks like this: > > > TypeError: __init__() takes 1 positional argument but 2 were given > > > It doesn't give much of a clue to which `__init__` is being > called. At all. > > The idea: when showing the function name in an error like this, > show the fully qualified name, like: > > > TypeError: A.__init__() takes 1 positional argument but 2 were given > > > This would be MUCH more helpful! > > > Another related change would be to do the same thing in tracebacks: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in __init__ > AssertionError > > > to: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in MyClass.__init__ > AssertionError > > > which could make it easier to find where exactly an error > originated. > > -- > Ryan (????) > [ERROR]: Your autotools build scripts are 200 lines longer than > your program. Something?s wrong. > http://kirbyfan64.github.io/ > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From rosuav at gmail.com Tue Oct 25 09:58:31 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 26 Oct 2016 00:58:31 +1100 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Message-ID: On Wed, Oct 26, 2016 at 12:20 AM, Michel Desmoulin wrote: > list, set and tuple less not as good: > > >>> tuple(foo) > > TypeError: 'int' object is not iterable > > No raiser, no value given. It's hard to find out what's the problem is. The > biggest issue here is that if you have a long line with tuple() in the > middle, yuou need to know the problem comes from tuple. > > Another problem is that many people don't know what iterable means. > > A better error message would be: > > TypeError: tuple() only accept iterables (any object you can use a for loop > on). But it received '1', which is of type . -1 on this one. It doesn't really add very much - "iterable" is a good keyword that anyone can put into a search engine. Adding the repr of the object that was passed is nice if it's an integer, but less so if you passed in some huge object. If your lines of code are so complicated that you can't pinpoint the cause of the TypeError, the solution is probably to break the line. > Some things deserve a big explanation to solve the problem. It would be nice > to add a link to official tutorial in the documentation. > > E.G, encoding is a big one: > > In [8]: b'?' + '?' > File "", line 1 > b'?' + '?' > ^ > SyntaxError: bytes can only contain ASCII literal characters. > > This is not helpful to somebody unaware of the difference between text and > bytes. Someone unaware of the difference between text and bytes probably isn't messing with code that has b"..." strings in it. Ultimately, there's not a lot you can do about that; people just have to learn certain things, and quite probably, searching the web for this error message will find good information (it did for me). -0.5 on this change. 
ChrisA From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Oct 25 10:02:09 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 25 Oct 2016 23:02:09 +0900 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> <22542.49891.835655.410788@turnbull.sk.tsukuba.ac.jp> Message-ID: <22543.26081.957166.331323@turnbull.sk.tsukuba.ac.jp> Danilo J. S. Bellini writes: No attribution. Please attribute, at least when you mix quotes from different people. > > As a practicing economist, I wonder what use cases you're referring > > to. I can't think of any use cases where if one previous value is > > useful, having all previous values available (ie, an arbitrary > > temporal structure, at the modeler's option) isn't vastly more useful. > > Well, see the itertools.accumulate examples yourself then, the ones at > docs.python.org... And this: "list(accumulate(data, max))", needs syntax, why? I scratch my head over how you can improve over that. > We can start with something really simple like interest rates or > uniform series, but... Don't waste the list's time being snide. My point is that although new syntax may be useful for simple cases, serious applications will worry about computational accuracy and likely will provide packages that handle general cases that nest these simple cases. Given they exist, most modelers will prefer using those packages to writing their own comprehensions. That may not apply in other fields, but AFAICS it does apply in economics. So if you can't handle the complex cases, the syntax is just cognitive overhead: TOOWTDI will be the high-quality general packages even where they're theoretically overkill. The basic list comprehension doesn't need to deal with that kind of issue. > before arguing here, please convince other people to update the > Wikipedia: Irrelevant and rude. Please, don't. > "Recurrence relations, especially linear recurrence relations, are used > extensively in both theoretical and empirical economics." > https://en.wikipedia.org/wiki/Recurrence_relation#Economics I didn't contest that, as quoted above. What I contest is the claim that in empirical economics syntactic sugar for 'accumulate' would be particularly useful. Sure, you *can* express a second-order difference equation as two first-order equations, and perhaps you would actually calculate it that way. But in economics we normally express a second-order diff eq as a second-order diff eq. If the input is a second-order equation that needs to be reformulated as a system of first-order equations, then with the code required to implement such transformations, it is not obvious to me that having syntax for first-order equations is going to be worth the extra syntax in the language. Most of the complexity is going to be in the transformation which AFAICS is likely to be problem-specific, and therefore unavoidable by the modeler. OTOH, as PyScanPrev shows, the complexity of recursion can be hidden in a decorator, which the modeler can cargo-cult. 
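(To make that concrete - this is an illustrative sketch with made-up coefficients, not code from the thread or from PyScanPrev - the usual rewrite of a second-order relation as a first-order system already fits itertools.accumulate if the carried "state" is a pair:)

from itertools import accumulate, chain, repeat

def second_order(a1, a2, x0, x1, n):
    # First n terms of x[k] = a1*x[k-1] + a2*x[k-2], computed by carrying
    # the last two terms as the single "previous output" accumulate sees.
    if n <= 2:
        return [x0, x1][:n]
    def step(pair, _unused):
        return (pair[1], a1 * pair[1] + a2 * pair[0])
    pairs = accumulate(chain([(x0, x1)], repeat(None, n - 2)), step)
    return [x0] + [latest for _, latest in pairs]

print(second_order(1, 1, 0, 1, 10))   # Fibonacci: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]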
Furthermore, in much of modern time-series econometrics, the order of the equation (number of lagged values to include) will be determined from the data and may differ from variable to variable and across equations, in which case you're effectively going to be carrying around a huge chunk of the data set as "state" (much of it unused) in each list element, which seems like a pretty clunky way to think about such problems computationally, however useful it may be in the theory of mathematical dynamics. I grant that 40 years in the field studying econometrics in terms of fixed data matrices has probably caused my synapses to clot -- and that's precisely why I'm asking *you* to explain to me how to beautify such code using the constructs you propose. I think the examples presented are already quite pretty without new syntax. As for economic theory, theory papers in economics don't include Python programs that I've seen. So having syntax for this feature in Python seems unlikely to improve presentation of economic theory. (I suppose that keeping "theoretical" in the quote was just an accident, but I could be missing something.) > The proposal isn't about quick and dirty code. The State-space example > includes a general linear time-varying MIMO simulation implementation > trying to keep the syntax as similar as possible to the control theory > engineers are used to. Fine, but I only questioned economics. I'm not a rocket scientist, I'll let the rocket scientists question that. If they don't, *I* certainly will concede you have a point in those other fields. > Also, my goal when I was looking for a scan syntax > to solve the conditional toggling example was to make it cleaner. >>> @enable_scan("p") ... def ltvss(A, B, C, D, u, x0=0): ... Ak, Bk, Ck, Dk = map(iter, [A, B, C, D]) ... u1, u2 = itertools.tee(u, 2) ... x = (next(Ak) * p + next(Bk) * uk for uk in prepend(x0, u1)) ... y = (next(Ck) * xk + next(Dk) * uk for xk, uk in zip(x, u2)) ... return y And this needs syntax now ... why? > And where's the link? To what? > > [...]. From what I remember, the conclusion reached was that > > there are too many degrees of freedom to be able to express > > reduction operations in a comprehension-like way that's any > > clearer I didn't write this. Please keep your attributions straight. > I can't talk about a discussion I didn't read, it would be unfair, > disrespectful. You're perfectly willing to tell other people what to read, though. I realize times have changed since 1983, but as long as I've been on the 'net, it's always been considered polite to investigate previous discussions, and AFAIK it still is. Of course that has to be balanced against search costs, but now that you know that what you're looking for exists, the expected benefit of search has jumped, and of course you could ask David for hints to minimize the search costs. No? > Perhaps there are people who prefer masochist rituals instead of > using "reduce", who knows? Who cares? (1) There are. (2) Evidently you don't. (3) You should, because the leading (literally) person who prefers writing code himself to using "reduce" is Guido van Rossum. > This maillist isn't very inviting... Why do you say that? Because people aren't falling over themselves to accept your proposal? You've been asked a simple question several times: why does this feature need to be implemented as syntax rather than a function? 
The only answers I've seen are the importance of the applications (not contested by anybody AFAICS), and your preference for syntax over a function. You have provided strong motivation for the feature, but not for an implementation via new syntax. > but I hope some of you at least try to read the rationale and the > examples. I did. They are very persuasive ... up to the point where you ask for syntax for something that appears (now that you've done it, kudos!) to be perfectly do-able with a function. It's not for me to say yes or no, but I can tell you that the outcomes of past discussions of this kind indicate that it will be unlikely that this proposal will be accepted without better justification for adding new syntax, preferably something that's impossible to implement performantly with a function. Or perhaps a "killer example" that persuades Guido or one of the other Most Senior Devs that this is way too cool to go without syntax. Steve From ncoghlan at gmail.com Tue Oct 25 10:22:55 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Oct 2016 00:22:55 +1000 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Message-ID: On 25 October 2016 at 23:20, Michel Desmoulin wrote: > Should we make a PEP with all of those ? No, incrementally improving error messages doesn't require PEP level advocacy - it just requires folks doing the development and review work of updating them without breaking anything, and adjusting the test suite as needed. In a lot of cases what's feasible with an error message (particularly from C code) depends a great deal on what information is readily available at the point the error is being reported, in others it's just that the particular error message hasn't been updated yet to be a bit more user friendly, so it's hard to establish new general principles around error reporting. The question does make wonder if we should consider "Find and improve an error message that annoys you because it omits frequently relevant information" as our new default "I'm interested in contributing, but I don't know what to work on" recommendation? While we don't want folks changing error messages for the sake of changing them, or overwhelming users with frequently irrelevant details, there's still a wide array of error messages that could stand to provide a bit more context regarding what went wrong, and it's the kind of change that can help more folks start to see software errors as "I can solve this!" puzzles rather than "It doesn't work and I don't know where to start in figuring out why not" road blocks. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Tue Oct 25 10:29:21 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 25 Oct 2016 15:29:21 +0100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: <22543.26081.957166.331323@turnbull.sk.tsukuba.ac.jp> References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> <22542.49891.835655.410788@turnbull.sk.tsukuba.ac.jp> <22543.26081.957166.331323@turnbull.sk.tsukuba.ac.jp> Message-ID: On 25 October 2016 at 15:02, Stephen J. Turnbull wrote: > I did. They are very persuasive ... 
up to the point where you ask for > syntax for something that appears (now that you've done it, kudos!) to > be perfectly do-able with a function. This is an important point by the way. Ideas on this list typically start with proposals along the lines of "let's make X a builtin". Discussion tends to centre around whether X is sufficiently important to be built in - and the OP experiences a lot of pushback. That's natural - the bar for getting something into the stdlib is very high, for a builtin higher still, and for new syntax even higher. The fact that we *do* get proposals accepted is a testament to the high quality of some of the proposals. But it doesn't mean that everything warrants being accepted. The majority won't be. On the other hand, the *ideas* are really interesting and valuable. I'm certainly planning on looking at PyScanPrev when I get the chance. And the discussions can frequently make people rethink their beliefs. So people posting ideas here should expect pushback - and should be prepared to learn how to think about the wider context in which changes to Python need to exist. That pushback won't[1] be hostile or negative, although it can feel that way to newcomers. But if a poster is inclined to take challenges to their idea personally, and even more so if they respond negatively, things can get tense. So please don't :-) So, bringing this back on topic - Danilo, what is your justification for suggesting that this technique should be language syntax, as opposed to simply being a 3rd party module (which you've already written, which is great)? Do you know what sorts of things would be viewed as evidence in favour of promoting this to syntax, or can we help in clarifying the sort of evidence you'd need to collect? Are the relevant design guidelines (things like "there should be one obvious way to do it" that frequently get quoted around here without much explanation) clear to you, or do you have questions? Hopefully we can change your mind about how inviting you find us :-) Paul [1] With the occasional exception that we regret - we're only human, although we try to hold to high standards. From steve at pearwood.info Tue Oct 25 11:00:31 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Oct 2016 02:00:31 +1100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> <22542.49891.835655.410788@turnbull.sk.tsukuba.ac.jp> Message-ID: <20161025150024.GE15983@ando.pearwood.info> On Tue, Oct 25, 2016 at 05:18:46AM -0200, Danilo J. S. Bellini wrote: > > [...]. From what > > I remember, the conclusion reached was that there are too > > many degrees of freedom to be able to express reduction > > operations in a comprehension-like way that's any clearer Danilo, if you are going to quote somebody, especially when you copy and paste their words out of a completely different email from the one you are replying to, please tell us who you are quoting. It is not polite to quote somebody without attribution and out of context. > I don't know if that's a conclusion from any other thread, but that's > wrong. The only extra "freedom" required a way to access the previous > output (or "accumulator", "memory", "state"... you can use the name you > prefer, but they're all the same). How many parameters does itertools.scan > have? And map/filter? I do not know which email thread is being talked about here, but the conclusion is correct. 
In the most general case, you might want: - the current value of some running total or result; - the previous value of the running result; - the value of the running result before that ("previous previous"); - and so on; - more than one running calculation at the same time; - the current sequence value (for x in [1, 2, 3]); - the previous sequence value (previous value of x); - the sequence value before that ("previous previous x"); - etc. Recurrence relations are not all linear, and they don't always involve only the previous value. Fibonacci numbers need to track two previous running values: F[n] = F[n-1] + F[n-2] The Lagged Fibonacci generator is a pseudo-random number generator that uses a similar recurence, except instead of only needing to remember the previous two results, it remembers some arbitrary N previous results. E.g. we might say: S[n] = S[n - 4] + S[n - 9] and so we use the 4th-previous and 9th-previous result to generate the new one. Just having the current running result is not sufficient. [...] > Forget the word "reduce", some people here seem to have way too much taboo > with that word, and I know there are people who would prefer a higher > McCabe complexity just to avoid it. Perhaps there are people who prefer > masochist rituals instead of using "reduce", who knows? And perhaps there are people who think that using "reduce" and other functional idioms instead of good, clean, easy-to-understand, easy-to- debug imperative code is a "masochist ritual". > but I > wrote a lot about the scan use cases and no one here seem to have read what > I wrote, and the only reason that matters seem to be a kind of social > status, not really "reason". I probably wrote way more reasons for that > proposal than annotations could ever have. I read your use-cases. I went to the Github page and looked at the examples and the documentation. I didn't comment because it doesn't matter whether there is one use-case or a million use-cases, the fundamental question still applies: why do we need ANOTHER way of solving these problems when there are already so many? More use-cases doesn't answer that question. A million weak use-cases is still weak. The obvious way to solve these use-cases is still an imperative for-loop. Python is not Haskell, it is not a functional programming language. It has some functional programming features, but it is not and never will be intended for arbitrarily complex code to be written in a pure functional style. Comprehensions are intentionally kept simple. They are not a substitute for all loops, only the easy 80% or 90%. As confirmed by Nick Coghlan, comprehensions are absolutely and fundamentally intended as syntactic sugar for ONE and ONLY one pattern. For example: [expr for x in seq for y in seq2 if cond] is sugar for: result = [] for x in seq: for y in seq2: if cond: result.append(expr) If you need to keep the previous sequence item, or a running total, or break out of the loop early, or use a while loop, you cannot use a comprehension. So I'm afraid that completely rules out your idea of having a running total or result in a comprehension. That simply isn't compatible with the design of comprehensions as ruled by Guido and the core devs. Could you change their mind? It's not impossible. If you have an compelling answer to the questions "why does this need to be part of comprehension syntax? why not use a for-loop, or itertools.accumulate?" then of course they will reconsider. But the barrier is very high. It is not a matter of more use-cases. 
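(For instance, the lagged Fibonacci recurrence above needs no new machinery at all - an ordinary generator keeping a window of recent results does it. A sketch, with arbitrary seed values and modulus chosen purely for illustration:)

from collections import deque
from itertools import islice

def lagged_fib(seed, j=4, k=9, m=2**32):
    # S[n] = (S[n-j] + S[n-k]) % m -- needs the last k results, not just one.
    history = deque(seed, maxlen=k)
    if len(history) != k:
        raise ValueError("need at least k seed values")
    while True:
        new = (history[-j] + history[-k]) % m
        history.append(new)
        yield new

print(list(islice(lagged_fib(range(1, 10)), 5)))   # first 5 values from an arbitrary seed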
We already have solutions to those use-cases. You have to explain why the existing solutions don't work. -- Steve From mikhailwas at gmail.com Tue Oct 25 11:15:58 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Tue, 25 Oct 2016 17:15:58 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: <20161025023704.GD15983@ando.pearwood.info> References: <20161025023704.GD15983@ando.pearwood.info> Message-ID: On 25 October 2016 at 04:37, Steven D'Aprano wrote: >> I would be happy to see a somewhat more general and user friendly >> version of string.translate function. >> It could work this way: >> string.newtranslate(file_with_table, Drop=True, Dec=True) > Mikhail, I appreciate that you have many ideas and want to share them, > but try to think about how those ideas would work. The Python standard > library is full of really well-designed programming interfaces. You can > learn a lot by thinking "what existing function is this like? how does > that existing function work?". Hi Steven, Thank you for the reply. I agree the idea with the file is not good, I already agreed with that and that was pointed by others too. Of course it is up to me how do I store the table. I will try to be more precise with my ideas ;) The new str.translate() interface is indeed much more versatile and provides good ways to define the table. >Or it can take a mapping (usually a dict) that maps either characters or >ordinal numbers to a new string (not just a single character, but an >arbitrary string) or ordinal numbers. > > str.maketrans({'a': 'A', 98: 66, 0x63: 0x:43}) >(or None, to delete them). Note the flexibility: you don't need to Good. But of course if I do it with big tables, I would anyway need to parse them from some table file. Typing all values direct in code is not a comfortable way. This again should make it clear how I become the "None" value after parsing the table from plain format like 97:[nothin here] (another point for my research). > Could it be better? Perhaps. I've suggested that maybe translate could > automatically call maketrans if given more than one argument. Maybe > there's an easier way to just delete unwanted characters. Perhaps there > could be a way to say "any character not in the translation table should > be dropped". These are interesting questions. So my previous thought on it was, that there could be set of such functions: str.translate_keep(table) - this is current translate, namely keeps non-defined chars untouched str.translate_drop(table) - all the same, but dropping non-defined chars Probaly also a pair of functions without translation: str.remove(chars) - removes given chars str.keep(chars) - removes all, except chars Motivation is that those can be optimised for speed and I suppose those can work faster than re.sub(). The question is how common are these tasks, I don't have any statistics regarding this. >There are no 16-bit strings. >Unicode is a 21-bit encoding, usually encoded as either fixed-width >sequence of 4-byte code units (UTF-32) or a variable-width sequence of >2-byte (UTF-16) or 1-byte (UTF-8) code units. But it absolutely is not a >"16-bit string". So in general case they should expand to 32 bit unsigned integers if I understand correctly? IIRC, Windows uses UTF16 for filenames. Anyway I will not pretend I can give any ideas regarding optimising thing there. 
It is just that I tend to treat those translate/filter functions as purely numeric, so I should be able to use those on any data chunk without thinking, if it is a text or not, this implies of course I must be sure that units are expanded to fixed bytesize. >> but as said I don't like very much the idea and would be OK for me to >> use numeric values only. > I think you are very possibly the only Python programmer in the world > who thinks that writing decimal ordinal values is more user-friendly > than writing the actual character itself. I know I would much rather > see $, ? or ? than 36, 960 or 9556. Yeah I am strange. This however gives you guarantee for any environment that you can see and input them ans save the work in ASCII. Mikhail From yselivanov.ml at gmail.com Tue Oct 25 11:59:29 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 25 Oct 2016 11:59:29 -0400 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> <22542.48277.333896.349836@turnbull.sk.tsukuba.ac.jp> Message-ID: <266c6cdc-0bc7-9240-cc11-98bebbf1cbf5@gmail.com> On 2016-10-25 4:33 AM, Nick Coghlan wrote: > I'm starting to think that we instead need a way > to let them easily say "This resource, the one I just created or have > otherwise gained access to? Link its management to the lifecycle of > the currently running function or frame, so it gets cleaned up when it > finishes running". But how would it help with a partial iteration over generators with a "with" statement inside? def it(): with open(file) as f: for line in f: yield line Nathaniel proposal addresses this by fixing "for" statements, so that the outer loop that iterates over "it" would close the generator once the iteration is stopped. With your proposal you want to attach the opened file to the frame, but you'd need to attach it to the frame of *caller* of "it", right? Yury From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Oct 25 13:10:40 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 26 Oct 2016 02:10:40 +0900 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> Message-ID: <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> Mikhail V writes: > Good. But of course if I do it with big tables, I would anyway > need to parse them from some table file. That is the kind of thing we can dismiss (for now) as a "SMOP" = "simple matter of programming". You know how to do it, we know how to do it, if it needs optimization, we can do it later. The part that requires discussion is the API design. > So my previous thought on it was, that there could be set of such functions: > > str.translate_keep(table) - this is current translate, namely keeps > non-defined chars untouched > str.translate_drop(table) - all the same, but dropping non-defined chars > > Probaly also a pair of functions without translation: > str.remove(chars) - removes given chars > str.keep(chars) - removes all, except chars > > Motivation is that those can be optimised for speed and I suppose those > can work faster than re.sub(). Others are more expert than I, but as I understand it, Python's function calls are expensive enough that dispatching to internal routines based on types of arguments adds negligible overhead. Optimization also can wait. That said, multiple methods is a valid option for the API. 
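(For illustration only - the names below are copied from the proposal earlier in the thread, but the bodies use nothing beyond machinery that already exists - all four operations can be written today as thin wrappers:)

def translate_keep(s, table):
    # today's str.translate: characters missing from the table pass through
    return s.translate(table)

def translate_drop(s, table):
    # translate what the table mentions, drop everything else
    pieces = []
    for ch in s:
        if ord(ch) in table:                 # maketrans keys are ordinals
            repl = table[ord(ch)]
            if repl is not None:             # None still means "delete"
                pieces.append(repl if isinstance(repl, str) else chr(repl))
    return ''.join(pieces)

def remove(s, chars):
    # drop the given characters, keep the rest
    return s.translate({ord(c): None for c in chars})

def keep(s, chars):
    # drop everything except the given characters
    wanted = set(chars)
    return ''.join(c for c in s if c in wanted)

print(remove("hello world", "lo"))   # -> "he wrd"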
Eg, Guido generally prefers that distinctions that can't be made on type of arguments (such as translate_keep vs translate_drop) be done by giving different names rather than a flag argument. Do you *like* this API, or was this motivated primarily by the possibilities you see for optimization? > The question is how common are these tasks, I don't have any > statistics regarding this. Frequency is useful information, but if you don't have it, don't worry about it. > So in general case they should expand to 32 bit unsigned integers if I > understand correctly? No. The internal string representation is described here: https://www.python.org/dev/peps/pep-0393/. As in the Unicode standard itself, you should think of characters as integers. Yes, with PEP 393 you can deduce the representation of a string from its contents, but you can't guess for individual characters in a longer string -- the whole string has the width needed for its widest character. > so I should be able to use those on any data chunk without > thinking, if it is a text or not, this implies of course I must be > sure that units are expanded to fixed bytesize. The width is constant for any given string. However, I don't see at this point that you'll need more than the functions available in Python already, plus one or more wrappers to marshal the information your API accepts to the data that str.translate wants. Of course later it may be worthwhile to rewrite the wrapper in C and merge it into the existing str.translate(), or the multiple methods you suggest above. > >> but as said I don't like very much the idea and would be OK for me to > >> use numeric values only. > Yeah I am strange. This however gives you guarantee for any environment that you > can see and input them ans save the work in ASCII. This is not going to be a problem if you're running Python and can enter the program and digits. In any case, the API is going to have to be convenient for all the people who expect that they will never again be reduced to a hex keypad and 7-segment display. From brenbarn at brenbarn.net Tue Oct 25 14:01:04 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Tue, 25 Oct 2016 11:01:04 -0700 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> <22542.49891.835655.410788@turnbull.sk.tsukuba.ac.jp> Message-ID: <580F9DE0.8040104@brenbarn.net> On 2016-10-25 00:18, Danilo J. S. Bellini wrote: > > Well, see the itertools.accumulate examples yourself then, the ones at > docs.python.org... We can start with something really simple like > interest rates or uniform series, but... before arguing here, please > convince other people to update the Wikipedia: > > "Recurrence relations, especially linear recurrence relations, are used > extensively in both theoretical and empirical economics." > https://en.wikipedia.org/wiki/Recurrence_relation#Economics > The fact that that page is about recurrence relations supports the position that your proposed change is too specific. Recurrence relations are much more general than just "have access to the previous value". They may have access to any of the earlier values, and/or multiple earlier values. So if what we wanted was to able to use recurrence relations, your proposal would be insufficient. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From rob.cliffe at btinternet.com Tue Oct 25 13:55:25 2016 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 25 Oct 2016 18:55:25 +0100 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> Message-ID: On 24/10/2016 06:11, Danilo J. S. Bellini wrote: > For example, a product: > > >>> [prev * k for k in [5, 2, 4, 3] from prev = 1] > [1, 5, 10, 40, 120] > > That makes sense for me, and seem simpler than: > > >>> from itertools import accumulate, chain > >>> list(accumulate(chain([1], [5, 2, 4, 3]), lambda prev, k: prev * k)) > [1, 5, 10, 40, 120] > Well, if you want an instant reaction from someone skimming this thread: I looked at the first example and couldn't understand it. Then I looked at the second one, and could understand it (even though I may never have used "chain" or heard of "accumulate"). Obviously your mileage varies. Best wishes, Rob Cliffe -------------- next part -------------- An HTML attachment was scrubbed... URL: From desmoulinmichel at gmail.com Tue Oct 25 15:11:06 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Tue, 25 Oct 2016 21:11:06 +0200 Subject: [Python-ideas] f-string, for dictionaries Message-ID: We have a syntax to create strings with variables automatically inferred from its context: >>> name = "Guido" >>> print(f'Hello {name}') Hello Guido Similarly, I'd like to suggest a similar feature for building dictionaries: >>> foo = 1 >>> bar = 2 >>> {:bar, :foo} {'bar': 1, 'foo', 2} And a similar way to get the content from the dictionary into variables: >>> values = {'bar': 1, 'foo', 2} >>> {:bar, :foo} = values >>> bar 1 >>> foo 2 The syntaxes used here are of course just to illustrate the concept and I'm suggesting we must use those. From njs at pobox.com Tue Oct 25 15:17:56 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Oct 2016 12:17:56 -0700 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Message-ID: On Tue, Oct 25, 2016 at 6:58 AM, Chris Angelico wrote: > On Wed, Oct 26, 2016 at 12:20 AM, Michel Desmoulin > wrote: >> list, set and tuple less not as good: >> >> >>> tuple(foo) >> >> TypeError: 'int' object is not iterable >> >> No raiser, no value given. It's hard to find out what's the problem is. The >> biggest issue here is that if you have a long line with tuple() in the >> middle, yuou need to know the problem comes from tuple. >> >> Another problem is that many people don't know what iterable means. >> >> A better error message would be: >> >> TypeError: tuple() only accept iterables (any object you can use a for loop >> on). But it received '1', which is of type . > > -1 on this one. It doesn't really add very much - "iterable" is a good > keyword that anyone can put into a search engine. Adding the repr of > the object that was passed is nice if it's an integer, but less so if > you passed in some huge object. Agreed that showing the repr in general is tricky. The length isn't such a big deal (you need some cleverness -- don't show the repr if it's >40 characters, say -- but that's doable if someone wants to do the work). What I'd be a little nervous of is the time/memory cost of computing the repr for an arbitrary object every time a TypeError is thrown. 
One possible workaround for this would be to be lazy, i.e. wait until someone actually calls str() on the exception object before computing the repr, but the problem then is that you'd have to attach the offending object to the TypeError, which pins it in memory longer than it otherwise would be. Perhaps this is an acceptable trade-off -- having the offending object around is actually pretty useful for debugging! But we probably can't just blindly start showing repr's everywhere -- it'll need some case-by-case consideration. A good example actually might be the int constructor -- if the object is an unrecognized type, then it shows the type; if it's a recognized type but the value is bad, then it shows the value: In [3]: int([]) TypeError: int() argument must be a string, a bytes-like object or a number, not 'list' In [4]: int("a") ValueError: invalid literal for int() with base 10: 'a' > If your lines of code are so complicated that you can't pinpoint the > cause of the TypeError, the solution is probably to break the line. Yes, breaking the line will also work, but often difficult to do (e.g. you might have a traceback but not an easy recipe to reproduce the exception), and anyway, why is that a reason to punish people who haven't learned that trick? I get that this list's default is to push back on proposed changes, and it's a good principle in general, but "improved error messages" are *really* cheap. The bar should be pretty low, IMO. If someone's willing to do the work to make the error messages friendlier, while addressing technical considerations like the repr issue, then that's awesome, and if that means that users get somewhat redundant information to help them debug then... cool? Terseness per se is not a cardinal virtue :-). -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Tue Oct 25 15:23:06 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Oct 2016 12:23:06 -0700 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Message-ID: On Tue, Oct 25, 2016 at 6:20 AM, Michel Desmoulin wrote: > Some things deserve a big explanation to solve the problem. It would be nice > to add a link to official tutorial in the documentation. > > E.G, encoding is a big one: > > In [8]: b'?' + '?' > File "", line 1 > b'?' + '?' > ^ > SyntaxError: bytes can only contain ASCII literal characters. > > This is not helpful to somebody unaware of the difference between text and > bytes. > > Possible solution: > > In [8]: b'?' + '?' > File "", line 1 > b'?' + '?' > ^ > SyntaxError: You cannnot concatenate bytes (b'?...') with > a string ('?...'). Learn more about fixing this error at > https://doc.python.org/errors/7897978 I don't disagree with the principle, but I don't see how this particular example works. The interpreter here doesn't know that you're trying concatenate a bytes and a string, because the error happens before that, when it tries to make the bytes object. These really are two different errors. -n -- Nathaniel J. 
Smith -- https://vorpus.org From p.f.moore at gmail.com Tue Oct 25 16:27:58 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 25 Oct 2016 21:27:58 +0100 Subject: [Python-ideas] f-string, for dictionaries In-Reply-To: References: Message-ID: On 25 October 2016 at 20:11, Michel Desmoulin wrote: > Similarly, I'd like to suggest a similar feature for building dictionaries: > >>>> foo = 1 >>>> bar = 2 >>>> {:bar, :foo} > {'bar': 1, 'foo', 2} I don't see a huge advantage over >>> dict(foo=foo, bar=bar) Avoiding having to repeat the variable names doesn't feel like a killer advantage here, certainly not sufficient to warrant adding yet another dictionary construction syntax. Do you have examples of code that would be substantially improved with this syntax (over using an existing approach)? > And a similar way to get the content from the dictionary into variables: > >>>> values = {'bar': 1, 'foo', 2} >>>> {:bar, :foo} = values >>>> bar > 1 >>>> foo > 2 There aren't as many obvious alternative approaches here, but it's not clear why you'd want to do this. So in this case, I'd want to see real-life use cases. Most of the ones I can think of are just to allow a shorter form for values['foo']. For those uses >>> from types import SimpleNamespace >>> o = SimpleNamespace(**values) >> o.foo 1 works pretty well. > The syntaxes used here are of course just to illustrate the concept and I'm > suggesting we must use those. Well, if we ignore the syntax for a second, what is your proposal exactly? It seems to be in 2 parts: 1. "We should have a dictionary building feature that uses keys based on variables from the local namespace". OK, that's not something I've needed much, and when I have, there have usually been existing ways to do the job (such as dict(foo=foo) noted above) that are perfectly sufficient. Sometimes the existing alternatives look a little clumsy and repetitive, but that's a very subjective judgement, and any new syntax could easily look worse to me (your specific proposal, for example, does). So I can see a small potential benefit in (subjective) readability, but that's offset by the proposal being another way of doing something that's already pretty well covered in the language. Add to that all of the "normal" objections to new syntax (more to teach/learn, hard to google, difficulty finding a form that suits everyone, etc) and it's hard to see this getting accepted. 2. "We should have a way of unpacking a dictionary into local variables". That's not something that I can immediately think of a way of doing currently - so that's a point in its favour. But honestly, I've never seen the need to do this outside of interactive use (for which see below). If all you want is to avoid the d['name'] syntax, which is quite punctuation-heavy, the SimpleNamespace trick above does that. So there's almost no use case that I can see for this. Can you give examples of real-world code where this would be useful? On the other hand, your proposal, like many that have come up recently, seems to be driven (if it's OK for me to guess at your motivations) by an interest in being able to write relatively terse one-liners, or at least to avoid some of the syntactic overheads of existing constructs. It seems to me that the environment I'd most want to do this in is the interactive interpreter. So I wonder if this (and similar) proposals are driven by a feeling that it's "clumsy" writing code at the interactive prompt. That may well be so. 
The standard interactive prompt is pretty basic, and yet it's a *huge* part of the unique experience working with Python to be able to work at the prompt as you develop. So maybe there's scope for discussion here on constructs focused more on interactive use? That probably warrants a separate thread, though, so I'll split it off from this discussion. Feel free to contribute there if I'm right in where I think the motivation for your proposals came from. Paul From p.f.moore at gmail.com Tue Oct 25 17:13:54 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 25 Oct 2016 22:13:54 +0100 Subject: [Python-ideas] A better interactive prompt Message-ID: I've seen a lot of syntax proposals recently that are based around providing better ways of writing "one liner" styles of code. Typically, the proposals seem to get into trouble because: 1. They duplicate things that can already be done, just not in a single expression/statement. 2. They are seen as over-terse, which is not generally seen as a good thing in Python. However, looking at them from the point of view of someone working at the interactive prompt, they can seem much more attractive. The natural unit of interaction at the command line is the single line. To the extent that (for example) fixing a mistake in a multi-line construct at the command line is a real pain. But these limitations are not inherent to Python - they are problems with the interactive prompt, which is fairly basic[1]. So maybe it's worth looking at the root issue, how to make the interactive prompt easier to use[2]? But that's something of a solved problem. IPython offers a rich interactive environment, for people who find the limitations of the standard interactive prompt frustrating. Would it be worth the standard Python documentation promoting IPython for that role? Maybe even, if IPython is available, allowing the user to configure Python to use it by default as the interactive prompt (a bit like readline, but I dislike the way you can't switch off readline integration if it's installed)? Ideally, if IPython was more readily available, fewer users would be frustrated with Python's existing multi-line constructs. And those that were, would have the option of looking into custom IPython magic commands, before being forced to request language changes. Thoughts? Paul [1] On the other hand, the interactive prompt is a huge part of what makes Python so great - these days, when I have to code in languages that don't have an interactive prompt, it drives me nuts. And even those that do, typically don't have one as good as Python's (in spite of the fact that this whole mail is about needing to improve the Python REPL). [2] My apologies to anyone whose proposal was *not* based around interactive use cases. I'm assuming motives here left, right and centre, so if what I'm saying isn't what you were intending, that's fine. Treat this as an unrelated proposal. From desmoulinmichel at gmail.com Tue Oct 25 17:18:10 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Tue, 25 Oct 2016 23:18:10 +0200 Subject: [Python-ideas] f-string, for dictionaries In-Reply-To: References: Message-ID: <41341db4-b399-1adb-3da1-b76fefed3460@gmail.com> Le 25/10/2016 ? 
22:27, Paul Moore a ?crit : > On 25 October 2016 at 20:11, Michel Desmoulin wrote: >> Similarly, I'd like to suggest a similar feature for building dictionaries: >> >>>>> foo = 1 >>>>> bar = 2 >>>>> {:bar, :foo} >> {'bar': 1, 'foo', 2} > > I don't see a huge advantage over > >>>> dict(foo=foo, bar=bar) > > Avoiding having to repeat the variable names doesn't feel like a > killer advantage here, certainly not sufficient to warrant adding yet > another dictionary construction syntax. Do you have examples of code > that would be substantially improved with this syntax (over using an > existing approach)? {:bar, :foo} vs dict(foo=foo, bar=bar) has the same benefit that would have f"hello {foo} {bar}" vs "hello {} {}".format(foo, bar) > >> And a similar way to get the content from the dictionary into variables: >> >>>>> values = {'bar': 1, 'foo', 2} >>>>> {:bar, :foo} = values >>>>> bar >> 1 >>>>> foo >> 2 > > There aren't as many obvious alternative approaches here, but it's not > clear why you'd want to do this. So in this case, I'd want to see > real-life use cases. Most of the ones I can think of are just to allow > a shorter form for values['foo']. For those uses > > >>> from types import SimpleNamespace > >>> o = SimpleNamespace(**values) > >> o.foo > 1 > > works pretty well. This is just unpacking for dicts really. As you would do: a, b = iterable you do: {:a, :b} = mapping > >> The syntaxes used here are of course just to illustrate the concept and I'm >> suggesting we must use those. > > Well, if we ignore the syntax for a second, what is your proposal > exactly? It seems to be in 2 parts: > > 1. "We should have a dictionary building feature that uses keys based > on variables from the local namespace". OK, that's not something I've > needed much, and when I have, there have usually been existing ways to > do the job (such as dict(foo=foo) noted above) that are perfectly > sufficient. Sometimes the existing alternatives look a little clumsy > and repetitive, but that's a very subjective judgement, and any new > syntax could easily look worse to me (your specific proposal, for > example, does). So I can see a small potential benefit in (subjective) > readability, but that's offset by the proposal being another way of > doing something that's already pretty well covered in the language. > Add to that all of the "normal" objections to new syntax (more to > teach/learn, hard to google, difficulty finding a form that suits > everyone, etc) and it's hard to see this getting accepted. > > 2. "We should have a way of unpacking a dictionary into local > variables". That's not something that I can immediately think of a way > of doing currently - so that's a point in its favour. But honestly, > I've never seen the need to do this outside of interactive use (for > which see below). If all you want is to avoid the d['name'] syntax, > which is quite punctuation-heavy, the SimpleNamespace trick above does > that. So there's almost no use case that I can see for this. Can you > give examples of real-world code where this would be useful? > > On the other hand, your proposal, like many that have come up > recently, seems to be driven (if it's OK for me to guess at your > motivations) by an interest in being able to write relatively terse > one-liners, or at least to avoid some of the syntactic overheads of > existing constructs. It seems to me that the environment I'd most want > to do this in is the interactive interpreter. 
So I wonder if this (and > similar) proposals are driven by a feeling that it's "clumsy" writing > code at the interactive prompt. That may well be so. The standard > interactive prompt is pretty basic, and yet it's a *huge* part of the > unique experience working with Python to be able to work at the prompt > as you develop. So maybe there's scope for discussion here on > constructs focused more on interactive use? That probably warrants a > separate thread, though, so I'll split it off from this discussion. > Feel free to contribute there if I'm right in where I think the > motivation for your proposals came from. > > Paul Currently I already have shortcuts those features. I have wrappers for dictionaries such as: d(mapping).unpack('foo', 'bar') Which does some hack with stack frame and locals(). And: d.from_vars('foo', 'bar') I use them only in the shell of course, because you can't really have such hacks in production code. I would use such features in my production code if they was a clean way to do it. It's just convenience syntaxic sugar. You can argue that decorator could be written: def func(): pass func = decorator(func) Instead of: @decorator def func(): pass But the second one is more convenient. And so are comprehensions, unpacking, and f-strings. Clearly not killer features, just nice to have. From p.f.moore at gmail.com Tue Oct 25 17:22:51 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 25 Oct 2016 22:22:51 +0100 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Message-ID: On 25 October 2016 at 20:17, Nathaniel Smith wrote: > I get that this list's default is to push > back on proposed changes, and it's a good principle in general, but > "improved error messages" are *really* cheap. The bar should be pretty > low, IMO. If someone's willing to do the work to make the error > messages friendlier, while addressing technical considerations like > the repr issue, then that's awesome, and if that means that users get > somewhat redundant information to help them debug then... cool? > Terseness per se is not a cardinal virtue :-). Agreed, improved error messages are always worthwhile. There may be issues (you mentioned one earlier in your message), but typically they are about implementation, not about the principle, so I'd say go ahead and implement the proposal, post it at bugs.python.org, and things like that can be thrashed out in code review. For people who don't have the knowledge to actually code the change, a feature request on the tracker is probably a fine start. Paul From chris.barker at noaa.gov Tue Oct 25 17:39:07 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 25 Oct 2016 14:39:07 -0700 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Message-ID: On Tue, Oct 25, 2016 at 6:58 AM, Chris Angelico wrote: > > >>> tuple(foo) > > > > TypeError: 'int' object is not iterable > > > > No raiser, no value given. It's hard to find out what's the problem is. > The > > biggest issue here is that if you have a long line with tuple() in the > > middle, yuou need to know the problem comes from tuple. > > > > Another problem is that many people don't know what iterable means. 
> > > > A better error message would be: > > > > TypeError: tuple() only accept iterables (any object you can use a for > loop > > on). But it received '1', which is of type . > > -1 on this one. It doesn't really add very much - "iterable" is a good > keyword that anyone can put into a search engine. yes, that's OK -- and that is the spec of the tuple constructor, yes? > Adding the repr of > the object that was passed is nice if it's an integer, but less so if > you passed in some huge object. > I'm not sure you need the repr of the object passed in -- the type is usually sufficient. (for a TypeError -- for a ValueError, then the value IS important, and a repr is nice. > If your lines of code are so complicated that you can't pinpoint the > cause of the TypeError, the solution is probably to break the line. yes, but it would be nice not to have to -- maybe I'm just overdoing the one-liners, but I VERY often have errors liek this on a line, and have to go in and break the line by hand to find out where the error is actually coming from. SyntaxErrors make some effort to indicate WHERE in the line the Error is - it would be great to get some help like that in these cases. Not sure how possible it is though. As I think about it, I tend to get this with indexing error, where a have a fairly complex expression with multiple objects being indexed, and then a get an IndexError and have no idea where the problem is. > > SyntaxError: bytes can only contain ASCII literal characters. > > > > This is not helpful to somebody unaware of the difference between text > and > > bytes. > > Someone unaware of the difference between text and bytes probably > isn't messing with code that has b"..." strings in it. or shouldn't be :-) -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Tue Oct 25 17:48:39 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 25 Oct 2016 16:48:39 -0500 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: Message-ID: So, based on everyone's feedback, I just created this: http://bugs.python.org/issue28536 On Mon, Oct 24, 2016 at 5:07 PM, Ryan Gonzalez wrote: > I personally find it kind of annoying when you have code like this: > > > x = A(1, B(2, 3)) > > > and Python's error message looks like this: > > > TypeError: __init__() takes 1 positional argument but 2 were given > > > It doesn't give much of a clue to which `__init__` is being called. At all. > > The idea: when showing the function name in an error like this, show the > fully qualified name, like: > > > TypeError: A.__init__() takes 1 positional argument but 2 were given > > > This would be MUCH more helpful! > > > Another related change would be to do the same thing in tracebacks: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in __init__ > AssertionError > > > to: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in MyClass.__init__ > AssertionError > > > which could make it easier to find where exactly an error originated. > > -- > Ryan (????) > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. > http://kirbyfan64.github.io/ > > -- Ryan (????) 
[ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Oct 25 17:50:32 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 25 Oct 2016 14:50:32 -0700 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Oct 24, 2016 at 7:00 PM, Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > Chris Barker writes: > > > I think the "better error message" option is the way to go, > > however. At least until we all have better Unicode support in all > > our tools.... > > I don't think "better Unicode support" helps with confusables in > programming languages that value TOOWTDI. that was kind of a throwaway comment, but I think it's a LONG way out, but ideally, the OWTDI would be "curly quotes". The fact that in ASCII, a single quote and a apostrophe are teh same, and that there is no distinction between opening and closing quotes is unfortunate. But it will be a LONG time before we'll all have text editors that can easily let us type that many different characters... and even more time before backward compatibility concerns are alleviated -- probably around the time I can have a snowball fight in the Bad Place. So let's jsut stick with what we have, eh? [1] Personally, I immediately liked the triple quotes, Me too -- I find myself using them in text email messages and the like -- not sure if non-pythonistas get it, but no one has complained yet. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Tue Oct 25 17:55:21 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 25 Oct 2016 16:55:21 -0500 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: Message-ID: Also, as an extension of this idea, would it be possible to improve errors like this: class X: pass X() # object() takes no parameters to show the actual type instead of just 'object'? On Tue, Oct 25, 2016 at 4:48 PM, Ryan Gonzalez wrote: > So, based on everyone's feedback, I just created this: > > http://bugs.python.org/issue28536 > > On Mon, Oct 24, 2016 at 5:07 PM, Ryan Gonzalez wrote: > >> I personally find it kind of annoying when you have code like this: >> >> >> x = A(1, B(2, 3)) >> >> >> and Python's error message looks like this: >> >> >> TypeError: __init__() takes 1 positional argument but 2 were given >> >> >> It doesn't give much of a clue to which `__init__` is being called. At >> all. >> >> The idea: when showing the function name in an error like this, show the >> fully qualified name, like: >> >> >> TypeError: A.__init__() takes 1 positional argument but 2 were given >> >> >> This would be MUCH more helpful! 
>> >> >> Another related change would be to do the same thing in tracebacks: >> >> >> Traceback (most recent call last): >> File "", line 1, in >> File "", line 2, in __init__ >> AssertionError >> >> >> to: >> >> >> Traceback (most recent call last): >> File "", line 1, in >> File "", line 2, in MyClass.__init__ >> AssertionError >> >> >> which could make it easier to find where exactly an error originated. >> >> -- >> Ryan (????) >> [ERROR]: Your autotools build scripts are 200 lines longer than your >> program. Something?s wrong. >> http://kirbyfan64.github.io/ >> >> > > > > -- > Ryan (????) > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. > http://kirbyfan64.github.io/ > > -- Ryan (????) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Oct 25 17:55:43 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 25 Oct 2016 14:55:43 -0700 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: But that's something of a solved problem. IPython offers a rich > interactive environment, for people who find the limitations of the > standard interactive prompt frustrating. Would it be worth the > standard Python documentation promoting IPython for that role? +1 iPython really makes it easier to do exploratory code -- I have my students install it day one of an intro to python class. However, maybe ironically, iPython is still a bit ugly for editing multi-line constructs -- maybe it will get better. The Jupyter (formally iPython) notebook is the way to go for that, but it has its other downsides... [2] My apologies to anyone whose proposal was *not* based around > interactive use cases. folks DO like compact code, regardless of context :-) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From python-ideas at mgmiller.net Tue Oct 25 17:49:33 2016 From: python-ideas at mgmiller.net (Mike Miller) Date: Tue, 25 Oct 2016 14:49:33 -0700 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: <20d303e1-57d7-8680-9fad-97e7319336a4@mgmiller.net> Would recommend bpython, it is lighter-weight and accessible to newbies, in the sense that a manual is not needed. It just starts helping out as you type. http://bpython-interpreter.org/ From rosuav at gmail.com Tue Oct 25 17:59:20 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 26 Oct 2016 08:59:20 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Oct 26, 2016 at 8:50 AM, Chris Barker wrote: > that was kind of a throwaway comment, but I think it's a LONG way out, but > ideally, the OWTDI would be "curly quotes". The fact that in ASCII, a single > quote and a apostrophe are teh same, and that there is no distinction > between opening and closing quotes is unfortunate. So should French programmers write string literals ?like this?? 
ChrisA From njs at pobox.com Tue Oct 25 18:25:18 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Oct 2016 15:25:18 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan wrote: > On 20 October 2016 at 07:02, Nathaniel Smith wrote: >> The first change is to replace the outer for loop with a while/pop >> loop, so that if an exception occurs we'll know which iterables remain >> to be processed: >> >> def chain(*iterables): >> try: >> while iterables: >> for element in iterables.pop(0): >> yield element >> ... >> >> Now, what do we do if an exception does occur? We need to call >> iterclose on all of the remaining iterables, but the tricky bit is >> that this might itself raise new exceptions. If this happens, we don't >> want to abort early; instead, we want to continue until we've closed >> all the iterables, and then raise a chained exception. Basically what >> we want is: >> >> def chain(*iterables): >> try: >> while iterables: >> for element in iterables.pop(0): >> yield element >> finally: >> try: >> operators.iterclose(iter(iterables[0])) >> finally: >> try: >> operators.iterclose(iter(iterables[1])) >> finally: >> try: >> operators.iterclose(iter(iterables[2])) >> finally: >> ... >> >> but of course that's not valid syntax. Fortunately, it's not too hard >> to rewrite that into real Python -- but it's a little dense: >> >> def chain(*iterables): >> try: >> while iterables: >> for element in iterables.pop(0): >> yield element >> # This is equivalent to the nested-finally chain above: >> except BaseException as last_exc: >> for iterable in iterables: >> try: >> operators.iterclose(iter(iterable)) >> except BaseException as new_exc: >> if new_exc.__context__ is None: >> new_exc.__context__ = last_exc >> last_exc = new_exc >> raise last_exc >> >> It's probably worth wrapping that bottom part into an iterclose_all() >> helper, since the pattern probably occurs in other cases as well. >> (Actually, now that I think about it, the map() example in the text >> should be doing this instead of what it's currently doing... I'll fix >> that.) > > At this point your code is starting to look a whole lot like the code > in contextlib.ExitStack.__exit__ :) One of the versions I tried but didn't include in my email used ExitStack :-). It turns out not to work here: the problem is that we effectively need to enter *all* the contexts before unwinding, even if trying to enter one of them fails. ExitStack is nested like (try (try (try ... finally) finally) finally), and we need (try finally (try finally (try finally ...))) But this is just a small side-point anyway, since most code is not implementing complicated meta-iterators; I'll address your real proposal below. > Accordingly, I'm going to suggest that while I agree the problem you > describe is one that genuinely emerges in large production > applications and other complex systems, this particular solution is > simply far too intrusive to be accepted as a language change for > Python - you're talking a fundamental change to the meaning of > iteration for the sake of the relatively small portion of the > community that either work on such complex services, or insist on > writing their code as if it might become part of such a service, even > when it currently isn't. 
Given that simple applications vastly > outnumber complex ones, and always will, I think making such a change > would be a bad trade-off that didn't come close to justifying the > costs imposed on the rest of the ecosystem to adjust to it. > > A potentially more fruitful direction of research to pursue for 3.7 > would be the notion of "frame local resources", where each Python > level execution frame implicitly provided a lazily instantiated > ExitStack instance (or an equivalent) for resource management. > Assuming that it offered an "enter_frame_context" function that mapped > to "contextlib.ExitStack.enter_context", such a system would let us do > things like: So basically a 'with expression', that gives up the block syntax -- taking its scope from the current function instead -- in return for being usable in expression context? That's a really interesting, and I see the intuition that it might be less disruptive if our implicit iterclose calls are scoped to the function rather than the 'for' loop. But having thought about it and investigated some... I don't think function-scoping addresses my problem, and I don't see evidence that it's meaningfully less disruptive to existing code. First, "my problem": Obviously, Python's a language that should be usable for folks doing one-off scripts, and for paranoid folks trying to write robust complex systems, and for everyone in between -- these are all really important constituencies. And unfortunately, there is a trade-off here, where the changes we're discussing effect these constituencies differently. But it's not just a matter of shifting around a fixed amount of pain; the *quality* of the pain really changes under the different proposals. In the status quo: - for one-off scripts: you can just let the GC worry about generator and file handle cleanup, re-use iterators, whatever, it's cool - for robust systems: because it's the *caller's* responsibility to ensure that iterators are cleaned up, you... kinda can't really use generators without -- pick one -- (a) draconian style guides (like forbidding 'with' inside generators or forbidding bare 'for' loops entirely), (b) lots of auditing (every time you write a 'for' loop, go read the source to the generator you're iterating over -- no modularity for you and let's hope the answer doesn't change!), or (c) introducing really subtle bugs. Or all of the above. It's true that a lot of the time you can ignore this problem and get away with it one way or another, but if you're trying to write robust code then this doesn't really help -- it's like saying the footgun only has 1 bullet in the chamber. Not as reassuring as you'd think. It's like if every time you called a function, you had to explicitly say whether you wanted exception handling to be enabled inside that function, and if you forgot then the interpreter might just skip the 'finally' blocks while unwinding. There's just *isn't* a good solution available. In my proposal (for-scoped-iterclose): - for robust systems: life is great -- you're still stopping to think a little about cleanup every time you use an iterator (because that's what it means to write robust code!), but since the iterators now know when they need cleanup and regular 'for' loops know how to invoke it, then 99% of the time (i.e., whenever you don't intend to re-use an iterator) you can be confident that just writing 'for' will do exactly the right thing, and the other 1% of the time (when you do want to re-use an iterator), you already *know* you're doing something clever. 
So the cognitive overhead on each for-loop is really low. - for one-off scripts: ~99% of the time (actual measurement, see below) everything just works, except maybe a little bit better. 1% of the time, you deploy the clever trick of re-using an iterator with multiple for loops, and it breaks, so this is some pain. Here's what you see: gen_obj = ... for first_line in gen_obj: break for lines in gen_obj: ... Traceback (most recent call last): File "/tmp/foo.py", line 5, in for lines in gen_obj: AlreadyClosedIteratorError: this iterator was already closed, possibly by a previous 'for' loop. (Maybe you want itertools.preserve?) (We could even have a PYTHONDEBUG flag that when enabled makes that error message include the file:line of the previous 'for' loop that called __iterclose__.) So this is pain! But the pain is (a) rare, not pervasive, (b) immediately obvious (an exception, the code doesn't work at all), not subtle and delayed, (c) easily googleable, (d) easy to fix and the fix is reliable. It's a totally different type of pain than the pain that we currently impose on folks who want to write robust code. Now compare to the new proposal (function-scoped-iterclose): - For those who want robust cleanup: Usually, I only need an iterator for as long as I'm iterating over it; that may or may not correspond to the end of the function (often won't). When these don't coincide, it can cause problems. E.g., consider the original example from my proposal: def read_newline_separated_json(path): with open(path) as f: for line in f: yield json.loads(line) but now suppose that I'm a Data Scientist (tm) so instead of having 1 file full of newline-separated JSON, I have a 100 gigabytes worth of the stuff stored in lots of files in a directory tree. Well, that's no problem, I'll just wrap that generator: def read_newline_separated_json_tree(tree): for root, _, paths in os.walk(tree): for path in paths: for document in read_newline_separated_json(join(root, path)): yield document And then I'll run it on PyPy, because that's what you do when you have 100 GB of string processing, and... it'll crash, because the call to read_newline_separated_tree ends up doing thousands of calls to read_newline_separated_json, but never cleans up any of them up until the function exits, so eventually we run out of file descriptors. A similar situation arises in the main loop of something like an HTTP server: while True: request = read_request(sock) for response_chunk in application_handler(request): send_response_chunk(sock) Here we'll accumulate arbitrary numbers of un-closed application_handler generators attached to the stack frame, which is no good at all. And this has the interesting failure mode that you'll probably miss it in testing, because most clients will only re-use a connection a small number of times. So what this means is that every time I write a for loop, I can't just do a quick "am I going to break out of the for-loop and then re-use this iterator?" check -- I have to stop and think about whether this for-loop is nested inside some other loop, etc. And, again, if I get it wrong, then it's a subtle bug that will bite me later. It's true that with the status quo, we need to wrap, X% of for-loops with 'with' blocks, and with this proposal that number would drop to, I don't know, (X/5)% or something. 
But that's not the most important cost: the most important cost is the cognitive overhead of figuring out which for-loops need the special treatment, and in this proposal that checking is actually *more* complicated than the status quo. - For those who just want to write a quick script and not think about it: here's a script that does repeated partial for-loops over a generator object: https://github.com/python/cpython/blob/553a84c4c9d6476518e2319acda6ba29b8588cb4/Tools/scripts/gprof2html.py#L40-L79 (and note that the generator object even has an ineffective 'with open(...)' block inside it!) With the function-scoped-iterclose, this script would continue to work as it does now. Excellent. But, suppose that I decide that that main() function is really complicated and that it would be better to refactor some of those loops out into helper functions. (Probably actually true in this example.) So I do that and... suddenly the code breaks. And in a rather confusing way, because it has to do with this complicated long-distance interaction between two different 'for' loops *and* where they're placed with respect to the original function versus the helper function. If I were an intermediate-level Python student (and I'm pretty sure anyone who is starting to get clever with re-using iterators counts as "intermediate level"), then I'm pretty sure I'd actually prefer the immediate obvious feedback from the for-scoped-iterclose. This would actually be a good time to teach folks about this aspect of resource handling, actually -- it's certainly an important thing to learn eventually on your way to Python mastery, even if it isn't needed for every script. In the pypy-dev thread about this proposal, there's some very distressed emails from someone who's been writing Python for a long time but only just realized that generator cleanup relies on the garbage collector: https://mail.python.org/pipermail/pypy-dev/2016-October/014709.html https://mail.python.org/pipermail/pypy-dev/2016-October/014720.html It's unpleasant to have the rug pulled out from under you like this and suddenly realize that you might have to go re-evaluate all the code you've ever written, and making for loops safe-by-default and fail-fast-when-unsafe avoids that. Anyway, in summary: function-scoped-iterclose doesn't seem to accomplish my goal of getting rid of the *type* of pain involved when you have to run a background thread in your brain that's doing constant paranoid checking every time you write a for loop. Instead it arguably takes that type of pain and spreads it around both the experts and the novices :-/. ------------- Now, let's look at some evidence about how disruptive the two proposals are for real code: As mentioned else-thread, I wrote a stupid little CPython hack [1] to report when the same iterator object gets passed to multiple 'for' loops, and ran the CPython and Django testsuites with it [2]. Looking just at generator objects [3], across these two large codebases there are exactly 4 places where this happens. (Rough idea of prevalence: these 4 places together account for a total of 8 'for' loops; this is out of a total of 11,503 'for' loops total, of which 665 involve generator objects.) The 4 places are: 1) CPython's Lib/test/test_collections.py:1135, Lib/_collections_abc.py:378 This appears to be a bug in the CPython test suite -- the little MySet class does 'def __init__(self, itr): self.contents = itr', which assumes that itr is a container that can be repeatedly iterated. 
But a bunch of the methods on collections.abc.Set like to pass in a generator object here instead, which breaks everything. If repeated 'for' loops on generators raised an error then this bug would have been caught much sooner. 2) CPython's Tools/scripts/gprof2html.py lines 45, 54, 59, 75 Discussed above -- as written, for-scoped-iterclose would break this script, but function-scoped-iterclose would not, so here function-scoped-iterclose wins. 3) Django django/utils/regex_helper.py:236 This code is very similar to the previous example in its general outline, except that the 'for' loops *have* been factored out into utility functions. So in this case for-scoped-iterclose and function-scoped-iterclose are equally disruptive. 4) CPython's Lib/test/test_generators.py:723 I have to admit I cannot figure out what this code is doing, besides showing off :-). But the different 'for' loops are in different stack frames, so I'm pretty sure that for-scoped-iterclose and function-scoped-iterclose would be equally disruptive. Obviously there's a bias here in that these are still relatively "serious" libraries; I don't have a big corpus of one-off scripts that are just a big __main__, though gprof2html.py isn't far from that. (If anyone knows where to find such a thing let me know...) But still, the tally here is that out of 4 examples, we have 1 subtle bug that iterclose might have caught, 2 cases where for-scoped-iterclose and function-scoped-iterclose are equally disruptive, and only 1 where function-scoped-iterclose is less disruptive -- and in that case it's arguably just avoiding an obvious error now in favor of a more confusing error later. If this reduced the backwards-incompatible cases by a factor of, like, 10x or 100x, then that would be a pretty strong argument in its favor. But it seems to be more like... 1.5x. -n [1] https://github.com/njsmith/cpython/commit/2b9d60e1c1b89f0f1ac30cbf0a5dceee835142c2 [2] CPython: revision b0a272709b from the github mirror; Django: revision 90c3b11e87 [3] I also looked at "all iterators" and "all iterators with .close methods", but this email is long enough... basically the pattern is the same: there are another 13 'for' loops that involve repeated iteration over non-generator objects, and they're roughly equally split between spurious effects due to bugs in the CPython test-suite or my instrumentation, cases where for-scoped-iterclose and function-scoped-iterclose both cause the same problems, and cases where function-scoped-iterclose is less disruptive. -n -- Nathaniel J. Smith -- https://vorpus.org From mikhailwas at gmail.com Tue Oct 25 18:32:42 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Wed, 26 Oct 2016 00:32:42 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> Message-ID: On 25 October 2016 at 19:10, Stephen J. Turnbull wrote: > So my previous thought on it was, that there could be set of such functions: > > str.translate_keep(table) - this is current translate, namely keeps > non-defined chars untouched > str.translate_drop(table) - all the same, but dropping non-defined chars > > Probaly also a pair of functions without translation: > str.remove(chars) - removes given chars > str.keep(chars) - removes all, except chars > > Motivation is that those can be optimised for speed and I suppose those > can work faster than re.sub(). 
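(For what it's worth, rough pure-Python equivalents of those four can be
sketched on top of the existing str machinery -- the names and signatures
below are purely illustrative, not a worked-out API:

    def translate_keep(s, table):
        # current str.translate behaviour: characters without an entry
        # in `table` pass through untouched
        return s.translate(table)

    def translate_drop(s, table):
        # characters with no entry in `table` (ordinal -> replacement
        # string) are dropped instead of kept
        return ''.join(table.get(ord(c), '') for c in s)

    def remove(s, chars):
        # delete every character that occurs in `chars`
        return s.translate({ord(c): None for c in chars})

    def keep(s, chars):
        # delete everything *except* the characters in `chars`
        wanted = set(chars)
        return ''.join(c for c in s if c in wanted)

    # e.g. remove("abcabc", "bc")  -> 'aa'
    #      keep("abcabc", "a")     -> 'aa'
    #      translate_drop("abc", {ord('a'): 'A'})  -> 'A'

Plain-Python versions like these would of course be slower than a C
implementation, which is what the speed argument is about.)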
>That said, multiple methods is a valid option for the API. Eg, Guido >generally prefers that distinctions that can't be made on type of >arguments (such as translate_keep vs translate_drop) be done by giving >different names rather than a flag argument. Do you *like* this API, >or was this motivated primarily by the possibilities you see for >optimization? Certainly I like the look of distinct functions more. It allows me to visually parse the code effectively, so e.g. for str.remove() I would not need to look in docs to understand what the function does. It has its downside of course, since new definitions can accidentally be similar to current ones, so more names, more the probability that no good names are left. Speed is not so important for majority of cases, at least for my current tasks. However if I'll need to process very large texts (seems like I will), speed will be more important. >The width is constant for any given string. However, I don't see at >this point that you'll need more than the functions available in >Python already, plus one or more wrappers to marshal the information >your API accepts to the data that str.translate wants. Just in some cases I need to convert them to numpy arrays back and forth, so this unicode vanity worries me a bit. But I cannot clearly explain why exactly I need this. > >> but as said I don't like very much the idea and would be OK for me to > >> use numeric values only. > Yeah I am strange. This however gives you guarantee for any environment that you > can see and input them ans save the work in ASCII. >This is not going to be a problem if you're running Python and can >enter the program and digits. In any case, the API is going to have >to be convenient for all the people who expect that they will never >again be reduced to a hex keypad and 7-segment display Here I will dare to make a lyrical degression again. It could have made an impression that I am stuck in nineties or something. But that is not the case. In nineties I used the PC mostly to play Duke Nukem (yeh big times!). And all the more I hadnt any idea what is efficiency of information representation and readability. Now I kind of realize it. So I am just not the one who believes in these maximalistical "we need over 9000 glyphs" talks. And, somewhat prophetic view on this: with the come of cyber era this all be flushed so fast, that all this diligences around unicode could look funny actually. And a hex keypad will not sound "retro" but "brand new". In other words: I feel really strong that nothin besides standard characters must appear in code sources. If one wants to process unicode, then parse them as resources. So please, at least out of respect to rationally minded, don't make a code look like a christmas-tree. BTW, I use VIM to code actually so anyway I will not see them in my code. Mikhail From njs at pobox.com Tue Oct 25 18:48:53 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Oct 2016 15:48:53 -0700 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: ...Doh. I spent all that time evaluating the function-scoped-cleanup proposal from the high-level design perspective, and then immediately after hitting send, I suddenly realized that I'd missed a much more straightforward technical problem. One thing that 'with' blocks / for-scoped-iterclose do is that they put an upper bound on the lifetime of generator objects. That's important if you're using a non-refcounting-GC, or if there might be reference cycles. 
But it's not all they do: they also arrange to make sure that any cleanup code is executed in the context of the code that's using the generator. This is *also* really important: if you have an exception in your cleanup code, and the GC runs your cleanup code, then that exception will just disappear into nothingness (well, it'll get printed to the console, but that's hardly better). So you don't want to let the GC run your cleanup code. If you have an async generator, you want to run the cleanup code under supervision of the calling functions coroutine runner, and ideally block the running coroutine while you do it; doing this from the GC is difficult-to-impossible (depending on how picky you are -- PEP 525 does part of it, but not all). Again, letting the GC get involved is bad. So for the function-scoped-iterclose proposal: does this implicit ExitStack-like object take a strong reference to iterators, or just a weak one? If it takes a strong reference, then suddenly we're pinning all iterators in memory until the end of the enclosing function, which will often look like a memory leak. I think this would break a *lot* more existing code than the for-scoped-iterclose proposal does, and in more obscure ways that are harder to detect and warn about ahead of time. So that's out. If it takes a weak reference, ... then there's a good chance that iterators will get garbage collected before the ExitStack has a chance to clean them up properly. So we still have no guarantee that the cleanup will happen in the right context, that exceptions will not be lost, and so forth. In fact, it becomes literally non-deterministic: you might see an exception propagate properly on one run, and not on the next, depending on exactly when the garbage collector happened to run. IMHO that's *way* too spooky to be allowed, but I can't see any way to fix it within the function-scoping framework :-( -n On Tue, Oct 25, 2016 at 3:25 PM, Nathaniel Smith wrote: > On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan wrote: >> On 20 October 2016 at 07:02, Nathaniel Smith wrote: >>> The first change is to replace the outer for loop with a while/pop >>> loop, so that if an exception occurs we'll know which iterables remain >>> to be processed: >>> >>> def chain(*iterables): >>> try: >>> while iterables: >>> for element in iterables.pop(0): >>> yield element >>> ... >>> >>> Now, what do we do if an exception does occur? We need to call >>> iterclose on all of the remaining iterables, but the tricky bit is >>> that this might itself raise new exceptions. If this happens, we don't >>> want to abort early; instead, we want to continue until we've closed >>> all the iterables, and then raise a chained exception. Basically what >>> we want is: >>> >>> def chain(*iterables): >>> try: >>> while iterables: >>> for element in iterables.pop(0): >>> yield element >>> finally: >>> try: >>> operators.iterclose(iter(iterables[0])) >>> finally: >>> try: >>> operators.iterclose(iter(iterables[1])) >>> finally: >>> try: >>> operators.iterclose(iter(iterables[2])) >>> finally: >>> ... >>> >>> but of course that's not valid syntax. 
Fortunately, it's not too hard >>> to rewrite that into real Python -- but it's a little dense: >>> >>> def chain(*iterables): >>> try: >>> while iterables: >>> for element in iterables.pop(0): >>> yield element >>> # This is equivalent to the nested-finally chain above: >>> except BaseException as last_exc: >>> for iterable in iterables: >>> try: >>> operators.iterclose(iter(iterable)) >>> except BaseException as new_exc: >>> if new_exc.__context__ is None: >>> new_exc.__context__ = last_exc >>> last_exc = new_exc >>> raise last_exc >>> >>> It's probably worth wrapping that bottom part into an iterclose_all() >>> helper, since the pattern probably occurs in other cases as well. >>> (Actually, now that I think about it, the map() example in the text >>> should be doing this instead of what it's currently doing... I'll fix >>> that.) >> >> At this point your code is starting to look a whole lot like the code >> in contextlib.ExitStack.__exit__ :) > > One of the versions I tried but didn't include in my email used > ExitStack :-). It turns out not to work here: the problem is that we > effectively need to enter *all* the contexts before unwinding, even if > trying to enter one of them fails. ExitStack is nested like (try (try > (try ... finally) finally) finally), and we need (try finally (try > finally (try finally ...))) But this is just a small side-point > anyway, since most code is not implementing complicated > meta-iterators; I'll address your real proposal below. > >> Accordingly, I'm going to suggest that while I agree the problem you >> describe is one that genuinely emerges in large production >> applications and other complex systems, this particular solution is >> simply far too intrusive to be accepted as a language change for >> Python - you're talking a fundamental change to the meaning of >> iteration for the sake of the relatively small portion of the >> community that either work on such complex services, or insist on >> writing their code as if it might become part of such a service, even >> when it currently isn't. Given that simple applications vastly >> outnumber complex ones, and always will, I think making such a change >> would be a bad trade-off that didn't come close to justifying the >> costs imposed on the rest of the ecosystem to adjust to it. >> >> A potentially more fruitful direction of research to pursue for 3.7 >> would be the notion of "frame local resources", where each Python >> level execution frame implicitly provided a lazily instantiated >> ExitStack instance (or an equivalent) for resource management. >> Assuming that it offered an "enter_frame_context" function that mapped >> to "contextlib.ExitStack.enter_context", such a system would let us do >> things like: > > So basically a 'with expression', that gives up the block syntax -- > taking its scope from the current function instead -- in return for > being usable in expression context? That's a really interesting, and I > see the intuition that it might be less disruptive if our implicit > iterclose calls are scoped to the function rather than the 'for' loop. > > But having thought about it and investigated some... I don't think > function-scoping addresses my problem, and I don't see evidence that > it's meaningfully less disruptive to existing code. 
> > First, "my problem": > > Obviously, Python's a language that should be usable for folks doing > one-off scripts, and for paranoid folks trying to write robust complex > systems, and for everyone in between -- these are all really important > constituencies. And unfortunately, there is a trade-off here, where > the changes we're discussing effect these constituencies differently. > But it's not just a matter of shifting around a fixed amount of pain; > the *quality* of the pain really changes under the different > proposals. > > In the status quo: > - for one-off scripts: you can just let the GC worry about generator > and file handle cleanup, re-use iterators, whatever, it's cool > - for robust systems: because it's the *caller's* responsibility to > ensure that iterators are cleaned up, you... kinda can't really use > generators without -- pick one -- (a) draconian style guides (like > forbidding 'with' inside generators or forbidding bare 'for' loops > entirely), (b) lots of auditing (every time you write a 'for' loop, go > read the source to the generator you're iterating over -- no > modularity for you and let's hope the answer doesn't change!), or (c) > introducing really subtle bugs. Or all of the above. It's true that a > lot of the time you can ignore this problem and get away with it one > way or another, but if you're trying to write robust code then this > doesn't really help -- it's like saying the footgun only has 1 bullet > in the chamber. Not as reassuring as you'd think. It's like if every > time you called a function, you had to explicitly say whether you > wanted exception handling to be enabled inside that function, and if > you forgot then the interpreter might just skip the 'finally' blocks > while unwinding. There's just *isn't* a good solution available. > > In my proposal (for-scoped-iterclose): > - for robust systems: life is great -- you're still stopping to think > a little about cleanup every time you use an iterator (because that's > what it means to write robust code!), but since the iterators now know > when they need cleanup and regular 'for' loops know how to invoke it, > then 99% of the time (i.e., whenever you don't intend to re-use an > iterator) you can be confident that just writing 'for' will do exactly > the right thing, and the other 1% of the time (when you do want to > re-use an iterator), you already *know* you're doing something clever. > So the cognitive overhead on each for-loop is really low. > - for one-off scripts: ~99% of the time (actual measurement, see > below) everything just works, except maybe a little bit better. 1% of > the time, you deploy the clever trick of re-using an iterator with > multiple for loops, and it breaks, so this is some pain. Here's what > you see: > > gen_obj = ... > for first_line in gen_obj: > break > for lines in gen_obj: > ... > > Traceback (most recent call last): > File "/tmp/foo.py", line 5, in > for lines in gen_obj: > AlreadyClosedIteratorError: this iterator was already closed, > possibly by a previous 'for' loop. (Maybe you want > itertools.preserve?) > > (We could even have a PYTHONDEBUG flag that when enabled makes that > error message include the file:line of the previous 'for' loop that > called __iterclose__.) > > So this is pain! But the pain is (a) rare, not pervasive, (b) > immediately obvious (an exception, the code doesn't work at all), not > subtle and delayed, (c) easily googleable, (d) easy to fix and the fix > is reliable. 
It's a totally different type of pain than the pain that > we currently impose on folks who want to write robust code. > > Now compare to the new proposal (function-scoped-iterclose): > > - For those who want robust cleanup: Usually, I only need an iterator > for as long as I'm iterating over it; that may or may not correspond > to the end of the function (often won't). When these don't coincide, > it can cause problems. E.g., consider the original example from my > proposal: > > def read_newline_separated_json(path): > with open(path) as f: > for line in f: > yield json.loads(line) > > but now suppose that I'm a Data Scientist (tm) so instead of having 1 > file full of newline-separated JSON, I have a 100 gigabytes worth of > the stuff stored in lots of files in a directory tree. Well, that's no > problem, I'll just wrap that generator: > > def read_newline_separated_json_tree(tree): > for root, _, paths in os.walk(tree): > for path in paths: > for document in read_newline_separated_json(join(root, path)): > yield document > > And then I'll run it on PyPy, because that's what you do when you have > 100 GB of string processing, and... it'll crash, because the call to > read_newline_separated_tree ends up doing thousands of calls to > read_newline_separated_json, but never cleans up any of them up until > the function exits, so eventually we run out of file descriptors. > > A similar situation arises in the main loop of something like an HTTP server: > > while True: > request = read_request(sock) > for response_chunk in application_handler(request): > send_response_chunk(sock) > > Here we'll accumulate arbitrary numbers of un-closed > application_handler generators attached to the stack frame, which is > no good at all. And this has the interesting failure mode that you'll > probably miss it in testing, because most clients will only re-use a > connection a small number of times. > > So what this means is that every time I write a for loop, I can't just > do a quick "am I going to break out of the for-loop and then re-use > this iterator?" check -- I have to stop and think about whether this > for-loop is nested inside some other loop, etc. And, again, if I get > it wrong, then it's a subtle bug that will bite me later. It's true > that with the status quo, we need to wrap, X% of for-loops with 'with' > blocks, and with this proposal that number would drop to, I don't > know, (X/5)% or something. But that's not the most important cost: the > most important cost is the cognitive overhead of figuring out which > for-loops need the special treatment, and in this proposal that > checking is actually *more* complicated than the status quo. > > - For those who just want to write a quick script and not think about > it: here's a script that does repeated partial for-loops over a > generator object: > > https://github.com/python/cpython/blob/553a84c4c9d6476518e2319acda6ba29b8588cb4/Tools/scripts/gprof2html.py#L40-L79 > > (and note that the generator object even has an ineffective 'with > open(...)' block inside it!) > > With the function-scoped-iterclose, this script would continue to work > as it does now. Excellent. > > But, suppose that I decide that that main() function is really > complicated and that it would be better to refactor some of those > loops out into helper functions. (Probably actually true in this > example.) So I do that and... suddenly the code breaks. 
And in a > rather confusing way, because it has to do with this complicated > long-distance interaction between two different 'for' loops *and* > where they're placed with respect to the original function versus the > helper function. > > If I were an intermediate-level Python student (and I'm pretty sure > anyone who is starting to get clever with re-using iterators counts as > "intermediate level"), then I'm pretty sure I'd actually prefer the > immediate obvious feedback from the for-scoped-iterclose. This would > actually be a good time to teach folks about this aspect of resource > handling, actually -- it's certainly an important thing to learn > eventually on your way to Python mastery, even if it isn't needed for > every script. > > In the pypy-dev thread about this proposal, there's some very > distressed emails from someone who's been writing Python for a long > time but only just realized that generator cleanup relies on the > garbage collector: > > https://mail.python.org/pipermail/pypy-dev/2016-October/014709.html > https://mail.python.org/pipermail/pypy-dev/2016-October/014720.html > > It's unpleasant to have the rug pulled out from under you like this > and suddenly realize that you might have to go re-evaluate all the > code you've ever written, and making for loops safe-by-default and > fail-fast-when-unsafe avoids that. > > Anyway, in summary: function-scoped-iterclose doesn't seem to > accomplish my goal of getting rid of the *type* of pain involved when > you have to run a background thread in your brain that's doing > constant paranoid checking every time you write a for loop. Instead it > arguably takes that type of pain and spreads it around both the > experts and the novices :-/. > > ------------- > > Now, let's look at some evidence about how disruptive the two > proposals are for real code: > > As mentioned else-thread, I wrote a stupid little CPython hack [1] to > report when the same iterator object gets passed to multiple 'for' > loops, and ran the CPython and Django testsuites with it [2]. Looking > just at generator objects [3], across these two large codebases there > are exactly 4 places where this happens. (Rough idea of prevalence: > these 4 places together account for a total of 8 'for' loops; this is > out of a total of 11,503 'for' loops total, of which 665 involve > generator objects.) The 4 places are: > > 1) CPython's Lib/test/test_collections.py:1135, Lib/_collections_abc.py:378 > > This appears to be a bug in the CPython test suite -- the little MySet > class does 'def __init__(self, itr): self.contents = itr', which > assumes that itr is a container that can be repeatedly iterated. But a > bunch of the methods on collections.abc.Set like to pass in a > generator object here instead, which breaks everything. If repeated > 'for' loops on generators raised an error then this bug would have > been caught much sooner. > > 2) CPython's Tools/scripts/gprof2html.py lines 45, 54, 59, 75 > > Discussed above -- as written, for-scoped-iterclose would break this > script, but function-scoped-iterclose would not, so here > function-scoped-iterclose wins. > > 3) Django django/utils/regex_helper.py:236 > > This code is very similar to the previous example in its general > outline, except that the 'for' loops *have* been factored out into > utility functions. So in this case for-scoped-iterclose and > function-scoped-iterclose are equally disruptive. 
> > 4) CPython's Lib/test/test_generators.py:723 > > I have to admit I cannot figure out what this code is doing, besides > showing off :-). But the different 'for' loops are in different stack > frames, so I'm pretty sure that for-scoped-iterclose and > function-scoped-iterclose would be equally disruptive. > > Obviously there's a bias here in that these are still relatively > "serious" libraries; I don't have a big corpus of one-off scripts that > are just a big __main__, though gprof2html.py isn't far from that. (If > anyone knows where to find such a thing let me know...) But still, the > tally here is that out of 4 examples, we have 1 subtle bug that > iterclose might have caught, 2 cases where for-scoped-iterclose and > function-scoped-iterclose are equally disruptive, and only 1 where > function-scoped-iterclose is less disruptive -- and in that case it's > arguably just avoiding an obvious error now in favor of a more > confusing error later. > > If this reduced the backwards-incompatible cases by a factor of, like, > 10x or 100x, then that would be a pretty strong argument in its favor. > But it seems to be more like... 1.5x. > > -n > > [1] https://github.com/njsmith/cpython/commit/2b9d60e1c1b89f0f1ac30cbf0a5dceee835142c2 > [2] CPython: revision b0a272709b from the github mirror; Django: > revision 90c3b11e87 > [3] I also looked at "all iterators" and "all iterators with .close > methods", but this email is long enough... basically the pattern is > the same: there are another 13 'for' loops that involve repeated > iteration over non-generator objects, and they're roughly equally > split between spurious effects due to bugs in the CPython test-suite > or my instrumentation, cases where for-scoped-iterclose and > function-scoped-iterclose both cause the same problems, and cases > where function-scoped-iterclose is less disruptive. > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org -- Nathaniel J. Smith -- https://vorpus.org From mikhailwas at gmail.com Tue Oct 25 18:53:59 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Wed, 26 Oct 2016 00:53:59 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Message-ID: On 25 October 2016 at 23:50, Chris Barker wrote: >that was kind of a throwaway comment, >but I think it's a LONG way out, but ideally, >the OWTDI would be "curly quotes". The fact that in ASCII, >a single quote and a apostrophe are teh same, >and that there is no distinction between opening >and closing quotes is unfortunate. Yes from readability POV, curly quotes would make sense, and better than many other options, eg. ?these?. Also from POV of parser this could be beneficial to have opening/closing char (or not?). This only means that those chars should be in ASCII ideally. Which is not the case. And IMO not that now code should allow all characters. Mikhail From njs at pobox.com Tue Oct 25 19:15:07 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Oct 2016 16:15:07 -0700 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: On Tue, Oct 25, 2016 at 2:55 PM, Chris Barker wrote: > > >> But that's something of a solved problem. IPython offers a rich >> interactive environment, for people who find the limitations of the >> standard interactive prompt frustrating. Would it be worth the >> standard Python documentation promoting IPython for that role? 
> > > +1 iPython really makes it easier to do exploratory code -- I have my > students install it day one of an intro to python class. > > However, maybe ironically, iPython is still a bit ugly for editing > multi-line constructs -- maybe it will get better. I'm sure it could be improved more, but since the 5.0 release IPython has been *way* better at editing multi-line constructs than the built-in REPL is. For example, if I type: In [1]: def f(): ...: return 1 ...: In [2]: and then press up-arrow once, it gives me the complete function body back and lets me move around and edit it. Incidentally, PyPy's built-in REPL handles multi-line constructs like IPython does, rather than like how the CPython built-in REPL does. There are a lot of logistic issues that would need to be dealt with before CPython could consider making a third-party REPL the default or anything like it... it looks like IPython's dependency tree is all pure-Python, which makes it more viable, but it's still a lot of code and on a very different development cycle than CPython. bpython appears to depend on greenlet, which is a whole other issue... OTOH it seems a little quixotic to spend lots of resources improving the built-in REPL when there are much better ones with vibrant developer communities. -n -- Nathaniel J. Smith -- https://vorpus.org From rob.cliffe at btinternet.com Tue Oct 25 20:25:48 2016 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Wed, 26 Oct 2016 01:25:48 +0100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <58007475.9010306@canterbury.ac.nz> References: <20161012154224.GT22471@ando.pearwood.info> <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> <58007475.9010306@canterbury.ac.nz> Message-ID: <918d4ff5-c05b-8c65-036a-363412e66703@btinternet.com> On 14/10/2016 07:00, Greg Ewing wrote: > Neil Girdhar wrote: >> At the end of this discussion it might be good to get a tally of how >> many people think the proposal is reasonable and logical. > > I think it's reasonable and logical. > I concur. Two points I personally find in favour, YMMV: (1) [*subseq for subseq in seq] avoids the "conceptual hiatus" I described earlier in [elt for subseq in seq for elt in subseq] (I.e. I think the case for the proposal would be weaker if the loops in a list comprehension were written in reverse order.) (2) This is admittedly a somewhat tangential argument, but: I didn't really know what "yield from" meant. But when I read in an earlier post that someone had proposed "yield *" for it, I had a Eureka moment. Which suggests if "*" is used to mean some sort of unpacking in more contexts, the more familiar and intuitive it may become. I guess the word I'm groping for is 'consistency'. Rob Cliffe From steve at pearwood.info Tue Oct 25 21:11:13 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Oct 2016 12:11:13 +1100 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: Message-ID: <20161026011112.GF15983@ando.pearwood.info> On Tue, Oct 25, 2016 at 04:55:21PM -0500, Ryan Gonzalez wrote: > Also, as an extension of this idea, would it be possible to improve errors > like this: > > > class X: pass > X() # object() takes no parameters > > > to show the actual type instead of just 'object'? My wild guess is that the cause is that __new__ looks like this: class object(): def __new__(cls, *args): if args: raise TypeError('object() takes no parameters') Except in C, of course. 
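(A quick illustration of the behaviour in question, as seen on current
CPython -- the exact wording may differ between versions:

    class X:
        pass

    X(1)
    # -> TypeError: object() takes no parameters

so the subclass's name never makes it into the message, whichever class
you actually called.)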
This is probably a left-over from the days when object() took and ignored any parameters. If my guess is close, then maybe we can do this: if args: raise TypeError('%s() takes no parameters' % cls.__name__) or equivalent. -- Steve From steve at pearwood.info Tue Oct 25 21:22:35 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Oct 2016 12:22:35 +1100 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: <4e0e81a9-6f27-453a-99d7-3013cddd606b@googlegroups.com> <572eccd4-99c0-7866-468b-25e93c9656c4@gmail.com> Message-ID: <20161026012234.GG15983@ando.pearwood.info> On Tue, Oct 25, 2016 at 12:17:56PM -0700, Nathaniel Smith wrote: > I get that this list's default is to push > back on proposed changes, and it's a good principle in general, but > "improved error messages" are *really* cheap. The bar should be pretty > low, IMO. I think its even lower than most may realise. Error messages are not part of the official API of the function or class, so IMO we can change them any time we want, even between point releases. We shouldn't do so just for the sake of change[1], because chances are that you'll break somebody's doctests. But doctests that depend on the exact wording of a error message are already broken. For a sufficiently good improvement in error reporting, I think we should be free to make that change without having to wait for a new minor release. > If someone's willing to do the work to make the error > messages friendlier, while addressing technical considerations like > the repr issue, then that's awesome, and if that means that users get > somewhat redundant information to help them debug then... cool? Indeed. > Terseness per se is not a cardinal virtue :-). "Note the consistent user interface and error reportage. Ed is generous enough to flag errors, yet prudent enough not to overwhelm the novice with verbosity." https://www.gnu.org/fun/jokes/ed-msg.html -- Steve From rymg19 at gmail.com Tue Oct 25 21:34:12 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 25 Oct 2016 20:34:12 -0500 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: <20161026011112.GF15983@ando.pearwood.info> References: <20161026011112.GF15983@ando.pearwood.info> Message-ID: Yeah, I just checked the source and tried changing it. Seems to work well. On Tue, Oct 25, 2016 at 8:11 PM, Steven D'Aprano wrote: > On Tue, Oct 25, 2016 at 04:55:21PM -0500, Ryan Gonzalez wrote: > > Also, as an extension of this idea, would it be possible to improve > errors > > like this: > > > > > > class X: pass > > X() # object() takes no parameters > > > > > > to show the actual type instead of just 'object'? > > My wild guess is that the cause is that __new__ looks like this: > > class object(): > def __new__(cls, *args): > if args: > raise TypeError('object() takes no parameters') > > > Except in C, of course. > > This is probably a left-over from the days when object() took and > ignored any parameters. If my guess is close, then maybe we can do this: > > if args: > raise TypeError('%s() takes no parameters' % cls.__name__) > > > or equivalent. > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan (????) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. 
http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From mikhailwas at gmail.com Tue Oct 25 21:37:54 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Wed, 26 Oct 2016 03:37:54 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Message-ID: On 26 October 2016 at 00:53, Mikhail V wrote: > On 25 October 2016 at 23:50, Chris Barker wrote: > >>that was kind of a throwaway comment, >>but I think it's a LONG way out, but ideally, >>the OWTDI would be "curly quotes". The fact that in ASCII, >>a single quote and a apostrophe are teh same, >>and that there is no distinction between opening >>and closing quotes is unfortunate. > > Yes from readability POV, curly quotes would make > sense, and better than many other options, eg. ?these?. > Also from POV of parser this could be > beneficial to have opening/closing char (or not?). > This only means that those chars should be in > ASCII ideally. Which is not the case. > And IMO not that now code should allow > all characters. > > Mikhail
Extended ASCII
145   ‘   Left single quotation mark
146   ’   Right single quotation mark
147   “   Left double quotation mark
148   ”   Right double quotation mark
149   •   Bullet
150   –   En dash
151   —   Em dash
152   ˜   Small tilde
So we all must repent now and get back to 8-bit characters.
From steve at pearwood.info Tue Oct 25 21:40:32 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Oct 2016 12:40:32 +1100 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> Message-ID: <20161026014031.GH15983@ando.pearwood.info> On Tue, Oct 25, 2016 at 05:15:58PM +0200, Mikhail V wrote: [...] > >Or it can take a mapping (usually a dict) that maps either characters or > >ordinal numbers to a new string (not just a single character, but an > >arbitrary string) or ordinal numbers. > > > > str.maketrans({'a': 'A', 98: 66, 0x63: 0x43}) > > >(or None, to delete them). Note the flexibility: you don't need to > > Good. But of course if I do it with big tables, I would anyway > need to parse them from some table file. Typing all values > direct in code is not a comfortable way.
Why not? What is the difference between typing
    123: 456
    124: 457
    125: 458
    # two hundred more lines
in a "table.txt" file, and typing:
    {
    123: 456,
    124: 457,
    125: 458,
    # two hundred more lines
    }
in a "table.py" file? The difference is insignificant. And the Python version can be cleaned up:
    for i in range(123, 333): table[i] = 456 - 123 + i
Not all data should be written as code, especially if you expect unskilled users to edit it, but generating data directly in code is a very powerful technique, and the strict syntax of the programming language helps prevent some errors. [...] > Motivation is that those can be optimised for speed
That's not a motivation. Why are you talking about "optimizing for speed" functions that we have not yet established are needed? That reminds me of a story I once heard of somebody who was driving across the desert in the US once. One of his passengers noticed the highway signs and said "Wait, aren't we going the wrong way?" The driver replied "Who cares, we're making fantastic time!"
Optimizing a function you don't need is not an optimization. It is a waste of time.
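(To make the "generating data directly in code" point above concrete,
here is a throwaway example -- not anything from the thread itself -- of
building a translation table programmatically and feeding it straight to
str.translate:

    # uppercase the ASCII letters a-z and drop a few punctuation marks
    table = {i: i - 0x20 for i in range(ord('a'), ord('z') + 1)}
    table.update({ord(c): None for c in '!?#'})
    print("hello, world!".translate(table))   # -> HELLO, WORLD

No separate parsing step is needed, and a typo is far more likely to fail
loudly than to silently produce a bad table.)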
-- Steve From mertz at gnosis.cx Tue Oct 25 21:36:28 2016 From: mertz at gnosis.cx (David Mertz) Date: Tue, 25 Oct 2016 21:36:28 -0400 Subject: [Python-ideas] Reduce/fold and scan with generator expressions and comprehensions In-Reply-To: References: <20161023155920.GR22471@ando.pearwood.info> <20161024002939.GV22471@ando.pearwood.info> Message-ID: On Tue, Oct 25, 2016 at 1:55 PM, Rob Cliffe wrote: > > >>> [prev * k for k in [5, 2, 4, 3] from prev = 1] > [1, 5, 10, 40, 120] > > That makes sense for me, and seem simpler than: > > >>> from itertools import accumulate, chain > >>> list(accumulate(chain([1], [5, 2, 4, 3]), lambda prev, k: prev * k)) > [1, 5, 10, 40, 120] > > Well, if you want an instant reaction from someone skimming this thread: I > looked at the first example and couldn't understand it. > After reading every post in the thread, I still don't understand the proposed new syntax really. How does 'prev' get bound in the loop? Is it a new magic keyword for "last accumulated element?" Does the binding in the "from" clause magically say that it should get rebound in the loop where it's no longer mentioned? Why is the word `from` used here when there's no obvious relation to other uses? The alternate version looks much better if you don't try so hard to make it look bad. The much more obvious spelling is: from operator import mul from itertools import accumulate, chain accumulate(chain([1], nums), mul) If you give up a fear of using `import` and stop arbitrarily converting a possibly infinite iterator to a concrete list, this form is extremely short and obvious. Moreover, in the example, it is extra strange that the multiplicative identity is added into the front of the iterator. This is exactly the same thing as simply spelling it: accumulate(nums, mul) Which is even shorter. It's feels very artificially contrived to insist that the initial element must live somewhere other than the iterator itself. But it would be easy enough to write a wrapper to massage an iterator for this special case: def prepend(item, it): return itertools.chain([item], it) Doesn't save any characters, but the name might emphasize the intent. Maybe this implementation forces the point even more: def prepend(item, it): yield item yield from it -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Tue Oct 25 21:46:43 2016 From: mertz at gnosis.cx (David Mertz) Date: Tue, 25 Oct 2016 21:46:43 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Message-ID: This is a nice summary of quotation marks used in various languages: https://en.wikipedia.org/wiki/Quotation_mark#Specific_language_features On Tue, Oct 25, 2016 at 9:37 PM, Mikhail V wrote: > On 26 October 2016 at 00:53, Mikhail V wrote: > > On 25 October 2016 at 23:50, Chris Barker wrote: > > > >>that was kind of a throwaway comment, > >>but I think it's a LONG way out, but ideally, > >>the OWTDI would be "curly quotes". The fact that in ASCII, > >>a single quote and a apostrophe are teh same, > >>and that there is no distinction between opening > >>and closing quotes is unfortunate. 
> > > > Yes from readability POV, curly quotes would make > > sense, and better than many other options, eg. ?these?. > > Also from POV of parser this could be > > beneficial to have opening/closing char (or not?). > > This only means that those chars should be in > > ASCII ideally. Which is not the case. > > And IMO not that now code should allow > > all characters. > > > > Mikhail > > Extended ASCII > > 145 ? ‘ ‘ Left single quotation mark > 146 ? ’ ’ Right single quotation mark > 147 ? “ “ Left double quotation mark > 148 ? ” ” Right double quotation mark > 149 ? • • Bullet > 150 ? – – En dash > 151 ? — — Em dash > 152 ? ˜ ˜ Small tilde > > So we all must repent now and get back to 8-bit charcters. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Tue Oct 25 22:29:13 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Wed, 26 Oct 2016 04:29:13 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: <20161026014031.GH15983@ando.pearwood.info> References: <20161025023704.GD15983@ando.pearwood.info> <20161026014031.GH15983@ando.pearwood.info> Message-ID: On 26 October 2016 at 03:40, Steven D'Aprano wrote: > in a "table.txt" file, and typing: > > { > 123: 456, > 124: 457, > 125: 458, > # two hundred more lines > } > > > in a "table.py" file? The difference is insignificant. And the Python > version can be cleaned up: > Ok, you have opened my eyes here. Thank you, you re good. > [...] >> Motivation is that those can be optimised for speed > > That's not a motivation. Why are you talking about "optimizing for > speed" functions that we have not yet established are needed? > > That reminds me of a story I once heard of somebody who was driving > across the desert in the US once. One of his passengers noticed the > highway signs and said "Wait, aren't we going the wrong way?" The driver > replied "Who cares, we're making fantastic time!" > > Optimizing a function you don't need is not an optimization. It is a > waste of time. Making good time is important indeed! I need translate() which drops non-defined chars. Please :) No optimisation, no new syntax. deal? Mikhail From steve at pearwood.info Wed Oct 26 06:50:36 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Oct 2016 21:50:36 +1100 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <20161026014031.GH15983@ando.pearwood.info> Message-ID: <20161026105035.GI15983@ando.pearwood.info> On Wed, Oct 26, 2016 at 04:29:13AM +0200, Mikhail V wrote: > I need translate() which drops non-defined chars. Please :) > No optimisation, no new syntax. deal? I still wonder whether this might be worth introducing as a new string method, or an option to translate. But the earliest that will happen is Python 3.7, so in the meantime, something like this should be enough: # untested keep = "abcd??????" text = "..." 
# Find all the characters in text that are not in keep: delchars = set(text) - set(keep) delchars = ''.join(delchars) text = text.translate(str.maketrans("", "", delchars)) -- Steve From steve at pearwood.info Wed Oct 26 07:29:56 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Oct 2016 22:29:56 +1100 Subject: [Python-ideas] f-string, for dictionaries In-Reply-To: References: Message-ID: <20161026112956.GJ15983@ando.pearwood.info> On Tue, Oct 25, 2016 at 09:11:06PM +0200, Michel Desmoulin wrote: > We have a syntax to create strings with variables automatically inferred > from its context: > > >>> name = "Guido" > >>> print(f'Hello {name}') > Hello Guido > Similarly, I'd like to suggest a similar feature for building dictionaries: > > >>> foo = 1 > >>> bar = 2 > >>> {:bar, :foo} > {'bar': 1, 'foo', 2} How often do you do this? Under what circumstances do you do this? If your code is like my code, the answer is: very rarely; and it is so long since I've needed anything like this I don't recall why. I don't think this is a common operation. So even though writing: {'spam': spam, 'eggs': eggs} or even: dict(spam=spam, eggs=eggs) is a bit of an annoyance, it doesn't happen often enough to matter, and its better to just write what you want explicitly rather than have to learn another special syntax for something you use so rarely you'll probably never remember it. Or at least, *I'll* never remember it. It will be just one more cryptic and strange syntax to confuse beginners and intermediate users with: # I never remember which one to use... {:spam, :eggs} {spam:, eggs:} {spam, eggs} > And a similar way to get the content from the dictionary into variables: > > >>> values = {'bar': 1, 'foo', 2} > >>> {:bar, :foo} = values > >>> bar > 1 > >>> foo > 2 Check the archives: https://mail.python.org/pipermail/python-ideas/2016-May/040430.html https://mail.python.org/pipermail/python-ideas/2008-March/001511.html https://mail.python.org/pipermail/python-ideas/2008-April/001513.html -- Steve From ncoghlan at gmail.com Wed Oct 26 11:21:03 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Oct 2016 01:21:03 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: <266c6cdc-0bc7-9240-cc11-98bebbf1cbf5@gmail.com> References: <1476895131.1459735.761051137.5A71FE83@webmail.messagingengine.com> <20161021071219.GH22471@ando.pearwood.info> <22542.48277.333896.349836@turnbull.sk.tsukuba.ac.jp> <266c6cdc-0bc7-9240-cc11-98bebbf1cbf5@gmail.com> Message-ID: On 26 October 2016 at 01:59, Yury Selivanov wrote: > But how would it help with a partial iteration over generators > with a "with" statement inside? > > def it(): > with open(file) as f: > for line in f: > yield line > > Nathaniel proposal addresses this by fixing "for" statements, > so that the outer loop that iterates over "it" would close > the generator once the iteration is stopped. > > With your proposal you want to attach the opened file to the > frame, but you'd need to attach it to the frame of *caller* of > "it", right? Every frame in the stack would still need to opt in to deterministic cleanup of its resources, but the difference is that it becomes an inline operation within the expression creating the iterator, rather than a complete restructuring of the function: def iter_consumer(fname): for line in function_resource(open(fname)): ... 
It doesn't matter *where* the iterator is being used (or even if you received it as a parameter), you get an easy way to say "When this function exits, however that happens, clean this up". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Wed Oct 26 12:27:07 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Oct 2016 03:27:07 +1100 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <918d4ff5-c05b-8c65-036a-363412e66703@btinternet.com> References: <20161012154224.GT22471@ando.pearwood.info> <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> <58007475.9010306@canterbury.ac.nz> <918d4ff5-c05b-8c65-036a-363412e66703@btinternet.com> Message-ID: <20161026162706.GK15983@ando.pearwood.info> On Wed, Oct 26, 2016 at 01:25:48AM +0100, Rob Cliffe wrote: > (2) This is admittedly a somewhat tangential argument, but: I didn't > really know what "yield from" meant. But when I read in an earlier post > that someone had proposed "yield *" for it, I had a Eureka moment. Are you aware that "yield from it" does not just mean this...? for x in it: yield x "yield from ..." is not just "some sort of unpacking". If all it did was iterate over an iterable and yield the values, it would not have been given special syntax just to save one line. It does *much* more than just those two lines. For start, it is an expression which returns a value, so you can write: result = yield from it A full implementation of "yield from ..." would be 39 lines of Python code, according to the PEP, not two. It has to handle delegating send(), throw() and close() messages, exceptions, plus of course the obvious iteration. > Which suggests if "*" is used to mean some sort of unpacking in more > contexts, the more familiar and intuitive it may become. I guess the > word I'm groping for is 'consistency'. I think that there is zero hope of consistency for * the star operator. That horse has bolted. It is already used for: - multiplication and exponentiation - repetition - "zero or more of the previous element" in regular expressions - "zero or more of any character" in globbing - "everything" in imports - sequence unpacking - sequence packing - collecting positional and keyword arguments Some of which are admittedly *similar* uses, but the * operator does get overloaded for so many unrelated uses. -- Steve From ncoghlan at gmail.com Wed Oct 26 12:54:27 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Oct 2016 02:54:27 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On 26 October 2016 at 08:25, Nathaniel Smith wrote: > On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan wrote: >> At this point your code is starting to look a whole lot like the code >> in contextlib.ExitStack.__exit__ :) > > One of the versions I tried but didn't include in my email used > ExitStack :-). It turns out not to work here: the problem is that we > effectively need to enter *all* the contexts before unwinding, even if > trying to enter one of them fails. ExitStack is nested like (try (try > (try ... finally) finally) finally), and we need (try finally (try > finally (try finally ...))) Regardless of any other outcome from this thread, it may be useful to have a "contextlib.ResourceSet" as an abstraction for collective management of resources, regardless of whatever else happens. 
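Purely as an illustrative sketch (the class name, method names and semantics here are assumptions rather than a settled design), I'm picturing something roughly like:

    class ResourceSet:
        # illustrative only: collective cleanup with no ordering guarantees
        def __init__(self):
            self._exits = {}

        def enter_context(self, cm):
            result = cm.__enter__()
            self._exits[cm] = cm.__exit__
            return result

        def close(self):
            while self._exits:
                cm, exit_cb = self._exits.popitem()
                exit_cb(None, None, None)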
As you say, the main difference is that the invocation of the cleanup functions wouldn't be nested at all and could be called in an arbitrary order (if that's not sufficient for a particular use case, then you'd need to define an ExitStack for the items where the order of cleanup matters, and then register *that* with the ResourceSet). >> A potentially more fruitful direction of research to pursue for 3.7 >> would be the notion of "frame local resources", where each Python >> level execution frame implicitly provided a lazily instantiated >> ExitStack instance (or an equivalent) for resource management. >> Assuming that it offered an "enter_frame_context" function that mapped >> to "contextlib.ExitStack.enter_context", such a system would let us do >> things like: > > So basically a 'with expression', that gives up the block syntax -- > taking its scope from the current function instead -- in return for > being usable in expression context? That's a really interesting, and I > see the intuition that it might be less disruptive if our implicit > iterclose calls are scoped to the function rather than the 'for' loop. > > But having thought about it and investigated some... I don't think > function-scoping addresses my problem, and I don't see evidence that > it's meaningfully less disruptive to existing code. > > First, "my problem": > > Obviously, Python's a language that should be usable for folks doing > one-off scripts, and for paranoid folks trying to write robust complex > systems, and for everyone in between -- these are all really important > constituencies. And unfortunately, there is a trade-off here, where > the changes we're discussing effect these constituencies differently. > But it's not just a matter of shifting around a fixed amount of pain; > the *quality* of the pain really changes under the different > proposals. > > In the status quo: > - for one-off scripts: you can just let the GC worry about generator > and file handle cleanup, re-use iterators, whatever, it's cool > - for robust systems: because it's the *caller's* responsibility to > ensure that iterators are cleaned up, you... kinda can't really use > generators without -- pick one -- (a) draconian style guides (like > forbidding 'with' inside generators or forbidding bare 'for' loops > entirely), (b) lots of auditing (every time you write a 'for' loop, go > read the source to the generator you're iterating over -- no > modularity for you and let's hope the answer doesn't change!), or (c) > introducing really subtle bugs. 
(Note: I've changed my preferred API name from "function_resource" + "frame_resource" to the general purpose "scoped_resource" - while it's somewhat jargony, which I consider unfortunate, the goal is to make the runtime scope of the resource match the lexical scope of the reference as closely as is feasible, and if folks are going to understand how Python manages references and resources, they're going to need to learn the basics of Python's scope management at some point) Given your points below, the defensive coding recommendation here would be to - always wrap your iterators in scoped_resource() to tell Python to clean them up when the function is done - explicitly call close_resources() after the affected for loops to clean the resources up early You'd still be vulnerable to resource leaks in libraries you didn't write, but would have decent control over your own code without having to make overly draconian changes to your style guide - you'd only need one new rule, which is "Whenever you're iterating over something, pass it through scoped_resource first". To simplify this from a forwards compatibility perspective (i.e. so it can implicitly adjust when an existing type gains a cleanup method), we'd make scoped_resource() quite permissive, accepting arbitrary objects with the following behaviours: - if it's a context manager, enter it, and register the exit callback - if it's not a context manager, but has a close() method, register the close method - otherwise, pass it straight through without taking any other action This would allow folks to always declare something as a scoped resource without impeding their ability to handle objects that aren't resources at all. The long term question would then become whether it made sense to have certain language constructs implicitly mark their targets as scoped resources *by default*, and clean them up selectively after the loop rather than using the blunt instrument of cleaning up all previously registered resources. If we did start seriously considering such a change, then there would be potential utility in an "unmanaged_iter()" wrapper which forwarded *only* the iterator protocol methods, thus hiding any __exit__() or close() methods from scoped_resource(). However, the time to consider such a change in default behaviour would be *after* we had some experience with explicit declarations and management of scoped resources - plenty of folks are writing plenty of software today in garbage collected languages (including Python), and coping with external resource management problems as they arise, so we don't need to do anything hasty here. I personally think an explicit solution is likely to be sufficient (given the caveat of adding a "gc.collect()" counterpart), with an API like `scoped_resource` being adopted over time in libraries, frameworks and applications based on actual defects found in running production systems as well as the defensive coding style, and your example below makes me even more firmly convinced that that's a better way to go. 
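To make that concrete, here's a rough, untested sketch of the kind of helpers I have in mind. The names are the ones used above, but the module level registry is purely a placeholder - a real implementation would tie the registrations to the calling frame rather than to a global:

    _pending_cleanups = []  # placeholder for per-frame storage

    def scoped_resource(obj):
        # Deliberately permissive:
        # - context managers are entered and their __exit__ is registered
        # - other objects with a close() method have close() registered
        # - anything else passes straight through untouched
        if hasattr(obj, '__enter__') and hasattr(obj, '__exit__'):
            obj.__enter__()
            _pending_cleanups.append(lambda: obj.__exit__(None, None, None))
        elif hasattr(obj, 'close'):
            _pending_cleanups.append(obj.close)
        return obj

    def close_resources():
        # invoke the callbacks *and* drop the references
        while _pending_cleanups:
            _pending_cleanups.pop()()

Popping the entries (rather than merely iterating over them) also means the registry stops keeping the resources alive once they've been cleaned up.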
> In my proposal (for-scoped-iterclose): > - for robust systems: life is great -- you're still stopping to think > a little about cleanup every time you use an iterator (because that's > what it means to write robust code!), but since the iterators now know > when they need cleanup and regular 'for' loops know how to invoke it, > then 99% of the time (i.e., whenever you don't intend to re-use an > iterator) you can be confident that just writing 'for' will do exactly > the right thing, and the other 1% of the time (when you do want to > re-use an iterator), you already *know* you're doing something clever. > So the cognitive overhead on each for-loop is really low. In mine, if your style guide says "Use scoped_resource() and an explicit close_resources() call when iterating", you'd add it (or your automated linter would complain that it was missing). So the cognitive overhead is higher, but it would remain where it belongs (i.e. on professional developers being paid to write robust code). > - for one-off scripts: ~99% of the time (actual measurement, see > below) everything just works, except maybe a little bit better. 1% of > the time, you deploy the clever trick of re-using an iterator with > multiple for loops, and it breaks, so this is some pain. Here's what > you see: > > gen_obj = ... > for first_line in gen_obj: > break > for lines in gen_obj: > ... > > Traceback (most recent call last): > File "/tmp/foo.py", line 5, in > for lines in gen_obj: > AlreadyClosedIteratorError: this iterator was already closed, > possibly by a previous 'for' loop. (Maybe you want > itertools.preserve?) > > (We could even have a PYTHONDEBUG flag that when enabled makes that > error message include the file:line of the previous 'for' loop that > called __iterclose__.) > > So this is pain! But the pain is (a) rare, not pervasive, (b) > immediately obvious (an exception, the code doesn't work at all), not > subtle and delayed, (c) easily googleable, (d) easy to fix and the fix > is reliable. It's a totally different type of pain than the pain that > we currently impose on folks who want to write robust code. And it's completely unecessary - with explicit scoped_resource() calls absolutely nothing changes for the scripting use case, and even with implicit ones, re-use *within the same scope* would still be fine (you'd only get into trouble if the resource escaped the scope where it was first marked as a scoped resource). > Now compare to the new proposal (function-scoped-iterclose): > > - For those who want robust cleanup: Usually, I only need an iterator > for as long as I'm iterating over it; that may or may not correspond > to the end of the function (often won't). When these don't coincide, > it can cause problems. E.g., consider the original example from my > proposal: > > def read_newline_separated_json(path): > with open(path) as f: > for line in f: > yield json.loads(line) > > but now suppose that I'm a Data Scientist (tm) so instead of having 1 > file full of newline-separated JSON, I have a 100 gigabytes worth of > the stuff stored in lots of files in a directory tree. 
Well, that's no > problem, I'll just wrap that generator: > > def read_newline_separated_json_tree(tree): > for root, _, paths in os.walk(tree): > for path in paths: > for document in read_newline_separated_json(join(root, path)): > yield document If you're being paid to write robust code and are using Python 3.7+, then you'd add scoped_resource() around the read_newline_separated_json() call and then add a close_resources() call after that loop. That'd be part of your job, and just another point in the long list of reasons why developing software as a profession isn't the same thing as doing it as a hobby. We'd design scoped_resource() in such a way that it could be harmlessly wrapped around "paths" as well, even though we know that's technically not necessary (since it's just a list of strings). As noted above, I'm also open to the notion of some day making all for loops implicitly declare the iterators they operate on as scoped resources, but I don't think we should do that without gaining some experience with the explicit form first (where we can be confident that any unexpected negative consequences will be encountered by folks already well equipped to deal with them). > And then I'll run it on PyPy, because that's what you do when you have > 100 GB of string processing, and... it'll crash, because the call to > read_newline_separated_tree ends up doing thousands of calls to > read_newline_separated_json, but never cleans up any of them up until > the function exits, so eventually we run out of file descriptors. And we'll go "Oops", and refactor our code to better control the scope of our resources, either by adding a with statement around the innermost loop or using the new scoped resources API (if such a thing gets added). The *whole point* of iterative development is to solve the problems you know you have, not the problems you or someone else might potentially have at some point in the indeterminate future. > A similar situation arises in the main loop of something like an HTTP server: > > while True: > request = read_request(sock) > for response_chunk in application_handler(request): > send_response_chunk(sock) > > Here we'll accumulate arbitrary numbers of un-closed > application_handler generators attached to the stack frame, which is > no good at all. And this has the interesting failure mode that you'll > probably miss it in testing, because most clients will only re-use a > connection a small number of times. And the fixed code (given the revised API proposal above) looks like this: while True: request = read_request(sock) for response_chunk in scoped_resource(application_handler(request)): send_response_chunk(sock) close_resources() This pattern has the advantage of also working if the resources you want to manage aren't precisely what your iterating over, or if you're iterating over them in a while loop rather than a for loop. > So what this means is that every time I write a for loop, I can't just > do a quick "am I going to break out of the for-loop and then re-use > this iterator?" check -- I have to stop and think about whether this > for-loop is nested inside some other loop, etc. Or you unconditionally add the scoped_resource/close_resources calls to force non-reference-counted implementations to behave a bit more like CPython and don't worry about it further. 
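Sketching that out with the same hypothetical helpers (and reusing the function names from the quoted example), the defensive version of the tree walker would look something like:

    def read_newline_separated_json_tree(tree):
        for root, _, paths in os.walk(tree):
            for path in paths:
                json_lines = scoped_resource(read_newline_separated_json(join(root, path)))
                for document in json_lines:
                    yield document
                close_resources()

Each file's generator then gets closed as soon as its loop finishes, rather than accumulating until the whole walk completes.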
> - For those who just want to write a quick script and not think about > it: here's a script that does repeated partial for-loops over a > generator object: > > https://github.com/python/cpython/blob/553a84c4c9d6476518e2319acda6ba29b8588cb4/Tools/scripts/gprof2html.py#L40-L79 > > (and note that the generator object even has an ineffective 'with > open(...)' block inside it!) > > With the function-scoped-iterclose, this script would continue to work > as it does now. Excellent. As it would with the explicit scoped_resource/close_resources API. > But, suppose that I decide that that main() function is really > complicated and that it would be better to refactor some of those > loops out into helper functions. (Probably actually true in this > example.) So I do that and... suddenly the code breaks. And in a > rather confusing way, because it has to do with this complicated > long-distance interaction between two different 'for' loops *and* > where they're placed with respect to the original function versus the > helper function. I do agree the fact that it would break common code refactoring patterns is a good counter-argument against the idea of ever calling scoped_resource() implicitly. > Anyway, in summary: function-scoped-iterclose doesn't seem to > accomplish my goal of getting rid of the *type* of pain involved when > you have to run a background thread in your brain that's doing > constant paranoid checking every time you write a for loop. Instead it > arguably takes that type of pain and spreads it around both the > experts and the novices :-/. Does the addition of the explicit close_resources() API mitigate your concern? > Now, let's look at some evidence about how disruptive the two > proposals are for real code: > > As mentioned else-thread, I wrote a stupid little CPython hack [1] to > report when the same iterator object gets passed to multiple 'for' > loops, and ran the CPython and Django testsuites with it [2]. Looking > just at generator objects [3], across these two large codebases there > are exactly 4 places where this happens. The standard library and a web framework are in no way typical of Python application and scripting code. > 3) Django django/utils/regex_helper.py:236 > > This code is very similar to the previous example in its general > outline, except that the 'for' loops *have* been factored out into > utility functions. So in this case for-scoped-iterclose and > function-scoped-iterclose are equally disruptive. But explicitly scoped resource management leaves it alone. > 4) CPython's Lib/test/test_generators.py:723 > > I have to admit I cannot figure out what this code is doing, besides > showing off :-). But the different 'for' loops are in different stack > frames, so I'm pretty sure that for-scoped-iterclose and > function-scoped-iterclose would be equally disruptive. And explicitly scoped resource management again leaves it alone. > Obviously there's a bias here in that these are still relatively > "serious" libraries; I don't have a big corpus of one-off scripts that > are just a big __main__, though gprof2html.py isn't far from that. (If > anyone knows where to find such a thing let me know...) 
But still, the > tally here is that out of 4 examples, we have 1 subtle bug that > iterclose might have caught, 2 cases where for-scoped-iterclose and > function-scoped-iterclose are equally disruptive, and only 1 where > function-scoped-iterclose is less disruptive -- and in that case it's > arguably just avoiding an obvious error now in favor of a more > confusing error later. > > If this reduced the backwards-incompatible cases by a factor of, like, > 10x or 100x, then that would be a pretty strong argument in its favor. > But it seems to be more like... 1.5x. The explicit-API-only aspect of the proposal eliminates 100% of the backwards incompatibilities :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Oct 26 13:02:17 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Oct 2016 03:02:17 +1000 Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On 26 October 2016 at 08:48, Nathaniel Smith wrote: > If it takes a strong reference, then suddenly we're pinning all > iterators in memory until the end of the enclosing function, which > will often look like a memory leak. I think this would break a *lot* > more existing code than the for-scoped-iterclose proposal does, and in > more obscure ways that are harder to detect and warn about ahead of > time. It would take a strong reference, which is another reason why close_resources() would be an essential part of the explicit API (since it would drop the references in addition to calling the __exit__() and close() methods of the declared resources), and also yet another reason why you've convinced me that the only implicit API that would ever make sense is one that was scoped specifically to the iteration process. However, I still think the explicit-API-only suggestion is a much better path to pursue than any implicit proposal - it will give folks that see it for the first something to Google, and it's a general purpose technique rather than being restricted specifically to the cases where the resource to be managed and the iterator being iterated over are one and the same object. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Oct 26 13:25:27 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Oct 2016 03:25:27 +1000 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: On 26 October 2016 at 09:15, Nathaniel Smith wrote: > Incidentally, PyPy's built-in REPL handles multi-line constructs like > IPython does, rather than like how the CPython built-in REPL does. > > There are a lot of logistic issues that would need to be dealt with > before CPython could consider making a third-party REPL the default or > anything like it... it looks like IPython's dependency tree is all > pure-Python, which makes it more viable, but it's still a lot of code > and on a very different development cycle than CPython. bpython > appears to depend on greenlet, which is a whole other issue... OTOH it > seems a little quixotic to spend lots of resources improving the > built-in REPL when there are much better ones with vibrant developer > communities. 
The built-in REPL serves two quite divergent use cases, and I think we're well past the point where we can't readily support both use cases with a single implementation: - a minimalist interactive environment, that is *always* present, even if parts of the interpreter (most notably the import system) aren't working or have been deliberately disabled - a day-to-day working environment for Python developers The prevalence of the latter use case then leads to it also being used as a tool for introducing new developers to Python. The problem is that of these two use cases, the current default REPL is really only *good* at the first one - for the latter, it's instead only "frequently good enough", since there are much better alternatives out there that can depend on the whole Python ecosystem rather than having to make the assumption that they should still basically work even if the import system isn't currently set up to bring in external modules. One possible path towards improving the situation might be to look at the PyPy REPL (which is presumably implemented in RPython) and see if that would be suitable for incorporation into CPython as a frozen module (perhaps with some modifications). That has the advantage of making the REPL much easier to iterate on (since you can use the non-frozen version for development), while still making it available at runtime as part of the core Python binary. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Oct 26 14:24:52 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 27 Oct 2016 03:24:52 +0900 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <20161026014031.GH15983@ando.pearwood.info> Message-ID: <22544.62708.70463.758209@turnbull.sk.tsukuba.ac.jp> Mikhail V writes: > I need translate() which drops non-defined chars. Please :) import collections def translate_or_drop(string, table): """ string: a string to process table: a dict as accepted by str.translate """ return string.translate(collections.defaultdict(lambda: None, **table)) All OK now? From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Oct 26 14:58:01 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 27 Oct 2016 03:58:01 +0900 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> Message-ID: <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Mikhail V writes: > >That said, multiple methods is a valid option for the API. > > Certainly I like the look of distinct functions more. > It allows me to visually parse the code effectively, > so e.g. for str.remove() I would not need to look > in docs to understand what the function does. OK, as I said, you're in accord with Guido on that. His rationale is somewhat different, but that's OK. > Just in some cases I need to convert them to numpy arrays back and > forth, so this unicode vanity worries me a bit. I think you're borrowing trouble you actually don't have. Either way, the rest of the world *needs* Unicode to do their work, and it's not going to go away. On the positive side, turning a string into a list of codepoints is trivial: [ord(c) for c in string] > So I am just not the one who believes in these maximalistical "we > need over 9000 glyphs" talks. 
But you don't need to believe in it. What you do need to believe is that the rest of us believe that we need the union of our character sets as a single, universal character set. As it happens, although there are differences of opinion over how to handle Unicode in Python, there is consensus that Python does have to handle Unicode flexibly, effectively and efficiently. Believe me, it *is* a consensus. If you insist on bucking it, you'll have to do it pretty much alone, perhaps even maintaining your own fork of Python. From p.f.moore at gmail.com Wed Oct 26 15:24:52 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 26 Oct 2016 20:24:52 +0100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: On 26 October 2016 at 18:25, Nick Coghlan wrote: > The built-in REPL serves two quite divergent use cases, and I think > we're well past the point where we can't readily support both use > cases with a single implementation: > > - a minimalist interactive environment, that is *always* present, even > if parts of the interpreter (most notably the import system) aren't > working or have been deliberately disabled > - a day-to-day working environment for Python developers > > The prevalence of the latter use case then leads to it also being used > as a tool for introducing new developers to Python. Thinking a little further about this, I think the reason I don't use IPython more, is because my muscle memory types "python" (or more often, "py") when I want an interactive prompt. And I do that for the reason you mention - it's always there. So I think that it would be really useful to be able to plug in a new REPL, when it's available. This has a number of benefits: 1. We don't need to worry about incorporating any complex REPL code into Python. The default REPL can remain simple. 2. Users can choose their preferred REPL, core Python doesn't have to get involved in UI decisions. The downside, of course, is that the default behaviour is inconsistent - new users could attend a course where IPython was preinstalled, but then struggle when back at the office because no-one told them how to set it up. > One possible path towards improving the situation might be to look at > the PyPy REPL (which is presumably implemented in RPython) and see if > that would be suitable for incorporation into CPython as a frozen > module (perhaps with some modifications). That has the advantage of > making the REPL much easier to iterate on (since you can use the > non-frozen version for development), while still making it available > at runtime as part of the core Python binary. I've never used the PyPy REPL, so I can't speak to its features. But this seems to me to simply be a matter of incremental improvement to the standard REPL, That's no bad thing, but as you pointed out at the start, we can't support both of our use cases with a single implementation, so this doesn't solve the fundamental problem - it merely alters the breakpoint at which people need to learn not to fire up the REPL, but rather to start up IPython, or bpython, or their environment of choice. And it doesn't do much for people who (like me) type "python" instinctively, and only realise they needed something better part way through their session. 
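For what it's worth, the kind of "pluggable" behaviour I have in mind could be approximated today by a trivial wrapper - purely a sketch, assuming IPython is the preferred replacement and falling back to the standard machinery otherwise:

    try:
        from IPython import start_ipython
        start_ipython(argv=[])
    except ImportError:
        import code
        code.interact()

The point of making something like this official would be that "python" (or "py") did that selection itself, driven by the user's configuration, rather than everyone maintaining their own wrapper.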
Paul From sjoerdjob at sjoerdjob.com Wed Oct 26 16:30:05 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Wed, 26 Oct 2016 22:30:05 +0200 Subject: [Python-ideas] Fwd: unpacking generalisations for list comprehension In-Reply-To: <20161026162706.GK15983@ando.pearwood.info> References: <20161012154224.GT22471@ando.pearwood.info> <12ccec58-9123-4e6e-a81c-74f3fd994699@googlegroups.com> <58007475.9010306@canterbury.ac.nz> <918d4ff5-c05b-8c65-036a-363412e66703@btinternet.com> <20161026162706.GK15983@ando.pearwood.info> Message-ID: <20161026203005.GL13170@sjoerdjob.com> On Thu, Oct 27, 2016 at 03:27:07AM +1100, Steven D'Aprano wrote: > I think that there is zero hope of consistency for * the star operator. > That horse has bolted. It is already used for: > > - ... > - "zero or more of the previous element" in regular expressions > - "zero or more of any character" in globbing > - ... After having read this multiple times, I still can't really understand why these two matter to the discussion at hand. It's also used to mark emphasised text in Markdown, lists in Markdown. You can also use it for footnotes in plain text. It also has a special meaning in robots.txt files. Yes, I agree with you that the meaning of the asterisk symbol is quite overloaded on the syntax level already. But I think that mentioning regexes and globbing is a bit of a red herring. From nbadger1 at gmail.com Wed Oct 26 15:41:10 2016 From: nbadger1 at gmail.com (Nick Badger) Date: Wed, 26 Oct 2016 12:41:10 -0700 (PDT) Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> Am Freitag, 14. Oktober 2016 23:11:48 UTC-7 schrieb Nick Coghlan: > > > Regarding the spelling details, my current preferences are as follows: > > * None-coalescing operator: x ?or y > * None-severing operator: x ?and y > * None-coalescing augmented assignment: x ?= y > * None-severing attribute access: x?.attr > * None-severing subscript lookup: x?[expr] > > This is, more or less, the syntax added in Nick's PEP 531 draft . The reddit discussion about it raised some pretty major concerns about clarity, and I have to admit, I think if you're learning Python as a first language, the ?and, ?else, x?.attr, etc syntax is likely to be very confusing. For me personally, combining a new operator "?" with existing keywords like "and" or "else" just does not make any intuitive sense. I definitely see the value, though, in particular of None-severing, especially as a tool to explicitly specify which attr can be missing -- ie, disambiguating which attribute is missing in a foo.bar.baz lookup (the alternative to which is nested try: except AttributeError: blocks, which gets very messy very quickly). I'm on board with the idea, and I can absolutely imagine using it in my code, but I disagree on the spelling. A thought I had (perhaps more readable in a reddit comment ) is to condense everything into a single "?" symbol, used for: + Coalescing binary operator: foo ? bar + Coalescing augmented assignment operator: foo ?= bar + Severing unary operator: ?foo *Pseudocode binary operator examples:* >>> foo_exists ? bar_never_evaluated foo_exists >>> foo_missing ? foo_exists foo_exists >>> foo_missing ? 
bar_missing foo_missing *Pseudocode augmented examples:* >>> foo_exists = 'foo' >>> foo_exists ?= bar_never_evaluated >>> foo_exists == 'foo' True >>> foo = Missing >>> bar_exists = 'bar' >>> foo ?= bar_exists >>> foo == 'bar' True >>> foo = None >>> bar_missing = Missing >>> foo ?= bar_missing >>> foo == None True *Pseudocode unary examples:* >>> ?(foo_exists).bar.baz foo_exists.bar.baz >>> ?(foo_exists)[bar][baz] foo_exists[bar][baz] >>> ?(foo_missing).bar.baz Missing >>> ?(foo_missing)[bar][baz] Missing >>> ?(foo_exists).bar.baz_missing Traceback... AttributeError: object has no attribute 'baz_missing' >>> ?(foo_exists)[bar][baz_missing] Traceback... KeyError: 'baz_missing' >>> ?(foo_missing).bar.baz_missing Missing >>> ?(foo_missing)[bar][baz_missing] Missing I personally think that's substantially more readable, but I suppose that's at least somewhat a matter of personal preference. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Wed Oct 26 16:43:48 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 26 Oct 2016 13:43:48 -0700 Subject: [Python-ideas] A better interactive prompt In-Reply-To: (Paul Moore's message of "Wed, 26 Oct 2016 20:24:52 +0100") References: Message-ID: <87pommhmd7.fsf@thinkpad.rath.org> On Oct 26 2016, Paul Moore wrote: > On 26 October 2016 at 18:25, Nick Coghlan wrote: >> The built-in REPL serves two quite divergent use cases, and I think >> we're well past the point where we can't readily support both use >> cases with a single implementation: >> >> - a minimalist interactive environment, that is *always* present, even >> if parts of the interpreter (most notably the import system) aren't >> working or have been deliberately disabled >> - a day-to-day working environment for Python developers >> >> The prevalence of the latter use case then leads to it also being used >> as a tool for introducing new developers to Python. > > Thinking a little further about this, I think the reason I don't use > IPython more, is because my muscle memory types "python" (or more > often, "py") when I want an interactive prompt. And I do that for the > reason you mention - it's always there. > > So I think that it would be really useful to be able to plug in a new > REPL, when it's available. This has a number of benefits: > > 1. We don't need to worry about incorporating any complex REPL code > into Python. The default REPL can remain simple. > 2. Users can choose their preferred REPL, core Python doesn't have to > get involved in UI decisions. Uh, these are not advantages of plugging in a new REPL when available. This describes the current situation. At least in your email above you seem to be arguing that Python should change to better accomodate your muscle memory. I don't want to diminuish your importance, but doesn't that maybe go a little too far? :-). Best, -Niko -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F ?Time flies like an arrow, fruit flies like a Banana.? From p.f.moore at gmail.com Wed Oct 26 17:03:19 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 26 Oct 2016 22:03:19 +0100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: <87pommhmd7.fsf@thinkpad.rath.org> References: <87pommhmd7.fsf@thinkpad.rath.org> Message-ID: On 26 October 2016 at 21:43, Nikolaus Rath wrote: >> So I think that it would be really useful to be able to plug in a new >> REPL, when it's available. 
This has a number of benefits: >> >> 1. We don't need to worry about incorporating any complex REPL code >> into Python. The default REPL can remain simple. >> 2. Users can choose their preferred REPL, core Python doesn't have to >> get involved in UI decisions. > > Uh, these are not advantages of plugging in a new REPL when > available. This describes the current situation. I'm confused. With regard to (1), the current situation is that there may be benefit to improving some aspects of the REPL. But we have no option if we want to do that, other than modifying the standard REPL. And for (2), sure users can choose the REPL they use (by running a different application such as IPython), but they can't change which is the *default* REPL (the one the "python" command provides). You can disagree as to how significant those benefits are, but they are not things we have right now. > At least in your email above you seem to be arguing that Python should > change to better accomodate your muscle memory. I don't want to > diminuish your importance, but doesn't that maybe go a little too far? > :-). You're misinterpreting me. I'm saying that people in general are used to getting an interactive prompt when they run Python. I'm suggesting that being able to configure a better REPL for that situation would be useful, because it allows people to gain the benefit of enhanced capabilities, while not having to learn (and remember to use) a new command. I'm also saying that making the REPL pluggable means that core Python doesn't have to get into the business of developing/maintaining a better REPL. Yes, I'd find the capability useful. Would it help my credibility if I proposed a change that made my life harder? :-) Paul From toddrjen at gmail.com Wed Oct 26 17:11:59 2016 From: toddrjen at gmail.com (Todd) Date: Wed, 26 Oct 2016 17:11:59 -0400 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: On Wed, Oct 26, 2016 at 3:24 PM, Paul Moore wrote: > On 26 October 2016 at 18:25, Nick Coghlan wrote: > > The built-in REPL serves two quite divergent use cases, and I think > > we're well past the point where we can't readily support both use > > cases with a single implementation: > > > > - a minimalist interactive environment, that is *always* present, even > > if parts of the interpreter (most notably the import system) aren't > > working or have been deliberately disabled > > - a day-to-day working environment for Python developers > > > > The prevalence of the latter use case then leads to it also being used > > as a tool for introducing new developers to Python. > > Thinking a little further about this, I think the reason I don't use > IPython more, is because my muscle memory types "python" (or more > often, "py") when I want an interactive prompt. And I do that for the > reason you mention - it's always there. > > So I think that it would be really useful to be able to plug in a new > REPL, when it's available. This has a number of benefits: > > Isn't this what aliases are for? Just set "python" to be an alias for "ipython" for your interactive shell. Personally, my muscle memory is trained to always type "ipython3". I only type "python" or "python3" when I specifically want a vanilla shell (such as for trying things out without all my default imports, which I know could be done with profiles but that is even more typing). 
Having "python3" somehow changed to "ipython3" automatically would make things more difficult for me since I would need to do something more complicated to get back the vanilla shell when I need it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Oct 26 17:18:31 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 26 Oct 2016 22:18:31 +0100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: On 26 October 2016 at 22:11, Todd wrote: > Isn't this what aliases are for? Just set "python" to be an alias for > "ipython" for your interactive shell. I hadn't thought of that option. I might give it a try. Although I'm not sure how I'd set up a Powershell function (I'm on Windows) that would wrap the launcher (which selects the version of Python to use) and invoke IPython. I'll give it a go, though. Paul From Nikolaus at rath.org Wed Oct 26 17:40:36 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 26 Oct 2016 14:40:36 -0700 Subject: [Python-ideas] A better interactive prompt In-Reply-To: (Paul Moore's message of "Wed, 26 Oct 2016 20:24:52 +0100") References: Message-ID: <87mvhqhjqj.fsf@thinkpad.rath.org> On Oct 26 2016, Paul Moore wrote: > Thinking a little further about this, I think the reason I don't use > IPython more, is because my muscle memory types "python" (or more > often, "py") when I want an interactive prompt. And I do that for the > reason you mention - it's always there. > > The downside, of course, is that the default behaviour is inconsistent > - new users could attend a course where IPython was preinstalled, but > then struggle when back at the office because no-one told them how to > set it up. It also imposes a significant burden on scripting. I often have elements like this in shell scripts: output=$(python < References: <87mvhqhjqj.fsf@thinkpad.rath.org> Message-ID: On 26 October 2016 at 22:40, Nikolaus Rath wrote: > It also imposes a significant burden on scripting. I often have elements > like this in shell scripts: > > output=$(python < import h5py > with h5py.File('foo', 'r') as fh: > print((fh['bla'] * fh['com']).sum()) > EOF > ) > > If this now starts up IPython, it'll be *significantly* slower. Good point. We could, of course, detect when stdin is non-interactive, but at that point the code is starting to get unreasonably complex, as well as having way too many special cases. So I agree, that probably kills the proposal. Paul From mikhailwas at gmail.com Wed Oct 26 17:48:58 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Wed, 26 Oct 2016 23:48:58 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: On 26 October 2016 at 20:58, Stephen J. Turnbull wrote: >import collections >def translate_or_drop(string, table): > """ > string: a string to process > table: a dict as accepted by str.translate > """ > return string.translate(collections.defaultdict(lambda: None, **table)) >All OK now? Not really. I tried with a simple example intab = "ae" outtab = "XM" table = string.maketrans(intab, outtab) collections.defaultdict(lambda: None, **table) an this gives me TypeError: type object argument after ** must be a mapping, not str But I probably I misunderstood the idea. 
Anyway this code does not make much sense to me, I would never in life understand what is meant here. And in my not so big, but not so small, Python experience I *never* had an occasion using collections or lambda. >sets as a single, universal character set. As it happens, although >there are differences of opinion over how to handle Unicode in Python, >there is consensus that Python does have to handle Unicode flexibly, >effectively and efficiently. > I was merely talking about syntax and source file standards, not about unicode strings. No doubt one needs some way to store different glyph sets. So I was saying that if one defines a syntax and has good intentions for readability in mind, there is not much rationale to adapt the syntax to the current "hybrid" system: 7-bit and/or multibyte paradigm. Again this is a too far going discussion, but one should probably not look much ahead on those. The situation is not so good in this sense that most standard software is attached to this strange paradigm (even those which do not have anything to do with multi-lingual typography). So IMO something has gone wrong with those standard characters. >If you insist on bucking it, you'll >have to do it pretty much alone, perhaps even maintaining your own >fork of Python. As for me I would take the path of developing my own IDE which will enable typographic quality rendering and of course all useful glyphs, such as curly quotes, bullets, etc, which all is fundamental to any possible improvements of cognitive qualities of code. And I'll stay in 8-bit boundaries, that's for sure. So if Python will take the path of "unicode" code input (e.g. for some punctuation characters) this would only add a minor issue for generating valid Python source files in this case. Mikhail From cody.piersall at gmail.com Wed Oct 26 18:16:44 2016 From: cody.piersall at gmail.com (Cody Piersall) Date: Wed, 26 Oct 2016 17:16:44 -0500 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: <87mvhqhjqj.fsf@thinkpad.rath.org> Message-ID: On Wed, Oct 26, 2016 at 4:48 PM, Paul Moore wrote: > Good point. We could, of course, detect when stdin is non-interactive, > but at that point the code is starting to get unreasonably complex, as > well as having way too many special cases. So I agree, that probably > kills the proposal. Isn't that check really just an isatty() check? Or is that not reliable enough for some reason? Here's some code that performs that check, and works on Linux and Windows:

#include <stdio.h>

#ifdef _WIN32
# include <io.h>
# define isatty _isatty
# define fileno _fileno
#else
# include <unistd.h>
#endif

int main( void )
{
    /* If stdin is a tty, you would launch the user's configured REPL */
    if(isatty(fileno(stdin)))
        printf("stdin has not been redirected to a file\n");
    /* Otherwise, launch the default REPL (or maybe don't launch a REPL at all,
     * and just treat stdin as a file) */
    else
        printf("stdin has been redirected to a file\n");
}

From njs at pobox.com Wed Oct 26 18:24:47 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 26 Oct 2016 15:24:47 -0700 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: <87mvhqhjqj.fsf@thinkpad.rath.org> Message-ID: On Wed, Oct 26, 2016 at 3:16 PM, Cody Piersall wrote: > On Wed, Oct 26, 2016 at 4:48 PM, Paul Moore wrote: >> Good point. We could, of course, detect when stdin is non-interactive, >> but at that point the code is starting to get unreasonably complex, as >> well as having way too many special cases. So I agree, that probably >> kills the proposal.
> > Isn't that check really just an isatty() check? Or is that not > reliable enough for some reason? Here's some code that performs that > check, and works on Linux and Windows: It might or might not be an isatty() check (it's actually a bit more complicated, there's -i and various other things to take into account), but it hardly matters -- Python already has well-defined logic for deciding whether it should launch an interactive REPL or not. If we were going to do this, we'd keep that logic in place while swapping out the actual start_a_REPL() call with something else. There might be showstoppers here but I don't think this is one of them :-) -n -- Nathaniel J. Smith -- https://vorpus.org From chris.barker at noaa.gov Wed Oct 26 18:17:42 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 26 Oct 2016 15:17:42 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: I"ve lost track of what (If anything) is actually being proposed here... so I"m going to try a quick summary: 1) an easy way to spell "remove all the characters other than these" I think that's a good idea. What with unicode having an enormous number of code points, it really does make sense to have a way to specify only what you want, rather than what you don't want. Back in the good old days of 1-byte chars, it wasn't hard to build up a full 256 element translate table -- not so much anymore. And one of the whole points of str.translate() is good performance. a) a new method: str.remove_all_but(sequence_of_chars) (naming TBD) b) a new flag in translate (Kind of like the decode keywords) str.translate(table, missing='ignore'|'remove') (b) has the advantage of adding translation and removal in one fell swoop -- but if you only want to remove, then you have to make a translation table of 1:1 mappings = not hard, but a annoying: table = {c:c for c in sequence_of_chars} I'm on the fence about what I personally prefer. 2) (in another thread, but similar enough) being able to pass in more than one string to replace: str.replace( old=seq_of_strings, new=seq_of_strings ) I know I've wanted this a lot, and certainly from a performance perspective, it could be a nice bonus. But: It overlaps a lot with str.translate -- at least for single character replacements. so really why? so it would really only make sense if supported multi-char strings: str.replace(old = ("aword", "another_word"), ("something", "something else")) However: a string IS a sequence of strings, so we'd have confusion about that: str.replace("this", "four") Does the user want the word "this" replaced with the word "four" -- or do they want each character replaced? Maybe we'd need a .replace_many() method? ugh! There are also other issues with what to di with repeated / overlapping cahractors: str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") and all sort of other complications! THAT I think could be nailed down by defining the "order of operations" Does it lop through the entire string for each item? or through each item for each point in the string? note that if you loop thorugh the entire string for each item, you might as well have written the loop yourself: for old, new in sip(old_list, new_list): s = s.replace(old, new)) and at least if the length of the string si long-ish, and the number of replacements short-ish -- performance would be fine. 
*** So the question is -- is there support for these enhancements? If so, then it would be worth hashing ot the details. But the next question is -- does anyone care enough to manage that process -- it'll be a lot of work! NOTE: there has also been a fair bit of discussion in this thread about ordinals vs characters, and unicode itself -- I don't think any of that resulted in any possible proposals... -CHB On Wed, Oct 26, 2016 at 2:48 PM, Mikhail V wrote: > On 26 October 2016 at 20:58, Stephen J. Turnbull > wrote: > >import collections > >def translate_or_drop(string, table): > > """ > > string: a string to process > > table: a dict as accepted by str.translate > > """ > > return string.translate(collections.defaultdict(lambda: None, > **table)) > > >All OK now? > > Not really. I tried with a simple example > intab = "ae" > outtab = "XM" > table = string.maketrans(intab, outtab) > collections.defaultdict(lambda: None, **table) > > an this gives me > TypeError: type object argument after ** must be a mapping, not str > > But I probably I misunderstood the idea. Anyway this code does not make > much sence to me, I would never in life understand what is meant here. > And in my not so big, but not so small, Python experience I *never* had > an occasion using collections or lambda. > > >sets as a single, universal character set. As it happens, although > >there are differences of opinion over how to handle Unicode in Python, > >there is consensus that Python does have to handle Unicode flexibly, > >effectively and efficiently. > > > > I was merely talking about syntax and sources files standard, not about > unicode > strings. No doubt one needs some way to store different glyph sets. > > So I was talking about that if one defines a syntax and has good intentions > for readability in mind, there is not so many rationale to adopt the syntax > to current "hybrid" system: 7-bit and/or multibyte paradigm. > Again this a too far going discussion, but one should not probably much > look ahead on those. The situation is not so good in this sense that most > standard software is attached to this strange paradigm > (even those which does not have anything > to do with multi-lingual typography). > So IMO something gone wrong with those standard characters. > > >If you insist on bucking it, you'll > >have to do it pretty much alone, perhaps even maintaining your own > >fork of Python. > > As for me I would take the path of developing of own IDE which will enable > typografic quality rendering and of course all useful glyphs, such as > curly quotes, > bullets, etc, which all is fundamental to any possible improvements of > cognitive qualities of code. And I'll stay in 8-bit boundaries, thats for > sure. > So if Python will take the path of "unicode" code input (e.g. for some > punctuaion characters) > this would only add a minor issue for generating valid Python source > files in this case. > > > Mikhail > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From breamoreboy at yahoo.co.uk Wed Oct 26 18:29:31 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 26 Oct 2016 23:29:31 +0100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: On 26/10/2016 20:24, Paul Moore wrote: > On 26 October 2016 at 18:25, Nick Coghlan wrote: >> The built-in REPL serves two quite divergent use cases, and I think >> we're well past the point where we can't readily support both use >> cases with a single implementation: >> >> - a minimalist interactive environment, that is *always* present, even >> if parts of the interpreter (most notably the import system) aren't >> working or have been deliberately disabled >> - a day-to-day working environment for Python developers >> >> The prevalence of the latter use case then leads to it also being used >> as a tool for introducing new developers to Python. > > Thinking a little further about this, I think the reason I don't use > IPython more, is because my muscle memory types "python" (or more > often, "py") when I want an interactive prompt. And I do that for the > reason you mention - it's always there. > > So I think that it would be really useful to be able to plug in a new > REPL, when it's available. This has a number of benefits: > > 1. We don't need to worry about incorporating any complex REPL code > into Python. The default REPL can remain simple. > 2. Users can choose their preferred REPL, core Python doesn't have to > get involved in UI decisions. > > The downside, of course, is that the default behaviour is inconsistent > - new users could attend a course where IPython was preinstalled, but > then struggle when back at the office because no-one told them how to > set it up. > I'll just say that on Windows 10 I have ConEmu installed, and I edit the startup file to point me to umpteen different places where I want to work. Ipython is one of them. Of course it is extremely difficult to install. My understanding is that on Windows folk find it difficult to type:- pip install ipython What have I missed? -- Mark Lawrence From python at mrabarnett.plus.com Wed Oct 26 18:48:27 2016 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 26 Oct 2016 23:48:27 +0100 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: <78bf676a-597a-1aae-58d6-d1cc5873cbe9@mrabarnett.plus.com> On 2016-10-26 23:17, Chris Barker wrote: > I"ve lost track of what (If anything) is actually being proposed here... > so I"m going to try a quick summary: > > > 1) an easy way to spell "remove all the characters other than these" > > I think that's a good idea. What with unicode having an enormous number > of code points, it really does make sense to have a way to specify only > what you want, rather than what you don't want. > > Back in the good old days of 1-byte chars, it wasn't hard to build up a > full 256 element translate table -- not so much anymore. And one of the > whole points of str.translate() is good performance. 
> > a) a new method: > > str.remove_all_but(sequence_of_chars) > (naming TBD) > > b) a new flag in translate (Kind of like the decode keywords) > > str.translate(table, missing='ignore'|'remove') > c) pass a function that returns the replacement: def replace(c): return c.upper() if c.isalpha() else '' str.translate(replace) The replacement function could be called only on distinct codepoints. > > (b) has the advantage of adding translation and removal in one fell > swoop -- but if you only want to remove, then you have to make a > translation table of 1:1 mappings = not hard, but a annoying: > > table = {c:c for c in sequence_of_chars} > > I'm on the fence about what I personally prefer. > > 2) (in another thread, but similar enough) being able to pass in more > than one string to replace: > > str.replace( old=seq_of_strings, new=seq_of_strings ) > > I know I've wanted this a lot, and certainly from a performance > perspective, it could be a nice bonus. > > But: It overlaps a lot with str.translate -- at least for single > character replacements. so really why? so it would really only make > sense if supported multi-char strings: > > str.replace(old = ("aword", "another_word"), ("something", "something > else")) > > However: a string IS a sequence of strings, so we'd have confusion about > that: > > str.replace("this", "four") > > Does the user want the word "this" replaced with the word "four" -- or > do they want each character replaced? > > Maybe we'd need a .replace_many() method? ugh! > > There are also other issues with what to di with repeated / overlapping > cahractors: > > str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") > > and all sort of other complications! > Possible choices are: 1) Use the given order. 2) Check from the longest to the shortest. If you're going to pick choice 2, does it have to be 2 tuples/lists? Why not a dict instead? > THAT I think could be nailed down by defining the "order of operations" > Does it lop through the entire string for each item? or through each > item for each point in the string? note that if you loop thorugh the > entire string for each item, you might as well have written the loop > yourself: > > for old, new in sip(old_list, new_list): > s = s.replace(old, new)) > > and at least if the length of the string si long-ish, and the number of > replacements short-ish -- performance would be fine. > > > *** So the question is -- is there support for these enhancements? If > so, then it would be worth hashing ot the details. > > But the next question is -- does anyone care enough to manage that > process -- it'll be a lot of work! > > NOTE: there has also been a fair bit of discussion in this thread about > ordinals vs characters, and unicode itself -- I don't think any of that > resulted in any possible proposals... > [snip] From tim.mitchell at leapfrog3d.com Wed Oct 26 19:05:31 2016 From: tim.mitchell at leapfrog3d.com (Tim Mitchell) Date: Thu, 27 Oct 2016 12:05:31 +1300 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: Mark, Windows folk do not type pip install ipython On windows it's much easier: 1) install pycharm (because it has UI for installing packages) 2) Go to settings > project interpreter 3) select the python interpeter you want to use 4) click the + button 5) search through the entire pypi listing for IPython 6) click install package (Off topic I know - but I couldn't resist!) 
>> >> > I'll just say that on Windows 10 I have ConEmu installed, and I edit the > startup file to point me to umpteen different places where I want to work. > Ipython is one of them. Of course it is extremely difficult to install. > My understanding is that on Windows folk find it difficult to type:- > > pip install ipython > > What have I missed? > > -- > > Mark Lawrence > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Oct 26 19:13:26 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Oct 2016 10:13:26 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Message-ID: <20161026231325.GL15983@ando.pearwood.info> On Wed, Oct 26, 2016 at 03:37:54AM +0200, Mikhail V wrote: > Extended ASCII There are over 200 different, mutually incompatible, so-called "extended ASCII" code pages and encodings. And of course it is ludicruous to think that you can fit all the world's characters into only 8-bits. There are more than 40,000 just from China alone, which makes it impossible to fit into 16-bits. > So we all must repent now and get back to 8-bit charcters. Please stop wasting everyone's time trying to set the clock back to the 1980s. -- Steve From steve at pearwood.info Wed Oct 26 19:18:07 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Oct 2016 10:18:07 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> Message-ID: <20161026231803.GM15983@ando.pearwood.info> On Wed, Oct 26, 2016 at 08:59:20AM +1100, Chris Angelico wrote: > So should French programmers write string literals ?like this?? Not unless they want to get in trouble from the Acad?mie fran?aise. They should write them ? like this ?. *wink* -- Steve From rosuav at gmail.com Wed Oct 26 19:20:59 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 27 Oct 2016 10:20:59 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <20161026231803.GM15983@ando.pearwood.info> References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231803.GM15983@ando.pearwood.info> Message-ID: On Thu, Oct 27, 2016 at 10:18 AM, Steven D'Aprano wrote: > Not unless they want to get in trouble from the Acad?mie fran?aise. They > should write them ? like this ?. ? comme ?a ? ? (Okay, I'm done) ChrisA From steve at pearwood.info Wed Oct 26 20:01:50 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Oct 2016 11:01:50 +1100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: <87mvhqhjqj.fsf@thinkpad.rath.org> References: <87mvhqhjqj.fsf@thinkpad.rath.org> Message-ID: <20161027000149.GN15983@ando.pearwood.info> On Wed, Oct 26, 2016 at 02:40:36PM -0700, Nikolaus Rath wrote: > It also imposes a significant burden on scripting. "It" being a configurable REPL. > I often have elements like this in shell scripts: > > output=$(python < import h5py > with h5py.File('foo', 'r') as fh: > print((fh['bla'] * fh['com']).sum()) > EOF > ) > > If this now starts up IPython, it'll be *significantly* slower. Surely this won't be a real problem in practice. Unless you give -i as a command line switch, calling Python as a script shouldn't need to run the REPL. 
But even if it did, if Python's REPL was configurable, there has to be a way to configure it. And presumably by default it would fall back to the standard vanilla REPL. So "python" in your shell script will probably refer to "the system Python, using the default REPL" rather than whatever personalised Python + REPL your own personal environment sets up for when you type "python" at the shell prompt. And if not, then something like this: output=$(python --use-std-repl < References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: On 27 October 2016 at 01:13, Steven D'Aprano wrote: > On Wed, Oct 26, 2016 at 03:37:54AM +0200, Mikhail V wrote: > >> Extended ASCII > > There are over 200 different, mutually incompatible, so-called > "extended ASCII" code pages and encodings. > > And of course it is ludicruous to think that you can fit all the world's > characters into only 8-bits. There are more than 40,000 just from China > alone, which makes it impossible to fit into 16-bits. > > >> So we all must repent now and get back to 8-bit charcters. > > Please stop wasting everyone's time trying to set the clock back to the > 1980s. In 1980 I was not even born. Would be an intersting experience to set the clock to the time where you did not exist 8-\. And what is so bad in having, say 2 tables: 1) what is now considered as standard unicode 2) a table with characters that are reasonably valuable and cover 99% of all programming, communuication and typography in latin script ??? And where did I say I want to fit all possible chars in 8-bit? All possible chars = infinite amount of chars. Mikhail From mikhailwas at gmail.com Wed Oct 26 20:32:55 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 27 Oct 2016 02:32:55 +0200 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: On 27 October 2016 at 00:17, Chris Barker wrote: > I"ve lost track of what (If anything) is actually being proposed here... so > I"m going to try a quick summary: > > > 1) an easy way to spell "remove all the characters other than these" > > I think that's a good idea. What with unicode having an enormous number of > code points, it really does make sense to have a way to specify only what > you want, rather than what you don't want. > > Back in the good old days of 1-byte chars, it wasn't hard to build up a full > 256 element translate table -- not so much anymore. And one of the whole > points of str.translate() is good performance. > > a) a new method: > > str.remove_all_but(sequence_of_chars) > (naming TBD) > > b) a new flag in translate (Kind of like the decode keywords) > > str.translate(table, missing='ignore'|'remove') > > > (b) has the advantage of adding translation and removal in one fell swoop -- > but if you only want to remove, then you have to make a translation table of > 1:1 mappings = not hard, but a annoying: Exactly that is the proposal. And for same exact reason that you point out, I also can't give a comment what would be better. It would be indeed quite strange from syntactical POV if I just want to remove "all except" and must call translate(). So ideally both should exist I think. 
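For comparison, what I end up writing today to keep only a whitelist of
characters is roughly this (just a sketch, the names are made up):

def keep_only(s, allowed):
    # drop every character that is not explicitly in the allowed set
    allowed = set(allowed)
    return ''.join(c for c in s if c in allowed)

print(keep_only("version 3.5.2", "0123456789."))   # -> 3.5.2

It works, but it spells "filter out" rather than "keep only these", and it
is probably slower than translate() for long strings.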
Mikhail From steve at pearwood.info Wed Oct 26 20:49:18 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Oct 2016 11:49:18 +1100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: <20161027004916.GO15983@ando.pearwood.info> On Tue, Oct 25, 2016 at 10:13:54PM +0100, Paul Moore wrote: > I've seen a lot of syntax proposals recently that are based around > providing better ways of writing "one liner" styles of code. [...] > However, looking at them from the point of view of someone working at > the interactive prompt, they can seem much more attractive. I spend a lot of time working at the interactive prompt, and I don't agree that these proposals seem more attractive for that reason. [...] > But these limitations are not inherent to Python - they are problems > with the interactive prompt, which is fairly basic[1]. It really isn't that basic. But you know that, as your footnote points out ;-) > So maybe it's > worth looking at the root issue, how to make the interactive prompt > easier to use[2]? I'm completely happy with moves to make the REPL better, but it won't make a lick of difference. Some people simply like one-liners, and won't be happy until they can write an entire application as a single, short one-liner. Look at the popularity of Perl. Or for that matter, APL. Some people are more concerned with minimizing the amount of vertical space a chunk of code users, or horizontal space, or keystrokes. Of course we *should* care about these things! We may disagree about the relative weight we give to these factors, not that they are important. There's always going to be tension between those who want to save just one more line and those who think oh no not another cryptic symbol to memorize. I think the Python community has managed to fit itself into a nice position in the "Goldilocks Zone" of syntax: not too much, not too little. But there's always going to be disagreements over precisely how much is too much, and a better REPL isn't going to stop that. There's space for lots of languages in the world, and I'm glad there is room for those who love Perl one-liners to write Perl. I just don't want them turning Python into a second-rate Perl :-) > But that's something of a solved problem. IPython offers a rich > interactive environment, for people who find the limitations of the > standard interactive prompt frustrating. Would it be worth the > standard Python documentation promoting IPython for that role? IPython is no panacea. It's too rich, too powerful for many users. Not everybody wants to replace their regular shell (say, bash) with IPython. And some of us don't like IPython's [in] [out] prompts and all those unnecessary blank lines. We *should* promote third-party REPLs like IPython and BPython as alternatives for power users. http://bpython-interpreter.org/ But we shouldn't use that as an excuse to neglect the standard REPL, which is pretty good. For my uses, there's not a lot that I think is missing: Python's REPL on Linux with readline is pretty darn good. Perhaps the main feature that I miss is this: http://bugs.python.org/issue22228 > Maybe > even, if IPython is available, allowing the user to configure Python > to use it by default as the interactive prompt I feel that is more something that people should configure their personal desktop or shell environment for, e.g. by setting up an alias. > (a bit like readline, > but I dislike the way you can't switch off readline integration if > it's installed)? 
This comment surprises me. To me, that's like saying "I dislike the way you can't switch off breathing" -- readline is almost indispensible. The REPL experience without line editing (apart from backspace) and history is *horrible*. Why would you want to switch it off? My hobby is collecting old Python versions, some of which are old enough that they don't support readline. (Or were never configured correctly for readline.) I don't use them often, but when I do, the experience is painful. > Ideally, if IPython was more readily available, fewer > users would be frustrated with Python's existing multi-line > constructs. And those that were, would have the option of looking into > custom IPython magic commands, before being forced to request language > changes. I don't think that the interactive prompt is what drives these requests. I think they are usually driven by differences in the weight people give to certain desirable characteristics of code. Some people have a higher tolerance towards terse special syntax; some people have a higher tolerance towards verbosity. Neither is completely wrong (although I feel that COBOL-like verbosity is probably less harmful than extremely terse languages like specialist code-golf languages). There will always be disagreements. > [1] On the other hand, the interactive prompt is a huge part of what > makes Python so great - these days, when I have to code in languages > that don't have an interactive prompt, it drives me nuts. And even > those that do, typically don't have one as good as Python's (in spite > of the fact that this whole mail is about needing to improve the > Python REPL). Exactly. -- Steve From chris.barker at noaa.gov Wed Oct 26 21:51:00 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 26 Oct 2016 18:51:00 -0700 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: On Wed, Oct 26, 2016 at 5:10 PM, Mikhail V wrote: > 2) a table with characters that are reasonably valuable > and cover 99% of all programming, communication and typography in latin > script > I think it's called latin-1 And I think you've mentioned numpy - there was a discussion a while back about having a one-byte-per-character string type (the existing ones are 4 byte unicode and kinda-sort-py2-string/bytes dtype) perhaps you might want to revive that conversation. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Oct 26 21:55:13 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 26 Oct 2016 18:55:13 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: <78bf676a-597a-1aae-58d6-d1cc5873cbe9@mrabarnett.plus.com> References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> <78bf676a-597a-1aae-58d6-d1cc5873cbe9@mrabarnett.plus.com> Message-ID: On Wed, Oct 26, 2016 at 3:48 PM, MRAB wrote: > str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") >> >> and all sort of other complications! >> >> > 2) Check from the longest to the shortest. 
> > If you're going to pick choice 2, does it have to be 2 tuples/lists? Why > not a dict instead? > then we have a string.translate() that accepts a table of string replacements, rather than individual character replacements -- maybe a good idea! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Oct 26 22:00:28 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 26 Oct 2016 19:00:28 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Oct 26, 2016 at 5:32 PM, Mikhail V wrote: > > (b) has the advantage of adding translation and removal in one fell > swoop -- > > but if you only want to remove, then you have to make a translation > table of > > 1:1 mappings = not hard, but a annoying: > > Exactly that is the proposal. And for same exact reason that you point out, > I also can't give a comment what would be better. It would be indeed > quite strange from syntactical POV if I just want to remove "all except" > and must call translate(). So ideally both should exist I think. > That kind of violate OWTDI though. Probably one's enough. and if fact with the use-cases I can think of, and the one you mentioned, they are really two steps: there are the characters you want to translate, and the ones you want to keep, but the ones you want to keep are a superset of the ones you want to translate. so if we added the "remove"option to .translate(), then you would need to add all the "keep" charactors to your translate table. I'm thinking they really are different operations, give them a different method. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From eryksun at gmail.com Wed Oct 26 22:21:38 2016 From: eryksun at gmail.com (eryk sun) Date: Thu, 27 Oct 2016 02:21:38 +0000 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: <87mvhqhjqj.fsf@thinkpad.rath.org> Message-ID: On Wed, Oct 26, 2016 at 10:16 PM, Cody Piersall wrote: > Isn't that check really just an isatty() check? Or is that not > reliable enough for some reason? It's not reliable in Windows. There are no tty devices, so the C runtime's implementation of isatty() instead returns true for character devices, which includes the console as well as the NUL device and communication ports. For example: C:\>python -c "import sys; print(sys.stdin.isatty())" < nul True `python < nul` starts the REPL, but it immediately closes because there's nothing to read. On the other hand, reading from COM3 on my current system blocks, and the only way to exit the REPL is to kill the process, such as via Ctrl+Break: C:\>python < com3 Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> ^C The way to check for a console input or screen buffer is by calling GetConsoleMode on the handle. From mikhailwas at gmail.com Wed Oct 26 23:06:26 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 27 Oct 2016 05:06:26 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: On 27 October 2016 at 03:51, Chris Barker wrote: > On Wed, Oct 26, 2016 at 5:10 PM, Mikhail V wrote: >> >> 2) a table with characters that are reasonably valuable >> and cover 99% of all programming, communication and typography in latin >> script > > > I think it's called latin-1 Yep, double quotes , dashes and bullets are very valuable both for typography and code (which to the largest part is the same) So if just blank out this maximalistic BS: ???????????????????????????????????????????????????? And add few good bullets/blocks, probably arrows, then it would be a reasonable set to use for most cases. Mikhail From rosuav at gmail.com Thu Oct 27 00:24:40 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 27 Oct 2016 15:24:40 +1100 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: On Thu, Oct 27, 2016 at 2:06 PM, Mikhail V wrote: > Yep, double quotes , dashes and bullets are very valuable both for typography > and code (which to the largest part is the same) > So if just blank out this maximalistic BS: > ???????????????????????????????????????????????????? > > And add few good bullets/blocks, probably arrows, then it would be a > reasonable set to > use for most cases. You've missed out a half a dozen characters needed by Turkish or Hungarian, and completely missed the point that the Latin script is *NOT SUFFICIENT* for Python. If you want to argue that we should restrict the world to 256 characters, go blog somewhere and let people ignore you there, rather than ignoring you here. Unicode is here to stay. ChrisA From rosuav at gmail.com Thu Oct 27 00:33:31 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 27 Oct 2016 15:33:31 +1100 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Oct 27, 2016 at 8:48 AM, Mikhail V wrote: > On 26 October 2016 at 20:58, Stephen J. Turnbull > wrote: >>import collections >>def translate_or_drop(string, table): >> """ >> string: a string to process >> table: a dict as accepted by str.translate >> """ >> return string.translate(collections.defaultdict(lambda: None, **table)) > >>All OK now? > > Not really. I tried with a simple example > intab = "ae" > outtab = "XM" > table = string.maketrans(intab, outtab) > collections.defaultdict(lambda: None, **table) > > an this gives me > TypeError: type object argument after ** must be a mapping, not str > > But I probably I misunderstood the idea. You're 99% of the way to understanding it. Try the exercise again in Python 3. You don't have string.maketrans (which creates a 256-byte translation mapping) - instead, you use a dictionary. 
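Something along these lines should do it, i.e. the quoted translate-or-drop
idea with a real Python 3 table (a quick sketch, untested here):

import collections

intab = "ae"
outtab = "XM"
table = str.maketrans(intab, outtab)   # Python 3: a dict like {97: 88, 101: 77}

# pass the table positionally; ** unpacking would need string keys
table = collections.defaultdict(lambda: None, table)

# characters in the table are translated, anything else maps to None
# and is therefore dropped
print("a bear ate my sandwich".translate(table))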
ChrisA From p.f.moore at gmail.com Thu Oct 27 05:35:32 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 27 Oct 2016 10:35:32 +0100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: <20161027004916.GO15983@ando.pearwood.info> References: <20161027004916.GO15983@ando.pearwood.info> Message-ID: On 27 October 2016 at 01:49, Steven D'Aprano wrote: >> (a bit like readline, >> but I dislike the way you can't switch off readline integration if >> it's installed)? > > This comment surprises me. To me, that's like saying "I dislike the way > you can't switch off breathing" -- readline is almost indispensible. The > REPL experience without line editing (apart from backspace) and history > is *horrible*. Why would you want to switch it off? The Windows default command line editing experience is a lot better (IMO) than the (non-readline) Unix default, and it's common throughout all interactive prompts (Python's REPL included). As a result, when readline is installed (pyreadline on Windows, which used to be needed for IPython) it disrupts the "normal" editing experience. It's possible that with a bit of configuration and practice I could get used to the readline experience, but then I get a different experience when in a venv where I don't have pyreadline installed. The idea that simply having a module called "readline" available, changes the REPL behaviour, with no way to configure that, seems incredibly hostile to me. Of course it's arguably pyreadline's fault for reusing a stdlib name, but nevertheless it's not something I agree with. Paul From steve at pearwood.info Thu Oct 27 08:12:13 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Oct 2016 23:12:13 +1100 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: <20161027004916.GO15983@ando.pearwood.info> Message-ID: <20161027121212.GP15983@ando.pearwood.info> On Thu, Oct 27, 2016 at 10:35:32AM +0100, Paul Moore wrote: > The Windows default command line editing experience is a lot better > (IMO) than the (non-readline) Unix default, and it's common throughout > all interactive prompts (Python's REPL included). As a result, when > readline is installed (pyreadline on Windows, which used to be needed > for IPython) it disrupts the "normal" editing experience. It's > possible that with a bit of configuration and practice I could get > used to the readline experience, but then I get a different experience > when in a venv where I don't have pyreadline installed. Ah, that makes sense. > The idea that simply having a module called "readline" available, > changes the REPL behaviour, with no way to configure that, seems > incredibly hostile to me. I think that making readline less aggressive (at least for Windows users) may be a reasonable feature request for 3.7. -- Steve From jamespic at gmail.com Thu Oct 27 08:50:52 2016 From: jamespic at gmail.com (James Pic) Date: Thu, 27 Oct 2016 14:50:52 +0200 Subject: [Python-ideas] Distribution agnostic Python project packaging In-Reply-To: References: Message-ID: Hi all ! Ive heard some people saying it was rude to post on a mailing list without introducing yourself so here goes something: my name is James Pic and I've been developing and deploying a wide variety fof Python projects Python for the last 8 years, I love to learn and share and writing documentation amongst other things such as selling liquor. 
The way I've been deploying Python projects so far is probably similar to what a lot of people do and it almost always includes building runtime dependencies on the production server. So, nobody is going to congratulate me for that for sure but I think a lot of us have been doing so. Now I'm fully aware of distribution specific packaging solutions like dh-virtualenv shared by Spotify but here's my mental problem: I love to learn and to hack. I'm always trying now distributions and I rarely run the one that's in production in my company and when I'm deploying personal projects I like funny distributions like arch, Alpine Linux, or interesting paas solutions such as cloudfoundry, openshift, rancher and many others. And I'm always facing the same problem: I have to either build runtime dependencies on the server, either package my thing in the platform specific way. I feel like I've spent a really huge amount of time doing this king of thing. But the java people, they have jars, and they have smooth deployments no matter where they deploy it. So that's the idea I'm trying to share: I'd like to b able to build a file with my dependencies and my project in it. I'm not sure packaging only Python bytecode would work here because of c modules. Also, I'm always developing against a different Python version because I'm using different distributions because it's part of my passions in life, as ridiculous as it could sound to most people, I'm expecting at least some understanding from this list :) So I wonder, do you think the best solution for me would be to build an elf binary with my Python and dependencies that I could just run on any distribution given its on the right architecture ? Note that I like to use Arm too, so I know I'd need to be able to cross compile too. Thanks a lot for reading and if you can to take some time to share your thoughts and even better : point me in a direction, if that idea is the right solution and I'm going to be the only one interested I don't care if it's going to take years for me to achieve this. Thanks a heap ! Beat regards PS: I'm currently at the openstack summit in Barcelona if anybody there would like to talk about it in person, in which case I'll buy you the drinks ;) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Thu Oct 27 10:12:43 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 27 Oct 2016 16:12:43 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: On 27 October 2016 at 06:24, Chris Angelico wrote: > Unicode is here to stay. Congratulations. And chillax. I don't blog anywhere, have no time for that. Mikhail From rosuav at gmail.com Thu Oct 27 10:32:16 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 28 Oct 2016 01:32:16 +1100 Subject: [Python-ideas] Distribution agnostic Python project packaging In-Reply-To: References: Message-ID: On Thu, Oct 27, 2016 at 11:50 PM, James Pic wrote: > So that's the idea I'm trying to share: I'd like to b able to build a file > with my dependencies and my project in it. I'm not sure packaging only > Python bytecode would work here because of c modules. 
Also, I'm always > developing against a different Python version because I'm using different > distributions because it's part of my passions in life, as ridiculous as it > could sound to most people, I'm expecting at least some understanding from > this list :) > > So I wonder, do you think the best solution for me would be to build an elf > binary with my Python and dependencies that I could just run on any > distribution given its on the right architecture ? Note that I like to use > Arm too, so I know I'd need to be able to cross compile too. In theory, you could do that. You'd have to include *all* of Python, and all of everything else you might depend on, because you can't be sure what is and isn't available, so you might as well ship your app as a VM image or something, with an entire operating system. In practice, you're probably going to need to deal with some sort of package manager, and that's where the difficulties start. You can probably cover most of the Debian-based Linuxes by targeting either Debian Stable or Ubuntu LTS and creating a .deb file that specifies what versions of various libraries you need. There's probably a way to aim an RPM build that will work on RHEL, Fedora, SUSE, etc, but I'm not familiar with that family tree and where their library versions tend to sit. The trouble is that as soon as you land on an OS version that's too far distant from the one you built on, stuff will break. Between the bleeding-edge rolling distros and the super-stable ones could be over a decade of development (RHEL 4, released in 2005, is still supported). What you can probably do is ignore the absolute bleeding edge (anyone who's working on Debian Unstable or Arch or Crunchbang is probably aware of the issues and can solve them), and then decide how far back you support by looking at what you depend on, probably losing the very oldest of distributions. It should be possible to hit 95% of Linuxes out there by providing one .deb and one .rpm (per architecture, if you support multiple), but don't quote me on that figure. Unfortunately, the problem you're facing is virtually unsolvable, simply because the freedom of open systems means there is a LOT of variation out there. But most people on the outskirts are accustomed to doing their own dependency management (like when I used to work primarily on OS/2 - nobody supports it much, so you support it yourself). With all sincerity I say to you, good luck. Try not to lose the enthusiasm that I'm hearing from you at the moment! ChrisA From toddrjen at gmail.com Thu Oct 27 11:05:26 2016 From: toddrjen at gmail.com (Todd) Date: Thu, 27 Oct 2016 11:05:26 -0400 Subject: [Python-ideas] Distribution agnostic Python project packaging In-Reply-To: References: Message-ID: On Thu, Oct 27, 2016 at 8:50 AM, James Pic wrote: > Hi all ! > > Ive heard some people saying it was rude to post on a mailing list without > introducing yourself so here goes something: my name is James Pic and I've > been developing and deploying a wide variety fof Python projects Python for > the last 8 years, I love to learn and share and writing documentation > amongst other things such as selling liquor. > > The way I've been deploying Python projects so far is probably similar to > what a lot of people do and it almost always includes building runtime > dependencies on the production server. So, nobody is going to congratulate > me for that for sure but I think a lot of us have been doing so. 
> > Now I'm fully aware of distribution specific packaging solutions like > dh-virtualenv shared by Spotify but here's my mental problem: I love to > learn and to hack. I'm always trying now distributions and I rarely run the > one that's in production in my company and when I'm deploying personal > projects I like funny distributions like arch, Alpine Linux, or > interesting paas solutions such as cloudfoundry, openshift, rancher and > many others. > > And I'm always facing the same problem: I have to either build runtime > dependencies on the server, either package my thing in the platform > specific way. I feel like I've spent a really huge amount of time doing > this king of thing. But the java people, they have jars, and they have > smooth deployments no matter where they deploy it. > > So that's the idea I'm trying to share: I'd like to b able to build a file > with my dependencies and my project in it. I'm not sure packaging only > Python bytecode would work here because of c modules. Also, I'm always > developing against a different Python version because I'm using different > distributions because it's part of my passions in life, as ridiculous as it > could sound to most people, I'm expecting at least some understanding from > this list :) > > So I wonder, do you think the best solution for me would be to build an > elf binary with my Python and dependencies that I could just run on any > distribution given its on the right architecture ? Note that I like to use > Arm too, so I know I'd need to be able to cross compile too. > > Thanks a lot for reading and if you can to take some time to share your > thoughts and even better : point me in a direction, if that idea is the > right solution and I'm going to be the only one interested I don't care if > it's going to take years for me to achieve this. > > Thanks a heap ! > > Beat regards > > PS: I'm currently at the openstack summit in Barcelona if anybody there > would like to talk about it in person, in which case I'll buy you the > drinks ;) > Are you sure this is really what you need to do? With dependency handling, you can define the dependencies of your project and they will automatically get installed from pypi when the user tries to install the package (if they aren't already installed). manylinux wheels [1] allow you to distribute your own code in a manner that is compatible with most linux distributions, and many c-based projects now offer such wheels. Assuming your dependencies have version agnostic wheels (either manylinux or pure python), what would be the advantage to you of putting everything together in a single file? That being said, I suppose it would be possible to create your own manylinux wheels that include all the necessary dependencies, but that would make building more difficult and opens up the possibility that the installed modules will conflict with users' existing installed packages. Another possibility would be to use docker to create a container [2] that includes everything you need to run the code in an isolated environment that won't conflict [1] https://github.com/pypa/manylinux [2] https://www.digitalocean.com/community/tutorials/docker-explained-how-to-containerize-python-web-applications -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Thu Oct 27 11:34:42 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 28 Oct 2016 02:34:42 +1100 Subject: [Python-ideas] Distribution agnostic Python project packaging In-Reply-To: References: Message-ID: <20161027153441.GQ15983@ando.pearwood.info> On Thu, Oct 27, 2016 at 02:50:52PM +0200, James Pic wrote: > And I'm always facing the same problem: I have to either build runtime > dependencies on the server, either package my thing in the platform > specific way. I feel like I've spent a really huge amount of time doing > this king of thing. But the java people, they have jars, and they have > smooth deployments no matter where they deploy it. > > So that's the idea I'm trying to share: I'd like to b able to build a file > with my dependencies and my project in it. [...] > So I wonder, do you think the best solution for me would be to build an elf > binary with my Python and dependencies that I could just run on any > distribution given its on the right architecture ? Note that I like to use > Arm too, so I know I'd need to be able to cross compile too. Your question is off-topic for this list. This list is for proposing new features for the Python language, but you don't seem to proposing anything new. To ask for advice on using Python (including things like packaging dependencies), you probably should ask on the Python-List mailing list, also available on usenet as comp.lang.python. There may be some other dedicated mailing lists that specialise in packaging questions, check the mailing list server here: https://mail.python.org/mailman/listinfo I can't really help you with your question, except to point you in the direction of a little-known feature of Python: zip file application support: https://www.python.org/dev/peps/pep-0441/ https://docs.python.org/3/library/zipapp.html -- Steve From liik.joonas at gmail.com Thu Oct 27 11:27:52 2016 From: liik.joonas at gmail.com (Joonas Liik) Date: Thu, 27 Oct 2016 18:27:52 +0300 Subject: [Python-ideas] Null coalescing operator In-Reply-To: <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> Message-ID: perhaps just having a utility function can get us some of the way there.. #may error r = a.b.x.z # will default to None r = a?.b?.x?.z r = get_null_aware(a, "b.x.z") # long but no new syntax, can be implemented today. From chris.barker at noaa.gov Thu Oct 27 11:54:21 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 27 Oct 2016 08:54:21 -0700 Subject: [Python-ideas] Distribution agnostic Python project packaging In-Reply-To: References: Message-ID: OT, but.... Assuming your dependencies have version agnostic wheels (either manylinux > or pure python), what would be the advantage to you of putting everything > together in a single file? > > That being said, I suppose it would be possible to create your own > manylinux wheels that include all the necessary dependencies, but that > would make building more difficult and opens up the possibility that the > installed modules will conflict with users' existing installed packages. > which is why conda exists -- conda can package up many things besides python packages, and you WILL need things besides python pacakges. 
building conda packages for everything you need, and then creating an environment.yaml file, and you can create a consistent environment very easy across systems (even across OSs) There is even a project (I forgot what its called -- "collections" maybe?) that bundles up a bunch of conda packages for you. and with conda-forge, there are an awful lot of packages already to go. If you need even more control over your environment, then Docker is the way to go -- you can even use conda inside docker... -CHB > > Another possibility would be to use docker to create a container [2] that > includes everything you need to run the code in an isolated environment > that won't conflict > > [1] https://github.com/pypa/manylinux > [2] https://www.digitalocean.com/community/tutorials/docker- > explained-how-to-containerize-python-web-applications > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Oct 27 12:01:19 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 27 Oct 2016 09:01:19 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: >> return string.translate(collections.defaultdict(lambda: None, **table)) Nice! I forgot about defautdict -- so this just needs a recipe somewhere -- maybe even in the docs for str.translate. BTW, great use case for defautdict -- I had been wondering what the point was, given that a regular dict as .setdefault -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Oct 27 12:41:21 2016 From: random832 at fastmail.com (Random832) Date: Thu, 27 Oct 2016 12:41:21 -0400 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> Message-ID: <1477586481.182170.769321641.7AD3F381@webmail.messagingengine.com> On Thu, Oct 27, 2016, at 11:27, Joonas Liik wrote: > perhaps just having a utility function can get us some of the way there.. > > #may error > r = a.b.x.z > > # will default to None > r = a?.b?.x?.z If a.b can't or shouldn't be None, this should be a?.b.x.z I'm not certain how your utility function is supposed to differentiate this case, or handle subscripts or method calls. > r = get_null_aware(a, "b.x.z") # long but no new syntax, can be > implemented today. 
From ned at nedbatchelder.com Thu Oct 27 14:27:18 2016 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 27 Oct 2016 14:27:18 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: <6a6839e1-b87c-8878-7c67-e8eb8853f3d0@nedbatchelder.com> On 10/27/16 10:12 AM, Mikhail V wrote: > On 27 October 2016 at 06:24, Chris Angelico wrote: > >> Unicode is here to stay. > Congratulations. And chillax. I don't blog anywhere, have no time for that. It's not clear at all where this thread is going, but it's clear to me that it is off-topic. --Ned. From mikhailwas at gmail.com Thu Oct 27 14:28:34 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 27 Oct 2016 20:28:34 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: On 27 October 2016 at 06:24, Chris Angelico wrote: > On Thu, Oct 27, 2016 at 2:06 PM, Mikhail V wrote: >> Yep, double quotes , dashes and bullets are very valuable both for typography >> and code (which to the largest part is the same) >> So if just blank out this maximalistic BS: >> ???????????????????????????????????????????????????? >> >> And add few good bullets/blocks, probably arrows, then it would be a >> reasonable set to >> use for most cases. > > You've missed out a half a dozen characters needed by Turkish or > Hungarian, and completely missed the point that the Latin script is > *NOT SUFFICIENT* for Python. If you want to argue that we should > restrict the world to 256 characters, go blog somewhere and let people > ignore you there, rather than ignoring you here. Unicode is here to > stay. > > ChrisA So you need umlauts to describe an algorithm and to explain yourself in turkish? Cool story. Poor uncle Garamond spins in his coffin... So what about curly quotes? This would make at least some sense, regardless of unicode. Mikhail From random832 at fastmail.com Thu Oct 27 15:40:44 2016 From: random832 at fastmail.com (Random832) Date: Thu, 27 Oct 2016 15:40:44 -0400 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: <1477597244.217112.769501857.6CA4C444@webmail.messagingengine.com> On Thu, Oct 27, 2016, at 14:28, Mikhail V wrote: > So you need umlauts to describe an algorithm and to explain yourself in > turkish? > Cool story. Poor uncle Garamond spins in his coffin... Why do you need 26 letters? The Romans didn't have so many. Hawaiian gets by with half as many - even if you count the accented vowels and the ?okina it's still only 18. Why upper and lower case? Do we *really* need digits, can't we just use the first ten letters? Allowing each language to use its own alphabet, even if any of them may be inefficient and all of them together certainly are, is the only reasonable place to draw the line. From mal at egenix.com Thu Oct 27 15:51:45 2016 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 27 Oct 2016 21:51:45 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: References: <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> Message-ID: <58125AD1.1090303@egenix.com> On 27.10.2016 20:28, Mikhail V wrote: > So what about curly quotes? This would make at > least some sense, regardless of unicode. -1. This would break code using curly quotes in string literals, break existing Python IDEs and parsers. BTW: I have yet to find a keyboard which allows me to enter such quotes. I think you simply have to accept that MS Word is not a supported editor for Python applications ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 27 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From barry at python.org Thu Oct 27 18:28:29 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 27 Oct 2016 18:28:29 -0400 Subject: [Python-ideas] Null coalescing operator References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> Message-ID: <20161027182829.03d3d782@anarchist> On Oct 27, 2016, at 06:27 PM, Joonas Liik wrote: >perhaps just having a utility function can get us some of the way there.. > >#may error >r = a.b.x.z > ># will default to None >r = a?.b?.x?.z >r = get_null_aware(a, "b.x.z") # long but no new syntax, can be >implemented today. You could probably do this by extending operator.attrgetter() to take an optional 'coalesce' keyword. It wouldn't be super pretty, but it has the advantage of no magical new syntax. E.g. your example would be: from operator import attrgetter r = attrgetter('b.x.z', coalesce=True) That might be good enough for honestly how rare I think this use case is. (Similarly with itemgetter().) Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From mikhailwas at gmail.com Thu Oct 27 18:34:25 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Fri, 28 Oct 2016 00:34:25 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <1477597244.217112.769501857.6CA4C444@webmail.messagingengine.com> References: <20161022063513.GN22471@ando.pearwood.info> <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> <1477597244.217112.769501857.6CA4C444@webmail.messagingengine.com> Message-ID: On 27 October 2016 at 21:40, Random832 wrote: > On Thu, Oct 27, 2016, at 14:28, Mikhail V wrote: >> So you need umlauts to describe an algorithm and to explain yourself in >> turkish? >> Cool story. Poor uncle Garamond spins in his coffin... > > Why do you need 26 letters? The Romans didn't have so many. 
Hawaiian > gets by with half as many - even if you count the accented vowels and > the ?okina it's still only 18. > > Why upper and lower case? Do we *really* need digits, can't we just use > the first ten letters? > > Allowing each language to use its own alphabet, even if any of them may > be inefficient and all of them together certainly are, is the only > reasonable place to draw the line. Hi Random, Yes that is what I am trying to tell, but some paint a "bigot" of me. So there is no contradiction here. You know you "local" script and you know Latin. So it belongs to my human right if I want to choose a more effective one, so since Latin is most effective now, I take it. Simply like I take a wheel without defects and with tight pressure in tyre. I don't have emotions or sadness that I will forget my strange old letters. And if we return to problem of universal communication "kind of standard" then what the sense to take a defect wheel? I am not the one to allow or disallow anything, but I respect the works of Garamond and his predecessors who made it possible for me to read without pain in eyes and I disrespect attempts to ruin it. And beleive me, it is *very* easy to ruin it all by putting umlauts and accents, just like putting stones in the tyre. Mikhail From barry at python.org Thu Oct 27 18:43:52 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 27 Oct 2016 18:43:52 -0400 Subject: [Python-ideas] Distribution agnostic Python project packaging References: Message-ID: <20161027184352.1fed4cbc@anarchist> On Oct 27, 2016, at 02:50 PM, James Pic wrote: >Now I'm fully aware of distribution specific packaging solutions like >dh-virtualenv shared by Spotify but here's my mental problem: I love to >learn and to hack. I'm always trying now distributions and I rarely run the >one that's in production in my company and when I'm deploying personal >projects I like funny distributions like arch, Alpine Linux, or >interesting paas solutions such as cloudfoundry, openshift, rancher and >many others. > >So that's the idea I'm trying to share: I'd like to b able to build a file >with my dependencies and my project in it. You might want to look at the Snap ecosystem. It's fairly new, but it is cross-distro and cross-arch, and in many ways a very cool way to build self-contained applications where you control all the dependencies. You don't have to worry so much about each distribution's peculiarities, and Python gets first-class support[*]. There are lots of technical and philosophical aspects to Snaps that are off-topic for this mailing list, so I'll just point you to where you can explore it on your own: http://snapcraft.io/ Disclosure: I work for Canonical in my day job, which invented the technology, but it is in very large part an open source community project. Cheers, -Barry [*] In fact, the nice convenience front-end to building snaps is a Python 3 application. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From mikhailwas at gmail.com Thu Oct 27 19:06:30 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Fri, 28 Oct 2016 01:06:30 +0200 Subject: [Python-ideas] Smart/Curly Quote Marks and cPython In-Reply-To: <58125AD1.1090303@egenix.com> References: <22542.48318.438953.123614@turnbull.sk.tsukuba.ac.jp> <20161026231325.GL15983@ando.pearwood.info> <58125AD1.1090303@egenix.com> Message-ID: On 27 October 2016 at 21:51, M.-A. 
Lemburg wrote: > On 27.10.2016 20:28, Mikhail V wrote: >> So what about curly quotes? This would make at >> least some sense, regardless of unicode. > > -1. This would break code using curly quotes in string literals, > break existing Python IDEs and parsers. > > BTW: I have yet to find a keyboard which allows me to enter > such quotes. I think you simply have to accept that MS Word is > not a supported editor for Python applications ;-) > > -- > Marc-Andre Lemburg Hehe :) For me, putting them in is simply as having this in my vimrc config: inoremap 147 inoremap 148 Currently I don't become code from outer applications so I type them in, so for new code it will not cause much problems. For old code I think it not so infeasible to make batch convert to the new format. AND you know, even in VIM with its spartanic "Courier New" monowidth font, those quotes look sooo much better, that I really want it. And in my code there tons of quotes in concatenating string for console commands. So I am +1 on this, but of course I cannot argue that it is very "uncomfortable" change in general. Mikhail From nbadger1 at gmail.com Thu Oct 27 22:37:14 2016 From: nbadger1 at gmail.com (Nick Badger) Date: Thu, 27 Oct 2016 19:37:14 -0700 Subject: [Python-ideas] Null coalescing operator In-Reply-To: <20161027182829.03d3d782@anarchist> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> <20161027182829.03d3d782@anarchist> Message-ID: The problem with doing that is that it's ambiguous. There's no way of telling which attribute is allowed to coalesce. I think one of the best arguments for a coalescing operator in Python is that it allows you to be more explicit, without the hassle of nested try: except AttributeError blocks. You lose that with something like attrgetter('b.x.z', coalesce=True) -- it would behave identically, regardless of whether b, x, or z were missing, which is (oftentimes) not what you want. Nick Badger https://www.muterra.io https://www.nickbadger.com 2016-10-27 15:28 GMT-07:00 Barry Warsaw : > On Oct 27, 2016, at 06:27 PM, Joonas Liik wrote: > > >perhaps just having a utility function can get us some of the way there.. > > > >#may error > >r = a.b.x.z > > > ># will default to None > >r = a?.b?.x?.z > >r = get_null_aware(a, "b.x.z") # long but no new syntax, can be > >implemented today. > > You could probably do this by extending operator.attrgetter() to take an > optional 'coalesce' keyword. It wouldn't be super pretty, but it has the > advantage of no magical new syntax. E.g. your example would be: > > from operator import attrgetter > r = attrgetter('b.x.z', coalesce=True) > > That might be good enough for honestly how rare I think this use case is. > (Similarly with itemgetter().) > > Cheers, > -Barry > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Oct 28 03:26:06 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 28 Oct 2016 17:26:06 +1000 Subject: [Python-ideas] Distribution agnostic Python project packaging In-Reply-To: References: Message-ID: On 27 October 2016 at 22:50, James Pic wrote: > Hi all ! 
> > Ive heard some people saying it was rude to post on a mailing list without > introducing yourself so here goes something: my name is James Pic and I've > been developing and deploying a wide variety fof Python projects Python for > the last 8 years, I love to learn and share and writing documentation > amongst other things such as selling liquor. > > The way I've been deploying Python projects so far is probably similar to > what a lot of people do and it almost always includes building runtime > dependencies on the production server. So, nobody is going to congratulate > me for that for sure but I think a lot of us have been doing so. You're right that this is a common problem, but it also isn't a language level problem - it's a software publication and distribution one, and for the Python community, the folks most actively involved in driving and/or popularising improvements in that space are those running packaging.python.org. While there's a fair bit of overlap between the two lists, the main home for those discussions is over on distutils-sig: https://mail.python.org/mailman/listinfo/distutils-sig (so called due to the standard library module that provided Python's original project-agnostic interface for building extension modules) > Now I'm fully aware of distribution specific packaging solutions like > dh-virtualenv shared by Spotify but here's my mental problem: I love to > learn and to hack. I'm always trying now distributions and I rarely run the > one that's in production in my company and when I'm deploying personal > projects I like funny distributions like arch, Alpine Linux, or interesting > paas solutions such as cloudfoundry, openshift, rancher and many others. > > And I'm always facing the same problem: I have to either build runtime > dependencies on the server, either package my thing in the platform specific > way. I feel like I've spent a really huge amount of time doing this king of > thing. But the java people, they have jars, and they have smooth deployments > no matter where they deploy it. If you're not using C extensions (the closest Python equivalent to the typical jar use case), then ``zipapp`` should have you covered: https://docs.python.org/3/library/zipapp.html While the zipapp module itself is relatively new, the underlying interpreter and import system capabilities that it relies on have been around since Python 2.6. > So that's the idea I'm trying to share: I'd like to b able to build a file > with my dependencies and my project in it. I'm not sure packaging only > Python bytecode would work here because of c modules. For extension modules, you're facing a much harder problem than doing the same for pure Python code (where you can just use zipapp). However, engineering folks at Twitter put together pex, which gives you a full virtual environment to play with on the target system, and hence can handle extension modules as well: https://pex.readthedocs.io/en/stable/ The only runtime dependency pex places on the target system is having a Python runtime available. More generally, one of the major problems we have in this area at the moment is that a lot of the relevant information is just plain hard for people to find, so if this is an area you're interested in, then https://github.com/pypa/python-packaging-user-guide/issues/267 is aiming to pull together some of the currently available information into a more readily consumable form and is mainly waiting on a draft PR that attempts to make the existing content more discoverable. Regards, Nick. 
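To make the ``zipapp`` option above concrete, here is a minimal sketch (the ``myapp`` package name and the ``myapp.cli:main`` entry point are invented purely for illustration):

    import zipapp

    # Bundle the pure-Python project in ./myapp into a single runnable archive;
    # the target system then only needs a suitable Python interpreter.
    zipapp.create_archive(
        source="myapp",                      # directory containing the code
        target="myapp.pyz",                  # single-file artifact to ship
        interpreter="/usr/bin/env python3",  # shebang written into the archive
        main="myapp.cli:main",               # hypothetical entry point callable
    )

The resulting ``myapp.pyz`` can then be run with ``python3 myapp.pyz``, which is about as close as pure Python code currently gets to the jar experience.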
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Oct 28 04:30:05 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 28 Oct 2016 18:30:05 +1000 Subject: [Python-ideas] PEP 531: Existence checking operators Message-ID: Hi folks, After the recent discussions of PEP 505's null-coalescing operator (and the significant confusion around why anyone would ever want a feature like that), I was inspired to put together a competing proposal that focuses more on defining a new "existence checking" protocol that generalises the current practicises of: * obj is not None (many different use cases) * obj is not Ellipsis (in multi-dimensional slicing) * obj is not NotImplemented (in operand coercion) * math.isnan(value) * cmath.isnan(value) * decimal.getcontext().is_nan(value) Given that protocol as a basis, it then proceeds to define "?then" and "?else" as existence checking counterparts to the truth-checking "and" and "or", as well as "?.", "?[]" and "?=" as abbreviations for particular idiomatic uses of "?then" and "?else". I think this approach frames the discussion in a more productive way, as it gives us a series of questions to consider in order where a collective answer of "No" at any point would be enough to kill this particular proposal (or parts of it), but precisely *where* we say "No" will determine which future alternatives might be worth considering: 1. Do we collectively agree that "existence checking" is a useful general concept that exists in software development and is distinct from the concept of "truth checking"? 2. Do we collectively agree that the Python ecosystem would benefit from an existence checking protocol that permits generalisation of algorithms (especially short circuiting ones) across different "data missing" indicators, including those defined in the language definition, the standard library, and custom user code? 3. Do we collectively agree that it would be easier to use such a protocol effectively if existence-checking equivalents to the truth-checking "and" and "or" control flow operators were available? Only if we have at least some level of consensus on the above questions regarding whether or not this is a conceptual modeling problem worth addressing at the language level does it then make sense to move on to the more detailed questions regarding the specific proposed *solution* to the problem in the PEP: 4. Do we collectively agree that "?then" and "?else" would be reasonable spellings for such operators? 5a. Do we collectively agree that "access this attribute only if the object exists" would be a particularly common use case for such operators? 5b. Do we collectively agree that "access this subscript only if the object exists" would be a particularly common use case for such operators? 5c. Do we collectively agree that "bind this value to this target only if the value currently bound to the target nominally doesn't exist" would be a particularly common use case for such operators? 6a. Do we collectively agree that 'obj?.attr' would be a reasonable spelling for "access this attribute only if the object exists"? 6b. Do we collectively agree that 'obj?[expr]' would be a reasonable spelling for "access this subscript only if the object exists"? 6c. Do we collectively agree that 'target ?= expr' would be a reasonable spelling for "bind this value to this target only if the value currently bound to the target nominally doesn't exist"? 
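To make questions 5 and 6 concrete, here's a rough sketch of what the proposed spellings would mean for the simplest case, where "doesn't exist" is just "is None" (all names below are invented for illustration; the full proposal generalises the check via the existence protocol):

    obj = None        # stand-in for a value that may be missing
    target = None     # stand-in for a setting that hasn't been bound yet
    key = "field"

    # obj?.attr -> attribute access only if obj "exists"
    attr_value = obj.attr if obj is not None else None

    # obj?[key] -> subscripting only if obj "exists"
    item_value = obj[key] if obj is not None else None

    # target ?= expensive_default() -> bind only if target doesn't already "exist"
    if target is None:
        target = "expensive default"

    print(attr_value, item_value, target)   # None None expensive default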
To be clear, this would be a *really* big addition to the language that would have significant long term ramifications for how the language gets taught to new developers. At the same time, asking whether or not an object represents an absence of data rather than the truth of a proposition seems to me like a sufficiently common problem in a wide enough variety of domains that it may be worth elevating to the level of giving it dedicated syntactic support. Regards, Nick. Rendered HTML version: https://www.python.org/dev/peps/pep-0531/ =============================== PEP: 531 Title: Existence checking operators Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 25-Oct-2016 Python-Version: 3.7 Post-History: 28-Oct-2016 Abstract ======== Inspired by PEP 505 and the related discussions, this PEP proposes the addition of two new control flow operators to Python: * Existence-checking precondition ("exists-then"): ``expr1 ?then expr2`` * Existence-checking fallback ("exists-else"): ``expr1 ?else expr2`` as well as the following abbreviations for common existence checking expressions and statements: * Existence-checking attribute access: ``obj?.attr`` (for ``obj ?then obj.attr``) * Existence-checking subscripting: ``obj?[expr]`` (for ``obj ?then obj[expr]``) * Existence-checking assignment: ``value ?= expr`` (for ``value = value ?else expr``) The common ``?`` symbol in these new operator definitions indicates that they use a new "existence checking" protocol rather than the established truth-checking protocol used by if statements, while loops, comprehensions, generator expressions, conditional expressions, logical conjunction, and logical disjunction. This new protocol would be made available as ``operator.exists``, with the following characteristics: * types can define a new ``__exists__`` magic method (Python) or ``tp_exists`` slot (C) to override the default behaviour. This optional method has the same signature and possible return values as ``__bool__``. * ``operator.exists(None)`` returns ``False`` * ``operator.exists(NotImplemented)`` returns ``False`` * ``operator.exists(Ellipsis)`` returns ``False`` * ``float``, ``complex`` and ``decimal.Decimal`` will override the existence check such that ``NaN`` values return ``False`` and other values (including zero values) return ``True`` * for any other type, ``operator.exists(obj)`` returns True by default. Most importantly, values that evaluate to False in a truth checking context (zeroes, empty containers) will still evaluate to True in an existence checking context Relationship with other PEPs ============================ While this PEP was inspired by and builds on Mark Haase's excellent work in putting together PEP 505, it ultimately competes with that PEP due to significant differences in the specifics of the proposed syntax and semantics for the feature. It also presents a different perspective on the rationale for the change by focusing on the benefits to existing Python users as the typical demands of application and service development activities are genuinely changing. It isn't an accident that similar features are now appearing in multiple programming languages, and while it's a good idea for us to learn from how other language designers are handling the problem, precedents being set elsewhere are more relevant to *how* we would go about tackling this problem than they are to whether or not we think it's a problem we should address in the first place. 
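To help make the proposed protocol concrete before moving on to the rationale, a rough pure-Python approximation of the ``operator.exists`` behaviour described in the Abstract might look like the following. This is purely illustrative: the real implementation would live in the ``operator`` module, be backed by a ``tp_exists`` slot, and also cover ``complex`` and ``decimal.Decimal`` NaN values::

    import math

    _NON_EXISTENT_SINGLETONS = (None, NotImplemented, Ellipsis)

    def exists(obj):
        """Illustrative approximation of the proposed operator.exists()."""
        for singleton in _NON_EXISTENT_SINGLETONS:
            if obj is singleton:
                return False
        # Types would be able to opt in via the proposed __exists__ hook
        hook = getattr(type(obj), "__exists__", None)
        if hook is not None:
            return bool(hook(obj))
        # Builtin floats would report NaN values as non-existent
        if isinstance(obj, float) and math.isnan(obj):
            return False
        return True

    # Unlike bool(), zero values and empty containers still "exist"
    assert exists(0) and exists("") and exists([])
    assert not exists(None) and not exists(float("nan"))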
Rationale ========= Existence checking expressions ------------------------------ An increasingly common requirement in modern software development is the need to work with "semi-structured data": data where the structure of the data is known in advance, but pieces of it may be missing at runtime, and the software manipulating that data is expected to degrade gracefully (e.g. by omitting results that depend on the missing data) rather than failing outright. Some particularly common cases where this issue arises are: * handling optional application configuration settings and function parameters * handling external service failures in distributed systems * handling data sets that include some partial records It is the latter two cases that are the primary motivation for this PEP - while needing to deal with optional configuration settings and parameters is a design requirement at least as old as Python itself, the rise of public cloud infrastructure, the development of software systems as collaborative networks of distributed services, and the availability of large public and private data sets for analysis means that the ability to degrade operations gracefully in the face of partial service failures or partial data availability is becoming an essential feature of modern programming environments. At the moment, writing such software in Python can be genuinely awkward, as your code ends up littered with expressions like: * ``value1 = expr1.field.of.interest if expr1 is not None else None`` * ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None`` * ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5`` If these are only occasional, then expanding out to full statement forms may help improve readability, but if you have 4 or 5 of them in a row (which is a fairly common situation in data transformation pipelines), then replacing them with 16 or 20 lines of conditional logic really doesn't help matters. Expanding the three examples above that way hopefully helps illustrate that:: _expr1 = expr1 if _expr1 is not None: value1 = _expr1.field.of.interest else: value1 = None _expr2 = expr2 if _expr2 is not None: value2 = _expr2["field"]["of"]["interest"] else: value2 = None _expr3 = expr3 if _expr3 is not None: value3 = _expr3 else: _expr4 = expr4 if _expr4 is not None: value3 = _expr4 else: value3 = expr5 The combined impact of the proposals in this PEP is to allow the above sample expressions to instead be written as: * ``value1 = expr1?.field.of.interest`` * ``value2 = expr2?["field"]["of"]["interest"]`` * ``value3 = expr3 ?else expr4 ?else expr5`` In these forms, almost all of the information presented to the reader is immediately relevant to the question "What does this code do?", while the boilerplate code to handle missing data by passing it through to the output or falling back to an alternative input, has shrunk to two uses of the ``?`` symbol and two uses of the ``?else`` keyword. In the first two examples, the 31 character boilerplate clause `` if exprN is not None else None`` (minimally 27 characters for a single letter variable name) has been replaced by a single ``?`` character, substantially improving the signal-to-pattern-noise ratio of the lines (especially if it encourages the use of more meaningful variable and field names rather than making them shorter purely for the sake of expression brevity). 
In the last example, two instances of the 21 character boilerplate, `` if exprN is not None`` (minimally 17 characters) are replaced with single characters, again substantially improving the signal-to-pattern-noise ratio. Furthermore, each of our 5 "subexpressions of potential interest" is included exactly once, rather than 4 of them needing to be duplicated or pulled out to a named variable in order to first check if they exist. The existence checking precondition operator is mainly defined to provide a clear conceptual basis for the existence checking attribute access and subscripting operators: * ``obj?.attr`` is roughly equivalent to ``obj ?then obj.attr`` * ``obj?[expr]``is roughly equivalent to ``obj ?then obj[expr]`` The main semantic difference between the shorthand forms and their expanded equivalents is that the common subexpression to the left of the existence checking operator is evaluated only once in the shorthand form (similar to the benefit offered by augmented assignment statements). Existence checking assignment ----------------------------- Existence-checking assignment is proposed as a relatively straightforward expansion of the concepts in this PEP to also cover the common configuration handling idiom: * ``value = value if value is not None else expensive_default()`` by allowing that to instead be abbreviated as: * ``value ?= expensive_default()`` This is mainly beneficial when the target is a subscript operation or subattribute, as even without this specific change, the PEP would still permit this idiom to be updated to: * ``value = value ?else expensive_default()`` The main argument *against* adding this form is that it's arguably ambiguous and could mean either: * ``value = value ?else expensive_default()``; or * ``value = value ?then value.subfield.of.interest`` The second form isn't at all useful, but if this concern was deemed significant enough to address while still keeping the augmented assignment feature, the full keyword could be included in the syntax: * ``value ?else= expensive_default()`` Alternatively, augmented assignment could just be dropped from the current proposal entirely and potentially reconsidered at a later date. Existence checking protocol --------------------------- The existence checking protocol is including in this proposal primarily to allow for proxy objects (e.g. local representations of remote resources) and mock objects used in testing to correctly indicate non-existence of target resources, even though the proxy or mock object itself is not None. However, with that protocol defined, it then seems natural to expand it to provide a type independent way of checking for ``NaN`` values in numeric types - at the moment you need to be aware of the exact data type you're working with (e.g. builtin floats, builtin complex numbers, the decimal module) and use the appropriate operation (e.g. ``math.isnan``, ``cmath.isnan``, ``decimal.getcontext().is_nan()``, respectively) Similarly, it seems reasonable to declare that the other placeholder builtin singletons, ``Ellipsis`` and ``NotImplemented``, also qualify as objects that represent the absence of data moreso than they represent data. Proposed symbolic notation -------------------------- Python has historically only had one kind of implied boolean context: truth checking, which can be invoked directly via the ``bool()`` builtin. 
As this PEP proposes a new kind of control flow operation based on existence checking rather than truth checking, it is considered valuable to have a reminder directly in the code when existence checking is being used rather than truth checking. The mathematical symbol for existence assertions is U+2203 'THERE EXISTS': ``?`` Accordingly, one possible approach to the syntactic additions proposed in this PEP would be to use that already defined mathematical notation: * ``expr1 ?then expr2`` * ``expr1 ?else expr2`` * ``obj?.attr`` * ``obj?[expr]`` * ``target ?= expr`` However, there are two major problems with that approach, one practical, and one pedagogical. The practical problem is the usual one that most keyboards don't offer any easy way of entering mathematical symbols other than those used in basic arithmetic (even the symbols appearing in this PEP were ultimately copied & pasted from [3]_ rather than being entered directly). The pedagogical problem is that the symbols for existence assertions (``?``) and universal assertions (``?``) aren't going to be familiar to most people the way basic arithmetic operators are, so we wouldn't actually be making the proposed syntax easier to understand by adopting ``?``. By contrast, ``?`` is one of the few remaining unused ASCII punctuation characters in Python's syntax, making it available as a candidate syntactic marker for "this control flow operation is based on an existence check, not a truth check". Taking that path would also have the advantage of aligning Python's syntax with corresponding syntax in other languages that offer similar features. Drawing from the existing summary in PEP 505 and the Wikipedia articles on the "safe navigation operator [1]_ and the "null coalescing operator" [2]_, we see: * The ``?.`` existence checking attribute access syntax precisely aligns with: * the "safe navigation" attribute access operator in C# (``?.``) * the "optional chaining" operator in Swift (``?.``) * the "safe navigation" attribute access operator in Groovy (``?.``) * the "conditional member access" operator in Dart (``?.``) * The ``?[]`` existence checking attribute access syntax precisely aligns with: * the "safe navigation" subscript operator in C# (``?[]``) * the "optional subscript" operator in Swift (``?[].``) * The ``?else`` existence checking fallback syntax semantically aligns with: * the "null-coalescing" operator in C# (``??``) * the "null-coalescing" operator in PHP (``??``) * the "nil-coalescing" operator in Swift (``??``) To be clear, these aren't the only spelling of these operators used in other languages, but they're the most common ones, and the ``?`` symbol is the most common syntactic marker by far (presumably prompted by the use of ``?`` to introduce the "then" clause in C-style conditional expressions, which many of these languages also offer). Proposed keywords ----------------- Given the symbolic marker ``?``, it would be syntactically unambiguous to spell the existence checking precondition and fallback operations using the same keywords as their truth checking counterparts: * ``expr1 ?and expr2`` (instead of ``expr1 ?then expr2``) * ``expr1 ?or expr2`` (instead of ``expr1 ?else expr2``) However, while syntactically unambiguous when written, this approach makes the code incredibly hard to *pronounce* (What's the pronunciation of "?"?) 
and also hard to *describe* (given reused keywords, there's no obvious shorthand terms for "existence checking precondition (?and)" and "existence checking fallback (?or)" that would distinguish them from "logical conjunction (and)" and "logical disjunction (or)"). We could try to encourage folks to pronounce the ``?`` symbol as "exists", making the shorthand names the "exists-and expression" and the "exists-or expression", but there'd be no way of guessing those names purely from seeing them written in a piece of code. Instead, this PEP takes advantage of the proposed symbolic syntax to introduce a new keyword (``?then``) and borrow an existing one (``?else``) in a way that allows people to refer to "then expressions" and "else expressions" without ambiguity. These keywords also align well with the conditional expressions that are semantically equivalent to the proposed expressions. For ``?else`` expressions, ``expr1 ?else expr2`` is equivalent to:: _lhs_result = expr1 _lhs_result if operator.exists(_lhs_result) else expr2 Here the parallel is clear, since the ``else expr2`` appears at the end of both the abbreviated and expanded forms. For ``?then`` expressions, ``expr1 ?then expr2`` is equivalent to:: _lhs_result = expr1 expr2 if operator.exists(_lhs_result) else _lhs_result Here the parallel isn't as immediately obvious due to Python's traditionally anonymous "then" clauses (introduced by ``:`` in ``if`` statements and suffixed by ``if`` in conditional expressions), but it's still reasonably clear as long as you're already familiar with the "if-then-else" explanation of conditional control flow. Risks and concerns ================== Readability ----------- Learning to read and write the new syntax effectively mainly requires internalising two concepts: * expressions containing ``?`` include an existence check and may short circuit * if ``None`` or another "non-existent" value is an expected input, and the correct handling is to propagate that to the result, then the existence checking operators are likely what you want Currently, these concepts aren't explicitly represented at the language level, so it's a matter of learning to recognise and use the various idiomatic patterns based on conditional expressions and statements. Magic syntax ------------ There's nothing about ``?`` as a syntactic element that inherently suggests ``is not None`` or ``operator.exists``. The main current use of ``?`` as a symbol in Python code is as a trailing suffix in IPython environments to request help information for the result of the preceding expression. However, the notion of existence checking really does benefit from a pervasive visual marker that distinguishes it from truth checking, and that calls for a single-character symbolic syntax if we're going to do it at all. Conceptual complexity --------------------- This proposal takes the currently ad hoc and informal concept of "existence checking" and elevates it to the status of being a syntactic language feature with a clearly defined operator protocol. In many ways, this should actually *reduce* the overall conceptual complexity of the language, as many more expectations will map correctly between truth checking with ``bool(expr)`` and existence checking with ``operator.exists(expr)`` than currently map between truth checking and existence checking with ``expr is not None`` (or ``expr is not NotImplemented`` in the context of operand coercion, or the various NaN-checking operations in mathematical libraries). 
As a simple example of the new parallels introduced by this PEP, compare:: all_are_true = all(map(bool, iterable)) at_least_one_is_true = any(map(bool, iterable)) all_exist = all(map(operator.exists, iterable)) at_least_one_exists = any(map(operator.exists, iterable)) Design Discussion ================= Subtleties in chaining existence checking expressions ----------------------------------------------------- Similar subtleties arise in chaining existence checking expressions as already exist in chaining logical operators: the behaviour can be surprising if the right hand side of one of the expressions in the chain itself returns a value that doesn't exist. As a result, ``value = arg1 ?then f(arg1) ?else default()`` would be dubious for essentially the same reason that ``value = cond and expr1 or expr2`` is dubious: the former will evaluate ``default()`` if ``f(arg1)`` returns ``None``, just as the latter will evaluate ``expr2`` if ``expr1`` evaluates to ``False`` in a boolean context. Ambiguous interaction with conditional expressions -------------------------------------------------- In the proposal as currently written, the following is a syntax error: * ``value = f(arg) if arg ?else default`` While the following is a valid operation that checks a second condition if the first doesn't exist rather than merely being false: * ``value = expr1 if cond1 ?else cond2 else expr2`` The expression chaining problem described above means that the argument can be made that the first operation should instead be equivalent to: * ``value = f(arg) if operator.exists(arg) else default`` requiring the second to be written in the arguably clearer form: * ``value = expr1 if (cond1 ?else cond2) else expr2`` Alternatively, the first form could remain a syntax error, and the existence checking symbol could instead be attached to the ``if`` keyword: * ``value = expr1 if? cond else expr2`` Existence checking in other truth-checking contexts --------------------------------------------------- The truth-checking protocol is currently used in the following syntactic constructs: * logical conjunction (and-expressions) * logical disjunction (or-expressions) * conditional expressions (if-else expressions) * if statements * while loops * filter clauses in comprehensions and generator expressions In the current PEP, switching from truth-checking with ``and`` and ``or`` to existence-checking is a matter of substituting in the new keywords, ``?then`` and ``?else`` in the appropriate places. For other truth-checking contexts, it proposes either importing and using the ``operator.exists`` API, or else continuing with the current idiom of checking specifically for ``expr is not None`` (or the context appropriate equivalent). The simplest possible enhancement in that regard would be to elevate the proposed ``exists()`` API from an operator module function to a new builtin function. Alternatively, the ``?`` existence checking symbol could be supported as a modifier on the ``if`` and ``while`` keywords to indicate the use of an existence check rather than a truth check. However, it isn't at all clear that the potential consistency benefits gained for either suggestion would justify the additional disruption, so they've currently been omitted from the proposal. 
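As an illustration of what the "import and use the ``operator.exists`` API" option would look like in one of those truth-checking contexts, a filter clause written against a cut-down version of the ``exists()`` sketch given earlier might read as follows (purely illustrative)::

    import math

    def exists(obj):
        # Cut-down stand-in for the illustrative sketch above
        # (NotImplemented/Ellipsis and __exists__ handling omitted for brevity)
        return obj is not None and not (isinstance(obj, float) and math.isnan(obj))

    records = [0, None, float("nan"), "", 42]
    present = [r for r in records if exists(r)]   # filter clause usage
    assert present == [0, "", 42]                 # falsy values are kept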
Defining expected invariant relations between ``__bool__`` and ``__exists__``
-----------------------------------------------------------------------------

The PEP currently leaves the definition of ``__bool__`` on all existing types unmodified, which ensures the entire proposal remains backwards compatible, but results in the following cases where ``bool(obj)`` returns ``True``, but the proposed ``operator.exists(obj)`` would return ``False``:

* ``NaN`` values for ``float``, ``complex``, and ``decimal.Decimal``
* ``Ellipsis``
* ``NotImplemented``

The main argument for potentially changing these is that it becomes easier to reason about potential code behaviour if we have a recommended invariant in place saying that values which indicate they don't exist in an existence checking context should also report themselves as being ``False`` in a truth checking context.

Failing to define such an invariant would lead to arguably odd outcomes like ``float("NaN") ?else 0.0`` returning ``0.0`` while ``float("NaN") or 0.0`` returns ``NaN``.

Limitations
===========

Arbitrary sentinel objects
--------------------------

This proposal doesn't attempt to provide syntactic support for the "sentinel object" idiom, where ``None`` is a permitted explicit value, so a separate sentinel object is defined to indicate missing values::

    _SENTINEL = object()
    def f(obj=_SENTINEL):
        return obj if obj is not _SENTINEL else default_value()

This could potentially be supported at the expense of making the existence protocol definition significantly more complex, both to define and to use:

* at the Python layer, ``operator.exists`` and ``__exists__`` implementations would return the empty tuple to indicate non-existence, and otherwise return a singleton tuple containing a reference to the object to be used as the result of the existence check
* at the C layer, ``tp_exists`` implementations would return NULL to indicate non-existence, and otherwise return a `PyObject *` pointer as the result of the existence check

Given that change, the sentinel object idiom could be rewritten as::

    class Maybe:
        SENTINEL = object()
        def __init__(self, value):
            self._result = (value,) if value is not self.SENTINEL else ()
        def __exists__(self):
            return self._result

    def f(obj=Maybe.SENTINEL):
        return Maybe(obj) ?else default_value()

However, I don't think cases where the 3 proposed standard sentinel values (i.e. ``None``, ``Ellipsis`` and ``NotImplemented``) can't be used are going to be anywhere near common enough for the additional protocol complexity and the loss of symmetry between ``__bool__`` and ``__exists__`` to be worth it.

Specification
=============

The Abstract already gives the gist of the proposal and the Rationale gives some specific examples. If there's enough interest in the basic idea, then a full specification will need to provide a precise correspondence between the proposed syntactic sugar and the underlying conditional expressions that is sufficient to guide the creation of a reference implementation.

...TBD...
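As a starting point for that correspondence, the shorthand forms can be characterised in terms of an ``exists()`` helper along the lines sketched earlier, with lazy evaluation of the right hand operand modelled here by a callable (all helper names are invented for illustration and are not part of the proposal)::

    def exists(obj):   # minimal stand-in for the proposed operator.exists
        return obj is not None and obj is not NotImplemented and obj is not Ellipsis

    def maybe_attr(obj, name):
        # obj?.name  ->  obj.name if obj exists, otherwise obj itself
        return getattr(obj, name) if exists(obj) else obj

    def maybe_item(obj, key):
        # obj?[key]  ->  obj[key] if obj exists, otherwise obj itself
        return obj[key] if exists(obj) else obj

    def coalesce(lhs, make_rhs):
        # lhs ?else rhs  ->  lhs if lhs exists, otherwise rhs
        return lhs if exists(lhs) else make_rhs()

    assert maybe_attr(None, "field") is None
    assert coalesce(0, lambda: 42) == 0   # 0 "exists", unlike in a truth check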
Implementation ============== As with PEP 505, actual implementation has been deferred pending in-principle interest in the idea of adding these operators - the implementation isn't the hard part of these proposals, the hard part is deciding whether or not this is a change where the long term benefits for new and existing Python users outweigh the short term costs involved in the wider ecosystem (including developers of other implementations, language curriculum developers, and authors of other Python related educational material) adjusting to the change. ...TBD... References ========== .. [1] Wikipedia: Safe navigation operator (https://en.wikipedia.org/wiki/Safe_navigation_operator) .. [2] Wikipedia: Null coalescing operator (https://en.wikipedia.org/wiki/Null_coalescing_operator) .. [3] FileFormat.info: Unicode Character 'THERE EXISTS' (U+2203) (http://www.fileformat.info/info/unicode/char/2203/index.htm) Copyright ========= This document has been placed in the public domain under the terms of the CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/ .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Fri Oct 28 09:13:19 2016 From: barry at python.org (Barry Warsaw) Date: Fri, 28 Oct 2016 09:13:19 -0400 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> <20161027182829.03d3d782@anarchist> Message-ID: <20161028091319.44029ad3@subdivisions.wooz.org> On Oct 27, 2016, at 07:37 PM, Nick Badger wrote: >The problem with doing that is that it's ambiguous. There's no way of >telling which attribute is allowed to coalesce. You could of course support exactly the same syntax being proposed as a language change, e.g. from operator import attrgetter r = attrgetter('b?.x?.z') and then you wouldn't even need the `coalesce` argument. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From tjreedy at udel.edu Fri Oct 28 09:59:10 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 28 Oct 2016 09:59:10 -0400 Subject: [Python-ideas] A better interactive prompt In-Reply-To: References: Message-ID: On 10/25/2016 5:13 PM, Paul Moore wrote: > The > natural unit of interaction at the command line is the single line. To > the extent that (for example) fixing a mistake in a multi-line > construct at the command line is a real pain. Try IDLE. The unit of interaction is the statement. One writes, edits, and submits entire statements. PREV recalls the previous statement as a whole and one edits the entire statement before resubmitting. For me, for Python, statements are the natural unit of interaction. 
-- Terry Jan Reedy From ericsnowcurrently at gmail.com Fri Oct 28 10:13:28 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 28 Oct 2016 08:13:28 -0600 Subject: [Python-ideas] Null coalescing operator In-Reply-To: <20161028091319.44029ad3@subdivisions.wooz.org> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> <20161027182829.03d3d782@anarchist> <20161028091319.44029ad3@subdivisions.wooz.org> Message-ID: On Fri, Oct 28, 2016 at 7:13 AM, Barry Warsaw wrote: > You could of course support exactly the same syntax being proposed as a > language change, e.g. > > from operator import attrgetter > r = attrgetter('b?.x?.z') > > and then you wouldn't even need the `coalesce` argument. +1 -eric From gjcarneiro at gmail.com Fri Oct 28 10:24:06 2016 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 28 Oct 2016 15:24:06 +0100 Subject: [Python-ideas] Null coalescing operator In-Reply-To: <20161028091319.44029ad3@subdivisions.wooz.org> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> <20161027182829.03d3d782@anarchist> <20161028091319.44029ad3@subdivisions.wooz.org> Message-ID: On 28 October 2016 at 14:13, Barry Warsaw wrote: > On Oct 27, 2016, at 07:37 PM, Nick Badger wrote: > > >The problem with doing that is that it's ambiguous. There's no way of > >telling which attribute is allowed to coalesce. > > You could of course support exactly the same syntax being proposed as a > language change, e.g. > > from operator import attrgetter > r = attrgetter('b?.x?.z') > > and then you wouldn't even need the `coalesce` argument. > The main drawback of this type of approach is that code checking tools will hardly ever support checking expressions inside the string like that. Also, you don't get proper syntax highlighting, code completion, etc. You can do anything you want by writing another programming language that is passed as string to a function, but that is not the same thing as having a proper syntax, is it? Just like type annotations with mypy: sure, you can add type annotations in comments, but it's not the same... -- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Oct 28 10:28:49 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 28 Oct 2016 10:28:49 -0400 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: On 10/26/2016 6:17 PM, Chris Barker wrote: > I"ve lost track of what (If anything) is actually being proposed here... > so I"m going to try a quick summary: > > > 1) an easy way to spell "remove all the characters other than these" In other words, 'only keep these'. We already have easy ways to create filtered strings. 
>>> s = 'kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf' >>> s2 = ''.join(c for c in s if c in set('abc')) >>> s2 'caaccca' >>> s3 = ''.join(filter(lambda c: c in set('abc'), s)) >>> s3 'caaccca' I expect the first to be a bit faster. Either can be wrapped in a keep() function. If one has a translation dictionary d, use that in twice in the genexp. >>> d = {'a': '1', 'b': '3x', 'c': 'fum'} >>> ''.join(d[c] for c in s if c in d.keys()) 'fum11fumfumfum1' -- Terry Jan Reedy From rosuav at gmail.com Fri Oct 28 10:35:07 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 29 Oct 2016 01:35:07 +1100 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sat, Oct 29, 2016 at 1:28 AM, Terry Reedy wrote: > If one has a translation dictionary d, use that in twice in the genexp. > >>>> d = {'a': '1', 'b': '3x', 'c': 'fum'} >>>> ''.join(d[c] for c in s if c in d.keys()) > 'fum11fumfumfum1' Trivial change: >>> ''.join(d[c] for c in s if c in d) 'fum11fumfumfum1' ChrisA From ncoghlan at gmail.com Fri Oct 28 10:40:11 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Oct 2016 00:40:11 +1000 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On 28 October 2016 at 23:35, Ryan Birmingham wrote: > I certainly like the concept, but I worry that use of __exists__() could > generalize it a bit beyond what you're intending in practice. It seems like > this should only check if an object exists, and that adding the magic method > would only lead to confusion. The same can be said of using __bool__, __nonzero__ and __len__ to influence normal condition checks, and folks have proven to be pretty responsible using those in practice (or, more accurately, when they're used in problematic ways, users object, and they either eventually get fixed, or folks move on to using other APIs that they consider better behaved). I also don't think the idea is sufficiently general to be worthy of dedicated syntax if it's limited specifically to "is not None" checks - None's definitely special, but it's not *that* special. Unifying None, NaN, NotImplemented and Ellipsis into a meta-category of objects that indicate the absence of information rather than a specific value, though? And also allowing developers to emulate the protocol for testing purposes? That's enough to pique my interest. That's why these are my first two questions on the list - if we don't agree on the core premise that there's a general concept here worth modeling as an abstract protocol, I'm -1 on the whole idea. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Fri Oct 28 11:17:59 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 28 Oct 2016 16:17:59 +0100 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On 28 October 2016 at 15:40, Nick Coghlan wrote: > I also don't think the idea is sufficiently general to be worthy of > dedicated syntax if it's limited specifically to "is not None" checks > - None's definitely special, but it's not *that* special. Unifying > None, NaN, NotImplemented and Ellipsis into a meta-category of objects > that indicate the absence of information rather than a specific value, > though? 
And also allowing developers to emulate the protocol for > testing purposes? That's enough to pique my interest. I think that's the key for me - new syntax for "is not None" types of test seems of limited value (sure, other languages have such things, but that's not compelling - the contexts are different). However, I'm not convinced by your proposal that we can unify None, NaN, NotImplemented and Ellipsis in the way you suggest. I wouldn't expect a[1, None, 2] to mean the same as a[1, ..., 2], so why should an operator that tested for "Ellipsis or None" be useful? Same for NotImplemented - we're not proposing to allow rich comparison operators to return None rather than NotImplemented. The nearest to plausible is NaN vs None - but even there I don't see the two as the same. So, to answer your initial questions, in my opinion: 1. The concept of "checking for existence" is valid. 2. I don't see merging domain-specific values under a common "does not exist" banner as useful. Specifically, because I wouldn't want the "does not exist" values to become interchangeable. 3. I don't think there's much value in specific existence-checking syntax, precisely because I don't view it as a good thing to merge multiple domain-specific "does not exist", and therefore the benefit is limited to a shorter way of saying "is not None". As you noted, given my answers to 1-3, there's not much point in considering the remaining questions. However, I do think that there's another concept tied up in the proposals here, that of "short circuiting attribute access / indexing". The call was for something that said roughly "a.b if a is not None, otherwise None". But this is only one form of this pattern - there's a similar pattern, "a.b if a has an attribute b, otherwise None". And that's been spelled "getattr(a, 'b', None)" for a long time now. The existence of getattr, and the fact that no-one is crying out for it to be replaced with syntax, implies to me that before leaping to a syntax solution we should be looking at a normal function (possibly a builtin, but maybe even just a helper). I'd like to see a justification for why "a.b if a is not None, else None" deserves syntax when "a.b if a has attribute b, else None" doesn't. IMO, there's no need for syntax here. There *might* be some benefit in some helper functions, though. The cynic in me wonders how much of this debate is rooted in the fact that it's simply more fun to propose new syntax, than to just write a quick helper and get on with coding your application... Paul From toddrjen at gmail.com Fri Oct 28 11:28:16 2016 From: toddrjen at gmail.com (Todd) Date: Fri, 28 Oct 2016 11:28:16 -0400 Subject: [Python-ideas] Leave off "else" in ternary expression Message-ID: The null-coalescing discussion made me think about the current ternary "x = a if b else c" expression. In normal "if / else" clauses, the "else" is optional. I propose doing the same thing with ternary expressions (although I don't know what the result would be called, a "binary expression"?) The idea would be to allow this syntax: x = a if b Which would be equivalent to: x = a if b else x I think this would be useful syntax. In particular, I see it being useful for default value checking, but can also be used to override the result of particular corner cases from functions or methods.. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mafagafogigante at gmail.com Fri Oct 28 11:36:12 2016 From: mafagafogigante at gmail.com (Bernardo Sulzbach) Date: Fri, 28 Oct 2016 13:36:12 -0200 Subject: [Python-ideas] Leave off "else" in ternary expression In-Reply-To: References: Message-ID: On 10/28/2016 01:28 PM, Todd wrote: > > The idea would be to allow this syntax: > > x = a if b > > Which would be equivalent to: > > x = a if b else x > What if x has not been defined yet? -- Bernardo Sulzbach http://www.mafagafogigante.org/ mafagafogigante at gmail.com From mertz at gnosis.cx Fri Oct 28 11:56:03 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 28 Oct 2016 11:56:03 -0400 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On Fri, Oct 28, 2016 at 11:17 AM, Paul Moore wrote: > On 28 October 2016 at 15:40, Nick Coghlan wrote: > > I also don't think the idea is sufficiently general to be worthy of > > dedicated syntax if it's limited specifically to "is not None" checks > > - None's definitely special, but it's not *that* special. Unifying > > None, NaN, NotImplemented and Ellipsis into a meta-category of objects > > that indicate the absence of information rather than a specific value, > > though? > First thing is that I definitely DO NOT want new SYNTAX to do this. I wouldn't mind having a new built-in function for this purpose if we could get the call signature right. Maybe it would be called 'exists()', but actually something like 'valued()' feels like a better fit. For the unusual case where the "null-coalescing" operation is what I'd want, I definitely wouldn't mind something like Barry's proposal of processing a string version of the expression. Sure, *some* code editors might not highlight it as much, but it's a corner case at most, to my mind. For that, I can type 'valued("x.y.z[w]", coalesce=ALL)' or whatever. > However, I'm not convinced by your proposal that we can unify None, NaN, > NotImplemented and Ellipsis in the way you suggest. I wouldn't expect > a[1, None, 2] to mean the same as a[1, ..., 2], so why should an > operator that tested for "Ellipsis or None" be useful? I *especially* think None and nan have very different meanings. A list of [1.1, nan, 3.3] means that I have several floating point numbers, but one came from a calculation that escaped the real domain. A list with [1.1, None, 3.3] means that I have already calculated three values, but am marking the fact I need later to perform a calculation to figure out the middle one. These are both valid and important use cases, but they are completely different from each other. Yours, David... -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Fri Oct 28 12:02:04 2016 From: toddrjen at gmail.com (Todd) Date: Fri, 28 Oct 2016 12:02:04 -0400 Subject: [Python-ideas] Leave off "else" in ternary expression In-Reply-To: References: Message-ID: On Fri, Oct 28, 2016 at 11:36 AM, Bernardo Sulzbach < mafagafogigante at gmail.com> wrote: > On 10/28/2016 01:28 PM, Todd wrote: > >> >> The idea would be to allow this syntax: >> >> x = a if b >> >> Which would be equivalent to: >> >> x = a if b else x >> >> > What if x has not been defined yet? 
> > Same as "x = a if b else x", it would raise a NameError. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Oct 28 12:20:58 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 28 Oct 2016 12:20:58 -0400 Subject: [Python-ideas] Leave off "else" in ternary expression In-Reply-To: References: Message-ID: This seems pretty nonsensical to me. Ternaries are not only used in simple assignments. E.g. 'myfunc(a, b if pred else c, d)' is common and obvious. 'myfunc(a, b if pred, d)' is strange with no obvious semantics. On Oct 28, 2016 11:29 AM, "Todd" wrote: > The null-coalescing discussion made me think about the current ternary "x > = a if b else c" expression. In normal "if / else" clauses, the "else" is > optional. I propose doing the same thing with ternary expressions > (although I don't know what the result would be called, a "binary > expression"?) > > The idea would be to allow this syntax: > > x = a if b > > Which would be equivalent to: > > x = a if b else x > > I think this would be useful syntax. In particular, I see it being useful > for default value checking, but can also be used to override the result of > particular corner cases from functions or methods.. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Fri Oct 28 12:35:17 2016 From: toddrjen at gmail.com (Todd) Date: Fri, 28 Oct 2016 12:35:17 -0400 Subject: [Python-ideas] Leave off "else" in ternary expression In-Reply-To: References: Message-ID: That is a good point. Nevermind then. On Fri, Oct 28, 2016 at 12:20 PM, David Mertz wrote: > This seems pretty nonsensical to me. Ternaries are not only used in simple > assignments. > > E.g. 'myfunc(a, b if pred else c, d)' is common and obvious. > > 'myfunc(a, b if pred, d)' is strange with no obvious semantics. > > On Oct 28, 2016 11:29 AM, "Todd" wrote: > >> The null-coalescing discussion made me think about the current ternary "x >> = a if b else c" expression. In normal "if / else" clauses, the "else" is >> optional. I propose doing the same thing with ternary expressions >> (although I don't know what the result would be called, a "binary >> expression"?) >> >> The idea would be to allow this syntax: >> >> x = a if b >> >> Which would be equivalent to: >> >> x = a if b else x >> >> I think this would be useful syntax. In particular, I see it being >> useful for default value checking, but can also be used to override the >> result of particular corner cases from functions or methods.. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dickinsm at gmail.com Fri Oct 28 14:08:31 2016 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 28 Oct 2016 19:08:31 +0100 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On Fri, Oct 28, 2016 at 9:30 AM, Nick Coghlan wrote: > [...] the current practicises of: > > * obj is not None (many different use cases) > * obj is not Ellipsis (in multi-dimensional slicing) Can you elaborate on this one? 
I don't think I've ever seen an `is not Ellipsis` check in real code. -- Mark From barry at python.org Fri Oct 28 14:19:30 2016 From: barry at python.org (Barry Warsaw) Date: Fri, 28 Oct 2016 14:19:30 -0400 Subject: [Python-ideas] Null coalescing operator References: <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <34570a16-5cea-4ba9-b870-faaebdbe8d1f@googlegroups.com> <20161027182829.03d3d782@anarchist> <20161028091319.44029ad3@subdivisions.wooz.org> Message-ID: <20161028141930.4d44533d@anarchist> On Oct 28, 2016, at 03:24 PM, Gustavo Carneiro wrote: >The main drawback of this type of approach is that code checking tools will >hardly ever support checking expressions inside the string like that. >Also, you don't get proper syntax highlighting, code completion, etc. > >You can do anything you want by writing another programming language that >is passed as string to a function, but that is not the same thing as having >a proper syntax, is it? Just like type annotations with mypy: sure, you >can add type annotations in comments, but it's not the same... The bar for adding new language syntax is, and must be, high. Every new bit of syntax has a cost, so it has to be worth it. Guido deemed type annotations to be worth it and he may do the same for null coalescing operators. I don't personally think the need is so great or the use cases so common to incur that cost, but I'm just one opinion. The advantage of lower-cost approaches such as adopting the syntax in attrgetter() is that you piggyback on an existing API. Then you can use that as an experiment to see whether you really do solve enough problems in Python for a syntax change to be worth it. It's a lot like the ability to create properties and such before the syntactic sugar of decorators was added. I think that feature's pre-syntax popular and utility proved that the cost of adding syntax was worth it. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From sjoerdjob at sjoerdjob.com Fri Oct 28 15:41:09 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Fri, 28 Oct 2016 21:41:09 +0200 Subject: [Python-ideas] Leave off "else" in ternary expression In-Reply-To: References: Message-ID: <20161028194109.GM13170@sjoerdjob.com> On Fri, Oct 28, 2016 at 11:28:16AM -0400, Todd wrote: > The null-coalescing discussion made me think about the current ternary "x = > a if b else c" expression. In normal "if / else" clauses, the "else" is > optional. I propose doing the same thing with ternary expressions > (although I don't know what the result would be called, a "binary > expression"?) > > The idea would be to allow this syntax: > > x = a if b > > Which would be equivalent to: > > x = a if b else x > > I think this would be useful syntax. In particular, I see it being useful > for default value checking, but can also be used to override the result of > particular corner cases from functions or methods.. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ To me, it is completely un-intuitive that it would work like that. 
It seems to parse as (x = a) if b instead as x = (a if b) That would make an assignment part of an expression, which seems very un-Pythonic. We also do not have if (x = a): pass When I first read your proposal, I assumed it would mean "use `None` as default `else` expression". Upon reading it, I am quite certain that the semantics you propose are not going to make it into Python. (But then again, I'm not the BDFL). From mehaase at gmail.com Fri Oct 28 16:01:34 2016 From: mehaase at gmail.com (Mark E. Haase) Date: Fri, 28 Oct 2016 16:01:34 -0400 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: On Fri, Oct 14, 2016 at 11:36 PM, Guido van Rossum wrote: > I'm not usually swayed by surveys -- Python is not a democracy. Maybe > a bunch of longer examples would help all of us see the advantages of > the proposals. I understand. You said the next phase should be to pick the best operator for each sub-proposal but I'm not sure how I can help with that. If there's something I can do, let me know and I'm happy to try to do it. In terms of "bunch of longer examples", what did you have in mind? I could take some popular library and rewrite a section of it with the proposed operators, but that would depend on the response to the previous paragraph. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Fri Oct 28 16:06:41 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 28 Oct 2016 21:06:41 +0100 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826124716.GP26300@ando.pearwood.info> References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: On 2016-08-26 13:47, Steven D'Aprano wrote: > Ken has made what I consider a very reasonable suggestion, to introduce > SI prefixes to Python syntax for numbers. For example, typing 1K will be > equivalent to 1000. > Just for the record, this is what you can now do in C++: User-Defined Literals http://arne-mertz.de/2016/10/modern-c-features-user-defined-literals/ From srkunze at mail.de Fri Oct 28 16:45:40 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 28 Oct 2016 22:45:40 +0200 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: <291b8b3d-9ea4-20a8-3703-63652c19019c@mail.de> On 28.10.2016 22:06, MRAB wrote: > On 2016-08-26 13:47, Steven D'Aprano wrote: >> Ken has made what I consider a very reasonable suggestion, to introduce >> SI prefixes to Python syntax for numbers. For example, typing 1K will be >> equivalent to 1000. >> > Just for the record, this is what you can now do in C++: > > User-Defined Literals > http://arne-mertz.de/2016/10/modern-c-features-user-defined-literals/ Nice to hear. :) They now have 5 years of experience with that. Are there any surveys, experience reports, etc.? Cheers, Sven From srkunze at mail.de Fri Oct 28 16:41:32 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 28 Oct 2016 22:41:32 +0200 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: <783626da-0c34-cced-5483-adc4d57f9ffe@mail.de> Hi Nick, thanks for writing all of this down and composing a PEP. On 28.10.2016 10:30, Nick Coghlan wrote: > 1. 
Do we collectively agree that "existence checking" is a useful > general concept that exists in software development and is distinct > from the concept of "truth checking"? Right to your first question: If I were to answer this in a world of black and white, I need to say yes. In the real-world it's more probably more like: you can map existence-checking to truth checking in most practical cases without any harm. So, it's usefulness and distinctness is quite reduced. > 2. Do we collectively agree that the Python ecosystem would benefit > from an existence checking protocol that permits generalisation of > algorithms (especially short circuiting ones) across different "data > missing" indicators, including those defined in the language > definition, the standard library, and custom user code? I cannot speak for stdlib. For custom user code, I may repeat what I already said: it might be useful the outer code working on the boundaries of systems as incoming data is hardly perfect. It might harm inner working of software if bad datastructure design permeates it and requires constant checking for existence (or other things). Language definition-wise, I would say that if we can curb the issue described in the former paragraph, we'll be fine. Then it will shine through to all user code and the stdlib as well. However, I don't think we are going to achieve this. The current language design does indeed favor clean datastructure design since messy datastructures are hard to handle in current Python. So, this naturally minimizes the usage of messy datastructures which is not a bad thing IMHO. From my experience, clean datastructure design leads to easy-to-read clean code naturally. If people get their datastructures right in the inner parts of their software that's the most important step. If they subsequently would like to provide some convenience to their consumers (API, UI, etc.), that's a good cause. Still, it keeps the mess/checking in check plus it keeps it in a small amount of places (the boundary code). And it guides consumers also to clean usage (which is also not a bad thing IMHO). > 3. Do we collectively agree that it would be easier to use such a > protocol effectively if existence-checking equivalents to the > truth-checking "and" and "or" control flow operators were available? It's "just" shortcuts. So, yes. However, as truth checking already is available, it might even increase confusion of what checking is to use. I think most developers need less but powerful tools to achieve their full potential. > Only if we have at least some level of consensus on the above > questions regarding whether or not this is a conceptual modeling > problem worth addressing at the language level does it then make sense > to move on to the more detailed questions regarding the specific > proposed *solution* to the problem in the PEP: All in one, you can imagine that I am -1 on this. > 6a. Do we collectively agree that 'obj?.attr' would be a reasonable > spelling for "access this attribute only if the object exists"? > 6b. Do we collectively agree that 'obj?[expr]' would be a reasonable > spelling for "access this subscript only if the object exists"? I know, I don't need to mention this because question 1 to 3 are already problematic, but just my two cents here. To me it's unclear what the ? would refer to anyway: is it the obj that needs checking or is it the attribute/subscript access? I get the feeling that this is totally unclear from the syntax (also confirmed by Paul's post). 
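For what it's worth, here is a rough sketch of the two readings, restricted
to the plain None case for simplicity (the names are purely illustrative;
the PEP intends the first reading):

# Reading 1: the ? guards the *object* on the left
result = obj.attr if obj is not None else None

# Reading 2: the ? guards the *access* itself
result = getattr(obj, 'attr', None)
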
Still, thanks a lot for your work, Nice. :) Regards, Sven From srkunze at mail.de Fri Oct 28 17:25:53 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 28 Oct 2016 23:25:53 +0200 Subject: [Python-ideas] Marking keyword arguments (was: f-string, for dictionaries) In-Reply-To: <41341db4-b399-1adb-3da1-b76fefed3460@gmail.com> References: <41341db4-b399-1adb-3da1-b76fefed3460@gmail.com> Message-ID: <1660ffe8-f194-ccbc-a40c-4b75821fd28c@mail.de> Hi Michel, hi Paul, sorry for hijacking this thread somewhat. I would like to extend this proposal a bit in order make it useful for a broader audience and to provide a real-word use-case. So, this post is just to gather general acceptance and utility of it. Michel, you specifically mentioned dicts as the place where to start with this kind of syntax. I have to tell you one could easily extend this thought process to function calls as well. Keyword arguments cannot be simply passed like positional arguments. They always require the parameter name to be mentioned. In production we have masses of code that goes like this: >>> do_thing(important_stuff=important_stuff, request=request) >>> make_any(config=123, important_stuff=important_stuff, other_form=other_form) Causes: - verbose names of variables - extended usage of kwargs - passing the same data over and over again down the calling hierarchy So, instead providing this kind of syntax for dicts only, why not also providing them for kwargs? Basically marking arguments as keyword arguments: my_func(:param1, :param2) ":param" equals "param=param" again but as already said that might just be placeholder syntax. What do you think? On 25.10.2016 23:18, Michel Desmoulin wrote: > > > Le 25/10/2016 ? 22:27, Paul Moore a ?crit : >> On 25 October 2016 at 20:11, Michel Desmoulin >> wrote: >>> Similarly, I'd like to suggest a similar feature for building >>> dictionaries: >>> >>>>>> foo = 1 >>>>>> bar = 2 >>>>>> {:bar, :foo} >>> {'bar': 1, 'foo', 2} >> >> I don't see a huge advantage over >> >>>>> dict(foo=foo, bar=bar) >> >> Avoiding having to repeat the variable names doesn't feel like a >> killer advantage here, certainly not sufficient to warrant adding yet >> another dictionary construction syntax. Do you have examples of code >> that would be substantially improved with this syntax (over using an >> existing approach)? > > {:bar, :foo} > vs > dict(foo=foo, bar=bar) > > has the same benefit that would have > > f"hello {foo} {bar}" > vs > "hello {} {}".format(foo, bar) > >> >>> And a similar way to get the content from the dictionary into variables: >>> >>>>>> values = {'bar': 1, 'foo', 2} >>>>>> {:bar, :foo} = values >>>>>> bar >>> 1 >>>>>> foo >>> 2 >> >> There aren't as many obvious alternative approaches here, but it's not >> clear why you'd want to do this. So in this case, I'd want to see >> real-life use cases. Most of the ones I can think of are just to allow >> a shorter form for values['foo']. For those uses >> >> >>> from types import SimpleNamespace >> >>> o = SimpleNamespace(**values) >> >> o.foo >> 1 >> >> works pretty well. > > This is just unpacking for dicts really. > > As you would do: > > a, b = iterable > > you do: > > {:a, :b} = mapping > >> >>> The syntaxes used here are of course just to illustrate the concept >>> and I'm >>> suggesting we must use those. >> >> Well, if we ignore the syntax for a second, what is your proposal >> exactly? It seems to be in 2 parts: >> >> 1. "We should have a dictionary building feature that uses keys based >> on variables from the local namespace". 
OK, that's not something I've >> needed much, and when I have, there have usually been existing ways to >> do the job (such as dict(foo=foo) noted above) that are perfectly >> sufficient. Sometimes the existing alternatives look a little clumsy >> and repetitive, but that's a very subjective judgement, and any new >> syntax could easily look worse to me (your specific proposal, for >> example, does). So I can see a small potential benefit in (subjective) >> readability, but that's offset by the proposal being another way of >> doing something that's already pretty well covered in the language. >> Add to that all of the "normal" objections to new syntax (more to >> teach/learn, hard to google, difficulty finding a form that suits >> everyone, etc) and it's hard to see this getting accepted. >> >> 2. "We should have a way of unpacking a dictionary into local >> variables". That's not something that I can immediately think of a way >> of doing currently - so that's a point in its favour. But honestly, >> I've never seen the need to do this outside of interactive use (for >> which see below). If all you want is to avoid the d['name'] syntax, >> which is quite punctuation-heavy, the SimpleNamespace trick above does >> that. So there's almost no use case that I can see for this. Can you >> give examples of real-world code where this would be useful? >> >> On the other hand, your proposal, like many that have come up >> recently, seems to be driven (if it's OK for me to guess at your >> motivations) by an interest in being able to write relatively terse >> one-liners, or at least to avoid some of the syntactic overheads of >> existing constructs. It seems to me that the environment I'd most want >> to do this in is the interactive interpreter. So I wonder if this (and >> similar) proposals are driven by a feeling that it's "clumsy" writing >> code at the interactive prompt. That may well be so. The standard >> interactive prompt is pretty basic, and yet it's a *huge* part of the >> unique experience working with Python to be able to work at the prompt >> as you develop. So maybe there's scope for discussion here on >> constructs focused more on interactive use? That probably warrants a >> separate thread, though, so I'll split it off from this discussion. >> Feel free to contribute there if I'm right in where I think the >> motivation for your proposals came from. >> >> Paul > > > Currently I already have shortcuts those features. > > I have wrappers for dictionaries such as: > > d(mapping).unpack('foo', 'bar') > > Which does some hack with stack frame and locals(). > > And: > > d.from_vars('foo', 'bar') > > I use them only in the shell of course, because you can't really have > such hacks in production code. > > I would use such features in my production code if they was a clean way > to do it. > > It's just convenience syntaxic sugar. > > You can argue that decorator could be written: > > def func(): > pass > > func = decorator(func) > > Instead of: > > @decorator > def func(): > pass > > But the second one is more convenient. And so are comprehensions, > unpacking, and f-strings. Clearly not killer features, just nice to have. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From srkunze at mail.de Fri Oct 28 17:36:58 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Fri, 28 Oct 2016 23:36:58 +0200 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: Message-ID: Great idea! Another issue I encounter regularly are things like: >>> func(mylist[i], mylist2[j]) IndexError: list index out of range So, which are the list and index that cause the error? On 25.10.2016 00:07, Ryan Gonzalez wrote: > I personally find it kind of annoying when you have code like this: > > > x = A(1, B(2, 3)) > > > and Python's error message looks like this: > > > TypeError: __init__() takes 1 positional argument but 2 were given > > > It doesn't give much of a clue to which `__init__` is being called. At all. > > The idea: when showing the function name in an error like this, show the > fully qualified name, like: > > > TypeError: A.__init__() takes 1 positional argument but 2 were given > > > This would be MUCH more helpful! > > > Another related change would be to do the same thing in tracebacks: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in __init__ > AssertionError > > > to: > > > Traceback (most recent call last): > File "", line 1, in > File "", line 2, in MyClass.__init__ > AssertionError > > > which could make it easier to find where exactly an error originated. > > -- > Ryan (????) > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. > http://kirbyfan64.github.io/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From p.f.moore at gmail.com Fri Oct 28 17:41:21 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 28 Oct 2016 22:41:21 +0100 Subject: [Python-ideas] Marking keyword arguments (was: f-string, for dictionaries) In-Reply-To: <1660ffe8-f194-ccbc-a40c-4b75821fd28c@mail.de> References: <41341db4-b399-1adb-3da1-b76fefed3460@gmail.com> <1660ffe8-f194-ccbc-a40c-4b75821fd28c@mail.de> Message-ID: On 28 October 2016 at 22:25, Sven R. Kunze wrote: > So, instead providing this kind of syntax for dicts only, why not also > providing them for kwargs? Basically marking arguments as keyword arguments: > > > my_func(:param1, :param2) > > > ":param" equals "param=param" again but as already said that might just be > placeholder syntax. > > > What do you think? -1. I don't like the use of the colon here. I don't think there's any need to avoid the repetition in arg_name=arg_name, it's a common convention, easy to read and understand, and even to write with a bit of editor support. Explicit is better than implicit implies here, IMO. Paul From rosuav at gmail.com Fri Oct 28 17:41:44 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 29 Oct 2016 08:41:44 +1100 Subject: [Python-ideas] Showing qualified names when a function call fails In-Reply-To: References: Message-ID: On Sat, Oct 29, 2016 at 8:36 AM, Sven R. Kunze wrote: > Another issue I encounter regularly are things like: > >>>> func(mylist[i], mylist2[j]) > > IndexError: list index out of range > > > So, which are the list and index that cause the error? +1. Showing the list's contents might be problematic, but it could show the valid indices - if the lists are different lengths, it would tell you which is which. IndexError: list index 5 is out of range -4..3. Added bonus: Someone who doesn't know that negative indices are valid will get a cool hint. 
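Something in that spirit can already be approximated in user code, which at
least shows the message is cheap to compute (the helper name is made up for
this sketch; the real change would live in the interpreter):

def checked_getitem(seq, i):
    try:
        return seq[i]
    except IndexError:
        raise IndexError(
            f"list index {i} is out of range {-len(seq)}..{len(seq) - 1}"
        ) from None

checked_getitem([10, 20, 30], 5)
# IndexError: list index 5 is out of range -3..2
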
ChrisA From mertz at gnosis.cx Fri Oct 28 18:24:21 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 28 Oct 2016 18:24:21 -0400 Subject: [Python-ideas] Marking keyword arguments (was: f-string, for dictionaries) In-Reply-To: References: <41341db4-b399-1adb-3da1-b76fefed3460@gmail.com> <1660ffe8-f194-ccbc-a40c-4b75821fd28c@mail.de> Message-ID: Yes, -1. I feel like we should add a header to all messages on this list: WARNING: PYTHON IS NOT PERL, NOR APL! I know I'm being snarky, but too many of the recent ideas feel like code golf for uncommon user cases. Or at least not common enough to warrant the cognitive burden of more syntax. On Oct 28, 2016 2:42 PM, "Paul Moore" wrote: > On 28 October 2016 at 22:25, Sven R. Kunze wrote: > > So, instead providing this kind of syntax for dicts only, why not also > > providing them for kwargs? Basically marking arguments as keyword > arguments: > > > > > > my_func(:param1, :param2) > > > > > > ":param" equals "param=param" again but as already said that might just > be > > placeholder syntax. > > > > > > What do you think? > > -1. I don't like the use of the colon here. I don't think there's any > need to avoid the repetition in arg_name=arg_name, it's a common > convention, easy to read and understand, and even to write with a bit > of editor support. Explicit is better than implicit implies here, IMO. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Oct 28 22:02:36 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 29 Oct 2016 11:02:36 +0900 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> Message-ID: <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> Mark E. Haase writes: > In terms of "bunch of longer examples", what did you have in mind? > I could take some popular library and rewrite a section of it with > the proposed operators, but that would depend on the response to > the previous paragraph. I gather you think you have a deadlock here. The way to break it is to just do it. Pick a syntax and do the rewriting. My memory of some past instances is that many of the senior devs (especially Guido) will "see through the syntax" to evaluate the benefits of the proposal, even if they've said they don't particularly like the initially- proposed syntax. Unfortunately here the most plausible syntax is one that Guido has said he definitely doesn't like: using '?'. The alternatives are pretty horrible (a Haskell-like 'maybe' keyword, or the OPEN SQUARE character used by some logicians in modal logic -- the problem with the latter is that for many people it may not display at all with their font configurations, or it may turn into mojibake in email. OTOH, that case was an astral character -- after Guido announced his opposition to '?', the poster used PILE OF POO as the operator. OPEN SQUARE is in the basic multilingual plane, so probably is OK if the recipient can handle Unicode. '?' vs. '?': maybe that helps narrow the choice set? 
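Whatever spelling eventually wins, the semantics being argued over stay the
same; written as a plain, hypothetical helper function (since no syntax
exists yet), None-coalescing is just:

def coalesce(value, default):
    return value if value is not None else default

timeout = coalesce(config_timeout, 30)  # unlike "config_timeout or 30", this keeps 0
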
From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Oct 28 23:00:17 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 29 Oct 2016 12:00:17 +0900 Subject: [Python-ideas] Marking keyword arguments (was: f-string, for dictionaries) In-Reply-To: <1660ffe8-f194-ccbc-a40c-4b75821fd28c@mail.de> References: <41341db4-b399-1adb-3da1-b76fefed3460@gmail.com> <1660ffe8-f194-ccbc-a40c-4b75821fd28c@mail.de> Message-ID: <22548.4289.494115.124929@turnbull.sk.tsukuba.ac.jp> Sven R. Kunze writes: > So, instead providing this kind of syntax for dicts only, why not also > providing them for kwargs? Basically marking arguments as keyword arguments: > > my_func(:param1, :param2) Please don't. I need to write Lisp, where that syntax has a completely different meaning: "this symbol has itself as a value". Ie, (eval :key) always evaluates to :key itself. Syntax based on single characters is very hard to get right once you get past the 4-function calculator level. C's ternary operator was inspired design IMO (YMMV), but far too many others just fall flat. One aspect of Python that is important to me is that it is executable pseudo-code, as they say. A litmus test for that is that a reasonably bright English-speaking 12-year-old without previous exposure to Python should be able to tell you what a statement does. There are a few cases where that principle is violated (said 12-year-old won't know the difference between "=" and '==' though context will probably teach her, nor the meaning of the attribute operator period), but mostly it works. Infix operators "in", "or", and "and" help a lot. As English, the ternary operator "it's-this if test else it's-that" is stilted but the 12-year-old can parse it. This test also applies to the null-coalescing operators and '@'. The fact that '@' got in means the principle is hardly inviolable, of course. At least in the case of '@' you can tell it's a binary operator, but the "implicit-assignment" colon and "null-coalescing" question mark operator-mode-modifying operator[1] could do anything. Null-coalescing is plausible -- I've never needed it, nor do I expect to need it any time soon, but the examples of where it would be used seem reasonable to me, and where needed the existing WTDIs quickly lead to an unreadable morass of nested ternaries with extremely redundant operands. I'll leave the judgment about whether it clears the bar to those who have broader experience than I. But this colon notation is horrible. foo=foo and 'foo':foo are already the respective TOOWTDIs, and quite readable. These operations don't nest, so I can't see how they would become either unreadable or exponentially redundant. Rather than change Python, improve your editor[2], and your fingers will thank you for the rest of your life, even when you're not editing Python. Footnotes: [1] If that doesn't scare you, what will? [2] In Emacs with "dynabbrev" enabled the keystrokes are respectively "f o o = meta-/" and "' f o o ' : meta-/". A good editor can -- and IMO should -- make this operation efficient, as a by-product of providing context-dependent prefix expansion. From ncoghlan at gmail.com Sat Oct 29 00:52:42 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Oct 2016 14:52:42 +1000 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On 29 October 2016 at 04:08, Mark Dickinson wrote: > On Fri, Oct 28, 2016 at 9:30 AM, Nick Coghlan wrote: >> [...] 
the current practicises of: >> >> * obj is not None (many different use cases) >> * obj is not Ellipsis (in multi-dimensional slicing) > > Can you elaborate on this one? I don't think I've ever seen an `is not > Ellipsis` check in real code. It's more often checked the other way around: "if Ellipsis is passed in, then work out the multi-dimensional slice from the underlying object" And that reflects the problem Paul and David highlighted: in any *specific* context, there's typically either only one sentinel we want to either propagate or else replace with a calculated value, or else we want to handle different sentinel values differently, which makes the entire concept of a unifying duck-typing protocol pragmatically dubious, and hence calls into question the idea of introducing new syntax for working with it. On the other hand, if you try to do this as an "only None is special" kind of syntax, then any of the *other* missing data sentinels (of which we have 4 in the builtins alone, and 5 when you add the decimal module) end up being on much the same level as "arbitrary sentinel objects" in the draft PEP 531, which I would consider to be an incredibly poor outcome for a change as invasive as adding new syntax: https://www.python.org/dev/peps/pep-0531/#arbitrary-sentinel-objects Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Oct 29 01:03:22 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Oct 2016 15:03:22 +1000 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On 29 October 2016 at 01:46, Ryan Gonzalez wrote: > On Oct 28, 2016 3:30 AM, "Nick Coghlan" wrote: >> *snip* >> 4. Do we collectively agree that "?then" and "?else" would be >> reasonable spellings for such operators? > > Personally, I find that kind of ugly. What's wrong with just ? instead of > ?else? When you see the expression "LHS ? RHS", there's zero indication of how to read it other than naming the symbol: "LHS question mark RHS". By contrast, "LHS ?then RHS" and "LHS ?else RHS" suggest the pronunciations "LHS then RHS" and "LHS else RHS", which in turn are potentially useful mnemonics for the full expansions "if LHS exists then RHS else LHS" and "LHS if LHS exists else RHS". (Knowing that "?" indicates an existence check is still something you'd have to learn, but even without knowing that, the keywords could get you quite some way towards correctly understanding what the construct means) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rainventions at gmail.com Sat Oct 29 01:20:30 2016 From: rainventions at gmail.com (Ryan Birmingham) Date: Sat, 29 Oct 2016 01:20:30 -0400 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <291b8b3d-9ea4-20a8-3703-63652c19019c@mail.de> References: <20160826124716.GP26300@ando.pearwood.info> <291b8b3d-9ea4-20a8-3703-63652c19019c@mail.de> Message-ID: I'd certainly be interested in hearing about how this has worked with C++, but this would certainly make scientific code less easy to misuse due to unclear units. -Ryan Birmingham On 28 October 2016 at 16:45, Sven R. Kunze wrote: > On 28.10.2016 22:06, MRAB wrote: > >> On 2016-08-26 13:47, Steven D'Aprano wrote: >> >>> Ken has made what I consider a very reasonable suggestion, to introduce >>> SI prefixes to Python syntax for numbers. For example, typing 1K will be >>> equivalent to 1000. 
>>> >>> Just for the record, this is what you can now do in C++: >> >> User-Defined Literals >> http://arne-mertz.de/2016/10/modern-c-features-user-defined-literals/ >> > > Nice to hear. :) > > They now have 5 years of experience with that. Are there any surveys, > experience reports, etc.? > > > Cheers, > Sven > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 29 02:21:18 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Oct 2016 16:21:18 +1000 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On 29 October 2016 at 14:52, Nick Coghlan wrote: > And that reflects the problem Paul and David highlighted: in any > *specific* context, there's typically either only one sentinel we want > to either propagate or else replace with a calculated value, or else > we want to handle different sentinel values differently, which makes > the entire concept of a unifying duck-typing protocol pragmatically > dubious, and hence calls into question the idea of introducing new > syntax for working with it. > > On the other hand, if you try to do this as an "only None is special" > kind of syntax, then any of the *other* missing data sentinels (of > which we have 4 in the builtins alone, and 5 when you add the decimal > module) end up being on much the same level as "arbitrary sentinel > objects" in the draft PEP 531, which I would consider to be an > incredibly poor outcome for a change as invasive as adding new syntax: > https://www.python.org/dev/peps/pep-0531/#arbitrary-sentinel-objects Considering this question of "Am I attempting to extract the right underlying design pattern?" further puts me in mind of Greg Ewing's old rejected proposal to permit overloading the "and" and "or" operators: https://www.python.org/dev/peps/pep-0335/ After all, the proposed "?then" and "?else" operators in PEP 531 are really just customised variants of "and" and "or" that use a slightly modified definition of truth-checking. PEP 335 attempted to tackle that operator overloading problem directly, but now I'm wondering if it may be more fruitful to instead consider the problem in terms of the expanded conditional expressions: * "LHS and RHS" -> "RHS if LHS else LHS" * "LHS or RHS" -> "LHS if LHS else RHS" A short-circuiting if-else protocol for arbitrary "THEN if COND else ELSE" expressions could then look like this: _condition = COND if _condition: _then = THEN if hasattr(_condition, "__then__"): return _condition.__then__(_then) return _then else: _else = ELSE if hasattr(_condition, "__else__"): return _condition.__else__(_else) return _else "and" and "or" would then be simplified versions of that, where the condition expression was re-used as either the "THEN" subexpression ("or") or the "ELSE" subexpression ("and"). 
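Spelled out for the "or" case as a purely illustrative helper (taking thunks
only because an ordinary function can't short-circuit its arguments the way
the operator would):

def protocol_or(lhs_thunk, rhs_thunk):
    # "LHS or RHS" == "LHS if LHS else RHS", with the hooks applied
    _condition = lhs_thunk()
    if _condition:
        _then = _condition                    # for "or", THEN re-uses the condition
        if hasattr(_condition, "__then__"):
            return _condition.__then__(_then)
        return _then
    else:
        _else = rhs_thunk()
        if hasattr(_condition, "__else__"):
            return _condition.__else__(_else)
        return _else
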
The reason I think this is potentially interesting in the context of
PEPs 505 and 531 is that with that protocol defined, the
null-coalescing "operator" wouldn't need to be a new operator, it
could just be a new builtin that defined the appropriate underlying
control flow:

value = if_set(expr1) or if_set(expr2) or expr3

where if_set was defined as:

class if_set:
    def __init__(self, value):
        self.value = value
    def __bool__(self):
        # the wrapped value, not the wrapper, decides "existence"
        return self.value is not None
    def __then__(self, result):
        if result is self:
            return self.value
        return result
    def __else__(self, result):
        if result is self:
            return self.value
        return result

Checking for a custom sentinel value instead of ``None`` would then be
as straightforward as using a different conditional control flow
manager that replaced the ``__bool__`` check against ``None`` with a
check against the specific sentinel of interest.

Regards,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Sat Oct 29 02:30:38 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 29 Oct 2016 17:30:38 +1100
Subject: [Python-ideas] Null coalescing operator
In-Reply-To: <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp>
References: <20160910002719.GG22471@ando.pearwood.info>
 <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp>
 <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com>
 <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp>
Message-ID: <20161029063037.GU15983@ando.pearwood.info>

On Sat, Oct 29, 2016 at 11:02:36AM +0900, Stephen J. Turnbull wrote:

> Unfortunately here the most plausible syntax is one
> that Guido has said he definitely doesn't like: using '?'.  The
> alternatives are pretty horrible (a Haskell-like 'maybe' keyword, or
> the OPEN SQUARE character used by some logicians in modal logic -- the
> problem with the latter is that for many people it may not display at
> all with their font configurations, or it may turn into mojibake in
> email.

I think you mean WHITE SQUARE? At least, I cannot see any "OPEN
SQUARE" code point in Unicode, and the character you use below □ is
called WHITE SQUARE.

> OTOH, that case was an astral character -- after Guido announced his
> opposition to '?', the poster used PILE OF POO as the operator.  OPEN
> SQUARE is in the basic multilingual plane, so probably is OK if the
> recipient can handle Unicode.  '?' vs. '□': maybe that helps narrow
> the choice set?

I cannot wait for the day that we can use non-ASCII operators. But I
don't think that day has come: it is still too hard for many people
(including me) to generate non-ASCII characters at the keyboard, and
font support for some of the more useful ones is still inconsistent
or lacking.

For example, we don't have a good literal for empty sets. How about ∅?
Sadly, in my mail client and in the Python REPL, it displays as a
"missing glyph" open rectangle. And how would you type it?

Ironically, WHITE SQUARE does display, but it took me a while to
realise because at first I thought it too was the missing glyph
character. And I still have no idea how to type it.

Java, I believe, allows you to enter escape sequences in source code,
not just in strings. So we could hypothetically allow one of:

myobject\N{WHITE SQUARE}attribute
myobject\u25a1attribute

as a pure-ASCII way of getting

myobject□attribute

but really, who is going to do that? It is bad enough when strings
contain escape sequences, but source code?

So even though I *want* to use non-ASCII operators, I have to admit that
I *can't* realistically use non-ASCII operators.
Not yet. Wishing-that-somebody-can-prove-me-wrong-ly y'rs, -- Steve From p.f.moore at gmail.com Sat Oct 29 05:54:06 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 29 Oct 2016 10:54:06 +0100 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On 29 October 2016 at 07:21, Nick Coghlan wrote: > A short-circuiting if-else protocol for arbitrary "THEN if COND else > ELSE" expressions could then look like this: > > _condition = COND > if _condition: > _then = THEN > if hasattr(_condition, "__then__"): > return _condition.__then__(_then) > return _then > else: > _else = ELSE > if hasattr(_condition, "__else__"): > return _condition.__else__(_else) > return _else > > "and" and "or" would then be simplified versions of that, where the > condition expression was re-used as either the "THEN" subexpression > ("or") or the "ELSE" subexpression ("and"). > > The reason I think this is potentially interesting in the context of > PEPs 505 and 531 is that with that protocol defined, the > null-coalescing "operator" wouldn't need to be a new operator, it > could just be a new builtin that defined the appropriate underlying > control flow: This seems to have some potential to me. It doesn't seem massively intrusive (there's a risk that it might be considered a step too far in "making the language mutable", but otherwise it's just a new extension protocol around an existing construct). The biggest downside I see is that it could be seen as simply generalisation for the sake of it. But with the null-coalescing / sentinel checking use case, plus Greg's examples from the motivation section of PEP 335, there may well be enough potential uses to warrant such a change. Paul From steve at pearwood.info Sat Oct 29 05:54:54 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 29 Oct 2016 20:54:54 +1100 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: <20161029095453.GW15983@ando.pearwood.info> On Sat, Oct 29, 2016 at 03:03:22PM +1000, Nick Coghlan wrote: > On 29 October 2016 at 01:46, Ryan Gonzalez wrote: > > On Oct 28, 2016 3:30 AM, "Nick Coghlan" wrote: > >> *snip* > >> 4. Do we collectively agree that "?then" and "?else" would be > >> reasonable spellings for such operators? > > > > Personally, I find that kind of ugly. What's wrong with just ? instead of > > ?else? > > When you see the expression "LHS ? RHS", there's zero indication of > how to read it other than naming the symbol: "LHS question mark RHS". /insert tongue firmly in cheek We already have hash # bang ! splat * wack / and twiddle ~ so I suggest: LHS huh? RHS -- Steve From p.f.moore at gmail.com Sat Oct 29 05:59:05 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 29 Oct 2016 10:59:05 +0100 Subject: [Python-ideas] Null coalescing operator In-Reply-To: <20161029063037.GU15983@ando.pearwood.info> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> Message-ID: On 29 October 2016 at 07:30, Steven D'Aprano wrote: > So even though I *want* to use non-ASCI operators, I have to admit that > I *can't* realistically use non-ASCII operators. Not yet. 
Personally, I'm not even sure I want non-ASCII operators until non-ASCII characters are common, and used without effort, in natural language media such as email (on lists like this), source code comments, documentation, etc. For better or worse, it may be emoji that drive that change ;-) Paul From steve at pearwood.info Sat Oct 29 06:53:23 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 29 Oct 2016 21:53:23 +1100 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: <20161029105321.GX15983@ando.pearwood.info> On Sat, Oct 29, 2016 at 02:52:42PM +1000, Nick Coghlan wrote: > On 29 October 2016 at 04:08, Mark Dickinson wrote: > > On Fri, Oct 28, 2016 at 9:30 AM, Nick Coghlan wrote: > >> [...] the current practicises of: > >> > >> * obj is not None (many different use cases) > >> * obj is not Ellipsis (in multi-dimensional slicing) > > > > Can you elaborate on this one? I don't think I've ever seen an `is not > > Ellipsis` check in real code. > > It's more often checked the other way around: "if Ellipsis is passed > in, then work out the multi-dimensional slice from the underlying > object" > > And that reflects the problem Paul and David highlighted: in any > *specific* context, there's typically either only one sentinel we want > to either propagate or else replace with a calculated value, or else > we want to handle different sentinel values differently, which makes > the entire concept of a unifying duck-typing protocol pragmatically > dubious, and hence calls into question the idea of introducing new > syntax for working with it. > > On the other hand, if you try to do this as an "only None is special" > kind of syntax, then any of the *other* missing data sentinels (of > which we have 4 in the builtins alone, and 5 when you add the decimal > module) end up being on much the same level as "arbitrary sentinel > objects" in the draft PEP 531, which I would consider to be an > incredibly poor outcome for a change as invasive as adding new syntax: > https://www.python.org/dev/peps/pep-0531/#arbitrary-sentinel-objects Hmmm. I see your point, but honestly, None *is* special. Even for special objects, None is even more special. Here are your examples again: * obj is not None (many different use cases) * obj is not Ellipsis (in multi-dimensional slicing) * obj is not NotImplemented (in operand coercion) * math.isnan(value) * cmath.isnan(value) * decimal.getcontext().is_nan(value) Aside from the first, the rest are quite unusual: - Testing for Ellipsis occurs in __getitem__, and not even always then. - Testing for NotImplemented occurs in operator dunders, rarely if ever outside those methods. (You probably should never see NotImplemented except in an operator dunder.) In both cases, this will be a useful feature for the writer of the class, not the user of the class. - Testing for NAN is really only something of interest to those writing heavily numeric code and not even always then. You can go a LONG way with numeric code by just assuming that x is a regular number, and leaving NANs for "version 2". Especially in Python, which typically raises an exception where it could return a NAN. In other words, its quite hard to generate an unexpected NAN in Python. So these examples are all quite special and of very limited applicability and quite marginal utility. My guess is that the majority of programmers will never care about these cases, and of those who do, they'll only need it quite rarely. 
(We use classes far more often than we write classes.) But None is different. My guess is that every Python programmer, from the newest novice to the most experienced guru, will need to check for None, and likely frequently. So my sense is that of all the use-cases for existence checking divide into two categories: - checking for None (> 95%) - everything else (< 5%) I did a very rough search of the Python code on my system and found this: is [not] None: 10955 is [not] Ellipsis: 13 is [not] NotImplemented: 285 is_nan( / isnan( : 470 which is not far from my guess. -- Steve From steve at pearwood.info Sat Oct 29 07:44:16 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 29 Oct 2016 22:44:16 +1100 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: <20161029114416.GY15983@ando.pearwood.info> On Fri, Oct 28, 2016 at 06:30:05PM +1000, Nick Coghlan wrote: [...] > 1. Do we collectively agree that "existence checking" is a useful > general concept that exists in software development and is distinct > from the concept of "truth checking"? Not speaking for "we", only for myself: of course. > 2. Do we collectively agree that the Python ecosystem would benefit > from an existence checking protocol that permits generalisation of > algorithms (especially short circuiting ones) across different "data > missing" indicators, including those defined in the language > definition, the standard library, and custom user code? Maybe, but probably not. Checking for "data missing" or other sentinels is clearly an important thing to do, but it remains to be seen whether (1) it should be generalised and (2) there is a benefit to making it a protocol. My sense so far is that generalising beyond None is YAGNI. None of the other examples you give strike me as common enough to justify special syntax, or even a protocol. I'm not *against* the idea, I just remain unconvinced. But in particular, I *don't* think it is useful to introduce a concept similar to "truthiness" for existence. Duck-typing truthiness is useful: most of the time, I don't care which truthy value I have, only that it is truthy. But I'm having difficulty seeing when I would want to extend that to existence checking. The existence singletons are not generally interchangeable: - operator dunder methods don't allow you to pass None instead NotImplemented, nor should they; - (1 + nan) returns a nan, but (1 + Ellipsis) is an error; - array[...] and array[NotImplemented] probably mean different things; etc. More on this below. > 3. Do we collectively agree that it would be easier to use such a > protocol effectively if existence-checking equivalents to the > truth-checking "and" and "or" control flow operators were available? I'm not sure about this one. [...] > 4. Do we collectively agree that "?then" and "?else" would be > reasonable spellings for such operators? As in... spam ?then eggs meaning (conceptually): if (spam is None or spam is NotImplemented or spam is Ellipsis or isnan(spam)): return eggs else: return spam I don't know... I can't see myself ever not caring which "missing" value I have, only that it is "missingly" (by analogy with "truthy"). If I'm writing an operator dunder method, I want to treat NotImplemented as "missing", but anything else (None, Ellipsis, NAN) would be a regular value. If I'm writing a maths function that supports NANs, I'd probably want to treat None, Ellipsis and NotImplemented as errors. 
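A small made-up example of the first case, where NotImplemented is the only
result that counts as "missing", and a None or NAN stored in the operand
would just be ordinary (probably buggy) data:

class Seconds:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        if not isinstance(other, Seconds):
            return NotImplemented         # "missing": let the other operand try
        return Seconds(self.n + other.n)
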
While I agree that "existence checking" is a concept, I don't think existence generalises in the same way Truth generalises to truthiness. > 5a. Do we collectively agree that "access this attribute only if the > object exists" would be a particularly common use case for such > operators? Yes, but only for the "object is not None" case. Note that NANs ought to support the same attributes as other floats. If they don't, I'd call it an error: py> nan = float('nan') py> nan.imag 0.0 py> nan.real nan So I shouldn't have to write: y = x if x.isnan() else x.attr I should be able to just write: y = x.attr and have NANs do the right thing. But if we have a separate, dedicated NA/Missing value, like R's NA, things may be different. > 5b. Do we collectively agree that "access this subscript only if the > object exists" would be a particularly common use case for such > operators? I'd be surprised if it were very common, but it might be "not uncommon". > 5c. Do we collectively agree that "bind this value to this target only > if the value currently bound to the target nominally doesn't exist" > would be a particularly common use case for such operators? You mean a short-cut for: if obj is None: obj = spam Sure, that's very common. But: if (obj is None or obj is NotImplemented or obj is Ellipsis or isnan(obj)): obj = spam not so much. > 6a. Do we collectively agree that 'obj?.attr' would be a reasonable > spelling for "access this attribute only if the object exists"? I like that syntax. > 6b. Do we collectively agree that 'obj?[expr]' would be a reasonable > spelling for "access this subscript only if the object exists"? > 6c. Do we collectively agree that 'target ?= expr' would be a > reasonable spelling for "bind this value to this target only if the > value currently bound to the target nominally doesn't exist"? I don't hate either of those. Thanks for writing the PEP! -- Steve From prometheus235 at gmail.com Sat Oct 29 12:43:04 2016 From: prometheus235 at gmail.com (Nick Timkovich) Date: Sat, 29 Oct 2016 11:43:04 -0500 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: <20160826124716.GP26300@ando.pearwood.info> <291b8b3d-9ea4-20a8-3703-63652c19019c@mail.de> Message-ID: >From that page: > User-defined literals are basically normal function calls with a fancy > syntax. [...] While user defined literals look very neat, they are not much > more than syntactic sugar. There is not much difference between defining > and calling a literal operator with "foo"_bar and doing the same with an > ordinary function as bar("foo"). In theory, we could write literal > operators that have side effects and do anything we want, like a normal > function. Obviously the arbitrary-function-part of that will never happen in Python (yes?) Also, for discussion, remember to make the distinction between 'units' (amps, meters, seconds) and 'prefixes' (micro, milli, kilo, mega). Right away from comments, it seems 1_m could look like 1 meter to some, or 0.001 to others. Typically when I need to enter very small/large literals, I'll use "engineering" SI notation (powers divisible by 3 that correspond to the prefixes): 0.1e-9 = 0.1 micro____. On Sat, Oct 29, 2016 at 12:20 AM, Ryan Birmingham wrote: > I'd certainly be interested in hearing about how this has worked with C++, > but this would certainly make scientific code less easy to misuse due to > unclear units. > > -Ryan Birmingham > > On 28 October 2016 at 16:45, Sven R. 
Kunze wrote: > >> On 28.10.2016 22:06, MRAB wrote: >> >>> On 2016-08-26 13:47, Steven D'Aprano wrote: >>> >>>> Ken has made what I consider a very reasonable suggestion, to introduce >>>> SI prefixes to Python syntax for numbers. For example, typing 1K will be >>>> equivalent to 1000. >>>> >>>> Just for the record, this is what you can now do in C++: >>> >>> User-Defined Literals >>> http://arne-mertz.de/2016/10/modern-c-features-user-defined-literals/ >>> >> >> Nice to hear. :) >> >> They now have 5 years of experience with that. Are there any surveys, >> experience reports, etc.? >> >> >> Cheers, >> Sven >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Oct 28 03:11:10 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 28 Oct 2016 00:11:10 -0700 (PDT) Subject: [Python-ideas] Deterministic iterator cleanup In-Reply-To: References: Message-ID: On Tuesday, October 25, 2016 at 6:26:17 PM UTC-4, Nathaniel Smith wrote: > > On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan > wrote: > > On 20 October 2016 at 07:02, Nathaniel Smith > wrote: > >> The first change is to replace the outer for loop with a while/pop > >> loop, so that if an exception occurs we'll know which iterables remain > >> to be processed: > >> > >> def chain(*iterables): > >> try: > >> while iterables: > >> for element in iterables.pop(0): > >> yield element > >> ... > >> > >> Now, what do we do if an exception does occur? We need to call > >> iterclose on all of the remaining iterables, but the tricky bit is > >> that this might itself raise new exceptions. If this happens, we don't > >> want to abort early; instead, we want to continue until we've closed > >> all the iterables, and then raise a chained exception. Basically what > >> we want is: > >> > >> def chain(*iterables): > >> try: > >> while iterables: > >> for element in iterables.pop(0): > >> yield element > >> finally: > >> try: > >> operators.iterclose(iter(iterables[0])) > >> finally: > >> try: > >> operators.iterclose(iter(iterables[1])) > >> finally: > >> try: > >> operators.iterclose(iter(iterables[2])) > >> finally: > >> ... > >> > >> but of course that's not valid syntax. Fortunately, it's not too hard > >> to rewrite that into real Python -- but it's a little dense: > >> > >> def chain(*iterables): > >> try: > >> while iterables: > >> for element in iterables.pop(0): > >> yield element > >> # This is equivalent to the nested-finally chain above: > >> except BaseException as last_exc: > >> for iterable in iterables: > >> try: > >> operators.iterclose(iter(iterable)) > >> except BaseException as new_exc: > >> if new_exc.__context__ is None: > >> new_exc.__context__ = last_exc > >> last_exc = new_exc > >> raise last_exc > >> > >> It's probably worth wrapping that bottom part into an iterclose_all() > >> helper, since the pattern probably occurs in other cases as well. > >> (Actually, now that I think about it, the map() example in the text > >> should be doing this instead of what it's currently doing... I'll fix > >> that.) 
> > > > At this point your code is starting to look a whole lot like the code > > in contextlib.ExitStack.__exit__ :) > > One of the versions I tried but didn't include in my email used > ExitStack :-). It turns out not to work here: the problem is that we > effectively need to enter *all* the contexts before unwinding, even if > trying to enter one of them fails. ExitStack is nested like (try (try > (try ... finally) finally) finally), and we need (try finally (try > finally (try finally ...))) But this is just a small side-point > anyway, since most code is not implementing complicated > meta-iterators; I'll address your real proposal below. > > > Accordingly, I'm going to suggest that while I agree the problem you > > describe is one that genuinely emerges in large production > > applications and other complex systems, this particular solution is > > simply far too intrusive to be accepted as a language change for > > Python - you're talking a fundamental change to the meaning of > > iteration for the sake of the relatively small portion of the > > community that either work on such complex services, or insist on > > writing their code as if it might become part of such a service, even > > when it currently isn't. Given that simple applications vastly > > outnumber complex ones, and always will, I think making such a change > > would be a bad trade-off that didn't come close to justifying the > > costs imposed on the rest of the ecosystem to adjust to it. > > > > A potentially more fruitful direction of research to pursue for 3.7 > > would be the notion of "frame local resources", where each Python > > level execution frame implicitly provided a lazily instantiated > > ExitStack instance (or an equivalent) for resource management. > > Assuming that it offered an "enter_frame_context" function that mapped > > to "contextlib.ExitStack.enter_context", such a system would let us do > > things like: > > So basically a 'with expression', that gives up the block syntax -- > taking its scope from the current function instead -- in return for > being usable in expression context? That's a really interesting, and I > see the intuition that it might be less disruptive if our implicit > iterclose calls are scoped to the function rather than the 'for' loop. > > But having thought about it and investigated some... I don't think > function-scoping addresses my problem, and I don't see evidence that > it's meaningfully less disruptive to existing code. > > First, "my problem": > > Obviously, Python's a language that should be usable for folks doing > one-off scripts, and for paranoid folks trying to write robust complex > systems, and for everyone in between -- these are all really important > constituencies. And unfortunately, there is a trade-off here, where > the changes we're discussing effect these constituencies differently. > But it's not just a matter of shifting around a fixed amount of pain; > the *quality* of the pain really changes under the different > proposals. > > In the status quo: > - for one-off scripts: you can just let the GC worry about generator > and file handle cleanup, re-use iterators, whatever, it's cool > - for robust systems: because it's the *caller's* responsibility to > ensure that iterators are cleaned up, you... 
kinda can't really use > generators without -- pick one -- (a) draconian style guides (like > forbidding 'with' inside generators or forbidding bare 'for' loops > entirely), (b) lots of auditing (every time you write a 'for' loop, go > read the source to the generator you're iterating over -- no > modularity for you and let's hope the answer doesn't change!), or (c) > introducing really subtle bugs. Or all of the above. It's true that a > lot of the time you can ignore this problem and get away with it one > way or another, but if you're trying to write robust code then this > doesn't really help -- it's like saying the footgun only has 1 bullet > in the chamber. Not as reassuring as you'd think. It's like if every > time you called a function, you had to explicitly say whether you > wanted exception handling to be enabled inside that function, and if > you forgot then the interpreter might just skip the 'finally' blocks > while unwinding. There's just *isn't* a good solution available. > > In my proposal (for-scoped-iterclose): > - for robust systems: life is great -- you're still stopping to think > a little about cleanup every time you use an iterator (because that's > what it means to write robust code!), but since the iterators now know > when they need cleanup and regular 'for' loops know how to invoke it, > then 99% of the time (i.e., whenever you don't intend to re-use an > iterator) you can be confident that just writing 'for' will do exactly > the right thing, and the other 1% of the time (when you do want to > re-use an iterator), you already *know* you're doing something clever. > So the cognitive overhead on each for-loop is really low. > - for one-off scripts: ~99% of the time (actual measurement, see > below) everything just works, except maybe a little bit better. 1% of > the time, you deploy the clever trick of re-using an iterator with > multiple for loops, and it breaks, so this is some pain. Here's what > you see: > > gen_obj = ... > for first_line in gen_obj: > break > for lines in gen_obj: > ... > > Traceback (most recent call last): > File "/tmp/foo.py", line 5, in > for lines in gen_obj: > AlreadyClosedIteratorError: this iterator was already closed, > possibly by a previous 'for' loop. (Maybe you want > itertools.preserve?) > > (We could even have a PYTHONDEBUG flag that when enabled makes that > error message include the file:line of the previous 'for' loop that > called __iterclose__.) > > So this is pain! But the pain is (a) rare, not pervasive, (b) > immediately obvious (an exception, the code doesn't work at all), not > subtle and delayed, (c) easily googleable, (d) easy to fix and the fix > is reliable. It's a totally different type of pain than the pain that > we currently impose on folks who want to write robust code. > > Now compare to the new proposal (function-scoped-iterclose): > > - For those who want robust cleanup: Usually, I only need an iterator > for as long as I'm iterating over it; that may or may not correspond > to the end of the function (often won't). When these don't coincide, > it can cause problems. E.g., consider the original example from my > proposal: > > def read_newline_separated_json(path): > with open(path) as f: > for line in f: > yield json.loads(line) > > but now suppose that I'm a Data Scientist (tm) so instead of having 1 > file full of newline-separated JSON, I have a 100 gigabytes worth of > the stuff stored in lots of files in a directory tree. 
Well, that's no > problem, I'll just wrap that generator: > > def read_newline_separated_json_tree(tree): > for root, _, paths in os.walk(tree): > for path in paths: > for document in read_newline_separated_json(join(root, > path)): > yield document > > > And then I'll run it on PyPy, because that's what you do when you have > 100 GB of string processing, and... it'll crash, because the call to > read_newline_separated_tree ends up doing thousands of calls to > read_newline_separated_json, but never cleans up any of them up until > the function exits, so eventually we run out of file descriptors. > I still don't understand why you can't write it like this: def read_newline_separated_json_tree(tree): for root, _, paths in os.walk(tree): for path in paths: with read_newline_separated_json(join(root, path)) as iterable: yield from iterable Zero extra lines. Works today. Does everything you want. > > A similar situation arises in the main loop of something like an HTTP > server: > > while True: > request = read_request(sock) > for response_chunk in application_handler(request): > send_response_chunk(sock) > Same thing: while True: request = read_request(sock) with application_handler(request) as iterable: for response_chunk in iterable: send_response_chunk(sock) I'll stop posting about this, but I don't see the motivation behind this proposals except replacing one explicit context management line with a hidden "line" of cognitive overhead. I think the solution is to stop returning an iterable when you have state needing a cleanup. Instead, return a context manager and force the caller to open it to get at the iterable. Best, Neil > > Here we'll accumulate arbitrary numbers of un-closed > application_handler generators attached to the stack frame, which is > no good at all. And this has the interesting failure mode that you'll > probably miss it in testing, because most clients will only re-use a > connection a small number of times. > So what this means is that every time I write a for loop, I can't just > do a quick "am I going to break out of the for-loop and then re-use > this iterator?" check -- I have to stop and think about whether this > for-loop is nested inside some other loop, etc. And, again, if I get > it wrong, then it's a subtle bug that will bite me later. It's true > that with the status quo, we need to wrap, X% of for-loops with 'with' > blocks, and with this proposal that number would drop to, I don't > know, (X/5)% or something. But that's not the most important cost: the > most important cost is the cognitive overhead of figuring out which > for-loops need the special treatment, and in this proposal that > checking is actually *more* complicated than the status quo. > > - For those who just want to write a quick script and not think about > it: here's a script that does repeated partial for-loops over a > generator object: > > > https://github.com/python/cpython/blob/553a84c4c9d6476518e2319acda6ba29b8588cb4/Tools/scripts/gprof2html.py#L40-L79 > > (and note that the generator object even has an ineffective 'with > open(...)' block inside it!) > > With the function-scoped-iterclose, this script would continue to work > as it does now. Excellent. > > But, suppose that I decide that that main() function is really > complicated and that it would be better to refactor some of those > loops out into helper functions. (Probably actually true in this > example.) So I do that and... suddenly the code breaks. 
And in a > rather confusing way, because it has to do with this complicated > long-distance interaction between two different 'for' loops *and* > where they're placed with respect to the original function versus the > helper function. > > If I were an intermediate-level Python student (and I'm pretty sure > anyone who is starting to get clever with re-using iterators counts as > "intermediate level"), then I'm pretty sure I'd actually prefer the > immediate obvious feedback from the for-scoped-iterclose. This would > actually be a good time to teach folks about this aspect of resource > handling, actually -- it's certainly an important thing to learn > eventually on your way to Python mastery, even if it isn't needed for > every script. > > In the pypy-dev thread about this proposal, there's some very > distressed emails from someone who's been writing Python for a long > time but only just realized that generator cleanup relies on the > garbage collector: > > https://mail.python.org/pipermail/pypy-dev/2016-October/014709.html > https://mail.python.org/pipermail/pypy-dev/2016-October/014720.html > > It's unpleasant to have the rug pulled out from under you like this > and suddenly realize that you might have to go re-evaluate all the > code you've ever written, and making for loops safe-by-default and > fail-fast-when-unsafe avoids that. > > Anyway, in summary: function-scoped-iterclose doesn't seem to > accomplish my goal of getting rid of the *type* of pain involved when > you have to run a background thread in your brain that's doing > constant paranoid checking every time you write a for loop. Instead it > arguably takes that type of pain and spreads it around both the > experts and the novices :-/. > > ------------- > > Now, let's look at some evidence about how disruptive the two > proposals are for real code: > > As mentioned else-thread, I wrote a stupid little CPython hack [1] to > report when the same iterator object gets passed to multiple 'for' > loops, and ran the CPython and Django testsuites with it [2]. Looking > just at generator objects [3], across these two large codebases there > are exactly 4 places where this happens. (Rough idea of prevalence: > these 4 places together account for a total of 8 'for' loops; this is > out of a total of 11,503 'for' loops total, of which 665 involve > generator objects.) The 4 places are: > > 1) CPython's Lib/test/test_collections.py:1135, > Lib/_collections_abc.py:378 > > This appears to be a bug in the CPython test suite -- the little MySet > class does 'def __init__(self, itr): self.contents = itr', which > assumes that itr is a container that can be repeatedly iterated. But a > bunch of the methods on collections.abc.Set like to pass in a > generator object here instead, which breaks everything. If repeated > 'for' loops on generators raised an error then this bug would have > been caught much sooner. > > 2) CPython's Tools/scripts/gprof2html.py lines 45, 54, 59, 75 > > Discussed above -- as written, for-scoped-iterclose would break this > script, but function-scoped-iterclose would not, so here > function-scoped-iterclose wins. > > 3) Django django/utils/regex_helper.py:236 > > This code is very similar to the previous example in its general > outline, except that the 'for' loops *have* been factored out into > utility functions. So in this case for-scoped-iterclose and > function-scoped-iterclose are equally disruptive. 
> > 4) CPython's Lib/test/test_generators.py:723 > > I have to admit I cannot figure out what this code is doing, besides > showing off :-). But the different 'for' loops are in different stack > frames, so I'm pretty sure that for-scoped-iterclose and > function-scoped-iterclose would be equally disruptive. > > Obviously there's a bias here in that these are still relatively > "serious" libraries; I don't have a big corpus of one-off scripts that > are just a big __main__, though gprof2html.py isn't far from that. (If > anyone knows where to find such a thing let me know...) But still, the > tally here is that out of 4 examples, we have 1 subtle bug that > iterclose might have caught, 2 cases where for-scoped-iterclose and > function-scoped-iterclose are equally disruptive, and only 1 where > function-scoped-iterclose is less disruptive -- and in that case it's > arguably just avoiding an obvious error now in favor of a more > confusing error later. > > If this reduced the backwards-incompatible cases by a factor of, like, > 10x or 100x, then that would be a pretty strong argument in its favor. > But it seems to be more like... 1.5x. > > -n > > [1] > https://github.com/njsmith/cpython/commit/2b9d60e1c1b89f0f1ac30cbf0a5dceee835142c2 > [2] CPython: revision b0a272709b from the github mirror; Django: > revision 90c3b11e87 > [3] I also looked at "all iterators" and "all iterators with .close > methods", but this email is long enough... basically the pattern is > the same: there are another 13 'for' loops that involve repeated > iteration over non-generator objects, and they're roughly equally > split between spurious effects due to bugs in the CPython test-suite > or my instrumentation, cases where for-scoped-iterclose and > function-scoped-iterclose both cause the same problems, and cases > where function-scoped-iterclose is less disruptive. > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainventions at gmail.com Fri Oct 28 09:35:03 2016 From: rainventions at gmail.com (Ryan Birmingham) Date: Fri, 28 Oct 2016 09:35:03 -0400 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: I certainly like the concept, but I worry that use of __exists__() could generalize it a bit beyond what you're intending in practice. It seems like this should only check if an object exists, and that adding the magic method would only lead to confusion. 
-Ryan Birmingham On 28 October 2016 at 04:30, Nick Coghlan wrote: > Hi folks, > > After the recent discussions of PEP 505's null-coalescing operator > (and the significant confusion around why anyone would ever want a > feature like that), I was inspired to put together a competing > proposal that focuses more on defining a new "existence checking" > protocol that generalises the current practicises of: > > * obj is not None (many different use cases) > * obj is not Ellipsis (in multi-dimensional slicing) > * obj is not NotImplemented (in operand coercion) > * math.isnan(value) > * cmath.isnan(value) > * decimal.getcontext().is_nan(value) > > Given that protocol as a basis, it then proceeds to define "?then" and > "?else" as existence checking counterparts to the truth-checking "and" > and "or", as well as "?.", "?[]" and "?=" as abbreviations for > particular idiomatic uses of "?then" and "?else". > > I think this approach frames the discussion in a more productive way, > as it gives us a series of questions to consider in order where a > collective answer of "No" at any point would be enough to kill this > particular proposal (or parts of it), but precisely *where* we say > "No" will determine which future alternatives might be worth > considering: > > 1. Do we collectively agree that "existence checking" is a useful > general concept that exists in software development and is distinct > from the concept of "truth checking"? > 2. Do we collectively agree that the Python ecosystem would benefit > from an existence checking protocol that permits generalisation of > algorithms (especially short circuiting ones) across different "data > missing" indicators, including those defined in the language > definition, the standard library, and custom user code? > 3. Do we collectively agree that it would be easier to use such a > protocol effectively if existence-checking equivalents to the > truth-checking "and" and "or" control flow operators were available? > > Only if we have at least some level of consensus on the above > questions regarding whether or not this is a conceptual modeling > problem worth addressing at the language level does it then make sense > to move on to the more detailed questions regarding the specific > proposed *solution* to the problem in the PEP: > > 4. Do we collectively agree that "?then" and "?else" would be > reasonable spellings for such operators? > 5a. Do we collectively agree that "access this attribute only if the > object exists" would be a particularly common use case for such > operators? > 5b. Do we collectively agree that "access this subscript only if the > object exists" would be a particularly common use case for such > operators? > 5c. Do we collectively agree that "bind this value to this target only > if the value currently bound to the target nominally doesn't exist" > would be a particularly common use case for such operators? > 6a. Do we collectively agree that 'obj?.attr' would be a reasonable > spelling for "access this attribute only if the object exists"? > 6b. Do we collectively agree that 'obj?[expr]' would be a reasonable > spelling for "access this subscript only if the object exists"? > 6c. Do we collectively agree that 'target ?= expr' would be a > reasonable spelling for "bind this value to this target only if the > value currently bound to the target nominally doesn't exist"? 
> > To be clear, this would be a *really* big addition to the language > that would have significant long term ramifications for how the > language gets taught to new developers. > > At the same time, asking whether or not an object represents an > absence of data rather than the truth of a proposition seems to me > like a sufficiently common problem in a wide enough variety of domains > that it may be worth elevating to the level of giving it dedicated > syntactic support. > > Regards, > Nick. > > Rendered HTML version: https://www.python.org/dev/peps/pep-0531/ > =============================== > > PEP: 531 > Title: Existence checking operators > Version: $Revision$ > Last-Modified: $Date$ > Author: Nick Coghlan > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 25-Oct-2016 > Python-Version: 3.7 > Post-History: 28-Oct-2016 > > Abstract > ======== > > Inspired by PEP 505 and the related discussions, this PEP proposes the > addition > of two new control flow operators to Python: > > * Existence-checking precondition ("exists-then"): ``expr1 ?then expr2`` > * Existence-checking fallback ("exists-else"): ``expr1 ?else expr2`` > > as well as the following abbreviations for common existence checking > expressions and statements: > > * Existence-checking attribute access: > ``obj?.attr`` (for ``obj ?then obj.attr``) > * Existence-checking subscripting: > ``obj?[expr]`` (for ``obj ?then obj[expr]``) > * Existence-checking assignment: > ``value ?= expr`` (for ``value = value ?else expr``) > > The common ``?`` symbol in these new operator definitions indicates that > they > use a new "existence checking" protocol rather than the established > truth-checking protocol used by if statements, while loops, comprehensions, > generator expressions, conditional expressions, logical conjunction, and > logical disjunction. > > This new protocol would be made available as ``operator.exists``, with the > following characteristics: > > * types can define a new ``__exists__`` magic method (Python) or > ``tp_exists`` slot (C) to override the default behaviour. This optional > method has the same signature and possible return values as ``__bool__``. > * ``operator.exists(None)`` returns ``False`` > * ``operator.exists(NotImplemented)`` returns ``False`` > * ``operator.exists(Ellipsis)`` returns ``False`` > * ``float``, ``complex`` and ``decimal.Decimal`` will override the > existence > check such that ``NaN`` values return ``False`` and other values > (including > zero values) return ``True`` > * for any other type, ``operator.exists(obj)`` returns True by default. > Most > importantly, values that evaluate to False in a truth checking context > (zeroes, empty containers) will still evaluate to True in an existence > checking context > > > Relationship with other PEPs > ============================ > > While this PEP was inspired by and builds on Mark Haase's excellent work in > putting together PEP 505, it ultimately competes with that PEP due to > significant differences in the specifics of the proposed syntax and > semantics > for the feature. > > It also presents a different perspective on the rationale for the change by > focusing on the benefits to existing Python users as the typical demands of > application and service development activities are genuinely changing. 
It > isn't an accident that similar features are now appearing in multiple > programming languages, and while it's a good idea for us to learn from how > other > language designers are handling the problem, precedents being set elsewhere > are more relevant to *how* we would go about tackling this problem than > they > are to whether or not we think it's a problem we should address in the > first > place. > > > Rationale > ========= > > Existence checking expressions > ------------------------------ > > An increasingly common requirement in modern software development is the > need > to work with "semi-structured data": data where the structure of the data > is > known in advance, but pieces of it may be missing at runtime, and the > software > manipulating that data is expected to degrade gracefully (e.g. by omitting > results that depend on the missing data) rather than failing outright. > > Some particularly common cases where this issue arises are: > > * handling optional application configuration settings and function > parameters > * handling external service failures in distributed systems > * handling data sets that include some partial records > > It is the latter two cases that are the primary motivation for this PEP - > while > needing to deal with optional configuration settings and parameters is a > design > requirement at least as old as Python itself, the rise of public cloud > infrastructure, the development of software systems as collaborative > networks > of distributed services, and the availability of large public and private > data > sets for analysis means that the ability to degrade operations gracefully > in > the face of partial service failures or partial data availability is > becoming > an essential feature of modern programming environments. > > At the moment, writing such software in Python can be genuinely awkward, as > your code ends up littered with expressions like: > > * ``value1 = expr1.field.of.interest if expr1 is not None else None`` > * ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else > None`` > * ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not > None else expr5`` > > If these are only occasional, then expanding out to full statement forms > may > help improve readability, but if you have 4 or 5 of them in a row (which > is a > fairly common situation in data transformation pipelines), then replacing > them > with 16 or 20 lines of conditional logic really doesn't help matters. 
> > Expanding the three examples above that way hopefully helps illustrate > that:: > > _expr1 = expr1 > if _expr1 is not None: > value1 = _expr1.field.of.interest > else: > value1 = None > _expr2 = expr2 > if _expr2 is not None: > value2 = _expr2["field"]["of"]["interest"] > else: > value2 = None > _expr3 = expr3 > if _expr3 is not None: > value3 = _expr3 > else: > _expr4 = expr4 > if _expr4 is not None: > value3 = _expr4 > else: > value3 = expr5 > > The combined impact of the proposals in this PEP is to allow the above > sample > expressions to instead be written as: > > * ``value1 = expr1?.field.of.interest`` > * ``value2 = expr2?["field"]["of"]["interest"]`` > * ``value3 = expr3 ?else expr4 ?else expr5`` > > In these forms, almost all of the information presented to the reader is > immediately relevant to the question "What does this code do?", while the > boilerplate code to handle missing data by passing it through to the output > or falling back to an alternative input, has shrunk to two uses of the > ``?`` > symbol and two uses of the ``?else`` keyword. > > In the first two examples, the 31 character boilerplate clause > `` if exprN is not None else None`` (minimally 27 characters for a single > letter > variable name) has been replaced by a single ``?`` character, substantially > improving the signal-to-pattern-noise ratio of the lines (especially if it > encourages the use of more meaningful variable and field names rather than > making them shorter purely for the sake of expression brevity). > > In the last example, two instances of the 21 character boilerplate, > `` if exprN is not None`` (minimally 17 characters) are replaced with > single > characters, again substantially improving the signal-to-pattern-noise > ratio. > > Furthermore, each of our 5 "subexpressions of potential interest" is > included > exactly once, rather than 4 of them needing to be duplicated or pulled out > to a named variable in order to first check if they exist. > > The existence checking precondition operator is mainly defined to provide a > clear conceptual basis for the existence checking attribute access and > subscripting operators: > > * ``obj?.attr`` is roughly equivalent to ``obj ?then obj.attr`` > * ``obj?[expr]``is roughly equivalent to ``obj ?then obj[expr]`` > > The main semantic difference between the shorthand forms and their expanded > equivalents is that the common subexpression to the left of the existence > checking operator is evaluated only once in the shorthand form (similar to > the benefit offered by augmented assignment statements). 
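A rough present-day approximation of the operator.exists() semantics described above (None, NotImplemented and Ellipsis report non-existence, NaN values of the numeric types report non-existence, everything else exists) might look like this; note that the real protocol would also consult a type's __exists__()/tp_exists hook:

    import cmath
    import decimal
    import math

    def exists(obj):
        # Illustrative approximation of the proposed operator.exists().
        if obj is None or obj is NotImplemented or obj is Ellipsis:
            return False
        if isinstance(obj, float):
            return not math.isnan(obj)
        if isinstance(obj, complex):
            return not cmath.isnan(obj)
        if isinstance(obj, decimal.Decimal):
            return not obj.is_nan()
        return True

    exists(0), exists(""), exists(None), exists(float("nan"))
    # -> (True, True, False, False)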
> > > Existence checking assignment > ----------------------------- > > Existence-checking assignment is proposed as a relatively straightforward > expansion of the concepts in this PEP to also cover the common > configuration > handling idiom: > > * ``value = value if value is not None else expensive_default()`` > > by allowing that to instead be abbreviated as: > > * ``value ?= expensive_default()`` > > This is mainly beneficial when the target is a subscript operation or > subattribute, as even without this specific change, the PEP would still > permit this idiom to be updated to: > > * ``value = value ?else expensive_default()`` > > The main argument *against* adding this form is that it's arguably > ambiguous > and could mean either: > > * ``value = value ?else expensive_default()``; or > * ``value = value ?then value.subfield.of.interest`` > > The second form isn't at all useful, but if this concern was deemed > significant > enough to address while still keeping the augmented assignment feature, > the full keyword could be included in the syntax: > > * ``value ?else= expensive_default()`` > > Alternatively, augmented assignment could just be dropped from the current > proposal entirely and potentially reconsidered at a later date. > > > Existence checking protocol > --------------------------- > > The existence checking protocol is included in this proposal primarily to > allow for proxy objects (e.g. local representations of remote resources) > and > mock objects used in testing to correctly indicate non-existence of target > resources, even though the proxy or mock object itself is not None. > > However, with that protocol defined, it then seems natural to expand it to > provide a type independent way of checking for ``NaN`` values in numeric > types > - at the moment you need to be aware of the exact data type you're working > with > (e.g. builtin floats, builtin complex numbers, the decimal module) and use > the > appropriate operation (e.g. ``math.isnan``, ``cmath.isnan``, > ``decimal.getcontext().is_nan()``, respectively) > > Similarly, it seems reasonable to declare that the other placeholder > builtin > singletons, ``Ellipsis`` and ``NotImplemented``, also qualify as objects > that > represent the absence of data moreso than they represent data. > > > Proposed symbolic notation > -------------------------- > > Python has historically only had one kind of implied boolean context: truth > checking, which can be invoked directly via the ``bool()`` builtin. As > this PEP > proposes a new kind of control flow operation based on existence checking > rather > than truth checking, it is considered valuable to have a reminder directly > in the code when existence checking is being used rather than truth > checking. > > The mathematical symbol for existence assertions is U+2203 'THERE EXISTS': > ``∃`` > > Accordingly, one possible approach to the syntactic additions proposed in > this > PEP would be to use that already defined mathematical notation: > > * ``expr1 ∃then expr2`` > * ``expr1 ∃else expr2`` > * ``obj∃.attr`` > * ``obj∃[expr]`` > * ``target ∃= expr`` > > However, there are two major problems with that approach, one practical, > and > one pedagogical. > > The practical problem is the usual one that most keyboards don't offer any > easy > way of entering mathematical symbols other than those used in basic > arithmetic > (even the symbols appearing in this PEP were ultimately copied & pasted > from [3]_ rather than being entered directly).
> > The pedagogical problem is that the symbols for existence assertions > (``∃``) > and universal assertions (``∀``) aren't going to be familiar to most people > the way basic arithmetic operators are, so we wouldn't actually be making > the > proposed syntax easier to understand by adopting ``∃``. > > By contrast, ``?`` is one of the few remaining unused ASCII punctuation > characters in Python's syntax, making it available as a candidate syntactic > marker for "this control flow operation is based on an existence check, > not a > truth check". > > Taking that path would also have the advantage of aligning Python's syntax > with corresponding syntax in other languages that offer similar features. > > Drawing from the existing summary in PEP 505 and the Wikipedia articles on > the "safe navigation operator" [1]_ and the "null coalescing operator" [2]_, > we see: > > * The ``?.`` existence checking attribute access syntax precisely aligns > with: > > * the "safe navigation" attribute access operator in C# (``?.``) > * the "optional chaining" operator in Swift (``?.``) > * the "safe navigation" attribute access operator in Groovy (``?.``) > * the "conditional member access" operator in Dart (``?.``) > > * The ``?[]`` existence checking attribute access syntax precisely aligns > with: > > * the "safe navigation" subscript operator in C# (``?[]``) > * the "optional subscript" operator in Swift (``?[].``) > > * The ``?else`` existence checking fallback syntax semantically aligns > with: > > * the "null-coalescing" operator in C# (``??``) > * the "null-coalescing" operator in PHP (``??``) > * the "nil-coalescing" operator in Swift (``??``) > > To be clear, these aren't the only spellings of these operators used in > other > languages, but they're the most common ones, and the ``?`` symbol is the > most > common syntactic marker by far (presumably prompted by the use of ``?`` to > introduce the "then" clause in C-style conditional expressions, which many > of these languages also offer). > > > Proposed keywords > ----------------- > > Given the symbolic marker ``?``, it would be syntactically unambiguous to > spell > the existence checking precondition and fallback operations using the same > keywords as their truth checking counterparts: > > * ``expr1 ?and expr2`` (instead of ``expr1 ?then expr2``) > * ``expr1 ?or expr2`` (instead of ``expr1 ?else expr2``) > > However, while syntactically unambiguous when written, this approach makes > the code incredibly hard to *pronounce* (What's the pronunciation of "?"?) > and > also hard to *describe* (given reused keywords, there's no obvious > shorthand > terms for "existence checking precondition (?and)" and "existence checking > fallback (?or)" that would distinguish them from "logical conjunction > (and)" > and "logical disjunction (or)"). > > We could try to encourage folks to pronounce the ``?`` symbol as "exists", > making the shorthand names the "exists-and expression" and the > "exists-or expression", but there'd be no way of guessing those names > purely > from seeing them written in a piece of code. > > Instead, this PEP takes advantage of the proposed symbolic syntax to > introduce > a new keyword (``?then``) and borrow an existing one (``?else``) in a way > that allows people to refer to "then expressions" and "else expressions" > without ambiguity. > > These keywords also align well with the conditional expressions that are > semantically equivalent to the proposed expressions.
> > For ``?else`` expressions, ``expr1 ?else expr2`` is equivalent to:: > > _lhs_result = expr1 > _lhs_result if operator.exists(_lhs_result) else expr2 > > Here the parallel is clear, since the ``else expr2`` appears at the end of > both the abbreviated and expanded forms. > > For ``?then`` expressions, ``expr1 ?then expr2`` is equivalent to:: > > _lhs_result = expr1 > expr2 if operator.exists(_lhs_result) else _lhs_result > > Here the parallel isn't as immediately obvious due to Python's > traditionally > anonymous "then" clauses (introduced by ``:`` in ``if`` statements and > suffixed > by ``if`` in conditional expressions), but it's still reasonably clear as > long > as you're already familiar with the "if-then-else" explanation of > conditional > control flow. > > > Risks and concerns > ================== > > Readability > ----------- > > Learning to read and write the new syntax effectively mainly requires > internalising two concepts: > > * expressions containing ``?`` include an existence check and may short > circuit > * if ``None`` or another "non-existent" value is an expected input, and the > correct handling is to propagate that to the result, then the existence > checking operators are likely what you want > > Currently, these concepts aren't explicitly represented at the language > level, > so it's a matter of learning to recognise and use the various idiomatic > patterns based on conditional expressions and statements. > > > Magic syntax > ------------ > > There's nothing about ``?`` as a syntactic element that inherently suggests > ``is not None`` or ``operator.exists``. The main current use of ``?`` as a > symbol in Python code is as a trailing suffix in IPython environments to > request help information for the result of the preceding expression. > > However, the notion of existence checking really does benefit from a > pervasive > visual marker that distinguishes it from truth checking, and that calls for > a single-character symbolic syntax if we're going to do it at all. > > > Conceptual complexity > --------------------- > > This proposal takes the currently ad hoc and informal concept of "existence > checking" and elevates it to the status of being a syntactic language > feature > with a clearly defined operator protocol. > > In many ways, this should actually *reduce* the overall conceptual > complexity > of the language, as many more expectations will map correctly between truth > checking with ``bool(expr)`` and existence checking with > ``operator.exists(expr)`` than currently map between truth checking and > existence checking with ``expr is not None`` (or ``expr is not > NotImplemented`` > in the context of operand coercion, or the various NaN-checking operations > in mathematical libraries). > > As a simple example of the new parallels introduced by this PEP, compare:: > > all_are_true = all(map(bool, iterable)) > at_least_one_is_true = any(map(bool, iterable)) > all_exist = all(map(operator.exists, iterable)) > at_least_one_exists = any(map(operator.exists, iterable)) > > > Design Discussion > ================= > > Subtleties in chaining existence checking expressions > ----------------------------------------------------- > > Similar subtleties arise in chaining existence checking expressions as > already > exist in chaining logical operators: the behaviour can be surprising if the > right hand side of one of the expressions in the chain itself returns a > value that doesn't exist. 
> > As a result, ``value = arg1 ?then f(arg1) ?else default()`` would be > dubious for > essentially the same reason that ``value = cond and expr1 or expr2`` is > dubious: > the former will evaluate ``default()`` if ``f(arg1)`` returns ``None``, > just > as the latter will evaluate ``expr2`` if ``expr1`` evaluates to ``False`` > in > a boolean context. > > > Ambiguous interaction with conditional expressions > -------------------------------------------------- > > In the proposal as currently written, the following is a syntax error: > > * ``value = f(arg) if arg ?else default`` > > While the following is a valid operation that checks a second condition if > the > first doesn't exist rather than merely being false: > > * ``value = expr1 if cond1 ?else cond2 else expr2`` > > The expression chaining problem described above means that the argument > can be > made that the first operation should instead be equivalent to: > > * ``value = f(arg) if operator.exists(arg) else default`` > > requiring the second to be written in the arguably clearer form: > > * ``value = expr1 if (cond1 ?else cond2) else expr2`` > > Alternatively, the first form could remain a syntax error, and the > existence > checking symbol could instead be attached to the ``if`` keyword: > > * ``value = expr1 if? cond else expr2`` > > > Existence checking in other truth-checking contexts > --------------------------------------------------- > > The truth-checking protocol is currently used in the following syntactic > constructs: > > * logical conjunction (and-expressions) > * logical disjunction (or-expressions) > * conditional expressions (if-else expressions) > * if statements > * while loops > * filter clauses in comprehensions and generator expressions > > In the current PEP, switching from truth-checking with ``and`` and ``or`` > to > existence-checking is a matter of substituting in the new keywords, > ``?then`` > and ``?else`` in the appropriate places. > > For other truth-checking contexts, it proposes either importing and > using the ``operator.exists`` API, or else continuing with the current > idiom > of checking specifically for ``expr is not None`` (or the context > appropriate > equivalent). > > The simplest possible enhancement in that regard would be to elevate the > proposed ``exists()`` API from an operator module function to a new builtin > function. > > Alternatively, the ``?`` existence checking symbol could be supported as a > modifier on the ``if`` and ``while`` keywords to indicate the use of an > existence check rather than a truth check. > > However, it isn't at all clear that the potential consistency benefits > gained > for either suggestion would justify the additional disruption, so they've > currently been omitted from the proposal. 
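In the same spirit, the ``?then`` and ``?else`` equivalences given above can be emulated today with small helpers that take the right-hand side lazily, as a sketch of the semantics only (without the proposed syntactic convenience):

    def exists(obj):
        # Simplified stand-in; see the fuller approximation sketched earlier.
        return obj is not None

    def q_then(lhs, rhs_func):
        # expr1 ?then expr2  ~  q_then(expr1, lambda v: <expr2 using v>)
        return rhs_func(lhs) if exists(lhs) else lhs

    def q_else(lhs, rhs_thunk):
        # expr1 ?else expr2  ~  q_else(expr1, lambda: expr2)
        return lhs if exists(lhs) else rhs_thunk()

    # The Rationale's examples would then read roughly as:
    #   value1 = q_then(expr1, lambda v: v.field.of.interest)
    #   value3 = q_else(expr3, lambda: q_else(expr4, lambda: expr5))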
> > > Defining expected invariant relations between ``__bool__`` and > ``__exists__`` > ------------------------------------------------------------ > ----------------- > > The PEP currently leaves the definition of ``__bool__`` on all existing > types > unmodified, which ensures the entire proposal remains backwards compatible, > but results in the following cases where ``bool(obj)`` returns ``True``, > but > the proposed ``operator.exists(obj)`` would return ``False``: > > * ``NaN`` values for ``float``, ``complex``, and ``decimal.Decimal`` > * ``Ellipsis`` > * ``NotImplemented`` > > The main argument for potentially changing these is that it becomes easier > to > reason about potential code behaviour if we have a recommended invariant in > place saying that values which indicate they don't exist in an existence > checking context should also report themselves as being ``False`` in a > truth > checking context. > > Failing to define such an invariant would lead to arguably odd outcomes > like > ``float("NaN") ?else 0.0`` returning ``0.0`` while ``float("NaN") or 0.0`` > returns ``NaN``. > > > Limitations > =========== > > Arbitrary sentinel objects > -------------------------- > > This proposal doesn't attempt to provide syntactic support for the > "sentinel > object" idiom, where ``None`` is a permitted explicit value, so a > separate sentinel object is defined to indicate missing values:: > > _SENTINEL = object() > def f(obj=_SENTINEL): > return obj if obj is not _SENTINEL else default_value() > > This could potentially be supported at the expense of making the existence > protocol definition significantly more complex, both to define and to use: > > * at the Python layer, ``operator.exists`` and ``__exists__`` > implementations > would return the empty tuple to indicate non-existence, and otherwise > return > a singleton tuple containing a reference to the object to be used as the > result of the existence check > * at the C layer, ``tp_exists`` implementations would return NULL to > indicate > non-existence, and otherwise return a `PyObject *` pointer as the > result of the existence check > > Given that change, the sentinel object idiom could be rewritten as:: > > class Maybe: > SENTINEL = object() > def __init__(self, value): > self._result = (value,) if value is not self.SENTINEL else () > def __exists__(self): > return self._result > > def f(obj=Maybe.SENTINEL): > return Maybe(obj) ?else default_value() > > However, I don't think cases where the 3 proposed standard sentinel values > (i.e. > ``None``, ``Ellipsis`` and ``NotImplemented``) can't be used are going to > be > anywhere near common enough for the additional protocol complexity and the > loss > of symmetry between ``__bool__`` and ``__exists__`` to be worth it. > > > Specification > ============= > > The Abstract already gives the gist of the proposal and the Rationale gives > some specific examples. If there's enough interest in the basic idea, then > a > full specification will need to provide a precise correspondence between > the > proposed syntactic sugar and the underlying conditional expressions that is > sufficient to guide the creation of a reference implementation. > > ...TBD...
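Relating to the ``__bool__``/``__exists__`` invariant discussion above, the ``float("NaN")`` contrast is easy to check in today's interpreter, since NaN is truthy under the existing truth-checking protocol:

    >>> import math
    >>> nan = float("nan")
    >>> bool(nan)
    True
    >>> nan or 0.0       # 'or' truth-checks, so the NaN passes straight through
    nan
    >>> math.isnan(nan)  # the type-specific check needed today instead
    True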
> > > Implementation > ============== > > As with PEP 505, actual implementation has been deferred pending > in-principle > interest in the idea of adding these operators - the implementation isn't > the hard part of these proposals, the hard part is deciding whether or not > this is a change where the long term benefits for new and existing Python > users > outweigh the short term costs involved in the wider ecosystem (including > developers of other implementations, language curriculum developers, and > authors of other Python related educational material) adjusting to the > change. > > ...TBD... > > > References > ========== > > .. [1] Wikipedia: Safe navigation operator > (https://en.wikipedia.org/wiki/Safe_navigation_operator) > > .. [2] Wikipedia: Null coalescing operator > (https://en.wikipedia.org/wiki/Null_coalescing_operator) > > .. [3] FileFormat.info: Unicode Character 'THERE EXISTS' (U+2203) > (http://www.fileformat.info/info/unicode/char/2203/index.htm) > > > Copyright > ========= > > This document has been placed in the public domain under the terms of the > CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/ > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Fri Oct 28 11:46:42 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 28 Oct 2016 10:46:42 -0500 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: Message-ID: On Oct 28, 2016 3:30 AM, "Nick Coghlan" wrote: > *snip* > > 1. Do we collectively agree that "existence checking" is a useful > general concept that exists in software development and is distinct > from the concept of "truth checking"? I'd hope so! > 2. Do we collectively agree that the Python ecosystem would benefit > from an existence checking protocol that permits generalisation of > algorithms (especially short circuiting ones) across different "data > missing" indicators, including those defined in the language > definition, the standard library, and custom user code? I {%think_string if think_string is not None else 'think'%} so. > *snip* > 4. Do we collectively agree that "?then" and "?else" would be > reasonable spellings for such operators? Personally, I find that kind of ugly. What's wrong with just ? instead of ?else? > 5a. Do we collectively agree that "access this attribute only if the > object exists" would be a particularly common use case for such > operators? Pretty sure I've done this like a zillion times. > 5b. Do we collectively agree that "access this subscript only if the > object exists" would be a particularly common use case for such > operators? I haven't really ever had to do this exactly, but it makes sense. > 5c. Do we collectively agree that "bind this value to this target only > if the value currently bound to the target nominally doesn't exist" > would be a particularly common use case for such operators? Yes. I see stuff like this a lot: if x is not None: x = [] > 6a. 
Do we collectively agree that 'obj?.attr' would be a reasonable > spelling for "access this attribute only if the object exists"? > 6b. Do we collectively agree that 'obj?[expr]' would be a reasonable > spelling for "access this subscript only if the object exists"? > 6c. Do we collectively agree that 'target ?= expr' would be a > reasonable spelling for "bind this value to this target only if the > value currently bound to the target nominally doesn't exist"? > ' '.join(['Yes!']*3) > To be clear, this would be a *really* big addition to the language > that would have significant long term ramifications for how the > language gets taught to new developers. > > At the same time, asking whether or not an object represents an > absence of data rather than the truth of a proposition seems to me > like a sufficiently common problem in a wide enough variety of domains > that it may be worth elevating to the level of giving it dedicated > syntactic support. > > Regards, > Nick. > > Rendered HTML version: https://www.python.org/dev/peps/pep-0531/ > =============================== > > PEP: 531 > Title: Existence checking operators > Version: $Revision$ > Last-Modified: $Date$ > Author: Nick Coghlan > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 25-Oct-2016 > Python-Version: 3.7 > Post-History: 28-Oct-2016 > > Abstract > ======== > > Inspired by PEP 505 and the related discussions, this PEP proposes the addition > of two new control flow operators to Python: > > * Existence-checking precondition ("exists-then"): ``expr1 ?then expr2`` > * Existence-checking fallback ("exists-else"): ``expr1 ?else expr2`` > > as well as the following abbreviations for common existence checking > expressions and statements: > > * Existence-checking attribute access: > ``obj?.attr`` (for ``obj ?then obj.attr``) > * Existence-checking subscripting: > ``obj?[expr]`` (for ``obj ?then obj[expr]``) > * Existence-checking assignment: > ``value ?= expr`` (for ``value = value ?else expr``) > > The common ``?`` symbol in these new operator definitions indicates that they > use a new "existence checking" protocol rather than the established > truth-checking protocol used by if statements, while loops, comprehensions, > generator expressions, conditional expressions, logical conjunction, and > logical disjunction. > > This new protocol would be made available as ``operator.exists``, with the > following characteristics: > > * types can define a new ``__exists__`` magic method (Python) or > ``tp_exists`` slot (C) to override the default behaviour. This optional > method has the same signature and possible return values as ``__bool__``. > * ``operator.exists(None)`` returns ``False`` > * ``operator.exists(NotImplemented)`` returns ``False`` > * ``operator.exists(Ellipsis)`` returns ``False`` > * ``float``, ``complex`` and ``decimal.Decimal`` will override the existence > check such that ``NaN`` values return ``False`` and other values (including > zero values) return ``True`` > * for any other type, ``operator.exists(obj)`` returns True by default. 
Most > importantly, values that evaluate to False in a truth checking context > (zeroes, empty containers) will still evaluate to True in an existence > checking context > > > Relationship with other PEPs > ============================ > > While this PEP was inspired by and builds on Mark Haase's excellent work in > putting together PEP 505, it ultimately competes with that PEP due to > significant differences in the specifics of the proposed syntax and semantics > for the feature. > > It also presents a different perspective on the rationale for the change by > focusing on the benefits to existing Python users as the typical demands of > application and service development activities are genuinely changing. It > isn't an accident that similar features are now appearing in multiple > programming languages, and while it's a good idea for us to learn from how other > language designers are handling the problem, precedents being set elsewhere > are more relevant to *how* we would go about tackling this problem than they > are to whether or not we think it's a problem we should address in the first > place. > > > Rationale > ========= > > Existence checking expressions > ------------------------------ > > An increasingly common requirement in modern software development is the need > to work with "semi-structured data": data where the structure of the data is > known in advance, but pieces of it may be missing at runtime, and the software > manipulating that data is expected to degrade gracefully (e.g. by omitting > results that depend on the missing data) rather than failing outright. > > Some particularly common cases where this issue arises are: > > * handling optional application configuration settings and function parameters > * handling external service failures in distributed systems > * handling data sets that include some partial records > > It is the latter two cases that are the primary motivation for this PEP - while > needing to deal with optional configuration settings and parameters is a design > requirement at least as old as Python itself, the rise of public cloud > infrastructure, the development of software systems as collaborative networks > of distributed services, and the availability of large public and private data > sets for analysis means that the ability to degrade operations gracefully in > the face of partial service failures or partial data availability is becoming > an essential feature of modern programming environments. > > At the moment, writing such software in Python can be genuinely awkward, as > your code ends up littered with expressions like: > > * ``value1 = expr1.field.of.interest if expr1 is not None else None`` > * ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None`` > * ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not > None else expr5`` > > If these are only occasional, then expanding out to full statement forms may > help improve readability, but if you have 4 or 5 of them in a row (which is a > fairly common situation in data transformation pipelines), then replacing them > with 16 or 20 lines of conditional logic really doesn't help matters. 
> > Expanding the three examples above that way hopefully helps illustrate that:: > > _expr1 = expr1 > if _expr1 is not None: > value1 = _expr1.field.of.interest > else: > value1 = None > _expr2 = expr2 > if _expr2 is not None: > value2 = _expr2["field"]["of"]["interest"] > else: > value2 = None > _expr3 = expr3 > if _expr3 is not None: > value3 = _expr3 > else: > _expr4 = expr4 > if _expr4 is not None: > value3 = _expr4 > else: > value3 = expr5 > > The combined impact of the proposals in this PEP is to allow the above sample > expressions to instead be written as: > > * ``value1 = expr1?.field.of.interest`` > * ``value2 = expr2?["field"]["of"]["interest"]`` > * ``value3 = expr3 ?else expr4 ?else expr5`` > > In these forms, almost all of the information presented to the reader is > immediately relevant to the question "What does this code do?", while the > boilerplate code to handle missing data by passing it through to the output > or falling back to an alternative input, has shrunk to two uses of the ``?`` > symbol and two uses of the ``?else`` keyword. > > In the first two examples, the 31 character boilerplate clause > `` if exprN is not None else None`` (minimally 27 characters for a single letter > variable name) has been replaced by a single ``?`` character, substantially > improving the signal-to-pattern-noise ratio of the lines (especially if it > encourages the use of more meaningful variable and field names rather than > making them shorter purely for the sake of expression brevity). > > In the last example, two instances of the 21 character boilerplate, > `` if exprN is not None`` (minimally 17 characters) are replaced with single > characters, again substantially improving the signal-to-pattern-noise ratio. > > Furthermore, each of our 5 "subexpressions of potential interest" is included > exactly once, rather than 4 of them needing to be duplicated or pulled out > to a named variable in order to first check if they exist. > > The existence checking precondition operator is mainly defined to provide a > clear conceptual basis for the existence checking attribute access and > subscripting operators: > > * ``obj?.attr`` is roughly equivalent to ``obj ?then obj.attr`` > * ``obj?[expr]``is roughly equivalent to ``obj ?then obj[expr]`` > > The main semantic difference between the shorthand forms and their expanded > equivalents is that the common subexpression to the left of the existence > checking operator is evaluated only once in the shorthand form (similar to > the benefit offered by augmented assignment statements). 
> > > Existence checking assignment > ----------------------------- > > Existence-checking assignment is proposed as a relatively straightforward > expansion of the concepts in this PEP to also cover the common configuration > handling idiom: > > * ``value = value if value is not None else expensive_default()`` > > by allowing that to instead be abbreviated as: > > * ``value ?= expensive_default()`` > > This is mainly beneficial when the target is a subscript operation or > subattribute, as even without this specific change, the PEP would still > permit this idiom to be updated to: > > * ``value = value ?else expensive_default()`` > > The main argument *against* adding this form is that it's arguably ambiguous > and could mean either: > > * ``value = value ?else expensive_default()``; or > * ``value = value ?then value.subfield.of.interest`` > > The second form isn't at all useful, but if this concern was deemed significant > enough to address while still keeping the augmented assignment feature, > the full keyword could be included in the syntax: > > * ``value ?else= expensive_default()`` > > Alternatively, augmented assignment could just be dropped from the current > proposal entirely and potentially reconsidered at a later date. > > > Existence checking protocol > --------------------------- > > The existence checking protocol is including in this proposal primarily to > allow for proxy objects (e.g. local representations of remote resources) and > mock objects used in testing to correctly indicate non-existence of target > resources, even though the proxy or mock object itself is not None. > > However, with that protocol defined, it then seems natural to expand it to > provide a type independent way of checking for ``NaN`` values in numeric types > - at the moment you need to be aware of the exact data type you're working with > (e.g. builtin floats, builtin complex numbers, the decimal module) and use the > appropriate operation (e.g. ``math.isnan``, ``cmath.isnan``, > ``decimal.getcontext().is_nan()``, respectively) > > Similarly, it seems reasonable to declare that the other placeholder builtin > singletons, ``Ellipsis`` and ``NotImplemented``, also qualify as objects that > represent the absence of data moreso than they represent data. > > > Proposed symbolic notation > -------------------------- > > Python has historically only had one kind of implied boolean context: truth > checking, which can be invoked directly via the ``bool()`` builtin. As this PEP > proposes a new kind of control flow operation based on existence checking rather > than truth checking, it is considered valuable to have a reminder directly > in the code when existence checking is being used rather than truth checking. > > The mathematical symbol for existence assertions is U+2203 'THERE EXISTS': ``?`` > > Accordingly, one possible approach to the syntactic additions proposed in this > PEP would be to use that already defined mathematical notation: > > * ``expr1 ?then expr2`` > * ``expr1 ?else expr2`` > * ``obj?.attr`` > * ``obj?[expr]`` > * ``target ?= expr`` > > However, there are two major problems with that approach, one practical, and > one pedagogical. > > The practical problem is the usual one that most keyboards don't offer any easy > way of entering mathematical symbols other than those used in basic arithmetic > (even the symbols appearing in this PEP were ultimately copied & pasted > from [3]_ rather than being entered directly). 
> > The pedagogical problem is that the symbols for existence assertions (``∃``) > and universal assertions (``∀``) aren't going to be familiar to most people > the way basic arithmetic operators are, so we wouldn't actually be making the > proposed syntax easier to understand by adopting ``∃``. > > By contrast, ``?`` is one of the few remaining unused ASCII punctuation > characters in Python's syntax, making it available as a candidate syntactic > marker for "this control flow operation is based on an existence check, not a > truth check". > > Taking that path would also have the advantage of aligning Python's syntax > with corresponding syntax in other languages that offer similar features. > > Drawing from the existing summary in PEP 505 and the Wikipedia articles on > the "safe navigation operator" [1]_ and the "null coalescing operator" [2]_, > we see: > > * The ``?.`` existence checking attribute access syntax precisely aligns with: > > * the "safe navigation" attribute access operator in C# (``?.``) > * the "optional chaining" operator in Swift (``?.``) > * the "safe navigation" attribute access operator in Groovy (``?.``) > * the "conditional member access" operator in Dart (``?.``) > > * The ``?[]`` existence checking subscripting syntax precisely aligns with: > > * the "safe navigation" subscript operator in C# (``?[]``) > * the "optional subscript" operator in Swift (``?[]``) > > * The ``?else`` existence checking fallback syntax semantically aligns with: > > * the "null-coalescing" operator in C# (``??``) > * the "null-coalescing" operator in PHP (``??``) > * the "nil-coalescing" operator in Swift (``??``) > > To be clear, these aren't the only spellings of these operators used in other > languages, but they're the most common ones, and the ``?`` symbol is the most > common syntactic marker by far (presumably prompted by the use of ``?`` to > introduce the "then" clause in C-style conditional expressions, which many > of these languages also offer). > > > Proposed keywords > ----------------- > > Given the symbolic marker ``?``, it would be syntactically unambiguous to spell > the existence checking precondition and fallback operations using the same > keywords as their truth checking counterparts: > > * ``expr1 ?and expr2`` (instead of ``expr1 ?then expr2``) > * ``expr1 ?or expr2`` (instead of ``expr1 ?else expr2``) > > However, while syntactically unambiguous when written, this approach makes > the code incredibly hard to *pronounce* (What's the pronunciation of "?"?) and > also hard to *describe* (given reused keywords, there's no obvious shorthand > terms for "existence checking precondition (?and)" and "existence checking > fallback (?or)" that would distinguish them from "logical conjunction (and)" > and "logical disjunction (or)"). > > We could try to encourage folks to pronounce the ``?`` symbol as "exists", > making the shorthand names the "exists-and expression" and the > "exists-or expression", but there'd be no way of guessing those names purely > from seeing them written in a piece of code. > > Instead, this PEP takes advantage of the proposed symbolic syntax to introduce > a new keyword (``?then``) and borrow an existing one (``?else``) in a way > that allows people to refer to "then expressions" and "else expressions" > without ambiguity. > > These keywords also align well with the conditional expressions that are > semantically equivalent to the proposed expressions.
> > For ``?else`` expressions, ``expr1 ?else expr2`` is equivalent to:: > > _lhs_result = expr1 > _lhs_result if operator.exists(_lhs_result) else expr2 > > Here the parallel is clear, since the ``else expr2`` appears at the end of > both the abbreviated and expanded forms. > > For ``?then`` expressions, ``expr1 ?then expr2`` is equivalent to:: > > _lhs_result = expr1 > expr2 if operator.exists(_lhs_result) else _lhs_result > > Here the parallel isn't as immediately obvious due to Python's traditionally > anonymous "then" clauses (introduced by ``:`` in ``if`` statements and suffixed > by ``if`` in conditional expressions), but it's still reasonably clear as long > as you're already familiar with the "if-then-else" explanation of conditional > control flow. > > > Risks and concerns > ================== > > Readability > ----------- > > Learning to read and write the new syntax effectively mainly requires > internalising two concepts: > > * expressions containing ``?`` include an existence check and may short circuit > * if ``None`` or another "non-existent" value is an expected input, and the > correct handling is to propagate that to the result, then the existence > checking operators are likely what you want > > Currently, these concepts aren't explicitly represented at the language level, > so it's a matter of learning to recognise and use the various idiomatic > patterns based on conditional expressions and statements. > > > Magic syntax > ------------ > > There's nothing about ``?`` as a syntactic element that inherently suggests > ``is not None`` or ``operator.exists``. The main current use of ``?`` as a > symbol in Python code is as a trailing suffix in IPython environments to > request help information for the result of the preceding expression. > > However, the notion of existence checking really does benefit from a pervasive > visual marker that distinguishes it from truth checking, and that calls for > a single-character symbolic syntax if we're going to do it at all. > > > Conceptual complexity > --------------------- > > This proposal takes the currently ad hoc and informal concept of "existence > checking" and elevates it to the status of being a syntactic language feature > with a clearly defined operator protocol. > > In many ways, this should actually *reduce* the overall conceptual complexity > of the language, as many more expectations will map correctly between truth > checking with ``bool(expr)`` and existence checking with > ``operator.exists(expr)`` than currently map between truth checking and > existence checking with ``expr is not None`` (or ``expr is not NotImplemented`` > in the context of operand coercion, or the various NaN-checking operations > in mathematical libraries). > > As a simple example of the new parallels introduced by this PEP, compare:: > > all_are_true = all(map(bool, iterable)) > at_least_one_is_true = any(map(bool, iterable)) > all_exist = all(map(operator.exists, iterable)) > at_least_one_exists = any(map(operator.exists, iterable)) > > > Design Discussion > ================= > > Subtleties in chaining existence checking expressions > ----------------------------------------------------- > > Similar subtleties arise in chaining existence checking expressions as already > exist in chaining logical operators: the behaviour can be surprising if the > right hand side of one of the expressions in the chain itself returns a > value that doesn't exist. 
> > As a result, ``value = arg1 ?then f(arg1) ?else default()`` would be dubious for > essentially the same reason that ``value = cond and expr1 or expr2`` is dubious: > the former will evaluate ``default()`` if ``f(arg1)`` returns ``None``, just > as the latter will evaluate ``expr2`` if ``expr1`` evaluates to ``False`` in > a boolean context. > > > Ambiguous interaction with conditional expressions > -------------------------------------------------- > > In the proposal as currently written, the following is a syntax error: > > * ``value = f(arg) if arg ?else default`` > > While the following is a valid operation that checks a second condition if the > first doesn't exist rather than merely being false: > > * ``value = expr1 if cond1 ?else cond2 else expr2`` > > The expression chaining problem described above means that the argument can be > made that the first operation should instead be equivalent to: > > * ``value = f(arg) if operator.exists(arg) else default`` > > requiring the second to be written in the arguably clearer form: > > * ``value = expr1 if (cond1 ?else cond2) else expr2`` > > Alternatively, the first form could remain a syntax error, and the existence > checking symbol could instead be attached to the ``if`` keyword: > > * ``value = expr1 if? cond else expr2`` > > > Existence checking in other truth-checking contexts > --------------------------------------------------- > > The truth-checking protocol is currently used in the following syntactic > constructs: > > * logical conjunction (and-expressions) > * logical disjunction (or-expressions) > * conditional expressions (if-else expressions) > * if statements > * while loops > * filter clauses in comprehensions and generator expressions > > In the current PEP, switching from truth-checking with ``and`` and ``or`` to > existence-checking is a matter of substituting in the new keywords, ``?then`` > and ``?else`` in the appropriate places. > > For other truth-checking contexts, it proposes either importing and > using the ``operator.exists`` API, or else continuing with the current idiom > of checking specifically for ``expr is not None`` (or the context appropriate > equivalent). > > The simplest possible enhancement in that regard would be to elevate the > proposed ``exists()`` API from an operator module function to a new builtin > function. > > Alternatively, the ``?`` existence checking symbol could be supported as a > modifier on the ``if`` and ``while`` keywords to indicate the use of an > existence check rather than a truth check. > > However, it isn't at all clear that the potential consistency benefits gained > for either suggestion would justify the additional disruption, so they've > currently been omitted from the proposal. 
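To give a sense of what such an ``exists()`` builtin (or the ``operator.exists`` function) might look like in those truth-checking contexts, here is a deliberately simplified pure-Python approximation. It treats the data-missing singletons and real-valued NaNs discussed earlier as non-existent, but it is only a sketch of the idea, not the API the PEP would actually specify (it does not, for example, handle complex NaNs or dispatch to the proposed ``__exists__`` protocol):

    import math

    def exists(value):
        # Rough pure-Python approximation of the proposed existence check.
        if value is None or value is Ellipsis or value is NotImplemented:
            return False
        try:
            # Treat NaN as "missing" for real-number-like types.
            if math.isnan(value):
                return False
        except TypeError:
            pass   # not a real-number-like value, so the NaN check does not apply
        return True

    # Usable today in ordinary truth-checking contexts:
    settings = {"timeout": 0}              # 0 is falsy, but it *exists*
    timeout = settings.get("timeout")
    if exists(timeout):
        print("using configured timeout:", timeout)

With a helper along these lines, the ``if`` statements, ``while`` loops and comprehension filters listed above can already perform existence checks, at the cost of an extra function call per check.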
> > > Defining expected invariant relations between ``__bool__`` and ``__exists__`` > ----------------------------------------------------------------------------- > > The PEP currently leaves the definition of ``__bool__`` on all existing types > unmodified, which ensures the entire proposal remains backwards compatible, > but results in the following cases where ``bool(obj)`` returns ``True``, but > the proposed ``operator.exists(obj)`` would return ``False``: > > * ``NaN`` values for ``float``, ``complex``, and ``decimal.Decimal`` > * ``Ellipsis`` > * ``NotImplemented`` > > The main argument for potentially changing these is that it becomes easier to > reason about potential code behaviour if we have a recommended invariant in > place saying that values which indicate they don't exist in an existence > checking context should also report themselves as being ``False`` in a truth > checking context. > > Failing to define such an invariant would lead to arguably odd outcomes like > ``float("NaN") ?else 0.0`` returning ``0.0`` while ``float("NaN") or 0.0`` > returns ``NaN``. > > > Limitations > =========== > > Arbitrary sentinel objects > -------------------------- > > This proposal doesn't attempt to provide syntactic support for the "sentinel > object" idiom, where ``None`` is a permitted explicit value, so a > separate sentinel object is defined to indicate missing values:: > > _SENTINEL = object() > def f(obj=_SENTINEL): > return obj if obj is not _SENTINEL else default_value() > > This could potentially be supported at the expense of making the existence > protocol definition significantly more complex, both to define and to use: > > * at the Python layer, ``operator.exists`` and ``__exists__`` implementations > would return the empty tuple to indicate non-existence, and otherwise return > a singleton tuple containing a reference to the object to be used as the > result of the existence check > * at the C layer, ``tp_exists`` implementations would return NULL to indicate > non-existence, and otherwise return a ``PyObject *`` pointer as the > result of the existence check > > Given that change, the sentinel object idiom could be rewritten as:: > > class Maybe: > SENTINEL = object() > def __init__(self, value): > self._result = (value,) if value is not self.SENTINEL else () > def __exists__(self): > return self._result > > def f(obj=Maybe.SENTINEL): > return Maybe(obj) ?else default_value() > > However, I don't think cases where the 3 proposed standard sentinel values (i.e. > ``None``, ``Ellipsis`` and ``NotImplemented``) can't be used are going to be > anywhere near common enough for the additional protocol complexity and the loss > of symmetry between ``__bool__`` and ``__exists__`` to be worth it. > > > Specification > ============= > > The Abstract already gives the gist of the proposal and the Rationale gives > some specific examples. If there's enough interest in the basic idea, then a > full specification will need to provide a precise correspondence between the > proposed syntactic sugar and the underlying conditional expressions that is > sufficient to guide the creation of a reference implementation. > > ...TBD...
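To give a flavour of the kind of correspondence such a specification would need to pin down, the short-circuiting behaviour of the two proposed keywords can be emulated today with small helper functions. The names ``exists``, ``then_`` and ``else_`` below are illustrative only (the actual proposal is syntax, not functions), and ``exists`` is again simplified to an ``is not None`` check:

    def exists(value):
        # Simplified stand-in for the proposed operator.exists()
        return value is not None

    def then_(lhs, compute):
        # lhs ?then expr2: expr2 is only evaluated when lhs exists;
        # otherwise the missing lhs is passed through unchanged.
        return compute(lhs) if exists(lhs) else lhs

    def else_(lhs, fallback):
        # lhs ?else expr2: the fallback is only evaluated when lhs
        # does not exist.
        return lhs if exists(lhs) else fallback()

    # value = arg1 ?then f(arg1) ?else default()
    for arg1 in (None, "hello"):
        value = else_(then_(arg1, lambda a: a.upper()), lambda: "default")
        print(arg1, "->", value)
    # None -> default    (a.upper() is never evaluated)
    # hello -> HELLO     (the fallback is never evaluated)

Note that this emulation also reproduces the chaining subtlety discussed under "Design Discussion": if the computed result is itself ``None``, the fallback still runs.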
> > > Implementation > ============== > > As with PEP 505, actual implementation has been deferred pending in-principle > interest in the idea of adding these operators - the implementation isn't > the hard part of these proposals, the hard part is deciding whether or not > this is a change where the long term benefits for new and existing Python users > outweigh the short term costs involved in the wider ecosystem (including > developers of other implementations, language curriculum developers, and > authors of other Python related educational material) adjusting to the change. > > ...TBD... > > > References > ========== > > .. [1] Wikipedia: Safe navigation operator > (https://en.wikipedia.org/wiki/Safe_navigation_operator) > > .. [2] Wikipedia: Null coalescing operator > (https://en.wikipedia.org/wiki/Null_coalescing_operator) > > .. [3] FileFormat.info: Unicode Character 'THERE EXISTS' (U+2203) > (http://www.fileformat.info/info/unicode/char/2203/index.htm) > > > Copyright > ========= > > This document has been placed in the public domain under the terms of the > CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/ > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan (????) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Oct 29 13:19:56 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 30 Oct 2016 02:19:56 +0900 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> Message-ID: <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> Steven d'Aprano writes: > I think you mean WHITE SQUARE? At least, I can not see any "OPEN > SQUARE" code point in Unicode, and the character you use below ? > is called WHITE SQUARE. You're right, I just used a common Japanese name for it. I even checked the table to make sure it was BMP but didn't notice the proper name which is written right there. Sorry for the confusion. Paul Moore writes: > Personally, I'm not even sure I want non-ASCII operators until > non-ASCII characters are common, and used without effort, in natural > language media such as email (on lists like this), source code > comments, documentation, etc. The 3 billion computer users (and their ancestors) who don't live in the U.S. or Western Europe have been using non-ASCII, commonly, without effort, in natural language media on lists like this one for up to 5 decades now. In my own experience, XEmacs lists have explictly allowed Japanese and Russian since 1998, and used to see the occasional posts in German, French and Spanish, with no complaints of mojibake or objections that I can recall. 
And I have maintained XEmacs code containing Japanese identifiers, both variables and functions, since 1997. I understand why folks are reluctant, but face it, the technical issues were solved before half our users were born. It's purely a social problem now, and pretty much restricted to the U.S. at that. > For better or worse, it may be emoji that drive that change ;-) I suspect that the 100 million or so Chinese, Japanese, Korean, and Indian programmers who have had systems that have no trouble whatsoever handling non-ASCII for as long they've used computers will drive that change. From python at mrabarnett.plus.com Sat Oct 29 13:59:14 2016 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 29 Oct 2016 18:59:14 +0100 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: <20160826124716.GP26300@ando.pearwood.info> <291b8b3d-9ea4-20a8-3703-63652c19019c@mail.de> Message-ID: On 2016-10-29 17:43, Nick Timkovich wrote: [snip] > Also, for discussion, remember to make the distinction between 'units' > (amps, meters, seconds) and 'prefixes' (micro, milli, kilo, mega). Right > away from comments, it seems 1_m could look like 1 meter to some, or > 0.001 to others. Typically when I need to enter very small/large > literals, I'll use "engineering" SI notation (powers divisible by 3 that > correspond to the prefixes): 0.1e-9 = 0.1 micro____. > [snip] 0.1e-9 is 0.1 nano___. From toddrjen at gmail.com Sat Oct 29 14:18:29 2016 From: toddrjen at gmail.com (Todd) Date: Sat, 29 Oct 2016 14:18:29 -0400 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: <20160826124716.GP26300@ando.pearwood.info> <291b8b3d-9ea4-20a8-3703-63652c19019c@mail.de> Message-ID: On Sat, Oct 29, 2016 at 12:43 PM, Nick Timkovich wrote: > From that page: > >> User-defined literals are basically normal function calls with a fancy >> syntax. [...] While user defined literals look very neat, they are not much >> more than syntactic sugar. There is not much difference between defining >> and calling a literal operator with "foo"_bar and doing the same with an >> ordinary function as bar("foo"). In theory, we could write literal >> operators that have side effects and do anything we want, like a normal >> function. > > > Obviously the arbitrary-function-part of that will never happen in Python > (yes?) > > > Why not? It seems like that would solve a lot of use-cases. People get bringing up various new uses for prefix or suffix syntax that they want built directly into the language. Providing a generic way to implement third-party prefixes or suffixes would save having to put all of these directly into the language. And it opens up a lot of other potential use-cases as well. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sat Oct 29 15:43:22 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 29 Oct 2016 20:43:22 +0100 Subject: [Python-ideas] Null coalescing operator In-Reply-To: <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> Message-ID: On 29 October 2016 at 18:19, Stephen J. 
Turnbull wrote: >> For better or worse, it may be emoji that drive that change ;-) > > I suspect that the 100 million or so Chinese, Japanese, Korean, and > Indian programmers who have had systems that have no trouble > whatsoever handling non-ASCII for as long they've used computers will > drive that change. My apologies. You are of course absolutely right. I'm curious to know how easy it is for Chinese, Japanese, Korean and Indian programmers to use *ASCII* characters. I have no idea in practice whether the current basically entirely-ASCII nature of programming languages is as much a problem for them as I imagine Unicode characters would be for me. I really hope it isn't... Paul From prometheus235 at gmail.com Sat Oct 29 15:55:04 2016 From: prometheus235 at gmail.com (Nick Timkovich) Date: Sat, 29 Oct 2016 14:55:04 -0500 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: <20160826124716.GP26300@ando.pearwood.info> <291b8b3d-9ea4-20a8-3703-63652c19019c@mail.de> Message-ID: Ah, always mess up micro = 6/9 until I think about it for half a second. Maybe a "n" suffix could have saved me there ;) For "long" numbers there's the new _ so you can say 0.000_000_1 if you so preferred for 0.1 micro (I generally see _ as more useful for high-precison numbers with more non-zero digits, e.g. 1_234_456_789). Would that be 0.1?, 0.1u in a new system. Veering a bit away from the 'suffixing SI prefixes for literals': Literal unary suffix operators might be slightly nicer than multiplication if they were #1 in operator precedence, then you could omit some parentheses. Right now if I want to use a unit: $ pip install quantities import quantities as pq F = 1 * pq.N d = 1 * pq.m F * d # => array(1.0) * m*N but with literal operators & functions could be something like F = 1 pq.N d = 1 pq.m On Sat, Oct 29, 2016 at 1:18 PM, Todd wrote: > On Sat, Oct 29, 2016 at 12:43 PM, Nick Timkovich > wrote: > >> From that page: >> >>> User-defined literals are basically normal function calls with a fancy >>> syntax. [...] While user defined literals look very neat, they are not much >>> more than syntactic sugar. There is not much difference between defining >>> and calling a literal operator with "foo"_bar and doing the same with an >>> ordinary function as bar("foo"). In theory, we could write literal >>> operators that have side effects and do anything we want, like a normal >>> function. >> >> >> Obviously the arbitrary-function-part of that will never happen in Python >> (yes?) >> >> >> > Why not? It seems like that would solve a lot of use-cases. People get > bringing up various new uses for prefix or suffix syntax that they want > built directly into the language. Providing a generic way to implement > third-party prefixes or suffixes would save having to put all of these > directly into the language. And it opens up a lot of other potential > use-cases as well. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mikhailwas at gmail.com Sat Oct 29 17:03:29 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Sat, 29 Oct 2016 23:03:29 +0200 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> Message-ID: On 29 October 2016 at 18:19, Stephen J. Turnbull wrote: >> For better or worse, it may be emoji that drive that change ;-) >> >> I suspect that the 100 million or so Chinese, Japanese, Korean, and >> Indian programmers who have had systems that have no trouble >> whatsoever handling non-ASCII for as long they've used computers will >> drive that change. >My apologies. You are of course absolutely right. > >I'm curious to know how easy it is for Chinese, Japanese, Korean and >Indian programmers to use *ASCII* characters. I have no idea in >practice whether the current basically entirely-ASCII nature of >programming languages is as much a problem for them as I imagine >Unicode characters would be for me. I really hope it isn't... > >Paul The only way to do it http://ic.pics.livejournal.com/ibigdan/8161099/4947638/4947638_original.jpg Seriously, as a russian, I never had any problems with understanding that I should not go that far. I don't know of any axamples when using translit caused any personal problems in online conversations, unless it comes to quarrels and one tries to insult others for using translit. But russians are generally more minimalistically tuned than many other folks. As for returning non null, I suppose most readable way would be something like: non_null(a,b,c...) (sorry if I am missing the whole discussion topic, can easily happen with me since it is really mind blowing, why I would ever need it) Mikhail From ncoghlan at gmail.com Sat Oct 29 22:00:33 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 30 Oct 2016 12:00:33 +1000 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: <20161029114416.GY15983@ando.pearwood.info> References: <20161029114416.GY15983@ando.pearwood.info> Message-ID: On 29 October 2016 at 21:44, Steven D'Aprano wrote: > On Fri, Oct 28, 2016 at 06:30:05PM +1000, Nick Coghlan wrote: > > [...] >> 1. Do we collectively agree that "existence checking" is a useful >> general concept that exists in software development and is distinct >> from the concept of "truth checking"? > > Not speaking for "we", only for myself: of course. > > >> 2. Do we collectively agree that the Python ecosystem would benefit >> from an existence checking protocol that permits generalisation of >> algorithms (especially short circuiting ones) across different "data >> missing" indicators, including those defined in the language >> definition, the standard library, and custom user code? > > Maybe, but probably not. > > Checking for "data missing" or other sentinels is clearly an important > thing to do, but it remains to be seen whether (1) it should be > generalised and (2) there is a benefit to making it a protocol. > > My sense so far is that generalising beyond None is YAGNI. None of the > other examples you give strike me as common enough to justify special > syntax, or even a protocol. I'm not *against* the idea, I just remain > unconvinced. 
I considered this the weakest link in the proposal when I wrote it, and the discussion on the list has persuaded me that it's not just a weak link, it's a fatal flaw. Accordingly, I've withdrawn the PEP, and explained why with references back to this discussion: https://github.com/python/peps/commit/9a70e511ada63b976699bbab9da142379340758c However, as noted there, I find the possible link back to the rejected boolean operator overloading proposal in PEP 335 interesting, so I'm going to invest some time in writing that up to the same level as I did the existence checking one (i.e. Abstract, Rationale & design discussion, without a full specification or reference implementation yet). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From shoyer at gmail.com Sun Oct 30 02:44:16 2016 From: shoyer at gmail.com (Stephan Hoyer) Date: Sat, 29 Oct 2016 23:44:16 -0700 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: <20161029105321.GX15983@ando.pearwood.info> References: <20161029105321.GX15983@ando.pearwood.info> Message-ID: On Sat, Oct 29, 2016 at 3:53 AM, Steven D'Aprano wrote: > Hmmm. I see your point, but honestly, None *is* special. Even for > special objects, None is even more special. As a contributor to and user of many numerical computing libraries in Python (e.g., NumPy, pandas, Dask, TensorFlow) I also agree here. Implicit non-existence for NotImplemented and Ellipsis seem particularly problematic, because these types are rarely used, and the meaning of these types intentionally differs from other missing types: - In NumPy, None is a valid indexing argument, used as a sentinel marker for "insert a new axis here". Thus x[..., None] means "insert a new axis at the end." - Likewise, implicit checks for NotImplemented would be problematic in arithmetic, because NaN is also a perfectly valid result value for arithmetic. Especially in this case, checking for "missingness" could look attractive at first glance to implementors of special methods for arithmetic but could later lead to subtle bugs. I'm have more mixed fillings on testing for NaNs. NaNs propagate, so explicit testing is rarely needed. Also, in numerical computing we usually work with arrays of NaN, so operator.exists() and all this nice syntax would not be a substitute for numpy.isnan or pandas.isnull. On the whole, I do think that adding systematic checks for None to Python with dedicate syntax would be a win. If making NaN "missing" and allowing user defined types to be "missing" would help make that happen, then sure, go ahead, but I see few use cases. -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Oct 30 03:00:33 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 30 Oct 2016 16:00:33 +0900 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> Message-ID: <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Paul Moore writes: > On 29 October 2016 at 18:19, Stephen J. 
Turnbull > wrote: > >> For better or worse, it may be emoji that drive that change ;-) > > > > I suspect that the 100 million or so Chinese, Japanese, Korean, and > > Indian programmers who have had systems that have no trouble > > whatsoever handling non-ASCII for as long they've used computers will > > drive that change. > > My apologies. You are of course absolutely right. tl;dr: A quick apology for the snark, and an attempt at FUD reduction. Using non-ASCII characters will involve some cost, but there are real benefits, and the fear and loathing often evoked by the prospect is unnecessary. I'm not ready to advocate introduction *right* now, but "never" isn't acceptable either. :-) On with the show: "Absolutely" is more than I deserve, as I was being a bit snarky. That said, Ed Yourdon wrote a book in 1990 or so with the self-promoting title of "Decline and Fall of the American Programmer"[1] in which he argued that for many kinds of software outsourcing to China, India, or Ireland got you faster, better, cheaper, and internationalized, with no tradeoffs. (The "and internationalized" is my hobby horse, it wasn't part of Yourdon's thesis.) He later recanted the extremist doomsaying, but a quick review of the fraction of H1B visas granted to Asian-origin programmers should convince you that USA/EUR/ANZ doesn't have a monopoly of good-to-great programming (probably never did, but that's a topic for a different thread). Also note that in Japan, without controlling for other factors, just the programming language used most frequently, Python programmers are the highest paid among developers in all languages with more than 1% of the sample (and yes, that includes COBOL!) To the extent that internationalization matters to a particular kind of programming, these programmers are better placed for those jobs, I think. And while in many cases "on site" has a big advantage (so you can't telecommute from Bangalore, you need that H1B which is available in rather restrictive number), more and more outsourcing does cross oceans so potential competition is immense. There is a benefit to increasing our internationalization in backward- incompatible ways. And that benefit is increasing both in magnitude and in the number of Python developers who will receive it. > I'm curious to know how easy it is for Chinese, Japanese, Korean and > Indian programmers to use *ASCII* characters. I have no idea in > practice whether the current basically entirely-ASCII nature of > programming languages is as much a problem for them Characters are zero problem for them. The East Asian national standards all include the ASCII repertoire, and some device (usually based on ISO 2022 coding extensions rather than UTF-8) for allowing ASCII to be one-byte, even if the "local" characters require two or more bytes. I forget if India's original national standard also included an ASCII subset, but they switched over to Unicode quite early[2], so UTF-8 does the trick for them. English (the language) is a much bigger issue. Most Indians, of course, have little trouble with the derived-from- English nature of much programming syntax and library identifiers, and the Asians all get enough training in both (very) basic English and rote memorization that handling English-derived syntax and library nomenclature is not a problem. However, reading and especially creating documentation can be expensive and inaccurate. At least in Japanese, "straightforward" translations are often poor, as nuances are lost. 
E.g., a literal Japanese translation from English requires many words to indicate the differences a simple "a" vs. "the" vs. "some" indicates in English. Mostly such nuances can be expressed economically by restructuring a whole paragraph, but translators rarely bother and often seem unaware of the issues. Many Japanese programmers' use of articles is literally chaotic: it's deterministic but appears random to all but the most careful analysis.[3] > as I imagine Unicode characters would be for me. I really hope it > isn't... I think your imagination is running away with you. While I understand how costly it is for those over the age of 12 to develop new habits (I'm 58, and painfully aware of how frequently I balk at learning anything new no matter how productivity-enhancing it is likely to be, and how much more slowly it becomes part of my repertoire), the number of new things you would need to learn would be few, and frequently enough used, at least in Python. It's hard enough to get Guido (and the other Masters of Pythonic Language Design) to sign on to new ASCII syntax; even if in principle non-ASCII were to be admitted, I suspect the barrier there would be even higher. Most of Unicode is irrelevant to everybody. Mathematicians use only a small fraction of the math notation available to them -- it's just that it's a different small fraction for each field. The East Asians need a big chunk (I would guess that educated Chinese and Japanese encounter about 10,000 characters in "daily life" over a lifetime, while those encountered at least once a week number about 3000), but those that need to be memorized are a small minority (less than 5%) of the already defined Unicode repertoire. For Western programmers, the mechanics are almost certainly there. Every personal computer should have at least one font containing all characters defined in the Basic Multilingual Plane, and most will have chunks of the astral planes (emoji, rare math symbols, country flags, ...). Even the Happy Hacker keyboard has enough mode keys (shift, control, ...) to allow defining "3-finger salutes" for commonly-used characters not on the keycaps -- in daily life if you don't need a input method now, you won't need one if Python decides to use WHITE SQUARE to represent an operation you frequently use -- just an extra "control key combo" like the editing control keys (eg, for copy, cut, paste, undo) that aren't marked on any keyboard I have. I'm *not* advocating *imposing* the necessary effort on anyone right now. I just want to reduce the FUD associated with the prospect that it *might* be imposed on *you*, so that you can evaluate the benefits in light of the real costs. They're not zero, but they're unlikely to ruin your whole day, every day, for months.[4] "Although sometimes never is better than *right* now" doesn't apply here. :-) Footnotes: [1] India is a multiscript country, so faces the same pressure for a single, internationally accepted character set as the whole world does, albeit at a lower level. [2] "The American Programmer" was the name of Yourdon's consultancy's newsletter to managers of software projects and software development organizations. [3] Of course the opposite is true when I write Japanese. In particular, there's a syntactic component called "particle" (the closest English equivalent is "preposition", but particles have much more general roles) that I'm sure my usage is equally chaotic from the point of view of a native speaker of Japanese -- even after working in the language for 25 years! 
N.B. I'm good enough at the language to have written grant proposals that were accepted in it -- and still my usage of particles is unreliable. [4] Well, if your role involves teaching other programmers, their pushback could be a long-lasting irritant. :-( From p.f.moore at gmail.com Sun Oct 30 08:22:10 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 30 Oct 2016 12:22:10 +0000 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: On 30 October 2016 at 07:00, Stephen J. Turnbull wrote: >> as I imagine Unicode characters would be for me. I really hope it > > isn't... > > I think your imagination is running away with you. While I understand > how costly it is for those over the age of 12 to develop new habits > (I'm 58, and painfully aware of how frequently I balk at learning > anything new no matter how productivity-enhancing it is likely to be, > and how much more slowly it becomes part of my repertoire), the number > of new things you would need to learn would be few, and frequently > enough used, at least in Python. It's hard enough to get Guido (and > the other Masters of Pythonic Language Design) to sign on to new ASCII > syntax; even if in principle non-ASCII were to be admitted, I suspect > the barrier there would be even higher. > > Most of Unicode is irrelevant to everybody. Mathematicians use only a > small fraction of the math notation available to them -- it's just > that it's a different small fraction for each field. The East Asians > need a big chunk (I would guess that educated Chinese and Japanese > encounter about 10,000 characters in "daily life" over a lifetime, > while those encountered at least once a week number about 3000), but > those that need to be memorized are a small minority (less than 5%) of > the already defined Unicode repertoire. > > For Western programmers, the mechanics are almost certainly there. > Every personal computer should have at least one font containing all > characters defined in the Basic Multilingual Plane, and most will have > chunks of the astral planes (emoji, rare math symbols, country flags, > ...). Even the Happy Hacker keyboard has enough mode keys (shift, > control, ...) to allow defining "3-finger salutes" for commonly-used > characters not on the keycaps -- in daily life if you don't need a > input method now, you won't need one if Python decides to use WHITE > SQUARE to represent an operation you frequently use -- just an extra > "control key combo" like the editing control keys (eg, for copy, cut, > paste, undo) that aren't marked on any keyboard I have. My point wasn't so much about dealing with the character set of Unicode, as it was about physical entry of non-native text. For example, on my (UK) keyboard, all of the printed keycaps are basically used. And yet, I can't even enter accented letters from latin-1 with a standard keypress, much less extended Unicode. 
Of course it's possible to get those characters (either by specialised mappings in an editor, or by using an application like Character Map) but there's nothing guaranteed to work across all applications. That's a hardware and OS limitation - the hardware only has so many keys to use, and the OS (Windows, in my case) doesn't support global key mapping (at least not to my knowledge, in a user-friendly manner - I'm excluding writing my own keyboard driver :-)) My interest in East Asian experience is at least in part because the "normal" character sets, as I understand it, are big enough that it's impractical for a keyboard to include a plausible basic range of characters, so I'm curious as to what the physical process is for typing from a vocabulary of thousands of characters on a sanely-sized keyboard. In mentioning emoji, my main point was that "average computer users" are more and more likely to want to use emoji in general applications (emails, web applications, even documents) - and if a sufficiently general solution for that problem is found, it may provide a solution for the general character-entry case. (Also, I couldn't resist the irony of using a :-) smiley while referring to emoji...) But it may be that app-specific solutions (e.g., the smiley menu in Skype) are sufficient for that use case. Or the typical emoji user is likely to be using a tablet/phone rather than a keyboard, and mobile OSes have included an emoji menu in their on-screen keyboards. Coming back to a more mundane example, if I need to type a character like ? in an email, I currently need to reach for Character Map and cut and paste it. The same is true if I have to type it into the console. That's a sufficiently annoying stumbling block that I'm inclined to avoid it - using clumsy workarounds like referring to "the OP" rather than using their name. I'd be fairly concerned about introducing non-ASCII syntax into Python while such stumbling blocks remain - the amount of code typed outside of an editor (interactive prompt, emails, web applications like Jupyter) mean that editor-based workarounds like custom mappings are only a partial solution. But maybe you are right, and it's just my age showing. The fate of APL probably isn't that relevant these days :-) (or ? if you prefer...) Paul From rosuav at gmail.com Sun Oct 30 08:31:54 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 30 Oct 2016 23:31:54 +1100 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sun, Oct 30, 2016 at 11:22 PM, Paul Moore wrote: > In mentioning emoji, my main point was that "average computer users" > are more and more likely to want to use emoji in general applications > (emails, web applications, even documents) - and if a sufficiently > general solution for that problem is found, it may provide a solution > for the general character-entry case. Before Unicode emoji were prevalent, ASCII emoticons dominated, and it's not uncommon for multi-character sequences to be automatically transformed into their corresponding emoji. 
It isn't hard to set something up that does these kinds of transformations for other Unicode characters - use trigraphs for clarity, and type "/:0" to produce "?". Or whatever's comfortable for you. Maybe rig it on Ctrl-Alt-0, if you prefer shift-key sequences. ChrisA From mertz at gnosis.cx Sun Oct 30 09:26:13 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 30 Oct 2016 06:26:13 -0700 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: <20161029105321.GX15983@ando.pearwood.info> Message-ID: On Sat, Oct 29, 2016 at 11:44 PM, Stephan Hoyer wrote: > I'm have more mixed fillings on testing for NaNs. NaNs propagate, so > explicit testing is rarely needed. Also, in numerical computing we usually > work with arrays of NaN, so operator.exists() and all this nice syntax > would not be a substitute for numpy.isnan or pandas.isnull. > NaN's *usually* propagate. The NaN domain isn't actually closed under IEEE 754. >>> nan, inf = float('nan'), float('inf') >>> import math >>> nan**0 1.0 >>> math.hypot(nan, inf) inf >>> min(1, nan) 1 The last one isn't really mandated by IEEE 754, and is weird when you consider `min(nan, 1)`. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Oct 30 09:39:32 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 30 Oct 2016 13:39:32 +0000 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: On 30 October 2016 at 12:31, Chris Angelico wrote: > On Sun, Oct 30, 2016 at 11:22 PM, Paul Moore wrote: >> In mentioning emoji, my main point was that "average computer users" >> are more and more likely to want to use emoji in general applications >> (emails, web applications, even documents) - and if a sufficiently >> general solution for that problem is found, it may provide a solution >> for the general character-entry case. > > Before Unicode emoji were prevalent, ASCII emoticons dominated, and > it's not uncommon for multi-character sequences to be automatically > transformed into their corresponding emoji. It isn't hard to set > something up that does these kinds of transformations for other > Unicode characters - use trigraphs for clarity, and type "/:0" to > produce "?". Or whatever's comfortable for you. Maybe rig it on > Ctrl-Alt-0, if you prefer shift-key sequences. It's certainly not difficult, in principle. I have (had, I lost it in an upgrade recently...) a little AutoHotkey program that interpreted Vim-style digraphs in any application that needed them. But my point was that we don't want to require people to write such custom utilities, just to be able to write Python code. Or is the feeling that it's acceptable to require that? 
Paul From ncoghlan at gmail.com Sun Oct 30 10:02:54 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 31 Oct 2016 00:02:54 +1000 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: On 30 October 2016 at 23:39, Paul Moore wrote: > It's certainly not difficult, in principle. I have (had, I lost it in > an upgrade recently...) a little AutoHotkey program that interpreted > Vim-style digraphs in any application that needed them. But my point > was that we don't want to require people to write such custom > utilities, just to be able to write Python code. Or is the feeling > that it's acceptable to require that? Getting folks used to the idea that they need to use the correct kinds of quotes is already challenging :) However, the main issue is the one I mentioned in PEP 531 regarding the "THERE EXISTS" symbol: Python and other programming languages re-use "+", "-", "=" etc because a lot of folks are already familiar with them from learning basic arithmetic. Other symbols are used in Python because they were inherited from C, or were relatively straightforward puns on such previously inherited symbols. What this means is that there aren't likely to be many practical gains in using the "right" symbol for something, even when it's already defined in Unicode, as we expect the number of people learning that symbology *before* learning Python to be dramatically smaller than the proportion learning Python first and the formal mathematical symbols later (if they learn them at all). This means that instead of placing more stringent requirements on editing environments for Python source code in order to use non-ASCII input symbols, we're still far more likely to look to define a suitable keyword, or assign a relatively arbitrary meaning to an ASCII punctuation symbol (and that's assuming we accept that a proposal will see sufficient use to be worthy of new syntax in the first place, which is far from being a given). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Sun Oct 30 10:13:08 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 31 Oct 2016 01:13:08 +1100 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Oct 31, 2016 at 12:39 AM, Paul Moore wrote: > It's certainly not difficult, in principle. I have (had, I lost it in > an upgrade recently...) a little AutoHotkey program that interpreted > Vim-style digraphs in any application that needed them. But my point > was that we don't want to require people to write such custom > utilities, just to be able to write Python code. Or is the feeling > that it's acceptable to require that? There's a chicken-and-egg problem. 
So long as most people don't have tools like that, a language that requires them is going to be very annoying - but so long as no major language uses such characters, there's no reason for developers to set up those kinds of tools. Possibly the best way is a gentle introduction of alternative syntaxes. Since Python currently has no "empty set display" syntax, that seems like a perfect starting point. You can always type "set()", but that involves an actual function call; using ? gives a small performance boost, eliminates the risk of shadowing, etc, etc. All minor points, but could be convenient enough. Also, if repr(set()) returns "?", it'll be easy for anyone to get hold of the character for copy/paste. As of 2016, I think it's not acceptable to *require* this, but it may be time to start making use of it, retaining ASCII-only digraphs and trigraphs, the way C has alternative spelling for braces and so on. Then time passes, most people will be comfortable using the characters themselves, and the digraphs/trigraphs can be deprecated, with new syntax not being given any. Pipe dream? ChrisA From tritium-list at sdamon.com Sun Oct 30 10:43:08 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Sun, 30 Oct 2016 10:43:08 -0400 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: <004c01d232bb$eecb4de0$cc61e9a0$@hotmail.com> > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of Paul Moore > Sent: Sunday, October 30, 2016 8:22 AM > To: Stephen J. Turnbull > Cc: Python-Ideas > Subject: Re: [Python-ideas] Non-ASCII in Python syntax? [was: Null > coalescing operator] > > > My point wasn't so much about dealing with the character set of > Unicode, as it was about physical entry of non-native text. For > example, on my (UK) keyboard, all of the printed keycaps are basically > used. And yet, I can't even enter accented letters from latin-1 with a > standard keypress, much less extended Unicode. Of course it's possible > to get those characters (either by specialised mappings in an editor, > or by using an application like Character Map) but there's nothing > guaranteed to work across all applications. That's a hardware and OS > limitation - the hardware only has so many keys to use, and the OS > (Windows, in my case) doesn't support global key mapping (at least not > to my knowledge, in a user-friendly manner - I'm excluding writing my > own keyboard driver :-)) My interest in East Asian experience is at > least in part because the "normal" character sets, as I understand it, > are big enough that it's impractical for a keyboard to include a > plausible basic range of characters, so I'm curious as to what the > physical process is for typing from a vocabulary of thousands of > characters on a sanely-sized keyboard. > Just picking a nit, here, windows will happily let you do silly things like hook 14 keyboards up and let you map all of emoji to them. Sadly, this requires lua. 
> In mentioning emoji, my main point was that "average computer users" > are more and more likely to want to use emoji in general applications > (emails, web applications, even documents) - and if a sufficiently > general solution for that problem is found, it may provide a solution > for the general character-entry case. (Also, I couldn't resist the > irony of using a :-) smiley while referring to emoji...) But it may be > that app-specific solutions (e.g., the smiley menu in Skype) are > sufficient for that use case. Or the typical emoji user is likely to > be using a tablet/phone rather than a keyboard, and mobile OSes have > included an emoji menu in their on-screen keyboards. > > Coming back to a more mundane example, if I need to type a character > like ? in an email, I currently need to reach for Character Map and > cut and paste it. The same is true if I have to type it into the > console. That's a sufficiently annoying stumbling block that I'm > inclined to avoid it - using clumsy workarounds like referring to "the > OP" rather than using their name. I'd be fairly concerned about > introducing non-ASCII syntax into Python while such stumbling blocks > remain - the amount of code typed outside of an editor (interactive > prompt, emails, web applications like Jupyter) mean that editor-based > workarounds like custom mappings are only a partial solution. > > But maybe you are right, and it's just my age showing. The fate of APL > probably isn't that relevant these days :-) (or ? if you prefer...) > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Sun Oct 30 10:47:49 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 30 Oct 2016 14:47:49 +0000 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: <004c01d232bb$eecb4de0$cc61e9a0$@hotmail.com> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> <004c01d232bb$eecb4de0$cc61e9a0$@hotmail.com> Message-ID: On 30 October 2016 at 14:43, wrote: > Just picking a nit, here, windows will happily let you do silly things like hook 14 keyboards up and let you map all of emoji to them. Sadly, this requires lua. Off topic, I know, but how? I have a laptop with an external and an internal keyboard. Can I map the internal keyboard to different characters somehow? Paul From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Oct 30 10:51:18 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 30 Oct 2016 23:51:18 +0900 Subject: [Python-ideas] Non-ASCII in Python syntax? 
[was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: <22550.2278.95938.325560@turnbull.sk.tsukuba.ac.jp> Paul Moore writes: > My point wasn't so much about dealing with the character set of > Unicode, as it was about physical entry of non-native text. For > example, on my (UK) keyboard, all of the printed keycaps are basically > used. How do you type the pound sign and the Euro sign? Are they on the UK keyboard? Or are you not in the UK and don't need them? > And yet, I can't even enter accented letters from latin-1 with a > standard keypress, much less extended Unicode. I'm pretty sure you can, but since I've been Windows-free for 20 years (except for a short period when I was treasurer for an NPO, and only used it to access the accounting system), I can't tell you what it is. On the Mac, you press alt/option plus a graphic key. Most result in what somebody decided are common non-ASCII characters (German sharp S, Greek lowercase mu, Greek upper- and lowercase sigma), but several are dead keys, producing accented characters when combined with a base character: tilde, accents acute and grave, and so on. Surely Windows has a similar system (I don't mean Alt+digits). (But maybe not, I didn't notice one in my brief Googling.) > My interest in East Asian experience is at least in part because > the "normal" character sets, as I understand it, are big enough > that it's impractical for a keyboard to include a plausible basic > range of characters, so I'm curious as to what the physical process > is for typing from a vocabulary of thousands of characters on a > sanely-sized keyboard. You're right about the size. Korean is special, because the 11,000- odd Hangul are phonetic and generated algorithmically from a set of about 70 phonetic partial glyphs, divided into three groups. The same keys do multiple duty when typed in phonetic order. Other systems use the shift key. For the 100,000 Han ideographs[1], there are a wide variety of methods for entry by key sequence, ranging from code point entry to context-dependent phonetic entry of entire sentences as they would be spoken. Then, of course, there's voice recognition, and handwriting recognition (both static from the image, and dynamic, taking account of the order of pen strokes). The more advanced input methods not only take account of grammar, but also learn the users' habits, remember recent conversions, and predict coming keystrokes based on current context, offering several conversions based on plausible continuations. > In mentioning emoji, my main point was that "average computer > users" are more and more likely to want to use emoji in general > applications (emails, web applications, even documents) - and if a > sufficiently general solution for that problem is found, it may > provide a solution for the general character-entry case. Not for the Asian languages. For them, "character entry" in the sense of character-by-character has long since been obsoleted by predictive sentence-level phonetic methods. 
But emoji are a perfect example for the present purpose, since they don't have standard pronunciations (although probably many will get them based on the Unicode standard names). On systems with high- enough resolution displays, a palette showing the glyphs is the obvious solution. But that's not pleasant if you type quickly and need those characters frequently. I don't think there's an alternative for emoji though, except for personalized shortcut maps. Math symbols are similar, I think. > Coming back to a more mundane example, if I need to type a character > like ? in an email, I currently need to reach for Character Map and > cut and paste it. The same is true if I have to type it into the > console. You probably have Control, Windows, Menu, Alt, and maybe a "function" key. If you're lucky, one labelled AltGr for "Alternate Graphic" is the obvious suspect. Some combination of the above probably allows entry of accented Latin-1 characters, miscellaneous Latin-1 (eg, sharp S), and a few oddballs (Greek letters, ligatures like oe, the leminiscate usually read infinity). > That's a sufficiently annoying stumbling block It very well could be, although my Windows Google-foo isn't great. But this is what I found. For WHITE SQUARE, the Mac doesn't have a keyboard equivalent, but there's a standard way to set up a set of shortcut keys[2]: http://stackoverflow.com/questions/3685146/how-do-you-do-the-therefore-%E2%88%B4-symbol-on-a-mac-or-in-textmate And I think you can also use the "Input Preferences" screen in System Preferences to set up a few of them. For Windows, it seems that Alt+decimal character codes, or hex Unicode followed by Alt+x are the built-in ways to enter characters not on your keyboard. It's also possible to set up "Math Autocorrect" to automatically convert keysequences according to https://blogs.msdn.microsoft.com/murrays/2011/08/29/sans-serif-mathematical-symbols/ but that's hardly obvious (although maybe it is if you're Dutch?) I have to wonder why so many people stick with a system that seems to hate its users. :-( Footnotes: [1] I'm counting several thousand Taiwanese standard glyphs whose pronunciation and meaning is no longer known (they're culled from old manuscripts), as well as each of the 2 or 3 variants of several thousand characters given simplified glyphs by the Japanese and PRC standard bodies, because all have separate Unicode codepoints assigned. [2] Note: I had to Google this because I use Japanese input methods: when I want a square I type the Japanese word for "square" and then press "next conversion" until the square I want shows up. This also works for most Greek letters and math symbols. This doesn't bother me, because it's normal for typing Japanese (and I do mix Japanese and English enough that I know that it doesn't bug me when I need such a character in an otherwise all-English text). I suspect it would be inadequate for someone who doesn't also type a language requiring a complex input method. From abrault at mapgears.com Sun Oct 30 10:55:15 2016 From: abrault at mapgears.com (Alexandre Brault) Date: Sun, 30 Oct 2016 10:55:15 -0400 Subject: [Python-ideas] Non-ASCII in Python syntax? 
[was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> <004c01d232bb$eecb4de0$cc61e9a0$@hotmail.com> Message-ID: On 2016-10-30 10:47 AM, Paul Moore wrote: > On 30 October 2016 at 14:43, wrote: >> Just picking a nit, here, windows will happily let you do silly things like hook 14 keyboards up and let you map all of emoji to them. Sadly, this requires lua. > Off topic, I know, but how? I have a laptop with an external and an > internal keyboard. Can I map the internal keyboard to different > characters somehow? > Look up "The Art of the Bodge: How I Made the Emoji Keyboard" by Tom Scott on Youtube. As the name implies, it's a huge hack with no practicality whatsoever Alex From p.f.moore at gmail.com Sun Oct 30 11:45:25 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 30 Oct 2016 15:45:25 +0000 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: <22550.2278.95938.325560@turnbull.sk.tsukuba.ac.jp> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> <22550.2278.95938.325560@turnbull.sk.tsukuba.ac.jp> Message-ID: On 30 October 2016 at 14:51, Stephen J. Turnbull wrote: > Paul Moore writes: > > > My point wasn't so much about dealing with the character set of > > Unicode, as it was about physical entry of non-native text. For > > example, on my (UK) keyboard, all of the printed keycaps are basically > > used. > > How do you type the pound sign and the Euro sign? Are they on the UK > keyboard? Or are you not in the UK and don't need them? They are on the keyboard. The £ sign is shift-3, the € sign uses the AltGr key (which is woefully underused on the standard UK keyboard driver - accented letters *should* be available using it :-() > > And yet, I can't even enter accented letters from latin-1 with a > > standard keypress, much less extended Unicode. > > I'm pretty sure you can, Believe me, I've tried. But I should point out that I *don't* count the "official" way (Alt plus typing the numeric code out on the numeric keypad) as a viable option: 1. It only works for the current codepage, I believe. 2. It gets intercepted by applications (I just tried it here, in the gmail webapp, and got dumped out of the site to a google search page, I've no idea why). > You probably have Control, Windows, Menu, Alt, and maybe a "function" > key. If you're lucky, one labelled AltGr for "Alternate Graphic" is > the obvious suspect. Some combination of the above probably allows > entry of accented Latin-1 characters, miscellaneous Latin-1 (eg, sharp > S), and a few oddballs (Greek letters, ligatures like oe, the > leminiscate usually read infinity). It doesn't, by default. Specialised programs can customise keypresses, but I'd hate to teach Python to newcomers if I needed something like that. (And by "newcomers" I'd include all of my work colleagues, who are far from computer illiterate...)
> For Windows, it seems that Alt+decimal character codes, or hex Unicode > followed by Alt+x are the built-in ways to enter characters not on > your keyboard. It's also possible to set up "Math Autocorrect" to > automatically convert keysequences according to > https://blogs.msdn.microsoft.com/murrays/2011/08/29/sans-serif-mathematical-symbols/ > but that's hardly obvious (although maybe it is if you're Dutch?) And it's application specific - noted in the article, "One way any character can be entered into Word or OneNote (but not into PowerPoint, sigh) is" > I have to wonder why so many people stick with a system that seems to > hate its users. :-( OT, but in my case, because it's very good at making a lot of the key things you need to do easy. It's immensely hostile in many ways, but typically if you're finding that to be the case, you're pretty clearly doing something that's not part of the "core target audience". Like console programs, Unicode outside a specific code page, etc. But if you are sticking to the norm, it's great. A question, though. On Linux, (pick your distribution, but ideally "it doesn't matter") how would I type ?, ?, ? ? Assume any answer that starts with "look up the numeric code" is unacceptable, as is anything that only works in a specific application. I'm willing to accept a need for a one-off configuration of some mapping table to get ?, but accented letters and "common" characters like smileys should really be available by default. Assume a qwerty keyboard, something like UK or US layout (because it's the English speakers who need the most help remembering that the whole world isn't ASCII :-)) I doubt it's that much easier than it is on Windows. My ideal is that something like what I defined in the above paragraph *is* the norm, for all computer users. It's just plain silly that English speakers can't type café, or a German friend's correctly spelled name, without effort. Anyhow, this is way off topic now. Paul From bussonniermatthias at gmail.com Sun Oct 30 12:03:17 2016 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sun, 30 Oct 2016 09:03:17 -0700 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: Hi all, For those of you not aware, the Julia Programming Language [1] does make extensive use of (mathematical) unicode symbols in its standard library, and even documents a method of input [2] (hint: tab completion). They go even further by recognizing some characters (like \oplus) that parse as operators and have predefined precedences, but no implementations, leaving them available to the user. Regardless of my personal feeling about that, I have observed that this does not seem to hinder Julia development. Many developers seem to like it a lot. Though my sampling is heavily biased toward developers with a strong math background. So it might be a case study to actually see how this affects an existing language both technically and community wide.
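For comparison, here is a minimal sketch of where stock CPython 3 stands today (nothing below is Julia-specific, and the identifier names are arbitrary): non-ASCII identifiers are already legal thanks to PEP 3131, but non-ASCII operators are simply not part of the grammar.

    # Unicode identifiers already work in Python 3 (PEP 3131):
    α = 0.5
    Δx = 2.0
    print(α * Δx)  # 1.0

    # ...but Unicode *operators* do not; there is nothing like Julia's
    # user-definable \oplus, so this is just a SyntaxError:
    try:
        compile("a ⊕ b", "<example>", "eval")
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)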
Cheers, -- M [1] : julialang.org [2] : http://docs.julialang.org/en/release-0.5/manual/unicode-input/ [3] : http://docs.julialang.org/en/release-0.5/manual/variables/#allowed-variable-names On Sun, Oct 30, 2016 at 7:02 AM, Nick Coghlan wrote: > On 30 October 2016 at 23:39, Paul Moore wrote: >> It's certainly not difficult, in principle. I have (had, I lost it in >> an upgrade recently...) a little AutoHotkey program that interpreted >> Vim-style digraphs in any application that needed them. But my point >> was that we don't want to require people to write such custom >> utilities, just to be able to write Python code. Or is the feeling >> that it's acceptable to require that? > > Getting folks used to the idea that they need to use the correct kinds > of quotes is already challenging :) > > However, the main issue is the one I mentioned in PEP 531 regarding > the "THERE EXISTS" symbol: Python and other programming languages > re-use "+", "-", "=" etc because a lot of folks are already familiar > with them from learning basic arithmetic. Other symbols are used in > Python because they were inherited from C, or were relatively > straightforward puns on such previously inherited symbols. > > What this means is that there aren't likely to be many practical gains > in using the "right" symbol for something, even when it's already > defined in Unicode, as we expect the number of people learning that > symbology *before* learning Python to be dramatically smaller than the > proportion learning Python first and the formal mathematical symbols > later (if they learn them at all). > > This means that instead of placing more stringent requirements on > editing environments for Python source code in order to use non-ASCII > input symbols, we're still far more likely to look to define a > suitable keyword, or assign a relatively arbitrary meaning to an ASCII > punctuation symbol (and that's assuming we accept that a proposal will > see sufficient use to be worthy of new syntax in the first place, > which is far from being a given). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From mikhailwas at gmail.com Sun Oct 30 12:48:07 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Sun, 30 Oct 2016 17:48:07 +0100 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: Steven D'Aprano wrote: > I cannot wait for the day that we can use non-ASCII operators. But I > don't think that day has come: it is still too hard for many people > (including me) to generate non-ASCII characters at the keyboard, and > font support for some of the more useful ones are still inconsistent or > lacking. > For example, we don't have a good literal for empty sets. How about ?? > Sadly, in my mail client and in the Python REPR, it displays as a > "missing glyph" open rectangle. And how would you type it? 
I will just share my view on the whole problem, trying to concentrate more on the actual code look. So I see it all as a chain of big steps, roughly: 1. One defines *the real code* or syntax, this means: One takes a pen and a paper (photoshop/paint bucket) and *defines* the syntax, this means one defines everything as it is, including pixel precise spaces between operators, punctuation and so on. 2. One develops an application (IDE) which enables you to automatically load a code file and (at least) view it *exactly* as you have defined it. 3. Only after that one starts to think about ASCII/unicode/Hangul (forgive me Lord) or whatever someone has defined as something useful/standard. > Java, I believe, allows you to enter escape sequences in source code, > not just in strings. So we could hypothetically allow one of: > > myobject\N{WHITE SQUARE}attribute > myobject\u25a1attribute > > as a pure-ASCII way of getting > > myobject□attribute So this actually would be a possible kind of "bridge" from the real code to what shows up in an arbitrary text editing application or mail client. In other words, you believe that in the Unicode table you'll find something useful for code definition, but I personally would not even start relying on that, also because it is merely bottom-up problem solving. Mikhail
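As a point of reference for the escape-sequence idea quoted above: Python already accepts \N{...} and \uXXXX escapes inside string literals (just not in identifiers or operators), so the character itself is easy to produce programmatically. A minimal sketch:

    # Both escapes name the same character, U+25A1 WHITE SQUARE:
    s = "\N{WHITE SQUARE}"
    assert s == "\u25a1"
    print(s)         # the glyph itself, if your font has it
    print(ascii(s))  # '\u25a1'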
You're right about the size. Korean is special, because the 11,000- odd Hangul are phonetic and generated algorithmically from a set of about 70 phonetic partial glyphs, divided into three groups. The same keys do multiple duty when typed in phonetic order. Other systems use the shift key. For the 100,000 Han ideographs[1], there are a wide variety of methods for entry by key sequence, ranging from code point entry to context-dependent phonetic entry of entire sentences as they would be spoken. Then, of course, there's voice recognition, and handwriting recognition (both static from the image, and dynamic, taking account of the order of pen strokes). The more advanced input methods not only take account of grammar, but also learn the users' habits, remember recent conversions, and predict coming keystrokes based on current context, offering several conversions based on plausible continuations. > In mentioning emoji, my main point was that "average computer > users" are more and more likely to want to use emoji in general > applications (emails, web applications, even documents) - and if a > sufficiently general solution for that problem is found, it may > provide a solution for the general character-entry case. Not for the Asian languages. For them, "character entry" in the sense of character-by-character has long since been obsoleted by predictive sentence-level phonetic methods. But emoji are a perfect example for the present purpose, since they don't have standard pronunciations (although probably many will get them based on the Unicode standard names). On systems with high- enough resolution displays, a palette showing the glyphs is the obvious solution. But that's not pleasant if you type quickly and need those characters frequently. I don't think there's an alternative for emoji though, except for personalized shortcut maps. Math symbols are similar, I think. > Coming back to a more mundane example, if I need to type a character > like ? in an email, I currently need to reach for Character Map and > cut and paste it. The same is true if I have to type it into the > console. You probably have Control, Windows, Menu, Alt, and maybe a "function" key. If you're lucky, one labelled AltGr for "Alternate Graphic" is the obvious suspect. Some combination of the above probably allows entry of accented Latin-1 characters, miscellaneous Latin-1 (eg, sharp S), and a few oddballs (Greek letters, ligatures like oe, the leminiscate usually read infinity). > That's a sufficiently annoying stumbling block It very well could be, although my Windows Google-foo isn't great. But this is what I found. For WHITE SQUARE, the Mac doesn't have a keyboard equivalent, but there's a standard way to set up a set of shortcut keys[2]: http://stackoverflow.com/questions/3685146/how-do-you-do-the-therefore-%E2%88%B4-symbol-on-a-mac-or-in-textmate And I think you can also use the "Input Preferences" screen in System Preferences to set up a few of them. For Windows, it seems that Alt+decimal character codes, or hex Unicode followed by Alt+x are the built-in ways to enter characters not on your keyboard.. It's also possible to set up "Math Autocorrect" to automatically convert keysequences according to https://blogs.msdn.microsoft.com/murrays/2011/08/29/sans-serif-mathematical-symbols/ but that's hardly obvious (although maybe it is if you're Dutch?) I have to wonder why so many people stick with a system that obviously hates users. 
From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Oct 30 14:59:06 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 31 Oct 2016 03:59:06 +0900 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> <22550.2278.95938.325560@turnbull.sk.tsukuba.ac.jp> Message-ID: <22550.17146.240750.602618@turnbull.sk.tsukuba.ac.jp> Paul Moore writes: > They are on the keyboard. The £ sign is shift-3, the € sign uses the > AltGr key (which is woefully underused on the standard UK keyboard > driver - accented letters *should* be available using it :-() OMG. > Believe me, I've tried. But I should point out that I *don't* count > the "official" way (Alt plus typing the numeric code out on the > numeric keypad) as a viable option: I don't either. > A question, though. On Linux, (pick your distribution, but ideally "it > doesn't matter") It does matter, at least last I checked. Different distros default to different keyboard configurations. And it definitely matters what language you configure as your primary -- accented letters and punctuation used in that language will use AltGr, while those that aren't may require a mode switch or a COMPOSE-ACCENT-BASE sequence. > how would I type ?, ?, ? ? Assume any answer that starts with > "look up the numeric code" is unacceptable, as is anything that > only works in a specific application. Because this is X11/Unix, the answer is "it depends." (For math symbols and emoji, the common denominator default would surely be selection from a palette.) I suspect if there was a popular programming language that used a half-dozen non-ASCII characters, a slew of applets to configure those characters onto the keymap would arise quickly, and one of those that provided relatively sane default mappings would become TOOWTDI. This is definitely possible, but at the moment, aside from language-specific mappings we already have, there's no obvious set of default characters that "everybody" needs.
So a consistent, discoverable system for Unix won't happen until there's a bunch of non-ASCII everybody needs (and that can't be treated algorithmically like smart quotes), and no programming language will impose that until there's a consistent discoverable system of non-ASCII keymaps. :-( From steve at pearwood.info Sun Oct 30 19:46:10 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 31 Oct 2016 10:46:10 +1100 Subject: [Python-ideas] PEP 531: Existence checking operators In-Reply-To: References: <20161029105321.GX15983@ando.pearwood.info> Message-ID: <20161030234609.GZ15983@ando.pearwood.info> On Sun, Oct 30, 2016 at 06:26:13AM -0700, David Mertz wrote: > NaN's *usually* propagate. The NaN domain isn't actually closed under IEEE > 754. [...] > >>> min(1, nan) > 1 > > The last one isn't really mandated by IEEE 754, and is weird when you > consider `min(nan, 1)`. Python's min() and max() don't treat NANs correctly according to IEEE 754. The 2008 revision to the standard says that: min(x, NaN) = min(NaN, x) = x max(x, NaN) = max(NaN, x) = x https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max Professor Kahan, one of the IEEE 754 committee members, writes: For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN. There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree. Page 9, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF I believe that the standard does allow implementations to define a second pair of functions that implement "NAN poisoning", that is, they return NAN when given a NAN argument. -- Steve From steve at pearwood.info Sun Oct 30 20:17:35 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 31 Oct 2016 11:17:35 +1100 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: <20161031001734.GA15983@ando.pearwood.info>
> (and that's assuming we accept that a proposal will > see sufficient use to be worthy of new syntax in the first place, > which is far from being a given). I see that Perl is leading the way here, supporting a large number of Unicode symbols: https://docs.perl6.org/language/unicode_entry.html I must say that it is kinda cute that Perl6 does the right thing for x?. -- Steve From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Oct 30 21:19:58 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 31 Oct 2016 10:19:58 +0900 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: <20161031001734.GA15983@ando.pearwood.info> References: <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> <20161031001734.GA15983@ando.pearwood.info> Message-ID: <22550.39998.349344.967100@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I see that Perl is leading the way here, supporting a large number of > Unicode symbols: > > https://docs.perl6.org/language/unicode_entry.html In what sense is that "support"? What I see on that page is a lot of advice for the kind of people who are already using non-ASCII in Python, as I have been doing since 2001 or so. > I must say that it is kinda cute that Perl6 does the right thing for x?. Uh, as far as I can tell from that page, Perl has absolutely nothing to do with that. You enter the Unicode code point as hex, and if the font supports, you get the character. What Paul is arguing is that entering any character, non-ASCII or ASCII, as a hex code point or as an Alt+digits sequence, is a non-starter for our audience. Much as I'd like to disagree, I can't. From rosuav at gmail.com Sun Oct 30 21:58:10 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 31 Oct 2016 12:58:10 +1100 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: <22550.39998.349344.967100@turnbull.sk.tsukuba.ac.jp> References: <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> <20161031001734.GA15983@ando.pearwood.info> <22550.39998.349344.967100@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Oct 31, 2016 at 12:19 PM, Stephen J. Turnbull wrote: > Uh, as far as I can tell from that page, Perl has absolutely nothing > to do with that. You enter the Unicode code point as hex, and if the > font supports, you get the character. What Paul is arguing is that > entering any character, non-ASCII or ASCII, as a hex code point or as > an Alt+digits sequence, is a non-starter for our audience. Much as > I'd like to disagree, I can't. Back when I used a single codepage (IBM OEM, now called 437) and 256 characters, it wasn't unreasonable to memorize the alt-codes for most of those characters. I could do all the single-line and double-line characters from memory (might take me a couple of tries to get the right corner), and if I needed to mix line types, I could just look those up. But with all of Unicode? Totally impractical. You can't expect people to use the hex codes. ChrisA From mertz at gnosis.cx Sun Oct 30 13:16:19 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 30 Oct 2016 10:16:19 -0700 Subject: [Python-ideas] Non-ASCII in Python syntax? 
[was: Null coalescing operator] In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: My vim configuration for a year or two has looked something like this (the screenshot doesn't show the empty set symbol, but that's part of my conceal configuration: http://gnosis.cx/bin/.vim/after/syntax/python.vim). On Sun, Oct 30, 2016 at 7:13 AM, Chris Angelico wrote: > On Mon, Oct 31, 2016 at 12:39 AM, Paul Moore wrote: > > It's certainly not difficult, in principle. I have (had, I lost it in > > an upgrade recently...) a little AutoHotkey program that interpreted > > Vim-style digraphs in any application that needed them. But my point > > was that we don't want to require people to write such custom > > utilities, just to be able to write Python code. Or is the feeling > > that it's acceptable to require that? > > There's a chicken-and-egg problem. So long as most people don't have > tools like that, a language that requires them is going to be very > annoying - but so long as no major language uses such characters, > there's no reason for developers to set up those kinds of tools. > > Possibly the best way is a gentle introduction of alternative > syntaxes. Since Python currently has no "empty set display" syntax, > that seems like a perfect starting point. You can always type "set()", > but that involves an actual function call; using ? gives a small > performance boost, eliminates the risk of shadowing, etc, etc. All > minor points, but could be convenient enough. Also, if repr(set()) > returns "?", it'll be easy for anyone to get hold of the character for > copy/paste. > > As of 2016, I think it's not acceptable to *require* this, but it may > be time to start making use of it, retaining ASCII-only digraphs and > trigraphs, the way C has alternative spelling for braces and so on. > Then time passes, most people will be comfortable using the characters > themselves, and the digraphs/trigraphs can be deprecated, with new > syntax not being given any. > > Pipe dream? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: conceal.jpg Type: image/jpeg Size: 63843 bytes Desc: not available URL: From steve at pearwood.info Mon Oct 31 10:29:41 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 1 Nov 2016 01:29:41 +1100 Subject: [Python-ideas] Non-ASCII in Python syntax? 
[was: Null coalescing operator] In-Reply-To: <22550.39998.349344.967100@turnbull.sk.tsukuba.ac.jp> References: <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> <20161031001734.GA15983@ando.pearwood.info> <22550.39998.349344.967100@turnbull.sk.tsukuba.ac.jp> Message-ID: <20161031142940.GC3365@ando.pearwood.info> On Mon, Oct 31, 2016 at 10:19:58AM +0900, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I see that Perl is leading the way here, supporting a large number of > > Unicode symbols: > > > > https://docs.perl6.org/language/unicode_entry.html > > In what sense is that "support"? In the sense that Perl 6 not only allows Unicode identifiers (as Python has for many years) but also Unicode operators and symbols. For example, you can use either the Unicode character ⊂ \N{SUBSET OF} or the ASCII trigraph (<) for doing subset tests. > > I must say that it is kinda cute that Perl6 does the right thing for x². > > Uh, as far as I can tell from that page, Perl has absolutely nothing > to do with that. You enter the Unicode code point as hex, and if the > font supports, you get the character. You missed the bit that Perl 6 interprets "x²" in code as the equivalent of x**2 (x squared). In other words, ² behaves as a unary postfix operator that squares its argument. Likewise for ³, etc. You can even combine them: x³³ would be the same as x**33. There's more here: https://docs.perl6.org/language/unicode_texas -- Steve From steve at pearwood.info Mon Oct 31 10:34:44 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 1 Nov 2016 01:34:44 +1100 Subject: [Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator] In-Reply-To: References: <20161029063037.GU15983@ando.pearwood.info> <22548.55868.907868.875328@turnbull.sk.tsukuba.ac.jp> <22549.39569.330494.335762@turnbull.sk.tsukuba.ac.jp> Message-ID: <20161031143444.GD3365@ando.pearwood.info> On Sun, Oct 30, 2016 at 10:16:19AM -0700, David Mertz wrote: > My vim configuration for a year or two has looked something like this (the > screenshot doesn't show the empty set symbol, but that's part of my conceal > configuration: http://gnosis.cx/bin/.vim/after/syntax/python.vim). Oh nice! By the way, anyone looking at this in a browser may find that the browser defaults to treating it as Latin-1, which gives you mojibake. Just tell your browser to treat it as Unicode. -- Steve From mehaase at gmail.com Mon Oct 31 11:51:39 2016 From: mehaase at gmail.com (Mark E. Haase) Date: Mon, 31 Oct 2016 11:51:39 -0400 Subject: [Python-ideas] Null coalescing operator In-Reply-To: <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull wrote: > I gather you think you have a deadlock here. The way to break it is > to just do it. Pick a syntax and do the rewriting. My memory of some > past instances is that many of the senior devs (especially Guido) will > "see through the syntax" to evaluate the benefits of the proposal, > even if they've said they don't particularly like the initially- > proposed syntax. I don't feel deadlocked, but I think you're right about committing to a syntax. So I updated the PEP, summarized here: 1. Spelling a new operator as a keyword is difficult due to backward compatibility.
It can be done (see PEP-308 and PEP-492) but requires extreme care. 2. A keyword operator is considered less ugly than punctuation, but it makes assignment shortcut syntax very ugly. Assume the coalesce operator is "foo", then the assignment shortcut would be "x foo= y". This is unacceptable. 3. If we eliminate the possibility of a keyword and focus on punctuation, we find that most people think "??" (the syntax that exists in several other mainstream languages) is ugly and not Pythonic. 4. However, any other punctuation spelling will be at least as ugly and will not have the benefit of being familiar to programmers who have seen null coalescing in other languages. 5. Therefore, the most reasonable spelling is to borrow the same spelling that other languages use, e.g. "??", "?.", and "?[". I did go down the road of trying to create a new keyword, trying some mundane ideas ("foo else bar") and some more exotic ideas ("try foo then bar"), but I don't know if those syntaxes are even parseable, and as I worked through a bunch of examples, I realized that all of the keywords I was trying were very awkward in practical use, especially when combined with other expressions. Therefore, I have updated the PEP with the punctuation mentioned above, and at this point the PEP can't go any farther. If the best spelling for this new operator is unacceptable, then there's no getting around that. This PEP should be rejected. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Oct 31 12:33:54 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 31 Oct 2016 16:33:54 +0000 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> Message-ID: On 31 October 2016 at 15:51, Mark E. Haase wrote: > Therefore, I have updated the PEP with the punctuation mentioned above, and > at this point the PEP can't go any farther. If the best spelling for this > new operator is unacceptable, then there's no getting around that. This PEP > should be rejected. While I agree that there's little point arguing over spelling here - if the ? spelling is unacceptable we should just reject - I'm not sure that's the only sticking point remaining here. I still find the short-circuiting behaviour of ?. (and ?[) to be pretty confusing - and the fact that there's a long paragraph describing the behaviour, with lots of examples of the form "if you think that this example works like so, then you're wrong, and it actually does the following", suggests to me that I'm not going to be the only one struggling. Hopefully, the problem is simply the way the behaviour is presented, and a reworking of the description would make it all crystal clear - but it feels to me that there's some inherent complexity here that's an actual issue with the proposal. Having said that, it appears that the proposed behaviour is the same as in C# (can you just come out and say "C#", rather than hinting with the phrase "other popular languages" - if we're stealing the behaviour as is from C#, let's say so, and if not, can you include examples from more than one language?) Assuming that's the case, then the fact that it's not causing confusion to C# programmers is a definite point in its favour.
Paul From mehaase at gmail.com Mon Oct 31 13:09:09 2016 From: mehaase at gmail.com (Mark E. Haase) Date: Mon, 31 Oct 2016 13:09:09 -0400 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> Message-ID: The PEP combines ideas from several different languages. For example: * Both Dart and C# have "x ?? y" and "x?.y". * Dart has "x ??= y" and C# does not. * C# has short circuit semantics for "?." and Dart does not. * PHP has "??" but does not have "?." * Etc. Wikipedia lists a lot of other languages[1], but I don't have enough personal experience with any of them to cite them in the PEP. This is why I use the phrase "other mainstream languages" multiple times. If you think the safe navigation operator isn't presented clearly, I am willing to improve it. Is there a particular example that you're struggling with? The simplest explanation is that it works the way you would want it to, e.g. in "foo?.bar.baz", we don't want semantics that could lead to looking up "baz" as an attribute of None. Therefore, if "foo" evaluates to None, the rest of the chain is short circuited: neither ".bar" nor ".baz" is looked up, and the whole expression evaluates to None. (A short sketch of the equivalent in today's Python follows below, after the quoted text.) [1] https://en.wikipedia.org/wiki/Null_coalescing_operator#SQL On Mon, Oct 31, 2016 at 12:33 PM, Paul Moore wrote: > On 31 October 2016 at 15:51, Mark E. Haase wrote: > > Therefore, I have updated the PEP with the punctuation mentioned above, > and > > at this point the PEP can't go any farther. If the best spelling for this > > new operator is unacceptable, then there's no getting around that. This > PEP > > should be rejected. > > While I agree that there's little point arguing over spelling here - > if the ? spelling is unacceptable we should just reject - I'm not sure > that's the only sticking point remaining here. I still find the > short-circuiting behaviour of ?. (and ?[) to be pretty confusing - and > the fact that there's a long paragraph describing the behaviour, with > lots of examples of the form "if you think that this example works > like so, then you're wrong, and it actually does the following", > suggests to me that I'm not going to be the only one struggling. > Hopefully, the problem is simply the way the behaviour is presented, > and a reworking of the description would make it all crystal clear - > but it feels to me that there's some inherent complexity here that's > an actual issue with the proposal. > > Having said that, it appears that the proposed behaviour is the same > as in C# (can you just come out and say "C#", rather than hinting with > the phrase "other popular languages" - if we're stealing the behaviour > as is from C#, let's say so, and if not, can you include examples from > more than one language?) Assuming that's the case, then the fact that > it's not causing confusion to C# programmers is a definite point in > its favour. > > Paul >
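To make the intended short circuit concrete, here is the rough sketch promised above, written in today's Python and assuming the C#-style semantics described in this thread. It is only an illustration: the PEP text is the authority on the exact rules, and a real implementation would evaluate the left-hand part only once rather than naming it twice as done here.

    from types import SimpleNamespace

    # foo?.bar.baz -- if foo is None, the whole chain is skipped and the
    # result is None; otherwise evaluation continues normally.
    foo = SimpleNamespace(bar=SimpleNamespace(baz=42))
    print(None if foo is None else foo.bar.baz)   # 42

    foo = None
    print(None if foo is None else foo.bar.baz)   # None; .bar and .baz never evaluated

    # x ?? y -- use y only when x is None (not merely falsey, unlike "or"):
    x, y = 0, "default"
    print(y if x is None else x)                  # 0, because 0 is not None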
URL: From guido at python.org Mon Oct 31 13:16:28 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 31 Oct 2016 10:16:28 -0700 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> Message-ID: I think we should try to improve our intutition about these operators. Many things that are intuitively clear still require long turgid paragraphs in reference documentation to state the behavior unambiguously -- but that doesn't stop us from intuitively grasping concepts like a+b (when does b.__radd__ get called?) or @classmethod. The main case to build intuition for is "?." -- once you get that the "?[...]" operator works just the same. So here's my attempt: *In a series of attribute accesses like "foo.bar.baz.bletch", putting a `?` before a specific dot inserts a None check for the expression to the left and skips everything to the right when the None check is true.* We still need to clarify what we mean by "expression to the left" and "everything to the right", but in most situations you will guess right without thinking about it. The expression to the left is easy -- it's determined by syntactic operator precedence, so that if we have "x = y + foo.bar?.baz.bletch", the expression to the left of the "?." is just "foo.bar". (But see below -- you won't actually see such usage much.) For "everything to the right" it would seem we have some freedom: e.g. if we have "foo.bar?.baz(bletch)" is the call included? The answer is yes -- the concept we're after here is named "trailer" in the Grammar file in the source code ( https://github.com/python/cpython/blob/master/Grammar/Grammar#L119), and "primary" in the reference manual ( https://docs.python.org/3/reference/expressions.html#primaries). This means all attribute references ("x.y"), index/slice operations ("x[...]"), and calls ("x(...)"). Note that in almost all cases the "?." operator will be used in an context where there is no other operator of lower precedence before or after it -- given the above meaning, it doesn't make a lot of sense to write "1 + x?.a" because "1 + None" is always an error (and ditto for "x?.a + 1"). However it still makes sense to assign such an expression to a variable or pass it as an argument to a function. So you can ignore the preceding four paragraphs: just remember the simplified rule (indented and in bold, depending on your email client) and let your intuition do the rest. Maybe it can even be simplified more: *The "?." operator splits the expression in two parts; the second part is skipped if the first part is None.* Eventually this *will* become intuitive. The various constraints are all naturally imposed by the grammar so you won't have to think about them consciously. --Guido On Mon, Oct 31, 2016 at 9:33 AM, Paul Moore wrote: > On 31 October 2016 at 15:51, Mark E. Haase wrote: > > Therefore, I have updated the PEP with the punctuation mentioned above, > and > > at this point the PEP can't go any farther. If the best spelling for this > > new operator is unacceptable, then there's no getting around that. This > PEP > > should be rejected. > > While I agree that there's little point arguing over spelling here - > if the ? spelling is unacceptable we should just reject - I'm not sure > that's the only sticking point remaining here. 
I still find the > short-circuiting behaviour of ?. (and ?[) to be pretty confusing - and > the fact that there's a long paragraph describing the behaviour, with > lots of examples of the form "if you think that this example works > like so, then you're wrong, and it actually does the following", > suggests to me that I'm not going to be the only one struggling. > Hopefully, the problem is simply the way the behaviour is presented, > and a reworking of the description would make it all crystal clear - > but it feels to me that there's some inherent complexity here that's > an actual issue with the proposal. > > Having said that, it appears that the proposed behaviour is the same > as in C# (can you just come out and say "C#", rather than hinting with > the phrase "other popular languages" - if we're stealing the behaviour > as is from C#, let's say so, and if not, can you include examples from > more than one language?) Assuming that's the case, then the fact that > it's not causing confusion to C# programmers is a definite point in > its favour. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Oct 31 13:46:51 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 31 Oct 2016 17:46:51 +0000 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> Message-ID: On 31 October 2016 at 17:16, Guido van Rossum wrote: > I think we should try to improve our intutition about these operators. Many > things that are intuitively clear still require long turgid paragraphs in > reference documentation to state the behavior unambiguously -- but that > doesn't stop us from intuitively grasping concepts like a+b (when does > b.__radd__ get called?) or @classmethod. [...] > The "?." operator splits the expression in two parts; the second part is > skipped if the first part is None. > > Eventually this *will* become intuitive. The various constraints are all > naturally imposed by the grammar so you won't have to think about them > consciously. Thanks. Yes, I agree that details in a spec are never particularly obvious, and we need an intuition of what the operator does if it's to be successful. Mark - I couldn't offer a specific rewording, precisely because I found the whole thing confusing. But based on Guido's post, I'd say that the "intuitive" explanation of the proposed operators should be right at the top of the PEP, in the abstract - and should be repeated as the first statement in the specification section for each operator. The details can then follow, including all of the corner cases. But I'd be inclined there to word the corner cases as positive statements, rather than negative ones. Take for example, the case "d?.year.numerator + 1" - you say """ Note that the error in the second example is not on the attribute access numerator . In fact, that attribute access is never performed. The error occurs when adding None + 1 , because the None -aware attribute access does not short circuit + . 
""" which reads to me as presenting the misconception (that the error was from the access to numerator) before the correct explanation, and then explaining to the reader why they were confused if they thought that. I'd rather see it worded something along the lines of: """ >>> d = date.today() >>> d?.year.numerator + 1 2016 >>> d = None >>> d?.year.numerator + 1 Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for +: 'NoneType' and 'int' Note that the second example demonstrates that when ?. splits the enclosing expression into 2 parts, operators like + have a lower precedence, and so are not short circuited. So, we get a TypeError if d is None, because we're trying to add None to an integer (as the error states). """ There's no need to even *mention* the incorrect interpretation, it does nothing for people who'd misinterpreted the example in the first place, but for people who hadn't, it just suggests to them an alternative explanation they hadn't thought of - so confusing them where they weren't confused before. Does this clarify what I was struggling with in the way the PEP was worded? Paul From python at mrabarnett.plus.com Mon Oct 31 14:11:45 2016 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 31 Oct 2016 18:11:45 +0000 Subject: [Python-ideas] Null coalescing operator In-Reply-To: References: <20160910002719.GG22471@ando.pearwood.info> <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp> <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com> <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp> Message-ID: <77decbf9-04ff-8545-137e-0c5de3c15ec1@mrabarnett.plus.com> On 2016-10-31 17:16, Guido van Rossum wrote: > I think we should try to improve our intutition about these operators. > Many things that are intuitively clear still require long turgid > paragraphs in reference documentation to state the behavior > unambiguously -- but that doesn't stop us from intuitively grasping > concepts like a+b (when does b.__radd__ get called?) or @classmethod. > > The main case to build intuition for is "?." -- once you get that the > "?[...]" operator works just the same. So here's my attempt: > > *In a series of attribute accesses like "foo.bar.baz.bletch", putting a > `?` before a specific dot inserts a None check for the expression to the > left and skips everything to the right when the None check is true.* > > We still need to clarify what we mean by "expression to the left" and > "everything to the right", but in most situations you will guess right > without thinking about it. > > The expression to the left is easy -- it's determined by syntactic > operator precedence, so that if we have "x = y + foo.bar?.baz.bletch", > the expression to the left of the "?." is just "foo.bar". (But see below > -- you won't actually see such usage much.) > > For "everything to the right" it would seem we have some freedom: e.g. > if we have "foo.bar?.baz(bletch)" is the call included? The answer is > yes -- the concept we're after here is named "trailer" in the Grammar > file in the source code > (https://github.com/python/cpython/blob/master/Grammar/Grammar#L119), > and "primary" in the reference manual > (https://docs.python.org/3/reference/expressions.html#primaries). This > means all attribute references ("x.y"), index/slice operations > ("x[...]"), and calls ("x(...)"). > > Note that in almost all cases the "?." 
> context where there is no other operator of lower precedence before or
> after it -- given the above meaning, it doesn't make a lot of sense to
> write "1 + x?.a" because "1 + None" is always an error (and ditto for
> "x?.a + 1"). However it still makes sense to assign such an expression
> to a variable or pass it as an argument to a function.
>
> So you can ignore the preceding four paragraphs: just remember the
> simplified rule (indented and in bold, depending on your email client)
> and let your intuition do the rest. Maybe it can even be simplified more:
>
> *The "?." operator splits the expression in two parts; the second part
> is skipped if the first part is None.
> *
>
> Eventually this *will* become intuitive. The various constraints are all
> naturally imposed by the grammar so you won't have to think about them
> consciously.
>
Would it help if we referred to them collectively as "suffixes"?

Coincidentally, David Mertz's post includes a case where this feature
would shorten the code. In normal Python form his code has:

    if x in stop_on or (end_if and end_if(x)):

With this feature it could be:

    if x in stop_on or end_if?(x):

From p.f.moore at gmail.com Mon Oct 31 15:14:59 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 31 Oct 2016 19:14:59 +0000
Subject: [Python-ideas] Null coalescing operator
In-Reply-To: <77decbf9-04ff-8545-137e-0c5de3c15ec1@mrabarnett.plus.com>
References: <20160910002719.GG22471@ando.pearwood.info>
 <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp>
 <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com>
 <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp>
 <77decbf9-04ff-8545-137e-0c5de3c15ec1@mrabarnett.plus.com>
Message-ID:

On 31 October 2016 at 18:11, MRAB wrote:
> With this feature it could be:
>
>     if x in stop_on or end_if?(x):

I don't think "null-aware function call" is in the current version of
the PEP.
Paul

From python at mrabarnett.plus.com Mon Oct 31 16:27:29 2016
From: python at mrabarnett.plus.com (MRAB)
Date: Mon, 31 Oct 2016 20:27:29 +0000
Subject: [Python-ideas] Null coalescing operator
In-Reply-To:
References: <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com>
 <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp>
 <77decbf9-04ff-8545-137e-0c5de3c15ec1@mrabarnett.plus.com>
Message-ID: <30c70377-cacb-c7ff-1924-fa23e0376f68@mrabarnett.plus.com>

On 2016-10-31 19:14, Paul Moore wrote:
> On 31 October 2016 at 18:11, MRAB wrote:
> > With this feature it could be:
> >
> >     if x in stop_on or end_if?(x):
>
> I don't think "null-aware function call" is in the current version of
> the PEP.

That might be because it's not clear how useful it would be in practice.
If that's the case, then here's a use case.

From random832 at fastmail.com Mon Oct 31 17:44:07 2016
From: random832 at fastmail.com (Random832)
Date: Mon, 31 Oct 2016 17:44:07 -0400
Subject: [Python-ideas] Null coalescing operator
In-Reply-To:
References: <20160910002719.GG22471@ando.pearwood.info>
 <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp>
 <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com>
 <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp>
Message-ID: <1477950247.1280228.773133201.5A3B88B6@webmail.messagingengine.com>

On Mon, Oct 31, 2016, at 13:16, Guido van Rossum wrote:
> For "everything to the right" it would seem we have some freedom: e.g.
> if we have "foo.bar?.baz(bletch)" is the call included? The answer is
> yes -- the concept we're after here is named "trailer" in the Grammar
> file in the source code (
> https://github.com/python/cpython/blob/master/Grammar/Grammar#L119),
> and "primary" in the reference manual (
> https://docs.python.org/3/reference/expressions.html#primaries). This
> means all attribute references ("x.y"), index/slice operations
> ("x[...]"), and calls ("x(...)").

One thing that I think I touched on in an earlier iteration of this
discussion but hasn't been revisited is: what's the AST going to look
like?

Right now, foo.bar.baz(bletch) is Call(Attribute(Attribute(Name('foo'),
'bar'), 'baz'), [Name('bletch')]), which is identical to
(foo.bar).baz(bletch) or (foo.bar.baz)(bletch). These are treated,
essentially, as postfix operators, where you can parenthesize any left
part of the expression and leave its meaning [and its AST] unchanged.

Is the AST going to be unchanged, leading to the conclusion that the
short-circuiting in (foo?.bar).baz will "reach outside of" the
parentheses, and relying on the fact that wanting to do that with None
is a silly thing to do in almost all cases? Or is there going to be a
new kind of AST that is sequential rather than recursive in how it
represents trailer/primary expressions?

From guido at python.org Mon Oct 31 17:55:59 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 31 Oct 2016 14:55:59 -0700
Subject: [Python-ideas] Null coalescing operator
In-Reply-To: <1477950247.1280228.773133201.5A3B88B6@webmail.messagingengine.com>
References: <20160910002719.GG22471@ando.pearwood.info>
 <22484.14715.118314.556074@turnbull.sk.tsukuba.ac.jp>
 <1473527686.2634873.721679809.56D51639@webmail.messagingengine.com>
 <22548.828.996348.657305@turnbull.sk.tsukuba.ac.jp>
 <1477950247.1280228.773133201.5A3B88B6@webmail.messagingengine.com>
Message-ID:

Obviously the AST needs to be changed. How? I dunno. Sounds like you have
some ideas. :-)

On Mon, Oct 31, 2016 at 2:44 PM, Random832 wrote:

> On Mon, Oct 31, 2016, at 13:16, Guido van Rossum wrote:
> > For "everything to the right" it would seem we have some freedom: e.g.
> > if we have "foo.bar?.baz(bletch)" is the call included? The answer is
> > yes -- the concept we're after here is named "trailer" in the Grammar
> > file in the source code (
> > https://github.com/python/cpython/blob/master/Grammar/Grammar#L119),
> > and "primary" in the reference manual (
> > https://docs.python.org/3/reference/expressions.html#primaries). This
> > means all attribute references ("x.y"), index/slice operations
> > ("x[...]"), and calls ("x(...)").
>
> One thing that I think I touched on in an earlier iteration of this
> discussion but hasn't been revisited is: what's the AST going to look
> like?
>
> Right now, foo.bar.baz(bletch) is Call(Attribute(Attribute(Name('foo'),
> 'bar'), 'baz'), [Name('bletch')]), which is identical to
> (foo.bar).baz(bletch) or (foo.bar.baz)(bletch). These are treated,
> essentially, as postfix operators, where you can parenthesize any left
> part of the expression and leave its meaning [and its AST] unchanged.
>
> Is the AST going to be unchanged, leading to the conclusion that the
> short-circuiting in (foo?.bar).baz will "reach outside of" the
> parentheses, and relying on the fact that wanting to do that with None
> is a silly thing to do in almost all cases? Or is there going to be a
> new kind of AST that is sequential rather than recursive in how it
> represents trailer/primary expressions?
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
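For readers following the AST question, the nesting Random832 describes can
be checked directly with the ast module as it exists today. This is only an
illustrative sketch: there is no ?. support anywhere yet, and the exact
ast.dump output differs slightly between Python versions:

    import ast

    # CPython currently represents foo.bar.baz(bletch) as a Call whose func
    # is an Attribute wrapping another Attribute wrapping a Name.
    tree = ast.parse("foo.bar.baz(bletch)", mode="eval")
    print(ast.dump(tree.body))
    # Call(func=Attribute(value=Attribute(value=Name(id='foo', ...),
    #                                     attr='bar', ...), attr='baz', ...),
    #      args=[Name(id='bletch', ...)], ...)

    # Parentheses around any left part never appear in the tree, which is the
    # "postfix operator" behaviour described above.
    assert ast.dump(ast.parse("(foo.bar).baz(bletch)", mode="eval")) == \
           ast.dump(ast.parse("foo.bar.baz(bletch)", mode="eval"))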
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Oct 31 19:33:29 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 31 Oct 2016 16:33:29 -0700 Subject: [Python-ideas] More user-friendly version for string.translate() In-Reply-To: References: <20161025023704.GD15983@ando.pearwood.info> <22543.37392.571089.528253@turnbull.sk.tsukuba.ac.jp> <22544.64697.310751.462520@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Oct 28, 2016 at 7:28 AM, Terry Reedy wrote: > >>> s = 'kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf' > >>> s2 = ''.join(c for c in s if c in set('abc')) > pretty slick -- but any hope of it being as fast as a C implemented method? for example, with a 1000 char string: In [59]: % timeit string.translate(table) 100000 loops, best of 3: 3.62 ?s per loop In [60]: % timeit ''.join(c for c in string if c in set(letters)) 1000 loops, best of 3: 1.14 ms per loop so the translate() method is about 300 times faster in this case. (and it used a defaultdict with a None factory, which is probably a bit slower than a pure C implementation might be. I've always figured that Python's rich string methods provided two things: 1) single method call to do common things 2) nice fast, pure C performance so I think a "keep these" method would help with both of these goals. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: