From rhodri at kynesim.co.uk Thu Nov 1 08:50:51 2018 From: rhodri at kynesim.co.uk (Rhodri James) Date: Thu, 1 Nov 2018 12:50:51 +0000 Subject: [Python-ideas] Allow Context Managers to Support Suspended Execution In-Reply-To: References: Message-ID: On 01/11/2018 02:52, David Allemang wrote: > I do not think there is currently a good way for Context Managers to > support suspended execution, as in await or yield. Both of these > instructions cause the interpreter to leave the with block, yet no > indication of this (temporary) exit or subsequent re-entrance is given > to the context manager. If the intent of a Context Manager is to say > "no matter how this block is entered or exited, the context will be > correctly maintained", then this needs to be possible. I think you're going to have to justify this a bit more. From my point of view, yielding does not leave the with block in any meaningful sense. Indeed I'd be quite hacked off with a file context manager that was so inefficient as to close the file on yielding a line, only to have to re-open and seek when it got control back. -- Rhodri James *-* Kynesim Ltd From cspealma at redhat.com Thu Nov 1 09:39:54 2018 From: cspealma at redhat.com (Calvin Spealman) Date: Thu, 1 Nov 2018 09:39:54 -0400 Subject: [Python-ideas] Allow Context Managers to Support Suspended Execution In-Reply-To: References: Message-ID: I'm very curious about the idea, but can't come up with any use cases based just one your explanation. Maybe you could give some examples where this would be useful? In particular, what are some cases that are really hard to handle now and how would those cases be improved like this? On Wed, Oct 31, 2018 at 10:53 PM David Allemang wrote: > I do not think there is currently a good way for Context Managers to > support suspended execution, as in await or yield. Both of these > instructions cause the interpreter to leave the with block, yet no > indication of this (temporary) exit or subsequent re-entrance is given > to the context manager. If the intent of a Context Manager is to say > "no matter how this block is entered or exited, the context will be > correctly maintained", then this needs to be possible. > > I would propose magic methods __suspend__ and __resume__ as companions > to the existing __enter__ and __exit__ methods (and their async > variants). __suspend__, if present, would be called upon suspending > execution on an await or yield statement, and __resume__, if present, > would be called when execution is resumed. If __suspend__ or > __resume__ are not present then nothing should be done, so that the > behavior of existing context managers is preserved. > > Here is an example demonstrating the issue with await: > https://gist.github.com/allemangD/bba8dc2d059310623f752ebf65bb6cdc > and one with yield: > https://gist.github.com/allemangD/f2534f16d3a0c642c2cdc02c544e854f > > The context manager used is clearly not thread-safe, and I'm not > actually sure how to approach a thread-safe implementation with the > proposed __suspend__ and __resume__ - but I don't believe that > introducing these new methods would create any issues that aren't > already present with __enter__ and __exit__. > > It's worth noting that the context manager used in those examples is, > essentially, identical contextlib's redirect_stdout and decimal's > localcontext managers. Any context manager such as these which modify > global state or the behavior of global functions would benefit from > this. 
It may also make sense to, for example, have the __suspend__ > method on file objects flush buffers without closing the file, similar > to their current __exit__ behavior, but I'm unsure what impact this > would have on performance. > > It is important, though, that yield and await not use __enter__ or > __exit__, as not all context-managers are reusable. I'm unsure what > the best term would be to describe this type of context, as the > documentation for contextlib already gives a different definition for > "reentrant" - I would then call them "suspendable" contexts. It would > make sense to have an @suspendable decorator, probably in contextlib, > to indicate that a context manager can use __enter__ and __exit__ > methods rather than __suspend__ and __resume__. All it would need to > do is define __suspend__ to call __enter__() and __resume__ to call > __exit__(None, None, None). > > It is also important, since __suspend__ and __resume__ would be called > after a context is entered but before it is exited, that __suspend__ > not accept any parameters and that __resume__ not use its return > value. __suspend__ could not be triggered by an exception, only by a > yield or await, and __resume__ could not have its return value named > with as. > > Thanks, > > David > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Nov 1 11:05:37 2018 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Nov 2018 08:05:37 -0700 Subject: [Python-ideas] Allow Context Managers to Support Suspended Execution In-Reply-To: References: Message-ID: Check out the decimal example here: https://www.python.org/dev/peps/pep-0568/ (PEP 568 is deferred, but PEP 567 is implemented in Python 3.7). Those Contexts aren't context managers, but still there's some thought put into swapping contexts out at the boundaries of generators. On Wed, Oct 31, 2018 at 7:54 PM David Allemang wrote: > I do not think there is currently a good way for Context Managers to > support suspended execution, as in await or yield. Both of these > instructions cause the interpreter to leave the with block, yet no > indication of this (temporary) exit or subsequent re-entrance is given > to the context manager. If the intent of a Context Manager is to say > "no matter how this block is entered or exited, the context will be > correctly maintained", then this needs to be possible. > > I would propose magic methods __suspend__ and __resume__ as companions > to the existing __enter__ and __exit__ methods (and their async > variants). __suspend__, if present, would be called upon suspending > execution on an await or yield statement, and __resume__, if present, > would be called when execution is resumed. If __suspend__ or > __resume__ are not present then nothing should be done, so that the > behavior of existing context managers is preserved. 
> > Here is an example demonstrating the issue with await: > https://gist.github.com/allemangD/bba8dc2d059310623f752ebf65bb6cdc > and one with yield: > https://gist.github.com/allemangD/f2534f16d3a0c642c2cdc02c544e854f > > The context manager used is clearly not thread-safe, and I'm not > actually sure how to approach a thread-safe implementation with the > proposed __suspend__ and __resume__ - but I don't believe that > introducing these new methods would create any issues that aren't > already present with __enter__ and __exit__. > > It's worth noting that the context manager used in those examples is, > essentially, identical contextlib's redirect_stdout and decimal's > localcontext managers. Any context manager such as these which modify > global state or the behavior of global functions would benefit from > this. It may also make sense to, for example, have the __suspend__ > method on file objects flush buffers without closing the file, similar > to their current __exit__ behavior, but I'm unsure what impact this > would have on performance. > > It is important, though, that yield and await not use __enter__ or > __exit__, as not all context-managers are reusable. I'm unsure what > the best term would be to describe this type of context, as the > documentation for contextlib already gives a different definition for > "reentrant" - I would then call them "suspendable" contexts. It would > make sense to have an @suspendable decorator, probably in contextlib, > to indicate that a context manager can use __enter__ and __exit__ > methods rather than __suspend__ and __resume__. All it would need to > do is define __suspend__ to call __enter__() and __resume__ to call > __exit__(None, None, None). > > It is also important, since __suspend__ and __resume__ would be called > after a context is entered but before it is exited, that __suspend__ > not accept any parameters and that __resume__ not use its return > value. __suspend__ could not be triggered by an exception, only by a > yield or await, and __resume__ could not have its return value named > with as. > > Thanks, > > David > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Nov 1 11:40:44 2018 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 1 Nov 2018 11:40:44 -0400 Subject: [Python-ideas] Allow Context Managers to Support Suspended Execution In-Reply-To: References: Message-ID: Yep, PEP 567 addresses this for coroutines, so David's first example is covered; here's a link to the fixed version: [1] The proposal to add __suspend__ and __resume__ is very similar to PEP 521 which was withdrawn. PEP 568 (which needs to be properly updated) is the way to go if we want to address this issue for generators. [1] https://gist.github.com/allemangD/bba8dc2d059310623f752ebf65bb6cdc#gistcomment-2748803 Yury On Thu, Nov 1, 2018 at 11:06 AM Guido van Rossum wrote: > > Check out the decimal example here: https://www.python.org/dev/peps/pep-0568/ (PEP 568 is deferred, but PEP 567 is implemented in Python 3.7). > > Those Contexts aren't context managers, but still there's some thought put into swapping contexts out at the boundaries of generators. 
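To make that concrete, here is a minimal sketch using PEP 567's
contextvars (Python 3.7+; the variable and the values are made up):

    import asyncio
    import contextvars

    # Each asyncio task runs in a copy of the current context, so a value
    # set in one task stays local to that task across suspension points.
    precision = contextvars.ContextVar("precision", default=2)

    async def worker(n):
        precision.set(n)        # visible only in this task's context
        await asyncio.sleep(0)  # suspension point; other tasks run here
        return precision.get()  # still n

    async def main():
        print(await asyncio.gather(*(worker(n) for n in range(3))))

    asyncio.run(main())  # prints [0, 1, 2]
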
> > On Wed, Oct 31, 2018 at 7:54 PM David Allemang wrote: >> >> I do not think there is currently a good way for Context Managers to >> support suspended execution, as in await or yield. Both of these >> instructions cause the interpreter to leave the with block, yet no >> indication of this (temporary) exit or subsequent re-entrance is given >> to the context manager. If the intent of a Context Manager is to say >> "no matter how this block is entered or exited, the context will be >> correctly maintained", then this needs to be possible. >> >> I would propose magic methods __suspend__ and __resume__ as companions >> to the existing __enter__ and __exit__ methods (and their async >> variants). __suspend__, if present, would be called upon suspending >> execution on an await or yield statement, and __resume__, if present, >> would be called when execution is resumed. If __suspend__ or >> __resume__ are not present then nothing should be done, so that the >> behavior of existing context managers is preserved. >> >> Here is an example demonstrating the issue with await: >> https://gist.github.com/allemangD/bba8dc2d059310623f752ebf65bb6cdc >> and one with yield: >> https://gist.github.com/allemangD/f2534f16d3a0c642c2cdc02c544e854f >> >> The context manager used is clearly not thread-safe, and I'm not >> actually sure how to approach a thread-safe implementation with the >> proposed __suspend__ and __resume__ - but I don't believe that >> introducing these new methods would create any issues that aren't >> already present with __enter__ and __exit__. >> >> It's worth noting that the context manager used in those examples is, >> essentially, identical contextlib's redirect_stdout and decimal's >> localcontext managers. Any context manager such as these which modify >> global state or the behavior of global functions would benefit from >> this. It may also make sense to, for example, have the __suspend__ >> method on file objects flush buffers without closing the file, similar >> to their current __exit__ behavior, but I'm unsure what impact this >> would have on performance. >> >> It is important, though, that yield and await not use __enter__ or >> __exit__, as not all context-managers are reusable. I'm unsure what >> the best term would be to describe this type of context, as the >> documentation for contextlib already gives a different definition for >> "reentrant" - I would then call them "suspendable" contexts. It would >> make sense to have an @suspendable decorator, probably in contextlib, >> to indicate that a context manager can use __enter__ and __exit__ >> methods rather than __suspend__ and __resume__. All it would need to >> do is define __suspend__ to call __enter__() and __resume__ to call >> __exit__(None, None, None). >> >> It is also important, since __suspend__ and __resume__ would be called >> after a context is entered but before it is exited, that __suspend__ >> not accept any parameters and that __resume__ not use its return >> value. __suspend__ could not be triggered by an exception, only by a >> yield or await, and __resume__ could not have its return value named >> with as. 
>> >> Thanks, >> >> David >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Yury From allemang.d at gmail.com Thu Nov 1 13:11:38 2018 From: allemang.d at gmail.com (David Allemang) Date: Thu, 1 Nov 2018 13:11:38 -0400 Subject: [Python-ideas] Allow Context Managers to Support Suspended Execution In-Reply-To: References: Message-ID: Yes, so PEP 512 is exactly what I was suggesting. My apologies for not finding it before sending this. So, then, PEP 567 solves the issue for coroutines and PEP 568 would solve it for generators as well? On Thu, Nov 1, 2018, 11:40 AM Yury Selivanov Yep, PEP 567 addresses this for coroutines, so David's first example > is covered; here's a link to the fixed version: [1] > > The proposal to add __suspend__ and __resume__ is very similar to PEP > 521 which was withdrawn. PEP 568 (which needs to be properly updated) > is the way to go if we want to address this issue for generators. > > [1] > https://gist.github.com/allemangD/bba8dc2d059310623f752ebf65bb6cdc#gistcomment-2748803 > > Yury > On Thu, Nov 1, 2018 at 11:06 AM Guido van Rossum wrote: > > > > Check out the decimal example here: > https://www.python.org/dev/peps/pep-0568/ (PEP 568 is deferred, but PEP > 567 is implemented in Python 3.7). > > > > Those Contexts aren't context managers, but still there's some thought > put into swapping contexts out at the boundaries of generators. > > > > On Wed, Oct 31, 2018 at 7:54 PM David Allemang > wrote: > >> > >> I do not think there is currently a good way for Context Managers to > >> support suspended execution, as in await or yield. Both of these > >> instructions cause the interpreter to leave the with block, yet no > >> indication of this (temporary) exit or subsequent re-entrance is given > >> to the context manager. If the intent of a Context Manager is to say > >> "no matter how this block is entered or exited, the context will be > >> correctly maintained", then this needs to be possible. > >> > >> I would propose magic methods __suspend__ and __resume__ as companions > >> to the existing __enter__ and __exit__ methods (and their async > >> variants). __suspend__, if present, would be called upon suspending > >> execution on an await or yield statement, and __resume__, if present, > >> would be called when execution is resumed. If __suspend__ or > >> __resume__ are not present then nothing should be done, so that the > >> behavior of existing context managers is preserved. > >> > >> Here is an example demonstrating the issue with await: > >> https://gist.github.com/allemangD/bba8dc2d059310623f752ebf65bb6cdc > >> and one with yield: > >> https://gist.github.com/allemangD/f2534f16d3a0c642c2cdc02c544e854f > >> > >> The context manager used is clearly not thread-safe, and I'm not > >> actually sure how to approach a thread-safe implementation with the > >> proposed __suspend__ and __resume__ - but I don't believe that > >> introducing these new methods would create any issues that aren't > >> already present with __enter__ and __exit__. 
> >>
> >> It's worth noting that the context manager used in those examples is,
> >> essentially, identical to contextlib's redirect_stdout and decimal's
> >> localcontext managers. Any context manager such as these which modify
> >> global state or the behavior of global functions would benefit from
> >> this. It may also make sense to, for example, have the __suspend__
> >> method on file objects flush buffers without closing the file, similar
> >> to their current __exit__ behavior, but I'm unsure what impact this
> >> would have on performance.
> >>
> >> It is important, though, that yield and await not use __enter__ or
> >> __exit__, as not all context-managers are reusable. I'm unsure what
> >> the best term would be to describe this type of context, as the
> >> documentation for contextlib already gives a different definition for
> >> "reentrant" - I would then call them "suspendable" contexts. It would
> >> make sense to have an @suspendable decorator, probably in contextlib,
> >> to indicate that a context manager can use __enter__ and __exit__
> >> methods rather than __suspend__ and __resume__. All it would need to
> >> do is define __suspend__ to call __enter__() and __resume__ to call
> >> __exit__(None, None, None).
> >>
> >> It is also important, since __suspend__ and __resume__ would be called
> >> after a context is entered but before it is exited, that __suspend__
> >> not accept any parameters and that __resume__ not use its return
> >> value. __suspend__ could not be triggered by an exception, only by a
> >> yield or await, and __resume__ could not have its return value named
> >> with as.
> >>
> >> Thanks,
> >>
> >> David
> >> _______________________________________________
> >> Python-ideas mailing list
> >> Python-ideas at python.org
> >> https://mail.python.org/mailman/listinfo/python-ideas
> >> Code of Conduct: http://python.org/psf/codeofconduct/
> >
> >
> > --
> > --Guido van Rossum (python.org/~guido)
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
>
> --
> Yury

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com  Thu Nov 1 15:36:19 2018
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 1 Nov 2018 21:36:19 +0200
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: <20181031120710.0824d82d@fsol>
References: <20181031120710.0824d82d@fsol>
Message-ID: 

31.10.18 13:07, Antoine Pitrou wrote:
> l.pop(default=...) has the potential to be multi-thread-safe, while
> your alternatives haven't.

The multi-thread-safe alternative is:

    try:
        value = l.pop()
    except IndexError:
        value = default

From storchaka at gmail.com  Thu Nov 1 15:45:26 2018
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 1 Nov 2018 21:45:26 +0200
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: <20181031120813.69b9a23b@fsol>
References: <20181031120813.69b9a23b@fsol>
Message-ID: 

31.10.18 13:08, Antoine Pitrou wrote:
> +1 from me.  dict.pop() already has an optional default.  This is a
> straight-forward improvement to the API and no Python programmer will
> be surprised.

list.pop() corresponds to two dict methods. With an argument it
corresponds to dict.pop(). But there are differences: dict.pop() called
repeatedly with the same key will raise an error (or return the default),
while list.pop() will likely return another item.
Without an argument it corresponds to dict.popitem(), which doesn't have
an optional default.

From ethan at stoneleaf.us  Thu Nov 1 20:12:19 2018
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 1 Nov 2018 17:12:19 -0700
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
Message-ID: <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us>

On 10/31/2018 02:29 PM, Chris Angelico wrote:

> Exactly how a team of core devs can make unified
> decisions is a little up in the air at the moment

I wouldn't worry too much about it.  I don't think we have ever made
entirely unified decisions.

--
~Ethan~

From rosuav at gmail.com  Thu Nov 1 20:15:32 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 2 Nov 2018 11:15:32 +1100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us>
References: <20181031010851.GC3817@ando.pearwood.info>
 <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us>
Message-ID: 

On Fri, Nov 2, 2018 at 11:12 AM Ethan Furman wrote:
>
> On 10/31/2018 02:29 PM, Chris Angelico wrote:
>
> > Exactly how a team of core devs can make unified
> > decisions is a little up in the air at the moment
>
> I wouldn't worry too much about it.  I don't think we have ever made
> entirely unified decisions.
>

LOL, there is that. But somehow, a single decision has to be made:
merge or don't merge? And getting a group of people to the point of
making a single decision is the bit that's up in the air.

ChrisA

From robertve92 at gmail.com  Thu Nov 1 20:18:26 2018
From: robertve92 at gmail.com (Robert Vanden Eynde)
Date: Fri, 2 Nov 2018 01:18:26 +0100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
 <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us>
Message-ID: 

Just an English vocabulary question: what do you mean by "being in the air
at the moment"? Like, is that a subject that a lot of people in here like
to talk about?

Yes, to merge or not to merge, but people can UpVote/DownVote can't they? :D

On Fri, Nov 2, 2018 at 01:15, Chris Angelico wrote:

> On Fri, Nov 2, 2018 at 11:12 AM Ethan Furman wrote:
> >
> > On 10/31/2018 02:29 PM, Chris Angelico wrote:
> >
> > > Exactly how a team of core devs can make unified
> > > decisions is a little up in the air at the moment
> >
> > I wouldn't worry too much about it.  I don't think we have ever made
> > entirely unified decisions.
> >
>
> LOL, there is that. But somehow, a single decision has to be made:
> merge or don't merge? And getting a group of people to the point of
> making a single decision is the bit that's up in the air.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Thu Nov 1 20:22:12 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 2 Nov 2018 11:22:12 +1100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
 <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us>
Message-ID: 

On Fri, Nov 2, 2018 at 11:19 AM Robert Vanden Eynde wrote:
>
> Just an English vocabulary question: what do you mean by "being in the air at the moment"?
> Like, is that a subject that a lot of people in here like to talk about?
"Up in the air" means uncertain, subject to change. https://idioms.thefreedictionary.com/up+in+the+air In this case, the governance model for the Python language is being discussed. > Yes, to merge or not to merge, but people can UpVote/DownVote can't they ? :D Upvotes and downvotes don't mean anything. So, sure! It's like upvoting or downvoting one of your country's laws... nobody's stopping you (at least, I hope you live in a country where you're allowed to express your likes and dislikes), but it doesn't change anything unless you're one of the handful of people who actually make the decision. ChrisA From robertve92 at gmail.com Thu Nov 1 20:26:06 2018 From: robertve92 at gmail.com (Robert Vanden Eynde) Date: Fri, 2 Nov 2018 01:26:06 +0100 Subject: [Python-ideas] Add "default" kwarg to list.pop() In-Reply-To: References: <20181031010851.GC3817@ando.pearwood.info> <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us> Message-ID: > > In this case, the governance model for the Python language is being > discussed. > This was the info I was missing, where is it discussed ? Not only on this list I assume ^^ > Upvotes and downvotes don't mean anything. [...] > Yes, that's why random people wouldn't vote. But like, voting between like the 10 core devs where they all have the same importance, that does help for choosing "to merge or not to merge", isn't it ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Nov 1 20:28:19 2018 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 2 Nov 2018 11:28:19 +1100 Subject: [Python-ideas] Add "default" kwarg to list.pop() In-Reply-To: References: <20181031010851.GC3817@ando.pearwood.info> <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us> Message-ID: On Fri, Nov 2, 2018 at 11:26 AM Robert Vanden Eynde wrote: >> >> In this case, the governance model for the Python language is being discussed. > > > This was the info I was missing, where is it discussed ? Not only on this list I assume ^^ There are a number of PEPs in the 8000s that would be worth reading. >> Upvotes and downvotes don't mean anything. [...] > > > Yes, that's why random people wouldn't vote. > But like, voting between like the 10 core devs where they all have the same importance, > that does help for choosing "to merge or not to merge", isn't it ? That is just one of the possible options - that decisions are made by vote. ChrisA From robertve92 at gmail.com Thu Nov 1 20:30:38 2018 From: robertve92 at gmail.com (Robert Vanden Eynde) Date: Fri, 2 Nov 2018 01:30:38 +0100 Subject: [Python-ideas] Add "default" kwarg to list.pop() In-Reply-To: References: <20181031010851.GC3817@ando.pearwood.info> <0f52b26d-9e9f-601c-3301-5ba7daadb19c@stoneleaf.us> Message-ID: > > There are a number of PEPs in the 8000s that would be worth reading. > Will read that *? l'occaz*, closing the disgression now ^^ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashafer01 at gmail.com Thu Nov 1 21:06:54 2018 From: ashafer01 at gmail.com (Alex Shafer) Date: Thu, 1 Nov 2018 19:06:54 -0600 Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon Message-ID: I'd like to propose an addition to `dict` but I'm not necessarily proposing what's written here as the API. 
When I initially saw the need for this myself, I hastily wrote it as:

    def setdefault_call(a_dict, key, default_func):
        try:
            return a_dict[key]
        except KeyError:
            default = default_func()
            a_dict[key] = default
            return default

If it's not clear, the purpose is to eliminate the overhead of creating
an empty list or similar in situations like this:

    d = {}
    for i in range(1000000):  # some large loop
        l = d.setdefault(somekey, [])
        l.append(somevalue)

    # instead...

    for i in range(1000000):
        l = d.setdefault_call(somekey, list)
        l.append(somevalue)

One potential drawback I see to the concept is that I think there will be
a need to explicitly say "no arguments can get passed into this call".
Otherwise users may defeat the purpose with constructions like this:

    d.setdefault_call("foo", list, ["default value"])

I'd mainly like feedback on this concept overall, and if it's liked,
perhaps an API discussion to follow. Thanks!

PS

Other APIs I've considered for this are a new keyword argument to the
existing `setdefault()`, or perhaps more radically for Python, a new
keyword argument to the `dict()` constructor that would get called as an
implicit default for `setdefault()` and perhaps used in other scenarios
(essentially defining a type for dict values).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Thu Nov 1 21:12:45 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 2 Nov 2018 12:12:45 +1100
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: 
References: 
Message-ID: 

On Fri, Nov 2, 2018 at 12:07 PM Alex Shafer wrote:
> Other APIs I've considered for this are a new keyword argument to the
existing `setdefault()`, or perhaps more radically for Python, a new
keyword argument to the `dict()` constructor that would get called as an
implicit default for `setdefault()` and perhaps used in other scenarios
(essentially defining a type for dict values).
>

The time machine has been put to good use here. Are you aware of
__missing__ and collections.defaultdict? You just create a defaultdict
with a callable (very common to use a class like "list"), and any time
you try to use something that's missing, it'll call that to generate a
value.

    from collections import defaultdict
    d = defaultdict(list)
    for category, item in some_stuff:
        d[category].append(item)

Easy way to group things into their categories.

ChrisA

From amit.mixie at gmail.com  Thu Nov 1 21:13:39 2018
From: amit.mixie at gmail.com (Amit Green)
Date: Thu, 1 Nov 2018 21:13:39 -0400
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: 
References: 
Message-ID: 

I use this a lot in my code. Since `setdefault_call` does not exist, here
is how I do it:

    d = {}

    lookup_d = d.get
    provide_d = d.setdefault

    for i in range(1000000):  # some large loop
        l = (lookup_d(somekey)) or (provide_d(somekey, []))
        l.append(somevalue)

I am not arguing for or against `.setdefault_call` -- I'm just providing
information: I use the referenced behavior hundreds of times in my code.

My solution of using `lookup_d(...) or provide_d(...)` is obviously
inefficient in that it has to do two dictionary lookups (in the case that
the `lookup_d` fails). A `setdefault_call` would be more efficient, though
having to create a lambda function might offset this efficiency.

So the key issue is readability, not efficiency.
On Thu, Nov 1, 2018 at 9:07 PM Alex Shafer wrote: > I'd like to propose an addition to `dict` but I'm not necessarily > proposing what's written here as the API. When I initially saw the need for > this myself, I hastily wrote it as: > > def setdefault_call(a_dict, key, default_func): > try: > return a_dict[key] > except KeyError: > default = default_func() > a_dict[key] = default > return default > > If its not clear, the purpose is to eliminate the overhead of creating an > empty list or similar in situations like this: > > d = {} > for i in range(1000000): # some large loop > l = d.setdefault(somekey, []) > l.append(somevalue) > > # instead... > > for i in range(1000000): > l = d.setdefault_call(somekey, list) > l.append(somevalue) > > One potential drawback I see to the concept is that I think there will be > a need to explicitly say "no arguments can get passed into this call". > Otherwise users may defeat the purpose with constructions like this: > > d.setdefault_call("foo", list, ["default value"]) > > I'd mainly like feedback on this concept overall, and if its liked, > perhaps an API discussion to follow. Thanks! > > PS > > Other APIs I've considered for this are a new keyword argument to the > existing `setdefault()`, or perhaps more radically for Python, a new > keyword argument to the `dict()` constructor that would get called as an > implicit default for `setdefault()` and perhaps used in other scenarios > (essentially defining a type for dict values). > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Thu Nov 1 21:57:15 2018 From: prometheus235 at gmail.com (Nick Timkovich) Date: Thu, 1 Nov 2018 20:57:15 -0500 Subject: [Python-ideas] Add "default" kwarg to list.pop() In-Reply-To: References: <20181031120813.69b9a23b@fsol> Message-ID: Does it make sense to draw some sort of parallel between next(myiterator, default="whatever") and mylist.pop(default="whatever")? They exhaust the iterator/list then start emitting the default argument (if provided). On Thu, Nov 1, 2018 at 2:46 PM Serhiy Storchaka wrote: > 31.10.18 13:08, Antoine Pitrou ????: > > +1 from me. dict.pop() already has an optional default. This is a > > straight-forward improvement to the API and no Python programmer will > > be surprised. > > list.pop() corresponds two dict methods. With argument it corresponds > dict.pop(). But there are differences: dict.pop() called repeatedly with > the same key will raise an error (or return the default), while > list.pop() will likely return other item. Without argument it > corresponds dict.popitem() which doesn't have an optional default. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashafer01 at gmail.com Thu Nov 1 22:07:31 2018 From: ashafer01 at gmail.com (Alex Shafer) Date: Thu, 1 Nov 2018 20:07:31 -0600 Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon In-Reply-To: References: Message-ID: I had actually managed to miss collections.defaultdict! 
I'd like to instead propose that a reference to that be added to the dict.setdefault docs. I can't imagine I'm the only one that has missed this. Date: Fri, 2 Nov 2018 12:12:45 +1100 > From: Chris Angelico > To: python-ideas > Subject: Re: [Python-ideas] dict.setdefault_call(), or API variations > thereupon > Message-ID: > < > CAPTjJmqg_qtK3OfR+4VAaaNa7JXjHjHLpnx6EfEZX5n4tttqCQ at mail.gmail.com> > Content-Type: text/plain; charset="UTF-8" > > On Fri, Nov 2, 2018 at 12:07 PM Alex Shafer wrote: > > Other APIs I've considered for this are a new keyword argument to the > existing `setdefault()`, or perhaps more radically for Python, a new > keyword argument to the `dict()` constructor that would get called as an > implicit default for `setdefault()` and perhaps used in other scenarios > (essentially defining a type for dict values). > > > > The time machine has been put to good use here. Are you aware of > __missing__ and collections.defaultdict? You just create a defaultdict > with a callable (very common to use a class like "list"), and any time > you try to use something that's missing, it'll call that to generate a > value. > > from collections import defaultdict > d = defaultdict(list) > for category, item in some_stuff: > d[category].append(item) > > Easy way to group things into their categories. > > ChrisA > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertve92 at gmail.com Thu Nov 1 22:08:44 2018 From: robertve92 at gmail.com (Robert Vanden Eynde) Date: Fri, 2 Nov 2018 03:08:44 +0100 Subject: [Python-ideas] Add "default" kwarg to list.pop() In-Reply-To: References: <20181031120813.69b9a23b@fsol> Message-ID: > > Does it make sense to draw some sort of parallel between next(myiterator, > default="whatever") and mylist.pop(default="whatever")? They exhaust the > iterator/list then start emitting the default argument (if provided). > Yep that's what I just did in my previous mail. """ I think the same way about set.pop, list.pop. About .index I agree adding default= would make sense but that's not exactly the same thing as the others. """ Being picky: "TypeError: next() takes no keyword arguments", that's next(myierator, "whatever") ;) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Nov 1 22:20:04 2018 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Nov 2018 19:20:04 -0700 Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon In-Reply-To: References: Message-ID: The two are less connected than you seem to think. On Thu, Nov 1, 2018 at 7:08 PM Alex Shafer wrote: > I had actually managed to miss collections.defaultdict! > > I'd like to instead propose that a reference to that be added to the > dict.setdefault docs. I can't imagine I'm the only one that has missed this. 
> > > Date: Fri, 2 Nov 2018 12:12:45 +1100 >> From: Chris Angelico >> To: python-ideas >> Subject: Re: [Python-ideas] dict.setdefault_call(), or API variations >> thereupon >> Message-ID: >> < >> CAPTjJmqg_qtK3OfR+4VAaaNa7JXjHjHLpnx6EfEZX5n4tttqCQ at mail.gmail.com> >> Content-Type: text/plain; charset="UTF-8" > > >> >> On Fri, Nov 2, 2018 at 12:07 PM Alex Shafer wrote: >> > Other APIs I've considered for this are a new keyword argument to the >> existing `setdefault()`, or perhaps more radically for Python, a new >> keyword argument to the `dict()` constructor that would get called as an >> implicit default for `setdefault()` and perhaps used in other scenarios >> (essentially defining a type for dict values). >> > >> >> The time machine has been put to good use here. Are you aware of >> __missing__ and collections.defaultdict? You just create a defaultdict >> with a callable (very common to use a class like "list"), and any time >> you try to use something that's missing, it'll call that to generate a >> value. >> >> from collections import defaultdict >> d = defaultdict(list) >> for category, item in some_stuff: >> d[category].append(item) >> >> Easy way to group things into their categories. >> >> ChrisA >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertve92 at gmail.com Thu Nov 1 22:23:05 2018 From: robertve92 at gmail.com (Robert Vanden Eynde) Date: Fri, 2 Nov 2018 03:23:05 +0100 Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon In-Reply-To: References: Message-ID: > > The two are less connected than you seem to think. > Really ? What's the use mainstream use cases for setdefault ? I was often in the case of Alex. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Nov 1 22:49:57 2018 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Nov 2018 19:49:57 -0700 Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon In-Reply-To: References: Message-ID: On Thu, Nov 1, 2018 at 7:23 PM Robert Vanden Eynde wrote: > The two are less connected than you seem to think. >> > > Really ? What's the use mainstream use cases for setdefault ? > I was often in the case of Alex. > Well, defaultdict configures a default when an instance is created, while setdefault() is used when inserting a value. A major issue IMO with defaultdict is that if you try to *read* a non-existing key it will be inserted. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashafer01 at gmail.com Thu Nov 1 22:58:28 2018 From: ashafer01 at gmail.com (Alex Shafer) Date: Thu, 1 Nov 2018 20:58:28 -0600 Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon In-Reply-To: References: Message-ID: So it actually sounds like having a dict method for performing write operations with a factory function would be a semantic improvement. On Thu, Nov 1, 2018 at 8:50 PM Guido van Rossum wrote: > On Thu, Nov 1, 2018 at 7:23 PM Robert Vanden Eynde > wrote: > >> The two are less connected than you seem to think. >>> >> >> Really ? What's the use mainstream use cases for setdefault ? >> I was often in the case of Alex. 
>>
>
> Well, defaultdict configures a default when an instance is created, while
> setdefault() is used when inserting a value.
>
> A major issue IMO with defaultdict is that if you try to *read* a
> non-existing key it will be inserted.
>
> --
> --Guido van Rossum (python.org/~guido)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Thu Nov 1 23:34:11 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 2 Nov 2018 14:34:11 +1100
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: 
References: 
Message-ID: <20181102033409.GI3817@ando.pearwood.info>

On Thu, Nov 01, 2018 at 08:58:28PM -0600, Alex Shafer wrote:
> So it actually sounds like having a dict method for performing write
> operations with a factory function would be a semantic improvement.

As Chris pointed out, that's what __missing__ does.

py> class MyDict(dict):
...     def __missing__(self, key):
...         return "something"
...
py> d = MyDict(a=1, b=2)
py> d['z']
'something'
py> d
{'a': 1, 'b': 2}

If you want the key to be inserted, do so in the __missing__ method.
Is there something missing (pun not intended) from this existing
functionality?

The only improvement I'd like to see is to remove the need to subclass,
so we could do this:

py> d = {'a': 1}  # Plain ol' regular dict, not a subclass.
py> d.__missing__ = lambda self, key: "something"
Traceback (most recent call last):
  File "", line 1, in
AttributeError: 'dict' object has no attribute '__missing__'

but as you can see, that doesn't work. We'd need to either give every
dict a full __dict__ instance namespace, or a __missing__ slot. Given
how rare it is to use __missing__ I suspect the cost is not worth it.

The bottom line is, if I understand your proposal, the functionality
already exists. All you need do is subclass dict and give it a
__missing__ method which does what you want.

--
Steve

From songofacandy at gmail.com  Thu Nov 1 23:58:37 2018
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 2 Nov 2018 12:58:37 +0900
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031120813.69b9a23b@fsol>
Message-ID: 

On Fri, Nov 2, 2018 at 4:45 AM Serhiy Storchaka wrote:
>
> 31.10.18 13:08, Antoine Pitrou wrote:
> > +1 from me.  dict.pop() already has an optional default.  This is a
> > straight-forward improvement to the API and no Python programmer will
> > be surprised.
>
> list.pop() corresponds to two dict methods. With an argument it
> corresponds to dict.pop(). But there are differences: dict.pop() called
> repeatedly with the same key will raise an error (or return the default),
> while list.pop() will likely return another item. Without an argument it
> corresponds to dict.popitem(), which doesn't have an optional default.
>

I think there is one more important difference between dict and list:
dict has .get(key[, default]), but list doesn't have it.

If we add only `list.pop([default])`, it is tempting to use it even when
the item doesn't need to be removed. An unnecessary destructive change is
bad: it reduces code readability, and it may create hard-to-find bugs.

If this proposal adds `list.get([index[, default]])` too, I am still -0.
I don't know how often it would be useful.
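For concreteness, the non-destructive variant under discussion would
behave like this pure-Python sketch (`list_get` is a hypothetical
stand-in for the proposed method):

    def list_get(l, index, default=None):
        # hypothetical list.get(): return l[index] if it exists, else default
        try:
            return l[index]
        except IndexError:
            return default

    list_get([10, 20], 1)   # -> 20
    list_get([10, 20], 5)   # -> None, and the list is left unchanged
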
Regards,
--
INADA Naoki

From storchaka at gmail.com  Fri Nov 2 04:48:22 2018
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 2 Nov 2018 10:48:22 +0200
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
Message-ID: 

31.10.18 21:23, Robert Vanden Eynde wrote:
> Should I write a PEP even though I know it's going to be rejected
> because the mailing list was not really into it ?

It is better not to do this. PEP 572 was initially written with the
intention to be rejected.

From steve at pearwood.info  Fri Nov 2 06:26:35 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 2 Nov 2018 21:26:35 +1100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031120710.0824d82d@fsol>
Message-ID: <20181102102635.GK3817@ando.pearwood.info>

On Thu, Nov 01, 2018 at 09:36:19PM +0200, Serhiy Storchaka wrote:
> 31.10.18 13:07, Antoine Pitrou wrote:
> >l.pop(default=...) has the potential to be multi-thread-safe, while
> >your alternatives haven't.
>
> The multi-thread-safe alternative is:
>
> try:
>     value = l.pop()
> except IndexError:
>     value = default

That's not an expression, so there are limits to where and when you can
use it. What we need is a helper function that wraps that, called "pop".
And since this seems to be a recurring request going back nearly 20
years now:

https://mail.python.org/pipermail/python-dev/1999-July/000550.html
https://stackoverflow.com/questions/31216428/python-pop-from-empty-list

as is the more general get(list, index, default=None) helper:

https://stackoverflow.com/questions/2574636/getting-a-default-value-on-index-out-of-range-in-python
https://stackoverflow.com/questions/2492087/how-to-get-the-nth-element-of-a-python-list-or-a-default-if-not-available
https://stackoverflow.com/questions/5125619/why-doesnt-list-have-safe-get-method-like-dictionary
https://stackoverflow.com/questions/17721748/default-value-for-out-of-bounds-list-index

we could save people from having to re-invent the wheel over and over
again by adding them to a new module called
"things_that_should_be_list_methods_but_arent.py" *wink*

--
Steve

From steve at pearwood.info  Fri Nov 2 06:59:15 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 2 Nov 2018 21:59:15 +1100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
Message-ID: <20181102105912.GL3817@ando.pearwood.info>

On Fri, Nov 02, 2018 at 10:48:22AM +0200, Serhiy Storchaka wrote:
> 31.10.18 21:23, Robert Vanden Eynde wrote:

> >Should I write a PEP even though I know it's going to be rejected
> >because the mailing list was not really into it ?

I disagree that "the mailing list was not really into it".

So far, I count 12 people who responded to the original post by
Giampaolo. By my count, I see:

* five people in favour;
* three people against, or see no need for it;
* four people I can't tell if they are for or against,
  (possibly neutral?) [1]

I know that adding features isn't decided by majority vote, but it
seems clear to me that there is a substantial set of Python users,
perhaps a majority, who would find this feature useful and more obvious
than the alternatives.

[Serhiy]
> It is better not to do this. PEP 572 was initially written with the
> intention to be rejected.

Sounds like an excellent reason to write a PEP :-)

There are some issues that ought to be addressed:

- The status quo is easy to get wrong:

  # I've written this. More than once.
  L.pop(idx) if idx < len(L) else default

is wrong if there is any chance of idx being negative.

- The more common case of popping from the front of the list is not
thread-safe:

  L.pop() if L else default

- This clever trick is probably thread-safe (I think...) but it is
wasteful and inefficient:

  (L or [default]).pop()

and it isn't obvious how to adapt it efficiently if you need to pop
from an arbitrary index. I came up with this:

  (L[idx:idx+1] or [default]).pop()

but it is doubly wrong.

- The obvious thread-safe EAFP idiom is a try...except statement, so it
needs to be wrapped in a helper function to use it in expressions. That
adds more overhead.

The proposed .get(idx, default=x) and .pop(idx, default=x) signatures
ought to be obvious and unsurprising to any moderately experienced
Python programmer. These aren't complicated APIs.

On the other hand:

- I'm not volunteering to do the work (I don't know enough C to write
a patch). Unless somebody has a patch, we can't expect the core devs
who aren't interested in this feature to write it.

(Hence, status quo wins a stalemate.)

[1] "What makes a man turn neutral? Lust for gold? Power? Or were you
just born with a heart full of neutrality?" -- Captain Zapp Brannigan

--
Steve

From boxed at killingar.net  Fri Nov 2 07:39:15 2018
From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=)
Date: Fri, 2 Nov 2018 12:39:15 +0100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: <20181102105912.GL3817@ando.pearwood.info>
References: <20181031010851.GC3817@ando.pearwood.info>
 <20181102105912.GL3817@ando.pearwood.info>
Message-ID: <30307018-58AD-4823-9338-85A748F6E9CE@killingar.net>

> So far, I count 12 people who responded to the original post by
> Giampaolo. By my count, I see:
>
> * five people in favour;
> * three people against, or see no need for it;
> * four people I can't tell if they are for or against,
>   (possibly neutral?) [1]

For the little it's worth I'm +1 too. This seems like an obvious little
improvement.

/ Anders

From 2QdxY4RzWzUUiLuE at potatochowder.com  Fri Nov 2 08:38:04 2018
From: 2QdxY4RzWzUUiLuE at potatochowder.com (Dan Sommers)
Date: Fri, 2 Nov 2018 08:38:04 -0400
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: <30307018-58AD-4823-9338-85A748F6E9CE@killingar.net>
References: <20181031010851.GC3817@ando.pearwood.info>
 <20181102105912.GL3817@ando.pearwood.info>
 <30307018-58AD-4823-9338-85A748F6E9CE@killingar.net>
Message-ID: <1292a03c-16c8-4276-8052-e441b60fe622@potatochowder.com>

On 11/2/18 7:39 AM, Anders Hovmöller wrote:

>> So far, I count 12 people who responded to the original post by
>> Giampaolo. By my count, I see:
>>
>> * five people in favour;
>> * three people against, or see no need for it;
>> * four people I can't tell if they are for or against,
>>   (possibly neutral?) [1]
>
> For the little it's worth I'm +1 too. This seems like an obvious little
> improvement.

I'm having a hard time seeing a real use case. Giampaolo's original
post contains this link:

https://github.com/giampaolo/psutil/blob/d8b05151e65f9348aff9b58da977abd8cacb2127/psutil/_pslinux.py#L1068

Yuck (from an aesthetics standpoint, not a functional standpoint). :-)

There's an impedance mismatch between the data, which is structured and
has changed apparently arbitrarily between Linux releases, and the
return value of string.split, which is an ordered collection. This code
effectively hides that mismatch and yields Python tuples, which
represent structured data.
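Schematically, the guards in that code have this shape (the index and
names here are illustrative, not the actual psutil fields):

    fields = line.split()
    # a positional field that may be absent, depending on the release
    inactive = int(fields[4]) if len(fields) > 4 else None
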
I can certainly see the desire for a simpler solution (for some definition of simpler), but how would adding a default parameter to list.pop make this code any simpler? Dan From philip.martin2007 at gmail.com Fri Nov 2 12:19:52 2018 From: philip.martin2007 at gmail.com (Philip Martin) Date: Fri, 2 Nov 2018 11:19:52 -0500 Subject: [Python-ideas] Serialization of CSV vs. JSON Message-ID: Is there any reason why date, datetime, and UUID objects automatically serialize to default strings using the csv module, but json.dumps throws an error as a default? i.e. import csv import json import io from datetime import date stream = io.StringIO() writer = csv.writer(stream) writer.writerow([date(2018, 11, 2)]) # versus json.dumps(date(2018, 11, 2)) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cspealma at redhat.com Fri Nov 2 12:28:18 2018 From: cspealma at redhat.com (Calvin Spealman) Date: Fri, 2 Nov 2018 12:28:18 -0400 Subject: [Python-ideas] Serialization of CSV vs. JSON In-Reply-To: References: Message-ID: First, this list is not appropriate. You should ask such a question in python-list. Second, JSON is a specific serialization format that explicitly rejects datetime objects in *all* the languages with JSON libraries. You can only use date objects in JSON if you control or understand both serialization and deserialization ends and have an agreed representation. On Fri, Nov 2, 2018 at 12:20 PM Philip Martin wrote: > Is there any reason why date, datetime, and UUID objects automatically > serialize to default strings using the csv module, but json.dumps throws an > error as a default? i.e. > > import csv > import json > import io > from datetime import date > > stream = io.StringIO() > writer = csv.writer(stream) > writer.writerow([date(2018, 11, 2)]) > # versus > json.dumps(date(2018, 11, 2)) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Nov 2 12:31:33 2018 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 2 Nov 2018 17:31:33 +0100 Subject: [Python-ideas] Serialization of CSV vs. JSON In-Reply-To: References: Message-ID: Serialization of those data types is not defined in the JSON standard: https://www.json.org/ so you have to extend the parser/serializers to support them. On 02.11.2018 17:19, Philip Martin wrote: > Is there any reason why date, datetime, and UUID objects automatically > serialize to default strings using the csv module, but json.dumps throws > an error as a default? i.e. > > import csv > import json > import io > from datetime import date > > stream = io.StringIO() > writer = csv.writer(stream) > writer.writerow([date(2018, 11, 2)]) > # versus > json.dumps(date(2018, 11, 2)) > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Nov 02 2018) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/

From chris.barker at noaa.gov  Fri Nov 2 12:52:24 2018
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 2 Nov 2018 09:52:24 -0700
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: <20181102033409.GI3817@ando.pearwood.info>
References: <20181102033409.GI3817@ando.pearwood.info>
Message-ID: 

On Thu, Nov 1, 2018 at 8:34 PM, Steven D'Aprano wrote:

> The bottom line is, if I understand your proposal, the functionality
> already exists. All you need do is subclass dict and give it a
> __missing__ method which does what you want.

or subclass dict and give it a "setdefault_call" method :-)

But as I think Guido was pointing out, the real difference here is that
DefaultDict, or any other subclass, is specifying what the default
callable is for the entire dict, rather than at time of use.

Personally, I'm pretty sure I've only used one default for any given
dict, but I can imagine there are use cases for having different defaults
for the same dict depending on context.

As for the OP's justification:

"""
If it's not clear, the purpose is to eliminate the overhead of creating
an empty list or similar in situations like this:

d = {}
for i in range(1000000): # some large loop
    l = d.setdefault(somekey, [])
    l.append(somevalue)

# instead...

for i in range(1000000):
    l = d.setdefault_call(somekey, list)
    l.append(somevalue)
"""

I presume the point is that in the first case, somekey might often be
the same, and setdefault requires creating an actual empty list even if
the key is already there, whereas case 2 will only create the empty list
if the key is not there.

doing some timing with defaultdict:

In [19]: def setdefault():
    ...:     d = {}
    ...:     somekey = 5
    ...:     for i in range(1000000): # some large loop
    ...:         l = d.setdefault(somekey, [])
    ...:         l.append(i)
    ...:     return d

In [20]: def default_dict():
    ...:     d = defaultdict(list)
    ...:     somekey = 5
    ...:     for i in range(1000000): # some large loop
    ...:         l = d[somekey]
    ...:         l.append(i)
    ...:     return d

In [21]: % timeit setdefault()
185 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [22]: % timeit default_dict()
128 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

so yeah, it's a little more performant, and I suppose if you were using
a more expensive constructor, it would make a lot more difference. But
then, how much is it likely to matter in real use cases -- this was 1
million calls for one key and you got a roughly 45% speed up -- is that
common?

So it seems this would give us slightly better performance than
.setdefault() for the use cases where you are using more than one
default for a given dict.

BTW: +1 for a mention of defaultdict in the dict.setdefault docs -- you
can't do everything with defaultdict that you can with setdefault, but
it is a very common use case.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

-------------- next part --------------
An HTML attachment was scrubbed...
From chris.barker at noaa.gov  Fri Nov  2 13:16:24 2018
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 2 Nov 2018 10:16:24 -0700
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: <20181102105912.GL3817@ando.pearwood.info> References: <20181031010851.GC3817@ando.pearwood.info> <20181102105912.GL3817@ando.pearwood.info> Message-ID:

On Fri, Nov 2, 2018 at 3:59 AM, Steven D'Aprano wrote:

> - I'm not volunteering to do the work (I don't know enough C to write
>   a patch). Unless somebody has a patch, we can't expect the core devs
>   who aren't interested in this feature to write it.
>
> (Hence, status quo wins a stalemate.)

Well, it would be frustrating to have a feature accepted but not
implemented, but the steps are separate. And it wouldn't have to be a
core dev that implements it -- anyone with the C chops (not me :-) )
could do it.

As an example, a good chunk of PEP 485 was implemented by someone else (I
wrote the first prototype, but it was not complete), who I'm pretty sure
is not a core dev. A core dev has to actually merge it, of course, and
that is a bottleneck, but not a show stopper.

-CHB

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov  Fri Nov  2 13:26:58 2018
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 2 Nov 2018 10:26:58 -0700
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

On Fri, Nov 2, 2018 at 9:31 AM, M.-A. Lemburg wrote:

> Serialization of those data types is not defined in the JSON standard:
>
>     https://www.json.org/

That being said, ISO 8601 is a standard for datetime stamps, and a de
facto one for JSON. So building datetime encoding into Python's json
encoder would be pretty useful.

(I would not have any automatic decoding though -- an ISO 8601 string
would still be just a string in JSON.)

Could we have a "pedantic" mode for "fully standard conforming" JSON, and
then add some extensions to the standard?

As another example, I would find it very handy if the json decoder would
respect comments in JSON (I know that they are explicitly not part of the
standard), but they are used in other applications, particularly when
JSON is used as a configuration language.

-CHB

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
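Until such an extension exists, one common workaround for commented JSON
config files is to strip whole-line comments before parsing. A sketch
under that assumption -- the helper name is illustrative, and it only
handles comments that occupy a line by themselves:

    import json
    import re

    # Remove lines whose first non-space characters are "//".  JSON
    # strings cannot span lines, so such a line can never be part of a
    # string value; "//" appearing inside a value is left alone.
    _COMMENT_LINE = re.compile(r'^\s*//.*$', re.MULTILINE)

    def loads_with_comments(text):
        return json.loads(_COMMENT_LINE.sub('', text))

    config = '''
    {
        // how many workers to start
        "workers": 4
    }
    '''
    assert loads_with_comments(config) == {'workers': 4}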
There seems to be a wide gap between reaching for a tool like pandas,
where maybe too much auto-magical parsing and guessing happens, and
wrapping the functionality around the csv module, IMO. I was curious to
see if anyone else had similar opinions, and if so, whether conversation
around what extended functionality would be most fruitful?

On Fri, Nov 2, 2018 at 11:28 AM Calvin Spealman wrote:

> First, this list is not appropriate. You should ask such a question in
> python-list. [...]

From turnbull.stephen.fw at u.tsukuba.ac.jp  Fri Nov  2 13:49:00 2018
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Sat, 3 Nov 2018 02:49:00 +0900
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: References: Message-ID: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp>

Andre Delfino writes:

 > Frequently, while globbing, one needs to work with multiple
 > extensions. I'd like to propose for fnmatch.filter to handle a tuple
 > of patterns (while preserving the single str argument functionality,
 > a la str.endswith), as a first step for glob.(i)glob to accept
 > multiple patterns as well.

This is one of those famous 3-line functions, though:

    import fnmatch
    def multifilter(names, *patterns):
        result = []
        for p in patterns:
            result.extend(fnmatch.filter(names, p))
        return result

It's a 3-line function in 5 lines, OK, but still.

If you're going to improve the glob module, why not use bash or zsh
extended globbing ('**', '{a,b}') as the model?  This is more powerful,
and already familiar to many users.

From rosuav at gmail.com  Fri Nov  2 14:00:44 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 3 Nov 2018 05:00:44 +1100
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> References: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> Message-ID:

On Sat, Nov 3, 2018 at 4:49 AM Stephen J. Turnbull wrote:
>
> This is one of those famous 3-line functions, though:
>
>     import fnmatch
>     def multifilter(names, *patterns):
>         result = []
>         for p in patterns:
>             result.extend(fnmatch.filter(names, p))
>         return result
>
> It's a 3-line function in 5 lines, OK, but still.

And like many "hey it's this easy" demonstrations, that isn't quite
identical, as a single file can match multiple patterns (but shouldn't
be in the result multiple times). Whether that's an important distinction
or not remains to be seen, but I do know of situations where this would
have bitten me.

ChrisA
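For what it's worth, the duplicate problem Chris raises can be handled
while still reporting names in a stable order. A sketch -- it relies on
dicts preserving insertion order (guaranteed from Python 3.7), and the
function name is just illustrative:

    import fnmatch

    def multifilter(names, *patterns):
        # dict.fromkeys de-duplicates while keeping first-seen order,
        # so a file matching several patterns is reported only once.
        return list(dict.fromkeys(
            name
            for pattern in patterns
            for name in fnmatch.filter(names, pattern)))

    names = ['a.py', 'b.txt', 'a.py.txt']
    assert multifilter(names, '*.py*', '*.txt') == ['a.py', 'a.py.txt', 'b.txt']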
From boxed at killingar.net  Fri Nov  2 14:27:04 2018
From: boxed at killingar.net (Anders Hovmöller)
Date: Fri, 2 Nov 2018 19:27:04 +0100
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: References: <20181102033409.GI3817@ando.pearwood.info> Message-ID: <87E85E43-A6E8-4FFA-8534-122E4F134C65@killingar.net>

Just a little improvement: you don't need the l local variable, you can
just call append:

    d.setdefault(foo, []).append(bar)

And correspondingly:

    d[foo].append(bar)

> On 2 Nov 2018, at 17:52, Chris Barker via Python-ideas wrote:
>
> But as I think Guido was pointing out, the real difference here is that
> defaultdict, or any other subclass, is specifying what the default
> callable is for the entire dict, rather than at time of use. [...]
From mike at selik.org  Fri Nov  2 14:41:40 2018
From: mike at selik.org (Michael Selik)
Date: Fri, 2 Nov 2018 11:41:40 -0700
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

On Fri, Nov 2, 2018 at 10:31 AM Philip Martin wrote:

> [Why don't] csv writer and DictWriter provide ...
> serialization/deserialization hooks?

Do you have a specific use-case in mind? My intuition is that
comprehensions provide sufficient functionality such that changing the
csv module interface is unnecessary. Unlike JSON, CSV files are easy to
read in a streaming/iterator fashion, so the module doesn't need to
provide a way to intercept values during a holistic encode/decode.

From wes.turner at gmail.com  Fri Nov  2 15:17:25 2018
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 2 Nov 2018 15:17:25 -0400
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

JSON-LD supports datetimes (as e.g. ISO 8601 xsd:datetimes)
https://www.w3.org/TR/json-ld11/#typed-values

jsonpickle (Python, JS, ...) supports datetimes, numpy arrays, pandas
dataframes
https://github.com/jsonpickle/jsonpickle

JSON5 supports comments in JSON.
https://github.com/json5/json5/issues/3

...

Some form of schema is necessary to avoid having to try parsing every
string value as a datetime (and to specify precision: "2018" is not the
same as "2018 00:00:01").

On Friday, November 2, 2018, Chris Barker via Python-ideas
<python-ideas at python.org> wrote:

> That being said, ISO 8601 is a standard for datetime stamps, and a de
> facto one for JSON. So building datetime encoding into Python's json
> encoder would be pretty useful. [...]
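One way to make such a schema explicit is to name the keys that hold
timestamps, so nothing else is ever guessed at. A sketch -- the key names
and helper are illustrative only:

    import json
    from datetime import datetime, timezone

    # Only these keys are decoded as POSIX timestamps.
    DATETIME_KEYS = {'created_at', 'updated_at'}

    def decode_with_schema(pairs):
        return {key: datetime.fromtimestamp(value, tz=timezone.utc)
                     if key in DATETIME_KEYS else value
                for key, value in pairs}

    doc = '{"name": "example", "created_at": 1541167200.5}'
    record = json.loads(doc, object_pairs_hook=decode_with_schema)
    assert record['created_at'].year == 2018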
From steve at pearwood.info  Fri Nov  2 20:05:24 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 3 Nov 2018 11:05:24 +1100
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: References: <20181102033409.GI3817@ando.pearwood.info> Message-ID: <20181103000524.GP3817@ando.pearwood.info>

On Fri, Nov 02, 2018 at 09:52:24AM -0700, Chris Barker wrote:
> On Thu, Nov 1, 2018 at 8:34 PM, Steven D'Aprano wrote:
>
> > The bottom line is, if I understand your proposal, the functionality
> > already exists. All you need do is subclass dict and give it a
> > __missing__ method which does what you want.
>
> or subclass dict and give it a "setdefault_call" method :-)

Well sure, if we're making up our own methods and calling them anything
we like :-)

The status quo (as I see it):

dict.setdefault:
  - takes an explicit, but eagerly evaluated, default value;

dict.__missing__:
  - requires subclassing to make it work;
  - passes the missing key to the method, so the method can decide
    what value to return;

defaultdict:
  - takes a zero-argument factory function which is unconditionally
    called when the key is missing.

Did I miss any?

What we don't have is a version of setdefault where the default is
evaluated only on need. That would be a great use-case for Call-By-Name
semantics and thunks, if Python supported such :-)

(That's just a half-baked thought, not a concrete proposal.)

> But as I think Guido was pointing out, the real difference here is that
> defaultdict, or any other subclass, is specifying what the default
> callable is for the entire dict, rather than at time of use.

As you show below, a default callable for the dict is precisely the
use-case the OP gives:

    l = d.setdefault_call(somekey, list)

would be equivalent to defaultdict(list) and l = d[somekey]. (I think.
Have I missed something?)

Nevertheless, Guido's point is reasonable -- if it comes up in practice
often enough to care.

[...]

> As for the OP's justification:
>
> """
> If it's not clear, the purpose is to eliminate the overhead of creating
> an empty list or similar in situations like this:
>
>     d = {}
>     for i in range(1000000):  # some large loop
>         l = d.setdefault(somekey, [])
>         l.append(somevalue)
>
>     # instead...
>
>     for i in range(1000000):
>         l = d.setdefault_call(somekey, list)
>         l.append(somevalue)
> """

Are we sure that the overhead is significantly more than the cost of the
name lookup of "list" and the expense of calling it? You do demonstrate a
speed difference with defaultdict (thanks for doing the timing tests) but
the situation isn't precisely comparable to the proposed method, since
you aren't looking up the name "list" each time through the outer loop.

Could construction of the empty list be optimized more? That might reduce
the benefit even further (at least for the given case, but not for the
general case of an arbitrarily expensive default).

We keep coming up against the issue of *eager evaluation* versus *delayed
evaluation*, and I can't help feeling that rather than solving this
problem on an ad-hoc basis each time it comes up, maybe we really do need
a way to tell the interpreter "delay evaluating this expression until
needed". Then we could use it anywhere it was important, without having
to create a plethora of special-case setdefault_call() methods and the
like.

-- 
Steve
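The missing case -- a call-site default evaluated only on need -- can at
least be approximated today by passing a zero-argument callable
explicitly. A sketch of such a helper (the name setdefault_lazy is
illustrative, not a proposal):

    def setdefault_lazy(d, key, thunk):
        # Like dict.setdefault, but the default is a callable that
        # only runs when the key is actually missing.
        try:
            return d[key]
        except KeyError:
            value = d[key] = thunk()
            return value

    d = {'a': 1}
    setdefault_lazy(d, 'a', lambda: 1 / 0)   # thunk is never evaluated
    setdefault_lazy(d, 'b', list).append(2)  # thunk called exactly once
    assert d == {'a': 1, 'b': [2]}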
From boxed at killingar.net  Fri Nov  2 20:15:04 2018
From: boxed at killingar.net (Anders Hovmöller)
Date: Sat, 3 Nov 2018 01:15:04 +0100
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: <20181103000524.GP3817@ando.pearwood.info> References: <20181102033409.GI3817@ando.pearwood.info> <20181103000524.GP3817@ando.pearwood.info> Message-ID: <98AF148B-86A0-4C76-9BC8-0E07F5DCDC6B@killingar.net>

> defaultdict:
>   - takes a zero-argument factory function which is unconditionally
>     called when the key is missing.
>
> Did I miss any?
>
> What we don't have is a version of setdefault where the default is
> evaluated only on need. That would be a great use-case for Call-By-Name
> semantics and thunks, if Python supported such :-)

Could you explain what the difference is between defaultdict's "factory
which is unconditionally called when the key is missing" and "the default
is evaluated only on need"? To me it seems that "unconditionally called
when the key is missing" is a complex way of saying "called only when
needed". I must be missing some nuance here.

/ Anders

From mike at selik.org  Fri Nov  2 21:34:20 2018
From: mike at selik.org (Michael Selik)
Date: Fri, 2 Nov 2018 18:34:20 -0700
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: <98AF148B-86A0-4C76-9BC8-0E07F5DCDC6B@killingar.net> References: <20181102033409.GI3817@ando.pearwood.info> <20181103000524.GP3817@ando.pearwood.info> <98AF148B-86A0-4C76-9BC8-0E07F5DCDC6B@killingar.net> Message-ID:

On Fri, Nov 2, 2018 at 5:25 PM Anders Hovmöller wrote:

> Could you explain what the difference is between defaultdict's "factory
> which is unconditionally called when the key is missing" and "the
> default is evaluated only on need"?

The distinction was the motivation for this thread: setdefault requires a
constructed default instance as an argument, regardless of whether the
key is missing, whereas defaultdict's factory is only called if
necessary. If the key is present in a defaultdict, no default is
constructed.

From steve at pearwood.info  Fri Nov  2 22:49:11 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 3 Nov 2018 13:49:11 +1100
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: <98AF148B-86A0-4C76-9BC8-0E07F5DCDC6B@killingar.net> References: <20181102033409.GI3817@ando.pearwood.info> <20181103000524.GP3817@ando.pearwood.info> <98AF148B-86A0-4C76-9BC8-0E07F5DCDC6B@killingar.net> Message-ID: <20181103024911.GQ3817@ando.pearwood.info>

On Sat, Nov 03, 2018 at 01:15:04AM +0100, Anders Hovmöller wrote:

> Could you explain what the difference is between defaultdict's "factory
> which is unconditionally called when the key is missing" and "the
> default is evaluated only on need"? [...] I must be missing some
> nuance here.
Consider the use-case where you want to pass a different default value to
the dict each time:

    d.setdefault(key, expensive_function(1, 2, 3))
    d.setdefault(key, expensive_function(4, 8, 16))
    d.setdefault(key, expensive_function(10, 100, 1000))

The expensive function is eagerly evaluated each time you call setdefault,
whether the result is needed or not.

defaultdict won't help, because your factory function takes no arguments:
there's no way to supply arguments for the factory. __missing__ won't
help, because it only receives the key, not arbitrary arguments.

We can of course subclass dict and give it a method with the semantics we
want:

    d.my_setdefault(key, expensive_function, args=(1, 2, 3), kw={})

but it would be nicer and more expressive if we could tell the interpreter
"don't evaluate expensive_function(...) unless you really need it".

Other languages have this -- I believe it is called "Call By Need" or
"Call By Name", depending on the precise details of how it works. I call
it delayed evaluation, and Python already has it, but only in certain
special syntactic forms:

    spam and ...
    spam or ...
    ... if condition else ...

There are others: e.g. the body of functions, including lambda. But
functions are kinda heavyweight to make and build and call.

-- 
Steve

From daveshawley at gmail.com  Sat Nov  3 09:01:44 2018
From: daveshawley at gmail.com (David Shawley)
Date: Sat, 3 Nov 2018 09:01:44 -0400
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

On Nov 2, 2018, at 12:28 PM, Calvin Spealman wrote:

> Second, JSON is a specific serialization format that explicitly rejects
> datetime objects in *all* the languages with JSON libraries. You can
> only use date objects in JSON if you control or understand both
> serialization and deserialization ends and have an agreed
> representation.

I would hardly say that it "rejects datetime objects in *all*
languages..."

Most JavaScript implementations do handle dates, which is a bit telling
for me. For example, the Mozilla reference calls out Date as explicitly
supported [1]. I also ran it through the JavaScript console and repl.it
to make sure that it wasn't a doc glitch [2].

Go also supports serialization of date/times, as shown in this repl.it
session [3]. As does Rust, though Rust doesn't use ISO-8601 [4].

That being said, I'm +1 on adding support for serializing datetime.date
and datetime.datetime *but* I'm -1 on automatically deserializing
anything that looks like an ISO-8601 in json.load*. The asymmetry is the
only thing that kept me from bringing this up previously.

What about implementing this as a protocol? The JavaScript implementation
of JSON.stringify looks for a method named toJSON() when it encounters a
non-primitive type and uses the result for serialization. This would be a
pretty easy lift in json.JSONEncoder.default:

    class JSONEncoder(object):
        def default(self, o):
            if hasattr(o, 'to_json'):
                return o.to_json(self)
            raise TypeError(f'Object of type {o.__class__.__name__} '
                            f'is not JSON serializable')

I would recommend passing the JSONEncoder instance in to ``to_json()`` as
I did in the snippet. This makes serialization much easier for classes
since they do not have to assume a particular set of JSON serialization
options.

Is this something that is PEP-worthy or is a PR with a simple flag to
enable the functionality in JSON encoder enough?

- cheers, dave.

[1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
[2]: https://repl.it/@dave_shawley/OffensiveParallelResource
[3]: https://repl.it/@dave_shawley/EvenSunnyForce
[4]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=73de1454da4ac56900cde37edb0d6c8f
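From the caller's side, the protocol David sketches might look like the
following; Invoice and ProtocolEncoder are illustrative names, not part
of the proposal:

    import json
    from datetime import date

    class Invoice:
        def __init__(self, number, due):
            self.number = number
            self.due = due

        def to_json(self, encoder):
            # Reduce to JSON primitives; receiving the encoder means the
            # object need not assume any serialization options.
            return {'number': self.number, 'due': self.due.isoformat()}

    class ProtocolEncoder(json.JSONEncoder):
        def default(self, o):
            if hasattr(o, 'to_json'):
                return o.to_json(self)
            return super().default(o)

    print(json.dumps(Invoice(42, date(2018, 11, 3)), cls=ProtocolEncoder))
    # -> {"number": 42, "due": "2018-11-03"}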
From rosuav at gmail.com  Sat Nov  3 09:29:32 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 4 Nov 2018 00:29:32 +1100
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

On Sun, Nov 4, 2018 at 12:02 AM David Shawley wrote:
>
> I would hardly say that it "rejects datetime objects in *all*
> languages..."
>
> Most JavaScript implementations do handle dates, which is a bit telling
> for me. For example, the Mozilla reference calls out Date as explicitly
> supported [1]. [...]

I think we need to clarify an important distinction here. JSON, as a
format, does *not* support date/time objects in any way. But JavaScript's
JSON.stringify() function is happy to accept them, and will represent
them as strings.

If the suggestion here is to have json.dumps(datetime.date(2018,11,4))
to return an encoded string, either by natively supporting it, or by
having a protocol which the date object implements, that's fine and
reasonable; but json.loads(s) won't return that date object. So, yes, it
would be asymmetric. I personally don't have a problem with this (though
I also don't have any strong use-cases). Custom encoders and decoders
could do this, with or without symmetry. What would it be like to add a
couple to the json module that can handle these extra types?

ChrisA

From storchaka at gmail.com  Sat Nov  3 09:46:38 2018
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 3 Nov 2018 15:46:38 +0200
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

02.11.18 19:26, Chris Barker via Python-ideas writes:
> On Fri, Nov 2, 2018 at 9:31 AM, M.-A. Lemburg wrote:
>
>> Serialization of those data types is not defined in the JSON standard:
>>
>>     https://www.json.org/
>
> That being said, ISO 8601 is a standard for datetime stamps, and a de
> facto one for JSON

It is not the only standard. Another common representation is the POSIX
timestamp. And, for a date without time, the Julian day.

From daveshawley at gmail.com  Sat Nov  3 10:00:42 2018
From: daveshawley at gmail.com (David Shawley)
Date: Sat, 3 Nov 2018 10:00:42 -0400
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

On Nov 3, 2018, at 9:29 AM, Chris Angelico wrote:

> I think we need to clarify an important distinction here. JSON, as a
> format, does *not* support date/time objects in any way. But
> JavaScript's JSON.stringify() function is happy to accept them, and
> will represent them as strings.

Very good point. The JSON document type only supports object literals,
numbers, strings, and Boolean literals. My suggestion was specifically to
provide an extensible mechanism for encoding arbitrary objects into the
supported primitives.
> If the suggestion here is to have json.dumps(datetime.date(2018,11,4)) > to return an encoded string, either by natively supporting it, or by > having a protocol which the date object implements, that's fine and > reasonable; but json.loads(s) won't return that date object. So, yes, > it would be asymmetric. I personally don't have a problem with this > (though I also don't have any strong use-cases). Custom encoders and > decoders could do this, with or without symmetry. What would it be > like to add a couple to the json module that can handle these extra > types? Completely agreed here. I've seen many attempts to support "round trip" encode/decode in JSON libraries and it really doesn't work well unless you go down the path of type hinting. I believe that MongoDB uses something akin to hinting when it handles dates. Something like the following representation if I recall correctly. { "now": { "$type": "JSONDate", "value": "2018-11-03T09:52:20-0400" } } During deserialization they recognize the hint and instantiate the object instead of parsing it. This is interesting but pretty awful for interoperability since there isn't a standard that I'm aware of. I'm certainly not proposing that but I did want to mention it for completeness. I'll try to put together a PR/branch that adds protocol support in JSON encoder and to datetime, date, and uuid as well. It will give us something to point at and discuss. - cheers, dave. -- Mathematics provides a framework for dealing precisely with notions of "what is". Computation provides a framework for dealing precisely with notions of "how to". SICP Preface -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sat Nov 3 10:16:53 2018 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 3 Nov 2018 10:16:53 -0400 Subject: [Python-ideas] Serialization of CSV vs. JSON In-Reply-To: References: Message-ID: jsondate, for example, supports both .load[s]() and .dump[s](); but only for UTC datetimes https://github.com/rconradharris/jsondate/blob/master/jsondate/__init__.py UTC is only sometimes a fair assumption; otherwise it's dangerous to assume that timezone-naieve [ISO8601] strings represent UTC-0 datetimes. In that respect - aside from readability - arbitrary-precision POSIX timestamps are less error-prone. On Saturday, November 3, 2018, David Shawley wrote: > On Nov 3, 2018, at 9:29 AM, Chris Angelico wrote: > > > I think we need to clarify an important distinction here. JSON, as a > format, does *not* support date/time objects in any way. But > JavaScript's JSON.stringify() function is happy to accept them, and > will represent them as strings. > > > Very good point. The JSON document type only supports object literals, > numbers, strings, and Boolean literals. My suggestion was specifically to > provide an extensible mechanism for encoding arbitrary objects into the > supported primitives. > > If the suggestion here is to have json.dumps(datetime.date(2018,11,4)) > to return an encoded string, either by natively supporting it, or by > having a protocol which the date object implements, that's fine and > reasonable; but json.loads(s) won't return that date object. So, yes, > it would be asymmetric. I personally don't have a problem with this > (though I also don't have any strong use-cases). Custom encoders and > decoders could do this, with or without symmetry. What would it be > like to add a couple to the json module that can handle these extra > types? > > > Completely agreed here. 
I've seen many attempts to support "round trip" encode/decode in JSON
libraries and it really doesn't work well unless you go down the path of
type hinting. I believe that MongoDB uses something akin to hinting when
it handles dates. Something like the following representation, if I
recall correctly:

    {
        "now": {
            "$type": "JSONDate",
            "value": "2018-11-03T09:52:20-0400"
        }
    }

During deserialization they recognize the hint and instantiate the object
instead of parsing it. This is interesting but pretty awful for
interoperability since there isn't a standard that I'm aware of. I'm
certainly not proposing that but I did want to mention it for
completeness.

I'll try to put together a PR/branch that adds protocol support in the
JSON encoder and to datetime, date, and uuid as well. It will give us
something to point at and discuss.

- cheers, dave.
-- 
Mathematics provides a framework for dealing precisely with notions of
"what is". Computation provides a framework for dealing precisely with
notions of "how to".  SICP Preface

From wes.turner at gmail.com  Sat Nov  3 10:16:53 2018
From: wes.turner at gmail.com (Wes Turner)
Date: Sat, 3 Nov 2018 10:16:53 -0400
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

jsondate, for example, supports both .load[s]() and .dump[s](); but only
for UTC datetimes:
https://github.com/rconradharris/jsondate/blob/master/jsondate/__init__.py

UTC is only sometimes a fair assumption; otherwise it's dangerous to
assume that timezone-naive [ISO 8601] strings represent UTC-0 datetimes.
In that respect - aside from readability - arbitrary-precision POSIX
timestamps are less error-prone.

On Saturday, November 3, 2018, David Shawley wrote:

> Completely agreed here. I've seen many attempts to support "round trip"
> encode/decode in JSON libraries and it really doesn't work well unless
> you go down the path of type hinting. [...]

From turnbull.stephen.fw at u.tsukuba.ac.jp  Sat Nov  3 13:29:55 2018
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Sun, 4 Nov 2018 02:29:55 +0900
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: References: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> Message-ID: <23517.56083.203909.511942@turnbull.sk.tsukuba.ac.jp>

Chris Angelico writes:

 > And like many "hey it's this easy" demonstrations, that isn't quite
 > identical, as a single file can match multiple patterns

Sure.  I would have written it with set.union() on general principles
except I forgot how to say "union", didn't feel like looking it up, and
wanted to keep the def as close to 3 lines as I could without being
obfuscated (see below).  I wonder how many people would fall into the
trap I did.  (I don't consider myself a great programmer, but maybe
that's all the more reason for this?  Not-so-great minds think alike? :-)

I was really more interested in the second question, though.  Why invent
yet another interface when we already have one that is well-known and
more powerful?

P.S.  I can't resist.  This is horrible, but:

    def multifilter(names, *patterns):
        return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))

Who even needs a function? ;-)

From mertz at gnosis.cx  Sat Nov  3 13:45:10 2018
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 3 Nov 2018 13:45:10 -0400
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: <23517.56083.203909.511942@turnbull.sk.tsukuba.ac.jp> References: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> <23517.56083.203909.511942@turnbull.sk.tsukuba.ac.jp> Message-ID:

On Sat, Nov 3, 2018, 1:30 PM Stephen J.
Turnbull <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:

> P.S.  I can't resist.  This is horrible, but:
>
>     def multifilter(names, *patterns):
>         return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))

Yes, that is a horrible spelling for:

    {fnmatch.filter(names, p) for p in patterns}

;-)

From python at mrabarnett.plus.com  Sat Nov  3 15:02:19 2018
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 3 Nov 2018 19:02:19 +0000
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: References: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> <23517.56083.203909.511942@turnbull.sk.tsukuba.ac.jp> Message-ID: <5bc59e6b-119a-6419-1243-2626b224280e@mrabarnett.plus.com>

On 2018-11-03 17:45, David Mertz wrote:
> On Sat, Nov 3, 2018, 1:30 PM Stephen J. Turnbull wrote:
>
>> P.S.  I can't resist.  This is horrible, but:
>>
>>     def multifilter(names, *patterns):
>>         return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))
>
> Yes, that is a horrible spelling for:
>
>     {fnmatch.filter(names, p) for p in patterns}
>
> ;-)

But it has the advantage that it works. :-)

From rosuav at gmail.com  Sat Nov  3 15:15:21 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 4 Nov 2018 06:15:21 +1100
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: <23517.56083.203909.511942@turnbull.sk.tsukuba.ac.jp> References: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> <23517.56083.203909.511942@turnbull.sk.tsukuba.ac.jp> Message-ID:

On Sun, Nov 4, 2018 at 4:29 AM Stephen J. Turnbull wrote:
>
> Sure.  I would have written it with set.union() on general principles
> except I forgot how to say "union", didn't feel like looking it up, and
> wanted to keep the def as close to 3 lines as I could without being
> obfuscated (see below).  I wonder how many people would fall into the
> trap I did.  (I don't consider myself a great programmer, but maybe
> that's all the more reason for this?  Not-so-great minds think
> alike? :-)

A very fair point; and still supporting the notion that "it's a 3-line
function" doesn't instantly silence the need. TBH, it's the moments when
we AREN'T great programmers that we need the language to help us out. Why
is it that we love strong rules and tight exceptions? Because they tell
us when we've done something stupid, and help us to fix that bug with a
minimum of fuss :)

> I was really more interested in the second question, though.  Why
> invent yet another interface when we already have one that is
> well-known and more powerful?
That kind of globbing might also solve the use-cases, but I'm worried
about backward compatibility. Creating more glob-special characters could
potentially change the meaning of globs that are already in use. I don't
personally glob files with braces in their names, but someone somewhere
is doing it (and I do have a bunch of files with UUIDs in their names,
mainly in Wine directories); adding a feature like that might break code,
or alternatively, would have to be fnmatch_with_braces(). In contrast,
accepting a tuple of strings can't possibly break any working code that
uses individual strings.

> P.S.  I can't resist.  This is horrible, but:
>
>     def multifilter(names, *patterns):
>         return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))
>
> Who even needs a function? ;-)

.... wow. I do want to make one small change to it, though: instead of
list() at the end of the chain, I'd use sorted(). You're throwing away
the original order of file names, so it'd look tidier to return them in
order, rather than in whichever order iterating over the set gives them.

Also, I am a very very bad person for suggesting an 'improvement' to a
function of that nature. That is... a piece of art. Modern art, the sort
where you go "This is incomprehensible therefore it is beautiful". :)

ChrisA

From rosuav at gmail.com  Sat Nov  3 15:18:43 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 4 Nov 2018 06:18:43 +1100
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID:

On Sun, Nov 4, 2018 at 1:00 AM David Shawley wrote:

> Very good point. The JSON document type only supports object literals,
> numbers, strings, and Boolean literals. My suggestion was specifically
> to provide an extensible mechanism for encoding arbitrary objects into
> the supported primitives.

Okay, so to clarify: We currently have a mechanism for custom encoders
and decoders, which you have to specify as you're thinking about
encoding. But you're proposing having the core json.dumps() allow objects
to customize their own representation.

Sounds like a plan, and not even all that complex a plan.

ChrisA

From mertz at gnosis.cx  Sat Nov  3 15:29:26 2018
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 3 Nov 2018 15:29:26 -0400
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: <5bc59e6b-119a-6419-1243-2626b224280e@mrabarnett.plus.com> References: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> <23517.56083.203909.511942@turnbull.sk.tsukuba.ac.jp> <5bc59e6b-119a-6419-1243-2626b224280e@mrabarnett.plus.com> Message-ID:

On Sat, Nov 3, 2018 at 3:03 PM MRAB wrote:

>> Yes, that is a horrible spelling for:
>>
>>     {fnmatch.filter(names, p) for p in patterns}
>
> But it has the advantage that it works. :-)

Indeed! Excellent point :-). I definitely should not post untested code
from my tablet. This is still slightly less horrible, but I recognize
it's starting to border on horrible:

    {n for p in patterns for n in fnmatch.filter(names, p)}

This seems worse:

    set(chain(*(fnmatch.filter(names, p) for p in patterns)))

-- 
Keeping medicines from the bloodstreams of the sick; food from the
bellies of the hungry; books from the hands of the uneducated; technology
from the underdeveloped; and putting advocates of freedom in prisons.
Intellectual property is to the 21st century what the slave trade was to
the 16th.
From python.gem at gmail.com  Sat Nov  3 16:54:39 2018
From: python.gem at gmail.com (Joy Diamond)
Date: Sat, 3 Nov 2018 16:54:39 -0400
Subject: [Python-ideas] Are we supposed to be able to have our own class dictionary in python 3?
Message-ID:

Team,

Are we supposed to be able to have our own class dictionary in python 3?

If we currently cannot -- do we want to be able to?

That we can have our own class dictionary in python 3 is strongly implied
in the following at https://www.python.org/dev/peps/pep-3115/ where it
says:

"""
    # The metaclass invocation
    def __new__(cls, name, bases, classdict):
        # Note that we replace the classdict with a regular
        # dict before passing it to the superclass, so that we
        # don't continue to record member names after the class
        # has been created.
        result = type.__new__(cls, name, bases, dict(classdict))
        result.member_names = classdict.member_names
        return result
"""

I don't understand this. As far as I can tell, no matter what class
dictionary you pass into `type.__new__` it creates a copy of it.

Am I missing something? Is this supposed to work? Is the documentation
wrong?

Thanks,

Joy Diamond.

Program that shows that the class dictionary created is not what we pass
in --- shows the actual symbol table is `dict`, not `SymbolTable`:

    class SymbolTable(dict):
        pass

    members = SymbolTable(a = 1)

    X = type('X', ((object,)), members)

    members['b'] = 2

    print('X.a: {}'.format(X.a))

    try:
        print('X.b: {}'.format(X.b))
    except AttributeError as e:
        print('X.b: does not exist')

    #
    # Get the actual symbol table of `X`, bypassing the mapping proxy.
    #
    X__symbol_table = __import__('gc').get_referents(X.__dict__)[0]

    print('The type of the actual symbol table of X is: {} with keys: {}'.format(
        type(X__symbol_table),
        X__symbol_table.keys()))

    # Prints out
    # X.a: 1
    # X.b: does not exist
    # The type of the actual symbol table of X is: <class 'dict'> with keys:
    # dict_keys(['a', '__module__', '__dict__', '__weakref__', '__doc__'])

From greg.ewing at canterbury.ac.nz  Sat Nov  3 17:43:57 2018
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 04 Nov 2018 10:43:57 +1300
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: References: Message-ID: <5BDE169D.4040408@canterbury.ac.nz>

David Shawley wrote:

> I'm +1 on adding support for serializing datetime.date and
> datetime.datetime *but* I'm -1 on automatically deserializing anything
> that looks like an ISO-8601 in json.load*. The asymmetry is the only
> thing that kept me from bringing this up previously.

This asymmetry bothers me too. It makes me think that datetime handling
belongs at a different level of abstraction, something that knows about
the structure of the data being serialised or deserialised.

Java's JSON libraries have a mechanism where you can give it a class and
a lump of JSON and it will figure out from runtime type information what
to do. It seems like we should be able to do something similar using
type annotations.

-- 
Greg
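A rough sketch of what annotation-driven decoding could look like; Event
and load_typed are illustrative names, and this assumes Python 3.7 for
dataclasses and datetime.fromisoformat():

    import json
    from dataclasses import dataclass, fields
    from datetime import datetime

    @dataclass
    class Event:
        name: str
        when: datetime

    def load_typed(cls, text):
        # Use the class's field annotations to decide which strings
        # should be parsed as datetimes; plain strings stay strings.
        raw = json.loads(text)
        return cls(**{
            f.name: datetime.fromisoformat(raw[f.name])
                    if f.type is datetime else raw[f.name]
            for f in fields(cls)})

    event = load_typed(Event, '{"name": "release", "when": "2018-11-03T10:00:00"}')
    assert event.when.year == 2018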
From dfmoisset at gmail.com  Sat Nov  3 19:59:02 2018
From: dfmoisset at gmail.com (Daniel Moisset)
Date: Sat, 3 Nov 2018 23:59:02 +0000
Subject: [Python-ideas] Are we supposed to be able to have our own class dictionary in python 3?
In-Reply-To: References: Message-ID:

Sorry, should have replied to the list too

On Sat, 3 Nov 2018, 23:55 Daniel Moisset wrote:

> If I understood correctly what you want, it's possible with a
> metaclass. Check the __prepare__ method at
> https://docs.python.org/3/reference/datamodel.html#preparing-the-class-namespace
> and PEP 3115
>
> On Sat, 3 Nov 2018, 20:55 Joy Diamond wrote:
>
>> Team,
>>
>> Are we supposed to be able to have our own class dictionary in
>> python 3? [...]

From amit.mixie at gmail.com  Sat Nov  3 20:02:56 2018
From: amit.mixie at gmail.com (Amit Green)
Date: Sat, 3 Nov 2018 20:02:56 -0400
Subject: [Python-ideas] Are we supposed to be able to have our own class dictionary in python 3?
In-Reply-To: References: Message-ID:

Thanks Daniel,

I found my answer here (using your link):
https://docs.python.org/3/reference/datamodel.html#preparing-the-class-namespace

"""
When a new class is created by type.__new__, the object provided as the
namespace parameter is copied to a new ordered mapping and the original
object is discarded.
"""

Therefore the answer seems to be that
https://www.python.org/dev/peps/pep-3115/ needs to be updated & fixed.

Replace the following:

"""
    def __new__(cls, name, bases, classdict):
        # Note that we replace the classdict with a regular
        # dict before passing it to the superclass, so that we
        # don't continue to record member names after the class
        # has been created.
        result = type.__new__(cls, name, bases, dict(classdict))
        result.member_names = classdict.member_names
        return result
"""

With:

"""
    def __new__(cls, name, bases, classdict):
        result = type.__new__(cls, name, bases, classdict)
        result.member_names = classdict.member_names
        return result
"""

removing the incorrect comments & the copying of `classdict`.

I will go file a bug report to that effect.

Thanks,

Joy Diamond.

On Sat, Nov 3, 2018 at 7:55 PM Daniel Moisset wrote:

> If I understood correctly what you want, it's possible with a
> metaclass. Check the __prepare__ method at
> https://docs.python.org/3/reference/datamodel.html#preparing-the-class-namespace
> and PEP 3115 [...]

From steve at pearwood.info  Sat Nov  3 20:41:04 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 4 Nov 2018 11:41:04 +1100
Subject: [Python-ideas] Make fnmatch.filter accept a tuple of patterns
In-Reply-To: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> References: <23516.36364.586345.44292@turnbull.sk.tsukuba.ac.jp> Message-ID: <20181104004104.GT3817@ando.pearwood.info>

On Sat, Nov 03, 2018 at 02:49:00AM +0900, Stephen J.
Turnbull wrote:

> If you're going to improve the glob module, why not use bash or zsh
> extended globbing ('**', '{a,b}') as the model?  This is more powerful,
> and already familiar to many users.

I thought it did support extended globbing?

https://docs.python.org/3/library/glob.html#glob.glob

But brace expansion should be a thing. For backwards compatibility
reasons, we probably need a switch to turn it on, or a separate function
call, or maybe a deprecation period.

-- 
Steve

From dfmoisset at gmail.com  Sun Nov  4 06:43:39 2018
From: dfmoisset at gmail.com (Daniel Moisset)
Date: Sun, 4 Nov 2018 11:43:39 +0000
Subject: [Python-ideas] Are we supposed to be able to have our own class dictionary in python 3?
In-Reply-To: References: Message-ID:

I think the documentation is correct but you misinterpreted the intent of
that code. The code you're quoting, which is an example, is not about
ending up with a custom dict within the instance; the intent of the
author was just to capture the member_names list. So what it does is
customize the class dict in __prepare__(), but then in __new__ it
*intentionally* converts it to a regular dict after extracting the
member_names.

The goal of the example is ending up with instances with regular
attribute dicts but an extra member_names attribute, while I think that
you're looking to end up with a custom attribute dict (so in *your* case,
you do not need to do the copying).

On Sun, 4 Nov 2018 at 00:03, Amit Green wrote:

> Thanks Daniel,
>
> I found my answer here (using your link):
> https://docs.python.org/3/reference/datamodel.html#preparing-the-class-namespace
> [...]
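Putting Daniel's explanation together, a minimal sketch of the intended
pattern -- the class names are illustrative. The custom mapping lives
only while the class body executes; type.__new__ then copies it into the
plain dict that backs the class:

    class RecordingDict(dict):
        def __init__(self):
            super().__init__()
            self.member_names = []

        def __setitem__(self, key, value):
            if key not in self:
                self.member_names.append(key)
            super().__setitem__(key, value)

    class RecordingMeta(type):
        @classmethod
        def __prepare__(mcls, name, bases):
            # This mapping receives every assignment in the class body.
            return RecordingDict()

        def __new__(mcls, name, bases, classdict):
            result = super().__new__(mcls, name, bases, classdict)
            result.member_names = classdict.member_names
            return result

    class Example(metaclass=RecordingMeta):
        b = 2
        a = 1

    # __module__ and __qualname__ are recorded too, before the body runs.
    assert Example.member_names[-2:] == ['b', 'a']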
From chibicitiberiu at gmail.com  Sun Nov  4 07:20:31 2018
From: chibicitiberiu at gmail.com (Tiberiu Chibici)
Date: Sun, 4 Nov 2018 14:20:31 +0200
Subject: [Python-ideas] Proposal to add a key field to the bisect library
Message-ID:

Hi,

I would like to propose an improvement to the functions in the bisect
library, to add a 'key' parameter, similar to 'sorted' or other system
functions.

-- 
Chibici Tiberiu

From steve at pearwood.info  Sun Nov  4 08:19:49 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 5 Nov 2018 00:19:49 +1100
Subject: [Python-ideas] Proposal to add a key field to the bisect library
In-Reply-To: References: Message-ID: <20181104131948.GV3817@ando.pearwood.info>

On Sun, Nov 04, 2018 at 02:20:31PM +0200, Tiberiu Chibici wrote:
> Hi,
> I would like to propose an improvement to the functions in the bisect
> library, to add a 'key' parameter, similar to 'sorted' or other system
> functions.

Quoting the bug tracker:

    This request has come up repeatedly (and been rejected) in the past.
    See issues 2954, 3374, 1185383, 1462228, 1451588, 1619060.
-- 
Chibici Tiberiu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Sun Nov  4 08:19:49 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 5 Nov 2018 00:19:49 +1100
Subject: [Python-ideas] Proposal to add a key field to the bisect library
In-Reply-To: 
References: 
Message-ID: <20181104131948.GV3817@ando.pearwood.info>

On Sun, Nov 04, 2018 at 02:20:31PM +0200, Tiberiu Chibici wrote:
> Hi,
> I would like to propose an improvement to the functions in the bisect
> library, to add a 'key' parameter, similar to 'sorted' or other system
> functions.

Quoting the bug tracker:

    This request has come up repeatedly (and been rejected) in the past.
    See issues 2954, 3374, 1185383, 1462228, 1451588, 1619060.

https://bugs.python.org/issue4356

Unless you have something new to add, something people have missed, I
don't think this idea is going to go anywhere.

-- 
Steve

From steve at pearwood.info  Sun Nov  4 08:33:19 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 5 Nov 2018 00:33:19 +1100
Subject: [Python-ideas] Proposal to add a key field to the bisect library
In-Reply-To: <20181104131948.GV3817@ando.pearwood.info>
References: <20181104131948.GV3817@ando.pearwood.info>
Message-ID: <20181104133319.GW3817@ando.pearwood.info>

On Mon, Nov 05, 2018 at 12:19:49AM +1100, Steven D'Aprano wrote:
> On Sun, Nov 04, 2018 at 02:20:31PM +0200, Tiberiu Chibici wrote:
> > Hi,
> > I would like to propose an improvement to the functions in the bisect
> > library, to add a 'key' parameter, similar to 'sorted' or other system
> > functions.
>
> Quoting the bug tracker:
[...]
> https://bugs.python.org/issue4356

Actually, reading further along, it looks like there has been consensus
that bisect ought to get a key function. Guido said:

    "Bingo. That clinches it. We need to add key=."

also:

    "PS. It should also be added to heapq."

and Raymond said:

    "I'll add a key= variant for Python 3.6."

Obviously this didn't happen, but it might happen for 3.8.

-- 
Steve

From daveshawley at gmail.com  Sun Nov  4 08:49:08 2018
From: daveshawley at gmail.com (David Shawley)
Date: Sun, 4 Nov 2018 08:49:08 -0500
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: <5BDE169D.4040408@canterbury.ac.nz>
References: <5BDE169D.4040408@canterbury.ac.nz>
Message-ID: <1397FA69-0B7A-4BDF-B7C7-CBCA1A8D4F90@gmail.com>

On Nov 3, 2018, at 5:43 PM, Greg Ewing wrote:

> David Shawley wrote:
> > I'm +1 on adding support for serializing datetime.date and
> > datetime.datetime *but* I'm -1 on automatically deserializing anything
> > that looks like a ISO-8601 in json.load*. The asymmetry is the only
> > thing that kept me from bringing this up previously.
>
> This asymmetry bothers me too. It makes me think that datetime
> handling belongs at a different level of abstraction, something
> that knows about the structure of the data being serialised or
> deserialised.
>
> Java's JSON libraries have a mechanism where you can give it
> a class and a lump of JSON and it will figure out from runtime
> type information what to do. It seems like we should be able
> to do something similar using type annotations.

I was thinking about trying to do something similar to what golang has done
in their JSON support [1]. It is similar to what I would have done with JAXB
when I was still doing Java [2]. In both cases you have a type explicitly
bound to a JSON blob. The place to make this sort of change might be in the
JSONDecoder and JSONEncoder classes.

Personally, I would place this sort of serialization logic outside of the
Standard Library -- maybe following the pattern that the rust community
adopted on this very issue. In short, they separated serialization &
de-serialization into a free-standing library. The best discussion that I
have found is a reddit thread [3]. The new library that they built is called
serde [4] and the previous code is in their deprecated library section [5].

The difference between the two approaches is that golang simply annotates the
types similar to what I would expect to happen in the Python case. Then you
are required to pass a list of types into the deserializer so it knows which
types are candidates for deserialization. The rust and JAXB approaches
rely on type registration into the deserialization framework.
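To make the registration idea concrete, here is a rough sketch (this is not
a real API -- the `classes` argument and the `from_json` classmethod are
invented for illustration only):

    import datetime
    import json

    class RegisteredDecoder(json.JSONDecoder):
        def __init__(self, *, classes=(), **kwargs):
            self._classes = tuple(classes)
            super().__init__(object_hook=self._dispatch, **kwargs)

        def _dispatch(self, obj):
            # Offer each decoded dict to the registered classes in turn.
            for cls in self._classes:
                try:
                    return cls.from_json(obj)
                except (KeyError, TypeError, ValueError):
                    continue
            return obj

    class When:
        @classmethod
        def from_json(cls, obj):
            return datetime.datetime.fromisoformat(obj['when'])

    print(json.loads('{"when": "2018-11-04T08:49:08"}',
                     cls=RegisteredDecoder, classes=[When]))
    # 2018-11-04 08:49:08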
We could probably use type annotations to handle the asymmetry provided that
we change the JSONDecoder interface to accept a list of classes that are
candidates for deserialization or something similar. I would place this
outside of the Standard Library as a generalized serialization /
de-serialization framework since it feels embryonic to me. This could be a
new implementation for CSV and pickle as well.

Bringing the conversation back around, I'm going to continue adding a simple
JSON formatting protocol that is asymmetric since it does solve a need that
I and others have today. I'm not completely sure what the best way to move
this forward is. I have most of an implementation working based on a simple
protocol of one method. Should I:

1. Open a BPO and continue the discussion there once I have a working
   prototype?

2. Continue the discussion here?

3. Move the discussion to python-dev under a more appropriate subject?

cheers, dave.

[1]: https://golang.org/pkg/encoding/json/#Marshal
[2]: https://docs.oracle.com/javaee/6/tutorial/doc/gkknj.html#gmfnu
[3]: https://www.reddit.com/r/rust/comments/3v4ktz/differences_between_serde_and_rustc_serialize/
[4]: https://serde.rs
[5]: https://github.com/rust-lang-deprecated/rustc-serialize

-- 
"Syntactic sugar causes cancer of the semicolon" - Alan Perlis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From boxed at killingar.net  Sun Nov  4 08:53:46 2018
From: boxed at killingar.net (Anders Hovmöller)
Date: Sun, 4 Nov 2018 14:53:46 +0100
Subject: [Python-ideas] Proposal to add a key field to the bisect library
In-Reply-To: <20181104131948.GV3817@ando.pearwood.info>
References: <20181104131948.GV3817@ando.pearwood.info>
Message-ID: <9C3CF048-7A3A-470A-BB70-45A1BA54697E@killingar.net>

That link has Guido and Raymond Hettinger ending up +1 and looking to either
add it or write a simple copy-pasteable recipe for the docs. I mean, that's
how I read it; did you read that and come to a different impression?

> On 4 Nov 2018, at 14:19, Steven D'Aprano wrote:
>
>> On Sun, Nov 04, 2018 at 02:20:31PM +0200, Tiberiu Chibici wrote:
>> Hi,
>> I would like to propose an improvement to the functions in the bisect
>> library, to add a 'key' parameter, similar to 'sorted' or other system
>> functions.
>
> Quoting the bug tracker:
>
>     This request has come up repeatedly (and been rejected) in the past.
>     See issues 2954, 3374, 1185383, 1462228, 1451588, 1619060.
>
> https://bugs.python.org/issue4356
>
> Unless you have something new to add, something people have missed, I
> don't think this idea is going to go anywhere.
>
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From boxed at killingar.net  Sun Nov  4 08:54:21 2018
From: boxed at killingar.net (Anders Hovmöller)
Date: Sun, 4 Nov 2018 14:54:21 +0100
Subject: [Python-ideas] Proposal to add a key field to the bisect library
In-Reply-To: <20181104133319.GW3817@ando.pearwood.info>
References: <20181104131948.GV3817@ando.pearwood.info> <20181104133319.GW3817@ando.pearwood.info>
Message-ID: <83EC260D-9C44-47D2-8F85-F9C74813C827@killingar.net>

Oh heh.
Well there we go :) > On 4 Nov 2018, at 14:33, Steven D'Aprano wrote: > >> On Mon, Nov 05, 2018 at 12:19:49AM +1100, Steven D'Aprano wrote: >>> On Sun, Nov 04, 2018 at 02:20:31PM +0200, Tiberiu Chibici wrote: >>> Hi, >>> I would like to propose an improvement to the functions in the bisect >>> library, to add a 'key' parameter, similar to 'sorted' or other system >>> functions. >> >> Quoting the bug tracker: > [...] >> https://bugs.python.org/issue4356 > > Actually, reading further along, it looks like there has been concensus > that bisect ought to get a key function. Guido said: > > "Bingo. That clinches it. We need to add key=." > > also: > > "PS. It should also be added to heapq." > > and Raymond said: > > "I'll add a key= variant for Python 3.6." > > Obviously this didn't happen, but it might happen for 3.8. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From wes.turner at gmail.com Sun Nov 4 16:09:26 2018 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 4 Nov 2018 16:09:26 -0500 Subject: [Python-ideas] Serialization of CSV vs. JSON In-Reply-To: <1397FA69-0B7A-4BDF-B7C7-CBCA1A8D4F90@gmail.com> References: <5BDE169D.4040408@canterbury.ac.nz> <1397FA69-0B7A-4BDF-B7C7-CBCA1A8D4F90@gmail.com> Message-ID: Here's a JSONEncoder subclass with a default method that checks variable types in a defined sequence that includes datetime: https://gist.github.com/majgis/4200488 Passing an ordered map of (Type, fn) may or may not be any more readable than simply subclassing JSONEncoder and defining .default(). On Sunday, November 4, 2018, David Shawley wrote: > On Nov 3, 2018, at 5:43 PM, Greg Ewing > wrote: > > > David Shawley wrote: > > > I'm +1 on adding support for serializing datetime.date and > > > datetime.datetime *but* I'm -1 on automatically deserializing anything > that > > > looks like a ISO-8601 in json.load*. The asymmetry is the only thing > that > > > kept me from bringing this up previously. > > > > This asymmetry bothers me too. It makes me think that datetime > > handling belongs at a different level of abstraction, something > > that knows about the structure of the data being serialised or > > deserialised. > > > > Java's JSON libraries have a mechanism where you can give it > > a class and a lump of JSON and it will figure out from runtime > > type information what to do. It seems like we should be able > > to do something similar using type annotations. > > I was thinking about trying to do something similar to what golang has > done in > their JSON support [1]. It is similar to what I would have done with JAXB > when > I was still doing Java [2]. In both cases you have a type explicitly > bound to > a JSON blob. The place to make this sort of change might be in the > JSONDecoder and JSONEncoder classes. > > Personally, I would place this sort of serialization logic outside of the > Standard Library -- maybe following the pattern that the rust community > adopted on this very issue. In short, they separated serialization & > de-serialization into a free-standing library. The best discussion that I > have found is a reddit thread [3]. The new library that they built is > called > serde [4] and the previous code is in their deprecated library section [5]. 
> > The difference between the two approaches is that golang simply annotates
> > the types similar to what I would expect to happen in the Python case.
> > Then you are required to pass a list of types into the deserializer so it
> > knows which types are candidates for deserialization. The rust and JAXB
> > approaches rely on type registration into the deserialization framework.
> >
> > We could probably use type annotations to handle the asymmetry provided
> > that we change the JSONDecoder interface to accept a list of classes that
> > are candidates for deserialization or something similar. I would place
> > this outside of the Standard Library as a generalized serialization /
> > de-serialization framework since it feels embryonic to me. This could be
> > a new implementation for CSV and pickle as well.
> >
> > Bringing the conversation back around, I'm going to continue adding a
> > simple JSON formatting protocol that is asymmetric since it does solve a
> > need that I and others have today. I'm not completely sure what the best
> > way to move this forward is. I have most of an implementation working
> > based on a simple protocol of one method. Should I:
> >
> > 1. Open a BPO and continue the discussion there once I have a working
> >    prototype?
> >
> > 2. Continue the discussion here?
> >
> > 3. Move the discussion to python-dev under a more appropriate subject?
> >
> > cheers, dave.
> >
> > [1]: https://golang.org/pkg/encoding/json/#Marshal
> > [2]: https://docs.oracle.com/javaee/6/tutorial/doc/gkknj.html#gmfnu
> > [3]: https://www.reddit.com/r/rust/comments/3v4ktz/differences_between_serde_and_rustc_serialize/
> > [4]: https://serde.rs
> > [5]: https://github.com/rust-lang-deprecated/rustc-serialize
> >
> > -- 
> > "Syntactic sugar causes cancer of the semicolon" - Alan Perlis
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Mon Nov  5 19:11:30 2018
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 5 Nov 2018 16:11:30 -0800
Subject: [Python-ideas] dict.setdefault_call(), or API variations thereupon
In-Reply-To: <20181103024911.GQ3817@ando.pearwood.info>
References: <20181102033409.GI3817@ando.pearwood.info> <20181103000524.GP3817@ando.pearwood.info> <98AF148B-86A0-4C76-9BC8-0E07F5DCDC6B@killingar.net> <20181103024911.GQ3817@ando.pearwood.info>
Message-ID: 

On Fri, Nov 2, 2018 at 7:49 PM, Steven D'Aprano wrote:

> Consider the use-case where you want to pass a different default value
> to the dict each time:

exactly - the "default" is per call, not the same for the whole dict.
though again, how common is this?

> d.setdefault(key, expensive_function(1, 2, 3))
> d.setdefault(key, expensive_function(4, 8, 16))
> d.setdefault(key, expensive_function(10, 100, 1000))

also -- aside from performance, if expensive_function() has side effects,
you may really not want to call it when you don't need to (not that that
would be well-designed code, but...)

and of course, you can always simply do:

    if key in d:
        val = d[key]
    else:
        val = expensive_function(4, 8, 16)
        d[key] = val

sure, it requires looking up the key twice, but doesn't call the function
unnecessarily. So it's a pretty small subset of cases, where this would be
needed.
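A dict subclass version of what's being discussed might look something like
this (a minimal sketch -- the method name matches the thread subject but is
otherwise made up):

    class LazyDict(dict):
        def setdefault_call(self, key, factory, *args, **kwargs):
            # Only call the factory when the key is actually missing.
            try:
                return self[key]
            except KeyError:
                value = self[key] = factory(*args, **kwargs)
                return value

so d.setdefault_call(key, expensive_function, 4, 8, 16) would do a single
lookup on a hit and never call expensive_function unnecessarily.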
> defaultdict won't help, because your factory function takes no
> arguments: there's no way to supply arguments for the factory.

maybe that's a feature defaultdict should have?

-CHB

> __missing__ won't help, because it only receives the key, not arbitrary
> arguments.
>
> We can of course subclass dict and give it a method with the semantics
> we want:
>
>     d.my_setdefault(key, expensive_function, args=(1, 2, 3), kw={})
>
> but it would be nicer and more expressive if we could tell the
> interpreter "don't evaluate expensive_function(...) unless you really
> need it".
>
> Other languages have this -- I believe it is called "Call By Need" or
> "Call By Name", depending on the precise details of how it works. I call
> it delayed evaluation, and Python already has it, but only in certain
> special syntactic forms:
>
>     spam and
>     spam or
>     if condition else
>
> There are others: e.g. the body of functions, including lambda. But
> functions are kinda heavyweight to make and build and call.
>
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R  (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115  (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Mon Nov  5 19:17:59 2018
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 5 Nov 2018 16:17:59 -0800
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: 
References: 
Message-ID: 

On Fri, Nov 2, 2018 at 12:17 PM, Wes Turner wrote:

> JSON5 supports comments in JSON.
> https://github.com/json5/json5/issues/3

and other nifty things -- any plans to support JSON5 in the stdlib json
library? I think that would be great.

-CHB

> ... Some form of schema is necessary to avoid having to try parsing every
> string value as a date time (and to specify precision: "2018" is not the
> same as "2018 00:00:01")
>
> On Friday, November 2, 2018, Chris Barker via Python-ideas <
> python-ideas at python.org> wrote:
>
>> On Fri, Nov 2, 2018 at 9:31 AM, M.-A. Lemburg wrote:
>>
>>> Serialization of those data types is not defined in the JSON standard:
>>>
>>> https://www.json.org/
>>
>> That being said, ISO 8601 is a standard for datetime stamps, and a
>> defacto one for JSON
>>
>> So building encoding of datetime into Python's json encoder would be
>> pretty useful.
>>
>> (I would not have any automatic decoding though -- as an ISO8601 string
>> would still be just a string in JSON)
>>
>> Could we have a "pedantic" mode for "fully standard conforming" JSON, and
>> then add some extensions to the standard?
>>
>> As another example, I would find it very handy if the json decoder would
>> respect comments in JSON (I know that they are explicitly not part of the
>> standard), but they are used in other applications, particularly when JSON
>> is used as a configuration language.
>>
>> -CHB
>>
>> -- 
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R  (206) 526-6959 voice
>> 7600 Sand Point Way NE  (206) 526-6329 fax
>> Seattle, WA 98115  (206) 526-6317 main reception
>>
>> Chris.Barker at noaa.gov
>>

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R  (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115  (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com  Tue Nov  6 01:42:01 2018
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 6 Nov 2018 08:42:01 +0200
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: 
References: 
Message-ID: 

06.11.18 02:17, Chris Barker via Python-ideas wrote:
> and other nifty things -- any plans to support JSON5 in the stdlib json
> library? I think that would be great.

When it becomes a widely used official standard. There are a number of
general data exchange formats more popular than JSON5.

From daveshawley at gmail.com  Tue Nov  6 06:46:46 2018
From: daveshawley at gmail.com (David Shawley)
Date: Tue, 6 Nov 2018 06:46:46 -0500
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: 
References: <5BDE169D.4040408@canterbury.ac.nz> <1397FA69-0B7A-4BDF-B7C7-CBCA1A8D4F90@gmail.com>
Message-ID: <667737F3-6965-463E-923C-EEE84CDCF47D@gmail.com>

On Nov 4, 2018, at 12:43 PM, Michael Selik wrote:

> If you're making a module
>
> > On Sun, Nov 4, 2018, 5:49 AM David Shawley wrote:
> > Personally, I would place this sort of serialization logic outside of the
> > Standard Library -- maybe following the pattern that the rust community
> > adopted on this very issue. In short, they separated serialization &
> > de-serialization into a free-standing library.
>
> You don't need a bug on the tracker or discussion on -dev to share a module
> on PyPI or GitHub. When you've got something started, share a link in this
> thread.

I modified a branch of python/cpython to implement what I had in mind. [1]
The idea is to introduce a new protocol with a single method:

    self.jsonformat() -> object

    If this method exists, then json.encoder.JSONEncoder will call it
    to generate a JSON representation *instead* of calling *default*.
    This method must return a value that json.encoder.JSONEncoder can
    encode, or fail in the same manner as the *default* hook.
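In pure Python the behaviour can be approximated by overriding *default*
(the real patch hooks in before *default* is consulted); the `When` class
here is just a stand-in example to show how the protocol is meant to be
used:

    import datetime
    import json

    class ProtocolEncoder(json.JSONEncoder):
        def default(self, o):
            # Prefer the object's own jsonformat() hook when present.
            jsonformat = getattr(o, 'jsonformat', None)
            if jsonformat is not None:
                return jsonformat()
            return super().default(o)

    class When:
        def __init__(self, at):
            self.at = at

        def jsonformat(self):
            return self.at.isoformat()

    print(json.dumps({'ts': When(datetime.datetime(2018, 11, 6))},
                     cls=ProtocolEncoder))
    # {"ts": "2018-11-06T00:00:00"}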
The implementation wasn't too difficult once I learned a little more about
how Standard Library classes are implemented when C speedups are included.
There are a few things that I haven't done:

1. I didn't guard the functionality with a flag to the JSONEncoder
   initializer. This was an oversight but I would add one before doing a PR
   against python/cpython.

2. As discussed before this is an asymmetric proposal since there is no
   support for detecting and de-serializing in JSONDecoder.

That is what I had in mind. I'm not sure how we want to spell extension
methods like this one. I chose to not use a double-underscore method since
I view them as ``for use by the interpreter/language'' more so than for
Library-recognised methods. The name is the least of my worries.

Let me know if there is any reason that I shouldn't move forward with a bpo
and PR against python/cpython.

- cheers, dave.

[1]: https://github.com/dave-shawley/cpython/pull/2

-- 
"State and behavior. State and behavior. If it doesn't bundle state and
behavior in a sensible way, it should not be an object, and there should
not be a class that produces it." eevee
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abedillon at gmail.com  Tue Nov  6 14:03:54 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Tue, 6 Nov 2018 13:03:54 -0600
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
Message-ID: 

I don't understand the rationale behind PEP 463's rejection. Guido says, "I
disagree with the position that EAFP is better than LBYL, or "generally
recommended" by Python. (Where do you get that?..."; but it's been in the
official Python.org docs for a while and even provides a pretty good
justification for why EAFP is preferable to LBYL (aside from the language
calling EAFP "common", "clean", and "fast" that's notably absent from
LBYL's description):

"In a multi-threaded environment, the LBYL approach can risk introducing a
race condition between 'the looking' and 'the leaping'. For example, the
code

    if key in mapping:
        return mapping[key]

can fail if another thread removes *key* from *mapping* after the test, but
before the lookup. This issue can be solved with locks or by using the EAFP
approach."

Which brings me to the question: What happens when a PEP gets rejected? Is
it final? Is there a process for reviving a PEP?

I personally would love to have *both* more consistent methods on built-in
classes AND exception handling expressions. I think the colon (and maybe
'except' keyword) could be replaced with an exclamation point:

    value = lst[2] except IndexError! "No value"

or just:

    value = lst[2] IndexError! "No value"

if that appeases the people who dislike the over-use of colons. A full
exception list would have to be in parentheses which gets ugly, but would
also be (I would wager) a less common form:

    dirlist.append(os.getcwd() (AttributeError, OSError as e)! os.curdir)

That might need some work. I don't know if it's compatible w/ the compiler.
It may have to start with "try" or something, but it seems pretty close to
a workable solution.

On Wed, Oct 31, 2018 at 4:42 AM Chris Angelico wrote:

> On Wed, Oct 31, 2018 at 8:24 PM Nicolas Rolin wrote:
> >
> > As a user I always found a bit disurbing that dict pop method have a
> > default while list and set doesn't.
> > While it is way more computationally easy to check wether a list or a
> > set is empty that to check if a key is in a dict, it still create a
> > signature difference for no real reason (having a default to a built-in
> > in python is pretty standard).
> > It would be nice if every built-in/method of built-in type that returns
> > a value and raise in some case have access to a default instead of
> > raise, and not having to check the doc to see if it supports a default.
>
> https://www.python.org/dev/peps/pep-0463/ wants to say hi.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Tue Nov  6 15:00:29 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 7 Nov 2018 07:00:29 +1100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
Message-ID: 

On Wed, Nov 7, 2018 at 6:04 AM Abe Dillon wrote:
>
> Which brings me to the question: What happens when a PEP gets rejected?
> Is it final? Is there a process for reviving a PEP?

It remains as a permanent document. No, that isn't final; and the process
for reviving a PEP basically consists of answering the objections that led
to its rejection. There have been a few cases where a proposal lies dormant
for years before finally being accepted (such as the matrix multiplication
operator).

So if you want to do that, open a new thread, and specifically respond to
the issues in the PEP - anything named as a reason for rejection, and
anything else that you think ought to be improved.
ChrisA

From steve at pearwood.info  Tue Nov  6 18:16:21 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 7 Nov 2018 10:16:21 +1100
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: 
References: <20181031010851.GC3817@ando.pearwood.info>
Message-ID: <20181106231621.GC4071@ando.pearwood.info>

On Tue, Nov 06, 2018 at 01:03:54PM -0600, Abe Dillon wrote:

> I don't understand the rationale behind PEP 463's rejection. Guido says, "I
> disagree with the position that EAFP is better than LBYL, or "generally
> recommended" by Python. (Where do you get that?...";

I can't comment on Guido's question about "generally recommended", but as
for the first part, I agree: neither EAFP nor LBYL is "better", they are
both appropriate under different circumstances. Sometimes one is clearer
and more efficient than the other. The only time I would say that EAFP is
clearly better is when LBYL introduces "Time Of Check To Time Of Use" bugs.

> Which brings me to the question: What happens when a PEP gets rejected?
> Is it final? Is there a process for reviving a PEP?

Nothing is final-final. You can try opening a competing PEP, or take over
as champion of the existing PEP (assuming Chris is willing to step aside).
You ought to respond to the reasons given in the rejection.

It's probably a good idea to gauge the chances of success by asking on
Python-Ideas and Python-Dev first, to avoid the core devs saying "Oh give
it up, it's not going to happen!" after you've wasted time trying to
revise a rejected PEP.

[...]
> I think the colon (and maybe
> 'except' keyword) could be replaced with an exclamation point:
>
>     value = lst[2] except IndexError! "No value"
[...]
> if that appeases the people who dislike the over-use of colons.

And I think that this is precisely the sort of syntax that prompted Guido
to write many years ago that language design is not merely a
problem-solving exercise. Aesthetics are important. This is not just a
matter of finding an unused character or two and hammering it into the
language. That's how you get Perl, which is not a pretty language.

> A full exception list would have to be in parentheses which gets ugly,
> but would also be (I would wager) a less common form:
>
>     dirlist.append(os.getcwd() (AttributeError, OSError as e)! os.curdir)
>
> That might need some work. I don't know if it's compatible w/ the
> compiler. It may have to start with "try" or something, but it seems
> pretty close to a workable solution.

Seeing that syntax, the phrase that came to my mind was not so much "close
to workable" and more "kill it with fire!".

-- 
Steve

From eric at trueblade.com  Tue Nov  6 19:05:27 2018
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 6 Nov 2018 19:05:27 -0500
Subject: [Python-ideas] Serialization of CSV vs. JSON
In-Reply-To: <667737F3-6965-463E-923C-EEE84CDCF47D@gmail.com>
References: <5BDE169D.4040408@canterbury.ac.nz> <1397FA69-0B7A-4BDF-B7C7-CBCA1A8D4F90@gmail.com> <667737F3-6965-463E-923C-EEE84CDCF47D@gmail.com>
Message-ID: <006e4b9a-c2f5-ff43-88eb-cd706a4012bf@trueblade.com>

On 11/6/2018 6:46 AM, David Shawley wrote:
> On Nov 4, 2018, at 12:43 PM, Michael Selik wrote:
>
> > If you're making a module
> >
> > > On Sun, Nov 4, 2018, 5:49 AM David Shawley wrote:
> > > Personally, I would place this sort of serialization logic outside
> > > of the Standard Library -- maybe following the pattern that the rust
> > > community adopted on this very issue.
> > > In short, they separated serialization &
> > > de-serialization into a free-standing library.
> >
> > You don't need a bug on the tracker or discussion on -dev to share a
> > module on PyPI or GitHub. When you've got something started, share a
> > link in this thread.
>
> I modified a branch of python/cpython to implement what I had in mind. [1]
> The idea is to introduce a new protocol with a single method:
>
>     self.jsonformat() -> object
>
>     If this method exists, then json.encoder.JSONEncoder will call it
>     to generate a JSON representation *instead* of calling *default*.
>     This method must return a value that json.encoder.JSONEncoder can
>     encode, or fail in the same manner as the *default* hook.
>
> The implementation wasn't too difficult once I learned a little more about
> how Standard Library classes are implemented when C speedups are included.
> There are a few things that I haven't done:
>
> 1. I didn't guard the functionality with a flag to the JSONEncoder
>    initializer. This was an oversight but I would add one before doing a
>    PR against python/cpython.
>
> 2. As discussed before this is an asymmetric proposal since there is no
>    support for detecting and de-serializing in JSONDecoder.
>
> That is what I had in mind. I'm not sure how we want to spell extension
> methods like this one. I chose to not use a double-underscore method
> since I view them as ``for use by the interpreter/language'' more so than
> for Library-recognised methods. The name is the least of my worries.
>
> Let me know if there is any reason that I shouldn't move forward with
> a bpo and PR against python/cpython.

I wouldn't support putting this in the stdlib yet. We need to get
real-world experience first. Modifying existing objects with what's
basically a new protocol seems too heavyweight for a protocol that's not
all that commonly used.

How about implementing this with functools.singledispatch? It's designed
for exactly this sort of case: some base functionality, then per-type
specialization. It would be super-easy to whip up something with
datetime.date and datetime.datetime specializations.
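For instance, a sketch along those lines (nothing here is settled API; it
is just the singledispatch pattern applied to JSON formatting):

    import datetime
    import json
    from functools import singledispatch

    @singledispatch
    def jsonable(obj):
        # Base case: mirror the stdlib error for unsupported types.
        raise TypeError(f'Object of type {type(obj).__name__} '
                        f'is not JSON serializable')

    @jsonable.register(datetime.date)  # also covers datetime.datetime
    def _(obj):
        return obj.isoformat()

    print(json.dumps({'when': datetime.date(2018, 11, 6)}, default=jsonable))
    # {"when": "2018-11-06"}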
I have a long-term goal of moving parts of the stdlib to singledispatch
where it makes sense (say the next generation of pprint, for example).

I also think you should pass in a context object, and maybe have None
signify a default context, although I'll admit I haven't thought it
through yet. It will take some design iterations to get it right, once the
use cases are clear.

Eric

> - cheers, dave.
>
> [1]: https://github.com/dave-shawley/cpython/pull/2
>
> -- 
> /"State and behavior. State and behavior. If it doesn't bundle state
> and behavior in a sensible way, it should not be an object, and there
> should not be a class that produces it."/ eevee
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abedillon at gmail.com  Thu Nov  8 20:49:18 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Thu, 8 Nov 2018 19:49:18 -0600
Subject: [Python-ideas] Add "default" kwarg to list.pop()
In-Reply-To: <20181106231621.GC4071@ando.pearwood.info>
References: <20181031010851.GC3817@ando.pearwood.info> <20181106231621.GC4071@ando.pearwood.info>
Message-ID: 

> neither EAFP nor LBYL is "better", they are both appropriate under
> different circumstances. Sometimes one is clearer and more efficient
> than the other.

One of the reasons LBYL is sometimes cleaner than EAFP is because it has
more support from the language in the form of an expression, which is what
PEP 463 intends to change.

> The only time I would say that EAFP is clearly better is when LBYL
> introduces "Time Of Check To Time Of Use" bugs.

It also puts the intent of the logic up front instead of requiring the
reader to scroll through a preamble of edge-case checks to get to what the
code is actually trying to do.

> I think that this is precisely the sort of syntax that prompted
> Guido to write many years ago that language design is not merely a
> problem-solving exercise.

The sort of syntax that prompted that post was "precisely" multi-line
lambdas. Guido explained that he tried to throw people off the idea of
multi-line lambdas by posing it as an unsolvable puzzle (which people
promptly solved) when really he just thought the concept of multi-line
lambdas was flawed to begin with. I agree with him on that point. The
whole point of a lambda is that, in certain cases, they allow you to write
more expressive code by saying in-line exactly what you want to do. It
only works if that action is easily expressed in a line:

    button.onClick(lambda: print("Hello!"))

If it's a long and complicated bit of code, the expressiveness of lambda
is lost and it makes more sense to give it a name and write:

    button.onClick(doThatComplicatedThing)

I'm not trying to solve a puzzle that implements an anti-pattern (unless
you have some argument for why expressionized try-except would be an
anti-pattern).

> Aesthetics are important. This is not just a matter of finding an unused
> character or two and hammering it into the language.

Yeah. That's why I didn't just try to find an unused character and hammer
it into the language without paying any regard to aesthetics. I find

    value = lst[2] except IndexError! "No value"

to be pretty well in keeping w/ Python's aesthetics because raising an
exception pretty naturally fits with an exclamation point, but of course,
aesthetics are subjective. I know what Perl is BTW, and share your
distaste for it.

> Seeing that syntax, the phrase that came to my mind was not so much
> "close to workable" and more "kill it with fire!".

Funny, that's exactly how I felt about the None-aware operators, only; I
didn't reject the entire concept simply because I disliked the syntax
choice. I simply rejected the syntax choice because I disliked the syntax
choice...

On Tue, Nov 6, 2018 at 5:21 PM Steven D'Aprano wrote:

> On Tue, Nov 06, 2018 at 01:03:54PM -0600, Abe Dillon wrote:
>
> > I don't understand the rationale behind PEP 463's rejection. Guido
> > says, "I disagree with the position that EAFP is better than LBYL, or
> > "generally recommended" by Python. (Where do you get that?...";
>
> I can't comment on Guido's question about "generally recommended", but
> as for the first part, I agree: neither EAFP nor LBYL is "better", they
> are both appropriate under different circumstances.
> Sometimes one is
> clearer and more efficient than the other. The only time I would say
> that EAFP is clearly better is when LBYL introduces "Time Of Check To
> Time Of Use" bugs.
>
> > Which brings me to the question: What happens when a PEP gets rejected?
> > Is it final? Is there a process for reviving a PEP?
>
> Nothing is final-final. You can try opening a competing PEP, or take
> over as champion of the existing PEP (assuming Chris is willing to step
> aside). You ought to respond to the reasons given in the rejection.
>
> It's probably a good idea to gauge the chances of success by asking on
> Python-Ideas and Python-Dev first, to avoid the core devs saying "Oh
> give it up, it's not going to happen!" after you've wasted time trying
> to revise a rejected PEP.
>
> [...]
> > I think the colon (and maybe
> > 'except' keyword) could be replaced with an exclamation point:
> >
> >     value = lst[2] except IndexError! "No value"
> [...]
> > if that appeases the people who dislike the over-use of colons.
>
> And I think that this is precisely the sort of syntax that prompted
> Guido to write many years ago that language design is not merely a
> problem-solving exercise. Aesthetics are important. This is not just a
> matter of finding an unused character or two and hammering it into the
> language. That's how you get Perl, which is not a pretty language.
>
> > A full exception list would have to be in parentheses which gets ugly,
> > but would also be (I would wager) a less common form:
> >
> >     dirlist.append(os.getcwd() (AttributeError, OSError as e)! os.curdir)
> >
> > That might need some work. I don't know if it's compatible w/ the
> > compiler. It may have to start with "try" or something, but it seems
> > pretty close to a workable solution.
>
> Seeing that syntax, the phrase that came to my mind was not so much
> "close to workable" and more "kill it with fire!".
>
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From danish.bluecheese at gmail.com  Fri Nov  9 17:54:50 2018
From: danish.bluecheese at gmail.com (danish bluecheese)
Date: Fri, 9 Nov 2018 14:54:50 -0800
Subject: [Python-ideas] Relative Imports
Message-ID: 

Hi all,

I'm tired of not being able to make relative imports freely. I'm now
trying to develop a module which enables any project to use relative
imports once it is loaded. Anybody interested?

Best,
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Fri Nov  9 18:16:35 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 10 Nov 2018 10:16:35 +1100
Subject: [Python-ideas] Relative Imports
In-Reply-To: 
References: 
Message-ID: <20181109231635.GL4071@ando.pearwood.info>

On Fri, Nov 09, 2018 at 02:54:50PM -0800, danish bluecheese wrote:

> I'm tired of not being able to make relative imports freely.

Python has supported relative imports for a while now.

https://docs.python.org/3/tutorial/modules.html#intra-package-references

What do you mean?
-- 
Steve

From danish.bluecheese at gmail.com  Fri Nov  9 18:20:52 2018
From: danish.bluecheese at gmail.com (danish bluecheese)
Date: Fri, 9 Nov 2018 15:20:52 -0800
Subject: [Python-ideas] Relative Imports
In-Reply-To: <20181109231635.GL4071@ando.pearwood.info>
References: <20181109231635.GL4071@ando.pearwood.info>
Message-ID: 

It supports them, but whenever you get multiple folders there is no clean
solution: either there are some sys.path hacks, or you run things as
modules in some cases. These are not pleasant at all. I think we can come
up with something better. Interested?

On Fri, Nov 9, 2018 at 3:17 PM Steven D'Aprano wrote:

> On Fri, Nov 09, 2018 at 02:54:50PM -0800, danish bluecheese wrote:
>
> > I'm tired of not being able to make relative imports freely.
>
> Python has supported relative imports for a while now.
>
> https://docs.python.org/3/tutorial/modules.html#intra-package-references
>
> What do you mean?
>
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Fri Nov  9 18:39:15 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 10 Nov 2018 10:39:15 +1100
Subject: [Python-ideas] Relative Imports
In-Reply-To: 
References: <20181109231635.GL4071@ando.pearwood.info>
Message-ID: <20181109233915.GM4071@ando.pearwood.info>

On Fri, Nov 09, 2018 at 03:20:52PM -0800, danish bluecheese wrote:

> It supports them, but whenever you get multiple folders there is no clean
> solution.

What do you mean?

-- 
Steve

From danish.bluecheese at gmail.com  Fri Nov  9 18:51:46 2018
From: danish.bluecheese at gmail.com (danish bluecheese)
Date: Fri, 9 Nov 2018 15:51:46 -0800
Subject: [Python-ideas] Relative Imports
In-Reply-To: <20181109233915.GM4071@ando.pearwood.info>
References: <20181109231635.GL4071@ando.pearwood.info> <20181109233915.GM4071@ando.pearwood.info>
Message-ID: 

└── src
    ├── __init__.py
    ├── main.py
    └── test
        ├── __init__.py
        └── test_main.py
assume the structure above. To be able to use relative imports with such a
fundamental structure, either I can go for sys.path hacks or run it as a
module from one level further up. I do not like this :D I want to be able
to use relative imports freely as soon as I provide the correct relative
path.

Please let me know if it is not clear on any aspect. Thank you.

Regards.

On Fri, Nov 9, 2018 at 3:39 PM Steven D'Aprano wrote:

> On Fri, Nov 09, 2018 at 03:20:52PM -0800, danish bluecheese wrote:
>
> > It supports them, but whenever you get multiple folders there is no
> > clean solution.
>
> What do you mean?
>
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Fri Nov  9 19:00:16 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 10 Nov 2018 11:00:16 +1100
Subject: [Python-ideas] Relative Imports
In-Reply-To: 
References: <20181109231635.GL4071@ando.pearwood.info> <20181109233915.GM4071@ando.pearwood.info>
Message-ID: 

On Sat, Nov 10, 2018 at 10:52 AM danish bluecheese wrote:
>
> └── src
>     ├── __init__.py
>     ├── main.py
>     └── test
>         ├── __init__.py
>         └── test_main.py
>
> assume the structure above. To be able to use relative imports with such
> a fundamental structure, either I can go for sys.path hacks or run it as
> a module from one level further up.
> I do not like this :D I want to be able to use relative imports freely as
> soon as I provide the correct relative path.
>
> Please let me know if it is not clear on any aspect.

The main thing that's not clear here is what you're proposing. What is the
idea under discussion?

ChrisA

From steve at pearwood.info  Fri Nov  9 19:10:43 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 10 Nov 2018 11:10:43 +1100
Subject: [Python-ideas] Relative Imports
In-Reply-To: 
References: <20181109231635.GL4071@ando.pearwood.info> <20181109233915.GM4071@ando.pearwood.info>
Message-ID: <20181110001043.GO4071@ando.pearwood.info>

On Fri, Nov 09, 2018 at 03:51:46PM -0800, danish bluecheese wrote:

> └── src
>     ├── __init__.py
>     ├── main.py
>     └── test
>         ├── __init__.py
>         └── test_main.py
>
> assume the structure above. To be able to use relative imports with such
> a fundamental structure, either I can go for sys.path hacks or run it as
> a module from one level further up.

I don't understand. From the top level of the package, running inside
either __init__ or main, you should be able to say:

    from . import test
    from .test import test_main

From the test subpackage, you should be able to say:

    from .. import main

to get the src/main module, or

    from . import test_main

to get the test/test_main module from the test/__init__ module.

(Disclaimer: I have not actually run the above code to check that it
works, beyond testing that it's not a SyntaxError.)
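To make the execution side concrete (module names taken from the tree
above; the main() function in src/main.py is assumed for the example):

    # src/test/test_main.py -- minimal sketch
    from ..main import main  # assumes src/main.py defines main()

    def test_main():
        assert main() is not None

Run from the directory containing src/, `python -m src.test.test_main`
gives the file a parent package, so the relative import resolves; running
`python src/test/test_main.py` directly does not, and the import fails.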
What *precisely* is the problem you are trying to solve, and your proposed
solution?

-- 
Steve

From ashafer01 at gmail.com  Fri Nov  9 19:22:57 2018
From: ashafer01 at gmail.com (Alex Shafer)
Date: Fri, 9 Nov 2018 17:22:57 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 144, Issue 24
In-Reply-To: 
References: 
Message-ID: 

I think this is about the limitation to . and .. possibly?

> From: Chris Angelico
> Subject: Re: [Python-ideas] Relative Imports
>
> The main thing that's not clear here is what you're proposing. What is
> the idea under discussion?
>
> ChrisA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From danish.bluecheese at gmail.com  Fri Nov  9 19:32:47 2018
From: danish.bluecheese at gmail.com (danish bluecheese)
Date: Fri, 9 Nov 2018 16:32:47 -0800
Subject: [Python-ideas] Relative Imports
In-Reply-To: <20181110001043.GO4071@ando.pearwood.info>
References: <20181109231635.GL4071@ando.pearwood.info> <20181109233915.GM4071@ando.pearwood.info> <20181110001043.GO4071@ando.pearwood.info>
Message-ID: 

you are right on the lines you mentioned. Those are all working if I run it
as a module, which I do every time. This is somewhat unpleasant to me,
especially while developing something and trying to test it quickly. I just
want to be able to use the same relative imports and run a single file with
`python3 test_main.py`, for example. Running files as modules every time is
tiring. This is my problem. I could not come up with a concrete solution
idea yet; I am thinking on it. Open to suggestions.

Thank you all for your help!

On Fri, Nov 9, 2018 at 4:16 PM Steven D'Aprano wrote:

> On Fri, Nov 09, 2018 at 03:51:46PM -0800, danish bluecheese wrote:
>
> > └── src
> >     ├── __init__.py
> >     ├── main.py
> >     └── test
> >         ├── __init__.py
> >         └── test_main.py
> >
> > assume the structure above. To be able to use relative imports with
> > such a fundamental structure, either I can go for sys.path hacks or run
> > it as a module from one level further up.
>
> I don't understand. From the top level of the package, running inside
> either __init__ or main, you should be able to say:
>
>     from . import test
>     from .test import test_main
>
> From the test subpackage, you should be able to say:
>
>     from .. import main
>
> to get the src/main module, or
>
>     from . import test_main
>
> to get the test/test_main module from the test/__init__ module.
>
> (Disclaimer: I have not actually run the above code to check that it
> works, beyond testing that it's not a SyntaxError.)
>
> What *precisely* is the problem you are trying to solve, and your
> proposed solution?
>
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jsbueno at python.org.br  Fri Nov  9 20:41:34 2018
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Fri, 9 Nov 2018 23:41:34 -0200
Subject: [Python-ideas] Python octal escape character encoding "wats"
Message-ID: 

I just saw some document which reminded me how strings handle a backslash
followed by 3 octal digits. When a backslash is followed by 3 octal digits,
that means a character with the corresponding codepoint and all is well.

The "valid scenario":

    In [42]: "\777"
    Out[42]: 'ǿ'

The problem is when you have just two valid octal digits

    In [40]: "\778"
    Out[40]: '?8'

Which is ambiguous at least -- why is this not "\x07" "77" for example?
(0o77 actually corresponds to the "?" (63) character)

Or...when the first digit is not valid as octal - that is:

    In [41]: "\877"
    Out[41]: '\\877'

And then when the second digit is not valid octal:

    In [43]: "\797"
    Out[43]: '\x0797'

WAT?

So, between the possibly ambiguous scenario with two octal digits followed
by a non-octal digit, and the completely unexpected expansion to a
4-hexadecimal digit codepoint in the last case, what do you say of
deprecating any r"\[0-9]{1,3}" sequence that doesn't match full 3 octal
digits, and yield a syntax error for that from Python 3.9 (or 3.10) on?
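For reference, the three behaviours side by side (CPython 3.x; note that
"\877" also emits a DeprecationWarning for the invalid escape on 3.6+):

    print(list("\777"))  # ['ǿ']                  three octal digits: chr(0o777)
    print(list("\778"))  # ['?', '8']             two digits (chr(0o77)), then '8'
    print(list("\877"))  # ['\\', '8', '7', '7']  '8' is not octal, so no escape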
Best regards,

js
-><-

From rosuav at gmail.com  Fri Nov  9 20:56:07 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 10 Nov 2018 12:56:07 +1100
Subject: [Python-ideas] Python octal escape character encoding "wats"
In-Reply-To: 
References: 
Message-ID: 

On Sat, Nov 10, 2018 at 12:42 PM Joao S. O. Bueno wrote:
>
> The problem is when you have just two valid octal digits
>
>     In [40]: "\778"
>     Out[40]: '?8'
>
> Which is ambiguous at least -- why is this not "\x07" "77" for
> example? (0o77 actually corresponds to the "?" (63) character)

Not ambiguous. It takes as many valid octal digits as it can.

https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

    \ooo ==> Character with octal value ooo
    Note 1: As in Standard C, up to three octal digits are accepted.

"Up to" means that one or two digits can also define a character. For
obvious reasons, it has to take digits greedily (otherwise "\777" would be
"\x07" followed by "77"), and it's not an error to have fewer digits.
Permitting a single digit means that "\0" means the NUL character, which
is often convenient.

> And then when the second digit is not valid octal:
>
>     In [43]: "\797"
>     Out[43]: '\x0797'
>
> WAT?

You may possibly be misinterpreting the last result. It's exactly the same
as the previous ones.

    >>> list("\797")
    ['\x07', '9', '7']

The octal escape grabs as many digits as it can, and when it finds a
character in the literal that isn't a valid octal digit (same whether it's
a '9' or a 'q'), it stops. The remaining characters have no special
meaning; this does not become four hex digits. A "\xNN" escape in Python
must be exactly two digits, no more and no less.

> what do you say of deprecating any r"\[0-9]{1,3}" sequence that doesn't
> match full 3 octal digits, and yield a syntax error for that from Python
> 3.9 (or 3.10) on?

Nope. Would break code for no good reason.

ChrisA

From jsbueno at python.org.br  Fri Nov  9 21:04:22 2018
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Sat, 10 Nov 2018 00:04:22 -0200
Subject: [Python-ideas] Python octal escape character encoding "wats"
In-Reply-To: 
References: 
Message-ID: 

On Fri, 9 Nov 2018 at 23:56, Chris Angelico wrote:

>     >>> list("\797")
>     ['\x07', '9', '7']
>
> The octal escape grabs as many digits as it can, and when it finds a
> character in the literal that isn't a valid octal digit (same whether
> it's a '9' or a 'q'), it stops. The remaining characters have no
> special meaning; this does not become four hex digits. A "\xNN" escape
> in Python must be exactly two digits, no more and no less.

Yes - I had just figured this out before going to sleep, and was coming
back to say that, although strange, this was no motive for breaking stuff
up.

Thank you for the lengthy reply!!

> On Sat, Nov 10, 2018 at 12:42 PM Joao S. O. Bueno wrote:
> >
> > I just saw some document which reminded me how strings handle a
> > backslash followed by 3 octal digits. When a backslash is followed by
> > 3 octal digits, that means a character with the corresponding
> > codepoint and all is well.
> > The "valid scenario":
> >
> >     In [42]: "\777"
> >     Out[42]: 'ǿ'
> >
> > The problem is when you have just two valid octal digits
> >
> >     In [40]: "\778"
> >     Out[40]: '?8'
> >
> > Which is ambiguous at least -- why is this not "\x07" "77" for
> > example? (0o77 actually corresponds to the "?" (63) character)
>
> Not ambiguous. It takes as many valid octal digits as it can.
>
> https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
>
>     \ooo ==> Character with octal value ooo
>     Note 1: As in Standard C, up to three octal digits are accepted.
>
> "Up to" means that one or two digits can also define a character. For
> obvious reasons, it has to take digits greedily (otherwise "\777"
> would be "\x07" followed by "77"), and it's not an error to have fewer
> digits. Permitting a single digit means that "\0" means the NUL
> character, which is often convenient.
>
> > And then when the second digit is not valid octal:
> >
> >     In [43]: "\797"
> >     Out[43]: '\x0797'
> >
> > WAT?
> >
> > So, between the possibly ambiguous scenario with two octal digits
> > followed by a non-octal digit, and the completely unexpected expansion
> > to a 4-hexadecimal digit codepoint in the last case
>
> You may possibly be misinterpreting the last result. It's exactly the
> same as the previous ones.
>
>     >>> list("\797")
>     ['\x07', '9', '7']
>
> The octal escape grabs as many digits as it can, and when it finds a
> character in the literal that isn't a valid octal digit (same whether
> it's a '9' or a 'q'), it stops. The remaining characters have no
> special meaning; this does not become four hex digits. A "\xNN" escape
> in Python must be exactly two digits, no more and no less.
>
> > what do you say
> > of deprecating any r"\[0-9]{1,3}" sequence that doesn't match full 3
> > octal digits, and yield a syntax error for that from Python 3.9 (or
> > 3.10) on?
>
> Nope. Would break code for no good reason.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From steve at pearwood.info  Fri Nov  9 23:19:09 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 10 Nov 2018 15:19:09 +1100
Subject: [Python-ideas] Python octal escape character encoding "wats"
In-Reply-To: 
References: 
Message-ID: <20181110041908.GP4071@ando.pearwood.info>

On Sat, Nov 10, 2018 at 12:56:07PM +1100, Chris Angelico wrote:

> Not ambiguous. It takes as many valid octal digits as it can.

What is the rationale for that? Hex escapes don't.

My guess is, "Because that's what C does". And C probably does it because
"Dennis Ritchie wanted to minimize the number of keypresses when he was
typing" :-)

> "Up to" means that one or two digits can also define a character. For
> obvious reasons, it has to take digits greedily (otherwise "\777"
> would be "\x07" followed by "77"), and it's not an error to have fewer
> digits.

In hindsight, I think we should have insisted that octal escapes must
always be three digits, just as hex escapes are always two. The status quo
has too much magical "Do What I Mean" in it for my liking:

    py> '\509\51'  # pair of brackets surrounding a nine
    '(9)'
    py> '\507\51'  # pair of brackets surrounding a seven
    'Ň)'

Dammit Python, that's not what I meant!

> > what do you say
> of deprecating any r"\[0-9]{1,3}" sequence that doesn't match full 3
> octal digits, and yield a syntax error for that from Python 3.9 (or
> 3.10) on?
> > Nope. Would break code for no good reason. There's a good reason: to make the behaviour more sensible and less confusing and have fewer "oops, that's not what I wanted" bugs. But we should have made that change for 3.0. Now, I agree: it would be breakage where the benefit doesn't outweigh the cost. Maybe in Python 5000. In the meantime, one or two digit octal escapes ought to be a linter warning. -- Steve From rosuav at gmail.com Fri Nov 9 23:39:36 2018 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 10 Nov 2018 15:39:36 +1100 Subject: [Python-ideas] Python octal escape character encoding "wats" In-Reply-To: <20181110041908.GP4071@ando.pearwood.info> References: <20181110041908.GP4071@ando.pearwood.info> Message-ID: On Sat, Nov 10, 2018 at 3:19 PM Steven D'Aprano wrote: > > On Sat, Nov 10, 2018 at 12:56:07PM +1100, Chris Angelico wrote: > > > Not ambiguous. It takes as many valid octal digits as it can. > > What is the rationale for that? Hex escapes don't. Irrelevant to whether it's ambiguous or not. > > "Up to" means that one or two digits can also define a character. For > > obvious reasons, it has to take digits greedily (otherwise "\777" > > would be "\x07" followed by "77"), and it's not an error to have fewer > > digits. > > In hindsight, I think we should have insisted that octal escapes must > always be three digits, just as hex escapes are always two. The status > quo has too much magical "Do What I Mean" in it for my liking: > > py> '\509\51' # pair of brackets surrounding a nine > '(9)' > py> '\507\51' # pair of brackets surrounding a seven > 'G)' > > Dammit Python, that's not what I meant! How often do you actually do that with octal escapes, though? Ever had actual real-world situations where this comes up? I don't recall *ever* coming across a problem where sometimes I have an octal escape followed by a nine, and other times by a different digit. I also do not recall often wanting an octal escape followed by a digit, even without that confusion. > > > what do you say > > > of deprecating any r"\[0-9]{1,3}" sequence that don't match full 3 > > > octal digits, and yield a syntax error for that from Python 3.9 (or > > > 3.10) on? > > > > Nope. Would break code for no good reason. > > There's a good reason: to make the behaviour more sensible and less > confusing and have fewer "oops, that's not what I wanted" bugs. But we > should have made that change for 3.0. Now, I agree: it would be breakage > where the benefit doesn't outweigh the cost. We can debate whether it would be, in the abstract, better to mandate exactly three digits, or to allow fewer. But I think we're all agreed that it is nowhere _near_ enough of a problem to justify the breakage. I perhaps exaggerated slightly in saying "no" good reason, but certainly not enough to consider the change. > Maybe in Python 5000. > > In the meantime, one or two digit octal escapes ought to be a linter > warning. Maybe. Or just have the editor colour the octal escape differently; that way, the end of the colour will tell you if the language is misinterpreting your intentions. Either way, yeah, something that tooling can help with. 
ChrisA

From Richard at Damon-Family.org Sat Nov 10 08:08:59 2018
From: Richard at Damon-Family.org (Richard Damon)
Date: Sat, 10 Nov 2018 08:08:59 -0500
Subject: [Python-ideas] Python octal escape character encoding "wats"
In-Reply-To: <20181110041908.GP4071@ando.pearwood.info>
References: <20181110041908.GP4071@ando.pearwood.info>
Message-ID: 

On 11/9/18 11:19 PM, Steven D'Aprano wrote:
> On Sat, Nov 10, 2018 at 12:56:07PM +1100, Chris Angelico wrote:
>
>> Not ambiguous. It takes as many valid octal digits as it can.
> What is the rationale for that? Hex escapes don't.
>
> My guess is, "Because that's what C does". And C probably does it
> because "Dennis Ritchie wanted to minimize the number of keypresses when
> he was typing" :-)
>
>
>> "Up to" means that one or two digits can also define a character. For
>> obvious reasons, it has to take digits greedily (otherwise "\777"
>> would be "\x07" followed by "77"), and it's not an error to have fewer
>> digits.
> In hindsight, I think we should have insisted that octal escapes must
> always be three digits, just as hex escapes are always two. The status
> quo has too much magical "Do What I Mean" in it for my liking:
>
> py> '\509\51'  # pair of brackets surrounding a nine
> '(9)'
> py> '\507\51'  # pair of brackets surrounding a seven
> 'Ň)'
>
> Dammit Python, that's not what I meant!
>
Since the 'normal' usage for octal escapes in C (which came long before
hex escapes) was to input control characters, the most likely being \0,
and the next most likely \33 (Escape), and by far most being in the
range of \0 - \37, requiring 3 digits all the time would be very
inconvenient. You would never use the escape for a printable character
and interleave it with other printable characters.

Yes, if you are putting in codes for a string of arbitrary byte values
using escapes, then you would likely always use 3 digits for
readability, but then you don't have the ambiguity, as EVERY code is an
escape.

The one case where you might get the problem is if you had a control
character (like escape) followed by a digit between 0 and 7: you needed
to expand the escape to 3 digits. This was just one of the traps you
learned to live with (and terminal escape codes seemed to avoid that
issue by normally following the escape character with a non-digit
character.)
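To make that trap concrete (the reprs below are from a current Python 3,
but the greedy rule is the same as C's):

>>> "\33"      # two digits are enough for ESC
'\x1b'
>>> "\337"     # meant as ESC then '7', but all three digits get eaten
'ß'
>>> "\0337"    # pad the escape to three digits and you get ESC, then '7'
'\x1b7'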
-- 
Richard Damon

From erotemic at gmail.com Sat Nov 10 20:36:52 2018
From: erotemic at gmail.com (Jonathan Crall)
Date: Sat, 10 Nov 2018 20:36:52 -0500
Subject: [Python-ideas] Proposing additions to the standard library
Message-ID: 

I'm interested in proposing several additions to the Python standard
library, and I would like more information on the procedure for doing so.
Are all additions done via a PEP? If not, what is the procedure? If so,
I've read that the first step was to email this board and get feedback.

I have a library called `ubelt` that contains several tools that I think
might be worthy of adding to the standard library.

Here's my bullet point pitch:

- Python is batteries included. Ubelt contains extra batteries: its
functions are extra batteries.
- Most functions in ubelt are fast. All 222 tests take 7.33 seconds.
- Ubelt has 100% test coverage (sans `# nocover` locations).
- I'm only championing a subset of the functions in ubelt. There are
certainly functions in there that do not belong in the standard library.
- I have a Jupyter notebook that gives a demo of some select functions
(not necessarily the same as the ones proposed here):
https://github.com/Erotemic/ubelt/blob/master/docs/notebooks/Ubelt%20Demo.ipynb
- I do have documentation (mostly in docstrings) and in the docs folder,
but I've been having trouble auto-updating read-the-docs. Here is the
link anyway: https://ubelt.readthedocs.io/en/latest/

Here is a tentative list of interesting functions. Hopefully the names
are descriptive (if not, see docstrings:
https://github.com/Erotemic/ubelt); two of them are sketched below.

ub.cmd
ub.compressuser
ub.group_items
ub.dict_hist
ub.find_duplicates
ub.AutoDict
ub.import_module_from_path
ub.import_module_from_name
ub.modname_to_modpath,
ub.modpath_to_modname
ub.ProgIter
ub.ensuredir
ub.expandpath

almost everything in util_list:
allsame, argmax, argmin, argsort, argunique,
chunks, flatten, iter_window, take, unique

These functions might be worth modifying into dictionary methods:
ub.dict_subset
ub.dict_take
ub.map_vals
ub.map_keys

ub.Timerit
ub.Timer

Because I built the library, I tend to like all the functions. It's
difficult to decide if they are stdlib worthy, so there might be some
false positives / negatives. I'm on the fence about: CacheStamp, Cacher,
NoParam, argflag, argval, dzip, delete, hash_data, hash_file, memoize,
memoize_method, NiceRepr, augpath, userhome, ensure_app_cache_dir,
ensure_app_resource_dir, find_exe, find_path, get_app_cache_dir,
get_app_resource_dir, platform_cache_dir, platform_resource_dir,
CaptureStdout, codeblock, ensure_unicode, hzcat, indent, OrderedSet

It's my hope that some of these are actually useful. Let me know any of
the following: what you think, if there are any questions, if something
else needs to be done, or what the next steps are.
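For flavour, here is roughly what two of the names above boil down to
(paraphrased sketches, not the exact ubelt source):

from collections import defaultdict

def group_items(items, key):
    # gather items into a dict of lists, keyed by key(item)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

def dict_hist(items):
    # frequency histogram of the items
    hist = defaultdict(int)
    for item in items:
        hist[item] += 1
    return dict(hist)

print(group_items(['ham', 'jam', 'spam'], key=len))
# {3: ['ham', 'jam'], 4: ['spam']}
print(dict_hist('abracadabra'))
# {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}

Small, yes, but I find myself rewriting them in almost every project.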
There is no "utilities" or "toolbox" module in the std lib. [...] > Here is a tentative list of interesting functions. Hopefully the names are > descriptive (if not, see docstrings: https://github.com/Erotemic/ubelt) Sorry, some of these aren't descriptive enough, and if you're trying to make a pitch for these features, you ought to give at least a one-sentence explanation of them here in the email. You will lose half your audience as soon as you ask them to click through to a link, and even if they do, that risks splitting the discussion across two places. My advice is to collate the functions you want to add into groups of related functionality, find the class or module in the std lib where you think they belong, and begin a new thread for each group. E.g. "New dict methods", "New importlib functions". -- Steve From erotemic at gmail.com Sat Nov 10 21:56:18 2018 From: erotemic at gmail.com (Jonathan Crall) Date: Sat, 10 Nov 2018 21:56:18 -0500 Subject: [Python-ideas] Proposing additions to the standard library In-Reply-To: <20181111021400.GU4071@ando.pearwood.info> References: <20181111021400.GU4071@ando.pearwood.info> Message-ID: @Steve, this is just the sort of feedback I was looking for. Small and conservative additions make sense. I definitely think that some functions do fit into existing stdlib modules. For instance, AutoDict might go in collections. Sorry, some of these aren't descriptive enough, and if you're trying to > make a pitch for these features. ... My advice is to collate the functions you want to add into groups of > related functionality. Makes sense. I figured that my original list had too may entries to do that for, or else the email would explode. Separating each small group into its own thread will allow me to describe the specific function without writing a novel. Sometimes there's a good, useful function than doesn't get added because > there's no reasonable place to put it. For example, a "flatten" function > has been talked about since Python 1.x days, and we still don't have a > standard solution for it, because (1) it isn't clear *precisely* what it > should do, and (2) it isn't clear where it should go. The flatten example is good to know about. Is there a link to this discussion or a summary of it? I would think flatten could go in itertools, but clearly there must some reason why its not there. I imagine the duplication with it.chain.from_iter + "There should be one-- and preferably only one --obvious way to do it."? As for what it should do, I'm guessing the controversy was over flattening one level vs all levels? That makes sense and is good to know. I guess I won't pick `flatten` as one of my first functions to pick for a writeup. On a similar note, do you (or anyone else) have an intuition for which of these functions --- judging by name only (so you don't have to click any links) --- might be the least controversial? I'm not very good at judging controversy, which is one of the main reasons for this initial email. Maybe `expandpath` to os.path? Or perhaps start with ub.modname_to_modpath and ub.modpath_to_modname to importlib? Maybe some of the dict-methods? Perhaps I'm overestimating the clear usefulness of any of these functions to the stdlib? On Sat, Nov 10, 2018 at 9:14 PM Steven D'Aprano wrote: > On Sat, Nov 10, 2018 at 08:36:52PM -0500, Jonathan Crall wrote: > > I'm interested in proposing several additions to the Python standard > > library, and I would like more information on the procedure for doing so. > > Are all additions done via a PEP? 
> > Not necessarily. Small, obvious enhancements can go straight to the > bug tracker. The tricky part is deciding what is "obvious". > > Sometimes there's a good, useful function than doesn't get added because > there's no reasonable place to put it. For example, a "flatten" function > has been talked about since Python 1.x days, and we still don't have a > standard solution for it, because (1) it isn't clear *precisely* what it > should do, and (2) it isn't clear where it should go. > > Given that once something gets added to the std lib, it is hard to > remove it or even rename it, its better to be conservative about adding > things and leave it to third party libraries to cover the gaps. > > > > If not what is the procedure. If so, I've > > read that the first step was to email this board and get feedback. > > That's a good idea. If the enhancement request isn't both small and > obvious, or is the least bit controversial, you'll usually be sent back > here. > > > > I have a library called `ubelt` that contains several tools that I think > > might be worthy of adding to the standard library. > > Generally speaking, we don't typically add grab-bags of random utility > functions. There is no "utilities" or "toolbox" module in the std lib. > > > [...] > > Here is a tentative list of interesting functions. Hopefully the names > are > > descriptive (if not, see docstrings: https://github.com/Erotemic/ubelt) > > Sorry, some of these aren't descriptive enough, and if you're trying to > make a pitch for these features, you ought to give at least a > one-sentence explanation of them here in the email. You will lose half > your audience as soon as you ask them to click through to a link, and > even if they do, that risks splitting the discussion across two places. > > My advice is to collate the functions you want to add into groups of > related functionality, find the class or module in the std lib where you > think they belong, and begin a new thread for each group. E.g. "New dict > methods", "New importlib functions". > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- -Jon -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicholasharrison222 at gmail.com Sun Nov 11 00:58:02 2018 From: nicholasharrison222 at gmail.com (Nicholas Harrison) Date: Sat, 10 Nov 2018 22:58:02 -0700 Subject: [Python-ideas] Range and slice syntax Message-ID: I'm aware that syntax for ranges and slices has been discussed a good amount over the years, but I wanted to float an idea out there to see if it hasn't been considered before. It's not really original. Rather, it's a combination of a couple parts of Python, and I find it fascinatingly-consistent with the rest of the language. This will look similar to PEP 204, but there are some important differences and clarifications. (start:stop:step) Meet a range/slice object. Parentheses are required. (Its syntax in this regard follows exactly the same rules as a generator expression.) I say both range and slice because it can be used in either role. On the one hand, it is iterable and functions exactly like range(start, stop, step) in those contexts. On the other, it can also be passed into list indexing exactly like slice(start, stop, step). This is a proposal that range and slice are really the same thing, just in different contexts. Why is it useful? 
I at least find its syntax to be simple, intuitive, and concise -- more so than the range(...) or slice(...) alternatives. It's quite obvious for an experienced Python user and just as simple to pick up as slice notation for a beginner (since it *is* slice notation). It condenses and clears up sometimes-cumbersome range expressions. A couple examples: sum(1:6) # instead of sum(range(1, 6)) list(1:6) for i in (1:6): print(i**2) (i**2 for i in (1:6)) It also makes forming reusable slices clearer and easier: my_slice = (:6:2) # instead of slice(None, 6, 2) my_list[my_slice] It has a couple of siblings that should be obvious (think list or set comprehension): [start:stop:step] # gives a list {start:stop:step} # gives a set This is similar to passing a range/slice object into the respective constructor: [1:6] # list(1:6) or [1, 2, 3, 4, 5] {1:6} # set(1:6) or {1, 2, 3, 4, 5} Note that the parentheses aren't needed when it is the only argument of a function call or is the only element within brackets or braces. It takes on its respective roles for these bracket and brace cases, just like comprehensions. This also gives rise to the normal slice syntax: my_list[1:6:2] # What is inside the brackets is a slice object. my_list[(1:6:2)] # Equivalent. The parentheses are valid but unnecessary. So here's the part that requires a little more thought. Any of the values may be omitted and in the slice context the behavior has no changes from what it already does: start and stop default to the beginning or end of the list depending on direction and the step defaults to 1. In the range context, we simply repeat these semantics, but noting that there is no longer a beginning or end of a list. Step defaults to 1 (just like range or slice). Start defaults to 0 when counting up and -1 when counting down (just like slice). If stop is omitted, the object will act like an itertools.count object, counting indefinitely. I have found infinite iteration to be a natural and oft-desired extension to a range object, but I can understand that some may want it to remain separate and pure within itertools. I also admit that the ability to form an infinite list with only three characters can be a scary thought (though we are all adults here, right? ;). Normally you have to take a couple extra keystrokes: from itertools import count list(count()) # rather than just [:] If that is the case, then raising an error when iter() is called on a range/slice object with no stop value could be another acceptable course of action. The syntax will still be left valid. And that's mainly it. Slice is iterable or range is "indexable" and the syntax can be used anywhere successive values are desired. If you want to know what it does or how to use it in some case, just think, "what would a slice object do?" or "what would a range object do?" or "how would I write a generator expression/list comprehension here?". Here are a few more examples: for i in (:5): # 5 elements 0 to 4, i.e. range(5) print(i**2) for i in (1:): # counts up from one for as long as you want, i.e. 
count(1) print(i**2) if i == 5: break it = iter(:) # a convenient usage for an infinite counter next(it) ' '.join(map(str, (:5:2))) # gives '0 2 4' [(:5), (5:10)] # list of range/slice objects [[:5], [5:10]] # list of lists [*(:5), *(5:10)] # uses unpacking to get flat list [*[:5], *[5:10]] # same unpacking to get flat list Otherwise you'd have to do: [list(range(5)), list(range(5, 10))] # list of lists [*range(5), *range(5, 10)] # flat list Tuples: tuple(1:6:2) # (1, 3, 5) *(1:6:2), # same I don't actually have experience developing the interpreter and underlying workings of Python, so I don't know how much of a change this requires. I thought it might be possible since the constructs already exist in the language. They just haven't been unified yet. I also realize that there are a few other use-cases that need to be ironed out. The syntax might also be too minimal in some cases to be obvious. One of the trickiest things may be what it will be called, since the current language has the two different terms. In the end it's just another range/slice idea, and the idea has probably already been proposed sometime in the past few decades, but what thoughts are there? - Nicholas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Nov 11 01:00:59 2018 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 11 Nov 2018 17:00:59 +1100 Subject: [Python-ideas] Range and slice syntax In-Reply-To: References: Message-ID: On Sun, Nov 11, 2018 at 4:59 PM Nicholas Harrison wrote: > It has a couple of siblings that should be obvious (think list or set comprehension): > > [start:stop:step] # gives a list > {start:stop:step} # gives a set > Be careful of this last one. If you omit the step, it looks like this: {start:stop} which is a dictionary display. ChrisA From steve at pearwood.info Sun Nov 11 04:35:38 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 11 Nov 2018 20:35:38 +1100 Subject: [Python-ideas] Range and slice syntax In-Reply-To: References: Message-ID: <20181111093537.GW4071@ando.pearwood.info> On Sat, Nov 10, 2018 at 10:58:02PM -0700, Nicholas Harrison wrote: [...] > (start:stop:step) > > > Meet a range/slice object. Parentheses are required. (Its syntax in this > regard follows exactly the same rules as a generator expression.) I say > both range and slice because it can be used in either role. Ranges and slices are conceptually different things. Even if they have similar features, they are very different: - range is a lazy sequence, which produces its values on demand; - it supports (all? most?) of the Sequence ABC, including membership testing and len(); - but its values are intentionally limited to integers; - slice objects, on the other hand, are an abstraction referring to a context-sensitive sequence of abstract indices; - those indices can be anything you like: py> s = slice("Surprise!", range(1, 100, 3), slice(None)) py> s.start 'Surprise!' py> s.stop range(1, 100, 3) py> s.step slice(None, None, None) - they don't support membership testing, len() or other Sequence operations; - most importantly, because they are context-sensitive, we don't even know how many indexes are included in a slice until we know what we're slicing. That last item is why slice objects have an indices() method that takes a mandatory length parameter. If slices were limited to single integer indices, then there would be an argument that they are redundant and we could use range objects in their place; but they aren't. [...] > Why is it useful? 
I at least find its syntax to be simple, intuitive, and
> > concise -- more so than the range(...) or slice(...) alternatives.

Concise, I will grant, but *intuitive*?

I have never forgotten the first time I saw Python code, and after being
told over and over again how "intuitive" it was I was utterly confused
by these mysterious list[:] and list[1:] and similar expressions. I had
no idea what they were or what they were supposed to do. I didn't even
have a name I could put to them.

At least range(x) was something I could *name* and ask sensible
questions about. I didn't even have a name for this strange
square-bracket and colon syntax, and no context for understanding what
it did. There's surely few things in Python more cryptic than

mylist = mylist[:]

until you've learned what slicing does and how it operates.

Your proposal has the same disadvantages: it is cryptic punctuation that
is meaningless until the reader has learned what it means, without even
an obvious name they can refer to.

Don't get me wrong: slice syntax is great, *once you have learned it*.
But it is a million miles from intuitive. If this proposal is a winner,
it won't be because it will make Python easier for beginners.

[...]
> sum(1:6) # instead of sum(range(1, 6))

That looks like you tried to take a slice of a sequence called "sum" but
messed up the brackets, using round instead of square.

> list(1:6)

Same.

> for i in (1:6):

Looks like a tuple done wrong.

I think this is not an improvement, unless you're trying to minimize the
number of characters in an expression.

> It also makes forming reusable slices clearer and easier:
>
> my_slice = (:6:2) # instead of slice(None, 6, 2)

"Easier" in the sense of "fewer characters to type", but "clearer"? I
don't think so.

[...]
> So here's the part that requires a little more thought.

Are you saying that so far you haven't put any thought into this
proposal?

*wink*

(You don't have to answer this part, it was just my feeble attempt at
humour.)

-- 
Steve

From robertve92 at gmail.com Sun Nov 11 06:48:09 2018
From: robertve92 at gmail.com (Robert Vanden Eynde)
Date: Sun, 11 Nov 2018 12:48:09 +0100
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: 

I'm wondering how your examples would go with from funcoperators import
infix (https://pypi.org/project/funcoperators/)

sum(1:6) # instead of sum(range(1, 6))
>
>
sum(1 /exclusive/ 6)

list(1:6)
>
>
list(1 /exclusive/ 6)
set(1 /exclusive/ 6)

Note that you can pick another name.
Note that you can pick another function :

@infix
def inclusive(a, b):
    return range(a, b+1)

sum(1 /inclusive/ 6)

for i in (1:6):
>
> print(i**2)
>
>
for i in 1 /exclusive/ 6:
    print(i**2)

(i**2 for i in (1:6))
>
>
(i ** 2 for i in 1 /exclusive/ 6)

It also makes forming reusable slices clearer and easier:
>
> my_slice = (:6:2) # instead of slice(None, 6, 2)
> my_list[my_slice]
>
>
I don't have an exact equivalent here, I would create a function or
explicitly say slice(0, 6, 2)

This is similar to passing a range/slice object into the respective
> constructor:
>
>
> [1:6] # list(1:6) or [1, 2, 3, 4, 5]
> {1:6} # set(1:6) or {1, 2, 3, 4, 5}
>
>
As mentioned before {1:6} is a dict.

Here are a few more examples:
>
>
> for i in (:5): # 5 elements 0 to 4, i.e. range(5)
>
> print(i**2)
>
>
Everybody knows i in range(5).

> for i in (1:): # counts up from one for as long as you want, i.e.
> count(1)
>
>
Well, count(1) is nice and people can google it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hemflit at gmail.com Sun Nov 11 07:13:25 2018
From: hemflit at gmail.com (=?UTF-8?Q?Vladimir_Filipovi=C4=87?=)
Date: Sun, 11 Nov 2018 13:13:25 +0100
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: 

On Sun, Nov 11, 2018 at 6:59 AM Nicholas Harrison wrote:

> Any of the values may be omitted and in the slice context the behavior
has no changes from what it already does: start and stop default to the
beginning or end of the list depending on direction and the step defaults
to 1.

Just to point out, with slices it's a bit more complicated than that
currently.

The start, stop and step values each default to None.

When slice-indexing built-in and (all? probably, not sure)
standard-library types, None values for start and stop are interpreted
consistently with what you described as defaults.
A None value for step is interpreted as either 1 or -1, depending on
the comparison of start and stop, and accounting for None values in
either of them too.

------

In real life I've found a use for non-integer slice objects, and been
happy that Python allowed me to treat the slice as a purely syntactic
construct whose semantics (outside builtins) are not fixed.

My case was an interface to an external sparse time-series store, and
it was easy to make the objects indexable with [datetime1 : datetime2
: timedelta], with None's treated right etc.

(The primary intended use was in a REPL in a data-science context, so
if your first thought was a doubt about whether that syntax is neat or
abusive, please compare it to numpy or pandas idioms, not to
collection classes you use in server or application code.)

If this had not been syntactically possible, it would not have been a
great pain to have to work around it, but now it's existing code and I
can imagine other existing projects adapting the slice syntax to their
own needs. At first blush, it seems like your proposal would give
slices enough compulsory semantics to break some of such existing code
- maybe even numpy itself.

(FWIW, I've also occasionally had a need for non-integer ranges, and
chafed at having to implement or install them. I've also missed
hashable slices in real life, because functools.lru_cache.)

------

(Note I'm just a random person commenting on the mailing list, not
anybody with any authority or influence.)

I find this recurring idea of unifying slices and ranges seductive.
But it would take a lot more shaking-out to make sure the range
semantics can be vague-ified enough that they don't break non-integer
slice usage.

Also, I could imagine some disagreements about exactly how much
non-standard slice usage should be protected from breakage. Someone
could make the argument that _some_ objects as slice parameters are
just abuse and no sane person should have used them in the first
place. ("Really, slicing with [int : [[sys], ...] : __import__]? We
need to take care to not break THAT too?")

From apalala at gmail.com Sun Nov 11 11:34:16 2018
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Sun, 11 Nov 2018 16:34:16 +0000
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: 

On Sun, Nov 11, 2018 at 6:00 AM Chris Angelico wrote:

> Be careful of this last one. If you omit the step, it looks like this:
>
> {start:stop}
>
> which is a dictionary display.
>

The parentheses could always be required for this new syntax.
In [*1*]: {'a':1}
Out[*1*]: {'a': 1}

In [*2*]: {('a':1)}
  File "", line 1
    {('a':1)}
         ^
SyntaxError: invalid syntax

-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info Sun Nov 11 16:43:10 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 12 Nov 2018 08:43:10 +1100
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: <20181111214309.GY4071@ando.pearwood.info>

On Sun, Nov 11, 2018 at 04:34:16PM +0000, Juancarlo Añez wrote:
> On Sun, Nov 11, 2018 at 6:00 AM Chris Angelico wrote:
>
> > Be careful of this last one. If you omit the step, it looks like this:
> >
> > {start:stop}
> >
> > which is a dictionary display.
> >
>
> The parentheses could always be required for this new syntax.

Under the proposed syntax {(start:stop)} would be a set with a single
item, a slice object. If slice objects were hashable, that would be
legal.

The OP's proposal is for {start:stop} to be equivalent to

set(range(start, stop))

so they will be very different things.

-- 
Steve

From nicholasharrison222 at gmail.com Mon Nov 12 10:17:32 2018
From: nicholasharrison222 at gmail.com (Nicholas Harrison)
Date: Mon, 12 Nov 2018 08:17:32 -0700
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: 

That's a good point. It might be better to disallow the list and set
versions altogether. To get a list or set you would instead have to
explicitly unpack a range/slice object:

[*(:5)] # [:5] no longer allowed
{*(1:6)} # {1:6} is a dict

That would also solve the misstep of the three-character infinite list.

On Sat, Nov 10, 2018 at 11:00 PM Chris Angelico wrote:

> On Sun, Nov 11, 2018 at 4:59 PM Nicholas Harrison
> wrote:
> > It has a couple of siblings that should be obvious (think list or set
> comprehension):
> >
> > [start:stop:step] # gives a list
> > {start:stop:step} # gives a set
> >
>
> Be careful of this last one. If you omit the step, it looks like this:
>
> {start:stop}
>
> which is a dictionary display.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nicholasharrison222 at gmail.com Mon Nov 12 10:23:42 2018
From: nicholasharrison222 at gmail.com (Nicholas Harrison)
Date: Mon, 12 Nov 2018 08:23:42 -0700
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: <20181111093537.GW4071@ando.pearwood.info>
References: <20181111093537.GW4071@ando.pearwood.info>
Message-ID: 

Overall, I agree with you. It is more intuitive to an experienced Python
user, and not so helpful to beginners. It decreases the ability to read
out code like English sentences and makes it harder to know what to
search for online. So it boosts facility after you know the language,
but not when starting out.

On Sun, Nov 11, 2018 at 2:35 AM Steven D'Aprano wrote:

> On Sat, Nov 10, 2018 at 10:58:02PM -0700, Nicholas Harrison wrote:
>
> [...]
> > (start:stop:step)
> >
> >
> > Meet a range/slice object. Parentheses are required. (Its syntax in this
> > regard follows exactly the same rules as a generator expression.) I say
> > both range and slice because it can be used in either role.
>
> Ranges and slices are conceptually different things.
Even if they have > similar features, they are very different: > > - range is a lazy sequence, which produces its values on demand; > > - it supports (all? most?) of the Sequence ABC, including > membership testing and len(); > > - but its values are intentionally limited to integers; > > - slice objects, on the other hand, are an abstraction referring > to a context-sensitive sequence of abstract indices; > > - those indices can be anything you like: > > py> s = slice("Surprise!", range(1, 100, 3), slice(None)) > py> s.start > 'Surprise!' > py> s.stop > range(1, 100, 3) > py> s.step > slice(None, None, None) > > > - they don't support membership testing, len() or other Sequence > operations; > > - most importantly, because they are context-sensitive, we don't > even know how many indexes are included in a slice until we know > what we're slicing. > > That last item is why slice objects have an indices() method that takes > a mandatory length parameter. > > If slices were limited to single integer indices, then there would be an > argument that they are redundant and we could use range objects in their > place; but they aren't. > > > [...] > > Why is it useful? I at least find its syntax to be simple, intuitive, and > > concise -- more so than the range(...) or slice(...) alternatives. > > Concise, I will grant, but *intuitive*? > > I have never forgot the first time I saw Python code, and after being > told over and over again how "intuitive" it was I was utterly confused > by these mysterious list[:] and list[1:] and similar expressions. I had > no idea what they were or what they were supposed to do. I didn't even > have a name I could put to them. > > At least range(x) was something I could *name* and ask sensible > questions about. I didn't even have a name for this strange > square-bracket and colon syntax, and no context for understanding what > it did. There's surely few things in Python more cryptic than > > mylist = mylist[:] > > until you've learned what slicing does and how it operates. > > Your proposal has the same disadvantages: it is cryptic punctuation that > is meaningless until the reader has learned what it means, without even > an obvious name they can refer to. > > Don't get me wrong: slice syntax is great, *once you have learned it*. > But it is a million miles from intuitive. If this proposal is a winner, > it won't be because it will make Python easier for beginners. > > > > [...] > > sum(1:6) # instead of sum(range(1, 6)) > > That looks like you tried to take a slice of a sequence called "sum" but > messed up the brackets, using round instead of square. > > > > list(1:6) > > Same. > > > for i in (1:6): > > Looks like a tuple done wrong. > > > I think this is not an improvement, unless you're trying to minimize the > number of characters in an expression. > > > > It also makes forming reusable slices clearer and easier: > > > > my_slice = (:6:2) # instead of slice(None, 6, 2) > > "Easier" in the sense of "fewer characters to type", but "clearer"? I > don't think so. > > > [...] > > So here's the part that requires a little more thought. > > Are you saying that so far you haven't put any thought into this > proposal? > > *wink* > > (You don't have to answer this part, it was just my feeble attempt at > humour.) 
> > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicholasharrison222 at gmail.com Mon Nov 12 10:25:22 2018 From: nicholasharrison222 at gmail.com (Nicholas Harrison) Date: Mon, 12 Nov 2018 08:25:22 -0700 Subject: [Python-ideas] Range and slice syntax In-Reply-To: References: Message-ID: Interesting. I haven't looked at that package before. It looks like it would work well for that. On Sun, Nov 11, 2018 at 4:48 AM Robert Vanden Eynde wrote: > I'm wondering how your examples would go with from funcoperators import > infix (https://pypi.org/project/funcoperators/) > > sum(1:6) # instead of sum(range(1, 6)) >> >> > sum(1 /exclusive/ 6) > > list(1:6) >> >> > list(1 /exclusive/ 6) > set(1 /exclusive/ 1) > > Note that you can pick another name. > Note that you can pick another function : > > @infix > def inclusive (a, b): > return range(a, b+1) > > sum(1 /inclusive/ 6) > > for i in (1:6): >> >> print(i**2) >> >> > for i in 1 /exclusive/ 6: > print(i**2) > > (i**2 for i in (1:6)) >> >> > (i ** 2 for i in 1 /exclusive/ 6) > > It also makes forming reusable slices clearer and easier: >> >> my_slice = (:6:2) # instead of slice(None, 6, 2) >> my_list[my_slice] >> >> > I don't have exact equivalent here, I would create a function or > explicitly say slice(0, 6, 2) > > This is similar to passing a range/slice object into the respective >> constructor: >> >> >> [1:6] # list(1:6) or [1, 2, 3, 4, 5] >> {1:6} # set(1:6) or {1, 2, 3, 4, 5} >> >> > As mentioned before {1:6} is a dict. > > Here are a few more examples: >> >> >> for i in (:5): # 5 elements 0 to 4, i.e. range(5) >> >> print(i**2) >> >> > Everybody knows i in range(5). > > >> for i in (1:): # counts up from one for as long as you want, i.e. >> count(1) >> >> > Well, count(1) is nice and people can google it. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicholasharrison222 at gmail.com Mon Nov 12 10:42:33 2018 From: nicholasharrison222 at gmail.com (Nicholas Harrison) Date: Mon, 12 Nov 2018 08:42:33 -0700 Subject: [Python-ideas] Range and slice syntax In-Reply-To: References: Message-ID: That's true. I should clarify what I was thinking a bit more. Maybe it's better to say that the new syntax creates a slice object: (::) # this creates slice(None, None, None) It accepts any object into its arguments and they default to None when they are left off. This can be passed into list indexing and used as a slice. The new addition is that slice is now iterable: iter(slice(None, None, None)) # becomes valid Only when this is called (implicitly or explicitly) do checks for valid objects and bounds occur. From my experience using slices, this is how they work in that context too. my_slice = slice('what?') # slice(None, 'what?', None) my_list[my_slice] # TypeError: slice indices must be integers or None or have an __index__ method # similarly iter(my_slice) # TypeError: slice indices must be integers or None or have an __index__ method I still may not understand slices well enough though. On Sun, Nov 11, 2018 at 5:13 AM Vladimir Filipovi? 
wrote: > On Sun, Nov 11, 2018 at 6:59 AM Nicholas Harrison > wrote: > > > Any of the values may be omitted and in the slice context the behavior > has no changes from what it already does: start and stop default to the > beginning or end of the list depending on direction and the step defaults > to 1. > > Just to point out, with slices it's a bit more complicated than that > currently. > > The start, stop and step values each default to None. > > When slice-indexing built-in and (all? probably, not sure) > standard-library types, None values for start and stop are interpreted > consistently with what you described as defaults. > A None value for step is interpreted as either 1 or -1, depending on > the comparison of start and step, and accounting for None values in > either of them too. > > ------ > > In real life I've found a use for non-integer slice objects, and been > happy that Python allowed me to treat the slice as a purely syntactic > construct whose semantics (outside builtins) are not fixed. > > My case was an interface to an external sparse time-series store, and > it was easy to make the objects indexable with [datetime1 : datetime2 > : timedelta], with None's treated right etc. > > (The primary intended use was in a REPL in a data-science context, so > if your first thought was a doubt about whether that syntax is neat or > abusive, please compare it to numpy or pandas idioms, not to > collection classes you use in server or application code.) > > If this had not been syntactically possible, it would not have been a > great pain to have to work around it, but now it's existing code and I > can imagine other existing projects adapting the slice syntax to their > own needs. At first blush, it seems like your proposal would give > slices enough compulsory semantics to break some of such existing code > - maybe even numpy itself. > > (FWIW, I've also occasionally had a need for non-integer ranges, and > chafed at having to implement or install them. I've also missed > hashable slices in real life, because functools.lru_cache.) > > ------ > > (Note I'm just a random person commenting on the mailing list, not > anybody with any authority or influence.) > > I find this recurring idea of unifying slices and ranges seductive. > But it would take a lot more shaking-out to make sure the range > semantics can be vague-ified enough that they don't break non-integer > slice usage. > > Also, I could imagine some disagreements about exactly how much > non-standard slice usage should be protected from breakage. Someone > could make the argument that _some_ objects as slice parameters are > just abuse and no sane person should have used them in the first > place. ("Really, slicing with [int : [[sys], ...] : __import__]? We > need to take care to not break THAT too?") > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Nov 12 11:23:21 2018 From: mertz at gnosis.cx (David Mertz) Date: Mon, 12 Nov 2018 11:23:21 -0500 Subject: [Python-ideas] Range and slice syntax In-Reply-To: References: Message-ID: I mostly like the abstraction being proposed, but the syntactical edge cases like `[::3]` (infinite list crashes) and {4:10} (a dict not a slice/range set) tip the balance against it for me. 
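To spell that collision out in a REPL (today's semantics):

>>> {4:10}           # already legal: a one-item dict, key 4, value 10
{4: 10}
>>> type({4:10})
<class 'dict'>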
Adding various stars and parens in various not-really-obvious places
just makes the proposal way too much special casing.

Moreover, we can get what we want without new syntax. Or at least the
bulk of it. Both Pandas and NumPy offer special accessors for slices
that look the way you'd like:

>>> import pandas as pd
>>> import numpy as np
>>> I = pd.IndexSlice
>>> J = np.s_
>>> I[4:10:3]
slice(4, 10, 3)
>>> J[4:10:3]
slice(4, 10, 3)

These are incredibly simple classes, but they are worth including
because many programmers will forget how to write their own.

You don't get your range-like behavior with those, but it's easy to
construct. I'm having a think-o. I think it should be possible to make a
RangeSlice class that will act like an enhanced version of pd.IndexSlice,
but my try was wrong. But simpler:

>>> import sys
>>> def R(sl):
...     return range(sl.start, sl.stop, sl.step or sys.maxsize)
...
>>> for i in R(I[4:10:3]):
...     print(i)
...
4
7

Someone should figure out how to make that simply `RS[4:10:3]` that will
act both ways. :-)

On Mon, Nov 12, 2018 at 10:44 AM Nicholas Harrison <
nicholasharrison222 at gmail.com> wrote:

> That's true. I should clarify what I was thinking a bit more. Maybe it's
> better to say that the new syntax creates a slice object:
>
> (::) # this creates slice(None, None, None)
>
> It accepts any object into its arguments and they default to None when
> they are left off. This can be passed into list indexing and used as a
> slice. The new addition is that slice is now iterable:
>
> iter(slice(None, None, None)) # becomes valid
>
> Only when this is called (implicitly or explicitly) do checks for valid
> objects and bounds occur. From my experience using slices, this is how they
> work in that context too.
>
> my_slice = slice('what?') # slice(None, 'what?', None)
>
> my_list[my_slice] # TypeError: slice indices must be integers or None or
> have an __index__ method
>
> # similarly
>
> iter(my_slice) # TypeError: slice indices must be integers or None or have
> an __index__ method
>
>
> I still may not understand slices well enough though.
>
> On Sun, Nov 11, 2018 at 5:13 AM Vladimir Filipović
> wrote:
>
>> On Sun, Nov 11, 2018 at 6:59 AM Nicholas Harrison
>> wrote:
>>
>> > Any of the values may be omitted and in the slice context the behavior
>> has no changes from what it already does: start and stop default to the
>> beginning or end of the list depending on direction and the step defaults
>> to 1.
>>
>> Just to point out, with slices it's a bit more complicated than that
>> currently.
>>
>> The start, stop and step values each default to None.
>>
>> When slice-indexing built-in and (all? probably, not sure)
>> standard-library types, None values for start and stop are interpreted
>> consistently with what you described as defaults.
>> A None value for step is interpreted as either 1 or -1, depending on
>> the comparison of start and stop, and accounting for None values in
>> either of them too.
>>
>> ------
>>
>> In real life I've found a use for non-integer slice objects, and been
>> happy that Python allowed me to treat the slice as a purely syntactic
>> construct whose semantics (outside builtins) are not fixed.
>>
>> My case was an interface to an external sparse time-series store, and
>> it was easy to make the objects indexable with [datetime1 : datetime2
>> : timedelta], with None's treated right etc.
>> >> (The primary intended use was in a REPL in a data-science context, so >> if your first thought was a doubt about whether that syntax is neat or >> abusive, please compare it to numpy or pandas idioms, not to >> collection classes you use in server or application code.) >> >> If this had not been syntactically possible, it would not have been a >> great pain to have to work around it, but now it's existing code and I >> can imagine other existing projects adapting the slice syntax to their >> own needs. At first blush, it seems like your proposal would give >> slices enough compulsory semantics to break some of such existing code >> - maybe even numpy itself. >> >> (FWIW, I've also occasionally had a need for non-integer ranges, and >> chafed at having to implement or install them. I've also missed >> hashable slices in real life, because functools.lru_cache.) >> >> ------ >> >> (Note I'm just a random person commenting on the mailing list, not >> anybody with any authority or influence.) >> >> I find this recurring idea of unifying slices and ranges seductive. >> But it would take a lot more shaking-out to make sure the range >> semantics can be vague-ified enough that they don't break non-integer >> slice usage. >> >> Also, I could imagine some disagreements about exactly how much >> non-standard slice usage should be protected from breakage. Someone >> could make the argument that _some_ objects as slice parameters are >> just abuse and no sane person should have used them in the first >> place. ("Really, slicing with [int : [[sys], ...] : __import__]? We >> need to take care to not break THAT too?") >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicholasharrison222 at gmail.com Mon Nov 12 11:31:45 2018 From: nicholasharrison222 at gmail.com (Nicholas Harrison) Date: Mon, 12 Nov 2018 09:31:45 -0700 Subject: [Python-ideas] Range and slice syntax In-Reply-To: References: Message-ID: For sake of completeness, here is another possible problem I've found with it. I was afraid of making something context-dependent, and therefore breaking its consistency. Here is the use of slices in the current language that breaks my rules: my_array[:,2] # valid syntax, though I've typically only seen it used in numpy This is interpreted as a tuple of a slice object and an integer. But with the new syntax, the slice has to be surrounded by parentheses if not the sole element in brackets: my_array[(:),2] This is a sticking point and would destroy backwards compatibility. I realize that the context-dependence is due to the behavior of the current slice syntax. For example, my_array[(:,2)] is invalid even though it looks like a tuple of a slice object and an integer. 
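You can poke at the current behavior with a little probe class (runnable
today, no new syntax involved):

class Probe:
    def __getitem__(self, key):
        # echo back whatever object the subscript grammar built for us
        return key

p = Probe()
print(p[:, 2])   # (slice(None, None, None), 2)
print(p[1:6:2])  # slice(1, 6, 2)
# p[(:, 2)] is a SyntaxError: colon-slice notation only exists inside []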
So I'm actually OK with parentheses not being mandatory in a slicing role, since that is already correct syntax in the current language. This might be a reconciliation. That would mean parentheses are not required when it is the sole argument to a function or when it is used inside indexing. my_array[:,2] # fine my_indexing = (:), 2 # parentheses required my_array[((:), 2)] # same here, but no need to do this This actually makes building up index expressions easier and gives a possible solution to the recent slice literals discussion. Maybe this should be the only context when parentheses aren't required, so you know you're dealing with an object inside of a function call. sum((1:6)) Anyways, just some more thoughts. On Sat, Nov 10, 2018 at 10:58 PM Nicholas Harrison < nicholasharrison222 at gmail.com> wrote: > I'm aware that syntax for ranges and slices has been discussed a good > amount over the years, but I wanted to float an idea out there to see if it > hasn't been considered before. It's not really original. Rather, it's a > combination of a couple parts of Python, and I find it > fascinatingly-consistent with the rest of the language. This will look > similar to PEP 204, but there are some important differences and > clarifications. > > (start:stop:step) > > > Meet a range/slice object. Parentheses are required. (Its syntax in this > regard follows exactly the same rules as a generator expression.) I say > both range and slice because it can be used in either role. On the one > hand, it is iterable and functions exactly like range(start, stop, step) in > those contexts. On the other, it can also be passed into list indexing > exactly like slice(start, stop, step). This is a proposal that range and > slice are really the same thing, just in different contexts. > > Why is it useful? I at least find its syntax to be simple, intuitive, and > concise -- more so than the range(...) or slice(...) alternatives. It's > quite obvious for an experienced Python user and just as simple to pick up > as slice notation for a beginner (since it *is* slice notation). > > It condenses and clears up sometimes-cumbersome range expressions. A > couple examples: > > > sum(1:6) # instead of sum(range(1, 6)) > > list(1:6) > > > for i in (1:6): > > print(i**2) > > > (i**2 for i in (1:6)) > > > It also makes forming reusable slices clearer and easier: > > my_slice = (:6:2) # instead of slice(None, 6, 2) > my_list[my_slice] > > > It has a couple of siblings that should be obvious (think list or set > comprehension): > > [start:stop:step] # gives a list > {start:stop:step} # gives a set > > > This is similar to passing a range/slice object into the respective > constructor: > > > [1:6] # list(1:6) or [1, 2, 3, 4, 5] > {1:6} # set(1:6) or {1, 2, 3, 4, 5} > > > Note that the parentheses aren't needed when it is the only argument of a > function call or is the only element within brackets or braces. It takes on > its respective roles for these bracket and brace cases, just like > comprehensions. This also gives rise to the normal slice syntax: > > my_list[1:6:2] # What is inside the brackets is a slice object. > my_list[(1:6:2)] # Equivalent. The parentheses are valid but unnecessary. > > > So here's the part that requires a little more thought. Any of the values > may be omitted and in the slice context the behavior has no changes from > what it already does: start and stop default to the beginning or end of the > list depending on direction and the step defaults to 1. 
In the range > context, we simply repeat these semantics, but noting that there is no > longer a beginning or end of a list. > > Step defaults to 1 (just like range or slice). > Start defaults to 0 when counting up and -1 when counting down (just like > slice). > If stop is omitted, the object will act like an itertools.count object, > counting indefinitely. > > I have found infinite iteration to be a natural and oft-desired extension > to a range object, but I can understand that some may want it to remain > separate and pure within itertools. I also admit that the ability to form > an infinite list with only three characters can be a scary thought (though > we are all adults here, right? ;). Normally you have to take a couple extra > keystrokes: > > from itertools import count > list(count()) > # rather than just [:] > > > If that is the case, then raising an error when iter() is called on a > range/slice object with no stop value could be another acceptable course of > action. The syntax will still be left valid. > > And that's mainly it. Slice is iterable or range is "indexable" and the > syntax can be used anywhere successive values are desired. If you want to > know what it does or how to use it in some case, just think, "what would a > slice object do?" or "what would a range object do?" or "how would I write > a generator expression/list comprehension here?". > > Here are a few more examples: > > > for i in (:5): # 5 elements 0 to 4, i.e. range(5) > > print(i**2) > > > for i in (1:): # counts up from one for as long as you want, i.e. count(1) > > print(i**2) > > if i == 5: break > > it = iter(:) # a convenient usage for an infinite counter > > next(it) > > > ' '.join(map(str, (:5:2))) # gives '0 2 4' > > [(:5), (5:10)] # list of range/slice objects > [[:5], [5:10]] # list of lists > [*(:5), *(5:10)] # uses unpacking to get flat list > [*[:5], *[5:10]] # same unpacking to get flat list > > > Otherwise you'd have to do: > > [list(range(5)), list(range(5, 10))] # list of lists > [*range(5), *range(5, 10)] # flat list > > > Tuples: > > tuple(1:6:2) # (1, 3, 5) > *(1:6:2), # same > > > I don't actually have experience developing the interpreter and underlying > workings of Python, so I don't know how much of a change this requires. I > thought it might be possible since the constructs already exist in the > language. They just haven't been unified yet. I also realize that there are > a few other use-cases that need to be ironed out. The syntax might also be > too minimal in some cases to be obvious. One of the trickiest things may be > what it will be called, since the current language has the two different > terms. > > In the end it's just another range/slice idea, and the idea has probably > already been proposed sometime in the past few decades, but what thoughts > are there? > > - Nicholas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike at selik.org Mon Nov 12 14:36:59 2018 From: mike at selik.org (Michael Selik) Date: Mon, 12 Nov 2018 11:36:59 -0800 Subject: [Python-ideas] Proposing additions to the standard library In-Reply-To: References: <20181111021400.GU4071@ando.pearwood.info> Message-ID: On Sat, Nov 10, 2018 at 6:56 PM Jonathan Crall wrote: > Sometimes there's a good, useful function than doesn't get added because >> there's no reasonable place to put it. 
For example, a "flatten" function >> has been talked about since Python 1.x days, and we still don't have a >> standard solution for it, because (1) it isn't clear *precisely* what it >> should do, and (2) it isn't clear where it should go. > > > The flatten example is good to know about. Is there a link to this > discussion or a summary of it? I would think flatten could go in itertools, > but clearly there must some reason why its not there. I imagine the > duplication with it.chain.from_iter + "There should be one-- and preferably > only one --obvious way to do it."? > https://docs.python.org/3/library/itertools.html#itertools-recipes There's an example of ``flatten`` in the itertools recipes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Mon Nov 12 14:50:18 2018 From: prometheus235 at gmail.com (Nick Timkovich) Date: Mon, 12 Nov 2018 19:50:18 +0000 Subject: [Python-ideas] Proposing additions to the standard library In-Reply-To: References: <20181111021400.GU4071@ando.pearwood.info> Message-ID: Not to derail the conversation, but I've always been curious why the itertools recipes are recipes and not ready-made goods (pre-baked?) that I can just consume. They're great examples to draw from, but that shouldn't preclude them from also being in the stdlib. On Mon, Nov 12, 2018 at 7:41 PM Michael Selik wrote: > > > On Sat, Nov 10, 2018 at 6:56 PM Jonathan Crall wrote: > >> Sometimes there's a good, useful function than doesn't get added because >>> there's no reasonable place to put it. For example, a "flatten" function >>> has been talked about since Python 1.x days, and we still don't have a >>> standard solution for it, because (1) it isn't clear *precisely* what it >>> should do, and (2) it isn't clear where it should go. >> >> >> The flatten example is good to know about. Is there a link to this >> discussion or a summary of it? I would think flatten could go in itertools, >> but clearly there must some reason why its not there. I imagine the >> duplication with it.chain.from_iter + "There should be one-- and preferably >> only one --obvious way to do it."? >> > > https://docs.python.org/3/library/itertools.html#itertools-recipes > There's an example of ``flatten`` in the itertools recipes. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericfahlgren at gmail.com Mon Nov 12 15:23:03 2018 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Mon, 12 Nov 2018 12:23:03 -0800 Subject: [Python-ideas] Proposing additions to the standard library In-Reply-To: References: <20181111021400.GU4071@ando.pearwood.info> Message-ID: My intuition has always been that the recipes, taking 'flatten' as an excellent example, solve problems in a specific way that is not generally considered to be the "right" way. For example, should 'flatten' perform one-level flattening or deep recursive flattening? Should it handle strings as single entities, or should it treat them as iterables? What about byte strings, should they be treated differently than strings or the same? I could go on, but you probably get the point... 
On Mon, Nov 12, 2018 at 11:50 AM Nick Timkovich wrote:

> Not to derail the conversation, but I've always been curious why the
> itertools recipes are recipes and not ready-made goods (pre-baked?) that
> I can just consume. They're great examples to draw from, but that
> shouldn't preclude them from also being in the stdlib.
>
> On Mon, Nov 12, 2018 at 7:41 PM Michael Selik wrote:
>
>> On Sat, Nov 10, 2018 at 6:56 PM Jonathan Crall wrote:
>>
>>>> Sometimes there's a good, useful function that doesn't get added
>>>> because there's no reasonable place to put it. For example, a
>>>> "flatten" function has been talked about since Python 1.x days, and
>>>> we still don't have a standard solution for it, because (1) it isn't
>>>> clear *precisely* what it should do, and (2) it isn't clear where it
>>>> should go.
>>>
>>> The flatten example is good to know about. Is there a link to this
>>> discussion or a summary of it? I would think flatten could go in
>>> itertools, but clearly there must be some reason why it's not there. I
>>> imagine the duplication with itertools.chain.from_iterable + "There
>>> should be one-- and preferably only one --obvious way to do it."?
>>
>> https://docs.python.org/3/library/itertools.html#itertools-recipes
>> There's an example of ``flatten`` in the itertools recipes.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Mon Nov 12 16:10:06 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 13 Nov 2018 08:10:06 +1100
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: <20181112211006.GD4071@ando.pearwood.info>

On Mon, Nov 12, 2018 at 11:23:21AM -0500, David Mertz wrote:

> >>> import pandas as pd
> >>> import numpy as np
> >>> I = pd.IndexSlice
> >>> J = np.s_
> >>> I[4:10:3]
> slice(4, 10, 3)

I'm not entirely sure that I like the idea of conflating slice
*constructor* with slice *usage*. Slice objects are objects, like any
other, and I'm not convinced that overloading slice syntax to create
slice objects is a good design. I'm pretty sure it would have confused
the hell out of me as a beginner to learn that mylist[1::2] took a slice
and [1::2] made a slice.

But at least numpy and pandas have the virtue of needing a prefix to
make it work.

> You don't get your range-like behavior with those, but it's easy to
> construct. I'm having a think-o. I think it should be possible to make a
> RangeSlice class that will act like an enhanced version of pd.IndexSlice,
> but my try was wrong.

Just because we can do something, doesn't mean we should.

-- 
Steve

From sf at fermigier.com  Tue Nov 13 02:04:54 2018
From: sf at fermigier.com (Stéfane Fermigier)
Date: Tue, 13 Nov 2018 08:04:54 +0100
Subject: [Python-ideas] Proposing additions to the standard library
In-Reply-To: 
References: 
Message-ID: 

Are you aware of https://boltons.readthedocs.io/ (whose motto is
"Functionality that should be in the standard library.") ?
Or similar endeavours such as:

- https://pypi.org/project/auxlib/
- https://pypi.org/project/omakase/
- (And probably many others on PyPI with similar descriptions such as "a
library of stuff I'm using / we're using at company X for all my / our
project(s)...")

Or the functional libraries listed here:
https://github.com/sfermigier/awesome-functional-python/blob/master/README.md#libraries

=> IMHO there is room for a "semi-standard" library, stuff that's not
included by default and has a release lifecycle more active than Python
itself, but that can be considered the standard by a large group of users.

Similar ideas can be found for instance in Java with Apache Commons (
https://commons.apache.org/ -> "an Apache project focused on all aspects
of reusable Java components."). One could argue, though, that the Java
standard library is much less developed than the Python standard library,
so it's much easier to justify the existence of Apache Commons than a
similar Python project.

There is also the question of the porosity between such a project and the
stdlib, which is the essence of the original question by the OP.

Another interesting issue is the granularity of such a project. I
sometimes, and somewhat foolishly, make libraries such as toolz or boltons
a dependency of my projects, for just one or two function calls from my
code.

Regards,

S.

On Sun, Nov 11, 2018 at 2:37 AM Jonathan Crall wrote:

> I'm interested in proposing several additions to the Python standard
> library, and I would like more information on the procedure for doing so.
> Are all additions done via a PEP? If not, what is the procedure? If so,
> I've read that the first step was to email this board and get feedback.
>
> I have a library called `ubelt` that contains several tools that I think
> might be worthy of adding to the standard library.
>
> Here's my bullet point pitch:
>
> - Python is batteries included. Ubelt contains extra batteries: its
> functions are the extra batteries.
> - Most functions in ubelt are fast. All 222 tests take 7.33 seconds.
> - Ubelt has 100% test coverage (sans `# nocover` locations).
> - I'm only championing a subset of the functions in ubelt. There are
> certainly functions in there that do not belong in the standard library.
> - I have a Jupyter notebook that gives a demo of some select functions
> (not necessarily the same as the ones proposed here):
> https://github.com/Erotemic/ubelt/blob/master/docs/notebooks/Ubelt%20Demo.ipynb
> - I do have documentation (mostly in docstrings) and in the docs folder,
> but I've been having trouble auto-updating read-the-docs. Here is the
> link anyway: https://ubelt.readthedocs.io/en/latest/
>
> Here is a tentative list of interesting functions. Hopefully the names
> are descriptive (if not, see docstrings: https://github.com/Erotemic/ubelt)
>
> ub.cmd
> ub.compressuser
> ub.group_items
> ub.dict_hist
> ub.find_duplicates
> ub.AutoDict
> ub.import_module_from_path
> ub.import_module_from_name
> ub.modname_to_modpath,
> ub.modpath_to_modname
> ub.ProgIter
> ub.ensuredir
> ub.expandpath
>
> almost everything in util_list:
>
> allsame, argmax, argmin, argsort, argunique,
> chunks, flatten, iter_window, take, unique
>
> These functions might be worth modifying into dictionary methods:
>
> ub.dict_subset
> ub.dict_take
> ub.map_vals
> ub.map_keys
> ub.Timerit
> ub.Timer
>
> Because I built the library, I tend to like all the functions.
> It's difficult to decide if they are stdlib worthy, so there might be
> some false positives / negatives.
>
> I'm on the fence about:
> CacheStamp, Cacher, NoParam, argflag, argval, dzip, delete, hash_data,
> hash_file, memoize, memoize_method, NiceRepr, augpath, userhome,
> ensure_app_cache_dir, ensure_app_resource_dir, find_exe, find_path,
> get_app_cache_dir, get_app_resource_dir, platform_cache_dir,
> platform_resource_dir, CaptureStdout, codeblock, ensure_unicode, hzcat,
> indent, OrderedSet
>
> It's my hope that some of these are actually useful. Let me know any of
> the following: what you think, if there are any questions, if something
> else needs to be done, or what the next steps are.
>
> --
> -Jon
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 
Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier
- http://linkedin.com/in/sfermigier
Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/
Chairman, Free&OSS Group @ Systematic Cluster -
https://systematic-paris-region.org/fr/groupe-thematique-logiciel-libre/
Co-Chairman, National Council for Free & Open Source Software (CNLL) -
http://cnll.fr/
Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ &
http://pydata.fr/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hemflit at gmail.com  Tue Nov 13 12:49:03 2018
From: hemflit at gmail.com (Vladimir Filipović)
Date: Tue, 13 Nov 2018 18:49:03 +0100
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: 

On Mon, Nov 12, 2018 at 4:43 PM Nicholas Harrison wrote:

> Only when this is called (implicitly or explicitly) do checks for valid
> objects and bounds occur. From my experience using slices, this is how
> they work in that context too.

On reconsideration, I've found one more argument in favour of (at least
this aspect of?) the proposal: the slice.indices method, which takes a
sequence's length and returns an iterable (range) of all indices of such
a sequence that would be "selected" by the slice. Not sure if it's
supposed to be documented.

So there is definitely precedent for "though slices in general are
primarily a syntactic construct and new container-like classes can choose
any semantics for indexing with them, the semantics specifically in the
context of sequences have a bit of a privileged place in the language
with concrete expectations, including strictly integer (or None)
attributes".

From allemang.d at gmail.com  Tue Nov 13 13:13:12 2018
From: allemang.d at gmail.com (David Allemang)
Date: Tue, 13 Nov 2018 13:13:12 -0500
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: 

That is not what slice.indices does. Per help(slice.indices) -

"S.indices(len) -> (start, stop, stride)

"Assuming a sequence of length len, calculate the start and stop indices,
and the stride length of the extended slice described by S. Out of bounds
indices are clipped in a manner consistent with handling of normal slices.

Essentially, it returns (S.start, len, S.step), with start and stop
adjusted to prevent out-of-bounds indices.

On Tue, Nov 13, 2018, 12:50 PM Vladimir Filipović wrote:

> On Mon, Nov 12, 2018 at 4:43 PM Nicholas Harrison wrote:
> > Only when this is called (implicitly or explicitly) do checks for
> > valid objects and bounds occur.
> > From my experience using slices, this is how they work in that
> > context too.
>
> On reconsideration, I've found one more argument in favour of (at
> least this aspect of?) the proposal: the slice.indices method, which
> takes a sequence's length and returns an iterable (range) of all
> indices of such a sequence that would be "selected" by the slice. Not
> sure if it's supposed to be documented.
>
> So there is definitely precedent for "though slices in general are
> primarily a syntactic construct and new container-like classes can
> choose any semantics for indexing with them, the semantics
> specifically in the context of sequences have a bit of a privileged
> place in the language with concrete expectations, including strictly
> integer (or None) attributes".
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Tue Nov 13 13:18:54 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 14 Nov 2018 05:18:54 +1100
Subject: [Python-ideas] Range and slice syntax
In-Reply-To: 
References: 
Message-ID: 

On Wed, Nov 14, 2018 at 5:14 AM David Allemang wrote:
>
> That is not what slice.indices does. Per help(slice.indices) -
>
> "S.indices(len) -> (start, stop, stride)
>
> "Assuming a sequence of length len, calculate the start and stop
> indices, and the stride length of the extended slice described by S.
> Out of bounds indices are clipped in a manner consistent with handling
> of normal slices.
>
> Essentially, it returns (S.start, len, S.step), with start and stop
> adjusted to prevent out-of-bounds indices.

And to handle negative indexing.

>>> slice(1,-1).indices(100)
(1, 99, 1)

A range from 1 to -1 doesn't make sense (or rather, it's an empty range),
but a slice from 1 to -1 will exclude the first and last of any sequence.

ChrisA

From chris.barker at noaa.gov  Tue Nov 13 19:45:14 2018
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Tue, 13 Nov 2018 16:45:14 -0800
Subject: [Python-ideas] Relative Imports
In-Reply-To: 
References: <20181109231635.GL4071@ando.pearwood.info>
	<20181109233915.GM4071@ando.pearwood.info>
	<20181110001043.GO4071@ando.pearwood.info>
Message-ID: 

> This is somewhat unpleasant to me, especially while developing something
> and trying to test it quickly. I just want to be able to use the same
> relative imports and run a single file with `python3 test_main.py` for
> example.

I had the same frustration when I first tried to use relative imports.

Then I discovered setuptools' develop mode (now pip editable install).

It is the right way to run code in packages under development.

-CHB

> Running files as modules every time is tiring. This is my problem. I
> could not come up with a concrete solution idea yet; I am thinking on
> it. Open to suggestions. Thank you all for your help!

On Fri, Nov 9, 2018 at 4:16 PM Steven D'Aprano wrote:
> On Fri, Nov 09, 2018 at 03:51:46PM -0800, danish bluecheese wrote:
> > └── src
> >     ├── __init__.py
> >     ├── main.py
> >     └── test
> >         ├── __init__.py
> >         └── test_main.py
> >
> > assume the structure above. To be able to use relative imports with
> > such a fundamental structure, either I can go for sys.path hacks or
> > run it as a module from one level further up.
>
> I don't understand.
> From the top level of the package, running inside either __init__ or
> main, you should be able to say:
>
> from . import test
> from .test import test_main
>
> From the test subpackage, you should be able to say:
>
> from .. import main
>
> to get the src/main module, or
>
> from . import test_main
>
> to get the test/test_main module from the test/__init__ module.
>
> (Disclaimer: I have not actually run the above code to check that it
> works, beyond testing that it's not a SyntaxError.)
>
> What *precisely* is the problem you are trying to solve, and your
> proposed solution?
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com  Tue Nov 13 20:21:44 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 13 Nov 2018 17:21:44 -0800
Subject: [Python-ideas] Relative Imports
In-Reply-To: 
References: <20181109231635.GL4071@ando.pearwood.info>
	<20181109233915.GM4071@ando.pearwood.info>
	<20181110001043.GO4071@ando.pearwood.info>
Message-ID: 

On Fri, Nov 9, 2018 at 4:32 PM, danish bluecheese wrote:
> you are right on the lines you mentioned. Those are all working if I run
> it as a module, which I do every time.
> This is somewhat unpleasant to me, especially while developing something
> and trying to test it quickly.
> I just want to be able to use the same relative imports and run a single
> file with `python3 test_main.py` for example.
> Running files as modules every time is tiring. This is my problem.

Have you tried 'python3 -m test_main'? IIRC it should be effectively the
same as 'python3 test_main.py' but with working relative imports.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From ubershmekel at gmail.com  Wed Nov 14 00:34:33 2018
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Tue, 13 Nov 2018 21:34:33 -0800
Subject: [Python-ideas] Relative Imports
In-Reply-To: 
References: <20181109231635.GL4071@ando.pearwood.info>
	<20181109233915.GM4071@ando.pearwood.info>
	<20181110001043.GO4071@ando.pearwood.info>
Message-ID: 

On Tue, Nov 13, 2018 at 4:46 PM Chris Barker - NOAA Federal via
Python-ideas wrote:

> Then I discovered setuptools' develop mode (now pip editable install)
>
> It is the right way to run code in packages under development.
>
In multiple workplaces I found a folder with Python utility scripts that
users can just double-click. The need for installing causes problems with
handling different versions on one machine, and the need for
"__init__.py" files makes the folders less pretty. Sure - sometimes I
need to install stuff anyway - but that's just one "install.py" double
click away.

I would like to propose allowing importing of strings that would support
relative paths. For example in Danish's example:

# use this in `test_main.py`
import '../main.py' as main

Maybe the syntax can be improved, but to me this need has been aching
since I started using Python 12 years ago. I've used C, C++, and
Javascript, where the whole "how do I connect these two files that are a
folder apart" problem doesn't require googling for documentation on
packaging tools, magic filenames, constraints and gotchas.
The solution is always obvious because it works just like it works in every system - with a file-relative path. File-relative imports is probably highest on my Python wish list. I've drafted but not sent out a python-ideas email about it multiple times. I've seen a lot of "sys.path" hacking that would've been solved by file-relative-paths. Cheers and thanks, Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Nov 14 01:15:02 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 14 Nov 2018 17:15:02 +1100 Subject: [Python-ideas] Relative Imports In-Reply-To: References: <20181109231635.GL4071@ando.pearwood.info> <20181109233915.GM4071@ando.pearwood.info> <20181110001043.GO4071@ando.pearwood.info> Message-ID: <20181114061502.GG4071@ando.pearwood.info> On Tue, Nov 13, 2018 at 09:34:33PM -0800, Yuval Greenfield wrote: > I would like to propose allowing importing of strings that would support > relative paths. For example in Danish's example: > > # use this in `test_main.py` > import '../main.py' as main How does that differ from existing syntax? from .. import main Off the top of my head, a few more questions that don't have obvious answers (at least not to me): What happens if main.py doesn't exist, but main.pyc does? What if you want to import from a sub-package, rather than a single-file module? What happens when Windows users use a backslash instead of a forward-slash? Does this syntax support arbitrary relative paths anywhere on the file system, or is it limited to only searching the current package? How does it interact with namespace packages? What happens if you call os.chdir() before calling this? Invariably people will want to write things like: path = '../spam.py' import path as spam (I know that's something I'd try.) What will happen there? If that is supported, invariably people will want to use pathlib.Path objects. Should that work? > Maybe the syntax can be improved, but to me this need has been aching since > I started using Python 12 years ago. I've used C, C++, and Javascript where > the whole "how do I connect these two files that are a folder apart" > problem doesn't require googling for documentation on packaging tools, > magic filenames, constraints and gotchas. The solution is always obvious > because it works just like it works in every system - with a file-relative > path. Beware of "obvious" solutions, because so often they lead to not so obvious problems. Like Javascript's "relative import hell": Quote: // what we want import reducer from 'reducer'; // what we don't want import reducer from '../../../reducer'; https://medium.com/@sherryhsu/how-to-change-relative-paths-to-absolute-paths-for-imports-32ba6cce18a5 And more here: https://goenning.net/2017/07/21/how-to-avoid-relative-path-hell-javascript-typescript-projects/ https://lostechies.com/derickbailey/2014/02/20/how-i-work-around-the-require-problem-in-nodejs/ It seems to me that in languages which support this file-relative import feature, people spend a lot of time either trying to avoid using it, or building tools to allow them to avoid using it. 
I don't know if that makes it better or worse than Python's solution for
relative imports :-)

-- 
Steve

From rosuav at gmail.com  Wed Nov 14 01:27:15 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 14 Nov 2018 17:27:15 +1100
Subject: [Python-ideas] Relative Imports
In-Reply-To: <20181114061502.GG4071@ando.pearwood.info>
References: <20181109231635.GL4071@ando.pearwood.info>
	<20181109233915.GM4071@ando.pearwood.info>
	<20181110001043.GO4071@ando.pearwood.info>
	<20181114061502.GG4071@ando.pearwood.info>
Message-ID: 

On Wed, Nov 14, 2018 at 5:15 PM Steven D'Aprano wrote:
> Beware of "obvious" solutions, because so often they lead to not so
> obvious problems. Like Javascript's "relative import hell":
>
> Quote:
>
> // what we want
> import reducer from 'reducer';
> // what we don't want
> import reducer from '../../../reducer';
>
> https://medium.com/@sherryhsu/how-to-change-relative-paths-to-absolute-paths-for-imports-32ba6cce18a5

Agreed. Having spent a lot of time with JavaScript students, I actually
am NOT a fan of directory-relative imports. They inevitably result in
equivalent code looking different, and subtly different code looking
identical. Consider:

// main.js
import User from './models/users';

// routers/users.js
import User from '../models/users';

That example is probably okay, because if you ever get it wrong, you get
an immediate error. But what about this?

// routers/index.js
import usersRouter from './users';

// models/stuff.js
import User from './users';

The exact same import does completely different things based on which
file it's in. That's dangerous, because it makes code subtly context
sensitive. I would much rather work with package-relative pathing, where
there is a known basis for *all* local imports, no matter what file the
import is actually happening in. In Python, that's best done by naming
the package again, so that's not quite ideal either, but it's better
than having to pile in the exact right number of "../" to make the
import work.

ChrisA

From erotemic at gmail.com  Wed Nov 14 19:31:49 2018
From: erotemic at gmail.com (Jonathan Crall)
Date: Wed, 14 Nov 2018 19:31:49 -0500
Subject: [Python-ideas] Proposing additions to the standard library
In-Reply-To: 
References: 
Message-ID: 

@Stéfane: Boltons looks really neat! I'll take a look. Some of my stuff
may fit better as a PR for this library. Also, I don't think it's foolish
to depend on a package for one function, given that (a) that function is
really useful or (b) the size of the dependency itself is small. Given my
initial impressions of boltons, I would guess that it doesn't have a
large download size or runtime impact. Although if this case keeps
reoccurring with a particular function, then perhaps that function might
improve the stdlib?

After really reviewing my stuff, I think a few functions I have would
make the stdlib better, but it's probably only a small few. I bet there
are things in boltons that would improve the stdlib as well, but I do
agree that "semi-standard" libraries (e.g. numpy / scipy /
what-I-hope-ubelt-to-be) might be a better place for these functions
that would otherwise cause costly clutter in the stdlib.

On Tue, Nov 13, 2018 at 2:05 AM Stéfane Fermigier wrote:

> Are you aware of https://boltons.readthedocs.io/ (whose motto is
> "Functionality that should be in the standard library.") ?
> Or similar endeavours such as:
>
> - https://pypi.org/project/auxlib/
> - https://pypi.org/project/omakase/
> - (And probably many others on PyPI with similar descriptions such as "a
> library of stuff I'm using / we're using at company X for all my / our
> project(s)...")
>
> Or the functional libraries listed here:
> https://github.com/sfermigier/awesome-functional-python/blob/master/README.md#libraries
>
> => IMHO there is room for a "semi-standard" library, stuff that's not
> included by default and has a release lifecycle more active than Python
> itself, but that can be considered the standard by a large group of users.
>
> Similar ideas can be found for instance in Java with Apache Commons (
> https://commons.apache.org/ -> "an Apache project focused on all aspects
> of reusable Java components."). One could argue, though, that the Java
> standard library is much less developed than the Python standard library,
> so it's much easier to justify the existence of Apache Commons than a
> similar Python project.
>
> There is also the question of the porosity between such a project and the
> stdlib, which is the essence of the original question by the OP.
>
> Another interesting issue is the granularity of such a project. I
> sometimes, and somewhat foolishly, make libraries such as toolz or
> boltons a dependency of my projects, for just one or two function calls
> from my code.
>
> Regards,
>
> S.
>
> On Sun, Nov 11, 2018 at 2:37 AM Jonathan Crall wrote:
>
>> I'm interested in proposing several additions to the Python standard
>> library, and I would like more information on the procedure for doing
>> so. Are all additions done via a PEP? If not, what is the procedure? If
>> so, I've read that the first step was to email this board and get
>> feedback.
>>
>> I have a library called `ubelt` that contains several tools that I
>> think might be worthy of adding to the standard library.
>>
>> Here's my bullet point pitch:
>>
>> - Python is batteries included. Ubelt contains extra batteries: its
>> functions are the extra batteries.
>> - Most functions in ubelt are fast. All 222 tests take 7.33 seconds.
>> - Ubelt has 100% test coverage (sans `# nocover` locations).
>> - I'm only championing a subset of the functions in ubelt. There are
>> certainly functions in there that do not belong in the standard library.
>> - I have a Jupyter notebook that gives a demo of some select functions
>> (not necessarily the same as the ones proposed here):
>> https://github.com/Erotemic/ubelt/blob/master/docs/notebooks/Ubelt%20Demo.ipynb
>> - I do have documentation (mostly in docstrings) and in the docs
>> folder, but I've been having trouble auto-updating read-the-docs. Here
>> is the link anyway: https://ubelt.readthedocs.io/en/latest/
>>
>> Here is a tentative list of interesting functions.
>> Hopefully the names are descriptive (if not, see docstrings:
>> https://github.com/Erotemic/ubelt)
>>
>> ub.cmd
>> ub.compressuser
>> ub.group_items
>> ub.dict_hist
>> ub.find_duplicates
>> ub.AutoDict
>> ub.import_module_from_path
>> ub.import_module_from_name
>> ub.modname_to_modpath,
>> ub.modpath_to_modname
>> ub.ProgIter
>> ub.ensuredir
>> ub.expandpath
>>
>> almost everything in util_list:
>>
>> allsame, argmax, argmin, argsort, argunique,
>> chunks, flatten, iter_window, take, unique
>>
>> These functions might be worth modifying into dictionary methods:
>>
>> ub.dict_subset
>> ub.dict_take
>> ub.map_vals
>> ub.map_keys
>> ub.Timerit
>> ub.Timer
>>
>> Because I built the library, I tend to like all the functions. It's
>> difficult to decide if they are stdlib worthy, so there might be some
>> false positives / negatives.
>>
>> I'm on the fence about:
>> CacheStamp, Cacher, NoParam, argflag, argval, dzip, delete, hash_data,
>> hash_file, memoize, memoize_method, NiceRepr, augpath, userhome,
>> ensure_app_cache_dir, ensure_app_resource_dir, find_exe, find_path,
>> get_app_cache_dir, get_app_resource_dir, platform_cache_dir,
>> platform_resource_dir, CaptureStdout, codeblock, ensure_unicode, hzcat,
>> indent, OrderedSet
>>
>> It's my hope that some of these are actually useful. Let me know any of
>> the following: what you think, if there are any questions, if something
>> else needs to be done, or what the next steps are.
>>
>> --
>> -Jon
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
> --
> Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier
> - http://linkedin.com/in/sfermigier
> Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/
> Chairman, Free&OSS Group @ Systematic Cluster -
> https://systematic-paris-region.org/fr/groupe-thematique-logiciel-libre/
> Co-Chairman, National Council for Free & Open Source Software (CNLL) -
> http://cnll.fr/
> Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ &
> http://pydata.fr/

-- 
-Jon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rene at pylots.com  Sat Nov 17 00:10:02 2018
From: rene at pylots.com (Rene Nejsum)
Date: Sat, 17 Nov 2018 06:10:02 +0100
Subject: [Python-ideas] f-string "debug" conversion
Message-ID: <21400AAD-4D76-441B-A45A-CA13FE512EEB@pylots.com>

+1 for this, I would use it all the time for debugging and tracing
programs.

Breakpoints and IDEs can be nice, but my code is filled with lines like:

logger.debug(f'transaction_id={transaction_id}, state={state},
amount={amount}, etc={etc}')

So yeah, well +10 actually :-)

/rene

From steve at pearwood.info  Mon Nov 19 19:19:51 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 20 Nov 2018 11:19:51 +1100
Subject: [Python-ideas] Enhancing range object string displays
Message-ID: <20181120001951.GA4319@ando.pearwood.info>

On the bug tracker, there is a proposal to enhance range objects so that
printing them will display a snapshot of the values included, including
the end points. For example:

print(range(10))

currently displays "range(10)". The proposal is for the __str__ method
to instead return "<range [0, 1, 2, ..., 8, 9]>".
https://bugs.python.org/issue35200

print(range(2, 200, 3)) would display
<range [2, 5, 8, ..., 194, 197]>

Note that the original proposal was for range objects' __repr__ to
display this behaviour. But given the loss of eval(repr(obj)) round
tripping, and the risk of breaking backwards compatibility, it was
decided that isn't acceptable, but using the same display for __str__
(and hence produced by print) would be nearly as useful without the
downsides.

The developer who proposed the feature, Julien, now wants to reject the
feature request. I think it is still a useful feature for range objects.
What do others think? Is this worth re-opening?

-- 
Steve

From danish.bluecheese at gmail.com  Mon Nov 19 20:09:25 2018
From: danish.bluecheese at gmail.com (danish bluecheese)
Date: Mon, 19 Nov 2018 17:09:25 -0800
Subject: [Python-ideas] Enhancing range object string displays
In-Reply-To: <20181120001951.GA4319@ando.pearwood.info>
References: <20181120001951.GA4319@ando.pearwood.info>
Message-ID: 

I think it is kind of a useless effort. If somebody is using range(),
they probably already know about it. Also, there are already some
workarounds to inspect a range() result. Like:

[*range(10)]

or, if it is big:

[*range(10000000)[:10]]

On Mon, Nov 19, 2018 at 4:25 PM Steven D'Aprano wrote:

> On the bug tracker, there is a proposal to enhance range objects so that
> printing them will display a snapshot of the values included, including
> the end points. For example:
>
> print(range(10))
>
> currently displays "range(10)". The proposal is for the __str__ method
> to instead return "<range [0, 1, 2, ..., 8, 9]>".
>
> https://bugs.python.org/issue35200
>
> print(range(2, 200, 3)) would display
> <range [2, 5, 8, ..., 194, 197]>
>
> Note that the original proposal was for range objects' __repr__ to
> display this behaviour. But given the loss of eval(repr(obj)) round
> tripping, and the risk of breaking backwards compatibility, it was
> decided that isn't acceptable, but using the same display for __str__
> (and hence produced by print) would be nearly as useful without the
> downsides.
>
> The developer who proposed the feature, Julien, now wants to reject the
> feature request. I think it is still a useful feature for range objects.
> What do others think? Is this worth re-opening?
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Mon Nov 19 21:09:27 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 20 Nov 2018 13:09:27 +1100
Subject: [Python-ideas] Enhancing range object string displays
In-Reply-To: 
References: <20181120001951.GA4319@ando.pearwood.info>
Message-ID: <20181120020926.GA5054@ando.pearwood.info>

On Mon, Nov 19, 2018 at 05:09:25PM -0800, danish bluecheese wrote:

> I think it is kind of a useless effort. If somebody is using range(),
> they probably already know about it.

For experienced users, sure, but this is an enhancement to help
beginners who may be confused by the half-open end points.

Even non-beginners may find it nice to be able to easily see the end
points when the step size is not 1.

If range objects had this, I'd use it in the REPL to check the end
points. Sure, I could convert to a list and take a slice, but giving the
object a nicer print output makes less work for the user.
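Just to make the proposal concrete, the display could be produced by
something like this (a sketch only, not the actual patch from the
tracker; the exact format, and the name range_str, are mine):

def range_str(r):
    # Show the first three and last two values, with an ellipsis in
    # between for long ranges. Slicing a range returns a range, so no
    # large list is ever built.
    if len(r) <= 6:
        body = ', '.join(str(x) for x in r)
    else:
        head = ', '.join(str(x) for x in r[:3])
        tail = ', '.join(str(x) for x in r[-2:])
        body = f'{head}, ..., {tail}'
    return f'<range [{body}]>'

>>> range_str(range(10))
'<range [0, 1, 2, ..., 8, 9]>'
>>> range_str(range(2, 200, 3))
'<range [2, 5, 8, ..., 194, 197]>'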
-- 
Steve

From python at mrabarnett.plus.com  Mon Nov 19 21:17:34 2018
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 20 Nov 2018 02:17:34 +0000
Subject: [Python-ideas] Enhancing range object string displays
In-Reply-To: <20181120001951.GA4319@ando.pearwood.info>
References: <20181120001951.GA4319@ando.pearwood.info>
Message-ID: <24ecfa43-2ba7-a721-625b-843203e1dc9c@mrabarnett.plus.com>

On 2018-11-20 00:19, Steven D'Aprano wrote:
> On the bug tracker, there is a proposal to enhance range objects so that
> printing them will display a snapshot of the values included, including
> the end points. For example:
>
> print(range(10))
>
> currently displays "range(10)". The proposal is for the __str__ method
> to instead return "<range [0, 1, 2, ..., 8, 9]>".
>
> https://bugs.python.org/issue35200
>
> print(range(2, 200, 3)) would display
> <range [2, 5, 8, ..., 194, 197]>
>
> Note that the original proposal was for range objects' __repr__ to
> display this behaviour. But given the loss of eval(repr(obj)) round
> tripping, and the risk of breaking backwards compatibility, it was
> decided that isn't acceptable, but using the same display for __str__
> (and hence produced by print) would be nearly as useful without the
> downsides.
>
> The developer who proposed the feature, Julien, now wants to reject the
> feature request. I think it is still a useful feature for range objects.
> What do others think? Is this worth re-opening?
>
Well, if it's not going to round-trip, and it's going to be more verbose,
then I think it shouldn't be making the step size implicit. Maybe
something more like:

<range(2, 200, 3) [2, 5, 8, ..., 194, 197]>

But, overall, I'm ±0.

From rosuav at gmail.com  Mon Nov 19 21:22:16 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 20 Nov 2018 13:22:16 +1100
Subject: [Python-ideas] Enhancing range object string displays
In-Reply-To: <20181120020926.GA5054@ando.pearwood.info>
References: <20181120001951.GA4319@ando.pearwood.info>
	<20181120020926.GA5054@ando.pearwood.info>
Message-ID: 

On Tue, Nov 20, 2018 at 1:10 PM Steven D'Aprano wrote:
>
> On Mon, Nov 19, 2018 at 05:09:25PM -0800, danish bluecheese wrote:
> > I think it is kind of a useless effort. If somebody is using range(),
> > they probably already know about it.
>
> For experienced users, sure, but this is an enhancement to help
> beginners who may be confused by the half-open end points.
>
> Even non-beginners may find it nice to be able to easily see the end
> points when the step size is not 1.
>
> If range objects had this, I'd use it in the REPL to check the end
> points. Sure, I could convert to a list and take a slice, but giving the
> object a nicer print output makes less work for the user.
>

I'm a fairly experienced Python programmer, and I still just fire up a
REPL to confirm certain uses of range() with steps.

What would it be like if the string form looked like this:

>>> range(1, 30, 3)
range([1, 4, 7, ..., 25, 28])

In theory, this could actually be made legal, and could be a cool
feature for an enhanced REPL to support. (All you have to do is define
'range' as a function that checks if it's been given a list, and if not,
passes it on unchanged.)

Whether this form or the original, I think this would be an improvement.
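A toy version of that wrapper, just to show it really is legal (my
sketch, shadowing the builtin in the REPL, not anything a REPL actually
ships):

import builtins

def range(*args):
    # If given a single list -- e.g. the proposed display pasted back
    # into the REPL -- return it unchanged; otherwise defer to the
    # real builtin.
    if len(args) == 1 and isinstance(args[0], list):
        return args[0]
    return builtins.range(*args)

>>> range(1, 30, 3)            # still the real builtin underneath
range(1, 30, 3)
>>> range([1, 4, 7, ..., 25, 28])  # even parses: ... is Ellipsis
[1, 4, 7, Ellipsis, 25, 28]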
ChrisA

From njs at pobox.com  Tue Nov 20 00:02:25 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 19 Nov 2018 21:02:25 -0800
Subject: [Python-ideas] Enhancing range object string displays
In-Reply-To: <20181120020926.GA5054@ando.pearwood.info>
References: <20181120001951.GA4319@ando.pearwood.info>
	<20181120020926.GA5054@ando.pearwood.info>
Message-ID: 

On Mon, Nov 19, 2018 at 6:09 PM Steven D'Aprano wrote:
>
> On Mon, Nov 19, 2018 at 05:09:25PM -0800, danish bluecheese wrote:
> > I think it is kind of a useless effort. If somebody is using range(),
> > they probably already know about it.
>
> For experienced users, sure, but this is an enhancement to help
> beginners who may be confused by the half-open end points.
>
> Even non-beginners may find it nice to be able to easily see the end
> points when the step size is not 1.
>
> If range objects had this, I'd use it in the REPL to check the end
> points. Sure, I could convert to a list and take a slice, but giving the
> object a nicer print output makes less work for the user.

I feel like the kind of users who would benefit the most from this are
exactly the same users who are baffled by the distinction between str()
and repr() and which one is used when, and thus would struggle to
benefit from it?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From p.f.moore at gmail.com  Tue Nov 20 04:10:03 2018
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 20 Nov 2018 09:10:03 +0000
Subject: [Python-ideas] Enhancing range object string displays
In-Reply-To: 
References: <20181120001951.GA4319@ando.pearwood.info>
	<20181120020926.GA5054@ando.pearwood.info>
Message-ID: 

On Tue, 20 Nov 2018 at 02:23, Chris Angelico wrote:
> I'm a fairly experienced Python programmer, and I still just fire up a
> REPL to confirm certain uses of range() with steps.
>
> What would it be like if the string form looked like this:
>
> >>> range(1, 30, 3)
> range([1, 4, 7, ..., 25, 28])

Wouldn't that use the repr, which is *not* changing in this proposal?

> In theory, this could actually be made legal, and could be a cool
> feature for an enhanced REPL to support. (All you have to do is define
> 'range' as a function that checks if it's been given a list, and if
> not, passes it on unchanged.)
>
> Whether this form or the original, I think this would be an improvement.

I do like the improved display, but I don't know how useful it would be
in practice, given that I don't *often* use raw range objects (as
opposed to "for x in range(...)") and the default repr display won't
change.

I am inclined to think that we're overthinking the problem - changing
the str() of a range object is unlikely to break anything, is a small
but clear usability improvement, and more time has probably been spent
debating whether it's a good idea than it would have cost to just make
the change...

Paul

From celelibi at gmail.com  Sun Nov 25 11:52:37 2018
From: celelibi at gmail.com (Celelibi)
Date: Sun, 25 Nov 2018 17:52:37 +0100
Subject: [Python-ideas] hybrid regex engine: backtracking + Thompson NFA
Message-ID: 

Hello,

I found this topic discussed on the python-dev ML back in 2010 [1]. I'm
bringing it up 8 years later with a variation.

In short: the article [2] highlights that backtracking-based regex
engines (like SRE in Python) have pathological cases that run in time
exponential in the input, while they could run in linear time. Not
mentioned by the article is that even quadratic time can be a problem
with large inputs.
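(As a rough illustration of that quadratic blow-up, using the failing
match from the example just below -- absolute timings are
machine-dependent, but doubling the input size roughly quadruples the
time:)

import re
import timeit

pattern = re.compile(r'.*?,(.*),,')
for n in (1000, 2000, 4000, 8000):
    s = "a," * n  # contains no ",,", so the match must fail
    t = timeit.timeit(lambda: pattern.match(s), number=1)
    print(n, round(t, 3))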
This happens pretty often when you're looking for delimited stuff like
this:
re.match(r'.*?,(.*),,', "a,"*10000)

Of course, there's a catch. Backreferences most likely cannot be
implemented in guaranteed linear time, and some cases of look-behind
assertions might prove difficult as well. But other features like
alternatives (foo|bar) and repetitions (.*) are no problem.

The general idea of Thompson's algorithm is that the simulation of the
NFA is basically in all the reachable states at the same time while
parsing the input string character by character. Of course, for regex
engines that use a VM (like Python's SRE) you can also make the
execution of the equivalent byte-code be in several states at once.

The 2010 discussion seems to be about having two separate engines and
selecting the best one for a given regex. What I'm proposing here is to
resort to backtracking only for the regex features that need it. Meaning
that within a regex like r'(.*),.* .*,\1' the evaluation of the middle
part would use Thompson's algorithm, while the \1 could trigger the
backtracking mechanism if the string doesn't match.

What do you think about it?
Has this already been discussed and rejected?
Is it just a matter of showing the code? (_sre.c seems... non-trivial)
Has this already been judged not worth the maintenance effort?

AFAIK, there aren't many hybrid regex engines out there. But one notable
implementation is the third version of Henry Spencer's regex engine [3].
He doesn't seem to have documented it publicly, but Postgres has done a
pretty good job at reverse-engineering a high-level view of it [4]. I'm
still unsure how the backtracking of some parts of the regex interacts
with the multi-state evaluation of the other parts. But at least it
exists and works, so it is feasible.

Best regards,
Celelibi

[1] https://mail.python.org/pipermail/python-dev/2010-March/098354.html
[2] https://swtch.com/~rsc/regexp/regexp1.html
[3] https://github.com/garyhouston/hsrex
[4] https://github.com/postgres/postgres/tree/master/src/backend/regex

From python at mrabarnett.plus.com  Sun Nov 25 12:59:23 2018
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 25 Nov 2018 17:59:23 +0000
Subject: [Python-ideas] hybrid regex engine: backtracking + Thompson NFA
In-Reply-To: 
References: 
Message-ID: <017cffe5-0608-6a10-8631-ff5cc5f37047@mrabarnett.plus.com>

On 2018-11-25 16:52, Celelibi wrote:
> Hello,
>
> I found this topic discussed on the python-dev ML back in 2010 [1]. I'm
> bringing it up 8 years later with a variation.
>
> In short: the article [2] highlights that backtracking-based regex
> engines (like SRE in Python) have pathological cases that run in time
> exponential in the input, while they could run in linear time. Not
> mentioned by the article is that even quadratic time can be a problem
> with large inputs. This happens pretty often when you're looking for
> delimited stuff like this:
> re.match(r'.*?,(.*),,', "a,"*10000)
>
> Of course, there's a catch. Backreferences most likely cannot be
> implemented in guaranteed linear time, and some cases of look-behind
> assertions might prove difficult as well. But other features like
> alternatives (foo|bar) and repetitions (.*) are no problem.
>
> The general idea of Thompson's algorithm is that the simulation of
> the NFA is basically in all the reachable states at the same time
> while parsing the input string character by character.
> Of course, for regex engines that use a VM (like Python's SRE) you can
> also make the execution of the equivalent byte-code be in several
> states at once.
>
> The 2010 discussion seems to be about having two separate engines and
> selecting the best one for a given regex. What I'm proposing here is
> to resort to backtracking only for the regex features that need it.
> Meaning that within a regex like r'(.*),.* .*,\1' the evaluation of
> the middle part would use Thompson's algorithm, while the \1 could
> trigger the backtracking mechanism if the string doesn't match.
>
> What do you think about it?
> Has this already been discussed and rejected?
> Is it just a matter of showing the code? (_sre.c seems... non-trivial)
> Has this already been judged not worth the maintenance effort?
>
> AFAIK, there aren't many hybrid regex engines out there. But one
> notable implementation is the third version of Henry Spencer's regex
> engine [3]. He doesn't seem to have documented it publicly, but
> Postgres has done a pretty good job at reverse-engineering a
> high-level view of it [4]. I'm still unsure how the backtracking of
> some parts of the regex interacts with the multi-state evaluation of
> the other parts. But at least it exists and works, so it is feasible.
>
> Best regards,
> Celelibi
>
> [1] https://mail.python.org/pipermail/python-dev/2010-March/098354.html
> [2] https://swtch.com/~rsc/regexp/regexp1.html
> [3] https://github.com/garyhouston/hsrex
> [4] https://github.com/postgres/postgres/tree/master/src/backend/regex
>
This is open source. Nothing gets done unless someone decides to do it.

So, yes, it's (just?) a matter of showing the code, and, if you want it
in the re module, to persuade the core devs that it's worth doing and
that you're willing to maintain it and fix the bugs.

From kale at thekunderts.net  Mon Nov 26 16:29:21 2018
From: kale at thekunderts.net (Kale Kundert)
Date: Mon, 26 Nov 2018 13:29:21 -0800
Subject: [Python-ideas] __len__() for map()
Message-ID: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>

I just ran into the following behavior, and found it surprising:

>>> len(map(float, [1,2,3]))
TypeError: object of type 'map' has no len()

I understand that map() could be given an infinite sequence and
therefore might not always have a length. But in this case, it seems
like map() should've known that its length was 3. I also understand
that I can just call list() on the whole thing and get a list, but the
nice thing about map() is that it doesn't copy data, so it's unfortunate
to lose that advantage for no particular reason.

My proposal is to delegate map.__len__() to the underlying iterable.
Similarly, map.__getitem__() could be implemented if the underlying
iterable supports item access:

class map:

    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        yield from (self.func(x) for x in self.iterable)

    def __len__(self):
        return len(self.iterable)

    def __getitem__(self, key):
        return self.func(self.iterable[key])

Let me know if there are any downsides to this that I'm not seeing.
From my perspective, it seems like there would be a number of (small)
advantages:

- Less surprising
- Avoid some unnecessary copies
- Backwards compatible

-Kale
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From mike at selik.org Mon Nov 26 17:06:52 2018 From: mike at selik.org (Michael Selik) Date: Mon, 26 Nov 2018 14:06:52 -0800 Subject: [Python-ideas] __len__() for map() In-Reply-To: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: If you know the input is sizeable, why not check its length instead of the map's? On Mon, Nov 26, 2018, 1:35 PM Kale Kundert I just ran into the following behavior, and found it surprising: > > >>> len(map(float, [1,2,3])) > TypeError: object of type 'map' has no len() > > I understand that map() could be given an infinite sequence and therefore > might not always have a length. But in this case, it seems like map() > should've known that its length was 3. I also understand that I can just > call list() on the whole thing and get a list, but the nice thing about > map() is that it doesn't copy data, so it's unfortunate to lose that > advantage for no particular reason. > > My proposal is to delegate map.__len__() to the underlying iterable. > Similarly, map.__getitem__() could be implemented if the underlying > iterable supports item access: > > class map: > > def __init__(self, func, iterable): > self.func = func > self.iterable = iterable > > def __iter__(self): > yield from (self.func(x) for x in self.iterable) > > def __len__(self): > return len(self.iterable) > > def __getitem__(self, key): > return self.func(self.iterable[key]) > > Let me know if there any downsides to this that I'm not seeing. From my > perspective, it seems like there would be only a number of (small) > advantages: > > - Less surprising > - Avoid some unnecessary copies > - Backwards compatible > > -Kale > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfine2358 at gmail.com Mon Nov 26 17:14:58 2018 From: jfine2358 at gmail.com (Jonathan Fine) Date: Mon, 26 Nov 2018 22:14:58 +0000 Subject: [Python-ideas] __len__() for map() In-Reply-To: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: Hi Kale Thank you for the sample code. It's most helpful. Please consider >>> it = iter(range(4)) >>> list(map(float, it)) [0.0, 1.0, 2.0, 3.0] >>> it = iter(range(4)) >>> list(zip(it, it)) [(0, 1), (2, 3)] >>> list(zip(range(4), range(4))) [(0, 0), (1, 1), (2, 2), (3, 3)] A sequence is iterable. An iterator is iterable. There are other things that are iterable. A random number generator is an iterator, whose underlying object does not have a length. Briefly, I don't like your suggestion because many important iterables don't have a length! -- Jonathan From rosuav at gmail.com Mon Nov 26 17:36:08 2018 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 27 Nov 2018 09:36:08 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Tue, Nov 27, 2018 at 9:15 AM Jonathan Fine wrote: > Briefly, I don't like your suggestion because many important iterables > don't have a length! That part's fine. The implication is that mapping over an iterable with a length would give a map with a known length, and mapping over something without a length wouldn't. 
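Something like this toy wrapper shows that implication -- len() works
exactly when the underlying object's len() does (illustrative only, not
a proposed implementation):

class sized_map:
    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        return (self.func(x) for x in self.iterable)

    def __len__(self):
        # Raises TypeError if the underlying iterable is unsized.
        return len(self.iterable)

>>> len(sized_map(float, [1, 2, 3]))
3
>>> len(sized_map(float, iter([1, 2, 3])))
Traceback (most recent call last):
  ...
TypeError: object of type 'list_iterator' has no len()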
But I think there are enough odd edge cases (for instance, is it okay to
call the function twice if you __getitem__ twice, or should you cache
it?) that it's probably best to keep the built-in map() simple and
reliable.

Don't forget, too, that map() can take more than one iterable, and some
may not have lengths. (You can define enumerate in terms of map and
itertools.count; what is the length of the resulting enumeration?)

If you want a map-like object that takes specifically a single list, and
is a mapped view to that list, then go for it - but that can be its own
beast, not related to the map() built-in function.

Also, it may be of value to check out more-itertools; you might find
something there that you like.

ChrisA

From steve at pearwood.info  Mon Nov 26 17:37:20 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 27 Nov 2018 09:37:20 +1100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: <20181126223720.GJ4319@ando.pearwood.info>

On Mon, Nov 26, 2018 at 02:06:52PM -0800, Michael Selik wrote:
> If you know the input is sizeable, why not check its length instead of
> the map's?

The consumer of map may not be the producer of map.

You might know that alist supports len(), but by the time I see it, I
only see map(f, alist), not alist itself.

-- 
Steve

From steve at pearwood.info  Mon Nov 26 17:39:16 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 27 Nov 2018 09:39:16 +1100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: <20181126223915.GK4319@ando.pearwood.info>

On Mon, Nov 26, 2018 at 10:14:58PM +0000, Jonathan Fine wrote:

> Briefly, I don't like your suggestion because many important iterables
> don't have a length!

But many important iterables do.

-- 
Steve

From danish.bluecheese at gmail.com  Mon Nov 26 18:08:40 2018
From: danish.bluecheese at gmail.com (danish bluecheese)
Date: Mon, 26 Nov 2018 15:08:40 -0800
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <20181126223915.GK4319@ando.pearwood.info>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181126223915.GK4319@ando.pearwood.info>
Message-ID: 

> On Mon, Nov 26, 2018 at 10:14:58PM +0000, Jonathan Fine wrote:
>
> > Briefly, I don't like your suggestion because many important iterables
> > don't have a length!
>
> But many important iterables do.
>
I agree that many important iterables do have one.

On Mon, Nov 26, 2018 at 02:06:52PM -0800, Michael Selik wrote:
> If you know the input is sizeable, why not check its length instead of
> the map's?
>
> The consumer of map may not be the producer of map.

Very good point. Honestly, I like the proposal but would love to see
more reviews of the idea. Maybe I am missing something.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From steve at pearwood.info Mon Nov 26 18:37:13 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 27 Nov 2018 10:37:13 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: <20181126233713.GL4319@ando.pearwood.info> On Mon, Nov 26, 2018 at 01:29:21PM -0800, Kale Kundert wrote: > I just ran into the following behavior, and found it surprising: > > >>> len(map(float, [1,2,3])) > TypeError: object of type 'map' has no len() > > I understand that map() could be given an infinite sequence and therefore might > not always have a length.? But in this case, it seems like map() should've known > that its length was 3. This seems straightforward, but I think there's more complexity than you might realise, a nasty surprise which I expect is going to annoy people no matter what decision we make, and the usefulness is probably less than you might think. First, the usefulness: we still have to wrap the call to len() in a try...except block, even if we know we have a map object, because we won't know whether the underlying iterable supports len. So it won't reduce the amount of code we have to write. At best it will allow us to take a fast-path when len() returns a value, and a slow-path when it raises. Here's the definition of the Sized abc: https://docs.python.org/3/library/collections.abc.html#collections.abc.Sized and the implementation simply checks for the existence of __len__. We (rightly) assume that if __len__ exists, the object has a known length, and that calling len() on it will succeed or at least not raise TypeError. Your proposal will break that expectation. map objects will be sized, but since sometimes the underlying iterator won't be, they may still raise TypeError. Of course there are ways to work around this. We could just change our expectations: even Sized objects might not be *actually* sized. Or map() could catch the TypeError and raise instead a ValueError, or something. Or we could rethink the whole length concept (see below), which after all was invented back in Python 1 days and is looking a bit old. As for the nasty surprise... do you agree that this ought to be an invariant for sized iterables? count = len(it) i = 0 for obj in it: i += 1 assert i == count That's the invariant I expect, and breaking that will annoy me (and I expect many other people) greatly. But that means that map() cannot just delegate its length to the underlying iterable. The implementation must be more complex, keeping track of how many items it has seen. And consider this case: it = map(lambda x: x, [1, 2, 3, 4, 5]) x = next(it) x = next(it) assert len(it) == 5 # underlying length of the iterable assert len(list(it)) == 3 # but only three items left assert len(it) == 5 # still 5 assert len(list(it)) == 0 # but nothing left So the length of the iterable has to vary as you iterate over it, or you break the invariant shown above. But that's going to annoy other people for another reason: we rightly expect that iterables shouldn't change their length just because you iterate over them! The length should only change if you *modify* them. So these two snippets should do the same: # 1 n = len(it) x = sum(it) # 2 x = sum(it) n = len(it) but if map() updates its length as it goes, it will break that invariant. So *whichever* behaviour we choose, we're going to break *something*. 
Either the reported length isn't necessarily the same as the actual length you get from iterating over the items, which will be annoying and confusing, or it varies as you iterate, which will ALSO be annoying and confusing. Either way, this apparently simple and obvious change will be annoying and confusing. Rethinking object length ------------------------ len() was invented back in Python 1 days, or earlier, when we effectively had only one kind of iterable: sequences like lists, with a known length. Today, iterables can have: 1. a known, finite length; 2. a known infinite length; 3. An unknown length (and usually no way to estimate it). At least. The len() protocol is intentionally simple, it only supports the first case, with the expectation that iterables will simply not define __len__ in the other two cases. Perhaps there is a case for updating the len() concept to explicitly handle cases 2 and 3, instead of simply not defining __len__. Perhaps it could return -1 for unknown and -2 for infinite. Or raise some other exception apart from TypeError. (I know there have been times I've wanted to know if an iterable was infinite, before spending the rest of my life iterating over it...) And perhaps we can come up with a concept of total length, versus length of items remaining. But these aren't simple issues with obvious solutions, it would surely need a PEP. And the benefit isn't obvious either. -- Steve From steve at pearwood.info Mon Nov 26 18:41:19 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 27 Nov 2018 10:41:19 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: <20181126234119.GM4319@ando.pearwood.info> On Tue, Nov 27, 2018 at 09:36:08AM +1100, Chris Angelico wrote: > Don't forget, too, that map() can take more than one iterable I forgot about that! But in this case, I think the answer is obvious: the length of the map object is the *smallest* length of the iterables, ignoring any unsized or infinite ones. Same would apply to zip(). But as per my previous post, there are other problems with this concept that aren't so easy to solve. -- Steve From toddrjen at gmail.com Mon Nov 26 18:45:06 2018 From: toddrjen at gmail.com (Todd) Date: Mon, 26 Nov 2018 18:45:06 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181126223720.GJ4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181126223720.GJ4319@ando.pearwood.info> Message-ID: On Mon, Nov 26, 2018, 17:38 Steven D'Aprano On Mon, Nov 26, 2018 at 02:06:52PM -0800, Michael Selik wrote: > > If you know the input is sizeable, why not check its length instead of > the > > map's? > > The consumer of map may not be the producer of map. > > You might know that alist supports len(), but by the time I see it, I > only see map(f, alist), not alist itself. > Then you have no way of knowing whether it is safe to use "len" or not. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From rosuav at gmail.com Mon Nov 26 19:02:31 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 27 Nov 2018 11:02:31 +1100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <20181126234119.GM4319@ando.pearwood.info>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181126234119.GM4319@ando.pearwood.info>
Message-ID: 

On Tue, Nov 27, 2018 at 10:41 AM Steven D'Aprano wrote:
>
> On Tue, Nov 27, 2018 at 09:36:08AM +1100, Chris Angelico wrote:
>
> > Don't forget, too, that map() can take more than one iterable
>
> I forgot about that!
>
> But in this case, I think the answer is obvious: the length of the map
> object is the *smallest* length of the iterables, ignoring any unsized
> or infinite ones.

Equally obvious and valid answer: The length is the smallest length of
its iterables, ignoring any infinite ones, but if any iterable is
unsized, the map is unsized.

And both answers will surprise people.

I still think there's room in the world for a "mapped list view" type,
which retains a reference to an underlying list, plus a function, and
proxies everything through to the function. It would NOT have the
flexibility of map(), but it would be able to directly subscript, it
wouldn't need any cache, etc, etc.

ChrisA

From kale at thekunderts.net Mon Nov 26 20:37:11 2018
From: kale at thekunderts.net (Kale Kundert)
Date: Mon, 26 Nov 2018 17:37:11 -0800
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <20181126233713.GL4319@ando.pearwood.info>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181126233713.GL4319@ando.pearwood.info>
Message-ID: <82dbbb7d-7f8b-ac9f-f78b-9d8c2505e65a@thekunderts.net>

Hi Steven,

Thanks for the good feedback.

> First, the usefulness: we still have to wrap the call to
> len() in a try...except block, even if we know we have a map object,
> because we won't know whether the underlying iterable supports len. So
> it won't reduce the amount of code we have to write. At best it will
> allow us to take a fast-path when len() returns a value, and a slow-path
> when it raises.

I think most of the time you would know whether the underlying iterable
was sized or not. After all, if you need the length, whatever code
you're writing would probably not work on an infinite/unsized iterable.

> So the length of the iterable has to vary as you iterate over it, or you
> break the invariant shown above.

I think I see the problem here. map() is an iterator, where I was
thinking of it as a wrapper around an iterable. Since an iterator is
really just a pointer into an iterable, it doesn't really make sense
for it to have a length. Give it one, and you end up with the
inconsistencies you describe.

I guess I probably would have disagreed with the decision to make map()
an iterator rather than a wrapper around an iterable. Such a prominent
function should have an API geared towards usability, not towards
implementing a low-level protocol (in my opinion). But clearly that
ship has sailed.
-Kale

From kale at thekunderts.net Mon Nov 26 20:44:55 2018
From: kale at thekunderts.net (Kale Kundert)
Date: Mon, 26 Nov 2018 17:44:55 -0800
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181126234119.GM4319@ando.pearwood.info>
Message-ID: <778a8ecb-24cf-abdf-5987-86dd7df0b402@thekunderts.net>

> Equally obvious and valid answer: The length is the smallest length of
> its iterables, ignoring any infinite ones, but if any iterable is
> unsized, the map is unsized.
>
> And both answers will surprise people.
>
> I still think there's room in the world for a "mapped list view" type,
> which retains a reference to an underlying list, plus a function, and
> proxies everything through to the function. It would NOT have the
> flexibility of map(), but it would be able to directly subscript, it
> wouldn't need any cache, etc, etc.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

I don't really agree that there are multiple surprising answers here.
If you iterate through the whole map, that will produce some number of
elements, and that's the length. Whether you can calculate that number
in __len__() depends on the particular iterables you have, which is
fine, but I don't think the definition of length is ambiguous.

But I think Steven is right that you can't implement __len__() for an
iterator without running into some inconsistencies. It's just
unfortunate that map() is an iterator.

-Kale

From rosuav at gmail.com Mon Nov 26 20:45:02 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 27 Nov 2018 12:45:02 +1100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <82dbbb7d-7f8b-ac9f-f78b-9d8c2505e65a@thekunderts.net>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181126233713.GL4319@ando.pearwood.info>
	<82dbbb7d-7f8b-ac9f-f78b-9d8c2505e65a@thekunderts.net>
Message-ID: 

On Tue, Nov 27, 2018 at 12:37 PM Kale Kundert wrote:
> I guess I probably would have disagreed with the decision to make map() an
> iterator rather than a wrapper around an iterable. Such a prominent function
> should have an API geared towards usability, not towards implementing a
> low-level protocol (in my opinion). But clearly that ship has sailed.

For map() to return an iterable that can be used more than once, it
has to be mapping over an iterable that can be used more than once.
That limits it. The way map is currently defined, it can accept any
iterable, and it returns a one-shot iterable (which happens to be its
own iterator).

That's why I think the best solution is to create a separate
mapped-sequence-view that depends on its iterable being an actual
sequence, and exposes itself as a sequence also. (Yes, I said "list"
in my previous post, but any sequence would work.) It can carry the
length through, it can directly support subscripting, etc, etc, etc.
Both it and map() would have their places.
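A rough, untested sketch of what I have in mind (all names
hypothetical):

    from collections.abc import Sequence

    class MappedSequenceView(Sequence):
        # A lazy mapped view over a real sequence: sized, subscriptable
        # and re-iterable, with no cache -- the function is simply
        # called again on each access.
        def __init__(self, func, seq):
            self._func, self._seq = func, seq
        def __len__(self):
            return len(self._seq)
        def __getitem__(self, index):
            if isinstance(index, slice):
                return MappedSequenceView(self._func, self._seq[index])
            return self._func(self._seq[index])

    # squares = MappedSequenceView(lambda x: x * x, [1, 2, 3])
    # len(squares) -> 3; squares[1] -> 4; list(squares) -> [1, 4, 9]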
ChrisA

From tjreedy at udel.edu Tue Nov 27 12:47:47 2018
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 27 Nov 2018 12:47:47 -0500
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: 

On 11/26/2018 4:29 PM, Kale Kundert wrote:
> I just ran into the following behavior, and found it surprising:
>
> >>> len(map(float, [1,2,3]))
> TypeError: object of type 'map' has no len()
>
> I understand that map() could be given an infinite sequence and
> therefore might not always have a length.

The len function is defined as always returning the length, an int >= 0.
Hence .__len__ methods should always do the same.
https://docs.python.org/3/reference/datamodel.html#object.__len__
Objects that cannot do that should not have this method.

The previous discussion of this issue led to function
operator.length_hint and special method object.__length_hint__ in 3.4.

https://docs.python.org/3/library/operator.html#operator.length_hint
"""
operator.length_hint(obj, default=0)

    Return an estimated length for the object o. First try to return
    its actual length, then an estimate using object.__length_hint__(),
    and finally return the default value.

    New in version 3.4.
"""

https://docs.python.org/3/reference/datamodel.html#object.__length_hint__
"""
object.__length_hint__(self)

    Called to implement operator.length_hint(). Should return an
    estimated length for the object (which may be greater or less than
    the actual length). The length must be an integer >= 0. This method
    is purely an optimization and is never required for correctness.

    New in version 3.4.
"""

> But in this case, it seems
> like map() should've known that its length was 3.

As others have pointed out, this is not true. If not infinite, the
size, defined as the number of items to be yielded, and hence the size
of list(iterator), shrinks by 1 after every next call, just as with pop
methods.

>>> it = iter([1,2,3])
>>> it.__length_hint__()
3
>>> next(it)
1
>>> it.__length_hint__()
2
>>> list(it)
[2, 3]
>>> it.__length_hint__()
0

Last I heard, list() uses length_hint for its initial allocation. But
this is an undocumented implementation detail.

Built-in map does not have .__length_hint__, for the reasons others
gave for it not having .__len__. But for private code, you are free to
define a subclass that does, with the definition you want.

-- 
Terry Jan Reedy

From abedillon at gmail.com Tue Nov 27 15:21:55 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Tue, 27 Nov 2018 14:21:55 -0600
Subject: [Python-ideas] Make None a subclass of int [alternative to iNaN]
In-Reply-To: 
References: 
Message-ID: 

I'm -1 on this idea. None is and should remain domain-independent.
Specific domains may require additional special values like "NaN",
"+/-inf", etc. for floating point math, in which case it makes more
sense to define a domain-specific special value than compromise the
independence of None.

Doing so could break code that assumes None is only an instance of
NoneType:

    if isinstance(x, int):
        handle_integer(x)
    elif x is None:
        handle_none()

On Sun, Sep 30, 2018 at 1:06 AM Ken Hilton wrote:

> Hi all,
>
> Reading the iNaN discussion, most of the opposition seems to be that
> adding iNaN would add a new special value to integers and therefore add new
> complexity.
>
> I propose, instead, that we make None a subclass of int (or even a certain
> value of int) to represent iNaN.
Therefore: > > >>> None + 1, None - 1, None * 2, None / 2, None // 2 > (None, None, None, nan, None) # mathematical operations on NaN return > NaN > >>> None & 1, None | 1, None ^ 1 > # I'm not sure about this one. The following could be plausible: > (0, 1, 1) > # or this might make more sense, as this *is* NaN we're talking about: > (None, None, None) > >>> isinstance(None, int) > True # the whole point of this idea > >>> issubclass(type(None), int) > True # no matter whether None *is* an int or just a subclass, this > will be true as issubclass(int, int) is True > > I know this is a crazy idea, but I thought it could have some merit, so > why not throw it out here? > > Sharing, > Ken Hilton; > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abedillon at gmail.com Tue Nov 27 23:47:06 2018 From: abedillon at gmail.com (Abe Dillon) Date: Tue, 27 Nov 2018 22:47:06 -0600 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs Message-ID: I've been pulling a lot of ideas from the recent discussion on design by contract (DBC), the elegance and drawbacks of doctests , and the amazing talk given by Hillel Wayne at this year's PyCon entitled "Beyond Unit Tests: Taking your Tests to the Next Level". To recap a lot of previous discussions: - Documentation should tell you: A) What a variable represents B) What kind of thing a variable is C) The acceptable values a variable can take - Typing and Tests can partially take the place of documentation by filling in B and C (respectively) and sometimes A can be inferred from decent naming and context. - Contracts can take the place of many tests (especially when combined with a library like hypothesis) - Contracts/assertions can provide "stable" documentation in the sense that it can't get out of sync with the code. - Attempts to implement contracts using standard Python syntax are verbose and noisy because they rely heavily on decorators that add a lot of repetitive preamble to the methods being decorated. They may also require a metaclass which restricts their use to code that doesn't already use a metaclass. - There was some discussion about the importance of "what a variable represents" which pointed to this article by Philip J. Guo (author of the magnificent pythontutor.com). I believe Guo's usage of "in-the-small" and "in-the-large" are confusing because a well decoupled program shouldn't yield functions that know or care how they're being used in the grand machinations of your project. The examples he gives are of functions that could use a doc string and some type annotations, but don't actually say how they relate to the rest of the project. One thing that caught me about Hillel Wayne's talk was that some of his examples were close to needing practically no code. 
He starts with:

    def tail(lst: List[Any]) -> List[Any]:
        assert len(lst) > 0, "precondition"
        result = lst[1:]
        assert [lst[0]] + result == lst, "postcondition"
        return result

He then re-writes the function using a contracts library:

    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst",
            lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

He then writes a unit test for the function:

    @given(lists(integers(), 1))
    def test_tail(lst):
        tail(lst)

What strikes me as interesting is that the test pretty-much doesn't
need to be written. The 'given' statement should be redundant based on
the type annotation and the precondition. Anyone who knows hypothesis,
just imagine the @require is a hypothesis 'assume' call. Furthermore,
hypothesis should be able to build strategies for more complex objects
based on class invariants and attribute types:

    @invariant("no overdrafts", lambda self: self.balance >= 0)
    class Account:
        def __init__(self, number: int, balance: float = 0):
            super().__init__()
            self.number: int = number
            self.balance: float = balance

A library like hypothesis should be able to generate valid account
objects. Hypothesis also has stateful testing but I think the
implementation could use some work. As it is, you have to inherit from
a class that uses a metaclass AND you have to pollute your class's
name-space with helper objects and methods.

If we could figure out a cleaner syntax for defining invariants,
preconditions, and postconditions we'd be half-way to automated testing
UTOPIA! (ok, maybe I'm being a little over-zealous)

I think there are two missing pieces to this testing problem:
side-effect verification and failure verification.

Failure verification should test that the expected exceptions get
thrown when known bad data is passed in or when an object is put in a
known illegal state. This should be doable by allowing Hypothesis to
probe the bounds of unacceptable input data or states, though it might
seem a bit silly because if you've already added a precondition,
"x >= 0" to a function, then it obviously should raise a
PreconditionViolated when passed any x < 0. It may be important,
however; if for performance reasons, you need to disable invariant
checking but you still want certain bad input to raise exceptions, or
your system has two components that interact with slightly mis-matched
invariants and you want to make sure the components handle the
edge-condition correctly. You can think of Types from a set-theory
perspective where the Integer type is conceptually the set of all
integers, and invariants would specify a smaller subset than Typing
alone, however if the set of all valid outputs of one component is not
completely contained within the set of all valid inputs to another
component, then there will be edge-cases resulting from the mismatch.
In that sense, some of the invariant verification could be static-ish
(as much as Python allows).

Side-effect verification is usually done by mocking dependencies. You
pass in a mock database connection and make sure my object sends and
receives data as expected. As crazy as it sounds, this too can be
almost completely automated away if all of the above tools are in
place AND if Python gained support for Exception annotations. I wrote
a Java (yuck) library at work that does this.
I want to port it to Python and share it, but it basically enumerates a
bunch of stuff: the "sources" and "destinations" of the system, how
those relate to dependencies, how they relate to each other (if
dependency X is unresponsive, I can't get sources A, B, or G and if I
can't get source B, I can't write destination Y), the dependency
failure modes (Exceptions raised, timeouts, unrecognized key, missing
data, etc.), all the public methods of the class under test and what
sources and destinations they use. Then I enumerate 'k' from 0 to some
limit for the max number of simultaneous faults to test for. Then for
each method that can have n >= k simultaneous faults I test all
(n choose k) combinations of faults for that method against the
desired behavior. I'm sure that explanation is as clear as mud. I will
try to get a working Python example at some point to demonstrate.

Finally, in the PyCon video, Hillel Wayne shows an example of testing
that an "add" function is commutative. It seems that once you write
that invariant, it might apply to many different functions. A similar
invariant may be "reversibility" like:

    @given(text())
    def test_reversable_codex(s):
        assert s == decode(encode(s)), "not reversible"

That might be a common property that other functions share:

    @invariant(reversible(decode))
    def encode(s: str) -> bytes:
        ...

Having said all that, I wanted to brainstorm some possible solutions
for implementing some or all of the above in Python without drowning
your code in decorators. NOTE: Please don't get hung up on specific
syntax suggestions! Try to see the forest through the trees!

An example syntax could be:

    #Instead of this
    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst",
            lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

    #Maybe this?
    non_empty = invariant("Must not be empty", lambda x: len(x) > 0)  # can be re-used

    def tail(lst: List[Any] d"Description of what this param represents.
             {non_empty}") -> List[Any] d"Description of return value
             {lst == [lst[0]] + __result__}":
        """ Description of function """
        return lst[1:]

Python could build the full doc string like so:

    """
    Description of function

    Args:
        lst: Description of what this param represents. Must not be empty.

    Returns:
        Description of return value.
    """

d-strings have some description followed by some terminator after which
either invariant objects or [optionally strings] followed by an
expression on the arguments and __return__? I'm sorry this is so
half-baked. I don't really like the d-string concept and I'm pretty
sure there are a million problems with it. I'll try to flesh out the
side-effect verification concept more later along with all the other
poorly explained stuff. I just wanted to get these thoughts out for
discussion, but now it's super late and I have to go!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From marko.ristin at gmail.com Wed Nov 28 02:12:41 2018
From: marko.ristin at gmail.com (Marko Ristin-Kaufmann)
Date: Wed, 28 Nov 2018 08:12:41 +0100
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: 
Message-ID: 

Hi Abe,

> I've been pulling a lot of ideas from the recent discussion on design by
> contract (DBC), the elegance and drawbacks of doctests, and the amazing
> talk given by Hillel Wayne at this year's PyCon entitled "Beyond Unit
> Tests: Taking your Tests to the Next Level".
Have you looked at the recent discussions regarding design-by-contract
on this list
(https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU
and the following forked threads)?

You might want to have a look at static checking techniques such as
abstract interpretation. I hope to be able to work on such a tool for
Python in some two years from now. We can stay in touch if you are
interested.

Re decorators: to my own surprise, using decorators in a larger code
base is completely practical including the readability and maintenance
of the code. It's neither that ugly nor that problematic as it might
seem at first look.

We use our https://github.com/Parquery/icontract at the company. Most
of the design choices come from practical issues we faced -- so you
might want to read the doc even if you don't plan to use the library.

Some of the aspects we still haven't figured out are: how to approach
multi-threading (locking around the whole function with an additional
decorator?) and granularity of contract switches (right now we use
always/optimized, production/non-optimized and testing/slow, but it
seems that a larger system requires finer categories).

Cheers,
Marko
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info Wed Nov 28 04:44:43 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 28 Nov 2018 20:44:43 +1100
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: 
Message-ID: <20181128094443.GS4319@ando.pearwood.info>

On Tue, Nov 27, 2018 at 10:47:06PM -0600, Abe Dillon wrote:

> If we could figure out a cleaner syntax for defining invariants,
> preconditions, and postconditions we'd be half-way to automated testing
> UTOPIA! (ok, maybe I'm being a little over-zealous)

You should look at the state of the art in Design By Contract. In
Eiffel, DBC is integrated in the language:

https://www.eiffel.com/values/design-by-contract/introduction/
https://www.eiffel.org/doc/eiffel/ET-_Design_by_Contract_%28tm%29%2C_Assertions_and_Exceptions

Eiffel uses a rather Pythonic block structure to define invariants.
The syntax is not identical to Python's (Eiffel eschews the colons)
but it also comes close to executable pseudo-code. I trust this syntax
requires little explanation:

    require
        ... preconditions, tested on function entry
    do
        ... body of the function
    ensure
        ... postconditions, tested on function exit
    end

There is a similar invariant block for classes.

Cobra is a language which intentionally modeled its syntax on Python.
It too has contracts integrated with the language:

http://cobra-language.com/how-to/DeclareContracts/
http://cobra-language.com/trac/cobra/wiki/Contracts

-- 
Steve

From solipsis at pitrou.net Wed Nov 28 09:18:10 2018
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 28 Nov 2018 15:18:10 +0100
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
References: 
Message-ID: <20181128151810.7c2393c2@fsol>

On Tue, 27 Nov 2018 22:47:06 -0600
Abe Dillon wrote:
>
> If we could figure out a cleaner syntax for defining invariants,
> preconditions, and postconditions we'd be half-way to automated testing
> UTOPIA! (ok, maybe I'm being a little over-zealous)

I think utopia is the word here. Fuzz testing can be useful, but it's
not a replacement for manual testing of carefully selected values.

Also, the idea that fuzz testing will automatically find edge cases in
your code is idealistic.
It depends on the algorithm you've implemented and the distribution of values chosen by the tester. Showcasing trivially wrong examples (such as an addition function that always returns 0, or a tail function that doesn't return the tail) isn't very helpful for a real-world analysis, IMHO. In the end, you have to be rigorous when writing tests, and for most non-trivial functions it requires that you devise the distribution of input values depending on the implemented algorithm, not leave that distribution to a third-party library that knows nothing about your program. Regards Antoine. From erik.m.bray at gmail.com Wed Nov 28 09:27:25 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Wed, 28 Nov 2018 15:27:25 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert wrote: > > I just ran into the following behavior, and found it surprising: > > >>> len(map(float, [1,2,3])) > TypeError: object of type 'map' has no len() > > I understand that map() could be given an infinite sequence and therefore might not always have a length. But in this case, it seems like map() should've known that its length was 3. I also understand that I can just call list() on the whole thing and get a list, but the nice thing about map() is that it doesn't copy data, so it's unfortunate to lose that advantage for no particular reason. > > My proposal is to delegate map.__len__() to the underlying iterable. Similarly, map.__getitem__() could be implemented if the underlying iterable supports item access: I mostly agree with the existing objections, though I have often found myself wanting this too, especially now that `map` does not simply return a list. This problem alone (along with the same problem for filter) has had a ridiculously outsized impact on the Python 3 porting effort for SageMath, and I find it really irritating at times. As a simple counter-proposal which I believe has fewer issues, I would really like it if the built-in `map()` and `filter()` at least provided a Python-level attribute to access the underlying iterables. This is necessary because if I have a function that used to take, say, a list as an argument, and it receives a `map` object, I now have to be able to deal with map()s, and I may have checks I want to perform on the underlying iterables before, say, I try to iterate over the `map`. Exactly what those checks are and whether or not they're useful may be highly application-specific, which is why say a generic `map.__len__` is not workable. However, if I can at least inspect those iterables I can make my own choices on how to handle the map. Exposing the underlying iterables to Python also has dangers in that I could directly call `next()` on them and possibly create some confusion, but consenting adults and all that... From jfine2358 at gmail.com Wed Nov 28 09:45:36 2018 From: jfine2358 at gmail.com (Jonathan Fine) Date: Wed, 28 Nov 2018 14:45:36 +0000 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Wed, Nov 28, 2018 at 2:28 PM E. Madison Bray wrote: > I mostly agree with the existing objections, though I have often found > myself wanting this too, especially now that `map` does not simply > return a list. 
This problem alone (along with the same problem for > filter) has had a ridiculously outsized impact on the Python 3 porting > effort for SageMath, and I find it really irritating at times. I'm a mathematician, so understand your concerns. Here's what I hope is a helpful suggestion. Create a module, say sage.itertools that contains (not tested) def py2map(iterable): return list(map(iterable)) The porting to Python 3 (for map) is now reduced to writing from .itertools import py2map as map at the head of each module. Please let me know if this helps. -- Jonathan From rosuav at gmail.com Wed Nov 28 09:54:18 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 29 Nov 2018 01:54:18 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Thu, Nov 29, 2018 at 1:46 AM Jonathan Fine wrote: > > On Wed, Nov 28, 2018 at 2:28 PM E. Madison Bray wrote: > > > I mostly agree with the existing objections, though I have often found > > myself wanting this too, especially now that `map` does not simply > > return a list. This problem alone (along with the same problem for > > filter) has had a ridiculously outsized impact on the Python 3 porting > > effort for SageMath, and I find it really irritating at times. > > I'm a mathematician, so understand your concerns. Here's what I hope > is a helpful suggestion. > > Create a module, say sage.itertools that contains (not tested) > > def py2map(iterable): > return list(map(iterable)) With the nitpick that the arguments should be (func, *iterables) rather than just the single iterable, yes, this is a viable transition strategy. In fact, it's very similar to what 2to3 would do, except that 2to3 would do it at the call site. If any Py3 porting process is being held up significantly by this, I would strongly recommend giving 2to3 an eyeball - run it on some of your code, then either accept its changes or just learn from the diffs. It's not perfect (nothing is), but it's a useful tool. ChrisA From steve at pearwood.info Wed Nov 28 10:03:49 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 29 Nov 2018 02:03:49 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: <20181128150348.GT4319@ando.pearwood.info> On Wed, Nov 28, 2018 at 03:27:25PM +0100, E. Madison Bray wrote: > I mostly agree with the existing objections, though I have often found > myself wanting this too, especially now that `map` does not simply > return a list. This problem alone (along with the same problem for > filter) has had a ridiculously outsized impact on the Python 3 porting > effort for SageMath, and I find it really irritating at times. *scratches head* I presume that SageMath worked fine with Python 2 map and filter? You can have them back again: # put this in a module called py2 _map = map def map(*args): return list(_map(*args)) And similarly for filter. The only annoying part is to import this new map at the start of every module that needs it, but while that's annoying, I wouldn't call it a "ridiculously outsized impact". Its one line at the top of each module. from py2 import map, filter What am I missing? > As a simple counter-proposal which I believe has fewer issues, I would > really like it if the built-in `map()` and `filter()` at least > provided a Python-level attribute to access the underlying iterables. 
> This is necessary because if I have a function that used to take, say, > a list as an argument, and it receives a `map` object, I now have to > be able to deal with map()s, and I may have checks I want to perform > on the underlying iterables before, say, I try to iterate over the > `map`. > > Exactly what those checks are and whether or not they're useful may be > highly application-specific, which is why say a generic `map.__len__` > is not workable. However, if I can at least inspect those iterables I > can make my own choices on how to handle the map. Can you give a concrete example of what you would do in practice? I'm having trouble thinking of how and when this sort of thing would be useful. Aside from extracting the length of the iterable(s), under what circumstances would you want to bypass the call to map() or filter() and access the iterables directly? > Exposing the underlying iterables to Python also has dangers in that I > could directly call `next()` on them and possibly create some > confusion, but consenting adults and all that... I don't think that's worse than what we can already do if you hold onto a reference to the underlying iterable: py> a = [1, 2, 3] py> it = map(lambda x: x+100, a) py> next(it) 101 py> a.insert(0, None) py> next(it) 101 -- Steve From erik.m.bray at gmail.com Wed Nov 28 10:04:33 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Wed, 28 Nov 2018 16:04:33 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Wed, Nov 28, 2018 at 3:54 PM Chris Angelico wrote: > > On Thu, Nov 29, 2018 at 1:46 AM Jonathan Fine wrote: > > > > On Wed, Nov 28, 2018 at 2:28 PM E. Madison Bray wrote: > > > > > I mostly agree with the existing objections, though I have often found > > > myself wanting this too, especially now that `map` does not simply > > > return a list. This problem alone (along with the same problem for > > > filter) has had a ridiculously outsized impact on the Python 3 porting > > > effort for SageMath, and I find it really irritating at times. > > > > I'm a mathematician, so understand your concerns. Here's what I hope > > is a helpful suggestion. > > > > Create a module, say sage.itertools that contains (not tested) > > > > def py2map(iterable): > > return list(map(iterable)) > > With the nitpick that the arguments should be (func, *iterables) > rather than just the single iterable, yes, this is a viable transition > strategy. In fact, it's very similar to what 2to3 would do, except > that 2to3 would do it at the call site. If any Py3 porting process is > being held up significantly by this, I would strongly recommend giving > 2to3 an eyeball - run it on some of your code, then either accept its > changes or just learn from the diffs. It's not perfect (nothing is), > but it's a useful tool. That effort is already mostly done and adding a helper function would not have worked as users *passing* map(...) as an argument to some function just expect it to work. The only alternative would have been replacing the builtin map with something else at the globals level. 2to3 is mostly useless since a major portion of Sage is written in Cython anyways. I just mentioned that porting effort for background. I still believe that the actual proposal of making the arguments to a map(...) call accessible from Python as attributes of the map object (ditto filter, zip, etc.) is useful in its own right, rather than just having this completely opaque iterator. 
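As a rough sketch of what I mean (untested; a pure-Python stand-in, and
the attribute names .func and .iters are hypothetical, not an existing
API):

    class introspectable_map:
        # Same iteration behaviour as the built-in map, but the
        # function and the underlying iterables stay visible.
        def __init__(self, func, *iters):
            self.func = func
            self.iters = iters
            self._it = map(func, *iters)  # delegate actual iteration
        def __iter__(self):
            return self
        def __next__(self):
            return next(self._it)

A consumer could then, for example, try len(m.iters[0]) and fall back
to generic iteration when that raises TypeError.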
From erik.m.bray at gmail.com Wed Nov 28 10:05:50 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Wed, 28 Nov 2018 16:05:50 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181128150348.GT4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128150348.GT4319@ando.pearwood.info> Message-ID: On Wed, Nov 28, 2018 at 4:04 PM Steven D'Aprano wrote: > > On Wed, Nov 28, 2018 at 03:27:25PM +0100, E. Madison Bray wrote: > > > I mostly agree with the existing objections, though I have often found > > myself wanting this too, especially now that `map` does not simply > > return a list. This problem alone (along with the same problem for > > filter) has had a ridiculously outsized impact on the Python 3 porting > > effort for SageMath, and I find it really irritating at times. > > *scratches head* > > I presume that SageMath worked fine with Python 2 map and filter? You > can have them back again: > > # put this in a module called py2 > _map = map > def map(*args): > return list(_map(*args)) > > > And similarly for filter. The only annoying part is to import this new > map at the start of every module that needs it, but while that's > annoying, I wouldn't call it a "ridiculously outsized impact". Its one > line at the top of each module. > > from py2 import map, filter > > > What am I missing? Large amounts of context; size of code base. From steve at pearwood.info Wed Nov 28 10:14:13 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 29 Nov 2018 02:14:13 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: <20181128151412.GU4319@ando.pearwood.info> On Wed, Nov 28, 2018 at 04:04:33PM +0100, E. Madison Bray wrote: > That effort is already mostly done and adding a helper function would > not have worked as users *passing* map(...) as an argument to some > function just expect it to work. Ah, that's what I was missing. But... surely the function will still work if they pass an opaque iterator *other* than map() and/or filter? it = (func(x) for x in something if condition(x)) some_sage_function(it) You surely don't expect to be able to peer inside every and any iterator that you are given? So if you have to handle the opaque iterator case anyway, how is it *worse* when the user passes map() or filter() instead of a generator like the above? > I just mentioned that porting effort for background. I still believe > that the actual proposal of making the arguments to a map(...) call > accessible from Python as attributes of the map object (ditto filter, > zip, etc.) is useful in its own right, rather than just having this > completely opaque iterator. Perhaps... I *want* to agree with this, but I'm having trouble thinking of when and how it would be useful. Some concrete examples would help justify it. -- Steve From erik.m.bray at gmail.com Wed Nov 28 10:14:24 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Wed, 28 Nov 2018 16:14:24 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181128150348.GT4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128150348.GT4319@ando.pearwood.info> Message-ID: On Wed, Nov 28, 2018 at 4:04 PM Steven D'Aprano wrote: > > On Wed, Nov 28, 2018 at 03:27:25PM +0100, E. 
Madison Bray wrote:
> > I mostly agree with the existing objections, though I have often found
> > myself wanting this too, especially now that `map` does not simply
> > return a list. This problem alone (along with the same problem for
> > filter) has had a ridiculously outsized impact on the Python 3 porting
> > effort for SageMath, and I find it really irritating at times.
> >
> > As a simple counter-proposal which I believe has fewer issues, I would
> > really like it if the built-in `map()` and `filter()` at least
> > provided a Python-level attribute to access the underlying iterables.
> >
> > This is necessary because if I have a function that used to take, say,
> > a list as an argument, and it receives a `map` object, I now have to
> > be able to deal with map()s, and I may have checks I want to perform
> > on the underlying iterables before, say, I try to iterate over the
> > `map`.
> >
> > Exactly what those checks are and whether or not they're useful may be
> > highly application-specific, which is why say a generic `map.__len__`
> > is not workable. However, if I can at least inspect those iterables I
> > can make my own choices on how to handle the map.
>
> Can you give a concrete example of what you would do in practice? I'm
> having trouble thinking of how and when this sort of thing would be
> useful. Aside from extracting the length of the iterable(s), under what
> circumstances would you want to bypass the call to map() or filter() and
> access the iterables directly?

For example, some function that used to expect some finite-sized
sequence such as a list or tuple is now passed a "map", possibly
wrapping one or more iterables of arbitrary, possibly non-finite size.
For the purposes of some algorithm I have this is not useful and I
need to convert it to a sequence anyways but don't want to do that
without some guarantee that I won't blow up the user's memory usage.
So I might want to check:

    finite_definite = True
    for it in my_map.iters:
        try:
            len(it)
        except TypeError:
            finite_definite = False

    if finite_definite:
        my_seq = list(my_map)
    else:
        # some other algorithm

Of course, some arbitrary object could lie about its __len__ but I'm
not concerned about pathological cases here. There may be other
opportunities for optimization as well that are otherwise hidden.
Either way, I don't see any reason to hide this data; it's a couple of
slot attributes and instantly better introspection capability.

From erik.m.bray at gmail.com Wed Nov 28 10:18:35 2018
From: erik.m.bray at gmail.com (E. Madison Bray)
Date: Wed, 28 Nov 2018 16:18:35 +0100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <20181128151412.GU4319@ando.pearwood.info>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181128151412.GU4319@ando.pearwood.info>
Message-ID: 

On Wed, Nov 28, 2018 at 4:14 PM Steven D'Aprano wrote:
>
> On Wed, Nov 28, 2018 at 04:04:33PM +0100, E. Madison Bray wrote:
>
> > That effort is already mostly done and adding a helper function would
> > not have worked as users *passing* map(...) as an argument to some
> > function just expect it to work.
>
> Ah, that's what I was missing.
>
> But... surely the function will still work if they pass an opaque
> iterator *other* than map() and/or filter?
>
> it = (func(x) for x in something if condition(x))
> some_sage_function(it)

That one is admittedly tricky. For that matter it might be nice to
have more introspection of generator expressions too, but there at
least we have .gi_code if nothing else.
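(For what it's worth, under CPython -- an implementation detail, not a
language guarantee -- the outermost iterable of a genexp is even
reachable through the frame local named '.0':

    gen = (x * x for x in [1, 2, 3])
    underlying = gen.gi_frame.f_locals['.0']  # a list_iterator here

but that is far too fragile to rely on in real code.)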
But those are a far less common example in my case, whereas map() is *everywhere* in math code :) From rosuav at gmail.com Wed Nov 28 10:24:05 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 29 Nov 2018 02:24:05 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128151412.GU4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 2:19 AM E. Madison Bray wrote: > > On Wed, Nov 28, 2018 at 4:14 PM Steven D'Aprano wrote: > > > > On Wed, Nov 28, 2018 at 04:04:33PM +0100, E. Madison Bray wrote: > > > > > That effort is already mostly done and adding a helper function would > > > not have worked as users *passing* map(...) as an argument to some > > > function just expect it to work. > > > > Ah, that's what I was missing. > > > > But... surely the function will still work if they pass an opaque > > iterator *other* than map() and/or filter? > > > > it = (func(x) for x in something if condition(x)) > > some_sage_function(it) > > That one is admittedly tricky. For that matter it might be nice to > have more introspection of generator expressions too, but there at > least we have .gi_code if nothing else. Considering that a genexp can do literally anything, I doubt you'll get anywhere with that introspection. > But those are a far less common example in my case, whereas map() is > *everywhere* in math code :) Perhaps then, the problem is that math code treats "map" as something that is more akin to "instrumented list" than it is to a generator. If you know for certain that you're mapping a low-cost pure function over an immutable collection, the best solution may be to proxy through to the original list than to generate values on the fly. And if that's the case, you don't want the Py3 map *or* the Py2 one, although the Py2 one can behave this way, at the cost of crazy amounts of efficiency. ChrisA From erik.m.bray at gmail.com Wed Nov 28 10:31:35 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Wed, 28 Nov 2018 16:31:35 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128151412.GU4319@ando.pearwood.info> Message-ID: On Wed, Nov 28, 2018 at 4:24 PM Chris Angelico wrote: > > On Thu, Nov 29, 2018 at 2:19 AM E. Madison Bray wrote: > > > > On Wed, Nov 28, 2018 at 4:14 PM Steven D'Aprano wrote: > > > > > > On Wed, Nov 28, 2018 at 04:04:33PM +0100, E. Madison Bray wrote: > > > > > > > That effort is already mostly done and adding a helper function would > > > > not have worked as users *passing* map(...) as an argument to some > > > > function just expect it to work. > > > > > > Ah, that's what I was missing. > > > > > > But... surely the function will still work if they pass an opaque > > > iterator *other* than map() and/or filter? > > > > > > it = (func(x) for x in something if condition(x)) > > > some_sage_function(it) > > > > That one is admittedly tricky. For that matter it might be nice to > > have more introspection of generator expressions too, but there at > > least we have .gi_code if nothing else. > > Considering that a genexp can do literally anything, I doubt you'll > get anywhere with that introspection. > > > But those are a far less common example in my case, whereas map() is > > *everywhere* in math code :) > > Perhaps then, the problem is that math code treats "map" as something > that is more akin to "instrumented list" than it is to a generator. 
If you know for certain that you're mapping a low-cost pure function
over an immutable collection, the best solution may be to proxy through
to the original list than to generate values on the fly. And if that's
the case, you don't want the Py3 map *or* the Py2 one, although the
Py2 one can behave this way, at the cost of crazy amounts of
efficiency.

Yep, that's a great example where it might be possible to introspect a
given `map` object and take it apart to do something more efficient
with it. This is less of a problem with internal code where it's easy
to just not use map() at all, and that is often the case. But a lot
of the people who develop code for Sage are mathematicians, not
engineers, and they may not be aware of this, so they write code that
passes `map()` objects to more internal machinery. And users will do
the same even moreso.

I can (and have) written horrible C-level hacks--not for this specific
issue, but others like it--and am sometimes tempted to do the same
here :(

From steve at pearwood.info Wed Nov 28 10:33:06 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 29 Nov 2018 02:33:06 +1100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181128150348.GT4319@ando.pearwood.info>
Message-ID: <20181128153305.GV4319@ando.pearwood.info>

On Wed, Nov 28, 2018 at 04:14:24PM +0100, E. Madison Bray wrote:

> For example, some function that used to expect some finite-sized
> sequence such as a list or tuple is now passed a "map", possibly
> wrapping one or more iterable of arbitrary, possibly non-finite size.
> For the purposes of some algorithm I have this is not useful and I
> need to convert it to a sequence anyways but don't want to do that
> without some guarantee that I won't blow up the user's memory usage.
> So I might want to check:
>
>     finite_definite = True
>     for it in my_map.iters:
>         try:
>             len(it)
>         except TypeError:
>             finite_definite = False
>
>     if finite_definite:
>         my_seq = list(my_map)
>     else:
>         # some other algorithm

But surely you didn't need to do this just because of *map*. Users could
have passed an infinite, unsized iterable going back to Python 1 days
with the old sequence protocol. They certainly could pass a generator or
other opaque iterator apart from map. So I'm having trouble seeing why
the Python 2/3 change to map made things worse for SageMath.

But in any case, this example comes back to the question of len again,
and we've already covered why this is problematic. In case you missed
it, let's take a toy example which demonstrates the problem:

    def mean(it):
        if isinstance(it, map):
            # Hypothetical attribute access to the underlying iterable.
            n = len(it.iterable)
            return sum(it)/n

Now let's pass a map object to it:

    data = [1, 2, 3, 4, 5]
    it = map(lambda x: x, data)
    assert len(it.iterable) == 5
    next(it); next(it); next(it)

    assert mean(it) == 4.5
    # fails, as it actually returns 9/5 instead of 9/2

-- 
Steve

From marcos.eliziario at gmail.com Wed Nov 28 10:41:07 2018
From: marcos.eliziario at gmail.com (Marcos Eliziario)
Date: Wed, 28 Nov 2018 13:41:07 -0200
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: <20181128151810.7c2393c2@fsol>
References: <20181128151810.7c2393c2@fsol>
Message-ID: 

> In the end, you have to be rigorous when writing tests, and for most
> non-trivial functions it requires that you devise the distribution of
> input values depending on the implemented algorithm, not leave that
> distribution to a third-party library that knows nothing about your
> program.

Indeed. But the great thing about the "hypothesis" tool is that it
allows me to somewhat automate the generation of sets of input values
based on my specific requirements derived from my knowledge of my
program. It allows me to think about what is the reasonable
distribution of values for each argument in a function by either using
existing strategies, using their arguments, combining and extending
them, and then letting the tool do the grunt work of running the test
for lots of different equivalence classes of argument values.

I think that as long as the tool user keeps what you said in mind and
uses the tool accordingly it can be a great helper, and probably even
force the average programmer to think more rigorously about the input
values to be tested, not to mention the whole class of trivial mistakes
and forgetfulness we are all bound to be subject to when writing test
cases.

Best,

On Wed, Nov 28, 2018 at 12:18 PM Antoine Pitrou wrote:

> On Tue, 27 Nov 2018 22:47:06 -0600
> Abe Dillon wrote:
> >
> > If we could figure out a cleaner syntax for defining invariants,
> > preconditions, and postconditions we'd be half-way to automated testing
> > UTOPIA! (ok, maybe I'm being a little over-zealous)
>
> I think utopia is the word here. Fuzz testing can be useful, but it's
> not a replacement for manual testing of carefully selected values.
>
> Also, the idea that fuzz testing will automatically find edge cases in
> your code is idealistic. It depends on the algorithm you've
> implemented and the distribution of values chosen by the tester.
> Showcasing trivially wrong examples (such as an addition function that
> always returns 0, or a tail function that doesn't return the tail)
> isn't very helpful for a real-world analysis, IMHO.
>
> In the end, you have to be rigorous when writing tests, and for most
> non-trivial functions it requires that you devise the distribution of
> input values depending on the implemented algorithm, not leave that
> distribution to a third-party library that knows nothing about your
> program.
>
> Regards
>
> Antoine.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Marcos Elizário Santos
mobile/whatsapp/telegram: +55(21) 9-8027-0156
skype: marcos.eliziario at gmail.com
linked-in: https://www.linkedin.com/in/eliziario/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jfine2358 at gmail.com Wed Nov 28 10:52:17 2018
From: jfine2358 at gmail.com (Jonathan Fine)
Date: Wed, 28 Nov 2018 15:52:17 +0000
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <20181128153305.GV4319@ando.pearwood.info>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181128150348.GT4319@ando.pearwood.info>
	<20181128153305.GV4319@ando.pearwood.info>
Message-ID: 

Suppose itr_1 is an iterator. Now consider

    itr_2 = map(lambda x: x, itr_1)
    itr_3 = itr_1

We now have itr_1, itr_2 and itr_3. They are all, effectively, the
same iterator (unless we do an 'x is y' comparison).

I conclude that this suggestion amounts to having a __len__ for ANY
iterator, and not just a map. In other words, this suggestion has
broader scope and consequences than were presented in the original
post.

-- 
Jonathan

From erik.m.bray at gmail.com Wed Nov 28 11:04:26 2018
From: erik.m.bray at gmail.com (E. Madison Bray)
Date: Wed, 28 Nov 2018 17:04:26 +0100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181128151412.GU4319@ando.pearwood.info>
Message-ID: 

One thing I'd like to add real quick to this (I'm on my phone so
apologies for crappy quoting):

Although there are existing cases where there is a loss of efficiency
over Python 2 map() when dealing with the opaque, iterable Python 3
map(), the latter also presents many opportunities for enhancements
that weren't possible before.

For example, previously a user might pass map(func, some_list) where
func is some pure function and the iterable is almost always a list of
some kind. Previously that map() call would be evaluated (often slowly)
first. But now we can treat a map as something a little more formal, as
a container for a function and one or more iterables, which happens to
have this special functionality when you iterate over it, but is
otherwise just a special container. This is technically already the
case, we just can't directly access it as a container.

If we could, it would be possible to implement various optimizations
that might not otherwise have been obvious to the user. This is
especially the case if the iterable is a simple list, which is
something we can check. The function in this case very likely might
actually be a C function that was wrapped with Cython. I can easily
convert this on the user's behalf to a simple C loop or possibly even
some other more optimal vectorized code.

These are application-specific special cases of course, but many such
cases become easily accessible if map() and friends are usable as
specialized containers.

On Wed, Nov 28, 2018, 16:31 E. Madison Bray
For that matter it might be nice to
> > > have more introspection of generator expressions too, but there at
> > > least we have .gi_code if nothing else.
> >
> > Considering that a genexp can do literally anything, I doubt you'll
> > get anywhere with that introspection.
> >
> > > But those are a far less common example in my case, whereas map() is
> > > *everywhere* in math code :)
> >
> > Perhaps then, the problem is that math code treats "map" as something
> > that is more akin to "instrumented list" than it is to a generator. If
> > you know for certain that you're mapping a low-cost pure function over
> > an immutable collection, the best solution may be to proxy through to
> > the original list than to generate values on the fly. And if that's
> > the case, you don't want the Py3 map *or* the Py2 one, although the
> > Py2 one can behave this way, at the cost of crazy amounts of
> > efficiency.
>
> Yep, that's a great example where it might be possible to introspect a
> given `map` object and take it apart to do something more efficient
> with it. This is less of a problem with internal code where it's easy
> to just not use map() at all, and that is often the case. But a lot
> of the people who develop code for Sage are mathematicians, not
> engineers, and they may not be aware of this, so they write code that
> passes `map()` objects to more internal machinery. And users will do
> the same even moreso.
>
> I can (and have) written horrible C-level hacks--not for this specific
> issue, but others like it--and am sometimes tempted to do the same
> here :(
> -------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From erik.m.bray at gmail.com Wed Nov 28 11:16:30 2018
From: erik.m.bray at gmail.com (E. Madison Bray)
Date: Wed, 28 Nov 2018 17:16:30 +0100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <20181128153305.GV4319@ando.pearwood.info>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
	<20181128150348.GT4319@ando.pearwood.info>
	<20181128153305.GV4319@ando.pearwood.info>
Message-ID: 

Probably the most prevalent reason it made things *worse* is that many
functions that can take collections as arguments--in fact probably
most--were never written to accept arbitrary iterables in the first
place. Perhaps they should have been, but the majority of that was
before my time, so I and others who worked on the Python 3 port were
stuck with that.

Sure, the fix is simple enough: check if the object is iterable (itself
not always as simple as one might assume) and then call list() on it.
But we're talking thousands upon thousands of functions that need to be
updated where examples involving map previously would have just worked.

But on top of the obvious workarounds I would now like to do things
like protect users, where possible, from doing things like passing
arbitrarily sized data to relatively flimsy C libraries, or as I
mentioned in my last message make new optimizations that weren't
possible before.

Of course this isn't always possible when dealing with an arbitrary
opaque iterator, or in some pathological cases. But I'm concerned more
about doing the best we can in the most common cases (lists, tuples,
vectors, etc.), which are *vastly* more common.

I use SageMath as an example but I'm sure others could come up with
their own clever use cases.
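For instance, assuming the hypothetical .iters attribute discussed
earlier in this thread (again, not an existing API), the guard I
sketched before collapses into a small reusable helper (untested):

    def safe_list(m, limit=10**7):
        # Materialize a map only when every input iterable reports a
        # length and the result is guaranteed to stay within `limit`.
        try:
            n = min(len(it) for it in m.iters)
        except TypeError:
            raise ValueError("refusing to materialize: unsized input")
        if n > limit:
            raise ValueError("refusing to materialize: result too large")
        return list(m)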
I know there are other cases where I've wanted to at least try to get the len of a map, at least in cases where it was unambiguous (for example making a progress meter or something).

On Wed, Nov 28, 2018, 16:33 Steven D'Aprano wrote:

> On Wed, Nov 28, 2018 at 04:14:24PM +0100, E. Madison Bray wrote:
>
> > For example, some function that used to expect some finite-sized
> > sequence such as a list or tuple is now passed a "map", possibly
> > wrapping one or more iterable of arbitrary, possibly non-finite size.
> > For the purposes of some algorithm I have this is not useful and I
> > need to convert it to a sequence anyways but don't want to do that
> > without some guarantee that I won't blow up the user's memory usage.
> > So I might want to check:
> >
> >     finite_definite = True
> >     for it in my_map.iters:
> >         try:
> >             len(it)
> >         except TypeError:
> >             finite_definite = False
> >
> >     if finite_definite:
> >         my_seq = list(my_map)
> >     else:
> >         # some other algorithm
>
> But surely you didn't need to do this just because of *map*. Users could have passed an infinite, unsized iterable going back to Python 1 days with the old sequence protocol. They certainly could pass a generator or other opaque iterator apart from map. So I'm having trouble seeing why the Python 2/3 change to map made things worse for SageMath.
>
> But in any case, this example comes back to the question of len again, and we've already covered why this is problematic. In case you missed it, let's take a toy example which demonstrates the problem:
>
>     def mean(it):
>         if isinstance(it, map):
>             # Hypothetical attribute access to the underlying iterable.
>             n = len(it.iterable)
>             return sum(it)/n
>
> Now let's pass a map object to it:
>
>     data = [1, 2, 3, 4, 5]
>     it = map(lambda x: x, data)
>     assert len(it.iterable) == 5
>     next(it); next(it); next(it)
>
>     assert mean(it) == 4.5
>     # fails, as it actually returns 9/5 instead of 9/2
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From erik.m.bray at gmail.com  Wed Nov 28 11:25:03 2018
From: erik.m.bray at gmail.com (E. Madison Bray)
Date: Wed, 28 Nov 2018 17:25:03 +0100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128150348.GT4319@ando.pearwood.info> <20181128153305.GV4319@ando.pearwood.info>
Message-ID: 

I should add, I know the history here of bitterness surrounding Python 3 complaints and this is not one. I defend most things Python 3 and have ported many projects (Sage just being the largest by orders of magnitude, with every Python 3 porting quirk represented and often magnified). I agree with the new iterable map(), filter(), and zip() and welcomed that change. But I think making them more introspectable would be a useful enhancement.

On Wed, Nov 28, 2018, 17:16 E. Madison Bray wrote:

> Probably the most prevalent reason it made things *worse* is that many functions that can take collections as arguments--in fact probably most--were never written to accept arbitrary iterables in the first place. Perhaps they should have been, but the majority of that was before my time so I and others who worked on the Python 3 port were stuck with that.
> Sure the fix is simple enough: check if the object is iterable (itself not always as simple as one might assume) and then call list() on it. But we're talking thousands upon thousands of functions that need to be updated where examples involving map previously would have just worked.
>
> But on top of the obvious workarounds I would now like to do things like protect users, where possible, from doing things like passing arbitrarily sized data to relatively flimsy C libraries, or as I mentioned in my last message make new optimizations that weren't possible before.
>
> Of course this isn't always possible, such as when dealing with an arbitrary opaque iterator, or some pathological cases. But I'm concerned more about doing the best we can in the most common cases (lists, tuples, vectors, etc.) which are *vastly* more common.
>
> I use SageMath as an example but I'm sure others could come up with their own clever use cases. I know there are other cases where I've wanted to at least try to get the len of a map, at least in cases where it was unambiguous (for example making a progress meter or something).
>
> On Wed, Nov 28, 2018, 16:33 Steven D'Aprano wrote:
>
>> On Wed, Nov 28, 2018 at 04:14:24PM +0100, E. Madison Bray wrote:
>>
>> > For example, some function that used to expect some finite-sized
>> > sequence such as a list or tuple is now passed a "map", possibly
>> > wrapping one or more iterable of arbitrary, possibly non-finite size.
>> > For the purposes of some algorithm I have this is not useful and I
>> > need to convert it to a sequence anyways but don't want to do that
>> > without some guarantee that I won't blow up the user's memory usage.
>> > So I might want to check:
>> >
>> >     finite_definite = True
>> >     for it in my_map.iters:
>> >         try:
>> >             len(it)
>> >         except TypeError:
>> >             finite_definite = False
>> >
>> >     if finite_definite:
>> >         my_seq = list(my_map)
>> >     else:
>> >         # some other algorithm
>>
>> But surely you didn't need to do this just because of *map*. Users could have passed an infinite, unsized iterable going back to Python 1 days with the old sequence protocol. They certainly could pass a generator or other opaque iterator apart from map. So I'm having trouble seeing why the Python 2/3 change to map made things worse for SageMath.
>>
>> But in any case, this example comes back to the question of len again, and we've already covered why this is problematic. In case you missed it, let's take a toy example which demonstrates the problem:
>>
>>     def mean(it):
>>         if isinstance(it, map):
>>             # Hypothetical attribute access to the underlying iterable.
>>             n = len(it.iterable)
>>             return sum(it)/n
>>
>> Now let's pass a map object to it:
>>
>>     data = [1, 2, 3, 4, 5]
>>     it = map(lambda x: x, data)
>>     assert len(it.iterable) == 5
>>     next(it); next(it); next(it)
>>
>>     assert mean(it) == 4.5
>>     # fails, as it actually returns 9/5 instead of 9/2
>>
>> --
>> Steve
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jfine2358 at gmail.com  Wed Nov 28 11:28:49 2018
From: jfine2358 at gmail.com (Jonathan Fine)
Date: Wed, 28 Nov 2018 16:28:49 +0000
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128150348.GT4319@ando.pearwood.info> <20181128153305.GV4319@ando.pearwood.info>
Message-ID: 

Hi Madison

Is there a URL somewhere where I can view code written to port sage to Python3? I've already found https://trac.sagemath.org/search?q=python3

And because I'm a bit interested in cluster algebra, I went to https://git.sagemath.org/sage.git/commit/?id=3a6f494ac1d4dbc1e22b0ecbebdbc639f6c7f6d3

Is this a good example of the change required? Are there other examples worth looking at?

-- Jonathan

From boxed at killingar.net  Wed Nov 28 11:37:39 2018
From: boxed at killingar.net (Anders Hovmöller)
Date: Wed, 28 Nov 2018 17:37:39 +0100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: 

> I just mentioned that porting effort for background. I still believe
> that the actual proposal of making the arguments to a map(...) call
> accessible from Python as attributes of the map object (ditto filter,
> zip, etc.) is useful in its own right, rather than just having this
> completely opaque iterator.

+1. Throwing away information is almost always a bad idea. That was fixed with classes and kwargs in 3.6, which removes a lot of fiddly workarounds for example. Throwing away data needlessly is also why 2to3, baron, Parso and probably many more had to reimplement a Python parser instead of using the built in.

We should have information preservation and transparency be general design goals imo. Not because we can see the obvious use now but because it keeps the door open to discover uses later.

/ Anders

From tjreedy at udel.edu  Wed Nov 28 14:53:50 2018
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 28 Nov 2018 14:53:50 -0500
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: 

On 11/28/2018 9:27 AM, E. Madison Bray wrote:
> On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert wrote:
>>
>> I just ran into the following behavior, and found it surprising:
>>
>>     >>> len(map(float, [1,2,3]))
>>     TypeError: object of type 'map' has no len()
>>
>> I understand that map() could be given an infinite sequence and therefore might not always have a length. But in this case, it seems like map() should've known that its length was 3. I also understand that I can just call list() on the whole thing and get a list, but the nice thing about map() is that it doesn't copy data, so it's unfortunate to lose that advantage for no particular reason.
>>
>> My proposal is to delegate map.__len__() to the underlying iterable.

One of the guidelines in the Zen of Python is "Special cases aren't special enough to break the rules."

This proposal claims that the Python 3 built-in iterator class 'map' is so special that it should break the rule that iterators in general cannot and therefore do not have .__len__ methods because their size may be infinite, unknowable until exhaustion, or declining with each .__next__ call.

For iterators, 3.4 added an optional __length_hint__ method. This makes sense for iterators, like tuple_iterator, list_iterator, range_iterator, and dict_keyiterator, based on a known finite collection.
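A rough, illustrative sketch of what one such length-hinting subclass might look like (nothing like this exists in the stdlib, and the hint is computed from the original inputs, so it can overestimate once the map has been partly consumed):

    import operator

    class sized_map(map):
        """Illustrative only: a map subclass that reports a length hint."""
        def __new__(cls, func, *iterables):
            self = super().__new__(cls, func, *iterables)
            self._sources = iterables  # keep the original inputs around
            return self
        def __length_hint__(self):
            # 0 is the conventional "don't know"; map() stops at its shortest input.
            hints = [operator.length_hint(it, 0) for it in self._sources]
            return min(hints) if all(hints) else 0

    # operator.length_hint(sized_map(float, [1, 2, 3])) -> 3, but only as a hint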
At the time, map.__length_hint__ was proposed and rejected as problematic, for obvious reasons, and insufficiently useful.

The proposal above amounts to adding an unspecified __length_hint__ misnamed as __len__. Won't happen. Instead, proponents should define and test one or more specific implementations of __length_hint__ in map subclass(es).

> I mostly agree with the existing objections, though I have often found
> myself wanting this too, especially now that `map` does not simply
> return a list.

What makes the map class special among all built-in iterator classes? It appears not to be a property of the class itself, as an iterator class, but of its name. In Python 2, 'map' was bound to a different implementation of the map idea, a function that produced a list, which has a length. I suspect that if Python 3 were the original Python, we would not have this discussion.

> As a simple counter-proposal which I believe has fewer issues, I would
> really like it if the built-in `map()` and `filter()` at least
> provided a Python-level attribute to access the underlying iterables.

This proposes to make map (and filter) special in a different way, by adding other special (dunder) attributes. In general, built-in callables do not attach their args to their output, for obvious reasons. If they do, they do not expose them. If input data must be saved, the details are implementation dependent. A C-coded callable would not necessarily save information in the form of Python objects.

Again, it seems to me that the only thing special about these two, versus the other iterators left in itertools, is the history of the names.

> This is necessary because if I have a function that used to take, say,
> a list as an argument, and it receives a `map` object, I now have to
> be able to deal with map()s,

If a function is documented as requiring a list, or a sequence, or a length object, it is a user bug to pass an iterator. The only thing special about map and filter as errors is the rebinding of the names between Py2 and Py3, so that the same code may be good in 2.x and bad in 3.x.

Perhaps 2.7, in addition to future imports of text as unicode and print as a function, should have had one to make map and filter be the 3.x iterators.

Perhaps Sage needs something like

    def size_map(func, *iterables):
        for it in iterables:
            if not hasattr(it, '__len__'):
                raise TypeError(f'iterable {repr(it)} has no size')
        return map(func, *iterables)

https://docs.python.org/3/library/functions.html#map says "map(function, iterable, ...) Return an iterator [...]" The wording is intentional. The fact that map is a class and the iterator an instance of the class is a CPython implementation detail. Another implementation could use the generator function equivalent given in the Python 2 itertools doc, or a translation thereof. I don't know what pypy and other implementations do. The fact that CPython itertools callables are (now) C-coded classes instead of Python-coded generator functions, or C translations thereof (which is tricky) is for performance and ease of maintenance.

-- Terry Jan Reedy

From abedillon at gmail.com  Wed Nov 28 15:28:16 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Wed, 28 Nov 2018 14:28:16 -0600
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: 
Message-ID: 

[Marko Ristin-Kaufmann]
> Have you looked at the recent discussions regarding design-by-contract on this list

I tried to read through them all before posting, but I may have missed some of the forks.
There was a lot of good discussion!

[Marko Ristin-Kaufmann]
> You might want to have a look at static checking techniques such as abstract interpretation. I hope to be able to work on such a tool for Python in some two years from now. We can stay in touch if you are interested.

I'll look into that! I'm very interested!

[Marko Ristin-Kaufmann]
> Re decorators: to my own surprise, using decorators in a larger code base is completely practical including the readability and maintenance of the code. It's neither that ugly nor problematic as it might seem at first look.

Interesting. In the thread you linked on DBC, it seemed like Steve D'Aprano and David Mertz (and possibly others) were put off by the verbosity and noisiness of the decorator-based solution you provided with icontract (though I think there are ways to streamline that solution). It seems like syntactic support could offer a more concise and less noisy implementation.

One thing that I can get on a soap-box about is the benefit of putting the most relevant information to the reader in the order of top to bottom and left to right whenever possible. I've written many posts about this. I think a lot of Python syntax gets this right. It would have been easy to follow the same order as for-loops when designing comprehensions, but expressions allow you some freedom to order things differently, so now comprehensions read:

    squares = ...                                 # squares is
    squares = [...                                # squares is a list
    squares = [number*number...                   # squares is a list of num squared
    squares = [number*number for num in numbers]  # squares is a list of num squared 'from' numbers

I think decorators sort-of break this rule because they can put a lot of less important information (like, that a function is logged or timed) before more important information (like the function's name, signature, doc-string, etc...). It's not a huge deal because they tend to be de-emphasized by my IDE and there typically aren't dozens of them on each function, but I definitely prefer Eiffel's syntax over decorators for that reason.

I understand that syntax changes have a very high bar for very good reasons. Hillel Wayne's PyCon talk got me thinking that we might be close enough to a really great solution to a wide variety of testing problems that it might justify some new syntax or perhaps someone has an idea that wouldn't require new syntax that I didn't think of.

[Marko Ristin-Kaufmann]
> Some of the aspects we still haven't figured out are: how to approach multi-threading (locking around the whole function with an additional decorator?) and granularity of contract switches (right now we use always/optimized, production/non-optimized and testing/slow, but it seems that a larger system requires finer categories).

Yeah... I don't know anything about testing concurrent or parallel code.

On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann <marko.ristin at gmail.com> wrote:

> Hi Abe,
>
> > I've been pulling a lot of ideas from the recent discussion on design by
> > contract (DBC), the elegance and drawbacks of doctests, and the amazing talk
> > given by Hillel Wayne at this year's PyCon entitled "Beyond Unit Tests: Taking
> > your Tests to the Next Level".
>
> Have you looked at the recent discussions regarding design-by-contract on this list (https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU and the following forked threads)?
>
> You might want to have a look at static checking techniques such as
> abstract interpretation.
> I hope to be able to work on such a tool for Python in some two years from now. We can stay in touch if you are interested.
>
> Re decorators: to my own surprise, using decorators in a larger code base is completely practical including the readability and maintenance of the code. It's neither that ugly nor problematic as it might seem at first look.
>
> We use our https://github.com/Parquery/icontract at the company. Most of the design choices come from practical issues we faced -- so you might want to read the doc even if you don't plan to use the library.
>
> Some of the aspects we still haven't figured out are: how to approach multi-threading (locking around the whole function with an additional decorator?) and granularity of contract switches (right now we use always/optimized, production/non-optimized and testing/slow, but it seems that a larger system requires finer categories).
>
> Cheers Marko

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From greg.ewing at canterbury.ac.nz  Wed Nov 28 15:41:41 2018
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 29 Nov 2018 09:41:41 +1300
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: <5BFEFD85.3060700@canterbury.ac.nz>

E. Madison Bray wrote:
> if I have a function that used to take, say,
> a list as an argument, and it receives a `map` object, I now have to
> be able to deal with map()s, and I may have checks I want to perform
> on the underlying iterables before, say, I try to iterate over the
> `map`.

This sounds like a backwards way to address the issue. If you have a function that expects a list in particular, it's up to its callers to make sure they give it one. Instead of making the function do a bunch of looking before it leaps, it would be better to define something like

    def lmap(f, *args):
        return list(map(f, *args))

and then replace 'map' with 'lmap' elsewhere in your code.

-- Greg

From abedillon at gmail.com  Wed Nov 28 16:31:54 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Wed, 28 Nov 2018 15:31:54 -0600
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: <20181128094443.GS4319@ando.pearwood.info>
References: <20181128094443.GS4319@ando.pearwood.info>
Message-ID: 

[Steven D'Aprano]
> You should look at the state of the art in Design By Contract. In Eiffel, DBC is integrated in the language:
> https://www.eiffel.com/values/design-by-contract/introduction/
>
> https://www.eiffel.org/doc/eiffel/ET-_Design_by_Contract_%28tm%29%2C_Assertions_and_Exceptions
>
> Eiffel uses a rather Pythonic block structure to define invariants. The syntax is not identical to Python's (Eiffel eschews the colons) but it also comes close to executable pseudo-code.

Thank you! I forgot to mention this (or look into how other languages solve this problem). I saw your example syntax in the recent DBC main thread and liked it a lot.

One thought I keep coming back to is this comparison between docstring formats. It seems obvious that the "Sphynxy" style is the noisiest, most verbose, and ugliest format. Instead of putting ":arg ...:" and ":type ...:" for each parameter and the return value, it makes much more sense to open up an Args: section and use a concise notation for type.
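For concreteness, here is the same (hypothetical) function documented in both of those real, widely used conventions:

    def format_exception_sphinx(etype, value):
        """Format the exception with a traceback.

        :param etype: what etype represents
        :type etype: str
        :param value: what value represents
        :type value: int
        :return: what the return value represents
        :rtype: str
        """

    def format_exception_args(etype, value):
        """Format the exception with a traceback.

        Args:
            etype (str): what etype represents
            value (int): what value represents

        Returns:
            str: what the return value represents
        """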
The decorator-based pre and post conditions seem like they suffer from the same redundant, noisy, verbosity problem as the Sphynxy docstring format, but make it worse by putting all that noise before the function declaration itself. It makes sense to me that a docstring might have a markdown-style syntax like

    def format_exception(etype, value):
        """
        Format the exception with a traceback.

        Args:
            etype (str): what etype represents
                [some constraint on etype](precondition)
                [another constraint on etype](in_line_precondition?)
            value (int): what value represents
                [some constraint on value](precondition)

        [some constraints across multiple params](precondition)

        Returns:
            What the return value represents  # usually very similar to the description at the top
            [some constraint on return](postcondition)
        """
        ...

That ties most bits of the documentation to some code that enforces the correctness of the documentation. And if it's a little noisy, we could take another page from markdown's book and offer alternate ways to reference precondition and postcondition logic. I'm worried that such a style would carry a lot of the same drawbacks as doctests.

Also, my sense of coding style has been heavily influenced by [this talk](https://vimeo.com/74316116), particularly the part where he shoves a mangled Hamlet Soliloquy into the margins, so now many of my functions adopt the following style:

    def someDescriptiveName(
            arg1: SomeType,
            arg2: AnotherType[Thing],
            ...
            argN: SomeOtherType = default_value) -> ReturnType:
        """ what the function does

        Args:
            arg1: what arg1 represents
            arg2: what arg2 represents
            ...
        """
        ...

This highlights a rather obvious duplication of code. We declare an arguments section in code and list all the arguments, then we do so again in the doc string. If you want your doc string to stay in sync with the code, this duplication is a problem. It makes more sense to tie the documentation for an argument to said argument:

    def someDescriptiveName(  # what the function does
            arg1: SomeType,  # what arg1 represents
            arg2: AnotherType[Thing],  # what arg2 represents
            ...
            argN: SomeOtherType = default_value  # what argN represents
            ) -> ReturnType:  # what the return value represents
        ...

I think it especially makes sense if you consider the preconditions, postconditions, and invariants as a sort-of extension of typing, in the sense that typing narrows the set of acceptable values to a set of types and contracts restrict that set further.

I hope that clarifies my thought process. I don't like the d-strings that I proposed. I'd prefer syntax closer to Eiffel, but the above is the line of thought I was following to arrive at d-strings.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abedillon at gmail.com  Wed Nov 28 16:58:24 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Wed, 28 Nov 2018 15:58:24 -0600
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: <20181128151810.7c2393c2@fsol>
References: <20181128151810.7c2393c2@fsol>
Message-ID: 

[Antoine Pitrou]
> I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values.

First, they aren't mutually exclusive. It's trivial to add manually selected cases to a hypothesis test. Second, from my experience, people rarely choose between carefully selected optimal values and fuzz testing; they usually choose between manually selected trivial values and no test at all.
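To ground that first point, a minimal sketch using hypothesis (given, example, and strategies are the library's actual API; the property under test here is just integer addition):

    from hypothesis import example, given, strategies as st

    @given(x=st.integers(), y=st.integers())
    @example(x=0, y=0)           # manually selected trivial case, always run
    @example(x=-1, y=10**100)    # another hand-picked edge case
    def test_addition_commutes(x, y):
        # generated values and the pinned examples all run through here
        assert x + y == y + x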
Thirdly, Computers are very good at exhaustively searching multidimensional spaces. If your tool sucks so bad at that that a human can do it better, then your tool needs work. Improving the tool saves way more time than reverting to manual testing.

There was a post long ago (I think I read it on Digg.com, which gives some indication of how long ago) about how to run a cloud-based system correctly. One of the controversial practices the article advocated was disabling ssh on the machine instances. The rationale is that you never want to waste your time fiddling with an instance that's not behaving properly. In cloud-systems, instances should not be special. If they fail, blow them away and bring up another. If the failure persists, it's a problem with the *system*, not the instance. If you care about individual instances, YOU'RE DOING IT WRONG. You need to re-design the system.

On Wed, Nov 28, 2018 at 8:19 AM Antoine Pitrou wrote:

> On Tue, 27 Nov 2018 22:47:06 -0600 Abe Dillon wrote:
> >
> > If we could figure out a cleaner syntax for defining invariants, preconditions, and postconditions we'd be half-way to automated testing UTOPIA! (ok, maybe I'm being a little over-zealous)
>
> I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values.
>
> Also, the idea that fuzz testing will automatically find edge cases in your code is idealistic. It depends on the algorithm you've implemented and the distribution of values chosen by the tester. Showcasing trivially wrong examples (such as an addition function that always returns 0, or a tail function that doesn't return the tail) isn't very helpful for a real-world analysis, IMHO.
>
> In the end, you have to be rigorous when writing tests, and for most non-trivial functions it requires that you devise the distribution of input values depending on the implemented algorithm, not leave that distribution to a third-party library that knows nothing about your program.
>
> Regards
>
> Antoine.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Wed Nov 28 17:03:23 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 29 Nov 2018 09:03:23 +1100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: <20181128220323.GX4319@ando.pearwood.info>

On Wed, Nov 28, 2018 at 05:37:39PM +0100, Anders Hovmöller wrote:

> > I just mentioned that porting effort for background. I still believe
> > that the actual proposal of making the arguments to a map(...) call
> > accessible from Python as attributes of the map object (ditto filter,
> > zip, etc.) is useful in its own right, rather than just having this
> > completely opaque iterator.
>
> +1. Throwing away information is almost always a bad idea.

"Almost always"? Let's take this seriously, and think about the consequences if we actually believed that. If I created a series of integers:

    a = 23
    b = 0x17
    c = 0o27
    d = 0b10111
    e = int('1b', 12)

your assertion would say it is a bad idea to throw away the information about how they were created, and hence we ought to treat all five values as distinct and distinguishable. So much for the small integer cache...
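(Concretely, all five spellings are indistinguishable after creation -- and in CPython they even end up as the same cached object:)

    >>> a = 23; b = 0x17; c = 0o27; d = 0b10111; e = int('1b', 12)
    >>> a == b == c == d == e
    True
    >>> a is e   # CPython caches small ints, so they are one object
    True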
Perhaps every single object we create ought to hold onto an AST representing the literal or expression which was used to create it.

Let's not exaggerate the benefit, and ignore the costs, of "throwing away information". Sometimes we absolutely do want to throw away information, or at least make it inaccessible to the consumer of our data structures. Sometimes the right thing to do is *not* to open up interfaces unless there is a clear need for it to be open. Doing so adds bloat to the interface, prevents many changes in implementation including potential optimizations, and may carry significant memory burdens.

Bringing this discussion back to the concrete proposal in this thread: as I said earlier, I want to agree with this proposal. I too like the idea of having map (and filter, and zip...) objects expose their arguments, and for the same reason: "it might be useful some day". But every time we scratch beneath the surface and try to think about how and when we would actually use that information, we run into conceptual and practical problems which suggest strongly to me that doing this will turn it into a serious bug magnet, an anti-feature which sounds good but causes more problems than it solves.

I'm really hoping someone can convince me this is a good idea, but so far the proposal seems like an attractive nuisance and not a feature.

> We should have information preservation and transparency be general
> design goals imo. Not because we can see the obvious use now but
> because it keeps the door open to discover uses later.

While that is a reasonable position to take in some circumstances, in others it goes completely against YAGNI.

-- Steve

From solipsis at pitrou.net  Wed Nov 28 17:08:26 2018
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 28 Nov 2018 23:08:26 +0100
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
References: <20181128151810.7c2393c2@fsol>
Message-ID: <20181128230826.39ce721c@fsol>

On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon wrote:

> Thirdly, Computers are very good at exhaustively searching multidimensional spaces.

How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function?

Do you think it can finish before the Earth gets engulfed by the Sun?

Regards

Antoine.

From abedillon at gmail.com  Wed Nov 28 17:24:50 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Wed, 28 Nov 2018 16:24:50 -0600
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: <20181128230826.39ce721c@fsol>
References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol>
Message-ID: 

[Antoine Pitrou]
> How long do you think it will take your computer to exhaustively search
> the space of possible input values to a 2-integer addition function?
> Do you think it can finish before the Earth gets engulfed by the Sun?

Yes, ok. I used the word "exhaustively" wrong. Sorry about that. I don't think humans are made of a magical substance that can exhaustively search the space of possible pairs of integers before the heat-death of the universe. I think humans use strategies based, hopefully, in logic to come up with test examples, and that it's often more valuable to capture said strategies in code than to make a human run the algorithms. In cases where domain-knowledge helps inform the search strategy, there should be easy-to-use tools to build a domain-specific search strategy.
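A sketch of what such a domain-specific strategy can look like in hypothesis (st.composite is real API; the grid-point domain is invented for illustration):

    from hypothesis import given, strategies as st

    @st.composite
    def grid_points(draw, height=5, width=10):
        # Domain knowledge lives in the strategy: only valid grid points are drawn.
        y = draw(st.integers(0, height - 1))
        x = draw(st.integers(0, width - 1))
        return (y, x)

    @given(pt=grid_points())
    def test_point_is_on_grid(pt):
        y, x = pt
        assert 0 <= y < 5 and 0 <= x < 10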
On Wed, Nov 28, 2018 at 4:09 PM Antoine Pitrou wrote:

> On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon wrote:
> > Thirdly, Computers are very good at exhaustively searching multidimensional spaces.
>
> How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function?
>
> Do you think it can finish before the Earth gets engulfed by the Sun?
>
> Regards
>
> Antoine.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Wed Nov 28 17:27:14 2018
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 29 Nov 2018 09:27:14 +1100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: <20181128222714.GY4319@ando.pearwood.info>

On Wed, Nov 28, 2018 at 02:53:50PM -0500, Terry Reedy wrote:

> One of the guidelines in the Zen of Python is "Special cases aren't special enough to break the rules."
>
> This proposal claims that the Python 3 built-in iterator class 'map' is so special that it should break the rule that iterators in general cannot and therefore do not have .__len__ methods because their size may be infinite, unknowable until exhaustion, or declining with each .__next__ call.
>
> For iterators, 3.4 added an optional __length_hint__ method. This makes sense for iterators, like tuple_iterator, list_iterator, range_iterator, and dict_keyiterator, based on a known finite collection. At the time, map.__length_hint__ was proposed and rejected as problematic, for obvious reasons, and insufficiently useful.

Thanks for the background, Terry, but doesn't that suggest that sometimes special cases ARE special enough to break the rules? *wink*

Unfortunately, I don't think it is obvious why map.__length_hint__ is problematic. It only needs to return the *maximum* length, or some sentinel (zero?) to say "I don't know". It doesn't need to be accurate, unlike __len__ itself. Perhaps we should rethink the decision not to give map() and filter() a length hint?

[...]
> What makes the map class special among all built-in iterator classes? It appears not to be a property of the class itself, as an iterator class, but of its name. In Python 2, 'map' was bound to a different implementation of the map idea, a function that produced a list, which has a length. I suspect that if Python 3 were the original Python, we would not have this discussion.

No, in fairness, I too have often wanted to know the length of an arbitrary iterator, including map(), without consuming it. In general this is an unsolvable problem, but sometimes it is (or at least, at first glance *seems*) solvable. map() is one of those cases. If we could solve it, that would be great -- but I'm not convinced that it is solvable, since the solution seems worse than the problem it aims to solve. But I live in hope that somebody cleverer than me can point out the flaws in my argument.

[...]
> If a function is documented as requiring a list, or a sequence, or a length object, it is a user bug to pass an iterator. The only thing special about map and filter as errors is the rebinding of the names between Py2 and Py3, so that the same code may be good in 2.x and bad in 3.x.
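Concretely, the rebinding in question (the object address below is elided):

    $ python2.7 -c "print map(len, ['a', 'bc'])"
    [1, 2]
    $ python3 -c "print(map(len, ['a', 'bc']))"
    <map object at 0x...>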
> Perhaps 2.7, in addition to future imports of text as unicode and print
> as a function, should have had one to make map and filter be the 3.x
> iterators.

I think that's future_builtins:

[steve at ando ~]$ python2.7 -c "from future_builtins import *; print map(len, [])"

But that wouldn't have helped E. Madison Bray or SageMath, since their difficulty is not their own internal use of map(), but their users' use of map(). Unless they simply ban any use of iterators at all, which I imagine will be a backwards-incompatible change (and for that matter an excessive overreaction for many uses), SageMath can't prevent users from providing map() objects or other iterator arguments.

-- Steve
From greg.ewing at canterbury.ac.nz  Wed Nov 28 17:45:05 2018
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 29 Nov 2018 11:45:05 +1300
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: <5BFF1A71.203@canterbury.ac.nz>

E. Madison Bray wrote:
> I still believe that the actual proposal of making the arguments to a map(...) call accessible from Python as attributes of the map object (ditto filter, zip, etc.) is useful in its own right, rather than just having this completely opaque iterator.

But it will only help if the user passes a map object in particular, and not some other kind of iterator. Also it won't help if the inputs to the map are themselves iterators that aren't amenable to inspection. This smells like exposing an implementation detail of your function in its API.

I don't see how it would help with your Sage port either, since the original code only got the result of the mapping and wouldn't have been able to inspect the underlying iterables.

I wonder whether it's too late to redefine map() so that it returns a view object instead of an iterator, as was done when merging dict.{items, iter_items} etc. Alternatively, add a mapped() builtin that returns a view.

-- Greg

From greg.ewing at canterbury.ac.nz  Wed Nov 28 17:59:27 2018
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 29 Nov 2018 11:59:27 +1300
Subject: [Python-ideas] __len__() for map()
In-Reply-To: 
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128150348.GT4319@ando.pearwood.info>
Message-ID: <5BFF1DCF.5040003@canterbury.ac.nz>

E. Madison Bray wrote:
> So I might want to check:
>
>     finite_definite = True
>     for it in my_map.iters:
>         try:
>             len(it)
>         except TypeError:
>             finite_definite = False
>
>     if finite_definite:
>         my_seq = list(my_map)
>     else:
>         # some other algorithm

If map is being passed into your function, you can still do this check before calling map. If the user is doing the mapping themselves, then in Python 2 it would have blown up anyway before your function even got called, so nothing is any worse.

-- Greg

From abedillon at gmail.com  Wed Nov 28 18:14:54 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Wed, 28 Nov 2018 17:14:54 -0600
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net>
Message-ID: 

I raised a related problem a while back when I found that random.sample can only take a sequence. The example I gave was randomly sampling points on a 2D grid to initialize a board for Conway's Game of Life:

    >>> def random_board(height: int, width: int, ratio: float = 0.5) -> Set[Tuple[int, int]]:
    ...     """ produce a set of points randomly chosen from a height x width grid """
    ...     all_points = itertools.product(range(height), range(width))
    ...     num_samples = ratio*height*width
    ...     return set(random.sample(all_points, num_samples))
    ...
    >>> random_board(height=5, width=10, ratio=0.25)
    TypeError: Population must be a sequence or set.  For dicts, use list(d).

It seems like there should be some way to pass along the information that the size *is* known, but I couldn't think of any way of passing that info along without adding massive amounts of complexity everywhere. If map is able to support len() under certain circumstances, it makes sense that other iterators and generators would be able to do the same.
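A minimal sketch of passing that information along -- illustrative only; random.sample would still reject it because it needs random access, but len() and progress meters would work:

    import itertools

    class SizedIter:
        """An iterator that also reports how many items remain."""
        def __init__(self, iterable, length):
            self._it = iter(iterable)
            self._remaining = length
        def __iter__(self):
            return self
        def __next__(self):
            value = next(self._it)   # raises StopIteration when exhausted
            self._remaining -= 1
            return value
        def __len__(self):
            return max(self._remaining, 0)

    all_points = SizedIter(itertools.product(range(5), range(10)), 5 * 10)
    assert len(all_points) == 50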
You might even want a way to annotate a generator function with logic about how it might support len(). I don't have an answer to this problem, but I hope this provides some sense of the scope of what you're asking.

On Mon, Nov 26, 2018 at 3:36 PM Kale Kundert wrote:

> I just ran into the following behavior, and found it surprising:
>
>     >>> len(map(float, [1,2,3]))
>     TypeError: object of type 'map' has no len()
>
> I understand that map() could be given an infinite sequence and therefore might not always have a length. But in this case, it seems like map() should've known that its length was 3. I also understand that I can just call list() on the whole thing and get a list, but the nice thing about map() is that it doesn't copy data, so it's unfortunate to lose that advantage for no particular reason.
>
> My proposal is to delegate map.__len__() to the underlying iterable. Similarly, map.__getitem__() could be implemented if the underlying iterable supports item access:
>
>     class map:
>
>         def __init__(self, func, iterable):
>             self.func = func
>             self.iterable = iterable
>
>         def __iter__(self):
>             yield from (self.func(x) for x in self.iterable)
>
>         def __len__(self):
>             return len(self.iterable)
>
>         def __getitem__(self, key):
>             return self.func(self.iterable[key])
>
> Let me know if there are any downsides to this that I'm not seeing. From my perspective, it seems like there would be only a number of (small) advantages:
>
> - Less surprising
> - Avoid some unnecessary copies
> - Backwards compatible
>
> -Kale
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mertz at gnosis.cx  Wed Nov 28 18:24:19 2018
From: mertz at gnosis.cx (David Mertz)
Date: Wed, 28 Nov 2018 18:24:19 -0500
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: <20181128230826.39ce721c@fsol>
References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol>
Message-ID: 

That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us.

On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou wrote:

> On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon wrote:
> > Thirdly, Computers are very good at exhaustively searching multidimensional spaces.
>
> How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function?
>
> Do you think it can finish before the Earth gets engulfed by the Sun?
>
> Regards
>
> Antoine.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Wed Nov 28 18:27:03 2018
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 29 Nov 2018 10:27:03 +1100
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol>
Message-ID: 

On Thu, Nov 29, 2018 at 10:25 AM David Mertz wrote:
>
> That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us.
>

Python ints are not 32-bit ints. Have fun. :)

ChrisA

From antoine at python.org  Wed Nov 28 18:42:50 2018
From: antoine at python.org (Antoine Pitrou)
Date: Thu, 29 Nov 2018 00:42:50 +0100
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol>
Message-ID: 

But Python integers are variable-sized, and their size is basically limited by available memory or address space.

Let's take a typical 64-bit Python build, assuming 4 GB RAM available. Let's also assume that 90% of those 4 GB can be readily allocated for Python objects (there's overhead, etc.).

Also let's take a look at the Python integer representation:

    >>> sys.int_info
    sys.int_info(bits_per_digit=30, sizeof_digit=4)

This means that every 4 bytes of integer object store 30 bits of actual integer data.

So, how many bits has the largest allocatable integer on that system, assuming 90% of 4 GB are available for allocation?

    >>> nbits = (2**32)*0.9*30/4
    >>> nbits
    28991029248.0

Now how many possible integers are there in that number of bits?

    >>> x = 1 << int(nbits)
    >>> x.bit_length()
    28991029249

(yes, that number was successfully allocated in full. And the Python process occupies 3.7 GB RAM at that point, which validates the estimate.)

Let's try to have a readable approximation of that number. Convert it to a float perhaps?

    >>> float(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: int too large to convert to float

Well, of course. So let's just extract a power of 10:

    >>> math.log10(x)
    8727169408.819794
    >>> 10**0.819794
    6.603801339268099

(yes, math.log10() works on non-float-convertible integers. I'm impressed!)

So the number of representable integers on that system is approximately 6.6e8727169408. Let's hope the Sun takes its time.

(and of course, what is true for ints is true for any variable-sized input, such as strings, lists, dicts, sets, etc.)

Regards

Antoine.

Le 29/11/2018 à 00:24, David Mertz a écrit :
> That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us.
>
> On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou wrote:
>
> > On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon wrote:
> > > Thirdly, Computers are very good at exhaustively searching multidimensional spaces.
> >
> > How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function?
> >
> > Do you think it can finish before the Earth gets engulfed by the Sun?
> >
> > Regards
> >
> > Antoine.
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/

From marcos.eliziario at gmail.com  Wed Nov 28 20:22:20 2018
From: marcos.eliziario at gmail.com (Marcos Eliziario)
Date: Wed, 28 Nov 2018 23:22:20 -0200
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol>
Message-ID: 

But nobody is talking about exhausting the combinatoric space of all possible values. Property Based Testing looks like Fuzzy Testing but it is not quite the same thing. Property based testing is not about just generating random values till the heat death of the universe, but generating sensible values in a configurable way to cover all equivalence classes we can think of. If my function takes two floating point numbers as arguments, hypothesis "strategies" won't try all possible combinations of all possible floating point values, but instead all possible combinations of interesting values (NaN, Infinity, too big, too small, positive, negative, zero, None, decimal fractions, etc..), something that an experienced programmer probably would end up doing by himself with a lot of test cases, but that can be better done with less effort by the automation provided by the hypothesis package.

It could well be that just by using such a tool, a naive programmer could end up being convinced of the fact that maybe he probably would better be served by sticking to decimal arithmetic :-)

Em qua, 28 de nov de 2018 às 21:43, Antoine Pitrou escreveu:

> But Python integers are variable-sized, and their size is basically limited by available memory or address space.
>
> Let's take a typical 64-bit Python build, assuming 4 GB RAM available. Let's also assume that 90% of those 4 GB can be readily allocated for Python objects (there's overhead, etc.).
>
> Also let's take a look at the Python integer representation:
>
> >>> sys.int_info
> sys.int_info(bits_per_digit=30, sizeof_digit=4)
>
> This means that every 4 bytes of integer object store 30 bits of actual integer data.
>
> So, how many bits has the largest allocatable integer on that system, assuming 90% of 4 GB are available for allocation?
>
> >>> nbits = (2**32)*0.9*30/4
> >>> nbits
> 28991029248.0
>
> Now how many possible integers are there in that number of bits?
>
> >>> x = 1 << int(nbits)
> >>> x.bit_length()
> 28991029249
>
> (yes, that number was successfully allocated in full. And the Python process occupies 3.7 GB RAM at that point, which validates the estimate.)
>
> Let's try to have a readable approximation of that number. Convert it to a float perhaps?
>
> >>> float(x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OverflowError: int too large to convert to float
>
> Well, of course. So let's just extract a power of 10:
>
> >>> math.log10(x)
> 8727169408.819794
> >>> 10**0.819794
> 6.603801339268099
>
> (yes, math.log10() works on non-float-convertible integers. I'm impressed!)
>
> So the number of representable integers on that system is approximately 6.6e8727169408. Let's hope the Sun takes its time.
>
> (and of course, what is true for ints is true for any variable-sized input, such as strings, lists, dicts, sets, etc.)
>
> Regards
>
> Antoine.
>
> Le 29/11/2018 à 00:24, David Mertz a écrit :
> > That's easy, Antoine.
> > On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us.
> >
> > On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou wrote:
> >
> > > On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon wrote:
> > > > Thirdly, Computers are very good at exhaustively searching multidimensional spaces.
> > >
> > > How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function?
> > >
> > > Do you think it can finish before the Earth gets engulfed by the Sun?
> > >
> > > Regards
> > >
> > > Antoine.
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Marcos Eliziário Santos
mobile/whatsapp/telegram: +55(21) 9-8027-0156
skype: marcos.eliziario at gmail.com
linked-in : https://www.linkedin.com/in/eliziario/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abedillon at gmail.com  Wed Nov 28 20:26:17 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Wed, 28 Nov 2018 19:26:17 -0600
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: 
Message-ID: 

Marko, I have a few thoughts that might improve icontract. First, multiple clauses per decorator:

    @pre(lambda x: x >= 0,
         lambda y: y >= 0,
         lambda width: width >= 0,
         lambda height: height >= 0,
         lambda x, width, img: x + width <= width_of(img),
         lambda y, height, img: y + height <= height_of(img))
    @post(lambda self: (self.x, self.y) in self,
          lambda self: (self.x+self.width-1, self.y+self.height-1) in self,
          lambda self: (self.x+self.width, self.y+self.height) not in self)
    def __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None:
        self.img = img[y : y+height, x : x+width].copy()
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __contains__(self, pt: Tuple[int, int]) -> bool:
        x, y = pt
        return (self.x <= x < self.x + self.width) and (self.y <= y < self.y + self.height)

You might be able to get away with some magic by decorating a method just to flag it as using contracts:

    @contract  # <- does byte-code and/or AST voodoo
    def __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None:
        pre(x >= 0,
            y >= 0,
            width >= 0,
            height >= 0,
            x + width <= width_of(img),
            y + height <= height_of(img))

        # this would probably be declared at the class level
        inv(lambda self: (self.x, self.y) in self,
            lambda self: (self.x+self.width-1, self.y+self.height-1) in self,
            lambda self: (self.x+self.width, self.y+self.height) not in self)

        self.img = img[y : y+height, x : x+width].copy()
        self.x = x
        self.y = y
        self.width = width
        self.height = height

That might be super tricky to implement, but it saves you some lambda noise. Also, I saw a forked thread in which you were considering some sort of transpiler with similar syntax to the above example. That also works.
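A rough sketch of how the multi-clause @pre could be implemented -- matching each lambda's parameter names against the wrapped function's arguments; error reporting kept minimal:

    import functools
    import inspect

    def pre(*conditions):
        """Sketch: each condition's parameter names select the arguments it checks."""
        def decorator(func):
            sig = inspect.signature(func)
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                bound = sig.bind(*args, **kwargs)
                bound.apply_defaults()
                for cond in conditions:
                    wanted = inspect.signature(cond).parameters
                    if not cond(**{name: bound.arguments[name] for name in wanted}):
                        raise AssertionError(
                            "precondition on %s failed" % ", ".join(wanted))
                return func(*args, **kwargs)
            return wrapper
        return decorator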
Another thing to consider is that the role of descriptors overlaps some with the role of invariants. I don't know what to do with that knowledge, but it seems like it might be useful.

Anyway, I hope those half-baked thoughts have *some* value...

On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann <marko.ristin at gmail.com> wrote:

> Hi Abe,
>
> > I've been pulling a lot of ideas from the recent discussion on design by
> > contract (DBC), the elegance and drawbacks of doctests, and the amazing talk
> > given by Hillel Wayne at this year's PyCon entitled "Beyond Unit Tests: Taking
> > your Tests to the Next Level".
>
> Have you looked at the recent discussions regarding design-by-contract on this list (https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU and the following forked threads)?
>
> You might want to have a look at static checking techniques such as abstract interpretation. I hope to be able to work on such a tool for Python in some two years from now. We can stay in touch if you are interested.
>
> Re decorators: to my own surprise, using decorators in a larger code base is completely practical including the readability and maintenance of the code. It's neither that ugly nor problematic as it might seem at first look.
>
> We use our https://github.com/Parquery/icontract at the company. Most of the design choices come from practical issues we faced -- so you might want to read the doc even if you don't plan to use the library.
>
> Some of the aspects we still haven't figured out are: how to approach multi-threading (locking around the whole function with an additional decorator?) and granularity of contract switches (right now we use always/optimized, production/non-optimized and testing/slow, but it seems that a larger system requires finer categories).
>
> Cheers Marko

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mertz at gnosis.cx  Wed Nov 28 20:46:35 2018
From: mertz at gnosis.cx (David Mertz)
Date: Wed, 28 Nov 2018 20:46:35 -0500
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: 
References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol>
Message-ID: 

I was assuming it was a Numba-ized function since it's purely numeric. ;-)

FWIW, the theoretical limit of Python ints is limited by the fact 'int.bit_length()' is a platform native int. So my system cannot store ints larger than (2**(2**63-1)). It'll take a lot more memory than my measly 4GiB to store that number though.

So yes, that's way longer than heat-death-of-universe even before 128-bit machines are widespread.

On Wed, Nov 28, 2018, 6:43 PM Antoine Pitrou wrote:

> But Python integers are variable-sized, and their size is basically limited by available memory or address space.
>
> Let's take a typical 64-bit Python build, assuming 4 GB RAM available. Let's also assume that 90% of those 4 GB can be readily allocated for Python objects (there's overhead, etc.).
>
> Also let's take a look at the Python integer representation:
>
> >>> sys.int_info
> sys.int_info(bits_per_digit=30, sizeof_digit=4)
>
> This means that every 4 bytes of integer object store 30 bits of actual integer data.
>
> So, how many bits has the largest allocatable integer on that system, assuming 90% of 4 GB are available for allocation?
>
> >>> nbits = (2**32)*0.9*30/4
> >>> nbits
> 28991029248.0
>
> Now how many possible integers are there in that number of bits?
> > >>> x = 1 << int(nbits) > >>> x.bit_length() > 28991029249 > > (yes, that number was successfully allocated in full. And the Python > process occupies 3.7 GB RAM at that point, which validates the estimate.) > > Let's try to have a readable approximation of that number. Convert it > to a float perhaps? > > >>> float(x) > Traceback (most recent call last): > File "", line 1, in > OverflowError: int too large to convert to float > > Well, of course. So let's just extract a power of 10: > > >>> math.log10(x) > 8727169408.819794 > >>> 10**0.819794 > 6.603801339268099 > > (yes, math.log10() works on non-float-convertible integers. I'm > impressed!) > > So the number of representable integers on that system is approximately > 6.6e8727169408. Let's hope the Sun takes its time. > > (and of course, what is true for ints is true for any variable-sized > input, such as strings, lists, dicts, sets, etc.) > > Regards > > Antoine. > > > Le 29/11/2018 ? 00:24, David Mertz a ?crit : > > That's easy, Antoine. On a reasonable modern multi-core workstation, I > > can do 4 billion additions per second. A year is just over 30 million > > seconds. For 32-bit ints, I can whiz through the task in only 130,000 > > years. We have at least several hundred million years before the sun > > engulfs us. > > > > On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou > wrote: > > > > On Wed, 28 Nov 2018 15:58:24 -0600 > > Abe Dillon > wrote: > > > Thirdly, Computers are very good at exhaustively searching > > multidimensional > > > spaces. > > > > How long do you think it will take your computer to exhaustively > search > > the space of possible input values to a 2-integer addition function? > > > > Do you think it can finish before the Earth gets engulfed by the Sun? > > > > Regards > > > > Antoine. > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abedillon at gmail.com Wed Nov 28 20:49:24 2018 From: abedillon at gmail.com (Abe Dillon) Date: Wed, 28 Nov 2018 19:49:24 -0600 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs In-Reply-To: References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol> Message-ID: OK. I know I made a mistake by saying, "computers are very good at *exhaustively* searching multidimensional spaces." I should have said, "computers are very good at enumerating examples from multi-dimensional spaces" or something to that effect. Now that we've had our fun, can you guys please continue in a forked conversation so it doesn't derail the conversation? On Wed, Nov 28, 2018 at 7:47 PM David Mertz wrote: > I was assuming it was a Numba-ized function since it's purely numeric. ;-) > > FWIW, the theoretical limit of Python ints is limited by the fact > 'int.bit_length()' is a platform native int. So my system cannot store ints > larger than (2**(2**63-1)). It'll take a lot more memory than my measly > 4GiB to store that number though. > > So yes, that's way longer that heat-death-of-universe even before 128-bit > machines are widespread. 
> > On Wed, Nov 28, 2018, 6:43 PM Antoine Pitrou >> >> But Python integers are variable-sized, and their size is basically >> limited by available memory or address space. >> >> Let's take a typical 64-bit Python build, assuming 4 GB RAM available. >> Let's also assume that 90% of those 4 GB can be readily allocated for >> Python objects (there's overhead, etc.). >> >> Also let's take a look at the Python integer representation: >> >> >>> sys.int_info >> sys.int_info(bits_per_digit=30, sizeof_digit=4) >> >> This means that every 4 bytes of integer object store 30 bit of actual >> integer data. >> >> So, how many bits has the largest allocatable integer on that system, >> assuming 90% of 4 GB are available for allocation? >> >> >>> nbits = (2**32)*0.9*30/4 >> >>> nbits >> 28991029248.0 >> >> Now how many possible integers are there in that number of bits? >> >> >>> x = 1 << int(nbits) >> >>> x.bit_length() >> 28991029249 >> >> (yes, that number was successfully allocated in full. And the Python >> process occupies 3.7 GB RAM at that point, which validates the estimate.) >> >> Let's try to have a readable approximation of that number. Convert it >> to a float perhaps? >> >> >>> float(x) >> Traceback (most recent call last): >> File "", line 1, in >> OverflowError: int too large to convert to float >> >> Well, of course. So let's just extract a power of 10: >> >> >>> math.log10(x) >> 8727169408.819794 >> >>> 10**0.819794 >> 6.603801339268099 >> >> (yes, math.log10() works on non-float-convertible integers. I'm >> impressed!) >> >> So the number of representable integers on that system is approximately >> 6.6e8727169408. Let's hope the Sun takes its time. >> >> (and of course, what is true for ints is true for any variable-sized >> input, such as strings, lists, dicts, sets, etc.) >> >> Regards >> >> Antoine. >> >> >> Le 29/11/2018 ? 00:24, David Mertz a ?crit : >> > That's easy, Antoine. On a reasonable modern multi-core workstation, I >> > can do 4 billion additions per second. A year is just over 30 million >> > seconds. For 32-bit ints, I can whiz through the task in only 130,000 >> > years. We have at least several hundred million years before the sun >> > engulfs us. >> > >> > On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou > > wrote: >> > >> > On Wed, 28 Nov 2018 15:58:24 -0600 >> > Abe Dillon > >> wrote: >> > > Thirdly, Computers are very good at exhaustively searching >> > multidimensional >> > > spaces. >> > >> > How long do you think it will take your computer to exhaustively >> search >> > the space of possible input values to a 2-integer addition function? >> > >> > Do you think it can finish before the Earth gets engulfed by the >> Sun? >> > >> > Regards >> > >> > Antoine. >> > >> > >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> > >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
From abedillon at gmail.com  Wed Nov 28 21:47:07 2018
From: abedillon at gmail.com (Abe Dillon)
Date: Wed, 28 Nov 2018 20:47:07 -0600
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: References: Message-ID:

One thought I had pertains to a very narrow sub-set of cases, but may
provide a starting point. For the cases where a precondition, invariant,
or postcondition only involves a single parameter, attribute, or the
return value (respectively) and it's reasonably simple, one could write it
as an expression acting directly on the type annotation:

    def encabulate(
        reactive_inductance: 1 >= float > 0,  # description
        capacitive_diractance: int > 1,       # description
        delta_winding: bool                   # description
    ) -> len(Set[DingleArm]) > 0:  # ??? I don't know how you would handle more complex objects...
        do_stuff
        with_things
        ....

Anyway. Just more food for thought...

On Tue, Nov 27, 2018 at 10:47 PM Abe Dillon wrote:

> I've been pulling a lot of ideas from the recent discussion on design by
> contract (DBC), the elegance and drawbacks of doctests, and the amazing
> talk given by Hillel Wayne at this year's PyCon entitled "Beyond Unit
> Tests: Taking your Tests to the Next Level".
>
> To recap a lot of previous discussions:
>
> - Documentation should tell you:
>   A) What a variable represents
>   B) What kind of thing a variable is
>   C) The acceptable values a variable can take
>
> - Typing and Tests can partially take the place of documentation by
> filling in B and C (respectively) and sometimes A can be inferred from
> decent naming and context.
>
> - Contracts can take the place of many tests (especially when combined
> with a library like hypothesis)
>
> - Contracts/assertions can provide "stable" documentation in the sense
> that it can't get out of sync with the code.
>
> - Attempts to implement contracts using standard Python syntax are verbose
> and noisy because they rely heavily on decorators that add a lot of
> repetitive preamble to the methods being decorated. They may also require a
> metaclass which restricts their use to code that doesn't already use a
> metaclass.
>
> - There was some discussion about the importance of "what a variable
> represents" which pointed to this article by Philip J. Guo (author of
> the magnificent pythontutor.com). I believe Guo's usage of "in-the-small"
> and "in-the-large" are confusing because a well decoupled program shouldn't
> yield functions that know or care how they're being used in the grand
> machinations of your project. The examples he gives are of functions that
> could use a doc string and some type annotations, but don't actually say
> how they relate to the rest of the project.
>
> One thing that caught me about Hillel Wayne's talk was that some of his
> examples were close to needing practically no code. He starts with:
>
>     def tail(lst: List[Any]) -> List[Any]:
>         assert len(lst) > 0, "precondition"
>         result = lst[1:]
>         assert [lst[0]] + result == lst, "postcondition"
>         return result
>
> He then re-writes the function using a contracts library:
>
>     @require("lst must not be empty", lambda args: len(args.lst) > 0)
>     @ensure("result is tail of lst",
>             lambda args, result: [args.lst[0]] + result == args.lst)
>     def tail(lst: List[Any]) -> List[Any]:
>         return lst[1:]
>
> He then writes a unit test for the function:
>
>     @given(lists(integers(), 1))
>     def test_tail(lst):
>         tail(lst)
>
> What strikes me as interesting is that the test pretty-much doesn't need
> to be written.
The 'given' statement should be redundant based on the type > annotation and the precondition. Anyone who knows hypothesis, just imagine > the @require is a hypothesis 'assume' call. Furthermore, hypothesis should > be able to build strategies for more complex objects based on class > invariants and attribute types: > > @invariant("no overdrafts", lambda self: self.balance >= 0) > class Account: > def __init__(self, number: int, balance: float = 0): > super().__init__() > self.number: int = number > self.balance: float = balance > > A library like hypothesis should be able to generate valid account > objects. Hypothesis also has stateful testing > but I think > the implementation could use some work. As it is, you have inherit from a > class that uses a metaclass AND you have to pollute your class's name-space > with helper objects and methods. > > If we could figure out a cleaner syntax for defining invariants, > preconditions, and postconditions we'd be half-way to automated testing > UTOPIA! (ok, maybe I'm being a little over-zealous) > > I think there are two missing pieces to this testing problem: side-effect > verification and failure verification. > > Failure verification should test that the expected exceptions get thrown > when known bad data is passed in or when an object is put in a known > illegal state. This should be doable by allowing Hypothesis to probe the > bounds of unacceptable input data or states, though it might seem a bit > silly because if you've already added a precondition, "x >= 0" to a > function, then it obviously should raise a PreconditionViolated when passed > any x < 0. It may be important, however; if for performance reasons, you > need to disable invariant checking but you still want certain bad input to > raise exceptions, or your system has two components that interact with > slightly mis-matched invariants and you want to make sure the components > handle the edge-condition correctly. You can think of Types from a > set-theory perspective where the Integer type is conceptually the set of > all integers, and invariants would specify a smaller subset than Typing > alone, however if the set of all valid outputs of one component is not > completely contained within the set of all valid inputs to another > component, then there will be edge-cases resulting from the mismatch. In > that sense, some of the invariant verification could be static-ish (as much > as Python allows). > > Side-effect verification is usually done by mocking dependencies. You pass > in a mock database connection and make sure my object sends and receives > data as expected. As crazy as it sounds, this too can be almost completely > automated away if all of the above tools are in place AND if Python gained > support for Exception annotations. I wrote a Java (yuck) library at work > that does this. I wan't to port it to Python and share it, but it basically > enumerates a bunch of stuff: the "sources" and "destinations" of the > system, how those relate to dependencies, how they relate to each other (if > dependency X is unresponsive, I can't get sources A, B, or G and if I can't > get source B, I can't write destination Y), the dependency failure modes > (Exceptions raised, timeouts, unrecognized key, missing data, etc.), all > the public methods of the class under test and what sources and > destinations they use. 
> > Then I enumerate 'k' from 0 to some limit for the max number of > simultaneous faults to test for: > Then for each method that can have n >= k simultaneous faults I test > all (n choose k) combinations of faults for that method against the desired > behavior. > > I'm sure that explanation is as clear as mud. I will try to get a working > Python example at some point to demonstrate. > > Finally, in the PyCon video; Hillel Wayne shows an example of testing that > an "add" function is commutative. It seems that once you write that > invariant, it might apply to many different functions. A similar invariant > may be "reversibility" like: > > @given(text()) > def test_reversable_codex(s): > assert s == decode(encode(s)), "not reversible" > > That might be a common property that other functions share: > > @invariant(reversible(decode)) > def encode(s: str) -> bytes: ... > > Having said all that, I wanted to brainstorm some possible solutions for > implementing some or all of the above in Python without drowning you code > in decorators. > > NOTE: Please don't get hung up on specific syntax suggestions! Try to see > the forest through the trees! > > An example syntax could be: > > #Instead of this > @require("lst must not be empty", lambda args: len(args.lst) > 0) > @ensure("result is tail of lst", lambda args, result: [args.lst[0]] + > result == args.lst) > def tail(lst: List[Any]) -> List[Any]: > return lst[1:] > > #Maybe this? > non_empty = invariant("Must not be empty", lambda x: len(x) > 0) # can be > re-used > > def tail(lst: List[Any] d"Description of what this param represents. > {non_empty}") -> List[Any] d"Description of return value {lst == [lst[0]] > + __result__}": > """ > Description of function > """ > return lst[1:] > > Python could build the full doc string like so: > > """ > Description of function > > Args: > lst: Description of what this param represents. Must not be empty. > > Returns: > Description of return value. > """ > > d-strings have some description followed by some terminator after which > either invariant objects or [optionally strings] followed by an expression > on the arguments and __return__? > > I'm sorry this is so half-baked. I don't really like the d-string concept > and I'm pretty sure there are a million problems with it. I'll try to flesh > out the side-effect verification concept more later along with all the > other poorly explained stuff. I just wanted to get these thoughts out for > discussion, but now it's super late and I have to go! > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From boxed at killingar.net Thu Nov 29 00:06:51 2018 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Thu, 29 Nov 2018 06:06:51 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181128220323.GX4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128220323.GX4319@ando.pearwood.info> Message-ID: >> +1. Throwing away information is almost always a bad idea. > > "Almost always"? Let's take this seriously, and think about the > consequences if we actually believed that. If I created a series of > integers: ?Almost". It?s part of my sentence. 
I have known about addition for many years in fact :) / Anders From marko.ristin at gmail.com Thu Nov 29 01:25:31 2018 From: marko.ristin at gmail.com (Marko Ristin-Kaufmann) Date: Thu, 29 Nov 2018 07:25:31 +0100 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs In-Reply-To: References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol> Message-ID: Hi, Property based testing is not about just generating random values till the > heath death of the universe, but generating sensible values in a > configurable way to cover all equivalence classes we can think of. if my > function takes two floating point numbers as arguments, hypothesis > "strategies" won't try all possible combinations of all possible floating > point values, but instead all possible combination of interesting values > (NaN, Infinity, too big, too small, positive, negative, zero, None, decimal > fractions, etc..), something that an experienced programmer probably would > end up doing by himself with a lot of test cases, but that can be better > done with less effort by the automation provided by the hypothesis package. > Exactly. A tool can go a step further and, based on the assertions and contracts, generate the tests automatically or prove that certain properties of the program always hold. I would encourage people interested in automatic testing to have a look at the scientific literature on the topic (formal static analysis). Abstract interpretation has been already mentioned: https://en.wikipedia.org/wiki/Abstract_interpretation. For some bleeding edge, have a look what they do at this lab with the machine learning: https://eth-sri.github.io/publications/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marko.ristin at gmail.com Thu Nov 29 02:05:00 2018 From: marko.ristin at gmail.com (Marko Ristin-Kaufmann) Date: Thu, 29 Nov 2018 08:05:00 +0100 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs In-Reply-To: References: Message-ID: Hi Abe, Thanks for your suggestions! We actually already considered the two alternatives you propose. *Multiple predicates per decorator. *The problem is that you can not deal with toggling/describing individual contracts easily. While you can hack your way through it (considering the arguments in the sequence, for example), we found it clearer to have separate decorators. Moreover, tracebacks are much easier to read, which is important when you debug a program. *AST magic. *The problem with any approach based on parsing (be it parsing the code or the description) is that parsing is slow so you end up spending a lot of cycles on contracts which might not be enabled (many contracts are applied only in the testing environment, not int he production). Hence you must have an approach that offers practically zero overhead cost to importing a module when its contracts are turned off. Decoding byte-code does not work as current decoding libraries can not keep up with the changes in the language and the compiler hence they are always lagging behind. *Practicality of decorators. *We have retrospective meetings at the company and I frequently survey the opinions related to the contracts (explicitly asking about the readability and maintainability) -- so far nobody had any difficulties and nobody was bothered by the noisy syntax. The decorator syntax is simply not beautiful, no discussion about that. 
But when it comes to maintenance, there's a linter included ( https://github.com/Parquery/pyicontract-lint), and if you want contracts rendered in an appealing way, there's a documentation tool for sphinx ( https://github.com/Parquery/sphinx-icontract). The linter facilitates the maintainability a lot and sphinx tool gives you nice documentation for a library so that you don't even have to look into the source code that often if you don't want to. We need to be careful not to mistake issues of aesthetics for practical issues. Something might not be beautiful, but can be useful unless it's unreadable. *Conclusion. *What we do need at this moment, IMO, is a broad practical experience of using contracts in Python. Once you make a change to the language, it's impossible to undo. In contrast to what has been suggested in the previous discussions (including my own voiced opinions), I actually now don't think that introducing a language change would be beneficial *at this precise moment*. We don't know what the use cases are, and there is no practical experience to base the language change on. I'd prefer to hear from people who actually use contracts in their professional Python programming -- apart from the noisy syntax, how was the experience? Did it help you catch bugs (and how many)? Were there big problems with maintainability? Could you easily refactor? What were the limits of the contracts you encountered? What kind of snapshot mechanism do we need? How did you deal with multi-threading? And so on. icontract library is already practically usable and, if you don't use inheritance, dpcontracts is usable as well. I would encourage everybody to try out programming with contracts using an existing library and just hold their nose when writing the noisy syntax. Once we unearthed deeper problems related to contracts, I think it will be much easier and much more convincing to write a proposal for introducing contracts in the core language. If I had to write a proposal right now, it would be only based on the experience of writing a humble 100K code base by a team of 5-10 people. Not very convincing. Cheers, Marko On Thu, 29 Nov 2018 at 02:26, Abe Dillon wrote: > Marko, I have a few thoughts that might improve icontract. 
> First, multiple clauses per decorator: > > @pre( > *lambda* x: x >= 0, > *lambda* y: y >= 0, > *lambda* width: width >= 0, > *lambda* height: height >= 0, > *lambda* x, width, img: x + width <= width_of(img), > *lambda* y, height, img: y + height <= height_of(img)) > @post( > *lambda* self: (self.x, self.y) in self, > *lambda* self: (self.x+self.width-1, self.y+self.height-1) in self, > *lambda* self: (self.x+self.width, self.y+self.height) not in self) > *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, height: > int) -> None: > self.img = img[y : y+height, x : x+width].copy() > self.x = x > self.y = y > self.width = width > self.height = height > > *def* __contains__(self, pt: Tuple[int, int]) -> bool: > x, y = pt > return (self.x <= x < self.x + self.width) and (self.y <= y < self.y + > self.height) > > > You might be able to get away with some magic by decorating a method just > to flag it as using contracts: > > > @contract # <- does byte-code and/or AST voodoo > *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, height: > int) -> None: > pre(x >= 0, > y >= 0, > width >= 0, > height >= 0, > x + width <= width_of(img), > y + height <= height_of(img)) > > # this would probably be declared at the class level > inv(*lambda* self: (self.x, self.y) in self, > *lambda* self: (self.x+self.width-1, self.y+self.height-1) in > self, > *lambda* self: (self.x+self.width, self.y+self.height) not in > self) > > self.img = img[y : y+height, x : x+width].copy() > self.x = x > self.y = y > self.width = width > self.height = height > > That might be super tricky to implement, but it saves you some lambda > noise. Also, I saw a forked thread in which you were considering some sort > of transpiler with similar syntax to the above example. That also works. > Another thing to consider is that the role of descriptors > overlaps > some with the role of invariants. I don't know what to do with that > knowledge, but it seems like it might be useful. > > Anyway, I hope those half-baked thoughts have *some* value... > > On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < > marko.ristin at gmail.com> wrote: > >> Hi Abe, >> >> I've been pulling a lot of ideas from the recent discussion on design by >>> contract (DBC), the elegance and drawbacks >>> of doctests >>> , and the amazing talk >>> given by Hillel Wayne at >>> this year's PyCon entitled "Beyond Unit Tests: Taking your Tests to the >>> Next Level". >>> >> >> Have you looked at the recent discussions regarding design-by-contract on >> this list ( >> https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU >> and the following forked threads)? >> >> You might want to have a look at static checking techniques such as >> abstract interpretation. I hope to be able to work on such a tool for >> Python in some two years from now. We can stay in touch if you are >> interested. >> >> Re decorators: to my own surprise, using decorators in a larger code base >> is completely practical including the readability and maintenance of the >> code. It's neither that ugly nor problematic as it might seem at first look. >> >> We use our https://github.com/Parquery/icontract at the company. Most of >> the design choices come from practical issues we faced -- so you might want >> to read the doc even if you don't plant to use the library. >> >> Some of the aspects we still haven't figured out are: how to approach >> multi-threading (locking around the whole function with an additional >> decorator?) 
and granularity of contract switches (right now we use
>> always/optimized, production/non-optimized and testing/slow, but it seems
>> that a larger system requires finer categories).
>>
>> Cheers Marko

From ricocotam at gmail.com  Thu Nov 29 02:28:28 2018
From: ricocotam at gmail.com (Adrien Ricocotam)
Date: Thu, 29 Nov 2018 08:28:28 +0100
Subject: [Python-ideas] __len__() for map()
In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128220323.GX4319@ando.pearwood.info>
Message-ID: <76635835-61DC-41F3-9F76-036698D38477@gmail.com>

Hi everyone, first participation in Python's mailing list, don't be too
hard on me.

Some suggested above changing the definition of len in the long term. I
think it could be interesting to define len as follows:

- If the object has a finite length: return that length (the way it works
  now)
- If the object has an infinite length: return infinity
- If the object has no length: return None

There's an issue with this solution: having None returned adds complexity
to the usage of len, so I suggest having a wrapper over __len__ methods
that throws the current error.

But still, there's a problem with infinite-length objects. If people code:

    for i in range(len(infinite_list)):
        # Something

it's not clear whether they actually want to do this. It's open to
discussion and is just a suggestion.

If we now consider map, then the length of a map (or filter, or any other
generator based on an iterator) is the same as that of the underlying
iterator, which could be either infinite or undefined.

Cheers

> On 29 Nov 2018, at 06:06, Anders Hovmöller wrote:
>
>>> +1. Throwing away information is almost always a bad idea.
>>
>> "Almost always"? Let's take this seriously, and think about the
>> consequences if we actually believed that. If I created a series of
>> integers:
>
> "Almost". It's part of my sentence. I have known about addition for many
> years in fact :)
>
> / Anders
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From julien at tayon.net  Thu Nov 29 03:55:09 2018
From: julien at tayon.net (julien tayon)
Date: Thu, 29 Nov 2018 09:55:09 +0100
Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs
In-Reply-To: References: Message-ID:

I wrote a lib specifically for the validator case that also overrides the
documentation: by default, if the name of the function plus its args
speaks for itself, then only that is added to the docstring. For example,
@require_odd_numbers() would add require_odd_numbers at the end of
__doc__. There is also the possibility of adding docstring templates.

https://github.com/jul/check_arg

From tjreedy at udel.edu  Thu Nov 29 04:25:38 2018
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 29 Nov 2018 04:25:38 -0500
Subject: [Python-ideas] __len__() for map()
In-Reply-To: <20181128222714.GY4319@ando.pearwood.info>
References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info>
Message-ID:

On 11/28/2018 5:27 PM, Steven D'Aprano wrote:
> On Wed, Nov 28, 2018 at 02:53:50PM -0500, Terry Reedy wrote:
>
>> One of the guidelines in the Zen of Python is
>> "Special cases aren't special enough to break the rules."
>>
>> This proposal claims that the Python 3 built-in iterator class 'map' is
>> so special that it should break the rule that iterators in general
>> cannot and therefore do not have .__len__ methods because their size may
>> be infinite, unknowable until exhaustion, or declining with each
>> .__next__ call.
>>
>> For iterators, 3.4 added an optional __length_hint__ method. This makes
>> sense for iterators, like tuple_iterator, list_iterator, range_iterator,
>> and dict_keyiterator, based on a known finite collection. At the time,
>> map.__length_hint__ was proposed and rejected as problematic, for
>> obvious reasons, and insufficiently useful.
>
> Thanks for the background Terry, but doesn't that suggest that sometimes
> special cases ARE special enough to break the rules? *wink*

Yes, but these cases are not special enough to break the rules for len
and __len__, especially when an alternative already exists.

> Unfortunately, I don't think it is obvious why map.__length_hint__ is
> problematic.

It is less obvious (there are more details to fill in) than the (exact)
length_hints for the list, tuple, range, and dict iterators. These are
*always* based on a sized collection. Map is *sometimes* based on sized
collection(s). It is the other cases that are problematic, as illustrated
by your next sentence.

> It only needs to return the *maximum* length, or
> sentinel (zero?) to say "I don't know". It doesn't
> need to be accurate, unlike __len__ itself.

> Perhaps we should rethink the decision not to give map() and filter() a
> length hint?

I should have said this more explicitly. This is why I suggested that
someone define and test one or more specific map.__length_hint__
implementations. Someone doing so should look into the C code for list to
see how list handles iterators with a length hint. I suspect that low
estimates are better than high estimates. Does list recognize any value
as "I don't know"?

>> What makes the map class special among all built-in iterator classes?
>> It appears not to be a property of the class itself, as an iterator
>> class, but of its name. In Python 2, 'map' was bound to a different
>> implementation of the map idea, a function that produced a list, which
>> has a length. I suspect that if Python 3 were the original Python, we
>> would not have this discussion.
>
> No, in fairness, I too have often wanted to know the length of an
> arbitrary iterator, including map(), without consuming it. In general
> this is an unsolvable problem, but sometimes it is (or at least, at first
> glance *seems*) solvable. map() is one of those cases.
>
> If we could solve it, that would be great -- but I'm not convinced that
> it is solvable, since the solution seems worse than the problem it aims
> to solve. But I live in hope that somebody cleverer than me can point
> out the flaws in my argument.

The current situation with length_hint reminds me a bit of the situation
with annotations before the addition of typing. Perhaps it is time to
think about conventions for the non-obvious 'other cases'.

>> Perhaps 2.7, in addition to future imports of text as unicode and print
>> as a function, should have had one to make map and filter be the 3.x
>> iterators.
>
> I think that's future_builtins:
>
> [steve at ando ~]$ python2.7 -c "from future_builtins import *; print map(len, [])"
>

Thanks for the info.

> But that wouldn't have helped E. Madison Bray or SageMath, since their
> difficulty is not their own internal use of map(), but their users' use
> of map().
In particular, by people who are not vividly aware that we broke the back-compatibility rule by rebinding 'map' and 'filter' in 3.0. Breaking back-compatibility *again* by redefining len (to mean something like operator.length) is not the right solution to problems caused by the 3.0 break. > Unless they simply ban any use of iterators at all, which I imagine will > be a backwards-incompatible change (and for that matter an excessive > overreaction for many uses), SageMath can't prevent users from providing > map() objects or other iterator arguments. I think their special case problem requires some special case solutions. At this point, I am refraining from making suggestions. -- Terry Jan Reedy From erik.m.bray at gmail.com Thu Nov 29 05:32:20 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 11:32:20 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <5BFF1DCF.5040003@canterbury.ac.nz> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128150348.GT4319@ando.pearwood.info> <5BFF1DCF.5040003@canterbury.ac.nz> Message-ID: On Wed, Nov 28, 2018 at 11:59 PM Greg Ewing wrote: > > E. Madison Bray wrote: > > So I might want to check: > > > > finite_definite = True > > for it in my_map.iters: > > try: > > len(it) > > except TypeError: > > finite_definite = False > > > > if finite_definite: > > my_seq = list(my_map) > > else: > > # some other algorithm > > If map is being passed into your function, you can still do this > check before calling map. > > If the user is doing the mapping themselves, then in Python 2 it > would have blown up anyway before your function even got called, > so nothing is any worse. You either missed, or completely ignored, my previous message where I addressed this: "For example, previously a user might pass map(func, some_list) where func is some pure function and the iterable is almost always a list of some kind. Previously that map() call would be evaluated (often slowly) first. But now we can treat a map as something a little more formal, as a container for a function and one or more iterables, which happens to have this special functionality when you iterate over it, but is otherwise just a special container. This is technically already the case, we just can't directly access it as a container. If we could, it would be possible to implement various optimizations that a user might not have otherwise been obvious to the user. This is especially the case of the iterable is a simple list, which is something we can check. The function in this case very likely might actually be a C function that was wrapped with Cython. I can easily convert this on the user's behalf to a simple C loop or possibly even some other more optimal vectorized code. These are application-specific special cases of course, but many such cases become easily accessible if map() and friends are usable as specialized containers." From erik.m.bray at gmail.com Thu Nov 29 05:37:19 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 11:37:19 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181128220323.GX4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128220323.GX4319@ando.pearwood.info> Message-ID: On Wed, Nov 28, 2018 at 11:04 PM Steven D'Aprano wrote: > > On Wed, Nov 28, 2018 at 05:37:39PM +0100, Anders Hovm?ller wrote: > > > > > > > I just mentioned that porting effort for background. 
I still believe > > > that the actual proposal of making the arguments to a map(...) call > > > accessible from Python as attributes of the map object (ditto filter, > > > zip, etc.) is useful in its own right, rather than just having this > > > completely opaque iterator. > > > > +1. Throwing away information is almost always a bad idea. > > "Almost always"? Let's take this seriously, and think about the > consequences if we actually believed that. If I created a series of > integers: > > a = 23 > b = 0x17 > c = 0o27 > d = 0b10111 > e = int('1b', 12) > > your assertion would say it is a bad idea to throw away the information > about how they were created, and hence we ought to treat all five values > as distinct and distinguishable. So much for the small integer cache... Not to go too off-topic but I don't think this is a great example either. Although as a practical consideration I agree Python shouldn't preserve the base representation from which an integer were created I often *wish* it would. It's useful information to have. There's nothing I hate more than doing hex arithmetic in Python and having it print out decimal results, then having to wrap everything in hex(...) before displaying. Base representation is still meaningful, often useful information. From solipsis at pitrou.net Thu Nov 29 05:58:04 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Nov 2018 11:58:04 +0100 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs References: <20181128151810.7c2393c2@fsol> <20181128230826.39ce721c@fsol> Message-ID: <20181129115804.426cca20@fsol> On Wed, 28 Nov 2018 23:22:20 -0200 Marcos Eliziario wrote: > But nobody is talking about exhausting the combinatoric space of all > possible values. Property Based Testing looks like Fuzzy Testing but it is > not quite the same thing. Well, the OP did talk about "exhaustively searching the multidimensional space". But I agree mere sampling is useful. I might give hypothesis a try someday. Usually I prefer hand-rolling my own stress testing routines. Regards Antoine. From erik.m.bray at gmail.com Thu Nov 29 06:13:56 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 12:13:56 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy wrote: > > On 11/28/2018 9:27 AM, E. Madison Bray wrote: > > On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert wrote: > >> > >> I just ran into the following behavior, and found it surprising: > >> > >>>>> len(map(float, [1,2,3])) > >> TypeError: object of type 'map' has no len() > >> > >> I understand that map() could be given an infinite sequence and therefore might not always have a length. But in this case, it seems like map() should've known that its length was 3. I also understand that I can just call list() on the whole thing and get a list, but the nice thing about map() is that it doesn't copy data, so it's unfortunate to lose that advantage for no particular reason. > >> > >> My proposal is to delegate map.__len__() to the underlying iterable. > > One of the guidelines in the Zen of Python is > "Special cases aren't special enough to break the rules." This seems to be replying to the OP, whom I was quoting. On one hand I would argue that this is cherry-picking the "Zen" since not all rules are special in the first place. 
But in this case I agree that map should not have a length or possibly
even a length hint (although the latter is more justifiable).

> > As a simple counter-proposal which I believe has fewer issues, I would
> > really like it if the built-in `map()` and `filter()` at least
> > provided a Python-level attribute to access the underlying iterables.
>
> This proposes to make map (and filter) special in a different way, by
> adding other special (dunder) attributes. In general, built-in
> callables do not attach their args to their output, for obvious reasons.
> If they do, they do not expose them. If input data must be saved, the
> details are implementation dependent. A C-coded callable would not
> necessarily save information in the form of Python objects.

Who said anything about "special", or adding "special (dunder)
attributes"? Nor did I make any general statement about all built-ins.
For arbitrary functions it doesn't necessarily make sense to hold on to
their arguments, but in the case of something like map() its arguments
are the only thing that give it meaning at all. The fact remains that for
something like a map in particular it can be treated in a formal sense as
a collection of a function and some sequence of arguments (possibly
unbounded) on which that function is to be evaluated (perhaps not
immediately). As an analogy, a series is an object in its own right
without having to evaluate the entire series: lots of information can be
gleaned from the properties of a series without having to evaluate it.
Just because you don't see the use doesn't mean others can't find one.

The CPython map() implementation already carries this data on it as
"func" and "iters" members in its struct. It's trivial to expose those to
Python as ".func" and ".iters" attributes. Nothing "special" about it.
However, that brings me to...

> https://docs.python.org/3/library/functions.html#map says
> "map(function, iterable, ...)
> Return an iterator [...]"
>
> The wording is intentional. The fact that map is a class and the
> iterator an instance of the class is a CPython implementation detail.
> Another implementation could use the generator function equivalent given
> in the Python 2 itertools doc, or a translation thereof. I don't know
> what pypy and other implementations do. The fact that CPython itertools
> callables are (now) C-coded classes instead of Python-coded generator
> functions, or C translations thereof (which is tricky) is for
> performance and ease of maintenance.

Exactly how intentional is that wording though? If it returns an iterator
it has to return *some object* that implements iteration in the manner
prescribed by map. Generator functions could theoretically allow
attributes attached to them. Roughly speaking:

    def map(func, *iters):
        def map_inner():
            for args in zip(*iters):
                yield func(*args)

        gen = map_inner()
        gen.func = func
        gen.iters = iters
        return gen

As it happens this won't work in CPython since it does not allow
attribute assignment on generator objects. Perhaps there's some good
reason for that, but AFAICT--though I may be missing a PEP or
something--this fact is not prescribed anywhere and is also particular to
CPython.

Point being, I don't think it's a massive leap or imposition on any
implementation to go from "Return an iterator [...]" to "Return an
iterator that has these attributes [...]"

P.S.
> > This is necessary because if I have a function that used to take, say, > > a list as an argument, and it receives a `map` object, I now have to > > be able to deal with map()s, > > If a function is documented as requiring a list, or a sequence, or a > length object, it is a user bug to pass an iterator. The only thing > special about map and filter as errors is the rebinding of the names > between Py2 and Py3, so that the same code may be good in 2.x and bad in > 3.x. It's not a user bug if you're porting a massive computer algebra application that happens to use Python as its implementation language (rather than inventing one from scratch) and your users don't need or want to know too much about Python 2 vs Python 3. Besides, the fact that they are passing an iterator now is probably in many cases a good thing for them, but it takes away my ability as a developer to find out more about what they're trying to do, as opposed to say just being given a list of finite size. That said, I regret bringing up Sage; I was using it as an example but I think the point stands on its own. From erik.m.bray at gmail.com Thu Nov 29 06:16:37 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 12:16:37 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181128222714.GY4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> Message-ID: On Wed, Nov 28, 2018 at 11:27 PM Steven D'Aprano wrote: > > On Wed, Nov 28, 2018 at 02:53:50PM -0500, Terry Reedy wrote: > > What makes the map class special among all built-in iterator classes? > > It appears not to be a property of the class itself, as an iterator > > class, but of its name. In Python 2, 'map' was bound to a different > > implementation of the map idea, a function that produced a list, which > > has a length. I suspect that if Python 3 were the original Python, we > > would not have this discussion. > > No, in fairness, I too have often wanted to know the length of an > arbitrary iterator, including map(), without consuming it. In general > this is an unsolvable problem, but sometimes it is (or at least, at first > glance *seems*) solvable. map() is one of those cases. > > If we could solve it, that would be great -- but I'm not convinced that > it is solvable, since the solution seems worse than the problem it aims > to solve. But I live in hope that somebody cleverer than me can point > out the flaws in my argument. In general it's unsolvable, so no attempt should be made to provide a pre-baked attempt at a solution that won't always work. But in many, if not the majority of cases, it *is* solvable. So let's give intelligent people the tools they need to solve it in those cases that they know they can solve it :) > But that wouldn't have helped E. Madison Bray or SageMath, since their > difficulty is not their own internal use of map(), but their users' use > of map(). > > Unless they simply ban any use of iterators at all, which I imagine will > be a backwards-incompatible change (and for that matter an excessive > overreaction for many uses), SageMath can't prevent users from providing > map() objects or other iterator arguments. That is the majority of the case I was concerned about, yes. 
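To make the "solvable cases" concrete, here is a minimal best-effort
sketch. It assumes the proposed (and currently hypothetical) `.iters`
attribute on map objects discussed above; nothing like it exists in
CPython today:

    from collections.abc import Sized

    def map_len(m):
        # Best-effort length of a map object: map stops at its shortest
        # input, so when every input is Sized the answer is the minimum
        # of their lengths. Returns None when it is genuinely unknowable.
        try:
            iters = m.iters  # hypothetical attribute from the proposal
        except AttributeError:
            return None
        if all(isinstance(it, Sized) for it in iters):
            return min(len(it) for it in iters)
        return None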
From rosuav at gmail.com Thu Nov 29 06:16:37 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 29 Nov 2018 22:16:37 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Thu, Nov 29, 2018 at 10:14 PM E. Madison Bray wrote: > P.S. > > > > This is necessary because if I have a function that used to take, say, > > > a list as an argument, and it receives a `map` object, I now have to > > > be able to deal with map()s, > > > > If a function is documented as requiring a list, or a sequence, or a > > length object, it is a user bug to pass an iterator. The only thing > > special about map and filter as errors is the rebinding of the names > > between Py2 and Py3, so that the same code may be good in 2.x and bad in > > 3.x. > > It's not a user bug if you're porting a massive computer algebra > application that happens to use Python as its implementation language > (rather than inventing one from scratch) and your users don't need or > want to know too much about Python 2 vs Python 3. Besides, the fact > that they are passing an iterator now is probably in many cases a good > thing for them, but it takes away my ability as a developer to find > out more about what they're trying to do, as opposed to say just being > given a list of finite size. If that's the case, then it should be no problem to rebind builtins.map to return a list. Problem solved. ChrisA From erik.m.bray at gmail.com Thu Nov 29 06:18:33 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 12:18:33 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Thu, Nov 29, 2018 at 12:16 PM Chris Angelico wrote: > > On Thu, Nov 29, 2018 at 10:14 PM E. Madison Bray wrote: > > P.S. > > > > > > This is necessary because if I have a function that used to take, say, > > > > a list as an argument, and it receives a `map` object, I now have to > > > > be able to deal with map()s, > > > > > > If a function is documented as requiring a list, or a sequence, or a > > > length object, it is a user bug to pass an iterator. The only thing > > > special about map and filter as errors is the rebinding of the names > > > between Py2 and Py3, so that the same code may be good in 2.x and bad in > > > 3.x. > > > > It's not a user bug if you're porting a massive computer algebra > > application that happens to use Python as its implementation language > > (rather than inventing one from scratch) and your users don't need or > > want to know too much about Python 2 vs Python 3. Besides, the fact > > that they are passing an iterator now is probably in many cases a good > > thing for them, but it takes away my ability as a developer to find > > out more about what they're trying to do, as opposed to say just being > > given a list of finite size. > > If that's the case, then it should be no problem to rebind > builtins.map to return a list. Problem solved. Rebind where? How? In sage.__init__? How do you think that will fly with other packages loaded in the same interpreter? From rosuav at gmail.com Thu Nov 29 06:21:15 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 29 Nov 2018 22:21:15 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Thu, Nov 29, 2018 at 10:18 PM E. 
Madison Bray wrote: > > On Thu, Nov 29, 2018 at 12:16 PM Chris Angelico wrote: > > > > On Thu, Nov 29, 2018 at 10:14 PM E. Madison Bray wrote: > > > P.S. > > > > > > > > This is necessary because if I have a function that used to take, say, > > > > > a list as an argument, and it receives a `map` object, I now have to > > > > > be able to deal with map()s, > > > > > > > > If a function is documented as requiring a list, or a sequence, or a > > > > length object, it is a user bug to pass an iterator. The only thing > > > > special about map and filter as errors is the rebinding of the names > > > > between Py2 and Py3, so that the same code may be good in 2.x and bad in > > > > 3.x. > > > > > > It's not a user bug if you're porting a massive computer algebra > > > application that happens to use Python as its implementation language > > > (rather than inventing one from scratch) and your users don't need or > > > want to know too much about Python 2 vs Python 3. Besides, the fact > > > that they are passing an iterator now is probably in many cases a good > > > thing for them, but it takes away my ability as a developer to find > > > out more about what they're trying to do, as opposed to say just being > > > given a list of finite size. > > > > If that's the case, then it should be no problem to rebind > > builtins.map to return a list. Problem solved. > > Rebind where? How? In sage.__init__? How do you think that will fly > with other packages loaded in the same interpreter? Either this is Python, or it's just an algebra language that happens to be implemented in Python. If the former, the Py2/Py3 distinction should matter to your users, since they are programming in Python. If the latter, it's all about Sage, ergo you can rebind map to mean what you expect it to mean. Take your pick. ChrisA From steve at pearwood.info Thu Nov 29 07:38:24 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 29 Nov 2018 23:38:24 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> Message-ID: <20181129123823.GB4319@ando.pearwood.info> On Thu, Nov 29, 2018 at 12:16:37PM +0100, E. Madison Bray wrote: > On Wed, Nov 28, 2018 at 11:27 PM Steven D'Aprano wrote: ["it" below being the length of an arbitrary iterator] > > If we could solve it, that would be great -- but I'm not convinced that > > it is solvable, since the solution seems worse than the problem it aims > > to solve. But I live in hope that somebody cleverer than me can point > > out the flaws in my argument. > > In general it's unsolvable, so no attempt should be made to provide a > pre-baked attempt at a solution that won't always work. But in many, > if not the majority of cases, it *is* solvable. So let's give > intelligent people the tools they need to solve it in those cases that > they know they can solve it :) So you say, but the solutions made so far seem fatally flawed to me. Just repeating the assertion that it is solvable isn't very convincing. -- Steve From erik.m.bray at gmail.com Thu Nov 29 08:12:19 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 14:12:19 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Thu, Nov 29, 2018 at 12:21 PM Chris Angelico wrote: > > On Thu, Nov 29, 2018 at 10:18 PM E. 
Madison Bray wrote:
> > > > On Thu, Nov 29, 2018 at 12:16 PM Chris Angelico wrote:
> > > > >
> > > > > On Thu, Nov 29, 2018 at 10:14 PM E. Madison Bray wrote:
> > > > > > P.S.
> > > > > >
> > > > > > This is necessary because if I have a function that used to take, say,
> > > > > > a list as an argument, and it receives a `map` object, I now have to
> > > > > > be able to deal with map()s,
> > > > >
> > > > > If a function is documented as requiring a list, or a sequence, or a
> > > > > length object, it is a user bug to pass an iterator. The only thing
> > > > > special about map and filter as errors is the rebinding of the names
> > > > > between Py2 and Py3, so that the same code may be good in 2.x and bad in
> > > > > 3.x.
> > > >
> > > > It's not a user bug if you're porting a massive computer algebra
> > > > application that happens to use Python as its implementation language
> > > > (rather than inventing one from scratch) and your users don't need or
> > > > want to know too much about Python 2 vs Python 3. Besides, the fact
> > > > that they are passing an iterator now is probably in many cases a good
> > > > thing for them, but it takes away my ability as a developer to find
> > > > out more about what they're trying to do, as opposed to say just being
> > > > given a list of finite size.
> > >
> > > If that's the case, then it should be no problem to rebind
> > > builtins.map to return a list. Problem solved.
> >
> > Rebind where? How? In sage.__init__? How do you think that will fly
> > with other packages loaded in the same interpreter?
>
> Either this is Python, or it's just an algebra language that happens
> to be implemented in Python. If the former, the Py2/Py3 distinction
> should matter to your users, since they are programming in Python.

Porque no los dos? (Why not both?) Sage is a superset of Python, and
while on some level (in terms of advanced programming constructs) users
will need to care about the distinction, most users don't really know
exactly what it does when they pass something like map(a_func, a_list) as
an argument to a function call. They don't necessarily appreciate the
distinction that, depending on how that function is implemented, an
arbitrary iterable has to be treated very differently than a list.

I certainly don't mind supporting arbitrary iterables--I think they
should be supported. But now there are optimizations I can't make that I
could have made before when map() just returned a list. In most cases I
didn't have to make these optimizations manually because the code is
written in Cython. It's true that when a user called map() previously
some opportunities for optimization were already lost, but now it's even
worse because I have to treat a simple map of a list on par with the
necessarily slower arbitrary iterator case, when technically-speaking
there is no reason that has to be the case. Cython could even handle that
case automatically as well by turning a map(<wrapped_c_function>, <list>)
into something like:

    list = map.iters[0];
    for (idx = 0; idx < PyList_Length(list); idx++) {
        wrapped_c_function(PyList_GET_ITEM(list, idx));
    }

> If the latter, it's all about Sage, ergo you can rebind map to mean what
> you expect it to mean. Take your pick.

I'm still not sure what makes you think one can just blithely replace a
builtin with something that doesn't work how all other Python libraries
expect that builtin to work.
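(For concreteness, the kind of subclass discussed in the next paragraph
could be sketched roughly as follows -- hypothetical name, not an actual
Sage or CPython class:

    class ExposedMap(map):
        # Sketch: a map subclass that records its construction arguments.
        # Instances of a subclass grow a __dict__, which is exactly the
        # per-object overhead complained about below.
        def __new__(cls, func, *iters):
            self = super().__new__(cls, func, *iters)
            self.func = func
            self.iters = iters
            return self

    m = ExposedMap(str, [1, 2, 3])
    assert m.iters == ([1, 2, 3],)
    assert list(m) == ['1', '2', '3']

)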
At best I could subclass map() and add this functionality but now you're adding at least three pointers to every map() that are not necessary since the information is already there in the C struct. For most cases this isn't too bad in terms of overhead but consider cases (which I've seen plenty of), like:

    list_of_lists = [map(int, x) for x in list_of_lists]

Now the user who previously expected to have a list of lists has a list of maps. It's already bad enough that each map holds a pointer to a function but I wouldn't want to make that worse. Anyways, I'd love to get off the topic of Sage and just ask why you would object to useful introspection capabilities? I don't even care if it were CPython-specific. From erik.m.bray at gmail.com Thu Nov 29 08:16:48 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 14:16:48 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181129123823.GB4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> <20181129123823.GB4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 1:38 PM Steven D'Aprano wrote: > > On Thu, Nov 29, 2018 at 12:16:37PM +0100, E. Madison Bray wrote: > > On Wed, Nov 28, 2018 at 11:27 PM Steven D'Aprano wrote: > > ["it" below being the length of an arbitrary iterator] > > > > If we could solve it, that would be great -- but I'm not convinced that > > > it is solvable, since the solution seems worse than the problem it aims > > > to solve. But I live in hope that somebody cleverer than me can point > > > out the flaws in my argument. > > > > In general it's unsolvable, so no attempt should be made to provide a > > pre-baked attempt at a solution that won't always work. But in many, > > if not the majority of cases, it *is* solvable. So let's give > > intelligent people the tools they need to solve it in those cases that > > they know they can solve it :) > > So you say, but the solutions made so far seem fatally flawed to me. > > Just repeating the assertion that it is solvable isn't very convincing. Okay, let's keep it simple:

    m = map(str, [1, 2, 3])
    len_of_m = None
    if len(m.iters) == 1 and isinstance(m.iters[0], Sized):
        len_of_m = len(m.iters[0])

You can give me pathological cases where that isn't true, but you can't say there's no context in which that would be virtually guaranteed, and consenting adults can decide whether or not that's a safe-enough assumption in their own code. From steve at pearwood.info Thu Nov 29 08:13:09 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 30 Nov 2018 00:13:09 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: <20181129131308.GC4319@ando.pearwood.info> On Thu, Nov 29, 2018 at 10:21:15PM +1100, Chris Angelico wrote: > On Thu, Nov 29, 2018 at 10:18 PM E. Madison Bray wrote: > > > > On Thu, Nov 29, 2018 at 12:16 PM Chris Angelico wrote: > > > > > > On Thu, Nov 29, 2018 at 10:14 PM E. Madison Bray wrote: [...] > > > If that's the case, then it should be no problem to rebind > > > builtins.map to return a list. Problem solved. > > > > Rebind where? How? In sage.__init__? How do you think that will fly > > with other packages loaded in the same interpreter? > > Either this is Python, or it's just an algebra language that happens > to be implemented in Python. False dichotomy. 
Sage is *all* of these things:

- a stand-alone application which is (partially) written in Python;
- an application which runs under IPython/Jupyter;
- a package which has to interoperate with other Python packages;
- an algebra language.

> If the former, the Py2/Py3 distinction > should matter to your users, since they are programming in Python. Even if they know, and care, about the difference between iterators and lists, they cannot be expected to know or care about how the hundreds of Sage functions process lists differently from iterators. Which would be implementation details of the Sage functions, and subject to change without warning. I sympathise with this proposal. In my own tiny little way, I've had to grapple with something similar for the stdlib statistics library, and I'm not totally happy with the work-around I came up with. And I have a few ideas for the future which will either render the difference moot, or make the problem worse, I'm not sure which :-) > If > the latter, it's all about Sage, ergo you can rebind map to mean what > you expect it to mean. Take your pick. Sage wraps a number of Python libraries, such as numpy, sympy and others, and itself can run under IPython which for all we know may already have monkeypatched the builtins for its own ~~nefarious~~ useful purposes. Are you really comfortable with monkeypatching the builtins in this way in such a complex ecosystem of packages? Maybe it will work, but I think you're being awfully gung-ho about the suggestion. (At least my earlier suggestion didn't involve monkey-patching the builtin map, merely shadowing it.) Personally, even if monkeypatching in this way solved the problem, as a (potential) user of SageMath I'd be really, really peeved if it patched map() in the way you suggest and regressed map() to the 2.x version. -- Steve From rosuav at gmail.com Thu Nov 29 08:22:01 2018 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 30 Nov 2018 00:22:01 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181129131308.GC4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181129131308.GC4319@ando.pearwood.info> Message-ID: On Fri, Nov 30, 2018 at 12:18 AM Steven D'Aprano wrote: > Sage wraps a number of Python libraries, such as numpy, sympy and > others, and itself can run under IPython which for all we know may > already have monkeypatched the builtins for its own ~~nefarious~~ useful > purposes. Are you really comfortable with monkeypatching the builtins in > this way in such a complex ecosystem of packages? Maybe it will work, > but I think you're being awfully gung-ho about the suggestion. To be quite honest, no, I am not comfortable with it. But I *am* comfortable with expecting Python programmers to program in Python, and thus deeming that breakage as a result of user code being migrated from Py2 to Py3 is to be fixed by the user. You can mess around with map(), but there are plenty of other things you can't mess with, so I don't see why this one thing should be Sage's problem. ChrisA From erik.m.bray at gmail.com Thu Nov 29 09:05:02 2018 From: erik.m.bray at gmail.com (E. 
Madison Bray) Date: Thu, 29 Nov 2018 15:05:02 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181129131308.GC4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 2:22 PM Chris Angelico wrote: > > On Fri, Nov 30, 2018 at 12:18 AM Steven D'Aprano wrote: > > Sage wraps a number of Python libraries, such as numpy, sympy and > > others, and itself can run under IPython which for all we know may > > already have monkeypatched the builtins for its own ~~nefarious~~ useful > > purposes. Are you really comfortable with monkeypatching the builtins in > > this way in such a complex ecosystem of packages? Maybe it will work, > > but I think you're being awfully gung-ho about the suggestion. > > To be quite honest, no, I am not comfortable with it. But I *am* > comfortable with expecting Python programmers to program in Python, > and thus deeming that breakage as a result of user code being migrated > from Py2 to Py3 is to be fixed by the user. You can mess around with > map(), but there are plenty of other things you can't mess with, so I > don't see why this one thing should be Sage's problem. The users--often scientists--of SageMath and many other scientific Python packages* are not "Python programmers" as such**. My job as a software engineer is to make the lower-level libraries they use for their day-to-day research work _just work_, and in particular _optimize_ that lower-level code in as many ways as I can find to. In some cases we do have to tell them about Python 2 vs Python 3 things (especially w.r.t. print()) but most of the time it is relatively transparent, as it should be. Steven has the right idea about it. Not every detail can be made perfectly transparent in terms of how users use or misuse them, no. But there are lots of areas where they should absolutely not have to care (e.g. like Steven wrote they cannot be expected to know how every single function might treat an iterator like map() over a finite sequence distinctly from the original finite sequence itself). In the case of map(), although maybe I have not articulated it well, I can say for sure that I've had perfectly valid use cases that were stymied merely by a semi-arbitrary decision to hide the data wrapped by the "iterator returned by map()" (if you want to be pedantic about it). I'm willing to accept some explanation for why that would be actively harmful, but as someone with concrete problems to solve I'm less convinced by appeals to abstractions, or "why not just X" as if I hadn't considered "X" and found it flawed (which is not to say that I mind any new idea being put thoroughly through its paces.) * (Pandas, SymPy, Astropy, and even lower-level packages like NumPy, not to mention Jupyter which implements kernels for dozens of languages, but is primarily implemented in Python) ** With an obligatory asterisk to counter a common refrain from those who experience impostor syndrome, that if you are using this software then yes you are in fact a Python programmer, you just haven't realized it yet ;) From jfine2358 at gmail.com Thu Nov 29 09:28:07 2018 From: jfine2358 at gmail.com (Jonathan Fine) Date: Thu, 29 Nov 2018 14:28:07 +0000 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181129131308.GC4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 2:05 PM E. 
Madison Bray wrote: > The users--often scientists--of SageMath and many other scientific > Python packages* are not "Python programmers" as such**. My job as a > software engineer is to make the lower-level libraries they use for > their day-to-day research work _just work_, and in particular > _optimize_ that lower-level code in as many ways as I can find to. In > some cases we do have to tell them about Python 2 vs Python 3 things > (especially w.r.t. print()) but most of the time it is relatively > transparent, as it should be. Well said. Unlike for many people on this list, programming Python is not their top skill. For example, Paul Romer, the 2018 Economics Nobel Memorial Laureate. His strength is economics. Python is one of the many tools he uses. But it's not his top skill (smile). https://developers.slashdot.org/story/18/10/09/0042240/economics-nobel-laureate-paul-romer-is-a-python-programming-convert In some sense, I think, what Madison wants is an internal domain specific language (IDSL) that works well for Sage users. Just as Django is an IDSL that works well for many web developers. See, for example https://martinfowler.com/books/dsl.html for the general idea. We might not agree on the specifics. But that's perhaps mostly a matter for the domain experts, such as Madison and Sage users. -- Jonathan From steve at pearwood.info Thu Nov 29 09:43:12 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 30 Nov 2018 01:43:12 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> <20181129123823.GB4319@ando.pearwood.info> Message-ID: <20181129144311.GD4319@ando.pearwood.info> On Thu, Nov 29, 2018 at 02:16:48PM +0100, E. Madison Bray wrote: > Okay, let's keep it simple:
>
>     m = map(str, [1, 2, 3])
>     len_of_m = None
>     if len(m.iters) == 1 and isinstance(m.iters[0], Sized):
>         len_of_m = len(m.iters[0])
>
> You can give me pathological cases where that isn't true, but you > can't say there's no context in which that would be virtually > guaranteed Yes I can, and they aren't pathological cases. They are ordinary cases working the way iterators are designed to work. All you get is a map object. You have no way of knowing how many times the iterator has been advanced by calling next(). Consequently, there is no guarantee that len(m.iters[0]) == len(list(m)) except by the merest accident that the map object hasn't had next() called on it yet. *This is not pathological behaviour*. This is how iterators are designed to work. The ability to partially advance an iterator, pause, then pass it on to another function to be completed is a huge benefit of the iterator protocol. I've written code like this on more than one occasion:

    # toy example
    for x in it:
        process(x)
        if condition(x):
            for y in it:
                do_something_else(y)
            # Strictly speaking, this isn't needed, since "it" is consumed.
            break

If I pass the partially consumed map iterator to your function, it will use the wrong length and give me back inaccurate results. (Assuming it actually uses the length as part of the calculated result.) You might say that your users are not so advanced, or that they're naive enough not to even know they could do that, but that's a pretty unsafe assumption as well as being rather insulting to your own users, some of whom are surely advanced Python coders not just naive dabblers. 
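(To see the failure concretely: everything below is standard Python today, with the proposed behaviour described only in the comment:)

    mo = map(str, [1, 2, 3, 4, 5])
    next(mo); next(mo)       # partially consume the map iterator
    # A length taken from the underlying list -- via the hypothetical
    # mo.iters[0] of the proposal -- would still claim 5 items, yet:
    print(len(list(mo)))     # -> 3
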
Even if only one in a hundred users knows that they can partially iterate over the map, and only one in a hundred of those actually do so, you're still making an unsafe assumption that will return inaccurate results based on an invalid value of len_of_m. > and consenting adults can decide whether or not that's a > safe-enough assumption in their own code. Which consenting adults? How am I, wearing the hat of a Sage user, supposed to know which of the hundreds of Sage functions make this "safe-enough" assumption and return inaccurate results as a consequence? -- Steve From erik.m.bray at gmail.com Thu Nov 29 10:32:00 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Thu, 29 Nov 2018 16:32:00 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181129144311.GD4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> <20181129123823.GB4319@ando.pearwood.info> <20181129144311.GD4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 3:43 PM Steven D'Aprano wrote: > > On Thu, Nov 29, 2018 at 02:16:48PM +0100, E. Madison Bray wrote: > > > Okay, let's keep it simple: > > > > m = map(str, [1, 2, 3]) > > len_of_m = None > > if len(m.iters) == 1 and isinstance(m.iters[0], Sized): > > len_of_m = len(m.iters[0]) > > > > You can give me pathological cases where that isn't true, but you > > can't say there's no context in which that wouldn't be virtually > > guaranteed > > Yes I can, and they aren't pathological cases. They are ordinary cases > working the way iterators are designed to work. > > All you get is a map object. You have no way of knowing how many times > the iterator has been advanced by calling next(). Consequently, there is > no guarantee that len(m.iters[0]) == len(list(m)) except by the merest > accident that the map object hasn't had next() called on it yet. > > *This is not pathological behaviour*. This is how iterators are designed > to work. > > The ability to partially advance an iterator, pause, then pass it on to > another function to be completed is a huge benefit of the iterator > protocol. I've written code like this on more than one occasion: That's a fair point and probably the killer flaw in this proposal (or any involving getting the lengths of iterators). I still think it would be useful to be able to introspect map objects, but this does throw some doubt on the overall reliability of this. I'd say that in most cases it would still work, but you're right it's harder to guarantee in this context. One obvious workaround would be to attach a flag indicating whether or not __next__ has been called (or as long as you have such a flag, why not a counter for the number of times __next__ has been called)? That would effectively solve the problem, but I admit it's a taller order in terms of adding API surface. From mertz at gnosis.cx Thu Nov 29 11:39:56 2018 From: mertz at gnosis.cx (David Mertz) Date: Thu, 29 Nov 2018 11:39:56 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: <76635835-61DC-41F3-9F76-036698D38477@gmail.com> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128220323.GX4319@ando.pearwood.info> <76635835-61DC-41F3-9F76-036698D38477@gmail.com> Message-ID: On Thu, Nov 29, 2018 at 2:29 AM Adrien Ricocotam wrote: > Some suggested above to change the definition of len in the long term. 
> Then I think it could be interesting to define len such as :
>
> - If it has a finite length : return that length (the way it works now)
> - If it has a length that is infinity : return infinity
> - If it has no length : return None
>
Do you anticipate that the `len()` function will be able to solve the Halting Problem? It is simply not possible to know whether a given iterator will produce finitely many or infinitely many elements. Even those that will produce finitely many do not, in general, have a knowable length without running them until exhaustion. Here's a trivial example:

    >>> from random import random
    >>> def seq():
    ...     while random() > 0.1:
    ...         yield 1
    >>> len(seq())
    # What answer do you want here?

Here's a slightly less trivial one:

    In [1]: from itertools import count

    In [2]: def mandelbrot(z):
       ...:     "Yield each value until escape iteration"
       ...:     c = z
       ...:     for n in count():
       ...:         if abs(z) > 2:
       ...:             return n
       ...:         yield z
       ...:         z = z*z + c

What should len(mandelbrot(my_complex_number)) be? Hint, depending on the complex number chosen, it might be any Natural Number (or it might not terminate). -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ricocotam at gmail.com Thu Nov 29 11:45:47 2018 From: ricocotam at gmail.com (Adrien Ricocotam) Date: Thu, 29 Nov 2018 17:45:47 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128220323.GX4319@ando.pearwood.info> <76635835-61DC-41F3-9F76-036698D38477@gmail.com> Message-ID: Alright, I didn't see those problems. Though I was suggesting that for functions like map, we just let the underlying iterator answer. This is interesting. Thanks for this On Thu 29 Nov 2018 at 17:40, David Mertz wrote: > On Thu, Nov 29, 2018 at 2:29 AM Adrien Ricocotam > wrote: > >> Some suggested above to change the definition of len in the long term. >> Then I think it could be interesting to define len such as :
>>
>> - If it has a finite length : return that length (the way it works now)
>> - If it has a length that is infinity : return infinity
>> - If it has no length : return None
>>
> > Do you anticipate that the `len()` function will be able to solve the > Halting Problem? > > It is simply not possible to know whether a given iterator will produce > finitely many or infinitely many elements. Even those that will produce > finitely many do not, in general, have a knowable length without running > them until exhaustion. > > Here's a trivial example:
>
>     >>> from random import random
>     >>> def seq():
>     ...     while random() > 0.1:
>     ...         yield 1
>     >>> len(seq())
>     # What answer do you want here?
>
> Here's a slightly less trivial one:
>
>     In [1]: from itertools import count
>
>     In [2]: def mandelbrot(z):
>        ...:     "Yield each value until escape iteration"
>        ...:     c = z
>        ...:     for n in count():
>        ...:         if abs(z) > 2:
>        ...:             return n
>        ...:         yield z
>        ...:         z = z*z + c
>
> What should len(mandelbrot(my_complex_number)) be? Hint, depending on the > complex number chosen, it might be any Natural Number (or it might not > terminate). > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. 
Intellectual property is > to the 21st century what the slave trade was to the 16th. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfine2358 at gmail.com Thu Nov 29 13:13:50 2018 From: jfine2358 at gmail.com (Jonathan Fine) Date: Thu, 29 Nov 2018 18:13:50 +0000 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181129144311.GD4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> <20181129123823.GB4319@ando.pearwood.info> <20181129144311.GD4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 2:44 PM Steven D'Aprano wrote: > You might say that your users are not so advanced, or that they're naive > enough not to even know they could do that, but that's a pretty unsafe > assumption as well as being rather insulting to your own users, some of > whom are surely advanced Python coders not just naive dabblers. I think that what above all unites Sage users is knowledge of mathematics. Use of Python would be secondary. The goal surely is to discover and develop conventions and interfaces that work for such a group of users. In this area the original poster is probably the expert, and I think should be respected as such. Steve's post divides Sage users into "advanced Python coders" and "naive dabblers". This misses the point, which is to get something that works well for all users. This, I'd say, is one of the features of Python's success. Most Python users are people who want to get something done. By the way, I'd expect that most Sage users fall into the middle range of Python expertise. I think that to focus on the extremes is both unhelpful and divisive. -- Jonathan From tjreedy at udel.edu Thu Nov 29 15:36:29 2018 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 29 Nov 2018 15:36:29 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On 11/29/2018 6:13 AM, E. Madison Bray wrote: > On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy wrote: > The CPython map() implementation already carries this data on it as > "func" and "iters" members in its struct. It's trivial to expose > those to Python as ".funcs" and ".iters" attributes. Nothing > "special" about it. However, that brings me to... I will come back to this when you do. >> https://docs.python.org/3/library/functions.html#map says >> "map(function, iterable, ...) >> Return an iterator [...]" >> >> The wording is intentional. The fact that map is a class and the >> iterator an instance of the class is a CPython implementation detail. >> Another implementation could use the generator function equivalent given >> in the Python 2 itertools doc, or a translation thereof. I don't know >> what pypy and other implementations do. The fact that CPython itertools >> callables are (now) C-coded classes instead Python-coded generator >> functions, or C translations thereof (which is tricky) is for >> performance and ease of maintenance. > > Exactly how intentional is that wording though? The use of 'iterator' is exactly intended, and the iterator protocol is *intentionally minimal*, with one iterator specific __next__ method and one boilerplate __iter__ method returning self. This is more minimal than some might like. An argument against the addition of length_hint and __length_hint__ was that it might be seen as extending at least the 'expected' iterator protocol. The docs were written to avoid this. 
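(For reference, that compromise is exposed today as operator.length_hint(), added by PEP 424: it consults __length_hint__ or __len__ when available, without making either part of the iterator protocol proper. A quick illustration:)

    from operator import length_hint

    it = iter([1, 2, 3])
    print(length_hint(it))        # -> 3; list_iterator implements __length_hint__
    next(it)
    print(length_hint(it))        # -> 2; the hint tracks remaining items
    gen = (x for x in [1, 2, 3])
    print(length_hint(gen, -1))   # -> -1; generators give no hint, default returned
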
> If it returns an > iterator it has to return *some object* that implements iteration in > the manner prescribed by map. > Generator functions could theoretically > allow attributes attached to them. Roughly speaking:
>
>     def map(func, *iters):
>         def map_inner():
>             for args in zip(*iters):
>                 yield func(*args)
>
>         gen = map_inner()
>         gen.func = func
>         gen.iters = iters
>
>         return gen
>
> As it happens this won't work in CPython since it does not allow > attribute assignment on generator objects. Perhaps there's some good > reason for that, but AFAICT--though I may be missing a PEP or > something--this fact is not prescribed anywhere and is also particular > to CPython. Instances of C-coded classes generally cannot be augmented. But set this issue aside. > Point being, I don't think it's a massive leap or > imposition on any implementation to go from "Return an iterator [...]" > to "Return an iterator that has these attributes [...]" Do you propose exposing the inner struct members of *all* C-coded iterators? (And would you propose that all Python-coded iterators should use public names for the equivalents?) Some subset thereof? (What choice rule?) Or only for map? If the latter, why do you consider map so special? >>> This is necessary because if I have a function that used to take, say, >>> a list as an argument, and it receives a `map` object, I now have to >>> be able to deal with map()s, In both 2 and 3, the function has to deal with iterator inputs one way or another. In both 2 and 3, possible iterator inputs include maps passed as generator comprehensions, '(<expression> for x in iterable)'. >> If a function is documented as requiring a list, or a sequence, or a >> length object, it is a user bug to pass an iterator. The only thing >> special about map and filter as errors is the rebinding of the names >> between Py2 and Py3, so that the same code may be good in 2.x and bad in >> 3.x. > > It's not a user bug if you're porting a massive computer algebra > application that happens to use Python as its implementation language > (rather than inventing one from scratch) and your users don't need or > want to know too much about Python 2 vs Python 3. As a former 'scientist who programs' I can understand the desire for ignorance of such details. As a Python core developer, I would say that if you want Sage to allow and cater to such ignorance, you have to either make Sage a '2 and 3' environment, without burdening Python 3, or make future Sage a strictly Python 3 environment (as many scientific stack packages are doing or planning to do). ... > That said, I regret bringing up Sage; I was using it as an example but > I think the point stands on its own. Yes, the issues of hiding versus exposing implementation details, and that of saving versus deleting and, when needed, recreating 'redundant' information, are independent of Sage and 2 versus 3. -- Terry Jan Reedy From tjreedy at udel.edu Thu Nov 29 16:28:19 2018 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 29 Nov 2018 16:28:19 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> <20181129123823.GB4319@ando.pearwood.info> Message-ID: On 11/29/2018 8:16 AM, E. 
Madison Bray wrote: > Okay, let's keep it simple:
>
>     m = map(str, [1, 2, 3])
>     len_of_m = None
>     if len(m.iters) == 1 and isinstance(m.iters[0], Sized):
>         len_of_m = len(m.iters[0])
>
As I have noted before, the existing sized collection __length_hint__ methods (properly) return the remaining items = len(underlying_iterable) - items_already_produced. This is fairly easy at the C level. The following seems to work in Python.

    class map1:
        def __init__(self, func, sized):
            if isinstance(sized, (list, tuple, range, dict)):
                self._iter = iter(sized)
                self._gen = (func(x) for x in self._iter)
            else:
                raise TypeError(f'{sized} not one of list, tuple, range, dict')
        def __iter__(self):
            return self
        def __next__(self):
            return next(self._gen)
        def __length_hint__(self):
            return self._iter.__length_hint__()

    m = map1(int, [1.0, 2.0, 3.0])
    print(m.__length_hint__())
    print('first item', next(m))
    print(m.__length_hint__())
    print('remainder', list(m))
    print(m.__length_hint__())

    # prints, as expected and desired
    3
    first item 1
    2
    remainder [2, 3]
    0

A package could include a version of this, possibly compiled, for use when applicable. -- Terry Jan Reedy From abedillon at gmail.com Thu Nov 29 18:03:57 2018 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 29 Nov 2018 17:03:57 -0600 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128220323.GX4319@ando.pearwood.info> <76635835-61DC-41F3-9F76-036698D38477@gmail.com> Message-ID: [David Mertz] > Do you anticipate that the `len()` function will be able to solve the > Halting Problem? > It is simply not possible to know whether a given iterator will produce > finitely many or infinitely many elements. Even those that will produce > finitely many do not, in general, have a knowable length without running > them until exhaustion. You don't have to solve the halting problem. You simply ask the object. The default behavior would be "I don't know" whether that's communicated by returning None or some other sentinel value (NaN?) or by raising a special exception. Then you simply override the default behavior for cases where the object does or at least might know. itertools.repeat, for example, would have an infinite length unless "times" is provided, in which case its length would be the value of "times". map would return the length of the shortest iterable unless there is an unknown-sized iterable, in which case len would be unknown; if all iterables are infinite, the length would be infinite. We could add a decorator for length and/or length hints on generator functions:

    @length(lambda times: times or float("+inf"))
    def repeat(obj, times=None):
        if times is None:
            while True:
                yield obj
        else:
            for i in range(times):
                yield obj

On Thu, Nov 29, 2018 at 10:40 AM David Mertz wrote: > On Thu, Nov 29, 2018 at 2:29 AM Adrien Ricocotam > wrote: > >> Some suggested above to change the definition of len in the long term. >> Then I think it could be interesting to define len such as :
>>
>> - If it has a finite length : return that length (the way it works now)
>> - If it has a length that is infinity : return infinity
>> - If it has no length : return None
>>
> > Do you anticipate that the `len()` function will be able to solve the > Halting Problem? > > It is simply not possible to know whether a given iterator will produce > finitely many or infinitely many elements. Even those that will produce > finitely many do not, in general, have a knowable length without running > them until exhaustion. > 
> Here's a trivial example:
>
>     >>> from random import random
>     >>> def seq():
>     ...     while random() > 0.1:
>     ...         yield 1
>     >>> len(seq())
>     # What answer do you want here?
>
> Here's a slightly less trivial one:
>
>     In [1]: from itertools import count
>
>     In [2]: def mandelbrot(z):
>        ...:     "Yield each value until escape iteration"
>        ...:     c = z
>        ...:     for n in count():
>        ...:         if abs(z) > 2:
>        ...:             return n
>        ...:         yield z
>        ...:         z = z*z + c
>
> What should len(mandelbrot(my_complex_number)) be? Hint, depending on the > complex number chosen, it might be any Natural Number (or it might not > terminate). > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalala at gmail.com Thu Nov 29 19:31:13 2018 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Thu, 29 Nov 2018 20:31:13 -0400 Subject: [Python-ideas] [off-topic?] Unwinding generators Message-ID: This is code from the Twisted library: https://github.com/twisted/twisted/blob/trunk/src/twisted/internet/defer.py#L1542-L1614 It "unwinds" a generator to yield a result before others. I don't have hard evidence, but my experience is that that kind of manipulation leaks resources, especially if exceptions escape from the final callback. Is there a bug in exception handling in the generator logic, or is unwinding just inherently wrong? How could the needs that unwinding tries to solve be handled with async? -- Juancarlo *Añez* -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul-python at svensson.org Thu Nov 29 20:13:12 2018 From: paul-python at svensson.org (Paul Svensson) Date: Thu, 29 Nov 2018 20:13:12 -0500 (EST) Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert wrote: > > I just ran into the following behavior, and found it surprising: > >>>> len(map(float, [1,2,3])) > TypeError: object of type 'map' has no len() > > I understand that map() could be given an infinite sequence and therefore might not always have a length. But in this case, it seems like map() should've known that its length was 3. I also understand that I can just call list() on the whole thing and get a list, but the nice thing about map() is that it doesn't copy data, so it's unfortunate to lose that advantage for no particular reason. > > My proposal is to delegate map.__len__() to the underlying iterable. Similarly, map.__getitem__() could be implemented if the underlying iterable supports item access: > Excellent proposal, followed by a flood of confused replies, which I will mostly disregard, since all miss the obvious. What's being proposed is simple, either:

* len(map(f, x)) == len(x), or
* both raise TypeError

That implies, loosely speaking:

* map(f, Iterable) -> Iterable, and
* map(f, Sequence) -> Sequence

But, *not*:

* map(f, Iterable|Sequence) -> Magic.

So, the map() function becomes a factory, returning an object with __len__ or without, depending on what it was called with. /Paul From abedillon at gmail.com Thu Nov 29 23:59:43 2018 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 29 Nov 2018 22:59:43 -0600 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: That would be great especially if it returned objects of a subclass of map so that it didn't break any code that checks isinstance; however, I think this goes a little beyond map. I've run into cases using itertools where I wished the iterators could support len. I suppose you could turn those all into factories too, but I wonder if that's the most elegant solution. On Thu, Nov 29, 2018 at 7:22 PM Paul Svensson wrote: > On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert > wrote: > > > > I just ran into the following behavior, and found it surprising: > > > >>>> len(map(float, [1,2,3])) > > TypeError: object of type 'map' has no len() > > > > I understand that map() could be given an infinite sequence and > therefore might not always have a length. But in this case, it seems like > map() should've known that its length was 3. I also understand that I can > just call list() on the whole thing and get a list, but the nice thing > about map() is that it doesn't copy data, so it's unfortunate to lose that > advantage for no particular reason. > > > > My proposal is to delegate map.__len__() to the underlying iterable. > Similarly, map.__getitem__() could be implemented if the underlying > iterable supports item access: > > > > Excellent proposal, followed by a flood of confused replies, > which I will mostly disregard, since all miss the obvious. > > What's being proposed is simple, either:
> * len(map(f, x)) == len(x), or
> * both raise TypeError
>
> That implies, loosely speaking:
> * map(f, Iterable) -> Iterable, and
> * map(f, Sequence) -> Sequence
>
> But, *not*:
> * map(f, Iterable|Sequence) -> Magic.
>
> So, the map() function becomes a factory, returning an object > with __len__ or without, depending on what it was called with. > > /Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Fri Nov 30 03:54:47 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Fri, 30 Nov 2018 09:54:47 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181128222714.GY4319@ando.pearwood.info> <20181129123823.GB4319@ando.pearwood.info> <20181129144311.GD4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 7:16 PM Jonathan Fine wrote: > > On Thu, Nov 29, 2018 at 2:44 PM Steven D'Aprano wrote: > > > You might say that your users are not so advanced, or that they're naive > > enough not to even know they could do that, but that's a pretty unsafe > > assumption as well as being rather insulting to your own users, some of > > whom are surely advanced Python coders not just naive dabblers. > > I think that what above all unites Sage users is knowledge of > mathematics. Use of Python would be secondary. The goal surely is to > discover and develop conventions and interfaces that work for such a > group of users. 
In this area the original poster is probably the > expert, and I think should be respected as such. > > Steve's post divides Sage users into "advanced Python coders" and > "naive dabblers". This misses the point, which is to get something > that works well for all users. This, I'd say, is one of the features > of Python's success. Most Python users are people who want to get > something done. > > By the way, I'd expect that most Sage users fall into the middle range > of Python expertise. I think that to focus on the extremes is both > unhelpful and divisive. Yes, thank you. They are all very smart people--most of them much moreso than I. The vast majority are mathematicians first, and software developers second, third, fourth, or even further down the line. Some of the most prolific contributors to Sage barely know how to use git without some wrappers we've provided around it (not that they couldn't learn, but let's be honest git is a terrible tool for anyone who isn't Linus Torvalds). They still write good code and sometimes brilliant algorithms. But they're not all Python experts. Many of them are also students who are only using Python because Sage uses it, and not using Sage because it uses Python. The Sagebook [1] may be their first introduction to Python, and even then it only introduces Python programming in drips and drabs as needed for the topics at hand (e.g. variables, loops, functions). I'm trying to consider users at all levels. [1] http://dl.lateralis.org/public/sagebook/sagebook-ba6596d.pdf From erik.m.bray at gmail.com Fri Nov 30 04:32:31 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Fri, 30 Nov 2018 10:32:31 +0100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: On Thu, Nov 29, 2018 at 9:36 PM Terry Reedy wrote: > >> https://docs.python.org/3/library/functions.html#map says > >> "map(function, iterable, ...) > >> Return an iterator [...]" > >> > >> The wording is intentional. The fact that map is a class and the > >> iterator an instance of the class is a CPython implementation detail. > >> Another implementation could use the generator function equivalent given > >> in the Python 2 itertools doc, or a translation thereof. I don't know > >> what pypy and other implementations do. The fact that CPython itertools > >> callables are (now) C-coded classes instead Python-coded generator > >> functions, or C translations thereof (which is tricky) is for > >> performance and ease of maintenance. > > > > Exactly how intentional is that wording though? > > The use of 'iterator' is exactly intended, and the iterator protocol is > *intentionally minimal*, with one iterator specific __next__ method and > one boilerplate __iter__ method returning self. This is more minimal > than some might like. An argument against the addition of length_hint > and __length_hint__ was that it might be seen as extending at least the > 'expected' iterator protocol. The docs were written to avoid this. You still seem to be confusing my point. I'm not advocating even for __length_hint__ (I think there are times that would be useful but it's still pretty problematic). I admit one thing I'm a little stuck on though is that map() currently just immediately calls iter() on its arguments to get their iterators, and does not store references to the original iterables. It would be nice if more iterators could have an exposed reference to the objects they're iterating, in cases where that's even meaningful. 
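(A library can already build such a wrapper for itself in pure Python -- a rough sketch, where the name source_iter and its .source attribute are made up here for illustration:)

    class source_iter:
        """Iterator wrapper that keeps a reference to what it iterates."""
        def __init__(self, iterable):
            self.source = iterable        # exposed, for introspection
            self._it = iter(iterable)
        def __iter__(self):
            return self
        def __next__(self):
            return next(self._it)

    it = source_iter([1, 2, 3])
    next(it)
    print(it.source)                      # -> [1, 2, 3]
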
For some reason I thought, for example, that a list_iterator could give me a reference back to the list itself. This was probably omitted intentionally but it still feels pretty limiting :( > > Point being, I don't think it's a massive leap or > > imposition on any implementation to go from "Return an iterator [...]" > > to "Return an iterator that has these attributes [...]" > > Do you propose exposing the inner struct members of *all* C-coded > iterators? (And would you propose that all Python-coded iterators > should use public names for the equivalents?) Some subset thereof? > (What choice rule?) Or only for map? If the latter, why do you > consider map so special? Not necessarily, no. But certainly a few: I'm using map() as an example but at the very least map() and filter(). An exact choice rule is something worth thinking about but I don't think you're going to find an "objective" rule. I think it goes without saying that map() is special in a way: It's one of the most basic extensions to function application and is a fundamental construct in functional programming and from a category-theoretical perspective. I'm not saying Python's built-in map() needs to represent anything mathematically formal, but it's certainly quite fundamental which is why it's a built-in in the first place. > >>> This is necessary because if I have a function that used to take, say, > >>> a list as an argument, and it receives a `map` object, I now have to > >>> be able to deal with map()s, > > In both 2 and 3, the function has to deal with iterator inputs one way > or another. In both 2 and 3, possible iterator inputs include maps > passed as generator comprehensions, '(<expression> for x in > iterable)'. Yes, but those are still less common, and generator expressions were not even around when Sage was first started: I've been around long enough to remember when they were added to the language, and were well predated by map() and filter(). The Sagebook [1] introduces them around page 60. I'm not sure if it even introduces generator expressions at all. I think a lot of Python and C++ experts don't realize that the "iterator" concept is not at all immediately obvious to a lot of non-programmers. Most iterator inputs supplied by users are things like sized collections for which it's easy to think about "going over them one by one" and not more abstract iterators. This is true whether the user is a Python expert or not. > >> If a function is documented as requiring a list, or a sequence, or a > >> length object, it is a user bug to pass an iterator. The only thing > >> special about map and filter as errors is the rebinding of the names > >> between Py2 and Py3, so that the same code may be good in 2.x and bad in > >> 3.x. > > > > It's not a user bug if you're porting a massive computer algebra > > application that happens to use Python as its implementation language > > (rather than inventing one from scratch) and your users don't need or > > want to know too much about Python 2 vs Python 3. > > As a former 'scientist who programs' I can understand the desire for > ignorance of such details. As a Python core developer, I would say that > if you want Sage to allow and cater to such ignorance, you have to > either make Sage a '2 and 3' environment, without burdening Python 3, or > make future Sage a strictly Python 3 environment (as many scientific > stack packages are doing or planning to do). "ignorance" is not a word I would use here, frankly. > ... 
> > That said, I regret bringing up Sage; I was using it as an example but > > I think the point stands on its own. > > Yes, the issues of hiding versus exposing implementation details, and > that of saving versus deleting and, when needed, recreating 'redundant' > information, are independent of Sage and 2 versus 3. I agree there: this is not really an argument about Sage or Python 2/3. Though I don't think this is an "implementation detail". In an abstract sense a map is a special container for a function and a sequence that has special semantics. As far as I'm concerned this is what it *is* in some ontological sense, and this fact is not a mere implementation detail. [1] http://dl.lateralis.org/public/sagebook/sagebook-ba6596d.pdf From steve at pearwood.info Fri Nov 30 10:45:01 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 1 Dec 2018 02:45:01 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: <20181130154500.GJ4319@ando.pearwood.info> On Fri, Nov 30, 2018 at 10:32:31AM +0100, E. Madison Bray wrote: > I think it goes without saying that > map() is special in a way: It's one of the most basic extensions to > function application and is a fundamental construct in functional > programming and from a category-theoretical perspective. I'm not > saying Python's built-in map() needs to represent anything > mathematically formal, but it's certainly quite fundamental which is > why it's a built-in in the first place. It's a built-in in the first place, because back in Python 0.9 or 1.0 or thereabouts, a fan of Lisp added it to the builtins (together with filter and reduce) and nobody objected (possibly because they didn't notice) at the time. It was much easier to add things to the language back then. During the transition to Python 3, Guido wanted to remove all three (as well as lambda): https://www.artima.com/weblogs/viewpost.jsp?thread=98196 Although map, filter and lambda have stayed, reduce has been relegated to the functools module. From steve at pearwood.info Fri Nov 30 20:17:35 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 1 Dec 2018 12:17:35 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> Message-ID: <20181201011734.GN4319@ando.pearwood.info> On Thu, Nov 29, 2018 at 08:13:12PM -0500, Paul Svensson wrote: > Excellent proposal, followed by a flood of confused replies, > which I will mostly disregard, since all miss the obvious. When everyone around you is making technical responses which you think are "confused", it is wise to consider the possibility that it is you who is missing something rather than everyone else. > What's being proposed is simple, either: > * len(map(f, x)) == len(x), or > * both raise TypeError Simple, obvious, and problematic. Here's a map object I prepared earlier:

    from itertools import islice
    mo = map(lambda x: x, "aardvark")
    list(islice(mo, 3))

If I now pass you the map object, mo, what should len(mo) return? Five or eight? No matter which choice you make, you're going to surprise and annoy people, and there will be circumstances where that choice will introduce bugs into their code. > That implies, loosely speaking: > * map(f, Iterable) -> Iterable, and > * map(f, Sequence) -> Sequence But map objects aren't sequences. They're iterators. 
Just adding a __len__ method isn't going to make them sequences (not even "loosely speaking") or solve the problem above. In principle, we could make this work by turning the output of map() into a view like dict.keys() etc, or a lazy sequence type like range(), wrapping the underlying sequence. That might be worth exploring. I can't think of any obvious problems with a view-like interface, but that doesn't mean there aren't any. I've spent like 30 seconds thinking about it, so the fact that I can't see any problems with it means little. But it's also a big change, not just a matter of exposing the __len__ method of the underlying iterable (or iterables).
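(For concreteness, such a view might look something like this minimal sketch -- a single sized input only, with slicing and multiple iterables left out:)

    from collections.abc import Sequence

    class MapView(Sequence):
        """Lazy, sized, re-iterable map over one underlying sequence."""
        def __init__(self, func, seq):
            self.func = func
            self.seq = seq
        def __len__(self):
            return len(self.seq)
        def __getitem__(self, index):
            return self.func(self.seq[index])

    squares = MapView(lambda x: x * x, range(10))
    print(len(squares))       # -> 10
    print(squares[3])         # -> 9
    print(list(squares)[:4])  # -> [0, 1, 4, 9]

-- Steve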