From bitsink at gmail.com Mon Apr 1 00:16:35 2019
From: bitsink at gmail.com (Nam Nguyen)
Date: Sun, 31 Mar 2019 21:16:35 -0700
Subject: [Python-ideas] Built-in parsing library
In-Reply-To: References: Message-ID:

On Sun, Mar 31, 2019 at 12:13 PM David Mertz wrote:

> I just found this nice summary. It's not complete, but it looks well
> written. https://tomassetti.me/parsing-in-python/
>
> On Sun, Mar 31, 2019, 3:09 PM David Mertz wrote:
>
>> There are about a half dozen widely used parsing libraries for Python.
>> Each one of them takes a dramatically different approach to defining a
>> grammar. Each one has been debugged for over a decade.
>>
>> While I can imagine proposing one for inclusion in the standard library,
>> you'd have to choose one (or write a new one) and explain why that one is
>> better for everyone (or at least a better starting point) than all the
>> others are.
>>

I'm not at that stage, yet. By the way, it still is not clear to me if you
think having one in the stdlib is desirable.

>> You'll also have to explain why it needs to be in the standard library
>> rather than installed by 'pip install someparser'.
>>

Installing a package from outside the stdlib does not solve the problem
that motivated this thread. The libraries included in the stdlib can't use
those parsers.

Cheers,
Nam

>>> On Sat, Mar 30, 2019, 1:58 PM Nam Nguyen wrote:
>>>
>>> Hello list,
>>>
>>> What do you think of a universal parsing library in the stdlib mainly
>>> for use by other libraries in the stdlib?
>>>
>>> Throughout the years we have had many issues with protocol parsing.
>>> Some have even introduced security bugs. The main cause of these issues
>>> is the use of simple regular expressions.
>>>
>>> Having a universal parsing library in the stdlib would help cut down
>>> these issues. Such a library should be minimal yet encompassing, and
>>> whole parse trees should be entirely expressible in code. I am thinking
>>> of combinatoric parsing as the main candidate that fits this bill.
>>>
>>> What do you say?
>>>
>>> Thanks!
>>> Nam
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From boxed at killingar.net Mon Apr 1 01:07:34 2019
From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=)
Date: Mon, 1 Apr 2019 07:07:34 +0200
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info>
Message-ID: <639C3854-415A-4153-86FE-BD713F3859DE@killingar.net>

> On 1 Apr 2019, at 02:23, David Mertz wrote:
>
>> On Sun, Mar 31, 2019, 8:11 PM Steven D'Aprano wrote:
>> Regarding later proposals to add support for multiple affixes, to
>> recursively delete the affix repeatedly, and to take an additional
>> argument to limit how many affixes will be removed: YAGNI.
>
> That's simply not true, and I think it's clearly illustrated by the
> example I gave a few times. Not just conceivably, but FREQUENTLY I write
> code to accomplish the effect of the suggested:
>
> basename = fname.rstrip(('.jpg', '.gif', '.png'))
>
> I probably do this MORE OFTEN than removing a single suffix.

Doing this with a for loop and without_suffix is fine though. Without
without_suffix it's suddenly error prone.
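E.g., a minimal sketch (a plain helper stands in for the proposed, not yet
existing, without_suffix method):

    # Hypothetical stand-in for the proposed str.without_suffix();
    # removes one occurrence of the given suffix, if present. The
    # "suffix and" guard avoids s[:-0] turning the result into "".
    def without_suffix(s, suffix):
        return s[:-len(suffix)] if suffix and s.endswith(suffix) else s

    fname = 'photo.jpg'
    for ext in ('.jpg', '.gif', '.png'):
        cut = without_suffix(fname, ext)
        if cut != fname:
            fname = cut
            break        # first matching extension wins; stop here
    print(fname)         # -> 'photo'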
With a without_suffix that takes a tuple it's unclear what happens without
reading the code. I think a single string argument is a great sweet spot:
avoid the most error prone part and keep the loop in user code.

/ Anders
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From greg.ewing at canterbury.ac.nz Mon Apr 1 01:06:23 2019
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 01 Apr 2019 18:06:23 +1300
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <20190331235950.GJ31406@ando.pearwood.info>
References: <23709.35761.793748.866930@turnbull.sk.tsukuba.ac.jp> <20190330013736.GU31406@ando.pearwood.info> <23711.45063.758826.321466@turnbull.sk.tsukuba.ac.jp> <20190331044323.GB6059@ando.pearwood.info> <07170fd4-0195-f2d4-d2d7-77f7272e263b@potatochowder.com> <5CA14701.7090504@canterbury.ac.nz> <20190331235950.GJ31406@ando.pearwood.info>
Message-ID: <5CA19C4F.6090104@canterbury.ac.nz>

Steven D'Aprano wrote:
> The best thing is that there will no longer be any confusion as to
> whether you are looking at a Unicode string or a byte-string:
>
> a = a.new_string_trimmed_on_the_left()
> a = a.new_bytes_trimmed_on_the_left()

To keep the RUE happy, instead of a "string" we should call it a
"mathematically_valid_encoded_code_point_sequence".

--
Greg

From guido at python.org Mon Apr 1 01:14:41 2019
From: guido at python.org (Guido van Rossum)
Date: Sun, 31 Mar 2019 22:14:41 -0700
Subject: [Python-ideas] Built-in parsing library
In-Reply-To: References: Message-ID:

We do have a parser generator in the standard library:
https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2

On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen wrote:

> On Sun, Mar 31, 2019 at 12:13 PM David Mertz wrote:
>
>> I just found this nice summary. It's not complete, but it looks well
>> written. https://tomassetti.me/parsing-in-python/
>>
>> On Sun, Mar 31, 2019, 3:09 PM David Mertz wrote:
>>
>>> There are about a half dozen widely used parsing libraries for Python.
>>> Each one of them takes a dramatically different approach to defining a
>>> grammar. Each one has been debugged for over a decade.
>>>
>>> While I can imagine proposing one for inclusion in the standard library,
>>> you'd have to choose one (or write a new one) and explain why that one
>>> is better for everyone (or at least a better starting point) than all
>>> the others are.
>>>
> I'm not at that stage, yet. By the way, it still is not clear to me if you
> think having one in the stdlib is desirable.
>
>>> You'll also have to explain why it needs to be in the standard library
>>> rather than installed by 'pip install someparser'.
>>>
> Installing a package from outside the stdlib does not solve the problem
> that motivated this thread. The libraries included in the stdlib can't
> use those parsers.
>
> Cheers,
> Nam
>
>>> On Sat, Mar 30, 2019, 1:58 PM Nam Nguyen wrote:
>>>
>>>> Hello list,
>>>>
>>>> What do you think of a universal parsing library in the stdlib mainly
>>>> for use by other libraries in the stdlib?
>>>>
>>>> Throughout the years we have had many issues with protocol parsing.
>>>> Some have even introduced security bugs. The main cause of these
>>>> issues is the use of simple regular expressions.
>>>>
>>>> Having a universal parsing library in the stdlib would help cut down
>>>> these issues. Such a library should be minimal yet encompassing, and
>>>> whole parse trees should be entirely expressible in code.
>>>> I am thinking of combinatoric parsing as the main candidate that fits
>>>> this bill.
>>>>
>>>> What do you say?
>>>>
>>>> Thanks!
>>>> Nam
>>>> _______________________________________________
>>>> Python-ideas mailing list
>>>> Python-ideas at python.org
>>>> https://mail.python.org/mailman/listinfo/python-ideas
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From boxed at killingar.net Mon Apr 1 01:15:15 2019
From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=)
Date: Mon, 1 Apr 2019 07:15:15 +0200
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info>
Message-ID:

> On 1 Apr 2019, at 04:58, David Mertz wrote:
>
>> Lacking a good set of semantics for removing multiple affixes at once, we
>> shouldn't rush to guess what people want. You don't even know what
>> behaviour YOU want, let alone what the community as a whole needs.
>
> This is both dumb and dishonest. There are basically two choices, both
> completely clear. I think the more obvious one is to treat several
> prefixes or suffixes as substring class, much as .[rl]strip() does
> character class.

Please don't say "dumb and dishonest". Especially not when you directly
follow up by radically redefining what you want. To get the same semantics
as strip() it must follow that

"foofoobarfoo".without_suffix(("foo", "bar")) == ""

/ Anders
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jamtlu at gmail.com Mon Apr 1 01:43:46 2019
From: jamtlu at gmail.com (James Lu)
Date: Mon, 1 Apr 2019 01:43:46 -0400
Subject: [Python-ideas] Built-in parsing library
In-Reply-To: References: Message-ID:

Stack-based LL(1) push down automata can be implemented by hand; indeed,
isn't that what a TextMate language file is? There's also the option of
using Iro to generate a tmLanguage.

From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Apr 1 02:00:16 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Mon, 1 Apr 2019 15:00:16 +0900
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <07170fd4-0195-f2d4-d2d7-77f7272e263b@potatochowder.com>
References: <1E2D83BC-326E-4278-A46C-7FE4F7FE2560@getmailspring.com> <669F88F5-3CB2-4B33-A451-621307BDBD0F@killingar.net> <23709.35761.793748.866930@turnbull.sk.tsukuba.ac.jp> <20190330013736.GU31406@ando.pearwood.info> <23711.45063.758826.321466@turnbull.sk.tsukuba.ac.jp> <20190331044323.GB6059@ando.pearwood.info> <07170fd4-0195-f2d4-d2d7-77f7272e263b@potatochowder.com>
Message-ID: <23713.43248.715433.532329@turnbull.sk.tsukuba.ac.jp>

On 3/31/19 1:48 AM, Chris Angelico wrote:

> > * strip_prefix/strip_suffix

I don't like "strip" because .strip already has a different meaning,
although the inclusion of prefix/suffix makes the intended semantics clear
enough for the new methods. I wonder if it might make the semantics of
.strip even harder to learn, though.

> > * cut_prefix/cut_suffix
> > * cut_start/cut_end

Substitute "trim" or "crop" for "cut" in any of the above, because "cut"
might mean "split".
I don't think it's very important, and prefer "cut" because it will come early in an alphabetical list of public string methods (discoverability for the new methods). > > * Any of the above with the underscore removed > > * lcut/rcut > > * ltrim/rtrim (and maybe trim) > > * truncate (end only, no from-start equivalent) Dan Sommers writes: > without_prefix > without_suffix > > They're a little longer, but IMO "without" helps > reenforce the immutability of the underlying string. None > of these functions actually remove part of the original > string, but rather they return a new string that's the > original string without some piece of it. I think this rationale is plausible but don't think it's important enough to justify the additional length over "cut". Another possibility to address this would be to use past tense: prefix_trimmed prefix_cut # I think this is awkward. but writing it out makes me think "nah". Regarding allowing a tuple argument, I don't see any reason not to take the "cut the first matching affix and return what's left" semantics, which is closely analogous to how startswith/endswith work. As long as the verb isn't "strip", of course. For me, this possibility puts the last nail in any variation on "strip". I don't see a good reason for the "longest match" variation, except the analogy to POSIX sematics for regexps, which seems pretty weak to me. Steve From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Apr 1 02:16:01 2019 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 1 Apr 2019 15:16:01 +0900 Subject: [Python-ideas] Built-in parsing library In-Reply-To: References: Message-ID: <23713.44193.803088.183109@turnbull.sk.tsukuba.ac.jp> David Mertz writes: > While I can imagine proposing one for inclusion in the standard > library, you'd have to choose one (or write a new one) and explain > why that one is better for everyone (or at least a better starting > point) than all the others are. In principle, no, one just needs to explain why this battery fits most of the toys encountered in practice. That's good enough, and if during discussion somebody shows another one is better on a lot of fronts, sure, do that instead. We should avoid letting the perfect be the enemy of the good (as people keep insisting about str.cutsuffix). Politically, sure, it's almost 100% certain that somebody will object that there's a whole class of cases handled by the PackMule parser that the ShavedYacc parser doesn't handle, and somebody else will point out the opposite, so neither is acceptable. Ignore them, they're both wrong about "acceptable". ;-) > You're also have to explain why it needs to be in the standard > library rather than installed by 'pip install someparser'. Again, the bar isn't so high as "needs". There's a balance of equities, such as people with Python installations restricted by QA or security vetting, applications where you really don't want to spend most of your hour allocated to teaching the feature downloading requirements, and cases where pretty much everybody performs the task frequently (for some value of frequently), vs. costs of maintenance (we generally require that a core developer vouch for someone who volunteers to take responsibility for it for 3-5 years) and effects on complexity of learning Python (usually not great for such a module, since the excess burden on documentation ends up being one line in the TOC and a half-dozen in the index). Yes, Nam should be prepared for pushback on both grounds. 
Most pressingly, without a specific package being proposed, discussion will just go in circles indefinitely. But a parser generator package is something that's been lurking, waiting for an enthusiastic proponent for a long time. There's a lot of low-level support for it. Maybe it just needs a specific proposal to take off. And maybe it won't. He won't know unless he tries. Steve P.S. Guido mentioned lib2to3.pgen2, which is in the stdlib. But help(pgen2) isn't very helpful, so there's at least some documentation work to be done there. From p.f.moore at gmail.com Mon Apr 1 03:20:03 2019 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Apr 2019 08:20:03 +0100 Subject: [Python-ideas] New explicit methods to trim strings In-Reply-To: <20190331235950.GJ31406@ando.pearwood.info> References: <23709.35761.793748.866930@turnbull.sk.tsukuba.ac.jp> <20190330013736.GU31406@ando.pearwood.info> <23711.45063.758826.321466@turnbull.sk.tsukuba.ac.jp> <20190331044323.GB6059@ando.pearwood.info> <07170fd4-0195-f2d4-d2d7-77f7272e263b@potatochowder.com> <5CA14701.7090504@canterbury.ac.nz> <20190331235950.GJ31406@ando.pearwood.info> Message-ID: On Mon, 1 Apr 2019 at 01:01, Steven D'Aprano wrote: > > On Mon, Apr 01, 2019 at 12:02:25PM +1300, Greg Ewing wrote: > > Dan Sommers wrote: > > > > >without_prefix > > >without_suffix > > > > > >They're a little longer, but IMO "without" helps > > >reenforce the immutability of the underlying string. > > > > We don't seem to worry about that distinction for other > > string methods, such as lstrip and rstrip. > > Perhaps we ought to. In the spirit of today's date, let me propose > renaming existing string methods to be more explicit, e.g.: > > str.new_string_in_uppercase > str.new_string_with_substrings_replaced > str.new_string_filled_to_the_given_length_with_zeroes_on_the_left > str.new_string_with_character_translations_not_natural_language_translations > > The best thing is that there will no longer be any confusion as to > whether you are looking at a Unicode string or a byte-string: > > a = a.new_string_trimmed_on_the_left() > a = a.new_bytes_trimmed_on_the_left() > > *wink* In order to support duck typing can I suggest that we also add bytes.new_string_trimmed_on_the_left() str.new_bytes_trimmed_on_the_left() These will do the obvious thing, so they do not need documenting. Obviously only needed for Python 3.x, as in 2.x people never use variables that may sometimes be strings and sometimes bytes. *wink* (although the joke is obvious, and so should not need documenting :-P) Paul From cspealma at redhat.com Mon Apr 1 08:26:08 2019 From: cspealma at redhat.com (Calvin Spealman) Date: Mon, 1 Apr 2019 08:26:08 -0400 Subject: [Python-ideas] PEP-582 and multiple Python versions Message-ID: While the PEP does show the version number as part of the path to the actual packages, implying support for multiple versions, this doesn't seem to be spelled out in the actual text. Presumably __pypackages__/3.8/ might sit beside __pypackages__/3.9/, etc. to keep future versions capable of installing packages for each version, the way virtualenv today is bound to one version of Python. I'd like to raise a potential edge case that might be a problem, and likely an increasingly common one: users with multiple installations of the *same* version of Python. This is actually a common setup for Windows users who use WSL, Microsoft's Linux-on-Windows solution, as you could have both the Windows and Linux builds of a given Python version installed on the same machine. 
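(To make the collision concrete: both builds would report the same version
and so would share __pypackages__/3.8/. Below is a purely hypothetical
sketch of a key that would tell them apart; the naming scheme is my own
invention, not anything in the PEP:)

    # Hypothetical sketch, not part of PEP 582: derive a directory key
    # that distinguishes same-version interpreters on different platforms.
    import hashlib
    import sys

    def interpreter_key():
        tag = "{}.{}-{}-{}".format(
            sys.version_info.major,       # e.g. 3
            sys.version_info.minor,       # e.g. 8
            sys.implementation.name,      # e.g. "cpython"
            sys.platform,                 # "win32" vs. "linux"
        )
        # A short hash of the tag (or of sys.executable) could stand in
        # for, or supplement, the readable tag.
        return tag + "-" + hashlib.sha256(tag.encode()).hexdigest()[:8]

    print(interpreter_key())  # e.g. "3.8-cpython-linux-1f2e3d4c"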
The currently implied support for multiple versions would not be able to separate these and could create problems if users pip install a Windows binary package through Powershell and then try to run a script in Bash from the same directory, causing the Linux version of Python to try to use Windows python packages. I'm not actually sure what the solution here is. Mostly I wanted to raise the concern, because I'm very keen on WSL being a great entry path for new developers and I want to make that a better experience, not a more confusing one. Maybe that version number could include some other unique identify, maybe based on Python's own executable. A hash maybe? I don't know if anything like that already exists to uniquely identify a Python build or installation. -- CALVIN SPEALMAN SENIOR QUALITY ENGINEER cspealma at redhat.com M: +1.336.210.5107 TRIED. TESTED. TRUSTED. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Apr 1 09:41:28 2019 From: mertz at gnosis.cx (David Mertz) Date: Mon, 1 Apr 2019 09:41:28 -0400 Subject: [Python-ideas] Built-in parsing library In-Reply-To: <23713.44193.803088.183109@turnbull.sk.tsukuba.ac.jp> References: <23713.44193.803088.183109@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Apr 1, 2019 at 2:16 AM Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > In principle, no, one just needs to explain why this battery fits most > of the toys encountered in practice. That's good enough, and if > during discussion somebody shows another one is better on a lot of > fronts, sure, do that instead. OK, I'll acknowledge my comment might have overstated the bar to overcome. A parser added to the standard library doesn't need to be perfect for everyone. But adding to stdlib *does* provide a kind of endorsement of the right default way to go about things. Among the popular third party libraries, we have several very different attitudes towards designing grammars. On the one hand, there are formal differences in the power of different grammars?some are LL, some LR, some LALR, some Early. Also, some libraries separate parser from lexer while others are scannerless. But most of these can all parse the simple cases fine, so that's good for the 90% coverage. However, cross-cutting that formal power issue, there are two main programming styles used by different libraries. Some libraries use BNF definitions of a grammar as another mini-language inside Python. Exactly where those BNF definitions live varies, but using them is largely similar (i.e. are they in a separate file, in docstrings, contents of variables, etc). And sure, EBNF vs. BNF proper. But other libraries instead use Python functions or classes to define the productions, where each class/function is effectively one term of the grammar. Typically this latter style allows triggering events as soon as some production is encountered?the event could be "accumulate a counter", or "write an output string", or "perform a computation", or other things. There are lots of good arguments for why to use different libraries along the axes I mention, on all sides. What is not possible is to reconcile the very different decisions into a common denominator. Something in the standard library would have to be partisan in selecting one particular approach as the "official" one. 
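To give a concrete flavour of the second style, here is a deliberately
naive combinator sketch -- my own illustration, not code from any of the
libraries mentioned:

    # Naive illustration of the "productions as Python functions" style.
    # Each parser takes (text, pos) and returns (value, new_pos), or
    # None on failure.
    def literal(s):
        def parse(text, pos):
            if text.startswith(s, pos):
                return s, pos + len(s)
            return None
        return parse

    def either(p, q):            # ordered choice: try p, then q
        def parse(text, pos):
            return p(text, pos) or q(text, pos)
        return parse

    def sequence(p, q):          # p followed by q
        def parse(text, pos):
            r = p(text, pos)
            if r is None:
                return None
            v1, pos = r
            r = q(text, pos)
            if r is None:
                return None
            v2, pos = r
            return (v1, v2), pos
        return parse

    scheme = sequence(either(literal("https"), literal("http")),
                      literal("://"))
    print(scheme("https://example.com", 0))  # (('https', '://'), 8)

Note that even this toy forces a design decision (ordered choice: "https"
must be tried before "http"), which is exactly the kind of partisanship I
mean.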
--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Apr 1 10:24:55 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Mon, 1 Apr 2019 23:24:55 +0900
Subject: [Python-ideas] Built-in parsing library
In-Reply-To: References: <23713.44193.803088.183109@turnbull.sk.tsukuba.ac.jp>
Message-ID: <23714.7991.700779.260358@turnbull.sk.tsukuba.ac.jp>

David Mertz writes:

> OK, I'll acknowledge my comment might have overstated the bar to overcome.
> A parser added to the standard library doesn't need to be perfect for
> everyone. But adding to stdlib *does* provide a kind of endorsement of the
> right default way to go about things.

Indeed it does, but TOOWTDI is not absolute.

> However, cross-cutting that formal power issue, there are two main
> programming styles used by different libraries.

I concede this tends to raise the bar quite a bit.

> Something in the standard library would have to be partisan in
> selecting one particular approach as the "official" one.

Perhaps. Even there, though, we have an example: XML. We gotcher SAX, we
gotcher DOM, we gotcher ElementTree, we gotcher expat. I think XML
processing is probably a *lot* more used and in a lot more modes than
general parsing. But the analogy is valid, even though I can't say it's
powerful *enough*.

There definitely is a bar to clear. I don't know if it's worth Nam's
effort to try to clear it -- there's no guarantee of success on something
like this. I just think we shouldn't be *too* discouraging. And I
personally think parsing formal languages is an important enough field to
deserve consideration for stdlib inclusion, even if it's not going to be
used every day.

Steve

From antoine.pietri1 at gmail.com Mon Apr 1 10:27:50 2019
From: antoine.pietri1 at gmail.com (Antoine Pietri)
Date: Mon, 1 Apr 2019 16:27:50 +0200
Subject: [Python-ideas] Backward-incompatible changes for Python 4
Message-ID:

While the switch to Python 3 did an excellent job in removing some of the
old inconsistencies we had in the language, pretty much everyone agrees
that some other backwards-incompatible changes could be made to remove
some old warts and bring even more consistency to Python.

Since Python 4 is getting closer and closer, I think it's time to finally
discuss some of the most obvious changes we should do for Python 4. Here
is the list I compiled:

- The / operator returns floats, which loses information when both of the
operands are integers. In Python 4, "1 / 2" should return a
decimal.Decimal. To ease the transition, we propose to add a new "from
__future__ import decimal_division" in Python 3.9 to enable this behavior.
- As most of the Python ecosystem is moving towards async, some of the old
I/O-blocking APIs should be progressively migrated to an async-by-default
model. The most obvious candidate to start this transition is the print
function, which blocks on the I/O of flushes. We propose to make "print"
an async coroutine. In Python 3.9, this feature could be optionally
enabled with "from __future__ import print_coroutine".
- To ease compatibility with the Windows API, the PyUnicode* objects
should be internally represented as an array of uint16_t, as it would
avoid the conversion overhead from UCS. CPython migration details are left
as an exercise for the developer.

We think more changes are obviously warranted (e.g. adding a new string
formatting module, changing the semantics of the import system, using :=
in with statements...), but these changes will need specific threads of
their own.

So, can you think of other backward-incompatible changes that should be
done in Python 4? Don't hesitate to add your own ideas :-)

Thanks,

--
Antoine Pietri

From 2QdxY4RzWzUUiLuE at potatochowder.com Mon Apr 1 10:41:40 2019
From: 2QdxY4RzWzUUiLuE at potatochowder.com (Dan Sommers)
Date: Mon, 1 Apr 2019 10:41:40 -0400
Subject: [Python-ideas] Backward-incompatible changes for Python 4
In-Reply-To: References: Message-ID: <22ff9916-6971-ad7d-c908-05a114954abf@potatochowder.com>

On 4/1/19 10:27 AM, Antoine Pietri wrote:

> - The / operator returns floats, which loses information when both of
> the operands are integers. In Python 4, "1 / 2" should return a
> decimal.Decimal. To ease the transition, we propose to add a new "from
> __future__ import decimal_division" in Python 3.9 to enable this
> behavior.

'1 / 2' should be a syntax error.

"1 / 2" should return a string.

1 / 2 should return a fractions.Fraction.

From solipsis at pitrou.net Mon Apr 1 10:50:30 2019
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 1 Apr 2019 16:50:30 +0200
Subject: [Python-ideas] Backward-incompatible changes for Python 4
References: <22ff9916-6971-ad7d-c908-05a114954abf@potatochowder.com>
Message-ID: <20190401165030.38d94ec3@fsol>

On Mon, 1 Apr 2019 10:41:40 -0400 Dan Sommers
<2QdxY4RzWzUUiLuE at potatochowder.com> wrote:

> On 4/1/19 10:27 AM, Antoine Pietri wrote:
>
> > - The / operator returns floats, which loses information when both of
> > the operands are integers. In Python 4, "1 / 2" should return a
> > decimal.Decimal. To ease the transition, we propose to add a new "from
> > __future__ import decimal_division" in Python 3.9 to enable this
> > behavior.
>
> '1 / 2' should be a syntax error.
>
> "1 / 2" should return a string.
>
> 1 / 2 should return a fractions.Fraction.

And 01 / 04 / 2019 should return an April 1st datetime.

(except in the US, of course)

Regards

Antoine.

From ericfahlgren at gmail.com Mon Apr 1 10:55:04 2019
From: ericfahlgren at gmail.com (Eric Fahlgren)
Date: Mon, 1 Apr 2019 07:55:04 -0700
Subject: [Python-ideas] Backward-incompatible changes for Python 4
In-Reply-To: <20190401165030.38d94ec3@fsol>
References: <22ff9916-6971-ad7d-c908-05a114954abf@potatochowder.com> <20190401165030.38d94ec3@fsol>
Message-ID:

On Mon, Apr 1, 2019 at 7:50 AM Antoine Pitrou wrote:

> And 01 / 04 / 2019 should return an April 1st datetime.
>
> (except in the US, of course)

Where it would of course be I / IIII / MMXVIIII, unless you have ISO-8601
set in which case it would be MMXVIIII - IIII - I
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From toddrjen at gmail.com Mon Apr 1 10:58:16 2019 From: toddrjen at gmail.com (Todd) Date: Mon, 1 Apr 2019 10:58:16 -0400 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: On Mon, Apr 1, 2019, 10:28 Antoine Pietri wrote: > While the switch to Python 3 did an excellent job in removing some of > the old inconsistencies we had in the language, pretty much everyone > agrees that some other backwards-incompatible changes could be made to > remove some old warts and bring even more consistency to Python. > > Since Python 4 is getting closer and closer, I think it?s time to > finally discuss some of the most obvious changes we should do for > Python 4. Here is the list I compiled: > > - The / operator returns floats, which loses information when both of > the operands are integer. In Python 4, ?1 / 2? should return a > decimal.Decimal. To ease the transition, we propose to add a new ?from > __future__ import decimal_division? in Python 3.9 to enable this > behavior. > - As most of the Python ecosystem is moving towards async, some of the > old I/O-blocking APIs should be progressively migrated to an async by > default model. The most obvious candidate to start this transition is > the print function, which blocks on the I/O of flushes. We propose to > make ?print? an async coroutine. In Python 3.9, this feature could be > optionally enabled with ?from __future__ import print_coroutine?. > - To ease compatibility with the Windows API, the PyUnicode* objects > should be internally represented as an array of uint16_t, as it would > avoid the conversion overhead from UCS. CPython migration details are > left as an exercise for the developer. > > We think more changes are obviously warranted (e.g adding a new string > formatting module, changing the semantic of the import system, using > := in with statements...), but these changes will need specific > threads of their own. > > So, can you think of other backward-incompatible changes that should > be done in Python 4? Don't hesitate to add your own ideas :-) > > Thanks, > We should probably just get rid of floats entirely and use decimals internally. Floats just have too much unexpected behavior. Anyone wanting real IEEE floats can still use numpy float scalars. Another thing is that a lot of people in numeric computing are used to open-ended indices starting at 1. I would like to see the starting index and whether an index is open-ended or closed-ended configurable on a per-module basis and/or with a context manager. Currently there is no empty set literal. This is a hold-over from when there were no sets. Now would be a good opportunity to add one. I suggest {} become an empty set and {:} be an empty dict. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Apr 1 11:03:01 2019 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Apr 2019 16:03:01 +0100 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: On Mon, 1 Apr 2019 at 15:59, Todd wrote: > Currently there is no empty set literal. This is a hold-over from when there were no sets. Now would be a good opportunity to add one. I suggest {} become an empty set and {:} be an empty dict. There should be no need for two styles - now that Python has type inference, it should be possible for users to just type {} and have the interpreter work out which was intended from context. 
Any ambiguity can easily be resolved by using a type hint. Paul From boxed at killingar.net Mon Apr 1 11:35:31 2019 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Mon, 1 Apr 2019 17:35:31 +0200 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: Please let's all agree that April 1 is the worst day of the year. > On 1 Apr 2019, at 16:27, Antoine Pietri wrote: > > While the switch to Python 3 did an excellent job in removing some of > the old inconsistencies we had in the language, pretty much everyone > agrees that some other backwards-incompatible changes could be made to > remove some old warts and bring even more consistency to Python. > > Since Python 4 is getting closer and closer, I think it?s time to > finally discuss some of the most obvious changes we should do for > Python 4. Here is the list I compiled: > > - The / operator returns floats, which loses information when both of > the operands are integer. In Python 4, ?1 / 2? should return a > decimal.Decimal. To ease the transition, we propose to add a new ?from > __future__ import decimal_division? in Python 3.9 to enable this > behavior. > - As most of the Python ecosystem is moving towards async, some of the > old I/O-blocking APIs should be progressively migrated to an async by > default model. The most obvious candidate to start this transition is > the print function, which blocks on the I/O of flushes. We propose to > make ?print? an async coroutine. In Python 3.9, this feature could be > optionally enabled with ?from __future__ import print_coroutine?. > - To ease compatibility with the Windows API, the PyUnicode* objects > should be internally represented as an array of uint16_t, as it would > avoid the conversion overhead from UCS. CPython migration details are > left as an exercise for the developer. > > We think more changes are obviously warranted (e.g adding a new string > formatting module, changing the semantic of the import system, using > := in with statements...), but these changes will need specific > threads of their own. > > So, can you think of other backward-incompatible changes that should > be done in Python 4? Don't hesitate to add your own ideas :-) > > Thanks, > > -- > Antoine Pietri > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Mon Apr 1 14:33:11 2019 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Apr 2019 19:33:11 +0100 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: On Mon, 1 Apr 2019 at 16:36, Anders Hovm?ller wrote: > > Please let's all agree that April 1 is the worst day of the year. Maybe in Python 4, datetime.datetime should silently convert 1st April to the 2nd? :-P Paul From brett at python.org Mon Apr 1 14:27:19 2019 From: brett at python.org (Brett Cannon) Date: Mon, 1 Apr 2019 11:27:19 -0700 Subject: [Python-ideas] PEP-582 and multiple Python versions In-Reply-To: References: Message-ID: I just wanted to warn people that I don't know if any of the authors of PEP 582 subscribe to python-ideas and they have not brought it forward for discussion yet, so there's no guarantee of a response. 
On Mon, Apr 1, 2019 at 5:27 AM Calvin Spealman wrote: > While the PEP does show the version number as part of the path to the > actual packages, implying support for multiple versions, this doesn't seem > to be spelled out in the actual text. Presumably __pypackages__/3.8/ might > sit beside __pypackages__/3.9/, etc. to keep future versions capable of > installing packages for each version, the way virtualenv today is bound to > one version of Python. > > I'd like to raise a potential edge case that might be a problem, and > likely an increasingly common one: users with multiple installations of the > *same* version of Python. This is actually a common setup for Windows users > who use WSL, Microsoft's Linux-on-Windows solution, as you could have both > the Windows and Linux builds of a given Python version installed on the > same machine. The currently implied support for multiple versions would not > be able to separate these and could create problems if users pip install a > Windows binary package through Powershell and then try to run a script in > Bash from the same directory, causing the Linux version of Python to try to > use Windows python packages. > > I'm not actually sure what the solution here is. Mostly I wanted to raise > the concern, because I'm very keen on WSL being a great entry path for new > developers and I want to make that a better experience, not a more > confusing one. Maybe that version number could include some other unique > identify, maybe based on Python's own executable. A hash maybe? I don't > know if anything like that already exists to uniquely identify a Python > build or installation. > > -- > > CALVIN SPEALMAN > > SENIOR QUALITY ENGINEER > > cspealma at redhat.com M: +1.336.210.5107 > > TRIED. TESTED. TRUSTED. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Apr 1 15:02:21 2019 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 1 Apr 2019 20:02:21 +0100 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: <1c854d5a-63cb-8987-b2b9-a8deed0ecac9@mrabarnett.plus.com> On 2019-04-01 19:33, Paul Moore wrote: > On Mon, 1 Apr 2019 at 16:36, Anders Hovm?ller wrote: >> >> Please let's all agree that April 1 is the worst day of the year. > > Maybe in Python 4, datetime.datetime should silently convert 1st April > to the 2nd? :-P > Converting silently is not Pythonic. It should raise an exception. :-) From njs at pobox.com Mon Apr 1 15:17:23 2019 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 1 Apr 2019 12:17:23 -0700 Subject: [Python-ideas] Built-in parsing library In-Reply-To: References: Message-ID: On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen wrote: > Installing a package out of stdlib does not solve the problem that motivated this thread. The libraries included in the stdlib can't use those parsers. Can you be more specific about exactly which code in the stdlib you think should be rewritten to use a parsing library? -n -- Nathaniel J. 
Smith -- https://vorpus.org From jelle.zijlstra at gmail.com Mon Apr 1 16:05:30 2019 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Mon, 1 Apr 2019 13:05:30 -0700 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: El lun., 1 abr. 2019 a las 7:28, Antoine Pietri () escribi?: > While the switch to Python 3 did an excellent job in removing some of > the old inconsistencies we had in the language, pretty much everyone > agrees that some other backwards-incompatible changes could be made to > remove some old warts and bring even more consistency to Python. > > Since Python 4 is getting closer and closer, I think it?s time to > finally discuss some of the most obvious changes we should do for > Python 4. Here is the list I compiled: > > - The / operator returns floats, which loses information when both of > the operands are integer. In Python 4, ?1 / 2? should return a > decimal.Decimal. To ease the transition, we propose to add a new ?from > __future__ import decimal_division? in Python 3.9 to enable this > behavior. > More broadly, one of the best changes in Python 3 was the sanitization of the string/unicode logic: in Python 2 str and unicode were mostly-but-not-always interchangeable, but not always, and that led to a lot of hard to debug errors. Python 3 fixed this by separating the two more cleanly. Python 4 has the opportunity to do something similar to separate out another pair of easily confused types: int and float. Broadly speaking, we should use float for human-understandable numbers, and int for things that map directly to memory offsets in the computer, and we should avoid mixing them. This suggests the following changes: - int + float (and generally any mixed operation between ints and floats) should throw a TypeError - len() should return a float - list.__getitem__ should only accepts ints, not floats - integer overflow should use two's complement wraparound instead of infinite precision > - As most of the Python ecosystem is moving towards async, some of the > old I/O-blocking APIs should be progressively migrated to an async by > default model. The most obvious candidate to start this transition is > the print function, which blocks on the I/O of flushes. We propose to > make ?print? an async coroutine. In Python 3.9, this feature could be > optionally enabled with ?from __future__ import print_coroutine?. > - To ease compatibility with the Windows API, the PyUnicode* objects > should be internally represented as an array of uint16_t, as it would > avoid the conversion overhead from UCS. CPython migration details are > left as an exercise for the developer. > > We think more changes are obviously warranted (e.g adding a new string > formatting module, changing the semantic of the import system, using > := in with statements...), but these changes will need specific > threads of their own. > > So, can you think of other backward-incompatible changes that should > be done in Python 4? Don't hesitate to add your own ideas :-) > > Thanks, > > -- > Antoine Pietri > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bitsink at gmail.com Mon Apr 1 16:08:02 2019 From: bitsink at gmail.com (Nam Nguyen) Date: Mon, 1 Apr 2019 13:08:02 -0700 Subject: [Python-ideas] Built-in parsing library In-Reply-To: References: Message-ID: Sure! Same examples mentioned in Victor's https://vstinner.github.io/tag/security.html could have been fixed by having a more proper parser. This one that I helped author was also a parsing issue. https://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html Thanks for the pointer to pgen2, Guido. I have only quickly skimmed through it and thought it was really closely tied to the Python language. Maybe I'm wrong, so I'll need some time to try it out on some of those previous security fixes. Cheers, Nam On Mon, Apr 1, 2019 at 12:17 PM Nathaniel Smith wrote: > On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen wrote: > > Installing a package out of stdlib does not solve the problem that > motivated this thread. The libraries included in the stdlib can't use those > parsers. > > Can you be more specific about exactly which code in the stdlib you > think should be rewritten to use a parsing library? > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ricocotam at gmail.com Mon Apr 1 17:12:44 2019 From: ricocotam at gmail.com (Adrien Ricocotam) Date: Mon, 1 Apr 2019 23:12:44 +0200 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: Am I being fooled ? I guess yes That?s the worst idea I ever heard. Python is supposed to be easy to use, don?t change it into Rust ! On Mon 1 Apr 2019 at 22:06, Jelle Zijlstra wrote: > > > El lun., 1 abr. 2019 a las 7:28, Antoine Pietri (< > antoine.pietri1 at gmail.com>) escribi?: > >> While the switch to Python 3 did an excellent job in removing some of >> the old inconsistencies we had in the language, pretty much everyone >> agrees that some other backwards-incompatible changes could be made to >> remove some old warts and bring even more consistency to Python. >> >> Since Python 4 is getting closer and closer, I think it?s time to >> finally discuss some of the most obvious changes we should do for >> Python 4. Here is the list I compiled: >> >> - The / operator returns floats, which loses information when both of >> the operands are integer. In Python 4, ?1 / 2? should return a >> decimal.Decimal. To ease the transition, we propose to add a new ?from >> __future__ import decimal_division? in Python 3.9 to enable this >> behavior. >> > More broadly, one of the best changes in Python 3 was the sanitization of > the string/unicode logic: in Python 2 str and unicode were > mostly-but-not-always interchangeable, but not always, and that led to a > lot of hard to debug errors. Python 3 fixed this by separating the two more > cleanly. Python 4 has the opportunity to do something similar to separate > out another pair of easily confused types: int and float. > > Broadly speaking, we should use float for human-understandable numbers, > and int for things that map directly to memory offsets in the computer, and > we should avoid mixing them. 
This suggests the following changes: > - int + float (and generally any mixed operation between ints and floats) > should throw a TypeError > - len() should return a float > - list.__getitem__ should only accepts ints, not floats > - integer overflow should use two's complement wraparound instead of > infinite precision > > >> - As most of the Python ecosystem is moving towards async, some of the >> old I/O-blocking APIs should be progressively migrated to an async by >> default model. The most obvious candidate to start this transition is >> the print function, which blocks on the I/O of flushes. We propose to >> make ?print? an async coroutine. In Python 3.9, this feature could be >> optionally enabled with ?from __future__ import print_coroutine?. >> - To ease compatibility with the Windows API, the PyUnicode* objects >> should be internally represented as an array of uint16_t, as it would >> avoid the conversion overhead from UCS. CPython migration details are >> left as an exercise for the developer. >> >> We think more changes are obviously warranted (e.g adding a new string >> formatting module, changing the semantic of the import system, using >> := in with statements...), but these changes will need specific >> threads of their own. >> >> So, can you think of other backward-incompatible changes that should >> be done in Python 4? Don't hesitate to add your own ideas :-) >> >> Thanks, >> >> -- >> Antoine Pietri >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From 2QdxY4RzWzUUiLuE at potatochowder.com Mon Apr 1 17:37:42 2019 From: 2QdxY4RzWzUUiLuE at potatochowder.com (Dan Sommers) Date: Mon, 1 Apr 2019 17:37:42 -0400 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: On 4/1/19 11:35 AM, Anders Hovm?ller wrote: > Please let's all agree that April 1 is the worst day of the year. I can't reproduce your problem. What version of Python are you running, and on what OS? Please show us your program, and tell us what you expected it to do and what it did that failed to meet those expectations. Copy and paste exactly any error messages or tracebacks you received. See also http://www.sscce.org/ for more information. From greg.ewing at canterbury.ac.nz Mon Apr 1 17:43:19 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 02 Apr 2019 10:43:19 +1300 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: <5CA285F7.70801@canterbury.ac.nz> Paul Moore wrote: > now that Python has type > inference, it should be possible for users to just type {} and have > the interpreter work out which was intended from context. Or have {} return an ambiguous object that turns into a dict or set depending on what is done to it. We could call it a quict (quantum dict). 
To support this, we would also have to add a third possible value for type bool: >>> x = {} >>> isinstance(x, dict) Maybe >>> isinstance(x, set) Maybe >>> x['foo'] = 42 >>> isinstance(x, dict) True >>> isinstance(x, set) False -- Greg From tjreedy at udel.edu Mon Apr 1 18:12:00 2019 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 1 Apr 2019 18:12:00 -0400 Subject: [Python-ideas] Built-in parsing library In-Reply-To: References: Message-ID: On 4/1/2019 1:14 AM, Guido van Rossum wrote: > We do have a parser generator in the standard library: > https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2 It is effectively undocumented and by inference discouraged from use. The entry for lib2to3 in the 2to3 doc: https://docs.python.org/3/library/2to3.html#module-lib2to3 " lib2to3 - 2to3?s library Source code: Lib/lib2to3/ Note: The lib2to3 API should be considered unstable and may change drastically in the future. help(pgen) is not much more helpful. : Help on package lib2to3.pgen2 in lib2to3: NAME lib2to3.pgen2 - The pgen2 package. PACKAGE CONTENTS conv driver grammar literals parse pgen token tokenize FILE c:\programs\python38\lib\lib2to3\pgen2\__init__.py -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Mon Apr 1 18:05:09 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 02 Apr 2019 11:05:09 +1300 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: References: Message-ID: <5CA28B15.8000105@canterbury.ac.nz> Anders Hovm?ller wrote: > Please let's all agree that April 1 is the worst day of the year. Agreed. In light of that, I propose that the datetime module in Python 4 be changed so that April 1 does not exist: >>> m31 = date(2019, 3, 31) >>> m31 + timedelta(days = 1) datetime.date(2019, 4, 2) This would remove a large amount of confusion from the world, and ensure that Python never receives any more backwards incompatible changes. Obviously, removing a whole day from the year will create problems keeping the calendar in step with the seasons. To compensate, it will be necessary to add approximately 1.25 days worth of leap seconds to each year. This works out to about one leap second every 5 minutes. If a suitable algorithm is devised for distributing these "leap minutes" as evenly as possible over the year, this should cause minimal disruption. -- Greg From cs at cskk.id.au Mon Apr 1 18:29:33 2019 From: cs at cskk.id.au (Cameron Simpson) Date: Tue, 2 Apr 2019 09:29:33 +1100 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: <5CA285F7.70801@canterbury.ac.nz> References: <5CA285F7.70801@canterbury.ac.nz> Message-ID: <20190401222933.GA92321@cskk.homeip.net> On 02Apr2019 10:43, Greg Ewing wrote: >Paul Moore wrote: >>now that Python has type >>inference, it should be possible for users to just type {} and have >>the interpreter work out which was intended from context. > >Or have {} return an ambiguous object that turns into a dict or >set depending on what is done to it. > >We could call it a quict (quantum dict). [...] I have concerns that this may lead to excessive memory use. 
My personal interpretation of QM is the continuous many worlds form, where _all_ values of the field are valid and real, and where the supposedly "collapse" on observation/interaction is just a measurement effect: we in the classical domain measure a particular value of the field and all of our future classical view flows consistent with that measurement, but there is _no_ value collapse of the system; the classical operation with the specific value is just a specific view of the uncollapsed QM state space. This is analogous to Python "bool(some-float-value)": we get a True or False but the underlying value is unchanged. As such, the quict type should be managed by an underlying state with a thread local view; when current thread performs a dict-like or set-like operation on the quict that thread should get a thread-local dict or set flavour view of the quict, with the underlying quict still open. In this way multiple threads get their "collapsed" view of the underlying quict, directly supporting many worlds programme execution. Obviously, any operations which do not induce a dict/set "measurement" leave the quict in the uncollapsed view. And it follows that "print(quict)" of an uncollapsed quict should choose a dict or set view for that thread at random and from then on the quict would have that flavour in that thread. For simple dict/set quicts this presents a fairly capped memory use (two flavours, and per thread view state) but companion types such as the quoat have much more scope for heavy memory consumption. Cheers, Cameron Simpson From jcgoble3 at gmail.com Mon Apr 1 18:33:40 2019 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Mon, 1 Apr 2019 18:33:40 -0400 Subject: [Python-ideas] Backward-incompatible changes for Python 4 In-Reply-To: <5CA28B15.8000105@canterbury.ac.nz> References: <5CA28B15.8000105@canterbury.ac.nz> Message-ID: On Mon, Apr 1, 2019 at 6:12 PM Greg Ewing wrote: > Obviously, removing a whole day from the year will create problems > keeping the calendar in step with the seasons. To compensate, it > will be necessary to add approximately 1.25 days worth of leap > seconds to each year. This works out to about one leap second > every 5 minutes. If a suitable algorithm is devised for distributing > these "leap minutes" as evenly as possible over the year, this > should cause minimal disruption. > Far more disruption than you think, because that would result in daylight at midnight and nighttime at noon for a good chunk of the year. Instead, I suggest permanently extending February to 29 days instead, with a 30th day in leap years. This would limit the disruption to a single month (March), and only by an offset of one day. I never understood what February did wrong to be disrespected with such a short month anyway. Instead, February would be equal in length to April most of the time, and every four years (at least within our lifetimes *cough2100cough*) it would get to gloat over being longer than April. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From rosuav at gmail.com Mon Apr 1 18:49:15 2019
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 2 Apr 2019 09:49:15 +1100
Subject: [Python-ideas] Backward-incompatible changes for Python 4
In-Reply-To: References: <5CA28B15.8000105@canterbury.ac.nz>
Message-ID:

On Tue, Apr 2, 2019 at 9:34 AM Jonathan Goble wrote:
>
> On Mon, Apr 1, 2019 at 6:12 PM Greg Ewing wrote:
>>
>> Obviously, removing a whole day from the year will create problems
>> keeping the calendar in step with the seasons. To compensate, it
>> will be necessary to add approximately 1.25 days worth of leap
>> seconds to each year. This works out to about one leap second
>> every 5 minutes. If a suitable algorithm is devised for distributing
>> these "leap minutes" as evenly as possible over the year, this
>> should cause minimal disruption.
>
> Far more disruption than you think, because that would result in daylight
> at midnight and nighttime at noon for a good chunk of the year. Instead,
> I suggest permanently extending February to 29 days instead, with a 30th
> day in leap years. This would limit the disruption to a single month
> (March), and only by an offset of one day. I never understood what
> February did wrong to be disrespected with such a short month anyway.
> Instead, February would be equal in length to April most of the time, and
> every four years (at least within our lifetimes *cough2100cough*) it
> would get to gloat over being longer than April.
>

You don't know what heinous crimes February committed, because they were
overshadowed by March which violated the normal rules by not just having a
single id(), but multiple. Beware the IDs of March.

ChrisA

From steve at pearwood.info Mon Apr 1 20:52:52 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 2 Apr 2019 11:52:52 +1100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info>
Message-ID: <20190402005251.GU31406@ando.pearwood.info>

On Mon, Apr 01, 2019 at 02:29:44PM +1100, Chris Angelico wrote:

> The multiple affix case has exactly two forms:
>
> 1) Tearing multiple affixes off (eg stripping "asdf.jpg.png" down to
> just "asdf"), which most people are saying "no, don't do that, it
> doesn't make sense and isn't needed"

Perhaps I've missed something obvious (it's been a long thread, and I'm
badly distracted with hardware issues that are causing me some
considerable grief), but I haven't seen anyone say "don't do that". But I
have seen David Mertz say that this was the best behaviour:

[quote]
fname = 'silly.jpg.png.gif.png.jpg.gif.jpg'

I'm honestly not sure what behavior would be useful most often for this
oddball case. For the suffixes, I think "remove them all" is probably the
best
[end quote]

I'd also like to point out that this is not an oddball case. There are two
popular platforms where file extensions are advisory, not mandatory (Linux
and Mac), but even on Windows it is possible to get files with multiple,
meaningful, extensions (foo.tar.gz for example) as well as periods used in
place of spaces (a.funny.cat.video.mp4).
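Spelled out, the "remove them all" behaviour David describes is just this
(a sketch using only existing str methods; assumes the suffixes are
non-empty strings):

    # Repeatedly strip whichever of the given extensions is at the end.
    def cut_all_suffixes(name, suffixes):
        while name.endswith(suffixes):     # endswith() accepts a tuple
            for s in suffixes:
                if name.endswith(s):
                    name = name[:-len(s)]
                    break
        return name

    print(cut_all_suffixes('silly.jpg.png.gif.png.jpg.gif.jpg',
                           ('.jpg', '.png', '.gif')))   # -> 'silly'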
> 2) Removing one of several options, which implies that one option is a > strict subpiece of another (eg stripping off "test" and "st") I take it you're only referring to the problematic cases, because there's the third option, where none of the affixes to be removed clash: spam.cut_suffix(("ed", "ing")) But that's pretty uninteresting and a simple loop or repeated call to the method will work fine: spam.cut_suffix("ed").cut_suffix("ing") just as we do with replace: spam.replace(",", "").replace(" ", "") If you only have a few affixes to work with, this is fine. If you have a lot, you may want a helper function, but that's okay. > If anyone is advocating for #1, I would agree with saying YAGNI. David Mertz did. > But #2 is an extremely unlikely edge case, and whatever semantics are > chosen for it, *normal* usage will not be affected. Not just unlikely, but "extremely" unlikely? Presumably you didn't just pluck that statement out of thin air, but have based it on an objective and statistically representative review of existing code and projections of future uses of these new methods. How could I possibly argue with that? Except to say that I think it is recklessly irresponsible for people engaged in language design to dismiss edge cases which will cause users real bugs and real pain so easily. We're not designing for our personal toolbox, we're designing for hundreds of thousands of other people with widely varying needs. It might be rare for you, but for somebody it will be happening ten times a day. And for somebody else, it will only happen once a year, but when it does, their code won't raise an exception it will just silently do the wrong thing. This is why replace does not take a set of multiple targets to replace. The user, who knows their own use-case and what behaviour they want, can write their own multiple-replace function, and we don't have to guess what they want. The point I am making is not that we must not ever support multiple affixes, but that we shouldn't rush that decision. Let's pick the low-hanging fruit, and get some real-world experience with the function before deciding how to handle the multiple affix case. [...] > Or all the behaviours actually do the same thing anyway. In this thread, I keep hearing this message: "My own personal use-case will never be affected by clashing affixes, so I don't care what behaviour we build into the language, so long as we pick something RIGHT NOW and don't give the people actually affected time to use the method and decide what works best in practice for them." Like for the str.replace method, the final answer might be "there is no best behaviour and we should refuse to choose". Why are we rushing to permanently enshrine one specific behaviour into the builtins before any of the users of the feature have a chance to use it and decide for themselves which suits them best? Now is better than never. Although never is often better than *right* now. Somebody (I won't name names, but they know who they are) wrote to me off-list some time ago and accused me of being arrogant and thinking I know more than everyone else. Well perhaps I am, but I'm not so arrogant as to think that I can choose the right behaviour for clashing affixes for other people when my own use-cases don't have clashing affixes. [...] > Sure, but I've often wanted to do something like "strip off a prefix > of http:// or https://", or something else that doesn't have a > semantic that's known to the stdlib. 
I presume there's a reason you aren't using urllib.parse and you just need a string without the leading scheme. If you're doing further parsing, the stdlib has the right batteries for that.

(Aside: perhaps urllib.parse.ParseResult should get an attribute to return the URL minus the scheme? That seems like it would be useful.)

> Also, this is still fairly
> verbose, and a lot of people are going to reach for a regex, just
> because it can be done in one line of code.

Okay, they will use a regex. Is this a problem? We're not planning on banning regexes, are we? If they're happy using regexes, and don't care that it will be perhaps 3 times slower, let them.

> > I posted links to prior art. Unless I missed something, not one of those
> > languages or libraries supports multiple affixes in the one call.
>
> And they don't support multiple affixes in startswith/endswith either,
> but we're very happy to have that in Python.

But not until we had a couple of releases of experience with them:

https://docs.python.org/2.7/library/stdtypes.html#str.endswith

And .replace still only takes a single target to be replaced.

[...]

> We don't have to worry about edge cases that are
> unlikely to come up in real-world code,

And you are making that pronouncement on the basis of what? Your gut feeling? Perhaps you're thinking too narrowly.

Here's a partial list of English prefixes that somebody doing text processing might want to remove to get at the root word:

    a an ante anti auto circum co com con contra contro de dis
    en ex extra hyper il im in ir inter intra intro macro micro
    mono non omni post pre pro sub sym syn tele un uni up

I count fourteen clashes:

    a: an ante anti
    an: ante anti
    co: com con contra contro
    ex: extra
    in: inter intra intro
    un: uni

(That's over a third of this admittedly incomplete list of prefixes.)

I can think of at least one English suffix pair that clash: -ify, -fy. How about other languages? How comfortable are you to say that nobody doing text processing in German or Hindi will need to deal with clashing affixes?

-- Steven

From rosuav at gmail.com Mon Apr 1 21:12:40 2019
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 2 Apr 2019 12:12:40 +1100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <20190402005251.GU31406@ando.pearwood.info>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info>
Message-ID: 

On Tue, Apr 2, 2019 at 11:53 AM Steven D'Aprano wrote:

> The point I am making is not that we must not ever support multiple
> affixes, but that we shouldn't rush that decision. Let's pick the
> low-hanging fruit, and get some real-world experience with the function
> before deciding how to handle the multiple affix case.

I still haven't seen anyone actually give a good reason for not going with "first wins", other than a paranoia that we don't have any real-world use-cases. And there are PLENTY of real-world use-cases where any semantics will have the same effect, and only a few where it would be at all important (and in all of those, "first wins" has been the correct semantic).

By saying "let's add the method, but not give it all the power yet", you just create more version problems. "Oh, so I can use cutprefix back as far as 3.8, but if I use more than one prefix, now I have to say that this requires 3.9." Why not just give it the full power straight away?
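For what it's worth, "first wins" is short enough to pin down in a sketch; cut_prefix here is a stand-in function, since the method is still only a proposal:

    def cut_prefix(s, prefixes):
        # First matching prefix wins: cut it and stop looking.
        for p in prefixes:
            if s.startswith(p):
                return s[len(p):]
        return s

    cut_prefix("tested", ("te", "test"))  # -> 'sted'
    cut_prefix("tested", ("test", "te"))  # -> 'ed'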
Are you actually expecting to find enough use-cases where "longest wins" or some other definition will be better? You can debate whether it's "extremely unlikely" to matter or it's "reasonably common" or whatever, but unless it ever matters AND has to be something other than first-match-wins, there's no reason not to lock in those semantics.

ChrisA

From mertz at gnosis.cx Mon Apr 1 21:34:21 2019
From: mertz at gnosis.cx (David Mertz)
Date: Mon, 1 Apr 2019 21:34:21 -0400
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <20190402005251.GU31406@ando.pearwood.info>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info>
Message-ID: 

On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano wrote:

> The point I am making is not that we must not ever support multiple
> affixes, but that we shouldn't rush that decision. Let's pick the
> low-hanging fruit, and get some real-world experience with the function
> before deciding how to handle the multiple affix case.

There are exactly two methods of strings that deal specifically with affixes currently: startswith and endswith. Both of those allow specifying multiple affixes. That's pretty strong real-world experience, and breaking the symmetry for no reason is merely confusing. Especially since the consistent behaviour would obviously be just as commonly useful.

Now look, the sky won't fall if a single-affix-only method is added. For that matter, it won't if nothing is added. In fact, the single affix version makes it a little bit easier to write a custom function handling multiple affixes. And the sky won't fall if the remove-just-one semantics are used rather than remove-from-class.

But adding methods with sneakily helpful capabilities often helps users greatly. A lot of folks in this thread didn't even know about passing a tuple to str.startswith() a few days ago. I'm pretty sure that capability was added by Raymond, who has an amazingly good sense of what little tricks can prove really powerful. Apologies to a different developer if it wasn't him, but congrats and thanks to you if so.

> Somebody (I won't name names, but they know who they are) wrote to me
> off-list some time ago and accused me of being arrogant and thinking I know
> more than everyone else. Well perhaps I am, but I'm not so arrogant as to
> think that I can choose the right behaviour for clashing affixes for other
> people when my own use-cases don't have clashing affixes.

That could be me... Unless it's someone else :-). I think my intent was a bit different than you characterize, but I'm very guilty of presuming too much also. So mea culpa.

> > Sure, but I've often wanted to do something like "strip off a prefix
> > of http:// or https://", or something else that doesn't have a
> > semantic that's known to the stdlib.
>
> I presume there's a reason you aren't using urllib.parse and you just need
> a string without the leading scheme. If you're doing further parsing, the
> stdlib has the right batteries for that.

I know there are lots of specialized string manipulations in the stdlib. Yeah, I could use os.path.splitext, and os.path.split, and urllib.parse.something, and lots of other things I rarely use. A lot of us like to manipulate strings in generically stringy ways.

> But not until we had a couple of releases of experience with them:
> https://docs.python.org/2.7/library/stdtypes.html#str.endswith

Ok. Fair point.
I used Python 2.4 without the multiple affix option.

> Here's a partial list of English prefixes that somebody doing text
> processing might want to remove to get at the root word:
>
> a an ante anti auto circum co com con contra contro de dis
> en ex extra hyper il im in ir inter intra intro macro micro
> mono non omni post pre pro sub sym syn tele un uni up
>
> I count fourteen clashes:
>
> a: an ante anti
> an: ante anti
> co: com con contra contro
> ex: extra
> in: inter intra intro
> un: uni

This seems like a good argument for remove-all-from-class. :-)

    stem = word.lstrip(prefix_tup)

But then we really need 'word.porter_stemmer()' as a built-in method.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info Mon Apr 1 22:03:26 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 2 Apr 2019 13:03:26 +1100
Subject: [Python-ideas] Lessons learned from an API design mistake [was New explicit methods to trim strings]
In-Reply-To: 
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info>
Message-ID: <20190402020325.GW31406@ando.pearwood.info>

I think the point Chris made about statistics.mode is important enough to start a new subthread about API design, and the lessons learned.

On Mon, Apr 01, 2019 at 02:29:44PM +1100, Chris Angelico wrote:

> We're basically debating collision semantics here. It's on par with
> asking "how should statistics.mode() cope with multiple modes?".
> Should the introduction of statistics.mode() have been delayed pending
> a thorough review of use-cases, or is it okay to make it do what most
> people want, and then be prepared to revisit its edge-case handling?
>
> (For those who don't know, mode() was changed in 3.8 to return the
> first mode encountered, in contrast to previous behaviour where it
> would raise an exception.)

For those who are unaware, I was responsible for choosing the semantics of statistics.mode. My choice was to treat mode() as it is taught in secondary schools here in Australia, namely that if there are two or more equally common values, there is no mode.

Statistically, there is no one right answer to how to treat multiple modes. Sometimes you treat them as true multiple modes, sometimes you say there is no mode, and sometimes you treat the fact that there are multiple modes as an artifact of the sample and pick one or another as the actual mode. There's no particular statistical reason to choose the first over the second or the third.

So following the Zen, I refused to guess, and raised an exception. (I toyed with returning None instead, but decided against it for reasons that don't matter here.)

This seemed like a good decision up front, and I don't remember there being any objections to that behaviour when the PEP was discussed both here and on Python-Dev. But once we had a few years of real-world practice, it turns out that:

(1) Raising an exception was an annoying choice that meant that every use of mode() outside of the interactive interpreter needed to be wrapped in a try...except block, making it painful to use.

(2) There is at least one good use-case for returning the first mode, even though statistically there's no reason to prefer it over any other.

Importantly, that use-case was something that neither I, nor anyone involved in the original PEP debate for this, had thought of.
It took a few years of actual use in the wild before anyone came up with an important, definitive use-case -- and it turns out to be completely unrelated to the statistical use of mode!

Raymond Hettinger persuaded me that this non-statistics use-case was important enough for mode to pick a behaviour which has no statistical justification. (Also, many other statistics packages do the same thing, so even if we're wrong, we're no worse than everyone else.)

Had I ignored the Zen and, in the face of ambiguity, *guessed* which mode to return, I could have guessed wrongly and returned one of these:

- the largest mode
- or the smallest
- the last seen mode
- the mode closest to the mean
- or median, or some other measure of central tendency
- or some sort of special "multi-mode" object (perhaps a list).

I would have missed a real use-case that I never imagined existed, as well as a good opportunity for optimization. Raymond's new version of mode is faster as well as more useful.

Because I *refused to guess* and raised an exception:

(1) mode was harder to use than it should have been;

(2) but we were able to change its behaviour without a lengthy and annoying deprecation period, or introducing a "new_mode" function.

Knowing what I know *now*, if I were designing mode() from scratch I'd go with Raymond's design. If it is statistically unjustified, it's justified by other reasons, and if it is wrong, it's not so wrong as to be useless, and it's wrong in a way that many other statistics libraries are also wrong. So we're in good company. But I didn't know that *then*, and I never would have guessed that there was a non-statistical use for mode.

Lesson number one: Just because you have thought about your function for five minutes, or five months, doesn't mean you have thought of all the real-world uses.

Lesson number two: A lot of the Zen is intended as a joke, but the koan about refusing to guess is very good advice. When possible, be conservative, take your time to make a decision, and base it on real-world experience, not gut feelings about what is "obviously" correct. In language design even more than personal code, You Ain't Gonna Need It (Yet) applies.

Lesson number three: Sometimes, to not make a decision is itself a decision. In the case of mode, I had to deal with multiple modes *somehow*, I couldn't just ignore it. Fortunately I chose to raise an exception, which made it possible to change my mind later without a lengthy deprecation period. But that in turn made the function more annoying and difficult to use in practice.

But in the case of the proposed str.cut_prefix and cut_suffix methods, we can avoid the decision of what to do with multiple affixes by just not supporting them! We don't have to make a decision to raise an exception, or return X (for whatever semantics of X we choose). There's no need to choose *anything* about the multiple affix case until we have more real-world experience to make a judgement.

Lesson number four: Python is nearly 30 years old, and the str.replace() method still refuses to guess how to deal with the case of multiple target strings. That doesn't make replace useless.
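To illustrate the "write your own" point from lesson four: a user-level multiple-replace helper takes only a few lines. This sketch applies targets left to right, which is just one of several defensible policies:

    def replace_each(s, targets, replacement):
        # Apply each replacement in the order the targets are given;
        # a different order can give a different result, which is
        # exactly the choice str.replace refuses to make for you.
        for t in targets:
            s = s.replace(t, replacement)
        return s

    replace_each("a, b c", (",", " "), "")  # -> 'abc'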
-- Steven From 2QdxY4RzWzUUiLuE at potatochowder.com Mon Apr 1 22:10:08 2019 From: 2QdxY4RzWzUUiLuE at potatochowder.com (Dan Sommers) Date: Mon, 1 Apr 2019 22:10:08 -0400 Subject: [Python-ideas] New explicit methods to trim strings In-Reply-To: References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> Message-ID: <2877460d-a160-dbda-82ed-4676a735bbfe@potatochowder.com> On 4/1/19 9:34 PM, David Mertz wrote: > On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano wrote: > >> The point I am making is not that we must not ever support multiple >> affixes, but that we shouldn't rush that decision. Let's pick the >> low-hanging fruit, and get some real-world experience with the function >> before deciding how to handle the multiple affix case. >> > > There are exactly two methods of strings that deal specifically with > affixes currently. Startswith and endswith. Both of those allow specifying > multiple affixes. That's pretty strong real-world experience, and breaking > the symmetry for no reason is merely confusing. Especially since the > consistency would be obviously as commonly useful. My imagination is failing me: for multiple affixes (affices?), what is a use case for removing one, but not having the function return which one? In other words, shouldn't a function that removes multiple affixes also return which one(s) were removed? I think I'm agreeing with Steven: take the low hanging fruit now, and worry about complexification later (because I'm not sure that the existing API is good when removing multiple affixes). Stemming is hard, because a lot of words begin/end with common affixes, but that string of letters isn't always an affix. For example, removing common prefixes from "relay" leaves "lay," but that's not the root; similarly with "relax" and "area." If my algorithm is "look for the word in a list of known words, if it's there then great, but if it's not then remove one affix and try again," then I don't want to remove all the affixes at once. When removing extensions from filenames, all of my use cases involve removing one at a time and acting on the one that was removed. For example, decompressing foo.tar.gz into foo.tar, and then untarring foo.tar into foo. I suppose I can imagine removing tar.gz and then decompressing and untarring in one step, but again, then I have to know which suffixes were removed. Or maybe I could process foo.tar.gz and want to end up with foo.norm (bonus points for recognizing the XKCD reference), but my personal preference would still be to produce foo.tar.gz.norm by default and let the user specify the ultimate filename if they want something else. So I've seen someone (likely David Mertz?) ask for something like filename.strip_suffix(('.png', '.jpg')). What is the context? Is it strictly a filename processing program? Do you subsequently have to determine the suffix(es) at hand? 
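Coming back to the one-affix-at-a-time lookup described above, a rough sketch of that algorithm; find_root, the suffix list, and known_words are all placeholders, not a real API:

    def find_root(word, suffixes, known_words):
        # Look the word up first; only if that fails, remove a single
        # suffix and try again with the shorter candidate.
        if word in known_words:
            return word
        for s in suffixes:
            if s and word.endswith(s) and len(word) > len(s):
                root = find_root(word[:-len(s)], suffixes, known_words)
                if root is not None:
                    return root
        return None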
From steve at pearwood.info Mon Apr 1 22:15:17 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 2 Apr 2019 13:15:17 +1100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: 
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info>
Message-ID: <20190402021517.GX31406@ando.pearwood.info>

On Mon, Apr 01, 2019 at 09:34:21PM -0400, David Mertz wrote:
> On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano wrote:
>
> > The point I am making is not that we must not ever support multiple
> > affixes, but that we shouldn't rush that decision. Let's pick the
> > low-hanging fruit, and get some real-world experience with the function
> > before deciding how to handle the multiple affix case.
>
> There are exactly two methods of strings that deal specifically with
> affixes currently. Startswith and endswith. Both of those allow specifying
> multiple affixes.

When testing for the existence of a prefix (or suffix), there are no choices that need to be made for the multiple prefix case. If spam.startswith("contra"), then it also starts with "co", and we don't have to decide whether to delete six characters or two. If spam starts with one of ("de", "ex", "in", "mono"), then it doesn't matter what order we specify the tests, it will return True regardless.

If you write a pure Python implementation of multiprefix startswith, there's one *obviously correct* version:

    def multi_startswith(astring, prefixes):
        return any(astring.startswith(prefix) for prefix in prefixes)

because it literally doesn't matter which of the prefixes triggered the match. I could randomize the order of the prefixes, and nothing would change.

But if you delete the prefix, it matters a lot which prefix triggers the match.

> That's pretty strong real-world experience

But not as strong as str.replace, which is much older than starts/endswith and still refuses to guess what the user expects to do with multiple substrings.

-- Steven

From mertz at gnosis.cx Mon Apr 1 22:28:44 2019
From: mertz at gnosis.cx (David Mertz)
Date: Mon, 1 Apr 2019 22:28:44 -0400
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <2877460d-a160-dbda-82ed-4676a735bbfe@potatochowder.com>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <2877460d-a160-dbda-82ed-4676a735bbfe@potatochowder.com>
Message-ID: 

On Mon, Apr 1, 2019 at 10:11 PM Dan Sommers < 2QdxY4RzWzUUiLuE at potatochowder.com> wrote:

> So I've seen someone (likely David Mertz?) ask for something
> like filename.strip_suffix(('.png', '.jpg')). What is the
> context? Is it strictly a filename processing program? Do
> you subsequently have to determine the suffix(es) at hand?

Yes, I've sometimes wanted something like "the basename of all the graphic files in that directory." But here's another example from my actual current job:

I do machine-learning/data science for a living. As part of that, I generate a bunch of models that try to make predictions from the same dataset.
So I name those models like:

    dataset1.KNN_distance_n10.gz
    dataset1.KNN_distance_n10_poly2_scaled.xz
    dataset2.KNN_manhattan_n6.zip
    dataset2.KNN_distance_n10_poly2_scaled.xz
    dataset1.KNN_minkowski_n5.gz
    dataset1.LinSVC_Poly3_Scaled.gz
    dataset2.LogReg.bz2
    dataset2.LogReg_Poly.gz
    dataset1.NuSVC_poly2_scaled.gz

I would like to answer the question "What types of models have I tried against the datasets?" Obviously, I *can* answer this question. But it would be pleasant to answer it like this:

    styles = {model.lstrip(('dataset1', 'dataset2'))
                   .rstrip(('gz', 'xz', 'zip', 'bz2'))
              for model in models}

That's something very close to code I actually have in production now.

-- 
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mertz at gnosis.cx Mon Apr 1 22:43:10 2019
From: mertz at gnosis.cx (David Mertz)
Date: Mon, 1 Apr 2019 22:43:10 -0400
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <20190402005251.GU31406@ando.pearwood.info>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info>
Message-ID: 

On Mon, Apr 1, 2019 at 8:54 PM Steven D'Aprano wrote:

> I can think of at least one English suffix pair that clash: -ify, -fy.
> How about other languages? How comfortable are you to say that nobody
> doing text processing in German or Hindi will need to deal with clashing
> affixes?

Here are the 30 most common suffixes in a large list of Dutch words. For similar answers for other languages, see https://gist.github.com/DavidMertz/1a4aac0e889097d7bf80d8d41a3a644d. Note that there is absolutely nothing morphological here, simply dumb string literals:

    % head -30 suffix-frequency-nl.txt
    ('en', 55338)
    ('er', 14387)
    ('de', 12541)
    ('den', 11427)
    ('ten', 9402)
    ('te', 8263)
    ('ng', 7502)
    ('es', 7398)
    ('st', 7102)
    ('ing', 6949)
    ('gen', 6836)
    ('rs', 6592)
    ('ers', 5581)
    ('ren', 4842)
    ('el', 4602)
    ('ngen', 4451)
    ('rde', 4255)
    ('ken', 4203)
    ('re', 3870)
    ('je', 3868)
    ('len', 3784)
    ('ste', 3680)
    ('ie', 3658)
    ('nd', 3635)
    ('erde', 3620)
    ('rden', 3593)
    ('jes', 3307)
    ('eren', 3193)
    ('id', 3123)
    ('rd', 3083)

-- 
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Apr 2 03:23:12 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull)
Date: Tue, 2 Apr 2019 16:23:12 +0900
Subject: [Python-ideas] Backward-incompatible changes for Python 4
In-Reply-To: <5CA28B15.8000105@canterbury.ac.nz>
References: <5CA28B15.8000105@canterbury.ac.nz>
Message-ID: <23715.3552.729800.392240@turnbull.sk.tsukuba.ac.jp>

Greg Ewing writes:

> In light of that, I propose that the datetime module in Python 4
> be changed so that April 1 does not exist:
>
> >>> m31 = date(2019, 3, 31)
> >>> m31 + timedelta(days = 1)
> datetime.date(2019, 4, 2)
>
> This would remove a large amount of confusion from the world, and
> ensure that Python never receives any more backwards incompatible
> changes.
>
> Obviously, removing a whole day from the year will create problems

It's much simpler to rename April 1 to February 29. (Bikeshed that 29 is already taken, and make it February 30 if you like. And note that February will maintain its uniqueness in an even more prominent way, as the only temporally disconnected month!)

From jorropo.pgm at gmail.com Tue Apr 2 04:03:18 2019
From: jorropo.pgm at gmail.com (Jorropo .)
Date: Tue, 2 Apr 2019 10:03:18 +0200
Subject: [Python-ideas] Backward-incompatible changes for Python 4
In-Reply-To: <23715.3552.729800.392240@turnbull.sk.tsukuba.ac.jp>
References: <5CA28B15.8000105@canterbury.ac.nz> <23715.3552.729800.392240@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

> It's much simpler to rename April 1 to February 29.

That would create a gap of one day between the regular calendar and our new one for two months. An algorithm distributing leap seconds near April 1 is maybe more complicated, but it reduces side effects with the rest of the world.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From evrial at gmail.com Tue Apr 2 06:37:18 2019
From: evrial at gmail.com (Alex Grigoryev)
Date: Tue, 2 Apr 2019 13:37:18 +0300
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: 
References: 
Message-ID: <32FA2A90-E1E7-43FB-9694-50424B9CDB31@getmailspring.com>

In your case you probably should use [model.split(".")[1] for model in models]. strip_prefix should not be used with file extensions; methods for files already exist.

On Apr 2 2019, at 5:43 am, David Mertz wrote:

> On Mon, Apr 1, 2019 at 8:54 PM Steven D'Aprano wrote:
>
> > I can think of at least one English suffix pair that clash: -ify, -fy.
> > How about other languages? How comfortable are you to say that nobody
> > doing text processing in German or Hindi will need to deal with clashing
> > affixes?
>
> Here are the 30 most common suffixes in a large list of Dutch words. For similar answers for other languages, see https://gist.github.com/DavidMertz/1a4aac0e889097d7bf80d8d41a3a644d. Note that there is absolutely nothing morphological here, simply dumb string literals:
> % head -30 suffix-frequency-nl.txt
> ('en', 55338)
> ('er', 14387)
> ('de', 12541)
> ('den', 11427)
> ('ten', 9402)
> ('te', 8263)
> ('ng', 7502)
> ('es', 7398)
> ('st', 7102)
> ('ing', 6949)
> ('gen', 6836)
> ('rs', 6592)
> ('ers', 5581)
> ('ren', 4842)
> ('el', 4602)
> ('ngen', 4451)
> ('rde', 4255)
> ('ken', 4203)
> ('re', 3870)
> ('je', 3868)
> ('len', 3784)
> ('ste', 3680)
> ('ie', 3658)
> ('nd', 3635)
> ('erde', 3620)
> ('rden', 3593)
> ('jes', 3307)
> ('eren', 3193)
> ('id', 3123)
> ('rd', 3083)
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.
Intellectual property is > to the 21st century what the slave trade was to the 16th. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rhodri at kynesim.co.uk Tue Apr 2 07:06:33 2019 From: rhodri at kynesim.co.uk (Rhodri James) Date: Tue, 2 Apr 2019 12:06:33 +0100 Subject: [Python-ideas] New explicit methods to trim strings In-Reply-To: <20190402005251.GU31406@ando.pearwood.info> References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> Message-ID: <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> On 02/04/2019 01:52, Steven D'Aprano wrote: > Here's a partial list of English prefixes that somebody doing text > processing might want to remove to get at the root word: > > a an ante anti auto circum co com con contra contro de dis > en ex extra hyper il im in ir inter intra intro macro micro > mono non omni post pre pro sub sym syn tele un uni up > > I count fourteen clashes: > > a: an ante anti > an: ante anti > co: com con contra contro > ex: extra > in: inter intra intro > un: uni > > (That's over a third of this admittedly incomplete list of prefixes.) > > I can think of at least one English suffix pair that clash: -ify, -fy. You're beginning to persuade me that cut/trim methods/functions aren't a good idea :-) So far we have two slightly dubious use-cases. 1. Stripping file extensions. Personally I find that treating filenames like filenames (i.e. using os.path or (nowadays) pathlib) results in me thinking more appropriately about what I'm doing. 2. Stripping prefixes and suffixes to get to root words. Python has been used for natural language work for over a decade, and I don't think I've heard any great call from linguists for the functionality. English isn't a girl who puts out like that on a first date :-) There are too many common exception cases for such a straightforward approach not to cause confusion. 3. My most common use case (not very common at that) is for stripping annoying prompts off text-based APIs. I'm happy using .startswith() and string slicing for that, though your point about the repeated use of the string to be stripped off (or worse, hard-coding its length) is well made. I am beginning to worry slightly that actually there are usually more appropriate things to do than simply cutting off affixes, and that in providing these particular batteries we might be encouraging poor practise. -- Rhodri James *-* Kynesim Ltd From p.f.moore at gmail.com Tue Apr 2 07:23:15 2019 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 2 Apr 2019 12:23:15 +0100 Subject: [Python-ideas] New explicit methods to trim strings In-Reply-To: <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> Message-ID: On Tue, 2 Apr 2019 at 12:07, Rhodri James wrote: > So far we have two slightly dubious use-cases. > > 1. Stripping file extensions. Personally I find that treating filenames > like filenames (i.e. 
using os.path or (nowadays) pathlib) results in me > thinking more appropriately about what I'm doing. I'd go further and say that filename manipulation is a great example of a place where generic string functions should definitely *not* be used. > 2. Stripping prefixes and suffixes to get to root words. Python has > been used for natural language work for over a decade, and I don't think > I've heard any great call from linguists for the functionality. English > isn't a girl who puts out like that on a first date :-) There are too > many common exception cases for such a straightforward approach not to > cause confusion. Agreed, using prefix/suffix stripping on natural language is at best a "quick hack". For robust usage, one of the natural language processing packages from PyPI is likely a far better fit. But "quick hacks" using the stdlib are not an unrealistic use case, so I don't think we should completely discount this. It's certainly not *compelling*, though. > 3. My most common use case (not very common at that) is for stripping > annoying prompts off text-based APIs. I'm happy using .startswith() and > string slicing for that, though your point about the repeated use of the > string to be stripped off (or worse, hard-coding its length) is well made. > > I am beginning to worry slightly that actually there are usually more > appropriate things to do than simply cutting off affixes, and that in > providing these particular batteries we might be encouraging poor practise. It would be really helpful if someone could go through the various use cases presented in this thread and classify them - filename manipulation, natural language uses, and "other". We could then focus on the "other" category to get a better feel for what use cases might act as a good argument for the feature. To me, it's starting to feel like a proposal that looks deceptively valuable because it's a "natural", or "obvious", addition to make, and there's a weight of people thinking of cases where they "might find it useful", but the reality is that many of those cases are not actually as good a fit for the feature as it seems at first glance. It would help the people in favour of the proposal to make their case if they could dispel that impression by giving a clearer summary of the expected use cases... Paul From cspealma at redhat.com Tue Apr 2 08:00:59 2019 From: cspealma at redhat.com (Calvin Spealman) Date: Tue, 2 Apr 2019 08:00:59 -0400 Subject: [Python-ideas] PEP-582 and multiple Python versions In-Reply-To: References: Message-ID: OK, I didn't know if this or -dev was more appropriate, so I opted on the safer side in terms of annoying people. I'll post to python-dev. On Mon, Apr 1, 2019 at 2:27 PM Brett Cannon wrote: > I just wanted to warn people that I don't know if any of the authors of > PEP 582 subscribe to python-ideas and they have not brought it forward for > discussion yet, so there's no guarantee of a response. > > On Mon, Apr 1, 2019 at 5:27 AM Calvin Spealman > wrote: > >> While the PEP does show the version number as part of the path to the >> actual packages, implying support for multiple versions, this doesn't seem >> to be spelled out in the actual text. Presumably __pypackages__/3.8/ might >> sit beside __pypackages__/3.9/, etc. to keep future versions capable of >> installing packages for each version, the way virtualenv today is bound to >> one version of Python. 
>>
>> I'd like to raise a potential edge case that might be a problem, and
>> likely an increasingly common one: users with multiple installations of the
>> *same* version of Python. This is actually a common setup for Windows users
>> who use WSL, Microsoft's Linux-on-Windows solution, as you could have both
>> the Windows and Linux builds of a given Python version installed on the
>> same machine. The currently implied support for multiple versions would not
>> be able to separate these and could create problems if users pip install a
>> Windows binary package through Powershell and then try to run a script in
>> Bash from the same directory, causing the Linux version of Python to try to
>> use Windows Python packages.
>>
>> I'm not actually sure what the solution here is. Mostly I wanted to raise
>> the concern, because I'm very keen on WSL being a great entry path for new
>> developers and I want to make that a better experience, not a more
>> confusing one. Maybe that version number could include some other unique
>> identifier, maybe based on Python's own executable. A hash maybe? I don't
>> know if anything like that already exists to uniquely identify a Python
>> build or installation.
>>
>> --
>>
>> CALVIN SPEALMAN
>>
>> SENIOR QUALITY ENGINEER
>>
>> cspealma at redhat.com M: +1.336.210.5107
>>
>> TRIED. TESTED. TRUSTED.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/

-- 

CALVIN SPEALMAN

SENIOR QUALITY ENGINEER

cspealma at redhat.com M: +1.336.210.5107

TRIED. TESTED. TRUSTED.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From boxed at killingar.net Tue Apr 2 08:27:53 2019
From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=)
Date: Tue, 2 Apr 2019 14:27:53 +0200
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: 
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk>
Message-ID: 

> On 2 Apr 2019, at 13:23, Paul Moore wrote:
>
> It would be really helpful if someone could go through the various use
> cases presented in this thread and classify them - filename
> manipulation, natural language uses, and "other". We could then focus
> on the "other" category to get a better feel for what use cases might
> act as a good argument for the feature. To me, it's starting to feel
> like a proposal that looks deceptively valuable because it's a
> "natural", or "obvious", addition to make, and there's a weight of
> people thinking of cases where they "might find it useful", but the
> reality is that many of those cases are not actually as good a fit for
> the feature as it seems at first glance. It would help the people in
> favour of the proposal to make their case if they could dispel that
> impression by giving a clearer summary of the expected use cases...

I found two instances of strip_prefix in the code base I work on: stripping "origin/" from git branch names, and "Author:" to get the author from log output, again from git.
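Both of those boil down to the same hand-rolled idiom today; a small sketch with made-up values (the branch name and log line are invented for illustration):

    ref = "origin/feature-x"
    branch = ref[len("origin/"):] if ref.startswith("origin/") else ref
    # branch == 'feature-x'

    line = "Author: Jane Doe <jane@example.com>"
    author = line[len("Author:"):].strip() if line.startswith("Author:") else line
    # author == 'Jane Doe <jane@example.com>'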
A good place to look for examples is this: https://github.com/search?utf8=%E2%9C%93&q=%22strip_prefix%28%22+extension%3Apy+language%3APython+language%3APython&type=Code&ref=advsearch&l=Python&l=Python

A pattern that one sees quickly is that there are lots and lots of functions that strip a specific and hardcoded prefix. There's a lot of path manipulation too. And of course, there's an enormous amount of copy paste (jsfunctions.py is everywhere!).

Some examples from the search above:

Removing "file:" prefix:
https://github.com/merijn/dotfiles/blob/43c736c73c5eda413dc7b4615bb679bd43a18d1a/dotfiles/hg-data/hooks/bitbucket.py#L16

This is a strange one, which seems to strip different things?
https://github.com/imperodesign/paas-tools/blob/649372762a18acefed0a24a970b93eb494529df9/deis/prd/controller/registry/tests.py#L99

Removing "master.":
https://github.com/mithro/chromium-build/blob/98d83e124dc08510756906171922a22ba27b87fa/scripts/tools/dump_master_cfg.py#L67

Also not path:
https://github.com/BlissRoms-x86/platform_external_swiftshader/blob/01c0db17f511badb921efc53981849cdacb82793/third_party/subzero/bloat/bloat.py#L212

Removing "Re:" from email subject lines:
https://github.com/emersion/python-emailthreads/blob/0a56af7fd6de16105c27b7c149eeb0282e95e587/emailthreads/util.py#L21

Removing "MAILER_":
https://github.com/vitalk/flask-mailer/blob/c724643f13e51d2e57546164e3e4abf9eb5d8097/flask_mailer/util.py#L30

I'm giving up now, because I got tired :)

/ Anders

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mikhailwas at gmail.com Tue Apr 2 11:42:39 2019
From: mikhailwas at gmail.com (Mikhail V)
Date: Tue, 2 Apr 2019 18:42:39 +0300
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: 
References: <061FBF45-02FD-4C3E-B792-EB09C05ED98E@getmailspring.com> <20190330124105.GC31406@ando.pearwood.info>
Message-ID: 

On Sun, Mar 31, 2019 at 3:28 AM Brandt Bucher wrote:
>
> An idea worth considering: one can think of the 'strip' family of methods
> as currently taking an iterable of strings as an argument (since a string
> is itself a sequence of strings):
>
> It would not be a huge logical leap to allow them to take any iterable.
> Backward compatible, no new methods:
>
> >>> fname.rstrip(('.jpg', '.png', '.gif'))
>
> It even, in my opinion, can clarify "classic" strip/rstrip/lstrip usage:
>
> >>> "abcd".rstrip(("d", "c"))
> 'ab'
>
> Maybe I'm missing a breaking case though, or this isn't as clear for others. Thoughts?

Now with this syntax I would have to write:

    string.rstrip([substring])

Which means I would have to remember to put extra brackets. Not that bad, but it needs extra caution, since the behaviour would depend on the type of the argument, and IMO in the end that may become confusing.

And again - having an iterable with *multiple* elements needs useful and transparent behavior first. The only intuitive behavior seems to be this algorithm (sketched in code below):

- take first element
- check if it's the string suffix
- if yes, cut it off and STOP the iteration
- if not, repeat check with next element

So it guarantees that only one (first) found element will be cut off. I don't see other useful cases yet. Even this one seems to be odd. More complicated behavior would be just too hard to follow. IMO having a separate method is more user-friendly.

As for the name, I like "rcut", "lcut". Self-explanatory enough, and matches other similar existing method names.
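That algorithm is short enough to sketch as a plain function (rcut here is only the suggested name, not an existing method):

    def rcut(s, suffixes):
        # Cut the first matching suffix, then stop: at most one cut.
        for suffix in suffixes:
            if suffix and s.endswith(suffix):
                return s[:-len(suffix)]
        return s

    rcut("abcd", ("d", "c"))  # -> 'abc', not 'ab': one cut, then stop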
From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Apr 2 13:55:02 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 3 Apr 2019 02:55:02 +0900
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk>
Message-ID: <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>

Rhodri James writes:
Steven d'Aprano writes:

> > (That's over a third of this admittedly incomplete list of prefixes.)
> >
> > I can think of at least one English suffix pair that clash: -ify, -fy.

And worse: is "tries" the third person present tense of "try" or is it the plural of "trie"? Pure lexical manipulation can't tell you.

> You're beginning to persuade me that cut/trim methods/functions aren't a
> good idea :-)

I don't think I would go there yet (well, I started there, but...).

> So far we have two slightly dubious use-cases.
>
> 1. Stripping file extensions. Personally I find that treating filenames
> like filenames (i.e. using os.path or (nowadays) pathlib) results in me
> thinking more appropriately about what I'm doing.

Very much agree.

> 2. Stripping prefixes and suffixes to get to root words.

    for suffix in english_suffixes:
        root = word.cutsuffix(suffix)
        if lookup_in_dictionary(root):
            do_something_appropriate_with_each_root_found()

is surely more flexible and accurate than a hard-coded slice, and significantly more readable than

    for suffix in english_suffixes:
        root = word[:-len(suffix)] if word.endswith(suffix) else word
        if lookup_in_dictionary(root):
            do_something_appropriate_with_each_root_found()

I think enough so that I might use a local def for cutsuffix if the method doesn't exist. So my feeling is that the use case for "or"-ing multiple suffixes is a lot weaker than it is for .endswith, but .cutsuffix itself is plausible. That said, I wouldn't add it if it were up to me. Among other things, for this root-extracting application

    def extract_root(word, prefix, suffix):
        word = word[len(prefix):] if word.startswith(prefix) else word
        word = word[:-len(suffix)] if word.endswith(suffix) else word
        # perhaps try further transforms like tri -> try here?
        return word

and a double loop

    for prefix in english_prefixes:  # includes ''
        for suffix in english_suffixes:  # includes ''
            root = extract_root(word, prefix, suffix)
            if lookup_in_dictionary(root):
                yield root

(probably recursive, as well) seems most elegant.

> 3. My most common use case (not very common at that) is for stripping
> annoying prompts off text-based APIs. I'm happy using
> .startswith() and string slicing for that, though your point about
> the repeated use of the string to be stripped off (or worse,
> hard-coding its length) is well made.

I don't understand this use case, specifically the opposition to hard-coding the length. Although hard-coding the length wouldn't occur to me in many cases, since I'd use

    # remove my bash prompt
    prompt_re = re.compile(r'^[^\u0000-\u001f\u007f]+ \d\d:\d\d\$ ')
    lines = [prompt_re.sub('', line) for line in lines]

if I understand the task correctly. Similarly, there's a lot of regexp-removable junk in MTA logs, timestamps and DNS lookups for example, that can't be handled with cutprefix.
> I am beginning to worry slightly that actually there are usually
> more appropriate things to do than simply cutting off affixes, and
> that in providing these particular batteries we might be
> encouraging poor practise.

I don't think that's a worry, at least if restricted to the single-affix form, because simply cutting off affixes is surely part of most such algorithms. The harder part is remembering that you probably have to deal with multiplicities and further transformations, but that can't be incentivized by refusing to implement .cutsuffix. It's an independent consideration.

Steve

From rhodri at kynesim.co.uk Tue Apr 2 14:02:09 2019
From: rhodri at kynesim.co.uk (Rhodri James)
Date: Tue, 2 Apr 2019 19:02:09 +0100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

On 02/04/2019 18:55, Stephen J. Turnbull wrote:
>> = Me
> > 3. My most common use case (not very common at that) is for stripping
> > annoying prompts off text-based APIs. I'm happy using
> > .startswith() and string slicing for that, though your point about
> > the repeated use of the string to be stripped off (or worse,
> > hard-coding its length) is well made.
>
> I don't understand this use case, specifically the opposition to
> hard-coding the length. Although hard-coding the length wouldn't
> occur to me in many cases, since I'd use
>
>     # remove my bash prompt
>     prompt_re = re.compile(r'^[^\u0000-\u001f\u007f]+ \d\d:\d\d\$ ')
>     lines = [prompt_re.sub('', line) for line in lines]

For me it's more often like

    input = get_line_from_UART()
    if input.startswith("INFO>"):
        input = input[5:]
    do_something_useful(input)

which is error-prone when you cut and paste for a different prompt elsewhere and forget to change the slice to match.

-- Rhodri James *-* Kynesim Ltd

From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Apr 2 14:10:07 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 3 Apr 2019 03:10:07 +0900
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: 
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk>
Message-ID: <23715.42367.172126.972274@turnbull.sk.tsukuba.ac.jp>

Anders Hovmöller writes:

> Removing "file:" prefix:
> https://github.com/merijn/dotfiles/blob/43c736c73c5eda413dc7b4615bb679bd43a18d1a/dotfiles/hg-data/hooks/bitbucket.py#L16

This is interesting, because it shows the (so far standard) one-liner:

    word[len(prefix):] if word.startswith(prefix) else word

can be improved (?!) to

    word[len(prefix) if word.startswith(prefix) else 0:]

I don't know if this is more readable, but I think it's less so.

Note that version 1 doesn't copy word if it doesn't start with prefix, while version 2 does. In many applications I can think of the results would be accumulated in a set, and version 1 equality tests will also be faster in the frequent case that the word doesn't start with the prefix.
So that's the one I'd go with, as I can't think of any applications where multiple copies of the same string would be useful.

Steve

From python at mrabarnett.plus.com Tue Apr 2 14:28:01 2019
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 2 Apr 2019 19:28:01 +0100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <23715.42367.172126.972274@turnbull.sk.tsukuba.ac.jp>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.42367.172126.972274@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

On 2019-04-02 19:10, Stephen J. Turnbull wrote:
> Anders Hovmöller writes:
>
> > Removing "file:" prefix:
> > https://github.com/merijn/dotfiles/blob/43c736c73c5eda413dc7b4615bb679bd43a18d1a/dotfiles/hg-data/hooks/bitbucket.py#L16
>
> This is interesting, because it shows the (so far standard) one-liner:
>
> word[len(prefix):] if word.startswith(prefix) else word
>
> can be improved (?!) to
>
> word[len(prefix) if word.startswith(prefix) else 0:]
>
It could be 'improved' more to:

    word[word.startswith(prefix) and len(prefix) : ]

> I don't know if this is more readable, but I think it's less so.
>
> Note that version 1 doesn't copy word if it doesn't start with prefix,
> while version 2 does. In many applications I can think of the results
> would be accumulated in a set, and version 1 equality tests will also
> be faster in the frequent case that the word doesn't start with the
> prefix.
>
_Neither_ version copies if the word doesn't start with the prefix. If you won't believe me, test them! :-)

From python at mrabarnett.plus.com Tue Apr 2 14:33:55 2019
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 2 Apr 2019 19:33:55 +0100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>
Message-ID: <4305c9ce-394b-10de-0717-5a0a7663d3c8@mrabarnett.plus.com>

On 2019-04-02 18:55, Stephen J. Turnbull wrote:
> Rhodri James writes:
> > Steven d'Aprano writes:
> > > (That's over a third of this admittedly incomplete list of prefixes.)
> > >
> > > I can think of at least one English suffix pair that clash: -ify, -fy.
>
> And worse: is "tries" the third person present tense of "try" or is it
> the plural of "trie"? Pure lexical manipulation can't tell you.
>
> > You're beginning to persuade me that cut/trim methods/functions aren't a
> > good idea :-)
>
> I don't think I would go there yet (well, I started there, but...).
>
> > So far we have two slightly dubious use-cases.
> >
> > 1. Stripping file extensions. Personally I find that treating filenames
> > like filenames (i.e. using os.path or (nowadays) pathlib) results in me
> > thinking more appropriately about what I'm doing.
>
> Very much agree.
>
> > 2. Stripping prefixes and suffixes to get to root words.
> > for suffix in english_suffixes: > root = word.cutsuffix(suffix) > if lookup_in_dictionary(root): > do_something_appropriate_with_each_root_found() > > is surely more flexible and accurate than a hard-coded slice, and > significantly more readable than > > for suffix in english_suffixes: > root = word[:-len(suffix)] if word.endswith(suffix) else word > if lookup_in_dictionary(root): > do_something_appropriate_with_each_root_found() > [snip] The code above contains a subtle bug. If suffix == '', then word.endswith(suffix) == True, and word[:-len(suffix)] == word[:-0] == ''. Each time I see someone do that, I see more evidence in support of adding the method. From rosuav at gmail.com Tue Apr 2 14:42:19 2019 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 3 Apr 2019 05:42:19 +1100 Subject: [Python-ideas] New explicit methods to trim strings In-Reply-To: <4305c9ce-394b-10de-0717-5a0a7663d3c8@mrabarnett.plus.com> References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp> <4305c9ce-394b-10de-0717-5a0a7663d3c8@mrabarnett.plus.com> Message-ID: On Wed, Apr 3, 2019 at 5:34 AM MRAB wrote: > > The code above contains a subtle bug. > If suffix == '', then word.endswith(suffix) == True, and > word[:-len(suffix)] == word[:-0] == ''. > > Each time I see someone do that, I see more evidence in support of > adding the method. Either that, or it's evidence that negative indexing is only part of the story, and we need a real way to express "zero from the end" other than negative zero. For instance, word[:<0] might mean "zero from the end", and word[:<1] would be "one from the end". As a syntactic element rather than an arithmetic one, it would be safe against accidentally slicing from the front instead of the back. But that's an idea for another day. ChrisA From cs at cskk.id.au Tue Apr 2 18:58:07 2019 From: cs at cskk.id.au (Cameron Simpson) Date: Wed, 3 Apr 2019 09:58:07 +1100 Subject: [Python-ideas] New explicit methods to trim strings In-Reply-To: References: Message-ID: <20190402225807.GA61774@cskk.homeip.net> On 02Apr2019 12:23, Paul Moore wrote: >On Tue, 2 Apr 2019 at 12:07, Rhodri James wrote: >> So far we have two slightly dubious use-cases. >> >> 1. Stripping file extensions. Personally I find that treating filenames >> like filenames (i.e. using os.path or (nowadays) pathlib) results in me >> thinking more appropriately about what I'm doing. > >I'd go further and say that filename manipulation is a great example >of a place where generic string functions should definitely *not* be >used. Filename manipulation on a path _component_ is generally pretty reliable (yes one can break things by, say, inserting os.sep). 
I do a fair bit of filename fiddling using string functions, and these fall into 3 categories off the top of my head:

- file extensions, and here I do use splitext()

- trimming extensions (only barely a second case), and it turns out the only case I could easily find using the endswith/[:-offset] incantation would probably go just as well with splitext()

- normalising pathnames; as an example, for the home media library I routinely downcase filenames, convert whitespace into a dash, separate fields with "--" (eg episode designator vs title) and convert _ into a colon (hello Mac Finder and UI file save dialogues, a holdover compatibility mode from OS9)

None of these seem to benefit directly from having a cutprefix/cutsuffix method. But splitext aside, I'm generally fiddling a pathname component (and usually a basename), and in that domain the general string functions are very handy and well used.

So I think "filename" (basename) fiddling with str methods is actually pretty reasonable. It is _pathname_ fiddling that is hazardous, because the path separators often need to be treated specially.

>> 2. Stripping prefixes and suffixes to get to root words. Python has
>> been used for natural language work for over a decade, and I don't think
>> I've heard any great call from linguists for the functionality. English
>> isn't a girl who puts out like that on a first date :-) There are too
>> many common exception cases for such a straightforward approach not to
>> cause confusion.
>
>Agreed, using prefix/suffix stripping on natural language is at best a
>"quick hack".

Yeah. I was looking at the prefix list from a related article and seeing "intra" and thinking "intractable". Hacky indeed. _Unless_ the word has already been qualified as suitable for the action. And once it is, a cutprefix method would indeed be handy.

>> 3. My most common use case (not very common at that) is for stripping
>> annoying prompts off text-based APIs. I'm happy using .startswith() and
>> string slicing for that, though your point about the repeated use of the
>> string to be stripped off (or worse, hard-coding its length) is well made.

In some ways the verbosity and bugproneness is my personal use case for cutprefix/cutsuffix (however spelt):

- repeating the string is wordy and requires human eyeballing whenever I read it (to check for correctness); the same applies whenever I write such a piece of code

- personally I'm quite prone to off-by-one errors when hand writing variations on this

- a well named method is more readable and expresses intent better (the same argument holds for a standalone function, though a method is a bit better)

- the anecdotally not uncommon misuse of .strip() where .cutsuffix() would be correct

I confess being a little surprised at how few examples which could use cutsuffix I found in my own code, where I had expected it to be common. I find several bits like this:

    # parsing text which may have \r\n line endings
    if line.endswith('\r'):
        line = line[:-1]

    # parsing a UNIX network interface listing from ifconfig,
    # which varies platform to platform
    if ifname.endswith(':'):
        ifname = ifname[:-1]

Here I DO NOT want rstrip() because I want to strip only one character, rather than as many as there are. So: the optional trailing marker in some input. But doing this for single character markers is much easier to get right than the broader case with longer suffixes, so I think this is not a very strong case.
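A tiny illustration of the difference, using nothing beyond the existing str methods:

    line = 'data\r\r'
    line.rstrip('\r')                            # -> 'data', all trailing '\r' gone
    line[:-1] if line.endswith('\r') else line   # -> 'data\r', exactly one cut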
Fiddling the domain suffix on an email address:

    if not addr.endswith(old_domain):
        raise ValueError('addr does not end in old_domain')
    addr2 = addr[:-len(old_domain)] + new_domain

which would be a good fit, _except_ for the sanity check. However, that sanity check is just one of a few preceding the change, so in fact this is a good fit.

I have a few classes which annotate their instances with some magic attributes. Here's a snippet from a class' __getattr__ for a db schema:

    if attr.endswith('_table'):
        # *_table ==> table "*"
        nickname = attr[:-6]
        if nickname in self.table_by_nickname:

There's a little suite of "match attribute suffix, trim and do something specific with what's left" if statements. However, they are almost all of the form above, so rewriting it like this:

    if attr.endswith('_table'):
        # *_table ==> table "*"
        nickname = attr.cutsuffix('_table')
        if nickname in self.table_by_nickname:

is a small improvement. Every magic number (the "6" above) is an opportunity for bugs.

>> I am beginning to worry slightly that actually there are usually more
>> appropriate things to do than simply cutting off affixes, and that in
>> providing these particular batteries we might be encouraging poor practise.
>
>It would be really helpful if someone could go through the various use
>cases presented in this thread and classify them - filename
>manipulation, natural language uses, and "other".

Surprisingly for me, the big subjective win is avoiding misuse of lstrip/rstrip by having obvious better named alternatives for affix trimming.

Short summary: in my own code I find opportunities for an affix trim method less common than I had expected. But I still like the "might find it useful" argument. I think I find "might find it useful" more compelling than many do. Let me explain.

I think a _well_ _defined_ battery is worth including in the kit (str methods) because:

- the operation is simple and well defined: people won't be confused by its purpose, and when they want it there is a reliable debugged method sitting there ready for use

- variations on this get written _all the time_, and writing those variations using the method is more readable and more reliable

- the existing .strip battery is misused for this purpose by accident

I have in the past found myself arguing for adding little tools like this in agile teams, and getting a lot of resistance. The resistance tended to take these forms:

- YAGNI. While the tiny battery _can_ be written longhand, every time that happens makes for needlessly verbose code, is an opportunity for stupid bugs, and makes code whose purpose must be _deduced_ rather than doing what it says on the tin

- not in this ticket: this leads to a starvation issue - the battery never goes in with any ticket, and a ticket just for the battery never gets chosen for a sprint

- we've already got this other battery; subtext "not needed" or "we don't want 2 ways to do this", my subtext "does it worse, or does something which only _looks_ like this purpose". Classic example from the codebase I was in at the time was SQL parameter insertion.

Eventually I said "... this" and wrote the battery anyway.

My position on cut*affix is that (a) it is easy to implement (b) it can thus be debugged once (c) it makes code clearer when used (d) it reduces the likelihood of .strip() misuse.

Cheers, Cameron Simpson

From eric at trueblade.com Tue Apr 2 19:46:27 2019
From: eric at trueblade.com (Eric V.
Date: Tue, 2 Apr 2019 19:46:27 -0400
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To:
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>
Message-ID:

On 4/2/2019 2:02 PM, Rhodri James wrote:
> On 02/04/2019 18:55, Stephen J. Turnbull wrote:
>
>> = Me
>> > 3. My most common use case (not very common at that) is for stripping
>> > annoying prompts off text-based APIs.  I'm happy using
>> > .startswith() and string slicing for that, though your point about
>> > the repeated use of the string to be stripped off (or worse,
>> > hard-coding its length) is well made.
>>
>> I don't understand this use case, specifically the opposition to
>> hard-coding the length.  Although hard-coding the length wouldn't
>> occur to me in many cases, since I'd use
>>
>>     # remove my bash prompt
>>     prompt_re = re.compile(r'^[^\u0000-\u001f\u007f]+ \d\d:\d\d\$ ')
>>     lines = [prompt_re.sub('', line) for line in lines]
>
> For me it's more often like
>
>     input = get_line_from_UART()
>     if input.startswith("INFO>"):
>         input = input[5:]
>     do_something_useful(input)
>
> which is error-prone when you cut and paste for a different prompt
> elsewhere and forget to change the slice to match.

I originally saw this, and I thought "Yeah, me, too!". But then I
realized I rarely want to do this. I almost always want to know if the
string began with the prefix. I'd normally use something like this:

--------------------------
for line in ["INFO>rest-of-line", "not-INFO>more-text", "text", "INFO>", ""]:
    start, sep, rest = line.partition("INFO>")
    if not start and sep:
        print(f"control line {rest!r}")
    else:
        print(f"data line {line!r}")

output:
control line 'rest-of-line'
data line 'not-INFO>more-text'
data line 'text'
control line ''
data line ''
--------------------------

Breaking it out as a function gives how I'd need to call this, if we
made it a function (or method on str):

--------------------------
def str_has_prefix(s, prefix):
    '''returns (True, rest-of-string) or (False, s)'''
    start, sep, rest = s.partition(prefix)
    if not start and sep:
        return True, rest
    else:
        return False, s

for line in ["INFO>rest-of-line", "not-INFO>more-text", "text", "INFO>", ""]:
    has_prefix, line = str_has_prefix(line, "INFO>")
    if has_prefix:
        print(f"control line {line!r}")
    else:
        print(f"data line {line!r}")
--------------------------

Now I'll admit it's not super-efficient to create the start, sep, and
rest sub-strings all the time, and maybe the test "not start and sep"
isn't so obvious at first glance, but for my work this is good enough.
It's not super-important how the function (or method) is implemented,
I'm more concerned about the interface. If it was done in C, it
obviously wouldn't call .partition().

So while I was originally +1 on this proposal, now I'm not so sure,
given how I normally need to check if the string starts with a prefix
and get the rest of the string if it does start with the prefix.

On the other hand, just this weekend I was helping (again) with someone
who misunderstood str.strip() on the bug tracker:
https://bugs.python.org/issue36480, so I know .strip() and friends
confuse people. But I don't think we can use that fact to say that we
need .lcut()/.rcut().
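For the record, the same interface can be had without partition(), using
startswith() and a single slice - a minimal sketch (note it differs in
the empty-prefix corner case, which partition() reports as no match):

    def str_has_prefix(s, prefix):
        '''returns (True, rest-of-string) or (False, s)'''
        if s.startswith(prefix):
            return True, s[len(prefix):]
        return False, s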
It's just that as it's being proposed here, I think lcut/rcut (of
whatever names) just doesn't have a useful interface, for me. I don't
think I've ever wanted to remove a prefix/suffix if it existed, else use
the whole string, and not know which case occurred.

Eric

PS: I really tried to find a way to use := in this example so I could
put the assignment inside the 'if' statement, but as I think Tim Peters
pointed out, without C's comma operator, you can't.

From shoyer at gmail.com Tue Apr 2 22:06:25 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 2 Apr 2019 19:06:25 -0700
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To:
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>
Message-ID:

On Tue, Apr 2, 2019 at 5:43 PM Eric V. Smith wrote:

> PS: I really tried to find a way to use := in this example so I could
> put the assignment inside the 'if' statement, but as I think Tim Peters
> pointed out, without C's comma operator, you can't.
>

Conceivably cut_prefix could return None if not found. Then you could
write something like:

    if (stripped := cut_prefix(line, "INFO>")) is not None:
        print(f"control line {stripped!r}")
    else:
        print(f"data line {line!r}")

You could even drop "is not None" in many circumstances, if you know the
cut string will be non-empty. That's actually pretty readable:

    if stripped := cut_prefix(line, "INFO>"):
        print(f"control line {stripped!r}")
    else:
        print(f"data line {line!r}")

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Tue Apr 2 22:44:11 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 3 Apr 2019 13:44:11 +1100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To:
References: <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.42367.172126.972274@turnbull.sk.tsukuba.ac.jp>
Message-ID: <20190403024410.GZ31406@ando.pearwood.info>

On Tue, Apr 02, 2019 at 07:28:01PM +0100, MRAB wrote:
[...]
> > word[len(prefix) if word.startswith(prefix) else 0:]
>
> It could be 'improved' more to:
>
> word[word.startswith(prefix) and len(prefix) : ]
[...]
> _Neither_ version copies if the word doesn't start with the prefix. If
> you won't believe me, test them! :-)

That slicing doesn't make a copy of the string is an
implementation-dependent optimization, not a language guarantee. It's
an obvious optimization to make (and in my testing, it does work all
the way back to CPython 1.5) but if you want to write
implementation-independent code, you shouldn't rely on it.

By the letter of the language spec, an interpreter may make a copy of a
string when doing a full slice string[0:len(string)].

-- 
Steven

From steve at pearwood.info Tue Apr 2 23:54:10 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 3 Apr 2019 14:54:10 +1100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <20190402225807.GA61774@cskk.homeip.net>
References: <20190402225807.GA61774@cskk.homeip.net>
Message-ID: <20190403035409.GA31406@ando.pearwood.info>

On Wed, Apr 03, 2019 at 09:58:07AM +1100, Cameron Simpson wrote:
[...]
> Yeah. I was looking at the prefix list from a related article and seeing
> "intra" and thinking "intractable". Hacky indeed.

That example supports my position that we ought to be cautious about
allowing multiple prefixes. The correct prefix in that case is in- not
intra-. Deciding which prefix ought to take precedence requires specific
domain knowledge, not a simple rule like "first|last|shortest|longest
wins".

> _Unless_ the word has
> already been qualified as suitable for the action. And once it is, a
> cutprefix method would indeed be handy.

Which is precisely the point. Of course stemming words in full
generality is hard. It requires the nuclear reactor of something like
NLTK, and even that sometimes gets it wrong. But this is not a proposal
for a natural language stemmer, it is a proposal for a simple battery
which could be used any time you want to cut a known prefix or suffix.

[...]
> - the anecdotally not uncommon misuse of .strip() where .cutsuffix()
> would be correct

Anecdotal would be "I knew a guy who made this error", but the evidence
presented is objectively verifiable posts on the bug tracker, mailing
lists and especially stackoverflow showing that people need to cut
affixes and misuse strip for that purpose.

> I confess being a little surprised at how few examples which could use
> cutsuffix I found in my own code, where I had expected it to be common.

I don't expect it to be very common, just common enough to be a repeated
source of pain. It's probably more common, and less specialised, than
partition and zfill, but less common than startswith/endswith.

[...]
> if ifname.endswith(':'):
> ifname = ifname[:-1]
>
> Here I DO NOT want rstrip() because I want to strip only one character,
> rather than as many as there are. So: the optional trailing marker in
> some input. But doing this for single character markers is much easier
> to get right than the broader case with longer suffixes, so I think this
> is not a very strong case.

Imagine that these proposed methods had been added in Python 2.2. Would
you be even a tiny bit tempted to write that code above, or would you
use the string method?

Now imagine it's five years from now, and you're using Python 3.11, and
you came across code somebody (possibly even you!) wrote:

    ifname = ifname.cutsuffix(':')

Would you say "Damn, I wish that method had never been added!" and
replace it with the earlier code above?

Those two questions are not so much aimed at you, Cameron, personally,
they're more generic questions for any reader.

-- 
Steven

From p.f.moore at gmail.com Wed Apr 3 04:08:47 2019
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 3 Apr 2019 09:08:47 +0100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <20190402225807.GA61774@cskk.homeip.net>
References: <20190402225807.GA61774@cskk.homeip.net>
Message-ID:

On Tue, 2 Apr 2019 at 23:58, Cameron Simpson wrote:
> I think I find "might find it useful" more compelling than many do. Let
> me explain.
>
> I think a _well_ _defined_ battery is worth including in the kit (str
> methods) because:
>
> - the operation is simple and well defined: people won't be confused by
> its purpose, and when they want it there is a reliable debugged method
> sitting there ready for use
>
> - variations on this get written _all the time_, and writing those
> variations using the method is more readable and more reliable
>
> - the existing .strip battery is misused for this purpose by accident
>
> I have in the past found myself arguing for adding little tools like
> this in agile teams, and getting a lot of resistance. The resistance
> tended to take these forms:
>
> - YAGNI. While the tiny battery _can_ be written longhand, every time
> that happens makes for needlessly verbose code, is an opportunity for
> stupid bugs, and makes code whose purpose must be _deduced_ rather
> than doing what it says on the tin
>
> - not in this ticket: this leads to a starvation issue - the battery
> never goes in with any ticket, and a ticket just for the battery never
> gets chosen for a sprint
>
> - we've already got this other battery; subtext "not needed" or "we
> don't want 2 ways to do this", my subtext "does it worse, or does
> something which only _looks_ like this purpose". Classic example from
> the codebase I was in at the time was SQL parameter insertion.
> Eventually I said "... this" and wrote the battery anyway.

These are very good arguments, and they aren't something I'd really
thought about - they make a very good case (in general) for being
sympathetic to proposals for small features that "might be useful",
while also offering a couple of good tests for such proposals. "Simple
and well defined" in particular strikes me as important (and it's often
the one that gets lost when the bikeshedding about end cases starts ;-))

> My position on cut*affix is that (a) it is easy to implement (b) it can
> thus be debugged once (c) it makes code clearer when used (d) it reduces
> the likelihood of .strip() misuse.

IMO, cut*fix at this point is mainly waiting on someone to actually put
a feature request on bpo, and an implementation PR on github. At that
point, whether it gets implemented will boil down to whether one of the
core devs likes it enough to merge. I doubt more discussion here is
going to make much difference, and the proposal isn't significant
enough to warrant a PEP.

Paul

From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Apr 3 04:38:14 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 3 Apr 2019 17:38:14 +0900
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To:
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.42367.172126.972274@turnbull.sk.tsukuba.ac.jp>
Message-ID: <23716.28918.309131.696660@turnbull.sk.tsukuba.ac.jp>

MRAB writes:
> On 2019-04-02 19:10, Stephen J. Turnbull wrote:
>
> > word[len(prefix) if word.startswith(prefix) else 0:]
>
> It could be 'improved' more to:
>
> word[word.startswith(prefix) and len(prefix) : ]

Except that it would be asymmetric with suffix. That probably doesn't
matter given the sequence[:-0] bug.

BTW thank you for pointing out that bug (and not quoting the code
where I deliberately explicitly introduced the null suffix!
;-) This works:

    word[:-len(suffix) or len(word)] if word.endswith(suffix) else word

Do tutorials mention this pitfall with computed indices (that -0 is
treated as "beginning of sequence")? (I should check myself, but can't
spend time this week and so probably won't. :-( )

> > prefix. So that's the one I'd go with, as I can't think of any
> > applications where multiple copies of the same string would be useful.
>
> _Neither_ version copies if the word doesn't start with the prefix. If
> you won't believe me, test them! :-)

Oh, I believe you. It just means somebody long ago thought more deeply
about the need for copying immutable objects than I ever have.

From python at mrabarnett.plus.com Wed Apr 3 13:50:37 2019
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 3 Apr 2019 18:50:37 +0100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To: <23716.28918.309131.696660@turnbull.sk.tsukuba.ac.jp>
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.42367.172126.972274@turnbull.sk.tsukuba.ac.jp> <23716.28918.309131.696660@turnbull.sk.tsukuba.ac.jp>
Message-ID: <8e512886-df9e-af4b-c6ea-ccd146b6df25@mrabarnett.plus.com>

On 2019-04-03 09:38, Stephen J. Turnbull wrote:
> MRAB writes:
> > On 2019-04-02 19:10, Stephen J. Turnbull wrote:
> >
> > > word[len(prefix) if word.startswith(prefix) else 0:]
> >
> > It could be 'improved' more to:
> >
> > word[word.startswith(prefix) and len(prefix) : ]
>
> Except that it would be asymmetric with suffix. That probably doesn't
> matter given the sequence[:-0] bug.
>
> BTW thank you for pointing out that bug (and not quoting the code
> where I deliberately explicitly introduced the null suffix! ;-) This
> works:
>
> word[:-len(suffix) or len(word)] if word.endswith(suffix) else word
>
I would've written it as:

    word[: len(word) - len(suffix)] if word.endswith(suffix) else word

> Do tutorials mention this pitfall with computed indices (that -0 is
> treated as "beginning of sequence")? (I should check myself, but
> can't spend time this week and so probably won't. :-( )
>
> > > prefix. So that's the one I'd go with, as I can't think of any
> > > applications where multiple copies of the same string would be useful.
>
> > _Neither_ version copies if the word doesn't start with the prefix. If
> > you won't believe me, test them! :-)
>
> Oh, I believe you. It just means somebody long ago thought more
> deeply about the need for copying immutable objects than I ever have.
>

From python at mrabarnett.plus.com Wed Apr 3 13:54:21 2019
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 3 Apr 2019 18:54:21 +0100
Subject: [Python-ideas] New explicit methods to trim strings
In-Reply-To:
References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.41462.846192.286383@turnbull.sk.tsukuba.ac.jp>
Message-ID: <148ecb5c-3890-27f4-8313-f66b20a0b747@mrabarnett.plus.com>

On 2019-04-03 03:06, Stephan Hoyer wrote:
> On Tue, Apr 2, 2019 at 5:43 PM Eric V. Smith
Smith > wrote: > > PS: I really tried to find a way to use := in this example so I could > put the assignment inside the 'if' statement, but as I think Tim Peters > pointed out, without C's comma operator, you can't. > > > Conceivably cut_prefix could return None if not found. Then you could > write something like: > > if (stripped := cut_prefix(line, "INFO>")) is not None: > ? ? ?print(f"control line {stripped!r}") > else: > ? ? ?print(f"data line {line!r}") > > You could even drop "is not None" in many circumstances, if you know the > cut string will be non-empty. That's actually pretty readable: > > if stripped := cut_prefix(line, "INFO>"): > ? ? ?print(f"control line {stripped!r}") > else: > ? ? ?print(f"data line {line!r}") > -1 Sometimes you just want to remove it if present, otherwise leave the string as-is. I wouldn't want to have to write: line = line.lcut("INFO>") or line From barry at barrys-emacs.org Wed Apr 3 14:03:01 2019 From: barry at barrys-emacs.org (Barry Scott) Date: Wed, 3 Apr 2019 19:03:01 +0100 Subject: [Python-ideas] New explicit methods to trim strings In-Reply-To: <8e512886-df9e-af4b-c6ea-ccd146b6df25@mrabarnett.plus.com> References: <7D84D131-65B6-4EF7-9C43-51957F9DFAA9@getmailspring.com> <20190401000837.GK31406@ando.pearwood.info> <20190401013425.GC6059@ando.pearwood.info> <20190402005251.GU31406@ando.pearwood.info> <0e0f691b-ecba-72fa-1d2a-f734d6ad5154@kynesim.co.uk> <23715.42367.172126.972274@turnbull.sk.tsukuba.ac.jp> <23716.28918.309131.696660@turnbull.sk.tsukuba.ac.jp> <8e512886-df9e-af4b-c6ea-ccd146b6df25@mrabarnett.plus.com> Message-ID: <13B67A27-F6DE-4057-B6ED-40909B390C4D@barrys-emacs.org> Use "without" as the action picking up on "with" as in startswith, endswith: new_string = a_string.withoutprefix( prefix ) new_string = a_sring.withoutsuffix( suffix ) And since we have "replace" "remove" would also seem obvious. new_string = a_string.removeprefix( prefix ) new_string = a_sring.removesuffix( suffix ) I know that some commented that remove sounds like its inplace. But then so does replace. Would "replacesuffix" and "replaceprefix" work? I'd default the "replacement" to the empty string "". new_string = a_string.replaceprefix( old_prefix, replacement ) new_string = a_sring.replacesuffix( old_suffix, replacement ) Barry From greg.ewing at canterbury.ac.nz Thu Apr 4 03:47:30 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 04 Apr 2019 20:47:30 +1300 Subject: [Python-ideas] Add output() helper function to subprocess module Message-ID: <5CA5B692.2060407@canterbury.ac.nz> The check_output() function of the subprocess module raises an exception if the process returns a non-zero exit status. This is inconvenient for commands such as grep that use the return status to indicate something other than success or failure. The check_call() function has a companion call(), but here is currently no non-checking companion for check_call(). How about adding one with a signature such as output(args) --> (status, output) -- Greg From njs at pobox.com Thu Apr 4 04:08:14 2019 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 4 Apr 2019 01:08:14 -0700 Subject: [Python-ideas] Add output() helper function to subprocess module In-Reply-To: <5CA5B692.2060407@canterbury.ac.nz> References: <5CA5B692.2060407@canterbury.ac.nz> Message-ID: On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing wrote: > > The check_output() function of the subprocess module raises an > exception if the process returns a non-zero exit status. 
> inconvenient for commands such as grep that use the return
> status to indicate something other than success or failure.
>
> The check_call() function has a companion call(), but there is
> currently no non-checking companion for check_call(). How
> about adding one with a signature such as
>
> output(args) --> (status, output)

Isn't this already available as: run(args, stdout=PIPE)? Is the
objection to the extra typing, or...?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From rosuav at gmail.com Thu Apr 4 04:44:29 2019
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 4 Apr 2019 19:44:29 +1100
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To:
References: <5CA5B692.2060407@canterbury.ac.nz>
Message-ID:

On Thu, Apr 4, 2019 at 7:12 PM Nathaniel Smith wrote:
>
> On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing wrote:
> >
> > The check_output() function of the subprocess module raises an
> > exception if the process returns a non-zero exit status. This is
> > inconvenient for commands such as grep that use the return
> > status to indicate something other than success or failure.
> >
> > The check_call() function has a companion call(), but there is
> > currently no non-checking companion for check_call(). How
> > about adding one with a signature such as
> >
> > output(args) --> (status, output)
>
> Isn't this already available as: run(args, stdout=PIPE)? Is the
> objection to the extra typing, or...?
>

Or discoverability. If you want to run a subprocess and catch its
output, you'll naturally reach for check_output, and it feels clunkier
to have to use run() instead.

+1 on adding a nice simple function, although I'm not 100% sold on the
name "output".

ChrisA

From greg.ewing at canterbury.ac.nz Thu Apr 4 04:59:19 2019
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 04 Apr 2019 21:59:19 +1300
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To:
References: <5CA5B692.2060407@canterbury.ac.nz>
Message-ID: <5CA5C767.2010903@canterbury.ac.nz>

Nathaniel Smith wrote:
> On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing wrote:
>> output(args) --> (status, output)
>
> Isn't this already available as: run(args, stdout=PIPE)?

Yes, but you need to do more than that to get the output as a string.
This is the relevant part of the implementation of check_output():

    process = Popen(stdout=PIPE, *popenargs, **kwargs)
    output, unused_err = process.communicate()
    retcode = process.poll()

-- 
Greg

From greg.ewing at canterbury.ac.nz Thu Apr 4 05:01:33 2019
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 04 Apr 2019 22:01:33 +1300
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To:
References: <5CA5B692.2060407@canterbury.ac.nz>
Message-ID: <5CA5C7ED.8030008@canterbury.ac.nz>

Chris Angelico wrote:
> +1 on adding a nice simple function, although I'm not 100% sold on the
> name "output".

The idea is that output/check_output would go together like
call/check_call.

-- 
Greg

From rosuav at gmail.com Thu Apr 4 05:04:09 2019
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 4 Apr 2019 20:04:09 +1100
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To: <5CA5C7ED.8030008@canterbury.ac.nz>
References: <5CA5B692.2060407@canterbury.ac.nz> <5CA5C7ED.8030008@canterbury.ac.nz>
Message-ID:

On Thu, Apr 4, 2019 at 8:02 PM Greg Ewing wrote:
>
> Chris Angelico wrote:
> > +1 on adding a nice simple function, although I'm not 100% sold on the
> > name "output".
>
> The idea is that output/check_output would go together like
> call/check_call.
>

Yeah, so I think that on balance it's probably the best choice, but as
its own thing, it's a bit odd.

    subprocess.output("...")

I'm, let's say, +1 on the idea in general, and +0.9 on calling it
"output".

ChrisA

From njs at pobox.com Thu Apr 4 05:10:50 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 4 Apr 2019 02:10:50 -0700
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To: <5CA5C767.2010903@canterbury.ac.nz>
References: <5CA5B692.2060407@canterbury.ac.nz> <5CA5C767.2010903@canterbury.ac.nz>
Message-ID:

On Thu, Apr 4, 2019 at 1:59 AM Greg Ewing wrote:
>
> Nathaniel Smith wrote:
> > On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing wrote:
> >> output(args) --> (status, output)
> >
> > Isn't this already available as: run(args, stdout=PIPE)?
>
> Yes, but you need to do more than that to get the output
> as a string. This is the relevant part of the implementation
> of check_output():
>
>     process = Popen(stdout=PIPE, *popenargs, **kwargs)
>     output, unused_err = process.communicate()
>     retcode = process.poll()

>>> from subprocess import run, PIPE
>>> p = run(["grep", "njs", "/etc/passwd"], stdout=PIPE)
>>> p.returncode
0
>>> p.stdout
b'njs:x:1000:1000:Nathaniel J. Smith,,,:/home/njs:/usr/bin/zsh\n'

I do think it's a bit weird that you write 'stdout=PIPE' to mean
'please capture stdout' -- it's leaking an internal implementation
detail across an abstraction boundary. But it's documented, and run()
allows any combination of check=True/False, capturing stdout or not,
and capturing stderr or not, without having to invent 8 different
functions.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From storchaka at gmail.com Thu Apr 4 06:22:05 2019
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 4 Apr 2019 13:22:05 +0300
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To: <5CA5C767.2010903@canterbury.ac.nz>
References: <5CA5B692.2060407@canterbury.ac.nz> <5CA5C767.2010903@canterbury.ac.nz>
Message-ID:

On 04.04.19 11:59, Greg Ewing wrote:
> Nathaniel Smith wrote:
>> On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing wrote:
>>> output(args) --> (status, output)
>>
>> Isn't this already available as: run(args, stdout=PIPE)?
>
> Yes, but you need to do more than that to get the output
> as a string. This is the relevant part of the implementation
> of check_output():
>
>     process = Popen(stdout=PIPE, *popenargs, **kwargs)
>     output, unused_err = process.communicate()
>     retcode = process.poll()
>

check_output() is currently implemented (besides argument checks and
legacy support) as just

    run(..., stdout=PIPE, check=True).stdout

For getting unchecked output you just need to omit check=True.

    run(..., stdout=PIPE).stdout

I think that after adding run() there is no need for output().

From phd at phdru.name Thu Apr 4 05:46:32 2019
From: phd at phdru.name (Oleg Broytman)
Date: Thu, 4 Apr 2019 11:46:32 +0200
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To:
References: <5CA5B692.2060407@canterbury.ac.nz>
Message-ID: <20190404094632.ytboieqzcooz6ax3@phdru.name>

On Thu, Apr 04, 2019 at 07:44:29PM +1100, Chris Angelico wrote:
> On Thu, Apr 4, 2019 at 7:12 PM Nathaniel Smith wrote:
> >
> > On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing wrote:
> > >
> > > The check_output() function of the subprocess module raises an
> > > exception if the process returns a non-zero exit status. This is
> > > inconvenient for commands such as grep that use the return
> > > status to indicate something other than success or failure.
> > >
> > > The check_call() function has a companion call(), but there is
> > > currently no non-checking companion for check_call(). How
> > > about adding one with a signature such as
> > >
> > > output(args) --> (status, output)
> >
> > Isn't this already available as: run(args, stdout=PIPE)? Is the
> > objection to the extra typing, or...?
> >
>
> Or discoverability. If you want to run a subprocess and catch its
> output, you'll naturally reach for check_output, and it feels clunkier
> to have to use run() instead.
>
> +1 on adding a nice simple function, although I'm not 100% sold on the
> name "output".

get_output ?

> ChrisA

Oleg.
-- 
Oleg Broytman https://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From guido at python.org Thu Apr 4 13:14:34 2019
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Apr 2019 10:14:34 -0700
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To: <20190404094632.ytboieqzcooz6ax3@phdru.name>
References: <5CA5B692.2060407@canterbury.ac.nz> <20190404094632.ytboieqzcooz6ax3@phdru.name>
Message-ID:

Let's please leave this alone. As Serhiy says run() covers everything.

On Thu, Apr 4, 2019 at 3:03 AM Oleg Broytman wrote:

> On Thu, Apr 04, 2019 at 07:44:29PM +1100, Chris Angelico wrote:
> > On Thu, Apr 4, 2019 at 7:12 PM Nathaniel Smith wrote:
> > >
> > > On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing < greg.ewing at canterbury.ac.nz> wrote:
> > > >
> > > > The check_output() function of the subprocess module raises an
> > > > exception if the process returns a non-zero exit status. This is
> > > > inconvenient for commands such as grep that use the return
> > > > status to indicate something other than success or failure.
> > > >
> > > > The check_call() function has a companion call(), but there is
> > > > currently no non-checking companion for check_call(). How
> > > > about adding one with a signature such as
> > > >
> > > > output(args) --> (status, output)
> > >
> > > Isn't this already available as: run(args, stdout=PIPE)? Is the
> > > objection to the extra typing, or...?
> > >
> >
> > Or discoverability. If you want to run a subprocess and catch its
> > output, you'll naturally reach for check_output, and it feels clunkier
> > to have to use run() instead.
> >
> > +1 on adding a nice simple function, although I'm not 100% sold on the
> > name "output".
>
> get_output ?
>
> > ChrisA
>
> Oleg.
> --
> Oleg Broytman https://phdru.name/ phd at phdru.name
> Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
--Guido (mobile)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cspealma at redhat.com Fri Apr 5 08:20:32 2019
From: cspealma at redhat.com (Calvin Spealman)
Date: Fri, 5 Apr 2019 08:20:32 -0400
Subject: [Python-ideas] Add output() helper function to subprocess module
In-Reply-To:
References: <5CA5B692.2060407@canterbury.ac.nz> <20190404094632.ytboieqzcooz6ax3@phdru.name>
Message-ID:

This is probably the most common first use case someone has when trying
to use subprocess for the first time and I think it has always been a
bit of a wart that, given all the helpers and wrappers the subprocess
module already has, it lacks one for that very obvious and common need.

Yes, run() covers everything, but so does subprocess.Popen() itself.
Helpers still help.

On Thu, Apr 4, 2019 at 1:15 PM Guido van Rossum wrote:

> Let's please leave this alone. As Serhiy says run() covers everything.
>
> On Thu, Apr 4, 2019 at 3:03 AM Oleg Broytman wrote:
>
>> On Thu, Apr 04, 2019 at 07:44:29PM +1100, Chris Angelico < rosuav at gmail.com> wrote:
>> > On Thu, Apr 4, 2019 at 7:12 PM Nathaniel Smith wrote:
>> > >
>> > > On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing < greg.ewing at canterbury.ac.nz> wrote:
>> > > >
>> > > > The check_output() function of the subprocess module raises an
>> > > > exception if the process returns a non-zero exit status. This is
>> > > > inconvenient for commands such as grep that use the return
>> > > > status to indicate something other than success or failure.
>> > > >
>> > > > The check_call() function has a companion call(), but there is
>> > > > currently no non-checking companion for check_call(). How
>> > > > about adding one with a signature such as
>> > > >
>> > > > output(args) --> (status, output)
>> > >
>> > > Isn't this already available as: run(args, stdout=PIPE)? Is the
>> > > objection to the extra typing, or...?
>> > >
>> >
>> > Or discoverability. If you want to run a subprocess and catch its
>> > output, you'll naturally reach for check_output, and it feels clunkier
>> > to have to use run() instead.
>> >
>> > +1 on adding a nice simple function, although I'm not 100% sold on the
>> > name "output".
>>
>> get_output ?
>>
>> > ChrisA
>>
>> Oleg.
>> --
>> Oleg Broytman https://phdru.name/ phd at phdru.name
>> Programmers don't die, they just GOSUB without RETURN.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> --
> --Guido (mobile)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 
CALVIN SPEALMAN
SENIOR QUALITY ENGINEER
cspealma at redhat.com M: +1.336.210.5107
TRIED. TESTED. TRUSTED.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From apalala at gmail.com Fri Apr 5 22:05:11 2019
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Fri, 5 Apr 2019 22:05:11 -0400
Subject: [Python-ideas] Python list of list matrixes without NumPy
Message-ID:

The semantics of list operators are congruent, but unintuitive
sometimes. For example, I've fallen over this many times:

    >>> truth_matrix = [[False] * n ] * m

It doesn't create what you'd want.
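A quick interactive session makes the problem concrete - every "row" is
the same list object, so mutating one cell changes every row:

    >>> truth_matrix = [[False] * 2] * 3
    >>> truth_matrix[0][0] = True
    >>> truth_matrix
    [[True, False], [True, False], [True, False]]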
The only safe way I know to create an NxM matrix is:

    >>> truth_matrix = [[False for _ in range(n)] for _ in range(m)]

There could be a library to deal with that common case, and those of
checking if a list of lists (of lists?) is properly rectangular, square,
etc., and several other common cases. It takes experience to be aware
that `truth_matrix[i][j]` will change more than one "cell" if the
initialization is the former.

`numpy` is not part of stdlib. A standard library for list of list would
be useful, especially to newcomers. I don't remember such being
discussed lately.

    >>> truth_matrix = lol(n, m, init=bool)
    >>> other_matrix = lol(n, m, o, p, init=float)

Maybe it could also be done with syntax, but I don't have any ideas in
that regard (I don't think "lol()" is overloaded).

Regards,

-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pythonchb at gmail.com Sat Apr 6 14:48:36 2019
From: pythonchb at gmail.com (Christopher Barker)
Date: Sat, 6 Apr 2019 11:48:36 -0700
Subject: [Python-ideas] Python list of list matrixes without NumPy
In-Reply-To:
References:
Message-ID:

By the way, your reply isn't going to the list -- bringing it back on.

On Sat, Apr 6, 2019 at 5:03 AM Juancarlo Añez wrote:

> Or maybe there is one in PyPi -- have you looked?
>>
>
> There's a bunch of "matrix" packages in PyPi, none of them dealing with
> the simple use cases I'm talking about.
>

Be careful about terminology -- "matrix" is a term from mathematics
(linear algebra) that has a specific meaning -- I don't think that's
what you mean. If it is what you mean, then what you want is numpy
(though in fact, numpy is not a matrix system -- it is general purpose
n-dimensional arrays designed for numerical computation that includes
some features to support linear algebra).

> It could be static methods in `list`?
>
> x = list.matrix(n, m, default=0.0)
> y = list.matrix(n, m, o, p, default=False)
> i = list.eye(n, False, True)
>

If all you want are constructors for common 2-d or n-d arrays that would
create lists of lists, then make a handful of utility functions, put
them on PyPi, and see if anyone finds them useful -- if so, then offer
it up as an addition to the standard list object.

But for my part, I think simple constructors are of limited utility --
sure, it's an easy source of errors to do it "wrong", e.g.:

    [[None] * 5] * 6

appears to create a 2 dimensional array, but really has all the rows as
references to the same object. But users will discover this pretty
quickly, and there is something to be said for folks needing to
understand how python works with regard to multiple references to the
same mutable object.

But what might be more useful is a 2D or ND array class that would
provide more than just constructors, with nifty things like 2-dimension
construction:

    arr = NdArray((3,4), fill=None)
    arr:
    [[None, None, None, None],
     [None, None, None, None],
     [None, None, None, None]]

and nifty things like 2-d indexing:

    arr[:, 2]

to get columns, for instance. Also some controls on resizing -- you
really don't want folks able to append arbitrarily to any of this.

THAT might gain some traction. (Or not, as you get this with numpy, and
so much more, anyway.) But either way, write it and show its utility
first.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From apalala at gmail.com Sat Apr 6 15:49:58 2019
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Sat, 6 Apr 2019 15:49:58 -0400
Subject: [Python-ideas] Python list of list matrixes without NumPy
In-Reply-To:
References:
Message-ID:

On Sat, Apr 6, 2019 at 4:11 AM Steve Barnes wrote:

> > ipython
> Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
> Type 'copyright', 'credits' or 'license' for more information
> IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.
>
> In [1]: [[False]*2]*5
> Out[1]:
> [[False, False],
> [False, False],
> [False, False],
> [False, False],
> [False, False]]
> # Looks like exactly what was wanted

[ins] In [1]: x = [[False]*2]*5

[ins] In [2]: x
Out[2]: [[False, False], [False, False], [False, False], [False, False], [False, False]]

[ins] In [3]: x[1][1] = True

[ins] In [4]: x
Out[4]: [[False, True], [False, True], [False, True], [False, True], [False, True]]

-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t_glaessle at gmx.de Sun Apr 7 07:42:46 2019
From: t_glaessle at gmx.de (=?UTF-8?B?VGhvbWFzIEdsw6TDn2xl?=)
Date: Sun, 7 Apr 2019 13:42:46 +0200
Subject: [Python-ideas] cli tool to print value, similar to pydoc
Message-ID: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de>

Hi,

what do you think about adding a tool to print a specified object,
similar to how pydoc shows its help?

I regularly find myself checking the contents of variables to get a
quick grasp on what the value looks like during development, and then
type something like this:

    python -c 'import os; print(os.pathsep)'
    # or even
    python
    >>> import os
    >>> os.pathsep

Other examples may be MODULE.__version__, sys.path, or sys.platform,
but really this happens a lot for a lot of different things.

It would be nice to just type, e.g. any of:

    pyval os.pathsep
    python -m pprint os.pathsep
    python -p os.pathsep
    pydoc --value os.pathsep

There is already a tool like this on PyPI [1] (sadly py2 only atm), but
if you agree that this is a common pattern, I believe it would be a lot
more useful to have it in the stdlib.

[1] https://pypi.org/project/pyeval/

Additional considerations:

It might be useful to add an option to pass a format specification,
which would then do the equivalent of:

    print(format(value, spec))

Note that [1] is even more powerful, because it can directly evaluate
expressions, e.g.:

    pyeval 'math.sin(math.pi/5)'

This can be really useful at times, but maybe the base version would be
enough for a first step.

What do you think?

Best,
Thomas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: OpenPGP digital signature
URL:

From bitsink at gmail.com Sun Apr 7 21:53:20 2019
From: bitsink at gmail.com (Nam Nguyen)
Date: Sun, 7 Apr 2019 18:53:20 -0700
Subject: [Python-ideas] Built-in parsing library
In-Reply-To:
References:
Message-ID:

On Mon, Apr 1, 2019 at 3:13 PM Terry Reedy wrote:

> On 4/1/2019 1:14 AM, Guido van Rossum wrote:
> > We do have a parser generator in the standard library:
> > https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2
>
> It is effectively undocumented and by inference discouraged from use.

I've tried it out over the weekend. The undocumented-ness is kinda
annoying but surmountable.
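For anyone who wants to follow along, this is roughly the incantation it
took to drive it (a sketch of what worked for me, worked out from
reading the source - the API is undocumented, so details may differ
across versions):

    # parse a Python snippet with the stdlib's pgen2-based driver
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver

    d = driver.Driver(pygram.python_grammar, convert=pytree.convert)
    tree = d.parse_string("x = 1 + 2\n")  # input must end with a newline
    print(tree)  # str() of a node round-trips back to the source text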
What I found was this library is tightly coupled to the Python language, both at the lexer and parser levels. For example, defining a simple grammar like this would not work: genericurl: scheme '://' scheme: ... The reason is '://' is not a known token type in Python language. That is a real bummer. Back to my original goal, I've gathered that there is some interest in having a more general parser library in the stdlib. "Some", but not "much". Should I start out with a straw proposal so that we can hash it out further? Cheers, Nam The entry for lib2to3 in the 2to3 doc: > https://docs.python.org/3/library/2to3.html#module-lib2to3 > " > lib2to3 - 2to3?s library > Source code: Lib/lib2to3/ > Note: The lib2to3 API should be considered unstable and may change > drastically in the future. > > help(pgen) is not much more helpful. > : > Help on package lib2to3.pgen2 in lib2to3: > > NAME > lib2to3.pgen2 - The pgen2 package. > > PACKAGE CONTENTS > conv > driver > grammar > literals > parse > pgen > token > tokenize > > FILE > c:\programs\python38\lib\lib2to3\pgen2\__init__.py > > > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Apr 7 22:35:30 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Apr 2019 12:35:30 +1000 Subject: [Python-ideas] cli tool to print value, similar to pydoc In-Reply-To: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de> References: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de> Message-ID: <20190408023530.GI31406@ando.pearwood.info> On Sun, Apr 07, 2019 at 01:42:46PM +0200, Thomas Gl??le wrote: > It would be nice to just type, e.g. any of: > > ??? pyval os.pathsep How will it know what object os is, without guessing, if you haven't imported it? > There is already a tool like this on PyPI [1] (sadly py2 only atm), but > if you agree that this is a common pattern, I always have at least one REPL open for precisely this sort of thing, and the interactive interpreter is infinitely more flexible and powerful than a tool to print one value. -- Steven From steve at pearwood.info Sun Apr 7 22:32:05 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Apr 2019 12:32:05 +1000 Subject: [Python-ideas] Sorted lists Message-ID: <20190408023204.GH31406@ando.pearwood.info> There are quite a few important algorithms which require lists to be sorted. For example, the bisect module, and for statistics median and other quantiles. Sorting a list is potentially expensive: while Timsort is very efficient, it is still ultimately an O(N log N) algorithm which limits how efficient it can be. Checking whether a list is sorted is O(N). What if we could check that lists were sorted in constant time? Proposal: let's give lists a dunder flag, __issorted__, that tracks whether the list is definitely sorted or not: - Empty lists, or lists with a single item, are created with __issorted__ = True; lists with two or more items are created with the flag set to False. - Appending or inserting items sets the flag to False. - Deleting or popping items doesn't change the flag. - Reversing the list doesn't change the flag. - Sorting it sets the flag to True. (The sort method should NOT assume the list is sorted just because the flag is set.) 
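To make those invariants concrete, here is a rough pure-Python model of
the proposed semantics. This is only a sketch - the real change would be
to the C list type, the class name here is hypothetical, and a subclass
like this cannot intercept every mutation path:

    class FlaggedList(list):
        def __init__(self, iterable=()):
            super().__init__(iterable)
            # empty and single-item lists start out sorted
            self.__issorted__ = len(self) < 2

        def append(self, item):
            super().append(item)
            self.__issorted__ = False

        def insert(self, index, item):
            super().insert(index, item)
            self.__issorted__ = False

        def extend(self, iterable):
            super().extend(iterable)
            self.__issorted__ = False

        def __setitem__(self, index, value):
            super().__setitem__(index, value)
            self.__issorted__ = False

        def sort(self, **kwargs):
            super().sort(**kwargs)
            self.__issorted__ = True

Deleting, popping and reversing are inherited unchanged, since they
don't affect the flag.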
Functions that require the list to be sorted can use the flag as a
quick check:

    if not alist.__issorted__:
        alist.sort()
    ...

The flag will be writable, so that functions such as those in
bisect can mark that they have kept the sorted invariant:

    bisect.insort(alist, x)
    assert alist.__issorted__

Being writable, the flag is advisory, not a guarantee, and "consenting
adults" applies. People can misuse the flag:

    alist = [1, 4, 2, 0, 5]
    alist.__issorted__ = True

but then they have nobody to blame but themselves if they shoot
themselves in the foot. That's no worse than the situation we have now,
where you might pass an unsorted list to bisect.

The flag doesn't guarantee that the list is sorted the way you want
(e.g. biggest to smallest, by some key, etc) only that it has been
sorted. It's up to the user to ensure they sort it the right way:

    # Don't do this and expect it to work!
    alist.sort(key=random.random)
    bisect.insort(alist, 1)

If you really want to be sure about the state of the list, you have to
make a copy and sort it. But that's no different from the situation
right now. But for those willing to assume "consenting adults", you
might trust the flag and avoid sorting.

Downsides:

- Every list grows an extra attribute; however, given that lists are
  already quite big data structures and are often over-allocated, I
  don't think this will matter much.

- insert(), append(), extend(), __setitem__() will be a tiny bit
  slower due to the need to set the flag.

Thoughts?

-- 
Steven

From njs at pobox.com Sun Apr 7 23:26:24 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 7 Apr 2019 20:26:24 -0700
Subject: [Python-ideas] Sorted lists
In-Reply-To: <20190408023204.GH31406@ando.pearwood.info>
References: <20190408023204.GH31406@ando.pearwood.info>
Message-ID:

On Sun, Apr 7, 2019 at 7:37 PM Steven D'Aprano wrote:
> There are quite a few important algorithms which require lists to be
> sorted. For example, the bisect module, and for statistics median and
> other quantiles.

But this flag doesn't affect those modules, right? 'bisect' already
requires the user to ensure that the list is sorted appropriately, and
this bit:

> The flag doesn't guarantee that the list is sorted the way you want
> (e.g. biggest to smallest, by some key, etc) only that it has been
> sorted. It's up to the user to ensure they sort it the right way:

...seems to mean that the 'statistics' module can't use this flag
either.

It doesn't seem very likely to me that the savings from this flag could
outweigh the extra overhead it introduces, just because list operations
are *so* common in Python. If you want to push this forward, the thing
I'd most like to see is some kind of measurements to demonstrate that
average programs will benefit.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From cs at cskk.id.au Sun Apr 7 23:31:14 2019
From: cs at cskk.id.au (Cameron Simpson)
Date: Mon, 8 Apr 2019 13:31:14 +1000
Subject: [Python-ideas] Sorted lists
In-Reply-To: <20190408023204.GH31406@ando.pearwood.info>
References: <20190408023204.GH31406@ando.pearwood.info>
Message-ID: <20190408033114.GA67683@cskk.homeip.net>

On 08Apr2019 12:32, Steven D'Aprano wrote:
>There are quite a few important algorithms which require lists to be
>sorted. For example, the bisect module, and for statistics median and
>other quantiles.
>
>Sorting a list is potentially expensive: while Timsort is very
>efficient, it is still ultimately an O(N log N) algorithm which limits
>how efficient it can be. Checking whether a list is sorted is O(N).
> >What if we could check that lists were sorted in constant time?
> >
> >Proposal: let's give lists a dunder flag, __issorted__, that tracks
> >whether the list is definitely sorted or not:

[...specifics, all looking pretty sane for a hand maintained advisory
flag...]

> >- insert(), append(), extend(), __setitem__() will be a tiny bit
> > slower due to the need to set the flag.

__setitem__ concerns me, along with other modification methods: what
about subclasses(*)? Every existing subclass which overrides __setitem__
now needs to grow code to maintain __issorted__ if they do not
themselves call list.__setitem__.

Also, should this imply an issorted() builtin to consult an instance's
__issorted__ dunder flag? Should such a builtin return False for
instances without an __issorted__ flag? I'm thinking yes since the flag
is intended to mean known-to-be-sorted.

Cheers,
Cameron Simpson

* I _know_ subclassing builtins is discouraged, but it is supported and
can be done if one is conservative.

From tjreedy at udel.edu Mon Apr 8 00:17:13 2019
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 8 Apr 2019 00:17:13 -0400
Subject: [Python-ideas] Sorted lists
In-Reply-To: <20190408023204.GH31406@ando.pearwood.info>
References: <20190408023204.GH31406@ando.pearwood.info>
Message-ID:

On 4/7/2019 10:32 PM, Steven D'Aprano wrote:
> There are quite a few important algorithms which require lists to be
> sorted. For example, the bisect module, and for statistics median and
> other quantiles.
>
> Sorting a list is potentially expensive: while Timsort is very
> efficient, it is still ultimately an O(N log N) algorithm which limits
> how efficient it can be. Checking whether a list is sorted is O(N).
>
> What if we could check that lists were sorted in constant time?
>
> Proposal: let's give lists a dunder flag, __issorted__, that tracks
> whether the list is definitely sorted or not:

Does the CPython C-coded list structure have an unused bit that could
be used for this? (I realize that other implementations might have a
different answer.)

Dunder names are not intended for direct use in code. If __issorted__
is a property, it could instead be .is_sorted or a new .is_sorted
method, where is_sorted(bool) sets the property.

-- 
Terry Jan Reedy

From cs at cskk.id.au Mon Apr 8 01:18:53 2019
From: cs at cskk.id.au (Cameron Simpson)
Date: Mon, 8 Apr 2019 15:18:53 +1000
Subject: [Python-ideas] Sorted lists
In-Reply-To:
References:
Message-ID: <20190408051853.GA10806@cskk.homeip.net>

On 08Apr2019 00:17, Terry Reedy wrote:
>On 4/7/2019 10:32 PM, Steven D'Aprano wrote:
>>There are quite a few important algorithms which require lists to be
>>sorted. [...]
>>Proposal: let's give lists a dunder flag, __issorted__, that tracks
>>whether the list is definitely sorted or not:
>
>[...] Dunder names are not intended for direct use in code. If
>__issorted__ is a property, it could instead be .is_sorted or a new
>.is_sorted method, where is_sorted(bool) sets the property.

Dunders are ok to use in implementation code (class internal). I agree
it's not nice to access from outside the class.

I was imagining a builtin that calls something's .__issorted__ method.
But Steven's suggesting a flat dunder attribute rather than a callable
method.

If this is a publicly queryable value, is there any need to have a
dunder name at all? Why not just give lists a public is_sorted
attribute?

I'm also not convinced the cost to every insert/append is worth the
(subjectively to me) highly infrequent use.
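Spelled out, the builtin mooted above is tiny - a sketch, with the False
default for objects that lack the flag:

    def issorted(obj):
        # Consult an advisory __issorted__ flag, defaulting to False.
        return getattr(obj, '__issorted__', False)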
I imagine one could increase the utility of the flag by implementing
insert/append with a bit of logic like:

    if self.__issorted__:
        check previous/next elements to see if sortedness is preserved

so that a list constructed in sorted order may keep the flag. However,
it seems to me that such a list would accrue an O(n) cost with all of
those O(1) checks over the whole construction, and so not be cheaper
than just checking for sortedness once at the end before use.

Conjecture: anything that requires a sorted list but _accepts_ an
unsorted list (eg outer code which uses bisect or median) needs to check
for sortedness and sort if not already sorted.

So it feels to me like we've got a minimum O(n) cost regardless in most
circumstances, including constructing a list in sorted order from
scratch. And therefore the additional complexity added to
insert/append/__setitem__ probably outweighs the value of having an O(1)
check at the end; especially since while True is supposed to imply
sortedness, False doesn't imply out-of-order, just that we need to check
when sortedness is required.

Cheers,
Cameron Simpson

From alex at alexchamberlain.co.uk Mon Apr 8 02:44:41 2019
From: alex at alexchamberlain.co.uk (Alex Chamberlain)
Date: Mon, 8 Apr 2019 07:44:41 +0100
Subject: [Python-ideas] Sorted lists
In-Reply-To: <20190408051853.GA10806@cskk.homeip.net>
References: <20190408051853.GA10806@cskk.homeip.net>
Message-ID:

On Mon, 8 Apr 2019 at 06:19, Cameron Simpson wrote:
>
> On 08Apr2019 00:17, Terry Reedy wrote:
> >On 4/7/2019 10:32 PM, Steven D'Aprano wrote:
> >>There are quite a few important algorithms which require lists to be
> >>sorted. [...]
> >>Proposal: let's give lists a dunder flag, __issorted__, that tracks
> >>whether the list is definitely sorted or not:
> >
> >[...] Dunder names are not intended for direct use in code. If
> >__issorted__ is a property, it could instead be .is_sorted or a new
> >.is_sorted method, where is_sorted(bool) sets the property.
>
> Dunders are ok to use in implementation code (class internal). I agree
> it's not nice to access from outside the class.
>
> I was imagining a builtin that calls something's .__issorted__ method.
> But Steven's suggesting a flat dunder attribute rather than a callable
> method.
>
> If this is a publicly queryable value, is there any need to have a
> dunder name at all? Why not just give lists a public is_sorted
> attribute?
>
> I'm also not convinced the cost to every insert/append is worth the
> (subjectively to me) highly infrequent use.
>
> I imagine one could increase the utility of the flag by implementing
> insert/append with a bit of logic like:
>
> if self.__issorted__:
> check previous/next elements to see if sortedness is preserved
>
> so that a list constructed in sorted order may keep the flag. However,
> it seems to me that such a list would accrue an O(n) cost with all of
> those O(1) checks over the whole construction, and so not be cheaper
> than just checking for sortedness once at the end before use.
>
> Conjecture: anything that requires a sorted list but _accepts_ an
> unsorted list (eg outer code which uses bisect or median) needs to check
> for sortedness and sort if not already sorted.
>
> So it feels to me like we've got a minimum O(n) cost regardless in most
> circumstances, including constructing a list in sorted order from
> scratch.
> And therefore the additional complexity added to
> insert/append/__setitem__ probably outweighs the value of having an O(1)
> check at the end; especially since while True is supposed to imply
> sortedness, False doesn't imply out-of-order, just that we need to check
> when sortedness is required.
>
> Cheers,
> Cameron Simpson
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

Morning all,

I think a better abstraction for a sorted list is a new class, which
implements the Sequence protocol (and hence can be used in a lot of
existing list contexts), but only exposes mutation methods that can
guarantee that sorted order can be maintained (and hence is _not_ a
MutableSequence). AFAICT that means adding an `insert` method on top of
the standard read-only methods of a list, and it can be implemented
easily using the `bisect` module.

I think this is a better option, as it allows you to rely on that
sorted order, rather than it being a convention.

Thanks,
Alex

From p.f.moore at gmail.com Mon Apr 8 03:02:20 2019
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 8 Apr 2019 08:02:20 +0100
Subject: [Python-ideas] Built-in parsing library
In-Reply-To:
References:
Message-ID:

On Mon, 8 Apr 2019 at 02:54, Nam Nguyen wrote:
> Back to my original goal, I've gathered that there is some interest in
> having a more general parser library in the stdlib. "Some", but not
> "much". Should I start out with a straw proposal so that we can hash it
> out further?

I would expect that the only reasonable way of getting a parsing library
in the stdlib would be to propose an established one from PyPI to be
moved into the stdlib - and that would require the active support of the
library author. I can't imagine any way that I'd support a brand new
parsing library getting put in the stdlib - the area is sufficiently
complex, and the external alternatives too mature, to make having a new,
relatively untried library in the stdlib be a good idea.

Paul

From p.f.moore at gmail.com Mon Apr 8 03:09:46 2019
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 8 Apr 2019 08:09:46 +0100
Subject: [Python-ideas] cli tool to print value, similar to pydoc
In-Reply-To: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de>
References: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de>
Message-ID:

On Sun, 7 Apr 2019 at 12:44, Thomas Gläßle wrote:
> There is already a tool like this on PyPI [1] (sadly py2 only atm), but
> if you agree that this is a common pattern, I believe it would be a lot
> more useful to have it in the stdlib.

There's another tool on PyPI, I think, that does all you mention and
more. I can't recall its name now (it's short, one letter or two, which
makes for easy command line use, but sucks for searching :-() but as a
data point I thought it was a really cool idea, but in practice I never
used it and no longer install it when I use Python. So I don't know that
it's a sufficiently useful idea in practice to be worth going in the
stdlib.

But if you can find that project again, and it has a reasonable number
of users, it may be that I'm wrong in my assessment.
Paul

From kirillbalunov at gmail.com Mon Apr 8 03:28:00 2019
From: kirillbalunov at gmail.com (Kirill Balunov)
Date: Mon, 8 Apr 2019 10:28:00 +0300
Subject: [Python-ideas] Sorted lists
In-Reply-To: <20190408023204.GH31406@ando.pearwood.info>
References: <20190408023204.GH31406@ando.pearwood.info>
Message-ID:

Basically, I like this idea and of course theoretically it has use
cases. But in the proposed form, it feels strongly curtailed. In
practice, what does it mean to be sorted?

    [('a', 1), ('b', 2), ('bb', 4), ('aaa', 3)]   # Is this list sorted?
    [('a', 1), ('aaa', 3), ('b', 2), ('bb', 4)]   # or this?
    [('a', 1), ('b', 2), ('aaa', 3), ('bb', 4)]   # or that?

The right answer is that they are all sorted by some means. So, if you
offer this __is_sorted__ attribute only for the very special case of a
1d list of numbers, this makes no sense. (Just re-read the recent thread
on why .join is a string method and not a *list* method.) On the other
hand, if you offer *some sort of a general protocol* storing a sort key
or some other useful information, then this is awesome! But is this
achievable in practice?

with kind regards,
-gdg

On Mon, 8 Apr 2019 at 05:38, Steven D'Aprano wrote:

> There are quite a few important algorithms which require lists to be
> sorted. For example, the bisect module, and for statistics median and
> other quantiles.
>
> Sorting a list is potentially expensive: while Timsort is very
> efficient, it is still ultimately an O(N log N) algorithm which limits
> how efficient it can be. Checking whether a list is sorted is O(N).
>
> What if we could check that lists were sorted in constant time?
>
> Proposal: let's give lists a dunder flag, __issorted__, that tracks
> whether the list is definitely sorted or not:
>
> - Empty lists, or lists with a single item, are created with
>   __issorted__ = True; lists with two or more items are created
>   with the flag set to False.
>
> - Appending or inserting items sets the flag to False.
>
> - Deleting or popping items doesn't change the flag.
>
> - Reversing the list doesn't change the flag.
>
> - Sorting it sets the flag to True. (The sort method should NOT
>   assume the list is sorted just because the flag is set.)
>
> Functions that require the list to be sorted can use the flag as a
> quick check:
>
>     if not alist.__issorted__:
>         alist.sort()
>     ...
>
> The flag will be writable, so that functions such as those in
> bisect can mark that they have kept the sorted invariant:
>
>     bisect.insort(alist, x)
>     assert alist.__issorted__
>
> Being writable, the flag is advisory, not a guarantee, and "consenting
> adults" applies. People can misuse the flag:
>
>     alist = [1, 4, 2, 0, 5]
>     alist.__issorted__ = True
>
> but then they have nobody to blame but themselves if they shoot
> themselves in the foot. That's no worse than the situation we have now,
> where you might pass an unsorted list to bisect.
>
> The flag doesn't guarantee that the list is sorted the way you want
> (e.g. biggest to smallest, by some key, etc) only that it has been
> sorted. It's up to the user to ensure they sort it the right way:
>
>     # Don't do this and expect it to work!
>     alist.sort(key=random.random)
>     bisect.insort(alist, 1)
>
> If you really want to be sure about the state of the list, you have to
> make a copy and sort it. But that's no different from the situation
> right now. But for those willing to assume "consenting adults", you
> might trust the flag and avoid sorting.
>
> Downsides:
>
> - Every list grows an extra attribute; however, given that lists are
>   already quite big data structures and are often over-allocated, I
>   don't think this will matter much.
>
> - insert(), append(), extend(), __setitem__() will be a tiny bit
>   slower due to the need to set the flag.
>
> Thoughts?
>
> --
> Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From moagstar at gmail.com Mon Apr 8 03:30:26 2019
From: moagstar at gmail.com (Daniel Bradburn)
Date: Mon, 8 Apr 2019 09:30:26 +0200
Subject: [Python-ideas] cli tool to print value, similar to pydoc
In-Reply-To:
References: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de>
Message-ID:

Could it be q that you are thinking about?

On Mon, 8 Apr 2019 at 09:10, Paul Moore wrote:

> On Sun, 7 Apr 2019 at 12:44, Thomas Gläßle wrote:
> > There is already a tool like this on PyPI [1] (sadly py2 only atm), but
> > if you agree that this is a common pattern, I believe it would be a lot
> > more useful to have it in the stdlib.
>
> There's another tool on PyPI, I think, that does all you mention and
> more. I can't recall its name now (it's short, one letter or two,
> which makes for easy command line use, but sucks for searching :-()
> but as a data point I thought it was a really cool idea, but in
> practice I never used it and no longer install it when I use Python. So
> I don't know that it's a sufficiently useful idea in practice to be
> worth going in the stdlib.
>
> But if you can find that project again, and it has a reasonable number
> of users, it may be that I'm wrong in my assessment.
> Paul
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xavier.combelle at gmail.com Mon Apr 8 04:33:15 2019
From: xavier.combelle at gmail.com (Xavier Combelle)
Date: Mon, 8 Apr 2019 10:33:15 +0200
Subject: [Python-ideas] Sorted lists
In-Reply-To: <20190408023204.GH31406@ando.pearwood.info>
References: <20190408023204.GH31406@ando.pearwood.info>
Message-ID: <5663ac0a-2b0a-1513-124f-9d17edd1d4a4@gmail.com>

It looks to me like this means every subclass of list has to maintain
the __issorted__ invariant, which is not a backward compatible
modification and is quite problematic if the invariant is broken. So if
I'm correct, the design is brittle.

Moreover, that solves only the strict subset of sorting cases where
there is no key function and the order is ascending, ....

If one wants to make a subclass of list having these properties, that
doesn't look hard to do and is far safer. (Of course the problem is that
there would then be two similar list classes and the better one is not
builtin.)

On 08/04/2019 at 04:32, Steven D'Aprano wrote:
> There are quite a few important algorithms which require lists to be
> sorted. For example, the bisect module, and for statistics median and
> other quantiles.
>
> Sorting a list is potentially expensive: while Timsort is very
> efficient, it is still ultimately an O(N log N) algorithm which limits
> how efficient it can be. Checking whether a list is sorted is O(N).
>
> What if we could check that lists were sorted in constant time?
>
> Proposal: let's give lists a dunder flag, __issorted__, that tracks
> whether the list is definitely sorted or not:
>
> - Empty lists, or lists with a single item, are created with
>   __issorted__ = True; lists with two or more items are created
>   with the flag set to False.
>
> - Appending or inserting items sets the flag to False.
>
> - Deleting or popping items doesn't change the flag.
>
> - Reversing the list doesn't change the flag.
>
> - Sorting it sets the flag to True. (The sort method should NOT
>   assume the list is sorted just because the flag is set.)
>
> Functions that require the list to be sorted can use the flag as a
> quick check:
>
>     if not alist.__issorted__:
>         alist.sort()
>     ...
>
> The flag will be writable, so that functions such as those in
> bisect can mark that they have kept the sorted invariant:
>
>     bisect.insort(alist, x)
>     assert alist.__issorted__
>
> Being writable, the flag is advisory, not a guarantee, and "consenting
> adults" applies. People can misuse the flag:
>
>     alist = [1, 4, 2, 0, 5]
>     alist.__issorted__ = True
>
> but then they have nobody to blame but themselves if they shoot
> themselves in the foot. That's no worse than the situation we have now,
> where you might pass an unsorted list to bisect.
>
> The flag doesn't guarantee that the list is sorted the way you want
> (e.g. biggest to smallest, by some key, etc) only that it has been
> sorted. It's up to the user to ensure they sort it the right way:
>
>     # Don't do this and expect it to work!
>     alist.sort(key=random.random)
>     bisect.insort(alist, 1)
>
> If you really want to be sure about the state of the list, you have to
> make a copy and sort it. But that's no different from the situation
> right now. But for those willing to assume "consenting adults", you
> might trust the flag and avoid sorting.
>
> Downsides:
>
> - Every list grows an extra attribute; however, given that lists are
>   already quite big data structures and are often over-allocated, I
>   don't think this will matter much.
>
> - insert(), append(), extend(), __setitem__() will be a tiny bit
>   slower due to the need to set the flag.
>
> Thoughts?

From p.f.moore at gmail.com Mon Apr 8 04:43:10 2019
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 8 Apr 2019 09:43:10 +0100
Subject: [Python-ideas] cli tool to print value, similar to pydoc
In-Reply-To:
References: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de>
Message-ID:

On Mon, 8 Apr 2019 at 08:30, Daniel Bradburn wrote:
>
> Could it be q that you are thinking about?

Thanks, but no - that's debug output in your code. The tool I recall
was much more like the OP's suggestion

    > foo sys.version
    3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]

Auto-import of sys, print the value by default, etc. Plus quite a bit
more, I think.

Paul

From steve at pearwood.info Mon Apr 8 05:09:20 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 8 Apr 2019 19:09:20 +1000
Subject: [Python-ideas] Sorted lists
In-Reply-To:
References: <20190408023204.GH31406@ando.pearwood.info>
Message-ID: <20190408090916.GK31406@ando.pearwood.info>

On Sun, Apr 07, 2019 at 08:26:24PM -0700, Nathaniel Smith wrote:
> On Sun, Apr 7, 2019 at 7:37 PM Steven D'Aprano wrote:
> > There are quite a few important algorithms which require lists to be
> > sorted. For example, the bisect module, and for statistics median and
> > other quantiles.
>
> But this flag doesn't affect those modules, right?
> 'bisect' already requires the user to ensure that the list is sorted
> appropriately

Naturally the bisect and statistics modules (or any other that requires
sorting) won't change to inspect this flag by magic, the code will
require modification.

Possibly the maintainer of bisect may decide that its not worth the
change. But for the statistics module, I would certainly change the
implementation of median() to look something vaguely like this:

    # was
    data = sorted(data)  # may be expensive if data is large

    # may become
    if not (isinstance(data, list) and data.__issorted__):
        data = sorted(data)

statistics is soon to grow a quantiles() function, but the thing with
quantiles is often you want to get a bunch of them:

    # This only makes sense if data is a sequence (list)
    # not an iterator.
    quartiles = quantiles(data, n=4)
    quintiles = quantiles(data, n=5)
    deciles = quantiles(data, n=10)
    percentiles = quantiles(data, n=100)

That's four calls to sorted(). The caller could drop that down to one:

    data.sort()
    quartiles = ... etc

Now before anyone else mentions it, we could give the function a
"dont_sort" argument, or "already_sorted" if you prefer, but I dislike
that kind of constant-bool parameter and would prefer to avoid it.

> and this bit:
>
> > The flag doesn't guarantee that the list is sorted the way you want
> > (e.g. biggest to smallest, by some key, etc) only that it has been
> > sorted. It's up to the user to ensure they sort it the right way:
>
> ...seems to mean that the 'statistics' module can't use this flag either.

"Consenting adults" applies. If you want to pass an unsorted list to the
functions, but pretend that it's sorted, then on your own head be it.
There's no real difference between these two hypothetical scenarios:

    data = [1, 4, 2, 0, 5, 3]
    garbage = median(data, already_sorted=True)

versus:

    data = [1, 4, 2, 0, 5, 3]
    data.__issorted__ = True
    garbage = median(data)

I'm perfectly comfortable with allowing the caller to lie if they want.
It's their own foot they're shooting.

(I wouldn't be so blasé about this if it were a function written in C
that could segfault if the list wasn't sorted.)

> It doesn't seem very likely to me that the savings from this flag
> could outweigh the extra overhead it introduces, just because list
> operations are *so* common in Python. If you want to push this
> forward, the thing I'd most like to see is some kind of measurements
> to demonstrate that average programs will benefit.

I'm not sure that the average program uses sort *at all*, so a better
set of questions is:

- how much will this hurt the average program?
  (my gut feeling is "very little", but we'd need benchmarks to know)

- are there any other use-cases for sorted data that could benefit?

- how much will they benefit?

Let's say, for the sake of the argument, that this proposal makes the
average program 0.01% slower, but helps sorting-heavy programs be 2%
faster when dealing with large lists; then I think that might be a win.
(Remember, this is about large lists. For small enough lists, the cost
of sorting them is minor.)

I can't benchmark the cost of setting the flag accurately (it would want
to be done in C, doing it in pure Python using a subclass is very
expensive). But for what it's worth, on my old and slow computer, it
takes 13.5 seconds to sort in place this list of 50 million integers:

    L = list(range(10000000))*5

and 15 seconds to make a sorted copy. Once sorted, subsequent sorts are
faster, but still time consuming, about 2.5 seconds for an in-place
sort.
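For reference, the timings above came from something like this (just a
rough sketch using timeit -- the exact numbers will of course vary from
machine to machine):

    from timeit import timeit

    L = list(range(10000000)) * 5  # 50 million integers

    # First sort: a full Timsort pass over the unsorted data.
    print(timeit("data.sort()", globals={"data": L[:]}, number=1))

    # Second sort: Timsort merely gallops over already-sorted data.
    print(timeit("data.sort()", globals={"data": sorted(L)}, number=1))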
So on my machine, this will save about 2-3 seconds per sort that I can avoid doing. More if the list is copied. (I know it is totally unfair to benchmark the benefit without also benchmarking the cost. Deal with it :-) -- Steven From steve at pearwood.info Mon Apr 8 05:17:30 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Apr 2019 19:17:30 +1000 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408033114.GA67683@cskk.homeip.net> References: <20190408023204.GH31406@ando.pearwood.info> <20190408033114.GA67683@cskk.homeip.net> Message-ID: <20190408091730.GL31406@ando.pearwood.info> On Mon, Apr 08, 2019 at 01:31:14PM +1000, Cameron Simpson wrote: > __setitem__ concerns me, along with other modification methods: what > about subclasses(*)? Every existing subclass which overrides __setitem__ > now needs to grow code to maintain __issorted__ if they do not > themselves call list.__setitem__. Well, yeah. That's what happens if you subclass: you are responsible for maintaining the invariant if you don't call the superclasses. Maybe there's a cunning way to avoid this, but that will make the implementation more complex and probably tips this proposal from "maybe worth thinking about" to "not a chance". But I'd be inclined to just pass the buck to the subclass: if you want to maintain the invariant, then you have to maintain it, or call the appropriate super methods. Now that you've reminded me of subclasses, I would make one other change to the specs: all lists, including empty ones, are created with the flag set to False. That way subclasses which *don't* maintain the invariant will always be flagged as unsorted (unless the caller explicitly sets the flag themselves). > Also, should this imply an issorted() builtin to consult an instance's > __issorted__ dunder flag? Should such a builtin return False for > instances without an __issorted__ flag? I'm thinking yes since the flag > is intended to mean known-to-be-sorted. I don't think this proposal is worth adding a builtin function. Not unless somebody thinks of some more very compelling use-cases. Perhaps an inspect.sort_hint() function. -- Steven From steve at pearwood.info Mon Apr 8 05:32:49 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Apr 2019 19:32:49 +1000 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408051853.GA10806@cskk.homeip.net> References: <20190408051853.GA10806@cskk.homeip.net> Message-ID: <20190408093249.GM31406@ando.pearwood.info> On Mon, Apr 08, 2019 at 03:18:53PM +1000, Cameron Simpson wrote: > If this is a publicly queriable value, is there any need to have a > dunder name at all? Why not just give lists a public is_sorted > attribute? I don't mind that. > I'm also not convinced the cost to every insert/append is worth the > (subjectively to me) highly infrequent use. > > I imagine one could increase the utility of the flag by implementing > insert/append with a bit of logic like: > > if self.__issorted__: > check-previous/next elements to see if sortedness is preserved That's not practical unless the list remembers what sort of sort order it is supposed to have: - sorted smallest to biggest; - or biggest to smallest; - using what key. That might be appropriate for a dedicated SortedList class, but not for generic lists. But a SortedList class probably shouldn't support operations which break the sorted invariant. 
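To give a flavour of what I mean, here is a minimal sketch of such a
class, using the bisect module as Alex suggested, and hard-coding plain
ascending order with no key function (untested, the names are only
illustrative, and a real version would also have to carry the key and
direction):

    from bisect import bisect_right, insort_right
    from collections.abc import Sequence

    class SortedList(Sequence):
        """A sequence that keeps its items in ascending sorted order."""

        def __init__(self, iterable=()):
            self._items = sorted(iterable)

        def __len__(self):
            return len(self._items)

        def __getitem__(self, index):
            return self._items[index]

        def insert(self, item):
            # insort puts the new item in its correct position, so the
            # sorted invariant holds by construction.
            insort_right(self._items, item)

        def __contains__(self, item):
            # O(log n) membership test, exploiting the known order.
            i = bisect_right(self._items, item)
            return i > 0 and self._items[i - 1] == item

Because insert() is the only mutator, there is no way to break the
sorted invariant short of reaching into the private list.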
> Conjecture: anything that requires a sorted list but _accepts_ an > unsorted list (eg outer code which uses bisect or median) needs to check > for sortedness and sort if not already sorted. Well, yes, somebody has to sort the list at least once, otherwise it won't be sorted :-) Currently bisect simply trusts that the list is sorted and makes no effort to even check. The API basically says: Pass a sorted list, or you'll get garbage. With this check in place, it is *possible* to change the API to: Pass a sorted list, or I'll sort it for you; if you lie, you'll get garbage. (Whether the maintainer of bisect thinks this is a good API change, I don't know.) The median (and soon, quantiles) API says: I'm going to sort the list for you, whether you need it or not, just in case you do. It could become: I'm going to sort the list for you, if necessary. If you lie about it already being sorted, you'll get garbage. -- Steven From p.f.moore at gmail.com Mon Apr 8 05:34:19 2019 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Apr 2019 10:34:19 +0100 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408090916.GK31406@ando.pearwood.info> References: <20190408023204.GH31406@ando.pearwood.info> <20190408090916.GK31406@ando.pearwood.info> Message-ID: On Mon, 8 Apr 2019 at 10:10, Steven D'Aprano wrote: > Possibly the maintainer of bisect may decide that its not worth the > change. But for the statistics module, I would certainly change the > implementation of median() to look something vaguely like this: > > # was > data = sorted(data) # may be expensive if data is large > > # may become > if not (isinstance(data, list) and data.__issorted__): > data = sorted(data) So just to be clear, this would be a change in behaviour - for a list that is currently sorted on a key that is *not* "numerically ascending", the code will no longer re-sort, and so will give the wrong answer? (Maybe the code is able to deal with either ascending or descending orders, I don't know about that, but even so, that just makes the failure rarer, not non-existent). I'm not saying that this is forbidden, just want to be clear if that's what you mean (because my difficulty with the proposed attribute is that it seems unreliable enough that I can't imagine a case where I'd feel comfortable using it myself). Paul PS I thought timsort was highly efficient given already sorted data? Whether it's OK to rely on that I don't know, but did you take that into account? From steve at pearwood.info Mon Apr 8 05:40:55 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Apr 2019 19:40:55 +1000 Subject: [Python-ideas] Sorted lists In-Reply-To: References: <20190408051853.GA10806@cskk.homeip.net> Message-ID: <20190408094055.GN31406@ando.pearwood.info> On Mon, Apr 08, 2019 at 07:44:41AM +0100, Alex Chamberlain wrote: > I think a better abstraction for a sorted list is a new class, which > implements the Sequence protocol (and hence can be used in a lot of > existing list contexts), but only exposed mutation methods that can > guarantee that sorted order can be maintained Perhaps that's a better idea. > (and hence is _not_ a MutableSequence). Right, but it can still be mutable, so long as the mutation methods can maintain the invariant. 
That means: - the SortedList needs to know the sort direction; - and the key used for sorting; - no slice or item assignment; - insertions are okay, since the SortedList can put them in the correct place; - but not append; - deletions are okay, since they won't change the sort invariant (at least not for items with a total order). -- Steven From p.f.moore at gmail.com Mon Apr 8 05:45:11 2019 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Apr 2019 10:45:11 +0100 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408093249.GM31406@ando.pearwood.info> References: <20190408051853.GA10806@cskk.homeip.net> <20190408093249.GM31406@ando.pearwood.info> Message-ID: On Mon, 8 Apr 2019 at 10:34, Steven D'Aprano wrote: > The median (and soon, quantiles) API says: > > I'm going to sort the list for you, whether you need it or > not, just in case you do. Hmm, I didn't see that mentioned in the docs. It makes a difference to my comment about behaviour change, I think. > It could become: > > I'm going to sort the list for you, if necessary. If you > lie about it already being sorted, you'll get garbage. But regardless, this would be the same potential behaviour change I mentioned. (In the actual docs, it should probably be explicit about the sort order, as someone who passes a list sorted in the wrong way might not *think* they were lying, just that they hadn't sorted the list... Still not convinced this is safe enough to be worth it ;-) Paul From njs at pobox.com Mon Apr 8 05:55:54 2019 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 8 Apr 2019 02:55:54 -0700 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408090916.GK31406@ando.pearwood.info> References: <20190408023204.GH31406@ando.pearwood.info> <20190408090916.GK31406@ando.pearwood.info> Message-ID: On Mon, Apr 8, 2019, 02:09 Steven D'Aprano wrote: > On Sun, Apr 07, 2019 at 08:26:24PM -0700, Nathaniel Smith wrote: > > On Sun, Apr 7, 2019 at 7:37 PM Steven D'Aprano > wrote: > > > There are quite a few important algorithms which require lists to be > > > sorted. For example, the bisect module, and for statistics median and > > > other quantiles. > > > > But this flag doesn't affect those modules, right? 'bisect' already > > requires the user to ensure that the list is sorted appropriately > > Naturally the bisect and statistics modules (or any other that requires > sorting) won't change to inspect this flag by magic, the code will > require modification. Right, by "doesn't affect" I meant "cannot get any benefit, even if their code is modified". Possibly the maintainer of bisect may decide that its not worth the > change. But for the statistics module, I would certainly change the > implementation of median() to look something vaguely like this: > > # was > data = sorted(data) # may be expensive if data is large > > # may become > if not (isinstance(data, list) and data.__issorted__): > data = sorted(data) > > statistics is soon to grow a quantiles() function, but the thing with > quantiles is often you want to get a bunch of them: > > # This only makes sense if data is a sequence (list) > # not an iterator. > quartiles = quantiles(data, n=4) > quintiles = quantiles(data, n=5) > deciles = quantiles(data, n=10) > percentiles = quantiles(data, n=100) > If only we had some kind of API that could compute multiple quantiles at the same time... > > That's four calls to sorted(). The caller could drop that down to one: > > data.sort() > quartiles = ... 
etc > > > Now before anyone else mentions it, we could give the function a > "dont_sort" argument, or "already_sorted" if you prefer, but I dislike > that kind of constant-bool parameter and would prefer to avoid it. > > > > and this bit: > > > > > The flag doesn't guarantee that the list is sorted the way you want > > > (e.g. biggest to smallest, by some key, etc) only that it has been > > > sorted. Its up to the user to ensure they sort it the right way: > > > > ...seems to mean that the 'statistics' module can't use this flag either. > > "Consenting adults" applies. If you want to pass an unsorted list to the > functions, but pretend that its sorted, then on your own head be it. > There's no real difference between these two hypothetical scenarios: > > data = [1, 4, 2, 0, 5, 3] > garbage = median(data, already_sorted=True) > > versus: > > data = [1, 4, 2, 0, 5, 3] > data.__issorted__ = True > garbage = median(data) > > > I'm perfectly comfortable with allowing the caller to lie if they want. > Its their own foot they're shooting. > An already_sorted=True argument would be an explicit opt in, and consenting adults would apply. But your message was very explicit that __issorted__ can be set implicitly, though. For example, this would give garbage results: # implicitly sets the sorted flag data.sort() # preserves the flag, because hey it's sorted by *some* key data.reverse() statistics.median(data) You can't use this in statistics.median because it would break compatibility. Also, isn't the whole point of 'statistics' to be the simple, reliable module for folks who aren't that worried about speed? This seems like a massive footgun. > (I wouldn't be so blas? about this if it were a function written in C > that could segfault if the list wasn't sorted.) > Silently giving the wrong answer is way worse than a segfault. > It doesn't seem very likely to me that the savings from this flag > > could outweigh the extra overhead it introduces, just because list > > operations are *so* common in Python. If you want to push this > > forward, the thing I'd most like to see is some kind of measurements > > to demonstrate that average programs will benefit. > > I'm not sure that the average program uses sort *at all*, so a better > set of questions are: > > - how much will this hurt the average program? > (my gut feeling is "very little", but we'd need benchmarks to know) > > - are there any other use-cases for sorted data that could benefit? > > - how much will they benefit? > > Let's say, for the sake of the argument that this proposal makes the > average program 0.01% slower, but helps sorting-heavy programs be 2% > faster when dealing with large lists, then I think that might be a win. > Obviously these are made up numbers, but if they were real then for it to be a net win you would still need at least 1 in 200 programs to be "sorting heavy" in a way that could benefit from this flag, and I don't believe that's true. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Apr 8 06:25:30 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Apr 2019 20:25:30 +1000 Subject: [Python-ideas] Sorted lists In-Reply-To: References: <20190408023204.GH31406@ando.pearwood.info> <20190408090916.GK31406@ando.pearwood.info> Message-ID: <20190408102530.GP31406@ando.pearwood.info> On Mon, Apr 08, 2019 at 02:55:54AM -0700, Nathaniel Smith wrote: > Right, by "doesn't affect" I meant "cannot get any benefit, even if their > code is modified". 
Ah, sorry I misunderstood you. > > # This only makes sense if data is a sequence (list) > > # not an iterator. > > quartiles = quantiles(data, n=4) > > quintiles = quantiles(data, n=5) > > deciles = quantiles(data, n=10) > > percentiles = quantiles(data, n=100) > > > > If only we had some kind of API that could compute multiple quantiles at > the same time... You mean something like this? quartiles, quintiles, deciles, percentiles = quantiles(data, n=(4, 5, 10, 100)) Yuck :-) I'd rather allow an already_sorted=True parameter :-) > > I'm perfectly comfortable with allowing the caller to lie if they want. > > Its their own foot they're shooting. > > > > An already_sorted=True argument would be an explicit opt in, and consenting > adults would apply. But your message was very explicit that __issorted__ > can be set implicitly, though. For example, this would give garbage results: > > # implicitly sets the sorted flag > data.sort() > # preserves the flag, because hey it's sorted by *some* key > data.reverse() > statistics.median(data) It would certainly be putting more responsibility on the caller to ensure the sorted flag was correct. > You can't use this in statistics.median because it would break > compatibility. I would argue differently. > Also, isn't the whole point of 'statistics' to be the > simple, reliable module for folks who aren't that worried about speed? This > seems like a massive footgun. Perhaps. Perhaps not as massive as it seems. The expected use-case would be something like this: data = get_data() a = median(data) data.sort() b = median(data) # Like a, but faster. To be a problem, the caller would need to do something like: data = get_data() a = median(data) data.sort(reversed=True) # or some weird key function b = median(data) # Garbage result. I can't see people shooting themselves in the foot in this way by accident very often. But fair enough, it is a risk. > > (I wouldn't be so blas? about this if it were a function written in C > > that could segfault if the list wasn't sorted.) > > > > Silently giving the wrong answer is way worse than a segfault. It depends on what you're worried about, and who gets the blame for the wrong answer. As I understand it, it is the position of the core-devs that *any* seg fault in the interpreter or stdlib is a serious bug that must be fixed (possibly excepting ctypes); but if Python code returns garbage when you give it garbage input, that may or may not be considered a bug. In this case, passing a list with the flag set when it is not actually sorted correctly would be a "Garbage In, Garbage Out" error, just as if they had explicitly passed a already_sorted=True argument. But I take your earlier point that the argument version is explicit and opt-in, while the flag is implicit and may not be opt-in. Given all the points already raised, I think that an explicit SortedList might be more appropriate. Thanks everyone for all the feedback. -- Steven From steve at pearwood.info Mon Apr 8 06:01:58 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Apr 2019 20:01:58 +1000 Subject: [Python-ideas] Sorted lists In-Reply-To: References: <20190408023204.GH31406@ando.pearwood.info> <20190408090916.GK31406@ando.pearwood.info> Message-ID: <20190408100158.GO31406@ando.pearwood.info> On Mon, Apr 08, 2019 at 10:34:19AM +0100, Paul Moore wrote: > On Mon, 8 Apr 2019 at 10:10, Steven D'Aprano wrote: > > > Possibly the maintainer of bisect may decide that its not worth the > > change. 
But for the statistics module, I would certainly change the > > implementation of median() to look something vaguely like this: > > > > # was > > data = sorted(data) # may be expensive if data is large > > > > # may become > > if not (isinstance(data, list) and data.__issorted__): > > data = sorted(data) > > So just to be clear, this would be a change in behaviour - for a list > that is currently sorted on a key that is *not* "numerically > ascending", the code will no longer re-sort, and so will give the > wrong answer? Correct. A thought comes to mind... Currently the two variant forms of median (median_low and _high) can work with non-numeric data: # don't take this example too seriously py> statistics.median_low(['low', 'high', 'high', 'low', 'very low']) 'low' so perhaps there is a use-case for non-numeric sorts. > I'm not saying that this is forbidden, just want to be clear if that's > what you mean (because my difficulty with the proposed attribute is > that it seems unreliable enough that I can't imagine a case where I'd > feel comfortable using it myself). Fair enough, but does it help you feel a bit better about the feature if we called it a "sort hint", and emphasized that it should only be used in the same sort of APIs where you might allow the caller to pass already_sorted=True to skip a redundant sort step? > PS I thought timsort was highly efficient given already sorted data? > Whether it's OK to rely on that I don't know, but did you take that > into account? Yes, timsort is very efficient with already sorted data, and I have :-) On my computer, to sort 50 million integers in place takes about 13.5 seconds; to sort it the next time takes 2.7 seconds. (That's effectively just galloping along the list, looking for out-of-place items and not finding any.) To anyone tempted to say "Get a better computer", if I had a better computer I'd be using a list of 50 billion integers and it would still take 2.7 seconds :-) -- Steven From p.f.moore at gmail.com Mon Apr 8 06:39:05 2019 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Apr 2019 11:39:05 +0100 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408100158.GO31406@ando.pearwood.info> References: <20190408023204.GH31406@ando.pearwood.info> <20190408090916.GK31406@ando.pearwood.info> <20190408100158.GO31406@ando.pearwood.info> Message-ID: On Mon, 8 Apr 2019 at 11:27, Steven D'Aprano wrote: > > Fair enough, but does it help you feel a bit better about the feature if > we called it a "sort hint", and emphasized that it should only be used > in the same sort of APIs where you might allow the caller to pass > > already_sorted=True > > to skip a redundant sort step? Not really, because already_sorted=True is an explicit claim by the user, whereas the (internally maintained) dunder attribute is the interpreter maintaining a (not guaranteed reliable but pessimistic to be as safe as possible) view on whether the data is sorted. But I see elsewhere in the thread that you're now more inclined towards an explicit already_sorted flag, so I won't labour the point. 
Paul From 2QdxY4RzWzUUiLuE at potatochowder.com Mon Apr 8 06:48:09 2019 From: 2QdxY4RzWzUUiLuE at potatochowder.com (Dan Sommers) Date: Mon, 8 Apr 2019 06:48:09 -0400 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408102530.GP31406@ando.pearwood.info> References: <20190408023204.GH31406@ando.pearwood.info> <20190408090916.GK31406@ando.pearwood.info> <20190408102530.GP31406@ando.pearwood.info> Message-ID: On 4/8/19 6:25 AM, Steven D'Aprano wrote: > On Mon, Apr 08, 2019 at 02:55:54AM -0700, Nathaniel Smith wrote: >> If only we had some kind of API that could compute multiple quantiles at >> the same time... > > You mean something like this? > > quartiles, quintiles, deciles, percentiles = quantiles(data, n=(4, 5, 10, 100)) > > Yuck :-) > > I'd rather allow an already_sorted=True parameter :-) I'd rather have a separate function. I can't think of a good name for it right now, but it'd be just like quantiles, except that you'd pass it a sorted list. Conceptually, it'd fit in like this: def quantiles(data, n): if isinstance(data, list): # or some other appropriate test sorted_data = sorted(data) else: sorted_data = data quantiles_of_sorted_data(sorted_data, n) def quantiles_of_sorted_data(sorted_data, n): ...actual quantiles functionality here... Properly constructed programs that would have called quantiles repeatedly can call quantiles_of_sorted_data repeatedly instead. The ability to shoot a foot remains unchanged. From t_glaessle at gmx.de Mon Apr 8 07:00:15 2019 From: t_glaessle at gmx.de (=?UTF-8?B?VGhvbWFzIEdsw6TDn2xl?=) Date: Mon, 8 Apr 2019 13:00:15 +0200 Subject: [Python-ideas] cli tool to print value, similar to pydoc In-Reply-To: <20190408023530.GI31406@ando.pearwood.info> References: <5ff9f31c-4701-6d75-f08a-eeda095ebd32@gmx.de> <20190408023530.GI31406@ando.pearwood.info> Message-ID: <2d2f5807-fff8-51b5-fd63-1f5acd3b2df8@gmx.de> Steven D'Aprano wrote on 4/8/19 4:35 AM: > How will it know what object os is, without guessing, if you haven't > imported it? Like pydoc/help does it. I assume it splits by dots and imports the longest importable subsplit (try...except ImportError in a loop), then iteratively getattrs the rest. I admit that there is a little ambiguity here in that you could also import the shortest possible subsplit such that the rest of the name can be resolved (which is definitely not what help does). In practice it wouldn't be different in most cases, but if you're concerned about the ambiguity, one could separate module and attribute part by colon, like with entry points, e.g. "os:pathsep". > I always have at least one REPL open for precisely this sort of thing, > and the interactive interpreter is infinitely more flexible and powerful > than a tool to print one value. Sure, but I find that most uses are covered by either pydoc, or printing a value, and more complex snippets are often better tried by putting it into a script file - which makes it easier to make small modifications and then re-execute. Furthermore, often a single REPL is not enough, since you may want to see the value in different python interpreters or environments. In any case, it is surely not "needed" per se, but merely convenience. Same applies for the default-usage of pydoc, and still I tend pydoc a lot over opening a REPL and calling help(), because it combines two or three steps into one. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: OpenPGP digital signature URL: From mertz at gnosis.cx Mon Apr 8 07:51:41 2019 From: mertz at gnosis.cx (David Mertz) Date: Mon, 8 Apr 2019 07:51:41 -0400 Subject: [Python-ideas] Sorted lists In-Reply-To: References: <20190408051853.GA10806@cskk.homeip.net> <20190408093249.GM31406@ando.pearwood.info> Message-ID: On Mon, Apr 8, 2019, 5:46 AM Paul Moore wrote: > Still not convinced this is safe enough to be worth it ;-) > I'm convinced it's NOT safe enough to be worth it. On the other hand, a sortedlist subclass that maintained its invariant (probably remembering a key) sounds cool. I think there are one or more good ones on PyPI. Statistics should simply learn to recognize external always-sorted data structures. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Apr 8 08:00:32 2019 From: mertz at gnosis.cx (David Mertz) Date: Mon, 8 Apr 2019 08:00:32 -0400 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408102530.GP31406@ando.pearwood.info> References: <20190408023204.GH31406@ando.pearwood.info> <20190408090916.GK31406@ando.pearwood.info> <20190408102530.GP31406@ando.pearwood.info> Message-ID: On Mon, Apr 8, 2019, 6:26 AM Steven D'Aprano wrote: > Given all the points already raised, I think that an explicit SortedList > might be more appropriate. > This one looks cool. I've read about it, but haven't used it: http://www.grantjenks.com/docs/sortedcontainers/ I think a "sort hint" should be a read-only attribute of some collections, and statistics should look for the presence of that. You might need to get third parties to follow that convention, but you're no worse than now if they don't. Burdening list with a vague suggestion about an invariant it doesn't maintain is a bad idea. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pythonchb at gmail.com Mon Apr 8 10:59:32 2019 From: pythonchb at gmail.com (Christopher Barker) Date: Mon, 8 Apr 2019 07:59:32 -0700 Subject: [Python-ideas] Built-in parsing library In-Reply-To: References: Message-ID: On Mon, Apr 8, 2019 at 12:02 AM Paul Moore wrote: > I would expect that the only reasonable way of getting a parsing > library in the stdlib would be to propose an established one from PyPI > to be moved into the stdlib Absolutely -- unlike some proposals, a stand-alone parsing lib could very easily be developed external to the stdlib. If one gains traction as an obvious choice, then we can talk about bringing it in. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Mon Apr 8 14:37:27 2019 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 8 Apr 2019 14:37:27 -0400 Subject: [Python-ideas] Sorted lists In-Reply-To: <20190408094055.GN31406@ando.pearwood.info> References: <20190408051853.GA10806@cskk.homeip.net> <20190408094055.GN31406@ando.pearwood.info> Message-ID: On 4/8/2019 5:40 AM, Steven D'Aprano wrote: > On Mon, Apr 08, 2019 at 07:44:41AM +0100, Alex Chamberlain wrote: > >> I think a better abstraction for a sorted list is a new class, which >> implements the Sequence protocol (and hence can be used in a lot of >> existing list contexts), but only exposed mutation methods that can >> guarantee that sorted order can be maintained > > Perhaps that's a better idea. > >> (and hence is _not_ a MutableSequence). > > Right, but it can still be mutable, so long as the mutation methods can > maintain the invariant. That means: > > - the SortedList needs to know the sort direction; > - and the key used for sorting; > - no slice or item assignment; Item assignment could be allowed if it checked the new value against neighbors and raised ValueError if it would 'unsort' the list. > - insertions are okay, since the SortedList can put them in > the correct place; > - but not append; > - deletions are okay, since they won't change the sort invariant > (at least not for items with a total order). > > -- Terry Jan Reedy From cryptolabour at gmail.com Mon Apr 8 14:56:22 2019 From: cryptolabour at gmail.com (Ai mu) Date: Mon, 8 Apr 2019 22:56:22 +0400 Subject: [Python-ideas] Built-in parsing library In-Reply-To: References: Message-ID: @DavidMertz Each one of them takes a dramatically different approach to the defining a grammar they work more towards implementing well known standards like the BNF. well internally they might work different to parse etc. Abdur-Rahmaan Janhangeer Mauritius -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Mon Apr 8 16:48:02 2019 From: barry at barrys-emacs.org (Barry Scott) Date: Mon, 8 Apr 2019 21:48:02 +0100 Subject: [Python-ideas] Sorted lists In-Reply-To: References: <20190408051853.GA10806@cskk.homeip.net> <20190408094055.GN31406@ando.pearwood.info> Message-ID: <36E9B8F6-AF8F-4D49-94F2-8EFFC6C042CB@barrys-emacs.org> > On 8 Apr 2019, at 19:37, Terry Reedy wrote: > > On 4/8/2019 5:40 AM, Steven D'Aprano wrote: >> On Mon, Apr 08, 2019 at 07:44:41AM +0100, Alex Chamberlain wrote: >>> I think a better abstraction for a sorted list is a new class, which >>> implements the Sequence protocol (and hence can be used in a lot of >>> existing list contexts), but only exposed mutation methods that can >>> guarantee that sorted order can be maintained >> Perhaps that's a better idea. >>> (and hence is _not_ a MutableSequence). >> Right, but it can still be mutable, so long as the mutation methods can >> maintain the invariant. That means: >> - the SortedList needs to know the sort direction; >> - and the key used for sorting; >> - no slice or item assignment; > > Item assignment could be allowed if it checked the new value against neighbors and raised ValueError if it would 'unsort' the list. >> - insertions are okay, since the SortedList can put them in >> the correct place; >> - but not append; >> - deletions are okay, since they won't change the sort invariant >> (at least not for items with a total order). How do you handle a list of objects that after insertion and being sorted change their sort key and thus make the list unsorted. 
From the point of view of the list nothing changed, but it's not sorted
now.

Think of a list of file stat info objects that are sorted by size.
As the files are written to, the size changes.
I can loop over the objects and tell them to update the stats.
Now the __is_sorted__ property is wrong.

Barry

>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From alex at alexchamberlain.co.uk Mon Apr 8 17:16:31 2019
From: alex at alexchamberlain.co.uk (Alex Chamberlain)
Date: Mon, 8 Apr 2019 22:16:31 +0100
Subject: [Python-ideas] Sorted lists
In-Reply-To: <36E9B8F6-AF8F-4D49-94F2-8EFFC6C042CB@barrys-emacs.org>
References: <20190408051853.GA10806@cskk.homeip.net> <20190408094055.GN31406@ando.pearwood.info> <36E9B8F6-AF8F-4D49-94F2-8EFFC6C042CB@barrys-emacs.org>
Message-ID:

On Mon, 8 Apr 2019 at 22:07, Barry Scott wrote:
>
> > On 8 Apr 2019, at 19:37, Terry Reedy wrote:
> >
> > On 4/8/2019 5:40 AM, Steven D'Aprano wrote:
> >> On Mon, Apr 08, 2019 at 07:44:41AM +0100, Alex Chamberlain wrote:
> >>> I think a better abstraction for a sorted list is a new class, which
> >>> implements the Sequence protocol (and hence can be used in a lot of
> >>> existing list contexts), but only exposed mutation methods that can
> >>> guarantee that sorted order can be maintained
> >> Perhaps that's a better idea.
> >>> (and hence is _not_ a MutableSequence).
> >> Right, but it can still be mutable, so long as the mutation methods can
> >> maintain the invariant. That means:
> >> - the SortedList needs to know the sort direction;
> >> - and the key used for sorting;
> >> - no slice or item assignment;
> >
> > Item assignment could be allowed if it checked the new value against
> > neighbors and raised ValueError if it would 'unsort' the list.
> >> - insertions are okay, since the SortedList can put them in
> >> the correct place;
> >> - but not append;
> >> - deletions are okay, since they won't change the sort invariant
> >> (at least not for items with a total order).
>
> How do you handle a list of objects that after insertion and being
> sorted change their sort key and thus make the list unsorted?
> From the point of view of the list nothing changed, but it's not sorted
> now.
>
> Think of a list of file stat info objects that are sorted by size.
> As the files are written to, the size changes.
> I can loop over the objects and tell them to update the stats.
> Now the __is_sorted__ property is wrong.
>
> Barry
>
> > --
> > Terry Jan Reedy
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

Evening Barry,

I think you're right to recognise that as a risk, but I consider it
similar to the risk of someone defining `__hash__` on a mutable object,
putting it in a `set` then mutating it. We'd have to make the contract
of `SortedList` clear, but I think it would be very difficult to stop
the determined user from breaking the contract if they so wished.
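For comparison, here is how that same trap already plays out with sets
today (a contrived, untested sketch):

    class Point:
        def __init__(self, x):
            self.x = x
        def __hash__(self):
            return hash(self.x)
        def __eq__(self, other):
            return isinstance(other, Point) and self.x == other.x

    p = Point(1)
    s = {p}
    p.x = 5        # mutating the object after it was hashed...
    print(p in s)  # ...False: the set can no longer find it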
To be clear, that contract would be that although storing mutable
objects is supported, once in the SortedList, a mutable object must not
be changed in such a way as to change its key.

Alex

From tjreedy at udel.edu Mon Apr 8 19:22:22 2019
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 8 Apr 2019 19:22:22 -0400
Subject: [Python-ideas] Sorted lists
In-Reply-To: <36E9B8F6-AF8F-4D49-94F2-8EFFC6C042CB@barrys-emacs.org>
References: <20190408051853.GA10806@cskk.homeip.net> <20190408094055.GN31406@ando.pearwood.info> <36E9B8F6-AF8F-4D49-94F2-8EFFC6C042CB@barrys-emacs.org>
Message-ID:

On 4/8/2019 4:48 PM, Barry Scott wrote:
>
>> On 8 Apr 2019, at 19:37, Terry Reedy wrote:
>>
>> On 4/8/2019 5:40 AM, Steven D'Aprano wrote:
>>> On Mon, Apr 08, 2019 at 07:44:41AM +0100, Alex Chamberlain wrote:
>>>> I think a better abstraction for a sorted list is a new class, which
>>>> implements the Sequence protocol (and hence can be used in a lot of
>>>> existing list contexts), but only exposed mutation methods that can
>>>> guarantee that sorted order can be maintained
>>> Perhaps that's a better idea.
>>>> (and hence is _not_ a MutableSequence).
>>> Right, but it can still be mutable, so long as the mutation methods can
>>> maintain the invariant. That means:
>>> - the SortedList needs to know the sort direction;
>>> - and the key used for sorting;
>>> - no slice or item assignment;
>>
>> Item assignment could be allowed if it checked the new value against
>> neighbors and raised ValueError if it would 'unsort' the list.
>>> - insertions are okay, since the SortedList can put them in
>>> the correct place;
>>> - but not append;
>>> - deletions are okay, since they won't change the sort invariant
>>> (at least not for items with a total order).
>
> How do you handle a list of objects that after insertion and being
> sorted change their sort key and thus make the list unsorted?
> From the point of view of the list nothing changed, but it's not sorted
> now.
>
> Think of a list of file stat info objects that are sorted by size.
> As the files are written to, the size changes.
> I can loop over the objects and tell them to update the stats.
> Now the __is_sorted__ property is wrong.

I think that this is an argument for a SortedList class. If
.key_function is changed, the list is resorted by the new key function.
To make it thread-safer, access might be locked for the duration.

--
Terry Jan Reedy

From bitsink at gmail.com Tue Apr 9 12:06:17 2019
From: bitsink at gmail.com (Nam Nguyen)
Date: Tue, 9 Apr 2019 09:06:17 -0700
Subject: [Python-ideas] Built-in parsing library
In-Reply-To:
References:
Message-ID:

On Mon, Apr 8, 2019 at 7:59 AM Christopher Barker wrote:
>
> On Mon, Apr 8, 2019 at 12:02 AM Paul Moore wrote:
>
>> I would expect that the only reasonable way of getting a parsing
>> library in the stdlib would be to propose an established one from PyPI
>> to be moved into the stdlib
>
> Absolutely -- unlike some proposals, a stand-alone parsing lib could very
> easily be developed external to the stdlib. If one gains traction as an
> obvious choice, then we can talk about bringing it in.

All options are still on the table. It is important to closely align
the solution with the goal of making it available for *internal use* in
the stdlib itself. Having a parser library in the stdlib for *general
use* is not an explicit goal that I am aiming for, just as pgen2 was
not intended that way. Neither should that deter one from being
considered.
Nam > > -CHB > > > -- > Christopher Barker, PhD > > Python Language Consulting > - Teaching > - Scientific Software Development > - Desktop GUI and Web Development > - wxPython, numpy, scipy, Cython > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Tue Apr 9 13:59:59 2019 From: barry at barrys-emacs.org (Barry Scott) Date: Tue, 9 Apr 2019 18:59:59 +0100 Subject: [Python-ideas] Built-in parsing library In-Reply-To: References: Message-ID: <14895584-29D5-4D87-A53B-2C91E39192D5@barrys-emacs.org> Nam, I'm not so sure that a "universal parsing library" is possible for the stdlib. I think one way you could find out what the requirements are is to refactor at least 2 of the existing stdlib modules that you have identified as needing a better parser. Did you find that you could use the same parser code for both? Would it apply to other modules? Barry > On 9 Apr 2019, at 17:06, Nam Nguyen wrote: > > On Mon, Apr 8, 2019 at 7:59 AM Christopher Barker > wrote: > > > On Mon, Apr 8, 2019 at 12:02 AM Paul Moore > wrote: > I would expect that the only reasonable way of getting a parsing > library in the stdlib would be to propose an established one from PyPI > to be moved into the stdlib > > Absolutely -- unlike some proposals, a stand-alone parsing lib could very easily be developed external to the stdlib. If one gains traction as an obvious choice, then we can talk about bringing it in. > > All options are still on the table. It is important to closely align the solution to the goal of making itself available for *internal use* in the stdlib itself. Having a parser library in the stdlib for *general use* is not an explicit goal that I am aiming for, just as pgen2 was not intended that way. Neither should that deter one from being considered. > > Nam > > > -CHB > > > -- > Christopher Barker, PhD > > Python Language Consulting > - Teaching > - Scientific Software Development > - Desktop GUI and Web Development > - wxPython, numpy, scipy, Cython > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Apr 10 00:13:35 2019 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 10 Apr 2019 13:13:35 +0900 Subject: [Python-ideas] Built-in parsing library In-Reply-To: <14895584-29D5-4D87-A53B-2C91E39192D5@barrys-emacs.org> References: <14895584-29D5-4D87-A53B-2C91E39192D5@barrys-emacs.org> Message-ID: <23725.28015.265013.274659@turnbull.sk.tsukuba.ac.jp> Barry Scott writes: > I'm not so sure that a "universal parsing library" is possible for > the stdlib. That shouldn't be our goal. (And I don't think Nam is wedded to that expression of the goal.) > I think one way you could find out what the requirements are is to > refactor at least 2 of the existing stdlib modules that you have > identified as needing a better parser. I think this is a really good idea. I'll be sprinting on Mailman at PyCon, but if Nam and other proponents have time around PyCon (and haven't done it already :-) I'll be able to make time then. Feel free to ping me off-list. (Meeting at PyCon would be a bonus, but IRC or SNS messaging/whiteboarding works for me too if other interested folks can't be there.) > Did you find that you could use the same parser code for both? 
I think it highly likely that "enough" protocols and "little languages"
that are normally written by machines (or skilled programmers) can be
handled by "Dragon Book"[1] parsers to make it worth adding some parsing
library to the stdlib. Of course, more general (but still efficient)
options have been developed since I last shaved a yacc, but that's not
the point. Developers who have special needs (extremely efficient
parsing of a relatively simple grammar, more general grammars) or simply
want to continue using a different module that they've been requiring
from the Cheese Shop since it was called "the Cheese Shop"[2] can (and
should) do that. The point of the stdlib is to provide standard
batteries that serve in common situations going forward.

I've been using regexps since 1980, and am perfectly comfortable with
rather complex expressions. Eg, I've written more or less general
implementations of RFC 3986 and its predecessor RFC 2396, which is one
of the examples Nam has tried. Nevertheless, there are some tricky
aspects (for example, I did *not* try to implement 3986 in one
expression -- as 3986 says:

    These restrictions result in five different ABNF rules for a path
    (Section 3.3), only one of which will match any given URI reference.

so I used multiple, mutually exclusive regexps for the different
productions). There is no question in my mind that the ABNF is easier
to read. Implementing a set of regexps from the ABNF is easier than
reconstructing the ABNF from the regexps. That's *my* rationale for
including a parsing module in the stdlib: making common parsing tasks
more reliable in implementation and more maintainable.

To me, the real question is, "Suppose we add a general parsing library
to the stdlib, and refactor some modules to use it. (1) Will this
'magically' fix some bugs/RFEs? (Not essential, but would be a nice
bonus.) (2) Will the denizens of python-ideas and python-dev find such
refactored modules readable and more maintainable than a plethora of ad
hoc recursive descent parsers?" Obviously people who haven't studied
parsers will have to project to a future self that has become used to
reading grammar descriptions, but I think folks around here are used to
that kind of projection. This would be a good test.

Footnotes:
[1] "Do I date myself? Very well then, I date myself."
[2] See [1].

From contact at brice.xyz Wed Apr 10 04:01:32 2019
From: contact at brice.xyz (Brice Parent)
Date: Wed, 10 Apr 2019 10:01:32 +0200
Subject: [Python-ideas] Sorted lists
In-Reply-To: <20190408023204.GH31406@ando.pearwood.info>
References: <20190408023204.GH31406@ando.pearwood.info>
Message-ID: <9e90545a-334f-98f8-2203-48217d19815f@brice.xyz>

It surely would be helpful. I still find it a bit too single-case
oriented. Maybe having an equivalent __sorting_algo__ property with a
value of the current sorting algorithm would be more general?

There could be a SortingAlgo base class, which could be extended into
classes like:

- SortingAlgoNone(SortingAlgo) or SortingAlgoUnsorted(SortingAlgo)
  which would be the default for non-sorted lists (or just the value None)
- SortingAlgoAscending(SortingAlgo)
- SortingAlgoAscendingNumbers(SortingAlgoAscending)
- MyCustomSortingAlgo(SortingAlgo)
- ...

It would allow marking a list as sorted by any algorithm, and of course,
any code that would use these lists would be able to read/write this
__sorting_algo__.
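A rough sketch of what I have in mind (untested; the names and the exact
check are only illustrative):

    class SortingAlgo:
        """Describes how a list is currently sorted."""
        key = None        # key function used for sorting, if any
        reverse = False   # sort direction

    class SortingAlgoUnsorted(SortingAlgo):
        pass  # default for lists we know nothing about

    class SortingAlgoAscending(SortingAlgo):
        pass  # plain ascending order, no key function

    class SortingAlgoAscendingNumbers(SortingAlgoAscending):
        pass  # ascending, and all items are known to be numbers

    def median(data):
        # Only re-sort when the declared algorithm isn't compatible.
        algo = getattr(data, '__sorting_algo__', None)
        if not isinstance(algo, SortingAlgoAscending):
            data = sorted(data)
        ...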
And as a complementary idea, we could have an extra arg to sort() (and other functions like this one) like `trust_declared_algo=True`, that if set would only sort the list if its list.__sorting_algo__ is compatible (a subclass of the sorting algo it uses, or the class itself).  The rest of the behaviours (when the __sorting_algo__ would be set or reset) would be as described by Steven in the original proposal.

-Brice

On 8/4/19 at 4:32, Steven D'Aprano wrote:
> There are quite a few important algorithms which require lists to be sorted.  For example, the bisect module, and for statistics median and other quantiles.
>
> Sorting a list is potentially expensive: while Timsort is very efficient, it is still ultimately an O(N log N) algorithm which limits how efficient it can be.  Checking whether a list is sorted is O(N).
>
> What if we could check that lists were sorted in constant time?
>
> Proposal: let's give lists a dunder flag, __issorted__, that tracks whether the list is definitely sorted or not:
>
> - Empty lists, or lists with a single item, are created with __issorted__ = True; lists with two or more items are created with the flag set to False.
>
> - Appending or inserting items sets the flag to False.
>
> - Deleting or popping items doesn't change the flag.
>
> - Reversing the list doesn't change the flag.
>
> - Sorting it sets the flag to True.  (The sort method should NOT assume the list is sorted just because the flag is set.)
>
> Functions that require the list to be sorted can use the flag as a quick check:
>
>     if not alist.__issorted__:
>         alist.sort()
>     ...
>
> The flag will be writable, so that functions such as those in bisect can mark that they have kept the sorted invariant:
>
>     bisect.insort(alist, x)
>     assert alist.__issorted__
>
> Being writable, the flag is advisory, not a guarantee, and "consenting adults" applies.  People can misuse the flag:
>
>     alist = [1, 4, 2, 0, 5]
>     alist.__issorted__ = True
>
> but then they have nobody to blame but themselves if they shoot themselves in the foot.  That's no worse than the situation we have now, where you might pass an unsorted list to bisect.
>
> The flag doesn't guarantee that the list is sorted the way you want (e.g. biggest to smallest, by some key, etc) only that it has been sorted.  It's up to the user to ensure they sort it the right way:
>
>     # Don't do this and expect it to work!
>     alist.sort(key=random.random)
>     bisect.insort(alist, 1)
>
> If you really want to be sure about the state of the list, you have to make a copy and sort it.  But that's no different from the situation right now.  But for those willing to assume "consenting adults", you might trust the flag and avoid sorting.
>
> Downsides:
>
> - Every list grows an extra attribute; however, given that lists are already quite big data structures and are often over-allocated, I don't think this will matter much.
>
> - insert(), append(), extend(), __setitem__() will be a tiny bit slower due to the need to set the flag.
>
> Thoughts?

From de.lsnk at gmail.com  Wed Apr 10 05:09:16 2019
From: de.lsnk at gmail.com (Krokosh Nikita)
Date: Wed, 10 Apr 2019 19:09:16 +1000
Subject: [Python-ideas] Starmap function exists but it seems there's no such thing as "doublestarmap"
Message-ID: <0339b854-e693-ce68-40e4-129c0ef9cb79@gmail.com>

Hello.  I have the following question: how come there's no such thing in Python like starmap, but which unpacks dicts as kwargs for a function?
For example, I have a format string like "{param1}, {param2}" and want to get results by passing a list of dicts to its .format().

Of course I can do that with a genexpr or some lambda with map.

But can you clarify why a starmap version of the map function exists and doublestarmap doesn't?

Best Regards.

From boxed at killingar.net  Wed Apr 10 05:48:40 2019
From: boxed at killingar.net (Anders Hovmöller)
Date: Wed, 10 Apr 2019 11:48:40 +0200
Subject: [Python-ideas] Starmap function exists but it seems there's no such thing as "doublestarmap"
In-Reply-To: <0339b854-e693-ce68-40e4-129c0ef9cb79@gmail.com>
References: <0339b854-e693-ce68-40e4-129c0ef9cb79@gmail.com>
Message-ID:

I don't really understand. You can do:

    '{a} {b}'.format(**{'a': 1}, **{'b': 2})

Is that what you want?

> On 10 Apr 2019, at 11:09, Krokosh Nikita wrote:
>
> Hello.  I have the following question: how come there's no such thing in Python like starmap, but which unpacks dicts as kwargs for a function?
>
> For example, I have a format string like "{param1}, {param2}" and want to get results by passing a list of dicts to its .format().
>
> Of course I can do that with a genexpr or some lambda with map.
>
> But can you clarify why a starmap version of the map function exists and doublestarmap doesn't?
>
> Best Regards.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From de.lsnk at gmail.com  Wed Apr 10 05:55:49 2019
From: de.lsnk at gmail.com (Krokosh Nikita)
Date: Wed, 10 Apr 2019 19:55:49 +1000
Subject: [Python-ideas] Starmap function exists but it seems there's no such thing as "doublestarmap"
In-Reply-To: References: <0339b854-e693-ce68-40e4-129c0ef9cb79@gmail.com>
Message-ID: <1254d8d1-2990-aa42-5d3d-358dc7393ab2@gmail.com>

I need something like starstarmap('{a} / {b}/ {c}'.format, [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, ...])

On 4/10/19 7:48 PM, Anders Hovmöller wrote:
> I don't really understand. You can do:
>
>     '{a} {b}'.format(**{'a': 1}, **{'b': 2})
>
> Is that what you want?
>
>> On 10 Apr 2019, at 11:09, Krokosh Nikita wrote:
>>
>> Hello.  I have the following question: how come there's no such thing in Python like starmap, but which unpacks dicts as kwargs for a function?
>>
>> For example, I have a format string like "{param1}, {param2}" and want to get results by passing a list of dicts to its .format().
>>
>> Of course I can do that with a genexpr or some lambda with map.
>>
>> But can you clarify why a starmap version of the map function exists and doublestarmap doesn't?
>>
>> Best Regards.
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/

From robertve92 at gmail.com  Wed Apr 10 06:08:37 2019
From: robertve92 at gmail.com (Robert Vanden Eynde)
Date: Wed, 10 Apr 2019 13:08:37 +0300
Subject: [Python-ideas] Starmap function exists but it seems there's no such thing as "doublestarmap"
In-Reply-To: <1254d8d1-2990-aa42-5d3d-358dc7393ab2@gmail.com>
References: <0339b854-e693-ce68-40e4-129c0ef9cb79@gmail.com> <1254d8d1-2990-aa42-5d3d-358dc7393ab2@gmail.com>
Message-ID:

robertvandeneynde.be

On Wed, 10 Apr 2019 at
12:55, Krokosh Nikita wrote:

> I need something like starstarmap('{a} / {b}/ {c}'.format, [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, ...])

That's

    def starstarmap(f, it):
        return (f(**x) for x in it)

That looks like a recipe, not a basic building block ^^

From boxed at killingar.net  Wed Apr 10 06:15:53 2019
From: boxed at killingar.net (Anders Hovmöller)
Date: Wed, 10 Apr 2019 12:15:53 +0200
Subject: [Python-ideas] Starmap function exists but it seems there's no such thing as "doublestarmap"
In-Reply-To: <1254d8d1-2990-aa42-5d3d-358dc7393ab2@gmail.com>
References: <0339b854-e693-ce68-40e4-129c0ef9cb79@gmail.com> <1254d8d1-2990-aa42-5d3d-358dc7393ab2@gmail.com>
Message-ID: <4B50CAD7-B191-406F-89A8-3D0ACE7E131C@killingar.net>

> On 10 Apr 2019, at 11:55, Krokosh Nikita wrote:
>
> I need something like starstarmap('{a} / {b}/ {c}'.format, [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, ...])

Seems overly specific. Why not merge the dicts then call format like normal?

>> On 4/10/19 7:48 PM, Anders Hovmöller wrote:
>> I don't really understand. You can do:
>>
>>     '{a} {b}'.format(**{'a': 1}, **{'b': 2})
>>
>> Is that what you want?
>>
>>> On 10 Apr 2019, at 11:09, Krokosh Nikita wrote:
>>>
>>> Hello.  I have the following question: how come there's no such thing in Python like starmap, but which unpacks dicts as kwargs for a function?
>>>
>>> For example, I have a format string like "{param1}, {param2}" and want to get results by passing a list of dicts to its .format().
>>>
>>> Of course I can do that with a genexpr or some lambda with map.
>>>
>>> But can you clarify why a starmap version of the map function exists and doublestarmap doesn't?
>>>
>>> Best Regards.
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/

From storchaka at gmail.com  Wed Apr 10 06:46:04 2019
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 10 Apr 2019 13:46:04 +0300
Subject: [Python-ideas] Starmap function exists but it seems there's no such thing as "doublestarmap"
In-Reply-To: <1254d8d1-2990-aa42-5d3d-358dc7393ab2@gmail.com>
References: <0339b854-e693-ce68-40e4-129c0ef9cb79@gmail.com> <1254d8d1-2990-aa42-5d3d-358dc7393ab2@gmail.com>
Message-ID:

10.04.19 12:55, Krokosh Nikita wrote:
> I need something like starstarmap('{a} / {b}/ {c}'.format, [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, ...])

Use the format_map method of str.

    >>> list(map('{a} / {b}/ {c}'.format_map, [{'a':1, 'b':2, 'c':3}, {'a':4, 'b':5, 'c':6}]))
    ['1 / 2/ 3', '4 / 5/ 6']

From stefano.borini at gmail.com  Wed Apr 10 18:09:28 2019
From: stefano.borini at gmail.com (Stefano Borini)
Date: Wed, 10 Apr 2019 23:09:28 +0100
Subject: [Python-ideas] Exception for developer errors?
Message-ID:

I have occasionally found situations where I want to raise an exception for errors that can only arise because the developer made a mistake, for example:

- an abstract method is supposed to be reimplemented and its execution is supposed to leave some internal constraints of an object unchanged, but these are instead violated.
- an impossible "else" condition after an if/elif, where the else simply cannot happen unless someone really screwed up the internal state of the object (a sketch of this case follows below).
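As a minimal sketch of that second case (the class and its states are made up for illustration):

    class Gate:
        def __init__(self):
            self._state = "open"   # internal invariant: only "open" or "closed"

        def toggle(self):
            if self._state == "open":
                self._state = "closed"
            elif self._state == "closed":
                self._state = "open"
            else:
                # Unreachable unless someone corrupted the internal state.
                raise RuntimeError("impossible state %r: developer error" % self._state)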
I've also seen ValueError, plain Exception or a specifically made subclass used in these cases. Whatever the choice is, it generally lacks clarity of communication to whoever receives it. RuntimeError is a rather generic exception according to the documentation, and you can only rely on the message, which relies on writing something appropriate for the situation: ``` exception RuntimeError Raised when an error is detected that doesn?t fall in any of the other categories. The associated value is a string indicating what precisely went wrong. ``` while it would be useful to communicate clearly to whoever is using or modifying the code "listen, it's your mistake, you misunderstood how I work or you ruined my internals, leading me to an impossible state". Also, customers seeing this kind of exception would understand without a doubt that the application is broken because of an internal error. I tried some search on the mailing list but could not find anything at a glance about this topic. Was this already discussed in the past? Thanks -- Kind regards, Stefano Borini From elazarg at gmail.com Wed Apr 10 18:13:50 2019 From: elazarg at gmail.com (Elazar) Date: Thu, 11 Apr 2019 01:13:50 +0300 Subject: [Python-ideas] Exception for developer errors? In-Reply-To: References: Message-ID: You have AssertionError for that. Elazar ?????? ??? ??, 11 ????? 2019, 1:10, ??? Stefano Borini ?< stefano.borini at gmail.com>: > I occasionally found situations where I want to raise an exception for > errors that can only arise because the developer made a mistake, for > example: > > - an abstract method is supposed to be reimplemented and its execution > is supposed to leave some internal constraints of an object unchanged, > but these are instead violated. > - an impossible "else" condition after an if/elif, where the else > cannot simply happen unless someone really screwed up the internal > state of the object. > > In general, when these cases happen, I use RuntimeError, but other > people may choose otherwise. I've also seen ValueError, plain > Exception or a specifically made subclass used in these cases. > Whatever the choice is, it generally lacks clarity of communication to > whoever receives it. > RuntimeError is a rather generic exception according to the > documentation, and you can only rely on the message, which relies on > writing something appropriate for the situation: > > ``` > exception RuntimeError > Raised when an error is detected that doesn?t fall in any of the other > categories. The associated value is a string indicating what precisely > went wrong. > ``` > > while it would be useful to communicate clearly to whoever is using or > modifying the code "listen, it's your mistake, you misunderstood how I > work or you ruined my internals, leading me to an impossible state". > Also, customers seeing this kind of exception would understand without > a doubt that the application is broken because of an internal error. > > I tried some search on the mailing list but could not find anything at > a glance about this topic. Was this already discussed in the past? > > Thanks > > -- > Kind regards, > > Stefano Borini > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
From J.Demeyer at UGent.be  Wed Apr 10 18:14:54 2019
From: J.Demeyer at UGent.be (Jeroen Demeyer)
Date: Thu, 11 Apr 2019 00:14:54 +0200
Subject: [Python-ideas] Exception for developer errors?
In-Reply-To: References:
Message-ID: <5CAE6ADE.5000108@UGent.be>

On 2019-04-11 00:09, Stefano Borini wrote:
> I have occasionally found situations where I want to raise an exception for errors that can only arise because the developer made a mistake, for example:

I use AssertionError for this. An assertion failure means "this is a bug", so that seems the right choice to me. You don't need to use an actual assert statement, you can manually raise AssertionError too.

Jeroen.

From stefano.borini at gmail.com  Wed Apr 10 18:50:11 2019
From: stefano.borini at gmail.com (Stefano Borini)
Date: Wed, 10 Apr 2019 23:50:11 +0100
Subject: [Python-ideas] Exception for developer errors?
In-Reply-To: <5CAE6ADE.5000108@UGent.be>
References: <5CAE6ADE.5000108@UGent.be>
Message-ID:

That's quite a good idea, but then I think it should be more explicit in the documentation that the purpose goes beyond the assert statement failure. I've never seen AssertionError raised manually.

On Wed, 10 Apr 2019 at 23:15, Jeroen Demeyer wrote:
>
> On 2019-04-11 00:09, Stefano Borini wrote:
> > I have occasionally found situations where I want to raise an exception for errors that can only arise because the developer made a mistake, for example:
>
> I use AssertionError for this. An assertion failure means "this is a bug", so that seems the right choice to me. You don't need to use an actual assert statement, you can manually raise AssertionError too.
>
> Jeroen.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Kind regards,

Stefano Borini

From rosuav at gmail.com  Wed Apr 10 18:51:06 2019
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 11 Apr 2019 08:51:06 +1000
Subject: [Python-ideas] Exception for developer errors?
In-Reply-To: <5CAE6ADE.5000108@UGent.be>
References: <5CAE6ADE.5000108@UGent.be>
Message-ID:

On Thu, Apr 11, 2019 at 8:15 AM Jeroen Demeyer wrote:
>
> On 2019-04-11 00:09, Stefano Borini wrote:
> > I have occasionally found situations where I want to raise an exception for errors that can only arise because the developer made a mistake, for example:
>
> I use AssertionError for this. An assertion failure means "this is a bug", so that seems the right choice to me. You don't need to use an actual assert statement, you can manually raise AssertionError too.
>

Agreed. It's worth noting that AssertionError isn't affected by the -O flag - only the assert *statement*. Also, anything that says "except AssertionError:" (outside of unit testing) should be considered a major bug, which in turn means that this should *only* be raised when you truly expect that normal usage cannot ever hit this. Which is perfect for the use-case you describe.

ChrisA

From cs at cskk.id.au  Wed Apr 10 21:06:32 2019
From: cs at cskk.id.au (Cameron Simpson)
Date: Thu, 11 Apr 2019 11:06:32 +1000
Subject: [Python-ideas] Exception for developer errors?
In-Reply-To: References:
Message-ID: <20190411010632.GA82518@cskk.homeip.net>

On 10Apr2019 23:09, Stefano Borini wrote:
> I have occasionally found situations where I want to raise an exception for errors that can only arise because the developer made a mistake, for example:
>
> - an abstract method is supposed to be reimplemented and its execution is supposed to leave some internal constraints of an object unchanged, but these are instead violated.

As mentioned, AssertionErrors are good for this.

I also use the icontract PyPI library, which provides decorators for annotating functions with preconditions and postconditions, and which are disabled in the same circumstances where assertions are disabled. It produces nice exception messages, too, aiding debugging. Also, it inspects the function definition, and so your preconditions use the same parameter names as in the function header.

> - an impossible "else" condition after an if/elif, where the else simply cannot happen unless someone really screwed up the internal state of the object.

That I tend to use RuntimeError for. I accept that my criteria for this difference are nebulous. I think in my mind:

    from icontract import require

    @require(lambda s: s in ('a', 'b', 'c'))
    def f(s, z):
        if s == 'a':
            ...
        elif s == 'b':
            ...
        else:
            raise RuntimeError("valid input s=%r, but unhandled!" % s)

The @require is an assertion that the _caller_ used us correctly. The RuntimeError means that _I_, the function implementor, have screwed up right here instead of in the larger programme.

Cheers, Cameron Simpson

From pythonchb at gmail.com  Fri Apr 12 01:31:00 2019
From: pythonchb at gmail.com (Christopher Barker)
Date: Thu, 11 Apr 2019 22:31:00 -0700
Subject: [Python-ideas] Sorted lists
In-Reply-To: <9e90545a-334f-98f8-2203-48217d19815f@brice.xyz>
References: <20190408023204.GH31406@ando.pearwood.info> <9e90545a-334f-98f8-2203-48217d19815f@brice.xyz>
Message-ID:

I see the appeal, but this really seems too special purpose for yet another dunder. A SortedList might be a good addition to the statistics package, however.

I note that with regard to sorting, I suggested a __sort_key__ dunder so that the sorting routines could know how to efficiently sort arbitrary objects. It was pretty soundly rejected as adding too much overhead for a special case. This is different, but still a special case.

It's easy to think that everyone uses the language like you do, but needing pre-sorted lists is really a pretty special case. And if performance really matters, you should probably be using numpy anyway.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
 - Teaching
 - Scientific Software Development
 - Desktop GUI and Web Development
 - wxPython, numpy, scipy, Cython

From g.rodola at gmail.com  Fri Apr 12 10:25:53 2019
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Fri, 12 Apr 2019 16:25:53 +0200
Subject: [Python-ideas] Provide additional debug info for OSError and WindowsError
Message-ID:

When dealing with C extensions from Python there are circumstances where a function is called and we get an OSError(errno) exception without knowing what exactly went wrong internally. This is especially not obvious on Windows, where multiple MSDN APIs may be invoked within the same C function and we're not sure which one of them failed. There are other times where the underlying C syscall is obvious, but it would still be useful to append some additional information.
One example is socket.bind():
https://github.com/python/cpython/blob/56065d4c8ac03042cb7e29ffda9b1ac544a37b4d/Lib/asyncio/base_events.py#L940-L949

In order to work around that in psutil (on Windows) I stored the debug msg string in the OSError.filename attribute:
https://github.com/giampaolo/psutil/pull/1428/

As such I was thinking that perhaps it would be nice to provide 3 new CPython APIs:

    PyErr_SetFromErrnoWithMsg(PyObject *type, const char *msg)
    PyErr_SetFromWindowsErrWithMsg(int ierr, const char *msg)
    PyErr_SetExcFromWindowsErrWithMsg(PyObject *type, int ierr, const char *msg)

With this in place also OSError and WindowsError would probably have to host a new "extramsg" attribute or something (but not necessarily).

Thoughts?

--
Giampaolo - http://grodola.blogspot.com

From viktor.roytman at gmail.com  Fri Apr 12 11:10:57 2019
From: viktor.roytman at gmail.com (Viktor Roytman)
Date: Fri, 12 Apr 2019 11:10:57 -0400
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
Message-ID:

Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:

    >>> def func(*, a):
    ...     pass
    ...
    >>> func(**{'a': 1, 'b': 2})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: func() got an unexpected keyword argument 'b'

The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so

    func(
        a=kwargs_dict["a"],
        b=kwargs_dict["b"],
        c=kwargs_dict["c"],
    )

But this grows more cumbersome as the number of keyword arguments grows.

There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.

From bruce at leban.us  Fri Apr 12 11:25:20 2019
From: bruce at leban.us (Bruce Leban)
Date: Fri, 12 Apr 2019 08:25:20 -0700
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID:

On Fri, Apr 12, 2019, 8:12 AM Viktor Roytman wrote:

>     >>> func(**{'a': 1, 'b': 2})
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: func() got an unexpected keyword argument 'b'

Perhaps func(***kws)?

I think this is a real problem given the frequent convention that you can freely add fields to json objects with the additional fields to be ignored.

From rosuav at gmail.com  Fri Apr 12 11:47:21 2019
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 13 Apr 2019 01:47:21 +1000
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID:

On Sat, Apr 13, 2019 at 1:12 AM Viktor Roytman wrote:
>
> Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:
>
>     >>> def func(*, a):
>     ...     pass
>     ...
>     >>> func(**{'a': 1, 'b': 2})
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: func() got an unexpected keyword argument 'b'
>
> The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so
>
>     func(
>         a=kwargs_dict["a"],
>         b=kwargs_dict["b"],
>         c=kwargs_dict["c"],
>     )
>
> But this grows more cumbersome as the number of keyword arguments grows.
>
> There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.

I'm not 100% sure I understand your proposal, so I'm going to restate it; anywhere that I'm misrepresenting you, please clarify!

Given this function and this dictionary:

    def func(*, a):
        pass

    args = {'a': 1, 'b': 2}

you want to call the function, passing the recognized argument 'a' the value from the dict, but ignoring the superfluous 'b'.

Are you able to alter the function? If so, just add kwargs to it:

    def func(*, a, **_):
        pass

and then any unrecognized args will quietly land in the junk dictionary.

ChrisA

From viktor.roytman at gmail.com  Fri Apr 12 12:01:43 2019
From: viktor.roytman at gmail.com (Viktor Roytman)
Date: Fri, 12 Apr 2019 09:01:43 -0700 (PDT)
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID: <5ba3890e-28e2-4650-a761-76b04990e4fc@googlegroups.com>

I could see this being an option, but to someone unfamiliar with it, it might seem strange that * unpacks iterables, ** unpacks dicts, and *** is a special thing only for keyword arguments that mostly behaves like **.

On Friday, April 12, 2019 at 11:26:37 AM UTC-4, Bruce Leban wrote:
>
> On Fri, Apr 12, 2019, 8:12 AM Viktor Roytman wrote:
>
>>     >>> func(**{'a': 1, 'b': 2})
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in <module>
>>     TypeError: func() got an unexpected keyword argument 'b'
>
> Perhaps func(***kws)?
>
> I think this is a real problem given the frequent convention that you can freely add fields to json objects with the additional fields to be ignored.

From viktor.roytman at gmail.com  Fri Apr 12 12:02:42 2019
From: viktor.roytman at gmail.com (Viktor Roytman)
Date: Fri, 12 Apr 2019 09:02:42 -0700 (PDT)
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID:

That is certainly an option for functions that you have defined for yourself, but generally not an option for a function from a library. I am interested in a solution that works in general.

On Friday, April 12, 2019 at 11:48:38 AM UTC-4, Chris Angelico wrote:
>
> On Sat, Apr 13, 2019 at 1:12 AM Viktor Roytman wrote:
> >
> > Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:
> >
> >     >>> def func(*, a):
> >     ...     pass
> >     ...
> >     >>> func(**{'a': 1, 'b': 2})
> >     Traceback (most recent call last):
> >       File "<stdin>", line 1, in <module>
> >     TypeError: func() got an unexpected keyword argument 'b'
> >
> > The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so
> >
> >     func(
> >         a=kwargs_dict["a"],
> >         b=kwargs_dict["b"],
> >         c=kwargs_dict["c"],
> >     )
> >
> > But this grows more cumbersome as the number of keyword arguments grows.
> >
> > There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.
>
> I'm not 100% sure I understand your proposal, so I'm going to restate it; anywhere that I'm misrepresenting you, please clarify!
>
> Given this function and this dictionary:
>
>     def func(*, a):
>         pass
>
>     args = {'a': 1, 'b': 2}
>
> you want to call the function, passing the recognized argument 'a' the value from the dict, but ignoring the superfluous 'b'.
>
> Are you able to alter the function? If so, just add kwargs to it:
>
>     def func(*, a, **_):
>         pass
>
> and then any unrecognized args will quietly land in the junk dictionary.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From lucas.bourneuf at inria.fr  Fri Apr 12 12:17:28 2019
From: lucas.bourneuf at inria.fr (Lucas Bourneuf)
Date: Fri, 12 Apr 2019 18:17:28 +0200 (CEST)
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
Message-ID: <1254621678.677715.1555085848518.JavaMail.zimbra@inria.fr>

Hello !

I made *fief*, a small python package allowing just that using a decorator:

    from fief import filter_effective_parameters as fief

    @fief
    def func(a, b):
        # some implementation

    # and then, use it as you want to:
    func(**MY_BIG_CONFIG_DICT_WITH_MANY_WEIRD_KEYS)

The code is quite simple. You may want to use it, with modifications (I didn't touch the code for months, maybe years; it could probably be improved now).
Link: https://github.com/aluriak/fief

The code:

    from functools import wraps
    from inspect import signature

    def filter_effective_parameters(func):
        """Decorator that filters out parameters in kwargs that are not
        related to any formal parameter of the given function.
        """
        @wraps(func)
        def wrapper(*args, **kwargs):
            formal_parameters = frozenset(signature(func).parameters.keys())
            return func(*args, **{
                arg: value
                for arg, value in kwargs.items()
                if arg in formal_parameters
            })
        return wrapper

Best regards,
--lucas

----- Original message -----
> From: "Viktor Roytman"
> To: "python-ideas"
> Sent: Friday, 12 April 2019 18:01:43
> Subject: Re: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments

> I could see this being an option, but to someone unfamiliar with it, it might seem strange that * unpacks iterables, ** unpacks dicts, and *** is a special thing only for keyword arguments that mostly behaves like **.
> > On Friday, April 12, 2019 at 11:26:37 AM UTC-4, Bruce Leban wrote:
> >>
> >> On Fri, Apr 12, 2019, 8:12 AM Viktor Roytman wrote:
> >>
> >>>     >>> func(**{'a': 1, 'b': 2})
> >>>     Traceback (most recent call last):
> >>>       File "<stdin>", line 1, in <module>
> >>>     TypeError: func() got an unexpected keyword argument 'b'
> >>
> >> Perhaps func(***kws)?
> >>
> >> I think this is a real problem given the frequent convention that you can freely add fields to json objects with the additional fields to be ignored.
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/

From eryksun at gmail.com  Fri Apr 12 12:29:23 2019
From: eryksun at gmail.com (eryk sun)
Date: Fri, 12 Apr 2019 11:29:23 -0500
Subject: [Python-ideas] Provide additional debug info for OSError and WindowsError
In-Reply-To: References:
Message-ID:

On 4/12/19, Giampaolo Rodola' wrote:
>
> As such I was thinking that perhaps it would be nice to provide 3 new
> CPython APIs:
>
>     PyErr_SetFromErrnoWithMsg(PyObject *type, const char *msg)
>     PyErr_SetFromWindowsErrWithMsg(int ierr, const char *msg)
>     PyErr_SetExcFromWindowsErrWithMsg(PyObject *type, int ierr, const char *msg)
>
> With this in place also OSError and WindowsError would probably have
> to host a new "extramsg" attribute or something (but not necessarily).

Existing error handling would benefit from this proposal. win32_error [1], win32_error_object, and PyErr_SetFromWindowsErrWithFunction [2] take a function name that's currently ignored.

[1]: https://github.com/python/cpython/blob/v3.7.3/Modules/posixmodule.c#L1403
[2]: https://github.com/python/cpython/blob/v3.7.3/PC/winreg.c#L26

From rhodri at kynesim.co.uk  Fri Apr 12 11:29:52 2019
From: rhodri at kynesim.co.uk (Rhodri James)
Date: Fri, 12 Apr 2019 16:29:52 +0100
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID:

On 12/04/2019 16:10, Viktor Roytman wrote:
> Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:
>
>     >>> def func(*, a):
>     ...     pass
>     ...
>     >>> func(**{'a': 1, 'b': 2})
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: func() got an unexpected keyword argument 'b'
>
> The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so
>
>     func(
>         a=kwargs_dict["a"],
>         b=kwargs_dict["b"],
>         c=kwargs_dict["c"],
>     )
>
> But this grows more cumbersome as the number of keyword arguments grows.
>
> There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.

What circumstance do you want to do this in that simply passing the dictionary as itself won't do for?

--
Rhodri James *-* Kynesim Ltd

From viktor.roytman at gmail.com  Fri Apr 12 13:50:36 2019
From: viktor.roytman at gmail.com (Viktor Roytman)
Date: Fri, 12 Apr 2019 10:50:36 -0700 (PDT)
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID:

Any time I am using a function from a library that accepts keyword arguments.
For example, an ORM model constructor that accepts fields as keyword arguments (like Django).

On Friday, April 12, 2019 at 12:57:43 PM UTC-4, Rhodri James wrote:
>
> On 12/04/2019 16:10, Viktor Roytman wrote:
> > Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:
> >
> >     >>> def func(*, a):
> >     ...     pass
> >     ...
> >     >>> func(**{'a': 1, 'b': 2})
> >     Traceback (most recent call last):
> >       File "<stdin>", line 1, in <module>
> >     TypeError: func() got an unexpected keyword argument 'b'
> >
> > The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so
> >
> >     func(
> >         a=kwargs_dict["a"],
> >         b=kwargs_dict["b"],
> >         c=kwargs_dict["c"],
> >     )
> >
> > But this grows more cumbersome as the number of keyword arguments grows.
> >
> > There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.
>
> What circumstance do you want to do this in that simply passing the dictionary as itself won't do for?
>
> --
> Rhodri James *-* Kynesim Ltd
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From viktor.roytman at gmail.com  Fri Apr 12 13:52:45 2019
From: viktor.roytman at gmail.com (Viktor Roytman)
Date: Fri, 12 Apr 2019 10:52:45 -0700 (PDT)
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: <1254621678.677715.1555085848518.JavaMail.zimbra@inria.fr>
References: <1254621678.677715.1555085848518.JavaMail.zimbra@inria.fr>
Message-ID: <7383f666-8ad0-43b3-a04e-66af099178b2@googlegroups.com>

That is an interesting solution. In the case of a function from another library, you could apply the decorator as needed, like:

    fief(func)(**{'a': 1, 'b': 2})

It looks a little strange, but I've seen stranger.

On Friday, April 12, 2019 at 12:18:34 PM UTC-4, Lucas Bourneuf wrote:
>
> Hello !
>
> I made *fief*, a small python package allowing just that using a decorator:
>
>     from fief import filter_effective_parameters as fief
>
>     @fief
>     def func(a, b):
>         # some implementation
>
>     # and then, use it as you want to:
>     func(**MY_BIG_CONFIG_DICT_WITH_MANY_WEIRD_KEYS)
>
> The code is quite simple. You may want to use it, with modifications (I didn't touch the code for months, maybe years; it could probably be improved now).
> Link: https://github.com/aluriak/fief
>
> The code:
>
>     def filter_effective_parameters(func):
>         """Decorator that filters out parameters in kwargs that are not
>         related to any formal parameter of the given function.
>         """
>         @wraps(func)
>         def wrapper(*args, **kwargs):
>             formal_parameters = frozenset(signature(func).parameters.keys())
>             return func(*args, **{
>                 arg: value
>                 for arg, value in kwargs.items()
>                 if arg in formal_parameters
>             })
>         return wrapper
>
> Best regards,
> --lucas
>
> ----- Original message -----
> > From: "Viktor Roytman"
> > To: "python-ideas"
> > Sent: Friday, 12 April 2019 18:01:43
> > Subject: Re: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
>
> > I could see this being an option, but to someone unfamiliar with it, it might seem strange that * unpacks iterables, ** unpacks dicts, and *** is a special thing only for keyword arguments that mostly behaves like **.
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python... at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From eric at trueblade.com  Fri Apr 12 12:29:05 2019
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 12 Apr 2019 12:29:05 -0400
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: <1254621678.677715.1555085848518.JavaMail.zimbra@inria.fr>
References: <1254621678.677715.1555085848518.JavaMail.zimbra@inria.fr>
Message-ID:

Viktor is looking for something at the call site, not the function definition site (which he might not control).

I wrote calllib (horrible name, I know). It can do this, although I just noticed it needs updating for keyword-only params. But, here it is without keyword-only params:

    >>> from calllib import apply
    >>> def func(a):
    ...     print(f'a={a!r}')
    ...
    >>> func(**{'a': 1, 'b': 2})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: func() got an unexpected keyword argument 'b'
    >>> apply(func, {'a': 1, 'b': 2})
    a=1

I don't claim it's an efficient solution, since it inspects the callable every time to extract its params. But it will work for this use case. Or at least it will when I update it for keyword-only params. It works today without keyword-only params.

https://pypi.org/project/calllib/

Eric

On 4/12/2019 12:17 PM, Lucas Bourneuf wrote:
> Hello !
>
> I made *fief*, a small python package allowing just that using a decorator:
>
>     from fief import filter_effective_parameters as fief
>
>     @fief
>     def func(a, b):
>         # some implementation
>
>     # and then, use it as you want to:
>     func(**MY_BIG_CONFIG_DICT_WITH_MANY_WEIRD_KEYS)
>
> The code is quite simple. You may want to use it, with modifications (I didn't touch the code for months, maybe years; it could probably be improved now).
> Link: https://github.com/aluriak/fief
>
> The code:
>
>     def filter_effective_parameters(func):
>         """Decorator that filters out parameters in kwargs that are not
>         related to any formal parameter of the given function.
>         """
>         @wraps(func)
>         def wrapper(*args, **kwargs):
>             formal_parameters = frozenset(signature(func).parameters.keys())
>             return func(*args, **{
>                 arg: value
>                 for arg, value in kwargs.items()
>                 if arg in formal_parameters
>             })
>         return wrapper
>
> Best regards,
> --lucas
>
> ----- Original message -----
>> From: "Viktor Roytman"
>> To: "python-ideas"
>> Sent: Friday, 12 April 2019 18:01:43
>> Subject: Re: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
>
>> I could see this being an option, but to someone unfamiliar with it, it might seem strange that * unpacks iterables, ** unpacks dicts, and *** is a special thing only for keyword arguments that mostly behaves like **.
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/

From rhodri at kynesim.co.uk  Fri Apr 12 14:17:47 2019
From: rhodri at kynesim.co.uk (Rhodri James)
Date: Fri, 12 Apr 2019 19:17:47 +0100
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID: <568c036b-53af-0941-1dcd-9df39f5d531c@kynesim.co.uk>

Re-ordered to avoid top-posting...

On 12/04/2019 18:50, Viktor Roytman wrote:
> On Friday, April 12, 2019 at 12:57:43 PM UTC-4, Rhodri James wrote:
>>
>> On 12/04/2019 16:10, Viktor Roytman wrote:
>>> Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:
>>>
>>>     >>> def func(*, a):
>>>     ...     pass
>>>     ...
>>>     >>> func(**{'a': 1, 'b': 2})
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     TypeError: func() got an unexpected keyword argument 'b'
>>>
>>> The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so
>>>
>>>     func(
>>>         a=kwargs_dict["a"],
>>>         b=kwargs_dict["b"],
>>>         c=kwargs_dict["c"],
>>>     )

Hang on, I missed this first time around.  This gives you exactly the same problem:

    Python 3.6.7 (default, Oct 22 2018, 11:32:17)
    [GCC 8.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def func(*, a):
    ...     pass
    ...
    >>> d = {"a": 1, "b": 2}
    >>> func(**d)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: func() got an unexpected keyword argument 'b'
    >>> func(a=d["a"], b=d["b"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: func() got an unexpected keyword argument 'b'
    >>>

>>> But this grows more cumbersome as the number of keyword arguments grows.
>>>
>>> There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.
>>
>> What circumstance do you want to do this in that simply passing the dictionary as itself won't do for?
>>
> Any time I am using a function from a library that accepts keyword arguments. For example, an ORM model constructor that accepts fields as keyword arguments (like Django).

That's not the same issue at all, if I'm understanding you correctly. In any case, surely you need to do some validation on your dictionary of keyword arguments?  Otherwise you are setting yourself up for a world of pain.  If you do that, you should take the opportunity to decide what to do with invalid keys.

--
Rhodri James *-* Kynesim Ltd

From viktor.roytman at gmail.com  Fri Apr 12 14:46:29 2019
From: viktor.roytman at gmail.com (Viktor Roytman)
Date: Fri, 12 Apr 2019 11:46:29 -0700 (PDT)
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: <568c036b-53af-0941-1dcd-9df39f5d531c@kynesim.co.uk>
References: <568c036b-53af-0941-1dcd-9df39f5d531c@kynesim.co.uk>
Message-ID:

> > Any time I am using a function from a library that accepts keyword arguments. For example, an ORM model constructor that accepts fields as keyword arguments (like Django).
>
> That's not the same issue at all, if I'm understanding you correctly. In any case, surely you need to do some validation on your dictionary of keyword arguments?  Otherwise you are setting yourself up for a world of pain.  If you do that, you should take the opportunity to decide what to do with invalid keys.

The specific pain point that motivated this was constructing many interrelated models using a dict. So, for example, if there is a User model with a related Address model, and the input is

    user_kwargs = dict(
        name='user',
        age=20,
        address=dict(
            city='city',
            state='ST',
        ),
    )

then passing this information in to the User constructor directly will fail, because User.address is a related model, not a simple field.

I agree that most of the time you should want unexpected keyword arguments to raise an exception, but in specific circumstances it can be helpful to extract only the pieces of a dict that are relevant.

From njs at pobox.com  Fri Apr 12 14:55:30 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 12 Apr 2019 11:55:30 -0700
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID:

I don't think it's possible to make this work reliably. In particular, it's an important feature of python that you can make wrappers that pass through arguments and are equivalent to the original function:

    def original(a=0):
        ...

    def wrapper(*args, **kwargs):
        return original(*args, **kwargs)

Right now these can be called in exactly the same ways.
But with the proposal they would become different:

    # ok
    original(***{"a": 1, "b": 2})

    # raises TypeError
    wrapper(***{"a": 1, "b": 2})

The problem is that the extra star gets lost when passing through the wrapper. In this case you might be able to fix this by using functools.wraps to fix up the signature introspection metadata, but that doesn't work in more complex cases (e.g. when the wrapper adds/removes some args while passing through the rest). In Python, signature introspection is a best-effort thing, and IME not super reliable.

-n

On Fri, Apr 12, 2019, 08:11 Viktor Roytman wrote:

> Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:
>
>     >>> def func(*, a):
>     ...     pass
>     ...
>     >>> func(**{'a': 1, 'b': 2})
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: func() got an unexpected keyword argument 'b'
>
> The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so
>
>     func(
>         a=kwargs_dict["a"],
>         b=kwargs_dict["b"],
>         c=kwargs_dict["c"],
>     )
>
> But this grows more cumbersome as the number of keyword arguments grows.
>
> There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From eric at trueblade.com  Fri Apr 12 13:16:55 2019
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 12 Apr 2019 13:16:55 -0400
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References:
Message-ID: <3094beaa-7f77-45bc-54a9-80b557d3d750@trueblade.com>

On 4/12/2019 11:29 AM, Rhodri James wrote:
> On 12/04/2019 16:10, Viktor Roytman wrote:
>> Currently, unpacking a dict in order to pass its items as keyword arguments to a function will fail if there are keys present in the dict that are invalid keyword arguments:
>>
>>     >>> def func(*, a):
>>     ...     pass
>>     ...
>>     >>> func(**{'a': 1, 'b': 2})
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in <module>
>>     TypeError: func() got an unexpected keyword argument 'b'
>>
>> The standard approach I have encountered in this scenario is to pass in the keyword arguments explicitly like so
>>
>>     func(
>>         a=kwargs_dict["a"],
>>         b=kwargs_dict["b"],
>>         c=kwargs_dict["c"],
>>     )
>>
>> But this grows more cumbersome as the number of keyword arguments grows.
>>
>> There are a number of other workarounds, such as using a dict comprehension to select only the required keys, but I think it would be more convenient to have this be a feature of the language. I don't know what a nice syntax for this would be, or even how feasible it is.
>
> What circumstance do you want to do this in that simply passing the dictionary as itself won't do for?

I don't want to speak for the OP, but I have a similar use case (which is why I wrote calllib). My use case is: I have a number of callables that I don't control.
I also have a dict of parameters that the callables might take as parameters. I want to call one of the callables, passing only the subset of parameters that that particular callable takes.

Eric

From eric at trueblade.com  Fri Apr 12 14:53:49 2019
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 12 Apr 2019 14:53:49 -0400
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: <7383f666-8ad0-43b3-a04e-66af099178b2@googlegroups.com>
References: <1254621678.677715.1555085848518.JavaMail.zimbra@inria.fr> <7383f666-8ad0-43b3-a04e-66af099178b2@googlegroups.com>
Message-ID: <910d66e4-5f03-9956-d8c8-3f1035a09f33@trueblade.com>

On 4/12/2019 1:52 PM, Viktor Roytman wrote:
> That is an interesting solution. In the case of a function from another library, you could apply the decorator as needed, like:
>
>     fief(func)(**{'a': 1, 'b': 2})
>
> It looks a little strange, but I've seen stranger.

Indeed. That's not so bad, and I'll look into using fief. And this is definitely better (for various values of "better") than language syntax to achieve the same thing.

Eric

> On Friday, April 12, 2019 at 12:18:34 PM UTC-4, Lucas Bourneuf wrote:
>
> Hello !
>
> I made *fief*, a small python package allowing just that using a decorator:
>
>     from fief import filter_effective_parameters as fief
>
>     @fief
>     def func(a, b):
>         # some implementation
>
>     # and then, use it as you want to:
>     func(**MY_BIG_CONFIG_DICT_WITH_MANY_WEIRD_KEYS)
>
> The code is quite simple. You may want to use it, with modifications (I didn't touch the code for months, maybe years; it could probably be improved now).
> Link: https://github.com/aluriak/fief
>
> The code:
>
>     def filter_effective_parameters(func):
>         """Decorator that filters out parameters in kwargs that are not
>         related to any formal parameter of the given function.
>         """
>         @wraps(func)
>         def wrapper(*args, **kwargs):
>             formal_parameters = frozenset(signature(func).parameters.keys())
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From boxed at killingar.net  Fri Apr 12 16:00:32 2019
From: boxed at killingar.net (Anders Hovmöller)
Date: Fri, 12 Apr 2019 22:00:32 +0200
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: <3094beaa-7f77-45bc-54a9-80b557d3d750@trueblade.com>
References: <3094beaa-7f77-45bc-54a9-80b557d3d750@trueblade.com>
Message-ID: <8E23FA60-19D1-4A3B-A0EB-3550A9A40F61@killingar.net>

> On 12 Apr 2019, at 19:16, Eric V. Smith wrote:
>
> I don't want to speak for the OP, but I have a similar use case (which is why I wrote calllib). My use case is: I have a number of callables that I don't control. I also have a dict of parameters that the callables might take as parameters. I want to call one of the callables, passing only the subset of parameters that that particular callable takes.

Could you expand on "that I don't control"? Where do these come from?

We have similar use cases in libs we've created, but there we define that the API is that you must do **_ to be compatible with future versions of the lib.

/ Anders

From eric at trueblade.com  Fri Apr 12 16:19:28 2019
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 12 Apr 2019 16:19:28 -0400
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: <8E23FA60-19D1-4A3B-A0EB-3550A9A40F61@killingar.net>
References: <3094beaa-7f77-45bc-54a9-80b557d3d750@trueblade.com> <8E23FA60-19D1-4A3B-A0EB-3550A9A40F61@killingar.net>
Message-ID: <0c63517b-7db2-2acb-5874-d61c2d04fc0b@trueblade.com>

On 4/12/2019 4:00 PM, Anders Hovmöller wrote:
>
>> On 12 Apr 2019, at 19:16, Eric V. Smith wrote:
>>
>> I don't want to speak for the OP, but I have a similar use case (which is why I wrote calllib). My use case is: I have a number of callables that I don't control.
>
> Could you expand on "that I don't control"? Where do these come from?

Their names come from a config file. I dynamically load them and call them. I wrote some of them, other people wrote others. It's an internal corporate app, and there are a lot of callables, all with different release schedules and controlled by different teams.

Over time, the interface to these callables has expanded. First, it took just x, then a few of them needed x and y, and others needed x and z. I realize this isn't the greatest interface, and in an ideal world we would have come up with a better way to specify this. But it evolved over time, and it is what it is.

> We have similar use cases in libs we've created, but there we define that the API is that you must do **_ to be compatible with future versions of the lib.
Unfortunately I don't control the interface or the source to the code I'm calling, so the best I've been able to do is only call each function with the parameters I know it expects, based on its signature. At least the parameter names are well defined. Eric From viktor.roytman at gmail.com Fri Apr 12 17:03:43 2019 From: viktor.roytman at gmail.com (Viktor Roytman) Date: Fri, 12 Apr 2019 14:03:43 -0700 (PDT) Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: References: <568c036b-53af-0941-1dcd-9df39f5d531c@kynesim.co.uk> Message-ID: <0006a323-302b-45ff-ac6a-dac79cf0807d@googlegroups.com> > > That seems to me to be quite different issue. Just throwing invalid stuff > on the ground in this scenario will avoid a crash but lose data. This seems > much worse to me than the crash. > Throwing it away does seem extreme. Maybe something that indicates what's left over? In other words: result, leftover_kwargs = func(kwargs) or result = func(kwargs) assert kwargs == {'whatever is': 'left over'} -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Fri Apr 12 19:08:56 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 13 Apr 2019 11:08:56 +1200 Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: References: Message-ID: <5CB11A88.8020200@canterbury.ac.nz> Bruce Leban wrote: > I think this is a real problem given the frequent convention that you > can freely add fields to json objects with the additional fields to be > ignored. Unpacking json objects isn't something one does every day, so I don't think it would be worth adding syntax just for this. Rather than new syntax, how about a helper function such as: def call_with_kwds_from(func, kwds): code = func.__code__ names = code.co_varnames[:code.co_argcount] func(**{name : kwds[name] for name in names}) Usage example: def f(a, b): print("a =", a, "b =", b) d = {"a": "a_value", "b": "b_value", "c": "something extra"} call_with_kwds_from(f, d) -- Greg From amber.yust at gmail.com Fri Apr 12 19:20:08 2019 From: amber.yust at gmail.com (Amber Yust) Date: Fri, 12 Apr 2019 16:20:08 -0700 Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: <5CB11A88.8020200@canterbury.ac.nz> References: <5CB11A88.8020200@canterbury.ac.nz> Message-ID: I'm not a fan of this idea for two (related) reasons: 1) This seems like something that's a relatively niche case to *want* to do rather than doing unintentionally. 2) This vastly increases the chance of silent bugs in code. With regards to #1, the main examples I've seen posited in this thread are unpacking of unstructured object formats (e.g. JSON) and applying subsets of key-values from a configuration file [which is basically another unstructured object format). This seems like the level of functionality that I'd expect a helper function to be the right choice for - it can be part of the overall library handling the objects, or coded as a one-off for smaller projects. Regarding #2, passing unknown bags of values to functions is an easy way to run into both logic and security errors - see for example the problems that the Ruby on Rails ORM ran into with its batch-setter functionality when developers didn't properly limit what values in objects could be updated and forgot to sanitize inputs. 
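A side note on the co_varnames-based helper above: co_argcount does not count keyword-only parameters, so that helper would skip the "a" in the original "def func(*, a)" example. Here is a sketch of a signature-based variant (the name call_with_known_kwds is made up, and inspect.signature can fail for some C-implemented callables):

from inspect import Parameter, signature

def call_with_known_kwds(func, kwds):
    # Forward only the keys that name a parameter func accepts; if func
    # already takes **kwargs, forward everything unchanged.
    params = signature(func).parameters
    if any(p.kind is Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwds)
    names = {n for n, p in params.items()
             if p.kind in (Parameter.POSITIONAL_OR_KEYWORD, Parameter.KEYWORD_ONLY)}
    return func(**{n: kwds[n] for n in names if n in kwds})

>>> call_with_known_kwds(lambda *, a: a, {'a': 1, 'b': 2})
1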
Similarly, for keyword args with default values, adding this functionality would make it easy for a typo to result in a parameter that a user expects to be getting passed in not actually getting passed, because the function got {foobra=1} instead of {foobar=1} and silently discarded foobra=1 in favor of the default value for foobar.

There may be cases where passing an unknown bag is what you want to do, but it's probably worth requiring that to be more explicit and intentional rather than an easily-missed extra asterisk.

On Fri, Apr 12, 2019 at 4:09 PM Greg Ewing wrote:
> Bruce Leban wrote:
> > I think this is a real problem given the frequent convention that you
> > can freely add fields to json objects with the additional fields to be
> > ignored.
>
> Unpacking json objects isn't something one does every day, so I
> don't think it would be worth adding syntax just for this.
>
> Rather than new syntax, how about a helper function such as:
>
>     def call_with_kwds_from(func, kwds):
>         code = func.__code__
>         names = code.co_varnames[:code.co_argcount]
>         func(**{name: kwds[name] for name in names})
>
> Usage example:
>
>     def f(a, b):
>         print("a =", a, "b =", b)
>
>     d = {"a": "a_value", "b": "b_value", "c": "something extra"}
>
>     call_with_kwds_from(f, d)
>
> --
> Greg
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From contact at brice.xyz  Sat Apr 13 03:38:04 2019
From: contact at brice.xyz (Brice Parent)
Date: Sat, 13 Apr 2019 09:38:04 +0200
Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments
In-Reply-To: References: Message-ID:

On 12/4/19 at 18:02, Viktor Roytman wrote:
> That is certainly an option for functions that you have defined for
> yourself, but generally not an option for a function from a library. I
> am interested in a solution that works in general.

This seems like a very specific problem, and in that case I'm not sure that the solution should come from the language, but there definitely are ways to do it using an intermediary calling function. That calling function would just do one of the following things:

* inspect the signature of the called function and only pass the arguments it needs
* contain a lookup table for which arguments are required by which function and use it to only pass the ones it needs
* or probably what I would do:

def call_func(func_name, **kwargs):
    try:
        return globals()[func_name](**kwargs)  # Not sure how you're doing it
    except TypeError as e:
        # The offending name is quoted at the end of the message, e.g.
        # "func() got an unexpected keyword argument 'b'"; there's
        # probably a more elegant way of extracting it.
        stripped_arg = str(e).rsplit("'", 2)[1]
        # Maybe warn the user that he's using an API that is under a
        # deprecation process, so that the authors may know they should
        # add **kwargs to its signature
        del kwargs[stripped_arg]
        return call_func(func_name, **kwargs)  # Try to make the call again

(I haven't tried the thing, it's just an idea of what I'd probably do in your situation)

> On Friday, April 12, 2019 at 11:48:38 AM UTC-4, Chris Angelico wrote:
> > On Sat, Apr 13, 2019 at 1:12 AM Viktor Roytman wrote:
> > >
> > > Currently, unpacking a dict in order to pass its items as
> > > keyword arguments to a function will fail if there are keys
> > > present in the dict that are invalid keyword arguments:
> > >
>>> def func(*, a): > > ? ? ... ? ? pass > > ? ? ... > > ? ? >>> func(**{'a': 1, 'b': 2}) > > ? ? Traceback (most recent call last): > > ? ? ? File "", line 1, in > > ? ? TypeError: func() got an unexpected keyword argument 'b' > > > > The standard approach I have encountered in this scenario is to > pass in the keyword arguments explicitly like so > > > > ? ? func( > > ? ? ? ? a=kwargs_dict["a"], > > ? ? ? ? b=kwargs_dict["b"], > > ? ? ? ? c=kwargs_dict["c"], > > ? ? ) > > > > But this grows more cumbersome as the number of keyword > arguments grows. > > > > There are a number of other workarounds, such as using a dict > comprehension to select only the required keys, but I think it > would be more convenient to have this be a feature of the > language. I don't know what a nice syntax for this would be, or > even how feasible it is. > > > > I'm not 100% sure I understand your proposal, so I'm going to restate > it; anywhere that I'm misrepresenting you, please clarify! > > Given this function and this dictionary: > > def func(*, a): > ? ? pass > > args = {'a': 1, 'b': 2} > > you want to call the function, passing the recognized argument 'a' > the > value from the dict, but ignoring the superfluous 'b'. > > Are you able to alter the function? If so, just add kwargs to it: > > def func(*, a, **_): > ? ? pass > > and then any unrecognized args will quietly land in the junk > dictionary. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalala at gmail.com Sat Apr 13 08:59:35 2019 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Sat, 13 Apr 2019 08:59:35 -0400 Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: References: Message-ID: On Fri, Apr 12, 2019 at 11:10 AM Viktor Roytman wrote: > > The standard approach I have encountered in this scenario is to pass in > the keyword arguments explicitly like so > > func( > a=kwargs_dict["a"], > b=kwargs_dict["b"], > c=kwargs_dict["c"], > ) > func(**{k:v for k, v in d.items() if k in ('a','b','c')) Or you can `def dict_filter(d, yes)` to the the above. -- Juancarlo *A?ez* -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Apr 13 09:02:39 2019 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Apr 2019 23:02:39 +1000 Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: References: Message-ID: On Sat, Apr 13, 2019 at 11:00 PM Juancarlo A?ez wrote: > func(**{k:v for k, v in d.items() if k in ('a','b','c')) > Would be really nice to be able to spell this as a dict/set intersection. 
func(**(d & {'a', 'b', 'c'})) ChrisA From viktor.roytman at gmail.com Sat Apr 13 12:34:53 2019 From: viktor.roytman at gmail.com (Viktor Roytman) Date: Sat, 13 Apr 2019 09:34:53 -0700 (PDT) Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: References: Message-ID: <8e9de5dc-f663-442e-a497-a366dc03db5e@googlegroups.com> On Saturday, April 13, 2019 at 9:03:54 AM UTC-4, Chris Angelico wrote: > > On Sat, Apr 13, 2019 at 11:00 PM Juancarlo A?ez > wrote: > > func(**{k:v for k, v in d.items() if k in ('a','b','c')) > > > > Would be really nice to be able to spell this as a dict/set intersection. > > func(**(d & {'a', 'b', 'c'})) > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > I like this idea. It accomplishes the goal of only using some of the keys and has broader applications for working with dicts in general. It might also be nice to have something that splits a dict into two, like >>> items = dict(a=1, b=2, c=3) >>> included, excluded = items &- {'a', 'b'} >>> print(included) {'a': 1, 'b': 2} >>> print(excluded) {'c': 3} I don't know if I like the &- "operator" but it is illustrative. -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalala at gmail.com Sat Apr 13 20:15:35 2019 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Sat, 13 Apr 2019 20:15:35 -0400 Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: References: Message-ID: On Sat, Apr 13, 2019 at 9:02 AM Chris Angelico wrote: > Would be really nice to be able to spell this as a dict/set intersection. > > func(**(d & {'a', 'b', 'c'})) > That would be _very_ consistent with the ongoing discussions about operators over dicts. -- Juancarlo *A?ez* -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Apr 13 20:16:14 2019 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 14 Apr 2019 10:16:14 +1000 Subject: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments In-Reply-To: References: Message-ID: On Sun, Apr 14, 2019 at 10:15 AM Juancarlo A?ez wrote: > > > > On Sat, Apr 13, 2019 at 9:02 AM Chris Angelico wrote: >> >> Would be really nice to be able to spell this as a dict/set intersection. >> >> func(**(d & {'a', 'b', 'c'})) > > > That would be _very_ consistent with the ongoing discussions about operators over dicts. > I believe it's already been mentioned. ChrisA From solipsis at pitrou.net Mon Apr 15 16:07:18 2019 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 15 Apr 2019 22:07:18 +0200 Subject: [Python-ideas] Logical tracebacks Message-ID: <20190415220718.0bb113cc@fsol> Hello, I apologize because I'm only going to throw a very vague idea and I don't currently have time or motivation to explore it myself. But I think it may prove interesting for other people and perhaps spur some concrete actionable proposal. With the growing complexity of Python software stacks, the length of tracebacks is continuously growing and is frequently making debugging errors and issues more tedious than it should be. This is a language-agnostic problem. Java software is often mocked for its ridiculously long tracebacks, but Python might come close in the future. 
Especially since Python is often the a language of choice for non computer science professionals, including but not only as a teaching language, this would be a problem worth solving. We already recognized the issue some years ago, and even implemented a focussed fix for one specific context: the elision of importlib frames when an import error occurs: https://bugs.python.org/issue15110 However, there are many contexts where implementation details would benefit from being hidden from tracebacks (the classical example being the internals of framework or middleware code, such as Django, Dask, etc.). We would therefore have to define some kind of protocol by which tracebacks can be enumerated, not only as frames, but as logical execution blocks, comprised of one or several frames each, whose boundaries would reflect the boundaries of the various logical execution layers (again: framework, middleware...) involved in the frame stack. We would probably also need some flag(s) to disable the feature in cases where the full stack frame is wanted (I imagine elaborate UIs could also allow switching back and forth from both representations). This would need a lot more thinking, and perhaps exploring what kind of hacks already exist in the wild to achieve similar functionality. Again, I'm just throwing this around for others to play with. Regards Antoine. From christian at python.org Mon Apr 15 16:21:33 2019 From: christian at python.org (Christian Heimes) Date: Mon, 15 Apr 2019 22:21:33 +0200 Subject: [Python-ideas] Logical tracebacks In-Reply-To: <20190415220718.0bb113cc@fsol> References: <20190415220718.0bb113cc@fsol> Message-ID: On 15/04/2019 22.07, Antoine Pitrou wrote: > > Hello, > > I apologize because I'm only going to throw a very vague idea and I > don't currently have time or motivation to explore it myself. But I > think it may prove interesting for other people and perhaps spur some > concrete actionable proposal. > > With the growing complexity of Python software stacks, the length of > tracebacks is continuously growing and is frequently making debugging > errors and issues more tedious than it should be. This is a > language-agnostic problem. Java software is often mocked for its > ridiculously long tracebacks, but Python might come close in the future. > > Especially since Python is often the a language of choice for non > computer science professionals, including but not only as a teaching > language, this would be a problem worth solving. We already recognized > the issue some years ago, and even implemented a focussed fix for one > specific context: the elision of importlib frames when an import error > occurs: > https://bugs.python.org/issue15110 > > However, there are many contexts where implementation details would > benefit from being hidden from tracebacks (the classical example being > the internals of framework or middleware code, such as Django, Dask, > etc.). We would therefore have to define some kind of protocol by > which tracebacks can be enumerated, not only as frames, but as logical > execution blocks, comprised of one or several frames each, whose > boundaries would reflect the boundaries of the various logical > execution layers (again: framework, middleware...) involved in the > frame stack. We would probably also need some flag(s) to disable the > feature in cases where the full stack frame is wanted (I imagine > elaborate UIs could also allow switching back and forth from both > representations). 
> > This would need a lot more thinking, and perhaps exploring what kind of > hacks already exist in the wild to achieve similar functionality. > Again, I'm just throwing this around for others to play with. Zope has a feature like that for more than a decade. Code could define variables __traceback_info__ and __traceback_supplement__ in local scope, which would then be used by the traceback formatter to annotate the traceback with additional information. I think it was also possible to hide frame with a similar technique. https://zopeexceptions.readthedocs.io/en/latest/narr.html From nathan12343 at gmail.com Mon Apr 15 16:23:17 2019 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Mon, 15 Apr 2019 14:23:17 -0600 Subject: [Python-ideas] Logical tracebacks In-Reply-To: <20190415220718.0bb113cc@fsol> References: <20190415220718.0bb113cc@fsol> Message-ID: This is a really great idea. I?d also point to the awful hacks that jinja2 needs to go through to elide jinia2 frames from user tracebacks as an indicator that this is a desired feature. https://github.com/pallets/jinja/blob/master/jinja2/debug.py On Mon, Apr 15, 2019 at 2:08 PM Antoine Pitrou wrote: > > Hello, > > I apologize because I'm only going to throw a very vague idea and I > don't currently have time or motivation to explore it myself. But I > think it may prove interesting for other people and perhaps spur some > concrete actionable proposal. > > With the growing complexity of Python software stacks, the length of > tracebacks is continuously growing and is frequently making debugging > errors and issues more tedious than it should be. This is a > language-agnostic problem. Java software is often mocked for its > ridiculously long tracebacks, but Python might come close in the future. > > Especially since Python is often the a language of choice for non > computer science professionals, including but not only as a teaching > language, this would be a problem worth solving. We already recognized > the issue some years ago, and even implemented a focussed fix for one > specific context: the elision of importlib frames when an import error > occurs: > https://bugs.python.org/issue15110 > > However, there are many contexts where implementation details would > benefit from being hidden from tracebacks (the classical example being > the internals of framework or middleware code, such as Django, Dask, > etc.). We would therefore have to define some kind of protocol by > which tracebacks can be enumerated, not only as frames, but as logical > execution blocks, comprised of one or several frames each, whose > boundaries would reflect the boundaries of the various logical > execution layers (again: framework, middleware...) involved in the > frame stack. We would probably also need some flag(s) to disable the > feature in cases where the full stack frame is wanted (I imagine > elaborate UIs could also allow switching back and forth from both > representations). > > This would need a lot more thinking, and perhaps exploring what kind of > hacks already exist in the wild to achieve similar functionality. > Again, I'm just throwing this around for others to play with. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andre.roberge at gmail.com Mon Apr 15 17:00:58 2019 From: andre.roberge at gmail.com (Andre Roberge) Date: Mon, 15 Apr 2019 18:00:58 -0300 Subject: [Python-ideas] Logical tracebacks In-Reply-To: <20190415220718.0bb113cc@fsol> References: <20190415220718.0bb113cc@fsol> Message-ID: On Mon, Apr 15, 2019 at 5:07 PM Antoine Pitrou wrote: > > Hello, > > I apologize because I'm only going to throw a very vague idea and I > don't currently have time or motivation to explore it myself. But I > think it may prove interesting for other people and perhaps spur some > concrete actionable proposal. > > With the growing complexity of Python software stacks, the length of > tracebacks is continuously growing and is frequently making debugging > errors and issues more tedious than it should be. This is a > language-agnostic problem. Java software is often mocked for its > ridiculously long tracebacks, but Python might come close in the future. > > Especially since Python is often the a language of choice for non > computer science professionals, including but not only as a teaching > language, this would be a problem worth solving. For the "teaching" aspect, I have just started about a week ago working on a project which aims to produce more useful tracebacks for beginners. (This work builds from a slightly different project I had started quite a bit earlier.) For example, you can see some "simplified" tracebacks for general exceptions: https://aroberge.github.io/friendly-traceback/docs/html/tracebacks_en.html as well as for the not very informative "SyntaxError: invalid syntax". https://aroberge.github.io/friendly-traceback/docs/html/syntax_tracebacks_en.html If you go to these links, you will see on the following pages a version of the same but translated into French. For those that read only the English version, you may find that it is sometimes repetitive. However, the original message included in a traceback is shown "as is", and is later rewritten in a way meant to be translated: there is no such duplication in languages other than English. Ultimately, one could translate these in any language. My ultimate goal is to provide similar simplified explanations for all standard exceptions in Python, including some that are specific to specialized modules (Turtle and Decimal come to mind), as well as providing an easy way for Library writers to hook into this framework. This does not address the full problem you raise, but I thought it might provide an additional viewpoint to this discussion. > This would need a lot more thinking, and perhaps exploring what kind of > hacks already exist in the wild to achieve similar functionality. > I guess my project could be thought of as one of these hacks. Andr? Again, I'm just throwing this around for others to play with. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Mon Apr 15 22:17:52 2019 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 15 Apr 2019 22:17:52 -0400 Subject: [Python-ideas] Logical tracebacks In-Reply-To: <20190415220718.0bb113cc@fsol> References: <20190415220718.0bb113cc@fsol> Message-ID: On 4/15/2019 4:07 PM, Antoine Pitrou wrote: > > Hello, > > I apologize because I'm only going to throw a very vague idea and I > don't currently have time or motivation to explore it myself. But I > think it may prove interesting for other people and perhaps spur some > concrete actionable proposal. > > With the growing complexity of Python software stacks, the length of > tracebacks is continuously growing and is frequently making debugging > errors and issues more tedious than it should be. This is a > language-agnostic problem. Java software is often mocked for its > ridiculously long tracebacks, but Python might come close in the future. > > Especially since Python is often the a language of choice for non > computer science professionals, including but not only as a teaching > language, this would be a problem worth solving. We already recognized > the issue some years ago, and even implemented a focussed fix for one > specific context: the elision of importlib frames when an import error > occurs: > https://bugs.python.org/issue15110 > > However, there are many contexts where implementation details would > benefit from being hidden from tracebacks (the classical example being > the internals of framework or middleware code, such as Django, Dask, > etc.). We would therefore have to define some kind of protocol by > which tracebacks can be enumerated, not only as frames, but as logical > execution blocks, comprised of one or several frames each, whose > boundaries would reflect the boundaries of the various logical > execution layers (again: framework, middleware...) involved in the > frame stack. We would probably also need some flag(s) to disable the > feature in cases where the full stack frame is wanted (I imagine > elaborate UIs could also allow switching back and forth from both > representations). > > This would need a lot more thinking, and perhaps exploring what kind of > hacks already exist in the wild to achieve similar functionality. IDLE has some hackish code to eliminate the extra stuff it adds to raw tracebacks. An IDLE startup flag to not do that might occasionally be useful for IDLE development. -- Terry Jan Reedy From boxed at killingar.net Tue Apr 16 02:10:32 2019 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Tue, 16 Apr 2019 08:10:32 +0200 Subject: [Python-ideas] Logical tracebacks In-Reply-To: <20190415220718.0bb113cc@fsol> References: <20190415220718.0bb113cc@fsol> Message-ID: <6DCC2365-68CE-4F20-B8E1-3BA6AED152E0@killingar.net> > On 15 Apr 2019, at 22:07, Antoine Pitrou wrote: > > However, there are many contexts where implementation details would > benefit from being hidden from tracebacks (the classical example being > the internals of framework or middleware code, such as Django, Dask, > etc.). We would therefore have to define some kind of protocol by > which tracebacks can be enumerated, not only as frames, but as logical > execution blocks, comprised of one or several frames each, whose > boundaries would reflect the boundaries of the various logical > execution layers (again: framework, middleware...) involved in the > frame stack. 
We would probably also need some flag(s) to disable the > feature in cases where the full stack frame is wanted (I imagine > elaborate UIs could also allow switching back and forth from both > representations). At work I've implemented a super simple system where frames with file names that match some simple patterns are put in bold. This helps enormously while not hiding information that can be crucial at times. I'd recommend people try this approach and see how it feels. It's very easy to implement compared to the alternatives suggested so far in this thread. I'd also argue that even if the more complex hiding methods mentioned are implemented then this method can still be very useful on top of those other changes. / Anders From stefano.borini at gmail.com Tue Apr 16 16:54:31 2019 From: stefano.borini at gmail.com (Stefano Borini) Date: Tue, 16 Apr 2019 21:54:31 +0100 Subject: [Python-ideas] Catching the return value of a generator at the end of a for loop Message-ID: given the following code def g(): yield 2 yield 3 return 6 for x in g(): print(x) The output is obviously 2 3 As far as I know, there is currently no way to capture the StopIteration value when the generator is used in a for loop. Is it true? If not, would a syntax like: for x in g() return v: print(x) print(v) # prints 6 be useful? It would be syntactic sugar for (corner cases omitted) def g(): yield 2 yield 3 return 6 it = iter(g()) while True: try: x = next(it) except StopIteration as exc: v = exc.value break else: print(x) print(v) -- Kind regards, Stefano Borini From zuo at kaliszewski.net Tue Apr 16 18:32:13 2019 From: zuo at kaliszewski.net (Jan Kaliszewski) Date: Wed, 17 Apr 2019 00:32:13 +0200 Subject: [Python-ideas] Catching the return value of a generator at the end of a for loop In-Reply-To: References: Message-ID: <20190417003213.680bf177@grzmot> Hello, 2019-04-16 Stefano Borini dixit: > def g(): > yield 2 > yield 3 > return 6 [...] > for x in g() return v: > print(x) > > print(v) # prints 6 I like the idea -- occasionally (when dealing with `yield from`-intensive code...) I wish such a shortcut existed. I don't like the proposed keyword. Maybe `as` would be better? for x in g() as v: print(x) print(v) Cheers, *j From steve at pearwood.info Tue Apr 16 19:44:37 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 17 Apr 2019 09:44:37 +1000 Subject: [Python-ideas] Catching the return value of a generator at the end of a for loop In-Reply-To: References: Message-ID: <20190416234434.GH3010@ando.pearwood.info> On Tue, Apr 16, 2019 at 09:54:31PM +0100, Stefano Borini wrote: > As far as I know, there is currently no way to capture the > StopIteration value when the generator is used in a for loop. Is it > true? I think you are correct. See https://bugs.python.org/issue35756 > If not, would a syntax like: > > for x in g() return v: > print(x) > > print(v) # prints 6 > > be useful? I don't know. You tell us -- why do you care about the StopIteration value in a for-loop? I think your question here is backwards. You should not start with syntax to capture the exception value, then ask if it would be useful. You should start by finding a reason why we would want to capture the exception value, and only then worry about whether we need syntax for it, or some other method. 
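For reference, the return value can already be picked up without new syntax, because "yield from" propagates it. A sketch of a small wrapper (the name capture is made up), reusing the g() from the original post:

class capture:
    # Iterating the wrapper delegates to the wrapped generator; when it
    # finishes, "yield from" hands back the StopIteration value.
    def __init__(self, gen):
        self.gen = gen
        self.value = None
    def __iter__(self):
        self.value = yield from self.gen

c = capture(g())
for x in c:
    print(x)    # prints 2, then 3
print(c.value)  # prints 6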
-- Steven From greg.ewing at canterbury.ac.nz Tue Apr 16 21:11:02 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Apr 2019 13:11:02 +1200 Subject: [Python-ideas] Catching the return value of a generator at the end of a for loop In-Reply-To: <20190417003213.680bf177@grzmot> References: <20190417003213.680bf177@grzmot> Message-ID: <5CB67D26.206@canterbury.ac.nz> Jan Kaliszewski wrote: > I like the idea -- occasionally (when dealing with `yield > from`-intensive code...) I wish such a shortcut existed. Can you give a concrete example of a use case? -- Greg From tjreedy at udel.edu Tue Apr 16 23:47:43 2019 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 16 Apr 2019 23:47:43 -0400 Subject: [Python-ideas] Catching the return value of a generator at the end of a for loop In-Reply-To: References: Message-ID: On 4/16/2019 4:54 PM, Stefano Borini wrote: > given the following code > > def g(): > yield 2 > yield 3 > return 6 > > for x in g(): > print(x) > > The output is obviously > 2 > 3 > > As far as I know, there is currently no way to capture the > StopIteration value when the generator is used in a for loop. Is it > true? > If not, would a syntax like: > > for x in g() return v: > print(x) > > print(v) # prints 6 > > be useful? It would be syntactic sugar for (corner cases omitted) Syntactic sugar should be reserved for fairly common cases, not for extremely rare cases. > def g(): > yield 2 > yield 3 > return 6 If a for loop user needs to see 6, it should be yielded. Adding non-None return values to StopIteration is fairly new, and was/is intended for cases where a generator is not being used as a simple forward iterator. For such special cases, special code like the following should be used. > it = iter(g()) > while True: > try: > x = next(it) > except StopIteration as exc: > v = exc.value > break > else: > print(x) > print(v) -- Terry Jan Reedy From stefano.borini at gmail.com Wed Apr 17 02:38:00 2019 From: stefano.borini at gmail.com (Stefano Borini) Date: Wed, 17 Apr 2019 07:38:00 +0100 Subject: [Python-ideas] Catching the return value of a generator at the end of a for loop In-Reply-To: <20190417003213.680bf177@grzmot> References: <20190417003213.680bf177@grzmot> Message-ID: I don't like it either. Ideally, I would want "returning", but of course a new keyword is not an option for such limited use case. "as" is probably much better, and the behavior of as in other contexts is very similar. On Tue, 16 Apr 2019 at 23:42, Jan Kaliszewski wrote: > > Hello, > > 2019-04-16 Stefano Borini dixit: > > > def g(): > > yield 2 > > yield 3 > > return 6 > [...] > > for x in g() return v: > > print(x) > > > > print(v) # prints 6 > > I like the idea -- occasionally (when dealing with `yield > from`-intensive code...) I wish such a shortcut existed. > > I don't like the proposed keyword. Maybe `as` would be better? 
> > for x in g() as v: > print(x) > > print(v) > > Cheers, > *j > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Kind regards, Stefano Borini From stefano.borini at gmail.com Wed Apr 17 02:58:07 2019 From: stefano.borini at gmail.com (Stefano Borini) Date: Wed, 17 Apr 2019 07:58:07 +0100 Subject: [Python-ideas] Catching the return value of a generator at the end of a for loop In-Reply-To: <20190416234434.GH3010@ando.pearwood.info> References: <20190416234434.GH3010@ando.pearwood.info> Message-ID: On Wed, 17 Apr 2019 at 00:45, Steven D'Aprano wrote: > I don't know. You tell us -- why do you care about the StopIteration > value in a for-loop? I came across the idea while I was reading various PEPs, so I don't have an actual use case under my hands right now. However, in the past I had a circumstance that might have called for that. Of course it was easy to workaround. I was iterating over plugins as they were loaded, using a generator, and setting up some configuration options. e.g. def load_plugins(): for plugin in pluginloader: if plugin.success: plugin.setup(configuration_vars) In some cases, some of the plugins failed to load (e.g. because there was a syntax error in their content). In my design, the plugin class instance had a flag indicating if it was successful or not (plugin.success). I wanted to keep track of how many plugins failed to load and how many were successful, so I could return that information to the user. This calls for counting at the level of load_plugins(). If pluginloader returned a tuple (num_successful, num_failed) it would save a (agreed trivial) need to count this information in the caller. -- Kind regards, Stefano Borini From ernst at pleiszenburg.de Fri Apr 19 15:49:54 2019 From: ernst at pleiszenburg.de (Sebastian M. Ernst) Date: Fri, 19 Apr 2019 21:49:54 +0200 Subject: [Python-ideas] Add a `dir_fd` parameter to `os.truncate`? Message-ID: <4bfd87c1-1880-618a-ad22-3a8112da92a2@pleiszenburg.de> Hi everyone, many methods in `os` have a `dir_fd` parameter, for instance `unlink` [1]: ```python os.unlink(path, *, dir_fd=None) ``` The `dir_fd` parameter [2] allows it to describe paths relative to directory descriptors. Otherwise, relative paths are relative to the current working directory. The implementation of `truncate` in `os` does not have this parameter [3]: ```python os.truncate(path, length) ``` On POSIX-like systems for instance, the `os` module actually imports [4] this function from the `posix module`. There, one can see that it [5] just calls the `truncate` system call [6]. This kind of implicitly explains why there is no `dir_fd` parameter: There is no such thing like a `truncateat` system call, which would be required for handling the `dir_fd` parameter [7, 8]. However, it is possible to work around this limitation: ```python def truncate(path, length, dir_fd = None): if dir_fd is None: return os.truncate(path, length) else: fd = os.open(path, flags = os.O_WRONLY, dir_fd = dir_fd) ret = os.ftruncate(fd, length) os.close(fd) return ret ``` Why not add a convenience function or wrapper like above to the `os` module, which closes this gap and is more consistent with other methods? 
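One detail in the workaround above: if os.ftruncate raises, the file descriptor leaks, so a version aimed at the stdlib would presumably close it in a finally block. A sketch:

import os

def truncate(path, length, dir_fd=None):
    if dir_fd is None:
        return os.truncate(path, length)
    fd = os.open(path, flags=os.O_WRONLY, dir_fd=dir_fd)
    try:
        return os.ftruncate(fd, length)
    finally:
        os.close(fd)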
Best regards, Sebastian [1] https://docs.python.org/3/library/os.html#os.unlink [2] https://docs.python.org/3/library/os.html#dir-fd [3] https://docs.python.org/3/library/os.html#os.truncate [4] https://github.com/python/cpython/blob/3.7/Lib/os.py#L135 [5] https://github.com/python/cpython/blob/3.7/Modules/posixmodule.c#L9042 [6] https://github.com/python/cpython/blob/3.7/Modules/posixmodule.c#L9079 [7] https://stackoverflow.com/q/52871892/1672565 [8] https://stackoverflow.com/q/55765181/1672565 From danilo.bellini at gmail.com Fri Apr 19 16:12:06 2019 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Fri, 19 Apr 2019 17:12:06 -0300 Subject: [Python-ideas] Open parenthesis in REPL completion Message-ID: I'm not aware if that was already discussed, but something I find quite annoying is the REPL auto-complete that also includes the open parenthesis symbol. I think it shouldn't be included in TAB completion. At least twice a week I make mistakes like typing "help(something()" with unmatched parentheses, because the "something" was filled by a TAB completion, and the trailing open parenthesis wasn't expected (given that there's no such a completion elsewhere). -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfine2358 at gmail.com Sat Apr 20 11:57:23 2019 From: jfine2358 at gmail.com (Jonathan Fine) Date: Sat, 20 Apr 2019 16:57:23 +0100 Subject: [Python-ideas] Open parenthesis in REPL completion In-Reply-To: References: Message-ID: Hi Danilo I've exactly the same experience, and now take the behaviour for granted, and don't personally have a desire to fix the problem. I've got used to it. On Unix the behaviour follows from https://docs.python.org/3/library/rlcompleter.html and also https://docs.python.org/3/library/site.html#rlcompleter-config. I read in https://docs.python.org/3/library/site.html that site.py attempts "to import a module named usercustomize, which can perform arbitrary user-specific customizations". So I think that you, I or any other user could solve this problem for oneself, by creating a usercustomize.py. And that would be the first step to getting it into the standard library (or perhaps as an installation option for Python). I hope this helps. -- Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From danilo.bellini at gmail.com Sat Apr 20 13:51:34 2019 From: danilo.bellini at gmail.com (Danilo J. S. Bellini) Date: Sat, 20 Apr 2019 14:51:34 -0300 Subject: [Python-ideas] Open parenthesis in REPL completion In-Reply-To: References: Message-ID: For most of the time, I'm already copying and pasting stuff from a text editor to use its completion instead. After more than an year having trouble with this, I think there's no way I can "get used" to it in any positive sense, but when I stick to the REPL for a long time, I'm aware that I press backspace all the time after TAB, sometimes unconsciously. Typing the last char manually hurts less than breaking the bracket matching expectation. I usually type both the open and close parentheses at once, then I move the cursor back one character to type its contents, so I never forget some unmatched parenthesis, but that strategy (which still works for other brackets and almost everywhere else) gets broken when some open parenthesis appear for some completions but not for others. 
On Sat, 20 Apr 2019 at 12:57, Jonathan Fine wrote: > On Unix the behaviour follows from > https://docs.python.org/3/library/rlcompleter.html and also > https://docs.python.org/3/library/site.html#rlcompleter-config. > > I read in https://docs.python.org/3/library/site.html that site.py > attempts "to import a module named usercustomize, which can perform > arbitrary user-specific customizations". > Thanks. From those links I found that this behavior is a quite old stuff , though the completion wasn't active by default in Python 2.7. The documentation states that only Python 3.4 activated it by default. The solution I found from reading that code is this oneliner, which I'll copy and paste after loading the REPL (yes, it's monkeypatching a seemingly private method): __import__("rlcompleter").Completer._callable_postfix = lambda self, val, word: word Is there a default PYTHONSTARTUP file name in Python 3.7.3, or at least a single global configuration file for the REPL where I can put that oneliner or a file reference with that line? I strongly prefer not to mess around with ~/.bashrc, ~/.zshrc and scattered stuff like that, if possible. -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfine2358 at gmail.com Sat Apr 20 16:12:08 2019 From: jfine2358 at gmail.com (Jonathan Fine) Date: Sat, 20 Apr 2019 21:12:08 +0100 Subject: [Python-ideas] Open parenthesis in REPL completion In-Reply-To: References: Message-ID: Hi Danilo You wrote: Is there a default PYTHONSTARTUP file name in Python 3.7.3, or at least a > single global configuration file for the REPL where I can put that oneliner > or a file reference with that line? I strongly prefer not to mess around > with ~/.bashrc, ~/.zshrc and scattered stuff like that, if possible. > I don't know the answer to this question. Perhaps someone else does. -- Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Apr 20 16:25:31 2019 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 21 Apr 2019 06:25:31 +1000 Subject: [Python-ideas] Open parenthesis in REPL completion In-Reply-To: References: Message-ID: On Sun, Apr 21, 2019 at 3:52 AM Danilo J. S. Bellini wrote: > The solution I found from reading that code is this oneliner, which I'll copy and paste after loading the REPL (yes, it's monkeypatching a seemingly private method): > > __import__("rlcompleter").Completer._callable_postfix = lambda self, val, word: word > > Is there a default PYTHONSTARTUP file name in Python 3.7.3, or at least a single global configuration file for the REPL where I can put that oneliner or a file reference with that line? I strongly prefer not to mess around with ~/.bashrc, ~/.zshrc and scattered stuff like that, if possible. > I think it's probably cleaner to do this by subclassing Completer and overriding something (dig around to see what actually *calls* that), but either way, you can most certainly create yourself a config file. Details are here: https://docs.python.org/3/library/site.html You should be able to do everything you want with sitecustomize or usercustomize. If you need detailed help, python-list has lots of people who know how this works (way better than I do). 
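On the PYTHONSTARTUP part of the question: the variable has no default value; it must point at a file explicitly, and that file is only executed for interactive sessions. A sketch tying it to the one-liner above (the file name is just an example):

# saved as e.g. ~/.pythonrc.py, selected with: export PYTHONSTARTUP=~/.pythonrc.py
import rlcompleter

# Keep TAB completion, but drop the "(" appended after callables.
# This overrides a private method, so it may break in future versions.
rlcompleter.Completer._callable_postfix = lambda self, val, word: word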
ChrisA From johnlinp at gmail.com Tue Apr 23 02:44:25 2019 From: johnlinp at gmail.com (=?UTF-8?B?5p6X6Ieq5Z2H?=) Date: Tue, 23 Apr 2019 14:44:25 +0800 Subject: [Python-ideas] What are the strong use cases for str.rindex()? Message-ID: Hi all, I found that there are str.index() and str.rindex(), but there is only list.index() and no list.rindex(). So I filed the issue https://bugs.python.org/issue36639 to provide list.rindex(). However, the issue was rejected and closed with the comment: > There were known, strong use cases for str.rindex(). The list.rindex() method was intentionally omitted. AFAICT no compelling use cases have arisen, so we should continue to leave it out. In general, we don't grow the core APIs unnecessarily. However, I am not sure what the known, strong use cases for str.rindex() are. Why doesn't the strong use cases apply on list.rindex()? Could anyone give me some examples? Thanks. Best, John Lin -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Tue Apr 23 13:28:29 2019 From: brett at python.org (Brett Cannon) Date: Tue, 23 Apr 2019 10:28:29 -0700 Subject: [Python-ideas] What are the strong use cases for str.rindex()? In-Reply-To: References: Message-ID: Given "abcdefabcdefabcdef", what is the last result of "abc"? x.rindex("abc") will tell you. Given [1, 2, 3, 4, 5, 1, 2, 3, 4, 5] where is the last result of 3? reversed(x).index(3) will tell you (or x[::-1]). Notice how with lists you can easily reverse them and still get at the value since you are searching per index. But with strings, you searching by a subslice that can be greater than 1 in which case you can't use a similar approach. On Mon, Apr 22, 2019 at 11:47 PM ??? wrote: > Hi all, > > I found that there are str.index() and str.rindex(), but there is only > list.index() and no list.rindex(). So I filed the issue > https://bugs.python.org/issue36639 to provide list.rindex(). However, the > issue was rejected and closed with the comment: > > > There were known, strong use cases for str.rindex(). The list.rindex() > method was intentionally omitted. AFAICT no compelling use cases have > arisen, so we should continue to leave it out. In general, we don't grow > the core APIs unnecessarily. > > However, I am not sure what the known, strong use cases for str.rindex() > are. Why doesn't the strong use cases apply on list.rindex()? Could anyone > give me some examples? Thanks. > > Best, > John Lin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yaoxiansamma at gmail.com Tue Apr 23 13:50:48 2019 From: yaoxiansamma at gmail.com (Thautwarm Zhao) Date: Wed, 24 Apr 2019 01:50:48 +0800 Subject: [Python-ideas] What are the strong use cases for str.rindex()? (John Lin) Message-ID: IMO, there're lots of use cases in parsing related stuffs, which requires rindex a lot, say, when you have generated a tokenizer which might across multiple lines: line 8: X """ line 9: line 10: """ In this case, we need to extract 2 tokens X and , a multiline whitespace string. After getting each token we're to compute/update the current column and line number. If the line number gets advanced then we use rindex('\n') to help with updating the new column number, otherwise, col_offset += len(current_token) . 
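Concretely, that bookkeeping might look like the following sketch (the function name is made up):

def advance_position(line, col, token):
    # Update (line, col) after consuming token; rindex('\n') locates the
    # last newline, and everything after it is the new partial column.
    if '\n' in token:
        line += token.count('\n')
        col = len(token) - (token.rindex('\n') + 1)
    else:
        col += len(token)
    return line, col

>>> advance_position(8, 2, '"""\n\n"""')
(10, 3)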
However, the reason why we don't need list.rindex but do for str.rindex is simple I'd say: str is immutable and has no O(1) reverse method. On the other hand, when it comes to list, you can use list.index after list.reverse, and after a bunch of operations you can resume the state by invoking list.reverse again. On Wed, Apr 24, 2019, 12:11 AM Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > Today's Topics: > > 1. What are the strong use cases for str.rindex()? (???) > > > > ---------- Forwarded message ---------- > From: "???" > To: python-ideas at python.org > Cc: > Bcc: > Date: Tue, 23 Apr 2019 14:44:25 +0800 > Subject: [Python-ideas] What are the strong use cases for str.rindex()? > Hi all, > > I found that there are str.index() and str.rindex(), but there is only > list.index() and no list.rindex(). So I filed the issue > https://bugs.python.org/issue36639 to provide list.rindex(). However, the > issue was rejected and closed with the comment: > > > There were known, strong use cases for str.rindex(). The list.rindex() > method was intentionally omitted. AFAICT no compelling use cases have > arisen, so we should continue to leave it out. In general, we don't grow > the core APIs unnecessarily. > > However, I am not sure what the known, strong use cases for str.rindex() > are. Why doesn't the strong use cases apply on list.rindex()? Could anyone > give me some examples? Thanks. > > Best, > John Lin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Apr 23 13:52:48 2019 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Apr 2019 13:52:48 -0400 Subject: [Python-ideas] What are the strong use cases for str.rindex()? In-Reply-To: References: Message-ID: On 4/23/2019 2:44 AM, ??? wrote: > Hi all, > > I found that there are str.index() and str.rindex(), but there is only > list.index() and no list.rindex(). str.index and list.index are related but not the same. The consistency argument is better applied to find-rfind, index-rindex, partition-rpartition, etc. It is much more common to process strings right to left or in both directions, than process lists right to left or in both directions. Moreover, lists have a reverse method, strings do not. ''.join(reversed(somestring)) is likely slower, especially if there many non-ascii chars. Moreover, somestring.rindex(substring) would have to have both somestring and substring reversed when substring is more than one char. So I filed the issue > https://bugs.python.org/issue36639 to provide list.rindex(). However, > the issue was rejected and closed with the comment: > > > There were known, strong use cases for str.rindex().? The > list.rindex() method was intentionally omitted.? AFAICT no compelling > use cases have arisen, so we should continue to leave it out.? In > general, we don't grow the core APIs unnecessarily. 
> > However, I am not sure what the?known, strong use cases for str.rindex() > are. Why doesn't the strong use cases apply on list.rindex()? Could > anyone give me some examples? Thanks. Omitting tests of rindex and rfind themselves: Searching 'rindex' in C:\Programs\Python38\lib\*.py ... C:\Programs\Python38\lib\_markupbase.py: 55: pos = rawdata.rindex("\n", i, j) # Should not fail C:\Programs\Python38\lib\collections\__init__.py: 1260: def rindex(self, sub, start=0, end=_sys.maxsize): C:\Programs\Python38\lib\collections\__init__.py: 1261: return self.data.rindex(sub, start, end) C:\Programs\Python38\lib\pydoc_data\topics.py: 10125: 'str.rindex(sub[, start[, end]])\n' C:\Programs\Python38\lib\test\pickletester.py: 2543: self.assertEqual((pickled.rindex(b"abcd") + len(b"abcd") - C:\Programs\Python38\lib\test\pickletester.py: 2574: self.assertEqual((pickled.rindex(b"abcd") + len(b"abcd") - C:\Programs\Python38\lib\test\test_baseexception.py: 46: depth = exc_line.rindex('-') C:\Programs\Python38\lib\test\test_bigmem.py: 310: self.assertEqual(s.rindex(_(' ')), C:\Programs\Python38\lib\test\test_bigmem.py: 311: sublen + size + SUBSTR.rindex(_(' '))) C:\Programs\Python38\lib\test\test_bigmem.py: 312: self.assertEqual(s.rindex(SUBSTR), sublen + size) C:\Programs\Python38\lib\test\test_bigmem.py: 313: self.assertEqual(s.rindex(_(' '), 0, sublen + size - 1), C:\Programs\Python38\lib\test\test_bigmem.py: 314: SUBSTR.rindex(_(' '))) C:\Programs\Python38\lib\test\test_bigmem.py: 315: self.assertEqual(s.rindex(SUBSTR, 0, sublen + size), 0) C:\Programs\Python38\lib\test\test_bigmem.py: 316: self.assertEqual(s.rindex(_('i')), C:\Programs\Python38\lib\test\test_bigmem.py: 317: sublen + size + SUBSTR.rindex(_('i'))) C:\Programs\Python38\lib\test\test_bigmem.py: 318: self.assertEqual(s.rindex(_('i'), 0, sublen), SUBSTR.rindex(_('i'))) C:\Programs\Python38\lib\test\test_bigmem.py: 319: self.assertEqual(s.rindex(_('i'), 0, sublen + size), C:\Programs\Python38\lib\test\test_bigmem.py: 320: SUBSTR.rindex(_('i'))) C:\Programs\Python38\lib\test\test_bigmem.py: 321: self.assertRaises(ValueError, s.rindex, _('j')) C:\Programs\Python38\lib\test\test_bytes.py: 566: self.assertEqual(b.rindex(b'ss'), 5) C:\Programs\Python38\lib\test\test_bytes.py: 567: self.assertRaises(ValueError, b.rindex, b'w') C:\Programs\Python38\lib\test\test_bytes.py: 568: self.assertRaises(ValueError, b.rindex, b'mississippian') C:\Programs\Python38\lib\test\test_bytes.py: 570: self.assertEqual(b.rindex(i), 10) C:\Programs\Python38\lib\test\test_bytes.py: 571: self.assertRaises(ValueError, b.rindex, w) C:\Programs\Python38\lib\test\test_bytes.py: 573: self.assertEqual(b.rindex(b'ss', 3), 5) C:\Programs\Python38\lib\test\test_bytes.py: 574: self.assertEqual(b.rindex(b'ss', 0, 6), 2) C:\Programs\Python38\lib\test\test_bytes.py: 576: self.assertEqual(b.rindex(i, 1, 3), 1) C:\Programs\Python38\lib\test\test_bytes.py: 577: self.assertEqual(b.rindex(i, 3, 9), 7) C:\Programs\Python38\lib\test\test_bytes.py: 578: self.assertRaises(ValueError, b.rindex, w, 1, 3) C:\Programs\Python38\lib\test\test_bytes.py: 768: self.assertEqual(3, b.rindex(l, None)) C:\Programs\Python38\lib\test\test_bytes.py: 769: self.assertEqual(3, b.rindex(l, -2, None)) C:\Programs\Python38\lib\test\test_bytes.py: 770: self.assertEqual(2, b.rindex(l, None, -2)) C:\Programs\Python38\lib\test\test_bytes.py: 771: self.assertEqual(0, b.rindex(h, None, None)) C:\Programs\Python38\lib\test\test_bytes.py: 791: for method in (b.count, b.find, b.index, b.rfind, b.rindex): 
C:\Programs\Python38\lib\test\test_bytes.py: 806: self.assertRaisesRegex(TypeError, r'\brindex\b', b.rindex, C:\Programs\Python38\lib\test\test_descr.py: 3596: try: ''.rindex('5') C:\Programs\Python38\lib\test\test_descr.py: 3598: else: self.fail("''.rindex('5') doesn't raise ValueError") Searching 'rfind' in C:\Programs\Python38\lib\*.py ... C:\Programs\Python38\lib\collections\__init__.py: 1256: def rfind(self, sub, start=0, end=_sys.maxsize): C:\Programs\Python38\lib\collections\__init__.py: 1259: return self.data.rfind(sub, start, end) C:\Programs\Python38\lib\ctypes\macholib\dyld.py: 143: fmwk_index = fn.rfind('.framework') C:\Programs\Python38\lib\doctest.py: 341: i = msg.rfind('.', 0, end) C:\Programs\Python38\lib\email\_parseaddr.py: 76: i = data[0].rfind(',') C:\Programs\Python38\lib\encodings\punycode.py: 187: pos = text.rfind(b"-") C:\Programs\Python38\lib\formatter.py: 401: i = data.rfind('\n') C:\Programs\Python38\lib\genericpath.py: 124: sepIndex = p.rfind(sep) C:\Programs\Python38\lib\genericpath.py: 126: altsepIndex = p.rfind(altsep) C:\Programs\Python38\lib\genericpath.py: 129: dotIndex = p.rfind(extsep) C:\Programs\Python38\lib\html\parser.py: 148: amppos = rawdata.rfind('&', max(i, n-34)) C:\Programs\Python38\lib\html\parser.py: 336: - self.__starttag_text.rfind("\n") C:\Programs\Python38\lib\http\client.py: 873: i = host.rfind(':') C:\Programs\Python38\lib\http\client.py: 874: j = host.rfind(']') # ipv6 addresses have [...] C:\Programs\Python38\lib\http\cookiejar.py: 566: i = A.rfind(B) C:\Programs\Python38\lib\http\cookiejar.py: 1016: i = domain.rfind(".") C:\Programs\Python38\lib\http\cookiejar.py: 1017: j = domain.rfind(".", 0, i) C:\Programs\Python38\lib\http\cookiejar.py: 1507: i = path.rfind("/") C:\Programs\Python38\lib\idlelib\pyparse.py: 161: i = code.rfind(":\n", 0, limit) C:\Programs\Python38\lib\idlelib\pyparse.py: 164: i = code.rfind('\n', 0, i) + 1 # start of colon line (-1+1=0) C:\Programs\Python38\lib\idlelib\pyparse.py: 379: p = code.rfind('\n', 0, p-1) + 1 C:\Programs\Python38\lib\idlelib\pyparse.py: 477: origi = i = code.rfind('\n', 0, j) + 1 C:\Programs\Python38\lib\json\decoder.py: 33: colno = pos - doc.rfind('\n', 0, pos) C:\Programs\Python38\lib\logging\__init__.py: 1340: i = name.rfind(".") C:\Programs\Python38\lib\logging\__init__.py: 1353: i = name.rfind(".", 0, i - 1) C:\Programs\Python38\lib\modulefinder.py: 155: i = pname.rfind('.') C:\Programs\Python38\lib\modulefinder.py: 509: i = name.rfind(".") C:\Programs\Python38\lib\pathlib.py: 791: i = name.rfind('.') C:\Programs\Python38\lib\pathlib.py: 810: i = name.rfind('.') C:\Programs\Python38\lib\pdb.py: 629: colon = arg.rfind(':') C:\Programs\Python38\lib\pdb.py: 884: i = arg.rfind(':') C:\Programs\Python38\lib\posixpath.py: 109: i = p.rfind(sep) + 1 C:\Programs\Python38\lib\posixpath.py: 148: i = p.rfind(sep) + 1 C:\Programs\Python38\lib\posixpath.py: 158: i = p.rfind(sep) + 1 C:\Programs\Python38\lib\pyclbr.py: 145: i = module.rfind('.') C:\Programs\Python38\lib\pydoc.py: 1657: desc += ' in ' + name[:name.rfind('.')] C:\Programs\Python38\lib\pydoc_data\topics.py: 10115: 'str.rfind(sub[, start[, end]])\n' C:\Programs\Python38\lib\pydoc_data\topics.py: 10127: ' Like "rfind()" but raises "ValueError" when the ' C:\Programs\Python38\lib\site-packages\pip\_vendor\html5lib\_inputstream.py: 228: lastLinePos = chunk.rfind('\n', 0, offset) C:\Programs\Python38\lib\site-packages\pip\_vendor\pyparsing.py: 1110: return 1 if 0 References: Message-ID: On 2019-04-23 18:52, Terry Reedy wrote: > On 
> On 4/23/2019 2:44 AM, 林自均 wrote:
>> Hi all,
>>
>> I found that there are str.index() and str.rindex(), but there is only
>> list.index() and no list.rindex().
>
> str.index and list.index are related but not the same. The consistency
> argument is better applied to find-rfind, index-rindex,
> partition-rpartition, etc.
>
> It is much more common to process strings right to left or in both
> directions, than process lists right to left or in both directions.
> Moreover, lists have a reverse method, strings do not.
> ''.join(reversed(somestring)) is likely slower, especially if there are
> many non-ascii chars. Moreover, somestring.rindex(substring) would have to
> have both somestring and substring reversed when substring is more than
> one char.
>
You can reverse a string with somestring[::-1].

Personally, I'm not convinced by the "lists can be reversed" argument.

[snip]

From guido at python.org Tue Apr 23 16:18:59 2019
From: guido at python.org (Guido van Rossum)
Date: Tue, 23 Apr 2019 13:18:59 -0700
Subject: [Python-ideas] What are the strong use cases for str.rindex()?
In-Reply-To: References: Message-ID:

On Tue, Apr 23, 2019 at 1:02 PM MRAB wrote:

> On 2019-04-23 18:52, Terry Reedy wrote:
> > On 4/23/2019 2:44 AM, 林自均 wrote:
> >> Hi all,
> >>
> >> I found that there are str.index() and str.rindex(), but there is only
> >> list.index() and no list.rindex().
> >
> > str.index and list.index are related but not the same. The consistency
> > argument is better applied to find-rfind, index-rindex,
> > partition-rpartition, etc.
> >
> > It is much more common to process strings right to left or in both
> > directions, than process lists right to left or in both directions.
> > Moreover, lists have a reverse method, strings do not.
> > ''.join(reversed(somestring)) is likely slower, especially if there are
> > many non-ascii chars. Moreover, somestring.rindex(substring) would have
> > to have both somestring and substring reversed when substring is more
> > than one char.
> >
> You can reverse a string with somestring[::-1].
>
> Personally, I'm not convinced by the "lists can be reversed" argument.
>

Me neither, though for substring checks, reversing the string would be
even more cumbersome (you'd have to reverse the query string too).

My money is on "nobody uses this for lists".

Some use cases for rindex() on strings that I found in a large codebase
here include searching a pathname for the final slash, a list of
comma-separated items for the last comma, a fully-qualified module name for
the last period, and some ad-hoc parsing of other things. The "last
separator" use cases are the most common and here rindex() sounds very
useful.

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jcrmatos at gmail.com Tue Apr 23 16:39:05 2019
From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Matos?=)
Date: Tue, 23 Apr 2019 13:39:05 -0700 (PDT)
Subject: [Python-ideas] contains_any_in and contains_all_in
Message-ID: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>

Hello,

If we want to check if a string contains any/all of several other strings
we have to use several or/and conditions or any/all.
For any:
if ('string1' in master_string or 'string2' in master_string
        or 'string3' in master_string):

or

if any(item in master_string for item in ['string1', 'string2', 'string3']):

For all:
if ('string1' in master_string and 'string2' in master_string
        and 'string3' in master_string):

or

if all(item in master_string for item in ['string1', 'string2', 'string3']):

I suggest adding some "sugar" to make it more readable by adding
contains_any_in and contains_all_in to look like this

For any:
if master_string contains_any_in ['string1', 'string2', 'string3']:

For all:
if master_string contains_all_in ['string1', 'string2', 'string3']:


What do you think?


Thanks,

JM
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robertve92 at gmail.com Tue Apr 23 20:50:05 2019
From: robertve92 at gmail.com (Robert Vanden Eynde)
Date: Wed, 24 Apr 2019 02:50:05 +0200
Subject: [Python-ideas] contains_any_in and contains_all_in
In-Reply-To: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
References: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
Message-ID: 

Here comes funcoperators again :

if master_string -contains_any_in- ['string1', 'string2', 'string3']:

Given

from funcoperators import infix

@infix
def contains_any_in(string, iterable):
    return any(item in string for item in iterable)

pip install funcoperators
https://pypi.org/project/funcoperators/

robertvandeneynde.be

Le mar. 23 avr. 2019 à 22:39, João Matos a écrit :

> Hello,
>
> If we want to check if a string contains any/all of several other strings
> we have to use several or/and conditions or any/all.
>
> For any:
> if ('string1' in master_string or 'string2' in master_string
>         or 'string3' in master_string):
>
> or
>
> if any(item in master_string for item in ['string1', 'string2', 'string3'
> ]):
>
> For all:
> if ('string1' in master_string and 'string2' in master_string
>         and 'string3' in master_string):
>
> or
>
> if all(item in master_string for item in ['string1', 'string2', 'string3'
> ]):
>
> I suggest adding some "sugar" to make it more readable by adding
> contains_any_in and contains_all_in to look like this
>
> For any:
> if master_string contains_any_in ['string1', 'string2', 'string3']:
>
> For all:
> if master_string contains_all_in ['string1', 'string2', 'string3']:
>
>
> What do you think?
>
>
> Thanks,
>
> JM
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From johnlinp at gmail.com Tue Apr 23 20:59:18 2019
From: johnlinp at gmail.com (=?UTF-8?B?5p6X6Ieq5Z2H?=)
Date: Wed, 24 Apr 2019 08:59:18 +0800
Subject: [Python-ideas] What are the strong use cases for str.rindex()?
In-Reply-To: References: Message-ID:

Hi all,

Thanks for the explanation. Now I agree that the need for list.rindex() is
not as common as str.rindex(). In fact, I only need list.rindex() when
doing some algorithm problems. I guess that doesn't count as real need here.

Best,
John Lin

Guido van Rossum 於 2019年4月24日 週三 上午4:20寫道：

> On Tue, Apr 23, 2019 at 1:02 PM MRAB wrote:
>
>> On 2019-04-23 18:52, Terry Reedy wrote:
>> > On 4/23/2019 2:44 AM, 林自均 wrote:
>> >> Hi all,
>> >>
>> >> I found that there are str.index() and str.rindex(), but there is only
>> >> list.index() and no list.rindex().
>> >
>> > str.index and list.index are related but not the same. The consistency
>> > argument is better applied to find-rfind, index-rindex,
>> > partition-rpartition, etc.
>> >
>> > It is much more common to process strings right to left or in both
>> > directions, than process lists right to left or in both directions.
>> > Moreover, lists have a reverse method, strings do not.
>> > ''.join(reversed(somestring)) is likely slower, especially if there are
>> > many non-ascii chars. Moreover, somestring.rindex(substring) would have
>> > to have both somestring and substring reversed when substring is more
>> > than one char.
>> >
>> You can reverse a string with somestring[::-1].
>>
>> Personally, I'm not convinced by the "lists can be reversed" argument.
>>
>
> Me neither, though for substring checks, reversing the string would be
> even more cumbersome (you'd have to reverse the query string too).
>
> My money is on "nobody uses this for lists".
>
> Some use cases for rindex() on strings that I found in a large codebase
> here include searching a pathname for the final slash, a list of
> comma-separated items for the last comma, a fully-qualified module name for
> the last period, and some ad-hoc parsing of other things. The "last
> separator" use cases are the most common and here rindex() sounds very
> useful.
>
> --
> --Guido van Rossum (python.org/~guido)
> *Pronouns: he/him/his **(why is my pronoun here?)*
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tjreedy at udel.edu Wed Apr 24 00:17:45 2019
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Apr 2019 00:17:45 -0400
Subject: [Python-ideas] contains_any_in and contains_all_in
In-Reply-To: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
References: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
Message-ID: 

On 4/23/2019 4:39 PM, João Matos wrote:
> Hello,
>
> If we want to check if a string contains any/all of several other
> strings we have to use several or/and conditions or any/all.
>
> For any:
> if ('string1' in master_string or 'string2' in master_string
>         or 'string3' in master_string):
> or
> if any(item in master_string for item in ['string1', 'string2', 'string3']):

Trivial with re module, which will answer the question in one pass.

> For all:
> if ('string1' in master_string and 'string2' in master_string
>         and 'string3' in master_string):
> or
> if all(item in master_string for item in ['string1', 'string2', 'string3']):

Tougher.
Are the strings guaranteed to not be prefixes of each other?
Do you want to allow overlaps?
Can do in one pass by compiling a new re every time an item is found.
If overlaps not wanted, re.finditer will find all occurrences of any, so
feed to set and see if all found.

-- 
Terry Jan Reedy

From robertve92 at gmail.com Wed Apr 24 00:45:46 2019
From: robertve92 at gmail.com (Robert Vanden Eynde)
Date: Wed, 24 Apr 2019 06:45:46 +0200
Subject: [Python-ideas] contains_any_in and contains_all_in
In-Reply-To: References: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
Message-ID: 

>
> Trivial with re module, which will answer the question in one pass.
>

re.search('|'.join(map(re.escape, ['string1', 'string2', 'string3'])),
master_string)

For those who might find it non trivial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rhodri at kynesim.co.uk Wed Apr 24 07:26:48 2019
From: rhodri at kynesim.co.uk (Rhodri James)
Date: Wed, 24 Apr 2019 12:26:48 +0100
Subject: [Python-ideas] contains_any_in and contains_all_in
In-Reply-To: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
References: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
Message-ID: 

On 23/04/2019 21:39, João Matos wrote:
> If we want to check if a string contains any/all of several other strings
> we have to use several or/and conditions or any/all.
[snip]
> I suggest adding some "sugar" to make it more readable by adding
> contains_any_in and contains_all_in to look like this
>
> For any:
> if master_string contains_any_in ['string1', 'string2', 'string3']:
>
> For all:
> if master_string contains_all_in ['string1', 'string2', 'string3']:

They sound more like string methods to me, by analogy with startswith()
and endswith():

if master_string.contains_any('string1', 'string2', 'string3'):

etc. The only question is whether this is a common enough requirement to
justify their existence. I don't remember our recent discussion on
suffices coming to much of a conclusion about that. Anyone?

-- 
Rhodri James *-* Kynesim Ltd

From jcrmatos at gmail.com Wed Apr 24 12:50:56 2019
From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Matos?=)
Date: Wed, 24 Apr 2019 09:50:56 -0700 (PDT)
Subject: [Python-ideas] contains_any_in and contains_all_in
In-Reply-To: References: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
Message-ID: 

The objective of the proposal is to increase readability.
IMO using re is even more unreadable than the and/or or any/all I mentioned.

quarta-feira, 24 de Abril de 2019 às 05:47:04 UTC+1, Robert Vanden Eynde
escreveu:
>
> Trivial with re module, which will answer the question in one pass.
>>
>
> re.search('|'.join(map(re.escape, ['string1', 'string2', 'string3'])),
> master_string)
>
> For those who might find it non trivial.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jcrmatos at gmail.com Wed Apr 24 12:50:10 2019
From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Matos?=)
Date: Wed, 24 Apr 2019 09:50:10 -0700 (PDT)
Subject: [Python-ideas] contains_any_in and contains_all_in
In-Reply-To: References: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
Message-ID: <8491418c-8c99-45d0-a329-25e5bf3e21b2@googlegroups.com>

The objective of the proposal is to increase readability.
IMO your options are even more unreadable than the and/or or any/all I
mentioned.

quarta-feira, 24 de Abril de 2019 às 05:33:12 UTC+1, Terry Reedy escreveu:
>
> On 4/23/2019 4:39 PM, João Matos wrote:
> > Hello,
> >
> > If we want to check if a string contains any/all of several other
> > strings we have to use several or/and conditions or any/all.
> >
> > For any:
> > if ('string1' in master_string or 'string2' in master_string
> >         or 'string3' in master_string):
> > or
> > if any(item in master_string for item in ['string1', 'string2', 'string3']):
>
> Trivial with re module, which will answer the question in one pass.
>
> > For all:
> >
> > if ('string1' in master_string and 'string2' in master_string
> >         and 'string3' in master_string):
> > or
> > if all(item in master_string for item in ['string1', 'string2', 'string3']):
>
> Tougher.
> Are the strings guaranteed to not be prefixes of each other?
> Do you want to allow overlaps?
> Can do in one pass by compiling a new re every time an item is found.
> If overlaps not wanted, re.finditer will find all occurrences of any, so
> feed to set and see if all found.
>
> --
> Terry Jan Reedy
>
>
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From batuhanosmantaskaya at gmail.com Wed Apr 24 12:54:34 2019
From: batuhanosmantaskaya at gmail.com (=?UTF-8?Q?Batuhan_Osman_Ta=C5=9Fkaya?=)
Date: Wed, 24 Apr 2019 19:54:34 +0300
Subject: [Python-ideas] CPython Bytecode Assembler
Message-ID: 

Hello,

Currently it is hard to assemble cpython bytecode without help of 3rd
party libraries (like: vstinner/bytecode). I'm proposing an assembler to
standard library and an API to cpython's peephole optimizer. Also an
interface like `ast.NodeVisitor` and `ast.NodeTransformer` for bytecode
objects may be handy.

It would help if you are doing:
- Runtime patching
- Specific optimizations at bytecode level
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org Wed Apr 24 13:44:12 2019
From: guido at python.org (Guido van Rossum)
Date: Wed, 24 Apr 2019 10:44:12 -0700
Subject: [Python-ideas] CPython Bytecode Assembler
In-Reply-To: References: Message-ID:

It is intentionally not included -- bytecode is a detail of the
implementation and changes with each feature release, without concern for
backwards compatibility.

On Wed, Apr 24, 2019 at 10:33 AM Batuhan Osman Taşkaya <
batuhanosmantaskaya at gmail.com> wrote:

> Hello,
>
> Currently it is hard to assemble cpython bytecode without help of 3rd
> party libraries (like: vstinner/bytecode). I'm proposing an assembler to
> standard library and an API to cpython's peephole optimizer. Also an
> interface like `ast.NodeVisitor` and `ast.NodeTransformer` for bytecode
> objects may be handy.
>
> It would help if you are doing:
> - Runtime patching
> - Specific optimizations at bytecode level
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From batuhanosmantaskaya at gmail.com Wed Apr 24 13:57:55 2019
From: batuhanosmantaskaya at gmail.com (=?UTF-8?Q?Batuhan_Osman_Ta=C5=9Fkaya?=)
Date: Wed, 24 Apr 2019 20:57:55 +0300
Subject: [Python-ideas] CPython Bytecode Assembler
In-Reply-To: References: Message-ID:

`dis` module was my only reference for this proposal. If majority doesn't
want a new implementation-specific module, it is best to withdraw this
proposal.

Brett Cannon , 24 Nis 2019 Çar, 20:49 tarihinde şunu yazdı:

> Since bytecode is a CPython-specific implementation detail I don't know if
> it makes sense to enshrine an assembler for it in the stdlib (if you were
> to ask me today if I thought the dis module belonged in the stdlib I would
> probably say "no", but I also know not everyone agrees with that view :) .
>
> On Wed, Apr 24, 2019 at 10:36 AM Batuhan Osman Taşkaya <
> batuhanosmantaskaya at gmail.com> wrote:
>
>> Hello,
>>
>> Currently it is hard to assemble cpython bytecode without help of 3rd
>> party libraries (like: vstinner/bytecode). I'm proposing an assembler to
>> standard library and an API to cpython's peephole optimizer. Also an
>> interface like `ast.NodeVisitor` and `ast.NodeTransformer` for bytecode
>> objects may be handy.
>>
>> It would help if you are doing:
>> - Runtime patching
>> - Specific optimizations at bytecode level
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brett at python.org Wed Apr 24 13:48:53 2019
From: brett at python.org (Brett Cannon)
Date: Wed, 24 Apr 2019 10:48:53 -0700
Subject: [Python-ideas] CPython Bytecode Assembler
In-Reply-To: References: Message-ID:

Since bytecode is a CPython-specific implementation detail I don't know if
it makes sense to enshrine an assembler for it in the stdlib (if you were
to ask me today if I thought the dis module belonged in the stdlib I would
probably say "no", but I also know not everyone agrees with that view :) .

On Wed, Apr 24, 2019 at 10:36 AM Batuhan Osman Taşkaya <
batuhanosmantaskaya at gmail.com> wrote:

> Hello,
>
> Currently it is hard to assemble cpython bytecode without help of 3rd
> party libraries (like: vstinner/bytecode). I'm proposing an assembler to
> standard library and an API to cpython's peephole optimizer. Also an
> interface like `ast.NodeVisitor` and `ast.NodeTransformer` for bytecode
> objects may be handy.
>
> It would help if you are doing:
> - Runtime patching
> - Specific optimizations at bytecode level
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tjreedy at udel.edu Wed Apr 24 15:27:19 2019
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Apr 2019 15:27:19 -0400
Subject: [Python-ideas] contains_any_in and contains_all_in
In-Reply-To: <8491418c-8c99-45d0-a329-25e5bf3e21b2@googlegroups.com>
References: <245818c4-e51e-4450-8b3c-a02f2f99009a@googlegroups.com>
Message-ID: 

On 4/24/2019 12:50 PM, João Matos wrote:
> The objective of the proposal is to increase readability.

Regular expressions are powerful and flexible and yes, people find them
hard to read until they really learn the sublanguage. The solution is to
isolate application-specific regexes in functions that you then use in
the application.

> IMO using re is even more unreadable than the and/or or any/all
> I mentioned.

Your multiple scan str-method solutions can also be wrapped if they are
sufficient for an application. The stdlib provides basic building blocks.
If we endlessly add simple functions to the stdlib, it will become harder
and harder to learn.

> quarta-feira, 24 de Abril de 2019 às 05:47:04 UTC+1, Robert Vanden Eynde
> escreveu:
> [Me] Trivial with re module, which will answer the question in one pass.
>
> For those who might find it non trivial.
> re.search('|'.join(map(re.escape, ['string1', 'string2',
> 'string3'])), master_string)

This should be wrapped in an application-specific function, with whatever
name and parameters one prefers. If this particular re example is not in
Regular Expression HOWTO, it could be added. It is easy to forget the need
to apply re.escape to general, non-identifier strings.
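For concreteness, an untested sketch of such wrappers (contains_any and
contains_all are illustrative names, not stdlib functions):

import re

def contains_any(master_string, items):
    # Single pass: search once for an alternation of escaped literals.
    pattern = '|'.join(map(re.escape, items))
    return re.search(pattern, master_string) is not None

def contains_all(master_string, items):
    # Single pass with re.finditer; assumes no item is a substring of
    # another and occurrences do not overlap (the prefix/overlap caveats
    # from earlier in the thread), otherwise some items can be missed.
    pattern = '|'.join(map(re.escape, items))
    found = {m.group() for m in re.finditer(pattern, master_string)}
    return found.issuperset(items)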
-- 
Terry Jan Reedy

From python at mrabarnett.plus.com Wed Apr 24 16:05:30 2019
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 24 Apr 2019 21:05:30 +0100
Subject: [Python-ideas] CPython Bytecode Assembler
In-Reply-To: References: Message-ID:

On 2019-04-24 18:48, Brett Cannon wrote:
> Since bytecode is a CPython-specific implementation detail I don't know
> if it makes sense to enshrine an assembler for it in the stdlib (if you
> were to ask me today if I thought the dis module belonged in the stdlib
> I would probably say "no", but I also know not everyone agrees with that
> view :) .
>
The dis module can help you understand what's happening below the
surface; it's just that what happens below the surface is
implementation-specific and can change between releases...

> On Wed, Apr 24, 2019 at 10:36 AM Batuhan Osman Taşkaya
> >
> wrote:
>
> Hello,
>
> Currently it is hard to assemble cpython bytecode without help of
> 3rd party libraries (like: vstinner/bytecode). I'm proposing an
> assembler to standard library and an API to cpython's peephole
> optimizer. Also an interface like `ast.NodeVisitor` and
> `ast.NodeTransformer` for bytecode objects may be handy.
>
> It would help if you are doing:
> - Runtime patching
> - Specific optimizations at bytecode level
>

From vaibhavskarve at gmail.com Wed Apr 24 17:29:06 2019
From: vaibhavskarve at gmail.com (Vaibhav Karve)
Date: Wed, 24 Apr 2019 16:29:06 -0500
Subject: [Python-ideas] Using rightarrow "->" for typing annotation of
 functions
Message-ID: 

(Note: This idea is about a particular static typechecking (typing?)
annotation syntax).
The idea is that currently the use of the "->" (right arrow) is restricted
to only function definition annotation. Can we extend it to declaration of
type for functions even outside their definitions?
Example:

Currently we write:
f: Callable[[int, Dict[str, int]], str] # declaring the type of some
fake function

This would be much cleaner if we could write:
f: int -> Dict[str, int] -> str # One of the possibilities

or even:
f: int, Dict[str, int] -> str # Another possibility

I have no idea how this will affect the existing syntax (and if this will
have any bad repercussions/notational misuse). I just thought it would be
nicer to:
a) Not have to spell out Callable
b) Not have to use all those square brackets
c) Have the same notation work for both the function annotation as well as
for declaring the type.

This is my first time posting an idea to python-ideas. So apologies if I
am not following some conventions that I might not be aware of.
Vaibhav Karve
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org Wed Apr 24 17:42:05 2019
From: guido at python.org (Guido van Rossum)
Date: Wed, 24 Apr 2019 14:42:05 -0700
Subject: [Python-ideas] Using rightarrow "->" for typing annotation of
 functions
In-Reply-To: References: Message-ID:

Thanks for posting. I agree that Callable is ugly (even hideous :-), but
when we introduced type annotations in PEP 484, we didn't want to introduce
new syntax. The existing syntax (using -> in function headings) was
supported since Python 3.0.

Since then we've introduced other new syntax (in particular PEP 526) so we
could indeed try adding something better for Callable.

I think we should probably at least have parentheses around the arguments,
so you'd write

f: (int) -> str
g: (int, str) -> float

That looks elegant.

But we should also try to support optional arguments and keyword arguments.
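For comparison, a sketch of how those two declarations are spelled with
today's typing module (this part is valid today; only the arrow forms
would be new):

from typing import Callable

f: Callable[[int], str]         # would become  f: (int) -> str
g: Callable[[int, str], float]  # would become  g: (int, str) -> float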
Also, some groups of people would like to see a more concise notation for
lambdas, and maybe they'd want to write

x = (a, b) -> a + b

as sugar for

x = lambda a, b: a + b

We probably can't have both, so we should at least decide which is more
important.

Too bad we can't use Unicode arrows. :-)

On Wed, Apr 24, 2019 at 2:30 PM Vaibhav Karve wrote:

> (Note: This idea is about a particular static typechecking (typing?)
> annotation syntax).
> The idea is that currently the use of the "->" (right arrow) is restricted
> to only function definition annotation. Can we extend it to declaration of
> type for functions even outside their definitions?
> Example:
>
> Currently we write:
> f: Callable[[int, Dict[str, int]], str] # declaring the type of some
> fake function
>
> This would be much cleaner if we could write:
> f: int -> Dict[str, int] -> str # One of the possibilities
>
> or even:
> f: int, Dict[str, int] -> str # Another possibility
>
> I have no idea how this will affect the existing syntax (and if this will
> have any bad repercussions/notational misuse). I just thought it would be
> nicer to:
> a) Not have to spell out Callable
> b) Not have to use all those square brackets
> c) Have the same notation work for both the function annotation as well as
> for declaring the type.
>
> This is my first time posting an idea to python-ideas. So apologies if I
> am not following some conventions that I might not be aware of.
> Vaibhav Karve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info Wed Apr 24 20:11:42 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 25 Apr 2019 10:11:42 +1000
Subject: [Python-ideas] What are the strong use cases for str.rindex()?
In-Reply-To: References: Message-ID: <20190425001140.GQ3010@ando.pearwood.info>

On Wed, Apr 24, 2019 at 08:59:18AM +0800, 林自均 wrote:
> Hi all,
>
> Thanks for the explanation. Now I agree that the need for list.rindex() is
> not as common as str.rindex(). In fact, I only need list.rindex() when
> doing some algorithm problems. I guess that doesn't count as real need here.

Of course it's a "real need", but the question is whether it is common
enough for somebody to do the work if it doesn't affect them personally.

Can you share an example of one of these algorithms?

-- 
Steven

From steve at pearwood.info Wed Apr 24 20:07:43 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 25 Apr 2019 10:07:43 +1000
Subject: [Python-ideas] What are the strong use cases for str.rindex()?
In-Reply-To: References: Message-ID: <20190425000742.GP3010@ando.pearwood.info>

On Tue, Apr 23, 2019 at 10:28:29AM -0700, Brett Cannon wrote:

> Given "abcdefabcdefabcdef", what is the last result of "abc"?
> x.rindex("abc") will tell you.
>
> Given [1, 2, 3, 4, 5, 1, 2, 3, 4, 5] where is the last result of 3?
> reversed(x).index(3) will tell you (or x[::-1]).

That first version doesn't work, as list_reverseiterator objects don't
have an index method. You're not the only person to make that error, I
too often forget that reversed() returns an iterator, not a list.

The second is easy to get wrong, because it returns the wrong index:

# Get the item following the last instance of spam.
index = x[::-1].index(spam)
print(x[index+1])

In your example, the correct index is 7 but the returned value is 2.

> Notice how with lists you can easily reverse them and still get at the
> value since you are searching per index.

"Easily" hides a lot of copying behind the scenes. If the list is a
non-trivial size, that can be very wasteful, especially if you're doing
it in a loop, or hidden in a function. Don't think about the case of a
ten element list, think of a ten-billion element list.

Personally, I don't think I've ever used list.index, let alone needed
rindex. But I think we underestimate the difficulty and cost of faking
an rindex method from index for those who need it (if anyone does).

> But with strings, you're searching by
> a subslice that can be greater than 1 in which case you can't use a similar
> approach.

Of course you can: you "just" need to reverse the substring as well. The
conversions will be even more fiddly and error-prone:

py> s = "abc spam def spam ghi"
py> s.rindex('spam') == len(s) - s[::-1].index('spam'[::-1]) - len('spam')
True

but it can be done.

-- 
Steven

From python at mrabarnett.plus.com Wed Apr 24 20:54:15 2019
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 25 Apr 2019 01:54:15 +0100
Subject: [Python-ideas] Using rightarrow "->" for typing annotation of
 functions
In-Reply-To: References: Message-ID:

On 2019-04-24 22:42, Guido van Rossum wrote:
> Thanks for posting. I agree that Callable is ugly (even hideous :-), but
> when we introduced type annotations in PEP 484, we didn't want to
> introduce new syntax. The existing syntax (using -> in function
> headings) was supported since Python 3.0.
>
> Since then we've introduced other new syntax (in particular PEP 526) so
> we could indeed try adding something better for Callable.
>
> I think we should probably at least have parentheses around the
> arguments, so you'd write
>
> f: (int) -> str
> g: (int, str) -> float
>
> That looks elegant.
>
> But we should also try to support optional arguments and keyword arguments.
>
> Also, some groups of people would like to see a more concise notation
> for lambdas, and maybe they'd want to write
>
> x = (a, b) -> a + b
>
> as sugar for
>
> x = lambda a, b: a + b
>
> We probably can't have both, so we should at least decide which is more
> important.
>
> Too bad we can't use Unicode arrows. :-)
>
Some languages use ->; some others use =>. As Python already uses -> for
the return type, it could use => for lambdas.

> On Wed, Apr 24, 2019 at 2:30 PM Vaibhav Karve > wrote:
>
> (Note: This idea is about a particular static typechecking (typing?)
> annotation syntax).
> The idea is that currently the use of the "->" (right arrow) is
> restricted to only function definition annotation. Can we extend it
> to declaration of type for functions even outside their definitions?
> Example:
>
> Currently we write:
>     f: Callable[[int, Dict[str, int]], str]  # declaring the type
> of some fake function
>
> This would be much cleaner if we could write:
>     f: int -> Dict[str, int] -> str   # One of the possibilities
>
> or even:
>     f: int, Dict[str, int] -> str     # Another possibility
>
> I have no idea how this will affect the existing syntax (and if this
> will have any bad repercussions/notational misuse). I just thought
> it would be nicer to:
> a) Not have to spell out Callable
> b) Not have to use all those square brackets
> c) Have the same notation work for both the function annotation as
> well as for declaring the type.
> > This is my first time posting an idea to python-ideas. So apologies
> if I am not following some conventions that I might not be aware of.

From steve at pearwood.info Thu Apr 25 03:57:02 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 25 Apr 2019 17:57:02 +1000
Subject: [Python-ideas] What are the strong use cases for str.rindex()?
 (John Lin)
In-Reply-To: References: Message-ID: <20190425075702.GT3010@ando.pearwood.info>

On Wed, Apr 24, 2019 at 01:50:48AM +0800, Thautwarm Zhao wrote:

> However, the reason why we don't need list.rindex but do for str.rindex is
> simple I'd say: str is immutable and has no O(1) reverse method.
>
> On the other hand, when it comes to list, you can use list.index after
> list.reverse, and after a bunch of operations you can resume the state by
> invoking list.reverse again.

list reverse is not O(1), and flipping the order, then flipping the
order back again is not safe if the list could be accessed by two or
more threads. (The call to reverse itself is thread-safe, but not the
operations in between.)

-- 
Steven

From steve at pearwood.info Thu Apr 25 04:20:52 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 25 Apr 2019 18:20:52 +1000
Subject: [Python-ideas] Add a `dir_fd` parameter to `os.truncate`?
In-Reply-To: <4bfd87c1-1880-618a-ad22-3a8112da92a2@pleiszenburg.de>
References: <4bfd87c1-1880-618a-ad22-3a8112da92a2@pleiszenburg.de>
Message-ID: <20190425082052.GU3010@ando.pearwood.info>

On Fri, Apr 19, 2019 at 09:49:54PM +0200, Sebastian M. Ernst wrote:

> Hi everyone,
>
> many methods in `os` have a `dir_fd` parameter, for instance `unlink` [1]:
> ```python
> os.unlink(path, *, dir_fd=None)
> ```
>
> The `dir_fd` parameter [2] allows it to describe paths relative to
> directory descriptors. Otherwise, relative paths are relative to the
> current working directory.
>
> The implementation of `truncate` in `os` does not have this parameter [3]:
> ```python
> os.truncate(path, length)
> ```
[...]
> Why not add a convenience function or wrapper like above to the `os`
> module, which closes this gap and is more consistent with other methods?

I haven't seen any responses to your proposal, perhaps I missed
something.

The os module is supposed to be a thin wrapper around the os
functionality, but your wrapper seems thin enough that I think it could
and should just go into the os.truncate function itself, rather than
adding a new function.

-- 
Steven

From steve at pearwood.info Thu Apr 25 04:33:19 2019
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 25 Apr 2019 18:33:19 +1000
Subject: [Python-ideas] Open parenthesis in REPL completion
In-Reply-To: References: Message-ID: <20190425083319.GV3010@ando.pearwood.info>

On Fri, Apr 19, 2019 at 05:12:06PM -0300, Danilo J. S. Bellini wrote:

> I'm not aware if that was already discussed, but something I find quite
> annoying is the REPL auto-complete that also includes the open parenthesis
> symbol. I think it shouldn't be included in TAB completion. At least twice
> a week I make mistakes like typing "help(something()" with unmatched
> parentheses

You could try reading the command line before hitting Enter *wink*

I know what you mean, and it's a tiny annoyance for me too that when I
type "help(function(" I have to delete the autocompleted opening bracket.

So I guess that's a small annoyance a few dozen times a day.
But having the opening bracket auto-added is a small satisfaction, and if
you're using the REPL for actual calculations and not just help(), the
benefit probably outweighs the annoyance:

# save up to four opening brackets
result = function(myclass(arg)) + another(x).method()

So I don't think that having the extra complication of a switch to turn
this feature off is a good idea.

-- 
Steven

From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Apr 25 07:27:34 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Thu, 25 Apr 2019 20:27:34 +0900
Subject: [Python-ideas] Open parenthesis in REPL completion
In-Reply-To: <20190425083319.GV3010@ando.pearwood.info>
References: <20190425083319.GV3010@ando.pearwood.info>
Message-ID: <23745.39334.925614.263705@turnbull.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

> But having the opening bracket auto-added is a small satisfaction,

I'm not going to weigh in on this feature: I can see it either way.
What I don't get is why we don't go full metal Emacs, by adding the
corresponding end delimiter, and backspace. In each line below the
LHS are the characters typed, the RHS the characters inserted, and the
vertical bar | represents the position of the insertion cursor:

( => (|)
[ => [|]
{ => {|}
" => "|"
"" => ""| # fooled ya, didn't I? Could do it for parens too.
""" => """|"""
'' => ''|
''' => '''|'''

There may be others. And bonus points for DTRT in strings and
comments (especially for "'", since it's used as apostrophe in
non-program text).

Once I got used to it, I found it really cut down on the annoyance of
counting parens and making sure start parens weren't paired with end
braces and similar errors. YMMV, of course.

Steve

From rhodri at kynesim.co.uk Thu Apr 25 08:09:49 2019
From: rhodri at kynesim.co.uk (Rhodri James)
Date: Thu, 25 Apr 2019 13:09:49 +0100
Subject: [Python-ideas] Open parenthesis in REPL completion
In-Reply-To: <23745.39334.925614.263705@turnbull.sk.tsukuba.ac.jp>
References: <20190425083319.GV3010@ando.pearwood.info>
 <23745.39334.925614.263705@turnbull.sk.tsukuba.ac.jp>
Message-ID: <1c27e786-2d99-690f-f4b6-11486e898af8@kynesim.co.uk>

On 25/04/2019 12:27, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>
> > But having the opening bracket auto-added is a small satisfaction,
>
> I'm not going to weigh in on this feature: I can see it either way.
> What I don't get is why we don't go full metal Emacs, by adding the
> corresponding end delimiter, and backspace. In each line below the
> LHS are the characters typed, the RHS the characters inserted, and the
> vertical bar | represents the position of the insertion cursor:
>
> ( => (|)
> [ => [|]
> { => {|}
> " => "|"
> "" => ""| # fooled ya, didn't I? Could do it for parens too.
> """ => """|"""
> '' => ''|
> ''' => '''|'''
>
> There may be others. And bonus points for DTRT in strings and
> comments (especially for "'", since it's used as apostrophe in
> non-program text).
>
> Once I got used to it, I found it really cut down on the annoyance of
> counting parens and making sure start parens weren't paired with end
> braces and similar errors. YMMV, of course.

My mileage definitely varies. This sort of autocompletion is the first
thing I turn off in the Eclipse-variants that come bundled with a lot of
the SDKs I use, and I've never enabled it in Emacs. Automatically
highlighting where the matching brackets or quotes are is a lovely thing,
but I like making my own decisions and stepping past or deleting
decisions the editor has made for me is a pain.
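For what it's worth, a rough way to get rid of the auto-"(" in the plain
Python REPL today is to override the hook rlcompleter uses to append it.
That hook (_callable_postfix) is a private method, so treat this as an
unsupported hack rather than a recommendation:

import readline
import rlcompleter

class NoParenCompleter(rlcompleter.Completer):
    # Override the private hook that appends "(" to completions of
    # callables, so TAB completes the name but never adds the paren.
    def _callable_postfix(self, val, word):
        return word

readline.set_completer(NoParenCompleter().complete)
readline.parse_and_bind('tab: complete')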
-- 
Rhodri James *-* Kynesim Ltd

From jfine2358 at gmail.com Thu Apr 25 09:21:23 2019
From: jfine2358 at gmail.com (Jonathan Fine)
Date: Thu, 25 Apr 2019 14:21:23 +0100
Subject: [Python-ideas] Open parenthesis in REPL completion
In-Reply-To: <1c27e786-2d99-690f-f4b6-11486e898af8@kynesim.co.uk>
References: <20190425083319.GV3010@ando.pearwood.info>
 <23745.39334.925614.263705@turnbull.sk.tsukuba.ac.jp>
 <1c27e786-2d99-690f-f4b6-11486e898af8@kynesim.co.uk>
Message-ID: 

This is an interesting thread. Here's my two cents worth. (Colloquial US
English for a low-value opinion.)

I'm in favour of sensible defaults (of course). In this situation, perhaps
this means defaults that work well for those who would find it difficult to
select a different default. Put another way, values that work well for
Emacs users should not be the default (unless they also work well for
beginners).

Sometimes, when I'm using a module for the first time (or when I'm puzzled
about Python's behaviour and online documentation), I find myself doing
>>> help(something)
quite often. And I find myself typing
>>> help({})
instead of
>>> help(dict)
to avoid the unwanted
>>> help(dict(

My preference, which might work well for a wide range of use cases, is
1. If the initial identifier is help, tab produces the opening paren (.
2. If the initial identifier is callable, tab produces the opening paren (.
3. After help(, tab does not produce opening paren (.
4. Otherwise, tab does produce opening paren (.
5. Perhaps, after something like
>>> help(int
have tab produce the CLOSING paren ).

As I said, just my two cents worth. Your opinions may vary.

-- 
Jonathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stefano.borini at gmail.com Thu Apr 25 09:42:09 2019
From: stefano.borini at gmail.com (Stefano Borini)
Date: Thu, 25 Apr 2019 14:42:09 +0100
Subject: [Python-ideas] Open parenthesis in REPL completion
In-Reply-To: References: <20190425083319.GV3010@ando.pearwood.info>
 <23745.39334.925614.263705@turnbull.sk.tsukuba.ac.jp>
 <1c27e786-2d99-690f-f4b6-11486e898af8@kynesim.co.uk>
Message-ID: 

My two cents as well. I also find the open parenthesis very annoying.
For me, it's asymmetry with PyCharm behavior, and inconsistency with bash.
PyCharm adds the parenthesis, but also adds the end parenthesis, so the
whole use of parentheses is consistent: the user has not to worry about
them. Bash refuses to guess when it's ambiguous, and stops until you fill
the ambiguous part.
Right now, the REPL implements a mixed situation where it both assumes
your usage, and does not help you all the way through. Although we can all
agree that functions most of the time are invoked, rather than used as is.
IMHO, either the parenthesis should not be added, or two parentheses
should be added and the cursor placed in the center (I am unsure about the
details of the REPL implementation, but I guess it's possible) at least to
have a consistent experience.

On Thu, 25 Apr 2019 at 14:24, Jonathan Fine wrote:
>
> This is an interesting thread. Here's my two cents worth. (Colloquial US English for a low-value opinion.)
>
> I'm in favour of sensible defaults (of course). In this situation, perhaps this means defaults that work well for those who would find it difficult to select a different default. Put another way, values that work well for Emacs users should not be the default (unless they also work well for beginners).
> > Sometimes, when I'm using a module for the first time (or when I'm puzzled about Python's behaviour and online documentation), I find myself doing
> >>> help(something)
> quite often. And I find myself typing
> >>> help({})
> instead of
> >>> help(dict)
> to avoid the unwanted
> >>> help(dict(
>
> My preference, which might work well for a wide range of use cases, is
> 1. If the initial identifier is help, tab produces the opening paren (.
> 2. If the initial identifier is callable, tab produces the opening paren (.
> 3. After help(, tab does not produce opening paren (.
> 4. Otherwise, tab does produce opening paren (.
> 5. Perhaps, after something like
> >>> help(int
> have tab produce the CLOSING paren ).
>
> As I said, just my two cents worth. Your opinions may vary.
>
> --
> Jonathan
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Kind regards,

Stefano Borini

From ram at rachum.com Thu Apr 25 10:51:08 2019
From: ram at rachum.com (Ram Rachum)
Date: Thu, 25 Apr 2019 17:51:08 +0300
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
Message-ID: 

Hi,

Here's something I want in Python: Multiple levels of tracers working on
top of each other, instead of just one.

I'm talking about the tracer that one can set by calling sys.settrace.

I've recently released PySnooper: https://github.com/cool-RR/PySnooper/

One of the difficulties I have, is that I can't debug or run the `coverage`
tool on the core of this module. That's because the core is a trace
function, and debuggers and coverage tools work by setting a trace
function. When PySnooper sets its trace function using `sys.settrace`, the
code that runs in that trace function runs without getting traced by the
coverage tracer.

This means that people who develop debuggers and coverage tools can't use a
debugger or a coverage tool on the core of their tool. It's quite an
annoying problem.

My proposed solution: Multiple levels of tracing, instead of just one. When
you install a tracer, you're not replacing the existing one, you're
appending a tracer to the existing list of tracers.

If this was implemented, then when PySnooper would install its tracer, the
coverage tracer would still be active and running, for every line of code
including the ones in PySnooper's tracer.

Obviously, we'll need to figure out the API and any other kind of problems
with this proposal.

What do you think?


Thanks,
Ram.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From boxed at killingar.net Thu Apr 25 11:10:01 2019
From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=)
Date: Thu, 25 Apr 2019 17:10:01 +0200
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To: References: Message-ID: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net>

Can't this be implemented today by a simple monkey patch of sys.settrace?

> On 25 Apr 2019, at 16:51, Ram Rachum wrote:
>
> Hi,
>
> Here's something I want in Python: Multiple levels of tracers working on top of each other, instead of just one.
>
> I'm talking about the tracer that one can set by calling sys.settrace.
>
> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/
>
> One of the difficulties I have, is that I can't debug or run the `coverage` tool on the core of this module. That's because the core is a trace function, and debuggers and coverage tools work by setting a trace function. When PySnooper sets its trace function using `sys.settrace`, the code that runs in that trace function runs without getting traced by the coverage tracer.
>
> This means that people who develop debuggers and coverage tools can't use a debugger or a coverage tool on the core of their tool. It's quite an annoying problem.
>
> My proposed solution: Multiple levels of tracing, instead of just one. When you install a tracer, you're not replacing the existing one, you're appending a tracer to the existing list of tracers.
>
> If this was implemented, then when PySnooper would install its tracer, the coverage tracer would still be active and running, for every line of code including the ones in PySnooper's tracer.
>
> Obviously, we'll need to figure out the API and any other kind of problems with this proposal.
>
> What do you think?
>
>
> Thanks,
> Ram.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ram at rachum.com Thu Apr 25 11:16:26 2019
From: ram at rachum.com (Ram Rachum)
Date: Thu, 25 Apr 2019 18:16:26 +0300
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net>
References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net>
Message-ID: 

Oh wow, I didn't even consider that. I think you're right, I'll do more
thinking about this. Thanks Anders!

On Thu, Apr 25, 2019 at 6:10 PM Anders Hovmöller
wrote:

> Can't this be implemented today by a simple monkey patch of sys.settrace?
>
> On 25 Apr 2019, at 16:51, Ram Rachum wrote:
>
> Hi,
>
> Here's something I want in Python: Multiple levels of tracers working on
> top of each other, instead of just one.
>
> I'm talking about the tracer that one can set by calling sys.settrace.
>
> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/
>
> One of the difficulties I have, is that I can't debug or run the
> `coverage` tool on the core of this module. That's because the core is a
> trace function, and debuggers and coverage tools work by setting a trace
> function. When PySnooper sets its trace function using `sys.settrace`, the
> code that runs in that trace function runs without getting traced by the
> coverage tracer.
>
> This means that people who develop debuggers and coverage tools can't use
> a debugger or a coverage tool on the core of their tool. It's quite an
> annoying problem.
>
> My proposed solution: Multiple levels of tracing, instead of just one.
> When you install a tracer, you're not replacing the existing one, you're
> appending a tracer to the existing list of tracers.
>
> If this was implemented, then when PySnooper would install its tracer, the
> coverage tracer would still be active and running, for every line of code
> including the ones in PySnooper's tracer.
>
> Obviously, we'll need to figure out the API and any other kind of problems
> with this proposal.
>
> What do you think?
>
>
> Thanks,
> Ram.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ram at rachum.com Thu Apr 25 12:26:04 2019
From: ram at rachum.com (Ram Rachum)
Date: Thu, 25 Apr 2019 19:26:04 +0300
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net>
Message-ID: 

Hmm, looks like, for this to work, you'll need the existing tracer to be
cooperative. Right now there are existing tracers, for example coverage's
tracer and Wing IDE's tracer, and I would need to modify them for your idea
to work, right?

If I understand your idea correctly, the first tracer would monkeypatch
`sys.settrace` so whenever someone else adds a tracer, it doesn't really do
`sys.settrace` but just add a function that the real tracer would be
calling after it's done tracing. But this can't really be done without the
original tracer implementing it, right?

On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum wrote:

> Oh wow, I didn't even consider that. I think you're right, I'll do more
> thinking about this. Thanks Anders!
>
> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovmöller
> wrote:
>
>> Can't this be implemented today by a simple monkey patch of sys.settrace?
>>
>> On 25 Apr 2019, at 16:51, Ram Rachum wrote:
>>
>> Hi,
>>
>> Here's something I want in Python: Multiple levels of tracers working on
>> top of each other, instead of just one.
>>
>> I'm talking about the tracer that one can set by calling sys.settrace.
>>
>> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/
>>
>> One of the difficulties I have, is that I can't debug or run the
>> `coverage` tool on the core of this module. That's because the core is a
>> trace function, and debuggers and coverage tools work by setting a trace
>> function. When PySnooper sets its trace function using `sys.settrace`, the
>> code that runs in that trace function runs without getting traced by the
>> coverage tracer.
>>
>> This means that people who develop debuggers and coverage tools can't use
>> a debugger or a coverage tool on the core of their tool. It's quite an
>> annoying problem.
>>
>> My proposed solution: Multiple levels of tracing, instead of just one.
>> When you install a tracer, you're not replacing the existing one, you're
>> appending a tracer to the existing list of tracers.
>>
>> If this was implemented, then when PySnooper would install its tracer,
>> the coverage tracer would still be active and running, for every line of
>> code including the ones in PySnooper's tracer.
>>
>> Obviously, we'll need to figure out the API and any other kind of
>> problems with this proposal.
>>
>> What do you think?
>>
>>
>> Thanks,
>> Ram.
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ned at nedbatchelder.com Thu Apr 25 13:16:41 2019
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Thu, 25 Apr 2019 13:16:41 -0400
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net>
Message-ID: 

Perhaps I misunderstand what's implied by "simple(!) monkeypatch of
sys.settrace", but the trickiest part of Ram's proposal is that the body of
one trace function would still trigger the remaining trace functions. That
to me sounds like it's going to require changes to ceval.c

--Ned.

On 4/25/19 12:26 PM, Ram Rachum wrote:
> Hmm, looks like, for this to work, you'll need the existing tracer to
> be cooperative. Right now there are existing tracers, for example
> coverage's tracer and Wing IDE's tracer, and I would need to modify
> them for your idea to work, right?
>
> If I understand your idea correctly, the first tracer would
> monkeypatch `sys.settrace` so whenever someone else adds a tracer, it
> doesn't really do `sys.settrace` but just add a function that the real
> tracer would be calling after it's done tracing. But this can't really
> be done without the original tracer implementing it, right?
>
> On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum
> wrote:
>
> Oh wow, I didn't even consider that. I think you're right, I'll do
> more thinking about this. Thanks Anders!
>
> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovmöller
> > wrote:
>
> Can't this be implemented today by a simple monkey patch of
> sys.settrace?
>
> On 25 Apr 2019, at 16:51, Ram Rachum
> wrote:
>
>> Hi,
>>
>> Here's something I want in Python: Multiple levels of tracers
>> working on top of each other, instead of just one.
>>
>> I'm talking about the tracer that one can set by calling
>> sys.settrace.
>>
>> I've recently released PySnooper:
>> https://github.com/cool-RR/PySnooper/
>>
>> One of the difficulties I have, is that I can't debug or run
>> the `coverage` tool on the core of this module. That's
>> because the core is a trace function, and debuggers and
>> coverage tools work by setting a trace function. When
>> PySnooper sets its trace function using `sys.settrace`, the
>> code that runs in that trace function runs without getting
>> traced by the coverage tracer.
>>
>> This means that people who develop debuggers and coverage
>> tools can't use a debugger or a coverage tool on the core of
>> their tool. It's quite an annoying problem.
>>
>> My proposed solution: Multiple levels of tracing, instead of
>> just one. When you install a tracer, you're not replacing the
>> existing one, you're appending a tracer to the existing list
>> of tracers.
>>
>> If this was implemented, then when PySnooper would install
>> its tracer, the coverage tracer would still be active and
>> running, for every line of code including the ones in
>> PySnooper's tracer.
>>
>> Obviously, we'll need to figure out the API and any other
>> kind of problems with this proposal.
>>
>> What do you think?
>>
>>
>> Thanks,
>> Ram.
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ram.rachum at gmail.com Thu Apr 25 13:35:32 2019
From: ram.rachum at gmail.com (Ram Rachum)
Date: Thu, 25 Apr 2019 20:35:32 +0300
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net>
Message-ID: 

To clarify, I meant that each trace function would manually call any trace
functions that were registered below it, instead of using the trampoline in
cpython. Does that solve the problem you raised?

On Thu, Apr 25, 2019, 20:20 Ned Batchelder wrote:

> Perhaps I misunderstand what's implied by "simple(!) monkeypatch of
> sys.settrace", but the trickiest part of Ram's proposal is that the body of
> one trace function would still trigger the remaining trace functions. That
> to me sounds like it's going to require changes to ceval.c
>
> --Ned.
> On 4/25/19 12:26 PM, Ram Rachum wrote:
>
> Hmm, looks like, for this to work, you'll need the existing tracer to be
> cooperative. Right now there are existing tracers, for example coverage's
> tracer and Wing IDE's tracer, and I would need to modify them for your idea
> to work, right?
>
> If I understand your idea correctly, the first tracer would monkeypatch
> `sys.settrace` so whenever someone else adds a tracer, it doesn't really do
> `sys.settrace` but just add a function that the real tracer would be
> calling after it's done tracing. But this can't really be done without the
> original tracer implementing it, right?
>
> On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum wrote:
>
>> Oh wow, I didn't even consider that. I think you're right, I'll do more
>> thinking about this. Thanks Anders!
>>
>> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovmöller
>> wrote:
>>
>>> Can't this be implemented today by a simple monkey patch of sys.settrace?
>>>
>>> On 25 Apr 2019, at 16:51, Ram Rachum wrote:
>>>
>>> Hi,
>>>
>>> Here's something I want in Python: Multiple levels of tracers working on
>>> top of each other, instead of just one.
>>>
>>> I'm talking about the tracer that one can set by calling sys.settrace.
>>>
>>> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/
>>>
>>> One of the difficulties I have, is that I can't debug or run the
>>> `coverage` tool on the core of this module. That's because the core is a
>>> trace function, and debuggers and coverage tools work by setting a trace
>>> function. When PySnooper sets its trace function using `sys.settrace`, the
>>> code that runs in that trace function runs without getting traced by the
>>> coverage tracer.
>>>
>>> This means that people who develop debuggers and coverage tools can't
>>> use a debugger or a coverage tool on the core of their tool. It's quite an
>>> annoying problem.
>>>
>>> My proposed solution: Multiple levels of tracing, instead of just one.
>>> When you install a tracer, you're not replacing the existing one, you're
>>> appending a tracer to the existing list of tracers.
>>>
>>> If this was implemented, then when PySnooper would install its tracer,
>>> the coverage tracer would still be active and running, for every line of
>>> code including the ones in PySnooper's tracer.
>>>
>>> Obviously, we'll need to figure out the API and any other kind of
>>> problems with this proposal.
>>>
>>> What do you think?
>>>
>>>
>>> Thanks,
>>> Ram.
>>>
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From boxed at killingar.net Thu Apr 25 14:02:23 2019
From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=)
Date: Thu, 25 Apr 2019 20:02:23 +0200
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net>
Message-ID: <6CB0A277-555F-4E77-AAAB-F69F4F93617F@killingar.net>

Well, it would trigger the top level chaining trace function, but they
should be able to decide when to call the sub-trace functions. Hmm...
Maybe :)

> On 25 Apr 2019, at 19:16, Ned Batchelder wrote:
>
> Perhaps I misunderstand what's implied by "simple(!) monkeypatch of sys.settrace", but the trickiest part of Ram's proposal is that the body of one trace function would still trigger the remaining trace functions. That to me sounds like it's going to require changes to ceval.c
>
> --Ned.
>
>> On 4/25/19 12:26 PM, Ram Rachum wrote:
>> Hmm, looks like, for this to work, you'll need the existing tracer to be cooperative. Right now there are existing tracers, for example coverage's tracer and Wing IDE's tracer, and I would need to modify them for your idea to work, right?
>>
>> If I understand your idea correctly, the first tracer would monkeypatch `sys.settrace` so whenever someone else adds a tracer, it doesn't really do `sys.settrace` but just add a function that the real tracer would be calling after it's done tracing. But this can't really be done without the original tracer implementing it, right?
>>
>>> On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum wrote:
>>> Oh wow, I didn't even consider that. I think you're right, I'll do more thinking about this. Thanks Anders!
>>>
>>>> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovmöller wrote:
>>>> Can't this be implemented today by a simple monkey patch of sys.settrace?
>>>>
>>>> On 25 Apr 2019, at 16:51, Ram Rachum wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Here's something I want in Python: Multiple levels of tracers working on top of each other, instead of just one.
>>>>>
>>>>> I'm talking about the tracer that one can set by calling sys.settrace.
>>>>>
>>>>> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/
>>>>>
>>>>> One of the difficulties I have, is that I can't debug or run the `coverage` tool on the core of this module. That's because the core is a trace function, and debuggers and coverage tools work by setting a trace function. When PySnooper sets its trace function using `sys.settrace`, the code that runs in that trace function runs without getting traced by the coverage tracer.
>>>>>
>>>>> This means that people who develop debuggers and coverage tools can't use a debugger or a coverage tool on the core of their tool. It's quite an annoying problem.
>>>>>
>>>>> My proposed solution: Multiple levels of tracing, instead of just one. When you install a tracer, you're not replacing the existing one, you're appending a tracer to the existing list of tracers.
>>>>>
>>>>> If this was implemented, then when PySnooper would install its tracer, the coverage tracer would still be active and running, for every line of code including the ones in PySnooper's tracer.
>>>>>
>>>>> Obviously, we'll need to figure out the API and any other kind of problems with this proposal.
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Ram.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Python-ideas mailing list
>>>>> Python-ideas at python.org
>>>>> https://mail.python.org/mailman/listinfo/python-ideas
>>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From levkivskyi at gmail.com Thu Apr 25 16:50:43 2019
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Thu, 25 Apr 2019 13:50:43 -0700
Subject: [Python-ideas] Using rightarrow "->" for typing annotation of
 functions
In-Reply-To: References: Message-ID:

TBH, I don't think it is so bad that it requires a new syntax. But I am not
strongly against it either. What I would like to add here is that if we
will go with the replacement:

Callable[[X, Y], Z] becomes (X, Y) -> Z

then we should also go with

Union[X, Y] becomes X | Y
Tuple[X, Y] becomes (X, Y)
Optional[X] becomes X?
(Intersection[X, Y] when added becomes X & Y)

The current syntax (although really verbose) is consistent, so if we want
to improve it I would like to keep consistency.

Also if we are going forward with this, should we allow mixing old and new
syntax? Will the old syntax be deprecated after we introduce the new one?

-- 
Ivan

On Wed, 24 Apr 2019 at 14:43, Guido van Rossum wrote:

> Thanks for posting. I agree that Callable is ugly (even hideous :-), but
> when we introduced type annotations in PEP 484, we didn't want to introduce
> new syntax. The existing syntax (using -> in function headings) was
> supported since Python 3.0.
>
> Since then we've introduced other new syntax (in particular PEP 526) so we
> could indeed try adding something better for Callable.
>
> I think we should probably at least have parentheses around the arguments,
> so you'd write
>
> f: (int) -> str
> g: (int, str) -> float
>
> That looks elegant.
>
> But we should also try to support optional arguments and keyword arguments.
>
> Also, some groups of people would like to see a more concise notation for
> lambdas, and maybe they'd want to write
>
> x = (a, b) -> a + b
>
> as sugar for
>
> x = lambda a, b: a + b
>
> We probably can't have both, so we should at least decide which is more
> important.
>
> Too bad we can't use Unicode arrows. :-)
>
> On Wed, Apr 24, 2019 at 2:30 PM Vaibhav Karve
> wrote:
>
>> (Note: This idea is about a particular static typechecking (typing?)
>> annotation syntax).
That's because the core is a trace function, and debuggers and coverage tools work by setting a trace function. When PySnooper sets its trace function using `sys.settrace`, the code that runs in that trace function runs without getting traced by the coverage tracer. > > This means that people who develop debuggers and coverage tools can't use a debugger or a coverage tool on the core of their tool. It's quite an annoying problem. > > My proposed solution: Multiple levels of tracing, instead of just one. When you install a tracer, you're not replacing the existing one, you're appending a tracer to the existing list of tracers. > > If this was implemented, then when PySnooper would install its tracer, the coverage tracer would still be active and running, for every line of code including the ones in PySnooper's tracer. > > Obviously, we'll need to figure out the API and any other kind of problems with this proposal. > > What do you think? > > > Thanks, > Ram. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Thu Apr 25 11:16:26 2019 From: ram at rachum.com (Ram Rachum) Date: Thu, 25 Apr 2019 18:16:26 +0300 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> Message-ID: Oh wow, I didn't even consider that. I think you're right, I'll do more thinking about this. Thanks Anders! On Thu, Apr 25, 2019 at 6:10 PM Anders Hovm?ller wrote: > Can't this be implemented today by a simple monkey patch of sys.settrace? > > On 25 Apr 2019, at 16:51, Ram Rachum wrote: > > Hi, > > Here's something I want in Python: Multiple levels of tracers working on > top of each other, instead of just one. > > I'm talking about the tracer that one can set by calling sys.settrace. > > I've recently released PySnooper: https://github.com/cool-RR/PySnooper/ > > One of the difficulties I have, is that I can't debug or run the > `coverage` tool on the core of this module. That's because the core is a > trace function, and debuggers and coverage tools work by setting a trace > function. When PySnooper sets its trace function using `sys.settrace`, the > code that runs in that trace function runs without getting traced by the > coverage tracer. > > This means that people who develop debuggers and coverage tools can't use > a debugger or a coverage tool on the core of their tool. It's quite an > annoying problem. > > My proposed solution: Multiple levels of tracing, instead of just one. > When you install a tracer, you're not replacing the existing one, you're > appending a tracer to the existing list of tracers. > > If this was implemented, then when PySnooper would install its tracer, the > coverage tracer would still be active and running, for every line of code > including the ones in PySnooper's tracer. > > Obviously, we'll need to figure out the API and any other kind of problems > with this proposal. > > What do you think? > > > Thanks, > Ram. 
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Thu Apr 25 12:26:04 2019 From: ram at rachum.com (Ram Rachum) Date: Thu, 25 Apr 2019 19:26:04 +0300 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> Message-ID: Hmm, looks like, for this to work, you'll need the existing tracer to be cooperative. Right now there are existing tracers, for example coverage's tracer and Wing IDE's tracer, and I would need to modify them for your idea to work, right? If I understand your idea correctly, the first tracer would monkeypatch `sys.settrace` so whenever someone else adds a tracer, it doesn't really do `sys.settrace` but just add a function that the real tracer would be calling after it's done tracing. But this can't really be done without the original tracer implementing it, right? On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum wrote: > Oh wow, I didn't even consider that. I think you're right, I'll do more > thinking about this. Thanks Anders! > > On Thu, Apr 25, 2019 at 6:10 PM Anders Hovm?ller > wrote: > >> Can't this be implemented today by a simple monkey patch of sys.settrace? >> >> On 25 Apr 2019, at 16:51, Ram Rachum wrote: >> >> Hi, >> >> Here's something I want in Python: Multiple levels of tracers working on >> top of each other, instead of just one. >> >> I'm talking about the tracer that one can set by calling sys.settrace. >> >> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/ >> >> One of the difficulties I have, is that I can't debug or run the >> `coverage` tool on the core of this module. That's because the core is a >> trace function, and debuggers and coverage tools work by setting a trace >> function. When PySnooper sets its trace function using `sys.settrace`, the >> code that runs in that trace function runs without getting traced by the >> coverage tracer. >> >> This means that people who develop debuggers and coverage tools can't use >> a debugger or a coverage tool on the core of their tool. It's quite an >> annoying problem. >> >> My proposed solution: Multiple levels of tracing, instead of just one. >> When you install a tracer, you're not replacing the existing one, you're >> appending a tracer to the existing list of tracers. >> >> If this was implemented, then when PySnooper would install its tracer, >> the coverage tracer would still be active and running, for every line of >> code including the ones in PySnooper's tracer. >> >> Obviously, we'll need to figure out the API and any other kind of >> problems with this proposal. >> >> What do you think? >> >> >> Thanks, >> Ram. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ned at nedbatchelder.com Thu Apr 25 13:16:41 2019 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 25 Apr 2019 13:16:41 -0400 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> Message-ID: Perhaps I misunderstand what's implied by "simple(!) monkeypatch of sys.settrace", but the trickiest part of Ram's proposal is that the body of one trace function would still trigger the remaining trace functions.? That to me sounds like it's going to require changes to ceval.c --Ned. On 4/25/19 12:26 PM, Ram Rachum wrote: > Hmm, looks like, for this to work, you'll need the existing tracer to > be cooperative. Right now there are existing tracers, for example > coverage's tracer and Wing IDE's tracer, and I would need to modify > them for your idea to work, right? > > If I understand your idea correctly, the first tracer would > monkeypatch `sys.settrace` so whenever someone else adds a tracer, it > doesn't really do `sys.settrace` but just add a function that the real > tracer would be calling after it's done tracing. But this can't really > be done without the original tracer implementing it, right? > > On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum > wrote: > > Oh wow, I didn't even consider that. I think you're right, I'll do > more thinking about this. Thanks Anders! > > On Thu, Apr 25, 2019 at 6:10 PM Anders Hovm?ller > > wrote: > > Can't this be implemented today by a simple monkey patch of > sys.settrace? > > On 25 Apr 2019, at 16:51, Ram Rachum > wrote: > >> Hi, >> >> Here's something I want in Python: Multiple levels of tracers >> working on top of each other, instead of just one. >> >> I'm talking about the tracer that one can set by calling >> sys.settrace. >> >> I've recently released PySnooper: >> https://github.com/cool-RR/PySnooper/ >> >> One of the difficulties I have, is that I can't debug or run >> the `coverage` tool on the core of this module. That's >> because the core is a trace function, and debuggers and >> coverage tools work by setting a trace function. When >> PySnooper sets its trace function using `sys.settrace`, the >> code that runs in that trace function runs without getting >> traced by the coverage tracer. >> >> This means that people who develop debuggers and coverage >> tools can't use a debugger or a coverage tool on the core of >> their tool. It's quite an annoying problem. >> >> My proposed solution: Multiple levels of tracing, instead of >> just one. When you install a tracer, you're not replacing the >> existing one, you're appending a tracer to the existing list >> of tracers. >> >> If this was implemented, then when PySnooper would install >> its tracer, the coverage tracer would still be active and >> running, for every line of code including the ones in >> PySnooper's tracer. >> >> Obviously, we'll need to figure out the API and any other >> kind of problems with this proposal. >> >> What do you think? >> >> >> Thanks, >> Ram. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ram.rachum at gmail.com Thu Apr 25 13:35:32 2019 From: ram.rachum at gmail.com (Ram Rachum) Date: Thu, 25 Apr 2019 20:35:32 +0300 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> Message-ID: To clarify, I meant that each trace function would manually call any trace functions that were registered below it, instead of using the trampoline in cpython. Does that solve the problem you raised? On Thu, Apr 25, 2019, 20:20 Ned Batchelder wrote: > Perhaps I misunderstand what's implied by "simple(!) monkeypatch of > sys.settrace", but the trickiest part of Ram's proposal is that the body of > one trace function would still trigger the remaining trace functions. That > to me sounds like it's going to require changes to ceval.c > > --Ned. > On 4/25/19 12:26 PM, Ram Rachum wrote: > > Hmm, looks like, for this to work, you'll need the existing tracer to be > cooperative. Right now there are existing tracers, for example coverage's > tracer and Wing IDE's tracer, and I would need to modify them for your idea > to work, right? > > If I understand your idea correctly, the first tracer would monkeypatch > `sys.settrace` so whenever someone else adds a tracer, it doesn't really do > `sys.settrace` but just add a function that the real tracer would be > calling after it's done tracing. But this can't really be done without the > original tracer implementing it, right? > > On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum wrote: > >> Oh wow, I didn't even consider that. I think you're right, I'll do more >> thinking about this. Thanks Anders! >> >> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovm?ller >> wrote: >> >>> Can't this be implemented today by a simple monkey patch of sys.settrace? >>> >>> On 25 Apr 2019, at 16:51, Ram Rachum wrote: >>> >>> Hi, >>> >>> Here's something I want in Python: Multiple levels of tracers working on >>> top of each other, instead of just one. >>> >>> I'm talking about the tracer that one can set by calling sys.settrace. >>> >>> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/ >>> >>> One of the difficulties I have, is that I can't debug or run the >>> `coverage` tool on the core of this module. That's because the core is a >>> trace function, and debuggers and coverage tools work by setting a trace >>> function. When PySnooper sets its trace function using `sys.settrace`, the >>> code that runs in that trace function runs without getting traced by the >>> coverage tracer. >>> >>> This means that people who develop debuggers and coverage tools can't >>> use a debugger or a coverage tool on the core of their tool. It's quite an >>> annoying problem. >>> >>> My proposed solution: Multiple levels of tracing, instead of just one. >>> When you install a tracer, you're not replacing the existing one, you're >>> appending a tracer to the existing list of tracers. >>> >>> If this was implemented, then when PySnooper would install its tracer, >>> the coverage tracer would still be active and running, for every line of >>> code including the ones in PySnooper's tracer. >>> >>> Obviously, we'll need to figure out the API and any other kind of >>> problems with this proposal. >>> >>> What do you think? >>> >>> >>> Thanks, >>> Ram. 
>>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> > _______________________________________________ > Python-ideas mailing listPython-ideas at python.orghttps://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From boxed at killingar.net Thu Apr 25 14:02:23 2019 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Thu, 25 Apr 2019 20:02:23 +0200 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> Message-ID: <6CB0A277-555F-4E77-AAAB-F69F4F93617F@killingar.net> Well, it would trigger the top level chaining trace function, but they should be able to decide when to call the sub-trace functions. Hmm... Maybe :) > On 25 Apr 2019, at 19:16, Ned Batchelder wrote: > > Perhaps I misunderstand what's implied by "simple(!) monkeypatch of sys.settrace", but the trickiest part of Ram's proposal is that the body of one trace function would still trigger the remaining trace functions. That to me sounds like it's going to require changes to ceval.c > > --Ned. > >> On 4/25/19 12:26 PM, Ram Rachum wrote: >> Hmm, looks like, for this to work, you'll need the existing tracer to be cooperative. Right now there are existing tracers, for example coverage's tracer and Wing IDE's tracer, and I would need to modify them for your idea to work, right? >> >> If I understand your idea correctly, the first tracer would monkeypatch `sys.settrace` so whenever someone else adds a tracer, it doesn't really do `sys.settrace` but just add a function that the real tracer would be calling after it's done tracing. But this can't really be done without the original tracer implementing it, right? >> >>> On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum wrote: >>> Oh wow, I didn't even consider that. I think you're right, I'll do more thinking about this. Thanks Anders! >>> >>>> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovm?ller wrote: >>>> Can't this be implemented today by a simple monkey patch of sys.settrace? >>>> >>>> On 25 Apr 2019, at 16:51, Ram Rachum wrote: >>>> >>>>> Hi, >>>>> >>>>> Here's something I want in Python: Multiple levels of tracers working on top of each other, instead of just one. >>>>> >>>>> I'm talking about the tracer that one can set by calling sys.settrace. >>>>> >>>>> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/ >>>>> >>>>> One of the difficulties I have, is that I can't debug or run the `coverage` tool on the core of this module. That's because the core is a trace function, and debuggers and coverage tools work by setting a trace function. When PySnooper sets its trace function using `sys.settrace`, the code that runs in that trace function runs without getting traced by the coverage tracer. >>>>> >>>>> This means that people who develop debuggers and coverage tools can't use a debugger or a coverage tool on the core of their tool. It's quite an annoying problem. 
>>>>> >>>>> My proposed solution: Multiple levels of tracing, instead of just one. When you install a tracer, you're not replacing the existing one, you're appending a tracer to the existing list of tracers. >>>>> >>>>> If this was implemented, then when PySnooper would install its tracer, the coverage tracer would still be active and running, for every line of code including the ones in PySnooper's tracer. >>>>> >>>>> Obviously, we'll need to figure out the API and any other kind of problems with this proposal. >>>>> >>>>> What do you think? >>>>> >>>>> >>>>> Thanks, >>>>> Ram. >>>>> >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Thu Apr 25 16:50:43 2019 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Thu, 25 Apr 2019 13:50:43 -0700 Subject: [Python-ideas] Using rightarrow "->" for typing annotation of functions In-Reply-To: References: Message-ID: TBH, I don't think it is so bad that it requires a new syntax. But I am not strongly against it either. What I would like to add here is that if we will go with the replacement: Callable[[X, Y], Z] becomes (X, Y) -> Z then we should also go with Union[X, Y] becomes X | Y Tuple[X, Y] becomes (X, Y) Optional[X] becomes X? (Intersection[X, Y] when added becomes X & Y) The current syntax (although really verbose) is consistent, so if we want to improve it I would like to keep consistency. Also if we are going forward with this, should we allow mixing old and new syntax? Will the old syntax be deprecated after we introduce the new one? -- Ivan On Wed, 24 Apr 2019 at 14:43, Guido van Rossum wrote: > Thanks for posting. I agree that Callable is ugly (even hideous :-), but > when we introduced type annotations in PEP 484, we didn't want to introduce > new syntax. The existing syntax (using -> in function headings) was > supported since Python 3.0. > > Since then we've introduced other new syntax (in particular PEP 526) so we > could indeed try adding something better for Callable. > > I think we should probably at least have parentheses around the arguments, > so you'd write > > f: (int) -> str > g: (int, str) -> float > > That looks elegant. > > But we should also try to support optional arguments and keyword arguments. > > Also, some groups of people would like to see a more concise notation for > lambdas, and maybe they'd want to write > > x = (a, b) -> a + b > > as sugar for > > x = lambda a, b: a + b > > We probably can't have both, so we should at least decide which is more > important. > > Too bad we can't use Unicode arrows. :-) > > On Wed, Apr 24, 2019 at 2:30 PM Vaibhav Karve > wrote: > >> (Note: This idea is about a particular static typecheking (typing?) >> annotation syntax). 
>> The idea is that currently the use of the "->" (right arrow) is >> restricted to only function definition annotation. Can we extent it to >> declaration of type for functions even outside their definitions? >> Example: >> >> Currently we write: >> f: Callable[[int, Dict[str, int]], str] # declaring the type of some >> fake function >> >> This would be much cleaner if we could write: >> f: int -> Dict[str, int] -> str # One of the possibilities >> >> or even: >> f: int, Dict[str, int] -> str # Another possibility >> >> I have no idea how this will affect the existing syntax (and if this will >> have any bad repercussions/notational misuse). I just thought it would be >> nicer to: >> a) Not have to spell out Callable >> b) Not have to use all those square brackets >> c) Have the same notation work for both the function annotation as well >> as for declaring the type. >> >> This is my first time posting an idea to python-ideas. So apologies if i >> am not following some conventions that i might not be aware of. >> Vaibhav Karve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > -- > --Guido van Rossum (python.org/~guido) > *Pronouns: he/him/his **(why is my pronoun here?)* > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Thu Apr 25 17:29:18 2019 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 25 Apr 2019 17:29:18 -0400 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: <6CB0A277-555F-4E77-AAAB-F69F4F93617F@killingar.net> References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> <6CB0A277-555F-4E77-AAAB-F69F4F93617F@killingar.net> Message-ID: <11e6c2a0-7c08-34a0-5303-191a64356d4f@nedbatchelder.com> It wouldn't be difficult to have a list of trace functions, so that every line of "real" Python executed, would invoke all the trace functions.? But Ram has asked for something more: when the first trace function is executing, its line should themselves be traced by the remaining trace functions in the list.? Presumably the lines in the second trace function should also be traced by the function third in the list, and so on.? This is the thing that will be difficult to accomplish. --Ned. On 4/25/19 2:02 PM, Anders Hovm?ller wrote: > Well, it would trigger the top level chaining trace function, but they > should be able to decide when to call the sub-trace functions. Hmm... > Maybe :) > > On 25 Apr 2019, at 19:16, Ned Batchelder > wrote: > >> Perhaps I misunderstand what's implied by "simple(!) monkeypatch of >> sys.settrace", but the trickiest part of Ram's proposal is that the >> body of one trace function would still trigger the remaining trace >> functions.? That to me sounds like it's going to require changes to >> ceval.c >> >> --Ned. >> >> On 4/25/19 12:26 PM, Ram Rachum wrote: >>> Hmm, looks like, for this to work, you'll need the existing tracer >>> to be cooperative. Right now there are existing tracers, for example >>> coverage's tracer and Wing IDE's tracer, and I would need to modify >>> them for your idea to work, right? 
>>> >>> If I understand your idea correctly, the first tracer would >>> monkeypatch `sys.settrace` so whenever someone else adds a tracer, >>> it doesn't really do `sys.settrace` but just add a function that the >>> real tracer would be calling after it's done tracing. But this can't >>> really be done without the original tracer implementing it, right? >>> >>> On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum >> > wrote: >>> >>> Oh wow, I didn't even consider that. I think you're right, I'll >>> do more thinking about this. Thanks Anders! >>> >>> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovm?ller >>> > wrote: >>> >>> Can't this be implemented today by a simple monkey patch of >>> sys.settrace? >>> >>> On 25 Apr 2019, at 16:51, Ram Rachum >> > wrote: >>> >>>> Hi, >>>> >>>> Here's something I want in Python: Multiple levels of >>>> tracers working on top of each other, instead of just one. >>>> >>>> I'm talking about the tracer that one can set by calling >>>> sys.settrace. >>>> >>>> I've recently released PySnooper: >>>> https://github.com/cool-RR/PySnooper/ >>>> >>>> One of the difficulties I have, is that I can't debug or >>>> run the `coverage` tool on the core of this module. That's >>>> because the core is a trace function, and debuggers and >>>> coverage tools work by setting a trace function. When >>>> PySnooper sets its trace function using `sys.settrace`, the >>>> code that runs in that trace function runs without getting >>>> traced by the coverage tracer. >>>> >>>> This means that people who develop debuggers and coverage >>>> tools can't use a debugger or a coverage tool on the core >>>> of their tool. It's quite an annoying problem. >>>> >>>> My proposed solution: Multiple levels of tracing, instead >>>> of just one. When you install a tracer, you're not >>>> replacing the existing one, you're appending a tracer to >>>> the existing list of tracers. >>>> >>>> If this was implemented, then when PySnooper would install >>>> its tracer, the coverage tracer would still be active and >>>> running, for every line of code including the ones in >>>> PySnooper's tracer. >>>> >>>> Obviously, we'll need to figure out the API and any other >>>> kind of problems with this proposal. >>>> >>>> What do you think? >>>> >>>> >>>> Thanks, >>>> Ram. >>>> >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct:http://python.org/psf/codeofconduct/ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.ed.oconnor at gmail.com Thu Apr 25 18:41:10 2019 From: peter.ed.oconnor at gmail.com (Peter O'Connor) Date: Thu, 25 Apr 2019 15:41:10 -0700 Subject: [Python-ideas] Proposal: "?" Documentation Operator and easy reference to argument types/defaults/docstrings Message-ID: Dear all, Despite the general beauty of Python, I find myself constantly violating the "don't repeat yourself" maxim when trying to write clear, fully documented code. 
Take the following example: def func_1(a: int = 1, b: float = 2.5) -> float: """ Something about func_1 :param a: Something about param a :param b: Something else about param b :return: Something about return value of func_1 """ return a*b def func_2(c:float=3.4, d: bool =True) -> float: """ Something about func_2 :param c: Something about param c :param d: Something else about param d :return: Something about return value """ return c if d else -c def main_function(a: int = 1, b: float = 2.5, d: bool = True) -> float: """ Something about main_function :param a: Something about param a :param b: Something else about param b :param d: Something else about param d :return: Something about return value """ return func_2(func_1(a=a, b=b), d=d) Which has the following problems: - Defaults are defined in multiple places, which very easily leads to bugs (I'm aware of **kwargs but it obfuscates function interfaces and usually does more harm than good) - Types are defined in multiple places - Documentation is copy-pasted when referencing a single thing from different places. (I can't count the number of types I've written ":param img: A (size_y, size_x, 3) RGB image" - I could now just reference a single RGB_IMAGE_DOC variable) - Argument names need to be written twice - in the header and documentation - and it's up to the user / IDE to make sure they stay in sync. I propose to resolve this with the following changes: - Argument/return documentation can be made inline with a new "?" operator. Documentation becomes a first class citizen. - Argument (type/default/doc) can be referenced by "func.args..type" / "func.args..default" / "func.args..doc". Positional reference: e.g. "func.args[1].default" also allowed. If not specified, they take a special, built-in "Undefined" value (because None may have another meaning for defaults). Return type/doc can be referenced with "func.return.type" / "func.return.doc". This would result in the following syntax: def func_1( a: int = 1 ? 'Something about param a', b: float = 2.5 ? 'Something else about param b', ) -> float ? 'Something about return value of func_1': """ Something about func_1 """ return a*b def func_2( c: float=3.4 ? 'Something about param c', d: bool =True ? 'Something else about param d', ) -> float ? 'Something about return value': """ Something about func_2 """ return c if d else -c def main_function( a: func_1.args.a.type = func_1.args.a.default ? func_1.args.a.doc, b: func_1.args.b.type = func_1.args.b.default ? func_1.args.b.doc, d: func_2.args.d.type = func_2.args.d.default ? func_2.args.d.doc, ) -> func_2.return.type ? func2.return.doc: """ Something about main_function """ return func_2(func_1(a=a, b=b), d=d) If the main_function header seems repetitious (it does) we could allow for an optional shorthand notation like: def main_function( a :=? func_1.args.a, b :=? func_1.args.b, d :=? func_2.args.d, ) ->? func_2.return: """ Something about main_function """ return func_2(func_1(a=a, b=b), d=d) Where "a :=? func_1.args.a" means "argument 'a' takes the same type/default/documentation as argument 'a' of func_1". So what do you say? Yes it's a bold move, but I think in the long term it's badly needed. Perhaps something similar has been proposed already that I'm not aware of. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Apr 25 18:57:03 2019 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 26 Apr 2019 08:57:03 +1000 Subject: [Python-ideas] Proposal: "?" 
Documentation Operator and easy reference to argument types/defaults/docstrings In-Reply-To: References: Message-ID: On Fri, Apr 26, 2019 at 8:42 AM Peter O'Connor wrote: > > Dear all, > > Despite the general beauty of Python, I find myself constantly violating the "don't repeat yourself" maxim when trying to write clear, fully documented code. Take the following example: > > def func_1(a: int = 1, b: float = 2.5) -> float: > """ > Something about func_1 > :param a: Something about param a > :param b: Something else about param b > :return: Something about return value of func_1 > """ > return a*b > > def func_2(c:float=3.4, d: bool =True) -> float: > """ > Something about func_2 > :param c: Something about param c > :param d: Something else about param d > :return: Something about return value > """ > return c if d else -c > > def main_function(a: int = 1, b: float = 2.5, d: bool = True) -> float: > """ > Something about main_function > :param a: Something about param a > :param b: Something else about param b > :param d: Something else about param d > :return: Something about return value > """ > return func_2(func_1(a=a, b=b), d=d) > > Which has the following problems: > - Defaults are defined in multiple places, which very easily leads to bugs (I'm aware of **kwargs but it obfuscates function interfaces and usually does more harm than good) > I'd actually rather explore fixing this problem than the other. We have functools.wraps() for the case where you do nothing other than pass through *a,**kw, but when you want to add or remove an argument, I don't think there's an easy way to say "that function's signature, but with these changes". That way, you aren't obfuscating the interface (since the called function's signature is incorporated into the wrapper's), and you're not duplicating defaults or anything. It shouldn't need to be all that complicated to use (although I'm sure it'll be complicated to implement). Something like: @functools.passes_args(f) def wrapper(spam, ham, *a, **kw): f(*a, **kw) There would need to be parameters to indicate the addition of parameters, but it could detect the removal (which is common for wrappers) just from the function's own signature. If that were implemented, would it remove the need for this new syntax you propose? ChrisA From robertve92 at gmail.com Thu Apr 25 18:59:45 2019 From: robertve92 at gmail.com (Robert Vanden Eynde) Date: Fri, 26 Apr 2019 00:59:45 +0200 Subject: [Python-ideas] Proposal: "?" Documentation Operator and easy reference to argument types/defaults/docstrings In-Reply-To: References: Message-ID: Looks like a more complicated way to say : def f(x:'int : which does stuff' = 5, y:'int : which does more stuffs') The code reading the annotations (like the linter) might then parse it simply using .split. robertvandeneynde.be Le ven. 26 avr. 2019 ? 00:41, Peter O'Connor a ?crit : > Dear all, > > Despite the general beauty of Python, I find myself constantly violating > the "don't repeat yourself" maxim when trying to write clear, fully > documented code. 
Take the following example: > > def func_1(a: int = 1, b: float = 2.5) -> float: > """ > Something about func_1 > :param a: Something about param a > :param b: Something else about param b > :return: Something about return value of func_1 > """ > return a*b > > def func_2(c:float=3.4, d: bool =True) -> float: > """ > Something about func_2 > :param c: Something about param c > :param d: Something else about param d > :return: Something about return value > """ > return c if d else -c > > def main_function(a: int = 1, b: float = 2.5, d: bool = True) -> float: > """ > Something about main_function > :param a: Something about param a > :param b: Something else about param b > :param d: Something else about param d > :return: Something about return value > """ > return func_2(func_1(a=a, b=b), d=d) > > Which has the following problems: > - Defaults are defined in multiple places, which very easily leads to bugs > (I'm aware of **kwargs but it obfuscates function interfaces and usually > does more harm than good) > - Types are defined in multiple places > - Documentation is copy-pasted when referencing a single thing from > different places. (I can't count the number of types I've written ":param > img: A (size_y, size_x, 3) RGB image" - I could now just reference a single > RGB_IMAGE_DOC variable) > - Argument names need to be written twice - in the header and > documentation - and it's up to the user / IDE to make sure they stay in > sync. > > I propose to resolve this with the following changes: > - Argument/return documentation can be made inline with a new "?" > operator. Documentation becomes a first class citizen. > - Argument (type/default/doc) can be referenced by "func.args..type" > / "func.args..default" / "func.args..doc". > Positional reference: e.g. "func.args[1].default" also allowed. If not > specified, they take a special, built-in "Undefined" value (because None > may have another meaning for defaults). Return type/doc can be referenced > with "func.return.type" / "func.return.doc". > > This would result in the following syntax: > > def func_1( > a: int = 1 ? 'Something about param a', > b: float = 2.5 ? 'Something else about param b', > ) -> float ? 'Something about return value of func_1': > """ Something about func_1 """ > return a*b > > def func_2( > c: float=3.4 ? 'Something about param c', > d: bool =True ? 'Something else about param d', > ) -> float ? 'Something about return value': > """ Something about func_2 """ > return c if d else -c > > def main_function( > a: func_1.args.a.type = func_1.args.a.default ? > func_1.args.a.doc, > b: func_1.args.b.type = func_1.args.b.default ? > func_1.args.b.doc, > d: func_2.args.d.type = func_2.args.d.default ? > func_2.args.d.doc, > ) -> func_2.return.type ? func2.return.doc: > """ Something about main_function """ > return func_2(func_1(a=a, b=b), d=d) > > If the main_function header seems repetitious (it does) we could allow for > an optional shorthand notation like: > > def main_function( > a :=? func_1.args.a, > b :=? func_1.args.b, > d :=? func_2.args.d, > ) ->? func_2.return: > """ Something about main_function """ > return func_2(func_1(a=a, b=b), d=d) > > Where "a :=? func_1.args.a" means "argument 'a' takes the same > type/default/documentation as argument 'a' of func_1". > > So what do you say? Yes it's a bold move, but I think in the long term > it's badly needed. Perhaps something similar has been proposed already > that I'm not aware of. 
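For what it's worth, a rough sketch of the passes_args decorator Chris describes above. functools has no such helper today, so everything here is hypothetical and simplified (default ordering, keyword-only parameters and similar corner cases are glossed over):

import functools
import inspect

def passes_args(f):
    def decorator(wrapper):
        # Start from the wrapper's own parameters (minus *a/**kw) and
        # append f's parameters, so tools see the combined interface.
        own = [p for p in inspect.signature(wrapper).parameters.values()
               if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)]
        names = {p.name for p in own}
        passed = [p for p in inspect.signature(f).parameters.values()
                  if p.name not in names]
        functools.update_wrapper(wrapper, f)
        wrapper.__signature__ = inspect.Signature(own + passed)
        return wrapper
    return decorator

def f(x, y=1):
    return x + y

@passes_args(f)
def wrapper(spam, ham, *a, **kw):
    return f(*a, **kw)

print(inspect.signature(wrapper))  # (spam, ham, x, y=1)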
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Apr 25 19:14:32 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 26 Apr 2019 11:14:32 +1200 Subject: [Python-ideas] Open parenthesis in REPL completion In-Reply-To: <20190425083319.GV3010@ando.pearwood.info> References: <20190425083319.GV3010@ando.pearwood.info> Message-ID: <5CC23F58.3060205@canterbury.ac.nz> Steven D'Aprano wrote: > But having the opening bracket auto-added is a small satisfaction, and > if you're using the REPL for actual calculations and not just help(), > the benefit probably outweighs the annoyance The completer could detect the help( as well and leave out the opening paren in that case. -- Greg From greg.ewing at canterbury.ac.nz Thu Apr 25 19:12:45 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 26 Apr 2019 11:12:45 +1200 Subject: [Python-ideas] Using rightarrow "->" for typing annotation of functions In-Reply-To: References: Message-ID: <5CC23EED.3050808@canterbury.ac.nz> Guido van Rossum wrote: > Also, some groups of people would like to see a more concise notation > for lambdas, and maybe they'd want to write > > x = (a, b) -> a + b > > We probably can't have both, I think we could if we wanted to. In an annotation, -> could be treated as sugar for Callable[], and in other expressions as sugar for lambda. -- Greg From greg.ewing at canterbury.ac.nz Thu Apr 25 19:12:51 2019 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 26 Apr 2019 11:12:51 +1200 Subject: [Python-ideas] What are the strong use cases for str.rindex()? In-Reply-To: <20190425000742.GP3010@ando.pearwood.info> References: <20190425000742.GP3010@ando.pearwood.info> Message-ID: <5CC23EF3.9010204@canterbury.ac.nz> Steven D'Aprano wrote: > I too often forget that reverse() returns an iterator, That seems like a mistake. Shouldn't it return a view? -- Greg From mertz at gnosis.cx Thu Apr 25 20:50:33 2019 From: mertz at gnosis.cx (David Mertz) Date: Thu, 25 Apr 2019 20:50:33 -0400 Subject: [Python-ideas] Proposal: "?" Documentation Operator and easy reference to argument types/defaults/docstrings In-Reply-To: References: Message-ID: On Thu, Apr 25, 2019 at 6:42 PM Peter O'Connor wrote: > Despite the general beauty of Python, I find myself constantly violating > the "don't repeat yourself" maxim when trying to write clear, fully > documented code. Take the following example: > You do know that OPTIONAL type annotations are optional, right. There's no requirement to repeat yourself if you don't want to. But in general, comments or docstrings can do something very different from just annotate one variable at a time. If I want to describe the interaction of `a` and `b`, that simply cannot fit in an annotation/comment per parameter. Of if one argument switches the relevance or meaning of another, etc. > - Argument/return documentation can be made inline with a new "?" > operator. Documentation becomes a first class citizen. > We already have a first-class citizen in annotations, this seems like extra burden for little reason. > def func_1( > a: int = 1 ? 'Something about param a', > b: float = 2.5 ? 'Something else about param b', > ) -> float ? 
'Something about return value of func_1': > """ Something about func_1 """ > return a*b > Why not just this in existing Python: def func_1( a: int = 1 # 'Something about param a', b: float = 2.5 # 'Something else about param b', ) -> float: """Something about func_1 a and b interact in this interesting way. a should be in range 0 < a < 125 floor(b) should be a prime number Something about return value of func_1 returns a multiplication """ return a*b -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Thu Apr 25 23:36:27 2019 From: tsyu80 at gmail.com (Tony Yu) Date: Thu, 25 Apr 2019 22:36:27 -0500 Subject: [Python-ideas] Using rightarrow "->" for typing annotation of functions In-Reply-To: References: Message-ID: Ivan Levkivskyi wrote: > Callable[[X, Y], Z] becomes (X, Y) -> Z > Union[X, Y] becomes X | Y > Tuple[X, Y] becomes (X, Y) > Optional[X] becomes X? > (Intersection[X, Y] when added becomes X & Y) > I really like this idea, but I could also see this getting a bit confusing because the syntax preceding type annotations (e.g. ":") is so lightweight/subtle---it might be difficult distinguish were annotations end and subsequent code begins. Defining some sort of wrapper function might help here; e.g.: Callable[[X, Y], Z] becomes EzType[(X, Y) => Z] Union[X, Y] becomes EzType[X | Y] Tuple[X, Y] becomes EzType[X, Y] Optional[X] becomes EzType[X?] ... The name EzType is certainly not ideal, but you get the basic idea. --- Tony On Thu, Apr 25, 2019 at 3:51 PM Ivan Levkivskyi wrote: > TBH, I don't think it is so bad that it requires a new syntax. But I am > not strongly against it either. What I would like to add here is that if we > will go with the replacement: > > Callable[[X, Y], Z] becomes (X, Y) -> Z > > then we should also go with > > Union[X, Y] becomes X | Y > Tuple[X, Y] becomes (X, Y) > Optional[X] becomes X? > (Intersection[X, Y] when added becomes X & Y) > > The current syntax (although really verbose) is consistent, so if we want > to improve it I would like to keep consistency. > Also if we are going forward with this, should we allow mixing old and new > syntax? Will the old syntax be deprecated after we introduce the new one? > > -- > Ivan > > > > On Wed, 24 Apr 2019 at 14:43, Guido van Rossum wrote: > >> Thanks for posting. I agree that Callable is ugly (even hideous :-), but >> when we introduced type annotations in PEP 484, we didn't want to introduce >> new syntax. The existing syntax (using -> in function headings) was >> supported since Python 3.0. >> >> Since then we've introduced other new syntax (in particular PEP 526) so >> we could indeed try adding something better for Callable. >> >> I think we should probably at least have parentheses around the >> arguments, so you'd write >> >> f: (int) -> str >> g: (int, str) -> float >> >> That looks elegant. >> >> But we should also try to support optional arguments and keyword >> arguments. >> >> Also, some groups of people would like to see a more concise notation for >> lambdas, and maybe they'd want to write >> >> x = (a, b) -> a + b >> >> as sugar for >> >> x = lambda a, b: a + b >> >> We probably can't have both, so we should at least decide which is more >> important. 
>> >> Too bad we can't use Unicode arrows. :-) >> >> On Wed, Apr 24, 2019 at 2:30 PM Vaibhav Karve >> wrote: >> >>> (Note: This idea is about a particular static typecheking (typing?) >>> annotation syntax). >>> The idea is that currently the use of the "->" (right arrow) is >>> restricted to only function definition annotation. Can we extent it to >>> declaration of type for functions even outside their definitions? >>> Example: >>> >>> Currently we write: >>> f: Callable[[int, Dict[str, int]], str] # declaring the type of >>> some fake function >>> >>> This would be much cleaner if we could write: >>> f: int -> Dict[str, int] -> str # One of the possibilities >>> >>> or even: >>> f: int, Dict[str, int] -> str # Another possibility >>> >>> I have no idea how this will affect the existing syntax (and if this >>> will have any bad repercussions/notational misuse). I just thought it would >>> be nicer to: >>> a) Not have to spell out Callable >>> b) Not have to use all those square brackets >>> c) Have the same notation work for both the function annotation as well >>> as for declaring the type. >>> >>> This is my first time posting an idea to python-ideas. So apologies if i >>> am not following some conventions that i might not be aware of. >>> Vaibhav Karve >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> *Pronouns: he/him/his **(why is my pronoun here?)* >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Thu Apr 25 23:47:03 2019 From: ram at rachum.com (Ram Rachum) Date: Fri, 26 Apr 2019 06:47:03 +0300 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: <11e6c2a0-7c08-34a0-5303-191a64356d4f@nedbatchelder.com> References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> <6CB0A277-555F-4E77-AAAB-F69F4F93617F@killingar.net> <11e6c2a0-7c08-34a0-5303-191a64356d4f@nedbatchelder.com> Message-ID: Ah, I thought about it now and Ned is right. This would require modifications to ceval.c and others. The question is... Does anyone else think it's a good idea? On Fri, Apr 26, 2019 at 12:31 AM Ned Batchelder wrote: > It wouldn't be difficult to have a list of trace functions, so that every > line of "real" Python executed, would invoke all the trace functions. But > Ram has asked for something more: when the first trace function is > executing, its line should themselves be traced by the remaining trace > functions in the list. Presumably the lines in the second trace function > should also be traced by the function third in the list, and so on. This > is the thing that will be difficult to accomplish. > > --Ned. > On 4/25/19 2:02 PM, Anders Hovm?ller wrote: > > Well, it would trigger the top level chaining trace function, but they > should be able to decide when to call the sub-trace functions. Hmm... 
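For readers keeping score, the shorthands under discussion map onto today's typing spellings as follows; the arrow and question-mark forms on the right are proposals, not valid syntax:

from typing import Callable, Optional, Tuple, Union

Handler = Callable[[int, str], float]  # proposed: (int, str) -> float
Pair = Tuple[int, str]                 # proposed: (int, str)
IntOrStr = Union[int, str]             # proposed: int | str
MaybeInt = Optional[int]               # proposed: int?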
Maybe > :) > > On 25 Apr 2019, at 19:16, Ned Batchelder wrote: > > Perhaps I misunderstand what's implied by "simple(!) monkeypatch of > sys.settrace", but the trickiest part of Ram's proposal is that the body of > one trace function would still trigger the remaining trace functions. That > to me sounds like it's going to require changes to ceval.c > > --Ned. > On 4/25/19 12:26 PM, Ram Rachum wrote: > > Hmm, looks like, for this to work, you'll need the existing tracer to be > cooperative. Right now there are existing tracers, for example coverage's > tracer and Wing IDE's tracer, and I would need to modify them for your idea > to work, right? > > If I understand your idea correctly, the first tracer would monkeypatch > `sys.settrace` so whenever someone else adds a tracer, it doesn't really do > `sys.settrace` but just add a function that the real tracer would be > calling after it's done tracing. But this can't really be done without the > original tracer implementing it, right? > > On Thu, Apr 25, 2019 at 6:16 PM Ram Rachum wrote: > >> Oh wow, I didn't even consider that. I think you're right, I'll do more >> thinking about this. Thanks Anders! >> >> On Thu, Apr 25, 2019 at 6:10 PM Anders Hovm?ller >> wrote: >> >>> Can't this be implemented today by a simple monkey patch of sys.settrace? >>> >>> On 25 Apr 2019, at 16:51, Ram Rachum wrote: >>> >>> Hi, >>> >>> Here's something I want in Python: Multiple levels of tracers working on >>> top of each other, instead of just one. >>> >>> I'm talking about the tracer that one can set by calling sys.settrace. >>> >>> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/ >>> >>> One of the difficulties I have, is that I can't debug or run the >>> `coverage` tool on the core of this module. That's because the core is a >>> trace function, and debuggers and coverage tools work by setting a trace >>> function. When PySnooper sets its trace function using `sys.settrace`, the >>> code that runs in that trace function runs without getting traced by the >>> coverage tracer. >>> >>> This means that people who develop debuggers and coverage tools can't >>> use a debugger or a coverage tool on the core of their tool. It's quite an >>> annoying problem. >>> >>> My proposed solution: Multiple levels of tracing, instead of just one. >>> When you install a tracer, you're not replacing the existing one, you're >>> appending a tracer to the existing list of tracers. >>> >>> If this was implemented, then when PySnooper would install its tracer, >>> the coverage tracer would still be active and running, for every line of >>> code including the ones in PySnooper's tracer. >>> >>> Obviously, we'll need to figure out the API and any other kind of >>> problems with this proposal. >>> >>> What do you think? >>> >>> >>> Thanks, >>> Ram. 
>>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From boxed at killingar.net Fri Apr 26 00:04:30 2019 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Fri, 26 Apr 2019 06:04:30 +0200 Subject: [Python-ideas] Proposal: "?" Documentation Operator and easy reference to argument types/defaults/docstrings In-Reply-To: References: Message-ID: <0D0D1FF5-6E8F-4EA9-87B3-AC840BC83A8D@killingar.net> > On 26 Apr 2019, at 00:41, Peter O'Connor wrote: > > - Defaults are defined in multiple places, which very easily leads to bugs (I'm aware of **kwargs but it obfuscates function interfaces and usually does more harm than good) > - Types are defined in multiple places > - Documentation is copy-pasted when referencing a single thing from different places. (I can't count the number of types I've written ":param img: A (size_y, size_x, 3) RGB image" - I could now just reference a single RGB_IMAGE_DOC variable) > - Argument names need to be written twice - in the header and documentation - and it's up to the user / IDE to make sure they stay in sync. We have this exact problem in many places in tri.form, tri.query, tri.table and the code bases that use them. I would really like a solution to these! But you don't seem to address these problems at all in the rest of your email, which makes me confused. In general I think what we want is an agreed upon way to specify argument names, counts and defaults for use by static analysis tools, documentation generation tools and IDEs, in a programmatic way. This could solve the problems you reference above, and also the issue of how to supply auto complete for something like Django's query language (where you can do SomeTable.objects.filter(foreignkey__anotherforeignkey__value=3) which is great!). Maybe something like...

def foo(**kwargs):
    @signature_by: full.module.path.to.a.signature_function(pass_kwargs_to=bar, hardcoded=['quux'])
    return bar(**kwargs, quux=3)

def signature_function(f, pass_kwargs_to=None, hardcoded=None, **_):
    signature = inspect.signature(f)
    if pass_kwargs_to is not None:
        signature_nested = inspect.signature(pass_kwargs_to)
        signature.remove_kwargs()
        signature = signature.merge(signature_nested)
    if hardcoded is not None:
        for h in hardcoded:
            signature.parameters.remove(h)
    return signature

Some of the above is pseudo code obviously. What do you think?

/ Anders -------------- next part -------------- An HTML attachment was scrubbed...
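Anders' pseudocode above can be approximated with the real inspect API. Signature objects are immutable, so instead of remove_kwargs()/merge() the parameter list is rebuilt; merge_signatures below stands in for the hypothetical signature_function:

import inspect

def merge_signatures(f, pass_kwargs_to=None, hardcoded=()):
    # f's parameters minus **kwargs, plus the target's parameters,
    # minus any that the wrapper hardcodes itself.
    params = [p for p in inspect.signature(f).parameters.values()
              if p.kind is not p.VAR_KEYWORD]
    seen = {p.name for p in params} | set(hardcoded)
    if pass_kwargs_to is not None:
        params += [p for p in inspect.signature(pass_kwargs_to).parameters.values()
                   if p.name not in seen]
    return inspect.Signature(params)

def bar(x, y=1, quux=None):
    return (x, y, quux)

def foo(**kwargs):
    return bar(**kwargs, quux=3)

foo.__signature__ = merge_signatures(foo, pass_kwargs_to=bar,
                                     hardcoded=['quux'])
print(inspect.signature(foo))  # (x, y=1)

With __signature__ set this way, help(), inspect and many IDE completers report the merged interface instead of the opaque (**kwargs).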
URL: From shoyer at gmail.com Fri Apr 26 00:20:10 2019 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 25 Apr 2019 21:20:10 -0700 Subject: [Python-ideas] Using rightarrow "->" for typing annotation of functions In-Reply-To: References: Message-ID: On Thu, Apr 25, 2019 at 1:51 PM Ivan Levkivskyi wrote: > TBH, I don't think it is so bad that it requires a new syntax. But I am > not strongly against it either. What I would like to add here is that if we > will go with the replacement: > > Callable[[X, Y], Z] becomes (X, Y) -> Z > > then we should also go with > > Union[X, Y] becomes X | Y > Tuple[X, Y] becomes (X, Y) > This may not be workable, because A[X, Y] and A[(X, Y)] have identical semantics in Python. So Tuple[(X, Y)] could either mean Tuple[X, Y] or Tuple[Tuple[X, Y]]. -------------- next part -------------- An HTML attachment was scrubbed... URL: From boxed at killingar.net Fri Apr 26 00:13:00 2019 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Fri, 26 Apr 2019 06:13:00 +0200 Subject: [Python-ideas] Idea: Allow multiple levels of tracers In-Reply-To: References: <86DD85E1-A956-41FA-AB0B-D729F8CECA73@killingar.net> <6CB0A277-555F-4E77-AAAB-F69F4F93617F@killingar.net> <11e6c2a0-7c08-34a0-5303-191a64356d4f@nedbatchelder.com> Message-ID: <09CA5C7A-E8B3-470E-B96B-1F1C0D6129DB@killingar.net> > On 26 Apr 2019, at 05:47, Ram Rachum wrote: > > Ah, I thought about it now and Ned is right. This would require modifications to ceval.c and others. Pity! > The question is... Does anyone else think it's a good idea? I do. It seems to me that coverage is a very useful tool that shouldn?t be unusable for certain programs if we can avoid it. If we should include it in CPython in the end probably depends on how much it complicates the implementation and/or how solid tests are written obviously. I would point out that we can still get another coverage metric for these scenarios though: mutation coverage. But that?s extremely slow to collect compared to traditional coverage. / Anders From tjreedy at udel.edu Fri Apr 26 00:17:57 2019 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 26 Apr 2019 00:17:57 -0400 Subject: [Python-ideas] What are the strong use cases for str.rindex()? In-Reply-To: <5CC23EF3.9010204@canterbury.ac.nz> References: <20190425000742.GP3010@ando.pearwood.info> <5CC23EF3.9010204@canterbury.ac.nz> Message-ID: On 4/25/2019 7:12 PM, Greg Ewing wrote: > Steven D'Aprano wrote: >> I too often forget that reverse() returns an iterator, I presume you mean reversed(). list.reverse() is a list > That seems like a mistake. Shouldn't it return a view? RL = reversed(somelist) is already partly view-like. The nth next call returns the nth item at the time of the next call, rather than at the time of the reversed call. However, the number of items produced by next calls is the length of the list at the time of the reversed call. The first next(RL) is the current somelist[captured_length]. >>> somelist = [1,2,3] >>> RL = reversed(somelist) >>> somelist[-1] = None >>> somelist.append(4) >>> list(RL) [None, 2, 1] -- Terry Jan Reedy From steve at pearwood.info Fri Apr 26 00:34:29 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Apr 2019 14:34:29 +1000 Subject: [Python-ideas] What are the strong use cases for str.rindex()? 
In-Reply-To: <5CC23EF3.9010204@canterbury.ac.nz> References: <20190425000742.GP3010@ando.pearwood.info> <5CC23EF3.9010204@canterbury.ac.nz> Message-ID: <20190426043429.GX3010@ando.pearwood.info> On Fri, Apr 26, 2019 at 11:12:51AM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >I too often forget that reverse() returns an iterator, > > That seems like a mistake. Shouldn't it return a view? I don't know what it "should" or "shouldn't" return, but it actually does return an iterator:

py> L = [1, 2, 3]
py> R = reversed(L)
py> hasattr(R, '__iter__') and iter(R) is R
True

-- Steven From steve at pearwood.info Fri Apr 26 00:56:08 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Apr 2019 14:56:08 +1000 Subject: [Python-ideas] What are the strong use cases for str.rindex()? In-Reply-To: References: <20190425000742.GP3010@ando.pearwood.info> <5CC23EF3.9010204@canterbury.ac.nz> Message-ID: <20190426045608.GY3010@ando.pearwood.info> On Fri, Apr 26, 2019 at 12:17:57AM -0400, Terry Reedy wrote: > On 4/25/2019 7:12 PM, Greg Ewing wrote: > >Steven D'Aprano wrote: > >>I too often forget that reverse() returns an iterator, > > I presume you mean reversed(). list.reverse() is a list Yes, I meant reversed(), not list.reverse() which is an in-place mutator method and returns None. > >That seems like a mistake. Shouldn't it return a view? > > RL = reversed(somelist) is already partly view-like. The nth next call > returns the nth item at the time of the next call, rather than at the > time of the reversed call. However, the number of items produced by > next calls is the length of the list at the time of the reversed call. That's not quite correct:

py> L = [1, 2, 3]
py> R = reversed(L)
py> L.clear()
py> next(R)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

It seems that: - in-place modifications in the list are reflected in the items yielded (so reversed() doesn't make a copy of the list); - operations which extend the length of the list don't show up in the reversed version; - and operations which decrease the length of the list decrease the number of items yielded. That suggests to me an implementation similar to:

# untested
def reversed(alist):
    N = len(alist)
    for i in range(N-1, -1, -1):
        try:
            yield alist[i]
        except IndexError:
            break
    # note: an explicit "raise StopIteration" here would be turned into
    # a RuntimeError under PEP 479; falling off the end is enough

which I suppose is close to what you meant here: > The first next(RL) is the current somelist[captured_length]. -- Steven From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Apr 26 05:18:30 2019 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 26 Apr 2019 18:18:30 +0900 Subject: [Python-ideas] Proposal: "?" Documentation Operator and easy reference to argument types/defaults/docstrings In-Reply-To: References: Message-ID: <23746.52454.598466.518152@turnbull.sk.tsukuba.ac.jp> David Mertz writes: > Why not just this in existing Python: > > def func_1( If I understand the OP's POV, one reason is that the following comments are not available to help() in existing Python. (From the Picky-Picky-Picky Dept: It's a syntax error since a and b are not separated by the comma in the comment!) Did you mean "use existing Python syntax and have future interpreters use the comments"?
I tend to differ, because they are (in principle) checkable, though only at runtime. See (2) below. > a should be in range 0 < a < 125 > floor(b) should be a prime number What I would rather see is (1) Comment syntax "inside" (fvo "inside" including any comment after the colon but before docstring or other code) the signature is interpreted as parameter documentation strings, which are available to help(). This is backward compatible; older Pythons will ignore them. The only problem would be if programs introspect their own docstrings and change behavior based on the result, which is so perverse that I'm happy to go with "if you do that, you deserve to suffer." (2) asserts involving parameters lexically are available to help(). I equate "assert" to your "should" since both have indeterminate semantics if violated at runtime. Of course this is subject to the usual caveats about use of assert to check user input at runtime! Perhaps argument-checking code not marked by an assert could be marked in some other way? So, after def featherfall( # not for use with parrots, dead or alive a: int = 1, # number of swallows in flock b: float = 2.5 # mean feather density of swallows ) -> float: # volume of feathers released when a flock # of swallows is stooped on by an F-35 """ The interaction of flock size and feather density may not be fully captured by multiplication. Also reports F-35's friend-or-foe ID to NORAD. Unimplemented: generalization to differential feather densities between African and European swallows and multiple disjoint flocks. Implemented for DoD contract #DEADBEEF666. """ assert 0 < a < 255 # not essential that these are assert prime(floor(b)) # the first code in the function c = a & floor(b) assert c != 0 # NOT documented in help() return a*b help() could produce featherfall (not for use with parrots, dead or alive) Parameters: a, an int, represents number of swallows in flock. Defaults to 1 and satisfies 0 < a < 255. b, a float, represents mean feather density of swallows. Defaults to 2.5 and satisfies prime(floor(b)). Returns: A float, whose value is the volume of feathers released when a flock of swallows is stooped on by an F-35. The interaction of flock size and feather density may not be fully captured by multiplication. Also reports F-35's friend-or-foe ID to NORAD. Unimplemented: generalization to differential feather densities between African and European swallows and multiple disjoint flocks. Implemented for DoD contract #DEADBEEF666. I'm not sure what to do with asserts involving multiple parameters. They could be repeated for each parameter involved, or put in a separate set of "Satisfies" conditions at the bottom of the Parameters: section. The main objection I see to the whole idea (and it may kill it) is that the comments aren't syntactically code, so that the formatting matters. (I'm also unsure whether they fall afoul of Guido's "No pragmas" dictat.) You could satisfy "must be code" with def featherfall( a: int = (1, 'number of swallows in flock')[0], b: float = (2.5, 'mean feather density of swallows')[0] ) -> float: # volume of feathers released when a flock # of swallows is stooped on by an F-35 but I think that falls into the "too ugly to live" category. NB: Any comment on the signature would be attached as documentation to the return value. 
It would also be possible to attach each comment in the signature to the most recently read component (function name, parameter, return type) so that def featherfall( # not for use with parrots, dead or alive a: int = 1, b: float = 2.5 # mean feather density of swallows ) -> float: would attach documentation to the function name and the parameter b, but not to the parameter a and the return type. This is well-defined, but maybe too pedantic to survive? The help text is of course infinitely bikesheddable. I suppose some people would prefer more backward compatibility. Ie, instead of the function name with comment as header, the header would be the signature. What that would look like is left as an exercise for the reader, but I would expect that the signature in the help output doesn't include the comment data. Personally, I don't really understand the "fails DRY" argument. If it's in the signature already, don't put it in the docstring. (If the project uses stub files, I guess this would require the ordinary compiler to look for them which might or might not be an acceptable tradeoff if it doesn't do that already.) Steve From j.marshall at arroyo.io Fri Apr 26 11:03:24 2019 From: j.marshall at arroyo.io (Joshua Marshall) Date: Fri, 26 Apr 2019 11:03:24 -0400 Subject: [Python-ideas] Syntax to conditionally define a field in a dict Message-ID: Hello all, I have a use case where I need to send a `dict` to a module as an argument. Inside of this, it has a multi-level structure, but each field I need to set may only be set to a single value. Fields must be valid, non-empty strings. It looks a lot like the following in my code: ``` def my_func(val_1, val_2): return { "field_1": val_1, "next_depth": { "field_2": val_2 } } ``` What I want to do is: ``` def my_func(val_1, val_2): return { "field_1": val_1 if val_1, "next_depth": { "field_2": val_2 if val_2 } } ``` Or: ``` def my_func(val_1, val_2): return { if val_1 : "field_1": val_1, "next_depth": { if val_2: "field_2": val_2 } } ``` Where each conditional in this context functions as: ``` if value: d["your_key"] = value ``` for each conditionally added and set key. >From the slack channel #learning_python, there are a number of more general points which need to be handled. The more core syntax, which should be valid throughout the language, would be to have statements like `x = y if cond` and `x[y if cond]`. The first of these intuitively reorganizes to `if cond: x = y`, but the second is not as clear, with a likely equivalent of `if cond: x[y] else raise Exception`. Thanks to Tom Forbes and Jim Kelly for helping critique the idea thus far. -- Please be advised that this email may contain confidential information. If you are not the intended recipient, please notify us by email by replying to the sender and delete this message. The sender disclaims that the content of this email constitutes an offer to enter into, or the acceptance of, any agreement; provided that the foregoing does not invalidate the binding effect of any digital or other electronic reproduction of a manual signature that is included in any attachment. ?? ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfine2358 at gmail.com Fri Apr 26 11:47:40 2019 From: jfine2358 at gmail.com (Jonathan Fine) Date: Fri, 26 Apr 2019 16:47:40 +0100 Subject: [Python-ideas] Syntax to conditionally define a field in a dict In-Reply-To: References: Message-ID: Hi Joshua Sounds to me that you want a solution soon, rather than in a future version of Python. 
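For comparison, the falsey-means-omit behaviour can already be written with existing syntax by filtering a dict literal. A sketch that also drops an empty "next_depth":

def my_func(val_1, val_2):
    inner = {k: v for k, v in {"field_2": val_2}.items() if v}
    outer = {"field_1": val_1, "next_depth": inner}
    return {k: v for k, v in outer.items() if v}

print(my_func("a", "b"))    # {'field_1': 'a', 'next_depth': {'field_2': 'b'}}
print(my_func("a", None))   # {'field_1': 'a'}
print(my_func(None, None))  # {}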
Perhaps this works for you. def prune_nones(d): for k, v in list(d.items()): if v is None: del d[k] if type(v) is dict: prune_nones(v) >>> d = dict(a=1, b=2, c=None) >>> prune_nones(d) {'a': 1, 'b': 2} >>> d = dict(a=1, b=2, c=None, d=dict(e=None, f=3)) >>> prune_nones(d) {'a': 1, 'b': 2, 'd': {'f': 3}} I hope this helps. By the way, the list(d.items()) in the loop is to avoid RuntimeError: dictionary changed size during iteration -- Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From skreft at gmail.com Fri Apr 26 11:56:16 2019 From: skreft at gmail.com (Sebastian Kreft) Date: Fri, 26 Apr 2019 11:56:16 -0400 Subject: [Python-ideas] Syntax to conditionally define a field in a dict In-Reply-To: References: Message-ID: On Fri, Apr 26, 2019 at 11:07 AM Joshua Marshall wrote: > Hello all, > > I have a use case where I need to send a `dict` to a module as an > argument. Inside of this, it has a multi-level structure, but each field I > need to set may only be set to a single value. Fields must be valid, > non-empty strings. It looks a lot like the following in my code: > > ``` > def my_func(val_1, val_2): > return { > "field_1": val_1, > "next_depth": { > "field_2": val_2 > } > } > ``` > > What I want to do is: > ``` > def my_func(val_1, val_2): > return { > "field_1": val_1 if val_1, > "next_depth": { > "field_2": val_2 if val_2 > } > } > ``` > If val_2 here evaluates to falsey, will next_depth still be defined? From the code I would say that no. But your use case may require to not define the next_depth subdict without any values, as that may break the receiver expectations (think of JSON Schema). > > Or: > ``` > def my_func(val_1, val_2): > return { > if val_1 : "field_1": val_1, > "next_depth": { > if val_2: "field_2": val_2 > } > } > ``` > > Where each conditional in this context functions as: > ``` > if value: > d["your_key"] = value > ``` > for each conditionally added and set key. > > From the slack channel #learning_python, there are a number of more > general points which need to be handled. The more core syntax, which > should be valid throughout the language, would be to have statements like > `x = y if cond` and `x[y if cond]`. The first of these intuitively > reorganizes to `if cond: x = y`, but the second is not as clear, with a > likely equivalent of `if cond: x[y] else raise Exception`. > > Thanks to Tom Forbes and Jim Kelly for helping critique the idea thus far. > > > Please be advised that this email may contain confidential information. > If you are not the intended recipient, please notify us by email by > replying to the sender and delete this message. The sender disclaims that > the content of this email constitutes an offer to enter into, or the > acceptance of, any agreement; provided that the foregoing does not > invalidate the binding effect of any digital or other electronic > reproduction of a manual signature that is included in any attachment. > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Sebastian Kreft -------------- next part -------------- An HTML attachment was scrubbed... 
From ned at nedbatchelder.com  Fri Apr 26 12:13:33 2019
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Fri, 26 Apr 2019 12:13:33 -0400
Subject: [Python-ideas] Syntax to conditionally define a field in a dict
In-Reply-To:
References:
Message-ID: <53191ee4-f4b1-6c04-6051-1a06bf546fdd@nedbatchelder.com>

On 4/26/19 11:03 AM, Joshua Marshall wrote:
> [snip]
>
> What I want to do is:
> ```
> def my_func(val_1, val_2):
>     return {
>         "field_1": val_1 if val_1,
>         "next_depth": {
>             "field_2": val_2 if val_2
>         }
>     }
> ```

It's not clear in this example what you would want if val_2 is None.
Should it be:

    { "field_1": val_1 }

or:

    { "field_1": val_1, "next_depth": {} }

? Better would be to build your dict with the tools you already have:

    d = {}
    if val_1:
        d['field_1'] = val_1
    if val_2:
        d['next_depth'] = { 'field_2': val_2 }

You have total control over the results, and it doesn't take much more
space than your proposal. Various helper functions could make the code
more compact, and even clearer than your proposal:

    d = {}
    add_maybe(d, val_1, "field_1")
    add_maybe(d, val_2, "next_depth", "field_2")

Of course, you might prefer a different API. That's an advantage of
helper functions: you can design them to suit your exact needs.
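For instance, a minimal sketch of such a helper (add_maybe is just a name
I made up above; the trailing keys create intermediate dicts on demand):

    def add_maybe(d, value, *keys):
        """Set d[k1][k2]...[kn] = value, but only if value is truthy."""
        if not value:
            return
        for key in keys[:-1]:
            d = d.setdefault(key, {})
        d[keys[-1]] = value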
--Ned.

From j.marshall at arroyo.io  Fri Apr 26 12:29:22 2019
From: j.marshall at arroyo.io (Joshua Marshall)
Date: Fri, 26 Apr 2019 12:29:22 -0400
Subject: [Python-ideas] Syntax to conditionally define a field in a dict
In-Reply-To:
References:
Message-ID:

Ideally, the "next_depth" field would also not be defined, which may be
easier to handle with the latter syntax of putting an 'if' out in front.

On Fri, Apr 26, 2019 at 11:56 AM Sebastian Kreft wrote:

> [snip]

From python at mrabarnett.plus.com  Fri Apr 26 14:30:46 2019
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 26 Apr 2019 19:30:46 +0100
Subject: [Python-ideas] Syntax to conditionally define a field in a dict
In-Reply-To:
References:
Message-ID: <5aa6cd96-eaa4-413e-4b60-732ee8c9549d@mrabarnett.plus.com>

On 2019-04-26 16:56, Sebastian Kreft wrote:
> [snip]
>
> If val_2 here evaluates to falsey, will next_depth still be defined?
> From the code I would say no. But your use case may require not
> defining the next_depth subdict without any values, as that may break
> the receiver's expectations (think of JSON Schema).

From the code I would say yes. If you didn't want the subdict, you
would've written:

def my_func(val_1, val_2):
    return {
        "field_1": val_1 if val_1,
        "next_depth": {
            "field_2": val_2
        } if val_2
    }
"next_depth": { > ??????????? if val_2: "field_2": val_2 > ??????? } > ??? } > ``` > def my_func(val_1, val_2): return { if val_1 : "field_1": val_1, if val_2: "next_depth": { "field_2": val_2 } } [snip] The first form is too easily confused with the ternary 'if'. In Python, an expression never starts with an 'if', so the second form would be a better syntax for an optional entry. From pythonchb at gmail.com Fri Apr 26 15:25:02 2019 From: pythonchb at gmail.com (Christopher Barker) Date: Fri, 26 Apr 2019 12:25:02 -0700 Subject: [Python-ideas] Syntax to conditionally define a field in a dict In-Reply-To: References: Message-ID: Others have responded, but a note: > What I want to do is: ``` def my_func(val_1, val_2): return { "field_1": val_1 if val_1, "next_depth": { "field_2": val_2 if val_2 } } ``` I am finding this very confusing as to how to generalize this: How do we know that val_1 belongs to the "top-level" field_1, and val_2 is in the nested dict with field_2? Or: ``` def my_func(val_1, val_2): return { if val_1 : "field_1": val_1, "next_depth": { if val_2: "field_2": val_2 } } but this makes it seem like that distinction is hard-coded -- so is the nested dict is relevant? > The more core syntax, which should be valid throughout the language, would be to have statements like `x = y if cond` we have the x = y if cond else expression already -- and an assignment HAS to be assigned to something, so it seems what you want is: x = y if cond else None Maybe the "else None" feels like too much typing, but I prefer the explicitness myself. (and look in the history of this thread for "null coalescing" discussion, that _may_ be relevant. The first of these intuitively reorganizes to `if cond: x = y` then what do we get for x `if not cond`? it ends up undefined? or set to whatever value it used to have? Frankly, I think that's a mistake -- you're going to end up with having to trap a NameError or do a a hasattr() check later on anyway. It's generally considered good practice to set a name to None if it isn't defined, rather than not defining it. > and `x[y if cond]` ... But the second is not as clear, with a likely equivalent of `if cond: x[y] else raise Exception`. assuming x is a dict, then you could do: d[y if cond else []] = value It's a hack, but as lists aren't hashable, you get an TypeError, so maybe that would work for you? example: In [16]: key = "Fred" In [17]: value = "Barnes" In [18]: d = {} In [19]: # If the key is Truthy: In [20]: d[key if key else []] = value In [21]: d Out[21]: {'Fred': 'Barnes'} In [22]: # if the key is Falsey: In [23]: key = None In [24]: d[key if key else []] = value --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in () ----> 1 d[key if key else []] = value TypeError: unhashable type: 'list' -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Apr 26 23:25:29 2019 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Apr 2019 13:25:29 +1000 Subject: [Python-ideas] Users of statistics software, what quantile functionality would be useful for you? Message-ID: <20190427032529.GB3720@ando.pearwood.info> The statistics module is soon to get a quantile function. 
For those of you who use statistics software (whether in Python, or using
other languages) and specifically use quantiles, what sort of
functionality would be useful to you?

For example:

- evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
- unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?
- one quantile at a time?
- any specific definition?
- quantiles of a distribution?
- anything else?

Thanks in advance.

--
Steven

From pythonchb at gmail.com  Sat Apr 27 11:34:41 2019
From: pythonchb at gmail.com (Christopher Barker)
Date: Sat, 27 Apr 2019 08:34:41 -0700
Subject: [Python-ideas] Syntax to conditionally define a field in a dict
In-Reply-To:
References:
Message-ID:

yes, this can already be done, and yes, mapping unpacking is little used
(I never think to use it). But in this case, it's a no-op; I'm not sure
of the point. I get the same thing with just the ternary expression:

In [11]: def create(val1):
    ...:     data = {"val1": "me here"} if val1 else {}
    ...:     return data
    ...:

In [12]: create(True)
Out[12]: {'val1': 'me here'}

In [13]: create(False)
Out[13]: {}

-CHB

On Sat, Apr 27, 2019 at 8:25 AM Joao S. O. Bueno wrote:

> Calling upon ol' Guido's Time Machine:
>
> [snip]

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
From jsbueno at python.org.br  Sat Apr 27 11:24:59 2019
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Sat, 27 Apr 2019 12:24:59 -0300
Subject: [Python-ideas] Syntax to conditionally define a field in a dict
In-Reply-To:
References:
Message-ID:

Calling upon ol' Guido's Time Machine:

In [31]: def create(val1):
    ...:     data = {
    ...:         **({"val1": "me here"} if val1 else {})
    ...:     }
    ...:     return data
    ...:

In [32]: create(False)
Out[32]: {}

In [33]: create(True)
Out[33]: {'val1': 'me here'}

Now, please, just move on to the next request to the language.

I think in-place star expansion, along with the inline if like the above,
can cover all of your use-cases. If it can't, then expect assignment
expressions (the `a := 1` from PEP 572, coming in Python 3.8) to cover
the rest - as you can define variables inside the `if` expressions like
the above, which could be re-used in other `if`s (or even in key-names)
further down the dictionary.

So, seriously - on one hand, Python's syntax already allows what you are
requesting.
On the other hand, it makes use of a syntax that is already little used
(in-place star-expansion for mappings), to the point that in this thread,
up to here, no one made use of it. Therefore, IMHO, it demonstrates that
adding arbitrary syntaxes and language features ultimately fills the
language with clutter very few people end up knowing how to use -- we
should now work on what is already possible instead of making the
language still more difficult to learn.

(re-pasting the code so you can further experiment):

def create(val1):
    data = {
        **({"val1": "me here"} if val1 else {})
    }
    return data
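Applied to the nested example from the first message, the same idiom
composes -- an untested sketch; note that val_2 here gates the whole
subdict, which is what the OP said he wanted:

def my_func(val_1, val_2):
    return {
        **({"field_1": val_1} if val_1 else {}),
        **({"next_depth": {"field_2": val_2}} if val_2 else {}),
    }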
On Fri, 26 Apr 2019 at 21:22, Christopher Barker wrote:

> [snip]

From shoyer at gmail.com  Sat Apr 27 19:21:09 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sat, 27 Apr 2019 16:21:09 -0700
Subject: [Python-ideas] Users of statistics software, what quantile
 functionality would be useful for you?
In-Reply-To: <20190427032529.GB3720@ando.pearwood.info>
References: <20190427032529.GB3720@ando.pearwood.info>
Message-ID:

On Sat, Apr 27, 2019 at 6:10 AM Steven D'Aprano wrote:

> The statistics module is soon to get a quantile function.
>
> For those of you who use statistics software (whether in Python, or
> using other languages) and specifically use quantiles, what sort of
> functionality would be useful to you?
>
> For example:
>
> - evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
> - unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?

If I'm interested in multiple quantiles, they are usually unevenly
spaced. Something like [0.8, 0.9, 0.95, 0.99, 0.999] would be pretty
typical if I'm not sure what the right threshold for an "outlier" is.

> - one quantile at a time?

Yes, this is also quite common, once I know what threshold I care about.

> - any specific definition?

NumPy's quantile function has an "interpolation" option for controlling
the quantile definition. But in years of calculating quantiles for data
analysis, I've never used it.

> - quantiles of a distribution?

Yes, rarely -- though the only example that comes to mind is quantiles
for a Normal distribution. (scipy.stats supports this use-case well.)

> - anything else?

The flexibility of calculating either one or multiple quantiles with
np.quantile() is pretty convenient. But this might make for a more
dynamic type signature than you'd like for the standard library, e.g.,

from typing import Iterable, List, Sequence, TypeVar, overload

T = TypeVar('T')

@overload
def quantile(data: Iterable[T], threshold: float) -> T: ...

@overload
def quantile(data: Iterable[T], threshold: Sequence[float]) -> List[T]: ...

> Thanks in advance.

From barry at barrys-emacs.org  Sun Apr 28 03:34:38 2019
From: barry at barrys-emacs.org (Barry Scott)
Date: Sun, 28 Apr 2019 08:34:38 +0100
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To:
References:
Message-ID:

> On 25 Apr 2019, at 15:51, Ram Rachum wrote:
>
> Hi,
>
> Here's something I want in Python: Multiple levels of tracers working
> on top of each other, instead of just one.
>
> I'm talking about the tracer that one can set by calling sys.settrace.
>
> I've recently released PySnooper: https://github.com/cool-RR/PySnooper/
>
> One of the difficulties I have is that I can't debug or run the
> `coverage` tool on the core of this module. That's because the core is
> a trace function, and debuggers and coverage tools work by setting a
> trace function. When PySnooper sets its trace function using
> `sys.settrace`, the code that runs in that trace function runs without
> getting traced by the coverage tracer.
>
> This means that people who develop debuggers and coverage tools can't
> use a debugger or a coverage tool on the core of their tool. It's quite
> an annoying problem.
>
> My proposed solution: Multiple levels of tracing, instead of just one.
> When you install a tracer, you're not replacing the existing one,
> you're appending a tracer to the existing list of tracers.
>
> If this was implemented, then when PySnooper would install its tracer,
> the coverage tracer would still be active and running, for every line
> of code including the ones in PySnooper's tracer.
>
> Obviously, we'll need to figure out the API and any other kind of
> problems with this proposal.
>
> What do you think?

Personally I would look to other means to get the coverage report for a
tracing tool or debugger.

For example, why not use unit testing and mocking to allow the trace code
to be run and measured? After all, you only have to mock one function's
interface.

As for debugging, I would use print() or logging to find what I need.

Barry

> Thanks,
> Ram.

From ram.rachum at gmail.com  Sun Apr 28 04:12:42 2019
From: ram.rachum at gmail.com (Ram Rachum)
Date: Sun, 28 Apr 2019 11:12:42 +0300
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To:
References:
Message-ID:

It's possible, but it would be very cumbersome, for a bunch of reasons.
One of them is that the tracing code inspects the frame, the variables
referenced in it, and it even opens the file of the code object of the
frame. It will be difficult to mock all of that, and even if that's
possible, we won't have high confidence that the mock is reliable.

On Sun, Apr 28, 2019, 11:06 Barry Scott wrote:

> [snip]

From barry at barrys-emacs.org  Sun Apr 28 04:29:46 2019
From: barry at barrys-emacs.org (Barry Scott)
Date: Sun, 28 Apr 2019 09:29:46 +0100
Subject: [Python-ideas] Idea: Allow multiple levels of tracers
In-Reply-To:
References:
Message-ID: <5D903F7E-0C7C-4FD8-B81A-74A1386A1F2C@barrys-emacs.org>

> On 28 Apr 2019, at 09:12, Ram Rachum wrote:
>
> It's possible, but it would be very cumbersome, for a bunch of reasons.
> [snip]

It is not so scary that I would not take it on myself if I had the need.
Maybe you use real frames and only fake up the call into the trace code.

What I'm thinking is that it is possible and can be done today without C
level changes to Python.
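Something along these lines ought to be enough to fan a single hook out
to several tracers -- an untested sketch; combine_tracers is a name I
just made up, and real code would need more care with the local-trace
protocol and with exceptions raised by one of the tracers:

def combine_tracers(*tracers):
    def global_trace(frame, event, arg):
        # Give every tracer a chance; collect the local trace
        # functions they return for this scope.
        active = [t(frame, event, arg) for t in tracers]
        active = [t for t in active if t is not None]
        if not active:
            return None
        def local_trace(frame, event, arg):
            # Keep only the tracers that still want this scope.
            still = []
            for t in active:
                result = t(frame, event, arg)
                if result is not None:
                    still.append(result)
            active[:] = still
            return local_trace if active else None
        return local_trace
    return global_trace

# Usage would be e.g.:
#     sys.settrace(combine_tracers(coverage_tracer, snooper_tracer))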
Maybe, if there are changes to the CPython core, it only needs a second
trace hook that runs for the code in the sys.settrace tracer only.

Barry

From turnbull.stephen.fw at u.tsukuba.ac.jp  Sun Apr 28 05:08:15 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Sun, 28 Apr 2019 18:08:15 +0900
Subject: [Python-ideas] Users of statistics software, what quantile
 functionality would be useful for you?
In-Reply-To: <20190427032529.GB3720@ando.pearwood.info>
References: <20190427032529.GB3720@ando.pearwood.info>
Message-ID: <23749.28031.617999.73320@turnbull.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

> using other languages) and specifically use quantiles, what sort of
> functionality would be useful to you?
>
> For example:
>
> - evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?

Yes.

> - unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?

Rarely.

> - one quantile at a time?

Yes.

> - any specific definition?

My students' data is qualitative survey (Likert scale or categorical) or
government-issue, so "any" is "good enough for gov't work".

> - quantiles of a distribution?

You mean the inverse of the cumulative distribution function? Very
rarely.

> - anything else?

Not that I can think of. Thank you for this addition!

Steve

From guido at python.org  Sun Apr 28 11:33:57 2019
From: guido at python.org (Guido van Rossum)
Date: Sun, 28 Apr 2019 08:33:57 -0700
Subject: [Python-ideas] Users of statistics software, what quantile
 functionality would be useful for you?
In-Reply-To: <20190427032529.GB3720@ando.pearwood.info>
References: <20190427032529.GB3720@ando.pearwood.info>
Message-ID:

On Sat, Apr 27, 2019 at 7:51 AM Steven D'Aprano wrote:

> The statistics module is soon to get a quantile function.
>
> [snip]

The stats that are pored over by my team every week are running times of
mypy in various configurations. We currently show p25, p50, p75, p90, p95
and p99. We currently use the following definition:

def pick(data: List[float], fraction: float) -> float:
    index = int(len(data) * fraction)
    before = data[max(0, index - 1)]
    after = data[min(len(data) - 1, index)]
    return (before + after) / 2.0

where `data` is a sorted array. Essentially we use the average of the two
values nearest the cutoff point, except for edge cases. (I think we could
do better, but this is the code I found in our repo. :-)
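For the record, a linear-interpolation variant -- just a sketch, not
something we actually run -- would be something like:

def pick_interpolated(data: List[float], fraction: float) -> float:
    # Place fraction along the len(data) - 1 gaps between sorted
    # points, then interpolate between the two surrounding values.
    pos = (len(data) - 1) * fraction
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

which agrees with the minimum at fraction=0 and the maximum at
fraction=1.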
--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*

From peter.ed.oconnor at gmail.com  Sun Apr 28 14:48:31 2019
From: peter.ed.oconnor at gmail.com (Peter O'Connor)
Date: Sun, 28 Apr 2019 11:48:31 -0700
Subject: [Python-ideas] Proposal: "?" Documentation Operator and easy
 reference to argument types/defaults/docstrings
In-Reply-To: <23746.52454.598466.518152@turnbull.sk.tsukuba.ac.jp>
References: <23746.52454.598466.518152@turnbull.sk.tsukuba.ac.jp>
Message-ID:

Thanks all for the responses. I read through them carefully and address
each below.

I don't think any fully address the core problem - the "Argument" - the
tuple of (type, default, documentation) - is currently not a first-class
entity. Because there is no way to reference an Argument, there is much
copypasta and dangerous default-duplication. The idea of this proposal is
that it should be possible to define an argument in one place, and simply
bind a different name to it in each function signature.

To recap - the points of the proposal are:

- Allow documentation to be bound to an argument:
  "func(arg_a: int = 3 ? 'doc about arg', ...)" or
  "func(arg_a: int = 3 # 'doc about arg', ...)"
- Allow reference to an argument:
  "outer_func(new_arg_a_name: func.args.arg_a.type =
  func.args.arg_a.default ? 'new doc for arg a', ...)"
- (optionally) a shorthand syntax for reusing the type/doc/default of an
  argument: "def outer_func(new_arg_a_name :=? func.args.arg_a, ...):"
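(The "reference" half of this can at least be approximated today with
inspect.signature -- a rough sketch of the idea, with func.args.arg_a
spelled differently; what's still missing is the documentation field and
a clean way to bind it:)

import inspect

def func(arg_a: int = 3):
    return arg_a * 2

_arg_a = inspect.signature(func).parameters['arg_a']

def outer_func(new_arg_a_name: _arg_a.annotation = _arg_a.default):
    return func(arg_a=new_arg_a_name)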
Below I have responded to each comment - please let me know if I missed
something:

----------

On Thu, Apr 25, 2019 at 3:59 PM Robert Vanden Eynde wrote:

> Looks like a more complicated way to say:
> def f(x: 'int : which does stuff' = 5,
>       y: 'int : which does more stuffs')

I hadn't thought of incorporating documentation into the type; that's a
nice idea. I think it's an OK "for now" solution, but:
- doing it this way loses the benefits of type inspection (built in to
  most IDEs now),
- it does not allow you to, for instance, keep the type definition in the
  wrapper while changing the documentation,
- it provides no easy way to reference (f.args.x.documentation) - which
  is a main point of the proposal.

----------

On Thu, Apr 25, 2019 at 3:58 PM Chris Angelico wrote:

> @functools.passes_args(f)
> def wrapper(spam, ham, *a, **kw):
>     f(*a, **kw)
> ....
> If that were implemented, would it remove the need for this new syntax
> you propose?

This does indeed allow defaults and types to be passed on, but I find
this approach still has the basic flaw of using **kwargs:
- It only really applies to "wrappers" - functions that wrap another
  function. The goal here is to address the common case of a function
  passing args to a number of functions within it.
- It is assumed that the wrapper should use the same argument names as
  the wrapped function. A name should bind a function to an argument -
  but here the name is packaged with the argument.
- It remains difficult to determine the arguments of "wrapper" by simple
  inspection - added syntax for removal of certain arguments only
  complicates the task and seems fragile (lest the wrapped function's
  argument names change).
- Renaming an argument to "f" will change the arguments of wrapper - but
  in a way that's not easy to inspect (so for instance if you have a call
  "y = wrapper(arg_1=4)", and you change "f(arg1=....)" to
  "f(arg_one=...)", no IDE will catch that and make the appropriate
  change to "y = wrapper(arg_one=4)").
- What happens if you're not simply wrapping one sub-function but calling
  several? What about when those subfunctions have arguments with the
  same name?

----------

On Thu, Apr 25, 2019 at 5:50 PM David Mertz wrote:

> Why not just this in existing Python:
>
> def func_1(
>     a: int = 1 # 'Something about param a',
>     b: float = 2.5 # 'Something else about param b',
> ) -> float:
>     """Something about func_1
>
>     a and b interact in this interesting way.
>     a should be in range 0 < a < 125
>     floor(b) should be a prime number
>
>     Something about return value of func_1
>     returns a multiplication
>     """
>     return a*b

- This would currently be a syntax error (because the "#" comes before
  the comma), but sure, we could put it after the comma.
- It does not address the fact that we cannot reference
  "func_1.a.default" - which is one of the main points of this proposal.
- I'm fine with "#" being the documentation operator instead of "?", but
  figured people would not like it because it breaks the precedent of
  anything after "#" being ignored by the compiler.

---------------

On Thu, Apr 25, 2019 at 9:04 PM Anders Hovmöller wrote:

> Maybe something like...
>
> def foo(**kwargs):
>     """
>     @signature_by:
>     full.module.path.to.a.signature_function(pass_kwargs_to=bar,
>         hardcoded=['quux'])
>     """
>     return bar(**kwargs, quux=3)

This makes it difficult to see at a glance what the names of the
arguments to "foo" are. And what happens if (as in the example) "foo"
does not simply wrap a function, but distributes arguments to multiple
subfunctions? (This is a common case.)

----------------------------

On Fri, Apr 26, 2019 at 2:18 AM Stephen J. Turnbull wrote:

> What I would rather see is
>
> (1) Comment syntax "inside" (fvo "inside" including any comment after
> the colon but before docstring or other code) .....
>
> (2) asserts involving parameters lexically are available to help().....

(1) I'm fine with "#" being used instead of "?" as the "documentation
operator", but I figured it would be rejected for breaking the precedent
that everything after "#" is ignored by the compiler.

(2) This would be a nice addition ... if this proposal were actually
implemented, you'd have a built-in "Argument" object, and in that case
you could do e.g.:

RGB_IMAGE = Argument(type=np.ndarray, doc='An RGB image',
                     check=lambda img: (img.ndim==3 and img.shape[2]==3))

def brighten_image(image :=? RGB_IMAGE, ...):
    ...
From turnbull.stephen.fw at u.tsukuba.ac.jp  Sun Apr 28 21:40:27 2019
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Mon, 29 Apr 2019 10:40:27 +0900
Subject: [Python-ideas] Proposal: "?" Documentation Operator and easy
 reference to argument types/defaults/docstrings
In-Reply-To:
References: <23746.52454.598466.518152@turnbull.sk.tsukuba.ac.jp>
Message-ID: <23750.22027.185079.287153@turnbull.sk.tsukuba.ac.jp>

Peter O'Connor writes:
> On Fri, Apr 26, 2019 at 2:18 AM Stephen J. Turnbull wrote:
> > What I would rather see is
> >
> > (1) Comment syntax "inside" (fvo "inside" including any comment after
> > the colon but before docstring or other code) .....
> >
> > (2) asserts involving parameters lexically are available to
> > help().....
>
> (1) I'm fine with "#" being used instead of "?" as the "documentation
> operator", but I figured it would be rejected for breaking the
> precedent that everything after "#" is ignored by the compiler.

This is not quite true, depending on your definition of "compiler",
because of PEP 263, which allows you to specify the program's encoding in
a comment at the beginning (first or second line), and because of type
hints themselves, which are recognized in comments by parsing tools like
mypy.

The reason for not having semantically significant comments, AIUI, is not
so much that the *compiler* ignore it as that the *human reader* be able
to ignore it.  The way I think about it, this is what justifies the PEP
263 coding cookies, since humans *will* ignore the cookies in favor of
detecting mojibake "by eye", while compilers need them to construct
string literals correctly.  I'm not sure how the authorities on
Pythonicity will come down on this, though.

> (2) This would be a nice addition ... if this proposal were actually
> implemented, you'd have a built-in "Argument" object, and in that case
> you could do e.g.:
>
> RGB_IMAGE = Argument(type=np.ndarray, doc='An RGB image',
>     check=lambda img: (img.ndim==3 and img.shape[2]==3))
>
> def brighten_image(image :=? RGB_IMAGE, ...):
>     ...

If the "(2)" refers to my proposal that asserts be available to help(), I
don't understand the example.  Are you proposing that instead of an
explicit expression, asserts of this kind be written

    assert image.check()

In any case, as far as I know you can already use the syntax suggested in
the quotation with standard type hints as long as "Argument" is a class
*factory*.  (Caveat: I'm not sure what semantics you attach to ":=?".)
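By "class factory" I mean something like the following sketch -- it
ignores the re-specialization in my variant below, I haven't thought hard
about the semantics, and numpy is assumed only for the sake of the
example:

import numpy as np

def Argument(base, doc='', check=None):
    # Build a genuine subclass of `base`, so the result is legal
    # anywhere a type annotation is legal, and help() could be
    # taught to read .__doc__ and .check off of it.
    attrs = {'__doc__': doc}
    if check is not None:
        attrs['check'] = staticmethod(check)
    return type('Argument', (base,), attrs)

RGB_IMAGE = Argument(np.ndarray, doc='An RGB image',
                     check=lambda img: img.ndim == 3 and img.shape[2] == 3)

def brighten_image(image: RGB_IMAGE):
    ...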
I'm not arguing against the desire for the *builtin* feature, simply
recognizing the practicality that already-implemented features, or
approximations to what you really want, get more support more quickly.

Thinking aloud, you're probably way ahead of me, but just in case: I'm
not so clear on whether the reuse of RGB_IMAGE suggested by the
assignment in the example is so useful.  I would expect something more
like

RGB_IMAGE = Argument(type=np.ndarray, doc='An RGB image',
                     check=lambda img: (img.ndim==3 and img.shape[2]==3))

def brighten_image(image: RGB_IMAGE(doc='Image to brighten in-place'),
                   ...):

because "image" already tells you what the argument *is*, and "RGB_IMAGE"
tells you its specific *type*, while the docstring tells you its *role*
in the function, and that it's being mutated.  For various reasons, these
distinctions aren't so salient for RGB images, I guess, but they're
crucial for types like int and float.

It seems to me, therefore, that the Argument object (whether an instance
or a class) is likely to have 'doc' and 'check' attributes that vary with
role, i.e., as the arguments they describe get passed from function to
function.

I don't think there's a lot to say about the variability of 'doc' except
"live with it", but the variability of 'check' sounds like something that
a contracting functionality would deal with (if you don't know him
already, Anders Hovmöller is an expert on contracts in Python, and there
were long threads on that a few months ago, which he could probably
summarize for you).

Steve