From ncoghlan at gmail.com Fri Dec 1 01:01:11 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 16:01:11 +1000 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> Message-ID: On 1 December 2017 at 01:23, Ronald Oussoren wrote: > 1) Last time around Mark Shannon worried that this introduces infinite > recursion in the language itself (in my crummy summary, please read this > message to get the real concern > ). Is > this truly a problem? I don?t think there is a problem, but I?m worried > that I don?t fully understand Mark?s concerns. > > 2) PEP 487 introduced __init_subclass__ as a class method to avoid having to > write a metaclass for a number of use cases. My PEP currently does require > a metaclass, but it might be nicer to switch to a regular class method > instead (like __init_subclass__). I think the second point there may actually allow you to resolve the first one, by way of making `__getdescriptor__` an optional override of the typical lookup algorithm. That is: def _PyType_Lookup(tp, name): # New optional override for descriptor lookups try: # Ordinary attribute lookup, *NOT* a descriptor lookup getdesc = tp.__getdescriptor__ except AttributeError: pass else: return getdesc(name) # Default algorithm used in the absence of an override mro = tp.mro() assert isinstance(mro, tuple) for base in mro: assert isinstance(base, type) try: return base.__dict__[name] except KeyError: pass return None If you did go this way, then we'd want to add a "types.getdescriptor(cls, name)" API to expose _PyType_Lookup at the Python layer, since "getattr(type(obj), name)" wouldn't be an accurate emulation of the algorithm any more. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Dec 1 01:08:28 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 16:08:28 +1000 Subject: [Python-ideas] Add a dict with the attribute access capability In-Reply-To: References: Message-ID: On 30 November 2017 at 05:11, Barry Warsaw wrote: > Serhiy Storchaka wrote: >> In 3.7 I have removed an old-deprecated plistlib.Dict. [1] Actually it >> already was deprecated when the plistlib module was added to the regular >> stdlib in Python 2.6. >> >> Raymond noticed that that capability seemed nice to have. > > So nice in fact that I'm sure I've reimplemented something similar > several times. :) Note that we do offer a simple namespace type: >>> from types import SimpleNamespace as ns >>> data = ns(a=1, b=2, c=3) >>> data.a 1 >>> vars(data)["a"] 1 >>> vars(data)["a"] = 3 >>> data.a 3 It was added as part of adding sys.implementation, since it's the storage type used for that: >>> import sys >>> type(sys.implementation) So the only thing we don't currently offer is a type that provides both attribute access and mapping access on the *same* object - for SimpleNamespace we require that you request the mapping API with "vars(ns)". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nas-python-ideas at arctrix.com Fri Dec 1 03:13:37 2017 From: nas-python-ideas at arctrix.com (Neil Schemenauer) Date: Fri, 1 Dec 2017 02:13:37 -0600 Subject: [Python-ideas] Provide a way to import module without exec body Message-ID: <20171201081337.c7fry5encm2nc4ob@python.ca> I have been working on reducing Python statup time. 
It would be nice if there was some way to load a module into memory without exec of its body code. I'm sure other people have wished for this. Perhaps there could be a new special function, similar to __import__ for this purpose. E.g. __load_module__(). To actually execute the module, I had the idea to make module objects callable, i.e. tp_call for PyModule_Type. That's a little too cute though and will cause confusion. Maybe instead, add a function attribute to modules, e.g. mod.__exec__(). I have a little experimental code, just a small step: https://github.com/nascheme/cpython/tree/import_defer_exec We need importlib to give us the module object and the bytecode without doing the exec(). My hackish solution is to set properties on __spec__ and then have PyImport_ImportModuleLevelObject() do the exec(). From rosuav at gmail.com Fri Dec 1 03:27:58 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 1 Dec 2017 19:27:58 +1100 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: <20171201081337.c7fry5encm2nc4ob@python.ca> References: <20171201081337.c7fry5encm2nc4ob@python.ca> Message-ID: On Fri, Dec 1, 2017 at 7:13 PM, Neil Schemenauer wrote: > I have been working on reducing Python statup time. It would be > nice if there was some way to load a module into memory without exec > of its body code. I'm sure other people have wished for this. I haven't. Can you elaborate on where this is useful, please? ChrisA From ncoghlan at gmail.com Fri Dec 1 03:37:26 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 18:37:26 +1000 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: <20171201081337.c7fry5encm2nc4ob@python.ca> References: <20171201081337.c7fry5encm2nc4ob@python.ca> Message-ID: On 1 December 2017 at 18:13, Neil Schemenauer wrote: > I have been working on reducing Python statup time. It would be > nice if there was some way to load a module into memory without exec > of its body code. I'm sure other people have wished for this. > > Perhaps there could be a new special function, similar to __import__ > for this purpose. E.g. __load_module__(). To actually execute the > module, I had the idea to make module objects callable, i.e. tp_call > for PyModule_Type. That's a little too cute though and will cause > confusion. Maybe instead, add a function attribute to modules, e.g. > mod.__exec__(). > > I have a little experimental code, just a small step: > > https://github.com/nascheme/cpython/tree/import_defer_exec > > We need importlib to give us the module object and the bytecode > without doing the exec(). What does actually doing the load give that simply calling https://docs.python.org/3/library/importlib.html#importlib.util.find_spec doesn't? At that point, you know the module exists, and how to load it, which is all a lazy loading implementations really needs to be confident that a subsequent actual execution attempt will be able to start. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Dec 1 03:52:52 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 18:52:52 +1000 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: References: <20171201081337.c7fry5encm2nc4ob@python.ca> Message-ID: On 1 December 2017 at 18:37, Nick Coghlan wrote: > On 1 December 2017 at 18:13, Neil Schemenauer > wrote: >> I have been working on reducing Python statup time. 
It would be >> nice if there was some way to load a module into memory without exec >> of its body code. I'm sure other people have wished for this. >> >> Perhaps there could be a new special function, similar to __import__ >> for this purpose. E.g. __load_module__(). To actually execute the >> module, I had the idea to make module objects callable, i.e. tp_call >> for PyModule_Type. That's a little too cute though and will cause >> confusion. Maybe instead, add a function attribute to modules, e.g. >> mod.__exec__(). >> >> I have a little experimental code, just a small step: >> >> https://github.com/nascheme/cpython/tree/import_defer_exec >> >> We need importlib to give us the module object and the bytecode >> without doing the exec(). > > What does actually doing the load give that simply calling > https://docs.python.org/3/library/importlib.html#importlib.util.find_spec > doesn't? > > At that point, you know the module exists, and how to load it, which > is all a lazy loading implementations really needs to be confident > that a subsequent actual execution attempt will be able to start. After posting this, and while filing https://bugs.python.org/issue32192, I double checked how "importlib.util.module_from_spec" works, and it turns out that already handle the main part of what you're after: it creates the module without executing it. The actual execution is then handled by running "module.__spec__.loader.exec_module(module)". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ronaldoussoren at mac.com Fri Dec 1 04:15:35 2017 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 01 Dec 2017 10:15:35 +0100 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> Message-ID: <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> > On 1 Dec 2017, at 07:01, Nick Coghlan wrote: > > On 1 December 2017 at 01:23, Ronald Oussoren wrote: >> 1) Last time around Mark Shannon worried that this introduces infinite >> recursion in the language itself (in my crummy summary, please read this >> message to get the real concern >> ). Is >> this truly a problem? I don?t think there is a problem, but I?m worried >> that I don?t fully understand Mark?s concerns. >> >> 2) PEP 487 introduced __init_subclass__ as a class method to avoid having to >> write a metaclass for a number of use cases. My PEP currently does require >> a metaclass, but it might be nicer to switch to a regular class method >> instead (like __init_subclass__). > > I think the second point there may actually allow you to resolve the > first one, by way of making `__getdescriptor__` an optional override > of the typical lookup algorithm. That is: > > def _PyType_Lookup(tp, name): > > # New optional override for descriptor lookups > try: > # Ordinary attribute lookup, *NOT* a descriptor lookup > getdesc = tp.__getdescriptor__ > except AttributeError: > pass > else: > return getdesc(name) > > # Default algorithm used in the absence of an override > mro = tp.mro() > assert isinstance(mro, tuple) > > for base in mro: > assert isinstance(base, type) > > try: > return base.__dict__[name] > except KeyError: > pass > > return None > > If you did go this way, then we'd want to add a > "types.getdescriptor(cls, name)" API to expose _PyType_Lookup at the > Python layer, since "getattr(type(obj), name)" wouldn't be an accurate > emulation of the algorithm any more. Maybe, but how would this work with super()? 
Super walks the MRO of type of the instance, but skips the class on the MRO. This is not equivalent to walking the MRO of the second class on the MRO when you use multiple inheritance, This also has some other disadvantages. The first is that tp.__getdescriptor__ would replace the default behaviour for the entire MRO and it would be possible to have different behavior for classes on the MRO. The second, minor. one is that __getdescriptor__ would have to reimplement the default logic of walking the MRO, but that logic is fairly trivial. BTW. getattr(type(obj), name) is not an accurate emulation of _PyType_Lookup even now, thanks to metaclasses. In particular: ``` class A_meta (type): @property def description(self): return "meta description" @description.setter def description(self, v): raise RuntimeError class A (metaclass=A_meta): @property def description(self): return "description" @description.setter def description(self, v): raise RuntimeError a = A() print(A.description) # prints ?meta description" print(a.description) # prints ?description" print(getattr(type(a), 'description?)) # prints ?meta description" ``` The setter definitions are necessary to ensure that the properties are data descriptors, which are handled differently than function descriptors by __getattribute__. Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: From kirillbalunov at gmail.com Fri Dec 1 04:48:56 2017 From: kirillbalunov at gmail.com (Kirill Balunov) Date: Fri, 1 Dec 2017 12:48:56 +0300 Subject: [Python-ideas] How assignment should work with generators? In-Reply-To: References: Message-ID: 2017-11-29 22:33 GMT+03:00 Steve Barnes : > > Just a thought but what about a syntax something along the lines of: > > a, b, *remainder = iterable > > Where remainder becomes the iterable with the first two values consumed > by assigning to a & b. If the iterator has less than 2 values, (in the > above case), remaining it should error, if it has exactly 2 then > remainder would become an exhausted iterable. Of course the user could > also use: > > a, b, *iterable = iterable > > Others may differ but this syntax has a lot of similarity to the f(a, b, > *args) syntax, possibly enough that most users could understand it. > Before I started this thread, not so long ago, I have already asked a question about this semantics [1 ]. But it appears to be very ambiguous in practice for the various rhs: ... x, *y, z = some_iter *x , y, z = some_iter x, y, *z = some_iter And only for the last case it will mean something special. In addition, it is a huge backward compatibility break. Probably, some time ago it was necessary to split this thread into two questions: 1. Philosophical question regarding sequences and iterators. In particular, should they behave differently depending on the context, or, in other words, whether to emphasize their different nature as fixed-size containers and those that are lazily produce values on demand. 2. Additional syntax in the assignment statement for partial extraction of values from the iterable. 2017-11-30 22:19 GMT+03:00 Paul Moore : > > Mostly corner cases, and I don't believe there have been any non-artificial > examples posted in this thread. Certainly no-one has offered a real-life code example that is made > significantly worse by > the current semantics, and/or which couldn't be easily worked around > without needing a language change. Yes, in fact, this is a good question, is whether that is sufficiently useful to justify extending the syntax. 
But it is not about corner cases, it is rather usual situation. Nevertheless, this is the most difficult moment for Rationale. By now, this feature does not give you new opportunities for solving problems. It's more about expressiveness and convenience. You can write: x, y, ... = iterable or, it = iter(iterable) x, y = next(it), next(it) or, from itertools import isclice x, y = islice(iterable, 2) or, x, y = iterable[:2] and others, also in some cases when you have infinite generator or iterator, you should use 2nd or 3rd. In fact, this has already been said and probably I will not explain it better: 2017-11-28 1:40 GMT+03:00 Greg Ewing : > Guido van Rossum wrote: > >> Is this problem really important enough that it requires dedicated >> syntax? Isn't the itertools-based solution good enough? >> > > Well, it works, but it feels very clumsy. It's annoying to > have to specify the number of items in two places. > > Also, it seems perverse to have to tell Python to do *more* > stuff to mitigate the effects of stuff it does that you > didn't want it to do in the first place. > > Like I said, I'm actually surprised that this doesn't already > work. To me it feels more like filling in a piece of > functionality that was overlooked, rather than adding a > new feature. Filling in a pothole in the road rather than > bulding a new piece of road. > > (Pushing the road analogy maybe a bit too far, the current > itertools solution is like digging *more* potholes to make > the road bumpy enough that you don't notice the first > pothole.) > > (Or failing that, couldn't we add something to itertools to make it more >> readable rather than going straight to new syntax?) >> > > I'm not sure how we would do that. Even if we could, it > would still feel clumsy having to use anything from itertools > at all. With kind regards, -gdg -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Dec 1 05:17:28 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 1 Dec 2017 10:17:28 +0000 Subject: [Python-ideas] How assignment should work with generators? In-Reply-To: References: Message-ID: On 1 December 2017 at 09:48, Kirill Balunov wrote: > Probably, some time ago it was necessary to split this thread into two > questions: > 1. Philosophical question regarding sequences and iterators. In particular, > should they behave differently depending on the context, > or, in other words, whether to emphasize their different nature as > fixed-size containers and those that are lazily produce values on demand. > 2. Additional syntax in the assignment statement for partial extraction of > values from the iterable. That's a good summary of the two elements of the discussion here. On (1), I'd say that Python should *not* have context-dependent semantics like this. It's something Perl was famous for (list and scalar contexts) and IMO makes for pretty unreadable code. Python's Zen here is "Explicit is better than implicit". Specifically, having the semantics of the assignment statement vary depending on the type of the value being assigned seems like a very subtle distinction, and not in line with any other statement in the language. 
On (2), that's something that is relatively simple to debate - all of the normal rules for new syntax proposals apply - what problem does it solve, how much of an improvement over existing ways of solving the problem does the proposal give, how easy is it for beginners to understand and for people encountering it to locate the documentation, does it break backward compatibility, etc... Personally I don't think it's a significant enough benefit but I'm willing to be swayed if good enough arguments are presented (currently the "a, b, ... = value" syntax is my preferred proposal, but I don't think there's enough benefit to justify implementing it). > 2017-11-30 22:19 GMT+03:00 Paul Moore : >> >> >> Mostly corner cases, and I don't believe there have been any >> non-artificial examples posted in this thread. >> >> Certainly no-one has offered a real-life code example that is made >> significantly worse by >> the current semantics, and/or which couldn't be easily worked around >> without needing a language change. > > > Yes, in fact, this is a good question, is whether that is sufficiently > useful to justify extending the syntax. But it is not about corner cases, it > is rather usual situation. > Nevertheless, this is the most difficult moment for Rationale. By now, this > feature does not give you new opportunities for solving problems. It's more > about expressiveness and convenience. You can write: > > x, y, ... = iterable > > or, > > it = iter(iterable) > x, y = next(it), next(it) > > or, > > from itertools import isclice > x, y = islice(iterable, 2) > > or, > x, y = iterable[:2] > > and others, also in some cases when you have infinite generator or iterator, > you should use 2nd or 3rd. It's significant to me that you're still only able to offer artificial code as examples. In real code, I've certainly needed this type of behaviour, but it's never been particularly problematic to just use first_result = next(it) second_result - next(it) Or if I have an actual sequence, x, y = seq[:2] The next() approach actually has some issues if the iterator terminates early - StopIteration is typically not the exception I want, here. But all that means is that I should use islice more. The reason i don't think to is because I need to import it from itertools. But that's *not* a good argument - we could use the same argument to make everything a builtin. Importing functionality from modules is fundamental to Python, and "this is a common requirement, so it should be a builtin" is an argument that should be treated with extreme suspicion. What I *don't* have a problem with is the need to specify the number of items - that seems completely natural to me, I'm confirming that I require an iterable that has at least 2 elements at this point in my code. The above is an anecdotal explanation of my experience with real code - still not compelling, but hopefully better than an artificial example with no real-world context :-) > In fact, this has already been said and probably > I will not explain it better: > > 2017-11-28 1:40 GMT+03:00 Greg Ewing : >> >> Guido van Rossum wrote: >>> >>> Is this problem really important enough that it requires dedicated >>> syntax? Isn't the itertools-based solution good enough? >> >> >> Well, it works, but it feels very clumsy. It's annoying to >> have to specify the number of items in two places. >> >> Also, it seems perverse to have to tell Python to do *more* >> stuff to mitigate the effects of stuff it does that you >> didn't want it to do in the first place. 
>> >> Like I said, I'm actually surprised that this doesn't already >> work. To me it feels more like filling in a piece of >> functionality that was overlooked, rather than adding a >> new feature. Filling in a pothole in the road rather than >> bulding a new piece of road. >> >> (Pushing the road analogy maybe a bit too far, the current >> itertools solution is like digging *more* potholes to make >> the road bumpy enough that you don't notice the first >> pothole.) >> >>> (Or failing that, couldn't we add something to itertools to make it more >>> readable rather than going straight to new syntax?) >> >> >> I'm not sure how we would do that. Even if we could, it >> would still feel clumsy having to use anything from itertools >> at all. I'm typically suspicious of arguments based on "filling in the gaps" of existing functionality (largely because it's a fault I'm prone to myself). It's very easy to argue that way for features you'll never actually need in practice - so a "completeness" argument that's not backed up with real-world examples of use cases is weak, at least to me. And I've already commented above on my views of the "it would still feel clumsy having to use anything from itertools" argument. Paul From steve at pearwood.info Fri Dec 1 05:17:46 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 1 Dec 2017 21:17:46 +1100 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: <20171201081337.c7fry5encm2nc4ob@python.ca> References: <20171201081337.c7fry5encm2nc4ob@python.ca> Message-ID: <20171201101745.GJ22248@ando.pearwood.info> On Fri, Dec 01, 2017 at 02:13:37AM -0600, Neil Schemenauer wrote: > I have been working on reducing Python statup time. It would be > nice if there was some way to load a module into memory without exec > of its body code. I'm sure other people have wished for this. I don't understand why you would want to do this. Given a source file: # module.py spam = 1 eggs = 2 if you import the module without executing the code in the module, surely you'll get a bare module with nothing in it? Then: module.spam module.eggs will both fail with AttributeError. If that's what you mean, then no, I haven't wished for that. Unless I'm missing something, it seems pointless. When, and why, would I want to import an empty module? -- Steve From levkivskyi at gmail.com Fri Dec 1 05:10:57 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Fri, 1 Dec 2017 11:10:57 +0100 Subject: [Python-ideas] Stub class for Generic to further improve PEP 560 In-Reply-To: <5E326196-7E95-433D-8F1C-A1537278F31A@gmail.com> References: <2FCB1F0E-24AF-4ACC-922F-D5E44B618C37@gmail.com> <3ECD02CE-F378-4A75-BEA6-4F9AE1FF55AF@gmail.com> <5E326196-7E95-433D-8F1C-A1537278F31A@gmail.com> Message-ID: On 1 December 2017 at 00:34, Ilya Kulakov wrote: > Anyway, my expectation is that going along this way (i.e. removing all > runtime API apart from a necessary minimum) > will give a minor speed-up as compared to PEP 560 at the cost of a > breaking change (even for small number of developers). > > > I don't think the change will be breaking: usage of this class will be > entirely voluntarily and does not replace typing.Generic > > If you propose an additional class, then yes, it is non-breaking, but I still don't see much value given a minor performance improvement. > PEP 560 already gives overhead of 80% as compared to normal classes in > worst case scenario > (empty body with a single generic base). 
This is actually less than for > ABCs (they can give up to 120% in worst case scenario). > > > GenericMeta inherits from ABCMeta. Do you mean that it will be removed > after 560 lands? > > Yes, GenericMeta will be removed. > Moreover, performance is not a single motivation for PEP 560, there are > other arguments such as metaclass conflicts which will > not be solved without the machinery proposed by the PEP. > > > Perhaps you can consider designing Generic / GenericMeta in a way that > will allow end user to create GenericStub-alike class without much trouble? > This can be done, but the hardest part here is not to make the runtime changes, but to get the support of all static type checkers (they need to understand what GenericStub means and add all the necessary special casing). -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Dec 1 05:32:11 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 20:32:11 +1000 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> Message-ID: On 1 December 2017 at 19:15, Ronald Oussoren wrote: > Maybe, but how would this work with super()? Super walks the MRO of type of > the instance, but skips the class on the MRO. This is not equivalent to > walking the MRO of the second class on the MRO when you use multiple > inheritance, > > This also has some other disadvantages. The first is that > tp.__getdescriptor__ would replace the default behaviour for the entire MRO > and it would be possible to have different behavior for classes on the MRO. > The second, minor. one is that __getdescriptor__ would have to reimplement > the default logic of walking the MRO, but that logic is fairly trivial. I believe those can both be addressed by structuring the override a little differently, and putting it at the level of individual attribute retrieval on a specific class (rather than on all of its subclasses): def _PyType_Lookup(tp, name): mro = tp.mro() assert isinstance(mro, tuple) for base in mro: assert isinstance(base, type) try: getdesc = base.__dict__["__getdescriptor__"] except KeyError: try: return base.__dict__[name] except KeyError: pass else: try: return getdesc(tp, base, name) except AttributeError: pass return None In that version, the __getdescriptor__ signature would be: def __getdescriptor__(dynamic_cls, base_cls, attr): ... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Dec 1 05:36:18 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 20:36:18 +1000 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: <20171201101745.GJ22248@ando.pearwood.info> References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201101745.GJ22248@ando.pearwood.info> Message-ID: On 1 December 2017 at 20:17, Steven D'Aprano wrote: > If that's what you mean, then no, I haven't wished for that. Unless I'm > missing something, it seems pointless. When, and why, would I want to > import an empty module? Having access to something along these lines is the core building block for lazy loading. 
You figure out everything you need to actually load the module up front (so you still get an immediate ImportError if the module doesn't even exist), but then defer actually finishing the load to the first __getattr__ invocation (so if you never actually use the module, you avoid any transitive imports, as well as any other costs of initialising it). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Fri Dec 1 05:37:14 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 1 Dec 2017 21:37:14 +1100 Subject: [Python-ideas] [Python-Dev] What's the status of PEP 505: None-aware operators? In-Reply-To: <44f6f218-de45-6b33-d7eb-99b98dea4e35@mail.mipt.ru> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <20171129060843.GZ22248@ando.pearwood.info> <44f6f218-de45-6b33-d7eb-99b98dea4e35@mail.mipt.ru> Message-ID: <20171201103713.GL22248@ando.pearwood.info> On Thu, Nov 30, 2017 at 08:02:08PM +0300, Ivan Pozdeev via Python-ideas wrote: > My experience with these operators in C# says: > * They do save "more than a few keystrokes". Even more importantly, they > allow to avoid double evaluation or the need for a temporary variable > workaround that are inherent in " if else " > ??? * (An alternative solution for the latter problem would be an > assignment expression, another regularly rejected proposal.) > * They make it temptingly easy and implicit to ignore errors. How? > * They are alien to Python's standard semantics on search failure which > is to raise an exception rather than return None Alien, like this? py> mo = re.match(r'\d:', 'abc') py> mo is None True Besides, who said this is limited to searching? I don't remember even a single example in the PEP being about searching. -- Steve From ronaldoussoren at mac.com Fri Dec 1 06:04:00 2017 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 01 Dec 2017 12:04:00 +0100 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> Message-ID: > On 1 Dec 2017, at 11:32, Nick Coghlan wrote: > > On 1 December 2017 at 19:15, Ronald Oussoren wrote: > >> Maybe, but how would this work with super()? Super walks the MRO of type of >> the instance, but skips the class on the MRO. This is not equivalent to >> walking the MRO of the second class on the MRO when you use multiple >> inheritance, >> >> This also has some other disadvantages. The first is that >> tp.__getdescriptor__ would replace the default behaviour for the entire MRO >> and it would be possible to have different behavior for classes on the MRO. >> The second, minor. one is that __getdescriptor__ would have to reimplement >> the default logic of walking the MRO, but that logic is fairly trivial. 
> > I believe those can both be addressed by structuring the override a > little differently, and putting it at the level of individual > attribute retrieval on a specific class (rather than on all of its > subclasses): > > def _PyType_Lookup(tp, name): > mro = tp.mro() > assert isinstance(mro, tuple) > > for base in mro: > assert isinstance(base, type) > > try: > getdesc = base.__dict__["__getdescriptor__"] > except KeyError: > try: > return base.__dict__[name] > except KeyError: > pass > else: > try: > return getdesc(tp, base, name) > except AttributeError: > pass > > return None > > In that version, the __getdescriptor__ signature would be: > > def __getdescriptor__(dynamic_cls, base_cls, attr): > ? That?s basically what?s in the PEP, except that the PEP says that type will implement __getdescriptor__ to make cooperative subclassing easier. The current patch inlines type.__getdescriptor__ in the lookup code for efficiency reasons (that is, the runtime cost for not using this feature is basically a pointer test instead of a function call), but would work just as well without inlining. I?m pretty sure that the first concern isn?t really there, in the end attribute/descriptor resolution ends up in object and type whose implementation must be magic in some way, even without this PEP. The second question is more a design question: what?s the better design, having __getdescriptor__ as a class method on classes or as method on metaclasses? Either one would work, but a class method appears to be easier to use and with the introduction of __init_subclass__ there is a precedent for going for a class method. The current PEP claims that a method on a metaclass would be better to avoid subtle problems, but ignores the conceptual cost of adding a metaclass. The subtle problem is that a class can have two direct superclasses with a __getdescriptor__ when using multiple inheritance, but that can already be an issue for other methods and that currently includes __getattribute__ for most of not all usecases where __getdescriptor__ would be useful. Ronald From ncoghlan at gmail.com Fri Dec 1 06:29:10 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 21:29:10 +1000 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> Message-ID: On 1 December 2017 at 21:04, Ronald Oussoren wrote: > The second question is more a design question: what?s the better design, having __getdescriptor__ as a class method on classes or as method on metaclasses? Either one would work, but a class method appears to be easier to use and with the introduction of __init_subclass__ there is a precedent for going for a class method. > > The current PEP claims that a method on a metaclass would be better to avoid subtle problems, but ignores the conceptual cost of adding a metaclass. The subtle problem is that a class can have two direct superclasses with a __getdescriptor__ when using multiple inheritance, but that can already be an issue for other methods and that currently includes __getattribute__ for most of not all usecases where __getdescriptor__ would be useful. I think it's having it being a method on the metaclass that creates the infinite regress Mark was worried about: since type's metaclass *is* type, if "__getdescriptor__" is looked up as a regular descriptor in its own right, then there's no base case to terminate the recursive lookup. 
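A quick interactive illustration (nothing more than a sketch) of why that regress has no natural base case - type is its own metaclass, so a hook defined on metaclasses would always have to be found via the very lookup it is supposed to customise:

    >>> class Meta(type):
    ...     pass
    ...
    >>> class C(metaclass=Meta):
    ...     pass
    ...
    >>> type(C) is Meta
    True
    >>> type(Meta) is type
    True
    >>> type(type) is type    # the metaclass chain bottoms out on itself
    True
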
By contrast, defining it as a class method opens up two options: 1. Truly define it as a class method, and expect implementors to call super().__getdescriptor__() if their own lookup fails. I think this will be problematic and a good way to get the kinds of subtle problems that prompted you to initially opt for the metaclass method. 2. Define it as a class method, but have the convention be for the *caller* to worry about walking the MRO, and hence advise class implementors to *never* call super() from __getdescriptor__ implementations (since doing so would nest MRO walks, and hence inevitably have weird outcomes). Emphasise this convention by passing the current base class from the MRO as the second argument to the method. The reason I'm liking option 2 is that it leaves the existing __getattribute__ implementations fully in charge of the MRO walk, and *only* offers a way to override the "base.__dict__[name]" part with a call to "base.__dict__['__getdescriptor__'](cls, base, name)" instead. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Dec 1 08:40:52 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 1 Dec 2017 23:40:52 +1000 Subject: [Python-ideas] PEP 505 vs matrix multiplication In-Reply-To: <20171130194956.73732307@fsol> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <20171129060843.GZ22248@ando.pearwood.info> <20171130194956.73732307@fsol> Message-ID: On 1 December 2017 at 04:49, Antoine Pitrou wrote: > On Wed, 29 Nov 2017 18:14:36 +1000 > Nick Coghlan wrote: >> >> As far as utility goes, I put it in a similar category to matrix >> multiplication: if you don't need it, you don't need it, but when you >> do need it, you need it a *lot*. > > As someone who appreciates both the matrix multiplication operator and > "async/await", I really don't think PEP 505-style operators (regardless > of their spellings) fall into the same conceptual bucket. > > There's no risk of matrix multiplication operators bleeding into > non-domain specific code, and the readers of domain specific code > already know about the matrix multiplication operator and what it > does (or they should anyway, since it's so damn useful). It's like > "async/await": you won't find them in regular non-async code, so the > mental burden only falls on specialists who write and read event-driven > networking code (mostly, even though Guido would like to see parsers > based on the idiom too :-)). Conversely, PEP 505-style operators may > appear in everyday code regardless of their application domain or > target. This in turn increases the mental burden for *everyone*. I genuinely don't think these kinds of operators are all that useful outside the specific domain of working with semi-structured hierarchical data stored in graph databases and document stores like MongoDB, ElasticSearch, and PostgreSQL JSONB columns, or else piping data between such stores and JSON consuming clients. If there was a high change of their being broadly adopted outside those domains, I don't think we'd be seeing the sharp division of opinion that we see between folks that consider these operators to be obviously useful, and those that are honestly befuddled as to why on earth anyone would ever want them. 
It's just that where matrix multiplication and async programming have rich vocabularies and computer science foundations to draw on, the None-aware and None-severing operators in different languages arise more from the pragmatic hackery of working with semi-structured data for tasks that are essentially a matter of reshaping blobs of JSON from one system to feed into another (it's an imperative approach to the kind of work that XSLT does for XML in a more declarative way). The closest mathematical equivalent is a quiet NaN, but the PEP already discusses some of the limitations of pursuing that approach for algorithmic operations in Python: https://www.python.org/dev/peps/pep-0505/#haskell-style-maybe I think the PEP as currently written honestly goes too far into symbolic magic, and hence doesn't give a reader enough hints to plausibly guess what "?." means if they've never seen it before: return jsonify( first_seen=site.first_seen?.isoformat(), id=site.id, is_active=site.is_active, last_seen=site.last_seen?.isoformat(), url=site.url.rstrip('/') ) Thus the idea of possibly using "??" as a pronoun symbol (akin to "_" at the interactive prompt) to allow both the condition and the RHS in a conditional expression to refer to the LHS: return jsonify( first_seen = site.first_seen if ?? is None else ??.isoformat(), id=site.id, is_active = site.is_active, last_seen = site.last_seen if ?? is None else ??.isoformat(), url = site.url.rstrip('/') ) Here, even someone who's never seen "??" before has at least some chance of guessing "OK, it looks like some kind of implicitly defined variable reference. What might it be referring to? Well, the code calls a method on it if it isn't None, so perhaps it means the LHS of the conditional expression it appears in?". And the transcription to English would probably use an actual pronoun: "We set first_seen to site.first_seen if that's None, otherwise we set it to the result of site.first_seen's isoformat() method" Further suggesting a potential name for the symbol: a "that" reference. (Where precisely what "that" refers to will depend on where the symbol appears, similar to regular pronoun usage in English). It isn't the same way that languages that use " ? : " for their conditional expression syntax do things, but spelling conditional expressions as " if else " is already pretty unique in its own right :) "That" references could also be expanded to comprehensions and generator expressions in a fairly useful way: [f(x) for x in iterable if ?? is not None] Pronounced something like "f of x, if that's not None, for x in iterable". Cheers, Nick. P.S. As previously noted, I don't think we should rush into anything for 3.7 on this point, hence my deferral of all the related PEPs, rather than requesting pronouncement. I do think the symbolic pronoun idea is potentially worth exploring further for 3.8 though. 
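P.P.S. For anyone who wants to get a feel for the proposed semantics without new syntax, a rough pure-Python stand-in (the helper name is invented purely for illustration) is just an ordinary function:

    def maybe(value, transform):
        # Approximates "transform(??) if ?? is not None else ??" for a
        # single step: leave None alone, otherwise apply the transform.
        return value if value is None else transform(value)

    # e.g. first_seen = maybe(site.first_seen, lambda dt: dt.isoformat())

It doesn't short-circuit longer attribute chains the way a real "that" reference would, but it's enough to survey how often the pattern actually shows up in a given code base.
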
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Fri Dec 1 09:16:36 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 1 Dec 2017 14:16:36 +0000 Subject: [Python-ideas] PEP 505 vs matrix multiplication In-Reply-To: References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <20171129060843.GZ22248@ando.pearwood.info> <20171130194956.73732307@fsol> Message-ID: On 1 December 2017 at 13:40, Nick Coghlan wrote: > I genuinely don't think these kinds of operators are all that useful > outside the specific domain of working with semi-structured > hierarchical data stored in graph databases and document stores like > MongoDB, ElasticSearch, and PostgreSQL JSONB columns, or else piping > data between such stores and JSON consuming clients. In that case, surely there are 3rd party libraries that help with extracting such data from raw objects? Or if not, how seriously has anyone looked at developing one? With the ability to create specialised getattr and getitem behaviour, is it really so difficult to produce a class that allows users to extract hierarchical data? I know it probably couldn't do as good a job as if there were dedicated syntax, but as a basis for a proposal that said "current best practice (using module XXX) looks like this, but would be improved with the following language support" it would help to ground the discussion in real use cases. In the context of comparisons with matrix multiplication, PEP 465 put a lot of time into explaining how all the ways of approaching the problem short of a language change had been tried and found wanting. Maybe PEP 505 should be held to a similar standard? At the moment, 99% of the discussion seems rooted in generalised "it would help a lot of code" with readability arguments based on artificial examples, and that's not really helping move the discussion forward. To be clear, I understand the problem of reading semi-structured data. I've hit it myself and been frustrated by it. But my reaction was "why am I not able to find a library that does this?", and when I couldn't find such a library, my assumption was that people in general don't find the current behaviour sufficiently frustrating to do anything about it. And I was in the same situation - it annoys me, but not enough to write a helper module (and certainly not enough that I'm crying out for a language change). So I do appreciate the need, I just don't think "language change" should be the first thing that's suggested. Paul PS Some of the above may have been covered in the PEPs and previous discussions. I haven't reread them - but any serious reboot of the discussion should probably start with a summary of where we're up to. From vano at mail.mipt.ru Fri Dec 1 09:51:44 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Fri, 1 Dec 2017 17:51:44 +0300 Subject: [Python-ideas] Add a dict with the attribute access capability In-Reply-To: <5A208405.6030407@canterbury.ac.nz> References: <3d3c8f2c-0722-1012-6d15-a5706f79ee55@mail.mipt.ru> <5A208405.6030407@canterbury.ac.nz> Message-ID: On 01.12.2017 1:19, Greg Ewing wrote: > Ivan Pozdeev via Python-ideas wrote: >> I needed to hold an external function reference in an object instance >> (if I assigned it to an attribute, it was converted into an instance >> method). > > No, that only happens to functions stored in *class* attributes, > not instance attributes. > > >>> class A: > ...??? pass > ... > >>> a = A() > >>> > >>> def f(): > ...??? 
print("I'm just a function") > ... > >>> a.x = f > >>> a.x() > I'm just a function > Well, yes, that was a singleton class, so I kept data in the class object. Now I can simplify the code by only keeping the instance reference in the class, thank you. (Without knowing this, that bore no visible benefits.) -- Regards, Ivan From brent.bejot at gmail.com Fri Dec 1 10:23:37 2017 From: brent.bejot at gmail.com (brent bejot) Date: Fri, 1 Dec 2017 10:23:37 -0500 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201101745.GJ22248@ando.pearwood.info> Message-ID: I have found myself implementing something like this before. I was working on a command-line tool with nested sub-commands. Each sub-command would import a script and execute something out of it. I ended up moving the importing of those little scripts into the functions that called them because importing all of them was slowing things down. A built-in lazy importer would have made for a better solution. On Fri, Dec 1, 2017 at 5:36 AM, Nick Coghlan wrote: > On 1 December 2017 at 20:17, Steven D'Aprano wrote: > > If that's what you mean, then no, I haven't wished for that. Unless I'm > > missing something, it seems pointless. When, and why, would I want to > > import an empty module? > > Having access to something along these lines is the core building > block for lazy loading. You figure out everything you need to actually > load the module up front (so you still get an immediate ImportError if > the module doesn't even exist), but then defer actually finishing the > load to the first __getattr__ invocation (so if you never actually use > the module, you avoid any transitive imports, as well as any other > costs of initialising it). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ronaldoussoren at mac.com Fri Dec 1 10:12:29 2017 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 01 Dec 2017 16:12:29 +0100 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> Message-ID: <9FAC22C4-F679-4033-82F3-988571332B06@mac.com> > On 1 Dec 2017, at 12:29, Nick Coghlan wrote: > > On 1 December 2017 at 21:04, Ronald Oussoren wrote: >> The second question is more a design question: what?s the better design, having __getdescriptor__ as a class method on classes or as method on metaclasses? Either one would work, but a class method appears to be easier to use and with the introduction of __init_subclass__ there is a precedent for going for a class method. >> >> The current PEP claims that a method on a metaclass would be better to avoid subtle problems, but ignores the conceptual cost of adding a metaclass. The subtle problem is that a class can have two direct superclasses with a __getdescriptor__ when using multiple inheritance, but that can already be an issue for other methods and that currently includes __getattribute__ for most of not all usecases where __getdescriptor__ would be useful. 
> > I think it's having it being a method on the metaclass that creates > the infinite regress Mark was worried about: since type's metaclass > *is* type, if "__getdescriptor__" is looked up as a regular descriptor > in its own right, then there's no base case to terminate the recursive > lookup. But type.__getattribute__ is already special, it cannot be the same as its superclass implementation (because that?s object), and object.__getattribute__ logically uses type.__getattribute__ to get at type.__dict__. Adding __getdescriptor__ in the mix makes type.__getattribute__ a bit messier, but not by much. > > By contrast, defining it as a class method opens up two options: > > 1. Truly define it as a class method, and expect implementors to call > super().__getdescriptor__() if their own lookup fails. I think this > will be problematic and a good way to get the kinds of subtle problems > that prompted you to initially opt for the metaclass method. The only subtle problem is having a class using multiple inheritance that uses two __getdescriptor__ implementations from two superclasses, where both do something beyond looking up the name in __dict__. Failing to call super().__getdescriptor__() is similar to failing to do so for other methods. > > 2. Define it as a class method, but have the convention be for the > *caller* to worry about walking the MRO, and hence advise class > implementors to *never* call super() from __getdescriptor__ > implementations (since doing so would nest MRO walks, and hence > inevitably have weird outcomes). Emphasise this convention by passing > the current base class from the MRO as the second argument to the > method. But that?s how I already define the method, that is the PEP proposes to change the MRO walking loop to: for cls in mro_list: try: return cls.__getdescriptor__(name) # was cls.__dict__[name] except AttributeError: # was KeyError pass Note that classes on the MRO control how to try to fetch the name at that level. The code is the same for __getdescriptor__ as a classmethod and as a method on the metaclass. I don?t think there?s a good technical reason to pick either option, other than that the metaclass option forces an exception when creating a class that inherits (using multiple inheritance) from two classes that have a custom __getdescriptor__. I?m not convinced that this is a good enough reason to go for the metaclass option. > > The reason I'm liking option 2 is that it leaves the existing > __getattribute__ implementations fully in charge of the MRO walk, and > *only* offers a way to override the "base.__dict__[name]" part with a > call to "base.__dict__['__getdescriptor__'](cls, base, name)" instead. Right. That?s why I propose __getdescriptor__ in the first place. This allows Python coders to do extra work (or different work) to fetch an attribute of a specific class in the MRO and works both with regular attribute lookup as well as lookup through super(). The alternative I came up with before writing the PEP is to add a special API that can be used by super(), but that leads to more code duplication as coders would have to implement both __getattribute__ and this other method. I guess another options would be a method that does the work including walking the MRO, but that leads to more boilerplate for users of the API. BTW. I haven?t had a lot of time to work on the implementation. The code in typeobject.c has changed enough that this needs more work than tweaking the patch until it applies cleanly. 
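To make the intended semantics concrete, here is a small pure-Python model of that loop (only a sketch of the behaviour being proposed, not the actual C implementation, and the helper names are made up):

    def type_lookup(tp, name):
        # Each class on the MRO gets a chance to resolve the name for
        # itself; classes without the hook fall back to a plain
        # __dict__ lookup, which is what happens today.
        for cls in tp.__mro__:
            if '__getdescriptor__' in vars(cls):
                try:
                    return cls.__getdescriptor__(name)
                except AttributeError:
                    continue
            try:
                return vars(cls)[name]
            except KeyError:
                pass
        return None

    class Virtual:
        # Toy user of the hook: "publishes" names ending in '_proxy'
        # without ever storing them in the class __dict__.
        @classmethod
        def __getdescriptor__(cls, name):
            if name.endswith('_proxy'):
                return staticmethod(lambda: name)
            raise AttributeError(name)

    # type_lookup(Virtual, 'spam_proxy') -> the staticmethod descriptor
    # type_lookup(Virtual, 'other')      -> None (no class on the MRO has it)
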
Ronald From chris.barker at noaa.gov Fri Dec 1 12:40:51 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 1 Dec 2017 09:40:51 -0800 Subject: [Python-ideas] PEP 505 vs matrix multiplication In-Reply-To: References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <20171129060843.GZ22248@ando.pearwood.info> <20171130194956.73732307@fsol> Message-ID: On Fri, Dec 1, 2017 at 6:16 AM, Paul Moore wrote: > > I genuinely don't think these kinds of operators are all that useful > > outside the specific domain of working with semi-structured > > hierarchical data stored in graph databases and document stores like > > MongoDB, ElasticSearch, and PostgreSQL JSONB columns, or else piping > > data between such stores and JSON consuming clients. > > In that case, surely there are 3rd party libraries that help with > extracting such data from raw objects? Sure -- it's handled by validation libraries like Colander, for instance: https://docs.pylonsproject.org/projects/colander/en/latest/ And I'd be shocked if there weren't similar functionality built in to PyMongo and such. Which makes a lot of sense -- if you want to enforce any kind of schema, you need to specify which fields are optional, and handle those cases. So this is perhaps largely about making it easier to write such libraries. Though the other really common use case is None default parameters: def fun(self, par1, par2, par3=None): if par3 is None: self.par3 = something_mutable is a really common idiom -- though not all the verbose. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From nas-python-ideas at arctrix.com Fri Dec 1 13:12:26 2017 From: nas-python-ideas at arctrix.com (Neil Schemenauer) Date: Fri, 1 Dec 2017 12:12:26 -0600 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: References: <20171201081337.c7fry5encm2nc4ob@python.ca> Message-ID: <20171201181226.at7ugd5ks4yyo7ro@python.ca> On 2017-12-01, Chris Angelico wrote: > Can you elaborate on where this is useful, please? Introspection tools, for example, might want to look at the module without executing it. Also, it is a building block to make lazy loading of modules work. As Nick points out, importlib can do this already. Currently, the IMPORT_NAME both loads the code for a module and also executes it. The exec happens fairly deep in the guts of importlib. This makes import.c and ceval.c mutually recursive. The locking gets complicated. There are hacks like _call_with_frames_removed() to hide the recursion going on. Instead, we could have two separate opcodes, one that gets the module but does not exec it (i.e. a function like __import__() that returns a future) and another opcode that actually does the execution. Figuring out all the details is complicated. Possible benefits: - importlib is simpler - reduce the amount of stack space used (removing recursion by "continuation passing style"). - makes profiling Python easier. Tools like valgrind get confused by call cycle between ceval.c and import.c. - easier to implement lazy loading of modules (not necessarily a standard Python feature but will make 3rd party implementations cleaner) I'm CCing Brett as I'm sure he has thoughts on this, given his intimate knowledge of importlib. 
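For reference, the lazy-loading piece can already be approximated at the Python level today with importlib.util.LazyLoader; this is roughly the pattern the importlib docs suggest (a sketch, not tested here):

    import importlib.util
    import sys

    def lazy_import(name):
        # Resolve the module and create the module object up front, but
        # defer executing its body until the first attribute access.
        spec = importlib.util.find_spec(name)
        loader = importlib.util.LazyLoader(spec.loader)
        spec.loader = loader
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        loader.exec_module(module)
        return module

Splitting the work at the opcode level as described above would presumably make this sort of thing cheaper and less dependent on wrapping loaders.
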
To me, it seems like __import__() has a terribly complicated API because it does so many different things. Maybe two opcodes is not even enough. Maybe we should have one to resolve relative imports (i.e. import.c:resolve_name), one to load but not exec a module given its absolute name (i.e. _find_and_load() without the exec), one to exec a loaded module, one or more to handle the horror of "fromlist" (i.e. _handle_fromlist()). Regards, Neil From c at anthonyrisinger.com Fri Dec 1 16:04:27 2017 From: c at anthonyrisinger.com (C Anthony Risinger) Date: Fri, 1 Dec 2017 15:04:27 -0600 Subject: [Python-ideas] How assignment should work with generators? In-Reply-To: <20171130040439.GE22248@ando.pearwood.info> References: <20171127135502.GF22248@ando.pearwood.info> <20171127153550.GI22248@ando.pearwood.info> <5A1C8879.2050105@canterbury.ac.nz> <5A1F32FC.2010502@canterbury.ac.nz> <20171130040439.GE22248@ando.pearwood.info> Message-ID: On Nov 29, 2017 10:09 PM, "Steven D'Aprano" wrote: > On Thu, Nov 30, 2017 at 11:21:48AM +1300, Greg Ewing wrote: > > > It seems that many people think about unpacking rather > > differently from the way I do. I think the difference > > is procedural vs. declarative. > > > > To my way of thinking, something like > > > > a, b, c = x > > > > is a pattern-matching operation. It's declaring that > > x is a sequence of three things, and giving names to > > those things. It's not saying to *do* anything to x. > > I hadn't thought of that interpretation before, but now that Greg > mentions it, its so obvious-in-hindsight that I completely agree with > it. I think that we should promote this as the "one obvious" > interpretation. > > Obviously as a practical matter, there are some x (namely iterators) > where you cannot extract items without modifying x, but in all other > cases I think that the pattern-matching interpretation is superiour. > This conversation about suitcases, matching, and language assumptions is interesting. I've realized two concrete things about how I understand unpacking, and perhaps, further explain the dissonance we have here: * Unpacking is destructuring not pattern matching. * Tuple syntax is commas, paren, one, or both. For the former, destructuring, this reply conveys my thoughts verbatim: https://groups.google.com/forum/#!topic/clojure/SUoThs5FGvE "There are two different concerns in what people refer to as "pattern matching": binding and flow-control. Destructuring only addresses binding. Pattern matching emphasizes flow control, and some binding features typically come along for free with whatever syntax it uses. (But you could in principle have flow control without binding.)" The only part of unpacking that is 'pattern matching' is the fact that it blows up spectacularly when the LHS doesn't perfectly match the length of RHS, reversing flow via exception: >>> 0,b = 0,1 File "", line 1 SyntaxError: can't assign to literal If Python really supported pattern matching (which I would 100% love! yes please), and unpacking was pattern matching, the above would succeed because zero matches zero. Pattern matching is used extensively in Erlang/Elixir for selecting between various kinds of clauses (function, case, etc), but you also see *significant* use of the `[a, b | _] = RHS` construct to ignore "the remainder" because 99% of the time what you really want is to [sometimes!] match a few things, bind a few others, and ignore what you don't understand or need. 
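To make that concrete with today's Python (interpreter session, purely for illustration):

    >>> a, b = [1, 2, 3]            # exact-length match is enforced
    Traceback (most recent call last):
      ...
    ValueError: too many values to unpack (expected 2)
    >>> a, b, *_ = [1, 2, 3]        # destructure a prefix, ignore the rest
    >>> a, b
    (1, 2)

The starred target is the closest current spelling of the Erlang/Elixir `[a, b | _]` idiom, but it still consumes the whole right-hand side, which is exactly what hurts with infinite or expensive iterators.
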
This is why destructuring Elixir maps or JS objects never expect (or even support AFAIK) exact-matching the entire object... it would render this incredible feature next to useless! *Destructuring is opportunistic if matching succeeds*. For the latter, tuples-are-commas-unless-they-are-parens :-), I suspect I'm very much in the minority here. While Python is one of my favorite languages, it's only 1 of 10, and I didn't learn it until I was already 4 languages deep. It's easy to forget how odd tuples are because they are so baked in, but I've had the "well, ehm, comma is the tuple constructor... usually" or "well, ehm, you are actually returning 1 tuple... not 2 things" conversation with *dozens* of seasoned developers. Even people professionally writing Python for months or more. Other languages use more explicit, visually-bounded object constructors. This makes a meaningful difference in how a human's intuition interprets the meaning of a new syntax. *Objects start and end but commas have no inherent boundaries*. These two things combined lead to unpacking problems because I look at all assignments through the lens of destructuring (first observation) and unpacking almost never uses parentheses (second observation). To illustrate this better, the following is how my mind initially parses different syntax contrasted with what's actually happening (and thus the correction I've internalized over a decade): >>> a, b = 5, 6 CONCEPT ... LHS[0] `a` bound to RHS[0] `5`, LHS[1] `b` bound to RHS[1] `6`. REALITY ... LHS[:] single tuple destructured to RHS[:] single tuple. >>> a, b = 5, 6, 7 CONCEPT ... LHS[0] `a` bound to RHS[0] `5`, LHS[1] `b` bound to RHS[1] `6`, RHS[2] `6` is unbound expression. REALITY ... LHS[:] single tuple destructured to RHS[:] single tuple, bad match, RAISE ValueError! >>> (a, b) = 5, 6 >>> [a, b] = 5, 6 CONCEPT ... LHS[0] `(a, b)` bound to RHS[0] `5`, bad match, RAISE TypeError! REALITY ... LHS[:] single tuple destructured to RHS[:] single tuple. >>> a, b = it >>> [a], b = it CONCEPT ... LHS[0] `a` bound with RHS[0] `it`, LHS[1] is bad match, RAISE UnboundLocalError/NameError! REALITY ... `a` bound to `it[0]` and `b` bound to `it[1]` (`it[N]` for illustration only!) The tuple thing in particular takes non-zero time to internalize. I consider it one of Python's warts, attributed to times explained and comparisons with similar languages. Commas at the top-level, with no other construction-related syntax, look like expression groups or multiple returns. You have to already know Python's implicit tuple quirks to rationalize what it's really doing. This helps explain why I suggested the `LHS0, LHS1 = *RHS` syntax, because it would read "expand RHS[0] into RHS[:]". Thanks, -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Dec 1 16:55:11 2017 From: brett at python.org (Brett Cannon) Date: Fri, 01 Dec 2017 21:55:11 +0000 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: <20171201181140.5dmb525bl4reg2va@python.ca> References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201181140.5dmb525bl4reg2va@python.ca> Message-ID: On Fri, 1 Dec 2017 at 10:11 Neil Schemenauer wrote: > On 2017-12-01, Chris Angelico wrote: > > Can you elaborate on where this is useful, please? > > Introspection tools, for example, might want to look at the module > without executing it. Also, it is a building block to make lazy loading > of modules work. 
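To make the lazy-loading angle concrete, here is a minimal sketch built only from importlib's public pieces (assuming the module can be found by the normal finders; error handling omitted):

    import importlib.util
    import sys

    def lazy_import(name):
        # Create the module object now, but defer executing its body
        # until the first attribute access.
        spec = importlib.util.find_spec(name)
        spec.loader = importlib.util.LazyLoader(spec.loader)
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        spec.loader.exec_module(module)
        return module

    json = lazy_import("json")   # json's module body has not run yet
    json.dumps({})               # first attribute access triggers the real exec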
As Nick points out, importlib can do this already. > > Currently, the IMPORT_NAME both loads the code for a module and also > executes it. The exec happens fairly deep in the guts of importlib. > This makes import.c and ceval.c mutually recursive. The locking gets > complicated. There are hacks like _call_with_frames_removed() to hide > the recursion going on. > > Instead, we could have two separate opcodes, one that gets the module > but does not exec it (i.e. a function like __import__() that returns a > future) and another opcode that actually does the execution. Figuring > out all the details is complicated. > > Possible benefits: > > - importlib is simpler > > - reduce the amount of stack space used (removing recursion by > "continuation passing style"). > > - makes profiling Python easier. Tools like valgrind get confused > by call cycle between ceval.c and import.c. > > - easier to implement lazy loading of modules (not necessarily a > standard Python feature but will make 3rd party implementations > cleaner) > > I'm CCing Brett as I'm sure he has thoughts on this, given his intimate > knowledge of importlib. To me, it seems like __import__() has a > terribly complicated API because it does so many different things. > I have always assumed the call signature for __import__() was because the import-related opcodes pushed so much logic into the function instead of doing it in opcodes (I actually blogged about this at https://snarky.ca/if-i-were-designing-imort-from-scratch/). Heck, the thing takes in locals() and yet never uses them (and its use of globals() is restricted to specific values so it really doesn't need to be quite so broad). Basically I wished __import__() looked like importlib.import_module(). > > Maybe two opcodes is not even enough. Maybe we should have one to > resolve relative imports (i.e. import.c:resolve_name), one to load but > not exec a module given its absolute name (i.e. _find_and_load() > without the exec), one to exec a loaded module, one or more to handle > the horror of "fromlist" (i.e. _handle_fromlist()). > I have always wanted to at least break up getting the module and fromlist as separate opcodes, so +1 for that. Name resolution could potentially be done as an opcode as it relies on execution state pulled from the globals of the module, but the logic also isn't difficult so +0 for that (i.e. making an opcode that calls something more like importlib.import_module() is more critical to me than eliminating the 'package' argument to that call, but I don't view it as a bad thing to have another opcode for that either). As for the completely separating the loading and execution, I don't have a need for what's being proposed so I don't have an opinion. I basically made sure Eric Snow structured specs so that lazy loading as currently supported works so I got what I wanted for basic lazy importing (short of the PyPI package I keep talking about writing to add a nicer API around lazy importing :) . -Brett > > Regards, > > Neil > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Fri Dec 1 17:46:48 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 2 Dec 2017 09:46:48 +1100 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201101745.GJ22248@ando.pearwood.info> Message-ID: <20171201224648.GM22248@ando.pearwood.info> On Fri, Dec 01, 2017 at 10:23:37AM -0500, brent bejot wrote: > I have found myself implementing something like this before. I was working > on a command-line tool with nested sub-commands. Each sub-command would > import a script and execute something out of it. I ended up moving the > importing of those little scripts into the functions that called them > because importing all of them was slowing things down. A built-in lazy > importer would have made for a better solution. If I understand your use-case, you have a bunch of functions like this: def spam_subcommand(): import spam spam.command() def eggs_subcommand(): import eggs eggs.command() With lazy importing, you might have something like this: spam = lazy_import('spam') eggs = lazy_import('eggs') def spam_subcommand(): load(spam) spam.command() def eggs_subcommand(): load(eggs) eggs.command() I don't see the benefit for your use-case. How would it be better? Have I missed something? -- Steve From python at mrabarnett.plus.com Fri Dec 1 20:15:18 2017 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 2 Dec 2017 01:15:18 +0000 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: <20171201224648.GM22248@ando.pearwood.info> References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201101745.GJ22248@ando.pearwood.info> <20171201224648.GM22248@ando.pearwood.info> Message-ID: <172eb8ef-562b-65b1-0898-433774c5d3ab@mrabarnett.plus.com> On 2017-12-01 22:46, Steven D'Aprano wrote: > On Fri, Dec 01, 2017 at 10:23:37AM -0500, brent bejot wrote: > >> I have found myself implementing something like this before. I was working >> on a command-line tool with nested sub-commands. Each sub-command would >> import a script and execute something out of it. I ended up moving the >> importing of those little scripts into the functions that called them >> because importing all of them was slowing things down. A built-in lazy >> importer would have made for a better solution. > > If I understand your use-case, you have a bunch of functions like this: > > def spam_subcommand(): > import spam > spam.command() > > def eggs_subcommand(): > import eggs > eggs.command() > > > With lazy importing, you might have something like this: > > spam = lazy_import('spam') > eggs = lazy_import('eggs') > > def spam_subcommand(): > load(spam) > spam.command() > > def eggs_subcommand(): > load(eggs) > eggs.command() > > > I don't see the benefit for your use-case. How would it be better? Have > I missed something? > You don't think you'd need the 'load'; you'd delay execution of the module's code until the first attribute access. All of the script's module dependencies would be listed at the top, but you could avoid most of the cost of importing a module until you know that you need the module's functionality. From greg.ewing at canterbury.ac.nz Sat Dec 2 16:51:05 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 03 Dec 2017 10:51:05 +1300 Subject: [Python-ideas] How assignment should work with generators? 
In-Reply-To: References: <20171127135502.GF22248@ando.pearwood.info> <20171127153550.GI22248@ando.pearwood.info> <5A1C8879.2050105@canterbury.ac.nz> <5A1F32FC.2010502@canterbury.ac.nz> <20171130040439.GE22248@ando.pearwood.info> Message-ID: <5A232049.5080805@canterbury.ac.nz> C Anthony Risinger wrote: > * Unpacking is destructuring not pattern matching. We're just arguing about the definition of terms now. The point I was making is that unpacking is fundamentally a declarative construct, or at least that's how I think about it. I used the term "pattern matching" because that's something unambiguously declarative. Terms like "unpacking" and "destructuring" can be misleading to the uninitiated, because they sound like they're doing something destructive to the original object. > * Tuple syntax is commas, paren, one, or both. The only situation where parentheses make a tuple is the case of the 0-tuple. Even then, you could argue that the tuple is really the empty space between the parens, and the parens are only there to make it visible. :-) I agree that this is out of step with mathematical convention, but it does make multiple-value returns look nice. There you don't really want to have to think about the fact that there's a tuple involved. -- Greg From ncoghlan at gmail.com Sat Dec 2 21:58:02 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Dec 2017 12:58:02 +1000 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: <9FAC22C4-F679-4033-82F3-988571332B06@mac.com> References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> <9FAC22C4-F679-4033-82F3-988571332B06@mac.com> Message-ID: On 2 December 2017 at 01:12, Ronald Oussoren wrote: > On 1 Dec 2017, at 12:29, Nick Coghlan wrote: >> 2. Define it as a class method, but have the convention be for the >> *caller* to worry about walking the MRO, and hence advise class >> implementors to *never* call super() from __getdescriptor__ >> implementations (since doing so would nest MRO walks, and hence >> inevitably have weird outcomes). Emphasise this convention by passing >> the current base class from the MRO as the second argument to the >> method. > > But that?s how I already define the method, that is the PEP proposes to change > the MRO walking loop to: > > for cls in mro_list: > try: > return cls.__getdescriptor__(name) # was cls.__dict__[name] > except AttributeError: # was KeyError > pass > > Note that classes on the MRO control how to try to fetch the name at that level. The code is the same for __getdescriptor__ as a classmethod and as a method on the metaclass. That's not exactly the same as what I'm suggesting, and it's the part that has Mark concerned about an infinite regression due to the "cls.__getdescriptor__" subexpression. 
What I'm suggesting: try: getdesc = base.__dict__["__getdescriptor__"] except KeyError: # Use existing logic else: try: getdesc(cls, base, name) except AttributeError: pass * Neither type nor object implement __getdescriptor__ * Calling super() in __getdescriptor__ would be actively discouraged without a base class to define the cooperation rules * If it's missing in the base class dict, fall back to checking the base class dict directly for the requested attribute * cls is injected into the call by the MRO walking code *not* the normal bound method machinery * Only "base.__dict__" needs to be assured of getting a hit on every base class What's currently in the PEP doesn't clearly define how it thinks "cls.__getdescriptor__" should work without getting itself into an infinite loop. > I don?t think there?s a good technical reason to pick either option, other than that the metaclass option forces an exception when creating a class that inherits (using multiple inheritance) from two classes that have a custom __getdescriptor__. I?m not convinced that this is a good enough reason to go for the metaclass option. I'm still not clear on how you're planning to break the recursive loop for "cls.__getdescriptor__" when using a metaclass. >> The reason I'm liking option 2 is that it leaves the existing >> __getattribute__ implementations fully in charge of the MRO walk, and >> *only* offers a way to override the "base.__dict__[name]" part with a >> call to "base.__dict__['__getdescriptor__'](cls, base, name)" instead. > > Right. That?s why I propose __getdescriptor__ in the first place. This allows Python coders to do extra work (or different work) to fetch an attribute of a specific class in the MRO and works both with regular attribute lookup as well as lookup through super(). Yeah, I'm definitely in agreement with the intent of the PEP. I'm just thinking that we should aim to keep the expected semantics of the hook as close as we can to the semantics of the code it's replacing, rather than risking the introduction of potentially nested MRO walks (we can't outright *prevent* the latter, but we *can* quite clearly say "That's almost certainly a bad idea, avoid it if you possibly can"). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Dec 2 22:13:48 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Dec 2017 13:13:48 +1000 Subject: [Python-ideas] PEP 505 vs matrix multiplication In-Reply-To: References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <20171129060843.GZ22248@ando.pearwood.info> <20171130194956.73732307@fsol> Message-ID: On 2 December 2017 at 00:16, Paul Moore wrote: > At the moment, 99% of the discussion seems rooted in generalised "it > would help a lot of code" with readability arguments based on > artificial examples, and that's not really helping move the discussion > forward. PEP 505 covers this in https://www.python.org/dev/peps/pep-0505/#motivating-examples (with an actual survey of standard library code, as well as some specific motivating examples from real world open source projects). > > To be clear, I understand the problem of reading semi-structured data. > I've hit it myself and been frustrated by it. But my reaction was "why > am I not able to find a library that does this?", and when I couldn't > find such a library, my assumption was that people in general don't > find the current behaviour sufficiently frustrating to do anything > about it. 
And I was in the same situation - it annoys me, but not > enough to write a helper module (and certainly not enough that I'm > crying out for a language change). So I do appreciate the need, I just > don't think "language change" should be the first thing that's > suggested. The problem is that libraries don't have any way to manipulate attribute access chains in a usefully relevant way - you either have to put the control flow logic in line, or start writing helper functions like: def maybe_traverse_particular_subtree(obj): if obj is None: return None return obj.particular.subtree.of.interest And there are lots of other tricks used to make such code reasonably readable, with one of the most common being "Use a short placeholder variable name", so you get code like: fs = site.first_seen first_seen = fs if fs is None else fs.isodate() (from one of the proposed refactorings of the SiteView example in the PEP) Another variant on the "?? as pronoun" idea would be to use it as a new syntax for defining single argument lambda expressions: def call_if_not_none(lhs, deferred_rhs): return lhs if lhs is not None else deferred_rhs(lhs) first_seen = call_if_not_none(site.first_seen, (??.isodate()) However, I think that would actually be less clear and more confusing than the inline implicit pronoun idea. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Dec 2 22:22:03 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Dec 2017 13:22:03 +1000 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201181140.5dmb525bl4reg2va@python.ca> Message-ID: On 2 December 2017 at 07:55, Brett Cannon wrote: > As for the completely separating the loading and execution, I don't have a > need for what's being proposed so I don't have an opinion. I basically made > sure Eric Snow structured specs so that lazy loading as currently supported > works so I got what I wanted for basic lazy importing (short of the PyPI > package I keep talking about writing to add a nicer API around lazy > importing :) . In PEP 451 terms, I can definitely see the value in having CREATE_MODULE and EXEC_MODULE be separate opcodes (rather than having them be jammed together in IMPORT_MODULE the way they are now). While there'd still be some import machinery on the frame stack when the module code ran (due to the way the "exec_module" API is defined), there'd be substantially less of it. There'd be some subtleties around handling backwards compatibility with __import__ overrides (essentially, CREATE_MODULE would have to revert to doing all the work, while EXEC_MODULE would become a no-op), but the basic idea seems plausible. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Dec 2 22:26:52 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Dec 2017 13:26:52 +1000 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201181140.5dmb525bl4reg2va@python.ca> Message-ID: On 3 December 2017 at 13:22, Nick Coghlan wrote: > On 2 December 2017 at 07:55, Brett Cannon wrote: >> As for the completely separating the loading and execution, I don't have a >> need for what's being proposed so I don't have an opinion. 
I basically made >> sure Eric Snow structured specs so that lazy loading as currently supported >> works so I got what I wanted for basic lazy importing (short of the PyPI >> package I keep talking about writing to add a nicer API around lazy >> importing :) . > > In PEP 451 terms, I can definitely see the value in having > CREATE_MODULE and EXEC_MODULE be separate opcodes (rather than having > them be jammed together in IMPORT_MODULE the way they are now). While > there'd still be some import machinery on the frame stack when the > module code ran (due to the way the "exec_module" API is defined), > there'd be substantially less of it. > > There'd be some subtleties around handling backwards compatibility > with __import__ overrides (essentially, CREATE_MODULE would have to > revert to doing all the work, while EXEC_MODULE would become a no-op), > but the basic idea seems plausible. Re-reading my own post reminded me of another potentially harder problem: IMPORT_MODULE also hides all the import cache management from the eval loop. If you try to split creation and execution apart, then that cache management becomes the eval loop's problem (since it needs to know whether the module is already fully initialised or not after the "GET_OR_CREATE_MODULE" step. That cache locking is fairly intricate already, and exposing these to the eval loop as distinct operations wouldn't make that any easier. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nas-python-ideas at arctrix.com Sat Dec 2 22:37:53 2017 From: nas-python-ideas at arctrix.com (Neil Schemenauer) Date: Sat, 2 Dec 2017 21:37:53 -0600 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201181140.5dmb525bl4reg2va@python.ca> Message-ID: <20171203033753.gny7sgmjhetm3rmp@python.ca> On 2017-12-03, Nick Coghlan wrote: > There'd be some subtleties around handling backwards compatibility > with __import__ overrides (essentially, CREATE_MODULE would have to > revert to doing all the work, while EXEC_MODULE would become a no-op), > but the basic idea seems plausible. Right now (half-baked ideas), I'm thinking: IMPORT_RESOLVE Gives the abs_name for a module (to feed to _find_and_load()) IMPORT_LOAD Calls _find_and_load() with abs_name as argment. The body of the module is not executed yet. Could return a spec or a module with the spec that contains the code object of the body. IMPORT_EXEC Executes the body of the module. IMPORT_FROM Calls _handle_fromlist(). Props to Brett for making importlib in such as way that this clean separation should be relatively easy to do. To handle custom __import__ hook, I think we can do the following. Have each opcode detect if __import__ is overridden. There is already such test (import_name fast path). If it is overridden, IMPORT_RESOLVE and IMPORT_LOAD will gather up info and then IMPORT_EXEC will call __import__() using compatible arguments. Inititally, the benefit of making these changes is not some performance improvement or some functionalty we didn't previously have. importlib does all this already and probably just as quickly. The benefit that the import system becomes more understandable. If we decide it is a good idea, we could expose hooks for these opcodes. Not like __import__ though. Maybe there should be a function like sys.set_import_hook(, func). That will keep ceval fast as it will know if there is a hook or not, without having to crawl around in builtins. 
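For anyone who wants to see the load/exec split in today's terms, the rough Python-level equivalent (purely illustrative; none of the opcodes above exist) is:

    import importlib.util
    import sys

    spec = importlib.util.find_spec("json")          # roughly IMPORT_RESOLVE + IMPORT_LOAD
    module = importlib.util.module_from_spec(spec)   # module object exists, body not run
    sys.modules["json"] = module
    # ... later, only if the module is actually needed:
    spec.loader.exec_module(module)                  # roughly IMPORT_EXEC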
Regards, Neil From storchaka at gmail.com Sun Dec 3 01:58:11 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 3 Dec 2017 08:58:11 +0200 Subject: [Python-ideas] Provide a way to import module without exec body In-Reply-To: <20171201181226.at7ugd5ks4yyo7ro@python.ca> References: <20171201081337.c7fry5encm2nc4ob@python.ca> <20171201181226.at7ugd5ks4yyo7ro@python.ca> Message-ID: 01.12.17 20:12, Neil Schemenauer ????: > On 2017-12-01, Chris Angelico wrote: >> Can you elaborate on where this is useful, please? > > Introspection tools, for example, might want to look at the module > without executing it. Also, it is a building block to make lazy loading > of modules work. As Nick points out, importlib can do this already. > > Currently, the IMPORT_NAME both loads the code for a module and also > executes it. The exec happens fairly deep in the guts of importlib. > This makes import.c and ceval.c mutually recursive. The locking gets > complicated. There are hacks like _call_with_frames_removed() to hide > the recursion going on. > > Instead, we could have two separate opcodes, one that gets the module > but does not exec it (i.e. a function like __import__() that returns a > future) and another opcode that actually does the execution. Figuring > out all the details is complicated. The IMPORT_NAME opcode is highly optimized. In most cases it just looks up in sys.modules and check that the module is not imported right now. I suppose two opcodes will hit performance. And I don't see how this could simplify the code. I suppose the existing importlib machinery already supports loading modules without executing them. Maybe not with a single function, but with a combination of 2-3 methods. But what you want to get? The source? The code object? What about modules implemented in C? From ronaldoussoren at mac.com Sun Dec 3 06:36:41 2017 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Sun, 03 Dec 2017 12:36:41 +0100 Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__ In-Reply-To: References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com> <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com> <9FAC22C4-F679-4033-82F3-988571332B06@mac.com> Message-ID: <64AD6932-4B7A-4E3D-BA81-06A4FD5916CA@mac.com> > On 3 Dec 2017, at 03:58, Nick Coghlan wrote: > > On 2 December 2017 at 01:12, Ronald Oussoren wrote: >> On 1 Dec 2017, at 12:29, Nick Coghlan wrote: >>> 2. Define it as a class method, but have the convention be for the >>> *caller* to worry about walking the MRO, and hence advise class >>> implementors to *never* call super() from __getdescriptor__ >>> implementations (since doing so would nest MRO walks, and hence >>> inevitably have weird outcomes). Emphasise this convention by passing >>> the current base class from the MRO as the second argument to the >>> method. >> >> But that?s how I already define the method, that is the PEP proposes to change >> the MRO walking loop to: >> >> for cls in mro_list: >> try: >> return cls.__getdescriptor__(name) # was cls.__dict__[name] >> except AttributeError: # was KeyError >> pass >> >> Note that classes on the MRO control how to try to fetch the name at that level. The code is the same for __getdescriptor__ as a classmethod and as a method on the metaclass. > > That's not exactly the same as what I'm suggesting, and it's the part > that has Mark concerned about an infinite regression due to the > "cls.__getdescriptor__" subexpression. > > What I'm suggesting: > > try: > getdesc = base.__dict__["__getdescriptor__?] 
> except KeyError:
>     # Use existing logic
> else:
>     try:
>         getdesc(cls, base, name)
>     except AttributeError:
>         pass

I honestly don't understand why that's better than what is in the PEP, other than the way to locate __getdescriptor__. In particular: why change the signature of __getdescriptor__ from __getdescriptor__(base, name) to __getdescriptor__(base, cls, name)?

And to keep things clear, what are 'cls' and 'base'? Based on your initial proposal I'm assuming that 'cls' is the type of the object whose attribute we're looking up, and 'base' is the current class on the MRO that we're looking at, that is:

    def _PyType_Lookup(cls, name):
        mro = cls.mro()

        for base in mro:
            ...

Getting back to the way __getdescriptor__ is accessed: Locating this using base.__dict__['__getdescriptor__'] instead of base.__getdescriptor__ makes it clear that this method is not accessed using normal attribute lookup, and matches the (currently non-functional) patch that accesses slots on the type object from C code. The current implementation accesses the slot on the meta type, that is type(base).__dict__['__getdescriptor__'], but that doesn't fundamentally change the mechanics.

My proposal would then end up as:

    for base in mro:
        try:
            getdescr = base.__dict__['__getdescriptor__']
        except KeyError:
            try:
                return base.__dict__[name]
            except KeyError:
                pass
        else:
            return getdescr(base, name)

This is for the classmethod version of the PEP that I'm currently preferring.

The way the __getdescriptor__ implementation is located also might explain the confusion: I'm reasoning based on the implementation and that doesn't match the PEP text in this regard. I didn't get how this affected comprehension of the proposal, but think I'm getting there :-)
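To illustrate what the classmethod version would let a class author write, a toy sketch only -- this hook exists in no released Python, and the exact lookup rules and signature are part of what is being discussed here:

    class Proxied:
        _extra = {"answer": 42}   # stand-in for attributes stored somewhere else

        @classmethod
        def __getdescriptor__(cls, name):
            try:
                return cls.__dict__[name]                 # what the lookup does today
            except KeyError:
                try:
                    return cls.__dict__["_extra"][name]   # look somewhere else instead
                except KeyError:
                    raise AttributeError(name)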
>
>> I don't think there's a good technical reason to pick either option, other than that the metaclass option forces an exception when creating a class that inherits (using multiple inheritance) from two classes that have a custom __getdescriptor__. I'm not convinced that this is a good enough reason to go for the metaclass option.
>
> I'm still not clear on how you're planning to break the recursive loop
> for "cls.__getdescriptor__" when using a metaclass.
>
>>> The reason I'm liking option 2 is that it leaves the existing
>>> __getattribute__ implementations fully in charge of the MRO walk, and
>>> *only* offers a way to override the "base.__dict__[name]" part with a
>>> call to "base.__dict__['__getdescriptor__'](cls, base, name)" instead.
>>
>> Right. That's why I propose __getdescriptor__ in the first place. This allows Python coders to do extra work (or different work) to fetch an attribute of a specific class in the MRO and works both with regular attribute lookup as well as lookup through super().
>
> Yeah, I'm definitely in agreement with the intent of the PEP. I'm just
> thinking that we should aim to keep the expected semantics of the hook
> as close as we can to the semantics of the code it's replacing, rather
> than risking the introduction of potentially nested MRO walks (we
> can't outright *prevent* the latter, but we *can* quite clearly say
> "That's almost certainly a bad idea, avoid it if you possibly can").

Does my updated proposal (base.__dict__['__getdescriptor__']) address this issue? My intent with the PEP was indeed to stay as close as possible to the current behaviour and just replace peeking into base.__dict__ by calling an overridable special method on base.

The alternative is to make it possible to use something that isn't builtin.dict as the __dict__ for types, but that requires significant changes to CPython because the C code currently assumes that __dict__ is an instance of exactly builtin.dict and changing that is likely significantly more work than just replacing PyDict_GetItem by PyMapping_GetItem (different failure modes, different refcount rules, different re-entrancy, ...)

BTW. Thanks for feedback, I'm finally starting to understand how the PEP isn't clear enough and how to fix that.

Ronald

From ncoghlan at gmail.com Sun Dec 3 09:03:49 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 4 Dec 2017 00:03:49 +1000
Subject: [Python-ideas] PEP 447: Adding type.__getdescriptor__
In-Reply-To: <64AD6932-4B7A-4E3D-BA81-06A4FD5916CA@mac.com>
References: <69FE6318-7D46-4EA2-B37D-7488B6CDFCE4@mac.com>
 <5DFB36D0-62FC-4974-9DC3-F6AE40B29020@mac.com>
 <9FAC22C4-F679-4033-82F3-988571332B06@mac.com>
 <64AD6932-4B7A-4E3D-BA81-06A4FD5916CA@mac.com>
Message-ID:

On 3 December 2017 at 21:36, Ronald Oussoren wrote:
>
>
>> On 3 Dec 2017, at 03:58, Nick Coghlan wrote:
>>
>> On 2 December 2017 at 01:12, Ronald Oussoren wrote:
>>> On 1 Dec 2017, at 12:29, Nick Coghlan wrote:
>>>> 2. Define it as a class method, but have the convention be for the
>>>> *caller* to worry about walking the MRO, and hence advise class
>>>> implementors to *never* call super() from __getdescriptor__
>>>> implementations (since doing so would nest MRO walks, and hence
>>>> inevitably have weird outcomes). Emphasise this convention by passing
>>>> the current base class from the MRO as the second argument to the
>>>> method.
>>> >>> But that?s how I already define the method, that is the PEP proposes to change >>> the MRO walking loop to: >>> >>> for cls in mro_list: >>> try: >>> return cls.__getdescriptor__(name) # was cls.__dict__[name] >>> except AttributeError: # was KeyError >>> pass >>> >>> Note that classes on the MRO control how to try to fetch the name at that level. The code is the same for __getdescriptor__ as a classmethod and as a method on the metaclass. >> >> That's not exactly the same as what I'm suggesting, and it's the part >> that has Mark concerned about an infinite regression due to the >> "cls.__getdescriptor__" subexpression. >> >> What I'm suggesting: >> >> try: >> getdesc = base.__dict__["__getdescriptor__?] > >> except KeyError: >> # Use existing logic >> else: >> try: >> getdesc(cls, base, name) >> except AttributeError: >> pass > > I honestly don?t understand why that?s better than what is in the PEP, other than the way to locate __getdescriptor__. It's the specifically the way we locate __getdescriptor__ that I'm interested in :) > In particular: why change the signature of __getdescriptor__ from __getdescriptor__(base, name) to __getdescriptor__(base, cls, name)? So that the __getdescriptor__ implementation in the base has access to the class currently being accessed. That way you can have a common algorithm in the base class, but then tune that algorithm based on class level attributes on the class being defined. My inspiration for the suggestion is a combination of the signature of https://docs.python.org/3/reference/datamodel.html#object.__get__ (where you always get passed the class that holds the property, but are only sometimes passed the instance being accessed), and the signature of super() itself (which you have to pass both the dynamic class, and the current position in the MRO). > And to keep things clear, what are ?cls? and ?base?? Based on your initial proposal I?m assuming that ?cls? is the type of the object whose attribute we?re looking up, and ?base? is the current class on the MRO that we?re looking at, that is: > > def _PyType_Lookup(cls, name): > mro = cls.mro() > > for base in mro: > ? Yep (I wrote out the full version a few posts further up the thread, but only quoted the inner snippet here). > Getting back to the way __getdescriptor__ is accessed: Locating this using base.__dict__[?__getdescriptor__?] instead of base.__getdescriptor__ makes it clear that this method is not accessed using normal attribute lookup, and matches the (currently non-functional) patch that access slots on the type object from C code. The current implementation access the slot on the meta type, that is type(base).__dict__[?__getdescriptor__?], but that doesn?t fundamentally change the mechanics. > > My proposal would then end up as: > > for base in mro: > try: > getdescr = base.__dict__[?__getdescriptor__?] > except KeyError: > try: > return base.__dict__[name] > except KeyError: > pass > > else: > getdesc(base, name) > > This is for the classmethod version of the PEP that I?m currently preferring. > > The way the __getdescriptor__ implementation is located also might explain the confusion: I?m reasoning based on the implementation and that doesn?t match the PEP text in this regard. I didn?t get how this affected comprehension of the proposal, but think I?m getting there :-) Yep, I was going off the description in the PEP, which just shows a normal "base.__getdescriptor__" access. 
If the implementation already doesn't do that and instead works as you suggest above, then that's a good thing, and we're actually already in agreement about how it should work - the PEP just needs to be updated to say that :) >> * Neither type nor object implement __getdescriptor__ > > I?m not convinced that this is strictly necessary, but also am not opposed to this. > > Another reason for not implementing __getdescriptor__ on type and object is that this means that the existence of the methods cannot confuse users (as this is a pretty esoteric API that most users will never have to touch). > > BTW. My patch more or less inlines the default __getdescriptor__, but that?s for performance reasons. Not implementing __getdescriptor__ on object and type would avoid possible confusion about this, and would make it clearer that the attribute lookup cache is still valid (and removing that cache would definitely be a bad idea) Right, I think the null case here is to *not* include them, and then see if that causes any insurmountable problems. I don't expect it will, which would leave us with the simpler option of omitting them. >> * Calling super() in __getdescriptor__ would be actively discouraged >> without a base class to define the cooperation rules >> * If it's missing in the base class dict, fall back to checking the >> base class dict directly for the requested attribute >> * cls is injected into the call by the MRO walking code *not* the >> normal bound method machinery > > As mentioned above I don?t understand why the change in interface is needed, in particular why __getdescriptor__ needs to have access to the original class and not just the current class on the MRO. I think we could likely live without it, but it's easy to pass in, and adding it later would be difficult. If you use the signature order "(cls, base, name)", it also means that __getdescriptor__ implementations will behave more like regular class methods, even though they're not actually looked up that way. >> Yeah, I'm definitely in agreement with the intent of the PEP. I'm just >> thinking that we should aim to keep the expected semantics of the hook >> as close as we can to the semantics of the code it's replacing, rather >> than risking the introduction of potentially nested MRO walks (we >> can't outright *prevent* the latter, but we *can* quite clearly say >> "That's almost certainly a bad idea, avoid it if you possibly can?). > > Does my updated proposal (base.__dict__[?__getdescriptor__?]) adres this issue? My intend with the PEP was indeed to stay as close as possible to the current behaviour and just replace peeking into base.__dict__ by calling an overridable special method on base. Aye, it's sounding like it really is just the "cls.__getdescriptor__" short hand in the PEP that was confusing me, and the way you're intending to implement it matches the way I'm suggesting it should work. > The alternative is to make it possible to use something that isn?t builtin.dict as the __dict__ for types, but that requires significant changes to CPython because the C code currently assumes that __dict__ is an instance of exactly builtin.dict and changing that is likely significantly more work than just replacing PyDict_GetItem by PyMapping_GetItem (different failure modes, different refcount rules, different re-entrancy, ?) I agree with that - changing the execution namespace is one thing, but changing the actual storage underlying cls.__dict__ would be a far more difficult prospect. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From chris.barker at noaa.gov Sun Dec 3 18:06:02 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Sun, 3 Dec 2017 15:06:02 -0800 Subject: [Python-ideas] a sorting protocol dunder method? Message-ID: I can't believe this hasn't been brought up before, but searching the web, and python-ideas, and all the PEPs has found nothing (could be my lame google-fu), so here goes: Recent python has moved toward a "key" function for customized sorting: list.sort(key=key_fun) key is also used (according to https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in: min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby() with this fairly broad use, it seems it's becoming a fairly universal protocol for ordering. However, if you are writing a custom class, and want to make it "sortable", you need to define (some of) the total comparison operators, which presumably are then called O(n logn) number of times for comparisons when sorting. Or provide a sort key function when you actually do the sorting, which requires some inside knowledge of the objects you are sorting. But what if there was a sort key magic method: __key__ or __sort_key__ (or whatever) that would be called by the sorting functions if: no key function was specified and it exists It seems this would provide a easy way to make custom classes sortable that would be nicer for end users (not writing key functions), and possibly more performant in the "usual" case. In fact, it's striking me that there may well be classes that are defining the comparison magic methods not because they want the objects to "work" with the comparison operators, but because that want them to work with sort and min, and max, and... hmm, perhaps a __key__ method could even be used by the comparison operators, though that could result in pretty weird results when comparing two different types. So: has this already been brought up and rejected? Am I imagining the performance benefits? Is sorting-related functionally too special-case to deserve a protocol? Thoughts? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From neatnate at gmail.com Sun Dec 3 18:46:45 2017 From: neatnate at gmail.com (Nathan Schneider) Date: Sun, 3 Dec 2017 18:46:45 -0500 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: Message-ID: On Sun, Dec 3, 2017 at 6:06 PM, Chris Barker wrote: > In fact, it's striking me that there may well be classes that are defining > the comparison magic methods not because they want the objects to "work" > with the comparison operators, but because that want them to work with sort > and min, and max, and... > An existence proof: in NLTK, an __lt__ method added purely to facilitate consistent sorting (in doctests) of structured data objects for which comparison operators do not really make conceptual sense: https://github.com/nltk/nltk/pull/1902/files#diff-454368f06fd635b1e06c9bb6d65bd19bR689 Granted, calling min() and max() on collections of these objects would not make conceptual sense either. Still, __sort_key__ would have been cleaner than __lt__. Cheers, Nathan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Sun Dec 3 19:34:20 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 4 Dec 2017 11:34:20 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: Message-ID: <20171204003419.GS22248@ando.pearwood.info> On Sun, Dec 03, 2017 at 03:06:02PM -0800, Chris Barker wrote: > Recent python has moved toward a "key" function for customized sorting: > > list.sort(key=key_fun) > > key is also used (according to > https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in: > > min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby() > > with this fairly broad use, it seems it's becoming a fairly universal > protocol for ordering. Be careful: there are two different concepts here, which are only loosely related: - ordering values, that is, whether or not we can say that x < y - sorting values in a collection. By default, we sort by the inherent order of the values. But if the values have no inherent order (they are unordered), we can sort unordered items in a collection by providing an appropriate key function. Hence why I say they are loosely related. For example, we can sort the normally unordered complex numbers by providing a key function: py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: (z.real, z.imag)) [1j, (1+8j), (3-2j), (5+2j)] But conceptually, I'm imposing an order on an otherwise unordered data type. Complex numbers inherently have no order: it makes no sense to say that 1+8j is less than 3-2j. But since the items in the list have to be in *some* one-dimensional order, I can choose whichever order makes sense for *this* collection: py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: abs(z)) [1j, (3-2j), (5+2j), (1+8j)] Another collection of the same values might be ordered differently. It doesn't make sense to put that functionality into the complex numbers themselves: complex numbers are unordered, and any order we impose on them comes from the collection, not the individual numbers. > However, if you are writing a custom class, and want to make it "sortable", > you need to define (some of) the total comparison operators, which > presumably are then called O(n logn) number of times for comparisons when > sorting. > > Or provide a sort key function when you actually do the sorting, which > requires some inside knowledge of the objects you are sorting. This is conflating the two distinct concepts: the comparison operators apply to the values in the collection; the key function applies to the collection itself (although it does need to have inside knowledge of the items in the collection). > But what if there was a sort key magic method: > > __key__ or __sort_key__ (or whatever) > > that would be called by the sorting functions if: > > no key function was specified > > and > > it exists > > It seems this would provide a easy way to make custom classes sortable that > would be nicer for end users (not writing key functions), and possibly more > performant in the "usual" case. I'm not sure how you can talk about performance of the __key__ method without knowing the implementation :-) If this __key__ method is called like __lt__, then the big O() behaviour will be the same, worst case, O(n log n). If it is called like the key function, then the big O() behaviour will be the same as the key function now. Either way, you're just changing the name of the function called, not how it is called. 
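To spell out the two calling conventions being contrasted (illustrative only, since no __key__ protocol exists today):

    # (a) used like a key function: computed once per element, O(n) calls
    sorted(items, key=lambda obj: obj.__key__())

    # (b) used like __lt__: re-evaluated inside every comparison,
    #     O(n log n) calls, roughly:
    #     item_a.__key__() < item_b.__key__()   # each time two items meet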
> In fact, it's striking me that there may well be classes that are defining > the comparison magic methods not because they want the objects to "work" > with the comparison operators, but because that want them to work with sort > and min, and max, and... It is conceivable, I suppose, but if I were designing an unordered data type (like complex) I wouldn't implement ordering operators to allow sorting, I'd provide a separate key function. But that assumes that there's only ever one way to order a collection of unordered items. I don't think that's a safe assumption. Again, look at complex above, there are at least three obvious ways: - order by magnitude; - order by real part first, then imaginary part; - order by the absolute values of the real and imaginary parts (so that 1+2j and -1-2j sort together). I don't think it makes sense to bake into an unodered data type a single way of ordering. If there was such a natural order, then the data type wouldn't be unordered and you should just define the comparison operators. > hmm, perhaps a __key__ method could even be used by the comparison > operators, though that could result in pretty weird results when comparing > two different types. If the comparison operators fell back to calling __key__ when __lt__ etc aren't defined, that would effectively force unordered types like complex to be ordered. > So: has this already been brought up and rejected? > > Am I imagining the performance benefits? Probably. > Is sorting-related functionally too special-case to deserve a protocol? Yes. -- Steve From steve at pearwood.info Sun Dec 3 19:57:21 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 4 Dec 2017 11:57:21 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: Message-ID: <20171204005721.GT22248@ando.pearwood.info> On Sun, Dec 03, 2017 at 06:46:45PM -0500, Nathan Schneider wrote: > On Sun, Dec 3, 2017 at 6:06 PM, Chris Barker wrote: > > > In fact, it's striking me that there may well be classes that are defining > > the comparison magic methods not because they want the objects to "work" > > with the comparison operators, but because that want them to work with sort > > and min, and max, and... > > > > An existence proof: in NLTK, an __lt__ method added purely to facilitate > consistent sorting (in doctests) of structured data objects for which > comparison operators do not really make conceptual sense: > https://github.com/nltk/nltk/pull/1902/files#diff-454368f06fd635b1e06c9bb6d65bd19bR689 This shows the problem with putting the key function into the data type. What if I want to sort AttrDicts by their list of keys instead? Or their (key, value) pairs? What is so special about sorting by ID (which may not even exist!) that it deserves to be part of the AttrDict itself? The practical answer is that it is a convenience for doctests, but it would have been almost as convenient for nltk to provide a convenience sorting function to hold that knowledge as to bake it into the AttrDict itself. Or a key function that you can pass to sorted. My solution to this would have been to add a key function to the class: @staticmethod def _id_order(item): # Convenience function for doctests return item['ID'] and then sort like this: sorted(list_of_attrdicts, key=AttrDict._id_order) A little less convenient, but conceptually cleaner and more explicit. 
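Much the same effect is also available from the stdlib, assuming the items really are mappings with an 'ID' key:

    from operator import itemgetter

    sorted(list_of_attrdicts, key=itemgetter('ID'))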
-- Steve From bruce at leban.us Sun Dec 3 21:53:45 2017 From: bruce at leban.us (Bruce Leban) Date: Sun, 3 Dec 2017 18:53:45 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204005721.GT22248@ando.pearwood.info> References: <20171204005721.GT22248@ando.pearwood.info> Message-ID: On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker wrote: > > However, if you are writing a custom class ... > > But what if there was a sort key magic method: > > __key__ or __sort_key__ (or whatever) > > that would be called by the sorting functions > > It seems this would provide a easy way to make custom classes sortable > that would be nicer for end users (not writing key functions), and possibly > more performant in the "usual" case. > On Sun, Dec 3, 2017 at 4:57 PM, Steven D'Aprano wrote: > > This shows the problem with putting the key function into the data type. > What if I want to sort AttrDicts by their list of keys instead? Or their > (key, value) pairs? What is so special about sorting by ID (which may > not even exist!) that it deserves to be part of the AttrDict itself? I think you're arguing against this for the wrong reason. Chris was talking about custom classes having the *option* of making them sortable by providing a key method in the class definition. This strikes me as useful and I can imagine having used this if it were available. What you're saying is that there are classes which probably shouldn't define a __sort_key__ function, which I quite agree with. But I don't think it's a good argument against this proposal. On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker wrote: > Am I imagining the performance benefits? > Maybe. Looking strictly at O(-) cost, there's no difference between a key function and comparison operators. Sure it might potentially only make O(n) calls to the key function and O(n log n) calls to compare the keys vs. O(n log n) calls to the comparator functions but that might not actually be faster. There certainly are cases where implementing a key function would be quite slow. --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Dec 3 22:14:19 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 3 Dec 2017 19:14:19 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204005721.GT22248@ando.pearwood.info> Message-ID: I'm not sure I understand the motivation to make elements *sortable* but not comparable. If an arbitrary order is still useful, I'd think you'd want to be able to tell how two particular elements *would* sort by asking a wrote: > > On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker > wrote: > >> >> However, if you are writing a custom class ... >> >> But what if there was a sort key magic method: >> >> __key__ or __sort_key__ (or whatever) >> >> that would be called by the sorting functions >> >> It seems this would provide a easy way to make custom classes sortable >> that would be nicer for end users (not writing key functions), and possibly >> more performant in the "usual" case. >> > > On Sun, Dec 3, 2017 at 4:57 PM, Steven D'Aprano > wrote: > >> >> This shows the problem with putting the key function into the data type. >> What if I want to sort AttrDicts by their list of keys instead? Or their >> (key, value) pairs? What is so special about sorting by ID (which may >> not even exist!) that it deserves to be part of the AttrDict itself? > > > I think you're arguing against this for the wrong reason. 
Chris was > talking about custom classes having the *option* of making them sortable > by providing a key method in the class definition. This strikes me as > useful and I can imagine having used this if it were available. What you're > saying is that there are classes which probably shouldn't define a > __sort_key__ function, which I quite agree with. But I don't think it's a > good argument against this proposal. > > > On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker > wrote: > >> Am I imagining the performance benefits? >> > > Maybe. Looking strictly at O(-) cost, there's no difference between a key > function and comparison operators. Sure it might potentially only make O(n) > calls to the key function and O(n log n) calls to compare the keys vs. O(n > log n) calls to the comparator functions but that might not actually be > faster. There certainly are cases where implementing a key function would > be quite slow. > > --- Bruce > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Dec 3 22:21:35 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 3 Dec 2017 19:21:35 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204005721.GT22248@ando.pearwood.info> Message-ID: And if this is a method on a custom *collection*, it can do whatever it wants in MyCollection.sort() already. On Dec 3, 2017 7:14 PM, "David Mertz" wrote: > I'm not sure I understand the motivation to make elements *sortable* but > not comparable. If an arbitrary order is still useful, I'd think you'd want > to be able to tell how two particular elements *would* sort by asking a > On Dec 3, 2017 6:55 PM, "Bruce Leban" wrote: > >> >> On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker >> wrote: >> >>> >>> However, if you are writing a custom class ... >>> >>> But what if there was a sort key magic method: >>> >>> __key__ or __sort_key__ (or whatever) >>> >>> that would be called by the sorting functions >>> >>> It seems this would provide a easy way to make custom classes sortable >>> that would be nicer for end users (not writing key functions), and possibly >>> more performant in the "usual" case. >>> >> >> On Sun, Dec 3, 2017 at 4:57 PM, Steven D'Aprano >> wrote: >> >>> >>> This shows the problem with putting the key function into the data type. >>> What if I want to sort AttrDicts by their list of keys instead? Or their >>> (key, value) pairs? What is so special about sorting by ID (which may >>> not even exist!) that it deserves to be part of the AttrDict itself? >> >> >> I think you're arguing against this for the wrong reason. Chris was >> talking about custom classes having the *option* of making them sortable >> by providing a key method in the class definition. This strikes me as >> useful and I can imagine having used this if it were available. What you're >> saying is that there are classes which probably shouldn't define a >> __sort_key__ function, which I quite agree with. But I don't think it's a >> good argument against this proposal. >> >> >> On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker >> wrote: >> >>> Am I imagining the performance benefits? >>> >> >> Maybe. Looking strictly at O(-) cost, there's no difference between a key >> function and comparison operators. 
Sure it might potentially only make O(n) >> calls to the key function and O(n log n) calls to compare the keys vs. O(n >> log n) calls to the comparator functions but that might not actually be >> faster. There certainly are cases where implementing a key function would >> be quite slow. >> >> --- Bruce >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Mon Dec 4 01:45:55 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 4 Dec 2017 08:45:55 +0200 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: Message-ID: 04.12.17 01:06, Chris Barker ????: > So: has this already been brought up and rejected? https://bugs.python.org/issue20632 > Am I imagining the performance benefits? This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item. And there will be an overhead even in the case when the __key__ method is not defined. > Is sorting-related functionally too special-case to deserve a protocol? Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method. It will be rather confusing if different methods of sorting produce inconsistent order. If the first item of the sorted list is not the smallest item of the list. But the idea of the class decorator looks more sane to me. From carl at oddbird.net Mon Dec 4 01:48:18 2017 From: carl at oddbird.net (Carl Meyer) Date: Sun, 3 Dec 2017 22:48:18 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: Message-ID: <39e8d1e5-0c4a-869e-bfd4-08890938df60@oddbird.net> I think this is an interesting idea, and I don't believe that either performance or "sortable vs comparable" are very relevant. I doubt there is much performance to gain here, and I think the default sort order for a class must continue to match its comparison behavior. I think the case in favor of this idea (slightly modified, so it no longer applies only to sorting) is mostly convenience and readability. Most often when I define equality and comparison dunder methods for a custom class, I'm effectively just deferring the comparison to some field or tuple of fields of the object. E.g. from functools import total_ordering @total_ordering class BankAccount: def __init__(self, balance): self.balance = balance def __eq__(self, other): if isinstance(other, BankAccount): return self.balance == other.balance return NotImplemented def __lt__(self, other): if isinstance(other, BankAccount): return self.balance < other.balance return NotImplemented It'd be nice to be able to eliminate an import and have the lines of code and instead write that as: class BankAccount: def __init__(self, balance): self.balance = balance def __sort_key__(self): return self.balance I would expect these two to give the same behavior: instances of BankAccount should still be fully comparable and sortable, with all of these operations effectively being deferred to comparisons and sorts of the sort key. Now for the cases against: 1. 
I made one important decision explicitly (twice, unfortunately) in the first code block that disappeared in the second: what "other" instances should be considered comparable to instances of BankAccount? Should it be decided structurally, like in the documentation example for `functools.total_ordering`? Should it be "any subclass of BankAccount"? Or maybe it should only be instances of BankAccount itself, not subclasses? (We just went around on this very question for PEP 557, dataclasses.) If Python added __sort_key__, it would have to just pick a behavior here, which would be unfortunate for cases where that behavior is wrong. Or maybe we could also add a __sort_allowed__ method... 2. There might actually be a performance cost here, since this wouldn't replace the existing rich comparison dunder methods, so it would add one more thing Python has to check when trying to compare two objects. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From njs at pobox.com Mon Dec 4 04:16:26 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 4 Dec 2017 01:16:26 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <39e8d1e5-0c4a-869e-bfd4-08890938df60@oddbird.net> References: <39e8d1e5-0c4a-869e-bfd4-08890938df60@oddbird.net> Message-ID: On Sun, Dec 3, 2017 at 10:48 PM, Carl Meyer wrote: > It'd be nice to be able to eliminate an import and have the lines of > code and instead write that as: > > class BankAccount: > def __init__(self, balance): > self.balance = balance > > def __sort_key__(self): > return self.balance What if we added a @key_ordering decorator, like @total_ordering but using __key__ to generate the comparisons? I know you'd have to do an import, but usually adding things to the core language requires more of a benefit than that :-). -n -- Nathaniel J. Smith -- https://vorpus.org From solipsis at pitrou.net Mon Dec 4 06:02:43 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Dec 2017 12:02:43 +0100 Subject: [Python-ideas] a sorting protocol dunder method? References: Message-ID: <20171204120243.4303b6e1@fsol> On Sun, 3 Dec 2017 15:06:02 -0800 Chris Barker wrote: > I can't believe this hasn't been brought up before, but searching the web, > and python-ideas, and all the PEPs has found nothing (could be my lame > google-fu), so here goes: > > Recent python has moved toward a "key" function for customized sorting: > > list.sort(key=key_fun) > > key is also used (according to > https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in: > > min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby() > > with this fairly broad use, it seems it's becoming a fairly universal > protocol for ordering. > [...] +1 from me. I would also be +1 on an optional class decorator that would generate all the ordering comparison methods (__lt__, etc.) based on the __key__ method definition. Regards Antoine. From solipsis at pitrou.net Mon Dec 4 06:06:38 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Dec 2017 12:06:38 +0100 Subject: [Python-ideas] a sorting protocol dunder method? References: Message-ID: <20171204120638.0f84a38a@fsol> On Mon, 4 Dec 2017 08:45:55 +0200 Serhiy Storchaka wrote: > 04.12.17 01:06, Chris Barker ????: > > So: has this already been brought up and rejected? > > https://bugs.python.org/issue20632 > > > Am I imagining the performance benefits? 
> > This will add an additional overhead. This will be even slower than > passing the key function, since you will need to look up the __key__ > method in every item. That is a reasonable objection. However, looking up a tp_XXX slot is very cheap. > Yes, it is too special-case. I don't see any advantages in comparison > with defining the __lt__ method. There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times). > It will be rather confusing if > different methods of sorting produce inconsistent order. If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__. And there could be a decorator that generates all comparison methods from __key__. Regards Antoine. From steve at pearwood.info Mon Dec 4 06:41:40 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 4 Dec 2017 22:41:40 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <39e8d1e5-0c4a-869e-bfd4-08890938df60@oddbird.net> References: <39e8d1e5-0c4a-869e-bfd4-08890938df60@oddbird.net> Message-ID: <20171204114139.GU22248@ando.pearwood.info> On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote: > I think this is an interesting idea, and I don't believe that either > performance or "sortable vs comparable" are very relevant. Performance is always relevant -- while performance shouldn't be the sole deciding factor, it should be a factor. And since the entire use-case for this is sorting versus comparison operators, I'm having trouble understanding why you think that sorting versus comparison operators is irrelevant. > I doubt there is much performance to gain here, I doubt there is any performance gain -- rather a performance hit is far more likely. > and I think the default sort order for > a class must continue to match its comparison behavior. This proposal changes that: if a class defines __key__ but not __lt__, then the default sort behaviour will be different from the comparison behaviour. If it defines both, it isn't clear which will be used for sorting. Should __lt__ take priority, or __key__? Whichever we choose, somebody is going to be upset and confused by the choice. > Most often when I define equality and comparison dunder methods for a > custom class, I'm effectively just deferring the comparison to some > field or tuple of fields of the object. E.g. > > from functools import total_ordering > > @total_ordering > class BankAccount: > def __init__(self, balance): > self.balance = balance [snip example] This example shows exactly the confusion of concepts I'm talking about. Why should bank accounts be sorted by their balance, instead of by their account number, or account name, or date they were opened? Why should BankAccounts sort differently according to their balance, a quantity which can change after every transaction? What happens when somebody decides that bank accounts default sorting should be by their account name rather than the ever-changing balance? I'm not saying that it never makes sense to sort a bunch of accounts according to their balance. I'm saying that functionality is not part of the account themselves. If it belongs anywhere, it belongs in the collection of accounts. And even that is dubious: I believe that where it really belongs is in the report generator that needs to sort the collection of accounts. 
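To illustrate with a minimal sketch (the account fields here are invented for illustration, not taken from Carl's example):

    from operator import attrgetter

    class BankAccount:
        def __init__(self, number, name, balance):
            self.number = number
            self.name = name
            self.balance = balance

    accounts = [BankAccount(3, "Carol", 250.0),
                BankAccount(1, "Alice", 990.0),
                BankAccount(2, "Bob", 120.0)]

    # Each report chooses its own ordering at the call site;
    # the class itself carries no sorting policy.
    by_balance = sorted(accounts, key=attrgetter("balance"))
    by_name = sorted(accounts, key=attrgetter("name"))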
> It'd be nice to be able to eliminate an import That's an argument that applies to *literally* everything in the standard library. Should we make everything a built-in? The prerequisites for eliminating the need for an import should be a *lot* higher than just "it would be nice". I disagree with the design of this: it is putting the decision of how to sort uncomparable objects in the wrong place, in the object itself rather than in the report that wants to sort them. -- Steve From steve at pearwood.info Mon Dec 4 07:16:11 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 4 Dec 2017 23:16:11 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204120638.0f84a38a@fsol> References: <20171204120638.0f84a38a@fsol> Message-ID: <20171204121610.GV22248@ando.pearwood.info> On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > There are definitely advantages. Sorting calls __lt__ for each > comparison (that is, O(n log n) times) while __key__ would only be > called once per item at the start (that is, O(n) times). Passing a key function doesn't magically turn a O(n log n) comparison sort into a O(n) sort. Once you've called the key function every time, you still have to *actually sort*, which will be up to O(n log n) calls to __lt__ on whatever __key__ returned. The key function is just a built-in version of the old "DSU" (Decorate-Sort-Undecorate) idiom: values = [(key(x), x) for x in values] values.sort() values = [t[1] for t in values] If you want this functionality, go right ahead and give your class a sortkey method, and then pass that as the explicit key function to sorted: sorted(collection_of_Spam, key=Spam.sortkey) That is nice and explicit, and the method only gets looked up once. It works in any version of Python, it is fully backwards-compatible, and it requires no support from the interpreter. I still think this is a poor object design, putting the key function in the object being sorted rather than in the report doing the sorting, but so long as this isn't blessed by the language as the One Obvious Way I don't mind so much. > > It will be rather confusing if > > different methods of sorting produce inconsistent order. > > If __key__ is inconsistent with __lt__, it is the same error as making > __lt__ inconsistent with __gt__. You seem to be assuming that __key__ is another way of spelling __lt__, rather than being a key function. If it does the same thing as __lt__, it is a comparison method, not a key function, and the name is horribly misleading. In any case, it isn't an error for __lt__ to be inconsistent with __gt__. py> {1, 2, 3, 4} < {2, 3, 4, 5} False py> {1, 2, 3, 4} > {2, 3, 4, 5} False Not all values have total ordering. > And there could be a decorator that generates all comparison methods > from __key__. Defining a single comparison method is not enough to define the rest. You need __eq__ and one comparison method. (Technically we could use __ne__ and one other, but __eq__ is usual. But why not just define __lt__ and use total_ordering, instead of defining two identical decorators that differ only in the name of the dunder method they use? The purpose of __key__ was supposed to be to eliminate the need to define __lt__. It seems ridiculous to then use __key__ to define __lt__ when the whole point of __key__ is to avoid needing to define __lt__. 
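For the record, here is a minimal, self-contained version of that explicit sortkey pattern (Spam and its fields are invented for illustration):

    class Spam:
        def __init__(self, quality, price):
            self.quality = quality
            self.price = price

        def sortkey(self):
            # An ordinary method -- no new protocol or interpreter support needed.
            return (self.quality, self.price)

    tins = [Spam(2, 1.50), Spam(1, 0.99), Spam(2, 1.25)]
    # In Python 3, Spam.sortkey is a plain function of the instance,
    # so it can be passed directly as the key function:
    tins.sort(key=Spam.sortkey)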
-- Steve From p.f.moore at gmail.com Mon Dec 4 07:19:07 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 4 Dec 2017 12:19:07 +0000 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204114139.GU22248@ando.pearwood.info> References: <39e8d1e5-0c4a-869e-bfd4-08890938df60@oddbird.net> <20171204114139.GU22248@ando.pearwood.info> Message-ID: On 4 December 2017 at 11:41, Steven D'Aprano wrote: > On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote: >> I think this is an interesting idea, and I don't believe that either >> performance or "sortable vs comparable" are very relevant. > > Performance is always relevant -- while performance shouldn't be the > sole deciding factor, it should be a factor. > > And since the entire use-case for this is sorting versus comparison > operators, I'm having trouble understanding why you think that sorting > versus comparison operators is irrelevant. I'm not completely clear on what the expectation is (in terms of "sortable vs comparable") here. Clearly if a class has __lt__, it's both sortable and comparable, and that's fine. If it doesn't have __lt__, then the implication is that the class designer doesn't believe it's reasonable for it to be ordered. That's what not having comparison methods *means* (well, excepting the case that the designer didn't think of it, which is probably the case for 99% of my classes ;-)) If we have a __key__ method on a class, then the following becomes true: * We can work out which of 2 instances is bigger/smaller using max/min. * We can compare two items by doing a sort. So while the *intent* may not be to allow comparisons, that's what you've done. As a result, I don't think it's an important consideration to worry about classes that "should be sortable, but shouldn't be orderable". That's basically a contradiction in terms, and will likely only come up in corner cases where for technical reasons you may need your instances to participate in sorting without raising exceptions, but you don't consider them orderable (the NLTK example mentioned above). Conversely, when sorting a key can provide significant performance improvements. A single O(n) pass to compute keys, followed by O(n log(n)) comparisons could be significantly faster, assuming comparing keys is faster than extracting them from the object. So allowing classes to define __key__ could be a performance win over a __lt__ defined as (effectively) def __lt__(self, other): return self.__key__() < other.__key__() Overall, I don't see any problem with the idea, although it's not something I've ever needed myself, and I doubt that in practice it will make *that* much difference. The main practical benefit, I suspect, would be if there were an "Orderable" ABC that auto-generated the comparison methods given either __lt__ or __key__ (I could have sworn there was such an ABC for __lt__ already, but I can't find it in the library ref :-() Paul From steve at pearwood.info Mon Dec 4 07:25:57 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 4 Dec 2017 23:25:57 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: Message-ID: <20171204122556.GW22248@ando.pearwood.info> On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote: > But the idea of the class decorator looks more sane to me. The purpose of __key__ is to define a key function (not a comparison operator) for classes that aren't orderable and don't have __lt__. 
If you're going to then go ahead and define __lt__ and the other comparison operators, there's no point to __key__. -- Steve From solipsis at pitrou.net Mon Dec 4 07:52:19 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Dec 2017 13:52:19 +0100 Subject: [Python-ideas] a sorting protocol dunder method? References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> Message-ID: <20171204135219.4f10050a@fsol> On Mon, 4 Dec 2017 23:16:11 +1100 Steven D'Aprano wrote: > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > > > There are definitely advantages. Sorting calls __lt__ for each > > comparison (that is, O(n log n) times) while __key__ would only be > > called once per item at the start (that is, O(n) times). > > Passing a key function doesn't magically turn a O(n log n) comparison > sort into a O(n) sort. Where did I say it did? > Once you've called the key function every time, you still have to > *actually sort*, which will be up to O(n log n) calls to __lt__ on > whatever __key__ returned. Yes... and the whole point is for __key__ to return something which is very cheap to compare, such that there are O(n) expensive calls to __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n) expensive calls to __lt__. > > If __key__ is inconsistent with __lt__, it is the same error as making > > __lt__ inconsistent with __gt__. > > You seem to be assuming that __key__ is another way of spelling __lt__, > rather than being a key function. It isn't. It's just supposed to be consistent with it, just like __hash__ is supposed to be consistent with __eq__, and noone reasonable chastises Python because it allows to define __hash__ independently of __eq__. Also please note Serhiy's sentence I was responding to: """It will be rather confusing if different methods of sorting produce inconsistent order.""" > In any case, it isn't an error for __lt__ to be inconsistent with > __gt__. [...] > Not all values have total ordering. As usual you should try to understand what people are trying to say instead of imagining they are incompetent. In the present case, it would definitely be an inconsistency (and a programming error) to have both `a < b` and `b < a` return true. > Defining a single comparison method is not enough to define the rest. How about you stick to the discussion? I'm talking about deriving comparison methods from __key__, not from another comparison method. Defining __key__ is definitely enough to define all comparison methods. Regards Antoine. From solipsis at pitrou.net Mon Dec 4 07:56:31 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Dec 2017 13:56:31 +0100 Subject: [Python-ideas] a sorting protocol dunder method? References: <39e8d1e5-0c4a-869e-bfd4-08890938df60@oddbird.net> <20171204114139.GU22248@ando.pearwood.info> Message-ID: <20171204135631.288672bb@fsol> On Mon, 4 Dec 2017 12:19:07 +0000 Paul Moore wrote: > On 4 December 2017 at 11:41, Steven D'Aprano wrote: > > On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote: > >> I think this is an interesting idea, and I don't believe that either > >> performance or "sortable vs comparable" are very relevant. > > > > Performance is always relevant -- while performance shouldn't be the > > sole deciding factor, it should be a factor. > > > > And since the entire use-case for this is sorting versus comparison > > operators, I'm having trouble understanding why you think that sorting > > versus comparison operators is irrelevant. 
> > I'm not completely clear on what the expectation is (in terms of > "sortable vs comparable") here. It's quite clear if you read what Chris Barker posted originally (and which I agree with). We're talking about deriving comparison methods from a key function *while* making sorting potentially faster, because the costly reduction operation happens O(n) times instead of O(n log n) times. Steven OTOH seems to be inventing controversies just for the sake of posting a rant. Regards Antoine. From steve at pearwood.info Mon Dec 4 08:16:14 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 5 Dec 2017 00:16:14 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204005721.GT22248@ando.pearwood.info> Message-ID: <20171204131614.GX22248@ando.pearwood.info> On Sun, Dec 03, 2017 at 06:53:45PM -0800, Bruce Leban wrote: > I think you're arguing against this for the wrong reason. Chris was talking > about custom classes having the *option* of making them sortable by > providing a key method in the class definition. I never imagined that it would be a required method. Of course it is optional. But by adding interpreter support, we're blessing something which I think is a misguided design choice as the One Obvious Way. We're taking something which belongs in the report generator or collection, the knowledge of how to sort a collection of unordered values, and baking it into the values themselves. (Effectively making them ordered!) That's the wrong design. Your report needs to know about your values, your values shouldn't need to know how the report is formatted. Its like saying that you want an Email object, and a SMTP_Server object, but to make it more convenient for the SMTP_Server object, we should give the Email objects themselves a method that knows how to talk to port 25 and send themselves. Then the SMTP_Server just calls email.send() on each method. How convenient. The same applies here. Sure, it is convenient to just call bank_accounts.sort() (to re-use the example given by Carl) and it magically works, but as soon as your report changes and you want the bank accounts sorted according to their account name instead of balance, you have to either provide a key function, or change the __key__ method. Obviously changing the __key__ method will break any other reports that rely on it, so you end up using a key function anyway. I would mind this less if it isn't blessed by the interpreter. There are lots of classes which are excessively coupled to other things. I've written a few of them myself, so I understand the temptation. Sometimes that design might even be justified under "Practicality beats purity". But I don't think this is one of those cases: I don't see this as important enough or common enough to build it into the language as an actual dunder method. If people like Chris' idea, just add a sortkey() method to your class, and then call sorted(collection_of_Spam, key=Spam.sortkey) and it will Just Work. It is explicit, backwards compatible and doesn't need to wait for Python 3.8 or whenever this (mis)feature gets (hypothetically) added. > On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker wrote: > > > Am I imagining the performance benefits? > > > > Maybe. Looking strictly at O(-) cost, there's no difference between a key > function and comparison operators. Sure it might potentially only make O(n) > calls to the key function and O(n log n) calls to compare the keys vs. 
O(n > log n) calls to the comparator functions but that might not actually be > faster. It is unlikely that calling a key function followed by key comparisons would be faster than just calling the key comparisons. Using a key function is effectively the old DSU idiom: values = [(key(x), x) for x in values] # O(n) values.sort() # O(n log n) values = [t[1] for t in values] # O(n) so you make two extra passes through the list. The only way that could be faster is if key(x).__lt__ is sufficiently cheap compared to x.__lt__ that it saves more than the cost of those two extra passes (plus the overhead from dealing with the extra tuples). You might be thinking of the old Python 1 and early Python 2 cmp argument to sort. The comparator function can end up calling x.__lt__ up to O(n**2) times if I remember correctly, so it is quite expensive. > There certainly are cases where implementing a key function would > be quite slow. The biggest problem with a key function is that it trades off memory for time. If you are constrained by memory, but don't care how slow your sort is, an old-school comparison function might suit you better. But for those cases, functools.cmp_to_key may help. -- Steve From storchaka at gmail.com Mon Dec 4 08:50:31 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 4 Dec 2017 15:50:31 +0200 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204120638.0f84a38a@fsol> References: <20171204120638.0f84a38a@fsol> Message-ID: 04.12.17 13:06, Antoine Pitrou ????: > On Mon, 4 Dec 2017 08:45:55 +0200 > Serhiy Storchaka > wrote: >> 04.12.17 01:06, Chris Barker ????: >>> So: has this already been brought up and rejected? >> >> https://bugs.python.org/issue20632 >> >>> Am I imagining the performance benefits? >> >> This will add an additional overhead. This will be even slower than >> passing the key function, since you will need to look up the __key__ >> method in every item. > > That is a reasonable objection. However, looking up a tp_XXX slot is > very cheap. But introducing a new slot is not easy. This will increase the size of the type object, break binary compatibility. According to Stefan Behnel's researches (https://bugs.python.org/issue31336) the main time of the creation of a new type is spent on initializing slots. This cost will pay every Python program, even if it doesn't use the __key__ method. There are many more common and important methods that don't have the corresponding slot (__length_hint__, __getstate__, __reduce__, __copy__, keys). >> Yes, it is too special-case. I don't see any advantages in comparison >> with defining the __lt__ method. > > There are definitely advantages. Sorting calls __lt__ for each > comparison (that is, O(n log n) times) while __key__ would only be > called once per item at the start (that is, O(n) times). It will call __lt__ for each key comparison (the same O(n log n) times), but *in addition* it will call __key__ O(n) times. You can get the benefit only when times of calling __key__ is much smaller than the difference between times of calling item's __lt__ and key's __lt__, and maybe only for large lists. But why not just pass the key argument when you sort the large list? >> It will be rather confusing if >> different methods of sorting produce inconsistent order. > > If __key__ is inconsistent with __lt__, it is the same error as making > __lt__ inconsistent with __gt__. If __key__ is consistent with __lt__, then we can just use __lt__, and don't introduce new special methods. 
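For illustration, a class that would define __key__ can get the same sorting behaviour today by putting the would-be key behind __lt__ and __eq__ (a sketch, with invented fields):

    class Record:
        def __init__(self, name, weight):
            self.name = name
            self.weight = weight

        def _key(self):
            # the tuple a hypothetical __key__ would return
            return (self.name, self.weight)

        def __eq__(self, other):
            if isinstance(other, Record):
                return self._key() == other._key()
            return NotImplemented

        def __lt__(self, other):
            if isinstance(other, Record):
                return self._key() < other._key()
            return NotImplemented

    records = [Record("spam", 2), Record("eggs", 1)]
    records.sort()   # list.sort only needs __lt__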
> And there could be a decorator that generates all comparison methods > from __key__. The decorator idea LGTM. But it doesn't need the special __key__ method. Just pass the key function as a decorator argument. This idea was proposed more than 3.5 years ago. Is there a PyPI package that implements it? From storchaka at gmail.com Mon Dec 4 09:00:17 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 4 Dec 2017 16:00:17 +0200 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204122556.GW22248@ando.pearwood.info> References: <20171204122556.GW22248@ando.pearwood.info> Message-ID: 04.12.17 14:25, Steven D'Aprano ????: > On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote: >> But the idea of the class decorator looks more sane to me. > > The purpose of __key__ is to define a key function (not a comparison > operator) for classes that aren't orderable and don't have __lt__. > > If you're going to then go ahead and define __lt__ and the other > comparison operators, there's no point to __key__. Right. The only benefit of this decorator is that it could avoid writing a boilerplate code for simple cases. Just add @ordered_by_key(attrgetter('name', 'weight')). __key__ is not needed, just pass the key function as an argument of the decorator. Of course if it can be useful this doesn't mean that it should be included in the stdlib. It could live on PyPI. From solipsis at pitrou.net Mon Dec 4 09:10:38 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Dec 2017 15:10:38 +0100 Subject: [Python-ideas] a sorting protocol dunder method? References: <20171204120638.0f84a38a@fsol> Message-ID: <20171204151038.1c72abff@fsol> On Mon, 4 Dec 2017 15:50:31 +0200 Serhiy Storchaka wrote: > > >> Yes, it is too special-case. I don't see any advantages in comparison > >> with defining the __lt__ method. > > > > There are definitely advantages. Sorting calls __lt__ for each > > comparison (that is, O(n log n) times) while __key__ would only be > > called once per item at the start (that is, O(n) times). > > It will call __lt__ for each key comparison (the same O(n log n) times), > but *in addition* it will call __key__ O(n) times. You can get the > benefit only when times of calling __key__ is much smaller than the > difference between times of calling item's __lt__ and key's __lt__, and > maybe only for large lists. Sure, that's the point: your non-trivial __key__ method reduces the instance to e.g. a simple tuple or string, and then __lt__ over those keys is cheap. > But why not just pass the key argument when > you sort the large list? For the same reason that you want __lt__ (or __eq__, or __hash__) to be defined on the type, not call it manually every time you want to make a comparison: because it's really a fundamental property of the type and it "feels" wrong to have to pass it explicitly. Also there are library routines which may sort implicitly their inputs (such as pprint, IIRC, though perhaps pprint only sorts after calling str() -- I haven't checked). > If __key__ is consistent with __lt__, then we can just use __lt__, and > don't introduce new special methods. This is ignoring all the other arguments... > The decorator idea LGTM. But it doesn't need the special __key__ method. > Just pass the key function as a decorator argument. I would find it cleaner to express it as a method in the class's body ("__key__" or anything else) rather than have to pass a function object. Also, it's easier to unit-test if it officially exists as a method... 
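A sketch of what such a decorator could look like (the name key_ordering and the exact behaviour are assumptions here, not an existing API):

    import functools

    def key_ordering(cls):
        # Hypothetical class decorator: derive __eq__ and __lt__ from a
        # __key__ method written in the class body, then let total_ordering
        # fill in the remaining comparison methods.
        def __eq__(self, other):
            if isinstance(other, cls):
                return self.__key__() == other.__key__()
            return NotImplemented

        def __lt__(self, other):
            if isinstance(other, cls):
                return self.__key__() < other.__key__()
            return NotImplemented

        cls.__eq__ = __eq__
        cls.__lt__ = __lt__
        return functools.total_ordering(cls)

    @key_ordering
    class Version:
        def __init__(self, major, minor):
            self.major = major
            self.minor = minor

        def __key__(self):
            return (self.major, self.minor)

    assert Version(1, 2) < Version(2, 0)
    assert min(Version(2, 0), Version(1, 2)) == Version(1, 2)

And since __key__ stays an ordinary method on the class, it can be unit-tested in isolation.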
> This idea was proposed more than 3.5 years ago. Is there a PyPI package > that implements it? I don't know. I know I reimplemented such a thing for Numba (but of course didn't benefit from automatic sort() support), because I needed fast hashing and equality without implementing the corresponding methods by hand every time (*). It would be a pity to depend on a third-party package just for that. (*) see https://github.com/numba/numba/blob/master/numba/types/abstract.py#L88 Regards Antoine. From steve at pearwood.info Mon Dec 4 10:52:44 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 5 Dec 2017 02:52:44 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204135219.4f10050a@fsol> References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> Message-ID: <20171204155242.GY22248@ando.pearwood.info> On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote: > On Mon, 4 Dec 2017 23:16:11 +1100 > Steven D'Aprano wrote: > > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > > > > > There are definitely advantages. Sorting calls __lt__ for each > > > comparison (that is, O(n log n) times) while __key__ would only be > > > called once per item at the start (that is, O(n) times). > > > > Passing a key function doesn't magically turn a O(n log n) comparison > > sort into a O(n) sort. > > Where did I say it did? See the text from you quoted above. You said there are "definitely [performance] advantages" by using a key function. You then compare: - calling __lt__ O(n log n) times, versus - calling the key function O(n) times. This is a classic "apples versus oranges" comparison. You compare *actually sorting the list* with *not sorting the list* and conclude that they key function provides a performance advantage. Yes, the key function gets called O(n) times. And that's not enough to sort the list, you still have to actually sort, exactly as I said. So where do these performance advantages come from? As I said in another post, the overhead of decorating the list with the key function makes it rather unlikely that this will be faster than just sorting it. It can happen, if key(x).__lt__ is sufficiently faster than x.__lt__, but that would be unusual. > > Once you've called the key function every time, you still have to > > *actually sort*, which will be up to O(n log n) calls to __lt__ on > > whatever __key__ returned. > > Yes... and the whole point is for __key__ to return something which is > very cheap to compare, such that there are O(n) expensive calls to > __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n) > expensive calls to __lt__. Read Chris' post again. The point he was making is that the class might only define __lt__ in order to support sorting, and if we allow it to define a key function instead the class can avoid adding __lt__ at all. There's no requirement or expectation that __lt__ is expensive. If you can order the values using a cheap method and an expensive method, why would you define __lt__ to use the expensive method instead of the cheap method? The point is to avoid defining __lt__ at all, and still support sorting. But even if we define both... what makes you think that x.__lt__ is expensive (if it exists) and key(x).__lt__ is cheap? It might be the other way. If they are different, there's no guarantee about which is cheaper. If they are the same, then one is redundant. 
> > > If __key__ is inconsistent with __lt__, it is the same error as making > > > __lt__ inconsistent with __gt__. > > > > You seem to be assuming that __key__ is another way of spelling __lt__, > > rather than being a key function. > > It isn't. Right -- you've now clarified your position. Thank you. It wasn't clear from your earlier posts. > > In any case, it isn't an error for __lt__ to be inconsistent with > > __gt__. > [...] > > Not all values have total ordering. > > As usual you should try to understand what people are trying to say > instead of imagining they are incompetent. Instead of getting your nose out of joint and accusing me of "imagining [you] are incompetent", and assuming that I didn't "try to understand what people are trying to say", how about *you* do the same? Don't assume I'm an idiot too stupid to understand your perfectly clear words, rather consider the possibility that maybe I'm reading and responding to you in good faith, but I'm not a mind-reader. If you failed to get your message across, perhaps the fault lies in your post, not my reading comprehension. In any case, whoever is to blame for the misunderstanding, the only one who can do anything about it is the writer. The writer should take responsibility for not being clear enough, rather than blaming the reader. [...] > > Defining a single comparison method is not enough to define the rest. > > How about you stick to the discussion? I'm talking about deriving > comparison methods from __key__, not from another comparison method. > Defining __key__ is definitely enough to define all comparison > methods. Here's a key function I've used: def key(string): return string.strip().casefold() Convert to a key method: def __key__(self): return self.strip().casefold() Is it your idea to define __lt__ and others like this? def __lt__(self, other): # ignoring the possibility of returning NotImplemented return self.__key__() < self.__key__() Fair enough. I'm not convinced that's going to offer definite performance advantages, but like the total_ordering decorator, presumably if we're using this, performance is secondary to convenience. Nor do I think this decorator needs to take an implicit dunder method, when it can take an explicit key function. -- Steve From solipsis at pitrou.net Mon Dec 4 11:30:42 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Dec 2017 17:30:42 +0100 Subject: [Python-ideas] (no subject) References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> Message-ID: <20171204173042.62c38a4f@fsol> On Tue, 5 Dec 2017 02:52:44 +1100 Steven D'Aprano wrote: > On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote: > > On Mon, 4 Dec 2017 23:16:11 +1100 > > Steven D'Aprano wrote: > > > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > > > > > > > There are definitely advantages. Sorting calls __lt__ for each > > > > comparison (that is, O(n log n) times) while __key__ would only be > > > > called once per item at the start (that is, O(n) times). > > > > > > Passing a key function doesn't magically turn a O(n log n) comparison > > > sort into a O(n) sort. > > > > Where did I say it did? > > See the text from you quoted above. You said there are "definitely > [performance] advantages" by using a key function. You then compare: > > - calling __lt__ O(n log n) times, versus > > - calling the key function O(n) times. > > This is a classic "apples versus oranges" comparison. 
You compare > *actually sorting the list* with *not sorting the list* and conclude > that they key function provides a performance advantage. At this point, I can only assume you are trolling by twisting my words... even though you later quote the part which explicitly clarifies that I was *not* saying what you claim I did. Why you seem to think that is contributing anything to the discussion rather than derailing it is beyond me. In any case, don't expect further responses from me. Regards Antoine. From brent.bejot at gmail.com Mon Dec 4 13:01:09 2017 From: brent.bejot at gmail.com (brent bejot) Date: Mon, 4 Dec 2017 13:01:09 -0500 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204155242.GY22248@ando.pearwood.info> References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> Message-ID: I'm +1 on this idea for the most part. I agree particularly with the idea that it is better OOP for an object to access it's member variables to create the key than an external container to do so. > and then sort like this: > sorted(list_of_attrdicts, key=AttrDict._id_order) This is certainly a good pattern to use in the current and older versions, but I think we can all agree that defining __key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that is oh-so-sweet-and-tasty. > This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item. And there will be an overhead even in the case when the __key__ method is not defined. This, to me, is the only possible negative. I would be most interested to see how much of an effect this would have on real-world data that doesn't have __key__ defined. I may be new to this community but Steven D'Aprano and Antoine Pitrou, you guys bicker like my parents before they got a divorce. I'm pretty sure you're both veterans and so know how to behave yourselves. Please set the tone according to how you'd like us newbies to respond. -Brent -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Dec 4 13:16:40 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 04 Dec 2017 10:16:40 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> Message-ID: <5A259108.8010103@stoneleaf.us> On 12/04/2017 10:01 AM, brent bejot wrote: > This is certainly a good pattern to use in the current and older versions, but I think we can all agree that defining > __key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that is oh-so-sweet-and-tasty. Actually, no, we do not all agree. ;-) -- ~Ethan~ From breamoreboy at gmail.com Mon Dec 4 13:22:05 2017 From: breamoreboy at gmail.com (Mark Lawrence) Date: Mon, 4 Dec 2017 18:22:05 +0000 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> Message-ID: On 04/12/17 18:01, brent bejot wrote: > I'm +1 on this idea for the most part. 
> > I agree particularly with the idea that it is better OOP for an object > to access it's member variables to create the key than an external > container to do so. > > > and then sort like this: > > sorted(list_of_attrdicts, key=AttrDict._id_order) > Isn't this exactly what the operator module's itemgetter and attrgetter all ready give you? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rhodri at kynesim.co.uk Mon Dec 4 14:51:16 2017 From: rhodri at kynesim.co.uk (Rhodri James) Date: Mon, 4 Dec 2017 19:51:16 +0000 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> Message-ID: <07608712-38db-a948-993e-6884b6b71eb1@kynesim.co.uk> On 04/12/17 18:01, brent bejot wrote: > I'm +1 on this idea for the most part. > > I agree particularly with the idea that it is better OOP for an object to > access it's member variables to create the key than an external container > to do so. This I'm absolutely fine with. Key methods are something to encourage. The problem that I have is that once you get beyond simple lists of number or strings, there often isn't a particularly obvious sort order, or rather there are often multiple obvious sort orders and you may want any of them. In fact I'd go so far as to suggest that there _usually_ isn't a single obvious sort order for non-trivial classes. To take a non-Python example, I'm in charge of the reading rota at my local church, and I keep a spreadsheet of readers to help me. Usually I sort that list by the date people last read the lesson, so I can quickly tell who I should ask next. When the newsletter asks me for a list of readers, though, I sort them alphabetically by surname, which most people would think of as the natural sorting order. -- Rhodri James *-* Kynesim Ltd From barry at barrys-emacs.org Mon Dec 4 14:37:02 2017 From: barry at barrys-emacs.org (Barry Scott) Date: Mon, 4 Dec 2017 19:37:02 +0000 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204120638.0f84a38a@fsol> Message-ID: <443C1BD4-B9BF-4964-A60F-A68C43FC4C5B@barrys-emacs.org> I wondered what the performance would be and tested the following code: #!/usr/bin/env python3 import random import time random.seed( hash('Testing Keys') ) lt_calls = 0 key_calls = 0 class MyObject: def __init__( self, value1, value2 ): self.value1 = value1 self.value2 = value2 def __lt__(self, other): global lt_calls lt_calls +=1 if self.value1 < other.value1: return True else: return self.value2 < other.value2 def key(self): global key_calls key_calls +=1 return self.value1, self.value2 lt_list = [] random for value1 in reversed(range(10000)): value2 = value1 - 50 lt_list.append( MyObject( value1, value2 ) ) random.shuffle( lt_list ) key_list = lt_list[:] print( len(lt_list) ) s = time.time() key_list.sort( key=MyObject.key ) e = time.time() - s print( 'key %.6fs %6d calls' % (e, key_calls) ) s = time.time() lt_list.sort() e = time.time() - s print( ' lt %.6fs %6d calls' % (e, lt_calls) ) it outputs this for my with python 3.6.0 10000 key 0.010628s 10000 calls lt 0.053690s 119886 calls It seems that providing a key is ~5 times faster the depending on __lt__. (I even used a short circuit to se if __lt__ could beat key). I often have more then one way to sort an object. 
Its easy for me to provide a set of key functions that meet the needs of each sort context. I'm not sure what extra value the __sort_key__ would offer over providing a key method as I did. Barry From solipsis at pitrou.net Mon Dec 4 15:12:03 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Dec 2017 21:12:03 +0100 Subject: [Python-ideas] a sorting protocol dunder method? References: <20171204120638.0f84a38a@fsol> <443C1BD4-B9BF-4964-A60F-A68C43FC4C5B@barrys-emacs.org> Message-ID: <20171204211203.5a091913@fsol> On Mon, 4 Dec 2017 19:37:02 +0000 Barry Scott wrote: > I wondered what the performance would be and tested the following code: > [...] > > it outputs this for my with python 3.6.0 > > 10000 > key 0.010628s 10000 calls > lt 0.053690s 119886 calls > > It seems that providing a key is ~5 times faster the depending on __lt__. > (I even used a short circuit to se if __lt__ could beat key). Thanks for taking the time to write a benchmark. I'm not surprised by the results (and your __lt__ method isn't even complicated: the gap could be very much wider). There is more to Python performance than aggregate big-O algorithmic complexity. Regards Antoine. From barry at barrys-emacs.org Mon Dec 4 16:22:44 2017 From: barry at barrys-emacs.org (Barry) Date: Mon, 4 Dec 2017 21:22:44 +0000 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204211203.5a091913@fsol> References: <20171204120638.0f84a38a@fsol> <443C1BD4-B9BF-4964-A60F-A68C43FC4C5B@barrys-emacs.org> <20171204211203.5a091913@fsol> Message-ID: <6F8EB500-0696-4CDD-9BFB-017D2829D4C4@barrys-emacs.org> > On 4 Dec 2017, at 20:12, Antoine Pitrou wrote: > > On Mon, 4 Dec 2017 19:37:02 +0000 > Barry Scott wrote: >> I wondered what the performance would be and tested the following code: >> > [...] >> >> it outputs this for my with python 3.6.0 >> >> 10000 >> key 0.010628s 10000 calls >> lt 0.053690s 119886 calls >> >> It seems that providing a key is ~5 times faster the depending on __lt__. >> (I even used a short circuit to se if __lt__ could beat key). > > Thanks for taking the time to write a benchmark. I'm not surprised > by the results (and your __lt__ method isn't even complicated: the gap > could be very much wider). There is more to Python performance than > aggregate big-O algorithmic complexity. I was surprised by the huge difference. I was expecting a closer race. For the record I think that a __sort_key__ is not a good idea as it is so easy to do as I did and define key methods on the class, without the limit of one such method. Barry > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From rosuav at gmail.com Mon Dec 4 16:55:16 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 5 Dec 2017 08:55:16 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <6F8EB500-0696-4CDD-9BFB-017D2829D4C4@barrys-emacs.org> References: <20171204120638.0f84a38a@fsol> <443C1BD4-B9BF-4964-A60F-A68C43FC4C5B@barrys-emacs.org> <20171204211203.5a091913@fsol> <6F8EB500-0696-4CDD-9BFB-017D2829D4C4@barrys-emacs.org> Message-ID: On Tue, Dec 5, 2017 at 8:22 AM, Barry wrote: > > >> On 4 Dec 2017, at 20:12, Antoine Pitrou wrote: >> >> On Mon, 4 Dec 2017 19:37:02 +0000 >> Barry Scott wrote: >>> I wondered what the performance would be and tested the following code: >>> >> [...] 
>>> >>> it outputs this for my with python 3.6.0 >>> >>> 10000 >>> key 0.010628s 10000 calls >>> lt 0.053690s 119886 calls >>> >>> It seems that providing a key is ~5 times faster the depending on __lt__. >>> (I even used a short circuit to se if __lt__ could beat key). >> >> Thanks for taking the time to write a benchmark. I'm not surprised >> by the results (and your __lt__ method isn't even complicated: the gap >> could be very much wider). There is more to Python performance than >> aggregate big-O algorithmic complexity. > > I was surprised by the huge difference. I was expecting a closer race. > > For the record I think that a __sort_key__ is not a good idea as it is so easy to > do as I did and define key methods on the class, without the limit of one such > method. The numbers here are all fairly small, but they make a lot of sense. Calls into Python code are potentially quite slow, so there's a LOT of benefit to be had by calling Python code once per object instead of N log N times for the comparisons. Increasing the length of the list will make that difference even more pronounced. But that's an argument for using a key function, not for having a __key__ special method. ChrisA From listes at salort.eu Mon Dec 4 16:54:27 2017 From: listes at salort.eu (Julien Salort) Date: Mon, 4 Dec 2017 22:54:27 +0100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <20171204131614.GX22248@ando.pearwood.info> References: <20171204005721.GT22248@ando.pearwood.info> <20171204131614.GX22248@ando.pearwood.info> Message-ID: Le 04/12/2017 ? 14:16, Steven D'Aprano a ?crit?: > We're taking something which belongs in the report generator or > collection, the knowledge of how to sort a collection of unordered > values, and baking it into the values themselves. (Effectively making > them ordered!) It is also possible to use this __key__ method for classes for which the ordering is indeed unambiguously defined, e.g.: class MyValue: ??? def __init__(self, value, comment): ??? ??? self.value = value ??? ??? self.comment = comment ??? def __key__(self): ??? ??? return self.value Then it is not shocking to define a sorting key. From rosuav at gmail.com Mon Dec 4 17:03:20 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 5 Dec 2017 09:03:20 +1100 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204005721.GT22248@ando.pearwood.info> <20171204131614.GX22248@ando.pearwood.info> Message-ID: On Tue, Dec 5, 2017 at 8:54 AM, Julien Salort wrote: > Le 04/12/2017 ? 14:16, Steven D'Aprano a ?crit : > >> We're taking something which belongs in the report generator or >> collection, the knowledge of how to sort a collection of unordered >> values, and baking it into the values themselves. (Effectively making >> them ordered!) > > It is also possible to use this __key__ method for classes for which the > ordering > is indeed unambiguously defined, e.g.: > > class MyValue: > > def __init__(self, value, comment): > self.value = value > self.comment = comment > > def __key__(self): > return self.value > > Then it is not shocking to define a sorting key. 
MyValue = namedtuple('MyValue', ['value', 'comment']) Job done :) ChrisA From apalala at gmail.com Mon Dec 4 19:24:21 2017 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Mon, 4 Dec 2017 20:24:21 -0400 Subject: [Python-ideas] Data Classes in Kotlin Message-ID: I thought this might be interesting input for the discussions about "data classes" in Python: https://kotlinlang.org/docs/reference/data-classes.html I think that the use of "data class" as syntax is kind of cool, but what really matters is the semantics they chose for Kotlin. -- Juancarlo *A?ez* -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Dec 4 20:06:02 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 4 Dec 2017 17:06:02 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> Message-ID: wow! a few time zones (and a day job) really make a difference to taking part in a discussion :-) Thanks for all the feedback. From what I can tell, we don't have a consensus, though It's looking pretty unlikely that this is going to be adopted (though maybe the decorator idea). But I'm going to go through and try to summarize what we (I?) have learned, and what does seem to be agreed upon, or not. If I misinterpret something you said, or take a quote out of context, please correct it, but trust that I didn't do it on purpose.... Also, it's kind of a pain to do a digest like this and properly attribute everyone, so mostly I'm not going to attribute the quotes... So: has this already been brought up and rejected? > https://bugs.python.org/issue20632 Thanks Serhiy -- I didn't think to look in the Bug DB. This does remind me that it would be good to have a place (maybe in a mets-PEP?) to put topics that have been discussed, but didn't get as far as anyone writing a PEP... An existence proof: in NLTK, an __lt__ method added purely to facilitate > consistent sorting (in doctests) of structured data objects for which > comparison operators do not really make conceptual sense: > https://github.com/nltk/nltk/pull/1902/files#diff- > 454368f06fd635b1e06c9bb6d65bd19bR689 > Granted, calling min() and max() on collections of these objects would > not make conceptual sense either. Still, __sort_key__ would have been > cleaner than __lt__. So nice to know I'm not the only one that wants (needs) to provide a sort order be default -- though it doesn't mean it's not a niche issue anyway. By default, we sort by the inherent order of the values. But if the > values have no inherent order (they are unordered), we can sort > unordered items in a collection by providing an appropriate key > function. Hence why I say they are loosely related. > It > doesn't make sense to put that functionality into the complex numbers > themselves: complex numbers are unordered, and any order we impose on > them comes from the collection, not the individual numbers. Sure -- and that's why we definitely still need a key option for the sorting functions, and why not all classes should define a sort order. But for many classes, it does make sense to define a default sort order. Take something as simple as strings -- they have a default sort order, but a user very well might want to sort them differently -- say case-insensitive, or ... 
So I think there is consensus here (and no one was proposing otherwise): - Defining a sort order is NOT required of all classes - The sorting methods definitely need to allow a custom key function. the comparison operators > apply to the values in the collection; the key function applies to the > collection itself But many (most) objects DO provide a default sort order, by means of the comparison methods. So the idea that objects have a default sort order is already pretty well accepted :-) But that assumes that there's only ever one way to order a collection of > unordered items. no,. it doesn't -- it implies there is a default way, which is what is done already for most types. And again, some classes shouldn't even have a default. I'm not sure I understand the motivation to make elements *sortable* but > not comparable. If an arbitrary order is still useful, I'd think you'd want > to be able to tell how two particular elements *would* sort by asking a yeah, I'm not sure about this -- going back to the example of complex numbers, I like the idea of providing a default sort order but not make the declaration that this is how the objects SHOULD be compared -- after all, you can provide your own key function, but not so easily re-define the comparisons... > Is sorting-related functionally too special-case to deserve a protocol? > > Yes, it is too special-case. I don't see any advantages in comparison with > defining the __lt__ method. It will be rather confusing if different > methods of sorting produce inconsistent order. If the first item of the > sorted list is not the smallest item of the list. well, yeah, but not any different than the fact that you can inconsitently define the comparison operators already. What I wasn't realizing is that you only need to define __lt__ (and __eq__) to get sortability. Maybe it would be good to document that more prominently somewhere (if it's documented at all, other than the source). But the idea of the class decorator looks more sane to me. yeah, I'm liking that. Most often when I define equality and comparison dunder methods for a > custom class, I'm effectively just deferring the comparison to some > field or tuple of fields of the object. Exactly -- which may be an argument for a decorator, rather than a dunder method -- particularly if performance isn't improved by the dunder method. What if we added a @key_ordering decorator, like @total_ordering but > using __key__ to generate the comparisons? This could be a good idea -- just putting it here for the record as it's mentioned elsewhere. OK -- multiple posts about performance, I'm going to try to summarize: > This will add an additional overhead. This will be even slower than > > passing the key function, since you will need to look up the __key__ > > method in every item. > That is a reasonable objection. However, looking up a tp_XXX slot is > very cheap. > > Yes, it is too special-case. I don't see any advantages in comparison > > with defining the __lt__ method. > There are definitely advantages. Sorting calls __lt__ for each > comparison (that is, O(n log n) times) while __key__ would only be > called once per item at the start (that is, O(n) times). OK -- so: if a new slot is added, then the performance hit of __key__ lookup is minor, but every type now has a new slot, so there is another performance (and memory) hit for that everywhere -- this is too niche to want to hit every type. 
Any chance that adding a __key__ slot would help with other built-in types by being faster than checking __lt__ -- probably not. Conversely, when sorting a key can provide significant performance > improvements. A single O(n) pass to compute keys, followed by O(n > log(n)) comparisons could be significantly faster, snip > It can happen, if key(x).__lt__ is sufficiently faster than > x.__lt__, but that would be unusual. actually, that's quite common :-) -- but a __sort_key__ method would add n extra attribute lookups, too. so now it's less likely that it'll be faster than __lt__ This is "key" :-) -- it is very common for the sort key to be a very simple type, say int or float, that are very fast to compare -- so this can be a really performance saver when passing key function. Also, I recall that someone was working on special-casing the sort code to do even faster sorting on simple numeric keys -- not sure what came of that though. But the extra method lookup for every key could overwhelm that anyway :-( > And there will be an overhead even in the case when the __key__ method is not defined. not good. I can't think of a way to profile this easily -- we know that having a key function can be helpful, but that doesn't take into account the extra method lookup -- maybe a key function that involves a method lookup?? If it defines both, it isn't clear which will be used for sorting. > Should __lt__ take priority, or __key__? Whichever we choose, somebody > is going to be upset and confused by the choice. __sort_key__ would take priority -- that is a no brainer, it's the sort key, it's used for sorting. And __lt__ giving a different result is no more surprising, and probably less surprising, than total ordering being violated in any other way. > I disagree with the design of this: it is putting the decision of how to > sort uncomparable objects in the wrong place, in the object itself > rather than in the report that wants to sort them. No -- it's putting the decision about a default sort order in the object itself -- maybe a niche case, but a real one. If we have a __key__ method on a class, then the following becomes true: > * We can work out which of 2 instances is bigger/smaller using max/min. > * We can compare two items by doing a sort. > So while the *intent* may not be to allow comparisons, that's what you've > done. Sure, but Python is for consenting adults -- we allow a lot of things that aren't encouraged... I don't think leveraging sort to get a comparison is a problematic use case at all -- though "min" and "max" is a different matter ... maybe they shouldn't use __sort_key__? thought that does weaken the whole idea :-) But why not just define __lt__ and use total_ordering, instead of > defining two identical decorators that differ only in the name of the > dunder method they use? well, one reason is that total_ordering can result in even worse performance... thought probably not a huge deal. I never imagined that it would be a required method. Of course it is > optional. But by adding interpreter support, we're blessing something > which I think is a misguided design choice as the One Obvious Way. I don't think it's interpreter support, but rather standard library support -- it would be changes in the list.sort and sorted, and ... functions, not any change to the interpreter. (and it would be a change to the language definition) But your point is that same. 
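(Very roughly, the kind of fallback sorted() could grow, sketched in pure Python just to show the shape -- the real change would be in the C sort code, and this ignores mixed-type lists:)

def sorted_with_protocol(iterable, key=None, reverse=False):
    items = list(iterable)
    if key is None and items:
        # Fall back to the type's __sort_key__ method, if it defines one.
        key = getattr(type(items[0]), '__sort_key__', None)
    return sorted(items, key=key, reverse=reverse)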
However, I think there is no consensus that it's a misguided design choice -- having an explicit way to specifically say "this is the default way to sort these objects", when there is no other reason for total ordering, feels pythonic to me. Again, maybe a niche use case though.

> If people like Chris' idea, just add a sortkey() method to your class,
> and then call sorted(collection_of_Spam, key=Spam.sortkey) and it will
> Just Work.

Not a bad approach, though that's not as useful until/unless it becomes something of a standard. (and for the record, if we did add __sort_key__, then on older versions you could still do:

sorted(collection_of_Spam, key=Spam.__sort_key__)

so it's not too bad on the compatibility front...

> The decorator idea LGTM. But it doesn't need the special __key__ method.
> Just pass the key function as a decorator argument.
> This idea was proposed more than 3.5 years ago. Is there a PyPI package
> that implements it?

yes, it was, though not in a terribly public place / way...

> I would find it cleaner to express it as a method in the class's body
> ("__key__" or anything else) rather than have to pass a function object.
> Also, it's easier to unit-test if it officially exists as a method...

me too, though couldn't we do both? the decorator, if not passed a sort function, would look for __key__.

> I wondered what the performance would be and tested the following code: it outputs this for me with python 3.6.0
> 10000
> key 0.010628s 10000 calls
> lt 0.053690s 119886 calls
> It seems that providing a key is ~5 times faster than depending on __lt__.
> (I even used a short circuit to see if __lt__ could beat key).

yeah, this is what I expect for a key function -- exactly why I started all this expecting a performance gain. But as others have pointed out, this is a key function, not a __key__ method that would need to be called each time. I'm going to try to benchmark (a simulation of) that. This is Barry's code, with the addition of an "outer_key" function that calls the instance's key method:

[I got very similar results as Barry with his version: about 5X faster with the key function]

def outer_key(item):
    return item.key()

so we get a function lookup each time it's used. However, I'm confused by the results -- essentially NO Change. That extra method lookup is coming essentially for free. And this example is using a tuple as the key, so not the very cheapest possible sort key. Did I make a mistake? is that lookup somehow cached?

In [36]: run sort_key_test.py
10000
key 0.012529s 10000 calls
outer_key 0.012139s 10000 calls
lt 0.048057s 119877 calls

each run gives different results, but the lt method is always on the order of 5X slower for this size list. Sometimes outer_key is faster, mostly a bit slower, than key. Also, I tried making a "simpler" __lt__ method:

    return (self.value1, self.value2) < (other.value1, other.value2)

but it was a bit slower than the previous one -- interesting. Then I tried a simpler (but probably common) simple attribute sort:

def __lt__(self, other):
    global lt_calls
    lt_calls += 1
    return self.value1 < other.value1

def key(self):
    global key_calls
    key_calls += 1
    return self.value1

And that results in about a 6X speedup

In [42]: run sort_key_test.py
10000
key 0.005157s 10000 calls
outer_key 0.007000s 10000 calls
lt 0.041454s 119877 calls
time ratio: 5.922036784741144

And, interestingly (to me!) there is even a performance gain for only a 10 item list!
(1.5X or so, but still) In fact, this seems to show that having a __key__ method would often provide a performance boost, even for small lists. (still to try to figure out -- how much would the look for __key__ slow things down when it didn't exist....) I have got to be doing something wrong here -- I hope some of you smarter folks will tell me what :-) Code enclosed. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sort_key_test.py Type: text/x-python-script Size: 1498 bytes Desc: not available URL: From greg.ewing at canterbury.ac.nz Mon Dec 4 23:36:12 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 05 Dec 2017 17:36:12 +1300 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <5A259108.8010103@stoneleaf.us> References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> <5A259108.8010103@stoneleaf.us> Message-ID: <5A26223C.2080203@canterbury.ac.nz> > On 12/04/2017 10:01 AM, brent bejot wrote: > >> I think we can all agree that defining >> __key__ and calling "sorted(list_of_attrdicts)" has that syntactic >> sugar that is oh-so-sweet-and-tasty. Just remember that too much syntactic sugar can give you cancer of the semicolon. -- Greg From nas-python-ideas at arctrix.com Tue Dec 5 16:19:23 2017 From: nas-python-ideas at arctrix.com (Neil Schemenauer) Date: Tue, 5 Dec 2017 15:19:23 -0600 Subject: [Python-ideas] f-string literals by default? Message-ID: <20171205211923.ioh7d2zpjl7yneyj@python.ca> I think most people who have tried f-strings have found them handy. Could we transition to making default string literal into an f-string? I think there is a smooth migration path. f-strings without embedded expressions already compile to the same bytecode as normal string literals. I.e. no overhead. The issue will be literal strings that contain the f-string format characters. We could add a future import, e.g. from __future__ import fstring_literals that would make all literal strings in the module into f-strings. In some future release, we could warn about literal strings in modules without the future import that contain f-string format characters. Eventually, we can change the default. To make migration easier, we can provide a source-to-source translation tool. It is quite simple to do that using the tokenizer module. From joejev at gmail.com Tue Dec 5 16:22:11 2017 From: joejev at gmail.com (Joseph Jevnik) Date: Tue, 5 Dec 2017 16:22:11 -0500 Subject: [Python-ideas] f-string literals by default? In-Reply-To: <20171205211923.ioh7d2zpjl7yneyj@python.ca> References: <20171205211923.ioh7d2zpjl7yneyj@python.ca> Message-ID: This would break code that uses str.format everywhere for very little benefit. On Tue, Dec 5, 2017 at 4:19 PM, Neil Schemenauer wrote: > I think most people who have tried f-strings have found them handy. > Could we transition to making default string literal into an > f-string? I think there is a smooth migration path. > > f-strings without embedded expressions already compile to the same > bytecode as normal string literals. I.e. no overhead. 
The issue > will be literal strings that contain the f-string format characters. > > We could add a future import, e.g. > > from __future__ import fstring_literals > > that would make all literal strings in the module into f-strings. > In some future release, we could warn about literal strings in > modules without the future import that contain f-string format > characters. Eventually, we can change the default. > > To make migration easier, we can provide a source-to-source > translation tool. It is quite simple to do that using > the tokenizer module. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rosuav at gmail.com Tue Dec 5 16:23:33 2017 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 6 Dec 2017 08:23:33 +1100 Subject: [Python-ideas] f-string literals by default? In-Reply-To: <20171205211923.ioh7d2zpjl7yneyj@python.ca> References: <20171205211923.ioh7d2zpjl7yneyj@python.ca> Message-ID: On Wed, Dec 6, 2017 at 8:19 AM, Neil Schemenauer wrote: > I think most people who have tried f-strings have found them handy. > Could we transition to making default string literal into an > f-string? I think there is a smooth migration path. > > f-strings without embedded expressions already compile to the same > bytecode as normal string literals. I.e. no overhead. The issue > will be literal strings that contain the f-string format characters. > > We could add a future import, e.g. > > from __future__ import fstring_literals > > that would make all literal strings in the module into f-strings. > In some future release, we could warn about literal strings in > modules without the future import that contain f-string format > characters. Eventually, we can change the default. > > To make migration easier, we can provide a source-to-source > translation tool. It is quite simple to do that using > the tokenizer module. No. Definitely not. It'd lead to all manner of confusion ("why can't I put braces in my string?"), and it's unnecessary. When you want interpolation, you put *one letter* in front of the string, and it means that most strings aren't magical pieces of executable code. Tiny advantage, large potential disadvantage. ChrisA From jsbueno at python.org.br Tue Dec 5 16:59:36 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 5 Dec 2017 19:59:36 -0200 Subject: [Python-ideas] f-string literals by default? In-Reply-To: References: <20171205211923.ioh7d2zpjl7yneyj@python.ca> Message-ID: On 5 December 2017 at 19:23, Chris Angelico wrote: > On Wed, Dec 6, 2017 at 8:19 AM, Neil Schemenauer > wrote: >> I think most people who have tried f-strings have found them handy. >> Could we transition to making default string literal into an >> f-string? I think there is a smooth migration path. >> >> f-strings without embedded expressions already compile to the same >> bytecode as normal string literals. I.e. no overhead. The issue >> will be literal strings that contain the f-string format characters. >> >> We could add a future import, e.g. >> >> from __future__ import fstring_literals >> >> that would make all literal strings in the module into f-strings. >> In some future release, we could warn about literal strings in >> modules without the future import that contain f-string format >> characters. Eventually, we can change the default. 
>> >> To make migration easier, we can provide a source-to-source >> translation tool. It is quite simple to do that using >> the tokenizer module. > > No. Definitely not. It'd lead to all manner of confusion ("why can't I > put braces in my string?"), and it's unnecessary. When you want > interpolation, you put *one letter* in front of the string, and it > means that most strings aren't magical pieces of executable code. Tiny > advantage, large potential disadvantage. One more big NO here - strings are _data_ not code - this little fact had made Python easier to learn for decades. If you need interpolation, and therefore, code that is run in the context the string is declared, just use f-strings. But f-strings are not static data, they are objects aware of the point in the source code file they are declared - a very different beast from ordinary strings. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From nas-python-ideas at arctrix.com Tue Dec 5 17:11:12 2017 From: nas-python-ideas at arctrix.com (Neil Schemenauer) Date: Tue, 5 Dec 2017 16:11:12 -0600 Subject: [Python-ideas] f-string literals by default? In-Reply-To: References: <20171205211923.ioh7d2zpjl7yneyj@python.ca> Message-ID: <20171205221112.4hm2k5xockuulnfr@python.ca> On 2017-12-05, Joseph Jevnik wrote: > This would break code that uses str.format everywhere for very > little benefit. That is a very strong reason not to do it. I think we can end this thread. Thanks. From rosuav at gmail.com Tue Dec 5 17:10:09 2017 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 6 Dec 2017 09:10:09 +1100 Subject: [Python-ideas] f-string literals by default? In-Reply-To: References: <20171205211923.ioh7d2zpjl7yneyj@python.ca> Message-ID: On Wed, Dec 6, 2017 at 8:59 AM, Joao S. O. Bueno wrote: > One more big NO here - > strings are _data_ not code - this little fact had made > Python easier to learn for decades. > If you need interpolation, and therefore, code that is run in > the context the string is declared, just use f-strings. But f-strings > are not static data, they are objects aware of the point > in the source code file they are declared - a very different > beast from ordinary strings. To be technically accurate, an f-string isn't an object at all - it's an expression. There's no way to refer to an unevaluated f-string. ChrisA From storchaka at gmail.com Tue Dec 5 17:46:42 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 6 Dec 2017 00:46:42 +0200 Subject: [Python-ideas] f-string literals by default? In-Reply-To: References: <20171205211923.ioh7d2zpjl7yneyj@python.ca> Message-ID: 05.12.17 23:22, Joseph Jevnik ????: > This would break code that uses str.format everywhere for very little benefit. And many regular expressions. And string.Template patterns. And docstrings (silently). And ast.literal_eval, shelve, doctest. From barry at barrys-emacs.org Wed Dec 6 14:47:22 2017 From: barry at barrys-emacs.org (Barry Scott) Date: Wed, 6 Dec 2017 19:47:22 +0000 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> Message-ID: <486C7482-F3B7-4329-B768-95FB344A36A4@barrys-emacs.org> > On 5 Dec 2017, at 01:06, Chris Barker wrote: > > wow! 
a few time zones (and a day job) really make a difference to taking part in a discussion :-) > > This could be a good idea -- just putting it here for the record as it's mentioned elsewhere. > > I can't think of a way to profile this easily -- we know that having a key function can be helpful, but that doesn't take into account the extra method lookup -- maybe a key function that involves a method lookup?? > > If it defines both, it isn't clear which will be used for sorting. > Should __lt__ take priority, or __key__? Whichever we choose, somebody > is going to be upset and confused by the choice. > > __sort_key__ would take priority -- that is a no brainer, it's the sort key, it's used for sorting. And __lt__ giving a different result is no more surprising, and probably less surprising, than total ordering being violated in any other way. If by no brainer you mean the performance of __sort-key__ is always better of __lt__ then I will wask for a proof in the form of benchmarks with enough use-case coverage. > [I got very similar results as Barry with his version: about 5X faster with the key function] > > def outer_key(item): > return item.key() > > so we get a function lookup each time it's used. > > However, I'm confused by the results -- essentially NO Change. That extra method lookup is coming essentially for free. And this example is using a tuple as the key, so not the very cheapest possible to sort key. > > Did I make a mistake? is that lookup somehow cached? > > In [36]: run sort_key_test.py > 10000 > key 0.012529s 10000 calls > outer_key 0.012139s 10000 calls > lt 0.048057s 119877 calls > > each run gives different results, but the lt method is always on order of 5X slower for this size list. Sometimes out_key is faster, mostly a bit slower, than key. > > Also, I tried making a "simpler" __lt__ method: > > return (self.value1, self.value2) < (other.value1, other.value2) > > but it was bit slower than the previous one -- interesting. This is more expensive to execute then my version for 2 reasons. 1) my __lt__ did not need to create any tuples. 2) my __lt__ can exit after only looking at the value1's > > Then I tried a simpler (but probably common) simple attribute sort: > > def __lt__(self, other): > global lt_calls > lt_calls += 1 > > return self.value1 < other.value1 > > def key(self): > global key_calls > key_calls += 1 > > return self.value1 > > And that results in about a 6X speedup > > In [42]: run sort_key_test.py > 10000 > key 0.005157s 10000 calls > outer_key 0.007000s 10000 calls > lt 0.041454s 119877 calls > time ratio: 5.922036784741144 > > > And, interestingly (t me!) there is even a performance gain for only a 10 item list! (1.5X or so, but still) My guess is that this is because the __lt__ test on simple types is very fast in python. Barry -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Fri Dec 8 05:41:10 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Fri, 8 Dec 2017 11:41:10 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? Message-ID: IIUC, it seems to be carry-over from Python 2's PyLong API, but I don't see an obvious reason for it. In every case there's an explicit PyLong_Check first anyways, so not calling __int__ doesn't help for the common case of exact int objects; adding the fallback costs nothing in that case. I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). 
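(A minimal reproduction of the failure, with a class made up purely for illustration:)

from collections import deque

class Count:
    """Int-like object that defines __int__ but not __index__."""
    def __init__(self, n):
        self._n = n
    def __int__(self):
        return self._n

int(Count(10))               # 10 -- explicit conversion is fine
deque([], maxlen=Count(10))  # accepted on Python 2, TypeError on Python 3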
On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not. Currently the following functions fall back on __int__ where available: PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask whereas the following (at least according to the docs--haven't checked the code in all cases) do not: PyLong_AsSsize_t PyLong_AsUnsignedLong PyLong_AsSize_t PyLong_AsUnsignedLongLong PyLong_AsDouble PyLong_AsVoidPtr I think this inconsistency should be fixed, unless there's some reason for it I'm not seeing. Thanks, Erik From solipsis at pitrou.net Fri Dec 8 05:59:04 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Dec 2017 11:59:04 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? References: Message-ID: <20171208115904.7ec58bf7@fsol> On Fri, 8 Dec 2017 11:41:10 +0100 Erik Bray wrote: > > I ran into this because I was passing an object that implements > __int__ to the maxlen argument to deque(). On Python 2 this used > PyInt_AsSsize_t which does fall back to calling __int__, whereas > PyLong_AsSsize_t does not. It should probably call PyNumber_AsSsize_t instead (which will call __index__, which is the right thing here). > I think this inconsistency should be fixed, unless there's some reason > for it I'm not seeing. That sounds reasonable to me. Regards Antoine. From storchaka at gmail.com Fri Dec 8 06:26:48 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 8 Dec 2017 13:26:48 +0200 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: References: Message-ID: 08.12.17 12:41, Erik Bray ????: > IIUC, it seems to be carry-over from Python 2's PyLong API, but I > don't see an obvious reason for it. In every case there's an explicit > PyLong_Check first anyways, so not calling __int__ doesn't help for > the common case of exact int objects; adding the fallback costs > nothing in that case. There is also a case of int subclasses. It is expected that PyLong_AsLong is atomic, and calling __int__ can lead to crashes or similar consequences. > I ran into this because I was passing an object that implements > __int__ to the maxlen argument to deque(). On Python 2 this used > PyInt_AsSsize_t which does fall back to calling __int__, whereas > PyLong_AsSsize_t does not. PyLong_* functions provide an interface to PyLong objects. If they don't return the content of a PyLong object, how can it be retrieved? If you want to work with general numbers you should use PyNumber_* functions. In your particular case it is more reasonable to fallback to __index__ rather than __int__. Unlikely maxlen=4.2 makes sense. > Currently the following functions fall back on __int__ where available: > > PyLong_AsLong > PyLong_AsLongAndOverflow > PyLong_AsLongLong > PyLong_AsLongLongAndOverflow > PyLong_AsUnsignedLongMask > PyLong_AsUnsignedLongLongMask I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy. From solipsis at pitrou.net Fri Dec 8 06:36:51 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Dec 2017 12:36:51 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? 
References: Message-ID: <20171208123651.42706e51@fsol> On Fri, 8 Dec 2017 13:26:48 +0200 Serhiy Storchaka wrote: > > > Currently the following functions fall back on __int__ where available: > > > > PyLong_AsLong > > PyLong_AsLongAndOverflow > > PyLong_AsLongLong > > PyLong_AsLongLongAndOverflow > > PyLong_AsUnsignedLongMask > > PyLong_AsUnsignedLongLongMask > > I think this should be deprecated (and there should be an open issue for > this). Calling __int__ is just a Python 2 legacy. I think that's a bad idea. There are widely-used int-like classes out there and it will break actual code: >>> import numpy as np >>> x = np.int64(5) >>> isinstance(x, int) False >>> x.__int__() 5 Regards Antoine. From storchaka at gmail.com Fri Dec 8 07:30:00 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 8 Dec 2017 14:30:00 +0200 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: <20171208123651.42706e51@fsol> References: <20171208123651.42706e51@fsol> Message-ID: 08.12.17 13:36, Antoine Pitrou ????: > On Fri, 8 Dec 2017 13:26:48 +0200 > Serhiy Storchaka > wrote: >> >>> Currently the following functions fall back on __int__ where available: >>> >>> PyLong_AsLong >>> PyLong_AsLongAndOverflow >>> PyLong_AsLongLong >>> PyLong_AsLongLongAndOverflow >>> PyLong_AsUnsignedLongMask >>> PyLong_AsUnsignedLongLongMask >> >> I think this should be deprecated (and there should be an open issue for >> this). Calling __int__ is just a Python 2 legacy. > > I think that's a bad idea. There are widely-used int-like classes out > there and it will break actual code: > >>>> import numpy as np >>>> x = np.int64(5) >>>> isinstance(x, int) > False >>>> x.__int__() > 5 NumPy integers implement __index__. From erik.m.bray at gmail.com Fri Dec 8 07:33:32 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Fri, 8 Dec 2017 13:33:32 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: References: Message-ID: On Fri, Dec 8, 2017 at 12:26 PM, Serhiy Storchaka wrote: > 08.12.17 12:41, Erik Bray ????: >> >> IIUC, it seems to be carry-over from Python 2's PyLong API, but I >> don't see an obvious reason for it. In every case there's an explicit >> PyLong_Check first anyways, so not calling __int__ doesn't help for >> the common case of exact int objects; adding the fallback costs >> nothing in that case. > > > There is also a case of int subclasses. It is expected that PyLong_AsLong is > atomic, and calling __int__ can lead to crashes or similar consequences. > >> I ran into this because I was passing an object that implements >> __int__ to the maxlen argument to deque(). On Python 2 this used >> PyInt_AsSsize_t which does fall back to calling __int__, whereas >> PyLong_AsSsize_t does not. > > > PyLong_* functions provide an interface to PyLong objects. If they don't > return the content of a PyLong object, how can it be retrieved? If you want > to work with general numbers you should use PyNumber_* functions. By "you " I assume you meant the generic "you". I'm not the one who broke things in this case :) > In your particular case it is more reasonable to fallback to __index__ > rather than __int__. Unlikely maxlen=4.2 makes sense. That's true, but in Python 2 that was possible: >>> deque([], maxlen=4.2) deque([], maxlen=4) More importantly not as many objects that coerce to int actually implement __index__. 
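(For contrast, a made-up class that does implement __index__ is accepted anywhere a lossless integer conversion is requested, e.g. operator.index() and sequence repetition:)

import operator

class Pages:
    """Integer-like object that advertises itself as a true integer."""
    def __init__(self, n):
        self._n = n
    def __index__(self):
        return self._n
    def __int__(self):
        return self._n  # conventionally kept consistent with __index__

operator.index(Pages(3))   # 3
"ab" * Pages(3)            # 'ababab'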
They probably *should* but there seems to be some confusion about how that's to be used. It was mainly motivated by slices, but it *could* be used in general cases where it definitely wouldn't make sense to accept a float (I wonder if maybe the real problem here is that floats can be coerced automatically to ints....) In other words, there are probably countless other cases in the stdlib where it "doesn't make sense" to accept a float, but that otherwise should accept objects that can be coerced to int without having to manually wrap those objects with an int(o) call.

>> Currently the following functions fall back on __int__ where available:
>>
>> PyLong_AsLong
>> PyLong_AsLongAndOverflow
>> PyLong_AsLongLong
>> PyLong_AsLongLongAndOverflow
>> PyLong_AsUnsignedLongMask
>> PyLong_AsUnsignedLongLongMask
>
> I think this should be deprecated (and there should be an open issue for
> this). Calling __int__ is just a Python 2 legacy.

Okay, but then there are probably many cases where they should be replaced with PyNumber_ equivalents or else who knows how much code would break.

From solipsis at pitrou.net Fri Dec 8 07:52:54 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Dec 2017 13:52:54 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? References: <20171208123651.42706e51@fsol> Message-ID: <20171208135254.26be0f67@fsol>

On Fri, 8 Dec 2017 14:30:00 +0200 Serhiy Storchaka wrote:
> > NumPy integers implement __index__.

That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().

Regards Antoine.

From erik.m.bray at gmail.com Fri Dec 8 09:12:30 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Fri, 8 Dec 2017 15:12:30 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: <20171208135254.26be0f67@fsol> References: <20171208123651.42706e51@fsol> <20171208135254.26be0f67@fsol> Message-ID:

On Fri, Dec 8, 2017 at 1:52 PM, Antoine Pitrou wrote:
> On Fri, 8 Dec 2017 14:30:00 +0200
> Serhiy Storchaka
> wrote:
>>
>> NumPy integers implement __index__.
>
> That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().

Right--pointing to __index__ basically implies that there should be more PyIndex_Check and subsequent PyNumber_AsSsize_t calls than there currently are. That I could agree with but then it becomes a question of where are those cases? And what to do with, e.g. interfaces like PyLong_AsLongAndOverflow(). Add more PyNumber_ conversion functions?

From solipsis at pitrou.net Fri Dec 8 13:11:20 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Dec 2017 19:11:20 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? References: <20171208123651.42706e51@fsol> <20171208135254.26be0f67@fsol> Message-ID: <20171208191120.75441a12@fsol>

On Fri, 8 Dec 2017 15:12:30 +0100 Erik Bray wrote:
> On Fri, Dec 8, 2017 at 1:52 PM, Antoine Pitrou wrote:
> > On Fri, 8 Dec 2017 14:30:00 +0200
> > Serhiy Storchaka
> > wrote:
> >>
> >> NumPy integers implement __index__.
> >
> > That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().
>
> Right--pointing to __index__ basically implies that there should be more
> PyIndex_Check and subsequent PyNumber_AsSsize_t calls than there currently
> are. That I could agree with but then it becomes a question of where are
> those cases? And what to do with, e.g. interfaces like PyLong_AsLongAndOverflow().
> Add more PyNumber_ conversion functions?
We would probably need more PyNumber_ conversion functions indeed.

Regards Antoine.

From ethan at stoneleaf.us Fri Dec 8 13:20:44 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 08 Dec 2017 10:20:44 -0800 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: References: Message-ID: <5A2AD7FC.6050304@stoneleaf.us>

On 12/08/2017 04:33 AM, Erik Bray wrote:
> More importantly not as many objects that coerce to int actually
> implement __index__. They probably *should* but there seems to be
> some confusion about how that's to be used.

__int__ is for coercion (float, fraction, etc)
__index__ is for true integers

Note that if __index__ is defined, __int__ should also be defined, and return the same value.

https://docs.python.org/3/reference/datamodel.html#object.__index__

-- ~Ethan~

From chris.barker at noaa.gov Fri Dec 8 19:08:19 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 8 Dec 2017 16:08:19 -0800 Subject: [Python-ideas] a sorting protocol dunder method? In-Reply-To: <486C7482-F3B7-4329-B768-95FB344A36A4@barrys-emacs.org> References: <20171204120638.0f84a38a@fsol> <20171204121610.GV22248@ando.pearwood.info> <20171204135219.4f10050a@fsol> <20171204155242.GY22248@ando.pearwood.info> <486C7482-F3B7-4329-B768-95FB344A36A4@barrys-emacs.org> Message-ID: <-6067323939246545074@unknownmsgid>

If by no brainer you mean the performance of __sort_key__ is always better than __lt__

No. By no-brainer I meant that IF there is a __sort_key__ defined, then it should be used for sorting, regardless of whether __lt__ is also defined. (min and max should probably prefer __lt__)

I will ask for a proof in the form of benchmarks with enough use-case coverage.

Indeed -- I was surprised that it helped so much, and didn't seem to hurt for the one example. But the greater concern is that this will affect every sort (that doesn't provide a key function) so if there is any noticeable performance hit, that probably kills the idea. And the most risky example is lists of ints or floats -- which are very fast to compare. So adding a method lookup could be a big impact. I'm not sure how to benchmark that without hacking the sorting C code though. I'd still love to know if my benchmark attempt was at all correct for custom classes in any case.

-CHB

def outer_key(item):
    return item.key()

so we get a function lookup each time it's used. However, I'm confused by the results -- essentially NO Change. That extra method lookup is coming essentially for free. And this example is using a tuple as the key, so not the very cheapest possible sort key. Did I make a mistake? is that lookup somehow cached?

In [36]: run sort_key_test.py
10000
key 0.012529s 10000 calls
outer_key 0.012139s 10000 calls
lt 0.048057s 119877 calls

each run gives different results, but the lt method is always on the order of 5X slower for this size list. Sometimes outer_key is faster, mostly a bit slower, than key. Also, I tried making a "simpler" __lt__ method:

    return (self.value1, self.value2) < (other.value1, other.value2)

but it was a bit slower than the previous one -- interesting.

This is more expensive to execute than my version for 2 reasons.
1) my __lt__ did not need to create any tuples.
2) my __lt__ can exit after only looking at the value1's Then I tried a simpler (but probably common) simple attribute sort: def __lt__(self, other): global lt_calls lt_calls += 1 return self.value1 < other.value1 def key(self): global key_calls key_calls += 1 return self.value1 And that results in about a 6X speedup In [42]: run sort_key_test.py 10000 key 0.005157s 10000 calls outer_key 0.007000s 10000 calls lt 0.041454s 119877 calls time ratio: 5.922036784741144 And, interestingly (t me!) there is even a performance gain for only a 10 item list! (1.5X or so, but still) My guess is that this is because the __lt__ test on simple types is very fast in python. Barry _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Dec 9 02:09:05 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Dec 2017 17:09:05 +1000 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: References: Message-ID: On 8 December 2017 at 22:33, Erik Bray wrote: > In other words, there are probably countless other cases in the stdlib > at all where it "doesn't make sense" to accept a float, but that > otherwise should accept objects that can be coerced to int without > having to manually wrap those objects with an int(o) call. Updating these to call __index__ is fine (since that sets the expectation of a *lossless* conversion to an integer), but updating them to call __int__ generally isn't (since that conversion is allowed to be lossy, which may cause surprising behaviour). Indexing & slicing were the primary original use case for that approach (hence the method name), but it's also used for sequence repetition, and other operations. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From kirillbalunov at gmail.com Sun Dec 10 05:23:59 2017 From: kirillbalunov at gmail.com (Kirill Balunov) Date: Sun, 10 Dec 2017 13:23:59 +0300 Subject: [Python-ideas] Decorator for creating enumeration? Message-ID: Since PEP 557 "Data Classes"[1] and PEP 526 "Syntax for Variable Annotations"[2] are accepted and become part of the language. Is it worth writing a proposal about decorator-version for creating an enumeration? Something like: from enum import enum @enum(unique=True, int_=False, flag=False, ...): class Color: RED : auto GREEN: auto BLUE : auto Despite the fact that Functional API to create enums already exists, it seems to me that decorator-version will allow to unify these two relatively young residents of the standard library. In addition, the idea of "Not having to specify values for enums"[3], which at the time seemed to involve much magic in the implementation, becomes part of the language. Of course, PEP 526 unequivocally says that it does not allow one to annotate the types of variables when tuple unpacking is used. But in any case, I find the variant with the decorator to be an interersting idea. With kind regards, -gdg [1] PEP 557 "Data Classes" [2] PEP 526 "Syntax for Variable Annotations" [3] Not having to specify values for enums. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Dec 10 12:50:48 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 10 Dec 2017 09:50:48 -0800 Subject: [Python-ideas] Decorator for creating enumeration? In-Reply-To: References: Message-ID: There's a third-party enum package. Maybe you can contribute an implementation of this idea there. If it becomes popular maybe we can add it to the stdlib enum module. On Sun, Dec 10, 2017 at 2:23 AM, Kirill Balunov wrote: > Since PEP 557 "Data Classes"[1] and PEP 526 "Syntax for Variable > Annotations"[2] are accepted and become part of the language. Is it worth > writing a proposal about decorator-version for creating an enumeration? > Something like: > > from enum import enum > > @enum(unique=True, int_=False, flag=False, ...): > class Color: > RED : auto > GREEN: auto > BLUE : auto > > Despite the fact that Functional API to create enums already exists, it > seems to me that decorator-version will allow to unify these two relatively > young residents of the standard library. In addition, the idea of "Not > having to specify values for enums"[3], which at the time seemed to involve > much magic in the implementation, becomes part of the language. Of course, > PEP 526 unequivocally says that it does not allow one to annotate the types > of variables when tuple unpacking is used. But in any case, I find the > variant with the decorator to be an interersting idea. > > With kind regards, -gdg > > [1] PEP 557 "Data Classes" > [2] PEP 526 "Syntax for Variable Annotations" > > [3] Not having to specify values for enums. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Dec 10 18:21:03 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 10 Dec 2017 15:21:03 -0800 Subject: [Python-ideas] Decorator for creating enumeration? In-Reply-To: References: Message-ID: <5A2DC15F.7000001@stoneleaf.us> On 12/10/2017 09:50 AM, Guido van Rossum wrote: > There's a third-party enum package. Maybe you can contribute an implementation of this idea there. If it becomes popular > maybe we can add it to the stdlib enum module. The third-party library in question is aenum (enum34 isn't getting new functionality). It can be found at https://bitbucket.org/stoneleaf/aenum . -- ~Ethan~ From leewangzhong+python at gmail.com Mon Dec 11 04:10:20 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Mon, 11 Dec 2017 04:10:20 -0500 Subject: [Python-ideas] Raise StopIteration with same value on subsequent `next`s Message-ID: Consider the following code, which creates a generator that immediately returns 1, and then catches the StopIteration twice. def generatorfunction(): if False: yield return 1 def get_return(gen): try: next(gen) except StopIteration as e: return e.value else: raise ValueError("Generator was not ready to stop.") gen = generatorfunction() get_return(gen) #=> 1 get_return(gen) #=> None The first time StopIteration is raised, it contains the returned value. If StopIteration is forced again, the value is missing. What about keeping the return value for subsequent raises? Perhaps as an attribute on the generator object? 
The main disadvantage is that it'd add another reference, keeping the return value alive as long as the generator is alive. However, I don't think you'll want to keep a dead generator around anyway. Note: Javascript, which added generators, agrees with the Python status quo: the return value is only available on the first stop. C# does not have the concept of returning from an iterator. ---- Background: I made a trampoline for a toy problem, using generators as coroutines to recurse. When it came time to memoize it, I had to couple the memoization with the trampoline, because I could not cache the answer before it was computed, and I could not cache the generator object because it would not remember its return value later. I would have attached the returned value to the generator object, but for some reason, generator and coroutine objects can't take attributes. Maybe I should ask for that feature instead. Either feature would allow the concept of a coroutine that is also a thunk. From guido at python.org Mon Dec 11 15:06:35 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 11 Dec 2017 12:06:35 -0800 Subject: [Python-ideas] Raise StopIteration with same value on subsequent `next`s In-Reply-To: References: Message-ID: IIRC we considered this when we designed this (PEP 380) and decided that hanging on to the exception object longer than necessary was not in our best interest. On Mon, Dec 11, 2017 at 1:10 AM, Franklin? Lee < leewangzhong+python at gmail.com> wrote: > Consider the following code, which creates a generator that > immediately returns 1, and then catches the StopIteration twice. > > def generatorfunction(): > if False: yield > return 1 > > def get_return(gen): > try: > next(gen) > except StopIteration as e: > return e.value > else: > raise ValueError("Generator was not ready to stop.") > > gen = generatorfunction() > get_return(gen) #=> 1 > get_return(gen) #=> None > > > The first time StopIteration is raised, it contains the returned > value. If StopIteration is forced again, the value is missing. > > What about keeping the return value for subsequent raises? Perhaps as > an attribute on the generator object? The main disadvantage is that > it'd add another reference, keeping the return value alive as long as > the generator is alive. However, I don't think you'll want to keep a > dead generator around anyway. > > Note: Javascript, which added generators, agrees with the Python > status quo: the return value is only available on the first stop. C# > does not have the concept of returning from an iterator. > > ---- > > Background: I made a trampoline for a toy problem, using generators as > coroutines to recurse. > > When it came time to memoize it, I had to couple the memoization with > the trampoline, because I could not cache the answer before it was > computed, and I could not cache the generator object because it would > not remember its return value later. > > I would have attached the returned value to the generator object, but > for some reason, generator and coroutine objects can't take > attributes. Maybe I should ask for that feature instead. > > Either feature would allow the concept of a coroutine that is also a thunk. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From leewangzhong+python at gmail.com Mon Dec 11 18:21:07 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Mon, 11 Dec 2017 18:21:07 -0500 Subject: [Python-ideas] Raise StopIteration with same value on subsequent `next`s In-Reply-To: References: Message-ID: What about hanging onto just the value, and creating new StopIteration instances instead of raising the same one again? On Mon, Dec 11, 2017 at 3:06 PM, Guido van Rossum wrote: > IIRC we considered this when we designed this (PEP 380) and decided that > hanging on to the exception object longer than necessary was not in our best > interest. > > On Mon, Dec 11, 2017 at 1:10 AM, Franklin? Lee > wrote: >> >> Consider the following code, which creates a generator that >> immediately returns 1, and then catches the StopIteration twice. >> >> def generatorfunction(): >> if False: yield >> return 1 >> >> def get_return(gen): >> try: >> next(gen) >> except StopIteration as e: >> return e.value >> else: >> raise ValueError("Generator was not ready to stop.") >> >> gen = generatorfunction() >> get_return(gen) #=> 1 >> get_return(gen) #=> None >> >> >> The first time StopIteration is raised, it contains the returned >> value. If StopIteration is forced again, the value is missing. >> >> What about keeping the return value for subsequent raises? Perhaps as >> an attribute on the generator object? The main disadvantage is that >> it'd add another reference, keeping the return value alive as long as >> the generator is alive. However, I don't think you'll want to keep a >> dead generator around anyway. >> >> Note: Javascript, which added generators, agrees with the Python >> status quo: the return value is only available on the first stop. C# >> does not have the concept of returning from an iterator. >> >> ---- >> >> Background: I made a trampoline for a toy problem, using generators as >> coroutines to recurse. >> >> When it came time to memoize it, I had to couple the memoization with >> the trampoline, because I could not cache the answer before it was >> computed, and I could not cache the generator object because it would >> not remember its return value later. >> >> I would have attached the returned value to the generator object, but >> for some reason, generator and coroutine objects can't take >> attributes. Maybe I should ask for that feature instead. >> >> Either feature would allow the concept of a coroutine that is also a >> thunk. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Mon Dec 11 18:44:19 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 11 Dec 2017 18:44:19 -0500 Subject: [Python-ideas] Raise StopIteration with same value on subsequent `next`s In-Reply-To: References: Message-ID: On Mon, Dec 11, 2017 at 6:21 PM, Franklin? Lee wrote: > What about hanging onto just the value, and creating new StopIteration > instances instead of raising the same one again? Doesn't really matter, as we're still prolonging the lifespan of the returned object. To me the current behaviour of generators seems fine. For regular generator users this is a non-problem. For trampolines and async frameworks--so many of them have been implemented and all of them worked around this issue in one way or another. 
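(Purely for illustration -- one shape such a workaround can take is a small wrapper that captures the return value the first time the generator finishes. This is a sketch, not what any particular framework actually does:)

class CachedReturn:
    """Wrap a generator so its return value survives repeated exhaustion."""
    _missing = object()

    def __init__(self, gen):
        self.gen = gen
        self.value = self._missing

    def __iter__(self):
        if self.value is self._missing:
            # First pass: delegate and remember the return value.
            self.value = yield from self.gen
        return self.value

# e.g. with the generatorfunction()/get_return() example from earlier:
#   wrapped = CachedReturn(generatorfunction())
#   get_return(iter(wrapped))  #=> 1
#   get_return(iter(wrapped))  #=> 1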
> I would have attached the returned value to the generator object, but
> for some reason, generator and coroutine objects can't take
> attributes. Maybe I should ask for that feature instead.

You can use a WeakKeyDictionary to associate any state with a generator; that should solve your problem. We wouldn't want to add __dict__ to generators to have another way of doing that.

Yury

From leewangzhong+python at gmail.com Mon Dec 11 21:24:44 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Mon, 11 Dec 2017 21:24:44 -0500 Subject: [Python-ideas] Raise StopIteration with same value on subsequent `next`s In-Reply-To: References: Message-ID:

On Mon, Dec 11, 2017 at 6:44 PM, Yury Selivanov wrote:
> On Mon, Dec 11, 2017 at 6:21 PM, Franklin? Lee
> wrote:
>> What about hanging onto just the value, and creating new StopIteration
>> instances instead of raising the same one again?
>
> Doesn't really matter, as we're still prolonging the lifespan of the
> returned object.

As I said, I don't see a problem with this. How often does one accidentally hold on to an exhausted generator that they no longer have a need for? How often do those held generators have return values? How often are the generators held past the life of their returned values? And if one holds an exhausted generator on purpose, but doesn't need the returned value, what _is_ needed from it? I suspect it's a rarity of rarities.

> To me the current behaviour of generators seems fine. For regular
> generator users this is a non-problem. For trampolines and async
> frameworks--so many of them have been implemented and all of them
> worked around this issue in one way or another.

For regular generator users, there is also little harm, as most simple generators won't return values. (Yes, I know the burden of proof is on the one looking to change the status quo.) I haven't figured out if there's a nice way to compose trampolines and memoization. I don't know if it's possible. Are there async frameworks that implement the concept of a coroutine which is also a thunk? I admit again, my use for it is not common. Even for myself. Maybe someone else has a better use?

>> I would have attached the returned value to the generator object, but
>> for some reason, generator and coroutine objects can't take
>> attributes. Maybe I should ask for that feature instead.
>
> You can use a WeakKeyDictionary to associate any state with a
> generator; that should solve your problem. We wouldn't want to add
> __dict__ to generators to have another way of doing that.

I used a regular dict, which was discarded when the total computation was finished, and extracted the function args to use as the key. However, I had to memoize within the trampoline code, and could not use a separate memoization decorator.

From storchaka at gmail.com Sun Dec 17 14:20:38 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 17 Dec 2017 21:20:38 +0200 Subject: [Python-ideas] Repr of lambda Message-ID:

Currently the repr of a lambda doesn't contain much information besides that it is a lambda.

>>> lambda x: x**2
<function <lambda> at 0x7f3479b74488>

All lambdas have the same repr, different only by unreadable hexadecimal address.

What if we include the signature and the expression of the lambda in its repr?

>>> lambda x: x**2
<lambda x: x**2>

This would return an old feature of Python 0.9.1 (https://twitter.com/dabeaz/status/934835068043956224).

Repr of function could contain just the signature.

But there is a problem with default values. Their reprs can be very long, especially in the following case with mutable default value:

def foo(x, _cache={}):
    try:
        return _cache[x]
    except KeyError:
        pass
    _cache[x] = y = expensive_calculation(x)
    return y

Maybe the placeholder should be always used instead of default values.

From tjreedy at udel.edu Sun Dec 17 15:55:16 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 17 Dec 2017 15:55:16 -0500 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: Message-ID:

On 12/17/2017 2:20 PM, Serhiy Storchaka wrote:
> Currently the repr of a lambda doesn't contain much information besides that it is
> a lambda.
>
> >>> lambda x: x**2
> <function <lambda> at 0x7f3479b74488>
>
> All lambdas have the same repr, different only by unreadable hexadecimal
> address.

Having the same pseudo-name is what being anonymous means. Some consider that a feature ;-).

> What if we include the signature and the expression of the lambda in its repr?
>
> >>> lambda x: x**2
> <lambda x: x**2>

Printing the return value requires adding a code or function attribute. The return expression(s), possibly None, would be just as useful for named functions.

> This would return an old feature of Python 0.9.1
> (https://twitter.com/dabeaz/status/934835068043956224).
>
> Repr of function could contain just the signature.
>
> But there is a problem with default values. Their reprs can be very
> long, especially in the following case with mutable default value:

I would not expect the representation to change; just use the expression.

> def foo(x, _cache={}):
>     try:
>         return _cache[x]
>     except KeyError:
>         pass
>     _cache[x] = y = expensive_calculation(x)
>     return y
>
> Maybe the placeholder should be always used instead of default values.

Do you mean 'a placeholder'?

-- Terry Jan Reedy

From vano at mail.mipt.ru Mon Dec 18 01:11:03 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Mon, 18 Dec 2017 09:11:03 +0300 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: Message-ID:

On 17.12.2017 22:20, Serhiy Storchaka wrote:
> Currently the repr of a lambda doesn't contain much information besides that it
> is a lambda.
>
> >>> lambda x: x**2
> <function <lambda> at 0x7f3479b74488>
>
> All lambdas have the same repr, different only by unreadable
> hexadecimal address.
>
> What if we include the signature and the expression of the lambda in its
> repr?
>
> >>> lambda x: x**2
> <lambda x: x**2>
>
It's the same for named functions:

    In [1]: def ditto(a): return a

    In [2]: ditto
    Out[2]:

Are you intending to do the same for them?

> This would return an old feature of Python 0.9.1
> (https://twitter.com/dabeaz/status/934835068043956224).
>
> Repr of function could contain just the signature.
>
> But there is a problem with default values. Their reprs can be very
> long, especially in the following case with mutable default value:
>
> def foo(x, _cache={}):
>     try:
>         return _cache[x]
>     except KeyError:
>         pass
>     _cache[x] = y = expensive_calculation(x)
>     return y
>
> Maybe the placeholder should be always used instead of default values.
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Regards, Ivan From storchaka at gmail.com Mon Dec 18 03:49:54 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 18 Dec 2017 10:49:54 +0200 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: Message-ID: 17.12.17 22:55, Terry Reedy ????: >> What if include the signature and the expression of the lambda in its >> repr? >> >> ?>>> lambda x: x**2 >> > > Printing the return value requires adding a code or function attribute. Yes, this requires adding an optional constant code attribute. > The return expression(s), possibly None, would be just as useful for > named functions. In case of named functions the body is not an expression and usually is multiline. >> But there is a problem with default values. Their reprs can be very >> long, especially in the following case with mutable default value: > > I would not expect the representation to change; just use the expression. This is a solution, thanks. >> Maybe the placeholder should be always used instead of default values. > > Do you mean 'a placeholder'? Yes, sorry. From gadgetsteve at live.co.uk Mon Dec 18 02:54:17 2017 From: gadgetsteve at live.co.uk (Steve Barnes) Date: Mon, 18 Dec 2017 07:54:17 +0000 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: Message-ID: On 18/12/2017 06:11, Ivan Pozdeev via Python-ideas wrote: > On 17.12.2017 22:20, Serhiy Storchaka wrote: >> Currently repr of doesn't contain much of information besides that it >> is a lambda. >> >> >>> lambda x: x**2 >> at 0x7f3479b74488> >> >> All lambdas have the same repr, different only by unreadable >> hexadecimal address. >> >> What if include the signature and the expression of the lambda in its >> repr? >> >> >>> lambda x: x**2 >> >> > It's the same for named functions: > > ??? In [1]: def ditto(a): return a > > ??? In [2]: ditto > ??? Out[2]: > > Are you intending to do the same for them? >> This would return an old feature of Python 0.9.1 >> (https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fdabeaz%2Fstatus%2F934835068043956224&data=02%7C01%7C%7C44a37122957d43015aa308d545de3321%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636491742990321352&sdata=iAE6MDdsZJDHfUqlHlPjnf2XV%2BiRZ%2BrP%2FL%2BIQ8kKoKo%3D&reserved=0). >> >> >> Repr of function could contain just the signature. >> >> >> >> But there is a problem with default values. Their reprs can be very >> long, especially in the following case with mutable default value: >> >> def foo(x, _cache={}): >> ??? try: >> ??????? return _cache[x] >> ??? except KeyError: >> ??????? pass >> ??? _cache[x] = y = expensive_calculation(x) >> ??? return y >> >> Maybe the placeholder should be always used instead of default values. >> >> _______________________________________________ Isn't this exactly the sort of information already available via inspect.getardspec, inspect.getsourcelines & inspect.getsource? 
In [19]: import inspect

In [20]: l = lambda x: x**2

In [21]: inspect.getargspec(l)
Out[21]: ArgSpec(args=['x'], varargs=None, keywords=None, defaults=None)

In [22]: inspect.getsource(l)
Out[22]: u'l = lambda x: x**2\n'

In [23]:
    ...: def foo(x, _cache={}):
    ...:     try:
    ...:         return _cache[x]
    ...:     except KeyError:
    ...:         pass
    ...:     _cache[x] = y = expensive_calculation(x)
    ...:     return y
    ...:

In [24]: inspect.getargspec(foo)
Out[24]: ArgSpec(args=['x', '_cache'], varargs=None, keywords=None, defaults=({},))

In [25]: inspect.getsource(foo)
Out[25]: u'def foo(x, _cache={}):\n    try:\n        return _cache[x]\n    except KeyError:\n        pass\n    _cache[x] = y = expensive_calculation(x)\n    return y \n'

In [26]: inspect.getsourcelines(foo)
Out[26]:
([u'def foo(x, _cache={}):\n',
  u'    try:\n',
  u'        return _cache[x]\n',
  u'    except KeyError:\n',
  u'        pass\n',
  u'    _cache[x] = y = expensive_calculation(x)\n',
  u'    return y \n'],
 2)

-- 
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect those of my employer.

---
This email has been checked for viruses by AVG.
http://www.avg.com

From steve at pearwood.info  Mon Dec 18 06:43:53 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 18 Dec 2017 22:43:53 +1100
Subject: [Python-ideas] Repr of lambda
In-Reply-To: References: Message-ID: <20171218114352.GD16230@ando.pearwood.info>

On Mon, Dec 18, 2017 at 07:54:17AM +0000, Steve Barnes wrote:

> Isn't this exactly the sort of information already available via
> inspect.getargspec, inspect.getsourcelines & inspect.getsource?

[snip ipython interactive session showing that the answer is "Yes"]

You answered your own question: yes it is. What's your point? That's not a rhetorical question -- I genuinely don't understand why you raise this. Do you see the existence of inspect.* as negating the usefulness of giving functions a more useful repr? As proof that the information is available?
Something else? > > Hi Steve, I see it as showing that the information is already available to anybody who needs it so I question the usefulness of changing repr (for everybody) which is bound to break something somewhere. Now if you were to suggest adding a verbose flag to repr or a print format, maybe both - so that people don't have to import inspect then I would say that you have a lot less chance of breaking any code/test cases that is in the wild. Sorry I should have made my point(s) clearer. -- Steve (Gadget) Barnes Any opinions in this message are my personal opinions and do not reflect those of my employer. --- This email has been checked for viruses by AVG. http://www.avg.com From jcroteau at gmail.com Mon Dec 18 18:51:46 2017 From: jcroteau at gmail.com (Joel Croteau) Date: Mon, 18 Dec 2017 23:51:46 +0000 Subject: [Python-ideas] Support floating-point values in collections.Counter Message-ID: It would be useful in many scenarios for values in collections.Counter to be allowed to be floating point. I know that Counter nominally emulates a multiset, which would suggest only integer values, but in a more general sense, it could be an accumulator of either floating point or integer data. As near as I can tell, Collection already does support float values in both Python 2.7 and 3.6, and the way the code is implemented, this change should be a no-op. All that is required is to update the documentation to say floating-point values are allowed, as it currently says only integers are allowed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fiskomaten at gmail.com Tue Dec 19 04:20:13 2017 From: fiskomaten at gmail.com (=?UTF-8?B?U8O4cmVuIFBpbGfDpXJk?=) Date: Tue, 19 Dec 2017 10:20:13 +0100 Subject: [Python-ideas] Support floating-point values in collections.Counter In-Reply-To: References: Message-ID: On Tue, Dec 19, 2017 at 12:51 AM, Joel Croteau wrote: > It would be useful in many scenarios for values in collections.Counter to be > allowed to be floating point. I know that Counter nominally emulates a > multiset, which would suggest only integer values, but in a more general > sense, it could be an accumulator of either floating point or integer data. > As near as I can tell, Collection already does support float values in both > Python 2.7 and 3.6, and the way the code is implemented, this change should > be a no-op. All that is required is to update the documentation to say > floating-point values are allowed, as it currently says only integers are > allowed. > How should the `elements` method work? Currently it raises TypeError: integer argument expected, got float At least it should be documented that the method only works when all counts are integers. The error message could also state exactly what key it failed on. From steve at pearwood.info Tue Dec 19 06:04:35 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 Dec 2017 22:04:35 +1100 Subject: [Python-ideas] Support floating-point values in collections.Counter In-Reply-To: References: Message-ID: <20171219110435.GA4215@ando.pearwood.info> On Mon, Dec 18, 2017 at 11:51:46PM +0000, Joel Croteau wrote: > It would be useful in many scenarios for values in collections.Counter to > be allowed to be floating point. Can you give a concrete example? > I know that Counter nominally emulates a multiset, > which would suggest only integer values, but in a more general > sense, it could be an accumulator of either floating point or integer data. 
>
> As near as I can tell, Collection already does support float values in both
> Python 2.7 and 3.6, and the way the code is implemented, this change should
> be a no-op. All that is required is to update the documentation to say
> floating-point values are allowed, as it currently says only integers are
> allowed.

I don't think it's that simple.

What should the elements() method do when an element has a "count" of 2.5, say? What happens if the count is a NAN?

There are operations that discard negative and zero, or positive and zero, counts. How should they treat -0.0 and NANs?

I am intrigued by this suggestion, but I'm not quite sure where I would use such an accumulator, or whether a Counter is the right solution for it. Perhaps some concrete use-cases would convince me.

-- 
Steve

From p.f.moore at gmail.com  Tue Dec 19 06:08:29 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 19 Dec 2017 11:08:29 +0000
Subject: [Python-ideas] Support floating-point values in collections.Counter
In-Reply-To: References: Message-ID: 

On 18 December 2017 at 23:51, Joel Croteau wrote:
> It would be useful in many scenarios for values in collections.Counter to be
> allowed to be floating point.

Do you have any evidence of this? Code examples that would be significantly improved by such a change? I can't think of any myself.

I might consider writing

totals = defaultdict(float)
for ...:
    totals[something] = calculation(something)

but using a counter is neither noticeably easier, nor clearer...

One way of demonstrating such a need would be if your proposed behaviour were available on PyPI and getting used a lot - I'm not aware of any such module if it is.

Paul

From jcroteau at gmail.com  Tue Dec 19 22:09:07 2017
From: jcroteau at gmail.com (Joel Croteau)
Date: Wed, 20 Dec 2017 03:09:07 +0000
Subject: [Python-ideas] Support floating-point values in collections.Counter
In-Reply-To: References: Message-ID: 

Well here is some code I wrote recently to build a histogram over a weighted graph, before becoming aware that Counter existed (score is a float here):

from collections import defaultdict

total_score_by_depth = defaultdict(float)
total_items_by_depth = defaultdict(int)
num_nodes_by_score = defaultdict(int)
num_nodes_by_log_score = defaultdict(int)
num_edges_by_score = defaultdict(int)
for state in iter_graph_components():
    try:
        # There is probably some overlap here
        ak = state['ak']
        _, c = ak.score_paths(max_depth=15)
        for edge in state['graph'].edges:
            num_edges_by_score[np.ceil(20.0 * edge.score) / 20.0] += 1
        for node in c.nodes:
            total_score_by_depth[node.depth] += node.score
            total_items_by_depth[node.depth] += 1
            num_nodes_by_score[np.ceil(20.0 * node.score) / 20.0] += 1
            num_nodes_by_log_score[np.ceil(-np.log10(node.score))] += 1
        num_nodes_by_score[0.0] += len(state['graph'].nodes) - len(c.nodes)
        num_nodes_by_log_score[100.0] += len(state['graph'].nodes) - len(c.nodes)
    except MemoryError:
        print("Skipped massive.")

Without going too much into what this does, note that I could replace the other defaultdicts with Counters, but I can't do the same thing with total_score_by_depth, at least not without violating the API. I would suggest that with a name like Counter, treating a class like a Counter should be the more common use case. If it's meant to be a multiset, we should call it a Multiset.
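For what it's worth, the basic arithmetic already appears to accept float values today -- a quick REPL check, not something the current documentation promises:

>>> from collections import Counter
>>> c = Counter()
>>> c['a'] += 0.5
>>> c['a'] += 0.25
>>> c + Counter(a=0.25, b=1.5)
Counter({'b': 1.5, 'a': 1.0})
>>> c.most_common(1)
[('a', 0.75)]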
Here is an example from Stack Overflow of someone else also wanting a float counter, and the only suggestion being to use defaultdict: https://stackoverflow.com/questions/10900207/any-way-to-tackle-float-counter-values-in-python On Tue, Dec 19, 2017 at 3:08 AM Paul Moore wrote: > On 18 December 2017 at 23:51, Joel Croteau wrote: > > It would be useful in many scenarios for values in collections.Counter > to be > > allowed to be floating point. > > Do you have any evidence of this? Code examples that would be > significantly improved by such a change? I can't think of any myself. > > I might consider writing > > totals - defaultdict(float) > for ...: > totals[something] = calculation(something) > > but using a counter is neither noticeably easier, nor clearer... > > One way of demonstrating such a need would be if your proposed > behaviour were available on PyPI and getting used a lot - I'm not > aware of any such module if it is. > > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Dec 20 05:05:22 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 20 Dec 2017 10:05:22 +0000 Subject: [Python-ideas] Support floating-point values in collections.Counter In-Reply-To: References: Message-ID: On 20 December 2017 at 03:09, Joel Croteau wrote: > Well here is some code I wrote recently to build a histogram over a weighted > graph, before becoming aware that Counter existed (score is a float here): > > from collections import defaultdict > > total_score_by_depth = defaultdict(float) > total_items_by_depth = defaultdict(int) > num_nodes_by_score = defaultdict(int) > num_nodes_by_log_score = defaultdict(int) > num_edges_by_score = defaultdict(int) > for state in iter_graph_components(): > try: > # There is probably some overlap here > ak = state['ak'] > _, c = ak.score_paths(max_depth=15) > for edge in state['graph'].edges: > num_edges_by_score[np.ceil(20.0 * edge.score) / 20.0] += 1 > for node in c.nodes: > total_score_by_depth[node.depth] += node.score > total_items_by_depth[node.depth] += 1 > num_nodes_by_score[np.ceil(20.0 * node.score) / 20.0] += 1 > num_nodes_by_log_score[np.ceil(-np.log10(node.score))] += 1 > num_nodes_by_score[0.0] += len(state['graph'].nodes) - len(c.nodes) > num_nodes_by_log_score[100.0] += len(state['graph'].nodes) - > len(c.nodes) > except MemoryError: > print("Skipped massive.") > > Without going too much into what this does, note that I could replace the > other defaultdicts with Counters, but I can't do the same thing with a > total_score_by_depth, at least not without violating the API. Hmm, OK. I can't see any huge benefit from switching to a Counter, though. You're not using any features of a Counter that aren't shared by a defaultdict, nor is there any code here that could be simplified or replaced by using such features... > I would > suggest that with a name like Counter, treating a class like a Counter > should be the more common use case. If it's meant to be a multiset, we > should call it a Multiset. Personally, I consider "counting" to be something we do with integers (whole numbers), not with floats. So for me the name Counter clearly implies an integer. Multiset would be a reasonable alternative name, but Python has a tradition of using "natural language" names over "computer science" names, so I'm not surprised Counter was chosen instead. I guess it's ultimately a matter of opinion whether a float-based Counter is a natural extension or not. 
Paul From leewangzhong+python at gmail.com Wed Dec 20 15:55:01 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Wed, 20 Dec 2017 15:55:01 -0500 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: On Mon, Dec 18, 2017 at 8:31 AM, Steve Barnes wrote: > > On 18/12/2017 11:43, Steven D'Aprano wrote: >> On Mon, Dec 18, 2017 at 07:54:17AM +0000, Steve Barnes wrote: >> >>> Isn't this exactly the sort of information already available via >>> inspect.getardspec, inspect.getsourcelines & inspect.getsource? >> >> [snip ipython interactive session showing that the answer is "Yes"] >> >> You answered your own question: yes it is. What's your point? That's not >> a rhetorical question -- I genuinely don't understand why you raise >> this. Do you see the existence of inspect.* as negating the usefulness >> of giving functions a more useful repr? As proof that the information is >> available? Something else? Wow, that's way too aggressive for this list. > Hi Steve, > > I see it as showing that the information is already available to anybody > who needs it so I question the usefulness of changing repr (for > everybody) which is bound to break something somewhere. Now if you were > to suggest adding a verbose flag to repr or a print format, maybe both - > so that people don't have to import inspect then I would say that you > have a lot less chance of breaking any code/test cases that is in the wild. > > Sorry I should have made my point(s) clearer. But is it readily available? Will the people who need the information know how to get it? No. IMO, `inspect` is somewhat arcane, and you shouldn't need its full power to use lambdas. Right now, lambdas are represented just by their memory location. That's not meaningful. Lambdas are, abstractly speaking, value objects, with value defined by its definition (including parameter names). (That's not really true in Python, where lambdas are not compared by just function definition.) A lambda's location is irrelevant to its functionality; unless you poke around its dunder bits, a lambda's functionality is completely determined by its signature, its body, and the closure. In short, the address doesn't really represent the lambda. Imagine a world where ints and strs are represented by type and address. That's not an ideal world. How you use an int or str has nothing to do with its memory location, and everything to do with its value. Regarding backward compatibility, I'm unsympathetic toward anyone who was relying on lambdas being represented by their addresses. At worst, they'd check for `repr(f).startswith('')`, which, if we really care, can be accounted for. I'm more concerned with future backward compatibility: if the repr is made meaningful now, it will definitely be harder to change in the future. In any case, it should be more a consideration than a reason against. Note that it's impossible in general to fulfill `eval(repr(myLambda)) == myLambda`, because of how lambdas are compared, but also because you won't necessarily have access to the same lexical scope as the original lambda, and because default arguments can hold state. 
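A quick illustration of both points -- identity comparison and default-argument state -- assuming the repr were made eval()-able at all:

>>> f = lambda x, seen=[]: seen.append(x) or len(seen)
>>> g = eval('lambda x, seen=[]: seen.append(x) or len(seen)')
>>> f == g                # functions compare by identity
False
>>> f(1), f(2), g(1)      # g gets its own, fresh default list
(1, 2, 1)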
From ned at nedbatchelder.com  Wed Dec 20 16:47:18 2017
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Wed, 20 Dec 2017 16:47:18 -0500
Subject: [Python-ideas] Support floating-point values in collections.Counter
In-Reply-To: References: Message-ID: <312c2e51-c66b-1cd6-2c93-1a78ce90d02c@nedbatchelder.com>

On 12/20/17 5:05 AM, Paul Moore wrote:
> On 20 December 2017 at 03:09, Joel Croteau wrote:
>> Well here is some code I wrote recently to build a histogram over a weighted
>> graph, before becoming aware that Counter existed (score is a float here):
>>
>> from collections import defaultdict
>>
>> total_score_by_depth = defaultdict(float)
>> total_items_by_depth = defaultdict(int)
>> num_nodes_by_score = defaultdict(int)
>> num_nodes_by_log_score = defaultdict(int)
>> num_edges_by_score = defaultdict(int)
>> for state in iter_graph_components():
>>     try:
>>         # There is probably some overlap here
>>         ak = state['ak']
>>         _, c = ak.score_paths(max_depth=15)
>>         for edge in state['graph'].edges:
>>             num_edges_by_score[np.ceil(20.0 * edge.score) / 20.0] += 1
>>         for node in c.nodes:
>>             total_score_by_depth[node.depth] += node.score
>>             total_items_by_depth[node.depth] += 1
>>             num_nodes_by_score[np.ceil(20.0 * node.score) / 20.0] += 1
>>             num_nodes_by_log_score[np.ceil(-np.log10(node.score))] += 1
>>         num_nodes_by_score[0.0] += len(state['graph'].nodes) - len(c.nodes)
>>         num_nodes_by_log_score[100.0] += len(state['graph'].nodes) - len(c.nodes)
>>     except MemoryError:
>>         print("Skipped massive.")
>>
>> Without going too much into what this does, note that I could replace the
>> other defaultdicts with Counters, but I can't do the same thing with a
>> total_score_by_depth, at least not without violating the API.
> Hmm, OK. I can't see any huge benefit from switching to a Counter,
> though. You're not using any features of a Counter that aren't shared
> by a defaultdict, nor is there any code here that could be simplified
> or replaced by using such features...
>
>> I would
>> suggest that with a name like Counter, treating a class like a Counter
>> should be the more common use case. If it's meant to be a multiset, we
>> should call it a Multiset.
> Personally, I consider "counting" to be something we do with integers
> (whole numbers), not with floats. So for me the name Counter clearly
> implies an integer. Multiset would be a reasonable alternative name,
> but Python has a tradition of using "natural language" names over
> "computer science" names, so I'm not surprised Counter was chosen
> instead.
>
> I guess it's ultimately a matter of opinion whether a float-based
> Counter is a natural extension or not.
>
One thing to note is that Counter supports negative numbers, so we are already outside the natural numbers :)

    Python 3.6.4 (default, Dec 19 2017, 08:11:42)
    >>> from collections import Counter
    >>> c = Counter(a=4, b=2, c=0, d=-2)
    >>> d = Counter(a=1, b=2, c=3, d=4)
    >>> c.subtract(d)
    >>> c
    Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
    >>> list(c.elements())
    ['a', 'a', 'a']

--Ned.

From leewangzhong+python at gmail.com  Wed Dec 20 16:49:09 2017
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Wed, 20 Dec 2017 16:49:09 -0500
Subject: [Python-ideas] Support floating-point values in collections.Counter
In-Reply-To: References: Message-ID: 

On Mon, Dec 18, 2017 at 6:51 PM, Joel Croteau wrote:
> It would be useful in many scenarios for values in collections.Counter to be
> allowed to be floating point.
I know that Counter nominally emulates a > multiset, which would suggest only integer values, but in a more general > sense, it could be an accumulator of either floating point or integer data. > As near as I can tell, Collection already does support float values in both > Python 2.7 and 3.6, and the way the code is implemented, this change should > be a no-op. All that is required is to update the documentation to say > floating-point values are allowed, as it currently says only integers are > allowed. That's beyond the scope of Counter. I think what you really want is a generalization of Counter which represents a key'd number bag. A dict of key=>number which supports arithmetic operations, like Numpy arrays are to lists. Example methods: __init__(...): Like dict's version, but it will combine the values of duplicate keys in its params. update(...): Similar to __init__. fromkeys(...): Like dict's version, but uses 0 or 1 as the default value, and combines values like the constructor. With value=1, this is roughly equivalent to the Counter constructor. : Arithmetic with other number bags, with dicts, and with number-like values. clearsmall(tolerance=0): Removes keys whose values are close to 0. Other methods may take inspiration from Numpy. The class should probably be a third-party package (and probably already is), so that the method list can be solidified first. From steve at pearwood.info Wed Dec 20 18:58:28 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 21 Dec 2017 10:58:28 +1100 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: <20171220235827.GH4215@ando.pearwood.info> On Wed, Dec 20, 2017 at 03:55:01PM -0500, Franklin? Lee wrote: > Wow, that's way too aggressive for this list. Hair-trigger sensitivity to "aggressiveness" is itself a form of aggression, because it leads to unfair accusations of aggressiveness and other over-reactions, and forces people to "walk on eggshells" rather than say what they are actually thinking. Remember that tone of voice doesn't communicate well across email. You should always read email and give the writer the benefit of the doubt: "if I were face to face with this person, and could see relaxed posture and a smile, or hear a friendly tone of voice, would I still think this was aggressive?" -- Steve From brett at python.org Wed Dec 20 22:43:22 2017 From: brett at python.org (Brett Cannon) Date: Thu, 21 Dec 2017 03:43:22 +0000 Subject: [Python-ideas] Repr of lambda In-Reply-To: <20171220235827.GH4215@ando.pearwood.info> References: <20171218114352.GD16230@ando.pearwood.info> <20171220235827.GH4215@ando.pearwood.info> Message-ID: I'm in the middle of moving, so I'm not planning to take this any farther than this email unless someone explicitly brings up an issue by emailing Python-ideas-owner where Titus can help anyone out. On Wed, Dec 20, 2017, 16:11 Steven D'Aprano, wrote: > On Wed, Dec 20, 2017 at 03:55:01PM -0500, Franklin? Lee wrote: > > > Wow, that's way too aggressive for this list. > I personally disagree as I didn't read anything from Steven as aggressive in his response. I think a better response would have been, "I don't think you mean for what you said to be portrayed as aggressive, but I read as if you were saying ". Otherwise to me it comes off as dealing with aggression with aggression. 
> Hair-trigger sensitivity to "aggressiveness" is itself a form of > aggression, because it leads to unfair accusations of aggressiveness > and other over-reactions, and forces people to "walk on eggshells" > rather than say what they are actually thinking. > I also disagree with this. ? Any form of aggression isn't really necessary to appropriately communicate an idea or viewpoint, so taking some time to consider how you phrase something so it doesn't come off as aggressive to people in general is a good thing. Given a choice between allowing occasional aggression so that some don't feel they have to be careful in their phrasing "just in case" compared to people having to take extra time in their response to avoid aggression, I always go with the latter. For me, any extra effort to be courteous is worth it. > Remember that tone of voice doesn't communicate well across email. You > should always read email and give the writer the benefit of the doubt: > "if I were face to face with this person, and could see relaxed posture > and a smile, or hear a friendly tone of voice, would I still think this > was aggressive?" > This I do agree with. Give people the benefit of the doubt, and if you want you can kindly inform the person how you could have interpreted what they said as aggressive and why you thought that. This gives people a chance to explain what they actually meant and for people to potentially apologize for the misunderstanding. -Brett > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Dec 21 01:57:39 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 20 Dec 2017 22:57:39 -0800 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: On Mon, Dec 18, 2017 at 5:31 AM, Steve Barnes wrote: > I see it as showing that the information is already available to anybody > who needs it so I question the usefulness of changing repr (for > everybody) > @dataclass > class C: > a: "the a parameter" # field with no default > b: "another, different parameter" = 0.0 # field with a default > . well, digging into inspect and all that is definitely an advanced process -- repr is for a quick look-see at the value of an object -- it would be great to have one that was more informative. and in theory, the "goal" is for eval(repr(obj)) to return an equivelent object -- surely that would require showing the arguments and expression, yes? But is it bound to break somethign somewhere? given how, well, useless the current lambda repr is, I can't imagine much breaking. But then I"ve been known to lack imagination :-) As for "proper" functions, I think it's pretty much off the table -- they are simply too often complicated beasts with lots of parameters, lots of code, multiple return possibilities, etc. Is there a downside other than possible breakage? Performance issue, maybe? And with regards to breakage -- anyone have code that would break (yeah, I know, small sample, but if the answer is yes, lots, then we're done) -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Thu Dec 21 15:34:15 2017 From: barry at barrys-emacs.org (Barry) Date: Thu, 21 Dec 2017 20:34:15 +0000 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: > On 21 Dec 2017, at 06:57, Chris Barker wrote: > > in theory, the "goal" is for eval(repr(obj)) to return an equivelent object Is that really was the goal of repr? If true then we would not need pickle. I have always assumed that repr of simple things aims to represent them in just the way you would write them in python code. Repr of complex things represents the obj as a useful summary. Lamba seems to be in the complex end of things. In debug logs I am often very interested in object identity and use the 0x123 as one way to know. Removing the unique id would be a regression in my eyes. Maybe what you would like to have is an explain function that given any object tells you alll about it. help function does some of this I guess. Barry From ericfahlgren at gmail.com Thu Dec 21 17:23:59 2017 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Thu, 21 Dec 2017 14:23:59 -0800 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: Could we call it "help"? Maybe add some beef to what's already there... >>> help(lambda x,y,*args: x) Help on function in module __main__: lambda x, y, *args On Thu, Dec 21, 2017 at 12:34 PM, Barry wrote: > > > > On 21 Dec 2017, at 06:57, Chris Barker wrote: > > > > in theory, the "goal" is for eval(repr(obj)) to return an equivelent > object > > Is that really was the goal of repr? If true then we would not need pickle. > > I have always assumed that repr of simple things aims to represent them > in just the way you would write them in python code. Repr of complex things > represents the obj as a useful summary. > > Lamba seems to be in the complex end of things. > > In debug logs I am often very interested in object identity and use the > 0x123 as one way to know. Removing the unique id would be a regression > in my eyes. > > Maybe what you would like to have is an explain function that given any > object tells you alll about it. help function does some of this I guess. > > Barry > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Dec 21 16:33:07 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 21 Dec 2017 13:33:07 -0800 (PST) Subject: [Python-ideas] Allow star unpacking within an slice expression In-Reply-To: References: Message-ID: I didn't think of this when we were discussing 448. I ran into this today, so I agree with you that it would be nice to have this. Best, Neil On Monday, December 4, 2017 at 1:02:09 AM UTC-5, Eric Wieser wrote: > > Hi, > > I've been thinking about the * unpacking operator while writing some numpy > code. 
PEP 448 allows the following: > > values = 1, *some_tuple, 2 > object[(1, *some_tuple, 2)] > > It seems reasonable to me that it should be extended to allow > > item = object[1, *some_tuple, 2] > item = object[1, *some_tuple, :] > > Was this overlooked in the original proposal, or deliberately rejected? > > Eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Dec 21 17:43:14 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 21 Dec 2017 14:43:14 -0800 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: On Thu, Dec 21, 2017 at 12:34 PM, Barry wrote: > > On 21 Dec 2017, at 06:57, Chris Barker wrote: > > > > in theory, the "goal" is for eval(repr(obj)) to return an equivalent > object > > Is that really was the goal of repr? I think so -- see the current discussion about pprint and dict order.... > If true then we would not need pickle. > well, it only a goal, and it's not going to work for complex objects... I have always assumed that repr of simple things aims to represent them > in just the way you would write them in python code. Repr of complex things > represents the obj as a useful summary. > pretty much, yes. > Lamba seems to be in the complex end of things. > I think that's where the key disagreement comes in -- I think we'd al agree that regular, def-defined functions are well in the complex end of things. But lambda is limited to a single expression, so it can only get so complex -- granted you could nest a lot of parentheses and function calls and have a very complex expression, but the common use case is pretty compact. Some reprs will truncate the result if the objec is huge -- numpy arrays come to mind. In [14]: arr = np.array(range(10000)) In [15]: repr(arr) Out[15]: 'array([ 0, 1, 2, ..., 9997, 9998, 9999])' so if folks are worried that it could get too long, it could be limited. Though I note that lists don;t seem to do anything like that -- try a 10,000 element list. In debug logs I am often very interested in object identity and use the > 0x123 as one way to know. Removing the unique id would be a regression > in my eyes. > Every python object has an object identity, and the way to get it is with the id() function. The id is also part of the default object repr, but given that some, but only some objects have the id in their repr, it's probably better to use id() in you logs if you care. And in the case of lambda, wouldn't you rather see what the lambda actually WAS than what it's id is? Is there any downside other than backward compatibility concerns? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Dec 21 18:31:35 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 22 Dec 2017 10:31:35 +1100 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: On Fri, Dec 22, 2017 at 9:43 AM, Chris Barker wrote: > Every python object has an object identity, and the way to get it is with > the id() function. The id is also part of the default object repr, but given > that some, but only some objects have the id in their repr, it's probably > better to use id() in you logs if you care. 
> > And in the case of lambda, wouldn't you rather see what the lambda actually > WAS than what it's id is? > > Is there any downside other than backward compatibility concerns? It's probably worth hanging onto the id, in case the same function (from the same line of code) is used in multiple contexts. But IMO having the text of the function would be very useful - as long as it can be done without costing too much time or memory. So I'm +0.75 on the idea, with the caveat that it'd have to be implemented and performance-tested to make sure it doesn't kill the common case of a lambda function being created, used, and then dropped (think of a sort key function, for instance). ChrisA From ncoghlan at gmail.com Sat Dec 23 20:18:08 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 24 Dec 2017 11:18:08 +1000 Subject: [Python-ideas] Repr of lambda In-Reply-To: References: <20171218114352.GD16230@ando.pearwood.info> Message-ID: On 22 Dec. 2017 12:32 pm, "Chris Angelico" wrote: On Fri, Dec 22, 2017 at 9:43 AM, Chris Barker wrote: > Every python object has an object identity, and the way to get it is with > the id() function. The id is also part of the default object repr, but given > that some, but only some objects have the id in their repr, it's probably > better to use id() in you logs if you care. > > And in the case of lambda, wouldn't you rather see what the lambda actually > WAS than what it's id is? > > Is there any downside other than backward compatibility concerns? It's probably worth hanging onto the id, in case the same function (from the same line of code) is used in multiple contexts. But IMO having the text of the function would be very useful - as long as it can be done without costing too much time or memory. So I'm +0.75 on the idea, with the caveat that it'd have to be implemented and performance-tested to make sure it doesn't kill the common case of a lambda function being created, used, and then dropped (think of a sort key function, for instance). Having the repr report the defining module name & line number in addition to the object ID could be a low cost way of making the definition easier to find without adding any overhead in the common case. Cheers, Nick. ChrisA _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yahya-abou-imran at protonmail.com Tue Dec 26 04:13:43 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Tue, 26 Dec 2017 04:13:43 -0500 Subject: [Python-ideas] __intancehook__ special method Message-ID: Hello everybody! In a personnal project I feel the need (or the desire) to implement something like this: assert isinstance(1, PositiveInteger) assert not isinstance(-1, PositiveInteger) So I began looking a lot in the abc module, and I end unp using an __instancehook__ special method wich is called by __instancechek__ in the corresponding metaclass, just like the __subclasshook__ special method called by __subclasscheck__. The core idea is something like this: class MyMeta(type): def __instancecheck__(cls, instance): return cls.__instancehook__(instance) class PositiveInteger(metaclass=MyMeta): @classmethod def __instancehook__(cls, instance): return isinstance(instance, int) and instance > 0 Of course, the real implemention is more detailed... What do you think about that ? 
From p.f.moore at gmail.com  Tue Dec 26 06:14:22 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 26 Dec 2017 11:14:22 +0000
Subject: [Python-ideas] __instancehook__ special method
In-Reply-To: References: Message-ID: 

On 26 December 2017 at 09:13, Yahya Abou 'Imran via Python-ideas wrote:
> In a personal project I feel the need (or the desire) to implement
> something like this:
>
> assert isinstance(1, PositiveInteger)
> assert not isinstance(-1, PositiveInteger)

To me, this seems like an over-use of classes for something they are not really appropriate for (in Python, at least). In my experience, it's people coming from other languages that are more strongly class based, such as Java, that prefer this type of construct. In Python, I'd write this as

    assert isinstance(1, int) and 1 > 0
    assert not isinstance(-1, int) or -1 < 0
    # or maybe you meant isinstance(-1, int) and -1 < 0 for the second one...?

That reads far more naturally to me in Python.

> So I began looking a lot in the abc module, and I ended up using an
> __instancehook__ special method which is called by __instancecheck__ in the
> corresponding metaclass, just like the __subclasshook__ special method
> called by __subclasscheck__.
[...]
> What do you think about that?

I don't think it's needed - it feels to me like a solution to a problem that Python doesn't have in practice. (Of course, there may be more complex or specialised cases where it would be useful, but as you've demonstrated, you can write the implementation yourself for the rare cases it might be needed, so that may well be sufficient).

Thanks for the idea - it's interesting to see how other people approach problems like this even if it turns out not to be something worth adding.

Paul

From yahya-abou-imran at protonmail.com  Tue Dec 26 07:02:49 2017
From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran)
Date: Tue, 26 Dec 2017 07:02:49 -0500
Subject: [Python-ideas] __instancehook__ special method
In-Reply-To: References: Message-ID: <3_fPQZ_AdSl28_xRjXjSI1Xfv4ljpNGZiosilPq4aR635BQpMsYWFRTyNiASgd5V56NP7P8Px5cCVT8YNXB4USmY5L-0-oCm-ibGi-EoIvQ=@protonmail.com>

Well, in fact, I was playing with metaclasses and descriptors, and I was thinking about a way to use the famous variable annotations to instantiate descriptors like this:

class Point(metaclass=MyMeta):
    x: int
    y: int

class Circle(metaclass=MyMeta):
    center: Point
    radius: PositiveInteger

Here the radius must be [strictly] positive, but since I'm using the annotations to get the type to test in the __set__ method of the descriptor, I didn't find a more elegant way than that...

And it was before I knew the existence of dataclasses!!! I was pretty shocked by the way. Maybe I can make this compatible with it anyway.

From brett at python.org  Tue Dec 26 14:53:49 2017
From: brett at python.org (Brett Cannon)
Date: Tue, 26 Dec 2017 19:53:49 +0000
Subject: [Python-ideas] Internal function idea
In-Reply-To: References: Message-ID: 

[removing Python-ideas-owner so any replies don't flood my inbox]

On Sat, Dec 23, 2017, 09:23 William Rose, wrote:
> I had an idea that it could be helpful to have local functions as well as
> normal ones. They would be called the same way as normal ones but def would
> be replaced by internal and inside they could only access variables they
> have defined and inputs to them so no global variables or class variables.
> I think this could be used to save people accidentally changing variables
> you don't want to change when updating your code. Let me know what you
> think!

From paddy3118 at gmail.com  Tue Dec 26 15:07:52 2017
From: paddy3118 at gmail.com (Paddy3118)
Date: Tue, 26 Dec 2017 12:07:52 -0800 (PST)
Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"?
Message-ID: 

Maybe it is time to deemphasize the creation and passing of what is, in effect, bit-sets as a single argument, and instead promote the passing of a set of members of an enum.Enum.

This comes about because someone wrote a description, (since deleted), of constructing bit-sets to use in passing flags to, for example, the re.compile function. The use of individual bits in a bit-array/bit-set to pass multiple flags is an implementation detail. Should we not *first* teach the passing of a set of enum.Enum constant values in one argument as the *pythonic* way; and leave bit-sets and other types of enum's as a performance or interoperability detail?

Comments please, (And a happy new year to you :-)

From mistersheik at gmail.com  Tue Dec 26 18:28:34 2017
From: mistersheik at gmail.com (Neil Girdhar)
Date: Tue, 26 Dec 2017 15:28:34 -0800 (PST)
Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"?
In-Reply-To: References: Message-ID: <5c3c6938-fcd6-408e-a8e4-6647e570463e at googlegroups.com>

Wouldn't it be more pythonic to just pass flags to re.compile as Boolean keyword arguments?

On Tuesday, December 26, 2017 at 3:07:52 PM UTC-5, Paddy3118 wrote:
>
> Maybe it is time to deemphasize the creation and passing of what is, in
> effect, bit-sets as a single argument, and instead promote the passing of
> a set of members of an enum.Enum.
>
> This comes about because someone wrote a description, (since deleted), of
> constructing bit-sets to use in passing flags to, for example, the
> re.compile function.
> The use of individual bits in a bit-array/bit-set to pass multiple flags is
> an implementation detail. Should we not *first* teach the passing of a
> set of enum.Enum constant values in one argument as the *pythonic* way;
> and leave bit-sets and other types of enum's as a performance or
> interoperability detail?
>
> Comments please, (And a happy new year to you :-)
>

From leewangzhong+python at gmail.com  Wed Dec 27 00:06:28 2017
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Wed, 27 Dec 2017 00:06:28 -0500
Subject: [Python-ideas] Internal function idea
In-Reply-To: References: Message-ID: 

On Sat, Dec 23, 2017, 09:23 William Rose, wrote:
>
> I had an idea that it could be helpful to have local functions as well as
> normal ones. They would be called the same way as normal ones but def would
> be replaced by internal and inside they could only access variables they
> have defined and inputs to them so no global variables or class variables. I
> think this could be used to save people accidentally changing variables you
> don't want to change when updating your code. Let me know what you think!

You mean like this?

    internal myfunc(x, y, z):
        return sum(map(int, [x,y,z]))
    # SyntaxError: Undefined name 'sum'.

You may want to loosen the restrictions and allow builtins.
However, it is possible to redefine/create builtin names during runtime. You may also want to allow explicit declarations for global/nonlocal names, using the global and nonlocal keywords. You won't be able to access class variables, because you won't be able to access classes. This kind of function prevents a common use for functions: taking a section of an existing function and giving it a name. The proposed `internal` function type will encourage large functions that break the Rule of Three*, and require people to opt in to gain any advantages. Once they opt in, they would have to then opt out if they try to apply the Rule of Three. *https://en.wikipedia.org/wiki/Rule_of_three_%28computer_programming%29 Can you give an example of how you would use this? Could your problem perhaps be better solved with namespaces or refactoring tools? From steve at pearwood.info Wed Dec 27 00:56:39 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 27 Dec 2017 16:56:39 +1100 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: Message-ID: <20171227055639.GP4215@ando.pearwood.info> On Tue, Dec 26, 2017 at 12:07:52PM -0800, Paddy3118 wrote: > Maybe it is time to deemphasize the creation and passing of what is, in > effect, bit-sets as a single > argument, and instead promote the passing of a set of members of an > enum.Enum constants. That implies that we are promoting bit sets (not merely using them). Where are we doing that? > Thi comes about because someone wrote a description, (since deleted), "Someone"? One of the Python core developers? If not, well, the Python devs cannot be held responsible for what random people on the internet write. > of > constructing bit-sets to use in passing flags to, for example, the > re.compile function. > The use of individual bits in a bit-array/bit-set to pass multiple flags is > an implementation detail. It certainly is not an implementation detail -- it is a part of the public, published interface to the re module. Of course had history been different, re.compile *could* have taken a set of Enums instead. But that doesn't make it an implementation detail. That would be like saying that: re.search(pattern, text, flags=0) is an implementation detail, and we should feel free to change it to re.examine(text, pattern, set_of_enums=frozenset()) We can't change a public interface documented as taking an integer to one taking a set of Enums without going through a deprecation period. However, we could *add* an additional interface, where the re functions that currently accept an integer flag *also* accepts a set of Enums. > Should we not *first *teach the passing of a set > of enum.Enum constant values in one argument as the *pythonic *way; What makes you say that is the Pythonic way? (That's not a rhetorical question.) -- Steve From erik.m.bray at gmail.com Thu Dec 28 05:10:38 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Thu, 28 Dec 2017 11:10:38 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: <5A2AD7FC.6050304@stoneleaf.us> References: <5A2AD7FC.6050304@stoneleaf.us> Message-ID: On Fri, Dec 8, 2017 at 7:20 PM, Ethan Furman wrote: > On 12/08/2017 04:33 AM, Erik Bray wrote: > >> More importantly not as many objects that coerce to int actually >> implement __index__. They probably *should* but there seems to be >> some confusion about how that's to be used. 
> > > __int__ is for coercion (float, fraction, etc) > > __index__ is for true integers > > Note that if __index__ is defined, __int__ should also be defined, and > return the same value. > > https://docs.python.org/3/reference/datamodel.html#object.__index__ This doesn't appear to be enforced, though I think maybe it should be. I'll also note that because of the changes I pointed out in my original post, it's now necessary for me to explicitly cast as int() objects that previously "just worked" when passed as arguments in some functions in itertools, collections, and other modules with C implementations. However, this is bad because if some broken code is passing floats to these arguments, they will be quietly cast to int and succeed, when really I should only be accepting objects that have __index__. There's no index() alternative to int(). I think changing all these functions to do the appropriate PyIndex_Check is a correct and valid fix, but I think it also stretches beyond the original purpose of __index__. I think that __index__ is relatively unknown, and perhaps there should be better documentation as to when and how it should be used over the better-known __int__. From storchaka at gmail.com Thu Dec 28 14:42:36 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 28 Dec 2017 21:42:36 +0200 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: References: <5A2AD7FC.6050304@stoneleaf.us> Message-ID: 28.12.17 12:10, Erik Bray ????: > There's no index() alternative to int(). operator.index() From jsbueno at python.org.br Thu Dec 28 15:09:32 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 28 Dec 2017 18:09:32 -0200 Subject: [Python-ideas] Deprecate "slice" on built-ins, move it to "types"? Message-ID: This is probably too little to justify the compatibility breakage, but is there a motive for the "slice" type to be on built-ins ? (besides people forgot it there at PEP-3000 time?) It is normally used in super-specialized cases, mostly when one is implementing a Sequence type, and even there just for type-checking, not to create new slice objects. It seems to me it should lie on the "types" module (or some other place), rather than eat built-in namespace (which can confuse newcomers anyway). As I said, the benefits of moving it to a less prominent place may be to little, and incompatibilities would ensue since all modules using it don't expect to have to import it anyway. But maybe thsi should be a thing to keep an eye on for some future Python release? js -><- From paddy3118 at gmail.com Thu Dec 28 15:23:53 2017 From: paddy3118 at gmail.com (Paddy3118) Date: Thu, 28 Dec 2017 12:23:53 -0800 (PST) Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: <20171227055639.GP4215@ando.pearwood.info> References: <20171227055639.GP4215@ando.pearwood.info> Message-ID: <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> Hi Steve, I did not write an attack on the "Python devs". Re-read my original with a little less hostility and there should be room for an interpretation, (which I meant), that does not warrant such a hostile reply. The original is written in the hope of furthering discussion on the need for what is deemed pythonic , and on what Python is taught , as the language itself changes. 
We now have enums if you want to pass a set of flags to a function then you could have them as separate arguments - but that leads to long and cumbersome parameter lists; you could, and many do, have flags that are individual powers of two and then or them together and pass the result - creating a bitset; but we can now have the flags as separate enum.Enums and pass a set of values to a function as the flag set. This new way means that people being taught the method can use a knowledge of sets and enums - no need to know about powers of two,, what happens when they are bit-or'd together; and bitsets. We have gone to a higher level description of what we are doing; no need to mired in the details of the how of what one wants to achieve. bitsets can be taught as an optimisation. As for re, and otheralready written libraries, their is no need to change them, but other Pythoneers might well opt to not use bitsets, but rather sets of enum values. On Wednesday, 27 December 2017 05:58:00 UTC, Steven D'Aprano wrote: > > On Tue, Dec 26, 2017 at 12:07:52PM -0800, Paddy3118 wrote: > > > Maybe it is time to deemphasize the creation and passing of what is, in > > effect, bit-sets as a single > > argument, and instead promote the passing of a set of members of an > > enum.Enum constants. > > That implies that we are promoting bit sets (not merely using them). > Where are we doing that? > > > > Thi comes about because someone wrote a description, (since deleted), > > "Someone"? One of the Python core developers? > > If not, well, the Python devs cannot be held responsible for what random > people on the internet write. > > > > of > > constructing bit-sets to use in passing flags to, for example, the > > re.compile function. > > > The use of individual bits in a bit-array/bit-set to pass multiple flags > is > > an implementation detail. > > It certainly is not an implementation detail -- it is a part of the > public, published interface to the re module. > > Of course had history been different, re.compile *could* have taken a > set of Enums instead. But that doesn't make it an implementation detail. > That would be like saying that: > > re.search(pattern, text, flags=0) > > is an implementation detail, and we should feel free to change it to > > re.examine(text, pattern, set_of_enums=frozenset()) > > We can't change a public interface documented as taking an integer to > one taking a set of Enums without going through a deprecation period. > > However, we could *add* an additional interface, where the re functions > that currently accept an integer flag *also* accepts a set of Enums. > > > > Should we not *first *teach the passing of a set > > of enum.Enum constant values in one argument as the *pythonic *way; > > What makes you say that is the Pythonic way? (That's not a rhetorical > question.) > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Dec 28 19:15:10 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 29 Dec 2017 11:15:10 +1100 Subject: [Python-ideas] Deprecate "slice" on built-ins, move it to "types"? In-Reply-To: References: Message-ID: <20171229001505.GQ4215@ando.pearwood.info> On Thu, Dec 28, 2017 at 06:09:32PM -0200, Joao S. O. 
Bueno wrote: > This is probably too little to justify the compatibility breakage, Indeed. [...] > It seems to me it should lie on the "types" module (or some other > place), rather than > eat built-in namespace (which can confuse newcomers anyway). Have you actually come across newcomers who are confused by slice being a built-in? I've been on the tutor mailing list, and comp.lang.python, for many years now and I don't recall anyone even asking about slice let alone being confused by it. -- Steve From njs at pobox.com Thu Dec 28 20:06:07 2017 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 Dec 2017 17:06:07 -0800 Subject: [Python-ideas] Deprecate "slice" on built-ins, move it to "types"? In-Reply-To: References: Message-ID: On Dec 28, 2017 12:10, "Joao S. O. Bueno" wrote: This is probably too little to justify the compatibility breakage, but is there a motive for the "slice" type to be on built-ins ? (besides people forgot it there at PEP-3000 time?) It is normally used in super-specialized cases, mostly when one is implementing a Sequence type, and even there just for type-checking, not to create new slice objects. It does get called sometimes in numerical code to construct complex indexing operations. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Thu Dec 28 21:31:25 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Thu, 28 Dec 2017 21:31:25 -0500 Subject: [Python-ideas] Internal function idea In-Reply-To: References: Message-ID: On Thu, Dec 28, 2017 at 5:21 AM, William Rose wrote: > I agree with the point that it should allow builtin but the main purpose of > it is to not allow global variables But functions are also accessed using global names. What is your answer to the potential problem of programmers being reluctant to factor out code into new functions? From paddy3118 at gmail.com Fri Dec 29 00:20:10 2017 From: paddy3118 at gmail.com (Paddy3118) Date: Thu, 28 Dec 2017 21:20:10 -0800 (PST) Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> Message-ID: <214e4d1f-23b7-4717-a93a-02ee9e93de81@googlegroups.com> Hi Inada, can I take your point as being that execution speed is an issue? I am coming from the "Python as higher level, more programmer-centric language"direction. In the past, useful but slow things have spurred actions to speed them up. From rosuav at gmail.com Fri Dec 29 01:01:18 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 29 Dec 2017 17:01:18 +1100 Subject: [Python-ideas] Internal function idea In-Reply-To: References: Message-ID: On Fri, Dec 29, 2017 at 1:31 PM, Franklin? Lee wrote: > On Thu, Dec 28, 2017 at 5:21 AM, William Rose wrote: >> I agree with the point that it should allow builtin but the main purpose of >> it is to not allow global variables > > But functions are also accessed using global names. What is your > answer to the potential problem of programmers being reluctant to > factor out code into new functions? Code review, training, mentorship. If you try to make the language too restrictive, all you end up doing is forcing people to creatively get around its limitations. Make the language expressive, and then teach people how to use it well. Can you show some examples of code that would be improved by this "internal function" concept? 
ChrisA From leewangzhong+python at gmail.com Fri Dec 29 01:03:15 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 29 Dec 2017 01:03:15 -0500 Subject: [Python-ideas] Internal function idea In-Reply-To: References: Message-ID: On Fri, Dec 29, 2017 at 1:01 AM, Chris Angelico wrote: > On Fri, Dec 29, 2017 at 1:31 PM, Franklin? Lee > wrote: >> On Thu, Dec 28, 2017 at 5:21 AM, William Rose wrote: >>> I agree with the point that it should allow builtin but the main purpose of >>> it is to not allow global variables >> >> But functions are also accessed using global names. What is your >> answer to the potential problem of programmers being reluctant to >> factor out code into new functions? > > Code review, training, mentorship. If you try to make the language too > restrictive, all you end up doing is forcing people to creatively get > around its limitations. Make the language expressive, and then teach > people how to use it well. > > Can you show some examples of code that would be improved by this > "internal function" concept? Hey, I'm the one saying that internal functions will promote bad habits. From steve at pearwood.info Fri Dec 29 03:18:16 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 29 Dec 2017 19:18:16 +1100 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> Message-ID: <20171229081816.GT4215@ando.pearwood.info> On Thu, Dec 28, 2017 at 12:23:53PM -0800, Paddy3118 wrote: > Hi Steve, I did not write an attack on the "Python devs". I didn't say you attacked anyone and your implication that I did is unfair. > Re-read my > original with a little less hostility and there should be room for an > interpretation, (which I meant), that does not warrant such a hostile reply. Please re-read my response without the hair-trigger defensiveness. Disagreement is not hostility. Questioning your statements is not hostility. None of us are entitled to only positive responses that agree with what we suggest, but we are all entitled to the presumption that we are arguing in good faith. See also: https://mail.python.org/pipermail/python-ideas/2017-December/048448.html and: https://mail.python.org/pipermail/python-ideas/2017-December/048449.html (I don't agree with *everything* Brett says, but its a good starting point.) > The original is written in the hope of furthering discussion on the need > for what is deemed pythonic , and on what Python is taught , as the > language itself changes. Right -- and I responded to that discussion. I asked some genuine questions which you haven't responded to. Let me rephrase them: - Are we currently promoting ints as bitsets? - How is it relevant that "someone" (who?) wrote a description of using ints as bitsets and then deleted it? - Why is a set of Enums the Pythonic way? > We now have enums if you want to pass a set of flags to a function then you > could have them as separate arguments - but that leads to long and > cumbersome parameter lists; you could, and many do, have flags that are > individual powers of two and then or them together and pass the result - > creating a bitset; but we can now have the flags as separate enum.Enums and > pass a set of values to a function as the flag set. And no one is stopping anyone from writing code that does so. 
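A minimal sketch of that style of API, with the names `Option` and `search` invented here purely for illustration (they are not taken from any existing module):

    from enum import Enum, auto

    class Option(Enum):
        IGNORECASE = auto()
        MULTILINE = auto()

    def search(pattern, text, options=frozenset()):
        # Membership tests on a set of Enum members replace bit twiddling.
        if Option.IGNORECASE in options:
            pattern, text = pattern.lower(), text.lower()
        return pattern in text

    search("spam", "Spam and eggs", {Option.IGNORECASE})  # -> True

The caller spells the flag set as a literal set of Enum members instead of or-ing integer constants together.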
> This new way means that people being taught the method can use a knowledge > of sets and enums - no need to know about powers of two, what happens when > they are bit-or'd together, or bitsets.

Personally, I think that sets and Enums are no easier to learn than bit twiddling. But YMMV.

> We have gone to a higher level description of what we are doing; no > need to be mired in the details of how to achieve what one wants.

Whether we use an int or a set of Enums, we still need to understand the details of how to set and clear flags, and test for them.

    THE_FLAG = 8
    the_flags = THE_FLAG | ANOTHER_FLAG
    if the_flags & THE_FLAG:
        print("flag is set")

versus:

    class MyFlags(Enum):
        THE_FLAG = "whatever"
        ANOTHER_FLAG = "something else"

    the_flags = {MyFlags.THE_FLAG, MyFlags.ANOTHER_FLAG}
    if MyFlags.THE_FLAG in the_flags:
        print("flag is set")

Honestly, I'm not seeing a *lot* of difference here.

[...]

> As for re, and other already-written libraries, there is no need to change > them,

I see. It wasn't clear to me. It seemed to me that you were suggesting changing the re module, since that was just an "implementation" detail.

> but other Pythoneers might well opt to not use bitsets, but rather > sets of enum values.

Since ints don't provide a set-like interface, they aren't strictly speaking bitsets. But in any case, nobody is stopping people from using sets of enum values. If you have a concrete proposal, then please state it explicitly. I thought I understood your proposal: change the implementation of re to use sets of Enums in order to promote their use and act as an example of best practice and Pythonic style, but apparently I misunderstood. Sorry. So now I'm left puzzled as to what you actually want. Can you be explicit and explain what you expect us to do when you say we should "teach" and "promote" (your words) Enums instead of using ints? Please be concrete.

- Should we change the standard library, and if so, in what ways?
- Should we change the documentation? Which parts?
- Insert something in PEP 8 about using sets of Enums?
- Re-write the tutorial? Where?
- Something else?

Without a concrete proposal, I don't think this discussion is going anywhere.

-- Steve

From rosuav at gmail.com Fri Dec 29 03:26:18 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 29 Dec 2017 19:26:18 +1100 Subject: [Python-ideas] Internal function idea In-Reply-To: References: Message-ID: On Fri, Dec 29, 2017 at 5:03 PM, Franklin? Lee wrote: > On Fri, Dec 29, 2017 at 1:01 AM, Chris Angelico wrote: >> On Fri, Dec 29, 2017 at 1:31 PM, Franklin? Lee >> wrote: >>> On Thu, Dec 28, 2017 at 5:21 AM, William Rose wrote: >>>> I agree with the point that it should allow builtin but the main purpose of >>>> it is to not allow global variables >>> >>> But functions are also accessed using global names. What is your >>> answer to the potential problem of programmers being reluctant to >>> factor out code into new functions? >> >> Code review, training, mentorship. If you try to make the language too >> restrictive, all you end up doing is forcing people to creatively get >> around its limitations. Make the language expressive, and then teach >> people how to use it well. >> >> Can you show some examples of code that would be improved by this >> "internal function" concept? > > Hey, I'm the one saying that internal functions will promote bad habits. My apologies for the lack of clarity; this "you" was addressing the OP primarily. ChrisA

From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Dec 29 05:09:57 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Fri, 29 Dec 2017 19:09:57 +0900 Subject: [Python-ideas] Internal function idea In-Reply-To: References: Message-ID: <23110.5237.557790.530122@turnbull.sk.tsukuba.ac.jp> On Sat, Dec 23, 2017, 09:23 William Rose, wrote: > I had an idea that it could be helpful to have local functions as > well as normal ones. They would be called the same way as normal > ones but def would be replaced by internal and inside they could > only access variables they have defined and inputs to them so no > global variables or class variables. I think this could be used > to save people accidentally changing variables you dont' want to > change when updating your code. Let me know what you think! I suspect you misunderstand how variables (actually, name bindings) work in Python. If you do understand, I don't understand what you're guarding against. With current semantics, code inside a function cannot change a global binding, although it can refer to one: >>> def f(): ... x=3 ... return x ... >>> def g(): ... return x ... >>> x = 0 >>> y = f() # Changes x? >>> z = g() >>> print("x =", x, "y =", y, "z =", z) x = 0 y = 3 z = 0 # Nope. >>> def h(): ... y = x + 5 ... x = y # Changes x? ... return x ... >>> w = h() # Nope, it crashes your program instead! Traceback (most recent call last): File "", line 1, in File "", line 2, in h UnboundLocalError: local variable 'x' referenced before assignment You *can* use the global and nonlocal declarations to override the normal automatic assumption that names are local to the scope in which they are bound: >>> def a(): ... global x ... x = 42 ... return x ... >>> a() 42 >>> x 42 Prohibiting this last would be un-Pythonic, IMO (violates the "consenting adults" principle). Similarly for class variables, which can only be accessed using attribute notation. There are also conventions, prefixing "_" or "__", which indicate "this is private, mess with it at your own risk", and actually munge the name internally to make it impossible to access accidentally (including in derived classes). From rosuav at gmail.com Fri Dec 29 06:26:22 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 29 Dec 2017 22:26:22 +1100 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: <20171229081816.GT4215@ando.pearwood.info> References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> Message-ID: On Fri, Dec 29, 2017 at 7:18 PM, Steven D'Aprano wrote: > Since ints don't provide a set-like interface, they aren't strictly > speaking bitsets. But in any case, nobody is stopping people from using > sets of enum values. I'm not sure what "set-like interface" you'd be looking for, but the built-in set type has a lot of the same operations as an integer does, and the semantics are virtually identical to a set of bits. The only one you really lack is __contains__, which could easily be added: class BitSet(int): def __contains__(self, bit): return (self & bit) == bit >>> x = BitSet(1|2|8|32) >>> 2 in x True >>> 4 in x False Set union, intersection, etc are all provided using the same operators in sets and ints. ChrisA From erik.m.bray at gmail.com Fri Dec 29 07:58:08 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Fri, 29 Dec 2017 13:58:08 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? 
In-Reply-To: References: <5A2AD7FC.6050304@stoneleaf.us> Message-ID: On Thu, Dec 28, 2017 at 8:42 PM, Serhiy Storchaka wrote: > 28.12.17 12:10, Erik Bray ????: >> >> There's no index() alternative to int(). > > > operator.index() Okay, and it's broken. That doesn't change my other point that some functions that could previously take non-int arguments can no longer--if we agree on that at least then I can set about making a bug report and fixing it.

From ncoghlan at gmail.com Fri Dec 29 09:43:07 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 30 Dec 2017 00:43:07 +1000 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: References: <5A2AD7FC.6050304@stoneleaf.us> Message-ID: On 29 December 2017 at 22:58, Erik Bray wrote: > On Thu, Dec 28, 2017 at 8:42 PM, Serhiy Storchaka wrote: >> 28.12.17 12:10, Erik Bray ????: >>> >>> There's no index() alternative to int(). >> >> >> operator.index() > > Okay, and it's broken. Broken in what way? It has a fairly extensive test suite in https://github.com/python/cpython/blob/master/Lib/test/test_index.py (and some additional indirect testing in test_slice.py, which assumes that it works as advertised). > That doesn't change my other point that some > functions that could previously take non-int arguments can no > longer--if we agree on that at least then I can set about making a bug > report and fixing it. The size_t, ssize_t and void pointer conversions should only accept true integers (so either no fallback, or fall back to `__index__`). The unsigned long and unsigned long long conversions should likely be consistent with their signed counterparts and allow lossy conversions via `__int__`. I'm less sure about the conversion to double, but allowing that to be used on float objects without reporting a type error seems like a bug magnet to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From solipsis at pitrou.net Fri Dec 29 09:56:19 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 29 Dec 2017 15:56:19 +0100 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? References: <5A2AD7FC.6050304@stoneleaf.us> Message-ID: <20171229155619.7d13a84d@fsol> On Sat, 30 Dec 2017 00:43:07 +1000 Nick Coghlan wrote: > > > That doesn't change my other point that some > > functions that could previously take non-int arguments can no > > longer--if we agree on that at least then I can set about making a bug > > report and fixing it. > > The size_t, ssize_t and void pointer conversions should only accept > true integers (so either no fallback, or fall back to `__index__`). > > The unsigned long and unsigned long long conversions should likely be > consistent with their signed counterparts and allow lossy conversions > via `__int__`. That is the status quo indeed... but the destination C type shouldn't be used as a criterion of which __dunder__ method is called. For example, let's assume I'm writing a piece of code that expects a pid number. The C type is `pid_t`, which presumably translates either to a C `int` or `long` (*). But it's not right to accept a float there... I think we really need a bunch of `PyIndex_AsXXX` functions (`PyIndex_AsLong`, etc.) to complement the current `PyLong_AsXXX` functions. That way, every `PyLong_AsXXX` function can continue calling `__int__` (if they ever did so), while `PyIndex_AsXXX` would only call `__index__`.
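To make the distinction concrete at the Python level (a sketch only; `Seconds` is an invented example type, not anything from the stdlib): an object with a lossy `__int__` but no `__index__` is accepted by the `__int__`-based path and rejected by the `__index__`-based one.

    >>> import operator
    >>> class Seconds:
    ...     def __init__(self, value):
    ...         self.value = value
    ...     def __int__(self):          # lossy, like float.__int__
    ...         return int(self.value)
    ...
    >>> int(Seconds(3.7))               # the __int__ fallback silently truncates
    3
    >>> operator.index(Seconds(3.7))    # the __index__-only path refuses it
    Traceback (most recent call last):
      ...
    TypeError: 'Seconds' object cannot be interpreted as an integer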
(*) I got curious and went through the maze of type definitions on GNU/Linux. Which gives: #define __S32_TYPEDEF __signed__ int #define __PID_T_TYPE __S32_TYPE __STD_TYPE __PID_T_TYPE __pid_t; typedef __pid_t pid_t; Regards Antoine. From Richard at Damon-Family.org Fri Dec 29 10:23:07 2017 From: Richard at Damon-Family.org (Richard Damon) Date: Fri, 29 Dec 2017 10:23:07 -0500 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: <20171229155619.7d13a84d@fsol> References: <5A2AD7FC.6050304@stoneleaf.us> <20171229155619.7d13a84d@fsol> Message-ID: On 12/29/17 9:56 AM, Antoine Pitrou wrote: > (*) I got curious and went through the maze of type definitions on > GNU/Linux. Which gives: > > #define __S32_TYPEDEF __signed__ int > #define __PID_T_TYPE __S32_TYPE > __STD_TYPE __PID_T_TYPE __pid_t; > typedef __pid_t pid_t; > > > Regards > > Antoine. One quick side note, just because it mapped to signed int on that Linux, doesn't mean it will always map to signed int on all Linuxes. One of the reasons for the multiple levels of indirection in types is to allow a given distribution to configure some parameter types to be 'optimal' for that implementation. -- Richard Damon From steve at pearwood.info Fri Dec 29 10:38:21 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 30 Dec 2017 02:38:21 +1100 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> Message-ID: <20171229153821.GU4215@ando.pearwood.info> On Fri, Dec 29, 2017 at 10:26:22PM +1100, Chris Angelico wrote: > On Fri, Dec 29, 2017 at 7:18 PM, Steven D'Aprano wrote: > > Since ints don't provide a set-like interface, they aren't strictly > > speaking bitsets. But in any case, nobody is stopping people from using > > sets of enum values. > > I'm not sure what "set-like interface" you'd be looking for, but the > built-in set type has a lot of the same operations as an integer does, > and the semantics are virtually identical to a set of bits. The only > one you really lack is __contains__, which could easily be added: The lack of support for the `in` operator is a major difference, but there's also `len` (equivalent to "count the one bits"), superset and subset testing, various in-place mutator methods, etc. Java has a BitSet class, and you can see the typical sorts of operations commonly required: https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html Of course we can emulate set-like operations using ints, but the interfaces are different, which is my point. Here's how to clear all the flags of a set or int: the_flags.clear() the_flags = 0 # clear all the bits in an int Setting a flag is *almost* the same between the two: the_flags |= {flag} # set the_flags |= flag # int although for sets, there are two other ways to set a flag which aren't supported by ints: the_flags.add(flag) the_flags.update({flag}) Similarly for clearing flags: the_flags.discard(flag) the_flags & ~flag -- Steve From rosuav at gmail.com Fri Dec 29 10:56:46 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 30 Dec 2017 02:56:46 +1100 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? 
In-Reply-To: <20171229153821.GU4215@ando.pearwood.info> References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> Message-ID: On Sat, Dec 30, 2017 at 2:38 AM, Steven D'Aprano wrote: > The lack of support for the `in` operator is a major difference, but > there's also `len` (equivalent to "count the one bits"), superset > and subset testing, various in-place mutator methods, etc. Java has a > BitSet class, and you can see the typical sorts of operations > commonly required: > > https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html Okay. A subclass of int could easily add a few more. Counting the 1 bits isn't difficult; superset and subset testing are actually the same as 'contains' but with more than one bit at a time. (In fact, checking if a set contains a subset is *easier* with ints than with actual sets!) Are in-place mutators that big a deal? I'm sure there are sets in languages with no mutables. > Of course we can emulate set-like operations using ints, but the > interfaces are different, which is my point. Here's how to clear all the > flags of a set or int: > > the_flags.clear() > > the_flags = 0 # clear all the bits in an int That's a consequence of Python's mutability distinction. I don't think it's a fundamental difference. You could just as easily use "the_flags = set()" if it weren't for aliasing. > Setting a flag is *almost* the same between the two: > > the_flags |= {flag} # set > > the_flags |= flag # int That's because you can implicitly upcast a bitflag to a bitset. Effectively, ints give you a short-hand that sets can't. But if you explicitly call BitSet(flag) to create a set containing one flag, it would have the same effect. > although for sets, there are two other ways to set a flag which aren't > supported by ints: > > the_flags.add(flag) > the_flags.update({flag}) > > Similarly for clearing flags: > > the_flags.discard(flag) > > the_flags & ~flag Mutability again. If you were to create an ImmutableSet type in Python, what would its API look like? My suspicion is that it'd largely use operators, and that it'd end up looking a lot like the integer API. An integer, at its lowest level, is represented as a set of bits. It's no more crazy to use an int as a set of bits than to use a string as a set of characters: https://docs.python.org/3/library/stdtypes.html#str.strip ChrisA From storchaka at gmail.com Fri Dec 29 11:04:43 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 29 Dec 2017 18:04:43 +0200 Subject: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__? In-Reply-To: References: <5A2AD7FC.6050304@stoneleaf.us> Message-ID: 29.12.17 16:43, Nick Coghlan ????: > On 29 December 2017 at 22:58, Erik Bray wrote: >> Okay, and it's broken. > > Broken in what way? It has a fairly extensive test suite in > https://github.com/python/cpython/blob/master/Lib/test/test_index.py > (and some additional indirect testing in test_slice.py, which assumes > that it works as advertised). Unfortunately the pure Python implementation doesn't work correctly in corner cases (https://bugs.python.org/issue18712). But in CPython the C implementation is used. Maybe Erik means something other. > The unsigned long and unsigned long long conversions should likely be > consistent with their signed counterparts and allow lossy conversions > via `__int__`. 
There is code that relies on the atomicity of these functions. Calling __int__ or __index__ will introduce vulnerabilities in the existing code.

From yahya-abou-imran at protonmail.com Fri Dec 29 11:49:21 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Fri, 29 Dec 2017 11:49:21 -0500 Subject: [Python-ideas] Make MappingView inherit from Collection instead of Sized Message-ID: After I generated a UML diagram from collections.abc, I found it very strange that MappingView inherits from Sized instead of Collection (new in python 3.6).

Yes, MappingView only defines __len__ and not __iter__ and __contains__, but all of its subclasses define them (KeysView, ValuesView and ItemsView).

I tried to run the tests in test/test_collections.py after making this change and only one fails:

    Traceback (most recent call last):
      File "/usr/lib/python3.6/test/test_collections.py", line 789, in test_Collection
        self.assertNotIsInstance(x, Collection)
    AssertionError: dict_values([]) is an instance of <class 'collections.abc.Collection'>

Which is absolutely wrong, since in reality a dict_values instance has the behaviour of a Collection:

    >>> vals = {1:'a', 2: 'b'}.values()
    >>> 'a' in vals
    True
    >>> 'c' in vals
    False
    >>> len(vals)
    2
    >>> for val in vals:
    ...     print(val)
    ...
    a
    b

The only thing lacking is that it doesn't define a __contains__ method:

    >>> '__contains__' in vals
    False

It uses __iter__ to find the presence of the value.

But, hey: we have register() for these cases! In fact, when MappingView inherits from Collection, dict_values is considered as a subclass of Collection since it's in the register of ValuesView, causing the above bug... So, the test has to be changed, and dict_values must be placed in the samples that pass the test, and not in the ones that fail it.

Maybe we can open an issue on the python bug tracker?

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From rosuav at gmail.com Fri Dec 29 12:01:10 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 30 Dec 2017 04:01:10 +1100 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> Message-ID: On Sat, Dec 30, 2017 at 3:56 AM, Stephan Hoyer wrote: > We already have a built-in immutable set for Python. It's called frozenset. This is true, but AIUI its API is based primarily on that of the (mutable) set. If you were creating a greenfield ImmutableSet class, what would its API look like? ChrisA

From shoyer at gmail.com Fri Dec 29 11:56:42 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 29 Dec 2017 16:56:42 +0000 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> Message-ID: We already have a built-in immutable set for Python. It's called frozenset. On Fri, Dec 29, 2017 at 10:56 AM Chris Angelico wrote: > On Sat, Dec 30, 2017 at 2:38 AM, Steven D'Aprano > wrote: > > The lack of support for the `in` operator is a major difference, but > > there's also `len` (equivalent to "count the one bits"), superset > > and subset testing, various in-place mutator methods, etc.
Java has a > > BitSet class, and you can see the typical sorts of operations > > commonly required: > > > > https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html > > Okay. A subclass of int could easily add a few more. Counting the 1 > bits isn't difficult; superset and subset testing are actually the > same as 'contains' but with more than one bit at a time. (In fact, > checking if a set contains a subset is *easier* with ints than with > actual sets!) Are in-place mutators that big a deal? I'm sure there > are sets in languages with no mutables. > > > Of course we can emulate set-like operations using ints, but the > > interfaces are different, which is my point. Here's how to clear all the > > flags of a set or int: > > > > the_flags.clear() > > > > the_flags = 0 # clear all the bits in an int > > That's a consequence of Python's mutability distinction. I don't think > it's a fundamental difference. You could just as easily use "the_flags > = set()" if it weren't for aliasing. > > > Setting a flag is *almost* the same between the two: > > > > the_flags |= {flag} # set > > > > the_flags |= flag # int > > That's because you can implicitly upcast a bitflag to a bitset. > Effectively, ints give you a short-hand that sets can't. But if you > explicitly call BitSet(flag) to create a set containing one flag, it > would have the same effect. > > > although for sets, there are two other ways to set a flag which aren't > > supported by ints: > > > > the_flags.add(flag) > > the_flags.update({flag}) > > > > Similarly for clearing flags: > > > > the_flags.discard(flag) > > > > the_flags & ~flag > > Mutability again. If you were to create an ImmutableSet type in > Python, what would its API look like? My suspicion is that it'd > largely use operators, and that it'd end up looking a lot like the > integer API. > > An integer, at its lowest level, is represented as a set of bits. It's > no more crazy to use an int as a set of bits than to use a string as a > set of characters: > > https://docs.python.org/3/library/stdtypes.html#str.strip > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gvanrossum at gmail.com Fri Dec 29 14:11:57 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri, 29 Dec 2017 12:11:57 -0700 Subject: [Python-ideas] Make MappingView inherit from Collection instead of Sized In-Reply-To: References: Message-ID: This sounds like a good observation. I recommend opening a bug and preparing a PR if you can (a PR would also help finding if there are any problems with the idea). On Dec 29, 2017 9:50 AM, "Yahya Abou 'Imran via Python-ideas" < python-ideas at python.org> wrote: > After I generate an UML diagram from collections.abc, I found very strange > that MappingView inherit from Sized instead of Collection (new in python > 3.6). > > Yes, MappingView only define __len__ and not __iter__ and __contains__, > but all of its subclasses define them (KeysView, ValuesView and ItemViews). 
> > I tried to run the tests in test/test_collections.py after making this > change and only one fails: > > Traceback (most recent call last): > File "/usr/lib/python3.6/test/test_collections.py", line 789, in > test_Collection > self.assertNotIsInstance(x, Collection) > AssertionError: dict_values([]) is an instance of <class 'collections.abc.Collection'> > > Which is absolutely wrong, since in reality a dict_values instance has the > behaviour of a Collection: > > >>> vals = {1:'a', 2: 'b'}.values() > >>> 'a' in vals > True > >>> 'c' in vals > False > >>> len(vals) > 2 > >>> for val in vals: > ... print(val) > ... > a > b > > The only thing lacking is that it doesn't define a __contains__ method: > > >>> '__contains__' in vals > False > > It uses __iter__ to find the presence of the value. > > But, hey: we have register() for these cases! In fact, when MappingView > inherits from Collection, dict_values is considered as a subclass of > Collection since it's in the register of ValuesView, causing the above > bug... > So, the test has to be changed, and dict_values must be placed in the > samples that pass the test, and not in the ones that fail it. > > Maybe we can open an issue on the python bug tracker? > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From vano at mail.mipt.ru Fri Dec 29 20:56:57 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Sat, 30 Dec 2017 04:56:57 +0300 Subject: [Python-ideas] Allow to compile debug extension against release Python in Windows Message-ID: The Windows version of pyconfig.h has the following construct:

    #   if defined(_DEBUG)
    #           pragma comment(lib,"python37_d.lib")
    #   elif defined(Py_LIMITED_API)
    #           pragma comment(lib,"python3.lib")
    #   else
    #           pragma comment(lib,"python37.lib")
    #   endif /* _DEBUG */

which fails the compilation of a debug version of an extension. Making debugging it... difficult. Perhaps we could define some other constant? I'm not sure whether such compilation is a good idea in general, so asking here at first. -- Regards, Ivan

From steve at pearwood.info Fri Dec 29 23:25:31 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 30 Dec 2017 15:25:31 +1100 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> Message-ID: <20171230042531.GV4215@ando.pearwood.info> On Sat, Dec 30, 2017 at 02:56:46AM +1100, Chris Angelico wrote: > On Sat, Dec 30, 2017 at 2:38 AM, Steven D'Aprano wrote: > > The lack of support for the `in` operator is a major difference, but > > there's also `len` (equivalent to "count the one bits"), superset > > and subset testing, various in-place mutator methods, etc.
(In fact, > checking if a set contains a subset is *easier* with ints than with > actual sets!) Are in-place mutators that big a deal? I'm sure there > are sets in languages with no mutables.

We seem to be talking at cross-purposes.

Obviously we can and do already use ints as if they were set-like data structures. For example, the re module already does so. If you want to call that a kind of "bit set", I'm okay with that, but Wikipedia suggests that "bit array" is the canonical name: "bit array (also known as bit map, bit set, bit string, or bit vector)" https://en.wikipedia.org/wiki/Bit_array

The obvious reason why is that sets are unordered but arrays of bits are not: 0b1000 is not the same "set" as 0b0010.

However, the point I was making was that ints don't provide the same interface as sets. I don't deny that you can use an int to provide set-like functionality, or that with sufficient effort you could subclass int to do so, but what you cannot do is seamlessly interchange an int for a set and vice versa and expect the code to work without modification. Even if the function is limited to using the set-like functionality.

I think I have beaten this dead horse enough. This was a minor point about the terminology being used, so I think we're now just waiting on Paddy to clarify what his proposal means in concrete terms.

-- Steve

From yahya-abou-imran at protonmail.com Sat Dec 30 11:11:29 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Sat, 30 Dec 2017 11:11:29 -0500 Subject: [Python-ideas] Add an UML class diagram to the collections.abc module documentation Message-ID: We can find very useful class diagrams to understand the hierarchy of the builtin Collection abstract classes and interfaces in Java.

Some examples: http://www.falkhausen.de/Java-8/java.util/Collection-Hierarchy-simple.html http://www.falkhausen.de/Java-8/java.util/Collection-List.html

But when I search for Python's ABCs, the most detailed I can find are those from Luciano Ramalho's book Fluent Python: https://goo.gl/images/8JGjvM https://goo.gl/images/6xZqcA

(I think they're done with pyreverse from pylint)

They are fine, but I think we could provide some other, more detailed ones on this page: https://docs.python.org/3/library/collections.abc.html

The table could be difficult to understand; a diagram helps visualize things.

I've begun working on it with plantuml and pyreverse. I'm attaching to this mail what I've done so far so you can tell me what you think.

-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: collections_abc.png Type: image/png Size: 105572 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: full.png Type: image/png Size: 201846 bytes Desc: not available URL: 

From stephanh42 at gmail.com Sat Dec 30 15:24:46 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sat, 30 Dec 2017 21:24:46 +0100 Subject: [Python-ideas] Add an UML class diagram to the collections.abc module documentation In-Reply-To: References: Message-ID: Hi Yahya, I like the full.png diagram, however, I see some issues with it.

Most seriously, the methods it lists don't match the documentation.

E.g. if you check MappingView: https://docs.python.org/3/library/collections.abc.html#collections.abc.MappingView

you see it has only a __len__ mixin method. The other methods in the diagram are implementation details and should be removed.
Some presentation points (all IMHO of course): * Get rid of the empty boxes. * Get rid of the trailing (). Since all methods have this, it adds no info. * There is no visual distinction between the abstract methods and the mixin methods. I'd suggest making the abstract methods italic or something like that. Stephan 2017-12-30 17:11 GMT+01:00 Yahya Abou 'Imran via Python-ideas < python-ideas at python.org>: > We can find very usefull class diagramm to understand the hierarchy of the > builtin Collection abstract class and interface in java. > > Some examples: > http://www.falkhausen.de/Java-8/java.util/Collection-Hierarchy-simple.html > http://www.falkhausen.de/Java-8/java.util/Collection-List.html > > But when I search about python's ABC, The more detailed I can find are > those from the book of Luciano Ramalho Fluent Python: > https://goo.gl/images/8JGjvM > https://goo.gl/images/6xZqcA > > (I think they're done with pyreverse of pylint) > > They are fine, but I think we could provide some other more detailed in > this page: > https://docs.python.org/3/library/collections.abc.html > > The table could be difficult to understand, a diagram help visualize > things. > > I've began working on it with plantuml and pyreverse, I'm joining to this > mail what I've done so far so you can tell me what you think. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Dec 30 16:17:45 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 30 Dec 2017 16:17:45 -0500 Subject: [Python-ideas] Add an UML class diagram to the collections.abc module documentation In-Reply-To: References: Message-ID: On 12/30/2017 11:11 AM, Yahya Abou 'Imran via Python-ideas wrote: > We can find very usefull class diagramm to understand the hierarchy of > the builtin Collection abstract class and interface in java. > > Some examples: > http://www.falkhausen.de/Java-8/java.util/Collection-Hierarchy-simple.html > http://www.falkhausen.de/Java-8/java.util/Collection-List.html > > But when I search about python's ABC, The more detailed I can find are > those from the book of Luciano Ramalho Fluent Python: > https://goo.gl/images/8JGjvM > https://goo.gl/images/6xZqcA > > (I think they're done with pyreverse of pylint) > > They are fine, but I think we could provide some other more detailed in > this page: > https://docs.python.org/3/library/collections.abc.html > > The table could be difficult to understand, a diagram help visualize things. > > I've began working on it with plantuml and pyreverse, I'm joining to > this mail what I've done so far so you can tell me what you think. We have a few .png files in the docs. Yours look like the beginning of perhaps 2 nice additions. A. Width restrictions suggest making the async branches a separate diagram. B. Be consistent on placement of inherited versus added methods. Always list inherited first? Different fonts, as suggested, might be good. C. After discussion here, and revision, open a doc enhancement issue on bugs.python.org. 
-- Terry Jan Reedy From pylang3 at gmail.com Sat Dec 30 17:23:06 2017 From: pylang3 at gmail.com (pylang) Date: Sat, 30 Dec 2017 17:23:06 -0500 Subject: [Python-ideas] Add an UML class diagram to the collections.abc module documentation In-Reply-To: References: Message-ID: +1 on adding these diagrams to the docs. It's great to visualize where the special methods are implemented. On Sat, Dec 30, 2017 at 4:17 PM, Terry Reedy wrote: > On 12/30/2017 11:11 AM, Yahya Abou 'Imran via Python-ideas wrote: > >> We can find very usefull class diagramm to understand the hierarchy of >> the builtin Collection abstract class and interface in java. >> >> Some examples: >> http://www.falkhausen.de/Java-8/java.util/Collection-Hierarc >> hy-simple.html >> http://www.falkhausen.de/Java-8/java.util/Collection-List.html >> >> But when I search about python's ABC, The more detailed I can find are >> those from the book of Luciano Ramalho Fluent Python: >> https://goo.gl/images/8JGjvM >> https://goo.gl/images/6xZqcA >> >> (I think they're done with pyreverse of pylint) >> >> They are fine, but I think we could provide some other more detailed in >> this page: >> https://docs.python.org/3/library/collections.abc.html >> >> The table could be difficult to understand, a diagram help visualize >> things. >> >> I've began working on it with plantuml and pyreverse, I'm joining to this >> mail what I've done so far so you can tell me what you think. >> > > We have a few .png files in the docs. Yours look like the beginning of > perhaps 2 nice additions. > > A. Width restrictions suggest making the async branches a separate diagram. > > B. Be consistent on placement of inherited versus added methods. Always > list inherited first? Different fonts, as suggested, might be good. > > C. After discussion here, and revision, open a doc enhancement issue on > bugs.python.org. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From yahya-abou-imran at protonmail.com Sat Dec 30 19:18:06 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Sat, 30 Dec 2017 19:18:06 -0500 Subject: [Python-ideas] Change the __repr__ of the `MappingView`s Message-ID: === This proposition is purely aesthetic ===

At this time, the __repr__ of the mapping views is showing the whole mapping:

    >>> from collections.abc import ValuesView, KeysView, ItemsView
    >>> d = {3: 'three', 4: 'four'}
    >>> KeysView(d)
    KeysView({3: 'three', 4: 'four'})
    >>> ValuesView(d)
    ValuesView({3: 'three', 4: 'four'})
    >>> ItemsView(d)
    ItemsView({3: 'three', 4: 'four'})

Which is not consistent with dict_keys, dict_values, dict_items:

    >>> d.keys()
    dict_keys([3, 4])
    >>> d.values()
    dict_values(['three', 'four'])
    >>> d.items()
    dict_items([(3, 'three'), (4, 'four')])

We could easily change that, since all the views are iterables over what they are designed for, in MappingView:

    def __repr__(self):
        viewname = self.__class__.__name__
        elements = ', '.join(map(repr, self))
        return f'{viewname}([{elements}])'

And now:

    >>> KeysView(d)
    KeysView([3, 4])
    >>> ValuesView(d)
    ValuesView(['three', 'four'])
    >>> ItemsView(d)
    ItemsView([(3, 'three'), (4, 'four')])

It's not breaking any test (it seems that there isn't any for this), but it has a real drawback: it's breaking the convention about instantiation by copy/pasting:

    >>> ValuesView(['three', 'four'])
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/IPython/core/formatters.py", line 684, in __call__
        return repr(obj)
      File "/usr/lib/python3.6/_collections_abc.py", line 706, in __repr__
        elements = ', '.join(map(repr, self))
      File "/usr/lib/python3.6/_collections_abc.py", line 764, in __iter__
        yield self._mapping[key]
    TypeError: list indices must be integers or slices, not str

It's because __init__ in MappingView treats the passed argument -- which is stored in self._mapping -- as the whole mapping, not just keys, values or items... And all the other methods (__contains__ and __iter__) in the subclasses are using this _mapping attribute to work.

So what is to prioritize?

From yahya-abou-imran at protonmail.com Sat Dec 30 19:25:17 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Sat, 30 Dec 2017 19:25:17 -0500 Subject: [Python-ideas] Add an UML class diagram to the collections.abc module documentation In-Reply-To: References: Message-ID: > Hi Yahya, > I like the full.png diagram, however, I see some issues with it. > > Most seriously, the methods it lists don't match the documentation. > > E.g. if you check MappingView: > > https://docs.python.org/3/library/collections.abc.html#collections.abc.MappingView > > you see it has only a __len__ mixin method. > The other methods in the diagram are implementation details > and should be removed. > > Some presentation points (all IMHO of course): > * Get rid of the empty boxes. > * Get rid of the trailing (). Since all methods have this, it adds no info. > * There is no visual distinction between the abstract methods > and the mixin methods. I'd suggest making the abstract methods italic > or something like that. > > Stephan

Thank you for your observations Stephan. The reason behind that is that I generated it from a copy of the source code, and I was wondering about getting rid of it or not... But I think you're right: let's display only the documented part.

About the display of the abstract methods, sadly it seems that pyreverse doesn't support it... I didn't find a way to hide the parentheses either.
I could make those changes with a graphic tool though. Same thing with the empty box. Whereas it's possible with plantuml. But maybe I have to use a pure python tool. What are your opinions about that?

Another thing: for example, in the case of Collection, it asks to implement the three methods __iter__, __contains__ and __len__, but since it inherits them from Iterable, Container and Sized, they're not shown. I think it's better to make them appear since: 1. you have to implement them to inherit from it; 2. the three methods are checked during the __subclasshook__ of this ABC to know if a class is a virtual subclass of it or not (when passed as the second argument of issubclass(), for example). I don't think it's an error in UML to re-display them since they are abstract (it would be if they were concrete, because it would mean that they had been overridden).

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From yahya-abou-imran at protonmail.com Sat Dec 30 19:27:34 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Sat, 30 Dec 2017 19:27:34 -0500 Subject: [Python-ideas] Add an UML class diagram to the collections.abc module documentation In-Reply-To: References: Message-ID: > A. Width restrictions suggest making the async branches a separate diagram. I was thinking about it... Maybe Hashable and Callable could also be removed, since they are standalone ABCs. And they're not directly linked with the concept of Collection anyway. > B. Be consistent on placement of inherited versus added methods. Always > list inherited first? Different fonts, as suggested, might be good. The best may be to not follow UML guidelines but to stick with the terminology of the documentation: the first box for Abstract Methods, the second for Mixin Methods.

From yahya-abou-imran at protonmail.com Sat Dec 30 19:29:44 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Sat, 30 Dec 2017 19:29:44 -0500 Subject: [Python-ideas] Add an UML class diagram to the collections.abc module documentation In-Reply-To: References: Message-ID: Here is another version showing everything that inherits from Container, Sized and Iterable. I got rid of ABCMeta, since it's not the purpose of the documentation of that page. I left the parentheses at the end of the method names though...

[collections_abc.png]

-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: collections_abc.png Type: image/png Size: 71475 bytes Desc: not available URL: 

From guido at python.org Sat Dec 30 19:44:36 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 30 Dec 2017 17:44:36 -0700 Subject: [Python-ideas] Change the __repr__ of the `MappingView`s In-Reply-To: References: Message-ID: You're right that there's some inconsistency here. But I don't think it's worth fixing given that the fix would introduce another inconsistency (which you pointed out) and would also risk breaking backwards compatibility. I think this ship has sailed.
On Sat, Dec 30, 2017 at 5:18 PM, Yahya Abou 'Imran via Python-ideas < python-ideas at python.org> wrote: > === This proposition is purely aesthetic === > > At this time, the __repr__ of the mapping views is showing the whole > mapping: > > >>> from collections.abc import ValuesView, KeysView, ItemsView > >>> d = {3: 'three', 4: 'four'} > >>> KeysView(d) > KeysView({3: 'three', 4: 'four'}) > >>> ValuesView(d) > ValuesView({3: 'three', 4: 'four'}) > >>> ItemsView(d) > ItemsView({3: 'three', 4: 'four'}) > > Witch is not consistent with dict_keys, dict_values, dict_items: > > >>> d.keys() > dict_keys([3, 4]) > >>> d.values() > dict_values(['three', 'four']) > >>> d.items() > dict_items([(3, 'three'), (4, 'four')]) > > We could easily change that, since all the views are iterables on what > they are designed for, in MappingView: > > def __repr__(self): > viewname = self.__class__.__name__ > elements = ', '.join(map(repr, self)) > return f'{viewname}([elements]) > > And now: > > >>> KeysView(d) > KeysView([3, 4]) > >>> ValuesView(d) > ValuesView(['three', 'four']) > >>> ItemsView(d) > ItemsView([(3, 'three'), (4, 'four')]) > > It's not breaking any test (it seems that there isn't any for this), but > it have a real drawback: it's breaking the convention about instantiation > by copy/pasting: > > >>> ValuesView(['three', 'four']) > Traceback (most recent call last): > File "/usr/lib/python3.6/site-packages/IPython/core/formatters.py", > line 684, in __call__ > return repr(obj) > File "/usr/lib/python3.6/_collections_abc.py", line 706, in __repr__ > elements = ', '.join(map(repr, self)) > File "/usr/lib/python3.6/_collections_abc.py", line 764, in __iter__ > yield self._mapping[key] > TypeError: list indices must be integers or slices, not str > > > It's because __init__ in MappingView treat the passed argument -- wich is > stored in self._mapping -- as the whole mapping, not just keys, values or > items... And all the other methods (__contains__ and __iter__) in the > subclasses are using this _mapping attribute to work. > > > So what is to prioritize? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sat Dec 30 23:50:33 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 30 Dec 2017 23:50:33 -0500 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: <20171230042531.GV4215@ando.pearwood.info> References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> <20171230042531.GV4215@ando.pearwood.info> Message-ID: On Fri, Dec 29, 2017 at 11:25 PM, Steven D'Aprano wrote: > On Sat, Dec 30, 2017 at 02:56:46AM +1100, Chris Angelico wrote: >> On Sat, Dec 30, 2017 at 2:38 AM, Steven D'Aprano wrote: >> > The lack of support for the `in` operator is a major difference, but >> > there's also `len` (equivalent to "count the one bits"), superset >> > and subset testing, various in-place mutator methods, etc. 
Java has a >> > BitSet class, and you can see the typical sorts of operations >> > commonly required: >> > >> > https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html >> >> Okay. A subclass of int could easily add a few more. Counting the 1 >> bits isn't difficult; superset and subset testing are actually the >> same as 'contains' but with more than one bit at a time. (In fact, >> checking if a set contains a subset is *easier* with ints than with >> actual sets!) Are in-place mutators that big a deal? I'm sure there >> are sets in languages with no mutables. > > We seem to be talking at cross-purposes. > > Obviously we can and do already use ints as if they were set-like data > structures. For example, the re module already does so. If you want to > call that a kind of "bit set", I'm okay with that, but Wikipedia > suggests that "bit array" is the canonical name: > > "bit array (also known as bit map , bit set, bit string, or > bit vector)" > > https://en.wikipedia.org/wiki/Bit_array > > The obvious reason why is that sets are unordered but arrays of bits are > not: 0b1000 is not the same "set" as 0b0010. I think "bit-set" was used because it has semantic meaning in this context. In your example, it is not the bits that are ordered, but the values, which have a canonical order (or, more generally, a specified order). 0b1000 represents the set {3}, while 0b0010 represents the set {1}. A bit set representation is, in fact, unordered, since {1,3} and {3,1} are both represented by the same int. The values of a bit array are the bits themselves, but the values of a bitset are the indices which have a 1. > I think I have beaten this dead horse enough. This was a minor point > about the terminology being used, so I think we're now just waiting on > Paddy to clarify what his proposal means in concrete terms. Paddy might want something like this: - For existing APIs which take int or IntFlag flags, allow them to also take a set (or perhaps any collection) of flags. - In new APIs, take sets of Enum flags, and don't make them IntFlag. - Documentation should show preference toward using sets of Enum flags. Tutorials should pass sets. From guido at python.org Sun Dec 31 00:33:07 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 30 Dec 2017 21:33:07 -0800 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> <20171230042531.GV4215@ando.pearwood.info> Message-ID: On Sat, Dec 30, 2017 at 8:50 PM, Franklin? Lee < leewangzhong+python at gmail.com> wrote: > > Paddy might want something like this: > - For existing APIs which take int or IntFlag flags, allow them to > also take a set (or perhaps any collection) of flags. > - In new APIs, take sets of Enum flags, and don't make them IntFlag. > - Documentation should show preference toward using sets of Enum > flags. Tutorials should pass sets. I'm not keen on this recommendation. An argument that takes a Set[Foo] would mean that in order to specify: - no flags: you'd have to pass set() -- you can't use {} since that's an empty dict, not an empty set - one flag: you'd have to pass {Foo.BAR} rather than just Foo.BAR - two flags: you'd have to pass {Foo.BAR, Foo.BAZ} rather than Foo.BAR | Foo.BAZ I think for each of these the proposal would be strictly worse than the current convention. 
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From guido at python.org Sun Dec 31 00:39:33 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 30 Dec 2017 21:39:33 -0800 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? In-Reply-To: References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> <20171230042531.GV4215@ando.pearwood.info> Message-ID: I should probably clarify that for this to work, Foo must derive from enum.Flag. See https://docs.python.org/3/library/enum.html#flag. (Or enum.IntFlag, https://docs.python.org/3/library/enum.html#intflag.) Note that when using Flag, you can name the "zero" value (Color.BLACK in the 3rd example). When using IntFlag, you just use 0.

On Sat, Dec 30, 2017 at 9:33 PM, Guido van Rossum wrote: > On Sat, Dec 30, 2017 at 8:50 PM, Franklin? Lee < > leewangzhong+python at gmail.com> wrote: >> >> Paddy might want something like this: >> - For existing APIs which take int or IntFlag flags, allow them to >> also take a set (or perhaps any collection) of flags. >> - In new APIs, take sets of Enum flags, and don't make them IntFlag. >> - Documentation should show preference toward using sets of Enum >> flags. Tutorials should pass sets. > > > I'm not keen on this recommendation. An argument that takes a Set[Foo] > would mean that in order to specify: > - no flags: you'd have to pass set() -- you can't use {} since that's an > empty dict, not an empty set > - one flag: you'd have to pass {Foo.BAR} rather than just Foo.BAR > - two flags: you'd have to pass {Foo.BAR, Foo.BAZ} rather than Foo.BAR | > Foo.BAZ > > I think for each of these the proposal would be strictly worse than the > current convention. > > -- > --Guido van Rossum (python.org/~guido) > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From yahya-abou-imran at protonmail.com Sun Dec 31 01:19:12 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Sun, 31 Dec 2017 01:19:12 -0500 Subject: [Python-ideas] Improve ABCs _dump_registry() readability Message-ID: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com> In python 2.7, ABCs' caches and registries are sets. But in python 3.6 they are WeakSets. In consequence, the output of _dump_registry() is almost useless:

    >>> from collections import abc
    >>> abc.Iterator._dump_registry()
    Class: collections.abc.Iterator
    Inv.counter: 40
    _abc_cache: <_weakrefset.WeakSet object at 0x7f4b58fe2668>
    _abc_negative_cache: <_weakrefset.WeakSet object at 0x7f4b53283780>
    _abc_negative_cache_version: 40
    _abc_registry: <_weakrefset.WeakSet object at 0x7f4b58fe2630>

We could convert them into a regular set before printing:

    if isinstance(value, WeakSet):
        value = set(value)

The result:

    >>> abc.Iterator._dump_registry()
    Class: collections.abc.Iterator
    Inv.counter: 40
    _abc_cache: {, , , , , , , , , , , , }
    _abc_negative_cache: set()
    _abc_negative_cache_version: 40
    _abc_registry: set()

NB: It seems pretty weird to me that registry is empty... All the iterators in the cache should've been in the registry instead, shouldn't they?
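A standalone sketch of where that check would slot in, modelled on the shape of the pure-Python ABCMeta._dump_registry() in 3.6 (the helper name and exact layout here are assumptions for illustration, not an actual patch):

    from abc import ABCMeta
    from _weakrefset import WeakSet

    def dump_registry(cls, file=None):
        # Like cls._dump_registry(), but materialise WeakSets first.
        print("Class: %s.%s" % (cls.__module__, cls.__qualname__), file=file)
        print("Inv.counter: %s" % ABCMeta._abc_invalidation_counter, file=file)
        for name in sorted(cls.__dict__):
            if name.startswith("_abc_"):
                value = getattr(cls, name)
                if isinstance(value, WeakSet):
                    value = set(value)   # show the members, not the WeakSet repr
                print("%s: %r" % (name, value), file=file)

Calling dump_registry(collections.abc.Iterator) then lists the cached concrete iterator classes instead of an opaque WeakSet repr.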
From paddy3118 at gmail.com  Sun Dec 31 03:13:28 2017
From: paddy3118 at gmail.com (Paddy3118)
Date: Sun, 31 Dec 2017 00:13:28 -0800 (PST)
Subject: [Python-ideas] a set of enum.Enum values rather than the
 construction of bit-sets as the "norm"?
In-Reply-To: 
References: <20171227055639.GP4215@ando.pearwood.info>
 <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com>
 <20171229081816.GT4215@ando.pearwood.info>
 <20171229153821.GU4215@ando.pearwood.info>
 <20171230042531.GV4215@ando.pearwood.info>
Message-ID: 

Hmm, yea I had not thought of how it would look - I had thought foremost
of not needing to necessarily learn about bitsets when learning about
passing a large number of optional flags to a function.

Although the default could be None, interpreted as an empty set of zero
values; a set of one or more enums does use more characters compared to
or-ing flags...

On Sunday, 31 December 2017 05:34:23 UTC, Guido van Rossum wrote:
>
> On Sat, Dec 30, 2017 at 8:50 PM, Franklin? Lee > wrote:
>>
>> Paddy might want something like this:
>> - For existing APIs which take int or IntFlag flags, allow them to
>> also take a set (or perhaps any collection) of flags.
>> - In new APIs, take sets of Enum flags, and don't make them IntFlag.
>> - Documentation should show preference toward using sets of Enum
>> flags. Tutorials should pass sets.
>
>
> I'm not keen on this recommendation. An argument that takes a Set[Foo]
> would mean that in order to specify:
> - no flags: you'd have to pass set() -- you can't use {} since that's an
> empty dict, not an empty set
> - one flag: you'd have to pass {Foo.BAR} rather than just Foo.BAR
> - two flags: you'd have to pass {Foo.BAR, Foo.BAZ} rather than Foo.BAR |
> Foo.BAZ
>
> I think for each of these the proposal would be strictly worse than the
> current convention.
>
> --
> --Guido van Rossum (python.org/~guido)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From turnbull.stephen.fw at u.tsukuba.ac.jp  Sun Dec 31 10:38:27 2017
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Mon, 1 Jan 2018 00:38:27 +0900
Subject: [Python-ideas] Add an UML class diagram to the collections.abc
 module documentation
In-Reply-To: 
References: 
Message-ID: <23113.1139.65447.199339@turnbull.sk.tsukuba.ac.jp>

Terry Reedy writes:

> B. Be consistent on placement of inherited versus added methods. Always
> list inherited first? Different fonts, as suggested, might be
> good.

I would prefer listing added methods first.

From turnbull.stephen.fw at u.tsukuba.ac.jp  Sun Dec 31 10:38:59 2017
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Mon, 1 Jan 2018 00:38:59 +0900
Subject: [Python-ideas] Add an UML class diagram to the collections.abc
 module documentation
In-Reply-To: 
References: 
Message-ID: <23113.1171.841353.169485@turnbull.sk.tsukuba.ac.jp>

Sorry about the premature send.

Terry Reedy writes:

> B. Be consistent on placement of inherited versus added methods. Always
> list inherited first? Different fonts, as suggested, might be
> good.

I would prefer listing overridden and added methods first, because
there's a good chance I already know from the base classes what methods
are inherited. On the other hand, I would list abstract methods first,
as they form an agenda for implementing a concrete class. (I don't have
much experience with this though.)
Steve

From yahya-abou-imran at protonmail.com  Sun Dec 31 10:46:38 2017
From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran)
Date: Sun, 31 Dec 2017 10:46:38 -0500
Subject: [Python-ideas] Add an UML class diagram to the collections.abc
 module documentation
In-Reply-To: <23113.1139.65447.199339@turnbull.sk.tsukuba.ac.jp>
References: <23113.1139.65447.199339@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

>Terry Reedy writes:
>
>>B. Be consistent on placement of inherited versus added methods. Always
>>>list inherited first? Different fonts, as suggested, might be
>>>good.
>>
> I would prefer listing added methods first.

I don't understand why... In the table of the documentation page, the
abstract methods are listed first. In the source code, the abstract
methods are implemented first. In UML, the convention is to place the
abstract methods first.

From python at mrabarnett.plus.com  Sun Dec 31 12:09:34 2017
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 31 Dec 2017 17:09:34 +0000
Subject: [Python-ideas] a set of enum.Enum values rather than the
 construction of bit-sets as the "norm"?
In-Reply-To: 
References: <20171227055639.GP4215@ando.pearwood.info>
 <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com>
 <20171229081816.GT4215@ando.pearwood.info>
 <20171229153821.GU4215@ando.pearwood.info>
 <20171230042531.GV4215@ando.pearwood.info>
Message-ID: <980401a3-27c0-c5aa-24c9-41e3db533f69@mrabarnett.plus.com>

On 2017-12-31 08:13, Paddy3118 wrote:
> Hmm, yea I had not thought of how it would look - I had thought formost
> of not needing to necessarily learn about bitsets.when learning about
> passing a large number of optional flags to a function.
>
> Although the default could be None, interpreted as an empty set of zero
> values.; a set of one or more enums does use more characters compared to
> or-ing flags...
>
None is often used to represent a default, which might not be an empty set.

> On Sunday, 31 December 2017 05:34:23 UTC, Guido van Rossum wrote:
>
> On Sat, Dec 30, 2017 at 8:50 PM, Franklin? Lee
> > wrote:
>
> Paddy might want something like this:
> - For existing APIs which take int or IntFlag flags, allow them to
> also take a set (or perhaps any collection) of flags.
> - In new APIs, take sets of Enum flags, and don't make them IntFlag.
> - Documentation should show preference toward using sets of Enum
> flags. Tutorials should pass sets.
>
>
> I'm not keen on this recommendation. An argument that takes a
> Set[Foo] would mean that in order to specify:
> - no flags: you'd have to pass set() -- you can't use {} since
> that's an empty dict, not an empty set
> - one flag: you'd have to pass {Foo.BAR} rather than just Foo.BAR
> - two flags: you'd have to pass {Foo.BAR, Foo.BAZ} rather than
> Foo.BAR | Foo.BAZ
>
> I think for each of these the proposal would be strictly worse than
> the current convention.
>

From guido at python.org  Sun Dec 31 12:54:17 2017
From: guido at python.org (Guido van Rossum)
Date: Sun, 31 Dec 2017 10:54:17 -0700
Subject: [Python-ideas] Improve ABCs _dump_registry() readability
In-Reply-To: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com>
References: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com>
Message-ID: 

Yeah, I guess few developers have needed to use _dump_registry(), and also
it's easy enough to just access e.g. Iterator._abc_registry yourself.
The reason Iterator._abc_registry is empty is that no class directly registered with it -- they are all registered with e.g. Sequence. The cache includes classes registered with subclasses, but the registry itself does not. I guess a PR to fix the registry output would make sense (first file a bug on bugs.python.org for it). On Sat, Dec 30, 2017 at 11:19 PM, Yahya Abou 'Imran via Python-ideas < python-ideas at python.org> wrote: > In python 2.7, ABCs's caches and registries are sets. But in python 3.6 > they are WeakSet. > In consequence, the output of _dump_registry() is almost useless: > > >>> from collections import abc > >>> abc.Iterator._dump_registry() > Class: collections.abc.Iterator > Inv.counter: 40 > _abc_cache: <_weakrefset.WeakSet object at 0x7f4b58fe2668> > _abc_negative_cache: <_weakrefset.WeakSet object at 0x7f4b53283780> > _abc_negative_cache_version: 40 > _abc_registry: <_weakrefset.WeakSet object at 0x7f4b58fe2630> > > We could convert them into a regular set before printing: > > if isinstance(value, WeakSet): > value = set(value) > > The result: > > >>> abc.Iterator._dump_registry() > Class: collections.abc.Iterator > Inv.counter: 40 > _abc_cache: {, , > , , 'dict_keyiterator'>, , , 'set_iterator'>, , , > , , 'bytes_iterator'>} > _abc_negative_cache: set() > _abc_negative_cache_version: 40 > _abc_registry: set() > > > NB: It seems pretty weird to me that registry is empty... All the > iterators in the cache should've been in the registry instead, should'nt > they? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yahya-abou-imran at protonmail.com Sun Dec 31 13:24:17 2017 From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran) Date: Sun, 31 Dec 2017 13:24:17 -0500 Subject: [Python-ideas] Improve ABCs _dump_registry() readability In-Reply-To: References: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com> Message-ID: >Yeah, I guess few developers have needed to use _dump_registry(), and also it's easy enough to just access e.g. Iterator._abc_registry yourself. > Yes, I saw that it's not well-known. I was studying hard the internals of ABCs and ABCMeta, so I end up using it and modifying it. >The reason Iterator._abc_registry is empty is that no class directly registered with it -- they are all registered with e.g. Sequence. The cache includes classes registered with subclasses, but the registry itself does not. No, in the source code they are! in _collections_abc.py, just after Iterator definition: Iterator.register(bytes_iterator) Iterator.register(bytearray_iterator) #Iterator.register(callable_iterator) Iterator.register(dict_keyiterator) Iterator.register(dict_valueiterator) Iterator.register(dict_itemiterator) Iterator.register(list_iterator) Iterator.register(list_reverseiterator) Iterator.register(range_iterator) Iterator.register(longrange_iterator) Iterator.register(set_iterator) Iterator.register(str_iterator) Iterator.register(tuple_iterator) Iterator.register(zip_iterator) For some reason, the register is being cleared at some point. I tried: Iterator.register(bytes_iterator) Iterator._dump_registry(open('iterator_registry.log', 'w')) Iterator.register(bytearray_iterator) . . . 
And I got:

$ cat iterator_registry.log
Class: collections.abc.Iterator
Inv.counter: 8
_abc_cache: {}
_abc_negative_cache: set()
_abc_negative_cache_version: 8
_abc_registry: set()

It's going into the cache and not into the registry. Strange behaviour...

>I guess a PR to fix the registry output would make sense (first file a bug on bugs.python.org for it).

Ok, I will!

From levkivskyi at gmail.com  Sun Dec 31 13:31:06 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Sun, 31 Dec 2017 19:31:06 +0100
Subject: [Python-ideas] Improve ABCs _dump_registry() readability
In-Reply-To: 
References: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com>
Message-ID: 

On 31 December 2017 at 19:24, Yahya Abou 'Imran via Python-ideas <
python-ideas at python.org> wrote:

>
> >I guess a PR to fix the registry output would make sense (first file a
> bug on bugs.python.org for it).
>
> Ok, I will!
>
>
Please don't hurry with this. I am going to rewrite ABCMeta in C soon.
In fact most of the work is done but I am waiting for implementation of PEP
560 to settle (need few more days for this).

In the C version the caches/registry will be simpler and will not use
WeakSet (instead they will be thin C wrappers around normal sets).

-- 
Ivan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yahya-abou-imran at protonmail.com  Sun Dec 31 13:38:07 2017
From: yahya-abou-imran at protonmail.com (Yahya Abou 'Imran)
Date: Sun, 31 Dec 2017 13:38:07 -0500
Subject: [Python-ideas] Improve ABCs _dump_registry() readability
In-Reply-To: 
References: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com>
Message-ID: 

>>>I guess a PR to fix the registry output would make sense (first file a bug on bugs.python.org for it).
>>
>>Ok, I will!
>>
>>
>>
>Please don't hurry with this. I am going to rewrite ABCMeta in C soon.
>In fact most of the work is done but I am waiting for implementation of PEP 560 to settle (need few more days for this).
>
>In the C version the caches/registry will be simpler and will not use WeakSet (instead they will be thin C wrappers around normal sets).

Ok, no problem.

Found out myself why the registry's empty:
every iterator passed to Iterator.register() defines __iter__ and __next__,
so they satisfy Iterator.__subclasshook__ and are added to the cache
beforehand.

From guido at python.org  Sun Dec 31 13:47:49 2017
From: guido at python.org (Guido van Rossum)
Date: Sun, 31 Dec 2017 11:47:49 -0700
Subject: [Python-ideas] Improve ABCs _dump_registry() readability
In-Reply-To: 
References: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com>
Message-ID: 

Ah, glad the mystery's solved! And sorry for the misdirection.

On Sun, Dec 31, 2017 at 11:38 AM, Yahya Abou 'Imran <
yahya-abou-imran at protonmail.com> wrote:

>
> >>>I guess a PR to fix the registry output would make sense (first file a
> bug on bugs.python.org for it).
> >>
> >>Ok, I will!
> >>
> >>
> >>
> >Please don't hurry with this. I am going to rewrite ABCMeta in C soon.
> >In fact most of the work is done but I am waiting for implementation of
> PEP 560 to settle (need few more days for this).
> >
> >In the C version the caches/registry will be simpler and will not use
> WeakSet (instead they will be thin C wrappers around normal sets).
>
> Ok, no problem.
> > Found out myself why the registry's empty: > every iterator passed to Iterator.register() defines __iter__ and > __next__, so they satisfy Iterator.__subclasshook__ and are added to the > cache beforehand. > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Dec 31 14:05:02 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 31 Dec 2017 20:05:02 +0100 Subject: [Python-ideas] Improve ABCs _dump_registry() readability References: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com> Message-ID: <20171231200502.1dec24fa@fsol> On Sun, 31 Dec 2017 19:31:06 +0100 Ivan Levkivskyi wrote: > On 31 December 2017 at 19:24, Yahya Abou 'Imran via Python-ideas < > python-ideas at python.org> wrote: > > > > > >I guess a PR to fix the registry output would make sense (first file a > > bug on bugs.python.org for it). > > > > Ok, I will! > > > > > Please don't hurry with this. I am going to rewrite ABCMeta in C soon. > In fact most of the work is done but I am waiting for implementation of PEP > 560 to settle (need few more days for this). > > In the C version the caches/registry will be simpler and will not use > WeakSet (instead they will be thin C wrappers around normal sets). Hmm... Just because you are rewriting the thing in C doesn't mean that Yahya shouldn't submit a patch for the Python version (which I assume will be staying around anyway). Regards Antoine. From levkivskyi at gmail.com Sun Dec 31 14:09:03 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sun, 31 Dec 2017 20:09:03 +0100 Subject: [Python-ideas] Improve ABCs _dump_registry() readability In-Reply-To: <20171231200502.1dec24fa@fsol> References: <6iWiw-leIhKwpXqZOBa7Q1Zm4e5pVsaazpHA_ozFTdUJaF9szDEgklpyCPBA2A1cy10BfUuFeDowhrPGf-nctR_9mKc5hnJjkd_qzPEjA4Y=@protonmail.com> <20171231200502.1dec24fa@fsol> Message-ID: On 31 December 2017 at 20:05, Antoine Pitrou wrote: > On Sun, 31 Dec 2017 19:31:06 +0100 > Ivan Levkivskyi > wrote: > > > On 31 December 2017 at 19:24, Yahya Abou 'Imran via Python-ideas < > > python-ideas at python.org> wrote: > > > > > > > > >I guess a PR to fix the registry output would make sense (first file a > > > bug on bugs.python.org for it). > > > > > > Ok, I will! > > > > > > > > Please don't hurry with this. I am going to rewrite ABCMeta in C soon. > > In fact most of the work is done but I am waiting for implementation of > PEP > > 560 to settle (need few more days for this). > > > > In the C version the caches/registry will be simpler and will not use > > WeakSet (instead they will be thin C wrappers around normal sets). > > Hmm... Just because you are rewriting the thing in C doesn't mean that > Yahya shouldn't submit a patch for the Python version (which I assume > will be staying around anyway). > Yes, good point! -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sun Dec 31 18:00:47 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sun, 31 Dec 2017 18:00:47 -0500 Subject: [Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"? 
In-Reply-To: <980401a3-27c0-c5aa-24c9-41e3db533f69@mrabarnett.plus.com> References: <20171227055639.GP4215@ando.pearwood.info> <9ec7fc9a-57ca-4fd5-ad21-8b1346349c2a@googlegroups.com> <20171229081816.GT4215@ando.pearwood.info> <20171229153821.GU4215@ando.pearwood.info> <20171230042531.GV4215@ando.pearwood.info> <980401a3-27c0-c5aa-24c9-41e3db533f69@mrabarnett.plus.com> Message-ID: On Sun, Dec 31, 2017 at 12:09 PM, MRAB wrote: > On 2017-12-31 08:13, Paddy3118 wrote: >> >> Hmm, yea I had not thought of how it would look - I had thought formost of >> not needing to necessarily learn about bitsets.when learning about passing a >> large number of optional flags to a function. >> >> Although the default could be None, interpreted as an empty set of zero >> values.; a set of one or more enums does use more characters compared to >> or-ing flags... >> > None is often used to represent a default, which might not be an empty set. I see no reason not to allow an iterable collection of flags. That will at least allow [] and ().
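A rough sketch of what accepting any iterable could look like, using a
hypothetical Option flag enum and a made-up normalizing helper (none of
this is an existing API; the names are for illustration only):

    from enum import Flag, auto
    from typing import Iterable, Union

    class Option(Flag):            # hypothetical flag enum
        NONE = 0
        FAST = auto()
        SAFE = auto()

    def normalize(flags: Union[Option, Iterable[Option], None]) -> Option:
        """Accept a single (possibly or-ed) Option, any iterable of
        Options, or None, and return one combined Option value."""
        if flags is None:
            return Option.NONE
        if isinstance(flags, Option):
            return flags
        combined = Option.NONE
        for flag in flags:
            combined |= flag
        return combined

    # These all normalize to the same value:
    assert normalize(Option.FAST | Option.SAFE) == normalize({Option.FAST, Option.SAFE})
    # And the empty collections mentioned above work too:
    assert normalize([]) == normalize(()) == Option.NONE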