From mike at selik.org Tue Dec 1 00:18:49 2015 From: mike at selik.org (Michael Selik) Date: Tue, 01 Dec 2015 05:18:49 +0000 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> <650A2120-A263-4568-8C4D-D1B4241B9382@yahoo.com> Message-ID: On Mon, Nov 30, 2015 at 11:38 PM Chris Angelico wrote: > And while I called the class-based system "ugly" to start > with, I'm coming around to it more and more - especially since it > works in current versions of Python, rather than demanding core > interpreter changes. It's not the most intuitive use of syntax either > (the 'class' block isn't really creating a class at all - it creates a > function parameter list), but it isn't as bad as I thought it was. > One advantage of the ``@property`` syntax is that one can show a new Pythonista how to create read-only properties without needing to explain too many new concepts. Using ``@foo.setter`` sometimes requires a bit more hand-waving. Is there a way to use the metaclass and class-in-a-class as Kevin Modzelewski wrote, but hiding some of the complexity behind a decorator? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kmod at dropbox.com Tue Dec 1 00:28:42 2015 From: kmod at dropbox.com (Kevin Modzelewski) Date: Mon, 30 Nov 2015 21:28:42 -0800 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: Message-ID: Hmm I could have done a bit better with my example. class defs can work with any callable as a "metaclass", so creating an actual metaclass was overkill: def property_wrapper(name, bases, attrs): return property(attrs.get('get'), attrs.get('set'), attrs.get('del'), attrs.get('__doc__')) class Foo(object): class myprop(metaclass=property_wrapper): def get(self): return 1 def set(self, v): pass __doc__ = 1 I wasn't suggesting it for performance reasons, just that there's already an API for "call a function with the locals defined in this scope" that we can use directly, rather than using a different wrapper of the underlying API (aka class creation). But I think from a readability standpoint it's much nicer to wrap a normal classdef with a "@make_property" decorator rather than doing it via a metaclasss. I think this could be different if there was a simpler way to use a metaclass (especially, a way that wasn't so class-related). Here's an example of what it could look like: scope(property_wrapper) myprop: def get(self): return 1 The thing that's nice is that "scope(X) Y" can be just a simple transformation to "class Y(metaclass=X)". Anyway, I'm not trying to seriously suggest this as the way to go, but just trying to say that if you want to apply a function to the locals defined in a scope, that feature already exists even if it is ugly to use :) On Mon, Nov 30, 2015 at 8:37 PM, Andrew Barnert wrote: > On Nov 30, 2015, at 19:21, Kevin Modzelewski via Python-ideas < > python-ideas at python.org> wrote: > > > > Class scopes definitely feel like a good match -- they are a way of > saying "evaluate all of these expression, pass the resulting locals to a > custom function, and bind the result of that function to the classname". > Usually the function is type(), which constructs a new class, but by > setting a custom metaclass we can avoid creating a class just to wrap the > scope: > > Is there really a harm in creating a class? 
> > A property is a type, and the obvious way to simulate it in Python rather > than C (as shown by the sample code in the HOWTO) is with a class statement. > > Besides, if you're creating many thousands of properties in a loop, the > time and space cost of property creation is probably the least of your > worries. > > Again, maybe that isn't true for other types of decorators this feature > might be useful for, but without having any examples to think about, it's > hard to guess... > > > class PropertyMetaclass(type): > > def __new__(cls, name, bases, attrs): > > return property(attrs.get('get'), attrs.get('set'), > attrs.get('del'), attrs.get('__doc__')) > > I still don't get the benefit of having a metaclass or constructor > function or wrapper function or anything else, instead of just making > property take a class instead of four functions. The latter is > significantly nicer on the user side, and only a tiny bit more verbose in > the implementation of property, and easier to understand. Unless there are > other decorators where they wouldn't be true, or so many potentially useful > one-shot decorators that defining them all a little more succinctly is > worth the cost, why add the extra layer? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjoerdjob at sjec.nl Tue Dec 1 02:51:23 2015 From: sjoerdjob at sjec.nl (Sjoerd Job Postmus) Date: Tue, 1 Dec 2015 08:51:23 +0100 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: Message-ID: <20151201075123.GA32493@sjoerdjob.com> What worries me is that all we're looking at is the case of the @property decorator. That decorator just creates a descriptor. Why not just class Foo(object): class myprop(object): def __get__(self): return 1 def __set__(self, value): pass It would seem far more logical to tell peopele to read up on descriptors, instead of telling them: "Here's some complicated thing you can use that generates something quite simple under the hood.". Is there any other use case which would benefit greatly from the 'add locals to some scope' idea? Probably, but in that case I would suggest discussing these cases separately. On Mon, Nov 30, 2015 at 07:21:57PM -0800, Kevin Modzelewski via Python-ideas wrote: > Class scopes definitely feel like a good match -- they are a way of saying > "evaluate all of these expression, pass the resulting locals to a custom > function, and bind the result of that function to the classname". Usually > the function is type(), which constructs a new class, but by setting a > custom metaclass we can avoid creating a class just to wrap the scope: > > class PropertyMetaclass(type): > def __new__(cls, name, bases, attrs): > return property(attrs.get('get'), attrs.get('set'), > attrs.get('del'), attrs.get('__doc__')) > > class Foo(object): > class myprop(metaclass=PropertyMetaclass): > def get(self): > return 1 > def set(self, v): > pass > __doc__ = 1 > > f = Foo() > print(f.myprop) > > > The "class myprop(metaclass=PropertyClass)" line is pretty ugly though. 
> > On Mon, Nov 30, 2015 at 6:41 PM, David Mertz wrote: > > > On Mon, Nov 30, 2015 at 6:14 PM, Chris Angelico wrote: > > > >> def call(func): > >> def inner(cls): > >> return func(**{k:v for k,v in cls.__dict__.items() if not > >> k.startswith('_')}) > >> return inner > >> > >> class Foo: > >> def __init__(self): > >> self._x = 42 > >> @call(property) > >> class x: > >> def fget(self): > >> return self._x > >> def fset(self, value): > >> self._x = value > >> def fdel(self): > >> del self._x > >> > > > > I think this looks perfectly nice, actually. I was just trying to work > > out almost the same thing but using a `def x()` rather than `class f` as > > the nesting construct. I think Chris' is better though. I think I might > > want to define something like: > > > > make_property = call(property) > > > > class Foo: > > def __init__(self): > > self._x = 42 > > @make_property > > class x: > > def fget(self): > > return self._x > > def fset(self, value): > > self._x = value > > def fdel(self): > > del self._x > > > > > > > > -- > > Keeping medicines from the bloodstreams of the sick; food > > from the bellies of the hungry; books from the hands of the > > uneducated; technology from the underdeveloped; and putting > > advocates of freedom in prisons. Intellectual property is > > to the 21st century what the slave trade was to the 16th. > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Tue Dec 1 04:01:05 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 1 Dec 2015 01:01:05 -0800 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: Message-ID: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> On Nov 30, 2015, at 21:28, Kevin Modzelewski wrote: > > Hmm I could have done a bit better with my example. class defs can work with any callable as a "metaclass", so creating an actual metaclass was overkill: > > def property_wrapper(name, bases, attrs): > return property(attrs.get('get'), attrs.get('set'), attrs.get('del'), attrs.get('__doc__')) > > class Foo(object): > class myprop(metaclass=property_wrapper): > def get(self): > return 1 > def set(self, v): > pass > __doc__ = 1 But even this is still more complicated than just changing the property type's initializer to take a class instead of a bunch of functions (that is, making property a normal class decorator instead of making it a weird multiple-function decorator and then writing a separate wrapper that lets you pass a class as if it were a normal class decorator)? And again, what's the benefit from this extra complexity? Unless you have a whole lot of decorators written that all need this exact same transformation, you're just abstracting out an arbitrary part of the logic that doesn't seem to fit any natural grain. > I wasn't suggesting it for performance reasons, just that there's already an API for "call a function with the locals defined in this scope" that we can use directly, rather than using a different wrapper of the underlying API (aka class creation). 
But I think from a readability standpoint it's much nicer to wrap a normal classdef with a "@make_property" decorator rather than doing it via a metaclasss. I think this could be different if there was a simpler way to use a metaclass (especially, a way that wasn't so class-related). Here's an example of what it could look like: > > scope(property_wrapper) myprop: > def get(self): > return 1 That's still a lot less readable than this: @property class myprop: def get(self): return 1 > The thing that's nice is that "scope(X) Y" can be just a simple transformation to "class Y(metaclass=X)". Anyway, I'm not trying to seriously suggest this as the way to go, but just trying to say that if you want to apply a function to the locals defined in a scope, that feature already exists even if it is ugly to use :) > > >> On Mon, Nov 30, 2015 at 8:37 PM, Andrew Barnert wrote: >> On Nov 30, 2015, at 19:21, Kevin Modzelewski via Python-ideas wrote: >> > >> > Class scopes definitely feel like a good match -- they are a way of saying "evaluate all of these expression, pass the resulting locals to a custom function, and bind the result of that function to the classname". Usually the function is type(), which constructs a new class, but by setting a custom metaclass we can avoid creating a class just to wrap the scope: >> >> Is there really a harm in creating a class? >> >> A property is a type, and the obvious way to simulate it in Python rather than C (as shown by the sample code in the HOWTO) is with a class statement. >> >> Besides, if you're creating many thousands of properties in a loop, the time and space cost of property creation is probably the least of your worries. >> >> Again, maybe that isn't true for other types of decorators this feature might be useful for, but without having any examples to think about, it's hard to guess... >> >> > class PropertyMetaclass(type): >> > def __new__(cls, name, bases, attrs): >> > return property(attrs.get('get'), attrs.get('set'), attrs.get('del'), attrs.get('__doc__')) >> >> I still don't get the benefit of having a metaclass or constructor function or wrapper function or anything else, instead of just making property take a class instead of four functions. The latter is significantly nicer on the user side, and only a tiny bit more verbose in the implementation of property, and easier to understand. Unless there are other decorators where they wouldn't be true, or so many potentially useful one-shot decorators that defining them all a little more succinctly is worth the cost, why add the extra layer? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Dec 1 04:56:17 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Dec 2015 19:56:17 +1000 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> References: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 1 December 2015 at 12:52, Andrew Barnert via Python-ideas wrote: > It seems a lot cleaner to just pass a class to the decorator: > > class Property: > def __init__(self, cls): > self.fget = getattr(cls, 'fget', None) > self.fset = getattr(cls, 'fset', None) > self.fdel = getattr(cls, 'fdel', None) > self.doc = getattr(cls, '__doc__', None) > # everything below this point is exactly the same as the > # existing implementation in the descriptor HOWTO (or > # the C implementation in descrobject.c). 
> > class Foo: > def __init__(self): > self._x = 42 > @Property > class x: > def fget(self): > return self._x > def fset(self, value): > self._x = value > def fdel(self): > del self._x I'm not following this discussion closely, but saw a reference to "Why not just use a class?" in one of the later posts, and hence went looking for the specific post suggesting that (since it's a question with a specific-but-not-obvious answer). A class based approach like the one suggested here came up in the previous discussion that gave us the current syntax: class Foo: def __init__(self): self._x = 42 @property def x(self): return self._x @x.setter def x(self, value): self._x = value @x.deleter def x(self): del self._x The main objection I recall being raised against the class based approach in that previous discussion is that it handles the "self" reference in the property implementation methods in a confusing way: the "self" refers to an instance of the class containing the property definition, *not* to an instance of the class containing the methods. By contrast, when you use the "property/setter/deleter" pattern or the original non-decorator based pattern, all of the individual methods are written as normal methods, with the "self" referring to an instance of the class that contains the method definition as usual. Any approach based on defining a new indented suite has the same problem (in that it makes it harder for a human reader to figure out the correct referent for the "self" parameters), but using a class statement specifically has the problem of "no, not *this* class, *that* class". Beyond that, property and any similar decorators are really just a special case of a higher order function accepting multiple distinct functions as inputs, and Python's syntax generally isn't structured to make that easy to do. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Dec 1 05:09:03 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Dec 2015 20:09:03 +1000 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <20151201075123.GA32493@sjoerdjob.com> References: <20151201075123.GA32493@sjoerdjob.com> Message-ID: On 1 December 2015 at 17:51, Sjoerd Job Postmus wrote: > What worries me is that all we're looking at is the case of the > @property decorator. That decorator just creates a descriptor. Why not > just > > class Foo(object): > class myprop(object): > def __get__(self): > return 1 > def __set__(self, value): > pass Those aren't the signatures of the descriptor methods. From https://docs.python.org/3/reference/datamodel.html#descriptors: object.__get__(self, instance, owner) object.__set__(self, instance, value) object.__delete__(self, instance) The trick with property is that it hides the "self" that refers to the descriptor object itself, as well as the "owning class" reference in __get__, leading to the simplified property protocol where the individual functions are written as instance methods of the class *containing* the property, and retrieving the descriptor from the class will just give you the descriptor object. Cheers, Nick. 
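To make those signatures concrete, here is a minimal hand-written data
descriptor spelled out with the full protocol (the Celsius/Temperature
names are purely illustrative, not from the thread):

class Celsius:
    """A descriptor using the full protocol signatures."""
    def __get__(self, instance, owner):
        # 'self' is the descriptor object itself, 'instance' is the
        # Temperature object (None when accessed via the class), and
        # 'owner' is the Temperature class.
        if instance is None:
            return self
        return instance._celsius

    def __set__(self, instance, value):
        instance._celsius = float(value)

    def __delete__(self, instance):
        del instance._celsius

class Temperature:
    celsius = Celsius()

t = Temperature()
t.celsius = 21
print(t.celsius)            # 21.0
print(Temperature.celsius)  # the Celsius descriptor itself, not a value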
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Tue Dec 1 07:01:06 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 1 Dec 2015 04:01:06 -0800 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> Message-ID: <356013B6-33D0-4970-822C-E423F0A68432@yahoo.com> On Dec 1, 2015, at 01:56, Nick Coghlan wrote: > > On 1 December 2015 at 12:52, Andrew Barnert via Python-ideas > wrote: >> It seems a lot cleaner to just pass a class to the decorator: >> >> class Property: >> def __init__(self, cls): >> self.fget = getattr(cls, 'fget', None) >> self.fset = getattr(cls, 'fset', None) >> self.fdel = getattr(cls, 'fdel', None) >> self.doc = getattr(cls, '__doc__', None) >> # everything below this point is exactly the same as the >> # existing implementation in the descriptor HOWTO (or >> # the C implementation in descrobject.c). >> >> class Foo: >> def __init__(self): >> self._x = 42 >> @Property >> class x: >> def fget(self): >> return self._x >> def fset(self, value): >> self._x = value >> def fdel(self): >> del self._x > > I'm not following this discussion closely, but saw a reference to "Why > not just use a class?" in one of the later posts, and hence went > looking for the specific post suggesting that (since it's a question > with a specific-but-not-obvious answer). > > A class based approach like the one suggested here came up in the > previous discussion that gave us the current syntax: I don't know the exact timing here, but I'm willing to bet that at the time that discussion happened: 1. Python didn't have class decorators yet, and the very notion was seen as obscure and unnecessary. 2. Inner and nested classes were an unfamiliar feature that almost no other major language supported, rather than being fundamental tools in Java. (Which means nobody had yet had to face the "which self" question, for example.) 3. Modern functional/OO hybrids like F# and Scala didn't exist (and OCaml was barely known outside specific academic circles), so the only familiar notion of dynamic class creation was the SmallTalk style, rather than functions that return classes (like namedtuple--although its implementation is more Tcl-ish than anything, the interface is still all about using types as first-class values). So, I'm not sure the objections hold as well today as they did back then. But I'll admit that they're certainly not empty; I'll have to sleep on them, then play with it and see how it really looks. Meanwhile, the point you make at the end is a good one: > Beyond that, property and any similar decorators are really just a > special case of a higher order function accepting multiple distinct > functions as inputs, and Python's syntax generally isn't structured to > make that easy to do. When I think about how you'd do this even in a language like JS, I see your point. Using Pythonesque syntax: x = property({ 'doc': 'x', 'fget': lambda self: self._x, 'fset': lambda self, value: self._x = value, # oops 'fdel': lambda self: del self._x # oops }) The fact that we need def means that we need a suite, not an expression. And I think you're right that this all of the proposals using a suite have basically the same problem as using a class. But they have an additional problem: a class is a standard way to wrap up a bunch of function definitions in a single "thing", and the other suggestions aren't, and Python doesn't have anything else that is. 
The one obvious alternative is to use a function instead of a class. Maybe something like this: @property def x(): """x""" def fget(self): return self._x def fset(self, value): self._x = value def fdel(self, value): del self._x return fget, fset, fdel ... where property is something like: def property(propfunc): args = propfunc() doc = args[3] if len(args) > 3 else propfunc.__doc__ return existing_property(*args) The only practical problem here is that you need that extra name-repeating return at the end of each @property (and compared to needing to decorate three separate functions, that doesn't seem bad). You could maybe avoid the name repetition by using "return var()" and having property use **ret, but that seems a bit opaque and hacky. (You could even avoid the return entirely by having it use reflective magic to extract the constants from the function's code object, but that seems _really_ hacky.) The conceptual problem is that a decorator that actually calls its function and uses its return value rather than the function itself is pretty weird--a lot weirder than a class decorator, at least to me. It still seems better to me than any of the scope-based alternatives besides class, but a function as a container of functions doesn't feel as right as a class. But if no variation on either of these feels right enough, I think the current design is the best we're going to do. And it really isn't that bad in the first place. It's not like it's hard to tell what the setter is attached to. And repeating the property name up to two times in the secondary decorators is hardly terrible. One more possibility, if property is all we care about, is dedicated syntax. Plenty of other languages have it: property x: """x""" def fget(self): return self._x def fset(self, value): self._x = value I'll bet you could get pretty close to this with MacroPy (and Haoyi has probably already done it for you)... From encukou at gmail.com Tue Dec 1 07:47:15 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 1 Dec 2015 13:47:15 +0100 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <356013B6-33D0-4970-822C-E423F0A68432@yahoo.com> References: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> <356013B6-33D0-4970-822C-E423F0A68432@yahoo.com> Message-ID: On Tue, Dec 1, 2015 at 1:01 PM, Andrew Barnert via Python-ideas wrote: > > I don't know the exact timing here, but I'm willing to bet that at the time that discussion happened: > > 1. Python didn't have class decorators yet, and the very notion was seen as obscure and unnecessary. > > 2. Inner and nested classes were an unfamiliar feature that almost no other major language supported, rather than being fundamental tools in Java. (Which means nobody had yet had to face the "which self" question, for example.) > > 3. Modern functional/OO hybrids like F# and Scala didn't exist (and OCaml was barely known outside specific academic circles), so the only familiar notion of dynamic class creation was the SmallTalk style, rather than functions that return classes (like namedtuple--although its implementation is more Tcl-ish than anything, the interface is still all about using types as first-class values). > > So, I'm not sure the objections hold as well today as they did back then. But I'll admit that they're certainly not empty; I'll have to sleep on them, then play with it and see how it really looks. Metaclasses were probably also obscure. > > One more possibility, if property is all we care about, is dedicated syntax. 
Plenty of other languages have it: > > property x: > """x""" > def fget(self): return self._x > def fset(self, value): self._x = value > > I'll bet you could get pretty close to this with MacroPy (and Haoyi has probably already done it for you)... Indeed, this is quite simple: "property x:" is pretty much syntactic sugar for "class x(metaclass=property_wrapper)", with a simple function as the metaclass, as Kevin suggested: def property_wrapper(name, bases, attrs): return property(attrs.get('get'), attrs.get('set'), attrs.get('del'), attrs.get('__doc__')) If the syntax was a bit more generic ? if "property" would not be a keyword but the name of a callable ? this could solve the "higher order function accepting multiple distinct functions as inputs" problem, or "creating something other than a class from a suite/namespace". From random832 at fastmail.com Tue Dec 1 10:29:02 2015 From: random832 at fastmail.com (Random832) Date: Tue, 1 Dec 2015 15:29:02 +0000 (UTC) Subject: [Python-ideas] Multiple arguments for decorators References: <20151201075123.GA32493@sjoerdjob.com> Message-ID: On 2015-12-01, Nick Coghlan wrote: > The trick with property is that it hides the "self" that refers to the > descriptor object itself, Yes, but if the descriptor object is a class, then there is no "self" because it's called as a static method. The missing owner parameter is the only problem when it's called in the way that the descriptor howto says it is called [type(b).__dict__['x'].__get__(b, type(b))]. It doesn't actually work as written, of course, since it doesn't have any way of knowing myprop is meant to be a descriptor. But the self parameter isn't the issue. Incidentally, I'm not sure I understand what the owner parameter is for, and what purpose it can be useful for which it isn't also needed on the setter. From guido at python.org Tue Dec 1 10:48:44 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Dec 2015 07:48:44 -0800 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> Message-ID: On Tue, Dec 1, 2015 at 1:01 AM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: [...] > > > And again, what's the benefit from this extra complexity? Unless you have > a whole lot of decorators written that all need this exact same > transformation, you're just abstracting out an arbitrary part of the logic > that doesn't seem to fit any natural grain. > I'm assuming this recurring desire to improve on the property decorator is because there are several other languages where a compact way to declare getters and setters is part of the language syntax, and it usually takes the form of an indented block containing some functions. But how important is this really? I did a quick count on a fairly big and complex code base I happened to have sitting around. It has 10x more classes than properties, and only a tiny fraction of those use the @x.setter notation. If that's the norm I'm not sure we need more. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Tue Dec 1 11:22:30 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Dec 2015 08:22:30 -0800 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> Message-ID: On the other hand, if we're willing to put up with some ugliness in the *implementation*, the *notation* can be fairly clean (and avoid the creation of a class object): class Example: def __init__(self): self._x = 0.0 self._y = 0.0 class x(Property): def get(self): return self._x def set(self, value): self._x = float(value) class y(Property): def get(self): return self._y def set(self, value): self._y = float(value) Notice there's no explicit mention of metaclasses here. The magic is that Property is a class with a custom metaclass. The implementation could be as simple as this: class MetaProperty(type): """Metaclass for Property below.""" def __new__(cls, name, bases, attrs): if name == 'Property' and attrs['__module__'] == cls.__module__: # Defining the 'Property' class. return super().__new__(cls, name, bases, attrs) else: # Creating a property. Avoid creating a class at all. # Return a property instance. assert bases == (Property,) return property(attrs.get('get'), attrs.get('set'), attrs.get('delete'), attrs.get('__doc__')) class Property(metaclass=MetaProperty): """Inherit from this to define a read-write property.""" -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Tue Dec 1 11:34:50 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 1 Dec 2015 17:34:50 +0100 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> Message-ID: <565DCC2A.5060304@mail.de> I think I can also confirm that setters **usually** not needed in Python. On 01.12.2015 16:48, Guido van Rossum wrote: > On Tue, Dec 1, 2015 at 1:01 AM, Andrew Barnert via Python-ideas > > wrote: > [...] > > > And again, what's the benefit from this extra complexity? Unless > you have a whole lot of decorators written that all need this > exact same transformation, you're just abstracting out an > arbitrary part of the logic that doesn't seem to fit any natural > grain. > > > I'm assuming this recurring desire to improve on the property > decorator is because there are several other languages where a compact > way to declare getters and setters is part of the language syntax, and > it usually takes the form of an indented block containing some functions. > > But how important is this really? I did a quick count on a fairly big > and complex code base I happened to have sitting around. It has 10x > more classes than properties, and only a tiny fraction of those use > the @x.setter notation. If that's the norm I'm not sure we need more. > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Tue Dec 1 11:41:19 2015 From: srkunze at mail.de (Sven R. 
Kunze) Date: Tue, 1 Dec 2015 17:41:19 +0100 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> Message-ID: <565DCDAF.1020104@mail.de> Nice idea and as concise as other languages do. I would appreciate a stdlib-provided 'Property'. Meta classes always make me feel that I need to bury them deeeeep down in some library. As properties are considered standard repetory these days (as you mentioned in the last post), I feel stdlib is the place. Best, Sven On 01.12.2015 17:22, Guido van Rossum wrote: > On the other hand, if we're willing to put up with some ugliness in > the *implementation*, the *notation* can be fairly clean (and avoid > the creation of a class object): > > class Example: > def __init__(self): > self._x = 0.0 > self._y = 0.0 > > class x(Property): > def get(self): > return self._x > def set(self, value): > self._x = float(value) > > class y(Property): > def get(self): > return self._y > def set(self, value): > self._y = float(value) > > Notice there's no explicit mention of metaclasses here. The magic is > that Property is a class with a custom metaclass. The implementation > could be as simple as this: > > class MetaProperty(type): > """Metaclass for Property below.""" > > def __new__(cls, name, bases, attrs): > if name == 'Property' and attrs['__module__'] == cls.__module__: > # Defining the 'Property' class. > return super().__new__(cls, name, bases, attrs) > else: > # Creating a property. Avoid creating a class at all. > # Return a property instance. > assert bases == (Property,) > return property(attrs.get('get'), attrs.get('set'), > attrs.get('delete'), attrs.get('__doc__')) > > > class Property(metaclass=MetaProperty): > """Inherit from this to define a read-write property.""" > > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Tue Dec 1 14:09:49 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Tue, 01 Dec 2015 11:09:49 -0800 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> Message-ID: <565DF07D.8070109@brenbarn.net> On 2015-12-01 08:22, Guido van Rossum wrote: > On the other hand, if we're willing to put up with some ugliness in the > *implementation*, the *notation* can be fairly clean (and avoid the > creation of a class object): I like this way of doing it. It's more fluid than the other proposed solutions and is instantly understandable. I don't even think the implementation is ugly; it's just a bit complex. But I think using complex machinery on the back end to support a nice clean interface for users is a good idea. What would be the point of all those fancy things like metaclasses if we can't leverage them in cases like this? :-) I do see one possible hiccup, though: it won't be possible to use zero-argument super() for a subclass to override a superclass getter/setter, because the magic __class__ variable won't correctly point to the real class (i.e., the enclosing class in which the property inner class is defined). And trying to use the two-argument super() may be confusing for the same reason. 
I think trying to use super() with properties is already a somewhat fraught endeavor (e.g., http://bugs.python.org/issue14965), but it's worth thinking about how these approaches would work if someone wants overridable property getters/setters. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From vgr255 at live.ca Tue Dec 1 18:00:52 2015 From: vgr255 at live.ca (Emanuel Barry) Date: Tue, 1 Dec 2015 18:00:52 -0500 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: , , , , , , , , <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com>, , Message-ID: I actually like this approach better than a syntactic way. The only downside is that custom decorators don't get this addition, but then the recipe can be used and re-used for those cases. +1 Thanks everyone who replied and offered their solutions, I appreciate it :) From: guido at python.org Date: Tue, 1 Dec 2015 08:22:30 -0800 To: abarnert at yahoo.com Subject: Re: [Python-ideas] Multiple arguments for decorators CC: python-ideas at python.org On the other hand, if we're willing to put up with some ugliness in the *implementation*, the *notation* can be fairly clean (and avoid the creation of a class object): class Example: def __init__(self): self._x = 0.0 self._y = 0.0 class x(Property): def get(self): return self._x def set(self, value): self._x = float(value) class y(Property): def get(self): return self._y def set(self, value): self._y = float(value) Notice there's no explicit mention of metaclasses here. The magic is that Property is a class with a custom metaclass. The implementation could be as simple as this: class MetaProperty(type): """Metaclass for Property below.""" def __new__(cls, name, bases, attrs): if name == 'Property' and attrs['__module__'] == cls.__module__: # Defining the 'Property' class. return super().__new__(cls, name, bases, attrs) else: # Creating a property. Avoid creating a class at all. # Return a property instance. assert bases == (Property,) return property(attrs.get('get'), attrs.get('set'), attrs.get('delete'), attrs.get('__doc__')) class Property(metaclass=MetaProperty): """Inherit from this to define a read-write property.""" -- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue Dec 1 18:22:25 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 Dec 2015 12:22:25 +1300 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> <650A2120-A263-4568-8C4D-D1B4241B9382@yahoo.com> Message-ID: <565E2BB1.2090103@canterbury.ac.nz> Suppose you were able to write: @somedecorator as name: and have it be equivalent to @somedecorator: class name: Then you could say @property as foo: def get(self): ... def set(self, value): ... 
-- Greg From greg.ewing at canterbury.ac.nz Tue Dec 1 18:34:05 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 Dec 2015 12:34:05 +1300 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <565DCC2A.5060304@mail.de> References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> <565DCC2A.5060304@mail.de> Message-ID: <565E2E6D.3010203@canterbury.ac.nz> Sven R. Kunze wrote: > I think I can also confirm that setters **usually** not needed in Python. I think that depends on the kind of code you're writing. In PyGUI I make heavy use of properties, and most of them have both getters and setters. -- Greg From abarnert at yahoo.com Tue Dec 1 21:05:49 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 1 Dec 2015 18:05:49 -0800 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <565E2BB1.2090103@canterbury.ac.nz> References: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> <650A2120-A263-4568-8C4D-D1B4241B9382@yahoo.com> <565E2BB1.2090103@canterbury.ac.nz> Message-ID: On Dec 1, 2015, at 15:22, Greg Ewing wrote: > > Suppose you were able to write: > > @somedecorator as name: > > > and have it be equivalent to > > @somedecorator: > class name: > Do you mean equivalent to this? @somedecorator class name: If so: what's the problem with what we already have? There's no double indenting going on, or anything else ugly or obtrusive, when spelled properly. And I don't see why making it easier to write a class where it's harder for the reader to tell you've done so is an aid to readability. Also, it means the "foo" is no longer in the usual place (def /class statement or assignment), so now we have to all learn to scan three different places where attributes can get named instead of just two, and the new one (unlike the existing two) only appears in classes, not at module or local scope. From ncoghlan at gmail.com Tue Dec 1 23:21:27 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Dec 2015 14:21:27 +1000 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <356013B6-33D0-4970-822C-E423F0A68432@yahoo.com> References: <1745702014.10551218.1448938331185.JavaMail.yahoo@mail.yahoo.com> <356013B6-33D0-4970-822C-E423F0A68432@yahoo.com> Message-ID: On 1 December 2015 at 22:01, Andrew Barnert wrote: > On Dec 1, 2015, at 01:56, Nick Coghlan wrote: >> A class based approach like the one suggested here came up in the >> previous discussion that gave us the current syntax: > > I don't know the exact timing here, but I'm willing to bet that at the time that discussion happened: > > 1. Python didn't have class decorators yet, and the very notion was seen as obscure and unnecessary. Class decorators and getter/setter/deleter were both added in 2.6/3.0 Property API additions: http://bugs.python.org/issue1416 Class decorators: https://www.python.org/dev/peps/pep-3129/ > 2. Inner and nested classes were an unfamiliar feature that almost no other major language supported, rather than being fundamental tools in Java. (Which means nobody had yet had to face the "which self" question, for example.) Java had had inner classes for over a decade by the time 2.6 & 3.0 were released. > 3. 
Modern functional/OO hybrids like F# and Scala didn't exist (and OCaml was barely known outside specific academic circles), so the only familiar notion of dynamic class creation was the SmallTalk style, rather than functions that return classes (like namedtuple--although its implementation is more Tcl-ish than anything, the interface is still all about using types as first-class values). Python itself had had dynamic class creation since the beginning, though, and new-style classes in 2.2 doubled down on that. Digging around a bit more, I found one reference to Guido pointing out his dislike for the nested class based approach was in the context of Steven Bethard's old "make" statement PEP: https://www.python.org/dev/peps/pep-0359/ Which also lead me to rediscovering why this particular idea of using a class with to class decorator to define a property sounded familiar: https://mail.python.org/pipermail/python-dev/2005-October/057350.html :) > So, I'm not sure the objections hold as well today as they did back then. But I'll admit that they're certainly not empty; I'll have to sleep on them, then play with it and see how it really looks. I think folks are more familiar with the use of class decorators in general (since they've been around for several years now), but I also think there's still a general expectation that any defined methods should behave like normal instance methods. > But if no variation on either of these feels right enough, I think the current design is the best we're going to do. And it really isn't that bad in the first place. It's not like it's hard to tell what the setter is attached to. And repeating the property name up to two times in the secondary decorators is hardly terrible. Yep :) It's an interesting language design conundrum (hence why a range of folks have been tinkering with various forms of the question for years), but in *practical* terms it doesn't matter a great deal. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Dec 1 23:29:54 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Dec 2015 14:29:54 +1000 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: References: <20151201075123.GA32493@sjoerdjob.com> Message-ID: On 2 December 2015 at 01:29, Random832 wrote: > Incidentally, I'm not sure I understand what the owner parameter is for, > and what purpose it can be useful for which it isn't also needed on the > setter. __get__ is the only case where the descriptor can override retrieval via the *class* in addition to via the instance. That case shows up as the instance being "None" in the call to __get__: >>> class ShowDescr: ... def __get__(self, instance, owner): ... print(self, instance, owner, sep="\n") ... >>> class C: ... x = ShowDescr() ... >>> C.x <__main__.ShowDescr object at 0x7f7ddcf3f400> None >>> C().x <__main__.ShowDescr object at 0x7f7ddcf3f400> <__main__.C object at 0x7f7ddcf3f4a8> For __set__ and __del__, setting and deleting via the class isn't something a descriptor stored on that class can affect (it needs to be on the metaclass instead) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From srkunze at mail.de Wed Dec 2 13:22:19 2015 From: srkunze at mail.de (Sven R. 
Kunze) Date: Wed, 2 Dec 2015 19:22:19 +0100 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <565E2E6D.3010203@canterbury.ac.nz> References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> <565DCC2A.5060304@mail.de> <565E2E6D.3010203@canterbury.ac.nz> Message-ID: <565F36DB.3030806@mail.de> On 02.12.2015 00:34, Greg Ewing wrote: > Sven R. Kunze wrote: >> I think I can also confirm that setters **usually** not needed in >> Python. > > I think that depends on the kind of code you're writing. > In PyGUI I make heavy use of properties, and most of them > have both getters and setters. Maybe you are right. On the other hand, one could dispense with setters by using different means. Out of curiosity: what PyGUI are you referring to? Google gives me several distinct projects. Best, Sven From greg.ewing at canterbury.ac.nz Wed Dec 2 19:52:29 2015 From: greg.ewing at canterbury.ac.nz (Greg) Date: Thu, 03 Dec 2015 13:52:29 +1300 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <565F36DB.3030806@mail.de> References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> <565DCC2A.5060304@mail.de> <565E2E6D.3010203@canterbury.ac.nz> <565F36DB.3030806@mail.de> Message-ID: <565F924D.7080308@canterbury.ac.nz> On 3/12/2015 7:22 a.m., Sven R. Kunze wrote: > Out of curiosity: what PyGUI are you referring to? Google gives me > several distinct projects. This one: http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/ Although I suspect the same thing would apply to most GUI libraries for Python, at least if they layer a substantial amount of Python code on top of something else. -- Greg From rymg19 at gmail.com Wed Dec 2 21:13:01 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 02 Dec 2015 20:13:01 -0600 Subject: [Python-ideas] Multiple arguments for decorators In-Reply-To: <565F924D.7080308@canterbury.ac.nz> References: <83AC4C6D-5121-4F37-90CB-872CC2B1AB29@yahoo.com> <565DCC2A.5060304@mail.de> <565E2E6D.3010203@canterbury.ac.nz> <565F36DB.3030806@mail.de> <565F924D.7080308@canterbury.ac.nz> Message-ID: <9B566B6F-960F-496F-909E-8899856A0FA8@gmail.com> On December 2, 2015 6:52:29 PM CST, Greg wrote: >On 3/12/2015 7:22 a.m., Sven R. Kunze wrote: >> Out of curiosity: what PyGUI are you referring to? Google gives me >> several distinct projects. > >This one: > >http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/ > >Although I suspect the same thing would apply to most >GUI libraries for Python, at least if they layer a >substantial amount of Python code on top of something >else. I was under the impression that that library was no longer maintained, considering that Gtk support has been in the works since 2011... -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. From Stephan.Sahm at gmx.de Fri Dec 4 08:20:15 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Fri, 4 Dec 2015 14:20:15 +0100 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ Message-ID: Dear all, I just stumbled upon a very weird behaviour of python 2 and python 3. At least I was not able to find a solution. *The point is to dynamically define __add__, __or__ and so on via __getattr__* (for example by deriving them from __iadd__ or similar in a generic way). However this very intuitive idea is currently NOT POSSIBLE because * - * / & | and so on just bypass this standard procedure. 
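A minimal sketch of that behaviour (CPython 3; the class and the lambda are
purely illustrative):

class Lazy:
    def __getattr__(self, name):
        # Explicit attribute access happily produces an __add__ on demand...
        if name == '__add__':
            return lambda other: 42 + other
        raise AttributeError(name)

obj = Lazy()
print(obj.__add__(1))   # 43 -- explicit lookup goes through __getattr__
try:
    print(obj + 1)      # the + operator only looks __add__ up on type(obj)
except TypeError:
    print('operator bypassed __getattr__')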
I found two stackoverflow contributions stating this: http://stackoverflow.com/questions/11629287/python-how-to-forward-an-instances-method-call-to-its-attribute http://stackoverflow.com/questions/33393474/lazy-evaluation-forward-operations-to-deferred-value Neither the mentioned posts, nor I myself can see any reason why this is the way it is, nor how the operators are actually implemented to maybe bypass this unintuitive behaviour. Any help? Comments? Ides? best, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From amauryfa at gmail.com Fri Dec 4 08:50:05 2015 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 04 Dec 2015 13:50:05 +0000 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: References: Message-ID: Le ven. 4 d?c. 2015 ? 14:21, Stephan Sahm a ?crit : > Dear all, > > I just stumbled upon a very weird behaviour of python 2 and python 3. At > least I was not able to find a solution. > > *The point is to dynamically define __add__, __or__ and so on via > __getattr__* (for example by deriving them from __iadd__ or similar in a > generic way). > However this very intuitive idea is currently NOT POSSIBLE because * - * / > & | and so on just bypass this standard procedure. > It is possible if you use indirection, and another name for the dynamic methods: class C: def __add__(self, other): try: method = self.dynamic_add except AttributeError: return NotImplemented else: return method(other) obj = C() print(obj + 1) # raises TypeError obj.dynamic_add = lambda other: 42 + other print(obj + 1) # 43 > > I found two stackoverflow contributions stating this: > > http://stackoverflow.com/questions/11629287/python-how-to-forward-an-instances-method-call-to-its-attribute > > http://stackoverflow.com/questions/33393474/lazy-evaluation-forward-operations-to-deferred-value > > Neither the mentioned posts, nor I myself can see any reason why this is > the way it is, nor how the operators are actually implemented to maybe > bypass this unintuitive behaviour. > > Any help? Comments? Ides? > > best, > Stephan > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Sahm at gmx.de Fri Dec 4 08:55:52 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Fri, 4 Dec 2015 14:55:52 +0100 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: References: Message-ID: Thanks for the constructive response! this is good to know, however it still forces me make such a dynamic placeholder explicit for every single operator. Still, I could put everything in a Mixin class so that it is in fact useful to clean up code. However, in the long run, such placeholders seem like an unnecessary overhead to me. I myself thought whether one can insert the methods into the class itself using __new__ or something else concerning meta-classes. My background is however not good enough in order to seriously explore this direction. Any further suggestions? On 4 December 2015 at 14:50, Amaury Forgeot d'Arc wrote: > Le ven. 4 d?c. 2015 ? 14:21, Stephan Sahm a ?crit : > >> Dear all, >> >> I just stumbled upon a very weird behaviour of python 2 and python 3. At >> least I was not able to find a solution. 
>> >> *The point is to dynamically define __add__, __or__ and so on via >> __getattr__* (for example by deriving them from __iadd__ or similar in a >> generic way). >> However this very intuitive idea is currently NOT POSSIBLE because * - * >> / & | and so on just bypass this standard procedure. >> > > It is possible if you use indirection, and another name for the dynamic > methods: > > class C: > def __add__(self, other): > try: > method = self.dynamic_add > except AttributeError: > return NotImplemented > else: > return method(other) > > obj = C() > print(obj + 1) # raises TypeError > obj.dynamic_add = lambda other: 42 + other > print(obj + 1) # 43 > > >> >> I found two stackoverflow contributions stating this: >> >> http://stackoverflow.com/questions/11629287/python-how-to-forward-an-instances-method-call-to-its-attribute >> >> http://stackoverflow.com/questions/33393474/lazy-evaluation-forward-operations-to-deferred-value >> >> Neither the mentioned posts, nor I myself can see any reason why this is >> the way it is, nor how the operators are actually implemented to maybe >> bypass this unintuitive behaviour. >> >> Any help? Comments? Ides? >> >> best, >> Stephan >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Dec 4 09:12:48 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 5 Dec 2015 01:12:48 +1100 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: References: Message-ID: On Sat, Dec 5, 2015 at 12:55 AM, Stephan Sahm wrote: > this is good to know, however it still forces me make such a dynamic > placeholder explicit for every single operator. > Still, I could put everything in a Mixin class so that it is in fact useful > to clean up code. > > However, in the long run, such placeholders seem like an unnecessary > overhead to me. They might be, in the unusual case where the *instance* determines how it reacts to an operator. In the far more common case where the *type* determines it (that is, where all instances of a particular type behave the same way), the current system has less overhead. (Though I'm not sure how much less, nor even if it's been measured.) If you really do need this kind of per-object dynamism, you still probably don't actually need it for _every_ operator, so you can simply create a bouncer for each dunder method that you actually need this dynamic dispatch on, and as you say, toss them into a mixin if necessary. ChrisA From ncoghlan at gmail.com Fri Dec 4 09:14:17 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 5 Dec 2015 00:14:17 +1000 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: References: Message-ID: On 4 December 2015 at 23:55, Stephan Sahm wrote: > Thanks for the constructive response! > > this is good to know, however it still forces me make such a dynamic > placeholder explicit for every single operator. > Still, I could put everything in a Mixin class so that it is in fact useful > to clean up code. > > However, in the long run, such placeholders seem like an unnecessary > overhead to me. The current behaviour is by design - special methods are looked up as slots on the object's class, not as instance attributes. 
This allows the interpreter to bypass several steps in the normal instance attribute lookup process. > I myself thought whether one can insert the methods into the class itself > using __new__ or something else concerning meta-classes. My background is > however not good enough in order to seriously explore this direction. > > Any further suggestions? Graham Dumpleton's wrapt module has a robust implementation of object proxies: http://wrapt.readthedocs.org/en/latest/wrappers.html Even if that library isn't directly applicable to your use case, studying the implementation should be informative. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From Stephan.Sahm at gmx.de Fri Dec 4 09:22:21 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Fri, 4 Dec 2015 15:22:21 +0100 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: References: Message-ID: thank you two the wrapt link is on my future todo-list, will need some more time than I currently have You both mentioned that the operators might be better tackleable via class-interfaces (__slots__ e.g.?) How would it look like to change things on this level? My current usecase is to implement a Mixin abc class which shall combine all the available __iadd__ __ior__ and so on with a copy() function (this one is then the abstractmethod) to produce automatically the respective __add__, __or__ and so on On 4 December 2015 at 15:14, Nick Coghlan wrote: > On 4 December 2015 at 23:55, Stephan Sahm wrote: > > Thanks for the constructive response! > > > > this is good to know, however it still forces me make such a dynamic > > placeholder explicit for every single operator. > > Still, I could put everything in a Mixin class so that it is in fact > useful > > to clean up code. > > > > However, in the long run, such placeholders seem like an unnecessary > > overhead to me. > > The current behaviour is by design - special methods are looked up as > slots on the object's class, not as instance attributes. This allows > the interpreter to bypass several steps in the normal instance > attribute lookup process. > > > I myself thought whether one can insert the methods into the class itself > > using __new__ or something else concerning meta-classes. My background is > > however not good enough in order to seriously explore this direction. > > > > Any further suggestions? > > Graham Dumpleton's wrapt module has a robust implementation of object > proxies: http://wrapt.readthedocs.org/en/latest/wrappers.html > > Even if that library isn't directly applicable to your use case, > studying the implementation should be informative. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Fri Dec 4 14:00:47 2015 From: ram at rachum.com (Ram Rachum) Date: Fri, 4 Dec 2015 21:00:47 +0200 Subject: [Python-ideas] find-like functionality in pathlib Message-ID: What do you think about implementing functionality similar to the `find` utility in Linux in the Pathlib module? I wanted this today, I had a script to write to archive a bunch of files from a folder, and I decided to try writing it in Python rather than in Bash. But I needed something stronger than `Path.glob` in order to select the files. I wanted a regular expression. (In this particular case, I wanted to get a list of all the files excluding the `.git` folder and all files inside of it. Thanks, Ram. 
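For context, a rough sketch of the kind of pruning walk being described,
written against today's pathlib (the find() helper and the hard-coded
'.git' rule are just this example's assumptions):

import re
from pathlib import Path

def find(root, pattern, prune=('.git',)):
    """Yield files under *root* whose names match *pattern*, never
    descending into directories named in *prune*."""
    regex = re.compile(pattern)
    for path in Path(root).iterdir():
        if path.is_dir():
            if path.name not in prune:
                yield from find(path, pattern, prune)
        elif regex.search(path.name):
            yield path

# e.g. every Python file in the tree, skipping .git entirely:
# for p in find('.', r'\.py$'):
#     print(p)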
-------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Fri Dec 4 14:04:46 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Fri, 04 Dec 2015 11:04:46 -0800 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: References: Message-ID: <5661E3CE.6000408@brenbarn.net> On 2015-12-04 06:14, Nick Coghlan wrote: > On 4 December 2015 at 23:55, Stephan Sahm wrote: >> Thanks for the constructive response! >> >> this is good to know, however it still forces me make such a dynamic >> placeholder explicit for every single operator. >> Still, I could put everything in a Mixin class so that it is in fact useful >> to clean up code. >> >> However, in the long run, such placeholders seem like an unnecessary >> overhead to me. > > The current behaviour is by design - special methods are looked up as > slots on the object's class, not as instance attributes. This allows > the interpreter to bypass several steps in the normal instance > attribute lookup process. It is worth noting that the behavior is even more magical than this. Even when looked up on the class, implicit special method lookup bypasses __getattr__ and __getattribute__ of the metaclass. So the special method lookup is not just an ordinary lookup that happens to start on the class instead of the instance; it is a fully magic lookup that does not engage the usual attribute-access-customization hooks at any level. This is (https://docs.python.org/3/reference/datamodel.html#special-method-lookup) documented but it is often surprising to new users. Personally I find it an annoying inconsistency and hope that at some time in the future Python will become fast enough overall that the extra overhead will be acceptable. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From lac at openend.se Fri Dec 4 14:04:45 2015 From: lac at openend.se (Laura Creighton) Date: Fri, 04 Dec 2015 20:04:45 +0100 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: Message-ID: <201512041904.tB4J4jgu006990@fido.openend.se> In a message of Fri, 04 Dec 2015 21:00:47 +0200, Ram Rachum writes: >What do you think about implementing functionality similar to the `find` >utility in Linux in the Pathlib module? I wanted this today, I had a script >to write to archive a bunch of files from a folder, and I decided to try >writing it in Python rather than in Bash. But I needed something stronger >than `Path.glob` in order to select the files. I wanted a regular >expression. (In this particular case, I wanted to get a list of all the >files excluding the `.git` folder and all files inside of it. fnmatch https://docs.python.org/3.6/library/fnmatch.html wasn't sufficient for your needs? Laura From ram at rachum.com Fri Dec 4 14:08:23 2015 From: ram at rachum.com (Ram Rachum) Date: Fri, 4 Dec 2015 21:08:23 +0200 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: <201512041904.tB4J4jgu006990@fido.openend.se> References: <201512041904.tB4J4jgu006990@fido.openend.se> Message-ID: 1. That would require going out of the pathlib framework. I can do that but it's more of a mess because then I need to convert the results back to Path objects. 2. Not sure how I would use fnmatch, because I wouldn't want to give it the list of all files recursively, since that would be a long list of files (lots of files in ".git" folder that I want to ignore.) 
I want it to first ignore everything in the ".git" folder completely without going over all the files, and then include all the other files recursively. On Fri, Dec 4, 2015 at 9:04 PM, Laura Creighton wrote: > In a message of Fri, 04 Dec 2015 21:00:47 +0200, Ram Rachum writes: > >What do you think about implementing functionality similar to the `find` > >utility in Linux in the Pathlib module? I wanted this today, I had a > script > >to write to archive a bunch of files from a folder, and I decided to try > >writing it in Python rather than in Bash. But I needed something stronger > >than `Path.glob` in order to select the files. I wanted a regular > >expression. (In this particular case, I wanted to get a list of all the > >files excluding the `.git` folder and all files inside of it. > > fnmatch https://docs.python.org/3.6/library/fnmatch.html > wasn't sufficient for your needs? > > Laura > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Dec 4 14:24:33 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 4 Dec 2015 19:24:33 +0000 (UTC) Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: References: Message-ID: <982471246.12589823.1449257073720.JavaMail.yahoo@mail.yahoo.com> This is explained in the documentation (https://docs.python.org/3/reference/datamodel.html#special-method-lookup for 3.x; for 2.x, it's largely the same except that old-style classes exist and have a different rule). There is also at least one StackOverflow answer that explains this (I know, because I wrote one...), but it's not even on the first page of a search, while the docs answer is the first result that shows up. Plus, the docs are written by the Python dev team in an open community process, while my SO answer is only as good as one user's understanding and writing ability; even if it weren't thrown in with a dozen answers that are wrong or just say "Python does this because it's dumb" or whatever, I'd go to the docs first. On Friday, December 4, 2015 6:22 AM, Stephan Sahm wrote: >the wrapt link is on my future todo-list, will need some more time than I currently have The wrapt link has the sample code to do exactly what you want. Pop it off your todo list. If you have more time in the future, you may want to look at some bridging libraries like PyObjC; once you can understand how to map between Python's and ObjC's different notions of method lookup, you'll understand this well enough to teach a class on it. :) But for now, wrapt is more than enough. >You both mentioned that the operators might be better tackleable via class-interfaces (__slots__ e.g.?) Not __slots__. The terminology gets a bit confusing here. What they're referring to is the C-API notion of slots. In particular, every builtin (or C-extension) is defined by a C struct which contains, among other things, members like "np_add", which is a pointer to an adding function. For example, the int type's np_add member points to a function that adds integers to other things. Slightly oversimplified, the np_add slot of every class implemented in Python just points to a function that does a stripped-down lookup for '__add__' instead of the usual __getattribute__ mechanism. (Which means you get no __getattr__, __slots__, @properties from the metaclass, etc.) So, why are these two things both called "slots"? 
Well, the point of __slots__ is to give you the space savings, and static collection of members that exist on every instance, that builtin types' instances get by using slots in C structs. So, the intuitive notion of C struct layout rather than dict lookup is central in both cases, just in different ways. (If you really want to, you could think about np_add as being a member of the __slots__ of a special metaclass that all builtin classes use. But that's probably more misleading than helpful, because CPython isn't actually implemented that way.) >My current usecase is to implement a Mixin abc class which shall combine all the available __iadd__ __ior__ and so on with a copy() function (this one is then the abstractmethod) to produce automatically the respective __add__, __or__ and so on OK, so if you can't handle __add__ dynamically at method lookup time, how do you deal with that? Of course you can just write lots of boilerplate, but presumably you're using Python instead of Java for a reason. You could also write Python code that generates the boilerplate (as a module, or as code to exec), but presumably you're using Python instead of Tcl for a reason. If you need methods that are dynamically generated at class creation time, just dynamically generate your methods at class creation time: def mathify(cls): for name in ('add', 'sub', 'mul', 'div'): ifunc = getattr(cls, '__i{}__'.format(name)) def wrapper(self, other): self_copy = self.copy() ifunc(self_copy, other) return self_copy setattr(cls, '__{}__'.format(name), wrapper) return cls @mathify class Quaternion: def __iadd__(self, other): self.x += other.x self.y += other.y self.z += other.z self.w += other.w return self # etc. Obviously in real life you'll want to fix up the name, docstring, etc. of wrapper (see functools.wraps for how to do this). And you'll want to use tested code rather than something I wrote in an email. You may also want to use a metaclass rather than a decorator (which can be easily hidden from the end-user, because they just need to inherit a mixin that uses that metaclass). Also, if you only ever need to do this dynamic stuff for exactly one class (your mixin), you may not want to use a decorator _or_ a metaclass; just munge the class up in module-level code right after the class definition. But you get the idea. If you really need the lookup to be dynamic (but I don't think you do here, in which case you'd just be adding extra complexity and inefficiency for no reason), you just need to write your own protocol for this and dynamically generate the methods that bounce to that protocol. For example: def mathify(cls): for name in ('add', 'sub', 'mul', 'div'): def wrapper(self, other): self._get_math_method('{}'.format(name))(other) setattr(cls, '__{}__'.format(name), wrapper) return cls @mathify class Quaternion: def _get_math_method(self, name): def wrapper(self, other): # see previous example return wrapper In fact, if you really wanted to, you could even abuse __getattr__. I think that would be more likely to confuse people than to help them, but... def mathify(cls): for name in ('add', 'sub', 'mul', 'div'): def wrapper(self, other): self.__getattr__('__{}__'.format(name))(other) setattr(cls, '__{}__'.format(name), wrapper) return cls @mathify class Quaternion: def __getattr__(self, name): if name.startswith('__') and name.endswith('__'): name = name.strip('_') def wrapper(self, other): # see previous example return wrapper Again, don't do this last one. 
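One caveat if anyone picks up these sketches as-is: each wrapper closes over the loop variables (name, ifunc), so by the time the wrappers are called they would all dispatch to whichever operator the loop handled last. A version of the first example with the usual default-argument pinning, assuming Python 3 operator names and the copy() method from the proposed mixin:

def mathify(cls):
    # Derive __add__, __sub__, ... from __iadd__, __isub__, ... plus copy().
    for name in ('add', 'sub', 'mul', 'truediv', 'and', 'or'):
        ifunc = getattr(cls, '__i{}__'.format(name), None)
        if ifunc is None:
            continue  # the class doesn't define this in-place operator
        def wrapper(self, other, _ifunc=ifunc):
            new = self.copy()          # leave the original operand untouched
            return _ifunc(new, other)  # in-place methods return the mutated object
        wrapper.__name__ = '__{}__'.format(name)
        setattr(cls, wrapper.__name__, wrapper)
    return cls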
I think the first example (or even the simpler version where you just dynamically generate your mixin's members with simple module-level code) is all you need. From Stephan.Sahm at gmx.de Fri Dec 4 14:40:12 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Fri, 4 Dec 2015 20:40:12 +0100 Subject: [Python-ideas] Missing Core Feature: + - * / | & do not call __getattr__ In-Reply-To: <982471246.12589823.1449257073720.JavaMail.yahoo@mail.yahoo.com> References: <982471246.12589823.1449257073720.JavaMail.yahoo@mail.yahoo.com> Message-ID: Thank you very much! lovely fast and lovely extensive answers. I fear, I myself cannot react as fast in completeness (i.e. reading those references and respond with more background knowledge) - this needs more time and the weekend is already quite busy in my case - but at least I want to seize the moment and thank all of you for creating this amazing community! On 4 December 2015 at 20:24, Andrew Barnert wrote: > This is explained in the documentation ( > https://docs.python.org/3/reference/datamodel.html#special-method-lookup > for 3.x; for 2.x, it's largely the same except that old-style classes exist > and have a different rule). > > There is also at least one StackOverflow answer that explains this (I > know, because I wrote one...), but it's not even on the first page of a > search, while the docs answer is the first result that shows up. Plus, the > docs are written by the Python dev team in an open community process, while > my SO answer is only as good as one user's understanding and writing > ability; even if it weren't thrown in with a dozen answers that are wrong > or just say "Python does this because it's dumb" or whatever, I'd go to the > docs first. > > > On Friday, December 4, 2015 6:22 AM, Stephan Sahm > wrote: > > > >the wrapt link is on my future todo-list, will need some more time than I > currently have > > The wrapt link has the sample code to do exactly what you want. Pop it off > your todo list. > > If you have more time in the future, you may want to look at some bridging > libraries like PyObjC; once you can understand how to map between Python's > and ObjC's different notions of method lookup, you'll understand this well > enough to teach a class on it. :) But for now, wrapt is more than enough. > > > >You both mentioned that the operators might be better tackleable via > class-interfaces (__slots__ e.g.?) > > Not __slots__. The terminology gets a bit confusing here. What they're > referring to is the C-API notion of slots. In particular, every builtin (or > C-extension) is defined by a C struct which contains, among other things, > members like "np_add", which is a pointer to an adding function. For > example, the int type's np_add member points to a function that adds > integers to other things. Slightly oversimplified, the np_add slot of every > class implemented in Python just points to a function that does a > stripped-down lookup for '__add__' instead of the usual __getattribute__ > mechanism. (Which means you get no __getattr__, __slots__, @properties from > the metaclass, etc.) > > So, why are these two things both called "slots"? Well, the point of > __slots__ is to give you the space savings, and static collection of > members that exist on every instance, that builtin types' instances get by > using slots in C structs. So, the intuitive notion of C struct layout > rather than dict lookup is central in both cases, just in different ways. 
> (If you really want to, you could think about np_add as being a member of > the __slots__ of a special metaclass that all builtin classes use. But > that's probably more misleading than helpful, because CPython isn't > actually implemented that way.) > > >My current usecase is to implement a Mixin abc class which shall combine > all the available __iadd__ __ior__ and so on with a copy() function (this > one is then the abstractmethod) to produce automatically the respective > __add__, __or__ and so on > > > OK, so if you can't handle __add__ dynamically at method lookup time, how > do you deal with that? > > Of course you can just write lots of boilerplate, but presumably you're > using Python instead of Java for a reason. You could also write Python code > that generates the boilerplate (as a module, or as code to exec), but > presumably you're using Python instead of Tcl for a reason. If you need > methods that are dynamically generated at class creation time, just > dynamically generate your methods at class creation time: > > def mathify(cls): > for name in ('add', 'sub', 'mul', 'div'): > ifunc = getattr(cls, '__i{}__'.format(name)) > def wrapper(self, other): > self_copy = self.copy() > ifunc(self_copy, other) > return self_copy > setattr(cls, '__{}__'.format(name), wrapper) > return cls > > @mathify > class Quaternion: > def __iadd__(self, other): > self.x += other.x > > self.y += other.y > > self.z += other.z > self.w += other.w > return self > # etc. > > Obviously in real life you'll want to fix up the name, docstring, etc. of > wrapper (see functools.wraps for how to do this). And you'll want to use > tested code rather than something I wrote in an email. You may also want to > use a metaclass rather than a decorator (which can be easily hidden from > the end-user, because they just need to inherit a mixin that uses that > metaclass). Also, if you only ever need to do this dynamic stuff for > exactly one class (your mixin), you may not want to use a decorator _or_ a > metaclass; just munge the class up in module-level code right after the > class definition. But you get the idea. > > If you really need the lookup to be dynamic (but I don't think you do > here, in which case you'd just be adding extra complexity and inefficiency > for no reason), you just need to write your own protocol for this and > dynamically generate the methods that bounce to that protocol. For example: > > def mathify(cls): > > for name in ('add', 'sub', 'mul', 'div'): > > def wrapper(self, other): > > self._get_math_method('{}'.format(name))(other) > setattr(cls, '__{}__'.format(name), wrapper) > return cls > > @mathify > class Quaternion: > def _get_math_method(self, name): > def wrapper(self, other): > # see previous example > return wrapper > > In fact, if you really wanted to, you could even abuse __getattr__. I > think that would be more likely to confuse people than to help them, but... > > def mathify(cls): > for name in ('add', 'sub', 'mul', 'div'): > def wrapper(self, other): > self.__getattr__('__{}__'.format(name))(other) > setattr(cls, '__{}__'.format(name), wrapper) > return cls > > @mathify > class Quaternion: > def __getattr__(self, name): > if name.startswith('__') and name.endswith('__'): > name = name.strip('_') > def wrapper(self, other): > # see previous example > return wrapper > > > Again, don't do this last one. I think the first example (or even the > simpler version where you just dynamically generate your mixin's members > with simple module-level code) is all you need. 
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Dec 4 15:02:46 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 4 Dec 2015 12:02:46 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <5661E3CE.6000408@brenbarn.net> References: <5661E3CE.6000408@brenbarn.net> Message-ID: <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> On Friday, December 4, 2015 11:05 AM, Brendan Barnwell wrote: > > This is > (https://docs.python.org/3/reference/datamodel.html#special-method-lookup) > documented but it is often surprising to new users. Personally I find > it an annoying inconsistency and hope that at some time in the future > Python will become fast enough overall that the extra overhead will be > acceptable. Unless it's changed since I last looked, Python doesn't actually define _which_ methods are looked up this way; it just allows implementations to do it for any subset of (presumably) the methods defined in the Data Model chapter. I don't think any of the major implementations besides CPython need this optimization at all (a JIT is generally going to do the lookup once and cache it, right?), but at least PyPy follows CPython exactly anyway, to avoid any possible compatibility problems. Meanwhile, there might not be one solution once and for all. For example, __add__ has to switch on the other object's type, unbox the values, malloc and box up a result, etc.; __bool__ often just has to check a struct member != 0 and return the constant True or False, so reducing the overhead to 10% on __add__ may still mean 105% on __bool__. From another angle, the __add__ lookup is part of a larger thing that includes __radd__ lookup and comparing types and so on, and that process might end up with some way to optimize lookup of the methods that wouldn't apply to unary operators or non-operator methods. So, maybe the way to get from here to there is to explicitly document the methods CPython treats as magic methods, and allow only allow other implementations to do the same for (a subset of) the same, and people can gradually tackle and remove parts of that list as people come up with ideas, and if the list eventually becomes empty (or gets to the point where it's 2 rare things that aren't important enough to keep extra complexity in the language), then the whole notion of special method lookup can finally go away. From python at lucidity.plus.com Fri Dec 4 16:16:35 2015 From: python at lucidity.plus.com (Erik) Date: Fri, 4 Dec 2015 21:16:35 +0000 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <201512041904.tB4J4jgu006990@fido.openend.se> Message-ID: <566202B3.80307@lucidity.plus.com> On 04/12/15 19:08, Ram Rachum wrote: > 2. Not sure how I would use fnmatch, because I wouldn't want to give it > the list of all files recursively, since that would be a long list of > files (lots of files in ".git" folder that I want to ignore.) I want it > to first ignore everything in the ".git" folder completely without going > over all the files, and then include all the other files recursively. Ram - os.walk() is probably the closest existing thing to what you want here (if it's called with topdown=True - the default - then you can remove the ".git" entry from the list of directories to prevent the walker from descending into that directory completely). I know: this is still stepping out of pathlib. 
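Spelled out, that pruning idiom looks something like the following; wrapping the results back into Path objects at the end is the conversion step Ram mentioned wanting to avoid:

import os
from pathlib import Path

wanted = []
for dirpath, dirnames, filenames in os.walk('.', topdown=True):
    if '.git' in dirnames:
        dirnames.remove('.git')  # os.walk() then skips the whole .git subtree
    wanted.extend(Path(dirpath) / name for name in filenames)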
However, it's probably what you want if you want to get something working soon ;) FWIW, this is not unrelated to my recent request for an os.walk() which returns the DirEntry objects - a thread that I am in the process of trying to summarise so that it doesn't drop off the RADAR (though it seems like this whole area is a can of worms ...). E From bunslow at gmail.com Fri Dec 4 16:44:18 2015 From: bunslow at gmail.com (Bill Winslow) Date: Fri, 4 Dec 2015 15:44:18 -0600 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function Message-ID: This is a question I posed to reddit, with no real resolution: https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/ The summary for people here is the following: Here's a pattern I'm using for my code: def deterministic_recursive_calculation(input, partial_state=None): condition = do_some_calculations(input) if condition: return deterministic_recursive_calculation(reduced_input, some_state) Basically, in calculating the results of the subproblem, the subproblem can be calculated quicker by including/sharing some partial results from the superproblem. (Calling the subproblem without the partial state still gives the same result, but takes substantially longer.) I want to memoize this function for obvious reasons, but I need the lru_cache to ignore the partial_state argument, for its value does not affect the output, only the computation expense. Is there any reasonable way to do this? Things such as memoizing a wrapper don't actually solve the problem. About the only way I can think of with current technology is either to have a hidden singleton class which maintains state, or a hidden global variable, which amount to the same thing: storing the partial state outside the function. But those strike me as unwieldy and unpythonic. What I'd really like to do is to have some way to tell functools.lru_cache to ignore some arguments of a function it's memoizing for the purposes of caching. One way would be to add an "arg_filter" argument, which for purposes of this example would be used like so: @lru_cache(arg_filter=lambda args, kwargs: (args[:1], {})) def deterministic_recursive_calculation(input, partial_state=None): condition = do_some_calculations(input) if condition: return deterministic_recursive_calculation(reduced_input, some_state) This effectively ignores all arguments except the first positional one for the purposes of caching. Such an option could be implemented as in the diff below (provided only for discussion purposes). So what I want to know is: 1) Am I sane? Is there no particularly good way currently to go about caching functions following the given pattern? 2) Assuming the answer to 1) is "Yes I am sane, and there's no good way currently", is my proposal a reasonable one? Mostly on philosophical grounds, not necessarily specifics of how it works. Thank you for your time and consideration. 
Bill ---------------------------------------------------------------------------------------------------------------------------------- https://hg.python.org/cpython/file/3.5/Lib/functools.py diff functools.py functools.py.orig 363c363 < def _make_key(args, kwds, typed, arg_filter, --- > def _make_key(args, kwds, typed, 377,378d376 < if arg_filter is not None: < args, kwds = arg_filter(args, kwds) 393c391 < def lru_cache(maxsize=128, typed=False, arg_filter=None): --- > def lru_cache(maxsize=128, typed=False): 403,406d400 < *arg_filter* is an optional function which filters user-speicified portions < of the arguments from the caching key (e.g. if an argument affects the < computation but not the final result). < 428,430d421 < if arg_filter is not None and not callable(arg_filter): < raise TypeError('Expected arg_filter to be a callable') < 432,433c423 < wrapper = _lru_cache_wrapper(user_function, maxsize, typed, arg_filter, < _CacheInfo) --- > wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo) 438c428 < def _lru_cache_wrapper(user_function, maxsize, typed, arg_filter, _CacheInfo): --- > def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo): 466c456 < key = make_key(args, kwds, typed, arg_filter) --- > key = make_key(args, kwds, typed) 481c471 < key = make_key(args, kwds, typed, arg_filter) --- > key = make_key(args, kwds, typed) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Dec 4 17:21:43 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 4 Dec 2015 14:21:43 -0800 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: <649E1165-B69D-4B2C-ACCA-BC20AD4812E2@yahoo.com> IIRC, lru_cache actually stores a make_key function that it calls on the args and kw. If this were exposed as part of the interface, you could just do this: @lru_cache() def spam(a, b): return a _make_key = spam.cache.make_key def make_key(args, kw): return _make_key(args[:1], kw) spam.cache.make_key = make_key Your idea of being able to pass a key-transformer or -maker into the constructor is pretty nice too, but it seems like the two should work together instead of being completely different things that end up having the same effect. One last thing: your transformer obviously won't handle the case where the user passes the args by keyword instead of by name (and same for mine). Maybe they'd never do that for the input argument, but for partial_state it might be more readable than passing it positionally. So maybe we want to make it easier to correctly specify the args that matter. The simplest interface would just be a list of parameter names (and lru_cache would then have to get_signature its argument to figure out the positional equivalents). On the other hand, maybe the implementation complexity isn't worth the interface simplicity. 
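For what it's worth, the effect can also be had today without reaching into lru_cache's internals, by caching an inner function that only sees the arguments that matter; a minimal, single-threaded sketch (the decorator name and the one-ignored-argument convention are made up for illustration):

from functools import lru_cache, wraps

def cache_ignoring_state(func):
    # Memoize func(input, partial_state=None) on ``input`` alone; the current
    # partial_state is stashed where the cached inner function can reach it,
    # so it is still available when there is a cache miss.
    state = {'value': None}

    @lru_cache(maxsize=None)
    def cached(key):
        return func(key, state['value'])

    @wraps(func)
    def wrapper(input, partial_state=None):
        state['value'] = partial_state
        return cached(input)

    return wrapper

@cache_ignoring_state
def deterministic_recursive_calculation(input, partial_state=None):
    ...

Recursive calls go back through the wrapper, so they share one cache, which is what the original pattern needs.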
Sent from my iPhone > On Dec 4, 2015, at 13:44, Bill Winslow wrote: > This is a question I posed to reddit, with no real resolution: https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/ > > The summary for people here is the following: > > Here's a pattern I'm using for my code: > > def deterministic_recursive_calculation(input, partial_state=None): > condition = do_some_calculations(input) > if condition: > return deterministic_recursive_calculation(reduced_input, some_state) > > Basically, in calculating the results of the subproblem, the subproblem can be calculated quicker by including/sharing some partial results from the superproblem. (Calling the subproblem without the partial state still gives the same result, but takes substantially longer.) > > I want to memoize this function for obvious reasons, but I need the lru_cache to ignore the partial_state argument, for its value does not affect the output, only the computation expense. > > Is there any reasonable way to do this? > > Things such as memoizing a wrapper don't actually solve the problem. About the only way I can think of with current technology is either to have a hidden singleton class which maintains state, or a hidden global variable, which amount to the same thing of storing the partial state outside the function. But those strike me as unwieldy and unpythonic. > > What I'd really like to do is to have some way to tell functools.lru_cache to ignore some arguments of a function it's memoizing for the purposes of caching. > > One way would be to add an "arg_filter" argument, which for purposes of this example would be used like so: > > @lru_cache(arg_filter=lambda args, kwargs: args[:1], {}) > def deterministic_recursive_calculation(input, partial_state=None): > condition = do_some_calculations(input) > if condition: > return deterministic_recursive_calculation(reduced_input, some_state) > > This effectively ignores all arguments except the first positional one for the purposes of caching. Such an option could be implemented as in the diff below (provided only for discussion purposes). > > So what I want to know is: > 1) Am I sane? Is there no particularly good way currently to go about caching functions following the given pattern? > 2) Assuming the answer to 1) is "Yes I am sane, and there's no good way currently", is my proposal a reasonable one? Mostly on philosophical grounds, not necessarily specifics of how it works. > > Thank you for your time and consideration. > > Bill > > ---------------------------------------------------------------------------------------------------------------------------------- > https://hg.python.org/cpython/file/3.5/Lib/functools.py > > diff functools.py functools.py.orig > 363c363 > < def _make_key(args, kwds, typed, arg_filter, > --- > > def _make_key(args, kwds, typed, > 377,378d376 > < if arg_filter is not None: > < args, kwds = arg_filter(args, kwds) > 393c391 > < def lru_cache(maxsize=128, typed=False, arg_filter=None): > --- > > def lru_cache(maxsize=128, typed=False): > 403,406d400 > < *arg_filter* is an optional function which filters user-speicified portions > < of the arguments from the caching key (e.g. if an argument affects the > < computation but not the final result). 
> < > 428,430d421 > < if arg_filter is not None and not callable(arg_filter): > < raise TypeError('Expected arg_filter to be a callable') > < > 432,433c423 > < wrapper = _lru_cache_wrapper(user_function, maxsize, typed, arg_filter, > < _CacheInfo) > --- > > wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo) > 438c428 > < def _lru_cache_wrapper(user_function, maxsize, typed, arg_filter, _CacheInfo): > --- > > def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo): > 466c456 > < key = make_key(args, kwds, typed, arg_filter) > --- > > key = make_key(args, kwds, typed) > 481c471 > < key = make_key(args, kwds, typed, arg_filter) > --- > > key = make_key(args, kwds, typed) > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Fri Dec 4 17:11:39 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 04 Dec 2015 16:11:39 -0600 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: http://thecodelesscode.com/topics/caching This smells like something that could cause trouble to me... If it's in the stdlib, it's easier for someone to use it and then be confused when something doesn't work correctly (maybe someone else will make partial_state affect the result, not realizing that that argument is ignored). Too buggy to me. On December 4, 2015 3:44:18 PM CST, Bill Winslow wrote: >This is a question I posed to reddit, with no real resolution: >https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/ > >The summary for people here is the following: > >Here's a pattern I'm using for my code: > >def deterministic_recursive_calculation(input, partial_state=None): > condition = do_some_calculations(input) > if condition: > return deterministic_recursive_calculation(reduced_input, >some_state) > >Basically, in calculating the results of the subproblem, the subproblem >can >be calculated quicker by including/sharing some partial results from >the >superproblem. (Calling the subproblem without the partial state still >gives >the same result, but takes substantially longer.) > >I want to memoize this function for obvious reasons, but I need the >lru_cache to ignore the partial_state argument, for its value does not >affect the output, only the computation expense. > >Is there any reasonable way to do this? > >Things such as memoizing a wrapper don't actually solve the problem. >About >the only way I can think of with current technology is either to have a >hidden singleton class which maintains state, or a hidden global >variable, >which amount to the same thing of storing the partial state outside the >function. But those strike me as unwieldy and unpythonic. > >What I'd really like to do is to have some way to tell >functools.lru_cache >to ignore some arguments of a function it's memoizing for the purposes >of >caching. 
> >One way would be to add an "arg_filter" argument, which for purposes of >this example would be used like so: > >@lru_cache(arg_filter=lambda args, kwargs: args[:1], {}) >def deterministic_recursive_calculation(input, partial_state=None): > condition = do_some_calculations(input) > if condition: > return deterministic_recursive_calculation(reduced_input, >some_state) > >This effectively ignores all arguments except the first positional one >for >the purposes of caching. Such an option could be implemented as in the >diff >below (provided only for discussion purposes). > >So what I want to know is: >1) Am I sane? Is there no particularly good way currently to go about >caching functions following the given pattern? >2) Assuming the answer to 1) is "Yes I am sane, and there's no good way >currently", is my proposal a reasonable one? Mostly on philosophical >grounds, not necessarily specifics of how it works. > >Thank you for your time and consideration. > >Bill > >---------------------------------------------------------------------------------------------------------------------------------- >https://hg.python.org/cpython/file/3.5/Lib/functools.py > >diff functools.py functools.py.orig >363c363 >< def _make_key(args, kwds, typed, arg_filter, >--- >> def _make_key(args, kwds, typed, >377,378d376 >< if arg_filter is not None: >< args, kwds = arg_filter(args, kwds) >393c391 >< def lru_cache(maxsize=128, typed=False, arg_filter=None): >--- >> def lru_cache(maxsize=128, typed=False): >403,406d400 >< *arg_filter* is an optional function which filters >user-speicified >portions >< of the arguments from the caching key (e.g. if an argument >affects the >< computation but not the final result). >< >428,430d421 >< if arg_filter is not None and not callable(arg_filter): >< raise TypeError('Expected arg_filter to be a callable') >< >432,433c423 >< wrapper = _lru_cache_wrapper(user_function, maxsize, typed, >arg_filter, >< _CacheInfo) >--- >> wrapper = _lru_cache_wrapper(user_function, maxsize, typed, >_CacheInfo) >438c428 >< def _lru_cache_wrapper(user_function, maxsize, typed, arg_filter, >_CacheInfo): >--- >> def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo): >466c456 >< key = make_key(args, kwds, typed, arg_filter) >--- >> key = make_key(args, kwds, typed) >481c471 >< key = make_key(args, kwds, typed, arg_filter) >--- >> key = make_key(args, kwds, typed) > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Fri Dec 4 21:38:14 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 5 Dec 2015 13:38:14 +1100 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> Message-ID: <20151205023814.GD3821@ando.pearwood.info> On Fri, Dec 04, 2015 at 12:02:46PM -0800, Andrew Barnert via Python-ideas wrote: > So, maybe the way to get from here to there is to explicitly document > the methods CPython treats as magic methods, That's probably a good idea regardless of anything else. > and allow only allow > other implementations to do the same for (a subset of) the same, and > people can gradually tackle and remove parts of that list as people > come up with ideas, and if the list eventually becomes empty (or gets > to the point where it's 2 rare things that aren't important enough to > keep extra complexity in the language), then the whole notion of > special method lookup can finally go away. Well, maybe. I haven't decided whether special method lookup is an annoying inconsistency or a feature. It seems to me that operators, at least, are somewhat different from method calls, and the special handling should be considered a deliberate design feature. My reasoning goes like this: Take a method call: obj.spam(arg) That's clearly called on the instance itself (obj), and syntactically we can see that the method call "belongs" to obj, and so semantically we should give obj a chance to use its own custom method before the method defined in the class. (At least in languages like Python where this feature is supported. I believe that the "Design Patterns" world actually gives this idiom a name, and in Java you have to jump through flaming hoops to make it work, but I can never remember what it is called.) So it makes sense that ordinary methods should be first looked up on the instance, then the class. But now take an operator call: obj + foo (where, for simplicity, both obj and foo are instances of the same class). Syntactically, that's *not* clearly a call on a specific instance. There's no good reason to think that the + operator "belongs" to obj, or foo, or either of them. We could nevertheless arbitrarily decide that the left hand instance obj gets first crack at this, and if it doesn't customise the call to __add__, the right hand instance foo is allowed to customise __radd__, and if that doesn't exist we go back to the class __add__ (and if that also fails to exist we try the class __radd__). That's okay, I guess, but the added complexity feels like it would be a bug magnet. So *maybe* we should decide that the right design is to say that operators don't belong to either instance, they belong to the class, and neither obj nor foo get the chance to override __[r]add__ on a per-instance basis. Similar reasoning applies to __getattr__ and other special methods. We can, I think, convince ourselves that they belong to the class, not the instance, in a way that ordinary methods are not. We define ordinary methods in the class definition for convenience and efficency, but syntactically and semantically obj.spam looks up spam in the instance obj first, so obj has a chance to override any spam defined on the class. That's a good thing. 
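Concretely, the asymmetry being discussed looks like this in current CPython:

class C:
    def spam(self):
        return 'class spam'
    def __add__(self, other):
        return 'class add'

obj = C()
obj.spam = lambda: 'instance spam'
obj.__add__ = lambda other: 'instance add'

print(obj.spam())  # 'instance spam' -- the per-instance attribute wins
print(obj + 1)     # 'class add'     -- the operator ignores the instance attribute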
But syntactically, special dunder methods like __getattr__ and __add__ don't *clearly* belong to the instance, so maybe we should decide that it is a positive feature that they can't be overridden by the instance. By analogy, if we use an unbound method instead: TheClass.spam(obj, arg) then I think we would all agree that the per-instance obj.spam() method should *not* be called. It seems to me that operator methods "feel like" they are closer to unbound method calls than bound method calls. And likewise for __getattr__ etc. As I said, I haven't decided myself which way I lean. If Python didn't already support per-instance method overriding, I would argue strongly that it should. But per-instance operator overloading? I'm undecided. -- Steve From abarnert at yahoo.com Sat Dec 5 00:21:26 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 4 Dec 2015 21:21:26 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <20151205023814.GD3821@ando.pearwood.info> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <20151205023814.GD3821@ando.pearwood.info> Message-ID: <398752D0-6214-4D44-BA91-2C40BAFB8C63@yahoo.com> On Dec 4, 2015, at 18:38, Steven D'Aprano wrote: > >> On Fri, Dec 04, 2015 at 12:02:46PM -0800, Andrew Barnert via Python-ideas wrote: >> >> So, maybe the way to get from here to there is to explicitly document >> the methods CPython treats as magic methods, > > That's probably a good idea regardless of anything else. > > >> and allow only allow >> other implementations to do the same for (a subset of) the same, and >> people can gradually tackle and remove parts of that list as people >> come up with ideas, and if the list eventually becomes empty (or gets >> to the point where it's 2 rare things that aren't important enough to >> keep extra complexity in the language), then the whole notion of >> special method lookup can finally go away. > > Well, maybe. I haven't decided whether special method lookup is an > annoying inconsistency or a feature. It seems to me that operators, at > least, are somewhat different from method calls, and the special > handling should be considered a deliberate design feature. My > reasoning goes like this: > > Take a method call: > > obj.spam(arg) > > That's clearly called on the instance itself (obj), and syntactically we > can see that the method call "belongs" to obj, and so semantically we > should give obj a chance to use its own custom method before the method > defined in the class. > > (At least in languages like Python where this feature is supported. I > believe that the "Design Patterns" world actually gives this idiom a > name, and in Java you have to jump through flaming hoops to make it > work, but I can never remember what it is called.) Of course in JavaScript, you have to jump through flaming hoops to make it _not_ work that way, and yet most JS developers jump through those hoops. But anyway, Python is obviously the best of both worlds here. > So it makes sense that ordinary methods should be first looked up on the > instance, then the class. But now take an operator call: > > obj + foo > > (where, for simplicity, both obj and foo are instances of the same > class). You've simplified away the main problem here. 
Binary dispatch is what makes designing operator overloading in OO languages hard (except OO languages that are multimethod-based from the start, like Dylan or Common Lisp, or that just ban operator overloading, like Java). Not recognizing how well Python handled this is missing something big. Anyway, saying that they belong to "the class" doesn't solve the problem; it leaves you with exactly the same question of who defines 2+3.5 that you started with. And now it seems a lot more artificial to say that it's type(3.5), not 3.5, that does so. Meanwhile, let's look at a concrete example. Why would I want an instance of some type that has arithmetic operations to override methods from the class in the first place? One obvious possibility for such special values is mathematical special values--e.g., datetimes at positive and negative infinity. They can return float infinities when you ask to convert them to seconds-since-epoch, format themselves as special strings, raise on isdst, etc. But what you really want them to do is be > any normal value. And to return themselves when any timedelta is added. And so on. The operators are exactly what you _do_ want to overload. And you definitely want both values to participate in that overloading, not just the types, because otherwise you have to write all the logic for how infinities work with each other twice. (If that doesn't sound bad, imagine you had 5 special values instead of 2. Or imagine that some of the special values were written by one person, and you came along and created new ones later.) And of course this would simplify the language, not add extra complexity--instead of two different rules for method lookup, we have one. So, I think Python would be better if it allowed instances to overload special methods. The obvious alternative today is to make each special value a singleton instance of a new subclass. That gets you most of the same benefits as instance overloads, but at the cost of writing more code and occasionally making debugging a bit clumsier. Exactly the same cost as using singleton subclasses instead of instance overloading for normal methods. If eliminating that cost for normal methods is good, why not for special methods? > Syntactically, that's *not* clearly a call on a specific > instance. There's no good reason to think that the + operator "belongs" > to obj, or foo, or either of them. No, it belongs to _both_ of them. That's the problem. I think your intuition here is that we can solve the problem by saying it belongs to some thing they both have in common, their type, which sounds good at first--but once you remember that everything that makes this hard arises from the fact that we can't assume they have the same type, and therefore they don't share anything, the solution no longer has anything going for it. > > We could nevertheless arbitrarily decide that the left hand instance obj > gets first crack at this, and if it doesn't customise the call to > __add__, the right hand instance foo is allowed to customise __radd__, > and if that doesn't exist we go back to the class __add__ (and if that > also fails to exist we try the class __radd__). > That's okay, I guess, > but the added complexity feels like it would be a bug magnet. Why make it so complex? The existing rule is: if issubclass(type(b), type(a)), look for type(b).__radd__ first; otherwise, look for type(a).__add__ first. If we allowed instance overloading, the rule would be: if isinstance(b, type(a)), look for b.__radd__ first; otherwise, look for a.__add__. 
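Written out as rough Python (ignoring the C-level slot machinery and a few corner cases), the existing rule is something like:

def add(a, b):
    # Sketch of how ``a + b`` chooses between __add__ and __radd__ today.
    ta, tb = type(a), type(b)
    radd_first = (tb is not ta and issubclass(tb, ta)
                  and getattr(tb, '__radd__', None) is not getattr(ta, '__radd__', None))
    if radd_first:
        result = tb.__radd__(b, a)
        if result is not NotImplemented:
            return result
    if hasattr(ta, '__add__'):
        result = ta.__add__(a, b)
        if result is not NotImplemented:
            return result
    if not radd_first and tb is not ta and hasattr(tb, '__radd__'):
        result = tb.__radd__(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError('unsupported operand type(s) for +: {!r} and {!r}'.format(
        ta.__name__, tb.__name__))

The instance-overloading variant would only change where the two lookups start (the objects instead of their types).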
No more complex or arbitrary than what we already have, and more flexible, and it doesn't require learning a second lookup rule that applies to some magic methods but not all. Also, notice that this allows instancehooks instead of just subclasshooks to get involved, which I think is also desirable. If I implement an instancehook (or a register-for-my-instancehook method), it's because I want those objects to act like my instances as far as pythonly possible. Each place where they don't, like __radd__ lookup, is a hole in the abstraction. > So *maybe* we should decide that the right design is to say that > operators don't belong to either instance, they belong to the class, and > neither obj nor foo get the chance to override __[r]add__ on a > per-instance basis. > > Similar reasoning applies to __getattr__ and other special methods. We > can, I think, convince ourselves that they belong to the class, not the > instance, in a way that ordinary methods are not. This seems even more dubious to me. The fact that instances' attributes are defined arbitrarily, rather than constrained by the class, is fundamental to Python. (Of course you can jump through hoops with slots, properties, or custom descriptors when you don't want that, but it's obviously the default behavior, and it's pretty uncommon that you don't want it.) So, why should dynamic attribute lookup be any different than normal dict attribute lookup? Of course very often that's more dynamic than you need, so you wouldn't write an instance __getattr__ that often--but that's exactly the same reason you don't write instance overloads that often in general (or __getattr__, for that matter); there's nothing new or different here. > We define ordinary methods in the class definition for convenience and > efficency, but syntactically and semantically obj.spam looks up spam in > the instance obj first, so obj has a chance to override any spam defined > on the class. That's a good thing. > > But syntactically, special dunder methods like __getattr__ and __add__ > don't *clearly* belong to the instance, so maybe we should decide that > it is a positive feature that they can't be overridden by the instance. > By analogy, if we use an unbound method instead: > > TheClass.spam(obj, arg) > > then I think we would all agree that the per-instance obj.spam() method > should *not* be called. It seems to me that operator methods "feel like" > they are closer to unbound method calls than bound method calls. And > likewise for __getattr__ etc I don't see the analogy at all. The bound __add__ call or __getattr__ call is a bound method, and we can always explicitly look up and pass around the unbound method if we want. (I've seen people pass around int.__neg__ the same way they'd pass around str.split; even though I think I'd use operator.neg instead for that, I don't see any problem with it.) The distinction is the same as with normal methods. Also, consider that passing around spam.eggs always does what you expected, whether it's a bound method or a per-instance override, but spam.__add__ doesn't do what you expect unless it's a bound method. That's an extra thing to figure out and/or memorize about the language, and I don't see any benefit. Well, except for the performance benefit to CPython, of course. 
In a new language, I don't think that would be enough to sway me no matter how big the benefit, but in a language where people have been relying on that optimization for years (whether we're talking about today, or when new-style classes were first added and Guido had to convince everyone they weren't going to slow down the language), it obviously is a major factor. So I don't think Python should change these methods until someone can work out a way to make the optimization unnecessary. One other possible consideration is that, while a tracing JIT should be able to optimize instance overloads just as easily as class overloads, an AOT (ahead-of-time) optimizer might not be able to, simply because types are so central to both the research and the industrial experience there (although JS might be changing that? I'm not up to date...). So, that might be another reason to treat these methods as special in a new language. But that doesn't really apply to Python. (Sure, someone _might_ write a traditional-style AOT optimizer for Python to give us C-competitive speeds some day, but at this point it doesn't seem like an important-enough prospect that we should design the language around it...) From brenbarn at brenbarn.net Sat Dec 5 00:50:58 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Fri, 04 Dec 2015 21:50:58 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <20151205023814.GD3821@ando.pearwood.info> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <20151205023814.GD3821@ando.pearwood.info> Message-ID: <56627B42.6000103@brenbarn.net> On 2015-12-04 18:38, Steven D'Aprano wrote: > Well, maybe. I haven't decided whether special method lookup is an > annoying inconsistency or a feature. It seems to me that operators, at > least, are somewhat different from method calls, and the special > handling should be considered a deliberate design feature. My > reasoning goes like this: What you're describing is orthogonal to the issue I was raising. I agree that it makes sense for "obj1 + obj2" to look up __add__ on the class. What I'm saying is that currently the lookup of __add__ on the class is *still* not an ordinary lookup. An ordinary lookup on the class would make use of __getattr__/__getattribute__ on the *metaclass* (just as an ordinary lookup on the instance, like foo.attr, would make use of __getattr__ on the class). But special method lookup doesn't do this; type(foo).__getattribute__("__add__") is never called. There is no way to customize special method lookup at all. You have to actually define the methods on the class. (You can do that in labor-saving ways like writing a metaclass or class decorator that inserts appropriate methods into the class dict, but you still have to statically set those methods; you can't intercept each special method lookup as it comes in and decide what to do with it, as you can with other attributes.) -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From brenbarn at brenbarn.net Sat Dec 5 00:55:43 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Fri, 04 Dec 2015 21:55:43 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> Message-ID: <56627C5F.40805@brenbarn.net> On 2015-12-04 12:02, Andrew Barnert wrote: > So, maybe the way to get from here to there is to explicitly document > the methods CPython treats as magic methods, and allow only allow > other implementations to do the same for (a subset of) the same, and > people can gradually tackle and remove parts of that list as people > come up with ideas, and if the list eventually becomes empty (or gets > to the point where it's 2 rare things that aren't important enough to > keep extra complexity in the language), then the whole notion of > special method lookup can finally go away. Presumably you mean "treats as magic methods for the purposes of this lookup short-circuiting"? To me the term "magic methods" just means the ones that are implicitly invoked by syntax (e.g., operator overloading), and what those are is already documented. If that's what you mean, I agree that would be a good starting point. I am always nervous when the docs say the kind of thing they say for this, which is "implicit special method lookup generally also bypasses the __getattribute__() method even of the object's metaclass". "Generally"? 
What does that mean? It seems to me that whether and when __getattribute__ is called should be a clearly-specified part of the semantics of magic methods, and it's somewhat disturbing that it doesn't seem to be. --- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From abarnert at yahoo.com Sat Dec 5 01:25:04 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 4 Dec 2015 22:25:04 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <56627C5F.40805@brenbarn.net> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <56627C5F.40805@brenbarn.net> Message-ID: <4DF91BF0-626A-4869-AA61-953D796599CB@yahoo.com> On Dec 4, 2015, at 21:55, Brendan Barnwell wrote: > >> On 2015-12-04 12:02, Andrew Barnert wrote: >> So, maybe the way to get from here to there is to explicitly document >> the methods CPython treats as magic methods, and allow only allow >> other implementations to do the same for (a subset of) the same, and >> people can gradually tackle and remove parts of that list as people >> come up with ideas, and if the list eventually becomes empty (or gets >> to the point where it's 2 rare things that aren't important enough to >> keep extra complexity in the language), then the whole notion of >> special method lookup can finally go away. > > Presumably you mean "treats as magic methods for the purposes of this lookup short-circuiting"? Yes; sorry it's not clear without the context, but that's what I meant. > If that's what you mean, I agree that would be a good starting point. I am always nervous when the docs say the kind of thing they say for this, which is "implicit special method lookup generally also bypasses the __getattribute__() method even of the object's metaclass". "Generally"? 
>> >> One way would be to add an "arg_filter" argument, which for purposes of >> this example would be used like so: >> >> @lru_cache(arg_filter=lambda args, kwargs: args[:1], {}) >> def deterministic_recursive_calculation(input, partial_state=None): >> condition = do_some_calculations(input) >> if condition: >> return deterministic_recursive_calculation(reduced_input, >> some_state) >> >> This effectively ignores all arguments except the first positional one >> for the purposes of caching. Such an option could be implemented as in the >> diff below (provided only for discussion purposes). >> >> So what I want to know is: >> 1) Am I sane? Is there no particularly good way currently to go about >> caching functions following the given pattern? >> 2) Assuming the answer to 1) is "Yes I am sane, and there's no good way >> currently", is my proposal a reasonable one? Mostly on philosophical >> grounds, not necessarily specifics of how it works. >> >> Thank you for your time and consideration. >> >> Bill >> >> >> ---------------------------------------------------------------------------------------------------------------------------------- >> https://hg.python.org/cpython/file/3.5/Lib/functools.py >> >> diff functools.py functools.py.orig >> 363c363 >> < def _make_key(args, kwds, typed, arg_filter, >> --- >> > def _make_key(args, kwds, typed, >> 377,378d376 >> < if arg_filter is not None: >> < args, kwds = arg_filter(args, kwds) >> 393c391 >> < def lru_cache(maxsize=128, typed=False, arg_filter=None): >> --- >> > def lru_cache(maxsize=128, typed=False): >> 403,406d400 >> < *arg_filter* is an optional function which filters user-speicified >> portions >> < of the arguments from the caching key (e.g. if an argument affects >> the >> < computation but not the final result). >> < >> 428,430d421 >> < if arg_filter is not None and not callable(arg_filter): >> < raise TypeError('Expected arg_filter to be a callable') >> < >> 432,433c423 >> < wrapper = _lru_cache_wrapper(user_function, maxsize, typed, >> arg_filter, >> < _CacheInfo) >> --- >> > wrapper = _lru_cache_wrapper(user_function, maxsize, typed, >> _CacheInfo) >> 438c428 >> < def _lru_cache_wrapper(user_function, maxsize, typed, arg_filter, >> _CacheInfo): >> --- >> > def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo): >> 466c456 >> < key = make_key(args, kwds, typed, arg_filter) >> --- >> > key = make_key(args, kwds, typed) >> 481c471 >> < key = make_key(args, kwds, typed, arg_filter) >> --- >> > key = make_key(args, kwds, typed) >> >> >> >> >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > -- > Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sat Dec 5 12:30:30 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 5 Dec 2015 09:30:30 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <4DF91BF0-626A-4869-AA61-953D796599CB@yahoo.com> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <56627C5F.40805@brenbarn.net> <4DF91BF0-626A-4869-AA61-953D796599CB@yahoo.com> Message-ID: I wish there was less armchair language design and post-rationalization on this topic, and more research into *why* we actually changed this. I recall we did not take the decision lightly. Surely correlating the source control logs and the mailing list archives can shed more insight on this topic than thought experiments about classes with five special values. :-) Patches to the reference manual are also very welcome, BTW. We're not proud of the imprecision of much of the language there -- I would love for some language lawyers to help tighten the language (not just on this topic but all over the reference manual). Finally. Regardless of what the reference manual may (not) say, other implementations do *not* have the freedom to change the operator lookup to look in the instance dict first. (However, giving the metaclass more control is not unreasonable. There also seems to be some interesting behavior here related to slots.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike at selik.org Sat Dec 5 16:29:31 2015 From: mike at selik.org (Michael Selik) Date: Sat, 05 Dec 2015 21:29:31 +0000 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: I saw that Bill previously considered and rejected the idea of a class on the Reddit thread, calling it unpythonic and musing about changing the keying function of lru_cache. It seems to me that while recursion is the hallmark of functional style, shared state is a major feature of object orientation. This example seems particularly appropriate for the blend of the two styles -- a recursive function housed in a class. To reduce the chances someone creates a second instance of the class, wasting the cached results of the first instance, one could wrap an instance in a plain module-level function. def recurse(*args, state=None): recurse.instance.state = state return recurse.instance.recurse(*args) recurse.instance = MethodsHaveCaches() Good, Bad, Ugly? On Sat, Dec 5, 2015 at 3:15 AM Michael Selik wrote: > The source (https://hg.python.org/cpython/file/3.5/Lib/functools.py) > warns on lines 411-414 that one should only use the public API for > thread-safety and forwards-compatibility with a possible C version. > > Why not encapsulate the recursive function and its persistent state in a > class? > > from functools import lru_cache > > class Fibonacci: > 'the classic recursion example' > > def __init__(self, shortcut=False): > self.shortcut = shortcut > > @lru_cache() > def nth(self, n): > if self.shortcut: return -1 > if n == 0: return 0 > if n == 1: return 1 > return self.nth(n-1) + self.nth(n-2) > > > On Fri, Dec 4, 2015 at 7:59 PM Ryan Gonzalez wrote: > >> http://thecodelesscode.com/topics/caching >> >> This smells like something that could cause trouble to me... 
If it's in >> the stdlib, it's easier for someone to use it and then be confused when >> something doesn't work correctly (maybe someone else will make >> partial_state affect the result, not realizing that that argument is >> ignored). >> >> Too buggy to me. >> >> On December 4, 2015 3:44:18 PM CST, Bill Winslow >> wrote: >> >>> This is a question I posed to reddit, with no real resolution: >>> https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/ >>> >>> The summary for people here is the following: >>> >>> Here's a pattern I'm using for my code: >>> >>> def deterministic_recursive_calculation(input, partial_state=None): >>> condition = do_some_calculations(input) >>> if condition: >>> return deterministic_recursive_calculation(reduced_input, >>> some_state) >>> >>> Basically, in calculating the results of the subproblem, the subproblem >>> can be calculated quicker by including/sharing some partial results from >>> the superproblem. (Calling the subproblem without the partial state still >>> gives the same result, but takes substantially longer.) >>> >>> I want to memoize this function for obvious reasons, but I need the >>> lru_cache to ignore the partial_state argument, for its value does not >>> affect the output, only the computation expense. >>> >>> Is there any reasonable way to do this? >>> >>> Things such as memoizing a wrapper don't actually solve the problem. >>> About the only way I can think of with current technology is either to have >>> a hidden singleton class which maintains state, or a hidden global >>> variable, which amount to the same thing of storing the partial state >>> outside the function. But those strike me as unwieldy and unpythonic. >>> >>> What I'd really like to do is to have some way to tell >>> functools.lru_cache to ignore some arguments of a function it's memoizing >>> for the purposes of caching. >>> >>> One way would be to add an "arg_filter" argument, which for purposes of >>> this example would be used like so: >>> >>> @lru_cache(arg_filter=lambda args, kwargs: args[:1], {}) >>> def deterministic_recursive_calculation(input, partial_state=None): >>> condition = do_some_calculations(input) >>> if condition: >>> return deterministic_recursive_calculation(reduced_input, >>> some_state) >>> >>> This effectively ignores all arguments except the first positional one >>> for the purposes of caching. Such an option could be implemented as in the >>> diff below (provided only for discussion purposes). >>> >>> So what I want to know is: >>> 1) Am I sane? Is there no particularly good way currently to go about >>> caching functions following the given pattern? >>> 2) Assuming the answer to 1) is "Yes I am sane, and there's no good way >>> currently", is my proposal a reasonable one? Mostly on philosophical >>> grounds, not necessarily specifics of how it works. >>> >>> Thank you for your time and consideration. 
>>> >>> Bill >>> >>> >>> ---------------------------------------------------------------------------------------------------------------------------------- >>> https://hg.python.org/cpython/file/3.5/Lib/functools.py >>> >>> diff functools.py functools.py.orig >>> 363c363 >>> < def _make_key(args, kwds, typed, arg_filter, >>> --- >>> > def _make_key(args, kwds, typed, >>> 377,378d376 >>> < if arg_filter is not None: >>> < args, kwds = arg_filter(args, kwds) >>> 393c391 >>> < def lru_cache(maxsize=128, typed=False, arg_filter=None): >>> --- >>> > def lru_cache(maxsize=128, typed=False): >>> 403,406d400 >>> < *arg_filter* is an optional function which filters user-speicified >>> portions >>> < of the arguments from the caching key (e.g. if an argument affects >>> the >>> < computation but not the final result). >>> < >>> 428,430d421 >>> < if arg_filter is not None and not callable(arg_filter): >>> < raise TypeError('Expected arg_filter to be a callable') >>> < >>> 432,433c423 >>> < wrapper = _lru_cache_wrapper(user_function, maxsize, typed, >>> arg_filter, >>> < _CacheInfo) >>> --- >>> > wrapper = _lru_cache_wrapper(user_function, maxsize, typed, >>> _CacheInfo) >>> 438c428 >>> < def _lru_cache_wrapper(user_function, maxsize, typed, arg_filter, >>> _CacheInfo): >>> --- >>> > def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo): >>> 466c456 >>> < key = make_key(args, kwds, typed, arg_filter) >>> --- >>> > key = make_key(args, kwds, typed) >>> 481c471 >>> < key = make_key(args, kwds, typed, arg_filter) >>> --- >>> > key = make_key(args, kwds, typed) >>> >>> >>> >>> >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >> -- >> Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Dec 5 17:56:42 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 6 Dec 2015 00:56:42 +0200 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On 04.12.15 23:44, Bill Winslow wrote: > This is a question I posed to reddit, with no real resolution: > https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/ > > The summary for people here is the following: > > Here's a pattern I'm using for my code: > > def deterministic_recursive_calculation(input, partial_state=None): > condition = do_some_calculations(input) > if condition: > return deterministic_recursive_calculation(reduced_input, > some_state) > > Basically, in calculating the results of the subproblem, the subproblem > can be calculated quicker by including/sharing some partial results from > the superproblem. (Calling the subproblem without the partial state > still gives the same result, but takes substantially longer.) > > I want to memoize this function for obvious reasons, but I need the > lru_cache to ignore the partial_state argument, for its value does not > affect the output, only the computation expense. > > Is there any reasonable way to do this? Memoize a closure. def deterministic_calculation(input): some_state = ... 
@lru_cache() def recursive_calculation(input): nonlocal some_state ... return recursive_calculation(reduced_input) return recursive_calculation(input) From abarnert at yahoo.com Sat Dec 5 20:56:52 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 5 Dec 2015 17:56:52 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <56627C5F.40805@brenbarn.net> <4DF91BF0-626A-4869-AA61-953D796599CB@yahoo.com> Message-ID: <2D8F2CE6-773B-4AEB-AD73-784841BE72B8@yahoo.com> On Dec 5, 2015, at 09:30, Guido van Rossum wrote: > > I wish there was less armchair language design and post-rationalization on this topic, and more research into *why* we actually changed this. I think the documentation already describes the reasoning just fine: 1. Performance. It "provides significant scope for speed optimizations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter)." 2. As a solution to the metaclass problem with the handful of methods implemented by both type and normal types--e.g., you don't want hash(int) to call int.__hash__, but type.__hash__. (This one obviously doesn't affect most operators; you can't generally add or xor types--but List[int] shows why that can't be ignored.) Anyway, I had already looked at the relevant PEPs, what's new, descrintro, and the old PLAN.txt. But, at your suggestion, I decided to take a look back at the commit comments and Python-dev archives as well, to see if there is some other reason for the change beyond those mentioned in the docs. In fact, there was almost no discussion at all. The descr branch was merged to trunk on 2 August 2001, nobody mentioned either problem, and then on 28 Aug you committed #19535 and #19538. The performance issue was raised in descrintro and the PEP, and a brief early thread, but it seems like everyone was happy to wait until you declared it done before griping about performance problems that may or may not result. Everyone was happy with 2.2a4 (which was after the change), so nobody had a reason to complain. As far as I can see, nobody noticed the hash(int) issue before you found and fixed it. It wasn't mentioned in PLAN.txt, any mailing list threads, etc. It apparently didn't even come up after the fact (except in a brief aside about PEP 266) until much later, when people got used to using the new features in practice and started asking why they can't do instance overrides for some special methods (at which point the answer was already in the docs). Unless I'm missing something (which is certainly possible), or you can remember your reasoning 14 years ago, I think what the docs say is pretty much all there is to say. In particular, do you actually have a reason that spam + eggs shouldn't look at spam.__dict__['__add__'] other than for performance and for consistency with the hash(int) solution? > I recall we did not take the decision lightly. Surely correlating the source control logs and the mailing list archives can shed more insight on this topic than thought experiments about classes with five special values. :-) > > Patches to the reference manual are also very welcome, BTW. 
We're not proud of the imprecision of much of the language there -- I would love for some language lawyers to help tighten the language (not just on this topic but all over the reference manual). > Finally. Regardless of what the reference manual may (not) say, other implementations do *not* have the freedom to change the operator lookup to look in the instance dict first. IIRC, Jython 2.2 looks at the instance first unless the type is a Java class, and nobody ever complained (in fact, I remember someone complaining about CPython breaking his code that worked in Jython...). But it would certainly be easier to tighten up the language in 3.3.9 if it applies to all Python implementations. > (However, giving the metaclass more control is not unreasonable. There also seems to be some interesting behavior here related to slots.) I'm not sure what you're suggesting here. That implementations can let a metaclass __getattribute__ hook special method lookup, but some implementations (including CPython 3.6) won't do so? From ncoghlan at gmail.com Sun Dec 6 07:58:51 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 6 Dec 2015 22:58:51 +1000 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <2D8F2CE6-773B-4AEB-AD73-784841BE72B8@yahoo.com> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <56627C5F.40805@brenbarn.net> <4DF91BF0-626A-4869-AA61-953D796599CB@yahoo.com> <2D8F2CE6-773B-4AEB-AD73-784841BE72B8@yahoo.com> Message-ID: On 6 December 2015 at 11:56, Andrew Barnert via Python-ideas wrote: > On Dec 5, 2015, at 09:30, Guido van Rossum wrote: >> (However, giving the metaclass more control is not unreasonable. There also seems to be some interesting behavior here related to slots.) > > I'm not sure what you're suggesting here. That implementations can let a metaclass __getattribute__ hook special method lookup, but some implementations (including CPython 3.6) won't do so? Ronald Oussoren has elaborated on that aspect of the problem in his __getdescriptor__ PEP: https://www.python.org/dev/peps/pep-0447/ The main reason it's separate from __getattribute__ is that this is necessary to avoid changing the semantics of super.__getattribute__, but it's also the case that things would otherwise get quite confusing with object.__getattribute__ and super.__getattribute__ potentially calling type.__getattribute__, which then has the potential for strange consequences when you consider that "type" is itself an instance of "type". My recollection of the previous round of discussions on that PEP is that we're actually pretty happy with the design - it's now dependent on someone with the roundtuits to update the reference implementation to match the current PEP text and the head of the current development branch. Regards, Nick. P.S. Now that I've realised PEP 447 is relevant to the current discussion, I've also realised that providing a base case in type.__getattribute__ that terminates the metaclass lookup chain for attributes is likely sufficient explanation for why this works the way it does now - if it didn't, type being its own metaclass would trigger a recursion error. PEP 447 extends the chain one step further (to the metaclass) by introducing a new magic method, but *that* method in turn still can't be altered the metaclass of the metaclass. 
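To make the behaviour this thread keeps circling around concrete, here is a tiny runnable illustration (CPython semantics, invented names):

class Spam:
    def __add__(self, other):
        return "class-level __add__"

spam = Spam()
spam.__add__ = lambda other: "instance-level __add__"  # per-instance override

print(spam + 1)         # "class-level __add__" - the implicit lookup skips the instance dict
print(spam.__add__(1))  # "instance-level __add__" - explicit attribute access does not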
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ram at rachum.com Sun Dec 6 16:48:41 2015 From: ram at rachum.com (Ram Rachum) Date: Sun, 6 Dec 2015 23:48:41 +0200 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers Message-ID: Hi guys, I'm using `contextlib.ExitStack` today, and pushing context managers into it. I find myself wanting to exit specific context managers that I've pushed into it, while still inside the `with` suite of the `ExitStack`. In other words, I want to exit one of the context managers but still keep the `ExitStack`, and all other context managers, acquired. This isn't currently possible, right? What do you think about implementing this? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Dec 6 19:32:25 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 7 Dec 2015 11:32:25 +1100 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers In-Reply-To: References: Message-ID: <20151207003224.GF3821@ando.pearwood.info> On Sun, Dec 06, 2015 at 11:48:41PM +0200, Ram Rachum wrote: > Hi guys, > > I'm using `contextlib.ExitStack` today, and pushing context managers into > it. I find myself wanting to exit specific context managers that I've > pushed into it, while still inside the `with` suite of the `ExitStack`. In > other words, I want to exit one of the context managers but still keep the > `ExitStack`, and all other context managers, acquired. This isn't currently > possible, right? What do you think about implementing this? I'm not entirely sure what you mean. Can you give an example? Some of the examples given here: https://docs.python.org/3/library/contextlib.html#examples-and-recipes sound like they might be related to what you are trying to do. Otherwise, if I have understood your requirement correctly, I think you might have a good case for *not* using ExitStack. If you have one context manager that you want to treat differently from the others, perhaps you should write it differently from the others: with ExitStack() as stack: files = [stack.enter_context(open(fname)) for fname in filenames] do_things() with open(special_file) as sf: do_other_things() do_more_things() -- Steven From mike at selik.org Sun Dec 6 19:41:02 2015 From: mike at selik.org (Michael Selik) Date: Mon, 07 Dec 2015 00:41:02 +0000 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On Sat, Dec 5, 2015 at 6:11 PM Serhiy Storchaka wrote: > On 04.12.15 23:44, Bill Winslow wrote: > > def deterministic_recursive_calculation(input, partial_state=None): > > condition = do_some_calculations(input) > > if condition: > > return deterministic_recursive_calculation(reduced_input, > > some_state) > > > > I want to memoize this function for obvious reasons, but I need the > > lru_cache to ignore the partial_state argument, for its value does not > > affect the output, only the computation expense. > > Memoize a closure. > > def deterministic_calculation(input): > some_state = ... > @lru_cache() > def recursive_calculation(input): > nonlocal some_state > ... > return recursive_calculation(reduced_input) > return recursive_calculation(input) > This would provide the dynamic programming aspect for the recursion, but not cache across successive calls to the outer function. It does look cleaner than the OO version. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sun Dec 6 22:40:00 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Dec 2015 13:40:00 +1000 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers In-Reply-To: <20151207003224.GF3821@ando.pearwood.info> References: <20151207003224.GF3821@ando.pearwood.info> Message-ID: On 7 December 2015 at 10:32, Steven D'Aprano wrote: > On Sun, Dec 06, 2015 at 11:48:41PM +0200, Ram Rachum wrote: >> Hi guys, >> >> I'm using `contextlib.ExitStack` today, and pushing context managers into >> it. I find myself wanting to exit specific context managers that I've >> pushed into it, while still inside the `with` suite of the `ExitStack`. In >> other words, I want to exit one of the context managers but still keep the >> `ExitStack`, and all other context managers, acquired. This isn't currently >> possible, right? What do you think about implementing this? > > I'm not entirely sure what you mean. Can you give an example? It's a concept I considered implementing, but decided to hold off on it because there are a few different design options here and I didn't have any use cases to guide the API design, nor the capacity to do usability testing to see if the additional API complexity actually made ExitStack easier to use overall. The first design option is the status quo: using multiple with statement blocks, potentially in conjunction with multiple ExitStack instances. The virtue of this approach is that it means that once a context manager is added to an ExitStack instance, that's it - its lifecycle is now coupled to that of the other context managers in the stack. You can postpone cleaning up all of them with "pop_all()" (transferring responsibility for the cleanup to a fresh ExitStack instance), but you can't selectively clean them up from the end. The usage guidelines are thus relatively simple: if you don't want to couple the lifecycles of two context managers together, then don't add them to the same ExitStack instance. However, there is also that "Stack" in the name, so it's natural for users to expect to be able to both push() *and* pop() individual context managers on the stack. The (on the surface) simplest design option to *add* to the API would be a single "pop()" operation that returned a new ExitStack instance (for return type consistency with pop_all()) that contained the last context manager pushed onto the stack. However, this API is problematic, as you've now decoupled the nesting of the context manager stack - the popped context manager may now survive beyond the closure of the original ExitStack. Since this kind a pop_all() inspired selective clean-up API would result in two different ExitStack instances anyway, the status quo seems cleaner to me than this option, as it's obvious from the start that there are seperate ExitStack instances with potentially distinct lifecycles. The next option would then be to offer a separate "exit_last_context()" method, that exited the last context manager pushed onto the stack. This would be a single-stepping counterpart to the existing close() method, that allowed you to dynamically descend and ascend the context management stack during normal operation, while still preserving the property that the entire stack will be cleaned up when encountering an exception. Assuming we went with that simpler in-place API, there would still be a number of further design questions to be answered: * Do we need to try to manage the reported exception context the way ExitStack.__exit__ does? 
* Does "exit_last_context()" need to accept exception details like __exit__ does? * Does "exit_last_context()" need to support the ability to suppress exceptions? * What, if anything, should the return value be? * What happens if there are no contexts on the stack to pop? * Should it become possible to query the number of registered callbacks? Taking them in order, as a matter of implementation feasibility, the answer to the first question likely needs to be "No". For consistency with calling __exit__ methods directly, the answers to the next three questions likely need to be "support the same thing __exit__ supports". For the second last question, while it's reasonable to call close(), pop_all() or __exit__() on an empty stack and have it silently do nothing, if someone has taken it on themselves to manage the stack depth manually, then it's likely more helpful to complain that the stack is empty than it is to silently do nothing. Since exit_last_context() may behave differently depending on whether or not there are items on the stack, and the number of items on the stack would be useful for diagnostic purposes, then it likely also makes sense to implement a __len__ method that delegated to "len(self._exit_callbacks)". That all suggests a possible implementation along the lines of the following: def exit_last_context(self, *exc_details): if not self._exit_callbacks: raise RuntimeError("Attempted to exit last context on empty ExitStack instance") cb = self._exit_callbacks.pop() return cb(*exc_details) def __len__(self): return len(self._exit_callbacks) What I don't know is whether or not that's actually a useful enough improvement over the status quo to justify the additional cognitive burden when learning the ExitStack API - the current API was designed around the ExitStack recipes in the documentation, which were all fairly common code patterns, but most cases where I might consider using an "exit_last_context()" method, I'd be more inclined to follow Steven's advice and use a separate context manager for the resource with an independent lifecycle. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Mon Dec 7 08:31:26 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 7 Dec 2015 15:31:26 +0200 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On 07.12.15 02:41, Michael Selik wrote: > On Sat, Dec 5, 2015 at 6:11 PM Serhiy Storchaka > > wrote: > > On 04.12.15 23:44, Bill Winslow wrote: > > def deterministic_recursive_calculation(input, partial_state=None): > > condition = do_some_calculations(input) > > if condition: > > return deterministic_recursive_calculation(reduced_input, > > some_state) > > > > I want to memoize this function for obvious reasons, but I need the > > lru_cache to ignore the partial_state argument, for its value > does not > > affect the output, only the computation expense. > > Memoize a closure. > > def deterministic_calculation(input): > some_state = ... > @lru_cache() > def recursive_calculation(input): > nonlocal some_state > ... > return recursive_calculation(reduced_input) > return recursive_calculation(input) > > > This would provide the dynamic programming aspect for the recursion, but > not cache across successive calls to the outer function. It does look > cleaner than the OO version. Decorate the outer function too. 
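A rough, runnable sketch of that combination, with a made-up computation standing in for the original problem:

from functools import lru_cache

@lru_cache()                    # outer cache: repeated top-level calls hit this
def calculation(number):
    shared_state = []           # stand-in for the expensive partial state

    @lru_cache()                # inner cache: memoizes the recursion within one call
    def recurse(n):
        shared_state.append(n)  # the state is used, but never becomes part of a cache key
        if n < 2:
            return n
        return recurse(n - 1) + recurse(n - 2)

    return recurse(number)

print(calculation(30))  # 832040, computed once
print(calculation(30))  # answered straight from the outer cache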
From guido at python.org Mon Dec 7 11:49:06 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Dec 2015 08:49:06 -0800 Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__) In-Reply-To: <2D8F2CE6-773B-4AEB-AD73-784841BE72B8@yahoo.com> References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <56627C5F.40805@brenbarn.net> <4DF91BF0-626A-4869-AA61-953D796599CB@yahoo.com> <2D8F2CE6-773B-4AEB-AD73-784841BE72B8@yahoo.com> Message-ID: I distinctly remember that by the time I committed that code I had seen numerous examples (perhaps in proprietary code at Zope) of attempts to use per-instance overloading going wrong. I believe what was happening was that users were trying to overload operators using instance attributes named after operators, and were trying to come up with clever tricks to ensure that the instance argument was passed to the method (there use case was not special cases like Infinities, they had code that needed access to both arguments). When I saw the resulting ugly code I figured there would be little loss if we just not allowed the practice. That's really all I can recall. I don't think there's a need to make another U-turn on this topic. Regarding the docs we may still have to add language to make it clear that the behavior you describe for Jython is wrong. (I also believe there was a request earlier in this thread to clarify exactly which methods are not looked up on the instance.) On Sat, Dec 5, 2015 at 5:56 PM, Andrew Barnert wrote: > On Dec 5, 2015, at 09:30, Guido van Rossum wrote: > > > > I wish there was less armchair language design and post-rationalization > on this topic, and more research into *why* we actually changed this. > > I think the documentation already describes the reasoning just fine: > > 1. Performance. It "provides significant scope for speed optimizations > within the interpreter, at the cost of some flexibility in the handling of > special methods (the special method must be set on the class object itself > in order to be consistently invoked by the interpreter)." > > 2. As a solution to the metaclass problem with the handful of methods > implemented by both type and normal types--e.g., you don't want hash(int) > to call int.__hash__, but type.__hash__. (This one obviously doesn't affect > most operators; you can't generally add or xor types--but List[int] shows > why that can't be ignored.) > > Anyway, I had already looked at the relevant PEPs, what's new, descrintro, > and the old PLAN.txt. But, at your suggestion, I decided to take a look > back at the commit comments and Python-dev archives as well, to see if > there is some other reason for the change beyond those mentioned in the > docs. > > In fact, there was almost no discussion at all. The descr branch was > merged to trunk on 2 August 2001, nobody mentioned either problem, and then > on 28 Aug you committed #19535 and #19538. > > The performance issue was raised in descrintro and the PEP, and a brief > early thread, but it seems like everyone was happy to wait until you > declared it done before griping about performance problems that may or may > not result. Everyone was happy with 2.2a4 (which was after the change), so > nobody had a reason to complain. > > As far as I can see, nobody noticed the hash(int) issue before you found > and fixed it. It wasn't mentioned in PLAN.txt, any mailing list threads, > etc. 
It apparently didn't even come up after the fact (except in a brief > aside about PEP 266) until much later, when people got used to using the > new features in practice and started asking why they can't do instance > overrides for some special methods (at which point the answer was already > in the docs). > > Unless I'm missing something (which is certainly possible), or you can > remember your reasoning 14 years ago, I think what the docs say is pretty > much all there is to say. > > In particular, do you actually have a reason that spam + eggs shouldn't > look at spam.__dict__['__add__'] other than for performance and for > consistency with the hash(int) solution? > > > I recall we did not take the decision lightly. Surely correlating the > source control logs and the mailing list archives can shed more insight on > this topic than thought experiments about classes with five special values. > :-) > > > > > Patches to the reference manual are also very welcome, BTW. We're not > proud of the imprecision of much of the language there -- I would love for > some language lawyers to help tighten the language (not just on this topic > but all over the reference manual). > > > > Finally. Regardless of what the reference manual may (not) say, other > implementations do *not* have the freedom to change the operator lookup to > look in the instance dict first. > > IIRC, Jython 2.2 looks at the instance first unless the type is a Java > class, and nobody ever complained (in fact, I remember someone complaining > about CPython breaking his code that worked in Jython...). > > But it would certainly be easier to tighten up the language in 3.3.9 if it > applies to all Python implementations. > > > (However, giving the metaclass more control is not unreasonable. There > also seems to be some interesting behavior here related to slots.) > > > I'm not sure what you're suggesting here. That implementations can let a > metaclass __getattribute__ hook special method lookup, but some > implementations (including CPython 3.6) won't do so? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike at selik.org Mon Dec 7 12:48:10 2015 From: mike at selik.org (Michael Selik) Date: Mon, 07 Dec 2015 17:48:10 +0000 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On Mon, Dec 7, 2015 at 8:31 AM Serhiy Storchaka wrote: > On 07.12.15 02:41, Michael Selik wrote: > > On Sat, Dec 5, 2015 at 6:11 PM Serhiy Storchaka > > > > wrote: > > > > On 04.12.15 23:44, Bill Winslow wrote: > > > def deterministic_recursive_calculation(input, > partial_state=None): > > > condition = do_some_calculations(input) > > > if condition: > > > return deterministic_recursive_calculation(reduced_input, > > > some_state) > > > > > > I want to memoize this function for obvious reasons, but I need > the > > > lru_cache to ignore the partial_state argument, for its value > > does not > > > affect the output, only the computation expense. > > > > Memoize a closure. > > > > def deterministic_calculation(input): > > some_state = ... > > @lru_cache() > > def recursive_calculation(input): > > nonlocal some_state > > ... > > return recursive_calculation(reduced_input) > > return recursive_calculation(input) > > > > > > This would provide the dynamic programming aspect for the recursion, but > > not cache across successive calls to the outer function. It does look > > cleaner than the OO version. 
> > Decorate the outer function too. > Wouldn't that put us back where we started -- the cache is inappropriately keying on the state/shortcut? @lru_cache() def recursive(*args, shortcut=False): @lru_cache() def inner(*args): if shortcut: return 'took a shortcut' print('infinite recursion is fun!') return inner(*args) return inner(n) Creating a separate wrapper would do it, so that the recursive function isn't re-created each time. @lru_cache() def recursive(*args): if recursive.shortcut: return 'took a shortcut' print('infinite recursion is fun!') return recursive(*args) recursive.shortcut = False def wrapper(*args, shortcut=False): recursive.shortcut = shortcut return recursive(*args) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Mon Dec 7 15:37:04 2015 From: ram at rachum.com (Ram Rachum) Date: Mon, 7 Dec 2015 22:37:04 +0200 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers In-Reply-To: References: <20151207003224.GF3821@ando.pearwood.info> Message-ID: I would actually want a method that exits not just the last context manager, but any context manager in the stack according to my choosing. Maybe it clashes with the fact that you're using `deque`, but I'm not sure that you have a compelling reason to use `deque`. If you're asking about my use case: It's pretty boring. I have a sysadmin script with a long function that does remote actions on a few servers. I wrapped it all in an `ExitStack` since I use file-based locks and I want to ensure they get released eventually. Now, at some point I want to release the file-based lock manually, but I can't use a with statement, because there's a condition around the place where I acquire the lock. It's something like this: if condition: exit_stack.enter_context(get_lock_1()) else: exit_stack.enter_context(get_lock_2()) So ideally I would want a method that takes a context manager and just exits it. Maybe even add an optional argument `context_manager` to the existing `close` method. Personally I don't care about exception-handling in this case, and while I think it would be nice to include exception-handling, I see that the existing close method doesn't provide exception-handling either, so I wouldn't feel bad about it. So maybe something like this: def close(self, context_manager=None): """Immediately unwind the context stack""" if context_manager is None: self.__exit__(None, None, None) else: for _exit_wrapper in reversed(self._exit_callbacks): if _exit_wrapper.__self__ is context_manager: _exit_wrapper(None, None, None) self._exit_callbacks.remove(_exit_wrapper) Maybe also support accepting a tuple of context managers. On Mon, Dec 7, 2015 at 5:40 AM, Nick Coghlan wrote: > On 7 December 2015 at 10:32, Steven D'Aprano wrote: > > On Sun, Dec 06, 2015 at 11:48:41PM +0200, Ram Rachum wrote: > >> Hi guys, > >> > >> I'm using `contextlib.ExitStack` today, and pushing context managers > into > >> it. I find myself wanting to exit specific context managers that I've > >> pushed into it, while still inside the `with` suite of the `ExitStack`. > In > >> other words, I want to exit one of the context managers but still keep > the > >> `ExitStack`, and all other context managers, acquired. This isn't > currently > >> possible, right? What do you think about implementing this? > > > > I'm not entirely sure what you mean. Can you give an example? 
> > It's a concept I considered implementing, but decided to hold off on > it because there are a few different design options here and I didn't > have any use cases to guide the API design, nor the capacity to do > usability testing to see if the additional API complexity actually > made ExitStack easier to use overall. > > The first design option is the status quo: using multiple with > statement blocks, potentially in conjunction with multiple ExitStack > instances. The virtue of this approach is that it means that once a > context manager is added to an ExitStack instance, that's it - its > lifecycle is now coupled to that of the other context managers in the > stack. You can postpone cleaning up all of them with "pop_all()" > (transferring responsibility for the cleanup to a fresh ExitStack > instance), but you can't selectively clean them up from the end. The > usage guidelines are thus relatively simple: if you don't want to > couple the lifecycles of two context managers together, then don't add > them to the same ExitStack instance. > > However, there is also that "Stack" in the name, so it's natural for > users to expect to be able to both push() *and* pop() individual > context managers on the stack. > > The (on the surface) simplest design option to *add* to the API would > be a single "pop()" operation that returned a new ExitStack instance > (for return type consistency with pop_all()) that contained the last > context manager pushed onto the stack. However, this API is > problematic, as you've now decoupled the nesting of the context > manager stack - the popped context manager may now survive beyond the > closure of the original ExitStack. Since this kind a pop_all() > inspired selective clean-up API would result in two different > ExitStack instances anyway, the status quo seems cleaner to me than > this option, as it's obvious from the start that there are seperate > ExitStack instances with potentially distinct lifecycles. > > The next option would then be to offer a separate > "exit_last_context()" method, that exited the last context manager > pushed onto the stack. This would be a single-stepping counterpart to > the existing close() method, that allowed you to dynamically descend > and ascend the context management stack during normal operation, while > still preserving the property that the entire stack will be cleaned up > when encountering an exception. > > Assuming we went with that simpler in-place API, there would still be > a number of further design questions to be answered: > > * Do we need to try to manage the reported exception context the way > ExitStack.__exit__ does? > * Does "exit_last_context()" need to accept exception details like > __exit__ does? > * Does "exit_last_context()" need to support the ability to suppress > exceptions? > * What, if anything, should the return value be? > * What happens if there are no contexts on the stack to pop? > * Should it become possible to query the number of registered callbacks? > > Taking them in order, as a matter of implementation feasibility, the > answer to the first question likely needs to be "No". For consistency > with calling __exit__ methods directly, the answers to the next three > questions likely need to be "support the same thing __exit__ > supports". 
> > For the second last question, while it's reasonable to call close(), > pop_all() or __exit__() on an empty stack and have it silently do > nothing, if someone has taken it on themselves to manage the stack > depth manually, then it's likely more helpful to complain that the > stack is empty than it is to silently do nothing. Since > exit_last_context() may behave differently depending on whether or not > there are items on the stack, and the number of items on the stack > would be useful for diagnostic purposes, then it likely also makes > sense to implement a __len__ method that delegated to > "len(self._exit_callbacks)". > > That all suggests a possible implementation along the lines of the > following: > > def exit_last_context(self, *exc_details): > if not self._exit_callbacks: > raise RuntimeError("Attempted to exit last context on > empty ExitStack instance") > cb = self._exit_callbacks.pop() > return cb(*exc_details) > > def __len__(self): > return len(self._exit_callbacks) > > What I don't know is whether or not that's actually a useful enough > improvement over the status quo to justify the additional cognitive > burden when learning the ExitStack API - the current API was designed > around the ExitStack recipes in the documentation, which were all > fairly common code patterns, but most cases where I might consider > using an "exit_last_context()" method, I'd be more inclined to follow > Steven's advice and use a separate context manager for the resource > with an independent lifecycle. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Dec 7 16:06:59 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Dec 2015 08:06:59 +1100 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On Sat, Dec 5, 2015 at 8:44 AM, Bill Winslow wrote: > def deterministic_recursive_calculation(input, partial_state=None): > condition = do_some_calculations(input) > if condition: > return deterministic_recursive_calculation(reduced_input, > some_state) > > Basically, in calculating the results of the subproblem, the subproblem can > be calculated quicker by including/sharing some partial results from the > superproblem. (Calling the subproblem without the partial state still gives > the same result, but takes substantially longer.) > > I want to memoize this function for obvious reasons, but I need the > lru_cache to ignore the partial_state argument, for its value does not > affect the output, only the computation expense. Coming right back to the beginning here... Is there a reason the some_state argument is being passed around, instead of being global? If the result of the function depends only on reduced_input and not on some_state, then wouldn't it be possible to make use of state from an entirely separate call, not just the current one? (And if you _can't_ make use of some_state from an unrelated call, then it _does_ affect the call, and it needs to be incorporated into the cache key.) ISTM this should be exactly as global as the cache itself. 
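Spelled out, that line of thinking looks roughly like this (all names invented, just to show the shape):

from functools import lru_cache

_partial_state = None        # hypothetical module-level state, exactly as global as the cache

def set_partial_state(state):
    global _partial_state
    _partial_state = state

@lru_cache()
def calculation(n):          # the cache key is just n; the state never enters it
    if _partial_state is not None:
        pass                 # a real implementation would consult the shared state here
    if n < 2:
        return n
    return calculation(n - 1) + calculation(n - 2)

print(calculation(20))       # 6765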
ChrisA From mike at selik.org Mon Dec 7 22:52:42 2015 From: mike at selik.org (Michael Selik) Date: Tue, 08 Dec 2015 03:52:42 +0000 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers In-Reply-To: References: <20151207003224.GF3821@ando.pearwood.info> Message-ID: On Mon, Dec 7, 2015 at 3:37 PM Ram Rachum wrote: > I can't use a with statement, because there's a condition around the place > where I acquire the lock. It's something like this: > > if condition: > > exit_stack.enter_context(get_lock_1()) > > else: > > exit_stack.enter_context(get_lock_2()) > > You can't put the condition in a context manager's __init__ or __enter__? class CM: def __init__(self, condition): self.lock = lock1 if condition else lock2 def __enter__(self): self.lock.acquire() delf __exit__(self, *info): self.lock.release() -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Dec 8 00:56:42 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 8 Dec 2015 15:56:42 +1000 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers In-Reply-To: References: <20151207003224.GF3821@ando.pearwood.info> Message-ID: On 8 December 2015 at 06:37, Ram Rachum wrote: > I would actually want a method that exits not just the last context manager, > but any context manager in the stack according to my choosing. Maybe it > clashes with the fact that you're using `deque`, but I'm not sure that you > have a compelling reason to use `deque`. deque itself is an implementation detail, but from a design perspective, ExitStack is intended to recreate the semantics of lexical context management using with statements, without having to actually use that layout in your code. In other words, if the exit semantics can't be expressed in terms of with statements, then I'm not interested in allowing it in ExitStack specifically (see the ExitPool discussion below for more on that qualifier). That means the semantic structures I'm open to ExitStack supporting are: * a nested context stack (which it already does) * a tree structure (which exit_last_context() would allow) The first structure corresponds to passing multiple contexts to the with statement: with cm1(), cm2(), cm3(): ... Which in turn corresponds to nested with statements: with cm1(): with cm2(): with cm3(): ... The ExitStack equivalent is: with ExitStack() as stack: stack.enter_context(cm1()) stack.enter_context(cm2()) stack.enter_context(cm3()) ... Adding an exit_last_context() method would make it possible to replicate the following kind of structure: with cm1(): with cm2(): ... with cm3(): ... Given exit_last_context(), replicating that dynamically would look like: with ExitStack() as stack: stack.enter_context(cm1()) stack.enter_context(cm2()) ... stack.exit_last_context() stack.enter_context(cm3()) ... I'm not aware of any specific use cases for the latter behaviour though, which is why that feature doesn't exist yet. > If you're asking about my use case: It's pretty boring. I have a sysadmin > script with a long function that does remote actions on a few servers. I > wrapped it all in an `ExitStack` since I use file-based locks and I want to > ensure they get released eventually. Now, at some point I want to release > the file-based lock manually, but I can't use a with statement, because > there's a condition around the place where I acquire the lock. 
It's > something like this: > > if condition: > > exit_stack.enter_context(get_lock_1()) > > else: > > exit_stack.enter_context(get_lock_2()) > > So ideally I would want a method that takes a context manager and just exits > it. Maybe even add an optional argument `context_manager` to the existing > `close` method. Personally I don't care about exception-handling in this > case, and while I think it would be nice to include exception-handling, I > see that the existing close method doesn't provide exception-handling > either, so I wouldn't feel bad about it. OK, thanks for the clarification. The additional details show that this is a slightly different use case from those that ExitStack is designed to support, as ExitStack aims to precisely replicate the semantics of nested with statements (as described above). That includes both the order in which the __exit__ methods get called, and which context managers can suppress exceptions from which other context managers. That's not the only way to manage cleanup logic though, and one relevant alternative is the way the atexit module works: https://docs.python.org/3/library/atexit.html In that model, the cleanup handlers are all considered peer operations, and while they're defined to be run in last-in-first-out order, the assumption is that there aren't any direct dependencies between them the way there can be with lexically nested context managers. That then makes it reasonable to offer the ability to unregister arbitrary callbacks without worrying about the potential impact on other callbacks that were registered later. While I'm not open to adding atexit style logic to ExitStack, I'm *am* amenable to the idea of adding a separate ExitPool context manager that doesn't try to replicate with statement semantics the way ExitStack does, and instead offers atexit style logic where each exit function receives the original exception state passed in to ExitPool.__exit__. One key difference from atexit would be that if any of the exit methods raised an exception, then I'd have ExitPool raise a subclass of RuntimeError (PoolExitError perhaps?) containing a list of all of the cleanup operations that failed. The API for that would probably look something like: class ExitPool: def enter_context(cm): # Call cm.__enter__ and register cm def register(cm): # Register cm.__exit__ to be called on pool exit def callback(func, *args, **kwds): # Register func to be called on pool exit def unregister(cm_or_func): # Unregister a registered CM or callback function def unregister_all(): # Empty the pool without calling anything def close(): # Empty the pool, calling all registered callbacks in LIFO order (via self.__exit__) Internally, the main data structure would be an OrderedDict instance mapping from cm's or functions to their registered callbacks (for ease of unregistration). At this point, if you're open to filing one, we should probably move further discussion over to a new RFE on the contextlib2 issue tracker: https://bitbucket.org/ncoghlan/contextlib2/ That's still pending a rebase on Python 3.5 standard library version of contextlib though... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ram at rachum.com Tue Dec 8 12:21:34 2015 From: ram at rachum.com (Ram Rachum) Date: Tue, 8 Dec 2015 19:21:34 +0200 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers In-Reply-To: References: <20151207003224.GF3821@ando.pearwood.info> Message-ID: Thanks for the detailed reply Nick. 
I would like an `ExitPool` class but I'm not invested enough in this feature to champion it, so I'll let it go at this point. If you'll open the ticket and want any feedback from me about the API I'll be happy to give it, just email me. I think I can solve my personal problem with a context manager wrapper that doesn't complain when you try to close the context manager twice. Thanks, Ram. On Tue, Dec 8, 2015 at 7:56 AM, Nick Coghlan wrote: > On 8 December 2015 at 06:37, Ram Rachum wrote: > > I would actually want a method that exits not just the last context > manager, > > but any context manager in the stack according to my choosing. Maybe it > > clashes with the fact that you're using `deque`, but I'm not sure that > you > > have a compelling reason to use `deque`. > > deque itself is an implementation detail, but from a design > perspective, ExitStack is intended to recreate the semantics of > lexical context management using with statements, without having to > actually use that layout in your code. In other words, if the exit > semantics can't be expressed in terms of with statements, then I'm not > interested in allowing it in ExitStack specifically (see the ExitPool > discussion below for more on that qualifier). > > That means the semantic structures I'm open to ExitStack supporting are: > > * a nested context stack (which it already does) > * a tree structure (which exit_last_context() would allow) > > The first structure corresponds to passing multiple contexts to the > with statement: > > with cm1(), cm2(), cm3(): > ... > > Which in turn corresponds to nested with statements: > > with cm1(): > with cm2(): > with cm3(): > ... > > The ExitStack equivalent is: > > with ExitStack() as stack: > stack.enter_context(cm1()) > stack.enter_context(cm2()) > stack.enter_context(cm3()) > ... > > Adding an exit_last_context() method would make it possible to > replicate the following kind of structure: > > with cm1(): > with cm2(): > ... > with cm3(): > ... > > Given exit_last_context(), replicating that dynamically would look like: > > with ExitStack() as stack: > stack.enter_context(cm1()) > stack.enter_context(cm2()) > ... > stack.exit_last_context() > stack.enter_context(cm3()) > ... > > I'm not aware of any specific use cases for the latter behaviour > though, which is why that feature doesn't exist yet. > > > If you're asking about my use case: It's pretty boring. I have a sysadmin > > script with a long function that does remote actions on a few servers. I > > wrapped it all in an `ExitStack` since I use file-based locks and I want > to > > ensure they get released eventually. Now, at some point I want to release > > the file-based lock manually, but I can't use a with statement, because > > there's a condition around the place where I acquire the lock. It's > > something like this: > > > > if condition: > > > > exit_stack.enter_context(get_lock_1()) > > > > else: > > > > exit_stack.enter_context(get_lock_2()) > > > > So ideally I would want a method that takes a context manager and just > exits > > it. Maybe even add an optional argument `context_manager` to the existing > > `close` method. Personally I don't care about exception-handling in this > > case, and while I think it would be nice to include exception-handling, I > > see that the existing close method doesn't provide exception-handling > > either, so I wouldn't feel bad about it. > > OK, thanks for the clarification. 
The additional details show that > this is a slightly different use case from those that ExitStack is > designed to support, as ExitStack aims to precisely replicate the > semantics of nested with statements (as described above). That > includes both the order in which the __exit__ methods get called, and > which context managers can suppress exceptions from which other > context managers. > > That's not the only way to manage cleanup logic though, and one > relevant alternative is the way the atexit module works: > https://docs.python.org/3/library/atexit.html > > In that model, the cleanup handlers are all considered peer > operations, and while they're defined to be run in last-in-first-out > order, the assumption is that there aren't any direct dependencies > between them the way there can be with lexically nested context > managers. That then makes it reasonable to offer the ability to > unregister arbitrary callbacks without worrying about the potential > impact on other callbacks that were registered later. > > While I'm not open to adding atexit style logic to ExitStack, I'm *am* > amenable to the idea of adding a separate ExitPool context manager > that doesn't try to replicate with statement semantics the way > ExitStack does, and instead offers atexit style logic where each exit > function receives the original exception state passed in to > ExitPool.__exit__. One key difference from atexit would be that if any > of the exit methods raised an exception, then I'd have ExitPool raise > a subclass of RuntimeError (PoolExitError perhaps?) containing a list > of all of the cleanup operations that failed. > > The API for that would probably look something like: > > class ExitPool: > def enter_context(cm): > # Call cm.__enter__ and register cm > def register(cm): > # Register cm.__exit__ to be called on pool exit > def callback(func, *args, **kwds): > # Register func to be called on pool exit > def unregister(cm_or_func): > # Unregister a registered CM or callback function > def unregister_all(): > # Empty the pool without calling anything > def close(): > # Empty the pool, calling all registered callbacks in LIFO > order (via self.__exit__) > > Internally, the main data structure would be an OrderedDict instance > mapping from cm's or functions to their registered callbacks (for ease > of unregistration). > > At this point, if you're open to filing one, we should probably move > further discussion over to a new RFE on the contextlib2 issue tracker: > https://bitbucket.org/ncoghlan/contextlib2/ > > That's still pending a rebase on Python 3.5 standard library version > of contextlib though... > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Dec 9 01:59:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Dec 2015 16:59:09 +1000 Subject: [Python-ideas] ExitStack: Allow exiting individual context managers In-Reply-To: References: <20151207003224.GF3821@ando.pearwood.info> Message-ID: On 9 December 2015 at 03:21, Ram Rachum wrote: > Thanks for the detailed reply Nick. > > I would like an `ExitPool` class but I'm not invested enough in this feature > to champion it, so I'll let it go at this point. If you'll open the ticket > and want any feedback from me about the API I'll be happy to give it, just > email me. 
Unfortunately, I haven't even found the time to bring contextlib2 up to date with the current standard library version, let alone consider using it to evaluate new APIs like ExitPool :( Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From Stephan.Sahm at gmx.de Thu Dec 10 03:58:37 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 09:58:37 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO Message-ID: Dear all, I think I found a crucial usecase where the standard MRO does not work out. I would appreciate your help to still solve this usecase, or might MRO even be adapted? The idea is to build a generic Lift-type which I call this way because the derived classes should be able to easily lift from subclasses. So for example if I have an instance *a* from *class A* and a *class B(A)* I want to make *a* an instance of *B* in a straightforward way. My implementation (Python 2.7): import abc import inspect def use_as_needed(func, kwargs): meta = inspect.getargspec(func) if meta.keywords is not None: return meta(**kwargs) else: # not generic super-constructor - pick only the relevant subentries: return func(**{k:kwargs[k] for k in kwargs if k in meta.args}) class Liftable(object): __metaclass__ = abc.ABCMeta def __init__(self, **kwargs): use_as_needed(super(Liftable,self).__init__, kwargs) use_as_needed(self.__initialize__, kwargs) @abc.abstractmethod def __initialize__(self, **kwargs): return NotImplemented() class NoMatchingAncestor(RuntimeError): pass class NotLiftable(RuntimeError): pass def lift(self, new_class, **kwargs): # Stop Conditions: if self.__class__ is new_class: return # nothing to do elif new_class is object: # Base Case # break recursion at once: raise NoMatchingAncestor() elif new_class.__base__ is not Liftable: #to ensure this is save raise NotLiftable("Class {} is not Liftable (must be first parent)".format(new_class.__name__)) # recursive case: if not self.__class__ is new_class.__bases__[1]: lift(self, new_class.__bases__[1], **kwargs) # own case: self.__class__ = new_class use_as_needed(self.__initialize__, kwargs) and the example usecase: class A(object): def __init__(self, a): self.a = a class B(Liftable, A): def __initialize__(self, b): self.b = b a = A(1) print a.a, a.__class__ # 1 lift(a, B, b=2) print a.a, a.b, a.__class__ # 1 2 this works so far, however if I now put a further level of Liftable (which in principal already works with the generic definition class C(Liftable, B): def __initialize__(self, c): self.c = c I get the error TypeError: Error when calling the metaclass bases Cannot create a consistent method resolution order (MRO) for bases Liftable, B ?I read about MRO, and it seems to be the case that this setting somehow raises this generic Error, however I really think having such a Lifting is save and extremely useful? - how can I make it work in python? (one further comment: switching the order of inheritance, i.e. class B(A, Liftable) will call A.__init__ before Liftable.__init__ which makes the whole idea senseless) Any constructive help is appreciated! best, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Dec 10 04:33:09 2015 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 10 Dec 2015 10:33:09 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: References: Message-ID: <566946D5.9080706@egenix.com> On 10.12.2015 09:58, Stephan Sahm wrote: > Dear all, > > I think I found a crucial usecase where the standard MRO does not work out. > I would appreciate your help to still solve this usecase, or might MRO even > be adapted? > > The idea is to build a generic Lift-type which I call this way because the > derived classes should be able to easily lift from subclasses. So for > example if I have an instance *a* from *class A* and a *class B(A)* I want > to make *a* an instance of *B* in a straightforward way. Why don't you use: a.__class__ = B ? >>> class A: pass ... >>> class B(A): pass ... >>> a = A() >>> a <__main__.A instance at 0x7f0bf5ccdc68> >>> a.__class__ = B >>> a <__main__.B instance at 0x7f0bf5ccdc68> >>> -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 10 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From Stephan.Sahm at gmx.de Thu Dec 10 04:35:50 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 10:35:50 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: <566946D5.9080706@egenix.com> References: <566946D5.9080706@egenix.com> Message-ID: thanks for the fast response in fact, I do use this already, however more generically: def lift(self, new_class, **kwargs): ? ? ... ? ? self.__class__ = new_class ... On 10 December 2015 at 10:33, M.-A. Lemburg wrote: > On 10.12.2015 09:58, Stephan Sahm wrote: > > Dear all, > > > > I think I found a crucial usecase where the standard MRO does not work > out. > > I would appreciate your help to still solve this usecase, or might MRO > even > > be adapted? > > > > The idea is to build a generic Lift-type which I call this way because > the > > derived classes should be able to easily lift from subclasses. So for > > example if I have an instance *a* from *class A* and a *class B(A)* I > want > > to make *a* an instance of *B* in a straightforward way. > > Why don't you use: > > a.__class__ = B > > ? > > >>> class A: pass > ... > >>> class B(A): pass > ... > >>> a = A() > >>> a > <__main__.A instance at 0x7f0bf5ccdc68> > >>> a.__class__ = B > >>> a > <__main__.B instance at 0x7f0bf5ccdc68> > >>> > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Dec 10 2015) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Dec 10 04:33:31 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 10 Dec 2015 20:33:31 +1100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: References: Message-ID: <20151210093331.GI3821@ando.pearwood.info> On Thu, Dec 10, 2015 at 09:58:37AM +0100, Stephan Sahm wrote: > Dear all, > > I think I found a crucial usecase where the standard MRO does not work out. > I would appreciate your help to still solve this usecase, or might MRO even > be adapted? > > The idea is to build a generic Lift-type which I call this way because the > derived classes should be able to easily lift from subclasses. So for > example if I have an instance *a* from *class A* and a *class B(A)* I want > to make *a* an instance of *B* in a straightforward way. I must admit I am not familiar with the terminology "Lift" in this context, so I may have missed something. But does this example help? py> class A(object): ... def spam(self): ... return "spam" ... py> class B(A): ... def spam(self): ... result = super(B, self).spam() ... return " ".join([result]*3) ... py> a = A() py> a.spam() 'spam' py> a.__class__ = B py> a.spam() 'spam spam spam' py> type(a) -- Steve From mal at egenix.com Thu Dec 10 04:44:38 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 10 Dec 2015 10:44:38 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: References: <566946D5.9080706@egenix.com> Message-ID: <56694986.2010802@egenix.com> On 10.12.2015 10:35, Stephan Sahm wrote: > thanks for the fast response > in fact, I do use this already, however more generically: > > def lift(self, new_class, **kwargs): > ? ? > ... > ? ? > self.__class__ = new_class > ... Hmm, then I don't understand why you need a meta class for this. Could you explain the requirements that led up to needing a meta class ? If all you want to do is let the lift() know about a certain property of a class to do it's thing, an attribute or perhaps an empty base class would do the trick, e.g. class A: __liftable__ = True or perhaps: class A: def lift(self, ...): # lift operation goes here > On 10 December 2015 at 10:33, M.-A. Lemburg wrote: > >> On 10.12.2015 09:58, Stephan Sahm wrote: >>> Dear all, >>> >>> I think I found a crucial usecase where the standard MRO does not work >> out. >>> I would appreciate your help to still solve this usecase, or might MRO >> even >>> be adapted? >>> >>> The idea is to build a generic Lift-type which I call this way because >> the >>> derived classes should be able to easily lift from subclasses. So for >>> example if I have an instance *a* from *class A* and a *class B(A)* I >> want >>> to make *a* an instance of *B* in a straightforward way. >> >> Why don't you use: >> >> a.__class__ = B >> >> ? >> >>>>> class A: pass >> ... >>>>> class B(A): pass >> ... >>>>> a = A() >>>>> a >> <__main__.A instance at 0x7f0bf5ccdc68> >>>>> a.__class__ = B >>>>> a >> <__main__.B instance at 0x7f0bf5ccdc68> >>>>> >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Experts (#1, Dec 10 2015) >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>>> Python Database Interfaces ... http://products.egenix.com/ >>>>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ >> ________________________________________________________________________ >> >> ::: We implement business ideas - efficiently in both time and costs ::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> http://www.malemburg.com/ >> >> > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 10 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From Stephan.Sahm at gmx.de Thu Dec 10 04:52:28 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 10:52:28 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: <56694986.2010802@egenix.com> References: <566946D5.9080706@egenix.com> <56694986.2010802@egenix.com> Message-ID: Thanks for asking, I am glad to explain the idea behind Liftable: My first implementation of *lift* just checked whether the class has an initialize method (and not being of type Liftable. However then B must look like so: class B(A): def __init__(self, b, a): super(B, self).__init__(a) self.__initialize__(b) def __initialize__(self, b): self.b = b this *__init__* method is generic for all Liftable-types in my sense, so I wanted to factorize it out, where the first guess I had was to use an abc-mixin. So one main point is that I want to make further initializations to my lifted instance which make it a truly valid instance of the new type. best, Stephan On 10 December 2015 at 10:44, M.-A. Lemburg wrote: > On 10.12.2015 10:35, Stephan Sahm wrote: > > thanks for the fast response > > in fact, I do use this already, however more generically: > > > > def lift(self, new_class, **kwargs): > > ? ? > > ... > > ? ? > > self.__class__ = new_class > > ... > > Hmm, then I don't understand why you need a meta class > for this. Could you explain the requirements that led up to > needing a meta class ? > > If all you want to do is let the lift() know about > a certain property of a class to do it's thing, an > attribute or perhaps an empty base class would do the > trick, e.g. > > class A: > __liftable__ = True > > or perhaps: > > class A: > def lift(self, ...): > # lift operation goes here > > > On 10 December 2015 at 10:33, M.-A. Lemburg wrote: > > > >> On 10.12.2015 09:58, Stephan Sahm wrote: > >>> Dear all, > >>> > >>> I think I found a crucial usecase where the standard MRO does not work > >> out. > >>> I would appreciate your help to still solve this usecase, or might MRO > >> even > >>> be adapted? 
> >>> > >>> The idea is to build a generic Lift-type which I call this way because > >> the > >>> derived classes should be able to easily lift from subclasses. So for > >>> example if I have an instance *a* from *class A* and a *class B(A)* I > >> want > >>> to make *a* an instance of *B* in a straightforward way. > >> > >> Why don't you use: > >> > >> a.__class__ = B > >> > >> ? > >> > >>>>> class A: pass > >> ... > >>>>> class B(A): pass > >> ... > >>>>> a = A() > >>>>> a > >> <__main__.A instance at 0x7f0bf5ccdc68> > >>>>> a.__class__ = B > >>>>> a > >> <__main__.B instance at 0x7f0bf5ccdc68> > >>>>> > >> > >> -- > >> Marc-Andre Lemburg > >> eGenix.com > >> > >> Professional Python Services directly from the Experts (#1, Dec 10 2015) > >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>>>> Python Database Interfaces ... http://products.egenix.com/ > >>>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > >> ________________________________________________________________________ > >> > >> ::: We implement business ideas - efficiently in both time and costs ::: > >> > >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > >> Registered at Amtsgericht Duesseldorf: HRB 46611 > >> http://www.egenix.com/company/contact/ > >> http://www.malemburg.com/ > >> > >> > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Dec 10 2015) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Sahm at gmx.de Thu Dec 10 10:45:54 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 16:45:54 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: References: <566946D5.9080706@egenix.com> <56694986.2010802@egenix.com> Message-ID: Does anyone know how to handle this MRO-error or how to circumvent it? To me, the error seems unneeded in this case On 10 December 2015 at 10:52, Stephan Sahm wrote: > Thanks for asking, I am glad to explain the idea behind Liftable: > > My first implementation of *lift* just checked whether the class has an > initialize method (and not being of type Liftable. However then B must look > like so: > > class B(A): > def __init__(self, b, a): > super(B, self).__init__(a) > self.__initialize__(b) > > def __initialize__(self, b): > self.b = b > > this *__init__* method is generic for all Liftable-types in my sense, so > I wanted to factorize it out, where the first guess I had was to use an > abc-mixin. 
> > So one main point is that I want to make further initializations to my > lifted instance which make it a truly valid instance of the new type. > > best, > Stephan > > On 10 December 2015 at 10:44, M.-A. Lemburg wrote: > >> On 10.12.2015 10:35, Stephan Sahm wrote: >> > thanks for the fast response >> > in fact, I do use this already, however more generically: >> > >> > def lift(self, new_class, **kwargs): >> > ? ? >> > ... >> > ? ? >> > self.__class__ = new_class >> > ... >> >> Hmm, then I don't understand why you need a meta class >> for this. Could you explain the requirements that led up to >> needing a meta class ? >> >> If all you want to do is let the lift() know about >> a certain property of a class to do it's thing, an >> attribute or perhaps an empty base class would do the >> trick, e.g. >> >> class A: >> __liftable__ = True >> >> or perhaps: >> >> class A: >> def lift(self, ...): >> # lift operation goes here >> >> > On 10 December 2015 at 10:33, M.-A. Lemburg wrote: >> > >> >> On 10.12.2015 09:58, Stephan Sahm wrote: >> >>> Dear all, >> >>> >> >>> I think I found a crucial usecase where the standard MRO does not work >> >> out. >> >>> I would appreciate your help to still solve this usecase, or might MRO >> >> even >> >>> be adapted? >> >>> >> >>> The idea is to build a generic Lift-type which I call this way because >> >> the >> >>> derived classes should be able to easily lift from subclasses. So for >> >>> example if I have an instance *a* from *class A* and a *class B(A)* I >> >> want >> >>> to make *a* an instance of *B* in a straightforward way. >> >> >> >> Why don't you use: >> >> >> >> a.__class__ = B >> >> >> >> ? >> >> >> >>>>> class A: pass >> >> ... >> >>>>> class B(A): pass >> >> ... >> >>>>> a = A() >> >>>>> a >> >> <__main__.A instance at 0x7f0bf5ccdc68> >> >>>>> a.__class__ = B >> >>>>> a >> >> <__main__.B instance at 0x7f0bf5ccdc68> >> >>>>> >> >> >> >> -- >> >> Marc-Andre Lemburg >> >> eGenix.com >> >> >> >> Professional Python Services directly from the Experts (#1, Dec 10 >> 2015) >> >>>>> Python Projects, Coaching and Consulting ... >> http://www.egenix.com/ >> >>>>> Python Database Interfaces ... >> http://products.egenix.com/ >> >>>>> Plone/Zope Database Interfaces ... >> http://zope.egenix.com/ >> >> >> ________________________________________________________________________ >> >> >> >> ::: We implement business ideas - efficiently in both time and costs >> ::: >> >> >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> >> http://www.egenix.com/company/contact/ >> >> http://www.malemburg.com/ >> >> >> >> >> > >> > >> > >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> > >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Experts (#1, Dec 10 2015) >> >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >> >>> Python Database Interfaces ... http://products.egenix.com/ >> >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ >> ________________________________________________________________________ >> >> ::: We implement business ideas - efficiently in both time and costs ::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> http://www.malemburg.com/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Sahm at gmx.de Thu Dec 10 11:36:15 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 17:36:15 +0100 Subject: [Python-ideas] MRO local precedence ordering revisited Message-ID: Dear all, I just pushed a very specific question which as I now recognized is far more general: It concerns python MRO (method resolution order). As reference I take https://www.python.org/download/releases/2.3/mro/ There the section "Bad Method Resolution Orders" starts with an example which behaviour in the current python 2.7 is rather unintuitive to me. I rename the example slightly to illustrate its usecase. class Mixin(object): > pass class A(Mixin): > pass > class B(Mixin, A): > pass this unfortunately throws "TypeError: Error when calling the metaclass bases Cannot create a consistent method resolution order (MRO) for bases A, Mixin" The reference name above? comments this case similar to the following (adapted to the easier example above): > We see that class > ?B? > inherits from > ?Mixin? > and > ?A? > , with > ?Mixin > before > ?A? > : therefore we would expect attribute > ?s of ? > ?B > to be inherited > ? first? > by > ?Mixin > and > ?then? > by > ?A > : nevertheless Python 2.2 > ?was giving the opposite behaviour. > ?... > As a general rule, hierarchies such as the previous one should be avoided, > since it is unclear if > ?Mixin > should override > ?A? > or viceversa? ?While it might be the case that in Python? 2.2 things where different, I cannot agree that the expected order of Method resolution is ambiguous (at least as far I see at this stage). The reference itself says that we would expect B to be inherited ? first? by ?Mixin and ?then? by ?A. There is no ambiguity any longer. *Therefore I would like to propose to make this MRO again a valid one.* The usecase should be obvious: If you want a Mixin to be the first thing overwriting functions, but still want to inherit as normal. (I myself want for instance a Mixin to overwrite the __init__ method, which is simply not impossible when choosing class B(A, Mixin)) best, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Dec 10 12:29:05 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Dec 2015 04:29:05 +1100 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: Message-ID: On Fri, Dec 11, 2015 at 3:36 AM, Stephan Sahm wrote: > As reference I take https://www.python.org/download/releases/2.3/mro/ > There the section "Bad Method Resolution Orders" starts with an example > which behaviour in the current python 2.7 is rather unintuitive to me. Just to make something quite clear: "current" Python, as discussed on this list, is 3.5 or 3.6, not 2.7. There won't be a 2.8, and future releases of 2.7.x won't be changing anything like this. What you're describing is also the case in current Python 3.x, but any change to how the MRO is derived would happen in 3.6 at the very soonest. 
(At least, I don't _think_ the current behaviour could be considered a bug. I could be wrong.) > I rename the example slightly to illustrate its usecase. > >> class Mixin(object): >> pass >> >> class A(Mixin): >> pass >> class B(Mixin, A): >> pass > > > this unfortunately throws "TypeError: Error when calling the metaclass bases > Cannot create a consistent method resolution order (MRO) for bases A, Mixin" > > > The reference name above comments this case similar to the following > (adapted to the easier example above): > > [quoting the docs] >> As a general rule, hierarchies such as the previous one should be avoided, >> since it is unclear if Mixin should override A or viceversa > > > While it might be the case that in Python 2.2 things where different, I > cannot agree that the expected order of Method resolution is ambiguous (at > least as far I see at this stage). > > Therefore I would like to propose to make this MRO again a valid one. > > The usecase should be obvious: If you want a Mixin to be the first thing > overwriting functions, but still want to inherit as normal. > (I myself want for instance a Mixin to overwrite the __init__ method, which > is simply not impossible when choosing class B(A, Mixin)) What you're proposing, in effect, is that it should be possible for super() to go *down* the class hierarchy. Under your proposal, B.__mro__ would be (Mixin,A,object), which means that super().method() inside A will go directly to object, and that the same call inside Mixin will go to A - reversing the order of the calls compared to the way they would be if the object were of type A. Now it is known that the super doesn't mean "call my base class", but something more like "call the next class"; but it's currently guaranteed that the MRO for any derived class is a *merge* of the MROs of its parents, without ever reordering them. Under your rules, what would be the MRO here? class A(object): # explicitly stating the default def __new__(cls): print("Constructing an A") return super().__new__(cls) def method(self): print("Calling method on A") super().method() class B(object, A): pass If you answered (B,object,A), then you've just made it possible to have some other class than object at the end of the chain. Where do super() calls go if there *is no next class*? What should happen? Can that be made equally intuitive to your proposal about mixins? ChrisA From vgr255 at live.ca Thu Dec 10 13:09:20 2015 From: vgr255 at live.ca (Emanuel Barry) Date: Thu, 10 Dec 2015 13:09:20 -0500 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: , Message-ID: I wholeheartedly agree with Chris on this one. This isn't broken, nor is it something we want to change. If you're looking for a case where object is not the last of its hierarchy, Python 2 has classic classes, but they are remaining artefacts there. The MRO needs to be consistent across all classes along the MRO, with object last. > Date: Fri, 11 Dec 2015 04:29:05 +1100 > From: rosuav at gmail.com > CC: python-ideas at python.org > Subject: Re: [Python-ideas] MRO local precedence ordering revisited > > On Fri, Dec 11, 2015 at 3:36 AM, Stephan Sahm wrote: > > As reference I take https://www.python.org/download/releases/2.3/mro/ > > There the section "Bad Method Resolution Orders" starts with an example > > which behaviour in the current python 2.7 is rather unintuitive to me. > > Just to make something quite clear: "current" Python, as discussed on > this list, is 3.5 or 3.6, not 2.7. 
There won't be a 2.8, and future > releases of 2.7.x won't be changing anything like this. What you're > describing is also the case in current Python 3.x, but any change to > how the MRO is derived would happen in 3.6 at the very soonest. (At > least, I don't _think_ the current behaviour could be considered a > bug. I could be wrong.) > > > I rename the example slightly to illustrate its usecase. > > > >> class Mixin(object): > >> pass > >> > >> class A(Mixin): > >> pass > >> class B(Mixin, A): > >> pass > > > > > > this unfortunately throws "TypeError: Error when calling the metaclass bases > > Cannot create a consistent method resolution order (MRO) for bases A, Mixin" > > > > > > The reference name above comments this case similar to the following > > (adapted to the easier example above): > > > > [quoting the docs] > >> As a general rule, hierarchies such as the previous one should be avoided, > >> since it is unclear if Mixin should override A or viceversa > > > > > > While it might be the case that in Python 2.2 things where different, I > > cannot agree that the expected order of Method resolution is ambiguous (at > > least as far I see at this stage). > > > > Therefore I would like to propose to make this MRO again a valid one. > > > > The usecase should be obvious: If you want a Mixin to be the first thing > > overwriting functions, but still want to inherit as normal. > > (I myself want for instance a Mixin to overwrite the __init__ method, which > > is simply not impossible when choosing class B(A, Mixin)) > > What you're proposing, in effect, is that it should be possible for > super() to go *down* the class hierarchy. Under your proposal, > B.__mro__ would be (Mixin,A,object), which means that super().method() > inside A will go directly to object, and that the same call inside > Mixin will go to A - reversing the order of the calls compared to the > way they would be if the object were of type A. Now it is known that > the super doesn't mean "call my base class", but something more like > "call the next class"; but it's currently guaranteed that the MRO for > any derived class is a *merge* of the MROs of its parents, without > ever reordering them. Under your rules, what would be the MRO here? > > class A(object): # explicitly stating the default > def __new__(cls): > print("Constructing an A") > return super().__new__(cls) > def method(self): > print("Calling method on A") > super().method() > > class B(object, A): > pass > > If you answered (B,object,A), then you've just made it possible to > have some other class than object at the end of the chain. Where do > super() calls go if there *is no next class*? What should happen? Can > that be made equally intuitive to your proposal about mixins? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Dec 10 13:38:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Dec 2015 10:38:42 -0800 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: References: Message-ID: On Dec 10, 2015, at 00:58, Stephan Sahm wrote: > > Dear all, > > I think I found a crucial usecase where the standard MRO does not work out. I would appreciate your help to still solve this usecase, or might MRO even be adapted? 
First, do you have an actual use case for this? And are you really looking to suggest changes for Python 3.6, or looking for help with using Python 2.7 as-is? Anyway, I think the first problem here is that you're trying to put the same class, Liftable, on the MRO twice. That doesn't make sense--the whole point of superclass linearization is that each class only appears once in the list. If you weren't using a metaclass, you wouldn't see this error--but then you'd just get the more subtle problem that C can't be lifted from A to B because Liftable isn't getting called there. If you made Liftable a class factory, so two calls to Liftable() returned different class objects, then you might be able to make this work. (I suppose you could hide that from the user by giving Liftable a custom metaclass that constructs new class objects for each copy of Liftable in the bases list before calling through to type, but that seems like magic you really don't want to hide if you want anyone to be able to debut this code.) In fact, you could even make it Liftable(T), which makes your type Liftable from T, rather than from whatever class happens to come after you on the MRO chain. (Think about how this would work with other mixins, or pure-interface ABCs, or full multiple inheritance--you may end up declaring that C can be lifted from Sequence rather than B, which is nonsense, and which will be hard to debug if you don't understand the C3 algorithm.) Or, if you actually _want_ to be liftable from whatever happens to come next, then isn't liftability a property of the entire tree of classes, not of individual classes in that tree, so you should only be specifying Liftable once (either at A, or at B) in the hierarchy in the first place? From what I can tell, the only benefit you get from installing it twice is tricking the ABCMeta machinery into enforcing that all classes implement _initialize_ instead of just enforcing that one does; the easy solution there is to just write your own metaclass that does that check directly. Or maybe, instead of enforcing it, use it as a signal: build a "lift chain" for each Liftable type out of all classes on the MRO that directly implement _initialize_ (or just dynamically look for it as you walk the MRO in lift). So lift only works between those classes. I think that gets you all the same benefits as Liftable(T), without needing a class factory, and without having to specify it more than once on a hierarchy. > The idea is to build a generic Lift-type which I call this way because the derived classes should be able to easily lift from subclasses. So for example if I have an instance a from class A and a class B(A) I want to make a an instance of B in a straightforward way. 
> > My implementation (Python 2.7): > > import abc > import inspect > > def use_as_needed(func, kwargs): > meta = inspect.getargspec(func) > if meta.keywords is not None: > return meta(**kwargs) > else: > # not generic super-constructor - pick only the relevant subentries: > return func(**{k:kwargs[k] for k in kwargs if k in meta.args}) > > class Liftable(object): > __metaclass__ = abc.ABCMeta > > def __init__(self, **kwargs): > use_as_needed(super(Liftable,self).__init__, kwargs) > use_as_needed(self.__initialize__, kwargs) > > @abc.abstractmethod > def __initialize__(self, **kwargs): > return NotImplemented() > > class NoMatchingAncestor(RuntimeError): > pass > > class NotLiftable(RuntimeError): > pass > > def lift(self, new_class, **kwargs): > # Stop Conditions: > if self.__class__ is new_class: > return # nothing to do > elif new_class is object: # Base Case > # break recursion at once: > raise NoMatchingAncestor() > elif new_class.__base__ is not Liftable: #to ensure this is save > raise NotLiftable("Class {} is not Liftable (must be first parent)".format(new_class.__name__)) > > # recursive case: > if not self.__class__ is new_class.__bases__[1]: > lift(self, new_class.__bases__[1], **kwargs) > # own case: > self.__class__ = new_class > use_as_needed(self.__initialize__, kwargs) > > and the example usecase: > class A(object): > def __init__(self, a): > self.a = a > > class B(Liftable, A): > def __initialize__(self, b): > self.b = b > > a = A(1) > print a.a, a.__class__ > # 1 > > lift(a, B, b=2) > print a.a, a.b, a.__class__ > # 1 2 > > this works so far, however if I now put a further level of Liftable (which in principal already works with the generic definition > class C(Liftable, B): > def __initialize__(self, c): > self.c = c > > I get the error > TypeError: Error when calling the metaclass bases Cannot create a consistent method resolution order (MRO) for bases Liftable, B > > ?I read about MRO, and it seems to be the case that this setting somehow raises this generic Error, however I really think having such a Lifting is save and extremely useful? - how can I make it work in python? > > (one further comment: switching the order of inheritance, i.e. class B(A, Liftable) will call A.__init__ before Liftable.__init__ which makes the whole idea senseless) > > Any constructive help is appreciated! > best, > Stephan > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Thu Dec 10 13:46:58 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 11 Dec 2015 03:46:58 +0900 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: Message-ID: <22121.51362.876571.15720@turnbull.sk.tsukuba.ac.jp> On Fri, Dec 11, 2015 at 3:36 AM, Stephan Sahm wrote: > > While it might be the case that in Python 2.2 things where > > different, I cannot agree that the expected order of Method > > resolution is ambiguous (at least as far I see at this stage). It's unambiguous to you because you *already know* your "expected" semantics. The spam-eggs example in Michele Simionato's paper that was cited earlier (https://www.python.org/download/releases/2.3/mro/) makes it clear why this is ambiguous from the point of view of the interpreter. 
The interpreter can only see the syntax, and there are two semantic principles it can follow to determine what various users might expect: use the most specific base class first (which enforces monotonicity IIUC), or use the order specified in the class first (local precedence). In this case those two principles conflict, and therefore MRO is ambiguous. In cases where you know the "expected" semantics, probably you can use a metaclass to enforce them (don't ask me how, I've never programmed a metaclass). BTW, multiple inheritance != mixin, and I certainly wouldn't call your Liftable class (from the original thread) a mixin, as it has complex and unobvious semantics of instance initialization in the presence of multiple inheritance. YMMV, as there doesn't seem to be an official definition of mixin for Python. Chris Angelico writes: > (At least, I don't _think_ the current behaviour could be > considered a bug. I could be wrong.) Given that this was discussed extensively for 2.3, and Guido found himself compelled to change his mind to support the C3 algorithm by Samuele Pedroni's examples at that time, and that the C3 algorithm has been widely adopted by other languages (intuition portability!), I would say, no, this is *not* a bug. There may be an algorithm which satisfies the C3 conditions and generates a order in more cases, in which case changing to that algorithm would be a reasonable feature request. But C3 still excludes the OP's desired MRO algorithm. From abarnert at yahoo.com Thu Dec 10 13:58:30 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Dec 2015 10:58:30 -0800 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: Message-ID: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> On Dec 10, 2015, at 08:36, Stephan Sahm wrote: > > Dear all, > > I just pushed a very specific question which as I now recognized is far more general: It concerns python MRO (method resolution order). > > As reference I take https://www.python.org/download/releases/2.3/mro/ > There the section "Bad Method Resolution Orders" starts with an example which behaviour in the current python 2.7 is rather unintuitive to me. First off, Python 2.7 isn't going to change, and there isn't going to be a 2.8. If you want to propose changes to Python, you first need to learn what's changed up to 3.5, and then propose your change for 3.6. Second, the fact that it's unintuitive may be a problem with the description, rather than the algorithm. Once you understand what it's trying to do, and what you're trying to do, and why they're not compatible, it's intuitive that it doesn't work. Maybe the docs need to make it easier to understand what it's trying to do, and maybe you can help point out what's confusing you. Since Python just borrowed the C3 algorithm from Dylan to solve the same problems, and other languages have borrowed it as well, you might want to search for more generic documentation on it, too. Anyway, if you understand what C3 is solving, and you think it could be modified to still solve those problems while also allowing your use, or that one of those problems can't be solved but you don't think it's actually a problem, then explain that. Otherwise, I think you're either asking for something inconsistent, or asking for something consistent but broken, like the C++ rules (which would be even more broken in Python without first adding notions like virtual base class and auto-pre-supering constructors). > I rename the example slightly to illustrate its usecase. 
> >> class Mixin(object): >> pass >> class A(Mixin): >> pass >> class B(Mixin, A): >> pass > > this unfortunately throws "TypeError: Error when calling the metaclass bases Cannot create a consistent method resolution order (MRO) for bases A, Mixin" > > > The reference name above? comments this case similar to the following (adapted to the easier example above): >> We see that class ?B? inherits from ?Mixin? and ?A?, with ?Mixin before ?A?: therefore we would expect attribute?s of ??B to be inherited? first? by ?Mixin and ?then? by ?A: nevertheless Python 2.2 ?was giving the opposite behaviour. >> ?... >> As a general rule, hierarchies such as the previous one should be avoided, since it is unclear if ?Mixin should override ?A? or viceversa? > > > ?While it might be the case that in Python? 2.2 things where different, I cannot agree that the expected order of Method resolution is ambiguous (at least as far I see at this stage). Mixin has to appear before A so that its methods appear before A's. But Mixin has to appear between A and object so that A's methods can count on inheriting from it. How do you fit both those constraints? > The reference itself says that we would expect B to be inherited? first? by ?Mixin and ?then? by ?A. There is no ambiguity any longer. > > Therefore I would like to propose to make this MRO again a valid one. > > > The usecase should be obvious: If you want a Mixin to be the first thing overwriting functions, but still want to inherit as normal. That's not what mixins are for. Mixins add behavior; something that changes the behavior of a real superclass is something different. > (I myself want for instance a Mixin to overwrite the __init__ method, which is simply not impossible when choosing class B(A, Mixin)) > > > best, > Stephan > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Sahm at gmx.de Thu Dec 10 15:03:58 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 21:03:58 +0100 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> Message-ID: Dear all, this is a highly sensitive topic, isn't it? =) Thank you all for your extensive responses. Of course I was a bit provocative with the sentence in bold, but I haven't said this would be a bug, and if I have, I am sorry for that. I am in fact not familiar with what precisely are all the problems the C3 algorithm tries to solve, however my own intuitiv MRO would not be anything proposed so far, but: (B, mixin, A, mixin, object) or in the simpler, more trivial version (B, object, A, object) As you haven't mentioned this as a possibility at all, I guess having a class twice in this list produces some weird behaviour I do not know about yet - if someone can point out, that would be great. cheers, Stephan P.S.: I haven't said that I want a python 2.8, nevertheless thanks for pointing out that there won't be a python 2.8 (this I in fact have not known before)! I am familar with python 3.5 as well, however I just wanted to make clear where I am working and in which language the example code is runnable. 
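P.P.S.: running the C3 merge by hand at least shows me where the clash comes from. Here is a naive merge, simplified from the algorithm described in the 2.3 MRO paper linked above (illustrative only, not CPython's actual implementation):

def c3_merge(*seqs):
    seqs = [list(s) for s in seqs]
    result = []
    while any(seqs):
        for seq in seqs:
            if not seq:
                continue
            head = seq[0]
            # a candidate is acceptable if it appears in no sequence's tail
            if not any(head in other[1:] for other in seqs):
                break
        else:
            raise TypeError("inconsistent hierarchy, stuck at: %r"
                            % [s[0] for s in seqs if s])
        result.append(head)
        for seq in seqs:
            if seq and seq[0] == head:
                del seq[0]
    return result

# class B(Mixin, A): merge L[Mixin], L[A] and the base list itself
c3_merge(["Mixin", "object"], ["A", "Mixin", "object"], ["Mixin", "A"])
# TypeError: Mixin cannot go first (it sits in the tail of L[A], i.e. A
# expects to come before it), and A cannot go first either (the base list
# puts Mixin before it).

# class B(A, Mixin) succeeds and gives ["A", "Mixin", "object"]
c3_merge(["A", "Mixin", "object"], ["Mixin", "object"], ["A", "Mixin"])
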
On 10 December 2015 at 19:58, Andrew Barnert wrote: > On Dec 10, 2015, at 08:36, Stephan Sahm wrote: > > Dear all, > > I just pushed a very specific question which as I now recognized is far > more general: It concerns python MRO (method resolution order). > > As reference I take https://www.python.org/download/releases/2.3/mro/ > There the section "Bad Method Resolution Orders" starts with an example > which behaviour in the current python 2.7 is rather unintuitive to me. > > > First off, Python 2.7 isn't going to change, and there isn't going to be a > 2.8. If you want to propose changes to Python, you first need to learn > what's changed up to 3.5, and then propose your change for 3.6. > > Second, the fact that it's unintuitive may be a problem with the > description, rather than the algorithm. Once you understand what it's > trying to do, and what you're trying to do, and why they're not compatible, > it's intuitive that it doesn't work. Maybe the docs need to make it easier > to understand what it's trying to do, and maybe you can help point out > what's confusing you. > > Since Python just borrowed the C3 algorithm from Dylan to solve the same > problems, and other languages have borrowed it as well, you might want to > search for more generic documentation on it, too. > > Anyway, if you understand what C3 is solving, and you think it could be > modified to still solve those problems while also allowing your use, or > that one of those problems can't be solved but you don't think it's > actually a problem, then explain that. Otherwise, I think you're either > asking for something inconsistent, or asking for something consistent but > broken, like the C++ rules (which would be even more broken in Python > without first adding notions like virtual base class and auto-pre-supering > constructors). > > I rename the example slightly to illustrate its usecase. > > class Mixin(object): >> pass > > class A(Mixin): >> pass >> class B(Mixin, A): >> pass > > > this unfortunately throws "TypeError: Error when calling the metaclass > bases Cannot create a consistent method resolution order (MRO) for bases A, > Mixin" > > > The reference name above? comments this case similar to the following > (adapted to the easier example above): > >> We see that class >> ?B? >> inherits from >> ?Mixin? >> and >> ?A? >> , with >> ?Mixin >> before >> ?A? >> : therefore we would expect attribute >> ?s of ? >> ?B >> to be inherited >> ? first? >> by >> ?Mixin >> and >> ?then? >> by >> ?A >> : nevertheless Python 2.2 >> ?was giving the opposite behaviour. >> > ?... >> As a general rule, hierarchies such as the previous one should be >> avoided, since it is unclear if >> ?Mixin >> should override >> ?A? >> or viceversa? > > > > ?While it might be the case that in Python? 2.2 things where different, I > cannot agree that the expected order of Method resolution is ambiguous (at > least as far I see at this stage). > > > Mixin has to appear before A so that its methods appear before A's. But > Mixin has to appear between A and object so that A's methods can count on > inheriting from it. How do you fit both those constraints? > > The reference itself says that we would expect > B > to be inherited > ? first? > by > ?Mixin > and > ?then? > by > ?A. There is no ambiguity any longer. > > > *Therefore I would like to propose to make this MRO again a valid one.* > > > The usecase should be obvious: If you want a Mixin to be the first thing > overwriting functions, but still want to inherit as normal. 
> > > That's not what mixins are for. Mixins add behavior; something that > changes the behavior of a real superclass is something different. > > (I myself want for instance a Mixin to overwrite the __init__ method, > which is simply not impossible when choosing class B(A, Mixin)) > > > best, > Stephan > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Dec 10 15:32:54 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Dec 2015 07:32:54 +1100 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> Message-ID: On Fri, Dec 11, 2015 at 7:03 AM, Stephan Sahm wrote: > I am in fact not familiar with what precisely are all the problems the C3 > algorithm tries to solve, however my own intuitiv MRO would not be anything > proposed so far, but: > > (B, mixin, A, mixin, object) > > or in the simpler, more trivial version > > (B, object, A, object) > > As you haven't mentioned this as a possibility at all, I guess having a > class twice in this list produces some weird behaviour I do not know about > yet - if someone can point out, that would be great. In terms of people's expectations of super(), I think it would be intolerably surprising for it to be able to call a method from *the same* class. If your mixin has "def method(self): super().method()", it'll call itself (once) if called from B. > P.S.: I haven't said that I want a python 2.8, nevertheless thanks for > pointing out that there won't be a python 2.8 (this I in fact have not known > before)! I am familar with python 3.5 as well, however I just wanted to make > clear where I am working and in which language the example code is runnable. https://www.python.org/dev/peps/pep-0404/ http://blog.startifact.com/guido_no.jpg There are some special exemptions in 2.7 (for instance, some changes went through recently that improve security, even at the expense of some backward compatibility), but the bar is extremely high; behaviour has to either be recognizably buggy, or have significant security implications (cf hash randomization and certificate checking), to be changed in 2.7. I don't think the MRO fits either category, so any change would be 3.x only. As mentioned, though, your code works on 3.5/3.6 without changes and with the same semantics. ChrisA From ned at nedbatchelder.com Thu Dec 10 15:43:01 2015 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 10 Dec 2015 15:43:01 -0500 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> Message-ID: <5669E3D5.4070001@nedbatchelder.com> On 12/10/15 3:03 PM, Stephan Sahm wrote: > (B, mixin, A, mixin, object) > > or in the simpler, more trivial version > > (B, object, A, object) > > As you haven't mentioned this as a possibility at all, I guess having > a class twice in this list produces some weird behaviour I do not know > about yet - if someone can point out, that would be great. The MRO is a list of classes to search for attributes. There's no point in having a class listed twice. The second occurrence would never be used, because any attribute it could provide would be found on the first occurrence of the class. --Ned. 
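A quick interactive illustration of that search order, using the base ordering that the C3 merge does accept (class B(A, Mixin) rather than class B(Mixin, A)):

class Mixin(object):
    def ping(self):
        return "Mixin.ping"
    def pong(self):
        return "Mixin.pong"

class A(Mixin):
    def ping(self):
        return "A.ping"

class B(A, Mixin):
    pass

print(B.__mro__)   # B, A, Mixin, object -- each class listed exactly once
print(B().ping())  # -> A.ping     (A is the first class in the MRO defining it)
print(B().pong())  # -> Mixin.pong (only Mixin defines it)
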
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Sahm at gmx.de Thu Dec 10 15:51:31 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 21:51:31 +0100 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: <5669E3D5.4070001@nedbatchelder.com> References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> <5669E3D5.4070001@nedbatchelder.com> Message-ID: @Chris thanks for pointing out this self-reference, I am still not sure whether this is really suprising to me, but at least I am still not done with thinking about it, so probably it is @Ned As I understood it, the MRO is not only for searching attributes - there it is indeed impressively redundant to put the same class twice into the MRO, thanks for pointing that out - but also for the hierarchy of the super() command @all thank you all for your comments and help. My current conclusion is that I will read about the C3 algorithm in crucially more detail and what it in fact is trying to solve ... and eventually may come back best, Stephan On 10 December 2015 at 21:43, Ned Batchelder wrote: > On 12/10/15 3:03 PM, Stephan Sahm wrote: > > (B, mixin, A, mixin, object) > > or in the simpler, more trivial version > > (B, object, A, object) > > As you haven't mentioned this as a possibility at all, I guess having a > class twice in this list produces some weird behaviour I do not know about > yet - if someone can point out, that would be great. > > The MRO is a list of classes to search for attributes. There's no point > in having a class listed twice. The second occurrence would never be used, > because any attribute it could provide would be found on the first > occurrence of the class. > > --Ned. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Sahm at gmx.de Thu Dec 10 16:05:23 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Thu, 10 Dec 2015 22:05:23 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: References: Message-ID: Dear Andrew, thank you very much for this impressively constructive response. It is for sure more constructive than I can react on now. For the concrete usecase, the Liftable-signal in might be the most interesting option, as then already the base class can inherit the Liftable-signal and can itself already use lift. However, I cannot see how to make the __init__ method conform in this setting, but by inidividual implementations (I in fact thought that enforcing it by the mixin makes things safer, and it of course should reduce boilerplate code) The Liftable(T) in fact seems also great, as I cannot see how to avoid this lifting from a false class in the Liftable-signal chain. I only want to Lift from one class at the moment, so this is in fact kind of what I was after. The rough outline would look like class B(Lift(A)): pass class C(Lift(B)): pass which seems rather beautiful to read - thank you very much for pointing this out. If you have an idea how to automatically create the right __init__ method when using the Liftable-signal-chain, I would highly welcome it. I myself need to recap some metaclass basics again before seriously tackling this. 
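As a very first step, even before tackling the kwargs plumbing, a minimal sketch of that Lift(T) factory already gets rid of the MRO conflict, because every call produces a fresh class object, so no class can appear twice in a linearization (illustrative only - the __init__/use_as_needed handling from my first mail is deliberately left out):

def Lift(base):
    # fresh class per call; records where lifting starts from
    return type("Liftable_" + base.__name__, (base,), {"__lift_base__": base})

class A(object):
    def __init__(self, a):
        self.a = a

class B(Lift(A)):
    def __initialize__(self, b):
        self.b = b

class C(Lift(B)):
    def __initialize__(self, c):
        self.c = c

print C.__mro__
# C, Liftable_B, B, Liftable_A, A, object -- no duplicates, no conflict

a = A(1)
a.__class__ = B        # what lift(a, B, b=2) would do internally
a.__initialize__(b=2)
print a.a, a.b         # 1 2
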
Best, Stephan On 10 December 2015 at 19:38, Andrew Barnert wrote: > On Dec 10, 2015, at 00:58, Stephan Sahm wrote: > > Dear all, > > I think I found a crucial usecase where the standard MRO does not work > out. I would appreciate your help to still solve this usecase, or might MRO > even be adapted? > > > First, do you have an actual use case for this? And are you really looking > to suggest changes for Python 3.6, or looking for help with using Python > 2.7 as-is? > > Anyway, I think the first problem here is that you're trying to put the > same class, Liftable, on the MRO twice. That doesn't make sense--the whole > point of superclass linearization is that each class only appears once in > the list. > > If you weren't using a metaclass, you wouldn't see this error--but then > you'd just get the more subtle problem that C can't be lifted from A to B > because Liftable isn't getting called there. > > If you made Liftable a class factory, so two calls to Liftable() returned > different class objects, then you might be able to make this work. (I > suppose you could hide that from the user by giving Liftable a custom > metaclass that constructs new class objects for each copy of Liftable in > the bases list before calling through to type, but that seems like magic > you really don't want to hide if you want anyone to be able to debut this > code.) > > In fact, you could even make it Liftable(T), which makes your type > Liftable from T, rather than from whatever class happens to come after you > on the MRO chain. (Think about how this would work with other mixins, or > pure-interface ABCs, or full multiple inheritance--you may end up declaring > that C can be lifted from Sequence rather than B, which is nonsense, and > which will be hard to debug if you don't understand the C3 algorithm.) > > Or, if you actually _want_ to be liftable from whatever happens to come > next, then isn't liftability a property of the entire tree of classes, not > of individual classes in that tree, so you should only be specifying > Liftable once (either at A, or at B) in the hierarchy in the first place? > From what I can tell, the only benefit you get from installing it twice is > tricking the ABCMeta machinery into enforcing that all classes implement > _initialize_ instead of just enforcing that one does; the easy solution > there is to just write your own metaclass that does that check directly. > > Or maybe, instead of enforcing it, use it as a signal: build a "lift > chain" for each Liftable type out of all classes on the MRO that directly > implement _initialize_ (or just dynamically look for it as you walk the MRO > in lift). So lift only works between those classes. I think that gets you > all the same benefits as Liftable(T), without needing a class factory, and > without having to specify it more than once on a hierarchy. > > The idea is to build a generic Lift-type which I call this way because the > derived classes should be able to easily lift from subclasses. So for > example if I have an instance *a* from *class A* and a *class B(A)* I > want to make *a* an instance of *B* in a straightforward way. 
> > My implementation (Python 2.7): > > import abc > import inspect > > def use_as_needed(func, kwargs): > meta = inspect.getargspec(func) > if meta.keywords is not None: > return meta(**kwargs) > else: > # not generic super-constructor - pick only the relevant > subentries: > return func(**{k:kwargs[k] for k in kwargs if k in meta.args}) > > class Liftable(object): > __metaclass__ = abc.ABCMeta > > def __init__(self, **kwargs): > use_as_needed(super(Liftable,self).__init__, kwargs) > use_as_needed(self.__initialize__, kwargs) > > @abc.abstractmethod > def __initialize__(self, **kwargs): > return NotImplemented() > > class NoMatchingAncestor(RuntimeError): > pass > > class NotLiftable(RuntimeError): > pass > > def lift(self, new_class, **kwargs): > # Stop Conditions: > if self.__class__ is new_class: > return # nothing to do > elif new_class is object: # Base Case > # break recursion at once: > raise NoMatchingAncestor() > elif new_class.__base__ is not Liftable: #to ensure this is save > raise NotLiftable("Class {} is not Liftable (must be first > parent)".format(new_class.__name__)) > > # recursive case: > if not self.__class__ is new_class.__bases__[1]: > lift(self, new_class.__bases__[1], **kwargs) > # own case: > self.__class__ = new_class > use_as_needed(self.__initialize__, kwargs) > > > and the example usecase: > > class A(object): > def __init__(self, a): > self.a = a > > class B(Liftable, A): > def __initialize__(self, b): > self.b = b > > a = A(1) > print a.a, a.__class__ > # 1 > > lift(a, B, b=2) > print a.a, a.b, a.__class__ > # 1 2 > > > this works so far, however if I now put a further level of Liftable (which > in principal already works with the generic definition > > class C(Liftable, B): > def __initialize__(self, c): > self.c = c > > > I get the error > > TypeError: Error when calling the metaclass bases Cannot create a > consistent method resolution order (MRO) for bases Liftable, B > > > ?I read about MRO, and it seems to be the case that this setting somehow > raises this generic Error, however I really think having such a Lifting is > save and extremely useful? - how can I make it work in python? > > (one further comment: switching the order of inheritance, i.e. class B(A, > Liftable) will call A.__init__ before Liftable.__init__ which makes the > whole idea senseless) > > Any constructive help is appreciated! > best, > Stephan > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Dec 10 16:26:15 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Dec 2015 13:26:15 -0800 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> Message-ID: <49D2ADD2-B86B-41DF-8B7D-42B5991328A3@yahoo.com> On Dec 10, 2015, at 12:03, Stephan Sahm wrote: > > Dear all, > > this is a highly sensitive topic, isn't it? Well, it's an _interesting_ topic. If there really is something new to be done here that could improve the way inheritance works, that's a big deal, so people want to think about it, which means asking you challenging questions. > =) Thank you all for your extensive responses. 
Of course I was a bit provocative with the sentence in bold, but I haven't said this would be a bug, and if I have, I am sorry for that. > > I am in fact not familiar with what precisely are all the problems the C3 algorithm tries to solve, however my own intuitiv MRO would not be anything proposed so far, but: > > (B, mixin, A, mixin, object) > > or in the simpler, more trivial version > > (B, object, A, object) First, think about what kind of algorithm could get mixin to appear twice in the first list without also getting object to appear twice. I don't think there's any obvious way to do it. (Well, you could special-case object, but that doesn't help the more general diamond problem.) If you have a non-obvious answer to that, even though that still might not be a complete solution to what you really wanted, it would be a major contribution on its own. > As you haven't mentioned this as a possibility at all, I guess having a class twice in this list produces some weird behaviour I do not know about yet - if someone can point out, that would be great. For an obvious problem: how could any cooperative inheritance tree ever super anything if object, which obviously doesn't cooperate, might end up on the MRO between one class and the next? Plus, if a class can get called twice on the super chain, how does it know which of the two times it's being called? Imagine a trivial mixin that just counts how many instances of its descendant get created. How do you avoid counting twice when a C gets constructed. (If you're thinking of using __mangled names to help, notice that _mixin__mangled and _mixin__mangled are the same name.) More generally: the code in a class generally assumes that it's not going to somehow super itself. Breaking that assumption, especially for code that adds attributes, makes the code much harder to write, and reason about. You probably can come up with a manual solution to these problems that doesn't work fully generally, but does work for your use case. (By adding specific rules on how classes in your tree interact, you can simplify the problem as far as you want.) Which you'd implement in a metaclass. But once you're writing a metaclass that interferes with the MRO mechanism, why do you care what the default MRO mechanism is? (That last might not be a rhetorical question--maybe the answer is "I wouldn't care, but it's too hard to override the setup of __mro__ in a metaclass __new__ method" or something, in which case there may be something to improve here.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ian.g.kelly at gmail.com Thu Dec 10 16:58:37 2015 From: ian.g.kelly at gmail.com (Ian Kelly) Date: Thu, 10 Dec 2015 14:58:37 -0700 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> <5669E3D5.4070001@nedbatchelder.com> Message-ID: On Thu, Dec 10, 2015 at 1:51 PM, Stephan Sahm wrote: > @Ned > As I understood it, the MRO is not only for searching attributes - there it > is indeed impressively redundant to put the same class twice into the MRO, > thanks for pointing that out - but also for the hierarchy of the super() > command In that situation including the class twice results in an infinite loop. If your MRO is (B, mixin, A, mixin, object), then the chain of super calls looks like this: super(B, self).method() super(mixin, self).method() super(A, self).method() super(mixin, self).method() super(A, self).method() ... 
This happens because the second super call from the mixin class is identical to the first one. super doesn't know which appearance of mixin based on the arguments passed it, and if it naively assumes the first appearance, then nothing after the second appearance of mixin will ever be reached. From greg.ewing at canterbury.ac.nz Thu Dec 10 18:11:16 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Dec 2015 12:11:16 +1300 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: Message-ID: <566A0694.1010903@canterbury.ac.nz> Stephan Sahm wrote: > class Mixin(object): > pass > > class A(Mixin): > pass > class B(Mixin, A): > pass > > > this unfortunately throws "TypeError: Error when calling the metaclass > bases Cannot create a consistent method resolution order (MRO) for bases > A, Mixin" The reason Python disallows this is that B is saying that methods of Mixin should override those of A, whereas A is saying that its own methods should override those of Mixin. So there is a potential for B to break A's functionality. It may be that nothing gets broken in your particular case, but Python can't know that in general, so it errs on the safe side. -- Greg From greg.ewing at canterbury.ac.nz Thu Dec 10 18:19:33 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Dec 2015 12:19:33 +1300 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> Message-ID: <566A0885.7000505@canterbury.ac.nz> Stephan Sahm wrote: > my own intuitiv MRO would not be > anything proposed so far, but: > > (B, mixin, A, mixin, object) > > As you haven't mentioned this as a possibility at all, I guess having a > class twice in this list produces some weird behaviour I do not know > about yet - if someone can point out, that would be great. I'm not sure what that would do to super calls. It's possible you would end up in a loop from Mixin -> A -> Mixin -> A -> ... But in any case, this MRO doesn't solve the fundamental problem that the MROs of B and A are inherently contradictory. *You* know what MRO you want in this particular case, but there's no way for Python to know that. -- Greg From abarnert at yahoo.com Fri Dec 11 00:53:11 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Dec 2015 21:53:11 -0800 Subject: [Python-ideas] MRO local precedence ordering revisited In-Reply-To: <566A0885.7000505@canterbury.ac.nz> References: <5CE74DEE-6E37-469A-B74B-40710FFCDEEB@yahoo.com> <566A0885.7000505@canterbury.ac.nz> Message-ID: On Dec 10, 2015, at 15:19, Greg Ewing wrote: > > Stephan Sahm wrote: >> my own intuitiv MRO would not be anything proposed so far, but: >> (B, mixin, A, mixin, object) >> As you haven't mentioned this as a possibility at all, I guess having a class twice in this list produces some weird behaviour I do not know about yet - if someone can point out, that would be great. > > I'm not sure what that would do to super calls. It's possible > you would end up in a loop from Mixin -> A -> Mixin -> A -> ... > > But in any case, this MRO doesn't solve the fundamental > problem that the MROs of B and A are inherently contradictory. > *You* know what MRO you want in this particular case, but > there's no way for Python to know that. After thinking about this a bit: the MRO isn't really contradictory; what's contradictory is what he's expecting super() to do with his MRO. 
When called from within mixin.spam(), he wants it to give him A, but when calling from within mixin.spam(), he wants it to give him object. Since those two "whens" are identical, there's his problem. But if you never call super(), there's nothing wrong with it; any attribute provided by Mixin the first time means you'll never get to Mixin the second time. (I suppose you could hack that with quantum attributes that have an x% chance of being there each time you call __getattr__, but just don't do that...) Of course he wants to call super. Or, rather, he wants to call something that's kind of like super, but that handles his MRO in a non-contradictory way. A similar MI next-method protocol that went by index rather than by name would work: within mro[1].spam() it gives you mro[2]; within mro[3].spam() it gives you mro[4]. Python doesn't come with a function that does that, but you can write one, with a bit of kludgery. And you can easily write a metaclass that sets up the MRO you want. Which means you can build exactly what the OP is asking for. It still won't do what he wants. For one thing, like super, it depends on the entire class hierarchy cooperating, which includes the root not supering but everyone else doing so--but mixin is the root, and also elsewhere on the tree, so it has to both super and not super. And there are a couple other problems. But tightening up the requirements a bit, I think you can make something coherent out of this. It's definitely not the best way to solve his problem, and it's probably not a good way to solve _any_ problem--but it does work. See http://stupidpythonideas.blogspot.com/2015/12/can-you-customize-method-resolution.html for a fully worked through example. From leewangzhong+python at gmail.com Fri Dec 11 20:01:10 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 11 Dec 2015 20:01:10 -0500 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function Message-ID: (Replying to an email I didn't get in my inbox. I hope this goes through correctly.) On Friday, December 4, 2015 at 4:44:47 PM UTC-5, Bill Winslow wrote: > This is a question I posed to reddit, with no real resolution: https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/ > The summary for people here is the following: > Here's a pattern I'm using for my code: > def deterministic_recursive_ calculation(input, partial_state=None): > condition = do_some_calculations(input) > if condition: > return deterministic_recursive_calculation(reduced_input, some_state) > Basically, in calculating the results of the subproblem, the subproblem can be calculated quicker by including/sharing some partial results from the superproblem. (Calling the subproblem without the partial state still gives the same result, but takes substantially longer.) Solutions: 1. Rewrite your recursive function so that the partial state is a nonlocal variable (in the closure), and memoize the recursive part. 2. Attach the state to the function, rather than passing it as a parameter. (Also suggested in this comment: https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/cxr7cnp) 3. Wrap the state in a class for which __eq__ is defined to say that all instances of it are the same, and __hash__ always returns a constant (say, the id of the class). I think #1 is the best: The state is a property of the current recursive execution. 
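Roughly, the shape is something like this (a minimal sketch; the factorial and the dict of precomputed partial results are made-up stand-ins for the real calculation and state):

from functools import lru_cache

def calculate(n, partial_state=None):
    # The partial state lives in the enclosing scope, so it is never part
    # of the cache key; only the recursive argument is memoized.
    state = partial_state if partial_state is not None else {}

    @lru_cache(maxsize=None)
    def rec(k):
        if k in state:        # shortcut supplied by the caller
            return state[k]
        if k < 2:
            return 1
        return k * rec(k - 1)

    return rec(n)

print(calculate(5))            # 120, computed from scratch
print(calculate(5, {4: 24}))   # 120, reusing the supplied partial result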
You make a function which solves the problem, and the recursive function is an inner function of the solver. The state will be lost when you finish computing the answer. If you want to keep the partial state even after the recursion finished, then #2 is the right answer: the state is a property of the solver. I feel that #3 is icky and heavy-handed. The function won't be semantically responsible for the state. Hacking/extending lru_cache is saying the wrong thing about what the partial state is. Cached information (which is what your state is) isn't really an argument to the function as a mathematical input-output box. Also, if you're calling this function multiple times, you as the caller shouldn't be expected to keep track of the cache, so you shouldn't be expected to store it (in global scope). But if it's reusable, it should be stored, so the function should be responsible for storing it. If you want to keep the lru_cache, too, then you can store the recursive function as an attribute. It will even keep itself and your partial state in its closure. Sample code so you can confirm that the state is stored between calls. Just call solve(50) twice in a row, and compare the times. # WARNING: I don't recommend x > 100. # I've killed two interpreters while messing with this. def solve(x): try: rec = solve.rec # To trigger the exception the first time. except: @lru_cache(None) def rec(n, m): return sum( rec(i, j) for i in reversed(range(n)) for j in reversed(range(m))) solve.rec = rec return rec(x, x) From leewangzhong+python at gmail.com Fri Dec 11 20:19:42 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 11 Dec 2015 20:19:42 -0500 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: By the way, there are other usecases for ignoring arguments for caching. For example, dynamic programming where the arguments are the indices of a sequence, or some other object (tree?) which isn't a recursive argument. I recommend that those also be done with a closure (separating the recursive part from the initial arguments), but I think it's worth considering an lru_cache implementation for students who haven't learned to, er, abuse closures. Unless someone thinks a recipe can/should be added to the docs. On Fri, Dec 11, 2015 at 8:01 PM, Franklin? Lee wrote: > Solutions: > 1. Rewrite your recursive function so that the partial state is a > nonlocal variable (in the closure), and memoize the recursive part. > 2. Attach the state to the function, rather than passing it as a > parameter. (Also suggested in this comment: > https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cache_only_on_some_arguments/cxr7cnp) > 3. Wrap the state in a class for which __eq__ is defined to say that > all instances of it are the same, and __hash__ always returns a > constant (say, the id of the class). (What's the etiquette for adding onto your own email? I snipped most of it for the kb bloat.) From leewangzhong+python at gmail.com Sat Dec 12 04:27:43 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 12 Dec 2015 04:27:43 -0500 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict Message-ID: Here's Tim Peters's 2013 explanation of OrderedDict: http://stackoverflow.com/a/18951209/2963903 He said: > But if [the order list] were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list. 
I didn't see why this would need to be the case. 1. If you start searching from the front, then each element would be inspected or shifted, never both. This would mean only one full iteration. 2. You don't have to shift for each deletion. You can wait until some threshold is reached before shifting, and maybe that will spread out the cost of shift among deletions enough to make an impact. 3. He then mentions that the linkedlist implementation uses a second dict to map keys to their list nodes. Well, that would solve the cost of lookup for lists, too. So I ended up taking 3.5.0's OrderedDict and implemented the idea. I'm not really sure what to do next, in terms of testing and comparing it against the existing implementation, and whether it's worth pushing for such a design for OrderedDict (current Python version or future C cersion). It's enough for me if I can figure out how this compares in performance, both algorithmically and actually, to the current implementation. -------------------------------------------------------------------- First idea: (link: http://pastebin.com/LESRktJw) Data members (with ListDict() as od): - super() : dict[key] => value - od.__map : dict[key] => index where od.__list[index] = key - od.__list : list[index] => key or `_sentinel` - od.__size : int - number of non-sentinel elements in list (aka len(dict)) - This might be replaceable by len(self). Some algorithms to note: - .__setitem__: Add it to the end of __list if it's new. - .__delitem__: Replace the __list entry with `sentinel`, and attempt to compact the list. - .__compact: Called when items are removed. Naive threshold: If the list is 50% empty, then make a new list and update the indices for each key. - (iterator views): Simply yield things that aren't `sentinel`. - .move_to_end(last=False): The biggest loss. Adding to the front of the list will shift potentially many things up. Maybe if the list were a dequeue, this would be faster. The actual length of __list will slow down iteration through the list, but the factor of O(n) is dependent on the threshold value. And the code itself MIGHT be faster per-element, and maybe (big maybe) easier for the hardware. Needs special review: - __sizeof__: I have no idea if I did this right. - (iterators): Can they be made to sometimes detect additions/removals during iteration? - Exception safety: E.g. in __setitem__, the order of updates for the metadata might leave the dict in an invalid state. Also needs review: - I put a lot of global calls as default args, but the existing use of default args for static name lookup seemed inconsistent. -------------------------------------------------------------------- Second idea: thicker wrapper. Store a key,value,index triple in the internal dictionary, instead of just the value. This will have a wrapper cost on __getitem__ (have to get the value from the triple), but will save many lookups for adding and removing elements (which is probably the less-common use). (At the C level, it should be possible to keep multiple storage dictionaries in lockstep. Also, the triple doesn't need to be exposed to the Python interface, right?) Type: class _DictStorage(object): __slots__ = ['key', 'value', 'index'] Members: (external): ListDict[key] => value super() : dict[key] => [key, value, index] .__list : list[i] => [key, value, index] (same as above) or None .__size : as in the other implementation. Probably just as unnecessary. 
.__map : (removed; unneeded) Some method implementations: .__getitem__: Grab the [k,v,i] from the inner dict and unwrap it. .__setitem__: If the triple exists, update its value. If it doesn't exist, add in [k, v, len(__list)-1] and append it to __list. (This saves a lookup in the internal dict.) .__delitem__: inner.pop(key), and use the index to None it on the list. (iteration): Same as the first ListDict implementation, except using None instead of _sentinel. .move_to_end, .__compact: Same as the first ListDict, except I won't have to dict.get the keys to update their indices. Third idea: Well, I have to learn PyPy's implementation of OrderedDict. From abarnert at yahoo.com Sat Dec 12 06:06:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 12 Dec 2015 03:06:55 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: Message-ID: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> On Dec 12, 2015, at 01:27, Franklin? Lee wrote: > > Here's Tim Peters's 2013 explanation of OrderedDict: > http://stackoverflow.com/a/18951209/2963903 > > He said: >> But if [the order list] were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list. > > I didn't see why this would need to be the case. > 1. If you start searching from the front, then each element would be > inspected or shifted, never both. This would mean only one full > iteration. Even if Tim is wrong about it being O(N) twice over, it's still O(N) once over, instead of O(1) like the current implementation. To remove the Kth element takes K compares, and N-K moves. Yes, that N-K can be one big memmove, which can reduce the constant factor by orders of magnitude... But it's still linear work that the linked list version doesn't have to do, and N/100 is still worse than 1. Plus, the K compares can't be similarly accelerated, and on average K is N/2. The current implementation, on the other hand, takes one hash lookup, one hash deletion, and two node pointer twiddles, which means it's O(1). > 2. You don't have to shift for each deletion. You can wait until some > threshold is reached before shifting, and maybe that will spread out > the cost of shift among deletions enough to make an impact. I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose? > 3. He then mentions that the linkedlist implementation uses a second > dict to map keys to their list nodes. Well, that would solve the cost > of lookup for lists, too. No it won't. With a linked list, you only have to update a single hash entry when a node is inserted or deleted, because the rest of the nodes are unchanged. 
But with an array, you have to update half the hash entries, because half of the indices are shifted. (And, again, spreading out the work isn't going to help unless you can actually reduce the actual amount of work to something sublinear.) > So I ended up taking 3.5.0's OrderedDict and implemented the idea. I'm > not really sure what to do next, in terms of testing and comparing it > against the existing implementation, There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start. > and whether it's worth pushing > for such a design for OrderedDict (current Python version or future C > cersion). What would be the benefit of such a design? Unless it's a huge performance win, I don't see what your argument would be. (Sure, it's simpler to describe--but the implementation is more complex, so it probably doesn't serve as better example code.) > It's enough for me if I can figure out how this compares in > performance, both algorithmically and actually, to the current > implementation. > > -------------------------------------------------------------------- > > First idea: (link: http://pastebin.com/LESRktJw) > > Data members (with ListDict() as od): > - super() : dict[key] => value > - od.__map : dict[key] => index where od.__list[index] = key > - od.__list : list[index] => key or `_sentinel` > - od.__size : int - number of non-sentinel elements in list (aka len(dict)) > - This might be replaceable by len(self). > > > Some algorithms to note: > - .__setitem__: Add it to the end of __list if it's new. > - .__delitem__: Replace the __list entry with `sentinel`, and > attempt to compact the list. > - .__compact: Called when items are removed. Naive threshold: If > the list is 50% empty, then make a new list and update the indices for > each key. > - (iterator views): Simply yield things that aren't `sentinel`. > - .move_to_end(last=False): The biggest loss. Adding to the front > of the list will shift potentially many things up. Maybe if the list > were a dequeue, this would be faster. Honestly, I'd just leave that out of the preliminary tests. If everything else gets faster, then you can worry about how to make this faster as well. > The actual length of __list will slow down iteration through the > list, but the factor of O(n) is dependent on the threshold value. And > the code itself MIGHT be faster per-element, and maybe (big maybe) > easier for the hardware. Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.) > Needs special review: > - __sizeof__: I have no idea if I did this right. > - (iterators): Can they be made to sometimes detect > additions/removals during iteration? > - Exception safety: E.g. in __setitem__, the order of updates for > the metadata might leave the dict in an invalid state. > > Also needs review: > - I put a lot of global calls as default args, but the existing > use of default args for static name lookup seemed inconsistent. 
> > -------------------------------------------------------------------- > > Second idea: thicker wrapper. > > Store a key,value,index triple in the internal dictionary, instead > of just the value. > > This will have a wrapper cost on __getitem__ (have to get the > value from the triple), but will save many lookups for adding and > removing elements (which is probably the less-common use). I actually thought about collapsing the linked list implementation in the same way. But I think there are many uses where a 2x slower lookup hurts a lot more than a 50% faster mutate helps. And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.) > (At the C level, it should be possible to keep multiple storage > dictionaries in lockstep. Also, the triple doesn't need to be exposed > to the Python interface, right?) Are you suggesting increasing every hash bucket in every dictionary by an extra pointer, just to speed up OrderedDict? That would definitely slow down regular dicts, and increase their memory use, and they get used a lot more often. > Type: > class _DictStorage(object): > __slots__ = ['key', 'value', 'index'] > > > Members: > (external): ListDict[key] => value > super() : dict[key] => [key, value, index] > .__list : list[i] => [key, value, index] (same as above) or None > .__size : as in the other implementation. Probably just as unnecessary. > .__map : (removed; unneeded) > > > Some method implementations: > .__getitem__: > Grab the [k,v,i] from the inner dict and unwrap it. > .__setitem__: > If the triple exists, update its value. > If it doesn't exist, add in [k, v, len(__list)-1] > and append it to __list. > (This saves a lookup in the internal dict.) > .__delitem__: > inner.pop(key), and use the index to None it on the list. > (iteration): > Same as the first ListDict implementation, except using None > instead of _sentinel. > .move_to_end, .__compact: > Same as the first ListDict, except I won't have to > dict.get the keys to update their indices. > > > Third idea: Well, I have to learn PyPy's implementation of OrderedDict. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From storchaka at gmail.com Sat Dec 12 07:19:08 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 12 Dec 2015 14:19:08 +0200 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> Message-ID: On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote: > On Dec 12, 2015, at 01:27, Franklin? Lee wrote: >> 2. You don't have to shift for each deletion. You can wait until some >> threshold is reached before shifting, and maybe that will spread out >> the cost of shift among deletions enough to make an impact. > > I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. 
But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you.
>
> Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose?

All this is true for an ordinary dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need to resize the storage and fill the new storage with O(N) complexity. Due to the unpredictability of hashes of strings and pointers, the number of collisions and resizings is not predictable, and the running time can vary significantly from run to run.

From storchaka at gmail.com Sat Dec 12 07:34:44 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 12 Dec 2015 14:34:44 +0200
Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict
In-Reply-To:
References:
Message-ID:

On 12.12.15 11:27, Franklin? Lee wrote:
> Here's Tim Peters's 2013 explanation of OrderedDict:
> http://stackoverflow.com/a/18951209/2963903
>
> He said:
>> But if [the order list] were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.
>
> I didn't see why this would need to be the case.
> 1. If you start searching from the front, then each element would be
> inspected or shifted, never both. This would mean only one full
> iteration.
> 2. You don't have to shift for each deletion. You can wait until some
> threshold is reached before shifting, and maybe that will spread out
> the cost of shift among deletions enough to make an impact.
> 3. He then mentions that the linkedlist implementation uses a second
> dict to map keys to their list nodes. Well, that would solve the cost
> of lookup for lists, too.
>
> So I ended up taking 3.5.0's OrderedDict and implemented the idea. I'm
> not really sure what to do next, in terms of testing and comparing it
> against the existing implementation, and whether it's worth pushing
> for such a design for OrderedDict (current Python version or future C
> cersion). It's enough for me if I can figure out how this compares in
> performance, both algorithmically and actually, to the current
> implementation.

Did you try to replace the standard OrderedDict implementation with your implementation and run the tests? Be aware that some details of the C implementation were changed, and new tests were added in 3.5.1. It is better to take 3.5.1 for testing.

Both current implementations of OrderedDict try to make OrderedDict threadsafe and to not leave an OrderedDict in an inconsistent state if hashing or comparison raises an exception. There is also an attempt to handle cases where an OrderedDict is changed bypassing the OrderedDict API (with direct calls of dict-modifying methods: dict.__setitem__, dict.__delitem__, dict.update, etc).

The performance of the Python implementation doesn't matter much now (as long as it has the same computational complexity). As for the C implementation, I can't forecast which approach will be faster.
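For a first rough signal on the Python side, even a crude timeit loop over a mixed insert/delete workload would do (a sketch, not a rigorous benchmark):

import timeit

# Swap this setup line for the candidate implementation, for example
# "from listdict import ListDict as D" (module name made up here), to
# run the same workload against both.
setup = "from collections import OrderedDict as D"

stmt = """
d = D()
for i in range(1000):
    d[i] = i
for i in range(0, 1000, 2):
    del d[i]
list(d)
"""

print(timeit.timeit(stmt, setup=setup, number=100))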
From mike at selik.org Sat Dec 12 13:34:22 2015 From: mike at selik.org (Michael Selik) Date: Sat, 12 Dec 2015 18:34:22 +0000 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On Fri, Dec 11, 2015, 8:20 PM Franklin? Lee wrote: > By the way, there are other usecases for ignoring arguments for > caching. For example, dynamic programming where the arguments are the > indices of a sequence, or some other object (tree?) which isn't a > recursive argument. I recommend that those also be done with a closure > (separating the recursive part from the initial arguments), but I > think it's worth considering an lru_cache implementation for students > who haven't learned to, er, abuse closures. Unless someone thinks a > recipe can/should be added to the docs. > This whole thing is probably best implemented as two separate functions rather than using a closure, depending on how intertwined the code paths are for the shortcut/non-shortcut versions. @lru_cache def factorial(n): if n < 2: return 1 return n * factorial(n-1) @lru_cache def factorial_faster(n, shortcut=None): if shortcut is not None: return shortcut return factorial(n) > On Fri, Dec 11, 2015 at 8:01 PM, Franklin? Lee > wrote: > > Solutions: > > 1. Rewrite your recursive function so that the partial state is a > > nonlocal variable (in the closure), and memoize the recursive part. > I'd flip the rare-case to the except block and put the normal-case in the try block. I believe this will be more compute-efficient and more readable. def factorial(n, answer=None): try: return factorial.recursive(n) except AttributeError: @lru_cache() def recursive(n): # shortcut if answer is not None: return answer # non-shortcut if n < 2: return 1 return n * recursive(n-1) factorial.recursive = recursive return recursive(n) Note that the original question was how to handle an optional shortcut parameter that would not change the output but simply increase speed if (and only if) that call was a cache miss. A successive cache hit should be near instantaneous, regardless of the optional parameter. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Dec 12 17:11:20 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 12 Dec 2015 14:11:20 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> Message-ID: <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> On Dec 12, 2015, at 04:19, Serhiy Storchaka wrote: > >> On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote: >>> On Dec 12, 2015, at 01:27, Franklin? Lee wrote: >>> 2. You don't have to shift for each deletion. You can wait until some >>> threshold is reached before shifting, and maybe that will spread out >>> the cost of shift among deletions enough to make an impact. >> >> I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. >> >> Plus, an API that spreads out about the same work but does it in large batches is less usable. 
Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose? > > All this is true for ordinal dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need resize the storage and fill new storage with O(N) complexity. Due to unpredictability of hashes of strings and pointers, the number of collisions an resizings is not predicable, and a time of work can significantly vary from run to run. That's not the same at all. The benefit of dict isn't that it's amortized, it's that it's amortized _constant_ instead of linear, which is a huge improvement, worth a bit of chunkiness. Going from linear to amortized linear, as the OP proposes, doesn't get you anything good for the cost. I think you may be forgetting that dict doesn't rehash "from time to time" as this proposal does, it only does it when you grow. And it expands exponentially rather than linearly, so even in the special case where 100% of your operations are inserts, it's still only going to happen a few times. From storchaka at gmail.com Sat Dec 12 18:20:43 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 13 Dec 2015 01:20:43 +0200 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> Message-ID: On 13.12.15 00:11, Andrew Barnert via Python-ideas wrote: > On Dec 12, 2015, at 04:19, Serhiy Storchaka wrote: >> >>> On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote: >>>> On Dec 12, 2015, at 01:27, Franklin? Lee wrote: >>>> 2. You don't have to shift for each deletion. You can wait until some >>>> threshold is reached before shifting, and maybe that will spread out >>>> the cost of shift among deletions enough to make an impact. >>> >>> I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. >>> >>> Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose? >> >> All this is true for ordinal dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need resize the storage and fill new storage with O(N) complexity. Due to unpredictability of hashes of strings and pointers, the number of collisions an resizings is not predicable, and a time of work can significantly vary from run to run. > > That's not the same at all. The benefit of dict isn't that it's amortized, it's that it's amortized _constant_ instead of linear, which is a huge improvement, worth a bit of chunkiness. Going from linear to amortized linear, as the OP proposes, doesn't get you anything good for the cost. 
> > I think you may be forgetting that dict doesn't rehash "from time to time" as this proposal does, it only does it when you grow. And it expands exponentially rather than linearly, so even in the special case where 100% of your operations are inserts, it's still only going to happen a few times. Either you or me misunderstood the OP proposition. An array of indices needs to be "compacted" (that costs O(n)) only after at least O(n) addition/deletion/moving operations. Therefore the amortized cost is constant in worst case. All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] [1] https://code.activestate.com/recipes/578375-proof-of-concept-for-a-more-space-efficient-faster/ From abarnert at yahoo.com Sat Dec 12 19:06:56 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 12 Dec 2015 16:06:56 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> Message-ID: <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> On Dec 12, 2015, at 15:20, Serhiy Storchaka wrote: > >> On 13.12.15 00:11, Andrew Barnert via Python-ideas wrote: >>> On Dec 12, 2015, at 04:19, Serhiy Storchaka wrote: >>> >>>>> On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote: >>>>> On Dec 12, 2015, at 01:27, Franklin? Lee wrote: >>>>> 2. You don't have to shift for each deletion. You can wait until some >>>>> threshold is reached before shifting, and maybe that will spread out >>>>> the cost of shift among deletions enough to make an impact. >>>> >>>> I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. >>>> >>>> Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose? >>> >>> All this is true for ordinal dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need resize the storage and fill new storage with O(N) complexity. Due to unpredictability of hashes of strings and pointers, the number of collisions an resizings is not predicable, and a time of work can significantly vary from run to run. >> >> That's not the same at all. The benefit of dict isn't that it's amortized, it's that it's amortized _constant_ instead of linear, which is a huge improvement, worth a bit of chunkiness. Going from linear to amortized linear, as the OP proposes, doesn't get you anything good for the cost. >> >> I think you may be forgetting that dict doesn't rehash "from time to time" as this proposal does, it only does it when you grow. And it expands exponentially rather than linearly, so even in the special case where 100% of your operations are inserts, it's still only going to happen a few times. > > Either you or me misunderstood the OP proposition. 
An array of indices needs to be "compacted" (that costs O(n)) only after at least O(n) addition/deletion/moving operations. Therefore the amortized cost is constant in worst case. You've got two halves of a process that both take N/2 time. If you turn one of those halves into amortized constant time, your total time is still linear. (Maybe in some cases you've made it twice as fast, but that's still linear--but at any rate, in this case, he hasn't really made it twice as fast, because the first half, the linear search, now has twice as much to search.) So he's adding linear chunkiness, without reducing the total time. That isn't the same as a normal dict, which adds only logarithmic chunkiness (it does its extra N work log N times, instead of extra N work N/C times), and in service of reducing the total time from linear to amortized constant. It's possible that there's some other optimization here that the OP didn't include (or that I didn't notice and reply to here) that you've intuitively assumed, which can get us back to constant time. If so, that would be great. (But if there were a way to get constant-time inserts and deletes at arbitrary locations within an array, I suspect nobody would have invented linked lists; am I missing something?) > All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). For example: >>> d = Dict(((1,1), (2,2), (3,3), (4,4))) >>> list(d) [1, 2, 3, 4] >>> del d[1] >>> list(d) [4, 2, 3] And constant time deletes (or moves) without destroying order, which is exactly the problem that the hash of linked list nodes in the current design is there to solve. If you don't care about that problem, you don't need that design, but if you're building a general replacement for OrderedDict, you need it (or something to replace it). Of course an ordered mapping that doesn't support deletions (or moves) could still be useful for lots of things (e.g., most of the time people say they want ordering for a **kw or a class dict, they don't need to delete or move anything). > [1] https://code.activestate.com/recipes/578375-proof-of-concept-for-a-more-space-efficient-faster/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From me at jeltef.nl Sat Dec 12 19:13:49 2015 From: me at jeltef.nl (Jelte Fennema) Date: Sun, 13 Dec 2015 01:13:49 +0100 Subject: [Python-ideas] Dict literal use for custom dict classes Message-ID: I really like the OrderedDict class. But there is one thing that has always bothered me about it. Quite often I want to initialize a small ordered dict. When the keys are all strings this is pretty easy, since you can just use the keyword arguments. But when some, or all of the keys are other things this is an issue. In that case there are two options (as far as I know). If you want an ordered dict of this form for instance: {1: 'a', 4: int, 2: (3, 3)}, you would either have to use: OrderedDict([(1, 'a'), (4, int), (2, (3, 3))]) or you could use: d = OrderedDict() d[1] = 'a' d[4] = int d[2] = (3, 3) In my opinion both are quite verbose and the first is pretty unreadable because of all the nested tuples. That is why I have two suggestions for language additions that fix that. 
The first one is to make the normal dict literal syntax available to custom dict classes, like this:
OrderedDict{1: 'a', 4: int, 2: (3, 3)}

This looks much cleaner in my opinion. As far as I can tell it could simply be implemented as if either of the two above options was used. This would make it available to all custom dict types that implement the two options above.

A second very similar option, which might be cleaner and more useful, is to make this syntax available (only) after initialization. So it could be used like this:
d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)}
d{3: 4, 'a': 'c'}
>>> OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'}

This would allow arguments to the __init__ method as well. And this way it could simply be a shorthand for setting multiple attributes. It might even be used to change multiple values in a list if that is a feature that is wanted.

Lastly I think either of the two suggested options could be used to allow dict comprehensions for custom dict types. But this might require a bit more work (although not much I think).

I'm interested to hear what you guys think.

Jelte

PS. I read part of this thread https://mail.python.org/pipermail/python-ideas/2009-June/thread.html#4916, but that seemed more about making OrderedDict itself a literal. Which is not what I'm suggesting here.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Sat Dec 12 19:27:01 2015
From: guido at python.org (Guido van Rossum)
Date: Sat, 12 Dec 2015 16:27:01 -0800
Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict
In-Reply-To: <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com>
References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com>
 <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com>
 <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com>
Message-ID:

I'm not following this thread all that closely, but am I the only one who thinks that the LRU cache (discussed in another thread) could be implemented with much less code on top of OrderedDict?

Basically whenever you access a key you just move it to the front, and when you add a key when it is already at capacity you delete the first one.

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
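Something along those lines would indeed be only a few lines on top of OrderedDict. A bare-bones sketch (recently used keys migrate to one end, and the oldest entry at the other end is evicted once capacity is exceeded):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        value = self._data[key]
        self._data.move_to_end(key)          # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the oldest entry

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')
cache.put('c', 3)         # 'b' is the least recently used, so it is evicted
print(list(cache._data))  # ['a', 'c']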
Another option, which we have used in the library 'datashape', is to make the class itself subscriptable: R['a':int32, 'b':float64, 'c': string] or if we were to write it like the above example: R[ 'a':int32, 'b':float64, 'c':string, ] This might look like custom syntax; however, this is just using `__getitem__`, `tuple` literals, and `slice` literals in an interesting way to look sort of like a dictionary. One nice property of this syntax is that because readers know that they are creating a tuple, the fact that this is an order-preserving operation is very clear. This code is semantically equivalent to normal numpy code like: `my_array[idx_0_start:idx_0_end, idx_1_start:idx_1_end, idx_2_start:idx_2_end]` Here `R` is a an alias for the class `Record` where `R is Record`. This class has a metaclass that adds a `__getitem__`. This getitem looks for either a single slice or a tuple of slices and then checks the values to make sure they are all valid inputs. This is used to internally construct an `OrderedDict` We hadn't considered the comprehension case; however, you could dispatch the `__getitem__` on a generator that yields tuples to simulate the comprehension. This could look like: R[(k, v) for k, v in other_seq] where inside our `__getitem__` we would add a case like: if isinstance(key, types.GeneratorType): mapping = OrderedDict(key) If you would like to see like to see a code example of the implementation for the `Record` type it is available under the BSD license here: https://github.com/blaze/datashape/blob/master/datashape/coretypes.py#L968 There is another library that I have worked on that adds the ability to overload all of the literals, including dictionaries. This requires CPython >= 3.4 though and is for fun only so I would not recommend using this in a production setting. I am merely mentioning this to show another possible syntax for this. @ordereddict_literals def f(): return {'a': 1, 'b': 2, 'c': 3} >>> f() OrderedDict([('a', 1), ('b', 2), ('c', 3)]) This code is available under the GPLv2 license here: https://github.com/llllllllll/codetransformer/blob/master/codetransformer/transformers/literals.py#L15 On Sat, Dec 12, 2015 at 7:13 PM, Jelte Fennema wrote: > I really like the OrderedDict class. But there is one thing that has > always bothered me about it. Quite often I want to initialize a small > ordered dict. When the keys are all strings this is pretty easy, since you > can just use the keyword arguments. But when some, or all of the keys are > other things this is an issue. In that case there are two options (as far > as I know). If you want an ordered dict of this form for instance: {1: 'a', > 4: int, 2: (3, 3)}, you would either have to use: > OrderedDict([(1, 'a'), (4, int), (2, (3, 3))]) > > or you could use: > d = OrderedDict() > d[1] = 'a' > d[4] = int > d[2] = (3, 3) > > In my opinion both are quite verbose and the first is pretty unreadable > because of all the nested tuples. That is why I have two suggestions for > language additions that fix that. > The first one is the normal dict literal syntax available to custom dict > classes like this: > OrderedDict{1: 'a', 4: int, 2: (3, 3)} > > This looks much cleaner in my opinion. As far as I can tell it could > simply be implemented as if the either of the two above options was used. > This would make it available to all custom dict types that implement the > two options above. > > A second very similar option, which might be cleaner and more useful, is > to make this syntax available (only) after initialization. 
So it could be > used like this: > d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)} > d{3: 4, 'a': 'c'} > *>>> *OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'} > > This would allow arguments to the __init__ method as well. And this way > it could simply be a shorthand for setting multiple attributes. It might > even be used to change multiple values in a list if that is a feature that > is wanted. > > Lastly I think either of the two sugested options could be used to allow > dict comprehensions for custom dict types. But this might require a bit > more work (although not much I think). > > I'm interested to hear what you guys think. > > Jelte > > PS. I read part of this thread > https://mail.python.org/pipermail/python-ideas/2009-June/thread.html#4916, > but that seemed more about making OrderedDict itself a literal. Which is > not what I'm suggesting here. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ryan at ryanhiebert.com Sat Dec 12 19:44:12 2015 From: ryan at ryanhiebert.com (Ryan Hiebert) Date: Sat, 12 Dec 2015 18:44:12 -0600 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: <2B70F408-1967-414A-BF2C-8D2807EA7D91@ryanhiebert.com> > On Dec 12, 2015, at 18:13, Jelte Fennema wrote: > > I really like the OrderedDict class. But there is one thing that has always bothered me about it. Quite often I want to initialize a small ordered dict. When the keys are all strings this is pretty easy, since you can just use the keyword arguments. I don't think this will work. Python uses a dict to pass in kwargs, so you've already lost ordering at that point, if I'm right. > [...] I have two suggestions for language additions that fix that. > The first one is the normal dict literal syntax available to custom dict classes like this: > OrderedDict{1: 'a', 4: int, 2: (3, 3)} My first preference would be for the proposals for dict to become ordered by default to come to fruition. If that turns out to be unacceptable (even though, as I understand it, PyPy has already been able to do this), then this looks pretty good to me. It's an alternate call operator that uses dict-like syntax to pass in a tuple of pairs. > > [...] > > A second very similar option, which might be cleaner and more useful, is to make this syntax available (only) after initialization. So it could be used like this: > d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)} > d{3: 4, 'a': 'c'} > >>> OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'} IMO, this is strictly worse than the other option. It's confusing that the call operator wouldn't happen before this dict-like call. Either would be unneeded if dict was ordered by default, and it's far and away the best option, IMO. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 12 19:59:07 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 12 Dec 2015 16:59:07 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: On Sat, Dec 12, 2015 at 4:53 PM, Joseph Jevnik wrote: > [...]
> This code is available under the GPLv2 license here: > https://github.com/llllllllll/codetransformer/blob/master/codetransformer/transformers/literals.py#L15 > Good thing you recommended against its use :-), because we can't use GPL contributions for Python. Quoting the Python license: "All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source." (https://docs.python.org/3/license.html . -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Dec 12 20:03:10 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 13 Dec 2015 03:03:10 +0200 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> Message-ID: On 13.12.15 02:06, Andrew Barnert via Python-ideas wrote: > On Dec 12, 2015, at 15:20, Serhiy Storchaka wrote: >>> On 13.12.15 00:11, Andrew Barnert via Python-ideas wrote: >>>> On Dec 12, 2015, at 04:19, Serhiy Storchaka wrote: >>>>>> On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote: >>>>>> On Dec 12, 2015, at 01:27, Franklin? Lee wrote: >>>>>> 2. You don't have to shift for each deletion. You can wait until some >>>>>> threshold is reached before shifting, and maybe that will spread out >>>>>> the cost of shift among deletions enough to make an impact. >>>>> >>>>> I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. >>>>> >>>>> Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose? >>>> >>>> All this is true for ordinal dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need resize the storage and fill new storage with O(N) complexity. Due to unpredictability of hashes of strings and pointers, the number of collisions an resizings is not predicable, and a time of work can significantly vary from run to run. >>> >>> That's not the same at all. The benefit of dict isn't that it's amortized, it's that it's amortized _constant_ instead of linear, which is a huge improvement, worth a bit of chunkiness. Going from linear to amortized linear, as the OP proposes, doesn't get you anything good for the cost. >>> >>> I think you may be forgetting that dict doesn't rehash "from time to time" as this proposal does, it only does it when you grow. And it expands exponentially rather than linearly, so even in the special case where 100% of your operations are inserts, it's still only going to happen a few times. >> >> Either you or me misunderstood the OP proposition. 
An array of indices needs to be "compacted" (that costs O(n)) only after at least O(n) addition/deletion/moving operations. Therefore the amortized cost is constant in worst case. > > You've got two halves of a process that both take N/2 time. If you turn one of those halves into amortized constant time, your total time is still linear. (Maybe in some cases you've made it twice as fast, but that's still linear--but at any rate, in this case, he hasn't really made it twice as fast, because the first half, the linear search, now has twice as much to search.) So he's adding linear chunkiness, without reducing the total time. What is linear time about what you are saying? I don't see anything that has no amortized constant complexity. What I have missed? >> All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] > > But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). This is only because preserving ordering is not a goal for a dict. Moving the last entry to the place of deleted one is the simplest design if don't care about ordering. But it is not hard to make the ordering be preserved (at the cost of larger amortized constant time and memory usage). Deleted entry should be just marked as deleted, and the storage should be packed only when a half (or 1/3, or 1/4) of entries is deleted. From storchaka at gmail.com Sat Dec 12 20:16:00 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 13 Dec 2015 03:16:00 +0200 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> Message-ID: On 13.12.15 02:27, Guido van Rossum wrote: > I'm not following this thread all that closely, but am I the only one > who thinks that the LRU cache (discussed in another thread) could be > implemented with much less code on top of OrderedDict? Basically > whenever you access a key you just move it to the front, and when you > add a key when it is already at capacity you delete the first one. I have doubts about this. First, an ordered dictionary is only a part of the LRU cache implementation, using OrderedDict wouldn't make the code much less. Second, the LRU cache needs only small and very limited part of OrderedDict, with different requirements to reentrancy, thread-safety, and errors handling (partially more strong, partially more lenient). The code currently used by the LRU cache is much simpler, likely faster, and more bug-free. From joejev at gmail.com Sat Dec 12 20:25:35 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Sat, 12 Dec 2015 20:25:35 -0500 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: The two suggestions I made were for users to put in their own code _without_ needing to upstream it into CPython. I realize that the python license is incompatible with the GPL. On Sat, Dec 12, 2015 at 7:59 PM, Guido van Rossum wrote: > On Sat, Dec 12, 2015 at 4:53 PM, Joseph Jevnik wrote: > >> [...] >> This code is available under the GPLv2 license here: >> https://github.com/llllllllll/codetransformer/blob/master/codetransformer/transformers/literals.py#L15 >> > > Good thing you recommended against its use :-), because we can't use GPL > contributions for Python.
Quoting the Python license: "All Python licenses, > unlike the GPL, let you distribute a modified version without making your > changes open source." (https://docs.python.org/3/license.html . > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Sat Dec 12 20:32:32 2015 From: barry at python.org (Barry Warsaw) Date: Sat, 12 Dec 2015 20:32:32 -0500 Subject: [Python-ideas] Dict literal use for custom dict classes References: Message-ID: <20151212203232.09dfe4e7@anarchist.wooz.org> On Dec 12, 2015, at 08:25 PM, Joseph Jevnik wrote: >I realize that the python license is incompatible with the GPL. Not technically correct, and not what Guido is saying. The current PSF license is indeed compatible with the GPL: https://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses it's just that the PSF cannot accept contributions licensed under the terms of the GPL. https://www.python.org/psf/contrib/ Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From steve at pearwood.info Sat Dec 12 22:24:16 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 13 Dec 2015 14:24:16 +1100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: <20151213032416.GN3821@ando.pearwood.info> On Sun, Dec 13, 2015 at 01:13:49AM +0100, Jelte Fennema wrote: > I really like the OrderedDict class. But there is one thing that has always > bothered me about it. Quite often I want to initialize a small ordered > dict. When the keys are all strings this is pretty easy, since you can just > use the keyword arguments. But when some, or all of the keys are other > things this is an issue. In that case there are two options (as far as I > know). If you want an ordered dict of this form for instance: {1: 'a', 4: > int, 2: (3, 3)}, you would either have to use: > OrderedDict([(1, 'a'), (4, int), (2, (3, 3))]) > > or you could use: > d = OrderedDict() > d[1] = 'a' > d[4] = int > d[2] = (3, 3) > > In my opinion both are quite verbose and the first is pretty unreadable > because of all the nested tuples. You have a rather strict view of "unreadable" :-) Some alternatives if you dislike the look of the above: # Option 3: d = OrderedDict() for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): d[key] = value # Option 4: d = OrderedDict([(1, 'a'), (4, int)]) # The pretty values. d[2] = (3, 3) # The ugly nested tuple at the end. So there's no shortage of work-arounds for the lack of nice syntax for creating ordered dicts. And besides, why single out OrderedDict? There are surely people out there using ordered mappings like RedBlackTree that have to deal with this same issue. Perhaps what we need is to stop focusing on a specific dictionary type, and think about the lowest common denominator for any mapping. And that, I believe, is a list of (key, value) tuples. See below. > That is why I have two suggestions for > language additions that fix that. > The first one is the normal dict literal syntax available to custom dict > classes like this: > OrderedDict{1: 'a', 4: int, 2: (3, 3)} I don't understand what that syntax is supposed to do. Obviously it creates an OrderedDict, but you haven't explained the details. Is the prefix "OrderedDict" hard-coded in the parser/lexer, like the b prefix for byte-strings and r prefix for raw strings? 
In that case, I think that's pretty verbose, and would prefer to see something shorter: o{1: 'a', 4: int, 2: (3, 3)} perhaps. If OrderedDict is important enough to get its own syntax, it's important enough to get its own *short* syntax. That was my preferred solution, but it no longer is. Or is the prefix "OrderedDict" somehow looked-up at run-time? So we could write something like: spam = random.choice(list_of_callables) result = spam{1: 'a', 4: int, 2: (3, 3)} and spam would be called, whatever it happens to be, with a single list argument: [(1, 'a'), (4, int), (2, (3, 3))] What puts me off this solution is that it is syntactic sugar for not one but two distinct operations: - sugar for creating a list of tuples; - and sugar for a function call. But if we had the first, we don't need the second, and we don't need to treat OrderedDict as a special case. We could use any mapping: MyDict(sugar) OrderedDict(sugar) BinaryTree(sugar) and functions that aren't mappings at all, but expect lists of (a, b) tuples: covariance(sugar) > This looks much cleaner in my opinion. As far as I can tell it could simply > be implemented as if the either of the two above options was used. This > would make it available to all custom dict types that implement the two > options above. > > A second very similar option, which might be cleaner and more useful, is to > make this syntax available (only) after initialization. So it could be used > like this: > d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)} > d{3: 4, 'a': 'c'} > *>>> *OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'} What does that actually do, in detail? Does it call d.__getitem__(key, value) repeatedly? So I could do something like this: L = [None]*10 L{1: 'a', 3: 'b', 5: 'c', 7: 'd', 9: 'e'} assert L == [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] If we had nice syntax for creating ordered dict literals, would we want this feature? I don't think so. It must be pretty rare to want something like that (at least, I can't remember the last time I did) and when we do, we can often do it with slicing: py> L = [None]*10 py> L[1::2] = 'abcde' py> L [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] > This would allow arguments to the __init__ method as well. How? You said that this option was only available after initialization. > And this way it could simply be a shorthand for setting multiple attributes. How does the reader (or the interpreter) tell when d{key: value} means "call __setitem__" and when it means "call __setattr__"? > It might even > be used to change multiple values in a list if that is a feature that is > wanted. > > Lastly I think either of the two sugested options could be used to allow > dict comprehensions for custom dict types. But this might require a bit > more work (although not much I think). > > I'm interested to hear what you guys think. I think that there is a kernel of a good idea in this. Let's go back to the idea of syntactic sugar for a list of tuples. The user can then call the function or class of their choice, they aren't limited to just one mapping type. I'm going to suggest [key:value] as syntax. Now your original example becomes: d = OrderedDict([1: 'a', 4: int, 2: (3, 3)]) which breaks up the triple ))) at the end, so hopefully you will not think its ugly. 
Also, we're not limited to just calling the constructor, it could be any method: d.update([2: None, 1: 'b', 5: 99.9]) or anywhere at all: x = [2: None, 1: 'b', 5: 99.9, 1: 'a', 4: int, 2: (3, 3)] + items # shorter and less error-prone than: x = ( [(2, None), (1, 'b'), (5, 99.9), (1, 'a'), (4, int), (2, (3, 3))] + values ) There could be a comprehension form: [key: value for x in seq if condition] similar to the dict comprehension form. -- Steve From abarnert at yahoo.com Sat Dec 12 23:57:58 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 12 Dec 2015 20:57:58 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> Message-ID: <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> On Dec 12, 2015, at 17:03, Serhiy Storchaka wrote: > >> On 13.12.15 02:06, Andrew Barnert via Python-ideas wrote: >> On Dec 12, 2015, at 15:20, Serhiy Storchaka wrote: >>>>> On 13.12.15 00:11, Andrew Barnert via Python-ideas wrote: >>>>> On Dec 12, 2015, at 04:19, Serhiy Storchaka wrote: >>>>>>> On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote: >>>>>>> On Dec 12, 2015, at 01:27, Franklin? Lee wrote: >>>>>>> 2. You don't have to shift for each deletion. You can wait until some >>>>>>> threshold is reached before shifting, and maybe that will spread out >>>>>>> the cost of shift among deletions enough to make an impact. >>>>>> >>>>>> I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. >>>>>> >>>>>> Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose? >>>>> >>>>> All this is true for ordinal dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need resize the storage and fill new storage with O(N) complexity. Due to unpredictability of hashes of strings and pointers, the number of collisions an resizings is not predicable, and a time of work can significantly vary from run to run. >>>> >>>> That's not the same at all. The benefit of dict isn't that it's amortized, it's that it's amortized _constant_ instead of linear, which is a huge improvement, worth a bit of chunkiness. Going from linear to amortized linear, as the OP proposes, doesn't get you anything good for the cost. >>>> >>>> I think you may be forgetting that dict doesn't rehash "from time to time" as this proposal does, it only does it when you grow. And it expands exponentially rather than linearly, so even in the special case where 100% of your operations are inserts, it's still only going to happen a few times. >>> >>> Either you or me misunderstood the OP proposition. An array of indices needs to be "compacted" (that costs O(n)) only after at least O(n) addition/deletion/moving operations. 
Therefore the amortized cost is constant in worst case. >> >> You've got two halves of a process that both take N/2 time. If you turn one of those halves into amortized constant time, your total time is still linear. (Maybe in some cases you've made it twice as fast, but that's still linear--but at any rate, in this case, he hasn't really made it twice as fast, because the first half, the linear search, now has twice as much to search.) So he's adding linear chunkiness, without reducing the total time. > > What is linear time about what you are saying? I don't see anything that has no amortized constant complexity. What I have missed? Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time. He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here. He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below. What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations. >>> All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] >> >> But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). > > This is only because preserving ordering is not a goal for a dict. Sure. But it is the goal for OrderedDict. Which is exactly why a design that makes sense for dict doesn't necessarily make sense for OrderedDict, unless you add something else. The hash table of list nodes works as such a something else. But Raymond's array doesn't (and isn't intended to). > Moving the last entry to the place of deleted one is the simplest design if don't care about ordering. > > But it is not hard to make the ordering be preserved (at the cost of larger amortized constant time and memory usage). Deleted entry should be just marked as deleted, and the storage should be packed only when a half (or 1/3, or 1/4) of entries is deleted. The entire benefit of Raymond's design is that it lets you store, and iterate, a dense array instead of a sparse one. If you're going to make that array every sparser than the hash table, you've lost all of the benefits. (But you're still paying the cost of an extra index or pointer, extra dereference, etc.) From abarnert at yahoo.com Sun Dec 13 00:15:58 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 12 Dec 2015 21:15:58 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <20151213032416.GN3821@ando.pearwood.info> References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: <99F30521-D40D-4CC7-AB33-F5F159477C0E@yahoo.com> On Dec 12, 2015, at 19:24, Steven D'Aprano wrote: > >> On Sun, Dec 13, 2015 at 01:13:49AM +0100, Jelte Fennema wrote: >> I really like the OrderedDict class. But there is one thing that has always >> bothered me about it. 
Quite often I want to initialize a small ordered >> dict. When the keys are all strings this is pretty easy, since you can just >> use the keyword arguments. But when some, or all of the keys are other >> things this is an issue. In that case there are two options (as far as I >> know). If you want an ordered dict of this form for instance: {1: 'a', 4: >> int, 2: (3, 3)}, you would either have to use: >> OrderedDict([(1, 'a'), (4, int), (2, (3, 3))]) >> >> or you could use: >> d = OrderedDict() >> d[1] = 'a' >> d[4] = int >> d[2] = (3, 3) >> >> In my opinion both are quite verbose and the first is pretty unreadable >> because of all the nested tuples. > > You have a rather strict view of "unreadable" :-) > > Some alternatives if you dislike the look of the above: > > # Option 3: > d = OrderedDict() > for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): > d[key] = value > > # Option 4: > d = OrderedDict([(1, 'a'), (4, int)]) # The pretty values. > d[2] = (3, 3) # The ugly nested tuple at the end. > > So there's no shortage of work-arounds for the lack of nice syntax for > creating ordered dicts. > > And besides, why single out OrderedDict? There are surely people out > there using ordered mappings like RedBlackTree that have to deal with > this same issue. Perhaps what we need is to stop focusing on a specific > dictionary type, and think about the lowest common denominator for any > mapping. And that, I believe, is a list of (key, value) tuples. See > below. > > > >> That is why I have two suggestions for >> language additions that fix that. >> The first one is the normal dict literal syntax available to custom dict >> classes like this: >> OrderedDict{1: 'a', 4: int, 2: (3, 3)} > > I don't understand what that syntax is supposed to do. > > Obviously it creates an OrderedDict, but you haven't explained the > details. Is the prefix "OrderedDict" hard-coded in the parser/lexer, > like the b prefix for byte-strings and r prefix for raw strings? In that > case, I think that's pretty verbose, and would prefer to see something > shorter: > > o{1: 'a', 4: int, 2: (3, 3)} > > perhaps. If OrderedDict is important enough to get its own syntax, it's > important enough to get its own *short* syntax. That was my preferred > solution, but it no longer is. > > Or is the prefix "OrderedDict" somehow looked-up at run-time? So we > could write something like: > > spam = random.choice(list_of_callables) > result = spam{1: 'a', 4: int, 2: (3, 3)} > > and spam would be called, whatever it happens to be, with a single > list argument: > > [(1, 'a'), (4, int), (2, (3, 3))] > > > What puts me off this solution is that it is syntactic sugar for not one > but two distinct operations: > > - sugar for creating a list of tuples; > - and sugar for a function call. > > But if we had the first, we don't need the second, and we don't need to > treat OrderedDict as a special case. We could use any mapping: > > MyDict(sugar) > OrderedDict(sugar) > BinaryTree(sugar) > > and functions that aren't mappings at all, but expect lists of (a, b) > tuples: > > covariance(sugar) > > >> This looks much cleaner in my opinion. As far as I can tell it could simply >> be implemented as if the either of the two above options was used. This >> would make it available to all custom dict types that implement the two >> options above. >> >> A second very similar option, which might be cleaner and more useful, is to >> make this syntax available (only) after initialization. 
So it could be used >> like this: >> d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)} >> d{3: 4, 'a': 'c'} >> *>>> *OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'} > > What does that actually do, in detail? Does it call d.__getitem__(key, > value) repeatedly? So I could do something like this: > > L = [None]*10 > L{1: 'a', 3: 'b', 5: 'c', 7: 'd', 9: 'e'} > assert L == [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] > > If we had nice syntax for creating ordered dict literals, would we want > this feature? I don't think so. It must be pretty rare to want something > like that (at least, I can't remember the last time I did) and when we > do, we can often do it with slicing: > > py> L = [None]*10 > py> L[1::2] = 'abcde' > py> L > [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] > > > >> This would allow arguments to the __init__ method as well. > > How? You said that this option was only available after > initialization. > > >> And this way it could simply be a shorthand for setting multiple attributes. > > How does the reader (or the interpreter) tell when > > d{key: value} > > means "call __setitem__" and when it means "call __setattr__"? > > > >> It might even >> be used to change multiple values in a list if that is a feature that is >> wanted. >> >> Lastly I think either of the two sugested options could be used to allow >> dict comprehensions for custom dict types. But this might require a bit >> more work (although not much I think). >> >> I'm interested to hear what you guys think. > > I think that there is a kernel of a good idea in this. Let's go back to > the idea of syntactic sugar for a list of tuples. The user can then call > the function or class of their choice, they aren't limited to just one > mapping type. > > I'm going to suggest [key:value] as syntax This does seem to be the obvious syntax: if [1, 2, 3] is a list and {1, 2, 3} is a set, and {1: 2, 3: 4} is a dict, then [1: 2, 3: 4] should be something that bears the same relationship to dict as list does to set: an a-list. (And we don't even have the {} ambiguity problem with [], because an a-list is the same type as a list, and no pairs is the same value as no elements.) And I think there's some precedent here. IIRC, in YAML, {1:2, 3:4} is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think it's ambiguous between an ordered dict and a list of pairs, and you can resolve that by declaring !odict or !seq, or you can just leave it up to the implementation to pick one if you don't care... but let's pretend it wasn't ambiguous; either one covers the use case (and Python only has the latter option anyway, unless OrderedDict becomes a builtin). And it's definitely readable in your examples. > . Now your original example > becomes: > > d = OrderedDict([1: 'a', 4: int, 2: (3, 3)]) > > which breaks up the triple ))) at the end, so hopefully you will not > think its ugly. Also, we're not limited to just calling the constructor, > it could be any method: > > d.update([2: None, 1: 'b', 5: 99.9]) > > or anywhere at all: > > x = [2: None, 1: 'b', 5: 99.9, 1: 'a', 4: int, 2: (3, 3)] + items > > # shorter and less error-prone than: > x = ( > [(2, None), (1, 'b'), (5, 99.9), (1, 'a'), (4, int), (2, (3, 3))] > + values > ) > > > There could be a comprehension form: > > [key: value for x in seq if condition] > > similar to the dict comprehension form. 
> > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From songofacandy at gmail.com Sun Dec 13 00:25:45 2015 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 13 Dec 2015 14:25:45 +0900 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> Message-ID: > >>> All this looks similar to Raymond's proof-of-concept for a compact > dictionary (ordering is a side effect). [1] > >> > >> But his design only preserves ordering if you never delete (or move, > but he doesn't have an API for that). > > > > This is only because preserving ordering is not a goal for a dict. > > Sure. But it is the goal for OrderedDict. Which is exactly why a design > that makes sense for dict doesn't necessarily make sense for OrderedDict, > unless you add something else. The hash table of list nodes works as such a > something else. But Raymond's array doesn't (and isn't intended to). > > FWIW, I think PyPy's dict implementation is an advanced version of Raymond's. More compact and preserves order on deletion. http://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From me at jeltef.nl Sun Dec 13 05:00:24 2015 From: me at jeltef.nl (Jelte Fennema) Date: Sun, 13 Dec 2015 11:00:24 +0100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <20151213032416.GN3821@ando.pearwood.info> References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: I think you are right in suggesting that the whole problem goes away when there is a nice way to specify a list of tuples. Since I indeed cannot think of a moment where my previous syntax cannot be replaced by this one. Another option would be that this syntax would not represent a list of tuples but an OrderedDict. I think they both have their advantages, the list of tuples would allow list operations such as `+ items` as you suggested and usage of the same keys multiple times. But OrderedDict would allow simple indexing directly. But it does not really matter that much since both could easily be used to generate the other. OrderedDict(['1':'2']) and list(['1':'2'].items()) respectively. I think the main case for the list of tuples is actually that you can make any OrderedDict from a list of tuples, but not the other way around, since duplicate keys would be removed. Which is why I like your idea for a shorthand for a list of tuples better, since it covers more uses. One important thing to note is the discussion I already mentioned in my first email. Especially this message where Guido votes -100 for your syntax for OrderedDict creation: https://mail.python.org/pipermail/python-ideas/2009-June/004924.html I'm not sure why he disliked that syntax and if he still does. Or if his thoughts are different when it would represent a list of tuples instead of an OrderedDict. On 13 December 2015 at 04:24, Steven D'Aprano wrote: > On Sun, Dec 13, 2015 at 01:13:49AM +0100, Jelte Fennema wrote: > > I really like the OrderedDict class.
But there is one thing that has > always > > bothered me about it. Quite often I want to initialize a small ordered > > dict. When the keys are all strings this is pretty easy, since you can > just > > use the keyword arguments. But when some, or all of the keys are other > > things this is an issue. In that case there are two options (as far as I > > know). If you want an ordered dict of this form for instance: {1: 'a', 4: > > int, 2: (3, 3)}, you would either have to use: > > OrderedDict([(1, 'a'), (4, int), (2, (3, 3))]) > > > > or you could use: > > d = OrderedDict() > > d[1] = 'a' > > d[4] = int > > d[2] = (3, 3) > > > > In my opinion both are quite verbose and the first is pretty unreadable > > because of all the nested tuples. > > You have a rather strict view of "unreadable" :-) > > Some alternatives if you dislike the look of the above: > > # Option 3: > d = OrderedDict() > for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): > d[key] = value > > # Option 4: > d = OrderedDict([(1, 'a'), (4, int)]) # The pretty values. > d[2] = (3, 3) # The ugly nested tuple at the end. > > So there's no shortage of work-arounds for the lack of nice syntax for > creating ordered dicts. > > And besides, why single out OrderedDict? There are surely people out > there using ordered mappings like RedBlackTree that have to deal with > this same issue. Perhaps what we need is to stop focusing on a specific > dictionary type, and think about the lowest common denominator for any > mapping. And that, I believe, is a list of (key, value) tuples. See > below. > > > > > That is why I have two suggestions for > > language additions that fix that. > > The first one is the normal dict literal syntax available to custom dict > > classes like this: > > OrderedDict{1: 'a', 4: int, 2: (3, 3)} > > I don't understand what that syntax is supposed to do. > > Obviously it creates an OrderedDict, but you haven't explained the > details. Is the prefix "OrderedDict" hard-coded in the parser/lexer, > like the b prefix for byte-strings and r prefix for raw strings? In that > case, I think that's pretty verbose, and would prefer to see something > shorter: > > o{1: 'a', 4: int, 2: (3, 3)} > > perhaps. If OrderedDict is important enough to get its own syntax, it's > important enough to get its own *short* syntax. That was my preferred > solution, but it no longer is. > > Or is the prefix "OrderedDict" somehow looked-up at run-time? So we > could write something like: > > spam = random.choice(list_of_callables) > result = spam{1: 'a', 4: int, 2: (3, 3)} > > and spam would be called, whatever it happens to be, with a single > list argument: > > [(1, 'a'), (4, int), (2, (3, 3))] > > > What puts me off this solution is that it is syntactic sugar for not one > but two distinct operations: > > - sugar for creating a list of tuples; > - and sugar for a function call. > > But if we had the first, we don't need the second, and we don't need to > treat OrderedDict as a special case. We could use any mapping: > > MyDict(sugar) > OrderedDict(sugar) > BinaryTree(sugar) > > and functions that aren't mappings at all, but expect lists of (a, b) > tuples: > > covariance(sugar) > > > > This looks much cleaner in my opinion. As far as I can tell it could > simply > > be implemented as if the either of the two above options was used. This > > would make it available to all custom dict types that implement the two > > options above. 
> > > > A second very similar option, which might be cleaner and more useful, is > to > > make this syntax available (only) after initialization. So it could be > used > > like this: > > d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)} > > d{3: 4, 'a': 'c'} > > *>>> *OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'} > > What does that actually do, in detail? Does it call d.__getitem__(key, > value) repeatedly? So I could do something like this: > > L = [None]*10 > L{1: 'a', 3: 'b', 5: 'c', 7: 'd', 9: 'e'} > assert L == [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] > > If we had nice syntax for creating ordered dict literals, would we want > this feature? I don't think so. It must be pretty rare to want something > like that (at least, I can't remember the last time I did) and when we > do, we can often do it with slicing: > > py> L = [None]*10 > py> L[1::2] = 'abcde' > py> L > [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] > > > > > This would allow arguments to the __init__ method as well. > > How? You said that this option was only available after > initialization. > > > > And this way it could simply be a shorthand for setting multiple > attributes. > > How does the reader (or the interpreter) tell when > > d{key: value} > > means "call __setitem__" and when it means "call __setattr__"? > > > > > It might even > > be used to change multiple values in a list if that is a feature that is > > wanted. > > > > Lastly I think either of the two sugested options could be used to allow > > dict comprehensions for custom dict types. But this might require a bit > > more work (although not much I think). > > > > I'm interested to hear what you guys think. > > I think that there is a kernel of a good idea in this. Let's go back to > the idea of syntactic sugar for a list of tuples. The user can then call > the function or class of their choice, they aren't limited to just one > mapping type. > > I'm going to suggest [key:value] as syntax. Now your original example > becomes: > > d = OrderedDict([1: 'a', 4: int, 2: (3, 3)]) > > which breaks up the triple ))) at the end, so hopefully you will not > think its ugly. Also, we're not limited to just calling the constructor, > it could be any method: > > d.update([2: None, 1: 'b', 5: 99.9]) > > or anywhere at all: > > x = [2: None, 1: 'b', 5: 99.9, 1: 'a', 4: int, 2: (3, 3)] + items > > # shorter and less error-prone than: > x = ( > [(2, None), (1, 'b'), (5, 99.9), (1, 'a'), (4, int), (2, (3, 3))] > + values > ) > > > There could be a comprehension form: > > [key: value for x in seq if condition] > > similar to the dict comprehension form. > > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lac at openend.se Sun Dec 13 06:43:48 2015 From: lac at openend.se (Laura Creighton) Date: Sun, 13 Dec 2015 12:43:48 +0100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: <201512131143.tBDBhmkH026966@fido.openend.se> I care about readability but I find: d = OrderedDict() for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): d[key] = value quite readable. 
Laura

From Stephan.Sahm at gmx.de Fri Dec 11 17:15:40 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Fri, 11 Dec 2015 23:15:40 +0100 Subject: [Python-ideas] generic Liftable abc-mixin breaks at MRO In-Reply-To: References: Message-ID:

I now created both, and in fact the code for Liftable-signal seems much cleaner.
Nevertheless, the code is rather long for this thread, but it might be useful for someone

import abc
import inspect
from contextlib import contextmanager

def use_as_needed(func, kwargs):
    meta = inspect.getargspec(func)
    if meta.keywords is not None:
        return func(**kwargs)
    else:
        # not generic super-constructor - pick only the relevant subentries:
        return func(**{k:kwargs[k] for k in kwargs if k in meta.args})

class NotLiftable(RuntimeError):
    pass

@contextmanager
def super_liftable(cls, self):
    """ this is kind of a hack to replace super.super, however I haven't found any other nice way to do it """
    if cls is object:
        raise NotLiftable()
    liftables = [l for l in cls.__bases__ if type(l).__name__ == "Liftable"]
    if not liftables:
        raise NotLiftable()

    orig_class = self.__class__
    self.__class__ = liftables[0]
    yield self
    self.__class__ = orig_class

def LiftableFrom(base_cls_name):

    class Liftable(type):
        def __init__(cls, name, bases, dct):
            # for base_cls nothing should be done, as this is the one to refer to by Lifting
            if not cls.__name__ == base_cls_name:
                if "__init__" in dct:
                    raise TypeError("Descendents of Liftable are not allowed to have own __init__ method. Instead overwrite __initialize__")

                def lifted__init__(self, **kwargs):
                    with super_liftable(cls, self) as s:
                        use_as_needed(s.__init__, kwargs)
                    if hasattr(self, "__initialize__"):
                        use_as_needed(self.__initialize__, kwargs)

                cls.__init__ = lifted__init__
                #setattr(cls, "__init__", lifted__init__)

            super(Liftable, cls).__init__(name, bases, dct)

    Liftable.base_cls_name = base_cls_name
    #Liftable.__name__ = "LiftableFrom" + base_cls_name   # to show that this is possible
    return Liftable

def lift(self, new_class, **kwargs):
    #TODO adapt to work with both definitions above
    # Stop Conditions:
    if self.__class__ is new_class:
        return  # nothing to do
    elif new_class is object:  # Base Case
        # break recursion at once:
        raise NotLiftable()

    ls = [l for l in new_class.__bases__ if type(l).__name__ == "Liftable"]
    if not ls:
        raise NotLiftable()

    # recursive case:
    if not self.__class__ is ls[0]:  # it would also be possible to use tree like left-first-search here
        lift(self, ls[0], **kwargs)
    # own case:
    self.__class__ = new_class
    use_as_needed(self.__initialize__, kwargs)

The least beautiful thing needed is to give the name of the base-class (in this case "A") as an additional parameter for the lift-meta-class. I haven't found a way to access A.__name__ directly or even better automatically get the class where this meta-class was originally inserted.

A second point is that I had to use an own version of super(), however this works like a charm as far as I can see. The Metaclass ensures that any child must not have a __init__ but instead can only use __initialize__ like a replacement.

Here an example which works:

class A(object):
    __metaclass__ = LiftableFrom("A")
    def __init__(self, a):
        self.a = a

class B(A):
    def __initialize__(self, b):
        print "initialize b"
        self.b = b

class C(B):
    def __initialize__(self, c):
        print "initialize c"
        self.c = c

a = A(a=1)
a.a
# 1

lift(a, C, b=2, c=3)
print type(a)
print a.a, a.b, a.c

#initialize b
#initialize c
#
#1 2 3

cheers, Stephan
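(An aside for readers skimming the machinery above: stripped of the Liftable helpers, "lifting" ultimately rests on plain __class__ reassignment, which already works in current Python. A minimal sketch with invented Point/LabeledPoint names -- not part of Stephan's code:)

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class LabeledPoint(Point):
    def add_label(self, label):
        # extra state that is only filled in after the "lift"
        self.label = label

p = Point(1, 2)
p.__class__ = LabeledPoint      # re-type the existing instance in place
p.add_label("origin-ish")
assert isinstance(p, LabeledPoint) and (p.x, p.y, p.label) == (1, 2, "origin-ish")

Everything else in the recipe above is about making sure __initialize__ is called with the right arguments at each step of that reassignment chain.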
On 10 December 2015 at 22:05, Stephan Sahm wrote: > Dear Andrew, > > thank you very much for this impressively constructive response. It is for > sure more constructive than I can react on now. > > For the concrete usecase, the Liftable-signal in might be the most > interesting option, as then already the base class can inherit the > Liftable-signal and can itself already use lift. > However, I cannot see how to make the __init__ method conform in this > setting, but by inidividual implementations (I in fact thought that > enforcing it by the mixin makes things safer, and it of course should > reduce boilerplate code) > > The Liftable(T) in fact seems also great, as I cannot see how to avoid > this lifting from a false class in the Liftable-signal chain. I only want > to Lift from one class at the moment, so this is in fact kind of what I was > after. The rough outline would look like > > class B(Lift(A)): > pass > class C(Lift(B)): > pass > > > which seems rather beautiful to read - thank you very much for pointing > this out. > > If you have an idea how to automatically create the right __init__ method > when using the Liftable-signal-chain, I would highly welcome it. > > I myself need to recap some metaclass basics again before seriously > tackling this. > Best, > Stephan > > > > On 10 December 2015 at 19:38, Andrew Barnert wrote: > >> On Dec 10, 2015, at 00:58, Stephan Sahm wrote: >> >> Dear all, >> >> I think I found a crucial usecase where the standard MRO does not work >> out. I would appreciate your help to still solve this usecase, or might MRO >> even be adapted? >> >> >> First, do you have an actual use case for this? And are you really >> looking to suggest changes for Python 3.6, or looking for help with using >> Python 2.7 as-is? >> >> Anyway, I think the first problem here is that you're trying to put the >> same class, Liftable, on the MRO twice. That doesn't make sense--the whole >> point of superclass linearization is that each class only appears once in >> the list. >> >> If you weren't using a metaclass, you wouldn't see this error--but then >> you'd just get the more subtle problem that C can't be lifted from A to B >> because Liftable isn't getting called there. >> >> If you made Liftable a class factory, so two calls to Liftable() returned >> different class objects, then you might be able to make this work. (I >> suppose you could hide that from the user by giving Liftable a custom >> metaclass that constructs new class objects for each copy of Liftable in >> the bases list before calling through to type, but that seems like magic >> you really don't want to hide if you want anyone to be able to debut this >> code.) >> >> In fact, you could even make it Liftable(T), which makes your type >> Liftable from T, rather than from whatever class happens to come after you >> on the MRO chain. (Think about how this would work with other mixins, or >> pure-interface ABCs, or full multiple inheritance--you may end up declaring >> that C can be lifted from Sequence rather than B, which is nonsense, and >> which will be hard to debug if you don't understand the C3 algorithm.) >> >> Or, if you actually _want_ to be liftable from whatever happens to come >> next, then isn't liftability a property of the entire tree of classes, not >> of individual classes in that tree, so you should only be specifying >> Liftable once (either at A, or at B) in the hierarchy in the first place? 
>> From what I can tell, the only benefit you get from installing it twice is >> tricking the ABCMeta machinery into enforcing that all classes implement >> _initialize_ instead of just enforcing that one does; the easy solution >> there is to just write your own metaclass that does that check directly. >> >> Or maybe, instead of enforcing it, use it as a signal: build a "lift >> chain" for each Liftable type out of all classes on the MRO that directly >> implement _initialize_ (or just dynamically look for it as you walk the MRO >> in lift). So lift only works between those classes. I think that gets you >> all the same benefits as Liftable(T), without needing a class factory, and >> without having to specify it more than once on a hierarchy. >> >> The idea is to build a generic Lift-type which I call this way because >> the derived classes should be able to easily lift from subclasses. So for >> example if I have an instance *a* from *class A* and a *class B(A)* I >> want to make *a* an instance of *B* in a straightforward way. >> >> My implementation (Python 2.7): >> >> import abc >> import inspect >> >> def use_as_needed(func, kwargs): >> meta = inspect.getargspec(func) >> if meta.keywords is not None: >> return meta(**kwargs) >> else: >> # not generic super-constructor - pick only the relevant >> subentries: >> return func(**{k:kwargs[k] for k in kwargs if k in meta.args}) >> >> class Liftable(object): >> __metaclass__ = abc.ABCMeta >> >> def __init__(self, **kwargs): >> use_as_needed(super(Liftable,self).__init__, kwargs) >> use_as_needed(self.__initialize__, kwargs) >> >> @abc.abstractmethod >> def __initialize__(self, **kwargs): >> return NotImplemented() >> >> class NoMatchingAncestor(RuntimeError): >> pass >> >> class NotLiftable(RuntimeError): >> pass >> >> def lift(self, new_class, **kwargs): >> # Stop Conditions: >> if self.__class__ is new_class: >> return # nothing to do >> elif new_class is object: # Base Case >> # break recursion at once: >> raise NoMatchingAncestor() >> elif new_class.__base__ is not Liftable: #to ensure this is save >> raise NotLiftable("Class {} is not Liftable (must be first >> parent)".format(new_class.__name__)) >> >> # recursive case: >> if not self.__class__ is new_class.__bases__[1]: >> lift(self, new_class.__bases__[1], **kwargs) >> # own case: >> self.__class__ = new_class >> use_as_needed(self.__initialize__, kwargs) >> >> >> and the example usecase: >> >> class A(object): >> def __init__(self, a): >> self.a = a >> >> class B(Liftable, A): >> def __initialize__(self, b): >> self.b = b >> >> a = A(1) >> print a.a, a.__class__ >> # 1 >> >> lift(a, B, b=2) >> print a.a, a.b, a.__class__ >> # 1 2 >> >> >> this works so far, however if I now put a further level of Liftable >> (which in principal already works with the generic definition >> >> class C(Liftable, B): >> def __initialize__(self, c): >> self.c = c >> >> >> I get the error >> >> TypeError: Error when calling the metaclass bases Cannot create a >> consistent method resolution order (MRO) for bases Liftable, B >> >> >> ?I read about MRO, and it seems to be the case that this setting somehow >> raises this generic Error, however I really think having such a Lifting is >> save and extremely useful? - how can I make it work in python? >> >> (one further comment: switching the order of inheritance, i.e. class B(A, >> Liftable) will call A.__init__ before Liftable.__init__ which makes the >> whole idea senseless) >> >> Any constructive help is appreciated! 
>> best, >> Stephan >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cody.piersall at gmail.com Sun Dec 13 16:10:37 2015 From: cody.piersall at gmail.com (Cody Piersall) Date: Sun, 13 Dec 2015 15:10:37 -0600 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <201512131143.tBDBhmkH026966@fido.openend.se> References: <201512131143.tBDBhmkH026966@fido.openend.se> Message-ID: On Sun, Dec 13, 2015 at 5:43 AM, Laura Creighton wrote: > > I care about readability but I find: > > d = OrderedDict() > for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): > d[key] = value > > quite readable. > > Laura You don't even need the loop, just the zip. >>> from collections import OrderedDict >>> OrderedDict(zip([5, 9, 3, 53, 2342, 'pizza'], 'abcdef')) OrderedDict([(5, 'a'), (9, 'b'), (3, 'c'), (53, 'd'), (2342, 'e'), ('pizza', 'f')]) But it's probably more readable as >>> keys = [5, 9, 3, 53, 2342, 'pizza'] >>> values = 'abcdef' >>> OrderedDict(zip(keys, values)) OrderedDict([(5, 'a'), (9, 'b'), (3, 'c'), (53, 'd'), (2342, 'e'), ('pizza', 'f')]) Cody -------------- next part -------------- An HTML attachment was scrubbed... URL: From me at jeltef.nl Sun Dec 13 17:22:20 2015 From: me at jeltef.nl (Jelte Fennema) Date: Sun, 13 Dec 2015 23:22:20 +0100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <201512131143.tBDBhmkH026966@fido.openend.se> Message-ID: This is indeed another option, but it decouples the keys from the values. Which makes it harder to see with a quick look what key will have what value, you will have to count the keys. This feels to me like a serious drawback of this method since knowing what value a key will have is pretty important when talking about initializing dictionaries. On 13 December 2015 at 22:10, Cody Piersall wrote: > > > On Sun, Dec 13, 2015 at 5:43 AM, Laura Creighton wrote: > > > > I care about readability but I find: > > > > d = OrderedDict() > > for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): > > d[key] = value > > > > quite readable. > > > > Laura > > You don't even need the loop, just the zip. > > >>> from collections import OrderedDict > >>> OrderedDict(zip([5, 9, 3, 53, 2342, 'pizza'], 'abcdef')) > OrderedDict([(5, 'a'), (9, 'b'), (3, 'c'), (53, 'd'), (2342, 'e'), > ('pizza', 'f')]) > > But it's probably more readable as > >>> keys = [5, 9, 3, 53, 2342, 'pizza'] > >>> values = 'abcdef' > >>> OrderedDict(zip(keys, values)) > OrderedDict([(5, 'a'), (9, 'b'), (3, 'c'), (53, 'd'), (2342, 'e'), > ('pizza', 'f')]) > > Cody > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Dec 13 18:05:06 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 13 Dec 2015 15:05:06 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> Message-ID: On Sat, Dec 12, 2015 at 5:16 PM, Serhiy Storchaka wrote: > On 13.12.15 02:27, Guido van Rossum wrote: > >> I'm not following this thread all that closely, but am I the only one >> who thinks that the LRU cache (discussed in another thread) could be >> implemented with much less code on top of OrderedDict? Basically >> whenever you access a key you just move it to the front, and when you >> add a key when it is already at capacity you delete the first one. >> > > I have doubts about this. First, an ordered dictionary is only a part of > the LRU cache implementation, using OrderedDict wouldn't make the code much > less. Second, the LRU cache needs only small and very limited part of > OrderedDict, with different requirements to reentrancy, thread-safety, and > errors handling (partially more strong, partially more lenient). The code > currently used by the LRU cache is much simpler, likely faster, and more > bug-free. > Fair enough. Perhaps it would have made more sense if we had a C version of OrderedDict but not of lru_cache. Still, two things: (1) Anything written in Python that implements a linked list feels like a code smell to me (it's hard to make up for the constant factor with a better algorithm unless the data structure is pretty large). (2) If you want to implement something that's close to lru_cache but not quite (e.g. one where items also come with a timeout, like DNS lookups), you can't subclass its implementation, because it's all hidden in a function. But you can wrap a little bit of extra logic around the 10 or so lines it takes to implement a decent LRU cache on top of OrderedDict. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Mon Dec 14 09:58:17 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 14 Dec 2015 16:58:17 +0200 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> Message-ID: On 13.12.15 06:57, Andrew Barnert via Python-ideas wrote: > Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time. A linked list searching for the value takes linear time too. To avoid it, an auxiliary hashtable is used. In current Python implementation it maps key to a list node, in proposed implementation it maps key to an index in a continuous array. No linear search anymore, this problem already is solved. > He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here. 
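(A toy sketch of the key-to-index layout being described here -- not the actual CPython or PyPy code, just an illustration of a dense entry array plus an auxiliary key->index hashtable, combined with the "mark deleted, compact later" strategy discussed earlier in the thread; all names are invented:)

class ToyOrderedMap(object):
    """Toy insertion-ordered mapping: dense entry list + key->index hash."""

    def __init__(self):
        self._entries = []   # (key, value) pairs in insertion order; None marks a deleted slot
        self._index = {}     # auxiliary hashtable: key -> position in self._entries
        self._deleted = 0

    def __setitem__(self, key, value):
        i = self._index.get(key)
        if i is None:
            self._index[key] = len(self._entries)
            self._entries.append((key, value))     # new key goes at the end
        else:
            self._entries[i] = (key, value)        # overwrite in place, order kept

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]  # KeyError propagates for missing keys

    def __delitem__(self, key):
        i = self._index.pop(key)
        self._entries[i] = None                    # O(1): just mark the slot as deleted
        self._deleted += 1
        if self._deleted * 2 >= len(self._entries):
            self._compact()                        # O(n), but only after O(n) deletions

    def _compact(self):
        # Rebuild the dense array and renumber every surviving key.
        self._entries = [e for e in self._entries if e is not None]
        self._index = {k: i for i, (k, _) in enumerate(self._entries)}
        self._deleted = 0

    def __iter__(self):
        for entry in self._entries:
            if entry is not None:
                yield entry[0]

    def __len__(self):
        return len(self._index)

m = ToyOrderedMap()
m['a'] = 1
m['b'] = 2
m['c'] = 3
del m['b']
assert list(m) == ['a', 'c'] and m['c'] == 3 and len(m) == 2

The _compact step renumbers every surviving key, which is exactly the O(n)-work-every-O(n)-deletions trade-off the two sides are debating in this thread.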
> > He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below. What do you mean saying about "change half the hash values for each shift" and why this is a problem? > What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations. Let's distinguish between two scenarios. In the first scenario, the dictionary is only growing. Every O(N) additions you need to resize and rebuild a hashtable of size O(N). This gives amortized constant time of addition. In proposed OrderedDict design you need also resize an array and resize and rebuild an auxiliary hashtable, both have linear complexity. Resulting amortized time is constant again. In the second scenario, the dictionary has equal number of additions and deletions, and its size is not changed. For a dict it dousn't add anything to base cost of addition and deletion. In proposed OrderedDict design, since almost every addition append an item to an array, periodical repacking is needed. It has linear cost (repacking an array and updating an auxiliary hashtable). Total amortized cost of one additional or deletion operations is constant. The implementation with continuous array has about the same complexity as the implementation with linked list. The difference is only in constant multiplier, and without real C code we can't determine what constant is lower. Actually the current C implementation of OrderedDict uses an continuous array for mapping an index in the base hashtable to a list node. It is rebuild from a linked list when the base hashtable is rebuild. I where planned to experiment with getting rid of linked list at all. >>>> All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] >>> >>> But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). >> >> This is only because preserving ordering is not a goal for a dict. > > Sure. But it is the goal for OrderedDict. Which is exactly why a design that makes sense for dict doesn't necessarily make sense for OrderedDict, unless you add something else. The hash table of list nodes works as such a something else. But Raymond's array doesn't (and isn't intended to). > >> Moving the last entry to the place of deleted one is the simplest design if don't care about ordering. >> >> But it is not hard to make the ordering be preserved (at the cost of larger amortized constant time and memory usage). Deleted entry should be just marked as deleted, and the storage should be packed only when a half (or 1/3, or 1/4) of entries is deleted. > > The entire benefit of Raymond's design is that it lets you store, and iterate, a dense array instead of a sparse one. If you're going to make that array every sparser than the hash table, you've lost all of the benefits. (But you're still paying the cost of an extra index or pointer, extra dereference, etc.) The benefit of Raymond's design is that it lets you iterate a continuous array instead of jump forward and backward on randomly distributed in memory list nodes. And on modern computers iterating even sparse array can be faster that iterating linked list. 
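A quick way to get a feel for that claim (only a rough proxy -- the real OrderedDict link nodes are heavier than the tuples used here, and all names below are made up for illustration):

import timeit

_sentinel = object()
N = 100000

# A 50%-empty array: every other slot is a hole left by a deletion.
sparse = [i if i % 2 else _sentinel for i in range(2 * N)]

# A singly linked list of the same N live keys, as (value, next) tuples.
head = None
for i in reversed(range(N)):
    head = (i, head)

def iter_sparse():
    return sum(1 for k in sparse if k is not _sentinel)

def iter_linked():
    count, node = 0, head
    while node is not None:
        count += 1
        node = node[1]
    return count

print(timeit.timeit(iter_sparse, number=100))
print(timeit.timeit(iter_linked, number=100))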
[*] This is not the only benefit of Raymond's design, the other is presumably more compact representation. [*] Of course needed benchmarks to prove this. From abarnert at yahoo.com Mon Dec 14 12:30:18 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 14 Dec 2015 09:30:18 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> Message-ID: <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> On Dec 14, 2015, at 06:58, Serhiy Storchaka wrote: > >> On 13.12.15 06:57, Andrew Barnert via Python-ideas wrote: >> Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time. > > A linked list searching for the value takes linear time too. To avoid it, an auxiliary hashtable is used. This is already covered in the very next paragraph. > In current Python implementation it maps key to a list node, in proposed implementation it maps key to an index in a continuous array. No linear search anymore, this problem already is solved. > >> He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here. >> >> He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below. > > What do you mean saying about "change half the hash values for each shift" and why this is a problem? If you delete a linked list node, none of the other list nodes change. So, if you have hash entries pointing at all of the list nodes, only one hash entry has to be changed for a delete. If you delete an array element, all of the subsequent indices change. So, if you have hash entries pointing at the indices, on average, half of the hash entries have to be changed for a delete. And if you're about to say "but combining the two solves that problem", that's already in the very next paragraph, so please read on. >> What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations. > > Let's distinguish between two scenarios. In the first scenario, the dictionary is only growing. Every O(N) additions you need to resize and rebuild a hashtable of size O(N). If that were true, dicts would be amortized linear time. You do O(N) work O(N) times, divided over O(N) operations, which means amortized O(N*N/N), which is linear. The fact that dict (and list) grow exponentially rather than linearly is critical. It means that, even in the case of nothing but inserts, you only rehash O(log N) times, not O(N), and the work done each time also only grows logarithmically. The total work done for all resizes ends up being just a constant times N (think about the sum of 1 + 1/2 + 1/4 + 1/8 + ... vs. 
1 + 9/10 + 8/10 + ...), so, divided by N operations, you get amortized O(C*N/N), which is constant. > This gives amortized constant time of addition. In proposed OrderedDict design you need also resize an array and resize and rebuild an auxiliary hashtable, both have linear complexity. Resulting amortized time is constant again. Sure, but this just shows that the fact that deletes are linear time doesn't matter if you never do any deletes. > In the second scenario, the dictionary has equal number of additions and deletions, and its size is not changed. For a dict it dousn't add anything to base cost of addition and deletion. In proposed OrderedDict design, since almost every addition append an item to an array, periodical repacking is needed. It has linear cost (repacking an array and updating an auxiliary hashtable). Total amortized cost of one additional or deletion operations is constant. No, deletion operations do O(N) work O(M) times amortized over M, where N is the dict size and M the number of operations. It's linear in the max size of the dict. In the special case where the dict never changes size, you could just declare that a constant--but that's pretty misleading. For one thing, it's usually going to be a very large constant. For example, if you start with 1000 elements and do 2000 deletes and 2000 inserts, your constant is N/4. More importantly, a constant-sized dict with lots of mutations is a pretty rare use case, and in the more common case where there are more inserts than deletes, so the dict is growing, the delete times are increasing with the dict size. Unless the ratio of deletes to inserts is falling at the same rate (another rare special case), this isn't going to be constant. And if you're wondering how dict gets away with this, it's kind of cheating by just not generally shrinking on deletes, but that's a cheat that works just fine for the majority of real-life use cases (and it's hard to work around in the vast majority of cases where it doesn't). That obviously won't work here--if a dict later regrows, it automatically uses up all the deleted buckets, but if an array-based OrderedDict later regrows, it will just append onto the end, leaving all the holes behind. You have no choice but to compact those holes at some point, and there's no way to make that constant time in an array. > The implementation with continuous array has about the same complexity as the implementation with linked list. The difference is only in constant multiplier, and without real C code we can't determine what constant is lower. If your argument were correct, you could always simulate a linked list with an array with only a constant-factor cost, and that just isn't true. > Actually the current C implementation of OrderedDict uses an continuous array for mapping an index in the base hashtable to a list node. It is rebuild from a linked list when the base hashtable is rebuild. I where planned to experiment with getting rid of linked list at all. > >>>>> All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] >>>> >>>> But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). >>> >>> This is only because preserving ordering is not a goal for a dict. >> >> Sure. But it is the goal for OrderedDict. Which is exactly why a design that makes sense for dict doesn't necessarily make sense for OrderedDict, unless you add something else. The hash table of list nodes works as such a something else. 
But Raymond's array doesn't (and isn't intended to). >> >>> Moving the last entry to the place of deleted one is the simplest design if don't care about ordering. >>> >>> But it is not hard to make the ordering be preserved (at the cost of larger amortized constant time and memory usage). Deleted entry should be just marked as deleted, and the storage should be packed only when a half (or 1/3, or 1/4) of entries is deleted. >> >> The entire benefit of Raymond's design is that it lets you store, and iterate, a dense array instead of a sparse one. If you're going to make that array every sparser than the hash table, you've lost all of the benefits. (But you're still paying the cost of an extra index or pointer, extra dereference, etc.) > > The benefit of Raymond's design is that it lets you iterate a continuous array instead of jump forward and backward on randomly distributed in memory list nodes. The benefit of Raymond's design over the existing dict, as he explains in the ActiveState post, is that you can iterate (and store some of the data in) the dense array rather than iterating the hash table (which is a sparse array). He can maintain that denseness even in the face of deletions because he isn't trying to preserve order, so he can always just swap the last value into the deleted slot. If you try to use the same design to preserve order, you can't do that. If you instead mark deleted slots as deleted, and recompact every time it gets to 50% deleted, then you're iterating a sparse array that's more sparse than the hash table, so you do not get the time (or space) benefits over a dict that Raymond's design offers. > And on modern computers iterating even sparse array can be faster that iterating linked list. [*] Now you're proposing a completely different theoretical benefit from the one Raymond designed for and explained, so the analogy with his design doesn't buy you anything. You'd be better off sticking to analogy with the current plain dict, which, like your design, iterates over a sparse array, and is simpler to explain and analyze. > This is not the only benefit of Raymond's design, the other is presumably more compact representation. Yes, if you can move each 48-byte hash bucket into a dense array, leaving only a 4-byte integer in the sparse one in the hash table, then you save 48 * load - 4 bytes per entry. But the design under discussion doesn't get that, because you're putting the large data in an array that's even sparser than the hash table, so you're losing space. One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc. From srkunze at mail.de Mon Dec 14 13:00:49 2015 From: srkunze at mail.de (Sven R. 
Kunze) Date: Mon, 14 Dec 2015 19:00:49 +0100 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> Message-ID: <566F03D1.3040601@mail.de> Despite all the nice discussion about runtime and so on and so forth. Couldn't he just provide a working implementation to prove his point? Maybe, he finds out himself that it's not working as intended and in the course of doing so, he finds an even better solution. Best, Sven On 14.12.2015 18:30, Andrew Barnert via Python-ideas wrote: > On Dec 14, 2015, at 06:58, Serhiy Storchaka wrote: >>> On 13.12.15 06:57, Andrew Barnert via Python-ideas wrote: >>> Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time. >> A linked list searching for the value takes linear time too. To avoid it, an auxiliary hashtable is used. > This is already covered in the very next paragraph. > >> In current Python implementation it maps key to a list node, in proposed implementation it maps key to an index in a continuous array. No linear search anymore, this problem already is solved. >> >>> He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here. >>> >>> He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below. >> What do you mean saying about "change half the hash values for each shift" and why this is a problem? > If you delete a linked list node, none of the other list nodes change. So, if you have hash entries pointing at all of the list nodes, only one hash entry has to be changed for a delete. > > If you delete an array element, all of the subsequent indices change. So, if you have hash entries pointing at the indices, on average, half of the hash entries have to be changed for a delete. > > And if you're about to say "but combining the two solves that problem", that's already in the very next paragraph, so please read on. > >>> What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations. >> Let's distinguish between two scenarios. In the first scenario, the dictionary is only growing. Every O(N) additions you need to resize and rebuild a hashtable of size O(N). > If that were true, dicts would be amortized linear time. You do O(N) work O(N) times, divided over O(N) operations, which means amortized O(N*N/N), which is linear. > > The fact that dict (and list) grow exponentially rather than linearly is critical. It means that, even in the case of nothing but inserts, you only rehash O(log N) times, not O(N), and the work done each time also only grows logarithmically. 
The total work done for all resizes ends up being just a constant times N (think about the sum of 1 + 1/2 + 1/4 + 1/8 + ... vs. 1 + 9/10 + 8/10 + ...), so, divided by N operations, you get amortized O(C*N/N), which is constant. > >> This gives amortized constant time of addition. In proposed OrderedDict design you need also resize an array and resize and rebuild an auxiliary hashtable, both have linear complexity. Resulting amortized time is constant again. > Sure, but this just shows that the fact that deletes are linear time doesn't matter if you never do any deletes. > >> In the second scenario, the dictionary has equal number of additions and deletions, and its size is not changed. For a dict it dousn't add anything to base cost of addition and deletion. In proposed OrderedDict design, since almost every addition append an item to an array, periodical repacking is needed. It has linear cost (repacking an array and updating an auxiliary hashtable). Total amortized cost of one additional or deletion operations is constant. > No, deletion operations do O(N) work O(M) times amortized over M, where N is the dict size and M the number of operations. It's linear in the max size of the dict. > > In the special case where the dict never changes size, you could just declare that a constant--but that's pretty misleading. For one thing, it's usually going to be a very large constant. For example, if you start with 1000 elements and do 2000 deletes and 2000 inserts, your constant is N/4. More importantly, a constant-sized dict with lots of mutations is a pretty rare use case, and in the more common case where there are more inserts than deletes, so the dict is growing, the delete times are increasing with the dict size. Unless the ratio of deletes to inserts is falling at the same rate (another rare special case), this isn't going to be constant. > > And if you're wondering how dict gets away with this, it's kind of cheating by just not generally shrinking on deletes, but that's a cheat that works just fine for the majority of real-life use cases (and it's hard to work around in the vast majority of cases where it doesn't). That obviously won't work here--if a dict later regrows, it automatically uses up all the deleted buckets, but if an array-based OrderedDict later regrows, it will just append onto the end, leaving all the holes behind. You have no choice but to compact those holes at some point, and there's no way to make that constant time in an array. > >> The implementation with continuous array has about the same complexity as the implementation with linked list. The difference is only in constant multiplier, and without real C code we can't determine what constant is lower. > If your argument were correct, you could always simulate a linked list with an array with only a constant-factor cost, and that just isn't true. > >> Actually the current C implementation of OrderedDict uses an continuous array for mapping an index in the base hashtable to a list node. It is rebuild from a linked list when the base hashtable is rebuild. I where planned to experiment with getting rid of linked list at all. >> >>>>>> All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] >>>>> But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). >>>> This is only because preserving ordering is not a goal for a dict. >>> Sure. But it is the goal for OrderedDict. 
Which is exactly why a design that makes sense for dict doesn't necessarily make sense for OrderedDict, unless you add something else. The hash table of list nodes works as such a something else. But Raymond's array doesn't (and isn't intended to). >>> >>>> Moving the last entry to the place of deleted one is the simplest design if don't care about ordering. >>>> >>>> But it is not hard to make the ordering be preserved (at the cost of larger amortized constant time and memory usage). Deleted entry should be just marked as deleted, and the storage should be packed only when a half (or 1/3, or 1/4) of entries is deleted. >>> The entire benefit of Raymond's design is that it lets you store, and iterate, a dense array instead of a sparse one. If you're going to make that array every sparser than the hash table, you've lost all of the benefits. (But you're still paying the cost of an extra index or pointer, extra dereference, etc.) >> The benefit of Raymond's design is that it lets you iterate a continuous array instead of jump forward and backward on randomly distributed in memory list nodes. > The benefit of Raymond's design over the existing dict, as he explains in the ActiveState post, is that you can iterate (and store some of the data in) the dense array rather than iterating the hash table (which is a sparse array). He can maintain that denseness even in the face of deletions because he isn't trying to preserve order, so he can always just swap the last value into the deleted slot. If you try to use the same design to preserve order, you can't do that. If you instead mark deleted slots as deleted, and recompact every time it gets to 50% deleted, then you're iterating a sparse array that's more sparse than the hash table, so you do not get the time (or space) benefits over a dict that Raymond's design offers. > >> And on modern computers iterating even sparse array can be faster that iterating linked list. [*] > Now you're proposing a completely different theoretical benefit from the one Raymond designed for and explained, so the analogy with his design doesn't buy you anything. You'd be better off sticking to analogy with the current plain dict, which, like your design, iterates over a sparse array, and is simpler to explain and analyze. > >> This is not the only benefit of Raymond's design, the other is presumably more compact representation. > Yes, if you can move each 48-byte hash bucket into a dense array, leaving only a 4-byte integer in the sparse one in the hash table, then you save 48 * load - 4 bytes per entry. But the design under discussion doesn't get that, because you're putting the large data in an array that's even sparser than the hash table, so you're losing space. > > One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc. 
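(For anyone who hasn't seen the structure Andrew describes in that last quoted paragraph, a toy sketch -- the names are invented here, not any existing API:

class Chunk:
    """One node of a 'linked list of arrays': a run of keys plus a next pointer."""
    __slots__ = ('items', 'next')
    def __init__(self, items, next=None):
        self.items = items      # a list of keys
        self.next = next

def delete_at(chunk, i):
    # Split the chunk at the hole: the copy is bounded by the chunk size,
    # not by the size of the whole structure.
    tail = chunk.items[i + 1:]
    del chunk.items[i:]
    if tail:
        chunk.next = Chunk(tail, chunk.next)

def iterate(chunk):
    while chunk is not None:
        yield from chunk.items
        chunk = chunk.next

Tuning the chunk size to a cache line or page is where the constant-factor win would come from.)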
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From leewangzhong+python at gmail.com Tue Dec 15 04:09:55 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 15 Dec 2015 04:09:55 -0500 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <566F03D1.3040601@mail.de> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> <566F03D1.3040601@mail.de> Message-ID: On Dec 14, 2015 1:11 PM, "Sven R. Kunze" wrote: > > Despite all the nice discussion about runtime and so on and so forth. > > Couldn't he just provide a working implementation to prove his point? Maybe, he finds out himself that it's not working as intended and in the course of doing so, he finds an even better solution. I did. It is in a pastebin link in my original message, based on the 3.5.0 Python (not C) version of OrderedDict. I was hoping for guidance on evaluating it. Maybe it wasn't seen because I used '-'*n to separate it from my intro, or maybe pastebin is so disliked that people couldn't see it. Here it is again: http://pastebin.com/LESRktJw Sorry I haven't been responding. I wanted to respond on a PC instead of my phone because I don't see a plaintext option on the app, but I get distracted when on the PC. My implementation combines the ideas I listed. Raymond's idea seems similar, but more clever with indexing and less focused on ordering. Regarding PyPy's, I get the impression that CPython can make an efficient implementation based on it. I think that's the real way to go for an efficient OrderedDict implementation. By the way, here are the costs for some of the algorithms, compared to the raw Python OrderedDict: - getitem: unchanged - setitem: Amortized O(1). but for slightly different reasons (list expansion in addition to expansion of two maps). If new key, it will usually be better: no creation of a new object, and a lot less constant work in bytecode if the list isn't expanded (incrementing and appending instead of linking). - delitem: Amortized O(1). A lot less bytecode overhead as above, but there can be a compaction here. - (iteration): Iterating over a 50% empty array may be better than iterating over a linked list, especially if cache is taken into account. - Python version: For a 50% empty array, you pay len(odict) extra "if"s and "is"s, plus the cost of iterating over a (reversed, for reverse iteration) list of length 2*len(odict). The list version has a lot of extra dereferences instead. - Potential C version: It's just a simple for loop with a check of pointer identity, versus a chain of dereferences. It should definitely be better on the cache, maybe less good for branch prediction. - pop: Same as delitem: possible compaction. - move_to_end: Moving to the back is less constant work (though I do a lot of checks, and the C version might not have the same benefits). Moving to the front is awful: it shifts everything up. This can be alleviated by using a deque, or using the list as a list-backed deque: treat the array as circular. This would add overhead to everything else, though, so I don't like it. (How do I find out how popular the second parameter to move_to_end is?) 
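To make the cost list above concrete, here is a condensed illustration of just the bookkeeping those numbers refer to -- not the full ListDict implementation from the pastebin, only the key-to-index map, the order array with sentinel holes, and the 50% compaction rule:

_sentinel = object()

class MiniOrder:
    """Order-tracking sketch only: no values, no dict subclassing."""
    def __init__(self):
        self._index = {}   # key -> position in self._order
        self._order = []   # keys in insertion order, with sentinel holes
        self._size = 0     # number of live (non-sentinel) keys

    def add(self, key):
        if key not in self._index:
            self._index[key] = len(self._order)
            self._order.append(key)
            self._size += 1

    def discard(self, key):
        i = self._index.pop(key, None)
        if i is None:
            return
        self._order[i] = _sentinel
        self._size -= 1
        # Compact once at least half of the slots are holes.
        if self._order and self._size / len(self._order) <= 0.5:
            self._order = [k for k in self._order if k is not _sentinel]
            for pos, k in enumerate(self._order):
                self._index[k] = pos

    def __iter__(self):
        return (k for k in self._order if k is not _sentinel)

Every operation here is constant time except the occasional compaction, which is what the amortized arguments above are about.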
If using my second idea of storing inner_dict[key] = [key, value, index] (where index is into the order array), there would be a lot less overhead on many things, but more overhead on getitem. In C code, it might be barely any overhead. On Sat, Dec 12, 2015 at 6:06 AM, Andrew Barnert wrote: > There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start. I, uh, have no professional experience in programming. So it's not as easy for me to know how to start. > Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.) When iterating the array, here are the costs per element: - used: Same as linked list, but without the linkedlist overhead. - unused: Check if the entry is valid. This doesn't require dereferencing. It's possible that the branch for checking validity is higher than the linkedlist overhead, but that's the only way I can think of that the array would be worse at iterating, even when sparse. > And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.) I'm pretty sure that all C code uses the C version of __getitem__ on the dict, because it's NOT statically known which __getitem__ function to call. There's a pointer in each dict to its __getitem__ function. On Sat, Dec 12, 2015 at 7:34 AM, Serhiy Storchaka wrote: > The performance of Python implementation doesn't matter much now (while it > has the same computational complexity). It might matter a little, for alternative Python implementations which haven't yet made their own optimized ODicts. On Mon, Dec 14, 2015 at 12:30 PM, Andrew Barnert via Python-ideas wrote: > One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc. I think it's much better to just live with a half-empty array. But this would have to be proven with code and benchmarks. As an aside, have you seen Stroustrup and Sutter talking about linked lists versus vectors? - Stroustrup: https://www.youtube.com/watch?v=YQs6IC-vgmo - Sutter: https://channel9.msdn.com/Events/Build/2014/2-661 Here's an example: - Vector: Linear search for an element, then remove it by shifting everything down. 
- List: Linear search for an element, then remove it by delinking. Vector will beat list for even very large sizes, simply because the dereferencing in the search will be worse than the shifting (which would take advantage of prefetching, branch prediction, etc.). For iteration of an OrderedDict based on an array, there won't be a branch prediction benefit, since the holes will be distributed solely based on how it's used. There might not even be a prefetching benefit, usually, because of how many objects might be involved during the iteration (that is, more memory is touched than with equivalent C++ code). But that's what testing is for, I guess. From leewangzhong+python at gmail.com Tue Dec 15 05:04:17 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 15 Dec 2015 05:04:17 -0500 Subject: [Python-ideas] Buffering iterators? Message-ID: (This would be a lot of work that I wouldn't know how to do, but is it worth thinking about? Maybe it's already been done at the level necessary. Also, this is a proposal for the sake of theoretical optimization, in case you don't like that, and it will require a lot of work in a lot of code everywhere. As I see it, even if it's possible and desirable to do this, it would take years of work and testing to make it beneficial.) The move from Python 2 (disclaimer: which I barely touched, so I have little sentimental attachment) to Python 3 resulted in many functions returning iterators instead of lists. This saves a lot of unnecessary memory when iterating, say, over the indices of a large list, especially if we break in the middle. I'm wondering, though, about a step backwards: generating values before they're needed. The idea is based on file buffered reading and memory prefetching (https://en.wikipedia.org/wiki/Synchronous_dynamic_random-access_memory#DDR_SDRAM_prefetch_architecture). In fact, I'm hoping to take advantage of such things. For example, in `sum(lst[i] * i for i in range(10000))`, `sum` will exhaust the iterator, so it can ask the generator to return buffers, and it will internally read the elements off the lists. It would be the responsibility of the iterator to decide whether to respect the request, and to determine the size of the buffer. It would be the responsibility of the consumer to request it, and consumers should only request it if they think they'll almost definitely consume a lot at a time. The idea is, especially for complex nested iterators, instead of running A B C A B C A B C..., where each is the code for generating a next thing from the previous, that the interpreter runs A A A A A..., B B B B B..., C C C..., which could mean a lot more memory locality in both instructions and objects. There's the possibility that a function has side-effects, so buffering would have different semantics than normal. There's also the possibility that getting the next element is complex enough that it wouldn't help to buffer. If the iterator can't tell, then it should just not buffer. Here's an obnoxious example of where you can't tell: def f(i): return i s = 0 for i in (f(x) for x in range(100)): s += f(i) def f(x): return x + s In fact, all Python function calls are probably unsafe, including with operators (which can be legally replaced during the iteration). Well, `map` and `filter` are possible exceptions in special cases, because the lookup for their function is bound at the call to `map`. It's usually safe if you're just using reversed, enumerate, builtin operators, dict views, etc. 
on an existing data structure, as long as your iteration doesn't modify entries, unlike so: for i, x in enumerate(reversed(lst)): lst[i+1] = x But I'm looking toward the future, where it might be possible for the interpreter to analyze loops and functions before making such decisions. Then again, if the interpreter is that smart, it can figure out where and when to buffer without adding to the API of iterators. Anyway, here's an idea, how it might be helpful, and how it might not be helpful. From leewangzhong+python at gmail.com Tue Dec 15 06:49:06 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 15 Dec 2015 06:49:06 -0500 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: On Sun, Dec 13, 2015 at 5:00 AM, Jelte Fennema wrote: > Another option would be that this syntax would not represent a list of > tuples but an OrderedDict. I think they both have their advantages, the list > of tuples would allow list operations such as `+ items` as you suggested and > usage of the same keys multiple times. But OrderedDict would allow simple > indexing directly. But it does not really matter that much since both could > easily be used to generate the other. OrderedDict(['1':'2']) and > list(['1':'2'].items()) respectively. > > I think the main case for the list of tuples is actually that you can make > any OrderedDict from a list of tuples, but not the other way around, since > duplicate keys would be removed. Which is why I like your idea for a > shorthand for a list of tuples better, since it covers more uses. > > One important thing to note is the discussion I already mentioned in my > first email. Especially this message where guide votes -100 for your syntax > for OrderedDict creation: > https://mail.python.org/pipermail/python-ideas/2009-June/004924.html > > I'm not sure why he disliked that syntax and if he still does. Or if his > thoughts are different when it would represent a list of tuples instead of > an OrderedDict. I also wonder why he doesn't like it. I wouldn't like it if it represented a list of tuples. What we have is: - [a, b, c] -> list = [ordered, mutable, collection] - {a, b, c} -> set = [unordered, mutable, collection, uniquekeys] - {a:x, b:y, c:z} -> dict = [unordered, mutable, mapping, uniquekeys] - (a, b, c) -> tuple = [ordered, immutable, collection] It seems to me that the pattern would extend to: - [a:x, b:y, c:z] -> [ordered, mutable, mapping] - (a:x, b:y, c:z) -> [ordered, immutable, mapping] The first one is ALMOST OrderedDict, except that it has unique keys. The second is ALMOST namedtuple, except that it: - doesn't allow duplicate keys - doesn't allow indexing by keys (though this can change) - doesn't allow arbitrary key type (and we don't want ints allowed as keys) - needs a name and a type If we add the rule that a mapping literal's type should have unique keys, then [a:x] -> OrderedDict fits the pattern. But [a:x] => [(a,x)] doesn't. From leewangzhong+python at gmail.com Tue Dec 15 07:23:59 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 15 Dec 2015 07:23:59 -0500 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <99F30521-D40D-4CC7-AB33-F5F159477C0E@yahoo.com> References: <20151213032416.GN3821@ando.pearwood.info> <99F30521-D40D-4CC7-AB33-F5F159477C0E@yahoo.com> Message-ID: On Sun, Dec 13, 2015 at 12:15 AM, Andrew Barnert via Python-ideas wrote: > And I think there's some precedent here. 
IIRC, in YAML, {1:2, 3:4} is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think it's ambiguous between an ordered dict and a list of pairs, and you can resolve that by declaring !odict or !seq, or you can just leave it up to the implementation to pick one if you don't care... but let's pretend it wasn't ambiguous; either one covers the use case (and Python only has the latter option anyway, unless OrderedDict becomes a builtin). For YAML, I read it as a list of dicts. My Python's yaml module (pyyaml?) agrees. >>> import yaml >>> yaml.load('[a: 1, b: 2]') [{'a': 1}, {'b': 2}] However, YAML's website (page: http://www.yaml.org/refcard.html) lists the !!omap type cast as using this syntax: '!!omap': [ one: 1, two: 2 ] I tried using !!seq. Not sure if I'm doing it right. >>> yaml.load('!!seq [a: 1, b: 2]') [{'a': 1}, {'b': 2}] >>> yaml.load('!!omap [a: 1, b: 2]') [('a', 1), ('b', 2)] (Huh. PyYAML module thinks that an omap should be a list of pairs. It might eventually change to OrderedDict, though.) From me at jeltef.nl Tue Dec 15 08:08:49 2015 From: me at jeltef.nl (Jelte Fennema) Date: Tue, 15 Dec 2015 14:08:49 +0100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: After thinking some more, I think you are right in saying that it would make more sense to let it represent an OrderedDict directly. Mostly because the mutability suggested by the square brackets. And also a bit because I'm not sure when a mapping that maps multiple values to the same key is actually useful. Secondly, I think your idea for namedtuple literals is great. This would be really useful in the namedtuple use case where you want to return multiple values from a function, but you want to be clear in what these values actually are. I think this would need to generate some kind of anonymous named tuple class though, since it would make no sense to have to create a new class when using a literal like this. I would really like to hear Guido's response to these ideas. Since he disliked the idea so much in the past and I can't find a reference to his reasoning. Jelte PS. Seeing as we're cleary drifting from the original topic of this thread, would it be a good idea if a new one would be created with these new ideas in mind? On 15 December 2015 at 12:49, Franklin? Lee wrote: > On Sun, Dec 13, 2015 at 5:00 AM, Jelte Fennema wrote: > > Another option would be that this syntax would not represent a list of > > tuples but an OrderedDict. I think they both have their advantages, the > list > > of tuples would allow list operations such as `+ items` as you suggested > and > > usage of the same keys multiple times. But OrderedDict would allow simple > > indexing directly. But it does not really matter that much since both > could > > easily be used to generate the other. OrderedDict(['1':'2']) and > > list(['1':'2'].items()) respectively. > > > > I think the main case for the list of tuples is actually that you can > make > > any OrderedDict from a list of tuples, but not the other way around, > since > > duplicate keys would be removed. Which is why I like your idea for a > > shorthand for a list of tuples better, since it covers more uses. > > > > One important thing to note is the discussion I already mentioned in my > > first email. 
Especially this message where guide votes -100 for your > syntax > > for OrderedDict creation: > > https://mail.python.org/pipermail/python-ideas/2009-June/004924.html > > > > I'm not sure why he disliked that syntax and if he still does. Or if his > > thoughts are different when it would represent a list of tuples instead > of > > an OrderedDict. > > I also wonder why he doesn't like it. I wouldn't like it if it > represented a list of tuples. > > What we have is: > - [a, b, c] -> list = [ordered, mutable, collection] > - {a, b, c} -> set = [unordered, mutable, collection, uniquekeys] > - {a:x, b:y, c:z} -> dict = [unordered, mutable, mapping, uniquekeys] > - (a, b, c) -> tuple = [ordered, immutable, collection] > > It seems to me that the pattern would extend to: > - [a:x, b:y, c:z] -> [ordered, mutable, mapping] > - (a:x, b:y, c:z) -> [ordered, immutable, mapping] > > The first one is ALMOST OrderedDict, except that it has unique keys. > The second is ALMOST namedtuple, except that it: > - doesn't allow duplicate keys > - doesn't allow indexing by keys (though this can change) > - doesn't allow arbitrary key type (and we don't want ints allowed as keys) > - needs a name and a type > > If we add the rule that a mapping literal's type should have unique > keys, then [a:x] -> OrderedDict fits the pattern. But [a:x] => [(a,x)] > doesn't. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Dec 15 08:19:58 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 16 Dec 2015 00:19:58 +1100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: On Wed, Dec 16, 2015 at 12:08 AM, Jelte Fennema wrote: > After thinking some more, I think you are right in saying that it would make > more sense to let it represent an OrderedDict directly. Mostly because the > mutability suggested by the square brackets. And also a bit because I'm not > sure when a mapping that maps multiple values to the same key is actually > useful. > > Secondly, I think your idea for namedtuple literals is great. This would be > really useful in the namedtuple use case where you want to return multiple > values from a function, but you want to be clear in what these values > actually are. I think this would need to generate some kind of anonymous > named tuple class though, since it would make no sense to have to create a > new class when using a literal like this. Be careful of this trap, though: >>> from collections import namedtuple, OrderedDict >>> Point = namedtuple('Point', ['x', 'y']) >>> p = Point(x=5, y=2) >>> list(p) [5, 2] >>> od = OrderedDict((('x',5),('y',2))) >>> list(od) ['x', 'y'] Dictionary-like things iterate over their keys; tuple-like things iterate over their values. (And a list of pairs would effectively iterate over items().) Having extremely similar syntax for creating them might well lead to a lot of confusion on that point. ChrisA From me at jeltef.nl Tue Dec 15 08:27:14 2015 From: me at jeltef.nl (Jelte Fennema) Date: Tue, 15 Dec 2015 14:27:14 +0100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: I see your point, but it would almost never make sense to list the attributes of a namedtuple. As this would be the only one that could cause confusion (since OrderedDict would do the same as dict), I doubt it would actually cause that much confusion. 
Jelt On 15 December 2015 at 14:19, Chris Angelico wrote: > On Wed, Dec 16, 2015 at 12:08 AM, Jelte Fennema wrote: > > After thinking some more, I think you are right in saying that it would > make > > more sense to let it represent an OrderedDict directly. Mostly because > the > > mutability suggested by the square brackets. And also a bit because I'm > not > > sure when a mapping that maps multiple values to the same key is actually > > useful. > > > > Secondly, I think your idea for namedtuple literals is great. This would > be > > really useful in the namedtuple use case where you want to return > multiple > > values from a function, but you want to be clear in what these values > > actually are. I think this would need to generate some kind of anonymous > > named tuple class though, since it would make no sense to have to create > a > > new class when using a literal like this. > > Be careful of this trap, though: > > >>> from collections import namedtuple, OrderedDict > >>> Point = namedtuple('Point', ['x', 'y']) > >>> p = Point(x=5, y=2) > >>> list(p) > [5, 2] > >>> od = OrderedDict((('x',5),('y',2))) > >>> list(od) > ['x', 'y'] > > Dictionary-like things iterate over their keys; tuple-like things > iterate over their values. (And a list of pairs would effectively > iterate over items().) Having extremely similar syntax for creating > them might well lead to a lot of confusion on that point. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Tue Dec 15 09:41:17 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 15 Dec 2015 09:41:17 -0500 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: On Tue, Dec 15, 2015 at 8:08 AM, Jelte Fennema wrote: > Secondly, I think your idea for namedtuple literals is great. This would be > really useful in the namedtuple use case where you want to return multiple > values from a function, but you want to be clear in what these values > actually are. I think this would need to generate some kind of anonymous > named tuple class though, since it would make no sense to have to create a > new class when using a literal like this. Whoa whoa whoa. I wasn't suggesting a namedtuple literal. Let's not bring down the wrath of the gods. (Also, it's come up before (https://mail.python.org/pipermail/python-ideas/2014-April/027434.html) and I was part of the discussion (https://mail.python.org/pipermail/python-ideas/2014-April/027602.html). So it's not my idea.) (And it shouldn't be namedtuple, exactly, since namedtuple is a metaclass which generates classes with names. Attrtuple? For performance, it would map [keylist] => attrtuple[keylist]. 
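A rough sketch of what that cache could look like (the names are invented for illustration, not a proposal for the actual spelling):

from collections import namedtuple

_classes = {}  # tuple of field names -> generated class

def attrtuple(*fields):
    try:
        return _classes[fields]
    except KeyError:
        cls = _classes[fields] = namedtuple('attrtuple_' + '_'.join(fields), fields)
        return cls

p = attrtuple('x', 'y')(5, 2)               # a hypothetical (x: 5, y: 2) literal
assert attrtuple('x', 'y') is attrtuple('x', 'y')

So repeated literals with the same keys would pay the class-creation cost only once.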
Earlier discussion here: https://mail.python.org/pipermail/python-ideas/2013-June/021277.html) From lac at openend.se Tue Dec 15 09:53:20 2015 From: lac at openend.se (Laura Creighton) Date: Tue, 15 Dec 2015 15:53:20 +0100 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> <566F03D1.3040601@mail.de> Message-ID: <201512151453.tBFErKCb032452@fido.openend.se> In a message of Tue, 15 Dec 2015 04:09:55 -0500, "Franklin? Lee" writes: >I did. It is in a pastebin link in my original message, based on the >3.5.0 Python (not C) version of OrderedDict. I was hoping for guidance >on evaluating it. Maybe it wasn't seen because I used '-'*n to >separate it from my intro, or maybe pastebin is so disliked that >people couldn't see it. Here it is again: http://pastebin.com/LESRktJw That is the problem. We absolutely do not want links to things like pastebin. We want the code here, as part of the text. 5 years from now, when other people are scratching their heads saying, I wonder why Guido decided things the way he did, and whether that decision can and should be revisited, the first thing we will do is to go back and read all this discussion. And if the discussion is about code we can no longer see, because the pastebin has expired, then we won't be able to learn much. Anything that matters needs to be part of the archived discussion. Laura From leewangzhong+python at gmail.com Tue Dec 15 10:00:20 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 15 Dec 2015 10:00:20 -0500 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <201512151453.tBFErKCb032452@fido.openend.se> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> <566F03D1.3040601@mail.de> <201512151453.tBFErKCb032452@fido.openend.se> Message-ID: On Tue, Dec 15, 2015 at 9:53 AM, Laura Creighton wrote: > In a message of Tue, 15 Dec 2015 04:09:55 -0500, "Franklin? Lee" writes: >>I did. It is in a pastebin link in my original message, based on the >>3.5.0 Python (not C) version of OrderedDict. I was hoping for guidance >>on evaluating it. Maybe it wasn't seen because I used '-'*n to >>separate it from my intro, or maybe pastebin is so disliked that >>people couldn't see it. Here it is again: http://pastebin.com/LESRktJw > > That is the problem. We absolutely do not want links to things > like pastebin. We want the code here, as part of the text. 5 years > from now, when other people are scratching their heads saying, > I wonder why Guido decided things the way he did, and whether that > decision can and should be revisited, the first thing we will do is > to go back and read all this discussion. And if the discussion is > about code we can no longer see, because the pastebin has expired, > then we won't be able to learn much. > > Anything that matters needs to be part of the archived discussion. > > Laura I feel similarly about information history, which is why I always set "Expire: Never" when I use pastebin :D. But alright. The rest of this message is the code as I had it when I sent the original message. 
from collections.abc import * from reprlib import recursive_repr as _recursive_repr from collections import OrderedDict class _ListDictKeysView(KeysView): def __reversed__(self): yield from reversed(self._mapping) class _ListDictItemsView(ItemsView): def __reversed__(self): for key in reversed(self._mapping): yield (key, self._mapping[key]) class _ListDictValuesView(ValuesView): def __reversed__(self): for key in reversed(self._mapping): yield self._mapping[key] _sentinel = object() class ListDict(dict): 'Dictionary that remembers insertion order' # An inherited dict maps keys to values. # The inherited dict provides __getitem__, __len__, __contains__, and get. # The remaining methods are order-aware. # Big-O running times for all methods are the same as regular dictionaries. def __init__(*args, **kwds): '''Initialize an ordered dictionary. The signature is the same as regular dictionaries, but keyword arguments are not recommended because their insertion order is arbitrary. ''' if not args: raise TypeError("descriptor '__init__' of 'ListDict' object " "needs an argument") self, *args = args if len(args) > 1: raise TypeError('expected at most 1 arguments, got %d' % len(args)) try: # self.__root self.__list except AttributeError: self.__map = {} self.__list = [] self.__size = 0 self.__update(*args, **kwds) def __setitem__(self, key, value, dict_setitem=dict.__setitem__, len=len): 'od.__setitem__(i, y) <==> od[i]=y' # If it's a new key, we need to track it. if key not in self: self.__map[key] = len(self.__list) self.__list.append(key) self.__size += 1 dict_setitem(self, key, value) def __delitem__(self, key, dict_delitem=dict.__delitem__, sentinel=_sentinel): 'od.__delitem__(y) <==> del od[y]' dict_delitem(self, key) # Remove the tracking for this item index = self.__map.pop(key) self.__list[index] = sentinel self.__size -= 1 self.__compact() def __iter__(self, sentinel=_sentinel): 'od.__iter__() <==> iter(od)' for key in self.__list: if key is not sentinel: yield key def __reversed__(self, sentinel=_sentinel, reversed=reversed): 'od.__reversed__() <==> reversed(od)' for key in reversed(self.__list): if key is not sentinel: yield key def clear(self): 'od.clear() -> None. Remove all items from od.' self.__list.clear() self.__map.clear() self.__size = 0 dict.clear(self) # dict.clear isn't cached? def popitem(self, last=True, sentinel=_sentinel, reversed=reversed, next=next): '''od.popitem() -> (k, v), return and remove a (key, value) pair. Pairs are returned in LIFO order if last is true or FIFO order if false. ''' if not self: raise KeyError('dictionary is empty') if last: lst = reversed(self.__list) else: lst = self.__list # Could use the key lookup to find this, but... meh. # Note that attempting to compact first might have helped. index, key = next((i, k) for i, k in enumerate(lst) if k is not sentinel) # We're calling dict.pop later, which won't handle # the metadata. del self.__map[key] self.__list[index] = sentinel self.__size -= 1 self.__compact() value = dict.pop(self, key) # dict.pop isn't cached? return key, value def __compact(self, sentinel=_sentinel, enumerate=enumerate, reversed=reversed): ''' Compact the order __list if necessary. ''' # May need to use this a lot in the upcoming `else`. lst = self.__list if not lst: return if self.__size / len(lst) <= 0.5: #chosen at non-random # Implementation 1: list comprehension self.__list = [k for k in lst if k is not sentinel] # Implementation 2: # If only `list` had a `remove_all` method... 
pass ''' Update all indices after a reordering. Should only be done when full (because it shouldn't be necessary otherwise). ''' inner_map = self.__map for index, key in enumerate(self.__list): inner_map[key] = index else: # Even if the list isn't mostly empty, # we can try to clear the back. # TODO: How can this be more efficient? # Note: There exists a non-sentinel because # otherwise, .__size/positive == 0 < positive. # # Implementation 1: Pop until it drops. # while lst[-1] is sentinel: # lst.pop() # Implementation 2: Count the number of sentinels at the end. emptys = next(i for i, k in enumerate(reversed(lst)) if k is not sentinel) # guaranteed not to StopIteration since .__size > 0 del lst[:-emptys] #safe even if 0 def move_to_end(self, key, last=True, sentinel=_sentinel, enumerate=enumerate, len=len): '''Move an existing element to the end (or beginning if last==False). Raises KeyError if the element does not exist. When last=True, acts like a fast version of self[key]=self.pop(key). ''' index = self.__map[key] lst = self.__list if last: if index + 1 == len(lst): # already last # Not sure if this is the right path to optimize. # But I think redundant move_to_ends shouldn't # blow up the __list. return lst[index] = sentinel if lst[-1] is sentinel: # can just swap with last lst[-1] = key self.__map[key] = len(lst) - 1 else: # append and maybe compact lst[index] = sentinel lst.append(key) self.__map[key] = len(lst) - 1 self.__compact() else: # This is costly. But this shouldn't # be a common case anyway, right? # I mean, who repeatedly adds to the front # of an OrderedDict? # And this is basically the only costly # operation I'm adding. lst[index] = sentinel # Propagate forward from the front. for i, newkey in enumerate(lst): self.__map[key] = i lst[i], key = key, newkey if key is sentinel: break def __sizeof__(self): sizeof = _sys.getsizeof size = sizeof(self.__dict__) # instance dictionary size += sizeof(self.__map) * 2 # internal dict and inherited dict size += sizeof(self.__list) size += sizeof(self.__size) return size update = __update = MutableMapping.update def keys(self): "D.keys() -> a set-like object providing a view on D's keys" return _ListDictKeysView(self) def items(self): "D.items() -> a set-like object providing a view on D's items" return _ListDictItemsView(self) def values(self): "D.values() -> an object providing a view on D's values" return _ListDictValuesView(self) __ne__ = MutableMapping.__ne__ __marker = object() def pop(self, key, default=__marker): '''od.pop(k[,d]) -> v, remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised. 
        '''
        if key in self:
            result = self[key]
            del self[key]
            return result
        if default is self.__marker:
            raise KeyError(key)
        return default

    def setdefault(self, key, default=None):
        'od.setdefault(k[,d]) -> od.get(k,d), also set od[k]=d if k not in od'
        if key in self:
            return self[key]
        self[key] = default
        return default

    @_recursive_repr()
    def __repr__(self):
        'od.__repr__() <==> repr(od)'
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self.items()))

    def __reduce__(self):
        'Return state information for pickling'
        inst_dict = vars(self).copy()
        for k in vars(ListDict()):
            inst_dict.pop(k, None)
        return self.__class__, (), inst_dict or None, None, iter(self.items())

    def copy(self):
        'od.copy() -> a shallow copy of od'
        return self.__class__(self)

    @classmethod
    def fromkeys(cls, iterable, value=None):
        '''OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S.
        If not specified, the value defaults to None.

        '''
        self = cls()
        for key in iterable:
            self[key] = value
        return self

    def __eq__(self, other):
        '''od.__eq__(y) <==> od==y.  Comparison to another OD is order-sensitive
        while comparison to a regular mapping is order-insensitive.

        '''
        if isinstance(other, ListDict):
            return dict.__eq__(self, other) and all(map(_eq, self, other))
        return dict.__eq__(self, other)

From srkunze at mail.de  Tue Dec 15 11:45:43 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Tue, 15 Dec 2015 17:45:43 +0100
Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict
In-Reply-To: 
References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com>
 <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com>
 <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com>
 <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com>
 <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com>
 <566F03D1.3040601@mail.de>
 <201512151453.tBFErKCb032452@fido.openend.se>
Message-ID: <567043B7.5060102@mail.de>

Wow. Thanks for that. :)

Well, is there some performance case scenario? Something which tests all
possible interactions with OrderedDict which could be used as the
authoritative Benchmark on this.

Otherwise, I fear, it's just talking about possible possibilities nobody
really evaluated properly.

Best,
Sven

On 15.12.2015 16:00, Franklin? Lee wrote:
> On Tue, Dec 15, 2015 at 9:53 AM, Laura Creighton wrote:
>> In a message of Tue, 15 Dec 2015 04:09:55 -0500, "Franklin? Lee" writes:
>>> I did. It is in a pastebin link in my original message, based on the
>>> 3.5.0 Python (not C) version of OrderedDict. I was hoping for guidance
>>> on evaluating it. Maybe it wasn't seen because I used '-'*n to
>>> separate it from my intro, or maybe pastebin is so disliked that
>>> people couldn't see it. Here it is again: http://pastebin.com/LESRktJw
>> That is the problem. We absolutely do not want links to things
>> like pastebin. We want the code here, as part of the text. 5 years
>> from now, when other people are scratching their heads saying,
>> I wonder why Guido decided things the way he did, and whether that
>> decision can and should be revisited, the first thing we will do is
>> to go back and read all this discussion. And if the discussion is
>> about code we can no longer see, because the pastebin has expired,
>> then we won't be able to learn much.
>>
>> Anything that matters needs to be part of the archived discussion.
>>
>> Laura
> I feel similarly about information history, which is why I always set
> "Expire: Never" when I use pastebin :D.
>
> But alright. The rest of this message is the code as I had it when I
> sent the original message.
>
> [The full ListDict listing was quoted here; it is identical to the
> code above, so it has been snipped.]

From leewangzhong+python at gmail.com  Tue Dec 15 11:47:33 2015
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Tue, 15 Dec 2015 11:47:33 -0500
Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict
In-Reply-To: <567043B7.5060102@mail.de>
References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com>
 <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com>
 <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com>
 <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com>
 <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com>
 <566F03D1.3040601@mail.de>
 <201512151453.tBFErKCb032452@fido.openend.se>
 <567043B7.5060102@mail.de>
Message-ID: 

On Tue, Dec 15, 2015 at 11:45 AM, Sven R. Kunze wrote:
> Wow. Thanks for that. :)
>
> Well, is there some performance case scenario? Something which tests all
> possible interactions with OrderedDict which could be used as the
> authoritative Benchmark on this.
>
> Otherwise, I fear, it's just talking about possible possibilities nobody
> really evaluated properly.
>
> Best,
> Sven

Evaluating and benchmarking is why I came here.

From guido at python.org  Tue Dec 15 12:26:09 2015
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Dec 2015 09:26:09 -0800
Subject: [Python-ideas] Dict literal use for custom dict classes
In-Reply-To: 
References: <20151213032416.GN3821@ando.pearwood.info>
Message-ID: 

I'm still against it. Despite its popularity in certain circles and use
cases, OrderedDict is way less important than dict. Having too many
variants on these display notations is confusing for many users.

On Tue, Dec 15, 2015 at 3:49 AM, Franklin? Lee <
leewangzhong+python at gmail.com> wrote:

> On Sun, Dec 13, 2015 at 5:00 AM, Jelte Fennema wrote:
> > Another option would be that this syntax would not represent a list of
> > tuples but an OrderedDict. I think they both have their advantages, the list
> > of tuples would allow list operations such as `+ items` as you suggested and
> > usage of the same keys multiple times. But OrderedDict would allow simple
> > indexing directly.
But it does not really matter that much since both > could > > easily be used to generate the other. OrderedDict(['1':'2']) and > > list(['1':'2'].items()) respectively. > > > > I think the main case for the list of tuples is actually that you can > make > > any OrderedDict from a list of tuples, but not the other way around, > since > > duplicate keys would be removed. Which is why I like your idea for a > > shorthand for a list of tuples better, since it covers more uses. > > > > One important thing to note is the discussion I already mentioned in my > > first email. Especially this message where guide votes -100 for your > syntax > > for OrderedDict creation: > > https://mail.python.org/pipermail/python-ideas/2009-June/004924.html > > > > I'm not sure why he disliked that syntax and if he still does. Or if his > > thoughts are different when it would represent a list of tuples instead > of > > an OrderedDict. > > I also wonder why he doesn't like it. I wouldn't like it if it > represented a list of tuples. > > What we have is: > - [a, b, c] -> list = [ordered, mutable, collection] > - {a, b, c} -> set = [unordered, mutable, collection, uniquekeys] > - {a:x, b:y, c:z} -> dict = [unordered, mutable, mapping, uniquekeys] > - (a, b, c) -> tuple = [ordered, immutable, collection] > > It seems to me that the pattern would extend to: > - [a:x, b:y, c:z] -> [ordered, mutable, mapping] > - (a:x, b:y, c:z) -> [ordered, immutable, mapping] > > The first one is ALMOST OrderedDict, except that it has unique keys. > The second is ALMOST namedtuple, except that it: > - doesn't allow duplicate keys > - doesn't allow indexing by keys (though this can change) > - doesn't allow arbitrary key type (and we don't want ints allowed as keys) > - needs a name and a type > > If we add the rule that a mapping literal's type should have unique > keys, then [a:x] -> OrderedDict fits the pattern. But [a:x] => [(a,x)] > doesn't. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Dec 15 13:36:45 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 15 Dec 2015 18:36:45 +0000 (UTC) Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: <1299932021.1319867.1450204605942.JavaMail.yahoo@mail.yahoo.com> One more point I don't think anyone's brought up yet with the [k: v] syntax: Just as there's no empty set literal because it would be ambiguous with the empty dict, there would be no empty OrderedDict literal because it would be ambiguous with the empty list. The fact that the exact same ambiguity is resolved in opposite directions (which only makes sense if you know the history--dict preceded set in the language, but list preceded OrderedDict) makes it doubly irregular. Of course we could introduce [:] as an empty OrderedDict literal, and {:} as an empty dict. But unless you actually wanted to deprecate {} for empty dict and eventually make it mean empty set (which I doubt anyone would argue for), that just gives us two ways to do it instead of solving the problem. 
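
(To make that concrete -- the `[:]` and `{:}` spellings are only the hypothetical ones from the paragraph above, so this sketch sticks to syntax that exists today:)

>>> from collections import OrderedDict
>>> type({}), type([])        # both "empty" display forms are already taken
(<class 'dict'>, <class 'list'>)
>>> set(), OrderedDict()      # so the empty set and the empty OrderedDict need calls
(set(), OrderedDict())
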
Meanwhile: On Tuesday, December 15, 2015 5:09 AM, Jelte Fennema wrote: >After thinking some more, I think you are right in saying that it would make more sense to let it represent an OrderedDict directly. Mostly because the mutability suggested by the square brackets. And also a bit because I'm not sure when a mapping that maps multiple values to the same key is actually useful. Well, multidicts are actually useful pretty often--but in Python, they're usually spelled defaultdict(set) or defaultdict(list). After all, you need some syntax to look up (and modify and delete) values. In a dict that directly has multiple values per key, there's no way to specify which one you want, but in a dict that explicitly stores those multiple values as a set or list, it's just d[key] to get or delete that set or list, and d[key].add to add a value, and so on. I think Franklin's point was that a list of pairs is _most often_ used as a mapping initializer, but can mean other things as well, some of which might have a need for duplicate keys. For example, a stats package might take a mapping or iterable-of-pairs (the same type the dict constructor takes) for a collection of timestamped data points, and it's perfectly reasonable for two measurements to have the same timestamp in some datasets, but not in others. If the syntax defines an OrderedDict, it can't be used for the first kind of dataset. As for your mutability point: there's no reason it couldn't be a list of 2-lists instead of a list of 2-tuples. Sure, that will take a little more space in most implementations, but that rarely matters for literals--an object that's big enough in memory that you start to worry about compactness is probably way too big to put in a source file. (And if it _does_ matter, there's no reason CPython couldn't have special code that initializes the lists constructed by [:] literals with capacity 2 instead of the normal minimum capacity, on the expectation that you're not likely to go appending to all of the elements of a list constructed that way, and if you really want to, it's probably clearer to write it with [[]] syntax.) >Secondly, I think your idea for namedtuple literals is great. This would be really useful in the namedtuple use case where you want to return multiple values from a function, but you want to be clear in what these values actually are. I think this would need to generate some kind of anonymous named tuple class though, since it would make no sense to have to create a new class when using a literal like this. First, why would it make no sense to create a new class? This isn't a prototype language; if the attributes are part of the object's type, then that type has to exist, and be accessible as a class. (You could cache the types, so that any two object literals with the same attributes have the same type, but that doesn't really change anything.) If you really want to avoid generating a new type, the type has to have normal-Python dynamic-per-object attributes, like SimpleNamespace, not class-specified attributes. I don't think that's an argument for (:) literals creating new classes, so much as an argument against them producing anything remotely like a namedtuple. None of the existing collection literals produce anything with attributes; namedtuple values can't be looked up by key (as in a mapping), only by attribute (as in SimpleNamespace) or index (as in a sequence); namedtuples don't iterate their keys (like a mapping) but their values (like a sequence)... 
So that's a pretty bad analogy with dict and OrderedDict in almost every way. Also, the syntax looks enough like general object literals in other languages like JavaScript that it will probably mislead people. (Or, worse, they'd actually be right--you could use (:) and lambda together to create some very unpythonic code, and people coming from JS would be very tempted to do so. The fact that you have to explicitly call type to do that today is enough to prevent that from being an attractive nuisance.) If you look at all the other collection literals (including the proposed [:] for OrderedDict) and try to guess what (:) does by analogy, the obvious answer would be a FrozenOrderedDict. Since frozen dicts in general aren't even useful enough to be in the stdlib, much less as builtins, much less ordered ones, I can't imagine they need literals. But a literal that looks like it should mean that, and instead means something completely different, is at best a new and clunky thing that has to be memorized separately from the rest of the syntax. From abarnert at yahoo.com Tue Dec 15 14:01:10 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 15 Dec 2015 19:01:10 +0000 (UTC) Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: <451995089.1313503.1450206070695.JavaMail.yahoo@mail.yahoo.com> On Tuesday, December 15, 2015 4:24 AM, Franklin? Lee wrote: > > On Sun, Dec 13, 2015 at 12:15 AM, Andrew Barnert via Python-ideas > wrote: >> And I think there's some precedent here. IIRC, in YAML, {1:2, 3:4} > is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think > it's ambiguous between an ordered dict and a list of pairs, and you can > resolve that by declaring !odict or !seq, or you can just leave it up to the > implementation to pick one if you don't care... but let's pretend it > wasn't ambiguous; either one covers the use case (and Python only has the > latter option anyway, unless OrderedDict becomes a builtin). > > For YAML, I read it as a list of dicts. My Python's yaml module > (pyyaml?) agrees. According to the YAML 1.1 (and 1.2, since AFAICR they never created the separate repo for 1.2) type-repo omap spec (http://yaml.org/type/omap.html): > Most programming languages do not have a built-in native data type for supporting ordered maps. Such data types are usually provided by libraries. If no such data type is available, an application may resort to loading an ?!!omap? into a native array of hash tables containing one key each. > The ?!!omap? tag may be given explicitly. Alternatively, the application may choose to implicitly type a sequence of single-key mappings to ordered maps. In this case, an explicit ?!seq? transfer must be given to sequences of single-key mappings that do not represent ordered maps. So, if you don't specify either "!!omap" or "!seq", it's up to your implementation whether you've designated an ordered dict or a list of dicts. I was wrong on some of the details--it's "!!omap" rather than "!odict", and it's a list of one-element dicts rather than a list of pairs, and it sounds like even if you _do_ explicitly specify one type the implementation is allowed to give you the other... But still, as I said, YAML is precedent for Python interpreting ['one': 1, 'two': 2] as OrderedDict([('one', 1), ('two', 2)]). Of course it's also precedent for Python interpreting that as a list of dicts [{'one': 1}, {'two': 2}], and I don't think anyone wants that... 
I suppose you can find precedent for almost any idea , no matter how silly, as long as it's implementable. :) >>>> import yaml >>>> yaml.load('[a: 1, b: 2]') > [{'a': 1}, {'b': 2}] > > However, YAML's website (page: http://www.yaml.org/refcard.html) lists > the !!omap type cast as using this syntax: > > '!!omap': [ one: 1, two: 2 ] > > I tried using !!seq. Not sure if I'm doing it right. > >>>> yaml.load('!!seq [a: 1, b: 2]') > [{'a': 1}, {'b': 2}] >>>> yaml.load('!!omap [a: 1, b: 2]') > [('a', 1), ('b', 2)] > > (Huh. PyYAML module thinks that an omap should be a list of pairs. It > might eventually change to OrderedDict, though.) Ha, so I was wrong about YAML allowing that, but PyYAML does it anyway? From p.f.moore at gmail.com Tue Dec 15 14:02:43 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 15 Dec 2015 19:02:43 +0000 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: On 13 December 2015 at 00:53, Joseph Jevnik wrote: > One thing that might make the association lists more readable without > changing the language would be to visually break up the pairs over multiple > lines. This could change the `OrderedDict` construction to look like: > > OrderedDict([ > (k0, v0), > (k1, v1), > (kn, vn), > ]) This is (IMO) readable, but a bit heavy on punctuation. The OP suggested OrderedDict{1: 'a', 4: int, 2: (3, 3)} as a syntax - while it's a bit too special case on its own, one possibility would be to have callable{k1: v1, k2: v2, ...} be syntactic sugar for callable([(k1, k1), (k2, v2), ...]) Then the syntax would work with any function or constructor that took "list of key/value pairs" as an argument. Points against this suggestion, however: 1. It's not clear to me if this would be parseable within the constraints of the Python language parser. 2. It is *only* syntax sugar, and as such adds no extra expressiveness to the language. 3. It's still pretty specialised - while the "list of key/value pairs" pattern is not uncommon, it's not exactly common, either... Paul From leewangzhong+python at gmail.com Tue Dec 15 14:18:14 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 15 Dec 2015 14:18:14 -0500 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <1299932021.1319867.1450204605942.JavaMail.yahoo@mail.yahoo.com> References: <1299932021.1319867.1450204605942.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Tue, Dec 15, 2015 at 1:36 PM, Andrew Barnert wrote: > One more point I don't think anyone's brought up yet with the [k: v] syntax: Just as there's no empty set literal because it would be ambiguous with the empty dict, there would be no empty OrderedDict literal because it would be ambiguous with the empty list. The fact that the exact same ambiguity is resolved in opposite directions (which only makes sense if you know the history--dict preceded set in the language, but list preceded OrderedDict) makes it doubly irregular. I regularly write {} for an empty set and have to fix it later. It looks like math! On the other hand, I don't think anyone will learn OrderedDict before becoming VERY familiar with lists. If you try to treat a list as an empty OrderedDict, at least you will fail as soon as you use it. `empty_list_that_i_think_is_an_ordered_dict[0] = 5` will raise an error. > I think Franklin's point was that a list of pairs is _most often_ used as a mapping initializer, but can mean other things as well, some of which might have a need for duplicate keys. 
For example, a stats package might take a mapping or iterable-of-pairs (the same type the dict constructor takes) for a collection of timestamped data points, and it's perfectly reasonable for two measurements to have the same timestamp in some datasets, but not in others. If the syntax defines an OrderedDict, it can't be used for the first kind of dataset. No, when I thought about it while writing the email, I was okay with this syntax NOT having multiple values per key, because I don't think it's a very basic data structure to think about (there's no builtin analogous to it), and the indexing syntax wouldn't be like dict's (indexing gets you a collection of values). If you really wanted an ordered multidict, you could write `[k1: [1,2,3], k2: [4,5], k3: [6]]`, or use set displays or tuple displays instead of list displays. In fact, that just shows how `d = [k1: x, k2: y, k1: z]` as an OrderedMultiDict would be ambiguous: Is d[k1] a list, a set, or a tuple? Does k1 come before or after k2? From greg.ewing at canterbury.ac.nz Tue Dec 15 16:07:12 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 16 Dec 2015 10:07:12 +1300 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> <566F03D1.3040601@mail.de> <201512151453.tBFErKCb032452@fido.openend.se> Message-ID: <56708100.5090607@canterbury.ac.nz> Franklin? Lee wrote: > I feel similarly about information history, which is why I always set > "Expire: Never" when I use pastebin :D. Never is a long time. In this context it really means "as long as pastebin itself continues to exist", which could be less than the time the python mailing list archives continue to exist. -- Greg From rosuav at gmail.com Tue Dec 15 17:47:21 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 16 Dec 2015 09:47:21 +1100 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: <56708100.5090607@canterbury.ac.nz> References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> <566F03D1.3040601@mail.de> <201512151453.tBFErKCb032452@fido.openend.se> <56708100.5090607@canterbury.ac.nz> Message-ID: On Wed, Dec 16, 2015 at 8:07 AM, Greg Ewing wrote: > Franklin? Lee wrote: >> >> I feel similarly about information history, which is why I always set >> "Expire: Never" when I use pastebin :D. > > > Never is a long time. In this context it really means > "as long as pastebin itself continues to exist", which > could be less than the time the python mailing list > archives continue to exist. Or, more simply: As long as pastebin is accessible. It's entirely possible to create an offline archive of python-ideas posts (whether you see it as a mailing list or a newsgroup), and then to browse it at a time when you have no internet connection. That archive will be incomplete to the value of all externally-linked content. ChrisA From leewangzhong+python at gmail.com Tue Dec 15 20:01:21 2015 From: leewangzhong+python at gmail.com (Franklin? 
Lee) Date: Tue, 15 Dec 2015 20:01:21 -0500 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> <566F03D1.3040601@mail.de> <201512151453.tBFErKCb032452@fido.openend.se> <56708100.5090607@canterbury.ac.nz> Message-ID: Alright, I get it. What about my code, though? On Tue, Dec 15, 2015 at 5:47 PM, Chris Angelico wrote: > On Wed, Dec 16, 2015 at 8:07 AM, Greg Ewing wrote: >> Franklin? Lee wrote: >>> >>> I feel similarly about information history, which is why I always set >>> "Expire: Never" when I use pastebin :D. >> >> >> Never is a long time. In this context it really means >> "as long as pastebin itself continues to exist", which >> could be less than the time the python mailing list >> archives continue to exist. > > Or, more simply: As long as pastebin is accessible. It's entirely > possible to create an offline archive of python-ideas posts (whether > you see it as a mailing list or a newsgroup), and then to browse it at > a time when you have no internet connection. That archive will be > incomplete to the value of all externally-linked content. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From mal at egenix.com Wed Dec 16 05:50:32 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 16 Dec 2015 11:50:32 +0100 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <20151213032416.GN3821@ando.pearwood.info> Message-ID: <567141F8.2090208@egenix.com> On 15.12.2015 12:49, Franklin? Lee wrote: > On Sun, Dec 13, 2015 at 5:00 AM, Jelte Fennema wrote: >> Another option would be that this syntax would not represent a list of >> tuples but an OrderedDict. I think they both have their advantages, the list >> of tuples would allow list operations such as `+ items` as you suggested and >> usage of the same keys multiple times. But OrderedDict would allow simple >> indexing directly. But it does not really matter that much since both could >> easily be used to generate the other. OrderedDict(['1':'2']) and >> list(['1':'2'].items()) respectively. >> >> I think the main case for the list of tuples is actually that you can make >> any OrderedDict from a list of tuples, but not the other way around, since >> duplicate keys would be removed. Which is why I like your idea for a >> shorthand for a list of tuples better, since it covers more uses. >> >> One important thing to note is the discussion I already mentioned in my >> first email. Especially this message where guide votes -100 for your syntax >> for OrderedDict creation: >> https://mail.python.org/pipermail/python-ideas/2009-June/004924.html >> >> I'm not sure why he disliked that syntax and if he still does. Or if his >> thoughts are different when it would represent a list of tuples instead of >> an OrderedDict. > > I also wonder why he doesn't like it. I wouldn't like it if it > represented a list of tuples. 
> > What we have is: > - [a, b, c] -> list = [ordered, mutable, collection] > - {a, b, c} -> set = [unordered, mutable, collection, uniquekeys] > - {a:x, b:y, c:z} -> dict = [unordered, mutable, mapping, uniquekeys] > - (a, b, c) -> tuple = [ordered, immutable, collection] > > It seems to me that the pattern would extend to: > - [a:x, b:y, c:z] -> [ordered, mutable, mapping] > - (a:x, b:y, c:z) -> [ordered, immutable, mapping] How would the parser be able to detect these ? Since requests to be able to access the order of values, parameters and definitions in source code come up rather often, perhaps it'd better to provide Python application with a standard access mechanism to this order rather than trying to push use of OrderedDict and the like into the runtime, causing unnecessary performance overhead. The parser does have access to this information in the AST and some of it is partially copied into code object attributes, but there's no general purpose access to the information. Based on the source code order, you could do lots of things, e.g. avoid hacks to map class attributes to column definitions for ORMs, make it possible to write OrderedDict(a=x, b=y) and have the literal order preserved, have NamedTuple(a=x, b=y) work without additional tricks, etc. What's important here is that the runtime performance would not change. The code objects would gain some additional tuples, which store the order of the literals used in their AST, so only the memory consumption would increase. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 16 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From niki.spahiev at gmail.com Wed Dec 16 07:38:59 2015 From: niki.spahiev at gmail.com (Niki Spahiev) Date: Wed, 16 Dec 2015 14:38:59 +0200 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: Hello, Currently expression (a=1, b=2) is a syntax error. If it's defined to mean (('a',1), ('b',2)) it can be used when making OrderedDict or anything that requires named ordered args e.g. OrderedDict((a=1, b=2)) another variant with more changes in VM is OrderedDict(**(a=1, b=2)) Niki From python-ideas at mgmiller.net Wed Dec 16 13:47:22 2015 From: python-ideas at mgmiller.net (Mike Miller) Date: Wed, 16 Dec 2015 10:47:22 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: <5671B1BA.9040701@mgmiller.net> On 2015-12-15 11:02, Paul Moore wrote: > while it's a bit too special case on its own, one > possibility would be to have > > callable{k1: v1, k2: v2, ...} > > be syntactic sugar for > > callable([(k1, k1), (k2, v2), ...]) Very interesting... I've faced the issue several times over the years when I've wanted to unpack values into a function call in an ordered manner (but it hasn't been available). 
Perhaps:: callable{**odict} In fact with callables I'd even go so far as wish that ordered unpacking was the default somehow, though I guess that probably isn't possible due to history. So, would be happy to have a way to do it. The syntax looks slightly odd but I could get used to it. I found this on the subject, don't know its status: http://legacy.python.org/dev/peps/pep-0468/ -- -Mike From njs at pobox.com Wed Dec 16 18:17:10 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Dec 2015 15:17:10 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <5671B1BA.9040701@mgmiller.net> References: <5671B1BA.9040701@mgmiller.net> Message-ID: On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller wrote: > > On 2015-12-15 11:02, Paul Moore wrote: >> >> while it's a bit too special case on its own, one >> possibility would be to have >> >> callable{k1: v1, k2: v2, ...} >> >> be syntactic sugar for >> >> callable([(k1, k1), (k2, v2), ...]) > > > Very interesting... I've faced the issue several times over the years when > I've wanted to unpack values into a function call in an ordered manner (but > it hasn't been available). Perhaps:: > > callable{**odict} > > In fact with callables I'd even go so far as wish that ordered unpacking was > the default somehow, though I guess that probably isn't possible due to > history. That's not so clear, actually! It turns out that PyPy was able to make their regular 'dict' implementation ordered, while at the same time making it faster and more memory-efficient compared to their previous (CPython-like) implementation: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html So in PyPy all these issues are automatically solved for free. The $1e6-question these other proposals have to answer is, why not do what PyPy did? Maybe there is a good reason not to, but it seems like it'll be difficult to get consensus on moving forward on any of these other more complicated proposals until someone has first made a serious attempt at porting PyPy's dict to CPython and is able to clearly describe why it didn't work. (3.5 does have a faster C implementation of OrderedDict, thanks to tireless efforts by Eric Snow -- https://bugs.python.org/issue16991 -- but this implementation uses a very different and less cache-efficient strategy than PyPy.) -n -- Nathaniel J. Smith -- http://vorpus.org From jim.baker at python.org Wed Dec 16 19:14:29 2015 From: jim.baker at python.org (Jim Baker) Date: Wed, 16 Dec 2015 17:14:29 -0700 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <5671B1BA.9040701@mgmiller.net> Message-ID: For Jython, ordered dict semantics for the dict type *could* possibly work. Currently, dict objects are backed by java.util.concurrent.ConcurrentHashMap, to get correct semantics with respect to possible del when iterating over the dict; and to provide volatile memory semantics, to match CPython's memory model, informal as it may be. Using CHM also likely helps with Jython's threading story. 
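
(A rough sketch, using CPython's plain dict, of the Python-level semantics being referred to here -- updating values during iteration is fine, but structural changes must fail with a Python-level error rather than some arbitrary implementation-specific exception; the snippet is only illustrative:)

d = dict.fromkeys(range(4), 0)
for k in d:
    d[k] += 1           # replacing values while iterating: allowed

try:
    for k in d:
        del d[k]        # adding/removing keys while iterating: not allowed
except RuntimeError as e:
    print(e)            # CPython reports "dictionary changed size during iteration"
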
Note that we can not simply use a java.util.LinkedHashMap that has been wrapped with java.util.collections.synchronizedMap; at the very least we would get this behavior: The iterators returned by the iterator method of the collections returned > by all of this class's collection view methods are fail-fast: if the map is > structurally modified at any time after the iterator is created, in any way > except through the iterator's own remove method, the iterator will throw a > ConcurrentModificationException. Thus, in the face of concurrent > modification, the iterator fails quickly and cleanly, rather than risking > arbitrary, non-deterministic behavior at an undetermined time in the future. (http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html) But there is an alternative: Caffeine is an interesting project that has matured significantly since when I took a look at it last year ( https://github.com/ben-manes/caffeine). Caffeine implements a generally useful concurrent linked hash map which provides the necessary weak iteration semantics we need for Python compatibility; and it looks like Caffeine may have performance comparable to, or better than CHM (but not clear if that extends to map construction, currently a pretty heavy cost for Jython). Caffeine also builds on the implementation experience of Google Guava, which Jython currently uses extensively for internal runtime caches. So it's certainly worth exploring if this possible change for Python gets further interest - we will want to benchmark and really understand because dict/__dict__ support is one of the most critical aspects of good Python performance. - Jim On Wed, Dec 16, 2015 at 4:17 PM, Nathaniel Smith wrote: > On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller > wrote: > > > > On 2015-12-15 11:02, Paul Moore wrote: > >> > >> while it's a bit too special case on its own, one > >> possibility would be to have > >> > >> callable{k1: v1, k2: v2, ...} > >> > >> be syntactic sugar for > >> > >> callable([(k1, k1), (k2, v2), ...]) > > > > > > Very interesting... I've faced the issue several times over the years > when > > I've wanted to unpack values into a function call in an ordered manner > (but > > it hasn't been available). Perhaps:: > > > > callable{**odict} > > > > In fact with callables I'd even go so far as wish that ordered unpacking > was > > the default somehow, though I guess that probably isn't possible due to > > history. > > That's not so clear, actually! It turns out that PyPy was able to make > their regular 'dict' implementation ordered, while at the same time > making it faster and more memory-efficient compared to their previous > (CPython-like) implementation: > > > http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html > > So in PyPy all these issues are automatically solved for free. The > $1e6-question these other proposals have to answer is, why not do what > PyPy did? Maybe there is a good reason not to, but it seems like it'll > be difficult to get consensus on moving forward on any of these other > more complicated proposals until someone has first made a serious > attempt at porting PyPy's dict to CPython and is able to clearly > describe why it didn't work. > > (3.5 does have a faster C implementation of OrderedDict, thanks to > tireless efforts by Eric Snow -- https://bugs.python.org/issue16991 -- > but this implementation uses a very different and less cache-efficient > strategy than PyPy.) > > -n > > -- > Nathaniel J. 
Smith -- http://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Dec 17 00:12:54 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 16 Dec 2015 21:12:54 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <5671B1BA.9040701@mgmiller.net> Message-ID: <3E594BA1-8EAE-435F-BADE-F208E0C1D6BC@yahoo.com> On Dec 16, 2015, at 15:17, Nathaniel Smith wrote: > >> On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller wrote: >> >>> On 2015-12-15 11:02, Paul Moore wrote: >>> >>> while it's a bit too special case on its own, one >>> possibility would be to have >>> >>> callable{k1: v1, k2: v2, ...} >>> >>> be syntactic sugar for >>> >>> callable([(k1, k1), (k2, v2), ...]) >> >> >> Very interesting... I've faced the issue several times over the years when >> I've wanted to unpack values into a function call in an ordered manner (but >> it hasn't been available). Perhaps:: >> >> callable{**odict} >> >> In fact with callables I'd even go so far as wish that ordered unpacking was >> the default somehow, though I guess that probably isn't possible due to >> history. > > That's not so clear, actually! It turns out that PyPy was able to make > their regular 'dict' implementation ordered, while at the same time > making it faster and more memory-efficient compared to their previous > (CPython-like) implementation: > > http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html > > So in PyPy all these issues are automatically solved for free. The > $1e6-question these other proposals have to answer is, why not do what > PyPy did? You don't even need that; a dict that's ordered as long as you never delete from it or expand it after initial creation is good enough, and that may be simpler. (For example, Raymond Hettinger's two-table prototype guarantees this much, even though it isn't order-preserving on deletes, and it should be faster and more compact than the current design, although I don't know if anyone's proven that part, and it's the "dead-simple once you see it" kind of brilliant rather than the deep-magic kind.) The bigger problem is that any other Python implementation that uses some native (Java, .NET, JS, ...) structure to back its Python dicts will obviously need to either change to a different one or use a two-table structure like Raymond's--which may be a pessimization rather than an optimization and/or may break whatever thread-safety guarantees they were relying on. So, you'd need to make sure there's a good answer for every major implementation out there before changing the language to require them all to do it. Of course we'd also need to require that **kw unpacking happens in iteration order, and **kw collecting collects the keywords in the order passed, but both of those should be easy. (They may still be changes--an implementation might, say, reverse the order for a slight simplification or something--but they should be trivial compare to finding an order-preserving-at-least-until-mutation structure.) But assuming those are all reasonable, it seems like the easiest solution to the problem. 
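
(For readers who have not seen the "two-table" idea, here is a deliberately tiny toy model of the layout being discussed. It is not Raymond's prototype and not PyPy's implementation: it never resizes, never deletes, and uses naive linear probing. It only shows how a sparse index table plus a dense, insertion-ordered entry list gives iteration order "for free":)

class CompactDict:
    def __init__(self):
        self.indices = [None] * 8      # sparse table: probe slot -> entry number
        self.entries = []              # dense, insertion-ordered [hash, key, value]

    def _slot(self, key):
        h = hash(key)
        for i in range(len(self.indices)):             # naive linear probing
            slot = (h + i) % len(self.indices)
            ix = self.indices[slot]
            if ix is None or self.entries[ix][1] == key:
                return slot
        raise RuntimeError('table full (this toy never resizes)')

    def __setitem__(self, key, value):
        slot = self._slot(key)
        ix = self.indices[slot]
        if ix is None:                 # new key: append to the dense table
            self.indices[slot] = len(self.entries)
            self.entries.append([hash(key), key, value])
        else:                          # existing key: overwrite in place
            self.entries[ix][2] = value

    def __getitem__(self, key):
        ix = self.indices[self._slot(key)]
        if ix is None:
            raise KeyError(key)
        return self.entries[ix][2]

    def __iter__(self):                # iteration order == insertion order
        return (key for _, key, _ in self.entries)

As long as nothing is ever deleted, the dense entries list is exactly the insertion order, which is the weaker guarantee discussed above.
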
From njs at pobox.com Thu Dec 17 00:36:19 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Dec 2015 21:36:19 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <3E594BA1-8EAE-435F-BADE-F208E0C1D6BC@yahoo.com> References: <5671B1BA.9040701@mgmiller.net> <3E594BA1-8EAE-435F-BADE-F208E0C1D6BC@yahoo.com> Message-ID: On Wed, Dec 16, 2015 at 9:12 PM, Andrew Barnert wrote: > On Dec 16, 2015, at 15:17, Nathaniel Smith wrote: >> >>> On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller wrote: [...] >> That's not so clear, actually! It turns out that PyPy was able to make >> their regular 'dict' implementation ordered, while at the same time >> making it faster and more memory-efficient compared to their previous >> (CPython-like) implementation: >> >> http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html >> >> So in PyPy all these issues are automatically solved for free. The >> $1e6-question these other proposals have to answer is, why not do what >> PyPy did? > > You don't even need that; a dict that's ordered as long as you never delete from it or expand it after initial creation is good enough, and that may be simpler. (For example, Raymond Hettinger's two-table prototype guarantees this much, even though it isn't order-preserving on deletes, and it should be faster and more compact than the current design, although I don't know if anyone's proven that part, and it's the "dead-simple once you see it" kind of brilliant rather than the deep-magic kind.) IIUC, the PyPy dict is exactly a fully-worked-out version of Raymond Hettinger's two-table design, and they claim that it is in fact faster and more compact than the current design, so I suppose one could argue that someone has indeed proven that part :-). -n -- Nathaniel J. Smith -- http://vorpus.org From abarnert at yahoo.com Thu Dec 17 01:00:40 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 17 Dec 2015 06:00:40 +0000 (UTC) Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: <464042439.5767.1450332040235.JavaMail.yahoo@mail.yahoo.com> On Wednesday, December 16, 2015 9:36 PM, Nathaniel Smith wrote: > > On Wed, Dec 16, 2015 at 9:12 PM, Andrew Barnert > wrote: >> On Dec 16, 2015, at 15:17, Nathaniel Smith wrote: >>> >>>> On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller > wrote: > [...] >>> That's not so clear, actually! It turns out that PyPy was able to > make >>> their regular 'dict' implementation ordered, while at the same > time >>> making it faster and more memory-efficient compared to their previous >>> (CPython-like) implementation: >>> >>> > http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html >>> >>> So in PyPy all these issues are automatically solved for free. The >>> $1e6-question these other proposals have to answer is, why not do what >>> PyPy did? >> >> You don't even need that; a dict that's ordered as long as you > never delete from it or expand it after initial creation is good enough, and > that may be simpler. (For example, Raymond Hettinger's two-table prototype > guarantees this much, even though it isn't order-preserving on deletes, and > it should be faster and more compact than the current design, although I > don't know if anyone's proven that part, and it's the > "dead-simple once you see it" kind of brilliant rather than the > deep-magic kind.) 
> > IIUC, the PyPy dict is exactly a fully-worked-out version of Raymond > Hettinger's two-table design, and they claim that it is in fact faster > and more compact than the current design, so I suppose one could argue > that someone has indeed proven that part :-). According the blog post, some cases are slower, "for example when repeatedly adding and removing keys in equal number". For PyPy, that's obviously fine. But PyPy generally isn't worried about small performance regressions between PyPy versions for relatively uncommon edge cases, especially if the new code is still much faster than CPython, and when it enables "various optimization possibilities which we're going to explore in the near future", and so on. I don't know if the same applies for CPython. So it may be better to stick with Raymond's original design, which should be faster than 3.5 in all cases, not just most, and require less and simpler code, and still provide the guarantee we actually need here (which will hopefully be an easier requirement on the other implementations as well). As an aside, IIRC, the rejected "blist" proposal from a few years ago sped up every benchmark, and also provided all the guarantees we need here. (I think it used a flat a-list for tiny dicts, a variant B-tree for medium-sized dicts and all non-tiny literals, and a hash when you resize a dict beyond a certain cutoff, or something like that.) It's obviously more complicated than the Raymond design or the PyPy design, and was presumably rejected for a good reason, but it's more evidence that the requirement may not be too strenuous. From bruce at leban.us Thu Dec 17 02:03:12 2015 From: bruce at leban.us (Bruce Leban) Date: Wed, 16 Dec 2015 23:03:12 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: Message-ID: Getting back to the original issue of being able to initialize an OrderedDict with a literal syntax, one proposal is that all dicts are ordered. There are actually several intermediate steps between dicts are unordered and ordered and here are some: (1) Dict literals preserve order of the original keys. When a literal is iterated, its keys are returned in declaration order. If the dict is changed in any way, then no guarantee is made about iteration order. (2) Dict literals preserve order of the original keys. This order is preserved even if values are changed as long as no keys are added or deleted. If a key is added or removed, then no guarantee is made about iteration order. (3) Dict literals preserve order of the original keys. This order is preserved for all keys originally added but new keys may be returned in any order, even interspersed between original keys. The order of the new keys is not stable in that it may change anytime the dict is changed in any way. (4) Dict literals are really ordered dicts. For the case of initialization, all of the above make OrderedDict{k1: v1, k2: v2, ...}) work. If that use case is the primary motivator here, then just implementing option (1) seems sufficient. --- Bruce Check out my puzzle book and get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ipipomme+python at gmail.com Thu Dec 17 04:40:33 2015 From: ipipomme+python at gmail.com (Alexandre Figura) Date: Thu, 17 Dec 2015 10:40:33 +0100 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? 
In-Reply-To: References: Message-ID: Hi, I recently compared OrderedDict instances while writing unit tests, and discovered an interesting behavior. If I create two ordered dictionaries with the same keys/values in the same order, I observe that their values are not equal when I compare them. I recently asked a question about this on Stackoverflow: http://stackoverflow.com/questions/34312674/why-values-of-an-ordereddict-are-not-equal Moreover, another user observed that keys of ordered dictionaries are compared in an order insensitive way: http://stackoverflow.com/questions/34320600/why-does-the-ordereddict-keys-view-compare-order-insensitive Are there any reasons for such implementation choices? As it appears disturbing for many people, would it be possible to update these behaviors? Best Regards, Alexandre. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Thu Dec 17 06:19:42 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Thu, 17 Dec 2015 06:19:42 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: Message-ID: So the issues are: 1. OrderedDict().values() does not implement __eq__. It uses object equality, which means identity. 1a. dict().values() does not implement __eq__. 2. OrderedDict().keys().__eq__ does not respect order. I'd argue that keys() should be ordered comparisons, and values() could be. As a plus, this could be more efficient than unordered comparisons, since it's just return all(x == y for x, y in zip(self.keys(), other.keys())) instead of packing each into a set and comparing the sets. But what would be the point of comparing values views? On the other hand, I guess dict().values().__eq__ should stay unimplemented. What would it mean? MultiSet comparison? Values in general aren't even hashable, so they can't be stuck into a hash-based set structure. Maybe explicitly make it NotImplemented. On Thu, Dec 17, 2015 at 4:40 AM, Alexandre Figura wrote: > > Hi, > > I recently compared OrderedDict instances while writing unit tests, and > discovered an interesting behavior. If I create two ordered dictionaries > with the same keys/values in the same order, I observe that their values are > not equal when I compare them. > > I recently asked a question about this on Stackoverflow: > http://stackoverflow.com/questions/34312674/why-values-of-an-ordereddict-are-not-equal > > Moreover, another user observed that keys of ordered dictionaries are > compared in an order insensitive way: > http://stackoverflow.com/questions/34320600/why-does-the-ordereddict-keys-view-compare-order-insensitive > > Are there any reasons for such implementation choices? As it appears > disturbing for many people, would it be possible to update these behaviors? > > Best Regards, > Alexandre. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Thu Dec 17 11:57:21 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 17 Dec 2015 08:57:21 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: Message-ID: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> On Dec 17, 2015, at 03:19, Franklin? 
Lee wrote: > > So the issues are: I think the first issue is that, if the comparison behavior of dict views isn't documented anywhere, it probably should be. (Even if the intended rule is just "comparisons may not be meaningful or do different things in different implementations, don't use them for anything", that could be documented.) As it stands, you can guess: * since all mapping keys and items act like sets (and are Sets), they probably compare like sets * since there is no OrderedSet for OrderedDict's keys and values to act like, they probably don't compare like those and instead just compare like sets * since values are just generic collections, they probably just use generic identity comparison. But, even if that guess may happen to be right for all builtin and stdlib types, and all user types that rely on the ABCs as mixins, in all of the major implementations, it's still just a guess, not something I'd want to rely on in portable code. > 1. OrderedDict().values() does not implement __eq__. It uses object > equality, which means identity. > 1a. dict().values() does not implement __eq__. > > 2. OrderedDict().keys().__eq__ does not respect order. > > I'd argue that keys() should be ordered comparisons, and values() > could be. So what happens when you compare the keys, items, or values view from an OrderedDict against the view from another mapping type? Or, for keys and items, against another set type? If you leave that up to the whichever one is on the left, you get cases where a==b and b!=a. If you leave it up to the most derived type (by the usual __rspam__-like rules), that doesn't help anything, since dict_keys_view and odict_keys_view are unrelated except in sharing an abstract base class. And, worst of all, even if you contrive a way for one or the other to always win consistently, you get cases where a==b and b==c and a!=c. Also, overriding equality for ordered dict views might make sense, but what about ordering? I think it's pretty reasonable to expect that something that acts like a set and inherits from Set use < to mean subset, not lexicographical comparison. But then the interactions with other mapping views get even crazier. Under the current rules, I'm pretty sure equality is always symmetric and transitive, ordering is consistent with the normal partial order rules, etc. New rules that seem more intuitive at first glance but break down as soon as you try to think them through don't seem like an improvement. Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons? Maybe what you really want is new methods to get sequence-like (but with O(1) __contains__ and friends) rather than set-like views, including implementing the Sequence ABC, which only exist on OrderedDict, compare like sequences, don't provide set operations or implement Set, etc. Then you can be explicit about which one you want. The question is, are you going to actually want the sequence-like views often enough for it to be worth adding all of that code? 
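(To make that concrete, here is a rough sketch of the kind of sequence-like keys view I mean, written as ordinary user code on top of the existing API rather than as a real view type; the class name is made up, it is O(n) where a real implementation would index the underlying storage directly, and it ignores most edge cases:)

from collections import OrderedDict
from collections.abc import Sequence

class SequenceKeysView(Sequence):
    """Read-only, order-sensitive view of an OrderedDict's keys."""
    def __init__(self, od):
        self._od = od
    def __len__(self):
        return len(self._od)
    def __getitem__(self, index):
        # O(n): a real view would index the dict's internal storage.
        return list(self._od)[index]
    def __eq__(self, other):
        # Sequence-style equality: same length, same elements, same order.
        try:
            if len(self) != len(other):
                return False
        except TypeError:
            return NotImplemented
        return all(a == b for a, b in zip(self, other))

od = OrderedDict([('a', 1), ('b', 2)])
keys = SequenceKeysView(od)
assert keys[0] == 'a' and keys[-1] == 'b'
assert keys == ['a', 'b']    # order-sensitive
assert keys != ['b', 'a']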
> As a plus, this could be more efficient than unordered > comparisons, since it's just > return all(x == y for x, y in zip(self.keys(), other.keys())) > instead of packing each into a set and comparing the sets.

I think you want zip_longest(self.keys(), other.keys(), fillvalue=object()) or equivalent; otherwise {1, 2} will be equal to {1, 2, 3}. Also, you don't need to pack self.keys() into a set; it already is a Set, and implements all of the immutable set operations as-is, without needing any conversions.

But really, who cares which of two implementations is more efficient when they implement two completely different things? If set comparison is right, it doesn't matter that sequence comparison is faster, you still can't use it, and vice-versa.

> But what > would be the point of comparing values views? > > On the other hand, I guess dict().values().__eq__ should stay > unimplemented. What would it mean? MultiSet comparison? Values in > general aren't even hashable, so they can't be stuck into a hash-based > set structure. Maybe explicitly make it NotImplemented. > > > On Thu, Dec 17, 2015 at 4:40 AM, Alexandre Figura > wrote: >> >> Hi, >> >> I recently compared OrderedDict instances while writing unit tests, and >> discovered an interesting behavior. If I create two ordered dictionaries >> with the same keys/values in the same order, I observe that their values are >> not equal when I compare them. >> >> I recently asked a question about this on Stackoverflow: >> http://stackoverflow.com/questions/34312674/why-values-of-an-ordereddict-are-not-equal >> >> Moreover, another user observed that keys of ordered dictionaries are >> compared in an order insensitive way: >> http://stackoverflow.com/questions/34320600/why-does-the-ordereddict-keys-view-compare-order-insensitive >> >> Are there any reasons for such implementation choices? As it appears >> disturbing for many people, would it be possible to update these behaviors? >> >> Best Regards, >> Alexandre. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/

From pierre.saveant at thalesgroup.com Thu Dec 17 12:12:40 2015
From: pierre.saveant at thalesgroup.com (SAVEANT Pierre)
Date: Thu, 17 Dec 2015 18:12:40 +0100
Subject: [Python-ideas] Trailing-based state restoration
Message-ID: <4008F9ABAF25EA469EA100EAC59FE290011DE7F024D6@THSONEA01CMS01P.one.grp>

Hi,

I have a proposal for a new feature in Python in order to perform hypothetical reasoning in a very simple way. Hypothetical reasoning is mainly used in problem-solving search algorithms. In such a process, variables are temporarily assigned and later restored to their preceding value. With the assumption that only a small fraction of the variables changes at the same time, it is more efficient to store undo information instead of copying the entire state. The basic data structure to handle these moves efficiently is called a trail, which is essentially a stack with markers. Potentially this mechanism can be provided for any Python object type. Only a few lines are needed to implement the mechanism using the reflective power of Python.
A deeper implementation could hide the process in the internal data representation in order to offer this new feature in a seamless way.

This new feature is activated with the specification of three primitives:
- push()
- back()
- assign(object, attribute, value)

push : declare a new hypothetical state.
back : restore the previous state.
assign(object, attribute, value) : assign the value to the attribute of the object in the current state.

In addition, a primitive can be provided to go back to a specific state:
backtrack(n): restore the state number n.

EXAMPLES

class Var:
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return "{}".format(self.value)

def test1():
    V1 = Var(10)
    assert V1.value == 10
    push()
    assign(V1, 'value', 100)
    assert V1.value == 100
    back()
    assert V1.value == 10

def test2():
    V1 = Var(0)
    for i in range(8):
        push()
        for j in range(5):
            assign(V1, 'value', (j+1)+i*5-1)
    assert V1.value == 39
    assert current() == 8
    backtrack(6)
    assert V1.value == 29
    assert current() == 6
    backtrack(4)
    assert V1.value == 19
    assert current() == 4
    backtrack(0)
    assert V1.value == 0
    assert current() == 0

The assign procedure is used in place of the standard assignment denoted by the "=" operator. In a seamless integration, the "=" assignment operator could be overridden when the variable has been previously declared as "recordable" (this would require a declaration primitive or a type declaration).

Is anybody interested in this new feature?

Pierre Savéant

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abarnert at yahoo.com Thu Dec 17 12:50:52 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Dec 2015 09:50:52 -0800
Subject: [Python-ideas] Trailing-based state restoration
In-Reply-To: <4008F9ABAF25EA469EA100EAC59FE290011DE7F024D6@THSONEA01CMS01P.one.grp>
References: <4008F9ABAF25EA469EA100EAC59FE290011DE7F024D6@THSONEA01CMS01P.one.grp>
Message-ID: <82750EBF-A8FE-4ADF-871A-76E561BBB585@yahoo.com>

On Dec 17, 2015, at 09:12, SAVEANT Pierre wrote: > > Hi, > > I got a proposal for a new feature in Python in order to perform hypothetical reasoning in a very simple way. > Hypothetical reasoning is mainly used in problem solving search algorithms. > > In such a process, variables are temporally assigned and later restored to their preceding value. > With the assumption that only a small fraction of the variables changes at the same time, it is more efficient to store undo information instead of copying the entire state. > The basic data structure to handle these moves efficiently is called a trail which is mainly a stack with markers. > Potentially this mechanism can be provided to any object type of Python. > Only few lines are needed to implement the mechanism using the reflexive power of Python. > A deeper implementation could hide the process in the internal data representation in order to offer this new feature in a seamless way. > > This new feature is activated with the specification of thee primitives: > - push() > - back() > - assign(object, attribute, value) > > push : declare a new hypothetical state. > back : restore the previous state. > assign(object, attribute, value) : assign the value to the attribute of the object in the current state. > > In addition a primitive can be provided to go back to a specific state: > backtrack(n): restore the state number n.
> > EXAMPLES > > class Var: > def __init__(self, value): > self.value = value > def __str__(self): return"{}".format(self.value) > > def test1(): > V1 = Var(10) > assert V1.value == 10 > push() > assign(V1, 'value', 100) > assert V1.value == 100 > back() > assert V1.value == 10 > > def test2(): > V1 = Var(0) > for i in range(8): > push() > for j in range(5): > assign(V1, 'value',(j+1)+i*5-1) > assert V1.value == 39 > assert current() == 8 > backtrack(6) > assert V1.value == 29 > assert current() == 6 > backtrack(4) > assert V1.value == 19 > assert current() == 4 > backtrack(0) > assert V1.value == 0 > assert current() == 0 > > The assign procedure is used in place of the standard assignment denoted by the "=" operator. It seems like all of that can be implemented in Python today (presumably storing the set of trailed Vars as a global or as a class attribute of Var, since there doesn't seem to be any scoping involved). So, why not just build it and put it on PyPI? > In a seamless integration the "=" assignment operator could be overridden when the variable has been previously declared as "recordable" (require a declaration primitive or a type declaration). That's one thing that couldn't be implemented in Python today, because = can't be overridden (it's a binding-and-possibly-declaring statement, not an assignment operator, in Python). But if you just made all these things attributes on some object, then you could hook its __setattr__. Plus, you could have different sets of trails in parallel, and implement whatever kind of scoping you want instead of everything being sort of global but sort of Python-scoped, and so on. > > Is anybody interested by this new feature? > > > Pierre Sav?ant > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Dec 17 13:05:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 17 Dec 2015 10:05:29 -0800 Subject: [Python-ideas] Trailing-based state restoration In-Reply-To: <82750EBF-A8FE-4ADF-871A-76E561BBB585@yahoo.com> References: <4008F9ABAF25EA469EA100EAC59FE290011DE7F024D6@THSONEA01CMS01P.one.grp> <82750EBF-A8FE-4ADF-871A-76E561BBB585@yahoo.com> Message-ID: On Dec 17, 2015, at 09:50, Andrew Barnert via Python-ideas wrote: > >> On Dec 17, 2015, at 09:12, SAVEANT Pierre wrote: >> >> Hi, >> >> I got a proposal for a new feature in Python in order to perform hypothetical reasoning in a very simple way. >> Hypothetical reasoning is mainly used in problem solving search algorithms. >> >> In such a process, variables are temporally assigned and later restored to their preceding value. >> With the assumption that only a small fraction of the variables changes at the same time, it is more efficient to store undo information instead of copying the entire state. >> The basic data structure to handle these moves efficiently is called a trail which is mainly a stack with markers. >> Potentially this mechanism can be provided to any object type of Python. >> Only few lines are needed to implement the mechanism using the reflexive power of Python. >> A deeper implementation could hide the process in the internal data representation in order to offer this new feature in a seamless way. 
>> >> This new feature is activated with the specification of thee primitives: >> - push() >> - back() >> - assign(object, attribute, value) >> >> push : declare a new hypothetical state. >> back : restore the previous state. >> assign(object, attribute, value) : assign the value to the attribute of the object in the current state. >> >> In addition a primitive can be provided to go back to a specific state: >> backtrack(n): restore the state number n. >> >> EXAMPLES >> >> class Var: >> def __init__(self, value): >> self.value = value >> def __str__(self): return"{}".format(self.value) >> >> def test1(): >> V1 = Var(10) >> assert V1.value == 10 >> push() >> assign(V1, 'value', 100) >> assert V1.value == 100 >> back() >> assert V1.value == 10 >> >> def test2(): >> V1 = Var(0) >> for i in range(8): >> push() >> for j in range(5): >> assign(V1, 'value',(j+1)+i*5-1) >> assert V1.value == 39 >> assert current() == 8 >> backtrack(6) >> assert V1.value == 29 >> assert current() == 6 >> backtrack(4) >> assert V1.value == 19 >> assert current() == 4 >> backtrack(0) >> assert V1.value == 0 >> assert current() == 0 >> >> The assign procedure is used in place of the standard assignment denoted by the "=" operator. > > It seems like all of that can be implemented in Python today (presumably storing the set of trailed Vars as a global or as a class attribute of Var, since there doesn't seem to be any scoping involved). So, why not just build it and put it on PyPI? > >> In a seamless integration the "=" assignment operator could be overridden when the variable has been previously declared as "recordable" (require a declaration primitive or a type declaration). > > That's one thing that couldn't be implemented in Python today, because = can't be overridden (it's a binding-and-possibly-declaring statement, not an assignment operator, in Python). But if you just made all these things attributes on some object, then you could hook its __setattr__. Plus, you could have different sets of trails in parallel, and implement whatever kind of scoping you want instead of everything being sort of global but sort of Python-scoped, and so on. Actually, reading things a bit more carefully, that's just as doable. Just define Var.__setattr__ to call assign and you're done. Now "v1.value = 10" gets trailed. (I'm a bit confused about why the "value" attribute is constructed at __init__ time from the constructor argument, but assign lets you assign any attribute you want even though you only "value". If you only want "value" to be trailed, skip the __setattr__ and make it a @property.) >> Is anybody interested by this new feature? >> >> >> Pierre Sav?ant >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Thu Dec 17 13:37:09 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 17 Dec 2015 10:37:09 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <567141F8.2090208@egenix.com> References: <20151213032416.GN3821@ando.pearwood.info> <567141F8.2090208@egenix.com> Message-ID: On Wed, Dec 16, 2015 at 2:50 AM, M.-A. Lemburg wrote: > Since requests to be able to access the order of values, > parameters and definitions in source code come up rather > often, perhaps it'd better to provide Python application > with a standard access mechanism to this order rather than > trying to push use of OrderedDict and the like into the > runtime, causing unnecessary performance overhead. > > The parser does have access to this information in the AST > and some of it is partially copied into code object attributes, > but there's no general purpose access to the information. > > Based on the source code order, you could do lots of > things, e.g. avoid hacks to map class attributes to > column definitions for ORMs, make it possible to write > OrderedDict(a=x, b=y) and have the literal order preserved, > have NamedTuple(a=x, b=y) work without additional tricks, > etc. > > What's important here is that the runtime performance > would not change. The code objects would > gain some additional tuples, which store the order of > the literals used in their AST, so only the memory > consumption would increase. +1 from me. That's essentially the goal of PEP 468. [1] While the proposed solution focuses on OrderedDict, note the various alternatives at the bottom of the PEP. Also note that OrderedDict now has a C implementation that doesn't suffer from the same performance penalty. [2] -eric [1] http://legacy.python.org/dev/peps/pep-0468/ [2] OrderedDict is actually faster for iteration, the same speed for other non-mutation operations, not much slower for most mutation operations, and 4x slower in the worst case. That is a drastic improvement over the pure Python OrderedDict. From ericsnowcurrently at gmail.com Thu Dec 17 13:46:26 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 17 Dec 2015 10:46:26 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <5671B1BA.9040701@mgmiller.net> Message-ID: On Wed, Dec 16, 2015 at 3:17 PM, Nathaniel Smith wrote: > That's not so clear, actually! It turns out that PyPy was able to make > their regular 'dict' implementation ordered, while at the same time > making it faster and more memory-efficient compared to their previous > (CPython-like) implementation: > > http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html > > So in PyPy all these issues are automatically solved for free. The > $1e6-question these other proposals have to answer is, why not do what > PyPy did? Maybe there is a good reason not to, but it seems like it'll > be difficult to get consensus on moving forward on any of these other > more complicated proposals until someone has first made a serious > attempt at porting PyPy's dict to CPython and is able to clearly > describe why it didn't work. If I recall correctly, the PyPy implementation is either based on a proposal by Raymond Hettinger [1] or on the same concept. Note that I mention the approach as an alternative in PEP 468. 
-eric [1] original: https://mail.python.org/pipermail/python-dev/2012-December/123028.html revisited: https://mail.python.org/pipermail/python-dev/2013-May/126327.html From ericsnowcurrently at gmail.com Thu Dec 17 14:05:53 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 17 Dec 2015 11:05:53 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> References: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> Message-ID: On Thu, Dec 17, 2015 at 8:57 AM, Andrew Barnert via Python-ideas wrote: > I think the first issue is that, if the comparison behavior of dict views isn't documented anywhere, it probably should be. (Even if the intended rule is just "comparisons may not be meaningful or do different things in different implementations, don't use them for anything", that could be documented.) The views are documented as "set-like" and tied to the Set ABC. [1] So comparision operations should match the same semantics as for sets. [snip] > Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons? Yep. If you want to work with a sequence then you have to convert the OrderedDict to a list or other sequence. The same goes for the views, which *do* preserve order during iteration (e.g. "list(od.keys())"). > > Maybe what you really want is new methods to get sequence-like (but with O(1) __contains__ and friends) rather than set-like views, including implementing the Sequence ABC, which only exist on OrderedDict, compare like sequences, don't provide set operations or implement Set, etc. Then you can be explicit about which one you want. The question is, are you going to actually want the sequence-like views often enough for it to be worth adding all of that code? I've spoken with folks that have use cases for OrderedDict-as-a-sequence, but they were always able to use something else that met their needs more directly anyway. -eric [1] https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects From abarnert at yahoo.com Thu Dec 17 14:07:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 17 Dec 2015 11:07:16 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: References: <20151213032416.GN3821@ando.pearwood.info> <567141F8.2090208@egenix.com> Message-ID: <6B037426-B7ED-4854-8294-2A96677B6DAD@yahoo.com> On Dec 17, 2015, at 10:37, Eric Snow wrote: > >> On Wed, Dec 16, 2015 at 2:50 AM, M.-A. Lemburg wrote: >> Since requests to be able to access the order of values, >> parameters and definitions in source code come up rather >> often, perhaps it'd better to provide Python application >> with a standard access mechanism to this order rather than >> trying to push use of OrderedDict and the like into the >> runtime, causing unnecessary performance overhead. >> >> The parser does have access to this information in the AST >> and some of it is partially copied into code object attributes, >> but there's no general purpose access to the information. 
>> >> Based on the source code order, you could do lots of >> things, e.g. avoid hacks to map class attributes to >> column definitions for ORMs, make it possible to write >> OrderedDict(a=x, b=y) and have the literal order preserved, >> have NamedTuple(a=x, b=y) work without additional tricks, >> etc. >> >> What's important here is that the runtime performance >> would not change. The code objects would >> gain some additional tuples, which store the order of >> the literals used in their AST, so only the memory >> consumption would increase. > > +1 from me. That's essentially the goal of PEP 468. [1] While the > proposed solution focuses on OrderedDict, note the various > alternatives at the bottom of the PEP. Also note that OrderedDict now > has a C implementation that doesn't suffer from the same performance > penalty. [2] It seems like whatever the resolution of this discussion (unless it's "people mostly agree that X would be acceptable, doable, and probably desirable, but nobody's going to do it") you may want to rewrite and repush PEP 468. After all, if Python 3.6 changes to make all dicts ordered, or make dict literals preserve literal order at least until first mutation (or anything in between--as Bruce Leban pointed out, there are at least three sensible things in between rather than just the one I suggested, and of courseall of them are good enough here), or to stash AST links on code objects, etc., you should be able to show how PEP 468 only adds a trivial requirement to any implementation of Python 3.6, and then the concerns come down to "maybe on some implementations it would be slightly faster to construct the kwdict in reverse, but now it will have to not do that". Conversely if someone convinces everyone that there is no solution that works for dict literals, you'd probably need to explain why that same problem doesn't apply to kwargs to keep the PEP alive. > > -eric > > [1] http://legacy.python.org/dev/peps/pep-0468/ > [2] OrderedDict is actually faster for iteration, the same speed for > other non-mutation operations, not much slower for most mutation > operations, and 4x slower in the worst case. That is a drastic > improvement over the pure Python OrderedDict. IIRC, the one concern of Guido's that you couldn't answer was that if someone keeps the kwdict and adds to it, he could end up wasting a lot of space, not just time. If OrderedDict is still 150-200% bigger than dict, as in the pure Python version, that's still a problem. From ericsnowcurrently at gmail.com Thu Dec 17 14:22:18 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 17 Dec 2015 11:22:18 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> Message-ID: On Mon, Dec 14, 2015 at 6:58 AM, Serhiy Storchaka wrote: > Actually the current C implementation of OrderedDict uses an continuous > array for mapping an index in the base hashtable to a list node. It is > rebuild from a linked list when the base hashtable is rebuild. I where > planned to experiment with getting rid of linked list at all. Awesome! If I remember correctly, Antoine recommended something similar when I put up my initial OrderedDict implementation. 
-eric From ericsnowcurrently at gmail.com Thu Dec 17 14:30:17 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 17 Dec 2015 11:30:17 -0800 Subject: [Python-ideas] Contiguous-array-based ordering for OrderedDict In-Reply-To: References: <27A2EEA3-94E1-4268-A5BF-09AD6F3E6D83@yahoo.com> <93C45270-5102-4CF7-9345-F7854D3C899F@yahoo.com> <0BD5BB3D-76D5-48FA-95F1-FCC5579BEBEF@yahoo.com> <1F45A7CC-A371-413F-BDAD-96A206B88BF4@yahoo.com> <25B2A339-C921-43DC-9F84-A259C11806EA@yahoo.com> <566F03D1.3040601@mail.de> <201512151453.tBFErKCb032452@fido.openend.se> Message-ID: On Tue, Dec 15, 2015 at 7:00 AM, Franklin? Lee wrote: > But alright. The rest of this message is the code as I had it when I > sent the original message. Be sure to run the full OrderedDict test suite. [1] Also, I posted a rudimentary benchmark when working on the C implementation of OrdereDict that may help you. [2] Finally, be sure to respect OrderedDict's invariants. You may find my notes useful. [3] -eric [1] (on default branch) make && ./python -m test.test_ordered_dict [2] "odict-speed.diff" on https://bugs.python.org/issue16991 [3] https://hg.python.org/cpython/file/default/Objects/odictobject.c#l1 From leewangzhong+python at gmail.com Thu Dec 17 14:50:59 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Thu, 17 Dec 2015 14:50:59 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> References: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> Message-ID: On Thu, Dec 17, 2015 at 11:57 AM, Andrew Barnert wrote: > * since all mapping keys and items act like sets (and are Sets), they probably compare like sets .items() aren't like sets. Or something is very wrong. >> 1. OrderedDict().values() does not implement __eq__. It uses object >> equality, which means identity. >> 1a. dict().values() does not implement __eq__. >> >> 2. OrderedDict().keys().__eq__ does not respect order. >> >> I'd argue that keys() should be ordered comparisons, and values() >> could be. > > So what happens when you compare the keys, items, or values view from an OrderedDict against the view from another mapping type? Or, for keys and items, against another set type? If you leave that up to the whichever one is on the left, you get cases where a==b and b!=a. If you leave it up to the most derived type (by the usual __rspam__-like rules), that doesn't help anything, since dict_keys_view and odict_keys_view are unrelated except in sharing an abstract base class. And, worst of all, even if you contrive a way for one or the other to always win consistently, you get cases where a==b and b==c and a!=c. If OrderedDict views raised NotImplemented, I believe the other view will then have the chance to try its own comparison. > Under the current rules, I'm pretty sure equality is always symmetric and transitive, ordering is consistent with the normal partial order rules, etc. New rules that seem more intuitive at first glance but break down as soon as you try to think them through don't seem like an improvement. > > Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. 
You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons? OrderedDict itself does everything that you might not want its views to do. OrderedDict implements order-sensitive comparison. It breaks all the rules you can think of that order-sensitive comparisons can break. Transitivity is already a problem. "Who's comparing?" is already a problem, and it's made its choice ("Equality tests between OrderedDict objects and other Mapping objects are order-insensitive like regular dictionaries."). The question now is, should views be made consistent with the OrderedDict itself? There are three options: 1. Deprecate the OrderedDict ordered comparison, to make it consistent with the views. 2. Make the views consistent with the OrderedDict. 3. Do nothing. >> As a plus, this could be more efficient than unordered >> comparisons, since it's just >> return all(x == y for x, y in zip(self.keys(), other.keys())) >> instead of packing each into a set and comparing the sets. > > I think you want zip_longest(self.keys(), other.keys(), fill=object()) or equivalent; otherwise {1, 2} will be equal to {1, 2, 3} . I actually want return len(self.keys()) == len(other.keys()) and all(x == y for x, y in zip(self.keys(), other.keys())) This should be more efficient than set comparisons. It's a bonus, not a reason. From guido at python.org Thu Dec 17 14:55:30 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 17 Dec 2015 11:55:30 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> Message-ID: So I think that not using the __eq__ method of the keys or values is wrong (dicts do use it), but there's a philosophical question: if two OrderedDicts have the same key/value pairs in a different order, should they be considered equal or not? (Only when we've answered this can we answer the question about keys/values/items). I'm not a frequent user of OrderedDict, so I don't have a good intuition, unfortunately. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Dec 17 14:57:42 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 17 Dec 2015 11:57:42 -0800 Subject: [Python-ideas] Dict literal use for custom dict classes In-Reply-To: <6B037426-B7ED-4854-8294-2A96677B6DAD@yahoo.com> References: <20151213032416.GN3821@ando.pearwood.info> <567141F8.2090208@egenix.com> <6B037426-B7ED-4854-8294-2A96677B6DAD@yahoo.com> Message-ID: On Thu, Dec 17, 2015 at 11:07 AM, Andrew Barnert wrote: > It seems like whatever the resolution of this discussion (unless it's "people mostly agree that X would be acceptable, doable, and probably desirable, but nobody's going to do it") you may want to rewrite and repush PEP 468. Agreed. It's just a simple matter of finding time. > IIRC, the one concern of Guido's that you couldn't answer was that if someone keeps the kwdict and adds to it, he could end up wasting a lot of space, not just time. If OrderedDict is still 150-200% bigger than dict, as in the pure Python version, that's still a problem. Yeah, he said something like that. 
However, with the C implementation the memory usage is less. Compared to dict: * basically 8 extra pointers on the object [1] * an instance __dict__ * an array of pointers equal in length to the underlying dict's hash table * basically 4 pointers per item So, per the __sizeof__ method, an empty C OrderedDict uses 800 bytes in contrast to 280 bytes for dict. Each added item uses an additional 24 bytes. With 1000 items usage is 122720 bytes (compared to 49240 bytes for dict). With the pure Python OrderedDict, empty is 824 bytes, each item uses 152 bytes, and with 1000 items usage is 250744 bytes. -eric [1] https://hg.python.org/cpython/file/default/Objects/odictobject.c#l481 From leewangzhong+python at gmail.com Thu Dec 17 15:38:21 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Thu, 17 Dec 2015 15:38:21 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> Message-ID: (Note: This is already the case. https://docs.python.org/3/library/collections.html#collections.OrderedDict """ Equality tests between OrderedDict objects are order-sensitive and are implemented as list(od1.items())==list(od2.items()). Equality tests between OrderedDict objects and other Mapping objects are order-insensitive like regular dictionaries. This allows OrderedDict objects to be substituted anywhere a regular dictionary is used. """ So you're asking whether to deprecate this behavior?) On Thu, Dec 17, 2015 at 2:55 PM, Guido van Rossum wrote: > So I think that not using the __eq__ method of the keys or values is wrong > (dicts do use it), but there's a philosophical question: if two OrderedDicts > have the same key/value pairs in a different order, should they be > considered equal or not? (Only when we've answered this can we answer the > question about keys/values/items). > > I'm not a frequent user of OrderedDict, so I don't have a good intuition, > unfortunately. > > -- > --Guido van Rossum (python.org/~guido) From abarnert at yahoo.com Thu Dec 17 16:34:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 17 Dec 2015 21:34:29 +0000 (UTC) Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: Message-ID: <692609138.319772.1450388069645.JavaMail.yahoo@mail.yahoo.com> On Thursday, December 17, 2015 11:50 AM, Franklin? Lee wrote: > > On Thu, Dec 17, 2015 at 11:57 AM, Andrew Barnert > wrote: > >> * since all mapping keys and items act like sets (and are Sets), they > probably compare like sets > > .items() aren't like sets. Or something is very wrong. Yes they are. Look at the library documentation for dict and dict views in stdtypes, and in collections.abc. An items view is supposed to be set-like, and to be actually usable as a set if the values are hashable. (If the values aren't hashable, obviously most set operations are just going to raise.) And, as far as I can tell, nothing is very wrong. >>> 1. OrderedDict().values() does not implement __eq__. It uses object >>> equality, which means identity. >>> 1a. dict().values() does not implement __eq__. >>> >>> 2. OrderedDict().keys().__eq__ does not respect order. >>> >>> I'd argue that keys() should be ordered comparisons, and values() >>> could be. >> >> So what happens when you compare the keys, items, or values view from an > OrderedDict against the view from another mapping type? 
Or, for keys and items, > against another set type? If you leave that up to the whichever one is on the > left, you get cases where a==b and b!=a. If you leave it up to the most derived > type (by the usual __rspam__-like rules), that doesn't help anything, since > dict_keys_view and odict_keys_view are unrelated except in sharing an abstract > base class. And, worst of all, even if you contrive a way for one or the other > to always win consistently, you get cases where a==b and b==c and a!=c. > > If OrderedDict views raised NotImplemented, I believe the other view > will then have the chance to try its own comparison. Yes, but the proposal here is for OrderedDict and its views to implement something sequence-like, not to raise NotImplemented, so why is that relevant? >>> As a plus, this could be more efficient than unordered >>> comparisons, since it's just >>> return all(x == y for x, y in zip(self.keys(), other.keys())) >>> instead of packing each into a set and comparing the sets. >> >> I think you want zip_longest(self.keys(), other.keys(), fill=object()) or > equivalent; otherwise {1, 2} will be equal to {1, 2, 3} . > > I actually want > return len(self.keys()) == len(other.keys()) and all(x == y for x, > y in zip(self.keys(), other.keys())) Which is something equivalent--the only way that can return different results is if self.values() contains elements that declare themselves to be equal to everything, at which point they already break too many invariants to safely put in a container. From leewangzhong+python at gmail.com Thu Dec 17 17:19:59 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Thu, 17 Dec 2015 17:19:59 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <692609138.319772.1450388069645.JavaMail.yahoo@mail.yahoo.com> References: <692609138.319772.1450388069645.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Thu, Dec 17, 2015 at 4:34 PM, Andrew Barnert wrote: > On Thursday, December 17, 2015 11:50 AM, Franklin? Lee wrote: > >> > On Thu, Dec 17, 2015 at 11:57 AM, Andrew Barnert >> wrote: >> >>> * since all mapping keys and items act like sets (and are Sets), they >> probably compare like sets >> >> .items() aren't like sets. Or something is very wrong. > > Yes they are. Look at the library documentation for dict and dict views in stdtypes, and in collections.abc. An items view is supposed to be set-like, and to be actually usable as a set if the values are hashable. (If the values aren't hashable, obviously most set operations are just going to raise.) And, as far as I can tell, nothing is very wrong. Hm. >>> x = {0: []} >>> y = {0: []} >>> x == y True >>> x.items() == y.items() True That last one doesn't seem set-like to me. But it seems I misunderstood what you were saying. Looking at the source code for ItemsView, "containment" is defined as "other[0] in self and self[other[0]] == other[1]". So yes, it's set-like, in that it checks for containment. I've just never thought about "containment of a key => value mapping". (It's funny, 'cause I've tried to develop the exact same idea in a different subject.) >>>> 1. OrderedDict().values() does not implement __eq__. It uses object >>>> equality, which means identity. >>>> 1a. dict().values() does not implement __eq__. >>>> >>>> 2. OrderedDict().keys().__eq__ does not respect order. >>>> >>>> I'd argue that keys() should be ordered comparisons, and values() >>>> could be. 
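(To spell out what that containment rule does in practice, here's a quick interactive check; this is from a recent CPython, so take it as illustrative rather than as a documented guarantee:)

>>> d = {'k': [1, 2]}                    # the value is unhashable
>>> ('k', [1, 2]) in d.items()           # looks up the key, then compares the value
True
>>> ('k', [9]) in d.items()
False
>>> d.items() <= {'k': [1, 2], 'j': 3}.items()   # subset test uses the same rule
True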
>>> >>> So what happens when you compare the keys, items, or values view from an >> OrderedDict against the view from another mapping type? Or, for keys and items, >> against another set type? If you leave that up to the whichever one is on the >> left, you get cases where a==b and b!=a. If you leave it up to the most derived >> type (by the usual __rspam__-like rules), that doesn't help anything, since >> dict_keys_view and odict_keys_view are unrelated except in sharing an abstract >> base class. And, worst of all, even if you contrive a way for one or the other >> to always win consistently, you get cases where a==b and b==c and a!=c. >> >> If OrderedDict views raised NotImplemented, I believe the other view >> will then have the chance to try its own comparison. > > Yes, but the proposal here is for OrderedDict and its views to implement something sequence-like, not to raise NotImplemented, so why is that relevant? I mean for them to raise NotImplemented in the case of "the other dict is not an instance of OrderedDict". Anyway, this is all a moot point. *If* they were to do something different from dict's views, then they should follow OrderedDict.__eq__. PS: To extend "is an OrderedDict a sequence?" in the wrong direction, I'll point out that OrderedDict.sort(key=None) has a clear and natural meaning, and the implementation should be obvious (relative to a given ODict implementation), for either a linkedlist or array ordering. And to go even further: OrderedDict.items().sort(key=None) also makes some sense. (Though why you would want to compare with respect to values...) From guido at python.org Thu Dec 17 16:33:50 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 17 Dec 2015 13:33:50 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com> Message-ID: No, I was just responding to some speculation earlier in the thread. If this is already defined we should definitely not change it. On Thu, Dec 17, 2015 at 12:38 PM, Franklin? Lee < leewangzhong+python at gmail.com> wrote: > (Note: This is already the case. > https://docs.python.org/3/library/collections.html#collections.OrderedDict > """ > Equality tests between OrderedDict objects are order-sensitive and are > implemented as list(od1.items())==list(od2.items()). Equality tests > between OrderedDict objects and other Mapping objects are > order-insensitive like regular dictionaries. This allows OrderedDict > objects to be substituted anywhere a regular dictionary is used. > """ > > So you're asking whether to deprecate this behavior?) > > On Thu, Dec 17, 2015 at 2:55 PM, Guido van Rossum > wrote: > > So I think that not using the __eq__ method of the keys or values is > wrong > > (dicts do use it), but there's a philosophical question: if two > OrderedDicts > > have the same key/value pairs in a different order, should they be > > considered equal or not? (Only when we've answered this can we answer the > > question about keys/values/items). > > > > I'm not a frequent user of OrderedDict, so I don't have a good intuition, > > unfortunately. > > > > -- > > --Guido van Rossum (python.org/~guido) > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From vgr255 at live.ca Thu Dec 17 17:35:35 2015
From: vgr255 at live.ca (Emanuel Barry)
Date: Thu, 17 Dec 2015 17:35:35 -0500
Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected?
In-Reply-To: References: , , , <32A22D39-A434-49DC-9D3F-8F17D0471559@yahoo.com>, , , Message-ID:

I may or may not have gotten this right, but I think he's asking "should that be the case?", not "is that the case?" - It is the case, but should it? I believe so. For one of my projects, I have a more complex mapping type, which under the hood is just an ODict that I do various operations on, and sometimes delegate to - for example, for equality tests against another instance of the same class, I just compare the underlying ODs. In that last case, I most certainly don't want two non-equal classes to compare equal!

I'll ask the obvious question instead: "Should two lists compare unequal if they have the same items but not the same order?" However, the answer you want is "If I care about insertion and iteration order, there's a good chance I also care about that order when comparing, too"

If you explicitly don't care about the order, it's quite easy to do so:

>>> import collections
>>> a=collections.OrderedDict([("spam", "eggs"), ("foo", "bar")])
>>> b=collections.OrderedDict([("foo", "bar"), ("spam", "eggs")])
>>> a == b
False
>>> dict.items(a) == dict.items(b)
True

-Emanuel

> From: leewangzhong+python at gmail.com > Date: Thu, 17 Dec 2015 15:38:21 -0500 > To: guido at python.org > Subject: Re: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? > CC: python-ideas at python.org > > (Note: This is already the case. > https://docs.python.org/3/library/collections.html#collections.OrderedDict > """ > Equality tests between OrderedDict objects are order-sensitive and are > implemented as list(od1.items())==list(od2.items()). Equality tests > between OrderedDict objects and other Mapping objects are > order-insensitive like regular dictionaries. This allows OrderedDict > objects to be substituted anywhere a regular dictionary is used. > """ > > So you're asking whether to deprecate this behavior?) > > On Thu, Dec 17, 2015 at 2:55 PM, Guido van Rossum wrote: > > So I think that not using the __eq__ method of the keys or values is wrong > > (dicts do use it), but there's a philosophical question: if two OrderedDicts > > have the same key/value pairs in a different order, should they be > > considered equal or not? (Only when we've answered this can we answer the > > question about keys/values/items). > > > > I'm not a frequent user of OrderedDict, so I don't have a good intuition, > > unfortunately. > > > > -- > > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abarnert at yahoo.com Thu Dec 17 21:38:53 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 18 Dec 2015 02:38:53 +0000 (UTC)
Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected?
In-Reply-To: References: Message-ID: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com>

On Thursday, December 17, 2015 2:19 PM, Franklin?
Lee wrote: > > On Thu, Dec 17, 2015 at 4:34 PM, Andrew Barnert > wrote: >> On Thursday, December 17, 2015 11:50 AM, Franklin? Lee > wrote: >> >>> > On Thu, Dec 17, 2015 at 11:57 AM, Andrew Barnert > >>> wrote: >>> >>>> * since all mapping keys and items act like sets (and are Sets), > they >>> probably compare like sets >>> >>> .items() aren't like sets. Or something is very wrong. >> >> Yes they are. Look at the library documentation for dict and dict views in > stdtypes, and in collections.abc. An items view is supposed to be set-like, and > to be actually usable as a set if the values are hashable. (If the values > aren't hashable, obviously most set operations are just going to raise.) > And, as far as I can tell, nothing is very wrong. > > Hm. > > >>> x = {0: []} > >>> y = {0: []} > >>> x == y > True > >>> x.items() == y.items() > True > > That last one doesn't seem set-like to me. But it seems I > misunderstood what you were saying. Sure it's set-like > Looking at the source code for ItemsView, "containment" is defined as > "other[0] in self and self[other[0]] == other[1]". So yes, it's > set-like, in that it checks for containment. I've just never thought > about "containment of a key => value mapping". Yeah, that's the point: items views are effectively sets of items, which are key-value pairs. What else would they be sets of? Because values aren't necessarily hashable, it's unavoidable that some set operations may raise a TypeError (if the values aren't hashable) instead of doing what you'd want. But that doesn't mean all set operations that could possibly raise a TypeError always do so; that only happens when it's unavoidable. And here, it's avoidable. I suppose you could argue that this is a bit too clever to just assume without documenting, and another Python implementation might well have a dict items view that just directly tried hash(self) and raised here, so you really can't write any code that depends on this behavior. Maybe that's true. But that still doesn't seem like a problem with CPython's implementation; the question would just be how to change the docs (whether to require other implementations to do the same thing or to explicitly allow them to raise). > (It's funny, > 'cause > I've tried to develop the exact same idea in a different subject.) >>>>> 1. OrderedDict().values() does not implement __eq__. It uses > object >>>>> equality, which means identity. >>>>> 1a. dict().values() does not implement __eq__. >>>>> >>>>> 2. OrderedDict().keys().__eq__ does not respect order. >>>>> >>>>> I'd argue that keys() should be ordered comparisons, and > values() >>>>> could be. >>>> >>>> So what happens when you compare the keys, items, or values view > from an >>> OrderedDict against the view from another mapping type? Or, for keys > and items, >>> against another set type? If you leave that up to the whichever one is > on the >>> left, you get cases where a==b and b!=a. If you leave it up to the most > derived >>> type (by the usual __rspam__-like rules), that doesn't help > anything, since >>> dict_keys_view and odict_keys_view are unrelated except in sharing an > abstract >>> base class. And, worst of all, even if you contrive a way for one or > the other >>> to always win consistently, you get cases where a==b and b==c and a!=c. >>> >>> If OrderedDict views raised NotImplemented, I believe the other view >>> will then have the chance to try its own comparison. 
>> >> Yes, but the proposal here is for OrderedDict and its views to implement > something sequence-like, not to raise NotImplemented, so why is that relevant? > > I mean for them to raise NotImplemented in the case of "the other dict > is not an instance of OrderedDict". > > > > Anyway, this is all a moot point. *If* they were to do something > different from dict's views, then they should follow > OrderedDict.__eq__. Sure; any other option is terrible. In fact, both of these options are terrible--one breaks consistency with other mappings, and with the basic rules of comparison; the other breaks consistency with the other related types. Ick. Then again, they both have compelling arguments for them, and they're both pretty simple. I doubt anyone has any critical code that relies on the current behavior, but then I doubt anyone would write any critical code that relied on the other behavior if it were changed. To me, it looks like the best deciding factor is inertia. If it's worked the same way for years, it might as well keep working that way. Maybe add a note in the docs saying you shouldn't compare the things, especially to other mapping types' views, and what will happen if you do? From steve at pearwood.info Fri Dec 18 06:07:55 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 18 Dec 2015 22:07:55 +1100 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> Message-ID: <20151218110755.GH1609@ando.pearwood.info> On Fri, Dec 18, 2015 at 02:38:53AM +0000, Andrew Barnert via Python-ideas wrote: [...] > > Anyway, this is all a moot point. *If* they were to do something > > different from dict's views, then they should follow > > OrderedDict.__eq__. > > Sure; any other option is terrible. > > In fact, both of these options are terrible--one breaks consistency > with other mappings, and with the basic rules of comparison; the other > breaks consistency with the other related types. Ick. Then again, they > both have compelling arguments for them, and they're both pretty > simple. I doubt anyone has any critical code that relies on the > current behavior, but then I doubt anyone would write any critical > code that relied on the other behavior if it were changed. This thread has wandered over a fair bit of ground, so I've lost track of *precisely* what these options are. I think we're still debating the fact that OrderedDict *values* compare by ID (like arbitrary objects), rather than by value like items and keys. For example: py> from collections import OrderedDict as odict py> a = odict([('a', 1), ('b', 2)]) py> b = a.copy() py> a.keys() == b.keys() True py> a.items() == b.items() True So KeysView and ItemsView compare by the value of the view. But ValuesView compare by ID, not value: py> a.values() == b.values() False This makes no sense to me. The same counter-intuitive behaviour occurs for dict ValuesView as well: py> dict(a).values() == dict(b).values() False I think that if a and b are the same Mapping type (say, both dicts, or both OrderedDicts), then there is an obvious and logical invariant(s). The following three expressions should be equivalent: (1) a == b (2) a.items() == b.items() (3) (a.keys() == b.keys()) and (a.values() == b.values()) ignoring the usual suspects like NANs and other "funny stuff". Am I missing something? 
Is there a rationale for ValuesView to compare by identity instead of value? It's not just equality that behaves strangely with ValuesView. Even when the values are unique and hash-like, they don't behave very "set-like": # KeysView and ItemsView are set-like py> a.keys() & b.keys() {'b', 'a'} py> a.items() | b.items() {('b', 2), ('a', 1)} # values are hashable, but ValuesViews are not set-like py> set(a.values()) | set(b.values()) {1, 2} py> a.values() | b.values() Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for |: 'ValuesView' and 'ValuesView' This is *especially* weird when one realises that ItemsView actually manages to be set-like even when the values are not hashable: py> c = odict(x=[]) py> c.items() & {} set() yet ValuesView isn't set-like when they are! -- Steve From guido at python.org Fri Dec 18 11:37:09 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Dec 2015 08:37:09 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151218110755.GH1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: ValuesView is not a set because there may be duplicates. But the identity thing feels odd. (Even though I designed this myself.) Maybe because values may not be comparable? --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Fri Dec 18 12:19:46 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 18 Dec 2015 09:19:46 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151218110755.GH1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 3:07 AM, Steven D'Aprano wrote: > This makes no sense to me. The same counter-intuitive behaviour occurs > for dict ValuesView as well: OrderedDict re-uses dict's view types (but overrides __iter__). So it had better be the same behavior! :) -eric From abarnert at yahoo.com Fri Dec 18 14:58:25 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 18 Dec 2015 11:58:25 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151218110755.GH1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: On Dec 18, 2015, at 03:07, Steven D'Aprano wrote: > > It's not just equality that behaves strangely with ValuesView. Even when > the values are unique and hash-like, they don't behave very "set-like": But values views are inherently multisets, not sets. Do you really want to say that the multisets {2, 3, 3} and {2, 2, 3} are equal because they have the same elements, even though they have different counts? And is {1, 1, 2} & {1, 1, 3} really just {1} rather than {1, 1}? For some uses of multisets, those rules make sense, but in general, they don't. (Check out how Counter handles the same questions.) If we had a general notion of multisets in Python, it might make sense to define values views and their behavior in terms of multisets. But defining them in terms of sets because that's sort of close and we have them doesn't make any sense. 
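(For reference, here's how Counter, the closest thing we have to a standard multiset, answers those two questions; of course it only works because these elements are hashable, which values in general aren't:)

>>> from collections import Counter
>>> Counter([2, 3, 3]) == Counter([2, 2, 3])   # same elements, different counts
False
>>> Counter([1, 1, 2]) & Counter([1, 1, 3])    # intersection takes the minimum count
Counter({1: 2})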
If you're thinking we could define what multisets should do, despite not having a standard multiset type or an ABC for them, and apply that to values views, the next question is how to do that in better than quadratic time for non-hashable values. (And you can't assume ordering here, either.) Would having a values view hang for 30 seconds and then come back with the answer you intuitively wanted instead of giving the wrong answer in 20 millis be an improvement? (Either way, you're going to learn the same lesson: don't compare values views. I'd rather learn that in 20 millis.) > # KeysView and ItemsView are set-like > py> a.keys() & b.keys() > {'b', 'a'} > py> a.items() | b.items() > {('b', 2), ('a', 1)} > > # values are hashable, but ValuesViews are not set-like > py> set(a.values()) | set(b.values()) > {1, 2} > py> a.values() | b.values() > Traceback (most recent call last): > File "", line 1, in > TypeError: unsupported operand type(s) for |: 'ValuesView' and > 'ValuesView' > > This is *especially* weird when one realises that ItemsView actually > manages to be set-like even when the values are not hashable: > > py> c = odict(x=[]) > py> c.items() & {} > set() Try this: c = {1: []} d = {1: [], 2: []} c.items() < d.items() It can tell that one "virtual set" of key-value pairs is a subset even though they can't actually be represented as sets. The items views take "as set-like as possible" very seriously. From srkunze at mail.de Fri Dec 18 17:03:13 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 18 Dec 2015 23:03:13 +0100 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: <567482A1.8070409@mail.de> On 18.12.2015 20:58, Andrew Barnert via Python-ideas wrote: > If you're thinking we could define what multisets should do, despite > not having a standard multiset type or an ABC for them, and apply that > to values views, the next question is how to do that in better than > quadratic time for non-hashable values. (And you can't assume ordering > here, either.) Would having a values view hang for 30 seconds and then > come back with the answer you intuitively wanted instead of giving the > wrong answer in 20 millis be an improvement? (Either way, you're going > to learn the same lesson: don't compare values views. I'd rather learn > that in 20 millis.) I like the multiset/bag idea. Python calls them Counter, right? Best, Sven From leewangzhong+python at gmail.com Fri Dec 18 17:59:39 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 18 Dec 2015 17:59:39 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <567482A1.8070409@mail.de> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <567482A1.8070409@mail.de> Message-ID: On Fri, Dec 18, 2015 at 5:03 PM, Sven R. Kunze wrote: > On 18.12.2015 20:58, Andrew Barnert via Python-ideas wrote: >> >> If you're thinking we could define what multisets should do, despite not >> having a standard multiset type or an ABC for them, and apply that to values >> views, the next question is how to do that in better than quadratic time for >> non-hashable values. (And you can't assume ordering here, either.) 
Would >> having a values view hang for 30 seconds and then come back with the answer >> you intuitively wanted instead of giving the wrong answer in 20 millis be an >> improvement? (Either way, you're going to learn the same lesson: don't >> compare values views. I'd rather learn that in 20 millis.) > > > I like the multiset/bag idea. > > Python calls them Counter, right? > > > Best, > Sven Counter would require hashable values. Any efficient multibag concept, in fact, would. Quadratic multibag comparisons would run into trouble with custom equality. # Pretending that kwargs is ordered. a0 = dict(x=0, y=1) a1 = a0 b0 = OrderedDict(x=0, y=1) b1 = OrderedDict(y=1, x=0) d0 = {'foo': a0, 'bar': b0} d1 = {'foo': b1, 'bar': a1} If we compare a0 == a1 and b0 == b1, then it fails. If we compare a0 == b1 and b0 == a1, then it passes. The order of comparisons matter. I see two options: - comparison is explicitly NotImplemented. Any code that used it should've used `is`. - comparison respects keys. OrderedDict values() comparison makes some sense, but its options would be - comparison is sequential. - comparison respects keys. From steve at pearwood.info Fri Dec 18 21:34:05 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Dec 2015 13:34:05 +1100 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: <20151219023405.GK1609@ando.pearwood.info> On Fri, Dec 18, 2015 at 08:37:09AM -0800, Guido van Rossum wrote: > ValuesView is not a set because there may be duplicates. But the identity > thing feels odd. (Even though I designed this myself.) Maybe because values > may not be comparable? Right, that makes sense now, and it's even documented that value views are not treated as sets: https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects I'm not sure what you mean by "values may not be comparable"? Since we're only talking about equality, aren't all values comparable? -- Steve From stephen at xemacs.org Fri Dec 18 22:24:54 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 19 Dec 2015 12:24:54 +0900 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: <22132.52742.936973.127650@turnbull.sk.tsukuba.ac.jp> Andrew Barnert via Python-ideas writes: > Would having a values view hang for 30 seconds and then come back > with the answer you intuitively wanted instead of giving the wrong > answer in 20 millis be an improvement? (Either way, you're going to > learn the same lesson: don't compare values views. I'd rather learn > that in 20 millis.) I don't think this is an appropriate argument. I don't check every computation my Python programs do. There's a good chance it would take years to learn not to compare values views because it's sometimes wrong (eg, if in my use case most of the time view_a is view_b or not equal according to my desired definition). OTOH, I do check my watch many times a day, and am very protective of my time; I would notice a 30-second hang in that case. Speaking of 30-second hangs, I really wish you'd trim (the alternative is to stop reading your posts, and I really don't want to do that either). 
If you won't trim, please top-post, at least for those posts which have S/N ratios on the order of 10^-2 (which have been frequent recently). From leewangzhong+python at gmail.com Fri Dec 18 22:28:52 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 18 Dec 2015 22:28:52 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151219023405.GK1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <20151219023405.GK1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 9:34 PM, Steven D'Aprano wrote: > On Fri, Dec 18, 2015 at 08:37:09AM -0800, Guido van Rossum wrote: > >> ValuesView is not a set because there may be duplicates. But the identity >> thing feels odd. (Even though I designed this myself.) Maybe because values >> may not be comparable? > > Right, that makes sense now, and it's even documented that value views > are not treated as sets: > > https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects > > > I'm not sure what you mean by "values may not be comparable"? Since > we're only talking about equality, aren't all values comparable? > > > -- > Steve See my example in the other email (https://mail.python.org/pipermail/python-ideas/2015-December/037498.html). That's a case where the order of comparison matters, so you can't do a conceptual "unordered comparison" without, in the worst case, comparing everything to everything else. This is due to custom __eq__ (by OrderedDict, for irony's sake): a == b and b == c does not mean a == c. From steve at pearwood.info Fri Dec 18 22:38:36 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Dec 2015 14:38:36 +1100 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <567482A1.8070409@mail.de> Message-ID: <20151219033836.GL1609@ando.pearwood.info> On Fri, Dec 18, 2015 at 05:59:39PM -0500, Franklin? Lee wrote: > I see two options: > - comparison is explicitly NotImplemented. Any code that used it > should've used `is`. We're still talking about equality between Mapping.values() views, correct? I strongly dislike that option. I don't know of any std lib or built-in object which tries to prohibit equality comparisons. Of course your own custom classes can do anything they like, including rather silly things: class Silly: def __eq__(self, other): x = random.choice([0, 1, 2]) if x == 2: raise ValueError return bool(x) but I think that people expect that equality tests should always succeed, even if it falls back on the default object behaviour (namely identity comparison). > - comparison respects keys. I'm not sure I understand what this means. If we have two dicts: a = {1: None, 2: None} b = {1: None, 3: None} are you suggesting that `a.values() == b.values()` should return False because the KEYS {1, 2} and {1, 3} are different? "ValuesViews implement equality by comparing identity, because everything else is too hard" might not be useful, but at least it makes sense as an explanation. Whereas "ValuesViews implement equality by comparing keys" sounds like something I might have read in "PHP, A Fractal Of Bad Design" :-) -- Steve From leewangzhong+python at gmail.com Fri Dec 18 22:43:24 2015 From: leewangzhong+python at gmail.com (Franklin? 
Lee) Date: Fri, 18 Dec 2015 22:43:24 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151219033836.GL1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <567482A1.8070409@mail.de> <20151219033836.GL1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 10:38 PM, Steven D'Aprano wrote: > On Fri, Dec 18, 2015 at 05:59:39PM -0500, Franklin? Lee wrote: > >> I see two options: >> - comparison is explicitly NotImplemented. Any code that used it >> should've used `is`. > > We're still talking about equality between Mapping.values() views, > correct? > > I strongly dislike that option. I don't know of any std lib or built-in > object which tries to prohibit equality comparisons. Of course your own > custom classes can do anything they like, including rather silly things: > > class Silly: > def __eq__(self, other): > x = random.choice([0, 1, 2]) > if x == 2: > raise ValueError > return bool(x) > > but I think that people expect that equality tests should always > succeed, even if it falls back on the default object behaviour (namely > identity comparison). First, failing fast. I see this as a silent error waiting to happen. Second, "NotImplemented" allows the other side to try its __eq__. >> - comparison respects keys. > > I'm not sure I understand what this means. If we have two dicts: > > a = {1: None, 2: None} > b = {1: None, 3: None} > > are you suggesting that `a.values() == b.values()` should return False > because the KEYS {1, 2} and {1, 3} are different? Yes, I'm just saying that's an option. From steve at pearwood.info Fri Dec 18 22:57:50 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Dec 2015 14:57:50 +1100 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <20151219023405.GK1609@ando.pearwood.info> Message-ID: <20151219035749.GM1609@ando.pearwood.info> On Fri, Dec 18, 2015 at 10:28:52PM -0500, Franklin? Lee wrote: > On Fri, Dec 18, 2015 at 9:34 PM, Steven D'Aprano wrote: > > On Fri, Dec 18, 2015 at 08:37:09AM -0800, Guido van Rossum wrote: > > > >> ValuesView is not a set because there may be duplicates. But the identity > >> thing feels odd. (Even though I designed this myself.) Maybe because values > >> may not be comparable? > > > > Right, that makes sense now, and it's even documented that value views > > are not treated as sets: > > > > https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects > > > > > > I'm not sure what you mean by "values may not be comparable"? Since > > we're only talking about equality, aren't all values comparable? > > > > > > -- > > Steve > > See my example in the other email > (https://mail.python.org/pipermail/python-ideas/2015-December/037498.html). > That's a case where the order of comparison matters, so you can't do a > conceptual "unordered comparison" without, in the worst case, > comparing everything to everything else. This is due to custom __eq__ > (by OrderedDict, for irony's sake): a == b and b == c does not mean a > == c. I don't know what Guido means by "values might not be comparable", but your example is lack of transitivity. Mathematical equality is transitive: if a == b, and b == c, then a == c. 
But that doesn't extend to non-numeric concepts of equality, e.g. preference ordering, or other forms of ranking. Since Python __eq__ can be overridden, we cannot assume that equality of arbitrary objects is necessarily transitive. And indeed, even with Mappings they are not: py> from collections import OrderedDict as odict py> a = odict([('a', 1), ('b', 2)]) py> b = dict(a) py> c = odict([('b', 2), ('a', 1)]) py> a == b == c True py> a == c False -- Steve From leewangzhong+python at gmail.com Fri Dec 18 23:07:16 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 18 Dec 2015 23:07:16 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151219035749.GM1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <20151219023405.GK1609@ando.pearwood.info> <20151219035749.GM1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 10:57 PM, Steven D'Aprano wrote: >> See my example in the other email >> (https://mail.python.org/pipermail/python-ideas/2015-December/037498.html). >> That's a case where the order of comparison matters, so you can't do a >> conceptual "unordered comparison" without, in the worst case, >> comparing everything to everything else. This is due to custom __eq__ >> (by OrderedDict, for irony's sake): a == b and b == c does not mean a >> == c. > > I don't know what Guido means by "values might not be comparable", but > your example is lack of transitivity. > > Mathematical equality is transitive: if a == b, and b == c, then a == c. > But that doesn't extend to non-numeric concepts of equality, e.g. > preference ordering, or other forms of ranking. Since Python __eq__ can > be overridden, we cannot assume that equality of arbitrary objects is > necessarily transitive. And indeed, even with Mappings they are not: > > py> from collections import OrderedDict as odict > py> a = odict([('a', 1), ('b', 2)]) > py> b = dict(a) > py> c = odict([('b', 2), ('a', 1)]) > py> a == b == c > True > py> a == c > False Well, that's my point AND my example. If this were lists, then a lack of transitivity of elements would mean a lack of transitivity for lists of those elements. You get what you put in. But since dict views are unordered collections, a lack of transitivity for elements would mean INCONSISTENCY: the comparison of two views as multisets would depend on the exact order of comparison. Unless you are willing to compare everything to everything else in the worst case. From steve at pearwood.info Fri Dec 18 23:10:11 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Dec 2015 15:10:11 +1100 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <567482A1.8070409@mail.de> <20151219033836.GL1609@ando.pearwood.info> Message-ID: <20151219041010.GN1609@ando.pearwood.info> On Fri, Dec 18, 2015 at 10:43:24PM -0500, Franklin? Lee wrote: > On Fri, Dec 18, 2015 at 10:38 PM, Steven D'Aprano wrote: > > On Fri, Dec 18, 2015 at 05:59:39PM -0500, Franklin? Lee wrote: > > > >> I see two options: > >> - comparison is explicitly NotImplemented. Any code that used it > >> should've used `is`. > > > > We're still talking about equality between Mapping.values() views, > > correct? [...] > First, failing fast. 
> I see this as a silent error waiting to happen.

Franklin, could you please try to be a little bit more explicit about what "it" and "this" is when you describe something? I find it very hard to understand what *specific* thing you are referring to when you refer to it using short-hand.

You say that "this" is a silent error waiting to happen. Does "this" refer to your suggestion that "comparison is explicitly NotImplemented", or something else? I can't tell.

> Second, "NotImplemented" allows the other side to try its __eq__.

Okay, that makes better sense. I thought you meant that Mapping.ValuesView *did not implement* an __eq__ method, so that it raised an exception if you tried to compare them:

    # What I thought you meant
    a.values() == b.values()
    => raises an exception

Now I understand that you mean they should return NotImplemented instead.
But in practice, that doesn't actually change the behaviour > that much: if both sides return NotImplemented, Python will fall back to > the default behaviour, which is identity. I didn't realize this. Since explicitly raising NotImplemented would cause it NOT to try the reflected method, I withdraw the suggestion of making the comparison fail in any way that it doesn't already. From guido at python.org Fri Dec 18 23:59:29 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Dec 2015 20:59:29 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151219035749.GM1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <20151219023405.GK1609@ando.pearwood.info> <20151219035749.GM1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 7:57 PM, Steven D'Aprano wrote: > I don't know what Guido means by "values might not be comparable", but > your example is lack of transitivity. > I mean cases where == actually raises an exception. In Python 2 we did this for comparing certain 8-bit strings to unicode strings. While we've fixed that particular issue (in 3, bytes(...) == str(...) always returns False), it's still possible, and in fact even some stdlib modules do this -- I know certain comparisons of naive and tz-aware datetimes do this (which is not the same as returning NotImplemented). However for full disclosure I should add that until just now I had misunderstood the complaint about values() -- it doesn't compare the values by identity, values views themselves are only compared by identity. But (how's this for a nice recovery :-) that's how comparing dict.values() works, and it's reasonable given how expensive it would be to make it work otherwise (the corresponding ABCs behave the same way). The real oddity is that an OrderedDict's keys and items views don't take order into account, even though comparing the OrderedDict objects themselves does use the order. This seems to be laziness of implementation (just inheriting most of the view implementation from dict) rather than based on some careful design consideration. Unless I missed the consideration (I wasn't involved in the design of OrderedDict at all TBH). (And FWIW if we do fix this I might be amenable to changing the way values views are compared, but *not* by taking the keys into account. It's either by identity of the view object, or by comparing the values in order.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sat Dec 19 00:02:07 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 19 Dec 2015 00:02:07 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <567482A1.8070409@mail.de> <20151219033836.GL1609@ando.pearwood.info> <20151219041010.GN1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 11:58 PM, Franklin? Lee wrote: > Sorry, bad habit. I see dict().values().__eq__ as an error. I mean that I see comparison of dict values views as an error. That it shouldn't have been done. But I can see where it would be used. So I withdraw that, too. 
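For anyone skimming the thread: the keys/items-versus-order oddity Guido describes above looks roughly like this (a sketch of the current 3.5 behaviour, typed from memory rather than pasted from a session):

    >>> from collections import OrderedDict
    >>> a = OrderedDict([('x', 1), ('y', 2)])
    >>> b = OrderedDict([('y', 2), ('x', 1)])
    >>> a == b                    # OrderedDict equality uses order
    False
    >>> a.keys() == b.keys()      # inherited dict views: order-blind set comparison
    True
    >>> a.items() == b.items()
    True
    >>> a.values() == b.values()  # values views fall back to identity
    False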
From guido at python.org Sat Dec 19 00:03:40 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Dec 2015 21:03:40 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <20151219023405.GK1609@ando.pearwood.info> <20151219035749.GM1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 8:07 PM, Franklin? Lee < leewangzhong+python at gmail.com> wrote: > If this were lists, then a lack of transitivity of elements would mean > a lack of transitivity for lists of those elements. You get what you > put in. > > But since dict views are unordered collections, a lack of transitivity > for elements would mean INCONSISTENCY: the comparison of two views as > multisets would depend on the exact order of comparison. Unless you > are willing to compare everything to everything else in the worst > case. > Honestly, I think it's fine if an operation like == for a collection uses an algorithm that just assumes the items' comparison is transitive, and if it isn't, you still get what you put it (i.e. it's still like a sewer :-). It's the same with sort() -- if the comparison is unreasonable the sort still terminates, just not with the items in sorted order. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sat Dec 19 01:22:04 2015 From: random832 at fastmail.com (Random832) Date: Sat, 19 Dec 2015 01:22:04 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: Andrew Barnert writes: > If you're thinking we could define what multisets should do, despite > not having a standard multiset type or an ABC for them, and apply that > to values views, the next question is how to do that in better than > quadratic time for non-hashable values. Why aren't all values hashable? When I think about it, it's a bit strange that "hashable" is so wrapped up in immutability for no other reason than the idea that it's not safe to use a mutable object as a dictionary key if it is changed while it is in the dictionary. Java doesn't have this restriction. And while the reasoning is certainly defensible in isolation, it sort of goes against the "consenting adults" principle that is used to justify all the _other_ dangerous/questionable things that Python doesn't bother putting technical obstacles in front of. Why not demand that all objects (except NaN?) can be hashed, and that their hash shall match the equality relationship they define, and that all objects can safely be used as set members and dictionary keys so long as they are not in fact changed while in such a position? Certainly we can't put technical restrictions in the way of defining a __hash__ method that raises an exception, but we can say that if they try to use such an object in a dictionary and compare their values views they're on their own. For a somewhat more conservative path forward, define a new method __vhash__ which will always return a value (by default based on the object identity), and __hash__ shall return either the same number as __vhash__, or raise an exception if the object is not guaranteed as "immutable" for the purpose of equality comparison. 
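To make that last paragraph concrete, here is a rough sketch of what the protocol might look like on a user class -- every name here is as hypothetical as __vhash__ itself, and the identity-based default described above is omitted:

    class Point:
        # Mutable, so it keeps refusing to be an ordinary dict key.
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __eq__(self, other):
            return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

        __hash__ = None          # today's notion of "unhashable" is unchanged

        def __vhash__(self):
            # Hypothetical: always succeeds and agrees with __eq__, valid only
            # while the caller guarantees the object is not mutated.
            return hash((self.x, self.y))

A vhash-aware container -- or the short-lived one built just for a values-view comparison -- would call __vhash__ where dict currently calls hash(), leaving the "don't mutate it while it's in there" promise to the caller.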
From stephen at xemacs.org Sat Dec 19 02:20:51 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 19 Dec 2015 16:20:51 +0900 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: <20151219035749.GM1609@ando.pearwood.info> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <20151219023405.GK1609@ando.pearwood.info> <20151219035749.GM1609@ando.pearwood.info> Message-ID: <22133.1363.153778.823295@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I don't know what Guido means by "values might not be comparable", Guido evidently meant something more pragmatic, but as I see it, values of different types are in principle incomparable, and the odict == dict == odict example shows why: satisfying the equality definitions of different types simultaneously is often impossible. But if you transform them to a common type or unify them in a union type, I think it's reasonable to expect that the common type will implement equality as an equivalence relation. (As you indirectly mentioned yourself, that's why we accept objects like NaN only as a last resort in preserving compatibility with a truly important standard we can't change.) In Python, the TOOWTDI common type for fallback is object, and that works well enough as a default implementation of __eq__ (ie, "is") that allows for equality comparison across types, which can be useful. > but your example is lack of transitivity. > Mathematical equality is transitive: if a == b, and b == c, then a == c. > But that doesn't extend to non-numeric concepts of equality, e.g. > preference ordering, or other forms of ranking. Indifference (the notion of "equality" that applies to preference, at least in economics) in practice is always an equivalence; all use cases I know of require this (many strengthen it to actual equality, ie, requiring antisymmetry). I can't think of a case where I would consider any equality relation deliberately defined as not an equivalence to be anything but perverse. So I'm not sure what you're trying to say here. I guess you mean that as a pragmatic matter, a programming language may allow perverse definitions, and of course there may be artifacts of particular definitions such that objects of types that should always be considered unequal might compare equal. I suppose there are "consenting adults" cases where it's convenient for a particular use case to use a perverse definition. And yes, I consider "fall back to the less restrictive definition of equality", as done by OrderedDict and dict in the example, to be perverse. From guido at python.org Sat Dec 19 03:09:06 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Dec 2015 00:09:06 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: On Fri, Dec 18, 2015 at 10:22 PM, Random832 wrote: > Andrew Barnert writes: > > If you're thinking we could define what multisets should do, despite > > not having a standard multiset type or an ABC for them, and apply that > > to values views, the next question is how to do that in better than > > quadratic time for non-hashable values. > > Why aren't all values hashable? 
> > When I think about it, it's a bit strange that "hashable" is so > wrapped up in immutability for no other reason than the idea > that it's not safe to use a mutable object as a dictionary key > if it is changed while it is in the dictionary. Java doesn't > have this restriction. > > And while the reasoning is certainly defensible in isolation, it > sort of goes against the "consenting adults" principle that is > used to justify all the _other_ dangerous/questionable things > that Python doesn't bother putting technical obstacles in front > of. > > Why not demand that all objects (except NaN?) can be hashed, and > that their hash shall match the equality relationship they > define, and that all objects can safely be used as set members > and dictionary keys so long as they are not in fact changed > while in such a position? Certainly we can't put technical > restrictions in the way of defining a __hash__ method that > raises an exception, but we can say that if they try to use such > an object in a dictionary and compare their values views they're > on their own. > > For a somewhat more conservative path forward, define a new > method __vhash__ which will always return a value (by default > based on the object identity), and __hash__ shall return either > the same number as __vhash__, or raise an exception if the > object is not guaranteed as "immutable" for the purpose of > equality comparison. > The link between hashing and immutability is because objects whose hash would change are common, e.g. lists, and using them as dict keys would be very hard to debug for users most likely to make this mistake. The issue is that the dict implementation makes it impossible to find back keys whose hash has changed, other than by linear search, which is unacceptable -- but that's exactly what users will try to debug such issues, i.e., print the dict and notice that the missing key is indeed present. The consenting adults rule typically applies to things that are well hidden or marked (e.g. using __dunder__ names). There are plenty of things that Python could allow but doesn't, not because they are hard to implement or would violate an invariant of the interpreter, but because they could trip over naive users. Note that you are turning things upside down: the question "why aren't all things hashable" came about because Andrew was considering making a hash table of the values of a dict. But the real question here isn't "why aren't all things hashable" but "why can't you put mutable values into a set". The answer to the latter is because when we put a value into a container, and later the value changes, we can't tell the container, so if the container has any sort of lookup scheme other than linear search, it would lose track of the value. Hashing comes into play because all of Python's common data structures use hashing to optimize lookup -- but if we used a different data structure, e.g. something based on sorting the keys, we'd still have the mutability problem. And we'd have worse problems, because values would have to be sortable, which is a stricter condition than being immutable. In any case, you can't solve this problem by making all values hashable. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Sat Dec 19 03:16:22 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 19 Dec 2015 19:16:22 +1100 Subject: [Python-ideas] Why can't you put mutable values in a set? 
(was: Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected?) References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: <858u4q3lbd.fsf_-_@benfinney.id.au> Guido van Rossum writes: > The link between hashing and immutability is because objects whose > hash would change are common, e.g. lists, and using them as dict keys > would be very hard to debug for users most likely to make this > mistake. [?] > > [?] But the real question here isn't "why aren't all things hashable" > but "why can't you put mutable values into a set". [?] > > Hashing comes into play because all of Python's common data structures > use hashing to optimize lookup -- but if we used a different data > structure, e.g. something based on sorting the keys, we'd still have > the mutability problem. And we'd have worse problems, because values > would have to be sortable, which is a stricter condition than being > immutable. > > In any case, you can't solve this problem by making all values hashable. That was a great explanation; you answered several points on which I was vague, and you addressed some things I didn't even know were problems. I'd love to see that edited to a blog post we can reference in a single article, if you have the time. -- \ ?I went to the museum where they had all the heads and arms | `\ from the statues that are in all the other museums.? ?Steven | _o__) Wright | Ben Finney From epsilonmichael at gmail.com Sat Dec 19 07:01:40 2015 From: epsilonmichael at gmail.com (Michael Mitchell) Date: Sat, 19 Dec 2015 04:01:40 -0800 Subject: [Python-ideas] Buffering iterators? In-Reply-To: References: Message-ID: Have you considered doing this at the plain Python level? Something such as the following would have the desired semantics from my understanding. def buffered_iterator(it, size): while True: buffer = [next(it) for _ in range(size)] for element in buffer: yield element As for whether something like this can achieve optimizations by utilizing things such as locality is entirely dependent on the implementation of the runtime. It doesn't make much sense to me to maintain runtime level support for such an edge use case as it would be yet another burden on every implementation of Python. On Tue, Dec 15, 2015 at 2:04 AM, Franklin? Lee < leewangzhong+python at gmail.com> wrote: > (This would be a lot of work that I wouldn't know how to do, but is it > worth thinking about? Maybe it's already been done at the level > necessary. > > Also, this is a proposal for the sake of theoretical optimization, in > case you don't like that, and it will require a lot of work in a lot > of code everywhere. > > As I see it, even if it's possible and desirable to do this, it would > take years of work and testing to make it beneficial.) > > The move from Python 2 (disclaimer: which I barely touched, so I have > little sentimental attachment) to Python 3 resulted in many functions > returning iterators instead of lists. This saves a lot of unnecessary > memory when iterating, say, over the indices of a large list, > especially if we break in the middle. > > I'm wondering, though, about a step backwards: generating values > before they're needed. The idea is based on file buffered reading and > memory prefetching > ( > https://en.wikipedia.org/wiki/Synchronous_dynamic_random-access_memory#DDR_SDRAM_prefetch_architecture > ). > In fact, I'm hoping to take advantage of such things. 
> > For example, in `sum(lst[i] * i for i in range(10000))`, `sum` will > exhaust the iterator, so it can ask the generator to return buffers, > and it will internally read the elements off the lists. > > It would be the responsibility of the iterator to decide whether to > respect the request, and to determine the size of the buffer. It would > be the responsibility of the consumer to request it, and consumers > should only request it if they think they'll almost definitely consume > a lot at a time. > > The idea is, especially for complex nested iterators, instead of > running A B C A B C A B C..., where each is the code for generating a > next thing from the previous, that the interpreter runs A A A A A..., > B B B B B..., C C C..., which could mean a lot more memory locality in > both instructions and objects. > > There's the possibility that a function has side-effects, so buffering > would have different semantics than normal. There's also the > possibility that getting the next element is complex enough that it > wouldn't help to buffer. If the iterator can't tell, then it should > just not buffer. > > Here's an obnoxious example of where you can't tell: > > def f(i): > return i > > s = 0 > for i in (f(x) for x in range(100)): > s += f(i) > def f(x): > return x + s > > In fact, all Python function calls are probably unsafe, including with > operators (which can be legally replaced during the iteration). Well, > `map` and `filter` are possible exceptions in special cases, because > the lookup for their function is bound at the call to `map`. > > It's usually safe if you're just using reversed, enumerate, builtin > operators, dict views, etc. on an existing data structure, as long as > your iteration doesn't modify entries, unlike so: > > for i, x in enumerate(reversed(lst)): > lst[i+1] = x > > But I'm looking toward the future, where it might be possible for the > interpreter to analyze loops and functions before making such > decisions. Then again, if the interpreter is that smart, it can figure > out where and when to buffer without adding to the API of iterators. > > Anyway, here's an idea, how it might be helpful, and how it might not > be helpful. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sat Dec 19 08:46:24 2015 From: random832 at fastmail.com (Random832) Date: Sat, 19 Dec 2015 08:46:24 -0500 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: Guido van Rossum writes: > The link between hashing and immutability is because objects whose > hash would change are common, e.g. lists, and using them as dict keys > would be very hard to debug for users most likely to make this > mistake. The issue is that the dict implementation makes it impossible > to find back keys whose hash has changed, other than by linear search, > which is unacceptable -- but that's exactly what users will try to > debug such issues, i.e., print the dict and notice that the missing > key is indeed present. Java doesn't seem to have this problem. 
Python uses dicts more heavily as part of its core architecture, sure, but those dicts use strings as their keys. > The consenting adults rule typically applies to things that are well > hidden or marked (e.g. using __dunder__ names). The ability to e.g. replace a class or module's functions, or values intended as constants, is not especially well-hidden. > There are plenty of things that Python could allow but doesn't, not > because they are hard to implement or would violate an invariant of > the interpreter, but because they could trip over naive users. > > Note that you are turning things upside down: the question "why aren't > all things hashable" came about because Andrew was considering making > a hash table of the values of a dict. Well, sure, but that's a reasonable way (if the ability to do so were present) to implement the operation being discussed under the performance constraints he specified. > But the real question here isn't "why aren't all things hashable" but > "why can't you put mutable values into a set". The answer to the > latter is because when we put a value into a container, and later the > value changes, we can't tell the container, so if the container has > any sort of lookup scheme other than linear search, it would lose > track of the value. Yes, but you're fine as long as the value doesn't change. What do you think about my __vhash__ idea? Someone would only make sets/dicts that use __vhash__ rather than __hash__ if they can guarantee the object won't change in the lifetime of its presence in the container (something that's no problem for the short-lived container that would be used for this operation) > Hashing comes into play because all of Python's common data structures > use hashing to optimize lookup -- but if we used a different data > structure, e.g. something based on sorting the keys, we'd still have > the mutability problem. And we'd have worse problems, because values > would have to be sortable, which is a stricter condition than being > immutable. > > In any case, you can't solve this problem by making all values > hashable. Sure I can. Normal dict values: def __eq__(self, b): return Counter(self) == Counter(b) #or e.g. Counter(map(self, make_vhash_key)) ... OrderedDict values: def __eq__(self, b): if isinstance(b, OrderedDict) return List(self) == List(b) else: return super().__eq__(b) # Yes, this isn't transitive, i.e. maybe: # a == c and b == c where a != b # but the same is true today for the dicts. >>> a = collections.OrderedDict(((1, 2), (3, 4))) >>> b = collections.OrderedDict(((3, 4), (1, 2))) >>> c = {1: 2, 3: 4} >>> a == c, b == c, a == b (True, True, False) From leewangzhong+python at gmail.com Sat Dec 19 09:10:10 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 19 Dec 2015 09:10:10 -0500 Subject: [Python-ideas] Optimizing global names via dict references Message-ID: (Previous threads from python-dev: "Re: [Python-Dev] Third milestone of FAT Python" https://mail.python.org/pipermail/python-dev/2015-December/142437.html "[Python-Dev] Idea: Dictionary references" (Forked from above thread.) https://mail.python.org/pipermail/python-dev/2015-December/142490.html ) Dictionary References: An Idea in Three Emails Act 1: Stuff Table of Contents: Act 1: Stuff https://mail.python.org/pipermail/python-ideas/2015-December/037511.html - Definitions - Problem statement - The idea - The benefits - The costs - What will hold a RefCell? 
- Add `getref` to `dict` - Implementation issues - Interpreters other than CPython Act 2: More stuff https://mail.python.org/pipermail/python-ideas/2015-December/037512.html - Extension: Scope ancestry - Aside: Classes and scopes do not have the same rules - Extension: Read/write refs - Extension: Read-only refs - Option: Alternative RefCell structure - Option: Replace the methods when detaching a RefCell - Extension: Dynamic scoping - Extension: Thread safety Act 3: Sample pseudocode https://mail.python.org/pipermail/python-ideas/2015-December/037513.html - ScopeDict - Scope ancestry - Alternative RefCell structure - Thread safety == Definitions == "lookup" A dictionary lookup. "exposed RefCell" A RefCell with a PyRef other than by its owner. A RefCell with a refcount > 1. "pyreference" A CPython ref-counted reference. == Problem statement == I propose a CPython interpreter optimization for global name lookups. When a function uses a global name, it requires a lookup during execution. def maxlen(lists): return sum(len(lst) #globals['len'] for lst in lists) This is necessary because: 1. The name might not exist yet. 2. The value might change. Repeated dict lookups are more expensive than, uh, not doing that. Local variables are indexed by array, rather than lookups. (See http://stackoverflow.com/questions/12590058/python-performance-with-global-variables-vs-local) Pythonistas often get around this lookup by saving the name in a local variable. def maxlen(lists): len_ = len return sum(len_(lst) #globals['len'] for lst in lists) They also use a default argument for that variable, which would prevent the lookup during each call. def maxlen(lists, len=len): return sum(len_(lst) #globals['len'] for lst in lists) I think these tricks are theoretically unnecessary. We only need a single lookup, at compile-time. There should be no change to Python's semantics. In fact, this should reinforce the semantics, because the above tricks don't allow for `len` to change, so if this idea is implemented, the tricks would only be used to explicitly disallow changes. == The idea == We use a level of indirection, and allow empty values in the internal dict (scope.__inner__[key] = NULL). Let ScopeDict be a subclass of dict. (It doesn't have to be a subclass. See "ScopeDict should replace dict".) ScopeDict owns a second dict of RefCells. A normal dict owns pyreference to its values. A RefCell will have a pointer to that reference. This means that a RefCell does not need to be notified when a value is set, since it knows where the dict stores its pointer to the value. Instead, when the dict resizes, the dict has to update all of its RefCells. Upon deletion of the dict, ScopeDict will pass ownership of the value pyreferences to the RefCells. That was a simplified view. We can make these optimizations: 1. The RefCells will be held in an array, not a dict. This array will be synced with the dict's internal hash table, and resizes with it. This allows a single lookup for both the value and its RefCell (or lack thereof), because they will be at the same index in different arrays. 2. RefCells won't be created unless they're requested. Obviously. 2a. The array doesn't need to be created unless a RefCell is created. 3. Any time a ScopeDict would touch a RefCell, it would first check that it's not exposed (that is, that the RefCell has no outside references). If it isn't exposed, that Refcell can be safely deleted. 3a. The refs array can also be deleted at this time if it doesn't hold anything. 4. 
If a key has no corresponding RefCells, it doesn't need to be saved. This can be checked on resize and ScopeDict.__del__, and also can be checked on __delitem__. [See "Pseudocode: ScopeDict"] == The benefits == Every global lookup is done at compile-time. No need to save a local copy of a global variable. Guaranteed to have the most updated value. Even if it didn't exist at compile-time. In particular, it works even for recursive functions and wrapped functions, such as from functools import lru_cache @lru_cache() def fib(n): if n < 2: return n return fib(n-1) + fib(n-2) Access to a global variable during a call is a few C dereferences, plus a few checks and C function calls. No change to the semantics of the language. Possible change in the semantics of CPython: There might be cases where keys live longer than they used to, given the following: 1. `del scope[key]` was called. 2. There was an exposed RefCell for that key. 3. Later, the RefCell stopped being exposed 4. `scope` has not resized since then (and thus didn't clean up its dead cells yet). The above is already possible in garbage-collected interpreter. We can make ScopeDict weakref its RefCells, and have RefCells remove keys with no value when they die, but it would mean extra overhead by wrapping a RefCell with a weakref. Is that worth it? (`getref` will return a strong reference.) == The costs == Memory: - An extra pointer to the refs table in each ScopeDict, and an extra function pointer for the new function. - An extra array for each ScopeDict, 1/3 to 1/4 the size of the internal dict table (depends on how big pointers are on the platform). Note: Since the refs table is the same size as the internal hash table, Raymond Hettinger's compact dict idea will also decrease the size of the refs table. (compact dict idea: https://mail.python.org/pipermail/python-dev/2012-December/123028.html) - Extra memory for each RefCell (a Python object). (There will be at most one RefCell per dict per key.) - The dict will be bigger because it holds empty entries. Execution: - Extra compile time. Functions will do lookups when they're compiled. But we're saving lookups during execution, so it's only extra if the function never gets called or never uses the RefCell. - `getitem`: Must check if the value is NULL. Currently, it only checks the key. - `delitem`: Instead of removing the key, just set the value to NULL. In fact, this is probably cheaper. - Resizing the dict: It will have to copy over all the RefCells (and delete the unused ones) to a new array, too, and update them. - Deleting the dict: The dict will have to DecRef its RefCells, and pass on its value pyreferences to the ones that survive. Python Refs: - We're potentially holding keys even when values have been deleted. This is okay, because most pyreferences to a refcell replace a pyreference to a key. == What will hold a RefCell? == Functions will request RefCells for each global variable they use. They will hold them until they die. Code not in functions (e.g. in module scope), or using eval/exec, will request RefCells during compilation, and they will be used during execution. They will discard the RefCells when they are done executing. == Add `getref` to `dict` == I propose that all dicts should have the ability to return RefCells. As Victor Stinner pointed out in his FAT Python message, using a subclass for scopes would disallow regular dicts being used as scopes by eval and exec. 
(See end of https://mail.python.org/pipermail/python-dev/2015-December/142397.html) But I think refs can be useful enough that it could be added to the dict interface. Meaning `getref` (via another name... `item`?) can be a new dict method. Upon first use, the dict will generate an empty refs table, and replace its methods with ones that know how to deal with refs. This prevents O(n) memory overhead for dicts that don't need it. == Implementation issues == There are performance _concerns_. I think it can be done with only a few performance _issues_. In my experience in arguing out the idea, the overhead for both memory and execution will be better than what they replace.[*] So while I'm not promising a free lunch, I'm believing in it. There are potential issues with dict subclasses that also change the C function pointers, since I'm proposing that we replace those functions dynamically. I think these issues are solvable, but it would require understanding what's allowed so I would know how to properly wrap that functionality. [*] (Except that there's a tricky case in deep multiple inheritance of scope, which I'd have to figure out. But I don't think it's possible to do that except in classes, and I don't wanna touch class MRO until I understand how the resolution works in C.) [*] (And you can pay a lot of lookups if you keep creating functions that won't get executed, but it's impossible to do such a thing in a loop since the lookups will be done once by the parent. Unless you loop an eval/exec to keep making functions that will never be called, in which case you're a bad person.) == Interpreters other than CPython == It should be possible to use the same idea in IronPython, Jython, and PyPy. I think it's a simple enough idea that I am surprised that it's not already there. In fact, I would not be surprised if PyPy already uses it. In fact, I would not be (very) surprised if I had gotten the idea from PyPy. From leewangzhong+python at gmail.com Sat Dec 19 09:10:53 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 19 Dec 2015 09:10:53 -0500 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: Dictionary References: An Idea in Three Emails Act 2: More stuff Table of Contents: Act 1: Stuff https://mail.python.org/pipermail/python-ideas/2015-December/037511.html - Definitions - Problem statement - The idea - The benefits - The costs - What will hold a RefCell? - Add `getref` to `dict` - Implementation issues - Interpreters other than CPython Act 2: More stuff https://mail.python.org/pipermail/python-ideas/2015-December/037512.html - Extension: Scope ancestry - Aside: Classes and scopes do not have the same rules - Extension: Read/write refs - Extension: Read-only refs - Option: Alternative RefCell structure - Option: Replace the methods when detaching a RefCell - Extension: Dynamic scoping - Extension: Thread safety Act 3: Sample pseudocode https://mail.python.org/pipermail/python-ideas/2015-December/037513.html - ScopeDict - Scope ancestry - Alternative RefCell structure - Thread safety == Extension: Scope ancestry == It's possible for lookups in one scope to fall back into another. In other words, any failed lookup in scope A will result in a lookup in scope B. I will call B a parent scope of A. For example: - Names in functions will be local if defined there, and global otherwise. - If function f creates function g, then g's scope includes f's. - Object attribute lookups go through instance -> class -> [superclasses]. 
- Global scope includes builtins. The first two are irrelevant. Those decisions are made at compile-time, so there's no need for optimization. I don't want to touch the third one. I'm told that it's already cached, and I'm not sure refcells would help. We're left with globals and builtins. Any lookup in globals() can mean a lookup in __builtins__. So consider __builtins__ as a parent of globals(). Say we have a parentScope = ScopeDict() and a childScope = ChainScopeDict(). The childScope will have a list of its parents: childScope.parents = [parentScope]. When a ref = ChainRefCell() is created, it will also be given a ref.parents = [parentScope]. On resolution, it will ask its parent dicts for values. This requires holding a PyRef to the key. It might also request RefCells from the parents. [See "Pseudocode: Scope ancestry"] == Extension: Replacing a parent == A parent scope might be replaced after ChainRefCells have been created. In particular, globals()['__builtins__'] can be replaced. Fortunately, globals() will know when that particular key is changed. I suggest that globals() goes through all of its ChainRefCells and updates their .parents to the new __builtins__ (which invalidates parent refcells). This will mean iterating through the whole refs array every time __builtins__ is replaced. So if you keep replacing __builtins__, you're going to have a bad time. In other cases where parents can be replaced, we can do the same kind of notification. Say ChainScopeDicts A < B < C is a relation (B is a child scope of A, so B is bigger than A), and we replace B's parent A with D. Then B notifies its RefCells, and C doesn't need to be notified directly. (This does NOT work for class hierarchies. See next section.) == Aside: Classes and scopes do not have the same rules == Take this class hierarchy: class A(object): pass class B(object): pass class C(A, B): pass obj = C() Then a lookup in C will NOT cause recursive lookups in A and B. In other words, the family tree look like this: obj: obj -> C -> A -> B -> object C: C -> A -> B -> object A: A -> object B: B -> object No matter how deep the class hierarchy goes, the scope hierarchy for classes and objects is at most two levels deep: instance -> class -> [superclasses]. This is due to Method Resolution Order. It means that a class's direct scope parents are all of its superclasses. This also means that the notification idea in the previous section won't work. If a class changes its superclass, it doesn't just need to notify its own cells. It needs to notify all of its subclasses (though not instances), because they hold pyreferences to its original superclass. This can be solved by versioning: But let's not get into that when it's not even clear that attribute lookup needs this optimization. == Extension: Read/write refs == Python functions can write to globals by declaring, for example, `global x`. It's straightforward to allow myRefCell.set(val) and myRefCell.delete(). As a method of `dict`, this would slightly improve the memoization idiom. Before: memo = {} def fib(n): if n < 2: return n try: return memo[n] except: result = memo[n] = fib(n-1) + fib(n-2) # ^Second lookup return result After: memo = {} def fib(n): if n < 2: return n ref = memo.getref(n) try: return ref.get() except: ref.set(fib(n-1) + fib(n-2)) # No second lookup return ref.get() Also, "It's better to ask for forgiveness" might no longer be true with a method that checks for existence. 
memo = {}
def fib(n):
    if n < 2:
        return n
    ref = memo.getref(n)
    if ref.empty():
        ref.set(fib(n-1) + fib(n-2))
    return ref.get()

This allows modification of entries even after the scope dies. It's like closures, except exactly the same as closures.

== Extension: Read-only refs ==

Some optimizations might be possible if a dict knows that it will be notified for every change, through setitem/delitem. For example, if we make dict1 = dict0.copy(), then this can be a shallow copy, and whichever dict modifies first will split off from the other. This is not possible with the existence of RefCells that can .set(). (See previous section.)

Two non-exclusive solutions:
1. Distinguish read-only RefCells and read-write RefCells. Have a dict keep track of whether it has read-write RefCells.
2. Make RefCell.set notify its owner dict. This requires a pointer to the owner dict. (NOT a pyreference, since the pointer is only useful as long as the owner is alive, and the owner can notify its RefCells when it dies.)

== Option: Alternative RefCell structure ==

The RefCell I defined owns a pointer to the dict entry or a pointer to the value, and a flag to determine whether it's part of a "living" dict. Instead, we can hold two pointers, one which MIGHT point to `key`. The `key` pointer will do double duty by telling the RefCell whether it's part of a living dict (by being NULL). With this, when we call .get(), it will raise a KeyError with the correct key. Otherwise, it would have to raise a generic KeyError, since it doesn't know the key.

[See "Pseudocode: Alternative RefCell structure"]

== Option: Replace the methods when detaching a RefCell ==

Instead of figuring out whether it's pointing to a table entry or a value itself, we can simply replace the RefCell's member functions. They will only change once, since the owner can only die once.

== Extension: Dynamic scoping ==

In (C?)Python 3, `exec` can't change scoping. It can't even change local variables.

x = 'global x'
y = 'global y'
def f():
    x = 10
    exec("x = 'local x'")
    exec("y = 'local y'")
    print(x)  # prints 10
    print(y)  # prints 'global y'
f()
print(x)  # prints 'global x'

The docs say: """ The default locals act as described for function locals() below: modifications to the default locals dictionary should not be attempted. Pass an explicit locals dictionary if you need to see effects of the code on locals after function exec() returns. """

It's possible, if we wanted, to change how this works, and have nested function scoping behave like globals > builtins (that is, dynamically). I'm not sure if it's even desirable, though. (Function scope ancestry is linear, fortunately. No multiple inheritance, so no diamond inheritance problem, so there's no need for Python's MRO, so we wouldn't have the same issues as with classes.)

Steven D'Aprano talks more about it here: https://mail.python.org/pipermail/python-dev/2015-December/142511.html

== Extension: Thread safety ==

(Note: Needs an expert to double-check.)

RefCells might need locks, because they have to determine whether the dict is alive to determine where their value is. The owner dict doesn't need to care about that, since it knows it's alive. When a dict deletes, it only needs to hold a lock on one RefCell at a time.

[See "Pseudocode: Thread safety"]

From leewangzhong+python at gmail.com Sat Dec 19 09:11:31 2015 From: leewangzhong+python at gmail.com (Franklin?
Lee) Date: Sat, 19 Dec 2015 09:11:31 -0500 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: Dictionary References: An Idea in Three Emails Act 3: Sample pseudocode Table of Contents: Act 1: Stuff https://mail.python.org/pipermail/python-ideas/2015-December/037511.html - Definitions - Problem statement - The idea - The benefits - The costs - What will hold a RefCell? - Add `getref` to `dict` - Implementation issues - Interpreters other than CPython Act 2: More stuff https://mail.python.org/pipermail/python-ideas/2015-December/037512.html - Extension: Scope ancestry - Aside: Classes and scopes do not have the same rules - Extension: Read/write refs - Extension: Read-only refs - Option: Alternative RefCell structure - Option: Replace the methods when detaching a RefCell - Extension: Dynamic scoping - Extension: Thread safety Act 3: Sample pseudocode https://mail.python.org/pipermail/python-ideas/2015-December/037513.html - ScopeDict - Scope ancestry - Alternative RefCell structure - Thread safety == Pseudocode: ScopeDict == The CPython dict implementation looks like this (simplified): class KVPair: __slots__ = { 'key': object | DUMMY | NULL, 'value': object | NULL, } class dict: __slots__ = { 'table': List[KVPair], 'size': int, } def lookup(self, key): # returns the entry, or an empty entry ... def __getitem__(self, key): entry = self.lookup(key) if entry.key in [NULL, DUMMY]: raise KeyError return entry.value def __setitem__(self, key, value): entry = self.lookup(key) if entry.key in [NULL, DUMMY]: entry.key = key entry.value = value self.maybe_resize() def __delitem__(self, key): entry = self.lookup(key) if entry.key in [NULL, DUMMY]: raise KeyError entry.key = DUMMY entry.value = NULL self.maybe_resize() def resize(self): old_table = self.table self.table = [KVPair() for i in range(self.predict_size())] for k, v in old_table: self[k] = v I want to add a `refs` member to hold the RefCells, and a `getref` function to acquire RefCells. (I make redundant lookups for demonstration. They are unnecessary in C.) (In C, self.table.index(entry) is a simple pointer subtraction.) (In the actual implementation, we could save a direct pointer to the KVPair's value pointer.) class ScopeDict(dict): __slots__ = { 'refs': List[RefCell | NULL], } def getref(self, key, force_create: bool): entry = self.lookup(key) if entry.key in [NULL, DUMMY]: if force_create: entry.key = key entry.value = NULL else: raise KeyError # Get the index. 
i = self.table.index(entry) cell = self.refs[i] = RefCell() cell.indirect = True cell.pointer = entry def __getitem__(self, key): entry = self.lookup(key) if entry.key in [NULL, DUMMY] or entry.value is NULL: raise KeyError return entry.value def __delitem__(self, key): entry = self.lookup(key) if entry.key in [NULL, DUMMY] or entry.value is NULL: raise KeyError entry.value = NULL def resize(self): old_table = self.table self.table = [KVPair() for i in range(self.predict_size())] old_refs = self.refs self.refs = [NULL] * len(self.table) for ref, (key, value) in zip(old_refs, old_table): self[key] = value if ref is NULL: continue # Update the ref entry = self.lookup(key) index = self.table.index(entry) self.refs[index] = ref ref.pointer = entry def __del__(self): # with illustrative deletes for ref, entry in zip(self.refs, self.table): delete entry.key if ref is not NULL: ref.pointer = entry.value ref.indirect = False delete ref else: delete entry.value class RefCell: __slots__ = { 'indirect': bool, 'pointer': (KVPair | object), } def get(self): if self.indirect: value = self.pointer.value else: value = self.pointer if value is NULL: raise KeyError return value def __del__(self): if not self.indirect: delete self.pointer == Pseudocode: Scope ancestry == Algorithm for ref creation: (This is complex, because of the decision-making process of whether to create a RefCell.) def ChainScopeDict.getref(self, key, force_create: bool, recursive: bool = True): # Try the normal getref. try: ref = super().getref(key, force_create=force_create) except KeyError: # We don't have this key. if not recursive: raise #give up else: ref = None # We now know: assert recursive or ref is not None if recursive: # Try to find a parent with a refcell for i, parent in enumerate(self.parents): try: parent_ref = parent.getref(key, force_create=False, recursive=True) good_parent = i break except: continue else: # No parent has a RefCell for this key if ref is None: assert not force_create raise KeyError(key) else: parent_ref = None else: assert ref is not None parent_ref = None assert parent_ref is not None or ref is not None if ref is None: assert parent_ref is not None and force_create is False ref = super().getref(key, force_create=True) # Hack to save on pseudocode: ref.__class__ = ChainRefCell if parent_ref is None: assert force_create or key in self # Create no parent refs. It will look up parents later. ref.parents = self.parents.copy() return ref # Create all refs up to the found parent_ref. # (Don't create refs for the later parents.) ref.parents = ([p.getref(key, force_create=True) for p in self.parents[:good_parent]] + [parent_ref] + self.parents[good_parent + 1:]) return ref Algorithm for chain resolution: def ChainRefCell.get(self, recursive: bool): key = self.key try: # Own value. return super().get() except KeyError: if not recursive: # Don't search parents. raise pass # Index of latest parent which is/has a RefCell. # We want to create intermediate RefCells for all things in between. last_parent_cell = 0 for i, parent in enumerate(self.parents): # We want a parent RefCell, not a parent dict if isinstance(parent, ScopeDict): try: parent = parent.getref(key, force_create=False) except KeyError: continue # Don't need the parent dict anymore. self.parents[i] = parent last_parent_cell = i # This parent is now a refcell. # Try to get the parent to resolve. try: # `recursive=False` for class-like hierarchy; # see below value = parent.get(recursive=True) except KeyError: continue else: # No parent has a value. 
value = NULL # Create refs in the parents which come before a refcell. # This prevents repeated failed lookups. for i, parent in enumerate(self.parents[:last_parent_cell]): if isinstance(parent, ScopeDict): self.parents[i] = parent.getref(key, force_create=True) if value is NULL: raise KeyError return value == Pseudocode: Alternative RefCell structure == This allows us to have a reference to the key (for KeyError(key)) without taking up an extra bool for checking the dictionary. class RefCell: __slots__ = { '_key': (NULL | object), '_value': (KVPair | object), } """ Two possibilities: 1. _key == NULL and isinstance(_value, KVPair) => the dict is alive. 2. isinstance(_key, object) and isinstance(_value, object) => this RefCell has been released from its dict. (KVPair and NULL are not (py-)objects.) So we use _key as a way to tell whether the dict is alive, and to tell whether _value points to the value or to the table entry. """ def get(self): if self._key is NULL: # The pointer is to the KVPair. key = self._value.key value = self._value.value else: # The pointer is to the value. key = self._key value = self._value if value is NULL: raise KeyError(key) return value def __del__(self): if self._key is not NULL: delete self._key delete self._value # Otherwise, it doesn't own the references. def key(self): if self._key is NULL: # Look it up in the dict. return self._value.key else: return self._key == Pseudocode: Thread safety == for i, (ref, (k, v)) in enumerate(zip(oldrefs, oldtable)): # Remove refcells which only have one ref, # then remove keys without either value or refcell. # No need to lock a refcell with only one ref. isdead = self.remove_if_dead(i) if isdead: continue if ref is not NULL: lock(ref) ref._value = v ref._key = k unlock(ref) else: delete k delete v From victor.stinner at gmail.com Sat Dec 19 10:06:09 2015 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 19 Dec 2015 16:06:09 +0100 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: Le samedi 19 d?cembre 2015, Franklin? Lee > a ?crit : > > == Problem statement == > > I propose a CPython interpreter optimization for global name lookups. > Fans of micro optimisation are probably already using various hacks like func(len=len): ... to avoid the lookup at runtime. There is an option rewriting bytecode to replace load_global with load_const (I don't recall the name of the PyPI project, it's a decorator). Serhiy also proposed to implement a new syntax to make the lookup when the function is defined. It would be interesting to mesure the cost of these lookups(ex: number of nanoseconds per lookup) and have an idea on how much load_global lookups are used in the wild (ratio on the overall number of instructions). Since the goal is a speedup, a working proof of concept is required to show that it works and it's faster (on macro benchmarks?). Do you feel able to implement it? As I already wrote, I'm not convinced that it's worth it. Your code looks more complex, will use more memory, etc. I don't think that load_global is common in hot code (the 10% taking 90% of the runtime). I expect effetcs on object lifetime which could be annoying. I implemented an optimization in FAT Python that replaces builtin lookups with load_const. I have to enhance the code to make it safe, but it works and it doesn't require deep changes in dict type. 
http://faster-cpython.readthedocs.org/en/latest/fat_python.html#copy-builtin-functions-to-constants In short, it only adds a single integer per dict, incremented at each modification (create, modify, delete). Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Sat Dec 19 10:14:00 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 19 Dec 2015 16:14:00 +0100 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: <56757438.9000604@egenix.com> On 19.12.2015 16:06, Victor Stinner wrote: > Le samedi 19 d?cembre 2015, Franklin? Lee > a ?crit : >> >> == Problem statement == >> >> I propose a CPython interpreter optimization for global name lookups. >> > > Fans of micro optimisation are probably already using various hacks like > func(len=len): ... to avoid the lookup at runtime. There is an option > rewriting bytecode to replace load_global with load_const (I don't recall > the name of the PyPI project, it's a decorator). > > Serhiy also proposed to implement a new syntax to make the lookup when the > function is defined. > > It would be interesting to mesure the cost of these lookups(ex: number of > nanoseconds per lookup) and have an idea on how much load_global lookups > are used in the wild (ratio on the overall number of instructions). > > Since the goal is a speedup, a working proof of concept is required to show > that it works and it's faster (on macro benchmarks?). Do you feel able to > implement it? > > As I already wrote, I'm not convinced that it's worth it. Your code looks > more complex, will use more memory, etc. I don't think that load_global is > common in hot code (the 10% taking 90% of the runtime). I expect effetcs on > object lifetime which could be annoying. The effects are minimal and only show up in overall performance if the functions in question are used a lot. I gave a talk about such optimizations last year at PyCon UK you might want to have a look at: http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-matters/ Slide 38 has the details about such lookup optimization tricks: https://downloads.egenix.com/python/PyCon-UK-2014-When-performance-matters-Talk.pdf > I implemented an optimization in FAT Python that replaces builtin lookups > with load_const. I have to enhance the code to make it safe, but it works > and it doesn't require deep changes in dict type. > http://faster-cpython.readthedocs.org/en/latest/fat_python.html#copy-builtin-functions-to-constants > > In short, it only adds a single integer per dict, incremented at each > modification (create, modify, delete). > > Victor > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 19 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From guido at python.org Sat Dec 19 11:22:35 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Dec 2015 08:22:35 -0800 Subject: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? In-Reply-To: References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> Message-ID: On Sat, Dec 19, 2015 at 5:46 AM, Random832 wrote: > Guido van Rossum writes: > > The link between hashing and immutability is because objects whose > > hash would change are common, e.g. lists, and using them as dict keys > > would be very hard to debug for users most likely to make this > > mistake. The issue is that the dict implementation makes it impossible > > to find back keys whose hash has changed, other than by linear search, > > which is unacceptable -- but that's exactly what users will try to > > debug such issues, i.e., print the dict and notice that the missing > > key is indeed present. > > Java doesn't seem to have this problem. Python uses dicts more > heavily as part of its core architecture, sure, but those dicts > use strings as their keys. > > > The consenting adults rule typically applies to things that are well > > hidden or marked (e.g. using __dunder__ names). > > The ability to e.g. replace a class or module's functions, or > values intended as constants, is not especially well-hidden. > > > There are plenty of things that Python could allow but doesn't, not > > because they are hard to implement or would violate an invariant of > > the interpreter, but because they could trip over naive users. > > > > Note that you are turning things upside down: the question "why aren't > > all things hashable" came about because Andrew was considering making > > a hash table of the values of a dict. > > Well, sure, but that's a reasonable way (if the ability to do so > were present) to implement the operation being discussed under > the performance constraints he specified. > > > But the real question here isn't "why aren't all things hashable" but > > "why can't you put mutable values into a set". The answer to the > > latter is because when we put a value into a container, and later the > > value changes, we can't tell the container, so if the container has > > any sort of lookup scheme other than linear search, it would lose > > track of the value. > > Yes, but you're fine as long as the value doesn't change. > > What do you think about my __vhash__ idea? Someone would only > make sets/dicts that use __vhash__ rather than __hash__ if they > can guarantee the object won't change in the lifetime of its > presence in the container (something that's no problem for the > short-lived container that would be used for this operation) > You can solve this without adding warts to the language by using a wrapper object. > > Hashing comes into play because all of Python's common data structures > > use hashing to optimize lookup -- but if we used a different data > > structure, e.g. something based on sorting the keys, we'd still have > > the mutability problem. And we'd have worse problems, because values > > would have to be sortable, which is a stricter condition than being > > immutable. > > > > In any case, you can't solve this problem by making all values > > hashable. > > Sure I can. 
> > Normal dict values: > def __eq__(self, b): > return Counter(self) == Counter(b) > #or e.g. Counter(map(self, make_vhash_key)) ... > > OrderedDict values: > def __eq__(self, b): > if isinstance(b, OrderedDict) > return List(self) == List(b) > else: > return super().__eq__(b) > # Yes, this isn't transitive, i.e. maybe: > # a == c and b == c where a != b > # but the same is true today for the dicts. > > >>> a = collections.OrderedDict(((1, 2), (3, 4))) > >>> b = collections.OrderedDict(((3, 4), (1, 2))) > >>> c = {1: 2, 3: 4} > >>> a == c, b == c, a == b > (True, True, False) > I don't see what that bit of code proves (or most other things you wrote above). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sat Dec 19 11:33:34 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 19 Dec 2015 11:33:34 -0500 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: I messed up the links. Well, I learned something about mailman. On Sat, Dec 19, 2015 at 10:06 AM, Victor Stinner wrote: > Le samedi 19 d?cembre 2015, Franklin? Lee a > ?crit : >> >> == Problem statement == >> >> I propose a CPython interpreter optimization for global name lookups. > > > Fans of micro optimisation are probably already using various hacks like > func(len=len): ... to avoid the lookup at runtime. There is an option > rewriting bytecode to replace load_global with load_const (I don't recall > the name of the PyPI project, it's a decorator). > > Serhiy also proposed to implement a new syntax to make the lookup when the > function is defined. I mentioned the hack, and said this would make it unnecessary. I also mentioned how the hack bypasses dynamic lookup for the sake of performance, while this would have the effect of dynamic lookup. It could possibly be faster (since you don't have to load a default arg). Rewriting bytecode has the same issues, and adds overhead during compilation, too. My idea might have less execution overhead on compilation, since you'd have to do the lookup in either case. It would improve all functions which use globals, and doesn't require the programmer to bypass dynamic lookup. No new syntax > new syntax. Not having to use a hack > using a hack. > It would be interesting to mesure the cost of these lookups(ex: number of > nanoseconds per lookup) and have an idea on how much load_global lookups are > used in the wild (ratio on the overall number of instructions). > > Since the goal is a speedup, a working proof of concept is required to show > that it works and it's faster (on macro benchmarks?). Do you feel able to > implement it? It would be a project. I've studied some of the dict code, but I'd still have to look at OrderedDict and how it differs. I'd have to hack at the interpreter level. Worst, I'm not very experienced at C, and have no real "development" experience, so I'd struggle with just compiling CPython. But I'll definitely try, because I won't get anywhere by hoping someone else implements what I think of. Theoretically, using a RefCell is definitely faster than actually doing the lookups (what currently happens). The extra level of indirection might result in CPU cache misses, but the dictionary lookup it replaces is much more likely to do so. > As I already wrote, I'm not convinced that it's worth it. Your code looks > more complex, will use more memory, etc. 
I don't think that load_global is > common in hot code (the 10% taking 90% of the runtime). I expect effetcs on > object lifetime which could be annoying. At the least, it would relieve Python's mental burden: "Don't use global names in tight loops." That would be worth it. (It would also mean that you wouldn't need to implement guards, or have a "slow path" with lookups.) Even if the gains are minimal, they're gains that people work for, so this would save them work. The idea is not very complex. "Know where the value would live, and you will always be able to find it." The complexity comes from: - The complexity of dicts. Much of the code is simply managing the syncing between the dict and the refs table. - The complexity of scope relationships. - The complexity of allowing scopes to be replaced. (`globals()['__builtins__'] = {}`) - The complexity of cells which will live on after their owner dict. - Exceptions and error messages. - Preserving dict performance. For example, if normal dicts weren't a concern, the ref table would be built into dict's internal table. Most of the complexity is in trying to replicate the existing complexity. Otherwise, we can just implement it as, "The inner dict holds a length-1 list which holds the value, or NULL. Make sure to check it's not NULL." The cost of resolving a (not-chained) RefCell: Call a function, which dereferences a pointer and checks if it's NULL. If it's not, return it. It's pretty cheap. I don't know why you think it'd possibly be slower than the current way. Memory... That will have to be seen, but it would be a function of the combined sizes of the module dicts, which shouldn't be a big part of the program. Implementing compact dicts will also make the refs table smaller. Module loading will gain costs, since the names used in each module require lookups at compile-time instead of runtime, and many functions in a module can go unused. This needs to be explored. It's feasible to have the interpreter delay the lookups to first-call. > I implemented an optimization in FAT Python that replaces builtin lookups > with load_const. I have to enhance the code to make it safe, but it works > and it doesn't require deep changes in dict type. > http://faster-cpython.readthedocs.org/en/latest/fat_python.html#copy-builtin-functions-to-constants > > In short, it only adds a single integer per dict, incremented at each > modification (create, modify, delete). To contrast: I don't require any safety checks, I can replace `print` without incurring additional lookups, and I am insensitive to other changes in the dict (unless the dict resizes). I add a few pointers to dict (probably smaller than a PyIntObject), and the refs table won't exist unless it's needed. From abarnert at yahoo.com Sat Dec 19 13:30:57 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 19 Dec 2015 10:30:57 -0800 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: <8FA5E45F-BCFF-46B5-9044-A5419A6DAB10@yahoo.com> On Dec 19, 2015, at 08:33, Franklin? Lee wrote: > > I messed up the links. Well, I learned something about mailman. > > On Sat, Dec 19, 2015 at 10:06 AM, Victor Stinner > wrote: >> Le samedi 19 d?cembre 2015, Franklin? Lee a >> ?crit : >>> >>> == Problem statement == >>> >>> I propose a CPython interpreter optimization for global name lookups. >> >> >> Fans of micro optimisation are probably already using various hacks like >> func(len=len): ... to avoid the lookup at runtime. 
There is an option >> rewriting bytecode to replace load_global with load_const (I don't recall >> the name of the PyPI project, it's a decorator). >> >> Serhiy also proposed to implement a new syntax to make the lookup when the >> function is defined. > > I mentioned the hack, and said this would make it unnecessary. I also > mentioned how the hack bypasses dynamic lookup for the sake of > performance, while this would have the effect of dynamic lookup. It > could possibly be faster (since you don't have to load a default arg).

It can't possibly be faster, unless you never actually use the len function (in which case it would be pretty stupid to optimize with len=len). Loading a default value is just copying from one array to another at call time, with pseudocode like "frame.localsplus[3] = function.defaults[1]". That takes about as much time as two locals lookups (and then turns every builtin lookup into a local lookup). Meanwhile, your idea replaces every locals lookup with a RefCell lookup plus a dereference, something like "frame.code.consts[1].value" instead of "frame.localsplus[3]". Loading off consts is about as fast as loading off locals, but that extra cell dereference step can't possibly be free, and in fact will be on the same order as the array lookup. As a rough guess, since your RefCell system is pretty close to what already happens for closure cells, it'll probably be about as fast as closure lookups--which are significantly faster than global lookups, meaning the optimization would help, but not as fast as local lookups, meaning the optimization would not help as much as the default value hack. And that seems like an almost unavoidable cost of keeping it dynamic. The FAT idea avoids that cost by faking the dynamic nature: it caches the value as a const, but the cache is wiped out if the value changes and the guard is tripped, so in cases where you don't ever change len it's nearly as fast as the default value hack, but if you do change len it's as slow as a traditional global lookup (because, in fact, it's just falling back to the unoptimized code that does a traditional global lookup). And, since you asked about PyPy, I'd be willing to bet that what it does is a lot closer to FAT than to your idea. > It would improve all functions which use globals, and doesn't require > the programmer to bypass dynamic lookup. No new syntax > new syntax. > Not having to use a hack > using a hack. But more complex under-the-covers implementation < simple implementation. Unless it actually significantly speeds up a reasonable amount of real-life code whose speed matters, that's not worth it. I suppose one way you could estimate the potential benefit is to try to figure out how common the default hack is in real life. If people need to use it frequently in production code (which presumably means there's other production code that could be benefitting from it but the programmer didn't realize it), then making it unnecessary is a real win. >> Since the goal is a speedup, a working proof of concept is required to show >> that it works and it's faster (on macro benchmarks?). Do you feel able to >> implement it? > > It would be a project.
I've studied some of the dict code, but I'd > still have to look at OrderedDict and how it differs. I'd have to hack > at the interpreter level. Worst, I'm not very experienced at C, and > have no real "development" experience, so I'd struggle with just > compiling CPython. But I'll definitely try, because I won't get > anywhere by hoping someone else implements what I think of. Something you might want to consider doing first: Write a pure-python RefCell, RefCellDict, and RefCellChainDict. Write some code that explicitly uses a ChainMap to look things up instead of using globals and builtins. Then rewrite it as equivalent code that manually sets up a list of RefCells out of a RefCellChainDict and then uses the list. You can simulate "global" and "builtin" lookups and mutations, and see when the optimization makes things faster. If it often makes a big difference, that should inspire you to do the hard work in C. Also, it means you have real code instead of pseudocode to show off and let people play with. In fact, you could even write a pure-Python optimizer out of this without inspecting the bytecode: for each def/lambda/etc. node, compile the AST normally, get the co_names, then rewrite the AST to build and use a list of RefCells, then compile the module and replace its dict with a RefCellDict, and now all globals in the module use RefCell lookups. (Making that work for builtins might be trickier, because other modules can change that... But for a first pass, you can either leave builtins out, or assume that no code changes it after startup and throw in some asserts for tests that break that assumption.) That would allow you to run realistic benchmarks. (If you're worried that the pure-Python objects cells and dicts are too slow, Cython should help there.) And it might even be a useful project on its own if it could optimize real-life code in CPython 3.5 and 2.7, PyPy, Jython, etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sat Dec 19 15:07:08 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 19 Dec 2015 15:07:08 -0500 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: On Sat, Dec 19, 2015 at 9:10 AM, Franklin? Lee wrote: > Dictionary References: An Idea in Three Emails *snip* I found some earlier threads while looking for info on PyPy's "celldicts.py". (I knew it couldn't have been a new idea!) I'm taking a break from reading them, but I was writing some notes, and the fact that they exist should be known early. ==== Subject: [Python-Dev] Accessing globals without dict lookup From: Guido van Rossum Date: Fri, 08 Feb 2002 11:50:31 -0500 Link: https://mail.python.org/pipermail/python-dev/2002-February/019846.html PEP: https://www.python.org/dev/peps/pep-0280/ It goes into many of the same details I had worked out this week, including - pre-creation of empty cells - globals() > __builtins__ relationship - can't figure out a use for deeper relationships - Tim Peters lays out an idea of "celldict pointing to a realdict" which is similar to how my refs table is a dict parallel to the original dict, except that it being part of the same object means that I can do a single lookup to find both an entry and its cell. (https://mail.python.org/pipermail/python-dev/2002-February/019893.html) It also has a basic implementation in Python. (Victor, maybe you'd like reading that one better.) 
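(For readers who don't want to dig through the PEP: a rough pure-Python sketch of the shared idea. This is illustrative only -- it is neither the PEP's reference code nor the C design above; CellDict here is just a stand-in name. The scope dict hands out cell objects, so a name can be resolved once and the cached cell still sees later rebindings.)

class RefCell:
    """Holds the value for one key; a sentinel marks 'not set yet'."""
    _MISSING = object()

    def __init__(self, key):
        self.key = key
        self.value = self._MISSING

    def get(self):
        if self.value is self._MISSING:
            raise KeyError(self.key)
        return self.value


class CellDict:
    """Dict-like scope whose entries are RefCells, so lookups can be cached."""

    def __init__(self):
        self._cells = {}

    def getref(self, key):
        # Create an (empty) cell on first request, like PEP 280's pre-created cells.
        cell = self._cells.get(key)
        if cell is None:
            cell = self._cells[key] = RefCell(key)
        return cell

    def __getitem__(self, key):
        return self.getref(key).get()

    def __setitem__(self, key, value):
        self.getref(key).value = value


g = CellDict()
x_cell = g.getref('x')      # resolved once, like a compiled-in slot

def show_x():
    return x_cell.get()     # no dict lookup at call time

g['x'] = 1
assert show_x() == 1
g['x'] = 2                  # rebinding is still visible through the cached cell
assert show_x() == 2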
The PEP, along with PEP 266 and PEP 267, was deferred because "no-one has yet emerged to do the work of hashing out the differences between [them.]" Key differences: - He stores cells as values. I store them parallel to values, to reduce overhead on normal dict operations. - He uses a subclass. I want to make it the default, and dynamically convert from normal dict to celldict and back. - He packs __builtins__ cells into module cells when the module is created. I am lazier: a module cell will only request a __builtins__ cell when it fails to resolve to a value. - He converts the lookups to indexed lookups, with new corresponding bytecode ops. I... don't actually know how to get the interpreter to use the cells. - Since he uses a subclass, he doesn't have functions load cells from the module if the functions have a regular dict as a globals() scope. I want a function to always be able to request cells from its globals(), even if it is a user dict. - He uses a single-parent relationship: a cell either has a parent or it doesn't. I allow for multiple parents, but since I've already realized that scopes don't have multiple direct parents, let's take away the loop and go with a single optional parent. On semantics: """ I think this faithfully implements the current semantics (where a global can shadow a builtin), but without the need for any dict lookups when accessing globals, except in cases where an explicit dict is passed to exec or eval(). """ Tim Peters points out how it might be more expensive than a lookup. (https://mail.python.org/pipermail/python-dev/2002-February/019874.html) """ Note that a chain of 4 test+branches against NULL in "the usual case" for builtins may not be faster on average than inlining the first few useful lines of lookdict_string twice (the expected path in this routine became fat-free for 2.2): i = hash; ep = &ep0[i]; if (ep->me_key == NULL || ep->me_key == key) return ep; Win or lose, that's usually the end of a dict lookup. That is, I'm certain we're paying significantly more for layers of C-level function call overhead today than for what the dict implementation actually does now (in the usual cases). """ He also wanted to optimize the globals() -> __builtins__ relationship by loading the __builtins__ value into the globals() cell as a fast path, with an expensive notification to all modules' cells when a name in __builtins__ changes. (https://mail.python.org/pipermail/python-dev/2002-February/019904.html) ==== Subject: [Python-ideas] Fast global cacheless lookup From: Neil Toronto Date: Thu Nov 22 16:40:49 CET 2007 Link: https://mail.python.org/pipermail/python-ideas/2007-November/001212.html A proposal to modify `dict` instead of subclassing. """ What if a frame could maintain an array of pointers right into a dictionary's entry table? A global lookup would then consist of a couple of pointer dereferences, and any value change would show up immediately to the frame. """ Unfortunately, he makes the dict responsible for notifying all registered functions, instead of using a ref-counted indirection. So his resizes are an additional O(len(functions)), while mine are an additional O(len(table) + #(refs)). Subject: [Python-Dev] PATCH: Fast globals/builtins lookups for 2.6 From: Neil Toronto Date: Thu Nov 29 11:26:37 CET 2007 Link: http://mail.python.org/pipermail/python-ideas/2007-November/001212.html The patch. He ended up using versioning, because the notification was expensive. (*Possibly* not as much of an issue for me.) 
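(For contrast, a toy sketch of what "versioning" means here -- my own illustration, not Neil's patch and not FAT Python's actual machinery; only __setitem__/__delitem__ are guarded, which is enough to show the idea. Mutations bump a counter, and cached lookups revalidate by comparing counters instead of being notified.)

class VersionedDict(dict):
    """dict that bumps a version tag on every mutation (illustrative only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CachedLookup:
    """Caches one lookup; redoes it only if the dict's version changed."""

    def __init__(self, d, key):
        self.d = d
        self.key = key
        self.cached_version = -1
        self.cached_value = None

    def get(self):
        if self.d.version != self.cached_version:   # guard check, O(1)
            self.cached_value = self.d[self.key]    # slow path: real lookup
            self.cached_version = self.d.version
        return self.cached_value


ns = VersionedDict(x=1)
lookup = CachedLookup(ns, 'x')
assert lookup.get() == 1
ns['x'] = 2            # bumps the version, so the next get() re-resolves
assert lookup.get() == 2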
The patch slightly changed the behavior of the __builtins__ dict itself. From guido at python.org Sat Dec 19 20:01:14 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Dec 2015 17:01:14 -0800 Subject: [Python-ideas] Why can't you put mutable values in a set? (was: Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected?) In-Reply-To: <858u4q3lbd.fsf_-_@benfinney.id.au> References: <659951373.398716.1450406333370.JavaMail.yahoo@mail.yahoo.com> <20151218110755.GH1609@ando.pearwood.info> <858u4q3lbd.fsf_-_@benfinney.id.au> Message-ID: Hah, I never have time to blog any more. :-( However you can just link to the mailman archives if you want to reference it. On Sat, Dec 19, 2015 at 12:16 AM, Ben Finney wrote: > Guido van Rossum writes: > > > The link between hashing and immutability is because objects whose > > hash would change are common, e.g. lists, and using them as dict keys > > would be very hard to debug for users most likely to make this > > mistake. [?] > > > > [?] But the real question here isn't "why aren't all things hashable" > > but "why can't you put mutable values into a set". [?] > > > > Hashing comes into play because all of Python's common data structures > > use hashing to optimize lookup -- but if we used a different data > > structure, e.g. something based on sorting the keys, we'd still have > > the mutability problem. And we'd have worse problems, because values > > would have to be sortable, which is a stricter condition than being > > immutable. > > > > In any case, you can't solve this problem by making all values hashable. > > That was a great explanation; you answered several points on which I was > vague, and you addressed some things I didn't even know were problems. > > I'd love to see that edited to a blog post we can reference in a single > article, if you have the time. > > -- > \ ?I went to the museum where they had all the heads and arms | > `\ from the statues that are in all the other museums.? ?Steven | > _o__) Wright | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sun Dec 20 01:34:13 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sun, 20 Dec 2015 01:34:13 -0500 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: (You sent this as private?) On Dec 19, 2015 7:09 PM, "Victor Stinner" wrote: > > Le 19 d?c. 2015 21:07, "Franklin? Lee" a > ?crit : > > Tim Peters points out how it might be more expensive than a lookup. > > Your whole idea rely on the assumption than a dict lookup (two lookups for > builtins) is slow. Remember that a dict lookup has a complexity of O(1)! > That's why I suggested you to start to benchmark, especially know the time > in nanoseconds of a dict lookup. > > > He ended up using versioning, because the notification was expensive. > > Yeah I also began with notification with a registry of functions before > moving to versionning: > http://faster-cpython.readthedocs.org/readonly.html > > Versionning is simple to implement and doesn't make dict operations slower. > > Victor > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From leewangzhong+python at gmail.com Sun Dec 20 02:06:18 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sun, 20 Dec 2015 02:06:18 -0500 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: (Sent this from the wrong email and it got rejected. Sorry, Victor, for the double post.) On Dec 20, 2015 2:03 AM, "Franklin?" wrote: > > Oops, that was supposed to private. No point now. > > Anyway, after I read Tim's remark, I realized that I overestimated the cost of dict lookups. It's not the complexity, but the constant factor in terms of branches and dereferences. It means that if I want to make an improvement, it would have to be very optimized C. They were discussing tricks like having a cell consider itself as its own parent, just to avoid a branch, which is not the level at which I was thinking. > > I think that normal dict operations won't have to slow down, except resize/destruction. I don't use registry, so there is no O(n) notification. This is an important difference from Neil Toronto's original method. > > Reading PEP 266, and rethinking your own work, I'm now considering a separate idea for temporary registries. But I think it'd be reaching into JIT territory, and I'm not confident at all in my JIT knowledge. I can't say that this idea would be an improvement. I'm just putting this out there in the hope that it can make some sense and inspire someone to have a real idea. > > The idea is, when code is running, if a function is called (within a loop?), load its global names into an array and register them with the globals/builtins (which can be done with user dicts by temporarily swapping out their get/set/del). Then replace the function with one that does LOAD_CONSTANT (like in FAT Python) and replace its function calls with "compile the functions the same way before calling" bytecodes. > > This array is for the whole stack. As I imagine it, it will be used during an execution. So for the REPL, it can be a new array per eval, or just globally. > > The trick is, each name will be added to the array once, so the registry will slow down the normal dict operations but each dict change only needs (at most) one notification per stack. > > This can be made to (only) slow down the dicts which are used as scopes, by dynamically swapping out its function pointers upon (temporary?) conversion. > > (You sent this as private?) > > On Dec 19, 2015 7:09 PM, "Victor Stinner" wrote: >> >> >> Le 19 d?c. 2015 21:07, "Franklin? Lee" a ?crit : >> > Tim Peters points out how it might be more expensive than a lookup. >> >> Your whole idea rely on the assumption than a dict lookup (two lookups for builtins) is slow. Remember that a dict lookup has a complexity of O(1)! That's why I suggested you to start to benchmark, especially know the time in nanoseconds of a dict lookup. >> >> > He ended up using versioning, because the notification was expensive. >> >> Yeah I also began with notification with a registry of functions before moving to versionning: >> http://faster-cpython.readthedocs.org/readonly.html >> >> Versionning is simple to implement and doesn't make dict operations slower. >> >> Victor -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Dec 20 11:41:13 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Dec 2015 08:41:13 -0800 Subject: [Python-ideas] Optimizing global names via dict references In-Reply-To: References: Message-ID: At this point it's entirely unclear (except to the actual authors, perhaps) who said what. On Sat, Dec 19, 2015 at 11:06 PM, Franklin? Lee < leewangzhong+python at gmail.com> wrote: > (Sent this from the wrong email and it got rejected. Sorry, Victor, for > the double post.) > > On Dec 20, 2015 2:03 AM, "Franklin?" wrote: > > > > Oops, that was supposed to private. No point now. > > > > Anyway, after I read Tim's remark, I realized that I overestimated the > cost of dict lookups. It's not the complexity, but the constant factor in > terms of branches and dereferences. It means that if I want to make an > improvement, it would have to be very optimized C. They were discussing > tricks like having a cell consider itself as its own parent, just to avoid > a branch, which is not the level at which I was thinking. > > > > I think that normal dict operations won't have to slow down, except > resize/destruction. I don't use registry, so there is no O(n) notification. > This is an important difference from Neil Toronto's original method. > > > > Reading PEP 266, and rethinking your own work, I'm now considering a > separate idea for temporary registries. But I think it'd be reaching into > JIT territory, and I'm not confident at all in my JIT knowledge. I can't > say that this idea would be an improvement. I'm just putting this out there > in the hope that it can make some sense and inspire someone to have a real > idea. > > > > The idea is, when code is running, if a function is called (within a > loop?), load its global names into an array and register them with the > globals/builtins (which can be done with user dicts by temporarily swapping > out their get/set/del). Then replace the function with one that does > LOAD_CONSTANT (like in FAT Python) and replace its function calls with > "compile the functions the same way before calling" bytecodes. > > > > This array is for the whole stack. As I imagine it, it will be used > during an execution. So for the REPL, it can be a new array per eval, or > just globally. > > > > The trick is, each name will be added to the array once, so the registry > will slow down the normal dict operations but each dict change only needs > (at most) one notification per stack. > > > > This can be made to (only) slow down the dicts which are used as scopes, > by dynamically swapping out its function pointers upon (temporary?) > conversion. > > > > > (You sent this as private?) > > > > On Dec 19, 2015 7:09 PM, "Victor Stinner" > wrote: > >> > >> > >> Le 19 d?c. 2015 21:07, "Franklin? Lee" > a ?crit : > >> > Tim Peters points out how it might be more expensive than a lookup. > >> > >> Your whole idea rely on the assumption than a dict lookup (two lookups > for builtins) is slow. Remember that a dict lookup has a complexity of > O(1)! That's why I suggested you to start to benchmark, especially know the > time in nanoseconds of a dict lookup. > >> > >> > He ended up using versioning, because the notification was expensive. > >> > >> Yeah I also began with notification with a registry of functions before > moving to versionning: > >> http://faster-cpython.readthedocs.org/readonly.html > >> > >> Versionning is simple to implement and doesn't make dict operations > slower. 
> >> > >> Victor > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guettliml at thomas-guettler.de Tue Dec 22 13:08:56 2015 From: guettliml at thomas-guettler.de (=?UTF-8?Q?Thomas_G=c3=bcttler?=) Date: Tue, 22 Dec 2015 19:08:56 +0100 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: Message-ID: <567991B8.2050205@thomas-guettler.de> Am 04.12.2015 um 20:00 schrieb Ram Rachum: > What do you think about implementing functionality similar to the `find` utility in Linux in the Pathlib module? I wanted this today, I had a script to write to archive a bunch of files from a folder, and I decided to try writing it in Python rather than in Bash. But I needed something stronger than `Path.glob` in order to select the files. I wanted a regular expression. (In this particular case, I wanted to get a list of all the files excluding the `.git` folder and all files inside of it. Me, too. I miss a find like method. I use os.walk() since more than 10 years, but it still feels way too complicated. I asked about a library on softwarerecs some weeks ago: http://softwarerecs.stackexchange.com/questions/26296/python-library-for-traversing-directory-tree-like-unix-command-line-tool-find -- http://www.thomas-guettler.de/ From guido at python.org Tue Dec 22 15:14:47 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Dec 2015 12:14:47 -0800 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: <567991B8.2050205@thomas-guettler.de> References: <567991B8.2050205@thomas-guettler.de> Message-ID: The UNIX find tool has many, many options. For the general case it's probably easier to use os.walk(). But there are probably some common uses that deserve better direct support in e.g. the glob module. Would just a way to recursively search for matches using e.g. "**.txt" be sufficient? If not, can you specify what else you'd like? (Just " find-like" is too vague.) --Guido (mobile) On Dec 22, 2015 11:14 AM, "Thomas G?ttler" wrote: > Am 04.12.2015 um 20:00 schrieb Ram Rachum: > > What do you think about implementing functionality similar to the `find` > utility in Linux in the Pathlib module? I wanted this today, I had a script > to write to archive a bunch of files from a folder, and I decided to try > writing it in Python rather than in Bash. But I needed something stronger > than `Path.glob` in order to select the files. I wanted a regular > expression. (In this particular case, I wanted to get a list of all the > files excluding the `.git` folder and all files inside of it. > > > Me, too. I miss a find like method. I use os.walk() since more than 10 > years, but it still feels way too complicated. > > I asked about a library on softwarerecs some weeks ago: > > > http://softwarerecs.stackexchange.com/questions/26296/python-library-for-traversing-directory-tree-like-unix-command-line-tool-find > > > > > -- > http://www.thomas-guettler.de/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Tue Dec 22 16:54:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 22 Dec 2015 21:54:55 +0000 (UTC) Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: Message-ID: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> On Tuesday, December 22, 2015 12:14 PM, Guido van Rossum wrote:
>The UNIX find tool has many, many options.

I think a Pythonicized, stripped-down version of the basic design of fts (http://man7.org/linux/man-pages/man3/fts.3.html) is as simple as you're going to get. After all, fts was designed to make it as easy as possible to implement find efficiently. In my incomplete Python wrapper around fts, the simplest use looks like:

with fts(root) as f:
    for path in f:
        do_stuff(path)

No two-level iteration, no need to join the root to the paths, no handling dirs and files separately. Of course for that basic use case, you could just write your own wrapper around os.walk:

def flatwalk(*args, **kwargs):
    return (os.path.join(root, file)
            for root, dirs, files in os.walk(*args, **kwargs)
            for file in files)

But more complex uses build on fts pretty readably:

# find "$@" -H -xdev -type f -mtime 1 -iname '*.pyc' -exec do_stuff '{}' \;
yesterday = datetime.now() - timedelta(days=1)
with fts(top, stat=True, crossdev=False) as f:
    for path in f:
        if path.is_file and path.stat.st_mtime < yesterday and path.lower().endswith('.pyc'):
            do_stuff(path)

When you actually need to go a directory at a time, like the spool directory size example in the stdlib, os.walk is arguably nicer, but fortunately os.walk already exists. The problem isn't designing a nice walk API; it's integrating it with pathlib.* It seems fundamental to the design of pathlib that Path objects never cache anything. But the whole point of using something like fts is to do as few filesystem calls as possible to get the information you need; if it throws away everything it did and forces you to retrieve the same information again (possibly even in a less efficient way), that kind of defeats the purpose. Even besides efficiency, having those properties all nicely organized and ready for you can make the code simpler. Anyway, if you don't want either the efficiency or the simplicity, and just want an iterable of filenames or Paths, you might as well just use the wrapper around the existing os.walk that I wrote above. To make it work with Path objects:

def flatpathwalk(root, *args, **kwargs):
    return map(path.Path, flatwalk(str(root), *args, **kwargs))

And then to use those Path objects:

matches = (path for path in flatpathwalk(root) if pattern.match(str(path)))

> For the general case it's probably easier to use os.walk(). But there are probably some
> common uses that deserve better direct support in e.g. the glob module. Would just a way
> to recursively search for matches using e.g. "**.txt" be sufficient? If not, can you
> specify what else you'd like? (Just "find-like" is too vague.)
> --Guido (mobile)

pathlib already has a glob method, which handles '*/*.py' and even recursive '**/*.py' (and a match method to go with it). If that's sufficient, it's already there. Adding direct support for Path objects in the glob module would just be a second way to do the exact same thing. And honestly, if open, os.walk, etc. aren't going to work with Path objects, why should glob.glob?

* Honestly, I think the problem here is that the pathlib module is just not useful.
In a new language that used path objects--or, probably, URL objects--everywhere, it would be hard to design something better than pathlib, but as it is, while it's great for making really hairy path manipulation more readable, path manipulation never _gets_ really hairy, and os.path is already very well designed, and the fact that pathlib doesn't know how to interact with anything else in the stdlib or third-party code means that the wrapper stuff that constructs a Path on one end and calls str or bytes on the other end depending on which one you originally had adds as much complexity as you saved. But that's obviously off-topic here. From mike at selik.org Tue Dec 22 17:22:09 2015 From: mike at selik.org (Michael Selik) Date: Tue, 22 Dec 2015 22:22:09 +0000 Subject: [Python-ideas] Buffering iterators? In-Reply-To: References: Message-ID: On Sat, Dec 19, 2015 at 7:02 AM Michael Mitchell wrote: > Have you considered doing this at the plain Python level? Something such > as the following would have the desired semantics from my understanding. > > def buffered_iterator(it, size): > while True: > buffer = [next(it) for _ in range(size)] > for element in buffer: > yield element > There's a recipe in the itertools module for something like this ( https://docs.python.org/3.6/library/itertools.html#itertools-recipes). Check out ``def grouper``. A combination of starmap, repeat, and islice might work fine as well. args = (iterable, buffersize) chunks = starmap(islice, repeat(args)) Either way, you could then yield from the chunks to make it appear like a regular iterator. Not being a PyPy or Pyston expert, I have no clue if this scenario exists -- that a JIT compiling interpreter would not be able to prefetch chunks of the iterator without the extra buffering layer, but would be able to prefetch after the chunking step is added. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Dec 22 19:23:16 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Dec 2015 16:23:16 -0800 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> Message-ID: (Wow, what a rambling message. I'm not sure which part you hope to see addressed.) On Tue, Dec 22, 2015 at 1:54 PM, Andrew Barnert wrote: > On Tuesday, December 22, 2015 12:14 PM, Guido van Rossum > wrote: > > >The UNIX find tool has many, many options. > > > I think a Pythonicized, stripped-down version of the basic design of fts ( > http://man7.org/linux/man-pages/man3/fts.3.html) is as simple as you're > going to get. After all, fts was designed to make it as easy as possible to > implement find efficiently. The docs make no attempt at showing the common patterns. The API described looks horribly complex (I guess that's what you get when all that matters is efficient implementation). > In my incomplete Python wrapper around fts, the simplest use looks like: > > with fts(root) as f: > for path in f: > do_stuff(path) > > No two-level iteration, no need to join the root to the paths, no handling > dirs and files separately. 
> The two-level iteration forced upon you by os.walk() is indeed often unnecessary -- but handling dirs and files separately usually makes sense, and remarkably often there *is* something where the two-level iteration helps (otherwise I'm sure you'd see lots of code that's trying to recover the directory by parsing the path and remembering the previous path and comparing the two). > > > Of course for that basic use case, you could just write your own wrapper > around os.walk: > > def flatwalk(*args, **kwargs): > return (os.path.join(root, file) > for file in files for root, dirs, files in os.walk(*args, > **kwargs)) > > But more complex uses build on fts pretty readably: > > # find "$@" -H -xdev -type f -mtime 1 -iname '*.pyc' -exec do_stuff > '{}' \; > yesterday = datetime.now() - timedelta(days=1) > with fts(top, stat=True, crossdev=False) as f: > for path in f: > if path.is_file and path.stat.st_mtime < yesterday and > path.lower().endswith('.pyc'): > do_stuff(path) > Why does this use a with *and* a for-loop? Is there some terribly important cleanup that needs to happen when the for-loop is aborted? It also shows off the arbitrariness of the fts API -- fts() seems to have a bunch of random keyword args to control a variety of aspects of its behavior and the returned path objects look like they have a rather bizarre API: e.g. why is is_file a property on path, mtime a property on path.stat, and lower() a method on path directly? (And would path also have an endswith() method directly, in case I don't need to lowercase it?) Of course that's can all be cleaned up easily enough -- it's a simple matter of API design. > > > When you actually need to go a directory at a time, like the spool > directory size example in the stdlib, os.walk is arguably nicer, but > fortunately os.walk already exists. > I've never seen that example. But just a few days ago I wrote a little bit of code where the os.walk() API came in handy: for root, dirs, files in os.walk(arg): print("Scanning %s (%d files):" % (root, len(files))) for file in files: process(os.path.join(root, file)) (The point is not that we have access to dirs separately, but that we have the directories filtered out of the count of files.) > The problem isn't designing a nice walk API; it's integrating it with > pathlib.* It seems fundamental to the design of pathlib that Path objects > never cache anything. But the whole point of using something like fts is to > do as few filesystem calls as possible to get the information you need; if > it throws away everything it did and forces you to retrieve the same > information gain (possibly even in a less efficient way), that kind of > defeats the purpose. Even besides efficiency, having those properties all > nicely organized and ready for you can make the code simpler. > Would it make sense to engage in a little duck typing and have an API that mimicked the API of Path objects but caches the stat() information? This could be built on top of scandir(), which provides some of the information without needing extra syscalls (depending on the platform). But even where a syscall() is still needed, this hypothetical Path-like object could cache the stat() result. If this type of result was only returned by a new hypothetical integration of os.walk() and pathlib, the caching would not be objectionable (it would simply be a limitation of the pathwalk API, rather than of the Path object). 
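A bare-bones sketch of what such a duck-typed object might look like (untested, and "CachedPath" plus the particular attribute set are only placeholders for whatever we would actually want): it snapshots what scandir() already found out and delegates everything else to a real Path:

    import pathlib

    class CachedPath:
        """Quacks like a read-only Path, but snapshots DirEntry info."""
        def __init__(self, entry):                # entry: an os.DirEntry from os.scandir()
            self._path = pathlib.Path(entry.path)
            self._is_dir = entry.is_dir()         # usually free -- scandir already knows
            self._is_file = entry.is_file()
            self._stat = entry.stat()             # at most one extra syscall, then cached

        def is_dir(self):
            return self._is_dir

        def is_file(self):
            return self._is_file

        def stat(self):
            return self._stat                     # never refreshed -- that's the point

        def __str__(self):
            return str(self._path)

        def __getattr__(self, name):
            return getattr(self._path, name)      # everything else: behave like a plain Path
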
> Anyway, if you don't want either the efficiency or the simplicity, and > just want an iterable of filenames or Paths, you might as well just use the > wrapper around the existing os.walk that I wrote above. To make it works > with Path objects: > > > def flatpathwalk(root, *args, **kwargs): > > return map(path.Path, flatwalk(str(root), *args, **kwargs)) > > And then to use those Path objects: > > matches = (path for path in flatpathwalk(root) if > pattern.match(str(path))) > > > For the general case it's probably easier to use os.walk(). But there > are probably some > > common uses that deserve better direct support in e.g. the glob module. > Would just a way > > to recursively search for matches using e.g. "**.txt" be sufficient? If > not, can you > > specify what else you'd like? (Just " find-like" is too vague.)>--Guido > (mobile) > > pathlib already has a glob method, which handles '*/*.py' and even > recursive '**/*.py' (and a match method to go with it). If that's > sufficient, it's already there. Adding direct support for Path objects in > the glob module would just be a second way to do the exact same thing. And > honestly, if open, os.walk, etc. aren't going to work with Path objects, > why should glob.glob? > Oh, I'd forgotten about pathlib.Path.rglob(). Maybe the OP also didn't know about it? He claimed he just wanted to use regular expressions so he could exclude .git directories. To tell the truth, I don't have much sympathy for that: regular expressions are just too full of traps to make a good API for file matching, and it wouldn't even strictly be sufficient to filter the entire directory tree under .git unless you added matching on the entire path -- but then you'd still pay for the cost of traversing the .git tree even if your regex were to exclude it entirely, because the library wouldn't be able to introspect the regex to determine that for sure. He also insisted on staying withing the Path framework, which is an indication that maybe what we're really looking for here is the hybrid of walk/scandir/Path that I was trying to allude to above. > * Honestly, I think the problem here is that the pathlib module is just > not useful. In a new language that used path objects--or, probably, URL > objects--everywhere, it would be hard to design something better than > pathlib, but as it is, while it's great for making really hairy path > manipulation more readable, path manipulation never _gets_ really hairy, > and os.path is already very well designed, and the fact that pathlib > doesn't know how to interact with anything else in the stdlib or > third-party code means that the wrapper stuff that constructs a Path on one > end and calls str or bytes on the other end depending on which one you > originally had adds as much complexity as you saved. But that's obviously > off-topic here. > Seems the OP disagrees with you here -- he really wants to use pathlib (as was clear from his response to a suggestion to use fnmatch). Truly pushing for adoption of a new abstraction like this takes many years -- pathlib was new (and provisional) in 3.4 so it really hasn't been long enough to give up on it. The OP hasn't! So, perhaps the pathlib.Path class needs to have some way to take in a DirEntry produced by os.scandir() and a flag to allow it to cache stat() results? Then we could easily write a pathlib.walk() function that's like os.walk() but returning caching Path objects. 
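For concreteness, a first cut at that walk could be as small as the following -- again just an untested sketch with no error handling, "pathwalk" is only a placeholder name, and the plain Path built from each entry is where a stat-caching Path object would slot in:

    import os
    import pathlib

    def pathwalk(top):
        """Like os.walk(), but yields Path objects built from os.scandir() entries."""
        top = pathlib.Path(top)
        dirs, files = [], []
        for entry in os.scandir(str(top)):
            p = top / entry.name        # a stat-caching Path built from `entry` would go here
            (dirs if entry.is_dir() else files).append(p)
        yield top, dirs, files          # callers can prune `dirs` in place, as with os.walk()
        for d in dirs:
            yield from pathwalk(d)
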
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at lucidity.plus.com Tue Dec 22 19:53:10 2015 From: python at lucidity.plus.com (Erik) Date: Wed, 23 Dec 2015 00:53:10 +0000 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> Message-ID: <5679F076.90003@lucidity.plus.com> On 23/12/15 00:23, Guido van Rossum wrote: > So, perhaps the pathlib.Path class needs to have some way to take in a > DirEntry produced by os.scandir() and a flag to allow it to cache stat() > results? Then we could easily write a pathlib.walk() function that's > like os.walk() but returning caching Path objects. Yes please. I raised this recently in a thread that died (but with no negative responses - see below). I started looking at the various modules to try to bring the whole thing together into a reasonable proposal, but it was just a can of worms (glob, fnmatch, pathlib, os.scandir, os.walk, os.fwalk, fts ...). I'm afraid I don't have the free cycles to try to tackle that, so I ducked out. It would be great if all of that could be somehow brought together into a cohesive filesystem module. On 27/11/15 13:49, Eric Fahlgren wrote: >> -----Original Message----- From: Erik [snip] >> So, I'd like to suggest an os.walk()-like API that returns the >> os.scandir() DirEntry structures rather than names (*). I have my >> own local version that's just a copy of os.walk() that appends >> "entry" rather than "entry.name" to the returned lists, but that's >> a nasty way of achieving this. >> >> How to do it - >> >> os.walk() "direntries=True" keyword? os.walkentries() function? >> Something else better than those? > > "walk" + "scandir" = "walkdir"??? > > I'm definitely +1 on this, as it is fresh on my mind, too. I just > converted our build tools over to use a homebrew walk as you did, and > now use DirEntry instead of path names almost exclusively. > > EricF E. From abarnert at yahoo.com Tue Dec 22 22:05:57 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 22 Dec 2015 19:05:57 -0800 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> Message-ID: <9F8D042E-D4EA-439B-ADEF-1593851E714A@yahoo.com> On Dec 22, 2015, at 16:23, Guido van Rossum wrote: > > (Wow, what a rambling message. I'm not sure which part you hope to see addressed.) I don't know that anything actually need to be addressed here at all. Struggling to see the real problem that needs to be solved means a bit of guesswork at what's relevant to the solution... >> On Tue, Dec 22, 2015 at 1:54 PM, Andrew Barnert wrote: >> On Tuesday, December 22, 2015 12:14 PM, Guido van Rossum wrote: >> >> >The UNIX find tool has many, many options. >> >> >> I think a Pythonicized, stripped-down version of the basic design of fts (http://man7.org/linux/man-pages/man3/fts.3.html) is as simple as you're going to get. After all, fts was designed to make it as easy as possible to implement find efficiently. > > The docs make no attempt at showing the common patterns. The API described looks horribly complex (I guess that's what you get when all that matters is efficient implementation). Yes, that's why I gave a few examples, using my stripped-down and Pythonicized wrapper, so you don't have to work it all out from scratch by trying to read the manpage and guess how you'd use it in C. 
But the point is, that's what something as flexible as find looks like as a function. > The two-level iteration forced upon you by os.walk() is indeed often unnecessary -- but handling dirs and files separately usually makes sense, and remarkably often there *is* something where the two-level iteration helps (otherwise I'm sure you'd see lots of code that's trying to recover the directory by parsing the path and remembering the previous path and comparing the two). Yes--as I said below, sometimes you really do want to go a directory at a time, and for that, it's hard to beat the API of os.walk. But when it's unnecessary, it makes the code look more complicated than necessary, so a flat iteration can be nicer. And, significantly, that, and the need to join all over the place, are the only things I can imagine that people would find worth "solving" about os.walk's API. >> But more complex uses build on fts pretty readably: >> >> # find "$@" -H -xdev -type f -mtime 1 -iname '*.pyc' -exec do_stuff '{}' \; >> yesterday = datetime.now() - timedelta(days=1) >> with fts(top, stat=True, crossdev=False) as f: >> for path in f: >> if path.is_file and path.stat.st_mtime < yesterday and path.lower().endswith('.pyc'): >> do_stuff(path) > > Why does this use a with *and* a for-loop? Is there some terribly important cleanup that needs to happen when the for-loop is aborted? Same reason this code uses with and a for loop: with open(path) as f: for line in f: do_stuff(line) Cleaning up a file handle isn't _terribly_ important, but it's not _unimportant_, and isn't it generally a good habit? > It also shows off the arbitrariness of the fts API -- fts() seems to have a bunch of random keyword args to control a variety of aspects of its behavior and the returned path objects look like they have a rather bizarre API: e.g. why is is_file a property on path, mtime a property on path.stat, and lower() a method on path directly? (And would path also have an endswith() method directly, in case I don't need to lowercase it?) Explaining the details of the API design takes this even farther off-topic, but: my initial design was based on the same Path class that the stdlib's Path is: a subclass of str that adds attributes/properties for things that are immediately available and methods for things that aren't. (The names are de-abbreviated versions of the C names.) As for stat, for one thing, people already have code (and mental models) to deal with stat (named)tuples. Plus, if you request a fast walk without stat information (which often goes considerably faster than scandir--I've got a a Python tool that actually _beats_ the find invocation it replaced), or the stat on a file fails, I think it's clearer to have "stat" be None than to have 11-18 arbitrary attributes be None while the rest are still there. At any rate, I was planning to take another pass at the design after finishing the Windows and generic implementations, but the project I was working on turned out to need this only for OS X, so I never got to that point. > Of course that's can all be cleaned up easily enough -- it's a simple matter of API design. > >> When you actually need to go a directory at a time, like the spool directory size example in the stdlib, os.walk is arguably nicer, but fortunately os.walk already exists. > > I've never seen that example. 
The first example under os.walk in the library docs is identical to the wiki spool example, except the first line points at subpackages of the stdlib email package instead of the top email spool directory, and an extra little bit was added at the end: for root, dirs, files in os.walk('python/Lib/email'): print(root, "consumes", end=" ") print(sum(getsize(join(root, name)) for name in files), end=" ") print("bytes in", len(files), "non-directory files") if 'CVS' in dirs: dirs.remove('CVS') # don't visit CVS directories So, take that instead. Perfectly good example. And, while you could write that with a flat Iterator in a number of ways, none are going to be as simple as with two levels. >> The problem isn't designing a nice walk API; it's integrating it with pathlib.* It seems fundamental to the design of pathlib that Path objects never cache anything. But the whole point of using something like fts is to do as few filesystem calls as possible to get the information you need; if it throws away everything it did and forces you to retrieve the same information gain (possibly even in a less efficient way), that kind of defeats the purpose. Even besides efficiency, having those properties all nicely organized and ready for you can make the code simpler. > > Would it make sense to engage in a little duck typing and have an API that mimicked the API of Path objects but caches the stat() information? This could be built on top of scandir(), which provides some of the information without needing extra syscalls (depending on the platform). But even where a syscall() is still needed, this hypothetical Path-like object could cache the stat() result. If this type of result was only returned by a new hypothetical integration of os.walk() and pathlib, the caching would not be objectionable (it would simply be a limitation of the pathwalk API, rather than of the Path object). The question is what code that uses (duck-typed) Path objects expects. I'm pretty sure there was extensive discussion of why Paths should never cache during the PEP 428 discussions, and I vaguely remember both Antoine Pitrou and Nick Coghlan giving good summaries more recently, but I don't remember enough details to say whether a duck-typed Path-like object would be just as bad. But I'm guessing it could have the same problems--if some function takes a Path object, stores it for later, and expects to use it to get live info, handing it something that quacks like a Path but returns snapshot info instead would be pretty insidious. >> > ... But there are probably some >> > common uses that deserve better direct support in e.g. the glob module. Would just a way >> > to recursively search for matches using e.g. "**.txt" be sufficient? If not, can you >> > specify what else you'd like? (Just " find-like" is too vague.)>--Guido (mobile) >> >> pathlib already has a glob method, which handles '*/*.py' and even recursive '**/*.py' (and a match method to go with it). If that's sufficient, it's already there. Adding direct support for Path objects in the glob module would just be a second way to do the exact same thing. And honestly, if open, os.walk, etc. aren't going to work with Path objects, why should glob.glob? > > Oh, I'd forgotten about pathlib.Path.rglob(). Or just Path.glob with ** in the pattern. > Maybe the OP also didn't know about it? So, did Antoine Pitrou already solve this problem 3 years ago (or Jason Orendorff many years before that), possibly barring a minor docs tweak, or is there still something to consider here? 
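For reference, the existing spelling is already about as short as it gets (here "root" is just whatever directory is being searched):

    from pathlib import Path

    matches = Path(root).glob('**/*.py')    # recursive glob...
    matches = Path(root).rglob('*.py')      # ...or the same thing, slightly shorter
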
> He claimed he just wanted to use regular expressions so he could exclude .git directories. To tell the truth, I don't have much sympathy for that: regular expressions are just too full of traps to make a good API for file matching, and it wouldn't even strictly be sufficient to filter the entire directory tree under .git unless you added matching on the entire path -- but then you'd still pay for the cost of traversing the .git tree even if your regex were to exclude it entirely, because the library wouldn't be able to introspect the regex to determine that for sure. I agree with everything here. I believe Path.glob can do everything he needs, and what he asked for instead couldn't do any more. It's dead-easy to imperatively apply a regex to decide whether to prune each dir in walk (or fts). Or to do the same to the joined path or the abspath. Or to use fnmatch instead of regex, or an arbitrary predicate function. Or to reverse the sense to mean only recurse on these instead of skip these. Imagine what a declarative API that allowed all that would look like. Even find doesn't have any of those options (at least not portably), and most people have to read guides to the manpage before they can read the manpage. At any rate, there's no reason you couldn't add some regex methods to Path and/or special Path handling code to regex to make that imperative code slightly easier, but I don't see how "pattern.match(str(path))" is any worse than "os.scandir(str(path))" or "json.load(str(path))" or any of the zillion other places where you have to convert paths to strings explicitly, or what makes regex more inherently path-related than those things. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Dec 26 04:28:49 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 26 Dec 2015 11:28:49 +0200 Subject: [Python-ideas] Unambiguous repr for recursive objects Message-ID: Currently repr() for recursive object replaces its recurred representation wish a placeholder containing "...". For list this is "[...]": >>> a = [1, 2] >>> a.append(a) >>> a [1, 2, [...]] For dict this is "{...}": >>> d = {1: 2} >>> d[3] = d >>> d {1: 2, 3: {...}} For OrderedDict (Python implementation or 3.4-) this is just "...": >>> from collections import OrderedDict >>> od = OrderedDict({1: 2}) >>> od[3] = od >>> od OrderedDict([(1, 2), (3, ...)]) The problem is that "[...]", and "{...}", and just "..." are valid Python expressions and above representations can be evaluated to different objects. I propose to use uniform and unambiguous non-evaluable representation for recursive objects. I have two ideas: 1. "<...>". Plus: this is as short as "[...]" and "{...}". Minus: we loss even a little tip about the type of recurred object. 2. Use the default implementation, object.__repr__(). E.g. "". Plus: we get even more information than before. Not just exact name of the type, but the identifier of the object. This can be useful in the case of complex structure containing a number of potentially recursive objects. Minus: it is longer. From ncoghlan at gmail.com Sat Dec 26 06:08:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Dec 2015 21:08:18 +1000 Subject: [Python-ideas] Unambiguous repr for recursive objects In-Reply-To: References: Message-ID: On 26 December 2015 at 19:28, Serhiy Storchaka wrote: > The problem is that "[...]", and "{...}", and just "..." are valid Python > expressions and above representations can be evaluated to different objects. 
I believe this is just an oversight from when "..." became usable outside subscripts in 3.0, but I agree it's a discrepancy worth addressing. > I propose to use uniform and unambiguous non-evaluable representation for > recursive objects. I have two ideas: > > 1. "<...>". > > Plus: this is as short as "[...]" and "{...}". > > Minus: we loss even a little tip about the type of recurred object. I think that would still be an improvement - the hinting only works for types with native syntax anyway, while for arbitrary containers it's already necessary to fall back to a generic notation like "<...>". > 2. Use the default implementation, object.__repr__(). E.g. " 0xb7111498>". Another minor disadvantage is that it's not as easy to write doctests or simple examples, as the repr isn't predictable (or you have to put a "..." in as a placeholder for the ID anyway). A larger disadvantage is that you can't readily spot that it's a recursive reference, since that's implicit in the ID of the given object. To make this less abstract, here's a simple example: >>> a = [1, 2] >>> b = [a] >>> a.append(b) >>> a [1, 2, [[...]]] >>> b [[1, 2, [...]]] With the first alternative, that becomes: >>> a [1, 2, [<...>]] >>> b [[1, 2, <...>]] I think that's actually clearer than the status quo (since the circular reference is more visually distinct), but it would retain the current ambiguity if the containers also reference themselves: >>> a.append(a) >>> b.append(b) >>> a [1, 2, [[...], [...]], [...]] >>> b [[1, 2, [...], [...]], [...]] That scenario is likely rare enough not to worry about - visualising such data structures sensibly is tough in general, and arguably best left to use case specific display routines, rather than trying to handle it with the default container repr. If the recursive display changed to use object.__repr__ instead, we'd get something like: >>> object.__repr__(a) '' >>> object.__repr__(b) '' >>> a [1, 2, [, ], ] >>> b [[1, 2, , ], ] So +1 from me for switching to "<...>" in 3.6+ to make the default recursive repr for containers an invalid expression again, but only +0 for using the full object.__repr__ - the extra precision in the more complex case hurts readability in the typical case, without really improving readability in the complex cases that would be the intended beneficiaries. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ben+python at benfinney.id.au Sat Dec 26 06:36:55 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 26 Dec 2015 22:36:55 +1100 Subject: [Python-ideas] Unambiguous repr for recursive objects References: Message-ID: <85io3lxwyw.fsf@benfinney.id.au> Nick Coghlan writes: > To make this less abstract, here's a simple example: > > >>> a = [1, 2] > >>> b = [a] > >>> a.append(b) > >>> a > [1, 2, [[...]]] > >>> b > [[1, 2, [...]]] > > With the first alternative, that becomes: > > >>> a > [1, 2, [<...>]] > >>> b > [[1, 2, <...>]] That clarifies it quite well. The fact ?<...>? is not valid syntax is an improvement: it helps to signal this is a display for something that can't be simply serialised. The ?? enclosing characters also have a nice symmetry with the default representation of so many types. +1 to change the representation of ?recursive references? to ?<...>?. -- \ ?The surest way to corrupt a youth is to instruct him to hold | `\ in higher esteem those who think alike than those who think | _o__) differently.? 
?Friedrich Nietzsche, _The Dawn_, 1881 | Ben Finney From storchaka at gmail.com Sat Dec 26 07:00:50 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 26 Dec 2015 14:00:50 +0200 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: Message-ID: On 07.11.15 00:11, Amir Rachum wrote: > I am suggesting the addition of a collections abstract base class called > "Ordered". Its meaning is that a collection's iteration order is part of > its API. The bulk of this mail describes a use case for this. The reason > I believe that such abstract base class is required is that there is no > way to test this behavior in a given class. An ordered collection has > the exact same interface as an unordered collection (e.g, dict and > OrderedDict), other than a _promise_ of the API that the order in which > this collection will be iterated has some sort of meaning (In > OrderedDict, it is the order in which keys were added to it.) > > As examples, set, frozenset, dict and defaultdict should *not* be > considered as ordered. list, OrderedDict, deque and tuple should be > considered ordered. Actually we already have such abstract class. It's typing.Reversible. Iterating non-ordered collection doesn't make sense. >>> issubclass(list, typing.Reversible) True >>> issubclass(collections.deque, typing.Reversible) True >>> issubclass(collections.OrderedDict, typing.Reversible) True >>> issubclass(type(collections.OrderedDict().items()), typing.Reversible) True >>> issubclass(dict, typing.Reversible) False >>> issubclass(set, typing.Reversible) False >>> issubclass(frozenset, typing.Reversible) False >>> issubclass(collections.defaultdict, typing.Reversible) False >>> issubclass(type({}.items()), typing.Reversible) False Unfortunately the test returns False for tuple, str, bytes, bytearray, and array: >>> issubclass(tuple, typing.Reversible) False >>> issubclass(str, typing.Reversible) False >>> issubclass(bytes, typing.Reversible) False >>> issubclass(bytearray, typing.Reversible) False >>> issubclass(array.array, typing.Reversible) False This looks as a bug in typing.Reversible. From wes.turner at gmail.com Sat Dec 26 09:53:57 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 26 Dec 2015 08:53:57 -0600 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <22077.34177.66228.508517@turnbull.sk.tsukuba.ac.jp> <4D51ABF1-2961-44E2-B7D3-F77FAFD05EAC@yahoo.com> <22078.3491.476646.759893@turnbull.sk.tsukuba.ac.jp> <67E66963-8086-4809-8EA1-23E69B264E75@yahoo.com> <22079.16733.991880.97360@turnbull.sk.tsukuba.ac.jp> <9F6BA67D-29D3-4CB8-9542-E5C9DA36740F@yahoo.com> Message-ID: "[Python-ideas] OrderedCounter and OrderedDefaultDict" https://mail.python.org/pipermail/python-ideas/2015-November/037163.html - +2 for collections.ABC.Ordered - exception on initialization w/ an unordered collection /// __init__force_from_unordered On Nov 9, 2015 7:21 PM, "Michael Selik" wrote: > I found a use case, out in the wild! > > I've been searching through GitHub for "ordered" and similar phrases. > Much of the usage is OrderedDict and OrderedSet, which really don't > benefit from an essentially redundant inheritance from a hypothetical > collections.Ordered. There are some tools that enable UI reordering, > and a bunch of tree- and graph-traversal ordering functions. Again, > not much benefit from a collections.Ordered. 
> > However, Django has a class factory that creates an attribute `can_order`: > > def formset_factory(form, formset=BaseFormSet, extra=1, can_order=False, > can_delete=False, max_num=None, validate_max=False, > min_num=None, validate_min=False): > """Return a FormSet for the given form class.""" > if min_num is None: > min_num = DEFAULT_MIN_NUM > if max_num is None: > max_num = DEFAULT_MAX_NUM > # hard limit on forms instantiated, to prevent memory-exhaustion > attacks > # limit is simply max_num + DEFAULT_MAX_NUM (which is 2*DEFAULT_MAX_NUM > # if max_num is None in the first place) > absolute_max = max_num + DEFAULT_MAX_NUM > attrs = {'form': form, 'extra': extra, > 'can_order': can_order, 'can_delete': can_delete, > 'min_num': min_num, 'max_num': max_num, > 'absolute_max': absolute_max, 'validate_min': validate_min, > 'validate_max': validate_max} > return type(form.__name__ + str('FormSet'), (formset,), attrs) > > > > This attribute gets set in several places, but checked only twice in > the Django codebase. Unfortunately, I think switching to the proposed > inheritance mechanism would make the code worse, not better: > `self.can_order` would become `isinstance(self, collections.Ordered)`. > The readability of the Django internal code would not be much > different, but users would lose consistency of introspectability as > the other features like `can_delete` are simple class attributes. > > > > On Mon, Nov 9, 2015 at 11:29 AM, Guido van Rossum > wrote: > > Well, that would still defeat the purpose, wouldn't it? The items are no > > more ordered than the headers dict itself. Also, items() doesn't return a > > sequence -- it's an ItemsView (which inherits from Set) and presumably > it's > > not Ordered. > > > > I guess my question is not so much how to prevent getting an exception -- > > I'm trying to tease out what the right order for the headers would be. Or > > perhaps I'm just trying to understand what the code is doing (the snippet > > shown mostly looks like bad code to me). > > > > On Mon, Nov 9, 2015 at 12:03 AM, Ram Rachum wrote: > >> > >> I'm not Andrew, but I'm guessing simply writing > >> `OrderedDict(headers.items())`. > >> > >> On Mon, Nov 9, 2015 at 6:28 AM, Guido van Rossum > wrote: > >>> > >>> So if OrderedDict had always rejected construction from a dict, how > would > >>> you have written this? > >>> > >>> > >>> On Sunday, November 8, 2015, Andrew Barnert via Python-ideas > >>> wrote: > >>>> > >>>> On Nov 8, 2015, at 14:10, Serhiy Storchaka > wrote: > >>>> > > >>>> >> On 08.11.15 23:12, Sjoerd Job Postmus wrote: > >>>> >> On 8 Nov 2015, at 20:06, Amir Rachum >>>> >> > wrote: > >>>> >>> As part of BasicStruct I intend to allow the use of mapping types > as > >>>> >>> __slots__, with the semantics of default values.. > >>>> >>> So it'll look something like this: > >>>> >>> > >>>> >>> class Point(BasicStruct): > >>>> >>> __slots__ = {'x': 5, 'y': 7} > >>>> >> > >>>> >> So instead they'll write > >>>> >> __slots__ = OrderedDict({'x': 5, 'y': 7}) > >>>> >> Causing the same issues? > >>>> > > >>>> > Perhaps OrderedDict should reject unordered sources. Hey, here is > yet > >>>> > one use case! > >>>> > >>>> I've maintained code that does this: > >>>> > >>>> self.headers = OrderedDict(headers) > >>>> self.origheaders = len(headers) > >>>> > >>>> ? 
so it can later do this: > >>>> > >>>> altheaders = list(self.headers.items())[self.origheaders:] > >>>> > >>>> Not a great design, but one that exists in the wild, and would be > broken > >>>> by OrderedDict not allowing a dict as an argument. > >>>> > >>>> Also, this wouldn't allow creating an OrderedDict from an empty dict > >>>> (which seems far less stupid, but I didn't lead with it because I > can't > >>>> remember seeing it in real code). > >>>> > >>>> _______________________________________________ > >>>> Python-ideas mailing list > >>>> Python-ideas at python.org > >>>> https://mail.python.org/mailman/listinfo/python-ideas > >>>> Code of Conduct: http://python.org/psf/codeofconduct/ > >>> > >>> > >>> > >>> -- > >>> --Guido (mobile) > >>> > >>> _______________________________________________ > >>> Python-ideas mailing list > >>> Python-ideas at python.org > >>> https://mail.python.org/mailman/listinfo/python-ideas > >>> Code of Conduct: http://python.org/psf/codeofconduct/ > >> > >> > > > > > > > > -- > > --Guido van Rossum (python.org/~guido) > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Dec 26 10:04:43 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 26 Dec 2015 17:04:43 +0200 Subject: [Python-ideas] Unambiguous repr for recursive objects In-Reply-To: References: Message-ID: On 26.12.15 13:08, Nick Coghlan wrote: > So +1 from me for switching to "<...>" in 3.6+ to make the default > recursive repr for containers an invalid expression again, but only +0 > for using the full object.__repr__ - the extra precision in the more > complex case hurts readability in the typical case, without really > improving readability in the complex cases that would be the intended > beneficiaries. http://bugs.python.org/issue25956 From wes.turner at gmail.com Sat Dec 26 10:02:45 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 26 Dec 2015 09:02:45 -0600 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <22077.34177.66228.508517@turnbull.sk.tsukuba.ac.jp> <4D51ABF1-2961-44E2-B7D3-F77FAFD05EAC@yahoo.com> <22078.3491.476646.759893@turnbull.sk.tsukuba.ac.jp> <67E66963-8086-4809-8EA1-23E69B264E75@yahoo.com> <22079.16733.991880.97360@turnbull.sk.tsukuba.ac.jp> Message-ID: On Nov 8, 2015 1:46 PM, "Guido van Rossum" wrote: > > I'm warming up slightly to the idea of this ABC. > > I've re-read Amir's post, and if I found myself in his situation I would have used a combination of documenting that __slots__ needs to be ordered and at runtime only checking for a few common definitely-bad cases. E.g. for exactly set or dict (not subclasses, since OrderedDict inherits from dict) would cover most of the common mistakes: most people will use a literal in their __slots__ definition so we just need to watch out for the common literals. Those users who are sophisticated enough to use some advanced mapping type in their __slots__ should just be expected to deal with the consequences. 
> > But the Ordered ABC, while not essential (unless you're a perfectionist, in which case you're in the wrong language community anyways :-) still fills a niche. I take issue with this comment because it's in the middle of the text. It's more than a perfection thing: - Ordered collections may already be Sorted [1] - [ ] would collectiond.namedtuple [~struct w/ slots] also be Ordered - [1] here's a rejected PR for jupyter/nbformat https://github.com/jupyter/nbformat/pull/30 "ENH: v4/nbjson.py: json.loads(object_pairs_hook=collections.OrderedDict)" > > The Ordered ABC should have no additional methods, and no default implementations. I think it should apply to collections but not to iterators. It should apply at the level of the read-only interface. Sequence is always Ordered. Mapping and Set are not by default, but can have it added. OrderedDict is the prime (maybe only) example -- it's a MutableMapping and Ordered. We might eventually get an OrderedSet. > > A sorted set or mapping (e.g. one implemented using some kind of tree) should also be considered Ordered, even though otherwise this is a totally different topic -- while some other languages or branches of math use "order" to refer to sorting (e.g. "partial ordering"), in Python we make a distinction: on the one hand there's sorted() and list.sort(), and on the other hand there's OrderedDict. > > So, I think that the Ordered ABC proposed here is totally well-defined and mildly useful. It may be somewhat confusing (because many people when they first encounter the term "ordered" they think it's about sorting -- witness some posts in this thread). The use cases are not very important. I guess I'm -0 on adding it -- if better use cases than Amir's are developed I might change to +0. (I don't care much about Serhiy's use cases.) > > Sorry for the rambling. > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 26 11:36:31 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 09:36:31 -0700 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: Message-ID: --Guido (mobile) ---------- Forwarded message ---------- From: Date: Dec 26, 2015 9:33 AM Subject: Re: [Python-ideas] Unambiguous repr for recursive objects To: Cc: Your message has been rejected, probably because you are not subscribed to the mailing list and the list's policy is to prohibit non-members from posting to it. If you think that your messages are being rejected in error, contact the mailing list owner at python-ideas-owner at python.org. ---------- Forwarded message ---------- From: Guido van Rossum To: Serhiy Storchaka Cc: Python-Ideas Date: Sat, 26 Dec 2015 09:33:37 -0700 Subject: Re: [Python-ideas] Unambiguous repr for recursive objects I disagree. We should not take this guideline too literally. The dots are easily understood and nobody has been fooled by a list containing an ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?) --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sat Dec 26 11:43:56 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 09:43:56 -0700 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: Message-ID: Is there also a collections.Reversible? Either way, can you report that bug in the typehinting tracker on GitHub. --Guido (mobile) On Dec 26, 2015 5:01 AM, "Serhiy Storchaka" wrote: > On 07.11.15 00:11, Amir Rachum wrote: > >> I am suggesting the addition of a collections abstract base class called >> "Ordered". Its meaning is that a collection's iteration order is part of >> its API. The bulk of this mail describes a use case for this. The reason >> I believe that such abstract base class is required is that there is no >> way to test this behavior in a given class. An ordered collection has >> the exact same interface as an unordered collection (e.g, dict and >> OrderedDict), other than a _promise_ of the API that the order in which >> this collection will be iterated has some sort of meaning (In >> OrderedDict, it is the order in which keys were added to it.) >> >> As examples, set, frozenset, dict and defaultdict should *not* be >> considered as ordered. list, OrderedDict, deque and tuple should be >> considered ordered. >> > > Actually we already have such abstract class. It's typing.Reversible. > Iterating non-ordered collection doesn't make sense. > > >>> issubclass(list, typing.Reversible) > True > >>> issubclass(collections.deque, typing.Reversible) > True > >>> issubclass(collections.OrderedDict, typing.Reversible) > True > >>> issubclass(type(collections.OrderedDict().items()), > typing.Reversible) > True > >>> issubclass(dict, typing.Reversible) > False > >>> issubclass(set, typing.Reversible) > False > >>> issubclass(frozenset, typing.Reversible) > False > >>> issubclass(collections.defaultdict, typing.Reversible) > False > >>> issubclass(type({}.items()), typing.Reversible) > False > > Unfortunately the test returns False for tuple, str, bytes, bytearray, and > array: > > >>> issubclass(tuple, typing.Reversible) > False > >>> issubclass(str, typing.Reversible) > False > >>> issubclass(bytes, typing.Reversible) > False > >>> issubclass(bytearray, typing.Reversible) > False > >>> issubclass(array.array, typing.Reversible) > False > > This looks as a bug in typing.Reversible. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sat Dec 26 11:48:25 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 26 Dec 2015 10:48:25 -0600 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: Message-ID: So, Would I be correct in that, because of this regression in __repr__ behavior (elipsis instead of total information) from 2.x to 3.x, any tests that string compare __repr__ are now off? On Dec 26, 2015 11:37 AM, "Guido van Rossum" wrote: > --Guido (mobile) > ---------- Forwarded message ---------- > From: > Date: Dec 26, 2015 9:33 AM > Subject: Re: [Python-ideas] Unambiguous repr for recursive objects > To: > Cc: > > Your message has been rejected, probably because you are not > subscribed to the mailing list and the list's policy is to prohibit > non-members from posting to it. 
If you think that your messages are > being rejected in error, contact the mailing list owner at > python-ideas-owner at python.org. > > > > ---------- Forwarded message ---------- > From: Guido van Rossum > To: Serhiy Storchaka > Cc: Python-Ideas > Date: Sat, 26 Dec 2015 09:33:37 -0700 > Subject: Re: [Python-ideas] Unambiguous repr for recursive objects > > I disagree. We should not take this guideline too literally. The dots are > easily understood and nobody has been fooled by a list containing an > ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?) > > --Guido (mobile) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Dec 26 11:51:36 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 26 Dec 2015 18:51:36 +0200 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: Message-ID: On 26.12.15 18:43, Guido van Rossum wrote: > Is there also a collections.Reversible? Either way, can you report that > bug in the typehinting tracker on GitHub. Ah, I didn't know there is the typehinting tracker. I had reported this on CPython tracker in the comment to issue #25864 [1] (the issue itself is related to Reversible too). What is the address of the typehinting tracker? [1] http://bugs.python.org/issue25864#msg256910 From guido at python.org Sat Dec 26 11:55:08 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 09:55:08 -0700 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: Message-ID: https://github.com/ambv/typehinting --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Dec 26 11:59:25 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 26 Dec 2015 18:59:25 +0200 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: Message-ID: > I disagree. We should not take this guideline too literally. The dots > are easily understood and nobody has been fooled by a list containing an > ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?) Yes, the repr of an ellipsis is the string 'Ellipsis'. But the repr of recursive list still looks as Python expression, and if somebody uses repr/eval wraparound, he will silently get wrong result instead of an error. From guido at python.org Sat Dec 26 12:15:43 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 10:15:43 -0700 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: Message-ID: Well, that is not the point of the repr() guideline. It is so you can understand what value you got. There are plenty of other cases where eval() of the repr silently gives something different, e.g. when the same object occurs multiple times. Neither proposal is clearer to understand. --Guido (mobile) On Dec 26, 2015 9:59 AM, "Serhiy Storchaka" wrote: > I disagree. We should not take this guideline too literally. The dots >> are easily understood and nobody has been fooled by a list containing an >> ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?) >> > > Yes, the repr of an ellipsis is the string 'Ellipsis'. 
But the repr of > recursive list still looks as Python expression, and if somebody uses > repr/eval wraparound, he will silently get wrong result instead of an error. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Dec 26 15:09:18 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 26 Dec 2015 12:09:18 -0800 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: Message-ID: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> On Dec 26, 2015, at 04:00, Serhiy Storchaka wrote: > >> On 07.11.15 00:11, Amir Rachum wrote: >> I am suggesting the addition of a collections abstract base class called >> "Ordered". Its meaning is that a collection's iteration order is part of >> its API. The bulk of this mail describes a use case for this. The reason >> I believe that such abstract base class is required is that there is no >> way to test this behavior in a given class. An ordered collection has >> the exact same interface as an unordered collection (e.g, dict and >> OrderedDict), other than a _promise_ of the API that the order in which >> this collection will be iterated has some sort of meaning (In >> OrderedDict, it is the order in which keys were added to it.) >> >> As examples, set, frozenset, dict and defaultdict should *not* be >> considered as ordered. list, OrderedDict, deque and tuple should be >> considered ordered. > > Actually we already have such abstract class. It's typing.Reversible. But surely an infinite list, for example, is ordered but not reversible. Also, typing types aren't abstract base classes--one is for static type checking, the other for runtime tests. Of course they're closely related, but if they were the same thing, we wouldn't need a separate module for typing in the first place. Of course there's nothing stopping us from adding collections.abc.Reversible, but that still doesn't solve the problem that not all ordered things are reversible. (I still don't think Ordered is necessary--but if it is, I don't think Reversible being kind of close helps, any more than Sequence and Iterable both being kind of close helps.) > Unfortunately the test returns False for tuple, str, bytes, bytearray, and array: It's defined (and implemented) as testing for the presence of __reversed__. But the reverse function works on types that don't implement __reversed__ if they implement the old-style sequence protocol, which can't be tested structurally. Iterable is defined similarly, but it's a supertype of Sequence, and all of those builtin types get registered explicitly with Sequence (as some third-party types do), so they're all Iterable too. The obvious fix is to make Reversible a subtype of Iterable, and Sequence a subtype of Reversible instead of Iterable. That would fix tuple, str, and all the other types that are registered explicitly with Sequence or MutableSequence. This still doesn't cover OrderedDict and friends, but they could be explicitly registered with Reversible. I think for any solution to work with static typing, you'd also need to change the hierarchy in typing to parallel the new hierarchy in collections.abc, and change typing.Reversible to use collections.abc.Reversible as its "extra". 
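A rough, untested sketch of what that could look like, following the same __subclasshook__ idiom Hashable already uses, and treating an explicit __reversed__ = None as "not reversible":

    from abc import abstractmethod
    from collections.abc import Iterable

    class Reversible(Iterable):
        __slots__ = ()

        @abstractmethod
        def __reversed__(self):
            while False:
                yield None

        @classmethod
        def __subclasshook__(cls, C):
            if cls is Reversible:
                # the same MRO walk Hashable does for __hash__,
                # with None meaning "explicitly not implemented"
                for B in C.__mro__:
                    if '__reversed__' in B.__dict__:
                        return B.__dict__['__reversed__'] is not None
            return NotImplemented

    # Sequence would then derive from Reversible instead of plain Iterable
    # (covering tuple, str, bytes, etc., which register against Sequence),
    # and OrderedDict and friends could register explicitly:
    #     Reversible.register(collections.OrderedDict)
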
One last thing: issue 25864, about all mappings except dict and its subclasses accidentally (and incorrectly) implementing the old-style sequence protocol well enough that when you call reverse on them, you successfully get an unusable iterator, instead of getting a TypeError. The only obvious fix is to add a __reversed__ that raises. But, as you pointed out there, that makes the problem with typing.Reversible (and any collections.abc.Reversible) worse. Currently, by being overly strict, Reversible happens to fail on Mapping subclasses, for the same reason it fails on things that actually _are_ properly reversible. I'm not sure what the solution is there. Fixing both Reversible and Mapping will accidentally make all Mappings statically pass as Reversible, which we definitely don't want. Maybe we need a way to explicitly mark (for type checking) that a method isn't implemented, or to explicitly "unregister" from an ABC and/or typing type that takes precedence over structural checks? From abarnert at yahoo.com Sat Dec 26 16:05:11 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 26 Dec 2015 21:05:11 +0000 (UTC) Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: Message-ID: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> On Saturday, December 26, 2015 8:48 AM, Wes Turner wrote: >So, >Would I be correct in that, because of this regression in __repr__ behavior (elipsis instead of total information) from 2.x to 3.x, any tests that string compare __repr__ are now off? No. 2.x did not provide total information. It used the exact same __repr__ as 3.x. (If you can come up with a way to provide total information that's readable to both humans and the parser, I'm sure everyone would love to see it.) Any tests that string compare __repr__ to test the equality of two lists in 2.x will get the same results in 3.x. They're still probably bad tests, but no worse than before. The only difference is that the `...` is a valid literal in 3.x, so `[1, 2, [...]]` is a valid list display in 3.x, and it wasn't in 2.x. (Even there, tests that assume __repr__ equality are no more broken than before: a list containing 1, 2, and itself reprs as `[1, 2, [...]]`, while a list containing 1, 2, and a list containing `...` reprs as `[1, 2, [Ellipsis]]`, so they will not be mistakenly compared equal.) As Serhiy points out, there actually _is_ a regression here: tests that depend on the fact that a circular list will raise a SyntaxError on eval(repr(x)) do break with 3.0. I doubt there were many such tests, given that nobody's noticed the problem until half a decade later, but I suppose that is a regression. At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does. 
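Concretely, the two cases still repr differently; the problem is just that the recursive one now looks like something you could have typed:

    >>> a = [[1], [2]]
    >>> a.append(a)
    >>> a                      # really circular
    [[1], [2], [...]]
    >>> [[1], [2], [...]]      # typing that back in gives a literal Ellipsis instead
    [[1], [2], [Ellipsis]]
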
From wes.turner at gmail.com Sat Dec 26 18:00:22 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 26 Dec 2015 17:00:22 -0600 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Dec 26, 2015 4:05 PM, "Andrew Barnert" wrote: > > On Saturday, December 26, 2015 8:48 AM, Wes Turner wrote: > > > >So, > > >Would I be correct in that, because of this regression in __repr__ behavior (elipsis instead of total information) from 2.x to 3.x, any tests that string compare __repr__ are now off? > > No. 2.x did not provide total information. It used the exact same __repr__ as 3.x. (If you can come up with a way to provide total information that's readable to both humans and the parser, I'm sure everyone would love to see it.) > > Any tests that string compare __repr__ to test the equality of two lists in 2.x will get the same results in 3.x. They're still probably bad tests, but no worse than before. The only difference is that the `...` is a valid literal in 3.x, so `[1, 2, [...]]` is a valid list display in 3.x, and it wasn't in 2.x. (Even there, tests that assume __repr__ equality are no more broken than before: a list containing 1, 2, and itself reprs as `[1, 2, [...]]`, while a list containing 1, 2, and a list containing `...` reprs as `[1, 2, [Ellipsis]]`, so they will not be mistakenly compared equal.) Got it, thanks! * https://docs.python.org/3/library/constants.html#Ellipsis * http://python-reference.readthedocs.org/ewhat-does-the-python-ellipsis-object-do * http://stackoverflow.com/questions/772124/what-does-the-python-ellipsis-object-do > > As Serhiy points out, there actually _is_ a regression here: tests that depend on the fact that a circular list will raise a SyntaxError on eval(repr(x)) do break with 3.0. I doubt there were many such tests, given that nobody's noticed the problem until half a decade later, but I suppose that is a regression. > > At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 26 18:05:05 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 16:05:05 -0700 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> Message-ID: There is a precedent for declaring that a method isn't implemented: __hash__. The convention is to set it to None in the subclass that explicitly doesn't want to implement it. The __subclasshook__ in collections.Hashable checks for this. The pattern is also used for __await__. 
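For anyone who hasn't seen that convention in action, it looks like this at the REPL:

    >>> from collections.abc import Hashable
    >>> class Spam:
    ...     __hash__ = None            # explicitly opt out of hashing
    ...
    >>> isinstance(Spam(), Hashable)
    False
    >>> hash(Spam())
    Traceback (most recent call last):
      ...
    TypeError: unhashable type: 'Spam'
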
On Sat, Dec 26, 2015 at 1:09 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > On Dec 26, 2015, at 04:00, Serhiy Storchaka wrote: > > > >> On 07.11.15 00:11, Amir Rachum wrote: > >> I am suggesting the addition of a collections abstract base class called > >> "Ordered". Its meaning is that a collection's iteration order is part of > >> its API. The bulk of this mail describes a use case for this. The reason > >> I believe that such abstract base class is required is that there is no > >> way to test this behavior in a given class. An ordered collection has > >> the exact same interface as an unordered collection (e.g, dict and > >> OrderedDict), other than a _promise_ of the API that the order in which > >> this collection will be iterated has some sort of meaning (In > >> OrderedDict, it is the order in which keys were added to it.) > >> > >> As examples, set, frozenset, dict and defaultdict should *not* be > >> considered as ordered. list, OrderedDict, deque and tuple should be > >> considered ordered. > > > > Actually we already have such abstract class. It's typing.Reversible. > > But surely an infinite list, for example, is ordered but not reversible. > > Also, typing types aren't abstract base classes--one is for static type > checking, the other for runtime tests. Of course they're closely related, > but if they were the same thing, we wouldn't need a separate module for > typing in the first place. > > Of course there's nothing stopping us from adding > collections.abc.Reversible, but that still doesn't solve the problem that > not all ordered things are reversible. (I still don't think Ordered is > necessary--but if it is, I don't think Reversible being kind of close > helps, any more than Sequence and Iterable both being kind of close helps.) > > > Unfortunately the test returns False for tuple, str, bytes, bytearray, > and array: > > It's defined (and implemented) as testing for the presence of > __reversed__. But the reverse function works on types that don't implement > __reversed__ if they implement the old-style sequence protocol, which can't > be tested structurally. > > Iterable is defined similarly, but it's a supertype of Sequence, and all > of those builtin types get registered explicitly with Sequence (as some > third-party types do), so they're all Iterable too. > > The obvious fix is to make Reversible a subtype of Iterable, and Sequence > a subtype of Reversible instead of Iterable. That would fix tuple, str, and > all the other types that are registered explicitly with Sequence or > MutableSequence. > > This still doesn't cover OrderedDict and friends, but they could be > explicitly registered with Reversible. > > I think for any solution to work with static typing, you'd also need to > change the hierarchy in typing to parallel the new hierarchy in > collections.abc, and change typing.Reversible to use > collections.abc.Reversible as its "extra". > > One last thing: issue 25864, about all mappings except dict and its > subclasses accidentally (and incorrectly) implementing the old-style > sequence protocol well enough that when you call reverse on them, you > successfully get an unusable iterator, instead of getting a TypeError. The > only obvious fix is to add a __reversed__ that raises. But, as you pointed > out there, that makes the problem with typing.Reversible (and any > collections.abc.Reversible) worse. 
Currently, by being overly strict, > Reversible happens to fail on Mapping subclasses, for the same reason it > fails on things that actually _are_ properly reversible. I'm not sure what > the solution is there. Fixing both Reversible and Mapping will accidentally > make all > Mappings statically pass as Reversible, which we definitely don't want. > Maybe we need a way to explicitly mark (for type checking) that a method > isn't implemented, or to explicitly "unregister" from an ABC and/or typing > type that takes precedence over structural checks? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Sat Dec 26 18:23:50 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 27 Dec 2015 10:23:50 +1100 Subject: [Python-ideas] Adding collections.abc.Ordered References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> Message-ID: <85twn4x08p.fsf@benfinney.id.au> Guido van Rossum writes: > There is a precedent for declaring that a method isn't implemented: > __hash__. The convention is to set it to None in the subclass that > explicitly doesn't want to implement it. Isn't ‘raise NotImplementedError’ the more explicit convention provided by Python (as a built-in, explicitly-named exception!) for communicating this meaning? -- \ “[It's] best to confuse only one issue at a time.” —Brian W. | `\ Kernighan, Dennis M. Ritchie, _The C programming language_, 1988 | _o__) | Ben Finney From guido at python.org Sat Dec 26 18:30:23 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 16:30:23 -0700 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: <85twn4x08p.fsf@benfinney.id.au> References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <85twn4x08p.fsf@benfinney.id.au> Message-ID: No, raising NotImplementedError means that a subclass was supposed to implement the method. In this case it's different -- it should appear as if the method isn't implemented to code that checks for the method's presence. Introspecting whether the code raises NotImplementedError is unfeasible. We explicitly decided that setting the method to None indicates that it should be considered as absent by code that checks for the method's presence. On Sat, Dec 26, 2015 at 4:23 PM, Ben Finney wrote: > Guido van Rossum writes: > > > There is a precedent for declaring that a method isn't implemented: > > __hash__. The convention is to set it to None in the subclass that > > explicitly doesn't want to implement it. > > Isn't ‘raise NotImplementedError’ the more explicit convention provided > by Python (as a built-in, explicitly-named exception!) for communicating > this meaning? > > -- > \ “[It's] best to confuse only one issue at a time.” —Brian W. | > `\ Kernighan, Dennis M. Ritchie, _The C programming language_, 1988 | > _o__) | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ben+python at benfinney.id.au Sat Dec 26 18:33:56 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 27 Dec 2015 10:33:56 +1100 Subject: [Python-ideas] Adding collections.abc.Ordered References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <85twn4x08p.fsf@benfinney.id.au> Message-ID: <85poxswzrv.fsf@benfinney.id.au> Guido van Rossum writes: > No, raising NotImplementedError means that a subclass was supposed to > implement the method. In this case it's different -- it should appear > as if the method isn't implemented to code that checks for the > method's presence. Understood, thanks for explaining the difference. > Introspecting whether the code raises NotImplementedError is > unfeasible. We explicitly decided that setting the method to None > indicates that it should be considered as absent by code that checks > for the method's presence. Oh, you mean setting the attribute so it's not a method at all but a simple non-callable object? That makes sense. Why recommend ‘None’, though? We now have the ‘NotImplemented’ object; why not set the attribute of the class as ‘foo = NotImplemented’? -- \ “We are stuck with technology when what we really want is just | `\ stuff that works.” —Douglas Adams | _o__) | Ben Finney From abarnert at yahoo.com Sat Dec 26 18:34:25 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 26 Dec 2015 15:34:25 -0800 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> Message-ID: On Dec 26, 2015, at 15:05, Guido van Rossum wrote: > > There is a precedent for declaring that a method isn't implemented: __hash__. The convention is to set it to None in the subclass that explicitly doesn't want to implement it. The __subclasshook__ in collections.Hashable checks for this. The pattern is also used for __await__. Well, that makes things a lot simpler. But getting back to Serhiy's point: would a `collections.abc.Reversible` (with this fix) solve the need for Ordered, and, if so, should it be added? From storchaka at gmail.com Sat Dec 26 19:09:07 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 27 Dec 2015 02:09:07 +0200 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> Message-ID: On 27.12.15 01:05, Guido van Rossum wrote: > There is a precedent for declaring that a method isn't implemented: > __hash__. The convention is to set it to None in the subclass that > explicitly doesn't want to implement it. The __subclasshook__ in > collections.Hashable checks for this. The pattern is also used for > __await__. Yes, this was the first thing that I tried, but it doesn't work, as shown in my example in issue25864. This is yet one thing that should be fixed in Reversible. Maybe we have to use this idiom more widely, and specially handle assigning special methods to None. The error message "'sometype' can't be reverted" looks better than "'NoneType' is not callable".
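To make the failure mode concrete, here is roughly what Serhiy is describing, as it behaves on the Python 3.5 interpreters current at the time (MiniMap is invented for the example):

    from collections.abc import Mapping

    class MiniMap(Mapping):
        def __init__(self, data):
            self._data = dict(data)
        def __getitem__(self, key):
            return self._data[key]
        def __iter__(self):
            return iter(self._data)
        def __len__(self):
            return len(self._data)

    m = MiniMap({'a': 1, 'b': 2})
    it = reversed(m)   # silently accepted via the old-style sequence protocol
    next(it)           # only fails here, with KeyError: 1, far from the real mistake

    # Adding __reversed__ = None to the class makes reversed(m) fail up front, but
    # only with "TypeError: 'NoneType' object is not callable" -- hence the
    # suggestion to special-case None and emit a clearer message.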
From guido at python.org Sat Dec 26 19:04:50 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 17:04:50 -0700 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: <85poxswzrv.fsf@benfinney.id.au> References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <85twn4x08p.fsf@benfinney.id.au> <85poxswzrv.fsf@benfinney.id.au> Message-ID: On Sat, Dec 26, 2015 at 4:33 PM, Ben Finney wrote: > Guido van Rossum writes: > > > No, raising NotImplementedError means that a subclass was supposed to > > implement the method. In this case it's different -- it should appear > > as if the method isn't implemented to code that checks for the > > method's presence. > > Understood, thanks for explaining the difference. > > > Introspecting whether the code raises NotImplementedError is > > unfeasible. We explicitly decided that setting the method to None > > indicates that it should be considered as absent by code that checks > > for the method's presence. > > Oh, you mean setting the attribute so it's not a method at all but a > simple non-callable object? That makes sense. > > Why recommend ‘None’, though? We now have the ‘NotImplemented’ object; > why not set the attribute of the class as ‘foo = NotImplemented’? Too late by many language releases, and not worth fixing. Either way it's an arbitrary token that you would have to check for specially and whose meaning you'd have to look up. Also, NotImplemented has very special semantics (its main use is for *binary* operators to indicate "not overloaded on this argument, try the other") -- this has nothing to do with that. (If I had to do it over again, I'd choose more different names for the exception you raise to indicate that a method should be implemented by a subclass, and the value you return to indicate that the other argument of a binary operator should be given a chance. But that's also too late by many releases.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Sat Dec 26 19:21:00 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 27 Dec 2015 11:21:00 +1100 Subject: [Python-ideas] Adding collections.abc.Ordered References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <85twn4x08p.fsf@benfinney.id.au> <85poxswzrv.fsf@benfinney.id.au> Message-ID: <85k2o0wxlf.fsf@benfinney.id.au> Guido van Rossum writes: > On Sat, Dec 26, 2015 at 4:33 PM, Ben Finney > wrote: > > > Why recommend ‘None’, though? We now have the ‘NotImplemented’ > > object; why not set the attribute of the class as ‘foo = > > NotImplemented’? > > Too late by many language releases, and not worth fixing. Yes, to be clear I'm not suggesting a change of ‘__hash__ = None’. I am talking of new code, like the changes being discussed in this thread: since we have ‘NotImplemented’ now, we can more explicitly indicate not-implemented attributes with ‘foo = NotImplemented’. > Either way it's an arbitrary token that you would have to check for > specially and whose meaning you'd have to look up. Also, > NotImplemented has very special semantics (its main use is for > *binary* operators to indicate "not overloaded on this argument, try > the other") -- this has nothing to do with that. Okay, that's clear. Semantics can change over time, though, and I think ‘NotImplemented’ much more clearly indicates the desired semantics than ‘None’, and is not ambiguous with existing uses of ‘foo = None’ on a class.
So I advocate a class-level ‘foo = NotImplemented’ as an obvious way to indicate an expected method is not implemented on this class. Thanks for discussing and explaining. My vote counts for whatever it counts for, and I'll let these arguments stand or fall as I've presented them. -- \ “Every sentence I utter must be understood not as an | `\ affirmation, but as a question.” —Niels Bohr | _o__) | Ben Finney From abarnert at yahoo.com Sat Dec 26 19:24:03 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 26 Dec 2015 16:24:03 -0800 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> Message-ID: <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> On Dec 26, 2015, at 16:09, Serhiy Storchaka wrote: > >> On 27.12.15 01:05, Guido van Rossum wrote: >> There is a precedent for declaring that a method isn't implemented: >> __hash__. The convention is to set it to None in the subclass that >> explicitly doesn't want to implement it. The __subclasshook__ in >> collections.Hashable checks for this. The pattern is also used for >> __await__. > > Yes, this was the first thing that I tried, but it doesn't work, as shown in my example in issue25864. This is yet one thing that should be fixed in Reversible. As Guido pointed out in your typehinting #170, that isn't a bug with typing.Reversible. You can't use typing types in runtime type tests with issubclass. That isn't supposed to work, and the fact that it often kind of does work is actually a bug that he's fixing. So the fact that it doesn't work in this case is correct. That also means your attempted solution to this thread is wrong; typing.Reversible cannot be used as a substitute for collections.abc.Ordered. > Maybe we have to use this idiom more widely, and specially handle assigning special methods to None. The error message "'sometype' can't be reverted" looks better than "'NoneType' is not callable". I agree with this. A new collections.abc.Reversible (interposed between Iterable and Sequence) would be a potential substitute for Ordered, and would have this problem, which would be solvable by treating __reversed__ = None specially, just like __hash__ = None. And I'm pretty sure it would come up in practice (see issue 25864). And once we've got two or three special methods doing this instead of one, making it more general does sound like a good idea. So, if we need Reversible as a substitute for Ordered, then I think we want the general "is None" test. But I'm still not sure Reversible is a good substitute for Ordered (again, consider an infinitely long collection, or just a lazy proxy that doesn't compute values until needed and doesn't know its length in advance--they're clearly ordered, and just as clearly not reversible), and I'm not sure we actually need either Reversible or Ordered in the first place. From random832 at fastmail.com Sat Dec 26 19:24:25 2015 From: random832 at fastmail.com (Random832) Date: Sat, 26 Dec 2015 19:24:25 -0500 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: Andrew Barnert writes: > No. 2.x did not provide total information. It used the exact same > __repr__ as 3.x. (If you can come up with a way to provide total > information that's readable to both humans and the parser, I'm sure > everyone would love to see it.)
Emacs Lisp has an option to (and some versions of Javascript used to borrow the same syntax) represent circular references and duplicate references with a syntax where the first reference has #N=(whatever) and other references as #N#. So, a circular list would be #1=(#1#); a list containing a reference to itself and two references to another list would be #1=(#1# #2=(3) #2#), etc. Emacs' parser supports it, Javascript's never did even on the versions that could produce the format. When the option is turned off, it substitutes circular references, but not duplicate references, with #N where N appears the level of nesting from the top of the expression where the outermost copy of the reference appears, a syntax which is not supported by its parser. From wes.turner at gmail.com Sat Dec 26 19:34:05 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 26 Dec 2015 18:34:05 -0600 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> Message-ID: * collections.abc.Ordered * collections.abc.Reversible * collections.abc.Infinite [...] * collections.abc.Sorted ? On Dec 26, 2015 7:24 PM, "Andrew Barnert via Python-ideas" < python-ideas at python.org> wrote: > On Dec 26, 2015, at 16:09, Serhiy Storchaka wrote: > > > >> On 27.12.15 01:05, Guido van Rossum wrote: > >> There is a precedent for declaring that a method isn't implemented: > >> __hash__. The convention is to set it to None in the subclass that > >> explicitly doesn't want to implement it. The __subclasshook__ in > >> collections.Hashable checks for this. The pattern is also used for > >> __await__. > > > > Yes, this was the first thing that I tried, but it doesn't work, as > shown in my example in issue25864. This is yet one thing that should be > fixed in Reversible. > > As Guido pointed out in your typehinting #170, that isn't a bug with > typing.Reversible. You can't use typing types in runtime type tests with > issubclass. That isn't supposed to work, and the fact that it often kind of > does work is actually a bug that he's fixing. So the fact that it doesn't > work in this case is correct. > > That also means your attempted solution to this thread is wrong; > typing.Reversible cannot be used as a substitute for > collections.abc.Ordered. > > > May be we have to use this idiom more widely, and specially handle > assigning special methods to None. The error message "'sometype' can't be > reverted" looks better than "'NoneType' is not callable". > > I agree with this. A new collections.abc.Reversible (interposed between > Iterable and Sequence) would be a potential substitute for Ordered, and > would have this problem, which would be solvable by treating __reversed__ = > None specially, just like __hash__ = None. And I'm pretty sure it would > come up in practice (see issue 25864). And once we've got two or three > special methods doing this instead of one, making it more general does > sound like a good idea. So, if we need Reversible as a substitute for > Ordered, then I think we want the general "is None" test. 
> > But I'm still not sure Reversible is a good substitute for Ordered (again, > consider an infinitely long collection, or just a lazy proxy that doesn't > compute values until needed and doesn't know its length in advance--they're > clearly ordered, and just as clearly not reversible), and I'm not sure we > actually need either Reversible or Ordered in the first place. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Dec 26 19:36:30 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 27 Dec 2015 11:36:30 +1100 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> Message-ID: On Sun, Dec 27, 2015 at 11:34 AM, Wes Turner wrote: > * collections.abc.Ordered > * collections.abc.Reversible > * collections.abc.Infinite [...] > > * collections.abc.Sorted ? -1. Can you imagine trying to explain to everyone what the difference is between Ordered and Sorted? (My understanding is that Ordered has an inherent order, and Sorted will maintain an externally-defined order, but I might be wrong.) ChrisA From wes.turner at gmail.com Sat Dec 26 20:11:59 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 26 Dec 2015 19:11:59 -0600 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> Message-ID: * collections.abc.Ordered * collections.abc.Reversible * collections.abc.Infinite [...] * collections.abc.Sorted ? * collections.abc.Recursive ? Rationale: These are all attributes of collections that would allow us to reason about [complexity, types, runtime]? [While someone is at it, annotating functions and methods with complexity class URI fragments accessible at runtime could also be useful for [dynamic programming].] * Ordered is justified by this thread * Reversible is distinct from Ordered (because Infinite sequences) * Infinite is the distinction between Ordered and Reversible * collections.abc.Sorted would be a useful property to keep track of (because then you don't have to do element-wise comparisons for each collection member) ... * collections.abc.Recursive could also be a useful property to mixin [again for dynamic programming] https://en.wikipedia.org/wiki/Dynamic_programming On Dec 26, 2015 7:34 PM, "Wes Turner" wrote: > * collections.abc.Ordered > * collections.abc.Reversible > * collections.abc.Infinite [...] > > * collections.abc.Sorted ? > On Dec 26, 2015 7:24 PM, "Andrew Barnert via Python-ideas" < > python-ideas at python.org> wrote: > >> On Dec 26, 2015, at 16:09, Serhiy Storchaka wrote: >> > >> >> On 27.12.15 01:05, Guido van Rossum wrote: >> >> There is a precedent for declaring that a method isn't implemented: >> >> __hash__. The convention is to set it to None in the subclass that >> >> explicitly doesn't want to implement it. The __subclasshook__ in >> >> collections.Hashable checks for this. The pattern is also used for >> >> __await__. >> > >> > Yes, this was the first thing that I tried, but it doesn't work, as >> shown in my example in issue25864. This is yet one thing that should be >> fixed in Reversible. 
>> >> As Guido pointed out in your typehinting #170, that isn't a bug with >> typing.Reversible. You can't use typing types in runtime type tests with >> issubclass. That isn't supposed to work, and the fact that it often kind of >> does work is actually a bug that he's fixing. So the fact that it doesn't >> work in this case is correct. >> >> That also means your attempted solution to this thread is wrong; >> typing.Reversible cannot be used as a substitute for >> collections.abc.Ordered. >> >> > May be we have to use this idiom more widely, and specially handle >> assigning special methods to None. The error message "'sometype' can't be >> reverted" looks better than "'NoneType' is not callable". >> >> I agree with this. A new collections.abc.Reversible (interposed between >> Iterable and Sequence) would be a potential substitute for Ordered, and >> would have this problem, which would be solvable by treating __reversed__ = >> None specially, just like __hash__ = None. And I'm pretty sure it would >> come up in practice (see issue 25864). And once we've got two or three >> special methods doing this instead of one, making it more general does >> sound like a good idea. So, if we need Reversible as a substitute for >> Ordered, then I think we want the general "is None" test. >> >> But I'm still not sure Reversible is a good substitute for Ordered >> (again, consider an infinitely long collection, or just a lazy proxy that >> doesn't compute values until needed and doesn't know its length in >> advance--they're clearly ordered, and just as clearly not reversible), and >> I'm not sure we actually need either Reversible or Ordered in the first >> place. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sat Dec 26 20:39:03 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 26 Dec 2015 19:39:03 -0600 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> Message-ID: On Dec 26, 2015 8:11 PM, "Wes Turner" wrote: > > * collections.abc.Ordered > * collections.abc.Reversible > * collections.abc.Infinite [...] > > * collections.abc.Sorted ? > * collections.abc.Recursive ? > > Rationale: > > These are all attributes of collections that would allow us to reason about [complexity, types, runtime]? > > [While someone is at it, annotating functions and methods with complexity class URI fragments accessible at runtime could also be useful for [dynamic programming].] > > * Ordered is justified by this thread > * Reversible is distinct from Ordered (because Infinite sequences) > * Infinite is the distinction between Ordered and Reversible > > * collections.abc.Sorted would be a useful property to keep track of (because then you don't have to do element-wise comparisons for each collection member) > > ... 
> > * collections.abc.Recursive could also be a useful property to mixin [again for dynamic programming] > > https://en.wikipedia.org/wiki/Dynamic_programming https://en.wikipedia.org/wiki/Goal_programming There are more properties of sequences listed here; IDK if this is out of scope for OT: https://en.wikipedia.org/wiki/Sequence#Formal_definition_and_basic_properties Vague use case: algorithmic selection / unhalting-avoidance with combinatorial data/logic sequences. [e.g. find the fastest halting solution] Practically, Ordered is a property of various types [e.g. is this a poset or not]. There is currently no way to check for .ordered with hasattr. These properties are things we currently keep in mind (some of our 7?2 things) and haven't yet figured out how to annotate with and access at runtime. > > On Dec 26, 2015 7:34 PM, "Wes Turner" wrote: >> >> * collections.abc.Ordered >> * collections.abc.Reversible >> * collections.abc.Infinite [...] >> >> * collections.abc.Sorted ? >> >> On Dec 26, 2015 7:24 PM, "Andrew Barnert via Python-ideas" < python-ideas at python.org> wrote: >>> >>> On Dec 26, 2015, at 16:09, Serhiy Storchaka wrote: >>> > >>> >> On 27.12.15 01:05, Guido van Rossum wrote: >>> >> There is a precedent for declaring that a method isn't implemented: >>> >> __hash__. The convention is to set it to None in the subclass that >>> >> explicitly doesn't want to implement it. The __subclasshook__ in >>> >> collections.Hashable checks for this. The pattern is also used for >>> >> __await__. >>> > >>> > Yes, this was the first thing that I tried, but it doesn't work, as shown in my example in issue25864. This is yet one thing that should be fixed in Reversible. >>> >>> As Guido pointed out in your typehinting #170, that isn't a bug with typing.Reversible. You can't use typing types in runtime type tests with issubclass. That isn't supposed to work, and the fact that it often kind of does work is actually a bug that he's fixing. So the fact that it doesn't work in this case is correct. >>> >>> That also means your attempted solution to this thread is wrong; typing.Reversible cannot be used as a substitute for collections.abc.Ordered. >>> >>> > May be we have to use this idiom more widely, and specially handle assigning special methods to None. The error message "'sometype' can't be reverted" looks better than "'NoneType' is not callable". >>> >>> I agree with this. A new collections.abc.Reversible (interposed between Iterable and Sequence) would be a potential substitute for Ordered, and would have this problem, which would be solvable by treating __reversed__ = None specially, just like __hash__ = None. And I'm pretty sure it would come up in practice (see issue 25864). And once we've got two or three special methods doing this instead of one, making it more general does sound like a good idea. So, if we need Reversible as a substitute for Ordered, then I think we want the general "is None" test. >>> >>> But I'm still not sure Reversible is a good substitute for Ordered (again, consider an infinitely long collection, or just a lazy proxy that doesn't compute values until needed and doesn't know its length in advance--they're clearly ordered, and just as clearly not reversible), and I'm not sure we actually need either Reversible or Ordered in the first place. 
>>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 26 21:17:45 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 19:17:45 -0700 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> Message-ID: On Sat, Dec 26, 2015 at 5:09 PM, Serhiy Storchaka wrote: > On 27.12.15 01:05, Guido van Rossum wrote: > >> There is a precedent for declaring that a method isn't implemented: >> __hash__. The convention is to set it to None in the subclass that >> explicitly doesn't want to implement it. The __subclasshook__ in >> collections.Hashable checks for this. The pattern is also used for >> __await__. >> > > Yes, this was the first thing that I tried, but it doesn't work, as shown > in my example in issue25864. This is yet one thing that should be fixed in > Reversible. > Yeah, sorry, I didn't mean it already works. I just meant that we should adopt the same convention here. > May be we have to use this idiom more widely, and specially handle > assigning special methods to None. The error message "'sometype' can't be > reverted" looks better than "'NoneType' is not callable". > Yes, that's what I'm proposing. Just like hash() says "TypeError: unhashable type: 'list'" instead of "TypeError: 'NoneType' object is not callable" or "AttributeError: 'list' object has no attribute '__hash__'". -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 26 21:20:50 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Dec 2015 19:20:50 -0700 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: <85k2o0wxlf.fsf@benfinney.id.au> References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <85twn4x08p.fsf@benfinney.id.au> <85poxswzrv.fsf@benfinney.id.au> <85k2o0wxlf.fsf@benfinney.id.au> Message-ID: On Sat, Dec 26, 2015 at 5:21 PM, Ben Finney wrote: > Guido van Rossum writes: > > > On Sat, Dec 26, 2015 at 4:33 PM, Ben Finney > > wrote: > > > > > Why recommend ?None?, though? We now have the ?NotImplemented? > > > object; why not set the attribute of the class as ?foo = > > > NotImplemented?? > > > > Too late by many language releases, and not worth fixing. > > Yes, to be clear I'm not suggesting a change of ?__hash__ = None?. > > I am talking of new code, like the changes being discussed in this > thread: since we have ?NotImplemented? now, we can more explicitly > indicate not-implemented attributes with ?foo = NotImplemented?. > > > Either way it's an arbitrary token that you would have to check for > > specially and whose meaning you'd have to look up. Also, > > NotImplemented has very special semantics (its main use is for > > *binary* operators to indicate "not overloaded on this argument, try > > the other") -- this has nothing to do with that. > > Okay, that's clear. Semantics can change over time, though, and I think > ?NotImplemented? much more clearly indicates the desired semantics than > ?None?, and is not ambiguous with existing uses of ?foo = None? on a > class. > > So I advocate a class-level ?foo = NotImplemented? as an obvious way to > indicate an expected method is not implemented on this class. 
> > Thanks for discussing and explaining. My vote counts for whatever it > counts for, and I'll let these arguments stand or fall as I've presented > them. Thanks. I'm not convinced -- I think you're trying too hard to invent a special protocol for a pretty obscure corner case. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Dec 26 22:07:28 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 27 Dec 2015 03:07:28 +0000 (UTC) Subject: [Python-ideas] Deprecating the old-style sequence protocol References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> Message-ID: <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> This idea seems to come up regularly, so maybe it would be good to actually discuss it out (and, if necessary, explicitly reject it). Most recently, at https://github.com/ambv/typehinting/issues/170, Guido said: > FWIW, maybe we should try to deprecate supporting iteration using the old-style protocol? It's really a very old backwards compatibility measure (from when iterators were first introduced). Then eventually we could do the same for reversing using the old-style protocol. The best discussion I found was from a 2013 thread (http://article.gmane.org/gmane.comp.python.ideas/23369/), which I'll quote below. Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable). I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError`, without falling back to the old-style protocol, if the dunder method is `None` (like `hash`), change the ABC and static typer to use the same rules as `iter` and `reversed`, and add `__reversed__ = None` to `collections.abc.Mapping`. (See http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.) Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved. But obviously, deprecating the old-style sequence protocol would make the problems go away. --- Here's the argument against doing so: On 2013-09-22 23:46:37 GMT, Steven D'Aprano wrote: > On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote: >> On 9/22/2013 10:22 AM, Nick Coghlan wrote: >>> >>> The __getitem__ fallback is a backwards >>> compatibility hack, not part of the formal definition of an iterable. >>> >> When I suggested that, by suggesting that the fallback *perhaps* could >> be called 'semi-deprecated, but kept for back compatibility' in the >> glossary entry, Raymond screamed at me and accused me of trying to >> change the language. 
He considers it an intended language feature that >> one can write a sequence class and not bother with __iter__. I guess we >> do not all agree ;-). >> > Raymond did not "scream", he wrote *one* word in uppercase for emphasis. > I quote: > >> It is NOT deprecated. People use and rely on this behavior. It is >> a guaranteed behavior. Please don't use the glossary as a place to >> introduce changes to the language. > > I agree, and I disagree with Nick's characterization of the sequence > protocol as a "backwards-compatibility hack". It is an elegant protocol > for implementing iteration of sequences, an old and venerable one that > predates iterators, and just as much of Python's defined iterable > behaviour as the business with calling next with no argument until it > raises StopIteration. If it were considered *merely* for backward > compatibility with Python 1.5 code, there was plenty of opportunity to > drop it when Python 3 came out. > > The sequence protocol allows one to write a lazily generated, > potentially infinite sequence that still allows random access to items. > Here's a toy example: > > py> class Squares: > ... def __getitem__(self, index): > ... return index**2 > ... > py> for sq in Squares(): > ... if sq > 9: break > ... print(sq) > 0 > 1 > 4 > 9 > > Because it's infinite, there's no value that __len__ can return, and no > need for a __len__. Because it supports random access to items, writing > this as an iterator with __next__ is inappropriate. Writing *both* is > unnecessary, and complicates the class for no benefit. As written, > Squares is naturally thread-safe -- two threads can iterate over the > same Squares object without interfering. Also, elsewhere in the thread, someone else pointed out another example (which I'm rewriting to make it fit better with Steven's): class TenSquares: def __len__(self): return 10 def __getitem__(self, index): if 0 <= index < 10: return index**2 raise IndexError You can iterate this, convert it to a `list`, call `reversed` on it, etc., all in only 6 lines of code. --- Guido's response was: > Hm. The example given there is a toy though. Something with a __getitem__ > that maps its argument to its square might as well be a mapping. I really > think it's time to slowly let go of this (no need to rush into removing > support, but we could still frown upon its use). And it's worth noting that making these examples work without the old-style sequence protocol isn't exactly hard: add a 1-line `__iter__` method, or a 1-line replacement for the old-style `iter`, or, for the second example, just inherit the `Sequence` ABC. Also, the thread-safety issue seems bogus. Any reasonable collection is thread-safe as an iterable. Presumably the counter-argument is that, as trivial as those changes are, they're still not nearly as trivial as the original code, and in a quick&dirty script or interactive session, it may be more than you want to do (especially since it involves importing a module you didn't otherwise need). But I'll leave it to the people who are strongly against the deprecation to explain it, rather than putting words in their mouths. --- Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively: > ... 
object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). > ... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0). > If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed(). From steve at pearwood.info Sat Dec 26 23:41:21 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 27 Dec 2015 15:41:21 +1100 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> Message-ID: <20151227044121.GO23700@ando.pearwood.info> On Sat, Dec 26, 2015 at 07:11:59PM -0600, Wes Turner wrote: > * collections.abc.Ordered > * collections.abc.Reversible > * collections.abc.Infinite [...] > > * collections.abc.Sorted ? > * collections.abc.Recursive ? > > Rationale: > > These are all attributes of collections that would allow us to reason about > [complexity, types, runtime]? Rather than vague abstractions like "reason about complexity", can we have some concrete use-cases for these? "Sorted" is not strictly property of the collection alone, it is a property of the collection, the item in the collection, and a sorting function. The collection cannot be Sorted unless the items themselves are Sortable, which is clearly a property of the items, not the collection; and whether or not the collection actually is sorted or not depends on what you are sorting by. So ["apple", "bear", "cat", "do"] may or may not be sorted. If you mean "sorted in dictionary order", it is, but if you mean "sorted by the length of the word", it most certainly is not. If abc.Sorted only considers the default sort order, it will be useless for many purposes. Just this morning I was writing some code where I needed a sequence of 2-tuples sorted in reverse order (highest to lowest) by the second item and by the length of the first item, in that order. For my purposes, this list would count as Sorted: [("dog", 9.2), ("apple", 7.5), ("aardvark", 7.5), ("dog", 4.1)] How does your abc.Sorted help me? > [While someone is at it, annotating functions and methods with complexity > class URI fragments accessible at runtime could also be useful for [dynamic > programming].] I'm afraid I don't understand what you mean by "complexity class URI fragments", or why they would be useful for dynamic programming. > * Ordered is justified by this thread > * Reversible is distinct from Ordered (because Infinite sequences) > * Infinite is the distinction between Ordered and Reversible Where do you put something which has indefinite length? It's not infinite, but you don't know how long it is. For example: def gen(): while random.random() < 0.5: yield "spam" Obviously this isn't a collection, as such, its an iterator, but I could write a lazy collection which similarly has a finite but indefinite length that isn't known ahead of time and so can't be reversed. Something like a stream of data coming from an external device perhaps? 
You can't reverse it, not because it is infinite, but because you don't know how far forward to go to get the last item. -- Steve From steve at pearwood.info Sat Dec 26 23:55:12 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 27 Dec 2015 15:55:12 +1100 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <12202FAF-0B80-40A2-9131-20E0A91D7EF6@yahoo.com> Message-ID: <20151227045512.GP23700@ando.pearwood.info> On Sun, Dec 27, 2015 at 11:36:30AM +1100, Chris Angelico wrote: > On Sun, Dec 27, 2015 at 11:34 AM, Wes Turner wrote: > > * collections.abc.Ordered > > * collections.abc.Reversible > > * collections.abc.Infinite [...] > > > > * collections.abc.Sorted ? > > -1. Can you imagine trying to explain to everyone what the difference > is between Ordered and Sorted? (My understanding is that Ordered has > an inherent order, and Sorted will maintain an externally-defined > order, but I might be wrong.) The same problem comes up with OrderedDict. People often want a *sorted* dict, in the sense that it always iterates in the naive sorted order that sorted(dict.keys()) would give, but without having to sort the keys. So you will sometimes find people using "ordered" and "sorted" interchangably -- I must admit I've been guilty of that once or twice. In general, though, "ordered" means *items are always in insertion order*, while "sorted" means "the order you would get if you sorted". And neither may apply to data structures that order their items in something other than insertion order, e.g. order of frequency of use. -- Steve From ncoghlan at gmail.com Sun Dec 27 00:19:46 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Dec 2015 15:19:46 +1000 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas wrote: > At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does. Right, this is the reason I think it's reasonable to suggesting changing the recursive repr - the current form is one that *humans* that have only learned Python 3 are likely to misinterpret, since the fact that "repr(...)"produces "Ellipsis" rather than "..." is itself a quirk originating in the fact that "..." is restricted to subscripts in Python 2. I don't think it's a major problem (as recursive container representations aren't something that comes up every day), but switching to "<...>" does have the advantage of allowing for a consistent recursive reference representation across all container types, regardless of whether they have native syntax or not. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Dec 27 01:22:37 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Dec 2015 16:22:37 +1000 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 27 December 2015 at 13:07, Andrew Barnert via Python-ideas wrote: > Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable). > > I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError`, without falling back to the old-style protocol, if the dunder method is `None` (like `hash`), change the ABC and static typer to use the same rules as `iter` and `reversed`, and add `__reversed__ = None` to `collections.abc.Mapping`. (See > http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.) > > Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved. > > But obviously, deprecating the old-style sequence protocol would make the problems go away. [snip] > Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively: > >> ... object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). > >> ... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0). > >> If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed(). There's an additional option we can consider, which is to move the backwards compatibility fallback to type creation time, rather than method lookup time. 
The two rules would be: * if a type defines __getitem__ without also defining __iter__, add a default __iter__ implementation that assumes the type is a sequence * if a type defines __getitem__ and __len__ without also defining __reversed__, add a default __reversed__ implementation that assumes the type is a sequence (At the C level, even sequences need to use the mapping slots to support extended slicing, so we can't make the distinction based on which C level slots are defined) As with using "__hash__ = None" to block the default inheritance of object.__hash__, setting "__iter__ = None" or "__reversed__ = None" in a class definition would block the addition of the implied methods. However, while I think those changes would clean up some quirky edge cases without causing any harm, even doing all of that still wouldn't get us to the point of having a truly *structural* definition of the difference between a Mapping and a Sequence. For example, OrderedDict defines all of __len__, __getitem__, __iter__ and __reversed__ *without* being a sequence in the "items are looked up by their position in the sequence" sense. These days, without considering the presence or absence of any non-dunder methods, the core distinction between sequences, multi-dimensional arrays and arbitrary mappings really lies in the type signature of the key parameter to__getitem__ et al (assuming a suitably defined Index type hint): MappingKey = Any DictKey = collections.abc.Hashable SequenceKey = Union[Index, slice] ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]] Regards, Nick. [1] https://github.com/ambv/typehinting/issues/171 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Dec 27 02:08:13 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Dec 2015 17:08:13 +1000 Subject: [Python-ideas] Adding collections.abc.Ordered In-Reply-To: References: <4A0243AD-C210-4615-8840-D817BB5BD94C@yahoo.com> <85twn4x08p.fsf@benfinney.id.au> <85poxswzrv.fsf@benfinney.id.au> <85k2o0wxlf.fsf@benfinney.id.au> Message-ID: On 27 December 2015 at 12:20, Guido van Rossum wrote: > On Sat, Dec 26, 2015 at 5:21 PM, Ben Finney > wrote: >> So I advocate a class-level ?foo = NotImplemented? as an obvious way to >> indicate an expected method is not implemented on this class. >> >> Thanks for discussing and explaining. My vote counts for whatever it >> counts for, and I'll let these arguments stand or fall as I've presented >> them. > > Thanks. I'm not convinced -- I think you're trying too hard to invent a > special protocol for a pretty obscure corner case. I was trying to recall if we'd ever seriously considered NotImplemented for this use case, but as near as I can tell, the "__hash__ = None" approach was born as the obvious Python level counterpart of setting the C level tp_hash slot to NULL in Python 3.0: https://hg.python.org/cpython/rev/c6d9fa81f20f/ We then retained the "__hash__ = None" behaviour at the Python level even after switching to a custom slot entry at the C level as part of resolving some corner cases that were found when backporting the abc module from Python 3.0 to 2.6: https://bugs.python.org/issue2235#msg69324 So this is a case of C's NULL pointer concept being visible in Python's semantic model as "attribute = None". Operand coercion is actually the special case on that front, as "None" needs to be permitted as a possible result, and exceptions can't be used as an alternative signalling channel due to the runtime cost involved. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Sun Dec 27 02:30:45 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 27 Dec 2015 09:30:45 +0200 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 27.12.15 08:22, Nick Coghlan wrote: > These days, without considering the presence or absence of any > non-dunder methods, the core distinction between sequences, > multi-dimensional arrays and arbitrary mappings really lies in the > type signature of the key parameter to__getitem__ et al (assuming a > suitably defined Index type hint): > > MappingKey = Any > DictKey = collections.abc.Hashable > SequenceKey = Union[Index, slice] > ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]] ArrayKey also includes Ellipsis. From abarnert at yahoo.com Sun Dec 27 02:45:09 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 26 Dec 2015 23:45:09 -0800 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Dec 26, 2015, at 22:22, Nick Coghlan wrote: > > On 27 December 2015 at 13:07, Andrew Barnert via Python-ideas > wrote: >> Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable). ... > There's an additional option we can consider, which is to move the > backwards compatibility fallback to type creation time, rather than > method lookup time. Sure, that's possible, but why? It doesn't make it any easier to add the rule "__iter__ is None blocks fallback". It doesn't make it easier to eventually remove the old-style protocol if we decide to deprecate it (if anything, it seems to make it harder, by adding another observable difference). It might make it easier to write a perfect Iterable ABC, but making a pure-Python stdlib function simpler at the cost of major churn in the C implementation of multiple builtins and C API functions (and similar for other implementations) doesn't seem like a good tradeoff. Unless it would be a lot simpler than I think? (I confess I haven't looked too much into what type() does under the covers, so maybe I'm overestimating the risk of changing it.) > However, while I think those changes would clean up some quirky edge > cases without causing any harm, even doing all of that still wouldn't > get us to the point of having a truly *structural* definition of the > difference between a Mapping and a Sequence. Agreed--but that wasn't the goal here. 
The existing nominal distinction between the two types, with all the most useful structurally-detectable features carved out separately, is a great design; the only problem is the quirky edge cases that erode the design and the workarounds needed to hold up the design; getting rid of those is the goal. Sure, being able to structurally distinguish Mapping and Sequence would probably make that goal simpler, but it's neither necessary nor sufficient, and is probably impossible. > For example, OrderedDict > defines all of __len__, __getitem__, __iter__ and __reversed__ > *without* being a sequence in the "items are looked up by their > position in the sequence" sense. Sure, but that just means Sequence implies Reversible (and presumably is a subtype of Reversible) rather than the other way around. There's still a clear hierarchy there, despite it not being structurally detectable. > These days, without considering the presence or absence of any > non-dunder methods, the core distinction between sequences, > multi-dimensional arrays and arbitrary mappings really lies in the > type signature of the key parameter to__getitem__ et al (assuming a > suitably defined Index type hint): Even that doesn't work. For example, most of the SortedDict types out there accept slices of keys, and yet a SkipListSortedDict[int, str] is clearly still not a sequence despite the fact that its __getitem__ takes Union[int, Slice[int]] just like a list[str] does. Unless the type system can actually represent "contiguous ints from 0" as a type, it can't make the distinction structurally. But, again, that's not a problem. I don't know of any serious language that solves the problem you're after (except maybe JS, Tcl, and others that just treat all sequences as mappings and have a clumsy API that everyone gets wrong half the time). The existing Python design, cleaned up a bit, would already be better than most languages, and good enough for me. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Dec 27 02:48:05 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Dec 2015 17:48:05 +1000 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 27 December 2015 at 17:30, Serhiy Storchaka wrote: > On 27.12.15 08:22, Nick Coghlan wrote: >> >> These days, without considering the presence or absence of any >> non-dunder methods, the core distinction between sequences, >> multi-dimensional arrays and arbitrary mappings really lies in the >> type signature of the key parameter to__getitem__ et al (assuming a >> suitably defined Index type hint): >> >> MappingKey = Any >> DictKey = collections.abc.Hashable >> SequenceKey = Union[Index, slice] >> ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]] > > ArrayKey also includes Ellipsis. You're right, I was mistakenly thinking that memoryview implemented tuple indexing without ellipsis support, but it actually doesn't implement multi-dimensional indexing at all - once you cast to a multi-dimensional shape, most forms of subscript lookup are no longer permitted at all by the current implementation. So a more accurate array key description would look like: ArrayKey = Union[SequenceKey, type(Ellipsis), Tuple["ArrayKey", ...]] (I spelled out Ellipsis to minimise confusion with the tuple-as-frozen-list typing notation) Cheers, Nick. 
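Spelled out as runnable aliases, the corrected sketch might look something like this, with plain int standing in for the assumed Index hint (which does not exist yet):

    from typing import Any, Hashable, Tuple, Union

    MappingKey = Any
    DictKey = Hashable
    SequenceKey = Union[int, slice]  # int stands in for the hypothetical Index hint
    ArrayKey = Union[SequenceKey, type(Ellipsis), Tuple["ArrayKey", ...]]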
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Sun Dec 27 07:25:57 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 27 Dec 2015 13:25:57 +0100 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> Message-ID: <567FD8D5.8090205@egenix.com> On 27.12.2015 04:07, Andrew Barnert via Python-ideas wrote: > This idea seems to come up regularly, so maybe it would be good to actually discuss it out (and, if necessary, explicitly reject it). Most recently, at https://github.com/ambv/typehinting/issues/170, Guido said: > >> FWIW, maybe we should try to deprecate supporting iteration using the old-style protocol? It's really a very old backwards compatibility measure (from when iterators were first introduced). Then eventually we could do the same for reversing using the old-style protocol. > > The best discussion I found was from a 2013 thread (http://article.gmane.org/gmane.comp.python.ideas/23369/), which I'll quote below. > > Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable). I'm not sure I follow. The main purpose of ABCs was to be able to explicitly define a type as complying to the sequence, mapping, etc. protocols by registering the class with the appropriate ABCs. https://www.python.org/dev/peps/pep-3119/ The "sequence protocol" is defined by the Sequence ABC, so by running an isinstance(obj, collections.abc.Sequence) check you can verify the protocol compliance. Now, most of your email talks about iteration, so perhaps you're referring to a different protocol, that of iterating over arbitrary objects which implement .__getitem__(), but don't implement .__iter__() or .__len__(). However, the support for the iteration protocol is part of the Sequence ABC, so there's no way to separate the two. A Sequence must implement .__len__() as well as .__getitem__() and thus can always implement .__reversed__() and .__iter__(). An object which implements .__getitem__() without .__len__() is not a Python sequence (*). Overall, the discussion feels somewhat arbitrary to me and is perhaps caused more by a misinterpretation or vague documentation which would need to be clarified, than by an actually missing feature in Python, paired with an important existing practical need :-) Putting all this together, I believe you're talking about the iter() support for non-sequence, indexable objects. We don't have an ABC for this: https://docs.python.org/3.5/library/collections.abc.html#collections-abstract-base-classes and can thus not check for it. (*) The CPython interpreter actually has a different view on this. 
It only checks for a .__getitem__() method, not a .__len__() method, in PySequence_Check(). The length information is only queried where necessary and a missing implementation then results in an exception. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 27 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From guido at python.org Sun Dec 27 12:16:34 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Dec 2015 10:16:34 -0700 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Sat, Dec 26, 2015 at 10:19 PM, Nick Coghlan wrote: > On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas > wrote: > > At any rate, I think what people are actually worried about here is not > the theoretical chance that such a regression might have happened 5 years > ago, but the more practical fact that 3.x might be misleading to human > beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy > stuff, you're used to passing ellipses around, and maybe even storing them > in index arrays, but you rarely if ever see a circular list. So, when you > see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as > meaning something different from what it does. > > Right, this is the reason I think it's reasonable to suggesting > changing the recursive repr - the current form is one that *humans* > that have only learned Python 3 are likely to misinterpret, since the > fact that "repr(...)"produces "Ellipsis" rather than "..." is itself a > quirk originating in the fact that "..." is restricted to subscripts > in Python 2. > > I don't think it's a major problem (as recursive container > representations aren't something that comes up every day), but > switching to "<...>" does have the advantage of allowing for a > consistent recursive reference representation across all container > types, regardless of whether they have native syntax or not. > I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Dec 27 12:04:59 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Dec 2015 10:04:59 -0700 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: <567FD8D5.8090205@egenix.com> References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> <567FD8D5.8090205@egenix.com> Message-ID: I think there's a lot of interesting stuff in this thread. Personally I don't think we should strive to distinguish between mappings and sequences structurally. We should instead continue to encourage inheriting from (or registering with) the corresponding ABCs. The goal is to ensure that there's one best-practice way to distinguish mappings from sequences, and it's by using isinstance(x, Sequence) or isinstance(x, Mapping). If we want some way to turn something that just defines __getitem__ and __len__ into a proper sequence, it should just be made to inherit from Sequence, which supplies the default __iter__ and __reversed__. (Registration is *not* good enough here.) If we really want a way to turn something that just supports __getitem__ into an Iterable maybe we can provide an additional ABC for that purpose; let's call it a HalfSequence until we've come up with a better name. (We can't use Iterable for this because Iterable should not reference __getitem__.) I also think it's fine to introduce Reversible as another ABC and carefully fit it into the existing hierarchy. It should be a one-trick pony and be another base class for Sequence; it should not have a default implementation. (But this has been beaten to death in other threads -- it's time to just file an issue with a patch.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sun Dec 27 14:09:30 2015 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 27 Dec 2015 19:09:30 +0000 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: <5680376A.9060001@mrabarnett.plus.com> On 2015-12-27 17:16, Guido van Rossum wrote: > On Sat, Dec 26, 2015 at 10:19 PM, Nick Coghlan > wrote: > > On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas > > wrote: > > At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does. > > Right, this is the reason I think it's reasonable to suggesting > changing the recursive repr - the current form is one that *humans* > that have only learned Python 3 are likely to misinterpret, since the > fact that "repr(...)"produces "Ellipsis" rather than "..." is itself a > quirk originating in the fact that "..." is restricted to subscripts > in Python 2. 
> > I don't think it's a major problem (as recursive container > representations aren't something that comes up every day), but > switching to "<...>" does have the advantage of allowing for a > consistent recursive reference representation across all container > types, regardless of whether they have native syntax or not. > > > I really feel you all are overworrying and overthinking this. A downside > to me is that <...> isn't clear about what the type of the object is. > The use case here is not sophisticated users, it's beginners who have > accidentally managed to create a recursive list or dict. They have most > likely not even encountered Ellipsis objects yet. There's nothing > clearer than the current notation to help them see that they've done > something unusual. > We could always just use 4 dots instead. From g.brandl at gmx.net Sun Dec 27 14:41:34 2015 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 27 Dec 2015 20:41:34 +0100 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 12/27/2015 06:16 PM, Guido van Rossum wrote: > On Sat, Dec 26, 2015 at 10:19 PM, Nick Coghlan > > wrote: > > On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas > > wrote: > > At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does. > > Right, this is the reason I think it's reasonable to suggesting > changing the recursive repr - the current form is one that *humans* > that have only learned Python 3 are likely to misinterpret, since the > fact that "repr(...)"produces "Ellipsis" rather than "..." is itself a > quirk originating in the fact that "..." is restricted to subscripts > in Python 2. > > I don't think it's a major problem (as recursive container > representations aren't something that comes up every day), but > switching to "<...>" does have the advantage of allowing for a > consistent recursive reference representation across all container > types, regardless of whether they have native syntax or not. > > > I really feel you all are overworrying and overthinking this. A downside to me > is that <...> isn't clear about what the type of the object is. The use case > here is not sophisticated users, it's beginners who have accidentally managed to > create a recursive list or dict. They have most likely not even encountered > Ellipsis objects yet. There's nothing clearer than the current notation to help > them see that they've done something unusual. I'm not sure. As a newcomer, I would see the "..." ellipsis as "something has been left out" (possibly because of printout length etc., exactly as it is used in numpy, BTW), not "you made a recursive structure". Explicit (and still un-evalable) would be e.g. 
"" Georg From storchaka at gmail.com Sun Dec 27 15:20:31 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 27 Dec 2015 22:20:31 +0200 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 27.12.15 19:16, Guido van Rossum wrote: > I really feel you all are overworrying and overthinking this. A downside > to me is that <...> isn't clear about what the type of the object is. > The use case here is not sophisticated users, it's beginners who have > accidentally managed to create a recursive list or dict. They have most > likely not even encountered Ellipsis objects yet. There's nothing > clearer than the current notation to help them see that they've done > something unusual. My second alternative was to use full object.__repr__. E.g. . Or, if this is considered too long, shorter form: . Or, as Georg suggested, use the word "recursive" for clearness: . Or combine type name and the word "recursive": . From guido at python.org Sun Dec 27 15:41:15 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Dec 2015 13:41:15 -0700 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Sun, Dec 27, 2015 at 1:20 PM, Serhiy Storchaka wrote: > On 27.12.15 19:16, Guido van Rossum wrote: > >> I really feel you all are overworrying and overthinking this. A downside >> to me is that <...> isn't clear about what the type of the object is. >> The use case here is not sophisticated users, it's beginners who have >> accidentally managed to create a recursive list or dict. They have most >> likely not even encountered Ellipsis objects yet. There's nothing >> clearer than the current notation to help them see that they've done >> something unusual. >> > > My second alternative was to use full object.__repr__. E.g. at 0xb7111498>. The problem isn't that it's too long (though it is) but that it just poses the question "why is this not using the regular [etc] notation?" > Or, if this is considered too long, shorter form: . Same here. > Or, as Georg suggested, use the word "recursive" for clearness: > . Or combine type name and the word "recursive": > . > Sure, but I still am curious what problem you are really trying to solve. The problem seems to be purely in your mind. You also seem to be taken the guideline that the repr() of an object should be eval()-able way too strictly. It is just a guideline to help class authors decide what their repr() should look like if they don't have a better idea. And the guideline encourages writing repr()s that are intuitive to readers. Beyond that there's nothing of value -- it just reduces guesswork on both sides. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Dec 27 19:55:09 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 28 Dec 2015 11:55:09 +1100 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: <20151228005509.GR23700@ando.pearwood.info> On Sun, Dec 27, 2015 at 10:16:34AM -0700, Guido van Rossum wrote: > I really feel you all are overworrying and overthinking this. A downside to > me is that <...> isn't clear about what the type of the object is. 
The use > case here is not sophisticated users, it's beginners who have accidentally > managed to create a recursive list or dict. They have most likely not even > encountered Ellipsis objects yet. There's nothing clearer than the current > notation to help them see that they've done something unusual. As a data point, or perhaps an anecdote point, I've been a regular on the tutor@ and python-list@ lists for many years now, and I don't recall seeing recursive lists being an issue. I can't categorically say that it has *never* come up, but it certainly isn't common. My sense is that the not-really-an-invariant-more-of-a-guideline that eval'ing the repr of an object returns the object is not that important here. There are many things you can put in a list which will break the invariant. It is a little unfortunate that [...] is no longer a syntax error, giving us this: eval("[[...]]") == [[Ellipsis]] but I don't see that as a problem worth fixing. I think the repr of OrderedDict is fine the way it is, and I like the fact that it uses a bare ... to refer to itself rather than wrapping it in braces like regular dicts. It just looks nicer in the OrderedDict repr: OrderedDict([('key', ...)]) versus OrderedDict([('key', {...})]) I thought I would generate an extreme example, an OrderedDict with multiple references to itself in values which contain references to themselves as well: py> from collections import OrderedDict py> o = OrderedDict([(1, []), (2, {}), (3, ('a', []))]) py> o[1].append(o[1]) py> o[1].append(o) py> o[2]['x'] = o[2] py> o[2]['y'] = o py> o[3][-1].append(o[3]) py> o[3][-1].append(o) py> o[4] = o py> o OrderedDict([(1, [[...], ...]), (2, {'y': ..., 'x': {...}}), (3, ('a', [(...), ...])), (4, ...)]) As an extreme case, I would hope that I would never need to debug something this complex in real life, but I think it is useful to see all the different kinds of recursive reprs in one place. I think it is useful that they are all slightly different. If it looked like this: OrderedDict([(1, [<...>, <...>]), (2, {'y': <...>, 'x': <...>}), (3, ('a', [<...>, <...>])), (4, <...>)]) we would lose valuable hints about the types, and if they all used object __repr__ the amount of visual noise would be overwhelming: OrderedDict([(1, [, ]), (2, {'y': , 'x': }), (3, ('a', [, ])), (4, )]) Given the risk that any such change will break doctests, I don't think this is a problem worth fixing: +1 on keeping the status quo -1 on using the verbose object.__repr__ -0.5 on consistently using <...> for all types -0.5 on changing the repr of recursive OrderedDicts to be more like dict -- Steve From wes.turner at gmail.com Sun Dec 27 21:10:00 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 27 Dec 2015 20:10:00 -0600 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: <20151228005509.GR23700@ando.pearwood.info> References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> <20151228005509.GR23700@ando.pearwood.info> Message-ID: Newlines in OrderedDict.__repr__ and/or pprint(OrderedDict) would be helpful: * http://stackoverflow.com/questions/4301069/any-way-to-properly-pretty-print-ordered-dictionaries-in-python * http://bugs.python.org/issue10592 closed; superseded by: * http://bugs.python.org/issue7434 "general pprint rewrite" As a workaround, a suitable JSONEncoder and json.dumps(obj, indent=2) works alright. 
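For a plain, non-recursive mapping that workaround really is a one-liner -- a sketch; no custom encoder is needed for builtin types, and json.dumps raises ValueError on a self-referencing container, so it doesn't cover the recursive case discussed above:

    import json
    from collections import OrderedDict

    o = OrderedDict([('b', 1), ('a', {'nested': [2, 3]})])
    print(json.dumps(o, indent=2))
    # {
    #   "b": 1,
    #   "a": {
    #     "nested": [
    #       2,
    #       3
    #     ]
    #   }
    # }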
On Dec 27, 2015 7:55 PM, "Steven D'Aprano" wrote: > On Sun, Dec 27, 2015 at 10:16:34AM -0700, Guido van Rossum wrote: > > > I really feel you all are overworrying and overthinking this. A downside > to > > me is that <...> isn't clear about what the type of the object is. The > use > > case here is not sophisticated users, it's beginners who have > accidentally > > managed to create a recursive list or dict. They have most likely not > even > > encountered Ellipsis objects yet. There's nothing clearer than the > current > > notation to help them see that they've done something unusual. > > As a data point, or perhaps an anecdote point, I've been a regular on > the tutor@ and python-list@ lists for many years now, and I don't recall > seeing recursive lists being an issue. I can't categorically say that it > has *never* come up, but it certainly isn't common. > > My sense is that the not-really-an-invariant-more-of-a-guideline that > eval'ing the repr of an object returns the object is not that important > here. There are many things you can put in a list which will break the > invariant. It is a little unfortunate that [...] is no longer a syntax > error, giving us this: > > eval("[[...]]") == [[Ellipsis]] > > but I don't see that as a problem worth fixing. > > I think the repr of OrderedDict is fine the way it is, and I like > the fact that it uses a bare ... to refer to itself rather than > wrapping it in braces like regular dicts. It just looks nicer in the > OrderedDict repr: > > OrderedDict([('key', ...)]) > > versus > > OrderedDict([('key', {...})]) > > > I thought I would generate an extreme example, an OrderedDict with > multiple references to itself in values which contain references to > themselves as well: > > py> from collections import OrderedDict > py> o = OrderedDict([(1, []), (2, {}), (3, ('a', []))]) > py> o[1].append(o[1]) > py> o[1].append(o) > py> o[2]['x'] = o[2] > py> o[2]['y'] = o > py> o[3][-1].append(o[3]) > py> o[3][-1].append(o) > py> o[4] = o > py> o > OrderedDict([(1, [[...], ...]), (2, {'y': ..., 'x': {...}}), (3, ('a', > [(...), ...])), (4, ...)]) > > > As an extreme case, I would hope that I would never need to debug > something this complex in real life, but I think it is useful to see all > the different kinds of recursive reprs in one place. I think it is > useful that they are all slightly different. If it looked like this: > > OrderedDict([(1, [<...>, <...>]), (2, {'y': <...>, 'x': <...>}), > (3, ('a', [<...>, <...>])), (4, <...>)]) > > > we would lose valuable hints about the types, and if they all used > object __repr__ the amount of visual noise would be overwhelming: > > OrderedDict([(1, [, object at 0xb7bcbc5c>]), (2, {'y': 0xb7bcbc5c>, 'x': }), (3, ('a', [ object at 0xb7bb320c>, 0xb7bcbc5c>])), (4, )]) > > > Given the risk that any such change will break doctests, I don't think > this is a problem worth fixing: > > +1 on keeping the status quo > -1 on using the verbose object.__repr__ > -0.5 on consistently using <...> for all types > -0.5 on changing the repr of recursive OrderedDicts to be more like dict > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From ronaldoussoren at mac.com  Mon Dec 28 07:47:06 2015
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Mon, 28 Dec 2015 13:47:06 +0100
Subject: [Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__)
In-Reply-To: 
References: <5661E3CE.6000408@brenbarn.net> <1AF13F71-D594-48F6-9CA3-7B94D632C703@yahoo.com> <56627C5F.40805@brenbarn.net> <4DF91BF0-626A-4869-AA61-953D796599CB@yahoo.com> <2D8F2CE6-773B-4AEB-AD73-784841BE72B8@yahoo.com>
Message-ID:

> On 06 Dec 2015, at 13:58, Nick Coghlan wrote:
>
> On 6 December 2015 at 11:56, Andrew Barnert via Python-ideas
> wrote:
>> On Dec 5, 2015, at 09:30, Guido van Rossum wrote:
>>> (However, giving the metaclass more control is not unreasonable. There also seems to be some interesting behavior here related to slots.)
>>
>> I'm not sure what you're suggesting here. That implementations can let a metaclass __getattribute__ hook special method lookup, but some implementations (including CPython 3.6) won't do so?
>
> Ronald Oussoren has elaborated on that aspect of the problem in his
> __getdescriptor__ PEP: https://www.python.org/dev/peps/pep-0447/
>
> The main reason it's separate from __getattribute__ is that this is
> necessary to avoid changing the semantics of super.__getattribute__,
> but it's also the case that things would otherwise get quite confusing
> with object.__getattribute__ and super.__getattribute__ potentially
> calling type.__getattribute__, which then has the potential for
> strange consequences when you consider that "type" is itself an
> instance of "type".
>
> My recollection of the previous round of discussions on that PEP is
> that we're actually pretty happy with the design - it's now dependent
> on someone with the roundtuits to update the reference implementation
> to match the current PEP text and the head of the current development
> branch.

Mark Shannon had some concerns about how my proposal affects the object model. I didn't quite get his concerns at the time, probably because I was thinking too much about the implementation. BTW. Sorry about not following up at the time, I got sucked back into work :-(.

My plan for the week is to get a version of PyObjC with support of OSX 10.11 on PyPI, after that I hope to return to PEP 447.

Ronald

From brett at python.org  Mon Dec 28 12:42:42 2015
From: brett at python.org (Brett Cannon)
Date: Mon, 28 Dec 2015 17:42:42 +0000
Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol)
Message-ID:

Speaking of using ABCs more, where should we put ABCs which have nothing to do with collections? As of right now all ABCs seem to get shoved into collections.abc, but e.g. Awaitable and Coroutine are not types of collections. I personally want to add a context manager ABC with a default __exit__.

I opened http://bugs.python.org/issue25637 to discuss this, but I figured a wider discussion wouldn't hurt. Some suggest just putting the ABCs into the abc module. We could create an interfaces module (top-level or a submodule of ABC). The other option is to put the ABCs in subject-specific modules, so my context manager one would go into contextlib (either top-level or an abc submodule); don't know where the coroutine ones would go since it might be overloading asyncio if we put them there.
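For concreteness, a minimal sketch of the kind of context manager ABC meant here -- the name and the defaults are illustrative only, not a settled API:

    from abc import ABC

    class AbstractContextManager(ABC):
        # Default __enter__ returns self; default __exit__ does not suppress
        # exceptions, so most subclasses only need to override one of them.

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            return None

    class Logged(AbstractContextManager):
        # Example subclass: only __exit__ needs customizing.
        def __exit__(self, exc_type, exc_value, traceback):
            print("leaving", self)
            return None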
Anyway, the key point is collections.abc is starting to get non-collections stuff and if we are going to start pushing ABCs more we should decide how we want to organize them in general in the stdlib and instead of dumping them into collections.abc. On Sun, Dec 27, 2015, 09:35 Guido van Rossum wrote: > I think there's a lot of interesting stuff in this thread. Personally I > don't think we should strive to distinguish between mappings and sequences > structurally. We should instead continue to encourage inheriting from (or > registering with) the corresponding ABCs. The goal is to ensure that > there's one best-practice way to distinguish mappings from sequences, and > it's by using isinstance(x, Sequence) or isinstance(x, Mapping). > > If we want some way to turn something that just defines __getitem__ and > __len__ into a proper sequence, it should just be made to inherit from > Sequence, which supplies the default __iter__ and __reversed__. > (Registration is *not* good enough here.) If we really want a way to turn > something that just supports __getitem__ into an Iterable maybe we can > provide an additional ABC for that purpose; let's call it a HalfSequence > until we've come up with a better name. (We can't use Iterable for this > because Iterable should not reference __getitem__.) > > I also think it's fine to introduce Reversible as another ABC and > carefully fit it into the existing hierarchy. It should be a one-trick pony > and be another base class for Sequence; it should not have a default > implementation. (But this has been beaten to death in other threads -- it's > time to just file an issue with a patch.) > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Mon Dec 28 12:58:54 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Dec 2015 18:58:54 +0100 Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol) In-Reply-To: References: Message-ID: <5681785E.2070105@egenix.com> On 28.12.2015 18:42, Brett Cannon wrote: > Speaking of using ABCs more, where should we put ABCs which have nothing to > do with collections? As of right now all ABCs seem to get shoved into > collections.abc, but e.g. Awaitable and Coroutine are not types of > collections. I personally want to add a context manager ABC with a default > __exit__. > > I opened http://bugs.python.org/issue25637 to discuss this, but I figured a > wider discussion wouldn't hurt. Some suggest just putting the ABCs into the > abc module. We could create an interfaces module (top-level or a submodule > of ABC). The other option is to put the ABCs in subject-specific modules, > so my context manager one would go into contextlib (either top-level or an > abc submodule); don't know where the coroutine ones would go since it might > be overloading asyncio if we out them there. > > Anyway, the key point is collections.abc is starting to get non-collections > stuff and if we are going to start pushing ABCs more we should decide how > we want to organize them in general in the stdlib and instead of dumping > them into collections.abc. I'd put them into the abc module (perhaps turning this into a package, if things get too crowded). 
collections.abc could then do a "from abc import *" for b/w compatibility. > On Sun, Dec 27, 2015, 09:35 Guido van Rossum wrote: > >> I think there's a lot of interesting stuff in this thread. Personally I >> don't think we should strive to distinguish between mappings and sequences >> structurally. We should instead continue to encourage inheriting from (or >> registering with) the corresponding ABCs. The goal is to ensure that >> there's one best-practice way to distinguish mappings from sequences, and >> it's by using isinstance(x, Sequence) or isinstance(x, Mapping). >> >> If we want some way to turn something that just defines __getitem__ and >> __len__ into a proper sequence, it should just be made to inherit from >> Sequence, which supplies the default __iter__ and __reversed__. >> (Registration is *not* good enough here.) If we really want a way to turn >> something that just supports __getitem__ into an Iterable maybe we can >> provide an additional ABC for that purpose; let's call it a HalfSequence >> until we've come up with a better name. (We can't use Iterable for this >> because Iterable should not reference __getitem__.) >> >> I also think it's fine to introduce Reversible as another ABC and >> carefully fit it into the existing hierarchy. It should be a one-trick pony >> and be another base class for Sequence; it should not have a default >> implementation. (But this has been beaten to death in other threads -- it's >> time to just file an issue with a patch.) >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 28 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From chris.barker at noaa.gov Mon Dec 28 14:25:30 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 28 Dec 2015 11:25:30 -0800 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Tue, Dec 22, 2015 at 4:23 PM, Guido van Rossum wrote: > The two-level iteration forced upon you by os.walk() is indeed often > unnecessary -- but handling dirs and files separately usually makes sense, > indeed, but not always, so a simple API that allows you to get a flat walk would be nice.... Of course for that basic use case, you could just write your own wrapper >> around os.walk: >> > sure, but having to write "little" wrappers for common needs is unfortunate... The problem isn't designing a nice walk API; it's integrating it with >> pathlib.* > > indeed -- I'd really like to see a *walk in pathlib itself. 
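Just to fix ideas, one way such a helper might look -- a sketch only, a flat walk that yields Path objects (str(start) so a Path argument also works on 3.4/3.5):

    import os
    from pathlib import Path

    def walk_paths(start='.'):
        # Flat walk: yield a Path for every file under start.
        for dirpath, dirnames, filenames in os.walk(str(start)):
            for name in filenames:
                yield Path(dirpath) / name

    # e.g. print(sum(1 for _ in walk_paths('/tmp')))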
I've been trying to use pathlib whenever I need, well, a path, but then I find I almost immediately need to step out and use an os.path function, and have to string-fy it anyway -- makes me wonder what the point is.. And honestly, if open, os.walk, etc. aren't going to work with Path >> objects, > > but they should -- of course they should..... Truly pushing for adoption of a new abstraction like this takes many years > -- pathlib was new (and provisional) in 3.4 so it really hasn't been long > enough to give up on it. The OP hasn't! > it will take many years for sure -- but the standard library could at least adopt it as much as possible.

Path.walk would be a nice start :-)

My example: one of our sysadmins wanted a little script to go through an entire drive (Windows), and check if any paths were longer than 256 characters (Windows, remember..) I came up with this:

    def get_all_paths(start_dir='/'):
        for dirpath, dirnames, filenames in os.walk(start_dir):
            for filename in filenames:
                yield os.path.join(dirpath, filename)

    too_long = []
    for p in get_all_paths('/'):
        print("checking:", p)
        if len(p) > 255:
            too_long.append(p)
            print("Path too long!")

way too wordy!

I started with pathlib, but that just made it worse. Now that I think about it, maybe I could have simply used pathlib.Path.rglob.... However, when I try that, I get a permission error:

    /Users/chris.barker/miniconda2/envs/py3/lib/python3.5/pathlib.py in wrapped(pathobj, *args)
        369         @functools.wraps(strfunc)
        370         def wrapped(pathobj, *args):
    --> 371             return strfunc(str(pathobj), *args)
        372         return staticmethod(wrapped)
        373
    PermissionError: [Errno 13] Permission denied: '/Users/.chris.barker.xahome/caches/opendirectory'

as the error comes inside the rglob() generator, I'm not sure how to tell it to ignore and move on.... os.walk is somehow able to deal with this.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abarnert at yahoo.com  Mon Dec 28 17:43:29 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Dec 2015 14:43:29 -0800
Subject: [Python-ideas] find-like functionality in pathlib
In-Reply-To: 
References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com>
Message-ID:

On Dec 28, 2015, at 11:25, Chris Barker wrote:
>
>> On Tue, Dec 22, 2015 at 4:23 PM, Guido van Rossum wrote:
>> The two-level iteration forced upon you by os.walk() is indeed often unnecessary -- but handling dirs and files separately usually makes sense,
>
> indeed, but not always, so a simple API that allows you to get a flat walk would be nice....
>
>>> Of course for that basic use case, you could just write your own wrapper around os.walk:
>
> sure, but having to write "little" wrappers for common needs is unfortunate...

You're replying to me, not Guido, here... Anyway, if the only thing anyone will ever need is a handful of simple one-liners that even a novice could write, maybe it's reasonable to just add one to the docs to show how to do it, instead of adding them to the stdlib.

>>> The problem isn't designing a nice walk API; it's integrating it with pathlib.*
>
> indeed -- I'd really like to see a *walk in pathlib itself.
But first you have to solve the problem that paragraph was all about: a general-purpose walk API shouldn't be throwing away all that stat information it wasted time fetching, but the pathlib module is designed around Path objects that are always live, not snapshots. If Path.walk yields something that isn't a Path, what's the point? > I've been trying to use pathlib whenever I need, well, a path, but then I find I almost immediately need to step out and use an os.path function, and have to string-fy it anyway -- makes me wonder what the point is.. I have the same impression as you, but, as Guido says, let's give it time before judging... >>> And honestly, if open, os.walk, etc. aren't going to work with Path objects, > > but they should -- of course they should..... So far things have gone the opposite direction: open requires strings, but there's a Path.open method; walk requires strings, but people are proposing a Path.walk method; etc. I'm not sure how that's supposed to extend to things like json.load or NamedTemporaryFile.name. >> Truly pushing for adoption of a new abstraction like this takes many years -- pathlib was new (and provisional) in 3.4 so it really hasn't been long enough to give up on it. The OP hasn't! > > it will take many years for sure -- but the standard library cold at least adopt it as much as possible. > > Path.walk would be a nice start :-) > > My example: one of our sysadmins wanted a little script to go thorugh an entire drive (Windows), and check if any paths were longer than 256 characters (Windows, remember..) > > I came up with this: > > def get_all_paths(start_dir='/'): > for dirpath, dirnames, filenames in os.walk(start_dir): > for filename in filenames: > yield os.path.join(dirpath, filename) > > too_long = [] > for p in get_all_paths('/'): > print("checking:", p) > if len(p) > 255: > too_long.append(p) > print("Path too long!") Do you really want it to print out "Path too long!" hundreds of times? If not, this is a lot more concise, and I think readable, with comprehensions: walk = os.walk(start_dir) files = (os.path.join(root, file) for root, dirs, files in walk for file in files) too_long = (file for file in files if len(file) > 255) And now you've got a lazy Iterator over you too-long files. (If you need a list, just use a listcomp instead of a genexpr in the last step.) > way too wordy! > > I started with pathlib, but that just made it worse. If we had a Path.walk, I don't think it could be that much better than the original version, since the only thing Path can help with is making that join a bit shorter--and at the cost of having to convert to str to check len(): walk = start_path.Walk() files = (root / file for root, dirs, files in walk for file in files) too_long = (file for file in files if len(str(file)) > 255) As a side note, there's no Windows restriction to 255 _characters_, it's to 255 UTF-16 code points, just under 64K UTF-16 code points, or 255 codepage bytes, depending on which API you use. So you really want something like len(file.encode('utf-16') / 2) > 255. Also, I suspect you want either the bare filename or the abspath, not the path from the start dir (especially since a path rooted at the default '/' is two characters shorter than one rooted at 'C:\', so you're probably going to pass a bunch of files that then cause problems in your scripts). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From moloney at ohsu.edu Mon Dec 28 18:53:33 2015 From: moloney at ohsu.edu (Brendan Moloney) Date: Mon, 28 Dec 2015 23:53:33 +0000 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> , Message-ID: <5F6A858FD00E5F4A82E3206D2D854EF892D413B3@EXMB10.ohsu.edu> Not sure how useful this is, but I ended up writing my own "pythonic find" module: https://github.com/moloney/pathmap/blob/master/pathmap.py I was mostly worried about minimizing stat calls, so I used scandir rather than Pathlib. The only documentation is the doc strings, but the basic idea is you can have one "matching" rule and any number of ignore/prune rules. The rules can be callables or strings that are treated as regular expressions (I suppose it might be better if the default was to treat strings as glob expressions instead...). So for the original use case that spawned this thread, you would do something like: pm = PathMap(prune_rules=['/\.git$']) for match in pm.matches(['path/to/some/dir']): if not match.dir_entry.is_dir(): print(match.path) Or if you wanted to do something similar but only print names of python modules it would be something like: pm = PathMap('.+/(.+)\.py$', prune_rules=['/\.git$']) for match in pm.matches(['path/to/some/dir']): if not match.dir_entry.is_dir(): print(match.match_info[1]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Dec 28 19:50:11 2015 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 28 Dec 2015 18:50:11 -0600 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Dec 28, 2015 2:33 PM, "Chris Barker" wrote: > > On Tue, Dec 22, 2015 at 4:23 PM, Guido van Rossum wrote: >> >> The two-level iteration forced upon you by os.walk() is indeed often unnecessary -- but handling dirs and files separately usually makes sense, > > > indeed, but not always, so a simple API that allows you to get a flat walk would be nice.... The path.py .walk* APIs work great w/ fnmatch: https://pythonhosted.org/path.py/api.html#path.Path.walk https://pythonhosted.org/path.py/api.html#path.Path.walkdirs https://pythonhosted.org/path.py/api.html#path.Path.walkfiles -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Dec 28 20:25:39 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 28 Dec 2015 17:25:39 -0800 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Dec 28, 2015, at 16:50, Wes Turner wrote: > > > On Dec 28, 2015 2:33 PM, "Chris Barker" wrote: > > > > On Tue, Dec 22, 2015 at 4:23 PM, Guido van Rossum wrote: > >> > >> The two-level iteration forced upon you by os.walk() is indeed often unnecessary -- but handling dirs and files separately usually makes sense, > > > > > > indeed, but not always, so a simple API that allows you to get a flat walk would be nice.... > > The path.py .walk* APIs work great w/ fnmatch: > > https://pythonhosted.org/path.py/api.html#path.Path.walk > > https://pythonhosted.org/path.py/api.html#path.Path.walkdirs > > https://pythonhosted.org/path.py/api.html#path.Path.walkfiles > The path module has some major differences. First, because it doesn't use scandir or anything else to avoid multiple stat calls, the caching issue doesn't come up. 
Also, because its Path subclasses str, it doesn't have the same usability issues (you can pass a Path straight to json.loads, for example), although of course that gives it different usability issues (e.g., inherited methods like Path.count are an obvious attractive nuisance). Also, it doesn't handle case sensitivity as automagically. Also, it's definitely the kind of "kitchen sink" design that got PEP 355 rejected (which often makes sense for a third-party lib even when it doesn't for a stdlib module). So, not everything that makes sense for path will also make sense for pathlib. But it's still worth looking at. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Dec 28 21:59:16 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Dec 2015 12:59:16 +1000 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> <567FD8D5.8090205@egenix.com> Message-ID: On 28 December 2015 at 03:04, Guido van Rossum wrote: > If we really want a way to turn > something that just supports __getitem__ into an Iterable maybe we can > provide an additional ABC for that purpose; let's call it a HalfSequence > until we've come up with a better name. (We can't use Iterable for this > because Iterable should not reference __getitem__.) Perhaps collections.abc.Indexable would work? Invariant: for idx, val in enumerate(container): assert container[idx] is val That is, while enumerate() accepts any iterable, Indexable containers have the additional property that the contained values can be looked up by their enumeration index. Mappings (even ordered ones) don't qualify, since they offer a key:value lookup, but enumerating them produces an index:key relationship. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Dec 28 22:02:34 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Dec 2015 13:02:34 +1000 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: <20151228005509.GR23700@ando.pearwood.info> References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> <20151228005509.GR23700@ando.pearwood.info> Message-ID: On 28 December 2015 at 10:55, Steven D'Aprano wrote: > Given the risk that any such change will break doctests, I don't think > this is a problem worth fixing: > > +1 on keeping the status quo > -1 on using the verbose object.__repr__ > -0.5 on consistently using <...> for all types > -0.5 on changing the repr of recursive OrderedDicts to be more like dict +1 here - I've been persuaded that changing this behaviour isn't worth the disruption (to existing third party documentation, if nothing else). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Dec 28 22:58:04 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Dec 2015 13:58:04 +1000 Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol) In-Reply-To: <5681785E.2070105@egenix.com> References: <5681785E.2070105@egenix.com> Message-ID: On 29 December 2015 at 03:58, M.-A. Lemburg wrote: > On 28.12.2015 18:42, Brett Cannon wrote: >> Speaking of using ABCs more, where should we put ABCs which have nothing to >> do with collections? 
As of right now all ABCs seem to get shoved into >> collections.abc, but e.g. Awaitable and Coroutine are not types of >> collections. I personally want to add a context manager ABC with a default >> __exit__. >> >> I opened http://bugs.python.org/issue25637 to discuss this, but I figured a >> wider discussion wouldn't hurt. Some suggest just putting the ABCs into the >> abc module. We could create an interfaces module (top-level or a submodule >> of ABC). The other option is to put the ABCs in subject-specific modules, >> so my context manager one would go into contextlib (either top-level or an >> abc submodule); don't know where the coroutine ones would go since it might >> be overloading asyncio if we out them there. >> >> Anyway, the key point is collections.abc is starting to get non-collections >> stuff and if we are going to start pushing ABCs more we should decide how >> we want to organize them in general in the stdlib and instead of dumping >> them into collections.abc. > > I'd put them into the abc module (perhaps turning this into a > package, if things get too crowded). > > collections.abc could then do a "from abc import *" for b/w > compatibility. With the benefit of hindsight, I think a broad namespace separation like that might have been a good way to go from a data model discoverability perspective, as it clearly separates the abstract descriptions of data and control flow modelling concepts from the concrete implementations of those concepts. However, we should also keep in mind which standard library modules already publish ABCs, which is at least: typing io numbers collections.abc selectors email (That list was collected by grepping Python files for ABCMeta, so I may have missed some) That suggests to me that this design decision has effectively already been made, and it's to include the ABCs in the relevant domain specific modules. The inclusion of the non-collections related ABCs in collections.abc are the anomaly, which can be addressed by moving them out to a more appropriate location (adjusting the documentation accordingly), and then importing them into collections.abc for backwards compatibility (taking care not to increase the startup import footprint too much in the process). The current collections.abc interfaces which I think are most at issue here: Callable Iterable Iterator Generator Awaitable Coroutine Awaitable AsyncIterable AsyncIterator (I'm excluding Hashable from the list, as the main reason that matters is in describing whether or not something is suitable for inclusion in a dict or set) These differ from the rest of the collections.abc interfaces in that they're more closely associated with language level control flow syntax than they are with containers specifically: Callable - function calls Iterable - for loops, comprehensions Iterator - for loops, comprehensions Generator - generators, generator expressions Awaitable - await expressions Coroutine - async def AsyncIterable - async for AsyncIterator - async for Adding ContextManager and AsyncContextManager would give ABCs for the protocols related to "with" and "async with". Since these all correspond to syntactic protocols, I now think it's reasonable to include them directly in the "abc" namespace, since that still gives us a clear guideline for which ABCs go there, and which belong somewhere else: if it has syntax associated with it, or it's part of the ABC machinery itself, then it can go directly in the "abc" namespace. 
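These syntax-linked ABCs are also the ones that already behave structurally -- for example, the 3.5 Awaitable ABC keys off the same dunder the await expression uses. A small sketch (the class is invented):

    from collections.abc import Awaitable

    class Ready:
        # Awaitable purely by defining __await__; no inheritance, no register().
        def __await__(self):
            return iter(())

    print(issubclass(Ready, Awaitable))    # True -- via __subclasshook__
    print(isinstance(Ready(), Awaitable))  # True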
Using the "abc" namespace also ensures there isn't any additional import overhead, since that gets imported by all ABC using code anyway. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From leewangzhong+python at gmail.com Mon Dec 28 23:08:49 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Mon, 28 Dec 2015 23:08:49 -0500 Subject: [Python-ideas] Have max and min functions ignore None Message-ID: What do people think about having `max` and `min` ignore `None`? Examples: max(1, None) == 1 min(1, None) == 1 max([1, None]) == 1 max(None, None) == max() or max(None, None) == max([]) (The last one currently throws two different errors.) This change would allow one to use `None` as a default value. For example, def my_max(lst): best = None for x in lst: best = max(best, x) return best Currently, you would initialize `best` to the first element, or to float('-inf'). There are more complicated examples, which aren't just replications of `max`'s functionality. The example I have in mind wants to update several running maximums during iteration. I know that there are other ways to do it (having given one above). What if this becomes _the_ obvious way to do it? I'm concerned about this silencing some bugs which would have been caught before. I'm also worried about whether it would make sense to people learning Python. I'm less concerned about custom types which allow comparisons to `None`, because I don't understand why you would want that, but you can change my mind. From rosuav at gmail.com Mon Dec 28 23:21:57 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 29 Dec 2015 15:21:57 +1100 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: Message-ID: On Tue, Dec 29, 2015 at 3:08 PM, Franklin? Lee wrote: > What do people think about having `max` and `min` ignore `None`? > > Examples: > max(1, None) == 1 > min(1, None) == 1 > max([1, None]) == 1 > max(None, None) == max() or max(None, None) == max([]) > > (The last one currently throws two different errors.) What you could do is simply filter them out: def max_without_none(seq): return max(item for item in seq if item is not None) Would that do what you need? ChrisA From tjreedy at udel.edu Mon Dec 28 23:52:15 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 28 Dec 2015 23:52:15 -0500 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: Message-ID: On 12/28/2015 11:08 PM, Franklin? Lee wrote: > What do people think about having `max` and `min` ignore `None`? > > Examples: > max(1, None) == 1 > min(1, None) == 1 This amounts to saying that the comparisions 1 < None and 1 > None are both defined and both True. > max([1, None]) == 1 > max(None, None) == max() or max(None, None) == max([]) > > (The last one currently throws two different errors.) > > > This change would allow one to use `None` as a default value. For example, > > def my_max(lst): > best = None > > for x in lst: > best = max(best, x) > > return best rewrite this as def my_best(iterable): it = iter(iterable) try: best = it.next() except StopIteration: raise ValueError('Empty iterable has no maximum') for x in it: if x > best: best = x > Currently, you would initialize `best` to the first element, Since an empty iterable has no max (unless one wants to define the equivalent of float('-inf') as a default), initializing with the first element is the proper thing to do. ... > I'm concerned about this silencing some bugs which would have been > caught before. 
I'm also worried about whether it would make sense to > people learning Python. None currently means 'no value', which means that most operations on None are senseless. -- Terry Jan Reedy From abarnert at yahoo.com Tue Dec 29 00:43:23 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 28 Dec 2015 21:43:23 -0800 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: Message-ID: <8E536A5E-ED83-4818-A475-3B703A805508@yahoo.com> On Dec 28, 2015, at 20:08, Franklin? Lee wrote: > > This change would allow one to use `None` as a default value. Actually, it might be useful to allow a default value in general. (In typed functional languages, you often specify a default, or use a type that has a default value, so max(list[A]) can always return an A.) Then again, you can write this pretty easily yourself: def my_max(iterable, *, default=_sentinel): try: return max(Iterable) except WhateverEmptyIterableRaises: if default is _sentinel: raise return default From ncoghlan at gmail.com Tue Dec 29 01:05:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Dec 2015 16:05:18 +1000 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: <8E536A5E-ED83-4818-A475-3B703A805508@yahoo.com> References: <8E536A5E-ED83-4818-A475-3B703A805508@yahoo.com> Message-ID: On 29 December 2015 at 15:43, Andrew Barnert via Python-ideas wrote: > On Dec 28, 2015, at 20:08, Franklin? Lee wrote: >> >> This change would allow one to use `None` as a default value. > > Actually, it might be useful to allow a default value in general. (In typed functional languages, you often specify a default, or use a type that has a default value, so max(list[A]) can always return an A.) min() and max() both support a "default" keyword-only parameter in 3.4+: >>> max([]) Traceback (most recent call last): File "", line 1, in ValueError: max() arg is an empty sequence >>> max([], default=None) That means using "None" as the default result for an empty iterable is already straightforward: def my_max(iterable): return max(iterable, default=None) def my_min(iterable): return min(iterable, default=None) You only have to filter the input data or use a custom key function in order to ignore None values that exist in the input, not to produce None rather than an exception when the input iterable is empty. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Tue Dec 29 01:44:13 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 29 Dec 2015 08:44:13 +0200 Subject: [Python-ideas] Fwd: Re: Unambiguous repr for recursive objects In-Reply-To: References: <583718208.2947324.1451163911694.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 27.12.15 22:41, Guido van Rossum wrote: > Sure, but I still am curious what problem you are really trying to > solve. The problem seems to be purely in your mind. You also seem to be > taken the guideline that the repr() of an object should be eval()-able > way too strictly. It is just a guideline to help class authors decide > what their repr() should look like if they don't have a better idea. And > the guideline encourages writing repr()s that are intuitive to readers. > Beyond that there's nothing of value -- it just reduces guesswork on > both sides. Thank you, now I understand this. From leewangzhong+python at gmail.com Tue Dec 29 01:49:43 2015 From: leewangzhong+python at gmail.com (Franklin? 
Lee) Date: Tue, 29 Dec 2015 01:49:43 -0500 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: Message-ID: I was hoping that my message was clear enough that I wouldn't get suggestions of alternatives. I know how to do without it. Take this example: from collections import defaultdict from math import inf def maxes(lst): bests = defaultdict(lambda: -inf) for x, y in lst: bests[x] = max(bests[x], y) return bests The proposed change would only save me an import, and I could've used `inf = float('inf')` instead. from collections import defaultdict def maxes(lst): bests = defaultdict(lambda: None) for x, y in lst: bests[x] = max(bests[x], y) return bests On Mon, Dec 28, 2015 at 11:52 PM, Terry Reedy wrote: > On 12/28/2015 11:08 PM, Franklin? Lee wrote: >> >> What do people think about having `max` and `min` ignore `None`? >> >> Examples: >> max(1, None) == 1 >> min(1, None) == 1 > > > This amounts to saying that the comparisions 1 < None and 1 > None are both > defined and both True. Not exactly. max(1, None) == max(None, 1) == 1 There is no definable comparison to None which allows both max and min to return the correct value. > rewrite this as > > def my_best(iterable): > it = iter(iterable) > try: > best = it.next() > except StopIteration: > raise ValueError('Empty iterable has no maximum') > for x in it: > if x > best: > best = x Well, `my_best = max` is the cleanest way. It's not the point. >> Currently, you would initialize `best` to the first element, > > > Since an empty iterable has no max (unless one wants to define the > equivalent of float('-inf') as a default), initializing with the first > element is the proper thing to do. Mathematically, the max of the empty set is the min of the ambient set. So the max of an empty collection of natural numbers is 0, while the max of an empty collection of reals is -inf. Of course, Python doesn't know what type of elements your collection is expected to have, so (as Nick said) you would manually specify the default with a keyword argument. But that's not the point. >> I'm concerned about this silencing some bugs which would have been >> caught before. I'm also worried about whether it would make sense to >> people learning Python. > > > None currently means 'no value', which means that most operations on None > are senseless. No value, like a lack of something to consider in the calculation of the maximum? PS: This change would also allow one to use a `key` function which returns None for an object that shouldn't be considered. Now _that_ might be more useful. But again, I know how to deal without it: have the key function return `inf` or `-inf` instead. I'm asking if using `None` could become the "one obvious way to do it". There is semantic meaning to initializing `best = None`, after all: "At this point, there is no best yet." From leewangzhong+python at gmail.com Tue Dec 29 02:13:47 2015 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 29 Dec 2015 02:13:47 -0500 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On Sat, Dec 12, 2015 at 1:34 PM, Michael Selik wrote: > On Fri, Dec 11, 2015, 8:20 PM Franklin? Lee > wrote: >> >> By the way, there are other usecases for ignoring arguments for >> caching. For example, dynamic programming where the arguments are the >> indices of a sequence, or some other object (tree?) which isn't a >> recursive argument. 
I recommend that those also be done with a closure >> (separating the recursive part from the initial arguments), but I >> think it's worth considering an lru_cache implementation for students >> who haven't learned to, er, abuse closures. Unless someone thinks a >> recipe can/should be added to the docs. > > > This whole thing is probably best implemented as two separate functions > rather than using a closure, depending on how intertwined the code paths are > for the shortcut/non-shortcut versions. I like the closure because it has semantic ownership: the inner function is a worker for the outer function. Also, with many dynamic programming problems which have a non-recursive variable, if you don't have a closure, you would need global state instead. Here's an example: inf = float('inf') def max_nonconsecutive_subset_sum(lst, n): """ Returns the biggest possible sum of n non-consecutive items. """ def rec(i, k): """ i: index k: number of items summed so far """ if k == n: # Stop summing return 0 if i == len(lst): # Reached end of list return -inf return max(rec(i+1, k), rec(i+2, k+1) + lst[i]) return rec(0, 0) (We can also shortcut out if there aren't enough items left to take k of them, and calculate length only once, but it's not needed for correctness.) There should be no significant cost for the closure, as the recursive part should be the bulk of the work. >> On Fri, Dec 11, 2015 at 8:01 PM, Franklin? Lee >> wrote: >> > Solutions: >> > 1. Rewrite your recursive function so that the partial state is a >> > nonlocal variable (in the closure), and memoize the recursive part. > > > I'd flip the rare-case to the except block and put the normal-case in the > try block. I believe this will be more compute-efficient and more readable. The rare case is in the except block, though. From steve at pearwood.info Tue Dec 29 06:22:58 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Dec 2015 22:22:58 +1100 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: Message-ID: <20151229112258.GT23700@ando.pearwood.info> On Mon, Dec 28, 2015 at 11:08:49PM -0500, Franklin? Lee wrote: > What do people think about having `max` and `min` ignore `None`? I'd want to think about it carefully. I think that there might be a good case for making max and min a bit more sophisticated, but I'm not quite sure how sophisticated. There's more to it than just None. If you think of None as just some value, then including None in a list of numbers (say) is an error, and should raise an exception as it does now (in Python 3, not Python 2). So that's perfectly reasonable, and correct, behaviour. If you think of None as representing a missing value, then there are two equally good interpretations of max(x, None): either we ignore missing values and return x, or we say that if one value is unknown, the max is also clearly unknown, and propagate that missing value as the answer. So that's three perfectly reasonable behaviours: max(x, None) is an error and should raise; max(x, None) ignores None and returns x; max(x, None) is unknown or missing and returns None (or some other sentinel representing NA/Missing/Unknown). 
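To make the two non-raising options concrete, here are rough sketches (the
function names are invented purely for illustration):

def max_ignoring_none(iterable):
    # option 2: missing values are simply dropped before comparing
    return max(x for x in iterable if x is not None)

def max_propagating_none(iterable):
    # option 3: a single missing value makes the whole result unknown
    values = list(iterable)
    if any(x is None for x in values):
        return None
    return max(values)
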
In R, the max or min of a list with missing values is the missing value, unless you specifically tell R to ignore NA: > max(c(1, 2, 3, NA)) [1] NA > max(c(1, 2, 3, NA), na.rm=TRUE) [1] 3 In Javascript, I guess the equivalent would be null, which appears to be coerced to 0: js> Math.max(1, 2, null, 4) 4 js> Math.min(1, 2, null, 4) 0 I don't think there is any good justification for that behaviour. That's the sort of thing which gives weakly typed languages a bad name. Just tossing this out to be shot down... What if the builtin max and min remained unchanged, but we added variants of them to the statistics module which treated None as a missing value, to be either ignored or propagated, as R does? -- Steve From ncoghlan at gmail.com Tue Dec 29 07:02:24 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Dec 2015 22:02:24 +1000 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: <20151229112258.GT23700@ando.pearwood.info> References: <20151229112258.GT23700@ando.pearwood.info> Message-ID: On 29 December 2015 at 21:22, Steven D'Aprano wrote: > What if the builtin max and min remained unchanged, but we added > variants of them to the statistics module which treated None as a > missing value, to be either ignored or propagated, as R does? If the statistics module were to start borrowing selected concepts from R, it makes sense to me to look at how those have been translated into the Python ecosystem by NumPy/SciPy/pandas first. In the case of min/max, the most relevant APIs appear to be: pandas.DataFrame.min pandas.DataFrame.max numpy.amin numpy.amax numpy.nanmin numpy.nanmax The pandas variants support a "skipna" argument, which indicates whether or not to ignore missing values (e.g. None, NaN). This defaults to true, so such null values are ignored. If you set it to False, they get included and propagate to the result: >>> df = pandas.DataFrame([1, 2, 3, None, float("nan")]) >>> df.min() 0 1 dtype: float64 >>> df.min(skipna=False) 0 NaN dtype: float64 For NumPy, amin and amax propagate NaN/None, while nanmin/nanmax are able to filter out floating point NaN values, but emit TypeError if asked to cope with None as a value. I think the fact both NumPy and pandas support R-style handling of min() and max() counts in favour of having variants of those with additional options for handling missing data values in the standard library statistics module. Regards, Nick. P.S. Another option might be to consider the question as part of a general "data cleaning" strategy for the statistics module, similar to the one discussed for pandas at http://pandas.pydata.org/pandas-docs/stable/missing_data.html Even if the statistics module itself doesn't provide the tools to address those problems, it could provide some useful pointers on when someone may want to switch from the standard library module to a more comprehensive solution like pandas that better handles the messy complications of working with real world data (and data formats). -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From chris.barker at noaa.gov Tue Dec 29 11:41:37 2015 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 29 Dec 2015 08:41:37 -0800 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: <20151229112258.GT23700@ando.pearwood.info> Message-ID: <-2402406119806377967@unknownmsgid> but emit TypeError if > asked to cope with None as a value. > Well, sort of. 
Numpy arrays are homogenous, you can't have a None in an array ( other than an object style). All the Numpy "ufuncs" create an array from the input first -- that's where you get your ValueError. But the Numpy experience is informative -- there have been years of " we need a better masked array" discussions, but no consensus on what it should be. For floats, NaN can be used for missing values, but there is no such value for integers, and each use case has a sufferer end "obvious" interpretation. That's why it's explicit what you want with the nan* functions. I don't think python should decide for users what None means in this context. -CHB > I think the fact both NumPy and pandas support R-style handling of > min() and max() counts in favour of having variants of those with > additional options for handling missing data values in the standard > library statistics module. > > Regards, > Nick. > > P.S. Another option might be to consider the question as part of a > general "data cleaning" strategy for the statistics module, similar to > the one discussed for pandas at > http://pandas.pydata.org/pandas-docs/stable/missing_data.html > > Even if the statistics module itself doesn't provide the tools to > address those problems, it could provide some useful pointers on when > someone may want to switch from the standard library module to a more > comprehensive solution like pandas that better handles the messy > complications of working with real world data (and data formats). > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From chris.barker at noaa.gov Tue Dec 29 12:38:19 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 29 Dec 2015 09:38:19 -0800 Subject: [Python-ideas] find-like functionality in pathlib In-Reply-To: References: <357278012.1941601.1450821295123.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Mon, Dec 28, 2015 at 2:43 PM, Andrew Barnert wrote: > sure, but having to write "little" wrappers for common needs is > unfortunate... > > > You're replying to me, not Guido, here... > I was intending to reply to the list :-) > Anyway, if the only thing anyone will ever need is a handful of simple > one-liners that even a novice could write, maybe it's reasonable to just > add one to the docs to show how to do it, instead of adding them to the > stdlib. > well, it's a four liner, yes? but I'm not sure i agree -- the simple things should be simple. even if you can find the couple-liner in the docs, you've still got a lot more overhead than calling a ready-to-go function. and it's not like it'd be a heavy maintenance burden.... The problem isn't designing a nice walk API; it's integrating it with >> pathlib.* > > indeed -- I'd really like to see a *walk in pathlib itself. But first you have to solve the problem that paragraph was all about: a general-purpose walk API shouldn't be throwing away all that stat information it wasted time fetching, but the pathlib module is designed around Path objects that are always live, not snapshots. If Path.walk yields something that isn't a Path, what's the point? OK -- you've gotten out of my technical depth now.....so I'll just shut up. But at the end of the day, if you've got the few-liner in the docs that works, maybe it's OK that it's not optimized..... 
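by "few-liner" I mean something like this (untested sketch -- it's just
os.walk glue that hands back Path objects, so it still throws away the stat
information mentioned above):

import os
from pathlib import Path

def walk_files(start_dir):
    # yield a Path for every file under start_dir, top-down
    for dirpath, dirnames, filenames in os.walk(str(start_dir)):
        for filename in filenames:
            yield Path(dirpath) / filename
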
I've been trying to use pathlib whenever I need, well, a path, but then I find I almost immediately need to step out and use an os.path function, and have to string-fy it anyway -- makes me wonder what the point is.. > I have the same impression as you, but, as Guido says, let's give it time > before judging... time good -- but also maybe some more work to make it easy to use with rest of the stdlib. I will say that one thing that bugs me about the "old style" os.path functions is that I find myself stringing tehm together, and that gets really ugly fast: my_path - os.path.join(os.path.split(something)[0], something_else) here's where an OO interface is much nicer. And honestly, if open, os.walk, etc. aren't going to work with Path >> objects, > > but they should -- of course they should..... So far things have gone the opposite direction: open requires strings, but there's a Path.open method; This sure feels to me like the wrong way to go -- too OO -heavy: create a Path object, then use it to open a file. which is why we still have the regular old open() that takes strings. I just finished teaching an intro to Python class, using py3 for the first time -- I found myself pointing students to pathlib, but then never using it in any examples, etc. That may be my old habits, but I really think we do have an ugly mix of APIs here. > walk requires strings, but people are proposing a Path.walk method; etc. well, walk "feels" to me like a path-y operation. whereas open() does not. I'm not sure how that's supposed to extend to things like json.load or NamedTemporaryFile.name. exactly -- that's why open() doesn't feel path-y to me. you have all sorts of places where you might want to open a file, and you want to open other things as well. And I like APIs that let you pass in either an open file-like object, OR a path -- so it seems allowing either a Path object or a path-in-a-string would be good. so my "proposal" is to go through the stdlib and add the ability to accept a Path object everywhere a string path is accepted. (hmm -- could you simply wrap str() around the input?) My example: one of our sysadmins wanted a little script to go thorugh an entire drive (Windows), and check if any paths were longer than 256 characters (Windows, remember..) I came up with this: def get_all_paths(start_dir='/'): for dirpath, dirnames, filenames in os.walk(start_dir): for filename in filenames: yield os.path.join(dirpath, filename) too_long = [] for p in get_all_paths('/'): print("checking:", p) if len(p) > 255: too_long.append(p) print("Path too long!") > Do you really want it to print out "Path too long!" hundreds of times? well, not in production, no, but was nice to test -- also, in theory, there shouldn't be many! > If not, this is a lot more concise, and I think readable, with comprehensions: walk = os.walk(start_dir) files = (os.path.join(root, file) for root, dirs, files in walk for file in files) too_long = (file for file in files if len(file) > 255) thanks -- should have thought of that -- though that was to pass off to a sysadmin that doesn't know much python -- harder for him to read?? > And now you've got a lazy Iterator over you too-long files. > (If you need a > list, just use a listcomp instead of a genexpr in the last step.) yup -- probably I'd write it out to a file in the real use case. or stdout. way too wordy! I started with pathlib, but that just made it worse. 
> If we had a Path.walk, I don't think it could be that much better than the > original version, sure -- the wordyness comes from the fact that you have to deal with dirs and files separately. > since the only thing Path can help with is making that join a bit > shorter--and at the cost of having to convert to str to check len(): maybe another argument for why Path doesn't buy much over string paths... > walk = start_path.Walk() > files = (root / file for root, dirs, files in walk for file in files) > too_long = (file for file in files if len(str(file)) > 255) what I really want here is: too_long = (filepath for filepath in Path(root) if len(filepath) > 255 ) I know python isn't a shell scripting language but it is a one liner in powershell or bash, or.... As a side note, there's no Windows restriction to 255 _characters_, it's to > 255 UTF-16 code points, IIUC, Windows itself, nor ntfs has this restriction, but some older utilities do -- really pathetic. And I asked our sysadmin about the unicode issue, and he hasd no idea. > just under 64K UTF-16 code points, how is a codepoint different than a character???? I was wondering if it was a bytes restriction or codepoint restriction? > or 255 codepage bytes, depending on which API you use. this is where it gets ugly -- who knows what API some utility is using??? So you really want something like len(file.encode('utf-16') / 2) > 255. but can't some characters use more than 2 bytes in utf-16? or is that what you're trying to catch here? Also, I suspect you want either the bare filename or the abspath, not the > path from the start dir (especially since a path rooted at the default '/' > is two characters shorter than one rooted at 'C:\', well, the startdir would be C:\ and now I'm confused about whether the "C:\" is parto f the 255-something restriction! anyway, WAY OT -- and if this is used it will be mainly to flag potential problems, not really a robust test. Thanks, -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Tue Dec 29 13:42:23 2015 From: brett at python.org (Brett Cannon) Date: Tue, 29 Dec 2015 18:42:23 +0000 Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol) In-Reply-To: References: <5681785E.2070105@egenix.com> Message-ID: With MAL, Nick, and Yury all suggesting the syntax-related ABCs go into the abc module and no one else really suggesting otherwise beyond me, I think it's reasonable to go with the syntax rule-of-thumb for what ABCs go into abc and all other abstract concepts go into domain-specific modules (e.g., import-related ones go into importlib.abc, which Nick forgot to list :). On Mon, 28 Dec 2015 at 19:58 Nick Coghlan wrote: > On 29 December 2015 at 03:58, M.-A. Lemburg wrote: > > On 28.12.2015 18:42, Brett Cannon wrote: > >> Speaking of using ABCs more, where should we put ABCs which have > nothing to > >> do with collections? As of right now all ABCs seem to get shoved into > >> collections.abc, but e.g. Awaitable and Coroutine are not types of > >> collections. I personally want to add a context manager ABC with a > default > >> __exit__. > >> > >> I opened http://bugs.python.org/issue25637 to discuss this, but I > figured a > >> wider discussion wouldn't hurt. 
Some suggest just putting the ABCs into > the > >> abc module. We could create an interfaces module (top-level or a > submodule > >> of ABC). The other option is to put the ABCs in subject-specific > modules, > >> so my context manager one would go into contextlib (either top-level or > an > >> abc submodule); don't know where the coroutine ones would go since it > might > >> be overloading asyncio if we out them there. > >> > >> Anyway, the key point is collections.abc is starting to get > non-collections > >> stuff and if we are going to start pushing ABCs more we should decide > how > >> we want to organize them in general in the stdlib and instead of dumping > >> them into collections.abc. > > > > I'd put them into the abc module (perhaps turning this into a > > package, if things get too crowded). > > > > collections.abc could then do a "from abc import *" for b/w > > compatibility. > > With the benefit of hindsight, I think a broad namespace separation > like that might have been a good way to go from a data model > discoverability perspective, as it clearly separates the abstract > descriptions of data and control flow modelling concepts from the > concrete implementations of those concepts. > > However, we should also keep in mind which standard library modules > already publish ABCs, which is at least: > > typing > io > numbers > collections.abc > selectors > email > > (That list was collected by grepping Python files for ABCMeta, so I > may have missed some) > > That suggests to me that this design decision has effectively already > been made, and it's to include the ABCs in the relevant domain > specific modules. The inclusion of the non-collections related ABCs in > collections.abc are the anomaly, which can be addressed by moving them > out to a more appropriate location (adjusting the documentation > accordingly), and then importing them into collections.abc for > backwards compatibility (taking care not to increase the startup > import footprint too much in the process). > > The current collections.abc interfaces which I think are most at issue > here: > > Callable > Iterable > Iterator > Generator > Awaitable > Coroutine > Awaitable > AsyncIterable > AsyncIterator > > (I'm excluding Hashable from the list, as the main reason that matters > is in describing whether or not something is suitable for inclusion in > a dict or set) > > These differ from the rest of the collections.abc interfaces in that > they're more closely associated with language level control flow > syntax than they are with containers specifically: > > Callable - function calls > Iterable - for loops, comprehensions > Iterator - for loops, comprehensions > Generator - generators, generator expressions > Awaitable - await expressions > Coroutine - async def > AsyncIterable - async for > AsyncIterator - async for > > Adding ContextManager and AsyncContextManager would give ABCs for the > protocols related to "with" and "async with". > > Since these all correspond to syntactic protocols, I now think it's > reasonable to include them directly in the "abc" namespace, since that > still gives us a clear guideline for which ABCs go there, and which > belong somewhere else: if it has syntax associated with it, or it's > part of the ABC machinery itself, then it can go directly in the "abc" > namespace. Using the "abc" namespace also ensures there isn't any > additional import overhead, since that gets imported by all ABC using > code anyway. > > Regards, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gerald.britton at gmail.com Tue Dec 29 18:02:09 2015 From: gerald.britton at gmail.com (Gerald Britton) Date: Tue, 29 Dec 2015 18:02:09 -0500 Subject: [Python-ideas] Have max and min functions ignore None Message-ID: On *Tue Dec 29 06:22:58 EST 2015, Stephen D'Aprano wrote:* > So that's three perfectly reasonable behaviours: max(x, None) is an error > and should raise; > max(x, None) ignores None and returns x; > max(x, None) is unknown or missing and returns None > (or some other sentinel representing NA/Missing/Unknown). For comparison's sake, SQL ignores NULL when doing MAX: e.g. select max(val) from (values (1),(null)) v(val) returns 1 In Python, None is sorta-kinda a bit like NULL in SQL, so one could make the argument that None should be handled similarly in min and max. OTOH I wouldn't want to see Python implement 3-valued logic. -- Gerald Britton, MCSE-DP, MVP LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike at selik.org Tue Dec 29 21:21:33 2015 From: mike at selik.org (Michael Selik) Date: Wed, 30 Dec 2015 02:21:33 +0000 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: Message-ID: If None gets ignored, what about iterables that mix numbers and strings? A comprehension handles the simple case relatively succinctly and also handles more complex cases. max(x for x in iterable if isinstance(x, (int, float))) If you prefer is_number = lambda obj: isinstance(obj, numbers.Number) max(filter(is_number, iterable)) On Tue, Dec 29, 2015 at 6:02 PM Gerald Britton wrote: > On *Tue Dec 29 06:22:58 EST 2015, Stephen D'Aprano wrote:* > > >> So that's three perfectly reasonable behaviours: max(x, None) is an error >> and should raise; >> max(x, None) ignores None and returns x; >> max(x, None) is unknown or missing and returns None >> (or some other sentinel representing NA/Missing/Unknown). > > > For comparison's sake, SQL ignores NULL when doing MAX: > > e.g. > > select max(val) > from (values (1),(null)) v(val) > > returns > > 1 > > In Python, None is sorta-kinda a bit like NULL in SQL, so one could make > the argument that None should be handled similarly in min and max. OTOH I > wouldn't want to see Python implement 3-valued logic. > > > -- > Gerald Britton, MCSE-DP, MVP > LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Dec 29 22:06:28 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Dec 2015 13:06:28 +1000 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: Message-ID: On 30 December 2015 at 12:21, Michael Selik wrote: > If None gets ignored, what about iterables that mix numbers and strings? That's part of why I like Steven's suggestion of putting a capability along these lines in the statistics module: that reduces the input domain to numeric types, so the statistical analysis functions in NumPy and Pandas and the data aggregation functions in SQL become better behavioural guides. 
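A rough sketch of what such a variant might look like (the name and the
pandas-style "skipna" flag are purely illustrative):

def stat_max(data, *, skipna=True):
    values = list(data)
    if skipna:
        values = [x for x in values if x is not None]
    elif any(x is None for x in values):
        return None  # propagate the missing value, as R does by default
    return max(values)
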
By contrast, the builtin min() and max() work with arbitrary (potentially heterogeneous) iterables, so special casing None (or NaN) doesn't make sense the way it does in strictly numerical analysis (or analysis with constrained types). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From srkunze at mail.de Wed Dec 30 13:09:08 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 30 Dec 2015 19:09:08 +0100 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> <567FD8D5.8090205@egenix.com> Message-ID: <56841DC4.3020302@mail.de> On 29.12.2015 03:59, Nick Coghlan wrote: > On 28 December 2015 at 03:04, Guido van Rossum wrote: >> [ABCs are one honking great idea -- let's do more of those!] > [collections.abc.Indexable would be a good one.] Maybe, I still cannot wrap my mind enough around the types-everywhere-in-python-please world. But, what's so wrong about checking for __getitem__ or __len__ if necessary? Best, Sven From abarnert at yahoo.com Wed Dec 30 14:32:07 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Dec 2015 11:32:07 -0800 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: <56841DC4.3020302@mail.de> References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> <567FD8D5.8090205@egenix.com> <56841DC4.3020302@mail.de> Message-ID: On Dec 30, 2015, at 10:09, Sven R. Kunze wrote: > >> On 29.12.2015 03:59, Nick Coghlan wrote: >>> On 28 December 2015 at 03:04, Guido van Rossum wrote: >>> [ABCs are one honking great idea -- let's do more of those!] >> [collections.abc.Indexable would be a good one.] > > Maybe, I still cannot wrap my mind enough around the types-everywhere-in-python-please world. > > But, what's so wrong about checking for __getitem__ or __len__ if necessary? Well, for one thing, that will pick up mappings, generic types, and various other things that aren't indexable but use __getitem__ for other purposes. It's the same problem as this thread in reverse: checking for __iter__ gives you false negatives because of the old-style sequence protocol; checking for __getitem__ gives you false positives because of the mapping protocol. But false positives are generally worse. Normally, you'd just EAFP it and write seq[idx] and deal with any exception; if you have to LBYL for some reason, a test that incorrectly passes many common values is not very helpful. Of course you could try a more stringent test--check for __getitem__ but not keys and not __extra__ and so on--but then you have to do that test everywhere; better to centralize it in one place. Or, even better, to just accept that some things are not feasible for structural tests and just test for types that explicitly declare themselves Indexable (by inheritance or registration). That way, you may get false negatives, but not on common types, and it only takes one line of code to register that third-party class with Sequence if you need to--and no false positives. Also, of course, ABCs are often useful as mixins. The fact that I can write a fully-fledged sequence with all the bells and whistles in 10 lines of code by inheriting from Sequence is pretty nice. 
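For instance (a quick sketch; slice and negative-index support omitted to
keep it short):

from collections.abc import Sequence

class Squares(Sequence):
    # read-only sequence of the first n square numbers
    def __init__(self, n):
        self._n = n
    def __len__(self):
        return self._n
    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index

# __contains__, __iter__, __reversed__, index() and count() all come free
# from the Sequence mixin, e.g. list(reversed(Squares(4))) == [9, 4, 1, 0]
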
Getting things like __iter__ for free by inheriting from Indexable (especially if the old-style sequence protocol is deprecated) would be similarly nice. Again, you don't need to test for this all over the place--most of the time, you'll just EAFP. But when you do need to have a test, better to have one that says what it means, and doesn't pass false positives, and can be easily hooked for weird third-party classes, and so on. From mike at selik.org Wed Dec 30 23:30:24 2015 From: mike at selik.org (Michael Selik) Date: Thu, 31 Dec 2015 04:30:24 +0000 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: <-2402406119806377967@unknownmsgid> References: <20151229112258.GT23700@ando.pearwood.info> <-2402406119806377967@unknownmsgid> Message-ID: On Tue, Dec 29, 2015, 11:41 AM Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > but emit TypeError if > > asked to cope with None as a value. > > > Well, sort of. Numpy arrays are homogenous, you can't have a None in > an array ( other than an object style). All the Numpy "ufuncs" create > an array from the input first -- that's where you get your ValueError. > > But the Numpy experience is informative -- there have been years of " > we need a better masked array" discussions, but no consensus on what > it should be. > > For floats, NaN can be used for missing values, but there is no such > value for integers, and each use case has a sufferer end "obvious" > interpretation. That's why it's explicit what you want with the nan* > functions. > > I don't think python should decide for users what None means in this > context. > None is obviously the sound of one hand clapping. When you understand its proper use, you become Enlightened. > -CHB > > > I think the fact both NumPy and pandas support R-style handling of > > min() and max() counts in favour of having variants of those with > > additional options for handling missing data values in the standard > > library statistics module. > NumPy and Pandas have a slightly different audience than Python core. The scientific community often veers more practical than pure, in some cases to the detriment of code clarity. > Regards, > > Nick. > > > > P.S. Another option might be to consider the question as part of a > > general "data cleaning" strategy for the statistics module, similar to > > the one discussed for pandas at > > http://pandas.pydata.org/pandas-docs/stable/missing_data.html I prefer this option. Why solve the special case of max/min when we can solve (or help solve) the general case of missing data. There's already the internal ``_coerce`` method. Maybe clean that up for public consumption, or something like it, adding drop-missing functionality? If that flies, then there might be room for an ``interpolate(sequence, method='linear')`` which would be awesome. > > Even if the statistics module itself doesn't provide the tools to > > address those problems, it could provide some useful pointers on when > > someone may want to switch from the standard library module to a more > > comprehensive solution like pandas that better handles the messy > > complications of working with real world data (and data formats). > > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Wed Dec 30 23:44:13 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 31 Dec 2015 15:44:13 +1100 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: References: <20151229112258.GT23700@ando.pearwood.info> <-2402406119806377967@unknownmsgid> Message-ID: <20151231044412.GW23700@ando.pearwood.info> On Thu, Dec 31, 2015 at 04:30:24AM +0000, Michael Selik wrote: > If that flies, then there might be room for an ``interpolate(sequence, > method='linear')`` which would be awesome. (I presume you're still talking about the statistics module here, not pandas.) What did you have in mind? -- Steve From mike at selik.org Thu Dec 31 00:10:51 2015 From: mike at selik.org (Michael Selik) Date: Thu, 31 Dec 2015 05:10:51 +0000 Subject: [Python-ideas] Using functools.lru_cache only on some arguments of a function In-Reply-To: References: Message-ID: On Tue, Dec 29, 2015 at 2:14 AM Franklin? Lee wrote: > On Sat, Dec 12, 2015 at 1:34 PM, Michael Selik wrote: > > On Fri, Dec 11, 2015, 8:20 PM Franklin? Lee < > leewangzhong+python at gmail.com> > > wrote: > > This whole thing is probably best implemented as two separate functions > > rather than using a closure, depending on how intertwined the code paths > are > > for the shortcut/non-shortcut versions. > > I like the closure because it has semantic ownership: the inner > function is a worker for the outer function. > True, a closure has better encapsulation, making it less likely someone will misuse the helper function. On the other hand, that means there's less modularity and it would be difficult for someone to use the inner function. It's hard to know the right choice without seeing the exact problem the original author was working on. > >> On Fri, Dec 11, 2015 at 8:01 PM, Franklin? Lee > >> wrote: > >> > 1. Rewrite your recursive function so that the partial state is a > >> > nonlocal variable (in the closure), and memoize the recursive part. > > > > I'd flip the rare-case to the except block and put the normal-case in the > > try block. I believe this will be more compute-efficient and more > readable. > > The rare case is in the except block, though. > You're correct. Sorry, I somehow misinterpreted the comment, "# To trigger the exception the first time" as indicating that code path would run only once. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike at selik.org Thu Dec 31 00:51:33 2015 From: mike at selik.org (Michael Selik) Date: Thu, 31 Dec 2015 05:51:33 +0000 Subject: [Python-ideas] Have max and min functions ignore None In-Reply-To: <20151231044412.GW23700@ando.pearwood.info> References: <20151229112258.GT23700@ando.pearwood.info> <-2402406119806377967@unknownmsgid> <20151231044412.GW23700@ando.pearwood.info> Message-ID: On Wed, Dec 30, 2015 at 11:44 PM Steven D'Aprano wrote: > On Thu, Dec 31, 2015 at 04:30:24AM +0000, Michael Selik wrote: > > > If that flies, then there might be room for an ``interpolate(sequence, > > method='linear')`` which would be awesome. > > (I presume you're still talking about the statistics module here, not > pandas.) > > What did you have in mind? > While the scientific community is well-served by NumPy and Pandas, there are many users trying to do a lighter amount of data wrangling that does not include linear algebra. In my anecdotal experience, the most common tasks are: 1. drop records with missing/bad data 2. replace missing/bad values with a constant value 3. 
interpolate missing values with either a pad-forward or linear method While NumPy often has methods doing in-place mutation, the users I'm thinking of are generally not worried about memory size and would be better served by pure functions. Going back to the original topic of skipping None values. I'd like to add that many datasets use bizarre values like all 9s or -1 or '.' or whatever to represent missingness. So, I'm not confident there's a good general-purpose solution more simple than comprehensions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Dec 31 02:51:13 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 31 Dec 2015 17:51:13 +1000 Subject: [Python-ideas] Deprecating the old-style sequence protocol In-Reply-To: <56841DC4.3020302@mail.de> References: <242429025.2987954.1451185648904.JavaMail.yahoo.ref@mail.yahoo.com> <242429025.2987954.1451185648904.JavaMail.yahoo@mail.yahoo.com> <567FD8D5.8090205@egenix.com> <56841DC4.3020302@mail.de> Message-ID: On 31 December 2015 at 04:09, Sven R. Kunze wrote: > On 29.12.2015 03:59, Nick Coghlan wrote: >> >> On 28 December 2015 at 03:04, Guido van Rossum wrote: >>> >>> [ABCs are one honking great idea -- let's do more of those!] >> >> [collections.abc.Indexable would be a good one.] > > > Maybe, I still cannot wrap my mind enough around the > types-everywhere-in-python-please world. > > But, what's so wrong about checking for __getitem__ or __len__ if necessary? Most of the time when I care, it's for early error detection. For normal function calls, your best bet is to just try the operation, and let the interpreter generate the appropriate exception - the traceback will give the appropriate context for the error, so there's little gain in doing your own check. Things change when you're handing a callable off to be invoked later, whether that's through an object queue, atexit, context manager, thread pool, process pool, or something else. In those cases, a delayed exception will trigger in the invocation context, and so the traceback won't give the reader any information about which part of the code provided the bad arguments. There are two main remedies for this: 1. Use runtime argument checking at the point the arguments are passed in 2. Use some form of structural type checking that allows code to be analysed for correctness without running it The abc module provides a framework for the former task - if you know an algorithm needs a sequence (for example), you can write "isinstance(arg, Sequence)" before submitting the operation for execution and raise TypeError if the check fails. Folks passing in the wrong kind of argument then get a nice error message with a traceback at the point where they provided the incorrect data, rather than an obscure traceback that they then have to debug. As Andrew explains in his reply, this can be as simple as checking for a specific attribute, but it also extends to more complex criteria without changing the way you perform the runtime check. Static analysers like mypy, pytypedecl and pylint provide support for the latter approach, by checking for consistency between the way objects are defined and created and the way they're used. While it's possible for incorrect code to pass static analysis, the vast majority of correct code will pass it (and any which fails would likely be confusing to a human reader as well). 
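A small illustration of the kind of inconsistency such tools can flag (using
mypy-style annotations; the function itself is made up):

from typing import Sequence

def submit_sort_task(data: Sequence[int]) -> None:
    ...

submit_sort_task(42)  # flagged by the analyser: an int is not a Sequence[int]
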
Since the analysis is static, runtime dynamism isn't relevant - the analyser can point out both sides of the inconsistency, even if they're encountered at different times or in different threads or processes when executed. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tritium-list at sdamon.com Thu Dec 31 10:07:38 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Thu, 31 Dec 2015 10:07:38 -0500 Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol) In-Reply-To: References: <5681785E.2070105@egenix.com> Message-ID: <568544BA.4040709@sdamon.com> Would it be a good idea to mix 'concrete implementations of ABCs'* directly in the abc module where the tooling to create ABCs live, or to put it in a submodule? I feel it should be a submodule, but that isn't based on vast experience. * Yes, I know, these are not concrete implementations of types... I find it a little confusing to describe. On 12/28/2015 22:58, Nick Coghlan wrote: > On 29 December 2015 at 03:58, M.-A. Lemburg wrote: >> On 28.12.2015 18:42, Brett Cannon wrote: >>> Speaking of using ABCs more, where should we put ABCs which have nothing to >>> do with collections? As of right now all ABCs seem to get shoved into >>> collections.abc, but e.g. Awaitable and Coroutine are not types of >>> collections. I personally want to add a context manager ABC with a default >>> __exit__. >>> >>> I opened http://bugs.python.org/issue25637 to discuss this, but I figured a >>> wider discussion wouldn't hurt. Some suggest just putting the ABCs into the >>> abc module. We could create an interfaces module (top-level or a submodule >>> of ABC). The other option is to put the ABCs in subject-specific modules, >>> so my context manager one would go into contextlib (either top-level or an >>> abc submodule); don't know where the coroutine ones would go since it might >>> be overloading asyncio if we out them there. >>> >>> Anyway, the key point is collections.abc is starting to get non-collections >>> stuff and if we are going to start pushing ABCs more we should decide how >>> we want to organize them in general in the stdlib and instead of dumping >>> them into collections.abc. >> I'd put them into the abc module (perhaps turning this into a >> package, if things get too crowded). >> >> collections.abc could then do a "from abc import *" for b/w >> compatibility. > With the benefit of hindsight, I think a broad namespace separation > like that might have been a good way to go from a data model > discoverability perspective, as it clearly separates the abstract > descriptions of data and control flow modelling concepts from the > concrete implementations of those concepts. > > However, we should also keep in mind which standard library modules > already publish ABCs, which is at least: > > typing > io > numbers > collections.abc > selectors > email > > (That list was collected by grepping Python files for ABCMeta, so I > may have missed some) > > That suggests to me that this design decision has effectively already > been made, and it's to include the ABCs in the relevant domain > specific modules. 
The inclusion of the non-collections related ABCs in > collections.abc are the anomaly, which can be addressed by moving them > out to a more appropriate location (adjusting the documentation > accordingly), and then importing them into collections.abc for > backwards compatibility (taking care not to increase the startup > import footprint too much in the process). > > The current collections.abc interfaces which I think are most at issue here: > > Callable > Iterable > Iterator > Generator > Awaitable > Coroutine > Awaitable > AsyncIterable > AsyncIterator > > (I'm excluding Hashable from the list, as the main reason that matters > is in describing whether or not something is suitable for inclusion in > a dict or set) > > These differ from the rest of the collections.abc interfaces in that > they're more closely associated with language level control flow > syntax than they are with containers specifically: > > Callable - function calls > Iterable - for loops, comprehensions > Iterator - for loops, comprehensions > Generator - generators, generator expressions > Awaitable - await expressions > Coroutine - async def > AsyncIterable - async for > AsyncIterator - async for > > Adding ContextManager and AsyncContextManager would give ABCs for the > protocols related to "with" and "async with". > > Since these all correspond to syntactic protocols, I now think it's > reasonable to include them directly in the "abc" namespace, since that > still gives us a clear guideline for which ABCs go there, and which > belong somewhere else: if it has syntax associated with it, or it's > part of the ABC machinery itself, then it can go directly in the "abc" > namespace. Using the "abc" namespace also ensures there isn't any > additional import overhead, since that gets imported by all ABC using > code anyway. > > Regards, > Nick. > From brett at python.org Thu Dec 31 12:47:50 2015 From: brett at python.org (Brett Cannon) Date: Thu, 31 Dec 2015 17:47:50 +0000 Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol) In-Reply-To: <568544BA.4040709@sdamon.com> References: <5681785E.2070105@egenix.com> <568544BA.4040709@sdamon.com> Message-ID: On Thu, Dec 31, 2015, 07:08 Alexander Walters wrote: > Would it be a good idea to mix 'concrete implementations of ABCs'* > directly in the abc module where the tooling to create ABCs live, or to > put it in a submodule? I feel it should be a submodule, but that isn't > based on vast experience. > > * Yes, I know, these are not concrete implementations of types... I find > it a little confusing to describe. > It's a possibility. Feel free to comment on the issue if you want to discuss this further. -Brett > On 12/28/2015 22:58, Nick Coghlan wrote: > > On 29 December 2015 at 03:58, M.-A. Lemburg wrote: > >> On 28.12.2015 18:42, Brett Cannon wrote: > >>> Speaking of using ABCs more, where should we put ABCs which have > nothing to > >>> do with collections? As of right now all ABCs seem to get shoved into > >>> collections.abc, but e.g. Awaitable and Coroutine are not types of > >>> collections. I personally want to add a context manager ABC with a > default > >>> __exit__. > >>> > >>> I opened http://bugs.python.org/issue25637 to discuss this, but I > figured a > >>> wider discussion wouldn't hurt. Some suggest just putting the ABCs > into the > >>> abc module. We could create an interfaces module (top-level or a > submodule > >>> of ABC). 
The other option is to put the ABCs in subject-specific > modules, > >>> so my context manager one would go into contextlib (either top-level > or an > >>> abc submodule); don't know where the coroutine ones would go since it > might > >>> be overloading asyncio if we out them there. > >>> > >>> Anyway, the key point is collections.abc is starting to get > non-collections > >>> stuff and if we are going to start pushing ABCs more we should decide > how > >>> we want to organize them in general in the stdlib and instead of > dumping > >>> them into collections.abc. > >> I'd put them into the abc module (perhaps turning this into a > >> package, if things get too crowded). > >> > >> collections.abc could then do a "from abc import *" for b/w > >> compatibility. > > With the benefit of hindsight, I think a broad namespace separation > > like that might have been a good way to go from a data model > > discoverability perspective, as it clearly separates the abstract > > descriptions of data and control flow modelling concepts from the > > concrete implementations of those concepts. > > > > However, we should also keep in mind which standard library modules > > already publish ABCs, which is at least: > > > > typing > > io > > numbers > > collections.abc > > selectors > > email > > > > (That list was collected by grepping Python files for ABCMeta, so I > > may have missed some) > > > > That suggests to me that this design decision has effectively already > > been made, and it's to include the ABCs in the relevant domain > > specific modules. The inclusion of the non-collections related ABCs in > > collections.abc are the anomaly, which can be addressed by moving them > > out to a more appropriate location (adjusting the documentation > > accordingly), and then importing them into collections.abc for > > backwards compatibility (taking care not to increase the startup > > import footprint too much in the process). > > > > The current collections.abc interfaces which I think are most at issue > here: > > > > Callable > > Iterable > > Iterator > > Generator > > Awaitable > > Coroutine > > Awaitable > > AsyncIterable > > AsyncIterator > > > > (I'm excluding Hashable from the list, as the main reason that matters > > is in describing whether or not something is suitable for inclusion in > > a dict or set) > > > > These differ from the rest of the collections.abc interfaces in that > > they're more closely associated with language level control flow > > syntax than they are with containers specifically: > > > > Callable - function calls > > Iterable - for loops, comprehensions > > Iterator - for loops, comprehensions > > Generator - generators, generator expressions > > Awaitable - await expressions > > Coroutine - async def > > AsyncIterable - async for > > AsyncIterator - async for > > > > Adding ContextManager and AsyncContextManager would give ABCs for the > > protocols related to "with" and "async with". > > > > Since these all correspond to syntactic protocols, I now think it's > > reasonable to include them directly in the "abc" namespace, since that > > still gives us a clear guideline for which ABCs go there, and which > > belong somewhere else: if it has syntax associated with it, or it's > > part of the ABC machinery itself, then it can go directly in the "abc" > > namespace. Using the "abc" namespace also ensures there isn't any > > additional import overhead, since that gets imported by all ABC using > > code anyway. > > > > Regards, > > Nick. 
> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike at selik.org Thu Dec 31 17:18:00 2015 From: mike at selik.org (Michael Selik) Date: Thu, 31 Dec 2015 22:18:00 +0000 Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol) In-Reply-To: References: <5681785E.2070105@egenix.com> <568544BA.4040709@sdamon.com> Message-ID: On Thu, Dec 31, 2015 at 12:48 PM Brett Cannon wrote: > > > On Thu, Dec 31, 2015, 07:08 Alexander Walters > wrote: > >> Would it be a good idea to mix 'concrete implementations of ABCs'* >> directly in the abc module where the tooling to create ABCs live, or to >> put it in a submodule? I feel it should be a submodule, but that isn't >> based on vast experience. >> > Locating collections ABCs in a submodule makes some sense, as there are 21 of them and the collections module is important for beginners to learn without getting distracted by ABCs. Contrast that with the direct inclusion of ABCs in most other modules and it suggests the creation of a submodule for collections may have been motivated for the same reason as this discussion -- it didn't feel right to have certain ABCs directly in the collections module. If the non-collection ABCs are being moved out of the collections module and into the ``abc`` module, there's less reason to separate them into a submodule. Beginners don't venture into the abc module expecting to understand everything. It's natural to find a bunch of ABCs in a module called ``abc``. And ABCs are included directly in many other modules instead of being relegated to a less discoverable submodule like ``typing.abc``, ``io.abc``, ``numbers.abc``, etc. as many of those are focused on ABCs in the first place. It's easy to notice a submodule when reading docs on the internet, but it's hard to figure out what the correct module is to import when hanging out at a basic REPL. Flat is better than nested and all that. * Yes, I know, these are not concrete implementations of types... I find >> it a little confusing to describe. >> > > It's a possibility. Feel free to comment on the issue if you want to > discuss this further. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Thu Dec 31 18:56:38 2015 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 31 Dec 2015 17:56:38 -0600 Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol) In-Reply-To: References: <5681785E.2070105@egenix.com> <568544BA.4040709@sdamon.com> Message-ID: On Dec 31, 2015 11:48 AM, "Brett Cannon" wrote: > > > > On Thu, Dec 31, 2015, 07:08 Alexander Walters wrote: >> >> Would it be a good idea to mix 'concrete implementations of ABCs'* >> directly in the abc module where the tooling to create ABCs live, or to >> put it in a submodule? I feel it should be a submodule, but that isn't >> based on vast experience. >> >> * Yes, I know, these are not concrete implementations of types... I find >> it a little confusing to describe. > > > It's a possibility. Feel free to comment on the issue if you want to discuss this further. > > -Brett Are these interfaces? 
* | Docs: http://docs.zope.org/zope.interface/ * | Docs: http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/zca.html * | Docs: https://github.com/Pylons/pyramid/blob/master/pyramid/config/zca.py * | Docs: http://docs.pylonsproject.org/projects/pyramid/en/latest/api/interfaces.html * | Src: https://github.com/Pylons/pyramid/blob/master/pyramid/interfaces.py > >> >> On 12/28/2015 22:58, Nick Coghlan wrote: >> > On 29 December 2015 at 03:58, M.-A. Lemburg wrote: >> >> On 28.12.2015 18:42, Brett Cannon wrote: >> >>> Speaking of using ABCs more, where should we put ABCs which have nothing to >> >>> do with collections? As of right now all ABCs seem to get shoved into >> >>> collections.abc, but e.g. Awaitable and Coroutine are not types of >> >>> collections. I personally want to add a context manager ABC with a default >> >>> __exit__. >> >>> >> >>> I opened http://bugs.python.org/issue25637 to discuss this, but I figured a >> >>> wider discussion wouldn't hurt. Some suggest just putting the ABCs into the >> >>> abc module. We could create an interfaces module (top-level or a submodule >> >>> of ABC). The other option is to put the ABCs in subject-specific modules, >> >>> so my context manager one would go into contextlib (either top-level or an >> >>> abc submodule); don't know where the coroutine ones would go since it might >> >>> be overloading asyncio if we out them there. >> >>> >> >>> Anyway, the key point is collections.abc is starting to get non-collections >> >>> stuff and if we are going to start pushing ABCs more we should decide how >> >>> we want to organize them in general in the stdlib and instead of dumping >> >>> them into collections.abc. >> >> I'd put them into the abc module (perhaps turning this into a >> >> package, if things get too crowded). >> >> >> >> collections.abc could then do a "from abc import *" for b/w >> >> compatibility. >> > With the benefit of hindsight, I think a broad namespace separation >> > like that might have been a good way to go from a data model >> > discoverability perspective, as it clearly separates the abstract >> > descriptions of data and control flow modelling concepts from the >> > concrete implementations of those concepts. >> > >> > However, we should also keep in mind which standard library modules >> > already publish ABCs, which is at least: >> > >> > typing >> > io >> > numbers >> > collections.abc >> > selectors >> > email >> > >> > (That list was collected by grepping Python files for ABCMeta, so I >> > may have missed some) >> > >> > That suggests to me that this design decision has effectively already >> > been made, and it's to include the ABCs in the relevant domain >> > specific modules. The inclusion of the non-collections related ABCs in >> > collections.abc are the anomaly, which can be addressed by moving them >> > out to a more appropriate location (adjusting the documentation >> > accordingly), and then importing them into collections.abc for >> > backwards compatibility (taking care not to increase the startup >> > import footprint too much in the process). 
>> >
>> > The current collections.abc interfaces which I think are most at issue here:
>> >
>> > Callable
>> > Iterable
>> > Iterator
>> > Generator
>> > Awaitable
>> > Coroutine
>> > AsyncIterable
>> > AsyncIterator
>> >
>> > (I'm excluding Hashable from the list, as the main reason that matters
>> > is in describing whether or not something is suitable for inclusion in
>> > a dict or set)
>> >
>> > These differ from the rest of the collections.abc interfaces in that
>> > they're more closely associated with language level control flow
>> > syntax than they are with containers specifically:
>> >
>> > Callable - function calls
>> > Iterable - for loops, comprehensions
>> > Iterator - for loops, comprehensions
>> > Generator - generators, generator expressions
>> > Awaitable - await expressions
>> > Coroutine - async def
>> > AsyncIterable - async for
>> > AsyncIterator - async for
>> >
>> > Adding ContextManager and AsyncContextManager would give ABCs for the
>> > protocols related to "with" and "async with".
>> >
>> > Since these all correspond to syntactic protocols, I now think it's
>> > reasonable to include them directly in the "abc" namespace, since that
>> > still gives us a clear guideline for which ABCs go there, and which
>> > belong somewhere else: if it has syntax associated with it, or it's
>> > part of the ABC machinery itself, then it can go directly in the "abc"
>> > namespace. Using the "abc" namespace also ensures there isn't any
>> > additional import overhead, since that gets imported by all ABC using
>> > code anyway.
>> >
>> > Regards,
>> > Nick.
>> >
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
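For reference, here is a minimal, purely illustrative sketch of the kind of
context manager ABC with a default __exit__ described above; the name
AbstractContextManager and the __subclasshook__ details are assumptions for
the sake of the example, not an existing stdlib API at the time of this
thread:

    from abc import ABCMeta

    class AbstractContextManager(metaclass=ABCMeta):
        """Sketch of a 'with'-protocol ABC (illustrative only)."""

        def __enter__(self):
            # Default __enter__: return self, the common case.
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            # Default __exit__: do nothing and do not suppress exceptions.
            return None

        @classmethod
        def __subclasshook__(cls, C):
            # Structural check: any class defining __enter__ and __exit__
            # is treated as a context manager.
            if cls is AbstractContextManager:
                return all(any(m in B.__dict__ for B in C.__mro__)
                           for m in ('__enter__', '__exit__'))
            return NotImplemented

    class Plain:
        def __enter__(self):
            return self
        def __exit__(self, *exc):
            return None

    print(issubclass(Plain, AbstractContextManager))  # True, recognized structurally
    print(issubclass(dict, AbstractContextManager))   # False, no __enter__/__exit__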
From wes.turner at gmail.com Thu Dec 31 18:59:57 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Thu, 31 Dec 2015 17:59:57 -0600
Subject: [Python-ideas] Where to put non-collection ABCs (was: Deprecating the old-style sequence protocol)
In-Reply-To: 
References: <5681785E.2070105@egenix.com> <568544BA.4040709@sdamon.com>
Message-ID: 

On Dec 31, 2015 5:56 PM, "Wes Turner" wrote:
>
> Are these interfaces?
>
> * | Docs: http://docs.zope.org/zope.interface/
> * | Docs: http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/zca.html
>
> * | Docs: https://github.com/Pylons/pyramid/blob/master/pyramid/config/zca.py
> * | Docs: http://docs.pylonsproject.org/projects/pyramid/en/latest/api/interfaces.html
> * | Src: https://github.com/Pylons/pyramid/blob/master/pyramid/interfaces.py

* pyramid.interfaces.IDict
  http://docs.pylonsproject.org/projects/pyramid/en/latest/api/interfaces.html#pyramid.interfaces.IDict
* pyramid.interfaces.IMultiDict
  http://docs.pylonsproject.org/projects/pyramid/en/latest/api/interfaces.html#pyramid.interfaces.IMultiDict
  [source]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
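Since the question above contrasts zope.interface/Pyramid interfaces with the
stdlib ABCs, a minimal sketch of the difference may help; zope.interface is a
third-party package, and the IGreeter/Greeter names here are made up purely
for illustration:

    from zope.interface import Interface, implementer  # third-party package

    class IGreeter(Interface):
        # zope.interface methods are declared without 'self'
        def greet(name):
            """Return a greeting for *name*."""

    @implementer(IGreeter)
    class Greeter(object):
        def greet(self, name):
            return "hello " + name

    # zope.interface relies on explicit declaration...
    print(IGreeter.implementedBy(Greeter))    # True, because of @implementer

    # ...whereas the ABCs discussed in this thread can also recognize classes
    # structurally, e.g. anything defining __iter__ is an Iterable.
    from collections.abc import Iterable
    print(issubclass(Greeter, Iterable))      # False, Greeter has no __iter__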