[Tutor] question about descriptors
Albert-Jan Roskam
sjeik_appie at hotmail.com
Wed Nov 11 03:33:17 EST 2015
> Date: Sun, 8 Nov 2015 01:24:58 +1100
> From: steve at pearwood.info
> To: tutor at python.org
> Subject: Re: [Tutor] question about descriptors
>
> On Sat, Nov 07, 2015 at 12:53:11PM +0000, Albert-Jan Roskam wrote:
>
> [...]
> > Ok, now to my question. I want to create a class with read-only
> > attribute access to the columns of a .csv file. E.g. when a file has a
> > column named 'a', that column should be returned as list by using
> > instance.a. At first I thought I could do this with the builtin
> > 'property' class, but I am not sure how.
>
> 90% of problems involving computed attributes (including "read-only"
> attributes) are most conveniently solved with `property`, but I think
> this may be an exception. Nevertheless, I'll give you a solution in
> terms of `property` first.
>
> I'm too busy/lazy to handle reading from a CSV file, so I'll fake it
> with a dict of columns.
Actually, I want to make this work for any iterable, as long as I can get the header names and as long as it returns one record per iteration.
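To make that concrete, here is a minimal sketch of what I have in mind. It assumes the header names are known up front and that every record has exactly as many fields as there are headers; the helper name records_to_columns is made up for this example.

```python
def records_to_columns(header, records):
    """Transpose an iterable of records into a dict of column lists.

    Assumes every record yields exactly len(header) values.
    """
    columns = dict((name, []) for name in header)
    for record in records:
        for name, value in zip(header, record):
            columns[name].append(value)
    return columns


# Any iterable that returns one record per iteration will do.
rows = iter([(1, 1, 1), (2, 2, 10), (3, 4, 100)])
data = records_to_columns(['a', 'b', 'c'], rows)
print(data['b'])
```

The resulting dict has the same shape as the _data dict used below, so it could stand in for the faked columns.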
> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     @property
>     def a(self):
>         return self._data['a'][:]
>     @property
>     def b(self):
>         return self._data['b'][:]
>     @property
>     def c(self):
>         return self._data['c'][:]
Interesting. I never would have thought to define a separate class for this.
> And in use:
>
> py> cols = ColumnView()
> py> cols.a
> [1, 2, 3, 4, 5, 6]
> py> cols.a = []
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: can't set attribute
>
>
>
> Now, some comments:
>
> (1) You must inherit from `object` for this to work. (Or use Python 3.)
> It won't work if you just say "class ColumnView:", which would make it a
> so-called "classic" or "old-style" class. You don't want that.
Are there any use cases left where one still must use old-style classes? Or should new code always inherit from object (unless one wants to inherit from another "true" class, of course)?
> (2) Inside the property getter functions, I make a copy of the lists
> before returning them. That is, I do:
>
> return self._data['c'][:]
>
> rather than:
>
> return self._data['c']
>
>
> The empty slice [:] makes a copy. If I did not do this, you could mutate
> the list (say, by appending a value to it, or deleting items from it)
> and that mutation would show up the next time you looked at the column.
These mutability problems always make me pull my hair out! :-) I like the [:] notation, but:
In [1]: giant = range(10 ** 7)
In [2]: %timeit copy1 = giant[:]
10 loops, best of 3: 97 ms per loop
In [3]: from copy import copy
In [4]: %timeit copy2 = copy(giant)
10 loops, best of 3: 90 ms per loop
In [5]: import copy
In [6]: %timeit copy2 = copy.copy(giant)
10 loops, best of 3: 88.6 ms per loop
Hmmm, wicked, when I looked earlier this week the difference appeared to be bigger.
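Coming back to the mutation point in (2), a small sketch of the difference between handing out the list itself and handing out a copy. The variable names are my own, and note that [:] makes a shallow copy only.

```python
data = {'c': [1, 10, 100]}

alias = data['c']        # the very same list object
snapshot = data['c'][:]  # a new list with the same items (shallow copy)

alias.append(1000)       # mutating the alias changes the stored column

print(data['c'])
print(snapshot)
```

The stored column picks up the appended value, while the copy made beforehand is unaffected.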
> (3) It's very tedious having to create a property for each column ahead
> of time. But we can do this instead:
>
>
> def make_getter(key):
>     def inner(self):
>         return self._data[key][:]
>     inner.__name__ = key
>     return property(inner)
>
>
> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     for key in _data:
>         locals()[key] = make_getter(key)
>     del key
>
>
> and it works as above, but without all the tedious manual creation of
> property getters.
>
> Do you understand how this operates? If not, ask, and someone will
> explain. (And yes, this is one of the few times that writing to locals()
> actually works!)
I think so. I still plan to write several working implementations to get a better idea about which strategy to choose.
> (4) But what if you don't know what the columns are called ahead of
> time? You can't use property, or descriptors, because you don't know
> what to call the damn things until you know what the column headers are,
> and by the time you know that, the class is already well and truly
> created. You might think you can do this:
>
> class ColumnView(object):
>     def __init__(self):
>         # read the columns from the CSV file
>         self._data = ...
>         # now create properties to suit
>         for key in self._data:
>             setattr(self, key, property( ... ))
>
>
> but that doesn't work. Properties only perform their "magic" when they
> are attached to the class itself. By setting them as attributes on the
> instance (self), they lose their power and just get treated as ordinary
> attributes. To be technical, we say that the descriptor protocol is only
> enacted when the attribute is found in the class, not in the instance.
Ha! That is indeed exactly what I tried! :-))
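For the record, a stripped-down sketch of what I ran into (the Demo class is only for illustration). The property stored on the instance is handed back as a plain object instead of being called:

```python
class Demo(object):
    def __init__(self):
        # Attach a property to the instance rather than to the class.
        self.a = property(lambda self: [1, 2, 3])


d = Demo()
# The descriptor protocol is not enacted for instance attributes,
# so this is the property object itself, not the list.
print(isinstance(d.a, property))
```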
> You might be tempted to write this instead:
>
>     setattr(self.__class__, key, property( ... ))
I thought about defining a classmethod and, inside it, calling setattr(cls, key, property( ... )).
But that is probably the same?
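A quick sketch suggests it is the same, since cls inside a classmethod is the very object that self.__class__ refers to. The add_properties name and the data argument are made up for this example.

```python
class ColumnView(object):
    def __init__(self, data):
        self._data = data
        self.add_properties(data)

    @classmethod
    def add_properties(cls, keys):
        # cls is the shared class object, so every instance is affected.
        for key in keys:
            setattr(cls, key,
                    property(lambda self, key=key: self._data[key][:]))


first = ColumnView({'a': [1, 2, 3]})
second = ColumnView({'b': [4, 5, 6]})

# Creating `second` added a 'b' property to the class that `first` shares.
print('b' in vars(ColumnView))
```

Accessing first.b then fails with a KeyError, because first has no 'b' column even though the class now has a 'b' property.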
> but that's even worse. Now, every time you create a new ColumnView
> instance, *all the other instances will change*. They will grow new
> properties, or overwrite existing properties. You don't want that.
>
> Fortunately, Python has a mechanism for solving this problem:
> the `__getattr__` method and friends.
>
>
> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     def __getattr__(self, name):
>         if name in self._data:
>             return self._data[name][:]
>         else:
>             raise AttributeError
>     def __setattr__(self, name, value):
>         if name in self._data:
>             raise AttributeError('read-only attribute')
>         super(ColumnView, self).__setattr__(name, value)
>     def __delattr__(self, name):
>         if name in self._data:
>             raise AttributeError('read-only attribute')
>         super(ColumnView, self).__delattr__(name)
That also seems very straightforward. Why does "if name in self._data:" not cause a recursion? self._data calls __getattr__, which has self._data in it, which...etc.