[Tutor] question about descriptors

Wed Nov 11 03:33:17 EST 2015

> Date: Sun, 8 Nov 2015 01:24:58 +1100
> From: steve at pearwood.info
> To: tutor at python.org
> Subject: Re: [Tutor] question about descriptors
> 
> On Sat, Nov 07, 2015 at 12:53:11PM +0000, Albert-Jan Roskam wrote:
> 
> [...]
> > Ok, now to my question. I want to create a class with read-only 
> > attribute access to the columns of a .csv file. E.g. when a file has a 
> > column named 'a', that column should be returned as a list by using 
> > instance.a. At first I thought I could do this with the builtin 
> > 'property' class, but I am not sure how. 
> 
> 90% of problems involving computed attributes (including "read-only" 
> attributes) are most conveniently solved with `property`, but I think 
> this may be an exception. Nevertheless, I'll give you a solution in 
> terms of `property` first.
> 
> I'm too busy/lazy to handle reading from a CSV file, so I'll fake it 
> with a dict of columns.

Actually, I want to make this work for any iterable, as long as I can get the header names and as long as it returns one record per iteration.
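Here is a sketch of that first step. It is not code from this thread. A hypothetical helper, `columns_from_records`, accepts any iterable that yields one record per iteration. It assumes the first record holds the header names, which is how `csv.reader` delivers them. It then transposes the records into the dict-of-columns shape that Steven's examples use.

```python
import csv
import io


def columns_from_records(records):
    """Turn an iterable of records into a dict mapping header -> column list.

    Assumption: the first record is the header row, and later records
    line up with it field by field (extra fields are silently dropped by zip).
    """
    rows = iter(records)
    header = next(rows)
    columns = {name: [] for name in header}
    for record in rows:
        for name, value in zip(header, record):
            columns[name].append(value)
    return columns


# A csv.reader is just one possible source; a list of tuples works too.
print(columns_from_records(csv.reader(io.StringIO("a,b\n1,2\n3,4\n"))))
# {'a': ['1', '3'], 'b': ['2', '4']}
print(columns_from_records([("x", "y"), (1, 2), (3, 4)]))
# {'x': [1, 3], 'y': [2, 4]}
```

`csv.reader` yields strings, so any type conversion would still be up to you.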

> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     @property
>     def a(self):
>         return self._data['a'][:]
>     @property
>     def b(self):
>         return self._data['b'][:]
>     @property
>     def c(self):
>         return self._data['c'][:]

Interesting. I never would have thought to define a separate class for this.

> And in use:
> 
> py> cols = ColumnView()
> py> cols.a
> [1, 2, 3, 4, 5, 6]
> py> cols.a = []
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: can't set attribute
> 
> 
> 
> Now, some comments:
> 
> (1) You must inherit from `object` for this to work. (Or use Python 3.) 
> It won't work if you just say "class ColumnView:", which would make it a 
> so-called "classic" or "old-style" class. You don't want that.

Are there any use cases left where one still must use old-style classes? Or should new code always inherit from object (unless one wants to inherit from another "true" class, of course)?
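One data point for this question, assuming Python 3: there the distinction is gone. A bare `class Foo:` already inherits from `object`, so both spellings below produce the same kind of class. In Python 2, `type()` of the first class would instead be the old-style `classobj`. The class names are made up.

```python
class Implicit:          # no explicit base class
    pass


class Explicit(object):  # explicit base class, as recommended for Python 2
    pass


# In Python 3 both are new-style classes with the same metaclass and bases.
print(Implicit.__bases__ == Explicit.__bases__ == (object,))  # True
print(type(Implicit) is type(Explicit) is type)                # True
```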

> (2) Inside the property getter functions, I make a copy of the lists 
> before returning them. That is, I do:
> 
>     return self._data['c'][:]
> 
> rather than:
> 
>     return self._data['c']
> 
> 
> The empty slice [:] makes a copy. If I did not do this, you could mutate 
> the list (say, by appending a value to it, or deleting items from it) 
> and that mutation would show up the next time you looked at the column.
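To make the copying point concrete, here is a minimal sketch with two hypothetical classes. One returns the stored list itself and the other returns a slice copy:

```python
class LeakyView(object):
    _data = {'a': [1, 2, 3]}

    @property
    def a(self):
        # Returns the stored list itself, not a copy.
        return self._data['a']


class SafeView(object):
    _data = {'a': [1, 2, 3]}

    @property
    def a(self):
        # The empty slice returns a new (shallow) copy on every access.
        return self._data['a'][:]


leaky, safe = LeakyView(), SafeView()
leaky.a.append(99)
safe.a.append(99)
print(leaky.a)  # [1, 2, 3, 99]: the mutation stuck
print(safe.a)   # [1, 2, 3]: only the throwaway copy changed
```

Bear in mind that `[:]` is a shallow copy. If a column held mutable objects (say, lists), those inner objects would still be shared.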

These mutability problems always make me pull my hair out! :-) I like the [:] notation, but: 

In [1]: giant = range(10 ** 7)

In [2]: %timeit copy1 = giant[:]
10 loops, best of 3: 97 ms per loop

In [3]: from copy import copy

In [4]: %timeit copy2 = copy(giant)
10 loops, best of 3: 90 ms per loop

In [5]: import copy

In [6]: %timeit copy2 = copy.copy(giant)
10 loops, best of 3: 88.6 ms per loop

Hmmm, weird: when I looked earlier this week, the difference appeared to be bigger.
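Timings like these vary from run to run, so a single `%timeit` comparison is easy to over-read. Below is a sketch using the standard `timeit` module. It assumes Python 3, where `range()` no longer returns a list (so it is materialised with `list()` first) and where `list.copy()` exists from 3.3 on. The sketch also checks that every variant yields an equal but distinct shallow copy. No timings are claimed here; run it on your own machine.

```python
import copy
import timeit

giant = list(range(10 ** 6))  # Python 3: materialise the range as a real list

candidates = {
    "slice [:]": lambda: giant[:],
    "list()": lambda: list(giant),
    "copy.copy": lambda: copy.copy(giant),
    "list.copy": lambda: giant.copy(),  # available since Python 3.3
}

for label, make_copy in candidates.items():
    duplicate = make_copy()
    # Each variant gives an equal list that is a different object (shallow copy).
    assert duplicate == giant and duplicate is not giant
    # Best of 3 repeats, 10 copies each; report milliseconds per copy.
    best = min(timeit.repeat(make_copy, number=10, repeat=3))
    print("%-10s %6.2f ms per copy" % (label, best / 10 * 1000))
```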

> (3) It's very tedious having to create a property for each column ahead 
> of time. But we can do this instead:
> 
> 
> def make_getter(key):
>     def inner(self):
>         return self._data[key][:]
>     inner.__name__ = key
>     return property(inner)
> 
> 
> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     for key in _data:
>         locals()[key] = make_getter(key)
>     del key
> 
> 
> and it works as above, but without all the tedious manual creation of 
> property getters.
> 
> Do you understand how this operates? If not, ask, and someone will 
> explain. (And yes, this is one of the few times that writing to locals() 
> actually works!)
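Writing to `locals()` works here because, in CPython, a class body's `locals()` is the very namespace that becomes the class dictionary. Even so, the documentation for `locals()` has long advised against modifying it. A variant that sidesteps that question attaches the generated properties with `setattr` once the class statement has finished. This is an alternative sketch, not what the post above does:

```python
def make_getter(key):
    """Build a read-only property that returns a copy of column `key`."""
    def inner(self):
        return self._data[key][:]
    inner.__name__ = key
    return property(inner)


class ColumnView(object):
    _data = {'a': [1, 2, 3, 4, 5, 6],
             'b': [1, 2, 4, 8, 16, 32],
             'c': [1, 10, 100, 1000, 10000, 100000],
             }


# Add one property per column after the class exists, instead of via locals().
for column in ColumnView._data:
    setattr(ColumnView, column, make_getter(column))
del column
```

In use it behaves like the `locals()` version: `ColumnView().b` returns a fresh copy of that column, and assigning to `ColumnView().a` raises `AttributeError`. The exact message differs between Python versions.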

I think so. I still plan to write several working implementations to get a better idea of which strategy to choose. 

> (4) But what if you don't know what the columns are called ahead of 
> time? You can't use property, or descriptors, because you don't know 
> what to call the damn things until you know what the column headers are, 
> and by the time you know that, the class is already well and truly 
> created. You might think you can do this:
> 
> class ColumnView(object):
>     def __init__(self):
>         # read the columns from the CSV file
>         self._data = ...
>         # now create properties to suit
>         for key in self._data:
>             setattr(self, key, property( ... ))
> 
> 
> but that doesn't work. Properties only perform their "magic" when they 
> are attached to the class itself. By setting them as attributes on the 
> instance (self), they lose their power and just get treated as ordinary 
> attributes. To be technical, we say that the descriptor protocol is only 
> enacted when the attribute is found in the class, not in the instance.
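Here is a small demonstration of that rule, using a hypothetical `Demo` class. The same kind of `property` object computes a value when it is stored on the class. When it is stored on the instance, it is handed back untouched.

```python
class Demo(object):
    # Found on the class, so attribute lookup calls the property's __get__.
    on_class = property(lambda self: 42)

    def __init__(self):
        # Stored in the instance __dict__, so no descriptor magic happens.
        self.on_instance = property(lambda self: 42)


d = Demo()
print(d.on_class)                           # 42
print(isinstance(d.on_instance, property))  # True: the raw property object
```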

Ha! That is indeed exactly what I tried! :-))

> You might be tempted to write this instead:
> 
>             setattr(self.__class__, key, property( ... ))

I thought about defining a classmethod and then, inside it, doing setattr(cls, key, property( ... )).
But that is probably the same thing?

> but that's even worse. Now, every time you create a new ColumnView 
> instance, *all the other instances will change*. They will grow new 
> properties, or overwrite existing properties. You don't want that.
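Here is a sketch of that failure mode, using a hypothetical `SharedView` class that installs properties on `type(self)` from `__init__`. A classmethod would behave the same way, since its `cls` argument is that same shared class object. Once a second instance is created, the first one has grown a `b` property it has no data for:

```python
class SharedView(object):
    def __init__(self, data):
        self._data = data
        # Anti-pattern: these properties land on the class, shared by all instances.
        # key=key binds the current loop value into each getter.
        for key in data:
            setattr(type(self), key,
                    property(lambda self, key=key: self._data[key][:]))


first = SharedView({'a': [1, 2, 3]})
second = SharedView({'b': [4, 5, 6]})

# Both properties now live on the one class object.
print(sorted(k for k in vars(SharedView) if not k.startswith('_')))  # ['a', 'b']
try:
    first.b  # the property exists, but first has no 'b' column
except KeyError as exc:
    print('first.b is broken:', exc)  # first.b is broken: 'b'
```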
> 
> Fortunately, Python has a mechanism for solving this problem: 
> the `__getattr__` method and friends.
> 
> 
> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     def __getattr__(self, name):
>         if name in self._data:
>             return self._data[name][:]
>         else:
>             raise AttributeError
>     def __setattr__(self, name, value):
>         if name in self._data:
>             raise AttributeError('read-only attribute')
>         super(ColumnView, self).__setattr__(name, value)
>     def __delattr__(self, name):
>         if name in self._data:
>             raise AttributeError('read-only attribute')
>         super(ColumnView, self).__delattr__(name)
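The class above keeps `_data` on the class. The same `__getattr__` approach also covers the case from point (4), where the columns are only known at run time. The sketch below assumes the columns arrive as a dict when the instance is created, for example from a helper that transposes CSV rows. There is one wrinkle. Because `__setattr__` itself consults `self._data`, `__init__` has to store `_data` with `object.__setattr__`. In addition, `__getattr__` reads `_data` via `self.__dict__`, so a missing `_data` cannot send it into recursion.

```python
class ColumnView(object):
    """Read-only access to columns that are only known at run time."""

    def __init__(self, columns):
        # Bypass our own __setattr__, which needs self._data to exist already.
        object.__setattr__(self, '_data', dict(columns))

    def __getattr__(self, name):
        # Called only when normal lookup fails. Read _data via __dict__ so a
        # missing _data raises AttributeError instead of recursing.
        data = self.__dict__.get('_data', {})
        if name in data:
            return data[name][:]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self._data:
            raise AttributeError('read-only attribute')
        super(ColumnView, self).__setattr__(name, value)

    def __delattr__(self, name):
        if name in self._data:
            raise AttributeError('read-only attribute')
        super(ColumnView, self).__delattr__(name)


cols = ColumnView({'a': [1, 3], 'b': [2, 4]})
print(cols.a)        # [1, 3]
cols.note = 'fine'   # names that are not columns can still be set
try:
    cols.b = []
except AttributeError as exc:
    print(exc)       # read-only attribute
```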

That also seems very straightforward. But why doesn't "if name in self._data:" cause infinite recursion? I would expect self._data to call __getattr__, which contains self._data, which... etc.
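The short answer is that `__getattr__` is only a fallback. Python calls it only after normal attribute lookup, which searches the instance and its class hierarchy, has failed. In Steven's version `_data` is a class attribute, so `self._data` is found by normal lookup and never reaches `__getattr__`. The recursion you describe is real, though, whenever `_data` cannot be found at all. Here is a sketch with two hypothetical classes, `Traced` and `Broken`:

```python
class Traced(object):
    _data = {'a': [1, 2, 3]}

    def __getattr__(self, name):
        # Only reached after normal lookup has failed for `name`.
        print('__getattr__ called for %r' % name)
        if name in self._data:  # _data is found on the class: no re-entry
            return self._data[name][:]
        raise AttributeError(name)


class Broken(object):
    # No _data anywhere, so self._data below re-enters __getattr__('_data').
    def __getattr__(self, name):
        if name in self._data:
            return self._data[name][:]
        raise AttributeError(name)


t = Traced()
t._data   # found on the class; prints nothing
t.a       # prints: __getattr__ called for 'a'

try:
    Broken().a
except RuntimeError:  # RecursionError (a RuntimeError subclass) on Python 3.5+
    print('infinite recursion in __getattr__')
```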