[Numpy-discussion] Subclasses - use of __finalize__
Pierre GM
pgmdevlist at gmail.com
Mon Dec 18 13:12:50 EST 2006
On Saturday 16 December 2006 19:55, Colin J. Williams wrote:
Colin,
First of all, a disclaimer: I'm a (bad) hydrologist, not a computer scientist.
I learned python/numpy by playing around, and only really got into subclassing
3-4 months ago. My explanations might not be completely accurate, so I'll
ask more experienced users to correct me if I'm wrong.
`__new__` is the class constructor method. A call to `__new__(cls,...)`
creates a new instance of the class `cls` but doesn't initialize it: that's
the role of the `__init__` method. According to the Python documentation:
If __new__() returns an instance of cls, then the new instance's
__init__() method will be invoked like "__init__(self[, ...])", where self is
the new instance and the remaining arguments are the same as were passed to
__new__().
If __new__() does not return an instance of cls, then the new instance's
__init__() method will not be invoked.
__new__() is intended mainly to allow subclasses of immutable types (like
int, str, or tuple) to customize instance creation.
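To see the quoted rule in action, here is a small pure-Python (Python 3) sketch. The classes `Keeps` and `Swaps` are made up for illustration: `__init__` runs only when `__new__` returns an instance of the class.

```python
# Records which methods ran, in order.
log = []

class Keeps:
    def __new__(cls, value):
        log.append("Keeps.__new__")
        return super().__new__(cls)      # an instance of cls ...

    def __init__(self, value):
        log.append("Keeps.__init__")     # ... so __init__ is invoked
        self.value = value

class Swaps:
    def __new__(cls, value):
        log.append("Swaps.__new__")
        return value                     # not an instance of cls ...

    def __init__(self, value):
        log.append("Swaps.__init__")     # ... so this never runs

k = Keeps(1)
s = Swaps(2)
print(log)   # ['Keeps.__new__', 'Keeps.__init__', 'Swaps.__new__']
print(s)     # 2
```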
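It turns out that ndarrays behave like immutable types here: most new arrays
(slices, views, results of arithmetic or of methods like `reshape`) are
created without going through `__new__` or `__init__` at all, so we can't rely
on `__init__` to initialize them. How can we initialize the instance, then? By
defining `__array_finalize__`.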
`__array_finalize__` is called automatically once an instance is created with
`__new__`. Moreover, it is called each time a new array is returned by a
method, even if the method doesn't specifically call `__new__`.
For example, `__add__` and `reshape` return new arrays, so
`__array_finalize__` is called. Note that these methods do not create a new
array from scratch, so there is no call to `__new__`.
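One way to check these claims on a current NumPy (Python 3) is to count how often each hook runs. The `Counted` class and the `calls` counter below are hypothetical, for illustration only. The exact number of `__array_finalize__` calls for arithmetic may vary between NumPy versions, so the sketch only relies on a lower bound.

```python
import numpy as np
from collections import Counter

calls = Counter()  # how many times each hook has run

class Counted(np.ndarray):
    def __new__(cls, data):
        calls["__new__"] += 1
        return np.asarray(data).view(cls)   # .view triggers __array_finalize__

    def __init__(self, data):
        calls["__init__"] += 1              # runs only for direct Counted(...) calls

    def __array_finalize__(self, obj):
        calls["__array_finalize__"] += 1

a = Counted([1, 2, 3])
b = a[1:]              # slicing: new view, no __new__/__init__
c = a + 1              # arithmetic: new array, no __new__/__init__
d = a.reshape((3, 1))  # reshape: new array, no __new__/__init__

print(type(b).__name__, type(c).__name__, type(d).__name__)  # Counted Counted Counted
print(calls["__new__"], calls["__init__"])                    # 1 1
```

Every new array is still a `Counted`, and `__array_finalize__` ran for each of them, while `__new__` and `__init__` ran only once.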
As another example, we can also modify the shape of the array with `resize`.
However, this method works in place, so a new array is NOT created.
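The reshape/resize contrast can be checked the same way. In this sketch the `Logged` class and the `finalize_log` list are hypothetical. The array is made with `.copy()` so that it owns its data (NumPy's `resize` generally needs that), and `refcheck=False` only avoids spurious reference-count errors in interactive sessions.

```python
import numpy as np

finalize_log = []  # type name of obj for each __array_finalize__ call

class Logged(np.ndarray):
    def __array_finalize__(self, obj):
        finalize_log.append(type(obj).__name__)

# .copy() gives a Logged array that owns its buffer.
a = np.arange(10).view(Logged).copy()
finalize_log.clear()

a.resize((5, 2), refcheck=False)  # in place: no new array, no hook call
after_resize = len(finalize_log)

b = a.reshape((2, 5))             # returns a new (view) array: hook runs
after_reshape = len(finalize_log)

print(a.shape, after_resize)  # (5, 2) 0
```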
About the `obj` argument of `__array_finalize__`:
The first time an instance of the subclass is created (through `__new__`),
`__array_finalize__` is called with `obj` set to a regular ndarray. Afterwards,
when a new array is returned without a call to `__new__`, `obj` is the original
subclass instance (the one whose method was called).
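The behaviour of `obj` can be seen by recording its type on every call. The `Tagged` class, its `tag` attribute and the `seen` list are hypothetical. The `getattr(obj, "tag", None)` idiom is one common way to copy an attribute when `obj` may be a plain ndarray (or `None`, which NumPy passes when `ndarray.__new__` is called directly).

```python
import numpy as np

seen = []  # type of obj passed to each __array_finalize__ call

class Tagged(np.ndarray):
    def __new__(cls, data, tag=None):
        obj = np.asarray(data).view(cls)  # first call: obj is a plain ndarray
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        seen.append(type(obj))
        # Inherit the tag from obj when it has one (a Tagged array), else None.
        self.tag = getattr(obj, "tag", None)

x = Tagged(np.arange(10), tag="x")
y = x[2:]            # later calls: obj is the Tagged array being sliced

print(seen[0] is np.ndarray, seen[1] is Tagged)  # True True
print(y.tag)                                      # x
```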
The easiest way to understand all this is to try it and see what happens.
Here's a small script that defines an `InfoArray` class: just an ndarray with a
tag attached. That's basically the class from the wiki, with messages printed
in `__new__` and `__array_finalize__`. I've included some doctests to
illustrate the concepts; I hope they're self-explanatory.
Please let me know whether it helps. If it does, I'll update the wiki page.
##############################################
"""
Let us define a new InfoArray object:

>>> x = InfoArray(N.arange(10), info={'name':'x'})
__new__ received <type 'numpy.ndarray'>
__new__ sends <type 'numpy.ndarray'> as <class '__main__.InfoArray'>
__array_finalize__ received <type 'numpy.ndarray'>
__array_finalize__ defined <class '__main__.InfoArray'>

Let's get the first element:
>>> x[0]
0

We expect a scalar and we get a scalar: everything's fine. If we now want all
the elements, we can use `x[:]`, which calls `__getslice__` and returns a new
array. Therefore, we expect `__array_finalize__` to be called:
>>> x[:]
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
InfoArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Let's add 1 to the array: this operation calls the `__add__` method, which
returns a new array from `x`:
>>> x+1
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
InfoArray([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Let us change the shape of the array from *(10,)* to *(2,5)* with the
`reshape` method. The method returns a new array, so we expect a call to
`__array_finalize__`:
>>> y = x.reshape((2,5))
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>

If we now print `y`, we call its `__str__` method, which in turn creates one
array per row: we expect 2 calls to `__array_finalize__`:
>>> print y
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
[[0 1 2 3 4]
 [5 6 7 8 9]]

Let's change the shape of `y` back to *(10,)*, but using the `resize` method
this time. `resize` works in place, so no new array is created and
`__array_finalize__` is not called:
>>> y.resize((10,))
>>> y.shape
(10,)

OK, and what about `transpose`? It returns a new array (1 call), and printing
it creates one array per row (*rows* calls), for a total of *rows+1* calls.
After resizing `y` to *(5,2)*, `y.T` has 2 rows, so we expect 3 calls:
>>> y.resize((5,2))
>>> print y.T
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
[[0 2 4 6 8]
 [1 3 5 7 9]]

Now let's create a new InfoArray from an existing one. `__new__` is called,
but as the argument is already an InfoArray, the *__new__ sends...* line is
bypassed. Moreover, if we don't specify the dtype, the test `dtype==data.dtype`
fails, so we call `data.astype`, which returns a new array and therefore calls
`__array_finalize__`. Then `__array_finalize__` is called a second time, by
`.view(subtype)`, to initialize the returned object.
>>> z = InfoArray(x)
__new__ received <class '__main__.InfoArray'>
__new__ saw another dtype.
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>

Note that if we specify the dtype, we don't need to call `data.astype`, and
`__array_finalize__` gets called only once:
>>> z = InfoArray(x, dtype=x.dtype)
__new__ received <class '__main__.InfoArray'>
__new__ saw the same dtype.
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
"""
import numpy as N

class InfoArray(N.ndarray):
    def __new__(subtype, data, info=None, dtype=None, copy=False):
        print "__new__ received %s" % type(data)
        # When data is already an InfoArray, return it (or a converted copy)
        # as a view of the subtype; the class-level default below is skipped.
        if isinstance(data, InfoArray):
            if not copy and dtype==data.dtype:
                print "__new__ saw the same dtype."
                return data.view(subtype)
            else:
                print "__new__ saw another dtype."
                # Careful: with dtype=None, astype falls back to NumPy's
                # default float dtype.
                return data.astype(dtype).view(subtype)
        # Note: this stores the tag on the *class*, as the default that
        # __array_finalize__ uses for arrays without an info attribute yet.
        subtype._info = info
        subtype.info = subtype._info
        print "__new__ sends %s as %s" % (type(N.asarray(data)), subtype)
        return N.array(data).view(subtype)

    def __array_finalize__(self,obj):
        print "__array_finalize__ received %s" % type(obj)
        if hasattr(obj, "info"):
            # The object already has an info tag: just use it
            self.info = obj.info
        else:
            # The object has no info tag: use the default
            self.info = self._info
        print "__array_finalize__ defined %s" % type(self)

def _test():
    import doctest
    doctest.testmod(verbose=True)

if __name__ == "__main__":
    _test()
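For Python 3 and a recent NumPy, here is a hedged sketch of the same idea, not a drop-in replacement for the script above. It keeps the `InfoArray` name and the `info`, `dtype` and `copy` parameters, but stores the tag per instance instead of on the class (so creating one `InfoArray` no longer changes the default for the next), treats `dtype=None` as "keep the input dtype" instead of going through `astype(None)`, and drops the debugging prints. It uses `np.asarray` rather than `np.array(..., copy=False)` because in NumPy 2 `copy=False` means "never copy" and raises when a copy would be needed.

```python
import numpy as np

class InfoArray(np.ndarray):
    """An ndarray with an ``info`` tag attached (sketch)."""

    def __new__(cls, data, info=None, dtype=None, copy=False):
        if copy:
            base = np.array(data, dtype=dtype)    # always a fresh buffer
        else:
            base = np.asarray(data, dtype=dtype)  # reuses data's buffer when possible
        obj = base.view(cls)                      # triggers __array_finalize__
        # An explicit info wins; otherwise inherit it from an InfoArray input.
        obj.info = info if info is not None else getattr(data, "info", None)
        return obj

    def __array_finalize__(self, obj):
        # obj is a plain ndarray (from __new__), another InfoArray (slicing,
        # reshape, ...) or None (direct ndarray.__new__ call).
        self.info = getattr(obj, "info", None)

x = InfoArray(np.arange(10), info={"name": "x"})
y = x.reshape((2, 5))
z = InfoArray(x)                 # same dtype, shares x's memory
w = InfoArray(x, dtype=float)    # converted copy

print(y.info, z.dtype == x.dtype, np.shares_memory(z, x))  # {'name': 'x'} True True
print(w.dtype, w.info)                                     # float64 {'name': 'x'}
```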