[Numpy-discussion] Generator arrays
Lluís
xscript at gmx.net
Fri Jan 28 10:25:45 EST 2011
Travis Oliphant writes:
> This concept has as one use-case, the deferred arrays that Mark Wiebe
> has proposed.
Interesting, I didn't read about that.
In fact, I was playing around with a proxy wrapper for ndarrays not long
ago, in order to build a tree of deferred operations that can be later
optimized through numexpr once __str__ or __repr__ is called on such a
deferred object. The idea was to have something like:
a = np.array(...)
a = defereval(a) # returns a proxy wrapper for known methods of np.ndarray
b = 10 + a ** 2
print b # here the tree of deferred operations is flattened
# into a string that numexpr can use
I didn't play much with it, but proxying all methods but __str__ and
__repr__ (thus iterating on the original a.__dict__) seemed to suffice.
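To make the proxy idea concrete, here is a minimal sketch of how such a wrapper could record operations and flatten them into an expression string. It is not the code I actually wrote: the Deferred class, its flatten/evaluate methods, and this defereval are all hypothetical names, only a handful of operators are proxied, and a plain float stands in for the ndarray so that the sketch has no dependencies (an ndarray operand should behave the same way, since only operators are used).

```python
import operator

# Binary operators this sketch knows how to record and replay.
_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "**": operator.pow}


class Deferred(object):
    """Hypothetical proxy node: records arithmetic instead of running it."""

    def __init__(self, op, args):
        self.op = op      # "operand", "const", or a key of _OPS
        self.args = args  # (value,) for leaves, (left, right) otherwise

    def _binary(self, op, other, swapped=False):
        # Wrap plain values so every node in the tree is a Deferred.
        if not isinstance(other, Deferred):
            other = Deferred("const", (other,))
        return Deferred(op, (other, self) if swapped else (self, other))

    def __add__(self, other):
        return self._binary("+", other)

    def __radd__(self, other):
        return self._binary("+", other, swapped=True)

    def __sub__(self, other):
        return self._binary("-", other)

    def __rsub__(self, other):
        return self._binary("-", other, swapped=True)

    def __mul__(self, other):
        return self._binary("*", other)

    def __rmul__(self, other):
        return self._binary("*", other, swapped=True)

    def __pow__(self, other):
        return self._binary("**", other)

    def flatten(self, operands):
        """Return an expression string; collect named operands in `operands`."""
        if self.op == "const":
            return repr(self.args[0])
        if self.op == "operand":
            for name, known in operands.items():
                if known is self.args[0]:
                    return name  # same object already has a name
            name = "v%d" % len(operands)
            operands[name] = self.args[0]
            return name
        left, right = [arg.flatten(operands) for arg in self.args]
        return "(%s %s %s)" % (left, self.op, right)

    def evaluate(self):
        """Naive evaluation: walk the tree and apply each operator."""
        if self.op in ("const", "operand"):
            return self.args[0]
        left, right = [arg.evaluate() for arg in self.args]
        return _OPS[self.op](left, right)


def defereval(value):
    """Hypothetical entry point: wrap a value as a deferred operand."""
    return Deferred("operand", (value,))


a = defereval(3.0)  # a float stands in for an ndarray here
b = 10 + a ** 2     # nothing is computed yet, only recorded
operands = {}
expr = b.flatten(operands)
```

Here expr is the string "(10 + (v0 ** 2))" and operands maps "v0" to the wrapped value. With numexpr installed, that pair could presumably be handed to numexpr.evaluate(expr, local_dict=operands); calling flatten from __str__ or __repr__ would give the behaviour described above.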
The benefit I see in building this into ndarray itself is that ndarray
would then be the hourglass waist of the framework.
Subclassing ndarray is moderately complex right now, so I think that
having a way to move some of these subclasses below the hourglass waist
and not having to deal with the overloading of ndarray's UI would be a
big step forward towards extension code simplicity.
So, having near-zero knowledge of the internals of numpy and of all the
new features that have been discussed here, my naive view of what the
stack should contain is:
* ndarray subclasses
Overload indexing (e.g., data_array's named dimension elements),
translating any fancy indexing into ndarray's "native" indexing
methods
Overload user representation (e.g., show some extra info when printing
an array)
* ndarray slicing and numeric operations
A central point for slicing/indexing (the output should be either
views or copies)
A central point to control the deferral of operations (both native and
extensions - see below -). In fact, I see deferred operations as just
a form of copy-on-write/evaluate-on-access views (COW must be used
when one of the input operands of a deferred tree of operations is
modified after capturing it into such a tree).
* numeric operations extensions
Numeric operations should be first-class if deferred operation
evaluation is to be taken to its highest potential, and thus they
should be aware of an "operation evaluation engine" (as well as the
other way around).
If they are not (and they should be able not to be), two things can
happen:
- those based only on first-class operations are each just the root
of a subtree
- those performing more complex operations (explicit looping?)
simply reduce the opportunities for optimizing operation
evaluation (actually producing multiple evaluation trees, or maybe
simply forcing evaluation).
* operation evaluation engine
This would take care of evaluating the operation tree, while
performing optimizations on it.
Fortunately, if a sensible interface is established between this and
first-class numeric operations, a first implementation can provide
just the naive evaluation, and further optimizations can be provided
behind the scenes.
Such optimizations would provide things like operation tree
simplification/reorganization, blocking (a la numexpr) and
parallelization of computations.
* storage access extensions
Slicing in ndarray should be aware of objects represented by means
other than "plain strided memory buffers": e.g., the compressed array
case (where decompression could be treated with a sliding window), or
deferred operation evaluation itself.
In fact, as you pointed out with the MEMORY flag, both storage and
operation evaluation can be subject to the common concept of deferral
(accessing a compressed array is just another form of accessing
computed contents, like accessing elements on a deferred array).
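As an illustration of the "operation evaluation engine" layer above, the following sketch shows how a naive evaluator and an optional optimization pass could sit behind one interface. Everything in it is an assumption made for the example: the Engine class, the Operand wrapper, the tuple-based tree format, and fold_constants are hypothetical, and a float again stands in for array data.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "**": operator.pow}


class Operand(object):
    """Hypothetical leaf holding data (an ndarray in the real case)."""

    def __init__(self, value):
        self.value = value


def naive_evaluate(tree):
    """Evaluate nested (op, left, right) tuples without any optimization."""
    if isinstance(tree, Operand):
        return tree.value
    if not isinstance(tree, tuple):
        return tree  # a plain Python constant
    op, left, right = tree
    return OPS[op](naive_evaluate(left), naive_evaluate(right))


def fold_constants(tree):
    """Example pass: pre-compute subtrees that contain only constants."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


class Engine(object):
    """Runs the configured passes, then falls back to naive evaluation."""

    def __init__(self, passes=()):
        self.passes = list(passes)

    def evaluate(self, tree):
        for optimize in self.passes:
            tree = optimize(tree)
        return naive_evaluate(tree)


# (2 * 5) + x ** 2, with x as the only real operand
tree = ("+", ("*", 2, 5), ("**", Operand(3.0), 2))
plain = Engine().evaluate(tree)
tuned = Engine([fold_constants]).evaluate(tree)
folded = fold_constants(tree)
```

Both engines return the same value; the second only simplifies the tree first, so callers do not need to know which passes are active. Blocking or parallel evaluation would be further passes or a replacement for naive_evaluate behind the same evaluate() call.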
I just hope these are not all just obvious restatements of what has
already been said.
Lluis
PS: sorry for the unnecessarily long mail
--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth