[Numpy-discussion] Generator arrays

Lluís xscript at gmx.net
Fri Jan 28 10:25:45 EST 2011


Travis Oliphant writes:

> This concept has as one use-case, the deferred arrays that Mark Wiebe
> has proposed.

Interesting, I hadn't read about that.

In fact, I was playing around with a proxy wrapper for ndarrays not long
ago, in order to build a tree of deferred operations that can be later
optimized through numexpr once __str__ or __repr__ is called on such a
deferred object. The idea was to have something like:

a = np.array(...)
a = defereval(a)  # returns a proxy wrapper for known methods of np.ndarray
b = 10 + a ** 2   # nothing is computed yet; b holds the operation tree
print(b)          # here the tree of deferred operations is flattened
                  # into a string that numexpr can use

I didn't play much with it, but proxying all methods but __str__ and
__repr__ (thus iterating on the original a.__dict__) seemed to suffice.
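
For illustration, here is a minimal sketch of what such a proxy could
look like. The Deferred class and its wrap/evaluate names are
hypothetical, only three operators are proxied, and constants are
assumed to be plain Python numbers. Python's built-in eval stands in for
numexpr so that the snippet has no dependencies; with numexpr installed,
numexpr.evaluate(expr, local_dict=operands) could presumably take its
place.

```python
import itertools

_names = itertools.count()  # source of unique operand names


class Deferred(object):
    """Hypothetical proxy that records operations instead of running them."""

    def __init__(self, expr, operands):
        self.expr = expr          # expression string built so far
        self.operands = operands  # operand name -> captured value

    @classmethod
    def wrap(cls, value):
        # Give the wrapped value a fresh name usable inside the expression.
        name = "v%d" % next(_names)
        return cls(name, {name: value})

    def _binary(self, op, other, reflected=False):
        if isinstance(other, Deferred):
            rhs = other.expr
            operands = dict(self.operands, **other.operands)
        else:
            rhs = repr(other)  # assumption: a plain Python number
            operands = dict(self.operands)
        lhs = self.expr
        if reflected:
            lhs, rhs = rhs, lhs
        return Deferred("(%s %s %s)" % (lhs, op, rhs), operands)

    def __add__(self, other):
        return self._binary("+", other)

    def __radd__(self, other):
        return self._binary("+", other, reflected=True)

    def __mul__(self, other):
        return self._binary("*", other)

    def __rmul__(self, other):
        return self._binary("*", other, reflected=True)

    def __pow__(self, other):
        return self._binary("**", other)

    def evaluate(self):
        # Stand-in for an expression evaluator such as numexpr.
        return eval(self.expr, {"__builtins__": {}}, self.operands)

    def __repr__(self):
        # Evaluation is only forced when the object is displayed.
        return repr(self.evaluate())


a = Deferred.wrap(3.0)  # would be an ndarray in practice
b = 10 + a ** 2         # nothing is computed yet
print(b.expr)           # (10 + (v0 ** 2))
print(b)                # 19.0
```

Building a string keeps the proxy trivial, at the price of losing type
information about the operands until evaluation time.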


The benefit I see in building this into ndarray itself is that ndarray
would then be the hourglass waist of the framework.

Subclassing ndarray is moderately complex right now, so I think that
having a way to move some of these subclasses below the hourglass waist
and not having to deal with the overloading of ndarray's UI would be a
big step forward towards extension code simplicity.

So, having near-zero knowledge of the internals of numpy and of all the
new features that have been discussed here, my naive view of what the
stack should contain is:

* ndarray subclasses

  Overload indexing (e.g., data_array's named dimension elements),
  translating any fancy indexing into ndarray's "native" indexing
  methods

  Overload user representation (e.g., show some extra info when printing
  an array)

* ndarray slicing and numeric operations

  A central point for slicing/indexing (the output should be either
  views or copies)

  A central point to control the deferral of operations (both native
  ones and extensions; see below). In fact, I see deferred operations as
  just a form of copy-on-write/evaluate-on-access views (COW must be
  used when one of the input operands of a deferred tree of operations
  is modified after being captured into such a tree).

* numeric operations extensions

  Numeric operations should be first-class if deferred operation
  evaluation is to be taken to its highest potential, and thus they
  should be aware of an "operation evaluation engine" (as well as the
  other way around).

  If they are not (and they should be allowed not to be), two things can
  happen:

  - an extension based only on first-class operations is just the root
    of a subtree

  - if more complex operations are performed (explicit looping?), they
    simply diminish the range of possibilities for optimizing operation
    evaluation (actually producing multiple evaluation trees, or maybe
    simply forcing evaluation).

* operation evaluation engine

  This would take care of evaluating the operation tree, while
  performing optimizations on it.

  Fortunately, if a sensible interface is established between this and
  first-class numeric operations, a first implementation can provide
  just the naive evaluation, and further optimizations can be provided
  behind the scenes.

  Such optimizations would provide things like operation tree
  simplification/reorganization, blocking (a la numexpr) and
  parallelization of computations.

* storage access extensions

  Slicing in ndarray should be aware of objects represented by means
  other than "plain strided memory buffers": e.g., the compressed array
  case (where decompression could be treated with a sliding window), or
  deferred operation evaluation itself.

  In fact, as you pointed out with the MEMORY flag, both storage and
  operation evaluation can be subject to the common concept of deferral
  (accessing a compressed array is just another form of accessing
  computed contents, like accessing elements on a deferred array).
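
To make the split between first-class operations and the evaluation
engine a bit more concrete, here is a rough sketch under several
assumptions: the Leaf and Op node types and the two evaluate_* functions
are invented for this example, plain Python lists stand in for ndarrays,
and every tree is assumed to contain at least one Leaf. Both engines
accept the same tree, so the naive one can ship first and the blocked
one (in the spirit of numexpr) can replace it later without touching the
code that builds trees.

```python
import operator

# First-class operations known to the engine.
OPS = {"+": operator.add, "*": operator.mul, "**": operator.pow}


class Leaf(object):
    """Array operand (a list here, an ndarray in practice)."""

    def __init__(self, data):
        self.data = list(data)


class Op(object):
    """Deferred binary operation; operands are nodes or scalars."""

    def __init__(self, symbol, left, right):
        self.symbol, self.left, self.right = symbol, left, right


def _length(node):
    # Length of the first array found in the tree, None for scalars.
    if isinstance(node, Leaf):
        return len(node.data)
    if isinstance(node, Op):
        n = _length(node.left)
        return n if n is not None else _length(node.right)
    return None


def _eval_slice(node, start, stop):
    # Evaluate the tree restricted to elements [start, stop).
    if isinstance(node, Leaf):
        return node.data[start:stop]
    if isinstance(node, Op):
        fn = OPS[node.symbol]
        left = _eval_slice(node.left, start, stop)
        right = _eval_slice(node.right, start, stop)
        return [fn(x, y) for x, y in zip(left, right)]
    return [node] * (stop - start)  # broadcast a scalar


def evaluate_naive(tree):
    # One pass per node over whole arrays: full-size temporaries.
    return _eval_slice(tree, 0, _length(tree))


def evaluate_blocked(tree, block=4):
    # Same result, but temporaries never exceed `block` elements.
    n = _length(tree)
    out = []
    for start in range(0, n, block):
        out.extend(_eval_slice(tree, start, min(start + block, n)))
    return out


tree = Op("+", 10, Op("**", Leaf(range(10)), 2))  # 10 + a ** 2
print(evaluate_naive(tree) == evaluate_blocked(tree, block=3))  # True
```

A compressed or otherwise computed storage backend would fit the same
picture as another kind of leaf whose slice is produced on access.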


I just hope these are not all just obvious restatements of what has
already been said.


Lluis

PS: sorry for the unnecessarily long mail

--
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


