[Neuroimaging] iteraxis API - we need feedback

Sat Sep 5 01:06:24 CEST 2015

Hi,

Over at nibabel gh-344 [1], we found ourselves discussing how to write
an iterator that will allow you to efficiently iterate over slices
from the image array.   We'd love some feedback on where we got to.

As some of you may know, images now have a `dataobj` attribute that
can contain one of two things:

* an array proxy (if you loaded the image from a file);
* a numpy array (if you created the image with data from an array).

The array proxy object supports slicing, so something like
``img.dataobj[..., 0]`` will only read the data for the first slice on
the last axis.  This can be a lot more efficient than loading all the
data at once with `get_data` [2].
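To make the saving concrete, here is a toy sketch of what proxy slicing
does on an uncompressed file.  `ToyProxy` is hypothetical, not
nibabel's `ArrayProxy`: it assumes headerless little-endian float32
data in Fortran (column-major) order (a real .nii file also has a
header and a data offset) and only supports keys of the form
``[..., i]``.  Because the last axis varies slowest in Fortran order,
volume ``i`` is one contiguous block, so a single seek and read suffice.

```python
import os
import struct
import tempfile
from math import prod


class ToyProxy:
    """Hypothetical stand-in for an array proxy (not nibabel's ArrayProxy).

    Assumes a raw file of little-endian float32 values stored in Fortran
    (column-major) order with no header, so the last axis varies slowest.
    """

    def __init__(self, path, shape):
        self.path = path
        self.shape = tuple(shape)
        self.bytes_read = 0  # running total, to show how little is read

    def __getitem__(self, key):
        # This sketch only supports keys of the form proxy[..., i].
        if not (isinstance(key, tuple) and len(key) == 2
                and key[0] is Ellipsis and isinstance(key[1], int)):
            raise NotImplementedError("sketch only supports proxy[..., i]")
        i = key[1]
        if not 0 <= i < self.shape[-1]:
            raise IndexError(i)
        # Volume i on the last axis is one contiguous block of
        # prod(shape[:-1]) values, so one seek and one read are enough.
        n = prod(self.shape[:-1])
        with open(self.path, "rb") as fobj:
            fobj.seek(i * n * 4)  # 4 bytes per float32
            raw = fobj.read(n * 4)
        self.bytes_read += len(raw)
        # Values come back as a flat list, in Fortran order.
        return list(struct.unpack(f"<{n}f", raw))


# Usage: a 2x3x4 array whose Fortran-order values are 0, 1, ..., 23.
shape = (2, 3, 4)
values = [float(k) for k in range(prod(shape))]
with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as f:
    f.write(struct.pack(f"<{len(values)}f", *values))
proxy = ToyProxy(f.name, shape)
vol = proxy[..., 1]  # reads 24 bytes (volume 1), not the whole 96-byte file
os.remove(f.name)
print(vol)  # [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
```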

We're currently thinking of a good iterator syntax, something like this:

    for vol in img.iteraxis(3):  # iterate over the 4th axis
        print(vol.shape)  # do something with vol

where `iteraxis` would use `dataobj` slicing under the hood.
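A standalone generator along these lines might look like the sketch
below.  `iteraxis` here is a hypothetical function, not an existing
nibabel API, and it assumes the proxy accepts tuples of full slices
and integers, as ``img.dataobj[..., 0]`` suggests.  Since it relies
only on ``.shape`` and indexing, the same code could serve either kind
of `dataobj`; `KeyEcho` is just a stand-in that echoes the keys it is
asked for.

```python
def iteraxis(dataobj, axis):
    """Yield successive slices of `dataobj` along `axis` (hypothetical helper).

    Relies only on `.shape` and tuple indexing, so the same code could
    serve a numpy array or an array proxy, asking for one slice per step.
    """
    ndim = len(dataobj.shape)
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for {ndim}-d data")
    axis %= ndim  # allow negative axes, e.g. -1 for the last axis
    for i in range(dataobj.shape[axis]):
        # For axis=3 the key is (slice(None),) * 3 + (i,), i.e. [:, :, :, i]
        yield dataobj[(slice(None),) * axis + (i,)]


class KeyEcho:
    """Stand-in data object that returns the key it is indexed with."""
    shape = (4, 5, 6, 3)

    def __getitem__(self, key):
        return key


# Usage: iterate over the 4th axis, as in img.iteraxis(3)
keys = list(iteraxis(KeyEcho(), 3))
for key in keys:
    print(key)
```

A method such as `img.iteraxis` could then be a thin wrapper that
calls ``iteraxis(self.dataobj, axis)``, which is one possible way of
offering both spellings.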

The questions are:

* should this be a method on the image (`img.iteraxis`), a method on
the dataobj (`img.dataobj.iteraxis`), or a standalone function that
knows about both arrays and array proxies (`nibabel.iteraxis`)?
* how should the iterator trade off speed against memory?  Should this
be configurable?  For example, if you are iterating over the first
axis of a Nifti image, it will probably be most efficient to read all
the data into memory and return the slices from the numpy array,
because Nifti stores data in Fortran (column-major) order, so each
slice on the first axis is scattered across the whole file.  This can
be very expensive in memory, though.  If a file is compressed, it may
be most efficient to uncompress it and use `dataobj` file slicing on
the uncompressed copy, but this involves a temporary file that may be
very large.  Options are:

    * find a heuristic that jointly optimizes memory and speed;
    * always optimize for memory;
    * always optimize for speed, saving memory where possible;
    * have a tuning kwarg selecting between these options.
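As a sketch of the last option, a tuning kwarg could select a reading
strategy, with ``"auto"`` standing in for the heuristic.  Everything
here is hypothetical: the `choose_strategy` function, the `optimize`
and `memory_budget` parameters, the strategy names and the 1 GiB
default are illustrations, not nibabel API.  The heuristic only
encodes the rules of thumb above: Fortran-order storage, in-memory
reads when slices are scattered and the data fit, and a temporary
uncompressed copy for compressed files.

```python
from math import prod


def choose_strategy(shape, axis, itemsize, compressed,
                    optimize="auto", memory_budget=2**30):
    """Pick how a hypothetical iteraxis might read slices along `axis`.

    optimize: "memory", "speed" or "auto" (a simple heuristic).
    Returns "load_all", "slice_proxy" or "decompress_then_slice".
    Assumes Fortran (column-major) storage, as in Nifti files.
    """
    ndim = len(shape)
    total_bytes = prod(shape) * itemsize
    # In Fortran order only last-axis slices are contiguous on disk;
    # slices along any other axis need many small, scattered reads.
    contiguous = axis % ndim == ndim - 1
    if optimize == "memory":
        # Never hold the whole array or write a temporary copy.
        return "slice_proxy"
    if optimize == "speed":
        # Direct slicing is only cheap for contiguous, uncompressed data.
        return "slice_proxy" if contiguous and not compressed else "load_all"
    if optimize != "auto":
        raise ValueError(f"unknown optimize option: {optimize!r}")
    if not contiguous and total_bytes <= memory_budget:
        return "load_all"  # scattered reads, but the data fit in the budget
    if compressed:
        return "decompress_then_slice"  # costs a temporary file on disk
    return "slice_proxy"


# Usage: a 64x64x30x200 float32 image (about 98 MB).
print(choose_strategy((64, 64, 30, 200), axis=0, itemsize=4, compressed=False))
print(choose_strategy((64, 64, 30, 200), axis=3, itemsize=4, compressed=True))
```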

The upside of `img.iteraxis` would be that it embeds the knowledge
we've gained about these objects and simplifies the interface for
users.  The downside is that it's more work for us, and the right
choice is system-dependent.  To address this, Ben C proposed a
benchmark method that reports which optimization method is best for
the given image on the current system.
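Ben's benchmark idea could look roughly like this.
`benchmark_strategies` is a hypothetical helper using only the
standard library: it times each candidate pass with
`time.perf_counter`, then does one extra run under `tracemalloc` to
record peak memory, so both sides of the speed/memory trade-off get
reported.  `tracemalloc` only sees allocations made through Python's
allocators, so treat the memory numbers as indicative.

```python
import time
import tracemalloc


def benchmark_strategies(strategies, repeats=3):
    """Hypothetical helper: time each strategy and record its peak memory.

    `strategies` maps a name to a zero-argument callable doing one full
    pass over the image.  Returns (fastest_name, results) where
    results[name] == (best_seconds, peak_traced_bytes).
    """
    results = {}
    for name, run in strategies.items():
        # Time without tracing so tracemalloc overhead doesn't skew speed.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        # One extra traced run to measure peak allocated memory.
        tracemalloc.start()
        try:
            run()
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        results[name] = (best, peak)
    fastest = min(results, key=lambda n: results[n][0])
    return fastest, results


# Usage with two toy "strategies" standing in for real passes over an image.
fastest, results = benchmark_strategies({
    "noop": lambda: None,
    "allocate_5mb": lambda: bytes(5_000_000),
})
for name, (secs, peak) in results.items():
    print(f"{name}: {secs:.6f} s, peak {peak} bytes")
print("fastest:", fastest)
```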

Any thoughts?   Use-cases?

Cheers,

Matthew

[1] https://github.com/nipy/nibabel/issues/344
[2] http://nipy.org/nibabel/images_and_memory.html#saving-time-and-memory