[SciPy-dev] slicing vs. advanced selection -- can be there smth in the middle? ; -)

Yaroslav Halchenko lists at onerussian.com
Mon Jan 14 11:34:26 EST 2008


Dear Scipy and Numpy gurus,

Per our IRC conversation with tvaught, I decided to bring my problem and
wishes to the list.

At the moment, there are two ways to select a subregion of an
array:

1. slicing -- memory-efficient -- no copying is done, the result is just a
view over the original data (thus even .flags.owndata=False).  It needs to
be done by specifying slice object(s) (explicitly or not) in the
index, i.e.
b=a[ 1:4, 2:5 ]
   
2. advanced selection, where either a list of indexes or a mask is given:
c=a[ [1,2,3], [2,3,4] ]

In this case the data gets copied.
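
To make the difference concrete, here is a small sketch (the array contents
and variable names are just for illustration). Note that the index-list form
above picks individual elements pairwise; to get a rectangular sub-block from
index lists one can use np.ix_, which also returns a copy.

```python
import numpy as np

a = np.arange(36).reshape(6, 6)

# 1. basic slicing: b is a view, no data is copied
b = a[1:4, 2:5]
b_shares = np.shares_memory(a, b)      # True: b points into a's buffer
b_owndata = b.flags.owndata            # False: b does not own its data

# 2. advanced selection with index lists: c is a fresh copy
#    (elements are picked pairwise: a[1,2], a[2,3], a[3,4])
c = a[[1, 2, 3], [2, 3, 4]]
c_shares = np.shares_memory(a, c)      # False: c has its own buffer

# rectangular block from index lists, also a copy
d = a[np.ix_([1, 2, 3], [2, 3, 4])]
d_shares = np.shares_memory(a, d)      # False
```

Here d holds the same values as b, but writing to b changes a, whereas
writing to d (or c) does not.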

In the application we are developing (pymvpa)  we are dealing with
relatively large arrays of data (primarily 2D while processing), where
first dimension corresponds to different samples, 2nd -- to different
features.

The problem is that we often need to sample from an array. For
instance, to run leave-one-out cross-validation (training on N-1 samples)
we have to generate N 'views' over the original array. In each such 'view'
1 sample (row) is left out during training and is used later as the sample
to test against. So in the end, instead of N data records, the current
implementation ends up with N*(N-1) records (if we are to keep those views
for further analysis).
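
A rough sketch of where the N*(N-1) figure comes from, and of one partial
workaround: keep only the index arrays per split and copy the rows when a
split is actually used. The helper name leave_one_out_indices and the toy
sizes are made up for this illustration; this is not pymvpa code.

```python
import numpy as np

n_samples, n_features = 5, 4
data = np.arange(n_samples * n_features, dtype=float).reshape(n_samples, n_features)

def leave_one_out_indices(n):
    """Yield (train_idx, test_idx) for each of the n leave-one-out splits."""
    all_idx = np.arange(n)
    for i in range(n):
        # np.delete returns a new index array without position i
        yield np.delete(all_idx, i), i

# storing the splits costs N small index arrays, not N copies of the data
splits = list(leave_one_out_indices(n_samples))

# materialising every training set copies N*(N-1) rows in total
total_copied_rows = sum(data[train].shape[0] for train, _ in splits)
```

This avoids holding all N copies at once, but each data[train] is still a
copy for as long as it is alive, so it does not give a true view.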

But that is not only the case with the 1st dimension -- we have to do
a similarly 'evil' selection of the features, which again leads to quite
a big waste of memory.

Thus I wondered: is there any facility which could help us out (maybe
at a reasonable computational cost) and give us a real view on
top of an array? We don't really need a sparse representation -- we are
selecting a set of rows and columns, so every column (and similarly every
row) of a given 'view' uses the same steps/increments between
its elements.
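
To illustrate the kind of object I have in mind, here is a minimal sketch.
IndexedView is a hypothetical helper, not something NumPy provides: it keeps
a reference to the parent array plus one index array per dimension, and
translates element access on the fly. It only handles scalar (i, j) access;
anything that needs a real ndarray has to call materialize(), which copies.

```python
import numpy as np

class IndexedView:
    """Hypothetical lazy row/column selection over a 2D array (not NumPy API)."""

    def __init__(self, parent, rows, cols):
        self.parent = parent            # no copy, just a reference
        self.rows = np.asarray(rows)    # selected row indices
        self.cols = np.asarray(cols)    # selected column indices

    @property
    def shape(self):
        return (len(self.rows), len(self.cols))

    def __getitem__(self, key):
        i, j = key                      # scalar indices only in this sketch
        return self.parent[self.rows[i], self.cols[j]]

    def __setitem__(self, key, value):
        i, j = key
        self.parent[self.rows[i], self.cols[j]] = value

    def materialize(self):
        # explicit copy of the selected block, for code that needs an ndarray
        return self.parent[np.ix_(self.rows, self.cols)]

a = np.arange(36).reshape(6, 6)
v = IndexedView(a, rows=[0, 2, 5], cols=[1, 4])
v[1, 0] = -1                            # writes through to a[2, 1]
```

The memory cost per 'view' is then just the two index arrays, at the price
of an extra lookup on every access.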

I hope that my wording makes some sense ;-) If not -- please don't
hesitate to ask me for a concrete use case.

Thank you in advance
-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


