[SciPy-User] Scipy views and slicing: Can I get a view-slice from only certain elements of an array?
Jacob Biesinger
jake.biesinger at gmail.com
Fri Oct 29 16:15:45 EDT 2010
Hi!
My question is on slicing and views. I'd like to be able to create a
view of an array from some subset of indices. This can *almost* be
done using array slices as follows:
scores = scipy.ones((10,1))
subset = scores[:5] # a view: changes to subset are reflected in scores
subset[0] = 3
subset /= subset.sum() # renormalize subset, updating scores as well
I can also slice with a step, and the reference ("view") to the original
array stays intact:
scores = scipy.ones((10,1))
subset = scores[:6:2] # elements 0, 2, 4
subset[0] = 3
subset /= subset.sum() # both subset and scores are updated, though subset is
# not a contiguous slice of scores
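
Whether a given subset is a view can be checked directly. The following is a minimal sketch, assuming NumPy is imported as np (the scipy.ones used above is the same function as numpy.ones) and that np.shares_memory is available (NumPy 1.11 or later); the names contiguous and strided are only for illustration:

```python
import numpy as np

scores = np.ones((10, 1))

contiguous = scores[:5]      # basic slice
strided = scores[:6:2]       # basic slice with a step: rows 0, 2, 4

# Both basic slices share memory with scores, i.e. they are views.
print(np.shares_memory(scores, contiguous))  # True
print(np.shares_memory(scores, strided))     # True

strided[0] = 3
strided /= strided.sum()     # in-place, so scores is updated as well
print(scores[:6, 0])
```
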
What I can't do is create a view with arbitrary indices:
scores = scipy.ones((10,1))
subset = scores[[1,5,7]] # not a reference!
subset[0] = 3
subset /= subset.sum() # does not update scores!
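
To make the problem concrete: indexing with a list or array of integers returns a copy, so changes have to be written back through the same indices. A minimal sketch, assuming NumPy is imported as np (NumPy 1.11 or later for np.shares_memory); the name idx is hypothetical:

```python
import numpy as np

scores = np.ones((10, 1))
idx = np.array([1, 5, 7])    # hypothetical index array

subset = scores[idx]         # integer-array indexing: a copy
print(np.shares_memory(scores, subset))  # False

subset[0] = 3
subset /= subset.sum()       # only the copy changes
# Writing back through the same indices updates scores explicitly.
scores[idx] = subset
```

This works, but it costs an extra copy and an extra assignment per subset, which is what a view would avoid.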
Is there a way to do this? I've also tried:
subset = [scores[1:2], scores[5:6], scores[7:8]] # these are references,
# but the container is a list, not an array, and the syntax is annoying
subset = scipy.array([scores[1:2], scores[5:6], scores[7:8]]) # no longer a reference...
subset = scipy.array([scores[1:2], scores[5:6], scores[7:8]],
copy=True) # also not a reference...
Any thoughts?
The data I'm working on is millions of short high-throughput
sequencing reads, each of which may have 2-100+ possible genomic
alignments. Each alignment falls within a particular genomic bin
(~150 bases) but also has a probability associated with the alignment
(so the sum over all alignments for each read will be 1). I need to
update all the alignments in a particular bin (from many different
reads) and then (once all bins are updated) renormalize all the
alignments for each read. My current strategy is to have a single 1D
array with all the probabilities, then two lists of indices into
the large array: one list stores the indices that fall within each
genomic bin, whereas the other stores the indices associated with
each read. This is working fine, but the memory
requirements are a bit high (1.5GB) and it's a bit slow since there
are millions of reads, meaning lots and lots of slices from that large
array. I wonder if I could replace the indices in each list with a
view of the original array-- it seems that would save me a bit of
memory and would make the slicing faster.
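
For reference, here is a toy sketch of the update-then-renormalize step. It is not the list-of-indices layout described above but a possible alternative, assuming one integer label per alignment for its read and its bin, with np.bincount used for the per-read sums. All names (probs, read_of, bin_of) and the data are made up for illustration:

```python
import numpy as np

# Toy data (made up): 6 alignments belonging to 3 reads and 2 bins.
probs = np.array([0.5, 0.5, 1.0, 0.2, 0.3, 0.5])
read_of = np.array([0, 0, 1, 2, 2, 2])   # read id per alignment
bin_of = np.array([0, 1, 0, 1, 1, 0])    # bin id per alignment

# Update every alignment in bin 1 (here: an arbitrary scaling by 2).
probs[bin_of == 1] *= 2.0

# Renormalize per read: sum per read, then divide each alignment
# by the total of the read it belongs to.
totals = np.bincount(read_of, weights=probs)
probs /= totals[read_of]
```

After the last line, the probabilities of each read sum to 1 again. Whether this is faster or smaller than the index lists for millions of reads is something I would have to measure.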
Thanks for your help!
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587