[SciPy-User] re[SciPy-user] moving for loops...
mdekauwe
mdekauwe at gmail.com
Sat May 22 06:21:09 EDT 2010
Sounds like I am stuck with the loop as I need to do the comparison for each
pixel of the world and then I have a basemap function call which I guess
slows it down further...hmm
i.e.
def compareSnowData(jules_var):
# Extract the 11 years of snow data and return
outrows = 180
outcols = 360
numyears = 11
nummonths = 12
# Read various files
fname="world_valid_jules_pts.ascii"
(numpts, land_pts_index, latitude, longitude, rows, cols) =
jo.read_land_points_ascii(fname, 1.0)
fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra"
jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
timesteps=132, numvars=26)
fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra"
jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
timesteps=132, numvars=26)
# grab some space
data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) *
np.nan
# extract the data
data1_snow = jules_data1[:,jules_var,:,0]
data2_snow = jules_data2[:,jules_var,:,0]
data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow)
data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow)
#for month in xrange(numyears * nummonths):
# for i in xrange(numpts):
# data1 = jules_data1[month,jules_var,land_pts_index[i],0]
# data2 = jules_data2[month,jules_var,land_pts_index[i],0]
# if data1 >= 0.0:
# data1_snow[month,i] = data1
# else:
# data1_snow[month,i] = np.nan
# if data2 > 0.0:
# data2_snow[month,i] = data2
# else:
# data2_snow[month,i] = np.nan
# exclude any months from *both* arrays where we have dodgy data, else
we
# can't do the correlations correctly!!
data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow)
data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow)
# put data on a regular grid...
print 'regridding landpts...'
for i in xrange(numpts):
# exclude the NaN, note masking them doesn't work in the stats func
x = data1_snow[:,i]
x = x[np.isfinite(x)]
y = data2_snow[:,i]
y = y[np.isfinite(y)]
# r^2
# exclude v.small arrays, i.e. we need just less over 4 years of
data
if len(x) and len(y) > 50:
pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
(stats.pearsonr(x, y)[0])**2
# wilcox signed rank test
# make sure we have enough samples to do the test
d = x - y
d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero
differences
count = len(d)
if count > 10:
z, pval = stats.wilcoxon(x, y)
# only map out sign different data
if pval < 0.05:
wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
np.mean(x - y)
return (pearsonsr_snow, wilcoxStats_snow)
josef.pktd wrote:
>
> On Fri, May 21, 2010 at 10:14 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>
>> Also I then need to remap the 2D array I make onto another grid (the
>> world in
>> this case). Which again I had am doing with a loop (note numpts is a lot
>> bigger than my example above).
>>
>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>> for i in xrange(numpts):
>> # exclude the NaN, note masking them doesn't work in the stats
>> func
>> x = data1_snow[:,i]
>> x = x[np.isfinite(x)]
>> y = data2_snow[:,i]
>> y = y[np.isfinite(y)]
>>
>> # wilcox signed rank test
>> # make sure we have enough samples to do the test
>> d = x - y
>> d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero
>> differences
>> count = len(d)
>> if count > 10:
>> z, pval = stats.wilcoxon(x, y)
>> # only map out sign different data
>> if pval < 0.05:
>> wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>> np.mean(x - y)
>>
>> Now I think I can push the data in one move into the wilcoxStats_snow
>> array
>> by removing the index,
>> but I can't see how I will get the individual x and y pts for each array
>> member correctly without the loop, this was my attempt which of course
>> doesn't work!
>>
>> x = data1_snow[:,:]
>> x = x[np.isfinite(x)]
>> y = data2_snow[:,:]
>> y = y[np.isfinite(y)]
>>
>> # r^2
>> # exclude v.small arrays, i.e. we need just less over 4 years of data
>> if len(x) and len(y) > 50:
>> pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x,
>> y)[0])**2
>
>
> If you want to do pairwise comparisons with stats.wilcoxon, then you
> might be stuck with the loop, since wilcoxon takes only two 1d arrays
> at a time (if I read the help correctly).
>
> Also the presence of nans might force the use a loop. stats.mstats has
> masked array versions, but I didn't see wilcoxon in the list. (Even
> when vectorized operations would work with regular arrays, nan or
> masked array versions still have to loop in many cases.)
>
> If you have many columns with count <= 10, so that wilcoxon is not
> calculated then it might be worth to use only array operations up to
> that point. If wilcoxon is calculated most of the time, then it's not
> worth thinking too hard about this.
>
> Josef
>
>
>>
>> thanks.
>>
>>
>>
>>
>> mdekauwe wrote:
>>>
>>> Yes as Zachary said index is only 0 to 15237, so both methods work.
>>>
>>> I don't quite get what you mean about slicing with axis > 3. Is there a
>>> link you can recommend I should read? Does that mean given I have 4dims
>>> that Josef's suggestion would be more advised in this case?
>
> There were several discussions on the mailing lists (fancy slicing and
> indexing). Your case is safe, but if you run in future into funny
> shapes, you can look up the details.
> when in doubt, I use np.arange(...)
>
> Josef
>
>>>
>>> Thanks.
>>>
>>>
>>>
>>> josef.pktd wrote:
>>>>
>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>
>>>>> Thanks that works...
>>>>>
>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the
>>>>> step
>>>>> I
>>>>> was struggling with, so this forms a 2D array which replaces the the
>>>>> two
>>>>> for
>>>>> loops? Do I have that right?
>>>>
>>>> Yes, but as Zachary showed, if you need the full index in a dimension,
>>>> then you can use slicing. It might be faster.
>>>> And a warning, mixing slices and index arrays with 3 or more
>>>> dimensions can have some surprise switching of axes.
>>>>
>>>> Josef
>>>>
>>>>>
>>>>> A lot quicker...!
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>> josef.pktd wrote:
>>>>>>
>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to extract data from a 4D array and store it in a 2D
>>>>>>> array,
>>>>>>> but
>>>>>>> avoid my current usage of the for loops for speed, as in reality the
>>>>>>> arrays
>>>>>>> sizes are quite big. Could someone also try and explain the solution
>>>>>>> as
>>>>>>> well
>>>>>>> if they have a spare moment as I am still finding it quite difficult
>>>>>>> to
>>>>>>> get
>>>>>>> over the habit of using loops (C convert for my sins). I get that
>>>>>>> one
>>>>>>> could
>>>>>>> precompute the indices's i and j i.e.
>>>>>>>
>>>>>>> i = np.arange(tsteps)
>>>>>>> j = np.arange(numpts)
>>>>>>>
>>>>>>> but just can't get my head round how i then use them...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Martin
>>>>>>>
>>>>>>> import numpy as np
>>>>>>>
>>>>>>> numpts=10
>>>>>>> tsteps = 12
>>>>>>> vari = 22
>>>>>>>
>>>>>>> data = np.random.random((tsteps, vari, numpts, 1))
>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32)
>>>>>>> index = np.arange(numpts)
>>>>>>>
>>>>>>> for i in xrange(tsteps):
>>>>>>> for j in xrange(numpts):
>>>>>>> new_data[i,j] = data[i,5,index[j],0]
>>>>>>
>>>>>> The index arrays need to be broadcastable against each other.
>>>>>>
>>>>>> I think this should do it
>>>>>>
>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0]
>>>>>>
>>>>>> Josef
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html
>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> SciPy-User mailing list
>>>>>>> SciPy-User at scipy.org
>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>
>>>>>> _______________________________________________
>>>>>> SciPy-User mailing list
>>>>>> SciPy-User at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html
>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>
>>>>> _______________________________________________
>>>>> SciPy-User mailing list
>>>>> SciPy-User at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>
>>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html
>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
--
View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html
Sent from the Scipy-User mailing list archive at Nabble.com.
More information about the SciPy-User
mailing list