[SciPy-User] Re: [SciPy-user] removing for loops...

mdekauwe mdekauwe at gmail.com
Fri May 28 16:14:56 EDT 2010


OK - something like this then... but how would I get the index for the month
into the data array (where month is 0, 1, 2, 3, ... 11)?

data[month, :] = array[xrange(0, numyears * nummonths, nummonths), VAR, :, 0]

And would that be quicker than making an array of months...

months = np.arange(numyears * nummonths)

and using that instead, as you suggested with x[start:end:12,:]?
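
In other words, is the following roughly the idea (just a sketch on my part,
assuming array has the (numyears * nummonths, numvars, numpts, 1) shape used
above, and dividing by numyears afterwards if I want the mean rather than the sum)?

for month in xrange(nummonths):
    # every 12th timestep, starting at this calendar month
    data[month, :] = array[month::nummonths, VAR, :, 0].sum(axis=0)

or, with an explicit index array instead of the slice:

for month in xrange(nummonths):
    idx = np.arange(month, numyears * nummonths, nummonths)
    data[month, :] = array[idx, VAR, :, 0].sum(axis=0)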

Many thanks again...


josef.pktd wrote:
> 
> On Fri, May 28, 2010 at 3:53 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>
>> Ok thanks...I'll take a look.
>>
>> Back to my loops issue. What if instead this time I wanted to take an
>> average over, say, every March in the 11 years - is there a quicker way to
>> go about that than my current method?
>>
>> nummonths = 12
>> numyears = 11
>>
>> for month in xrange(nummonths):
>>    for i in xrange(numpts):
>>        for ym in xrange(month, numyears * nummonths, nummonths):
>>            data[month, i] += array[ym, VAR, land_pts_index[i], 0]
> 
> 
> x[start:end:12,:] gives you every 12th row of an array x
> 
> something like this should work to get rid of the inner loop, or you
> could put range(month, numyears * nummonths, nummonths) directly into
> the array index instead of ym and then sum()
> 
> Josef
> 
> 
>>
>> So for each point in the array, for a given month, I am jumping through and
>> getting the same month in the next year and so on, summing as I go.
>>
>> Thanks...
>>
>>
>> josef.pktd wrote:
>>>
>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>
>>>> Could you possibly, if you have time, explain further your comment re the
>>>> p-values - are you suggesting I am misusing them?
>>>
>>> It depends on your use and interpretation.
>>>
>>> Test statistics and p-values are random variables; if you look at several
>>> tests at the same time, some test statistics will be large (and their
>>> p-values small) just by chance.
>>> If, for example, you just look at the largest test statistic, then the
>>> distribution of the maximum of several test statistics is not the same as
>>> the distribution of a single test statistic.
>>>
>>> http://en.wikipedia.org/wiki/Multiple_comparisons
>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm
>>>
>>> we also just had a related discussion for ANOVA post-hoc tests on the
>>> pystatsmodels group.
>>>
>>> Josef
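
(Noting this down for myself: with many per-pixel tests, a simple
Bonferroni-style adjustment of the significance threshold would look roughly
like the sketch below, where pvals is a hypothetical 1D array holding the
p-values I collect over all the pixels actually tested.)

alpha = 0.05
reject = pvals < alpha / len(pvals)   # Bonferroni: controls the family-wise error rate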
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> josef.pktd wrote:
>>>>>
>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>
>>>>>> Sounds like I am stuck with the loop, as I need to do the comparison
>>>>>> for each pixel of the world, and then I have a basemap function call
>>>>>> which I guess slows it down further... hmm
>>>>>
>>>>> I don't see much that could be done differently, after a brief look.
>>>>>
>>>>> stats.pearsonr could be replaced by an array version that uses the
>>>>> correlation formula directly, even with NaNs. wilcoxon looks slow, and I
>>>>> have never tried or seen a faster version.
>>>>>
>>>>> Just a reminder: the p-values are for a single test. When you have
>>>>> many of them, they don't have the right size/confidence level for
>>>>> an overall or joint test. (Some packages report a Bonferroni
>>>>> correction in this case.)
>>>>>
>>>>> Josef
>>>>>
>>>>>
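
(Picking up on the first point: I think the per-point stats.pearsonr call in
my loop below could be replaced by a column-wise version of the correlation
formula - an untested sketch only, where x and y stand for the
(ntimes, numpts) data1_snow / data2_snow arrays after the NaNs have been
matched between the two.)

mask = np.isfinite(x) & np.isfinite(y)
n = mask.sum(axis=0).astype(np.float64)
xm = np.where(mask, x, 0.0)
ym = np.where(mask, y, 0.0)
sx, sy = xm.sum(axis=0), ym.sum(axis=0)
cov  = (xm * ym).sum(axis=0) - sx * sy / n
varx = (xm * xm).sum(axis=0) - sx ** 2 / n
vary = (ym * ym).sum(axis=0) - sy ** 2 / n
r2 = cov ** 2 / (varx * vary)   # per-point r^2; NaN/inf where n or a variance is zero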
>>>>>>
>>>>>> i.e.
>>>>>>
>>>>>> def compareSnowData(jules_var):
>>>>>>     # Extract the 11 years of snow data and return
>>>>>>     outrows = 180
>>>>>>     outcols = 360
>>>>>>     numyears = 11
>>>>>>     nummonths = 12
>>>>>>
>>>>>>     # Read various files
>>>>>>     fname = "world_valid_jules_pts.ascii"
>>>>>>     (numpts, land_pts_index, latitude, longitude, rows, cols) = \
>>>>>>         jo.read_land_points_ascii(fname, 1.0)
>>>>>>
>>>>>>     fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra"
>>>>>>     jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
>>>>>>                                         timesteps=132, numvars=26)
>>>>>>     fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra"
>>>>>>     jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
>>>>>>                                         timesteps=132, numvars=26)
>>>>>>
>>>>>>     # grab some space
>>>>>>     data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
>>>>>>     data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
>>>>>>     pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>>>>>>     wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>>>>>>
>>>>>>     # extract the data, masking negative (dodgy) values as NaN
>>>>>>     data1_snow = jules_data1[:, jules_var, :, 0]
>>>>>>     data2_snow = jules_data2[:, jules_var, :, 0]
>>>>>>     data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow)
>>>>>>     data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow)
>>>>>>     #for month in xrange(numyears * nummonths):
>>>>>>     #    for i in xrange(numpts):
>>>>>>     #        data1 = jules_data1[month, jules_var, land_pts_index[i], 0]
>>>>>>     #        data2 = jules_data2[month, jules_var, land_pts_index[i], 0]
>>>>>>     #        if data1 >= 0.0:
>>>>>>     #            data1_snow[month, i] = data1
>>>>>>     #        else:
>>>>>>     #            data1_snow[month, i] = np.nan
>>>>>>     #        if data2 >= 0.0:
>>>>>>     #            data2_snow[month, i] = data2
>>>>>>     #        else:
>>>>>>     #            data2_snow[month, i] = np.nan
>>>>>>
>>>>>>     # exclude any months from *both* arrays where we have dodgy data,
>>>>>>     # else we can't do the correlations correctly!!
>>>>>>     data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow)
>>>>>>     data2_snow = np.where(np.isnan(data1_snow), np.nan, data2_snow)
>>>>>>
>>>>>>     # put data on a regular grid...
>>>>>>     print 'regridding landpts...'
>>>>>>     for i in xrange(numpts):
>>>>>>         # exclude the NaNs; note masking them doesn't work in the stats funcs
>>>>>>         x = data1_snow[:, i]
>>>>>>         x = x[np.isfinite(x)]
>>>>>>         y = data2_snow[:, i]
>>>>>>         y = y[np.isfinite(y)]
>>>>>>
>>>>>>         # r^2
>>>>>>         # exclude very small arrays, i.e. we need just over 4 years of data
>>>>>>         if len(x) > 50 and len(y) > 50:
>>>>>>             pearsonsr_snow[((180-1)-(rows[i]-1)), cols[i]-1] = \
>>>>>>                 (stats.pearsonr(x, y)[0])**2
>>>>>>
>>>>>>         # Wilcoxon signed-rank test
>>>>>>         # make sure we have enough samples to do the test
>>>>>>         d = x - y
>>>>>>         d = np.compress(np.not_equal(d, 0), d, axis=-1)  # keep all non-zero differences
>>>>>>         count = len(d)
>>>>>>         if count > 10:
>>>>>>             z, pval = stats.wilcoxon(x, y)
>>>>>>             # only map out significantly different data
>>>>>>             if pval < 0.05:
>>>>>>                 wilcoxStats_snow[((180-1)-(rows[i]-1)), cols[i]-1] = \
>>>>>>                     np.mean(x - y)
>>>>>>
>>>>>>     return (pearsonsr_snow, wilcoxStats_snow)
>>>>>>
>>>>>>
>>>>>> josef.pktd wrote:
>>>>>>>
>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe <mdekauwe at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Also I then need to remap the 2D array I make onto another grid (the
>>>>>>>> world in this case), which again I am doing with a loop (note numpts
>>>>>>>> is a lot bigger than in my example above).
>>>>>>>>
>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>>>>>>>> for i in xrange(numpts):
>>>>>>>>     # exclude the NaNs; note masking them doesn't work in the stats func
>>>>>>>>     x = data1_snow[:, i]
>>>>>>>>     x = x[np.isfinite(x)]
>>>>>>>>     y = data2_snow[:, i]
>>>>>>>>     y = y[np.isfinite(y)]
>>>>>>>>
>>>>>>>>     # Wilcoxon signed-rank test
>>>>>>>>     # make sure we have enough samples to do the test
>>>>>>>>     d = x - y
>>>>>>>>     d = np.compress(np.not_equal(d, 0), d, axis=-1)  # keep all non-zero differences
>>>>>>>>     count = len(d)
>>>>>>>>     if count > 10:
>>>>>>>>         z, pval = stats.wilcoxon(x, y)
>>>>>>>>         # only map out significantly different data
>>>>>>>>         if pval < 0.05:
>>>>>>>>             wilcoxStats_snow[((180-1)-(rows[i]-1)), cols[i]-1] = np.mean(x - y)
>>>>>>>>
>>>>>>>> Now I think I can push the data into the wilcoxStats_snow array in
>>>>>>>> one move by removing the index, but I can't see how I would get the
>>>>>>>> individual x and y points for each array member correctly without the
>>>>>>>> loop. This was my attempt, which of course doesn't work!
>>>>>>>>
>>>>>>>> x = data1_snow[:,:]
>>>>>>>> x = x[np.isfinite(x)]
>>>>>>>> y = data2_snow[:,:]
>>>>>>>> y = y[np.isfinite(y)]
>>>>>>>>
>>>>>>>> # r^2
>>>>>>>> # exclude very small arrays, i.e. we need just over 4 years of data
>>>>>>>> if len(x) > 50 and len(y) > 50:
>>>>>>>>     pearsonsr_snow[((180-1)-(rows-1)), cols-1] = \
>>>>>>>>         (stats.pearsonr(x, y)[0])**2
>>>>>>>
>>>>>>>
>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you
>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays
>>>>>>> at a time (if I read the help correctly).
>>>>>>>
>>>>>>> Also, the presence of NaNs might force the use of a loop. stats.mstats
>>>>>>> has masked array versions, but I didn't see wilcoxon in the list. (Even
>>>>>>> when vectorized operations would work with regular arrays, NaN or
>>>>>>> masked array versions still have to loop in many cases.)
>>>>>>>
>>>>>>> If you have many columns with count <= 10, so that wilcoxon is not
>>>>>>> calculated, then it might be worthwhile to use only array operations up
>>>>>>> to that point. If wilcoxon is calculated most of the time, then it's not
>>>>>>> worth thinking too hard about this.
>>>>>>>
>>>>>>> Josef
>>>>>>>
>>>>>>>
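
(One thought on the remapping half of this, separate from the wilcoxon
question: once the per-point results are sitting in a 1D array, they can be
scattered onto the world grid in a single fancy-indexed assignment. A sketch
only, assuming "results" is a hypothetical length-numpts array holding np.nan
where a test was skipped, and rows/cols are the 1-based NumPy index arrays
read in above.)

wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
wilcoxStats_snow[(180 - 1) - (rows - 1), cols - 1] = results   # one scatter instead of the loop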
>>>>>>>>
>>>>>>>> thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> mdekauwe wrote:
>>>>>>>>>
>>>>>>>>> Yes, as Zachary said, index is only 0 to 15237, so both methods work.
>>>>>>>>>
>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is there
>>>>>>>>> a link you can recommend I should read? Does that mean, given I have
>>>>>>>>> 4 dims, that Josef's suggestion would be more advisable in this case?
>>>>>>>
>>>>>>> There were several discussions on the mailing lists (fancy slicing and
>>>>>>> indexing). Your case is safe, but if you run into funny shapes in the
>>>>>>> future, you can look up the details.
>>>>>>> When in doubt, I use np.arange(...).
>>>>>>>
>>>>>>> Josef
>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> josef.pktd wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe <mdekauwe at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks, that works...
>>>>>>>>>>>
>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None]; that was the
>>>>>>>>>>> step I was struggling with. So this forms a 2D array which replaces
>>>>>>>>>>> the two for loops? Do I have that right?
>>>>>>>>>>
>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a
>>>>>>>>>> dimension, then you can use slicing. It might be faster.
>>>>>>>>>> And a warning: mixing slices and index arrays with 3 or more
>>>>>>>>>> dimensions can lead to some surprising switching of axes.
>>>>>>>>>>
>>>>>>>>>> Josef
>>>>>>>>>>
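
(A tiny illustration of the axis-switching surprise mentioned above, as I
understand the NumPy advanced-indexing rules: when index arrays - or an
integer plus an index array - are separated by a slice, the broadcast index
dimensions are moved to the front of the result.)

a = np.zeros((2, 3, 4))
i = np.array([0, 1])
a[:, :, i].shape   # (2, 3, 2) - a single index array keeps its place
a[i, :, i].shape   # (2, 3)    - index arrays separated by a slice: their dims come first
a[0, :, i].shape   # (2, 3), not (3, 2) - the integer counts as an index array here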
>>>>>>>>>>>
>>>>>>>>>>> A lot quicker...!
>>>>>>>>>>>
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> josef.pktd wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe <mdekauwe at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a
>>>>>>>>>>>>> 2D array, but avoid my current usage of for loops for speed, as
>>>>>>>>>>>>> in reality the array sizes are quite big. Could someone also try
>>>>>>>>>>>>> to explain the solution if they have a spare moment, as I am
>>>>>>>>>>>>> still finding it quite difficult to get over the habit of using
>>>>>>>>>>>>> loops (a C convert, for my sins). I get that one could
>>>>>>>>>>>>> precompute the indices i and j, i.e.
>>>>>>>>>>>>>
>>>>>>>>>>>>> i = np.arange(tsteps)
>>>>>>>>>>>>> j = np.arange(numpts)
>>>>>>>>>>>>>
>>>>>>>>>>>>> but I just can't get my head round how I then use them...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>
>>>>>>>>>>>>> import numpy as np
>>>>>>>>>>>>>
>>>>>>>>>>>>> numpts=10
>>>>>>>>>>>>> tsteps = 12
>>>>>>>>>>>>> vari = 22
>>>>>>>>>>>>>
>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1))
>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32)
>>>>>>>>>>>>> index = np.arange(numpts)
>>>>>>>>>>>>>
>>>>>>>>>>>>> for i in xrange(tsteps):
>>>>>>>>>>>>>    for j in xrange(numpts):
>>>>>>>>>>>>>        new_data[i,j] = data[i,5,index[j],0]
>>>>>>>>>>>>
>>>>>>>>>>>> The index arrays need to be broadcastable against each other.
>>>>>>>>>>>>
>>>>>>>>>>>> I think this should do it
>>>>>>>>>>>>
>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0]
>>>>>>>>>>>>
>>>>>>>>>>>> Josef
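
(A side note for the archive: because the index arrays here run over the full
length of their axes, plain basic slicing gives the same (tsteps, numpts)
result - the simpler form Zachary pointed to further up the thread.)

new_data = data[:, 5, :, 0]   # same values as the broadcast fancy-indexed version above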
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
> 
