[SciPy-User] re[SciPy-user] moving for loops...

Fri May 28 16:25:59 EDT 2010

On Fri, May 28, 2010 at 4:14 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>
> ok - something like this then...but how would i get the index for the month
> for the data array (where month is 0, 1, 2, 4 ... 11)?
>
> data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0]

you would still need to start at the right month
data[month,:] = array[xrange(month, numyears * nummonths, nummonths),VAR,:,0]
or
data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0]

an alternative would be a reshape with an extra month dimension and
then sum only once over the year axis. this might be faster but
trickier to get the correct reshape .

Josef

>
> and would that be quicker than making an array months...
>
> months = np.arange(numyears * nummonths)
>
> and you that instead like you suggested x[start:end:12,:]?
>
> Many thanks again...
>
>
> josef.pktd wrote:
>>
>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>
>>> Ok thanks...I'll take a look.
>>>
>>> Back to my loops issue. What if instead this time I wanted to take an
>>> average so every march in 11 years, is there a quicker way to go about
>>> doing
>>> that than my current method?
>>>
>>> nummonths = 12
>>> numyears = 11
>>>
>>> for month in xrange(nummonths):
>>>    for i in xrange(numpts):
>>>        for ym in xrange(month, numyears * nummonths, nummonths):
>>>            data[month, i] += array[ym, VAR, land_pts_index[i], 0]
>>
>>
>> x[start:end:12,:] gives you every 12th row of an array x
>>
>> something like this should work to get rid of the inner loop, or you
>> could directly put
>> range(month, numyears * nummonths, nummonths) into the array instead
>> of ym and sum()
>>
>> Josef
>>
>>
>>>
>>> so for each point in the array for a given month i am jumping through and
>>> getting the next years month and so on, summing it.
>>>
>>> Thanks...
>>>
>>>
>>> josef.pktd wrote:
>>>>
>>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>
>>>>> Could you possibly if you have time explain further your comment re the
>>>>> p-values, your suggesting I am misusing them?
>>>>
>>>> Depends on your use and interpretation
>>>>
>>>> test statistics, p-values are random variables, if you look at several
>>>> tests at the same time, some p-values will be large just by chance.
>>>> If, for example you just look at the largest test statistic, then the
>>>> distribution for the max of several test statistics is not the same as
>>>> the distribution for a single test statistic
>>>>
>>>> http://en.wikipedia.org/wiki/Multiple_comparisons
>>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm
>>>>
>>>> we also just had a related discussion for ANOVA post-hoc tests on the
>>>> pystatsmodels group.
>>>>
>>>> Josef
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> josef.pktd wrote:
>>>>>>
>>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>>
>>>>>>> Sounds like I am stuck with the loop as I need to do the comparison
>>>>>>> for
>>>>>>> each
>>>>>>> pixel of the world and then I have a basemap function call which I
>>>>>>> guess
>>>>>>> slows it down further...hmm
>>>>>>
>>>>>> I don't see much that could be done differently, after a brief look.
>>>>>>
>>>>>> stats.pearsonr could be replaced by an array version using directly
>>>>>> the formula for correlation even with nans. wilcoxon looks slow, and I
>>>>>> never tried or seen a faster version.
>>>>>>
>>>>>> just a reminder, the p-values are for a single test, when you have
>>>>>> many of them, then they don't have the right size/confidence level for
>>>>>> an overall or joint test. (some packages report a Bonferroni
>>>>>> correction in this case)
>>>>>>
>>>>>> Josef
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> i.e.
>>>>>>>
>>>>>>> def compareSnowData(jules_var):
>>>>>>>    # Extract the 11 years of snow data and return
>>>>>>>    outrows = 180
>>>>>>>    outcols = 360
>>>>>>>    numyears = 11
>>>>>>>    nummonths = 12
>>>>>>>
>>>>>>>    # Read various files
>>>>>>>    fname="world_valid_jules_pts.ascii"
>>>>>>>    (numpts, land_pts_index, latitude, longitude, rows, cols) =
>>>>>>> jo.read_land_points_ascii(fname, 1.0)
>>>>>>>
>>>>>>>    fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra"
>>>>>>>    jules_data1 = jo.readJulesOutBinary(fname, numrows=15238,
>>>>>>> numcols=1,
>>>>>>> \
>>>>>>>                       timesteps=132, numvars=26)
>>>>>>>    fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra"
>>>>>>>    jules_data2 = jo.readJulesOutBinary(fname, numrows=15238,
>>>>>>> numcols=1,
>>>>>>> \
>>>>>>>                       timesteps=132, numvars=26)
>>>>>>>
>>>>>>>    # grab some space
>>>>>>>    data1_snow = np.zeros((nummonths * numyears, numpts),
>>>>>>> dtype=np.float32)
>>>>>>>    data2_snow = np.zeros((nummonths * numyears, numpts),
>>>>>>> dtype=np.float32)
>>>>>>>    pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) *
>>>>>>> np.nan
>>>>>>>    wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) *
>>>>>>> np.nan
>>>>>>>
>>>>>>>    # extract the data
>>>>>>>    data1_snow = jules_data1[:,jules_var,:,0]
>>>>>>>    data2_snow = jules_data2[:,jules_var,:,0]
>>>>>>>    data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow)
>>>>>>>    data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow)
>>>>>>>    #for month in xrange(numyears * nummonths):
>>>>>>>    #    for i in xrange(numpts):
>>>>>>>    #        data1 = jules_data1[month,jules_var,land_pts_index[i],0]
>>>>>>>    #        data2 = jules_data2[month,jules_var,land_pts_index[i],0]
>>>>>>>    #        if data1 >= 0.0:
>>>>>>>    #            data1_snow[month,i] = data1
>>>>>>>    #        else:
>>>>>>>    #            data1_snow[month,i] = np.nan
>>>>>>>    #        if data2 > 0.0:
>>>>>>>    #            data2_snow[month,i] = data2
>>>>>>>    #        else:
>>>>>>>    #            data2_snow[month,i] = np.nan
>>>>>>>
>>>>>>>    # exclude any months from *both* arrays where we have dodgy data,
>>>>>>> else
>>>>>>> we
>>>>>>>    # can't do the correlations correctly!!
>>>>>>>    data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow)
>>>>>>>    data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow)
>>>>>>>
>>>>>>>    # put data on a regular grid...
>>>>>>>    print 'regridding landpts...'
>>>>>>>    for i in xrange(numpts):
>>>>>>>        # exclude the NaN, note masking them doesn't work in the stats
>>>>>>> func
>>>>>>>        x = data1_snow[:,i]
>>>>>>>        x = x[np.isfinite(x)]
>>>>>>>        y = data2_snow[:,i]
>>>>>>>        y = y[np.isfinite(y)]
>>>>>>>
>>>>>>>        # r^2
>>>>>>>        # exclude v.small arrays, i.e. we need just less over 4 years
>>>>>>> of
>>>>>>> data
>>>>>>>        if len(x) and len(y) > 50:
>>>>>>>            pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>>>>>>> (stats.pearsonr(x, y)[0])**2
>>>>>>>
>>>>>>>        # wilcox signed rank test
>>>>>>>        # make sure we have enough samples to do the test
>>>>>>>        d = x - y
>>>>>>>        d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all
>>>>>>> non-zero
>>>>>>> differences
>>>>>>>        count = len(d)
>>>>>>>        if count > 10:
>>>>>>>            z, pval = stats.wilcoxon(x, y)
>>>>>>>            # only map out sign different data
>>>>>>>            if pval < 0.05:
>>>>>>>                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>>>>>>> np.mean(x - y)
>>>>>>>
>>>>>>>    return (pearsonsr_snow, wilcoxStats_snow)
>>>>>>>
>>>>>>>
>>>>>>> josef.pktd wrote:
>>>>>>>>
>>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe <mdekauwe at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Also I then need to remap the 2D array I make onto another grid
>>>>>>>>> (the
>>>>>>>>> world in
>>>>>>>>> this case). Which again I had am doing with a loop (note numpts is
>>>>>>>>> a
>>>>>>>>> lot
>>>>>>>>> bigger than my example above).
>>>>>>>>>
>>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) *
>>>>>>>>> np.nan
>>>>>>>>> for i in xrange(numpts):
>>>>>>>>>        # exclude the NaN, note masking them doesn't work in the
>>>>>>>>> stats
>>>>>>>>> func
>>>>>>>>>        x = data1_snow[:,i]
>>>>>>>>>        x = x[np.isfinite(x)]
>>>>>>>>>        y = data2_snow[:,i]
>>>>>>>>>        y = y[np.isfinite(y)]
>>>>>>>>>
>>>>>>>>>        # wilcox signed rank test
>>>>>>>>>        # make sure we have enough samples to do the test
>>>>>>>>>        d = x - y
>>>>>>>>>        d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all
>>>>>>>>> non-zero
>>>>>>>>> differences
>>>>>>>>>        count = len(d)
>>>>>>>>>        if count > 10:
>>>>>>>>>            z, pval = stats.wilcoxon(x, y)
>>>>>>>>>            # only map out sign different data
>>>>>>>>>            if pval < 0.05:
>>>>>>>>>                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>>>>>>>>> np.mean(x - y)
>>>>>>>>>
>>>>>>>>> Now I think I can push the data in one move into the
>>>>>>>>> wilcoxStats_snow
>>>>>>>>> array
>>>>>>>>> by removing the index,
>>>>>>>>> but I can't see how I will get the individual x and y pts for each
>>>>>>>>> array
>>>>>>>>> member correctly without the loop, this was my attempt which of
>>>>>>>>> course
>>>>>>>>> doesn't work!
>>>>>>>>>
>>>>>>>>> x = data1_snow[:,:]
>>>>>>>>> x = x[np.isfinite(x)]
>>>>>>>>> y = data2_snow[:,:]
>>>>>>>>> y = y[np.isfinite(y)]
>>>>>>>>>
>>>>>>>>> # r^2
>>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of
>>>>>>>>> data
>>>>>>>>> if len(x) and len(y) > 50:
>>>>>>>>>    pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x,
>>>>>>>>> y)[0])**2
>>>>>>>>
>>>>>>>>
>>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you
>>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d
>>>>>>>> arrays
>>>>>>>> at a time (if I read the help correctly).
>>>>>>>>
>>>>>>>> Also the presence of nans might force the use a loop. stats.mstats
>>>>>>>> has
>>>>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even
>>>>>>>> when vectorized operations would work with regular arrays, nan or
>>>>>>>> masked array versions still have to loop in many cases.)
>>>>>>>>
>>>>>>>> If you have many columns with count <= 10, so that wilcoxon is not
>>>>>>>> calculated then it might be worth to use only array operations up to
>>>>>>>> that point. If wilcoxon is calculated most of the time, then it's
>>>>>>>> not
>>>>>>>> worth thinking too hard about this.
>>>>>>>>
>>>>>>>> Josef
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> mdekauwe wrote:
>>>>>>>>>>
>>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods
>>>>>>>>>> work.
>>>>>>>>>>
>>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is
>>>>>>>>>> there
>>>>>>>>>> a
>>>>>>>>>> link you can recommend I should read? Does that mean given I have
>>>>>>>>>> 4dims
>>>>>>>>>> that Josef's suggestion would be more advised in this case?
>>>>>>>>
>>>>>>>> There were several discussions on the mailing lists (fancy slicing
>>>>>>>> and
>>>>>>>> indexing). Your case is safe, but if you run in future into funny
>>>>>>>> shapes, you can look up the details.
>>>>>>>> when in doubt, I use np.arange(...)
>>>>>>>>
>>>>>>>> Josef
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> josef.pktd wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe <mdekauwe at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks that works...
>>>>>>>>>>>>
>>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was
>>>>>>>>>>>> the
>>>>>>>>>>>> step
>>>>>>>>>>>> I
>>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces the
>>>>>>>>>>>> the
>>>>>>>>>>>> two
>>>>>>>>>>>> for
>>>>>>>>>>>> loops? Do I have that right?
>>>>>>>>>>>
>>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a
>>>>>>>>>>> dimension,
>>>>>>>>>>> then you can use slicing. It might be faster.
>>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more
>>>>>>>>>>> dimensions can have some surprise switching of axes.
>>>>>>>>>>>
>>>>>>>>>>> Josef
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> A lot quicker...!
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> josef.pktd wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe <mdekauwe at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a
>>>>>>>>>>>>>> 2D
>>>>>>>>>>>>>> array,
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in
>>>>>>>>>>>>>> reality
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> arrays
>>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the
>>>>>>>>>>>>>> solution
>>>>>>>>>>>>>> as
>>>>>>>>>>>>>> well
>>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite
>>>>>>>>>>>>>> difficult
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>> could
>>>>>>>>>>>>>> precompute the indices's i and j i.e.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i = np.arange(tsteps)
>>>>>>>>>>>>>> j = np.arange(numpts)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> but just can't get my head round how i then use them...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> import numpy as np
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> numpts=10
>>>>>>>>>>>>>> tsteps = 12
>>>>>>>>>>>>>> vari = 22
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1))
>>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32)
>>>>>>>>>>>>>> index = np.arange(numpts)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> for i in xrange(tsteps):
>>>>>>>>>>>>>>    for j in xrange(numpts):
>>>>>>>>>>>>>>        new_data[i,j] = data[i,5,index[j],0]
>>>>>>>>>>>>>
>>>>>>>>>>>>> The index arrays need to be broadcastable against each other.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this should do it
>>>>>>>>>>>>>
>>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5,
>>>>>>>>>>>>> np.arange(numpts),
>>>>>>>>>>>>> 0]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Josef
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> View this message in context:
>>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html
>>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> SciPy-User mailing list
>>>>>>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> SciPy-User mailing list
>>>>>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> View this message in context:
>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html
>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> SciPy-User mailing list
>>>>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> SciPy-User mailing list
>>>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html
>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> SciPy-User mailing list
>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> SciPy-User mailing list
>>>>>>>> SciPy-User at scipy.org
>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html
>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> SciPy-User mailing list
>>>>>>> SciPy-User at scipy.org
>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>
>>>>>> _______________________________________________
>>>>>> SciPy-User mailing list
>>>>>> SciPy-User at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html
>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>
>>>>> _______________________________________________
>>>>> SciPy-User mailing list
>>>>> SciPy-User at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html
>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>
> --
> View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html
> Sent from the Scipy-User mailing list archive at Nabble.com.
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>