[SciPy-User] re[SciPy-user] moving for loops...

Sat May 22 06:21:09 EDT 2010

Sounds like I am stuck with the loop as I need to do the comparison for each
pixel of the world and then I have a basemap function call which I guess
slows it down further...hmm

i.e.

def compareSnowData(jules_var):
    # Extract the 11 years of snow data and return 
    outrows = 180
    outcols = 360
    numyears = 11
    nummonths = 12
    
    # Read various files
    fname="world_valid_jules_pts.ascii"
    (numpts, land_pts_index, latitude, longitude, rows, cols) =
jo.read_land_points_ascii(fname, 1.0)
    
    fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra"
    jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
                       timesteps=132, numvars=26)
    fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra"
    jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
                       timesteps=132, numvars=26)

    # grab some space
    data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
    data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) 
    pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
    wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) *
np.nan
   
    # extract the data
    data1_snow = jules_data1[:,jules_var,:,0]
    data2_snow = jules_data2[:,jules_var,:,0]
    data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow)
    data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow)
    #for month in xrange(numyears * nummonths):
    #    for i in xrange(numpts):
    #        data1 = jules_data1[month,jules_var,land_pts_index[i],0]
    #        data2 = jules_data2[month,jules_var,land_pts_index[i],0]
    #        if data1 >= 0.0:
    #            data1_snow[month,i] = data1
    #        else:
    #            data1_snow[month,i] = np.nan 
    #        if data2 > 0.0:        
    #            data2_snow[month,i] = data2
    #        else:
    #            data2_snow[month,i] = np.nan                                    
    
    # exclude any months from *both* arrays where we have dodgy data, else
we
    # can't do the correlations correctly!!
    data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow)    
    data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow)   
    
    # put data on a regular grid...
    print 'regridding landpts...'
    for i in xrange(numpts):
        # exclude the NaN, note masking them doesn't work in the stats func
        x = data1_snow[:,i]
        x = x[np.isfinite(x)]
        y = data2_snow[:,i]
        y = y[np.isfinite(y)]
        
        # r^2
        # exclude v.small arrays, i.e. we need just less over 4 years of
data
        if len(x) and len(y) > 50:
            pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
(stats.pearsonr(x, y)[0])**2  
        
        # wilcox signed rank test
        # make sure we have enough samples to do the test
        d = x - y
        d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero
differences
        count = len(d)
        if count > 10:
            z, pval = stats.wilcoxon(x, y)
            # only map out sign different data
            if pval < 0.05:
                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
np.mean(x - y)
               
    return (pearsonsr_snow, wilcoxStats_snow)


josef.pktd wrote:
> 
> On Fri, May 21, 2010 at 10:14 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>
>> Also I then need to remap the 2D array I make onto another grid (the
>> world in
>> this case). Which again I had am doing with a loop (note numpts is a lot
>> bigger than my example above).
>>
>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>> for i in xrange(numpts):
>>        # exclude the NaN, note masking them doesn't work in the stats
>> func
>>        x = data1_snow[:,i]
>>        x = x[np.isfinite(x)]
>>        y = data2_snow[:,i]
>>        y = y[np.isfinite(y)]
>>
>>        # wilcox signed rank test
>>        # make sure we have enough samples to do the test
>>        d = x - y
>>        d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero
>> differences
>>        count = len(d)
>>        if count > 10:
>>            z, pval = stats.wilcoxon(x, y)
>>            # only map out sign different data
>>            if pval < 0.05:
>>                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>> np.mean(x - y)
>>
>> Now I think I can push the data in one move into the wilcoxStats_snow
>> array
>> by removing the index,
>> but I can't see how I will get the individual x and y pts for each array
>> member correctly without the loop, this was my attempt which of course
>> doesn't work!
>>
>> x = data1_snow[:,:]
>> x = x[np.isfinite(x)]
>> y = data2_snow[:,:]
>> y = y[np.isfinite(y)]
>>
>> # r^2
>> # exclude v.small arrays, i.e. we need just less over 4 years of data
>> if len(x) and len(y) > 50:
>>    pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x,
>> y)[0])**2
> 
> 
> If you want to do pairwise comparisons with stats.wilcoxon, then you
> might be stuck with the loop, since wilcoxon takes only two 1d arrays
> at a time (if I read the help correctly).
> 
> Also the presence of nans might force the use a loop. stats.mstats has
> masked array versions, but I didn't see wilcoxon in the list. (Even
> when vectorized operations would work with regular arrays, nan or
> masked array versions still have to loop in many cases.)
> 
> If you have many columns with count <= 10, so that wilcoxon is not
> calculated then it might be worth to use only array operations up to
> that point. If wilcoxon is calculated most of the time, then it's not
> worth thinking too hard about this.
> 
> Josef
> 
> 
>>
>> thanks.
>>
>>
>>
>>
>> mdekauwe wrote:
>>>
>>> Yes as Zachary said index is only 0 to 15237, so both methods work.
>>>
>>> I don't quite get what you mean about slicing with axis > 3. Is there a
>>> link you can recommend I should read? Does that mean given I have 4dims
>>> that Josef's suggestion would be more advised in this case?
> 
> There were several discussions on the mailing lists (fancy slicing and
> indexing). Your case is safe, but if you run in future into funny
> shapes, you can look up the details.
> when in doubt, I use np.arange(...)
> 
> Josef
> 
>>>
>>> Thanks.
>>>
>>>
>>>
>>> josef.pktd wrote:
>>>>
>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>
>>>>> Thanks that works...
>>>>>
>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the
>>>>> step
>>>>> I
>>>>> was struggling with, so this forms a 2D array which replaces the the
>>>>> two
>>>>> for
>>>>> loops? Do I have that right?
>>>>
>>>> Yes, but as Zachary showed, if you need the full index in a dimension,
>>>> then you can use slicing. It might be faster.
>>>> And a warning, mixing slices and index arrays with 3 or more
>>>> dimensions can have some surprise switching of axes.
>>>>
>>>> Josef
>>>>
>>>>>
>>>>> A lot quicker...!
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>> josef.pktd wrote:
>>>>>>
>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to extract data from a 4D array and store it in a 2D
>>>>>>> array,
>>>>>>> but
>>>>>>> avoid my current usage of the for loops for speed, as in reality the
>>>>>>> arrays
>>>>>>> sizes are quite big. Could someone also try and explain the solution
>>>>>>> as
>>>>>>> well
>>>>>>> if they have a spare moment as I am still finding it quite difficult
>>>>>>> to
>>>>>>> get
>>>>>>> over the habit of using loops (C convert for my sins). I get that
>>>>>>> one
>>>>>>> could
>>>>>>> precompute the indices's i and j i.e.
>>>>>>>
>>>>>>> i = np.arange(tsteps)
>>>>>>> j = np.arange(numpts)
>>>>>>>
>>>>>>> but just can't get my head round how i then use them...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Martin
>>>>>>>
>>>>>>> import numpy as np
>>>>>>>
>>>>>>> numpts=10
>>>>>>> tsteps = 12
>>>>>>> vari = 22
>>>>>>>
>>>>>>> data = np.random.random((tsteps, vari, numpts, 1))
>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32)
>>>>>>> index = np.arange(numpts)
>>>>>>>
>>>>>>> for i in xrange(tsteps):
>>>>>>>    for j in xrange(numpts):
>>>>>>>        new_data[i,j] = data[i,5,index[j],0]
>>>>>>
>>>>>> The index arrays need to be broadcastable against each other.
>>>>>>
>>>>>> I think this should do it
>>>>>>
>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0]
>>>>>>
>>>>>> Josef
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html
>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> SciPy-User mailing list
>>>>>>> SciPy-User at scipy.org
>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>
>>>>>> _______________________________________________
>>>>>> SciPy-User mailing list
>>>>>> SciPy-User at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html
>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>
>>>>> _______________________________________________
>>>>> SciPy-User mailing list
>>>>> SciPy-User at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>
>>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html
>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
> 
> 

-- 
View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html
Sent from the Scipy-User mailing list archive at Nabble.com.