[Numpy-discussion] numarray speed problem

Tue Sep 20 11:40:15 EDT 2005

Thank you very much. I had not seen an answer before, which is why I cut the 
sample down so much :)

I'll try it now.

Todd Miller wrote:

> Hi H,
>
> I did some work on this problem based on your previous post but 
> apparently my response never made it to numpy-discussion.  In a 
> nutshell,  I made numarray 12x faster for a benchmark like your 
> numarray_pb_sample.py by speeding up string comparisons and improving 
> all().  The changes are in numarray CVS, but there is no SourceForge 
> release that contains them yet.  numarray-1.4.0 is still several 
> weeks away.  If you want to try CVS from UNIX/Linux, just do:
>
> % cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/numpy login
> % cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/numpy co -P numarray
>
> Regards,
> Todd
>
> Humufr wrote:
>
>> Hello,
>>
>> I have a problem with numarray, specifically with the function 
>> numarray.all.
>>
>> I want to compare two files. To do this, I read each file with the 
>> function readcol2.readcol, which can return the data as a list or as a 
>> numarray array (string or numeric).
>>
>> I compare every line of one file with every line of the other. If I use 
>> the array format and the numarray.all function, the comparison takes 
>> forever for two big files. If I use Python lists, it is very fast. I 
>> think there is a problem here, or at least room for improvement. If I 
>> understand the goal of numarray correctly, it was written to speed up 
>> some parts of Python, but here it slows things down a lot.
>>
>> A very simple sample that shows the effect is at the bottom of this mail.
>>
>> Thanks for numarray; I hope I am not bothering you. My comments are 
>> meant to help improve numarray more than anything else. I have been 
>> able to find the problem, so now I can avoid it.
>>
>> H.
>>
>>
>>
>>
>> def readcol(fname,comments='%',columns=None,delimiter=None,dep=0,arraytype='list'):
>>
>>    """
>>    Load ASCII data from fname into an array or list and return it.
>>    The data must be regular: the same number of values in every row.
>>
>>    Input:
>>      - fname : the name of the file to read
>>
>>    Optional input:
>>      - comments : the character that starts a comment.
>>                   The default is the matlab character '%'.
>>      - columns : list or tuple containing the columns to use
>>                  (numbered from 1).
>>      - delimiter : a string that separates the columns
>>      - dep : an integer giving the number of lines to skip at the
>>              start of the file (useful to skip description lines)
>>      - arraytype : a string selecting the kind of result you want:
>>                    numeric array ('numeric'), character array
>>                    ('numstring') or list ('list'). The default is
>>                    'list'.
>>
>>    matfile data is not currently supported, but see
>>    Nigel Wade's matfile ftp://ion.le.ac.uk/matfile/matfile.tar.gz
>>
>>    Example usage:
>>
>>    x,y = transpose(readcol('test.dat'))  # data in two columns
>>
>>    X = readcol('test.dat')    # a matrix of data
>>
>>    x = readcol('test.dat')    # a single column of data
>>
>>    x = readcol('test.dat','#') # '#' is used as the comment character
>>
>>    Initial function from pylab (J. Hunter), changed by me for my
>>    specific needs.
>>    """
>>    from numarray import array,transpose
>>
>>    fh = file(fname)
>>
>>    X = []
>>    numCols = None
>>    nline = 0
>>    if columns is None:
>>        for line in fh:
>>            nline += 1
>>            if dep is not None and nline <= dep: continue
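>>            # keep the text before the comment character (find() gave -1
>>            # when it was absent and cut the last character of the line)
>>            line = line.split(comments)[0].strip()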
>>            if not len(line): continue
>>            if arraytype=='numeric':
>>                row = [float(val) for val in line.split(delimiter)]
>>            else:
>>                row = [val.strip() for val in line.split(delimiter)]
>>            thisLen = len(row)
>>            if numCols is not None and thisLen != numCols:
>>                raise ValueError('All rows must have the same number of columns')
>>            numCols = thisLen   # remember the width so the check above can fire
>>            X.append(row)
>>    else:
>>        for line in fh:
>>            nline +=1
>>            if dep is not None and nline <= dep: continue
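>>            # keep the text before the comment character (find() gave -1
>>            # when it was absent and cut the last character of the line)
>>            line = line.split(comments)[0].strip()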
>>            if not len(line): continue
>>            row = line.split(delimiter)
>>            if arraytype=='numeric':
>>                row = [float(row[i-1]) for i in columns]
>>            elif arraytype=='numstring':
>>                row = [row[i-1].strip() for i in columns]
>>            else:
>>                row = [row[i-1].strip() for i in columns]
>>            thisLen = len(row)
>>            if numCols is not None and thisLen != numCols:
>>                raise ValueError('All rows must have the same number of columns')
>>            numCols = thisLen   # remember the width so the check above can fire
>>            X.append(row)
>>
>>    if arraytype=='numeric':
>>        X = array(X)
>>        r,c = X.shape
>>        if r==1 or c==1:
>>            X.shape = max([r,c]),
>>    elif arraytype == 'numstring':
>>        import numarray.strings               # pb if numeric+pylab
>>        X = numarray.strings.array(X)
>>        r,c = X.shape
>>        if r==1 or c==1:
>>            X.shape = max([r,c]),
>>    return X
>>
>>
>> -------------------------------------------
>> files_test_creation.py
>>
>> -------------------------------------------
>>
>> f1 = file('test1.dat','w')
>> for i in range(10000):
>>    f1.write(str(i)+'   '+str(i+1)+'   '+str(i+2)+'\n')
>> f1.close()
>>
>>
>> f2 = file('test2.dat','w')
>> for i in range(10000):
>>    f2.write(str(i)+'   '+str(i+1)+'   '+str(i+2)+'\n')
>> f2.close()
>>
>> -------------------------------------------
>> numarray_pb_sample.py
>>
>> -------------------------------------------
>>
>> import numarray
>> import readcol2   # module containing the readcol function above
>> data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter='  ',dep=1,arraytype='numstring')
>> data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter='  ',dep=1,arraytype='numstring')
>>
>> #or in non string array form  (same result)
>> ## data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter='  ',dep=1,arraytype='numeric')
>> ## data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter='  ',dep=1,arraytype='numeric')
>>
>> for a_i in range(data1.shape[0]):
>>    for b_i in range(data2.shape[0]):
>>        if numarray.all(data1[a_i,:] == data2[b_i,:]):
>>            print a_i,b_i
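
For scale: with dep=1 each file contributes 9,999 rows, so the double loop above makes roughly 10^8 calls to numarray.all, each on a three-element slice. The sketch below is only a plain-Python analogy written for current Python. It does not use numarray, it says nothing measured about numarray itself, and the helper names are made up for illustration. It times a direct list comparison against a comparison that first builds a temporary list of element-wise results and then reduces it, which is the general shape of all(a == b) on very small rows.

```python
import timeit


def rows_equal_direct(a, b):
    # One built-in comparison; no temporary objects are created.
    return a == b


def rows_equal_elementwise(a, b):
    # Build a temporary list of per-element results, then reduce it.
    # Assumes both rows have the same length (zip stops at the shorter one).
    flags = [x == y for x, y in zip(a, b)]
    return all(flags)


row_a = ['1', '2', '3']
row_b = ['1', '2', '3']


def time_call(func, number=100000):
    # Total seconds for `number` calls of func on the two sample rows.
    return timeit.timeit(lambda: func(row_a, row_b), number=number)


if __name__ == '__main__':
    print('direct      :', time_call(rows_equal_direct))
    print('element-wise:', time_call(rows_equal_elementwise))
```

Both helpers return the same answer for rows of equal length; only the per-call cost differs, and that cost is multiplied by the number of row pairs.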
>>
>> -------------------------------------------
>> python_list_sample.py
>>
>> -------------------------------------------
>>
>> import readcol2   # module containing the readcol function above
>> data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter='  ',dep=1,arraytype='list')
>> data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter='  ',dep=1,arraytype='list')
>>
>> for a_i in range(len(data1)):
>>    for b_i in  range(len(data2)):
>>        if data1[a_i] == data2[b_i]:
>>            print a_i,b_i
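
Both samples compare every row of one file with every row of the other, so the work grows with the product of the two row counts. If only exact row matches are needed, a dictionary keyed on the row contents finds the same pairs in roughly linear time. This is a minimal sketch written for current Python, not part of readcol2: matching_rows is a hypothetical helper name, and it assumes the rows are lists of hashable values such as the strings returned with arraytype='list'.

```python
from collections import defaultdict


def matching_rows(data1, data2):
    """Return (a_i, b_i) pairs where data1[a_i] == data2[b_i]."""
    # Index data2 once: row contents -> list of row numbers, in file order.
    index = defaultdict(list)
    for b_i, row in enumerate(data2):
        index[tuple(row)].append(b_i)

    # Look each row of data1 up in the index instead of rescanning data2.
    pairs = []
    for a_i, row in enumerate(data1):
        for b_i in index.get(tuple(row), ()):
            pairs.append((a_i, b_i))
    return pairs


if __name__ == '__main__':
    data1 = [['1', '2', '3'], ['2', '3', '4']]
    data2 = [['2', '3', '4'], ['1', '2', '3'], ['2', '3', '4']]
    for a_i, b_i in matching_rows(data1, data2):
        print(a_i, b_i)
```

The pairs come out in the same order as the nested loops produce them (a_i ascending, then b_i ascending), so the result can be checked against the list version on a small file.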
>>