[Numpy-discussion] numarray speed problem
Humufr
humufr at yahoo.fr
Tue Sep 20 11:40:15 EDT 2005
Thank you very much. I had not seen any answer before, which is why I
reduced the sample so much :)
I'll try it now.
Todd Miller wrote:
> Hi H,
>
> I did some work on this problem based on your previous post but
> apparently my response never made it to numpy-discussion. In a
> nutshell, I made numarray 12x faster for a benchmark like your
> numarray_pb_sample.py by speeding up string comparisons and improving
> all(). The changes are in numarray CVS but there is no Source Forge
> release that contains them yet. numarray-1.4.0 is still several
> weeks away. If you want to try CVS from UNIX/Linux just do:
>
> % cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/numpy login
> % cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/numpy co -P numarray
>
> Regards,
> Todd
>
> Humufr wrote:
>
>> Hello,
>>
>> I have a problem with numarray, especially with the function numarray.all.
>>
>> I want to compare two files. To do this, I read the files with a
>> function, readcol2.readcol, which can return them as a list or as a
>> numarray array (string or numeric).
>>
>> I do a comparison on each line of the files.
>> If I use the array format and the numarray.all function, the
>> comparison takes forever for two big files. If I use Python list
>> objects, it is very fast. I think there is a problem here, or at
>> least some room for improvement. If I understand the goal of numarray
>> correctly, it was written to speed up some parts of Python, but here
>> it slows things down a lot.
>>
>> A very simple sample that shows the effect is at the bottom of this mail.
>>
>> Thanks for numarray. I hope I am not bothering you; my comments are meant
>> to help improve numarray, nothing more. I have been able to find the
>> problem, so now I can avoid it.
>>
>> H.
>>
>>
>>
>>
>> def readcol(fname,comments='%',columns=None,delimiter=None,dep=0,arraytype='list'):
>>     """
>>     Load ASCII data from fname into an array and return the array.
>>     The data must be regular, with the same number of values in every row.
>>     fname can be a filename or a file handle.
>>
>>     Input:
>>
>>     - fname     : the name of the file to read
>>
>>     Optional input:
>>
>>     - comments  : the character that marks the start of a comment.
>>                   The default is the matlab character '%'.
>>     - columns   : list or tuple containing the columns to use
>>                   (numbered from 1).
>>     - delimiter : a string that delimits the columns
>>     - dep       : an integer giving the number of lines to skip at the
>>                   start of the file (useful to skip description lines)
>>     - arraytype : a string selecting the kind of array to return:
>>                   numeric array ('numeric'), character array
>>                   ('numstring') or list ('list'). The default is 'list'.
>>
>>     matfile data is not currently supported, but see
>>     Nigel Wade's matfile ftp://ion.le.ac.uk/matfile/matfile.tar.gz
>>
>>     Example usage:
>>
>>     x,y = transpose(readcol('test.dat'))  # data in two columns
>>     X = readcol('test.dat')               # a matrix of data
>>     x = readcol('test.dat')               # a single column of data
>>     x = readcol('test.dat','#')           # use '#' as the comment character
>>
>>     Initial function from pylab (J. Hunter), changed for my specific needs.
>>     """
>>     from numarray import array,transpose
>>
>>     fh = file(fname)
>>
>>     X = []
>>     numCols = None
>>     nline = 0
>>     if columns is None:
>>         for line in fh:
>>             nline += 1
>>             if dep is not None and nline <= dep: continue
>>             line = line[:line.find(comments)].strip()
>>             if not len(line): continue
>>             if arraytype=='numeric':
>>                 row = [float(val) for val in line.split(delimiter)]
>>             else:
>>                 row = [val.strip() for val in line.split(delimiter)]
>>             thisLen = len(row)
>>             if numCols is not None and thisLen != numCols:
>>                 raise ValueError('All rows must have the same number of columns')
>>             X.append(row)
>>     else:
>>         for line in fh:
>>             nline +=1
>>             if dep is not None and nline <= dep: continue
>>             line = line[:line.find(comments)].strip()
>>             if not len(line): continue
>>             row = line.split(delimiter)
>>             if arraytype=='numeric':
>>                 row = [float(row[i-1]) for i in columns]
>>             elif arraytype=='numstring':
>>                 row = [row[i-1].strip() for i in columns]
>>             else:
>>                 row = [row[i-1].strip() for i in columns]
>>             thisLen = len(row)
>>             if numCols is not None and thisLen != numCols:
>>                 raise ValueError('All rows must have the same number of columns')
>>             X.append(row)
>>
>>     if arraytype=='numeric':
>>         X = array(X)
>>         r,c = X.shape
>>         if r==1 or c==1:
>>             X.shape = max([r,c]),
>>     elif arraytype == 'numstring':
>>         import numarray.strings # pb if numeric+pylab
>>         X = numarray.strings.array(X)
>>         r,c = X.shape
>>         if r==1 or c==1:
>>             X.shape = max([r,c]),
>>     return X
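
One detail of the `line[:line.find(comments)]` expression in readcol is worth a small sketch: `str.find` returns -1 when the comment character is absent, so the slice drops the last character of the line. That is harmless when the line ends with a newline, but a final line without one would lose its last character. The helper names below are hypothetical, and the split-based variant is only one possible alternative, not part of the original function.

```python
def strip_comment_find(line, comments='%'):
    # Same expression as in readcol: find() returns -1 when no comment
    # character is present, so the slice becomes line[:-1].
    return line[:line.find(comments)].strip()


def strip_comment_split(line, comments='%'):
    # Hypothetical alternative: split once on the comment character and
    # keep the part before it; a line without a comment is kept whole.
    return line.split(comments, 1)[0].strip()


print(strip_comment_find('10 11 12\n'))           # 10 11 12
print(strip_comment_find('10 11 12'))             # 10 11 1
print(strip_comment_split('10 11 12'))            # 10 11 12
print(strip_comment_split('10 11 12 % note\n'))   # 10 11 12
```

With the test files in this mail every line ends in '\n', so the original expression behaves as intended there.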
>>
>>
>> -------------------------------------------
>> files_test_creation.py
>>
>> -------------------------------------------
>>
>> f1 = file('test1.dat','w')
>> for i in range(10000):
>>     f1.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n')
>> f1.close()
>>
>>
>> f2 = file('test2.dat','w')
>> for i in range(10000):
>>     f2.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n')
>> f2.close()
>>
>> -------------------------------------------
>> numarray_pb_sample.py
>>
>> -------------------------------------------
>>
>> import numarray
>> import readcol2
>>
>> data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numstring')
>> data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numstring')
>>
>> # or in non-string array form (same result)
>> ## data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numeric')
>> ## data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numeric')
>>
>> for a_i in range(data1.shape[0]):
>>     for b_i in range(data2.shape[0]):
>>         if numarray.all(data1[a_i,:] == data2[b_i,:]):
>>             print a_i,b_i
>>
>> -------------------------------------------
>> python_list_sample.py
>>
>> -------------------------------------------
>>
>> import readcol2
>>
>> data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='list')
>> data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='list')
>>
>> for a_i in range(len(data1)):
>>     for b_i in range(len(data2)):
>>         if data1[a_i] == data2[b_i]:
>>             print a_i,b_i
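
Both samples above compare every row of one file with every row of the other, which is about 10^8 comparisons for the test files. As a minimal sketch, assuming the rows are available in list form (arraytype='list') and that only exact row equality matters, the rows of the second file can be indexed in a dict so that each row of the first file needs a single lookup. The function name matching_rows is hypothetical, and the data is generated inline to mirror the test files read with dep=1.

```python
def matching_rows(data1, data2):
    """Return (i, j) index pairs where data1[i] equals data2[j]."""
    # Index the rows of data2 by content. Lists are not hashable,
    # so each row is converted to a tuple before use as a dict key.
    positions = {}
    for j, row in enumerate(data2):
        positions.setdefault(tuple(row), []).append(j)

    # One dict lookup per row of data1 replaces the inner loop.
    pairs = []
    for i, row in enumerate(data1):
        for j in positions.get(tuple(row), []):
            pairs.append((i, j))
    return pairs


# Stand-in for the test files: dep=1 skips the first line, leaving 9999 rows.
data1 = [[str(i), str(i + 1), str(i + 2)] for i in range(1, 10000)]
data2 = [[str(i), str(i + 1), str(i + 2)] for i in range(1, 10000)]

pairs = matching_rows(data1, data2)
print(len(pairs))  # 9999
```

The pairs come out in the same order as in the nested loops, and the work grows with the number of rows plus the number of matches instead of with the product of the two file lengths.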