[Numpy-discussion] numarray speed problem
Todd Miller
jmiller at stsci.edu
Tue Sep 20 10:17:18 EDT 2005
Hi H,
I did some work on this problem based on your previous post, but
apparently my response never made it to numpy-discussion. In a
nutshell, I made numarray 12x faster for a benchmark like your
numarray_pb_sample.py by speeding up string comparisons and improving
all(). The changes are in numarray CVS, but no SourceForge release
contains them yet; numarray-1.4.0 is still several weeks away. If you
want to try CVS from UNIX/Linux, just do:
% cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/numpy login
% cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/numpy co -P numarray
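For reference, the nested loops in both samples perform len(data1) * len(data2) row comparisons, i.e. 10^8 for the two 10,000-line test files, so the fixed cost of each data1[a_i,:] == data2[b_i,:] plus numarray.all() call is paid 10^8 times. Independently of the numarray changes, a different technique (a hash join on whole rows, not something numarray itself provides) avoids the pairwise loop altogether. This is a minimal sketch under stated assumptions: rows are Python sequences of hashable values, as readcol returns with arraytype='list', and matching_rows is a hypothetical helper name, not part of numarray or readcol2.

```python
def matching_rows(rows1, rows2):
    """Return (i, j) index pairs where rows1[i] == rows2[j].

    Hypothetical helper. Assumes each row is a sequence of hashable
    values (e.g. lists of strings from readcol(..., arraytype='list')).
    rows2 is indexed once in a dict keyed by tuple(row), so the work grows
    roughly with len(rows1) + len(rows2) plus the number of matches,
    instead of len(rows1) * len(rows2) for the nested loop.
    """
    # Map each distinct row of rows2 to every index where it occurs,
    # in ascending order of j.
    index = {}
    for j, row in enumerate(rows2):
        index.setdefault(tuple(row), []).append(j)

    # One pass over rows1: look each row up instead of scanning rows2.
    pairs = []
    for i, row in enumerate(rows1):
        for j in index.get(tuple(row), ()):
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Rows shaped like the lines of test1.dat / test2.dat, as strings.
    rows1 = [[str(i), str(i + 1), str(i + 2)] for i in range(10000)]
    rows2 = [[str(i), str(i + 1), str(i + 2)] for i in range(10000)]
    pairs = matching_rows(rows1, rows2)
    print(len(pairs))   # 10000
    print(pairs[:3])    # [(0, 0), (1, 1), (2, 2)]
```

It reports the same (a_i, b_i) pairs, in the same order, as python_list_sample.py; with the list output of readcol the call would be matching_rows(data1, data2).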
Regards,
Todd
Humufr wrote:
> Hello,
>
> I have a problem with numarray, especially the function numarray.all.
>
> I want to compare two files. To do this I read the files with the
> function readcol (from my module readcol2), which can return them as a
> list or in numarray format (string or numerical).
>
> I'm comparing the files line by line.
> If I use the array format and the numarray.all function, the comparison
> of two big files takes forever. If I use Python list objects, it's very
> fast. I think there is a problem here, or at least room for
> improvement. If I understand the goal of numarray correctly, it was
> written to speed up some parts of Python, but here it slows things
> down a lot.
>
> A very simple sample showing the effect is at the bottom of this mail.
>
> Thanks for numarray; I hope I'm not bothering you. My comments are
> meant to help improve numarray more than anything else. I have been
> able to find the problem, so now I can avoid it.
>
> H.
>
>
>
>
> def readcol(fname, comments='%', columns=None, delimiter=None, dep=0,
>             arraytype='list'):
>     """
>     Load ASCII data from fname into an array and return the array.
>     The data must be regular: the same number of values in every row.
>     fname can be a filename or a file handle.
>
>     Input:
>
>     - fname     : the name of the file to read (or an open file handle)
>
>     Optional input:
>
>     - comments  : a string marking the start of a comment;
>                   the default is the Matlab character '%'.
>     - columns   : list or tuple containing the (1-based) columns to use.
>     - delimiter : a string delimiting the columns.
>     - dep       : an integer giving the number of leading lines to skip
>                   (useful to skip description lines).
>     - arraytype : a string indicating the kind of array to return:
>                   numeric array ('numeric'), character array
>                   ('numstring') or list ('list'). The default is 'list'.
>
>     matfile data is not currently supported, but see
>     Nigel Wade's matfile ftp://ion.le.ac.uk/matfile/matfile.tar.gz
>
>     Example usage:
>
>     x, y = transpose(readcol('test.dat'))  # data in two columns
>     X = readcol('test.dat')                # a matrix of data
>     x = readcol('test.dat')                # a single column of data
>     x = readcol('test.dat', '#')           # '#' is the comment delimiter
>
>     Initial function from pylab (J. Hunter), modified for my
>     specific needs.
>     """
>     from numarray import array
>
>     # Accept either a file name or an already open file handle.
>     if hasattr(fname, 'read'):
>         fh = fname
>     else:
>         fh = open(fname)
>
>     X = []
>     numCols = None
>     nline = 0
>     for line in fh:
>         nline += 1
>         if dep is not None and nline <= dep:
>             continue                       # skip the first dep lines
>         # Drop any trailing comment. Guard the slice: find() returns -1
>         # when there is no comment, and line[:-1] would cut off the last
>         # character of a line that has no trailing newline.
>         if comments in line:
>             line = line[:line.find(comments)]
>         line = line.strip()
>         if not line:
>             continue
>         row = line.split(delimiter)
>         if columns is not None:
>             row = [row[i-1] for i in columns]   # columns are 1-based
>         if arraytype == 'numeric':
>             row = [float(val) for val in row]
>         else:                                   # 'numstring' or 'list'
>             row = [val.strip() for val in row]
>         # Remember the width of the first row and check all the others.
>         if numCols is None:
>             numCols = len(row)
>         elif len(row) != numCols:
>             raise ValueError('All rows must have the same number of columns')
>         X.append(row)
>     if fh is not fname:
>         fh.close()
>
>     if arraytype == 'numeric':
>         X = array(X)
>     elif arraytype == 'numstring':
>         import numarray.strings   # problem if Numeric + pylab are used
>         X = numarray.strings.array(X)
>     if arraytype in ('numeric', 'numstring'):
>         # Flatten a single row or a single column to a 1-D array.
>         r, c = X.shape
>         if r == 1 or c == 1:
>             X.shape = max(r, c),
>     return X
>
>
> -------------------------------------------
> files_test_creation.py
>
> -------------------------------------------
>
> f1 = open('test1.dat', 'w')
> for i in range(10000):
>     f1.write(str(i) + ' ' + str(i+1) + ' ' + str(i+2) + '\n')
> f1.close()
>
>
> f2 = open('test2.dat', 'w')
> for i in range(10000):
>     f2.write(str(i) + ' ' + str(i+1) + ' ' + str(i+2) + '\n')
> f2.close()
>
> -------------------------------------------
> numarray_pb_sample.py
>
> -------------------------------------------
>
> import numarray
> import readcol2
>
> data1 = readcol2.readcol('test1.dat', columns=[1,2,3], comments='#',
>                          delimiter=' ', dep=1, arraytype='numstring')
> data2 = readcol2.readcol('test2.dat', columns=[1,2,3], comments='#',
>                          delimiter=' ', dep=1, arraytype='numstring')
>
> # or in non-string (numeric) array form (same result)
> ## data1 = readcol2.readcol('test1.dat', columns=[1,2,3], comments='#',
> ##                          delimiter=' ', dep=1, arraytype='numeric')
> ## data2 = readcol2.readcol('test2.dat', columns=[1,2,3], comments='#',
> ##                          delimiter=' ', dep=1, arraytype='numeric')
>
> # Compare every row of data1 with every row of data2.
> for a_i in range(data1.shape[0]):
>     for b_i in range(data2.shape[0]):
>         if numarray.all(data1[a_i,:] == data2[b_i,:]):
>             print a_i, b_i
>
> -------------------------------------------
> python_list_sample.py
>
> -------------------------------------------
>
> import readcol2
>
> data1 = readcol2.readcol('test1.dat', columns=[1,2,3], comments='#',
>                          delimiter=' ', dep=1, arraytype='list')
> data2 = readcol2.readcol('test2.dat', columns=[1,2,3], comments='#',
>                          delimiter=' ', dep=1, arraytype='list')
>
> # Same pairwise comparison, using plain Python list equality.
> for a_i in range(len(data1)):
>     for b_i in range(len(data2)):
>         if data1[a_i] == data2[b_i]:
>             print a_i, b_i
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion