From devnew at gmail.com Sat Mar 1 00:41:30 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Fri, 29 Feb 2008 21:41:30 -0800 (PST) Subject: [Numpy-discussion] PCA on set of face images In-Reply-To: References: Message-ID: <78f12aa1-5815-4875-b354-8e0b6cc270ad@s13g2000prd.googlegroups.com> On Mar 1, 12:57 am, "Peter Skomoroch" wrote: I think > > matlab example should be easy to translate to scipy/matplotlib using the > > montage function: > > > load faces.mat > > %Form covariance matrix > > C=cov(faces'); > > %build eigenvectors and eigenvalues > > [E,D] = eig(C); hi Peter, nice code..ran the examples.. however couldn't follow the matlab code since i have no exposure to matlab..was using numpy etc for calcs could you confirm the layout for the face images data? i assumed that the initial face matrix should be faces=a numpy matrix with N rows ie N=numofimages row1=image1pixels as a sequence row2=image2pixels as a sequence ... rowN=imageNpixels as a sequence and covariancematrix=faces*faces_transpose is this the right way? thanks From charlesr.harris at gmail.com Sat Mar 1 01:12:56 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Feb 2008 23:12:56 -0700 Subject: [Numpy-discussion] contiguous true In-Reply-To: <88e473830802290953y1ec21d95ic437e43971b316ba@mail.gmail.com> References: <88e473830802290953y1ec21d95ic437e43971b316ba@mail.gmail.com> Message-ID: On Fri, Feb 29, 2008 at 10:53 AM, John Hunter wrote: > [apologies if this is a resend, my mail just flaked out] > > I have a boolean array and would like to find the lowest index "ind" > where N contiguous elements are all True. Eg, if x is > > In [101]: x = np.random.rand(20)>.4 > > In [102]: x > Out[102]: > array([False, True, True, False, False, True, True, False, False, > True, False, True, False, True, True, True, False, True, > False, True], dtype=bool) > > I would like to find ind=1 for N=2 and ind=13 for N=2. I assume with > the right cumsum, diff and maybe repeat magic, this can be vectorized, > but the proper incantation is escaping me. > > for N==3, I thought of > > In [110]: x = x.astype(int) > In [112]: y = x[:-2] + x[1:-1] + x[2:] > > In [125]: ind = (y==3).nonzero()[0] > > In [126]: if len(ind): ind = ind[0] > > In [128]: ind > Out[128]: 13 > This may be more involved than you want, but In [37]: prng = random.RandomState(1234567890) In [38]: x = prng.random_sample(50) < 0.5 In [39]: y1 = concatenate(([False], x[:-1])) In [40]: y2 = concatenate((x[1:], [False])) In [41]: beg = ind[x & ~y1] In [42]: end = ind[x & ~y2] In [43]: cnt = end - beg + 1 In [44]: i = beg[cnt == 4] In [45]: i Out[45]: array([28]) In [46]: x Out[46]: array([False, False, False, False, True, False, True, False, False, False, True, False, True, False, True, True, True, True, True, False, False, False, True, False, True, False, False, False, True, True, True, True, False, False, True, False, False, False, False, False, False, False, False, True, False, False, True, False, True, False], dtype=bool) produces a list of the indices where sequences of length 4 begin. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sat Mar 1 01:21:20 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Feb 2008 23:21:20 -0700 Subject: [Numpy-discussion] contiguous true In-Reply-To: References: <88e473830802290953y1ec21d95ic437e43971b316ba@mail.gmail.com> Message-ID: On Fri, Feb 29, 2008 at 11:12 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Fri, Feb 29, 2008 at 10:53 AM, John Hunter wrote: > > > [apologies if this is a resend, my mail just flaked out] > > > > I have a boolean array and would like to find the lowest index "ind" > > where N contiguous elements are all True. Eg, if x is > > > > In [101]: x = np.random.rand(20)>.4 > > > > In [102]: x > > Out[102]: > > array([False, True, True, False, False, True, True, False, False, > > True, False, True, False, True, True, True, False, True, > > False, True], dtype=bool) > > > > I would like to find ind=1 for N=2 and ind=13 for N=2. I assume with > > the right cumsum, diff and maybe repeat magic, this can be vectorized, > > but the proper incantation is escaping me. > > > > for N==3, I thought of > > > > In [110]: x = x.astype(int) > > In [112]: y = x[:-2] + x[1:-1] + x[2:] > > > > In [125]: ind = (y==3).nonzero()[0] > > > > In [126]: if len(ind): ind = ind[0] > > > > In [128]: ind > > Out[128]: 13 > > > > > This may be more involved than you want, but > > In [37]: prng = random.RandomState(1234567890) > > In [38]: x = prng.random_sample(50) < 0.5 > > In [39]: y1 = concatenate(([False], x[:-1])) > > In [40]: y2 = concatenate((x[1:], [False])) > > In [41]: beg = ind[x & ~y1] > > In [42]: end = ind[x & ~y2] > > In [43]: cnt = end - beg + 1 > > In [44]: i = beg[cnt == 4] > > In [45]: i > Out[45]: array([28]) > > In [46]: x > Out[46]: > array([False, False, False, False, True, False, True, False, False, > False, True, False, True, False, True, True, True, True, > True, False, False, False, True, False, True, False, False, > False, True, True, True, True, False, False, True, False, > False, False, False, False, False, False, False, True, False, > False, True, False, True, False], dtype=bool) > > produces a list of the indices where sequences of length 4 begin. > > Chuck > Oops, ind = arange(len(x)). I suppose nonzero would work as well. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Sat Mar 1 01:56:41 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 1 Mar 2008 01:56:41 -0500 Subject: [Numpy-discussion] contiguous true In-Reply-To: References: <88e473830802290953y1ec21d95ic437e43971b316ba@mail.gmail.com> Message-ID: On 01/03/2008, Charles R Harris wrote: > > On Fri, Feb 29, 2008 at 10:53 AM, John Hunter wrote: > > > I have a boolean array and would like to find the lowest index "ind" > > > where N contiguous elements are all True. Eg, if x is [...] > Oops, ind = arange(len(x)). I suppose nonzero would work as well. I'm guessing you're alluding to the fact that diff(nonzero(x)) gives you a list of the run lengths of Falses in x (except possibly for the first one). If you have a fondness for the baroque, you can try numpy.where(numpy.convolve(x,[1,]*N,'valid')==N) For large N this can even use Fourier-domain convolution (though you'd then have to be careful about round-off error). Silly, really, it's O(NM) or O(N log M) instead of O(N). 
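A minimal self-contained sketch combining the two suggestions in this thread -- the run-length bookkeeping (with ind = arange(len(x)) from the follow-up) and the convolution one-liner. The names first_true_run and first_true_run_conv are illustrative, not from the original posts, and >= is used so that runs of at least N (rather than exactly N) qualify:

-------
import numpy as np

def first_true_run(x, N):
    # lowest index at which a run of at least N consecutive True values starts,
    # or None if there is no such run
    x = np.asarray(x, dtype=bool)
    ind = np.arange(len(x))
    prev = np.concatenate(([False], x[:-1]))   # element to the left of each position
    nxt = np.concatenate((x[1:], [False]))     # element to the right of each position
    beg = ind[x & ~prev]                       # indices where a run of Trues starts
    end = ind[x & ~nxt]                        # indices where a run of Trues ends
    cnt = end - beg + 1                        # length of each run
    starts = beg[cnt >= N]
    return int(starts[0]) if len(starts) else None

def first_true_run_conv(x, N):
    # same answer via convolution: a length-N window of ones sums to N
    # only where N consecutive elements are all True
    y = np.convolve(np.asarray(x, dtype=int), np.ones(N, dtype=int), 'valid')
    hits = np.where(y == N)[0]
    return int(hits[0]) if len(hits) else None
-------

For the example array quoted above, both give 1 for N=2 and 13 for N=3.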
Anne From peter.skomoroch at gmail.com Sat Mar 1 02:18:52 2008 From: peter.skomoroch at gmail.com (Peter Skomoroch) Date: Sat, 1 Mar 2008 02:18:52 -0500 Subject: [Numpy-discussion] PCA on set of face images In-Reply-To: <78f12aa1-5815-4875-b354-8e0b6cc270ad@s13g2000prd.googlegroups.com> References: <78f12aa1-5815-4875-b354-8e0b6cc270ad@s13g2000prd.googlegroups.com> Message-ID: I think that is correct... Here is what the final result should look like: http://www.datawrangling.com/media/images/first_16.png If the dimensions for the sample faces don't work out to ( 361 x 361 ) in the end, then you are likely to be missing a transpose somewhere. Also, be aware that the scipy linalg.eig by default returns a vector of eigenvalues and a matrix, but the Matlab eig(), returns 2 matrices ( the eigenvalues are multiplied by an identity matrix to get a diagonal matrix). You can check out the mathesaurus reference sheet for help translating the example into python, but hopefully this will point you in the right direction: see: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/eig.html vs: >>> help(linalg.eig) > > Help on function eig in module scipy.linalg.decomp: > > eig(a, b=None, left=False, right=True, overwrite_a=False, > overwrite_b=False) > Solve ordinary and generalized eigenvalue problem > of a square matrix. > > Inputs: > > a -- An N x N matrix. > b -- An N x N matrix [default is identity(N)]. > left -- Return left eigenvectors [disabled]. > right -- Return right eigenvectors [enabled]. > overwrite_a, overwrite_b -- save space by overwriting the a and/or > b matrices (both False by default) > > Outputs: > > w -- eigenvalues [left==right==False]. > w,vr -- w and right eigenvectors [left==False,right=True]. > w,vl -- w and left eigenvectors [left==True,right==False]. > w,vl,vr -- [left==right==True]. > > Definitions: > > a * vr[:,i] = w[i] * b * vr[:,i] > > a^H * vl[:,i] = conjugate(w[i]) * b^H * vl[:,i] > > where a^H denotes transpose(conjugate(a)). > On Sat, Mar 1, 2008 at 12:41 AM, devnew at gmail.com wrote: > > > On Mar 1, 12:57 am, "Peter Skomoroch" wrote: > I think > > > matlab example should be easy to translate to scipy/matplotlib using > the > > > montage function: > > > > > load faces.mat > > > %Form covariance matrix > > > C=cov(faces'); > > > %build eigenvectors and eigenvalues > > > [E,D] = eig(C); > > > hi Peter, > nice code..ran the examples.. > however couldn't follow the matlab code since i have no exposure to > matlab..was using numpy etc for calcs > could you confirm the layout for the face images data? i assumed that > the initial face matrix should be > faces=a numpy matrix with N rows ie N=numofimages > > row1=image1pixels as a sequence > row2=image2pixels as a sequence > ... > rowN=imageNpixels as a sequence > > > and covariancematrix=faces*faces_transpose > > is this the right way? > thanks > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Peter N. Skomoroch peter.skomoroch at gmail.com http://www.datawrangling.com -------------- next part -------------- An HTML attachment was scrubbed... 
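For concreteness, a runnable sketch of the layout being confirmed here; the random stand-in images and all the names below are illustrative assumptions, not code from either post:

-------
import numpy as np

rng = np.random.RandomState(0)
images = [rng.rand(19, 19) for _ in range(10)]        # stand-ins for N face images (19*19 = 361 pixels)

faces = np.vstack([img.ravel() for img in images])    # shape (N, 361): row i = pixels of image i
mean_face = faces.mean(axis=0)                        # the average face
adjusted = faces - mean_face                          # mean-centred rows

# full pixel-by-pixel covariance (361, 361), as in the Matlab cov(faces') step
C = np.dot(adjusted.T, adjusted) / (len(faces) - 1)

# the small (N, N) cross-product used instead when N is much smaller than the
# number of pixels -- this is the faces * faces_transpose matrix asked about above
small = np.dot(adjusted, adjusted.T)
-------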
URL: From devnew at gmail.com Sat Mar 1 02:27:05 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Fri, 29 Feb 2008 23:27:05 -0800 (PST) Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> Message-ID: <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> > This example assumes that facearray is an ndarray.(like you described > in original post ;-) ) It looks like you are using a matrix. hi Arnar thanks .. a few doubts however 1.when i use say 10 images of 4X3 each u, s, vt = linalg.svd(facearray, 0) i will get vt of shape (10,12) can't i take this as facespace? why do i need to get the transpose? then i can take as eigface_image0= vt[0].reshape(imgwdth,imght) 2.this way (svd) is diff from covariance matrix method. if i am to do it using the later ,how can i get the eigenface image data? thanks for the help D From robert.kern at gmail.com Sat Mar 1 03:17:46 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 1 Mar 2008 02:17:46 -0600 Subject: [Numpy-discussion] failure building numpy using icc In-Reply-To: <20080228192138.GA21482@swri.org> References: <20080228192138.GA21482@swri.org> Message-ID: <3d375d730803010017r1505e28fwd68bc554060b5ba3@mail.gmail.com> On Thu, Feb 28, 2008 at 1:21 PM, Glen W. Mabey wrote: > Hello, > > I'm using svn numpy and get the following error upon executing > > /usr/local/bin/python2.5 setup.py config --noisy --cc=/opt/intel/cce/10.0.025/bin/icc --compiler=intel --fcompiler=intel build_clib build_ext > > I see: > > conv_template:> build/src.linux-x86_64-2.5/numpy/core/src/scalartypes.inc > Traceback (most recent call last): > File "setup.py", line 96, in > setup_package() > File "setup.py", line 89, in setup_package > configuration=configuration ) > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/core.py", line 184, in setup > return old_setup(**new_attr) > File "/usr/local/lib/python2.5/distutils/core.py", line 151, in setup > dist.run_commands() > File "/usr/local/lib/python2.5/distutils/dist.py", line 974, in run_commands > self.run_command(cmd) > File "/usr/local/lib/python2.5/distutils/dist.py", line 994, in run_command > cmd_obj.run() > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_ext.py", line 56, in run > self.run_command('build_src') > File "/usr/local/lib/python2.5/distutils/cmd.py", line 333, in run_command > self.distribution.run_command(command) > File "/usr/local/lib/python2.5/distutils/dist.py", line 994, in run_command > cmd_obj.run() > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 130, in run > self.build_sources() > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 147, in build_sources > self.build_extension_sources(ext) > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 252, in build_extension_sources > sources = self.template_sources(sources, ext) > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 359, in template_sources 
> outstr = process_c_file(source) > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 185, in process_file > % (sourcefile, process_str(''.join(lines)))) > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 150, in process_str > newstr[sub[0]:sub[1]], sub[4]) > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 117, in expand_sub > % (line, template_re.sub(namerepl, substr))) > File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 113, in namerepl > return names[name][thissub[0]] > KeyError: 'PREFIX' > > > And I do not see any errors when building the same svn version with gcc (on > a different machine). > > I've unsuccessfully tried to follow that backtrace of functions to > figure out exactly what is going on. > > Any hints/suggestions? Off-hand, no, sorry. I'm not sure why the compiler would matter in this part of the code, though. Can you try using gcc on the same machine? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From devnew at gmail.com Sat Mar 1 08:43:06 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Sat, 1 Mar 2008 05:43:06 -0800 (PST) Subject: [Numpy-discussion] svd() and eigh() Message-ID: <090949fd-795c-4152-8df9-49b3182aed02@i12g2000prf.googlegroups.com> hi i have a set of images of faces which i make into a 2d array using numpy.ndarray each row represents a face image faces= [[ 173. 87. ... 88. 165.] [ 158. 103. .. 73. 143.] [ 180. 87. .. 55. 143.] [ 155. 117. .. 93. 155.]] from which i can get the mean image => avgface=average(faces,axis=0) and calculate the adjustedfaces=faces-avgface now if i apply svd() i get u, s, vt = linalg.svd(adjustedfaces, 0) # a member posted this facespace=vt.transpose() and if i calculate covariance matrix covmat=matrix(adjustedfaces)* matrix(adjustedfaces).transpose() eval,evect=eigh(covmat) evect=sortbyeigenvalue(evect) # sothat largest eval is first facespace=evect* matrix(adjustedfaces) what is the difference btw these 2 methods? apparently they yield different values for the facespace. which should i follow? is it possible to calculate eigenvectors using svd()? thanks D From arnar.flatberg at gmail.com Sat Mar 1 12:50:48 2008 From: arnar.flatberg at gmail.com (Arnar Flatberg) Date: Sat, 1 Mar 2008 18:50:48 +0100 Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> Message-ID: <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> On Sat, Mar 1, 2008 at 8:27 AM, devnew at gmail.com wrote: > > > This example assumes that facearray is an ndarray.(like you described > > in original post ;-) ) It looks like you are using a matrix. > > hi Arnar > thanks .. 
> a few doubts however > > 1.when i use say 10 images of 4X3 each > > u, s, vt = linalg.svd(facearray, 0) > i will get vt of shape (10,12) > can't i take this as facespace? Yes, you may > why do i need to get the transpose? You dont need to. I did because then it would put the eigenvectors that span your column space as columns of the facespace array. I figured that would be easier for you, as that would be compatible with the use of eig (eigh) and matlab > then i can take as eigface_image0= vt[0].reshape(imgwdth,imght) > > 2.this way (svd) is diff from covariance matrix method. No it is not. You may be fooled by the scaling though. I see from the post above, that there may be some confusion here about svd and eig on a crossproduct matrix :-) Essentially, if X is a column centered array of size (num_images, num_pixels): u, s, vt = linalg.svd(X), Then, the columns of u span the space of dot(X, X.T), the rows of vt span the space of dot(X.T, X) and s is a vector of scaling coefficients. Another way of seeing this is that u spans the column space of X, and vt spans the row space of X. So, for a third view, the columns of u are the eigenvectors of dot(X, X.T) and the rows of vt contains the eigenvectors of dot(X.T, X). Now, in your, `covariance method` you use eigh(dot(X, X.T)), where the eigenvectors would be exactly the same as u(the array) from an svd on X. In order to recover the facespace you use facespace=dot(X.T, u). This facespace is the same as s*vt.T, where s and vt are from the svd. In my example, the eigenvectors spanning the column space were scaled. I called this for scores: (u*s) In your computation the facespace gets scaled implicit. Where to put the scale is different from application to application and has no clear definition. I dont know if this made anything any clearer. However, a simple example may be clearer: ------- # X is (a ndarray, *not* matrix) column centered with vectorized images in rows # method 1: XX = dot(X, X.T) s, u = linalg.eigh(XX) reorder = s.argsort()[::-1] facespace = dot(X.T, u[:,reorder]) # method 2: u, s, vt = svd(X, 0) facespace2 = s*vt.T ------ This gives identical result. Please remember that eigenvector signs are arbitrary when comparing. > if i am to do > it using the later ,how can i get the eigenface image data? Just like I described before Arnar From arnar.flatberg at gmail.com Sat Mar 1 12:58:46 2008 From: arnar.flatberg at gmail.com (Arnar Flatberg) Date: Sat, 1 Mar 2008 18:58:46 +0100 Subject: [Numpy-discussion] svd() and eigh() In-Reply-To: <090949fd-795c-4152-8df9-49b3182aed02@i12g2000prf.googlegroups.com> References: <090949fd-795c-4152-8df9-49b3182aed02@i12g2000prf.googlegroups.com> Message-ID: <5d3194020803010958ha0904aayb79c0673d5cdd19f@mail.gmail.com> On Sat, Mar 1, 2008 at 2:43 PM, devnew at gmail.com wrote: > hi > i have a set of images of faces which i make into a 2d array using > numpy.ndarray > each row represents a face image > faces= > [[ 173. 87. ... 88. 165.] > [ 158. 103. .. 73. 143.] > [ 180. 87. .. 55. 143.] > [ 155. 117. .. 93. 
155.]] > > from which i can get the mean image => > avgface=average(faces,axis=0) > and calculate the adjustedfaces=faces-avgface > > now if i apply svd() i get > u, s, vt = linalg.svd(adjustedfaces, 0) > # a member posted this > facespace=vt.transpose() > > and if i calculate covariance matrix > covmat=matrix(adjustedfaces)* matrix(adjustedfaces).transpose() > eval,evect=eigh(covmat) > evect=sortbyeigenvalue(evect) # sothat largest eval is first > facespace=evect* matrix(adjustedfaces) > > what is the difference btw these 2 methods? See my answer, in your other post > apparently they yield > different values for the facespace. Not really. > which should i follow? The svd is a little less efficient and slightly slower. However it is clear in implementation and may, in some rare situations, be more precise. > is it possible to calculate eigenvectors using svd()? Again, see me other response. > > thanks > D > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From dalcinl at gmail.com Sat Mar 1 14:43:56 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 1 Mar 2008 16:43:56 -0300 Subject: [Numpy-discussion] numpy and roundoff(?) Message-ID: Dear all, I want to comment some extrange stuff I'm experiencing with numpy. Please, let me know if this is expected and known. I'm trying to solve a model nonlinear PDE, 2D Bratu problem (-Lapacian u - alpha * exp(u), homogeneus bondary conditions), using the simple finite differences with a 5-point stencil. I implemented the finite diference scheme in pure-numpy, and also in a F90 subroutine, next wrapped with f2py. Next, I use PETSc (through petsc4py) to solve the problem with a Newton method, a Krylov solver, and a matrix-free technique for the Jacobian (that is, the matrix is never explicitelly assembled, its action on a vector is approximated again with a 1st. order finite direrence formula). And the, surprise! The pure-numpy implementation accumulates many more inner linear iterations (about 25%) in the complete nonlinear solution loop than the one using the F90 code wrapped with f2py. Additionally, PETSc have in its source distribution a similar example, but implemented in C and using some PETSc utilities for managing structured grids. In short, this code is in C and completelly unrelated to the previously commented code. After running this example, I get almost the same results that the one for my petsc4py + F90 code. All this surprised me. It seems that for some reason numpy is accumulating some roundoff, and this is afecting the acuracy of the aproximated Jacobian, and then the linear solvers need more iteration to converge. Unfortunatelly, I cannot offer a self contained example, as this code depends on having PETSc and petsc4py. Of course, I could write myself the nonlinear loop, and a CG solver, but I am really busy. Can someone comment on this? Is all this expected? Have any of you experienced somethig similar? -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From pav at iki.fi Sat Mar 1 15:32:00 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 01 Mar 2008 22:32:00 +0200 Subject: [Numpy-discussion] numpy and roundoff(?) 
In-Reply-To: References: Message-ID: <1204403520.7219.7.camel@localhost.localdomain> Hi, la, 2008-03-01 kello 16:43 -0300, Lisandro Dalcin kirjoitti: > I want to comment some extrange stuff I'm experiencing with numpy. > Please, let me know if this is expected and known. > > I'm trying to solve a model nonlinear PDE, 2D Bratu problem (-Lapacian > u - alpha * exp(u), homogeneus bondary conditions), using the simple > finite differences with a 5-point stencil. > > I implemented the finite diference scheme in pure-numpy, and also in a > F90 subroutine, next wrapped with f2py. > > Next, I use PETSc (through petsc4py) to solve the problem with a > Newton method, a Krylov solver, and a matrix-free technique for the > Jacobian (that is, the matrix is never explicitelly assembled, its > action on a vector is approximated again with a 1st. order finite > direrence formula). > > And the, surprise! The pure-numpy implementation accumulates many more > inner linear iterations (about 25%) in the complete nonlinear solution > loop than the one using the F90 code wrapped with f2py. A silly question: did you check directly that the pure-numpy code and the F90 code give the same results for the Jacobian-vector product J(z0) z for some randomly chosen vectors z0, z? -- Pauli Virtanen From charlesr.harris at gmail.com Sat Mar 1 15:37:25 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 1 Mar 2008 13:37:25 -0700 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: References: Message-ID: On Sat, Mar 1, 2008 at 12:43 PM, Lisandro Dalcin wrote: > Dear all, > > I want to comment some extrange stuff I'm experiencing with numpy. > Please, let me know if this is expected and known. > > I'm trying to solve a model nonlinear PDE, 2D Bratu problem (-Lapacian > u - alpha * exp(u), homogeneus bondary conditions), using the simple > finite differences with a 5-point stencil. > > I implemented the finite diference scheme in pure-numpy, and also in a > F90 subroutine, next wrapped with f2py. > > Next, I use PETSc (through petsc4py) to solve the problem with a > Newton method, a Krylov solver, and a matrix-free technique for the > Jacobian (that is, the matrix is never explicitelly assembled, its > action on a vector is approximated again with a 1st. order finite > direrence formula). > > And the, surprise! The pure-numpy implementation accumulates many more > inner linear iterations (about 25%) in the complete nonlinear solution > loop than the one using the F90 code wrapped with f2py. > > Additionally, PETSc have in its source distribution a similar example, > but implemented in C and using some PETSc utilities for managing > structured grids. In short, this code is in C and completelly > unrelated to the previously commented code. After running this > example, I get almost the same results that the one for my petsc4py + > F90 code. > > All this surprised me. It seems that for some reason numpy is > accumulating some roundoff, and this is afecting the acuracy of the > aproximated Jacobian, and then the linear solvers need more iteration > to converge. > > Unfortunatelly, I cannot offer a self contained example, as this code > depends on having PETSc and petsc4py. Of course, I could write myself > the nonlinear loop, and a CG solver, but I am really busy. > > Can someone comment on this? Is all this expected? Have any of you > experienced somethig similar? > > Could you attach the pure numpy solution along with a test case (alpha=?). 
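For orientation, the residual being discussed is F(u) = -Laplacian(u) - alpha*exp(u) with u = 0 on the boundary of the unit square; a pure-numpy 5-point-stencil evaluation of it might look like the sketch below. The h**2 scaling, the grid layout and the name bratu2d_residual are assumptions, not the actual (scrubbed) attachment that follows in the thread:

-------
import numpy as np

def bratu2d_residual(u, alpha):
    # u: (n, n) grid of values on the unit square, boundary entries held at 0
    # returns F(u) ~ -Laplacian(u) - alpha*exp(u), scaled by h**2 on the interior
    n = u.shape[0]
    h = 1.0 / (n - 1)
    F = np.empty_like(u)
    # boundary equations: just enforce u = 0 on the edges
    F[0, :], F[-1, :], F[:, 0], F[:, -1] = u[0, :], u[-1, :], u[:, 0], u[:, -1]
    uC = u[1:-1, 1:-1]                      # centre point
    uN, uS = u[:-2, 1:-1], u[2:, 1:-1]      # north/south neighbours
    uW, uE = u[1:-1, :-2], u[1:-1, 2:]      # west/east neighbours
    F[1:-1, 1:-1] = (4.0 * uC - uN - uS - uW - uE) - alpha * h * h * np.exp(uC)
    return F

# e.g. compared against another implementation on random 32x32 input with alpha = 6.8,
# a maximum elementwise difference around 1e-16 is pure rounding, as noted below
-------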
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Sat Mar 1 16:03:22 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 1 Mar 2008 18:03:22 -0300 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: <1204403520.7219.7.camel@localhost.localdomain> References: <1204403520.7219.7.camel@localhost.localdomain> Message-ID: On 3/1/08, Pauli Virtanen wrote: > A silly question: did you check directly that the pure-numpy code and > the F90 code give the same results for the Jacobian-vector product > J(z0) z for some randomly chosen vectors z0, z? No, I did not do that. However, I've checked the output of of the finite diferencing routines for random X input of 32*32 and alpha=6.8, and the maximum difference is always 4.4408920985e-16. At first, this seems good. -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From dalcinl at gmail.com Sat Mar 1 16:08:18 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 1 Mar 2008 18:08:18 -0300 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: References: Message-ID: Dear Charles, As I said, I have no time to code the pure Python+numpy nonlinear and linear loops, and the matrix-free stuff to mimic the PETSc implementation. However, I post the F90 code and the numpy code, and a small script for testing with random input. When I have some spare time, I'll try to do the complete application in pure python. Regards, On 3/1/08, Charles R Harris wrote: > > > > On Sat, Mar 1, 2008 at 12:43 PM, Lisandro Dalcin wrote: > > Dear all, > > > > I want to comment some extrange stuff I'm experiencing with numpy. > > Please, let me know if this is expected and known. > > > > I'm trying to solve a model nonlinear PDE, 2D Bratu problem (-Lapacian > > u - alpha * exp(u), homogeneus bondary conditions), using the simple > > finite differences with a 5-point stencil. > > > > I implemented the finite diference scheme in pure-numpy, and also in a > > F90 subroutine, next wrapped with f2py. > > > > Next, I use PETSc (through petsc4py) to solve the problem with a > > Newton method, a Krylov solver, and a matrix-free technique for the > > Jacobian (that is, the matrix is never explicitelly assembled, its > > action on a vector is approximated again with a 1st. order finite > > direrence formula). > > > > And the, surprise! The pure-numpy implementation accumulates many more > > inner linear iterations (about 25%) in the complete nonlinear solution > > loop than the one using the F90 code wrapped with f2py. > > > > Additionally, PETSc have in its source distribution a similar example, > > but implemented in C and using some PETSc utilities for managing > > structured grids. In short, this code is in C and completelly > > unrelated to the previously commented code. After running this > > example, I get almost the same results that the one for my petsc4py + > > F90 code. > > > > All this surprised me. It seems that for some reason numpy is > > accumulating some roundoff, and this is afecting the acuracy of the > > aproximated Jacobian, and then the linear solvers need more iteration > > to converge. > > > > Unfortunatelly, I cannot offer a self contained example, as this code > > depends on having PETSc and petsc4py. 
Of course, I could write myself > > the nonlinear loop, and a CG solver, but I am really busy. > > > > Can someone comment on this? Is all this expected? Have any of you > > experienced somethig similar? > > > > > Could you attach the pure numpy solution along with a test case (alpha=?). > > Chuck > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- A non-text attachment was scrubbed... Name: bratu2dlib.f90 Type: application/octet-stream Size: 862 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: bratu2dnpy.py Type: text/x-python Size: 454 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.py Type: text/x-python Size: 372 bytes Desc: not available URL: From charlesr.harris at gmail.com Sat Mar 1 16:49:22 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 1 Mar 2008 14:49:22 -0700 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: References: Message-ID: 2008/3/1 Lisandro Dalcin : > Dear Charles, > > As I said, I have no time to code the pure Python+numpy nonlinear and > linear loops, and the matrix-free stuff to mimic the PETSc > implementation. However, I post the F90 code and the numpy code, and a > small script for testing with random input. When I have some spare > time, I'll try to do the complete application in pure python. > > Regards, > > On 3/1/08, Charles R Harris wrote: > > > > > > > > On Sat, Mar 1, 2008 at 12:43 PM, Lisandro Dalcin > wrote: > > > Dear all, > > > > > > I want to comment some extrange stuff I'm experiencing with numpy. > > > Please, let me know if this is expected and known. > > > > > > I'm trying to solve a model nonlinear PDE, 2D Bratu problem (-Lapacian > > > u - alpha * exp(u), homogeneus bondary conditions), using the simple > > > finite differences with a 5-point stencil. > > > > > > I implemented the finite diference scheme in pure-numpy, and also in a > > > F90 subroutine, next wrapped with f2py. > > > > > > Next, I use PETSc (through petsc4py) to solve the problem with a > > > Newton method, a Krylov solver, and a matrix-free technique for the > > > Jacobian (that is, the matrix is never explicitelly assembled, its > > > action on a vector is approximated again with a 1st. order finite > > > direrence formula). > > > > > > And the, surprise! The pure-numpy implementation accumulates many more > > > inner linear iterations (about 25%) in the complete nonlinear solution > > > loop than the one using the F90 code wrapped with f2py. > > > > > > Additionally, PETSc have in its source distribution a similar example, > > > but implemented in C and using some PETSc utilities for managing > > > structured grids. In short, this code is in C and completelly > > > unrelated to the previously commented code. After running this > > > example, I get almost the same results that the one for my petsc4py + > > > F90 code. > > > > > > All this surprised me. 
It seems that for some reason numpy is > > > accumulating some roundoff, and this is afecting the acuracy of the > > > aproximated Jacobian, and then the linear solvers need more iteration > > > to converge. > > > > > > Unfortunatelly, I cannot offer a self contained example, as this code > > > depends on having PETSc and petsc4py. Of course, I could write myself > > > the nonlinear loop, and a CG solver, but I am really busy. > > > > > > Can someone comment on this? Is all this expected? Have any of you > > > experienced somethig similar? > > > > > > > > Could you attach the pure numpy solution along with a test case > (alpha=?). > > > Here are the differences as well as the values of F1 and F2 at the same point: D = 4.4408920985e-16 F1 = 2.29233319997 F2 = 2.29233319997 So they differ in the least significant bit. Not surprising, I expect the Fortran compiler might well perform operations in different order, accumulate in different places, etc. It might also accumulate in higher precision registers or round differently depending on hardware and various flags. The exp functions in Fortran and C might also return slightly different results. I don't think the differences are significant, but if you really want to compare results you will need a higher precision solution to compare against. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Sat Mar 1 17:19:37 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 1 Mar 2008 19:19:37 -0300 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: References: Message-ID: On 3/1/08, Charles R Harris wrote: > So they differ in the least significant bit. Not surprising, I expect the > Fortran compiler might well perform operations in different order, > accumulate in different places, etc. It might also accumulate in higher > precision registers or round differently depending on hardware and various > flags. Of course, but a completely unrelated but equivalent C implementation of this problem, as you can check in line 313 at this link http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/snes/examples/tutorials/ex5.c.html behaves almost the same that my F90 implemented residual. Perhaps Fortran compiler (gfortran) will generate the same code as the C one, but I'm not sure, Fortran compilers can be smarter that C compilers for this kind of looping. > The exp functions in Fortran and C might also return slightly > different results. I believe this is not the source of the problem, I've tried commenting that term, and differences are still there. > I don't think the differences are significant, but if you > really want to compare results you will need a higher precision solution to > compare against. I agree, the differences are not significant, but they end up having a noticeable impact. I'm still surprised!. Let's stop all this now. I'll be back as soon as I can produce some self-contained code to show and reproducing the problem. -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From dalcinl at gmail.com Sat Mar 1 19:45:58 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 1 Mar 2008 21:45:58 -0300 Subject: [Numpy-discussion] how to pronounce numpy? 
Message-ID: Sorry for the stupid question, but my English knowledge just covers reading and writting (the last, not so good) At the very begining, http://scipy.org/ says SciPy (pronounced "Sigh Pie") ... Then, for the other guy, this assertion NumPy (pronounced "Num Pie", "Num" as in "Number") ... whould be valid? -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From robert.kern at gmail.com Sat Mar 1 20:19:04 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 1 Mar 2008 19:19:04 -0600 Subject: [Numpy-discussion] how to pronounce numpy? In-Reply-To: References: Message-ID: <3d375d730803011719u4a9a6c5dna76beec5e818526d@mail.gmail.com> On Sat, Mar 1, 2008 at 6:45 PM, Lisandro Dalcin wrote: > Sorry for the stupid question, but my English knowledge just covers > reading and writting (the last, not so good) > > At the very begining, http://scipy.org/ says > > SciPy (pronounced "Sigh Pie") ... > > Then, for the other guy, this assertion > > NumPy (pronounced "Num Pie", "Num" as in "Number") ... > > whould be valid? Yes, that is how I pronounce them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From grrrr.org at gmail.com Sat Mar 1 20:56:04 2008 From: grrrr.org at gmail.com (Thomas Grill) Date: Sun, 2 Mar 2008 02:56:04 +0100 Subject: [Numpy-discussion] UFUNC_CHECK_STATUS cpu hog Message-ID: <71EC7D99-A305-4D55-A5C9-B0C92288015A@grrrr.org> Hi all, i did some profiling on OS X/Intel 10.5 (numpy 1.0.4) and was surprised to find calls to the system function feclearexcept to be by far the biggest cpu hog, taking away about 30% of the cpu in my case. Would it be possible to change UFUNC_CHECK_STATUS in ufuncobject.h in a way that feclearexcept is only called when necessary (fpstatus != 0), like in ufuncobject.h, line 292.... #define UFUNC_CHECK_STATUS(ret) { \ int fpstatus = (int) fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | \ FE_UNDERFLOW | FE_INVALID); \ if(__builtin_expect(fpstatus,0)) \ ret = 0; \ else { \ ret = ((FE_DIVBYZERO & fpstatus) ? UFUNC_FPE_DIVIDEBYZERO : 0) \ | ((FE_OVERFLOW & fpstatus) ? UFUNC_FPE_OVERFLOW : 0) \ | ((FE_UNDERFLOW & fpstatus) ? UFUNC_FPE_UNDERFLOW : 0) \ | ((FE_INVALID & fpstatus) ? UFUNC_FPE_INVALID : 0); \ (void) feclearexcept(FE_DIVBYZERO | FE_OVERFLOW | \ FE_UNDERFLOW | FE_INVALID); \ } \ } greetings, Thomas -- Thomas Grill http://grrrr.org -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2407 bytes Desc: not available URL: From oliphant at enthought.com Sat Mar 1 22:24:04 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sat, 01 Mar 2008 21:24:04 -0600 Subject: [Numpy-discussion] UFUNC_CHECK_STATUS cpu hog In-Reply-To: <71EC7D99-A305-4D55-A5C9-B0C92288015A@grrrr.org> References: <71EC7D99-A305-4D55-A5C9-B0C92288015A@grrrr.org> Message-ID: <47CA1DD4.40805@enthought.com> Thomas Grill wrote: > Hi all, > i did some profiling on OS X/Intel 10.5 (numpy 1.0.4) and was > surprised to find calls to the system function feclearexcept to be by > far the biggest cpu hog, taking away about 30% of the cpu in my case. 
> Would it be possible to change UFUNC_CHECK_STATUS in ufuncobject.h in > a way that feclearexcept is only called when necessary (fpstatus != > 0), like in > > ufuncobject.h, line 292.... > > #define UFUNC_CHECK_STATUS(ret) { \ > int fpstatus = (int) fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | \ > FE_UNDERFLOW | FE_INVALID); \ > if(__builtin_expect(fpstatus,0)) \ Why the use of __builtin_expect here instead of fpstatus == 0? > ret = 0; \ > else { \ > ret = ((FE_DIVBYZERO & fpstatus) ? UFUNC_FPE_DIVIDEBYZERO : 0) \ > | ((FE_OVERFLOW & fpstatus) ? UFUNC_FPE_OVERFLOW : 0) \ > | ((FE_UNDERFLOW & fpstatus) ? UFUNC_FPE_UNDERFLOW : 0) \ > | ((FE_INVALID & fpstatus) ? UFUNC_FPE_INVALID : 0); \ > (void) feclearexcept(FE_DIVBYZERO | FE_OVERFLOW | \ > FE_UNDERFLOW | FE_INVALID); \ > } \ > } I don't see a problem with this... -Travis O. From grrrr.org at gmail.com Sat Mar 1 22:32:25 2008 From: grrrr.org at gmail.com (Thomas Grill) Date: Sun, 2 Mar 2008 04:32:25 +0100 Subject: [Numpy-discussion] UFUNC_CHECK_STATUS cpu hog In-Reply-To: <47CA1DD4.40805@enthought.com> References: <71EC7D99-A305-4D55-A5C9-B0C92288015A@grrrr.org> <47CA1DD4.40805@enthought.com> Message-ID: <60D09059-CCF9-4F12-8FEB-19C7BCE74FDF@grrrr.org> Am 02.03.2008 um 04:24 schrieb Travis E. Oliphant: > Thomas Grill wrote: >> Hi all, >> i did some profiling on OS X/Intel 10.5 (numpy 1.0.4) and was >> surprised to find calls to the system function feclearexcept to be by >> far the biggest cpu hog, taking away about 30% of the cpu in my case. >> Would it be possible to change UFUNC_CHECK_STATUS in ufuncobject.h in >> a way that feclearexcept is only called when necessary (fpstatus != >> 0), like in >> >> ufuncobject.h, line 292.... >> >> #define UFUNC_CHECK_STATUS(ret) >> { \ >> int fpstatus = (int) fetestexcept(FE_DIVBYZERO | FE_OVERFLOW >> | \ >> FE_UNDERFLOW | FE_INVALID); \ >> if(__builtin_expect(fpstatus,0)) \ > > Why the use of __builtin_expect here instead of fpstatus == 0? It's a branch hint for gcc, as fpstatus is very likely to be 0. If portability to older gcc versions is important, fpstatus == 0 is a better choice. greetings, Thomas -- Thomas Grill http://grrrr.org -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2407 bytes Desc: not available URL: From grrrr.org at gmail.com Sat Mar 1 22:54:08 2008 From: grrrr.org at gmail.com (Thomas Grill) Date: Sun, 2 Mar 2008 04:54:08 +0100 Subject: [Numpy-discussion] UFUNC_CHECK_STATUS cpu hog In-Reply-To: <47CA1DD4.40805@enthought.com> References: <71EC7D99-A305-4D55-A5C9-B0C92288015A@grrrr.org> <47CA1DD4.40805@enthought.com> Message-ID: <06FA49C3-53BD-46F4-9BC7-D3A853E3D375@grrrr.org> Am 02.03.2008 um 04:24 schrieb Travis E. Oliphant: >> if(__builtin_expect(fpstatus,0)) \ > > Why the use of __builtin_expect here instead of fpstatus == 0? Oops, nevertheless it should rather be something like if(__builtin_expect(fpstatus == 0,1)) or if(__builtin_expect(fpstatus,0) == 0) sorry for the noise, Thomas -- Thomas Grill http://grrrr.org -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2407 bytes Desc: not available URL: From oliphant at enthought.com Sat Mar 1 23:43:40 2008 From: oliphant at enthought.com (Travis E. 
Oliphant) Date: Sat, 01 Mar 2008 22:43:40 -0600 Subject: [Numpy-discussion] Rename record array fields (with object arrays) In-Reply-To: <47C6E5E9.4030201@enthought.com> References: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> <47C6E5E9.4030201@enthought.com> Message-ID: <47CA307C.9050406@enthought.com> Travis E. Oliphant wrote: > Sameer DCosta wrote: > >> Hi, >> >> I'm having trouble renaming record array fields if they contain object >> arrays in them. I followed the solutions posted by Robert Kern and >> Stefan van der Walt (Thanks again) but it doesn't look like this >> method works in all cases. For reference: >> http://projects.scipy.org/pipermail/numpy-discussion/2008-February/031509.html >> >> In [1]: from numpy import * >> >> In [2]: olddt = dtype([('foo', '|O4'), ('bar', float)]) >> >> In [3]: a = zeros(10, olddt) >> Can you try: olddt.names = ['notfoo', 'notbar'] on a recent SVN tree. This should now work.... -Travis From oliphant at enthought.com Sat Mar 1 23:45:29 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sat, 01 Mar 2008 22:45:29 -0600 Subject: [Numpy-discussion] A little help please? In-Reply-To: <47C578D1.5060307@enthought.com> References: <47C42CB2.7080007@enthought.com> <47C578D1.5060307@enthought.com> Message-ID: <47CA30E9.6020107@enthought.com> Travis E. Oliphant wrote: > Neal Becker wrote: > >> Travis E. Oliphant wrote: >> >> >> >> >> The code for this is a bit hard to understand. It does appear that it only >> searches for a conversion on the 2nd argument. I don't think that's >> desirable behavior. >> >> What I'm wondering is, this works fine for builtin types. What is different >> in the handling of builtin types? >> >> > > 3) For user-defined types the 1d loops (functions) for a particular > user-defined type are stored in a linked-list that itself is stored in a > Python dictionary (as a C-object) attached to the ufunc and keyed by the > user-defined type (of the first argument). > > Thus, what is missing is code to search all the linked lists in all the > entries of all the user-defined types on input (only the linked-list > keyed by the first user-defined type is searched at the moment). This > would allow similar behavior to the built-in types (but a bit more > expensive searching). > This code is now in place in current SVN. Could you re-try your example with the current code-base to see if it is fixed. Thanks, -Travis From eads at soe.ucsc.edu Sat Mar 1 23:54:25 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Sat, 01 Mar 2008 21:54:25 -0700 Subject: [Numpy-discussion] how to pronounce numpy? In-Reply-To: <3d375d730803011719u4a9a6c5dna76beec5e818526d@mail.gmail.com> References: <3d375d730803011719u4a9a6c5dna76beec5e818526d@mail.gmail.com> Message-ID: <47CA3301.30601@soe.ucsc.edu> Robert Kern wrote: > On Sat, Mar 1, 2008 at 6:45 PM, Lisandro Dalcin wrote: >> Sorry for the stupid question, but my English knowledge just covers >> reading and writting (the last, not so good) >> >> At the very begining, http://scipy.org/ says >> >> SciPy (pronounced "Sigh Pie") ... >> >> Then, for the other guy, this assertion >> >> NumPy (pronounced "Num Pie", "Num" as in "Number") ... >> >> whould be valid? > > Yes, that is how I pronounce them. I'll admit I've been pronouncing them num-pee because I think it's more endearing even though I've been told by many others that num-pie is the pronunciation most people use. 
Damia From eads at soe.ucsc.edu Sun Mar 2 00:29:34 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Sat, 01 Mar 2008 22:29:34 -0700 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: References: Message-ID: <47CA3B3E.60203@soe.ucsc.edu> Lisandro Dalcin wrote: > On 3/1/08, Charles R Harris wrote: >> So they differ in the least significant bit. Not surprising, I expect the >> Fortran compiler might well perform operations in different order, >> accumulate in different places, etc. It might also accumulate in higher >> precision registers or round differently depending on hardware and various >> flags. > > Of course, but a completely unrelated but equivalent C implementation > of this problem, as you can check in line 313 at this link > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/snes/examples/tutorials/ex5.c.html > > behaves almost the same that my F90 implemented residual. Perhaps > Fortran compiler (gfortran) will generate the same code as the C one, > but I'm not sure, Fortran compilers can be smarter that C compilers > for this kind of looping. > >> The exp functions in Fortran and C might also return slightly >> different results. > > I believe this is not the source of the problem, I've tried commenting > that term, and differences are still there. > >> I don't think the differences are significant, but if you >> really want to compare results you will need a higher precision solution to >> compare against. > > I agree, the differences are not significant, but they end up having a > noticeable impact. I'm still surprised!. > > Let's stop all this now. I'll be back as soon as I can produce some > self-contained code to show and reproducing the problem. At work we noticed a significant difference in results occurring in two versions of our code, an earlier version written in C++, and a later version written in Python/numpy. The algorithms were structured about the same. My colleague found the cause of the discrepancy; it turned out to be a difference in the way numpy and the C++ program were compiled. One used -mfpmath=sse, and the other, -mfpmath=387. Keeping them both the same cleared the discrepancy. Damian From devnew at gmail.com Sun Mar 2 00:59:56 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Sat, 1 Mar 2008 21:59:56 -0800 (PST) Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> Message-ID: <7818ef0c-0e6e-400f-9c53-0ae53ab53e8d@s19g2000prg.googlegroups.com> > I dont know if this made anything any clearer. However, a simple > example may be clearer: thanks Arnar for the kind response,now things are a lot clearer...will try out in code .. D From sransom at nrao.edu Sun Mar 2 10:27:46 2008 From: sransom at nrao.edu (Scott Ransom) Date: Sun, 2 Mar 2008 10:27:46 -0500 Subject: [Numpy-discussion] fromfile (binary) double free or corruption Message-ID: <20080302152746.GA2693@ssh.cv.nrao.edu> Hi All, So I've just come upon a new(ish?) bug in fromfile. I'm running numpy from subversion rev 4839. 
Seems that if you try to read a number of items from a binary file but none are read (i.e. you are already at the EOF), you get the following: 4096 items requested but only 0 read *** glibc detected *** python: double free or corruption (!prev): 0x00000000009f5340 *** and the code needs to be killed. I ran my code under gdb and got the following traceback (just keeping the important lines): #14 0x00002ae639f8b34b in backtrace () from /lib/libc.so.6 #15 0x00002ae639f1ff9f in ?? () from /lib/libc.so.6 #16 0x00002ae639f2505d in ?? () from /lib/libc.so.6 #17 0x00002ae639f26d66 in free () from /lib/libc.so.6 #18 0x00002ae63a67feeb in array_dealloc (self=0x9d9880) at numpy/core/src/arrayobject.c:1954 #19 0x00002ae63a67a6d0 in PyArray_FromFile (fp=0x78d930, dtype=0x2ae63a8ba020, num=4096, sep=) at numpy/core/src/multiarraymodule.c:6316 #20 0x00002ae63a67a804 in array_fromfile (ignored=, args=, keywds=) at numpy/core/src/multiarraymodule.c:6361 #21 0x0000000000415520 in PyObject_Call () #22 0x0000000000473849 in PyEval_EvalFrame () #23 0x0000000000477905 in PyEval_EvalCodeEx () #24 0x0000000000477a32 in PyEval_EvalCode () Seems like the bad call is the Py_DECREF(ret); on line 6316 of multiarraymodule.c, which occurs just after a PyDataMem_RENEW() (i.e. realloc) call. I tried to find recent changes in svn that might have caused this, but couldn't see anything that seemed relevant. One thing that has changed recently on my system is that I'm now using the new glibc (v2.7) on Debian unstable. Let me know if you need more information. Thanks, Scott -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From oliphant at enthought.com Sun Mar 2 11:36:05 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sun, 02 Mar 2008 10:36:05 -0600 Subject: [Numpy-discussion] fromfile (binary) double free or corruption In-Reply-To: <20080302152746.GA2693@ssh.cv.nrao.edu> References: <20080302152746.GA2693@ssh.cv.nrao.edu> Message-ID: <47CAD775.3030506@enthought.com> Scott Ransom wrote: > > > Seems like the bad call is the Py_DECREF(ret); on line 6316 of > multiarraymodule.c, which occurs just after a PyDataMem_RENEW() > (i.e. realloc) call. > > I tried to find recent changes in svn that might have caused this, > but couldn't see anything that seemed relevant. One thing that > has changed recently on my system is that I'm now using the new > glibc (v2.7) on Debian unstable. > This looks like the behavior of realloc has changed when called with 0 as the size. We should avoid calling realloc with a size of 0 as it looks like the behavior is different depending on libc. Please check out the latest SVN and see if my fix improves things. -Travis O. From sransom at nrao.edu Sun Mar 2 14:52:06 2008 From: sransom at nrao.edu (Scott Ransom) Date: Sun, 2 Mar 2008 14:52:06 -0500 Subject: [Numpy-discussion] fromfile (binary) double free or corruption In-Reply-To: <47CAD775.3030506@enthought.com> References: <20080302152746.GA2693@ssh.cv.nrao.edu> <47CAD775.3030506@enthought.com> Message-ID: <20080302195206.GA3521@ssh.cv.nrao.edu> Hi Travis, That fixes the problem that I reported such that there is no glibc issue anymore. However, it does result in a change in behaviour for fromfile. Previously, when no data was returned an exception was raised. With the new fix there is no exception, and an empty array is returned. 
Code (like mine) that depended on an exception being thrown at EOF will break. I've fixed my code, but this could bite others. Thanks for the prompt fix. Scott On Sun, Mar 02, 2008 at 10:36:05AM -0600, Travis E. Oliphant wrote: > Scott Ransom wrote: > > > > > > Seems like the bad call is the Py_DECREF(ret); on line 6316 of > > multiarraymodule.c, which occurs just after a PyDataMem_RENEW() > > (i.e. realloc) call. > > > > I tried to find recent changes in svn that might have caused this, > > but couldn't see anything that seemed relevant. One thing that > > has changed recently on my system is that I'm now using the new > > glibc (v2.7) on Debian unstable. > > > This looks like the behavior of realloc has changed when called with 0 > as the size. We should avoid calling realloc with a size of 0 as it > looks like the behavior is different depending on libc. > > Please check out the latest SVN and see if my fix improves things. > > -Travis O. > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -- -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From oliphant at enthought.com Sun Mar 2 16:23:05 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sun, 02 Mar 2008 15:23:05 -0600 Subject: [Numpy-discussion] fromfile (binary) double free or corruption In-Reply-To: <20080302195206.GA3521@ssh.cv.nrao.edu> References: <20080302152746.GA2693@ssh.cv.nrao.edu> <47CAD775.3030506@enthought.com> <20080302195206.GA3521@ssh.cv.nrao.edu> Message-ID: <47CB1AB9.5090501@enthought.com> Scott Ransom wrote: > Hi Travis, > > That fixes the problem that I reported such that there is no glibc > issue anymore. > > However, it does result in a change in behaviour for fromfile. > > Previously, when no data was returned an exception was raised. > With the new fix there is no exception, and an empty array is > returned. Code (like mine) that depended on an exception being > thrown at EOF will break. I've fixed my code, but this could bite > others. > This should be fixed. I'll restore the exception. Thanks for checking on it and clarifying. -teo From devnew at gmail.com Mon Mar 3 03:03:57 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Mon, 3 Mar 2008 00:03:57 -0800 (PST) Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> Message-ID: >Arnar wrote > I dont know if this made anything any clearer. 
However, a simple > example may be clearer: > # X is (a ndarray, *not* matrix) column centered with vectorized images in rows > # method 1: > XX = dot(X, X.T) > s, u = linalg.eigh(XX) > reorder = s.argsort()[::-1] > facespace = dot(X.T, u[:,reorder]) ok..this and # method 2: (ie svd()) returns same facespace ..and i can get eigenface images i read in some document on the topic of eigenfaces that 'Multiplying the sorted eigenvector with face vector results in getting the face-space vector' facespace=sortedeigenvectorsmatrix * adjustedfacematrix (when these are numpy.matrices ) that is why the confusion about transposing X inside facespace=dot(X.T,u[:,reorder]) if i make matrices out of sortedeigenvectors, adjustedfacematrix then i will get facespace =sortedeigenvectorsmatrix * adjustedfacematrix which has a different set of elements than that obtained by dot(X.T, u[:,reorder]). the result differs in some scaling factor? i couldn't get any clear eigenface images out of this facespace:-( D From ndbecker2 at gmail.com Mon Mar 3 06:37:10 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 03 Mar 2008 06:37:10 -0500 Subject: [Numpy-discussion] A little help please? References: <47C42CB2.7080007@enthought.com> <47C578D1.5060307@enthought.com> <47CA30E9.6020107@enthought.com> Message-ID: Travis E. Oliphant wrote: > Travis E. Oliphant wrote: >> Neal Becker wrote: >> >>> Travis E. Oliphant wrote: >>> >>> >>> >>> >>> The code for this is a bit hard to understand. It does appear that it >>> only >>> searches for a conversion on the 2nd argument. I don't think that's >>> desirable behavior. >>> >>> What I'm wondering is, this works fine for builtin types. What is >>> different in the handling of builtin types? >>> >>> >> >> 3) For user-defined types the 1d loops (functions) for a particular >> user-defined type are stored in a linked-list that itself is stored in a >> Python dictionary (as a C-object) attached to the ufunc and keyed by the >> user-defined type (of the first argument). >> >> Thus, what is missing is code to search all the linked lists in all the >> entries of all the user-defined types on input (only the linked-list >> keyed by the first user-defined type is searched at the moment). This >> would allow similar behavior to the built-in types (but a bit more >> expensive searching). >> > This code is now in place in current SVN. Could you re-try your example > with the current code-base to see if it is fixed. > > Thanks, > > -Travis It seems to have broken 1 test: FAIL: Test of inplace operations and rich comparisons ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python2.5/site-packages/numpy/ma/tests/test_old_ma.py", line 480, in check_testInplace assert id1 == id(x.data) AssertionError ---------------------------------------------------------------------- Ran 801 tests in 1.229s FAILED (failures=1) But looks like my test is working. 
BTW, don't forget the patch I sent diff --git a/numpy/core/src/ufuncobject.c b/numpy/core/src/ufuncobject.c --- a/numpy/core/src/ufuncobject.c +++ b/numpy/core/src/ufuncobject.c @@ -3434,10 +3434,10 @@ static int cmp_arg_types(int *arg1, int *arg2, int n) { - while (n--) { - if (PyArray_EquivTypenums(*arg1, *arg2)) continue; - if (PyArray_CanCastSafely(*arg1, *arg2)) - return -1; + for (;n > 0; n--, ++arg1, ++arg2) { + if (PyArray_EquivTypenums(*arg1, *arg2) || + PyArray_CanCastSafely(*arg1, *arg2)) + continue; return 1; } return 0; From millman at berkeley.edu Mon Mar 3 12:21:23 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Mon, 3 Mar 2008 09:21:23 -0800 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday Message-ID: Hello, I would like to tag the 1.0.5 release on Wednesday night and announce the release by Monday (3/10). If you have anything that you would like to get in before then, please do it now. It would also be great if everyone could test the trunk. If anyone finds a bug or regression that should delay the release, please send an email to the list ASAP. Please take a look at the release notes and let me know if you see anything that needs to be changed or updated: http://projects.scipy.org/scipy/numpy/milestone/1.0.5 Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From arnar.flatberg at gmail.com Mon Mar 3 12:42:32 2008 From: arnar.flatberg at gmail.com (Arnar Flatberg) Date: Mon, 3 Mar 2008 18:42:32 +0100 Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> Message-ID: <5d3194020803030942i1a6eeaa5rddf515b8176e4c3b@mail.gmail.com> > i read in some document on the topic of eigenfaces that > 'Multiplying the sorted eigenvector with face vector results in > getting the > face-space vector' > facespace=sortedeigenvectorsmatrix * adjustedfacematrix > (when these are numpy.matrices ) This will not work with numpy matrices.* is elementwise mult. > that is why the confusion about transposing X inside > > facespace=dot(X.T,u[:,reorder]) > > if i make matrices out of sortedeigenvectors, adjustedfacematrix > then > i will get facespace =sortedeigenvectorsmatrix * adjustedfacematrix > which has a different set of elements than that obtained by > dot(X.T, u[:,reorder]). No, they are the same. u[:, reorder] *is* the sortedeigenvectormatrix, and the transpose of a matrixproduct: (A*B).T == B.T*A, so your facespace is just the transpose of mine. I dont know why you are getting the end result wrong. Perhaps you are reshaping wrong? 
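A quick numerical check of that transpose relationship, as a hedged sketch with random data (the 10x12 shape is an arbitrary stand-in for numimages x numpixels):

import numpy

rng = numpy.random.RandomState(0)
X = rng.rand(10, 12)                           # 10 "images" in rows, 12 "pixels" each
X = X - X.mean(axis=0)                         # column centered, as above
XX = numpy.dot(X, X.T)
s, u = numpy.linalg.eigh(XX)
reorder = s.argsort()[::-1]
face_cols = numpy.dot(X.T, u[:, reorder])      # facespace with eigenfaces as columns
face_rows = numpy.dot(u[:, reorder].T, X)      # "sorted eigenvectors times face matrix"
print(numpy.allclose(face_cols, face_rows.T))  # True: one is just the transpose of the other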
I'll try a complete example :-) Get example data: http://www.cs.toronto.edu/~roweis/data/frey_rawface.mat ----- import scipy as sp from matplotlib.pyplot import * fn = "frey_rawface.mat" data = sp.asarray(sp.io.loadmat(fn)['ff'], dtype='d').T data = data - data.mean(0) u, s, vt = sp.linalg.svd(data, 0) # plot the first 6 eigenimages for i in range(6): subplot(2,3,i+1), imshow(vt[i].reshape((28,20)), cmap=cm.gray) axis('image'), xticks([]), yticks([]) title("First 6 eigenfaces") ------ Arnar From Chris.Barker at noaa.gov Mon Mar 3 12:52:42 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 03 Mar 2008 09:52:42 -0800 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: <47CA3B3E.60203@soe.ucsc.edu> References: <47CA3B3E.60203@soe.ucsc.edu> Message-ID: <47CC3AEA.7080209@noaa.gov> Damian Eads wrote: > At work we noticed a significant difference in results occurring in two > versions of our code, an earlier version written in C++, and a later > version written in Python/numpy. The algorithms were structured about > the same. My colleague found the cause of the discrepancy; it turned out > to be a difference in the way numpy and the C++ program were compiled. > One used -mfpmath=sse, and the other, -mfpmath=387. Keeping them both > the same cleared the discrepancy. Was it really a "significant" difference, or just noticeable? I hope not, that would be pretty scary! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From aisaac at american.edu Mon Mar 3 12:56:29 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 3 Mar 2008 12:56:29 -0500 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: Message-ID: I never got a response to this: (Two different types claim to be numpy.int32.) Cheers, Alan From dmitrey.kroshko at scipy.org Mon Mar 3 13:09:45 2008 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Mon, 03 Mar 2008 20:09:45 +0200 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: Message-ID: <47CC3EE9.2070800@scipy.org> Also, it would be very well if asfarray() doesn't drop down float128 to float64. D. Alan G Isaac wrote: > I never got a response to this: > > (Two different types claim to be numpy.int32.) > > Cheers, > Alan > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > From arnar.flatberg at gmail.com Mon Mar 3 13:12:56 2008 From: arnar.flatberg at gmail.com (Arnar Flatberg) Date: Mon, 3 Mar 2008 19:12:56 +0100 Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <5d3194020803030942i1a6eeaa5rddf515b8176e4c3b@mail.gmail.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> <5d3194020803030942i1a6eeaa5rddf515b8176e4c3b@mail.gmail.com> Message-ID: <5d3194020803031012p2d1679aax1b2c24ab54a0d182@mail.gmail.com> > This will not work with numpy matrices.* is elementwise mult. 
Sorry, disregard that comment From yves.revaz at obspm.fr Mon Mar 3 14:20:54 2008 From: yves.revaz at obspm.fr (Revaz Yves) Date: Mon, 03 Mar 2008 20:20:54 +0100 Subject: [Numpy-discussion] cross Message-ID: <47CC4F96.5090905@obspm.fr> Dear List, I'm computing the cross product of positions and velocities of n points in a 3d space. Using the numpy function "cross", this can be written as : n=1000 pos = random.random([n,3]) vel = random.random([n,3]) cross(pos,vel) I compare the computation time needed with a C-api I wrote (dedicated to this operation). It appears that my api is in average 20 times faster than the cross function (for n between 100 and 1000000), making the latter useless for my purpose :-( . Is it normal ? or I'm I using the "cross" function the wrong way ? yves PS :Here after you can see some lines the of the C-api. if (!PyArg_ParseTuple(args, "OO", &pos , &vel)) return NULL; /* create a NumPy object similar to the input */ int ld[2]; ld[0]=pos->dimensions[0]; ld[1]=pos->dimensions[1]; lxyz = (PyArrayObject *) PyArray_FromDims(pos->nd,ld,pos->descr->type_num); /* loops over all elements */ for (i = 0; i < pos->dimensions[0]; i++) { x = (float *) (pos->data + i*(pos->strides[0]) ); y = (float *) (pos->data + i*(pos->strides[0]) + 1*pos->strides[1]); z = (float *) (pos->data + i*(pos->strides[0]) + 2*pos->strides[1]); vx = (float *) (vel->data + i*(vel->strides[0]) ); vy = (float *) (vel->data + i*(vel->strides[0]) + 1*vel->strides[1]); vz = (float *) (vel->data + i*(vel->strides[0]) + 2*vel->strides[1]); lx = (*y * *vz - *z * *vy); ly = (*z * *vx - *x * *vz); lz = (*x * *vy - *y * *vx); *(float *)(lxyz->data + i*(lxyz->strides[0]) + 0*lxyz->strides[1]) = lx; *(float *)(lxyz->data + i*(lxyz->strides[0]) + 1*lxyz->strides[1]) = ly; *(float *)(lxyz->data + i*(lxyz->strides[0]) + 2*lxyz->strides[1]) = lz; } From oliphant at enthought.com Mon Mar 3 14:41:33 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 03 Mar 2008 13:41:33 -0600 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: Message-ID: <47CC546D.7050908@enthought.com> Alan G Isaac wrote: > I never got a response to this: > > (Two different types claim to be numpy.int32.) > It's not a bug :-) There are two c-level types that are both 32-bit (on 32-bit systems). -Travis From subscriber100 at rjs.org Mon Mar 3 14:57:12 2008 From: subscriber100 at rjs.org (Ray Schumacher) Date: Mon, 03 Mar 2008 11:57:12 -0800 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: References: Message-ID: <6.2.3.4.2.20080303112304.04d97c10@rjs.org> I'm trying to figure out what numpy.correlate does, and, what are people using to calculate the phase shift of 1D signals? (I coded on routine that uses rfft, conjugate, ratio, irfft, and argmax based on a paper by Hongjie Xie "An IDL/ENVI implementation of the FFT Based Algorithm for Automatic Image Registration" - but that seems more intensive than it could be.) 
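One plausible 1-D shape of the routine described there, as a hedged sketch (function and variable names are illustrative, and the small epsilon only guards the whitening division; the magnitude being divided out equals abs(F1)*abs(F2), matching the "ratio" step):

import numpy
from numpy.fft import rfft, irfft

def phase_offset(sig, ref):
    # whiten the cross-spectrum, invert it, and take argmax as the lag
    F1 = rfft(sig)
    F2 = rfft(ref)
    cross = F1 * numpy.conjugate(F2)
    cross /= numpy.abs(cross) + 1e-12    # |F1 * conj(F2)| == abs(F1) * abs(F2)
    corr = irfft(cross, len(sig))
    return int(numpy.argmax(corr))       # circular lag, in samples

rng = numpy.random.RandomState(1)
x = rng.rand(256)
y = numpy.concatenate((x[-5:], x[:-5])) # x rotated by 5 samples
print(phase_offset(y, x))               # prints 5 for this toy input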
In numpy, an identity import numpy from pylab import * l=[1,5,3,8,15,6,7,7,9,10,4] c=numpy.correlate(l,l, mode='same') plot(c) peaks at the center, x=5, and is symmetric when the data is rotated by 2 c=numpy.correlate(l, l[-2:]+l[:-2], mode='same') it peaks at x=3 I was expecting, I guess, that the peak should reflect the x axis shift, as in http://en.wikipedia.org/wiki/Cross-correlation#Explanation If I use a real time domain signal like http://rjs.org/Python/sample.sig fh = open(r'sample.sig','rb') s1 = numpy.fromstring(fh.read(), numpy.int32) fh.close() an identity like c=numpy.correlate(s1, s1, mode='same') plots like noise. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.516 / Virus Database: 269.21.3/1308 - Release Date: 3/3/2008 10:01 AM From charlesr.harris at gmail.com Mon Mar 3 15:46:54 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 3 Mar 2008 13:46:54 -0700 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: Message-ID: On Mon, Mar 3, 2008 at 10:21 AM, Jarrod Millman wrote: > Hello, > > I would like to tag the 1.0.5 release on Wednesday night and announce > the release by Monday (3/10). If you have anything that you would > like to get in before then, please do it now. It would also be great > if everyone could test the trunk. If anyone finds a bug or regression > that should delay the release, please send an email to the list ASAP. > > Please take a look at the release notes and let me know if you see > anything that needs to be changed or updated: > http://projects.scipy.org/scipy/numpy/milestone/1.0.5 > > Thanks, > I think ticket 597 should be pretty easy to fix. I just want to make sure everyone agrees it should be fixed. http://projects.scipy.org/scipy/numpy/ticket/597 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Mon Mar 3 16:13:12 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 03 Mar 2008 15:13:12 -0600 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: Message-ID: <47CC69E8.70501@enthought.com> Charles R Harris wrote: > > > On Mon, Mar 3, 2008 at 10:21 AM, Jarrod Millman > wrote: > > Hello, > > I would like to tag the 1.0.5 release on Wednesday night and announce > the release by Monday (3/10). If you have anything that you would > like to get in before then, please do it now. It would also be great > if everyone could test the trunk. If anyone finds a bug or regression > that should delay the release, please send an email to the list ASAP. > > Please take a look at the release notes and let me know if you see > anything that needs to be changed or updated: > http://projects.scipy.org/scipy/numpy/milestone/1.0.5 > > Thanks, > > > I think ticket 597 should be pretty easy to fix. I just want to make > sure everyone agrees it should be fixed. I can't imagine someone "depending" on this behavior. And it should be consistent between 32-bit and 64-bit systems. 
-Travis From tim.hochberg at ieee.org Mon Mar 3 16:24:49 2008 From: tim.hochberg at ieee.org (Timothy Hochberg) Date: Mon, 3 Mar 2008 14:24:49 -0700 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: <6.2.3.4.2.20080303112304.04d97c10@rjs.org> References: <6.2.3.4.2.20080303112304.04d97c10@rjs.org> Message-ID: On Mon, Mar 3, 2008 at 12:57 PM, Ray Schumacher wrote: > I'm trying to figure out what numpy.correlate does, and, what are people > using to calculate the phase shift of 1D signals? > > (I coded on routine that uses rfft, conjugate, ratio, irfft, and argmax > based on a paper by Hongjie Xie "An IDL/ENVI implementation of the FFT Based > Algorithm for Automatic Image Registration" - but that seems more intensive > than it could be.) > > In numpy, an identity import numpy from pylab import * l=[1,5,3,8,15,6,7,7,9,10,4] > c=numpy.correlate(l,l, mode='same') plot(c) peaks at the center, x=5, and > is symmetric > > when the data is rotated by 2 c=numpy.correlate(l, l[-2:]+l[:-2], > mode='same') it peaks at x=3 > > I was expecting, I guess, that the peak should reflect the x axis shift, > as in > http://en.wikipedia.org/wiki/Cross-correlation#Explanation > Interesting. This appears to be a result of the implementation of the various modes. If you use the 'valid' mode, you'll get 0, as I presume you'll expect. If you use 'same' or 'full' you'll end of with different amounts of offset. I imagine that this is due to the way the data is padded. The offset should be deterministic based on the mode and the size of the data, so it should be straightforward to compensate for. > > > If I use a real time domain signal like > http://rjs.org/Python/sample.sig fh = open(r'sample.sig','rb') s1 = > numpy.fromstring(fh.read(), numpy.int32) fh.close() > > an identity like c=numpy.correlate(s1, s1, mode='same') plots like noise. > > > When I download this, it's full of NaNs. There's either a problem in the way I downloaded it or in the uploaded file. You didn't by chance upload it as an ASCII file did you? > -- . __ . |-\ . . tim.hochberg at ieee.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From subscriber100 at rjs.org Mon Mar 3 16:45:28 2008 From: subscriber100 at rjs.org (Ray Schumacher) Date: Mon, 03 Mar 2008 13:45:28 -0800 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: References: Message-ID: <6.2.3.4.2.20080303133226.04da7718@rjs.org> At 01:24 PM 3/3/2008, you wrote: > > If you use 'same' or 'full' you'll end of with different > >amounts of offset. I imagine that this is due to the way the data is padded. > >The offset should be deterministic based on the mode and the size of the > >data, so it should be straightforward to compensate for. I agree > > If I use a real time domain signal like > > http://rjs.org/Python/sample.sig fh = open(r'sample.sig','rb') s1 = > > numpy.fromstring(fh.read(), numpy.int32) fh.close() > >When I download this, it's full of NaNs. There's either a problem in the way >I downloaded it or in the uploaded file. You didn't by chance upload it as >an ASCII file did you? I just tested the URL myself with Firefox; it came down OK. It is a binary string from numpy.tostring(), 29,956 bytes of int32. It has a fundamental of 42 cycles in the data, and other fs of less power. I just uploaded a http://rjs.org/Python/sample.csv version Xie's 2D algorithm reduced to 1D works nicely for computing the relative phase, but is it the fastest way? 
It might be, since some correlation algorithms use FFTs as well. What does _correlateND use, in scipy? Thanks, Ray -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.516 / Virus Database: 269.21.3/1308 - Release Date: 3/3/2008 10:01 AM From peridot.faceted at gmail.com Mon Mar 3 17:08:00 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 3 Mar 2008 17:08:00 -0500 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: <6.2.3.4.2.20080303112304.04d97c10@rjs.org> References: <6.2.3.4.2.20080303112304.04d97c10@rjs.org> Message-ID: On 03/03/2008, Ray Schumacher wrote: > > I'm trying to figure out what numpy.correlate does, and, what are people > using to calculate the phase shift of 1D signals? I use a hand-rolled Fourier-domain cross-correlation, but then, I'm using a Fourier-domain representation of my signals. > (I coded on routine that uses rfft, conjugate, ratio, irfft, and argmax > based on a paper by Hongjie Xie "An IDL/ENVI implementation of the FFT Based > Algorithm for Automatic Image Registration" - but that seems more intensive > than it could be.) Sounds familiar. If you have a good signal-to-noise ratio, you can get subpixel accuracy by oversampling the irfft, or better but slower, by using numerical optimization to refine the peak you found with argmax. > In numpy, an identity import numpy from pylab import * > l=[1,5,3,8,15,6,7,7,9,10,4] c=numpy.correlate(l,l, mode='same') plot(c) > peaks at the center, x=5, and is symmetric You have revealed several flaws in numpy's correlate. First of all, the docstring gives no indication of how to interpret the result: neither the zero-shift position nor the direction of the result is at all clear (if I shift the first vector to the left, does the correlation peak shift left or right?). Second, the mode "same" gives results which are rather difficult to understand. Third, there is no way to get a "circular" correlation. I would be inclined to use convolve (or scipy.ndimage.convolve, which uses a Fourier-domain method), since it is somewhat better specified. Anne From tim.hochberg at ieee.org Mon Mar 3 17:31:29 2008 From: tim.hochberg at ieee.org (Timothy Hochberg) Date: Mon, 3 Mar 2008 15:31:29 -0700 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: <6.2.3.4.2.20080303133226.04da7718@rjs.org> References: <6.2.3.4.2.20080303133226.04da7718@rjs.org> Message-ID: On Mon, Mar 3, 2008 at 2:45 PM, Ray Schumacher wrote: > At 01:24 PM 3/3/2008, you wrote: > > > If you use 'same' or 'full' you'll end of with different > > >amounts of offset. I imagine that this is due to the way the data is > padded. > > >The offset should be deterministic based on the mode and the size of > the > > >data, so it should be straightforward to compensate for. > > I agree > > > > If I use a real time domain signal like > > > http://rjs.org/Python/sample.sig fh = open(r'sample.sig','rb') s1 = > > > numpy.fromstring(fh.read(), numpy.int32) fh.close() > > > >When I download this, it's full of NaNs. There's either a problem in the > way > >I downloaded it or in the uploaded file. You didn't by chance upload it > as > >an ASCII file did you? > > I just tested the URL myself with Firefox; it came down OK. It is a > binary string from numpy.tostring(), 29,956 bytes of int32. It has a > fundamental of 42 cycles in the data, and other fs of less power. 
> I just uploaded a http://rjs.org/Python/sample.csv version I'm going to guess that you are using some flavor of Unix, since I also downloaded using Firefox and the data ends up corrupted. My hypothesis is that Firefox doesn't recognize the mime type and treats it as a text file, corrupting it on Windows, but not on Unix. Then again, maybe you're not using Unix and my installation of Firefox is just broken. No biggy, the csv version works fine in any event. With the CSV version I do get a peak at the (un)expected location (7489//2). The peak is pretty flat and only twice the size of the surrounding gunk, but it looks more or less legit. > Xie's 2D algorithm reduced to 1D works nicely for computing the > relative phase, but is it the fastest way? It might be, since some > correlation algorithms use FFTs as well. What does _correlateND use, in > scipy? > I'm going to defer to Anne here. It sounds like she is more experienced in this area. I will mention that at one point I put together a delay finder that used cross correlation in combination with a quadratic fit to the peak and it worked quite well. However, that was some time ago and speed was not a priority for me in that situation so, you may well be better off using some other approach. -- . __ . |-\ . . tim.hochberg at ieee.org -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From dalcinl at gmail.com Mon Mar 3 18:05:19 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 3 Mar 2008 20:05:19 -0300 Subject: [Numpy-discussion] cross In-Reply-To: <47CC4F96.5090905@obspm.fr> References: <47CC4F96.5090905@obspm.fr> Message-ID: On 3/3/08, Revaz Yves wrote: > I'm computing the cross product of positions and velocities of n points > in a 3d space. > Using the numpy function "cross", this can be written as : > I compare the computation time needed with a C-api I wrote (dedicated to > this operation). > It appears that my api is in average 20 times faster than the cross > function (for n between 100 and 1000000), > making the latter useless for my purpose :-( . > > Is it normal ? or I'm I using the "cross" function the wrong way ? Well, the numpy 'cross' function is (cleverly) implemented in Python. However, it internally generates some temporary arrays (associated with the binary operations), which could be the cause of the slowdown. > > yves > > > > > PS :Here after you can see some lines the of the C-api. > > > > if (!PyArg_ParseTuple(args, "OO", &pos , &vel)) > return NULL; > > /* create a NumPy object similar to the input */ > int ld[2]; > ld[0]=pos->dimensions[0]; > ld[1]=pos->dimensions[1]; > lxyz = (PyArrayObject *) > PyArray_FromDims(pos->nd,ld,pos->descr->type_num); > > > /* loops over all elements */ > for (i = 0; i < pos->dimensions[0]; i++) { > > x = (float *) (pos->data + i*(pos->strides[0]) > ); > y = (float *) (pos->data + i*(pos->strides[0]) + > 1*pos->strides[1]); > z = (float *) (pos->data + i*(pos->strides[0]) + > 2*pos->strides[1]); > > vx = (float *) (vel->data + > i*(vel->strides[0]) ); > vy = (float *) (vel->data + i*(vel->strides[0]) + > 1*vel->strides[1]); > vz = (float *) (vel->data + i*(vel->strides[0]) + > 2*vel->strides[1]); > > lx = (*y * *vz - *z * *vy); > ly = (*z * *vx - *x * *vz); > lz = (*x * *vy - *y * *vx); > > *(float *)(lxyz->data + i*(lxyz->strides[0]) + > 0*lxyz->strides[1]) = lx; > *(float *)(lxyz->data + i*(lxyz->strides[0]) + > 1*lxyz->strides[1]) = ly; > *(float *)(lxyz->data + i*(lxyz->strides[0]) + > 2*lxyz->strides[1]) = lz; > } > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From dineshbvadhia at hotmail.com Mon Mar 3 18:29:11 2008 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Mon, 3 Mar 2008 15:29:11 -0800 Subject: [Numpy-discussion] Pickling and initializing Message-ID: When you pickle a numpy/scipy matrix does it have to be initialized by another program?
For example: Program One: A = scipy.asmatrix(scipy.empty((i, i)), dtype=int) # initialize matrix A pickle.dump(A) Program Two: pickle.load(A) .. in Program Two, do we need the statement: A = scipy.asmatrix(scipy.empty((i, i)), dtype=int) # initialize matrix A before the pickle.load(A)? If not, why not and doesn't this make documentation difficult? Dinesh -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Mar 3 18:36:12 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 3 Mar 2008 17:36:12 -0600 Subject: [Numpy-discussion] Pickling and initializing In-Reply-To: References: Message-ID: <3d375d730803031536r6f440089rd7979d1cc993b65c@mail.gmail.com> On Mon, Mar 3, 2008 at 5:29 PM, Dinesh B Vadhia wrote: > > > When you pickle a numpy/scipy matrix does it have to be initialized by > another program? For example: > > Program One: > A = scipy.asmatrix(scipy.empty((i, i)), dtype=int) # initialize > matrix A > > pickle.dump(A) > > Program Two: > pickle.load(A) > > > ... in Program Two, do we need the statement: > > A = scipy.asmatrix(scipy.empty((i, i)), dtype=int) # initialize > matrix A > > before the pickle.load(A)? No. Neither pickle.load() nor pickle.dump() work like that. The signature of pickle.dump() is pickle.dump(f, obj) and the signature of pickle.load() is obj = pickle.load(f) where `f` is an open file object. There is no need to "pre-declare" `obj` before loading it. > If not, why not and doesn't this make documentation difficult? Not particularly, no. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From peridot.faceted at gmail.com Mon Mar 3 19:02:06 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 3 Mar 2008 19:02:06 -0500 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: <6.2.3.4.2.20080303133226.04da7718@rjs.org> References: <6.2.3.4.2.20080303133226.04da7718@rjs.org> Message-ID: On 03/03/2008, Ray Schumacher wrote: > Xie's 2D algorithm reduced to 1D works nicely for computing the > relative phase, but is it the fastest way? It might be, since some > correlation algorithms use FFTs as well. What does _correlateND use, in scipy? Which way will be the fastest really depends what you want to do. Algorithmically, the direct way numpy.correlate operates is O(NM), and the way FFT-based algorithms operate is (roughly) O((N+M)log(N+M)) (or for a more sophisticated algorithm O(N log M) where M is less than N). In practice what this means is that when one or both of the things you're correlating is short (tens of samples or so), you should use a direct method; when one or both are long you should use an FFT-based method. (There are other approaches too, but I don't know of any in wide use.) In your case it sounds like you have two signals of equal fairly large length to compare. Some questions remain, though: * What do you want to happen at the endpoints? Without padding, only a small interval (the difference in lengths plus one) is valid. Zero-padding works, but guarantees a fall-off at the ends. Circular correlation is easy to implement but not appropriate most of the time. * Do you care about sub-sample alignment? How much accuracy do you really need? Direct methods really can't give you this information. 
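On the endpoint question, a hedged sketch of the zero-padded (linear rather than circular) FFT cross-correlation of two equal-length signals; the names, seed and test shift are illustrative only:

import numpy
from numpy.fft import rfft, irfft

def xcorr_fft(a, b):
    # zero-pad to 2*N-1 so the circular product equals the linear correlation;
    # the returned lags run from -(N-1) to N-1
    n = len(a)
    nfft = 2 * n - 1
    c = irfft(rfft(a, nfft) * numpy.conjugate(rfft(b, nfft)), nfft)
    return numpy.concatenate((c[-(n - 1):], c[:n]))

rng = numpy.random.RandomState(42)
a = rng.standard_normal(1000)
b = numpy.concatenate((a[3:], a[:3]))    # b is a circularly shifted copy of a
c = xcorr_fft(a, b)
print(numpy.argmax(c) - (len(a) - 1))    # lag of the correlation peak: 3 here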
With Fourier methods, you can easily pad the spectrum with zeros and inverse FFT, giving you a beautifully-interpolated signal. If you want more accuracy, a quadratic fit to the three points around the peak of the interpolated signal will get you very close. If you need more accuracy, you can use numerical maximization, evaluating each point as sum(a_k exp(2 pi i k x)). The other common application is to have a template (that presumably falls to zero at its endpoint) and to want to compute a running correlation against a stream of data. This too can be done both ways, depending on the size of the template; all that is needed is to think carefully about overlaps. Anne From peridot.faceted at gmail.com Mon Mar 3 19:28:59 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 3 Mar 2008 19:28:59 -0500 Subject: [Numpy-discussion] Pickling and initializing In-Reply-To: References: Message-ID: On 03/03/2008, Dinesh B Vadhia wrote: > When you pickle a numpy/scipy matrix does it have to be initialized by > another program? For example: Most python objects do not need to be initialized. You just call a function that makes the one you want: >>> l = range(10) This makes a list of length 10. You can now manipulate it, adding elements or what have you. Arrays are no different. (You use matrices in your example - the only difference is that the multiplication operator behaves differently, and you often need to use asmatrix to convert them back to arrays. I never use them, even when I have to do some linear algebra.) You simply call a function that makes the array you want: >>> a = arange(10) You can change its contents, but it's not really sensible to say that arrays must be initialized before use. The function empty() is kind of a peculiar aberration - it's for those rare cases when you end up reassigning all the values in the array, and zeros() is too slow. (For debugging it'd be nice to have NaNs()...) Perhaps you are thinking of statically-typed languages, where variables must be initialized? In python variables do not have type, so variables holding arrays are no different from variables holding strings, integers, file objects, or whatever. Using a python variable before it has been assigned a value does indeed raise an exception; all that is needed is to assign a value to it. Unpickling reads a file and constructs a "new" array from the data in that file. The array value is returned; one often assigns this value to a variable. The values in the array are filled in by the pickling function. It is not possible to make the unpickler store its data in a preallocated array. Anne From emanuele at relativita.com Tue Mar 4 05:22:57 2008 From: emanuele at relativita.com (Emanuele Olivetti) Date: Tue, 04 Mar 2008 11:22:57 +0100 Subject: [Numpy-discussion] numpy, "H", and struct: numpy bug? Message-ID: <47CD2301.8030904@relativita.com> Hi, this snippet is causing troubles: --- import struct import numpy a=numpy.arange(10).astype('H') b=struct.pack("<10H",*a) --- (The module struct simply packs and unpacks data in byte-blobs). It works OK with python2.4, but gives problems with python2.5. 
On my laptop (linux x86_64 on intel core 2 duo) I got this warning: --- a.py:5: DeprecationWarning: struct integer overflow masking is deprecated b=struct.pack("<10H",*a) --- On another workstation (linux i686 on intel core 2, so a 32 bit OS on 64 bit architecture) I got warning plus an _error_, when using python2.5 (python2.4 works flawlessly): --- a.py:5: DeprecationWarning: struct integer overflow masking is deprecated b=struct.pack("<10H",*a) Traceback (most recent call last): File "a.py", line 5, in b=struct.pack("<10H",*a) File "/usr/lib/python2.5/struct.py", line 63, in pack return o.pack(*args) SystemError: ../Objects/longobject.c:322: bad argument to internal function --- Both computers are ubuntu gutsy 7.10, updated. Details: python, 2.5.1-1ubuntu2 numpy, 1:1.0.3-1ubuntu2 Same versions on both machines. I did some little test _without_ numpy and the struct module seems not having problems. Is this a numpy bug? Note: If you remove "<" from the struct format string then it seems to work ok. Regards, Emanuele From emanuele at relativita.com Tue Mar 4 08:07:08 2008 From: emanuele at relativita.com (Emanuele Olivetti) Date: Tue, 04 Mar 2008 14:07:08 +0100 Subject: [Numpy-discussion] numpy, "H", and struct: numpy bug? In-Reply-To: <47CD2301.8030904@relativita.com> References: <47CD2301.8030904@relativita.com> Message-ID: <47CD497C.2020302@relativita.com> Just tried on a 32bit workstation (both CPU and OS): I get an error, as before, using python2.5: --- a.py:5: DeprecationWarning: struct integer overflow masking is deprecated b=struct.pack("<10H",*a) Traceback (most recent call last): File "a.py", line 5, in b=struct.pack("<10H",*a) File "/usr/lib/python2.5/struct.py", line 63, in pack return o.pack(*args) SystemError: ../Objects/longobject.c:322: bad argument to internal function ---- No error with python2.4 so I believe it is a 32bit issue. HTH, Emanuele Emanuele Olivetti wrote: > Hi, > > this snippet is causing troubles: > --- > import struct > import numpy > > a=numpy.arange(10).astype('H') > b=struct.pack("<10H",*a) > --- > (The module struct simply packs and unpacks data in byte-blobs). > > It works OK with python2.4, but gives problems with python2.5. > On my laptop (linux x86_64 on intel core 2 duo) I got this warning: > --- > a.py:5: DeprecationWarning: struct integer overflow masking is deprecated > b=struct.pack("<10H",*a) > --- > > On another workstation (linux i686 on intel core 2, so a 32 bit OS on 64 bit > architecture) I got warning plus an _error_, when using python2.5 (python2.4 > works flawlessly): > --- > a.py:5: DeprecationWarning: struct integer overflow masking is deprecated > b=struct.pack("<10H",*a) > Traceback (most recent call last): > File "a.py", line 5, in > b=struct.pack("<10H",*a) > File "/usr/lib/python2.5/struct.py", line 63, in pack > return o.pack(*args) > SystemError: ../Objects/longobject.c:322: bad argument to internal function > --- > > Both computers are ubuntu gutsy 7.10, updated. > Details: > python, 2.5.1-1ubuntu2 > numpy, 1:1.0.3-1ubuntu2 > Same versions on both machines. > > I did some little test _without_ numpy and the struct module seems not > having > problems. Is this a numpy bug? > Note: If you remove "<" from the struct format string then it seems to work > ok. 
> > Regards, > > Emanuele > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From jeff at jgarrett.org Tue Mar 4 08:29:38 2008 From: jeff at jgarrett.org (Jeff Garrett) Date: Tue, 4 Mar 2008 07:29:38 -0600 Subject: [Numpy-discussion] Question about mrecarray Message-ID: <20080304132938.GA517@jgarrett.org> Hi, I'm using an mrecarray in a situation where I need to replace the masked values with default values which are not necessarily the same as the fill value... Something like: for field, mask in zip(row, row._fieldmask): value = field if not mask else ... ... Is there a better way to tell if the individual fields are masked than accessing ._fieldmask? Thanks, Jeff Garrett From pgmdevlist at gmail.com Tue Mar 4 10:23:29 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 4 Mar 2008 10:23:29 -0500 Subject: [Numpy-discussion] Question about mrecarray In-Reply-To: <20080304132938.GA517@jgarrett.org> References: <20080304132938.GA517@jgarrett.org> Message-ID: <200803041023.30137.pgmdevlist@gmail.com> Jeff, > Is there a better way to tell if the individual fields are masked than > accessing ._fieldmask? That depends. If you need to access you mrecarray record by record (by rows), yes you have to check the corresponding ._fieldmask. If instead you can process your array field by field (by columns), you don't need to: each field (column) will be a masked array, and you can just check its mask. Let me know if you have more problems. HIH P. From aisaac at american.edu Tue Mar 4 10:24:52 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 4 Mar 2008 10:24:52 -0500 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: <47CC546D.7050908@enthought.com> References: <47CC546D.7050908@enthought.com> Message-ID: > Alan G Isaac wrote: >> I never got a response to this: >> >> (Two different types claim to be numpy.int32.) On Mon, 03 Mar 2008, "Travis E. Oliphant" apparently wrote: > It's not a bug :-) There are two c-level types that are both 32-bit (on > 32-bit systems). OK, but at the user-level it is confusing to have two different types claim the same type name. This produced a fairly obscure program error for Dmitrey. (Not that I am generally a fan of type checking, but still, it was pretty surprising ...) Thanks, Alan From dalcinl at gmail.com Tue Mar 4 10:26:50 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 4 Mar 2008 12:26:50 -0300 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: <47CC3AEA.7080209@noaa.gov> References: <47CA3B3E.60203@soe.ucsc.edu> <47CC3AEA.7080209@noaa.gov> Message-ID: Damian Eads wrote: > One used -mfpmath=sse, and the other, -mfpmath=387. > Keeping them both > the same cleared the discrepancy. Oh yes! I think you got it... On 3/3/08, Christopher Barker wrote: > > Was it really a "significant" difference, or just noticeable? I hope > not, that would be pretty scary! > I now believe that this is possible causing the trouble. And yes, in my case the cummulative differences leaded to different iteration counts in a matrix-free Newton-Krylov method. Of course, the final answer was as as accurate as the tolerances for the nonlinear solver. 
-- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From Chris.Barker at noaa.gov Tue Mar 4 12:30:55 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 04 Mar 2008 09:30:55 -0800 Subject: [Numpy-discussion] numpy and roundoff(?) In-Reply-To: References: <47CA3B3E.60203@soe.ucsc.edu> <47CC3AEA.7080209@noaa.gov> Message-ID: <47CD874F.3030805@noaa.gov> Lisandro Dalcin wrote: > And yes, in > my case the cummulative differences leaded to different iteration > counts in a matrix-free Newton-Krylov method. Of course, the final > answer was as as accurate as the tolerances for the nonlinear solver. OK, so significant differences in iteration counts, but not in the final answer -- that makes me feel better! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From subscriber100 at rjs.org Tue Mar 4 13:47:39 2008 From: subscriber100 at rjs.org (Ray Schumacher) Date: Tue, 04 Mar 2008 10:47:39 -0800 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: References: Message-ID: <6.2.3.4.2.20080304100606.04db6280@rjs.org> Thank you for the input! It sounds like Fourier methods will be fastest, by design, for sample counts of hundreds to thousands. I currently do steps like: Im1 = get_stream_array_data() Im2 = load_template_array_data(fh2) ##note: len(im1)==len(im2) Ffft_im1=fftpack.rfft(Im1) Ffft_im2=fftpack.rfft(Im2) R1= (Ffft_im1 * Ffft_im2.conjugate()) R2= (abs(Ffft_im1) * abs(Ffft_im2)) R = R1 / R2 IR=fftpack.irfft(R) flat_IR = numpy.ravel(numpy.transpose(IR)).real I= numpy.argmax(flat_IR) phase_offset = (I % len(Im1)) At 09:29 AM 3/4/2008, Anne Archibald wrote: > * What do you want to happen at the endpoints? Without padding, only a > small interval (the difference in lengths plus one) is valid. > Zero-padding works, but guarantees a fall-off at the ends. Circular > correlation is easy to implement but not appropriate most of the time. How much should I be concerned?, since the only desired information from this is the scalar best-fit phase value, presumably the argmax() of the xcorr. In current operation, imagine a tone pattern/template of n samples which we want to align to streaming data; the desired result (at least in my current FFT code) is the sample number of recent ADC data where the zero'th sample of the pattern best aligns. Since it is a repeating pattern, we know that it will always align somewhere in the latest n samples. > * Do you care about sub-sample alignment? How much accuracy do you > really need? Integer alignment is sufficient, due both to electronic noise, and desired phase > The other common application is to have a template (that presumably > falls to zero at its endpoint) and to want to compute a running > correlation against a stream of data. This too can be done both ways, > depending on the size of the template; all that is needed is to think > carefully about overlaps. This is very much what the application is, although the template does not terminate at zero. It does terminate at a value near the zero'th value however, and I assumed the FFTs would be well-behaved. 
Ray -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.516 / Virus Database: 269.21.4/1310 - Release Date: 3/4/2008 8:35 AM From subscriber100 at rjs.org Tue Mar 4 14:10:54 2008 From: subscriber100 at rjs.org (Ray Schumacher) Date: Tue, 04 Mar 2008 11:10:54 -0800 Subject: [Numpy-discussion] numpy.correlate with phase offset 1D data series In-Reply-To: References: Message-ID: <6.2.3.4.2.20080304104842.04db5ff0@rjs.org> At 03:28 PM 3/3/2008, Ann wrote: > >Sounds familiar. If you have a good signal-to-noise ratio, you can get > >subpixel accuracy by oversampling the irfft, or better but slower, by > >using numerical optimization to refine the peak you found with argmax. the S/N here is poor, and high data rates work against me too... > I would be inclined to use convolve (or scipy.ndimage.convolve, which > uses a Fourier-domain method), since it is somewhat better specified. I'll give it a try as well. I'm guessing scipy.ndimage.correlate1d is a Fourier method too? > From: "Timothy Hochberg" > > I'm going to guess that you are using some flavor of Unix, since I also > downloaded using Firefox and the data ends up corrupted. My hypothesis is > that Firefox doesn't recognize the mime type and treats it as a text file, > corrupting it on Windows, but not on Unix. Then again, maybe you're not > using Unix and my installation of Firefox is just broken. I think that is the case, I have Win2K on this box > With the CSV version I do get a peak at the (un)expected location (7489//2). > The peak is pretty flat and only twice the size of the surrounding gunk, but > it looks more or less legit. I don't see that in my pylab plot! There's actually a dip, and the whole plot is symmetric about 3744 http://rjs.org/Python/corr_array.jpg, self xcorr of http://rjs.org/Python/data.jpg I'll be upgrading my install here shortly though to py2.5 and associated libs. My compiler/distutils environment is broken. Thanks, Ray -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.516 / Virus Database: 269.21.4/1310 - Release Date: 3/4/2008 8:35 AM From pgmdevlist at gmail.com Tue Mar 4 16:31:51 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 4 Mar 2008 16:31:51 -0500 Subject: [Numpy-discussion] argmin & min on ndarrays Message-ID: <200803041631.51869.pgmdevlist@gmail.com> All, Let a & b be two ndarrays of the same shape. I'm trying to find the elements of b that correspond to the minima of a along an arbitrary axis. The problem is trivial when axis=None or when a.ndim=2, but I'm getting confused with higher dimensions: I came to the following solution that looks rather ugly, and I'd need some ideas to simplify it >>>a=numpy.arange(24).reshape(2,3,4) >>>axis=-1 >>>b = numpy.rollaxis(a,axis,0)[a.argmin(axis)][tuple([0]*(a.ndim-1))] >>>numpy.all(b, a.min(axis)) True Thanks a lot in advance for any suggestions. From peridot.faceted at gmail.com Tue Mar 4 18:00:36 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 5 Mar 2008 00:00:36 +0100 Subject: [Numpy-discussion] argmin & min on ndarrays In-Reply-To: <200803041631.51869.pgmdevlist@gmail.com> References: <200803041631.51869.pgmdevlist@gmail.com> Message-ID: On 04/03/2008, Pierre GM wrote: > All, > Let a & b be two ndarrays of the same shape. I'm trying to find the elements > of b that correspond to the minima of a along an arbitrary axis. 
> The problem is trivial when axis=None or when a.ndim=2, but I'm getting > confused with higher dimensions: I came to the following solution that looks > rather ugly, and I'd need some ideas to simplify it > > >>>a=numpy.arange(24).reshape(2,3,4) > >>>axis=-1 > >>>b = numpy.rollaxis(a,axis,0)[a.argmin(axis)][tuple([0]*(a.ndim-1))] > >>>numpy.all(b, a.min(axis)) > True > > Thanks a lot in advance for any suggestions. I couldn't find any nice way to make indexing do what you want, but the function choose() can be persuaded to do it. Unfortunately it will only choose along the first axis, so some transpose jiggery-pokery is necessary: def pick_argmin(a,b,axis): assert a.shape == b.shape t = range(len(b.shape)) i = t[axis] del t[axis] t = [i] + t a = a.transpose(t) b = b.transpose(t) return N.choose(N.argmin(a,axis=0),b) I did find a not-nice way to do what you want. The problem is that numpy's fancy indexing is so general, it won't let you simply pick and choose along one axis, you have to pick and choose along all axes. So what you do is use indices() to generate arrays that index all the *other* axes appropriately, and then use the argmin array to index the axis you're interested in: In [39]: c = N.indices((2,4)) In [40]: b[c[0],N.argmin(a,axis=1),c[1]] Out[40]: array([[-0.70659942, -0.997249 , -0.20028296, -0.05171191], [-1.28886394, -1.0610526 , -1.07193295, 0.05356948]]) In [42]: c[0] Out[42]: array([[0, 0, 0, 0], [1, 1, 1, 1]]) In [43]: c[1] Out[43]: array([[0, 1, 2, 3], [0, 1, 2, 3]]) Not only would this require similar jiggery-pokery, it creates the potentially very large intermediate array c. I'd stick with choose(). A third option would be to transpose() and reshape() a and b down to two dimensions, then reshape() the result back to the right shape. More multiaxis jiggery-pokery, and the reshape()s may end up copying the arrays. Finally, you can always just write a python loop (over all axes except the one of interest) using ndenumerate() and one-dimensional argmin(). If the dimension you're argmin()ing over is very large, the cost of the python loop may be negligible. Anne P.S. feel free to use pick_argmin however you like, though error handling would probably be a good idea... -A From pgmdevlist at gmail.com Tue Mar 4 18:44:03 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 4 Mar 2008 18:44:03 -0500 Subject: [Numpy-discussion] argmin & min on ndarrays In-Reply-To: References: <200803041631.51869.pgmdevlist@gmail.com> Message-ID: <200803041844.04875.pgmdevlist@gmail.com> Anne, Thanks a lot for your suggestion. Something like >>>if axis is None: >>> return b.flat[a.argmin()] >>>else: >>> return numpy.choose(a.argmin(axis),numpy.rollaxis(b,axis,0)) seems to do the trick fairly nicely indeed. The other solutions you suggested would require too much ad hoc adaptation. Thanks again ! From peridot.faceted at gmail.com Tue Mar 4 19:21:14 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 4 Mar 2008 19:21:14 -0500 Subject: [Numpy-discussion] argmin & min on ndarrays In-Reply-To: <200803041844.04875.pgmdevlist@gmail.com> References: <200803041631.51869.pgmdevlist@gmail.com> <200803041844.04875.pgmdevlist@gmail.com> Message-ID: On 04/03/2008, Pierre GM wrote: > Anne, > > Thanks a lot for your suggestion. Something like > > >>>if axis is None: > >>> return b.flat[a.argmin()] > >>>else: > >>> return numpy.choose(a.argmin(axis),numpy.rollaxis(b,axis,0)) > > seems to do the trick fairly nicely indeed. 
The other solutions you suggested > would require too much ad hoc adaptation. > Thanks again ! Ah! "It ain't the things you don't know that'll get you, it's the things you know that ain't so." I thought rollaxis rolled the axes around cyclically. This is much more useful, but what a funny name for what it actually does... I should have provided the link before, but this is very useful for answering this kind of question: http://www.scipy.org/Numpy_Functions_by_Category Good luck, Anne From pgmdevlist at gmail.com Tue Mar 4 19:35:48 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 4 Mar 2008 19:35:48 -0500 Subject: [Numpy-discussion] argmin & min on ndarrays In-Reply-To: References: <200803041631.51869.pgmdevlist@gmail.com> <200803041844.04875.pgmdevlist@gmail.com> Message-ID: <200803041935.49540.pgmdevlist@gmail.com> Anne, > I should have provided the link before, but this is very useful for > answering this kind of question: > http://www.scipy.org/Numpy_Functions_by_Category Great link indeed, that complements well the example list: http://www.scipy.org/Numpy_Example_List Thanks again ! From charlesr.harris at gmail.com Wed Mar 5 04:54:46 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 5 Mar 2008 02:54:46 -0700 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: <47CC69E8.70501@enthought.com> References: <47CC69E8.70501@enthought.com> Message-ID: On Mon, Mar 3, 2008 at 2:13 PM, Travis E. Oliphant wrote: > Charles R Harris wrote: > > > > > > On Mon, Mar 3, 2008 at 10:21 AM, Jarrod Millman > > wrote: > > > > Hello, > > > > I would like to tag the 1.0.5 release on Wednesday night and > announce > > the release by Monday (3/10). If you have anything that you would > > like to get in before then, please do it now. It would also be > great > > if everyone could test the trunk. If anyone finds a bug or > regression > > that should delay the release, please send an email to the list > ASAP. > > > > Please take a look at the release notes and let me know if you see > > anything that needs to be changed or updated: > > http://projects.scipy.org/scipy/numpy/milestone/1.0.5 > > > > Thanks, > > > > > > I think ticket 597 should be pretty easy to fix. I just want to make > > sure everyone agrees it should be fixed. > I can't imagine someone "depending" on this behavior. And it should be > consistent between 32-bit and 64-bit systems. > Ok, it's fixed, sorta; it still fails for numbers < -2**63. I really wonder where we should draw the line? The C option would be to convert all integer types using modular arithmetic, but I have to wonder if 10**10000 mod(2**64) really makes much sense. On the other hand, it is convenient to get the largest unsigned number as uint64(-1). On the third hand, the same can be achieved using the known integer bounds and the stricter typing probably makes sense from the numerical point of view. How does FORTRAN deal with these types of conversions? I've forgotten. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From yves.revaz at obspm.fr Wed Mar 5 09:13:45 2008 From: yves.revaz at obspm.fr (Revaz Yves) Date: Wed, 05 Mar 2008 15:13:45 +0100 Subject: [Numpy-discussion] bug report ? In-Reply-To: References: <61111.85.166.27.136.1202324287.squirrel@cens.ioc.ee> <40196.129.194.8.8.1202466495.squirrel@webmail.obspm.fr> Message-ID: <47CEAA99.8@obspm.fr> Matthieu Brucher wrote: > Hi, > > What type is pos->dimensions in your case ? 
It may be long (64bits > long) instead of the expected int (32bits) or something like that ? > yes, pos->dimensions is a 64bits long while PyArray_FromDims expects 32bits int. Why is it so ? > Matthieu > > 2008/2/8, Yves Revaz >: > > > Dear list, > > I'm using old numarray C api with numpy. > It seems that there is a bug when using the PyArray_FromDims function. > > for example, if I define : > acc = (PyArrayObject *) > PyArray_FromDims(pos->nd,pos->dimensions,pos->descr->type_num); > > where pos is PyArrayObject *pos; (3x3 array) > > when using return PyArray_Return(acc); > I get > array([], shape=(3, 0), dtype=float32) > > > It is possible to make everything works if I use the following lines > instead : > int ld[2]; > ld[0]=pos->dimensions[0]; > ld[1]=pos->dimensions[1]; > acc = (PyArrayObject *) > PyArray_FromDims(pos->nd,ld,pos->descr->type_num); > > So, the problem comes from the pos->dimensions. > > > Is it a known bug ? > > > (I'm working on a linux 64bits machine.) > > > Cheers, > > > yves > > > > > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > French PhD student > Website : http://matthieu-brucher.developpez.com/ > Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn : http://www.linkedin.com/in/matthieubrucher > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From oliphant at enthought.com Wed Mar 5 09:45:06 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Wed, 05 Mar 2008 08:45:06 -0600 Subject: [Numpy-discussion] bug report ? In-Reply-To: <47CEAA99.8@obspm.fr> References: <61111.85.166.27.136.1202324287.squirrel@cens.ioc.ee> <40196.129.194.8.8.1202466495.squirrel@webmail.obspm.fr> <47CEAA99.8@obspm.fr> Message-ID: <47CEB1F2.3040508@enthought.com> Revaz Yves wrote: > Matthieu Brucher wrote: > >> Hi, >> >> What type is pos->dimensions in your case ? It may be long (64bits >> long) instead of the expected int (32bits) or something like that ? >> >> > yes, > pos->dimensions is a 64bits long > while PyArray_FromDims expects 32bits int. > > Why is it so ? > PyArray_FromDims is backward compatible Numeric API which did not support 64-bit correctly. PyArray_SimpleNew is the equivalent that accepts 64-bit dimensions information and is what you should be using. -Travis O. From yves.revaz at obspm.fr Wed Mar 5 09:49:37 2008 From: yves.revaz at obspm.fr (Revaz Yves) Date: Wed, 05 Mar 2008 15:49:37 +0100 Subject: [Numpy-discussion] bug report ? In-Reply-To: <47CEB1F2.3040508@enthought.com> References: <61111.85.166.27.136.1202324287.squirrel@cens.ioc.ee> <40196.129.194.8.8.1202466495.squirrel@webmail.obspm.fr> <47CEAA99.8@obspm.fr> <47CEB1F2.3040508@enthought.com> Message-ID: <47CEB301.4080003@obspm.fr> > PyArray_FromDims is backward compatible Numeric API which did not > support 64-bit correctly. > > PyArray_SimpleNew is the equivalent that accepts 64-bit dimensions > information and is what you should be using. > ok, excellent ! thanks for the answer. yves > -Travis O. 
> > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Wed Mar 5 20:26:32 2008 From: cournape at gmail.com (David Cournapeau) Date: Thu, 6 Mar 2008 10:26:32 +0900 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: Message-ID: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> On Tue, Mar 4, 2008 at 2:21 AM, Jarrod Millman wrote: > Hello, > > I would like to tag the 1.0.5 release on Wednesday night and announce > the release by Monday (3/10). If you have anything that you would > like to get in before then, please do it now. It would also be great > if everyone could test the trunk. If anyone finds a bug or regression > that should delay the release, please send an email to the list ASAP. > bug #653 can be closed I think with the patch I posted (this reminds me I should look for a way to get patch information in trac). cheers, David From charlesr.harris at gmail.com Wed Mar 5 23:03:26 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 5 Mar 2008 21:03:26 -0700 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> References: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> Message-ID: On Wed, Mar 5, 2008 at 6:26 PM, David Cournapeau wrote: > On Tue, Mar 4, 2008 at 2:21 AM, Jarrod Millman > wrote: > > Hello, > > > > I would like to tag the 1.0.5 release on Wednesday night and announce > > the release by Monday (3/10). If you have anything that you would > > like to get in before then, please do it now. It would also be great > > if everyone could test the trunk. If anyone finds a bug or regression > > that should delay the release, please send an email to the list ASAP. > > > > bug #653 can be closed I think with the patch I posted (this reminds > me I should look for a way to get patch information in trac). > Has the patch been applied? If not, can you attach it to an email. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Mar 5 23:10:49 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 5 Mar 2008 21:10:49 -0700 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> References: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> Message-ID: On Wed, Mar 5, 2008 at 6:26 PM, David Cournapeau wrote: > On Tue, Mar 4, 2008 at 2:21 AM, Jarrod Millman > wrote: > > Hello, > > > > I would like to tag the 1.0.5 release on Wednesday night and announce > > the release by Monday (3/10). If you have anything that you would > > like to get in before then, please do it now. It would also be great > > if everyone could test the trunk. If anyone finds a bug or regression > > that should delay the release, please send an email to the list ASAP. > > > > bug #653 can be closed I think with the patch I posted (this reminds > me I should look for a way to get patch information in trac). > Ok, I applied the patch. Do you think it is sufficiently tested for the upcoming realease or should I wait for Jarrod to tag the release before committing the changes? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournapeau at cslab.kecl.ntt.co.jp Thu Mar 6 00:08:41 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Thu, 06 Mar 2008 14:08:41 +0900 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> Message-ID: <1204780121.25137.2.camel@bbc8> On Wed, 2008-03-05 at 21:10 -0700, Charles R Harris wrote: > > > On Wed, Mar 5, 2008 at 6:26 PM, David Cournapeau > wrote: > On Tue, Mar 4, 2008 at 2:21 AM, Jarrod Millman > wrote: > > Hello, > > > > I would like to tag the 1.0.5 release on Wednesday night > and announce > > the release by Monday (3/10). If you have anything that > you would > > like to get in before then, please do it now. It would > also be great > > if everyone could test the trunk. If anyone finds a bug or > regression > > that should delay the release, please send an email to the > list ASAP. > > > > bug #653 can be closed I think with the patch I posted (this > reminds > me I should look for a way to get patch information in trac). > > Ok, I applied the patch. Do you think it is sufficiently tested for > the upcoming realease or should I wait for Jarrod to tag the release > before committing the changes? It is not tested :) I just checked that it worked on my system. Since it is using python library, it should be more robust than the current code, but I am not really familiar with the usage of this code, so maybe the changes have unintended consequences. cheers, David From charlesr.harris at gmail.com Thu Mar 6 00:44:48 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 5 Mar 2008 22:44:48 -0700 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: <1204780121.25137.2.camel@bbc8> References: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> <1204780121.25137.2.camel@bbc8> Message-ID: On Wed, Mar 5, 2008 at 10:08 PM, David Cournapeau < cournapeau at cslab.kecl.ntt.co.jp> wrote: > On Wed, 2008-03-05 at 21:10 -0700, Charles R Harris wrote: > > > > > > On Wed, Mar 5, 2008 at 6:26 PM, David Cournapeau > > wrote: > > On Tue, Mar 4, 2008 at 2:21 AM, Jarrod Millman > > wrote: > > > Hello, > > > > > > I would like to tag the 1.0.5 release on Wednesday night > > and announce > > > the release by Monday (3/10). If you have anything that > > you would > > > like to get in before then, please do it now. It would > > also be great > > > if everyone could test the trunk. If anyone finds a bug or > > regression > > > that should delay the release, please send an email to the > > list ASAP. > > > > > > > bug #653 can be closed I think with the patch I posted (this > > reminds > > me I should look for a way to get patch information in trac). > > > > Ok, I applied the patch. Do you think it is sufficiently tested for > > the upcoming realease or should I wait for Jarrod to tag the release > > before committing the changes? > > It is not tested :) I just checked that it worked on my system. Since it > is using python library, it should be more robust than the current code, > but I am not really familiar with the usage of this code, so maybe the > changes have unintended consequences. > Hmm. Well, it's in now. I have a 32 bit xeon at work and numpy fails one test and warns on another, so that might be a related problem. I'll give things a try and see what happens. I would think things should fail rather spectacularly if the system was misidentified and that isn't the case currently. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt.gregory at oregonstate.edu Thu Mar 6 00:45:57 2008 From: matt.gregory at oregonstate.edu (Gregory, Matthew) Date: Wed, 5 Mar 2008 21:45:57 -0800 Subject: [Numpy-discussion] calculating weighted majority using two 3D arrays Message-ID: <451453C181B199458A55B2B1723FAC00A97FC8@SAGE.forestry.oregonstate.edu> Hi list, I'm a definite newbie to numpy, but finding the library to be incredibly useful. I'm trying to calculate a weighted majority using numpy functions. I have two sets of image stacks (one is values, the other weights) that I read into 3D numpy arrays. Assuming I read in a 100 row x 100 col image subset consisting of ten images each, I have two arrays called values and weights with the following shape: values.shape = (10, 100, 100) weights.shape = (10, 100, 100) At this point I need to call my user-defined function to calculate the weighted majority which should return a value for each 'pixel' in my 100 x 100 subset. The way I'm doing it now (which I assume is NOT optimal) is to pass values[:,i,j] and weights[:,i,j] to my function in a double loop for i rows and j columns. I then build up the return values into a subsequent 2D array. It seems like I should be able to use vectorize() or apply_along_axis() to do this, but I'm not clever enough to figure this out. Alternatively, should I be structuring my initial data differently so that it's easier to use one of these functions. The only way I can think about doing that would be to store the two 10-item arrays into a tuple and then make an array of these tuples, but that seemed overly complicated. Or potentially, is there a way to calculate a weighted majority just using standard numpy functions?? Thanks for any suggestions, matt From eads at soe.ucsc.edu Thu Mar 6 01:34:24 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Wed, 05 Mar 2008 23:34:24 -0700 Subject: [Numpy-discussion] calculating weighted majority using two 3D arrays In-Reply-To: <451453C181B199458A55B2B1723FAC00A97FC8@SAGE.forestry.oregonstate.edu> References: <451453C181B199458A55B2B1723FAC00A97FC8@SAGE.forestry.oregonstate.edu> Message-ID: <47CF9070.2090704@soe.ucsc.edu> Gregory, Matthew wrote: > Hi list, > > I'm a definite newbie to numpy, but finding the library to be incredibly > useful. > > I'm trying to calculate a weighted majority using numpy functions. I > have two sets of image stacks (one is values, the other weights) that I > read into 3D numpy arrays. Assuming I read in a 100 row x 100 col image > subset consisting of ten images each, I have two arrays called values > and weights with the following shape: > > values.shape = (10, 100, 100) > weights.shape = (10, 100, 100) You may need to be a bit more specific by what you mean by weighted majority. What are the range of values for values and weights, specifically? This sounds a lot like pixel classification where each pixel is classified with a majority vote over its weights and values. Is that what you're trying to do? Many numpy functions (e.g. mean, max, min, sum) have an axis parameter, which specifies the axis along which the statistic is computed. Omitting the axis parameter causes the statistic to be computed over all values in the multidimensional array. Suppose the 'values' array contains floating point numbers in the range -1 to 1 and a larger absolute value gives a larger confidence. Also suppose the weights are floating point numbers between 0 and 1. 
The weighted majority vote for pixel i,j over 10 real-valued (confidenced) votes, each vote having a separate weight, is computed by w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum()) This can be vectorized to give a weighted majority vote for each pixel by doing w_vote = numpy.sign((values*weights).sum(axis=0)) The values*weights expression gives a weighted prediction. This also works if the 'values' are just predictions from the set {-1, 1}, i.e. there are ten classifiers, each one predicts either -1 and 1 on each pixel. I hope this helps. Damian > At this point I need to call my user-defined function to calculate the > weighted majority which should return a value for each 'pixel' in my 100 > x 100 subset. The way I'm doing it now (which I assume is NOT optimal) > is to pass values[:,i,j] and weights[:,i,j] to my function in a double > loop for i rows and j columns. I then build up the return values into a > subsequent 2D array. > > It seems like I should be able to use vectorize() or apply_along_axis() > to do this, but I'm not clever enough to figure this out. > Alternatively, should I be structuring my initial data differently so > that it's easier to use one of these functions. The only way I can > think about doing that would be to store the two 10-item arrays into a > tuple and then make an array of these tuples, but that seemed overly > complicated. Or potentially, is there a way to calculate a weighted > majority just using standard numpy functions?? > > Thanks for any suggestions, > matt > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From emanuele at relativita.com Thu Mar 6 04:53:36 2008 From: emanuele at relativita.com (Emanuele Olivetti) Date: Thu, 06 Mar 2008 10:53:36 +0100 Subject: [Numpy-discussion] numpy.ndarray constructor from python list: bug? Message-ID: <47CFBF20.4070709@relativita.com> Dear all, Look at this little example: ---- import numpy a = numpy.array([1]) b = numpy.array([1,2,a]) c = numpy.array([a,1,2]) ---- Which has the following output: ---- Traceback (most recent call last): File "b.py", line 4, in c = numpy.array([a,1,2]) ValueError: setting an array element with a sequence. ---- It seems that a list starting with an ndarray ('a', of a single number) is not a legal input to build an ndarray. Instead if 'a' is in other places of the list the ndarray builds up flawlessly. Is there a meaning for this behavior or is it a bug? Details: numpy 1.04 on ubuntu linux x86_64 Emanuele From robince at gmail.com Thu Mar 6 08:21:06 2008 From: robince at gmail.com (Robin) Date: Thu, 6 Mar 2008 13:21:06 +0000 Subject: [Numpy-discussion] numpy.ndarray constructor from python list: bug? In-Reply-To: <47CFBF20.4070709@relativita.com> References: <47CFBF20.4070709@relativita.com> Message-ID: On Thu, Mar 6, 2008 at 9:53 AM, Emanuele Olivetti wrote: > Dear all, > > Look at this little example: > ---- > import numpy > a = numpy.array([1]) > b = numpy.array([1,2,a]) > c = numpy.array([a,1,2]) > ---- > Which has the following output: > ---- > Traceback (most recent call last): > File "b.py", line 4, in > c = numpy.array([a,1,2]) > ValueError: setting an array element with a sequence. > ---- > > It seems that a list starting with an ndarray ('a', of > a single number) is not a legal input to build an ndarray. > Instead if 'a' is in other places of the list the ndarray > builds up flawlessly. 
> > Is there a meaning for this behavior or is it a bug? > > Details: numpy 1.04 on ubuntu linux x86_64 Hi, I see the same behaviour with 1.0.5.dev4786. I think the bug is that the b assignment should also fail. They both fail (as I think they should) if you take a as an array with more than one element. I think the array constructor expects lists of numbers, not of arrays etc. To do what you want try b = r_[1,2,a] c = r_[a,1,2] which works for a an array (and of more than one element). Cheers Robin From devnew at gmail.com Thu Mar 6 09:39:56 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Thu, 6 Mar 2008 06:39:56 -0800 (PST) Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <5d3194020803031012p2d1679aax1b2c24ab54a0d182@mail.gmail.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> <5d3194020803030942i1a6eeaa5rddf515b8176e4c3b@mail.gmail.com> <5d3194020803031012p2d1679aax1b2c24ab54a0d182@mail.gmail.com> Message-ID: <116b4851-f17b-440b-a375-9fcf4257088e@i7g2000prf.googlegroups.com> ok..I coded everything again from scratch..looks like i was having a problem with matrix class when i used a matrix for facespace facespace=sortedeigenvectorsmatrix * adjustedfacematrix and trying to convert the row to an image (eigenface). by make_simple_image(facespace[x],"eigenimage_x.jpg",(imgwdth,imght)) .i was getting black images instead of eigenface images. def make_simple_image(v, filename,imsize): v.shape=(-1,) #change to 1 dim array im = Image.new('L', imsize) im.putdata(v) im.save(filename) i made it an array instead of matrix make_simple_image(asarray(facespace[x]),"eigenimage_x.jpg", (imgwdth,imght)) this produces eigenface images another observation, the eigenface images obtained are too dark,unlike the eigenface images generated by Arnar's code.so i examined the elements of the facespace row sample rows: [ -82.35294118, -82.88235294, -91.58823529 ,..., -66.47058824, -68.23529412, -60.76470588] .. [ 89.64705882 82.11764706 79.41176471 ..., 172.52941176 170.76470588 165.23529412] looks like these are signed ints.. i used another make_image() function that converts the elements def make_image(v, filename,imsize): v.shape = (-1,) #change to 1 dim array a, b = v.min(), v.max() span = max(abs(b), abs(a)) im = Image.new('L', imsize) im.putdata((v * 127. / span) + 128) im.save(filename) This function makes clearer images..i think the calculations convert the elements to unsigned 8-bit values (as pointed out by Robin in another posting..) ,i am wondering if there is a more direct way to get clearer pics out of the facespace row elements From doutriaux1 at llnl.gov Thu Mar 6 11:47:46 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Thu, 06 Mar 2008 08:47:46 -0800 Subject: [Numpy-discussion] bug in f2py on Mac 10.5 ? Message-ID: <47D02032.6070605@llnl.gov> Hello, we're trying to install fortran extension with f2py, works great on linux, mac 10.4 (gfortran and g77) but on 10.5, it picks up g77 and then complains about cc_dynamic library. Apparently this lib is not part os 10.5 (Xcode), is that a known problem? Should we try with what's in trunk? Thanks, C. 
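Going back to the eigenface brightness question a couple of messages above: here is a minimal sketch of one way to get brighter pictures (the helper below is invented for illustration, not from the original post; it assumes PIL is importable as Image and that v is one facespace row). It linearly stretches each row to the full 0-255 range instead of centering the values on 128:

import numpy
import Image  # PIL

def make_image_stretched(v, filename, imsize):
    # Hypothetical variant of make_image(): map the row minimum to 0 and the
    # row maximum to 255, so the eigenface uses the full 8-bit range.
    v = numpy.asarray(v).reshape(-1)
    lo, hi = v.min(), v.max()
    if hi == lo:
        scaled = numpy.zeros(len(v))
    else:
        scaled = (v - lo) * 255. / (hi - lo)
    im = Image.new('L', imsize)
    im.putdata(scaled)
    im.save(filename)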
From fperez.net at gmail.com Thu Mar 6 13:15:27 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 6 Mar 2008 10:15:27 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea Message-ID: Hi all, after the Scipy/Sage Days 8 meeting, we were all very impressed by the progress made by Cython. For those not familiar with it, Cython: http://www.cython.org/ is an evolved version of Pyrex (which is used by numpy and scipy) with lots of improvements. We'd like to position Cython as the preferred way of writing most, if not all, new extension code written for numpy and scipy, as it is easier to write, get right, debug (when you still get it wrong) and maintain than writing to the raw Python-C API. A specific project along these lines, that would be very beneficial for numpy could be: - Creating new matrix types in cython that match the cvxopt matrices. The creation of new numpy array types with efficient code would be very useful. - Rewriting the existing ndarray subclasses that ship with numpy, such as record arrays, in cython. In doing this, benchmarks of the relative performance of the new code should be obtained. Another possible project would be the addition to Cython of syntactic support for array expressions, multidimensional indexing, and other features of numpy. This is probably more difficult than the above, as it would require fairly detailed knowledge of both the numpy C API and the Cython internals, but would ultimately be extremely useful. Any student interested in this should quickly respond on the list; such a project would likely be co-mentored by people on the Numpy and Cython teams, since it is likely to require expertise from both ends. Cheers, f From Chris.Barker at noaa.gov Thu Mar 6 13:28:32 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 06 Mar 2008 10:28:32 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: <47D037D0.7070502@noaa.gov> Fernando Perez wrote: > after the Scipy/Sage Days 8 meeting, we were all very impressed by the > progress made by Cython. cool stuff! > A specific project along these lines, that would be very beneficial > for numpy could be: Is there any way to set this up as a possible Google Summer of Code project? I don't suppose numpy.scipy is an officially listed project, is it? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Thu Mar 6 13:29:03 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 6 Mar 2008 13:29:03 -0500 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: <200803061329.04444.pgmdevlist@gmail.com> On Thursday 06 March 2008 13:15:27 Fernando Perez wrote: > - Rewriting the existing ndarray subclasses that ship with numpy, such > as record arrays, in cython. In doing this, benchmarks of the > relative performance of the new code should be obtained. Fernando, I remember having huge difficulties trying to implement ndarray subclasses in vanilla Pyrex, to the extent that I gave up that approach. Does it work better in Cython (I haven't tried it yet) ? 
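For context, this is roughly what such a subclass looks like at the pure-Python level. A minimal sketch (the InfoArray class and its info attribute are invented for illustration, not numpy code; the subclasses that ship with numpy, such as record arrays and masked arrays, follow the same pattern with much more machinery). A Pyrex/Cython reimplementation would have to honour the same construction hooks:

import numpy as np

class InfoArray(np.ndarray):
    # Minimal ndarray subclass carrying one extra attribute.
    def __new__(cls, input_array, info=None):
        # View the input data as the subclass, then attach the attribute.
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called on explicit construction, view casting and slicing, so the
        # extra attribute survives all the ways an instance can be created.
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)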
From matt.gregory at oregonstate.edu Thu Mar 6 13:37:02 2008 From: matt.gregory at oregonstate.edu (Gregory, Matthew) Date: Thu, 6 Mar 2008 10:37:02 -0800 Subject: [Numpy-discussion] calculating weighted majority using two 3D arrays In-Reply-To: <451453C181B199458A55B2B1723FAC00A97FCD@SAGE.forestry.oregonstate.edu> References: <451453C181B199458A55B2B1723FAC00A97FCD@SAGE.forestry.oregonstate.edu> Message-ID: <451453C181B199458A55B2B1723FAC00A97FCE@SAGE.forestry.oregonstate.edu> Eads, Damian wrote: > You may need to be a bit more specific by what you mean by > weighted majority. What are the range of values for values > and weights, specifically? This sounds a lot like pixel > classification where each pixel is classified with a majority > vote over its weights and values. Is that what you're trying to do? > > Many numpy functions (e.g. mean, max, min, sum) have an axis > parameter, which specifies the axis along which the statistic > is computed. Omitting the axis parameter causes the statistic > to be computed over all values in the multidimensional array. > > Suppose the 'values' array contains floating point numbers in > the range > -1 to 1 and a larger absolute value gives a larger > confidence. Also suppose the weights are floating point > numbers between 0 and 1. The weighted majority vote for pixel > i,j over 10 real-valued (confidenced) votes, each vote having > a separate weight, is computed by > > w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum()) > > This can be vectorized to give a weighted majority vote for > each pixel by doing > > w_vote = numpy.sign((values*weights).sum(axis=0)) > > The values*weights expression gives a weighted prediction. > This also works if the 'values' are just predictions from the > set {-1, 1}, i.e. > there are ten classifiers, each one predicts either -1 and 1 > on each pixel. Damian, thank you for the helpful response. I should have been a bit more explicit about what I meant by weighted majority. In my case, I need to find a discrete value (i.e. class) that occurs most often among ten observations where weighting is pre-determined by an inverse-distance calculation. Ignoring for a moment the multidimensionality issue, my values and weights arrays might look like this: values = array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10]) weights = array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2]) My function to calculate the majority looks like this: def weightedMajority(a, b): # Put all the samples into a dictionary with weights summed for # duplicate values wDict = {} for i in xrange(len(a)): (value, weight) = (a[i], b[i]) if wDict.has_key(value): wDict[value] += weight else: wDict[value] = weight # Create arrays of the values and weights values = numpy.array(wDict.keys()) weights = numpy.array(wDict.values()) # Return the index of the maximum value index = numpy.argmax(weights) # Return the majority value return values[index] In the above example: >> maj = weightedMajority(values, weights) >> maj 19 Correct me if I'm wrong, but I don't think that your example will work when I am looking to return a discrete value from the values set, but you may see something that I'm doing that is truly inefficient! 
thanks, matt From sameerslists at gmail.com Thu Mar 6 15:21:13 2008 From: sameerslists at gmail.com (Sameer DCosta) Date: Thu, 6 Mar 2008 14:21:13 -0600 Subject: [Numpy-discussion] Rename record array fields (with object arrays) In-Reply-To: <47CA307C.9050406@enthought.com> References: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> <47C6E5E9.4030201@enthought.com> <47CA307C.9050406@enthought.com> Message-ID: <8fb8cc060803061221r298e2c26n5743dd7e7e222db9@mail.gmail.com> On Sat, Mar 1, 2008 at 10:43 PM, Travis E. Oliphant wrote: > > Can you try: > > olddt.names = ['notfoo', 'notbar'] > > on a recent SVN tree. This should now work.... > Thanks Travis, this works great!! Sameer From robert.kern at gmail.com Thu Mar 6 15:33:38 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 6 Mar 2008 14:33:38 -0600 Subject: [Numpy-discussion] bug in f2py on Mac 10.5 ? In-Reply-To: <47D02032.6070605@llnl.gov> References: <47D02032.6070605@llnl.gov> Message-ID: <3d375d730803061233p4270cba7j4c6e7eb9cc776651@mail.gmail.com> On Thu, Mar 6, 2008 at 10:47 AM, Charles Doutriaux wrote: > Hello, > > we're trying to install fortran extension with f2py, works great on > linux, mac 10.4 (gfortran and g77) > but on 10.5, it picks up g77 and then complains about cc_dynamic library. > > Apparently this lib is not part os 10.5 (Xcode), is that a known > problem? Should we try with what's in trunk? You cannot use g77 with gcc 4. You must use gfortran. You can ensure that you are using gfortran instead of g77, use the --fcompiler=gnu95 flag. $ python setup.py config_fc --fcompiler=gnu95 build or $ f2py -c --fcompiler=gnu95 ... -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From caver_sean at ou.edu Thu Mar 6 15:36:53 2008 From: caver_sean at ou.edu (caver_sean at ou.edu) Date: Thu, 06 Mar 2008 14:36:53 -0600 Subject: [Numpy-discussion] loadtxt and missing values Message-ID: Greetings! I'm relatively new to numpy (and python in general), and so far I have been very pleased! I've been writing an atmospheric boundary-layer observation analysis package to use for my PhD research and I have ran into an issue with the loadtxt function (as an aside, our dataloggers output ascii data files so I use loadtxt...eventually the data get converted to netCDF). The issue: ------------------------- Our SODAR (think radar, but sound waves instead of E&M) spits out a comma delimited string like: yyyy-mm-dd hh:mm:ss,val1,val2,val3,error_code,...,val48,val49\n If the SODAR detects an error, the string will be: yyyy-mm-dd hh:mm:ss,,,,error_code,...,,\n As expected from the doc string (thus not a true 'bug'), loadtxt does not like missing values that are not marked by some 'missing value' (a series of ',,,,,,' does not fly!). Proposed solution: ------------------------- It's probably not the best way (noob, that's me), but this situation could be fixed by: 1) add a fill keyword to loadtxt such that def loadtxt(...,fill=-999): 2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py , numpy 1.0.4) ====================== for j in range(0,len(vals)): if vals[j] != '': pass else: vals[j]=fill ====================== Testing: ------------------------- Load an 18,000 line ascii dataset, 22 float variables on each line, skipping the first column (its a time stamp). 
Timings using %timeit in ipython: Reading an ascii file with no missing values using the current version of loadtxt: ***10 loops, best of 3: 704 ms per loop Reading an ascii file with no missing values using the proposed changes to loadtxt: ***10 loops, best of 3: 802 ms per loop The changes do create a slight performance hit for those who use loadtxt to read in nicely behaving ascii data. If this is an issue, could a loadtxt2 function be added? Thanks! Sean Arms Ph.D. Student School of Meteorology University of Oklahoma From robince at gmail.com Thu Mar 6 15:48:10 2008 From: robince at gmail.com (Robin) Date: Thu, 6 Mar 2008 20:48:10 +0000 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: On Thu, Mar 6, 2008 at 6:15 PM, Fernando Perez wrote: > Any student interested in this should quickly respond on the list; > such a project would likely be co-mentored by people on the Numpy and > Cython teams, since it is likely to require expertise from both ends. Hello, I would like to register my keen interest in applying for Numpy/Scipy GSoC project. I am a first year PhD student in Computational Neuroscience and my undergraduate degree was in Mathematics. I have been using Numpy and Scipy for my PhD work for a few months now and have been building up to trying to contribute something to the project - I am keen to get more substantial real world programming experience... The projects described involving Cython definitely interest me, although I don't yet have a sufficient understanding of the Python C-API and Pyrex/Cython to gauge how demanding they might be. As a PhD student in the UK I don't have any official summer vacation, so I wouldn't be able to work full time on the project (I also have a continuation report due just before the GSoC final deadline which is a bit annoying). However I currently work 20 hours per week part time anyway, so I'm confident that I could replace that with GSoC and still keep up with my studies. I would be keen to chat with someone (perhaps on the IRC channel) about whether my existing programming experience and availability would allow me to have a proper crack at this. I understand that first organisations apply (deadline 12th March) with some suggested projects, and then towards the end of the month students can apply to accepted organisations, either for the suggested project or their own ideas. I'd love to see Numpy/Scipy apply as an organisation with these projects (and perhaps some others) so that interested students like myself can apply. Thanks, Robin PS My nick on IRC is 'thrope' and I try to hang out in there most of the time I am online. I am also on Google Talk at this email address. 
From kwgoodman at gmail.com Thu Mar 6 15:50:36 2008 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 6 Mar 2008 12:50:36 -0800 Subject: [Numpy-discussion] loadtxt and missing values In-Reply-To: References: Message-ID: On Thu, Mar 6, 2008 at 12:36 PM, wrote: > Proposed solution: > ------------------------- > > It's probably not the best way (noob, that's me), but this situation could be fixed by: > > 1) add a fill keyword to loadtxt such that > > def loadtxt(...,fill=-999): > > 2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py , numpy 1.0.4) > > ====================== > for j in range(0,len(vals)): > if vals[j] != '': > pass > else: > vals[j]=fill > ====================== > > > Testing: ------------------------- > > Load an 18,000 line ascii dataset, 22 float variables on each line, skipping the first column (its a time stamp). > > Timings using %timeit in ipython: > > Reading an ascii file with no missing values using the current version of loadtxt: > ***10 loops, best of 3: 704 ms per loop > > Reading an ascii file with no missing values using the proposed changes to loadtxt: > ***10 loops, best of 3: 802 ms per loop > > The changes do create a slight performance hit for those who use loadtxt to read in nicely behaving ascii data. If this is an issue, could a loadtxt2 function be added? I haven't used loadtxt so I don't have an opinion on changing it. But would this be faster instead of a for loop? vals = [(z, fill)[z is ''] for z in vals] From caver_sean at ou.edu Thu Mar 6 16:12:23 2008 From: caver_sean at ou.edu (Sean Arms) Date: Thu, 06 Mar 2008 15:12:23 -0600 Subject: [Numpy-discussion] loadtxt and missing values In-Reply-To: References: Message-ID: <47D05E37.7060404@ou.edu> Keith Goodman wrote: > On Thu, Mar 6, 2008 at 12:36 PM, wrote: > > >> Proposed solution: >> ------------------------- >> >> It's probably not the best way (noob, that's me), but this situation could be fixed by: >> >> 1) add a fill keyword to loadtxt such that >> >> def loadtxt(...,fill=-999): >> >> 2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py , numpy 1.0.4) >> >> ====================== >> for j in range(0,len(vals)): >> if vals[j] != '': >> pass >> else: >> vals[j]=fill >> ====================== >> >> >> Testing: ------------------------- >> >> Load an 18,000 line ascii dataset, 22 float variables on each line, skipping the first column (its a time stamp). >> >> Timings using %timeit in ipython: >> >> Reading an ascii file with no missing values using the current version of loadtxt: >> ***10 loops, best of 3: 704 ms per loop >> >> Reading an ascii file with no missing values using the proposed changes to loadtxt: >> ***10 loops, best of 3: 802 ms per loop >> >> The changes do create a slight performance hit for those who use loadtxt to read in nicely behaving ascii data. If this is an issue, could a loadtxt2 function be added? >> > > I haven't used loadtxt so I don't have an opinion on changing it. But > would this be faster instead of a for loop? > > vals = [(z, fill)[z is ''] for z in vals] > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > Your suggestion appears to be about 2 ms faster (but still ~100 ms slower than the unaltered loadtxt). 
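For reference, a self-contained sketch of the substitution being timed (the data line and fill value below are invented; this is not the actual loadtxt code). Comparing with == is safer than the identity test z is '', since the latter relies on CPython happening to reuse a single empty-string object:

fill = -999.
line = '2008-03-06 14:00:00,,12.5,,7.1,'
vals = line.split(',')
vals = [fill if v == '' else v for v in vals]
print vals
# ['2008-03-06 14:00:00', -999.0, '12.5', -999.0, '7.1', -999.0]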
From kwgoodman at gmail.com Thu Mar 6 16:22:02 2008 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 6 Mar 2008 13:22:02 -0800 Subject: [Numpy-discussion] loadtxt and missing values In-Reply-To: <47D05E37.7060404@ou.edu> References: <47D05E37.7060404@ou.edu> Message-ID: On Thu, Mar 6, 2008 at 1:12 PM, Sean Arms wrote: > > Keith Goodman wrote: > > On Thu, Mar 6, 2008 at 12:36 PM, wrote: > > > > > >> Proposed solution: > >> ------------------------- > >> > >> It's probably not the best way (noob, that's me), but this situation could be fixed by: > >> > >> 1) add a fill keyword to loadtxt such that > >> > >> def loadtxt(...,fill=-999): > >> > >> 2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py , numpy 1.0.4) > >> > >> ====================== > >> for j in range(0,len(vals)): > >> if vals[j] != '': > >> pass > >> else: > >> vals[j]=fill > >> ====================== > >> > >> > >> Testing: ------------------------- > >> > >> Load an 18,000 line ascii dataset, 22 float variables on each line, skipping the first column (its a time stamp). > >> > >> Timings using %timeit in ipython: > >> > >> Reading an ascii file with no missing values using the current version of loadtxt: > >> ***10 loops, best of 3: 704 ms per loop > >> > >> Reading an ascii file with no missing values using the proposed changes to loadtxt: > >> ***10 loops, best of 3: 802 ms per loop > >> > >> The changes do create a slight performance hit for those who use loadtxt to read in nicely behaving ascii data. If this is an issue, could a loadtxt2 function be added? > >> > > > > I haven't used loadtxt so I don't have an opinion on changing it. But > > would this be faster instead of a for loop? > > > > vals = [(z, fill)[z is ''] for z in vals] > > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > Your suggestion appears to be about 2 ms faster (but still ~100 ms > slower than the unaltered loadtxt). I guess that's not enough to stop global warming. From ondrej at certik.cz Thu Mar 6 18:14:26 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Fri, 7 Mar 2008 00:14:26 +0100 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: <85b5c3130803061514j4a483554tcb6f73f0a6c587a6@mail.gmail.com> On Thu, Mar 6, 2008 at 9:48 PM, Robin wrote: > On Thu, Mar 6, 2008 at 6:15 PM, Fernando Perez wrote: > > Any student interested in this should quickly respond on the list; > > such a project would likely be co-mentored by people on the Numpy and > > Cython teams, since it is likely to require expertise from both ends. > > Hello, > > I would like to register my keen interest in applying for Numpy/Scipy > GSoC project. I am a first year PhD student in Computational > Neuroscience and my undergraduate degree was in Mathematics. > > I have been using Numpy and Scipy for my PhD work for a few months now > and have been building up to trying to contribute something to the > project - I am keen to get more substantial real world programming > experience... The projects described involving Cython definitely > interest me, although I don't yet have a sufficient understanding of > the Python C-API and Pyrex/Cython to gauge how demanding they might > be. 
> > As a PhD student in the UK I don't have any official summer vacation, > so I wouldn't be able to work full time on the project (I also have a > continuation report due just before the GSoC final deadline which is a > bit annoying). However I currently work 20 hours per week part time > anyway, so I'm confident that I could replace that with GSoC and still > keep up with my studies. Just a note, that the usual commitment is 40 hours/week, i.e. a full time job. See e.g.: http://wiki.python.org/moin/SummerOfCode/Expectations Ondrej From Joris.DeRidder at ster.kuleuven.be Thu Mar 6 18:13:04 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Fri, 7 Mar 2008 00:13:04 +0100 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: On 06 Mar 2008, at 19:15, Fernando Perez wrote: > http://www.cython.org/ > is an evolved version of Pyrex (which is used by numpy and scipy) with > lots of improvements. We'd like to position Cython as the preferred > way of writing most, if not all, new extension code written for numpy > and scipy, as it is easier to write, get right, debug (when you still > get it wrong) and maintain than writing to the raw Python-C API. Could you explain a bit more why you think this is the best path to follow? Pyrex is kind of a dialect, so your extension modules would be nor python nor C, but a third language. Is this indeed easier to maintain? When you would like to use legacy C code for an extension, would you rewrite it in Cython? What are Cython's advantages compared to ctypes? Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From Chris.Barker at noaa.gov Thu Mar 6 19:11:55 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 06 Mar 2008 16:11:55 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: <47D0884B.7050000@noaa.gov> I'm not a pyrex/Cython expert, but.... Joris De Ridder wrote: > Pyrex is kind of a dialect, so your extension modules would be nor > python nor C, but a third language. correct. > Is this indeed easier to maintain? yes, because while you can write C extensions in C, you need to use the quite complex Python/C api, and get all sorts of things like reference counting, etc right too -- that is hard. Also, with Cython, you can quite easily mix Python and C in one place, so you truly only need to put the performance intensive bits in Cython specific code. > When you would like to use legacy C code for an extension, would you > rewrite it in Cython? no -- you can call regular old C from Cython, so you can use it to write wrappers, too. > What are Cython's advantages compared to ctypes? for ctypes, you also avoid the wrapping code, but your C code needs to be compiled as a library, and can't use python types directly, which is more limiting. I think Cython is easier for someone not very experienced in C, and no harder for someone who is. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From eads at soe.ucsc.edu Thu Mar 6 22:54:05 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Thu, 06 Mar 2008 20:54:05 -0700 Subject: [Numpy-discussion] calculating weighted majority using two 3D arrays In-Reply-To: <451453C181B199458A55B2B1723FAC00A97FCE@SAGE.forestry.oregonstate.edu> References: <451453C181B199458A55B2B1723FAC00A97FCD@SAGE.forestry.oregonstate.edu> <451453C181B199458A55B2B1723FAC00A97FCE@SAGE.forestry.oregonstate.edu> Message-ID: <47D0BC5D.4010300@soe.ucsc.edu> Hi Gregory, Gregory, Matthew wrote: > Eads, Damian wrote: >> You may need to be a bit more specific by what you mean by >> weighted majority. What are the range of values for values >> and weights, specifically? This sounds a lot like pixel >> classification where each pixel is classified with a majority >> vote over its weights and values. Is that what you're trying to do? >> >> Many numpy functions (e.g. mean, max, min, sum) have an axis >> parameter, which specifies the axis along which the statistic >> is computed. Omitting the axis parameter causes the statistic >> to be computed over all values in the multidimensional array. >> >> Suppose the 'values' array contains floating point numbers in >> the range >> -1 to 1 and a larger absolute value gives a larger >> confidence. Also suppose the weights are floating point >> numbers between 0 and 1. The weighted majority vote for pixel >> i,j over 10 real-valued (confidenced) votes, each vote having >> a separate weight, is computed by >> >> w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum()) >> >> This can be vectorized to give a weighted majority vote for >> each pixel by doing >> >> w_vote = numpy.sign((values*weights).sum(axis=0)) >> >> The values*weights expression gives a weighted prediction. >> This also works if the 'values' are just predictions from the >> set {-1, 1}, i.e. >> there are ten classifiers, each one predicts either -1 and 1 >> on each pixel. > > Damian, thank you for the helpful response. I should have been a bit > more explicit about what I meant by weighted majority. In my case, I > need to find a discrete value (i.e. class) that occurs most often among > ten observations where weighting is pre-determined by an > inverse-distance calculation. Ignoring for a moment the > multidimensionality issue, my values and weights arrays might look like > this: > > values = array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10]) > weights = array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2]) > > My function to calculate the majority looks like this: > > def weightedMajority(a, b): > > # Put all the samples into a dictionary with weights summed for > # duplicate values > wDict = {} > for i in xrange(len(a)): > (value, weight) = (a[i], b[i]) > > if wDict.has_key(value): > wDict[value] += weight > else: > wDict[value] = weight > > # Create arrays of the values and weights > values = numpy.array(wDict.keys()) > weights = numpy.array(wDict.values()) > > # Return the index of the maximum value > index = numpy.argmax(weights) > > # Return the majority value > return values[index] Hi Matthew, Keep in mind that 'for' loops are inefficient in python. This is less worrisome when the input data sets are small. However, for larger data sets, one must exercise a bit more care when using Python 'for' loops. There is a lot of overhead for each iteration. 
I would advise looping over the class labels, rather than the examples since the number of class labels is in most cases significantly fewer than the number of examples. def weighted_majority(values, weights): # The number of different kinds of values. kinds = numpy.unique(values) # The weight sums of the values. weight_sums = numpy.zeros((len(kinds),)) # Loop over each different kind of value. for i in xrange(0, len(kinds)): # Grab the i'th kind of value kind = kinds[i] # Create a mask for the values of that kind. kind_mask = values == kind # Sum up the weights corresponding to the masked values. weight_sums[i] += weights[kind_mask].sum() #end for # Return the kind label with the largest weight sum. return kinds[weight_sums.argmax()] The code above should also generalize to multidimensional arrays since the kind_mask matches the dimensionality of both the 'values' and 'weights' variables. A caveat: I have not extensively tested this code but it looks correct. > > In the above example: > >>> maj = weightedMajority(values, weights) >>> maj > 19 > > Correct me if I'm wrong, but I don't think that your example will work > when I am looking to return a discrete value from the values set, but > you may see something that I'm doing that is truly inefficient! If your predictions come from a set of nominal values (or class labels) where order has no meaning among the class labels, and there are more than two kinds of labels (or prediction values) then you are correct, my example from the earlier post will not work. It only works for binary prediction values with or without confidence ratings. Damian From tim.hochberg at ieee.org Thu Mar 6 23:06:57 2008 From: tim.hochberg at ieee.org (Timothy Hochberg) Date: Thu, 6 Mar 2008 21:06:57 -0700 Subject: [Numpy-discussion] calculating weighted majority using two 3D arrays In-Reply-To: <451453C181B199458A55B2B1723FAC00A97FCE@SAGE.forestry.oregonstate.edu> References: <451453C181B199458A55B2B1723FAC00A97FCD@SAGE.forestry.oregonstate.edu> <451453C181B199458A55B2B1723FAC00A97FCE@SAGE.forestry.oregonstate.edu> Message-ID: On Thu, Mar 6, 2008 at 11:37 AM, Gregory, Matthew < matt.gregory at oregonstate.edu> wrote: > Eads, Damian wrote: > > You may need to be a bit more specific by what you mean by > > weighted majority. What are the range of values for values > > and weights, specifically? This sounds a lot like pixel > > classification where each pixel is classified with a majority > > vote over its weights and values. Is that what you're trying to do? > > > > Many numpy functions (e.g. mean, max, min, sum) have an axis > > parameter, which specifies the axis along which the statistic > > is computed. Omitting the axis parameter causes the statistic > > to be computed over all values in the multidimensional array. > > > > Suppose the 'values' array contains floating point numbers in > > the range > > -1 to 1 and a larger absolute value gives a larger > > confidence. Also suppose the weights are floating point > > numbers between 0 and 1. The weighted majority vote for pixel > > i,j over 10 real-valued (confidenced) votes, each vote having > > a separate weight, is computed by > > > > w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum()) > > > > This can be vectorized to give a weighted majority vote for > > each pixel by doing > > > > w_vote = numpy.sign((values*weights).sum(axis=0)) > > > > The values*weights expression gives a weighted prediction. > > This also works if the 'values' are just predictions from the > > set {-1, 1}, i.e. 
> > there are ten classifiers, each one predicts either -1 and 1 > > on each pixel. > > Damian, thank you for the helpful response. I should have been a bit > more explicit about what I meant by weighted majority. In my case, I > need to find a discrete value (i.e. class) that occurs most often among > ten observations where weighting is pre-determined by an > inverse-distance calculation. Ignoring for a moment the > multidimensionality issue, my values and weights arrays might look like > this: > > values = array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10]) > weights = array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2]) > > My function to calculate the majority looks like this: > > def weightedMajority(a, b): > > # Put all the samples into a dictionary with weights summed for > # duplicate values > wDict = {} > for i in xrange(len(a)): > (value, weight) = (a[i], b[i]) > > if wDict.has_key(value): > wDict[value] += weight > else: > wDict[value] = weight > > # Create arrays of the values and weights > values = numpy.array(wDict.keys()) > weights = numpy.array(wDict.values()) > > # Return the index of the maximum value > index = numpy.argmax(weights) > > # Return the majority value > return values[index] > > In the above example: > > >> maj = weightedMajority(values, weights) > >> maj > 19 > [SNIP] If your values are integers in a reasonably small range, then you might want to use an array to hold your weights as it makes things simpler and likely faster. For example: from itertools import izip def weightedMajority2(a, b): wMap = np.zeros(256, float) # assume all values fall in [0,255] for value, weight in izip(a, b): wMap[value] += weight return numpy.argmax(wMap) Regards, -- . __ . |-\ . . tim.hochberg at ieee.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Fri Mar 7 03:59:30 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 7 Mar 2008 00:59:30 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <200803061329.04444.pgmdevlist@gmail.com> References: <200803061329.04444.pgmdevlist@gmail.com> Message-ID: Hi Pierre, On Thu, Mar 6, 2008 at 10:29 AM, Pierre GM wrote: > On Thursday 06 March 2008 13:15:27 Fernando Perez wrote: > > - Rewriting the existing ndarray subclasses that ship with numpy, such > > as record arrays, in cython. In doing this, benchmarks of the > > relative performance of the new code should be obtained. > > Fernando, > I remember having huge difficulties trying to implement ndarray subclasses in > vanilla Pyrex, to the extent that I gave up that approach. Does it work > better in Cython (I haven't tried it yet) ? I doubt it's much better, and that's part of the point of the project: to identify the problems and fix them once and for all. Getting anything fixed in pyrex was hard due to a very opaque development process, but Cython is part of the Sage umbrella and thus enjoys a very open and active development community. Furthermore, they are explicitly interested in improving the Cython numpy support, and are willing to help along if this project goes forward. 
cheers f From fperez.net at gmail.com Fri Mar 7 04:02:37 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 7 Mar 2008 01:02:37 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: On Thu, Mar 6, 2008 at 3:13 PM, Joris De Ridder wrote: > > On 06 Mar 2008, at 19:15, Fernando Perez wrote: > > > http://www.cython.org/ > > is an evolved version of Pyrex (which is used by numpy and scipy) with > > lots of improvements. We'd like to position Cython as the preferred > > way of writing most, if not all, new extension code written for numpy > > and scipy, as it is easier to write, get right, debug (when you still > > get it wrong) and maintain than writing to the raw Python-C API. > > > Could you explain a bit more why you think this is the best path to > follow? > Pyrex is kind of a dialect, so your extension modules would be nor > python nor C, but a third language. Is this indeed easier to maintain? > When you would like to use legacy C code for an extension, would you > rewrite it in Cython? What are Cython's advantages compared to ctypes? Chris B gave what I think is a good reply to this, but feel free to ask if you have further questions. I think it's important that we reach some consensus on why this a good idea on technical grounds without anyone feeling like the decision is made opaquely in some back room, so please raise any doubts or concerns you may still have, and we'll do our best to address them. Cheers f From fperez.net at gmail.com Fri Mar 7 04:06:45 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 7 Mar 2008 01:06:45 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: Hi Robin, On Thu, Mar 6, 2008 at 12:48 PM, Robin wrote: > On Thu, Mar 6, 2008 at 6:15 PM, Fernando Perez wrote: > > Any student interested in this should quickly respond on the list; > > such a project would likely be co-mentored by people on the Numpy and > > Cython teams, since it is likely to require expertise from both ends. > > Hello, > > I would like to register my keen interest in applying for Numpy/Scipy > GSoC project. I am a first year PhD student in Computational > Neuroscience and my undergraduate degree was in Mathematics. > > I have been using Numpy and Scipy for my PhD work for a few months now > and have been building up to trying to contribute something to the > project - I am keen to get more substantial real world programming > experience... The projects described involving Cython definitely > interest me, although I don't yet have a sufficient understanding of > the Python C-API and Pyrex/Cython to gauge how demanding they might > be. > > As a PhD student in the UK I don't have any official summer vacation, > so I wouldn't be able to work full time on the project (I also have a > continuation report due just before the GSoC final deadline which is a > bit annoying). However I currently work 20 hours per week part time > anyway, so I'm confident that I could replace that with GSoC and still > keep up with my studies. > > I would be keen to chat with someone (perhaps on the IRC channel) > about whether my existing programming experience and availability > would allow me to have a proper crack at this. > > I understand that first organisations apply (deadline 12th March) with > some suggested projects, and then towards the end of the month > students can apply to accepted organisations, either for the suggested > project or their own ideas. 
I'd love to see Numpy/Scipy apply as an > organisation with these projects (and perhaps some others) so that > interested students like myself can apply. As Ondrej pointed out, the expectation is a full-time commitment to the project. Other than that it sounds like you might be able to participate, and it's worth noting that this being open source, if you just have some free time and would like to get involved with an interesting project, by all means pitch in. Even if someone picks up an 'official' project, there's plenty to be done on the cython/numpy front for more than one person. Perhaps it's not out of place to mention that many people have made solid contributions for years to open source projects without monetary compensation, and still see value in the activity. If you can spend the time on it, you may still find many rewards out of the work. Cheers, f From konrad.hinsen at laposte.net Fri Mar 7 04:17:39 2008 From: konrad.hinsen at laposte.net (Konrad Hinsen) Date: Fri, 7 Mar 2008 10:17:39 +0100 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: <200803061329.04444.pgmdevlist@gmail.com> Message-ID: <1BA9780C-CBB7-4E36-9DA7-71FB834BE9CD@laposte.net> On 07.03.2008, at 09:59, Fernando Perez wrote: > I doubt it's much better, and that's part of the point of the project: > to identify the problems and fix them once and for all. Getting > anything fixed in pyrex was hard due to a very opaque development > process, but Cython is part of the Sage umbrella and thus enjoys a > very open and active development community. Furthermore, they are > explicitly interested in improving the Cython numpy support, and are > willing to help along if this project goes forward. This is very good news in my opinion. Pyrex and Cython are already very useful tools for scientific computing. They lower the barrier to writing extension modules significantly (compared to writing directly in C), and they permit a continuous transition from a working Python prototype to an efficient extension module. I have been writing all my recent extension modules using Pyrex, and I definitely won't go back to C. If Cython gets explicit array support, it would become an even more useful tool for the NumPy community. Konrad. From fperez.net at gmail.com Fri Mar 7 04:36:40 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 7 Mar 2008 01:36:40 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <1BA9780C-CBB7-4E36-9DA7-71FB834BE9CD@laposte.net> References: <200803061329.04444.pgmdevlist@gmail.com> <1BA9780C-CBB7-4E36-9DA7-71FB834BE9CD@laposte.net> Message-ID: On Fri, Mar 7, 2008 at 1:17 AM, Konrad Hinsen wrote: > On 07.03.2008, at 09:59, Fernando Perez wrote: > > > I doubt it's much better, and that's part of the point of the project: > > to identify the problems and fix them once and for all. Getting > > anything fixed in pyrex was hard due to a very opaque development > > process, but Cython is part of the Sage umbrella and thus enjoys a > > very open and active development community. Furthermore, they are > > explicitly interested in improving the Cython numpy support, and are > > willing to help along if this project goes forward. > > This is very good news in my opinion. Pyrex and Cython are already > very useful tools for scientific computing. 
They lower the barrier to > writing extension modules significantly (compared to writing directly > in C), and they permit a continuous transition from a working Python > prototype to an efficient extension module. I have been writing all > my recent extension modules using Pyrex, and I definitely won't go > back to C. If Cython gets explicit array support, it would become an > even more useful tool for the NumPy community. Thanks for your feedback and support of the idea, Konrad. I just realized that I forgot to include this message that W. Stein (sage lead) sent me, which I think presents many of these points very nicely and may be useful in this discussion. cheers f ---------- Forwarded message ---------- From: Dag Sverre Seljebotn Date: Tue, Mar 4, 2008 at 2:54 PM Subject: [Cython] Thoughts on numerical computing/NumPy support To: cython-dev at codespeak.net Since Robert mentioned NumPy in relation with adding operator support I thought about sharing my more thoughts about NumPy - I'm very new to Cython so I guess take it for what it is worth - however what I've seen so far looks so promising for me that I might want to spend some time in a few months working on implementing some of this, which perhaps may make my thoughts more intereseting :-) Currently, Cython is mostly geared towards wrapping C code, but it is also an excellent foundation for being a numerical tool - but the rough edges are still prohibitive. A few relatively small steps (in terms of man-hours needed) would improve the situation a lot I think - not perfect, but perhaps in a few years we can have something that will finally kill FORTRAN :-) Three suggestions comes briefly here, if anyone's interested and it is not already discussed and decided I might flesh them out in "PEP-style" in the coming month? Note that a) is what is important for me, b) and c) is just something I throw along... a) numpy.ndarray syntax candy. Really, what one should implement is syntax support for PEP-3118: http://www.python.org/dev/peps/pep-3118/ Because this protocol will be shared between NumPy, PIL etc. in Python 3 it could make sense to simply have "native"/hard-coded support for this aspect without necesarrily making it a generic operator feature, and one can then use the same approach as will be needed for buffers in Python 3 for NumPy in Python 2? Example (where "array" is considered a new, Cython-native type that will have automatic conversion from any NumPy arrays and Python 3 buffers): def myfunc(array<2, unsigned char> arr): arr[4, 5] = 10 might be translated to the equivalent of the currently legal: def myfunc(numpy.ndarray arr): if arr.nd != 2 or arr.dtype != numpy.dtype(numpy.uint8): raise ValueError("Must pass 2-dimensional uint8 array.") cdef unsigned char* arr_buf = arr.data arr.data[4 * arr.strides[0] + 5 * arr.strides[1]] = 10 (Probably caching the strides in local variables etc.). That should do as a first implementation -- it is always possible to be more sophisticated, but this little will allow NumPyers to simply dive in. Specifically, the number of dimensions must be declared first and only direct access in that many dimensions are allowed. Slices etc. should be less important (they can be done on the Python object instead). Moving on from here, one should probably instead define bufferinfo from PEP-3118 and make it say def myfunc(bufferinfo arr): if arr.ndim != 2 or arr.format != "B") or arr.readonly: raise ValueError("Must pass writeable 2-dimensional buffer with format 'B'.") ... 
with automatic conversion from NumPy arrays to bufferinfo. b) Allow numpy types? Basically, make it possible to say "cdef uint8 myvar", at least for in-function-variables that is not interfacing with C code, so that for numerical use one doesn't need to learn C. This can be in addition, so it should not break existing code, though I can understand resentment against the idea as well. c) Probably controversial: More Pythonic syntax. A syntax for decoration of function arguments is decided upon (at least in Python 3), so to align with that one could allow for stuff like @Compile def myfunc(a: uint8, b: array(2, uint8), c: int = 10): d: ptr(int) = &a print a, b, c, d Which is "almost" Python - only the definition of d is different, but consistency talks for change there as well. This can also be in addition to the existing syntax so it should not break anything (allowing, say, only one type of syntax per function). But a) is what is interesting here... -- Dag Sverre From giorgio at gilestro.tk Fri Mar 7 09:56:59 2008 From: giorgio at gilestro.tk (Giorgio F. Gilestro) Date: Fri, 07 Mar 2008 08:56:59 -0600 Subject: [Numpy-discussion] behavior of masked arrays Message-ID: <47D157BB.90003@gilestro.tk> Hi Everybody, I have some arrays that sometimes need to have some of their values masked away or, simply said, not considered during manipulation. I tried to fulfill my purposes using both NaNs and MaskedArray but neither of them really helped completely. Let's give an example: from numpy import * import scipy a = array(arange(40).reshape(5,8), dtype=float32) b = array(arange(40,80).reshape(5,8), dtype=float32) a[1,1] = NaN tt, ttp = scipy.stats.ttest_ind(a,b,axis=0) c = numpy.ma.masked_array(a, mask=isnan(a)) tt1, ttp1 = scipy.stats.ttest_ind(c,b,axis=0) print (ttp == ttp1).all() will return True. My understanding is that only a few functions will be able to properly use MA during execution. Is this correct or am I missing something here? Thanks From pgmdevlist at gmail.com Fri Mar 7 10:37:04 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 7 Mar 2008 10:37:04 -0500 Subject: [Numpy-discussion] behavior of masked arrays In-Reply-To: <47D157BB.90003@gilestro.tk> References: <47D157BB.90003@gilestro.tk> Message-ID: <200803071037.05524.pgmdevlist@gmail.com> On Friday 07 March 2008 09:56:59 Giorgio F. Gilestro wrote: > Hi Everybody, > My understanding is that only a few functions will be able to properly > use MA during execution. Is this correct or am I missing something here? Giogio, You're right: there's no full support of masked arrays in Scipy yet. I ported some functions I needed for my own research, you'll find them in numpy.ma.mstast and numpy.ma.morestats, but many, many more are missing. In your particular example, masked arrays are simply/silently converted to regular ndarray with the internal use of numpy.asarray in _chk2_asarray. Therefore, you're losing your mask... You have several options: 1. Rewrite the function(s) you need to make sure masked arrays are properly handled. In your case, that'd mean rewriting _chk2_asarray to use numpy.asanyarray instead of numpy.asarray, and using the numpy.ma functions instead of their numpy.counterparts (that last step might not be necessary, but we need to check that). 2. Don't use masked arrays, but compressed arrays, that is, arrays where the missing values have been discarded with a.compressed(). That way, you have ndarrays that are processed properly. 
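For instance, a rough, untested sketch of that second approach for your example (the masked a against the plain b, along axis=0; the helper name is made up):

import numpy as np
import numpy.ma as ma
from scipy.stats import ttest_ind

def ttest_ind_compressed(a, b):
    # a is the masked sample, b a plain ndarray; both 2D, with axis=0 as
    # the observation axis. Each column of a is compressed (masked values
    # dropped) before being handed to the regular ttest_ind.
    a = ma.asarray(a)
    tt = np.empty(a.shape[1])
    ttp = np.empty(a.shape[1])
    for j in range(a.shape[1]):
        col = a[:, j].compressed()      # only the unmasked observations
        tt[j], ttp[j] = ttest_ind(col, b[:, j])
    return tt, ttp
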
In your case, that'd imply to define a common mask for your samples, select the rows/columns depending on you axis, and apply ttest_ind on each compressed row/column. Of course, the #1 solution sounds like the best for the community. On a side note: * That particular function (ttest_ind) uses mean and var as functions: I'm sure it'd be better to use the corresponding methods, that way masked arrays could be taken into account more easily. From Joris.DeRidder at ster.kuleuven.be Fri Mar 7 11:10:26 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Fri, 7 Mar 2008 17:10:26 +0100 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea Message-ID: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> On 07 Mar 2008, at 10:02, Fernando Perez wrote: > Chris B gave what I think is a good reply to this, but feel free to > ask if you have further questions. I think it's important that we > reach some consensus on why this a good idea on technical grounds > without anyone feeling like the decision is made opaquely in some back > room, so please raise any doubts or concerns you may still have, and > we'll do our best to address them. Thanks. I've a few questions concerning the objections against ctypes. It's part of the Python standard library, brand new from v2.5, and it allows creating extensions. Disregarding it, requires therefore good arguments, I think. I trust you that there are, but I would like to understand them better. For ctypes your extensions needs to be compiled as a shared library, but as numpy is moving towards Scons which seem to facilitate this quite a lot, is this still a difficulty/ objection? Secondly, looking at the examples given by Travis in his Numpy Book, neither pyrex nor ctypes seem to be particularly user- friendly concerning Numpy ndarrays (although ctypes does seem slightly easier). From your email, I understand it's possibly to mediate this for Cython. From a technical point of view, would it also be possible to make ctypes work better with Numpy, and if yes, do you have any idea whether it would be more or less work than for Cython? Cheers, Joris P.S. I had some problems with bounces, sorry if this message appears more than once. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From robince at gmail.com Fri Mar 7 11:36:39 2008 From: robince at gmail.com (Robin) Date: Fri, 7 Mar 2008 16:36:39 +0000 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: On Fri, Mar 7, 2008 at 9:06 AM, Fernando Perez wrote: > Hi Robin, > > As Ondrej pointed out, the expectation is a full-time commitment to > the project. Other than that it sounds like you might be able to > participate, and it's worth noting that this being open source, if you > just have some free time and would like to get involved with an > interesting project, by all means pitch in. Even if someone picks up > an 'official' project, there's plenty to be done on the cython/numpy > front for more than one person. > > Perhaps it's not out of place to mention that many people have made > solid contributions for years to open source projects without monetary > compensation, and still see value in the activity. If you can spend > the time on it, you may still find many rewards out of the work. Thanks, I hadn't seen the link Ondrej provided, although the 40 hour week seems to be a Python/PSF requirement. 
Prior to posting I had checked the Google information, where they say the time commitment depends on both the scope of your project and the requirements of your mentoring organisation. They also say they have had successful applicants in previous years from full-time students at non-US universities (who don't get a summer break), so I thought it might be possible for me to be considered. I also asked in #gsoc where I was advised 20 hours per week would be a good baseline, again depending on the project. Of course, I hope to contribute to Numpy/Scipy anyway - but this scheme would be a great way to kick-start that. I look forward to seeing Numpy/Scipy accepted as a mentor organisation this year anyway, even if I am unable to take part. Cheers, Robin From giorgio at gilestro.tk Fri Mar 7 12:25:13 2008 From: giorgio at gilestro.tk (Giorgio F. Gilestro) Date: Fri, 07 Mar 2008 11:25:13 -0600 Subject: [Numpy-discussion] behavior of masked arrays In-Reply-To: <200803071037.05524.pgmdevlist@gmail.com> References: <47D157BB.90003@gilestro.tk> <200803071037.05524.pgmdevlist@gmail.com> Message-ID: <47D17A79.1070103@gilestro.tk> Ok, I see, thank you Pierre. I thought scipy.stats would have been a widely used extension so I didn't really consider the trivial possibility that simply wasn't compatible with ma yet. I had a quick look at the code and it really seems that ma handling can be achieved by replacing np.asarray with np.ma.asarray, and some functions with their methods (like ravel) here and there. Yet, I just saw here http://scipy.org/scipy/scipy/wiki/StatisticsReview that April and May are going to be StatisticsReview month so I don't think it is a good idea to go on and fix things myself now :-) I think I will go through here http://scipy.org/scipy/scipy/query?status=new&status=assigned&status=reopened&milestone=Statistics+Review+Months&order=priority and see what I can do. Thanks Pierre GM wrote: > On Friday 07 March 2008 09:56:59 Giorgio F. Gilestro wrote: > >> Hi Everybody, >> > > >> My understanding is that only a few functions will be able to properly >> use MA during execution. Is this correct or am I missing something here? >> > > Giogio, > You're right: there's no full support of masked arrays in Scipy yet. I ported > some functions I needed for my own research, you'll find them in > numpy.ma.mstast and numpy.ma.morestats, but many, many more are missing. > > In your particular example, masked arrays are simply/silently converted to > regular ndarray with the internal use of numpy.asarray in _chk2_asarray. > Therefore, you're losing your mask... > > You have several options: > 1. Rewrite the function(s) you need to make sure masked arrays are properly > handled. In your case, that'd mean rewriting _chk2_asarray to use > numpy.asanyarray instead of numpy.asarray, and using the numpy.ma functions > instead of their numpy.counterparts (that last step might not be necessary, > but we need to check that). > > 2. Don't use masked arrays, but compressed arrays, that is, arrays where the > missing values have been discarded with a.compressed(). That way, you have > ndarrays that are processed properly. > In your case, that'd imply to define a common mask for your samples, select > the rows/columns depending on you axis, and apply ttest_ind on each > compressed row/column. > > Of course, the #1 solution sounds like the best for the community. 
> > On a side note: > * That particular function (ttest_ind) uses mean and var as functions: I'm > sure it'd be better to use the corresponding methods, that way masked arrays > could be taken into account more easily. > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- giorgio at gilestro.tk http://www.cafelamarck.it From oliphant at enthought.com Fri Mar 7 12:32:30 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 07 Mar 2008 11:32:30 -0600 Subject: [Numpy-discussion] behavior of masked arrays In-Reply-To: <47D157BB.90003@gilestro.tk> References: <47D157BB.90003@gilestro.tk> Message-ID: <47D17C2E.3090407@enthought.com> Giorgio F. Gilestro wrote: > Hi Everybody, > I have some arrays that sometimes need to have some of their values > masked away or, simply said, not considered during manipulation. > I tried to fulfill my purposes using both NaNs and MaskedArray but > neither of them really helped completely. > > Let's give an example: > > from numpy import * > import scipy > > a = array(arange(40).reshape(5,8), dtype=float32) > b = array(arange(40,80).reshape(5,8), dtype=float32) > a[1,1] = NaN > > tt, ttp = scipy.stats.ttest_ind(a,b,axis=0) > > c = numpy.ma.masked_array(a, mask=isnan(a)) > tt1, ttp1 = scipy.stats.ttest_ind(c,b,axis=0) > > print (ttp == ttp1).all() > > will return True. > > My understanding is that only a few functions will be able to properly > use MA during execution. Is this correct or am I missing something here? > Yes, that is correct. A function that supports masked arrays natively requires that it be understood from the beginning. The concept of a masked array is not understood by most of the functions that NumPy and SciPy provide. There is a price to be paid for checking on the validity of the data for every function and so people differ on whether or not there *should* be support for masked arrays on a very low level. I support the concept of separate masked-array functions which do not penalize non masked array functions significantly (perhaps Generic functions can help us here so that the interface to the user is the same, but the underlying function called is different depending on whether or not the array is masked. As long as this is done per array and not per element it is usually not significant. -Travis O. > Thanks > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From pgmdevlist at gmail.com Fri Mar 7 12:37:55 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 7 Mar 2008 12:37:55 -0500 Subject: [Numpy-discussion] behavior of masked arrays In-Reply-To: <47D17A79.1070103@gilestro.tk> References: <47D157BB.90003@gilestro.tk> <200803071037.05524.pgmdevlist@gmail.com> <47D17A79.1070103@gilestro.tk> Message-ID: <200803071237.57208.pgmdevlist@gmail.com> On Friday 07 March 2008 12:25:13 Giorgio F. Gilestro wrote: > Ok, I see, thank you Pierre. > I thought scipy.stats would have been a widely used extension so I > didn't really consider the trivial possibility that simply wasn't > compatible with ma yet. Partly my fault here, as I should have ported more functions. Blame the fact that working on an open-source project doesn't translate in publications, and that my bosses are shortening the leash.... Note that most (all?) 
of the functions in scipy.stats never supported masked arrays in the first place anyway. Now that MaskedArray is just a subclass of ndarray, porting the functions should be easier. > I had a quick look at the code and it really seems that ma handling can > be achieved by replacing np.asarray with np.ma.asarray, and some > functions with their methods (like ravel) here and there. Yes and no. I'd prefer to use numpy.asanyarray as to avoid converting ndarrays to masked arrays, and use methods as much as possible. Of course, there's gonna be some particular cases to handle (as when all the data are masked), but that should be relatively painless. Another issue is where to store the new functions: should we try to ensure full compatibility of scipy.stats with masked arrays? Create a new module scipy.mstats instead, that we'd fill up with time ? I'd be keener on the second approach, as we could move most of the functions currently in numpy.ma.m(ore)stats to this new module, and that'd probably less work at once... From Chris.Barker at noaa.gov Fri Mar 7 12:50:24 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 07 Mar 2008 09:50:24 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> References: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> Message-ID: <47D18060.2080300@noaa.gov> Joris De Ridder wrote: > Thanks. I've a few questions concerning the objections against ctypes. It's not so much an abjection (I think), but the fact that pyrex/Cython really are different beasts, with different goals. > For ctypes your extensions needs to be > compiled as a shared library, The compiling isn't the key issue -- you're right, that's not too big a deal, and Scons helps. If your goal is primarily to wrap existing C code, then ctypes is a good option. But if you are trying to write new code as extension modules, then Cython helps with that a lot. You do need to "get" C, but you don't actually have to write functional stand-alone C code. > neither pyrex nor ctypes seem to be particularly user- > friendly concerning Numpy ndarrays True, though it looks like one of the goals of Cython is to make it more user-friendly to numpy arrays -- I'm really looking forward to that. I suppose an example might be in order here - does anyone have a small, but not trivial, example of an extension that could be done with both Ctypes and Cython that we could examine? By the way, I know Greg Ewing was asked about better support for numpy arrays in Pyrex, and he said "I'm *definitely* not going to re-implement C++ templates!" -- is there talk of creating a way to write extensions that could operate on numpy arrays of arbitrary type with Cython? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From william.ratcliff at gmail.com Fri Mar 7 14:41:22 2008 From: william.ratcliff at gmail.com (william ratcliff) Date: Fri, 7 Mar 2008 14:41:22 -0500 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <47D18060.2080300@noaa.gov> References: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> <47D18060.2080300@noaa.gov> Message-ID: <827183970803071141n4f7e683fod4bde43040ed649@mail.gmail.com> Will Cython be compatible with OpenMP? I tried with weave some time back and failed miserably. 
Has anyone tried with ctypes? Cheers, William On Fri, Mar 7, 2008 at 12:50 PM, Christopher Barker wrote: > Joris De Ridder wrote: > > Thanks. I've a few questions concerning the objections against ctypes. > > It's not so much an abjection (I think), but the fact that pyrex/Cython > really are different beasts, with different goals. > > > For ctypes your extensions needs to be > > compiled as a shared library, > > The compiling isn't the key issue -- you're right, that's not too big a > deal, and Scons helps. > > If your goal is primarily to wrap existing C code, then ctypes is a good > option. But if you are trying to write new code as extension modules, > then Cython helps with that a lot. You do need to "get" C, but you don't > actually have to write functional stand-alone C code. > > > neither pyrex nor ctypes seem to be particularly user- > > friendly concerning Numpy ndarrays > > True, though it looks like one of the goals of Cython is to make it more > user-friendly to numpy arrays -- I'm really looking forward to that. > > I suppose an example might be in order here - does anyone have a small, > but not trivial, example of an extension that could be done with both > Ctypes and Cython that we could examine? > > By the way, I know Greg Ewing was asked about better support for numpy > arrays in Pyrex, and he said "I'm *definitely* not going to > re-implement C++ templates!" -- is there talk of creating a way to write > extensions that could operate on numpy arrays of arbitrary type with > Cython? > > -Chris > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wright at esrf.fr Fri Mar 7 14:43:23 2008 From: wright at esrf.fr (Jon Wright) Date: Fri, 07 Mar 2008 20:43:23 +0100 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <47D18060.2080300@noaa.gov> References: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> <47D18060.2080300@noaa.gov> Message-ID: <47D19ADB.8000005@esrf.fr> Christopher Barker wrote: > By the way, I know Greg Ewing was asked about better support for numpy > arrays in Pyrex, and he said "I'm *definitely* not going to > re-implement C++ templates!" -- is there talk of creating a way to write > extensions that could operate on numpy arrays of arbitrary type with Cython? Don't forget that one of the advantages of having data type information is that you can choose an algorithm accordingly. For example, large arrays of the smaller integer types can be efficiently sorted using histograms. The idea separating the algorithm from the datatype means that (ultra-fast-optimised) things like blas, fftw etc become quite hard to program. This is straying far from the discussion of a summer of code project, which seemed like a great idea. 
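To make the histogram-sort example above concrete, here is an untested numpy sketch for uint8 data (the function name is made up):

import numpy as np

def counting_sort_uint8(a):
    # Sort small-integer data in O(n + k): histogram the values, then
    # expand the counts back out in order -- no comparisons at all.
    counts = np.bincount(a)
    return np.repeat(np.arange(len(counts), dtype=a.dtype), counts)

a = np.random.randint(0, 256, 1000000).astype(np.uint8)
assert (counting_sort_uint8(a) == np.sort(a)).all()
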
Jon From david at ar.media.kyoto-u.ac.jp Fri Mar 7 22:36:14 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 08 Mar 2008 12:36:14 +0900 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> References: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> Message-ID: <47D209AE.4010100@ar.media.kyoto-u.ac.jp> Joris De Ridder wrote: > > Thanks. I've a few questions concerning the objections against ctypes. > It's part of the Python standard library, brand new from v2.5, and it > allows creating extensions. Disregarding it, requires therefore good > arguments, I think. I trust you that there are, but I would like to > understand them better. For ctypes your extensions needs to be > compiled as a shared library, but as numpy is moving towards Scons > Please note that a full move toward scons is not likely to happen soon. It will have to be the default build system for both numpy and scipy, unless someone hacks a ctypes-based extension builder (e.g dynamically loaded library builder) for distutils. Another issue is the detection of a 3rd party library to be usable by ctypes: this should be easy to do in distutils, and not too difficult for scons. cheers, David From fperez.net at gmail.com Sat Mar 8 04:43:00 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 8 Mar 2008 01:43:00 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <827183970803071141n4f7e683fod4bde43040ed649@mail.gmail.com> References: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> <47D18060.2080300@noaa.gov> <827183970803071141n4f7e683fod4bde43040ed649@mail.gmail.com> Message-ID: On Fri, Mar 7, 2008 at 11:41 AM, william ratcliff wrote: > Will Cython be compatible with OpenMP? I tried with weave some time back > and failed miserably. Has anyone tried with ctypes? As far as I know cython has no explicit OpenMP support, but it *may* be possible to get it to generate the proper directives, using similar tricks to those that C++ wrapping uses: http://wiki.cython.org/WrappingCPlusPlus Note that this is just an idea, I haven't actually tried to do it. cheers f From fperez.net at gmail.com Sat Mar 8 05:06:58 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 8 Mar 2008 02:06:58 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: Message-ID: Hi Robin, On Fri, Mar 7, 2008 at 8:36 AM, Robin wrote: > I hadn't seen the link Ondrej provided, although the 40 hour week > seems to be a Python/PSF requirement. Prior to posting I had checked > the Google information, where they say the time commitment depends on > both the scope of your project and the requirements of your mentoring > organisation. They also say they have had successful applicants in > previous years from full-time students at non-US universities (who > don't get a summer break), so I thought it might be possible for me to > be considered. I also asked in #gsoc where I was advised 20 hours per > week would be a good baseline, again depending on the project. > > Of course, I hope to contribute to Numpy/Scipy anyway - but this > scheme would be a great way to kick-start that. > > I look forward to seeing Numpy/Scipy accepted as a mentor organisation > this year anyway, even if I am unable to take part. I don't want to mislead anyone because I'm not directly involved with the actual mentoring, so forgive any confusion I may have caused. 
My current understanding is that we just don't have the time and resources right now for numpy/scipy to be a separate mentor organization, and thus we'd go in under the PSF umbrella. In that case, we'd probably be bound to the PSF guidelines, I imagine. I offered to get the ball rolling on the cython idea because time is tight and at the Sage/Scipy meeting there was lot of interest on this topic from everyone present. But the actual mentoring will need to come from others who are much more directly involved with cython and numpy at the C API level than myself, so I'll try not to answer anything too specifically on that front to avoid spreading misinformation. Cheers, f From fperez.net at gmail.com Sat Mar 8 05:15:17 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 8 Mar 2008 02:15:17 -0800 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> References: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> Message-ID: Hi Joris, On Fri, Mar 7, 2008 at 8:10 AM, Joris De Ridder wrote: > Thanks. I've a few questions concerning the objections against ctypes. > It's part of the Python standard library, brand new from v2.5, and it > allows creating extensions. Disregarding it, requires therefore good > arguments, I think. I trust you that there are, but I would like to > understand them better. For ctypes your extensions needs to be > compiled as a shared library, but as numpy is moving towards Scons > which seem to facilitate this quite a lot, is this still a difficulty/ > objection? Secondly, looking at the examples given by Travis in his > Numpy Book, neither pyrex nor ctypes seem to be particularly user- > friendly concerning Numpy ndarrays (although ctypes does seem slightly > easier). From your email, I understand it's possibly to mediate this > for Cython. From a technical point of view, would it also be possible > to make ctypes work better with Numpy, and if yes, do you have any > idea whether it would be more or less work than for Cython? As Chris B. said, I also think that ctypes and cython are simply different, complementary tools. Cython allows you to create complete functions that can potentially run at C speed entirely, by letting you bypass some of the more dynamic (but expensive) features of python, while retaining a python-like sytnax and having to learn a lot less of the Python C API. Ctypes is pure python, so while you can access arbitrary shared libraries, it won't help you one bit if you need to write new looping code and the execution speed in pure python isn't enough. At that point if ctypes is your only tool, you'd need to write a pure C library (to the pure Python C API, with manual memory management included) and access it via ctypes. The point we're trying to reach is one where most of the extension code for numpy is Cython, to improve its long-term maintainability, to make it easier for non-experts in the C API to contribute 'low level' tools, and to open up future possibilities for fast code generation. I don't want to steal Travis' thunder, but I've heard him make some very interesting comments about his long term ideas for novel tools to express high-level routines in python/cython into highly efficient low-level representations, in a way that code written explicitly to the python C API may well make very difficult. 
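To make the "new looping code" point concrete, think of something like this (plain Python/numpy, untested): with ctypes alone you would have to push this loop down into a hand-written C library, while with Cython the plan is that you keep essentially this code and add C type declarations (cdef) for the loop indices and the array argument to get C speed.

import numpy as np

def moving_average(x, w):
    # The kind of explicit loop that is trivial to write but painfully
    # slow in pure Python: every pass goes back through the interpreter.
    out = np.empty(len(x) - w + 1)
    for i in range(len(out)):
        s = 0.0
        for j in range(w):
            s += x[i + j]
        out[i] = s / w
    return out
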
I hope this (Travis' ideas teaser and all :) provides some better perspective on the recent enthusiasm regarding cython, as a tool complementary to ctypes that could greatly benefit numpy and scipy. If it doesn't it just means I did a poor job of communicating, so keep on asking. We all really want to make sure that this is something where we reach technical consensus; the fact that Sage has been so successful with this approach is a very strong argument in favor (and they've done LOTS of non-trivial work on cython to further their goal), but we still need to ensure that the numpy/scipy community is equally on board with the decision. Cheers, f From faltet at carabos.com Sat Mar 8 05:52:43 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 8 Mar 2008 11:52:43 +0100 Subject: [Numpy-discussion] ANN: PyTables 2.0.3 released Message-ID: <200803081152.43679.faltet@carabos.com> =========================== ?Announcing PyTables 2.0.3 =========================== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. ?PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. This is a maintenance release that mainly fixes a couple of important bugs (bad update of multidimensional columns in table objects, and problems using large indexes in 32-bit platforms), some small enhancements, and most importantly, support for the latest HDF5 1.8.0 library. Also, binaries have been compiled against the latest stable version of HDF5, 1.6.7, released during the past February. ?Thanks to the broadening PyTables community for all the valuable feedback. In case you want to know more in detail what has changed in this version, have a look at ``RELEASE_NOTES.txt``. ?Find the HTML version for this document at: http://www.pytables.org/moin/ReleaseNotes/Release_2.0.3 You can download a source package of the version 2.0.3 with generated PDF and HTML docs and binaries for Windows from http://www.pytables.org/download/stable/ For an on-line version of the manual, visit: http://www.pytables.org/docs/manual-2.0.3 Migration Notes for PyTables 1.x users ====================================== If you are a user of PyTables 1.x, probably it is worth for you to look at ``MIGRATING_TO_2.x.txt`` file where you will find directions on how to migrate your existing PyTables 1.x apps to the 2.x versions. ?You can find an HTML version of this document at http://www.pytables.org/moin/ReleaseNotes/Migrating_To_2.x Resources ========= Go to the PyTables web site for more details: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ To know more about the company behind the development of PyTables, see: http://www.carabos.com/ Acknowledgments =============== Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. ?See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. ?Many thanks also to SourceForge who have helped to make and distribute this package! ?And last, but not least thanks a lot to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. 
??Enjoy Data "-" From david at ar.media.kyoto-u.ac.jp Sat Mar 8 06:26:41 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 08 Mar 2008 20:26:41 +0900 Subject: [Numpy-discussion] [ANN] numscons 0.5.1: building scipy Message-ID: <47D277F1.4030302@ar.media.kyoto-u.ac.jp> Hi, Mumscons 0.5.1 is available through pypi (eggs and tarballs). This is the first version which can build the whole scipy source tree. To build scipy with numscons, you should first get the code in the branch: svn co http://svn.scipy.org/svn/scipy/branches/build_with_scons And then build it like numpy: python setupscons.py install Technically speaking, you can build scipy with numscons above a numpy build the standard way, but that's not a good idea (because of potential libraries and compilers mismatches between distutils and numscons). See http://projects.scipy.org/scipy/numpy/wiki/NumScons for more details. The only tested platform for now are: - linux + gcc; other compilers on linux should work as well. - solaris + sunstudio with sunperf. On both those platforms, only a few tests do not pass. I don't expect windows or mac OS X to work yet, but I can not test those platforms ATM. I am releasing the current state of numscons because I won't have much time to work on numscons the next few weeks unfortunately. PLEASE DO NOT USE IT FOR PRODUCTION USE ! There are still some serious issues: - I painfully discovered that at least g77 is extremely sensitive to different orders of linker flags (can cause crashes). I don't have any problem anymore on my workstation (Ubuntu 32 bits, atlas + gcc/g77), but this needs more testing. - there are some race conditions with f2py which I do not fully understand yet, and which prevents parallel build to work (so do not use the scons command --jobs option) - optimization flags of proprietary compilers: they are a PITA. They often break IEEE conformance in quite a hard way, and this causes crashes or wrong results (for example, the -fast option of sun compilers breaks the argsort function of numpy). So again, this is really just a release for people to test things if they want, but nothing else. cheers, David From matthew.brett at gmail.com Sat Mar 8 17:10:59 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 8 Mar 2008 17:10:59 -0500 Subject: [Numpy-discussion] numpy.distutils bug, fix, comments? Message-ID: <1e2af89e0803081410y52c47278ic3c36f2b2b8de2e3@mail.gmail.com> Hi, I think I found a bug in numpy/distutils/ccompiler.py - and wanted to check that no-one has any objections before I fix it. These lines (390ff distutils.ccompiler.py) for _cc in ['msvc', 'bcpp', 'cygwinc', 'emxc', 'unixc']: _m = sys.modules.get('distutils.'+_cc+'compiler') if _m is not None: setattr(getattr(_m, _cc+'compiler'), 'gen_lib_options', gen_lib_options) occasionally cause an error with message of form module has no attribute 'unixccompiler'. As far as I can see, the line beginning '_m' can only return None, or, in my case, the distutils.unixccompiler module. Then the getattr phrase will request an attribute 'unixccompiler' from the distutils.unixccompiler module, causing an error. I'm suggesting changing the relevant line to: setattr(_m, 'gen_lib_options', Any objections? If not I'll commit soon... Matthew From robert.kern at gmail.com Sat Mar 8 17:35:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 8 Mar 2008 16:35:00 -0600 Subject: [Numpy-discussion] numpy.distutils bug, fix, comments? 
In-Reply-To: <1e2af89e0803081410y52c47278ic3c36f2b2b8de2e3@mail.gmail.com> References: <1e2af89e0803081410y52c47278ic3c36f2b2b8de2e3@mail.gmail.com> Message-ID: <3d375d730803081435s64b9c818k2da01bdb7afd9e8e@mail.gmail.com> On Sat, Mar 8, 2008 at 4:10 PM, Matthew Brett wrote: > Hi, > > I think I found a bug in numpy/distutils/ccompiler.py - and wanted to > check that no-one has any objections before I fix it. > > These lines (390ff distutils.ccompiler.py) > > for _cc in ['msvc', 'bcpp', 'cygwinc', 'emxc', 'unixc']: > _m = sys.modules.get('distutils.'+_cc+'compiler') > if _m is not None: > setattr(getattr(_m, _cc+'compiler'), 'gen_lib_options', > gen_lib_options) > > occasionally cause an error with message of form module has no > attribute 'unixccompiler'. > > As far as I can see, the line beginning '_m' can only return None, or, > in my case, the > distutils.unixccompiler module. Then the getattr phrase will request > an attribute 'unixccompiler' from the distutils.unixccompiler module, > causing an error. > > I'm suggesting changing the relevant line to: > > setattr(_m, 'gen_lib_options', > > Any objections? If not I'll commit soon... I believe you are correct. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From vfulco1 at gmail.com Sat Mar 8 19:19:17 2008 From: vfulco1 at gmail.com (Vince Fulco) Date: Sat, 8 Mar 2008 19:19:17 -0500 Subject: [Numpy-discussion] Slice and assign into new NDarray... Message-ID: <34f2770f0803081619p2e5d8acld4aa7a7ea9a4a60b@mail.gmail.com> * This may be a dupe as gmail hotkeys sent a draft prematurely... After scouring material and books I remain stumped with this one as a new Numpy user- I have an ND array with shape (10,15) and want to slice or subset(?) the data into a new 2D array with the following criteria: 1) Separate each 5 observations along axis=0 (row) and transpose them to the new array with shape (50,3) Col1 Co2 Col3 Slice1 Slice2 Slice3 ... ... ... Slice1 should have the coordinates[0:5,0], Slice2[0:5,1] and so on...I've tried initializing the target ND array with D = NP.zeros((50,3), dtype='int') and then assigning into it with something like: for s in xrange(original_array.shape[0]): D= NP.transpose([data[s,i:i+step] for i in range(0,data.shape[1], step)]) with step = 5 but I get errors i.e. IndexError: invalid index Also tried various combos of explicitly referencing D coordinates but to no avail. TIA, Vince Fulco From peridot.faceted at gmail.com Sat Mar 8 21:02:12 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 8 Mar 2008 21:02:12 -0500 Subject: [Numpy-discussion] Slice and assign into new NDarray... In-Reply-To: <34f2770f0803081619p2e5d8acld4aa7a7ea9a4a60b@mail.gmail.com> References: <34f2770f0803081619p2e5d8acld4aa7a7ea9a4a60b@mail.gmail.com> Message-ID: On 08/03/2008, Vince Fulco wrote: > I have an ND array with shape (10,15) and want to slice or subset(?) the data > into a new 2D array with the following criteria: > > 1) Separate each 5 observations along axis=0 (row) and transpose them to > the new array with shape (50,3) > > > Col1 Co2 Col3 > > Slice1 Slice2 Slice3 > ... > ... > ... 
> > Slice1 should have the coordinates[0:5,0], Slice2[0:5,1] and so > on...I've tried initializing the target ND array with > > D = NP.zeros((50,3), dtype='int') and then assigning into it with > something like: > > for s in xrange(original_array.shape[0]): > D= NP.transpose([data[s,i:i+step] for i in range(0,data.shape[1], step)]) > > with step = 5 but I get errors i.e. IndexError: invalid index > > Also tried various combos of explicitly referencing D coordinates but > to no avail. You're not going to get a slice - in the sense of a view on the same underlying array, and through which you can modify the original array - but this is perfectly possible without for loops. First set up the array: In [12]: a = N.arange(150) In [13]: a = N.reshape(a, (-1,15)) You can check that the values are sensible. Now reshape it so that you can split up slice1, slice2, and slice3: In [14]: b = N.reshape(a, (-1, 3, 5)) slice1 is b[:,0,:]. Now we want to flatten the first and third coordinates together. reshape() doesn't do that, exactly, but if we swap the axes around we can use reshape to put them together: In [15]: c = N.reshape(b.swapaxes(1,2),(-1,3)) This reshape necessarily involves copying the original array. You can check that it gives you the value you want. I recommend reading http://www.scipy.org/Numpy_Functions_by_Category for all those times you know what you want to do but can't find the function to make numpy do it. Anne From mani.sabri at gmail.com Sun Mar 9 04:57:19 2008 From: mani.sabri at gmail.com (mani sabri) Date: Sun, 9 Mar 2008 12:27:19 +0330 Subject: [Numpy-discussion] Create a numpy array from an array of a C structure Message-ID: <47d3a6bb.06c8100a.4ae5.ffffca9d@mx.google.com> Hello Is it possible to create a numpy array from an array of a C structure like this? struct RateInfo { unsigned int ctm; double open; double low; double high; double close; double vol; }; I am embedding python in a financial application and I have an array of this structure that I want to perform some statistical computations on it. Best regards, Mani Sabri From robert.kern at gmail.com Sun Mar 9 05:15:19 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 9 Mar 2008 03:15:19 -0600 Subject: [Numpy-discussion] Create a numpy array from an array of a C structure In-Reply-To: <47d3a6bb.06c8100a.4ae5.ffffca9d@mx.google.com> References: <47d3a6bb.06c8100a.4ae5.ffffca9d@mx.google.com> Message-ID: <3d375d730803090115r460c4421n1c3908cdb98205c7@mail.gmail.com> On Sun, Mar 9, 2008 at 2:57 AM, mani sabri wrote: > Hello > > Is it possible to create a numpy array from an array of a C structure like > this? > > struct RateInfo > { > unsigned int ctm; > double open; > double low; > double high; > double close; > double vol; > }; Sure. On the numpy side, you would make an record array with the appropriate dtype and size. In [1]: from numpy import * In [2]: dt = dtype([('ctm', uint), ('open', double), ('low', double), ('high', double), ('close', double), ('vol', double)]) In [3]: a = empty(10, dtype=dt) On the C side, you would iterate through your C array and your numpy array and just assign elements from the one to the other. If you have a contiguous C array, you could also just use memcpy(). This is probably reliable because all of your struct members take up multiples of 4 bytes and most C compilers will pack those without any space between them. If you were mixing, say, chars and doubles, the C compiler may try to align the doubles on a 4-byte boundary (or possibly another boundary, but 4-bytes is common). 
In that case, you will have to figure out how your C compiler is packing the member and emulate that in your dtype. Each of the tuples in the constructor can have a third element which represents the byte offset of that member from the beginning of the struct. In [4]: dt2 = dtype([('ctm', uint, 0), ('open', double, 4), ('low', double, 12), ('high', double, 20), ('close', double, 28), ('vol', double, 36)]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mani.sabri at gmail.com Sun Mar 9 05:24:35 2008 From: mani.sabri at gmail.com (mani sabri) Date: Sun, 9 Mar 2008 12:54:35 +0330 Subject: [Numpy-discussion] Create a numpy array from an array of a Cstructure In-Reply-To: <3d375d730803090115r460c4421n1c3908cdb98205c7@mail.gmail.com> Message-ID: <47d3ad21.0ec5100a.0433.1641@mx.google.com> I don't want to disturb the list with this kind of crap but I can't hold my self to tell how much I love you guys! >-----Original Message----- >From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- >bounces at scipy.org] On Behalf Of Robert Kern >Sent: Sunday, March 09, 2008 12:45 PM >To: Discussion of Numerical Python >Subject: Re: [Numpy-discussion] Create a numpy array from an array of a >Cstructure > >On Sun, Mar 9, 2008 at 2:57 AM, mani sabri wrote: >> Hello >> >> Is it possible to create a numpy array from an array of a C structure >like >> this? >> >> struct RateInfo >> { >> unsigned int ctm; >> double open; >> double low; >> double high; >> double close; >> double vol; >> }; > >Sure. On the numpy side, you would make an record array with the >appropriate dtype and size. > >In [1]: from numpy import * > >In [2]: dt = dtype([('ctm', uint), ('open', double), ('low', double), >('high', double), ('close', double), ('vol', double)]) > >In [3]: a = empty(10, dtype=dt) > > >On the C side, you would iterate through your C array and your numpy >array and just assign elements from the one to the other. If you have >a contiguous C array, you could also just use memcpy(). > >This is probably reliable because all of your struct members take up >multiples of 4 bytes and most C compilers will pack those without any >space between them. If you were mixing, say, chars and doubles, the C >compiler may try to align the doubles on a 4-byte boundary (or >possibly another boundary, but 4-bytes is common). In that case, you >will have to figure out how your C compiler is packing the member and >emulate that in your dtype. Each of the tuples in the constructor can >have a third element which represents the byte offset of that member >from the beginning of the struct. > >In [4]: dt2 = dtype([('ctm', uint, 0), ('open', double, 4), ('low', >double, 12), ('high', double, 20), ('close', double, 28), ('vol', >double, 36)]) > >-- >Robert Kern > >"I have come to believe that the whole world is an enigma, a harmless >enigma that is made terrible by our own mad attempt to interpret it as >though it had an underlying truth." 
> -- Umberto Eco >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at scipy.org >http://projects.scipy.org/mailman/listinfo/numpy-discussion From giorgio at gilestro.tk Sun Mar 9 13:35:27 2008 From: giorgio at gilestro.tk (Giorgio F. Gilestro) Date: Sun, 09 Mar 2008 12:35:27 -0500 Subject: [Numpy-discussion] behavior of masked arrays In-Reply-To: <200803071237.57208.pgmdevlist@gmail.com> References: <47D157BB.90003@gilestro.tk> <200803071037.05524.pgmdevlist@gmail.com> <47D17A79.1070103@gilestro.tk> <200803071237.57208.pgmdevlist@gmail.com> Message-ID: <47D41FDF.4030504@gilestro.tk> Ok, generic functions and a ma.stats specific module sound very good to me. I hope that is going to happen; for ma it would be a great plus.
Pierre, I did some adjusting to some of the functions in scipy.stats.stats and more I am planning to do - not all but those I'll need I am afraid. Is it ok if I send you what I'll have so that you have a look at it (at your convenience) and maybe integrate it to numpy.ma.mstats? For the moment the only issues I met are: - some functions require to know N, the number of elements on which we are performing the operation. A simple N.shape[axis] won't work but there is no native method returning the number of unmasked elements on a given axis (maybe there should be?). So I am using instead N = a.shape[axis] - a.mask.sum(axis) - some functions need to handle float data. The float method on masked array will raise an exception (why so?) so I am either introducing float constant where possible e.g. svar = ((n-1)*v) / float(df) becomes svar = ((n-1.0)*v) / df or multiply by 1.0 Pierre GM wrote: > On Friday 07 March 2008 12:25:13 Giorgio F. Gilestro wrote: >> Ok, I see, thank you Pierre. >> I thought scipy.stats would have been a widely used extension so I >> didn't really consider the trivial possibility that simply wasn't >> compatible with ma yet. > > Partly my fault here, as I should have ported more functions. Blame the > fact that working on an open-source project doesn't translate in > publications, and that my bosses are shortening the leash.... > Note that most (all?) of the functions in scipy.stats never supported masked > arrays in the first place anyway. Now that MaskedArray is just a subclass of > ndarray, porting the functions should be easier. > >> I had a quick look at the code and it really seems that ma handling can >> be achieved by replacing np.asarray with np.ma.asarray, and some >> functions with their methods (like ravel) here and there. > > Yes and no. I'd prefer to use numpy.asanyarray as to avoid converting ndarrays > to masked arrays, and use methods as much as possible. Of course, there's > gonna be some particular cases to handle (as when all the data are masked), > but that should be relatively painless. > > Another issue is where to store the new functions: should we try to ensure > full compatibility of scipy.stats with masked arrays? Create a new module > scipy.mstats instead, that we'd fill up with time ? I'd be keener on the > second approach, as we could move most of the functions currently in > numpy.ma.m(ore)stats to this new module, and that'd probably less work at > once... > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Sun Mar 9 13:40:09 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 9 Mar 2008 13:40:09 -0400 Subject: [Numpy-discussion] behavior of masked arrays In-Reply-To: <47D41FDF.4030504@gilestro.tk> References: <47D157BB.90003@gilestro.tk> <200803071237.57208.pgmdevlist@gmail.com> <47D41FDF.4030504@gilestro.tk> Message-ID: <200803091340.10185.pgmdevlist@gmail.com> On Sunday 09 March 2008 13:35:27 Giorgio F. Gilestro wrote: > Pierre, I did some adjusting to some of the functions in > scipy.stats.stats and more I am planning to do - not all but those I'll > need I am afraid. Is it ok if I send you what I'll have so that you have > a look at it (at your convenience) and maybe integrate it to > numpy.ma.mstats? Sure, no problem. I foresee a reorganization of numpy.ma.mstats in the near future, with most functions being sent to a scipy.stats.mstats package instead. 
mmedian would be introduced in core for compatibility with numpy. > For the moment the only issues I met are: > > - some functions require to know N, the number of elements on which we > are performing the operation. A simple N.shape[axis] won't work but > there is no native method returning the number of unmasked elements on a > given axis (maybe there should be?). So I am using instead > > N = a.shape[axis] - a.mask.sum(axis) Well, you can count the number of missing values along a given axis with self.count(axis), so the number of unmasked values is simply self.shape[axis]-self.count(axis) > - some functions need to handle float data. The float method on masked > array will raise an exception (why so?) so I am either introducing float > constant where possible Mmh, what float method ? If you're using the regular float function, that should work on 0d arrays, with nan being returned if you have a masked values. From cournapeau at cslab.kecl.ntt.co.jp Sun Mar 9 23:49:00 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Mon, 10 Mar 2008 12:49:00 +0900 Subject: [Numpy-discussion] Will f2py ever be used in numpy ? Message-ID: <1205120940.25618.3.camel@bbc8> Hi, I have some problems with the f2py tool for numscons, or more exactly, some limitations related to thread handling in python means that I may not be able to reliably use f2py as a python module in numscons in a thread-safe manner. So I am thinking about using f2py from the command-line, but for this to work, f2py needs to be installed. IOW, if at some point, we want to use f2py for numpy (bootstrap), this won't work. Is it something I should take into account, or not ? cheers, David From robert.kern at gmail.com Mon Mar 10 00:11:55 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 9 Mar 2008 23:11:55 -0500 Subject: [Numpy-discussion] Will f2py ever be used in numpy ? In-Reply-To: <1205120940.25618.3.camel@bbc8> References: <1205120940.25618.3.camel@bbc8> Message-ID: <3d375d730803092111s4bbee5a9n138b43f6b30b8095@mail.gmail.com> On Sun, Mar 9, 2008 at 10:49 PM, David Cournapeau wrote: > Hi, > > I have some problems with the f2py tool for numscons, or more exactly, > some limitations related to thread handling in python means that I may > not be able to reliably use f2py as a python module in numscons in a > thread-safe manner. So I am thinking about using f2py from the > command-line, but for this to work, f2py needs to be installed. IOW, if > at some point, we want to use f2py for numpy (bootstrap), this won't > work. Is it something I should take into account, or not ? Almost certainly f2py will never be used to build any part of numpy itself because we will not include something that requires a FORTRAN compiler to build numpy. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From roygeorget at gmail.com Mon Mar 10 01:37:05 2008 From: roygeorget at gmail.com (royG) Date: Sun, 9 Mar 2008 22:37:05 -0700 (PDT) Subject: [Numpy-discussion] eigenvector and eigenface Message-ID: <461f32ad-0caf-42e4-955d-3481947e4964@e23g2000prf.googlegroups.com> friends I am learning eigenfaces using numpy . i use data from N images and create eigenvectors to get a 'sorted eigenvectors' array of size N X N. when i project the 'zero mean imagedata' i will get a facespace array of N X numpixels. 
(where numpixels is total pixels in one image) is eigenface the same as eigenvector? some of the docs i read(pissarenko-Eigenface-based facial recognition), use these two words to mean the same thing..but when i look at the dimensions of 'sorted eigenvectors' array it is only NXN and i don't know how i can make images out of it representing eigenfaces. on the other hand the projection of 'zero mean imagedata' on eigenvectors by using numpy.dot(eigenvectors,zeromeanimagedata) can make an array of N X numpixels .I believe this is what is known as the facespace .is this what represents the eigenface images ? will be thankful for any expert opinion on this.. RG From charlesr.harris at gmail.com Mon Mar 10 01:54:34 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 9 Mar 2008 23:54:34 -0600 Subject: [Numpy-discussion] Create a numpy array from an array of a C structure In-Reply-To: <47d3a6bb.06c8100a.4ae5.ffffca9d@mx.google.com> References: <47d3a6bb.06c8100a.4ae5.ffffca9d@mx.google.com> Message-ID: On Sun, Mar 9, 2008 at 2:57 AM, mani sabri wrote: > Hello > > Is it possible to create a numpy array from an array of a C structure like > this? > > struct RateInfo > { > unsigned int ctm; > double open; > double low; > double high; > double close; > double vol; > }; You might have an alignment problem if unsigned int is of different size than double and depending on the architecture and whether or not the OS is 64 bit. C compilers like to add spaces so that each variable is efficiently aligned and as a result C structures tend to be non-portable and should be avoided for data storage and transport. It helps a bit if you place the longest variables first in the structure but there are no guarantees. You should at least check the size of the structure to see if it is packed or not. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.skomoroch at gmail.com Mon Mar 10 02:08:44 2008 From: peter.skomoroch at gmail.com (Peter Skomoroch) Date: Mon, 10 Mar 2008 02:08:44 -0400 Subject: [Numpy-discussion] eigenvector and eigenface In-Reply-To: <461f32ad-0caf-42e4-955d-3481947e4964@e23g2000prf.googlegroups.com> References: <461f32ad-0caf-42e4-955d-3481947e4964@e23g2000prf.googlegroups.com> Message-ID: See this thread: http://www.mail-archive.com/numpy-discussion at scipy.org/msg06877.html On Mon, Mar 10, 2008 at 1:37 AM, royG wrote: > friends > I am learning eigenfaces using numpy . i use data from N images and > create eigenvectors to get a 'sorted eigenvectors' array of size N X > N. when i project the 'zero mean imagedata' i will get a facespace > array of N X numpixels. (where numpixels is total pixels in one image) > > is eigenface the same as eigenvector? some of the docs i > read(pissarenko-Eigenface-based facial recognition), use these two > words to mean the same thing..but when i look at the dimensions of > 'sorted eigenvectors' array > it is only NXN and i don't know how i can make images out of it > representing eigenfaces. > > on the other hand the projection of 'zero mean imagedata' on > eigenvectors by using numpy.dot(eigenvectors,zeromeanimagedata) can > make an array of N X numpixels > .I believe this is what is known as the facespace .is this what > represents the eigenface images ? > > will be thankful for any expert opinion on this.. > RG > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Peter N. 
Skomoroch peter.skomoroch at gmail.com http://www.datawrangling.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournapeau at cslab.kecl.ntt.co.jp Mon Mar 10 02:49:30 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Mon, 10 Mar 2008 15:49:30 +0900 Subject: [Numpy-discussion] Will f2py ever be used in numpy ? In-Reply-To: <3d375d730803092111s4bbee5a9n138b43f6b30b8095@mail.gmail.com> References: <1205120940.25618.3.camel@bbc8> <3d375d730803092111s4bbee5a9n138b43f6b30b8095@mail.gmail.com> Message-ID: <1205131770.25618.4.camel@bbc8> On Sun, 2008-03-09 at 23:11 -0500, Robert Kern wrote: > > Almost certainly f2py will never be used to build any part of numpy > itself because we will not include something that requires a FORTRAN > compiler to build numpy. Can't f2py be used to wrap C code, too ? cheers, David From robert.kern at gmail.com Mon Mar 10 03:11:33 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 10 Mar 2008 02:11:33 -0500 Subject: [Numpy-discussion] Will f2py ever be used in numpy ? In-Reply-To: <1205131770.25618.4.camel@bbc8> References: <1205120940.25618.3.camel@bbc8> <3d375d730803092111s4bbee5a9n138b43f6b30b8095@mail.gmail.com> <1205131770.25618.4.camel@bbc8> Message-ID: <3d375d730803100011i3fa1f654s559f87fdca8148f3@mail.gmail.com> On Mon, Mar 10, 2008 at 1:49 AM, David Cournapeau wrote: > On Sun, 2008-03-09 at 23:11 -0500, Robert Kern wrote: > > > > Almost certainly f2py will never be used to build any part of numpy > > itself because we will not include something that requires a FORTRAN > > compiler to build numpy. > > Can't f2py be used to wrap C code, too ? Yes, but it's probably going to be easier to wrap whatever by hand than try to ensure that f2py bootstraps correctly, scons or no scons. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From roygeorget at gmail.com Mon Mar 10 03:17:58 2008 From: roygeorget at gmail.com (royG) Date: Mon, 10 Mar 2008 00:17:58 -0700 (PDT) Subject: [Numpy-discussion] dot() instead of tensordot() Message-ID: hi can numpy.dot() be used instead of tensordot()? is there any performance difference? I am talking about multipln btw numpy arrays of dimensions 50 X 20,000 where elements are of float type. RG From cournapeau at cslab.kecl.ntt.co.jp Mon Mar 10 04:45:19 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Mon, 10 Mar 2008 17:45:19 +0900 Subject: [Numpy-discussion] Will f2py ever be used in numpy ? In-Reply-To: <3d375d730803100011i3fa1f654s559f87fdca8148f3@mail.gmail.com> References: <1205120940.25618.3.camel@bbc8> <3d375d730803092111s4bbee5a9n138b43f6b30b8095@mail.gmail.com> <1205131770.25618.4.camel@bbc8> <3d375d730803100011i3fa1f654s559f87fdca8148f3@mail.gmail.com> Message-ID: <1205138719.25618.12.camel@bbc8> On Mon, 2008-03-10 at 02:11 -0500, Robert Kern wrote: > > Yes, but it's probably going to be easier to wrap whatever by hand > than try to ensure that f2py bootstraps correctly, scons or no scons. > Ok, thanks. Some last questions regarding f2py: - does it make any difference to use it from the command line (executing it through the shell) compared to using it by importing the module (import numpy.f2py), as long as I am making sure I use the right executable ? - Would it be possible to add a facility to f2py to get the executable name from the module ? 
Something like sys.executable, but for f2py ? (If it is ok, I can add the facility myself) cheers, David From robert.kern at gmail.com Mon Mar 10 04:57:54 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 10 Mar 2008 03:57:54 -0500 Subject: [Numpy-discussion] Will f2py ever be used in numpy ? In-Reply-To: <1205138719.25618.12.camel@bbc8> References: <1205120940.25618.3.camel@bbc8> <3d375d730803092111s4bbee5a9n138b43f6b30b8095@mail.gmail.com> <1205131770.25618.4.camel@bbc8> <3d375d730803100011i3fa1f654s559f87fdca8148f3@mail.gmail.com> <1205138719.25618.12.camel@bbc8> Message-ID: <3d375d730803100157m1f55ebdai598a05b426d9add6@mail.gmail.com> On Mon, Mar 10, 2008 at 3:45 AM, David Cournapeau wrote: > On Mon, 2008-03-10 at 02:11 -0500, Robert Kern wrote: > > > > > Yes, but it's probably going to be easier to wrap whatever by hand > > than try to ensure that f2py bootstraps correctly, scons or no scons. > > > > Ok, thanks. Some last questions regarding f2py: > - does it make any difference to use it from the command line > (executing it through the shell) compared to using it by importing the > module (import numpy.f2py), as long as I am making sure I use the right > executable ? Depends on exactly what you are doing with numpy.f2py. The Python API is (by logical necessity) more capable than the executable. > - Would it be possible to add a facility to f2py to get the executable > name from the module ? Something like sys.executable, but for f2py ? (If > it is ok, I can add the facility myself) No, because the module knows nothing about where an executable might be installed. There might be several executables, in fact. It would be better to just execute [sys.executable, '-c', 'from numpy.f2py.f2py2e import main;main()'] -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Mon Mar 10 07:43:46 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Mar 2008 05:43:46 -0600 Subject: [Numpy-discussion] dot() instead of tensordot() In-Reply-To: References: Message-ID: On Mon, Mar 10, 2008 at 1:17 AM, royG wrote: > hi > can numpy.dot() be used instead of tensordot()? is there any > performance difference? I am talking about multipln btw numpy arrays > of dimensions 50 X 20,000 where elements are of float type. > Dot is the usual matrix multiplication operator, tensordot extends it to allow contraction on an arbitrary set of indices. If you don't need that capability just use dot. I suspect dot might be a bit faster, but in your case the call overhead is probably negligible relative to the computation time. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Joris.DeRidder at ster.kuleuven.be Mon Mar 10 08:46:28 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Mon, 10 Mar 2008 13:46:28 +0100 Subject: [Numpy-discussion] Numpy/Cython Google Summer of Code project idea In-Reply-To: References: <520CD99F-9096-4806-B353-3D61AF74CBCF@ster.kuleuven.be> Message-ID: <1275E793-99D1-45C4-B26B-93657466DABA@ster.kuleuven.be> Hi Fernando, > I hope this (Travis' ideas teaser and all :) provides some better > perspective on the recent enthusiasm regarding cython, as a tool > complementary to ctypes that could greatly benefit numpy and scipy. > If it doesn't it just means I did a poor job of communicating, Nope, you did a great job! 
Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From faltet at carabos.com Mon Mar 10 13:08:41 2008 From: faltet at carabos.com (Francesc Altet) Date: Mon, 10 Mar 2008 18:08:41 +0100 Subject: [Numpy-discussion] On Numexpr and uint64 type Message-ID: <200803101808.42126.faltet@carabos.com> Hi, In order to allow in-kernel queries in PyTables (www.pytables.org) to work with unsigned 64-bit integers, we would like to see uint64 support in Numexpr (http://code.google.com/p/numexpr/). To do this, we have to decide first how uint64 interacts with other types. For example, which should be the outcome of: numpy.array([1], 'int64') / numpy.array([2], 'uint64') Basically, there are a couple of possibilities: 1) To follow the behaviour of NumPy and upcast both operands to float64 and do the operation. That is: In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') Out[21]: array([ 0.5]) 2) Implement support for uint64 as a non-upcastable type, so that one cannot merge uint64 operands with other types. That is: In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') Out[21]: TypeError: unsupported operand type(s) for /: 'int64' and 'uint64' Solution 1) is appealing because it is how NumPy works, but I don't personally like the upcasting to float64. First of all, because you transparently convert numbers, potentially losing the least significant digits. Second, because an operation between integers gives a float as a result, and this is different from typical programming languages. Solution 2) addresses the shortcomings of solution 1), but introduces the problem that uint64 can only operate in conjunction with other uint64 operands, making it practically an 'isolated' type (much like a string type). We are mostly inclined to implement 2) behaviour, but before proceeding, I'd like to know what other people think about this. Thanks, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From charlesr.harris at gmail.com Mon Mar 10 13:27:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Mar 2008 11:27:37 -0600 Subject: [Numpy-discussion] On Numexpr and uint64 type In-Reply-To: <200803101808.42126.faltet@carabos.com> References: <200803101808.42126.faltet@carabos.com> Message-ID: On Mon, Mar 10, 2008 at 11:08 AM, Francesc Altet wrote: > Hi, > > In order to allow in-kernel queries in PyTables (www.pytables.org) > work with unsigned 64-bit integers, we would like to see uint64 > support in Numexpr (http://code.google.com/p/numexpr/). > > To do this, we have to decide first how uint64 interacts with other > types. For example, which should be the outcome of: > > numpy.array([1], 'int64') / numpy.array([2], 'uint64') > > Basically, there are a couple of possibilities: > > 1) To follow the behaviour of NumPy and upcast both operands to > float64 and do the operation. That is: > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') > Out[21]: array([ 0.5]) > > 2) Implement support for uint64 as a non-upcastable type, so that > one cannot merge uint64 operands with other types. That is: > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') > Out[21]: TypeError: unsupported operand type(s) for /: 'int64' > and 'uint64' > > Solution 1) is appealing because is how NumPy works, but I don't > personally like the upcasting to float64. First of all, because > you transparently convert numbers potentially loosing the least > significant digits.
Second, because an operation between integers gives a float as > a result, and this is different for typical programming languages. > I don't like the up(down)casting either. I suspect the original justification was preserving precision, but it doesn't do that. Addition of signed and unsinged numbers are the same in modular arithmetic, so simply treating everything as uint64 would, IMHO, be the best option there and for multiplication. Not everything has a modular inverse, but truncation is the C solution in that case. The question seems to be whether to return a signed or unsigned integer. Hmm. I would go for unsigned, which could be converted to signed by casting. The sign of the remainder might be a problem, though, which would give unusual truncation behavior. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Mon Mar 10 14:50:55 2008 From: faltet at carabos.com (Francesc Altet) Date: Mon, 10 Mar 2008 19:50:55 +0100 Subject: [Numpy-discussion] On Numexpr and uint64 type In-Reply-To: References: <200803101808.42126.faltet@carabos.com> Message-ID: <200803101950.55527.faltet@carabos.com> A Monday 10 March 2008, Charles R Harris escrigu?: > On Mon, Mar 10, 2008 at 11:08 AM, Francesc Altet wrote: > > Hi, > > > > In order to allow in-kernel queries in PyTables (www.pytables.org) > > work with unsigned 64-bit integers, we would like to see uint64 > > support in Numexpr (http://code.google.com/p/numexpr/). > > > > To do this, we have to decide first how uint64 interacts with other > > types. For example, which should be the outcome of: > > > > numpy.array([1], 'int64') / numpy.array([2], 'uint64') > > > > Basically, there are a couple of possibilities: > > > > 1) To follow the behaviour of NumPy and upcast both operands to > > float64 and do the operation. That is: > > > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') > > Out[21]: array([ 0.5]) > > > > 2) Implement support for uint64 as a non-upcastable type, so that > > one cannot merge uint64 operands with other types. That is: > > > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') > > Out[21]: TypeError: unsupported operand type(s) for /: 'int64' > > and 'uint64' > > > > Solution 1) is appealing because is how NumPy works, but I don't > > personally like the upcasting to float64. First of all, because > > you transparently convert numbers potentially loosing the least > > significant digits. Second, because an operation between integers > > gives a float as a result, and this is different for typical > > programming languages. > > I don't like the up(down)casting either. I suspect the original > justification was preserving precision, but it doesn't do that. > Addition of signed and unsinged numbers are the same in modular > arithmetic, so simply treating everything as uint64 would, IMHO, be > the best option there and for multiplication. Not everything has a > modular inverse, but truncation is the C solution in that case. The > question seems to be whether to return a signed or unsigned integer. > Hmm. I would go for unsigned, which could be converted to signed by > casting. The sign of the remainder might be a problem, though, which > would give unusual truncation behavior. Mmm, yes. We've already considered converting all operands to uint64 first too, and have an uint64 as an outcome too, but realized that we could have some difficulties when doing boolean comparisons in Numexpr. 
For example, if a is an int64 and b is uint64, and we want to compute "a + b", we could have: In [44]: a = numpy.array([-4], 'int64') In [45]: b = numpy.array([2], 'uint64') In [46]: c = a.astype('uint64') + b.astype('uint64') In [47]: c Out[47]: array([18446744073709551614], dtype=uint64) In [48]: c.astype('int64') Out[48]: array([-2], dtype=int64) # in case we want signed integers The difficulty that we observed is that the expression 'a + b < 0' (i.e. checking for signedness) could surprise the unexperienced user (this would be evaluated as false because the outcome of a + b is unsigned). Having said that, this approach is completely consistent and, if properly documented, could be a nice way to implement uint64 for Numexpr case. D. Cooke or T. Hochberg have something to say to that regard? Thanks, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From tim.hochberg at ieee.org Mon Mar 10 16:12:54 2008 From: tim.hochberg at ieee.org (Timothy Hochberg) Date: Mon, 10 Mar 2008 13:12:54 -0700 Subject: [Numpy-discussion] On Numexpr and uint64 type In-Reply-To: <200803101950.55527.faltet@carabos.com> References: <200803101808.42126.faltet@carabos.com> <200803101950.55527.faltet@carabos.com> Message-ID: On Mon, Mar 10, 2008 at 11:50 AM, Francesc Altet wrote: > A Monday 10 March 2008, Charles R Harris escrigu?: > > On Mon, Mar 10, 2008 at 11:08 AM, Francesc Altet > wrote: > > > Hi, > > > > > > In order to allow in-kernel queries in PyTables (www.pytables.org) > > > work with unsigned 64-bit integers, we would like to see uint64 > > > support in Numexpr (http://code.google.com/p/numexpr/). > > > > > > To do this, we have to decide first how uint64 interacts with other > > > types. For example, which should be the outcome of: > > > > > > numpy.array([1], 'int64') / numpy.array([2], 'uint64') > > > > > > Basically, there are a couple of possibilities: > > > > > > 1) To follow the behaviour of NumPy and upcast both operands to > > > float64 and do the operation. That is: > > > > > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') > > > Out[21]: array([ 0.5]) > > > > > > 2) Implement support for uint64 as a non-upcastable type, so that > > > one cannot merge uint64 operands with other types. That is: > > > > > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64') > > > Out[21]: TypeError: unsupported operand type(s) for /: 'int64' > > > and 'uint64' > > > > > > Solution 1) is appealing because is how NumPy works, but I don't > > > personally like the upcasting to float64. First of all, because > > > you transparently convert numbers potentially loosing the least > > > significant digits. Second, because an operation between integers > > > gives a float as a result, and this is different for typical > > > programming languages. > > > > I don't like the up(down)casting either. I suspect the original > > justification was preserving precision, but it doesn't do that. > > Addition of signed and unsinged numbers are the same in modular > > arithmetic, so simply treating everything as uint64 would, IMHO, be > > the best option there and for multiplication. Not everything has a > > modular inverse, but truncation is the C solution in that case. The > > question seems to be whether to return a signed or unsigned integer. > > Hmm. I would go for unsigned, which could be converted to signed by > > casting. The sign of the remainder might be a problem, though, which > > would give unusual truncation behavior. > > Mmm, yes. 
We've already considered converting all operands to uint64 > first too, and have an uint64 as an outcome too, but realized that we > could have some difficulties when doing boolean comparisons in Numexpr. > For example, if a is an int64 and b is uint64, and we want to > compute "a + b", we could have: > > In [44]: a = numpy.array([-4], 'int64') > > In [45]: b = numpy.array([2], 'uint64') > > In [46]: c = a.astype('uint64') + b.astype('uint64') > > In [47]: c > Out[47]: array([18446744073709551614], dtype=uint64) > > In [48]: c.astype('int64') > Out[48]: array([-2], dtype=int64) # in case we want signed integers > > The difficulty that we observed is that the expression 'a + b < 0' (i.e. > checking for signedness) could surprise the unexperienced user (this > would be evaluated as false because the outcome of a + b is unsigned). > Having said that, this approach is completely consistent and, if > properly documented, could be a nice way to implement uint64 for > Numexpr case. > > D. Cooke or T. Hochberg have something to say to that regard? Without a compelling use case, we should try to avoid subtly different semantics for numexpr and numpy. I'm fine with option #2 since that will generally result in an unsubtle difference (aka, an exception), but casting everything to uint64 seems questionable. Another option, that sounds good to me, at least at first glance, is implement #2, but expose casting operators from uint64->int64 and vice-versa. I would spell them as int64 and uint64 since that already works in numpy. Then one could efficiently perform mixed operations if needed, for example "a + uint64(b)", but not have the potential pitfalls of automatic casting. That's my rapidly depreciating $.02 anyway. -- . __ . |-\ . . tim.hochberg at ieee.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From bolme1234 at comcast.net Mon Mar 10 19:24:04 2008 From: bolme1234 at comcast.net (David Bolme) Date: Mon, 10 Mar 2008 17:24:04 -0600 Subject: [Numpy-discussion] PCA on set of face images In-Reply-To: References: Message-ID: The steps you describe here are correct. I am putting together an open source computer vision library based on numpy/scipy. It will include an automatic PCA algorithm with face detection, eye detection, PCA dimensionally reduction, and distance measurement. If you are interested let me know and I will redouble my efforts to release the code soon. Dave On Feb 29, 2008, at 12:15 PM, devnew at gmail.com wrote: > 1.represent matrix of face images data > 2.find the adjusted matrix by substracting the mean face > 3.calculate covariance matrix (cov=A* A_transpose) where A is from > step2 > 4.find eigenvectors and select those with highest eigenvalues > 5.calculate facespace=eigenvectors*A > From millman at berkeley.edu Tue Mar 11 02:10:52 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Mon, 10 Mar 2008 23:10:52 -0700 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> <1204780121.25137.2.camel@bbc8> Message-ID: On Wed, Mar 5, 2008 at 10:44 PM, Charles R Harris wrote: > Hmm. Well, it's in now. I have a 32 bit xeon at work and numpy fails one > test and warns on another, so that might be a related problem. I'll give > things a try and see what happens. I would think things should fail rather > spectacularly if the system was misidentified and that isn't the case > currently. 
Hey Chuck, Is your 32 bit Xeon machine still failing a NumPy test and warning on another? Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From petyuk at gmail.com Tue Mar 11 02:11:58 2008 From: petyuk at gmail.com (Vladislav Petyuk) Date: Mon, 10 Mar 2008 23:11:58 -0700 Subject: [Numpy-discussion] creating large arrays cause memory error, although there is more than enough RAM Message-ID: I have Memory Error if I try to create numpy arrays of large size like 100-500 Mb (e.g. 30000 x 3000 'float32' array) My computer has 3 Gb of RAM, which is well enough to handle these arrays. And there is definitely memory available. Nevertheless, the program crashes with "Potential Memory Error". I would appreciate any tips for tackling this problem. This problem is similar to the one described before: http://thread.gmane.org/gmane.comp.python.numeric.general/2311 Thanks, Vlad -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Mar 11 02:20:33 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 11 Mar 2008 15:20:33 +0900 Subject: [Numpy-discussion] creating large arrays cause memory error, although there is more than enough RAM In-Reply-To: References: Message-ID: <47D624B1.3050905@ar.media.kyoto-u.ac.jp> Vladislav Petyuk wrote: > I have Memory Error if I try to create numpy arrays or large size like > 100-500 Mb (e.g. 30000 x 3000 'float32' array) > My computer has 3 Gb of RAM, which is well enough to handle these > arrays. And there is definetely memory available. > Nevertheless, the program crushes with "Potential Memory Error". > I would appreciate any tips for tackling this problem. > Hi, Could you give us a small script which shows the problem ? Also, which OS are you using ? cheers, David From charlesr.harris at gmail.com Tue Mar 11 02:31:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 11 Mar 2008 00:31:37 -0600 Subject: [Numpy-discussion] preparing to tag NumPy 1.0.5 on Wednesday In-Reply-To: References: <5b8d13220803051726m25d3b4c5id0aa53c96917978@mail.gmail.com> <1204780121.25137.2.camel@bbc8> Message-ID: On Tue, Mar 11, 2008 at 12:10 AM, Jarrod Millman wrote: > On Wed, Mar 5, 2008 at 10:44 PM, Charles R Harris > wrote: > > Hmm. Well, it's in now. I have a 32 bit xeon at work and numpy fails one > > test and warns on another, so that might be a related problem. I'll give > > things a try and see what happens. I would think things should fail > rather > > spectacularly if the system was misidentified and that isn't the case > > currently. > > Hey Chuck, > > Is your 32 bit Xeon machine still failing a NumPy test and warning on > another? > Yes. It's an old dual Xeon machine from Dell and I don't know what the problem is. It started about a month ago when I updated svn after a long time of disuse. The messages can be seen at *http://tinyurl.com/2elhyx* Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 11 02:38:43 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 11 Mar 2008 00:38:43 -0600 Subject: [Numpy-discussion] creating large arrays cause memory error, although there is more than enough RAM In-Reply-To: References: Message-ID: On Tue, Mar 11, 2008 at 12:11 AM, Vladislav Petyuk wrote: > I have Memory Error if I try to create numpy arrays or large size like > 100-500 Mb (e.g.
30000 x 3000 'float32' array) > My computer has 3 Gb of RAM, which is well enough to handle these arrays. > And there is definetely memory available. > Nevertheless, the program crushes with "Potential Memory Error". > I would appreciate any tips for tackling this problem. > > The OS would be helpful and the amount of virtual memory. Note that 1Gib is probably taken by the OS. If you are running linux the output of free -m before the array creation might be helpful. Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Tue Mar 11 05:44:03 2008 From: faltet at carabos.com (Francesc Altet) Date: Tue, 11 Mar 2008 10:44:03 +0100 Subject: [Numpy-discussion] [Pytables-users] On Numexpr and uint64 type In-Reply-To: References: <200803101808.42126.faltet@carabos.com> Message-ID: <200803111044.04092.faltet@carabos.com> Hi Marteen, A Monday 10 March 2008, escrigu?reu: > > Solution 1) is appealing because is how NumPy works, but I don't > > personally like the upcasting to float64. First of all, because > > you transparently convert numbers potentially loosing the least > > significant > > digits. Second, because an operation between integers gives a > > float as > > a result, and this is different for typical programming languages. > > For what it is worth, Py3K will change this behaviour. > See http://www.python.org/dev/peps/pep-3100/ and PEP 238. > While it is different from all current languages, that doesn't mean > it is > a good idea to floor() all integer divisions (/me ducks for cover). > > > We are mostly inclined to implement 2) behaviour, but before > > proceed, I'd like to know what other people think about this. > > While Py3K is still a while away, I think it is good to keep it in > mind with new developments. Thanks for the remind about the future of the division operator in Py3k. However, the use of the / operator in this example is mostly anecdotal. The most important point here is how to cast (or not to cast) the types different than uint64 in order to operate with them. The thing that makes uint64 so special is that it is the largest integer (in current processors) that has a native representation (i.e. the processor can operate directly on them, so they can be processed very fast), and besides, there is no other (common native) type that can fully include all its precision (float64 has a mantissa of 53 bits, so this is not enough to represent 64 bits). So the problem is basically what to do when operations with uint64 have overflows (or underflows, like for example, dealing with negative values). In some sense, int64 has exactly the same problem, and typical languages seem to cope with this by using modular arithmetic (as Charles Harris graciously pointed out). Python doesn't need to rely on this, because in front of an overflow in native integers the outcome is silently promoted to a long int, which has an infinite precision in python (at the expense of much slower performance in operations and more space required to store it). However, NumPy and Numexpr (as well as PyTables itself) are all about performance and space efficency, so going to infinite precision is a no go. 
So, for me, it is becoming more and more clear that implementing support for uint64 (and probably int64) as a non-upcastable type, with the possible addition of casting operators (uint64->int64 and int64->uint64, and also probably int-->int64 and int-->uint64), as has been suggested by Timothy Hochberg in the NumPy list, and adopting modular arithmetic for dealing with overflows/underflows is probably the most sensible solution. I don't know how difficult it would be to implement this, however. Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From faltet at carabos.com Tue Mar 11 06:00:27 2008 From: faltet at carabos.com (Francesc Altet) Date: Tue, 11 Mar 2008 11:00:27 +0100 Subject: [Numpy-discussion] [Pytables-users] On Numexpr and uint64 type In-Reply-To: <200803111044.04092.faltet@carabos.com> References: <200803101808.42126.faltet@carabos.com> <200803111044.04092.faltet@carabos.com> Message-ID: <200803111100.27486.faltet@carabos.com> A Tuesday 11 March 2008, Francesc Altet escrigué: > The thing that makes uint64 so special is that it is the largest > integer (in current processors) that has a native representation > (i.e. the processor can operate directly on them, so they can be > processed very fast), and besides, there is no other (common native) > type that can fully include all its precision (float64 has a mantissa > of 53 bits, so this is not enough to represent 64 bits). So the > problem is basically what to do when operations with uint64 have > overflows (or underflows, like for example, dealing with negative > values). Mmm, I'm thinking now that there exists a relatively common floating-point type that has a mantissa of 64 bits (at minimum), namely the extended precision floating point [1] (in its 80-bit incarnation, it is an IEEE standard).
In modern platforms, this is avalaible as a 'long double', > and I'm wondering whether it would be useful for Numexpr purposes, but > seems like it is. > Extended precision is iffy. It doesn't work on all platforms and even when it does the implementation can be strange. I think the normal double is the only thing you can count on right now. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From markbak at gmail.com Tue Mar 11 13:14:11 2008 From: markbak at gmail.com (mark) Date: Tue, 11 Mar 2008 10:14:11 -0700 (PDT) Subject: [Numpy-discussion] question on different win32 installers Message-ID: Hello - Anybody know the difference between numpy-1.0.4.win32-py2.4.exe and numpy-1.0.4.win32-p3-py2.4.exe Probably a simple question. Thanks for your help, Mark From matthieu.brucher at gmail.com Tue Mar 11 13:17:59 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 11 Mar 2008 18:17:59 +0100 Subject: [Numpy-discussion] question on different win32 installers In-Reply-To: References: Message-ID: p3 is not compiled with the SSE2 instructions (it stands for Pentium 3 and is needed for P3 and Athlon XP processors). Matthieu 2008/3/11, mark : > > Hello - Anybody know the difference between > numpy-1.0.4.win32-py2.4.exe > and > numpy-1.0.4.win32-p3-py2.4.exe > > Probably a simple question. Thanks for your help, Mark > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Tue Mar 11 14:14:48 2008 From: faltet at carabos.com (Francesc Altet) Date: Tue, 11 Mar 2008 19:14:48 +0100 Subject: [Numpy-discussion] [Pytables-users] On Numexpr and uint64 type In-Reply-To: References: <200803101808.42126.faltet@carabos.com> <200803111100.27486.faltet@carabos.com> Message-ID: <200803111914.49495.faltet@carabos.com> A Tuesday 11 March 2008, Charles R Harris escrigu?: > On Tue, Mar 11, 2008 at 4:00 AM, Francesc Altet wrote: > > A Tuesday 11 March 2008, Francesc Altet escrigu?: > > > The thing that makes uint64 so special is that it is the largest > > > integer (in current processors) that has a native representation > > > (i.e. the processor can operate directly on them, so they can be > > > processed very fast), and besides, there is no other (common > > > native) type that can fully include all its precision (float64 > > > has a mantissa of 53 bits, so this is not enough to represent 64 > > > bits). So the problem is basically what to do when operations > > > with uint64 have overflows (or underflows, like for example, > > > dealing with negative values). > > > > Mmm, I'm thinking now that there exist a relatively common floating > > point that have a mantissa of 64 bit (at minimum), namely the > > extended precision ploating point [1] (in its 80-bit incarnation, > > it is an IEEE standard). In modern platforms, this is avalaible as > > a 'long double', and I'm wondering whether it would be useful for > > Numexpr purposes, but seems like it is. > > Extended precision is iffy. It doesn't work on all platforms and even > when it does the implementation can be strange. I think the normal > double is the only thing you can count on right now. 
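For reference, a quick way to see how much extended precision a particular build actually gives you (illustrative only; the exact figures, and whether long double differs from double at all, depend on the platform and compiler):

import numpy
print numpy.finfo(numpy.float64).eps          # ~2.22e-16 (53-bit mantissa)
print numpy.finfo(numpy.longdouble).eps       # ~1.08e-19 on x86 (64-bit mantissa); same as float64 where there is no real long double
print numpy.dtype(numpy.longdouble).itemsize  # typically 12 or 16 bytes on x86/x86-64 Linux, 8 where long double == double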
I see. Oh well, this is kind of a mess and after pondering about this for a long while, we think that, in the end, a good approach would be to simply follow NumPy convention. It has its pros and cons, but it is a well stablished convention anyway, and it is supposed that most of the Numexpr/PyTables users should be used to it. Thanks for the advices, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From lxander.m at gmail.com Tue Mar 11 15:10:47 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Tue, 11 Mar 2008 15:10:47 -0400 Subject: [Numpy-discussion] Generically Creating Intermediate Data Compatible with Either ndarray or MasledArray Types Message-ID: <525f23e80803111210t14a4e334waf127107b4240baa@mail.gmail.com> I have a function that I would like to work with both MaskedArray's and ndarray's. The only blocker for this particular function is the need to create some stand-in data that is appropriately either a MaskedArray or an ndarray. Currently I have: dummy = numpy.ones(data.shape, dtype=bool) where data has a dtype of float. I've already discovered that numpy.ones_like "does the right thing", but how do I do the equivalent in conjunction with declaring a new dtype? Said another way, how can a create arrays of the same class and (possibly) shape as an existing array, but with a different dtype? Thanks, Alex From efiring at hawaii.edu Tue Mar 11 15:42:55 2008 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 11 Mar 2008 09:42:55 -1000 Subject: [Numpy-discussion] Generically Creating Intermediate Data Compatible with Either ndarray or MasledArray Types In-Reply-To: <525f23e80803111210t14a4e334waf127107b4240baa@mail.gmail.com> References: <525f23e80803111210t14a4e334waf127107b4240baa@mail.gmail.com> Message-ID: <47D6E0BF.6050208@hawaii.edu> Alex, I don't know if it works for older versions of numpy, but with svn you can simply use the astype() method of the array. If the array is masked it seems to work correctly, although it does not update the fill_value to the default for the new type. Eric Alexander Michael wrote: > I have a function that I would like to work with both MaskedArray's > and ndarray's. The only blocker for this particular function is the > need to create some stand-in data that is appropriately either a > MaskedArray or an ndarray. Currently I have: > > dummy = numpy.ones(data.shape, dtype=bool) > > where data has a dtype of float. I've already discovered that > numpy.ones_like "does the right thing", but how do I do the equivalent > in conjunction with declaring a new dtype? > > Said another way, how can a create arrays of the same class and > (possibly) shape as an existing array, but with a different dtype? 
> > Thanks, > Alex > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From lxander.m at gmail.com Tue Mar 11 15:57:30 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Tue, 11 Mar 2008 15:57:30 -0400 Subject: [Numpy-discussion] Generically Creating Intermediate Data Compatible with Either ndarray or MasledArray Types In-Reply-To: <47D6E0BF.6050208@hawaii.edu> References: <525f23e80803111210t14a4e334waf127107b4240baa@mail.gmail.com> <47D6E0BF.6050208@hawaii.edu> Message-ID: <525f23e80803111257j1af05b84yaec145ad4d1a90a4@mail.gmail.com> On Tue, Mar 11, 2008 at 3:42 PM, Eric Firing wrote: > I don't know if it works for older versions of numpy, but with svn you > can simply use the astype() method of the array. If the array is masked > it seems to work correctly, although it does not update the fill_value > to the default for the new type. That will do even though I don't want to actually copy the data, as I want an array to hold intermediate data of the same shape. Incidentally, while ones_like appears to play nice with derived classes, empty_like and zeros_like do not seem to do the same. From robert.kern at gmail.com Tue Mar 11 16:21:07 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 11 Mar 2008 15:21:07 -0500 Subject: [Numpy-discussion] Generically Creating Intermediate Data Compatible with Either ndarray or MasledArray Types In-Reply-To: <525f23e80803111210t14a4e334waf127107b4240baa@mail.gmail.com> References: <525f23e80803111210t14a4e334waf127107b4240baa@mail.gmail.com> Message-ID: <3d375d730803111321s57d70a0fpb5c1690a6deb66ca@mail.gmail.com> On Tue, Mar 11, 2008 at 2:10 PM, Alexander Michael wrote: > I have a function that I would like to work with both MaskedArray's > and ndarray's. The only blocker for this particular function is the > need to create some stand-in data that is appropriately either a > MaskedArray or an ndarray. Currently I have: > > dummy = numpy.ones(data.shape, dtype=bool) > > where data has a dtype of float. I've already discovered that > numpy.ones_like "does the right thing", but how do I do the equivalent > in conjunction with declaring a new dtype? > > Said another way, how can a create arrays of the same class and > (possibly) shape as an existing array, but with a different dtype? dummy = numpy.ones(data.shape, dtype=bool).view(type(data)) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From hoytak at gmail.com Tue Mar 11 17:18:32 2008 From: hoytak at gmail.com (Hoyt Koepke) Date: Tue, 11 Mar 2008 14:18:32 -0700 Subject: [Numpy-discussion] numpy.random.RandomState threadsafe? Message-ID: <4db580fd0803111418v37ad1e2t4463f8c73937fd63@mail.gmail.com> This should be a really quick question. Is a RandomState object thread safe? I'm wanting to use a common RandomState object in a multithreaded program, and I need to know if it's necessary to protect it with a lock (which wouldn't be difficult). Thanks! --Hoyt From dineshbvadhia at hotmail.com Tue Mar 11 17:44:56 2008 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Tue, 11 Mar 2008 14:44:56 -0700 Subject: [Numpy-discussion] Array assignment problem Message-ID: Hello! I'm reading a text file with two numbers in str format on each line. The numbers are converted into integers. 
Each integer is then assigned to a 2-dimensional array ij (see code below). The problem is that neither of the array assignments work ie. both ij[index, 0] = r and ij[index, 1] = c are always 0 (zero). I've checked r and c and both are integers (>=0). import sys import os import numpy nnz = 1200000 ij = numpy.array(numpy.empty((nnz, 2), dtype=int)) index = 0 filename = 'test_ij.txt' for line in open(filename, 'r'): line = line.rstrip('\n') r, c = map(str, line.split(',')) r = int(r) c = int(c) ij[index, 0] = r ij[index, 1] = c index = index + 1 What am I doing wrong? Dinesh -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Mar 11 18:00:57 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 11 Mar 2008 17:00:57 -0500 Subject: [Numpy-discussion] numpy.random.RandomState threadsafe? In-Reply-To: <4db580fd0803111418v37ad1e2t4463f8c73937fd63@mail.gmail.com> References: <4db580fd0803111418v37ad1e2t4463f8c73937fd63@mail.gmail.com> Message-ID: <3d375d730803111500w1bde9abfs13f1a97c511cb014@mail.gmail.com> On Tue, Mar 11, 2008 at 4:18 PM, Hoyt Koepke wrote: > This should be a really quick question. Is a RandomState object > thread safe? I'm wanting to use a common RandomState object in a > multithreaded program, and I need to know if it's necessary to protect > it with a lock (which wouldn't be difficult). For nearly all of the methods, yes, they should be. RandomState is implemented in C (using Pyrex) and the GIL is acquired before calling any C functions. The caveat here is that the methods multivariate_normal() calls back out to Python-implemented functions. It is possible that the GIL gets released during that call and that another thread can pick up execution then. However, even this should not be a problem as far as safety goes; no internal state is read or changed after the external call. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From hoytak at gmail.com Tue Mar 11 18:08:35 2008 From: hoytak at gmail.com (Hoyt Koepke) Date: Tue, 11 Mar 2008 15:08:35 -0700 Subject: [Numpy-discussion] numpy.random.RandomState threadsafe? In-Reply-To: <3d375d730803111500w1bde9abfs13f1a97c511cb014@mail.gmail.com> References: <4db580fd0803111418v37ad1e2t4463f8c73937fd63@mail.gmail.com> <3d375d730803111500w1bde9abfs13f1a97c511cb014@mail.gmail.com> Message-ID: <4db580fd0803111508x429fb266he5ec6aedb429618a@mail.gmail.com> Okay, thanks! I won't be using the multivariate_normal function in my code, so this should work fine. --Hoyt On Tue, Mar 11, 2008 at 3:00 PM, Robert Kern wrote: > > On Tue, Mar 11, 2008 at 4:18 PM, Hoyt Koepke wrote: > > This should be a really quick question. Is a RandomState object > > thread safe? I'm wanting to use a common RandomState object in a > > multithreaded program, and I need to know if it's necessary to protect > > it with a lock (which wouldn't be difficult). > > For nearly all of the methods, yes, they should be. RandomState is > implemented in C (using Pyrex) and the GIL is acquired before calling > any C functions. The caveat here is that the methods > multivariate_normal() calls back out to Python-implemented functions. > It is possible that the GIL gets released during that call and that > another thread can pick up execution then. 
However, even this should > not be a problem as far as safety goes; no internal state is read or > changed after the external call. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From peridot.faceted at gmail.com Tue Mar 11 19:13:12 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 11 Mar 2008 19:13:12 -0400 Subject: [Numpy-discussion] Array assignment problem In-Reply-To: References: Message-ID: On 11/03/2008, Dinesh B Vadhia wrote: > Hello! I'm reading a text file with two numbers in str format on each line. > The numbers are converted into integers. Each integer is then assigned to > a 2-dimensional array ij (see code below). The problem is that neither of > the array assignments work ie. both ij[index, 0] = r and ij[index, 1] = c > are always 0 (zero). I've checked r and c and both are integers (>=0). > > import sys > import os > import numpy > > nnz = 1200000 > ij = numpy.array(numpy.empty((nnz, 2), dtype=int)) > index = 0 > filename = 'test_ij.txt' > for line in open(filename, 'r'): > line = line.rstrip('\n') > r, c = map(str, line.split(',')) > r = int(r) > c = int(c) > ij[index, 0] = r > ij[index, 1] = c > index = index + 1 > > > What am I doing wrong? The first thing you're doing wrong is you're not using numpy.loadtxt: In [35]: numpy.loadtxt('foo',delimiter=",",dtype=numpy.int) Out[35]: array([[1, 3], [4, 5], [6, 6]]) This removes the need for the rest of your code. To find useful functions like this in future, you can try looking at http://www.scipy.org/Numpy_Functions_by_Category Stripping the newline off is unnecessary, since int(" 17 \n")==17. Also, since the results of line.split(,) are already strings, the map(str, ...) doesn't do anything. Did you mean it to? Otherwise, your code works fine for me. I should point out that using empty(), you should expect your array to be full of gibberish (rather than 0), so if you're seeing lots of zeros, they're probably coming from the text file. 
Good luck, Anne From peter.skomoroch at gmail.com Tue Mar 11 23:08:45 2008 From: peter.skomoroch at gmail.com (Peter Skomoroch) Date: Tue, 11 Mar 2008 23:08:45 -0400 Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <116b4851-f17b-440b-a375-9fcf4257088e@i7g2000prf.googlegroups.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com> <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com> <9614b846-ed02-4feb-986b-08804b6620b4@s13g2000prd.googlegroups.com> <5d3194020803010950h4d38a8f4s888b933c8905ff67@mail.gmail.com> <5d3194020803030942i1a6eeaa5rddf515b8176e4c3b@mail.gmail.com> <5d3194020803031012p2d1679aax1b2c24ab54a0d182@mail.gmail.com> <116b4851-f17b-440b-a375-9fcf4257088e@i7g2000prf.googlegroups.com> Message-ID: I found this in my del.icio.us links, sorry I forgot to mention it at the time: http://www.owlnet.rice.edu/~elec301/Projects99/faces/code.html All the best On Thu, Mar 6, 2008 at 10:39 AM, devnew at gmail.com wrote: > ok..I coded everything again from scratch..looks like i was having a > problem with matrix class > when i used a matrix for facespace > facespace=sortedeigenvectorsmatrix * adjustedfacematrix > and trying to convert the row to an image (eigenface). > by > make_simple_image(facespace[x],"eigenimage_x.jpg",(imgwdth,imght)) > .i was getting black images instead of eigenface images. > > def make_simple_image(v, filename,imsize): > v.shape=(-1,) #change to 1 dim array > im = Image.new('L', imsize) > im.putdata(v) > im.save(filename) > > > i made it an array instead of matrix > make_simple_image(asarray(facespace[x]),"eigenimage_x.jpg", > (imgwdth,imght)) > this produces eigenface images > > another observation, > the eigenface images obtained are too dark,unlike the eigenface images > generated by Arnar's code.so i examined the elements of the facespace > row > > sample rows: > [ -82.35294118, -82.88235294, -91.58823529 ,..., -66.47058824, > -68.23529412, -60.76470588] > .. > [ 89.64705882 82.11764706 79.41176471 ..., 172.52941176 > 170.76470588 165.23529412] > > looks like these are signed ints.. > > i used another make_image() function that converts the elements > def make_image(v, filename,imsize): > v.shape = (-1,) #change to 1 dim array > a, b = v.min(), v.max() > span = max(abs(b), abs(a)) > im = Image.new('L', imsize) > im.putdata((v * 127. / span) + 128) > im.save(filename) > > This function makes clearer images..i think the calculations convert > the elements to unsigned 8-bit values (as pointed out by Robin in > another posting..) ,i am wondering if there is a more direct way to > get clearer pics out of the facespace row elements > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Peter N. Skomoroch peter.skomoroch at gmail.com http://www.datawrangling.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter.skomoroch at gmail.com Tue Mar 11 23:09:29 2008 From: peter.skomoroch at gmail.com (Peter Skomoroch) Date: Tue, 11 Mar 2008 23:09:29 -0400 Subject: [Numpy-discussion] eigenvector and eigenface In-Reply-To: <461f32ad-0caf-42e4-955d-3481947e4964@e23g2000prf.googlegroups.com> References: <461f32ad-0caf-42e4-955d-3481947e4964@e23g2000prf.googlegroups.com> Message-ID: see this page I found in my del.icio.us links, sorry I forgot to mention it at the time of the thread: http://www.owlnet.rice.edu/~elec301/Projects99/faces/code.html All the best On Mon, Mar 10, 2008 at 1:37 AM, royG wrote: > friends > I am learning eigenfaces using numpy . i use data from N images and > create eigenvectors to get a 'sorted eigenvectors' array of size N X > N. when i project the 'zero mean imagedata' i will get a facespace > array of N X numpixels. (where numpixels is total pixels in one image) > > is eigenface the same as eigenvector? some of the docs i > read(pissarenko-Eigenface-based facial recognition), use these two > words to mean the same thing..but when i look at the dimensions of > 'sorted eigenvectors' array > it is only NXN and i don't know how i can make images out of it > representing eigenfaces. > > on the other hand the projection of 'zero mean imagedata' on > eigenvectors by using numpy.dot(eigenvectors,zeromeanimagedata) can > make an array of N X numpixels > .I believe this is what is known as the facespace .is this what > represents the eigenface images ? > > will be thankful for any expert opinion on this.. > RG > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Peter N. Skomoroch peter.skomoroch at gmail.com http://www.datawrangling.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Wed Mar 12 00:17:45 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 12 Mar 2008 13:17:45 +0900 Subject: [Numpy-discussion] RHEL 5 and CENTOS 5 rpms for blas/lapack/numpy/scipy available on ashigabou Message-ID: <47D75969.2040204@ar.media.kyoto-u.ac.jp> Hi, Since some people had problems with RHEL/CENTOS, and since the opensuse build system does provide facilities to build rpms for RHEL and CENTOS for some time, I quickly updated the ashigabou repository to handle those distributions. I also added opensuse 10.3 and FC 8, but those did not require any changes: http://download.opensuse.org/repositories/home:/ashigabou/ (note that it may take time for the rpms to appear there from the time they successfully build on the compiler farm, which they just did). cheers, David From dancrev at yahoo.com Wed Mar 12 02:38:12 2008 From: dancrev at yahoo.com (Daniel Creveling) Date: Tue, 11 Mar 2008 23:38:12 -0700 (PDT) Subject: [Numpy-discussion] f2py : callbacks without callback function as an argument Message-ID: <846712.54392.qm@web50109.mail.re2.yahoo.com> Hello- Is there a way to code a callback to python from fortran in a way such that the calling routine does not need the callback function as an input argument? I'm using the Intel fortran compiler for linux with numpy 1.0.4 and f2py gives version 2_4422. My modules crash on loading because the external callback function is not set. I noticed in the release notes for f2py 2.46.243 that it was a resolved issue, but I don't know how that development compares to version 2_4422 that comes with numpy. 
The example that I was trying to follow is from some documentation off of the web: subroutine f1() print *, "in f1, calling f2 twice.." call f2() call f2() return end subroutine f2() cf2py intent(callback, hide) fpy external fpy print *, "in f2, calling fpy.." call fpy() return end f2py -c -m pfromf extcallback.f I'm supposed to be able to define the callback function from Python like: >>> import pfromf >>> def f(): print "This is Python" >>> pfromf.fpy = f but I am unable to even load the module: >>> import pfromf Traceback (most recent call last): File "", line 1, in ImportError: ./pfromf.so: undefined symbol: fpy_ >>> Any ideas? Thank you- Dan ____________________________________________________________________________________ Looking for last minute shopping deals? Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping From tjhnson at gmail.com Wed Mar 12 06:10:13 2008 From: tjhnson at gmail.com (Tom Johnson) Date: Wed, 12 Mar 2008 03:10:13 -0700 Subject: [Numpy-discussion] Problems with long In-Reply-To: <479B2C42.9040603@enthought.com> References: <479B2C42.9040603@enthought.com> Message-ID: On Sat, Jan 26, 2008 at 5:49 AM, Travis E. Oliphant wrote: > > Tom Johnson wrote: > > Hi, I'm having some troubles with long. > > > > > >>>> from numpy import log > >>>> log(8463186938969424928L) > >>>> > > 43.5822574833 > > > >>>> log(10454852688145851272L) > >>>> > > : 'long' object has no attribute 'log' > > > > The problem is that the latter long integer is too big to fit into an > int64 (long long) and so it converts it to an object array. The default > behavior of log on object arrays is to look for a method on each element > of the array called log and call that. > > Your best bet is to convert to double before calling log > > log(float(10454852688145851272L)) > > -Travis O. > Related, I understand that problem which occurs below... >>> x = 8463186938969424928L >>> y = 10454852688145851272L >>> import numpy >>> z = numpy.float_(3) >>> x * z 2.53895608169e+19 >>> y * z TypeError: unsupported operand type(s) for *: 'long' and 'numpy.float64' >>> numpy.float_(y) * z 3.13645580644e+19 >>> y * float(z) 3.1364558064437551e+19 A couple points.... 1) With log, we get an AttributeError...with multiplication, we get a TypeError. I know the mechanism which causes the problem is different but the fundamental problem (too large of longs) is the same in both cases. Can this be improved upon? 2) The extra digits from python floats are nice....can numpy have these as well? 3) I think it is safe to say that many people cannot know ahead of time if their longs will be larger than 64-bit. This whole situation seems unstable to me...code that seems to be working will work, and then when the longs (from python) get too large we get a variety of different exceptions. So, I wonder aloud: Is this being handled is the nicest/preferred way? I'd be happy if my extremely large longs were automatically converted to numpy.float64_....even if we don't have as many significant digits as the equivalent pure python result. At least with this method, I will not have code "randomly" breaking. Either that, or am I required to be extremely careful about mixing types. 
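For what it's worth, here is a small self-contained illustration of the boundary Tom is running into, together with the explicit conversion Travis suggested earlier in the thread (the exact printed digits will vary by platform):

import numpy

x = 8463186938969424928L     # still fits in a signed 64-bit integer
y = 10454852688145851272L    # too large for int64, so numpy falls back to an object array

print numpy.log(x)                        # fine: x becomes an int64 scalar first
print numpy.log(float(y))                 # fine: convert the long to a double by hand
print numpy.float64(y) * numpy.float_(3)  # mixed arithmetic also works once y is converted

As Tom notes, the conversion costs some significant digits, but it keeps the code from breaking once the longs grow past 64 bits.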
From pearu at cens.ioc.ee Wed Mar 12 06:37:11 2008 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Wed, 12 Mar 2008 12:37:11 +0200 (EET) Subject: [Numpy-discussion] f2py : callbacks without callback function as an argument In-Reply-To: <846712.54392.qm@web50109.mail.re2.yahoo.com> References: <846712.54392.qm@web50109.mail.re2.yahoo.com> Message-ID: <48749.129.240.228.53.1205318231.squirrel@cens.ioc.ee> On Wed, March 12, 2008 8:38 am, Daniel Creveling wrote: > Hello- > > Is there a way to code a callback to python from > fortran in a way such that the calling routine does > not need the callback function as an input argument? > I'm using the Intel fortran compiler for linux with > numpy 1.0.4 and f2py gives version 2_4422. My modules > crash on loading because the external callback > function is not set. I noticed in the release notes > for f2py 2.46.243 that it was a resolved issue, but I > don't know how that development compares to version > 2_4422 that comes with numpy. The development version of f2py in numpy has a fix for callback support that was broken for few versions of numpy. So, use either numpy from svn or wait a bit for 1.0.5 release. > The example that I was trying to follow is from some > documentatdevelopmention off of the web: > > subroutine f1() > print *, "in f1, calling f2 twice.." > call f2() > call f2() > return > end > > subroutine f2() > cf2py intent(callback, hide) fpy > external fpy > print *, "in f2, calling fpy.." > call fpy() > return > end > > f2py -c -m pfromf extcallback.f > > I'm supposed to be able to define the callback > function from Python like: >>>> import pfromf >>>> def f(): print "This is Python" >>>> pfromf.fpy = f > > but I am unable to even load the module: >>>> import pfromf > Traceback (most recent call last): > File "", line 1, in > ImportError: ./pfromf.so: undefined symbol: fpy_ Yes, loading the module works with f2py from numpy svn. However, calling f1 or f2 from Python fail because the example does not leave a way to specify the fpy function. Depending on your specific application, there are some ways to fix it. For example, let fpy function propagete from f1 to f2 using external argument to f1: subroutine f1(fpy) external fpy call f2(fpy) call f2(fpy) end subroutine f2(fpy) external fpy call fpy() end If this is something not suitable for your case, then there exist ways to influence the generated wrapper codes from signature files using special hacks. I can explain them later when I get a better idea what you are trying to do. HTH, Pearu From travis at enthought.com Wed Mar 12 11:36:20 2008 From: travis at enthought.com (Travis Vaught) Date: Wed, 12 Mar 2008 10:36:20 -0500 Subject: [Numpy-discussion] ANN: EuroSciPy 2008 Conference - Leipzig, Germany Message-ID: <1FA8105E-095B-4610-ABE8-57EE9D711AE4@enthought.com> Greetings, We're pleased to announce the EuroSciPy 2008 Conference to be held in Leipzig, Germany on July 26-27, 2008. http://www.scipy.org/EuroSciPy2008 We are very excited to create a venue for the European community of users of the Python programming language in science. This conference will bring the presentations and collaboration that we've enjoyed at Caltech each year closer to home for many users of SciPy, NumPy and Python generally--with a similar focus and schedule. Call for Participation: ---------------------- If you are a scientist using Python for your computational work, we'd love to have you formally present your results, methods or experiences. 
To apply to present a talk at this year's EuroSciPy, please submit an abstract of your talk as a PDF, MS Word or plain text file to euroabstracts at scipy.org. The deadline for abstract submission is April 30, 2008. Papers and/or presentation slides are acceptable and are due by June 15, 2008. Presentations will be allotted 30 minutes. Registration: ------------ Registration will open April 1, 2008. The registration fee will be 100.00? for early registrants and will increase to 150.00? for late registration. Registration will include breakfast, snacks and lunch for Saturday and Sunday. Volunteers Welcome: ------------------ If you're interested in volunteering to help organize things, please email us at info at scipy.org. From doutriaux1 at llnl.gov Wed Mar 12 14:38:29 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Wed, 12 Mar 2008 11:38:29 -0700 Subject: [Numpy-discussion] numpy from subversion In-Reply-To: <1FA8105E-095B-4610-ABE8-57EE9D711AE4@enthought.com> References: <1FA8105E-095B-4610-ABE8-57EE9D711AE4@enthought.com> Message-ID: <47D82325.8020703@llnl.gov> I just subversioned to the latest numpy, i get: Any idea? Thx, >>> import numpy Traceback (most recent call last): File "", line 1, in File "/export/svn/Numpy/trunk/numpy/__init__.py", line 27, in ImportError: No module named __config__ From doutriaux1 at llnl.gov Wed Mar 12 14:39:53 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Wed, 12 Mar 2008 11:39:53 -0700 Subject: [Numpy-discussion] numpy from subversion In-Reply-To: <47D82325.8020703@llnl.gov> References: <1FA8105E-095B-4610-ABE8-57EE9D711AE4@enthought.com> <47D82325.8020703@llnl.gov> Message-ID: <47D82379.9020108@llnl.gov> My mistake i was still in trunk.... but i do get: import numpy, numpy.oldnumeric.ma as MA, numpy.oldnumeric as Numeric, PropertiedClasses File "/lgm/cdat/latest/lib/python2.5/site-packages/numpy/oldnumeric/ma.py", line 4, in from numpy.core.ma import * ImportError: No module named ma How does one build ma these days? C. Charles Doutriaux wrote: > I just subversioned to the latest numpy, i get: > > Any idea? > > Thx, > > >>> import numpy > Traceback (most recent call last): > File "", line 1, in > File "/export/svn/Numpy/trunk/numpy/__init__.py", line 27, in > ImportError: No module named __config__ > > From david at ar.media.kyoto-u.ac.jp Thu Mar 13 00:34:23 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 13 Mar 2008 13:34:23 +0900 Subject: [Numpy-discussion] How to set up blas in site.cfg Message-ID: <47D8AECF.4010807@ar.media.kyoto-u.ac.jp> Hi, I have some problems with numpy.distutils not picking up the blas I want. Let say I have several blas libraries on my system: libblas.so in /usr/lib libblas.so in /home/foo/lib numpy.distutils picks up libblas.so in /usr/lib first. But what if I want to use libblas.so in /home/foo/lib ? I tried in site.cfg: [blas_opt] library_dirs = /home/foo/lib libraries = blas But numpy.distutils still picks up blas in /usr/lib... thanks, David From millman at berkeley.edu Thu Mar 13 01:43:45 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 12 Mar 2008 22:43:45 -0700 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers Message-ID: Hello, I am sure that everyone has noticed that 1.0.5 hasn't been released yet. 
The main issue is that when I was getting ready to tag the release I noticed that the buildbot had a few failing tests: http://buildbot.scipy.org/waterfall?show_events=false Stefan van der Walt added tickets for the failures: http://projects.scipy.org/scipy/numpy/ticket/683 http://projects.scipy.org/scipy/numpy/ticket/684 http://projects.scipy.org/scipy/numpy/ticket/686 And Chuck Harris fixed ticket #683 with in minutes (thanks!). The others are still open. Stefan and I also triaged the remaining tickets--closing several and turning others in to release blockers: http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority I think that it is especially important that we spend some time trying to make the 1.0.5 release rock solid. There are several important changes in the trunk so I really hope we can get these tickets resolved ASAP. I need everyone's help getting this release out. If you can help work on any of the open release blockers, please try to close them over the weekend. If you have any ideas about the tickets but aren't exactly sure how to resolve them please post a message to the list or add a comment to the ticket. I will be traveling over the weekend, so I may be off-line until Monday. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From millman at berkeley.edu Thu Mar 13 01:56:23 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 12 Mar 2008 22:56:23 -0700 Subject: [Numpy-discussion] Google Summer of Code Ideas Message-ID: Hello, I have started a Google Summer of Code Ideas page: http://scipy.org/scipy/scipy/wiki/SummerofCodeIdeas Please feel free to add any ideas you have for a summer project especially if you would be interested in mentoring it. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From stefan at sun.ac.za Thu Mar 13 04:34:12 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 13 Mar 2008 01:34:12 -0700 Subject: [Numpy-discussion] numpy from subversion In-Reply-To: <47D82379.9020108@llnl.gov> References: <1FA8105E-095B-4610-ABE8-57EE9D711AE4@enthought.com> <47D82325.8020703@llnl.gov> <47D82379.9020108@llnl.gov> Message-ID: <9457e7c80803130134w272471e6w7d62d372c024bd20@mail.gmail.com> On Wed, Mar 12, 2008 at 11:39 AM, Charles Doutriaux wrote: > My mistake i was still in trunk.... > > but i do get: > > import numpy, numpy.oldnumeric.ma as MA, numpy.oldnumeric as > Numeric, PropertiedClasses > File > "/lgm/cdat/latest/lib/python2.5/site-packages/numpy/oldnumeric/ma.py", > line 4, in > from numpy.core.ma import * > ImportError: No module named ma > > How does one build ma these days? Travis fixed this in latest SVN. Maskedarrays should now be imported as numpy.ma. 
Regards St?fan From millman at berkeley.edu Thu Mar 13 05:38:16 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 13 Mar 2008 02:38:16 -0700 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: Message-ID: On Wed, Mar 12, 2008 at 10:43 PM, Jarrod Millman wrote: > Stefan and I also triaged the remaining tickets--closing several and > turning others in to release blockers: > http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority > > I think that it is especially important that we spend some time trying > to make the 1.0.5 release rock solid. There are several important > changes in the trunk so I really hope we can get these tickets > resolved ASAP. I need everyone's help getting this release out. If > you can help work on any of the open release blockers, please try to > close them over the weekend. If you have any ideas about the tickets > but aren't exactly sure how to resolve them please post a message to > the list or add a comment to the ticket. Hello, I just noticed that David Cournapeau fixed one of the blockers moments after I sent out my email asking for help: http://projects.scipy.org/scipy/numpy/ticket/688 Thanks David! So we are down to 12 tickets blocking the release. Some of the tickets are just missing tests, so they should be fairly easy to implement--for anyone who wants to help get this release out ASAP. Cheers, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From lxander.m at gmail.com Thu Mar 13 09:27:45 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Thu, 13 Mar 2008 09:27:45 -0400 Subject: [Numpy-discussion] Transforming an array of numbers to an array of formatted strings Message-ID: <525f23e80803130627l1aefabe9p821f315430236519@mail.gmail.com> Is there a better way than looping to perform the following transformation? >>> import numpy >>> int_data = numpy.arange(1,11, dtype=int) # just an example >>> str_data = int_data.astype('S4') >>> for i in xrange(len(int_data)): ... str_data[i] = 'S%03d' % int_data[i] >>> print str_data ['S001' 'S002' 'S003' 'S004' 'S005' 'S006' 'S007' 'S008' 'S009' 'S010'] That is, I want to format an array of numbers as strings. Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Thu Mar 13 09:49:49 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 13 Mar 2008 09:49:49 -0400 Subject: [Numpy-discussion] Transforming an array of numbers to an array of formatted strings In-Reply-To: <525f23e80803130627l1aefabe9p821f315430236519@mail.gmail.com> References: <525f23e80803130627l1aefabe9p821f315430236519@mail.gmail.com> Message-ID: On Thu, 13 Mar 2008, Alexander Michael apparently wrote: > I want to format an array of numbers as strings. To what end? Note that tofile has a format option. And for 1d array ``x`` you can always do:: strdata = list( fmt%xi for xi in x) Nice because the counter name does not "bleed" into your program. 
Cheers, Alan Isaac From doutriaux1 at llnl.gov Thu Mar 13 10:31:42 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Thu, 13 Mar 2008 07:31:42 -0700 Subject: [Numpy-discussion] numpy from subversion In-Reply-To: <9457e7c80803130134w272471e6w7d62d372c024bd20@mail.gmail.com> References: <1FA8105E-095B-4610-ABE8-57EE9D711AE4@enthought.com> <47D82325.8020703@llnl.gov> <47D82379.9020108@llnl.gov> <9457e7c80803130134w272471e6w7d62d372c024bd20@mail.gmail.com> Message-ID: <47D93ACE.5010800@llnl.gov> Hi Stephan, Does the converter from Numeric fixes that? I mean runnning it on an old Numeric script will import numpy.ma, does it still replace with numpy.oldnumeric.ma? Thx, C. St?fan van der Walt wrote: > On Wed, Mar 12, 2008 at 11:39 AM, Charles Doutriaux wrote: > >> My mistake i was still in trunk.... >> >> but i do get: >> >> import numpy, numpy.oldnumeric.ma as MA, numpy.oldnumeric as >> Numeric, PropertiedClasses >> File >> "/lgm/cdat/latest/lib/python2.5/site-packages/numpy/oldnumeric/ma.py", >> line 4, in >> from numpy.core.ma import * >> ImportError: No module named ma >> >> How does one build ma these days? >> > > Travis fixed this in latest SVN. Maskedarrays should now be imported > as numpy.ma. > > Regards > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From doutriaux1 at llnl.gov Thu Mar 13 10:57:22 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Thu, 13 Mar 2008 07:57:22 -0700 Subject: [Numpy-discussion] numpy.array.ma class init In-Reply-To: <47D93ACE.5010800@llnl.gov> References: <1FA8105E-095B-4610-ABE8-57EE9D711AE4@enthought.com> <47D82325.8020703@llnl.gov> <47D82379.9020108@llnl.gov> <9457e7c80803130134w272471e6w7d62d372c024bd20@mail.gmail.com> <47D93ACE.5010800@llnl.gov> Message-ID: <47D940D2.7070008@llnl.gov> Hello, we used to have this working, the latest numpy breaks it. File "/lgm/cdat/5.0.0.alpha7/lib/python2.5/site-packages/cdms2/tvariable.py", line 21, in import numpy.oldnumeric.ma as MA class TransientVariable(AbstractVariable, MA.array): TypeError: Error when calling the metaclass bases function() argument 1 must be code, not str >>> numpy.oldnumeric.ma Any suggestion on how to fix that? Thx, C Charles Doutriaux wrote: > Hi Stephan, > > Does the converter from Numeric fixes that? I mean runnning it on an old > Numeric script will import numpy.ma, does it still replace with > numpy.oldnumeric.ma? > > Thx, > > C. > > St?fan van der Walt wrote: > >> On Wed, Mar 12, 2008 at 11:39 AM, Charles Doutriaux wrote: >> >> >>> My mistake i was still in trunk.... >>> >>> but i do get: >>> >>> import numpy, numpy.oldnumeric.ma as MA, numpy.oldnumeric as >>> Numeric, PropertiedClasses >>> File >>> "/lgm/cdat/latest/lib/python2.5/site-packages/numpy/oldnumeric/ma.py", >>> line 4, in >>> from numpy.core.ma import * >>> ImportError: No module named ma >>> >>> How does one build ma these days? >>> >>> >> Travis fixed this in latest SVN. Maskedarrays should now be imported >> as numpy.ma. 
>> >> Regards >> St?fan >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From david.huard at gmail.com Thu Mar 13 15:07:33 2008 From: david.huard at gmail.com (David Huard) Date: Thu, 13 Mar 2008 15:07:33 -0400 Subject: [Numpy-discussion] Transforming an array of numbers to an array of formatted strings In-Reply-To: References: <525f23e80803130627l1aefabe9p821f315430236519@mail.gmail.com> Message-ID: <91cf711d0803131207n6b78a7f5l612a430d868a0c8c@mail.gmail.com> ['S%03d'%i for i in int_data] David 2008/3/13, Alan G Isaac : > > On Thu, 13 Mar 2008, Alexander Michael apparently wrote: > > I want to format an array of numbers as strings. > > > To what end? > Note that tofile has a format option. > And for 1d array ``x`` you can always do:: > > strdata = list( fmt%xi for xi in x) > > Nice because the counter name does not "bleed" into your program. > > Cheers, > Alan Isaac > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Thu Mar 13 15:22:46 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 13 Mar 2008 15:22:46 -0400 Subject: [Numpy-discussion] Transforming an array of numbers to an array of formatted strings In-Reply-To: <91cf711d0803131207n6b78a7f5l612a430d868a0c8c@mail.gmail.com> References: <525f23e80803130627l1aefabe9p821f315430236519@mail.gmail.com><91cf711d0803131207n6b78a7f5l612a430d868a0c8c@mail.gmail.com> Message-ID: > 2008/3/13, Alan G Isaac : >> strdata = list( fmt%xi for xi in x) >> Nice because the counter name does not "bleed" into your program. On Thu, 13 Mar 2008, David Huard apparently wrote: > ['S%03d'%i for i in int_data] The difference is that the counter "bleeds" from the list comprehension. I find that obnoxious. Cheers, Alan Isaac From lxander.m at gmail.com Thu Mar 13 15:30:09 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Thu, 13 Mar 2008 15:30:09 -0400 Subject: [Numpy-discussion] Transforming an array of numbers to an array of formatted strings In-Reply-To: <91cf711d0803131207n6b78a7f5l612a430d868a0c8c@mail.gmail.com> References: <525f23e80803130627l1aefabe9p821f315430236519@mail.gmail.com> <91cf711d0803131207n6b78a7f5l612a430d868a0c8c@mail.gmail.com> Message-ID: <525f23e80803131230k45d4d329y428667fc282fbbf0@mail.gmail.com> On Thu, Mar 13, 2008 at 9:49 AM, Alan G Isaac wrote: > And for 1d array ``x`` you can always do:: > > strdata = list( fmt%xi for xi in x) > > Nice because the counter name does not "bleed" into your program. On Thu, Mar 13, 2008 at 3:07 PM, David Huard wrote: > ['S%03d'%i for i in int_data] Thanks for the suggestions! I wasn't sure if there was a magic numpy method to do the loop quickly (as the destination array is created beforehand) without creating a temporary Python list, but I guess not. The generator/list-comprehension is likely better than my prototype. 
Regards, Alex From bevan07 at gmail.com Thu Mar 13 17:01:51 2008 From: bevan07 at gmail.com (bevan) Date: Thu, 13 Mar 2008 21:01:51 +0000 (UTC) Subject: [Numpy-discussion] subset of array - statistics Message-ID: Hello, I am new to the world of Python and numpy but am very excited by what I have seen so far. I have been playing around with some rainfall data. The data is daily rainfall for a period, say 30 years in the form: Year Month JulianDay Rain (mm) 1970 1 1 0.0 1970 1 2 0.5 ................................. 2008 3 65 2.5 I have successfully imported the data into lists and then created a single array from the lists. I can get the rainfall total over the entire period using: raindata = numpy.array([yr,mth,jd,rain_mm],dtype=float) print data[3,:].sum(axis=0) or raindata= numpy.rec.fromarrays ([yr,mth,jd,rain_mm],names='year,month,julian,rain_mm') print raindata.rain_mm.sum(axis=0) But what i would like to do is get an average rainfall for each month and also the ability to get rainfall totals for any month and Year I thought it would be straight forward but have not gotten my head around it yet. Thanks for your help and thakns to the people eho have develoepd and maintain numpy & python From barrywark at gmail.com Thu Mar 13 17:18:46 2008 From: barrywark at gmail.com (Barry Wark) Date: Thu, 13 Mar 2008 14:18:46 -0700 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: Message-ID: I appologize that the Mac OSX buildbot has been so flakey. For some reason it stops being able to resolve scipy.org on a regular basis (though other processes on the same machine don't seem to have trouble). Restarting the slave fixes the issue. Anyways, if anyone is testing an OS X issue and the svn update fails, let me know. Barry On Thu, Mar 13, 2008 at 2:38 AM, Jarrod Millman wrote: > On Wed, Mar 12, 2008 at 10:43 PM, Jarrod Millman wrote: > > Stefan and I also triaged the remaining tickets--closing several and > > turning others in to release blockers: > > http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority > > > > I think that it is especially important that we spend some time trying > > to make the 1.0.5 release rock solid. There are several important > > changes in the trunk so I really hope we can get these tickets > > resolved ASAP. I need everyone's help getting this release out. If > > you can help work on any of the open release blockers, please try to > > close them over the weekend. If you have any ideas about the tickets > > but aren't exactly sure how to resolve them please post a message to > > the list or add a comment to the ticket. > > Hello, > > I just noticed that David Cournapeau fixed one of the blockers moments > after I sent out my email asking for help: > http://projects.scipy.org/scipy/numpy/ticket/688 > > Thanks David! > > So we are down to 12 tickets blocking the release. Some of the > tickets are just missing tests, so they should be fairly easy to > implement--for anyone who wants to help get this release out ASAP. 
> > Cheers, > > -- > > > Jarrod Millman > Computational Infrastructure for Research Labs > 10 Giannini Hall, UC Berkeley > phone: 510.643.4014 > http://cirl.berkeley.edu/ > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From aisaac at american.edu Thu Mar 13 17:44:54 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 13 Mar 2008 17:44:54 -0400 Subject: [Numpy-discussion] fromiter + dtype='S' -> Python crash In-Reply-To: <525f23e80803131230k45d4d329y428667fc282fbbf0@mail.gmail.com> References: <525f23e80803130627l1aefabe9p821f315430236519@mail.gmail.com><91cf711d0803131207n6b78a7f5l612a430d868a0c8c@mail.gmail.com><525f23e80803131230k45d4d329y428667fc282fbbf0@mail.gmail.com> Message-ID: On Thu, 13 Mar 2008, Alexander Michael apparently wrote: > I wasn't sure if there was a magic numpy > method to do the loop quickly (as the destination array is created > beforehand) without creating a temporary Python list, but I guess not. > The generator/list-comprehension is likely better than my prototype. Looks like I misunderstood your question: you want an **array** of strings? In principle you should be able to use ``fromiter``, I believe, but it does not work. BUG? (Crasher.) Cheers, Alan Isaac >>> import numpy as N >>> x = [1,2,3] >>> fmt="%03d" >>> N.array([fmt%xi for xi in x],dtype='S') array(['001', '002', '003'], dtype='|S3') >>> N.fromiter([xi for xi in x],dtype='float') array([ 1., 2., 3.]) >>> N.fromiter([xi for xi in x],dtype='S') Python crashes. From orionbelt2 at gmail.com Thu Mar 13 18:06:34 2008 From: orionbelt2 at gmail.com (OrionBelt) Date: Thu, 13 Mar 2008 23:06:34 +0100 Subject: [Numpy-discussion] fromfunction() bug? Message-ID: Hi, According to the fromfunction() example: http://www.scipy.org/Numpy_Example_List_With_Doc#head-597e63df5a6d490abd474ffd84d0419468c8329a fromfunction() should return an array of integers. But when i run the example, i obtain an array of floats: >>> from numpy import * >>> def f(i,j): ... return i**2 + j**2 ... >>> fromfunction(f, (3,3)) array([[ 0., 1., 4.], [ 1., 2., 5.], [ 4., 5., 8.]]) I am on version 1.0.4, same as the examples. Is this a bug? From aisaac at american.edu Thu Mar 13 18:18:30 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 13 Mar 2008 18:18:30 -0400 Subject: [Numpy-discussion] fromfunction() bug? In-Reply-To: References: Message-ID: This is how I would hope ``fromfunction`` would work and it matches the docs. (See below.) You can fix the example ... Cheers, Alan Isaac >>> help(N.fromfunction) Help on function fromfunction in module numpy.core.numeric: fromfunction(function, shape, **kwargs) Returns an array constructed by calling a function on a tuple of number grids. The function should accept as many arguments as the length of shape and work on array inputs. The shape argument is a sequence of numbers indicating the length of the desired output for each axis. The function can also accept keyword arguments (except dtype), which will be passed through fromfunction to the function itself. The dtype argument (default float) determines the data-type of the index grid passed to the function. From orionbelt2 at gmail.com Thu Mar 13 18:22:17 2008 From: orionbelt2 at gmail.com (orionbelt2 at gmail.com) Date: Thu, 13 Mar 2008 23:22:17 +0100 Subject: [Numpy-discussion] fromfunction() bug? 
In-Reply-To: References: Message-ID: <20080313222216.GU14057@ulb.ac.be> On Thu, Mar 13, 2008 at 06:18:30PM -0400, Alan G Isaac wrote: > This is how I would hope ``fromfunction`` would work > and it matches the docs. (See below.) You can fix > the example ... Interesting, i thought the output in the Example List page is auto-generated... From Joris.DeRidder at ster.kuleuven.be Thu Mar 13 22:27:16 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Fri, 14 Mar 2008 03:27:16 +0100 Subject: [Numpy-discussion] subset of array - statistics In-Reply-To: References: Message-ID: <20ACC453-BBDA-4A38-9F71-02D05F267845@ster.kuleuven.be> > I am new to the world of Python and numpy Welcome. > I have successfully imported the data into lists and then created a > single array from the lists. I think putting each quantity in a 1D array is more practical in this case. > I can get the rainfall total over the entire period using: > > But what i would like to do is get an average rainfall for each > month and also > the ability to get rainfall totals for any month and Year Assuming that yr, mth and rain are 1D arrays, you may try something along [[average(rain[(yr == y) & (mth == m)]) for m in unique(mth[yr==y])] for y in unique(yr)] which gives you the monthly average rainfalls stored in lists, one for each year. The rain data cannot be reshaped in a 3D numpy array, because not all months have the same number of days, and not all years have the same number of months. If they could, numpy would allow you to do something like: average(rain.reshape(Nyear, Nmonth, Nday), axis =-1) to get the same result. J. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm -------------- next part -------------- An HTML attachment was scrubbed... URL: From dkiousis at gmail.com Fri Mar 14 06:12:48 2008 From: dkiousis at gmail.com (Dimitrios Kiousis) Date: Fri, 14 Mar 2008 11:12:48 +0100 Subject: [Numpy-discussion] Read array from file Message-ID: <1dfc11660803140312i163f56ccq9fdb94c427c19363@mail.gmail.com> Hello python users, I have an input file consisting of string-lines and float-lines. This is how it looks: # vtk DataFile Version 3.0 VTK file exported from FEAP ASCII DATASET UNSTRUCTURED_GRID POINTS 6935 FLOAT 15.44261 12.05814 54.43124 15.54899 12.00075 53.85503 15.95802 11.92959 53.88939 15.84085 12.00235 54.43274 15.53889 11.16645 54.51649 15.57673 11.10806 53.96009 16.10059 11.06809 53.87672 16.04238 11.11615 54.47454 15.78142 11.82206 53.33932 16.13055 11.75515 53.37313 ................. I want to read the first 5 string lines, and then store the float data (coordinates) into an array. It took me some time to figure this out but this is the script with which I came out: # Read and write the first information lines for i in range(0,5): Fdif.write( Fpst.readline() ) # Read and write coordinates # -------------------------- # Initialization coords = zeros( (nnod,3), float ) for i in range(0,nnod): # Read line x = Fref.readline() # Read lines x = x.split() # Split line to strings x = map ( float,x ) # Convert string elements to floats x = array ( x ) # Make an array for j in range (0,3): coords[i,j] = x[j] It seems quite complicated to me, but I haven't figure any nicer way. Could you tell me if what I am doing looks reasonanble or if there are any other solutions? Do I really need to initiallize coords? Thanks in advance, Dimitrios -------------- next part -------------- An HTML attachment was scrubbed... 
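One small simplification of the reading loop above: the coords array does not need to be pre-allocated, since numpy.array can build it in one go from a list of rows. A rough sketch, reusing the Fref and nnod names from the original snippet (and assuming numpy is imported; see the next reply for an even shorter route via numpy.loadtxt):

import numpy   # or keep the existing "from numpy import *" and drop the numpy. prefix

rows = []
for i in range(nnod):                      # nnod and Fref as in the original script
    rows.append(map(float, Fref.readline().split()))
coords = numpy.array(rows)                 # shape (nnod, 3), float dtype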
URL: From lbolla at gmail.com Fri Mar 14 06:21:32 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Fri, 14 Mar 2008 11:21:32 +0100 Subject: [Numpy-discussion] Read array from file In-Reply-To: <1dfc11660803140312i163f56ccq9fdb94c427c19363@mail.gmail.com> References: <1dfc11660803140312i163f56ccq9fdb94c427c19363@mail.gmail.com> Message-ID: <80c99e790803140321vef197f5ra2fa969ca4df3cdb@mail.gmail.com> what about numpy.loadtxt? In [9]: numpy.loadtxt('test.dat', skiprows=5) Out[9]: array([[ 15.44261, 12.05814, 54.43124], [ 15.54899, 12.00075, 53.85503], [ 15.95802, 11.92959, 53.88939], [ 15.84085, 12.00235, 54.43274], [ 15.53889, 11.16645, 54.51649], [ 15.57673, 11.10806, 53.96009], [ 16.10059, 11.06809, 53.87672], [ 16.04238, 11.11615, 54.47454], [ 15.78142, 11.82206, 53.33932], [ 16.13055, 11.75515, 53.3731 ]]) hth, L. On Fri, Mar 14, 2008 at 11:12 AM, Dimitrios Kiousis wrote: > Hello python users, > > I have an input file consisting of string-lines and float-lines. This is > how it looks: > > # vtk DataFile Version 3.0 > VTK file exported from FEAP > ASCII > DATASET UNSTRUCTURED_GRID > POINTS 6935 FLOAT > 15.44261 12.05814 54.43124 > 15.54899 12.00075 53.85503 > 15.95802 11.92959 53.88939 > 15.84085 12.00235 54.43274 > 15.53889 11.16645 54.51649 > 15.57673 11.10806 53.96009 > 16.10059 11.06809 53.87672 > 16.04238 11.11615 54.47454 > 15.78142 11.82206 53.33932 > 16.13055 11.75515 53.37313 > ................. > > I want to read the first 5 string lines, and then store the float data > (coordinates) into an array. > It took me some time to figure this out but this is the script with which > I came out: > > # Read and write the first information lines > for i in range(0,5): > Fdif.write( Fpst.readline() ) > > # Read and write coordinates > # -------------------------- > > # Initialization > coords = zeros( (nnod,3), float ) > > for i in range(0,nnod): > # Read line > x = Fref.readline() # Read lines > x = x.split() # Split line to strings > x = map ( float,x ) # Convert string elements to floats > x = array ( x ) # Make an array > for j in range (0,3): > coords[i,j] = x[j] > > It seems quite complicated to me, but I haven't figure any nicer way. > Could you tell me if what I am doing looks reasonanble or if there are any > other solutions? > Do I really need to initiallize coords? > > Thanks in advance, > Dimitrios > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zbyszek at in.waw.pl Fri Mar 14 09:45:18 2008 From: zbyszek at in.waw.pl (Zbyszek Szmek) Date: Fri, 14 Mar 2008 14:45:18 +0100 Subject: [Numpy-discussion] fromiter + dtype='S' -> Python crash In-Reply-To: References: <525f23e80803131230k45d4d329y428667fc282fbbf0@mail.gmail.com> Message-ID: <20080314134518.GB14897@szyszka.in.waw.pl> On Thu, Mar 13, 2008 at 05:44:54PM -0400, Alan G Isaac wrote: > Looks like I misunderstood your question: > you want an **array** of strings? > In principle you should be able to use ``fromiter``, > I believe, but it does not work. BUG? (Crasher.) > > >>> import numpy as N > >>> x = [1,2,3] > >>> fmt="%03d" > >>> N.fromiter([xi for xi in x],dtype='S') > Python crashes. It crashes indeed. The problem seems to be with dtype, the element size is taken to be elsize = dtype->elsize which is 0 when called with dtype='S'! 
Afterwards, in the line: if (elcount <= (intp)((~(size_t)0) / elsize)) a 'Floating point exception' is generated. Two questions: 1. why is is it a _floating point_ exception? The variables in question are ints, and the relevant line in disassembly looks like: 0xb7b7a413 : divl 0xffffffd8(%ebp) The string 'Floating point exception' comes from libc, I think, but it is imprecise. A simple program: int main(void){ return 3/0; }; gives the same message. 2. what does dtype with dtype.elsize==0 mean? Should it be allowed at all? If it is sometimes valid, then PyArray_FromIter should be fixed. Cheers, Zbyszek From david.huard at gmail.com Fri Mar 14 11:40:15 2008 From: david.huard at gmail.com (David Huard) Date: Fri, 14 Mar 2008 11:40:15 -0400 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: Message-ID: <91cf711d0803140840u79a40160m2d1c9098112ce394@mail.gmail.com> I added a test for ticket 690. 2008/3/13, Barry Wark : > > I appologize that the Mac OSX buildbot has been so flakey. For some > reason it stops being able to resolve scipy.org on a regular basis > (though other processes on the same machine don't seem to have > trouble). Restarting the slave fixes the issue. Anyways, if anyone is > testing an OS X issue and the svn update fails, let me know. > > > Barry > > > On Thu, Mar 13, 2008 at 2:38 AM, Jarrod Millman > wrote: > > On Wed, Mar 12, 2008 at 10:43 PM, Jarrod Millman > wrote: > > > Stefan and I also triaged the remaining tickets--closing several and > > > turning others in to release blockers: > > > > http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority > > > > > > I think that it is especially important that we spend some time > trying > > > to make the 1.0.5 release rock solid. There are several important > > > changes in the trunk so I really hope we can get these tickets > > > resolved ASAP. I need everyone's help getting this release out. If > > > you can help work on any of the open release blockers, please try to > > > close them over the weekend. If you have any ideas about the > tickets > > > but aren't exactly sure how to resolve them please post a message to > > > the list or add a comment to the ticket. > > > > Hello, > > > > I just noticed that David Cournapeau fixed one of the blockers moments > > after I sent out my email asking for help: > > http://projects.scipy.org/scipy/numpy/ticket/688 > > > > Thanks David! > > > > So we are down to 12 tickets blocking the release. Some of the > > tickets are just missing tests, so they should be fairly easy to > > implement--for anyone who wants to help get this release out ASAP. > > > > Cheers, > > > > -- > > > > > > Jarrod Millman > > Computational Infrastructure for Research Labs > > 10 Giannini Hall, UC Berkeley > > phone: 510.643.4014 > > http://cirl.berkeley.edu/ > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david.huard at gmail.com Fri Mar 14 12:19:13 2008 From: david.huard at gmail.com (David Huard) Date: Fri, 14 Mar 2008 12:19:13 -0400 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: <91cf711d0803140840u79a40160m2d1c9098112ce394@mail.gmail.com> References: <91cf711d0803140840u79a40160m2d1c9098112ce394@mail.gmail.com> Message-ID: <91cf711d0803140919v705fe42fia7b44971047f5b3f@mail.gmail.com> I added a test for ticket 691. Problem is, there seems to be a new bug. I don't know it its related to the change or if it was there before. Please check this out. David 2008/3/14, David Huard : > > I added a test for ticket 690. > > 2008/3/13, Barry Wark : > > > > I appologize that the Mac OSX buildbot has been so flakey. For some > > reason it stops being able to resolve scipy.org on a regular basis > > (though other processes on the same machine don't seem to have > > trouble). Restarting the slave fixes the issue. Anyways, if anyone is > > testing an OS X issue and the svn update fails, let me know. > > > > > > Barry > > > > > > On Thu, Mar 13, 2008 at 2:38 AM, Jarrod Millman > > wrote: > > > On Wed, Mar 12, 2008 at 10:43 PM, Jarrod Millman > > wrote: > > > > Stefan and I also triaged the remaining tickets--closing several > > and > > > > turning others in to release blockers: > > > > > > http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority > > > > > > > > I think that it is especially important that we spend some time > > trying > > > > to make the 1.0.5 release rock solid. There are several important > > > > changes in the trunk so I really hope we can get these tickets > > > > resolved ASAP. I need everyone's help getting this release > > out. If > > > > you can help work on any of the open release blockers, please try > > to > > > > close them over the weekend. If you have any ideas about the > > tickets > > > > but aren't exactly sure how to resolve them please post a message > > to > > > > the list or add a comment to the ticket. > > > > > > Hello, > > > > > > I just noticed that David Cournapeau fixed one of the blockers > > moments > > > after I sent out my email asking for help: > > > http://projects.scipy.org/scipy/numpy/ticket/688 > > > > > > Thanks David! > > > > > > So we are down to 12 tickets blocking the release. Some of the > > > tickets are just missing tests, so they should be fairly easy to > > > implement--for anyone who wants to help get this release out ASAP. > > > > > > Cheers, > > > > > > -- > > > > > > > > > Jarrod Millman > > > Computational Infrastructure for Research Labs > > > 10 Giannini Hall, UC Berkeley > > > phone: 510.643.4014 > > > http://cirl.berkeley.edu/ > > > _______________________________________________ > > > Numpy-discussion mailing list > > > Numpy-discussion at scipy.org > > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david.huard at gmail.com Fri Mar 14 13:35:59 2008 From: david.huard at gmail.com (David Huard) Date: Fri, 14 Mar 2008 13:35:59 -0400 Subject: [Numpy-discussion] subset of array - statistics In-Reply-To: <20ACC453-BBDA-4A38-9F71-02D05F267845@ster.kuleuven.be> References: <20ACC453-BBDA-4A38-9F71-02D05F267845@ster.kuleuven.be> Message-ID: <91cf711d0803141035u7dc5becegf80b84c0c51422b7@mail.gmail.com> Look at the timeseries package in scikits (only on svn i'm afraid). You'll find exactly what you're looking for. Conversion from daily to monthly or yearly time series is a breeze. Cheers, David 2008/3/13, Joris De Ridder : > > > I am new to the world of Python and numpy > > > Welcome. > > I have successfully imported the data into lists and then created a single > array from the lists. > > > I think putting each quantity in a 1D array is more practical in this > case. > > I can get the rainfall total over the entire period using: > > > > But what i would like to do is get an average rainfall for each month and > also > the ability to get rainfall totals for any month and Year > > > Assuming that yr, mth and rain are 1D arrays, you may try something along > > [[average(rain[(yr == y) & (mth == m)]) for m in unique(mth[yr==y])] for y > in unique(yr)] > > which gives you the monthly average rainfalls stored in lists, one for > each year. > > The rain data cannot be reshaped in a 3D numpy array, because not all > months have the same number of days, and not all years have the same number > of months. If they could, numpy would allow you to do something like: > > average(rain.reshape(Nyear, Nmonth, Nday), axis =-1) > > to get the same result. > > J. > > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more > information. > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Fri Mar 14 15:53:17 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 14 Mar 2008 14:53:17 -0500 Subject: [Numpy-discussion] fromiter + dtype='S' -> Python crash In-Reply-To: <20080314134518.GB14897@szyszka.in.waw.pl> References: <525f23e80803131230k45d4d329y428667fc282fbbf0@mail.gmail.com> <20080314134518.GB14897@szyszka.in.waw.pl> Message-ID: <47DAD7AD.10100@enthought.com> Zbyszek Szmek wrote: > On Thu, Mar 13, 2008 at 05:44:54PM -0400, Alan G Isaac wrote: > >> Looks like I misunderstood your question: >> you want an **array** of strings? >> In principle you should be able to use ``fromiter``, >> I believe, but it does not work. BUG? (Crasher.) >> >> >>>>> import numpy as N >>>>> x = [1,2,3] >>>>> fmt="%03d" >>>>> N.fromiter([xi for xi in x],dtype='S') >>>>> >> Python crashes. >> > > > > 2. what does dtype with dtype.elsize==0 mean? Should it be allowed at all? > If it is sometimes valid, then PyArray_FromIter should be fixed. > It is a bug that needs to be fixed in PyArray_FromIter, I think. -Travis O. From dalcinl at gmail.com Fri Mar 14 18:42:07 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 14 Mar 2008 19:42:07 -0300 Subject: [Numpy-discussion] Read array from file In-Reply-To: <1dfc11660803140312i163f56ccq9fdb94c427c19363@mail.gmail.com> References: <1dfc11660803140312i163f56ccq9fdb94c427c19363@mail.gmail.com> Message-ID: If you just want to manage VTK files, the you have to definitely try pyvtk. 
http://cens.ioc.ee/projects/pyvtk/ I have a similar numpy-based but independent implementation, not fully tested, targeted to only write VTK files for big datasets (let say, more than 1 millon nodes) in eider ascii or bynary format. Never found time for implementing reading. On 3/14/08, Dimitrios Kiousis wrote: > Hello python users, > > I have an input file consisting of string-lines and float-lines. This is how > it looks: > > # vtk DataFile Version 3.0 > VTK file exported from FEAP > ASCII > DATASET UNSTRUCTURED_GRID > POINTS 6935 FLOAT > 15.44261 12.05814 54.43124 > 15.54899 12.00075 53.85503 > 15.95802 11.92959 53.88939 > 15.84085 12.00235 54.43274 > 15.53889 11.16645 54.51649 > 15.57673 11.10806 53.96009 > 16.10059 11.06809 53.87672 > 16.04238 11.11615 54.47454 > 15.78142 11.82206 53.33932 > 16.13055 11.75515 53.37313 > ................. > > I want to read the first 5 string lines, and then store the float data > (coordinates) into an array. > It took me some time to figure this out but this is the script with which I > came out: > > # Read and write the first information lines > for i in range(0,5): > Fdif.write( Fpst.readline() ) > > # Read and write coordinates > # -------------------------- > > # Initialization > coords = zeros( (nnod,3), float ) > > for i in range(0,nnod): > # Read line > x = Fref.readline() # Read lines > x = x.split() # Split line to strings > x = map ( float,x ) # Convert string elements to floats > x = array ( x ) # Make an array > for j in range (0,3): > coords[i,j] = x[j] > > It seems quite complicated to me, but I haven't figure any nicer way. Could > you tell me if what I am doing looks reasonanble or if there are any other > solutions? > Do I really need to initiallize coords? > > Thanks in advance, > Dimitrios > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From dineshbvadhia at hotmail.com Fri Mar 14 21:00:58 2008 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 14 Mar 2008 18:00:58 -0700 Subject: [Numpy-discussion] dimensions too large error Message-ID: For the following code: I = 18000 J = 33000 filename = 'ij.txt' A = scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) for line in open(filename, 'r'): etc. The following message appears: Traceback (most recent call last): File "C:\...\....py", line 362, in A= scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) ValueError: dimensions too large. Is there a limit to array/matrix dimension sizes? Btw, for numpy array's, ascontiguousarray() is available to set aside contiguous memory. Is there an equivalent for scipy matrix ie. an ascontiguousmatrix()? Dinesh -------------- next part -------------- An HTML attachment was scrubbed... 
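A quick back-of-the-envelope check of what that empty() call is asking for (assuming numpy.int is a 4-byte C long, as it is on a 32-bit Windows build):

I, J = 18000, 33000
nbytes = I * J * 4
print nbytes              # 2376000000
print nbytes / 2.0**30    # roughly 2.2 GiB, requested as a single contiguous block

That single contiguous allocation is what the replies below are referring to.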
URL: From stefan at sun.ac.za Fri Mar 14 22:52:56 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 14 Mar 2008 19:52:56 -0700 Subject: [Numpy-discussion] dimensions too large error In-Reply-To: References: Message-ID: <9457e7c80803141952h2a4bdf41v5549553925324424@mail.gmail.com> Hi Dinesh On Fri, Mar 14, 2008 at 6:00 PM, Dinesh B Vadhia wrote: > For the following code: > > I = 18000 > J = 33000 > filename = 'ij.txt' > A = scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) > for line in open(filename, 'r'): > etc. > > The following message appears: > > Traceback (most recent call last): > File "C:\...\....py", line 362, in > A= scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) > ValueError: dimensions too large. > > Is there a limit to array/matrix dimension sizes? You are trying to allocate a contiguous block of memory of roughly 2.2Gb. I'm wondering whether you have enough memory available, and whether that memory is not already fragmented? If your matrix is not dense, you can use the sparse matrix structures from scipy.sparse to represent all the non-zeros. Regards St?fan From peridot.faceted at gmail.com Sat Mar 15 00:33:51 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 15 Mar 2008 00:33:51 -0400 Subject: [Numpy-discussion] dimensions too large error In-Reply-To: References: Message-ID: On 14/03/2008, Dinesh B Vadhia wrote: > For the following code: > > I = 18000 > J = 33000 > filename = 'ij.txt' > A = scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) > for line in open(filename, 'r'): > etc. > > The following message appears: > > Traceback (most recent call last): > File "C:\...\....py", line 362, in > A= scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) > ValueError: dimensions too large. > > Is there a limit to array/matrix dimension sizes? Yes. On 32-bit machines the hardware makes it exceedingly difficult for one process to access more than two or three gigabytes of RAM, so numpy's strides and sizes are all 32-bit integers. As a result you can't make arrays bigger than about 2 GB. If you need this I'm afraid you pretty much need a 64-bit machine. > Btw, for numpy array's, ascontiguousarray() is available to set aside > contiguous memory. Is there an equivalent for scipy matrix ie. an > ascontiguousmatrix()? That's not exactly what ascontiguousarray() is for. Normally if you create a fresh numpy array it will be allocated as a single contiguous block of memory, and the strides will be arranged in that special way that numpy calls "contiguous". However, if you take a subarray of that array - every second element, say - the resulting data is not contiguous (in the sense that there are gaps between the elements). Normally this is not a problem. But a few functions - mostly handwritten C or FORTRAN code - needs contiguous arrays. The function ascontiguousarray() will return the original array if it is contiguous (and in C order), or make a copy if it isn't. Numpy matrices are just a slight redefinition of the operators on an array, so you can always convert an array to a matrix without copying. Thus there's little cost (for large arrays) to just using matrix(ascontiguousarray()) if you need matrices. 
Good luck, Anne From stefan at sun.ac.za Sat Mar 15 03:28:25 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 15 Mar 2008 00:28:25 -0700 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: <9457e7c80803142339l3c38de83m2ce3150e8eb3d4b6@mail.gmail.com> References: <91cf711d0803140840u79a40160m2d1c9098112ce394@mail.gmail.com> <91cf711d0803140919v705fe42fia7b44971047f5b3f@mail.gmail.com> <9457e7c80803142339l3c38de83m2ce3150e8eb3d4b6@mail.gmail.com> Message-ID: <9457e7c80803150028h76e4da1dl7c9b80dde6a6efee@mail.gmail.com> Hi David On Fri, Mar 14, 2008 at 9:19 AM, David Huard wrote: > I added a test for ticket 691. Problem is, there seems to be a new bug. I > don't know it its related to the change or if it was there before. Please > check this out. Fantastic, thanks for jumping in and addressing #691. I filed the new failure as ticket #700: http://scipy.org/scipy/numpy/ticket/700 If we keep going at this pace, we'll be releasing 1.0.5 in no time at all. Cheers St?fan From xavier.gnata at gmail.com Sat Mar 15 15:48:53 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Sat, 15 Mar 2008 20:48:53 +0100 Subject: [Numpy-discussion] Numpy and OpenMP Message-ID: <47DC2825.8050501@gmail.com> Hi, Numpy is great : I can see several IDL/matlab projects switching to numpy :) However, it would be soooo nice to be able to put some OpenMP into the numpy code. It would be nice to be able to be able to use several CPU using the numpy syntax ie A=sqrt(B). Ok, we can use some inline C/C++ code but it is not so easy. Ok, we can split the data over several python executables (one per CPU) but A=sqrt(B) is so simple... numpy + recent gcc with OpenMP --> :) ? Any comments ? Xavier From robert.kern at gmail.com Sat Mar 15 15:59:12 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 15 Mar 2008 14:59:12 -0500 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <47DC2825.8050501@gmail.com> References: <47DC2825.8050501@gmail.com> Message-ID: <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> On Sat, Mar 15, 2008 at 2:48 PM, Gnata Xavier wrote: > Hi, > > Numpy is great : I can see several IDL/matlab projects switching to numpy :) > However, it would be soooo nice to be able to put some OpenMP into the > numpy code. > > It would be nice to be able to be able to use several CPU using the > numpy syntax ie A=sqrt(B). > > Ok, we can use some inline C/C++ code but it is not so easy. > Ok, we can split the data over several python executables (one per CPU) > but A=sqrt(B) is so simple... > > numpy + recent gcc with OpenMP --> :) ? > Any comments ? Eric Jones tried to use multithreading to split the computation of ufuncs across CPUs. Ultimately, the overhead of locking and unlocking made it prohibitive for medium-sized arrays and only somewhat disappointing improvements in performance for quite large arrays. I'm not familiar enough with OpenMP to determine if this result would be applicable to it. If you would like to try, we can certainly give you pointers as to where to start. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From xavier.gnata at gmail.com Sat Mar 15 16:51:40 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Sat, 15 Mar 2008 21:51:40 +0100 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> Message-ID: <47DC36DC.2060608@gmail.com> Robert Kern wrote: > On Sat, Mar 15, 2008 at 2:48 PM, Gnata Xavier wrote: > >> Hi, >> >> Numpy is great : I can see several IDL/matlab projects switching to numpy :) >> However, it would be soooo nice to be able to put some OpenMP into the >> numpy code. >> >> It would be nice to be able to be able to use several CPU using the >> numpy syntax ie A=sqrt(B). >> >> Ok, we can use some inline C/C++ code but it is not so easy. >> Ok, we can split the data over several python executables (one per CPU) >> but A=sqrt(B) is so simple... >> >> numpy + recent gcc with OpenMP --> :) ? >> Any comments ? >> > > Eric Jones tried to use multithreading to split the computation of > ufuncs across CPUs. Ultimately, the overhead of locking and unlocking > made it prohibitive for medium-sized arrays and only somewhat > disappointing improvements in performance for quite large arrays. I'm > not familiar enough with OpenMP to determine if this result would be > applicable to it. If you would like to try, we can certainly give you > pointers as to where to start. > > Well of course if the arrays are too small it will be slower but it *is* much faster on large arrays. In many cases, there is no need lock/unlock : Look at A=sqrt(A) : it is obvious to speed-up such a compuation in pure C using OpenMP. From a simple minded point of view, I would say that somewhere in numpy, there should be such a C loop. Why do we really (IMHO) need that ? Because "all" the machines (even laptops) are now multicore/cpu. Using IDL, it is possible to develop a quite large image processing project working on large images on a 8core machine without *any* konwledge of semaphore/lock. All this piece of software uses the simple syntaxe of IDL (ugly ones compare to numpy :)) I used to be very sceptical about the performances but I had a look and *it just works well*. It scales just nicely up to 6 cores. Small arrays computation shall *not* be threaded, large ones should be if we look at the multicores trend. Any comments ? Xavier -------------- next part -------------- An HTML attachment was scrubbed... URL: From eads at soe.ucsc.edu Sat Mar 15 17:22:49 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Sat, 15 Mar 2008 15:22:49 -0600 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> Message-ID: <47DC3E29.7060301@soe.ucsc.edu> Robert Kern wrote: > On Sat, Mar 15, 2008 at 2:48 PM, Gnata Xavier wrote: >> Hi, >> >> Numpy is great : I can see several IDL/matlab projects switching to numpy :) >> However, it would be soooo nice to be able to put some OpenMP into the >> numpy code. >> >> It would be nice to be able to be able to use several CPU using the >> numpy syntax ie A=sqrt(B). >> >> Ok, we can use some inline C/C++ code but it is not so easy. >> Ok, we can split the data over several python executables (one per CPU) >> but A=sqrt(B) is so simple... >> >> numpy + recent gcc with OpenMP --> :) ? >> Any comments ? 
> > Eric Jones tried to use multithreading to split the computation of > ufuncs across CPUs. Ultimately, the overhead of locking and unlocking > made it prohibitive for medium-sized arrays and only somewhat > disappointing improvements in performance for quite large arrays. I'm > not familiar enough with OpenMP to determine if this result would be > applicable to it. If you would like to try, we can certainly give you > pointers as to where to start. Perhaps I'm missing something. How is locking and synchronization an issue when each thread is writing to a mutually exclusive part of the output buffer? Thanks, Damian From peridot.faceted at gmail.com Sat Mar 15 19:33:51 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 15 Mar 2008 19:33:51 -0400 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <47DC3E29.7060301@soe.ucsc.edu> References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> <47DC3E29.7060301@soe.ucsc.edu> Message-ID: On 15/03/2008, Damian Eads wrote: > Robert Kern wrote: > > Eric Jones tried to use multithreading to split the computation of > > ufuncs across CPUs. Ultimately, the overhead of locking and unlocking > > made it prohibitive for medium-sized arrays and only somewhat > > disappointing improvements in performance for quite large arrays. I'm > > not familiar enough with OpenMP to determine if this result would be > > applicable to it. If you would like to try, we can certainly give you > > pointers as to where to start. > > Perhaps I'm missing something. How is locking and synchronization an > issue when each thread is writing to a mutually exclusive part of the > output buffer? The trick is to efficiently allocate these output buffers. If you simply give each thread 1/n th of the job, if one CPU is otherwise occupied it doubles your computation time. If you break the job into many pieces and let threads grab them, you need to worry about locking to keep two threads from grabbing the same piece of data. Plus, depending on where things are in memory you can kill performance by abusing the caches (maintaining cache consistency across CPUs can be a challenge). Plus a certain amount of numpy code depends on order of evaluation: a[:-1] = 2*a[1:] Correctly handling all this can take a lot of overhead, and require a lot of knowledge about hardware. OpenMP tries to take care of some of this in a way that's easy on the programmer. To answer the OP's question, there is a relatively small number of C inner loops that could be marked up with OpenMP #pragmas to cover most matrix operations. Matrix linear algebra is a separate question, since numpy/scipy prefers to use optimized third-party libraries - in these cases one would need to use parallel linear algebra libraries (which do exist, I think, and are plug-compatible). So parallelizing numpy is probably feasible, and probably not too difficult, and would be valuable. The biggest catch, I think, would be compilation issues - is it possible to link an OpenMP-compiled shared library into a normal executable? Anne From sransom at nrao.edu Sat Mar 15 19:40:10 2008 From: sransom at nrao.edu (Scott Ransom) Date: Sat, 15 Mar 2008 19:40:10 -0400 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> <47DC3E29.7060301@soe.ucsc.edu> Message-ID: <20080315234010.GA28983@ssh.cv.nrao.edu> On Sat, Mar 15, 2008 at 07:33:51PM -0400, Anne Archibald wrote: > ... 
> To answer the OP's question, there is a relatively small number of C > inner loops that could be marked up with OpenMP #pragmas to cover most > matrix operations. Matrix linear algebra is a separate question, since > numpy/scipy prefers to use optimized third-party libraries - in these > cases one would need to use parallel linear algebra libraries (which > do exist, I think, and are plug-compatible). So parallelizing numpy is > probably feasible, and probably not too difficult, and would be > valuable. OTOH, there are reasons to _not_ want numpy to automatically use OpenMP. I personally have a lot of multi-core CPUs and/or multi-processor servers that I use numpy on. The way I use numpy is to run a bunch of (embarassingly) parallel numpy jobs, one for each CPU core. If OpenMP became "standard" (and it does work well in gcc 4.2 and 4.3), we definitely want to have control over how it is used... > The biggest catch, I think, would be compilation issues - is > it possible to link an OpenMP-compiled shared library into a normal > executable? I think so. The new gcc compilers use the libgomp libraries to provide the OpenMP functionality. I'm pretty sure those work just like any other libraries. S -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From xavier.gnata at gmail.com Sat Mar 15 20:03:55 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Sun, 16 Mar 2008 01:03:55 +0100 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <20080315234010.GA28983@ssh.cv.nrao.edu> References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> <47DC3E29.7060301@soe.ucsc.edu> <20080315234010.GA28983@ssh.cv.nrao.edu> Message-ID: <47DC63EB.7000905@gmail.com> Scott Ransom wrote: > On Sat, Mar 15, 2008 at 07:33:51PM -0400, Anne Archibald wrote: > >> ... >> To answer the OP's question, there is a relatively small number of C >> inner loops that could be marked up with OpenMP #pragmas to cover most >> matrix operations. Matrix linear algebra is a separate question, since >> numpy/scipy prefers to use optimized third-party libraries - in these >> cases one would need to use parallel linear algebra libraries (which >> do exist, I think, and are plug-compatible). So parallelizing numpy is >> probably feasible, and probably not too difficult, and would be >> valuable. >> > > OTOH, there are reasons to _not_ want numpy to automatically use > OpenMP. I personally have a lot of multi-core CPUs and/or > multi-processor servers that I use numpy on. The way I use numpy > is to run a bunch of (embarassingly) parallel numpy jobs, one for > each CPU core. If OpenMP became "standard" (and it does work well > in gcc 4.2 and 4.3), we definitely want to have control over how > it is used... > > "embarassingly parallel" spliting is just fine in some cases (KISS) but IMHO there is a point to get OpenMP into numpy. Look at the g++ people : They have added a parallel version of the C++ STL into gcc4.3. Of course the non paralell one is still the standard/defaut one but here is the trend. For now we have no easy way to perform A = B + C on more than one CPU in numpy (except the limited embarassingly parallel paradigm) Yes, we want to be able to tune and to switch off (by default?) the numpy threading capability, but IMHO having this threading capability will always be better than a fully non paralell numpy. 
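In the meantime the chunking can at least be written once at the Python level.
Below is a minimal, untested sketch (the helper name, the axis-0 split and the
nthreads knob are made up for illustration, not an existing numpy API); it only
pays off if the wrapped operation releases the GIL while working on its chunk,
otherwise separate processes remain the only way to keep all cores busy:

import threading
import numpy as np

def chunked_apply(func, arr, out, nthreads=2):
    # Split the first axis into nthreads roughly equal chunks and let one
    # thread evaluate func on each chunk. Every thread writes to its own,
    # disjoint slice of the preallocated output, so no locking is needed.
    bounds = np.linspace(0, arr.shape[0], nthreads + 1).astype(int)

    def work(lo, hi):
        out[lo:hi] = func(arr[lo:hi])

    threads = [threading.Thread(target=work, args=(bounds[i], bounds[i + 1]))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

# A = sqrt(B) split over 4 threads:
B = np.random.rand(2000, 2000)
A = chunked_apply(np.sqrt, B, np.empty_like(B), nthreads=4)
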
>> The biggest catch, I think, would be compilation issues - is >> it possible to link an OpenMP-compiled shared library into a normal >> executable? >> > > I think so. The new gcc compilers use the libgomp libraries to > provide the OpenMP functionality. I'm pretty sure those work just > like any other libraries. > > S > > From charlesr.harris at gmail.com Sat Mar 15 20:08:26 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 15 Mar 2008 18:08:26 -0600 Subject: [Numpy-discussion] What should be the return type of average? Message-ID: Hi, I want to fix up the average function. I note that the return dtype is not specified, nor is the precision of the accumulator. Both of these can be specified for the mean method and I wonder what should be the case for average. Or should we just use double precision? That would seem appropriate to me most of the time, but wouldn't match what happens with mean and would lose precision in the case of extended precision doubles. There is also no out keyword, do we want one? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From eads at soe.ucsc.edu Sat Mar 15 21:25:59 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Sat, 15 Mar 2008 19:25:59 -0600 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> <47DC3E29.7060301@soe.ucsc.edu> Message-ID: <47DC7727.2030704@soe.ucsc.edu> Anne, Sure. I've found multi-threaded scientific computation to give mixed results. For some things, it results in very significant performance gains, and other things, it's not worth the trouble at all. It really does depend on what you're doing. But, I don't think it's fair to paint multithreaded programming with the same brush just because there exist pathologies. Robert: what benchmarks were performed showing less than pleasing performance gains? Anne Archibald wrote: > On 15/03/2008, Damian Eads wrote: >> Robert Kern wrote: >> > Eric Jones tried to use multithreading to split the computation of >> > ufuncs across CPUs. Ultimately, the overhead of locking and unlocking >> > made it prohibitive for medium-sized arrays and only somewhat >> > disappointing improvements in performance for quite large arrays. I'm >> > not familiar enough with OpenMP to determine if this result would be >> > applicable to it. If you would like to try, we can certainly give you >> > pointers as to where to start. >> >> Perhaps I'm missing something. How is locking and synchronization an >> issue when each thread is writing to a mutually exclusive part of the >> output buffer? > > The trick is to efficiently allocate these output buffers. If you > simply give each thread 1/n th of the job, if one CPU is otherwise > occupied it doubles your computation time. If you break the job into > many pieces and let threads grab them, you need to worry about locking > to keep two threads from grabbing the same piece of data. For element-wise unary and binary array operations, there would never be two threads reading from the same memory at the same time. When performing matrix multiplication, more than two threads will access the same memory but this is fine as long as their accesses are read-only. The moment there is a chance one thread might need to write to the same buffer that one or more threads are reading from, use a read/write lock (pthreads supports this). 
As far as coordinating the work for the threads, there are several possible approaches (this is not a complete list): 1. assign to each of them the part of the buffer to work on beforehand. This assumes each thread will compute at the same rate and will finish the same chunk roughly in the same amount of time. This is not always a valid assumption. 2. assign smaller chunks, leaving a large amount of unassigned work. As threads complete computation of a chunk, assign them another chunk. This requires some memory to keep track of the chunks assigned and unassigned. Since it is possible for multiple threads to try to access (with at least one modifying thread) this chunk assignment structure at the same time, you need synchronization. In some cases, the overhead for doing this synchronization is minimal. 3. use approach #2 but assign chunk sizes of random sizes to reduce contention between threads trying to access the chunk assignment structure at the same time. 4. for very large jobs, have a chunk assignment server. Some of my experiments take several weeks and are spread across 64 processors (8 machines, 8 processors per machine). Individual units of computation take anywhere from 30 minutes to 8 hours. The cost of asking the chunk assignment server for a new chunk are minimal relative to the amount of time it takes to compute on the chunk. By not assigning all the computation up front in the beginning, most processors are working nearly all the time. It's only during the last day or two of the experiment, do there exist processors with nothing to do. > Plus, > depending on where things are in memory you can kill performance by > abusing the caches (maintaining cache consistency across CPUs can be a > challenge). Plus a certain amount of numpy code depends on order of > evaluation: > > a[:-1] = 2*a[1:] Yes, but there are many, many instances when the order of evaluation in an array is sequential. I'm not advocating that numpy tool be devised to handle the parallelization of arbitrary computation, just common kinds of computation where performance gains might be realized. > Correctly handling all this can take a lot of overhead, and require a > lot of knowledge about hardware. OpenMP tries to take care of some of > this in a way that's easy on the programmer. > > To answer the OP's question, there is a relatively small number of C > inner loops that could be marked up with OpenMP #pragmas to cover most > matrix operations. Matrix linear algebra is a separate question, since > numpy/scipy prefers to use optimized third-party libraries - in these > cases one would need to use parallel linear algebra libraries (which > do exist, I think, and are plug-compatible). So parallelizing numpy is > probably feasible, and probably not too difficult, and would be > valuable. Yes, but there is a limit to the parallelization that can be achieved with vanilla numpy. numpy evaluates Python expressions, one at a time; thus, expressions like sqrt(0.5 * B * C + D * (E + F)) are not well-parallelized and they waste scratch space. One workaround is having the __add__ and __mul__ functions return place-holder objects, instead of doing the actual computation. A = sqrt(0.5 * B * C + D * (E + F)).compute() Then invoke .compute() on the outermost placeholder object to perform the computation in a parallelized fashion. What .compute does is a big open question. One possibility is to generate C code and run it. 
For example, the Python expression above might result in the following C code:

    for (i = chunk_start; i < chunk_end; i++) {
        A[i] = sqrt(0.5 * B[i] * C[i] + D[i] * (E[i] + F[i]));
    }

Each thread is given a different value for chunk_start and chunk_end. Of
course, it is desirable that each of the input matrices B, C, D, E, F be
contiguous for good use of the cache. There are many possibilities about
what's done with the placeholder objects.

The issue of thread contention on chunk assignment data structures is valid.
In some cases, the overhead may be minimal. In other cases, there are
strategies one can employ to reduce contention.

Damian
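A bare-bones Python sketch of the place-holder idea above might look like the
following (untested; the Lazy class, the lazy()/lazy_sqrt() helpers and the
compute() chunking are illustrative names only, not an existing numpy
interface, and compute() here simply walks the tree block by block in pure
Python rather than generating C):

import numpy as np

class Lazy(object):
    # Place-holder for an element-wise expression. Arithmetic only builds an
    # expression tree; compute() evaluates it block by block, and the blocks
    # are independent, so they could be handed out to threads.

    def __init__(self, op, args):
        self.op = op        # a numpy ufunc, or None for a wrapped array
        self.args = args

    def __add__(self, other):  return Lazy(np.add, (self, other))
    def __radd__(self, other): return Lazy(np.add, (other, self))
    def __mul__(self, other):  return Lazy(np.multiply, (self, other))
    def __rmul__(self, other): return Lazy(np.multiply, (other, self))

    def shape(self):
        if self.op is None:                 # leaf: a real ndarray
            return self.args[0].shape
        for a in self.args:
            if isinstance(a, Lazy):
                s = a.shape()
                if s is not None:
                    return s
            elif isinstance(a, np.ndarray):
                return a.shape
        return None

    def evaluate(self, sl):
        # Evaluate one block (a slice along the first axis) of the tree.
        if self.op is None:
            return self.args[0][sl]
        vals = []
        for a in self.args:
            if isinstance(a, Lazy):
                vals.append(a.evaluate(sl))
            elif isinstance(a, np.ndarray):
                vals.append(a[sl])          # bare arrays are sliced too
            else:
                vals.append(a)              # scalars pass through
        return self.op(*vals)

    def compute(self, blocksize=8192):
        shape = self.shape()
        out = np.empty(shape)
        for start in range(0, shape[0], blocksize):
            sl = slice(start, start + blocksize)
            out[sl] = self.evaluate(sl)     # each block is independent
        return out

def lazy(arr):
    return Lazy(None, (arr,))

def lazy_sqrt(x):
    return Lazy(np.sqrt, (x,))

# usage
B, C, D, E, F = [lazy(np.random.rand(4096, 512)) for _ in range(5)]
A = lazy_sqrt(0.5 * B * C + D * (E + F)).compute()
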
From eads at soe.ucsc.edu Sat Mar 15 21:47:35 2008
From: eads at soe.ucsc.edu (Damian Eads)
Date: Sat, 15 Mar 2008 19:47:35 -0600
Subject: [Numpy-discussion] Numpy and OpenMP
In-Reply-To: 
References: <47DC2825.8050501@gmail.com>
	<3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com>
	<47DC3E29.7060301@soe.ucsc.edu>
Message-ID: <47DC7C37.9070204@soe.ucsc.edu>

I am forwarding a response from one of my colleagues, Edward Rosten.

Edward Rosten writes:

Anne Archibald wrote:
> On 15/03/2008, Damian Eads wrote:
> > Robert Kern wrote:
> > > Eric Jones tried to use multithreading to split the computation
> > > of ufuncs across CPUs. Ultimately, the overhead of locking and
> > > unlocking made it prohibitive for medium-sized arrays and only
> > > somewhat disappointing improvements in performance for quite
> > > large arrays. I'm not familiar enough with OpenMP to determine if
> > > this result would be applicable to it. If you would like to try,
> > > we can certainly give you pointers as to where to start.
> >
> > Perhaps I'm missing something. How is locking and synchronization an
> > issue when each thread is writing to a mutually exclusive part of
> > the output buffer?
>
> The trick is to efficiently allocate these output buffers. If you
> simply give each thread 1/n th of the job, if one CPU is otherwise
> occupied it doubles your computation time.

I do not see how this is the case. There will be a small amount of time
spent switching between threads at the OS level if the OS has to run more
than one thread. However, in my experience using far more threads than
CPUs has little impact on performance.

> If you break the job into
> many pieces and let threads grab them, you need to worry about locking
> to keep two threads from grabbing the same piece of data.

If you split the job into as many pieces as threads and then hand the
pieces to the threads and then run the threads, there is no problem with
the threads grabbing the data.

If you split the job up into many more pieces than threads, then you have
to deal with handing out bits to threads whenever they finish. The
synchronization for this is not difficult: I have used the architecture
where you have a "global" message queue, which has mutexes around all
operations. All jobs are put into the queue and all threads try to
extract jobs from the queue. Attempting to read from an empty queue
blocks.

In C++ with POSIX thread primitives, the code for the message queue is:

#include <deque>        // for deque
#include <semaphore.h>  // for sem_t

// Synchronized is a mutex wrapper class; its definition was not
// included in the original message.
template <class C>
class MessageQueue
{
    public:
        MessageQueue()
        {
            sem_init(&empty_slots, 0, 0);
        }

        ~MessageQueue()
        {
            sem_destroy(&empty_slots);
        }

        void write(const C& message)
        {
            // Lock the queue, so it can be safely used.
            queue_mutex.lock();
            queue.push_back(message);
            queue_mutex.unlock();
            sem_post(&empty_slots);
        }

        C read()
        {
            sem_wait(&empty_slots);
            C ret;
            queue_mutex.lock();
            ret = queue.front();
            queue.pop_front();
            queue_mutex.unlock();
            return ret;
        }

    private:
        Synchronized queue_mutex;
        deque<C> queue;
        sem_t empty_slots;
};

C is typically a class which derives from:

struct Runnable
{
    virtual void run() = 0;
    virtual ~Runnable() {}
};

This simple abstract class allows the threads reading from the queue to
know how to execute the code associated with the message. However, in
this case, there is no real need for this. If you have a pool of N
threads, you can simply split the job into N chunks and hand one chunk to
each thread.

> Plus,
> depending on where things are in memory you can kill performance by
> abusing the caches (maintaining cache consistency across CPUs can be a
> challenge).
> Plus a certain amount of numpy code depends on order of
> evaluation:
>
> a[:-1] = 2*a[1:]
>
> Correctly handling all this can take a lot of overhead, and require a
> lot of knowledge about hardware.

The easiest solution to the case above is to simply not optimize those
cases. This is the approach all C compilers use. If one can detect this,
then choosing not to optimize can be handled automatically. If it can't
be handled automatically, then you have to make the user promise not to
abuse aliasing.

> OpenMP tries to take care of some of
> this in a way that's easy on the programmer.

IIRC OpenMP uses essentially the fork/join structure where a task is run
in parallel, the program waits for all threads to finish, then continues.
I've used this style (with pthreads) and it makes synchronization hard to
mess up. However, you also have to promise to the compiler that your
arrays don't alias in nasty ways.

-Ed

> To answer the OP's question, there is a relatively small number of C
> inner loops that could be marked up with OpenMP #pragmas to cover most
> matrix operations. Matrix linear algebra is a separate question, since
> numpy/scipy prefers to use optimized third-party libraries - in these
> cases one would need to use parallel linear algebra libraries (which
> do exist, I think, and are plug-compatible). So parallelizing numpy is
> probably feasible, and probably not too difficult, and would be
> valuable. The biggest catch, I think, would be compilation issues - is
> it possible to link an OpenMP-compiled shared library into a normal
> executable?
>
> Anne

--
(You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)

/d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1
r 230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12
d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage

From robert.kern at gmail.com Sat Mar 15 22:24:35 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 15 Mar 2008 21:24:35 -0500
Subject: [Numpy-discussion] Numpy and OpenMP
In-Reply-To: <47DC7727.2030704@soe.ucsc.edu>
References: <47DC2825.8050501@gmail.com>
	<3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com>
	<47DC3E29.7060301@soe.ucsc.edu> <47DC7727.2030704@soe.ucsc.edu>
Message-ID: <3d375d730803151924w71224c45k8e06dd924dfa4af6@mail.gmail.com>

On Sat, Mar 15, 2008 at 8:25 PM, Damian Eads wrote:
> Robert: what benchmarks were performed showing less than pleasing
> performance gains?

The implementation is in the multicore branch. This particular file is
the main benchmark Eric was using.

http://svn.scipy.org/svn/numpy/branches/multicore/benchmarks/time_thread.py

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco

From haase at msg.ucsf.edu Sun Mar 16 06:11:39 2008
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Sun, 16 Mar 2008 11:11:39 +0100
Subject: [Numpy-discussion] What should be the return type of average?
In-Reply-To: 
References: 
Message-ID: 

On Sun, Mar 16, 2008 at 1:08 AM, Charles R Harris wrote:
> Hi,
>
> I want to fix up the average function. I note that the return dtype is not
> specified, nor is the precision of the accumulator. Both of these can be
> specified for the mean method and I wonder what should be the case for
> average. Or should we just use double precision? That would seem appropriate
> to me most of the time, but wouldn't match what happens with mean and would
> lose precision in the case of extended precision doubles. There is also no
> out keyword, do we want one?
>
Hi,
I'm starting to forget... but faintly I'm remembering that there might
have been some extended discussion about this on this list.
We work with large multi-dimensional image data, so if, for example, I
have n (small) 50x512x512 3D-images that I want to average into one
50x512x512 image, the most memory I can afford is single precision
float32.
(Also the original dynamic range is 16bit at best anyway) I was just checking my archives: http://projects.scipy.org/scipy/numpy/ticket/465#comment:2 (by oliphant) actually already says this. Furthermore, an "out" variable like it is in most functions in the ndimage module would certainly be good to have. Cheers, Sebastian Haase From charlesr.harris at gmail.com Sun Mar 16 07:59:12 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Mar 2008 05:59:12 -0600 Subject: [Numpy-discussion] What should be the return type of average? In-Reply-To: References: Message-ID: On Sun, Mar 16, 2008 at 4:11 AM, Sebastian Haase wrote: > On Sun, Mar 16, 2008 at 1:08 AM, Charles R Harris > wrote: > > Hi, > > > > I want to fix up the average function. I note that the return dtype is > not > > specified, nor is the precision of the accumulator. Both of these can be > > specified for the mean method and I wonder what should be the case for > > average. Or should we just use double precision? That would seem > appropriate > > to me most of the time, but wouldn't match what happens with mean and > would > > lose precision in the case of extended precision doubles. There is also > no > > out keyword, do we want one? > > > > Hi, > I'm starting to forget... but faintly I'm remembering that there might > have been some extended discussion about this on this list. > We work with large multi-dimensional image data, so if, for example, I > have n (small) 50x512x512 3D-images that I want to average into one > 50x512x512 image, the most memory I can afford is single precession > float32. (Also the original dynamic range is 16bit at best anyway) > > I was just checking my archives: > http://projects.scipy.org/scipy/numpy/ticket/465#comment:2 (by > oliphant) actually already says this. What I ended up with is double for integer input types and preservation of float types. Thus float32 will be preserved but int8 will return double. These are the same rules one gets with A + 0.0, which is how I did it. That isn't really the most space efficient, however, as in your case a copy of the data cube will be made. As to accumulator type and an out variable, there are already so many parameters in the function that I have become loath to add more at this point, but it would be easy to specify an accumulator type and specifying an out variable shouldn't be much worse. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From devnew at gmail.com Sun Mar 16 11:25:54 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Sun, 16 Mar 2008 08:25:54 -0700 (PDT) Subject: [Numpy-discussion] improving code Message-ID: <2fd2fb96-7858-4e95-9217-cb7d90952036@e10g2000prf.googlegroups.com> hello while trying to write a function that processes some numpy arrays and calculate euclidean distance ,i ended up with this code #some samplevalues totalimgs=17 selectedfacespaces=6 imgpixels=18750 (ie for an image of 125X150 ) ... # i am using these arrays to do the calculation facespace #numpy.ndarray of shape(totalimgs,imgpixels) weights #numpy.ndarray of shape(totalimgs,selectedfacespaces) input_wk #numpy.ndarray of shape(selectedfacespaces,) distance #numpy.ndarray of shape(selectedfacespaces,) initally all 0.0 's mindistance #numpy.ndarray of shape(selectedfacespaces,) initally all 0.0 's ... ... 
#here is the calculations part for image in range(numimgs): distance = abs(input_wk - weights[image, :]) if image==0: #copy from distance to mindistance mindistance=distance.copy() if sum(mindistance) > sum(distance): imgindex=image mindistance=distance.copy() if max(mindistance) > 0.0: mindistance=mindistance/(max(mindistance)+1) dist=sum(mindistance) this gets me the euclidean distance value.I want to know if the way i coded it can be improved,made more compact..if someone can give suggestions it would be nice thanks D From gael.varoquaux at normalesup.org Sun Mar 16 13:37:55 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 16 Mar 2008 18:37:55 +0100 Subject: [Numpy-discussion] Read array from file In-Reply-To: References: <1dfc11660803140312i163f56ccq9fdb94c427c19363@mail.gmail.com> Message-ID: <20080316173755.GA31456@phare.normalesup.org> On Fri, Mar 14, 2008 at 07:42:07PM -0300, Lisandro Dalcin wrote: > If you just want to manage VTK files, the you have to definitely try > pyvtk. http://cens.ioc.ee/projects/pyvtk/ > I have a similar numpy-based but independent implementation, not fully > tested, targeted to only write VTK files for big datasets (let say, > more than 1 millon nodes) in eider ascii or bynary format. Never found > time for implementing reading. TVTK (https://svn.enthought.com/enthought/wiki/TVTK) is a complete VTK wrapping that is numpy-based. That might however be overkill for what you want to do. Cheers, Ga?l From bevan07 at gmail.com Sun Mar 16 18:36:19 2008 From: bevan07 at gmail.com (bevan) Date: Sun, 16 Mar 2008 22:36:19 +0000 (UTC) Subject: [Numpy-discussion] subset of array - statistics References: <20ACC453-BBDA-4A38-9F71-02D05F267845@ster.kuleuven.be> <91cf711d0803141035u7dc5becegf80b84c0c51422b7@mail.gmail.com> Message-ID: David Huard gmail.com> writes: > > > Look at the timeseries package in scikits (only on svn i'm afraid). You'll find exactly what you're looking for. Conversion from daily to monthly or yearly time series is a breeze. Cheers, David > > 2008/3/13, Joris De Ridder ster.kuleuven.be>: > > > I am new to the world of Python and numpy > Welcome. > ? > Assuming that yr, mth and rain are 1D arrays, you may try something along > > > [[average(rain[(yr == y) & (mth == m)]) for m in unique(mth[yr==y])] for y in unique(yr)] > > which gives you the monthly average rainfalls stored in lists, one for each year. > David and Joris, Thank you both for your replies. At the moment I have gone with the timeseries option as I think investing some time in understanding it will aid me with future projects. I have a couple of questions but I'll post on scipy. Thanks again From charlesr.harris at gmail.com Mon Mar 17 04:40:19 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 17 Mar 2008 02:40:19 -0600 Subject: [Numpy-discussion] arccosh for complex numbers, goofy choice of branch Message-ID: OK, Which branch do we want to use. As it currently is in numpy and scipy.special arccosh(1.5) = 0.96242365011920694 arccosh(1.5+0j) = -0.96242365011920705 + 0.0j This is consistent with gsl, but inconsistent with Mathematica, NAG, Maple, and probably all sensible implementations which use the generally accepted principal value. I've left this inconsistency raising an error in the ufunc tests until we make a decision. It might be nice to know what FORTRAN and MatLab do with this. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lbolla at gmail.com Mon Mar 17 06:02:40 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Mon, 17 Mar 2008 11:02:40 +0100 Subject: [Numpy-discussion] arccosh for complex numbers, goofy choice of branch In-Reply-To: References: Message-ID: <80c99e790803170302p199926d2mee7c12e56207504a@mail.gmail.com> Matlab is consistent, I'm afraid: >> acosh(1.5) ans = 0.9624 >> acosh(1.5 + 0j) ans = 0.9624 L. On Mon, Mar 17, 2008 at 9:40 AM, Charles R Harris wrote: > OK, > > Which branch do we want to use. As it currently is in numpy and > scipy.special > > arccosh(1.5) = 0.96242365011920694 > arccosh(1.5+0j) = -0.96242365011920705 + 0.0j > > This is consistent with gsl, but inconsistent with Mathematica, NAG, > Maple, and probably all sensible implementations which use the generally > accepted principal value. I've left this inconsistency raising an error in > the ufunc tests until we make a decision. It might be nice to know what > FORTRAN and MatLab do with this. > > Chuck > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 17 10:07:38 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 17 Mar 2008 08:07:38 -0600 Subject: [Numpy-discussion] arccosh for complex numbers, goofy choice of branch In-Reply-To: <80c99e790803170302p199926d2mee7c12e56207504a@mail.gmail.com> References: <80c99e790803170302p199926d2mee7c12e56207504a@mail.gmail.com> Message-ID: On Mon, Mar 17, 2008 at 4:02 AM, lorenzo bolla wrote: > Matlab is consistent, I'm afraid: > > >> acosh(1.5) > ans = > 0.9624 > >> acosh(1.5 + 0j) > ans = > 0.9624 > OK, that does it. I'm going to change it's behavior. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Mon Mar 17 11:46:24 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 17 Mar 2008 15:46:24 +0000 Subject: [Numpy-discussion] how to build a series of arrays as I go? Message-ID: <47DE9250.50805@simplistix.co.uk> Hi All, I'm using xlrd to read an excel workbook containing several columns of data as follows: for r in range(1,sheet.nrows): date = \ datetime(*xlrd.xldate_as_tuple(sheet.cell_value(r,0),book.datemode)) if date_cut_off and date < date_cut_off: continue for c in range(len(names)): name = names[c] cell = sheet.cell(r,c) if cell.ctype==xlrd.XL_CELL_EMPTY: value = -1 elif cell.ctype==xlrd.XL_CELL_DATE: value = \ datetime(*xlrd.xldate_as_tuple(cell.value,book.datemode)) else: value = cell.value data[name].append(value) Two questions: How can I build arrays as I go instead of lists? (ie: the last line of the above snippet) Once I've built arrays, how can I mask the empty cells? (the above shows my hack-so-far of turning empty cells into -1 so I can use masked_where, but it would be greato build a masked array as I went, for efficiencies sake) cheers for any help! 
Chris PS: Slightly pissed off at actually paying for the book only to be told it'll be 2 days before I can even read the online version, especially given the woefully inadequate state of the currently available free documentation :-( -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From doutriaux1 at llnl.gov Mon Mar 17 12:47:40 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Mon, 17 Mar 2008 09:47:40 -0700 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DE9250.50805@simplistix.co.uk> References: <47DE9250.50805@simplistix.co.uk> Message-ID: <47DEA0AC.8090507@llnl.gov> Hi Chris, 1-)You could use the concatenate function to grow an array as you go. 2-) assumnig you still have your list b=numpy.array(data[name]) bmasked=numpy.ma.masked_equal(b,-1) Chris Withers wrote: > Hi All, > > I'm using xlrd to read an excel workbook containing several columns of > data as follows: > > for r in range(1,sheet.nrows): > date = \ > datetime(*xlrd.xldate_as_tuple(sheet.cell_value(r,0),book.datemode)) > if date_cut_off and date < date_cut_off: > continue > for c in range(len(names)): > name = names[c] > cell = sheet.cell(r,c) > if cell.ctype==xlrd.XL_CELL_EMPTY: > value = -1 > elif cell.ctype==xlrd.XL_CELL_DATE: > value = \ > datetime(*xlrd.xldate_as_tuple(cell.value,book.datemode)) > else: > value = cell.value > data[name].append(value) > > Two questions: > > How can I build arrays as I go instead of lists? > (ie: the last line of the above snippet) > > Once I've built arrays, how can I mask the empty cells? > (the above shows my hack-so-far of turning empty cells into -1 so I can > use masked_where, but it would be greato build a masked array as I went, > for efficiencies sake) > > cheers for any help! > > Chris > > PS: Slightly pissed off at actually paying for the book only to be told > it'll be 2 days before I can even read the online version, especially > given the woefully inadequate state of the currently available free > documentation :-( > > From aisaac at american.edu Mon Mar 17 13:01:16 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 17 Mar 2008 13:01:16 -0400 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DE9250.50805@simplistix.co.uk> References: <47DE9250.50805@simplistix.co.uk> Message-ID: On Mon, 17 Mar 2008, Chris Withers apparently wrote: > woefully inadequate state of the currently available free > documentation 1. http://www.scipy.org/Numpy_Example_List_With_Doc 2. write some Cheers, Alan Isaac From Chris.Barker at noaa.gov Mon Mar 17 13:05:17 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 17 Mar 2008 10:05:17 -0700 Subject: [Numpy-discussion] Read array from file In-Reply-To: <80c99e790803140321vef197f5ra2fa969ca4df3cdb@mail.gmail.com> References: <1dfc11660803140312i163f56ccq9fdb94c427c19363@mail.gmail.com> <80c99e790803140321vef197f5ra2fa969ca4df3cdb@mail.gmail.com> Message-ID: <47DEA4CD.9080803@noaa.gov> lorenzo bolla wrote: > what about numpy.loadtxt? or, probably faster, the little-known (it seems) numpy.fromfile() text mode: # Read and write the first information lines for i in range(0,5): Fdif.write( Fpst.readline() ) # Read and write coordinates coords =numpy.fromfile(Fpst, dtype=numpy.float, sep=' ', count=nnod*3) coords.reshape((nnod,3)) By the way, perhaps instead of "overloading" numpy.fromfile(), perhaps we should just have a separate numpy.fromtextfile() function. Maybe more people would notice it. 
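For comparison, here is what the same read might look like with loadtxt (an
untested sketch: loadtxt has no way to stop after nnod rows, so this assumes
the coordinate block is the last thing in the file; also note that reshape()
returns a new array rather than changing the array in place, so the result
has to be assigned to something):

import numpy

# copy the five header lines, then let loadtxt parse the rest of the file
for i in range(5):
    Fdif.write(Fpst.readline())
coords = numpy.loadtxt(Fpst)   # shape (nnod, 3) if there are three columns per line

# the fromfile version, with the reshape result assigned:
# coords = numpy.fromfile(Fpst, dtype=numpy.float, sep=' ', count=nnod*3).reshape(nnod, 3)
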
One more note: Even without fromfile() or loadtxt(), you can simplify your code. Python's "duck typing" and numpy's array-orientation remove a number of steps: for i in range(0,nnod): # Read line x = Fref.readline() # Read lines x = x.split() # Split line to strings x = map ( float,x ) # Convert string elements to floats no need to make an array -- numpy can work with all python sequences. #x = array ( x ) # Make an array # no need to loop == numpy assignment works with sequences: #for j in range (0,3): coords[i,:] = x Or, if you like putting code on one line (and list comprehensions): for i in range(nnod): coords[i,:] = [float(x) for x in Fref.readline.split()] -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Mon Mar 17 13:06:11 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 17 Mar 2008 10:06:11 -0700 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <47DC7C37.9070204@soe.ucsc.edu> References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> <47DC3E29.7060301@soe.ucsc.edu> <47DC7C37.9070204@soe.ucsc.edu> Message-ID: <47DEA503.7030005@noaa.gov> > > Plus a certain amount of numpy code depends on order of > > evaluation: > > > > a[:-1] = 2*a[1:] I'm confused here. My understanding of how it now works is that the above translates to: 1) create a new array (call it temp1) from a[1:], which shares a's data block. 2) create a temp2 array by multiplying 2 times each of the elements in temp1, and writing them into a new array, with a new data block 3) copy that temporary array into a[:-1] Why couldn't step (2) be parallelized? Why isn't it already with, BLAS? Doesn't BLAS must have such simple routines? Also, maybe numexpr could benefit from this? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From chris at simplistix.co.uk Mon Mar 17 13:16:46 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 17 Mar 2008 17:16:46 +0000 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DEA0AC.8090507@llnl.gov> References: <47DE9250.50805@simplistix.co.uk> <47DEA0AC.8090507@llnl.gov> Message-ID: <47DEA77E.90502@simplistix.co.uk> Charles Doutriaux wrote: > 1-)You could use the concatenate function to grow an array as you go. Thanks. Would it be more efficient to build the whole set of arrays as lists first or build them as arrays and use concatenate? > 2-) assumnig you still have your list > > b=numpy.array(data[name]) > bmasked=numpy.ma.masked_equal(b,-1) Excellent, although I ended up using numpy.nan just ot be paranoid, in case -1 actually showed up in my data... cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Mon Mar 17 13:18:07 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 17 Mar 2008 17:18:07 +0000 Subject: [Numpy-discussion] how to build a series of arrays as I go? 
In-Reply-To: References: <47DE9250.50805@simplistix.co.uk> Message-ID: <47DEA7CF.9020504@simplistix.co.uk> Alan G Isaac wrote: > On Mon, 17 Mar 2008, Chris Withers apparently wrote: >> woefully inadequate state of the currently available free >> documentation > > 1. http://www.scipy.org/Numpy_Example_List_With_Doc Yeah, read that, wood, trees, can't tell the... > 2. write some Small problem with that... I need to understand things before I can do that, and I need docs to be able to understand... cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From robert.kern at gmail.com Mon Mar 17 13:46:25 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 17 Mar 2008 12:46:25 -0500 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DEA77E.90502@simplistix.co.uk> References: <47DE9250.50805@simplistix.co.uk> <47DEA0AC.8090507@llnl.gov> <47DEA77E.90502@simplistix.co.uk> Message-ID: <3d375d730803171046n8bf3a82m42ddf060ddb84b0e@mail.gmail.com> On Mon, Mar 17, 2008 at 12:16 PM, Chris Withers wrote: > Charles Doutriaux wrote: > > 1-)You could use the concatenate function to grow an array as you go. > > Thanks. Would it be more efficient to build the whole set of arrays as > lists first or build them as arrays and use concatenate? Appending to a list is almost always better than growing an array by concatenation. If you have a real need for speed, though, there are a few tricks you can do at the expense of complexity. However, appending to a list is really the best practice. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at enthought.com Mon Mar 17 13:52:11 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 17 Mar 2008 12:52:11 -0500 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DE9250.50805@simplistix.co.uk> References: <47DE9250.50805@simplistix.co.uk> Message-ID: <47DEAFCB.3000002@enthought.com> Chris Withers wrote: > Hi All, > > I'm using xlrd to read an excel workbook containing several columns of > data as follows: > Generally, arrays are not efficiently re-sized. It is best to pre-allocate, or simply create a list by appending and then convert to an array after the fact as you have done. If you need to resize, then use the resize *function* which basically handles the creating of the new array. However, it also replicates the data, which may not be what you want. -Travis O. From robert.kern at gmail.com Mon Mar 17 13:55:40 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 17 Mar 2008 12:55:40 -0500 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <47DEA503.7030005@noaa.gov> References: <47DC2825.8050501@gmail.com> <3d375d730803151259r739a8231hf9d2f8c6c0ad036d@mail.gmail.com> <47DC3E29.7060301@soe.ucsc.edu> <47DC7C37.9070204@soe.ucsc.edu> <47DEA503.7030005@noaa.gov> Message-ID: <3d375d730803171055r7e16273ep7a9a01b39f2ad896@mail.gmail.com> On Mon, Mar 17, 2008 at 12:06 PM, Christopher Barker wrote: > > > Plus a certain amount of numpy code depends on order of > > > evaluation: > > > > > > a[:-1] = 2*a[1:] > > I'm confused here. My understanding of how it now works is that the > above translates to: > > 1) create a new array (call it temp1) from a[1:], which shares a's data > block. 
> 2) create a temp2 array by multiplying 2 times each of the elements in > temp1, and writing them into a new array, with a new data block > 3) copy that temporary array into a[:-1] > > Why couldn't step (2) be parallelized? Why isn't it already with, BLAS? > Doesn't BLAS must have such simple routines? Yes, but they are rarely optimized. We only (optionally) use the BLAS to accelerate dot(). Using the BLAS in more fundamental parts of numpy would be problematic from a build standpoint (or conversely a code complexity standpoint if it remains optional). > Also, maybe numexpr could benefit from this? Possibly. You can answer this definitively by writing the code to try it out. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From efiring at hawaii.edu Mon Mar 17 14:26:06 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 17 Mar 2008 08:26:06 -1000 Subject: [Numpy-discussion] numpy.ma bug: need sanity check in masked_where Message-ID: <47DEB7BE.1050005@hawaii.edu> Pierre, I just tripped over what boils down to the sequence given below. It would be useful if the error in line 53 were trapped right away; as it is, it results in a masked array that looks reasonable but fails in a non-obvious way. Eric In [52]:x = [1,2] In [53]:y = ma.masked_where(False, x) In [54]:y Out[54]: masked_array(data = [1 2], mask = False, fill_value=999999) In [55]:y[1] --------------------------------------------------------------------------- IndexError Traceback (most recent call last) /home/efiring/ in () /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __getitem__(self, indx) 1307 if not getattr(dout,'ndim', False): 1308 # Just a scalar............ -> 1309 if m is not nomask and m[indx]: 1310 return masked 1311 else: IndexError: 0-d arrays can't be indexed From charlesr.harris at gmail.com Mon Mar 17 14:33:57 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 17 Mar 2008 12:33:57 -0600 Subject: [Numpy-discussion] numpy.ma bug: need sanity check in masked_where In-Reply-To: <47DEB7BE.1050005@hawaii.edu> References: <47DEB7BE.1050005@hawaii.edu> Message-ID: File a ticket. On Mon, Mar 17, 2008 at 12:26 PM, Eric Firing wrote: > Pierre, > > I just tripped over what boils down to the sequence given below. It > would be useful if the error in line 53 were trapped right away; as it > is, it results in a masked array that looks reasonable but fails in a > non-obvious way. > > Eric > > In [52]:x = [1,2] > > In [53]:y = ma.masked_where(False, x) > > In [54]:y > Out[54]: > masked_array(data = [1 2], > mask = False, > fill_value=999999) > > > In [55]:y[1] > > --------------------------------------------------------------------------- > IndexError Traceback (most recent call > last) > > /home/efiring/ in () > > /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in > __getitem__(self, indx) > 1307 if not getattr(dout,'ndim', False): > 1308 # Just a scalar............ > -> 1309 if m is not nomask and m[indx]: > 1310 return masked > 1311 else: > > IndexError: 0-d arrays can't be indexed > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From faltet at carabos.com Mon Mar 17 14:34:05 2008 From: faltet at carabos.com (Francesc Altet) Date: Mon, 17 Mar 2008 19:34:05 +0100 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <47DEA503.7030005@noaa.gov> References: <47DC2825.8050501@gmail.com> <47DC7C37.9070204@soe.ucsc.edu> <47DEA503.7030005@noaa.gov> Message-ID: <200803171934.06124.faltet@carabos.com> A Monday 17 March 2008, Christopher Barker escrigu?: > > > Plus a certain amount of numpy code depends on order of > > > evaluation: > > > > > > a[:-1] = 2*a[1:] > > I'm confused here. My understanding of how it now works is that the > above translates to: > > 1) create a new array (call it temp1) from a[1:], which shares a's > data block. > 2) create a temp2 array by multiplying 2 times each of the elements > in temp1, and writing them into a new array, with a new data block 3) > copy that temporary array into a[:-1] > > Why couldn't step (2) be parallelized? Why isn't it already with, > BLAS? Doesn't BLAS must have such simple routines? Probably yes, but the problem is that this kind of operations, namely, vector-to-vector (usually found in the BLAS1 subset of BLAS), are normally memory-bounded, so you can take little avantage from using BLAS, most specially in modern processors, where the gap between the CPU throughput and the memory bandwith is quite high (and increasing). In modern machines, the use of BLAS is more interesting in vector-matrix (BLAS2) computations, but definitely is in matrix-matrix (BLAS3) ones (which is where the oportunities for cache reuse is higher) where the speedups can really be very good. > Also, maybe numexpr could benefit from this? Maybe, but unfortunately it wouldn't be able to achieve high speedups. Right now, numexpr is focused in accelerating mainly vector-vector operations (or matrix-matrix, but element-wise, much like NumPy, so that the cache cannot be reused), with some smart optimizations for strided and unaligned arrays (in this scenario, it can be 2x or 3x faster than NumPy, even for very simple operations like 'a+b'). In a similar way, OpenMP (or whatever parallel paradigm) will only generally be useful when you have to deal with lots of data, and your algorithm can have the oportunity to structure it so that small portions of them can be reused many times. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From xavier.gnata at gmail.com Mon Mar 17 15:59:08 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Mon, 17 Mar 2008 20:59:08 +0100 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <200803171934.06124.faltet@carabos.com> References: <47DC2825.8050501@gmail.com> <47DC7C37.9070204@soe.ucsc.edu> <47DEA503.7030005@noaa.gov> <200803171934.06124.faltet@carabos.com> Message-ID: <47DECD8C.3040809@gmail.com> Francesc Altet wrote: > A Monday 17 March 2008, Christopher Barker escrigu?: > >>> > Plus a certain amount of numpy code depends on order of >>> > evaluation: >>> > >>> > a[:-1] = 2*a[1:] >>> >> I'm confused here. My understanding of how it now works is that the >> above translates to: >> >> 1) create a new array (call it temp1) from a[1:], which shares a's >> data block. >> 2) create a temp2 array by multiplying 2 times each of the elements >> in temp1, and writing them into a new array, with a new data block 3) >> copy that temporary array into a[:-1] >> >> Why couldn't step (2) be parallelized? Why isn't it already with, >> BLAS? Doesn't BLAS must have such simple routines? 
>> > > Probably yes, but the problem is that this kind of operations, namely, > vector-to-vector (usually found in the BLAS1 subset of BLAS), are > normally memory-bounded, so you can take little avantage from using > BLAS, most specially in modern processors, where the gap between the > CPU throughput and the memory bandwith is quite high (and increasing). > In modern machines, the use of BLAS is more interesting in vector-matrix > (BLAS2) computations, but definitely is in matrix-matrix (BLAS3) ones > (which is where the oportunities for cache reuse is higher) where the > speedups can really be very good. > > >> Also, maybe numexpr could benefit from this? >> > > Maybe, but unfortunately it wouldn't be able to achieve high speedups. > Right now, numexpr is focused in accelerating mainly vector-vector > operations (or matrix-matrix, but element-wise, much like NumPy, so > that the cache cannot be reused), with some smart optimizations for > strided and unaligned arrays (in this scenario, it can be 2x or 3x > faster than NumPy, even for very simple operations like 'a+b'). > > In a similar way, OpenMP (or whatever parallel paradigm) will only > generally be useful when you have to deal with lots of data, and your > algorithm can have the oportunity to structure it so that small > portions of them can be reused many times. > > Cheers, > > Well, linear alagera is another topic. What I can see from IDL (for innstance) is that it provides the user with a TOTAL function which take avantage of several CPU when the number of elements is large. It also provides a very simple way to set a max number of threads. I really really would like to see something like that in numpy (just to be able to tell somone "switch to numpy it is free and you will get exactly the same"). For now, I have a problem when they ask for // functions like TOTAL. For now, we can do that using C inline threaded code but it is *complex* and 2000x2000 images are now common. It is not a corner case any more. Xavier From dblubaugh at belcan.com Mon Mar 17 16:17:56 2008 From: dblubaugh at belcan.com (Blubaugh, David A.) Date: Mon, 17 Mar 2008 16:17:56 -0400 Subject: [Numpy-discussion] Scipy to MyHDL! Message-ID: <27CC3060AF71DA40A5DC85F7D5B70F3802C51833@AWMAIL04.belcan.com> To Whom It May Concern, Please allow me to introduce myself. My name is David Allen Blubaugh. I am currently in the developmental stages of a Field-Programmable-Gate-Array (FPGA) device for a high-performance computing application. I am currently evaluating the MyHDL environment for translating python source code to verilog. I am also wondering as to what would be necessary to interface both Scipy and Numpy to the MyHDL environment? I believe that there will definitely be the need for modifications done within Numpy framework in order to quickly prototype an algorithm, like the FFT, and have it translated to verilog. Do you have any additional suggestions? Thanks, David Blubaugh This e-mail transmission contains information that is confidential and may be privileged. It is intended only for the addressee(s) named above. If you receive this e-mail in error, please do not read, copy or disseminate it in any manner. If you are not the intended recipient, any disclosure, copying, distribution or use of the contents of this information is prohibited. Please reply to the message immediately by informing the sender that the message was misdirected. After replying, please erase it from your computer system. Your assistance in correcting this error is appreciated. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Mon Mar 17 16:34:41 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 17 Mar 2008 10:34:41 -1000 Subject: [Numpy-discussion] numpy.ma bug: need sanity check in masked_where In-Reply-To: References: <47DEB7BE.1050005@hawaii.edu> Message-ID: <47DED5E1.7060102@hawaii.edu> Charles R Harris wrote: > File a ticket. #703 Eric > > On Mon, Mar 17, 2008 at 12:26 PM, Eric Firing > wrote: > > Pierre, > > I just tripped over what boils down to the sequence given below. It > would be useful if the error in line 53 were trapped right away; as it > is, it results in a masked array that looks reasonable but fails in a > non-obvious way. > > Eric > > In [52]:x = [1,2] > > In [53]:y = ma.masked_where(False, x) > > In [54]:y > Out[54]: > masked_array(data = [1 2], > mask = False, > fill_value=999999) > > > In [55]:y[1] > --------------------------------------------------------------------------- > IndexError Traceback (most recent > call last) > > /home/efiring/ in () > > /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in > __getitem__(self, indx) > 1307 if not getattr(dout,'ndim', False): > 1308 # Just a scalar............ > -> 1309 if m is not nomask and m[indx]: > 1310 return masked > 1311 else: > > IndexError: 0-d arrays can't be indexed > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Mon Mar 17 16:37:50 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 17 Mar 2008 14:37:50 -0600 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <47DECD8C.3040809@gmail.com> References: <47DC2825.8050501@gmail.com> <47DC7C37.9070204@soe.ucsc.edu> <47DEA503.7030005@noaa.gov> <200803171934.06124.faltet@carabos.com> <47DECD8C.3040809@gmail.com> Message-ID: On Mon, Mar 17, 2008 at 1:59 PM, Gnata Xavier wrote: > Francesc Altet wrote: > > A Monday 17 March 2008, Christopher Barker escrigu?: > > > >>> > Plus a certain amount of numpy code depends on order of > >>> > evaluation: > >>> > > >>> > a[:-1] = 2*a[1:] > >>> > >> I'm confused here. My understanding of how it now works is that the > >> above translates to: > >> > >> 1) create a new array (call it temp1) from a[1:], which shares a's > >> data block. > >> 2) create a temp2 array by multiplying 2 times each of the elements > >> in temp1, and writing them into a new array, with a new data block 3) > >> copy that temporary array into a[:-1] > >> > >> Why couldn't step (2) be parallelized? Why isn't it already with, > >> BLAS? Doesn't BLAS must have such simple routines? > >> > > > > Probably yes, but the problem is that this kind of operations, namely, > > vector-to-vector (usually found in the BLAS1 subset of BLAS), are > > normally memory-bounded, so you can take little avantage from using > > BLAS, most specially in modern processors, where the gap between the > > CPU throughput and the memory bandwith is quite high (and increasing). 
> > In modern machines, the use of BLAS is more interesting in vector-matrix > > (BLAS2) computations, but definitely is in matrix-matrix (BLAS3) ones > > (which is where the oportunities for cache reuse is higher) where the > > speedups can really be very good. > > > > > >> Also, maybe numexpr could benefit from this? > >> > > > > Maybe, but unfortunately it wouldn't be able to achieve high speedups. > > Right now, numexpr is focused in accelerating mainly vector-vector > > operations (or matrix-matrix, but element-wise, much like NumPy, so > > that the cache cannot be reused), with some smart optimizations for > > strided and unaligned arrays (in this scenario, it can be 2x or 3x > > faster than NumPy, even for very simple operations like 'a+b'). > > > > In a similar way, OpenMP (or whatever parallel paradigm) will only > > generally be useful when you have to deal with lots of data, and your > > algorithm can have the oportunity to structure it so that small > > portions of them can be reused many times. > > > > Cheers, > > > > > > Well, linear alagera is another topic. > > What I can see from IDL (for innstance) is that it provides the user > with a TOTAL function which take avantage of several CPU when the > number of elements is large. It also provides a very simple way to set a > max number of threads. > > I really really would like to see something like that in numpy (just to > be able to tell somone "switch to numpy it is free and you will get > exactly the same"). For now, I have a problem when they ask for // > functions like TOTAL. > > For now, we can do that using C inline threaded code but it is *complex* > and 2000x2000 images are now common. It is not a corner case any more. > Image processing may be a special case in that many such tasks are almost embarrassingly parallel. Perhaps some special libraries for that sort of application could be put together and just bits of C code run on different processors. Not that I know much about parallel processing, but that would be my first take. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Mon Mar 17 16:43:52 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 17 Mar 2008 16:43:52 -0400 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DEA7CF.9020504@simplistix.co.uk> References: <47DE9250.50805@simplistix.co.uk><47DEA7CF.9020504@simplistix.co.uk> Message-ID: > Alan suggested: >> 1. http://www.scipy.org/Numpy_Example_List_With_Doc On Mon, 17 Mar 2008, Chris Withers apparently wrote: > Yeah, read that, wood, trees, can't tell the... Oh, then you might want http://www.scipy.org/Tentative_NumPy_Tutorial or the other stuff at http://www.scipy.org/Documentation All in all, I've found the resources quite good. Cheers, Alan Isaac From robert.kern at gmail.com Mon Mar 17 16:42:36 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 17 Mar 2008 15:42:36 -0500 Subject: [Numpy-discussion] Scipy to MyHDL! In-Reply-To: <27CC3060AF71DA40A5DC85F7D5B70F3802C51833@AWMAIL04.belcan.com> References: <27CC3060AF71DA40A5DC85F7D5B70F3802C51833@AWMAIL04.belcan.com> Message-ID: <3d375d730803171342i47b39382ndcdcc37a73c7a433@mail.gmail.com> On Mon, Mar 17, 2008 at 3:17 PM, Blubaugh, David A. wrote: > > To Whom It May Concern, > > Please allow me to introduce myself. My name is David Allen Blubaugh. I am > currently in the developmental stages of a Field-Programmable-Gate-Array > (FPGA) device for a high-performance computing application.
I am currently > evaluating the MyHDL environment for translating python source code to > verilog. I am also wondering as to what would be necessary to interface > both Scipy and Numpy to the MyHDL environment? I believe that there will > definitely be the need for modifications done within Numpy framework in > order to quickly prototype an algorithm, like the FFT, and have it > translated to verilog. Do you have any additional suggestions? Can you sketch out in more detail exactly what you are envisioning? My gut feeling is that there is very little direct interfacing that can be fruitfully done. numpy and scipy provide much higher level abstractions than MyHDL provides. I don't think there is even going to be a good way to translate those abstractions to MyHDL. One programs for silicon in an HDL rather differently than one programs for a modern microprocessor in a VHLL. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lxander.m at gmail.com Mon Mar 17 16:44:34 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Mon, 17 Mar 2008 16:44:34 -0400 Subject: [Numpy-discussion] View ND Homogeneous Record Array as (N+1)D Array? Message-ID: <525f23e80803171344q4da8a604we700172e9826b422@mail.gmail.com> Is there a way to view an N-dimensional array with a *homogeneous* record dtype as an array of N+1 dimensions? An example will make it clear: import numpy a = numpy.array([(1.0,2.0), (3.0,4.0)], dtype=[('A',float),('B',float)]) b = a.view(...) # do something magical print b array([[ 1., 2.], [ 3., 4.]]) b[0,0] = 0.0 print a [(0.0, 2.0) (3.0, 4.0)] Thanks, Alex From robert.kern at gmail.com Mon Mar 17 16:55:10 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 17 Mar 2008 15:55:10 -0500 Subject: [Numpy-discussion] View ND Homogeneous Record Array as (N+1)D Array? In-Reply-To: <525f23e80803171344q4da8a604we700172e9826b422@mail.gmail.com> References: <525f23e80803171344q4da8a604we700172e9826b422@mail.gmail.com> Message-ID: <3d375d730803171355l55af4604w369833d71154c704@mail.gmail.com> On Mon, Mar 17, 2008 at 3:44 PM, Alexander Michael wrote: > Is there a way to view an N-dimensional array with a *homogeneous* > record dtype as an array of N+1 dimensions? An example will make it > clear: > > import numpy > a = numpy.array([(1.0,2.0), (3.0,4.0)], dtype=[('A',float),('B',float)]) > b = a.view(...) # do something magical > print b > array([[ 1., 2.], > [ 3., 4.]]) > b[0,0] = 0.0 > print a > [(0.0, 2.0) (3.0, 4.0)] Just use a.view(float) and then reshape as appropriate. In [1]: import numpy In [2]: a = numpy.array([(1.0,2.0), (3.0,4.0)], dtype=[('A',float),('B',float)]) In [3]: a.view(float) Out[3]: array([ 1., 2., 3., 4.]) In [4]: b = _ In [5]: b.shape = a.shape + (-1,) In [6]: b Out[6]: array([[ 1., 2.], [ 3., 4.]]) In [7]: b[0,0] = 0.0 In [8]: a Out[8]: array([(0.0, 2.0), (3.0, 4.0)], dtype=[('A', '<f8'), ('B', '<f8')])

From: dblubaugh at belcan.com (Blubaugh, David A.) Date: Mon, 17 Mar 2008 Subject: [Numpy-discussion] RE: Numpy-discussion Digest, Vol 18, Issue 35 References: Message-ID: <27CC3060AF71DA40A5DC85F7D5B70F3802C5189E@AWMAIL04.belcan.com> Robert, What I envisioned would be a simple but quick means to develop a FFT. I have worked this issue before with others who say that the way to do it would be to convert enough of the Numpy to MyHDL, which would then allow scipy to be imported within a python program. The question is to how this would be accomplished?? It should be stated that MyHDL is pure python programming which has no fewer capabilities than standard python.
If I need to elaborate more please say so!! Thanks, David Blubaugh
From dblubaugh at belcan.com Mon Mar 17 17:25:08 2008 From: dblubaugh at belcan.com (Blubaugh, David A.) Date: Mon, 17 Mar 2008 17:25:08 -0400 Subject: [Numpy-discussion] Numpy-discussion Digest, Vol 18, Issue 35 In-Reply-To: <27CC3060AF71DA40A5DC85F7D5B70F38028400D7@AWMAIL04.belcan.com> References: <27CC3060AF71DA40A5DC85F7D5B70F38028400D7@AWMAIL04.belcan.com> Message-ID: <27CC3060AF71DA40A5DC85F7D5B70F3802C518A9@AWMAIL04.belcan.com> Robert, I should also further state that MyHDL is a module that converts pure python to verilog. MyHDL is just a means to handle the necessary conversion as well as the necessary simulation of python code that is being translated to verilog. Thanks, David Blubaugh
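(To make the contrast concrete: the subset of Python that MyHDL converts to Verilog describes clocked hardware, not array expressions. The snippet below is only a hedged sketch in the MyHDL 0.x style -- the module name, signal widths and accumulator behaviour are made up for illustration and it has not been tested here -- but it shows the kind of code the converter expects, which is very different from a NumPy FFT call that simply dispatches into compiled C.)

from myhdl import Signal, intbv, always, toVerilog

def acc(clk, x, total):
    # On each rising clock edge, add the current input sample to the total.
    @always(clk.posedge)
    def logic():
        total.next = total + x
    return logic

clk = Signal(bool(0))
x = Signal(intbv(0)[8:])       # 8-bit input sample
total = Signal(intbv(0)[16:])  # 16-bit running total
toVerilog(acc, clk, x, total)  # emits Verilog for the acc module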
From robert.kern at gmail.com Mon Mar 17 17:27:04 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 17 Mar 2008 16:27:04 -0500 Subject: [Numpy-discussion] SciPy to MyHDL! (was Re: Numpy-discussion Digest, Vol 18, Issue 35) Message-ID: <3d375d730803171427t165145a7gc9960282302f987f@mail.gmail.com> Please do not reply to digest messages. Consider them read-only. If you want to participate in the mailing list, please subscribe and reply to the particular messages you are interested in. I will respond to this message, but I will not respond to any future replies to digest messages. On Mon, Mar 17, 2008 at 4:10 PM, Blubaugh, David A. wrote: > Robert, > > What I envisioned would be a simple but quick means to develop a FFT. I > have worked this issue before with others who say that the way to do it > would be to convert enough of the Numpy to MyHDL, which would then allow > scipy to be imported within a python program. The question is to how > this would be accomplished?? It should be stated that MyHDL is pure > python programming which has no fewer capabilities than standard python. > If I need to elaborate more please say so!! While MyHDL code is pure Python, numpy and scipy are not. They each have significant portions implemented in C and FORTRAN; in particular, all of the FFT implementations in numpy and scipy are in C or FORTRAN. You will not be able to translate them to MyHDL code. I don't know who suggested this to you, but they are obviously unfamiliar with numpy and scipy. This will not be a fruitful line of investigation. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From peridot.faceted at gmail.com Mon Mar 17 17:37:26 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 17 Mar 2008 22:37:26 +0100 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: References: <47DE9250.50805@simplistix.co.uk> <47DEA7CF.9020504@simplistix.co.uk> Message-ID: On 17/03/2008, Alan G Isaac wrote: > > Alan suggested: > > >> 1. http://www.scipy.org/Numpy_Example_List_With_Doc > > On Mon, 17 Mar 2008, Chris Withers apparently wrote: > > > Yeah, read that, wood, trees, can't tell the... > > Oh, then you might want > http://www.scipy.org/Tentative_NumPy_Tutorial > or the other stuff at > http://www.scipy.org/Documentation > All in all, I've found the resources quite good. Also, for the specific question of "how do I do X?"
you can try http://www.scipy.org/Numpy_Functions_by_Category Anne From xavier.gnata at gmail.com Mon Mar 17 19:03:22 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Tue, 18 Mar 2008 00:03:22 +0100 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: References: <47DC2825.8050501@gmail.com> <47DC7C37.9070204@soe.ucsc.edu> <47DEA503.7030005@noaa.gov> <200803171934.06124.faltet@carabos.com> <47DECD8C.3040809@gmail.com> Message-ID: <47DEF8BA.9090508@gmail.com> Charles R Harris wrote: > > > On Mon, Mar 17, 2008 at 1:59 PM, Gnata Xavier > wrote: > > Francesc Altet wrote: > > A Monday 17 March 2008, Christopher Barker escrigu?: > > > >>> > Plus a certain amount of numpy code depends on order of > >>> > evaluation: > >>> > > >>> > a[:-1] = 2*a[1:] > >>> > >> I'm confused here. My understanding of how it now works is that the > >> above translates to: > >> > >> 1) create a new array (call it temp1) from a[1:], which shares a's > >> data block. > >> 2) create a temp2 array by multiplying 2 times each of the elements > >> in temp1, and writing them into a new array, with a new data > block 3) > >> copy that temporary array into a[:-1] > >> > >> Why couldn't step (2) be parallelized? Why isn't it already with, > >> BLAS? Doesn't BLAS must have such simple routines? > >> > > > > Probably yes, but the problem is that this kind of operations, > namely, > > vector-to-vector (usually found in the BLAS1 subset of BLAS), are > > normally memory-bounded, so you can take little avantage from using > > BLAS, most specially in modern processors, where the gap between the > > CPU throughput and the memory bandwith is quite high (and > increasing). > > In modern machines, the use of BLAS is more interesting in > vector-matrix > > (BLAS2) computations, but definitely is in matrix-matrix (BLAS3) > ones > > (which is where the oportunities for cache reuse is higher) > where the > > speedups can really be very good. > > > > > >> Also, maybe numexpr could benefit from this? > >> > > > > Maybe, but unfortunately it wouldn't be able to achieve high > speedups. > > Right now, numexpr is focused in accelerating mainly vector-vector > > operations (or matrix-matrix, but element-wise, much like NumPy, so > > that the cache cannot be reused), with some smart optimizations for > > strided and unaligned arrays (in this scenario, it can be 2x or 3x > > faster than NumPy, even for very simple operations like 'a+b'). > > > > In a similar way, OpenMP (or whatever parallel paradigm) will only > > generally be useful when you have to deal with lots of data, and > your > > algorithm can have the oportunity to structure it so that small > > portions of them can be reused many times. > > > > Cheers, > > > > > > Well, linear alagera is another topic. > > What I can see from IDL (for innstance) is that it provides the user > with a TOTAL function which take avantage of several CPU when the > number of elements is large. It also provides a very simple way to > set a > max number of threads. > > I really really would like to see something like that in numpy > (just to > be able to tell somone "switch to numpy it is free and you will get > exactly the same"). For now, I have a problem when they ask for // > functions like TOTAL. > > For now, we can do that using C inline threaded code but it is > *complex* > and 2000x2000 images are now common. It is not a corner case any more. > > > Image processing may be a special in that many cases it is almost > embarrassingly parallel. yes but who likes to do that ? 
One trivial case: divide an image by its mean: compute the mean of the image, then divide the image by its mean. It should be 3 small lines of code, no more. Using the "embarrassingly parallel paradigm" to compute that, I would have to store the partial results and then run another exe to read them. Ugly, but very common in the prototyping phases. Or it can be pipes or sockets or... wait, just write it in C/MPI if you want to do that. Tuning this C/MPI code you will get the best performance. Ok fine. Fine, but in a few months quadcores will be "cheap". Using numpy, I know I never get the best performance on a multicore machine and I do not care. I just get the best performance/time_needed_to_code_that ratio, by far, and that is why IMHO numpy is great :). The problem is that on a multicore machine, this ratio is not that high because there is no way to perform s = sum(A) in a "maybe-sub-optimal but not monocore" way. Sublinear scaling (let's say real-life scaling) will always be better than nothing. Xavier > Perhaps some special libraries for that sort of application could be > put together and just bits of c code be run on different processors. > Not that I know much about parallel processing, but that would be my > first take. > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Mon Mar 17 19:07:10 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 17 Mar 2008 18:07:10 -0500 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: <47DEF8BA.9090508@gmail.com> References: <47DC2825.8050501@gmail.com> <47DC7C37.9070204@soe.ucsc.edu> <47DEA503.7030005@noaa.gov> <200803171934.06124.faltet@carabos.com> <47DECD8C.3040809@gmail.com> <47DEF8BA.9090508@gmail.com> Message-ID: <3d375d730803171607p6cf022d0g9d759024dd8d1d4a@mail.gmail.com> On Mon, Mar 17, 2008 at 6:03 PM, Gnata Xavier wrote: > Ok fine. Fine but in a few months quadcores will be "cheap". Using > numpy, I now I never get the best performances on a multicores machine > and I do not care. I just get the best > performance/time_needed_to_code_that ratio, by far, and that is why IMHO > numpy is great :). The problem is that on a multicore machine, this > ratio is not that high because there is no way to perform s = sum(A) in > a "maybe-sub-obtimal but not nonocore" way. Sublinear scaling (let say > real life scaling) will always be better that nothing. Please, by all means go for it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From chris at simplistix.co.uk Tue Mar 18 05:25:35 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 09:25:35 +0000 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <3d375d730803171046n8bf3a82m42ddf060ddb84b0e@mail.gmail.com> References: <47DE9250.50805@simplistix.co.uk> <47DEA0AC.8090507@llnl.gov> <47DEA77E.90502@simplistix.co.uk> <3d375d730803171046n8bf3a82m42ddf060ddb84b0e@mail.gmail.com> Message-ID: <47DF8A8F.3080607@simplistix.co.uk> Robert Kern wrote: > Appending to a list is almost always better than growing an array by > concatenation. If you have a real need for speed, though, there are a > few tricks you can do at the expense of complexity.
I don't for this project but I might in future, where can I read about this? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
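(A hedged guess at the sort of trick Robert means -- not necessarily his, just a common pattern: preallocate a buffer, double it whenever it fills, and slice off the used part at the end, so each append is amortized O(1).)

import numpy as np

class GrowableArray:
    # Rough sketch of amortized appending with a preallocated, doubling buffer.
    def __init__(self, capacity=16, dtype=float):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):
            bigger = np.empty(2 * len(self._data), dtype=self._data.dtype)
            bigger[:self._size] = self._data
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def result(self):
        return self._data[:self._size]

g = GrowableArray()
for v in (1.0, 2.0, 3.0):
    g.append(v)
print g.result()   # [ 1.  2.  3.]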
From chris at simplistix.co.uk Tue Mar 18 05:27:11 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 09:27:11 +0000 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DEAFCB.3000002@enthought.com> References: <47DE9250.50805@simplistix.co.uk> <47DEAFCB.3000002@enthought.com> Message-ID: <47DF8AEF.50107@simplistix.co.uk> Travis E. Oliphant wrote: > Generally, arrays are not efficiently re-sized. It is best to > pre-allocate, or simply create a list by appending and then convert to > an array after the fact as you have done. True, although that feels like iterating over the data twice for no reason, which feels a bit weird. In my case, I want to create a masked array, it would be nice to be able to do that straight from a list, rather than having to turn the list into an array and then turning the array into a masked array. If I'm off base on this, let me know :-) cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From berthe.loic at gmail.com Tue Mar 18 07:46:48 2008 From: berthe.loic at gmail.com (LB) Date: Tue, 18 Mar 2008 04:46:48 -0700 (PDT) Subject: [Numpy-discussion] BUG with numpy.float64.tolist() ? Message-ID: <8a0a909b-9d53-4c8a-bbfc-f7caa3c8088a@v3g2000hsc.googlegroups.com> I have two questions about numpy.float64:
- why does numpy.float64 have a tolist method, whereas standard python float doesn't?
- why does it not return a list?
This seems to be the source of some bugs (like this one, with scipy.interpolate.spalde: http://groups.google.com/group/scipy-user/browse_thread/thread/47fefa8e519c85f6?hl=fr). Did I miss something or should I add an entry to the bugtracker? -- LB From lxander.m at gmail.com Tue Mar 18 08:45:55 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Tue, 18 Mar 2008 08:45:55 -0400 Subject: [Numpy-discussion] View ND Homogeneous Record Array as (N+1)D Array? In-Reply-To: <3d375d730803171355l55af4604w369833d71154c704@mail.gmail.com> References: <525f23e80803171344q4da8a604we700172e9826b422@mail.gmail.com> <3d375d730803171355l55af4604w369833d71154c704@mail.gmail.com> Message-ID: <525f23e80803180545v4301b76ek7095b5f43612e374@mail.gmail.com> On Mon, Mar 17, 2008 at 4:55 PM, Robert Kern wrote: > On Mon, Mar 17, 2008 at 3:44 PM, Alexander Michael wrote: > > Is there a way to view an N-dimensional array with a *homogeneous* > > record dtype as an array of N+1 dimensions? An example will make it > > clear: > > > > import numpy > > a = numpy.array([(1.0,2.0), (3.0,4.0)], dtype=[('A',float),('B',float)]) > > b = a.view(...) # do something magical > > print b > > array([[ 1., 2.], > > [ 3., 4.]]) > > b[0,0] = 0.0 > > print a > > [(0.0, 2.0) (3.0, 4.0)] > > > Just use a.view(float) and then reshape as appropriate. > > In [1]: import numpy > > In [2]: a = numpy.array([(1.0,2.0), (3.0,4.0)], dtype=[('A',float),('B',float)]) > > In [3]: a.view(float) > Out[3]: array([ 1., 2., 3., 4.]) > > In [4]: b = _ > > In [5]: b.shape = a.shape + (-1,) > > In [6]: b > Out[6]: > > array([[ 1., 2.], > [ 3., 4.]]) > > In [7]: b[0,0] = 0.0 > > In [8]: a > Out[8]: > array([(0.0, 2.0), (3.0, 4.0)], > dtype=[('A', '<f8'), ('B', '<f8')])

def unpacked_view(x):
    """View a record array with a homogeneous dtype as a plain array
    with one extra dimension.

    >>> a = numpy.array(
    ...     [(1.0,2.0), (3.0,4.0), (5.0,6.0)],
    ...     dtype=[('A',float),('B',float)])
    >>> u = unpacked_view(a)
    >>> u
    array([[ 1.,  2.],
           [ 3.,  4.],
           [ 5.,  6.]])
    >>> u.shape
    (3, 2)
    """
    if x.dtype.names:
        ftypes = set([t for n,t in x.dtype.descr])
        assert(len(ftypes) == 1)
        ftype = ftypes.pop()
        y = x.view(ftype)
        unpacked_shape = x.shape + (-1,)
        y.shape = unpacked_shape
        return y
    else:
        return x

From lxander.m at gmail.com Tue Mar 18 09:06:25 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Tue, 18 Mar 2008 09:06:25 -0400 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DF8AEF.50107@simplistix.co.uk> References: <47DE9250.50805@simplistix.co.uk> <47DEAFCB.3000002@enthought.com> <47DF8AEF.50107@simplistix.co.uk> Message-ID: <525f23e80803180606u6b020cf1g287b965a95bda45a@mail.gmail.com> On Tue, Mar 18, 2008 at 5:27 AM, Chris Withers wrote: > Travis E. Oliphant wrote: > > Generally, arrays are not efficiently re-sized. It is best to > > pre-allocate, or simply create a list by appending and then convert to > > an array after the fact as you have done. > > True, although that feels like iterating over the data twice for no > reason, which feels a bit weird. > > In my case, I want to create a masked array, it would be nice to be able > to do that straight from a list, rather than having to turn the list > into an array and then turning the array into a masked array. > > If I'm off base on this, let me know :-) > > cheers, > > Chris By default (if I understand correctly) passing a regular array to MaskedArray will not copy it, so it is less redundant than it may at first appear. The MaskedArray provides a masked *view* of the underlying array data you give it. From vel.accel at gmail.com Tue Mar 18 10:48:58 2008 From: vel.accel at gmail.com (vel.accel at gmail.com) Date: Tue, 18 Mar 2008 10:48:58 -0400 Subject: [Numpy-discussion] Record Arrays and ctypes Interfacing Message-ID: <1e52e0880803180748g33e28bgc9af1b99392007f6@mail.gmail.com> Hi all, How do I handle numpy record arrays (heterogeneous dtype) with ctypes? The python side is reasonably obvious to me, but I'm confused about how to declare my C function's signature; whether I need to include the numpy array interface header file or not... etc... It's not obvious to me how a heterogeneous dtype is handled on the C side. Could someone give me a quick and dirty example? Thank you, -dieter From chris at simplistix.co.uk Tue Mar 18 11:12:07 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 15:12:07 +0000 Subject: [Numpy-discussion] new question - summing a list of arrays Message-ID: <47DFDBC7.1090609@simplistix.co.uk> Hi All, Say I have an arbitrary number of arrays:

arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])]

How can I sum these all together? My only solution so far is this:

sum = arrays[0]
for a in arrays[1:]:
    sum += a

...which is ugly :-S cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From aisaac at american.edu Tue Mar 18 11:23:06 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 18 Mar 2008 11:23:06 -0400 Subject: [Numpy-discussion] new question - summing a list of arrays In-Reply-To: <47DFDBC7.1090609@simplistix.co.uk> References: <47DFDBC7.1090609@simplistix.co.uk> Message-ID: On Tue, 18 Mar 2008, Chris Withers apparently wrote: > Say I have an aribtary number of arrays: > arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])] > How can I sum these all together? Try N.sum(arrays,axis=0). But must they be in a list?
An array of arrays (i.e., 2d array) is easy to sum. > My only solution so far is this: > sum = arrays[0] > for a in arrays[1:]: > sum += a > ...which is ugly :-S And changes the first array! Cheers, Alan Isaac From kwgoodman at gmail.com Tue Mar 18 11:23:12 2008 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 18 Mar 2008 08:23:12 -0700 Subject: [Numpy-discussion] new question - summing a list of arrays In-Reply-To: <47DFDBC7.1090609@simplistix.co.uk> References: <47DFDBC7.1090609@simplistix.co.uk> Message-ID: On Tue, Mar 18, 2008 at 8:12 AM, Chris Withers wrote: > Hi All, > > Say I have an aribtary number of arrays: > > arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])] > > How can I sum these all together? > > My only solution so far is this: > > sum = arrays[0] > for a in arrays[1:]: > sum += a > > ...which is ugly :-S >> import numpy.matlib as M >> x=[M.rand(3,1), M.rand(3,1), M.rand(3,1)] >> x [matrix([[ 0.77886042], [ 0.51142657], [ 0.68692362]]), matrix([[ 0.01367274], [ 0.24491876], [ 0.74441998]]), matrix([[ 0.35809997], [ 0.12779427], [ 0.3057233 ]])] >> sum(x) matrix([[ 1.15063313], [ 0.8841396 ], [ 1.7370669 ]]) From charlesr.harris at gmail.com Tue Mar 18 11:25:57 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Mar 2008 09:25:57 -0600 Subject: [Numpy-discussion] new question - summing a list of arrays In-Reply-To: <47DFDBC7.1090609@simplistix.co.uk> References: <47DFDBC7.1090609@simplistix.co.uk> Message-ID: On Tue, Mar 18, 2008 at 9:12 AM, Chris Withers wrote: > Hi All, > > Say I have an aribtary number of arrays: > > arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])] > > How can I sum these all together? > > My only solution so far is this: > > sum = arrays[0] > for a in arrays[1:]: > sum += a > > ...which is ugly :-S > Doesn't look too bad to me. Alternatively, you could stack them together in one big array and sum on the first axis, which might look cooler but isn't likely to be any faster. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Tue Mar 18 11:27:56 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 15:27:56 +0000 Subject: [Numpy-discussion] new question - summing a list of arrays In-Reply-To: References: <47DFDBC7.1090609@simplistix.co.uk> Message-ID: <47DFDF7C.6000802@simplistix.co.uk> Keith Goodman wrote: >>> sum(x) > > matrix([[ 1.15063313], > [ 0.8841396 ], > [ 1.7370669 ]]) When these are arrays, I just get a single number sum back... Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Tue Mar 18 11:39:31 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 15:39:31 +0000 Subject: [Numpy-discussion] newbie question - summing a list of arrays In-Reply-To: References: <47DFDBC7.1090609@simplistix.co.uk> Message-ID: <47DFE233.5060505@simplistix.co.uk> Alan G Isaac wrote: > On Tue, 18 Mar 2008, Chris Withers apparently wrote: >> Say I have an aribtary number of arrays: >> arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])] >> How can I sum these all together? > > Try N.sum(arrays,axis=0). I assume N here is: import numpy as N? Yep, it is... and that works exactly as I expect. Where are the docs for sum? 
Having had the book turn up as a massive PDF with a poor index/toc, I'm finding it just as difficult to navigate as the online docs :-( (I, like most people on this list I'd guess, sadly don't have the time to sit and read the whole book cover-to-cover to extract the 10-20% I need to know :-S) > But must they be in a list? > An array of arrays (i.e., 2d array) is easy to sum. Actually, I'm using a dict of arrays: data = { 'series1':array([1,2,3]), 'series2':array([1,4,6]), 'date':array([datetime(...),datetime(...),datetime(...)]), } If that gives the idea? Is there perhaps a better way to store these series? (I'm a numpy newbie, I've skimmed the tutorial and it doesn't appear to help here) cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From mmetz at astro.uni-bonn.de Tue Mar 18 11:42:43 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Tue, 18 Mar 2008 16:42:43 +0100 Subject: [Numpy-discussion] new question - summing a list of arrays In-Reply-To: <47DFDBC7.1090609@simplistix.co.uk> References: <47DFDBC7.1090609@simplistix.co.uk> Message-ID: <47DFE2F3.3070307@astro.uni-bonn.de> Chris Withers wrote: > Hi All, > > Say I have an aribtary number of arrays: > > arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])] > > How can I sum these all together? > > My only solution so far is this: > > sum = arrays[0] > for a in arrays[1:]: > sum += a > > ...which is ugly :-S > > cheers, > > Chris sum = sum(array(sum(a) for a in arrays])) Works also if arrays in list have different length ... Manuel From lbolla at gmail.com Tue Mar 18 11:43:00 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Tue, 18 Mar 2008 16:43:00 +0100 Subject: [Numpy-discussion] new question - summing a list of arrays In-Reply-To: <47DFDF7C.6000802@simplistix.co.uk> References: <47DFDBC7.1090609@simplistix.co.uk> <47DFDF7C.6000802@simplistix.co.uk> Message-ID: <80c99e790803180843p1914877j4b1d3ac62f9f9227@mail.gmail.com> use the "axis" argument in sum. L. On Tue, Mar 18, 2008 at 4:27 PM, Chris Withers wrote: > Keith Goodman wrote: > >>> sum(x) > > > > matrix([[ 1.15063313], > > [ 0.8841396 ], > > [ 1.7370669 ]]) > > When these are arrays, I just get a single number sum back... > > Chris > > -- > Simplistix - Content Management, Zope & Python Consulting > - http://www.simplistix.co.uk > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmetz at astro.uni-bonn.de Tue Mar 18 11:47:26 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Tue, 18 Mar 2008 16:47:26 +0100 Subject: [Numpy-discussion] new question - summing a list of arrays In-Reply-To: <47DFE2F3.3070307@astro.uni-bonn.de> References: <47DFDBC7.1090609@simplistix.co.uk> <47DFE2F3.3070307@astro.uni-bonn.de> Message-ID: <47DFE40E.4040709@astro.uni-bonn.de> Manuel Metz wrote: > Chris Withers wrote: >> Hi All, >> >> Say I have an aribtary number of arrays: >> >> arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])] >> >> How can I sum these all together? >> >> My only solution so far is this: >> >> sum = arrays[0] >> for a in arrays[1:]: >> sum += a >> >> ...which is ugly :-S >> >> cheers, >> >> Chris > > sum = sum(array(sum(a) for a in arrays])) Ah, sorry, typo.... 
sum = numpy.sum(numpy.array([numpy.sum(a) for a in arrays])) and numpy.sum for clarity ... From mmetz at astro.uni-bonn.de Tue Mar 18 11:52:07 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Tue, 18 Mar 2008 16:52:07 +0100 Subject: [Numpy-discussion] newbie question - summing a list of arrays In-Reply-To: <47DFE233.5060505@simplistix.co.uk> References: <47DFDBC7.1090609@simplistix.co.uk> <47DFE233.5060505@simplistix.co.uk> Message-ID: <47DFE527.2040308@astro.uni-bonn.de> Chris Withers wrote: > Alan G Isaac wrote: >> On Tue, 18 Mar 2008, Chris Withers apparently wrote: >>> Say I have an aribtary number of arrays: >>> arrays = [array([1,2,3]),array([4,5,6]),array([7,8,9])] >>> How can I sum these all together? >> Try N.sum(arrays,axis=0). > > I assume N here is: > > import numpy as N? > > Yep, it is... and that works exactly as I expect. > > Where are the docs for sum? Having had the book turn up as a massive PDF > with a poor index/toc, I'm finding it just as difficult to navigate as > the online docs :-( > (I, like most people on this list I'd guess, sadly don't have the time > to sit and read the whole book cover-to-cover to extract the 10-20% I > need to know :-S) > >> But must they be in a list? >> An array of arrays (i.e., 2d array) is easy to sum. > > Actually, I'm using a dict of arrays: > > data = { > 'series1':array([1,2,3]), > 'series2':array([1,4,6]), > 'date':array([datetime(...),datetime(...),datetime(...)]), > } > > If that gives the idea? Hm, in this case you can do it like this: numpy.sum(numpy.array([numpy.sum(v) for k,v in data.items()])) > Is there perhaps a better way to store these series? > (I'm a numpy newbie, I've skimmed the tutorial and it doesn't appear to > help here) > > cheers, > > Chris > From chris at simplistix.co.uk Tue Mar 18 11:54:49 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 15:54:49 +0000 Subject: [Numpy-discussion] newbie question - summing a list of arrays In-Reply-To: <47DFE527.2040308@astro.uni-bonn.de> References: <47DFDBC7.1090609@simplistix.co.uk> <47DFE233.5060505@simplistix.co.uk> <47DFE527.2040308@astro.uni-bonn.de> Message-ID: <47DFE5C9.5000200@simplistix.co.uk> Manuel Metz wrote: > Hm, in this case you can do it like this: > > numpy.sum(numpy.array([numpy.sum(v) for k,v in data.items()])) maybe: numpy.num(data.values(),axis=0) ...would also work? I can't actually use that though as the reason I need to do this is part of building stacked bar charts in matplotlib. Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From aisaac at american.edu Tue Mar 18 12:01:31 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 18 Mar 2008 12:01:31 -0400 Subject: [Numpy-discussion] newbie question - summing a list of arrays In-Reply-To: <47DFE233.5060505@simplistix.co.uk> References: <47DFDBC7.1090609@simplistix.co.uk><47DFE233.5060505@simplistix.co.uk> Message-ID: On Tue, 18 Mar 2008, Chris Withers apparently wrote: > Where are the docs for sum? Again: http://www.scipy.org/Numpy_Example_List_With_Doc Really, as a new NumPy user you should just keep this page open in your browser. Also, help(N.sum), of course. 
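[Aside: a minimal sketch of the dict-of-arrays case discussed above, summing only the numeric series element-wise; the 'date' series is left out because datetime objects cannot be summed this way, and the values are the ones from the example.]

>>> import numpy
>>> data = {'series1': numpy.array([1,2,3]),
...         'series2': numpy.array([1,4,6])}
>>> numpy.sum(data.values(), axis=0)   # Python 2: values() is already a list
array([2, 6, 9])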
Cheers, Alan Isaac From chris at simplistix.co.uk Tue Mar 18 12:12:06 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 16:12:06 +0000 Subject: [Numpy-discussion] newbie question - summing a list of arrays In-Reply-To: References: <47DFDBC7.1090609@simplistix.co.uk><47DFE233.5060505@simplistix.co.uk> Message-ID: <47DFE9D6.6040106@simplistix.co.uk> Alan G Isaac wrote: > Again: > http://www.scipy.org/Numpy_Example_List_With_Doc > > Really, as a new NumPy user you should just keep > this page open in your browser. Point well made, it's a shame that summary doesn't form part of the book... > Also, help(N.sum), of course. Ah cool. I think I got put off as this doesn't often return much with matplotlib and I assumed numpy would be the same, my bad... Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Tue Mar 18 12:12:56 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 18 Mar 2008 16:12:56 +0000 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <525f23e80803180606u6b020cf1g287b965a95bda45a@mail.gmail.com> References: <47DE9250.50805@simplistix.co.uk> <47DEAFCB.3000002@enthought.com> <47DF8AEF.50107@simplistix.co.uk> <525f23e80803180606u6b020cf1g287b965a95bda45a@mail.gmail.com> Message-ID: <47DFEA08.3060406@simplistix.co.uk> Alexander Michael wrote: > Be default (if I understand correctly) the passing a regular array to > MaskedArray will not copy it, so it less redundant than it may at > first appear. The MaskedArray provides as masked *view* of the > underlying array data you give it. Cool, that was exactly what I wanted to hear :-) cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From robert.kern at gmail.com Tue Mar 18 14:05:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Mar 2008 13:05:00 -0500 Subject: [Numpy-discussion] how to build a series of arrays as I go? In-Reply-To: <47DF8A8F.3080607@simplistix.co.uk> References: <47DE9250.50805@simplistix.co.uk> <47DEA0AC.8090507@llnl.gov> <47DEA77E.90502@simplistix.co.uk> <3d375d730803171046n8bf3a82m42ddf060ddb84b0e@mail.gmail.com> <47DF8A8F.3080607@simplistix.co.uk> Message-ID: <3d375d730803181105j57502967s7773122fafd9c30@mail.gmail.com> On Tue, Mar 18, 2008 at 4:25 AM, Chris Withers wrote: > Robert Kern wrote: > > Appending to a list is almost always better than growing an array by > > concatenation. If you have a real need for speed, though, there are a > > few tricks you can do at the expense of complexity. > > I don't for this project but I might in future, where can I read about this? There was a thread on one of the scipy lists several years ago, I think. Before April 2005 certainly because I found a message from myself referencing it. Basically, if you are constructing a 1D array by appending individual elements, the stdlib's array module is actually quite useful. It uses the same preallocation strategy as lists. You then use numpy.fromstring(buffer(pyarray), dtype=whatever) to create the numpy array. If you are building up a 1D array by chunks instead of individual elements, it probably depends on the type of the chunks. If the chunks are already arrays, I believe that appending the chunks to a list and using hstack() will be the best. If the chunks are still lists, probably .extend()ing the accumulator list is probably best. For (N>1)D arrays, append to lists. 
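[Aside: a rough sketch of the two recipes described above, written for the Python 2-era buffer()/fromstring() API mentioned; sizes and values are made up.]

import array
import numpy

# 1) Growing element by element: the stdlib array module over-allocates
#    like a list, so appending is cheap; convert to a numpy array once at the end.
acc = array.array('d')
for i in range(10000):
    acc.append(i * 0.1)
result = numpy.fromstring(buffer(acc), dtype=numpy.float64)

# 2) Growing by chunks that are already arrays: collect them in a list
#    and do a single hstack() at the end.
chunks = [numpy.arange(5, dtype=float) for _ in range(100)]
result2 = numpy.hstack(chunks)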
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue Mar 18 14:18:27 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Mar 2008 13:18:27 -0500 Subject: [Numpy-discussion] Record Arrays and ctypes Interfacing In-Reply-To: <1e52e0880803180748g33e28bgc9af1b99392007f6@mail.gmail.com> References: <1e52e0880803180748g33e28bgc9af1b99392007f6@mail.gmail.com> Message-ID: <3d375d730803181118x3193a267je4abaa73a0fa2a9e@mail.gmail.com> On Tue, Mar 18, 2008 at 9:48 AM, wrote: > Hi all, > > How do I handle numpy record arrays (heterogenous dtype) with ctypes? > The python side is reasonably obvious to me, but I'm confused about > how to declare my C function's signature; whether I need to include > the numpy array interface header file or not... etc... > > It's not obvious to me how a heterogeneous dtype is handled on the C > side. Could someone give me a quick and dirty example. Record arrays (loosely) correspond to arrays of C structs. The correspondence is only loose because the C standard does not specify how the struct members should be aligned. Different systems may place padding in places where numpy didn't. There are often #pragmas one can use to force a particular kind of padding. Here is a reasonably good article on the subject: http://en.wikipedia.org/wiki/Data_structure_alignment You shouldn't need any numpy headers. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From vel.accel at gmail.com Tue Mar 18 15:36:58 2008 From: vel.accel at gmail.com (vel.accel at gmail.com) Date: Tue, 18 Mar 2008 15:36:58 -0400 Subject: [Numpy-discussion] Record Arrays and ctypes Interfacing In-Reply-To: <3d375d730803181118x3193a267je4abaa73a0fa2a9e@mail.gmail.com> References: <1e52e0880803180748g33e28bgc9af1b99392007f6@mail.gmail.com> <3d375d730803181118x3193a267je4abaa73a0fa2a9e@mail.gmail.com> Message-ID: <1e52e0880803181236t5dc8828buf161de7350dae142@mail.gmail.com> On Tue, Mar 18, 2008 at 2:18 PM, Robert Kern wrote: > > On Tue, Mar 18, 2008 at 9:48 AM, wrote: > > Hi all, > > > > How do I handle numpy record arrays (heterogenous dtype) with ctypes? > > The python side is reasonably obvious to me, but I'm confused about > > how to declare my C function's signature; whether I need to include > > the numpy array interface header file or not... etc... > > > > It's not obvious to me how a heterogeneous dtype is handled on the C > > side. Could someone give me a quick and dirty example. > > Record arrays (loosely) correspond to arrays of C structs. The > correspondence is only loose because the C standard does not specify > how the struct members should be aligned. Different systems may place > padding in places where numpy didn't. There are often #pragmas one can > use to force a particular kind of padding. Here is a reasonably good > article on the subject: > > http://en.wikipedia.org/wiki/Data_structure_alignment > > You shouldn't need any numpy headers. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
> -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Thank you Robert. I had figured out that all that was required was a struct analogous to the dtype be defined in the C file. I inserted an simple example in the ctypes entry of the Wiki for others' reference . heterogeneous types example From lou_boog2000 at yahoo.com Tue Mar 18 15:48:31 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 18 Mar 2008 12:48:31 -0700 (PDT) Subject: [Numpy-discussion] SVD error in Numpy. Bug? Message-ID: <904368.43722.qm@web34403.mail.mud.yahoo.com> I have run into a failure of complex SVD in numpy (version='1.0.3.1'). The error is: File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/linalg/linalg.py", line 767, in svd raise LinAlgError, 'SVD did not converge' numpy.linalg.linalg.LinAlgError: SVD did not converge The matrix is complex 36 x 36. Very slight changes in the matrix components (~ one part in 10^4) are enough to make the error go away. I have never seen this before and it goes against the fact (I think it's a mathematical fact) that SVD always exists. A hard-coded upper limit on the iteration number allowed somewhere in the SVD C code seems to be the problem. Read on. A google search turned up a few messages, included this one from 2002 where the same error occurred infrequently, but randomly (it seemed): ---------------------------------------------- One online message in August 2002: Ok, so after several hours of trying to read that code, I found the parameter that needs to be tuned. In case anyone has this problem and finds this thread a year from now, here's your hint: File: Src/dlapack_lite.c Subroutine: dlasd4_ Line: 22562 There's a for loop there that limits the number of iterations to 20. Increasing this value to 50 allows my matrix to converge. I have not bothered to test what the "best" value for this number is, though. In any case, it appears the number just exists to prevent infinite loops, and 50 isn't really that much closer to infinity than 20.... (Actually, I'm just going to set it to 100 so I don't have to think about it ever again.) Damian Menscher -- -=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=- -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=- -=#| 1412 DCL, Workstation Services Group, CITES Ofc:(217)244-3862 |#=- -=#| www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=- -------------------------------------------------- I have looked in Src/dlapack_lite.c and line 22562 is no longer a line that sets a max. iterations parameter. There are several set in the file, but that code is hard to figure (sort of a Fortran-in-C hybrid). Here's one, for example: maxit = *n * 6 * *n; // Line 887 I have no idea which parameter to tweak. Apparently this error is still in numpy (at least to my version). Does anyone have a fix? Should I start a ticket (I think this is what people do)? Any help appreciated. I'm using a Mac Book Pro (Intel chip), system 10.4.11, Python 2.4.4. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From matthieu.brucher at gmail.com Tue Mar 18 15:53:16 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 18 Mar 2008 20:53:16 +0100 Subject: [Numpy-discussion] SVD error in Numpy. Bug? In-Reply-To: <904368.43722.qm@web34403.mail.mud.yahoo.com> References: <904368.43722.qm@web34403.mail.mud.yahoo.com> Message-ID: Hi, I think it could happen, the search for an eignevalue is an iterative process that can diverge sometimes. All SVD implementations have this hard coded-limitation, so that the biorthogonalization can finish in finite time. What is the determinant of your matrix ? Matthieu 2008/3/18, Lou Pecora : > > I have run into a failure of complex SVD in numpy > (version='1.0.3.1'). The error is: > > File > > "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/linalg/linalg.py", > line 767, in svd > raise LinAlgError, 'SVD did not converge' > numpy.linalg.linalg.LinAlgError: SVD did not converge > > The matrix is complex 36 x 36. Very slight changes in > the matrix components (~ one part in 10^4) are enough > to make the error go away. I have never seen this > before and it goes against the fact (I think it's a > mathematical fact) that SVD always exists. A > hard-coded upper limit on the iteration number allowed > somewhere in the SVD C code seems to be the problem. > Read on. > > A google search turned up a few messages, included > this one from 2002 where the same error occurred > infrequently, but randomly (it seemed): > > ---------------------------------------------- > One online message in August 2002: > > Ok, so after several hours of trying to read that > code, I found > the parameter that needs to be tuned. In case anyone > has this > problem and finds this thread a year from now, here's > your hint: > > File: Src/dlapack_lite.c > Subroutine: dlasd4_ > Line: 22562 > > There's a for loop there that limits the number of > iterations to > 20. Increasing this value to 50 allows my matrix to > converge. > I have not bothered to test what the "best" value for > this number > is, though. In any case, it appears the number just > exists to > prevent infinite loops, and 50 isn't really that much > closer to > infinity than 20.... (Actually, I'm just going to set > it to 100 > so I don't have to think about it ever again.) > > Damian Menscher > -- > -=#| Physics Grad Student & SysAdmin @ U Illinois > Urbana-Champaign |#=- > -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 > Ofc:(217)333-0038 |#=- > -=#| 1412 DCL, Workstation Services Group, CITES > Ofc:(217)244-3862 |#=- > -=#| www.uiuc.edu/~menscher/ > Fax:(217)333-9819 |#=- > -------------------------------------------------- > > I have looked in Src/dlapack_lite.c and line 22562 is > no longer a line that sets a max. iterations > parameter. There are several set in the file, but > that code is hard to figure (sort of a Fortran-in-C > hybrid). > > Here's one, for example: > > maxit = *n * 6 * *n; // Line 887 > > I have no idea which parameter to tweak. Apparently > this error is still in numpy (at least to my version). > Does anyone have a fix? Should I start a ticket (I > think this is what people do)? Any help appreciated. > > I'm using a Mac Book Pro (Intel chip), system 10.4.11, > Python 2.4.4. > > > > > -- Lou Pecora, my views are my own. > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. 
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Tue Mar 18 15:54:45 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 18 Mar 2008 20:54:45 +0100 Subject: [Numpy-discussion] SVD error in Numpy. Bug? In-Reply-To: <904368.43722.qm@web34403.mail.mud.yahoo.com> References: <904368.43722.qm@web34403.mail.mud.yahoo.com> Message-ID: On Tue, 18 Mar 2008 12:48:31 -0700 (PDT) Lou Pecora wrote: > I have run into a failure of complex SVD in numpy > (version='1.0.3.1'). The error is: > > File > "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/linalg/linalg.py", > line 767, in svd > raise LinAlgError, 'SVD did not converge' > numpy.linalg.linalg.LinAlgError: SVD did not converge > > The matrix is complex 36 x 36. Very slight changes in > the matrix components (~ one part in 10^4) are enough > to make the error go away. I have never seen this > before and it goes against the fact (I think it's a > mathematical fact) that SVD always exists. A > hard-coded upper limit on the iteration number allowed > somewhere in the SVD C code seems to be the problem. > Read on. > > A google search turned up a few messages, included > this one from 2002 where the same error occurred > infrequently, but randomly (it seemed): > > ---------------------------------------------- > One online message in August 2002: > > Ok, so after several hours of trying to read that > code, I found > the parameter that needs to be tuned. In case anyone > has this > problem and finds this thread a year from now, here's > your hint: > >File: Src/dlapack_lite.c > Subroutine: dlasd4_ > Line: 22562 > > There's a for loop there that limits the number of > iterations to > 20. Increasing this value to 50 allows my matrix to > converge. > I have not bothered to test what the "best" value for > this number > is, though. In any case, it appears the number just > exists to > prevent infinite loops, and 50 isn't really that much > closer to > infinity than 20.... (Actually, I'm just going to set > it to 100 > so I don't have to think about it ever again.) > > Damian Menscher > -- > -=#| Physics Grad Student & SysAdmin @ U Illinois > Urbana-Champaign |#=- > -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 > Ofc:(217)333-0038 |#=- > -=#| 1412 DCL, Workstation Services Group, CITES > Ofc:(217)244-3862 |#=- > -=#| www.uiuc.edu/~menscher/ >Fax:(217)333-9819 |#=- > -------------------------------------------------- > > I have looked in Src/dlapack_lite.c and line 22562 is > no longer a line that sets a max. iterations > parameter. There are several set in the file, but > that code is hard to figure (sort of a Fortran-in-C > hybrid). > > Here's one, for example: > > maxit = *n * 6 * *n; // Line 887 > > I have no idea which parameter to tweak. Apparently > this error is still in numpy (at least to my version). > Does anyone have a fix? Should I start a ticket (I > think this is what people do)? Any help appreciated. > Please can you post your matrix (in MatrixMarket format io.mmwrite) to the list. 
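[Aside: a minimal sketch of dumping a matrix in MatrixMarket format as requested; 'A' stands for the 36 x 36 complex matrix in question and the file name is made up.]

>>> from scipy import io
>>> io.mmwrite('problem_matrix.mtx', A)   # write A to a MatrixMarket file
>>> B = io.mmread('problem_matrix.mtx')   # anyone on the list can read it back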
Cheers, Nils From stefan at sun.ac.za Tue Mar 18 18:49:32 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 18 Mar 2008 23:49:32 +0100 Subject: [Numpy-discussion] numpy.ma bug: need sanity check in masked_where In-Reply-To: <47DEB7BE.1050005@hawaii.edu> References: <47DEB7BE.1050005@hawaii.edu> Message-ID: <9457e7c80803181549i71098b20jf9b69a443cf4202e@mail.gmail.com> Hi Pierre Thanks for your fix for #703. Unfortunately, it seems to have broken some tests: http://buildbot.scipy.org/builders/Windows_XP_x86_MSVC/builds/276/steps/shell_2/logs/stdio Regards St?fan On Mon, Mar 17, 2008 at 7:26 PM, Eric Firing wrote: > Pierre, > > I just tripped over what boils down to the sequence given below. It > would be useful if the error in line 53 were trapped right away; as it > is, it results in a masked array that looks reasonable but fails in a > non-obvious way. > > Eric > > In [52]:x = [1,2] > > In [53]:y = ma.masked_where(False, x) > > In [54]:y > Out[54]: > masked_array(data = [1 2], > mask = False, > fill_value=999999) > > > In [55]:y[1] > --------------------------------------------------------------------------- > IndexError Traceback (most recent call last) > > /home/efiring/ in () > > /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in > __getitem__(self, indx) > 1307 if not getattr(dout,'ndim', False): > 1308 # Just a scalar............ > -> 1309 if m is not nomask and m[indx]: > 1310 return masked > 1311 else: > > IndexError: 0-d arrays can't be indexed From david.huard at gmail.com Tue Mar 18 22:14:34 2008 From: david.huard at gmail.com (David Huard) Date: Tue, 18 Mar 2008 22:14:34 -0400 Subject: [Numpy-discussion] Proposed change to average function Message-ID: <91cf711d0803181914r4e48b447y2e015cf12bd95cd9@mail.gmail.com> In the process of addressing tickets for the next release, Charles Harris and I made some changes to the internals of the average function which also affects which input are accepted as valid. According to the current documentation, weights can either be 1D or any shape that can be broadcasted to a's shape. It seems, though, that the broadcasting was partially broken. After some thought, we are proposing that average only accepts weights that are either - 1D with length equal to a's shape along axis. - the same shape as a. and raises an error otherwise. I think this reduces the risk of unexpected results but wanted to know if anyone disagrees with the change. The proposed version is implemented in revision 4888. Regards, David Huard -------------- next part -------------- An HTML attachment was scrubbed... URL: From roygeorget at gmail.com Wed Mar 19 02:23:39 2008 From: roygeorget at gmail.com (royG) Date: Tue, 18 Mar 2008 23:23:39 -0700 (PDT) Subject: [Numpy-discussion] eigenface image too dark Message-ID: <41a9c40b-f23e-4d17-95f1-bf7f40584884@h11g2000prf.googlegroups.com> hi while trying to make an eigenface image from a numpy array of floats i tried this from numpy import array import Image imagesize=(200,200) def makeimage(inputarray,imagename): inputarray.shape=(-1,) newimg=Image.new('L', imagesize) newimg.putdata(inputarray) newimg.save(imagename) since i am using images of 200X200 size, i use an array with 40000 elements like [ -92.35294118 -81.88235294 -67.58823529 ..., -3.47058824 -13.23529412 -9.76470588] the problem is ,i get an image that is too dark.it looks like a face but is too dark that even different arrays will create images that all look alike!.. 
Is there a way to 'tone it down' so that i can generate an eigenface that can be displayed better? thanks RG From nadavh at visionsense.com Wed Mar 19 05:51:44 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 19 Mar 2008 11:51:44 +0200 Subject: [Numpy-discussion] eigenface image too dark References: <41a9c40b-f23e-4d17-95f1-bf7f40584884@h11g2000prf.googlegroups.com> Message-ID: <710F2847B0018641891D9A21602763600B6F36@ex3.envision.co.il> Easy solution: Use pylab's imshow(inputarray). In general ipython+matplotlib are very handy for your kind of analysis Longer solution: Scale your array: a_min = inputarray.min() a_max = inputarray.max() disp_array = ((inputarray-a_min)* 255/(a_max - a_min)).astype('uint8')\ . . . newimg.putdata(disp_array) Nadav. -----Original Message----- From: numpy-discussion-bounces at scipy.org on behalf of royG Sent: 19-Mar-08 08:23 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] eigenface image too dark hi while trying to make an eigenface image from a numpy array of floats i tried this from numpy import array import Image imagesize=(200,200) def makeimage(inputarray,imagename): inputarray.shape=(-1,) newimg=Image.new('L', imagesize) newimg.putdata(inputarray) newimg.save(imagename) since i am using images of 200X200 size, i use an array with 40000 elements like [ -92.35294118 -81.88235294 -67.58823529 ..., -3.47058824 -13.23529412 -9.76470588] the problem is ,i get an image that is too dark.it looks like a face but is too dark that even different arrays will create images that all look alike!.. Is there a way to 'tone it down' so that i can generate an eigenface that can be displayed better? thanks RG _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From chris at simplistix.co.uk Wed Mar 19 07:02:04 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 19 Mar 2008 11:02:04 +0000 Subject: [Numpy-discussion] documentation for masked arrays? Message-ID: <47E0F2AC.7040200@simplistix.co.uk> Hi All, Where can I find docs for masked arrays? The "paid for" book doesn't even contain the phrase "masked_where" :-( cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Wed Mar 19 07:29:08 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 19 Mar 2008 11:29:08 +0000 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <47E0F2AC.7040200@simplistix.co.uk> References: <47E0F2AC.7040200@simplistix.co.uk> Message-ID: <47E0F904.9070203@simplistix.co.uk> OK, my specific problem with masked arrays is as follows: >>> a = numpy.array([1,numpy.nan,2]) >>> aa = numpy.ma.masked_where(numpy.isnan(a),a) >>> aa array(data = [ 1.00000000e+00 1.00000000e+20 2.00000000e+00], mask = [False True False], fill_value=1e+020) >>> numpy.ma.set_fill_value(aa,0) >>> aa array(data = [ 1. 0. 2.], mask = [False True False], fill_value=0) OK, so this looks like I want it to, however: >>> [v for v in aa] [1.0, array(data = 999999, mask = True, fill_value=999999) , 2.0] Two questions: 1. why am I not getting my NaN's back? 2. why is the wrong fill value being used here?
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From ndbecker2 at gmail.com Wed Mar 19 08:44:32 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 19 Mar 2008 08:44:32 -0400 Subject: [Numpy-discussion] Can't add user defined complex types Message-ID: In arrayobject.c, various complex functions (e.g., array_imag_get) use: PyArray_ISCOMPLEX -> PyTypeNum_ISCOMPLEX, which is hard coded to 2 predefined types :( If PyArray_ISCOMPLEX allowed user-defined types, I'm guessing functions such as array_imag_get would just work? From ndbecker2 at gmail.com Wed Mar 19 08:55:44 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 19 Mar 2008 08:55:44 -0400 Subject: [Numpy-discussion] Unable to file bug Message-ID: http://scipy.org/scipy/numpy/newticket#preview is giving me: Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, jre at enthought.com and inform them of the time the error occurred, and anything you might have done that may have caused the error. More information about this error may be available in the server error log. From lou_boog2000 at yahoo.com Wed Mar 19 09:28:03 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Wed, 19 Mar 2008 06:28:03 -0700 (PDT) Subject: [Numpy-discussion] SVD error in Numpy. Bug? Message-ID: <38312.96409.qm@web34403.mail.mud.yahoo.com> I tried sending this message yesterday, but it is being held up because the MatrixMarket attachment is too large. The moderator my release it to the group, but I don't know so I am sending the original message minus the attachment. If anyone wants the MatrixMarket version of my problem matrix, just let me know and I will send it them directly on email. ---- The original message: The determinant of my matrix is Det= (1.00677345434e-24+9.56072162013e-25j) I expect it to be small near a solution to my problem whose solution is the vector closest to the null space of the original matrix. That's the reason I am using SVD. The MatrixMarket file of the complex 36 x 36 matrix is attached as requested. FYI: I found a curious workaround. If I catch the linalg.linalg.LinAlgError exception that svd throws and then "square" the original matrix: newmat=dot(conj(oldmat.T),oldmat) the SVD on newmat works fine and the square root of the minimum singular value (which is what I am looking for) appears correct. If condition number were the problem in some way, I would expect newmat to be worse. Maybe the newmat symmetric form is better behaved. Why? Beats me. Thanks for your help. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From roygeorget at gmail.com Wed Mar 19 09:57:10 2008 From: roygeorget at gmail.com (royG) Date: Wed, 19 Mar 2008 06:57:10 -0700 (PDT) Subject: [Numpy-discussion] eigenface image too dark In-Reply-To: <710F2847B0018641891D9A21602763600B6F36@ex3.envision.co.il> References: <41a9c40b-f23e-4d17-95f1-bf7f40584884@h11g2000prf.googlegroups.com> <710F2847B0018641891D9A21602763600B6F36@ex3.envision.co.il> Message-ID: > Longer solution: > Scale your array: > a_min = inputarray.min() > a_max = inputarray.max() > disp_array = ((inputarray-a_min)* 255/(a_max - a_min)).astype('uint8')\ > . 
thanx Nadav..the scaling works..and makes clear images but why .astype("uint8") ? can't i use the array of floats as it is ? even without changing the type as uint8 the code makes clear images when i use disp_array = ((inputarray-a_min)* 255/(a_max - a_min)) thanks again RG From wfspotz at sandia.gov Wed Mar 19 10:28:37 2008 From: wfspotz at sandia.gov (Bill Spotz) Date: Wed, 19 Mar 2008 08:28:37 -0600 Subject: [Numpy-discussion] documentation for masked arrays? In-Reply-To: <47E0F2AC.7040200@simplistix.co.uk> References: <47E0F2AC.7040200@simplistix.co.uk> Message-ID: I have found that any search on that document containing an underscore will turn up zero matches. Substitute a space instead. On Mar 19, 2008, at 5:02 AM, Chris Withers wrote: > Where can I find docs for masked arrays? > The "paid for" book doesn't even contain the phrase "masked_where" :-( ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From Joris.DeRidder at ster.kuleuven.be Wed Mar 19 10:58:35 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Wed, 19 Mar 2008 15:58:35 +0100 Subject: [Numpy-discussion] C++ class encapsulating ctypes-numpy array? Message-ID: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> Hi, I'm passing (possibly non-contiguous) numpy arrays (data + shape + strides + ndim) with ctypes to my C++ function (with external "C" to make ctypes happy). Has anyone made a C++ class derived from a ctypes- numpy-array with an overloaded [] operator to allow easy indexing (e.g. x[0][2][5] for a 3D array) so that you don't have to worry about strides? I guess I'm not the first one thinking about this... Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From matthieu.brucher at gmail.com Wed Mar 19 11:22:27 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 19 Mar 2008 16:22:27 +0100 Subject: [Numpy-discussion] C++ class encapsulating ctypes-numpy array? In-Reply-To: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> References: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> Message-ID: Hi, On my blog, I spoke about the class we used. It is not derived from a Numpy array, it is implemented in terms of a Numpy array ( http://matt.eifelle.com/item/5) Matthieu 2008/3/19, Joris De Ridder : > > Hi, > > I'm passing (possibly non-contiguous) numpy arrays (data + shape + > strides + ndim) with ctypes to my C++ function (with external "C" to > make ctypes happy). Has anyone made a C++ class derived from a ctypes- > numpy-array with an overloaded [] operator to allow easy indexing > (e.g. x[0][2][5] for a 3D array) so that you don't have to worry about > strides? I guess I'm not the first one thinking about this... > > Cheers, > Joris > > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris at simplistix.co.uk Wed Mar 19 11:45:42 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 19 Mar 2008 15:45:42 +0000 Subject: [Numpy-discussion] documentation for masked arrays? In-Reply-To: References: <47E0F2AC.7040200@simplistix.co.uk> Message-ID: <47E13526.2060702@simplistix.co.uk> Bill Spotz wrote: > I have found that any search on that document containing an > underscore will turn up zero matches. Substitute a space instead. That's not been my experience. I found the *one* mention of fill_value just fine, the coverage of masked arrays is woeful :-( Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From matthieu.brucher at gmail.com Wed Mar 19 12:16:11 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 19 Mar 2008 17:16:11 +0100 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: Message-ID: For the not blocker bugs, I think that #420 should be closed : float32 is the the C float type, isn't it ? Matthieu 2008/3/13, Jarrod Millman : > > Hello, > > I am sure that everyone has noticed that 1.0.5 hasn't been released > yet. The main issue is that when I was getting ready to tag the > release I noticed that the buildbot had a few failing tests: > http://buildbot.scipy.org/waterfall?show_events=false > > Stefan van der Walt added tickets for the failures: > http://projects.scipy.org/scipy/numpy/ticket/683 > http://projects.scipy.org/scipy/numpy/ticket/684 > http://projects.scipy.org/scipy/numpy/ticket/686 > And Chuck Harris fixed ticket #683 with in minutes (thanks!). The > others are still open. > > Stefan and I also triaged the remaining tickets--closing several and > turning others in to release blockers: > > http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority > > I think that it is especially important that we spend some time trying > to make the 1.0.5 release rock solid. There are several important > changes in the trunk so I really hope we can get these tickets > resolved ASAP. I need everyone's help getting this release out. If > you can help work on any of the open release blockers, please try to > close them over the weekend. If you have any ideas about the tickets > but aren't exactly sure how to resolve them please post a message to > the list or add a comment to the ticket. > > I will be traveling over the weekend, so I may be off-line until Monday. > > Thanks, > > -- > Jarrod Millman > Computational Infrastructure for Research Labs > 10 Giannini Hall, UC Berkeley > phone: 510.643.4014 > http://cirl.berkeley.edu/ > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Wed Mar 19 12:42:00 2008 From: oliphant at enthought.com (Travis E. 
Oliphant) Date: Wed, 19 Mar 2008 11:42:00 -0500 Subject: [Numpy-discussion] Can't add user defined complex types In-Reply-To: References: Message-ID: <47E14258.8060804@enthought.com> Neal Becker wrote: > In arrayobject.c, various complex functions (e.g., array_imag_get) use: > PyArray_ISCOMPLEX -> PyTypeNum_ISCOMPLEX, > which is hard coded to 2 predefined types :( > > If PyArray_ISCOMPLEX allowed user-defined types, I'm guessing functions such > as array_imag_get would just work? > I don't think that it true. There would need to be some kind of idea of "complex-ness" that is tested. One way this could work is if your corresponding scalar inherited from the generic complex scalar type and then that was tested for. -Travis O. From ndbecker2 at gmail.com Wed Mar 19 12:55:25 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 19 Mar 2008 12:55:25 -0400 Subject: [Numpy-discussion] Can't add user defined complex types References: <47E14258.8060804@enthought.com> Message-ID: Travis E. Oliphant wrote: > Neal Becker wrote: >> In arrayobject.c, various complex functions (e.g., array_imag_get) use: >> PyArray_ISCOMPLEX -> PyTypeNum_ISCOMPLEX, >> which is hard coded to 2 predefined types :( >> >> If PyArray_ISCOMPLEX allowed user-defined types, I'm guessing functions >> such as array_imag_get would just work? >> > I don't think that it true. There would need to be some kind of idea > of "complex-ness" that is tested. One way this could work is if your > corresponding scalar inherited from the generic complex scalar type and > then that was tested for. > > -Travis O. You don't think which is true? Suppose along with registering a type, I can mark whether it is complex. Then we change PyArray_ISCOMPLEX to look at that mark for user-defined types. I believe get_part will just work. I more-or-less copied the code, and made my own functions 'get_real, get_imag', and they work just fine on my types. From charlesr.harris at gmail.com Wed Mar 19 12:56:27 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 19 Mar 2008 10:56:27 -0600 Subject: [Numpy-discussion] Can't add user defined complex types In-Reply-To: <47E14258.8060804@enthought.com> References: <47E14258.8060804@enthought.com> Message-ID: On Wed, Mar 19, 2008 at 10:42 AM, Travis E. Oliphant wrote: > Neal Becker wrote: > > In arrayobject.c, various complex functions (e.g., array_imag_get) use: > > PyArray_ISCOMPLEX -> PyTypeNum_ISCOMPLEX, > > which is hard coded to 2 predefined types :( > > > > If PyArray_ISCOMPLEX allowed user-defined types, I'm guessing functions > such > > as array_imag_get would just work? > > > I don't think that it true. There would need to be some kind of idea > of "complex-ness" that is tested. One way this could work is if your > corresponding scalar inherited from the generic complex scalar type and > then that was tested for. > That brings up a question I have. In looking to introduce float16, I noted that the typenumbers are tightly packed at the low end. There is space for user defined types >=128, IIRC, but float16 and cfloat16 really belongs down with the numbers. There are also several other types in the IEEE pipeline. So I am wondering if we can't spread the type numbers out a bit more. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lou_boog2000 at yahoo.com Wed Mar 19 13:30:45 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Wed, 19 Mar 2008 10:30:45 -0700 (PDT) Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? 
In-Reply-To: <38312.96409.qm@web34403.mail.mud.yahoo.com> Message-ID: <333131.44414.qm@web34403.mail.mud.yahoo.com> I recently had a personal email reply from Damian Menscher who originally found the error in 2002. He states: ------ I explained the solution in a followup to my own post: http://mail.python.org/pipermail/python-list/2002-August/161395.html -- in short, find the dlasd4_ routine (for the current 1.0.4 version it's at numpy/linalg/dlapack_lite.c:21902) and change the max iteration count from 20 to 100 or higher. The basic problem was that they use an iterative method to converge on the solution, and they had a cutoff of the max number of iterations before giving up (to guard against an infinite loop or cases where an unlucky matrix would require an excessive number of iterations and therefore CPU). The fix I used was simply to increase the max iteration count (from 20 to 100 -- 50 was enough to solve my problem but I went for overkill just to be sure I wouldn't see it again). It *may* be reasonable to just leave this as an infinite loop, or to increase the count to 1000 or higher. A lot depends on your preferred failure mode: - count too low -> low cpu usage, but "SVD did not converge" errors somewhat common - very high count -> some matrices will result in high cpu usage, non-convergence still possible - infinite loop -> it will always converge, but may take forever NumPy was supposedly updated also (from 20 to 100, but you may want to go higher) in bug 601052. They said the fix made it into CVS, but apparently it got lost or reverted when they did a release (the oldest release I can find is v1.0 from 2006 and has it set to 20). I just filed another bug (copy/paste of the previous one) in hopes they'll fix it for real this time: http://scipy.org/scipy/numpy/ticket/706 Damian ---------------------------------------- I looked at line 21902 of dlapack_lite.c, it is, for (niter = iter; niter <= 20; ++niter) { Indeed the upper limit for iterations in the linalg.svd code is set for 20. For now I will go with my method (on earlier post) of squaring the matrix and then doing svd when the original try on the original matrix throws the linalg.linalg.LinAlgError. I do not claim that this is a cure-all. But it seems to work fast and avoids the original code from thrashing around in a long iteration. I would suggest this be made explicit in the NumPy documentation and then the user be given the option to reset the limit on the number of iterations. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs From charlesr.harris at gmail.com Wed Mar 19 13:41:43 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 19 Mar 2008 11:41:43 -0600 Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? In-Reply-To: <333131.44414.qm@web34403.mail.mud.yahoo.com> References: <38312.96409.qm@web34403.mail.mud.yahoo.com> <333131.44414.qm@web34403.mail.mud.yahoo.com> Message-ID: On Wed, Mar 19, 2008 at 11:30 AM, Lou Pecora wrote: > I recently had a personal email reply from Damian > Menscher who originally found the error in 2002. 
He > states: > > ------ > > I explained the solution in a followup to my own post: > http://mail.python.org/pipermail/python-list/2002-August/161395.html > -- in short, find the dlasd4_ routine (for the current > 1.0.4 version > it's at numpy/linalg/dlapack_lite.c:21902) and change > the max > iteration count from 20 to 100 or higher. > > The basic problem was that they use an iterative > method to converge on > the solution, and they had a cutoff of the max number > of iterations > before giving up (to guard against an infinite loop or > cases where an > unlucky matrix would require an excessive number of > iterations and > therefore CPU). The fix I used was simply to increase > the max > iteration count (from 20 to 100 -- 50 was enough to > solve my problem > but I went for overkill just to be sure I wouldn't see > it again). It > *may* be reasonable to just leave this as an infinite > loop, or to > increase the count to 1000 or higher. A lot depends > on your preferred > failure mode: > - count too low -> low cpu usage, but "SVD did not > converge" errors > somewhat common > - very high count -> some matrices will result in > high cpu usage, > non-convergence still possible > - infinite loop -> it will always converge, but may > take forever > > NumPy was supposedly updated also (from 20 to 100, but > you may want to > go higher) in bug 601052. They said the fix made it > into CVS, but > apparently it got lost or reverted when they did a > release (the oldest > release I can find is v1.0 from 2006 and has it set to > 20). I just > filed another bug (copy/paste of the previous one) in > hopes they'll > fix it for real this time: > http://scipy.org/scipy/numpy/ticket/706 > > Damian > > ---------------------------------------- > > I looked at line 21902 of dlapack_lite.c, it is, > > for (niter = iter; niter <= 20; ++niter) { > > Indeed the upper limit for iterations in the > linalg.svd code is set for 20. For now I will go with > my method (on earlier post) of squaring the matrix and > then doing svd when the original try on the original > matrix throws the linalg.linalg.LinAlgError. I do not > claim that this is a cure-all. But it seems to work > fast and avoids the original code from thrashing > around in a long iteration. > > I would suggest this be made explicit in the NumPy > documentation and then the user be given the option to > reset the limit on the number of iterations. > > Well, it certainly shouldn't be hardwired in as 20. At minimum it should be a #define, and ideally it should be passed in with the function call, but I don't know if the interface allows that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.e.creasey.00 at googlemail.com Wed Mar 19 13:57:39 2008 From: p.e.creasey.00 at googlemail.com (Peter Creasey) Date: Wed, 19 Mar 2008 17:57:39 +0000 Subject: [Numpy-discussion] Correlate with small arrays Message-ID: <6be8b94a0803191057o4e8534c5nba11da7784e23169@mail.gmail.com> Hi, I'm trying to do a PDE style calculation with numpy arrays y = a * x[:-2] + b * x[1:-1] + c * x[2:] with a,b,c constants. I realise I could use correlate for this, i.e y = numpy.correlate(x, array((a, b, c))) however the performance doesn't seem as good (I suspect correlate is optimised for both arguments being long arrays). Is the first thing I wrote probably the best? Or is there a better numpy function for this case? 
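[Aside: a quick check that the slicing form and correlate() above agree; the coefficients and data here are made up, and correlate's default mode is 'valid'.]

>>> import numpy
>>> a, b, c = 0.25, 0.5, 0.25
>>> x = numpy.random.rand(1000)
>>> y1 = a * x[:-2] + b * x[1:-1] + c * x[2:]
>>> y2 = numpy.correlate(x, numpy.array((a, b, c)))
>>> numpy.allclose(y1, y2)
True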
Regards, Peter From millman at berkeley.edu Wed Mar 19 14:53:42 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 19 Mar 2008 11:53:42 -0700 Subject: [Numpy-discussion] NumPy 1.0.5 almost ready Message-ID: Hello, Thanks to everyone who has been working on getting the 1.0.5 release of NumPy out the door. Since my last email at least 12 bug tickets have been closed. There are a few remaining issues with the trunk, but we are fasting approaching a release. One additional issue that I would like to see more progress made on before tagging the next release is improved documentation especially of the new maskedarray implementation. I know that Pierre has spent a lot of time developing the new implementation and has other pressing issues, so ideally others will be able to pitch in. Given that I want to get the release out ASAP, I have decided to have a Doc Day this Friday, March 21st. I will send out an official announcement later tonight. This release promises to bring a number of important improvements and should represent a very stable and mature release in the 1.0 series of NumPy. After this release I hope to start planning for the a new major development series leading to a 1.1 release. So if you have any time to help close tickets or improve documentation, please take the time over the next few days to do so. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From robert.kern at gmail.com Wed Mar 19 14:59:19 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 19 Mar 2008 13:59:19 -0500 Subject: [Numpy-discussion] Correlate with small arrays In-Reply-To: <6be8b94a0803191057o4e8534c5nba11da7784e23169@mail.gmail.com> References: <6be8b94a0803191057o4e8534c5nba11da7784e23169@mail.gmail.com> Message-ID: <3d375d730803191159w34a5ae56o17c819275db1602d@mail.gmail.com> On Wed, Mar 19, 2008 at 12:57 PM, Peter Creasey wrote: > Hi, > > I'm trying to do a PDE style calculation with numpy arrays > > y = a * x[:-2] + b * x[1:-1] + c * x[2:] > > with a,b,c constants. I realise I could use correlate for this, i.e > > y = numpy.correlate(x, array((a, b, c))) > > however the performance doesn't seem as good (I suspect correlate is > optimised for both arguments being long arrays). Is the first thing I > wrote probably the best? Or is there a better numpy function for this > case? The relative performance seems to vary depending on the size, but it seems to me that correlate usually beats the manual implementation, particularly if you don't measure the array() part, too. len(x)=1000 is the only size where the manual version seems to beat correlate on my machine. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From mattknox_ca at hotmail.com Wed Mar 19 19:47:37 2008 From: mattknox_ca at hotmail.com (Matt Knox) Date: Wed, 19 Mar 2008 23:47:37 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?bug_with_with_fill=5Fvalues_in_maske?= =?utf-8?q?d_arrays=3F?= References: <47E0F2AC.7040200@simplistix.co.uk> <47E0F904.9070203@simplistix.co.uk> Message-ID: > > OK, my specific problem with masked arrays is as follows: > > >>> a = numpy.array([1,numpy.nan,2]) > >>> aa = numpy.ma.masked_where(numpy.isnan(a),a) > >>> aa > array(data = > [ 1.00000000e+00 1.00000000e+20 2.00000000e+00], > mask = > [False True False], > fill_value=1e+020) > > >>> numpy.ma.set_fill_value(aa,0) > >>> aa > array(data = > [ 1. 0. 2.], > mask = > [False True False], > fill_value=0) > > OK, so this looks like I want it to, however: > > >>> [v for v in aa] > [1.0, array(data = > 999999, > mask = > True, > fill_value=999999) > , 2.0] > > Two questions: > > 1. why am I not getting my NaN's back? when iterating over a masked array, you get the "ma.masked" constant for elements that were masked (same as what you would get if you indexed the masked array at that element). If you are referring specifically to the .data portion of the array... it looks like the latest version of the numpy.ma sub-module preserves nan's in the data portion of the masked array, but the old version perhaps doesn't based on the output you are showing. > > 2. why is the wrong fill value being used here? the second element in the array iteration here is actually the numpy.ma.masked constant, which always has the same fill value (which I guess is 999999). This is independent of the fill value for your specific array. - Matt From rob.clewley at gmail.com Wed Mar 19 23:05:51 2008 From: rob.clewley at gmail.com (Rob Clewley) Date: Wed, 19 Mar 2008 23:05:51 -0400 Subject: [Numpy-discussion] JOB: Short-term programming (consultant) work Message-ID: Dear NumPy users, The developers of the PyDSTool dynamical systems software project have money to hire a Python programmer on a short-term, per-task basis as a technical consultant. The work can be done remotely and will be paid after the completion of project milestones. The work must be completed by July, when the current funds expire. Prospective consultants could be professionals or students and will have proven experience and interest in working with NumPy/SciPy, scientific computation in general, and interfacing Python with C and Fortran codes. Detailed work plan, schedule, and project specs are negotiable (if you are talented and experienced we would like your input). The rate of pay is commensurate with experience, and may be up to $45/hr or $1000 per project milestone (no fringe benefits), according to an agreed measure of satisfactory product performance. There is a strong possibility of longer term work depending on progress and funding availability. PyDSTool (pydstool.sourceforge.net) is a multi-platform, open-source environment offering a range of library tools and utils for research in dynamical systems modeling for scientists and engineers. As a research project, it presently contains prototype code that we would like to improve and better integrate into our long-term vision and with other emerging (open-source) software tools. 
Depending on interest and experience, current projects might include: * Conversion and "pythonification" of old Matlab code for model analysis * Improved interface for legacy C and Fortran code (numerical integrators) via some combination of SWIG, Scons, automake * Overhaul of support for symbolic processing (probably by an interface to SymPy) For more details please contact Dr. Rob Clewley (rclewley) at (@) the Department of Mathematics, Georgia State University (gsu.edu). -- Robert H. Clewley, Ph. D. Assistant Professor Department of Mathematics and Statistics Georgia State University 720 COE, 30 Pryor St Atlanta, GA 30303, USA tel: 404-413-6420 fax: 404-651-2246 http://www.mathstat.gsu.edu/~matrhc http://brainsbehavior.gsu.edu/ From david at ar.media.kyoto-u.ac.jp Thu Mar 20 00:10:39 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 20 Mar 2008 13:10:39 +0900 Subject: [Numpy-discussion] Numpy and OpenMP In-Reply-To: References: <47DC2825.8050501@gmail.com> <47DC7C37.9070204@soe.ucsc.edu> <47DEA503.7030005@noaa.gov> <200803171934.06124.faltet@carabos.com> <47DECD8C.3040809@gmail.com> Message-ID: <47E1E3BF.8050104@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > Image processing may be a special in that many cases it is almost > embarrassingly parallel. Perhaps some special libraries for that sort > of application could be put together and just bits of c code be run on > different processors. Not that I know much about parallel processing, > but that would be my first take. For me, the basic problem is that there is no support for this kind of thing in numpy right now (loading specific implementation at runtime). I think it would be a worthwhile goal for 1.1: the ability to load at runtime different implementations (for example: load multi-core blas on multi-core CPU); instead of of linking atlas/mkl, they would be used as "plug-ins". This would require a significant work, though. cheers, David From nadavh at visionsense.com Thu Mar 20 01:48:52 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 20 Mar 2008 07:48:52 +0200 Subject: [Numpy-discussion] eigenface image too dark References: <41a9c40b-f23e-4d17-95f1-bf7f40584884@h11g2000prf.googlegroups.com><710F2847B0018641891D9A21602763600B6F36@ex3.envision.co.il> Message-ID: <710F2847B0018641891D9A21602763600B6F38@ex3.envision.co.il> I never used the putdata interface but the fromstring. It is likely that "putdata" is more flexible. However I urge you to use matplotlib: plotting with "imshow" followed by colorbar(), enables use to inspect the true pixels value, add grids, zoom etc. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? royG ????: ? 19-???-08 15:57 ??: numpy-discussion at scipy.org ????: Re: [Numpy-discussion] eigenface image too dark > Longer solution: > Scale your array: > a_min = inputarray.min() > a_max = inputarray.max() > disp_array = ((inputarray-a_min)* 255/(a_max - a_min)).astype('uint8')\ > . thanx Nadav..the scaling works..and makes clear images but why .astype("uint8") ? can't i use the array of floats as it is ? 
even without changing the type as uint8 the code makes clear images when i use disp_array = ((inputarray-a_min)* 255/(a_max - a_min)) thanks again RG _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From millman at berkeley.edu Thu Mar 20 03:55:00 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 20 Mar 2008 00:55:00 -0700 Subject: [Numpy-discussion] documentation for masked arrays? In-Reply-To: <47E13526.2060702@simplistix.co.uk> References: <47E0F2AC.7040200@simplistix.co.uk> <47E13526.2060702@simplistix.co.uk> Message-ID: On Wed, Mar 19, 2008 at 8:45 AM, Chris Withers wrote: > That's not been my experience. I found the *one* mention of fill_value > just fine, the coverage of masked arrays is woeful :-( There is a documentation day on Friday. If you have some time, it would be great if you could help out with writing NumPy docstrings. There more people who contribute, the faster this will happen. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From philbinj at gmail.com Thu Mar 20 05:31:42 2008 From: philbinj at gmail.com (James Philbin) Date: Thu, 20 Mar 2008 09:31:42 +0000 Subject: [Numpy-discussion] Inplace index suprise Message-ID: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> Hi, I was suprised to see this result: >>> import numpy as N >>> A = N.array([0,0,0]) >>> A[[0,1,1,2]]+=1 >>> A array([1, 1, 1]) Is this expected? Working on the principle of least surprise I would expect [1,2,1] to be output. Thanks, James From p.e.creasey.00 at googlemail.com Thu Mar 20 06:44:55 2008 From: p.e.creasey.00 at googlemail.com (Peter Creasey) Date: Thu, 20 Mar 2008 10:44:55 +0000 Subject: [Numpy-discussion] Correlate with small arrays Message-ID: <6be8b94a0803200344jd818605le9dfc0992bc5e145@mail.gmail.com> > > I'm trying to do a PDE style calculation with numpy arrays > > > > y = a * x[:-2] + b * x[1:-1] + c * x[2:] > > > > with a,b,c constants. I realise I could use correlate for this, i.e > > > > y = numpy.correlate(x, array((a, b, c))) > > The relative performance seems to vary depending on the size, but it > seems to me that correlate usually beats the manual implementation, > particularly if you don't measure the array() part, too. len(x)=1000 > is the only size where the manual version seems to beat correlate on > my machine. Thanks for the quick response! Unfortunately 1000 < len(x) < 20000 are just the cases I'm using, (they seem to be 1-3 times as slower on my machine). I'm just thinking that this is exactly the kind of problem that could be done much faster in C, i.e in the manual implementation the processor goes through an array of len(x) maybe 5 times (3 multiplications and 2 additions), yet in C I could put those constants in the registers and go through the array just once. Maybe this is flawed logic, but if not I'm hoping someone has already done this? From chris at simplistix.co.uk Thu Mar 20 07:00:02 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 20 Mar 2008 11:00:02 +0000 Subject: [Numpy-discussion] isnan bug? Message-ID: <47E243B2.5020602@simplistix.co.uk> Hi All, I'm faily sure that: numpy.isnan(datetime.datetime.now()) ...should just return False and not raise an exception. Where can I raise a bug to this effect? 
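For reference, a minimal session showing the behaviour (floats behave as expected; the datetime input is the problem case -- the exact exception type may vary between versions):

>>> import datetime
>>> import numpy
>>> numpy.isnan(1.0)
False
>>> numpy.isnan(numpy.nan)
True
>>> numpy.isnan(datetime.datetime.now())   # raises an exception instead of returning False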
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From millman at berkeley.edu Thu Mar 20 07:14:44 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 20 Mar 2008 04:14:44 -0700 Subject: [Numpy-discussion] NumPy (1.0.5) DocDay (Fri, Mar. 21) Message-ID: Hello, As I mentioned yesterday, I am holding a NumPy DocDay on Friday, March 21st. I am in Paris near the RER B or C Saint-Michel station (with Stefan van der Walt, Matthieu Brucher, and Gael Varoquaux). If you are in the area and want to join us just send me an email by the end of tonight and I will let you know where we are meeting. If you can't stop by, but are still willing to help out we will convene on IRC during the day on Friday (9:30am-?? GMT+1). Come join us at irc.freenode.net (channel scipy). We may update the list of priorities which is still located on the NumPy Trac Wiki: http://projects.scipy.org/scipy/numpy/wiki/DocDays While I am hoping to have everyone focus on NumPy, I would be happy if anyone wants to work on SciPy documentation as well: http://projects.scipy.org/scipy/scipy/wiki/DocDays Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From gael.varoquaux at normalesup.org Thu Mar 20 08:04:25 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 20 Mar 2008 13:04:25 +0100 Subject: [Numpy-discussion] Ravel and inplace modification Message-ID: <20080320120425.GA6486@phare.normalesup.org> At the nipy sprint in Paris, we have been having a discussion about methods modifying inplace and returning a view, or returning a copy. The main issue is with ravel that tries to keep a view, but that obviously has to do a copy sometimes. (Is ravel the only place where this behavior can happen ?). We came up with the following scenario: Mrs Jane is an experienced Python developper, working with less experienced developpers. She has developped a set of functions to process data that assume they can use the ravel method returning a view. One day another programmes feeds it new kind of data. The functions work, but return something wrong. We (Stefan van der Walt, Matthew Brett and I) are suggesting that it would be a good idea to add a keyword to the ravel method so that it raises an exception if it cannot return a view. Stefan is proposing to implement it. What do people think about this? Should Stefan go ahead? Cheers, Ga?l From Joris.DeRidder at ster.kuleuven.be Thu Mar 20 08:46:17 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Thu, 20 Mar 2008 13:46:17 +0100 Subject: [Numpy-discussion] C++ class encapsulating ctypes-numpy array? In-Reply-To: References: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> Message-ID: <24935ED8-0E84-4997-869E-777CCC61B547@ster.kuleuven.be> Thanks Matthieu, for the interesting pointer. My goal was to be able to use ctypes, though, to avoid having to do manual memory management. Meanwhile, I was able to code something in C+ + that may be useful (see attachment). It (should) work as follows. 1) On the Python side: convert a numpy array to a ctypes-structure, and feed this to the C-function: arg = c_ndarray(array) mylib.myfunc(arg) 2) On the C++ side: receive the numpy array in a C-structure: myfunc(numpyArray array) 3) Again on the C++ side: convert the C-structure to an Ndarray class: (e.g. for a 3D array) Ndarray a(array) No data copying is involved in any conversion, of course. 
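(For readers without the attachment, a rough sketch of the idea behind c_ndarray on the Python side; the field names and the MAXDIM cap here are only illustrative -- the real layout has to mirror the numpyArray struct in the attached header exactly:)

import ctypes
import numpy as np

MAXDIM = 7   # arbitrary cap, for this sketch only

class numpyArray(ctypes.Structure):
    # hypothetical layout -- must match the C side field for field
    _fields_ = [("data",    ctypes.c_void_p),
                ("ndim",    ctypes.c_int),
                ("shape",   ctypes.c_long * MAXDIM),
                ("strides", ctypes.c_long * MAXDIM)]

def c_ndarray(a):
    # wrap an existing numpy array without copying its data
    c = numpyArray()
    c.data = a.ctypes.data
    c.ndim = a.ndim
    for i in range(a.ndim):
        c.shape[i] = a.shape[i]
        c.strides[i] = a.strides[i] // a.itemsize   # strides in elements, not bytes
    return c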
Step 2 is required to keep ctypes happy. I can now use a[i][j][k] and the conversion from [i][j][k] to i*strides[0] + j * strides[1] + k * strides[2] is done at compile time using template metaprogramming. The price to pay is that the number of dimensions of the Ndarray has to be known at compile time (to instantiate the template), which is reasonable I think, for the gain in convenience. My first tests seem to be satisfying. I would really appreciate if someone could have a look at it and tell me if it can be done much better than what I cooked. If it turns out that it may interest more people, I'll put it on the scipy wiki. Cheers, Joris On 19 Mar 2008, at 16:22, Matthieu Brucher wrote: > Hi, > > On my blog, I spoke about the class we used. It is not derived from > a Numpy array, it is implemented in terms of a Numpy array (http://matt.eifelle.com/item/5 > ) > > Matthieu > > 2008/3/19, Joris De Ridder : > Hi, > > I'm passing (possibly non-contiguous) numpy arrays (data + shape + > strides + ndim) with ctypes to my C++ function (with external "C" to > make ctypes happy). Has anyone made a C++ class derived from a ctypes- > numpy-array with an overloaded [] operator to allow easy indexing > (e.g. x[0][2][5] for a 3D array) so that you don't have to worry about > strides? I guess I'm not the first one thinking about this... > > Cheers, > Joris > > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > French PhD student > Website : http://matthieu-brucher.developpez.com/ > Blogs : http://matt.eifelle.com and http://blog.developpez.com/? > blog=92 > LinkedIn : http://www.linkedin.com/in/matthieubrucher > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ndarray.h Type: application/octet-stream Size: 3697 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Thu Mar 20 09:13:44 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 20 Mar 2008 09:13:44 -0400 Subject: [Numpy-discussion] Can't add user defined complex types References: <47E14258.8060804@enthought.com> Message-ID: Travis E. Oliphant wrote: > Neal Becker wrote: >> In arrayobject.c, various complex functions (e.g., array_imag_get) use: >> PyArray_ISCOMPLEX -> PyTypeNum_ISCOMPLEX, >> which is hard coded to 2 predefined types :( >> >> If PyArray_ISCOMPLEX allowed user-defined types, I'm guessing functions >> such as array_imag_get would just work? >> > I don't think that it true. There would need to be some kind of idea > of "complex-ness" that is tested. One way this could work is if your > corresponding scalar inherited from the generic complex scalar type and > then that was tested for. > > -Travis O. 
One thing that isn't working (so far) is fill: In [47]: a = array ([cmplx_int32(e) for e in xrange (10)]) In [48]: a Out[48]: array([(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0)], dtype=cmplx_int32) In [49]: r = get_real (a) In [50]: r Out[50]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32) In [51]: r[:] = 7 In [52]: a Out[52]: array([(7,0), (7,0), (7,0), (7,0), (7,0), (7,0), (7,0), (7,0), (7,0), (7,0)], dtype=cmplx_int32) In [53]: r.fill(8) In [54]: a Out[54]: array([(8,8), (8,8), (8,8), (8,8), (8,8), (7,0), (7,0), (7,0), (7,0), (7,0)], dtype=cmplx_int32) In [55]: r Out[55]: array([8, 8, 8, 8, 8, 7, 7, 7, 7, 7], dtype=int32) As you can see, fill only filled 1/2 of the array. slice [:] worked OK. My get_real is pretty much copied from real: ret = (PyArrayObject *) \ PyArray_NewFromDescr(self->ob_type, ret_type, self->nd, self->dimensions, self->strides, self->data + offset, self->flags, (PyObject *)self); From david.huard at gmail.com Thu Mar 20 09:19:34 2008 From: david.huard at gmail.com (David Huard) Date: Thu, 20 Mar 2008 09:19:34 -0400 Subject: [Numpy-discussion] isnan bug? In-Reply-To: <47E243B2.5020602@simplistix.co.uk> References: <47E243B2.5020602@simplistix.co.uk> Message-ID: <91cf711d0803200619h41d864b8w929bb91417570406@mail.gmail.com> Chris, The trac page is to place to file tickets. Note that you have to register first before you can file new tickets. David 2008/3/20, Chris Withers : > > Hi All, > > I'm faily sure that: > > numpy.isnan(datetime.datetime.now()) > > ...should just return False and not raise an exception. > > Where can I raise a bug to this effect? > > cheers, > > Chris > > > -- > Simplistix - Content Management, Zope & Python Consulting > - http://www.simplistix.co.uk > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Thu Mar 20 09:42:18 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 20 Mar 2008 14:42:18 +0100 Subject: [Numpy-discussion] C++ class encapsulating ctypes-numpy array? In-Reply-To: <24935ED8-0E84-4997-869E-777CCC61B547@ster.kuleuven.be> References: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> <24935ED8-0E84-4997-869E-777CCC61B547@ster.kuleuven.be> Message-ID: 2008/3/20, Joris De Ridder : > > > Thanks Matthieu, for the interesting pointer. > My goal was to be able to use ctypes, though, to avoid having to do manual > memory management. Meanwhile, I was able to code something in C++ that may > be useful (see attachment). It (should) work as follows. > > 1) On the Python side: convert a numpy array to a ctypes-structure, and > feed this to the C-function: > arg = c_ndarray(array) > mylib.myfunc(arg) > > 2) On the C++ side: receive the numpy array in a C-structure: > myfunc(numpyArray array) > > 3) Again on the C++ side: convert the C-structure to an Ndarray class: ( > e.g. for a 3D array) > Ndarray a(array) > > No data copying is involved in any conversion, of course. Step 2 is > required to keep ctypes happy. I can now use a[i][j][k] and the conversion > from [i][j][k] to i*strides[0] + j * strides[1] + k * strides[2] is done at > compile time using template metaprogramming. 
The price to pay is that the > number of dimensions of the Ndarray has to be known at compile time (to > instantiate the template), which is reasonable I think, for the gain in > convenience. My first tests seem to be satisfying. > > I would really appreciate if someone could have a look at it and tell me > if it can be done much better than what I cooked. If it turns out that it > may interest more people, I'll put it on the scipy wiki. > > Cheers, > Joris > You can use ctypes if and ony if the C++ object is only used in one function call. You can't for instance create a C++ container with ctypes, then in Python call some method and then delete the container, because ctypes will destroy the data after the C++ container was built. This is the only drawback of ctypes. When it comes to strides, you have to divide them by the size of your data : the stride is counted in bytes and not in short/float/... Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From Joris.DeRidder at ster.kuleuven.be Thu Mar 20 10:28:03 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Thu, 20 Mar 2008 15:28:03 +0100 Subject: [Numpy-discussion] C++ class encapsulating ctypes-numpy array? In-Reply-To: References: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> <24935ED8-0E84-4997-869E-777CCC61B547@ster.kuleuven.be> Message-ID: <7BAC419E-E1ED-4317-A41E-6282B12F9A33@ster.kuleuven.be> > You can use ctypes if and ony if the C++ object is only used in one > function call. You can't for instance create a C++ container with > ctypes, then in Python call some method and then delete the > container, because ctypes will destroy the data after the C++ > container was built. This is the only drawback of ctypes. I'm not sure I understand. Could you perhaps give a pointer for additional info, or an example? > When it comes to strides, you have to divide them by the size of > your data : the stride is counted in bytes and not in short/float/... Yep, I did this on the Python side. Thanks for the remark, though. Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From matthieu.brucher at gmail.com Thu Mar 20 10:39:20 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 20 Mar 2008 15:39:20 +0100 Subject: [Numpy-discussion] C++ class encapsulating ctypes-numpy array? In-Reply-To: <7BAC419E-E1ED-4317-A41E-6282B12F9A33@ster.kuleuven.be> References: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> <24935ED8-0E84-4997-869E-777CCC61B547@ster.kuleuven.be> <7BAC419E-E1ED-4317-A41E-6282B12F9A33@ster.kuleuven.be> Message-ID: 2008/3/20, Joris De Ridder : > > > > You can use ctypes if and ony if the C++ object is only used in one > > function call. You can't for instance create a C++ container with > > ctypes, then in Python call some method and then delete the > > container, because ctypes will destroy the data after the C++ > > container was built. This is the only drawback of ctypes. > > > I'm not sure I understand. Could you perhaps give a pointer for > additional info, or an example? Suppose you have a C++ class : struct MyClass { MyClass(float* data, int dim, ...) 
:container(data, dim) void method() { // Modify container } private: MyContainer container; }; If the MyContainer class wraps the data array without copying it, if in Python, you wrap it like : class MyClass: def __init__(self, data): self._inst = #use a C bridge to create a new MyClass from data def method(self): wrapMethod(self._inst) #wrapper around method from MyClass after you create a new Python MyClass, your actual data inside the C++ class will be freed and thus you have reads or writes errors (and thus can lead to segmentation faults). > When it comes to strides, you have to divide them by the size of > > your data : the stride is counted in bytes and not in short/float/... > > > Yep, I did this on the Python side. Thanks for the remark, though. > OK ;) Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Mar 20 11:11:07 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 20 Mar 2008 09:11:07 -0600 Subject: [Numpy-discussion] Ravel and inplace modification In-Reply-To: <20080320120425.GA6486@phare.normalesup.org> References: <20080320120425.GA6486@phare.normalesup.org> Message-ID: On Thu, Mar 20, 2008 at 6:04 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > At the nipy sprint in Paris, we have been having a discussion about > methods modifying inplace and returning a view, or returning a copy. > > The main issue is with ravel that tries to keep a view, but that > obviously has to do a copy sometimes. (Is ravel the only place where this > behavior can happen ?). We came up with the following scenario: > > Mrs Jane is an experienced Python developper, working with less > experienced developpers. She has developped a set of functions to process > data that assume they can use the ravel method returning a view. One day > another programmes feeds it new kind of data. The functions work, but > return something wrong. > > We (Stefan van der Walt, Matthew Brett and I) are suggesting that it > would be a good idea to add a keyword to the ravel method so that it > raises an exception if it cannot return a view. Stefan is proposing to > implement it. > > What do people think about this? Should Stefan go ahead? > Ravel is not writeable, so it can't be used on the left side of assignments where the view/copy semantics could be a problem. There was a long thread a couple of years ago concerning the reshape method, which has the same problem. I think ravel should be avoided if the user needs a guarantee that things will be done in place. So it is fine to use it in expressions, but watch out if it is used as an lvalue. My suggestion at the time was that the method should return a view or raise a flag, while the function should always return a copy, but that solution has the problem that functions with the same name behave in different ways. I suppose one could also mark the returned data writeable=False, which would certainly discourage assignments to it. There are alternative methods: a.flatten will always return a copy and a.flat will return an iterator. Perhaps those should be suggested in cases where the user needs a particular behavior. As is, I don't think we can change the behavior of ravel at this point. It has been around for too long and such changes might break software. 
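A quick way to see which behaviour you actually got in a given case (small sketch):

import numpy as np

a = np.zeros((2, 3))        # C-contiguous
r = a.ravel()
r[0] = 99
print(a[0, 0])              # 99.0 -> ravel returned a view

b = a[:, :2]                # non-contiguous slice
r2 = b.ravel()
r2[0] = -1
print(b[0, 0])              # still 99.0 -> ravel had to copy
f = a.flatten()             # flatten, by contrast, always copies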
Better to clearly document its problems and suggest alternatives. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Mar 20 11:12:17 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 20 Mar 2008 09:12:17 -0600 Subject: [Numpy-discussion] Ravel and inplace modification In-Reply-To: References: <20080320120425.GA6486@phare.normalesup.org> Message-ID: On Thu, Mar 20, 2008 at 9:11 AM, Charles R Harris wrote: > > > On Thu, Mar 20, 2008 at 6:04 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > > > At the nipy sprint in Paris, we have been having a discussion about > > methods modifying inplace and returning a view, or returning a copy. > > > > The main issue is with ravel that tries to keep a view, but that > > obviously has to do a copy sometimes. (Is ravel the only place where > > this > > behavior can happen ?). We came up with the following scenario: > > > > Mrs Jane is an experienced Python developper, working with less > > experienced developpers. She has developped a set of functions to > > process > > data that assume they can use the ravel method returning a view. One day > > another programmes feeds it new kind of data. The functions work, but > > return something wrong. > > > > We (Stefan van der Walt, Matthew Brett and I) are suggesting that it > > would be a good idea to add a keyword to the ravel method so that it > > raises an exception if it cannot return a view. Stefan is proposing to > > implement it. > > > > What do people think about this? Should Stefan go ahead? > > > > Ravel is not writeable, so it can't be used on the left side of > assignments where the view/copy semantics could be a problem. > Argghhh, how did that line sneak in there? Ignore it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nwagner at iam.uni-stuttgart.de Thu Mar 20 12:44:06 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Thu, 20 Mar 2008 17:44:06 +0100 Subject: [Numpy-discussion] Numpy test failure with latest svn Message-ID: Hi all, I run numpy.test() with latest svn >>> numpy.test() Numpy is installed in /usr/local/lib64/python2.5/site-packages/numpy Numpy version 1.0.5.dev4898 Python version 2.5 (r25:51908, Jan 10 2008, 18:01:52) [GCC 4.1.2 20061115 (prerelease) (SUSE Linux)] Found 10/10 tests for numpy.core.defmatrix Found 3/3 tests for numpy.core.memmap Found 249/249 tests for numpy.core.multiarray Found 65/65 tests for numpy.core.numeric Found 31/31 tests for numpy.core.numerictypes Found 12/12 tests for numpy.core.records Found 7/7 tests for numpy.core.scalarmath Found 16/16 tests for numpy.core.umath Found 5/5 tests for numpy.ctypeslib Found 5/5 tests for numpy.distutils.misc_util Found 2/2 tests for numpy.fft.fftpack Found 3/3 tests for numpy.fft.helper Found 20/20 tests for numpy.lib._datasource Found 10/10 tests for numpy.lib.arraysetops Found 0/0 tests for numpy.lib.format Found 48/48 tests for numpy.lib.function_base Found 5/5 tests for numpy.lib.getlimits Found 4/4 tests for numpy.lib.index_tricks Found 4/4 tests for numpy.lib.polynomial Found 49/49 tests for numpy.lib.shape_base Found 15/15 tests for numpy.lib.twodim_base Found 43/43 tests for numpy.lib.type_check Found 1/1 tests for numpy.lib.ufunclike Found 40/40 tests for numpy.linalg Found 89/89 tests for numpy.ma.core Found 12/12 tests for numpy.ma.extras Found 3/3 tests for numpy.random Found 0/0 tests for __main__ ............................................................................................................................................................................*** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** ...............................*** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** ...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................ 
====================================================================== FAIL: Test of inplace operations and rich comparisons ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/numpy/ma/tests/test_old_ma.py", line 480, in check_testInplace assert id1 == id(x.data) AssertionError ---------------------------------------------------------------------- Ran 815 tests in 1.216s FAILED (failures=1) Nils From matthieu.brucher at gmail.com Thu Mar 20 13:12:22 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 20 Mar 2008 18:12:22 +0100 Subject: [Numpy-discussion] Numpy test failure with latest svn In-Reply-To: References: Message-ID: Hi, With latest SVN and Ubuntu 7.10 (Python 2.5.1, gcc 4.1.3, 32bits computer), I don't have any error (BTW, I have 822 tests). Matthieu 2008/3/20, Nils Wagner : > > Hi all, > > I run numpy.test() with latest svn > > >>> numpy.test() > Numpy is installed in > /usr/local/lib64/python2.5/site-packages/numpy > Numpy version 1.0.5.dev4898 > Python version 2.5 (r25:51908, Jan 10 2008, 18:01:52) [GCC > 4.1.2 20061115 (prerelease) (SUSE Linux)] > Found 10/10 tests for numpy.core.defmatrix > Found 3/3 tests for numpy.core.memmap > Found 249/249 tests for numpy.core.multiarray > Found 65/65 tests for numpy.core.numeric > Found 31/31 tests for numpy.core.numerictypes > Found 12/12 tests for numpy.core.records > Found 7/7 tests for numpy.core.scalarmath > Found 16/16 tests for numpy.core.umath > Found 5/5 tests for numpy.ctypeslib > Found 5/5 tests for numpy.distutils.misc_util > Found 2/2 tests for numpy.fft.fftpack > Found 3/3 tests for numpy.fft.helper > Found 20/20 tests for numpy.lib._datasource > Found 10/10 tests for numpy.lib.arraysetops > Found 0/0 tests for numpy.lib.format > Found 48/48 tests for numpy.lib.function_base > Found 5/5 tests for numpy.lib.getlimits > Found 4/4 tests for numpy.lib.index_tricks > Found 4/4 tests for numpy.lib.polynomial > Found 49/49 tests for numpy.lib.shape_base > Found 15/15 tests for numpy.lib.twodim_base > Found 43/43 tests for numpy.lib.type_check > Found 1/1 tests for numpy.lib.ufunclike > Found 40/40 tests for numpy.linalg > Found 89/89 tests for numpy.ma.core > Found 12/12 tests for numpy.ma.extras > Found 3/3 tests for numpy.random > Found 0/0 tests for __main__ > > ............................................................................................................................................................................*** > Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > ...............................*** Reference count error > detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > *** Reference count error detected: > an attempt was made to deallocate 17 (O) *** > > 
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................ > ====================================================================== > FAIL: Test of inplace operations and rich comparisons > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/usr/local/lib64/python2.5/site-packages/numpy/ma/tests/test_old_ma.py", > line 480, in check_testInplace > assert id1 == id(x.data) > AssertionError > > ---------------------------------------------------------------------- > Ran 815 tests in 1.216s > > FAILED (failures=1) > > > Nils > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Thu Mar 20 13:41:53 2008 From: pgmdevlist at gmail.com (P GM) Date: Thu, 20 Mar 2008 13:41:53 -0400 Subject: [Numpy-discussion] Numpy test failure with latest svn In-Reply-To: References: Message-ID: <777651ce0803201041x7cfe5aa9q5128663bea65c488@mail.gmail.com> That particular test in test_old_ma will never work: the .data of a masked array is implemented as a property, so its id will change from one test to another. On 3/20/08, Matthieu Brucher wrote: > Hi, > > With latest SVN and Ubuntu 7.10 (Python 2.5.1, gcc 4.1.3, 32bits computer), > I don't have any error (BTW, I have 822 tests). 
> > Matthieu > > 2008/3/20, Nils Wagner : > > > > Hi all, > > > > I run numpy.test() with latest svn > > > > >>> numpy.test() > > Numpy is installed in > > /usr/local/lib64/python2.5/site-packages/numpy > > Numpy version 1.0.5.dev4898 > > Python version 2.5 (r25:51908, Jan 10 2008, 18:01:52) [GCC > > 4.1.2 20061115 (prerelease) (SUSE Linux)] > > Found 10/10 tests for numpy.core.defmatrix > > Found 3/3 tests for numpy.core.memmap > > Found 249/249 tests for numpy.core.multiarray > > Found 65/65 tests for numpy.core.numeric > > Found 31/31 tests for numpy.core.numerictypes > > Found 12/12 tests for numpy.core.records > > Found 7/7 tests for numpy.core.scalarmath > > Found 16/16 tests for numpy.core.umath > > Found 5/5 tests for numpy.ctypeslib > > Found 5/5 tests for numpy.distutils.misc_util > > Found 2/2 tests for numpy.fft.fftpack > > Found 3/3 tests for numpy.fft.helper > > Found 20/20 tests for numpy.lib._datasource > > Found 10/10 tests for numpy.lib.arraysetops > > Found 0/0 tests for numpy.lib.format > > Found 48/48 tests for numpy.lib.function_base > > Found 5/5 tests for numpy.lib.getlimits > > Found 4/4 tests for numpy.lib.index_tricks > > Found 4/4 tests for numpy.lib.polynomial > > Found 49/49 tests for numpy.lib.shape_base > > Found 15/15 tests for numpy.lib.twodim_base > > Found 43/43 tests for numpy.lib.type_check > > Found 1/1 tests for numpy.lib.ufunclike > > Found 40/40 tests for numpy.linalg > > Found 89/89 tests for numpy.ma.core > > Found 12/12 tests for numpy.ma.extras > > Found 3/3 tests for numpy.random > > Found 0/0 tests for __main__ > > > > > ............................................................................................................................................................................*** > > Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > ...............................*** Reference count error > > detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > *** Reference count error detected: > > an attempt was made to deallocate 17 (O) *** > > > > > ...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................ 
> > ====================================================================== > > FAIL: Test of inplace operations and rich comparisons > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File > > "/usr/local/lib64/python2.5/site-packages/numpy/ma/tests/test_old_ma.py", > > line 480, in check_testInplace > > assert id1 == id(x.data) > > AssertionError > > > > ---------------------------------------------------------------------- > > Ran 815 tests in 1.216s > > > > FAILED (failures=1) > > > > > > Nils > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > French PhD student > Website : http://matthieu-brucher.developpez.com/ > Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn : http://www.linkedin.com/in/matthieubrucher > From philbinj at gmail.com Thu Mar 20 13:42:05 2008 From: philbinj at gmail.com (James Philbin) Date: Thu, 20 Mar 2008 17:42:05 +0000 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> Message-ID: <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> Hi, I was suprised to see this result: >>> import numpy as N >>> A = N.array([0,0,0]) >>> A[[0,1,1,2]]+=1 >>> A array([1, 1, 1]) Is this expected? Working on the principle of least surprise I would expect [1,2,1] to be output. Thanks, James From gael.varoquaux at normalesup.org Thu Mar 20 13:57:46 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 20 Mar 2008 18:57:46 +0100 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> Message-ID: <20080320175746.GA10309@phare.normalesup.org> On Thu, Mar 20, 2008 at 05:42:05PM +0000, James Philbin wrote: > I was suprised to see this result: > >>> import numpy as N > >>> A = N.array([0,0,0]) > >>> A[[0,1,1,2]]+=1 > >>> A > array([1, 1, 1]) > Is this expected? Working on the principle of least surprise I would > expect [1,2,1] to be output. This is a FAQ. This cannot work, because the inplace operation does not take place as a for loop. It is a "one shot" operation, that happens "at once". Let me rephrase this: you can think of this as a two phase operation: 1) first you the indices of you want to modify B = A[[0, 1, 1, 2]] thus B = array([0, 0, 0, 0)] 2) then you add one to these: C = B + 1 = array([1, 1, 1, 1]) 3) then you assign these in the indices you are interested in: A[[0, 1, 1, 2]] = C Actually, there is no copy going, so B and C do not exist as temporary arrays, but this is the idea: the operations are happening at once over the whole array. 
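Spelled out as a runnable session (just a sketch of the equivalent steps):

>>> import numpy as np
>>> A = np.array([0, 0, 0])
>>> B = A[[0, 1, 1, 2]]       # fancy indexing gathers the values
>>> C = B + 1
>>> A[[0, 1, 1, 2]] = C       # index 1 is written twice with the same value
>>> A
array([1, 1, 1])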
HTH, Ga?l From philbinj at gmail.com Thu Mar 20 14:17:44 2008 From: philbinj at gmail.com (James Philbin) Date: Thu, 20 Mar 2008 18:17:44 +0000 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <20080320175746.GA10309@phare.normalesup.org> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> <20080320175746.GA10309@phare.normalesup.org> Message-ID: <2b1c8c4f0803201117y1a529dd0y3fa0d283373067d9@mail.gmail.com> Hi, > This cannot work, because the inplace operation does not > take place as a for loop. Well, this would be fine if I was assigning the values to tempories as you suggest. However, the operation should be performed inplace and this is what I don't understand - why is there no for loop? I think the semantics of these inplace indexed operations is intuitively quite clear and numpy doesn't follow this intuition. Would there be any interest in changing this behaviour in numpy? > This is a FAQ. Sorry if i'm rehashing old ground, but the closest thing I could find to a numpy faq is here: http://www.scipy.org/FAQ. There seems to be no mention of this issue there. James From nwagner at iam.uni-stuttgart.de Thu Mar 20 14:20:14 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Thu, 20 Mar 2008 19:20:14 +0100 Subject: [Numpy-discussion] Numpy test failure with latest svn In-Reply-To: References: Message-ID: On Thu, 20 Mar 2008 18:12:22 +0100 "Matthieu Brucher" wrote: > Hi, > > With latest SVN and Ubuntu 7.10 (Python 2.5.1, gcc >4.1.3, 32bits computer), > I don't have any error (BTW, I have 822 tests). > > Matthieu > > 2008/3/20, Nils Wagner : >> >> Hi all, >> >> I run numpy.test() with latest svn >> >> >>> numpy.test() >> Numpy is installed in >> /usr/local/lib64/python2.5/site-packages/numpy >> Numpy version 1.0.5.dev4898 >> Python version 2.5 (r25:51908, Jan 10 2008, 18:01:52) >>[GCC >> 4.1.2 20061115 (prerelease) (SUSE Linux)] >> Found 10/10 tests for numpy.core.defmatrix >> Found 3/3 tests for numpy.core.memmap >> Found 249/249 tests for numpy.core.multiarray >> Found 65/65 tests for numpy.core.numeric >> Found 31/31 tests for numpy.core.numerictypes >> Found 12/12 tests for numpy.core.records >> Found 7/7 tests for numpy.core.scalarmath >> Found 16/16 tests for numpy.core.umath >> Found 5/5 tests for numpy.ctypeslib >> Found 5/5 tests for numpy.distutils.misc_util >> Found 2/2 tests for numpy.fft.fftpack >> Found 3/3 tests for numpy.fft.helper >> Found 20/20 tests for numpy.lib._datasource >> Found 10/10 tests for numpy.lib.arraysetops >> Found 0/0 tests for numpy.lib.format >> Found 48/48 tests for numpy.lib.function_base >> Found 5/5 tests for numpy.lib.getlimits >> Found 4/4 tests for numpy.lib.index_tricks >> Found 4/4 tests for numpy.lib.polynomial >> Found 49/49 tests for numpy.lib.shape_base >> Found 15/15 tests for numpy.lib.twodim_base >> Found 43/43 tests for numpy.lib.type_check >> Found 1/1 tests for numpy.lib.ufunclike >> Found 40/40 tests for numpy.linalg >> Found 89/89 tests for numpy.ma.core >> Found 12/12 tests for numpy.ma.extras >> Found 3/3 tests for numpy.random >> Found 0/0 tests for __main__ >> >> ............................................................................................................................................................................*** >> Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference 
count error detected: >> an attempt was made to deallocate 17 (O) *** >> ...............................*** Reference count error >> detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> *** Reference count error detected: >> an attempt was made to deallocate 17 (O) *** >> >> ...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................ >> ====================================================================== >> FAIL: Test of inplace operations and rich comparisons >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/usr/local/lib64/python2.5/site-packages/numpy/ma/tests/test_old_ma.py", >> line 480, in check_testInplace >> assert id1 == id(x.data) >> AssertionError >> >> ---------------------------------------------------------------------- >> Ran 815 tests in 1.216s >> >> FAILED (failures=1) >> >> >> Nils >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- >French PhD student > Website : http://matthieu-brucher.developpez.com/ > Blogs : http://matt.eifelle.com and >http://blog.developpez.com/?blog=92 > LinkedIn : http://www.linkedin.com/in/matthieubrucher Strange, I have installed numpy from scratch. The problem persists. I have increased the verbosity level. Ticket #378*** Reference count error detected: an attempt was made to deallocate 17 (O) *** *** Reference count error detected: an attempt was made to deallocate 17 (O) *** And the ticket was closed in 2006.... http://projects.scipy.org/scipy/numpy/ticket/378 Nils From efiring at hawaii.edu Thu Mar 20 14:31:37 2008 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 20 Mar 2008 08:31:37 -1000 Subject: [Numpy-discussion] isnan bug? In-Reply-To: <47E243B2.5020602@simplistix.co.uk> References: <47E243B2.5020602@simplistix.co.uk> Message-ID: <47E2AD89.4010203@hawaii.edu> Chris Withers wrote: > Hi All, > > I'm faily sure that: > > numpy.isnan(datetime.datetime.now()) > > ...should just return False and not raise an exception. > > Where can I raise a bug to this effect? > > cheers, > > Chris > Chris, I don't see why you consider this a bug. 
isnan tests whether an instance of a numeric type is a nan or not; if you feed it something that is not a numeric type, it should, and does, raise an exception, just as an exception is raised if you try to add a float to a datetime object. In both cases, raising TypeError is entirely appropriate. Eric From gael.varoquaux at normalesup.org Thu Mar 20 14:35:46 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 20 Mar 2008 19:35:46 +0100 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <2b1c8c4f0803201117y1a529dd0y3fa0d283373067d9@mail.gmail.com> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> <20080320175746.GA10309@phare.normalesup.org> <2b1c8c4f0803201117y1a529dd0y3fa0d283373067d9@mail.gmail.com> Message-ID: <20080320183546.GA20593@phare.normalesup.org> On Thu, Mar 20, 2008 at 06:17:44PM +0000, James Philbin wrote: > Hi, > > This cannot work, because the inplace operation does not > > take place as a for loop. > Well, this would be fine if I was assigning the values to tempories as > you suggest. However, the operation should be performed inplace and > this is what I don't understand - why is there no for loop? I think > the semantics of these inplace indexed operations is intuitively quite > clear and numpy doesn't follow this intuition. Would there be any > interest in changing this behaviour in numpy? I think this is technicaly impossible from the way numpy works. This breaks the numpy model that every operation is global on the array. > > This is a FAQ. > Sorry if i'm rehashing old ground, but the closest thing I could find > to a numpy faq is here: http://www.scipy.org/FAQ. There seems to be no > mention of this issue there. Sorry if I came out harsh, I certainly didn't want to implie that you were expecting something wrong, just that this thing was tricking people. Cheers, Ga?l From peridot.faceted at gmail.com Thu Mar 20 14:56:40 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 20 Mar 2008 19:56:40 +0100 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <20080320183546.GA20593@phare.normalesup.org> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> <20080320175746.GA10309@phare.normalesup.org> <2b1c8c4f0803201117y1a529dd0y3fa0d283373067d9@mail.gmail.com> <20080320183546.GA20593@phare.normalesup.org> Message-ID: On 20/03/2008, Gael Varoquaux wrote: > On Thu, Mar 20, 2008 at 06:17:44PM +0000, James Philbin wrote: > > Hi, > > > > This cannot work, because the inplace operation does not > > > take place as a for loop. > > Well, this would be fine if I was assigning the values to tempories as > > you suggest. However, the operation should be performed inplace and > > this is what I don't understand - why is there no for loop? I think > > the semantics of these inplace indexed operations is intuitively quite > > clear and numpy doesn't follow this intuition. Would there be any > > interest in changing this behaviour in numpy? > > > I think this is technicaly impossible from the way numpy works. This > breaks the numpy model that every operation is global on the array. It is quite reasonable from a least-surprise point of view. Unfortunately it can't really be done because of the way python implements augmented assignments. If I'm not mistaken, numpy's histogram function can be used to accomplish this particular thing. > > > This is a FAQ. 
> > Sorry if i'm rehashing old ground, but the closest thing I could find > > to a numpy faq is here: http://www.scipy.org/FAQ. There seems to be no > > mention of this issue there. > > > Sorry if I came out harsh, I certainly didn't want to implie that you > were expecting something wrong, just that this thing was tricking people. I added it to that FAQ, along with the explanation I got when I asked the same question: http://www.scipy.org/FAQ#head-1ed851e9aff803d41d3cded8657b2b15a888ebd5 Anne From robert.kern at gmail.com Thu Mar 20 15:05:46 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 20 Mar 2008 14:05:46 -0500 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <20080320183546.GA20593@phare.normalesup.org> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> <20080320175746.GA10309@phare.normalesup.org> <2b1c8c4f0803201117y1a529dd0y3fa0d283373067d9@mail.gmail.com> <20080320183546.GA20593@phare.normalesup.org> Message-ID: <3d375d730803201205y53f4a4ebi3f7fa2ffc2f007c8@mail.gmail.com> On Thu, Mar 20, 2008 at 1:35 PM, Gael Varoquaux wrote: > On Thu, Mar 20, 2008 at 06:17:44PM +0000, James Philbin wrote: > > Hi, > > > > This cannot work, because the inplace operation does not > > > take place as a for loop. > > Well, this would be fine if I was assigning the values to tempories as > > you suggest. However, the operation should be performed inplace and > > this is what I don't understand - why is there no for loop? I think > > the semantics of these inplace indexed operations is intuitively quite > > clear and numpy doesn't follow this intuition. Would there be any > > interest in changing this behaviour in numpy? > > I think this is technicaly impossible from the way numpy works. This > breaks the numpy model that every operation is global on the array. More importantly, it is technically impossible because of the way that *Python* works. See the thread "Histograms via indirect index arrays" for a detailed explanation. http://projects.scipy.org/pipermail/numpy-discussion/2006-March/006877.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Thu Mar 20 15:33:26 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 20 Mar 2008 14:33:26 -0500 Subject: [Numpy-discussion] Correlate with small arrays In-Reply-To: <6be8b94a0803200344jd818605le9dfc0992bc5e145@mail.gmail.com> References: <6be8b94a0803200344jd818605le9dfc0992bc5e145@mail.gmail.com> Message-ID: <3d375d730803201233x1799a0fof0b470f011c49e7c@mail.gmail.com> On Thu, Mar 20, 2008 at 5:44 AM, Peter Creasey wrote: > > > I'm trying to do a PDE style calculation with numpy arrays > > > > > > y = a * x[:-2] + b * x[1:-1] + c * x[2:] > > > > > > with a,b,c constants. I realise I could use correlate for this, i.e > > > > > > y = numpy.correlate(x, array((a, b, c))) > > > > > The relative performance seems to vary depending on the size, but it > > seems to me that correlate usually beats the manual implementation, > > particularly if you don't measure the array() part, too. len(x)=1000 > > is the only size where the manual version seems to beat correlate on > > my machine. > > Thanks for the quick response! Unfortunately 1000 < len(x) < 20000 are > just the cases I'm using, (they seem to be 1-3 times as slower on my > machine). Odd. 
What machine are you using? I have an Intel Core 2 Duo MacBook. > I'm just thinking that this is exactly the kind of problem that could > be done much faster in C, i.e in the manual implementation the > processor goes through an array of len(x) maybe 5 times (3 > multiplications and 2 additions), yet in C I could put those constants > in the registers and go through the array just once. Maybe this is > flawed logic, but if not I'm hoping someone has already done this? The function is PyArray_Correlate() in numpy/core/src/multiarraymodule.c. If you have suggestions for improving it, we're all ears. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dblubaugh at belcan.com Thu Mar 20 15:40:14 2008 From: dblubaugh at belcan.com (Blubaugh, David A.) Date: Thu, 20 Mar 2008 15:40:14 -0400 Subject: [Numpy-discussion] Floating-point support for MyHDL Message-ID: <27CC3060AF71DA40A5DC85F7D5B70F3802D19F8A@AWMAIL04.belcan.com> Would anyone know as to how to develop floating point support for the MyHDL module?? Has anyone worked with any alternative versions of the IEEE standard for floating -point? Also has anyone developed a floating-point library for a module within the python environment in order to execute numerical computations. I would imagine since I am translating python to verilog by using MyHDL , that I will have to develop the floating-point module in python source code as well ?? Thanks, David Blubaugh This e-mail transmission contains information that is confidential and may be privileged. It is intended only for the addressee(s) named above. If you receive this e-mail in error, please do not read, copy or disseminate it in any manner. If you are not the intended recipient, any disclosure, copying, distribution or use of the contents of this information is prohibited. Please reply to the message immediately by informing the sender that the message was misdirected. After replying, please erase it from your computer system. Your assistance in correcting this error is appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: From philbinj at gmail.com Thu Mar 20 15:43:41 2008 From: philbinj at gmail.com (James Philbin) Date: Thu, 20 Mar 2008 19:43:41 +0000 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <3d375d730803201205y53f4a4ebi3f7fa2ffc2f007c8@mail.gmail.com> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> <20080320175746.GA10309@phare.normalesup.org> <2b1c8c4f0803201117y1a529dd0y3fa0d283373067d9@mail.gmail.com> <20080320183546.GA20593@phare.normalesup.org> <3d375d730803201205y53f4a4ebi3f7fa2ffc2f007c8@mail.gmail.com> Message-ID: <2b1c8c4f0803201243j1e9f481alf0a348bb2a310ebd@mail.gmail.com> Hi, > More importantly, it is technically impossible because of the way that > *Python* works. See the thread "Histograms via indirect index arrays" > for a detailed explanation. > > http://projects.scipy.org/pipermail/numpy-discussion/2006-March/006877.html OK, that makes things much clearer. You say this is technically impossible, but I think there is a (albeit messy) way of doing this. If A[I] were to return a proxy object instead of an array (overloading the requisite methods so that operations on the proxy affect A via one level of indirection), then the methods can be written to do the right thing. 
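(A toy sketch to make the idea concrete -- not a proposal for how numpy itself would have to implement it:)

import numpy as np

class IndexedProxy(object):
    # stand-in for what A[I] could return: in-place ops loop over the
    # indices one by one, so repeated indices accumulate
    def __init__(self, arr, idx):
        self.arr = arr
        self.idx = idx
    def __iadd__(self, value):
        for i in self.idx:
            self.arr[i] += value
        return self

A = np.array([0, 0, 0])
p = IndexedProxy(A, [0, 1, 1, 2])
p += 1
print(A)                      # [1 2 1]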
This would have the added advantage of eliminating a copy. This is BTW how a lot of the clever tricks (especially for sparse matrices) are done in the boost::ublas matrix library. I'm not saying anyone should actually do this, but it does seem to be technically *possible*. Thanks, James From robert.kern at gmail.com Thu Mar 20 15:49:38 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 20 Mar 2008 14:49:38 -0500 Subject: [Numpy-discussion] Floating-point support for MyHDL In-Reply-To: <27CC3060AF71DA40A5DC85F7D5B70F3802D19F8A@AWMAIL04.belcan.com> References: <27CC3060AF71DA40A5DC85F7D5B70F3802D19F8A@AWMAIL04.belcan.com> Message-ID: <3d375d730803201249u7562656fn9120f90c7aba5b76@mail.gmail.com> On Thu, Mar 20, 2008 at 2:40 PM, Blubaugh, David A. wrote: > > Would anyone know as to how to develop floating point support for the MyHDL > module?? Has anyone worked with any alternative versions of the IEEE > standard for floating -point? Also has anyone developed a floating-point > library for a module within the python environment in order to execute > numerical computations. I would imagine since I am translating python to > verilog by using MyHDL , that I will have to develop the floating-point > module in python source code as well ?? You should ask on the MyHDL mailing list, not here. http://myhdl.jandecaluwe.com/doku.php/mailing_list -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pav at iki.fi Thu Mar 20 15:59:27 2008 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 20 Mar 2008 21:59:27 +0200 Subject: [Numpy-discussion] NumPy 1.0.5 almost ready In-Reply-To: References: Message-ID: <1206043167.6726.3.camel@localhost.localdomain> ke, 2008-03-19 kello 11:53 -0700, Jarrod Millman kirjoitti: > Hello, > > Thanks to everyone who has been working on getting the 1.0.5 release > of NumPy out the door. Since my last email at least 12 bug tickets > have been closed. There are a few remaining issues with the trunk, > but we are fasting approaching a release. Ticket #633 is likely mostly solved now. There's a patch fixing bugs in object array refcounting at http://scipy.org/scipy/numpy/ticket/633 -- Pauli Virtanen -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digitaalisesti allekirjoitettu viestin osa URL: From matthieu.brucher at gmail.com Thu Mar 20 16:09:34 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 20 Mar 2008 21:09:34 +0100 Subject: [Numpy-discussion] NumPy 1.0.5 almost ready In-Reply-To: <1206043167.6726.3.camel@localhost.localdomain> References: <1206043167.6726.3.camel@localhost.localdomain> Message-ID: Well, it is not completely solved. With the patch, the reference count keeps on raising, but as it is for Python scalars, it is not a problem, but the underlying problem in Py_DECREF will show up eventually and it will need to be solved. But I'm afraid I'm not intimate enough with the mecanisms of Numpys arrays to solve it. Matthieu 2008/3/20, Pauli Virtanen : > > > ke, 2008-03-19 kello 11:53 -0700, Jarrod Millman kirjoitti: > > > Hello, > > > > Thanks to everyone who has been working on getting the 1.0.5 release > > of NumPy out the door. Since my last email at least 12 bug tickets > > have been closed. 
There are a few remaining issues with the trunk, > > but we are fasting approaching a release. > > > Ticket #633 is likely mostly solved now. There's a patch fixing bugs in > object array refcounting at http://scipy.org/scipy/numpy/ticket/633 > > -- > > Pauli Virtanen > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Mar 20 16:27:50 2008 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 20 Mar 2008 22:27:50 +0200 Subject: [Numpy-discussion] NumPy 1.0.5 almost ready In-Reply-To: References: <1206043167.6726.3.camel@localhost.localdomain> Message-ID: <1206044870.6726.6.camel@localhost.localdomain> to, 2008-03-20 kello 21:09 +0100, Matthieu Brucher kirjoitti: > Well, it is not completely solved. With the patch, the reference count > keeps on raising, but as it is for Python scalars, it is not a > problem, but the underlying problem in Py_DECREF will show up > eventually and it will need to be solved. But I'm afraid I'm not > intimate enough with the mecanisms of Numpys arrays to solve it. I wrote a second patch that I think fixes the problem, and it seems to work at least for the testcases I tried. -- Pauli Virtanen -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digitaalisesti allekirjoitettu viestin osa URL: From dalcinl at gmail.com Thu Mar 20 17:43:21 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 20 Mar 2008 18:43:21 -0300 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: <20080320175746.GA10309@phare.normalesup.org> References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> <20080320175746.GA10309@phare.normalesup.org> Message-ID: I think you are wrong, here THERE ARE tmp arrays involved... numpy has to copy data if indices are not contiguous or strides (in the sense of actually using a slice) In [1]: from numpy import * In [2]: A = array([0,0,0]) In [3]: B = A[[0,1,2]] In [4]: print B.base None In [5]: C = A[0:3] In [6]: print C.base [0 0 0] On 3/20/08, Gael Varoquaux wrote: > On Thu, Mar 20, 2008 at 05:42:05PM +0000, James Philbin wrote: > > I was suprised to see this result: > > >>> import numpy as N > > >>> A = N.array([0,0,0]) > > >>> A[[0,1,1,2]]+=1 > > >>> A > > array([1, 1, 1]) > > > Is this expected? Working on the principle of least surprise I would > > expect [1,2,1] to be output. > > > This is a FAQ. This cannot work, because the inplace operation does not > take place as a for loop. It is a "one shot" operation, that happens "at > once". Let me rephrase this: you can think of this as a two phase > operation: > > 1) first you the indices of you want to modify > > B = A[[0, 1, 1, 2]] > > thus B = array([0, 0, 0, 0)] > > 2) then you add one to these: > > C = B + 1 = array([1, 1, 1, 1]) > > 3) then you assign these in the indices you are interested in: > > A[[0, 1, 1, 2]] = C > > Actually, there is no copy going, so B and C do not exist as temporary > arrays, but this is the idea: the operations are happening at once over > the whole array. 
> > HTH, > > Ga?l > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From gael.varoquaux at normalesup.org Thu Mar 20 18:54:54 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 20 Mar 2008 23:54:54 +0100 Subject: [Numpy-discussion] Inplace index suprise In-Reply-To: References: <2b1c8c4f0803200231i210d87f0la202bbbe0a5cdfaf@mail.gmail.com> <2b1c8c4f0803201042g51883b06m48c1b5482a3b104a@mail.gmail.com> <20080320175746.GA10309@phare.normalesup.org> Message-ID: <20080320225453.GA19218@phare.normalesup.org> On Thu, Mar 20, 2008 at 06:43:21PM -0300, Lisandro Dalcin wrote: > I think you are wrong, here THERE ARE tmp arrays involved... numpy has > to copy data if indices are not contiguous or strides (in the sense of > actually using a slice) > In [1]: from numpy import * > In [2]: A = array([0,0,0]) > In [3]: B = A[[0,1,2]] > In [4]: print B.base > None > In [5]: C = A[0:3] > In [6]: print C.base > [0 0 0] Indeed, you are right, I hadn't realised that. Thanks for pointing it out. Ga?l From chris at simplistix.co.uk Thu Mar 20 19:06:32 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 20 Mar 2008 23:06:32 +0000 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: References: <47E0F2AC.7040200@simplistix.co.uk> <47E0F904.9070203@simplistix.co.uk> Message-ID: <47E2EDF8.30702@simplistix.co.uk> Matt Knox wrote: >> 1. why am I not getting my NaN's back? > > when iterating over a masked array, you get the "ma.masked" constant for > elements that were masked (same as what you would get if you indexed the masked > array at that element). If you are referring specifically to the .data portion > of the array... it looks like the latest version of the numpy.ma sub-module > preserves nan's in the data portion of the masked array, but the old version > perhaps doesn't based on the output you are showing. OK, when's this going to make it into a release? >> 2. why is the wrong fill value being used here? > > the second element in the array iteration here is actually the numpy.ma.masked > constant, which always has the same fill value (which I guess is 999999). This sucks to the point of feeling like a bug :-( Why is it desirable for it to behave like this? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Thu Mar 20 19:07:07 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 20 Mar 2008 23:07:07 +0000 Subject: [Numpy-discussion] documentation for masked arrays? In-Reply-To: References: <47E0F2AC.7040200@simplistix.co.uk> <47E13526.2060702@simplistix.co.uk> Message-ID: <47E2EE1B.6010502@simplistix.co.uk> Jarrod Millman wrote: > > There is a documentation day on Friday. If you have some time, it > would be great if you could help out with writing NumPy docstrings. > There more people who contribute, the faster this will happen. 
It's a catch 22, I don't have the knowledge to usefully do this :-(

Chris

-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

From chris at simplistix.co.uk Thu Mar 20 19:11:29 2008
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 20 Mar 2008 23:11:29 +0000
Subject: [Numpy-discussion] isnan bug?
In-Reply-To: <47E2AD89.4010203@hawaii.edu>
References: <47E243B2.5020602@simplistix.co.uk> <47E2AD89.4010203@hawaii.edu>
Message-ID: <47E2EF21.8030307@simplistix.co.uk>

Eric Firing wrote: > I don't see why you consider this a bug. isnan tests whether an > instance of a numeric type is a nan or not;

Why does it limit to numeric types? isnan sounds pretty boolean to me, anything that isn't nan should return False, regardless of type, in the same way as I can do: isinstance(*anything*,SomeClass) ...and not have it blow up in my face.

I end up having to write horrific code like:

if value and (isinstance(value, (datetime, date)) or not isnan(value)):

> if you feed it something > that is not a numeric type, it should,

Why should it?

cheers,

Chris

-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

From pgmdevlist at gmail.com Thu Mar 20 10:17:20 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Thu, 20 Mar 2008 10:17:20 -0400
Subject: [Numpy-discussion] bug with with fill_values in masked arrays?
In-Reply-To: References: <47E0F2AC.7040200@simplistix.co.uk> <47E0F904.9070203@simplistix.co.uk>
Message-ID: <200803201017.20396.pgmdevlist@gmail.com>

Folks, Sorry for my delayed answers: I'm on the road these days and can't connect to the web as often and as well as I'd like to.

On Wednesday 19 March 2008 19:47:37 Matt Knox wrote: > > 1. why am I not getting my NaN's back?

Because they're gone when you create your masked array. The idea here is to get rid of the nan in your data to avoid potential problems while keeping track of where the nans were in the first place. So, the .data part of your masked array should be nan-free, and the mask tells you where the nans were.

> > 2. why is the wrong fill value being used here? > > the second element in the array iteration here is actually the > numpy.ma.masked constant, which always has the same fill value...

Couldn't say it better.
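To make the masking behaviour concrete, here is a small sketch using numpy.ma; the sample data is made up and the printed values (noted in the comments) may differ slightly between versions:

import numpy as np

x = np.array([1.0, np.nan, 3.0])

m = np.ma.masked_invalid(x)    # the nan is masked out when the masked array is built
print(m.mask)                  # [False  True False] -- the mask records where the nan was
print(m[1] is np.ma.masked)    # True -- a masked slot gives back the shared masked constant
print(m.filled(0.0))           # [ 1.  0.  3.] -- filled() returns plain numbers with the value you choose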
From cournapeau at cslab.kecl.ntt.co.jp Thu Mar 20 22:45:26 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Fri, 21 Mar 2008 11:45:26 +0900 Subject: [Numpy-discussion] dimensions too large error In-Reply-To: References: Message-ID: <1206067527.8490.4.camel@bbc8> On Fri, 2008-03-14 at 18:00 -0700, Dinesh B Vadhia wrote: > For the following code: > > I = 18000 > J = 33000 > filename = 'ij.txt' > A = scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) You are asking to create a matrix of more than 2 Gb, which is unlikely to work on a 32 bits OS (by default, most OS I know limit the memory available to one process to 2 Gb). Even if you could go beyond this limit (say 3 Gb of virtual adress space per process), you would still certainly have problems because 2Gb of contiguous adress space is really big. So either you have to use smaller matrices, or to use a 64 bits OS. cheers, David From cournapeau at cslab.kecl.ntt.co.jp Fri Mar 21 00:35:50 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Fri, 21 Mar 2008 13:35:50 +0900 Subject: [Numpy-discussion] numpy's future (1.1 and beyond): which direction(s) ? Message-ID: <1206074150.8490.28.camel@bbc8> Hi, numpy 1.0.5 is on the way, and I was wondering about numpy's future. I myself have some ideas about what could be done; has there been any discussion behind what is on 1.1 trac's roadmap ? Some of the things I would like to see myself: - a framework for plug-in architecture, that is the ability for numpy to load/unload some libraries at runtime, plus a common api to access the functions. Example: instead of calling directly atlas/etc..., it would load the dll at runtime, so that other libraries can be loaded instead (numpy itself could load different runtimes depending on the CPU, for example: SSE vs SSE2 vs SSE3, multi-thread vs non multi-thread). That would require the ability to build loadable libraries (numscons, or a new numpy.distutils command). - a pure C core library for some common operations. For example, I myself would really like to be able to use the fft in some C extensions. Numpy has a fft, but I cannot access it from C (well, I could access the python fft from C, but that would be... awkward); same for blas/lapack. I really like the idea of a numpy "split" into a core C library reusable by many C extensions, and python wrappers (in C, cython, ctypes, whatever). That would be a huge work, of course, but hopefully can be done gradually and smoothly. Only having fft + some basic blas/lapack (dot, inv, det, etc...) and some basic functions (beta, gamma, digamma) would be great, for example. - a highly optimized core library for memory copy, simple addition, etc... basically, everything which can see huge improvements when using MMX/SSE and co. This is somewhat linked to point 1. This would also require more sophisticated memory allocator (aligned, etc...). What do people think about this ? Is that a direction numpy developers are interested in ? cheers, David From vel.accel at gmail.com Fri Mar 21 04:55:13 2008 From: vel.accel at gmail.com (vel.accel at gmail.com) Date: Fri, 21 Mar 2008 04:55:13 -0400 Subject: [Numpy-discussion] Improving Docs on Wiki Message-ID: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> Hi, I want to know if creating individual documentation for each numpy routine on the scipy.org wiki would, for some administrative reason (or other) be frowned upon. Here is an example of what I'd like to do for all of numpy's routines. http://www.scipy.org/sort. 
After each routine is properly documented there, We can have various index, category, and cross references. Oh boy :-) -Dieter From robert.kern at gmail.com Fri Mar 21 05:00:34 2008 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 21 Mar 2008 04:00:34 -0500 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> Message-ID: <3d375d730803210200r762fbce4lad689f81b8dc1d25@mail.gmail.com> On Fri, Mar 21, 2008 at 3:55 AM, wrote: > Hi, > > I want to know if creating individual documentation for each numpy > routine on the scipy.org wiki would, for some administrative reason > (or other) be frowned upon. Here is an example of what I'd like to do > for all of numpy's routines. http://www.scipy.org/sort. Knock yourself out. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From nadavh at visionsense.com Fri Mar 21 06:04:05 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Fri, 21 Mar 2008 12:04:05 +0200 Subject: [Numpy-discussion] numpy's future (1.1 and beyond): whichdirection(s) ? References: <1206074150.8490.28.camel@bbc8> Message-ID: <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> I would like to see a unification of matrices and arrays. I often do calculation which involve both array processing and linear algebra, and the current solution of having function like dot and inv is not aesthetic. Switching between array and matrix types (or using .A attribute of a matrix) is not convinient either. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? David Cournapeau ????: ? 21-???-08 06:35 ??: Discussion of Numerical Python ????: [Numpy-discussion] numpy's future (1.1 and beyond): whichdirection(s) ? Hi, numpy 1.0.5 is on the way, and I was wondering about numpy's future. I myself have some ideas about what could be done; has there been any discussion behind what is on 1.1 trac's roadmap ? Some of the things I would like to see myself: - a framework for plug-in architecture, that is the ability for numpy to load/unload some libraries at runtime, plus a common api to access the functions. Example: instead of calling directly atlas/etc..., it would load the dll at runtime, so that other libraries can be loaded instead (numpy itself could load different runtimes depending on the CPU, for example: SSE vs SSE2 vs SSE3, multi-thread vs non multi-thread). That would require the ability to build loadable libraries (numscons, or a new numpy.distutils command). - a pure C core library for some common operations. For example, I myself would really like to be able to use the fft in some C extensions. Numpy has a fft, but I cannot access it from C (well, I could access the python fft from C, but that would be... awkward); same for blas/lapack. I really like the idea of a numpy "split" into a core C library reusable by many C extensions, and python wrappers (in C, cython, ctypes, whatever). That would be a huge work, of course, but hopefully can be done gradually and smoothly. Only having fft + some basic blas/lapack (dot, inv, det, etc...) and some basic functions (beta, gamma, digamma) would be great, for example. - a highly optimized core library for memory copy, simple addition, etc... 
basically, everything which can see huge improvements when using MMX/SSE and co. This is somewhat linked to point 1. This would also require more sophisticated memory allocator (aligned, etc...). What do people think about this ? Is that a direction numpy developers are interested in ? cheers, David _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From millman at berkeley.edu Fri Mar 21 06:07:42 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 21 Mar 2008 03:07:42 -0700 Subject: [Numpy-discussion] numpy's future (1.1 and beyond): which direction(s) ? In-Reply-To: <1206074150.8490.28.camel@bbc8> References: <1206074150.8490.28.camel@bbc8> Message-ID: On Thu, Mar 20, 2008 at 9:35 PM, David Cournapeau wrote: > numpy 1.0.5 is on the way, and I was wondering about numpy's future. I > myself have some ideas about what could be done; has there been any > discussion behind what is on 1.1 trac's roadmap ? Some of the things I > would like to see myself: > > What do people think about this ? Is that a direction numpy developers > are interested in ? Hey, I don't have time to put much detail up at this point, but I have put some thought into this and will spend sometime discussing this next week--once I get back to the states and catch up on my work. Here are my two top general preferences for a 1.1 release: 1. I would like to get NumPy 1.1 out ASAP. In particular, I want to try very hard to get it released by the end of the summer. This means we will need to be very careful about how many new features we plan to add. I would much rather try to get more frequent stable releases out at this point, rather than delaying longer for more features. The more we add to the next release, the longer it will likely take to really stabilize after we release 1.1.0. If instead we get out 1.1.0 out within a few months, we may be able to start working on 1.2.0 sooner. 2. I want us to switch to using nose tests. We already did this in the SciPy trunk. Also, just a reminder: I **really** need help getting 1.0.5 out. I know that planning new features is much more interesting and fun; but if everyone can help reduce the number of bugs, we will be able to release 1.0.5 much more quickly. Before we starting working or thinking about 1.1, I would much rather see everyone spend some time helping stabilize and test the next (possibly last 1.0.x) release. Then we can start discussing and developing code for 1.1 without having the 1.0.5 release still pending. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From millman at berkeley.edu Fri Mar 21 06:09:39 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 21 Mar 2008 03:09:39 -0700 Subject: [Numpy-discussion] NumPy 1.0.5 almost ready In-Reply-To: <1206044870.6726.6.camel@localhost.localdomain> References: <1206043167.6726.3.camel@localhost.localdomain> <1206044870.6726.6.camel@localhost.localdomain> Message-ID: On Thu, Mar 20, 2008 at 1:27 PM, Pauli Virtanen wrote: > to, 2008-03-20 kello 21:09 +0100, Matthieu Brucher kirjoitti: > > Well, it is not completely solved. With the patch, the reference count > > keeps on raising, but as it is for Python scalars, it is not a > > problem, but the underlying problem in Py_DECREF will show up > > eventually and it will need to be solved. 
But I'm afraid I'm not > > intimate enough with the mecanisms of Numpys arrays to solve it. > > I wrote a second patch that I think fixes the problem, and it seems to > work at least for the testcases I tried. Excellent! Thanks so much, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From matthieu.brucher at gmail.com Fri Mar 21 06:11:36 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 21 Mar 2008 11:11:36 +0100 Subject: [Numpy-discussion] numpy's future (1.1 and beyond): whichdirection(s) ? In-Reply-To: <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> Message-ID: Hi, I don't understand why an unification would simplify stuff, it would make everything so much more difficult :| Instead of dot, you would have a mult() function to multiply element by element, the same for inv(), so much less readable when using arrays when arrays are so much more general and generic than matrices. So -1 on this. Matthieu 2008/3/21, Nadav Horesh : > > I would like to see a unification of matrices and arrays. I often do > calculation which involve both array processing and linear algebra, and the > current solution of having function like dot and inv is not aesthetic. > Switching between array and matrix types (or using .A attribute of a matrix) > is not convinient either. > > Nadav. > > > > -----????? ??????----- > ???: numpy-discussion-bounces at scipy.org ??? David Cournapeau > ????: ? 21-???-08 06:35 > ??: Discussion of Numerical Python > ????: [Numpy-discussion] numpy's future (1.1 and beyond): > whichdirection(s) ? > > > Hi, > > numpy 1.0.5 is on the way, and I was wondering about numpy's > future. I > myself have some ideas about what could be done; has there been any > discussion behind what is on 1.1 trac's roadmap ? Some of the things I > would like to see myself: > - a framework for plug-in architecture, that is the ability for > numpy > to load/unload some libraries at runtime, plus a common api to access > the functions. Example: instead of calling directly atlas/etc..., it > would load the dll at runtime, so that other libraries can be loaded > instead (numpy itself could load different runtimes depending on the > CPU, for example: SSE vs SSE2 vs SSE3, multi-thread vs non > multi-thread). That would require the ability to build loadable > libraries (numscons, or a new numpy.distutils command). > - a pure C core library for some common operations. For example, I > myself would really like to be able to use the fft in some C extensions. > Numpy has a fft, but I cannot access it from C (well, I could access the > python fft from C, but that would be... awkward); same for blas/lapack. > I really like the idea of a numpy "split" into a core C library reusable > by many C extensions, and python wrappers (in C, cython, ctypes, > whatever). That would be a huge work, of course, but hopefully can be > done gradually and smoothly. Only having fft + some basic blas/lapack > (dot, inv, det, etc...) and some basic functions (beta, gamma, digamma) > would be great, for example. > - a highly optimized core library for memory copy, simple > addition, > etc... basically, everything which can see huge improvements when using > MMX/SSE and co. This is somewhat linked to point 1. This would also > require more sophisticated memory allocator (aligned, etc...). 
> > What do people think about this ? Is that a direction numpy developers > are interested in ? > > cheers, > > David > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Fri Mar 21 06:17:43 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 21 Mar 2008 11:17:43 +0100 Subject: [Numpy-discussion] NumPy 1.0.5 almost ready In-Reply-To: References: <1206043167.6726.3.camel@localhost.localdomain> <1206044870.6726.6.camel@localhost.localdomain> Message-ID: I confirm that the reference count is consistent when trying the exemple given in the first post of the ticket (Ubuntu 7.10, gcc 4.1.3, Python 2.5.1 ). Matthieu 2008/3/21, Jarrod Millman : > > On Thu, Mar 20, 2008 at 1:27 PM, Pauli Virtanen wrote: > > to, 2008-03-20 kello 21:09 +0100, Matthieu Brucher kirjoitti: > > > Well, it is not completely solved. With the patch, the reference count > > > keeps on raising, but as it is for Python scalars, it is not a > > > problem, but the underlying problem in Py_DECREF will show up > > > eventually and it will need to be solved. But I'm afraid I'm not > > > intimate enough with the mecanisms of Numpys arrays to solve it. > > > > I wrote a second patch that I think fixes the problem, and it seems to > > work at least for the testcases I tried. > > > Excellent! > > Thanks so much, > > > -- > Jarrod Millman > Computational Infrastructure for Research Labs > 10 Giannini Hall, UC Berkeley > phone: 510.643.4014 > http://cirl.berkeley.edu/ > _______________________________________________ > > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Fri Mar 21 06:21:24 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 21 Mar 2008 11:21:24 +0100 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> Message-ID: <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Hi Dieter On Fri, Mar 21, 2008 at 9:55 AM, wrote: > I want to know if creating individual documentation for each numpy > routine on the scipy.org wiki would, for some administrative reason > (or other) be frowned upon. Here is an example of what I'd like to do > for all of numpy's routines. http://www.scipy.org/sort. Thank you very much for contributing to NumPy. Your timing is perfect, today being our third doc-day -- I hope others join us as well at #scipy on freenode.net, as we improve the documentation coverage. 
In a discussion with Fernando and Gael, we've come up with some suggestions. The wiki is a great place for users to add documentation, since it doesn't require special permissions, but we shall run into naming conflicts if we create top-level pages for all the numpy functions (some also exist in scipy, for example). I have created a NumpyDocstrings category on the wiki, and suggest that we organise the functions underneath it according to their numpy subpackage, e.g. scipy.org/NumpyDocstrings/core/sort If you need to know where a function belongs, use IPython's "?" to inspect it: In [4]: np.core.sort? [...] File: /Users/stefan/lib/python2.5/site-packages/numpy/core/fromnumeric.py [...] For these pages to be truly useful, we should re-absorb them into the NumPy docstrings. This would be difficult to do using Moin markup, so let's use ReST throughout. The suggested procedure is therefore: 1. Create NumpyDocstrings/subpackage/funcname 2. Start out the page with the following template: {{{ #!rst }}} ---- NumpyDocstrings 3. Copy the current docstring into the page (inside the rst section). 4. Update the docstring, using the format suggested in http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines >From these pages, we can then automatically generate patches to the NumPy source. We also have a NumPy Examples List on the wiki. Many of these should be incorporated into the docstrings as examples. Using IPython, switch into doctest_mode: In [3]: %doctest_mode *** Pasting of code with ">>>" or "..." has been enabled. Exception reporting mode: Plain Doctest mode is: ON >>> Here you can generate examples for use in the "Examples" section, while still having access to the enhanced capabilities of IPython. These guidelines should provide us with a system which preserves but enhances the current doctests, with the possibility of re-integrating community contributions back into the source tree. Thanks again for your help. Regards St?fan From stefan at sun.ac.za Fri Mar 21 06:43:00 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 21 Mar 2008 11:43:00 +0100 Subject: [Numpy-discussion] NumPy 1.0.5 almost ready In-Reply-To: <1206044870.6726.6.camel@localhost.localdomain> References: <1206043167.6726.3.camel@localhost.localdomain> <1206044870.6726.6.camel@localhost.localdomain> Message-ID: <9457e7c80803210343j321609cap945e4a402739d026@mail.gmail.com> Thank you, Pauli. Tested and applied in r4899. Regards St?fan On Thu, Mar 20, 2008 at 9:27 PM, Pauli Virtanen wrote: > > to, 2008-03-20 kello 21:09 +0100, Matthieu Brucher kirjoitti: > > > Well, it is not completely solved. With the patch, the reference count > > keeps on raising, but as it is for Python scalars, it is not a > > problem, but the underlying problem in Py_DECREF will show up > > eventually and it will need to be solved. But I'm afraid I'm not > > intimate enough with the mecanisms of Numpys arrays to solve it. > > I wrote a second patch that I think fixes the problem, and it seems to > work at least for the testcases I tried. > > -- > Pauli Virtanen From seb.haase at gmx.net Fri Mar 21 07:09:21 2008 From: seb.haase at gmx.net (Sebastian Haase) Date: Fri, 21 Mar 2008 12:09:21 +0100 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: Message-ID: I think the bug was referring to the fact that some types have duplicate names *explicitly* containing the letter "c" -- as in >>> repr(N.intc) '' Is this supposed to be consistent naming scheme (i.e. 
any C type "" is accessible as "N.c") ? Then c float-type should consequently be named N.floatc . Otherwise please elaborate why some names like "intc" exist, and which exactly those are !? (BTW, the name "N.single" is there to make FORTRAN people feel more comfortable - right ?) Thanks, -Sebastian On Wed, Mar 19, 2008 at 5:16 PM, Matthieu Brucher wrote: > For the not blocker bugs, I think that #420 should be closed : float32 is > the the C float type, isn't it ? > > Matthieu > > 2008/3/13, Jarrod Millman : > > > Hello, > > > > I am sure that everyone has noticed that 1.0.5 hasn't been released > > yet. The main issue is that when I was getting ready to tag the > > release I noticed that the buildbot had a few failing tests: > > http://buildbot.scipy.org/waterfall?show_events=false > > > > Stefan van der Walt added tickets for the failures: > > http://projects.scipy.org/scipy/numpy/ticket/683 > > http://projects.scipy.org/scipy/numpy/ticket/684 > > http://projects.scipy.org/scipy/numpy/ticket/686 > > And Chuck Harris fixed ticket #683 with in minutes (thanks!). The > > others are still open. > > > > Stefan and I also triaged the remaining tickets--closing several and > > turning others in to release blockers: > > > http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority > > > > I think that it is especially important that we spend some time trying > > to make the 1.0.5 release rock solid. There are several important > > changes in the trunk so I really hope we can get these tickets > > resolved ASAP. I need everyone's help getting this release out. If > > you can help work on any of the open release blockers, please try to > > close them over the weekend. If you have any ideas about the tickets > > but aren't exactly sure how to resolve them please post a message to > > the list or add a comment to the ticket. > > > > I will be traveling over the weekend, so I may be off-line until Monday. > > > > Thanks, > > > > -- > > Jarrod Millman > > Computational Infrastructure for Research Labs > > 10 Giannini Hall, UC Berkeley > > phone: 510.643.4014 > > http://cirl.berkeley.edu/ > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > French PhD student > Website : http://matthieu-brucher.developpez.com/ > Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn : http://www.linkedin.com/in/matthieubrucher > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From haase at msg.ucsf.edu Fri Mar 21 07:29:29 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri, 21 Mar 2008 12:29:29 +0100 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: Message-ID: I think the bug was referring to the fact that some types have duplicate names *explicitly* containing the letter "c" -- as in >>> repr(N.intc) '' Is this supposed to be consistent naming scheme (i.e. any C type "" is accessible as "N.c") ? Then c float-type should consequently be named N.floatc . Futhermore, sometimes the "c" appears first, as in: >>> N.clongfloat >>> N.clongdouble ... and what does the "p" stand for in >>> N.intp Otherwise please elaborate why some names like "intc" exist, and which exactly those are !? 
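For anyone trying to keep the aliases straight, a quick interactive check of what the names map to can help; intp and longdouble are platform-dependent, so no output is shown here:

import numpy as N

# trailing "c" = "same as the C type of that name", leading "c" = complex
# counterpart, "p" = pointer-sized integer
for name in ("intc", "intp", "single", "csingle", "longdouble", "clongdouble"):
    print("%-12s -> %s" % (name, N.dtype(getattr(N, name))))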
(BTW, the name "N.single" is there to make FORTRAN people feel more comfortable - right ?) Thanks, -Sebastian > > > On Wed, Mar 19, 2008 at 5:16 PM, Matthieu Brucher > wrote: > > For the not blocker bugs, I think that #420 should be closed : float32 is > > the the C float type, isn't it ? > > > > Matthieu > > > > 2008/3/13, Jarrod Millman : > > > > > Hello, > > > > > > I am sure that everyone has noticed that 1.0.5 hasn't been released > > > yet. The main issue is that when I was getting ready to tag the > > > release I noticed that the buildbot had a few failing tests: > > > http://buildbot.scipy.org/waterfall?show_events=false > > > > > > Stefan van der Walt added tickets for the failures: > > > http://projects.scipy.org/scipy/numpy/ticket/683 > > > http://projects.scipy.org/scipy/numpy/ticket/684 > > > http://projects.scipy.org/scipy/numpy/ticket/686 > > > And Chuck Harris fixed ticket #683 with in minutes (thanks!). The > > > others are still open. > > > > > > Stefan and I also triaged the remaining tickets--closing several and > > > turning others in to release blockers: > > > > > http://scipy.org/scipy/numpy/query?status=new&severity=blocker&milestone=1.0.5&order=priority > > > > > > I think that it is especially important that we spend some time trying > > > to make the 1.0.5 release rock solid. There are several important > > > changes in the trunk so I really hope we can get these tickets > > > resolved ASAP. I need everyone's help getting this release out. If > > > you can help work on any of the open release blockers, please try to > > > close them over the weekend. If you have any ideas about the tickets > > > but aren't exactly sure how to resolve them please post a message to > > > the list or add a comment to the ticket. > > > > > > I will be traveling over the weekend, so I may be off-line until Monday. > > > > > > Thanks, > > > > > > -- > > > Jarrod Millman > > > Computational Infrastructure for Research Labs > > > 10 Giannini Hall, UC Berkeley > > > phone: 510.643.4014 > > > http://cirl.berkeley.edu/ > > > _______________________________________________ > > > Numpy-discussion mailing list > > > Numpy-discussion at scipy.org > > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > -- > > French PhD student > > Website : http://matthieu-brucher.developpez.com/ > > Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > > LinkedIn : http://www.linkedin.com/in/matthieubrucher > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > From strang at nmr.mgh.harvard.edu Fri Mar 21 07:53:21 2008 From: strang at nmr.mgh.harvard.edu (Gary Strangman) Date: Fri, 21 Mar 2008 07:53:21 -0400 (EDT) Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: > 4. Update the docstring, using the format suggested in > > http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines I realize this is a bit of a johnny-come-lately comment, but I was surprised to see that the list of sections does not seem to include the single most common reason I usually try to access a doc string ... the function signature. 
IMO, this item would ideally be the last item in a docstring so that one could quickly figure out which parameter belongs in which position, which are keywords, and what the defaults are without scrolling up multiple pages or having to mentally assemble this from a vertical list of parameters and optional parameters. Was this omission deliberate or an oversight? And more importantly, what do people think of adding it to the guidelines? Gary From Joris.DeRidder at ster.kuleuven.be Fri Mar 21 08:10:26 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Fri, 21 Mar 2008 13:10:26 +0100 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: Message-ID: <36FF9D66-973C-4C7E-9996-84BD49351EE3@ster.kuleuven.be> On 21 Mar 2008, at 12:29, Sebastian Haase wrote: > ... and what does the "p" stand for in >>>> N.intp > It stands for "pointer". An intp is an integer large enough to contain a pointer address. J. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From haase at msg.ucsf.edu Fri Mar 21 08:54:59 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri, 21 Mar 2008 13:54:59 +0100 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: read relow... On Fri, Mar 21, 2008 at 11:21 AM, St?fan van der Walt wrote: > Hi Dieter > > On Fri, Mar 21, 2008 at 9:55 AM, wrote: > > I want to know if creating individual documentation for each numpy > > routine on the scipy.org wiki would, for some administrative reason > > (or other) be frowned upon. Here is an example of what I'd like to do > > for all of numpy's routines. http://www.scipy.org/sort. > > Thank you very much for contributing to NumPy. Your timing is > perfect, today being our third doc-day -- I hope others join us as > well at #scipy on freenode.net, as we improve the documentation > coverage. In a discussion with Fernando and Gael, we've come up with > some suggestions. > > The wiki is a great place for users to add documentation, since it > doesn't require special permissions, but we shall run into naming > conflicts if we create top-level pages for all the numpy functions > (some also exist in scipy, for example). I have created a > NumpyDocstrings category on the wiki, and suggest that we organise the > functions underneath it according to their numpy subpackage, e.g. > > scipy.org/NumpyDocstrings/core/sort > > If you need to know where a function belongs, use IPython's "?" to inspect it: > > In [4]: np.core.sort? > [...] > File: > /Users/stefan/lib/python2.5/site-packages/numpy/core/fromnumeric.py > [...] > Comment: I have read the module- or directory-name "core" many times on this list, however: Who really knows where a given functions belongs ? Isn't that mostly only the numpy svn commiters ? In other words, using only the python side of numpy, someone (like myself) would NOT know that sort is inside "core" ! Also: since >>> import numpy as N; N.sort refers already to that same sort: >>> N.core.sort >>> N.sort I would prefer not to require "core" sub-sub-page. Instead, every name that is accessible as N. should be documented without extra sub-page. My 2 cents. Thanks, Sebastian > For these pages to be truly useful, we should re-absorb them into the > NumPy docstrings. This would be difficult to do using Moin markup, so > let's use ReST throughout. 
The suggested procedure is therefore: > > 1. Create NumpyDocstrings/subpackage/funcname > 2. Start out the page with the following template: > > {{{ > #!rst > > }}} > ---- > NumpyDocstrings > > 3. Copy the current docstring into the page (inside the rst section). > 4. Update the docstring, using the format suggested in > > http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines > > >From these pages, we can then automatically generate patches to the > NumPy source. > > We also have a NumPy Examples List on the wiki. Many of these should > be incorporated into the docstrings as examples. Using IPython, > switch into doctest_mode: > > In [3]: %doctest_mode > *** Pasting of code with ">>>" or "..." has been enabled. > Exception reporting mode: Plain > Doctest mode is: ON > > >>> > > Here you can generate examples for use in the "Examples" section, > while still having access to the enhanced capabilities of IPython. > > These guidelines should provide us with a system which preserves but > enhances the current doctests, with the possibility of re-integrating > community contributions back into the source tree. > > Thanks again for your help. > > Regards > St?fan From vel.accel at gmail.com Fri Mar 21 09:09:47 2008 From: vel.accel at gmail.com (dieter h) Date: Fri, 21 Mar 2008 09:09:47 -0400 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: <1e52e0880803210609t15c89697lc6976768f47e3089@mail.gmail.com> On Fri, Mar 21, 2008 at 8:54 AM, Sebastian Haase wrote: > read relow... > > NumpyDocstrings category on the wiki, and suggest that we organise the > > functions underneath it according to their numpy subpackage, e.g. > > > > scipy.org/NumpyDocstrings/core/sort > > > > If you need to know where a function belongs, use IPython's "?" to inspect it: > > > > In [4]: np.core.sort? > > [...] > > File: > > /Users/stefan/lib/python2.5/site-packages/numpy/core/fromnumeric.py > > [...] > > > > Comment: I have read the module- or directory-name "core" many times > on this list, however: Who really knows where a given functions > belongs ? Isn't that mostly only the numpy svn commiters ? > In other words, using only the python side of numpy, someone (like > myself) would NOT know that sort is inside "core" ! > > Also: since >>> import numpy as N; N.sort refers already to that same sort: > >>> N.core.sort > > >>> N.sort > > > I would prefer not to require "core" sub-sub-page. > Instead, every name that is accessible as N. should be > documented without extra sub-page. > > My 2 cents. > Thanks, > Sebastian Thats just a for placement. We can create all sorts of direct indexes, categories and cross-references, etc... -dieter _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From peridot.faceted at gmail.com Fri Mar 21 09:47:51 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 21 Mar 2008 09:47:51 -0400 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: On 21/03/2008, Sebastian Haase wrote: > Comment: I have read the module- or directory-name "core" many times > on this list, however: Who really knows where a given functions > belongs ? 
Isn't that mostly only the numpy svn commiters ? > In other words, using only the python side of numpy, someone (like > myself) would NOT know that sort is inside "core" ! > > Also: since >>> import numpy as N; N.sort refers already to that same sort: > >>> N.core.sort > > >>> N.sort > > > I would prefer not to require "core" sub-sub-page. > Instead, every name that is accessible as N. should be > documented without extra sub-page. I don't have a solution, but I would like to complain about numpy's flat namespace. Perhaps we're stuck with it now, but it's very difficult to find the right function. In scipy, I can find the right numerical integration by importsing scipy.integrate and using tab completion, But in numpy, everything is loaded into the base namespace, so tab completion gets me an overwhelming 502 possibilities. That's why there's a "numpy functions by category" but no "scipy functions by category" - scipy functions are already by category. Is it perhaps possible to make all numpy functions accessible in submodules (in addition to in numpy, for backwards compatibility) and then promote accessing them that way? Are they already? If so how do I find out what the submodules are? Thanks, Anne From aisaac at american.edu Fri Mar 21 10:23:30 2008 From: aisaac at american.edu (Alan G Isaac) Date: Fri, 21 Mar 2008 10:23:30 -0400 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> References: <1206074150.8490.28.camel@bbc8><710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> Message-ID: On Fri, 21 Mar 2008, Nadav Horesh apparently wrote: > I would like to see a unification of matrices and arrays. > I often do calculation which involve both array processing > and linear algebra, and the current solution of having > function like dot and inv is not aesthetic. Switching > between array and matrix types (or using .A attribute of > a matrix) is not convinient either. Use ``asmatrix``. (Does not copy.) After that the only needed "unification" I have encountered is that iteration over a matrix should return arrays (not matrices). I believe this is under consideration for 1.1. Cheers, Alan Isaac From stefan at sun.ac.za Fri Mar 21 11:26:28 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 21 Mar 2008 16:26:28 +0100 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: <9457e7c80803210826t245bb9a1x56d2429adb5d3bdd@mail.gmail.com> On Fri, Mar 21, 2008 at 1:54 PM, Sebastian Haase wrote: > read relow... > > On Fri, Mar 21, 2008 at 11:21 AM, St?fan van der Walt wrote: > > Hi Dieter > > > > On Fri, Mar 21, 2008 at 9:55 AM, wrote: > > > I want to know if creating individual documentation for each numpy > > > routine on the scipy.org wiki would, for some administrative reason > > > (or other) be frowned upon. Here is an example of what I'd like to do > > > for all of numpy's routines. http://www.scipy.org/sort. > > > > Thank you very much for contributing to NumPy. Your timing is > > perfect, today being our third doc-day -- I hope others join us as > > well at #scipy on freenode.net, as we improve the documentation > > coverage. In a discussion with Fernando and Gael, we've come up with > > some suggestions. 
> > > > The wiki is a great place for users to add documentation, since it > > doesn't require special permissions, but we shall run into naming > > conflicts if we create top-level pages for all the numpy functions > > (some also exist in scipy, for example). I have created a > > NumpyDocstrings category on the wiki, and suggest that we organise the > > functions underneath it according to their numpy subpackage, e.g. > > > > scipy.org/NumpyDocstrings/core/sort > > > > If you need to know where a function belongs, use IPython's "?" to inspect it: > > > > In [4]: np.core.sort? > > [...] > > File: > > /Users/stefan/lib/python2.5/site-packages/numpy/core/fromnumeric.py > > [...] > > > > Comment: I have read the module- or directory-name "core" many times > on this list, however: Who really knows where a given functions > belongs ? Isn't that mostly only the numpy svn commiters ? > In other words, using only the python side of numpy, someone (like > myself) would NOT know that sort is inside "core" ! The idea is to merge the docstrings back into the source, so that you can simply do numpy.sort? in IPython and see the latest updated version. For that purpose, you don't need to know where the sort method is located. We do, however, need to know in order to have some sane organisation of the documentation on the wiki. >From a user's perspective, other alternatives include doing a wiki search, or following my earlier advice and using "?" in IPython to see where the function is located. Regards St?fan From stefan at sun.ac.za Fri Mar 21 11:27:52 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 21 Mar 2008 16:27:52 +0100 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: <9457e7c80803210827q14ef5549h5e3116d45b12ab5c@mail.gmail.com> Hi Gary On Fri, Mar 21, 2008 at 12:53 PM, Gary Strangman wrote: > > > 4. Update the docstring, using the format suggested in > > > > http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines > > I realize this is a bit of a johnny-come-lately comment, but I was > surprised to see that the list of sections does not seem to include the > single most common reason I usually try to access a doc string ... the > function signature. IMO, this item would ideally be the last item in a > docstring so that one could quickly figure out which parameter belongs in > which position, which are keywords, and what the defaults are without > scrolling up multiple pages or having to mentally assemble this from a > vertical list of parameters and optional parameters. > > Was this omission deliberate or an oversight? And more importantly, what > do people think of adding it to the guidelines? No, this is not an oversight but a way to avoid duplicating the same information. In IPython, use the "?" to view the docstring, and the first thing you'll see is the function signature. For C functions we do include the signature, since it isn't shown. 
Regards St?fan From stefan at sun.ac.za Fri Mar 21 11:30:44 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 21 Mar 2008 16:30:44 +0100 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: <9457e7c80803210830i65b70247uecf9c42866a85539@mail.gmail.com> On Fri, Mar 21, 2008 at 2:47 PM, Anne Archibald wrote: > On 21/03/2008, Sebastian Haase wrote: > > > Comment: I have read the module- or directory-name "core" many times > > on this list, however: Who really knows where a given functions > > belongs ? Isn't that mostly only the numpy svn commiters ? > > In other words, using only the python side of numpy, someone (like > > myself) would NOT know that sort is inside "core" ! > > > > Also: since >>> import numpy as N; N.sort refers already to that same sort: > > >>> N.core.sort > > > > >>> N.sort > > > > > > I would prefer not to require "core" sub-sub-page. > > Instead, every name that is accessible as N. should be > > documented without extra sub-page. > > I don't have a solution, but I would like to complain about numpy's > flat namespace. Perhaps we're stuck with it now, but it's very > difficult to find the right function. In scipy, I can find the right > numerical integration by importsing scipy.integrate and using tab > completion, But in numpy, everything is loaded into the base > namespace, so tab completion gets me an overwhelming 502 > possibilities. That's why there's a "numpy functions by category" but > no "scipy functions by category" - scipy functions are already by > category. > > Is it perhaps possible to make all numpy functions accessible in > submodules (in addition to in numpy, for backwards compatibility) and > then promote accessing them that way? Are they already? If so how do I > find out what the submodules are? We should definately discuss and consider this proposal for 1.1. Do you have a suggested organisation in mind? Regards St?fan From stefan at sun.ac.za Fri Mar 21 11:35:09 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 21 Mar 2008 16:35:09 +0100 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> Message-ID: <9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com> On Fri, Mar 21, 2008 at 3:23 PM, Alan G Isaac wrote: > On Fri, 21 Mar 2008, Nadav Horesh apparently wrote: > > I would like to see a unification of matrices and arrays. > > I often do calculation which involve both array processing > > and linear algebra, and the current solution of having > > function like dot and inv is not aesthetic. Switching > > between array and matrix types (or using .A attribute of > > a matrix) is not convinient either. > > > Use ``asmatrix``. (Does not copy.) > > After that the only needed "unification" I have > encountered is that iteration over a matrix should > return arrays (not matrices). I believe this is > under consideration for 1.1. The last I remember, we considered adding RowVector, ColumnVector and letting slices out of a matrix either be one of those or a matrix itself. I simply don't see a Matrix as a container of ndarrays (that's what ndarrays are for, right?). 
Regards St?fan From peridot.faceted at gmail.com Fri Mar 21 12:00:29 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 21 Mar 2008 12:00:29 -0400 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: <9457e7c80803210830i65b70247uecf9c42866a85539@mail.gmail.com> References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> <9457e7c80803210830i65b70247uecf9c42866a85539@mail.gmail.com> Message-ID: On 21/03/2008, St?fan van der Walt wrote: > On Fri, Mar 21, 2008 at 2:47 PM, Anne Archibald > wrote: > > Is it perhaps possible to make all numpy functions accessible in > > submodules (in addition to in numpy, for backwards compatibility) and > > then promote accessing them that way? Are they already? If so how do I > > find out what the submodules are? > > We should definately discuss and consider this proposal for 1.1. Do > you have a suggested organisation in mind? Not exactly. What do people think of the way I organized the numpy functions by category page? Apart from the sore-thumb "other" category, it does seem like the kind of grouping we might hope for. Anne From xavier.gnata at gmail.com Fri Mar 21 12:04:17 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Fri, 21 Mar 2008 17:04:17 +0100 Subject: [Numpy-discussion] numpy's future (1.1 and beyond): which direction(s) ? In-Reply-To: <1206074150.8490.28.camel@bbc8> References: <1206074150.8490.28.camel@bbc8> Message-ID: <47E3DC81.6080003@gmail.com> David Cournapeau wrote: > Hi, > > numpy 1.0.5 is on the way, and I was wondering about numpy's future. I > myself have some ideas about what could be done; has there been any > discussion behind what is on 1.1 trac's roadmap ? Some of the things I > would like to see myself: > - a framework for plug-in architecture, that is the ability for numpy > to load/unload some libraries at runtime, plus a common api to access > the functions. Example: instead of calling directly atlas/etc..., it > would load the dll at runtime, so that other libraries can be loaded > instead (numpy itself could load different runtimes depending on the > CPU, for example: SSE vs SSE2 vs SSE3, multi-thread vs non > multi-thread). That would require the ability to build loadable > libraries (numscons, or a new numpy.distutils command). > - a pure C core library for some common operations. For example, I > myself would really like to be able to use the fft in some C extensions. > Numpy has a fft, but I cannot access it from C (well, I could access the > python fft from C, but that would be... awkward); same for blas/lapack. > I really like the idea of a numpy "split" into a core C library reusable > by many C extensions, and python wrappers (in C, cython, ctypes, > whatever). That would be a huge work, of course, but hopefully can be > done gradually and smoothly. Only having fft + some basic blas/lapack > (dot, inv, det, etc...) and some basic functions (beta, gamma, digamma) > would be great, for example. > - a highly optimized core library for memory copy, simple addition, > etc... basically, everything which can see huge improvements when using > MMX/SSE and co. This is somewhat linked to point 1. This would also > require more sophisticated memory allocator (aligned, etc...). > > What do people think about this ? Is that a direction numpy developers > are interested in ? > > cheers, > > David > > Looks great :) Something like http://idlastro.gsfc.nasa.gov/idl_html_help/TOTAL.html (Thread Pool Keywords) would be nice. 
A "total like" function could be a great pathfinder a put threads into numpy keeping the things as simple as they should remain. Not sure we need that is numpy in 1.1 but IMHO we need that in a near future (because every "array oriented" libs are now threaded). Xavier From stefan at sun.ac.za Fri Mar 21 12:32:21 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 21 Mar 2008 17:32:21 +0100 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> <9457e7c80803210830i65b70247uecf9c42866a85539@mail.gmail.com> Message-ID: <9457e7c80803210932p4f47774bw5d2fcb3437fa2589@mail.gmail.com> On Fri, Mar 21, 2008 at 5:00 PM, Anne Archibald wrote: > On 21/03/2008, St?fan van der Walt wrote: > > On Fri, Mar 21, 2008 at 2:47 PM, Anne Archibald > > wrote: > > > > Is it perhaps possible to make all numpy functions accessible in > > > submodules (in addition to in numpy, for backwards compatibility) and > > > then promote accessing them that way? Are they already? If so how do I > > > find out what the submodules are? > > > > We should definately discuss and consider this proposal for 1.1. Do > > you have a suggested organisation in mind? > > Not exactly. What do people think of the way I organized the numpy > functions by category page? Apart from the sore-thumb "other" > category, it does seem like the kind of grouping we might hope for. I can see categories 1 through 4 being one submodule, and the rest as they are. St?fan From chris at simplistix.co.uk Fri Mar 21 12:52:45 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 21 Mar 2008 16:52:45 +0000 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <200803202024.01586.pgmdevlist@gmail.com> References: <47E0F2AC.7040200@simplistix.co.uk> <47E2EDF8.30702@simplistix.co.uk> <200803202024.01586.pgmdevlist@gmail.com> Message-ID: <47E3E7DD.7090405@simplistix.co.uk> Pierre GM wrote: >> This sucks to the point of feeling like a bug :-( > > It is not. Ignoring the fill value of masked array feels like a bug to me... >> Why is it desirable for it to behave like this? > > Because that way, you can compare anything to masked and see whether a value > is masked or not. Anyway, in your case, it's just mean your value is masked. > You don't care about the filling_value for this one. Where I cared was when trying to do a filled line plot in matplotlib and the nans, rather than being omitted, were being shown on the y-axis at 999999, totally wrecking the plot. I'll buy your argument *iff* the masked arrays used the fill value from the parent ma. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri Mar 21 12:55:11 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 21 Mar 2008 16:55:11 +0000 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <200803201017.20396.pgmdevlist@gmail.com> References: <47E0F2AC.7040200@simplistix.co.uk> <47E0F904.9070203@simplistix.co.uk> <200803201017.20396.pgmdevlist@gmail.com> Message-ID: <47E3E86F.5010401@simplistix.co.uk> Pierre GM wrote: > On Wednesday 19 March 2008 19:47:37 Matt Knox wrote: >>> 1. why am I not getting my NaN's back? > > Because they're gone when you create your masked array. Really? At least one other post has disagreed with that. 
And it does seem odd that a value, even if it's a nan, would be destroyed... > The idea here is to > get rid of the nan in your data No, it's to mask them, otherwise I would have used a normal array, not a ma. > to avoid potential problems while keeping > track of where the nans were in the first place. ...like plotting them on a graph, which the current behaviour makes unworkable, that you end up doing a myarray.filled(0) to get around it, with imperfect results. > So, the .data part of your > masked array should be nan-free, Why? Surely that should be the source data, of which nan is a valid part? > and the mask tells you where the nans were. Right, but why when the masked array is cast back to a list of numbers if the fill_value of the ma not respected? >>> 2. why is the wrong fill value being used here? >> the second element in the array iteration here is actually the >> numpy.ma.masked constant, which always has the same fill value... ...and that's a bug. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From jh at physics.ucf.edu Fri Mar 21 13:22:40 2008 From: jh at physics.ucf.edu (Joe Harrington) Date: Fri, 21 Mar 2008 13:22:40 -0400 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: (numpy-discussion-request@scipy.org) References: Message-ID: > Is it perhaps possible to make all numpy functions accessible in > submodules (in addition to in numpy, for backwards compatibility) and > then promote accessing them that way? I would caution on breaking functionality out into too many categories. It is *very* cumbersome to constantly import little groups of functions to get anything done, and it presents a particular learning-curve hurdle for students. Unless you load into the top-level namespace, which I discourage, it also gets cumbersome to be typing N.X.Y.Z.sort(), because it means you have to memorize that sort() is in N.X.Y.Z but sinc() is in N.A.B.C. Your thinking also gets removed from the verb you are familiar with. The fewer useless adornments code has, the better. It's also hard to find stuff if it's not loaded, and when you get to subgroups of subgroups, there is no easy way even to know that something exists. For example, if you have scipy.x.y, and you're not an advanced enough user to know that y is related to x (say, filtering as a subcategory of fft), you hunt around for an hour and then say, sheesh, scipy can't even filter, when of course it can, you just didn't think to load the fft package. (I know this isn't the structure for these specific topics, but you get the idea). You can waste hours this way, especially if you find it embarrassing to ask for help, which many do. What you have brought up is really a documentation problem: how do I find the name of the routine I want? Languages like IDL have documentation search capabilities that we don't yet have. They also have indexes of related routines in both printed form and online. We need these, they're not too hard to do, and if plans work out, they'll be done as part of a project I'm putting together for the summer (stay tuned for an announcement and request for help in the coming weeks). There is value in python's namespace capability, but I find that it's more in the vein of allowing separate groups to develop functions with sensible names and not worry about a conflict. When people make numerous tiny categories, it becomes a memorization and extra typing exercise, and it steepens the learning curve substantially. 
We can and will break these functions out into small categories, but it's much better to do it in lists that you'll be able to call up rather than in the structure of the language. --jh-- From aisaac at american.edu Fri Mar 21 14:11:38 2008 From: aisaac at american.edu (Alan G Isaac) Date: Fri, 21 Mar 2008 14:11:38 -0400 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com> References: <1206074150.8490.28.camel@bbc8><710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il><9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com> Message-ID: On Fri, 21 Mar 2008, St?fan van der Walt apparently wrote: > The last I remember, we considered adding RowVector, > ColumnVector and letting slices out of a matrix either be > one of those or a matrix itself. There was a subsequent discussion. > I simply don't see a Matrix as a container of ndarrays That is hardly an argument. Remember, any indexing that when applied to an 2d array would produce a 2d array will when applied to a matrix still produce a matrix. This is really just principle of least surprise. Or, if you want a complementary way of looking at it, it is keeping as much of the natural behavior of the ndarray as possible while adding convenient submatrices, matrix multiplication, and inverse. Cheers, Alan Isaac PS Are you a *user* of matrices? From aswadgurjar at gmail.com Fri Mar 21 14:32:29 2008 From: aswadgurjar at gmail.com (Aswad Gurjar) Date: Sat, 22 Mar 2008 00:02:29 +0530 Subject: [Numpy-discussion] Import numeric error Message-ID: <86d768f00803211132v78038712wa523b92aa4c7eeb1@mail.gmail.com> Hello, I have installed numpy-1.0.4.win32-py2.5 on windows machine for python 2.5.1.But when I enter command >>import Numeric I get following error: Traceback (most recent call last): File "", line 1, in import Numeric ImportError: No module named Numeric Can anybody please help me to remove this error? Thank You. Aswad -------------- next part -------------- An HTML attachment was scrubbed... URL: From vel.accel at gmail.com Fri Mar 21 14:35:50 2008 From: vel.accel at gmail.com (dieter h) Date: Fri, 21 Mar 2008 14:35:50 -0400 Subject: [Numpy-discussion] Import numeric error In-Reply-To: <86d768f00803211132v78038712wa523b92aa4c7eeb1@mail.gmail.com> References: <86d768f00803211132v78038712wa523b92aa4c7eeb1@mail.gmail.com> Message-ID: <1e52e0880803211135w2c108776o486cf435c0cc477f@mail.gmail.com> On Fri, Mar 21, 2008 at 2:32 PM, Aswad Gurjar wrote: > Hello, > > I have installed numpy-1.0.4.win32-py2.5 on windows machine for python > 2.5.1.But when I enter command > >>import Numeric > I get following error: > Traceback (most recent call last): > File "", line 1, in > import Numeric > ImportError: No module named Numeric > > Can anybody please help me to remove this error? > Thank You. > > Aswad > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > import numpy From nadavh at visionsense.com Fri Mar 21 15:00:25 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Fri, 21 Mar 2008 21:00:25 +0200 Subject: [Numpy-discussion] matrices in 1.1 References: <1206074150.8490.28.camel@bbc8><710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> Message-ID: <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> But asmatrix returns a matrix object and any subsequent operation of it returns a matrix. 
What I am thinking about is a convenient way to apply matrix operation on an array. BTW, A matrix is just a rank 2 tensor, so matrix (tensor) algebra can be applied to an arbitrary rank tensor, for example APL's . operator. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Alan G Isaac ????: ? 21-???-08 16:23 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] matrices in 1.1 On Fri, 21 Mar 2008, Nadav Horesh apparently wrote: > I would like to see a unification of matrices and arrays. > I often do calculation which involve both array processing > and linear algebra, and the current solution of having > function like dot and inv is not aesthetic. Switching > between array and matrix types (or using .A attribute of > a matrix) is not convinient either. Use ``asmatrix``. (Does not copy.) After that the only needed "unification" I have encountered is that iteration over a matrix should return arrays (not matrices). I believe this is under consideration for 1.1. Cheers, Alan Isaac _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3411 bytes Desc: not available URL: From mattknox_ca at hotmail.com Fri Mar 21 15:08:29 2008 From: mattknox_ca at hotmail.com (Matt Knox) Date: Fri, 21 Mar 2008 19:08:29 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?bug_with_with_fill=5Fvalues_in_maske?= =?utf-8?q?d_arrays=3F?= References: <47E0F2AC.7040200@simplistix.co.uk> <47E0F904.9070203@simplistix.co.uk> <200803201017.20396.pgmdevlist@gmail.com> <47E3E86F.5010401@simplistix.co.uk> Message-ID: Chris, The behaviour you are seeing is intentional. Pierre is correct in asserting that it is not a bug. Now, you may disagree with the behaviour, but the behaviour is by design and is not a bug. Perhaps you are misunderstanding how to use masked arrays, which is understandable because the documentation is currently sparse. Take a look at the following example (using the latest svn version of numpy and matplotlib 0.91.2). ###################################################### import numpy as np from numpy import ma import pylab data = [1., 2., 3., np.nan, 5., 6.] mask = [0, 0, 0, 1, 0, 0] marr = ma.array(data, mask=mask) marr.set_fill_value(55) print marr.data print marr.mask print marr[0] is ma.masked # False print marr[3] # ma.masked constant print marr.mask[3] # True print marr.data[3] # is a nan value with svn numpy, not sure about 1.0.4 print marr[3] is ma.masked # True print marr.data[3] is ma.masked # False filled_arr = marr.filled() print filled_arr # nan value is replaced with fill value of 55 pylab.plot(marr) # masked value shows up as a gap in the plot pylab.show() ###################################################### All of the behaviour outlined above is (as far as I know) by design, and makes sense to me at least. If you disagree with some of the above behaviour, then I'm sure people would be happy to hear your opinion, but it is incorrect to flatly call this a bug. 
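One more minimal sketch of the same point, for the archives: element access on a masked position hands back the shared ma.masked constant, which carries no information about the parent array's fill_value, so iterating element-by-element will never show your fill value; filled() is the call that applies it (the -99 below is an arbitrary choice):

import numpy as np
from numpy import ma

marr = ma.array([1., 2., 3.], mask=[0, 1, 0])
marr.set_fill_value(-99.)
print list(marr)        # the masked slot is the ma.masked singleton, not -99
print marr.filled()     # parent fill_value applied in the masked slot
print marr.filled(0.)   # or any explicit value, e.g. before handing data to a plot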
- Matt From pav at iki.fi Fri Mar 21 09:03:10 2008 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 21 Mar 2008 15:03:10 +0200 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> Message-ID: <1206104591.8066.4.camel@localhost.localdomain> pe, 2008-03-21 kello 07:53 -0400, Gary Strangman kirjoitti: > > 4. Update the docstring, using the format suggested in > > > > http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines > > I realize this is a bit of a johnny-come-lately comment, but I was > surprised to see that the list of sections does not seem to include the > single most common reason I usually try to access a doc string ... the > function signature. The function signature is automatically determined and shown for Python functions, both by help() and IPython ? or most tools that do generate docs from docstrings, adding it also to the docstring is extraneous. For functions implemented in C in extension modules, help() cannot find the signature automatically. However, the CodingStyleGuidelines does say that in this case including the function signature to the documentation is mandatory. -- Pauli Virtanen From strang at nmr.mgh.harvard.edu Fri Mar 21 15:37:31 2008 From: strang at nmr.mgh.harvard.edu (Gary Strangman) Date: Fri, 21 Mar 2008 15:37:31 -0400 (EDT) Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: <1206104591.8066.4.camel@localhost.localdomain> References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> <1206104591.8066.4.camel@localhost.localdomain> Message-ID: >>> http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines >> >> I realize this is a bit of a johnny-come-lately comment, but I was >> surprised to see that the list of sections does not seem to include the >> single most common reason I usually try to access a doc string ... the >> function signature. > > The function signature is automatically determined and shown for Python > functions, both by help() and IPython ? or most tools that do generate > docs from docstrings, adding it also to the docstring is extraneous. > > For functions implemented in C in extension modules, help() cannot find > the signature automatically. However, the CodingStyleGuidelines does say > that in this case including the function signature to the documentation > is mandatory. Fair enough. I guess I'm just old-school ... standard python shell and not-infrequently directly-accessing the __doc__ attribute, which does not provide a function signature. Time for an ol' dog to learn new habits ... G From aisaac at american.edu Fri Mar 21 15:57:45 2008 From: aisaac at american.edu (Alan G Isaac) Date: Fri, 21 Mar 2008 15:57:45 -0400 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> References: <1206074150.8490.28.camel@bbc8><710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il><710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> Message-ID: On Fri, 21 Mar 2008, Nadav Horesh apparently wrote: > But asmatrix returns a matrix object and any subsequent > operation of it returns a matrix. What I am thinking about > is a convenient way to apply matrix operation on an array. I suspect what you are really wanting is a way for NumPy to define new operators ... 
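For reference, the asmatrix route suggested earlier in the thread looks like this (just a sketch; the outputs are from a current session):

>>> import numpy as np
>>> a = np.arange(9.).reshape(3, 3)
>>> v = np.arange(3.)
>>> np.dot(a, v)                    # the functional spelling
array([  5.,  14.,  23.])
>>> A = np.asmatrix(a)              # a matrix *view*, no copy of the data
>>> A * np.asmatrix(v).T            # matrix product, column-vector convention
matrix([[  5.],
        [ 14.],
        [ 23.]])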
The only thing short of that that seems to get at what you want is the occasionally surfacing proposal to let both arrays and matrices have attributes ``A`` (asarray) and ``M`` (asmatrix). I do not like this, and I do not think it has gotten favorable reception. Matrices are just meant to be 2d arrays with some conveniences for linear algebra. By the way, I trust you know about the ``A`` attribute for matrices. Anyway, what is a use case where ``asmatrix`` does not get you what you need (i.e., a matrix object view on your array data)? Cheers, Alan Isaac From charlesr.harris at gmail.com Fri Mar 21 16:20:08 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 21 Mar 2008 14:20:08 -0600 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> Message-ID: On Fri, Mar 21, 2008 at 1:57 PM, Alan G Isaac wrote: > On Fri, 21 Mar 2008, Nadav Horesh apparently wrote: > > But asmatrix returns a matrix object and any subsequent > > operation of it returns a matrix. What I am thinking about > > is a convenient way to apply matrix operation on an array. > > I suspect what you are really wanting is a way for NumPy to > define new operators ... > I still kinda like the idea of using the call operator for matrix multiplication, i.e. A(v) := dot(A,v). Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Fri Mar 21 18:58:47 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 21 Mar 2008 18:58:47 -0400 Subject: [Numpy-discussion] numpy's future (1.1 and beyond): which direction(s) ? In-Reply-To: <47E3DC81.6080003@gmail.com> References: <1206074150.8490.28.camel@bbc8> <47E3DC81.6080003@gmail.com> Message-ID: On 21/03/2008, Gnata Xavier wrote: > Something like http://idlastro.gsfc.nasa.gov/idl_html_help/TOTAL.html > (Thread Pool Keywords) would be nice. > A "total like" function could be a great pathfinder a put threads into > numpy keeping the things as simple as they should remain. > Not sure we need that is numpy in 1.1 but IMHO we need that in a near > future (because every "array oriented" libs are now threaded). There was some discussion of this recently. The most direct approach to the problem is to annotate some or all of numpy's inner C loops with OpenMP constructs, then provide some python functions to control the degree of parallelism OpenMP uses. This would transparently provide parallelism for many numpy operations, including sum(), numpy's version of IDL's total(). All that is needed is for someone to implement it. Nobody has stepped forward yet. Anne From oliphant at enthought.com Fri Mar 21 19:12:09 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 21 Mar 2008 18:12:09 -0500 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> Message-ID: <47E440C9.7020603@enthought.com> Charles R Harris wrote: > > > On Fri, Mar 21, 2008 at 1:57 PM, Alan G Isaac > wrote: > > On Fri, 21 Mar 2008, Nadav Horesh apparently wrote: > > But asmatrix returns a matrix object and any subsequent > > operation of it returns a matrix. What I am thinking about > > is a convenient way to apply matrix operation on an array. 
> > I suspect what you are really wanting is a way for NumPy to > define new operators ... > > > I still kinda like the idea of using the call operator for matrix > multiplication, i.e. A(v) := dot(A,v). Interesting idea. I kind of like that too. -Travis From nadavh at visionsense.com Fri Mar 21 19:58:44 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Sat, 22 Mar 2008 01:58:44 +0200 Subject: [Numpy-discussion] matrices in 1.1 References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> <47E440C9.7020603@enthought.com> Message-ID: <710F2847B0018641891D9A21602763600B6F3C@ex3.envision.co.il> One problem with the matrix class is that it follows matlab way too much. For example: >>> a = arange(9).reshape(3,3) >>> A = asmatrix(a) >>> v = arange(3) >>> dot(a, v) array([ 5, 14, 23]) >>> A*v Traceback (most recent call last): File "", line 1, in A*v File "C:\Python25\lib\site-packages\numpy\core\defmatrix.py", line 157, in __mul__ return N.dot(self, asmatrix(other)) ValueError: objects are not aligned I do a lot of colour image processing. Most of the time I treat an image as a MxNx3 array, but some time I have to do matrix/ vector operations like colour-space conversion. In these cases the dot function becomes very handy (better then Matlab matrix multiplication), since I can write: new_image = dot(old_image, A) where A is either a 3x3 matrix or a length 3 vector. The result is that my code is cluttered with a lot of "dot"s, and the matrix class can not help much. It is possible that my case is special and does not justify a special attention, but if many of us do colour/spectral imaging, or other type of high-rank tensors algebra, there could be a case to give numpy an edge. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Travis E. Oliphant ????: ? 22-???-08 01:12 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] matrices in 1.1 Charles R Harris wrote: > > > On Fri, Mar 21, 2008 at 1:57 PM, Alan G Isaac > wrote: > > On Fri, 21 Mar 2008, Nadav Horesh apparently wrote: > > But asmatrix returns a matrix object and any subsequent > > operation of it returns a matrix. What I am thinking about > > is a convenient way to apply matrix operation on an array. > > I suspect what you are really wanting is a way for NumPy to > define new operators ... > > > I still kinda like the idea of using the call operator for matrix > multiplication, i.e. A(v) := dot(A,v). Interesting idea. I kind of like that too. -Travis _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4251 bytes Desc: not available URL: From pgmdevlist at gmail.com Fri Mar 21 19:58:10 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 21 Mar 2008 19:58:10 -0400 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <47E3E7DD.7090405@simplistix.co.uk> References: <47E0F2AC.7040200@simplistix.co.uk> <200803202024.01586.pgmdevlist@gmail.com> <47E3E7DD.7090405@simplistix.co.uk> Message-ID: <200803211958.11341.pgmdevlist@gmail.com> On Friday 21 March 2008 12:52:45 Chris Withers wrote: > Pierre GM wrote: > >> This sucks to the point of feeling like a bug :-( > > > > It is not. 
> > Ignoring the fill value of masked array feels like a bug to me... You're right with masked arrays, but here we're talking the masked singleton, a special value. > Where I cared was when trying to do a filled line plot in matplotlib and > the nans, rather than being omitted, were being shown on the y-axis at > 999999, totally wrecking the plot. You're losing me there. Send a simple example/script so that I can have a better idea of what you're trying to do. > I'll buy your argument *iff* the masked arrays used the fill value from > the parent ma. What parent ma ? From pgmdevlist at gmail.com Fri Mar 21 20:24:40 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 21 Mar 2008 20:24:40 -0400 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <47E3E86F.5010401@simplistix.co.uk> References: <47E0F2AC.7040200@simplistix.co.uk> <200803201017.20396.pgmdevlist@gmail.com> <47E3E86F.5010401@simplistix.co.uk> Message-ID: <200803212024.40630.pgmdevlist@gmail.com> On Friday 21 March 2008 12:55:11 Chris Withers wrote: > Pierre GM wrote: > > On Wednesday 19 March 2008 19:47:37 Matt Knox wrote: > >>> 1. why am I not getting my NaN's back? > > > > Because they're gone when you create your masked array. > > Really? At least one other post has disagreed with that. Well, yeah, my bad, that depends on whether you use masked_invalid or fix_invalid or just build a basic masked array. Example: >>>import numpy as np >>>import numpy.ma as ma >>>x = np.array([1,np.nan,3]) >>># Basic construction >>>y=ma.array(x) masked_array(data = [ 1. NaN 3.], mask = False, fill_value=1e+20) >>>y=ma.masked_invalid(x) masked_array(data = [1.0 -- 3.0], mask = [False True False], fill_value=1e+20) >>>y._data array([ 1., NaN, 3.]) >>>y=ma.fix_invalid(x) masked_array(data = [1.0 -- 3.0], mask = [False True False], fill_value=1e+20) >>>y._data array([ 1.00000000e+00, 1.00000000e+20, 3.00000000e+00]) > And it does seem odd that a value, even if it's a nan, would be > destroyed... Having NaNs in an array usually reduces performance: the option we follow w/ fix_invalid is to clear the masked array of the NaNs, and keeping track of where they were by setting the mask to True at the appropriate location. That way, you don't have the drop of performance of having NaNs in your underlying array. Oh, and NaNs will be transformed to 0 if you use ints... > > The idea here is to > > get rid of the nan in your data > > No, it's to mask them, otherwise I would have used a normal array, not a > ma. Nope, the idea is really is to make things as efficient as possible. Now, you can still have your nans if you're ready to eat them. > > to avoid potential problems while keeping > > track of where the nans were in the first place. > > ...like plotting them on a graph, which the current behaviour makes > unworkable, that you end up doing a myarray.filled(0) to get around it, > with imperfect results. Send an example. I don't seem to have this problem: x = np.arange(10,dtype=np.float) x[5]=np.nan y=ma.masked_invalid(x) plot(x,'ok-') plot(y,'sr-') > Right, but why when the masked array is cast back to a list of numbers > if the fill_value of the ma not respected? Because in your particular case, you're inspecting elements one by one, and then, your masked data becomes the masked singleton which is a special value. That has nothing to do w/ the filling. > >>> 2. why is the wrong fill value being used here? 
> >> > >> the second element in the array iteration here is actually the > >> numpy.ma.masked constant, which always has the same fill value... > > ...and that's a bug. And once again, it's not. numpy.ma.masked is a special value, like numpy.nan or numpy.inf From aisaac at american.edu Fri Mar 21 20:39:30 2008 From: aisaac at american.edu (Alan G Isaac) Date: Fri, 21 Mar 2008 20:39:30 -0400 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <710F2847B0018641891D9A21602763600B6F3C@ex3.envision.co.il> References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> <47E440C9.7020603@enthought.com><710F2847B0018641891D9A21602763600B6F3C@ex3.envision.co.il> Message-ID: On Sat, 22 Mar 2008, Nadav Horesh apparently wrote: >>>> A*v ... > ValueError: objects are not aligned This is just how I want matrices to act! If A is m?n, then v should be n?1 for A*v to be defined. Anything else is trouble waiting to happen. But it seems that Charles's proposal would make life more convenient for you... Cheers, Alan Isaac From charlesr.harris at gmail.com Fri Mar 21 22:28:02 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 21 Mar 2008 20:28:02 -0600 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> <47E440C9.7020603@enthought.com> <710F2847B0018641891D9A21602763600B6F3C@ex3.envision.co.il> Message-ID: 2008/3/21 Alan G Isaac : > On Sat, 22 Mar 2008, Nadav Horesh apparently wrote: > >>>> A*v > ... > > ValueError: objects are not aligned > > This is just how I want matrices to act! > > If A is m?n, then v should be n?1 > for A*v to be defined. Anything else > is trouble waiting to happen. > > But it seems that Charles's proposal would > make life more convenient for you... > I think Tim was the one who came up with it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Fri Mar 21 22:43:04 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 22 Mar 2008 02:43:04 +0000 Subject: [Numpy-discussion] dunno what array operation I'm looking for... Message-ID: <47E47238.9060005@simplistix.co.uk> Hi All, Say I have an array like: >>> measurements = array([100,109,115,117]) What do I do to it to get: array([9, 6, 2]) Is the following really the best way? >>> result = [] >>> for i in range(1,len(measurements)): ... result.append(measurements[i]-measurements[i-1]) ... >>> array(result) array([9, 6, 2]) cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From Joris.DeRidder at ster.kuleuven.be Fri Mar 21 22:51:27 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Sat, 22 Mar 2008 03:51:27 +0100 Subject: [Numpy-discussion] dunno what array operation I'm looking for... In-Reply-To: <47E47238.9060005@simplistix.co.uk> References: <47E47238.9060005@simplistix.co.uk> Message-ID: numpy.diff See http://www.scipy.org/Numpy_Example_List J. On 22 Mar 2008, at 03:43, Chris Withers wrote: > Hi All, > > Say I have an array like: > >>>> measurements = array([100,109,115,117]) > > What do I do to it to get: > > array([9, 6, 2]) > > Is the following really the best way? > >>>> result = [] >>>> for i in range(1,len(measurements)): > ... result.append(measurements[i]-measurements[i-1]) > ... 
>>>> array(result) > array([9, 6, 2]) > > cheers, > > Chris > > -- > Simplistix - Content Management, Zope & Python Consulting > - http://www.simplistix.co.uk > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From hoytak at gmail.com Fri Mar 21 22:52:41 2008 From: hoytak at gmail.com (Hoyt Koepke) Date: Fri, 21 Mar 2008 19:52:41 -0700 Subject: [Numpy-discussion] dunno what array operation I'm looking for... In-Reply-To: <47E47238.9060005@simplistix.co.uk> References: <47E47238.9060005@simplistix.co.uk> Message-ID: <4db580fd0803211952qea44791l5d1631e20502f53b@mail.gmail.com> Try result = A[1:] - A[:-1] --Hoyt On Fri, Mar 21, 2008 at 7:43 PM, Chris Withers wrote: > Hi All, > > Say I have an array like: > > >>> measurements = array([100,109,115,117]) > > What do I do to it to get: > > array([9, 6, 2]) > > Is the following really the best way? > > >>> result = [] > >>> for i in range(1,len(measurements)): > ... result.append(measurements[i]-measurements[i-1]) > ... > >>> array(result) > array([9, 6, 2]) > > cheers, > > Chris > > -- > Simplistix - Content Management, Zope & Python Consulting > - http://www.simplistix.co.uk > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From Joris.DeRidder at ster.kuleuven.be Fri Mar 21 23:17:37 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Sat, 22 Mar 2008 04:17:37 +0100 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: References: Message-ID: <359067C1-32D8-4AEB-8C1D-1BD0BDD5F3FC@ster.kuleuven.be> On 21 Mar 2008, at 18:22, Joe Harrington wrote: > What you have brought up is really a documentation problem: how do I > find the name of the routine I want? One way of dealing with this, could be the implementation of a "doc()" function in numpy that helps you to find what you want. A (still fairly basic) version of such a doc() function is given below. It understands the unix-like wildcards * and ?, and it will also find numpy classes/functions in the subpackages linalg, fft, and random. If it finds several possibilities it lists them, if only 1 match is found, the docstring is immediately given. As an example: >>> doc("*inv") numpy.linalg.inv numpy.linalg.pinv numpy.linalg.tensorinv It's should not be difficult to improve doc() by letting it also search in the docstrings, or by letting it respond intelligently to some "magic" search terms like e.g. category names. Cheers, Joris --------------------------- import numpy from inspect import getdoc import re def doc(searchstr): searchstr = searchstr.strip().replace('*','\w*').replace('?','\w') pattern = re.compile('^'+searchstr+'$') results = [] for package in [numpy, numpy.linalg, numpy.fft, numpy.random]: searchlist = [a for a in dir(package) if a[0] != '_'] results += [package.__name__ + "." 
+ s for s in searchlist if pattern.search(s) != None] if len(results) == 0: print "Sorry, no matches" elif len(results) == 1: print results[0] mod = numpy components = results[0].split('.') for comp in components[1:]: mod = getattr(mod, comp) docstring = getdoc(mod) if docstring is not None: print docstring else: print results[0] + " exists, but no docstring was found" else: for s in results: print s Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From oliphant at enthought.com Fri Mar 21 23:24:57 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 21 Mar 2008 22:24:57 -0500 Subject: [Numpy-discussion] Vectorize leak fixed (and sage-reported leak fixed as well). Message-ID: <47E47C09.2090104@enthought.com> Hello all, Much thanks is deserved by the people who have been chasing down and fixing reference count problems in NumPy. Two of them are related to object arrays. So, if you have been having memory leak problems with object arrays (vectorize uses object arrays, BTW), you should try out the latest SVN of NumPy to see if they fix your problems. Hopefully, NumPy 1.0.5 will come out sometime next week so that everybody can enjoy a more memory-conscious NumPy. The vectorize-related leak was a particularly silly one which led to casting for simple cases actually doing more work instead of less (this led inexorably to leaks whenever object arrays were cast to other types). Best regards, -Travis From michael.abshoff at googlemail.com Fri Mar 21 23:12:28 2008 From: michael.abshoff at googlemail.com (Michael.Abshoff) Date: Sat, 22 Mar 2008 04:12:28 +0100 Subject: [Numpy-discussion] Vectorize leak fixed (and sage-reported leak fixed as well). In-Reply-To: <47E47C09.2090104@enthought.com> References: <47E47C09.2090104@enthought.com> Message-ID: <47E4791C.7030308@gmail.com> Travis E. Oliphant wrote: > Hello all, > > Much thanks is deserved by the people who have been chasing down and > fixing reference count problems in NumPy. Two of them are related to > object arrays. > > So, if you have been having memory leak problems with object arrays > (vectorize uses object arrays, BTW), you should try out the latest SVN > of NumPy to see if they fix your problems. Hopefully, NumPy 1.0.5 will > come out sometime next week so that everybody can enjoy a more > memory-conscious NumPy. > > The vectorize-related leak was a particularly silly one which led to > casting for simple cases actually doing more work instead of less (this > led inexorably to leaks whenever object arrays were cast to other types). > > Best regards, > > -Travis Hi Tavis, list, excellent. We will upgrade then officially once 1.0.5 is out. I will do some testing the 1.0.5svn to verify that the leak is actually gone - not that we don't trust you :) Cheers, Michael > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From oliphant at enthought.com Sat Mar 22 00:29:14 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 21 Mar 2008 23:29:14 -0500 Subject: [Numpy-discussion] Vectorize leak fixed (and sage-reported leak fixed as well). In-Reply-To: <47E4791C.7030308@gmail.com> References: <47E47C09.2090104@enthought.com> <47E4791C.7030308@gmail.com> Message-ID: <47E48B1A.1010800@enthought.com> Michael.Abshoff wrote: > Travis E. 
Oliphant wrote: > >> Hello all, >> >> Much thanks is deserved by the people who have been chasing down and >> fixing reference count problems in NumPy. Two of them are related to >> object arrays. >> >> So, if you have been having memory leak problems with object arrays >> (vectorize uses object arrays, BTW), you should try out the latest SVN >> of NumPy to see if they fix your problems. Hopefully, NumPy 1.0.5 will >> come out sometime next week so that everybody can enjoy a more >> memory-conscious NumPy. >> >> The vectorize-related leak was a particularly silly one which led to >> casting for simple cases actually doing more work instead of less (this >> led inexorably to leaks whenever object arrays were cast to other types). >> >> Best regards, >> >> -Travis >> > > Hi Tavis, list, > > excellent. We will upgrade then officially once 1.0.5 is out. I will do > some testing the 1.0.5svn to verify that the leak is actually gone - not > that we don't trust you :) > Boy, after that silly memory leak, I don't trust me :-). Besides, I only tested the code somebody sent and not within a SAGE session. -Travis O. From david at ar.media.kyoto-u.ac.jp Sat Mar 22 00:53:31 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 22 Mar 2008 13:53:31 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) Message-ID: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> Anne Archibald wrote: > > There was some discussion of this recently. The most direct approach > to the problem is to annotate some or all of numpy's inner C loops > with OpenMP constructs, then provide some python functions to control > the degree of parallelism OpenMP uses. This would transparently > provide parallelism for many numpy operations, including sum(), > numpy's version of IDL's total(). All that is needed is for someone to > implement it. Nobody has stepped forward yet. I am not really familiar with openMP (only played with it on toy problems). From a built point of view, are the problems I could see without knowing anything: - compiler support: at source code level, open mp works only through pragma, right ? So we will get warning for compilers not supporting openmp if we just use pragam as is (this could be solved with macro I guess). - compiler flags and link flags: at least gcc needs flags for compilation and linking code with open mp. This means detecting whether the compiler supports it. This does not sound too bad, but this needs to work reliably on all supported platforms. Of course, I can add this to numscons; adding it to distutils would be a bit more work, but I can do it too if someone else is willing to do the actual coding in the C sources. Now, the main concern I would have is the effectiveness of all this on simple operations. I note that matlab 2007a, while claiming support for multi-core, does not use multi-core for simple operations, only for FFT, BLAS and LAPACK (where this should be possible right now if e.g. using Intel MKL, am I right ?). Matlab 7.6 supports also things like element-wise computation (a = sin(b)) http://www.mathworks.com/products/matlab/demos.html?file=/products/demos/matlab/multithreadedcomputations/multithreadedcomputations.html Personally, I am wondering whether it would not be more worthwhile to think first about sse and co, because it can give the same order of increase in speed, without all the problems linked to multi-threading (slower in mono-thread case, in particular). 
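For anyone who wants to see what their own build already parallelises, a small self-contained timing sketch (plain numpy calls only; whether dot() actually uses several cores depends entirely on the BLAS -- ATLAS, MKL, etc. -- that numpy was linked against, and the sizes below are arbitrary):

import time
import numpy as np

def bench(label, func, *args):
    # crude wall-clock timing, enough to spot a multi-threaded BLAS in `top`
    t0 = time.time()
    func(*args)
    print "%-6s %.3f s" % (label, time.time() - t0)

n = 1500
a = np.random.rand(n, n)
b = np.random.rand(n, n)

bench("dot", np.dot, a, b)   # delegated to BLAS; may already run multi-core
bench("sin", np.sin, a)      # ordinary ufunc loop; single-threaded today

Watching the CPU meters while this runs shows immediately which category an operation falls into.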
cheers, David From nadavh at visionsense.com Sat Mar 22 03:28:43 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Sat, 22 Mar 2008 09:28:43 +0200 Subject: [Numpy-discussion] Ravel and inplace modification References: <20080320120425.GA6486@phare.normalesup.org> Message-ID: <710F2847B0018641891D9A21602763600B6F3E@ex3.envision.co.il> +1 -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Charles R Harris ????: ? 20-???-08 17:12 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] Ravel and inplace modification On Thu, Mar 20, 2008 at 9:11 AM, Charles R Harris wrote: > > > On Thu, Mar 20, 2008 at 6:04 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > > > At the nipy sprint in Paris, we have been having a discussion about > > methods modifying inplace and returning a view, or returning a copy. > > > > The main issue is with ravel that tries to keep a view, but that > > obviously has to do a copy sometimes. (Is ravel the only place where > > this > > behavior can happen ?). We came up with the following scenario: > > > > Mrs Jane is an experienced Python developper, working with less > > experienced developpers. She has developped a set of functions to > > process > > data that assume they can use the ravel method returning a view. One day > > another programmes feeds it new kind of data. The functions work, but > > return something wrong. > > > > We (Stefan van der Walt, Matthew Brett and I) are suggesting that it > > would be a good idea to add a keyword to the ravel method so that it > > raises an exception if it cannot return a view. Stefan is proposing to > > implement it. > > > > What do people think about this? Should Stefan go ahead? > > > > Ravel is not writeable, so it can't be used on the left side of > assignments where the view/copy semantics could be a problem. > Argghhh, how did that line sneak in there? Ignore it. Chuck -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3762 bytes Desc: not available URL: From matthieu.brucher at gmail.com Sat Mar 22 05:26:12 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 22 Mar 2008 10:26:12 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> Message-ID: Hi, It seems complicated to add OpenMP in the code, I don't think many people have the knowlegde to do this, not mentioning the fact that there are a lotof Python calls in the different functions. The multicore Matlab does seems for more related to the underlying libraries than to something they did. Matthieu 2008/3/22, David Cournapeau : > > Anne Archibald wrote: > > > > There was some discussion of this recently. The most direct approach > > to the problem is to annotate some or all of numpy's inner C loops > > with OpenMP constructs, then provide some python functions to control > > the degree of parallelism OpenMP uses. This would transparently > > provide parallelism for many numpy operations, including sum(), > > numpy's version of IDL's total(). All that is needed is for someone to > > implement it. Nobody has stepped forward yet. > I am not really familiar with openMP (only played with it on toy > problems). 
From a built point of view, are the problems I could see > without knowing anything: > - compiler support: at source code level, open mp works only through > pragma, right ? So we will get warning for compilers not supporting > openmp if we just use pragam as is (this could be solved with macro I > guess). > - compiler flags and link flags: at least gcc needs flags for > compilation and linking code with open mp. This means detecting whether > the compiler supports it. > > This does not sound too bad, but this needs to work reliably on all > supported platforms. Of course, I can add this to numscons; adding it to > distutils would be a bit more work, but I can do it too if someone else > is willing to do the actual coding in the C sources. > > Now, the main concern I would have is the effectiveness of all this on > simple operations. I note that matlab 2007a, while claiming support for > multi-core, does not use multi-core for simple operations, only for FFT, > BLAS and LAPACK (where this should be possible right now if e.g. using > Intel MKL, am I right ?). Matlab 7.6 supports also things like > element-wise computation (a = sin(b)) > > > http://www.mathworks.com/products/matlab/demos.html?file=/products/demos/matlab/multithreadedcomputations/multithreadedcomputations.html > > Personally, I am wondering whether it would not be more worthwhile to > think first about sse and co, because it can give the same order of > increase in speed, without all the problems linked to multi-threading > (slower in mono-thread case, in particular). > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Sat Mar 22 05:40:55 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 22 Mar 2008 18:40:55 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> Message-ID: <47E4D427.2050307@ar.media.kyoto-u.ac.jp> Matthieu Brucher wrote: > Hi, > > It seems complicated to add OpenMP in the code, I don't think many > people have the knowlegde to do this, not mentioning the fact that > there are a lotof Python calls in the different functions. Yes, this makes potential optimizations harder, at least for someone like me who do not know well about numpy internals. That's still something I have not thought a lot about, but that's an example of why I like the idea of splitting numpy C code in core C / wrappers: you would only use open MP in the core C library, and everything would be transparent at higher levels (if I understand correctly how openMP works, which may very well not be true :) ). OpenMP, sse, etc... those are different views of the same underlying "problem", in this context. But I do not know enough about numpy internals yet (in particular, how the number protocol works, and the relationship with the ufunc machinery) to know if it is feasible in a reasonable number of hours, or even if it is feasible at all :) > The multicore Matlab does seems for more related to the underlying > libraries than to something they did. 
> Yes, that's why I put the matlab link: actually, most of the parallel thing it does is related to the mkl and co. That's something which is much easier to handle, and possible right now if I understand right ? cheers, David From matthieu.brucher at gmail.com Sat Mar 22 06:00:25 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 22 Mar 2008 11:00:25 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E4D427.2050307@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> Message-ID: > > Yes, that's why I put the matlab link: actually, most of the parallel > thing it does is related to the mkl and co. That's something which is > much easier to handle, and possible right now if I understand right ? > Yes, it is possible and it is already the case for the dot() function that calls a BLAS function. As you said, it would mean some work to optimize other functions. Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sat Mar 22 07:54:53 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 22 Mar 2008 12:54:53 +0100 Subject: [Numpy-discussion] Ravel and inplace modification In-Reply-To: References: <20080320120425.GA6486@phare.normalesup.org> Message-ID: <9457e7c80803220454u45b1e15frc44c92aa94034f47@mail.gmail.com> On Thu, Mar 20, 2008 at 4:11 PM, Charles R Harris wrote: > On Thu, Mar 20, 2008 at 6:04 AM, Gael Varoquaux > There are alternative methods: a.flatten will always return a copy and > a.flat will return an iterator. Perhaps those should be suggested in cases And for the record: x.flat does not make a copy and can be assigned to as well. St?fan From lxander.m at gmail.com Sat Mar 22 08:46:15 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Sat, 22 Mar 2008 08:46:15 -0400 Subject: [Numpy-discussion] numpy's future (1.1 and beyond): which direction(s) ? In-Reply-To: <1206074150.8490.28.camel@bbc8> References: <1206074150.8490.28.camel@bbc8> Message-ID: <525f23e80803220546u1049fdc0i68e4288b56ef0f91@mail.gmail.com> On Fri, Mar 21, 2008 at 12:35 AM, David Cournapeau wrote: > numpy 1.0.5 is on the way, and I was wondering about numpy's future. I > myself have some ideas about what could be done; has there been any > discussion behind what is on 1.1 trac's roadmap ? MaskedArray, although derived from ndarray, doesn't always play nice with the rest of numpy as evidenced by the need to recreate many of the numpy "library" functions specifically for MaskedArrays. There are many surprises, ones_like returns a MaskedArray when given one, but empty_like and zeros_like do not, and functions like unique include masked values in the results, etc. Some of these issues might be considered bugs (and perhaps already fixed), while others result more from a lack of overall design for "numpy" working with multiple array types. Maybe I'm missing something because I'm still relatively new to numpy, so please correct me if I'm wrong. I'm also thinking obliquely about sparse arrays (and masked spare arrays?). It would be great, in my opinion, to move towards a design that allows multiple array types to work more cohesively in the numpy ecosystem. 
Precisely, a design that makes it easier to write functions that work on basic ndarrays, masked arrays (and sparse arrays?) not by special casing each container, but by expressing their operations using universal primitives (of course this isn't always possible, but where it is possible). What does this mean? Some functions might work better as methods so that the different array-like containers can special case them (dot, for instance, could be a candidate). Or perhaps this escapes the intent of numpy and what I'm really providing is an argument for why masked or sparse arrays shouldn't be in numpy and this work (to make functions more container agnostic) should be carried out in scipy, OR there should be "three" library stacks for each dense, masked dense, and sparse arrays. Alex P.S. Oh yeah, David's ideas about finding accelerator libraries dynamically sounds great. From chris at simplistix.co.uk Sat Mar 22 08:49:19 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 22 Mar 2008 12:49:19 +0000 Subject: [Numpy-discussion] dunno what array operation I'm looking for... In-Reply-To: References: <47E47238.9060005@simplistix.co.uk> Message-ID: <47E5004F.7020301@simplistix.co.uk> Joris De Ridder wrote: > numpy.diff > See http://www.scipy.org/Numpy_Example_List Cool :-) Both this and Hoyt's example do exactly what I want. I'm guessing diff is going to be more efficient though? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From Joris.DeRidder at ster.kuleuven.be Sat Mar 22 10:59:38 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Sat, 22 Mar 2008 15:59:38 +0100 Subject: [Numpy-discussion] dunno what array operation I'm looking for... In-Reply-To: <47E5004F.7020301@simplistix.co.uk> References: <47E47238.9060005@simplistix.co.uk> <47E5004F.7020301@simplistix.co.uk> Message-ID: On 22 Mar 2008, at 13:49, Chris Withers wrote: > Joris De Ridder wrote: >> numpy.diff >> See http://www.scipy.org/Numpy_Example_List > > Cool :-) > > Both this and Hoyt's example do exactly what I want. > > I'm guessing diff is going to be more efficient though? No, not really, diff has more functionality, though. It can handle n- th order differences (i.e. not only 1st order), and it has axis support. J. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From stefan at sun.ac.za Sat Mar 22 11:40:55 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 22 Mar 2008 16:40:55 +0100 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <9457e7c80803220840w7b1faac2m9edfe3706981ec7d@mail.gmail.com> References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com> <9457e7c80803220840w7b1faac2m9edfe3706981ec7d@mail.gmail.com> Message-ID: <9457e7c80803220840n4d7e8731xc828f0dafdaf2e4a@mail.gmail.com> On Sat, Mar 22, 2008 at 4:40 PM, St?fan van der Walt wrote: > Hi Alan > > > On Fri, Mar 21, 2008 at 7:11 PM, Alan G Isaac wrote: > > On Fri, 21 Mar 2008, St?fan van der Walt apparently wrote: > > > The last I remember, we considered adding RowVector, > > > ColumnVector and letting slices out of a matrix either be > > > one of those or a matrix itself. > > > > There was a subsequent discussion. > > If there was, I still don't remember the result being the one you > suggested (could be my bad memory, but maybe you can post a link as a > reminder). 
> > > > > I simply don't see a Matrix as a container of ndarrays > > That is hardly an argument. > > Not an argument, just my opinion or perspective. In the matrix world, > everything has a minimum dimension of 2, so I don't see how you can > contain ndarrays in a matrix. > > > > Remember, any indexing that when applied to an 2d array > > would produce a 2d array will when applied to a matrix > > still produce a matrix. > > Sure. > > > > This is really just principle of least surprise. > > Or not, depending on where you come from. I'd expect indexing > operations that produce 1D-arrays on ndarrays to produce 2D-arrays on > matrices. > > > > PS Are you a *user* of matrices? > > No, I'm not (I love the consistency of the ndarray approach, and > broadcasting always does the Right Thing (TM)). Although I do > sometimes use matrices when I'm lazy to apply dot, i.e. > > A,B,C,D = [np.asmatrix(a) for a in [arr1,arr2,arr3,arr4]] > result = (A*B*C*D).A > > Regards > St?fan > From aisaac at american.edu Sat Mar 22 12:49:36 2008 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 22 Mar 2008 12:49:36 -0400 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <9457e7c80803220840n4d7e8731xc828f0dafdaf2e4a@mail.gmail.com> References: <1206074150.8490.28.camel@bbc8><710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il><9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com><9457e7c80803220840w7b1faac2m9edfe3706981ec7d@mail.gmail.com><9457e7c80803220840n4d7e8731xc828f0dafdaf2e4a@mail.gmail.com> Message-ID: On Sat, 22 Mar 2008, St?fan van der Walt apparently wrote: > maybe you can post a link as a reminder > In the matrix world, everything has a minimum dimension of > 2, so I don't see how you can contain ndarrays in > a matrix. Are you trying to suggest that in most matrix programming languages if you extract a row you will then need to use two indices to extract an element of that row? This does not match my experience. I would ask you to justify that by listing the languages you have in mind. Additionally, you surely see how you "can" do this. But as someone who does not use matrices much, you have an *abstract* objection to allowing this desirable functionality. (As far as I can tell, this objection is grounded in how you have chosen to think about matrices as mathematical objects, but nothing in the math implies your objection.) Provocatively, I might boil your position down to simply asserting that the only thing I should be able to get out of a matrix is a submatrix, and then being willing to break some nice ndarray behavior that would be expected by most new matrix users for no reason other than to enforce your arbitrary assertion. Since you offer NO MORE than an unfounded assertion, there is really no reason to stop me from e.g. getting the i,j-th element of a matrix as M[i][j]. Instead you want to just break this (which is the current status). Remember, you will still be able to extract the first row of a matrix ``M`` as a **submatrix** using ``M[0,:]``. No functionality would be lost under my proposed change. In short, the behavior change I have requested will - mean that habits formed using ndarrays transfer naturally to the use of matrices - increase functionality without removing any functionality Breaking the nice behavior of ndarrays should have a really strong justification. No real justification has been given for breaking e.g. the ability to use M[i] to get the i-th row as an array or M[i][j] to get the i,j-th element. 
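Since a concrete example may communicate better than assertions, here is a minimal comparison (this is my reading of the current behaviour and of the proposal; it is an illustration, not a spec):

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
M = np.matrix(A)

row = A[0]        # ndarray habit: a scalar index peels off a dimension -> array([1, 2, 3])
elem = A[0][1]    # ...so chained indexing reaches an element -> 2

print(M[0].shape)   # (1, 3): today a scalar index on a matrix stays 2-d, so
                    # M[0][0] is the same 1x3 matrix again and M[0][1] raises
                    # an IndexError instead of giving 2
print(M[0, 1])      # 2: the two-index form is currently the only direct route

# Under the proposed change, M[0] would return array([1, 2, 3]) exactly as
# A[0] does, while M[0, :] would still return the 1x3 submatrix.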
Oddly, the weak justifications that have been offered have been offered by people who make little or no use of matrices. This behavior has been broken arbitrarily. The breakage removes useful functionality, adds no new functionality, needlessly decreases similarities between matrices and ndarrays, and thereby surprises new users (e.g., my students) for no good reason. As a final observation, I will note that status quo bias of course works against making this change, but making this desirable change by 1.1 will be easier than making it later. Cheers, Alan Isaac From philbinj at gmail.com Sat Mar 22 13:01:43 2008 From: philbinj at gmail.com (James Philbin) Date: Sat, 22 Mar 2008 17:01:43 +0000 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> Message-ID: <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK libraries to use multiple cores for more complex calculations. In particular, doing some basic loop unrolling and SSE versions of the ufuncs would be beneficial. I have some experience writing SSE code using intrinsics and would be happy to give it a shot if people tell me what functions I should focus on. James From philbinj at gmail.com Sat Mar 22 13:08:00 2008 From: philbinj at gmail.com (James Philbin) Date: Sat, 22 Mar 2008 17:08:00 +0000 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: <36FF9D66-973C-4C7E-9996-84BD49351EE3@ster.kuleuven.be> References: <36FF9D66-973C-4C7E-9996-84BD49351EE3@ster.kuleuven.be> Message-ID: <2b1c8c4f0803221008gc398e6bjcf4bbc58b4df7dd7@mail.gmail.com> I'm not sure that #669 (http://projects.scipy.org/scipy/numpy/ticket/669) is a bug, but probably needs some discussion (see the last reply on that page). The cast is made because we don't know that the LHS is non-negative. However it could be argued that operations involving two integers should never cast to a float, in which case maybe an exception should be thrown. James From ndbecker2 at gmail.com Sat Mar 22 13:43:17 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 22 Mar 2008 13:43:17 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> Message-ID: James Philbin wrote: > Personally, I think that the time would be better spent optimizing > routines for single-threaded code and relying on BLAS and LAPACK > libraries to use multiple cores for more complex calculations. In > particular, doing some basic loop unrolling and SSE versions of the > ufuncs would be beneficial. I have some experience writing SSE code > using intrinsics and would be happy to give it a shot if people tell > me what functions I should focus on. > > James gcc keeps advancing autovectorization. Is manual vectorization worth the trouble? From philbinj at gmail.com Sat Mar 22 14:01:08 2008 From: philbinj at gmail.com (James Philbin) Date: Sat, 22 Mar 2008 18:01:08 +0000 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) 
In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> Message-ID: <2b1c8c4f0803221101g714785f4uc40dad957afb45ba@mail.gmail.com> > gcc keeps advancing autovectorization. Is manual vectorization worth the > trouble? Well, the way that the ufuncs are written at the moment, -ftree-vectorize will never kick in due to the non-constant strides. To get this to work, one has to special case out unary strides. Even with constant strides -ftree-vectorize often produces sub-optimal code as it has to make very conservative assumptions about the content of variables (it can do better if -fstrict-aliasing is used, but I think numpy is not compiled with this flag). So in other words, yes, I think it is worth hand vectorizing (and unrolling) the most common cases. James From charlesr.harris at gmail.com Sat Mar 22 14:07:41 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 12:07:41 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> Message-ID: On Sat, Mar 22, 2008 at 11:43 AM, Neal Becker wrote: > James Philbin wrote: > > > Personally, I think that the time would be better spent optimizing > > routines for single-threaded code and relying on BLAS and LAPACK > > libraries to use multiple cores for more complex calculations. In > > particular, doing some basic loop unrolling and SSE versions of the > > ufuncs would be beneficial. I have some experience writing SSE code > > using intrinsics and would be happy to give it a shot if people tell > > me what functions I should focus on. > > > > James > > gcc keeps advancing autovectorization. Is manual vectorization worth the > trouble? > The inner loop of a unary ufunc looks like /*UFUNC_API*/ static void PyUFunc_d_d(char **args, intp *dimensions, intp *steps, void *func) { intp i; char *ip1=args[0], *op=args[1]; for(i=0; i<*dimensions; i++, ip1+=steps[0], op+=steps[1]) { *(double *)op = ((DoubleUnaryFunc *)func)(*(double *)ip1); } } While it might help the compiler to put the steps on the stack as constants, it is hard to see how the compiler could vectorize the loop given the information available and the fact that the input data might not be aligned or contiguous. I suppose one could make a small local buffer, copy the data into it, and then use sse, and that might actually help for some things. But it is also likely that the function itself won't deal gracefully with vectorized data. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Mar 22 14:12:17 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 12:12:17 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <2b1c8c4f0803221101g714785f4uc40dad957afb45ba@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <2b1c8c4f0803221101g714785f4uc40dad957afb45ba@mail.gmail.com> Message-ID: On Sat, Mar 22, 2008 at 12:01 PM, James Philbin wrote: > > gcc keeps advancing autovectorization. Is manual vectorization worth > the > > trouble? 
> > Well, the way that the ufuncs are written at the moment, > -ftree-vectorize will never kick in due to the non-constant strides. > To get this to work, one has to special case out unary strides. Even > with constant strides -ftree-vectorize often produces sub-optimal code > as it has to make very conservative assumptions about the content of > variables (it can do better if -fstrict-aliasing is used, but I think Numpy would die with strict-aliasing because it relies on casting char pointers to various types. It might be possible to use void pointers, but that would be a major do over and it isn't clear what the strides and offsets, currently in char units, would become. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Sat Mar 22 14:17:01 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sat, 22 Mar 2008 13:17:01 -0500 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> Message-ID: <47E54D1D.20108@enthought.com> James Philbin wrote: > Personally, I think that the time would be better spent optimizing > routines for single-threaded code and relying on BLAS and LAPACK > libraries to use multiple cores for more complex calculations. In > particular, doing some basic loop unrolling and SSE versions of the > ufuncs would be beneficial. I have some experience writing SSE code > using intrinsics and would be happy to give it a shot if people tell > me what functions I should focus on. > > Fabulous! This is on my Project List of todo items for NumPy. See http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend some time refactoring the ufunc loops so that the templating does not get in the way of doing this on a case by case basis. 1) You should focus on the math operations: add, subtract, multiply, divide, and so forth. 2) Then for "combined operations" we should expose the functionality at a high-level. So, that somebody could write code to take advantage of it. It would be easiest to use intrinsics which would then work for AMD, Intel, on multiple compilers. -Travis O. From oliphant at enthought.com Sat Mar 22 14:20:30 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sat, 22 Mar 2008 13:20:30 -0500 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> Message-ID: <47E54DEE.3030007@enthought.com> Charles R Harris wrote: > > > On Sat, Mar 22, 2008 at 11:43 AM, Neal Becker > wrote: > > James Philbin wrote: > > > Personally, I think that the time would be better spent optimizing > > routines for single-threaded code and relying on BLAS and LAPACK > > libraries to use multiple cores for more complex calculations. In > > particular, doing some basic loop unrolling and SSE versions of the > > ufuncs would be beneficial. I have some experience writing SSE code > > using intrinsics and would be happy to give it a shot if people tell > > me what functions I should focus on. > > > > James > > gcc keeps advancing autovectorization. Is manual vectorization > worth the > trouble? 
> > > The inner loop of a unary ufunc looks like > > /*UFUNC_API*/ > static void > PyUFunc_d_d(char **args, intp *dimensions, intp *steps, void *func) > { > intp i; > char *ip1=args[0], *op=args[1]; > for(i=0; i<*dimensions; i++, ip1+=steps[0], op+=steps[1]) { > *(double *)op = ((DoubleUnaryFunc *)func)(*(double *)ip1); > } > } > > > While it might help the compiler to put the steps on the stack as > constants, it is hard to see how the compiler could vectorize the loop > given the information available and the fact that the input data might > not be aligned or contiguous. I suppose one could make a small local > buffer, copy the data into it, and then use sse, and that might > actually help for some things. But it is also likely that the function > itself won't deal gracefully with vectorized data. I think the thing to do is to special-case the code so that if the strides work for vectorization, then a different bit of code is executed and this current code is used as the final special-case. Something like this would be relatively straightforward, if a bit tedious, to do. -Travis From philbinj at gmail.com Sat Mar 22 14:37:51 2008 From: philbinj at gmail.com (James Philbin) Date: Sat, 22 Mar 2008 18:37:51 +0000 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E54DEE.3030007@enthought.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54DEE.3030007@enthought.com> Message-ID: <2b1c8c4f0803221137s42335777r80fe2e3ab9781f31@mail.gmail.com> OK, so a few questions: 1. I'm not familiar with the format of the code generators. Should I pull the special case out of the "/** begin repeat"s or should I do a conditional inside the repeats (how does one do this?). 2. I don't have access to Windows+VisualC, so I will need some help testing for Windows. 3. Should patches be posted to the mailing list or checked into svn? James From grrrr.org at gmail.com Sat Mar 22 14:41:31 2008 From: grrrr.org at gmail.com (Thomas Grill) Date: Sat, 22 Mar 2008 19:41:31 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E54DEE.3030007@enthought.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54DEE.3030007@enthought.com> Message-ID: Am 22.03.2008 um 19:20 schrieb Travis E. Oliphant: >I think the thing to do is to special-case the code so that if the >strides work for vectorization, then a different bit of code is executed >and this current code is used as the final special-case. >Something like this would be relatively straightforward, if a bit >tedious, to do. I've experimented with branching the ufuncs into different constant strides and aligned/unaligned cases to be able to use SSE using compiler intrinsics. I expected a considerable gain as i was using float32 with stride 1 most of the time. However, profiling revealed that hardly anything was gained because of 1) non-alignment of the vectors.... this _could_ be handled by shuffled loading of the values though 2) the fact that my application used relatively large vectors that wouldn't fit into the CPU cache, hence the memory transfer slowed down the CPU. I found the latter to be a real showstopper for most of my experiments with SIMD. 
It's especially a problem for numpy because smaller vectors have a lot of Python/numpy overhead, and larger ones don't really benefit due to cache exhaustion. I'm curious whether OpenMP gives better results, as multi-cores often share their caches. greetings, Thomas From peridot.faceted at gmail.com Sat Mar 22 14:48:43 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 22 Mar 2008 14:48:43 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54DEE.3030007@enthought.com> Message-ID: On 22/03/2008, Thomas Grill wrote: > I've experimented with branching the ufuncs into different constant > strides and aligned/unaligned cases to be able to use SSE using > compiler intrinsics. > I expected a considerable gain as i was using float32 with stride 1 > most of the time. > However, profiling revealed that hardly anything was gained because of > 1) non-alignment of the vectors.... this _could_ be handled by > shuffled loading of the values though > 2) the fact that my application used relatively large vectors that > wouldn't fit into the CPU cache, hence the memory transfer slowed down > the CPU. > > I found the latter to be a real showstopper for most of my experiments > with SIMD. It's especially a problem for numpy because smaller vectors > have a lot of Python/numpy overhead, and larger ones don't really > benefit due to cache exhaustion. This particular issue can sometimes be reduced by clever use of the prefetching intrinsics. I'm not totally sure it's going to help inside most ufuncs, though, since the runtime is so dominated by memory reads. In a program I was writing I had time to do a 128-point real FFT in the time it took to load the next 64 floats... Anne From peridot.faceted at gmail.com Sat Mar 22 14:54:03 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 22 Mar 2008 14:54:03 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E54D1D.20108@enthought.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> Message-ID: On 22/03/2008, Travis E. Oliphant wrote: > James Philbin wrote: > > Personally, I think that the time would be better spent optimizing > > routines for single-threaded code and relying on BLAS and LAPACK > > libraries to use multiple cores for more complex calculations. In > > particular, doing some basic loop unrolling and SSE versions of the > > ufuncs would be beneficial. I have some experience writing SSE code > > using intrinsics and would be happy to give it a shot if people tell > > me what functions I should focus on. > > Fabulous! This is on my Project List of todo items for NumPy. See > http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend > some time refactoring the ufunc loops so that the templating does not > get in the way of doing this on a case by case basis. > > 1) You should focus on the math operations: add, subtract, multiply, > divide, and so forth. > 2) Then for "combined operations" we should expose the functionality at > a high-level. So, that somebody could write code to take advantage of it. 
> > It would be easiest to use intrinsics which would then work for AMD, > Intel, on multiple compilers. I think even heavier use of code generation would be a good idea here. There are so many different versions of each loop, and the fastest way to run each one is going to be different for different versions and different platforms, that a routine that assembled the code from chunks and picked the fastest combination for each instance might make a big difference - this is roughly what FFTW and ATLAS do. There are also some optimizations to be made at a higher level that might give these optimizations more traction. For example: A = randn(100*100) A.shape = (100,100) A*A There's no reason the multiply ufunc couldn't flatten A and use a single unstrided loop to do the multiplication. Anne From philbinj at gmail.com Sat Mar 22 14:54:43 2008 From: philbinj at gmail.com (James Philbin) Date: Sat, 22 Mar 2008 18:54:43 +0000 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54DEE.3030007@enthought.com> Message-ID: <2b1c8c4f0803221154p5d131984m2992b90c1221dd09@mail.gmail.com> > However, profiling revealed that hardly anything was gained because of > 1) non-alignment of the vectors.... this _could_ be handled by > shuffled loading of the values though > 2) the fact that my application used relatively large vectors that > wouldn't fit into the CPU cache, hence the memory transfer slowed down > the CPU. I've had generally positive results from vectorizing code in the past, admittedly on architectures with fast memory buses (Xeon 5100s). Naive implementations of most simple vector operations (dot,+,-,etc) were sped up by around ~20%. I also haven't found aligned accesses to make much difference (~2-3%), but this might be dependent on the architecture. James From charlesr.harris at gmail.com Sat Mar 22 15:04:25 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 13:04:25 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> Message-ID: On Sat, Mar 22, 2008 at 12:54 PM, Anne Archibald wrote: > On 22/03/2008, Travis E. Oliphant wrote: > > James Philbin wrote: > > > Personally, I think that the time would be better spent optimizing > > > routines for single-threaded code and relying on BLAS and LAPACK > > > libraries to use multiple cores for more complex calculations. In > > > particular, doing some basic loop unrolling and SSE versions of the > > > ufuncs would be beneficial. I have some experience writing SSE code > > > using intrinsics and would be happy to give it a shot if people tell > > > me what functions I should focus on. > > > > Fabulous! This is on my Project List of todo items for NumPy. See > > http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend > > some time refactoring the ufunc loops so that the templating does not > > get in the way of doing this on a case by case basis. > > > > 1) You should focus on the math operations: add, subtract, multiply, > > divide, and so forth. 
> > 2) Then for "combined operations" we should expose the functionality at > > a high-level. So, that somebody could write code to take advantage of > it. > > > > It would be easiest to use intrinsics which would then work for AMD, > > Intel, on multiple compilers. > > I think even heavier use of code generation would be a good idea here. > There are so many different versions of each loop, and the fastest way > to run each one is going to be different for different versions and > different platforms, that a routine that assembled the code from > chunks and picked the fastest combination for each instance might make > a big difference - this is roughly what FFTW and ATLAS do. > Maybe it's time to revisit the template subsystem I pulled out of Django. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Sat Mar 22 15:16:31 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sat, 22 Mar 2008 14:16:31 -0500 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> Message-ID: <47E55B0F.4060003@enthought.com> Anne Archibald wrote: > On 22/03/2008, Travis E. Oliphant wrote: > >> James Philbin wrote: >> > Personally, I think that the time would be better spent optimizing >> > routines for single-threaded code and relying on BLAS and LAPACK >> > libraries to use multiple cores for more complex calculations. In >> > particular, doing some basic loop unrolling and SSE versions of the >> > ufuncs would be beneficial. I have some experience writing SSE code >> > using intrinsics and would be happy to give it a shot if people tell >> > me what functions I should focus on. >> >> Fabulous! This is on my Project List of todo items for NumPy. See >> http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend >> some time refactoring the ufunc loops so that the templating does not >> get in the way of doing this on a case by case basis. >> >> 1) You should focus on the math operations: add, subtract, multiply, >> divide, and so forth. >> 2) Then for "combined operations" we should expose the functionality at >> a high-level. So, that somebody could write code to take advantage of it. >> >> It would be easiest to use intrinsics which would then work for AMD, >> Intel, on multiple compilers. >> > > I think even heavier use of code generation would be a good idea here. > There are so many different versions of each loop, and the fastest way > to run each one is going to be different for different versions and > different platforms, that a routine that assembled the code from > chunks and picked the fastest combination for each instance might make > a big difference - this is roughly what FFTW and ATLAS do. > > There are also some optimizations to be made at a higher level that > might give these optimizations more traction. For example: > > A = randn(100*100) > A.shape = (100,100) > A*A > > There's no reason the multiply ufunc couldn't flatten A and use a > single unstrided loop to do the multiplication. > Good idea, it does already do that :-) The ufunc machinery is also a good place for an optional thread pool. Perhaps we could drum up interest in a Need for Speed Sprint on NumPy sometime over the next few months. -Travis O. 
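P.S. For anyone who wants to see the contiguity condition from the Python side, here is a rough sketch (it only shows the eligibility flags; it is not meant to describe the exact C-level dispatch, and the array names are arbitrary):

import numpy as np

a = np.random.rand(2000, 2000)   # one big C-contiguous buffer
b = a[:, ::2]                    # a strided view of the same data

print(a.flags['C_CONTIGUOUS'])   # True:  eligible for a single flat inner loop
print(b.flags['C_CONTIGUOUS'])   # False: falls back to the general strided loop

# Timing a*a against b*b (with timeit or similar) then shows the per-element
# price of the strided path; the exact ratio is machine and cache dependent.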
From charlesr.harris at gmail.com Sat Mar 22 15:54:10 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 13:54:10 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E55B0F.4060003@enthought.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> Message-ID: On Sat, Mar 22, 2008 at 1:16 PM, Travis E. Oliphant wrote: > Anne Archibald wrote: > > On 22/03/2008, Travis E. Oliphant wrote: > > > >> James Philbin wrote: > >> > Personally, I think that the time would be better spent optimizing > >> > routines for single-threaded code and relying on BLAS and LAPACK > >> > libraries to use multiple cores for more complex calculations. In > >> > particular, doing some basic loop unrolling and SSE versions of the > >> > ufuncs would be beneficial. I have some experience writing SSE code > >> > using intrinsics and would be happy to give it a shot if people tell > >> > me what functions I should focus on. > >> > >> Fabulous! This is on my Project List of todo items for NumPy. See > >> http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend > >> some time refactoring the ufunc loops so that the templating does not > >> get in the way of doing this on a case by case basis. > >> > >> 1) You should focus on the math operations: add, subtract, multiply, > >> divide, and so forth. > >> 2) Then for "combined operations" we should expose the functionality > at > >> a high-level. So, that somebody could write code to take advantage of > it. > >> > >> It would be easiest to use intrinsics which would then work for AMD, > >> Intel, on multiple compilers. > >> > > > > I think even heavier use of code generation would be a good idea here. > > There are so many different versions of each loop, and the fastest way > > to run each one is going to be different for different versions and > > different platforms, that a routine that assembled the code from > > chunks and picked the fastest combination for each instance might make > > a big difference - this is roughly what FFTW and ATLAS do. > > > > There are also some optimizations to be made at a higher level that > > might give these optimizations more traction. For example: > > > > A = randn(100*100) > > A.shape = (100,100) > > A*A > > > > There's no reason the multiply ufunc couldn't flatten A and use a > > single unstrided loop to do the multiplication. > > > Good idea, it does already do that :-) The ufunc machinery is also a > good place for an optional thread pool. > > Perhaps we could drum up interest in a Need for Speed Sprint on NumPy > sometime over the next few months. > I tend to think the first thing to do is to put together a small test package, say with the double loops and some standard array data, and time and profile different approaches so we don't spend a lot of time and effort on something with little payoff. As the most immediate gains might be through attention to the cache we might also look at some compound operators, say multiply and add. And implementing mixed type loops might save memory. So there are lots of things to look at. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Sat Mar 22 16:59:30 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 22 Mar 2008 15:59:30 -0500 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> Message-ID: <3d375d730803221359y3e6cd082l9bfd3d7dce806cee@mail.gmail.com> On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris wrote: > Maybe it's time to revisit the template subsystem I pulled out of Django. I am still -lots on using the Django template system. Please, please, please, look at Jinja or another templating package that could be dropped in without *any* modification. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Sat Mar 22 17:25:58 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 15:25:58 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <3d375d730803221359y3e6cd082l9bfd3d7dce806cee@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <3d375d730803221359y3e6cd082l9bfd3d7dce806cee@mail.gmail.com> Message-ID: On Sat, Mar 22, 2008 at 2:59 PM, Robert Kern wrote: > On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris > wrote: > > > Maybe it's time to revisit the template subsystem I pulled out of > Django. > > I am still -lots on using the Django template system. Please, please, > please, look at Jinja or another templating package that could be > dropped in without *any* modification. > Well, I have a script that pulls the relevant parts out of Django. I know you had a bad experience, but... That said, Jinja looks interesting. It uses the Django syntax, which was one of the things I liked most about Django templates. In fact, it looks pretty much like Django templates made into a standalone application, which is what I was after. However, it's big, the installed egg is about 1Mib, which is roughly 12x the size as my cutdown version of Django, and it has some c-code, so would need building. On the other hand, it also looks like it contains a lot of extraneous stuff, like translations, that could be removed. Would you be adverse to adding it in if it looks useful? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sat Mar 22 18:00:36 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 22 Mar 2008 23:00:36 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E55B0F.4060003@enthought.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> Message-ID: <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> On Sat, Mar 22, 2008 at 8:16 PM, Travis E. Oliphant wrote: > Perhaps we could drum up interest in a Need for Speed Sprint on NumPy > sometime over the next few months. 
I guess we'd all like our computations to complete more quickly, as long as they still give valid results. I suggest we make sure that we have very decent test coverage of the C code before doing any further optimization. The regression tests cover a number of important corner cases, but we don't have all that many tests covering everyday usage. The ideal place for these would be inside the docstrings -- and if we get the wiki <-> docstring roundtripping working properly, anyone would be able to contribute towards that goal. I know that is is possible to track coverage using gcov (you have to compile numpy into the python binary), but if anyone has a better way, I'd like to hear about it. Regards St?fan From stefan at sun.ac.za Sat Mar 22 18:21:32 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 22 Mar 2008 23:21:32 +0100 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com> <9457e7c80803220840w7b1faac2m9edfe3706981ec7d@mail.gmail.com> <9457e7c80803220840n4d7e8731xc828f0dafdaf2e4a@mail.gmail.com> Message-ID: <9457e7c80803221521i4c7e7d58l2ec2cd4d6caecf24@mail.gmail.com> Hi Alan On Sat, Mar 22, 2008 at 5:49 PM, Alan G Isaac wrote: > Are you trying to suggest that in most matrix programming > languages if you extract a row you will then need to use two > indices to extract an element of that row? This does not > match my experience. I would ask you to justify that by > listing the languages you have in mind. No, I agree with you that that is unintuitive -- but it can be solved by introducing Row and ColumnVectors, which are still 2-dimensional. One important result you don't want is: In [9]: x = np.array([[1,2,3],[4,5,6],[7,8,9]]) In [10]: x[:,0] Out[10]: array([1, 4, 7]) But instead the current behaviour: In [11]: x = np.matrix([[1,2,3],[4,5,6]]) In [12]: x[:,0] Out[12]: matrix([[1], [4]]) > Remember, you will still be able to extract the first row of > a matrix ``M`` as a **submatrix** using ``M[0,:]``. > No functionality would be lost under my proposed change. Do I understand correctly that you want M[0,:] and M[0] to behave differently? Would you like M[0] to return the first element of the matrix as in Octave? Is there a reason why the Column/Row-vector solution wouldn't work for you? > In short, the behavior change I have requested will > - mean that habits formed using ndarrays transfer naturally > to the use of matrices But other habits, such as x[0,:] and x[0] meaning the same thing, won't transfer so well. So you're just swapping one set of inconveniences for another. I'm not trying to sabotage your proposal, I just want to understand it better. If I'm the only one who is not completely satisfied, then please, submit a patch and have it applied. Regards St?fan From gael.varoquaux at normalesup.org Sat Mar 22 18:43:14 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 22 Mar 2008 23:43:14 +0100 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <47E440C9.7020603@enthought.com> References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> <47E440C9.7020603@enthought.com> Message-ID: <20080322224314.GG13604@phare.normalesup.org> On Fri, Mar 21, 2008 at 06:12:09PM -0500, Travis E. 
Oliphant wrote: > > I still kinda like the idea of using the call operator for matrix > > multiplication, i.e. A(v) := dot(A,v). > Interesting idea. I kind of like that too. I don't. I think some people are going to scratch their head wondering what the code does pretty badly. Ga?l From charlesr.harris at gmail.com Sat Mar 22 18:58:07 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 16:58:07 -0600 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <20080322224314.GG13604@phare.normalesup.org> References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <710F2847B0018641891D9A21602763600B6F3B@ex3.envision.co.il> <47E440C9.7020603@enthought.com> <20080322224314.GG13604@phare.normalesup.org> Message-ID: On Sat, Mar 22, 2008 at 4:43 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Fri, Mar 21, 2008 at 06:12:09PM -0500, Travis E. Oliphant wrote: > > > I still kinda like the idea of using the call operator for matrix > > > multiplication, i.e. A(v) := dot(A,v). > > Interesting idea. I kind of like that too. > > I don't. I think some people are going to scratch their head wondering > what the code does pretty badly. It's just the evaluation of a linear function. What's strange about that? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From philbinj at gmail.com Sat Mar 22 19:03:18 2008 From: philbinj at gmail.com (James Philbin) Date: Sat, 22 Mar 2008 23:03:18 +0000 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> Message-ID: <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> OK, i've written a simple benchmark which implements an elementwise multiply (A=B*C) in three different ways (standard C, intrinsics, hand coded assembly). On the face of things the results seem to indicate that the vectorization works best on medium sized inputs. If people could post the results of running the benchmark on their machines (takes ~1min) along with the output of gcc --version and their chip model, that wd be v useful. 
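(If you also want a numpy-side baseline to put next to the C numbers, something along these lines should do; it is only a sketch for comparison and is not part of vec_bench.c itself -- the array names and repetition counts are arbitrary:)

import numpy as np
import timeit

# Time numpy's elementwise multiply at the same problem sizes as the C benchmark.
for n in (100, 1000, 10000, 100000, 1000000, 10000000):
    setup = ("import numpy as np; "
             "b = np.random.rand(%d).astype(np.float32); c = b.copy()" % n)
    reps = max(1, 10000000 // n)
    t = timeit.Timer("b * c", setup).timeit(reps) / reps
    print("%10d  %.6f ms" % (n, t * 1e3))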
It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench Here's two: CPU: Core Duo T2500 @ 2GHz gcc --version: gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) Problem size Simple Intrin Inline 100 0.0003ms (100.0%) 0.0002ms ( 67.7%) 0.0002ms ( 50.6%) 1000 0.0030ms (100.0%) 0.0021ms ( 69.2%) 0.0015ms ( 50.6%) 10000 0.0370ms (100.0%) 0.0267ms ( 72.0%) 0.0279ms ( 75.4%) 100000 0.2258ms (100.0%) 0.1469ms ( 65.0%) 0.1273ms ( 56.4%) 1000000 4.5690ms (100.0%) 4.4616ms ( 97.6%) 4.4185ms ( 96.7%) 10000000 47.0022ms (100.0%) 45.4100ms ( 96.6%) 44.4437ms ( 94.6%) CPU: Intel Xeon E5345 @ 2.33Ghz gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) Problem size Simple Intrin Inline 100 0.0001ms (100.0%) 0.0001ms ( 69.2%) 0.0001ms ( 77.4%) 1000 0.0010ms (100.0%) 0.0008ms ( 78.1%) 0.0009ms ( 86.6%) 10000 0.0108ms (100.0%) 0.0088ms ( 81.2%) 0.0086ms ( 79.6%) 100000 0.1131ms (100.0%) 0.0897ms ( 79.3%) 0.0872ms ( 77.1%) 1000000 5.2103ms (100.0%) 3.9153ms ( 75.1%) 3.8328ms ( 73.6%) 10000000 54.1815ms (100.0%) 51.8286ms ( 95.7%) 51.4366ms ( 94.9%) James -------------- next part -------------- A non-text attachment was scrubbed... Name: vec_bench.c Type: text/x-csrc Size: 4004 bytes Desc: not available URL: From ndbecker2 at gmail.com Sat Mar 22 19:27:52 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 22 Mar 2008 19:27:52 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: gcc --version gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. [nbecker at nbecker1 ~]$ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 2201.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida bogomips : 4393.14 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 2201.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida bogomips : 4389.47 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: [nbecker at nbecker1 ~]$ gcc -O2 vec_bench.c -o vec_bench [nbecker at nbecker1 ~]$ ./vec_bench Testing methods... 
All OK Problem size Simple Intrin Inline 100 0.0003ms (100.0%) 0.0003ms ( 78.3%) 0.0003ms ( 75.5%) 1000 0.0029ms (100.0%) 0.0022ms ( 75.9%) 0.0026ms ( 87.0%) 10000 0.0131ms (100.0%) 0.0085ms ( 65.0%) 0.0092ms ( 70.3%) 100000 0.1210ms (100.0%) 0.0875ms ( 72.3%) 0.0932ms ( 77.0%) 1000000 4.2518ms (100.0%) 7.5801ms (178.3%) 7.6278ms (179.4%) 10000000 81.6962ms (100.0%) 79.8668ms ( 97.8%) 81.6365ms ( 99.9%) [nbecker at nbecker1 ~]$ gcc -O3 -ffast-math vec_bench.c -o vec_bench [nbecker at nbecker1 ~]$ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0003ms (100.0%) 0.0002ms ( 68.4%) 0.0003ms ( 74.2%) 1000 0.0029ms (100.0%) 0.0023ms ( 77.2%) 0.0025ms ( 86.9%) 10000 0.0353ms (100.0%) 0.0086ms ( 24.5%) 0.0092ms ( 26.1%) 100000 0.1497ms (100.0%) 0.1013ms ( 67.6%) 0.1146ms ( 76.6%) 1000000 4.4004ms (100.0%) 7.5651ms (171.9%) 7.6200ms (173.2%) 10000000 81.3631ms (100.0%) 83.3591ms (102.5%) 79.8199ms ( 98.1%) [nbecker at nbecker1 ~]$ gcc -O3 -msse4a vec_bench.c -o vec_bench [nbecker at nbecker1 ~]$ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0001ms (100.0%) 0.0001ms ( 67.5%) 0.0001ms ( 74.8%) 1000 0.0011ms (100.0%) 0.0008ms ( 78.0%) 0.0009ms ( 86.4%) 10000 0.0116ms (100.0%) 0.0085ms ( 73.2%) 0.0092ms ( 79.1%) 100000 0.1500ms (100.0%) 0.0873ms ( 58.2%) 0.0931ms ( 62.1%) 1000000 4.2654ms (100.0%) 7.5623ms (177.3%) 7.5713ms (177.5%) 10000000 79.4805ms (100.0%) 81.0649ms (102.0%) 81.1859ms (102.1%) From charlesr.harris at gmail.com Sat Mar 22 19:32:17 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 17:32:17 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: On Sat, Mar 22, 2008 at 5:03 PM, James Philbin wrote: > OK, i've written a simple benchmark which implements an elementwise > multiply (A=B*C) in three different ways (standard C, intrinsics, hand > coded assembly). On the face of things the results seem to indicate > that the vectorization works best on medium sized inputs. If people > could post the results of running the benchmark on their machines > (takes ~1min) along with the output of gcc --version and their chip > model, that wd be v useful. 
> > It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench > > Here's two: > > CPU: Core Duo T2500 @ 2GHz > gcc --version: gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) > Problem size Simple Intrin > Inline > 100 0.0003ms (100.0%) 0.0002ms ( 67.7%) 0.0002ms ( > 50.6%) > 1000 0.0030ms (100.0%) 0.0021ms ( 69.2%) 0.0015ms ( > 50.6%) > 10000 0.0370ms (100.0%) 0.0267ms ( 72.0%) 0.0279ms ( > 75.4%) > 100000 0.2258ms (100.0%) 0.1469ms ( 65.0%) 0.1273ms ( > 56.4%) > 1000000 4.5690ms (100.0%) 4.4616ms ( 97.6%) 4.4185ms ( > 96.7%) > 10000000 47.0022ms (100.0%) 45.4100ms ( 96.6%) 44.4437ms ( > 94.6%) > > CPU: Intel Xeon E5345 @ 2.33Ghz > gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) > Problem size Simple Intrin > Inline > 100 0.0001ms (100.0%) 0.0001ms ( 69.2%) 0.0001ms ( > 77.4%) > 1000 0.0010ms (100.0%) 0.0008ms ( 78.1%) 0.0009ms ( > 86.6%) > 10000 0.0108ms (100.0%) 0.0088ms ( 81.2%) 0.0086ms ( > 79.6%) > 100000 0.1131ms (100.0%) 0.0897ms ( 79.3%) 0.0872ms ( > 77.1%) > 1000000 5.2103ms (100.0%) 3.9153ms ( 75.1%) 3.8328ms ( > 73.6%) > 10000000 54.1815ms (100.0%) 51.8286ms ( 95.7%) 51.4366ms ( > 94.9%) > gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Problem size Simple Intrin Inline 100 0.0002ms (100.0%) 0.0001ms ( 68.7%) 0.0001ms ( 74.8%) 1000 0.0015ms (100.0%) 0.0011ms ( 72.0%) 0.0012ms ( 80.4%) 10000 0.0154ms (100.0%) 0.0111ms ( 72.1%) 0.0122ms ( 79.1%) 100000 0.1081ms (100.0%) 0.0759ms ( 70.2%) 0.0811ms ( 75.0%) 1000000 2.7778ms (100.0%) 2.8172ms (101.4%) 2.7929ms ( 100.5%) 10000000 28.1577ms (100.0%) 28.7332ms (102.0%) 28.4669ms ( 101.1%) It looks like memory access is the bottleneck, otherwise running 4 floats through in parallel should go a lot faster. I need to modify the program a bit and see how it works for doubles. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From grrrr.org at gmail.com Sat Mar 22 20:06:54 2008 From: grrrr.org at gmail.com (Thomas Grill) Date: Sun, 23 Mar 2008 01:06:54 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: Hi, here's my results: Intel Core 2 Duo, 2.16GHz, 667MHz bus, 4MB Cache running under OSX 10.5.2 please note that the auto-vectorizer of gcc-4.3 is doing really well.... gr~~~ --------------------- gcc version 4.0.1 (Apple Inc. build 5465) xbook-2:temp thomas$ gcc -msse -O2 vec_bench.c -o vec_bench xbook-2:temp thomas$ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0002ms (100.0%) 0.0001ms ( 83.2%) 0.0001ms ( 85.1%) 1000 0.0014ms (100.0%) 0.0014ms ( 99.5%) 0.0014ms ( 97.6%) 10000 0.0180ms (100.0%) 0.0137ms ( 76.1%) 0.0103ms ( 56.9%) 100000 0.1307ms (100.0%) 0.1153ms ( 88.2%) 0.0952ms ( 72.8%) 1000000 4.0309ms (100.0%) 4.1641ms (103.3%) 4.0129ms ( 99.6%) 10000000 43.2557ms (100.0%) 43.5919ms (100.8%) 42.6391ms ( 98.6%) gcc version 4.3.0 20080125 (experimental) (GCC) xbook-2:temp thomas$ gcc-4.3 -msse -O2 vec_bench.c -o vec_bench xbook-2:temp thomas$ ./vec_bench Testing methods... 
All OK Problem size Simple Intrin Inline 100 0.0002ms (100.0%) 0.0001ms ( 77.4%) 0.0001ms ( 72.0%) 1000 0.0017ms (100.0%) 0.0014ms ( 84.4%) 0.0014ms ( 79.4%) 10000 0.0173ms (100.0%) 0.0148ms ( 85.4%) 0.0104ms ( 59.9%) 100000 0.1276ms (100.0%) 0.1243ms ( 97.4%) 0.0952ms ( 74.6%) 1000000 4.0466ms (100.0%) 4.1168ms (101.7%) 4.0348ms ( 99.7%) 10000000 43.1842ms (100.0%) 43.2989ms (100.3%) 44.2171ms (102.4%) xbook-2:temp thomas$ gcc-4.3 -msse -O2 -ftree-vectorize vec_bench.c -o vec_bench xbook-2:temp thomas$ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0001ms (100.0%) 0.0001ms (126.6%) 0.0001ms (120.3%) 1000 0.0011ms (100.0%) 0.0014ms (136.3%) 0.0014ms (127.9%) 10000 0.0144ms (100.0%) 0.0153ms (106.3%) 0.0103ms ( 72.0%) 100000 0.1027ms (100.0%) 0.1243ms (121.0%) 0.0953ms ( 92.8%) 1000000 3.9691ms (100.0%) 4.1197ms (103.8%) 4.0252ms (101.4%) 10000000 42.1922ms (100.0%) 43.6721ms (103.5%) 43.4035ms (102.9%) From charlesr.harris at gmail.com Sat Mar 22 20:34:29 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 18:34:29 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: On Sat, Mar 22, 2008 at 5:32 PM, Charles R Harris wrote: > > > On Sat, Mar 22, 2008 at 5:03 PM, James Philbin wrote: > > > OK, i've written a simple benchmark which implements an elementwise > > multiply (A=B*C) in three different ways (standard C, intrinsics, hand > > coded assembly). On the face of things the results seem to indicate > > that the vectorization works best on medium sized inputs. If people > > could post the results of running the benchmark on their machines > > (takes ~1min) along with the output of gcc --version and their chip > > model, that wd be v useful. 
> > > > It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench > > > > Here's two: > > > > CPU: Core Duo T2500 @ 2GHz > > gcc --version: gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) > > Problem size Simple Intrin > > Inline > > 100 0.0003ms (100.0%) 0.0002ms ( 67.7%) 0.0002ms ( > > 50.6%) > > 1000 0.0030ms (100.0%) 0.0021ms ( 69.2%) 0.0015ms ( > > 50.6%) > > 10000 0.0370ms (100.0%) 0.0267ms ( 72.0%) 0.0279ms ( > > 75.4%) > > 100000 0.2258ms (100.0%) 0.1469ms ( 65.0%) 0.1273ms ( > > 56.4%) > > 1000000 4.5690ms (100.0%) 4.4616ms ( 97.6%) 4.4185ms ( > > 96.7%) > > 10000000 47.0022ms (100.0%) 45.4100ms ( 96.6%) 44.4437ms ( > > 94.6%) > > > > CPU: Intel Xeon E5345 @ 2.33Ghz > > gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) > > Problem size Simple Intrin > > Inline > > 100 0.0001ms (100.0%) 0.0001ms ( 69.2%) 0.0001ms ( > > 77.4%) > > 1000 0.0010ms (100.0%) 0.0008ms ( 78.1%) 0.0009ms ( > > 86.6%) > > 10000 0.0108ms (100.0%) 0.0088ms ( 81.2%) 0.0086ms ( > > 79.6%) > > 100000 0.1131ms (100.0%) 0.0897ms ( 79.3%) 0.0872ms ( > > 77.1%) > > 1000000 5.2103ms (100.0%) 3.9153ms ( 75.1%) 3.8328ms ( > > 73.6%) > > 10000000 54.1815ms (100.0%) 51.8286ms ( 95.7%) 51.4366ms ( > > 94.9%) > > > > gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) > cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz > > Problem size Simple Intrin > Inline > 100 0.0002ms (100.0%) 0.0001ms ( 68.7%) 0.0001ms ( > 74.8%) > 1000 0.0015ms (100.0%) 0.0011ms ( 72.0%) 0.0012ms ( > 80.4%) > 10000 0.0154ms (100.0%) 0.0111ms ( 72.1%) 0.0122ms ( > 79.1%) > 100000 0.1081ms (100.0%) 0.0759ms ( 70.2%) 0.0811ms ( > 75.0%) > 1000000 2.7778ms (100.0%) 2.8172ms (101.4%) 2.7929ms ( > 100.5%) > 10000000 28.1577ms (100.0%) 28.7332ms (102.0%) 28.4669ms ( > 101.1%) > > It looks like memory access is the bottleneck, otherwise running 4 floats > through in parallel should go a lot faster. I need to modify the program a > bit and see how it works for doubles. > Doubles don't look so good running on a 32 bit OS. Maybe alignment would help. Compiled with gcc -msse2 -mfpmath=sse -O2 vec_bench_dbl.c -o vec_bench_dbl Problem size Simple Intrin 100 0.0002ms (100.0%) 0.0002ms (149.5%) 1000 0.0015ms (100.0%) 0.0024ms (159.0%) 10000 0.0219ms (100.0%) 0.0180ms ( 81.9%) 100000 0.1518ms (100.0%) 0.1686ms (111.1%) 1000000 5.5588ms (100.0%) 5.8145ms (104.6%) 10000000 56.7152ms (100.0%) 59.3139ms (104.6%) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Mar 22 20:41:31 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 18:41:31 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: On Sat, Mar 22, 2008 at 6:34 PM, Charles R Harris wrote: I've attached a double version. Compile with gcc -msse2 -mfpmath=sse -O2 vec_bench_dbl.c -o vec_bench_dbl Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vec_bench_dbl.c Type: text/x-csrc Size: 4008 bytes Desc: not available URL: From aisaac at american.edu Sat Mar 22 21:02:19 2008 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 22 Mar 2008 21:02:19 -0400 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: <9457e7c80803221521i4c7e7d58l2ec2cd4d6caecf24@mail.gmail.com> References: <1206074150.8490.28.camel@bbc8><710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il><9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com><9457e7c80803220840w7b1faac2m9edfe3706981ec7d@mail.gmail.com><9457e7c80803220840n4d7e8731xc828f0dafdaf2e4a@mail.gmail.com><9457e7c80803221521i4c7e7d58l2ec2cd4d6caecf24@mail.gmail.com> Message-ID: > On Sat, Mar 22, 2008 at 5:49 PM, Alan G Isaac > wrote: >> Are you trying to suggest that in most matrix programming >> languages if you extract a row you will then need to use two >> indices to extract an element of that row? This does not >> match my experience. I would ask you to justify that by >> listing the languages you have in mind. On Sat, 22 Mar 2008, St?fan van der Walt apparently wrote: > No, I agree with you that that is unintuitive -- but it can be solved > by introducing Row and ColumnVectors, which are still 2-dimensional. To me, this seems to be adding a needless level of complexity. I am not necessarily opposing it; I just do not see a commensurate payoff. In contrast, I see great payoff to keeping as much ndarray behavior as possible. > One important result you don't want is: > In [9]: x = np.array([[1,2,3],[4,5,6],[7,8,9]]) > In [10]: x[:,0] > Out[10]: array([1, 4, 7]) Agreed. I would hope it has been clear from earlier discussion that the proposal retains that any use of multiple indexes will produce a 2d submatrix. That offers a simple way to say how matrix indexing will differ from ndarray indexing. > Do I understand correctly that you want M[0,:] and M[0] to > behave differently? Yes. Again, I think that I have been consistent on this point. Any use of multiple indexes such as M[0,:] will produce a 2d submatrix. Any use of scalar indexes such as M[0] behave as with an ndarray. > Would you like M[0] to return the first element of the > matrix as in Octave? No! Deviations from ndarray behavior should be minimized. They should be: 1. Multiplication is redefined to matrix multiplication. 2. Powers are redefined accordingly. 3. The ``A`` and ``I`` attributes. 4. Any use of multiple indexes will produce a 2d submatrix. I think that is it. > If I'm the only one who is not completely satisfied, then > please, submit a patch and have it applied. Always a reasonable request, but with respect to NumPy, I'm a user not a developer. That said, it looks to be simple: perhaps no more than adding to __getitem__ after the existing lines:: if not isinstance(out, N.ndarray): return out two new lines:: if isscalar(index): return out (Not that I like multiple points of return from a function.) 
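For anyone who would like to experiment without patching numpy itself, a toy subclass captures the proposed behaviour well enough (illustration only; it is not the patch sketched above, and the name M1 is arbitrary):

import numpy as np

class M1(np.matrix):
    # Illustrative only: a scalar index returns a 1-d ndarray,
    # everything else keeps the usual matrix behaviour.
    def __getitem__(self, index):
        if np.isscalar(index):
            return np.asarray(self)[index]
        return np.matrix.__getitem__(self, index)

m = M1([[1, 2, 3], [4, 5, 6]])
print(m[0])              # [1 2 3] -- a 1-d ndarray, as with an ordinary array
print(m[0][1])           # 2 -- chained indexing reaches the element
print(m[0, :].shape)     # (1, 3) -- multiple indexes still give a submatrix
print((m * m.T).shape)   # (2, 2) -- matrix multiplication is untouched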
Cheers, Alan Isaac From charlesr.harris at gmail.com Sat Mar 22 21:07:39 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 19:07:39 -0600 Subject: [Numpy-discussion] matrices in 1.1 In-Reply-To: References: <1206074150.8490.28.camel@bbc8> <710F2847B0018641891D9A21602763600B6F39@ex3.envision.co.il> <9457e7c80803210835p62a356dbie1d3ed03bcaae75b@mail.gmail.com> <9457e7c80803220840w7b1faac2m9edfe3706981ec7d@mail.gmail.com> <9457e7c80803220840n4d7e8731xc828f0dafdaf2e4a@mail.gmail.com> <9457e7c80803221521i4c7e7d58l2ec2cd4d6caecf24@mail.gmail.com> Message-ID: On Sat, Mar 22, 2008 at 7:02 PM, Alan G Isaac wrote: > > On Sat, Mar 22, 2008 at 5:49 PM, Alan G Isaac > > wrote: > >> Are you trying to suggest that in most matrix programming > >> languages if you extract a row you will then need to use two > >> indices to extract an element of that row? This does not > >> match my experience. I would ask you to justify that by > >> listing the languages you have in mind. > > On Sat, 22 Mar 2008, St?fan van der Walt apparently wrote: > > No, I agree with you that that is unintuitive -- but it can be solved > > by introducing Row and ColumnVectors, which are still 2-dimensional. > > To me, this seems to be adding a needless level of > complexity. I am not necessarily opposing it; > I just do not see a commensurate payoff. > In contrast, I see great payoff to keeping as much > ndarray behavior as possible. > > > > One important result you don't want is: > > In [9]: x = np.array([[1,2,3],[4,5,6],[7,8,9]]) > > In [10]: x[:,0] > > Out[10]: array([1, 4, 7]) > > Agreed. I would hope it has been clear from earlier > discussion that the proposal retains that any use > of multiple indexes will produce a 2d submatrix. > That offers a simple way to say how matrix indexing > will differ from ndarray indexing. > > > > Do I understand correctly that you want M[0,:] and M[0] to > > behave differently? > > Yes. Again, I think that I have been consistent on this point. > Any use of multiple indexes such as M[0,:] will produce a 2d submatrix. > Any use of scalar indexes such as M[0] behave as with an ndarray. > > > > Would you like M[0] to return the first element of the > > matrix as in Octave? > > No! > Deviations from ndarray behavior should be minimized. > They should be: > > 1. Multiplication is redefined to matrix multiplication. > 2. Powers are redefined accordingly. > 3. The ``A`` and ``I`` attributes. > 4. Any use of multiple indexes will produce a 2d submatrix. > > I think that is it. > > > > If I'm the only one who is not completely satisfied, then > > please, submit a patch and have it applied. > > Always a reasonable request, but with respect to NumPy, I'm > a user not a developer. That said, it looks to be simple: > perhaps no more than adding to __getitem__ after the > existing lines:: > > if not isinstance(out, N.ndarray): > return out > > two new lines:: > > if isscalar(index): > return out > > (Not that I like multiple points of return from a function.) > All this for want of an operator ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sransom at nrao.edu Sat Mar 22 21:35:31 2008 From: sransom at nrao.edu (Scott Ransom) Date: Sat, 22 Mar 2008 21:35:31 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) 
In-Reply-To: References: <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: <20080323013531.GA29801@ssh.cv.nrao.edu> Here are results under 64-bit linux using gcc-4.3 (which by default turns on the various sse flags). Note that -O3 is significantly better than -O2 for the "simple" calls: nimrod:~$ cat /proc/cpuinfo | grep "model name" | head -1 model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz nimrod:~$ gcc-4.3 --version gcc-4.3 (Debian 4.3.0-1) 4.3.1 20080309 (prerelease) nimrod:~$ gcc-4.3 -O2 vec_bench.c -o vec_bench nimrod:~$ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0001ms (100.0%) 0.0001ms ( 70.8%) 0.0001ms ( 74.3%) 1000 0.0008ms (100.0%) 0.0006ms ( 70.3%) 0.0007ms ( 80.3%) 10000 0.0085ms (100.0%) 0.0061ms ( 72.0%) 0.0067ms ( 78.8%) 100000 0.0882ms (100.0%) 0.0627ms ( 71.1%) 0.0677ms ( 76.7%) 1000000 3.6748ms (100.0%) 3.3312ms ( 90.7%) 3.3139ms ( 90.2%) 10000000 37.1154ms (100.0%) 35.9762ms ( 96.9%) 36.1126ms ( 97.3%) nimrod:~$ gcc-4.3 -O3 vec_bench.c -o vec_bench nimrod:~$ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0001ms (100.0%) 0.0001ms (111.1%) 0.0001ms (116.7%) 1000 0.0005ms (100.0%) 0.0006ms (111.3%) 0.0007ms (126.8%) 10000 0.0056ms (100.0%) 0.0061ms (108.6%) 0.0067ms (118.9%) 100000 0.0581ms (100.0%) 0.0626ms (107.8%) 0.0677ms (116.5%) 1000000 3.4549ms (100.0%) 3.3339ms ( 96.5%) 3.3255ms ( 96.3%) 10000000 34.8186ms (100.0%) 35.9767ms (103.3%) 36.1099ms (103.7%) nimrod:~$ ./vec_bench_dbl Testing methods... All OK Problem size Simple Intrin 100 0.0001ms (100.0%) 0.0001ms (132.5%) 1000 0.0009ms (100.0%) 0.0012ms (134.5%) 10000 0.0119ms (100.0%) 0.0124ms (104.1%) 100000 0.1226ms (100.0%) 0.1276ms (104.1%) 1000000 7.0047ms (100.0%) 6.6654ms ( 95.2%) 10000000 70.0060ms (100.0%) 71.9692ms (102.8%) nimrod:~$ gcc-4.3 -O3 vec_bench_dbl.c -o vec_bench_dbl nimrod:~$ ./vec_bench_dbl Testing methods... All OK Problem size Simple Intrin 100 0.0001ms (100.0%) 0.0002ms (289.8%) 1000 0.0007ms (100.0%) 0.0012ms (172.7%) 10000 0.0114ms (100.0%) 0.0124ms (109.4%) 100000 0.1159ms (100.0%) 0.1278ms (110.3%) 1000000 6.9252ms (100.0%) 6.6585ms ( 96.1%) 10000000 69.1913ms (100.0%) 71.9664ms (104.0%) On Sat, Mar 22, 2008 at 06:41:31PM -0600, Charles R Harris wrote: > On Sat, Mar 22, 2008 at 6:34 PM, Charles R Harris > wrote: > > I've attached a double version. Compile with > gcc -msse2 -mfpmath=sse -O2 vec_bench_dbl.c -o vec_bench_dbl > > Chuck > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -- -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From ndbecker2 at gmail.com Sat Mar 22 21:47:02 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 22 Mar 2008 21:47:02 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) 
References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: Thomas Grill wrote: > Hi, > here's my results: > > Intel Core 2 Duo, 2.16GHz, 667MHz bus, 4MB Cache > running under OSX 10.5.2 > > please note that the auto-vectorizer of gcc-4.3 is doing really well.... > > gr~~~ > > --------------------- > > gcc version 4.0.1 (Apple Inc. build 5465) > > xbook-2:temp thomas$ gcc -msse -O2 vec_bench.c -o vec_bench > xbook-2:temp thomas$ ./vec_bench > Testing methods... > All OK > > Problem size Simple Intrin > Inline > 100 0.0002ms (100.0%) 0.0001ms ( 83.2%) 0.0001ms ( > 85.1%) > 1000 0.0014ms (100.0%) 0.0014ms ( 99.5%) 0.0014ms ( > 97.6%) > 10000 0.0180ms (100.0%) 0.0137ms ( 76.1%) 0.0103ms ( > 56.9%) > 100000 0.1307ms (100.0%) 0.1153ms ( 88.2%) 0.0952ms ( > 72.8%) > 1000000 4.0309ms (100.0%) 4.1641ms (103.3%) 4.0129ms ( > 99.6%) > 10000000 43.2557ms (100.0%) 43.5919ms (100.8%) 42.6391ms ( > 98.6%) > > > > gcc version 4.3.0 20080125 (experimental) (GCC) > > xbook-2:temp thomas$ gcc-4.3 -msse -O2 vec_bench.c -o vec_bench > xbook-2:temp thomas$ ./vec_bench > Testing methods... > All OK > > Problem size Simple Intrin > Inline > 100 0.0002ms (100.0%) 0.0001ms ( 77.4%) 0.0001ms ( > 72.0%) > 1000 0.0017ms (100.0%) 0.0014ms ( 84.4%) 0.0014ms ( > 79.4%) > 10000 0.0173ms (100.0%) 0.0148ms ( 85.4%) 0.0104ms ( > 59.9%) > 100000 0.1276ms (100.0%) 0.1243ms ( 97.4%) 0.0952ms ( > 74.6%) > 1000000 4.0466ms (100.0%) 4.1168ms (101.7%) 4.0348ms ( > 99.7%) > 10000000 43.1842ms (100.0%) 43.2989ms (100.3%) 44.2171ms > (102.4%) > > xbook-2:temp thomas$ gcc-4.3 -msse -O2 -ftree-vectorize vec_bench.c -o > vec_bench xbook-2:temp thomas$ ./vec_bench > Testing methods... > All OK > > Problem size Simple Intrin > Inline > 100 0.0001ms (100.0%) 0.0001ms (126.6%) 0.0001ms > (120.3%) > 1000 0.0011ms (100.0%) 0.0014ms (136.3%) 0.0014ms > (127.9%) > 10000 0.0144ms (100.0%) 0.0153ms (106.3%) 0.0103ms ( > 72.0%) > 100000 0.1027ms (100.0%) 0.1243ms (121.0%) 0.0953ms ( > 92.8%) > 1000000 3.9691ms (100.0%) 4.1197ms (103.8%) 4.0252ms > (101.4%) > 10000000 42.1922ms (100.0%) 43.6721ms (103.5%) 43.4035ms > (102.9%) gcc version 4.3.0 20080307 (Red Hat 4.3.0-2) (GCC) gcc -msse -O2 -ftree-vectorize vec_bench.c -o vec_bench mock-chroot> ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0001ms (100.0%) 0.0001ms (141.6%) 0.0001ms (108.0%) 1000 0.0008ms (100.0%) 0.0011ms (149.9%) 0.0008ms (100.4%) 10000 0.0135ms (100.0%) 0.0197ms (145.8%) 0.0133ms ( 98.8%) 100000 0.6415ms (100.0%) 0.4918ms ( 76.7%) 0.5052ms ( 78.8%) 1000000 7.5364ms (100.0%) 7.9987ms (106.1%) 7.4832ms ( 99.3%) 10000000 76.3927ms (100.0%) 76.8933ms (100.7%) 75.1002ms ( 98.3%) model name : AMD Athlon(tm) 64 Processor 3200+ stepping : 10 cpu MHz : 2000.068 cache size : 1024 KB Now same, but with gcc --version gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) Testing methods... 
All OK Problem size Simple Intrin Inline 100 0.0002ms (100.0%) 0.0001ms ( 77.2%) 0.0001ms ( 58.7%) 1000 0.0015ms (100.0%) 0.0011ms ( 73.5%) 0.0008ms ( 52.6%) 10000 0.0214ms (100.0%) 0.0195ms ( 90.9%) 0.0363ms (169.3%) 100000 0.6620ms (100.0%) 0.5614ms ( 84.8%) 0.5527ms ( 83.5%) 1000000 7.5975ms (100.0%) 7.3826ms ( 97.2%) 7.3380ms ( 96.6%) 10000000 75.8361ms (100.0%) 84.0476ms (110.8%) 77.2884ms (101.9%) From charlesr.harris at gmail.com Sat Mar 22 21:48:47 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 Mar 2008 19:48:47 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <20080323013531.GA29801@ssh.cv.nrao.edu> References: <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <20080323013531.GA29801@ssh.cv.nrao.edu> Message-ID: On Sat, Mar 22, 2008 at 7:35 PM, Scott Ransom wrote: > Here are results under 64-bit linux using gcc-4.3 (which by > default turns on the various sse flags). Note that -O3 is > significantly better than -O2 for the "simple" calls: > > nimrod:~$ cat /proc/cpuinfo | grep "model name" | head -1 > model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz > > nimrod:~$ gcc-4.3 --version > gcc-4.3 (Debian 4.3.0-1) 4.3.1 20080309 (prerelease) > > nimrod:~$ gcc-4.3 -O2 vec_bench.c -o vec_bench > nimrod:~$ ./vec_bench > Testing methods... > All OK > Problem size Simple Intrin Inline > 100 0.0001ms (100.0%) 0.0001ms ( 70.8%) 0.0001ms ( 74.3%) > 1000 0.0008ms (100.0%) 0.0006ms ( 70.3%) 0.0007ms ( 80.3%) > 10000 0.0085ms (100.0%) 0.0061ms ( 72.0%) 0.0067ms ( 78.8%) > 100000 0.0882ms (100.0%) 0.0627ms ( 71.1%) 0.0677ms ( 76.7%) > 1000000 3.6748ms (100.0%) 3.3312ms ( 90.7%) 3.3139ms ( 90.2%) > 10000000 37.1154ms (100.0%) 35.9762ms ( 96.9%) 36.1126ms ( 97.3%) > > nimrod:~$ gcc-4.3 -O3 vec_bench.c -o vec_bench > nimrod:~$ ./vec_bench > Testing methods... > All OK > Problem size Simple Intrin Inline > 100 0.0001ms (100.0%) 0.0001ms (111.1%) 0.0001ms (116.7%) > 1000 0.0005ms (100.0%) 0.0006ms (111.3%) 0.0007ms (126.8%) > 10000 0.0056ms (100.0%) 0.0061ms (108.6%) 0.0067ms (118.9%) > 100000 0.0581ms (100.0%) 0.0626ms (107.8%) 0.0677ms (116.5%) > 1000000 3.4549ms (100.0%) 3.3339ms ( 96.5%) 3.3255ms ( 96.3%) > 10000000 34.8186ms (100.0%) 35.9767ms (103.3%) 36.1099ms (103.7%) > > > nimrod:~$ ./vec_bench_dbl > Testing methods... > All OK > Problem size Simple Intrin > 100 0.0001ms (100.0%) 0.0001ms (132.5%) > 1000 0.0009ms (100.0%) 0.0012ms (134.5%) > 10000 0.0119ms (100.0%) 0.0124ms (104.1%) > 100000 0.1226ms (100.0%) 0.1276ms (104.1%) > 1000000 7.0047ms (100.0%) 6.6654ms ( 95.2%) > 10000000 70.0060ms (100.0%) 71.9692ms (102.8%) > > nimrod:~$ gcc-4.3 -O3 vec_bench_dbl.c -o vec_bench_dbl > nimrod:~$ ./vec_bench_dbl > Testing methods... > All OK > Problem size Simple Intrin > 100 0.0001ms (100.0%) 0.0002ms (289.8%) > 1000 0.0007ms (100.0%) 0.0012ms (172.7%) > 10000 0.0114ms (100.0%) 0.0124ms (109.4%) > 100000 0.1159ms (100.0%) 0.1278ms (110.3%) > 1000000 6.9252ms (100.0%) 6.6585ms ( 96.1%) > 10000000 69.1913ms (100.0%) 71.9664ms (104.0%) It looks to me like the best approach here is to generate operator specific loops for arithmetic, then check the step size in the loop for contiguous data, and if found branch to a block where the pointers have been cast to the right type. 
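A rough sketch of that fast path (illustrative names only, not the actual
ufunc source; the contiguous branch is the one the compiler is free to
unroll and vectorize):

#include <stddef.h>

/* operator-specific loop for double addition */
static void
double_add_loop(char *a, ptrdiff_t astride,
                char *b, ptrdiff_t bstride,
                char *o, ptrdiff_t ostride, size_t n)
{
    size_t i;
    if (astride == sizeof(double) && bstride == sizeof(double)
            && ostride == sizeof(double)) {
        /* contiguous data: cast the pointers once, run a plain indexed loop */
        const double *pa = (const double *)a;
        const double *pb = (const double *)b;
        double *po = (double *)o;
        for (i = 0; i < n; i++) {
            po[i] = pa[i] + pb[i];
        }
    }
    else {
        /* general strided case */
        for (i = 0; i < n; i++) {
            *(double *)o = *(const double *)a + *(const double *)b;
            a += astride;
            b += bstride;
            o += ostride;
        }
    }
}
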
The loop itself could even check for operator type by switching on the function address so that the code modifications could be localized. The compiler can do the rest. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Sun Mar 23 00:59:39 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 23 Mar 2008 13:59:39 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: <47E5E3BB.7080806@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > It looks like memory access is the bottleneck, otherwise running 4 > floats through in parallel should go a lot faster. I need to modify > the program a bit and see how it works for doubles. I am not sure the benchmark is really meaningful: it does not uses aligned buffers (16 bytes alignement), and because of that, does not give a good idea of what can be expected from SSE. It shows why it is not so easy to get good performances, and why just throwing a few optimized loops won't work, though. Using sse/sse2 from unaligned buffers is a waste of time. Without this alignement, you need to take into account the alignement (using _mm_loadu_ps vs _mm_load_ps), and that's extremely slow, basically killing most of the speed increase you can expect from using sse. Here what I get with the above benchmark: 100 0.0002ms (100.0%) 0.0001ms ( 71.5%) 0.0001ms ( 85.0%) 1000 0.0014ms (100.0%) 0.0010ms ( 70.6%) 0.0013ms ( 96.8%) 10000 0.0162ms (100.0%) 0.0095ms ( 58.2%) 0.0128ms ( 78.7%) 100000 0.4189ms (100.0%) 0.4135ms ( 98.7%) 0.4149ms ( 99.0%) 1000000 5.9523ms (100.0%) 5.8933ms ( 99.0%) 5.8910ms ( 99.0%) 10000000 58.9645ms (100.0%) 58.2620ms ( 98.8%) 58.7443ms ( 99.6%) Basically, no help at all: this is on a P4, which fpu is extremely slow if not used with optimized sse. Now, if I use posix_memalign, replace the intrinsics for aligned access, and use an accurate cycle counter (cycle.h, provided by fftw). Compiled as is: Testing methods... All OK Problem size Simple Intrin Inline 100 4.16e+02 cycles (100.0%) 4.04e+02 cycles ( 97.1%) 4.92e+02 cycles (118.3%) 1000 3.66e+03 cycles (100.0%) 3.11e+03 cycles ( 84.8%) 4.10e+03 cycles (111.9%) 10000 3.47e+04 cycles (100.0%) 3.01e+04 cycles ( 86.7%) 4.06e+04 cycles (116.8%) 100000 1.36e+06 cycles (100.0%) 1.34e+06 cycles ( 98.7%) 1.45e+06 cycles (106.7%) 1000000 1.92e+07 cycles (100.0%) 1.87e+07 cycles ( 97.1%) 1.89e+07 cycles ( 98.2%) 10000000 1.86e+08 cycles (100.0%) 1.80e+08 cycles ( 96.8%) 1.81e+08 cycles ( 97.4%) Compiled with -DALIGNED, wich uses aligned access intrinsics: Testing methods... 
All OK Problem size Simple Intrin Inline 100 4.16e+02 cycles (100.0%) 1.96e+02 cycles ( 47.1%) 4.92e+02 cycles (118.3%) 1000 3.82e+03 cycles (100.0%) 1.56e+03 cycles ( 40.8%) 4.22e+03 cycles (110.4%) 10000 3.46e+04 cycles (100.0%) 1.92e+04 cycles ( 55.5%) 4.13e+04 cycles (119.4%) 100000 1.32e+06 cycles (100.0%) 1.12e+06 cycles ( 85.0%) 1.16e+06 cycles ( 87.8%) 1000000 1.95e+07 cycles (100.0%) 1.92e+07 cycles ( 98.3%) 1.95e+07 cycles (100.2%) 10000000 1.82e+08 cycles (100.0%) 1.79e+08 cycles ( 98.4%) 1.81e+08 cycles ( 99.3%) This gives a drastic difference (I did not touch inline code, because it is sunday and I am lazy). If I use this on a sane CPU (core 2 duo, macbook) instead of my pentium4, I get better results (in particular, sse code is never slower, and I get a double speed increase as long as the buffer can be in cache). It looks like using prefect also gives some improvements when on the edge of the cache size (my P4 has a 512 kb L2 cache): Testing methods... All OK Problem size Simple Intrin Inline 100 4.16e+02 cycles (100.0%) 2.52e+02 cycles ( 60.6%) 4.92e+02 cycles (118.3%) 1000 3.55e+03 cycles (100.0%) 1.85e+03 cycles ( 52.2%) 4.21e+03 cycles (118.7%) 10000 3.48e+04 cycles (100.0%) 1.76e+04 cycles ( 50.6%) 4.13e+04 cycles (118.9%) 100000 1.11e+06 cycles (100.0%) 7.20e+05 cycles ( 64.8%) 1.12e+06 cycles (101.3%) 1000000 1.91e+07 cycles (100.0%) 1.98e+07 cycles (103.4%) 1.91e+07 cycles (100.0%) 10000000 1.83e+08 cycles (100.0%) 1.90e+08 cycles (103.9%) 1.82e+08 cycles ( 99.3%) The code can be seen there: http://www.ar.media.kyoto-u.ac.jp/members/david/archives/t2/vec_bench.c http://www.ar.media.kyoto-u.ac.jp/members/david/archives/t2/Makefile http://www.ar.media.kyoto-u.ac.jp/members/david/archives/t2/cycle.h Another thing that I have not seen mentioned but may worth pursuing is using SSE in element-wise operations: you can have extremely fast exp, sin, cos and co using sse. Those are much easier to include in numpy (but much more difficult to implement...). See for example: http://www.pixelglow.com/macstl/ cheers, David From eads at soe.ucsc.edu Sun Mar 23 02:10:24 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Sun, 23 Mar 2008 00:10:24 -0600 Subject: [Numpy-discussion] How to set array values based on a condition? Message-ID: <47E5F450.7010909@soe.ucsc.edu> Hi, I am working on a memory-intensive experiment with very large arrays so I must be careful when allocating memory. Numpy already supports a number of in-place operations (+=, *=) making the task much more manageable. However, it is not obvious to me out I set values based on a very simple condition. The expression y[y<0]=-1 generates a binary index mask y>=0 of the same size as the array y, which is problematic when y is quite large. I was wondering if there was anything like a set_where(A, cmp, B, setval, [optional elseval]) function where cmp would be a comparison operator expressed as a string. The code below illustrates what I want to do. Admittedly, it needs to be cleaned up but it's a proof of concept. Does numpy provide any functions that support the functionality of the code below? Just a shot in the dark. Thanks! Damian import scipy import scipy.weave import types _valid_cmps = ("==", "<=", ">=", "<", ">", "!=") _array_type = type(scipy.array([])) def set_where(x, cmp, cmpv, v, ev=None): """ Sets every value in the array x to a specific value given a condition. It performs x[x cmp cmpv] = v efficiently where cmp can be any one of the strings "==", "<=", ">=", "<", ">", or "!=" Examples: 1. 
Sets x[i] to the value of -1 whenever x > 0. set_where(x, ">", 0, -1) 2. Sets x[i] to the value of v[i] whenever x > 0. (x and v must be the same size) set_where(x, ">", 0, v) 3. Sets x[i] to the value of v[i] whenever x[i] != y[i]. (x, y and v must be the same size) set_where(x, "!=", y, v) 3. Sets x[i] to the value of v[i] whenever x[i] != y[i]. Otherwise sets x[i] = z[i]. (x, y, v, and z must be the same size) set_where(x, "!=", y, v, z) """ if cmp not in _valid_cmps: raise ValueError("%s is not one of the valid comparators (%s)" % (cmp, _valid_cmps)) #endif vind = '' if type(v) == _array_type: vind = '[i]' cmpvind = '' if type(cmpv) == _array_type: cmpvind = '[i]' n = x.size i = 0 vars = ['i', 'x', 'cmp', 'cmpv', 'v', 'n', 'ev'] else_block = "" if ev is not None: evind = "" if type(ev) == _array_type: evind = "[i]" else_block = """ else { x[i] = ev%s; } """ % evind else: ev = 0 code = """ for (i=0; i<=n;i++) { if (x[i] %s cmpv%s) { x[i] = v%s; } %s } """ % (cmp, cmpvind, vind, else_block) print code scipy.weave.inline(code, vars) From charlesr.harris at gmail.com Sun Mar 23 02:18:34 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 23 Mar 2008 00:18:34 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E5E3BB.7080806@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <47E5E3BB.7080806@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Mar 22, 2008 at 10:59 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > It looks like memory access is the bottleneck, otherwise running 4 > > floats through in parallel should go a lot faster. I need to modify > > the program a bit and see how it works for doubles. > > I am not sure the benchmark is really meaningful: it does not uses > aligned buffers (16 bytes alignement), and because of that, does not > give a good idea of what can be expected from SSE. It shows why it is > not so easy to get good performances, and why just throwing a few > optimized loops won't work, though. Using sse/sse2 from unaligned > buffers is a waste of time. Without this alignement, you need to take > into account the alignement (using _mm_loadu_ps vs _mm_load_ps), and > that's extremely slow, basically killing most of the speed increase you > can expect from using sse. > Yep, but I expect the compilers to take care of alignment, say by inserting a few single ops when needed. So I would just as soon leave vectorization to the compilers and wait until they get that good. The only thing needed then is contiguous data. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Sun Mar 23 02:14:26 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 23 Mar 2008 15:14:26 +0900 Subject: [Numpy-discussion] More tickets to close (594, 571, 644, 654) ? Message-ID: <47E5F542.1050705@ar.media.kyoto-u.ac.jp> Hi, I think the two following tickets can be closed, but I am not 100 % sure: - 594: I think this one is invalid, because the benchmark does not really measure what the reporter think it does. - 571: This one is fixed, no ? - 654: is there a standardized way to handle tests to skip ? 
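(For reference, the aligned-buffer route described above looks roughly
like the following sketch, assuming a POSIX system with posix_memalign
and the SSE intrinsics from <xmmintrin.h>; it is not numpy code.)

#include <stdlib.h>
#include <xmmintrin.h>

/* allocate n floats on a 16-byte boundary */
static float *alloc_aligned(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0)
        return NULL;
    return (float *)p;
}

/* a, b, o must come from alloc_aligned and n must be a multiple of 4 */
static void mul_sse(const float *a, const float *b, float *o, size_t n)
{
    size_t i;
    for (i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);  /* aligned load, not _mm_loadu_ps */
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(o + i, _mm_mul_ps(va, vb));
    }
}
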
I posted a patch which should fix the issue, but a message is printed on stdout when the concerned test is skipped, which is not really nice. cheers, David From david at ar.media.kyoto-u.ac.jp Sun Mar 23 02:21:26 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 23 Mar 2008 15:21:26 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <47E5E3BB.7080806@ar.media.kyoto-u.ac.jp> Message-ID: <47E5F6E6.3070800@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > Yep, but I expect the compilers to take care of alignment, say by > inserting a few single ops when needed. The other solution would be to have aligned allocators (it won't solve all cases, of course). Because the compilers will never be able to take care of the cases where you call external libraries (fftw, where we could have between 50 % and more than 100 % speed increase if we had aligned data buffers by default). > So I would just as soon leave vectorization to the compilers and wait > until they get that good. The only thing needed then is contiguous data. For non contiguous data, things will be extremely slow anyway, so I don't think those need a lot of attention. If you care about performances, you should not use non contiguous data. cheers, David From peridot.faceted at gmail.com Sun Mar 23 02:41:57 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 23 Mar 2008 02:41:57 -0400 Subject: [Numpy-discussion] How to set array values based on a condition? In-Reply-To: <47E5F450.7010909@soe.ucsc.edu> References: <47E5F450.7010909@soe.ucsc.edu> Message-ID: On 23/03/2008, Damian Eads wrote: > Hi, > > I am working on a memory-intensive experiment with very large arrays so > I must be careful when allocating memory. Numpy already supports a > number of in-place operations (+=, *=) making the task much more > manageable. However, it is not obvious to me out I set values based on a > very simple condition. > > The expression > > y[y<0]=-1 > > generates a binary index mask y>=0 of the same size as the array y, > which is problematic when y is quite large. > > I was wondering if there was anything like a set_where(A, cmp, B, > setval, [optional elseval]) function where cmp would be a comparison > operator expressed as a string. > > The code below illustrates what I want to do. Admittedly, it needs to be > cleaned up but it's a proof of concept. Does numpy provide any functions > that support the functionality of the code below? That's a good question, but I'm pretty sure it doesn't, apart from numpy.clip(). The way I'd try to solve that problem would be with the dreaded for loop. Don't iterate over single elements, but if you have a gargantuan array, working in chunks of ten thousand (or whatever) won't have too much overhead: block = 100000 for n in arange(0,len(y),block): yc = y[n:n+block] yc[yc<0] = -1 It's a bit of a pain, but working with arrays that nearly fill RAM *is* a pain, as I'm sure you are all too aware by now. You might look into numexpr, this is the sort of thing it does (though I've never used it and can't say whether it can do this). 
Anne From emanuele at relativita.com Sun Mar 23 04:20:28 2008 From: emanuele at relativita.com (Emanuele Olivetti) Date: Sun, 23 Mar 2008 09:20:28 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: <47E612CC.30705@relativita.com> James Philbin wrote: > OK, i've written a simple benchmark which implements an elementwise > multiply (A=B*C) in three different ways (standard C, intrinsics, hand > coded assembly). On the face of things the results seem to indicate > that the vectorization works best on medium sized inputs. If people > could post the results of running the benchmark on their machines > (takes ~1min) along with the output of gcc --version and their chip > model, that wd be v useful. > > It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench > CPU: Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz (macbook, intel core 2 duo) gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2) (ubuntu gutsy gibbon 7.10) $ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0003ms (100.0%) 0.0002ms ( 68.3%) 0.0002ms ( 75.6%) 1000 0.0023ms (100.0%) 0.0018ms ( 76.7%) 0.0020ms ( 87.1%) 10000 0.0361ms (100.0%) 0.0193ms ( 53.4%) 0.0338ms ( 93.7%) 100000 0.2839ms (100.0%) 0.1351ms ( 47.6%) 0.0937ms ( 33.0%) 1000000 4.2108ms (100.0%) 4.1234ms ( 97.9%) 4.0886ms ( 97.1%) 10000000 45.3192ms (100.0%) 45.5359ms (100.5%) 45.3466ms (100.1%) Note that there is some variance in the results. Here is a second run to have an idea (look at Inline, size=10000): $ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0003ms (100.0%) 0.0002ms ( 69.5%) 0.0002ms ( 74.1%) 1000 0.0024ms (100.0%) 0.0018ms ( 75.9%) 0.0020ms ( 86.4%) 10000 0.0324ms (100.0%) 0.0186ms ( 57.3%) 0.0226ms ( 69.6%) 100000 0.2840ms (100.0%) 0.1171ms ( 41.2%) 0.0939ms ( 33.1%) 1000000 4.4034ms (100.0%) 4.3657ms ( 99.1%) 4.0465ms ( 91.9%) 10000000 44.4854ms (100.0%) 43.9502ms ( 98.8%) 43.6824ms ( 98.2%) HTH Emanuele From philbinj at gmail.com Sun Mar 23 06:19:09 2008 From: philbinj at gmail.com (James Philbin) Date: Sun, 23 Mar 2008 10:19:09 +0000 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E612CC.30705@relativita.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <47E612CC.30705@relativita.com> Message-ID: <2b1c8c4f0803230319m52aa12f1x46382ea97b23e9a1@mail.gmail.com> Wow, a much more varied set of results than I was expecting. Could someone who has gcc 4.3 installed compile it with: gcc -msse -O2 -ftree-vectorize -ftree-vectorizer-verbose=5 -S vec_bench.c -o vec_bench.s And attach vec_bench.s and the verbose output from gcc. 
James From stefan at sun.ac.za Sun Mar 23 07:01:58 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 23 Mar 2008 12:01:58 +0100 Subject: [Numpy-discussion] More tickets to close (594, 571, 644, 654) ? In-Reply-To: <47E5F542.1050705@ar.media.kyoto-u.ac.jp> References: <47E5F542.1050705@ar.media.kyoto-u.ac.jp> Message-ID: <9457e7c80803230401q28bfdd52vb9b9faaec158e4c8@mail.gmail.com> Hi David On Sun, Mar 23, 2008 at 7:14 AM, David Cournapeau wrote: > - 571: This one is fixed, no ? Works fine on my machine, so I closed the ticket. > - 654: is there a standardized way to handle tests to skip ? I > posted a patch which should fix the issue, but a message is printed on > stdout when the concerned test is skipped, which is not really nice. We'll be switching to nose for 1.1, so then the problem will simply go away. I think the workarounds are fine for 1.0.5. Regards St?fan From xavier.gnata at gmail.com Sun Mar 23 07:03:51 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Sun, 23 Mar 2008 12:03:51 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E55B0F.4060003@enthought.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> Message-ID: <47E63917.4080400@gmail.com> Travis E. Oliphant wrote: > Anne Archibald wrote: > >> On 22/03/2008, Travis E. Oliphant wrote: >> >> >>> James Philbin wrote: >>> > Personally, I think that the time would be better spent optimizing >>> > routines for single-threaded code and relying on BLAS and LAPACK >>> > libraries to use multiple cores for more complex calculations. In >>> > particular, doing some basic loop unrolling and SSE versions of the >>> > ufuncs would be beneficial. I have some experience writing SSE code >>> > using intrinsics and would be happy to give it a shot if people tell >>> > me what functions I should focus on. >>> >>> Fabulous! This is on my Project List of todo items for NumPy. See >>> http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend >>> some time refactoring the ufunc loops so that the templating does not >>> get in the way of doing this on a case by case basis. >>> >>> 1) You should focus on the math operations: add, subtract, multiply, >>> divide, and so forth. >>> 2) Then for "combined operations" we should expose the functionality at >>> a high-level. So, that somebody could write code to take advantage of it. >>> >>> It would be easiest to use intrinsics which would then work for AMD, >>> Intel, on multiple compilers. >>> >>> >> I think even heavier use of code generation would be a good idea here. >> There are so many different versions of each loop, and the fastest way >> to run each one is going to be different for different versions and >> different platforms, that a routine that assembled the code from >> chunks and picked the fastest combination for each instance might make >> a big difference - this is roughly what FFTW and ATLAS do. >> >> There are also some optimizations to be made at a higher level that >> might give these optimizations more traction. For example: >> >> A = randn(100*100) >> A.shape = (100,100) >> A*A >> >> There's no reason the multiply ufunc couldn't flatten A and use a >> single unstrided loop to do the multiplication. 
>> >> > Good idea, it does already do that :-) The ufunc machinery is also a > good place for an optional thread pool. > > Perhaps we could drum up interest in a Need for Speed Sprint on NumPy > sometime over the next few months. > > > -Travis O. > Hi, I have a very limited knowledge of openmp but please consider this testcase : #include #include #include #include #define N 100000000 int main(void) { double *data; data = malloc(N*sizeof(double)); long i; #pragma omp parallel for for(i=0;i References: <777651ce0803201041x7cfe5aa9q5128663bea65c488@mail.gmail.com> Message-ID: <9457e7c80803230418k576bcd90o11180043615123a2@mail.gmail.com> On Thu, Mar 20, 2008 at 6:41 PM, P GM wrote: > That particular test in test_old_ma will never work: the .data of a > masked array is implemented as a property, so its id will change from > one test to another. I removed the broken test in r4934. Nils: the segfault you reported is now gone too (but we still have other memory errors to address). Regards St?fan From david at ar.media.kyoto-u.ac.jp Sun Mar 23 07:11:07 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 23 Mar 2008 20:11:07 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E63917.4080400@gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <47E63917.4080400@gmail.com> Message-ID: <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> Gnata Xavier wrote: > > Hi, > > I have a very limited knowledge of openmp but please consider this > testcase : > > Honestly, if it was that simple, it would already have been done for a long time. The problem is that your test-case is not even remotely close to how things have to be done in numpy. cheers, David From faltet at carabos.com Sun Mar 23 08:41:20 2008 From: faltet at carabos.com (Francesc Altet) Date: Sun, 23 Mar 2008 13:41:20 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> Message-ID: <200803231341.21034.faltet@carabos.com> A Sunday 23 March 2008, Charles R Harris escrigu?: > gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) > cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz > > Problem size Simple Intrin > Inline > 100 0.0002ms (100.0%) 0.0001ms ( 68.7%) > 0.0001ms ( 74.8%) > 1000 0.0015ms (100.0%) 0.0011ms ( 72.0%) > 0.0012ms ( 80.4%) > 10000 0.0154ms (100.0%) 0.0111ms ( 72.1%) > 0.0122ms ( 79.1%) > 100000 0.1081ms (100.0%) 0.0759ms ( 70.2%) > 0.0811ms ( 75.0%) > 1000000 2.7778ms (100.0%) 2.8172ms (101.4%) > 2.7929ms ( 100.5%) > 10000000 28.1577ms (100.0%) 28.7332ms (102.0%) > 28.4669ms ( 101.1%) I'm mystified about your machine requiring just 28s for completing the 10 million test, and most of the other, similar processors (some faster than yours), in this thread falls pretty far from your figure. What sort of memory subsystem are you using? > It looks like memory access is the bottleneck, otherwise running 4 > floats through in parallel should go a lot faster. Yes, that's probably right. This test is mainly measuring the memory access speed of machines for large datasets. 
For small ones, my guess is that the data is directly placed in caches, so there is no need to transport them to the CPU prior to do the calculations. However, I'm not sure whether this kind of optimizations for small datasets would be very useful in practice (read general NumPy calculations), but I'm rather sceptical about this. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From faltet at carabos.com Sun Mar 23 08:47:09 2008 From: faltet at carabos.com (Francesc Altet) Date: Sun, 23 Mar 2008 13:47:09 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> Message-ID: <200803231347.09230.faltet@carabos.com> A Sunday 23 March 2008, David Cournapeau escrigu?: > Gnata Xavier wrote: > > Hi, > > > > I have a very limited knowledge of openmp but please consider this > > testcase : > > Honestly, if it was that simple, it would already have been done for > a long time. The problem is that your test-case is not even remotely > close to how things have to be done in numpy. Why not? IMHO, complex operations requiring a great deal of operations per word, like trigonometric, exponential, etc..., are the best candidates to take advantage of several cores or even SSE instructions (not sure whether SSE supports this sort of operations, though). At any rate, this is exactly the kind of parallel optimizations that make sense in Numexpr, in the sense that you could obtain decent speedups with multicore processors. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From faltet at carabos.com Sun Mar 23 08:54:41 2008 From: faltet at carabos.com (Francesc Altet) Date: Sun, 23 Mar 2008 13:54:41 +0100 Subject: [Numpy-discussion] Fwd: Re: Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) Message-ID: <200803231354.42793.faltet@carabos.com> Hi, Here are my results for an AMD Opteron machine: gcc version 4.1.3 (SUSE Linux) | Dual Core AMD Opteron 270 @ 2 GHz $ gcc -msse -O2 vec_bench.c -o vec_bench $ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0005ms (100.0%) 0.0003ms ( 48.5%) 0.0002ms ( 36.6%) 1000 0.0030ms (100.0%) 0.0023ms ( 75.3%) 0.0015ms ( 51.2%) 10000 0.0423ms (100.0%) 0.0387ms ( 91.5%) 0.0271ms ( 63.9%) 100000 0.6138ms (100.0%) 0.5978ms ( 97.4%) 0.5834ms ( 95.0%) 1000000 5.1213ms (100.0%) 5.0689ms ( 99.0%) 4.8771ms ( 95.2%) 10000000 51.6820ms (100.0%) 51.0792ms ( 98.8%) 51.1346ms ( 98.9%) Using gcc version 4.2.1 (SUSE Linux) | Dual Core AMD Opteron 270 @ 2 GHz $ gcc -msse -O2 vec_bench.c -o vec_bench $ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0005ms (100.0%) 0.0003ms ( 49.0%) 0.0002ms ( 37.6%) 1000 0.0030ms (100.0%) 0.0023ms ( 75.4%) 0.0016ms ( 51.5%) 10000 0.0422ms (100.0%) 0.0387ms ( 91.7%) 0.0273ms ( 64.7%) 100000 0.5833ms (100.0%) 0.5190ms ( 89.0%) 0.4756ms ( 81.5%) 1000000 5.2302ms (100.0%) 4.6074ms ( 88.1%) 4.4121ms ( 84.4%) 10000000 50.2559ms (100.0%) 48.5409ms ( 96.6%) 49.2436ms ( 98.0%) and for my laptop wearing a Pentium 4 Mobile @ 2 GHz: Using version 4.1.3 (Ubuntu 4.1.2-16ubuntu2) $ gcc -msse -O2 vec_bench.c -o vec_bench $ ./vec_bench Testing methods... 
All OK Problem size Simple Intrin Inline 100 0.0002ms (100.0%) 0.0002ms ( 88.8%) 0.0002ms (103.1%) 1000 0.0020ms (100.0%) 0.0015ms ( 75.9%) 0.0021ms (103.5%) 10000 0.0198ms (100.0%) 0.1507ms (761.8%) 0.0205ms (103.6%) 100000 1.6296ms (100.0%) 1.2533ms ( 76.9%) 1.2586ms ( 77.2%) 1000000 13.9571ms (100.0%) 12.8786ms ( 92.3%) 13.6840ms ( 98.0%) 10000000 135.3217ms (100.0%) 128.5314ms ( 95.0%) 128.5189ms ( 95.0%) Using gcc version 4.2.1 (Ubuntu 4.2.1-5ubuntu4) $ gcc -msse -O2 vec_bench.c -o vec_bench $ ./vec_bench Testing methods... All OK Problem size Simple Intrin Inline 100 0.0002ms (100.0%) 0.0002ms ( 90.6%) 0.0002ms (103.9%) 1000 0.0022ms (100.0%) 0.0017ms ( 75.2%) 0.0020ms ( 90.1%) 10000 0.0181ms (100.0%) 0.2540ms (1403.8%) 0.0319ms (176.5%) 100000 1.2600ms (100.0%) 1.2710ms (100.9%) 1.3510ms (107.2%) 1000000 12.9181ms (100.0%) 12.8595ms ( 99.5%) 12.9160ms (100.0%) 10000000 128.8301ms (100.0%) 128.2373ms ( 99.5%) 128.4255ms ( 99.7%) It is curious to see a venerable Pentium 4 running this code 2x faster than a powerful AMD Opteron for small datasets (<10000), and with similar speed than recent Core2 processors. I suppose the first level cache in Pentiums is pretty fast. Cheers, -- Francesc Altet ------------------------------------------------------- -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From david at ar.media.kyoto-u.ac.jp Sun Mar 23 08:53:29 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 23 Mar 2008 21:53:29 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <200803231347.09230.faltet@carabos.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> Message-ID: <47E652C9.2030001@ar.media.kyoto-u.ac.jp> Francesc Altet wrote: > > Why not? IMHO, complex operations requiring a great deal of operations > per word, like trigonometric, exponential, etc..., are the best > candidates to take advantage of several cores or even SSE instructions > (not sure whether SSE supports this sort of operations, though). I was talking about the general "using openmp" thing in numpy context. If it was just adding one line at one place in the source code, someone would already have done it, no ? But there are build issues, for example: you have to add support for openmp at compilation and link, you have to make sure it works with compilers which do not support it. Even without taking into account the build issues, there is the problem of correctly annotating the source code depending on the context. For example, many interesting places where to use openmp in numpy would need more than just using the "parallel for" pragma. From what I know of openMP, the annotations may depend on the kind of operation you are doing (independent element-wise operations or not). Also, the test case posted before use a really big N, where you are sure that using multi-thread is efficient. What happens if N is small ? Basically, the posted test is the best situation which can happen (big N, known operation with known context, etc...). That's a proof that openMP works, not that it can work for numpy. I find the example of sse rather enlightening: in theory, you should expect a 100-300 % speed increase using sse, but even with pure C code in a controlled manner, on one platform (linux + gcc), with varying, recent CPU, the results are fundamentally different. 
So what would happen in numpy, where you don't control things that much ? cheers, David From faltet at carabos.com Sun Mar 23 09:05:28 2008 From: faltet at carabos.com (Francesc Altet) Date: Sun, 23 Mar 2008 14:05:28 +0100 Subject: [Numpy-discussion] How to set array values based on a condition? In-Reply-To: References: <47E5F450.7010909@soe.ucsc.edu> Message-ID: <200803231405.29076.faltet@carabos.com> A Sunday 23 March 2008, Anne Archibald escrigu?: > On 23/03/2008, Damian Eads wrote: > > Hi, > > > > I am working on a memory-intensive experiment with very large > > arrays so I must be careful when allocating memory. Numpy already > > supports a number of in-place operations (+=, *=) making the task > > much more manageable. However, it is not obvious to me out I set > > values based on a very simple condition. > > > > The expression > > > > y[y<0]=-1 > > > > generates a binary index mask y>=0 of the same size as the array > > y, which is problematic when y is quite large. > > > > I was wondering if there was anything like a set_where(A, cmp, B, > > setval, [optional elseval]) function where cmp would be a > > comparison operator expressed as a string. > > > > The code below illustrates what I want to do. Admittedly, it needs > > to be cleaned up but it's a proof of concept. Does numpy provide > > any functions that support the functionality of the code below? > > That's a good question, but I'm pretty sure it doesn't, apart from > numpy.clip(). The way I'd try to solve that problem would be with the > dreaded for loop. Don't iterate over single elements, but if you have > a gargantuan array, working in chunks of ten thousand (or whatever) > won't have too much overhead: > > block = 100000 > for n in arange(0,len(y),block): > yc = y[n:n+block] > yc[yc<0] = -1 > > It's a bit of a pain, but working with arrays that nearly fill RAM > *is* a pain, as I'm sure you are all too aware by now. > > You might look into numexpr, this is the sort of thing it does > (though I've never used it and can't say whether it can do this). Well, Numexpr is designed to minimize the number of temporaries, and can do what Damian wants without requiring to put the mask in a temporary. However, the output will require new space. The usage should be something like: In [11]: y = numpy.random.normal(0, 10, 10) In [12]: numexpr.evaluate('where(y<0, -1, y)') Out[12]: array([ 7.11784295, -1. , 10.92876842, -1. , 0.76092629, -1. , 14.07021792, -1. , 5.67173405, 31.28631822]) HTH, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From matthieu.brucher at gmail.com Sun Mar 23 10:08:56 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 23 Mar 2008 15:08:56 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E652C9.2030001@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> Message-ID: > > I find the example of sse rather enlightening: in theory, you should > expect a 100-300 % speed increase using sse, but even with pure C code > in a controlled manner, on one platform (linux + gcc), with varying, > recent CPU, the results are fundamentally different. So what would > happen in numpy, where you don't control things that much ? > This means that what we measure is not what we think we measure. 
The time we get is not only dependent on the number of instructions. Did someone make a complete instrumented profile of the code that everyone is testing with callgrind or the Visual Studio profiler ? This will tell us excatly what is happening : - instructions - cache issues (that is likely to be the bottleneck, but without a proof, nothing should be done about it) - SSE efficiency - ... I think that to be really efficient, one would have to use a dynamic prefetcher, but these things are not available on x86 and even it were the case will never make it to the general public because they can't be proof tested (binary modifications on the fly). But they are really efficient when going through an array. Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.gnata at gmail.com Sun Mar 23 10:19:11 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Sun, 23 Mar 2008 15:19:11 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E652C9.2030001@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> Message-ID: <47E666DF.4090406@gmail.com> David Cournapeau wrote: > Francesc Altet wrote: > >> Why not? IMHO, complex operations requiring a great deal of operations >> per word, like trigonometric, exponential, etc..., are the best >> candidates to take advantage of several cores or even SSE instructions >> (not sure whether SSE supports this sort of operations, though). >> > > I was talking about the general "using openmp" thing in numpy context. > If it was just adding one line at one place in the source code, someone > would already have done it, no ? But there are build issues, for > example: you have to add support for openmp at compilation and link, you > have to make sure it works with compilers which do not support it. > > Even without taking into account the build issues, there is the problem > of correctly annotating the source code depending on the context. For > example, many interesting places where to use openmp in numpy would need > more than just using the "parallel for" pragma. From what I know of > openMP, the annotations may depend on the kind of operation you are > doing (independent element-wise operations or not). Also, the test case > posted before use a really big N, where you are sure that using > multi-thread is efficient. What happens if N is small ? Basically, the > posted test is the best situation which can happen (big N, known > operation with known context, etc...). That's a proof that openMP works, > not that it can work for numpy. > > I find the example of sse rather enlightening: in theory, you should > expect a 100-300 % speed increase using sse, but even with pure C code > in a controlled manner, on one platform (linux + gcc), with varying, > > recent CPU, the results are fundamentally different. So what would > happen in numpy, where you don't control things that much ? > > cheers, > > David > Well of course my goal was not to say that my simple testcase can be copied/pasted into numpy :) Of ourse it is one of the best case to use openmp. 
Of course pragma can be more complex than that (you can tell variables that can/cannot be shared for instance). The size : Using openmp will be slower on small arrays, that is clear but the user doing very large computations is smart enough to know when he need to split it's job into threads. The obvious solution is to provide the user with // and non // functions. sse : sse can help a lot but multithreading just scales where sse mono-thread based solutions don't. Build/link : It is an issue. It has to be tested. I do not know because I haven't even tried. So, IMHO it would be nice to try to put some OpenMP simple pragmas into numpy *only to see what is going on*. Even if it only work with gcc or even if...I do not know... It would be a first step. step by step :) If the performances are so bad, ok, forget about it....but it would be sad because the next generation CPU will not be more powerfull, they will "only" have more that one or two cores on the same chip. Xavier From david at ar.media.kyoto-u.ac.jp Sun Mar 23 10:25:42 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 23 Mar 2008 23:25:42 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E666DF.4090406@gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> <47E666DF.4090406@gmail.com> Message-ID: <47E66866.807@ar.media.kyoto-u.ac.jp> Gnata Xavier wrote: > Well of course my goal was not to say that my simple testcase can be > copied/pasted into numpy :) > Of ourse it is one of the best case to use openmp. > Of course pragma can be more complex than that (you can tell variables > that can/cannot be shared for instance). > > The size : Using openmp will be slower on small arrays, that is clear > but the user doing very large computations is smart enough to know when > he need to split it's job into threads. The obvious solution is to > provide the user with // and non // functions. IMHO, that's a really bad solution. It should be dynamically enabled (like in matlab, if I remember correctly). And this means having a plug subsystem to load/unload different implementation... that is one of the thing I was interested in getting done for numpy 1.1 (or above). For small arrays: how much slower ? Does it make the code slower than without open mp ? For example, what does your code says when N is 10, 100, 1000 ? > > sse : sse can help a lot but multithreading just scales where sse > mono-thread based solutions don't. It depends: it scales pretty well if you use several processus, and if you can design your application in a multi-process way. > > Build/link : It is an issue. It has to be tested. I do not know because > I haven't even tried. > > So, IMHO it would be nice to try to put some OpenMP simple pragmas into > numpy *only to see what is going on*. > > Even if it only work with gcc or even if...I do not know... It would be > a first step. step by step :) I agree about the step by step approach; I am just not sure I agree with your steps, that's all. Personally, I would first try getting a plug-in system working with numpy. But really, prove me wrong. Do it, try putting some pragma at some places in the ufunc machinery or somewhere else; as I said earlier, I would be happy to add support for open mp at the build level, at least in numscons. 
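One cheap answer to the small-N question, by the way, is OpenMP's own
"if" clause, which keeps the loop single-threaded below a cut-off.
A sketch only, not numpy code; the threshold is a made-up number that
would need per-machine tuning, and gcc needs -fopenmp:

#define OMP_THRESHOLD 20000   /* hypothetical cut-off, needs tuning */

/* element-wise multiply; threads are only spawned for large n */
void mul(const double *a, const double *b, double *o, long n)
{
    long i;
#pragma omp parallel for if(n > OMP_THRESHOLD)
    for (i = 0; i < n; i++) {
        o[i] = a[i] * b[i];
    }
}

Compiled with -fopenmp, the same binary falls back to the serial loop
for small arrays at runtime, without any extra plug-in machinery.
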
I would love being proven wrong and having a numpy which scales well with multi-core :) cheers, David From matthieu.brucher at gmail.com Sun Mar 23 10:41:47 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 23 Mar 2008 15:41:47 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E666DF.4090406@gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> <47E666DF.4090406@gmail.com> Message-ID: > > If the performances are so bad, ok, forget about it....but it would be > sad because the next generation CPU will not be more powerfull, they > will "only" have more that one or two cores on the same chip. > I don't think this is the worst that will happen. The worst is what has been seen for decades : the CPU raw power raising faster than memory speed (bandwidth and latency). With the next generation of Intel's CPU, the memory controller will at last be on the CPU and correctly shared between cores, but for the moment, with our issues, splitting this kind of parallel jobs (additions, subtractions, ...) will not enhance speed as the bottleneck is the memory controller/system bus that is already used at 100% by one core. Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Sun Mar 23 11:20:46 2008 From: faltet at carabos.com (Francesc Altet) Date: Sun, 23 Mar 2008 16:20:46 +0100 Subject: [Numpy-discussion] How to set array values based on a condition? In-Reply-To: <200803231405.29076.faltet@carabos.com> References: <47E5F450.7010909@soe.ucsc.edu> <200803231405.29076.faltet@carabos.com> Message-ID: <200803231620.46601.faltet@carabos.com> A Sunday 23 March 2008, Francesc Altet escrigu?: > A Sunday 23 March 2008, Anne Archibald escrigu?: > > On 23/03/2008, Damian Eads wrote: > > > Hi, > > > > > > I am working on a memory-intensive experiment with very large > > > arrays so I must be careful when allocating memory. Numpy already > > > supports a number of in-place operations (+=, *=) making the task > > > much more manageable. However, it is not obvious to me out I set > > > values based on a very simple condition. > > > > > > The expression > > > > > > y[y<0]=-1 > > > > > > generates a binary index mask y>=0 of the same size as the array > > > y, which is problematic when y is quite large. > > > > > > I was wondering if there was anything like a set_where(A, cmp, > > > B, setval, [optional elseval]) function where cmp would be a > > > comparison operator expressed as a string. > > > > > > The code below illustrates what I want to do. Admittedly, it > > > needs to be cleaned up but it's a proof of concept. Does numpy > > > provide any functions that support the functionality of the code > > > below? > > > > That's a good question, but I'm pretty sure it doesn't, apart from > > numpy.clip(). The way I'd try to solve that problem would be with > > the dreaded for loop. 
Don't iterate over single elements, but if > > you have a gargantuan array, working in chunks of ten thousand (or > > whatever) won't have too much overhead: > > > > block = 100000 > > for n in arange(0,len(y),block): > > yc = y[n:n+block] > > yc[yc<0] = -1 > > > > It's a bit of a pain, but working with arrays that nearly fill RAM > > *is* a pain, as I'm sure you are all too aware by now. > > > > You might look into numexpr, this is the sort of thing it does > > (though I've never used it and can't say whether it can do this). > > Well, Numexpr is designed to minimize the number of temporaries, and > can do what Damian wants without requiring to put the mask in a > temporary. However, the output will require new space. The usage > should be something like: > > In [11]: y = numpy.random.normal(0, 10, 10) > > In [12]: numexpr.evaluate('where(y<0, -1, y)') > Out[12]: > array([ 7.11784295, -1. , 10.92876842, -1. , > 0.76092629, -1. , 14.07021792, -1. , > 5.67173405, 31.28631822]) Ops. I realised that, for this particular case, Numexpr memory usage is similar to its NumPy counterpart: y[:] = numpy.where(y<0, -1, y) So, I think the best option for you should be working with chunks, as Anne suggested. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From sransom at nrao.edu Sun Mar 23 12:00:03 2008 From: sransom at nrao.edu (Scott Ransom) Date: Sun, 23 Mar 2008 12:00:03 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E5E3BB.7080806@ar.media.kyoto-u.ac.jp> References: <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <47E5E3BB.7080806@ar.media.kyoto-u.ac.jp> Message-ID: <20080323160002.GA8420@ssh.cv.nrao.edu> Hi David et al, Very interesting. I thought that the 64-bit gcc's automatically aligned memory on 16-bit (or 32-bit) boundaries. But apparently not. Because running your code certainly made the intrinsic code quite a bit faster. However, another thing that I noticed was that the "simple" code was _much_ faster using gcc-4.3 with -O3 than with -O2. I've noticed this will some other code recently as well -- the auto loop-unrolling really helps for this type of code. You can see my benchmarks here (posted there to avoind line wrap issues): http://www.cv.nrao.edu/~sransom/vec_results.txt Scott On Sun, Mar 23, 2008 at 01:59:39PM +0900, David Cournapeau wrote: > Charles R Harris wrote: > > > > It looks like memory access is the bottleneck, otherwise running 4 > > floats through in parallel should go a lot faster. I need to modify > > the program a bit and see how it works for doubles. > > I am not sure the benchmark is really meaningful: it does not uses > aligned buffers (16 bytes alignement), and because of that, does not > give a good idea of what can be expected from SSE. It shows why it is > not so easy to get good performances, and why just throwing a few > optimized loops won't work, though. Using sse/sse2 from unaligned > buffers is a waste of time. Without this alignement, you need to take > into account the alignement (using _mm_loadu_ps vs _mm_load_ps), and > that's extremely slow, basically killing most of the speed increase you > can expect from using sse. 
> > Here what I get with the above benchmark: > > 100 0.0002ms (100.0%) 0.0001ms ( 71.5%) 0.0001ms > ( 85.0%) > 1000 0.0014ms (100.0%) 0.0010ms ( 70.6%) 0.0013ms > ( 96.8%) > 10000 0.0162ms (100.0%) 0.0095ms ( 58.2%) 0.0128ms > ( 78.7%) > 100000 0.4189ms (100.0%) 0.4135ms ( 98.7%) 0.4149ms > ( 99.0%) > 1000000 5.9523ms (100.0%) 5.8933ms ( 99.0%) 5.8910ms > ( 99.0%) > 10000000 58.9645ms (100.0%) 58.2620ms ( 98.8%) 58.7443ms > ( 99.6%) > > Basically, no help at all: this is on a P4, which fpu is extremely slow > if not used with optimized sse. > > Now, if I use posix_memalign, replace the intrinsics for aligned access, > and use an accurate cycle counter (cycle.h, provided by fftw). > > Compiled as is: > > Testing methods... > All OK > > Problem size Simple > Intrin Inline > 100 4.16e+02 cycles (100.0%) 4.04e+02 cycles > ( 97.1%) 4.92e+02 cycles (118.3%) > 1000 3.66e+03 cycles (100.0%) 3.11e+03 cycles > ( 84.8%) 4.10e+03 cycles (111.9%) > 10000 3.47e+04 cycles (100.0%) 3.01e+04 cycles > ( 86.7%) 4.06e+04 cycles (116.8%) > 100000 1.36e+06 cycles (100.0%) 1.34e+06 cycles > ( 98.7%) 1.45e+06 cycles (106.7%) > 1000000 1.92e+07 cycles (100.0%) 1.87e+07 cycles > ( 97.1%) 1.89e+07 cycles ( 98.2%) > 10000000 1.86e+08 cycles (100.0%) 1.80e+08 cycles > ( 96.8%) 1.81e+08 cycles ( 97.4%) > > Compiled with -DALIGNED, wich uses aligned access intrinsics: > > Testing methods... > All OK > > Problem size Simple > Intrin Inline > 100 4.16e+02 cycles (100.0%) 1.96e+02 cycles > ( 47.1%) 4.92e+02 cycles (118.3%) > 1000 3.82e+03 cycles (100.0%) 1.56e+03 cycles > ( 40.8%) 4.22e+03 cycles (110.4%) > 10000 3.46e+04 cycles (100.0%) 1.92e+04 cycles > ( 55.5%) 4.13e+04 cycles (119.4%) > 100000 1.32e+06 cycles (100.0%) 1.12e+06 cycles > ( 85.0%) 1.16e+06 cycles ( 87.8%) > 1000000 1.95e+07 cycles (100.0%) 1.92e+07 cycles > ( 98.3%) 1.95e+07 cycles (100.2%) > 10000000 1.82e+08 cycles (100.0%) 1.79e+08 cycles > ( 98.4%) 1.81e+08 cycles ( 99.3%) > > This gives a drastic difference (I did not touch inline code, because it > is sunday and I am lazy). If I use this on a sane CPU (core 2 duo, > macbook) instead of my pentium4, I get better results (in particular, > sse code is never slower, and I get a double speed increase as long as > the buffer can be in cache). > > It looks like using prefect also gives some improvements when on the > edge of the cache size (my P4 has a 512 kb L2 cache): > > Testing methods... > All OK > > Problem size Simple > Intrin Inline > 100 4.16e+02 cycles (100.0%) 2.52e+02 cycles > ( 60.6%) 4.92e+02 cycles (118.3%) > 1000 3.55e+03 cycles (100.0%) 1.85e+03 cycles > ( 52.2%) 4.21e+03 cycles (118.7%) > 10000 3.48e+04 cycles (100.0%) 1.76e+04 cycles > ( 50.6%) 4.13e+04 cycles (118.9%) > 100000 1.11e+06 cycles (100.0%) 7.20e+05 cycles > ( 64.8%) 1.12e+06 cycles (101.3%) > 1000000 1.91e+07 cycles (100.0%) 1.98e+07 cycles > (103.4%) 1.91e+07 cycles (100.0%) > 10000000 1.83e+08 cycles (100.0%) 1.90e+08 cycles > (103.9%) 1.82e+08 cycles ( 99.3%) > > The code can be seen there: > > http://www.ar.media.kyoto-u.ac.jp/members/david/archives/t2/vec_bench.c > http://www.ar.media.kyoto-u.ac.jp/members/david/archives/t2/Makefile > http://www.ar.media.kyoto-u.ac.jp/members/david/archives/t2/cycle.h > > Another thing that I have not seen mentioned but may worth pursuing is > using SSE in element-wise operations: you can have extremely fast exp, > sin, cos and co using sse. Those are much easier to include in numpy > (but much more difficult to implement...). 
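A small self-contained illustration of the aligned-versus-unaligned point (a sketch, not the vec_bench.c code linked above; compile with "gcc -msse", SSE being on by default on x86-64): a 16-byte aligned buffer from posix_memalign lets the vector loads use _mm_load_ps, whereas a plain malloc'ed buffer would generally force the slower _mm_loadu_ps.

#include <stdio.h>
#include <stdlib.h>
#include <xmmintrin.h>

int main(void)
{
    const long n = 1024;   /* multiple of 4 floats */
    long i;
    float *x;
    float out[4];
    __m128 acc;

    /* posix_memalign guarantees the 16-byte alignment SSE wants. */
    if (posix_memalign((void **)&x, 16, n * sizeof(float)) != 0)
        return 1;
    for (i = 0; i < n; i++)
        x[i] = 1.0f;

    acc = _mm_setzero_ps();
    for (i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(x + i));   /* aligned load */

    _mm_storeu_ps(out, acc);
    printf("%f\n", out[0] + out[1] + out[2] + out[3]);   /* 1024.0 */
    free(x);
    return 0;
}

Swapping _mm_load_ps for _mm_loadu_ps keeps this correct on unaligned data but gives back much of the speed, which is why aligned allocators keep coming up in this thread.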
See for example: > > http://www.pixelglow.com/macstl/ > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -- -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From david at ar.media.kyoto-u.ac.jp Sun Mar 23 12:14:48 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 24 Mar 2008 01:14:48 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <20080323160002.GA8420@ssh.cv.nrao.edu> References: <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <9457e7c80803221500s1aaf6639oeb6201ae70f64683@mail.gmail.com> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <47E5E3BB.7080806@ar.media.kyoto-u.ac.jp> <20080323160002.GA8420@ssh.cv.nrao.edu> Message-ID: <47E681F8.3000503@ar.media.kyoto-u.ac.jp> Scott Ransom wrote: > Hi David et al, > > Very interesting. I thought that the 64-bit gcc's automatically > aligned memory on 16-bit (or 32-bit) boundaries. Note that I am talking about bytes, not bits. Default alignement depend on many parameters, like the OS, C runtime. For example, on mac os X, malloc defaults to 16 bytes aligned (I guess this comes from ppc ages, where the only way to keep up with x86 was to aggressively use altivec). On glibc, it is 8 bytes aligned; for big sizes (where big is linked to the mmap threshold), it is almost never 16 bytes aligned (there was a discussion on this on numpy ML initiated by Steve G. Johnson, one of the main FFTW developer). I don't know about dependency on 64 bits archs. IMHO, the only real solution for this point is to have some support for aligned buffers in numpy, with aligned memory allocators. cheers, David From charlesr.harris at gmail.com Sun Mar 23 12:31:17 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 23 Mar 2008 10:31:17 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <200803231341.21034.faltet@carabos.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <200803231341.21034.faltet@carabos.com> Message-ID: On Sun, Mar 23, 2008 at 6:41 AM, Francesc Altet wrote: > A Sunday 23 March 2008, Charles R Harris escrigu?: > > gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) > > cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz > > > > Problem size Simple Intrin > > Inline > > 100 0.0002ms (100.0%) 0.0001ms ( 68.7%) > > 0.0001ms ( 74.8%) > > 1000 0.0015ms (100.0%) 0.0011ms ( 72.0%) > > 0.0012ms ( 80.4%) > > 10000 0.0154ms (100.0%) 0.0111ms ( 72.1%) > > 0.0122ms ( 79.1%) > > 100000 0.1081ms (100.0%) 0.0759ms ( 70.2%) > > 0.0811ms ( 75.0%) > > 1000000 2.7778ms (100.0%) 2.8172ms (101.4%) > > 2.7929ms ( 100.5%) > > 10000000 28.1577ms (100.0%) 28.7332ms (102.0%) > > 28.4669ms ( 101.1%) > > I'm mystified about your machine requiring just 28s for completing the > 10 million test, and most of the other, similar processors (some faster > than yours), in this thread falls pretty far from your figure. What > sort of memory subsystem are you using? 
> Yeah, I noticed that ;) The cpu is an E6600, which was the low end of the performance core duo processors before the recent Intel releases, the north bridge (memory controller) is a P35, and the memory is DDR2 running at 800 MHz with 4-4-4-12 timing. The only things I tweaked were the memory voltage and timings. Raising the memory speed from 667 to 800 made a noticeable difference in my perception of speed, which is remarkable in itself. The motherboard was cheap, it goes for $70 these days. I've seen folks overclocking the E6600 up to 3.8 GHz and over 3GHz is common. Sometimes it's almost tempting... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From philbinj at gmail.com Sun Mar 23 14:22:48 2008 From: philbinj at gmail.com (James Philbin) Date: Sun, 23 Mar 2008 18:22:48 +0000 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221603g1f02e163re7580147770da0c8@mail.gmail.com> <200803231341.21034.faltet@carabos.com> Message-ID: <2b1c8c4f0803231122x356229ebrdde5bcd60a3e40de@mail.gmail.com> OK, i'm really impressed with the improvements in vectorization for gcc 4.3. It really seems like it's able to work with real loops which wasn't the case with 4.1. I think Chuck's right that we should simply special case contiguous data and allow the auto-vectorizer to do the rest. Something like this for the ufuncs: /**begin repeat #TYPE=(BOOL, BYTE,UBYTE,SHORT,USHORT,INT,UINT,LONG,ULONG,LONGLONG,ULONGLONG,FLOAT,DOUBLE,LONGDOUBLE)*2# #OP=||, +*13, ^, -*13# #kind=add*14, subtract*14# #typ=(Bool, byte, ubyte, short, ushort, int, uint, long, ulong, longlong, ulonglong, float, double, longdouble)*2# */ static void @TYPE at _@kind at _contig(@typ@ *i1, @typ@ *i2, @type@ *op, int n) { int i; for (i=0; i References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> Message-ID: On 23/03/2008, David Cournapeau wrote: > Gnata Xavier wrote: > > > > Hi, > > > > I have a very limited knowledge of openmp but please consider this > > testcase : > > > > > > > Honestly, if it was that simple, it would already have been done for a > long time. The problem is that your test-case is not even remotely close > to how things have to be done in numpy. Actually, there are a few places where a parallel for would serve to accelerate all ufuncs. There are build issues, yes, though they are mild; we would also want to provide some API to turn parallelization on and off, and we'd have to verify that OpenMP did not slow down small arrays, but that would be it. (And I suspect that OpenMP is smart enough to use single threads without locking when multiple threads won't help. Certainly all the information is available to OpenMP to make such decisions.) This is why I suggested making this change: it should be a low-cost, high-gain change. Anne From matthieu.brucher at gmail.com Sun Mar 23 15:14:09 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 23 Mar 2008 20:14:09 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) 
In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> Message-ID: > > (And I suspect that OpenMP is > smart enough to use single threads without locking when multiple > threads won't help. Certainly all the information is available to > OpenMP to make such decisions.) > Unfortunately, I don't think there is such a think. For instance the number of threads used by MKL is told by a environment variable. Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.gnata at gmail.com Sun Mar 23 16:45:37 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Sun, 23 Mar 2008 21:45:37 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E66866.807@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> <47E666DF.4090406@gmail.com> <47E66866.807@ar.media.kyoto-u.ac.jp> Message-ID: <47E6C171.4080604@gmail.com> David Cournapeau wrote: > Gnata Xavier wrote: > >> Well of course my goal was not to say that my simple testcase can be >> copied/pasted into numpy :) >> Of ourse it is one of the best case to use openmp. >> Of course pragma can be more complex than that (you can tell variables >> that can/cannot be shared for instance). >> >> The size : Using openmp will be slower on small arrays, that is clear >> but the user doing very large computations is smart enough to know when >> he need to split it's job into threads. The obvious solution is to >> provide the user with // and non // functions. >> > > IMHO, that's a really bad solution. It should be dynamically enabled > (like in matlab, if I remember correctly). And this means having a plug > subsystem to load/unload different implementation... that is one of the > thing I was interested in getting done for numpy 1.1 (or above). > > I fully agree. It was only a stupid way to keep the things simple. > For small arrays: how much slower ? Does it make the code slower than > without open mp ? For example, what does your code says when N is 10, > 100, 1000 ? > > See the code and the results at the end of this post. >> sse : sse can help a lot but multithreading just scales where sse >> mono-thread based solutions don't. >> > > It depends: it scales pretty well if you use several processus, and if > you can design your application in a multi-process way. > > ok. >> Build/link : It is an issue. It has to be tested. I do not know because >> I haven't even tried. >> >> So, IMHO it would be nice to try to put some OpenMP simple pragmas into >> numpy *only to see what is going on*. >> >> Even if it only work with gcc or even if...I do not know... It would be >> a first step. step by step :) >> > > I agree about the step by step approach; I am just not sure I agree with > your steps, that's all. Personally, I would first try getting a plug-in > system working with numpy. I do agree with that :). 
Sorry I should had put that in a clear way before : I do agree. > But really, prove me wrong. Do it, try > putting some pragma at some places in the ufunc machinery or somewhere > else; as I said earlier, I would be happy to add support for open mp at > the build level, at least in numscons. I would love being proven wrong > and having a numpy which scales well with multi-core :) > > Ok I will try to see what I can do but it is sure that we do need the plug-in system first (read "before the threads in the numpy release"). During the devel of 1.1, I will try to find some time to understand where I should put some pragma into ufunct using a very conservation approach. Any people with some OpenMP knowledge are welcome because I'm not a OpenMP expert but only an OpenMP user in my C/C++ codes. Here is the code (I hope it is correct) #include #include #include #include #include #include double accurate_time() { struct timeval t; gettimeofday(&t,NULL); return (double)t.tv_sec + t.tv_usec*0.000001; } int main(void) { long i, j, k; double t1, t2; double *dataP; double *data; long Size = 1e8; long NLoops = 40; for(k=1; k<8; k++) { Size /= 10; NLoops *= 2; dataP = malloc(Size*sizeof(double)); t1 = accurate_time(); #pragma omp parallel for for(i=0; i< Size; i++) { dataP[i]=i; } for(j=0; j On this machine, we should start to use threads *in this testcase* iif size>=10000 (a 100*100 image is a very very small one :)) Every other results are welcome :) Cheers, Xavier From haase at msg.ucsf.edu Sun Mar 23 17:06:28 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Sun, 23 Mar 2008 22:06:28 +0100 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: <2b1c8c4f0803221008gc398e6bjcf4bbc58b4df7dd7@mail.gmail.com> References: <36FF9D66-973C-4C7E-9996-84BD49351EE3@ster.kuleuven.be> <2b1c8c4f0803221008gc398e6bjcf4bbc58b4df7dd7@mail.gmail.com> Message-ID: (please copy to the trace page) On Sat, Mar 22, 2008 at 6:08 PM, James Philbin wrote: > I'm not sure that #669 > (http://projects.scipy.org/scipy/numpy/ticket/669) is a bug, but > probably needs some discussion (see the last reply on that page). The > cast is made because we don't know that the LHS is non-negative. > However it could be argued that operations involving two integers > should never cast to a float, in which case maybe an exception should > be thrown. > I don't understand this argument, isn't this case similar to any overflow / wrap-around in "limited" dtypes. Like this: >>> N.uint8(200) + N.uint8(200) 144 >>> _.dtype uint8 You would not argue that uint8 has to get converted to "???" because it can produce (mathematically) "wrong" results ! My 2 cents, Sebastian Haase From eads at soe.ucsc.edu Sun Mar 23 17:42:46 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Sun, 23 Mar 2008 15:42:46 -0600 Subject: [Numpy-discussion] How to set array values based on a condition? In-Reply-To: References: <47E5F450.7010909@soe.ucsc.edu> Message-ID: <47E6CED6.1060703@soe.ucsc.edu> Anne Archibald wrote: > On 23/03/2008, Damian Eads wrote: >> Hi, >> >> I am working on a memory-intensive experiment with very large arrays so >> I must be careful when allocating memory. Numpy already supports a >> number of in-place operations (+=, *=) making the task much more >> manageable. However, it is not obvious to me out I set values based on a >> very simple condition. >> >> The expression >> >> y[y<0]=-1 >> >> generates a binary index mask y>=0 of the same size as the array y, >> which is problematic when y is quite large. 
>> >> I was wondering if there was anything like a set_where(A, cmp, B, >> setval, [optional elseval]) function where cmp would be a comparison >> operator expressed as a string. >> >> The code below illustrates what I want to do. Admittedly, it needs to be >> cleaned up but it's a proof of concept. Does numpy provide any functions >> that support the functionality of the code below? > > That's a good question, but I'm pretty sure it doesn't, apart from > numpy.clip(). The way I'd try to solve that problem would be with the > dreaded for loop. Don't iterate over single elements, but if you have > a gargantuan array, working in chunks of ten thousand (or whatever) > won't have too much overhead: > > block = 100000 > for n in arange(0,len(y),block): > yc = y[n:n+block] > yc[yc<0] = -1 > > It's a bit of a pain, but working with arrays that nearly fill RAM > *is* a pain, as I'm sure you are all too aware by now. > > You might look into numexpr, this is the sort of thing it does (though > I've never used it and can't say whether it can do this). > > Anne > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion Hi Anne, Since the thing I want to do is a common case, I figured that if I were to take the blocked-based approach, I'd write a helper function to do the blocking for me. Here it is: import numpy import types def block_cond(*args): """ block_cond(X1, ..., XN, cond_fun, val_fun, [else_fun]) Breaks the 1-D arrays X1 to XN into properly aligned chunks. The cond_fun is a function that takes in the chunks of each array returns an index or mask array. For each chunk c C=cond_fun(X1[c], ..., XN[c]) The val_fun takes the masked or indexed chunks, and returns the values each element should be set to V=cond_fun(X1[c][C], ..., XN[c][C]) Finally, the first array's elements X1[c][C]=V """ blksize = 100000 if len(args) < 3: raise ValueError("Nothing to do.") if type(args[-3]) == types.FunctionType: elsefn = args[-1] valfn = args[-2] condfn = args[-3] qargs = args[:-3] else: elsefn = None valfn = args[-1] condfn = args[-2] qargs = args[:-2] # Grab the length of the first array. num = qargs[0].size shp = qargs[0].shape # Check that rest of the arguments are all arrays of the same size. for i in xrange(0, len(qargs)): if type(qargs[i]) != _array_type: raise TypeError("Argument %i must be an array." % i) if qargs[i].size != num: raise TypeError("Array argument %i differs in size from the previous arrays." % i) if qargs[i].shape != shp: raise TypeError("Array argument %i differs in shape from the previous arrays." % i) for a in xrange(0, num, blksize): b = min(a + blksize, num) fargs = [qarg[a:b] for qarg in qargs] c = apply(condfn, fargs) #print c v = apply(valfn, [farg[c] for farg in fargs]) #print v slc = qargs[0][a:b] slc[c] = v if elsefn is not None: ev = apply(elsefn, [numpy.array(arg[a:b])[~c] for arg in qargs]) slc[~c] = ev ----------------------------- Let's try running it, In [96]: y=numpy.random.rand(10000000) In [97]: x=y.copy() In [98]: %time x[:] = x<=0.5 CPU times: user 0.39 s, sys: 0.01 s, total: 0.40 s Wall time: 0.66 s In [100]: %time setwhere.block_cond(y, lambda y: y <= 0.5, lambda y: 1, lambda y: 0) CPU times: user 1.70 s, sys: 0.10 s, total: 1.80 s Wall time: 2.28 s The inefficient copying approach is almost 4 times faster than the blocking approach. Ideas about what I'm doing wrong? 
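Much of that gap is likely plain Python-level overhead: every chunk goes through apply(), a lambda call and boolean-mask indexing, while the "inefficient" one-liner does a single pass in C. For comparison, a minimal sketch of what a dedicated C inner loop for the double, "<=" case could look like (illustrative only; the name is made up and this is not the weave code mentioned earlier in the thread):

#include <stdio.h>

/* One pass over a contiguous double buffer: where x[i] <= thresh,
 * write setval, otherwise elseval.  No mask array and no temporary
 * copy is allocated. */
static void set_where_le(double *x, long n, double thresh,
                         double setval, double elseval)
{
    long i;
    for (i = 0; i < n; i++)
        x[i] = (x[i] <= thresh) ? setval : elseval;
}

int main(void)
{
    double y[] = { 0.1, 0.7, 0.4, 0.9 };
    long i;
    set_where_le(y, 4, 0.5, 1.0, 0.0);
    for (i = 0; i < 4; i++)
        printf("%g ", y[i]);   /* prints: 1 0 1 0 */
    printf("\n");
    return 0;
}

A real version would also have to cover the other dtypes, the other comparison operators and the no-elseval case, which is where the .src-style type templating discussed later in this thread comes in.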
Would others find a proper C-based numpy implementation of the set_where function useful? I'd offer to implement it. Damian From philbinj at gmail.com Sun Mar 23 17:43:58 2008 From: philbinj at gmail.com (James Philbin) Date: Sun, 23 Mar 2008 21:43:58 +0000 Subject: [Numpy-discussion] Help needed with numpy 10.5 release blockers In-Reply-To: References: <36FF9D66-973C-4C7E-9996-84BD49351EE3@ster.kuleuven.be> <2b1c8c4f0803221008gc398e6bjcf4bbc58b4df7dd7@mail.gmail.com> Message-ID: <2b1c8c4f0803231443x63977e8csdb872e0267772de4@mail.gmail.com> Well that's fine for binops with the same types, but it's not so obvious which type to cast to when mixing signed and unsigned types. Should the type of N.int32(10)+N.uint32(10) be int32, uint32 or int64? Given your answer what should the type of N.int64(10)+N.uint64(10) be (which is the case in the bug)? The current casting rules for the unsigned + signed integral types in numpy seems to be: int32 uint32 int64 uint64 int32 int32 int64 int64 float64 uint32 uint32 int64 uint64 int64 int64 float64 uint64 uint64 This does seem slightly barmy in cases. For reference here is what C uses: int32 uint32 int64 uint64 int32 int32 uint32 int64 uint64 uint32 uint32 uint64 uint64 int64 int64 uint64 uint64 uint64 So numpy currently seems to prefer to keep sign preserved and extends the width of the result when it can. C prefers to keep overflows minimized and never extends the width of the result. James From eads at soe.ucsc.edu Sun Mar 23 17:55:38 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Sun, 23 Mar 2008 15:55:38 -0600 Subject: [Numpy-discussion] How to set array values based on a condition? In-Reply-To: <47E6CED6.1060703@soe.ucsc.edu> References: <47E5F450.7010909@soe.ucsc.edu> <47E6CED6.1060703@soe.ucsc.edu> Message-ID: <47E6D1DA.90808@soe.ucsc.edu> Damian Eads wrote: > Anne Archibald wrote: >> On 23/03/2008, Damian Eads wrote: >>> Hi, >>> >>> I am working on a memory-intensive experiment with very large arrays so >>> I must be careful when allocating memory. Numpy already supports a >>> number of in-place operations (+=, *=) making the task much more >>> manageable. However, it is not obvious to me out I set values based on a >>> very simple condition. >>> >>> The expression >>> >>> y[y<0]=-1 >>> >>> generates a binary index mask y>=0 of the same size as the array y, >>> which is problematic when y is quite large. >>> >>> I was wondering if there was anything like a set_where(A, cmp, B, >>> setval, [optional elseval]) function where cmp would be a comparison >>> operator expressed as a string. >>> >>> The code below illustrates what I want to do. Admittedly, it needs to be >>> cleaned up but it's a proof of concept. Does numpy provide any functions >>> that support the functionality of the code below? >> That's a good question, but I'm pretty sure it doesn't, apart from >> numpy.clip(). The way I'd try to solve that problem would be with the >> dreaded for loop. Don't iterate over single elements, but if you have >> a gargantuan array, working in chunks of ten thousand (or whatever) >> won't have too much overhead: >> >> block = 100000 >> for n in arange(0,len(y),block): >> yc = y[n:n+block] >> yc[yc<0] = -1 >> >> It's a bit of a pain, but working with arrays that nearly fill RAM >> *is* a pain, as I'm sure you are all too aware by now. >> >> You might look into numexpr, this is the sort of thing it does (though >> I've never used it and can't say whether it can do this). 
>> >> Anne >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion > > Hi Anne, > > Since the thing I want to do is a common case, I figured that if I were > to take the blocked-based approach, I'd write a helper function to do > the blocking for me. Here it is: > > import numpy > import types > > def block_cond(*args): > """ > block_cond(X1, ..., XN, cond_fun, val_fun, [else_fun]) > > Breaks the 1-D arrays X1 to XN into properly aligned chunks. The > cond_fun is a function that takes in the chunks of each array > returns an index or mask array. For each chunk c > > C=cond_fun(X1[c], ..., XN[c]) > > The val_fun takes the masked or indexed chunks, and returns the > values each element should be set to > > V=cond_fun(X1[c][C], ..., XN[c][C]) > > Finally, the first array's elements > > X1[c][C]=V > """ > blksize = 100000 > if len(args) < 3: > raise ValueError("Nothing to do.") > > if type(args[-3]) == types.FunctionType: > elsefn = args[-1] > valfn = args[-2] > condfn = args[-3] > qargs = args[:-3] > else: > elsefn = None > valfn = args[-1] > condfn = args[-2] > qargs = args[:-2] > > # Grab the length of the first array. > num = qargs[0].size > shp = qargs[0].shape > > # Check that rest of the arguments are all arrays of the same size. > for i in xrange(0, len(qargs)): > if type(qargs[i]) != _array_type: > raise TypeError("Argument %i must be an array." % i) > if qargs[i].size != num: > raise TypeError("Array argument %i differs in size from the > previous arrays." % i) > if qargs[i].shape != shp: > raise TypeError("Array argument %i differs in shape from > the previous arrays." % i) > > for a in xrange(0, num, blksize): > b = min(a + blksize, num) > fargs = [qarg[a:b] for qarg in qargs] > c = apply(condfn, fargs) > #print c > v = apply(valfn, [farg[c] for farg in fargs]) > #print v > slc = qargs[0][a:b] > slc[c] = v > if elsefn is not None: > ev = apply(elsefn, [numpy.array(arg[a:b])[~c] for arg in > qargs]) > slc[~c] = ev > > ----------------------------- > > Let's try running it, > > In [96]: y=numpy.random.rand(10000000) > > In [97]: x=y.copy() > > In [98]: %time x[:] = x<=0.5 > CPU times: user 0.39 s, sys: 0.01 s, total: 0.40 s > Wall time: 0.66 s > > In [100]: %time setwhere.block_cond(y, lambda y: y <= 0.5, lambda y: 1, > lambda y: 0) > CPU times: user 1.70 s, sys: 0.10 s, total: 1.80 s > Wall time: 2.28 s > > The inefficient copying approach is almost 4 times faster than the > blocking approach. Ideas about what I'm doing wrong? > > Would others find a proper C-based numpy implementation of the set_where > function useful? I'd offer to implement it. > > Damian If I try it with the scipy.weave implementation I showed in my first posting of this thread, I get a factor of 3 speed up over the memory-inefficient copy approach and a factor of 10 speed up over the block-based approach. In [105]: y=numpy.random.rand(10000000) In [106]: %time setwhere.set_where(y, "<=", 0.5, 1, 0) CPU times: user 0.15 s, sys: 0.00 s, total: 0.15 s Wall time: 0.21 s This suggests a C implementation might be worth it. Damian From david at ar.media.kyoto-u.ac.jp Sun Mar 23 22:40:24 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 24 Mar 2008 11:40:24 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) 
In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <47E55B0F.4060003@enthought.com> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> Message-ID: <47E71498.9090206@ar.media.kyoto-u.ac.jp> Anne Archibald wrote: > > Actually, there are a few places where a parallel for would serve to > accelerate all ufuncs. There are build issues, yes, though they are > mild; Maybe, maybe not. Anyway, I said that I would step in to resolve those issues if someone else does the coding. > we would also want to provide some API to turn parallelization > on and off, and we'd have to verify that OpenMP did not slow down > small arrays, but that would be it. (And I suspect that OpenMP is > smart enough to use single threads without locking when multiple > threads won't help. Certainly all the information is available to > OpenMP to make such decisions.) > How so ? Maybe you're right, but that's not so obvious to me. But since several people seem to know openMP and are eager to add it to numpy, it should be easy to add it to numpy: all they have to do it getting numpy sources from svn, and start coding :) With numscons at least, they can manually add the -fopenmp and -lgomp flags from the command line to quickly do a prototype (it should not be much more difficult with distutils). cheers, David From david at ar.media.kyoto-u.ac.jp Sun Mar 23 23:37:27 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 24 Mar 2008 12:37:27 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E6C171.4080604@gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> <47E666DF.4090406@gmail.com> <47E66866.807@ar.media.kyoto-u.ac.jp> <47E6C171.4080604@gmail.com> Message-ID: <47E721F7.7070904@ar.media.kyoto-u.ac.jp> Gnata Xavier wrote: > Ok I will try to see what I can do but it is sure that we do need the > plug-in system first (read "before the threads in the numpy release"). > During the devel of 1.1, I will try to find some time to understand > where I should put some pragma into ufunct using a very conservation > approach. Any people with some OpenMP knowledge are welcome because I'm > not a OpenMP expert but only an OpenMP user in my C/C++ codes. Note that the plug-in idea is just my own idea, it is not something agreed by anyone else. So maybe it won't be done for numpy 1.1, or at all. It depends on the main maintainers of numpy. > > > and the results : > 10000000 80 10.308471 30.007250 > 1000000 160 1.902563 5.800172 > 100000 320 0.543008 1.123274 > 10000 640 0.206823 0.223031 > 1000 1280 0.088898 0.044268 > 100 2560 0.150429 0.008880 > 10 5120 0.289589 0.002084 > > ---> On this machine, we should start to use threads *in this testcase* > iif size>=10000 (a 100*100 image is a very very small one :)) Maybe openMP can be more clever, but it tends to show that openMP, when used naively, can *not* decide how many threads to use. 
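One OpenMP feature is worth noting here (sketch only; the cutoff value below is just a guess taken from the timings above): a parallel region can carry an if clause, so below a chosen element count the loop runs serially in the calling thread and tiny arrays never pay the thread-management overhead.

#include <stdio.h>

/* Hypothetical cutoff; the ~10000-element crossover measured above
 * suggests the order of magnitude on that particular machine. */
#define OMP_MIN_ELTS 10000

static void scale(double *x, long n, double a)
{
    long i;
    /* The if clause disables the parallel region when n is small, so
     * the loop then executes serially in the calling thread. */
#pragma omp parallel for if (n >= OMP_MIN_ELTS)
    for (i = 0; i < n; i++)
        x[i] *= a;
}

int main(void)
{
    static double big[1000000];
    double small[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    long i;
    for (i = 0; i < 1000000; i++)
        big[i] = 1.0;
    scale(big, 1000000, 2.0);   /* threaded when built with -fopenmp */
    scale(small, 8, 2.0);       /* stays serial */
    printf("%g %g\n", big[0], small[7]);   /* 2 16 */
    return 0;
}

The catch is that OMP_MIN_ELTS is still a number somebody has to choose, per machine and per operation, which is the point made next.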
That's really the core problem: again, I don't know much about openMP, but almost any project using multi-thread/multi-process and not being embarrassingly parallel has the problem that it makes things much slower for many cases where thread creation/management and co have a lot of overhead proportionally to the computation. The problem is to determine the N, dynamically, or in a way which works well for most cases. OpenMP was created for HPC, where you have very large data; it is not so obvious to me that it is adapted to numpy which has to be much more flexible. Being fast on a given problem is easy; being fast on a whole range, that's another story: the problem really is to be as fast as before on small arrays. The fact that matlab, while having much more ressources than us, took years to do it, makes me extremely skeptical on the efficient use of multi-threading without real benchmarks for numpy. They have a dedicated team, who developed a JIT for matlab, which "insert" multi-thread code on the fly (for m files, not when you are in the interpreter), and who uses multi-thread blas/lapack (which is already available in numpy depending on the blas/lapack you are using). But again, and that's really the only thing I have to say: prove me wrong :) David From eads at soe.ucsc.edu Mon Mar 24 03:09:42 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Mon, 24 Mar 2008 01:09:42 -0600 Subject: [Numpy-discussion] How to set array values based on a condition? In-Reply-To: <47E6D1DA.90808@soe.ucsc.edu> References: <47E5F450.7010909@soe.ucsc.edu> <47E6CED6.1060703@soe.ucsc.edu> <47E6D1DA.90808@soe.ucsc.edu> Message-ID: <47E753B6.3060104@soe.ucsc.edu> Hi, I am eager to implement the C version of the set_where function but would like to do so in a numpy-esque way. Having implemented several internal and released Python/C packages, I am familiar with the PyArray object and the PyArrayIterObject and the like. After looking through the code I noticed the @typ@ notation in the .src files, which looks like some kind of pseudo-template to allow ndarray methods to work over arbitrary data types. I surmise these files are fed through a shell or Python script to generate the actual code? In my own code, I've been using C++ very carefully to handle such generality but I understand that numpy is strictly written in C. The set_where function is very similar to ndarray.__add__ in that its arguments can either be scalars or arrays (all arrays must be of the same shape, though). To get a sense of how a numpy operator is implemented can someone point me to the templatized array __add__ function? I tried searching through the files below (numpy/core/src) but I can't quite pin down where the add functionality is implemented. Seeing PyArray_ArgMax, I thought PyArray_XXX might have been the naming convention for numpy methods but the fact that PyArray_Add (or PyArray_Plus) is not defined makes me unsure. arraymethods.c multiarraymodule.c _sortmodule.c.src arrayobject.c scalarmathmodule.c.src ucsnarrow.c arraytypes.inc.src scalartypes.inc.src ufuncobject.c _isnan.c _signbit.c umathmodule.c.src Please advise. Thank you. Damian ----------------------------------------------------- Damian Eads Ph.D. 
Student Jack Baskin School of Engineering, UCSC E2-381 1156 High Street Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads Damian Eads wrote: > Damian Eads wrote: >> Anne Archibald wrote: >>> On 23/03/2008, Damian Eads wrote: >>>> Hi, >>>> >>>> I am working on a memory-intensive experiment with very large arrays so >>>> I must be careful when allocating memory. Numpy already supports a >>>> number of in-place operations (+=, *=) making the task much more >>>> manageable. However, it is not obvious to me out I set values based on a >>>> very simple condition. >>>> >>>> The expression >>>> >>>> y[y<0]=-1 >>>> >>>> generates a binary index mask y>=0 of the same size as the array y, >>>> which is problematic when y is quite large. >>>> >>>> I was wondering if there was anything like a set_where(A, cmp, B, >>>> setval, [optional elseval]) function where cmp would be a comparison >>>> operator expressed as a string. >>>> >>>> The code below illustrates what I want to do. Admittedly, it needs to be >>>> cleaned up but it's a proof of concept. Does numpy provide any functions >>>> that support the functionality of the code below? >>> That's a good question, but I'm pretty sure it doesn't, apart from >>> numpy.clip(). The way I'd try to solve that problem would be with the >>> dreaded for loop. Don't iterate over single elements, but if you have >>> a gargantuan array, working in chunks of ten thousand (or whatever) >>> won't have too much overhead: >>> >>> block = 100000 >>> for n in arange(0,len(y),block): >>> yc = y[n:n+block] >>> yc[yc<0] = -1 >>> >>> It's a bit of a pain, but working with arrays that nearly fill RAM >>> *is* a pain, as I'm sure you are all too aware by now. >>> >>> You might look into numexpr, this is the sort of thing it does (though >>> I've never used it and can't say whether it can do this). >>> >>> Anne >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion at scipy.org >>> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> Hi Anne, >> >> Since the thing I want to do is a common case, I figured that if I were >> to take the blocked-based approach, I'd write a helper function to do >> the blocking for me. Here it is: >> >> import numpy >> import types >> >> def block_cond(*args): >> """ >> block_cond(X1, ..., XN, cond_fun, val_fun, [else_fun]) >> >> Breaks the 1-D arrays X1 to XN into properly aligned chunks. The >> cond_fun is a function that takes in the chunks of each array >> returns an index or mask array. For each chunk c >> >> C=cond_fun(X1[c], ..., XN[c]) >> >> The val_fun takes the masked or indexed chunks, and returns the >> values each element should be set to >> >> V=cond_fun(X1[c][C], ..., XN[c][C]) >> >> Finally, the first array's elements >> >> X1[c][C]=V >> """ >> blksize = 100000 >> if len(args) < 3: >> raise ValueError("Nothing to do.") >> >> if type(args[-3]) == types.FunctionType: >> elsefn = args[-1] >> valfn = args[-2] >> condfn = args[-3] >> qargs = args[:-3] >> else: >> elsefn = None >> valfn = args[-1] >> condfn = args[-2] >> qargs = args[:-2] >> >> # Grab the length of the first array. >> num = qargs[0].size >> shp = qargs[0].shape >> >> # Check that rest of the arguments are all arrays of the same size. >> for i in xrange(0, len(qargs)): >> if type(qargs[i]) != _array_type: >> raise TypeError("Argument %i must be an array." % i) >> if qargs[i].size != num: >> raise TypeError("Array argument %i differs in size from the >> previous arrays." 
% i) >> if qargs[i].shape != shp: >> raise TypeError("Array argument %i differs in shape from >> the previous arrays." % i) >> >> for a in xrange(0, num, blksize): >> b = min(a + blksize, num) >> fargs = [qarg[a:b] for qarg in qargs] >> c = apply(condfn, fargs) >> #print c >> v = apply(valfn, [farg[c] for farg in fargs]) >> #print v >> slc = qargs[0][a:b] >> slc[c] = v >> if elsefn is not None: >> ev = apply(elsefn, [numpy.array(arg[a:b])[~c] for arg in >> qargs]) >> slc[~c] = ev >> >> ----------------------------- >> >> Let's try running it, >> >> In [96]: y=numpy.random.rand(10000000) >> >> In [97]: x=y.copy() >> >> In [98]: %time x[:] = x<=0.5 >> CPU times: user 0.39 s, sys: 0.01 s, total: 0.40 s >> Wall time: 0.66 s >> >> In [100]: %time setwhere.block_cond(y, lambda y: y <= 0.5, lambda y: 1, >> lambda y: 0) >> CPU times: user 1.70 s, sys: 0.10 s, total: 1.80 s >> Wall time: 2.28 s >> >> The inefficient copying approach is almost 4 times faster than the >> blocking approach. Ideas about what I'm doing wrong? >> >> Would others find a proper C-based numpy implementation of the set_where >> function useful? I'd offer to implement it. >> >> Damian > > If I try it with the scipy.weave implementation I showed in my first > posting of this thread, I get a factor of 3 speed up over the > memory-inefficient copy approach and a factor of 10 speed up over the > block-based approach. > > In [105]: y=numpy.random.rand(10000000) > > In [106]: %time setwhere.set_where(y, "<=", 0.5, 1, 0) > CPU times: user 0.15 s, sys: 0.00 s, total: 0.15 s > Wall time: 0.21 s > > This suggests a C implementation might be worth it. > > Damian > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Mon Mar 24 07:26:15 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 24 Mar 2008 12:26:15 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E721F7.7070904@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> <47E666DF.4090406@gmail.com> <47E66866.807@ar.media.kyoto-u.ac.jp> <47E6C171.4080604@gmail.com> <47E721F7.7070904@ar.media.kyoto-u.ac.jp> Message-ID: <1e2af89e0803240426w14174ee0he716c8e95083f81b@mail.gmail.com> Hi, > Note that the plug-in idea is just my own idea, it is not something > agreed by anyone else. So maybe it won't be done for numpy 1.1, or at > all. It depends on the main maintainers of numpy. I'm +3 for the plugin idea - it would have huge benefits for installation and automatic optimization. What needs to be done? Who could do it? Matthew From mmanns at gmx.net Mon Mar 24 09:05:05 2008 From: mmanns at gmx.net (Martin Manns) Date: Mon, 24 Mar 2008 14:05:05 +0100 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays Message-ID: <20080324130505.249460@gmx.net> Hello, I am encountering a problem (a bug?) with the numpy any function. Since the python any function behaves in a slightly different way, I would like to keep using numpy's. Here is the problem: $ python Python 2.5.1 (r251:54863, Jan 26 2008, 01:34:00) [GCC 4.1.2 (Gentoo 4.1.2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy >>> numpy.version.version '1.0.4' >>> numpy.version.release True >>> small_zero = [0] * 1000 >>> large_zero = [0] * 1000000 >>> small_none = [None] * 1000 >>> large_none = [None] * 1000000 >>> any(small_zero) False >>> any(large_zero) False >>> any(small_none) False >>> any(large_none) False >>> any(numpy.array(small_zero)) False >>> any(numpy.array(large_zero)) False >>> any(numpy.array(small_none)) False >>> any(numpy.array(large_none)) False >>> numpy.any(numpy.array(small_zero)) False >>> numpy.any(numpy.array(large_zero)) False >>> numpy.any(numpy.array(small_none)) False >>> numpy.any(numpy.array(large_none)) Segmentation fault The segfault occurs for other object arrays as well. Any idea how to get around this? Thanks in advance Martin P.S. I tried the bug tracker but my e-mail does not seem to show up. -- Psssst! Schon vom neuen GMX MultiMessenger geh?rt? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger From xavier.gnata at gmail.com Mon Mar 24 09:31:33 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Mon, 24 Mar 2008 14:31:33 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E721F7.7070904@ar.media.kyoto-u.ac.jp> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> <47E666DF.4090406@gmail.com> <47E66866.807@ar.media.kyoto-u.ac.jp> <47E6C171.4080604@gmail.com> <47E721F7.7070904@ar.media.kyoto-u.ac.jp> Message-ID: <47E7AD35.1010004@gmail.com> David Cournapeau wrote: > Gnata Xavier wrote: > >> Ok I will try to see what I can do but it is sure that we do need the >> plug-in system first (read "before the threads in the numpy release"). >> During the devel of 1.1, I will try to find some time to understand >> where I should put some pragma into ufunct using a very conservation >> approach. Any people with some OpenMP knowledge are welcome because I'm >> not a OpenMP expert but only an OpenMP user in my C/C++ codes. >> > > Note that the plug-in idea is just my own idea, it is not something > agreed by anyone else. So maybe it won't be done for numpy 1.1, or at > all. It depends on the main maintainers of numpy. > > >> and the results : >> 10000000 80 10.308471 30.007250 >> 1000000 160 1.902563 5.800172 >> 100000 320 0.543008 1.123274 >> 10000 640 0.206823 0.223031 >> 1000 1280 0.088898 0.044268 >> 100 2560 0.150429 0.008880 >> 10 5120 0.289589 0.002084 >> >> ---> On this machine, we should start to use threads *in this testcase* >> iif size>=10000 (a 100*100 image is a very very small one :)) >> > > Maybe openMP can be more clever, but it tends to show that openMP, when > used naively, can *not* decide how many threads to use. That's really > the core problem: again, I don't know much about openMP, but almost any > project using multi-thread/multi-process and not being embarrassingly > parallel has the problem that it makes things much slower for many cases > where thread creation/management and co have a lot of overhead > proportionally to the computation. The problem is to determine the N, > dynamically, or in a way which works well for most cases. OpenMP was > created for HPC, where you have very large data; it is not so obvious to > me that it is adapted to numpy which has to be much more flexible. 
Being > fast on a given problem is easy; being fast on a whole range, that's > another story: the problem really is to be as fast as before on small > arrays. > > The fact that matlab, while having much more ressources than us, took > years to do it, makes me extremely skeptical on the efficient use of > multi-threading without real benchmarks for numpy. They have a dedicated > team, who developed a JIT for matlab, which "insert" multi-thread code > on the fly (for m files, not when you are in the interpreter), and who > uses multi-thread blas/lapack (which is already available in numpy > depending on the blas/lapack you are using). > > But again, and that's really the only thing I have to say: prove me wrong :) > > David > I can't :) I can't for a simple reason : Quoting IDL documentation : "There are instances when allowing IDL to use its default thread pool settings can lead to undesired results. In some instances, a multithreaded implementation using the thread pool may actually take longer to complete a given job than a single-threaded implementation." http://idlastro.gsfc.nasa.gov/idl_html_help/The_IDL_Thread_Pool.html "To prevent the use of the thread pool for computations that involve too few data elements, IDL supports a minimum threshold value for thread pool computations. The minimum threshold value is contained in the TPOOL_MIN_ELTS field of the !CPU system variable. See the following sections for details on modifying this value." At work, I can see people switching from IDL to numpy/scipy/pylab. They are very happy with numpy but they would to find this "thread pool capability" in numpy. All these guys come from C (or from fortran), often from C/fortran MPI or OpenMP. They know which part of a code should be thread and which part should not. As a result, they are very happy with the IDL thread pool. I'm just thinking how to translate that into numpy. Now I have to have a close look at the ufuncs code and to figure out how to add -fopenmp. From a very pragmatic point of view : What is the best/simplest way to use inline C or whatever to do that : "I have a large array A and, at some points of my nice numpy code, I would like to compute let say the threaded sum or the sine of this array? Assuming that I know how to write it in C/OpenMP code." (The background is "I really know that in my case it is much faster... and I asked my boss for a multi-core machine ;)"). Cheers, Xavier From jh at physics.ucf.edu Mon Mar 24 10:38:43 2008 From: jh at physics.ucf.edu (Joe Harrington) Date: Mon, 24 Mar 2008 10:38:43 -0400 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: (numpy-discussion-request@scipy.org) References: Message-ID: A couple of thoughts on parallelism: 1. Can someone come up with a small set of cases and time them on numpy, IDL, Matlab, and C, using various parallel schemes, for each of a representative set of architectures? We're comparing a benchmark to itself on different architectures, rather than seeing whether the thread capability is helping our competition on the same architecture. If it's mostly not helping them, we can forget it for the time being. I suspect that it is, in fact, helping them, or at least not hurting them. 2. Would it slow things much to have some state that the routines check before deciding whether to run a parallel implementation or not? 
It could default to single thread except in the cases where parallelism always helps, but the user can configure it to multithread beyond certain threshholds of, say, number of elements. Then, in the short term, a savvy user can tweak that state to get parallelism for more than N elements. In the longer term, there could be a test routine that would run on install and configure the state for that particular machine. When numpy started it would read the saved file and computation would be optimized for that machine. The user could always override it. 3. We should remember the first rule of parallel programming, which Anne quotes as "premature optimization is the root of all evil". There is a lot to fix in numpy that is more fundamental than speed. I am the first to want things fast (I would love my secondary eclipse analysis to run in less than a week), but we have gaping holes in documentation and other areas that one would expect to have been filled before a 1.0 release. I hope we can get them filled for 1.1. It bears repeating that our main resource shortage is in person-hours, and we'll get more of those as the community grows. Right now our deficit in documentation is hurting us badly, while our deficit in parallelism is not. There is no faster way of growing the community than making it trivial to learn how to use numpy without hand-holding from an experienced user. Let's explore parallelism to assess when and how it might be right to do it, but let's stay focussed on the fundamentals until we have those nailed. --jh-- From zachary.pincus at yale.edu Mon Mar 24 11:26:55 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 24 Mar 2008 11:26:55 -0400 Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? In-Reply-To: References: <38312.96409.qm@web34403.mail.mud.yahoo.com> <333131.44414.qm@web34403.mail.mud.yahoo.com> Message-ID: Hi all, > I looked at line 21902 of dlapack_lite.c, it is, > > for (niter = iter; niter <= 20; ++niter) { > > Indeed the upper limit for iterations in the > linalg.svd code is set for 20. For now I will go with > my method (on earlier post) of squaring the matrix and > then doing svd when the original try on the original > matrix throws the linalg.linalg.LinAlgError. I do not > claim that this is a cure-all. But it seems to work > fast and avoids the original code from thrashing > around in a long iteration. > > I would suggest this be made explicit in the NumPy > documentation and then the user be given the option to > reset the limit on the number of iterations. > > Well, it certainly shouldn't be hardwired in as 20. At minimum it > should be a #define, and ideally it should be passed in with the > function call, but I don't know if the interface allows that. I just wanted to mention that this particular issue has bitten me in the past too. It would be nice to be able to have a bit more control over the SVD iteration limit either at compile-time, or run-time. Zach From Joris.DeRidder at ster.kuleuven.be Mon Mar 24 11:29:56 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Mon, 24 Mar 2008 16:29:56 +0100 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <20080324130505.249460@gmx.net> References: <20080324130505.249460@gmx.net> Message-ID: <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> I cannot confirm the problem on my intel macbook pro using the same Python and Numpy versions. 
Although any(numpy.array(large_none)) takes a significantly longer time than any(numpy.array(large_zero)), the former does not segfault on my machine. J. On 24 Mar 2008, at 14:05, Martin Manns wrote: > Hello, > > I am encountering a problem (a bug?) with the numpy any function. > Since the python any function behaves in a slightly different way, > I would like to keep using numpy's. > > Here is the problem: > > $ python > Python 2.5.1 (r251:54863, Jan 26 2008, 01:34:00) > [GCC 4.1.2 (Gentoo 4.1.2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import numpy >>>> numpy.version.version > '1.0.4' >>>> numpy.version.release > True >>>> small_zero = [0] * 1000 >>>> large_zero = [0] * 1000000 >>>> small_none = [None] * 1000 >>>> large_none = [None] * 1000000 >>>> any(small_zero) > False >>>> any(large_zero) > False >>>> any(small_none) > False >>>> any(large_none) > False >>>> any(numpy.array(small_zero)) > False >>>> any(numpy.array(large_zero)) > False >>>> any(numpy.array(small_none)) > False >>>> any(numpy.array(large_none)) > False >>>> numpy.any(numpy.array(small_zero)) > False >>>> numpy.any(numpy.array(large_zero)) > False >>>> numpy.any(numpy.array(small_none)) > False >>>> numpy.any(numpy.array(large_none)) > Segmentation fault > > The segfault occurs for other object arrays as well. > Any idea how to get around this? > > Thanks in advance > > Martin > > P.S. I tried the bug tracker but my e-mail does not seem to show up. > -- > Psssst! Schon vom neuen GMX MultiMessenger geh?rt? > Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From xavier.gnata at gmail.com Mon Mar 24 11:41:28 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Mon, 24 Mar 2008 16:41:28 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: Message-ID: <47E7CBA8.5010501@gmail.com> > A couple of thoughts on parallelism: > > 1. Can someone come up with a small set of cases and time them on > numpy, IDL, Matlab, and C, using various parallel schemes, for each of > a representative set of architectures? We're comparing a benchmark to > itself on different architectures, rather than seeing whether the > thread capability is helping our competition on the same architecture. > If it's mostly not helping them, we can forget it for the time being. > I suspect that it is, in fact, helping them, or at least not hurting > them. > > Well I could ask some IDL users to provide you with benchmarks. In C/OpenMP I have posted a trivial code. > 2. Would it slow things much to have some state that the routines > check before deciding whether to run a parallel implementation or not? > It could default to single thread except in the cases where > parallelism always helps, but the user can configure it to multithread > beyond certain threshholds of, say, number of elements. Then, in the > short term, a savvy user can tweak that state to get parallelism for > more than N elements. In the longer term, there could be a test > routine that would run on install and configure the state for that > particular machine. When numpy started it would read the saved file > and computation would be optimized for that machine. The user could > always override it. 
> > No it wouldn't cost that much and that is exactly the way IDL (for instance) works. > 3. We should remember the first rule of parallel programming, which > Anne quotes as "premature optimization is the root of all evil". > There is a lot to fix in numpy that is more fundamental than speed. I > am the first to want things fast (I would love my secondary eclipse > analysis to run in less than a week), but we have gaping holes in > documentation and other areas that one would expect to have been > filled before a 1.0 release. I hope we can get them filled for 1.1. > It bears repeating that our main resource shortage is in person-hours, > and we'll get more of those as the community grows. Right now our > deficit in documentation is hurting us badly, while our deficit in > parallelism is not. There is no faster way of growing the community > than making it trivial to learn how to use numpy without hand-holding > from an experienced user. Let's explore parallelism to assess when > and how it might be right to do it, but let's stay focussed on the > fundamentals until we have those nailed. > > That is well put and clear. It is also clear that our deficit in parallelism is not hurting us that badly. It is a real problem in some communities like astronomers and images processing people but the lack of documentation is the first one, that is true. Xavier From matthieu.brucher at gmail.com Mon Mar 24 11:50:30 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 24 Mar 2008 16:50:30 +0100 Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? In-Reply-To: References: <38312.96409.qm@web34403.mail.mud.yahoo.com> <333131.44414.qm@web34403.mail.mud.yahoo.com> Message-ID: It was added as a compile-time #define on the SVN some days ago ;) Matthieu 2008/3/24, Zachary Pincus : > > Hi all, > > > > I looked at line 21902 of dlapack_lite.c, it is, > > > > for (niter = iter; niter <= 20; ++niter) { > > > > Indeed the upper limit for iterations in the > > linalg.svd code is set for 20. For now I will go with > > my method (on earlier post) of squaring the matrix and > > then doing svd when the original try on the original > > matrix throws the linalg.linalg.LinAlgError. I do not > > claim that this is a cure-all. But it seems to work > > fast and avoids the original code from thrashing > > around in a long iteration. > > > > I would suggest this be made explicit in the NumPy > > documentation and then the user be given the option to > > reset the limit on the number of iterations. > > > > Well, it certainly shouldn't be hardwired in as 20. At minimum it > > should be a #define, and ideally it should be passed in with the > > function call, but I don't know if the interface allows that. > > > I just wanted to mention that this particular issue has bitten me in > the past too. It would be nice to be able to have a bit more control > over the SVD iteration limit either at compile-time, or run-time. > > Zach > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthieu.brucher at gmail.com Mon Mar 24 11:53:15 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 24 Mar 2008 16:53:15 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <47E7CBA8.5010501@gmail.com> References: <47E7CBA8.5010501@gmail.com> Message-ID: > > It is a real problem in some communities like astronomers and images > processing people but the lack of documentation is the first one, that > is true. > Even in those communities, I think that a lot could be done at a higher level, as what IPython1 does (tasks parallelism). Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Mar 24 12:35:45 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 24 Mar 2008 11:35:45 -0500 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <3d375d730803221359y3e6cd082l9bfd3d7dce806cee@mail.gmail.com> Message-ID: <3d375d730803240935t1aa43473r39b10d47f4c572bb@mail.gmail.com> On Sat, Mar 22, 2008 at 4:25 PM, Charles R Harris wrote: > > On Sat, Mar 22, 2008 at 2:59 PM, Robert Kern wrote: > > > > On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris > > wrote: > > > > > Maybe it's time to revisit the template subsystem I pulled out of > Django. > > > > I am still -lots on using the Django template system. Please, please, > > please, look at Jinja or another templating package that could be > > dropped in without *any* modification. > > > > Well, I have a script that pulls the relevant parts out of Django. I know > you had a bad experience, but... > That said, Jinja looks interesting. It uses the Django syntax, which was one > of the things I liked most about Django templates. In fact, it looks pretty > much like Django templates made into a standalone application, which is what > I was after. However, it's big, the installed egg is about 1Mib, which is > roughly 12x the size as my cutdown version of Django, and it has some > c-code, so would need building. The C code is optional. > On the other hand, it also looks like it > contains a lot of extraneous stuff, like translations, that could be > removed. Would you be adverse to adding it in if it looks useful? I would still *really* prefer that you use a single-file templating module at the expense of template aesthetics and even features. I am still unconvinced that we need more features. You haven't shown me any concrete examples. If we do the features of a larger package that we need to cut down, I'd prefer a package that we can cut down by simply removing files, not one that requires the modification of files. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lou_boog2000 at yahoo.com Mon Mar 24 13:04:49 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 24 Mar 2008 10:04:49 -0700 (PDT) Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? 
In-Reply-To: Message-ID: <398057.35159.qm@web34408.mail.mud.yahoo.com> --- Matthieu Brucher wrote: > It was added as a compile-time #define on the SVN > some days ago ;) > > Matthieu Thanks, Matthieu, that's a good step. But when the SVD function throws an exception is it clear that the user can redefine niter and recompile? Otherwise, the fix remains well hidden. Most user will be left puzzled. I think a comment in the raise statement would be good. Just point to the solution or where the user could find it. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From xavier.gnata at gmail.com Mon Mar 24 13:12:51 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Mon, 24 Mar 2008 18:12:51 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: References: <47E7CBA8.5010501@gmail.com> Message-ID: <47E7E113.80007@gmail.com> Matthieu Brucher wrote: > > It is a real problem in some communities like astronomers and images > processing people but the lack of documentation is the first one, > that > is true. > > > Even in those communities, I think that a lot could be done at a > higher level, as what IPython1 does (tasks parallelism). > > Matthieu Well it is not that easy. We have several numpy code following like this : 1) open an large data file to get a numpy array 2) perform computations on this array (I'm only talking of the numpy part here. scipy is something else) 3) Write the result is another large file It is so simple to write using numpy :) Now, if I want to have several exe, step 3 is often a problem. The only simple way to speed this up is to slit step 2 into threads (assuming that there is no other possible optimisation like sse which is false but out of the scope of numpy users). Using C, we can do that using OpenMP pragma. It may not be optimal but it radio speedup/time_to_code is very large :) Now, we are switching from C to numpy because we cannot spend that much time to play with gdb/pointers to open an image anymore. Xavier ps : I have seen your blog and you can send me an email off line about this topic and what you are doing :) From charlesr.harris at gmail.com Mon Mar 24 13:14:49 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 24 Mar 2008 11:14:49 -0600 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <3d375d730803240935t1aa43473r39b10d47f4c572bb@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E4D427.2050307@ar.media.kyoto-u.ac.jp> <2b1c8c4f0803221001i5705b793hcc4ca8395cfafa2e@mail.gmail.com> <47E54D1D.20108@enthought.com> <3d375d730803221359y3e6cd082l9bfd3d7dce806cee@mail.gmail.com> <3d375d730803240935t1aa43473r39b10d47f4c572bb@mail.gmail.com> Message-ID: On Mon, Mar 24, 2008 at 10:35 AM, Robert Kern wrote: > On Sat, Mar 22, 2008 at 4:25 PM, Charles R Harris > wrote: > > > > On Sat, Mar 22, 2008 at 2:59 PM, Robert Kern > wrote: > > > > > > On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris > > > wrote: > > > > > > > Maybe it's time to revisit the template subsystem I pulled out of > > Django. > > > > > > I am still -lots on using the Django template system. Please, please, > > > please, look at Jinja or another templating package that could be > > > dropped in without *any* modification. 
> > > > > > > Well, I have a script that pulls the relevant parts out of Django. I > know > > you had a bad experience, but... > > That said, Jinja looks interesting. It uses the Django syntax, which was > one > > of the things I liked most about Django templates. In fact, it looks > pretty > > much like Django templates made into a standalone application, which is > what > > I was after. However, it's big, the installed egg is about 1Mib, which > is > > roughly 12x the size as my cutdown version of Django, and it has some > > c-code, so would need building. > > The C code is optional. > > > On the other hand, it also looks like it > > contains a lot of extraneous stuff, like translations, that could be > > removed. Would you be adverse to adding it in if it looks useful? > > I would still *really* prefer that you use a single-file templating > module at the expense of template aesthetics and even features. I am > still unconvinced that we need more features. You haven't shown me any > concrete examples. If we do the features of a larger package that we > need to cut down, I'd prefer a package that we can cut down by simply > removing files, not one that requires the modification of files. > If you simply pull out the template subsystem, it is about 1Mib, which is why Jinja looks like Django to me. If you remove extraneous files from Django, and probably Jinja, it comes down to about 400K. If you go on to remove extraneous capabilities it drops down to < 100K. It could all be made into a single file. In fact I had a lot of Django rewritten with that in mind. Well, that and the fact I can't leave code untouched. What I liked about the idea, besides the syntax with if statements, nested for loops, includes, and a few filters, is that it allowed moving some of the common code out of the source and into higher level build files where variable values and flags could be set, making the whole build process more transparent and adaptable. That said, it is hard to beat the compactness of the current version. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmanns at gmx.net Mon Mar 24 13:15:39 2008 From: mmanns at gmx.net (Martin Manns) Date: Mon, 24 Mar 2008 18:15:39 +0100 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> References: <20080324130505.249460@gmx.net> <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> Message-ID: <20080324171539.171030@gmx.net> > On 24 Mar 2008, at 14:05, Martin Manns wrote: > > > Hello, > > > > I am encountering a problem (a bug?) with the numpy any function. > > Since the python any function behaves in a slightly different way, > > I would like to keep using numpy's. > > > I cannot confirm the problem on my intel macbook pro using the same > Python and Numpy versions. Although any(numpy.array(large_none)) takes > a significantly longer time than any(numpy.array(large_zero)), the > former does not segfault on my machine. I tested it on a Debian box (again Numpy 1.0.4) and was able to reproduce the problem: ~$ python Python 2.4.5 (#2, Mar 12 2008, 00:15:51) [GCC 4.2.3 (Debian 4.2.3-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy >>> numpy.version.version '1.0.4' >>> large_none = [None] * 1000000 >>> numpy.any(numpy.array(large_none)) Segmentation fault ~$ python2.5 Python 2.5.2a0 (r251:54863, Feb 10 2008, 01:31:28) [GCC 4.2.3 (Debian 4.2.3-1)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> large_none = [None] * 1000000 >>> numpy.any(numpy.array(large_none)) Segmentation fault -- Psssst! Schon vom neuen GMX MultiMessenger geh?rt? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger From david at ar.media.kyoto-u.ac.jp Mon Mar 24 13:37:38 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 25 Mar 2008 02:37:38 +0900 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <1e2af89e0803240426w14174ee0he716c8e95083f81b@mail.gmail.com> References: <47E490CB.8010407@ar.media.kyoto-u.ac.jp> <47E63917.4080400@gmail.com> <47E63ACB.6090602@ar.media.kyoto-u.ac.jp> <200803231347.09230.faltet@carabos.com> <47E652C9.2030001@ar.media.kyoto-u.ac.jp> <47E666DF.4090406@gmail.com> <47E66866.807@ar.media.kyoto-u.ac.jp> <47E6C171.4080604@gmail.com> <47E721F7.7070904@ar.media.kyoto-u.ac.jp> <1e2af89e0803240426w14174ee0he716c8e95083f81b@mail.gmail.com> Message-ID: <47E7E6E2.2060201@ar.media.kyoto-u.ac.jp> Matthew Brett wrote: > > I'm +3 for the plugin idea - it would have huge benefits for > installation and automatic optimization. What needs to be done? Who > could do it? The main issues are portability, and reliability I think. All OS supported by numpy have more or less a dynamic library loading support (that's how python itself works, after all, unless you compile everything statically), but the devil is in the details. In particular: - I am not sure whether plugin unloading is supported by all OS (Mac os X posix api does not enable unloading, for example, you have to use a specific API which I do not know anything about). Maybe we do not need it, I don't know; unloading sounds really difficult to support reliably anyway (how to make sure numpy won't use the functions of the plugins ?) - build issues: it is really the same thing than ctypes at the build level, I think - an api so that it can be used throughout numpy. there should be a clear interface for the plugins, which does not sound trivial either. That's the most difficult part in my mind (well, maybe not difficult, but time-consuming at least). That's one of the reason why I was thinking about a gradual move of most "core functionalities of the core" toward a separate C library, with a simple and crystal clear interface, without any reference to any python API, just plain C with plain pointers. We could then force this core, "pure" C library to be used only through dereferencing of an additional pointer, thus enabling dynamic change of the actual functions (at least when numpy is started). I have to say I really like the idea of more explicit separation between the actual computation code and the python wrapping; it can only help if we decide to write some of the wrappers in Cython/ctypes/whatever instead of pure C as today. It has many advantages in terms of reliability, maintainability and performance (testing performance would be much easier I think, since it could be done in pure C). 
cheers, David From bsouthey at gmail.com Mon Mar 24 13:59:32 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 24 Mar 2008 12:59:32 -0500 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <20080324171539.171030@gmx.net> References: <20080324130505.249460@gmx.net> <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> <20080324171539.171030@gmx.net> Message-ID: <47E7EC04.8040708@gmail.com> Hi, This also crashes by numpy 1.0.4 under python 2.5.1. I am guessing it may be due to numpy.any() probably not understanding the 'None' . Bruce Martin Manns wrote: >> On 24 Mar 2008, at 14:05, Martin Manns wrote: >> >> >>> Hello, >>> >>> I am encountering a problem (a bug?) with the numpy any function. >>> Since the python any function behaves in a slightly different way, >>> I would like to keep using numpy's. >>> >>> >> I cannot confirm the problem on my intel macbook pro using the same >> Python and Numpy versions. Although any(numpy.array(large_none)) takes >> a significantly longer time than any(numpy.array(large_zero)), the >> former does not segfault on my machine. >> > > I tested it on a Debian box (again Numpy 1.0.4) and was able to reproduce the problem: > > ~$ python > Python 2.4.5 (#2, Mar 12 2008, 00:15:51) > [GCC 4.2.3 (Debian 4.2.3-2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>>> import numpy >>>> numpy.version.version >>>> > '1.0.4' > >>>> large_none = [None] * 1000000 >>>> numpy.any(numpy.array(large_none)) >>>> > Segmentation fault > ~$ python2.5 > Python 2.5.2a0 (r251:54863, Feb 10 2008, 01:31:28) > [GCC 4.2.3 (Debian 4.2.3-1)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>>> import numpy >>>> large_none = [None] * 1000000 >>>> numpy.any(numpy.array(large_none)) >>>> > Segmentation fault > > From mmanns at gmx.net Mon Mar 24 14:13:22 2008 From: mmanns at gmx.net (Martin Manns) Date: Mon, 24 Mar 2008 19:13:22 +0100 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <47E7EC04.8040708@gmail.com> References: <20080324130505.249460@gmx.net> <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> <20080324171539.171030@gmx.net> <47E7EC04.8040708@gmail.com> Message-ID: <20080324181322.113700@gmx.net> Bruce Southey wrote:> Hi, > This also crashes by numpy 1.0.4 under python 2.5.1. I am guessing it > may be due to numpy.any() probably not understanding the 'None' . I doubt that because I get the segfault for all kinds of object arrays that I try out: ~$ python Python 2.4.5 (#2, Mar 12 2008, 00:15:51) [GCC 4.2.3 (Debian 4.2.3-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> small_obj = numpy.array([1]*10**3, dtype="O") >>> numpy.any(small_obj) True >>> large_obj = numpy.array([1]*10**6, dtype="O") >>> numpy.any(large_obj) Segmentation fault ~$ python >>> import numpy >>> large_strobj = numpy.array(["Yet another string."]*10**6, dtype="O") >>> numpy.any(large_strobj) Segmentation fault Martin -- GMX startet ShortView.de. Hier findest Du Leute mit Deinen Interessen! Jetzt dabei sein: http://www.shortview.de/?mc=sv_ext_mf at gmx From robert.kern at gmail.com Mon Mar 24 14:14:57 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 24 Mar 2008 13:14:57 -0500 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) 
In-Reply-To: <47E7E113.80007@gmail.com> References: <47E7CBA8.5010501@gmail.com> <47E7E113.80007@gmail.com> Message-ID: <3d375d730803241114j5b09da87tee6dcea72dbb38@mail.gmail.com> On Mon, Mar 24, 2008 at 12:12 PM, Gnata Xavier wrote: > Well it is not that easy. We have several numpy code following like this : > 1) open an large data file to get a numpy array > 2) perform computations on this array (I'm only talking of the numpy > part here. scipy is something else) > 3) Write the result is another large file > > It is so simple to write using numpy :) > Now, if I want to have several exe, step 3 is often a problem. If that large file can be accessed by memory-mapping, then step 3 can actually be quite easy. You have one program make the empty file of the given size (f.seek(FILE_SIZE); f.write('\0'); f.seek(0,0)) and then make each of the parallel programs memory map the file and only write to their respective portions. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lists at informa.tiker.net Mon Mar 24 14:28:19 2008 From: lists at informa.tiker.net (Andreas =?iso-8859-1?q?Kl=F6ckner?=) Date: Mon, 24 Mar 2008 14:28:19 -0400 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) Message-ID: <200803241428.22844.lists@informa.tiker.net> Hi all, I just got tripped up by this behavior in Numpy 1.0.4: >>> u = numpy.array([1,3]) >>> v = numpy.array([0.2,0.1]) >>> u+=v >>> u array([1, 3]) >>> I think this is highly undesirable and should be fixed, or at least warned about. Opinions? Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From bsouthey at gmail.com Mon Mar 24 16:00:55 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 24 Mar 2008 15:00:55 -0500 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <20080324181322.113700@gmx.net> References: <20080324130505.249460@gmx.net> <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> <20080324171539.171030@gmx.net> <47E7EC04.8040708@gmail.com> <20080324181322.113700@gmx.net> Message-ID: <47E80877.9050101@gmail.com> Hi, True, I noticed that on my system (with 8 Gb memory) that using 9999 works but not 10000. Also, use of a 2 dimensional array also crashes if the size if large enough: large_m=numpy.vstack((large_none, large_none)) Bruce Martin Manns wrote: > Bruce Southey wrote:> Hi, > >> This also crashes by numpy 1.0.4 under python 2.5.1. I am guessing it >> may be due to numpy.any() probably not understanding the 'None' . >> > > I doubt that because I get the segfault for all kinds of object arrays that I try out: > > ~$ python > Python 2.4.5 (#2, Mar 12 2008, 00:15:51) > [GCC 4.2.3 (Debian 4.2.3-2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>>> import numpy >>>> small_obj = numpy.array([1]*10**3, dtype="O") >>>> numpy.any(small_obj) >>>> > True > >>>> large_obj = numpy.array([1]*10**6, dtype="O") >>>> numpy.any(large_obj) >>>> > Segmentation fault > ~$ python > >>>> import numpy >>>> large_strobj = numpy.array(["Yet another string."]*10**6, dtype="O") >>>> numpy.any(large_strobj) >>>> > Segmentation fault > > Martin > > > From charlesr.harris at gmail.com Mon Mar 24 17:19:22 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 24 Mar 2008 15:19:22 -0600 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <47E80877.9050101@gmail.com> References: <20080324130505.249460@gmx.net> <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> <20080324171539.171030@gmx.net> <47E7EC04.8040708@gmail.com> <20080324181322.113700@gmx.net> <47E80877.9050101@gmail.com> Message-ID: On Mon, Mar 24, 2008 at 2:00 PM, Bruce Southey wrote: > Hi, > True, I noticed that on my system (with 8 Gb memory) that using 9999 > works but not 10000. > Also, use of a 2 dimensional array also crashes if the size if large > enough: > large_m=numpy.vstack((large_none, large_none)) > > Bruce > > > Martin Manns wrote: > > Bruce Southey wrote:> Hi, > > > >> This also crashes by numpy 1.0.4 under python 2.5.1. I am guessing it > >> may be due to numpy.any() probably not understanding the 'None' . > >> > > > > I doubt that because I get the segfault for all kinds of object arrays > that I try out: > > > > ~$ python > > Python 2.4.5 (#2, Mar 12 2008, 00:15:51) > > [GCC 4.2.3 (Debian 4.2.3-2)] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > > >>>> import numpy > >>>> small_obj = numpy.array([1]*10**3, dtype="O") > >>>> numpy.any(small_obj) > >>>> > > True > > > >>>> large_obj = numpy.array([1]*10**6, dtype="O") > >>>> numpy.any(large_obj) > >>>> > > Segmentation fault > > ~$ python > > > >>>> import numpy > >>>> large_strobj = numpy.array(["Yet another string."]*10**6, dtype="O") > >>>> numpy.any(large_strobj) > >>>> > > Segmentation fault > > > > Martin > > > > > Maybe we are forgetting to check the return value of malloc or overlooking failures in python allocation. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Joris.DeRidder at ster.kuleuven.be Mon Mar 24 18:42:24 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Mon, 24 Mar 2008 23:42:24 +0100 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <20080324172720.171020@gmx.net> References: <20080324130505.249460@gmx.net> <6C117468-3C68-4284-BB36-C82C97FF3C56@ster.kuleuven.be> <20080324172720.171020@gmx.net> Message-ID: <32E73617-AA84-4FF5-A6FC-212EEAF6F4EE@ster.kuleuven.be> On 24 Mar 2008, at 18:27, Martin Manns wrote: >> I cannot confirm the problem on my intel macbook pro using the same >> Python and Numpy versions. Although any(numpy.array(large_none)) >> takes >> a significantly longer time than any(numpy.array(large_zero)), the >> former does not segfault on my machine. >> > > Did you use numpy.any? >>>> numpy.any(numpy.array(large_zero)) > Python's built-in any function works all right. > Only the numpy.any function segfaults on my machines. Oops, I was using python's built-in any() function. With numpy.any() I also get a bus error. J. 
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From stefan at sun.ac.za Mon Mar 24 19:02:54 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 25 Mar 2008 00:02:54 +0100 Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? In-Reply-To: <398057.35159.qm@web34408.mail.mud.yahoo.com> References: <398057.35159.qm@web34408.mail.mud.yahoo.com> Message-ID: <9457e7c80803241602h6b2f5be5v4592b790c163b628@mail.gmail.com> On Mon, Mar 24, 2008 at 6:04 PM, Lou Pecora wrote: > > --- Matthieu Brucher > wrote: > > > > It was added as a compile-time #define on the SVN > > some days ago ;) > > > > Matthieu > > Thanks, Matthieu, that's a good step. But when the > SVD function throws an exception is it clear that the > user can redefine niter and recompile? Otherwise, the > fix remains well hidden. Most user will be left > puzzled. I think a comment in the raise statement > would be good. Just point to the solution or where > the user could find it. That's a valid concern. We could maybe pass down the iteration limit as a keyword? Lou, would you create a ticket for this as a feature enhancement, and refer to http://projects.scipy.org/scipy/numpy/ticket/706 please? Thank you. St?fan From oliphant at enthought.com Mon Mar 24 19:13:58 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 24 Mar 2008 18:13:58 -0500 Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? In-Reply-To: <9457e7c80803241602h6b2f5be5v4592b790c163b628@mail.gmail.com> References: <398057.35159.qm@web34408.mail.mud.yahoo.com> <9457e7c80803241602h6b2f5be5v4592b790c163b628@mail.gmail.com> Message-ID: <47E835B6.8000000@enthought.com> St?fan van der Walt wrote: > On Mon, Mar 24, 2008 at 6:04 PM, Lou Pecora wrote: > >> --- Matthieu Brucher >> wrote: >> >> >> > It was added as a compile-time #define on the SVN >> > some days ago ;) >> > >> > Matthieu >> >> Thanks, Matthieu, that's a good step. But when the >> SVD function throws an exception is it clear that the >> user can redefine niter and recompile? Otherwise, the >> fix remains well hidden. Most user will be left >> puzzled. I think a comment in the raise statement >> would be good. Just point to the solution or where >> the user could find it. >> > > That's a valid concern. We could maybe pass down the iteration limit > as a keyword? > This won't work without significant re-design. This limit is in the low-level code which is an f2c'd version of some BLAS which is NumPy's default SVD implementation if it can't find a vendor BLAS. -Travis O. From xavier.gnata at gmail.com Mon Mar 24 20:08:32 2008 From: xavier.gnata at gmail.com (Gnata Xavier) Date: Tue, 25 Mar 2008 01:08:32 +0100 Subject: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?) In-Reply-To: <3d375d730803241114j5b09da87tee6dcea72dbb38@mail.gmail.com> References: <47E7CBA8.5010501@gmail.com> <47E7E113.80007@gmail.com> <3d375d730803241114j5b09da87tee6dcea72dbb38@mail.gmail.com> Message-ID: <47E84280.8090404@gmail.com> Robert Kern wrote: > On Mon, Mar 24, 2008 at 12:12 PM, Gnata Xavier wrote: > > >> Well it is not that easy. We have several numpy code following like this : >> 1) open an large data file to get a numpy array >> 2) perform computations on this array (I'm only talking of the numpy >> part here. scipy is something else) >> 3) Write the result is another large file >> >> It is so simple to write using numpy :) >> Now, if I want to have several exe, step 3 is often a problem. 
>> > > If that large file can be accessed by memory-mapping, then step 3 can > actually be quite easy. You have one program make the empty file of > the given size (f.seek(FILE_SIZE); f.write('\0'); f.seek(0,0)) and > then make each of the parallel programs memory map the file and only > write to their respective portions. > > Yep but that is the best case. Our "standard" case is a quite long sequence of simple computation on arrays. Some part are clearly thread-candidates but not every parts. For instance, at step N+1 I have to multiply foo by the sum of a large array computed at step N-1. I can split the sum computation over several exe but it is not convenient at all and not that easy to get the sum at the end (I know ugly ways to do that. ugly). One step large computations can be split into several exe. Several steps large one are another story :( Xavier From stefan at sun.ac.za Mon Mar 24 20:11:53 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 25 Mar 2008 01:11:53 +0100 Subject: [Numpy-discussion] Numpy's future again (was OpenMP ...) Message-ID: <9457e7c80803241711v580bde83ta8b7284c058139b9@mail.gmail.com> On Mon, Mar 24, 2008 at 6:37 PM, David Cournapeau wrote: > That's one of the reason why I was thinking about a gradual move of most > "core functionalities of the core" toward a separate C library, with a > simple and crystal clear interface, without any reference to any python > API, just plain C with plain pointers. We could then force this core, > "pure" C library to be used only through dereferencing of an additional > pointer, thus enabling dynamic change of the actual functions (at least > when numpy is started). > > I have to say I really like the idea of more explicit separation between > the actual computation code and the python wrapping; it can only help if > we decide to write some of the wrappers in Cython/ctypes/whatever > instead of pure C as today. It has many advantages in terms of > reliability, maintainability and performance (testing performance would > be much easier I think, since it could be done in pure C). I like the suggestion of pulling out the core functionality into a separate library. David mentions that it would help if (meaning "when", right :) we move over to Cython for the core/ufuncs/wrappers. Cython code would certainly alleviate the pressure on the few developers on board, by providing us with a) an easier interface to the internals (read: more contributions) and b) fewer bugs cropping up, especially with regards to reference counting. Besides that, it also has a number of fantastic features, including introspection of compiled extensions (no, I'm serious), made possible by heavy annotation of the generated code. Given that, I think the numpy community should watch the following Google SOC project very carefully (ndimage developers also take note): http://wiki.cython.org/DagSverreSeljebotn/soc/details It also looks like a significant amount of Cython activity will take place during the SAGE developers' days 1: http://wiki.sagemath.org/dev1 We can only benefit from close interaction with their group, so it would be worth popping in to discuss this part of the project (please let us know if you're going so that we can further exchange ideas). I am very glad to see all the interest in this thread (that I just broke, sorry) regarding optimisation of different parts of the numpy codebase (I hope my concerns regarding the status of the test coverage didn't make you believe otherwise). 
As soon as we release 1.0.5 we'll be in a position to switch to the nose testing framework, which will simplify the expansion of our test base significantly. In turn, that will allow us to further explore these suggestions in terms of patches, rather than mere discussions. I'm tempted to also talk about the idea for a wiki-to-source-roundtrip for documentation system, but I should stop before I get carried away. Happy hacking, St?fan From stefan at sun.ac.za Mon Mar 24 20:26:09 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 25 Mar 2008 01:26:09 +0100 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) In-Reply-To: <200803241428.22844.lists@informa.tiker.net> References: <200803241428.22844.lists@informa.tiker.net> Message-ID: <9457e7c80803241726o73339e01q8d87cd384cf47b30@mail.gmail.com> Hi Andreas On Mon, Mar 24, 2008 at 7:28 PM, Andreas Kl?ckner wrote: > I just got tripped up by this behavior in Numpy 1.0.4: > > >>> u = numpy.array([1,3]) > >>> v = numpy.array([0.2,0.1]) > >>> u+=v > >>> u > array([1, 3]) > >>> > > I think this is highly undesirable and should be fixed, or at least warned > about. Opinions? I know the result is surprising, but it follows logically. You have created two integers in memory, and now you add 0.2 and 0.1 to both -- not enough to flip them over to the next value. The equivalent in C is roughly: #include int main() { int i; int x[2] = {1,3}; x[0] += 0.2; x[1] += 0.1; printf("[%d %d]\n", x[0], x[1]); } Which results in the same answer. Regards St?fan From stefan at sun.ac.za Mon Mar 24 20:46:35 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 25 Mar 2008 01:46:35 +0100 Subject: [Numpy-discussion] numpy.any segfaults for large object arrays In-Reply-To: <20080324130505.249460@gmx.net> References: <20080324130505.249460@gmx.net> Message-ID: <9457e7c80803241746j4be4d164l59939b9680c1120@mail.gmail.com> Hi Martin Please file a bug on the trac page: http://projects.scipy.org/scipy/numpy You may mark memory errors as blockers for the next release. Confirmed under latest SVN. Thanks St?fan On Mon, Mar 24, 2008 at 2:05 PM, Martin Manns wrote: > Hello, > > I am encountering a problem (a bug?) with the numpy any function. > Since the python any function behaves in a slightly different way, > I would like to keep using numpy's. > > Here is the problem: > > $ python > Python 2.5.1 (r251:54863, Jan 26 2008, 01:34:00) > [GCC 4.1.2 (Gentoo 4.1.2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy > >>> numpy.version.version > '1.0.4' > >>> numpy.version.release > True > >>> small_zero = [0] * 1000 > >>> large_zero = [0] * 1000000 > >>> small_none = [None] * 1000 > >>> large_none = [None] * 1000000 > >>> any(small_zero) > False > >>> any(large_zero) > False > >>> any(small_none) > False > >>> any(large_none) > False > >>> any(numpy.array(small_zero)) > False > >>> any(numpy.array(large_zero)) > False > >>> any(numpy.array(small_none)) > False > >>> any(numpy.array(large_none)) > False > >>> numpy.any(numpy.array(small_zero)) > False > >>> numpy.any(numpy.array(large_zero)) > False > >>> numpy.any(numpy.array(small_none)) > False > >>> numpy.any(numpy.array(large_none)) > Segmentation fault > > The segfault occurs for other object arrays as well. > Any idea how to get around this? > > Thanks in advance > > Martin > > P.S. I tried the bug tracker but my e-mail does not seem to show up. 
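Until the crash itself is fixed, one possible stop-gap for Martin's "how do I get around this?" question is to let numpy.any see the object array only in small slices. This is just a sketch: whether it really dodges the segfault depends on where the size threshold lies on a given machine (Bruce saw 9999 elements pass and 10000 fail), so treat it as untested against the bug itself.

import numpy

def any_in_chunks(arr, chunksize=5000):
    # Reduce the flattened array slice by slice so numpy.any never
    # operates on the full object array in one call.
    flat = arr.ravel()
    for start in range(0, flat.size, chunksize):
        if numpy.any(flat[start:start + chunksize]):
            return True
    return False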
From stefan at sun.ac.za Mon Mar 24 20:54:47 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 25 Mar 2008 01:54:47 +0100 Subject: [Numpy-discussion] C++ class encapsulating ctypes-numpy array? In-Reply-To: <24935ED8-0E84-4997-869E-777CCC61B547@ster.kuleuven.be> References: <657CC4E9-1BFB-463C-9E6A-520CEA685914@ster.kuleuven.be> <24935ED8-0E84-4997-869E-777CCC61B547@ster.kuleuven.be> Message-ID: <9457e7c80803241754g5cc0bdb1gbd3e80937dec9de8@mail.gmail.com> Hi Joris Also take a look at the work done by Neal Becker, and posted on this list earlier this year or end of last. Please go ahead and create a cookbook entry on the wiki -- that way we have a central plce for writing up further explorations of this kind (also, let us know on the list if you do). Thanks! St?fan On Thu, Mar 20, 2008 at 1:46 PM, Joris De Ridder wrote: > > > Thanks Matthieu, for the interesting pointer. > > My goal was to be able to use ctypes, though, to avoid having to do manual > memory management. Meanwhile, I was able to code something in C++ that may > be useful (see attachment). It (should) work as follows. > > 1) On the Python side: convert a numpy array to a ctypes-structure, and feed > this to the C-function: > arg = c_ndarray(array) > mylib.myfunc(arg) > > 2) On the C++ side: receive the numpy array in a C-structure: > myfunc(numpyArray array) > > 3) Again on the C++ side: convert the C-structure to an Ndarray class: (e.g. > for a 3D array) > Ndarray a(array) > > No data copying is involved in any conversion, of course. Step 2 is > required to keep ctypes happy. I can now use a[i][j][k] and the conversion > from [i][j][k] to i*strides[0] + j * strides[1] + k * strides[2] is done at > compile time using template metaprogramming. The price to pay is that the > number of dimensions of the Ndarray has to be known at compile time (to > instantiate the template), which is reasonable I think, for the gain in > convenience. My first tests seem to be satisfying. > > I would really appreciate if someone could have a look at it and tell me if > it can be done much better than what I cooked. If it turns out that it may > interest more people, I'll put it on the scipy wiki. > > Cheers, > Joris From lists at informa.tiker.net Tue Mar 25 00:42:22 2008 From: lists at informa.tiker.net (Andreas =?iso-8859-1?q?Kl=F6ckner?=) Date: Tue, 25 Mar 2008 00:42:22 -0400 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) In-Reply-To: <9457e7c80803241726o73339e01q8d87cd384cf47b30@mail.gmail.com> References: <200803241428.22844.lists@informa.tiker.net> <9457e7c80803241726o73339e01q8d87cd384cf47b30@mail.gmail.com> Message-ID: <200803250042.32427.lists@informa.tiker.net> On Montag 24 M?rz 2008, St?fan van der Walt wrote: > > I think this is highly undesirable and should be fixed, or at least > > warned about. Opinions? > > I know the result is surprising, but it follows logically. You have > created two integers in memory, and now you add 0.2 and 0.1 to both -- > not enough to flip them over to the next value. The equivalent in C > is roughly: Thanks for the explanation. By now I've even found the fat WARNING in the Numpy book. I understand *why* this happens, but I still don't think it's a particular sensible thing to do. I found past discussion on this on the list: http://article.gmane.org/gmane.comp.python.numeric.general/2924/match=inplace+int+float but the issue didn't seem finally settled then. If I missed later discussions, please let me know. Question: If it's a known trap, why not change it? 
To me, it's the same idea as 3/4==0 in Python--if you know C, it makes sense. OTOH, Python itself will silently upcast on int+=float, and they underwent massive breakage to make 3/4==0.75. I see 2.5 acceptable resolutions of ndarray += ndarray, in order of preference: - Raise an error, but add a lightweight wrapper, such as int_array += downcast_ok(float_array) to allow the operation anyway. - Raise an error unconditionally, forcing the user to make a typecast copy. - Silently upcast the target. This is no good because it breaks existing code non-obviously. I'd provide a patch if there's any interest. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From oliphant at enthought.com Tue Mar 25 00:54:51 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 24 Mar 2008 23:54:51 -0500 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) In-Reply-To: <200803250042.32427.lists@informa.tiker.net> References: <200803241428.22844.lists@informa.tiker.net> <9457e7c80803241726o73339e01q8d87cd384cf47b30@mail.gmail.com> <200803250042.32427.lists@informa.tiker.net> Message-ID: <47E8859B.9090109@enthought.com> Andreas Kl?ckner wrote: > On Montag 24 M?rz 2008, St?fan van der Walt wrote: > >>> I think this is highly undesirable and should be fixed, or at least >>> warned about. Opinions? >>> >> I know the result is surprising, but it follows logically. You have >> created two integers in memory, and now you add 0.2 and 0.1 to both -- >> not enough to flip them over to the next value. The equivalent in C >> is roughly: >> > > > > Thanks for the explanation. By now I've even found the fat WARNING in the > Numpy book. > > I understand *why* this happens, but I still don't think it's a particular > sensible thing to do > > Question: If it's a known trap, why not change it? > It also has useful applications. Also, it can only happen at with a bump in version number to 1.1 -Travis O. From lists at informa.tiker.net Tue Mar 25 01:08:35 2008 From: lists at informa.tiker.net (Andreas =?iso-8859-1?q?Kl=F6ckner?=) Date: Tue, 25 Mar 2008 01:08:35 -0400 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) In-Reply-To: <47E8859B.9090109@enthought.com> References: <200803241428.22844.lists@informa.tiker.net> <200803250042.32427.lists@informa.tiker.net> <47E8859B.9090109@enthought.com> Message-ID: <200803250108.36693.lists@informa.tiker.net> On Dienstag 25 M?rz 2008, Travis E. Oliphant wrote: > > Question: If it's a known trap, why not change it? > > It also has useful applications. Also, it can only happen at with a > bump in version number to 1.1 I'm not trying to make the functionality go away. I'm arguing that int_array += downcast_ok(float_array) should be the syntax for it. downcast_ok could be a view of float_array's data with an extra flag set, or a subclass. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. 
URL: From nadavh at visionsense.com Tue Mar 25 02:08:49 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 25 Mar 2008 08:08:49 +0200 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) References: <200803241428.22844.lists@informa.tiker.net><9457e7c80803241726o73339e01q8d87cd384cf47b30@mail.gmail.com> <200803250042.32427.lists@informa.tiker.net> Message-ID: <710F2847B0018641891D9A21602763600B6F46@ex3.envision.co.il> -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Andreas Kl?ckner ????: ? 25-???-08 06:42 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] __iadd__(ndarray, ndarray) On Montag 24 M?rz 2008, St?fan van der Walt wrote: > > I think this is highly undesirable and should be fixed, or at least > > warned about. Opinions? > > I know the result is surprising, but it follows logically. You have > created two integers in memory, and now you add 0.2 and 0.1 to both -- > not enough to flip them over to the next value. The equivalent in C > is roughly: Thanks for the explanation. By now I've even found the fat WARNING in the Numpy book. I understand *why* this happens, but I still don't think it's a particular sensible thing to do. I found past discussion on this on the list: http://article.gmane.org/gmane.comp.python.numeric.general/2924/match=inplace+int+float but the issue didn't seem finally settled then. If I missed later discussions, please let me know. Question: If it's a known trap, why not change it? To me, it's the same idea as 3/4==0 in Python--if you know C, it makes sense. OTOH, Python itself will silently upcast on int+=float, and they underwent massive breakage to make 3/4==0.75. **************************************************** **************************************************** scalars are immutable objects in python. Thus the += (and alike) are "fake": >>> a = 23 >>> id(a) 10835088 >>> a += 3 >>> id(a) 10835052 << 'a' is a different object >>> l = ['a',3] >>> id(l) 13523744 >>> l += [34] >>> id(l) 13523744 << lists are mutable, thus 'l' stays the same a += 3 is really equivalent to a = a+3. Python does not allow in place type change, it just creates a different object. numpy convention is consistent with the python's spirit. I really use that fact to write arr1 += something, in order to be sure that the type of arr1 is conserved, and write arr1 = arr1+something, to allow upward type casting. Nadav. **************************************************** **************************************************** I see 2.5 acceptable resolutions of ndarray += ndarray, in order of preference: - Raise an error, but add a lightweight wrapper, such as int_array += downcast_ok(float_array) to allow the operation anyway. - Raise an error unconditionally, forcing the user to make a typecast copy. - Silently upcast the target. This is no good because it breaks existing code non-obviously. I'd provide a patch if there's any interest. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4462 bytes Desc: not available URL: From nwagner at iam.uni-stuttgart.de Tue Mar 25 03:51:17 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 25 Mar 2008 08:51:17 +0100 Subject: [Numpy-discussion] Segmentation fault check_float_repr Message-ID: Hi all, Is this a known issue with latest svn numpy.test(verbosity=2) segfaults with check_float_repr (numpy.core.tests.test_scalarmath.TestRepr) Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 182894186368 (LWP 6930)] 0x0000003390e3d5e5 in __mpn_mul_1 () from /lib64/tls/libc.so.6 (gdb) bt #0 0x0000003390e3d5e5 in __mpn_mul_1 () from /lib64/tls/libc.so.6 #1 0x0000003390e45bd6 in __printf_fp () from /lib64/tls/libc.so.6 Nils From haase at msg.ucsf.edu Tue Mar 25 05:47:19 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Tue, 25 Mar 2008 10:47:19 +0100 Subject: [Numpy-discussion] your numpy bug-report -- clamp and percentile Message-ID: Hi Connelly ! I saw your bug-report (#626) was closed (as invalid). [[ I think "invalid" is wrong, since numpy != scipy ]] (http://scipy.org/scipy/numpy/ticket/626) Two functions I am continually re-implementing in my own number-crunching projects are percentile() and clamp(). Here percentile(array, p) returns the pth percentile of array, using linear interpolation (p=0.5 is median, p=1.0 is max, p=0.25 is lower quartile, etc). And clamp(array, lo, hi) clamps to lower and upper bound scalars. Would these functions be appropriate for numpy? Will you accept patches? Thanks, Connelly Barnes connellybarnes at-symbol gmail dot com If someone wants to not "rely" on scipy , then the precentile would still be missing ..... (And I also have a C implementation for *inplace* clamp (clip) ) Would you mind sending me the code you use for those functions ? Thanks, Sebastian Haase From cournapeau at cslab.kecl.ntt.co.jp Tue Mar 25 06:38:37 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Tue, 25 Mar 2008 19:38:37 +0900 Subject: [Numpy-discussion] your numpy bug-report -- clamp and percentile In-Reply-To: References: Message-ID: <1206441517.15852.6.camel@bbc8> On Tue, 2008-03-25 at 10:47 +0100, Sebastian Haase wrote: > Hi Connelly ! > Hi Sebastian, > > If someone wants to not "rely" on scipy , then the precentile would > still be missing ..... The problem is that you can apply this reasoning to any function. So there should be a limit. > (And I also have a C implementation for *inplace* clamp (clip) ) Numpy clip can work inplace if you use the out parameter. I just checked that it does not create temporaries for huge matrices, and it works (I created a matrix of more than half my available memory, and clipping it is works): a = N.random.randn(10000, 100000) a.clip(0, 1, out = a) So maybe there is a doc problem, but the functionality is here. cheers, David From lists at informa.tiker.net Tue Mar 25 08:57:58 2008 From: lists at informa.tiker.net (Andreas =?iso-8859-1?q?Kl=F6ckner?=) Date: Tue, 25 Mar 2008 08:57:58 -0400 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) In-Reply-To: <710F2847B0018641891D9A21602763600B6F46@ex3.envision.co.il> References: <200803241428.22844.lists@informa.tiker.net> <200803250042.32427.lists@informa.tiker.net> <710F2847B0018641891D9A21602763600B6F46@ex3.envision.co.il> Message-ID: <200803250857.59932.lists@informa.tiker.net> On Dienstag 25 M?rz 2008, Nadav Horesh wrote: > scalars are immutable objects in python. Thus the += (and alike) are "fake": Again, thanks for the explanation. IMHO, whether or not they are fake is an implementation detail. You shouldn't have to know Python's guts to be able to use Numpy successfully. Even if they weren't fake, implementing my suggested semantics in Numpy wouldn't be particularly hard. > [snip] > a += 3 is really equivalent to a = a+3. Except when it isn't. > [snip] > numpy convention is consistent > with the python's spirit. A matter of taste. 
> I really use that fact to write arr1 += > something, in order to be sure that the type of arr1 is conserved, and > write arr1 = arr1+something, to allow upward type casting. I'm not trying to make the operation itself go away. I'm trying to make the syntax beginner-safe. Complete loss of precision without warning is not a meaning that I, as a toolkit designer, would assign to an innocent-looking inplace operation. My hunch is that many people who start with Numpy will spend an hour of their lives hunting a spurious bug caused by this. I have. Think of the time we can save humanity. :) Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From chris at simplistix.co.uk Tue Mar 25 10:33:58 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 25 Mar 2008 14:33:58 +0000 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <200803212024.40630.pgmdevlist@gmail.com> References: <47E0F2AC.7040200@simplistix.co.uk> <200803201017.20396.pgmdevlist@gmail.com> <47E3E86F.5010401@simplistix.co.uk> <200803212024.40630.pgmdevlist@gmail.com> Message-ID: <47E90D56.9020903@simplistix.co.uk> Pierre GM wrote: > > Well, yeah, my bad, that depends on whether you use masked_invalid or > fix_invalid or just build a basic masked array. Yeah, well, if there were any docs I'd have a *clue* what you were talking about ;-) >>>> y=ma.fix_invalid(x) I've never done this ;-) > Having NaNs in an array usually reduces performance: the option we follow w/ > fix_invalid is to clear the masked array of the NaNs, and keeping track of > where they were by setting the mask to True at the appropriate location. That's good to know.... > That > way, you don't have the drop of performance of having NaNs in your underlying > array. > Oh, and NaNs will be transformed to 0 if you use ints... "use ints" in what context? > Nope, the idea is really is to make things as efficient as possible. For you, maybe. And for me, yes, except I wanted the NaNs to stick around... > y=ma.masked_invalid(x) I'm not using masked_invalid. I didn't even know it existed. > Because in your particular case, you're inspecting elements one by one, and > then, your masked data becomes the masked singleton which is a special value. I'd argue that the masked singleton having a different fill value to the ma it comes from is a bug. > And once again, it's not. numpy.ma.masked is a special value, like numpy.nan > or numpy.inf ...which is silly, since that forces it to have a fixed fill value, which it should not. 
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From roygeorget at gmail.com Tue Mar 25 10:46:23 2008 From: roygeorget at gmail.com (royG) Date: Tue, 25 Mar 2008 07:46:23 -0700 (PDT) Subject: [Numpy-discussion] creating eigenface images Message-ID: hi As discussed in the thread http://groups.google.com/group/Numpy-discussion/browse_thread/thread/b9774ac757c3c 98e/a66aa2565d4e6a24 i tried to create an application to create eigenfaces from a set of jpeg images.I followed these steps after obtaining an ndarray of 17 images (ie 17 rows where each row is an image and a column is pixel intensity values) and 18750 columns(since each image is 125X150) faceimages is a (17,18750) ndarray averageimage=average(faceimages,axis=0) adjfaces=faceimages-averageimage adjfaces_tr=adjfaces.transpose() covmat=dot(adjfaces , adjfaces_tr) evals,evects=eigh(covmat) reversedevalueorder=evals.argsort()[::-1] sortedeigenvectors=evects[:,reversedevalueorder] # now i have sortedeigenvectors of shape(17,17) .I assume that the sort has made the first column to contain the most significant eigenvector.Can someone confirm this assumption? If i do a transpose() on it then i will get an ndarray with the first row as most significant eigenvector? sortedeigenvectors_rowwise=sortedeigenvectors.transpose() # then i create a facespace where each row correspond to an eigenface facespace=dot(sortedeigenvectors_rowwise,adjfaces) # i want to create the eigenface corresponding to most significant and least significant eigenvectors.(I do this with the help of a createImage() function to put pixelvalues in an image) besteigenvector=sortedeigenvectors_rowwise[0] leasteigenvector=sortedeigenvectors_rowwise[numberofimgs-1] #which is 16 besteigenface=dot(besteigenvector,adjfaces) leasteigenface=dot(leasteigenvector,adjfaces) createImage(besteigenface,"eigenface0.jpg",(125,150)) createImage(leasteigenface,"eigenface16.jpg",(125,150)) #now this creates 2 images.they are given in this page(http:// roytechdumps.blogspot.com/).The leasteigenface ie 'eigenface16.jpg' is quite 'deteriorated' in appearance compared to the other.Is this because of the corresponding eigenvector containing least variations? can someone explain this deterioration? thanks RG by the way the function to create image is def createImage(v, filename,imsize): v.shape = (-1,) #change to 1 dim array a, b = v.min(), v.max() im = Image.new('L', imsize) sclaedarray=((v-a)* 255/(b - a)) im.putdata(scaledarray) From matthieu.brucher at gmail.com Tue Mar 25 10:53:18 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 25 Mar 2008 15:53:18 +0100 Subject: [Numpy-discussion] creating eigenface images In-Reply-To: References: Message-ID: > > #now this creates 2 images.they are given in this page(http:// > roytechdumps.blogspot.com/).The leasteigenface ie 'eigenface16.jpg' > is quite 'deteriorated' in appearance compared to the other.Is this > because of the > > corresponding eigenvector containing least variations? can someone > explain this deterioration? > What is exactly the problem ? The fact that the least significant eigenimage is "deteriorated" is logical. Eigenimages are trying to represent in a linear way something that is not. The smallest variations are then represented by an artifact, and this is what you get. 
Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Mar 25 11:12:00 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 25 Mar 2008 11:12:00 -0400 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <47E90D56.9020903@simplistix.co.uk> References: <47E0F2AC.7040200@simplistix.co.uk> <200803212024.40630.pgmdevlist@gmail.com> <47E90D56.9020903@simplistix.co.uk> Message-ID: <200803251112.00638.pgmdevlist@gmail.com> On Tuesday 25 March 2008 10:33:58 Chris Withers wrote: > Pierre GM wrote: > > Well, yeah, my bad, that depends on whether you use masked_invalid or > > fix_invalid or just build a basic masked array. > > Yeah, well, if there were any docs I'd have a *clue* what you were > talking about ;-) My bad, I neglected an overall doc for the functions and their docstring. But you know what ? As you're now at an intermediary level, you'll be able to help: just write down the problems you encountered, and the solutions you came up with, so that we could use your experience as the backbone for a proper MaskedArray documentation > > Oh, and NaNs will be transformed to 0 if you use ints... > > "use ints" in what context? Try that: >>>x = numpy.ma.array([0,1,2,3,]) >>>x[-1] = numpy.nan >>>print x >>>[0 1 2 0] See? No NaNs with an int array. > > Nope, the idea is really is to make things as efficient as possible. > > For you, maybe. And for me, yes, except I wanted the NaNs to stick > around... Well, no problem, they should stick around. Note that if a NaN/Inf should normally show up as the result of some operation (divide by zero for example), it'll probably won't: >>>x = numpy.ma.array([0,1,2,numpy.nan],dtype=float) >>>print 1./x >>>[-- 1.0 0.5 nan] >>>print (1./x)._data >>>[ 1. 1. 0.5 NaN] >>>print 1./x._data >>>[ Inf 1. 0.5 NaN] > I'd argue that the masked singleton having a different fill value to the > ma it comes from is a bug. "It's not a bug, it's a feature"TM > > And once again, it's not. numpy.ma.masked is a special value, like > > numpy.nan or numpy.inf > > ...which is silly, since that forces it to have a fixed fill value, > which it should not. The fill_value for the mask singleton is meaningless, correct. However, having numpy.ma.masked as a constant is really helpful to test whether a particular value is masked, or to mask a particular value: >>>x = numpy.ma.array([0,1,2,3]) >>>x[-1] = masked >>>x[-1] is masked >>>True From roygeorget at gmail.com Tue Mar 25 11:34:52 2008 From: roygeorget at gmail.com (royG) Date: Tue, 25 Mar 2008 08:34:52 -0700 (PDT) Subject: [Numpy-discussion] creating eigenface images In-Reply-To: References: Message-ID: <3771446d-6b53-4cba-80ee-f749b8386a57@d4g2000prg.googlegroups.com> least significant eigenimage > is "deteriorated" is logical. Eigenimages are trying to represent in a > linear way something that is not. The smallest variations are then > represented by an artifact, and this is what you get. 
> > thanks Matthieu ..if that is the logical behaviour then i believe my code is generating the eigenfaces in the correct way..I should be able to reconstruct the original images from those eigenfaces RG From lou_boog2000 at yahoo.com Tue Mar 25 11:37:42 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 25 Mar 2008 08:37:42 -0700 (PDT) Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? In-Reply-To: <47E835B6.8000000@enthought.com> Message-ID: <612423.32837.qm@web34402.mail.mud.yahoo.com> Travis, Does that mean it's not worth starting a ticket? Sounds like nothing can be done, *except* to put this in the documentation and the FAQ. It has bitten several people. --- "Travis E. Oliphant" wrote: > St?fan van der Walt wrote: >> Lou Pecora wrote: > >> Thanks, Matthieu, that's a good step. But when the > >> SVD function throws an exception is it clear that the > >> user can redefine niter and recompile? Otherwise, the > >> fix remains well hidden. Most user will be left > >> puzzled. I think a comment in the raise statement > >> would be good. Just point to the solution or where > >> the user could find it. > >> > > > > That's a valid concern. We could maybe pass down > the iteration limit > > as a keyword? > > > This won't work without significant re-design. This > limit is in the > low-level code which is an f2c'd version of some > BLAS which is NumPy's > default SVD implementation if it can't find a vendor > BLAS. > > -Travis O. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From oliphant at enthought.com Tue Mar 25 12:31:36 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Tue, 25 Mar 2008 11:31:36 -0500 Subject: [Numpy-discussion] SVD error in Numpy. NumPy Update reversed? In-Reply-To: <612423.32837.qm@web34402.mail.mud.yahoo.com> References: <612423.32837.qm@web34402.mail.mud.yahoo.com> Message-ID: <47E928E8.7000708@enthought.com> Lou Pecora wrote: > Travis, Does that mean it's not worth starting a > ticket? Sounds like nothing can be done, *except* to > put this in the documentation and the FAQ. It has > bitten several people. > You can start a ticket with a milestone of 1.1, but I don't think it is worth it, given how much work I think it would take to implement for only a bit of value to the few people that *don't* use optimized lapack with NumPy. -Travis From doutriaux1 at llnl.gov Tue Mar 25 15:45:31 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Tue, 25 Mar 2008 12:45:31 -0700 Subject: [Numpy-discussion] f2py changed ? In-Reply-To: <47E928E8.7000708@enthought.com> References: <612423.32837.qm@web34402.mail.mud.yahoo.com> <47E928E8.7000708@enthought.com> Message-ID: <47E9565B.1090408@llnl.gov> Hello, I have an f2py module that used to work great, now it breaks, first of all the setup.py extension used to have: # f2py_options = ["--fcompiler=gfortran",], I now need to comment this out, and hope it picks up the right compiler... 
at the beg of the script I have a line from the autoconvert: import numpy.oldnumeric as Numeric when running I get: variable = _gengridzmean.as_column_major_storage(Numeric.transpose(variable.astype (Numeric.Float32).filled(0))) AttributeError: 'module' object has no attribute 'as_column_major_storage' I tried to go around that by using straight numpy calls everywhere and using numpy.asfortranarray instead But now it collapse a bit further: res = ZonalMeans.compute(s) File "/lgm/cdat/latest/lib/python2.5/site-packages/ZonalMeans/zmean.py", line 397, in compute bandlat) #,imt,jmt,kmt,nt,kmt_grid,iomax,vl) _gengridzmean.error: failed in converting 5th argument `mask' of _gengridzmean.zonebasin to C/Fortran array I checked all my arrays are treated with asfortranarray and even ascontiguousarray to be sure! I believe this used to work after converting to numpy From doutriaux1 at llnl.gov Tue Mar 25 15:53:23 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Tue, 25 Mar 2008 12:53:23 -0700 Subject: [Numpy-discussion] f2py changed ? In-Reply-To: <47E9565B.1090408@llnl.gov> References: <612423.32837.qm@web34402.mail.mud.yahoo.com> <47E928E8.7000708@enthought.com> <47E9565B.1090408@llnl.gov> Message-ID: <47E95833.406@llnl.gov> Hi as a follwup the latest error seems to be caused by: >>> a=numpy.array(1.) >>> a.shape () >>> numpy.asfortranarray(a).shape (1,) So in cases where my input is basically a float (but sometimes it has 3 or more dims) to gets confused C. Charles Doutriaux wrote: > Hello, > > I have an f2py module that used to work great, now it breaks, > first of all the setup.py extension used to have: > > # f2py_options = ["--fcompiler=gfortran",], > > I now need to comment this out, and hope it picks up the right compiler... > > at the beg of the script I have a line from the autoconvert: > import numpy.oldnumeric as Numeric > when running I get: > variable = > _gengridzmean.as_column_major_storage(Numeric.transpose(variable.astype > (Numeric.Float32).filled(0))) > AttributeError: 'module' object has no attribute 'as_column_major_storage' > > I tried to go around that by using straight numpy calls everywhere and > using numpy.asfortranarray instead > > But now it collapse a bit further: > res = ZonalMeans.compute(s) > File > "/lgm/cdat/latest/lib/python2.5/site-packages/ZonalMeans/zmean.py", line > 397, in compute > bandlat) #,imt,jmt,kmt,nt,kmt_grid,iomax,vl) > _gengridzmean.error: failed in converting 5th argument `mask' of > _gengridzmean.zonebasin to C/Fortran array > > I checked all my arrays are treated with asfortranarray and even > ascontiguousarray to be sure! > > I believe this used to work after converting to numpy > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From Chris.Barker at noaa.gov Tue Mar 25 16:37:40 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 25 Mar 2008 13:37:40 -0700 Subject: [Numpy-discussion] __iadd__(ndarray, ndarray) In-Reply-To: <200803250857.59932.lists@informa.tiker.net> References: <200803241428.22844.lists@informa.tiker.net> <200803250042.32427.lists@informa.tiker.net> <710F2847B0018641891D9A21602763600B6F46@ex3.envision.co.il> <200803250857.59932.lists@informa.tiker.net> Message-ID: <47E96294.40503@noaa.gov> Andreas Kl?ckner wrote: >> [snip] >> a += 3 is really equivalent to a = a+3. > > Except when it isn't. right -- it isn't the same. 
In fact, if I were king (or BDFL), you wouldn't be able to use += with immutable types, but I'm not ;-) One of the reasons the augmented assignment operators where added to python was to provide a syntax for in-place operations. The other was to provide a quickie syntax for incrementing things. Unfortunately, those two aren't quite the same thing for immutable types. Anyway, as far as numpy is concerned, it's very important that it means "in-place", and that means it won't change the type. Period. You simply have to know a bit more about types to use numpy than you do with the rest of Python, that that's be design. In Numeric, there was far more default upcasting of data: >>> import Numeric >>> a = Numeric.array((1,2,3), Numeric.Float32) >>> a array([ 1., 2., 3.],'f') >>> a + 1.2 array([ 2.2, 3.2, 4.2]) OOPS! I just made a double array! numpy has changed this default behavior, which is a good thing. It also changed the defaults of factories like ones() and zeros() for produce double arrays by default, so that for quickie use, users are more likely to get what they expect. -Chris > Complete loss of precision without warning is not a > meaning that I, as a toolkit designer, would assign to an innocent-looking > inplace operation. what about a Silent upcasting to a totally different type? That's an error too, if it's not what you intend. > My hunch is that many people who start with Numpy will > spend an hour of their lives hunting a spurious bug caused by this. Maybe, but less time than spent finding the issues later caused by silent upcasting -- believe me, I spent a lot of time on that in the Numeric days. Better to learn a bit about types early in your numpy career. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From dagss at student.matnat.uio.no Tue Mar 25 17:01:08 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 25 Mar 2008 22:01:08 +0100 (CET) Subject: [Numpy-discussion] Project for Cython integration with NumPy Message-ID: <60363.80.59.7.37.1206478868.squirrel@webmail.uio.no> I am going to apply for a Google Summer of Code project about "Developing Cython towards better NumPy integration" (Cython: http://cython.org). Anyone interested in how this is done can have a look at the links below, any feedback is welcome. Unfortunately I don't have much time to spare before the application deadline, so the focus now is only on high-level stuff, the application and so on; thinks like syntax or sorting out the exact supported NumPy features will (unfortunately) have to wait until at least next week. The application I am going to submit (to Python Foundation): http://wiki.cython.org/DagSverreSeljebotn/soc It links to a details page with more concrete information. The specification for the NumPy support itself is here: http://wiki.cython.org/enhancements/numpy As you might be able to see, my thoughts so far has primarily been on how to engineer Cython and not so much about what would be the most convenient NumPy syntax possible - but have a look if you are interested. 
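As a concrete footnote to the __iadd__ casting thread above (not to the Cython proposal), the difference can be seen in a couple of lines. This sketch shows the behaviour discussed in that thread; newer numpy releases may warn about or refuse the unsafe in-place cast:

import numpy as np

a = np.ones(3, dtype=np.int32)
b = a + 1.5    # new array: upcast to float64, b is [ 2.5  2.5  2.5]
a += 1.5       # in-place: computed in float, then cast back into the int32 array
# a is now [2 2 2]: the fractional part is silently dropped,
# but a.dtype stays int32, which is exactly the guarantee += provides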
-- Dag Sverre Seljebotn From ondrej at certik.cz Tue Mar 25 20:19:02 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Wed, 26 Mar 2008 01:19:02 +0100 Subject: [Numpy-discussion] mercurial now has free hosting too Message-ID: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> Hi, since there was so much discussion whether bzr or hg, Mercurial has now free hosting too: http://freehg.org/ also Mercurial 1.0 was finally released yesterday. Bzr has Launchpad, that's one of the (main) reasons ipython is investigating it, so I am still learning how to use bzr, but it's not really so much different. The only little annoying thing I discovered so far is that it feels slower than hg, on regular tasks, like "bzr st", "bzr pull", "bzr up", etc. Only a little bit, but still it creates the feeling in me, that something is missing -- it's like if you are used to a fast car and then you get into a slower car - even though it's fast enough to drive you to the shop, you still are missing something, at least I am. :) But generally I wanted to say, that I think bzr is a good choice too. Ondrej From david at ar.media.kyoto-u.ac.jp Wed Mar 26 04:21:48 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 26 Mar 2008 17:21:48 +0900 Subject: [Numpy-discussion] mercurial now has free hosting too In-Reply-To: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> Message-ID: <47EA079C.4090503@ar.media.kyoto-u.ac.jp> Ondrej Certik wrote: > Hi, > > since there was so much discussion whether bzr or hg, Mercurial has > now free hosting too: > > http://freehg.org/ > > also Mercurial 1.0 was finally released yesterday. > That's really good news. > Bzr has Launchpad, that's one of the (main) reasons ipython is > investigating it, so I am still learning how to use bzr, but it's not > really so much different. I thought ipython was going to use hg ? cheers, David From gael.varoquaux at normalesup.org Wed Mar 26 05:40:47 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 26 Mar 2008 10:40:47 +0100 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> Message-ID: <20080326094047.GB7186@phare.normalesup.org> On Wed, Mar 26, 2008 at 11:36:18AM +0200, Ville M. Vainio wrote: > I think the killer issue here is Launchpad. We get pretty much > everything for free, and it's something that we can expect to stay > around in the long haul (and will continue to improve). The slight > performance disadvantage of bzr is a minor concern, esp. for the > project with the size of IPython. > Both hg and bzr as-such (if we forget the hosting options etc) are > "good enough" right now, and both are being improved. As it stands, I > think we should stick with LP + bzr for now - we can re-evaluate the > situation a year or so down the line, if need be. I have the very exact same gut fealing than you. I have watch developpers switch to DVCS lately, and I must say it requires a certain change in working habits. Let us wait for people to get used to these new tools before discussing which one to you. And don't get me wrong, I love DVCS, I just don't want to lose developper because of it. 
Ga?l From david at ar.media.kyoto-u.ac.jp Wed Mar 26 06:08:31 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 26 Mar 2008 19:08:31 +0900 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <20080326094047.GB7186@phare.normalesup.org> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> <20080326094047.GB7186@phare.normalesup.org> Message-ID: <47EA209F.3050204@ar.media.kyoto-u.ac.jp> Gael Varoquaux wrote: > I have watch developpers switch to DVCS lately, and I must say it > requires a certain change in working habits. Let us wait for people to > get used to these new tools before discussing which one to you. I personally think that the supposed difficulty of DVCS is greatly exaggerated. Basic things are as easy with bzr/hg as they are with svn/svn: cheking out, committing, blame and log are exactly the same. Branching/merging is different, but I don't think anybody would argue that it is easier/more natural with svn that it is with DVCS. Advanced concepts are maybe a bit difficult to grasp, but nobody need them if they do not need them. In particular, if you do not use branches, the only difference between hg/bzr and svn is that commit in svn is equivalent to commit + push in bzr/hg. All the basic commands even have the same name and the same abbreviation (ci for commit, up for update). Compare that to the immediate benefits (such as tracking patches, for example, which at least for me is a PITA right now with trac+svn), I think it worths 5 minutes spent on getting used to the new tool. The problem really is the change of infrastructure (I don't think anybody would be in favor of losing numpy/scipy history, for example, or losing trac tickets for launchpad if we use launchpad), and for windows users who want GUI (which do not exist today in any usable form). cheers, David From gael.varoquaux at normalesup.org Wed Mar 26 06:27:04 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 26 Mar 2008 11:27:04 +0100 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <47EA209F.3050204@ar.media.kyoto-u.ac.jp> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> <20080326094047.GB7186@phare.normalesup.org> <47EA209F.3050204@ar.media.kyoto-u.ac.jp> Message-ID: <20080326102704.GA1731@phare.normalesup.org> On Wed, Mar 26, 2008 at 07:08:31PM +0900, David Cournapeau wrote: > Gael Varoquaux wrote: > > I have watch developpers switch to DVCS lately, and I must say it > > requires a certain change in working habits. Let us wait for people to > > get used to these new tools before discussing which one to you. > I personally think that the supposed difficulty of DVCS is greatly > exaggerated. Basic things are as easy with bzr/hg as they are with > svn/svn: cheking out, committing, blame and log are exactly the same. Look, I am not talking about theory, I am talking about sitting with people and having to walk them through the conceptual difficulties, mainly due to the fact that you no longer have only one development tree. People are not used to that. I agree that DVCS is much better then centralised VCS, and I love working with it, I just feel that not everybody is ready for it yet, and that the tools are not as mature as for SVN yet. I think both will come in a year or two. 
My 2 centimes, Ga?l From david at ar.media.kyoto-u.ac.jp Wed Mar 26 06:26:13 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 26 Mar 2008 19:26:13 +0900 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <20080326102704.GA1731@phare.normalesup.org> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> <20080326094047.GB7186@phare.normalesup.org> <47EA209F.3050204@ar.media.kyoto-u.ac.jp> <20080326102704.GA1731@phare.normalesup.org> Message-ID: <47EA24C5.3030200@ar.media.kyoto-u.ac.jp> Gael Varoquaux wrote: > > Look, I am not talking about theory, I was not talking about theory: - svn co url -> bzr co url - svn commit -m "bla bla" -> bzr ci -m "bla bla" - svn up -> bzr up - svn log -> bzr log - svn blame -> bzr blame You cannot be more concrete than that :) I agree about everything else: tools maturity, GUI for windows, etc... and I am not saying that we should change now or to bzr for that matter. But the difficulty for developers, I don't think it is a valid argument. It is only different if you use branching, which nobody forces people to use, and works better than in svn. cheers, David From gael.varoquaux at normalesup.org Wed Mar 26 06:45:00 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 26 Mar 2008 11:45:00 +0100 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <47EA24C5.3030200@ar.media.kyoto-u.ac.jp> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> <20080326094047.GB7186@phare.normalesup.org> <47EA209F.3050204@ar.media.kyoto-u.ac.jp> <20080326102704.GA1731@phare.normalesup.org> <47EA24C5.3030200@ar.media.kyoto-u.ac.jp> Message-ID: <20080326104500.GB1731@phare.normalesup.org> On Wed, Mar 26, 2008 at 07:26:13PM +0900, David Cournapeau wrote: > Gael Varoquaux wrote: > > Look, I am not talking about theory, > I was not talking about theory: > - svn co url -> bzr co url > - svn commit -m "bla bla" -> bzr ci -m "bla bla" > - svn up -> bzr up > - svn log -> bzr log > - svn blame -> bzr blame > You cannot be more concrete than that :) Except that very often when you do a "bzr up" you have to do a merge because bzr makes obvious branching that is implicit with the svn model. In addition following the evolution of the trunk is harder because version numbers are harder to understand. Morever if you don't look at it with a tree view, you don't understand at all what is happening, and launchpad doesn't display a tree view, and under windows "bzr viz" (which rocks) requires a bit of work to get working. Thrust me, I have seen people quite often come up to me and ask me "what do I do now?". Little details like the "bzr merge;bzr conflicts; vim conflicting files; bzr resolv; bzr commit; bzr push" confuse people. Once again, the tool is much more reliable than svn, and once you have understood it, its a breeze, with less bad surprises, but many people don't want to learn. 
Ga?l From david at ar.media.kyoto-u.ac.jp Wed Mar 26 07:04:24 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 26 Mar 2008 20:04:24 +0900 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <20080326104500.GB1731@phare.normalesup.org> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> <20080326094047.GB7186@phare.normalesup.org> <47EA209F.3050204@ar.media.kyoto-u.ac.jp> <20080326102704.GA1731@phare.normalesup.org> <47EA24C5.3030200@ar.media.kyoto-u.ac.jp> <20080326104500.GB1731@phare.normalesup.org> Message-ID: <47EA2DB8.80602@ar.media.kyoto-u.ac.jp> Gael Varoquaux wrote: > > Except that very often when you do a "bzr up" you have to do a merge > because bzr makes obvious branching that is implicit with the svn model. bzr does not branch implicitly, as far as I know, so I am not sure to understand how this could happen if you only track one branch (which is what most people who just want to get the sources would do). At least, things could be set up such as people do see only one branch by default. That's how bzr itself is developed, for example: I track the development branch for a long time, I never had to use any merge command. Only pull (that's something I forgot in my former email: there is a difference for commit vs commit + push, as well as for up vs pull). > Once > again, the tool is much more reliable than svn, and once you have > understood it, its a breeze, with less bad surprises, but many people > don't want to learn. I guess my argument is the following: there are several groups of people who may be interested in the sources. It is as simple as svn for occasional contributors (people who get source, people who do a trivial patch), and it would be different for people who significantly contribute to one of the project. But I don't think this later group includes people who do not want to learn :) cheers. David From matthieu.brucher at gmail.com Wed Mar 26 07:27:02 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 26 Mar 2008 12:27:02 +0100 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <47EA2DB8.80602@ar.media.kyoto-u.ac.jp> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> <20080326094047.GB7186@phare.normalesup.org> <47EA209F.3050204@ar.media.kyoto-u.ac.jp> <20080326102704.GA1731@phare.normalesup.org> <47EA24C5.3030200@ar.media.kyoto-u.ac.jp> <20080326104500.GB1731@phare.normalesup.org> <47EA2DB8.80602@ar.media.kyoto-u.ac.jp> Message-ID: 2008/3/26, David Cournapeau : > > Gael Varoquaux wrote: > > > > Except that very often when you do a "bzr up" you have to do a merge > > because bzr makes obvious branching that is implicit with the svn model. > > > bzr does not branch implicitly, as far as I know, so I am not sure to > understand how this could happen if you only track one branch (which is > what most people who just want to get the sources would do). At least, > things could be set up such as people do see only one branch by default. > That's how bzr itself is developed, for example: I track the development > branch for a long time, I never had to use any merge command. Only pull > (that's something I forgot in my former email: there is a difference for > commit vs commit + push, as well as for up vs pull). In the lastest NiPy sprint, we intensively used bzr. 
Each of us had its own branch, and then we merged regularly with the trunk with push/pulls. In a usual week, this is not always needed, but for a coding sprint, it is a nice model :) Matthieu > Once > > again, the tool is much more reliable than svn, and once you have > > understood it, its a breeze, with less bad surprises, but many people > > don't want to learn. > > > I guess my argument is the following: there are several groups of people > who may be interested in the sources. It is as simple as svn for > occasional contributors (people who get source, people who do a trivial > patch), and it would be different for people who significantly > contribute to one of the project. But I don't think this later group > includes people who do not want to learn :) > > cheers. > > > David > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Mar 26 09:48:02 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 26 Mar 2008 09:48:02 -0400 Subject: [Numpy-discussion] Quikest way to create a diagonal matrix ? Message-ID: <200803260948.02742.pgmdevlist@gmail.com> All, What's the quickest way to create a diagonal matrix ? I already have the elements above the main diagonal. Of course, I could use loops: >>>m=5 >>>z = numpy.arange(m*m).reshape(m,m) >>>for k in range(m): >>> for j in range(k+1,m): >>> z[j,k] = z[k,j] But I was looking for something more efficient. Thanks a lot in advance ! From matthieu.brucher at gmail.com Wed Mar 26 10:14:47 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 26 Mar 2008 15:14:47 +0100 Subject: [Numpy-discussion] Quikest way to create a diagonal matrix ? In-Reply-To: <200803260948.02742.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> Message-ID: Hi, Did you try diag() ? Or are you saying a symmetric matrix ? Matthieu 2008/3/26, Pierre GM : > > All, > What's the quickest way to create a diagonal matrix ? I already have the > elements above the main diagonal. Of course, I could use loops: > >>>m=5 > >>>z = numpy.arange(m*m).reshape(m,m) > >>>for k in range(m): > >>> for j in range(k+1,m): > >>> z[j,k] = z[k,j] > But I was looking for something more efficient. > Thanks a lot in advance ! > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.fayolle at logilab.fr Wed Mar 26 10:22:06 2008 From: alexandre.fayolle at logilab.fr (Alexandre Fayolle) Date: Wed, 26 Mar 2008 15:22:06 +0100 Subject: [Numpy-discussion] Quikest way to create a symetric (diagonal???) matrix ? 
In-Reply-To: <200803260948.02742.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> Message-ID: <20080326142206.GF15225@logilab.fr> On Wed, Mar 26, 2008 at 09:48:02AM -0400, Pierre GM wrote: > All, > What's the quickest way to create a diagonal matrix ? I already have the > elements above the main diagonal. Of course, I could use loops: > >>>m=5 > >>>z = numpy.arange(m*m).reshape(m,m) > >>>for k in range(m): > >>> for j in range(k+1,m): > >>> z[j,k] = z[k,j] > But I was looking for something more efficient. From your code, you certainly meant "symetric" and not diagonal. Maybe you can speed up things a bit by assigning slices: >>> for k in range(m): ... z[k:, k] = z[k, k:] -- Alexandre Fayolle LOGILAB, Paris (France) Formations Python, Zope, Plone, Debian: http://www.logilab.fr/formations D?veloppement logiciel sur mesure: http://www.logilab.fr/services Informatique scientifique: http://www.logilab.fr/science -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 481 bytes Desc: Digital signature URL: From lbolla at gmail.com Wed Mar 26 10:36:55 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Wed, 26 Mar 2008 15:36:55 +0100 Subject: [Numpy-discussion] Quikest way to create a diagonal matrix ? In-Reply-To: <200803260948.02742.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> Message-ID: <80c99e790803260736h6e8ceefcv9dd526651a383458@mail.gmail.com> numpy.tri In [31]: T = numpy.tri(m) In [32]: z.T * T + z * T.T Out[32]: array([[ 0., 1., 2., 3., 4.], [ 1., 12., 7., 8., 9.], [ 2., 7., 24., 13., 14.], [ 3., 8., 13., 36., 19.], [ 4., 9., 14., 19., 48.]]) hth, L. On Wed, Mar 26, 2008 at 2:48 PM, Pierre GM wrote: > All, > What's the quickest way to create a diagonal matrix ? I already have the > elements above the main diagonal. Of course, I could use loops: > >>>m=5 > >>>z = numpy.arange(m*m).reshape(m,m) > >>>for k in range(m): > >>> for j in range(k+1,m): > >>> z[j,k] = z[k,j] > But I was looking for something more efficient. > Thanks a lot in advance ! > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Joris.DeRidder at ster.kuleuven.be Wed Mar 26 11:21:38 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Wed, 26 Mar 2008 16:21:38 +0100 Subject: [Numpy-discussion] Quikest way to create a diagonal matrix ? In-Reply-To: <80c99e790803260736h6e8ceefcv9dd526651a383458@mail.gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> <80c99e790803260736h6e8ceefcv9dd526651a383458@mail.gmail.com> Message-ID: On 26 Mar 2008, at 15:36, lorenzo bolla wrote: > numpy.tri > > In [31]: T = numpy.tri(m) > > In [32]: z.T * T + z * T.T > Out[32]: > array([[ 0., 1., 2., 3., 4.], > [ 1., 12., 7., 8., 9.], > [ 2., 7., 24., 13., 14.], > [ 3., 8., 13., 36., 19.], > [ 4., 9., 14., 19., 48.]]) You still have to subtract the diagonal: def f(z): A = tri(z.shape[0], dtype = z.dtype) X = z.T * A + z * A.T X[range(A.shape[0]),range(A.shape[0])] -= z.diagonal() return X The suggestion of Alexandre seems to be about 4 times as fast, though. But I love the way you obfuscate things by having "T" for both the tri- matrix as the transpose method. :-) It get's even better with numpy matrices. 
Next year, my students will see something like I.H-T.H*T.I+I.I*H.I+T.T*H.H-H.I Refreshing! ;-) Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From cimrman3 at ntc.zcu.cz Wed Mar 26 11:58:08 2008 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 26 Mar 2008 16:58:08 +0100 Subject: [Numpy-discussion] ANN: SfePy 00.41.03 Message-ID: <47EA7290.7030009@ntc.zcu.cz> Greetings, I'm pleased to announce the release 00.41.03 of SfePy (formerly SFE) SfePy is a finite element analysis software in Python, based primarily on Numpy and SciPy. Mailing lists, issue tracking, mercurial repository: http://code.google.com/p/sfepy/ Home page: http://sfepy.kme.zcu.cz Major improvements: - works on 64 bits - support for various mesh formats - Schroedinger equation solver - see http://code.google.com/p/sfepy/wiki/Examples - new solvers: - generic time-dependent problem solver - pysparse, symeig, scipy-based eigenproblem solvers - scipy-based iterative solvers - many new terms For information on this release, see http://sfepy.googlecode.com/svn/web/releases/004103_RELEASE_NOTES.txt Best regards, r. From lbolla at gmail.com Wed Mar 26 12:23:40 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Wed, 26 Mar 2008 17:23:40 +0100 Subject: [Numpy-discussion] Quikest way to create a diagonal matrix ? In-Reply-To: References: <200803260948.02742.pgmdevlist@gmail.com> <80c99e790803260736h6e8ceefcv9dd526651a383458@mail.gmail.com> Message-ID: <80c99e790803260923r5ccc5576re1223b5e0ebda867@mail.gmail.com> I like obfuscating things! Maybe I should switch to perl :-) you can use a one-liner like this: scipy.linalg.triu(z) + scipy.linalg.triu(z,k=1).T my %timeit gives roughly the same execution speed as your f(z): In [79]: %timeit f(z) 10000 loops, best of 3: 79.3 us per loop In [80]: %timeit h(z) 10000 loops, best of 3: 76.8 us per loop L. On Wed, Mar 26, 2008 at 4:21 PM, Joris De Ridder < Joris.DeRidder at ster.kuleuven.be> wrote: > > On 26 Mar 2008, at 15:36, lorenzo bolla wrote: > > > numpy.tri > > > > In [31]: T = numpy.tri(m) > > > > In [32]: z.T * T + z * T.T > > Out[32]: > > array([[ 0., 1., 2., 3., 4.], > > [ 1., 12., 7., 8., 9.], > > [ 2., 7., 24., 13., 14.], > > [ 3., 8., 13., 36., 19.], > > [ 4., 9., 14., 19., 48.]]) > > > You still have to subtract the diagonal: > > def f(z): > A = tri(z.shape[0], dtype = z.dtype) > X = z.T * A + z * A.T > X[range(A.shape[0]),range(A.shape[0])] -= z.diagonal() > return X > > > The suggestion of Alexandre seems to be about 4 times as fast, though. > > But I love the way you obfuscate things by having "T" for both the tri- > matrix as the transpose method. :-) > It get's even better with numpy matrices. Next year, my students will > see something like > I.H-T.H*T.I+I.I*H.I+T.T*H.H-H.I > Refreshing! ;-) > > Cheers, > Joris > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
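To summarise the suggestions in this thread in one runnable sketch (assuming z is square with the wanted values in its upper triangle, diagonal included), both routes below should give the same symmetric matrix:

import numpy as np

m = 5
z = np.arange(m * m, dtype=float).reshape(m, m)

# slice-based: copy each row of the upper triangle into the matching column
zs = z.copy()
for k in range(m):
    zs[k:, k] = zs[k, k:]

# triu-based one-liner: keep the upper triangle, add its strict transpose
zt = np.triu(z) + np.triu(z, k=1).T

assert (zs == zt).all() and (zs == zs.T).all()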
URL: From Chris.Barker at noaa.gov Wed Mar 26 12:52:46 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 26 Mar 2008 09:52:46 -0700 Subject: [Numpy-discussion] ANN: SfePy 00.41.03 In-Reply-To: <47EA7290.7030009@ntc.zcu.cz> References: <47EA7290.7030009@ntc.zcu.cz> Message-ID: <47EA7F5E.4040805@noaa.gov> Robert Cimrman wrote: > I'm pleased to announce the release 00.41.03 of SfePy (formerly SFE) very cool! Totally off-topic, but how did you build that nifty pdf slide show? (introduction_slide.pdf) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cimrman3 at ntc.zcu.cz Wed Mar 26 12:53:55 2008 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 26 Mar 2008 17:53:55 +0100 Subject: [Numpy-discussion] ANN: SfePy 00.41.03 In-Reply-To: <47EA7F5E.4040805@noaa.gov> References: <47EA7290.7030009@ntc.zcu.cz> <47EA7F5E.4040805@noaa.gov> Message-ID: <47EA7FA3.8070303@ntc.zcu.cz> Christopher Barker wrote: > Robert Cimrman wrote: >> I'm pleased to announce the release 00.41.03 of SfePy (formerly SFE) > > very cool! Thanks! > Totally off-topic, but how did you build that nifty pdf slide show? > (introduction_slide.pdf) http://latex-beamer.sourceforge.net/ see doc/tex/introduction_slides.tex for the LaTeX sources. cheers, r. From Chris.Barker at noaa.gov Wed Mar 26 13:07:11 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 26 Mar 2008 10:07:11 -0700 Subject: [Numpy-discussion] [IPython-dev] mercurial now has free hosting too In-Reply-To: <47EA24C5.3030200@ar.media.kyoto-u.ac.jp> References: <85b5c3130803251719pde17151x96761bb8cd2b4f7e@mail.gmail.com> <46cb515a0803260236r2a4685eeg7f2bacda6d043f81@mail.gmail.com> <20080326094047.GB7186@phare.normalesup.org> <47EA209F.3050204@ar.media.kyoto-u.ac.jp> <20080326102704.GA1731@phare.normalesup.org> <47EA24C5.3030200@ar.media.kyoto-u.ac.jp> Message-ID: <47EA82BF.6020204@noaa.gov> David Cournapeau wrote: > I agree about everything else: tools maturity, GUI for windows, etc... And these are very, very, big issues. Let's not forget that. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From hoytak at gmail.com Wed Mar 26 13:25:39 2008 From: hoytak at gmail.com (Hoyt Koepke) Date: Wed, 26 Mar 2008 10:25:39 -0700 Subject: [Numpy-discussion] Quikest way to create a symetric (diagonal???) matrix ? In-Reply-To: <20080326142206.GF15225@logilab.fr> References: <200803260948.02742.pgmdevlist@gmail.com> <20080326142206.GF15225@logilab.fr> Message-ID: <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> If the rest of the matrix is already zeros and memory wasn't a problem, you could just use A_sym = A + A.T - diag(diag(A)) If memory was an issue, I'd suggest weave.inline (if that's a viable option) or pyrex to do the loop, which would be about as fast as you could get. --Hoyt On Wed, Mar 26, 2008 at 7:22 AM, Alexandre Fayolle wrote: > On Wed, Mar 26, 2008 at 09:48:02AM -0400, Pierre GM wrote: > > All, > > What's the quickest way to create a diagonal matrix ? I already have the > > elements above the main diagonal. 
Of course, I could use loops: > > >>>m=5 > > >>>z = numpy.arange(m*m).reshape(m,m) > > >>>for k in range(m): > > >>> for j in range(k+1,m): > > >>> z[j,k] = z[k,j] > > But I was looking for something more efficient. > > From your code, you certainly meant "symetric" and not diagonal. > > Maybe you can speed up things a bit by assigning slices: > > >>> for k in range(m): > ... z[k:, k] = z[k, k:] > > > > -- > Alexandre Fayolle LOGILAB, Paris (France) > Formations Python, Zope, Plone, Debian: http://www.logilab.fr/formations > D?veloppement logiciel sur mesure: http://www.logilab.fr/services > Informatique scientifique: http://www.logilab.fr/science > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.6 (GNU/Linux) > > iQEVAwUBR+pcDl6T+PKoJ87eAQI1zAf/W7wnB1a6sa4FuHDPTDjU61ZpvDgS41r7 > B7EuSDncTluf3Y5ynQ8NroAihX0DvV4F5LTDcbFJbmqnQx8JApVoeQF3wnTnpf24 > pUQ5oSB+w0+RtzU0Zu/TBkOh3hM8iPYyB2M7jq9/qakVxEsrlOiTH+j05ysJD9FG > GezArMoQu5ycJ26Ir9P7jR0acH/WBA84U524aiDbenLMmpFIZX7mElU47z/Ue5m7 > xKTT/lu3BWQAJPoQTiHG7nRLDaAqxKVO0WLXPuUJ7HyCc4qjURhXZMmJQ2FP2ajt > H9AQQhNkO7eUAPmMLhK0x262bYIdq699UmjV7YOVmSvCrBM76okqew== > =ha+1 > -----END PGP SIGNATURE----- > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From doutriaux1 at llnl.gov Wed Mar 26 13:46:37 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Wed, 26 Mar 2008 10:46:37 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> <20080326142206.GF15225@logilab.fr> <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> Message-ID: <47EA8BFD.2080306@llnl.gov> Hello, I used to be able to inherit form nump.oldnumeric.ma.array, it looks like you can't any longer. I replaced it with: numpy.ma.MaskedArray i'm getting: result = result.reorder(order).regrid(grid) AttributeError: 'MaskedArray' object has no attribute 'reorder' Should I inherit from soemtihng else ? Aslo I used to import a some function from numpy.oldnumeric.ma, that are now missing can you point me to their new ma equivalent? common_fill_value, identity, indices and set_fill_value Thanks, C. From chris at simplistix.co.uk Wed Mar 26 14:47:22 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 26 Mar 2008 18:47:22 +0000 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: References: <47E0F2AC.7040200@simplistix.co.uk> <47E0F904.9070203@simplistix.co.uk> <200803201017.20396.pgmdevlist@gmail.com> <47E3E86F.5010401@simplistix.co.uk> Message-ID: <47EA9A3A.4040300@simplistix.co.uk> Matt Knox wrote: > data = [1., 2., 3., np.nan, 5., 6.] > mask = [0, 0, 0, 1, 0, 0] I'm creating the ma with ma.masked_where... > marr = ma.array(data, mask=mask) > marr.set_fill_value(55) > print marr[0] is ma.masked # False > print marr[3] # ma.masked constant Yeah, and this is where I have the problem. The masked constant has a fill value of 99999, rather than 55. That is annoying. > filled_arr = marr.filled() > print filled_arr # nan value is replaced with fill value of 55 Right, and this is how I currently work around the problem. 
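For completeness, the workaround described here in one piece (a sketch, assuming the gaps are NaNs as in Matt's example):

import numpy as np
import numpy.ma as ma

data = np.array([1., 2., 3., np.nan, 5., 6.])
marr = ma.masked_where(np.isnan(data), data)
marr.set_fill_value(55)

filled_arr = marr.filled()   # the per-array fill value is used: [ 1.  2.  3.  55.  5.  6.]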
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From millman at berkeley.edu Wed Mar 26 15:29:39 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 26 Mar 2008 12:29:39 -0700 Subject: [Numpy-discussion] Improving Docs on Wiki In-Reply-To: <9457e7c80803210932p4f47774bw5d2fcb3437fa2589@mail.gmail.com> References: <1e52e0880803210155u637add40pe24529400ae67ac3@mail.gmail.com> <9457e7c80803210321rb56aa7ard5e7217cb301695f@mail.gmail.com> <9457e7c80803210830i65b70247uecf9c42866a85539@mail.gmail.com> <9457e7c80803210932p4f47774bw5d2fcb3437fa2589@mail.gmail.com> Message-ID: On Fri, Mar 21, 2008 at 9:32 AM, St?fan van der Walt wrote: > > Not exactly. What do people think of the way I organized the numpy > > functions by category page? Apart from the sore-thumb "other" > > category, it does seem like the kind of grouping we might hope for. > > I can see categories 1 through 4 being one submodule, and the rest as they are. +1 -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From chris at simplistix.co.uk Wed Mar 26 15:42:41 2008 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 26 Mar 2008 19:42:41 +0000 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <200803251112.00638.pgmdevlist@gmail.com> References: <47E0F2AC.7040200@simplistix.co.uk> <200803212024.40630.pgmdevlist@gmail.com> <47E90D56.9020903@simplistix.co.uk> <200803251112.00638.pgmdevlist@gmail.com> Message-ID: <47EAA731.3040500@simplistix.co.uk> Pierre GM wrote: > My bad, I neglected an overall doc for the functions and their docstring. But > you know what ? As you're now at an intermediary level, That's pretty unkind to your userbase. I know a lot about python, but I'm a total novice with numpy and even the maths it's based on. > help: just write down the problems you encountered, and the solutions you > came up with, so that we could use your experience as the backbone for a > proper MaskedArray documentation Blind leading the blind seems like a terrible idea to me... > Try that: >>>> x = numpy.ma.array([0,1,2,3,]) >>>> x[-1] = numpy.nan >>>> print x >>>> [0 1 2 0] > See? No NaNs with an int array. Right. "Array types" and whatever a dtype is are things that could be much better documented too :-( > Well, no problem, they should stick around. Note that if a NaN/Inf should > normally show up as the result of some operation (divide by zero for > example), it'll probably won't: >>>> x = numpy.ma.array([0,1,2,numpy.nan],dtype=float) >>>> print 1./x >>>> [-- 1.0 0.5 nan] NaN/inf is still NaN in my books, so why would I be surprised by this? >> I'd argue that the masked singleton having a different fill value to the >> ma it comes from is a bug. > > "It's not a bug, it's a feature"TM One which sucks and is unintuitive. > The fill_value for the mask singleton is meaningless, correct. However, having > numpy.ma.masked as a constant is really helpful to test whether a particular > value is masked, or to mask a particular value: >>>> x = numpy.ma.array([0,1,2,3]) >>>> x[-1] = masked >>>> x[-1] is masked >>>> True I may not know much about maths, but I know about these funny things in python we have called "classes" to solve exactly this problem ;-) >>> x[-1] = Masked(fill_value=50) >>> isinstance(x[-1],Masked) True ...which gives you what you want without forcing me to experience the resultant suck. 
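A purely hypothetical sketch of the per-value marker proposed here, just to make the idea concrete; this is not numpy.ma API:

class Masked(object):
    # hypothetical marker that carries its own fill value
    def __init__(self, fill_value=None):
        self.fill_value = fill_value

# x[-1] = Masked(fill_value=50) would then keep the fill value with the element,
# and isinstance(x[-1], Masked) could replace the "is ma.masked" identity test.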
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From pgmdevlist at gmail.com Wed Mar 26 15:50:33 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 26 Mar 2008 15:50:33 -0400 Subject: [Numpy-discussion] =?iso-8859-6?q?Quikest_way_to_create_a_symetri?= =?iso-8859-6?b?YyAoZGlhZ29uYWw/Pz8pIG1hdHJpeCA/?= In-Reply-To: <20080326142206.GF15225@logilab.fr> References: <200803260948.02742.pgmdevlist@gmail.com> <20080326142206.GF15225@logilab.fr> Message-ID: <200803261550.33547.pgmdevlist@gmail.com> All, Yes, I was talking about symmetric matrices. Sorry for the confusion. Thanks a lot for your answers. The slices approach looks the best indeed. I was hoping that there was some way to use smart indexing, but it really looks like too complicated. Thx again P. P. From pgmdevlist at gmail.com Wed Mar 26 15:56:43 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 26 Mar 2008 15:56:43 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EA8BFD.2080306@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> <47EA8BFD.2080306@llnl.gov> Message-ID: <200803261556.43557.pgmdevlist@gmail.com> Charles, > result = result.reorder(order).regrid(grid) > AttributeError: 'MaskedArray' object has no attribute 'reorder' > > Should I inherit from soemtihng else ? Mmh, .reorder is not a regular ndarray method, so that won't work. What is it supposed to do ? And regrid ? > Aslo I used to import a some function from numpy.oldnumeric.ma, that are > now missing can you point me to their new ma equivalent? > common_fill_value, identity, indices and set_fill_value For set_fill_value, just use m.fill_value = your_fill_value For identity, just use the regular numpy version, and view it as a MaskedArray: numpy.identity(...).view(MaskedArray) For common_fill_value: ah, tricky one, I'll have to check. If needed, I'll bring identity into numpy.ma. Please don't hesitate to send more feedback, it's always needed. Sincerely, P. From doutriaux1 at llnl.gov Wed Mar 26 16:16:11 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Wed, 26 Mar 2008 13:16:11 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <200803261556.43557.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> <47EA8BFD.2080306@llnl.gov> <200803261556.43557.pgmdevlist@gmail.com> Message-ID: <47EAAF0B.5090602@llnl.gov> The reorder is a function we implement. By digging a bit into this my guess is that all the missing function in numpy.ma are causing to fail at some point in our init and returning the wrong object type. But the whole idea was to keep a backward compatible layer with Numeric and MA. It worked great for a while and now things are getting more and more broken. Correct me if I'm wrong but it seems as if the numpy.oldnumeric.am is now simply numpy.ma and it's pointing to the new MaskedArray interface. Loosing a LOT of backward compatibility at the same time. I'm thinking that such changes should definitely not happen from 1.0.4 to 1.0.5 but rather in some major upgrade of numpy (1.1 at least, may be even 2.0). 
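To restate Pierre's replacements from earlier in this thread as runnable lines (a sketch using numpy.ma directly):

import numpy as np
import numpy.ma as ma

m = ma.array([1., 2., 3.], mask=[0, 1, 0])

m.fill_value = 999                            # replaces oldnumeric.ma's set_fill_value(m, 999)
ident = np.identity(3).view(ma.MaskedArray)   # replaces oldnumeric.ma's identity(3)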
It is absolutely necessary to have the oldnumeric.ma working as much as possible as MA, what's in now is incompatible with code that have been successfully upgraded to numpy using your recommended method (official numpy doc) Can you put back ALL the function from numpy.oldnumeric.ma ? It shouldn't be too much work. Now I'm actually worried about using ma at all? What version is in? Is it a completely new package or is it still the old one just a bit broken? If it's a new one, we'd have to be sure it is fully tested before we can redistribute it to other people via our package, or before we use it ourselves Can somebody bring some light on this issue? thanks a lot, C. Pierre GM wrote: > Charles, > >> result = result.reorder(order).regrid(grid) >> AttributeError: 'MaskedArray' object has no attribute 'reorder' >> >> Should I inherit from soemtihng else ? >> > > Mmh, .reorder is not a regular ndarray method, so that won't work. What is it > supposed to do ? And regrid ? > > >> Aslo I used to import a some function from numpy.oldnumeric.ma, that are >> now missing can you point me to their new ma equivalent? >> common_fill_value, identity, indices and set_fill_value >> > > For set_fill_value, just use > m.fill_value = your_fill_value > > For identity, just use the regular numpy version, and view it as a > MaskedArray: > numpy.identity(...).view(MaskedArray) > > For common_fill_value: ah, tricky one, I'll have to check. > > If needed, I'll bring identity into numpy.ma. Please don't hesitate to send > more feedback, it's always needed. > Sincerely, > P. > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From pgmdevlist at gmail.com Wed Mar 26 16:40:12 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 26 Mar 2008 16:40:12 -0400 Subject: [Numpy-discussion] bug with with fill_values in masked arrays? In-Reply-To: <47EAA731.3040500@simplistix.co.uk> References: <47E0F2AC.7040200@simplistix.co.uk> <200803251112.00638.pgmdevlist@gmail.com> <47EAA731.3040500@simplistix.co.uk> Message-ID: <200803261640.12749.pgmdevlist@gmail.com> On Wednesday 26 March 2008 15:42:41 Chris Withers wrote: > Pierre GM wrote: > > My bad, I neglected an overall doc for the functions and their docstring. > > But you know what ? As you're now at an intermediary level, > > That's pretty unkind to your userbase. I know a lot about python, but > I'm a total novice with numpy and even the maths it's based on. My bosses have different priorities and keep on recalling me that spending time writing Python code is not what I was hired to do, and that should be writing scientific papers by the dozen. Let's say that I'm just playing middle ground to the best of my capacities. And time. > > help: just write down the problems you encountered, and the solutions you > > came up with, so that we could use your experience as the backbone for a > > proper MaskedArray documentation > > Blind leading the blind seems like a terrible idea to me... You're no longer a complete neophyte, so you're not that blind, but are still experiencing the tough part of the learning curve. I took things for granted nowadays (for example, dtypes) that are not obvious for the absolute beginners, that's exactly where you can play your role: remind me what it is to be blind so that I can help you more, start some simple doc pages on the wiki that the community can edit/append. 
> NaN/inf is still NaN in my books, so why would I be surprised by this? Because with a regular ndarray with no NaNs initially, you could end up with NaNs and Infs with some operations. With MaskedArray, you don't. > >> I'd argue that the masked singleton having a different fill value to the > >> ma it comes from is a bug. > > > > "It's not a bug, it's a feature"TM > > One which sucks and is unintuitive. I can understand the unintuitive part to a certain extent, I won't comment on the first aspect however, you know, tastes, colors, snails, oysters, that kind of thing. On top of that, I could kick into touch and say that it's needed for backwards compatibility. > >>> x[-1] = Masked(fill_value=50) > >>> isinstance(x[-1],Masked) > > True > > ...which gives you what you want without forcing me to experience the > resultant suck. Yeah, that's a possibility. Feel free to implement it so that we can compare the two approaches. I still don understand why you really need to have a particular fill_value for the masked constant anyway: what are you trying to do exactly ? From efiring at hawaii.edu Wed Mar 26 17:36:28 2008 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 26 Mar 2008 11:36:28 -1000 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EAAF0B.5090602@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> <47EA8BFD.2080306@llnl.gov> <200803261556.43557.pgmdevlist@gmail.com> <47EAAF0B.5090602@llnl.gov> Message-ID: <47EAC1DC.7030306@hawaii.edu> Charles Doutriaux wrote: > The reorder is a function we implement. By digging a bit into this my > guess is that all the missing function in numpy.ma are causing to fail > at some point in our init and returning the wrong object type. > > But the whole idea was to keep a backward compatible layer with Numeric > and MA. It worked great for a while and now things are getting more and > more broken. There are costs as well as benefits in maintaining backward compatibility, so one should not rely on it indefinitely. > > Correct me if I'm wrong but it seems as if the numpy.oldnumeric.am is > now simply numpy.ma and it's pointing to the new MaskedArray interface. > Loosing a LOT of backward compatibility at the same time. numpy.oldnumeric.ma was a very small compatibility wrapper for numpy.core.ma; now it is the same, but pointing to numpy.ma, which is now Pierre's new maskedarray implementatin. Maybe more compatibility interfacing is needed, either there or in numpy.ma itself, but I would not agree with your characterization of the degree of incompatibility. Whether it would be possible (and desirable) to replace oldnumeric.ma with the old numpy/core/ma.py, I don't know, but maybe this, or some other way of keeping core/ma.py available, should be considered. Would this meet your needs? Were you happy with release 1.04? > > I'm thinking that such changes should definitely not happen from 1.0.4 > to 1.0.5 but rather in some major upgrade of numpy (1.1 at least, may be > even 2.0). No, this has been planned for quite a while, and I would strongly oppose any such drastic delay. > > It is absolutely necessary to have the oldnumeric.ma working as much as > possible as MA, what's in now is incompatible with code that have been > successfully upgraded to numpy using your recommended method (official > numpy doc) > > Can you put back ALL the function from numpy.oldnumeric.ma ? It > shouldn't be too much work. > > Now I'm actually worried about using ma at all? 
What version is in? Is > it a completely new package or is it still the old one just a bit > broken? If it's a new one, we'd have to be sure it is fully tested No, it is not broken, it has many improvements and bug fixes relative to the old ma.py. That is why it is replacing ma.py. > before we can redistribute it to other people via our package, or before > we use it ourselves Well, the only way to get something fully tested is to put it in use. It has been available for testing for a long time as a separate implementation, then as a numpy branch, and now for a while in the numpy svn trunk. It works well. It is time to release it--possibly after a few more tweaks, possibly leaving the old core/ma.py accessible, but definitely for 1.05. No one will force you to adopt 1.05, so if more compatibility tweaks are needed after 1.05 you can identify them and they can be incorporated for the next release. Eric > > Can somebody bring some light on this issue? thanks a lot, > > C. > > > Pierre GM wrote: >> Charles, >> >>> result = result.reorder(order).regrid(grid) >>> AttributeError: 'MaskedArray' object has no attribute 'reorder' >>> >>> Should I inherit from soemtihng else ? >>> >> Mmh, .reorder is not a regular ndarray method, so that won't work. What is it >> supposed to do ? And regrid ? >> >> >>> Aslo I used to import a some function from numpy.oldnumeric.ma, that are >>> now missing can you point me to their new ma equivalent? >>> common_fill_value, identity, indices and set_fill_value >>> >> For set_fill_value, just use >> m.fill_value = your_fill_value >> >> For identity, just use the regular numpy version, and view it as a >> MaskedArray: >> numpy.identity(...).view(MaskedArray) >> >> For common_fill_value: ah, tricky one, I'll have to check. >> >> If needed, I'll bring identity into numpy.ma. Please don't hesitate to send >> more feedback, it's always needed. >> Sincerely, >> P. >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Wed Mar 26 17:33:19 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 26 Mar 2008 17:33:19 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EAAF0B.5090602@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <200803261556.43557.pgmdevlist@gmail.com> <47EAAF0B.5090602@llnl.gov> Message-ID: <200803261733.19986.pgmdevlist@gmail.com> Charles, numpy.ma is supposed to replace numpy.core.ma only. I don't know what happened to numpy.oldnumeric.ma, more exactly when it was dropped. A quick search on the trac indicates it happens a while ago (before version 1.0.1)... In short, the major difference between the old (numpy.core.ma) and new (numpy.ma) implementation is that MaskedArray is nowadays a subclass of ndarray, when it was a complete different object in the old version. The new approach does simplify a lot of aspects (subclassing in particular). It introduces a lot of functions that were not available in the previous version, and it's supposed to be more transparent. > But the whole idea was to keep a backward compatible layer with Numeric > and MA. It worked great for a while and now things are getting more and > more broken. As numpy is moving further and further away from Numeric ? 
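(A quick interactive check makes that last point concrete; this is an added sketch and assumes the new implementation that ships as numpy.ma.)

>>> import numpy
>>> issubclass(numpy.ma.MaskedArray, numpy.ndarray)
True
>>> x = numpy.ma.array([1, 2, 3], mask=[0, 1, 0])
>>> isinstance(x, numpy.ndarray)        # the old numpy.core.ma object was not an ndarray
True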
> It is absolutely necessary to have the oldnumeric.ma working as much as > possible as MA, what's in now is incompatible with code that have been > successfully upgraded to numpy using your recommended method (official > numpy doc) I must admit I'm partially responsible: there are indeed a couple of incompatibilities between numpy.core.ma and numpy.ma, there are listed here: http://www.scipy.org/scipy/numpy/wiki/MaskedArrayApiChanges All in all, they look minor to me, but may have some naughty side effects: the _data as property is the trickiest as it will make tests on id fail. > Can you put back ALL the function from numpy.oldnumeric.ma ? It > shouldn't be too much work. I'm not sure I can assess properly the time it would need. I could try, but I never used numpy.oldnumeric.ma myself, and I have difficulties finding an old version. > Now I'm actually worried about using ma at all? What version is in? Is > it a completely new package or is it still the old one just a bit > broken? If it's a new one, we'd have to be sure it is fully tested > before we can redistribute it to other people via our package, or before > we use it ourselves Well, as stated before, numpy.ma is a better numpy.core.ma, and therefore not totally compatible with Numeric.MA. Lots of functions are equivalent, but some functionalities have been added, some dropped. Once again, my objective was to ensure compatibility with numpy.core.ma (with which I started learning Python), not with Numeric.MA that I never used. Yes, numpy.ma has been regularly tested (I've been using it on a quasi daily basis for at least a year now); however, some issues/bugs still pop up from times to times. In any case, I'd be happy to help you figuring out how to modify/upgrade your code from Numeric.MA to numpy.ma, or to answer any specific questions you could have. Sincerely, P. From loredo at astro.cornell.edu Wed Mar 26 17:49:26 2008 From: loredo at astro.cornell.edu (Tom Loredo) Date: Wed, 26 Mar 2008 17:49:26 -0400 Subject: [Numpy-discussion] f2py functions, docstrings, and epydoc Message-ID: <1206568166.47eac4e682f08@astrosun2.astro.cornell.edu> Hi folks- Can anyone offer any tips on how I can get epydoc to produce API documentation for functions in an f2py-produced module? Currently they get listed in the generated docs as "Variables": Variables psigc = sigctp = smll_offset = Yet each of these objects is callable, and has a docstring. The module itself has docs that give a 1-line signature for each function, but that's only part of the docstring. One reason I'd like to see the full docstrings documented by epydoc is that, for key functions, I'm loading the functions into a module and *changing* the docstrings, to have info beyond the limited f2py-generated docstrings. On a related question, is there a way to provide input to f2py for function docstrings? The manual hints that triple-quoted multiline blocks in the .pyf can be used to provide documentation, but when I add them, they don't appear to be used. Thanks, Tom ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From beckers at orn.mpg.de Wed Mar 26 18:20:56 2008 From: beckers at orn.mpg.de (Gabriel J.L. Beckers) Date: Wed, 26 Mar 2008 23:20:56 +0100 Subject: [Numpy-discussion] accumarray Message-ID: <1206570056.14618.1.camel@gabriel-desktop> Does numpy have something like Matlab's accumarray? 
http://www.mathworks.com/access/helpdesk/help/techdoc/ref/accumarray.html Best, Gabriel From robert.kern at gmail.com Wed Mar 26 18:25:25 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Mar 2008 17:25:25 -0500 Subject: [Numpy-discussion] accumarray In-Reply-To: <1206570056.14618.1.camel@gabriel-desktop> References: <1206570056.14618.1.camel@gabriel-desktop> Message-ID: <3d375d730803261525j3c1f382fs907f5082eb37af85@mail.gmail.com> On Wed, Mar 26, 2008 at 5:20 PM, Gabriel J.L. Beckers wrote: > Does numpy have something like Matlab's accumarray? > > http://www.mathworks.com/access/helpdesk/help/techdoc/ref/accumarray.html No. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lists at informa.tiker.net Wed Mar 26 19:22:56 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Wed, 26 Mar 2008 19:22:56 -0400 Subject: [Numpy-discussion] vander() docstring Message-ID: <200803261922.58425.lists@informa.tiker.net> Hi all, The docstring for vander() seems to contradict what the function does. In particular, the columns in the vander() output seem reversed wrt its docstring. I feel like one of the two needs to be fixed, or is there something I'm not seeing? This here is fresh from the Numpy examples page: 8< docstring ----------------------------------------------- X = vander(x,N=None) The Vandermonde matrix of vector x. The i-th column of X is the the i-th power of x. N is the maximum power to compute; if N is None it defaults to len(x). 8< Example ------------------------------------------------- >>> from numpy import * >>> x = array([1,2,3,5]) >>> N=3 >>> vander(x,N) # Vandermonde matrix of the vector x array([[ 1, 1, 1], [ 4, 2, 1], [ 9, 3, 1], [25, 5, 1]]) 8< --------------------------------------------------------- Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From charlesr.harris at gmail.com Wed Mar 26 20:37:47 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 26 Mar 2008 18:37:47 -0600 Subject: [Numpy-discussion] vander() docstring In-Reply-To: <200803261922.58425.lists@informa.tiker.net> References: <200803261922.58425.lists@informa.tiker.net> Message-ID: On Wed, Mar 26, 2008 at 5:22 PM, Andreas Kl?ckner wrote: > Hi all, > > The docstring for vander() seems to contradict what the function does. In > particular, the columns in the vander() output seem reversed wrt its > docstring. I feel like one of the two needs to be fixed, or is there > something I'm not seeing? > > This here is fresh from the Numpy examples page: > > 8< docstring ----------------------------------------------- > X = vander(x,N=None) > > The Vandermonde matrix of vector x. The i-th column of X is the > the i-th power of x. N is the maximum power to compute; if N is > None it defaults to len(x). > > 8< Example ------------------------------------------------- > >>> from numpy import * > >>> x = array([1,2,3,5]) > >>> N=3 > >>> vander(x,N) # Vandermonde matrix of the vector x > array([[ 1, 1, 1], > [ 4, 2, 1], > [ 9, 3, 1], > [25, 5, 1]]) > 8< --------------------------------------------------------- > The docstring is incorrect. The Vandermonde matrix produced is compatible with numpy polynomials that also go from high to low powers. 
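(To see the ordering concretely, the example above can be extended with a polyval call; the vander output is the one Andreas posted, the polyval line is an added illustration.)

>>> from numpy import array, vander, polyval
>>> x = array([1, 2, 3, 5])
>>> vander(x, 3)             # columns hold x**2, x**1, x**0: highest power first
array([[ 1,  1,  1],
       [ 4,  2,  1],
       [ 9,  3,  1],
       [25,  5,  1]])
>>> polyval([2, 0, 1], 3)    # 2*3**2 + 0*3 + 1; same high-to-low coefficient order
19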
I would have done it the other way round, so index matched power, but that isn't how it is. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Wed Mar 26 20:51:36 2008 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 26 Mar 2008 20:51:36 -0400 Subject: [Numpy-discussion] accumarray In-Reply-To: <3d375d730803261525j3c1f382fs907f5082eb37af85@mail.gmail.com> References: <1206570056.14618.1.camel@gabriel-desktop><3d375d730803261525j3c1f382fs907f5082eb37af85@mail.gmail.com> Message-ID: > On Wed, Mar 26, 2008 at 5:20 PM, Gabriel J.L. Beckers wrote: >> Does numpy have something like Matlab's accumarray? >> http://www.mathworks.com/access/helpdesk/help/techdoc/ref/accumarray.html On Wed, 26 Mar 2008, Robert Kern apparently wrote: > No. But of course you can do things like (1d example) vals[subs==2].sum() Cheers, Alan Isaac From roygeorget at gmail.com Thu Mar 27 06:02:50 2008 From: roygeorget at gmail.com (royG) Date: Thu, 27 Mar 2008 03:02:50 -0700 (PDT) Subject: [Numpy-discussion] reconstruct image from eigenfaces Message-ID: <5bc2e52b-28cd-48db-8a7e-55527e7c22e0@d21g2000prf.googlegroups.com> hi i am trying to reconstruct face images from eigenfaces derrived from original set of face images. i represented orig images by an ndarray with each row for each image and each column for pixel intensity.I sorted the eigenvectors such that each row of sortedeigenvectors is an eigenvector(first row being the most significant). Thus my facespace has each row correspond to an eigenface image. facespace=dot(sortedeigenvectors_rowwise,adjfaces) where adjfaces=origfaces-averageface also i calculated the weights matrix by wk=dot(facespace[:selectednumberofEVectors,:],adjfaces.transpose() ).transpose() weights=abs(wk) Now I am trying to reconstruct the face images from this data.Since i am still learning this technique i couldn't figure out how to do the reconstruction can someone help/advise? RG From lbolla at gmail.com Thu Mar 27 10:28:42 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Thu, 27 Mar 2008 15:28:42 +0100 Subject: [Numpy-discussion] greedy loadtxt Message-ID: <80c99e790803270728l5df118fr559e6bb6281e5a0e@mail.gmail.com> Hi all! I realized that numpy.loadtxt do not read the last character of an input file. This is annoying if the input file do not end with a newline. For example: data.txt ------- 1 2 3 In [33]: numpy.loadtxt('data.txt') Out[33]: array([ 1., 2.]) While: data.txt ------- 1 2 3 In [33]: numpy.loadtxt('data.txt') Out[33]: array([ 1., 2., 3.]) Should I use numpy.fromfile, instead? L. -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From doutriaux1 at llnl.gov Thu Mar 27 10:38:35 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Thu, 27 Mar 2008 07:38:35 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EAC1DC.7030306@hawaii.edu> References: <200803260948.02742.pgmdevlist@gmail.com> <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> <47EA8BFD.2080306@llnl.gov> <200803261556.43557.pgmdevlist@gmail.com> <47EAAF0B.5090602@llnl.gov> <47EAC1DC.7030306@hawaii.edu> Message-ID: <47EBB16B.9060402@llnl.gov> Eric, Pierre, I agree the new ma is probably much better and we should use it. all i was saying is that 1.0.4 was working great with the small compatibility layer. I even have a frozen version of 1.0.5 devel that works great. Then suddenly everything broke. 
I was really happy was the layer of 1.0.4. It is not only a matter of converting our software, I can do that. It also a matter of have our user base go smoothly thru the transition. So far they were happy with the 1st order conversion script. And a little bit of editing. But we can't ask them to go thru thousands of lines of old code and sort of rewrite it all. When I said it shouldn't be hard to do as much of MA as possible. I simply meant put back the compatibility layer that was there in 1.0.4. and early 1.0.5dev. I'm not advocating to rewrite the old MA at all, simply to keep what was already there as far as transition, why undoing it? I really don't mind help you i the process if you want. C. Eric Firing wrote: > Charles Doutriaux wrote: > >> The reorder is a function we implement. By digging a bit into this my >> guess is that all the missing function in numpy.ma are causing to fail >> at some point in our init and returning the wrong object type. >> >> But the whole idea was to keep a backward compatible layer with Numeric >> and MA. It worked great for a while and now things are getting more and >> more broken. >> > There are costs as well as benefits in maintaining backward > compatibility, so one should not rely on it indefinitely. > >> Correct me if I'm wrong but it seems as if the numpy.oldnumeric.am is >> now simply numpy.ma and it's pointing to the new MaskedArray interface. >> Loosing a LOT of backward compatibility at the same time. >> > > numpy.oldnumeric.ma was a very small compatibility wrapper for > numpy.core.ma; now it is the same, but pointing to numpy.ma, which is > now Pierre's new maskedarray implementatin. Maybe more compatibility > interfacing is needed, either there or in numpy.ma itself, but I would > not agree with your characterization of the degree of incompatibility. > > Whether it would be possible (and desirable) to replace oldnumeric.ma > with the old numpy/core/ma.py, I don't know, but maybe this, or some > other way of keeping core/ma.py available, should be considered. Would > this meet your needs? > > Were you happy with release 1.04? > > >> I'm thinking that such changes should definitely not happen from 1.0.4 >> to 1.0.5 but rather in some major upgrade of numpy (1.1 at least, may be >> even 2.0). >> > > No, this has been planned for quite a while, and I would strongly oppose > any such drastic delay. > > >> It is absolutely necessary to have the oldnumeric.ma working as much as >> possible as MA, what's in now is incompatible with code that have been >> successfully upgraded to numpy using your recommended method (official >> numpy doc) >> >> Can you put back ALL the function from numpy.oldnumeric.ma ? It >> shouldn't be too much work. >> >> Now I'm actually worried about using ma at all? What version is in? Is >> it a completely new package or is it still the old one just a bit >> broken? If it's a new one, we'd have to be sure it is fully tested >> > > No, it is not broken, it has many improvements and bug fixes relative to > the old ma.py. That is why it is replacing ma.py. > > >> before we can redistribute it to other people via our package, or before >> we use it ourselves >> > > Well, the only way to get something fully tested is to put it in use. > It has been available for testing for a long time as a separate > implementation, then as a numpy branch, and now for a while in the numpy > svn trunk. It works well. 
It is time to release it--possibly after a > few more tweaks, possibly leaving the old core/ma.py accessible, but > definitely for 1.05. No one will force you to adopt 1.05, so if more > compatibility tweaks are needed after 1.05 you can identify them and > they can be incorporated for the next release. > > Eric > > >> Can somebody bring some light on this issue? thanks a lot, >> >> C. >> >> >> Pierre GM wrote: >> >>> Charles, >>> >>> >>>> result = result.reorder(order).regrid(grid) >>>> AttributeError: 'MaskedArray' object has no attribute 'reorder' >>>> >>>> Should I inherit from soemtihng else ? >>>> >>>> >>> Mmh, .reorder is not a regular ndarray method, so that won't work. What is it >>> supposed to do ? And regrid ? >>> >>> >>> >>>> Aslo I used to import a some function from numpy.oldnumeric.ma, that are >>>> now missing can you point me to their new ma equivalent? >>>> common_fill_value, identity, indices and set_fill_value >>>> >>>> >>> For set_fill_value, just use >>> m.fill_value = your_fill_value >>> >>> For identity, just use the regular numpy version, and view it as a >>> MaskedArray: >>> numpy.identity(...).view(MaskedArray) >>> >>> For common_fill_value: ah, tricky one, I'll have to check. >>> >>> If needed, I'll bring identity into numpy.ma. Please don't hesitate to send >>> more feedback, it's always needed. >>> Sincerely, >>> P. >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion at scipy.org >>> http://projects.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From aisaac at american.edu Thu Mar 27 10:43:21 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 27 Mar 2008 10:43:21 -0400 Subject: [Numpy-discussion] greedy loadtxt In-Reply-To: <80c99e790803270728l5df118fr559e6bb6281e5a0e@mail.gmail.com> References: <80c99e790803270728l5df118fr559e6bb6281e5a0e@mail.gmail.com> Message-ID: On Thu, 27 Mar 2008, lorenzo bolla apparently wrote: > I realized that numpy.loadtxt do not read the last > character of an input file. This is annoying if the input > file do not end with a newline. I believe Robert fixed this; update from the SVN repository. hth, Alan Isaac From pearu at cens.ioc.ee Thu Mar 27 10:47:23 2008 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Thu, 27 Mar 2008 15:47:23 +0100 Subject: [Numpy-discussion] f2py functions, docstrings, and epydoc In-Reply-To: <1206568166.47eac4e682f08@astrosun2.astro.cornell.edu> References: <1206568166.47eac4e682f08@astrosun2.astro.cornell.edu> Message-ID: <47EBB37B.10105@cens.ioc.ee> Hi, Tom Loredo wrote: > Hi folks- > > Can anyone offer any tips on how I can get epydoc to produce > API documentation for functions in an f2py-produced module? > Currently they get listed in the generated docs as "Variables": > > Variables > psigc = > sigctp = > smll_offset = > > Yet each of these objects is callable, and has a docstring. > The module itself has docs that give a 1-line signature for > each function, but that's only part of the docstring. epydoc 3.0 supports variable documentation strings but only in python codes. 
However, one can also let epydoc to generate documentation for f2py generated functions (that, by the way, are actually instances of `fortran` type and define __call__ method). For that one needs to create a python module containing:: from somef2pyextmodule import psigc, sigctp, smll_offset smll_offset = smll_offset exec `smll_offset.__doc__` sigctp = sigctp exec `sigctp.__doc__` smll_offset = smll_offset exec `smll_offset.__doc__` #etc #eof Now, when applying epydoc to this python file, epydoc will produce docs also to these f2py objects. It should be easy to create a python script that will generate these python files that epydoc could use to generate docs to f2py extension modules. > One reason I'd like to see the full docstrings documented by epydoc > is that, for key functions, I'm loading the functions into a > module and *changing* the docstrings, to have info beyond the > limited f2py-generated docstrings. > > On a related question, is there a way to provide input to f2py for > function docstrings? The manual hints that triple-quoted multiline > blocks in the .pyf can be used to provide documentation, but when > I add them, they don't appear to be used. This feature is still implemented only partially and not enabled. When I get more time, I'll finish it.. HTH, Pearu From pgmdevlist at gmail.com Thu Mar 27 11:07:29 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 27 Mar 2008 11:07:29 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EBB16B.9060402@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <47EAC1DC.7030306@hawaii.edu> <47EBB16B.9060402@llnl.gov> Message-ID: <200803271107.30320.pgmdevlist@gmail.com> Charles, > all i was saying is that 1.0.4 was working great with the small > compatibility layer. > I even have a frozen version of 1.0.5 devel that works great. Then > suddenly everything broke. Could you be more specific ? Would you mind sending me bug reports so that I can check what's going on and how to improve backwards compatibility ? > I was really happy was the layer of 1.0.4. We're talking about the 15-line long numpy.oldnumeric.ma, right ? The ones that redefines "take" as to take the averages along some indices ? The only difference between versions is that 1.0.5 uses numpy.ma instead of the old numpy.core.ma. [...] OK, now I see: there were some functions in numpy.core.ma that are not in numpy.ma (identity, indices). So, the pb is not in the conversion layer, but on numpy.ma itself. OK, that should be relatively easy to fix. Can you gimme a day or two ? Sorry for the delayed comprehension. P. From doutriaux1 at llnl.gov Thu Mar 27 11:28:27 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Thu, 27 Mar 2008 08:28:27 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <200803271107.30320.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> <47EAC1DC.7030306@hawaii.edu> <47EBB16B.9060402@llnl.gov> <200803271107.30320.pgmdevlist@gmail.com> Message-ID: <47EBBD1B.6040404@llnl.gov> Hi Pierre, No problem, let me know when you have something in. I can't be sure all I mentioned is all that's missing. It's all I got so far. But since I can't get our end going. I can't really give you a comprehensive list of what's exactly missing. Hopefully this is all and it will work fine after your changes. Thanks for doing this, C. Pierre GM wrote: > Charles, > > >> all i was saying is that 1.0.4 was working great with the small >> compatibility layer. 
>> I even have a frozen version of 1.0.5 devel that works great. Then >> suddenly everything broke. >> > > Could you be more specific ? Would you mind sending me bug reports so that I > can check what's going on and how to improve backwards compatibility ? > > >> I was really happy was the layer of 1.0.4. >> > > We're talking about the 15-line long numpy.oldnumeric.ma, right ? The ones > that redefines "take" as to take the averages along some indices ? The only > difference between versions is that 1.0.5 uses numpy.ma instead of the old > numpy.core.ma. > [...] > OK, now I see: there were some functions in numpy.core.ma that are not in > numpy.ma (identity, indices). So, the pb is not in the conversion layer, but > on numpy.ma itself. OK, that should be relatively easy to fix. Can you gimme > a day or two ? > > Sorry for the delayed comprehension. > P. > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From loredo at astro.cornell.edu Thu Mar 27 13:20:33 2008 From: loredo at astro.cornell.edu (Tom Loredo) Date: Thu, 27 Mar 2008 13:20:33 -0400 Subject: [Numpy-discussion] f2py functions, docstrings, and epydoc In-Reply-To: <1206638383.47ebd72ff4228@astrosun2.astro.cornell.edu> References: <1206568166.47eac4e682f08@astrosun2.astro.cornell.edu> <1206638383.47ebd72ff4228@astrosun2.astro.cornell.edu> Message-ID: <1206638433.47ebd76132061@astrosun2.astro.cornell.edu> Pearu- > smll_offset = smll_offset > exec `smll_offset.__doc__` Thanks for the quick and helpful response! I'll give it a try. I don't grasp why it works, though. I suppose I don't need to, but... I'm guessing the exec adds stuff to the current namespace that isn't there until a fortran object's attributes are explicitly accessed. While I have your attention... could you clear this up, also just for my curiousity? It's probably related. > f2py generated functions (that, by the way, are > actually instances of `fortran` type and define __call__ method). I had wondered about this when I first encountered this issue, and thought maybe I could figure out how to put some hook into epydoc so it would document anything with a __call__ method. But it looks like 'fortran' objects *don't* have a __call__ (here _cbmlike is my f2py-generated module): In [1]: from inference.count._cbmlike import smllike In [2]: smllike Out[2]: In [3]: dir smllike ------> dir(smllike) Out[3]: ['__doc__', '_cpointer'] In [4]: smllike.__call__ --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/inference/loredo/tex/meetings/head08/ in () AttributeError: __call__ Yet despite this apparent absence of __call__, I can magically call smllike just fine. Would you provide a quick explanation of what f2py and the fortran object are doing here? Thanks, Tom ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From Chris.Barker at noaa.gov Thu Mar 27 14:59:54 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 27 Mar 2008 11:59:54 -0700 Subject: [Numpy-discussion] greedy loadtxt In-Reply-To: References: <80c99e790803270728l5df118fr559e6bb6281e5a0e@mail.gmail.com> Message-ID: <47EBEEAA.20302@noaa.gov> Alan G Isaac wrote: > I believe Robert fixed this; > update from the SVN repository. lorenzo bolla wrote: > Should I use numpy.fromfile, instead? You can also do that. 
If fromfile() supports your data format, it will be much faster. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From lbolla at gmail.com Thu Mar 27 15:05:00 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Thu, 27 Mar 2008 20:05:00 +0100 Subject: [Numpy-discussion] greedy loadtxt In-Reply-To: <47EBEEAA.20302@noaa.gov> References: <80c99e790803270728l5df118fr559e6bb6281e5a0e@mail.gmail.com> <47EBEEAA.20302@noaa.gov> Message-ID: <80c99e790803271205m8355334qd67d8575b8cfc68d@mail.gmail.com> Thank you all. The problem with fromfile() is that it doesn't know anything about ndarrays. If my file is a table of ncols and nrows, fromfile() will give me a 1darray with nrows*ncols elements, while loadtxt() will give me a 2dmatrix nrows x ncols. In other words, I loose the "shape" of the table. L. On Thu, Mar 27, 2008 at 7:59 PM, Christopher Barker wrote: > Alan G Isaac wrote: > > I believe Robert fixed this; > > update from the SVN repository. > > lorenzo bolla wrote: > > Should I use numpy.fromfile, instead? > > You can also do that. If fromfile() supports your data format, it will > be much faster. > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Mar 27 15:25:06 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 27 Mar 2008 12:25:06 -0700 Subject: [Numpy-discussion] greedy loadtxt In-Reply-To: <80c99e790803271205m8355334qd67d8575b8cfc68d@mail.gmail.com> References: <80c99e790803270728l5df118fr559e6bb6281e5a0e@mail.gmail.com> <47EBEEAA.20302@noaa.gov> <80c99e790803271205m8355334qd67d8575b8cfc68d@mail.gmail.com> Message-ID: <47EBF492.9060207@noaa.gov> lorenzo bolla wrote: > The problem with fromfile() is that it doesn't know anything about ndarrays. > If my file is a table of ncols and nrows, fromfile() will give me a > 1darray with nrows*ncols elements, while loadtxt() will give me a > 2dmatrix nrows x ncols. In other words, I loose the "shape" of the table. yup -- you need to know something about the shape. It also doesn't support comments, and all sorts of other stuff. It is, however, very fast. I mostly use it to read more complex file formats, ones that first tell you haw many numbers there are, then have the numbers. It works great for that. hmm-- I wonder how hard it would be to special case linefeeds (from other white space) in fromfile(), and have it figure out the shape from that? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at enthought.com Thu Mar 27 16:11:22 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Thu, 27 Mar 2008 15:11:22 -0500 Subject: [Numpy-discussion] missing function in numpy.ma? 
In-Reply-To: <47EBB16B.9060402@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <4db580fd0803261025m69ae6e20l789541e79047e376@mail.gmail.com> <47EA8BFD.2080306@llnl.gov> <200803261556.43557.pgmdevlist@gmail.com> <47EAAF0B.5090602@llnl.gov> <47EAC1DC.7030306@hawaii.edu> <47EBB16B.9060402@llnl.gov> Message-ID: <47EBFF6A.9060701@enthought.com> Charles Doutriaux wrote: > Eric, Pierre, > > I agree the new ma is probably much better and we should use it. > > all i was saying is that 1.0.4 was working great with the small > compatibility layer. > I even have a frozen version of 1.0.5 devel that works great. Then > suddenly everything broke. > Hey Charles, I think it would be a good idea to do as you suggest and look into making the oldnumeric.ma compatibility layer work as well as possible. The problem is that the old compatibility layer was a pretty light wrapper around the old numpy.core.ma. I guess what could be done is to take the old numpy.core.ma file and move it into oldnumeric.ma (along with the few re-namings that are there now)? Could you test that option out and see if it works for you? Thanks, -Travis O. From pgmdevlist at gmail.com Thu Mar 27 16:31:32 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 27 Mar 2008 16:31:32 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EBFF6A.9060701@enthought.com> References: <200803260948.02742.pgmdevlist@gmail.com> <47EBB16B.9060402@llnl.gov> <47EBFF6A.9060701@enthought.com> Message-ID: <200803271631.32470.pgmdevlist@gmail.com> On Thursday 27 March 2008 16:11:22 Travis E. Oliphant wrote: > I guess what could be done is to take the old numpy.core.ma file and > move it into oldnumeric.ma (along with the few re-namings that are there > now)? Could you test that option out and see if it works for you? I'm currently re-introducing some functions of numpy.core.ma in numpy.ma.core, fixing a couple of bugs along the way (for example, round: the current behavior of numpy.round is inconsistent: it'll return a MaskedArray if the nb of decimals is 0, but a regular ndarray otherwise, so I just coded a numpy.ma.round and the corresponding method). 2-3 functions from numpy.core.ma (average, dot, and another one I can't remmbr) are already in numpy.ma.extras. I should update the SVN by later this afternoon. From millman at berkeley.edu Thu Mar 27 16:48:41 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 27 Mar 2008 13:48:41 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <200803271631.32470.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> <47EBB16B.9060402@llnl.gov> <47EBFF6A.9060701@enthought.com> <200803271631.32470.pgmdevlist@gmail.com> Message-ID: On Thu, Mar 27, 2008 at 1:31 PM, Pierre GM wrote: > On Thursday 27 March 2008 16:11:22 Travis E. Oliphant wrote: > > I guess what could be done is to take the old numpy.core.ma file and > > move it into oldnumeric.ma (along with the few re-namings that are there > > now)? Could you test that option out and see if it works for you? > > I'm currently re-introducing some functions of numpy.core.ma in > numpy.ma.core, fixing a couple of bugs along the way (for example, round: the > current behavior of numpy.round is inconsistent: it'll return a MaskedArray > if the nb of decimals is 0, but a regular ndarray otherwise, so I just coded > a numpy.ma.round and the corresponding method). > 2-3 functions from numpy.core.ma (average, dot, and another one I can't > remmbr) are already in numpy.ma.extras. 
> I should update the SVN by later this afternoon. Excellent, I prefer this approach. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From oliphant at enthought.com Thu Mar 27 16:57:33 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Thu, 27 Mar 2008 15:57:33 -0500 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: References: <200803260948.02742.pgmdevlist@gmail.com> <47EBB16B.9060402@llnl.gov> <47EBFF6A.9060701@enthought.com> <200803271631.32470.pgmdevlist@gmail.com> Message-ID: <47EC0A3D.5010405@enthought.com> Jarrod Millman wrote: > On Thu, Mar 27, 2008 at 1:31 PM, Pierre GM wrote: > >> On Thursday 27 March 2008 16:11:22 Travis E. Oliphant wrote: >> > I guess what could be done is to take the old numpy.core.ma file and >> > move it into oldnumeric.ma (along with the few re-namings that are there >> > now)? Could you test that option out and see if it works for you? >> >> I'm currently re-introducing some functions of numpy.core.ma in >> numpy.ma.core, fixing a couple of bugs along the way (for example, round: the >> current behavior of numpy.round is inconsistent: it'll return a MaskedArray >> if the nb of decimals is 0, but a regular ndarray otherwise, so I just coded >> a numpy.ma.round and the corresponding method). >> 2-3 functions from numpy.core.ma (average, dot, and another one I can't >> remmbr) are already in numpy.ma.extras. >> I should update the SVN by later this afternoon. >> > > Excellent, I prefer this approach. > > If this works then it should be fine. If it doesn't, however, then it would not be too big a deal to just move the old implementation over. In fact, I rather think it ought to be done anyway. The oldnumeric directory will be disappearing in 1.1, so it doesn't introduce any long-term burden and just makes 1.0.5 a bit more robust. -Travis From doutriaux1 at llnl.gov Thu Mar 27 17:06:24 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Thu, 27 Mar 2008 14:06:24 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EC0A3D.5010405@enthought.com> References: <200803260948.02742.pgmdevlist@gmail.com> <47EBB16B.9060402@llnl.gov> <47EBFF6A.9060701@enthought.com> <200803271631.32470.pgmdevlist@gmail.com> <47EC0A3D.5010405@enthought.com> Message-ID: <47EC0C50.7070503@llnl.gov> Hello, Ok, I'll wait for Pierre's changes and see what it does for us. If it still breaks here or there then i'll do as Travis suggested (while still reporting to Pierre what went wrong). Thank you all, C. Travis E. Oliphant wrote: > Jarrod Millman wrote: > >> On Thu, Mar 27, 2008 at 1:31 PM, Pierre GM wrote: >> >> >>> On Thursday 27 March 2008 16:11:22 Travis E. Oliphant wrote: >>> > I guess what could be done is to take the old numpy.core.ma file and >>> > move it into oldnumeric.ma (along with the few re-namings that are there >>> > now)? Could you test that option out and see if it works for you? >>> >>> I'm currently re-introducing some functions of numpy.core.ma in >>> numpy.ma.core, fixing a couple of bugs along the way (for example, round: the >>> current behavior of numpy.round is inconsistent: it'll return a MaskedArray >>> if the nb of decimals is 0, but a regular ndarray otherwise, so I just coded >>> a numpy.ma.round and the corresponding method). >>> 2-3 functions from numpy.core.ma (average, dot, and another one I can't >>> remmbr) are already in numpy.ma.extras. 
>>> I should update the SVN by later this afternoon. >>> >>> >> Excellent, I prefer this approach. >> >> >> > If this works then it should be fine. If it doesn't, however, then it > would not be too big a deal to just move the old implementation over. > In fact, I rather think it ought to be done anyway. > > The oldnumeric directory will be disappearing in 1.1, so it doesn't > introduce any long-term burden and just makes 1.0.5 a bit more robust. > > -Travis > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From pearu at cens.ioc.ee Thu Mar 27 17:09:58 2008 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Thu, 27 Mar 2008 23:09:58 +0200 (EET) Subject: [Numpy-discussion] f2py functions, docstrings, and epydoc In-Reply-To: <1206638433.47ebd76132061@astrosun2.astro.cornell.edu> References: <1206568166.47eac4e682f08@astrosun2.astro.cornell.edu> <1206638383.47ebd72ff4228@astrosun2.astro.cornell.edu> <1206638433.47ebd76132061@astrosun2.astro.cornell.edu> Message-ID: <65142.88.89.195.179.1206652198.squirrel@cens.ioc.ee> On Thu, March 27, 2008 7:20 pm, Tom Loredo wrote: > > Pearu- > >> smll_offset = smll_offset >> exec `smll_offset.__doc__` > > Thanks for the quick and helpful response! I'll give it > a try. I don't grasp why it works, though. I suppose I don't > need to, but... I'm guessing the exec adds stuff to the current > namespace that isn't there until a fortran object's attributes > are explicitly accessed. > > While I have your attention... could you clear this up, also just > for my curiousity? It's probably related. I got this idea from how epydoc gets documentation strings for variables: http://epydoc.sourceforge.net/whatsnew.html according to the variable assignement must follow a string constant containing documentation. In our case, smll_offset = smll_offset is variable assignment and exec `smll_offset.__doc__` creates a string constant after the variable assingment. >> f2py generated functions (that, by the way, are >> actually instances of `fortran` type and define __call__ method). > > I had wondered about this when I first encountered this issue, > and thought maybe I could figure out how to put some hook into > epydoc so it would document anything with a __call__ method. > But it looks like 'fortran' objects *don't* have a __call__ > (here _cbmlike is my f2py-generated module): > > In [1]: from inference.count._cbmlike import smllike > > In [2]: smllike > Out[2]: > > In [3]: dir smllike > ------> dir(smllike) > Out[3]: ['__doc__', '_cpointer'] > > In [4]: smllike.__call__ > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call > last) > > /home/inference/loredo/tex/meetings/head08/ in () > > AttributeError: __call__ > > Yet despite this apparent absence of __call__, I can magically > call smllike just fine. Would you provide a quick explanation of > what f2py and the fortran object are doing here? `fortran` object is an instance of a *extension type* `fortran`. It does not have __call__ method, the extension type has a slot in C struct that holds a function that will be called when something tries to call the `fortran` object. If there are epydoc developers around in this list then here's a feature request: epydoc support for extension types. 
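(A pure-Python toy analogue of that dispatch, added purely for illustration and not taken from the f2py sources: an implicit call goes through the type's call slot, so an instance can be callable even though ordinary attribute lookup never shows a __call__.)

>>> class Toy(object):                          # new-style class, so special methods
...     def __call__(self):                     # are looked up on the type
...         return 42
...     def __getattribute__(self, name):
...         if name == '__call__':              # hide the attribute from normal lookups
...             raise AttributeError(name)
...         return object.__getattribute__(self, name)
...
>>> t = Toy()
>>> t()                                         # works: dispatched through the type
42
>>> t.__call__                                  # fails: instance attribute access is blocked
Traceback (most recent call last):
  ...
AttributeError: __call__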
Regards, Pearu From pgmdevlist at gmail.com Thu Mar 27 17:55:12 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 27 Mar 2008 17:55:12 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EC0C50.7070503@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <47EC0A3D.5010405@enthought.com> <47EC0C50.7070503@llnl.gov> Message-ID: <200803271755.12719.pgmdevlist@gmail.com> All, Would you mind trying the SVN (ver > 4946) and let me know what I'm still missing ? Thanks a lot in advance P. From millman at berkeley.edu Thu Mar 27 20:23:02 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 27 Mar 2008 17:23:02 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47EC0A3D.5010405@enthought.com> References: <200803260948.02742.pgmdevlist@gmail.com> <47EBB16B.9060402@llnl.gov> <47EBFF6A.9060701@enthought.com> <200803271631.32470.pgmdevlist@gmail.com> <47EC0A3D.5010405@enthought.com> Message-ID: On Thu, Mar 27, 2008 at 1:57 PM, Travis E. Oliphant wrote: > If this works then it should be fine. If it doesn't, however, then it > would not be too big a deal to just move the old implementation over. > In fact, I rather think it ought to be done anyway. My main concern is that we shouldn't release 1.0.5 with known missing functionality in the new implementation of MaskedArrays. From joao.q.fonseca at gmail.com Fri Mar 28 05:20:23 2008 From: joao.q.fonseca at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Quinta_da_Fonseca?=) Date: Fri, 28 Mar 2008 09:20:23 +0000 Subject: [Numpy-discussion] Arcos returns nan Message-ID: <97FF36A7-8FE7-461C-BE34-5FDB527EBF63@gmail.com> I have a function that returns the dot product of two unit vectors. When I try to use arcos on the values returned I sometimes get the warning: "Warning: invalid value encountered in arccos", and the angle returned is nan. I found out that this happens for essentially co- linear vectors, for which the dot product function returns 1.0. This looks like 1.0 no matter how I print it but if I do: >>N.dot(a,b)>1, I get: >>True. Now I guess this arises because of the way computers store floats but shouldn't numpy take care of this somehow? Is it a bug? I don't seem to have this problem with Matlab or Fortran. Jo?o From david at ar.media.kyoto-u.ac.jp Fri Mar 28 05:22:44 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 28 Mar 2008 18:22:44 +0900 Subject: [Numpy-discussion] Arcos returns nan In-Reply-To: <97FF36A7-8FE7-461C-BE34-5FDB527EBF63@gmail.com> References: <97FF36A7-8FE7-461C-BE34-5FDB527EBF63@gmail.com> Message-ID: <47ECB8E4.1000300@ar.media.kyoto-u.ac.jp> Jo?o Quinta da Fonseca wrote: > I have a function that returns the dot product of two unit vectors. > When I try to use arcos on the values returned I sometimes get the > warning: > "Warning: invalid value encountered in arccos", and the angle > returned is nan. I found out that this happens for essentially co- > linear vectors, for which the dot product function returns 1.0. This > looks like 1.0 no matter how I print it but if I do: >>N.dot(a,b)>1, > I get: >>True. > Now I guess this arises because of the way computers store floats but > shouldn't numpy take care of this somehow? Is it a bug? I don't seem > to have this problem with Matlab or Fortran. > I am not sure I understand which behaviour you would expect. 
In matlab:

>> a = 1.; b = 1 + eps;
>> acos(a)
ans = 0
>> acos(b)
ans = 0 + 2.1073e-08i

So do you expect acos to handle values outside the [1;-1] range (cos
considered as a 'generalized' complex function) ?

cheers,

David

From david at ar.media.kyoto-u.ac.jp Fri Mar 28 05:31:33 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 28 Mar 2008 18:31:33 +0900
Subject: [Numpy-discussion] Arcos returns nan
In-Reply-To: <47ECB8E4.1000300@ar.media.kyoto-u.ac.jp>
References: <97FF36A7-8FE7-461C-BE34-5FDB527EBF63@gmail.com>
	<47ECB8E4.1000300@ar.media.kyoto-u.ac.jp>
Message-ID: <47ECBAF5.40801@ar.media.kyoto-u.ac.jp>

David Cournapeau wrote:
> Jo?o Quinta da Fonseca wrote:
>
>> I have a function that returns the dot product of two unit vectors.
>> When I try to use arcos on the values returned I sometimes get the
>> warning:
>> "Warning: invalid value encountered in arccos", and the angle
>> returned is nan. I found out that this happens for essentially co-
>> linear vectors, for which the dot product function returns 1.0. This
>> looks like 1.0 no matter how I print it but if I do: >>N.dot(a,b)>1,
>> I get: >>True.
>> Now I guess this arises because of the way computers store floats but
>> shouldn't numpy take care of this somehow? Is it a bug? I don't seem
>> to have this problem with Matlab or Fortran.
>>

Note that the program in C below has the same behaviour as numpy, if I
understand your situation right:

#include <stdio.h>
#include <math.h>
#include <float.h>

int main()
{
    fprintf(stderr, "%f\n", acos(1.));
    fprintf(stderr, "%f\n", acos(1. + DBL_EPSILON));

    return 0;
}

(EPSILON being by definition the smallest value such that 1. + EPSILON > 1.).
So the problem in your case is likely to be caused by precision problems when
you normalized the scalar product. An easy solution would be to clip all the
values > 1. to 1 (same for values < -1); another solution may be a better
way to do the normalization. But I don't see how matlab or Fortran would be
any different.

cheers,

David

From harry.mangalam at uci.edu Fri Mar 28 11:52:44 2008
From: harry.mangalam at uci.edu (Harry Mangalam)
Date: Fri, 28 Mar 2008 08:52:44 -0700
Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols
Message-ID: <200803280852.44778.harry.mangalam@uci.edu>

Also using g95, which is fairly new I guess, but is a downstream
requirement for the application.

installed python 2.5 from darwin ports, installed most of rest of the
python stuff from ports as well, numpy 1.0.4 failed repeatedly
with 'too many options' error when processing cmdline options so
installed 1.0.5 from svn. got same 'too many errors'. guessed and
removed '--debug' from commandline:

 f2py --opt="-O3" -c -m \
 fd_rrt1d --fcompiler=g95 --debug --link-lapack_opt *.f

and that cured THAT problem.

same version and same commandline worked fine on Linux, but now get
the "Undefined symbols:" problem that I've seen posted elsewhere but
not resolved (or not in a way that fixes my problem).

Here's the last few lines of the command that was kicked off by:

f2py --opt="-O3" -c -m fd_rrt1d --fcompiler=g95 --link-lapack_opt *.f

(incidentally, this is the same final error from both the ports
version (1.0.4) and the self-built one (1.0.5). It's obviously a link
error, but to what lib and where to insert the -l specification?)

I originally ran this with the environment LDFLAGS set but then ran it
also with LDFLAGS set to "" and then UNset ie:
$unset LDFLAGS

the result is the same in all cases (see below).
There's obviously a link error, but what are the missing libs and where are they? Would this have been caused by running the port install with LDFLAGS set? Thanks in advance. hjm the last lines of the build are: g95:f77: CQZ.f g95:f77: Umatrix1D.f g95:f77: fd_rrt1d.f g95:f77: /tmp/tmp28SqLa/src.macosx-10.3-ppc-2.5/fd_rrt1d-f2pywrappers.f /usr/local/bin/g95 -L/Users/hjm/lib \ /tmp/tmp28SqLa/tmp/tmp28SqLa/src.macosx-10.3-ppc-2.5/fd_rrt1dmodule.o\ /tmp/tmp28SqLa/tmp/tmp28SqLa/src.macosx-10.3-ppc-2.5/fortranobject.o \ /tmp/tmp28SqLa/CQZ.o /tmp/tmp28SqLa/Umatrix1D.o \ /tmp/tmp28SqLa/fd_rrt1d.o \ /tmp/tmp28SqLa/tmp/tmp28SqLa/src.macosx-10.3-ppc-2.5/fd_rrt1d-f2pywrappers.o\ -o ./fd_rrt1d.so -Wl,-framework -Wl,Accelerate ld: Undefined symbols: _PyArg_ParseTupleAndKeywords _PyCObject_AsVoidPtr _PyCObject_FromVoidPtr _PyCObject_Type _PyComplex_Type _PyDict_GetItemString _PyDict_SetItemString _PyErr_Clear _PyErr_Format _PyErr_NewException _PyErr_Occurred _PyErr_Print _PyErr_SetString _PyExc_ImportError _PyExc_MemoryError _PyExc_RuntimeError _PyExc_ValueError _PyFloat_Type _PyImport_ImportModule _PyInt_Type _PyModule_GetDict _PyNumber_Float _PyNumber_Int _PyObject_GetAttrString _PyObject_IsTrue _PyObject_SetAttrString _PyObject_Str _PySequence_Check _PySequence_GetItem _PyString_FromString _PyString_Type _PyType_IsSubtype _PyType_Type _Py_BuildValue _Py_InitModule4 __Py_NoneStruct _fprintf$LDBLStub _PyDict_DelItemString _PyDict_New _PyExc_AttributeError _PyExc_TypeError _PyMem_Free _PyObject_Type _PyString_AsString _PyString_ConcatAndDel _Py_FindMethod __PyObject_New _sprintf$LDBLStub _MAIN_ -- Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway, UC Irvine 92697 949 824 0084(o), 949 285 4487(c) -- [A Nation of Sheep breeds a Government of Wolves. Edward R. Murrow] From charlesr.harris at gmail.com Fri Mar 28 12:05:56 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 28 Mar 2008 10:05:56 -0600 Subject: [Numpy-discussion] Arcos returns nan In-Reply-To: <97FF36A7-8FE7-461C-BE34-5FDB527EBF63@gmail.com> References: <97FF36A7-8FE7-461C-BE34-5FDB527EBF63@gmail.com> Message-ID: On Fri, Mar 28, 2008 at 3:20 AM, Jo?o Quinta da Fonseca < joao.q.fonseca at gmail.com> wrote: > I have a function that returns the dot product of two unit vectors. > When I try to use arcos on the values returned I sometimes get the > warning: > "Warning: invalid value encountered in arccos", and the angle > returned is nan. I found out that this happens for essentially co- > linear vectors, for which the dot product function returns 1.0. This > looks like 1.0 no matter how I print it but if I do: >>N.dot(a,b)>1, > I get: >>True. Then the dot product is > 1. If you print out the number with around 20 significant digits you will see that. The differences between Matlab, Fortran, and numpy may be due to compiler flags, the compile, and your hardware. The internal registers in the floating point unit can have extra precision, so how those registers are used can effect the low order bits of the result. We really need to have your input vectors to see what is going on here. > Now I guess this arises because of the way computers store floats but > shouldn't numpy take care of this somehow? Is it a bug? I don't seem > to have this problem with Matlab or Fortran. > You need to check for the correct domain, as using the arccos for this when the vectors are nearly colinear is going to be tricky and sensitive to roundoff just from the nature of the problem. 
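(A minimal guard for the reported nan, added here as an illustration rather than taken from the thread: clamp the cosine into [-1, 1] before calling arccos. The array names follow the original question; clipping removes the one-ulp overshoot but, as noted, does not buy any extra precision.)

>>> import numpy as N
>>> a = N.array([0.6, 0.8])
>>> b = a.copy()                                   # numerically colinear with a
>>> c = N.dot(a, b)                                # can overshoot 1.0 by a rounding error
>>> angle = N.arccos(N.clip(c, -1.0, 1.0))         # clamped into arccos' domain, never nan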
Arccos is also not very accurate for values near in any case +/- 1 because the extremum of the cos function has those values. If you are working in 2 or three dimensions you could try the cross product, or more generally, you can normalize the vectors and look at the difference vector, which will solve the nan problem, but it won't give you better precision. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From doutriaux1 at llnl.gov Fri Mar 28 14:04:28 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Fri, 28 Mar 2008 11:04:28 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <200803271755.12719.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> <47EC0A3D.5010405@enthought.com> <47EC0C50.7070503@llnl.gov> <200803271755.12719.pgmdevlist@gmail.com> Message-ID: <47ED332C.1050404@llnl.gov> Hi Pierre, I just tested it out, I'm still missing from numpy.oldnumeric.ma import common_fill_value , set_fill_value which breaks the code later C. Pierre GM wrote: > All, > Would you mind trying the SVN (ver > 4946) and let me know what I'm still > missing ? > Thanks a lot in advance > P. > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From zbyszek at in.waw.pl Fri Mar 28 14:04:36 2008 From: zbyszek at in.waw.pl (Zbyszek Szmek) Date: Fri, 28 Mar 2008 19:04:36 +0100 Subject: [Numpy-discussion] fromiter + dtype='S' -> Python crash In-Reply-To: <47DAD7AD.10100@enthought.com> References: <525f23e80803131230k45d4d329y428667fc282fbbf0@mail.gmail.com> <20080314134518.GB14897@szyszka.in.waw.pl> <47DAD7AD.10100@enthought.com> Message-ID: <20080328180436.GC26540@szyszka.in.waw.pl> On Fri, Mar 14, 2008 at 02:53:17PM -0500, Travis E. Oliphant wrote: > Zbyszek Szmek wrote: > > On Thu, Mar 13, 2008 at 05:44:54PM -0400, Alan G Isaac wrote: > >> In principle you should be able to use ``fromiter``, > >> I believe, but it does not work. BUG? (Crasher.) > >> > >>>>> import numpy as N > >>>>> x = [1,2,3] > >>>>> fmt="%03d" > >>>>> N.fromiter([xi for xi in x],dtype='S') > >>>>> > >> Python crashes. > > > > 2. what does dtype with dtype.elsize==0 mean? Should it be allowed at all? > > If it is sometimes valid, then PyArray_FromIter should be fixed. > > It is a bug that needs to be fixed in PyArray_FromIter, I think. > Upon deeper review, this function has more problems. 1. The bug above: PyArray_NewFromDescr sometimes returns an array with dtype different then the one specified. E.g.: >>> dtype('S'); empty(300, dtype('S')).dtype dtype('|S0') dtype('|S1') so the element size should be taken from the created array, not from specified dtype. 2. The check for overflow is incorrect, when elsize==1 the function always returns a MemoryError. 3. From the docstring: 'If count is nonegative, the new array will have count elements, otherwise it's size is determined by the generator.' However, later we test for count == -1. I think it is simplifies things to split out the overflow tests and resizing into a helper function. 
multiarraymodule.c | 74 ++++++++++++++++++++++++++++++++--------------------- 1 file changed, 45 insertions(+), 29 deletions(-) ------------------------------------------------------------------------- --- numpy/core/src/multiarraymodule.c_unmodified 2008-03-28 13:46:38.000000000 +0100 +++ numpy/core/src/multiarraymodule.c 2008-03-28 18:26:51.000000000 +0100 @@ -6301,6 +6301,38 @@ return ret; } +/* + Grow ret->data: + this is similar for the strategy for PyListObject, but we use + 50% overallocation => 0, 4, 8, 16, ... + + Returns 1 on success, 0 on error. +*/ +static int increase_array_size(PyArrayObject* ar, intp* elcount) +{ + char *new_data; + intp elsize = ar->strides[0]; + + /* size_t is unsigned so the behavior on overflow is defined. */ + size_t bufsize; + size_t half = ar->dimensions[0] ? 2 * (size_t)ar->dimensions[0] : 2; + *elcount = half * 2; + bufsize = *elcount * elsize; + + if( *elcount/2 != half || bufsize/elsize != *elcount) + goto error; + + new_data = PyDataMem_RENEW(ar->data, bufsize); + if(!new_data) + goto error; + + ar->data = new_data; + return 1; + +error: + PyErr_SetString(PyExc_MemoryError, "cannot allocate array memory"); + return 0; +} /* steals a reference to dtype (which cannot be NULL) */ /*OBJECT_API */ @@ -6310,14 +6342,12 @@ PyObject *value; PyObject *iter = PyObject_GetIter(obj); PyArrayObject *ret = NULL; - intp i, elsize, elcount; + intp i; + intp elcount = (count < 0) ? 0 : count; char *item, *new_data; if (iter == NULL) goto done; - elcount = (count < 0) ? 0 : count; - elsize = dtype->elsize; - /* We would need to alter the memory RENEW code to decrement any reference counts before throwing away any memory. */ @@ -6329,31 +6359,17 @@ ret = (PyArrayObject *)PyArray_NewFromDescr(&PyArray_Type, dtype, 1, &elcount, NULL,NULL, 0, NULL); - dtype = NULL; + dtype = NULL; /* dtype is always eaten by PA_NewFromDescr */ if (ret == NULL) goto done; - for (i = 0; (i < count || count == -1) && + for (i = 0; (i < count || count < 0) && (value = PyIter_Next(iter)); i++) { - if (i >= elcount) { - /* - Grow ret->data: - this is similar for the strategy for PyListObject, but we use - 50% overallocation => 0, 4, 8, 14, 23, 36, 56, 86 ... - */ - elcount = (i >> 1) + (i < 4 ? 4 : 2) + i; - if (elcount <= (intp)((~(size_t)0) / elsize)) - new_data = PyDataMem_RENEW(ret->data, elcount * elsize); - else - new_data = NULL; - if (new_data == NULL) { - PyErr_SetString(PyExc_MemoryError, - "cannot allocate array memory"); - Py_DECREF(value); - goto done; - } - ret->data = new_data; + if (i == elcount && !increase_array_size(ret, &elcount)){ + Py_DECREF(value); + goto done; } + ret->dimensions[0] = i+1; if (((item = index2ptr(ret, i)) == NULL) || @@ -6373,13 +6389,13 @@ /* Realloc the data so that don't keep extra memory tied up (assuming realloc is reasonably good about reusing space...) + + If the reallocation fails, it is not a fatal error. */ if (i==0) i = 1; - ret->data = PyDataMem_RENEW(ret->data, i * elsize); - if (ret->data == NULL) { - PyErr_SetString(PyExc_MemoryError, "cannot allocate array memory"); - goto done; - } + new_data = PyDataMem_RENEW(ret->data, i * ret->strides[0]); + if (new_data != NULL) + ret->data = new_data; done: Py_XDECREF(iter); ------------------------------------------------------------------------- And the doctest: >>> import numpy >>> x = [1,2,3] >>> print numpy.fromiter(x, dtype='d') [ 1. 2. 3.] 
>>> print numpy.fromiter(x, dtype='S1') ['1' '2' '3'] >>> print numpy.fromiter((i * 100 for i in x), dtype='S2') ['10' '20' '30'] >>> print numpy.fromiter(x, dtype='S1', count=-1) ['1' '2' '3'] >>> print numpy.fromiter(x, dtype='S1', count=-2) ['1' '2' '3'] >>> print numpy.fromiter(x, dtype='S1', count=3) ['1' '2' '3'] >>> print numpy.fromiter(x, dtype='S1', count=4) Traceback (most recent call last): File "doctest.py", line 1248, in __run compileflags, 1) in test.globs File "", line 1, in ? print numpy.fromiter(x, dtype='S1', count=4) ValueError: iterator too short >>> print numpy.fromiter(range(3000000), dtype='S2') # doctest: +ELLIPSIS ['0' ... '29'] >>> print numpy.fromiter(x, dtype='S') ['1' '2' '3'] Does this look OK? Cheers, Zbyszek -------------- next part -------------- A non-text attachment was scrubbed... Name: ma_diff.diff Type: text/x-diff Size: 3944 bytes Desc: not available URL: From pgmdevlist at gmail.com Fri Mar 28 14:05:13 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 28 Mar 2008 14:05:13 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47ED332C.1050404@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <200803271755.12719.pgmdevlist@gmail.com> <47ED332C.1050404@llnl.gov> Message-ID: <200803281405.15413.pgmdevlist@gmail.com> On Friday 28 March 2008 14:04:28 Charles Doutriaux wrote: > Hi Pierre, > > I just tested it out, I'm still missing > from numpy.oldnumeric.ma import common_fill_value , set_fill_value > which breaks the code later OK, thx, I'll take care of that. From pgmdevlist at gmail.com Fri Mar 28 14:13:23 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 28 Mar 2008 14:13:23 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47ED332C.1050404@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <200803271755.12719.pgmdevlist@gmail.com> <47ED332C.1050404@llnl.gov> Message-ID: <200803281413.23726.pgmdevlist@gmail.com> Charles, > I just tested it out, I'm still missing > from numpy.oldnumeric.ma import common_fill_value , set_fill_value > which breaks the code later Turns out I had forgotten to put the functions in numpy.ma.core.__all__: they had already been coded all along... That's fixed in SVN4950 From doutriaux1 at llnl.gov Fri Mar 28 14:29:29 2008 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Fri, 28 Mar 2008 11:29:29 -0700 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <200803281413.23726.pgmdevlist@gmail.com> References: <200803260948.02742.pgmdevlist@gmail.com> <200803271755.12719.pgmdevlist@gmail.com> <47ED332C.1050404@llnl.gov> <200803281413.23726.pgmdevlist@gmail.com> Message-ID: <47ED3909.40509@llnl.gov> Hi Pierre, Hum... something is still broken. But maybe you can help me figure out if something changed dramatically. We're defining a new class object MaskedVariable which inherits from our other class: AbstractVariable (in which we define a reorder function for the objects) and from MaskedArray (used to be from numpy.oldnumeric.ma.array) Now when reading data from a file it complains that "MaskedArray" has no attribute "reorder", so that probably means that somewhere something failed in the initialisation of our object and it returned a simple MaskedArray instead of a MaskedVariable... But since the only changes are from numpy, I wonder if the inheritance from MaskedArray is somehow different from the one from MA.array ?
Any clue on where to start looking would be great, Thanks, C> Pierre GM wrote: > Charles, > > >> I just tested it out, I'm still missing >> from numpy.oldnumeric.ma import common_fill_value , set_fill_value >> which breaks the code later >> > > Turns out I had forgotten to put the functions in numpy.ma.core.__all__: they > had been already coded all along... That's fixed in SVN4950 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From pgmdevlist at gmail.com Fri Mar 28 14:37:49 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 28 Mar 2008 14:37:49 -0400 Subject: [Numpy-discussion] missing function in numpy.ma? In-Reply-To: <47ED3909.40509@llnl.gov> References: <200803260948.02742.pgmdevlist@gmail.com> <200803281413.23726.pgmdevlist@gmail.com> <47ED3909.40509@llnl.gov> Message-ID: <200803281437.50037.pgmdevlist@gmail.com> On Friday 28 March 2008 14:29:29 Charles Doutriaux wrote: > Hi Pierre, > > Hum... something is still broken. But maybe you can help me figuring out > if something dramtically changed Something did indeed: numpy.ma.MaskedArray is now a subclass of ndarray, and inheriting from MaskedArray should follow the rules of ndarray subclassing. You'll find a brief overview of subclassing at this link: http://www.scipy.org/Subclasses In short, you can't initialize it with a __init__, you need a combination of __new__ and __array_finalize__. Don't hesitate to send me your class (at least the __init__ and a couple of specific methods) and I'll see how what I can do. From robert.kern at gmail.com Sat Mar 29 02:02:49 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 29 Mar 2008 01:02:49 -0500 Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols In-Reply-To: <200803280852.44778.harry.mangalam@uci.edu> References: <200803280852.44778.harry.mangalam@uci.edu> Message-ID: <3d375d730803282302l7e055e2bg437b0dff8ed5a740@mail.gmail.com> On Fri, Mar 28, 2008 at 10:52 AM, Harry Mangalam wrote: > Here's the last few lines of the command that was kicked off by: > > f2py --opt="-O3" -c -m fd_rrt1d --fcompiler=g95 --link-lapack_opt *.f > > (incidentally, this is the same final error from both the ports > version (1.0.4) and the self-built one (1.0.5). It's obviously a link > error, but to what lib and where to insert the -l specification?) > > I originally ran this with the environment LDFLAGS set but then ran it > also with LDFLAGS set to "" and then UNset ie: > $unset LDFLAGS Can you triple-check that the "unset LDFLAGS" worked by using env(1)? You still seem to have a -L/Users/hjm/lib flag that is obviously not coming from the command line. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From oswald.harry at gmail.com Sat Mar 29 05:23:11 2008 From: oswald.harry at gmail.com (harryos) Date: Sat, 29 Mar 2008 02:23:11 -0700 (PDT) Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> Message-ID: > ------------- > from scipy import linalg > facearray-=facearray.mean(0) #mean centering > u, s, vt = linalg.svd(facearray, 0) > scores = u*s > facespace = vt.T > # reconstruction: facearray ~= dot(scores, facespace.T) > explained_variance = 100*s.cumsum()/s.sum() hi i am a newbie in this area of eigenface based methods..is this how to reconstruct face images from eigenfaces? facearray ~= dot(scores, facespace.T) i guess it translates to facearray = dot(sortedeigenvectorsmatrix , facespace) i tried it and it produces (from facearray) a set of images very similar(but dark and bit smudged around eyes,nose..) to the original set of face images.. oharry From mhgreen at uchicago.edu Sat Mar 29 14:04:44 2008 From: mhgreen at uchicago.edu (mhgreen at uchicago.edu) Date: Sat, 29 Mar 2008 13:04:44 -0500 (CDT) Subject: [Numpy-discussion] OSX 10.4 installation problems Message-ID: <20080329130444.BCK59908@m4500-02.uchicago.edu> Hi, I cannot seem to install numpy on my mac. Here is some relevant info: I have the following installed on my PPC G4 powerbook: MacOSX 10.4.10 gcc version 4.0.0 gfortran version 4.2.1 fftw version 3.1.2 MacPython version 2.5.2 Xcode version 2.0 I have the unzipped numpy directory placed in /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages I am using the following command in terminal to try to install: python setup.py install I am getting a ton of errors, but the first few include: Could not locate executable f95 Could not locate executable f90 Could not locate executable f77 Could not locate executable xlf90 Could not locate executable xlf Could not locate executable ifort Could not locate executable ifc Could not locate executable g77 gcc: installation problem, cannot exec 'i686-apple-darwin8-gcc-4.0.0': No such file or directory And after that an assortment of probably at least 100 errors. Any help would be appreciated! Thanks. Matt From robert.kern at gmail.com Sat Mar 29 17:01:59 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 29 Mar 2008 16:01:59 -0500 Subject: [Numpy-discussion] OSX 10.4 installation problems In-Reply-To: <20080329130444.BCK59908@m4500-02.uchicago.edu> References: <20080329130444.BCK59908@m4500-02.uchicago.edu> Message-ID: <3d375d730803291401x1c829a70l4ebc1bc4c94c1311@mail.gmail.com> On Sat, Mar 29, 2008 at 1:04 PM, wrote: > Hi, > > I cannot seem to install numpy on my mac. Here is some > relevant info: > > I have the following installed on my PPC G4 powerbook: > > MacOSX 10.4.10 > gcc version 4.0.0 > gfortran version 4.2.1 > fftw version 3.1.2 > MacPython version 2.5.2 > Xcode version 2.0 > > I have the unzipped numpy directory placed in > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages No, don't do that. Unzip it somewhere else to build. 
> I am using the following command in terminal to try to install: > > python setup.py install > > I am getting a ton of errors, but the first few include: > > Could not locate executable f95 > Could not locate executable f90 > Could not locate executable f77 > Could not locate executable xlf90 > Could not locate executable xlf > Could not locate executable ifort > Could not locate executable ifc > Could not locate executable g77 > > gcc: installation problem, cannot exec > 'i686-apple-darwin8-gcc-4.0.0': No such file or directory This is your main problem. Where did you get this gcc? I believe the one that comes with the Developer Tools is 4.0.1. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rhh2109 at columbia.edu Sat Mar 29 17:25:07 2008 From: rhh2109 at columbia.edu (Roy H. Han) Date: Sat, 29 Mar 2008 17:25:07 -0400 Subject: [Numpy-discussion] How do I make numpy raise exceptions instead of print warnings? Message-ID: <6a5569ec0803291425t409b2a7dq85c3bfd80f598cb5@mail.gmail.com> Is there a way to have numpy raise exceptions instead of printing warnings? The printed warnings make debugging hard. From mhgreen at uchicago.edu Sat Mar 29 18:00:52 2008 From: mhgreen at uchicago.edu (mhgreen at uchicago.edu) Date: Sat, 29 Mar 2008 17:00:52 -0500 (CDT) Subject: [Numpy-discussion] OSX 10.4 installation problems Message-ID: <20080329170052.BCK68562@m4500-02.uchicago.edu> I am not sure where my version of gcc is from or how it was installed. I installed Xcode from the CD I got with the computer (in 2005). I will try updating it and see if everything works better. Thanks. ---- Original message ---- >Date: Sat, 29 Mar 2008 16:01:59 -0500 >From: "Robert Kern" >Subject: Re: [Numpy-discussion] OSX 10.4 installation problems >To: "Discussion of Numerical Python" > >On Sat, Mar 29, 2008 at 1:04 PM, wrote: >> Hi, >> >> I cannot seem to install numpy on my mac. Here is some >> relevant info: >> >> I have the following installed on my PPC G4 powerbook: >> >> MacOSX 10.4.10 >> gcc version 4.0.0 >> gfortran version 4.2.1 >> fftw version 3.1.2 >> MacPython version 2.5.2 >> Xcode version 2.0 >> >> I have the unzipped numpy directory placed in >> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages > >No, don't do that. Unzip it somewhere else to build. > >> I am using the following command in terminal to try to install: >> >> python setup.py install >> >> I am getting a ton of errors, but the first few include: >> >> Could not locate executable f95 >> Could not locate executable f90 >> Could not locate executable f77 >> Could not locate executable xlf90 >> Could not locate executable xlf >> Could not locate executable ifort >> Could not locate executable ifc >> Could not locate executable g77 >> >> gcc: installation problem, cannot exec >> 'i686-apple-darwin8-gcc-4.0.0': No such file or directory > >This is your main problem. Where did you get this gcc? I believe the >one that comes with the Developer Tools is 4.0.1. > >-- >Robert Kern > >"I have come to believe that the whole world is an enigma, a harmless >enigma that is made terrible by our own mad attempt to interpret it as >though it had an underlying truth." 
> -- Umberto Eco >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at scipy.org >http://projects.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Sat Mar 29 18:05:42 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 29 Mar 2008 17:05:42 -0500 Subject: [Numpy-discussion] How do I make numpy raise exceptions instead of print warnings? In-Reply-To: <6a5569ec0803291425t409b2a7dq85c3bfd80f598cb5@mail.gmail.com> References: <6a5569ec0803291425t409b2a7dq85c3bfd80f598cb5@mail.gmail.com> Message-ID: <3d375d730803291505g11cf3277t5118cc85e5f3cb11@mail.gmail.com> On Sat, Mar 29, 2008 at 4:25 PM, Roy H. Han wrote: > Is there a way to have numpy raise exceptions instead of printing > warnings? The printed warnings make debugging hard. numpy.seterr() Read the docstring for the various options. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From harry.mangalam at uci.edu Sun Mar 30 16:33:05 2008 From: harry.mangalam at uci.edu (Harry Mangalam) Date: Sun, 30 Mar 2008 13:33:05 -0700 Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols In-Reply-To: <3d375d730803282302l7e055e2bg437b0dff8ed5a740@mail.gmail.com> References: <200803280852.44778.harry.mangalam@uci.edu> <3d375d730803282302l7e055e2bg437b0dff8ed5a740@mail.gmail.com> Message-ID: <200803301333.05899.harry.mangalam@uci.edu> Hi Robert, thanks very much for your help - responses inline below. On Friday 28 March 2008, Robert Kern wrote: > Can you triple-check that the "unset LDFLAGS" worked by using > env(1)? You still seem to have a -L/Users/hjm/lib flag that is > obviously not coming from the command line. $ env |grep LDFLAGS That LDFLAGS was coming from my .profile and I've commented it out. I just re-installed numpy (successfully, it seems) and re-tried the f2py command: f2py --opt="-O3" -c -m fd_rrt1d --fcompiler=g95 --link-lapack_opt *.f which results in an otherwise successful run except that the last few lines are identical to the one I posted before: /usr/local/bin/g95 /tmp/tmp9hmmi5/tmp/tmp9hmmi5/src.macosx-10.3-ppc-2.5/fd_rrt1dmodule.o /tmp/tmp9hmmi5/tmp/tmp9hmmi5/src.macosx-10.3-ppc-2.5/fortranobject.o /tmp/tmp9hmmi5/CQZ.o /tmp/tmp9hmmi5/Umatrix1D.o /tmp/tmp9hmmi5/fd_rrt1d.o /tmp/tmp9hmmi5/tmp/tmp9hmmi5/src.macosx-10.3-ppc-2.5/fd_rrt1d-f2pywrappers.o -o ./fd_rrt1d.so -Wl,-framework -Wl,Accelerate ld: Undefined symbols: _PyArg_ParseTupleAndKeywords _PyCObject_AsVoidPtr _PyCObject_FromVoidPtr _PyCObject_Type _PyComplex_Type _PyDict_GetItemString _PyDict_SetItemString However, the LDFLAGS was set when I installed a numpy via the ports system previously- could that have 'poisoned' the install by setting some variable that prevents finding the appropriate lib? And do you know which lib is not being found? I could just insert it into the final link command. Harry -- Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway, UC Irvine 92697 949 824 0084(o), 949 285 4487(c) -- [A Nation of Sheep breeds a Government of Wolves. Edward R. 
Murrow] From josef.pktd at gmail.com Sun Mar 30 20:17:49 2008 From: josef.pktd at gmail.com (josef pumuckl) Date: Sun, 30 Mar 2008 20:17:49 -0400 Subject: [Numpy-discussion] whitespace error in automatic generation of Numpy_Example_List_With_Doc/script Message-ID: <1cd32cbb0803301717o4c2b94dew72d448b64722651a@mail.gmail.com> Hi, I was trying to automatically doctest some of the examples in Numpy_Example_List_With_Doc and obtained a large number of whitespace errors. The tests, that I tried, passed after turning of whitespace checking in doctest. Later I saw that the original numpy example list has leading whitespace that, I think, gets removed by the script that generates the With_Doc page. I think on line 179 a right strip should be used instead of a strip(), so that any leading white space is kept: change `` line = line.strip()`` to ``line = line.rstrip()`` the new lines read: for i, line in enumerate(input_data): # look for optional anchor name line = line.rstrip() match = re_anchor.match(line) Looking briefly at the newly generated output, everything still seems to be correctly generated, but I haven't verified if everything is really correct. Josef PS: This is my second attempt to send to the mailing list; I guess my first message got spam filtered -------------- next part -------------- An HTML attachment was scrubbed... URL: From harry.mangalam at uci.edu Sun Mar 30 21:16:34 2008 From: harry.mangalam at uci.edu (Harry Mangalam) Date: Sun, 30 Mar 2008 18:16:34 -0700 Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols In-Reply-To: <200803301333.05899.harry.mangalam@uci.edu> References: <200803280852.44778.harry.mangalam@uci.edu> <3d375d730803282302l7e055e2bg437b0dff8ed5a740@mail.gmail.com> <200803301333.05899.harry.mangalam@uci.edu> Message-ID: <200803301816.34292.harry.mangalam@uci.edu> Answering part of my own question, one missing lib is (not surprisingly) libpython2.5 (add -lpython2.5) so that the link command is: /usr/local/bin/g95 -L/opt/local/lib/ \ /tmp/tmp1r96Q9/tmp/tmp1r96Q9/src.macosx-10.3-ppc-2.5/fd_rrt1dmodule.o\ /tmp/tmp1r96Q9/tmp/tmp1r96Q9/src.macosx-10.3-ppc-2.5/fortranobject.o\ /tmp/tmp1r96Q9/CQZ.o /tmp/tmp1r96Q9/Umatrix1D.o \ /tmp/tmp1r96Q9/fd_rrt1d.o \ /tmp/tmp1r96Q9/tmp/tmp1r96Q9/src.macosx-10.3-ppc-2.5/fd_rrt1d-f2pywrappers.o\ -lpython2.5 \ -lSystemStubs \ -o ./fd_rrt1d.so -Wl,-framework -Wl,Accelerate ld: Undefined symbols: _MAIN_ You'd think that this would be added automatically, but this might be due to my previous installation of python2.5 (via the ports system) with the LDFLAGS set. In order to fix this, I have to uninstall, then reinstall, the entirety of the python 2.5 dependency tree. I'll set this to run later tonight. So the only remaining undefined symbol is _MAIN_ . ...? I don't know at what level to attack this; the main fortran routine is called 'fd_rrt1d', not 'main', but this was not a problem on Linux - it compiled and linked just fine. Any ideas? Harry On Sunday 30 March 2008, Harry Mangalam wrote: > Hi Robert, > thanks very much for your help - responses inline below. > > On Friday 28 March 2008, Robert Kern wrote: > > Can you triple-check that the "unset LDFLAGS" worked by using > > env(1)? You still seem to have a -L/Users/hjm/lib flag that is > > obviously not coming from the command line. > > $ env |grep LDFLAGS > > > That LDFLAGS was coming from my .profile and I've commented it out. 
> > I just re-installed numpy (successfully, it seems) and re-tried the > f2py command: > > f2py --opt="-O3" -c -m fd_rrt1d --fcompiler=g95 --link-lapack_opt > *.f > > which results in an otherwise successful run except that the last > few lines are identical to the one I posted before: > > /usr/local/bin/g95 > /tmp/tmp9hmmi5/tmp/tmp9hmmi5/src.macosx-10.3-ppc-2.5/fd_rrt1dmodule >.o > /tmp/tmp9hmmi5/tmp/tmp9hmmi5/src.macosx-10.3-ppc-2.5/fortranobject. >o /tmp/tmp9hmmi5/CQZ.o /tmp/tmp9hmmi5/Umatrix1D.o > /tmp/tmp9hmmi5/fd_rrt1d.o > /tmp/tmp9hmmi5/tmp/tmp9hmmi5/src.macosx-10.3-ppc-2.5/fd_rrt1d-f2pyw >rappers.o -o ./fd_rrt1d.so -Wl,-framework -Wl,Accelerate ld: > Undefined symbols: > _PyArg_ParseTupleAndKeywords > _PyCObject_AsVoidPtr > _PyCObject_FromVoidPtr > _PyCObject_Type > _PyComplex_Type > _PyDict_GetItemString > _PyDict_SetItemString > > > > However, the LDFLAGS was set when I installed a numpy via the ports > system previously- could that have 'poisoned' the install by > setting some variable that prevents finding the appropriate lib? > > And do you know which lib is not being found? I could just insert > it into the final link command. > > Harry -- Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway, UC Irvine 92697 949 824 0084(o), 949 285 4487(c) -- [A Nation of Sheep breeds a Government of Wolves. Edward R. Murrow] From robert.kern at gmail.com Sun Mar 30 21:20:48 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 30 Mar 2008 20:20:48 -0500 Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols In-Reply-To: <200803301816.34292.harry.mangalam@uci.edu> References: <200803280852.44778.harry.mangalam@uci.edu> <3d375d730803282302l7e055e2bg437b0dff8ed5a740@mail.gmail.com> <200803301333.05899.harry.mangalam@uci.edu> <200803301816.34292.harry.mangalam@uci.edu> Message-ID: <3d375d730803301820k248acf0q4efbc26bac02dac9@mail.gmail.com> On Sun, Mar 30, 2008 at 8:16 PM, Harry Mangalam wrote: > Answering part of my own question, one missing lib is (not > surprisingly) libpython2.5 (add -lpython2.5) so that the link command > is: No, it isn't. They are "-undefined dynamic_lookup -bundle", most likely. This is a deficiency of the g95 FCompiler implementation. No one has bothered to get it to work on OS X; I'm not sure if g95 even supports these flags. They were added to gcc (and accordingly gfortran) by Apple; I don't know if the g95 guy has kept up. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From harry.mangalam at uci.edu Sun Mar 30 22:20:15 2008 From: harry.mangalam at uci.edu (Harry Mangalam) Date: Sun, 30 Mar 2008 19:20:15 -0700 Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols In-Reply-To: <3d375d730803301820k248acf0q4efbc26bac02dac9@mail.gmail.com> References: <200803280852.44778.harry.mangalam@uci.edu> <200803301816.34292.harry.mangalam@uci.edu> <3d375d730803301820k248acf0q4efbc26bac02dac9@mail.gmail.com> Message-ID: <200803301920.15266.harry.mangalam@uci.edu> On Sunday 30 March 2008, Robert Kern wrote: > On Sun, Mar 30, 2008 at 8:16 PM, Harry Mangalam wrote: > > Answering part of my own question, one missing lib is (not > > surprisingly) libpython2.5 (add -lpython2.5) so that the link > > command is: > > No, it isn't. They are "-undefined dynamic_lookup -bundle", most > likely. 
This is a deficiency of the g95 FCompiler implementation. > No one has bothered to get it to work on OS X; I'm not sure if g95 > even supports these flags. They were added to gcc (and accordingly > gfortran) by Apple; I don't know if the g95 guy has kept up. Hi Robert, I don't understand the "No, it isn't." part. Adding '-lpython2.5' certainly removed that long list of undefined symbols - are you saying it really had no effect and that should be: "-undefined dynamic_lookup -bundle" That had little effect, so I think I've misunderstood you. The link command that has come closest to working is: /usr/local/bin/g95 \ -L/opt/local/lib/ \ -L/Developer/SDKs/MacOSX10.4u.sdk/usr/lib \ /tmp/tmp5Ef8uL/tmp/tmp5Ef8uL/src.macosx-10.3-ppc-2.5/fd_rrt1dmodule.o\ /tmp/tmp5Ef8uL/tmp/tmp5Ef8uL/src.macosx-10.3-ppc-2.5/fortranobject.o\ /tmp/tmp5Ef8uL/CQZ.o /tmp/tmp5Ef8uL/Umatrix1D.o\ /tmp/tmp5Ef8uL/fd_rrt1d.o\ /tmp/tmp5Ef8uL/tmp/tmp5Ef8uL/src.macosx-10.3-ppc-2.5/fd_rrt1d-f2pywrappers.o\ -lpython2.5 \ -lSystemStubs\ -llapack\ -lblas\ -o ./fd_rrt1d.so\ -Wl,-framework -Wl,Accelerate ld: Undefined symbols: _MAIN_ Altho there is a requirement for g95 to support the other platforms, I'm willing to try another free compiler - do you have a recommendation? Harry -- Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway, UC Irvine 92697 949 824 0084(o), 949 285 4487(c) -- [A Nation of Sheep breeds a Government of Wolves. Edward R. Murrow] From robert.kern at gmail.com Sun Mar 30 22:46:21 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 30 Mar 2008 21:46:21 -0500 Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols In-Reply-To: <200803301920.15266.harry.mangalam@uci.edu> References: <200803280852.44778.harry.mangalam@uci.edu> <200803301816.34292.harry.mangalam@uci.edu> <3d375d730803301820k248acf0q4efbc26bac02dac9@mail.gmail.com> <200803301920.15266.harry.mangalam@uci.edu> Message-ID: <3d375d730803301946k4034dd98pb59549a338eb18b3@mail.gmail.com> On Sun, Mar 30, 2008 at 9:20 PM, Harry Mangalam wrote: > On Sunday 30 March 2008, Robert Kern wrote: > > On Sun, Mar 30, 2008 at 8:16 PM, Harry Mangalam > wrote: > > > Answering part of my own question, one missing lib is (not > > > surprisingly) libpython2.5 (add -lpython2.5) so that the link > > > command is: > > > > No, it isn't. They are "-undefined dynamic_lookup -bundle", most > > likely. This is a deficiency of the g95 FCompiler implementation. > > No one has bothered to get it to work on OS X; I'm not sure if g95 > > even supports these flags. They were added to gcc (and accordingly > > gfortran) by Apple; I don't know if the g95 guy has kept up. > > Hi Robert, > > I don't understand the "No, it isn't." part. Adding '-lpython2.5' > certainly removed that long list of undefined symbols - are you > saying it really had no effect and that should be: > "-undefined dynamic_lookup -bundle" First, you need the "-bundle" in order to tell the linker to create a .so bundle. Otherwise, it tries to make an executable and (correctly) warns you that you do not have a main() function. The "-undefined dynamic_lookup" tells the linker to ignore undefined symbols and assume that they will be found when the bundle is dynamically loaded, as is the case for all of the Python symbols when the extension module gets loaded. Adding -lpython2.5 silences those error messages, but does not actually address the underlying problem. > That had little effect, so I think I've misunderstood you. 
Show me the link command that got executed and the error messages which followed. > The link command that has come closest to working is: > > /usr/local/bin/g95 \ > -L/opt/local/lib/ \ > -L/Developer/SDKs/MacOSX10.4u.sdk/usr/lib \ > /tmp/tmp5Ef8uL/tmp/tmp5Ef8uL/src.macosx-10.3-ppc-2.5/fd_rrt1dmodule.o\ > /tmp/tmp5Ef8uL/tmp/tmp5Ef8uL/src.macosx-10.3-ppc-2.5/fortranobject.o\ > /tmp/tmp5Ef8uL/CQZ.o /tmp/tmp5Ef8uL/Umatrix1D.o\ > /tmp/tmp5Ef8uL/fd_rrt1d.o\ > /tmp/tmp5Ef8uL/tmp/tmp5Ef8uL/src.macosx-10.3-ppc-2.5/fd_rrt1d-f2pywrappers.o\ > -lpython2.5 \ > -lSystemStubs\ > -llapack\ > -lblas\ > > -o ./fd_rrt1d.so\ > -Wl,-framework -Wl,Accelerate > ld: Undefined symbols: > _MAIN_ > > Altho there is a requirement for g95 to support the other platforms, > I'm willing to try another free compiler - do you have a > recommendation? gfortran. Get the binary from here: http://r.research.att.com/tools/ The MacPorts gfortran may also work, but I haven't tested it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Mar 30 23:10:48 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 30 Mar 2008 22:10:48 -0500 Subject: [Numpy-discussion] f2py from numpy 1.0.5 on OSX 10.4.11/QuadPPC fails with undefined symbols In-Reply-To: <200803301920.15266.harry.mangalam@uci.edu> References: <200803280852.44778.harry.mangalam@uci.edu> <200803301816.34292.harry.mangalam@uci.edu> <3d375d730803301820k248acf0q4efbc26bac02dac9@mail.gmail.com> <200803301920.15266.harry.mangalam@uci.edu> Message-ID: <3d375d730803302010kacf8080v6460e6b689003cd1@mail.gmail.com> On Sun, Mar 30, 2008 at 9:20 PM, Harry Mangalam wrote: > On Sunday 30 March 2008, Robert Kern wrote: > > On Sun, Mar 30, 2008 at 8:16 PM, Harry Mangalam > wrote: > > > Answering part of my own question, one missing lib is (not > > > surprisingly) libpython2.5 (add -lpython2.5) so that the link > > > command is: > > > > No, it isn't. They are "-undefined dynamic_lookup -bundle", most > > likely. This is a deficiency of the g95 FCompiler implementation. > > No one has bothered to get it to work on OS X; I'm not sure if g95 > > even supports these flags. They were added to gcc (and accordingly > > gfortran) by Apple; I don't know if the g95 guy has kept up. > > Hi Robert, > > I don't understand the "No, it isn't." part. Adding '-lpython2.5' > certainly removed that long list of undefined symbols - are you > saying it really had no effect and that should be: > "-undefined dynamic_lookup -bundle" I see that you are using MacPort's non-framework Python, so you may be right that you need "-L/opt/local/lib -lpython2.5". But you will definitely need the "-bundle" option. Just for clarification, those are flags for the linker, not f2py. Pass them in using $LDFLAGS. This is what the "$LDFLAGS overrides everything" behavior is for, incidentally; working around unsupported linkers. Unfortunately, I don't know of a way to detect a non-framework Python build, so that may continue to be unsupported. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From discerptor at gmail.com Mon Mar 31 01:45:35 2008 From: discerptor at gmail.com (Joshua Lippai) Date: Sun, 30 Mar 2008 22:45:35 -0700 Subject: [Numpy-discussion] Trouble using f2py on successful numpy build from SVN (1.0.5 dev4951) Message-ID: <9911419a0803302245m646df22emd81df0b90f790c7b@mail.gmail.com> I am using Mac OS X 10.5.2, with Python 2.5.2. My build output for NumPy is clean and successful and my numpy.test produces no errors or failures, but when I type f2py from Terminal, I get the following: $ f2py Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/Current/bin/f2py", line 5, in pkg_resources.run_script('numpy==1.0.5.dev4951', 'f2py') File "build/bdist.macosx-10.3-i386/egg/pkg_resources.py", line 448, in run_script File "build/bdist.macosx-10.3-i386/egg/pkg_resources.py", line 1160, in run_script pkg_resources.ResolutionError: No script named 'f2py' Any ideas what could be up here? Josh From starsareblueandfaraway at gmail.com Mon Mar 31 01:59:35 2008 From: starsareblueandfaraway at gmail.com (Roy H. Han) Date: Mon, 31 Mar 2008 01:59:35 -0400 Subject: [Numpy-discussion] How do I make numpy raise exceptions instead of print warnings? Message-ID: <6a5569ec0803302259q3fbe695jfb6609f66d389f79@mail.gmail.com> Thank you, Robert! numpy.seterr() is very helpful. It is just what I needed. numpy.seterr(all = 'raise') forces numpy to raise exceptions instead of printing warnings. [begin quote] numpy.seterr(all = None, divide = None, over = None, under = None, invalid = None) Valid values for each type of error are the strings "ignore", "warn", "raise", and "call". [end quote] Date: Sat, 29 Mar 2008 17:05:42 -0500 From: "Robert Kern" On Sat, Mar 29, 2008 at 4:25 PM, Roy H. Han wrote: > Is there a way to have numpy raise exceptions instead of printing > warnings? The printed warnings make debugging hard. numpy.seterr() Read the docstring for the various options. -- Robert Kern From robert.kern at gmail.com Mon Mar 31 05:10:31 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 31 Mar 2008 04:10:31 -0500 Subject: [Numpy-discussion] Trouble using f2py on successful numpy build from SVN (1.0.5 dev4951) In-Reply-To: <9911419a0803302245m646df22emd81df0b90f790c7b@mail.gmail.com> References: <9911419a0803302245m646df22emd81df0b90f790c7b@mail.gmail.com> Message-ID: <3d375d730803310210q2631f98en869561d8fb803757@mail.gmail.com> On Mon, Mar 31, 2008 at 12:45 AM, Joshua Lippai wrote: > I am using Mac OS X 10.5.2, with Python 2.5.2. My build output for > NumPy is clean and successful and my numpy.test produces no errors or > failures, but when I type f2py from Terminal, I get the following: > > $ f2py > Traceback (most recent call last): > File "/Library/Frameworks/Python.framework/Versions/Current/bin/f2py", > line 5, in > pkg_resources.run_script('numpy==1.0.5.dev4951', 'f2py') > File "build/bdist.macosx-10.3-i386/egg/pkg_resources.py", line 448, > in run_script > File "build/bdist.macosx-10.3-i386/egg/pkg_resources.py", line 1160, > in run_script > pkg_resources.ResolutionError: No script named 'f2py' > > Any ideas what could be up here? Exactly how did you install numpy? Did you use easy_install? What does the egg directory look like? For example inside my egg, there is a subdirectory called EGG-INFO/ which has another subdirectory called scripts/ which has the actual f2py script. 
[scripts]$ pwd /Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/site-packages/numpy-1.0.5.dev4951-py2.5-macosx-10.3-fat.egg/EGG-INFO/scripts [scripts]$ ls f2py This is what the bootstrap script installed to /Library/.../bin/f2py is looking for. If you did not intend to install an egg of numpy, this might be a leftover from a previous try. Delete the /Library/.../bin/f2py script and install numpy again. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From discerptor at gmail.com Mon Mar 31 10:11:41 2008 From: discerptor at gmail.com (Joshua Lippai) Date: Mon, 31 Mar 2008 07:11:41 -0700 Subject: [Numpy-discussion] Trouble using f2py on successful numpy build from SVN (1.0.5 dev4951) In-Reply-To: <3d375d730803310210q2631f98en869561d8fb803757@mail.gmail.com> References: <9911419a0803302245m646df22emd81df0b90f790c7b@mail.gmail.com> <3d375d730803310210q2631f98en869561d8fb803757@mail.gmail.com> Message-ID: <9911419a0803310711q2286758qad331dabe56b6b56@mail.gmail.com> Ah, good call. I did previously have an easy_install of numpy but I assumed the setup.py install would just overwrite anything from that install outside the site-packages folder. Thanks. Is there anything else I might run into as a side effect of my sloppiness? Josh On Mon, Mar 31, 2008 at 2:10 AM, Robert Kern wrote: > > On Mon, Mar 31, 2008 at 12:45 AM, Joshua Lippai wrote: > > I am using Mac OS X 10.5.2, with Python 2.5.2. My build output for > > NumPy is clean and successful and my numpy.test produces no errors or > > failures, but when I type f2py from Terminal, I get the following: > > > > $ f2py > > Traceback (most recent call last): > > File "/Library/Frameworks/Python.framework/Versions/Current/bin/f2py", > > line 5, in > > pkg_resources.run_script('numpy==1.0.5.dev4951', 'f2py') > > File "build/bdist.macosx-10.3-i386/egg/pkg_resources.py", line 448, > > in run_script > > File "build/bdist.macosx-10.3-i386/egg/pkg_resources.py", line 1160, > > in run_script > > pkg_resources.ResolutionError: No script named 'f2py' > > > > Any ideas what could be up here? > > Exactly how did you install numpy? Did you use easy_install? What does > the egg directory look like? For example inside my egg, there is a > subdirectory called EGG-INFO/ which has another subdirectory called > scripts/ which has the actual f2py script. > > [scripts]$ pwd > /Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/site-packages/numpy-1.0.5.dev4951-py2.5-macosx-10.3-fat.egg/EGG-INFO/scripts > [scripts]$ ls > f2py > > This is what the bootstrap script installed to /Library/.../bin/f2py > is looking for. > > If you did not intend to install an egg of numpy, this might be a > leftover from a previous try. Delete the /Library/.../bin/f2py script > and install numpy again. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
> -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From izakmarais at yahoo.com Mon Mar 31 10:24:39 2008 From: izakmarais at yahoo.com (izak marais) Date: Mon, 31 Mar 2008 07:24:39 -0700 (PDT) Subject: [Numpy-discussion] Applying PIL patch Message-ID: <844061.62676.qm@web50908.mail.re2.yahoo.com> Hi all, Sorry for the beginner question. I want to apply the PIL-numpy patch from http://www.scipy.org/Cookbook/PIL?highlight=%28PIL%29 . I have the latest windows binaries of numpy, scipy and PIL installed. I searched python.org, but couldn't find info on applying patches. How do I apply the patch? Detailed instructions on the cookbook page would be appreciated... perhaps I should modify it to include any lucid responses to this email? Regards Izak ____________________________________________________________________________________ OMG, Sweet deal for Yahoo! users/friends:Get A Month of Blockbuster Total Access, No Cost. W00t http://tc.deals.yahoo.com/tc/blockbuster/text2.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From nodrogbrown at gmail.com Mon Mar 31 10:43:11 2008 From: nodrogbrown at gmail.com (gordon) Date: Mon, 31 Mar 2008 07:43:11 -0700 (PDT) Subject: [Numpy-discussion] linalg.eigh() newbie doubt Message-ID: hello i was trying the linalg.eigh() when i apply eigh() on a covariance matrix (an ndarray of shape 6x6 i get evals,evectors suppose i get it like evals= array([2.2, 5.5, 4.4, 1.7, 7.7, 6.3]) evectors=array([[3.,5. ,1. ,6. ,2. ,4. ], [2.,1.,5.,7.,5.,3.], [8.,9.,6.,5.,4.,3.], [2.,1.,3.,4.,5.,9.], [0.1,3.,2.,4.,5.,1.], [6.,5.,7.,4.,2.,8.] ]) which is the array that corresponds to eigenvalue 2.2 of evals? is it the first column of evectors? or is it the first row? if i were to sort the evectors based on the eigenvalue ,i guess the most significant eigenvector should correspond to the value of 7.7 ,then am i supposed to consider the 5th column of evectors as the most significant eigenvector? please someone help me clear this confusion thanks gordon From lbolla at gmail.com Mon Mar 31 11:23:48 2008 From: lbolla at gmail.com (lorenzo bolla) Date: Mon, 31 Mar 2008 17:23:48 +0200 Subject: [Numpy-discussion] linalg.eigh() newbie doubt In-Reply-To: References: Message-ID: <80c99e790803310823m44c2fe14u45e856ddd4b6fa75@mail.gmail.com> from numpy.eigh?: :Returns: w : 1-d double array The eigenvalues. The eigenvalues are not necessarily ordered. v : 2-d double or complex double array, depending on input array type The normalized eigenvector corresponding to the eigenvalue w[i] is the column v[:,i]. so, yes, the eigvec coresponding to the eigval w[i] is v[:,i]. L. On Mon, Mar 31, 2008 at 4:43 PM, gordon wrote: > hello > i was trying the linalg.eigh() > when i apply eigh() on a covariance matrix (an ndarray of shape 6x6 i > get evals,evectors > suppose i get it like > > evals= array([2.2, 5.5, 4.4, 1.7, 7.7, 6.3]) > evectors=array([[3.,5. ,1. ,6. ,2. ,4. ], > [2.,1.,5.,7.,5.,3.], > [8.,9.,6.,5.,4.,3.], > [2.,1.,3.,4.,5.,9.], > [0.1,3.,2.,4.,5.,1.], > [6.,5.,7.,4.,2.,8.] > ]) > which is the array that corresponds to eigenvalue 2.2 of evals? > is it the first column of evectors? or is it the first row? 
> > if i were to sort the evectors based on the eigenvalue ,i guess the > most significant eigenvector should correspond to the value of > 7.7 ,then am i supposed to consider the 5th column of evectors as the > most significant eigenvector? > please someone help me clear this confusion > thanks > gordon > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lorenzo Bolla lbolla at gmail.com http://lorenzobolla.emurse.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Mon Mar 31 12:29:18 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 31 Mar 2008 09:29:18 -0700 Subject: [Numpy-discussion] Applying PIL patch In-Reply-To: <844061.62676.qm@web50908.mail.re2.yahoo.com> References: <844061.62676.qm@web50908.mail.re2.yahoo.com> Message-ID: <47F1115E.3060707@noaa.gov> izak marais wrote: > Sorry for the beginner question. I want to apply the PIL-numpy patch > from http://www.scipy.org/Cookbook/PIL?highlight=%28PIL%29 . I have the > latest windows binaries of numpy, scipy and PIL installed. Then you have the patch already-- it was added to the latest PIL. http://effbot.org/zone/pil-changes-116.htm -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Mon Mar 31 12:56:48 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 31 Mar 2008 09:56:48 -0700 Subject: [Numpy-discussion] OSX 10.4 installation problems In-Reply-To: <3d375d730803291401x1c829a70l4ebc1bc4c94c1311@mail.gmail.com> References: <20080329130444.BCK59908@m4500-02.uchicago.edu> <3d375d730803291401x1c829a70l4ebc1bc4c94c1311@mail.gmail.com> Message-ID: <47F117D0.6070500@noaa.gov> Robert Kern wrote: > This is your main problem. Where did you get this gcc? I believe the > one that comes with the Developer Tools is 4.0.1. yup: $ gcc --version powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367) $ which gcc /usr/bin/gcc Otherwise, I've got the same setup as the OP, and numpy built fine the las time I tried it. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From mhgreen at uchicago.edu Mon Mar 31 13:10:39 2008 From: mhgreen at uchicago.edu (mhgreen at uchicago.edu) Date: Mon, 31 Mar 2008 12:10:39 -0500 (CDT) Subject: [Numpy-discussion] OSX 10.4 installation problems Message-ID: <20080331121039.BCM33263@m4500-02.uchicago.edu> I updated gcc and everything works fine now. Thanks! ---- Original message ---- >Date: Mon, 31 Mar 2008 09:56:48 -0700 >From: Christopher Barker >Subject: Re: [Numpy-discussion] OSX 10.4 installation problems >To: Discussion of Numerical Python > >Robert Kern wrote: >> This is your main problem. Where did you get this gcc? I believe the >> one that comes with the Developer Tools is 4.0.1. > >yup: >$ gcc --version >powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build >5367) > >$ which gcc >/usr/bin/gcc > >Otherwise, I've got the same setup as the OP, and numpy built fine the >las time I tried it. > >-Chris > > > >-- >Christopher Barker, Ph.D. 
>Oceanographer > >Emergency Response Division >NOAA/NOS/OR&R (206) 526-6959 voice >7600 Sand Point Way NE (206) 526-6329 fax >Seattle, WA 98115 (206) 526-6317 main reception > >Chris.Barker at noaa.gov >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at scipy.org >http://projects.scipy.org/mailman/listinfo/numpy-discussion From aitagi at gmail.com Mon Mar 31 17:17:41 2008 From: aitagi at gmail.com (Amit Itagi) Date: Mon, 31 Mar 2008 17:17:41 -0400 Subject: [Numpy-discussion] Numpy installation Message-ID: Hi, I am having problems with numpy installation. 1) These is an atlas 3.8.0 library installed somewhere in the search path. However, the installation gives errors with that installation. Is there a way to tell the installer to install the default (possibly slower) blas, instead of using the one in the path ? 2) Also, my main Python directory is called Python-2.5.2. When I try to configure with the install, it changes Python-2.5.2 to "python-2.5.2" and creates a new directory. How can I make the installer not convert the upper-case "P" to a lower-case ? Thanks Rgds, Amit -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon Mar 31 17:45:52 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 31 Mar 2008 23:45:52 +0200 Subject: [Numpy-discussion] Applying PIL patch In-Reply-To: <47F1115E.3060707@noaa.gov> References: <844061.62676.qm@web50908.mail.re2.yahoo.com> <47F1115E.3060707@noaa.gov> Message-ID: <9457e7c80803311445w325882c2g24d54fe5e7c4fcb8@mail.gmail.com> Unfortunately, RGBA images cannot be read this way. A patch that fixes the issue was posted here: http://www.mail-archive.com/image-sig at python.org/msg01482.html No response from the Image SIG guys. Regards St?fan On Mon, Mar 31, 2008 at 6:29 PM, Christopher Barker wrote: > izak marais wrote: > > Sorry for the beginner question. I want to apply the PIL-numpy patch > > from http://www.scipy.org/Cookbook/PIL?highlight=%28PIL%29 . I have the > > latest windows binaries of numpy, scipy and PIL installed. > > Then you have the patch already-- it was added to the latest PIL. > > http://effbot.org/zone/pil-changes-116.htm > > -Chris From dagss at student.matnat.uio.no Mon Mar 31 18:52:01 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 1 Apr 2008 00:52:01 +0200 (CEST) Subject: [Numpy-discussion] Project for Cython integration with NumPy In-Reply-To: <60363.80.59.7.37.1206478868.squirrel@webmail.uio.no> References: <60363.80.59.7.37.1206478868.squirrel@webmail.uio.no> Message-ID: <50425.193.157.243.12.1207003921.squirrel@webmail.uio.no> > I am going to apply for a Google Summer of Code project about "Developing > Cython towards better NumPy integration" (Cython: http://cython.org). > Anyone interested in how this is done can have a look at the links below, > any feedback is welcome. > > The application I am going to submit (to Python Foundation): > http://wiki.cython.org/DagSverreSeljebotn/soc I now have time to actively discuss and improve it so any feedback from the NumPy community is greatly appreciated. See especially: http://wiki.cython.org/enhancements/numpy (I have submitted the application to Google, but they have extended the application period by one week, and most of the NumPy specifics are in the Cython wiki anyway). 
Dag Sverre From nodrogbrown at gmail.com Mon Mar 31 19:06:59 2008 From: nodrogbrown at gmail.com (gordon) Date: Mon, 31 Mar 2008 16:06:59 -0700 (PDT) Subject: [Numpy-discussion] code using Numeric and LinearAlgebra Message-ID: i came across some code that uses calls like LinearAlgebra.eigenvectors(L) and Numeric.matrixmultiply(v, x) which gives compilation errors on my new numpy installation.Is it possible to get such code compiled while using new version of numpy? when evalues, evectors = LinearAlgebra.eigenvectors(L) is computed for a symmetric covariance matrix L ,how are eigenvectors arranged in evectors? are they in columns?can i simply replace the above call by linalg.eigh()? Similarly can i just replace Numeric.matrixmultiply(v, x) with numpy.dot() , or is there something i must watch for in the above two cases? From robert.kern at gmail.com Mon Mar 31 22:51:42 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 31 Mar 2008 21:51:42 -0500 Subject: [Numpy-discussion] code using Numeric and LinearAlgebra In-Reply-To: References: Message-ID: <3d375d730803311951s7630170ei87c80200edbc2d4a@mail.gmail.com> On Mon, Mar 31, 2008 at 6:06 PM, gordon wrote: > i came across some code that uses calls like > LinearAlgebra.eigenvectors(L) and Numeric.matrixmultiply(v, x) which > gives compilation errors on my new numpy installation.Is it possible > to get such code compiled while using new version of numpy? > > when evalues, evectors = LinearAlgebra.eigenvectors(L) is computed for > a symmetric covariance matrix L ,how are eigenvectors arranged in > evectors? are they in columns?can i simply replace the above call by > linalg.eigh()? Columns, yes, and yes. > Similarly can i just replace Numeric.matrixmultiply(v, x) with > numpy.dot() , Yes. matrixmultiply() was a deprecated alias even in Numeric. > or is there something i must watch for in the above two cases? Not particularly, no. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Mon Mar 31 23:06:37 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 31 Mar 2008 22:06:37 -0500 Subject: [Numpy-discussion] Numpy installation In-Reply-To: References: Message-ID: <3d375d730803312006q409e9168q1d607ac339c45d30@mail.gmail.com> On Mon, Mar 31, 2008 at 4:17 PM, Amit Itagi wrote: > Hi, > > I am having problems with numpy installation. > > 1) These is an atlas 3.8.0 library installed somewhere in the search path. > However, the installation gives errors with that installation. Is there a > way to tell the installer to install the default (possibly slower) blas, > instead of using the one in the path ? Create a site.cfg file with the appropriate section; copy and modify the site.cfg.example file. > 2) Also, my main Python directory is called Python-2.5.2. When I try to > configure with the install, it changes Python-2.5.2 to > "python-2.5.2" and creates a new directory. How can I make the installer not > convert the upper-case "P" to a lower-case ? Can you give more information like the platform you are on, the full path to this directory, the exact commands that you executed, and the results of these commands? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From robert.kern at gmail.com Mon Mar 31 23:18:43 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 31 Mar 2008 22:18:43 -0500 Subject: [Numpy-discussion] Trouble using f2py on successful numpy build from SVN (1.0.5 dev4951) In-Reply-To: <9911419a0803310711q2286758qad331dabe56b6b56@mail.gmail.com> References: <9911419a0803302245m646df22emd81df0b90f790c7b@mail.gmail.com> <3d375d730803310210q2631f98en869561d8fb803757@mail.gmail.com> <9911419a0803310711q2286758qad331dabe56b6b56@mail.gmail.com> Message-ID: <3d375d730803312018hd4794cfvd3716874ac47404e@mail.gmail.com> On Mon, Mar 31, 2008 at 9:11 AM, Joshua Lippai wrote: > Ah, good call. I did previously have an easy_install of numpy but I > assumed the setup.py install would just overwrite anything from that > install outside the site-packages folder. Thanks. Is there anything > else I might run into as a side effect of my sloppiness? The f2py script should be the only thing outside of site-packages. You may want to edit site-packages/easy-install.pth to remove the old reference to the numpy egg. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From discerptor at gmail.com Mon Mar 31 23:51:32 2008 From: discerptor at gmail.com (Joshua Lippai) Date: Mon, 31 Mar 2008 20:51:32 -0700 Subject: [Numpy-discussion] Trouble using f2py on successful numpy build from SVN (1.0.5 dev4951) In-Reply-To: <3d375d730803312018hd4794cfvd3716874ac47404e@mail.gmail.com> References: <9911419a0803302245m646df22emd81df0b90f790c7b@mail.gmail.com> <3d375d730803310210q2631f98en869561d8fb803757@mail.gmail.com> <9911419a0803310711q2286758qad331dabe56b6b56@mail.gmail.com> <3d375d730803312018hd4794cfvd3716874ac47404e@mail.gmail.com> Message-ID: <9911419a0803312051y4b8f7b47we7c150fd2b0d376e@mail.gmail.com> I eliminated everything easy-install related already since I actually was aiming to reinstall everything without it (though, alas, dateutil seems to require it now). On Mon, Mar 31, 2008 at 8:18 PM, Robert Kern wrote: > On Mon, Mar 31, 2008 at 9:11 AM, Joshua Lippai wrote: > > Ah, good call. I did previously have an easy_install of numpy but I > > assumed the setup.py install would just overwrite anything from that > > install outside the site-packages folder. Thanks. Is there anything > > else I might run into as a side effect of my sloppiness? > > The f2py script should be the only thing outside of site-packages. You > may want to edit site-packages/easy-install.pth to remove the old > reference to the numpy egg. > > -- > > > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion >
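A closing note on the eigenvector-layout questions that came up above (the linalg.eigh() thread and the Numeric/LinearAlgebra port): a short sketch of the column convention and of sorting by eigenvalue. The 6x6 covariance matrix below is made-up illustration data, and eigh() itself makes no promise about the order of the eigenvalues it returns, so the sort is done explicitly:

import numpy

# Illustrative data: 6 variables, 20 observations -> a 6x6 covariance matrix.
data = numpy.random.rand(6, 20)
C = numpy.cov(data)

evals, evecs = numpy.linalg.eigh(C)

# The eigenvector belonging to evals[i] is the i-th *column*, evecs[:, i]:
for i in range(len(evals)):
    print numpy.allclose(numpy.dot(C, evecs[:, i]), evals[i] * evecs[:, i])   # True

# Sort by eigenvalue; the most significant (largest-eigenvalue) eigenvector
# is then the last column of the reordered matrix:
order = evals.argsort()
evals_sorted = evals[order]
evecs_sorted = evecs[:, order]
print evals_sorted[-1], evecs_sorted[:, -1]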