From philbinj at gmail.com  Fri Feb 1 03:17:43 2008
From: philbinj at gmail.com (James Philbin)
Date: Fri, 1 Feb 2008 08:17:43 +0000
Subject: [Numpy-discussion] searchsorted bug
In-Reply-To: <47A226EE.2050705@enthought.com>
References: <2b1c8c4f0801310635y58700fe3n399d22311bc0441c@mail.gmail.com>
	<2b1c8c4f0801310710lb6dd039r9101d89f6b76878f@mail.gmail.com>
	<80c99e790801310817n69680f86u1ec371f4f89ab8c5@mail.gmail.com>
	<2b1c8c4f0801310933u48d92103q6924fa977078ecd6@mail.gmail.com>
	<2b1c8c4f0801310955wf3af941g6c88354c813b503e@mail.gmail.com>
	<2b1c8c4f0801311041t25c2f09ch398e5a6f62af33f6@mail.gmail.com>
	<47A226EE.2050705@enthought.com>
Message-ID: <2b1c8c4f0802010017v3d8c80a3r17a94de2b993d1c5@mail.gmail.com>

> Try out latest SVN. It should have this problem fixed.

Thanks for this. I've realized that for my case, using object arrays is
probably best. I still think that, in the long term, it would be good
to allow comparison functions to take different types, so that one
could compare, say, integer arrays with floating-point arrays without
doing an upcast.

From lfriedri at imtek.de  Fri Feb 1 04:57:37 2008
From: lfriedri at imtek.de (Lars Friedrich)
Date: Fri, 01 Feb 2008 10:57:37 +0100
Subject: [Numpy-discussion] histogramdd memory needs
Message-ID: <47A2ED11.8080405@imtek.de>

Hello,

I use numpy.histogramdd to compute three-dimensional histograms with a
total number of bins on the order of 1e7. It is clear to me that such a
histogram will take a lot of memory: for a dtype=N.float64, it will
take roughly 80 megabytes. However, I have the feeling that during the
histogram calculation much more memory is needed. For example, when I
have data.shape = (8e6, 3) and do a numpy.histogramdd(data, 280), I
expect a histogram size of (280**3)*8 = 176 megabytes, but during the
histogram calculation the memory use of pythonw.exe in the Windows Task
Manager increases up to 687 megabytes over the level before the
calculation. When the calculation is done, the memory usage drops back
to the expected value.

I assume this is due to the way numpy.histogramdd works internally.
However, when I need to calculate even bigger histograms, I cannot do
it this way. So I have the following questions:

1) How can I tell histogramdd to use another dtype than float64? My
bins will be very sparsely populated, so an int16 should be sufficient.
Without normalization, an integer dtype makes more sense to me.

2) Is there a way to use another algorithm (at the cost of performance)
that uses less memory during calculation, so that I can generate bigger
histograms?

My numpy version is '1.0.4.dev3937'

Thanks,
Lars

-- 
Dipl.-Ing. Lars Friedrich

Photonic Measurement Technology
Department of Microsystems Engineering -- IMTEK
University of Freiburg
Georges-Köhler-Allee 102
D-79110 Freiburg
Germany

phone: +49-761-203-7531
fax:   +49-761-203-7537
room:  01 088
email: lars.friedrich at imtek.de

From andrea.gavana at gmail.com  Fri Feb 1 06:28:45 2008
From: andrea.gavana at gmail.com (Andrea Gavana)
Date: Fri, 1 Feb 2008 11:28:45 +0000
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
Message-ID:

Hi All,

    I sent a couple of messages to the f2py mailing list, but it seems
like my problem has no simple solution, so I thought I would ask for
some suggestions here.

Basically, I read some huge unformatted binary files which contain
time-step data from a reservoir simulation. I don't know the dimensions
(i.e., lengths) of the vectors I am going to read, and I find out this
information only when I start reading the file.
So, I thought it would be nice to do something like:

1) Declare outputVector as allocatable;
2) Start reading the file;
3) Find the outputVector dimension and allocate it;
4) Read the data into the outputVector;
5) Return this outputVector.

It works when I compile and build it in Fortran as an executable
(defining a "main" program in my f90 module), but it bombs when I try
to use it from Python with the error:

C:\Documents and Settings\gavana\Desktop\ECLIPSEReader>prova.py
Traceback (most recent call last):
  File "C:\Documents and Settings\gavana\Desktop\ECLIPSEReader\prova.py",
line 3, in <module>
    inteHead, propertyNames, propertyTypes, propertyNumbers =
ECLIPSEReader.init.readinspec("OPT_INJ.INSPEC")
ValueError: failed to create intent(cache|hide)|optional array-- must
have defined dimensions but got (-1,)

So, I have tried a suggestion given on the f2py mailing list, and I
found out that this routine works:

MODULE DUMMY
IMPLICIT NONE

! Ok, so I want an allocatable array as output

real(8), allocatable :: realOutput(:)

CONTAINS

subroutine AllocateDummy(dummyInput)

implicit none
save

! dummyInput is *not* used, it's here just as
! an example
integer, intent(in) :: dummyInput

! Allocate and build the output array
allocate(realOutput(10))

realOutput(1:10) = 0.0
realOutput(3) = 3.0
realOutput(7) = 7.0

deallocate(realOutput)

return

end subroutine AllocateDummy

END MODULE DUMMY

But this one doesn't work:

MODULE DUMMY
IMPLICIT NONE

! Ok, so I want an allocatable array as output

real(8), allocatable :: realOutput(:)
integer, allocatable :: inteOutput(:)

CONTAINS

subroutine AllocateDummy(dummyInput)

implicit none
save

! dummyInput is *not* used, it's here just as
! an example
integer, intent(in) :: dummyInput

! Allocate and build the output arrays
allocate(realOutput(10))
allocate(inteOutput(20))

realOutput(1:10) = 0.0
realOutput(3) = 3.0
realOutput(7) = 7.0

inteOutput(10) = 2

deallocate(realOutput)
deallocate(inteOutput)

return

end subroutine AllocateDummy

END MODULE DUMMY

The difference between the 2 scripts is just that in the second one I
want 2 allocatable arrays instead of 1. When I compile it with f2py, I
get this warning from getarrdims:

Building modules...
        Building module "dummy"...
                Constructing F90 module support for "dummy"...
                        Variables: realoutput inteoutput
getarrdims:warning: assumed shape array, using 0 instead of ':'
getarrdims:warning: assumed shape array, using 0 instead of ':'
                Constructing wrapper function "dummy.allocatedummy"...
                  allocatedummy(dummyinput)

Which is not present if I compile script number 1. Actually, if I run
script 2, I can't access the 2 variables realoutput and inteoutput
anymore (they are not there), while with script 1 I can easily access
realoutput by writing dummy.dummy.realoutput.
I can't actually see any big difference between the 2 scripts... am I
missing something?

This is Windows XP, Python 2.5, numpy 1.0.3.1, Compaq Visual Fortran
6.6, MS Visual Studio .NET 2003.

Thank you for all your suggestions, I am at a loss.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

From dalcinl at gmail.com  Fri Feb 1 08:59:12 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 1 Feb 2008 10:59:12 -0300
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To:
References:
Message-ID:

Sorry if I'm making noise, my knowledge of Fortran is really little,
but in your routine AllocateDummy you are first allocating and next
deallocating the arrays.
Are you sure you can then access the contents of your arrays after
deallocating them?

How complicated is your binary format? For simple formats, you can just
use numpy to read the binary data directly. I do this sometimes, but
again, only for simple formats.

On 2/1/08, Andrea Gavana wrote:
> Hi All,
>
> [... full original message quoted ...]
>
> getarrdims:warning: assumed shape array, using 0 instead of ':'
> getarrdims:warning: assumed shape array, using 0 instead of ':'
>
> Which is not present if I compile script number 1.
> Actually, if I run script 2, I can't access the 2 variables
> realoutput and inteoutput anymore (they are not there), while with
> script 1 I can easily access realoutput by writing
> dummy.dummy.realoutput.
> I can't actually see any big difference between the 2 scripts... am I
> missing something?
>
> This is Windows XP, Python 2.5, numpy 1.0.3.1, Compaq Visual Fortran
> 6.6, MS Visual Studio .NET 2003.
>
> Thank you for all your suggestions, I am at a loss.
>
> Andrea.

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From andrea.gavana at gmail.com  Fri Feb 1 09:18:53 2008
From: andrea.gavana at gmail.com (Andrea Gavana)
Date: Fri, 1 Feb 2008 14:18:53 +0000
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To:
References:
Message-ID:

Hi Lisandro,

On Feb 1, 2008 1:59 PM, Lisandro Dalcin wrote:
> Sorry if I'm making noise, my knowledge of Fortran is really little,
> but in your routine AllocateDummy you are first allocating and next
> deallocating the arrays. Are you sure you can then access the contents
> of your arrays after deallocating them?

Thank you for your answer.

Unfortunately it seems that it doesn't matter whether I deallocate them
or not, I still get the compilation warning and I can't access those
variables in any case. It seems like f2py (or Python, or whatever) does
not like having more than 1 allocatable array inside a MODULE
declaration.

> How complicated is your binary format?

*Very* complex. The fact is, I already know how to read those files in
Fortran; it is the linking with Python via f2py that is driving me mad.
I can't believe no one has used allocatable arrays as outputs before
(whether from a subroutine or from a module).

> On 2/1/08, Andrea Gavana wrote:
> > Hi All,
> >
> > [... full original message quoted ...]
-- 
Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

From pearu at cens.ioc.ee  Fri Feb 1 09:45:09 2008
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Fri, 1 Feb 2008 16:45:09 +0200 (EET)
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To:
References:
Message-ID: <59257.85.166.27.136.1201877109.squirrel@cens.ioc.ee>

On Fri, February 1, 2008 1:28 pm, Andrea Gavana wrote:
> Hi All,
>
> I sent a couple of messages to the f2py mailing list, but it seems
> like my problem has no simple solution, so I thought I would ask for
> some suggestions here.

Sorry, I haven't been around there for a long time.

> Basically, I read some huge unformatted binary files which contain
> time-step data from a reservoir simulation. I don't know the
> dimensions (i.e., lengths) of the vectors I am going to read, and I
> find out this information only when I start reading the file. So, I
> thought it would be nice to do something like:
>
> 1) Declare outputVector as allocatable;
> 2) Start reading the file;
> 3) Find the outputVector dimension and allocate it;
> 4) Read the data into the outputVector;

looks ok.

> 5) Return this outputVector.

What do you mean by "return"? You cannot return allocatable arrays, as
far as using f2py to generate wrappers is concerned. However, you can
access the allocatable array outputVector if it is module data, as you
do below.

> It works when I compile and build it in Fortran as an executable
> (defining a "main" program in my f90 module), but it bombs when I try
> to use it from Python with the error:
>
> ValueError: failed to create intent(cache|hide)|optional array-- must
> have defined dimensions but got (-1,)

This exception is not directly related to what follows below.

> So, I have tried a suggestion given on the f2py mailing list, and I
> found out that this routine works:
>
> MODULE DUMMY
> IMPLICIT NONE
>
> ! Ok, so I want an allocatable array as output
>
> real(8), allocatable :: realOutput(:)
...
> END MODULE DUMMY
>
> But this one doesn't work:
>
> MODULE DUMMY
> IMPLICIT NONE
>
> ! Ok, so I want an allocatable array as output
>
> real(8), allocatable :: realOutput(:)
> integer, allocatable :: inteOutput(:)
>
> CONTAINS
...
> END MODULE DUMMY

This one works fine here:

$ f2py -c -m m2 m2.f90 --fcompiler=gnu95

>>> import m2
>>> print m2.dummy.__doc__
realoutput - 'd'-array(-1), not allocated
inteoutput - 'i'-array(-1), not allocated
allocatedummy - Function signature:
  allocatedummy(dummyinput)
Required arguments:
  dummyinput : input int

> The difference between the 2 scripts is just that in the second one I
> want 2 allocatable arrays instead of 1. When I compile it with f2py, I
> get this warning from getarrdims:
>
> Building modules...
>         Building module "dummy"...
>                 Constructing F90 module support for "dummy"...
>                         Variables: realoutput inteoutput
                          ^^^^^^^^^^^^^^^^^^^^^
Looks like both allocatable arrays should be present also in your case.

> getarrdims:warning: assumed shape array, using 0 instead of ':'
> getarrdims:warning: assumed shape array, using 0 instead of ':'

These warnings can be ignored.

> Which is not present if I compile script number 1. Actually, if I run
> script 2, I can't access the 2 variables realoutput and inteoutput
> anymore (they are not there), while with script 1 I can easily access
> realoutput by writing dummy.dummy.realoutput.

What do you mean by accessing? What happens when you type
dummy.dummy.inteoutput?

Note that when you call the AllocateDummy function, you allocate and
then deallocate the arrays. So, in Python, dummy.dummy.realoutput and
dummy.dummy.inteoutput should always return None.

See
http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#allocatable-arrays
for how to use allocatable module data from Python.

> I can't actually see any big difference between the 2 scripts... am I
> missing something?
>
> This is Windows XP, Python 2.5, numpy 1.0.3.1, Compaq Visual Fortran
> 6.6, MS Visual Studio .NET 2003.

I am using numpy from svn.

Regards,
Pearu

From pearu at cens.ioc.ee  Fri Feb 1 09:49:06 2008
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Fri, 1 Feb 2008 16:49:06 +0200 (EET)
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To:
References:
Message-ID: <60311.85.166.27.136.1201877346.squirrel@cens.ioc.ee>

On Fri, February 1, 2008 4:18 pm, Andrea Gavana wrote:
> Hi Lisandro,
>
> On Feb 1, 2008 1:59 PM, Lisandro Dalcin wrote:
>> Sorry if I'm making noise, my knowledge of Fortran is really little,
>> but in your routine AllocateDummy you are first allocating and next
>> deallocating the arrays. Are you sure you can then access the
>> contents of your arrays after deallocating them?
>
> Thank you for your answer.
>
> Unfortunately it seems that it doesn't matter whether I deallocate
> them or not, I still get the compilation warning and I can't access
> those variables in any case.

You cannot access those because they are deallocated. Try disabling the
deallocate statements in your fortran code.

> It seems like f2py (or Python, or whatever) does not like having more
> than 1 allocatable array inside a MODULE declaration.

This is not true.

>> How complicated is your binary format?
>
> *Very* complex. The fact is, I already know how to read those files in
> Fortran; it is the linking with Python via f2py that is driving me
> mad. I can't believe no one has used allocatable arrays as outputs
> before (whether from a subroutine or from a module).

You can use allocatable arrays from a module in Python as described in
the f2py users guide. It could be that the problem is related to
deallocating the arrays in the fortran code.
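
For illustration, a minimal sketch of that pattern -- assuming the
second module above has been wrapped as "dummy" and the deallocate
statements have been removed (the argument value is arbitrary, since
dummyInput is unused):

import numpy
import dummy

# the call makes the Fortran side allocate and (partially) fill the
# module arrays; with the deallocate statements removed they stay alive
dummy.allocatedummy(0)
print dummy.dummy.realoutput    # length-10 float64 array
print dummy.dummy.inteoutput    # length-20 integer array

# module-level allocatables can also be managed from Python:
# assigning an array allocates, assigning None deallocates
dummy.dummy.realoutput = numpy.zeros(5, numpy.float64)
dummy.dummy.realoutput = None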
Regards,
Pearu

From david.huard at gmail.com  Fri Feb 1 10:08:19 2008
From: david.huard at gmail.com (David Huard)
Date: Fri, 1 Feb 2008 10:08:19 -0500
Subject: [Numpy-discussion] histogramdd memory needs
In-Reply-To: <47A2ED11.8080405@imtek.de>
References: <47A2ED11.8080405@imtek.de>
Message-ID: <91cf711d0802010708v47f337b5vd9dacf77b5da4b27@mail.gmail.com>

Hi Lars,

[...]

2008/2/1, Lars Friedrich:
>
> 1) How can I tell histogramdd to use another dtype than float64? My
> bins will be very sparsely populated, so an int16 should be
> sufficient. Without normalization, an integer dtype makes more sense
> to me.

There is no way you'll be able to ask for that without tweaking the
histogramdd function yourself. The relevant bit of code is the
instantiation of hist:

hist = zeros(nbin.prod(), float)

> 2) Is there a way to use another algorithm (at the cost of
> performance) that uses less memory during calculation, so that I can
> generate bigger histograms?

You could work through your array block by block. Simply fix the range
and generate a histogram for each slice of 100k data points, and sum
them up at the end. The current histogram and histogramdd
implementation has the advantage of being general -- that is, you can
work with uniform or non-uniform bins -- but it is not particularly
efficient, at least for a large number of bins (>30).

Cheers,

David

> [...]

From faltet at carabos.com  Fri Feb 1 13:14:47 2008
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 1 Feb 2008 19:14:47 +0100
Subject: [Numpy-discussion] Can not update a submatrix
In-Reply-To: <200801311323.36744.faltet@carabos.com>
References: <1201700540.26483.16.camel@nadav.envision.co.il>
	<200801311323.36744.faltet@carabos.com>
Message-ID: <200802011914.47563.faltet@carabos.com>

On Thursday 31 January 2008, Francesc Altet wrote:
> On Wednesday 30 January 2008, Timothy Hochberg wrote:
> > [...a fine explanation by Anne and Timothy...]
>
> Ok. As it seems that this subject has interest enough, I went ahead
> and created a small document about views vs copies at:
>
> http://www.scipy.org/Cookbook/ViewsVsCopies

Ooops, I think I've missed the NumPy tutorial:

http://www.scipy.org/Tentative_NumPy_Tutorial

which already talked about copies vs views :-/. Well, I think my small
document can complement some parts of the tutorial. I'll do that as
soon as I can and remove the recipe from the cookbook.

Sorry for the noise.

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.
        Enjoy Data
 "-"

From robert.kern at gmail.com  Fri Feb 1 13:39:21 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 01 Feb 2008 12:39:21 -0600
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To: <59257.85.166.27.136.1201877109.squirrel@cens.ioc.ee>
References: <59257.85.166.27.136.1201877109.squirrel@cens.ioc.ee>
Message-ID: <47A36759.70502@gmail.com>

Pearu Peterson wrote:
> On Fri, February 1, 2008 1:28 pm, Andrea Gavana wrote:
>> Hi All,
>>
>> I sent a couple of messages to the f2py mailing list, but it seems
>> like my problem has no simple solution, so I thought I would ask for
>> some suggestions here.
>
> Sorry, I haven't been around there for a long time.

Are you going to continue not reading the f2py list? If so, you should
point everyone there to this list and close the list.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From pearu at cens.ioc.ee  Fri Feb 1 14:08:34 2008
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Fri, 1 Feb 2008 21:08:34 +0200 (EET)
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To: <47A36759.70502@gmail.com>
References: <59257.85.166.27.136.1201877109.squirrel@cens.ioc.ee>
	<47A36759.70502@gmail.com>
Message-ID: <56766.85.166.27.136.1201892914.squirrel@cens.ioc.ee>

On Fri, February 1, 2008 8:39 pm, Robert Kern wrote:
> Pearu Peterson wrote:
>> [...]
>>
>> Sorry, I haven't been around there for a long time.
>
> Are you going to continue not reading the f2py list? If so, you should
> point everyone there to this list and close the list.

It is a bit embarrassing: I haven't read the list because I lost the
link to the f2py list archives that I used to use in my mail box on the
server, and I have also been busy with non-Python stuff these last
years (I moved and got married :).

Anyway, I have subscribed to the f2py list again and I'll try to
respond to any messages that have unresolved issues, also in the
archives.

Thanks,
Pearu

From robert.kern at gmail.com  Fri Feb 1 14:14:09 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 01 Feb 2008 13:14:09 -0600
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To: <56766.85.166.27.136.1201892914.squirrel@cens.ioc.ee>
References: <59257.85.166.27.136.1201877109.squirrel@cens.ioc.ee>
	<47A36759.70502@gmail.com>
	<56766.85.166.27.136.1201892914.squirrel@cens.ioc.ee>
Message-ID: <47A36F81.6090000@gmail.com>

Pearu Peterson wrote:
> On Fri, February 1, 2008 8:39 pm, Robert Kern wrote:
>> [...]
>> Are you going to continue not reading the f2py list? If so, you
>> should point everyone there to this list and close the list.
>
> It is a bit embarrassing: I haven't read the list because I lost the
> link to the f2py list archives that I used to use in my mail box on
> the server, and I have also been busy with non-Python stuff these
> last years (I moved and got married :).
> Anyway, I have subscribed to the f2py list again and I'll try to
> respond to any messages that have unresolved issues, also in the
> archives.

Great. Thank you.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From pgmdevlist at gmail.com  Fri Feb 1 18:43:26 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 1 Feb 2008 18:43:26 -0500
Subject: [Numpy-discussion] Call to mrecords users
Message-ID: <200802011843.26566.pgmdevlist@gmail.com>

All,

I just committed some updates to mrecords (numpy.maskedarray branch).
Overall, it's a reorganization/simplification of the code.

Regular masked arrays can already recognize named fields, but the mask
works only at the record level (i.e., all the fields of one record are
masked). In comparison, masked record arrays (mrecarrays) permit the
masking of individual fields. The conversion between a masked array of
records and a mrecarray is as easy as for the classical ndarray: just
use a .view(mrecarray).

I'd be grateful if the current users of mrecords could give the new
version a try, and let me know whether everything works seamlessly or
if new modifications need to be implemented.

Thanks a lot in advance.

PS: that goes as well for users of trecords (in scikits.timeseries):
the package has been updated to match the recent modifications to
mrecords.

From david at ar.media.kyoto-u.ac.jp  Sat Feb 2 07:07:01 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Sat, 02 Feb 2008 21:07:01 +0900
Subject: [Numpy-discussion] [ANN] Blas-Lapack superpack: click only
	blas/lapack installation (first alpha)
Message-ID: <47A45CE5.2000305@ar.media.kyoto-u.ac.jp>

Hi,

I started working on an easy installer for blas/lapack. The idea is
that you would use this installer so that building numpy and scipy from
source is easy on windows (32 bits for now). It would give you
blas/lapack compiled correctly, with an optional atlas-optimized
version.

http://www.ar.media.kyoto-u.ac.jp/members/david/archives/blas_lapack_superpack.exe

How to use
==========

Run the setup.exe and click yes all the way. Then add the installed
DLLs to your path, or add the directory where the DLLs are installed to
your PATH.

main features:
==============

- Click-only, easy installation of blas and lapack libraries (including
an optional atlas-optimized version, if supported -- see below).
- Installs atlas *only if your cpu supports it*: that's the main
feature, actually. The installer detects your cpu and installs ATLAS
only if an ATLAS matching your CPU is found (only SSE3 is supported for
now, but other architectures, including 3dnow and co, can easily be
added, depending on people's help to provide the built ATLAS).

What can you do with it:
========================

- Compile numpy and scipy without the wrong-SSE problem, and without
bothering about compiling netlib BLAS/LAPACK, etc.
- Compile numpy without any fortran compiler, VS only (no need for
mingw, etc., thanks to VS import libraries + DLLs).
- Use the installed lapack to build an optimized ATLAS for your
architecture (using both GNU compilers and proprietary compilers should
be possible).

More details
============

- Built with mingw g77 from linux (dll, unix-style static archives and
def files).
- Import libraries built with VS 2003. This means you can compile numpy
without any fortran compiler; in particular, there is no need for
mingw. I don't know if this is compatible with other versions of VS,
though.
- Only SSE3 and above will get ATLAS for now.
This is because compiling ATLAS on windows is a PITA, and I don't want
to spend time on this, so if you want something else, you will have to
provide me with the built atlas binary first. But having atlas for sse,
sse2, 3dnow, etc. is entirely possible.
- I do not register the DLLs yet, because I am not sure how to do it in
a safe way (thanks, MS, for a totally broken handling of shared
libraries, BTW).
- I do not guarantee that the built atlas is optimal. ATLAS performance
depends on many parameters, not just sse/sse2/sse3 (the sizes of the
L1/L2/L3 caches are significant, for example), and again, I cannot
build many different libraries.
- The installer is built using nsis.
- The whole process of making the installer is not 100 % automatic yet,
but I intend to make it so, and to put the necessary scripts somewhere
so people can improve them if they want.

This is alpha software, and because it is an installer, it can screw up
your computer if I did something wrong. However, I barely touch the
registry and only install files in one directory, so the chances are
pretty minimal (that's also why you have to put the dlls in a path
where they will be found yourself: at some point this will be done by
the installer, but it is by far the most dangerous step, so I prefer
avoiding it for now).

cheers,

David

From eads at soe.ucsc.edu  Sun Feb 3 14:25:56 2008
From: eads at soe.ucsc.edu (Damian Eads)
Date: Sun, 03 Feb 2008 12:25:56 -0700
Subject: [Numpy-discussion] Unexpected behavior with numpy array
Message-ID: <47A61544.1060401@soe.ucsc.edu>

Good day,

Reversing a 1-dimensional array in numpy is simple,

A = A[:,:,-1] .

However A is a new array referring to the old one and is no longer
contiguous.

While trying to reverse an array in place and keep it contiguous, I
encountered some weird behavior. The reason for keeping it contiguous
is that the array must be passed to an old C function I have, which
expects the buffer to be in row-major order and contiguous. I am using
lots of memory, so I want to minimize copying and allocation of new
arrays.

>>> A=numpy.arange(0,10)
>>> A
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> A[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> A[:] = A[::-1]
>>> A
array([9, 8, 7, 6, 5, 5, 6, 7, 8, 9])
>>>

Is there any way to perform assignments of the following form

X[sliceI] = expression involving X with dimensions that are
            compatible with the left hand side

without causing the odd behavior I mentioned? If not, it might be
helpful to throw an exception when both the LHS and the RHS of an
assignment reference an array slice of the same variable.

On a similar note, does the assignment

A = A * B

create a new array with a new buffer to hold the result of A * B, and
assign A to refer to the new array?

Thanks very much!

Damian

From eads at soe.ucsc.edu  Sun Feb 3 14:27:32 2008
From: eads at soe.ucsc.edu (Damian Eads)
Date: Sun, 03 Feb 2008 12:27:32 -0700
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A61544.1060401@soe.ucsc.edu>
References: <47A61544.1060401@soe.ucsc.edu>
Message-ID: <47A615A4.9020906@soe.ucsc.edu>

Damian Eads wrote:
> Good day,
>
> Reversing a 1-dimensional array in numpy is simple,
>
> A = A[:,:,-1] .

Err, I meant A=A[::-1] here. My apologies.
From gael.varoquaux at normalesup.org  Sun Feb 3 14:29:16 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sun, 3 Feb 2008 20:29:16 +0100
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A61544.1060401@soe.ucsc.edu>
References: <47A61544.1060401@soe.ucsc.edu>
Message-ID: <20080203192916.GO27970@phare.normalesup.org>

On Sun, Feb 03, 2008 at 12:25:56PM -0700, Damian Eads wrote:
> On a similar note, does the assignment
> A = A * B
> create a new array with a new buffer to hold the result of A * B, and
> assign A to refer to the new array?

Yes. Without a JIT, Python cannot know that A is present both on the
RHS and on the LHS of the equation. If you want to modify A in place,
you can do

A *= B

HTH,

Gaël

From peridot.faceted at gmail.com  Sun Feb 3 17:15:58 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Sun, 3 Feb 2008 17:15:58 -0500
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A61544.1060401@soe.ucsc.edu>
References: <47A61544.1060401@soe.ucsc.edu>
Message-ID:

On 03/02/2008, Damian Eads wrote:
> Good day,
>
> Reversing a 1-dimensional array in numpy is simple,
>
> A = A[:,:,-1] .
>
> However A is a new array referring to the old one and is no longer
> contiguous.
>
> While trying to reverse an array in place and keep it contiguous, I
> encountered some weird behavior. The reason for keeping it contiguous
> is that the array must be passed to an old C function I have, which
> expects the buffer to be in row-major order and contiguous. I am
> using lots of memory, so I want to minimize copying and allocation of
> new arrays.

The short answer is that reversing an array in place requires a certain
amount of care, in C -- you have to explicitly walk through the array,
swapping element i and element n-i, using a temporary variable. Getting
numpy to exchange the elements in place is going to be a real pain. I
suggest trying a copying method first, and only getting fancier if it's
too slow.

> [...]
>
> Is there any way to perform assignments of the following form
>
> X[sliceI] = expression involving X with dimensions that are
>             compatible with the left hand side
>
> without causing the odd behavior I mentioned? If not, it might be
> helpful to throw an exception when both the LHS and the RHS of an
> assignment reference an array slice of the same variable.

This is, odd as it seems, intended behaviour. Sometimes it's really
useful to use the same array on the LHS and RHS: for example

A[:-1] = A[1:]

You just need to know how the operation is done under the hood (the
arrays are iterated over in index order). (Actually, I'm not totally
sure this is specified -- under some circumstances numpy may iterate
over dimensions in an order based on stride and/or size; whether this
can affect the result of an operation like the above I'll have to think
about.) In any case, much code depends on this.

Tricks for getting the array backwards without copying... hmm. Well,
you might be able to fill in the array in an unconventional order:

A = N.arange(n-1,-1,-1)
A = A[::-1]
N.sin(A,A) # or whatever

Now the reversal of A is a perfectly normal C array (though numpy may
not realize this; don't trust the flags, check the strides and sizes).
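
Another option -- a rough sketch, assuming a one-dimensional array A of
length n -- is to swap the two mirrored halves, which reverses in place
with a temporary of only n/2 elements:

n = len(A)
half = n // 2
tmp = A[:half].copy()          # the only extra storage: n/2 elements
A[:half] = A[n-half:][::-1]    # front half <- reversed back half
A[n-half:] = tmp[::-1]         # back half  <- reversed old front half
# for odd n the middle element stays where it is, as it should

If a temporary of n/2 elements is still too large, the same swap can be
done mirrored block by mirrored block with a fixed-size temporary.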
Or you could just write a little C function inplace_reverse(A); if
you're already linking to C, this shouldn't add too much complexity to
your project.

Or you could do it explicitly in numpy:

for i in xrange(n/2):
    t = A[i]
    A[i] = A[n-1-i]
    A[n-1-i] = t

In the likely case that this is too slow, you can do the copying in
blocks, small enough that memory consumption is moderate but large
enough that the python overhead is not too much.

Finally, I realize that digging around in legacy code can be miserable,
but it is often not really very difficult to make a C function handle
strided data -- the whole principle of numpy is that compiled code
really just needs to know the start address, data type, and the spacing
and length of an array along each dimension.

Anne

From eads at soe.ucsc.edu  Sun Feb 3 20:27:13 2008
From: eads at soe.ucsc.edu (Damian Eads)
Date: Sun, 03 Feb 2008 18:27:13 -0700
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To:
References: <47A61544.1060401@soe.ucsc.edu>
Message-ID: <47A669F1.1070509@soe.ucsc.edu>

Thanks Anne for your very informative response.

Anne Archibald wrote:
> On 03/02/2008, Damian Eads wrote:
>> [...]
>
> The short answer is that reversing an array in place requires a
> certain amount of care, in C -- you have to explicitly walk through
> the array, swapping element i and element n-i, using a temporary
> variable. Getting numpy to exchange the elements in place is going to
> be a real pain. I suggest trying a copying method first, and only
> getting fancier if it's too slow.

I was using the copying method on my smaller data set. When I run the
same code on the larger data set, I get "Out of Memory" errors.

> This is, odd as it seems, intended behaviour. Sometimes it's really
> useful to use the same array on the LHS and RHS: for example
>
> A[:-1] = A[1:]

Agreed, this way of in-place shifting is useful, and we certainly would
not want to forbid it.

> You just need to know how the operation is done under the hood (the
> arrays are iterated over in index order). (Actually, I'm not totally
> sure this is specified -- under some circumstances numpy may iterate
> over dimensions in an order based on stride and/or size; whether this
> can affect the result of an operation like the above I'll have to
> think about.) In any case, much code depends on this.
>
> Tricks for getting the array backwards without copying... hmm.
> Well, you might be able to fill in the array in an unconventional
> order:
>
> A = N.arange(n-1,-1,-1)
> A = A[::-1]
> N.sin(A,A) # or whatever

Indeed, building an index array in descending order is one way to fill
an array in reverse order. The array I am dealing with is not generated
from arange but by another numerical routine, which is significantly
more complicated; arange was used in the example to give a simple
reproduction of the error.

> Now the reversal of A is a perfectly normal C array (though numpy may
> not realize this; don't trust the flags, check the strides and sizes).

Agreed -- reversals are easy to implement in C. To disclose more: I
need much more than reversals, so I tried to see how well numpy
supports in-place algorithms in general. If they were well supported, I
figured I could save myself the time of writing a whole bunch of other
C code, and write more succinct numpy code instead. The reversal code
was just a small experiment. I should have given the idea more thought:
many in-place algorithms can be very non-trivial to vectorize. Thus, it
is unreasonable to expect numpy array slicing to generally support the
safe implementation of vectorized, in-place algorithms. The fact that
weird behavior sometimes occurs is a bit concerning, but I guess there
are always dangers that come with the flexibility offered by Python.

> Or you could do it explicitly in numpy:
>
> for i in xrange(n/2):
>     t = A[i]
>     A[i] = A[n-1-i]
>     A[n-1-i] = t

I'm trying to avoid using Python 'for' loops; there is too much data.

> In the likely case that this is too slow, you can do the copying in
> blocks, small enough that memory consumption is moderate but large
> enough that the python overhead is not too much.

One of the reasons why I chose numpy/Python was that it offers a
succinct syntax. A block-based approach would work, but with less
succinctness and readability than the C function.

> Finally, I realize that digging around in legacy code can be
> miserable, but it is often not really very difficult to make a C
> function handle strided data -- the whole principle of numpy is that
> compiled code really just needs to know the start address, data type,
> and the spacing and length of an array along each dimension.

This concept of striding an array buffer passed from some higher-level
language is not new to numpy/Python, though. There are potentially
additional costs when more complicated, non-contiguous striding is
used, like page faults and extra arithmetic for computing complex
indexes. Adding one to an index or incrementing a buffer pointer is
generally more readable and probably requires less computation. I agree
that in general it's not difficult to use striding, but the larger the
legacy code base, the more changes might be needed, and the more likely
bugs or bottlenecks will be introduced. As the number of bugs
increases, they become more difficult to pin down. Thus, I'm generally
a bit cautious when considering such changes to legacy code.

Here's another question: is there any way to construct a numpy array
and specify the buffer address where it should store its values? I ask
because I would like to construct numpy arrays that work on buffers
that come from mmap.

Thanks again for your comments and help!
Damian Eads
---
University of California, Santa Cruz
http://www.soe.ucsc.edu/~eads

From robert.kern at gmail.com  Sun Feb 3 21:55:57 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Sun, 03 Feb 2008 20:55:57 -0600
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A669F1.1070509@soe.ucsc.edu>
References: <47A61544.1060401@soe.ucsc.edu>
	<47A669F1.1070509@soe.ucsc.edu>
Message-ID: <47A67EBD.4070608@gmail.com>

Damian Eads wrote:
> Here's another question: is there any way to construct a numpy array
> and specify the buffer address where it should store its values? I
> ask because I would like to construct numpy arrays that work on
> buffers that come from mmap.

Can you clarify that a little? By "buffer" do you mean a Python
buffer() object? By "mmap" do you mean Python's mmap in the standard
library?

numpy has a memmap class which subclasses ndarray to wrap a mmapped
file. It handles the opening and mmapping of the file itself, but it
could be subclassed to override this behavior to take an already
opened mmap object.

In general, if you have a buffer() object, you can make an array from
it using numpy.frombuffer(). This will be a standard ndarray and won't
have the conveniences of syncing to disk that the memmap class
provides.

If you don't have a buffer() object, but just have a pointer allocated
from some C code, then you *could* fake an object which exposes the
__array_interface__() method to describe the memory. The
numpy.asarray() constructor will use that to make an ndarray object
that uses the specified memory. This is advanced stuff and difficult
to get right because of memory ownership and object lifetime issues.
If you can modify the C code, it might be easier for you to have numpy
allocate the memory, then make the C code use that pointer to do its
operations.

But look at numpy.memmap first and see if it fits your needs.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From eads at soe.ucsc.edu  Sun Feb 3 23:43:34 2008
From: eads at soe.ucsc.edu (Damian Eads)
Date: Sun, 03 Feb 2008 21:43:34 -0700
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A67EBD.4070608@gmail.com>
References: <47A61544.1060401@soe.ucsc.edu>
	<47A669F1.1070509@soe.ucsc.edu>
	<47A67EBD.4070608@gmail.com>
Message-ID: <47A697F6.1010403@soe.ucsc.edu>

Robert Kern wrote:
> Damian Eads wrote:
>> Here's another question: is there any way to construct a numpy array
>> and specify the buffer address where it should store its values? I
>> ask because I would like to construct numpy arrays that work on
>> buffers that come from mmap.
>
> Can you clarify that a little? By "buffer" do you mean a Python
> buffer() object?

Yes, I mean the .data field of a numpy array, which is a buffer object
and points to the memory where an array's values are stored.

> By "mmap" do you mean Python's mmap in the standard library?

I actually was referring to the C Standard Library's mmap. My intention
was to use a pointer returned by C mmap as the ".data" buffer to store
array values.

> numpy has a memmap class which subclasses ndarray to wrap a mmapped
> file. It handles the opening and mmapping of the file itself, but it
> could be subclassed to override this behavior to take an already
> opened mmap object.

This may satisfy my needs. I'm going to look into it and get back to
you.
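
For reference, a minimal numpy.memmap sketch of what I have in mind --
the file name, dtype, and shape below are made-up placeholders, not my
real reservoir file layout:

import numpy

# map an existing binary file read-only; values are fetched from the
# mapped file on demand instead of being loaded up front
m = numpy.memmap('data.bin', dtype=numpy.float64, mode='r',
                 shape=(1000, 3))
print m[0]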
> In general, if you have a buffer() object, you can make an array from
> it using numpy.frombuffer(). This will be a standard ndarray and
> won't have the conveniences of syncing to disk that the memmap class
> provides.

This is good to know, because there have been a few situations when
this would have been very useful.

Suppose I do something like (in Python):

import ctypes
mylib = ctypes.CDLL('libmylib.so')
y = mylib.get_float_array_from_c_function()

which returns a float* as a Python int, and then I do

nelems = mylib.get_float_array_num_elems()
x = numpy.frombuffer(ctypes.c_buffer(y), 'float', nelems)

This gives me an ndarray x with its (.data) buffer pointing to the
memory address given by y. When the ndarray x is no longer referenced
(even as another array's base), does numpy attempt to free the memory
pointed to by y? In other words, does numpy always deallocate the
(.data) buffer in the __del__ method? Or does frombuffer set a flag
telling it not to?

> If you don't have a buffer() object, but just have a pointer
> allocated from some C code, then you *could* fake an object which
> exposes the __array_interface__() method to describe the memory. The
> numpy.asarray() constructor will use that to make an ndarray object
> that uses the specified memory. This is advanced stuff and difficult
> to get right because of memory ownership and object lifetime issues.

Allocating memory in C code would be very useful for me. If I were to
use such a numpy.asarray() function (it seems the frombuffer you
mentioned would also work, as described above), it makes sense for the
C code to be responsible for deallocating the memory, not numpy. I
understand that I would need to ensure that the deallocation happens
only when the containing ndarray is no longer referenced anywhere in
Python (hopefully, ndarray's finalization code does not need access to
the .data buffer).

> If you can modify the C code, it might be easier for you to have
> numpy allocate the memory, then make the C code use that pointer to
> do its operations.
>
> But look at numpy.memmap first and see if it fits your needs.

Will do! Thanks for the pointers!

Damian

From robert.kern at gmail.com  Mon Feb 4 00:47:28 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Sun, 03 Feb 2008 23:47:28 -0600
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A697F6.1010403@soe.ucsc.edu>
References: <47A61544.1060401@soe.ucsc.edu>
	<47A669F1.1070509@soe.ucsc.edu>
	<47A67EBD.4070608@gmail.com>
	<47A697F6.1010403@soe.ucsc.edu>
Message-ID: <47A6A6F0.4040405@gmail.com>

Damian Eads wrote:
> Robert Kern wrote:
>> Damian Eads wrote:
>>> Here's another question: is there any way to construct a numpy
>>> array and specify the buffer address where it should store its
>>> values? I ask because I would like to construct numpy arrays that
>>> work on buffers that come from mmap.
>>
>> Can you clarify that a little? By "buffer" do you mean a Python
>> buffer() object?
>
> Yes, I mean the .data field of a numpy array, which is a buffer
> object and points to the memory where an array's values are stored.

Actually, the .data field is always constructed by ndarray; it is never
provided *to* ndarray even if you construct the ndarray from a buffer
object. The buffer object's information is interpreted to construct the
ndarray object and then the original buffer object is ignored. The
.data attribute will be constructed "on-the-fly" when it is requested.
In [9]: from numpy import *

In [10]: s = 'aaaa'

In [11]: b = buffer(s)

In [12]: a = frombuffer(b, dtype=int32)

In [13]: a.data is b
Out[13]: False

In [14]: d1 = a.data

In [15]: d2 = a.data

In [16]: d1 is d2
Out[16]: False

>> By "mmap" do you mean Python's mmap in the standard library?
>
> I actually was referring to the C Standard Library's mmap. My
> intention was to use a pointer returned by C mmap as the ".data"
> buffer to store array values.
>
> [...]
>
> This gives me an ndarray x with its (.data) buffer pointing to the
> memory address given by y. When the ndarray x is no longer referenced
> (even as another array's base), does numpy attempt to free the memory
> pointed to by y? In other words, does numpy always deallocate the
> (.data) buffer in the __del__ method? Or does frombuffer set a flag
> telling it not to?

By default, frombuffer() creates an array that is flagged as not owning
the data. That means it will not delete the data memory when the
ndarray object is destroyed.

In [69]: import ctypes

In [70]: ca = (ctypes.c_int*8)()

In [71]: a = frombuffer(ca, int)

In [72]: a
Out[72]: array([0, 0, 0, 0, 0, 0, 0, 0])

In [73]: a.flags
Out[73]:
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

>> If you don't have a buffer() object, but just have a pointer
>> allocated from some C code, then you *could* fake an object which
>> exposes the __array_interface__() method to describe the memory. The
>> numpy.asarray() constructor will use that to make an ndarray object
>> that uses the specified memory. This is advanced stuff and difficult
>> to get right because of memory ownership and object lifetime issues.
>
> Allocating memory in C code would be very useful for me. If I were to
> use such a numpy.asarray() function (it seems the frombuffer you
> mentioned would also work, as described above),

Yes, if you can create the buffer object or something that obeys the
buffer protocol. ctypes arrays work fine; ctypes pointers don't.

> it makes sense for the C code to be responsible for deallocating the
> memory, not numpy. I understand that I would need to ensure that the
> deallocation happens only when the containing ndarray is no longer
> referenced anywhere in Python (hopefully, ndarray's finalization code
> does not need access to the .data buffer).

My experience has been that this is fairly difficult to do. If you have
*complete* control of the ndarray object over its entire lifetime, then
this is reasonable. If you don't, then you are going to run into
(nondeterministic!) segfaulting bugs eventually.
For example, if you are only using it as a temporary inside a function and never return it, this is fine. You will also need to be very careful about constructing views from the ndarray; these will need to be controlled, too. You will have a bug if you delete myarray but return reversed_array=myarray[::-1], for example.

I see that you are using ctypes. Be sure to take a look at the .ctypes attribute on ndarrays. This allows you to get a ctypes pointer object from an array. This might help you use numpy to allocate the memory and pass that in to your C functions.

In [47]: a.ctypes.data_as(ctypes.POINTER(ctypes.c_int))
Out[47]: <ctypes.LP_c_int object at 0x...>

http://www.scipy.org/Cookbook/Ctypes

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From oliphant at enthought.com Mon Feb 4 01:03:11 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Mon, 04 Feb 2008 00:03:11 -0600
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A697F6.1010403@soe.ucsc.edu>
References: <47A61544.1060401@soe.ucsc.edu> <47A669F1.1070509@soe.ucsc.edu> <47A67EBD.4070608@gmail.com> <47A697F6.1010403@soe.ucsc.edu>
Message-ID: <47A6AA9F.3000607@enthought.com>

Damian Eads wrote:
> This is good to know because there have been a few situations when this
> would have been very useful.
>
> Suppose I do something like (in Python):
>
> import ctypes
> mylib = ctypes.CDLL('libmylib.so')
> y = mylib.get_float_array_from_c_function()
>
> which returns a float* as a Python int, and then I do
>
> nelems = mylib.get_float_array_num_elems()
> x = numpy.frombuffer(ctypes.c_buffer(y), 'float', nelems)
>
> This gives me an ndarray x with its (.data) buffer pointing to the
> memory address given by y. When the ndarray x is no longer referenced
> (even as another array's base), does numpy attempt to free the memory
> pointed to by y? In other words, does numpy always deallocate the
> (.data) buffer in the __del__ method? Or, does frombuffer set a flag
> telling it not to?

NumPy won't free the memory unless the OWNDATA flag is set. Look at the flags attribute. Frombuffer creates arrays that don't own their own data, so you are safe. A reference is kept in the NumPy array to the buffer object so it won't be deleted.

The ctypes.c_buffer is a new one for me. But, it looks like that would work.

NumPy is pretty useful for wrapping raw pointers to memory and then playing with the data inside of Python however you would like. The extended data-types make it very easy to do simple things with large data sets. It's one of the less widely understood features of NumPy.

-Travis O.

From robert.kern at gmail.com Mon Feb 4 01:31:28 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 04 Feb 2008 00:31:28 -0600
Subject: [Numpy-discussion] Unexpected behavior with numpy array
In-Reply-To: <47A6AA9F.3000607@enthought.com>
References: <47A61544.1060401@soe.ucsc.edu> <47A669F1.1070509@soe.ucsc.edu> <47A67EBD.4070608@gmail.com> <47A697F6.1010403@soe.ucsc.edu> <47A6AA9F.3000607@enthought.com>
Message-ID: <47A6B140.8000305@gmail.com>

Travis E. Oliphant wrote:
> Damian Eads wrote:
>> This is good to know because there have been a few situations when this
>> would have been very useful.
>>
>> Suppose I do something like (in Python):
>>
>> import ctypes
>> mylib = ctypes.CDLL('libmylib.so')
>> y = mylib.get_float_array_from_c_function()
>>
>> which returns a float* as a Python int, and then I do
>>
>> nelems = mylib.get_float_array_num_elems()
>> x = numpy.frombuffer(ctypes.c_buffer(y), 'float', nelems)
>>
>> This gives me an ndarray x with its (.data) buffer pointing to the
>> memory address given by y. When the ndarray x is no longer referenced
>> (even as another array's base), does numpy attempt to free the memory
>> pointed to by y? In other words, does numpy always deallocate the
>> (.data) buffer in the __del__ method? Or, does frombuffer set a flag
>> telling it not to?
>
> NumPy won't free the memory unless the OWNDATA flag is set. Look at the
> flags attribute. Frombuffer creates arrays that don't own their own
> data, so you are safe. A reference is kept in the NumPy array to the
> buffer object so it won't be deleted.
>
> The ctypes.c_buffer is a new one for me.

Unfortunately, it doesn't work the way it is used in Damian's example. It is a deprecated alias for create_string_buffer(), which creates a ctypes array of c_char from a string or a size. It does not make a buffer object from a ctypes pointer object or an address.

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From lfriedri at imtek.de Mon Feb 4 03:37:51 2008
From: lfriedri at imtek.de (Lars Friedrich)
Date: Mon, 04 Feb 2008 09:37:51 +0100
Subject: [Numpy-discussion] histogramdd memory needs
Message-ID: <47A6CEDF.9020909@imtek.de>

Hi,

> > 2) Is there a way to use another algorithm (at the cost of performance)
> > that uses less memory during calculation so that I can generate bigger
> > histograms?
>
> You could work through your array block by block. Simply fix the range and
> generate a histogram for each slice of 100k data and sum them up at the
> end.

Thank you for your answer.

I sliced the (original) data into blocks. However, when I do this, I need at least twice the memory for the whole histogram (one for the temporary result and one for accumulating the total result). Assuming my histogram has a size of (280**3)*8 = 176 megabytes, this does not help, I think.

What I will try next is to compute smaller parts of the big histogram and combine them at the end. (Slice the histogram into blocks.) Is this what you were recommending?

Lars

From cournape at gmail.com Mon Feb 4 05:13:53 2008
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 4 Feb 2008 19:13:53 +0900
Subject: [Numpy-discussion] An idea for future numpy windows installers
Message-ID: <5b8d13220802040213w36d044cfm9865da5474eb2f79@mail.gmail.com>

Hi,

While studying nsis a bit (an open source system to build windows installers), I realized that it would be good if we could detect the target CPU and install the right numpy accordingly. I have coded an nsis plugin to detect SSE availability (no SSE vs SSE vs SSE2 vs SSE3), and including installers within the nsis installer is easy. What would people think about including the installers generated with the current method (bdist_wininst, I guess?) for every CPU target, and distributing the bundled installer? The only drawback I can see is the size of the installer: in this case, we could have a system which downloads the right installer, but that would be more work, obviously.
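(As an aside: the CPU check David wants the installer to perform is one that numpy's own build machinery already knows how to make. A sketch, assuming numpy.distutils.cpuinfo's cpu object and its has_* predicates, which may not all exist on every platform:)

from numpy.distutils.cpuinfo import cpu

for feature in ('has_mmx', 'has_sse', 'has_sse2', 'has_sse3'):
    try:
        # each predicate is a callable that probes the running CPU
        print feature, getattr(cpu, feature)()
    except Exception:
        print feature, 'unknown'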
This seems like an easy, "not too much work required" solution to the recurrent problem we get with atlas on windows.

cheers,

David

From haase at msg.ucsf.edu Mon Feb 4 08:56:54 2008
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Mon, 4 Feb 2008 14:56:54 +0100
Subject: [Numpy-discussion] numpy.asarray( iterator )
Message-ID:

Hi,

Can this be changed: If I have a list L the usual N.asarray( L ) works well -- however I just discovered that N.asarray( reversed( L ) ) breaks my code....

Apparently reversed( L ) returns an iterator object, and N.asarray( reversed( L ) ) (called arrY in my function) results in:

(Pdb) p arrY
array(<listreverseiterator object at 0x...>, dtype=object)
(Pdb) p arrY.shape
()

Comments? How about letting asarray call fromiter when it sees that the argument is an iterator!?

Thanks,
Sebastian Haase

From dalcinl at gmail.com Mon Feb 4 09:39:59 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 4 Feb 2008 11:39:59 -0300
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To: <56766.85.166.27.136.1201892914.squirrel@cens.ioc.ee>
References: <59257.85.166.27.136.1201877109.squirrel@cens.ioc.ee> <47A36759.70502@gmail.com> <56766.85.166.27.136.1201892914.squirrel@cens.ioc.ee>
Message-ID:

On 2/1/08, Pearu Peterson wrote:
> >> Sorry, I haven't been around there long time.
> >
> > Are you going to continue not reading the f2py list? If so, you should
> > point everyone there to this list and close the list.
>
> Anyway, I have subscribed to the f2py list again. I'll try to respond
> to any messages that have unresolved issues, also in the archives.

Pearu, now that f2py is part of numpy, I think it would be easier for you and also for users to post to the numpy list for f2py-related issues. What do you think?

-- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594

From david.huard at gmail.com Mon Feb 4 10:07:02 2008
From: david.huard at gmail.com (David Huard)
Date: Mon, 4 Feb 2008 10:07:02 -0500
Subject: [Numpy-discussion] histogramdd memory needs
In-Reply-To: <47A6CEDF.9020909@imtek.de>
References: <47A6CEDF.9020909@imtek.de>
Message-ID: <91cf711d0802040707m295df8c6qbd19b6b605357652@mail.gmail.com>

2008/2/4, Lars Friedrich:
> Hi,
>
> > > 2) Is there a way to use another algorithm (at the cost of performance)
> > > that uses less memory during calculation so that I can generate bigger
> > > histograms?
> >
> > You could work through your array block by block. Simply fix the range and
> > generate a histogram for each slice of 100k data and sum them up at the
> > end.
>
> Thank you for your answer.
>
> I sliced the (original) data into blocks. However, when I do this, I
> need at least twice the memory for the whole histogram (one for the
> temporary result and one for accumulating the total result). Assuming my
> histogram has a size of (280**3)*8 = 176 megabytes, this does not help,
> I think.
>
> What I will try next is to compute smaller parts of the big histogram
> and combine them at the end. (Slice the histogram into blocks.) Is
> this what you were recommending?

I explained it badly, sorry: the goal is to reduce the memory footprint, so storing each intermediate result and adding them all up at the end indeed does not help. You should update the partial histogram as soon as a block is computed.
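A minimal sketch of that blockwise update, with a hypothetical helper name; the bin range has to be fixed up front so every block shares the same edges, and accumulating into a uint32 array also answers the dtype question from the original post:

import numpy

def blockwise_histogramdd(data, bins, hist_range, blocksize=100000):
    # Accumulate the full-size histogram block by block: only one data
    # block and one temporary block histogram are alive at any time,
    # besides the (integer) accumulator itself.
    hist = None
    for start in xrange(0, data.shape[0], blocksize):
        h, edges = numpy.histogramdd(data[start:start + blocksize],
                                     bins=bins, range=hist_range)
        if hist is None:
            hist = h.astype(numpy.uint32)  # sparsely filled bins fit in uint32
        else:
            hist += h.astype(numpy.uint32)
    return hist, edges

Each numpy.histogramdd call still allocates one full-size float64 result, so the peak memory is roughly the accumulator plus one temporary; slicing the histogram itself, as Lars tries next, is the complementary trick.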
I'm sending you a script that does this for 1D histograms. This comes from the pymc code base. Look at the histogram function in utils.py.

Cheers,

David

Lars
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: utils.py
Type: application/octet-stream
Size: 18972 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: histogram.f
Type: text/x-fortran
Size: 8479 bytes
Desc: not available
URL:

From vfulco1 at gmail.com Mon Feb 4 11:02:29 2008
From: vfulco1 at gmail.com (Vince Fulco)
Date: Mon, 4 Feb 2008 11:02:29 -0500
Subject: [Numpy-discussion] Numpy and C++ integration...
Message-ID: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com>

Dear Numpy Experts- I find myself working with Numpy arrays and wanting to access *simple* C++ functions for time series, returning the results to Numpy. As I am a relatively new user of Python/Numpy, the number of paths to use in incorporating C++ code into one's scripts is daunting. I've attempted the Weave app but cannot get past the examples. I've also looked at all the other choices out there such as Boost, SIP, PyInline, etc. Any trailheads for the simplest approach (assuming a very minimal understanding of C++) would be much appreciated. At this point, however, I can't release the code for review. Thank you.

-- Vince Fulco

From gael.varoquaux at normalesup.org Mon Feb 4 11:25:58 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 4 Feb 2008 17:25:58 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com>
References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com>
Message-ID: <20080204162558.GB9185@phare.normalesup.org>

On Mon, Feb 04, 2008 at 11:02:29AM -0500, Vince Fulco wrote:
> Any trailheads for the simplest approach

I find ctypes very easy to understand. See http://www.scipy.org/Cookbook/Ctypes for simple instructions.

HTH,

Ga?l

From lou_boog2000 at yahoo.com Mon Feb 4 11:32:49 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Mon, 4 Feb 2008 08:32:49 -0800 (PST)
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com>
Message-ID: <819625.25707.qm@web34405.mail.mud.yahoo.com>

Dear Mr. Fulco,

This may not be exactly what you want to do, but I would recommend using the C API and then calling your C++ programs from there (the interface functions to the C++ code are compiled inside an extern "C" { } block). I will be doing this soon with my own project. Why? Because the C interface is doable and, I think, simple enough that it is better to take the Python-to-C++ interface in two steps. Anyway, worth a look. So here are two links that show how to use the C API:

http://www.scipy.org/Cookbook/C_Extensions - A short intro, this also has documentation links

http://www.scipy.org/Cookbook/C_Extensions/NumPy_arrays?highlight=%28%28----%28-%2A%29%28%5Cr%29%3F%5Cn%29%28.%2A%29CategoryCookbook%5Cb%29 - This is an article I wrote last year for the SciPy.org site and I go into a lot of detail with a lot of examples on how you pass and handle Numpy arrays. I think it is (mostly) right and works well for me.
One warning (which I also talk about in my tutorial) is to make sure your NumPy arrays are "Contiguous", i.e. the array components are in order in one memory block. That makes things easier on the C/C++ side.

--- Vince Fulco wrote:
> Dear Numpy Experts- I find myself working with Numpy arrays and
> wanting to access *simple* C++ functions for time series, returning
> the results to Numpy. As I am a relatively new user of Python/Numpy,
> the number of paths to use in incorporating C++ code into one's
> scripts is daunting. I've attempted the Weave app but cannot get past
> the examples. I've also looked at all the other choices out there such
> as Boost, SIP, PyInline, etc. Any trailheads for the simplest approach
> (assuming a very minimal understanding of C++) would be much
> appreciated. At this point, however, I can't release the code for
> review. Thank you.
>
> --
> Vince Fulco

-- Lou Pecora, my views are my own.

From matthieu.brucher at gmail.com Mon Feb 4 11:39:42 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Mon, 4 Feb 2008 17:39:42 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <819625.25707.qm@web34405.mail.mud.yahoo.com>
References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com> <819625.25707.qm@web34405.mail.mud.yahoo.com>
Message-ID:

2008/2/4, Lou Pecora:
> Dear Mr. Fulco,
>
> This may not be exactly what you want to do, but I would recommend
> using the C API and then calling your C++ programs from there (the
> interface functions to the C++ code are compiled inside an extern "C"
> { } block). I will be doing this soon with my own project. Why?
> Because the C interface is doable and, I think, simple enough that it
> is better to take the Python-to-C++ interface in two steps. Anyway,
> worth a look. So here are two links that show how to use the C API:
>
> http://www.scipy.org/Cookbook/C_Extensions - A short intro, this also
> has documentation links
>
> http://www.scipy.org/Cookbook/C_Extensions/NumPy_arrays?highlight=%28%28----%28-%2A%29%28%5Cr%29%3F%5Cn%29%28.%2A%29CategoryCookbook%5Cb%29
> - This is an article I wrote last year for the SciPy.org site and I go
> into a lot of detail with a lot of examples on how you pass and handle
> Numpy arrays. I think it is (mostly) right and works well for me.
>
> One warning (which I also talk about in my tutorial) is to make sure
> your NumPy arrays are "Contiguous", i.e. the array components are in
> order in one memory block. That makes things easier on the C/C++ side.

Whatever solution you choose (Boost.Python, ...), you will have to use the Numpy C API at least a little bit. So Travis' book is a good start. As Ga?l told you, you can use ctypes if you manually wrap every method with a C function and recreate the class in Python. This can be avoided, but you'll have to use more powerful tools. I would advise SWIG (see my blog for some examples with C++ and SWIG).

Matthieu

-- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher
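To make Matthieu's recipe concrete, here is a rough sketch; the library (libshape.so) and the Shape_* wrapper functions are invented for illustration. On the C++ side one compiles extern "C" wrappers such as void* Shape_new(), double Shape_area(void*) and void Shape_delete(void*) into libshape.so; the Python side then recreates the class:

import ctypes

_lib = ctypes.CDLL('libshape.so')             # hypothetical library
_lib.Shape_new.restype = ctypes.c_void_p
_lib.Shape_area.restype = ctypes.c_double
_lib.Shape_area.argtypes = [ctypes.c_void_p]
_lib.Shape_delete.argtypes = [ctypes.c_void_p]

class Shape(object):
    # Python-side recreation of the C++ class: it holds only the opaque
    # object pointer and forwards each method to its extern "C" wrapper.
    def __init__(self):
        self._ptr = _lib.Shape_new()
    def area(self):
        return _lib.Shape_area(self._ptr)
    def __del__(self):
        _lib.Shape_delete(self._ptr)

The pointer-lifetime caveats are the same ones discussed for raw buffers earlier in the thread.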
From lou_boog2000 at yahoo.com Mon Feb 4 11:46:51 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Mon, 4 Feb 2008 08:46:51 -0800 (PST)
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To:
Message-ID: <211849.5685.qm@web34412.mail.mud.yahoo.com>

--- Matthieu Brucher wrote:
> Whatever solution you choose (Boost.Python, ...), you will have to use
> the Numpy C API at least a little bit. So Travis' book is a good start.
> As Ga?l told you, you can use ctypes if you manually wrap every method
> with a C function and recreate the class in Python.
> This can be avoided, but you'll have to use more powerful tools. I
> would advise SWIG (see my blog for some examples with C++ and SWIG).
>
> Matthieu

Ah, yes, I will also recommend Travis' book.

-- Lou Pecora, my views are my own.

From ndbecker2 at gmail.com Mon Feb 4 11:58:39 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Mon, 04 Feb 2008 11:58:39 -0500
Subject: [Numpy-discussion] Numpy and C++ integration...
References: <211849.5685.qm@web34412.mail.mud.yahoo.com>
Message-ID:

I have a variety of experiments that I put in this mercurial repo: https://nbecker.dyndns.org/hg/

The primary aim of this is to reuse c++ code written to a generic container interface, with numpy.

From wfspotz at sandia.gov Mon Feb 4 12:24:49 2008
From: wfspotz at sandia.gov (Bill Spotz)
Date: Mon, 4 Feb 2008 10:24:49 -0700
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To:
References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com> <819625.25707.qm@web34405.mail.mud.yahoo.com>
Message-ID: <40C402E8-90B5-46C0-9602-F5AE6D4BCDEC@sandia.gov>

On Feb 4, 2008, at 9:39 AM, Matthieu Brucher wrote:
> This can be avoided, but you'll have to use more powerful tools. I
> would advise SWIG (see my blog for some examples with C++ and SWIG).

Note that a lot of work has been done to bridge between numpy and swig. There is a swig interface file for numpy in numpy/doc/swig, along with documentation on how to use it. It is largely geared towards generating wrappers to C/C++ functions that take pointers to arrays. If your data is encapsulated within a C++ class, then it gets more complicated.

** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov **

From sdb at cloud9.net Mon Feb 4 12:34:37 2008
From: sdb at cloud9.net (Stuart Brorson)
Date: Mon, 4 Feb 2008 12:34:37 -0500 (EST)
Subject: [Numpy-discussion] round, fix, ceil, and floor for complex args
Message-ID:

Hi --

I'm fiddling with NumPy's chopping and truncating operators: round, fix, ceil, and floor. In the case where they are passed real args, they work just fine. However, I find that when they are passed complex args, I get the following:

round -> works fine.
ceil -> throws exception: 'complex' object has no attribute 'ceil'
floor -> throws exception: 'complex' object has no attribute 'floor'
fix -> throws exception: 'complex' object has no attribute 'floor'

Please see the session log below for more details.

My question: Is this a bug or a feature? It seems to me that if you implement round for complex args, then you need to also support ceil, floor, and fix for complex args, so it's a bug.
But I thought I'd ask the developers what they thought before filing a ticket.

Regards,

Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/

-------------------------- -------------------

In [14]: import numpy
In [15]: A = 10*numpy.random.rand(2, 2) + 10j*numpy.random.rand(2, 2)
In [16]: numpy.round(A)
Out[16]:
array([[ 6.+8.j, 6.+5.j],
       [ 10.+6.j, 9.+9.j]])
In [17]: numpy.floor(A)
---------------------------------------------------------------------------
<type 'exceptions.AttributeError'>        Traceback (most recent call last)
/fs/home/sdb/<ipython console> in <module>()
<type 'exceptions.AttributeError'>: 'complex' object has no attribute 'floor'
In [18]: numpy.ceil(A)
---------------------------------------------------------------------------
<type 'exceptions.AttributeError'>        Traceback (most recent call last)
/fs/home/sdb/<ipython console> in <module>()
<type 'exceptions.AttributeError'>: 'complex' object has no attribute 'ceil'
In [19]: numpy.fix(A)
---------------------------------------------------------------------------
<type 'exceptions.AttributeError'>        Traceback (most recent call last)
/fs/home/sdb/<ipython console> in <module>()
/home/sdb/trunk/output/ia32_linux/python/lib/python2.5/site-packages/numpy/lib/ufunclike.py in fix(x, y)
     14     x = asanyarray(x)
     15     if y is None:
---> 16         y = nx.floor(x)
     17     else:
     18         nx.floor(x, y)
<type 'exceptions.AttributeError'>: 'complex' object has no attribute 'floor'
In [20]: numpy.__version__
Out[20]: '1.0.4'
In [21]:

---------------------------- -------------------

From tim.hochberg at ieee.org Mon Feb 4 13:14:35 2008
From: tim.hochberg at ieee.org (Timothy Hochberg)
Date: Mon, 4 Feb 2008 11:14:35 -0700
Subject: [Numpy-discussion] round, fix, ceil, and floor for complex args
In-Reply-To:
References:
Message-ID:

On Mon, Feb 4, 2008 at 10:34 AM, Stuart Brorson wrote:
> Hi --
>
> I'm fiddling with NumPy's chopping and truncating operators: round,
> fix, ceil, and floor. In the case where they are passed real args,
> they work just fine. However, I find that when they are passed
> complex args, I get the following:
>
> round -> works fine.
> ceil -> throws exception: 'complex' object has no attribute 'ceil'
> floor -> throws exception: 'complex' object has no attribute 'floor'
> fix -> throws exception: 'complex' object has no attribute 'floor'
>
> Please see the session log below for more details.
>
> My question: Is this a bug or a feature? It seems to me that if you
> implement round for complex args, then you need to also support ceil,
> floor, and fix for complex args, so it's a bug. But I thought I'd ask
> the developers what they thought before filing a ticket.

IMO, the problem is not that ceil, floor and fix are not defined for complex, but rather that round is. (Re, Im) is not a unique representation for complex numbers, although that is the internal representation that numpy uses, and as a result none of these functions are uniquely defined. Since it's trivial to synthesize the effect that I assume you are looking for (operating on both the Re and Im parts as if they were floats), there's no reason to have this functionality built in.

[....examples....]

-- . __ . |-\ . . tim.hochberg at ieee.org
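The synthesis Tim alludes to is a one-liner per function; a sketch (the helper name is invented):

import numpy

def complex_floor(A):
    # Apply floor to the Re and Im parts independently; ceil and fix
    # can be synthesized the same way.
    A = numpy.asarray(A)
    return numpy.floor(A.real) + 1j * numpy.floor(A.imag)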
From tim.hochberg at ieee.org Mon Feb 4 13:28:49 2008
From: tim.hochberg at ieee.org (Timothy Hochberg)
Date: Mon, 4 Feb 2008 11:28:49 -0700
Subject: [Numpy-discussion] numpy.asarray( iterator )
In-Reply-To:
References:
Message-ID:

On Mon, Feb 4, 2008 at 6:56 AM, Sebastian Haase wrote:
> Hi,
>
> Can this be changed: If I have a list L the usual N.asarray( L ) works
> well -- however I just discovered that N.asarray( reversed( L ) )
> breaks my code....
>
> Apparently reversed( L ) returns an iterator object, and N.asarray(
> reversed( L ) ) (called arrY in my function) results in:
>
> (Pdb) p arrY
> array(<listreverseiterator object at 0x...>, dtype=object)
> (Pdb) p arrY.shape
> ()
>
> Comments? How about letting asarray call fromiter when it sees that
> the argument is an iterator!?

That's not really feasible. fromiter requires knowledge of the type of the data, which you don't in general know in asarray. In addition, the various array creations are already teetering on the edge of having too much magic built in, and IMO it's a mistake to try to do any more guessing than we already do.

My suggestion is to not use reversed. If your input (L) is a sequence rather than an iterator, why not just use L[::-1]? If L might be an iterator, you might need to do something else, but in any case, you will know more about the possible types of L than asarray can, so you are better able to make some sensible decisions about how to treat it. If you don't care about efficiency, then I believe array(list(L)) should work in a lot of cases, but it's pretty horribly inefficient.

> Thanks,
> Sebastian Haase
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

-- . __ . |-\ . . tim.hochberg at ieee.org

From sdb at cloud9.net Mon Feb 4 13:59:33 2008
From: sdb at cloud9.net (Stuart Brorson)
Date: Mon, 4 Feb 2008 13:59:33 -0500 (EST)
Subject: [Numpy-discussion] round, fix, ceil, and floor for complex args
In-Reply-To:
References:
Message-ID:

round -> works fine.
ceil -> throws exception: 'complex' object has no attribute 'ceil'
floor -> throws exception: 'complex' object has no attribute 'floor'
fix -> throws exception: 'complex' object has no attribute 'floor'

>> My question: Is this a bug or a feature? It seems to me that if you
>> implement round for complex args, then you need to also support ceil,
>> floor, and fix for complex args, so it's a bug. But I thought I'd ask
>> the developers what they thought before filing a ticket.
>
> IMO, the problem is not that ceil, floor and fix are not defined for
> complex, but rather that round is. (Re, Im) is not a unique representation
> for complex numbers, although that is the internal representation that numpy
> uses, and as a result none of these functions are uniquely defined. Since
> it's trivial to synthesize the effect that I assume you are looking for
> (operating on both the Re and Im parts as if they were floats), there's no
> reason to have this functionality built in.

What you say is reasonable prima facie.

However, looking deeper, NumPy *already* treats complex numbers using the (Re, Im) representation when implementing ordering operators. That is, if I do "A <= B", then NumPy first checks the real parts, and if it can't fully decide, then it checks the imaginary parts. That is, the (Re, Im) representation already excludes other ways in which you might define ordering operations for complex numbers.

BTW: I have whined about this behavior several times, including here [1]:

http://projects.scipy.org/pipermail/numpy-discussion/2008-January/031056.html

Anyway, since NumPy is committed to (Re, Im) as the base representation of complex numbers, then it is not unreasonable to implement round, fix, and so on, by operating independently on the Re and Im parts.

Or am I wrong?

Cheers,

Stuart Brorson Interactive Supercomputing, inc.
135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/

[1] Sorry for whining, by the way! I'm just poking at the boundaries of NumPy's feature envelope and trying to see how self-consistent it is.

From charlesr.harris at gmail.com Mon Feb 4 14:24:09 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 4 Feb 2008 12:24:09 -0700
Subject: [Numpy-discussion] round, fix, ceil, and floor for complex args
In-Reply-To:
References:
Message-ID:

On Feb 4, 2008 10:34 AM, Stuart Brorson wrote:
> Hi --
>
> I'm fiddling with NumPy's chopping and truncating operators: round,
> fix, ceil, and floor. In the case where they are passed real args,
> they work just fine. However, I find that when they are passed
> complex args, I get the following:
>
> round -> works fine.
> ceil -> throws exception: 'complex' object has no attribute 'ceil'
> floor -> throws exception: 'complex' object has no attribute 'floor'
> fix -> throws exception: 'complex' object has no attribute 'floor'
>
> Please see the session log below for more details.
>
> My question: Is this a bug or a feature? It seems to me that if you
> implement round for complex args, then you need to also support ceil,
> floor, and fix for complex args, so it's a bug. But I thought I'd ask
> the developers what they thought before filing a ticket.
>
> Regards,

I think it would be reasonable to add these operations for consistency. Useful is perhaps a different question.

Chuck

From tim.hochberg at ieee.org Mon Feb 4 14:25:24 2008
From: tim.hochberg at ieee.org (Timothy Hochberg)
Date: Mon, 4 Feb 2008 12:25:24 -0700
Subject: [Numpy-discussion] round, fix, ceil, and floor for complex args
In-Reply-To:
References:
Message-ID:

On Mon, Feb 4, 2008 at 11:59 AM, Stuart Brorson wrote:
> round -> works fine.
> ceil -> throws exception: 'complex' object has no attribute 'ceil'
> floor -> throws exception: 'complex' object has no attribute 'floor'
> fix -> throws exception: 'complex' object has no attribute 'floor'
>
> >> My question: Is this a bug or a feature? It seems to me that if you
> >> implement round for complex args, then you need to also support ceil,
> >> floor, and fix for complex args, so it's a bug. But I thought I'd ask
> >> the developers what they thought before filing a ticket.
> >
> > IMO, the problem is not that ceil, floor and fix are not defined for
> > complex, but rather that round is. (Re, Im) is not a unique representation
> > for complex numbers, although that is the internal representation that numpy
> > uses, and as a result none of these functions are uniquely defined. Since
> > it's trivial to synthesize the effect that I assume you are looking for
> > (operating on both the Re and Im parts as if they were floats), there's no
> > reason to have this functionality built in.
>
> What you say is reasonable prima facie.
>
> However, looking deeper, NumPy *already* treats complex numbers using
> the (Re, Im) representation when implementing ordering operators. That
> is, if I do "A <= B", then NumPy first checks the real parts, and if
> it can't fully decide, then it checks the imaginary parts. That is,
> the (Re, Im) representation already excludes other ways in which you
> might define ordering operations for complex numbers.
>
> BTW: I have whined about this behavior several times, including here [1]:
>
> http://projects.scipy.org/pipermail/numpy-discussion/2008-January/031056.html

I agree with this, FWIW. Ordering comparisons between complex numbers should raise an exception.

> Anyway, since NumPy is committed to (Re, Im) as the base
> representation of complex numbers, then it is not unreasonable to
> implement round, fix, and so on, by operating independently on the Re
> and Im parts.

IMO, just because numpy has a certain wart is no reason to push for adding a bunch of vaguely similar warts just to make things more consistent. Better to try to remove the initial wart if the opportunity presents itself. The more the (Re, Im) representation leaks, the harder it will become to ever fix. And, in this case at least, full consistency is not possible short of removing the ordered comparison of complex numbers, since the Python complex type does not support ordered comparisons.

-- . __ . |-\ . . tim.hochberg at ieee.org

From pearu at cens.ioc.ee Mon Feb 4 14:47:46 2008
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Mon, 4 Feb 2008 21:47:46 +0200 (EET)
Subject: [Numpy-discussion] [F2PY]: Allocatable Arrays
In-Reply-To:
References: <59257.85.166.27.136.1201877109.squirrel@cens.ioc.ee> <47A36759.70502@gmail.com> <56766.85.166.27.136.1201892914.squirrel@cens.ioc.ee>
Message-ID: <54790.85.166.27.136.1202154466.squirrel@cens.ioc.ee>

On Mon, February 4, 2008 4:39 pm, Lisandro Dalcin wrote:
> Pearu, now that f2py is part of numpy, I think it would be easier for
> you and also for users to post to the numpy list for f2py-related
> issues. What do you think?

Personally, I don't have strong opinions on this. On one hand, it would be better if f2py-related issues were raised in one and only one list. It could be the numpy list, as f2py issues would not add much extra traffic to it. On the other hand, if f2py-users messages were redirected to the numpy list and also vice versa, then I am not sure that all subscribers to the f2py-users list would appreciate extra messages about numpy issues that might be irrelevant to them.

Currently there are about 180 subscribers to the f2py-users list, many of whom also subscribe to the numpy-discussion list, I guess. If most of them subscribed to the numpy-discussion list, then we could get rid of the f2py-users list. If the administrators of the numpy-discussion list could share (privately) the list of subscribers' addresses with me, then I could check how many of the f2py users subscribe to the numpy list, and it would be easier to make a decision on the future of the f2py-users list.

When I am back to working on f2py g3, I'll probably create a google code project for it. There we'll have a separate list for the future version of f2py anyway.

Regards,
Pearu

From Chris.Barker at noaa.gov Mon Feb 4 15:05:45 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Mon, 04 Feb 2008 12:05:45 -0800
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <819625.25707.qm@web34405.mail.mud.yahoo.com>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com>
Message-ID: <47A77019.1020204@noaa.gov>

Lou Pecora wrote:
> I would recommend using the C API

I would recommend against this -- there is a lot of code to write in extensions to make sure you do reference counting, etc., and it is hard to get right.

Much of it is also boiler-plate code, so it makes more sense to have that code auto-generated.
There are just too many good tools to do this for you to do it by hand.

The problem is that there is an embarrassment of riches -- if only one or two of the C/C++ interface tools were out there, it would be a whole lot easier to choose! My take:

ctypes -- best if you have a dll-type interface already defined, and particularly if there are a smallish number of functions you want to call. Can it call C++ directly at all?

pyrex -- best if you want to implement some custom functions in C from scratch. Also pretty good for calling external C code. Only supports calling C++ code that's not too fancy -- i.e. stuff that can be called from C -- pyrex has no explicit support for C++.

SWIG -- best if you have a lot of code to wrap that shares similar interfaces - once you write the typemaps, the rest is automatic. Also the best choice if you want to support more than one scripting language, or you want to integrate with other packages built with SWIG (wxPython, GDAL, VTK, ...). Bill's numpy-swig typemaps make it easy to deal with classic C-style pointers to data blocks. It also comes with built-in wrappers for std::vector, though not numpy integration for those.

SIP -- built for pyQT -- folks seem to like it. I don't know if anyone has done anything for numpy with it.

Boost::python -- best for writing custom extensions in C++ -- also can be used for interfacing with legacy C++. There were boost array classes for numpy -- are these being maintained?

Any of these can do that job -- so it's hard to choose, but maybe the above helps focus your search.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Mon Feb 4 15:07:16 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Mon, 04 Feb 2008 12:07:16 -0800
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To:
References: <211849.5685.qm@web34412.mail.mud.yahoo.com>
Message-ID: <47A77074.5090205@noaa.gov>

Neal Becker wrote:
> I have a variety of experiments that I put in this mercurial repo:
> https://nbecker.dyndns.org/hg/
>
> The primary aim of this is to reuse c++ code written to a generic container
> interface, with numpy.

Neal,

I'd love to hear more about this. Do you have a two paragraph description of what you're up to?

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From wfspotz at sandia.gov Mon Feb 4 15:13:22 2008
From: wfspotz at sandia.gov (Bill Spotz)
Date: Mon, 4 Feb 2008 13:13:22 -0700
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <47A77019.1020204@noaa.gov>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov>
Message-ID:

On Feb 4, 2008, at 1:05 PM, Christopher Barker wrote:
> Boost::python -- best for writing custom extensions in C++ -- also can
> be used for interfacing with legacy C++. There were boost array classes
> for numpy -- are these being maintained?

There are boost array classes for Numeric, and *talk* of upgrading them to numpy, but it hasn't happened yet, to my knowledge.

** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O.
Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov **

From ndbecker2 at gmail.com Mon Feb 4 15:29:10 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Mon, 04 Feb 2008 15:29:10 -0500
Subject: [Numpy-discussion] Numpy and C++ integration...
References: <211849.5685.qm@web34412.mail.mud.yahoo.com> <47A77074.5090205@noaa.gov>
Message-ID:

Christopher Barker wrote:
> Neal Becker wrote:
>> I have a variety of experiments that I put in this mercurial repo:
>> https://nbecker.dyndns.org/hg/
>>
>> The primary aim of this is to reuse c++ code written to a generic
>> container interface, with numpy.
>
> Neal,
>
> I'd love to hear more about this. Do you have a two paragraph
> description of what you're up to?

I need to update it, but here is a short doc: https://nbecker.dyndns.org/misc/design.pdf

If you look at the hg repo, you will see a few interesting exercises. accumulator shows the idea of making 1-d numpy arrays usable as containers compatible with boost::range. numpy_iter.hpp has most of the work. This has: an n-dim wrapper for a numpy array, and an iterator to go with it. Since the n-d iter concept is not too well defined, I haven't worked on this much. Also a 1-dim wrapper and an iterator to go with it. numpy is run-time polymorphic, while the c++ code I want to use is compile-time polymorphic. misc has some tests of dispatching based on types. num2.cc has some tests of creating numpy arrays. limit.cc has a little test of a ufunc.

From gael.varoquaux at normalesup.org Mon Feb 4 15:37:48 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 4 Feb 2008 21:37:48 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <47A77019.1020204@noaa.gov>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov>
Message-ID: <20080204203748.GA19901@phare.normalesup.org>

On Mon, Feb 04, 2008 at 12:05:45PM -0800, Christopher Barker wrote:
> ctypes -- [...] Can it call C++ directly at all?

No, but you can use 'extern "C"' in your cpp file, if you have control over the file.

Ga?l

From lou_boog2000 at yahoo.com Mon Feb 4 15:49:58 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Mon, 4 Feb 2008 12:49:58 -0800 (PST)
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <47A77019.1020204@noaa.gov>
Message-ID: <388660.35981.qm@web34402.mail.mud.yahoo.com>

--- Christopher Barker wrote:
> Lou Pecora wrote:
>> I would recommend using the C API
>
> I would recommend against this -- there is a lot of code to write in
> extensions to make sure you do reference counting, etc., and it is
> hard to get right.

Well, fair enough to some extent, but I didn't find it so hard after I did a few. I will speak for myself here. The reason I went to the C API is because I tried several of the routes you suggest and I could not get any of them to work. And you're right, the C API is boilerplate. That also argues for using it.

So, for those looking for speed up through some external C or C++ code, I would say (trying to be fair here), try what Chris recommends below, if you want, but IMHO, none of it is trivial. If you get it to work, great. If not, you have the fallback of the C API.

> ctypes
> pyrex
> SWIG
> SIP
> Boost::python

I tried all of these except SIP and got nowhere. So maybe others will be a lot smarter than I.

> Christopher Barker, Ph.D.
> Oceanographer

-- Lou Pecora, my views are my own.
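For a plain C function, the cookbook recipe Ga?l keeps pointing to boils down to very little code. A sketch, assuming a hypothetical libmean.so that exports double mean(double *x, int n), and using numpy.ctypeslib's ndpointer to declare the array argument:

import ctypes
import numpy
from numpy.ctypeslib import ndpointer

lib = ctypes.CDLL('libmean.so')               # hypothetical library
lib.mean.restype = ctypes.c_double
lib.mean.argtypes = [ndpointer(dtype=numpy.float64, flags='C_CONTIGUOUS'),
                     ctypes.c_int]

x = numpy.arange(10.0)
print lib.mean(x, len(x))    # the array's data pointer is passed, no copy

ndpointer also rejects arrays of the wrong dtype or layout at call time, which removes much of the segfault risk discussed earlier in the thread.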
From oliphant at enthought.com Mon Feb 4 15:50:24 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Mon, 04 Feb 2008 14:50:24 -0600
Subject: [Numpy-discussion] round, fix, ceil, and floor for complex args
In-Reply-To:
References:
Message-ID: <47A77A90.6020208@enthought.com>

Stuart Brorson wrote:
> Anyway, since NumPy is committed to (Re, Im) as the base
> representation of complex numbers, then it is not unreasonable to
> implement round, fix, and so on, by operating independently on the Re
> and Im parts.
>
> Or am I wrong?

Sounds reasonable to me...

-Travis O.

From gael.varoquaux at normalesup.org Mon Feb 4 16:11:38 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 4 Feb 2008 22:11:38 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <388660.35981.qm@web34402.mail.mud.yahoo.com>
References: <47A77019.1020204@noaa.gov> <388660.35981.qm@web34402.mail.mud.yahoo.com>
Message-ID: <20080204211138.GC19901@phare.normalesup.org>

On Mon, Feb 04, 2008 at 12:49:58PM -0800, Lou Pecora wrote:
> So, for those looking for speed up through some
> external C or C++ code, I would say (trying to be fair
> here), try what Chris recommends below, if you want,
> but IMHO, none of it is trivial. If you get it to
> work, great. If not, you have the fallback of the C
> API.

Honestly, I found ctypes trivial, using http://www.scipy.org/Cookbook/Ctypes as a reference, and simply copying the code. I started by copying the code, made sure it worked, and I understood how to modify it, then adapted it to my problem.

Ga?l

From lxander.m at gmail.com Mon Feb 4 16:19:34 2008
From: lxander.m at gmail.com (Alexander Michael)
Date: Mon, 4 Feb 2008 16:19:34 -0500
Subject: [Numpy-discussion] An idea for future numpy windows installers
In-Reply-To: <5b8d13220802040213w36d044cfm9865da5474eb2f79@mail.gmail.com>
References: <5b8d13220802040213w36d044cfm9865da5474eb2f79@mail.gmail.com>
Message-ID: <525f23e80802041319j61bbda01ya0e050f2c3254900@mail.gmail.com>

On Feb 4, 2008 5:13 AM, David Cournapeau wrote:
> Hi,
>
> While studying nsis a bit (an open source system to build windows
> installers), I realized that it would be good if we could detect the
> target CPU and install the right numpy accordingly. I have coded an
> nsis plugin to detect SSE availability (no SSE vs SSE vs SSE2 vs SSE3),
> and including installers within the nsis installer is easy. What would
> people think about including the installers generated with the current
> method (bdist_wininst, I guess?) for every CPU target, and distributing
> the bundled installer? The only drawback I can see is the size of the
> installer: in this case, we could have a system which downloads the
> right installer, but that would be more work, obviously.
> This seems like an easy, "not too much work required" solution to
> the recurrent problem we get with atlas on windows.

I like the idea of creating such a "universal" Windows installer for the (optional) numpy dependencies (particularly ATLAS) which is separate from the numpy distribution.
Ultimately, it would be great if numpy automatically noticed if ATLAS has been installed this way and self-configured itself to use the libraries when available, but I would still consider this a better situation if it was easy to build numpy to use such an installation with numscons. This would also provide a natural decoupling between the numpy and ATLAS distributions. From barrywark at gmail.com Mon Feb 4 17:23:58 2008 From: barrywark at gmail.com (Barry Wark) Date: Mon, 4 Feb 2008 14:23:58 -0800 Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: <20080204211138.GC19901@phare.normalesup.org> References: <47A77019.1020204@noaa.gov> <388660.35981.qm@web34402.mail.mud.yahoo.com> <20080204211138.GC19901@phare.normalesup.org> Message-ID: For comparison of ctypes and SWIG wrappers of a simple C++ codebase, feel free to take a look at the code for scikits.ann (http://scipy.org/scipy/scikits/wiki/AnnWrapper). The original wrapper was written using SWIG and the numpy typemaps. Rob Hetland has coded an almost-the-same API wrapper using ctypes which is linked from the above wiki page. Perhaps it will help the OP to see a similar project side-by-side. Barry On Feb 4, 2008 1:11 PM, Gael Varoquaux wrote: > On Mon, Feb 04, 2008 at 12:49:58PM -0800, Lou Pecora wrote: > > So, for those looking for speed up through some > > external C or C++ code, I would say (trying to be fair > > here), try what Chris recommends below, if you want, > > but IMHO, none of it is trivial. If you get it to > > work, great. If not, you have the fall back of the C > > API. > > Honestly, I found ctypes trivial, using > http://www.scipy.org/Cookbook/Ctypes as a reference, and simply copying > the code. I started by copying the code, made sure it worked, and I > understood how to modify it, than adapted it to my problem. > > Ga?l > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From phaustin at gmail.com Mon Feb 4 17:41:40 2008 From: phaustin at gmail.com (Phil Austin) Date: Mon, 04 Feb 2008 14:41:40 -0800 Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov> Message-ID: <47A794A4.3080207@gmail.com> Bill Spotz wrote: > On Feb 4, 2008, at 1:05 PM, Christopher Barker wrote: > >> Boost::python -- best for writing custom extensions in C++ -- also can >> be used for interfacing with legacy C++. There were boost array >> classes >> for numpy -- are these being maintained? > > There are boost array classes for Numeric, and *talk* of upgrading > them to numpy, but it hasn't happened yet, to my knowledge. I've updated and tested (but not uploaded) our num_util helper functions (http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2) for python 2.5.1, numpy 1.0.5.dev4731, boost_1_34_1, Centos 5.1 64bit on Athlon64. Anyone that's interested is welcome to contact me for a tarfile. I plan to switch the underlying array object from boost::python::numeric::array to Neal's numpy_array, but that's not likely to happen before April -- Phil From haase at msg.ucsf.edu Tue Feb 5 03:15:29 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Tue, 5 Feb 2008 09:15:29 +0100 Subject: [Numpy-discussion] Numpy and C++ integration... 
In-Reply-To: <47A77019.1020204@noaa.gov>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov>
Message-ID:

On Feb 4, 2008 9:05 PM, Christopher Barker wrote:
> Lou Pecora wrote:
>> I would recommend using the C API
>
> I would recommend against this -- there is a lot of code to write in
> extensions to make sure you do reference counting, etc., and it is hard
> to get right.
>
> Much of it is also boiler-plate code, so it makes more sense to have
> that code auto-generated.
>
> There are just too many good tools to do this for you to do it by hand.
>
> The problem is that there is an embarrassment of riches -- if only one
> or two of the C/C++ interface tools were out there, it would be a whole
> lot easier to choose! My take:
>
> ctypes -- best if you have a dll-type interface already defined, and
> particularly if there are a smallish number of functions you want to
> call. Can it call C++ directly at all?
>
> pyrex -- best if you want to implement some custom functions in C from
> scratch. Also pretty good for calling external C code. Only supports
> calling C++ code that's not too fancy -- i.e. stuff that can be called
> from C -- pyrex has no explicit support for C++.
>
> SWIG -- best if you have a lot of code to wrap that shares similar
> interfaces - once you write the typemaps, the rest is automatic. Also
> the best choice if you want to support more than one scripting
> language, or you want to integrate with other packages built with SWIG
> (wxPython, GDAL, VTK, ...). Bill's numpy-swig typemaps make it easy to
> deal with classic C-style pointers to data blocks. It also comes with
> built-in wrappers for std::vector, though not numpy integration for
> those.
>
> SIP -- built for pyQT -- folks seem to like it. I don't know if anyone
> has done anything for numpy with it.
>
> Boost::python -- best for writing custom extensions in C++ -- also can
> be used for interfacing with legacy C++. There were boost array classes
> for numpy -- are these being maintained?
>
> Any of these can do that job -- so it's hard to choose, but maybe the
> above helps focus your search.
>
> -Chris

Hi,

Which of these can automatically wrap (and "overload" !!) function templates? I have many "helper" functions which can operate equally on many -- if not all -- number types. As an example I would like to mention my favorite function that I call "mmms", which calculates min, max, mean, std.dev. in one go.

I use SWIG with my own typemaps which (by now) might be similar to what is in numpy (written by Bill). They can handle uint8, int16, uint16, uint32, int32, float32, float64. A C-preprocessor function instantiates the function for each type and another (SWIG command) overloads it for python. They work without any memory-copy (!!) *if* the data is C-contiguous.

Can ctypes do this?

Thanks,
-Sebastian

From gael.varoquaux at normalesup.org Tue Feb 5 03:21:39 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 5 Feb 2008 09:21:39 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To:
References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov>
Message-ID: <20080205082139.GE9772@phare.normalesup.org>

On Tue, Feb 05, 2008 at 09:15:29AM +0100, Sebastian Haase wrote:
> Can ctypes do this ?

No. Ctypes is only a way of loading C (and not C++) libraries in Python. That makes it very simple, but not very powerful.
Ga?l

From david at ar.media.kyoto-u.ac.jp Tue Feb 5 05:06:21 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 05 Feb 2008 19:06:21 +0900
Subject: [Numpy-discussion] An idea for future numpy windows installers
In-Reply-To: <525f23e80802041319j61bbda01ya0e050f2c3254900@mail.gmail.com>
References: <5b8d13220802040213w36d044cfm9865da5474eb2f79@mail.gmail.com> <525f23e80802041319j61bbda01ya0e050f2c3254900@mail.gmail.com>
Message-ID: <47A8351D.7060604@ar.media.kyoto-u.ac.jp>

Alexander Michael wrote:
> On Feb 4, 2008 5:13 AM, David Cournapeau wrote:
>> Hi,
>>
>> While studying nsis a bit (an open source system to build windows
>> installers), I realized that it would be good if we could detect the
>> target CPU and install the right numpy accordingly. I have coded an
>> nsis plugin to detect SSE availability (no SSE vs SSE vs SSE2 vs SSE3),
>> and including installers within the nsis installer is easy. What would
>> people think about including the installers generated with the current
>> method (bdist_wininst, I guess?) for every CPU target, and distributing
>> the bundled installer? The only drawback I can see is the size of the
>> installer: in this case, we could have a system which downloads the
>> right installer, but that would be more work, obviously.
>> This seems like an easy, "not too much work required" solution to
>> the recurrent problem we get with atlas on windows.
>
> I like the idea of creating such a "universal" Windows installer for the
> (optional) numpy dependencies (particularly ATLAS) which is
> separate from the numpy distribution. Ultimately, it would be great if
> numpy automatically noticed if ATLAS has been installed this way and
> self-configured itself to use the libraries when available, but I would still
> consider this a better situation if it was easy to build numpy to use
> such an installation with numscons.

Well, this has nothing to do with numscons per se. I indeed started working on this because of my work on numscons, though (I still need to support the windows platform, which I find extremely frustrating to work with, and a super pack installer for all numpy/scipy dependencies makes reproducible builds less painful).

I see two cases, which is why I suggested this as a separate issue from my recently announced blas/lapack superpack:

- people who just want to install numpy: people want to try numpy, they don't want to care about sse and co. That's why an installer with several numpy versions inside would be good: it would work for everybody.

- people who work with SVN: particularly for scipy, that's something many people want to do. Building blas, lapack and atlas is hard. I think I know the problems pretty well, having built and installed them on so many compiler/platform combinations by now, but that's not something terribly interesting. And it is hard to explain it well, because it is so easy to make a mistake at some point. So instead of explaining how to do it, just put something which works out of the box: that's what the BLAS/LAPACK superpack is for.

cheers,

David

From david at ar.media.kyoto-u.ac.jp Tue Feb 5 05:23:21 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 05 Feb 2008 19:23:21 +0900
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <20080205082139.GE9772@phare.normalesup.org>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov> <20080205082139.GE9772@phare.normalesup.org>
Message-ID: <47A83919.2050902@ar.media.kyoto-u.ac.jp>

Gael Varoquaux wrote:
> On Tue, Feb 05, 2008 at 09:15:29AM +0100, Sebastian Haase wrote:
>> Can ctypes do this ?
>
> No. Ctypes is only a way of loading C (and not C++) libraries in Python.
> That makes it very simple, but not very powerful.

I would not call ctypes not very powerful :) For sure you cannot do things the same way swig does, but you could imagine some automatic scheme to solve Sebastian's problem.

Typically, having a C wrapper automatically generated from the C++ headers, you could use the ctypes code generator, and you have something almost automatic (with a bit of boilerplate code in python, maybe).

cheers,

David

From faltet at carabos.com Tue Feb 5 05:33:39 2008
From: faltet at carabos.com (Francesc Altet)
Date: Tue, 5 Feb 2008 11:33:39 +0100
Subject: [Numpy-discussion] Generating a series of integers
Message-ID: <200802051133.39623.faltet@carabos.com>

Hi,

I need to generate a series of uint8 integers similar to:

In [37]: numpy.linspace(10, 20, num=25).astype('uint8')
Out[37]:
array([10, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 15, 15, 15, 16,
       16, 17, 17, 17, 18, 18, 19, 19, 20], dtype=uint8)

i.e. create evenly spaced samples in a range, but without using an intermediate array to create it (memory footprint is important).

I know that I can create this with a loop, but I'm curious if that can be done more compactly in one single statement.

Thanks,

-- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-"

From haase at msg.ucsf.edu Tue Feb 5 05:48:37 2008
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Tue, 5 Feb 2008 11:48:37 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <20080205082139.GE9772@phare.normalesup.org>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov> <20080205082139.GE9772@phare.normalesup.org>
Message-ID:

On Feb 5, 2008 9:21 AM, Gael Varoquaux wrote:
> On Tue, Feb 05, 2008 at 09:15:29AM +0100, Sebastian Haase wrote:
>> Can ctypes do this ?
>
> No. Ctypes is only a way of loading C (and not C++) libraries in Python.
> That makes it very simple, but not very powerful.
>
> Ga?l

(sorry, this email got stuck in moderation, because I used the wrong sender address)

Thanks for the reply. How about "manual" overloading? I mean, if -- for example -- I have two functions mmms_b and mmms_i in C, I could still use ctypes; could I then "merge" them into one python function, which "re-routes" depending on the argument dtype!? This is what SWIG must be doing internally -- right?!

Numpy/ctypes could come with such a "re-routing helper meta-function" (decorators?) out-of-the-box...

Thanks,
-Sebastian
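Sebastian's "re-routing" takes only a few lines with ctypes. Everything below (the library name, the mmms_* signatures, the out[4] convention) is invented to match his description:

import ctypes
import numpy

# hypothetical library exporting, per dtype,
#   void mmms_f(void *data, int n, double out[4]);  /* etc. */
_lib = ctypes.CDLL('libmmms.so')
_table = {numpy.dtype(numpy.float32): _lib.mmms_f,
          numpy.dtype(numpy.float64): _lib.mmms_d,
          numpy.dtype(numpy.int32):   _lib.mmms_i}

def mmms(arr):
    # Re-route to the matching C function based on the array's dtype.
    arr = numpy.ascontiguousarray(arr)   # copies only if not C-contiguous
    func = _table[arr.dtype]
    out = (ctypes.c_double * 4)()
    func(arr.ctypes.data_as(ctypes.c_void_p), arr.size, out)
    return tuple(out)                    # (min, max, mean, std.dev.)

This is the pattern Ga?l confirms using below, and it is what the numpy.i typemaps automate on the SWIG side.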
In-Reply-To: <47A83919.2050902@ar.media.kyoto-u.ac.jp>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com>
	<47A77019.1020204@noaa.gov>
	<20080205082139.GE9772@phare.normalesup.org>
	<47A83919.2050902@ar.media.kyoto-u.ac.jp>
Message-ID: <85b5c3130802050248y4237ab15i4b2348b16bab9b97@mail.gmail.com>

On Feb 5, 2008 11:23 AM, David Cournapeau wrote:
> Gael Varoquaux wrote:
> > On Tue, Feb 05, 2008 at 09:15:29AM +0100, Sebastian Haase wrote:
> >
> >> Can ctypes do this ?
> >>
> >
> > No. Ctypes is only a way of loading C (and not C++) libraries in Python.
> > That makes it very simple, but not very powerful.
> >
> I would not call ctypes not very powerful :) For sure you cannot do it
> the same way as swig does, but you could imagine some automatic scheme
> to solve Sebastian's problem.
>
> Typically, having a C wrapper automatically generated from the C++
> headers, you could use the ctypes code generator, and you have something
> almost automatic (with a bit of boilerplate code in python, maybe).
>
> cheers,

Also feel free to extend this wiki:

http://wiki.cython.org/WrappingCorCpp

I use Cython, mostly for the same reasons Gael is using ctypes - it's
trivial.

Ondrej

From gael.varoquaux at normalesup.org Tue Feb 5 05:50:09 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 5 Feb 2008 11:50:09 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To:
References: <819625.25707.qm@web34405.mail.mud.yahoo.com>
	<47A77019.1020204@noaa.gov>
	<20080205082139.GE9772@phare.normalesup.org>
Message-ID: <20080205105009.GF15367@phare.normalesup.org>

On Tue, Feb 05, 2008 at 11:48:37AM +0100, Sebastian Haase wrote:
> Thanks for the reply.
> How about "manual" overloading. I mean, if -- for example -- I have
> two functions mmms_b and mmms_i in C, I could still use ctypes; could
> I then "merge" them into one python function, which "re-routes"
> depending on the argument dtype !?

Yes, that's exactly what I do (except I rarely use C++, so I reroute to
different C functions). It doesn't scale well, but it's OK if you have
only a few functions to worry about.

Gaël

From gael.varoquaux at normalesup.org Tue Feb 5 05:52:00 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 5 Feb 2008 11:52:00 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <85b5c3130802050248y4237ab15i4b2348b16bab9b97@mail.gmail.com>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com>
	<47A77019.1020204@noaa.gov>
	<20080205082139.GE9772@phare.normalesup.org>
	<47A83919.2050902@ar.media.kyoto-u.ac.jp>
	<85b5c3130802050248y4237ab15i4b2348b16bab9b97@mail.gmail.com>
Message-ID: <20080205105200.GG15367@phare.normalesup.org>

On Tue, Feb 05, 2008 at 11:48:38AM +0100, Ondrej Certik wrote:
> I use Cython, mostly for the same reasons Gael is using ctypes - it's
> trivial.

Actually, when I want to do something really trivial, I use
scipy.weave.inline (see http://scipy.org/PerformancePython for an
example of scipy.weave.inline use). Of course it doesn't work when
linking to external libraries, but to accelerate a for loop, it's
great.
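For instance, a minimal sketch of inline use along those lines (the
function and variable names here are only illustrative, they are not
taken from that page):

import numpy as np
from scipy import weave
from scipy.weave import converters

def csum(a):
    # C++ loop over a 1-D float array; with the blitz converters the
    # array can be indexed as a(i) inside the C code
    n = int(len(a))
    code = """
           double total = 0.0;
           for (int i = 0; i < n; i++) {
               total += a(i);
           }
           return_val = total;
           """
    return weave.inline(code, ['a', 'n'],
                        type_converters=converters.blitz)

print csum(np.arange(10.0))   # 45.0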
Gaël

From haase at msg.ucsf.edu Tue Feb 5 05:53:05 2008
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Tue, 5 Feb 2008 11:53:05 +0100
Subject: [Numpy-discussion] Generating a series of integers
In-Reply-To: <200802051133.39623.faltet@carabos.com>
References: <200802051133.39623.faltet@carabos.com>
Message-ID:

On Feb 5, 2008 11:33 AM, Francesc Altet wrote:
> Hi,
>
> I need to generate a series of uint8 integers similar to:
>
> In [37]: numpy.linspace(10, 20, num=25).astype('uint8')
> Out[37]:
> array([10, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 15, 15, 15, 16,
>        16, 17, 17, 17, 18, 18, 19, 19, 20], dtype=uint8)
>
> i.e. create evenly spaced samples in a range, but without using an
> intermediate array to create it (memory footprint is important).
>
> I know that I can create this with a loop, but I'm curious whether it
> can be done more compactly, in a single statement.
>
> Thanks,
>
Could a "dtype" argument be added to linspace() -- there are many
similar functions. (I remember for example fromfunction() )

-Sebastian

From matthieu.brucher at gmail.com Tue Feb 5 05:57:45 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Tue, 5 Feb 2008 11:57:45 +0100
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To:
References: <819625.25707.qm@web34405.mail.mud.yahoo.com>
	<47A77019.1020204@noaa.gov>
	<20080205082139.GE9772@phare.normalesup.org>
Message-ID:

> > This is what SWIG must be doing internally -- right ?!

Yes, it is, with an additional typemap that checks the type of the data.
I don't think it is a good idea for numpy to add such multi-dispatching;
it is not its job. There are a lot of ways to do it, and besides it
would be very cumbersome to have something specific to Numpy (that is,
only for Numpy's types), because when ctypes is used with something
other than Numpy, it would be a nightmare.
The only thing you have to do to have your own dispatcher is to wrap
the C functions in a Python function that calls the right one. It is
very straightforward, so you might think that this solution could be
integrated into Numpy, but:
- for such simple code, it is not very useful; a wiki entry can do
better
- if you have a double dispatch to do, you would have to implement it
yourself, so you would have to check the code (or the wiki entry) and
you would send a mail to include a double dispatcher in Numpy
- etc. for a triple dispatcher, a fourth one, ...

Matthieu
-- 
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From david at ar.media.kyoto-u.ac.jp Tue Feb 5 05:52:13 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 05 Feb 2008 19:52:13 +0900
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <20080205105009.GF15367@phare.normalesup.org>
References: <819625.25707.qm@web34405.mail.mud.yahoo.com>
	<47A77019.1020204@noaa.gov>
	<20080205082139.GE9772@phare.normalesup.org>
	<20080205105009.GF15367@phare.normalesup.org>
Message-ID: <47A83FDD.6020304@ar.media.kyoto-u.ac.jp>

Gael Varoquaux wrote:
> On Tue, Feb 05, 2008 at 11:48:37AM +0100, Sebastian Haase wrote:
>> Thanks for the reply.
>> How about "manual" overloading.
I mean, if -- for example -- I have >> two functions mmms_b and mmms_i in C, I could still use ctypes; could >> I then "merge" them into one python function, which "re-routes" >> depending on the argument dtype !? > > Yes, that's exactly what I do (except I rarely use C++, so I reroute to > different C functions). It doesn't scale well, but it's OK if you have > only a few functions to worry about. It does not scale well if you do it manually, but I don't see any reason why it cannot be automated. ctypes main developer, Thomas Heller, has developed a code generator which parses C headers (which uses gcc-xml, that is can parse anything you can throw at gcc) and gives you ctypes-compatible code. It can be used on windows.h, which is the most horrendous/biggest header I can think of :) When using C++, you would need to find a way to generate C headers (needed anyway because you cannot dynamically load C++ code and used functions with C++ linkage, at least in a cross platform way). I have never used the code generator in such a way (handling multiple types), but I have used it to wrap win32 functions for sound IO, and it works pretty well. cheers, David From ondrej at certik.cz Tue Feb 5 06:31:58 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Tue, 5 Feb 2008 12:31:58 +0100 Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: <20080205105200.GG15367@phare.normalesup.org> References: <819625.25707.qm@web34405.mail.mud.yahoo.com> <47A77019.1020204@noaa.gov> <20080205082139.GE9772@phare.normalesup.org> <47A83919.2050902@ar.media.kyoto-u.ac.jp> <85b5c3130802050248y4237ab15i4b2348b16bab9b97@mail.gmail.com> <20080205105200.GG15367@phare.normalesup.org> Message-ID: <85b5c3130802050331i2bbe2e6bke3c7b742f450830a@mail.gmail.com> On Feb 5, 2008 11:52 AM, Gael Varoquaux wrote: > On Tue, Feb 05, 2008 at 11:48:38AM +0100, Ondrej Certik wrote: > > I use Cython, mostly for the same reasons Gael is using ctypes - it's trivial. > > Actually, when I want to do something really trivial, I use > scipy.weave.inline ( see http://scipy.org/PerformancePython for an > example of scipy.weave.inline use). Of course it doesn't work when > linking to external libraries, but to accelerate a for loop, it's great. Yep. The power of Cython/Pyrex strategy is that you can work as I described here: http://ondrej.certik.cz/development/ Ondrej From ndbecker2 at gmail.com Tue Feb 5 06:55:27 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 05 Feb 2008 06:55:27 -0500 Subject: [Numpy-discussion] C-api to slicing? Message-ID: Is there a C-api to array slicing? From lou_boog2000 at yahoo.com Tue Feb 5 09:45:25 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 5 Feb 2008 06:45:25 -0800 (PST) Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: <20080205082139.GE9772@phare.normalesup.org> Message-ID: <921458.42650.qm@web34409.mail.mud.yahoo.com> Hmmm... last time I tried ctypes it seemed pretty Windows oriented and I got nowhere. But enough people have said how easy it is that I'll give it another try. Believe me, I'd be happy to be wrong and find a nice easy way to pass NumPy arrays and such. Thanks. -- Lou Pecora --- Gael Varoquaux wrote: > On Tue, Feb 05, 2008 at 09:15:29AM +0100, Sebastian > Haase wrote: > > Can ctypes do this ? > > No. Ctypes is only a way of loading C (and not C++) > libraries in Python. > That makes it very simple, but not very powerful. 
> Ga?l ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From gael.varoquaux at normalesup.org Tue Feb 5 09:54:20 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 5 Feb 2008 15:54:20 +0100 Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: <921458.42650.qm@web34409.mail.mud.yahoo.com> References: <20080205082139.GE9772@phare.normalesup.org> <921458.42650.qm@web34409.mail.mud.yahoo.com> Message-ID: <20080205145420.GA27188@phare.normalesup.org> On Tue, Feb 05, 2008 at 06:45:25AM -0800, Lou Pecora wrote: > Hmmm... last time I tried ctypes it seemed pretty > Windows oriented and I got nowhere. But enough people > have said how easy it is that I'll give it another > try. I don't use windows much. One thing I liked about ctypes when I used it, was that what I found it pretty easy to get working on both Linux and Windows. Ga?l From lou_boog2000 at yahoo.com Tue Feb 5 10:16:01 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 5 Feb 2008 07:16:01 -0800 (PST) Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: <20080205145420.GA27188@phare.normalesup.org> Message-ID: <757157.7783.qm@web34404.mail.mud.yahoo.com> --- Gael Varoquaux wrote: Re: ctypes > I don't use windows much. One thing I liked about > ctypes when I used it, > was that what I found it pretty easy to get working > on both Linux and > Windows. > > Ga?l I got ctypes to install easily on Mac OS X 10.4.11 and it passed the test using python setup.py test. Now I have to find some examples on using it and learn to compile shared libraries (.so type I guess). -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Looking for last minute shopping deals? Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping From ceball at users.sourceforge.net Tue Feb 5 10:22:53 2008 From: ceball at users.sourceforge.net (Chris Ball) Date: Tue, 5 Feb 2008 15:22:53 +0000 (UTC) Subject: [Numpy-discussion] Problem accessing elements of an array of dtype="O" from C Message-ID: Hi, I'm having some trouble accessing elements in an array of dtype="O" from C code; I hope someone on the list could give me some advice (because I might be doing something stupid). I have an array of simple objects, created as follows: class CF(object): def __init__(self,num=0.0): self.num=num from numpy import array objs = array([[CF(0.0),CF(0.1),CF(0.2)], [CF(1.0),CF(1.1),CF(1.2)]],dtype=object) I'd like to loop through this array and access the 'num' attribute of each CF object - but using C. I have a C function (based on an example in the numpy book - 'Basic Iteration', page 312): double loop(PyObject* a_){ PyArrayIterObject *iter; iter = (PyArrayIterObject *)PyArray_IterNew(a_); while (iter->index < iter->size) { PyObject *cf = (PyObject *)(iter->dataptr); PyObject *num_obj = PyObject_GetAttrString(cf,"num"); PyArray_ITER_NEXT(iter); } return 0.0; } I get a segmentation fault when I try to call loop(objs). (Commenting out the 'PyObject_GetAttrString' line, I do not get a segmentation fault.) Using instead an array of floats and a very similar function, I have no problems: from numpy import array objs = array([[1.0,2.0,3.0],[4.0,5.0,6.0]],dtype=float) ... 
double loop(PyObject* a_){ PyArrayIterObject *iter; iter = (PyArrayIterObject *)PyArray_IterNew(a_); while (iter->index < iter->size) { double *num = (double *)(iter->dataptr); printf("%f\\n",*num); PyArray_ITER_NEXT(iter); } return 0.0; } I am calling this C code via Instant [1]. I have previously written to the SciPy-Dev mailing list about a strange (to me) indexing problem using arrays of dtype='O' with Weave [2]; I believe that this is the same problem (and that it is either with numpy, or with my understanding of numpy, rather than with Weave or Instant). Please could someone help me out? If you think you could help if I were to give you a runnable example of the problem in your own preferred way of calling C code from Python, please let me know, and I can probably provide one. Thanks, Chris [1] http://www.fenics.org/wiki/Instant [2] http://thread.gmane.org/gmane.comp.python.scientific.devel/7198/focus=7264 From lou_boog2000 at yahoo.com Tue Feb 5 11:25:47 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 5 Feb 2008 08:25:47 -0800 (PST) Subject: [Numpy-discussion] New to ctypes. Some problems with loading shared library. In-Reply-To: <757157.7783.qm@web34404.mail.mud.yahoo.com> Message-ID: <16514.83229.qm@web34406.mail.mud.yahoo.com> I got ctypes installed and passing its own tests. But I cannot get the shared library to load. I am using Mac OS X 10.4.11, Python 2.4 running through the Terminal. I am using Albert Strasheim's example on http://scipy.org/Cookbook/Ctypes2 except that I had to remove the defined 'extern' for FOO_API since the gcc compiler complained about two 'externs' (I don't really understand what the extern does here anyway). My make file for generating the library is simple, # ---- Link --------------------------- test1ctypes.so: test1ctypes.o test1ctypes.mak gcc -bundle -flat_namespace -undefined suppress -o test1ctypes.so test1ctypes.o # ---- gcc C compile ------------------ test1ctypes.o: test1ctypes.c test1ctypes.h test1ctypes.mak gcc -c test1ctypes.c -o test1ctypes.o This generates the file test1ctypes.so. But when I try to load it import numpy as N import ctypes as C _test1 = N.ctypeslib.load_library('test1ctypes', '.') I get the error message, OSError: dlopen(/Users/loupecora/test1ctypes.dylib, 6): image not found I've been googling for two hours trying to find the problem or other examples that would give me a clue, but no luck. Any ideas what I'm doing wrong? Thanks for any clues. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs From rmay at ou.edu Tue Feb 5 11:34:47 2008 From: rmay at ou.edu (Ryan May) Date: Tue, 05 Feb 2008 10:34:47 -0600 Subject: [Numpy-discussion] New to ctypes. Some problems with loading shared library. In-Reply-To: <16514.83229.qm@web34406.mail.mud.yahoo.com> References: <16514.83229.qm@web34406.mail.mud.yahoo.com> Message-ID: <47A89027.1000401@ou.edu> Lou Pecora wrote: > I got ctypes installed and passing its own tests. But > I cannot get the shared library to load. I am using > Mac OS X 10.4.11, Python 2.4 running through the > Terminal. > > I am using Albert Strasheim's example on > http://scipy.org/Cookbook/Ctypes2 except that I had to > remove the defined 'extern' for FOO_API since the gcc > compiler complained about two 'externs' (I don't > really understand what the extern does here anyway). 
> > My make file for generating the library is simple, > > # ---- Link --------------------------- > test1ctypes.so: test1ctypes.o test1ctypes.mak > gcc -bundle -flat_namespace -undefined suppress -o > test1ctypes.so test1ctypes.o > > # ---- gcc C compile ------------------ > test1ctypes.o: test1ctypes.c test1ctypes.h > test1ctypes.mak > gcc -c test1ctypes.c -o test1ctypes.o > > This generates the file test1ctypes.so. But when I > try to load it > > import numpy as N > import ctypes as C > > _test1 = N.ctypeslib.load_library('test1ctypes', '.') > > I get the error message, > > OSError: dlopen(/Users/loupecora/test1ctypes.dylib, > 6): image not found > > I've been googling for two hours trying to find the > problem or other examples that would give me a clue, > but no luck. > > Any ideas what I'm doing wrong? Thanks for any clues. > Well, it's looking for test1ctypes.dylib, which I guess is a MacOSX shared library? Meanwhile, you made a test1ctypes.so, which is why it can't find it. You could try using this instead: _test1 = N.ctypeslib.load_library('test1ctypes.so', '.') or try to get gcc to make a test1ctypes.dylib. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From lou_boog2000 at yahoo.com Tue Feb 5 11:49:47 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 5 Feb 2008 08:49:47 -0800 (PST) Subject: [Numpy-discussion] New to ctypes. Some problems with loading shared library. In-Reply-To: <47A89027.1000401@ou.edu> Message-ID: <431148.29095.qm@web34401.mail.mud.yahoo.com> > Well, it's looking for test1ctypes.dylib, which I > guess is a MacOSX > shared library? Meanwhile, you made a > test1ctypes.so, which is why it > can't find it. You could try using this instead: > > _test1 = N.ctypeslib.load_library('test1ctypes.so', > '.') > > or try to get gcc to make a test1ctypes.dylib. > > Ryan > Thanks, Ryan. You were on the right track. I changed the name of the file in the load_library call to test1ctypes.so and I had to put in the full path to the file as the second argument. The default path was to my home directory. I could probably change paths with a python os call, too. Anyway, IT WORKED! How 'bout that? One simple example down and now on to more complex things. Thanks, again. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Looking for last minute shopping deals? Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping From kent-and at simula.no Tue Feb 5 13:16:02 2008 From: kent-and at simula.no (Kent-Andre Mardal) Date: Tue, 5 Feb 2008 18:16:02 +0000 (UTC) Subject: [Numpy-discussion] Numpy and C++ integration... References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com> Message-ID: Vince Fulco gmail.com> writes: > > Dear Numpy Experts- I find myself working with Numpy arrays and > wanting to access *simple* C++ functions for time series returning the > results to Numpy. As I am a relatively new user of Python/Numpy, the > number of paths to use in incorporating C++ code into one's scripts is > daunting. I've attempted the Weave app but can not get past the > examples. I've also looked at all the other choices out there such as > Boost, SIP, PyInline, etc. Any trailheads for the simplest approach > (assuming a very minimal understanding of C++) would be much > appreciated. At this point, I can't release the code however for > review. Thank you. 
We have created a small Python module Instant (www.fenics.org/instant)
on top of SWIG, which makes integration of C/C++ and NumPy arrays easy
in some cases. Its use:

import numpy
from instant import inline_with_numpy

c_code = """
double sum (int n1, double* array1){
  double tmp = 0.0;
  for (int i=0; i<n1; i++) {
      tmp += array1[i];
  }
  return tmp;
}
"""

sum_func = inline_with_numpy(c_code, arrays=[['n1', 'array1']])

From markbak at gmail.com Tue Feb 5 13:54:01 2008
From: markbak at gmail.com (mark)
Date: Tue, 5 Feb 2008 10:54:01 -0800 (PST)
Subject: [Numpy-discussion] matrix rank of numpy array or matrix
Message-ID: <85a1290d-faec-4b5a-b6bf-b10e684e6dc2@e6g2000prf.googlegroups.com>

Hello -

Is there a function to compute the matrix rank of a numpy array or
matrix?
So I don't mean the current rank(), which gives the number of
dimensions.
I mean the number of independent equations of a matrix.

Thanks,

Mark

From eads at soe.ucsc.edu Tue Feb 5 13:57:37 2008
From: eads at soe.ucsc.edu (Damian Eads)
Date: Tue, 05 Feb 2008 11:57:37 -0700
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To:
References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com>
Message-ID: <47A8B1A1.1020508@soe.ucsc.edu>

Dear Vince,

You probably have heard better solutions, but I think what I do works
and is simple to learn. When I need to call C++ code from Python, I
write a wrapper extern "C" function that calls the C++ function that
returns the result. Then I just use ctypes to call the extern "C"
function from Python.

C++/C:

extern "C" {
  double *get_result(double *input, int n) {
    return CPlusPlusFunction::GetResult(input, n);
  }
}

Python:

import ctypes
mylib = ctypes.CDLL('libmylib')
# the C wrapper returns a double*, so tell ctypes about it
mylib.get_result.restype = ctypes.POINTER(ctypes.c_double)

def get_result(A):
    # pass the array's data pointer and its length to the C wrapper
    return mylib.get_result(A.ctypes.data, len(A))

I hope this helps.

Damian

> Vince Fulco gmail.com> writes:
> > Dear Numpy Experts- I find myself working with Numpy arrays and
> wanting to access *simple* C++ functions for time series returning the
> results to Numpy. As I am a relatively new user of Python/Numpy, the
> number of paths to use in incorporating C++ code into one's scripts is
> daunting. I've attempted the Weave app but can not get past the
> examples. I've also looked at all the other choices out there such as
> Boost, SIP, PyInline, etc. Any trailheads for the simplest approach
> (assuming a very minimal understanding of C++) would be much
> appreciated. At this point, I can't release the code however for
> review. Thank you.

From nwagner at iam.uni-stuttgart.de Tue Feb 5 13:59:50 2008
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Tue, 05 Feb 2008 19:59:50 +0100
Subject: [Numpy-discussion] matrix rank of numpy array or matrix
In-Reply-To: <85a1290d-faec-4b5a-b6bf-b10e684e6dc2@e6g2000prf.googlegroups.com>
References: <85a1290d-faec-4b5a-b6bf-b10e684e6dc2@e6g2000prf.googlegroups.com>
Message-ID:

On Tue, 5 Feb 2008 10:54:01 -0800 (PST)
  mark wrote:
> Hello -
>
> Is there a function to compute the matrix rank of a
>numpy array or
> matrix?
> So I don't mean the current rank(), which gives the
>number of
> dimensions.
> I mean the number of independent equations of a matrix.
>
> Thanks,
>
> Mark
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

AFAIK, a built-in function is (still) missing.
However, you might use a singular value decomposition
to compute the numerical rank of a matrix.

See e.g.
http://osdir.com/ml/python.numeric.general/2006-02/msg00154.html Nils From kwgoodman at gmail.com Tue Feb 5 14:05:09 2008 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 5 Feb 2008 11:05:09 -0800 Subject: [Numpy-discussion] matrix rank of numpy array or matrix In-Reply-To: <85a1290d-faec-4b5a-b6bf-b10e684e6dc2@e6g2000prf.googlegroups.com> References: <85a1290d-faec-4b5a-b6bf-b10e684e6dc2@e6g2000prf.googlegroups.com> Message-ID: On Feb 5, 2008 10:54 AM, mark wrote: > Is there a function to compute the matrix rank of a numpy array or > matrix? I'm sure there's a more direct way, but numpy.linalg.lstsq returns the rank of a matrix. From markbak at gmail.com Tue Feb 5 14:37:12 2008 From: markbak at gmail.com (mark) Date: Tue, 5 Feb 2008 11:37:12 -0800 (PST) Subject: [Numpy-discussion] matrix rank of numpy array or matrix In-Reply-To: References: <85a1290d-faec-4b5a-b6bf-b10e684e6dc2@e6g2000prf.googlegroups.com> Message-ID: Thanks. I rewrote the line as: from numpy.linalg import svd from numpy import sum,where def matrixrank(A,tol=1e-8): s = svd(A,compute_uv=0) return sum( where( s>tol, 1, 0 ) ) Would be nice to include matrixrank in numpy, as it is really useful, Thanks again, Mark On Feb 5, 7:59 pm, "Nils Wagner" wrote: > On Tue, 5 Feb 2008 10:54:01 -0800 (PST) > mark wrote: > > > > > Hello - > > > Is there a function to compute the matrix rank of a > >numpy array or > > matrix? > > So I don't mean the current rank(), which gives the > >number of > > dimensions. > > I mean the number of independent equations of a matrix. > > > Thanks, > > > Mark > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discuss... at scipy.org > >http://projects.scipy.org/mailman/listinfo/numpy-discussion > > AFAIK, a build-in function is (still) missing. > However you might use a singular value decomposition > to compute the numerical rank of a matrix. > > See e.g.http://osdir.com/ml/python.numeric.general/2006-02/msg00154.html > > Nils > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discuss... at scipy.orghttp://projects.scipy.org/mailman/listinfo/numpy-discussion From nwagner at iam.uni-stuttgart.de Tue Feb 5 14:48:52 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 05 Feb 2008 20:48:52 +0100 Subject: [Numpy-discussion] matrix rank of numpy array or matrix In-Reply-To: References: <85a1290d-faec-4b5a-b6bf-b10e684e6dc2@e6g2000prf.googlegroups.com> Message-ID: On Tue, 5 Feb 2008 11:37:12 -0800 (PST) mark wrote: > Thanks. > I rewrote the line as: > > from numpy.linalg import svd > from numpy import sum,where > > def matrixrank(A,tol=1e-8): > s = svd(A,compute_uv=0) > return sum( where( s>tol, 1, 0 ) ) > > Would be nice to include matrixrank in numpy, as it is >really useful, +1 And a nullspace function - how about that ? See http://aspn.activestate.com/ASPN/Mail/Message/scipy-user/2726126 Nils From robert.kern at gmail.com Tue Feb 5 15:31:22 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 05 Feb 2008 14:31:22 -0600 Subject: [Numpy-discussion] C-api to slicing? In-Reply-To: References: Message-ID: <47A8C79A.9030903@gmail.com> Neal Becker wrote: > Is there a C-api to array slicing? PyObject_GetItem(), PySlice_New(), and friends, for the most part. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Glen.Mabey at swri.org Tue Feb 5 17:26:38 2008 From: Glen.Mabey at swri.org (Glen W. Mabey) Date: Tue, 5 Feb 2008 16:26:38 -0600 Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com> Message-ID: <20080205222638.GA16998@bams.ccf.swri.edu> On Tue, Feb 05, 2008 at 12:16:02PM -0600, Kent-Andre Mardal wrote: > We have created a small Python module Instant (www.fenics.org/instant) on top > of SWIG, which makes integration of C/C++ and NumPy arrays easy in some cases. Hello, Thank you for posting about instant. I think it looks like a great idea and hope to try it out soon. I noticed that you are distributing under the GPL. Would you consider releasing it under a more permissive license? Some rationale is given here: http://www.scipy.org/License_Compatibility Best Regards, Glen mabey From cfinley at u.washington.edu Tue Feb 5 17:34:25 2008 From: cfinley at u.washington.edu (Chris Finley) Date: Tue, 05 Feb 2008 14:34:25 -0800 Subject: [Numpy-discussion] Stride of 2 for correlate() Message-ID: <1202250865.16296.4.camel@sdf-workstation> Greetings, After searching the archives, I was unable to find a good method for changing the stride of the correlate or convolve routines. I am doing a Daubechies analysis of some sample data, say data = arange(0:80). The coefficient array or four floats (say daub_g2[0:4]) is correlated over the data. However, the product only needs to be calculated for every other data index. For example, I need to take the inner product of: data[0:4] with daub_g2[0:4] then data[2:6] with daub_g2[0:4] then data[4:8] with daub_g2[0:4] and so on. You help is greatly appreciated, Chris From peridot.faceted at gmail.com Tue Feb 5 19:09:04 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 5 Feb 2008 19:09:04 -0500 Subject: [Numpy-discussion] Stride of 2 for correlate() In-Reply-To: <1202250865.16296.4.camel@sdf-workstation> References: <1202250865.16296.4.camel@sdf-workstation> Message-ID: On 05/02/2008, Chris Finley wrote: > After searching the archives, I was unable to find a good method for > changing the stride of the correlate or convolve routines. I am doing a > Daubechies analysis of some sample data, say data = arange(0:80). The > coefficient array or four floats (say daub_g2[0:4]) is correlated over > the data. However, the product only needs to be calculated for every > other data index. I don't think that correlate or convolve can be made to behave this way. You can of course just throw away half the values, but I imagine you'd like it to be reasonably fast (do compare, though, speed tradeoffs can be surprising, particularly in python). This sort of thing comes up from time to time, so I took some old code I posted to scipy-user and put it in the cookbook at http://www.scipy.org/Cookbook/SegmentAxis . It works by converting your input array into a (in this case) 4 by n matrix with overlapping rows (without copying). You can then do what you like with the resulting array; convolutions just become matrix products. It's conceivable (though frankly unlikely) that the BLAS acceleration of matrix multiplication might make this run faster than correlate(). Good luck, Anne From oliphant at enthought.com Tue Feb 5 19:32:37 2008 From: oliphant at enthought.com (Travis E. 
Oliphant)
Date: Tue, 05 Feb 2008 18:32:37 -0600
Subject: [Numpy-discussion] Problem accessing elements of an array of
	dtype="O" from C
In-Reply-To:
References:
Message-ID: <47A90025.4010508@enthought.com>

Chris Ball wrote:
> Hi,
>
> I'm having some trouble accessing elements in an array of dtype="O"
> from C code; I hope someone on the list could give me some advice
> (because I might be doing something stupid).
>
> I have an array of simple objects, created as follows:
>
> class CF(object):
>     def __init__(self,num=0.0):
>         self.num=num
>
> from numpy import array
> objs = array([[CF(0.0),CF(0.1),CF(0.2)],
>               [CF(1.0),CF(1.1),CF(1.2)]],dtype=object)
>
>
> I'd like to loop through this array and access the 'num' attribute of
> each CF object - but using C.
>
> I have a C function (based on an example in the numpy book - 'Basic
> Iteration', page 312):
>
> double loop(PyObject* a_){
>
>     PyArrayIterObject *iter;
>     iter = (PyArrayIterObject *)PyArray_IterNew(a_);
>
>     while (iter->index < iter->size) {
>         PyObject *cf = (PyObject *)(iter->dataptr);
>         PyObject *num_obj = PyObject_GetAttrString(cf,"num");
>         PyArray_ITER_NEXT(iter);
>     }
>     return 0.0;
> }
>
The problem here is that iter->dataptr should be re-cast to a PyObject
** because what is contained at the memory location is a *pointer* to
the PyObject. Thus, you have to de-reference iter->dataptr to get the
PyObject * that you want:

PyObject **cf = (PyObject **)PyArray_ITER_DATA(iter);
PyObject *num_obj = PyObject_GetAttrString(*cf, "num");
PyArray_ITER_NEXT(iter);

should do what you want.

-Travis O.

From humufr at yahoo.fr Tue Feb 5 14:58:40 2008
From: humufr at yahoo.fr (humufr at yahoo.fr)
Date: Tue, 5 Feb 2008 14:58:40 -0500
Subject: [Numpy-discussion] [Bug] important bug in method sum
Message-ID: <200802051458.41290.humufr@yahoo.fr>

Hello,

when doing some tests I saw a very important bug in numpy (at least on
the svn version and 1.0.3 (ubuntu package)).

I'm using a svn version of numpy:

In [31]: numpy.__version__
Out[31]: '1.0.5.dev4767'

The problem is for an array larger than 256*256 the sum is going crazy.

In [45]: numpy.arange(256*256)
Out[45]: array([ 0, 1, 2, ..., 65533, 65534, 65535])

In [46]: numpy.arange(256*256).sum()
Out[46]: 2147450880

In [47]: numpy.arange(257*257)
Out[47]: array([ 0, 1, 2, ..., 66046, 66047, 66048])

In [48]: numpy.arange(257*257).sum()
Out[48]: -2113765120

>>> import numpy
>>> numpy.arange(256*256).sum()
2147450880
>>> numpy.arange(257*257).sum()
-2113765120
>>> numpy.__version__
'1.0.3'

Sorry for this bad news.

N.

ps: my system is an ubuntu linux 32

From humufr at yahoo.fr Tue Feb 5 15:20:09 2008
From: humufr at yahoo.fr (humufr at yahoo.fr)
Date: Tue, 5 Feb 2008 15:20:09 -0500
Subject: [Numpy-discussion] [Bug] important bug in method sum
Message-ID: <200802051520.09863.humufr@yahoo.fr>

Sorry, it's not really a bug. I understood why: it's an integer and I'm
doing an overflow. Perhaps an error message could be printed, or an
automatic change (with a warning) could be done. I think that I prefer
to lose the type but keep the value correct.

N.

From kwgoodman at gmail.com Tue Feb 5 23:27:46 2008
From: kwgoodman at gmail.com (Keith Goodman)
Date: Tue, 5 Feb 2008 20:27:46 -0800
Subject: [Numpy-discussion] [Bug] important bug in method sum
In-Reply-To: <200802051458.41290.humufr@yahoo.fr>
References: <200802051458.41290.humufr@yahoo.fr>
Message-ID:

On Feb 5, 2008 11:58 AM,  wrote:
> The problem is for an array larger than 256*256 the sum is going crazy.
> > In [45]: numpy.arange(256*256) > Out[45]: array([ 0, 1, 2, ..., 65533, 65534, 65535]) > > In [46]: numpy.arange(256*256).sum() > Out[46]: 2147450880 > > In [47]: numpy.arange(257*257) > Out[47]: array([ 0, 1, 2, ..., 66046, 66047, 66048]) > > In [48]: numpy.arange(257*257).sum() > Out[48]: -2113765120 You hit the limit on how big an integer can be. You'll have to switch to floats to do the sum: >> numpy.arange(257*257, dtype=numpy.float64).sum() 2181202176.0 From charlesr.harris at gmail.com Wed Feb 6 00:03:58 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Feb 2008 22:03:58 -0700 Subject: [Numpy-discussion] [Bug] important bug in method sum In-Reply-To: References: <200802051458.41290.humufr@yahoo.fr> Message-ID: On Feb 5, 2008 9:27 PM, Keith Goodman wrote: > On Feb 5, 2008 11:58 AM, wrote: > > The problem is for an array larger than 256*256 the sum is going crazy. > > > > In [45]: numpy.arange(256*256) > > Out[45]: array([ 0, 1, 2, ..., 65533, 65534, 65535]) > > > > In [46]: numpy.arange(256*256).sum() > > Out[46]: 2147450880 > > > > In [47]: numpy.arange(257*257) > > Out[47]: array([ 0, 1, 2, ..., 66046, 66047, 66048]) > > > > In [48]: numpy.arange(257*257).sum() > > Out[48]: -2113765120 > > You hit the limit on how big an integer can be. You'll have to switch > to floats to do the sum: > > >> numpy.arange(257*257, dtype=numpy.float64).sum() > 2181202176.0 Or tell numpy to use float64 for the sum: In [6]: a = arange(257*257) In [7]: a.sum(dtype=float64) Out[7]: 2181202176.0 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dg.numpy at thesamovar.net Wed Feb 6 07:11:28 2008 From: dg.numpy at thesamovar.net (Dan Goodman) Date: Wed, 6 Feb 2008 12:11:28 +0000 (UTC) Subject: [Numpy-discussion] Bug in numpy all() function Message-ID: Hi all, I think this is a bug (I'm running Numpy 1.0.3.1): >>> from numpy import * >>> def f(x): return False >>> all(f(x) for x in range(10)) True I guess the all function doesn't know about generators? Dan From haase at msg.ucsf.edu Wed Feb 6 07:33:53 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Wed, 6 Feb 2008 13:33:53 +0100 Subject: [Numpy-discussion] Numpy and C++ integration... In-Reply-To: <20080205222638.GA16998@bams.ccf.swri.edu> References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com> <20080205222638.GA16998@bams.ccf.swri.edu> Message-ID: How does Instant compare to scipy.weave !? -Sebastian Haase On Feb 5, 2008 11:26 PM, Glen W. Mabey wrote: > On Tue, Feb 05, 2008 at 12:16:02PM -0600, Kent-Andre Mardal wrote: > > We have created a small Python module Instant (www.fenics.org/instant) on top > > of SWIG, which makes integration of C/C++ and NumPy arrays easy in some cases. > > Hello, > > Thank you for posting about instant. I think it looks like a great > idea and hope to try it out soon. > > I noticed that you are distributing under the GPL. > > Would you consider releasing it under a more permissive license? 
> > Some rationale is given here:
> >
> > http://www.scipy.org/License_Compatibility
>
> Best Regards,
> Glen mabey
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

From david at ar.media.kyoto-u.ac.jp Wed Feb 6 07:27:58 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 06 Feb 2008 21:27:58 +0900
Subject: [Numpy-discussion] [ANN] Blas-LAPACK superpack, 2nd alpha
Message-ID: <47A9A7CE.8070201@ar.media.kyoto-u.ac.jp>

Hi,

I have finished a second alpha of the BLAS/LAPACK superpack for
windows:

http://www.ar.media.kyoto-u.ac.jp/members/david/archives/blas-lapack-superpack.exe
(~ 9 Mb).

Changes from the first alpha:
- Both SSE3 and SSE2 are supported.
- Custom installation is possible: you can choose to install .a, .dll,
.def and .lib for any target, atlas or netlib.

The super pack is an installer which by default installs the most
optimized blas/lapack possible to compile numpy/scipy with.

cheers,

David

From ndbecker2 at gmail.com Wed Feb 6 08:10:28 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Wed, 06 Feb 2008 08:10:28 -0500
Subject: [Numpy-discussion] random enhancement
Message-ID:

One thing missing from random is a mechanism to share a single
underlying rng with other code that is not part of numpy.random.

For example, I have code that generates distributions that expect a
mersenne twister (the shared, underlying rng) to be passed in as a
constructor argument.

numpy.random shares a single rng amongst its own distributions, but I
don't see any way to share it with others.

Ideally, I believe this is the preferred design:

rng1 = mersenne_twister (seed = 0)

poisson (rng1, lambda=4)
uniform (rng1, min=0, max=4)

It would be best if numpy.random adopted this approach. Since I
understand that's not likely, the alternative is for numpy.random to
add some API that would allow direct access to the shared rng object.

From Glen.Mabey at swri.org Wed Feb 6 09:32:25 2008
From: Glen.Mabey at swri.org (Glen W. Mabey)
Date: Wed, 6 Feb 2008 08:32:25 -0600
Subject: [Numpy-discussion] Numpy and C++ integration...
In-Reply-To: <1202289823.16333.43.camel@localhost>
References: <34f2770f0802040802q1c493978s28dbfafe3405aa27@mail.gmail.com>
	<20080205222638.GA16998@bams.ccf.swri.edu>
	<1202289823.16333.43.camel@localhost>
Message-ID: <20080206143225.GA21564@bams.ccf.swri.edu>

On Wed, Feb 06, 2008 at 03:23:43AM -0600, Kent-Andre Mardal wrote:
> No problem, it is now under BSD. OK?

Perfect. Thank you.

Glen

From rmay at ou.edu Wed Feb 6 10:11:20 2008
From: rmay at ou.edu (Ryan May)
Date: Wed, 06 Feb 2008 09:11:20 -0600
Subject: [Numpy-discussion] Bug in numpy all() function
In-Reply-To:
References:
Message-ID: <47A9CE18.3020805@ou.edu>

Dan Goodman wrote:
> Hi all,
>
> I think this is a bug (I'm running Numpy 1.0.3.1):
>
>>>> from numpy import *
>>>> def f(x): return False
>
>>>> all(f(x) for x in range(10))
> True
>
> I guess the all function doesn't know about generators?
>

That's likely the problem. However, as of Python 2.5 there is a built-in
function that will do what you want. Note, though, that you would mask
that builtin with the "from numpy import *".
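For instance, a small sketch of the two behaviours side by side
(assuming Python >= 2.5; going through the __builtin__ module is just
one way to reach the shadowed name):

import __builtin__
from numpy import *   # numpy's all() now shadows the builtin all()

def f(x):
    return False

print all(f(x) for x in range(10))              # numpy's all(): True (wrong)
print __builtin__.all(f(x) for x in range(10))  # builtin all(): False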
Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

From matthew.yeomans at gmail.com Wed Feb 6 10:23:01 2008
From: matthew.yeomans at gmail.com (matthew yeomans)
Date: Wed, 6 Feb 2008 16:23:01 +0100
Subject: [Numpy-discussion] Numpy-discussion Digest, Vol 17, Issue 13
In-Reply-To:
References:
Message-ID: <4ff732450802060723s773c5873i7634a847a3179c21@mail.gmail.com>

Is it possible to compile numpy with py2exe?

Matthew Yeomans

-- 
Kollox Ghal Xejn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From listservs at mac.com Wed Feb 6 13:35:35 2008
From: listservs at mac.com (Chris)
Date: Wed, 6 Feb 2008 18:35:35 +0000 (UTC)
Subject: [Numpy-discussion] f2py compiled module not found by python
Message-ID:

Hello,

I'm trying to build a package on Linux (Ubuntu) that contains a fortran
module, built using f2py. However, despite the module building and
installing without error, python cannot seem to see it (see log below).
This works fine on Windows and Mac; the problem only seems to happen on Linux: In [1]: import PyMC ----------------------------------------------- exceptions.ImportError Traceback (most recent call last) /home/tianhuil/?ipython console> /usr/lib/python2.4/site-packages/PyMC/__init__.py /home/tianhuil/?string> /usr/lib/python2.4/site-packages/PyMC/MCMC.py ImportError: No module named flib /usr/lib/python2.4/site-packages/PyMC/MCMC.py Notice that the module exists in the site-packages directory: tianhuil tianhuil:/usr/lib/python2.4/site-packages/PyMC$ ll total 432 drwxr-xr-x 2 root root 4096 2008-02-03 17:24 Backends -rwxrwx--- 1 root root 195890 2008-02-03 17:24 flib.so -rwxrwx--- 1 root root 259 2008-02-03 17:14 __init__.py -rw-r--r-- 1 root root 473 2008-02-03 17:24 __init__.pyc -rwxrwx--- 1 root root 10250 2008-02-03 17:14 Matplot.py -rw-r--r-- 1 root root 7516 2008-02-03 17:24 Matplot.pyc -rwxrwx--- 1 root root 98274 2008-02-03 17:14 MCMC.py -rw-r--r-- 1 root root 79039 2008-02-03 17:24 MCMC.pyc drwxr-xr-x 2 root root 4096 2008-02-03 17:24 Tests -rwxrwx--- 1 root root 6631 2008-02-03 17:14 TimeSeries.py -rw-r--r-- 1 root root 5043 2008-02-03 17:24 TimeSeries.pyc From pearu at cens.ioc.ee Wed Feb 6 13:58:07 2008 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Wed, 6 Feb 2008 20:58:07 +0200 (EET) Subject: [Numpy-discussion] f2py compiled module not found by python In-Reply-To: References: Message-ID: <61111.85.166.27.136.1202324287.squirrel@cens.ioc.ee> On Wed, February 6, 2008 8:35 pm, Chris wrote: > Hello, > > I'm trying to build a package on Linux (Ubuntu) that contains a fortran > module, built using f2py. However, despite the module building and > installing without error, python cannot seem to see it (see log below). > This works fine on Windows and Mac; the problem only seems to > happen on Linux: Can you import flib module directly? That is, what happens if you execute cd .../PyMC PYTHONPATH=. python -c 'import flib' ? Pearu From robert.kern at gmail.com Wed Feb 6 15:03:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 06 Feb 2008 14:03:20 -0600 Subject: [Numpy-discussion] Bug in numpy all() function In-Reply-To: References: Message-ID: <47AA1288.6070700@gmail.com> Dan Goodman wrote: > Hi all, > > I think this is a bug (I'm running Numpy 1.0.3.1): > >>>> from numpy import * >>>> def f(x): return False > >>>> all(f(x) for x in range(10)) > True > > I guess the all function doesn't know about generators? Yup. It works on arrays and things it can turn into arrays by calling the C API equivalent of numpy.asarray(). There's a ton of magic and special cases in asarray() in order to interpret nested Python sequences as arrays. That magic works fairly well when we have sequences with known lengths; it fails utterly when given an arbitrary iterator of unknown length. So we punt. Unfortunately, what happens then is that asarray() sees an object that it can't interpret as a sequence to turn into a real array, so it makes a rank-0 array with the iterator object as the value. This evaluates to True. It's possible that asarray() should raise an exception for generators, but it would be a special case. We wouldn't be able to test for arbitrary iterables. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
From listservs at mac.com  Wed Feb  6 15:13:04 2008
From: listservs at mac.com (Chris)
Date: Wed, 6 Feb 2008 20:13:04 +0000 (UTC)
Subject: [Numpy-discussion] f2py compiled module not found by python
References: <61111.85.166.27.136.1202324287.squirrel@cens.ioc.ee>
Message-ID:

Pearu Peterson <pearu at cens.ioc.ee> writes:
> > This works fine on Windows and Mac; the problem only seems to
> > happen on Linux:
>
> Can you import flib module directly? That is, what happens if you
> execute
>   cd .../PyMC
>   PYTHONPATH=. python -c 'import flib'

It gives a "no module named flib" error.

From robert.kern at gmail.com  Wed Feb  6 15:15:18 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 06 Feb 2008 14:15:18 -0600
Subject: [Numpy-discussion] random enhancement
In-Reply-To:
References:
Message-ID: <47AA1556.8070708@gmail.com>

Neal Becker wrote:
> One thing missing from random is a mechanism to share a single underlying
> rng with other code that is not part of numpy.random.
>
> For example, I have code that generates distributions that expect a mersenne
> twister (the shared, underlying rng) to be passed in as a constructor
> argument.
>
> numpy.random shares a single rng amongst it's own distributions, but I don't
> see any way to share with others.

Are you talking about C or Python?

In Python, just instantiate numpy.random.RandomState() and pass it around. All
of the functions in numpy.random are just aliases to the methods on a global
RandomState() instance.

C is a problem because the module is implemented in Pyrex, and RandomState is an
extension type. I've tried working on exposing the C API as a PyCObject like
numpy does, but it is incredibly tedious and, furthermore, is unlikely to
capture the higher-level methods like multivariate_normal(). I believe that
Cython has a way to automatically expose the C API of a Pyrex/Cython module, but
I haven't had the time to investigate it.

For everything but the higher level methods like multivariate_normal(), we might
be able to expose the pointer to the rk_state struct on the RandomState object
as a PyCObject and punt on exposing the API. The C user can copy the
randomkit.[ch] and distributions.[ch] files into their own code and operate on
the rk_state pointer with those functions. We may be thwarted by symbol
conflicts on some platforms, but I'm not sure.

Contributions are, of course, welcome.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
 -- Umberto Eco
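A minimal sketch of that Python-level recipe (the seed and the particular
distributions drawn here are arbitrary illustrations):

import numpy as np

# One underlying Mersenne Twister, created once and passed around explicitly.
rng = np.random.RandomState(0)

def poisson_plus_noise(rng, n):
    # code that needs randomness takes the shared rng as an argument
    return rng.poisson(lam=4, size=n) + rng.uniform(0, 4, size=n)

print poisson_plus_noise(rng, 5)
print rng.standard_normal(3)   # every draw advances the same shared state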
From peridot.faceted at gmail.com  Wed Feb  6 15:18:38 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Wed, 6 Feb 2008 15:18:38 -0500
Subject: [Numpy-discussion] Bug in numpy all() function
In-Reply-To: <47AA1288.6070700@gmail.com>
References: <47AA1288.6070700@gmail.com>
Message-ID:

On 06/02/2008, Robert Kern wrote:
> > I guess the all function doesn't know about generators?
>
> Yup. It works on arrays and things it can turn into arrays by calling the C API
> equivalent of numpy.asarray(). There's a ton of magic and special cases in
> asarray() in order to interpret nested Python sequences as arrays. That magic
> works fairly well when we have sequences with known lengths; it fails utterly
> when given an arbitrary iterator of unknown length. So we punt. Unfortunately,
> what happens then is that asarray() sees an object that it can't interpret as a
> sequence to turn into a real array, so it makes a rank-0 array with the iterator
> object as the value. This evaluates to True.
>
> It's possible that asarray() should raise an exception for generators, but it
> would be a special case. We wouldn't be able to test for arbitrary iterables.

Would it be possible for asarray() to pull out the first element from
the iterable, make an array out of it, then assume that all other
values out of the iterable will have the same shape (raising an error,
of course, when they aren't)? I guess this has high foot-shooting
potential, but is it that much worse than numpy's shape-guessing
generally?

It would be handy to be able to use an iterable to fill an array, so
that you'd never need to store the values in anything else first:

a = N.array((sin(N.pi*x/n) for x in xrange(n)))

Anne

From robert.kern at gmail.com  Wed Feb  6 15:51:29 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 06 Feb 2008 14:51:29 -0600
Subject: [Numpy-discussion] Bug in numpy all() function
In-Reply-To:
References: <47AA1288.6070700@gmail.com>
Message-ID: <47AA1DD1.7090504@gmail.com>

Anne Archibald wrote:
> On 06/02/2008, Robert Kern wrote:
>
>>> I guess the all function doesn't know about generators?
>> Yup. It works on arrays and things it can turn into arrays by calling the C API
>> equivalent of numpy.asarray(). There's a ton of magic and special cases in
>> asarray() in order to interpret nested Python sequences as arrays. That magic
>> works fairly well when we have sequences with known lengths; it fails utterly
>> when given an arbitrary iterator of unknown length. So we punt. Unfortunately,
>> what happens then is that asarray() sees an object that it can't interpret as a
>> sequence to turn into a real array, so it makes a rank-0 array with the iterator
>> object as the value. This evaluates to True.
>>
>> It's possible that asarray() should raise an exception for generators, but it
>> would be a special case. We wouldn't be able to test for arbitrary iterables.
>
> Would it be possible for asarray() to pull out the first element from
> the iterable, make an array out of it, then assume that all other
> values out of the iterable will have the same shape (raising an error,
> of course, when they aren't)? I guess this has high foot-shooting
> potential, but is it that much worse than numpy's shape-guessing
> generally?

I'm skeptical. Personally, it comes down to this: if you provide code that
implements this safely and efficiently without making a confusing API, I'm more
than happy to consider it for inclusion. But I'm not going to spend time trying
to write the code.

> It would be handy to be able to use an iterable to fill an array, so
> that you'd never need to store the values in anything else first:
>
> a = N.array((sin(N.pi*x/n) for x in xrange(n)))

If n is large enough that storage matters,

a = N.sin(N.linspace(0, np.pi, n))

is always faster, more memory efficient, and more readable. Remember that the
array will have to be dynamically resized as we go through the iterator. The
memory movement is going to wipe out much of the benefit of having an iterator
in the first place.

For 1D arrays, remember that we have numpy.fromiter() already, so we can test this.
In [39]: import numpy as np

In [40]: from math import sin

In [41]: n = 10

In [42]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
100000 loops, best of 3: 11.5 µs per loop

In [43]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 26.1 µs per loop

In [44]: n = 100

In [45]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
10000 loops, best of 3: 84 µs per loop

In [46]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 32.3 µs per loop

In [47]: n = 1000

In [48]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
1000 loops, best of 3: 794 µs per loop

In [49]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 91.8 µs per loop

So, for n=10, the generator wins, but is n=10 really the case that you want to
use a generator for?

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
 -- Umberto Eco

From steve at shrogers.com  Wed Feb  6 20:51:10 2008
From: steve at shrogers.com (Steven H. Rogers)
Date: Wed, 06 Feb 2008 18:51:10 -0700
Subject: [Numpy-discussion] Numpy-discussion Digest, Vol 17, Issue 13
In-Reply-To: <4ff732450802060723s773c5873i7634a847a3179c21@mail.gmail.com>
References: <4ff732450802060723s773c5873i7634a847a3179c21@mail.gmail.com>
Message-ID: <47AA640E.2030006@shrogers.com>

matthew yeomans wrote:
> Is it possible to compile numpy with py2exe?
>
> Matthew Yeomans
>
If you mean to generate a Windows executable using py2exe, the answer is
yes. The process isn't what is usually thought of as compilation as it just
packages the Python interpreter, your application code, and required
libraries for easy distribution.

# Steve

From dmitrey.kroshko at scipy.org  Thu Feb  7 08:41:41 2008
From: dmitrey.kroshko at scipy.org (dmitrey)
Date: Thu, 07 Feb 2008 15:41:41 +0200
Subject: [Numpy-discussion] isn't it a bug? (matrix multiplication)
Message-ID: <47AB0A95.8050809@scipy.org>

from numpy import array
a = array((1.0, 2.0))

b = c = 15
b = b*a  # ok
c *= a  # ok

d = array(15)
e = array(15)
d = d*a  # this works ok
e *= a  # this intended to be same as prev line, but yields error:
Traceback (innermost last):
  File "", line 1, in
ValueError: invalid return array shape

From aisaac at american.edu  Thu Feb  7 09:10:59 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 7 Feb 2008 09:10:59 -0500
Subject: [Numpy-discussion] isn't it a bug? (matrix multiplication)
In-Reply-To: <47AB0A95.8050809@scipy.org>
References: <47AB0A95.8050809@scipy.org>
Message-ID:

On Thu, 07 Feb 2008, dmitrey apparently wrote:
> a = array((1.0, 2.0))
> e = array(15)
> e *= a # ... yields error:

You are trying to stuff in two values where
you have only allocated space for 1.

Cheers,
Alan Isaac

From Chris.Barker at noaa.gov  Thu Feb  7 14:10:16 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Thu, 07 Feb 2008 11:10:16 -0800
Subject: [Numpy-discussion] isn't it a bug? (matrix multiplication)
In-Reply-To:
References: <47AB0A95.8050809@scipy.org>
Message-ID: <47AB5798.3000709@noaa.gov>

Alan G Isaac wrote:
> On Thu, 07 Feb 2008, dmitrey apparently wrote:
>> a = array((1.0, 2.0))
>> e = array(15)
>> e *= a # ... yields error:
>
> You are trying to stuff in two values where
> you have only allocated space for 1.

Exactly, but to expound a bit more:

The ?= operators are in-place operators -- they attempt to modify the
left hand side in-place.
The regular math operators create a new array, which can be a different
size than either of the two operands, thanks to "array broadcasting".
x *= y should be the same as x = x*y iff the size of x*y is the same size
as x.

That's why:

>>> e
array(15)
>>> a
array([ 1.,  2.])
>>> e*=a

fails, but:

>>> a*=e
>>> a
array([ 15.,  30.])

works.

One more note: you're doing element-wise array multiplication, not matrix
multiplication -- a distinction that does matter sometimes.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
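The whole exchange in one runnable sketch (a minimal illustration; the exact
error wording varies across numpy versions -- "invalid return array shape"
is what this era prints):

import numpy as np

a = np.array([1.0, 2.0])     # shape (2,)
e = np.array(15)             # shape () -- storage for exactly one value

print np.broadcast(e, a).shape   # (2,): the product needs two slots
print e * a                      # fine: a brand-new shape-(2,) array comes back

try:
    e *= a                       # in-place: the (2,) result must fit into ()
except ValueError, err:
    print err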
NumPy does not include numscons; this merge adds scons support to
numpy.distutils, provides some scons scripts, and modifies the
configuration of numpy/core. David has extensively tested these
changes and I did a very quick sanity check to make sure I didn't
completely break everything.

Obviously, we will need to push back the 1.0.5 release date again to
ensure that there is sufficient testing. So please test these changes
and let us know if you have any problems (or successes).

David has been putting in a considerable effort over the last several
months in developing numscons. If you are interested in the
advantages of David's approach, please read the description here:
http://projects.scipy.org/scipy/numpy/wiki/NumpyScons

Thanks,

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From matthieu.brucher at gmail.com  Fri Feb  8 05:33:18 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Fri, 8 Feb 2008 11:33:18 +0100
Subject: [Numpy-discussion] bug report ?
In-Reply-To: <40196.129.194.8.8.1202466495.squirrel@webmail.obspm.fr>
References: <61111.85.166.27.136.1202324287.squirrel@cens.ioc.ee>
	<40196.129.194.8.8.1202466495.squirrel@webmail.obspm.fr>
Message-ID:

Hi,

What type is pos->dimensions in your case ? It may be long (64bits long)
instead of the expected int (32bits) or something like that ?

Matthieu

2008/2/8, Yves Revaz :
>
> Dear list,
>
> I'm using old numarray C api with numpy.
> It seems that there is a bug when using the PyArray_FromDims function.
>
> for example, if I define:
> acc = (PyArrayObject *)
> PyArray_FromDims(pos->nd,pos->dimensions,pos->descr->type_num);
>
> where pos is PyArrayObject *pos; (3x3 array)
>
> when using return PyArray_Return(acc);
> I get
> array([], shape=(3, 0), dtype=float32)
>
> It is possible to make everything work if I use the following lines
> instead:
> int ld[2];
> ld[0]=pos->dimensions[0];
> ld[1]=pos->dimensions[1];
> acc = (PyArrayObject *) PyArray_FromDims(pos->nd,ld,pos->descr->type_num);
>
> So, the problem comes from the pos->dimensions.
>
> Is it a known bug ?
>
> (I'm working on a linux 64bits machine.)
>
> Cheers,
>
> yves
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ndbecker2 at gmail.com  Fri Feb  8 07:10:25 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 08 Feb 2008 07:10:25 -0500
Subject: [Numpy-discussion] PySequence_GetItem on numpy array doesn't inc
	refcount?
Message-ID:

It seems that calling PySequence_GetItem on a PyArrayObject does not inc
refcount on the original object? That is surprising. Then, I suppose my
code is supposed to do that itself?
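Neal's question can also be probed from pure Python; a small sketch (this
uses 2-D indexing, which returns a view -- indexing a 1-D float array
returns a standalone scalar and takes no such reference):

import sys
import numpy as np

a = np.zeros((3, 3))
before = sys.getrefcount(a)

row = a[0]                # a view: it keeps its parent alive through .base
print row.base is a       # True
print sys.getrefcount(a) - before   # 1 -- a reference to 'a' was indeed taken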
From faltet at carabos.com Fri Feb 8 07:29:34 2008 From: faltet at carabos.com (Francesc Altet) Date: Fri, 8 Feb 2008 13:29:34 +0100 Subject: [Numpy-discussion] String sort Message-ID: <200802081329.35470.faltet@carabos.com> Hi, I'm a bit confused that the sort method of a string character doesn't allow a mergesort: >>> s = numpy.empty(10, "S10") >>> s.sort(kind="merge") TypeError: desired sort not supported for this type However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is "merge" (I presume because it is stable). So, perhaps the message above should be fixed. Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From ndbecker2 at gmail.com Fri Feb 8 07:39:41 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 08 Feb 2008 07:39:41 -0500 Subject: [Numpy-discussion] C-api to slicing? References: <47A8C79A.9030903@gmail.com> Message-ID: Robert Kern wrote: > Neal Becker wrote: >> Is there a C-api to array slicing? > > PyObject_GetItem(), PySlice_New(), and friends, for the most part. > I tried PySequence_GetItem on my array, and it seems the refcount isn't working. inline object test_slice3 (object const& in_obj, int r) { if (!PyArray_Check (in_obj.ptr())) { throw std::runtime_error ("test_slice3: input must be numpy::array"); } PyArrayObject* ao = (PyArrayObject*)(in_obj.ptr()); PyArrayObject* ro = (PyArrayObject*) PySequence_GetItem ((PyObject*)ao, r); return object (handle (ro)); } In the above, object is a boost::python::object, which is just a wrapper around a PyArrayObject*. I extract the PyArrayObject*. Now test: b = array ((2,3)) print sys.getrefcount (b) c = test_slice3(a,0) print sys.getrefcount (b), sys.getrefcount(c) This prints: 2 2 2 Here's what it should do: print sys.getrefcount (b) c = b[0] print sys.getrefcount (b), sys.getrefcount(c) 2 3 2 From ceball at users.sourceforge.net Fri Feb 8 07:56:53 2008 From: ceball at users.sourceforge.net (C. Ball) Date: Fri, 8 Feb 2008 12:56:53 +0000 (UTC) Subject: [Numpy-discussion] Problem accessing elements of an array of dtype="O" from C References: <47A90025.4010508@enthought.com> Message-ID: Travis E. Oliphant enthought.com> writes: [...] > The problem here is that iter->dataptr should be re-cast to a PyObject > ** because what is contained at the memory location is a *pointer* to > the PyObject. Thus, you have to de-reference iter->dataptr to get the > PyObject * that you want: Thanks for pointing out that mistake, Travis. You were correct, and the code now works - great! ...Well, it works when called from Instant, but we still have a problem using it with Weave. As I've mentioned before, we tried to adapt Weave to work with arrays of dtype="O", but something is still not working. The small pieces of C below show code that works when called from Instant, and almost identical C code that does not work when called from Weave. Presumably, Weave still has a problem with converting arrays of dtype="O". 
------------------------------------------------------------
Working C code (used with Instant; as posted before, but with Travis's
correction):

"""
void loop(PyObject* a_){

    PyArrayIterObject *iter;
    iter = (PyArrayIterObject *)PyArray_IterNew(a_);

    while (iter->index < iter->size) {
        PyObject **cf = (PyObject **)(iter->dataptr);
        PyObject *num_obj = PyObject_GetAttrString(*cf,"num");
        double num_ = PyFloat_AsDouble(num_obj);
        printf("%f\\n",num_);
        PyArray_ITER_NEXT(iter);
    }
    return;
}
"""

C code used with Weave (Weave patched as given in
http://article.gmane.org/gmane.comp.python.scientific.devel/7264):

code="""
// get "cannot convert 'py::object*' to 'PyObject*'
// in argument passing" without this...
PyObject *a_ = (PyObject *)a;

PyArrayIterObject *iter;
iter = (PyArrayIterObject *)PyArray_IterNew(a_);

while (iter->index < iter->size) {
    PyObject **cf = (PyObject **)(iter->dataptr);
    PyObject *num_obj = PyObject_GetAttrString(*cf,"num");
    double num_ = PyFloat_AsDouble(num_obj);
    printf("%f\\n",num_);
    PyArray_ITER_NEXT(iter);
}
"""
weave.inline(code,['a'],local_dict=locals(),verbose=1)

Python array of objects:

class CF(object):
    def __init__(self,num=0.0):
        self.num = num

objs = numpy.array([[CF(0.0),CF(0.1),CF(0.2)],
                    [CF(1.0),CF(1.1),CF(1.2)]],dtype=object)

# Pass objs to the two functions above. The first version, used with
# Instant, works fine. The second version, used with Weave, causes
# a segmentation fault.
------------------------------------------------------------

We already have a lot of code written for Weave, and it seems to allow
simpler inlining of C code than does Instant (though I might be wrong),
so we'd really like to be able to continue to use Weave. I'll file a bug
report for Weave, but I wanted to point out the problem on the numpy list
in case anyone here has an idea (since Weave was part of numpy for a long
time).

Thanks for your help,
Chris

From matthew.yeomans at gmail.com  Fri Feb  8 09:07:54 2008
From: matthew.yeomans at gmail.com (matthew yeomans)
Date: Fri, 8 Feb 2008 15:07:54 +0100
Subject: [Numpy-discussion] Numpy-discussion Digest, Vol 17, Issue 15
In-Reply-To:
References:
Message-ID: <4ff732450802080607yefaa33aqdaa0079280dc7599@mail.gmail.com>

Thanks. I've been trying to compile code that uses random, pylab and numpy
with py2exe; the code of setup.py (which compiles mycode.py into
mycode.exe) follows.

#Start here
from distutils.core import setup
import py2exe
import pylab
import numpy
import glob
import scipy
import random
import os
import matplotlib   # needed for get_py2exe_datafiles() below

setup(console=['mycode.py'],
      options={'py2exe': {"skip_archive": 1,
                          'packages': ['matplotlib', 'pytz']}},
      data_files=matplotlib.get_py2exe_datafiles())
#End here

It works well for code that uses pylab only, but if I add more modules I
get trouble. Are there any good books on how to use py2exe?

-- 
Kollox Ghal Xejn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From charlesr.harris at gmail.com  Fri Feb  8 10:19:25 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 8 Feb 2008 08:19:25 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802081329.35470.faltet@carabos.com>
References: <200802081329.35470.faltet@carabos.com>
Message-ID:

On Feb 8, 2008 5:29 AM, Francesc Altet wrote:

> Hi,
>
> I'm a bit confused that the sort method of a string character doesn't
> allow a mergesort:
>
> >>> s = numpy.empty(10, "S10")
> >>> s.sort(kind="merge")
> TypeError: desired sort not supported for this type

I think it's an error parsing the keyword. In fact, I thought I fixed
that, but maybe I was waiting till I added the other methods.

> However, by looking at the numpy sources, it seems that the only
> implemented method for sorting array strings is "merge" (I presume
> because it is stable). So, perhaps the message above should be fixed.
>
> Also, in the context of my work in indexing, and because of the slowness
> of the current implementation in NumPy, I've ended with an
> implementation of the quicksort method for 1-D array strings. For
> moderately large arrays, it is about 2.5x-3x faster than the
> (supposedly) mergesort version in NumPy, not only due to the quicksort,
> but also because I've implemented a couple of macros for efficient
> string swapping and copy. If this is of interest for NumPy developers,
> tell me and I will provide the code.

I have some code for this too and was going to merge it. Send yours along
and I'll get to it this weekend.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From david.huard at gmail.com  Fri Feb  8 10:27:09 2008
From: david.huard at gmail.com (David Huard)
Date: Fri, 8 Feb 2008 10:27:09 -0500
Subject: [Numpy-discussion] David's build_with_scons branch merged!
In-Reply-To: References: Message-ID: <91cf711d0802080727p297c98e2uaeac97ac9e322916@mail.gmail.com> Jarrod and David, I am reporting a success on FC8, Xeon. Some tests don't pass, but I don't believe it is related to the build process. Well done, David 2008/2/8, Jarrod Millman : > > Hello, > > In preparation for the upcoming NumPy 1.0.5 release, I just merged > David Cournapeau's build_with_scons branch: > http://projects.scipy.org/scipy/numpy/changeset/4773 > > The current build system using numpy.distutils is still the default. > NumPy does not include numscons; this merge adds scons support to > numpy.distutils, provides some scons scripts, and modifies the > configuration of numpy/core. David has extensively tested these > changes and I did a very quick sanity check to make sure I didn't > completely break everything. > > Obviously, we will need to push back the 1.0.5 release date again to > ensure that there is sufficient testing. So please test these changes > and let us know if you have any problems (or successes). > > David has been putting in a considerable effort over the last several > months in developing numscons. If you are interested in the > advantages to Davids approach, please read the description here: > http://projects.scipy.org/scipy/numpy/wiki/NumpyScons > > Thanks, > > -- > Jarrod Millman > Computational Infrastructure for Research Labs > 10 Giannini Hall, UC Berkeley > phone: 510.643.4014 > http://cirl.berkeley.edu/ > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Fri Feb 8 10:58:33 2008 From: faltet at carabos.com (Francesc Altet) Date: Fri, 8 Feb 2008 16:58:33 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> Message-ID: <200802081658.34330.faltet@carabos.com> A Friday 08 February 2008, Charles R Harris escrigu?: > > Also, in the context of my work in indexing, and because of the > > slowness of the current implementation in NumPy, I've ended with an > > implementation of the quicksort method for 1-D array strings. For > > moderately large arrays, it is about 2.5x-3x faster than the > > (supposedly) mergesort version in NumPy, not only due to the > > quicksort, but also because I've implemented a couple of macros for > > efficient string swapping and copy. If this is of interest for > > NumPy developers, tell me and I will provide the code. > > I have some code for this too and was going to merge it. Send yours > along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" -------------- next part -------------- A non-text attachment was scrubbed... Name: quicksort_string.c Type: text/x-csrc Size: 2919 bytes Desc: not available URL: From ndbecker2 at gmail.com Fri Feb 8 11:33:35 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 08 Feb 2008 11:33:35 -0500 Subject: [Numpy-discussion] PySequence_GetItem on numpy array doesn't inc refcount? References: Message-ID: Neal Becker wrote: > It seems that calling PySequence_GetItem on a PyArrayObject does not inc > refcount on the original object? That is surprising. Then, I suppose my > code is supposed to do that itself? 
Not sure what I was doing wrong before, but seems to be working as expected now, sorry for the noise. From jim at well.com Fri Feb 8 11:42:38 2008 From: jim at well.com (jim stockford) Date: Fri, 8 Feb 2008 08:42:38 -0800 Subject: [Numpy-discussion] bayPIGgies meets Thursday, 2/21: Guido van Rossum on Python 3.0 Message-ID: <82e6a624e6692e25d3e5872a1e1597d1@well.com> * SPECIAL NOTE: because Valentine's Day is on the second * Thursday of February (2/14) bayPIGgies has moved our * meeting to the third Thursday of the month, 2/21. bayPIGgies meeting Thursday 2/21: Guido van Rossum on Python 3.0 by Guido van Rossum Guido previews his keynote about Python 3000 at PyCon next month. Hear all about what Python 3000 means for your code, what tools will be available to help you in the transition, and how to be prepared for the next millennium. Location: Google Campus in Mountain View, CA Building 40, the Kiev room (first floor) bayPIGgies meeting information: http://baypiggies.net/new/plone * Please sign up in advance to have your google access badge ready: http://wiki.python.org/moin/BayPiggiesGoogleMeetings (no later than close of business on Wednesday.) Agenda----------------------------- ..... 7:30 PM ........................... General hubbub, inventory end-of-meeting announcements, any first-minute announcements. ..... 7:35 PM to 8:45 PM ................ The Talk (may extend a bit late) ..... 8:45 PM to 9:00 PM or After The Talk ................ Mapping and Random Access Mapping is a rapid-fire audience announcement of topics the announcers are interested in. Random Access follows immediately to allow follow up individually on the announcements and other topics of interest. ..... The March Meeting ................ TBD From guilherme.augusto.flach at gmail.com Fri Feb 8 11:48:59 2008 From: guilherme.augusto.flach at gmail.com (Guilherme Flach) Date: Fri, 8 Feb 2008 14:48:59 -0200 Subject: [Numpy-discussion] CVXOPT and OpenOffice Message-ID: <6a72bc5d0802080848u56317562h5c5b81a18f26545b@mail.gmail.com> Hi, I'm trying to use the CVXOPT extension for OpenOffice under Windows, but I got this error: CVXOPT might not be installed. On the "CVXOPT plugin for OpenOffice.org Users's Guide" I have seen the warning "The installation of CVXOPT must be in a location known to the OpenOffice.org spreadsheet. On a Linux system this corresponds to a regular "system-wide" installation. For other platforms installation may vary." So what is the right location to install CVXOPT on Windows in order to make OpenOffice found it? Thanks in advance. Guilherme Flach From faltet at carabos.com Fri Feb 8 12:31:15 2008 From: faltet at carabos.com (Francesc Altet) Date: Fri, 8 Feb 2008 18:31:15 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: <200802081658.34330.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802081658.34330.faltet@carabos.com> Message-ID: <200802081831.15573.faltet@carabos.com> A Friday 08 February 2008, Francesc Altet escrigu?: > A Friday 08 February 2008, Charles R Harris escrigu?: > > > Also, in the context of my work in indexing, and because of the > > > slowness of the current implementation in NumPy, I've ended with > > > an implementation of the quicksort method for 1-D array strings. > > > For moderately large arrays, it is about 2.5x-3x faster than the > > > (supposedly) mergesort version in NumPy, not only due to the > > > quicksort, but also because I've implemented a couple of macros > > > for efficient string swapping and copy. 
If this is of interest
> > > for NumPy developers, tell me and I will provide the code.
> >
> > I have some code for this too and was going to merge it. Send yours
> > along and I'll get to it this weekend.
>
> Ok, great. I'm attaching it. Tell me if you need some clarification
> on the code.

Ops. I've introduced a last-minute problem in my code. To fix this,
just replace the flawed opt_strncmp() that I sent before by:

static int inline opt_strncmp(char *a, char *b, int n) {
  int i;
  for (i=0; i<n; i++) {
    if (a[i] > b[i]) return i+1;
    if (a[i] < b[i]) return -(i+1);
    /* Another way, but seems equivalent in speed, at least here */
    /* if (a[i] != b[i]) */
    /*   return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */
  }
  return 0;
}

Apparently, this version works just fine.

Cheers,

-- 
>0,0< Francesc Altet     http://www.carabos.com/
V V   Cárabos Coop. V.   Enjoy Data
 "-"

From charlesr.harris at gmail.com  Fri Feb  8 13:07:20 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 8 Feb 2008 11:07:20 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802081831.15573.faltet@carabos.com>
References: <200802081329.35470.faltet@carabos.com>
	<200802081658.34330.faltet@carabos.com>
	<200802081831.15573.faltet@carabos.com>
Message-ID:

On Feb 8, 2008 10:31 AM, Francesc Altet wrote:

> A Friday 08 February 2008, Francesc Altet escrigué:
> > A Friday 08 February 2008, Charles R Harris escrigué:
> > > > Also, in the context of my work in indexing, and because of the
> > > > slowness of the current implementation in NumPy, I've ended with
> > > > an implementation of the quicksort method for 1-D array strings.
> > > > For moderately large arrays, it is about 2.5x-3x faster than the
> > > > (supposedly) mergesort version in NumPy, not only due to the
> > > > quicksort, but also because I've implemented a couple of macros
> > > > for efficient string swapping and copy. If this is of interest
> > > > for NumPy developers, tell me and I will provide the code.
> > >
> > > I have some code for this too and was going to merge it. Send yours
> > > along and I'll get to it this weekend.
> >
> > Ok, great. I'm attaching it. Tell me if you need some clarification
> > on the code.
>
> Ops. I've introduced a last-minute problem in my code. To fix this,
> just replace the flawed opt_strncmp() that I sent before by:
>
> static int inline opt_strncmp(char *a, char *b, int n) {
>   int i;
>   for (i=0; i<n; i++) {
>     if (a[i] > b[i]) return i+1;
>     if (a[i] < b[i]) return -(i+1);
>     /* Another way, but seems equivalent in speed, at least here */
>     /* if (a[i] != b[i]) */
>     /*   return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */
>   }
>   return 0;
> }
>
> Apparently, this version works just fine.

Did you find this significantly faster than strncmp? There is also a
unicode compare, do you have thoughts about that?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert.kern at gmail.com  Fri Feb  8 14:25:10 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 08 Feb 2008 13:25:10 -0600
Subject: [Numpy-discussion] Numpy-discussion Digest, Vol 17, Issue 15
In-Reply-To: <4ff732450802080607yefaa33aqdaa0079280dc7599@mail.gmail.com>
References: <4ff732450802080607yefaa33aqdaa0079280dc7599@mail.gmail.com>
Message-ID: <47ACAC96.8040406@gmail.com>

Matthew, please do not reply to the digests. Think of them as read-only.
If you want to start a new thread, send your mail, with a descriptive
Subject line, to numpy-discussion at scipy.org .
If you want to reply to individual messages, please turn digest delivery
*off* and receive and respond to messages normally.

Thank you.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
 -- Umberto Eco

From charlesr.harris at gmail.com  Fri Feb  8 14:30:07 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 8 Feb 2008 12:30:07 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To:
References: <200802081329.35470.faltet@carabos.com>
	<200802081658.34330.faltet@carabos.com>
	<200802081831.15573.faltet@carabos.com>
Message-ID:

On Feb 8, 2008 11:07 AM, Charles R Harris wrote:

>
> On Feb 8, 2008 10:31 AM, Francesc Altet wrote:
>
> > A Friday 08 February 2008, Francesc Altet escrigué:
> > > A Friday 08 February 2008, Charles R Harris escrigué:
> > > > > Also, in the context of my work in indexing, and because of the
> > > > > slowness of the current implementation in NumPy, I've ended with
> > > > > an implementation of the quicksort method for 1-D array strings.
> > > > > For moderately large arrays, it is about 2.5x-3x faster than the
> > > > > (supposedly) mergesort version in NumPy, not only due to the
> > > > > quicksort, but also because I've implemented a couple of macros
> > > > > for efficient string swapping and copy. If this is of interest
> > > > > for NumPy developers, tell me and I will provide the code.
> > > >
> > > > I have some code for this too and was going to merge it. Send yours
> > > > along and I'll get to it this weekend.
> > >
> > > Ok, great. I'm attaching it. Tell me if you need some clarification
> > > on the code.
> >
> > Ops. I've introduced a last-minute problem in my code. To fix this,
> > just replace the flawed opt_strncmp() that I sent before by:
> >
> > static int inline opt_strncmp(char *a, char *b, int n) {
> >   int i;
> >   for (i=0; i<n; i++) {
> >     if (a[i] > b[i]) return i+1;
> >     if (a[i] < b[i]) return -(i+1);
> >     /* Another way, but seems equivalent in speed, at least here */
> >     /* if (a[i] != b[i]) */
> >     /*   return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */
> >   }
> >   return 0;
> > }
> >
> > Apparently, this version works just fine.
>
> Did you find this significantly faster than strncmp? There is also a
> unicode compare, do you have thoughts about that?
>
Does anyone know if inline is compiler safe/portable these days? There are
definitely places where I would like to use it, but I've been hesitant.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bolme1234 at comcast.net  Fri Feb  8 16:14:35 2008
From: bolme1234 at comcast.net (David Bolme)
Date: Fri, 8 Feb 2008 14:14:35 -0700
Subject: [Numpy-discussion] segfault problem with numpy and pickle
In-Reply-To: <9521F49B-FBA4-4A9A-8DEE-6B2253321F23@comcast.net>
References: <5C3C064A-B9E2-4DF3-B09D-CF6A87FEB462@comcast.net>
	<1201363910.6964.6.camel@localhost.localdomain>
	<9521F49B-FBA4-4A9A-8DEE-6B2253321F23@comcast.net>
Message-ID: <8151F743-8862-419B-9C3A-B6206BE06FF8@comcast.net>

I have added a valgrind report to bug 551. The report indicates a problem
with uninitialized values. The segfault does seem to be related to certain
configurations of atlas. I can confirm that this same problem occurs with
the Ubuntu 7.04 installed scipy with SSE2 optimized ATLAS. The valgrind
output is from a run where the code did not crash but valgrind still
detected many errors.
On Jan 26, 2008, at 3:01 PM, David Bolme wrote:

> I think you are right. This does seem to be the same bug as 551. I
> will try a non optimized ATLAS to see if that helps.
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From charlesr.harris at gmail.com  Fri Feb  8 16:38:18 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 8 Feb 2008 14:38:18 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802081658.34330.faltet@carabos.com>
References: <200802081329.35470.faltet@carabos.com>
	<200802081658.34330.faltet@carabos.com>
Message-ID:

On Feb 8, 2008 8:58 AM, Francesc Altet wrote:

> A Friday 08 February 2008, Charles R Harris escrigué:
> > > Also, in the context of my work in indexing, and because of the
> > > slowness of the current implementation in NumPy, I've ended with an
> > > implementation of the quicksort method for 1-D array strings. For
> > > moderately large arrays, it is about 2.5x-3x faster than the
> > > (supposedly) mergesort version in NumPy, not only due to the
> > > quicksort, but also because I've implemented a couple of macros for
> > > efficient string swapping and copy. If this is of interest for
> > > NumPy developers, tell me and I will provide the code.
> >
> > I have some code for this too and was going to merge it. Send yours
> > along and I'll get to it this weekend.
>
> Ok, great. I'm attaching it. Tell me if you need some clarification on
> the code.

I ran a few timing tests. On my machine strncmp is about 100x faster than
opt_strncmp, but sSWAP (with some fixes) is about 10x faster than using
the memcpy in a recent compiler. Does this match with your experience?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rex at nosyntax.com  Fri Feb  8 21:11:53 2008
From: rex at nosyntax.com (rex)
Date: Fri, 8 Feb 2008 18:11:53 -0800
Subject: [Numpy-discussion] svn 4777 distutils fails to find Intel MKL
Message-ID: <20080209021153.GB27481@nosyntax.net>

This is the 3rd time I have reported this problem and a fix.

-rex

----- Forwarded message from rex -----

Date: Fri, 9 Nov 2007 11:16:17 -0800
From: rex
To: Discussion of Numerical Python
Subject: NumPy 1.04, MKL 10.0, & Intel 10.1 icc & ifort
Message-ID: <20071109191617.GA17405@nosyntax.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
Status: RO
Content-Length: 4571
Lines: 126

Build was successful after a change in distutils. Core 2 Duo, Debian
Etch-32, Python 2.5, icc 10.1, ifort 10.1, & mkl 10.0.

MKL & the compilers were installed to their default locations:
/opt/intel/mkl/, /opt/intel/cc/, /opt/intel/fc/

Installation will not interfere with earlier versions.

NumPy site.cfg for MKL 10.0
==========================================
[DEFAULT]
library_dirs=/opt/intel/mkl/10.0.011/lib/32
include_dirs=/opt/intel/mkl/10.0.011/include

[mkl]
libraries=mkl,guide

[lapack_info]
libraries=mkl_lapack,mkl,guide
==========================================

Changes in distutils:
=========================================================
In MKL 9.1 & 10.0, mkl_lapack32 & mkl_lapack64 no longer exist. They were
replaced by /32/mkl_lapack and /64/mkl_lapack.
As a result, numpy-1.04/numpy/distutils/system_info.py needs to be changed: #lapack_libs = self.get_libs('lapack_libs',['mkl_lapack32','mkl_lapack64']) lapack_libs = self.get_libs('lapack_libs',['mkl_lapack']) If the above change is not made numpy builds, but without using MKL. In numpy-1.04/numpy/distutils/ccompiler.py change: #cc_exe = 'icc' #works, but is suboptimal cc_exe = 'icc -msse3 -fast' #adjust to suit your cpu ========================================================= Running the command below (all one line) from the numpy-1.04 directory python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install > inst.log Gave this inst.log: ===================================================== F2PY Version 2_4412 blas_opt_info: blas_mkl_info: FOUND: libraries = ['mkl', 'pthread', 'mkl', 'guide'] library_dirs = ['/opt/intel/mkl/10.0.011/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/mkl/10.0.011/include'] FOUND: libraries = ['mkl', 'pthread', 'mkl', 'guide'] library_dirs = ['/opt/intel/mkl/10.0.011/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/mkl/10.0.011/include'] lapack_opt_info: lapack_mkl_info: mkl_info: FOUND: libraries = ['mkl', 'pthread', 'mkl', 'guide'] library_dirs = ['/opt/intel/mkl/10.0.011/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/mkl/10.0.011/include'] FOUND: libraries = ['mkl_lapack', 'mkl', 'pthread', 'mkl', 'guide', 'mkl', 'guide'] library_dirs = ['/opt/intel/mkl/10.0.011/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/mkl/10.0.011/include'] FOUND: libraries = ['mkl_lapack', 'mkl', 'pthread', 'mkl', 'guide', 'mkl', 'guide'] library_dirs = ['/opt/intel/mkl/10.0.011/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/mkl/10.0.011/include'] running config running build_clib running build_ext running build_src building py_modules sources creating build [...] =========================================================== For reference, here are the respective lib/32 files for MKL 9.1 and 10.0: c2d0:/opt/intel/mkl/9.1/lib/32# ls libguide.a libmkl_gfortran.a libmkl_ias.so libmkl_p3.so libmkl_p4.so libmkl_vml_def.so libmkl_vml_p4p.so libguide.so libmkl_gfortran.so libmkl_lapack.a libmkl_p4m.so libmkl.so libmkl_vml_p3.so libmkl_vml_p4.so libmkl_def.so libmkl_ia32.a libmkl_lapack.so libmkl_p4p.so libmkl_solver.a libmkl_vml_p4m.so libvml.so c2d0:/opt/intel/mkl/10.0.011/lib/32# ls libguide.a libmkl_cdft_core.a libmkl_intel.a libmkl_p4.so libmkl_vml_ia.so libguide.so libmkl_core.a libmkl_intel.so libmkl_scalapack.a libmkl_vml_p3.so libiomp5.a libmkl_core.so libmkl_intel_thread.a libmkl_scalapack_core.a libmkl_vml_p4m2.so libiomp5.so libmkl_def.so libmkl_intel_thread.so libmkl_sequential.a libmkl_vml_p4m.s o libmkl_blacs.a libmkl_gf.a libmkl_lapack.a libmkl_sequential.so libmkl_vml_p4p.so libmkl_blacs_intelmpi20.a libmkl_gf.so libmkl_lapack.so libmkl.so libmkl_vml_p4.so libmkl_blacs_intelmpi.a libmkl_gnu_thread.a libmkl_p3.so libmkl_solver.a libmkl_blacs_openmpi.a libmkl_gnu_thread.so libmkl_p4m.so libmkl_solver_sequential.a libmkl_cdft.a libmkl_ia32.a libmkl_p4p.so libmkl_vml_def.so Thanks to all who have helped me with earlier versions. -rex ----- End forwarded message ----- From steve at shrogers.com Fri Feb 8 21:14:04 2008 From: steve at shrogers.com (Steven H. 
Rogers) Date: Fri, 08 Feb 2008 19:14:04 -0700 Subject: [Numpy-discussion] py2exe issues (was Numpy-discussion Digest, Vol 17, Issue 15) In-Reply-To: <4ff732450802080607yefaa33aqdaa0079280dc7599@mail.gmail.com> References: <4ff732450802080607yefaa33aqdaa0079280dc7599@mail.gmail.com> Message-ID: <47AD0C6C.3050003@shrogers.com>

matthew yeomans wrote:
> Thanks. I have been trying to compile a code that uses random, pylab and
> numpy with py2exe; the code of setup.py (which compiles mycode.py into
> mycode.exe) follows:
>
> #Start here
> from distutils.core import setup
> import py2exe
> import pylab
> import numpy
> import glob
> import scipy
> import random
> import os
>
> setup(console=['mycode.py'], options={'py2exe':
> {"skip_archive": 1, 'packages': ['matplotlib', 'pytz']}},
> data_files=[matplotlib.get_py2exe_datafiles()])
>
> #End here
>
> It works well for code that uses pylab only, but if I add more
> modules I get into trouble.
>
> Are there any good books on how to use py2exe?

Matthew: I've only used py2exe with a couple of imports (numpy and os). You're importing an awful lot. Slimming this down to only import the things you really need from each module may help, e.g. from numpy import foo from random import bar etc. # Steve

From rex at nosyntax.com Sat Feb 9 02:42:08 2008 From: rex at nosyntax.com (rex) Date: Fri, 8 Feb 2008 23:42:08 -0800 Subject: [Numpy-discussion] svn 4777 fails to build with icc Message-ID: <20080209074208.GX25391@nosyntax.net>

After doing the necessary fix for distutils, svn4774 builds with gcc, but trying to build with icc with:

python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install > inst.log

fails with:

Running from numpy source directory. /usr/local/src/numpy4777/numpy/distutils/system_info.py:1341: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) /usr/local/src/numpy4777/numpy/distutils/system_info.py:1350: UserWarning: Blas (http://www.netlib.org/blas/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [blas]) or by setting the BLAS environment variable. warnings.warn(BlasNotFoundError.__doc__) /usr/local/src/numpy4777/numpy/distutils/system_info.py:1353: UserWarning: Blas (http://www.netlib.org/blas/) sources not found. Directories to search for the sources can be specified in the numpy/distutils/site.cfg file (section [blas_src]) or by setting the BLAS_SRC environment variable. warnings.warn(BlasSrcNotFoundError.__doc__) /usr/local/src/numpy4777/numpy/distutils/system_info.py:1248: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) /usr/local/src/numpy4777/numpy/distutils/system_info.py:1259: UserWarning: Lapack (http://www.netlib.org/lapack/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [lapack]) or by setting the LAPACK environment variable. warnings.warn(LapackNotFoundError.__doc__) /usr/local/src/numpy4777/numpy/distutils/system_info.py:1262: UserWarning: Lapack (http://www.netlib.org/lapack/) sources not found.
Directories to search for the sources can be specified in the numpy/distutils/site.cfg file (section [lapack_src]) or by setting the LAPACK_SRC environment variable. warnings.warn(LapackSrcNotFoundError.__doc__) site.cfg is: [DEFAULT] library_dirs=/opt/intel/mkl/10.0.1.014/lib/32 include_dirs=/opt/intel/mkl/10.0.1.014/include [mkl] library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ lapack_libs = mkl_lapack [lapack_src] libraries=mkl_lapack,mkl,guide OS is Debian Etch 4.0 I've built hundreds of programs from source over the years, including many versions of Numpy and Scipy. NOTHING has ever been as frustrating as these two. I often spend days trying to build a new release. Sometimes it works. More often, it does not, especially Scipy. I even switched from SUSE to Debian in the hope that it would help. It did not. I suggest that efforts to produce a product that ordinary 150 IQ mortals can install rather than add new features would grow the user population faster. For example, distutils has been broken for months for Intel MKL. I've posted 3x about it now. -rex -- I pray for a soroban. From silva at lma.cnrs-mrs.fr Sat Feb 9 02:52:54 2008 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Sat, 09 Feb 2008 08:52:54 +0100 Subject: [Numpy-discussion] [f2py] Troubles building a module Message-ID: <1202543574.4504.7.camel@Portable-s2m.cnrs-mrs.fr> Reading the tutorial http://scipy.org/Cookbook/Theoretical_Ecology/Hastings_and_Powell I've tried to run the provided code. But compiling the fortran module with the line command given in the tuto, I've got the following traceback (you can see it with syntax highlighting at http://paste.debian.net/48759 ): fab at Portable-s2m:/tmp$ f2py -c -m hastings hastings.f90 running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src building extension "hastings" sources f2py options: [] f2py:> /tmp/tmptfNK80/src.linux-i686-2.4/hastingsmodule.c creating /tmp/tmptfNK80 creating /tmp/tmptfNK80/src.linux-i686-2.4 Reading fortran codes... Reading file 'hastings.f90' (format:free) Post-processing... Block: hastings Block: model Block: fweb Post-processing (stage 2)... Block: hastings Block: unknown_interface Block: model Block: fweb Building modules... Building module "hastings"... Constructing F90 module support for "model"... Variables: a1 a2 b1 b2 d2 d1 Constructing wrapper function "model.fweb"... yprime = fweb(y,t) Wrote C/API module "hastings" to file "/tmp/tmptfNK80/src.linux-i686-2.4/hastingsmodule.c" Traceback (most recent call last): File "/usr/bin/f2py", line 26, in ? 
main() File "/var/lib/python-support/python2.4/numpy/f2py/f2py2e.py", line 558, in main run_compile() File "/var/lib/python-support/python2.4/numpy/f2py/f2py2e.py", line 545, in run_compile setup(ext_modules = [ext]) File "/var/lib/python-support/python2.4/numpy/distutils/core.py", line 176, in setup return old_setup(**new_attr) File "/usr/lib/python2.4/distutils/core.py", line 149, in setup dist.run_commands() File "/usr/lib/python2.4/distutils/dist.py", line 946, in run_commands self.run_command(cmd) File "/usr/lib/python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/usr/lib/python2.4/distutils/command/build.py", line 113, in run self.run_command(cmd_name) File "/usr/lib/python2.4/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/usr/lib/python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/var/lib/python-support/python2.4/numpy/distutils/command/build_src.py", line 130, in run self.build_sources() File "/var/lib/python-support/python2.4/numpy/distutils/command/build_src.py", line 147, in build_sources self.build_extension_sources(ext) File "/var/lib/python-support/python2.4/numpy/distutils/command/build_src.py", line 256, in build_extension_sources sources = self.f2py_sources(sources, ext) File "/var/lib/python-support/python2.4/numpy/distutils/command/build_src.py", line 511, in f2py_sources numpy.f2py.run_main(f2py_options + ['--lower', File "/var/lib/python-support/python2.4/numpy/f2py/f2py2e.py", line 367, in run_main ret=buildmodules(postlist) File "/var/lib/python-support/python2.4/numpy/f2py/f2py2e.py", line 319, in buildmodules dict_append(ret[mnames[i]],rules.buildmodule(modules[i],um)) File "/var/lib/python-support/python2.4/numpy/f2py/rules.py", line 1222, in buildmodule for l in '\n\n'.join(funcwrappers2)+'\n'.split('\n'): TypeError: cannot concatenate 'str' and 'list' objects
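The failing line in rules.py is an operator-precedence slip; a minimal reproduction (funcwrappers2 below is a hypothetical stand-in for f2py's internal list of wrapper strings):

funcwrappers2 = ["wrapper one", "wrapper two"]   # hypothetical stand-in

# The method call binds tighter than +, so '\n'.split('\n') is evaluated
# first (giving ['', '']) and a str is then added to a list -> TypeError,
# exactly as in the traceback:
'\n\n'.join(funcwrappers2) + '\n'.split('\n')

# The grouping that was presumably intended:
('\n\n'.join(funcwrappers2) + '\n').split('\n')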
From david at ar.media.kyoto-u.ac.jp Sat Feb 9 06:06:18 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 09 Feb 2008 20:06:18 +0900 Subject: [Numpy-discussion] svn 4777 distutils fails to find Intel MKL In-Reply-To: <20080209021153.GB27481@nosyntax.net> References: <20080209021153.GB27481@nosyntax.net> Message-ID: <47AD892A.7070901@ar.media.kyoto-u.ac.jp>

rex wrote:
> This is the 3rd time I have reported this problem and a fix.

There is no need for a fix; this can be done without touching distutils. See site.cfg.example, there is a mention at the end. cheers, David

From david at ar.media.kyoto-u.ac.jp Sat Feb 9 06:17:40 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 09 Feb 2008 20:17:40 +0900 Subject: [Numpy-discussion] svn 4777 fails to build with icc In-Reply-To: <20080209074208.GX25391@nosyntax.net> References: <20080209074208.GX25391@nosyntax.net> Message-ID: <47AD8BD4.3060908@ar.media.kyoto-u.ac.jp>

rex wrote:
> After doing the necessary fix for distutils, svn4774 builds with
> gcc, but trying to build with icc with:
>
> python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install > inst.log
>
> fails with: [...]
>
> OS is Debian Etch 4.0
>
> I've built hundreds of programs from source over the years,
> including many versions of Numpy and Scipy. NOTHING has ever been
> as frustrating as these two.

Then, with all due respect, you have not built much complicated software. On Debian, you can install all the dependencies with

sudo apt-get install gcc g77 atlas3-base-dev python-dev

and then build numpy and scipy using python setup.py install. This has always worked for me.

> I often spend days trying to
> build a new release. Sometimes it works. More often, it does not,
> especially Scipy. I even switched from SUSE to Debian in the hope
> that it would help. It did not.
>
> I suggest that efforts to produce a product that ordinary 150
> IQ mortals can install rather than add new features would grow the
> user population faster. For example, distutils has been broken for
> months for Intel MKL. I've posted 3x about it now.

A non-open-source vendor keeps changing the options needed to link to its libraries, and you complain to us? I don't understand the logic here. Complain to Intel instead. The fact that building numpy/scipy against "exotic" libraries is possible at all is already a big service. I mean, try linking some blas/lapack libraries to Matlab, and see if it is easier than with numpy or scipy.
David

From david at ar.media.kyoto-u.ac.jp Sat Feb 9 06:19:02 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 09 Feb 2008 20:19:02 +0900 Subject: [Numpy-discussion] David's build_with_scons branch merged! In-Reply-To: <91cf711d0802080727p297c98e2uaeac97ac9e322916@mail.gmail.com> References: <91cf711d0802080727p297c98e2uaeac97ac9e322916@mail.gmail.com> Message-ID: <47AD8C26.70106@ar.media.kyoto-u.ac.jp>

David Huard wrote:
> Jarrod and David,
>
> I am reporting a success on FC8, Xeon. Some tests don't pass, but I
> don't believe it is related to the build process.

If you did not have tests failing before, then it may be a regression, so do not hesitate to open a ticket. cheers, David

From faltet at carabos.com Sat Feb 9 08:40:56 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 9 Feb 2008 14:40:56 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802081658.34330.faltet@carabos.com> Message-ID: <200802091440.57193.faltet@carabos.com>

A Friday 08 February 2008, Charles R Harris escrigué:
> On Feb 8, 2008 8:58 AM, Francesc Altet wrote:
> > A Friday 08 February 2008, Charles R Harris escrigué: [...]
>
> I ran a few timing tests. On my machine strncmp is about 100x faster
> than opt_strncmp, but sSWAP (with some fixes) is about 10x faster
> than using the memcpy in a recent compiler. Does this match your
> experience?

Well, I've run some more exhaustive tests on my laptop (Pentium 4 @ 2 GHz, Ubuntu 7.10, gcc 4.1.3, using -O3 optlevel) with the next sa1 array:

numpy.random.seed(1)
nelem = 10000
a = numpy.random.rand(nelem)
sa1 = a.astype('S16')

And I've chosen the next benchmark:

/* start: the start of data for sa1 array
   ss: the length of the string type (16)
   num: the number of elements in sa1 (10000) */
int sort_S(char *start, int ss, npy_intp num) {
  char *pl = start;
  int a = 0;
  npy_intp i, j;
  for (i=0; i [...]
      if (a[i] > b[i]) return i+1;
      if (a[i] < b[i]) return -(i+1);
  }
  return 0;
}

I get a time of 1.70 s. When using the next implementation:

static int inline opt_strncmp2(char *a, char *b, intp n) {
  intp i;
  for (i=0; i [...]

-- >0,0<   Francesc Altet     http://www.carabos.com/ V   V   Cárabos Coop. V.   Enjoy Data "-"
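For anyone wanting to reproduce the setup without the C harness, a rough Python-level analogue of the benchmark above (a sketch only: it times NumPy's whole string sort rather than just the comparison loop, so the numbers are not directly comparable to the 1.70 s figure):

import timeit
import numpy

numpy.random.seed(1)
sa1 = numpy.random.rand(10000).astype('S16')

# time 100 sorts of the same 10000-element 'S16' array;
# timeit.timeit accepts a callable in modern Pythons
t = timeit.timeit(lambda: numpy.sort(sa1), number=100)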
From faltet at carabos.com Sat Feb 9 08:47:25 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 9 Feb 2008 14:47:25 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802081831.15573.faltet@carabos.com> Message-ID: <200802091447.25281.faltet@carabos.com>

A Friday 08 February 2008, Charles R Harris escrigué:
> On Feb 8, 2008 10:31 AM, Francesc Altet wrote:
> > A Friday 08 February 2008, Francesc Altet escrigué:
> > > A Friday 08 February 2008, Charles R Harris escrigué: [...]
> >
> > Ops. I've introduced a last-minute problem in my code. To fix
> > this, just replace the flawed opt_strncmp() that I sent before by:
> > [...]
> >
> > Apparently, this version works just fine.
>
> Did you find this significantly faster than strncmp? There is also a
> unicode compare, do you have thoughts about that?

Well, for the unicode case, wouldn't it be enough to replace 'char' by 'Py_ArrayUCS4'? Maybe this afternoon I can do some benchmarking too in this regard.

-- >0,0<   Francesc Altet     http://www.carabos.com/ V   V   Cárabos Coop. V.   Enjoy Data "-"

From david at ar.media.kyoto-u.ac.jp Sat Feb 9 08:41:15 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 09 Feb 2008 22:41:15 +0900 Subject: [Numpy-discussion] Getting started with numscons build system Message-ID: <47ADAD7B.6060608@ar.media.kyoto-u.ac.jp>

Hi, Since numscons is now available on the trunk, more people can easily try it out. I put some basic instructions there, mainly how to get started and how to easily modify compilation flags, as well as a list of supported platforms:

http://scipy.org/scipy/numpy/wiki/Numcons

I will add more when I have more time.
David

From charlesr.harris at gmail.com Sat Feb 9 11:57:29 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 09:57:29 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802091447.25281.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802081831.15573.faltet@carabos.com> <200802091447.25281.faltet@carabos.com> Message-ID:

On Feb 9, 2008 6:47 AM, Francesc Altet wrote:
> A Friday 08 February 2008, Charles R Harris escrigué: [...]
>
> Well, for the unicode case, wouldn't it be enough to replace 'char'
> by 'Py_ArrayUCS4'? Maybe this afternoon I can do some benchmarking too
> in this regard.

Looks like that for Numpy. The problem I was thinking about is that for wide characters Windows C defaults to UTF16 while the Unixes default to UTF32. The C99 standard didn't specify the exact length, but Numpy seems to use (or assume) UTF32. Anyway, after doing some work to fool the optimizer and subtracting loop overhead, strncmp still comes out a bit faster for me, 11e-9 vs 16e-9 seconds to compare strings of length 10. I've attached the program. Note that on my machine malloc appears to return zeroed memory, so the string compares always go to the end. Chuck

-------------- next part -------------- A non-text attachment was scrubbed...
Name: compare.c Type: text/x-csrc Size: 630 bytes Desc: not available URL:

From efiring at hawaii.edu Sat Feb 9 13:37:55 2008 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 09 Feb 2008 08:37:55 -1000 Subject: [Numpy-discussion] Getting started with numscons build system In-Reply-To: <47ADAD7B.6060608@ar.media.kyoto-u.ac.jp> References: <47ADAD7B.6060608@ar.media.kyoto-u.ac.jp> Message-ID: <47ADF303.2020800@hawaii.edu>

David, When I try to build numscons-0.3.4 from the tarball, I get:

Traceback (most recent call last):
  File "setup.py", line 74, in <module>
    import release as R
ImportError: No module named release

Where should the "release" module be coming from? I've never heard of it before. Eric

David Cournapeau wrote:
> Hi,
>
> Since numscons is now available on the trunk, more people can easily
> try it out. [...]
>
> http://scipy.org/scipy/numpy/wiki/Numcons
>
> I will add more when I have more time.
>
> David

From faltet at carabos.com Sat Feb 9 13:50:38 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 9 Feb 2008 19:50:38 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802091447.25281.faltet@carabos.com> Message-ID: <200802091950.38650.faltet@carabos.com>

A Saturday 09 February 2008, Charles R Harris escrigué:
> > Well, for the unicode case, wouldn't it be enough to replace 'char'
> > by 'Py_ArrayUCS4'? Maybe this afternoon I can do some benchmarking too
> > in this regard.
>
> Looks like that for Numpy. The problem I was thinking about is that
> for wide characters Windows C defaults to UTF16 while the Unixes
> default to UTF32.

If it were so simple ;-) The fact is that the Python crew is delivering the tarballs ready to compile with UCS2 as the default, and this applies to both UNIX and Windows. However, some Linux distributions (most notably, Debian and derivatives) have chosen to make UCS4 the default in their Python packages.

This is not a (big) problem in itself, but when it comes to writing arrays on disk and hoping for portability (not only across different platforms, but also across Python interpreters with different UCS settings on the same machine!), we realized that this was a real problem (see discussion in [1]). So, NumPy had to make a decision in that regard, and Travis finally opted to only give support for the UCS4 charset in NumPy [2]. Also, he opened the door to possible UCS2 implementations in NumPy in the future, but that would be a real pain, IMHO.

[1]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006081.html
[2]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006130.html

So, at least for the time being, you only have to worry about UCS4.

> The C99 standard didn't specify the exact length,
> but Numpy seems to use (or assume) UTF32.

Well, I should say that UTF32 and UCS4 are names referring to the same thing, but most literature (and especially package configuration procedures) talks about UCS4.

> Anyway, after doing some work to fool the optimizer and subtracting
> loop overhead, strncmp still comes out a bit faster for me, 11e-9 vs
> 16e-9 seconds to compare strings of length 10. I've attached the
> program.
> Note that on my machine malloc appears to return zeroed
> memory, so the string compares always go to the end.

I've seen the benchmark, and the problem is that C strncmp stops checking when it finds a \0 in the first string, while strncmp1 has to check the complete set of chars in the strings. However, you won't really want to do C string comparisons with NumPy strings:

In [35]: ns1 = numpy.array("as\0as")
In [36]: ns2 = numpy.array("as\0bs")
In [37]: ns1 == ns2
Out[37]: array(False, dtype=bool)
In [38]: ns1 < ns2
Out[38]: array(True, dtype=bool)

or, with Python strings, in general:

In [39]: ns1 = "as\0as"
In [40]: ns2 = "as\0bs"
In [41]: ns1 == ns2
Out[41]: False
In [42]: ns1 < ns2
Out[42]: True

As you see, Python/NumPy strings are different beasts than C strings in that regard. The strings in the latter always end with a \0 (NULL) character, while in Python/NumPy the end is defined by a length property (btw, the same as in Pascal, if you know it). So, strncmp1 is not only faster than its C counterpart, but also the one doing the correct job with NumPy (unicode) strings. Cheers,

-- >0,0<   Francesc Altet     http://www.carabos.com/ V   V   Cárabos Coop. V.   Enjoy Data "-"
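A quick check of the behaviour described above against a current NumPy (bytes literals are assumed, as Python 3 requires; the thread's examples predate that):

import numpy

ns1 = numpy.array(b"as\x00as")
ns2 = numpy.array(b"as\x00bs")
# comparisons use the full fixed width, so the bytes after the NUL count:
ns1 == ns2   # -> array(False)
ns1 < ns2    # -> array(True), since b"a" < b"b" at position 3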
From david at ar.media.kyoto-u.ac.jp Sat Feb 9 13:59:41 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 10 Feb 2008 03:59:41 +0900 Subject: [Numpy-discussion] Getting started with numscons build system In-Reply-To: <47ADF303.2020800@hawaii.edu> References: <47ADAD7B.6060608@ar.media.kyoto-u.ac.jp> <47ADF303.2020800@hawaii.edu> Message-ID: <47ADF81D.40009@ar.media.kyoto-u.ac.jp>

Eric Firing wrote:
> David,
>
> When I try to build numscons-0.3.4 from the tarball, I get:
>
> Traceback (most recent call last):
>   File "setup.py", line 74, in <module>
>     import release as R
> ImportError: No module named release
>
> Where should the "release" module be coming from? I've never heard of
> it before.

That's because I forgot to add it to the MANIFEST.in file, hence it is missing in the tarball generated by the sdist distutils command... I corrected the problem in the 0.3.4 branch and uploaded new tarballs (0.3.4.1) on launchpad. cheers, David

From charlesr.harris at gmail.com Sat Feb 9 15:53:25 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 13:53:25 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802091950.38650.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802091447.25281.faltet@carabos.com> <200802091950.38650.faltet@carabos.com> Message-ID:

On Feb 9, 2008 11:50 AM, Francesc Altet wrote:
> A Saturday 09 February 2008, Charles R Harris escrigué: [...]
>
> If it were so simple ;-) [...] So, at least for the time being, you
> only have to worry about UCS4. [...]
>
> So, strncmp1 is not only faster than its C counterpart, but also the one
> doing the correct job with NumPy (unicode) strings.

Ah, in that case the current indirect sort for NumPy strings, which uses strncmp, is incorrect and needs to be fixed. It seems that strings with zeros are not part of the current test series ;) Chuck

From faltet at carabos.com Sat Feb 9 16:07:53 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 9 Feb 2008 22:07:53 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802091950.38650.faltet@carabos.com> Message-ID: <200802092207.53479.faltet@carabos.com>

A Saturday 09 February 2008, Charles R Harris escrigué:
> > So, strncmp1 is not only faster than its C counterpart, but also
> > the one doing the correct job with NumPy (unicode) strings.
>
> Ah, in that case the current indirect sort for NumPy strings, which
> uses strncmp, is incorrect and needs to be fixed. It seems that
> strings with zeros are not part of the current test series ;)

Yeah, that's right.
And yes, it would be advisable to have at least a couple of tests having zeros interspersed throughout the string. Cheers,

-- >0,0<   Francesc Altet     http://www.carabos.com/ V   V   Cárabos Coop. V.   Enjoy Data "-"

From charlesr.harris at gmail.com Sat Feb 9 16:21:35 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 14:21:35 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802092207.53479.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802091950.38650.faltet@carabos.com> <200802092207.53479.faltet@carabos.com> Message-ID:

On Feb 9, 2008 2:07 PM, Francesc Altet wrote:
> A Saturday 09 February 2008, Charles R Harris escrigué: [...]
>
> Yeah, that's right. And yes, it would be advisable to have at least a
> couple of tests having zeros interspersed throughout the string.

Like this should do:

In [5]: argsort(fromstring("\0\2\0\1", dtype="|S2"))
Out[5]: array([0, 1])

Chuck

From faltet at carabos.com Sat Feb 9 16:29:07 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 9 Feb 2008 22:29:07 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: <200802092207.53479.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802092207.53479.faltet@carabos.com> Message-ID: <200802092229.07203.faltet@carabos.com>

Chuck, One more thing on this. I've been doing some benchmarking with my opt_memcpy() macro in the quicksort_string function, and I should say that while it is definitely more efficient than my system memcpy for small values of n (the number of bytes to copy), this doesn't hold true for all values of n. For example, for n<16, opt_memcpy() can be more than 4x faster than system memcpy (and this is why I naively thought that it would be faster in general). However, for n>80, memcpy beats opt_memcpy by between 25% and 100% (depending on whether n is divisible by 2, 4 or 8). This is on my Linux system (Ubuntu 7.10), but perhaps with Windows the behaviour can be different.

I think I would be able to come up with a routine that can offer a balance between opt_memcpy and system memcpy, but that should take some time. So, until I (or anybody else) do more research on this, I think it would be safer if you use system memcpy for string sorting in NumPy. Cheers,

-- >0,0<   Francesc Altet     http://www.carabos.com/ V   V   Cárabos Coop. V.   Enjoy Data "-"

From faltet at carabos.com Sat Feb 9 16:35:48 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 9 Feb 2008 22:35:48 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802092207.53479.faltet@carabos.com> Message-ID: <200802092235.49065.faltet@carabos.com>

A Saturday 09 February 2008, Charles R Harris escrigué:
> On Feb 9, 2008 2:07 PM, Francesc Altet wrote:
> > A Saturday 09 February 2008, Charles R Harris escrigué:
> > > > > > Ah, in that case the current indirect sort for NumPy strings, > > > which uses strncmp, is incorrect and needs to be fixed. It seems > > > that strings with zeros are not part of the current test series > > > ;) > > > > Yeah, that's right. And yes, it would be advisable to have at > > least a couple of tests having zeros interspersed throughout the > > string. > > Like this should do: > > In [5]: argsort(fromstring("\0\2\0\1", dtype="|S2")) > Out[5]: array([0, 1]) Exactly, but I understand that the correct result should be: array([1, 0]) ;-) Something a bit more complex, like: In [5]: argsort(fromstring("a\0b\0\0\2a\0b\0\0\1", dtype="|S6")) Out[5]: array([1, 0]) wouldn't hurt neither. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From charlesr.harris at gmail.com Sat Feb 9 16:42:24 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 14:42:24 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802092229.07203.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802092207.53479.faltet@carabos.com> <200802092229.07203.faltet@carabos.com> Message-ID: On Feb 9, 2008 2:29 PM, Francesc Altet wrote: > Chuck, > > One more thing on this. I've been doing some benchmarking with my > opt_memcpy() macro in the quicksort_string function, and I should say > that while it is definitely more efficient than my system memcpy for > small values of n (the number of bytes to copy), this doesn't keep true > for all values of n. For example, for n<16, opt_memcpy() can be more > than 4x faster than system memcpy (and this is why I naively thought > that it would be faster in general). However, for n>80, memcpy beats > opt_memcpy between a 25% and 100% (depending on whether n is divisible > by 2, 4 or 8). This is on my Linux system (Ubuntu 7.10), but perhaps > with Windows the behaviour can be different. > > I think I would be able to come up with a routine that can offer a > balance between opt_memcpy and system memcpy, but that should take some > time. So, until I (or anybody else) do more research on this, I think > it would be safer if you use system memcpy for string sorting in NumPy. > The memcpy in newer compilers is actually pretty good. For integers and such it sometime compiles inline using integer assignments, but I was loath to make it the default implementation until >= 4.1.x gcc became more common. However, strings might be a good place to use it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Feb 9 16:55:52 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 14:55:52 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802092207.53479.faltet@carabos.com> <200802092229.07203.faltet@carabos.com> Message-ID: On Feb 9, 2008 2:42 PM, Charles R Harris wrote: > > > On Feb 9, 2008 2:29 PM, Francesc Altet wrote: > > > Chuck, > > > > One more thing on this. I've been doing some benchmarking with my > > opt_memcpy() macro in the quicksort_string function, and I should say > > that while it is definitely more efficient than my system memcpy for > > small values of n (the number of bytes to copy), this doesn't keep true > > for all values of n. For example, for n<16, opt_memcpy() can be more > > than 4x faster than system memcpy (and this is why I naively thought > > that it would be faster in general). 
From charlesr.harris at gmail.com Sat Feb 9 16:55:52 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 14:55:52 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802092207.53479.faltet@carabos.com> <200802092229.07203.faltet@carabos.com> Message-ID:

On Feb 9, 2008 2:42 PM, Charles R Harris wrote:
> On Feb 9, 2008 2:29 PM, Francesc Altet wrote:
> > Chuck,
> >
> > One more thing on this. I've been doing some benchmarking with my
> > opt_memcpy() macro in the quicksort_string function [...] I think
> > it would be safer if you use system memcpy for string sorting in
> > NumPy.
>
> The memcpy in newer compilers is actually pretty good. For integers and
> such it sometimes compiles inline using integer assignments, but I was
> loath to make it the default implementation until >= 4.1.x gcc became
> more common. However, strings might be a good place to use it.

I'm also thinking that at some point it becomes more efficient to do an indirect sort followed by take than to move all those big strings around. But I guess we won't know where that point is until we have both versions available. Chuck

From faltet at carabos.com Sat Feb 9 17:19:42 2008 From: faltet at carabos.com (Francesc Altet) Date: Sat, 9 Feb 2008 23:19:42 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> Message-ID: <200802092319.43053.faltet@carabos.com>

A Saturday 09 February 2008, Charles R Harris escrigué:
> I'm also thinking that at some point it becomes more efficient to do
> an indirect sort followed by take than to move all those big strings
> around. But I guess we won't know where that point is until we have
> both versions available.

I've done some experiments in that matter too. They are saying that, with the current mergesort in NumPy, an indirect sort followed by take performs similarly to a direct sort for small string lengths (<=16), but indirect sort starts to win afterwards.

The version with quicksort and the optimized sSWAP should be between 2x and 3x faster than the current mergesort implementation, so the advantage for direct sort could grow up to somewhere between 50 and 100. A nice idea could be doing some more thorough experiments in order to find the point where an indirect sort followed by a take would be more efficient, and automatically select this method beyond that point.

However, this has the drawback that you have to use additional memory for keeping the indices in the indirect method. Of course, when strings are large, those indices should take a rather negligible space compared with the strings themselves. In any case, in some situations where space is critical, this can still be important. I don't know, but my opinion is that we shouldn't adopt too-aggressive optimizations for that matter. My vote is to document this possibility in the docstrings, so that the user wanting extreme performance can use this approach if he wants to. Still, for string sizes greater than, say, 1000, well, an automatic selection of the indirect method is very tempting indeed. Cheers,

-- >0,0<   Francesc Altet     http://www.carabos.com/ V   V   Cárabos Coop. V.   Enjoy Data "-"
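A sketch of the indirect-sort-plus-take approach under discussion (illustrative sizes only; locating the crossover point is exactly what the experiments above are about):

import numpy

s = numpy.array(["b" * 100, "a" * 100, "c" * 100], dtype="S100")
idx = s.argsort()       # the sort shuffles 8-byte indices, not 100-byte strings
ordered = s.take(idx)   # one gather of the big strings at the end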
From charlesr.harris at gmail.com Sat Feb 9 21:54:21 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 19:54:21 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802092319.43053.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802092319.43053.faltet@carabos.com> Message-ID:

On Feb 9, 2008 3:19 PM, Francesc Altet wrote:
> A Saturday 09 February 2008, Charles R Harris escrigué: [...]
>
> I've done some experiments in that matter too. They are saying that,
> with the current mergesort in NumPy, an indirect sort followed by take
> performs similarly to a direct sort for small string lengths (<=16),
> but indirect sort starts to win afterwards. [...]

The strings-with-zeros problem runs deeper than it looked at first glance. Normal sorts don't work either, which means the type has a bad comparison function. And argsort still doesn't work even with the correct comparison function. Python, however, works as it should when sorting lists of strings with zeros. So I'm going to have to track down and fix this oddity, but it is going to delay putting in the type-specific quicksort for strings. Chuck

From charlesr.harris at gmail.com Sat Feb 9 23:47:40 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 21:47:40 -0700 Subject: [Numpy-discussion] How not to screw up the interface. Message-ID:

Question, The current string compare doesn't work correctly with strings containing zeros, so I have replaced it in various spots with a working version. I think this should be added to the array_api to match up with PyArray_CompareUCS4. However, I notice that the multiarray_api_order and array_api_order files are merged and the pointers in the api are at different offsets if I just add the new function to the end of the array_api_order.txt file. Is this a problem, and if so, where should I put it? Chuck
From charlesr.harris at gmail.com Sun Feb 10 01:41:26 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Feb 2008 23:41:26 -0700 Subject: [Numpy-discussion] _compiled_base.c Message-ID:

There is some ugly code in _compiled_base.c and what look to be duplicate functions. Do we really need it? Chuck

From dmitrey.kroshko at scipy.org Sun Feb 10 06:23:59 2008 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Sun, 10 Feb 2008 13:23:59 +0200 Subject: [Numpy-discussion] numerical noise for simple calculations Message-ID: <47AEDECF.1020905@scipy.org>

hi all, I need a good estimation of the noise value for simple calculations. I.e. when I calculate something like sin(15)+cos(80) I get a solution with precision, for example, 1e-11. I guess the precision depends on the system arch, doesn't it? So what's the best way to estimate the value? I guess it should be something like 10*numpy.machine_precision, shouldn't it? Regards, D.

From tim.hochberg at ieee.org Sun Feb 10 13:18:07 2008 From: tim.hochberg at ieee.org (Timothy Hochberg) Date: Sun, 10 Feb 2008 11:18:07 -0700 Subject: [Numpy-discussion] numerical noise for simple calculations In-Reply-To: <47AEDECF.1020905@scipy.org> References: <47AEDECF.1020905@scipy.org> Message-ID:

On Sun, Feb 10, 2008 at 4:23 AM, dmitrey wrote:
> hi all,
> I need a good estimation of the noise value for simple calculations. [...]

This is a complicated subject, which I'm really not qualified to comment on, but I'm not going to let that stop me. I believe that you want to know how accurate something like the above is given exact inputs. That is a somewhat artificial problem, but I'll answer it to the best of my ability.

Functions like sin, cos, +, etc. can in theory compute their result to within one ULP, or maybe half an ULP (I can't recall exactly). An ULP is a Unit in the Last Place. To explain an ULP, let's pretend that we were using decimal floating point with 3 digits of precision and look at a couple of numbers:

1.03e-03 --> 1 ULP = 1e-5
3.05e+02 --> 1 ULP = 1

We're obviously not using decimal floating point, we're using binary floating point, but the basic idea is the same. The result is that the accuracy is going to totally depend on the magnitude of the result. If the result is small, in general the result will be more accurate in an absolute sense, although not generally in a relative sense.

In practice, this is drastically oversimplified since the inputs are generally of finite accuracy. Different functions will either magnify or shrink the input error depending on both the function and the value of the input. If you can find an easy-to-read introduction to numerical analysis, it would probably help. Unfortunately, I don't know of a good one to recommend; the text I have is a pretty hard slog.

To complicate this further, functions don't always compute their results to maximum theoretical accuracy, presumably in the interest of reasonable performance. So, in the end the answer is: it depends. In practice the only useful, simple advice I've seen to get a handle on accuracy is to compute results using at least two different precisions and verify that things are converging sensibly. And compare to known results wherever possible.

-- . __ . |-\ . . tim.hochberg at ieee.org
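Two small numerical illustrations of the points above, assuming a NumPy recent enough to have numpy.spacing (a sketch, not tied to the release under discussion in this thread):

import numpy

# numpy.spacing(x) is the gap to the next representable float: 1 ULP at x.
numpy.spacing(1.0)       # ~2.2e-16, i.e. numpy.finfo(float).eps
numpy.spacing(3.05e+02)  # larger magnitude -> larger ULP
numpy.spacing(1.03e-03)  # smaller magnitude -> smaller ULP

# The two-precision check: where float32 and float64 disagree gives a
# rough scale for the noise of the lower-precision calculation.
x32 = numpy.sin(numpy.float32(15)) + numpy.cos(numpy.float32(80))
x64 = numpy.sin(numpy.float64(15)) + numpy.cos(numpy.float64(80))
abs(x64 - x32)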
From dmitrey.kroshko at scipy.org Sun Feb 10 14:23:47 2008 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Sun, 10 Feb 2008 21:23:47 +0200 Subject: [Numpy-discussion] numerical noise for simple calculations In-Reply-To: References: <47AEDECF.1020905@scipy.org> Message-ID: <47AF4F43.20702@scipy.org>

I need just a single number "on average". I have committed some changes to the NLP/NSP ralg solver from scikits.openopt; for non-noisy funcs it works better, but for noisy funcs vice versa, hence now my examples/nssolveVSfsolve.py doesn't work as it should, so I need to implement a "noise" parameter and assign a default value to it.

So, the question is: what default value should be used here? I was thinking of either 0 or something like K*numpy.machine_precision, where K is something like 1...10...100. Regards, D.

Timothy Hochberg wrote:
> On Sun, Feb 10, 2008 at 4:23 AM, dmitrey wrote: [...]
>
> So, in the end the answer is: it depends. In practice the only useful,
> simple advice I've seen to get a handle on accuracy is to compute
> results using at least two different precisions and verify that things
> are converging sensibly. And compare to known results wherever possible.
From matthew.brett at gmail.com Sun Feb 10 14:50:52 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 10 Feb 2008 19:50:52 +0000 Subject: [Numpy-discussion] sort method raises unexpected error with axis=None Message-ID: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com>

Hi, I just noticed this: From the sort method docstring:

axis : integer
    Axis to be sorted along. None indicates that the flattened array
    should be used. Default is -1.

In [40]: import numpy as N
In [41]: a = N.arange(10)
In [42]: N.sort(a, None)
Out[42]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [43]: a.sort(None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/mb312/<ipython console> in <module>()
TypeError: an integer is required

Perhaps the sort method is calling the C code directly, and this is not checking for axis=None? Matthew

From matthew.brett at gmail.com Sun Feb 10 18:15:27 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 10 Feb 2008 23:15:27 +0000 Subject: [Numpy-discussion] Setting contents of buffer for array object Message-ID: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com>

Hi, I am sorry if I have missed something obvious, but is there any way in Python of doing this:

import numpy as np
a = np.arange(10)
b = np.arange(10)+1
a.data = b.data # raises error, but I hope you see what I mean

? Thanks a lot for any pointers. Matthew

From robert.kern at gmail.com Sun Feb 10 18:51:05 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Feb 2008 17:51:05 -0600 Subject: [Numpy-discussion] Setting contents of buffer for array object In-Reply-To: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> References: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> Message-ID: <3d375d730802101551ub3ac4c7icb48ee4b85d61940@mail.gmail.com>

On Feb 10, 2008 5:15 PM, Matthew Brett wrote:
> Hi,
>
> I am sorry if I have missed something obvious, but is there any way in
> Python of doing this:
>
> import numpy as np
> a = np.arange(10)
> b = np.arange(10)+1
> a.data = b.data # raises error, but I hope you see what I mean
>
> ?

Not really, no. Can you describe your use case in more detail? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
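A minimal sketch of the usual alternative to reassigning .data: copy values into a's existing buffer with slice assignment (this answers the literal question; the out= discussion below covers the memory-efficiency angle):

import numpy as np

a = np.arange(10)
b = np.arange(10) + 1
a[...] = b   # copies b's values into a's existing memory; a is not rebound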
To allow future optimization, I would like to have the same signature as mean(): def median(a, axis=0, dtype=None, out=None) (axis=0 to change to axis=None default at some point). To do this, I need to copy the results of the median calculation in the routine into the array object given by 'out' - when passed. Matthew From robert.kern at gmail.com Sun Feb 10 20:08:09 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Feb 2008 19:08:09 -0600 Subject: [Numpy-discussion] Setting contents of buffer for array object In-Reply-To: <1e2af89e0802101648x6fac3cbbt4c21ac3552a0465b@mail.gmail.com> References: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> <3d375d730802101551ub3ac4c7icb48ee4b85d61940@mail.gmail.com> <1e2af89e0802101648x6fac3cbbt4c21ac3552a0465b@mail.gmail.com> Message-ID: <3d375d730802101708i432f690ckff385cb6568bd120@mail.gmail.com> On Feb 10, 2008 6:48 PM, Matthew Brett wrote: > > > import numpy as np > > > a = np.arange(10) > > > b = np.arange(10)+1 > > > a.data = b.data # raises error, but I hope you see what I mean > > > > > > ? > > > > Not really, no. Can you describe your use case in more detail? > > Yes - I am just writing the new median implementation. To allow > future optimization, I would like to have the same signature as > mean(): > > def median(a, axis=0, dtype=None, out=None) > > (axis=0 to change to axis=None default at some point). > > To do this, I need to copy the results of the median calculation in > the routine into the array object given by 'out' - when passed. Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't "copy" anything at all; otherwise, "median(x, out=out)" is no better than "out[:] = median(x)". Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency. Can you show us the current implementation that you have? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From matthew.brett at gmail.com Sun Feb 10 20:17:02 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 11 Feb 2008 01:17:02 +0000 Subject: [Numpy-discussion] Setting contents of buffer for array object In-Reply-To: <3d375d730802101708i432f690ckff385cb6568bd120@mail.gmail.com> References: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> <3d375d730802101551ub3ac4c7icb48ee4b85d61940@mail.gmail.com> <1e2af89e0802101648x6fac3cbbt4c21ac3552a0465b@mail.gmail.com> <3d375d730802101708i432f690ckff385cb6568bd120@mail.gmail.com> Message-ID: <1e2af89e0802101717x5c3aa2bar6f8f43ab6981ccce@mail.gmail.com> > Ah, I see. You definitely do not want to reassign the .data buffer in > this case. An out= parameter does not reassign the memory location > that the array object points to. It should use the allocated memory > that was already there. It shouldn't "copy" anything at all; > otherwise, "median(x, out=out)" is no better than "out[:] = > median(x)". Personally, I don't think that a function should expose an > out= parameter unless it can make good on that promise of memory > efficiency. I agree - but there are more efficient median algorithms out there which can make use of the memory efficiently.
I wanted to establish the call signature to allow that. I don't feel strongly about it though. > Can you show us the current implementation that you have? is attached, comments welcome... Matthew -------------- next part -------------- A non-text attachment was scrubbed... Name: mymedian.py Type: text/x-python Size: 2493 bytes Desc: not available URL: From brad.malone at gmail.com Sun Feb 10 20:43:05 2008 From: brad.malone at gmail.com (Brad Malone) Date: Sun, 10 Feb 2008 17:43:05 -0800 Subject: [Numpy-discussion] non-contiguous array error Message-ID: Hi, I am receiving a "AttributeError: incompatible shape for a non-contiguous array" error. A quick illustration of the type of code that gives me the error is shown below: -------------------------------------------- from numpy import * list=[i for i in range(0,27)] c=array(list) c.shape=(3,3,3) d=fft.fftn(c) d.shape=(27) -------------------------------------------- I suppose this has something to do with the fact that the fourier transform of c has imaginary parts, which affects the way the information is stored in memory, and this messed up the call to .shape? Is there another way to do this, or will I need to rewrite this section of my code? Thanks for all you do, Brad From robert.kern at gmail.com Sun Feb 10 21:04:41 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Feb 2008 20:04:41 -0600 Subject: [Numpy-discussion] non-contiguous array error In-Reply-To: References: Message-ID: <3d375d730802101804h6b965384x72df79a25ef49948@mail.gmail.com> On Feb 10, 2008 7:43 PM, Brad Malone wrote: > Hi, I am receiving a "AttributeError: incompatible shape for a > non-contiguous array" error. A quick illustration of the type of code > that gives me the error is shown below: > -------------------------------------------- > from numpy import * > list=[i for i in range(0,27)] > c=array(list) > c.shape=(3,3,3) > d=fft.fftn(c) > d.shape=(27) > > -------------------------------------------- > > I suppose this has something to do with the fact that the fourier > transform of c has imaginary parts, which affects the way the > information is stored in memory, and this messed up the call to > .shape? No. The problem is that "(27)" is not a tuple. Parentheses are also used for grouping expressions in Python, so a single-element tuple needs a comma to disambiguate. You want "d.shape = (27,)". -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Feb 10 21:07:15 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Feb 2008 20:07:15 -0600 Subject: [Numpy-discussion] Setting contents of buffer for array object In-Reply-To: <1e2af89e0802101717x5c3aa2bar6f8f43ab6981ccce@mail.gmail.com> References: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> <3d375d730802101551ub3ac4c7icb48ee4b85d61940@mail.gmail.com> <1e2af89e0802101648x6fac3cbbt4c21ac3552a0465b@mail.gmail.com> <3d375d730802101708i432f690ckff385cb6568bd120@mail.gmail.com> <1e2af89e0802101717x5c3aa2bar6f8f43ab6981ccce@mail.gmail.com> Message-ID: <3d375d730802101807q5238bb89ua324078235065d6e@mail.gmail.com> On Feb 10, 2008 7:17 PM, Matthew Brett wrote: > > Ah, I see. You definitely do not want to reassign the .data buffer in > > this case. An out= parameter does not reassign the memory location > > that the array object points to. It should use the allocated memory > > that was already there. 
It shouldn't "copy" anything at all; > > otherwise, "median(x, out=out)" is no better than "out[:] = > > median(x)". Personally, I don't think that a function should expose an > > out= parameter unless it can make good on that promise of memory > > efficiency. > > I agree - but there are more efficient median algorithms out there > which can make use of the memory efficiently. I wanted to establish > the call signature to allow that. I don't feel strongly about it > though. I say add the out= parameter when you use such an algorithm. But if you like, just use slice assignment for now. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From eads at lanl.gov Sun Feb 10 22:28:06 2008 From: eads at lanl.gov (Damian R. Eads) Date: Sun, 10 Feb 2008 20:28:06 -0700 (MST) Subject: [Numpy-discussion] Setting contents of buffer for array object Message-ID: <15833.128.165.0.81.1202700486.squirrel@webmail.lanl.gov> Matthew Brett wrote: >>> import numpy as np >>> a = np.arange(10) >>> b = np.arange(10)+1 >>> a.data = b.data # raises error, but I hope you see what I mean >>> >>> ? >> Not really, no. Can you describe your use case in more detail? > > Yes - I am just writing the new median implementation. To allow > future optimization, I would like to have the same signature as > mean(): > > def median(a, axis=0, dtype=None, out=None) > > (axis=0 to change to axis=None default at some point). > > To do this, I need to copy the results of the median calculation in > the routine into the array object given by 'out' - when passed. My understanding of numerical routines that accept an "out" parameter is that this is a convention for in-place algorithms. When None is passed in the out parameter, it's the caller's way of indicating that in-place is not needed, and a new array is allocated to store the result; otherwise, the result is stored in the 'out' array. Either way, the result is returned. One can break from this convention by allocating more memory than provided by the out array but that's a performance issue that may or may not be avoidable. Remember that A[:] = expr sets the value of the elements in A to the values of the array elements in the expression expr, and this copying is done in-place. To copy an array C, and make the copy contiguous, use the .copy() method on C. Assigning the .data buffers is not something I have seen before in non-constructor (or pseudo-constructor, like from_buffer) code. I think it might even be dangerous if you don't do it right. If one does not properly recalculate the strides of A, slicing operations on A may not behave as expected. If this is library code, reassigning the .data buffer can confuse the user, since it messes up array view semantics. Suppose I'm an ignorant user and I write the following code: A=numpy.random.rand(10,20) dummy_input=numpy.random.rand(10,20) B=A.T C=B[0::-1,:] then I use a library function foo (suppose foo accepts an input array inp and an output array out, and assigns out.data to something else) foo(inp=dummy_input, out=B) Now, A and B point to two different .data buffers, B's base points to A, and C's base points to B but A and C share the same .data buffer. As a user, I may expect B and C to be a view of A (certainly B isn't), and C to be a view of B (which is verified by checking 'C.base is B') but changing C's values changes A's but not B's. That's confusing.
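For reference, here is a minimal sketch of that view chain under normal NumPy semantics (same A, B and C as above, with no .data reassignment), showing why a user would expect all three arrays to share one buffer:

import numpy
A = numpy.random.rand(10, 20)
B = A.T             # B is a view of A (the transpose shares A's buffer)
C = B[0::-1, :]     # C is a view of B, so it is backed by A's buffer too
C[0, 0] = 99.0
print A[0, 0], B[0, 0]    # both print 99.0: a single shared buffer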
Also, suppose B's new data buffer has fewer elements than its original data buffer. I may be clever and set B's size and strides attributes accordingly, but changing C's values might cause the manipulation of undefined memory. Damian From charlesr.harris at gmail.com Sun Feb 10 23:44:11 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 10 Feb 2008 21:44:11 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802081329.35470.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> Message-ID: On Feb 8, 2008 5:29 AM, Francesc Altet wrote: > Hi, > > I'm a bit confused that the sort method of a string character doesn't > allow a mergesort: > > >>> s = numpy.empty(10, "S10") > >>> s.sort(kind="merge") > TypeError: desired sort not supported for this type > > However, by looking at the numpy sources, it seems that the only > implemented method for sorting array strings is "merge" (I presume > because it is stable). So, perhaps the message above should be fixed. > The only available method is the default quicksort. The mergesort is for argsort and was put in for lexsort to use. > > Also, in the context of my work in indexing, and because of the slowness > of the current implementation in NumPy, I've ended with an > implementation of the quicksort method for 1-D array strings. For > moderately large arrays, it is about 2.5x-3x faster than the > (supposedly) mergesort version in NumPy, not only due to the quicksort, > but also because I've implemented a couple of macros for efficient > string swapping and copy. If this is of interest for NumPy developers, > tell me and I will provide the code. > I've now got a string/ucs4 specific argsort(kind='q'), the string version of which is about 40% faster than the old default and about 10% faster than the mergesort version, but the string/ucs4 specific versions of sort aren't yet faring as well. I'm timing things with In [1]: import timeit In [2]: t = timeit.Timer("np.fromstring(np.empty(10000).tostring(),dtype='|S8').sort(kind='q')","import numpy as np") In [3]: t.repeat(3,100) Out[3]: [0.22127485275268555, 0.21282196044921875, 0.21273088455200195] That's with the current sort(kind='q') in svn, which uses the new string compare function but is otherwise the old default quicksort. The new string specific version of quicksort I'm testing is actually a bit slower than that. Both versions correctly sort the array. So I'm going to continue to experiment a bit until I see what is going on. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Mon Feb 11 00:27:56 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 11 Feb 2008 00:27:56 -0500 Subject: [Numpy-discussion] Setting contents of buffer for array object In-Reply-To: <1e2af89e0802101717x5c3aa2bar6f8f43ab6981ccce@mail.gmail.com> References: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> <3d375d730802101551ub3ac4c7icb48ee4b85d61940@mail.gmail.com> <1e2af89e0802101648x6fac3cbbt4c21ac3552a0465b@mail.gmail.com> <3d375d730802101708i432f690ckff385cb6568bd120@mail.gmail.com> <1e2af89e0802101717x5c3aa2bar6f8f43ab6981ccce@mail.gmail.com> Message-ID: On 10/02/2008, Matthew Brett wrote: > > Ah, I see. You definitely do not want to reassign the .data buffer in > > this case. An out= parameter does not reassign the memory location > > that the array object points to. It should use the allocated memory > > that was already there.
It shouldn't "copy" anything at all; > > otherwise, "median(x, out=out)" is no better than "out[:] = > > median(x)". Personally, I don't think that a function should expose an > > out= parameter unless it can make good on that promise of memory > > efficiency. > > I agree - but there are more efficient median algorithms out there > which can make use of the memory efficiently. I wanted to establish > the call signature to allow that. I don't feel strongly about it > though. This is a startling claim! Are there really median algorithms that are faster for having the use of a single float as storage space? If it were permissible to mutilate the original array in-place, I can certainly see a good median algorithm (based on quicksort, perhaps) being faster, but modifying the input array is a different question from using an output array. I can also see that this could possibly be improved by using a for loop to iterate over the output elements, so that there was no need to duplicate the large input array, or perhaps a "blocked" iteration that duplicated arrays of modest size would be better. But how can a single float per data set whose median is being taken help? Anne From robert.kern at gmail.com Mon Feb 11 03:21:46 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Feb 2008 02:21:46 -0600 Subject: [Numpy-discussion] New bug with "setup,py develop" Message-ID: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> I've just updated the SVN trunk to get the latest numscons merge. Something broke the support I put in for the setuptools "develop" command. In order to make sure that setuptools' "develop" works with numpy.distutils' "build_src", we override the "develop" command to reinitialize the "build_src" command to add the --inplace option. This used to work as of r4772, but now any Fortran Extensions have the generated sources added twice. This causes links to fail since the same symbol shows up twice. David, any ideas? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco From charlesr.harris at gmail.com Mon Feb 11 03:40:06 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 Feb 2008 01:40:06 -0700 Subject: [Numpy-discussion] New bug with "setup,py develop" In-Reply-To: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> Message-ID: On Feb 11, 2008 1:21 AM, Robert Kern wrote: > I've just updated the SVN trunk to get the latest numscons merge. > Something broke the support I put in for the setuptools "develop" > command. In order to make sure that setuptools' "develop" works with > numpy.distutils' "build_src", we override the "develop" command to > reinitialize the "build_src" command to add the --inplace option. This > used to work as of r4772, but now any Fortran Extensions have the > generated sources added twice. This causes links to fail since the > same symbol shows up twice. > While we're talking build, how do I set the compiler flags? Numpy here always compiles with -march=i386, which seems a bit conservative. My environment flags are also ignored, but I assume there is someway of getting the compile to behave. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Feb 11 04:31:29 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 11 Feb 2008 09:31:29 +0000 Subject: [Numpy-discussion] Setting contents of buffer for array object In-Reply-To: References: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> <3d375d730802101551ub3ac4c7icb48ee4b85d61940@mail.gmail.com> <1e2af89e0802101648x6fac3cbbt4c21ac3552a0465b@mail.gmail.com> <3d375d730802101708i432f690ckff385cb6568bd120@mail.gmail.com> <1e2af89e0802101717x5c3aa2bar6f8f43ab6981ccce@mail.gmail.com> Message-ID: <1e2af89e0802110131n520da3b7w7142e056cc588ed3@mail.gmail.com> Hi, > I can also see that this could possibly be improved by using a for > loop to iterate over the output elements, so that there was no need to > duplicate the large input array, or perhaps a "blocked" iteration that > duplicated arrays of modest size would be better. But how can a single > float per data set whose median is being taken help? Sorry, you are right to call me on this very sloppy late-night phrasing - I only meant that it would be useful in due course to use a C implementation for median such as the ones you're describing, and that this could write the result directly into the in-place memory - in the same way that mean() does. It's quite true that it's difficult to imagine the algorithm itself benefiting from the memory buffer. Thanks, Matthew From cournape at gmail.com Mon Feb 11 04:49:19 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 11 Feb 2008 18:49:19 +0900 Subject: [Numpy-discussion] New bug with "setup,py develop" In-Reply-To: <3d375d730802110038x60c0ee84r2e6510851ad35b9a@mail.gmail.com> References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> <3d375d730802110038x60c0ee84r2e6510851ad35b9a@mail.gmail.com> Message-ID: <5b8d13220802110149y397bc899k99a3c44ee49d8063@mail.gmail.com> On Feb 11, 2008 5:38 PM, Robert Kern wrote: > On Feb 11, 2008 2:21 AM, Robert Kern wrote: > > I've just updated the SVN trunk to get the latest numscons merge. > > Something broke the support I put in for the setuptools "develop" > > command. 
In order to make sure that setuptools' "develop" works with > > numpy.distutils' "build_src", we override the "develop" command to > > reinitialize the "build_src" command to add the --inplace option. This > > used to work as of r4772, but now any Fortran Extensions have the > > generated sources added twice. > > Spoke too soon. It fails with r4772, too. Does it mean that it was already broken before numscons merge or not ? Nobody should be changed when numpy is built using setup.py, so any problem here is something I should fix ASAP. Supporting setuptools with numscons (using setupscons.py) is not a priority for me, though. David From faltet at carabos.com Mon Feb 11 04:58:59 2008 From: faltet at carabos.com (Francesc Altet) Date: Mon, 11 Feb 2008 10:58:59 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> Message-ID: <200802111059.00095.faltet@carabos.com> A Monday 11 February 2008, Charles R Harris escrigu?: > On Feb 8, 2008 5:29 AM, Francesc Altet wrote: > > Hi, > > > > I'm a bit confused that the sort method of a string character > > doesn't > > > > allow a mergesort: > > >>> s = numpy.empty(10, "S10") > > >>> s.sort(kind="merge") > > > > TypeError: desired sort not supported for this type > > > > However, by looking at the numpy sources, it seems that the only > > implemented method for sorting array strings is "merge" (I presume > > because it is stable). So, perhaps the message above should be > > fixed. > > The only available method is the default quicksort. The mergesort is > for argsort and was put in for lexsort to use. That's good to know. However, I'm curious about where it is the specific quicksort implementation for strings/unicode. I've had a look at _sortmodule.c.src, but I can only find a quicksort implementation for: /**begin repeat #TYPE=BOOL,BYTE,UBYTE,SHORT,USHORT,INT,UINT,LONG,ULONG,LONGLONG,ULONGLONG,FLOAT,DOUBLE,LONGDOUBLE,CFLOAT,CDOUBLE,CLONGDOUBLE# **/ Where are the STRING/UNICODE versions? > > Also, in the context of my work in indexing, and because of the > > slowness of the current implementation in NumPy, I've ended with an > > implementation of the quicksort method for 1-D array strings. For > > moderately large arrays, it is about 2.5x-3x faster than the > > (supposedly) mergesort version in NumPy, not only due to the > > quicksort, but also because I've implemented a couple of macros for > > efficient string swapping and copy. If this is of interest for > > NumPy developers, tell me and I will provide the code. > > I've now got a string/ucs4 specific argsort(kind='q'), the string > version of which is about 40% faster than the old default and about > 10% faster than the mergesort version, but the string/ucs4 specific > versions of sort aren't yet fairing as well. I'm timing things with > > In [1]: import timeit > > In [2]: t = > timeit.Timer("np.fromstring(np.empty(10000).tostring(),dtype='|S8').s >ort(kind='q')","import numpy as np") > > In [3]: t.repeat(3,100) > Out[3]: [0.22127485275268555, 0.21282196044921875, > 0.21273088455200195] > > That's with the current sort(kind='q') in svn, which uses the new > string compare function but is otherwise the old default quicksort. > The new string specific version of quicksort I'm testing is actually > a bit slower than that. Both versions correctly sort the array. So > I'm going to continue to experiment a bit until I see what is going > on. The version you are testing is your own or the one that I provided? 
Here are the timings for my laptop: In [32]: a = np.random.rand(10000).astype('S8') In [33]: %timeit a.copy().sort() # original sort in NumPy 10 loops, best of 3: 16.8 ms per loop In [34]: %timeit newqsort(a.copy()) # My own qsort implementation 100 loops, best of 3: 4.29 ms per loop (I'm using a random string array here mainly because I use the sort in my system NumPy, with the old string compare. However, as all the contents in strings are not NULL chars, the performance should be comparable, bar a few percent of improvement). So, my newqsort still seems to run almost 4x faster than the one in NumPy (you know, using the old string compare). However, when using a server with an Opteron processor, the view is much different: In [55]: a = np.random.rand(10000).astype('S8') In [56]: %timeit a.copy().sort() 100 loops, best of 3: 3.82 ms per loop In [57]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.29 ms per loop Here, the difference in performance has been reduced to a mere 15% (still favouring newqsort). So, with this, it seems like the performance of the original sorting in NumPy only suffers a lot when running in old processors (eg. Pentium 4), while the performance is reasonable with newer ones (Opteron). On its hand, newqsort seems to perform reasonably well in both. I don't know what exactly is the reason for this (I don't know where it is the code for the original quicksort for strings, so I can't do a visual comparison), but it would be great if we can discover it! Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From cournape at gmail.com Mon Feb 11 05:05:41 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 11 Feb 2008 19:05:41 +0900 Subject: [Numpy-discussion] New bug with "setup,py develop" In-Reply-To: <5b8d13220802110149y397bc899k99a3c44ee49d8063@mail.gmail.com> References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> <3d375d730802110038x60c0ee84r2e6510851ad35b9a@mail.gmail.com> <5b8d13220802110149y397bc899k99a3c44ee49d8063@mail.gmail.com> Message-ID: <5b8d13220802110205p323381e4pf563155047bd3eaa@mail.gmail.com> On Feb 11, 2008 6:49 PM, David Cournapeau wrote: > On Feb 11, 2008 5:38 PM, Robert Kern wrote: > > On Feb 11, 2008 2:21 AM, Robert Kern wrote: > > > I've just updated the SVN trunk to get the latest numscons merge. > > > Something broke the support I put in for the setuptools "develop" > > > command. In order to make sure that setuptools' "develop" works with > > > numpy.distutils' "build_src", we override the "develop" command to > > > reinitialize the "build_src" command to add the --inplace option. This > > > used to work as of r4772, but now any Fortran Extensions have the > > > generated sources added twice. > > > > Spoke too soon. It fails with r4772, too. > > Does it mean that it was already broken before numscons merge or not ? > Nobody should be changed when numpy is built using setup.py, so any ^^^^^^ I meant something, not nobody, here, obviously. 
David From faltet at carabos.com Mon Feb 11 06:06:36 2008 From: faltet at carabos.com (Francesc Altet) Date: Mon, 11 Feb 2008 12:06:36 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: <200802111059.00095.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802111059.00095.faltet@carabos.com> Message-ID: <200802111206.36675.faltet@carabos.com> A Monday 11 February 2008, Francesc Altet escrigué: > A Monday 11 February 2008, Charles R Harris escrigué: > > That's with the current sort(kind='q') in svn, which uses the new > > string compare function but is otherwise the old default quicksort. > > The new string specific version of quicksort I'm testing is > > actually a bit slower than that. Both versions correctly sort the > > array. So I'm going to continue to experiment a bit until I see > > what is going on. > > The version you are testing is your own or the one that I provided? > Here are the timings for my laptop: > > In [32]: a = np.random.rand(10000).astype('S8') > > In [33]: %timeit a.copy().sort() # original sort in NumPy > 10 loops, best of 3: 16.8 ms per loop > > In [34]: %timeit newqsort(a.copy()) # My own qsort implementation > 100 loops, best of 3: 4.29 ms per loop > > (I'm using a random string array here mainly because I use the sort > in my system NumPy, with the old string compare. However, as all the > contents in strings are not NULL chars, the performance should be > comparable, bar a few percent of improvement). > > So, my newqsort still seems to run almost 4x faster than the one in > NumPy (you know, using the old string compare). > > However, when using a server with an Opteron processor, the view is > much different: > > In [55]: a = np.random.rand(10000).astype('S8') > > In [56]: %timeit a.copy().sort() > 100 loops, best of 3: 3.82 ms per loop > > In [57]: %timeit newqsort(a.copy()) > 100 loops, best of 3: 3.29 ms per loop > > Here, the difference in performance has been reduced to a mere 15% > (still favouring newqsort). So, with this, it seems like the > performance of the original sorting in NumPy only suffers a lot when > running in old processors (eg. Pentium 4), while the performance is > reasonable with newer ones (Opteron). On its hand, newqsort seems to > perform reasonably well in both. I don't know what exactly is the > reason for this (I don't know where it is the code for the original > quicksort for strings, so I can't do a visual comparison), but it > would be great if we can discover it! Mmm, comparing my new strncmp and the one that you have implemented in SVN, I've found a difference that can account for part of the difference in performances. With your version of strncmp in SVN (compare_string), these are my timings with the Opteron server: In [17]: np.random.seed(1) In [18]: a = np.random.rand(10000).astype('S8') In [19]: %timeit a.copy().sort() 100 loops, best of 3: 3.86 ms per loop In [20]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.44 ms per loop which gives times about 5% worse. Try to use my version and tell me if it does better: static int inline opt_strncmp(char *a, char *b, size_t n) { size_t i; unsigned char c, d; for (i = 0; i < n; i++) { c = a[i]; d = b[i]; if (c != d) return c - d; } return 0; } Although 5% is maybe too little improvement :-/ -- >0,0< Francesc Altet     http://www.carabos.com/ V V Cárabos Coop. V.
??Enjoy Data "-" From peridot.faceted at gmail.com Mon Feb 11 09:22:02 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 11 Feb 2008 09:22:02 -0500 Subject: [Numpy-discussion] Setting contents of buffer for array object In-Reply-To: <1e2af89e0802110131n520da3b7w7142e056cc588ed3@mail.gmail.com> References: <1e2af89e0802101515u2d64f01fh6e065b0e2eeb22eb@mail.gmail.com> <3d375d730802101551ub3ac4c7icb48ee4b85d61940@mail.gmail.com> <1e2af89e0802101648x6fac3cbbt4c21ac3552a0465b@mail.gmail.com> <3d375d730802101708i432f690ckff385cb6568bd120@mail.gmail.com> <1e2af89e0802101717x5c3aa2bar6f8f43ab6981ccce@mail.gmail.com> <1e2af89e0802110131n520da3b7w7142e056cc588ed3@mail.gmail.com> Message-ID: On 11/02/2008, Matthew Brett wrote: > > I can also see that this could possibly be improved by using a for > > loop to iterate over the output elements, so that there was no need to > > duplicate the large input array, or perhaps a "blocked" iteration that > > duplicated arrays of modest size would be better. But how can a single > > float per data set whose median is being taken help? > > Sorry, you are right to call me on this very sloppy late-night > phrasing - I only meant that it would be useful in due course to use a > C implementation for median such as the ones you're describing, and > that this could write the result directly into the in-place memory - > in the same way that mean() does. It's quite true that it's difficult > to imagine the algorithm itself benefiting from the memory buffer. My point was not to catch you in an error - goodness knows I make enough of those, and not only late at night! - but to point out that there may not really be much need for an output argument. Even with a C code, for the median to be of much use, the output array can be at most half the size of the input array. The extra storage space required is not that big a concern, unlike a ufunc, and including an output argument forces you to deal with all sorts of data conversion issues. On the other hand, there is something to be said for allowing the code to destroy the input array. Perhaps *that* should be an optional argument (defaulting to zero)? Anne From matthew.yeomans at gmail.com Mon Feb 11 09:32:06 2008 From: matthew.yeomans at gmail.com (matthew yeomans) Date: Mon, 11 Feb 2008 15:32:06 +0100 Subject: [Numpy-discussion] Numpy-discussion Digest, Vol 17, Issue 20 In-Reply-To: References: Message-ID: <4ff732450802110632v69cfdc52od16da5022fc1c86b@mail.gmail.com> > > matthew yeomans wrote: > > Thanks I been trying to compile a code that uses random,pylab and > > numpy with py2exe > > the code of setup.py(compiles mycode.py into mycode.exe) follows > > > > #Start here > > from distutils.core import setup > > import py2exe > > import pylab > > import numpy > > import glob > > import scipy > > import random > > import os > > > > setup( console=['mycode.py'],options={'py2exe': > > {"skip_archive":1,'packages':['matplotlib','pytz']',}},data_files=[ > matplotlib.get_py2exe_datafiles()]) > > > > #End here > > > > It works well for codes that uses pylab only. But It i add more > > modules i get trouble > > > > Is there any good books on how to use py2exe? > > Matthew: > > I've only used py2exe with a couple of imports (numpy and os). You're > importing an awful lot. Slimming this down to only import the things > you really need from each module may help, e.g. > > from numpy import foo > from random import bar > etc. 
> > # Steve I tried that; I reduced everything, for example: from pylab import plot from pylab import axis .... and so on What I think I noticed is that when I call from numpy import array, it confuses numpy's array with the array module. Is there a way to tell py2exe that it requires array from numpy and not the array module? Thanks for all the help. Matthew Yeomans -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Feb 11 09:47:13 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 Feb 2008 07:47:13 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802111059.00095.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802111059.00095.faltet@carabos.com> Message-ID: On Feb 11, 2008 2:58 AM, Francesc Altet wrote: > A Monday 11 February 2008, Charles R Harris escrigué: > > On Feb 8, 2008 5:29 AM, Francesc Altet wrote: > > > Hi, > > > > > > I'm a bit confused that the sort method of a string character > > > doesn't > > > > > > allow a mergesort: > > > >>> s = numpy.empty(10, "S10") > > > >>> s.sort(kind="merge") > > > > > > TypeError: desired sort not supported for this type > > > > > > However, by looking at the numpy sources, it seems that the only > > > implemented method for sorting array strings is "merge" (I presume > > > because it is stable). So, perhaps the message above should be > > > fixed. > > > > The only available method is the default quicksort. The mergesort is > > for argsort and was put in for lexsort to use. > > That's good to know. However, I'm curious about where it is the > specific quicksort implementation for strings/unicode. I've had a look > at _sortmodule.c.src, but I can only find a quicksort implementation > for:
URL: From charlesr.harris at gmail.com Mon Feb 11 10:08:49 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 Feb 2008 08:08:49 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802111206.36675.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802111059.00095.faltet@carabos.com> <200802111206.36675.faltet@carabos.com> Message-ID: On Feb 11, 2008 4:06 AM, Francesc Altet wrote: > A Monday 11 February 2008, Francesc Altet escrigu?: > > A Monday 11 February 2008, Charles R Harris escrigu?: > > Mmm, comparing my new strncmp and the one that you have implemented in > SVN, I've found a difference that can account for part of the > difference in performances. With your version of strncmp in SVN > (compare_string), these are my timings with the Opteron server: > > > In [17]: np.random.seed(1) > > In [18]: a = np.random.rand(10000).astype('S8') > > In [19]: %timeit a.copy().sort() > 100 loops, best of 3: 3.86 ms per loop > > In [20]: %timeit newqsort(a.copy()) > 100 loops, best of 3: 3.44 ms per loop > > which gives times a 5% worse. Try to use my version and tell me if it > does better: > > static int inline > opt_strncmp(char *a, char *b, size_t n) > { > size_t i; > unsigned char c, d; > for (i = 0; i < n; i++) { > c = a[i]; d = b[i]; > if (c != d) return c - d; > } > return 0; > } > I didn't notice any speed difference. And while returning the difference of two unsigned numbers should work with modular arithmetic when it is cast to integer, I thought the explicit return based on a compare was clearer and safer. Comparisons always work. I've attached my working _sortmodule.c.src file so you can fool with these different changes on your machines also. This is on top of current svn. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: _sortmodule.c.src Type: application/x-wais-source Size: 19053 bytes Desc: not available URL: From ivilata at carabos.com Mon Feb 11 11:17:12 2008 From: ivilata at carabos.com (Ivan Vilata i Balaguer) Date: Mon, 11 Feb 2008 17:17:12 +0100 Subject: [Numpy-discussion] [ANN] Release of the second PyTables video Message-ID: <20080211161712.GC17180@tardis.terramar.selidor.net> ====================================== Release of the second PyTables video ====================================== Carabos [1]_ is happy to announce the second of a series of videos dedicated to introducing the main features of PyTables to the public in a visual and easy to grasp manner: http://www.carabos.com/videos/pytables-2-tables PyTables [2]_ is a Free/Open Source package designed to handle massive amounts of data in a simple, but highly efficient way, using the HDF5 file format and NumPy data containers. .. [1] http://www.carabos.com/ .. [2] http://www.pytables.org/ Our second video explains how to work with tables, PyTables' main data container. It shows how to: * describe the structure of a table * create a table * iterate over a table * access tables by blocks * handle big tables * query a table The video is only 15 minutes long, so you can watch it while you enjoy a nice cup of coffee. 
If you are used to SQL databases, you may also be interested in the introduction to tables at http://www.pytables.org/moin/HintsForSQLUsers You can also see more on table queries in the latest video about ViTables (our PyTables GUI) at http://www.carabos.com/videos/vitables-2-queries More videos about PyTables will be published in the near future, so stay tuned on www.pytables.org for further announcements. We would like to hear your opinion on the video so we can do it better the next time. We are also open to suggestions for the topics of future videos. You can contact us at pytables at carabos.com. Best regards, :: Ivan Vilata i Balaguer >qo< http://www.carabos.com/ C?rabos Coop. V. V V Enjoy Data "" -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 307 bytes Desc: Digital signature URL: From robert.kern at gmail.com Mon Feb 11 12:27:56 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Feb 2008 11:27:56 -0600 Subject: [Numpy-discussion] New bug with "setup,py develop" In-Reply-To: <5b8d13220802110149y397bc899k99a3c44ee49d8063@mail.gmail.com> References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> <3d375d730802110038x60c0ee84r2e6510851ad35b9a@mail.gmail.com> <5b8d13220802110149y397bc899k99a3c44ee49d8063@mail.gmail.com> Message-ID: <3d375d730802110927u27dd4ca5p9846a32e6068053a@mail.gmail.com> On Feb 11, 2008 3:49 AM, David Cournapeau wrote: > On Feb 11, 2008 5:38 PM, Robert Kern wrote: > > On Feb 11, 2008 2:21 AM, Robert Kern wrote: > > > I've just updated the SVN trunk to get the latest numscons merge. > > > Something broke the support I put in for the setuptools "develop" > > > command. In order to make sure that setuptools' "develop" works with > > > numpy.distutils' "build_src", we override the "develop" command to > > > reinitialize the "build_src" command to add the --inplace option. This > > > used to work as of r4772, but now any Fortran Extensions have the > > > generated sources added twice. > > > > Spoke too soon. It fails with r4772, too. > > Does it mean that it was already broken before numscons merge or not ? Yes. I think I figured out the problem. "python setup.py develop" works fine. However, I ran into the problem when I added "develop" to the end of a command line that already had "build". So essentially, "build_src" gets run twice. I'm not sure there is anything that can be done about that given distutils' handling of options. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lou_boog2000 at yahoo.com Mon Feb 11 13:53:40 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 11 Feb 2008 10:53:40 -0800 (PST) Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: Message-ID: <726858.37449.qm@web34405.mail.mud.yahoo.com> I will be writing some C code that I will compile into a shared library (.so) on my MacOSX computer to use with ctypes. That code will be calling code from a (big) scientific numerical library (Gnu Scientific Library - GSL) to crunch the numbers. But I don't see how I incorporate that code into the .so file so my shared code can get to it when I call it from Python with ctypes. I do _not_ want to make the GSL callable from Python, only from my own C module. I suspect this isn't a ctypes question in particular. 
I'm hoping to avoid having to turn the whole GSL into a shared library and loading it just to use a few functions. Or avoid having to track down which functions my code will call (all the way down the trees) and rip that out to add to my own shared lib. There's got to be a better way to make use of big, useful libraries when speeding up Python with a shared lib extension. I hope. Maybe there are ways to do this using a gcc or g++ option. Right now my makefile is simply gcc -bundle -flat_namespace -undefined suppress -o mycode.so mycode.o gcc -c mycode.c -o mycode.o Any hints appreciated. I will continue googling. Nothing so far. Thanks. -- Lou Pecora, my views are my own. --------------------------------- Never miss a thing. Make Yahoo your homepage. -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Mon Feb 11 15:15:10 2008 From: faltet at carabos.com (Francesc Altet) Date: Mon, 11 Feb 2008 21:15:10 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802111206.36675.faltet@carabos.com> Message-ID: <200802112115.11255.faltet@carabos.com> A Monday 11 February 2008, Charles R Harris escrigué: > I've attached my working _sortmodule.c.src file so you can fool with > these different changes on your machines also. This is on top of > current svn. Ok. In order to compare pears with pears, I've decided to create a standalone program in C (attached), based on your version (yes, it is almost the same as the one that I came up with). This also allows it to be run quickly on as many platforms as possible. The compiler throws some warnings, but they are not important (I think). Here are the results of running it on several platforms: 1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz) Benchmark with 1000000 strings of size 15 C qsort with C style compare: 2.450000 C qsort with Python style compare: 2.440000 NumPy newqsort: 0.650000 2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz) Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.971000 C qsort with Python style compare: 0.962000 NumPy newqsort: 0.921000 3) An Opteron server: SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz) Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.640000 C qsort with Python style compare: 0.600000 NumPy newqsort: 0.590000 Some of the conclusions that can be drawn: * C qsort performs pretty badly on my Pentium4 laptop with Ubuntu * C qsort on Win on my laptop performs very similarly to newqsort * newqsort performs much better on my Ubuntu Linux than in Windows * On Opteron, C qsort and newqsort do perform very similarly * and most importantly, newqsort runs faster on *all* platforms So, given the last conclusion, I think it is safe to check newqsort in NumPy (unless something catastrophic occurs on other platforms). Finally, a couple of small things: * MSVC doesn't swallow the "inline" qualifier. So we should remove it and hope that most NumPy installations will be compiled -O3 at least. * I'd definitely keep memcpy by default. From my timings, it looks like the best option for all platforms. I hope the benchmark will behave well on your platform too (i.e. newqsort will perform the best ;) Cheers, -- >0,0< Francesc Altet     http://www.carabos.com/ V V Cárabos Coop. V.   Enjoy Data "-" -------------- next part -------------- A non-text attachment was scrubbed...
Name: sort-string-bench.c Type: text/x-csrc Size: 4979 bytes Desc: not available URL: From charlesr.harris at gmail.com Mon Feb 11 16:07:47 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 Feb 2008 14:07:47 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802112115.11255.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802111206.36675.faltet@carabos.com> <200802112115.11255.faltet@carabos.com> Message-ID: On Feb 11, 2008 1:15 PM, Francesc Altet wrote: > A Monday 11 February 2008, Charles R Harris escrigu?: > > I've attached my working _sortmodule.c.src file so you can fool with > > these different changes on your machines also. This is on top of > > current svn. > > Ok. In order to compare pears with pears, I've decided to create a > standalone program in C (attached), based on your version (yes, it is > almost the same that the one that I came up with). This also allows to > run it quickly in as many platforms as possible. The compiler throws > some warnings, but they are not important (I think). > > Here are the results of running it in several platforms: > > 1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz) > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 2.450000 > C qsort with Python style compare: 2.440000 > NumPy newqsort: 0.650000 > Wow, what a difference. > > 2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz) > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.971000 > C qsort with Python style compare: 0.962000 > NumPy newqsort: 0.921000 > > 3) An Opteron server: SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz) > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.640000 > C qsort with Python style compare: 0.600000 > NumPy newqsort: 0.590000 > > Some of the conclusions that can be drawn: > > * C qsort performs pretty badly on my Pentium4 laptop with Ubuntu > * C qsort on Win on my laptop performs very similar to newqsort > * newqsort performs much better on my Ubuntu Linux than in Windows > * On Opteron, C qsort and newqsort do perform very similarly > * and most importantly, newqsort runs faster in *all* platforms > > So, provided the last conclusion, I think it is safe to check newqsort > in NumPy (unless something catastrofic might occur on other platforms). > > Finally, a couple of small things: > > * MSVC doesn't swallow the "inline" qualifier. So we should remove it > and hope that most of NumPy installations will be compiled -O3 at > least. > I was afraid of that. The inline keyword is a fairly new standard; gcc has had it for a while but the older versions of MSVC didn't. I don't know if the newer MSVC versions do. IIRC, there was another way to get MSVC to inline. Of course, we could go to C++ :0) > * I'd definitely keep memcpy by default. From my timings, it looks like > the best option for all platforms. > OK. Was that just for the copies, or was it for the swaps also? I ran a version of swap using memcpy on my machine and the sort was about half as fast for 8 character strings. > > I hope the benchmark will behave well in your platform too (i.e. > newqsort will perform the best ;) > I'll check it out when I get home. As I say, it was running about 10% slower on my machine, but if it does better on most platforms it is probably the way to go. We can always change it in the future when everyone is running on quantum computers. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at shrogers.com Mon Feb 11 21:02:57 2008 From: steve at shrogers.com (Steven H. Rogers) Date: Mon, 11 Feb 2008 19:02:57 -0700 Subject: [Numpy-discussion] py2exe (was Numpy-discussion Digest, Vol 17, Issue 20) In-Reply-To: <4ff732450802110632v69cfdc52od16da5022fc1c86b@mail.gmail.com> References: <4ff732450802110632v69cfdc52od16da5022fc1c86b@mail.gmail.com> Message-ID: <47B0FE51.6080900@shrogers.com> matthew yeomans wrote: > > I tried that; I reduced everything, for example: > > from pylab import plot > from pylab import axis > .... and so on > > > What I think I noticed is that when I call > from numpy import array > > it confuses numpy's array with the array module. > > Is there a way to tell py2exe that it requires array from numpy > and not the array module? > > > Thanks for all the help. > > Matthew Yeomans Matthew: First, you've already been asked not to reply to the Digest, but to use a meaningful Subject. Second, this really sounds like a py2exe issue, not a NumPy issue, so you should check the py2exe wiki and mailing list. # Steve From cournape at gmail.com Mon Feb 11 21:10:57 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 12 Feb 2008 11:10:57 +0900 Subject: [Numpy-discussion] New bug with "setup,py develop" In-Reply-To: References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> Message-ID: <5b8d13220802111810r501c3fav9f6f0f0c9d62863d@mail.gmail.com> On Feb 11, 2008 5:40 PM, Charles R Harris wrote: > > > > On Feb 11, 2008 1:21 AM, Robert Kern wrote: > > > I've just updated the SVN trunk to get the latest numscons merge. > > Something broke the support I put in for the setuptools "develop" > > command. In order to make sure that setuptools' "develop" works with > > numpy.distutils' "build_src", we override the "develop" command to > > reinitialize the "build_src" command to add the --inplace option. This > > used to work as of r4772, but now any Fortran Extensions have the > > generated sources added twice. This causes links to fail since the > > same symbol shows up twice. > > > > While we're talking build, how do I set the compiler flags? Numpy here > > always compiles with -march=i386, which seems a bit conservative. My > > environment flags are also ignored, but I assume there is someway of getting > > the compile to behave. Well, you assumed wrong :) It is one of the reasons why I started working on scons support. It is extremely awkward to control compilation flags with the Python distutils from the stdlib, and by design (flags are added at several different locations, which depend on the platform, and understanding the exact logic is beyond mere mortal capability). You can use CFLAGS with distutils, but I don't really understand what's happening in this case: it seems that they are appended to the current compilation flags, but the order also seems to depend on the architecture. Sometimes they are inserted toward the end of compilation flags. With numscons, CFLAGS override the optimization/warning flags (but not -fPIC, etc... which are necessary to build the code; internally, optimization/debug/warning/necessary flags are separated). One example: with numpy.distutils, compiling for debug is not trivial: CFLAGS="-DDEBUG" python setup.py build will still compile with -O2, and with -DNDEBUG. You need to add -O0 to override the default -O2 used by distutils (at least for gcc, when several options of the same kind are given, the last one wins), and you have problems for options which cannot be overridden by later options.
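For reference, a quick way to see the base flags that an environment CFLAGS gets appended to is to query the distutils configuration (a sketch for Unix builds of Python, where distutils composes its compile line from the Makefile config variables and then appends os.environ['CFLAGS'] after them):

from distutils import sysconfig
# the flags distutils starts from; an environment CFLAGS is appended
# after these, which is why the default -O2 and -DNDEBUG survive
print sysconfig.get_config_var('OPT')
print sysconfig.get_config_var('BASECFLAGS')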
With numscons: CFLAGS="-DDEBUG" will compile with only this option (and -fPIC, etc... on Linux). I intend to add more options on the UI, but anyway, they will all be overridable (for example, only overriding warning flags, etc...). cheers, David From charlesr.harris at gmail.com Mon Feb 11 22:42:33 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 Feb 2008 20:42:33 -0700 Subject: [Numpy-discussion] New bug with "setup,py develop" In-Reply-To: <5b8d13220802111810r501c3fav9f6f0f0c9d62863d@mail.gmail.com> References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> <5b8d13220802111810r501c3fav9f6f0f0c9d62863d@mail.gmail.com> Message-ID: On Feb 11, 2008 7:10 PM, David Cournapeau wrote: > On Feb 11, 2008 5:40 PM, Charles R Harris > wrote: > > > > > > > > On Feb 11, 2008 1:21 AM, Robert Kern wrote: > > > > > I've just updated the SVN trunk to get the latest numscons merge. > > > Something broke the support I put in for the setuptools "develop" > > > command. In order to make sure that setuptools' "develop" works with > > > numpy.distutils' "build_src", we override the "develop" command to > > > reinitialize the "build_src" command to add the --inplace option. This > > > used to work as of r4772, but now any Fortran Extensions have the > > > generated sources added twice.
This causes links to fail > since the > > > same symbol shows up twice. > > > > > > > While we're talking build, how do I set the compiler flags? > Numpy here > > always compiles with -march=i386, which seems a bit > conservative. My > > environment flags are also ignored, but I assume there is > someway of getting > > the compile to behave. > > Well, you assumed wrong :) > > For some reason I was hoping you would pipe up :0) Yeah, that in > itself is a good reason for trying something like scons. To be more exact: you can simply add flags in distutils. It is just that there is not much logic to handle different cases (e.g. having different set of warnings for pyrex vs swig vs normal C extensions is difficult). > I note that with -O2 -finline-functions, or -O3, I can knock almost > 30% off the string sort times. That's a lot better than I can do > fooling around with the code. You should be able to do it just with numpy.distutils: CFLAGS="-O2 -finline-functions" setup.py build What should be easier with numscons at some point is a fine grained control. For example, the basic configuration is set in files similar to site.cfg right now, and I intend to add the possibility to set your own file in a near future (you can look for the file numscons/core/compiler.cfg to see how I do things now, in numscons sources). If that's the kind of things you are interested in, do not hesitate to try thing with numscons and tell me what's missing/what could be improved. I am mostly focused on getting things working by default on all platforms right now, but I am interested by inputs from other people with different needs. cheers, David From eads at soe.ucsc.edu Tue Feb 12 00:50:30 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Mon, 11 Feb 2008 22:50:30 -0700 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <726858.37449.qm@web34405.mail.mud.yahoo.com> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> Message-ID: <47B133A6.20102@soe.ucsc.edu> Dear Lou, You may want to try using distutils or setuputils, which makes compiling extensions much easier. It does the hard work of finding out which flags are needed to compile extensions on the host platform. There are many examples on the web on how to use distutils to build C extensions (http://docs.python.org/ext/building.html). PyGSL interfaces with numpy, and it may have what you need. The trouble with calling GSL directly from ctypes is GSL's input and output is very structure-oriented, which complicates the python/C logic. Like you mentioned, GSL is pretty big, and you'd like to avoid loading it in its entirety. Even if you write your own shared library module that links against GSL, GSL might still get loaded in its entirety when your own shared library gets loaded. There is a mode argument with C's dlopen and the ctypes.CDLL function. dlopen supports a RTLD_LAZY flag (see man dlopen), which you can try passing to ctypes.CDLL, but I'm not sure what will happen since the ctypes documentation makes no mention of it. Here is a quick example of how to call a C function from python using ctypes. /** Sum num_elements numbers and return the result. **/ extern int myfunc(int *numbers, int num_elements) { int i, sum = 0; for (i = 0; i < num_elements; i++) { sum += numbers[i]; } return sum; } # First compile the source $ gcc -DNDEBUG -O2 -g -pipe -Wall -fstack-protector -D_GNU_SOURCE -fPIC -I/usr/lib/python2.5/site-packages/numpy/core/include -I/usr/include/python2.5 -c foo.c -o foo.o # Now link it. 
$ gcc -pthread -shared foo.o -L/usr/lib -lpython2.5 -o foo.so # Now run it in python. $ ipython In [1]: import numpy, ctypes # Load the shared library we just created In [2]: foo=ctypes.CDLL('./foo.so') # Create a numpy array of ints In [3]: A=numpy.array([1,2,3],dtype='int') # Set the return type we expect. In [4]: foo.myfunc.restype = ctypes.c_int # Call the C function. In [5]: print foo.myfunc(A.ctypes.data, ctypes.c_int(3)) 6 I hope this helps. Damian Lou Pecora wrote: > I will be writing some C code that I will compile into a shared library > (.so) on my MacOSX computer to use with ctypes. That code will be > calling code from a (big) scientific numerical library (Gnu Scientific > Library - GSL) to crunch the numbers. But I don't see how I incorporate > that code into the .so file so my shared code can get to it when I call > it from Python with ctypes. I do _not_ want to make the GSL callable > from Python, only from my own C module. I suspect this isn't a ctypes > question in particular. I'm hoping to avoid having to tur the whole GSL > into a shared library and loading it just to use a few functions. Or > avoid having to track down which functions my code will call (all the > way down the trees) and rip that out to add to my own shared lib. > There's got to be a better way to make use of big, useful libraries when > speeding up python with shared lib extension. I hope. > > Maybe there are ways to do this using a gcc or g++ option. Right now my > make file is simply > > gcc - bundle -flat_namespace -undefined suppress -o mycode.so mycode.o > > gcc -c mycode.c -o mycode.o > > Any hints appreciated. I will continue googling. Nothing so far. Thanks. > > > > > -- Lou Pecora, my views are my own. > > ------------------------------------------------------------------------ > Never miss a thing. Make Yahoo your homepage. > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From Garry.Willgoose at newcastle.edu.au Tue Feb 12 00:52:51 2008 From: Garry.Willgoose at newcastle.edu.au (Garry Willgoose) Date: Tue, 12 Feb 2008 16:52:51 +1100 Subject: [Numpy-discussion] f2py: sharing F90 module data between modules Message-ID: <87366D56-ECB4-43C6-BE95-D17C5181A99E@newcastle.edu.au> I have a suite of fortran modules that I want to wrap with f2py independently (so they appear to python as seperate imports) but where each module has access to another fortran module (which contains global data that is shared between the suite of fortran modules). I currently compile all the fortran modules, including the global data, in a single f2py statement so all the fortran code gets imported in a single statement import tsim where tsim is the python name of the aggregated modules. What I would like to do is as below, where fm1, fm2 are two example modules that will be called from python, and fm3 has the global data shared between fm1, fm2. 
module fm1 contains subroutine modify(myvalue) use fm3 real :: myvalue value=myvalue end subroutine modify real function result() use fm3 result=value end function result end module fm1 module fm2 contains subroutine modify(myvalue) use fm3 real, intent(in) :: myvalue value=myvalue end subroutine modify real function result() use fm3 result=value end function result end module fm2 module fm3 real :: value=3.0 end module fm3 The result of running the following script is import fm1,fm2 result1=fm1.fm1.result() result2=fm2.fm2.result() print 'original',result1,result2 x=10.0 fm1.fm1.modify(x) result1=fm1.fm1.result() result2=fm2.fm2.result() print 'fm1 allocate',result1,result2 x=20.0 fm2.fm2.modify(x) result1=fm1.fm1.result() result2=fm2.fm2.result() print 'fm2 allocate',result1,result2 is original 3.0 3.0 fm1 allocate 10.0 3.0 fm2 allocate 10.0 20.0 which clearly shows that fm1, and fm2 have independent images of fm3 (I vaguely recall some time back finding docs that said that f2py/ python loaded each imported into seperate memory and common/modules couldn't be shared between python modules). As I mentioned if fm1,fm2,fm3 are all compiled in a single f2py statement (so they are in a single python module) then this works fine. I guess the question is there any way that I can get fm3 to be shared between fm1 and fm2? The reasons for wanting to do this are because I'm developing a plug-in like architecture for environmental modelling where the user can develop new fortran modules (suitably f2py'ed) that can just be dropped into the module search path but still have access to the global data (subject to fortran module interfaces, etc). ==================================================================== Prof Garry Willgoose, Australian Professorial Fellow in Environmental Engineering, Director, Centre for Climate Impact Management (C2IM), School of Engineering, The University of Newcastle, Callaghan, 2308 Australia. Centre webpage: www.c2im.org.au Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574 (Fri PM-Mon) FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal and Telluric) Env. Engg. Secretary: (International) +61 2 4921 6042 email: garry.willgoose at newcastle.edu.au; g.willgoose at telluricresearch.com email-for-life: garry.willgoose at alum.mit.edu personal webpage: www.telluricresearch.com/garry ==================================================================== "Do not go where the path may lead, go instead where there is no path and leave a trail" Ralph Waldo Emerson ==================================================================== From cournapeau at cslab.kecl.ntt.co.jp Tue Feb 12 01:05:55 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Tue, 12 Feb 2008 15:05:55 +0900 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <47B133A6.20102@soe.ucsc.edu> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> Message-ID: <1202796356.29123.14.camel@bbc8> On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: > Dear Lou, > > You may want to try using distutils or setuputils, which makes compiling > extensions much easier. It does the hard work of finding out which flags > are needed to compile extensions on the host platform. There are many > examples on the web on how to use distutils to build C extensions > (http://docs.python.org/ext/building.html). Unfortunately, this does not work. 
Distutils only knows how to build python extensions, not shared libraries. Depending on the platform, this is not the same thing, and mac os X is such a platform where both are not the same. cheers, David From eads at soe.ucsc.edu Tue Feb 12 01:14:06 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Mon, 11 Feb 2008 23:14:06 -0700 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <1202796356.29123.14.camel@bbc8> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> <1202796356.29123.14.camel@bbc8> Message-ID: <47B1392E.8050607@soe.ucsc.edu> David Cournapeau wrote: > On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: >> Dear Lou, >> >> You may want to try using distutils or setuputils, which makes compiling >> extensions much easier. It does the hard work of finding out which flags >> are needed to compile extensions on the host platform. There are many >> examples on the web on how to use distutils to build C extensions >> (http://docs.python.org/ext/building.html). > > Unfortunately, this does not work. Distutils only knows how to build > python extensions, not shared libraries. Depending on the platform, this > is not the same thing, and mac os X is such a platform where both are > not the same. > > cheers, > > David Really? distutils generates .so files for me, which I assume are shared libraries. FYI: I'm running Fedora 8 on an x86. Does distutils not generate a shared library on a mac? Damian From robert.kern at gmail.com Tue Feb 12 01:29:17 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Feb 2008 00:29:17 -0600 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <47B1392E.8050607@soe.ucsc.edu> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> <1202796356.29123.14.camel@bbc8> <47B1392E.8050607@soe.ucsc.edu> Message-ID: <3d375d730802112229r198185a5ya645a963a6b8a484@mail.gmail.com> On Feb 12, 2008 12:14 AM, Damian Eads wrote: > David Cournapeau wrote: > > On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: > >> Dear Lou, > >> > >> You may want to try using distutils or setuputils, which makes compiling > >> extensions much easier. It does the hard work of finding out which flags > >> are needed to compile extensions on the host platform. There are many > >> examples on the web on how to use distutils to build C extensions > >> (http://docs.python.org/ext/building.html). > > > > Unfortunately, this does not work. Distutils only knows how to build > > python extensions, not shared libraries. Depending on the platform, this > > is not the same thing, and mac os X is such a platform where both are > > not the same. > > > > cheers, > > > > David > > Really? distutils generates .so files for me, which I assume are shared > libraries. FYI: I'm running Fedora 8 on an x86. Does distutils not > generate a shared library on a mac? Python extension modules are shared libraries, yes. But they must follow a particular format, namely exposing a correct "init" function. distutils/setuptools only maked Python extension modules, not arbitrary shared libraries. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From robert.kern at gmail.com Tue Feb 12 01:33:12 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Feb 2008 00:33:12 -0600 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <47B1392E.8050607@soe.ucsc.edu> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> <1202796356.29123.14.camel@bbc8> <47B1392E.8050607@soe.ucsc.edu> Message-ID: <3d375d730802112233m3544b03ej6696ab0cad95d745@mail.gmail.com> On Feb 12, 2008 12:14 AM, Damian Eads wrote: > David Cournapeau wrote: > > On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: > >> Dear Lou, > >> > >> You may want to try using distutils or setuputils, which makes compiling > >> extensions much easier. It does the hard work of finding out which flags > >> are needed to compile extensions on the host platform. There are many > >> examples on the web on how to use distutils to build C extensions > >> (http://docs.python.org/ext/building.html). > > > > Unfortunately, this does not work. Distutils only knows how to build > > python extensions, not shared libraries. Depending on the platform, this > > is not the same thing, and mac os X is such a platform where both are > > not the same. > > > > cheers, > > > > David > > Really? distutils generates .so files for me, which I assume are shared > libraries. FYI: I'm running Fedora 8 on an x86. Does distutils not > generate a shared library on a mac? As to David's point, yes, distutils makes a .so shared library on Macs. This is not the same thing as a dynamic library (on Macs) which is what ctypes needs (on Macs), IIRC. There is a subtle, but important difference between the two. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From eads at soe.ucsc.edu Tue Feb 12 01:41:55 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Mon, 11 Feb 2008 23:41:55 -0700 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <3d375d730802112229r198185a5ya645a963a6b8a484@mail.gmail.com> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> <1202796356.29123.14.camel@bbc8> <47B1392E.8050607@soe.ucsc.edu> <3d375d730802112229r198185a5ya645a963a6b8a484@mail.gmail.com> Message-ID: <47B13FB3.4000703@soe.ucsc.edu> Robert Kern wrote: > On Feb 12, 2008 12:14 AM, Damian Eads wrote: >> David Cournapeau wrote: >>> On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: >>>> Dear Lou, >>>> >>>> You may want to try using distutils or setuputils, which makes compiling >>>> extensions much easier. It does the hard work of finding out which flags >>>> are needed to compile extensions on the host platform. There are many >>>> examples on the web on how to use distutils to build C extensions >>>> (http://docs.python.org/ext/building.html). >>> Unfortunately, this does not work. Distutils only knows how to build >>> python extensions, not shared libraries. Depending on the platform, this >>> is not the same thing, and mac os X is such a platform where both are >>> not the same. >>> >>> cheers, >>> >>> David >> Really? distutils generates .so files for me, which I assume are shared >> libraries. FYI: I'm running Fedora 8 on an x86. Does distutils not >> generate a shared library on a mac? > > Python extension modules are shared libraries, yes. 
But they must > follow a particular format, namely exposing a correct > "init" function. distutils/setuptools only maked Python > extension modules, not arbitrary shared libraries. Perhaps I was a bit too liberal in my use of the term "extension module". Several small libraries for a project at work do not define the standard init function, and yet they build with distutils. I can load them into ctypes without any hitches. Perhaps distutils does not check for the presence of the init function and required data structures? I'll admit I may be abusing distutils by using it for something for which it wasn't designed. Damian From robert.kern at gmail.com Tue Feb 12 01:54:17 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Feb 2008 00:54:17 -0600 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <47B13FB3.4000703@soe.ucsc.edu> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> <1202796356.29123.14.camel@bbc8> <47B1392E.8050607@soe.ucsc.edu> <3d375d730802112229r198185a5ya645a963a6b8a484@mail.gmail.com> <47B13FB3.4000703@soe.ucsc.edu> Message-ID: <3d375d730802112254q29e824c8hfffb898ca66dc9ad@mail.gmail.com> On Feb 12, 2008 12:41 AM, Damian Eads wrote: > Robert Kern wrote: > > On Feb 12, 2008 12:14 AM, Damian Eads wrote: > >> David Cournapeau wrote: > >>> On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: > >>>> Dear Lou, > >>>> > >>>> You may want to try using distutils or setuputils, which makes compiling > >>>> extensions much easier. It does the hard work of finding out which flags > >>>> are needed to compile extensions on the host platform. There are many > >>>> examples on the web on how to use distutils to build C extensions > >>>> (http://docs.python.org/ext/building.html). > >>> Unfortunately, this does not work. Distutils only knows how to build > >>> python extensions, not shared libraries. Depending on the platform, this > >>> is not the same thing, and mac os X is such a platform where both are > >>> not the same. > >>> > >>> cheers, > >>> > >>> David > >> Really? distutils generates .so files for me, which I assume are shared > >> libraries. FYI: I'm running Fedora 8 on an x86. Does distutils not > >> generate a shared library on a mac? > > > > Python extension modules are shared libraries, yes. But they must > > follow a particular format, namely exposing a correct > > "init" function. distutils/setuptools only maked Python > > extension modules, not arbitrary shared libraries. > > Perhaps I was a bit too liberal in my use of the term "extension > module". Several small libraries for a project at work do not define the > standard init function, and yet they build with distutils. I > can load them into ctypes without any hitches. Perhaps distutils does > not check for the presence of the init function and required > data structures? I'll admit I may be abusing distutils by using it for > something for which it wasn't designed. Yup. It usually works on Linux because the init bit isn't checked. On Windows, it is, and the build will fail. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From cournapeau at cslab.kecl.ntt.co.jp Tue Feb 12 02:05:16 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Tue, 12 Feb 2008 16:05:16 +0900 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <47B1392E.8050607@soe.ucsc.edu> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> <1202796356.29123.14.camel@bbc8> <47B1392E.8050607@soe.ucsc.edu> Message-ID: <1202799916.29123.27.camel@bbc8> On Mon, 2008-02-11 at 23:14 -0700, Damian Eads wrote: > David Cournapeau wrote: > > On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: > >> Dear Lou, > >> > >> You may want to try using distutils or setuputils, which makes compiling > >> extensions much easier. It does the hard work of finding out which flags > >> are needed to compile extensions on the host platform. There are many > >> examples on the web on how to use distutils to build C extensions > >> (http://docs.python.org/ext/building.html). > > > > Unfortunately, this does not work. Distutils only knows how to build > > python extensions, not shared libraries. Depending on the platform, this > > is not the same thing, and mac os X is such a platform where both are > > not the same. > > > > cheers, > > > > David > > Really? distutils generates .so files for me, which I assume are shared > libraries. That's correct. But python extensions are not shared libraries; more exactly: some systems make the different between libraries which are loaded when the executable is launched (dynamic linking), and libraries which are loaded dynamically (dynamic loading, through dlopen), possible in the middle of the execution. All python extensions fall in the later category. It happens that on Linux (and most unices I know, mac os x being an exception), those are the same. But on mac os x (and windows), those are different: that's why you have .so and .dylib. dlopen does not really exist on mac os X, in the sense that it is a wrapper around the native linker/loader tools (and dlopen is not 100 % complete: you cannot unload/reload extensions I think on mac os X). Some of the differences are namespace (mac os X has the notion of a namespace for libraries, which I know nothing about; unices traditionally have a flat namespace for libraries), etc... T David From cournapeau at cslab.kecl.ntt.co.jp Tue Feb 12 02:12:45 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Tue, 12 Feb 2008 16:12:45 +0900 Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <1202799916.29123.27.camel@bbc8> References: <726858.37449.qm@web34405.mail.mud.yahoo.com> <47B133A6.20102@soe.ucsc.edu> <1202796356.29123.14.camel@bbc8> <47B1392E.8050607@soe.ucsc.edu> <1202799916.29123.27.camel@bbc8> Message-ID: <1202800365.29123.30.camel@bbc8> On Tue, 2008-02-12 at 16:05 +0900, David Cournapeau wrote: > On Mon, 2008-02-11 at 23:14 -0700, Damian Eads wrote: > > David Cournapeau wrote: > > > On Mon, 2008-02-11 at 22:50 -0700, Damian Eads wrote: > > >> Dear Lou, > > >> > > >> You may want to try using distutils or setuputils, which makes compiling > > >> extensions much easier. It does the hard work of finding out which flags > > >> are needed to compile extensions on the host platform. There are many > > >> examples on the web on how to use distutils to build C extensions > > >> (http://docs.python.org/ext/building.html). > > > > > > Unfortunately, this does not work. 
Distutils only knows how to build > > > python extensions, not shared libraries. Depending on the platform, this > > > is not the same thing, and mac os X is such a platform where both are > > > not the same. > > > > > > cheers, > > > > > > David > > > > Really? distutils generates .so files for me, which I assume are shared > > libraries. > > That's correct. But python extensions are not shared libraries; more > exactly: some systems make the different between libraries which are > loaded when the executable is launched (dynamic linking), and libraries > which are loaded dynamically (dynamic loading, through dlopen), possible > in the middle of the execution. All python extensions fall in the later > category. > > It happens that on Linux (and most unices I know, mac os x being an > exception), those are the same. But on mac os x (and windows), those are > different: that's why you have .so and .dylib. dlopen does not really > exist on mac os X, in the sense that it is a wrapper around the native > linker/loader tools (and dlopen is not 100 % complete: you cannot > unload/reload extensions I think on mac os X). Some of the differences > are namespace (mac os X has the notion of a namespace for libraries, > which I know nothing about; unices traditionally have a flat namespace > for libraries), etc... > If you are interested in knowing the difference between bundle and dynamic libraries (mac os X), here is some info: http://www.finkproject.org/doc/porting/porting.en.html#shared.lib-and-mod David From faltet at carabos.com Tue Feb 12 03:58:44 2008 From: faltet at carabos.com (Francesc Altet) Date: Tue, 12 Feb 2008 09:58:44 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802112115.11255.faltet@carabos.com> Message-ID: <200802120958.44829.faltet@carabos.com> A Monday 11 February 2008, Charles R Harris escrigu?: > On Feb 11, 2008 1:15 PM, Francesc Altet wrote: > > Here are the results of running it in several platforms: > > > > 1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz) > > Benchmark with 1000000 strings of size 15 > > C qsort with C style compare: 2.450000 > > C qsort with Python style compare: 2.440000 > > NumPy newqsort: 0.650000 > > Wow, what a difference. Yeah. This is why I got so excited initially. It's unfortunate that most of the speed-up in newqsort in this case is probably due to a possible flaw in qsort implementation for the combination Ubuntu/Pentium4. On the positive side, it is nice to see that other distros/processors have a decent performance on system qsort ;-) > > * I'd definitely keep memcpy by default. From my timings, it looks > > like the best option for all platforms. > > OK. Was that just for the copies, or was it for the swaps also? I ran > a version of swap using memcpy on my machine and the sort was about > half as fast for 8 character strings. No, only for the copies. For the swaps, this memcpy-based version: #define swap_string(s1, s2, len) { \ memcpy((vp), (s2), (len)); \ memcpy((s2), (s1), (len)); \ memcpy((s1), (vp), (len)); \ } performs extremely bad on my systems: Pentium4/Ubuntu 7.10: * newqsort with the loop version of swap_string: 0.65 s * newqsort with the memcpy version of swap_string: 9.14 s Opteron/SuSe LE 10.3: * newqsort with the loop version of swap_string: 0.59 s * newqsort with the memcpy version of swap_string: 8.71 s So, it seems that the nice newqsort performance is extremely dependent on the loop version of swap_string. 
> > I hope the benchmark will behave well on your platform too (i.e.
> > newqsort will perform the best ;)
>
> I'll check it out when I get home. As I say, it was running about 10%
> slower on my machine, but if it does better on most platforms it is
> probably the way to go. We can always change it in the future when
> everyone is running on quantum computers.

Quantum computers? Oh, I can't wait for mine ;-)

--
>0,0< Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From dmitrey.kroshko at scipy.org Tue Feb 12 05:36:03 2008
From: dmitrey.kroshko at scipy.org (dmitrey)
Date: Tue, 12 Feb 2008 12:36:03 +0200
Subject: [Numpy-discussion] asfarray() drops precision (float128->float64) - is it correct?
Message-ID: <47B17693.8000000@scipy.org>

As for me, it yields lots of inconveniences (lots of my code will have to be rewritten, since I didn't know about this before):

from numpy import *
a = array((1.0, 2.0), float128)
b = asfarray(a)
type(a[0])   # <type 'numpy.float128'>
type(b[0])   # <type 'numpy.float64'>
__version__  # '1.0.5.dev4767'

Shouldn't it be changed (i.e., let asfarray keep float128)? I use asfarray() very often, since I don't know whether the user will provide arrays as a numpy ndarray, a matrix, or a Python list/tuple.

D.

From pearu at cens.ioc.ee Tue Feb 12 06:10:53 2008
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Tue, 12 Feb 2008 13:10:53 +0200 (EET)
Subject: [Numpy-discussion] f2py: sharing F90 module data between modules
In-Reply-To: <87366D56-ECB4-43C6-BE95-D17C5181A99E@newcastle.edu.au>
References: <87366D56-ECB4-43C6-BE95-D17C5181A99E@newcastle.edu.au>
Message-ID: <54410.85.166.27.136.1202814653.squirrel@cens.ioc.ee>

On Tue, February 12, 2008 7:52 am, Garry Willgoose wrote:
> I have a suite of fortran modules that I want to wrap with f2py
> independently (so they appear to python as separate imports) but
> where each module has access to another fortran module (which
> contains global data that is shared between the suite of fortran
> modules). I currently compile all the fortran modules, including the
> global data, in a single f2py statement so all the fortran code gets
> imported in a single statement

The source of this issue boils down to

http://bugs.python.org/issue521854

which makes your goal unachievable because of how Python loads shared libraries *by default*; see below.

> I guess the question is there any way that I can get fm3 to be shared
> between fm1 and fm2? The reasons for wanting to do this are because
> I'm developing a plug-in like architecture for environmental
> modelling where the user can develop new fortran modules (suitably
> f2py'ed) that can just be dropped into the module search path but
> still have access to the global data (subject to fortran module
> interfaces, etc).

The link above also gives a hint on how to resolve this issue. Try to use sys.setdlopenflags(...) before importing f2py generated extension modules, and then reset the state using sys.setdlopenflags(0). See http://docs.python.org/lib/module-sys.html for more information on how to find the proper value for ...

HTH,
Pearu

From matthew.brett at gmail.com Tue Feb 12 06:31:17 2008
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 12 Feb 2008 11:31:17 +0000
Subject: [Numpy-discussion] Median advice
Message-ID: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com>

Hi,

I'm just doing the median implementation. This is just to pull together a couple of threads about what the call signature and behavior should be.
We've already established that we're going to leave the default behavior of taking the median over the first axis, even though it differs from that of max, min, mean etc - for compatibility - and change to axis=None (flatten) behavior around v1.1 to harmonize with the others. Suggestion 1: def median(a, axis=0, out=None) (same signature as max, min etc) Problem - there is no memory saving implementation in median at the moment for the out= argument. It's possible to imagine one, but it would be a little annoying to code. Should we put this in the signature, behavior without memory-saving in order to make space for a memory saving implementation in the future? Suggestion 2: def median(a, axis=0, scratch_input=False) Here, according to a suggestion by Anne A - we could allow the routine to use the input matrix a as scratch space to calculate the median, saving memory if the user does not need to preserve the values of a. a would be in an undefined state on return. Any votes, thoughts?. Matthew From Joris.DeRidder at ster.kuleuven.be Tue Feb 12 07:33:36 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Tue, 12 Feb 2008 13:33:36 +0100 Subject: [Numpy-discussion] Median advice In-Reply-To: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com> References: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com> Message-ID: <72D87DAD-8AD5-4CBB-94A4-5DCD99DCCE05@ster.kuleuven.be> On 12 Feb 2008, at 12:31, Matthew Brett wrote: > def median(a, axis=0, out=None) > (same signature as max, min etc) I would be slightly in favour of this option. Using the same signature would be convenient in code like def myfunc(myarray, somefunc): # do stuff ... x = somefunc(myarray, axis = 0, out = None) # do more stuff ... where somefunc could be median(), mean(), max(), min(), std() etc. I once wrote this kind of function to provide (small) image filtering. If the same signature is used, there is no need to special-case median(). I realise it's kind of a niche example, though. Just my 0.02 euros. Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From lou_boog2000 at yahoo.com Tue Feb 12 08:01:45 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 12 Feb 2008 05:01:45 -0800 (PST) Subject: [Numpy-discussion] CTypes: How to incorporate a library with shared library module? In-Reply-To: <47B133A6.20102@soe.ucsc.edu> Message-ID: <587206.97473.qm@web34413.mail.mud.yahoo.com> Damian, Lots of good info there. Thanks very much. -- Lou --- Damian Eads wrote: > Dear Lou, > > You may want to try using distutils or setuputils, > which makes compiling > extensions much easier. It does the hard work of > finding out which flags > are needed to compile extensions on the host > platform. There are many > examples on the web on how to use distutils to build > C extensions > (http://docs.python.org/ext/building.html). [cut] -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From dalcinl at gmail.com Tue Feb 12 08:12:45 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 12 Feb 2008 10:12:45 -0300 Subject: [Numpy-discussion] f2py: sharing F90 module data between modules In-Reply-To: <54410.85.166.27.136.1202814653.squirrel@cens.ioc.ee> References: <87366D56-ECB4-43C6-BE95-D17C5181A99E@newcastle.edu.au> <54410.85.166.27.136.1202814653.squirrel@cens.ioc.ee> Message-ID: On 2/12/08, Pearu Peterson wrote: > according to which makes your goal unachivable because of how > Python loads shared libraries *by default*, see below. > Try to use sys.setdlopenflags(...) before importing f2py generated > extension modules and then reset the state using sys.setdlopenflags(0). I also had to do something similar for solving a different problem, feel free to reuse the code here. This way, you have chances to make it working in a many platforms. You can put this in a __init__.py, and next import all your extensions inside the last try/finally block. http://projects.scipy.org/mpi4py/browser/mpi4py/trunk/src/_rtld.py -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From lou_boog2000 at yahoo.com Tue Feb 12 09:25:02 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 12 Feb 2008 06:25:02 -0800 (PST) Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries" In-Reply-To: <47B133A6.20102@soe.ucsc.edu> Message-ID: <85282.66984.qm@web34410.mail.mud.yahoo.com> First, thanks to all who answered my questions about trying to use a large library with CTypes and my own shared library. The bottom line seems to be this: There is no way to incorporate code external to your own shared library. You have to either pull out the code you want from the static library's source (painful) or you must just include the whole library (huge!) and make it all one big shared library. Did I get that right? If so, it's a sad statement that makes shared libraries harder to write and works against the reuse of older established code bases. I am not criticizing CTypes. This appears to be the way static and shared libraries work, especially on Mac OS X, maybe elsewhere. I'd really like to be wrong about this and I will follow up on some of the suggested reading you all gave me. Thanks, again. -- Lou Pecora, my views are my own. ____________________________________________________________________________________ Looking for last minute shopping deals? Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping From fullung at gmail.com Tue Feb 12 09:42:06 2008 From: fullung at gmail.com (Albert Strasheim) Date: Tue, 12 Feb 2008 16:42:06 +0200 Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries In-Reply-To: <85282.66984.qm@web34410.mail.mud.yahoo.com> References: <47B133A6.20102@soe.ucsc.edu> <85282.66984.qm@web34410.mail.mud.yahoo.com> Message-ID: <5eec5f300802120642s2708058asac437989e40ddd19@mail.gmail.com> Hello, On Feb 12, 2008 4:25 PM, Lou Pecora wrote: > > First, thanks to all who answered my questions about > trying to use a large library with CTypes and my own > shared library. 
The bottom line seems to be this:

> There is no way to incorporate code external to your
> own shared library. You have to either pull out the
> code you want from the static library's source
> (painful) or you must just include the whole library
> (huge!) and make it all one big shared library.
>
> Did I get that right? If so, it's a sad statement
> that makes shared libraries harder to write and works
> against the reuse of older established code bases. I
> am not criticizing CTypes. This appears to be the way
> static and shared libraries work, especially on Mac OS
> X, maybe elsewhere.
>
> I'd really like to be wrong about this and I will
> follow up on some of the suggested reading you all
> gave me.

I only quickly read through the previous thread, but I get the idea that what you want to do is to link your shared library against the GSL shared library and then access your own library using ctypes. If done like this, you don't need to worry about wrapping GSL or pulling GSL code into your own library. As far as I know, this works exactly like it does when you link an executable against a shared library. If distutils doesn't allow you to do this easily, you could try using SCons's SharedLibrary builder instead.

Regards,

Albert

From faltet at carabos.com Tue Feb 12 11:07:13 2008
From: faltet at carabos.com (Francesc Altet)
Date: Tue, 12 Feb 2008 17:07:13 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To:
References: <200802081329.35470.faltet@carabos.com> <200802112115.11255.faltet@carabos.com>
Message-ID: <200802121707.13522.faltet@carabos.com>

On Monday 11 February 2008, Charles R Harris wrote:
> I'll check it out when I get home. As I say, it was running about 10%
> slower on my machine, but if it does better on most platforms it is
> probably the way to go. We can always change it in the future when
> everyone is running on quantum computers.

We've done some testing of newqsort on several computers in our company. Here are the results for ordering a list of 1 million strings of length 15 filled with random information (using C rand()):

1) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz)
C qsort with C style compare: 2.450000
C qsort with Python style compare: 2.440000
NumPy newqsort: 0.650000

2) Windows XP (SP2) (MSVC 7.1, /Ox, Intel Pentium 4 @ 2 GHz)
C qsort with C style compare: 0.971000
C qsort with Python style compare: 0.962000
NumPy newqsort: 0.921000

3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
C qsort with C style compare: 0.640000
C qsort with Python style compare: 0.600000
NumPy newqsort: 0.590000

4) Debian 4.2.2 (lenny) (gcc 4.2.3, -O3, Intel Pentium 4 @ 3.2 GHz)
C qsort with C style compare: 1.770000
C qsort with Python style compare: 1.750000
NumPy newqsort: 0.440000

5) Mandriva 2008.0 (gcc 4.2.2, -O3, Intel Core2 Duo @ 1.5 GHz)
C qsort with C style compare: 1.590000
C qsort with Python style compare: 1.550000
NumPy newqsort: 0.510000

6) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2.5 GHz)
C qsort with C style compare: 1.890000
C qsort with Python style compare: 1.900000
NumPy newqsort: 0.500000

7) Ubuntu 7.1 (gcc 4.1.2, -O3, PowerPC 3 @ 1.3 GHz)
C qsort with C style compare: 3.030000
C qsort with Python style compare: 2.970000
NumPy newqsort: 1.040000

8) MacOSX 10.4 (Tiger) (gcc 4.0.1, -O3, PowerPC 3 @ 1.3 GHz)
C qsort with C style compare: 1.560000
C qsort with Python style compare: 1.510000
NumPy newqsort: 1.220000

All benchmarks have been run using the attached benchmark (if anybody else wants to join the fiesta, please report back your feedback).

Summarizing, one can say a couple of things:

* Recent Debian distros and derivatives (Ubuntu) as well as Mandriva are suffering from an inefficient system qsort (at least the implementation for strings). SuSe Linux Enterprise 10.3 seems to have solved this. And Windows XP (SP2) and MacOSX (Tiger) look like they have a relatively efficient implementation of qsort.

* The newqsort performs the best on all the platforms we have checked (ranging from a 5% improvement on Opteron/SuSe up to 3.8x with some Pentium4/Ubuntu systems).

All in all, I'd also say that newqsort would be a good candidate to be put into NumPy.

Cheers,

--
>0,0< Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

[Attachment scrubbed: sort-string-bench.c, text/x-csrc, 5299 bytes]

From lou_boog2000 at yahoo.com Tue Feb 12 11:19:37 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Tue, 12 Feb 2008 08:19:37 -0800 (PST)
Subject: [Numpy-discussion] C Extensions, CTypes and "external code & librarie
In-Reply-To: <5eec5f300802120642s2708058asac437989e40ddd19@mail.gmail.com>
Message-ID: <819218.10559.qm@web34402.mail.mud.yahoo.com>

--- Albert Strasheim wrote:
> I only quickly read through the previous thread, but I get the idea
> that what you want to do is to link your shared library against the
> GSL shared library and then access your own library using ctypes.
> [cut]

Albert,

Yes, I think you got the idea right.
I want to call my own C code using CTypes interface, then from within my C code call GSL C code, i.e. a C function calling another C function directly. I do *not* want to go back out through the Python interface. So you are right, I do not want to wrap GSL. It sounds like I can just add something like -lnameofGSLdylib (where I put in the real name of the GSL library after the -l) in my gcc command to make my shared lib. Is that right? Thanks for your help. -- Lou Pecora, my views are my own. --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wright at esrf.fr Tue Feb 12 11:20:22 2008 From: wright at esrf.fr (Jon Wright) Date: Tue, 12 Feb 2008 17:20:22 +0100 Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries" In-Reply-To: <85282.66984.qm@web34410.mail.mud.yahoo.com> References: <85282.66984.qm@web34410.mail.mud.yahoo.com> Message-ID: <47B1C746.2010907@esrf.fr> Lou Pecora wrote: >... This appears to be the way > static and shared libraries work, especially on Mac OS > X, maybe elsewhere. Have you tried linking against a GSL static library? I don't have a mac, but most linkers only pull in the routines you need. For example, using windows and mingw: #include #include int main (void) { double x = 5.0; double y = gsl_sf_bessel_J0 (x); printf ("J0(%g) = %.18e\n", x, y); return 0; } ...compiles to a.exe which outputs: J0(5) = -1.775967713143382900e-001 The stripped executable is about 92 kB in comparison to the 2 mega byte libgsl.a. Unstripped there are about 150 symbols containing gsl, compared to 5351 symbols in the library libgsl.a. I just needed to put "-lgsl" on the command line and rename "$LIB/libgsl.dll.def" to something else so the shared version wasn't found. In this case the linker has not pulled in all of the library. Presumably just the parts it needed, including various things like error reporting, sin, cos, exp etc. Older platforms, including vax and various unix'es also seemed to behave in the same way in the past. Are you saying the mac is somehow different? Perhaps they're trying to hold people to "open source ransom", where they have to give away to source so it can be recompiled when the next OSX escapes ;-) Cheers, Jon From fullung at gmail.com Tue Feb 12 11:59:46 2008 From: fullung at gmail.com (Albert Strasheim) Date: Tue, 12 Feb 2008 18:59:46 +0200 Subject: [Numpy-discussion] C Extensions, CTypes and "external code & librarie In-Reply-To: <819218.10559.qm@web34402.mail.mud.yahoo.com> References: <5eec5f300802120642s2708058asac437989e40ddd19@mail.gmail.com> <819218.10559.qm@web34402.mail.mud.yahoo.com> Message-ID: <5eec5f300802120859u325c4454j7ec64a54eb61a5bb@mail.gmail.com> Hello, On Feb 12, 2008 6:19 PM, Lou Pecora wrote: > Albert, > > Yes, I think you got the idea right. I want to call my own C code using > CTypes interface, then from within my C code call GSL C code, i.e. a C > function calling another C function directly. I do *not* want to go back > out through the Python interface. So you are right, I do not want to wrap > GSL. > > It sounds like I can just add something like -lnameofGSLdylib (where I > put in the real name of the GSL library after the -l) in my gcc command to > make my shared lib. Is that right? Sounds about right. 
I don't know the Mac that well as far as the various types of dynamic libraries go, so just check that you're working with the right type of libraries, but you've got the right idea. Regards, Albert From charlesr.harris at gmail.com Tue Feb 12 13:34:14 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Feb 2008 11:34:14 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802121707.13522.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802112115.11255.faltet@carabos.com> <200802121707.13522.faltet@carabos.com> Message-ID: On Feb 12, 2008 9:07 AM, Francesc Altet wrote: > A Monday 11 February 2008, Charles R Harris escrigu?: > > I'll check it out when I get home. As I say, it was running about 10% > > slower on my machine, but if it does better on most platforms it is > > probably the way to go. We can always change it in the future when > > everyone is running on quantum computers. > > We've done some testing on newqsort in several computers in our company. > Here are the results for ordering a list with 1 million of strings of > length 15 filled with random information (using C rand()): > > 1) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz) > C qsort with C style compare: 2.450000 > C qsort with Python style compare: 2.440000 > NumPy newqsort: 0.650000 > > 2) Windows XP (SP2) (MSVC 7.1, /Ox, Intel Pentium 4 @ 2 GHz) > C qsort with C style compare: 0.971000 > C qsort with Python style compare: 0.962000 > NumPy newqsort: 0.921000 > > 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz) > C qsort with C style compare: 0.640000 > C qsort with Python style compare: 0.600000 > NumPy newqsort: 0.590000 > > 4) Debian 4.2.2 (lenny) (gcc 4.2.3, -O3, Intel Pentium 4 @ 3.2 GHz) > C qsort with C style compare: 1.770000 > C qsort with Python style compare: 1.750000 > NumPy newqsort: 0.440000 > > 5) Mandriva 2008.0 (gcc 4.2.2, -O3, Intel Core2 Duo @ 1.5 GHz) > C qsort with C style compare: 1.590000 > C qsort with Python style compare: 1.550000 > NumPy newqsort: 0.510000 > > 6) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2.5 GHz) > C qsort with C style compare: 1.890000 > C qsort with Python style compare: 1.900000 > NumPy newqsort: 0.500000 > > 7) Ubuntu 7.1 (gcc 4.1.2, -O3, PowerPC 3 @ 1.3 GHz) > C qsort with C style compare: 3.030000 > C qsort with Python style compare: 2.970000 > NumPy newqsort: 1.040000 > > 8) MacOSX 10.4 (Tiger) (gcc 4.0.1, -O3, PowerPC 3 @ 1.3 GHz) > C qsort with C style compare: 1.560000 > C qsort with Python style compare: 1.510000 > NumPy newqsort: 1.220000 > > All benchmarks have been run using the attached benchmark (if anybody > else wants to join the fiesta, please report back your feedback). > > Summarizing, one can say a couple of things: > > * Recent Debian distros and derivatives (Ubuntu) as well as Mandriva are > suffering from a innefficient system qsort (at least the implementation > for strings). SuSe Linux Enterprise 10.3 seems to have solved this. > And Windows XP (SP2) and MacOSX (Tiger) looks like they have a > relatively efficient implementation of qsort. > > * The newqsort performs the best on all the platforms we have checked > (ranging from a 5% of improvement on Opteron/SuSe, up to 3.8x with some > Pentium4/Ubuntu systems). > The 3.8 is amazing, isn't it? I've found that the performance also depends on whether I initialize the strings with random or empty. With the random initialization the new sort is ~2x faster. That's fedora 8, core duo, 32 bit OS, gcc 4.1.2. 
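For anyone who wants to see that initialization effect from the Python side, here is a minimal timing sketch (the array size, seed, and helper code are arbitrary assumptions, not from this thread):

import time
import numpy as np

n = 1000000  # arbitrary size, roughly matching the C benchmark

# All-identical (empty) strings vs. randomly filled 15-byte strings.
empty = np.zeros(n, dtype='S15')
raw = np.random.RandomState(0).randint(32, 127, size=(n, 15)).astype(np.uint8)
random = raw.view('S15').ravel()  # reinterpret each 15-byte row as one string

for name, arr in [('empty', empty), ('random', random)]:
    t0 = time.time()
    np.sort(arr, kind='quicksort')  # returns a sorted copy; good enough for timing
    print('%s strings: %.3f s' % (name, time.time() - t0))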
> All in all, I'd also say that newqsort would be a good candidate to be > put into NumPy. > I've merged some sorting tests preparatory to merging the new sorts. There is a release coming up this weekend, I don't know if it is tagged yet, but in any case I plan to merge the new sorts soon. Please help with the testing when I do. Now it's off to paying work. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Feb 12 13:53:55 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 12 Feb 2008 12:53:55 -0600 Subject: [Numpy-discussion] String sort In-Reply-To: <200802121707.13522.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802112115.11255.faltet@carabos.com> <200802121707.13522.faltet@carabos.com> Message-ID: Hi, I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that gives C qsort with C style compare: 0.650000 C qsort with Python style compare: 0.640000 NumPy newqsort: 0.360000 I did notice that -O3 was essential to get the performance gain as -O2 gave: C qsort with C style compare: 0.690000 C qsort with Python style compare: 0.700000 NumPy newqsort: 0.610000 Bruce On Feb 12, 2008 10:07 AM, Francesc Altet wrote: > A Monday 11 February 2008, Charles R Harris escrigu?: > > I'll check it out when I get home. As I say, it was running about 10% > > slower on my machine, but if it does better on most platforms it is > > probably the way to go. We can always change it in the future when > > everyone is running on quantum computers. > > We've done some testing on newqsort in several computers in our company. > Here are the results for ordering a list with 1 million of strings of > length 15 filled with random information (using C rand()): > > 1) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz) > C qsort with C style compare: 2.450000 > C qsort with Python style compare: 2.440000 > NumPy newqsort: 0.650000 > > 2) Windows XP (SP2) (MSVC 7.1, /Ox, Intel Pentium 4 @ 2 GHz) > C qsort with C style compare: 0.971000 > C qsort with Python style compare: 0.962000 > NumPy newqsort: 0.921000 > > 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz) > C qsort with C style compare: 0.640000 > C qsort with Python style compare: 0.600000 > NumPy newqsort: 0.590000 > > 4) Debian 4.2.2 (lenny) (gcc 4.2.3, -O3, Intel Pentium 4 @ 3.2 GHz) > C qsort with C style compare: 1.770000 > C qsort with Python style compare: 1.750000 > NumPy newqsort: 0.440000 > > 5) Mandriva 2008.0 (gcc 4.2.2, -O3, Intel Core2 Duo @ 1.5 GHz) > C qsort with C style compare: 1.590000 > C qsort with Python style compare: 1.550000 > NumPy newqsort: 0.510000 > > 6) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2.5 GHz) > C qsort with C style compare: 1.890000 > C qsort with Python style compare: 1.900000 > NumPy newqsort: 0.500000 > > 7) Ubuntu 7.1 (gcc 4.1.2, -O3, PowerPC 3 @ 1.3 GHz) > C qsort with C style compare: 3.030000 > C qsort with Python style compare: 2.970000 > NumPy newqsort: 1.040000 > > 8) MacOSX 10.4 (Tiger) (gcc 4.0.1, -O3, PowerPC 3 @ 1.3 GHz) > C qsort with C style compare: 1.560000 > C qsort with Python style compare: 1.510000 > NumPy newqsort: 1.220000 > > All benchmarks have been run using the attached benchmark (if anybody > else wants to join the fiesta, please report back your feedback). > > Summarizing, one can say a couple of things: > > * Recent Debian distros and derivatives (Ubuntu) as well as Mandriva are > suffering from a innefficient system qsort (at least the implementation > for strings). 
SuSe Linux Enterprise 10.3 seems to have solved this. > And Windows XP (SP2) and MacOSX (Tiger) looks like they have a > relatively efficient implementation of qsort. > > * The newqsort performs the best on all the platforms we have checked > (ranging from a 5% of improvement on Opteron/SuSe, up to 3.8x with some > Pentium4/Ubuntu systems). > > All in all, I'd also say that newqsort would be a good candidate to be > put into NumPy. > > Cheers, > > -- > >0,0< Francesc Altet http://www.carabos.com/ > V V C?rabos Coop. V. Enjoy Data > "-" > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From robert.kern at gmail.com Tue Feb 12 13:54:47 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Feb 2008 12:54:47 -0600 Subject: [Numpy-discussion] Median advice In-Reply-To: <72D87DAD-8AD5-4CBB-94A4-5DCD99DCCE05@ster.kuleuven.be> References: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com> <72D87DAD-8AD5-4CBB-94A4-5DCD99DCCE05@ster.kuleuven.be> Message-ID: <3d375d730802121054t35245412k756c5564601a913b@mail.gmail.com> On Feb 12, 2008 6:33 AM, Joris De Ridder wrote: > > On 12 Feb 2008, at 12:31, Matthew Brett wrote: > > > def median(a, axis=0, out=None) > > (same signature as max, min etc) > > I would be slightly in favour of this option. > Using the same signature would be convenient in code like > > def myfunc(myarray, somefunc): > > # do stuff > ... > x = somefunc(myarray, axis = 0, out = None) > > # do more stuff > ... > > > where somefunc could be median(), mean(), max(), min(), std() etc. I > once wrote this kind of function to provide (small) image filtering. > If the same signature is used, there is no need to special-case > median(). I realise it's kind of a niche example, though. I'm happy with that use case. The docstring should mention that out= is not memory-optimized like it is for the others, though. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Tue Feb 12 14:07:06 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Feb 2008 12:07:06 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802112115.11255.faltet@carabos.com> <200802121707.13522.faltet@carabos.com> Message-ID: On Feb 12, 2008 11:53 AM, Bruce Southey wrote: > Hi, > > I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that gives > C qsort with C style compare: 0.650000 > C qsort with Python style compare: 0.640000 > NumPy newqsort: 0.360000 > > I did notice that -O3 was essential to get the performance gain as -O2 > gave: > C qsort with C style compare: 0.690000 > C qsort with Python style compare: 0.700000 > NumPy newqsort: 0.610000 > Try -O2 -finline-functions, it should come in somewhere between -O2 and -O3 Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peridot.faceted at gmail.com Tue Feb 12 15:48:16 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 12 Feb 2008 21:48:16 +0100 Subject: [Numpy-discussion] Median advice In-Reply-To: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com> References: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com> Message-ID: On 12/02/2008, Matthew Brett wrote: > Suggestion 1: > def median(a, axis=0, out=None) [...] > Suggestion 2: > def median(a, axis=0, scratch_input=False) No reason not to combine the two. It's a pretty straightforward modification to do the sorting in place, and it could make a lot more difference to the runtime (and memory usage) than an output array. Anne From matthew.brett at gmail.com Tue Feb 12 16:09:11 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 12 Feb 2008 21:09:11 +0000 Subject: [Numpy-discussion] Median advice In-Reply-To: References: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com> Message-ID: <1e2af89e0802121309k72b4860cvd8f35edf8a908099@mail.gmail.com> Hi, On Feb 12, 2008 8:48 PM, Anne Archibald wrote: > On 12/02/2008, Matthew Brett wrote: > > > Suggestion 1: > > def median(a, axis=0, out=None) > [...] > > Suggestion 2: > > def median(a, axis=0, scratch_input=False) > > No reason not to combine the two. It's a pretty straightforward > modification to do the sorting in place, and it could make a lot more > difference to the runtime (and memory usage) than an output array. Yes, true. I'll include it unless anyone pipes up to object. Matthew From matthew.brett at gmail.com Tue Feb 12 16:54:00 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 12 Feb 2008 21:54:00 +0000 Subject: [Numpy-discussion] sort method raises unexpected error with axis=None In-Reply-To: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com> References: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com> Message-ID: <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com> Hi, To rephrase: Is it possible, in fact, to do an inplace sort on an array with axis=None (ie flat sort)? Should the sort method have its docstring changed to reflect the fact that axis=None is not valid? Matthew On Feb 10, 2008 7:50 PM, Matthew Brett wrote: > Hi, > > I just noticed this: > > From the sort method docstring: > > axis : integer > Axis to be sorted along. None indicates that the flattened array > should be used. Default is -1. > > In [40]: import numpy as N > > In [41]: a = N.arange(10) > > In [42]: N.sort(a, None) > Out[42]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [43]: a.sort(None) > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > > /home/mb312/ in () > > TypeError: an integer is required > > > Perhaps the sort method is calling the c code directly, and this is > not checking for axis=None? > > Matthew > From lou_boog2000 at yahoo.com Tue Feb 12 17:18:37 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 12 Feb 2008 14:18:37 -0800 (PST) Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries" In-Reply-To: <47B1C746.2010907@esrf.fr> Message-ID: <514394.62270.qm@web34413.mail.mud.yahoo.com> --- Jon Wright wrote: > Lou Pecora wrote: > >... This appears to be the way > > static and shared libraries work, especially on > Mac OS > > X, maybe elsewhere. > > Have you tried linking against a GSL static library? > I don't have a mac, > but most linkers only pull in the routines you need. 
> For example, using
> windows and mingw:
>
> #include <stdio.h>
> #include <gsl/gsl_sf_bessel.h>
> int main (void)
> { double x = 5.0;
>   double y = gsl_sf_bessel_J0 (x);
>   printf ("J0(%g) = %.18e\n", x, y);
>   return 0; }
>
> ...compiles to a.exe which outputs:
>
> J0(5) = -1.775967713143382900e-001

Yes, I know about this approach if I am making an
executable. But I want to make my code into a shared
library (my code will not have a main, just the
functions I write) and, if possible, let my code call
the GSL code it needs from the C function I write
(i.e. no python interface). If what you did can be
done for a shared library, then that would be great.
However, I am ignorant of how to do this. I will try
to make my shared library using gcc and then add the
GSL library using the -l option as someone else
suggested. Maybe that will work. I'll report back.

I have been searching for info on the right approach
to this on the Mac, since, as I understand, Mac OS X
does make a distinction between shared libraries and
dynamic libraries (which I don't understand fully).

Thanks.

-- Lou Pecora, my views are my own.

From lou_boog2000 at yahoo.com Tue Feb 12 17:19:50 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Tue, 12 Feb 2008 14:19:50 -0800 (PST)
Subject: [Numpy-discussion] C Extensions, CTypes and "external code & librarie
In-Reply-To: <5eec5f300802120859u325c4454j7ec64a54eb61a5bb@mail.gmail.com>
Message-ID: <649671.993.qm@web34415.mail.mud.yahoo.com>

--- Albert Strasheim wrote:

> Hello,
>
> Sounds about right. I don't know the Mac that well
> as far as the
> various types of dynamic libraries go, so just check
> that you're
> working with the right type of libraries, but you've
> got the right
> idea.
>
> Regards,
>
> Albert

Thanks, Albert. I'll report back to this thread when
I give it a try.

-- Lou Pecora, my views are my own.

From peridot.faceted at gmail.com Tue Feb 12 20:56:59 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Tue, 12 Feb 2008 20:56:59 -0500
Subject: [Numpy-discussion] sort method raises unexpected error with axis=None
In-Reply-To: <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com>
References: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com> <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com>
Message-ID:

On 12/02/2008, Matthew Brett wrote:
> Is it possible, in fact, to do an inplace sort on an array with
> axis=None (ie flat sort)?

It is, sometimes; just make an array object to point to the flattened
version and sort that:

In [16]: b = a[:]

In [17]: b.shape = (16,)

In [18]: b.sort()

This is not always possible, depending on the arrangement of a in
memory. An efficient way to handle in-place (or out-of-place, come to
think of it) median along multiple axes is actually to take medians
along all axes in succession.
That saves you some sorting effort, and some programming effort, and
doesn't require in-place multidimensional sorting:

In [24]: def all_axes_median(a):
   ....:     if len(a.shape)>1:
   ....:         return all_axes_median(N.median(a))
   ....:     else:
   ....:         return N.median(a)
   ....:

In [26]: all_axes_median(N.reshape(N.arange(32),(2,4,2,-1)))
Out[26]: 15.5

Anne

From peridot.faceted at gmail.com Tue Feb 12 21:00:56 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Tue, 12 Feb 2008 21:00:56 -0500
Subject: [Numpy-discussion] sort method raises unexpected error with axis=None
In-Reply-To: References: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com> <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com>
Message-ID:

On 12/02/2008, Anne Archibald wrote:
> An efficient way to handle in-place (or out-of-place, come to think of
> it) median along multiple axes is actually to take medians along all
> axes in succession. That saves you some sorting effort, and some
> programming effort, and doesn't require in-place multidimensional
> sorting:

Aargh. Sorry. No, that doesn't work:

In [28]: all_axes_median(N.reshape([1,5,6,7],(2,2)))
Out[28]: 4.75

Oops.

Anne

From charlesr.harris at gmail.com Tue Feb 12 21:59:25 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 12 Feb 2008 19:59:25 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802112115.11255.faltet@carabos.com> <200802121707.13522.faltet@carabos.com>
Message-ID:

OK,

The new quicksorts are in svn. Francesc, can you check them out?

Chuck

From david at ar.media.kyoto-u.ac.jp Wed Feb 13 00:48:33 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 13 Feb 2008 14:48:33 +0900
Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries"
In-Reply-To: <514394.62270.qm@web34413.mail.mud.yahoo.com>
References: <514394.62270.qm@web34413.mail.mud.yahoo.com>
Message-ID: <47B284B1.9040705@ar.media.kyoto-u.ac.jp>

Lou Pecora wrote:
> --- Jon Wright wrote:
>
>> Lou Pecora wrote:
>> >... This appears to be the way
>>> static and shared libraries work, especially on Mac OS
>>> X, maybe elsewhere.
>> Have you tried linking against a GSL static library?
>> I don't have a mac,
>> but most linkers only pull in the routines you need.
>> For example, using
>> windows and mingw:
>>
>> #include <stdio.h>
>> #include <gsl/gsl_sf_bessel.h>
>> int main (void)
>> { double x = 5.0;
>>   double y = gsl_sf_bessel_J0 (x);
>>   printf ("J0(%g) = %.18e\n", x, y);
>>   return 0; }
>>
>> ...compiles to a.exe which outputs:
>>
>> J0(5) = -1.775967713143382900e-001
>>
>
> Yes, I know about this approach if I am making an
> executable. But I want to make my code into a shared
> library (my code will not have a main, just the
> functions I write) and, if possible, let my code call
> the GSL code it needs from the C function I write
> (i.e. no python interface). If what you did can be
> done for a shared library, then that would be great.
> However, I am ignorant of how to do this. I will try
> to make my shared library using gcc and then add the
> GSL library using the -l option as someone else
> suggested. Maybe that will work.

Oh, I may have misunderstood what you are trying to do then. You just
want to call a shared library from another shared library ? This is
possible on any platform supporting shared library (including but not
limited to mac os x, windows, linux, most not ancient unices).
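For concreteness, suppose mysharedlib.c contains something like the
sketch below (the function name is made up for illustration;
gsl_sf_bessel_J0 is just the GSL call from Jon's example):

/* mysharedlib.c: a shared library which itself calls into another
   shared library (the GSL). Only my_bessel_j0 is defined here; the
   GSL symbols get resolved against libgsl at link/load time. */
#include <gsl/gsl_sf_bessel.h>

double my_bessel_j0(double x)
{
    return gsl_sf_bessel_J0(x);
}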
As Albert said, just do (with gcc):

gcc -shared -o mysharedlib mysharedlib.c -lgsl

This works on mac os X as well as linux (and even windows with mingw).
If you want to link the gsl statically (so that your own lib does not
depend on the gsl anymore), you have to use a trick to tell gcc to link
the gsl:

gcc -shared -o mysharedlib mysharedlib.c -Wl,-Bstatic -lgsl -Wl,-Bdynamic

-Wl is used by gcc to pass options to the linker directly. -Bstatic
says that all link options after will be static. You have to use
-Bdynamic after, to avoid linking everything statically (like the gcc
runtime and the C lib, which are automatically linked by default by
gcc: it is almost always a bad idea to statically link those).

In the first case, mysharedlib.so will need libgsl.so:

ldd mysharedlib.so ->
        linux-gate.so.1 => (0xffffe000)
        libgsl.so.0 => /usr/lib/libgsl.so.0 (0xb7ddb000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7c91000)
        libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7c6b000)
        /lib/ld-linux.so.2 (0x80000000)

In the second case:

        linux-gate.so.1 => (0xffffe000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7e60000)
        /lib/ld-linux.so.2 (0x80000000)

I don't know if the second method works on mac os X: since it bypasses
gcc and goes directly to the linker, which is notably different on mac
os X, you may have to do it differently.

> I'll report back.
> I have been searching for info on the right approach
> to this on the Mac, since, as I understand, Mac OS X
> does make a distinction between shared libraries and
> dynamic libraries (which I don't understand fully).

To be over-simplistic: shared libraries are linked into the executable,
and all their symbols (functions, variables) are resolved when you
launch the executable. A dynamic library is not linked into the
executable, and can be loaded at any time during the execution of the
executable. Shared libraries are "just" a way to avoid duplicate code,
but are totally transparent to the code user:

int foo()
{
    return bar();
}

Whether bar is in a shared lib (libbar.so) or in another object file
(bar.o), it does not make a difference for you. With a dynamic lib, on
most unices, you do something like:

hdl = dlopen("libbar.so", RTLD_LAZY);
bar = (int (*)(void)) dlsym(hdl, "bar");

That is, you explicitly load the functions you want to use. Without
this scheme, you would have to link your extension to the python
executable when python is built, which is totally impractical of
course. IOW, dynamic libraries are used for "plugins", things which can
be added to an executable *after* the executable is built.

On linux (and other unices using the elf binary format), both types of
libraries are built exactly the same. On mac os X (and windows as
well), they are not. Again, this is an oversimplification, but you don't
need to know much more in almost all the cases.

cheers,

David

From gael.varoquaux at normalesup.org Wed Feb 13 03:13:44 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 13 Feb 2008 09:13:44 +0100
Subject: [Numpy-discussion] Median advice
In-Reply-To: <1e2af89e0802121309k72b4860cvd8f35edf8a908099@mail.gmail.com>
References: <1e2af89e0802120331u278caa49j3cbd53772829c9d6@mail.gmail.com> <1e2af89e0802121309k72b4860cvd8f35edf8a908099@mail.gmail.com>
Message-ID: <20080213081343.GD4837@phare.normalesup.org>

On Tue, Feb 12, 2008 at 09:09:11PM +0000, Matthew Brett wrote:
> > > Suggestion 1:
> > > def median(a, axis=0, out=None)
> > [...]
> > > Suggestion 2:
> > > def median(a, axis=0, scratch_input=False)

> > No reason not to combine the two.
> > It's a pretty straightforward
> > modification to do the sorting in place, and it could make a lot more
> > difference to the runtime (and memory usage) than an output array.

> Yes, true. I'll include it unless anyone pipes up to object.

Scratch_input is not at all clear to me. I would suggest "inplace"
instead.

Gaël

From faltet at carabos.com Wed Feb 13 05:34:46 2008
From: faltet at carabos.com (Francesc Altet)
Date: Wed, 13 Feb 2008 11:34:46 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802121707.13522.faltet@carabos.com>
Message-ID: <200802131134.47114.faltet@carabos.com>

A Tuesday 12 February 2008, Charles R Harris escrigué:
> On Feb 12, 2008 9:07 AM, Francesc Altet wrote:
> > * The newqsort performs the best on all the platforms we have
> > checked (ranging from a 5% improvement on Opteron/SuSe, up to
> > 3.8x with some Pentium4/Ubuntu systems).
>
> The 3.8 is amazing, isn't it? I've found that the performance also
> depends on whether I initialize the strings with random or empty.
> With the random initialization the new sort is ~2x faster. That's
> fedora 8, core duo, 32 bit OS, gcc 4.1.2.

Well, for me, a 3.8x (or even a 2x for that matter) improvement is
less amazing once you know that there is a flaw in the C qsort of most
Linux distros around. Neither Windows, MacOSX nor certain versions of
Linux (namely, SuSe 10.3) reflect such a large difference in
performance. I'd say that the newqsort that is to be included in the
next release of NumPy would be just 10% better than a sane
implementation of C qsort. And, while a 10% is not as amazing as a
380%, the great news is that newqsort will provide first-class
performance even to people having the flawed C qsort on their machines
(which apparently are many more than I initially realized).

Cheers,

--
>0,0< Francesc Altet     http://www.carabos.com/
V V Cárabos Coop. V.   Enjoy Data
"-"

From faltet at carabos.com Wed Feb 13 05:35:33 2008
From: faltet at carabos.com (Francesc Altet)
Date: Wed, 13 Feb 2008 11:35:33 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802121707.13522.faltet@carabos.com>
Message-ID: <200802131135.34122.faltet@carabos.com>

A Tuesday 12 February 2008, Bruce Southey escrigué:
> Hi,
>
> I have an Opteron 248 (2.66GHz) with gcc 4.1.0 (SUSE 10.1?) that
> gives C qsort with C style compare: 0.650000
> C qsort with Python style compare: 0.640000
> NumPy newqsort: 0.360000

That's very interesting. In a similar configuration, but using SuSe
10.3 (Enterprise) instead of 10.1, I don't see this factor of almost 2
difference in performance (in fact, both performances, C qsort and
NumPy newqsort, are very similar).

This seems to confirm that the GNU glibc crew has fixed the qsort
performance very recently (i.e. I hope it is not only a fix in SuSe
10.3 Enterprise), and this is why most current distros are seeing the
poor performance in qsort.

Cheers,

--
>0,0< Francesc Altet     http://www.carabos.com/
V V Cárabos Coop. V.   Enjoy Data
"-"

From charlesr.harris at gmail.com Wed Feb 13 09:27:47 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 13 Feb 2008 07:27:47 -0700
Subject: [Numpy-discussion] New bug with "setup.py develop"
In-Reply-To: <1202790290.29123.11.camel@bbc8>
References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> <5b8d13220802111810r501c3fav9f6f0f0c9d62863d@mail.gmail.com> <1202790290.29123.11.camel@bbc8>
Message-ID:

On Feb 11, 2008 9:24 PM, David Cournapeau wrote:
> On Mon, 2008-02-11 at 20:42 -0700, Charles R Harris wrote:
> >
> > On Feb 11, 2008 7:10 PM, David Cournapeau wrote:
> >
> >     On Feb 11, 2008 5:40 PM, Charles R Harris wrote:
> >     >
> >     > On Feb 11, 2008 1:21 AM, Robert Kern wrote:
> >     >
> >     > > I've just updated the SVN trunk to get the latest numscons
> >     merge.
> >     > > Something broke the support I put in for the setuptools
> >     "develop"
> >     > > command. In order to make sure that setuptools' "develop"
> >     works with
> >     > > numpy.distutils' "build_src", we override the "develop"
> >     command to
> >     > > reinitialize the "build_src" command to add the --inplace
> >     option. This
> >     > > used to work as of r4772, but now any Fortran Extensions
> >     have the
> >     > > generated sources added twice. This causes links to fail
> >     since the
> >     > > same symbol shows up twice.
> >     >
> >     > While we're talking build, how do I set the compiler flags?
> >     Numpy here
> >     > always compiles with -march=i386, which seems a bit
> >     conservative. My
> >     > environment flags are also ignored, but I assume there is
> >     some way of getting
> >     > the compile to behave.
> >
> >     Well, you assumed wrong :)
> >
> > For some reason I was hoping you would pipe up :0) Yeah, that in
> > itself is a good reason for trying something like scons.
>
> To be more exact: you can simply add flags in distutils. It is just that
> there is not much logic to handle different cases (e.g. having different
> set of warnings for pyrex vs swig vs normal C extensions is difficult).
>
> > I note that with -O2 -finline-functions, or -O3, I can knock almost
> > 30% off the string sort times. That's a lot better than I can do
> > fooling around with the code.
>
> You should be able to do it just with numpy.distutils:
>
> CFLAGS="-O2 -finline-functions" setup.py build

Curiously, CFLAGS="-O3 -finline-functions" causes the
-fno-strict-aliasing flag to disappear when the random module is
compiled, resulting in a lot of warnings and, in my experience,
probably buggy code generation. So a safer bet is
CFLAGS="-O3 -finline-functions -fno-strict-aliasing" .

Chuck

From lou_boog2000 at yahoo.com Wed Feb 13 10:14:04 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Wed, 13 Feb 2008 07:14:04 -0800 (PST)
Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries"
In-Reply-To: <47B284B1.9040705@ar.media.kyoto-u.ac.jp>
Message-ID: <127604.8424.qm@web34401.mail.mud.yahoo.com>

--- David Cournapeau wrote:

> Oh, I may have misunderstood what you are trying to
> do then. You just
> want to call a shared library from another shared
> library ? This is
> possible on any platform supporting shared library
> (including but not
> limited to mac os x, windows, linux, most not
> ancient unices).
[cut]

David,

First, thanks very much for all the information. I am
still digesting it, but you gave a clear explanation
about the difference between shared and dynamic
libraries on the Mac.
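For anyone following along, the ctypes side of what I'm doing is
nothing fancy; a minimal sketch (with the library and function names
made up for illustration, not my actual code) looks like:

import ctypes

# Load the shared library and declare one function's signature.
mylib = ctypes.CDLL("mysharedlib.so")
mylib.my_func.restype = ctypes.c_double
mylib.my_func.argtypes = [ctypes.c_double]

print mylib.my_func(5.0)   # calls straight into the C code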
I tried some of your compile/link commands, but the
Mac gcc did not understand some things like -Bstatic
and -shared. It seems to want to make bundles. I
guess your code was a Linux version which the Mac
doesn't like. But encouraged by your help, I got the
following make file to work:

# ---- Library make ---------------------------
mysharedlib.so: mysharedlib.o mysharedlib.mak
	gcc -bundle -flat_namespace -undefined suppress -o mysharedlib.so mysharedlib.o \
	fcnlib.a

# ---- gcc C compile ------------------
mysharedlib.o: mysharedlib.c mysharedlib.h mysharedlib.mak
	gcc -c mysharedlib.c -o mysharedlib.o

In the above fcnlib.a is a simple static library I
made before using the above make. This created the
shared library mysharedlib.so which I imported and
handled with CTypes. Calling a function in fcnlib.a
from python worked.

A possible downside is that the shared library
contains *all* of the fcnlib. I examined it using "nm
mysharedlib.so". That showed that all the functions
of fcnlib.a were present in mysharedlib.so even though
the function in mysharedlib.c only called one function
of fcnlib.a. I don't know how much of a burden this
will impose at run time if I do this with GSL. It would
be nice to only pick up the stuff I need. But at least
I have a workable approach.

Thanks for your help. Comments welcome.

-- Lou Pecora, my views are my own.

From cournape at gmail.com Wed Feb 13 10:38:50 2008
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 14 Feb 2008 00:38:50 +0900
Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries
In-Reply-To: <127604.8424.qm@web34401.mail.mud.yahoo.com>
References: <47B284B1.9040705@ar.media.kyoto-u.ac.jp> <127604.8424.qm@web34401.mail.mud.yahoo.com>
Message-ID: <5b8d13220802130738o4568b36dqe16c13799cbdfd2c@mail.gmail.com>

On Feb 14, 2008 12:14 AM, Lou Pecora wrote:
>
> --- David Cournapeau
> wrote:
>
> > Oh, I may have misunderstood what you are trying to
> > do then. You just
> > want to call a shared library from another shared
> > library ? This is
> > possible on any platform supporting shared library
> > (including but not
> > limited to mac os x, windows, linux, most not
> > ancient unices).
> [cut]
>
> David,
>
> First, thanks very much for all the information. I am
> still digesting it, but you gave a clear explanation
> about the difference between shared and dynamic
> libraries on the Mac.
>
> I tried some of your compile/link commands, but the
> Mac gcc did not understand some things like -Bstatic
> and -shared. It seems to want to make bundles. I
> guess your code was a Linux version which the Mac
> doesn't like.

Yes, I forgot that -shared does not work on mac os X. -Bstatic, being a
linker option as I said, had little chance to work on mac os X.

> But encouraged by your help, I got the following make file to work:
>
> # ---- Library make ---------------------------
> mysharedlib.so: mysharedlib.o mysharedlib.mak
> 	gcc -bundle -flat_namespace -undefined suppress -o
> 	mysharedlib.so mysharedlib.o \
> 	fcnlib.a
>
> # ---- gcc C compile ------------------
> mysharedlib.o: mysharedlib.c mysharedlib.h
> mysharedlib.mak
> 	gcc -c mysharedlib.c -o mysharedlib.o
>
> In the above fcnlib.a is a simple static library I
> made before using the above make. This created the
> shared library mysharedlib.so which I imported and
> handled with CTypes. Calling a function in fcnlib.a
> from python worked.
> A possible downside is that the shared library
> contains *all* of the fcnlib. I examined it using "nm
> mysharedlib.so". That showed that all the functions
> of fcnlib.a were present in mysharedlib.so even though
> the function in mysharedlib.c only called one function
> of fcnlib.a. I don't know how much of a burden this
> will impose at run time if I do this with GSL. It would
> be nice to only pick up the stuff I need. But at least
> I have a workable approach.

This is logical, or more exactly, the tool did what it thinks you are
asking: putting the archive (the .a library) in your executable. If
instead of libfcnlib.a, you just put -lfcnlib in the command line, it
will pick up only the necessary code for the functions you are calling
(at least it does on linux if I remember correctly).

But the real question is: if you are concerned with code bloat, why use
a static lib at all? Why not use a shared library, which is exactly
designed to solve what you are trying to do?

cheers,

David

From cournape at gmail.com Wed Feb 13 10:41:03 2008
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 14 Feb 2008 00:41:03 +0900
Subject: [Numpy-discussion] New bug with "setup.py develop"
In-Reply-To: References: <3d375d730802110021t11716895y32450a8a7e11f4f9@mail.gmail.com> <5b8d13220802111810r501c3fav9f6f0f0c9d62863d@mail.gmail.com> <1202790290.29123.11.camel@bbc8>
Message-ID: <5b8d13220802130741h1bdc8259tb96b34fed102c31c@mail.gmail.com>

On Feb 13, 2008 11:27 PM, Charles R Harris wrote:
>
> Curiously, CFLAGS="-O3 -finline-functions" causes the -fno-strict-aliasing
> flag to disappear when the random module is compiled, resulting in a lot of
> warnings and, in my experience, probably buggy code generation. So a safer
> bet is CFLAGS="-O3 -finline-functions -fno-strict-aliasing" .

That's the kind of weirdness which does not happen (or at least is not
supposed to happen) with numscons :)

cheers,

David

From bsouthey at gmail.com Wed Feb 13 11:19:31 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 13 Feb 2008 10:19:31 -0600
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802131135.34122.faltet@carabos.com>
References: <200802081329.35470.faltet@carabos.com> <200802121707.13522.faltet@carabos.com> <200802131135.34122.faltet@carabos.com>
Message-ID:

Hi,
I added gcc 4.2 from the openSUSE 10.1 repository so I now have both
the 4.1.2 and 4.2.1 compilers installed. But still have glibc-2.4-31.1
installed. I see your result with 4.2.1 but not with 4.1.2 so I think
that there could be a difference in the compiler flags. I don't know
enough about those to help but I can test any suggestions.
$ gcc --version gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) $ gcc -O3 sort-string-bench.c -o sort412 $ ./sort412 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.630000 C qsort with Python style compare: 0.640000 NumPy newqsort: 0.360000 $ gcc-4.2 --version gcc-4.2 (GCC) 4.2.1 (SUSE Linux) $ gcc-4.2 -O3 sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.620000 C qsort with Python style compare: 0.610000 NumPy newqsort: 0.550000 This is the same as: $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.710000 C qsort with Python style compare: 0.700000 NumPy newqsort: 0.550000 (NumPy newqsort with -O2 alone is 0.60000) For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' is NumPy newqsort: 0.620000 vs NumPy newqsort: 0.500000 Regards Bruce On Feb 13, 2008 4:35 AM, Francesc Altet wrote: > A Tuesday 12 February 2008, Bruce Southey escrigu?: > > Hi, > > > > I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that > > gives C qsort with C style compare: 0.650000 > > C qsort with Python style compare: 0.640000 > > NumPy newqsort: 0.360000 > > That's very intersting. In a similar configuration, but using SuSe 10.3 > (Enterprise) instead of 10.1, I don't see this factor of almost 2 of > difference in performance (in fact, both performances, C qsort and > NumPy newqsort, are very similar). > > This seems to confirm that the GNU glibc crew has fixed the qsort > performance very recently (i.e. I hope it is not only a fix in SuSe > 10.3 Enterprise), and this is why most of current distros are seeing > the poor performance in qsort. > > > Cheers, > > -- > >0,0< Francesc Altet http://www.carabos.com/ > V V C?rabos Coop. V. Enjoy Data > "-" > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From lou_boog2000 at yahoo.com Wed Feb 13 11:20:41 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Wed, 13 Feb 2008 08:20:41 -0800 (PST) Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries In-Reply-To: <5b8d13220802130738o4568b36dqe16c13799cbdfd2c@mail.gmail.com> Message-ID: <79117.1061.qm@web34414.mail.mud.yahoo.com> --- David Cournapeau wrote: > But the real question is : if you are concerned with > code bload, why > using static lib at all ? Why not using shared > library, which is > exactly designed to solve what you are trying to do > ? > cheers, > David Yes, a good question. Two reasons I started off with the static library. One is that Gnu instructions claimed the dynamic library did not always build properly on the Mac OS X. So I just built the static GSL and figured if I got that to link up to my code, I could then spend some time trying the dynamic build. The other reason is that I am just learning this and I am probably backing into the "right" way to do this rather than starting right off with the right way. Maybe my worries about bloat and (even more) time to load are not important for the GSL and the code will load fast enough and not take up too much in resources to matter. Later today I will try to build the dynamic version of GSL and see what that yields. If I get it I will link to that as you suggest. Thanks, again. Your suggestions have moved me along nicely. -- Lou Pecora, my views are my own. 
____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs From charlesr.harris at gmail.com Wed Feb 13 11:50:36 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Feb 2008 09:50:36 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802121707.13522.faltet@carabos.com> <200802131135.34122.faltet@carabos.com> Message-ID: On Feb 13, 2008 9:19 AM, Bruce Southey wrote: > Hi, > I added gcc 4.2 from the openSUSE 10.1 repository so I now have both > the 4.1.2 and 4.2.1 compilers installed. But still have glibc-2.4-31.1 > installed. I see your result with 4.2.1 but not with 4.1.2 so I think > that there could be a difference in the compiler flags. I don't know > enough about those to help but I can test any suggestions. > > $ gcc --version > gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) > $ gcc -O3 sort-string-bench.c -o sort412 > $ ./sort412 > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.630000 > C qsort with Python style compare: 0.640000 > NumPy newqsort: 0.360000 > > $ gcc-4.2 --version > gcc-4.2 (GCC) 4.2.1 (SUSE Linux) > $ gcc-4.2 -O3 sort-string-bench.c -o sort421 > $ ./sort421 > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.620000 > C qsort with Python style compare: 0.610000 > NumPy newqsort: 0.550000 > > This is the same as: > $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421 > $ ./sort421 > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.710000 > C qsort with Python style compare: 0.700000 > NumPy newqsort: 0.550000 > > (NumPy newqsort with -O2 alone is 0.60000) > > For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' is > NumPy newqsort: 0.620000 vs NumPy newqsort: 0.500000 > It's certainly interesting how much difference the compiler/flags make. I suppose one more thing to add to the mix is the -march and -mtune flags. They didn't make much difference here, but they might for someone else. It would also be interesting to see how a 64 bit system handles things. /lib/libgcc_s-4.1.2 gcc 4.1.2, -O3 -march=prescott -mtune=generic Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.940000 C qsort with Python style compare: 0.940000 NumPy newqsort: 0.310000 I suppose much also depends on the flags with which libgcc is compiled, which in turn probably depends on the distro. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Wed Feb 13 12:21:56 2008 From: faltet at carabos.com (Francesc Altet) Date: Wed, 13 Feb 2008 18:21:56 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802131135.34122.faltet@carabos.com> Message-ID: <200802131821.57305.faltet@carabos.com> A Wednesday 13 February 2008, Bruce Southey escrigu?: > Hi, > I added gcc 4.2 from the openSUSE 10.1 repository so I now have both > the 4.1.2 and 4.2.1 compilers installed. But still have > glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with > 4.1.2 so I think that there could be a difference in the compiler > flags. I don't know enough about those to help but I can test any > suggestions. 
> $ gcc --version
> gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)
> $ gcc -O3 sort-string-bench.c -o sort412
> $ ./sort412
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.630000
> C qsort with Python style compare: 0.640000
> NumPy newqsort: 0.360000
>
> $ gcc-4.2 --version
> gcc-4.2 (GCC) 4.2.1 (SUSE Linux)
> $ gcc-4.2 -O3 sort-string-bench.c -o sort421
> $ ./sort421
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.620000
> C qsort with Python style compare: 0.610000
> NumPy newqsort: 0.550000
>
> This is the same as:
> $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421
> $ ./sort421
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.710000
> C qsort with Python style compare: 0.700000
> NumPy newqsort: 0.550000
>
> (NumPy newqsort with -O2 alone is 0.60000)
>
> For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions'
> is NumPy newqsort: 0.620000 vs NumPy newqsort: 0.500000

That's really interesting. Let me remember my figures for our Opteron:

3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
C qsort with C style compare: 0.640000
C qsort with Python style compare: 0.600000
NumPy newqsort: 0.590000

Yours is running at a clock rate of 2.66 GHz and mine at 2 GHz, so that
makes yours 33% faster. With this, I'd expect newqsort on your machine
with gcc 4.2.1 to take something like 0.59/1.33 = 0.44s. Of course, this
is not the case, and you are getting 0.55s, which is only about 7%
faster. This mostly reflects the fact that newqsort is bounded by
things other than CPU speed (most probably, memory latency, but I might
be wrong).

But the most important thing is that it turns out that gcc 4.2.1 is
doing a much worse job in optimizing newqsort than gcc 4.1.2 (at least
on Opterons). That is unfortunate, because the similar performance of C
qsort and newqsort on SuSe 10.3 made me think that it was due to the
fact that SuSe had sped up the C qsort. I see now that this is not the
case, and the problem seems to be that gcc 4.2.1 is not able to
optimize newqsort as much as 4.1.2.

So, it is becoming more and more clear that newqsort is potentially
much faster than C qsort: you have seen a 2x speedup, Chuck a 3x and me
up to a 3.8x. The only issue seems to be finding a good enough compiler
(or the correct flags) to take advantage of all of its potential, which
doesn't seem an easy thing indeed :-/

Cheers,

--
>0,0< Francesc Altet     http://www.carabos.com/
V V Cárabos Coop. V.   Enjoy Data
"-"

From faltet at carabos.com Wed Feb 13 12:56:11 2008
From: faltet at carabos.com (Francesc Altet)
Date: Wed, 13 Feb 2008 18:56:11 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To: References: <200802081329.35470.faltet@carabos.com>
Message-ID: <200802131856.11563.faltet@carabos.com>

A Wednesday 13 February 2008, Charles R Harris escrigué:
> OK,
>
> The new quicksorts are in svn. Francesc, can you check them out?

Looks good here. However, you seem to keep using your own copy_string()
instead of plain memcpy(). In previous benchmarks, I've seen that
copy_string() is faster than memcpy only for small values of the length
of the block to be copied. In order to better assess this, I've created
a small benchmark (attached) to compare your copy_string against the
system memcpy when sorting array strings. I've also come up with a new
function (copy_string2, attached), that tries to combine the best of
copy_string and memcpy.
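In case the attachments get mangled in transit, the idea behind
copy_string2 is roughly this (a sketch, not necessarily the exact code
attached; the 16-byte crossover matches what the benchmarks show):

#include <string.h>

/* Hybrid copy: inlined byte loop for short blocks (where the call
   overhead of memcpy dominates), plain memcpy for longer ones. */
static void copy_string2(char *s1, char *s2, size_t len)
{
    if (len < 16) {
        while (len--) {
            *s1++ = *s2++;
        }
    }
    else {
        memcpy(s1, s2, len);
    }
}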
Look at the attached plot to see how they behave. In the plot, you can
see how memcpy is generally faster than copy_string, especially for
relatively large string lengths. However, copy_string can be faster for
small lengths (the maximum difference, about 20%, happens around 8/10
bytes). It may happen that you were doing time measurements with
strings of size 8, and you were driven to the wrong conclusion that
copy_string was faster than memcpy. In case you think that performance
for small string lengths is important, you may want to include
copy_string2, which uses one method or the other depending on the size
of the block to be copied. There is of course a small impact on
performance (one more condition test was introduced), but it is rather
negligible.

Finally, you will also have noticed the indirect sort line in the plot.
This is because I was curious about when this method would win over a
direct sort. And, by looking at the plot, it seems that the crosspoint
is around strings of 128 bytes (much more in fact than I initially
thought), and starts to be very significant (around 40% faster) at 256
bytes. So perhaps it would make sense to add the possibility to choose
the indirect method when sorting those large strings. This, of course,
would require more memory for the indices, but using 4 or 8 additional
bytes (depending on whether we are on 32-bit or 64-bit), when each
string takes 200 bytes, doesn't seem too crazy. In any case, it would
be nice to document this in the docstrings.

Be warned, I'd like to stress that these are my figures for my _own
laptop_. It would be nice if you can verify all of this with other
architectures (your Core2 machine seems different enough). I can run
the benchmarks on Windows (installed on the same laptop) too. Tell me
if you are interested in me doing this.

Cheers,

--
>0,0< Francesc Altet     http://www.carabos.com/
V V Cárabos Coop. V.   Enjoy Data
"-"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sort-string-bench.py
Type: application/x-python
Size: 518 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copy_string2.c
Type: text/x-csrc
Size: 179 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ubuntu-pentium4-newqsort.pdf
Type: application/pdf
Size: 21310 bytes
Desc: not available

From charlesr.harris at gmail.com Wed Feb 13 13:44:05 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 13 Feb 2008 11:44:05 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802131856.11563.faltet@carabos.com>
References: <200802081329.35470.faltet@carabos.com> <200802131856.11563.faltet@carabos.com>
Message-ID:

On Feb 13, 2008 10:56 AM, Francesc Altet wrote:
> A Wednesday 13 February 2008, Charles R Harris escrigué:
> > OK,
> >
> > The new quicksorts are in svn. Francesc, can you check them out?
>
> Looks good here. However, you seem to keep using your own copy_string()
> instead of plain memcpy(). In previous benchmarks, I've seen that
> copy_string() is faster than memcpy only for small values of the length
> of the block to be copied.

Yes, I noticed that your benchmark program crossed over to using memcpy
at 16 chars, and I will probably add that feature. I was being
conservative to start with.

> Finally, you will also have noticed the indirect sort line in the plot.
> This is because I was curious about when this method would win over a direct
> sort.
And, by looking at the plot, it seems that the crosspoint is > around strings of 128 bytes (much more in fact that I initially > thought), and starts to be very significant (around 40% faster) at 256 > bytes. So perhaps it would make sense to add the possibility to choose > the indirect method when sorting those large strings. This, of course, > would require more memory for the indices, but using 4 or 8 additional > bytes (depending if we on 32-bit or 64-bit), when each string takes 200 > bytes, doesn't seem too crazy. In any case, it would be nice to > document this in docstrings. > It would be easy to add this feature, but for the moment I think the best thing is to document it. Another fairly easy change that could be made is to support strided arrays. That might speed sorting of non-contiguous arrays and sorts on axis other than -1. The only reason it isn't there now is that I originally wrote the sorting routines for numarray and numarray's upper level interface passed contiguous arrays to the sort functions. > Be warned, I'd like to stress out that these are my figures for my _own > laptop_. It would be nice if you can verify all of this with other > achitectures (your Core2 machine seems different enough). I can run > the benchmarks on Windows (installed in the same laptop) too. Tell me > if you are interested on me doing this. > Its easy enough to test if you compile from svn, just add your new copy function and change the name in this line: #copy=copy_string, copy_ucs4# to use your function instead of copy_string. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Wed Feb 13 14:37:37 2008 From: faltet at carabos.com (Francesc Altet) Date: Wed, 13 Feb 2008 20:37:37 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: <200802131821.57305.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802131821.57305.faltet@carabos.com> Message-ID: <200802132037.37874.faltet@carabos.com> A Wednesday 13 February 2008, Francesc Altet escrigu?: > A Wednesday 13 February 2008, Bruce Southey escrigu?: > > Hi, > > I added gcc 4.2 from the openSUSE 10.1 repository so I now have > > both the 4.1.2 and 4.2.1 compilers installed. But still have > > glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with > > 4.1.2 so I think that there could be a difference in the compiler > > flags. I don't know enough about those to help but I can test any > > suggestions. 
> > > > $ gcc --version > > gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) > > $ gcc -O3 sort-string-bench.c -o sort412 > > $ ./sort412 > > Benchmark with 1000000 strings of size 15 > > C qsort with C style compare: 0.630000 > > C qsort with Python style compare: 0.640000 > > NumPy newqsort: 0.360000 > > > > $ gcc-4.2 --version > > gcc-4.2 (GCC) 4.2.1 (SUSE Linux) > > $ gcc-4.2 -O3 sort-string-bench.c -o sort421 > > $ ./sort421 > > Benchmark with 1000000 strings of size 15 > > C qsort with C style compare: 0.620000 > > C qsort with Python style compare: 0.610000 > > NumPy newqsort: 0.550000 > > > > This is the same as: > > $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421 > > $ ./sort421 > > Benchmark with 1000000 strings of size 15 > > C qsort with C style compare: 0.710000 > > C qsort with Python style compare: 0.700000 > > NumPy newqsort: 0.550000 > > > > (NumPy newqsort with -O2 alone is 0.60000) > > > > For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' > > is NumPy newqsort: 0.620000 vs NumPy newqsort: 0.500000 > > That's really interesting. Let me remember my figures for our > Opteron: > > 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz) > C qsort with C style compare: 0.640000 > C qsort with Python style compare: 0.600000 > NumPy newqsort: 0.590000 > Just an addedum. I've compiled the benchmark using gcc 4.1.2 using our Opteron machine. Here are the results: SuSe LE 10.3 (gcc 4.1.2, -O3, AMD Opteron @ 2 GHz) Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.620000 C qsort with Python style compare: 0.610000 NumPy newqsort: 0.380000 So, I'm getting a 55% more of performance than by using gcc 4.2.1 (!). Also, I've installed gcc 4.2.1 on my laptop and here are the results: Ubuntu 7.10 (gcc 4.2.1, -O3, Intel Pentium 4 @ 2 GHz) Benchmark with 1000000 strings of size 15 C qsort with C style compare: 2.450000 C qsort with Python style compare: 2.420000 NumPy newqsort: 0.630000 While using gcc 4.1.2, I get: Ubuntu 7.10 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz) Benchmark with 1000000 strings of size 15 C qsort with C style compare: 2.450000 C qsort with Python style compare: 2.440000 NumPy newqsort: 0.650000 So, in this case (32-bit platform) gcc 4.2.1 seems to perform similarly to 4.1.2. So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very least, AMD Opteron architecture) and that newqsort performs really well in general (provided that the compiler can find the best path for optimizing its code). Anyone using a 64-bit platform and having both gcc 4.1.2 and 4.2.1 installed can confirm this? Also, MSVC 7.1 32-bit (with opt level /Ox) doesn't seem to find such a best path (the benchmark for newqsort takes 0.92s using MSVC 7.1, while gcc 4.1.2 takes 0.65s using the same machine, a 40% faster). I don't know whether newer versions of MSVC will do better or not, though. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. 
??Enjoy Data "-" From charlesr.harris at gmail.com Wed Feb 13 15:01:12 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Feb 2008 13:01:12 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802132037.37874.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802131821.57305.faltet@carabos.com> <200802132037.37874.faltet@carabos.com> Message-ID: On Feb 13, 2008 12:37 PM, Francesc Altet wrote: > A Wednesday 13 February 2008, Francesc Altet escrigu?: > > A Wednesday 13 February 2008, Bruce Southey escrigu?: > > > Hi, > > > I added gcc 4.2 from the openSUSE 10.1 repository so I now have > > > both the 4.1.2 and 4.2.1 compilers installed. But still have > > > glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with > > > 4.1.2 so I think that there could be a difference in the compiler > > > flags. I don't know enough about those to help but I can test any > > > suggestions. > > > > > > $ gcc --version > > > gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) > > > $ gcc -O3 sort-string-bench.c -o sort412 > > > $ ./sort412 > > > Benchmark with 1000000 strings of size 15 > > > C qsort with C style compare: 0.630000 > > > C qsort with Python style compare: 0.640000 > > > NumPy newqsort: 0.360000 > > > > > > $ gcc-4.2 --version > > > gcc-4.2 (GCC) 4.2.1 (SUSE Linux) > > > $ gcc-4.2 -O3 sort-string-bench.c -o sort421 > > > $ ./sort421 > > > Benchmark with 1000000 strings of size 15 > > > C qsort with C style compare: 0.620000 > > > C qsort with Python style compare: 0.610000 > > > NumPy newqsort: 0.550000 > > > > > > This is the same as: > > > $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421 > > > $ ./sort421 > > > Benchmark with 1000000 strings of size 15 > > > C qsort with C style compare: 0.710000 > > > C qsort with Python style compare: 0.700000 > > > NumPy newqsort: 0.550000 > > > > > > (NumPy newqsort with -O2 alone is 0.60000) > > > > > > For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' > > > is NumPy newqsort: 0.620000 vs NumPy newqsort: 0.500000 > > > > That's really interesting. Let me remember my figures for our > > Opteron: > > > > 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz) > > C qsort with C style compare: 0.640000 > > C qsort with Python style compare: 0.600000 > > NumPy newqsort: 0.590000 > > > > Just an addedum. I've compiled the benchmark using gcc 4.1.2 using our > Opteron machine. Here are the results: > > SuSe LE 10.3 (gcc 4.1.2, -O3, AMD Opteron @ 2 GHz) > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.620000 > C qsort with Python style compare: 0.610000 > NumPy newqsort: 0.380000 > > So, I'm getting a 55% more of performance than by using gcc 4.2.1 (!). > Also, I've installed gcc 4.2.1 on my laptop and here are the results: > > Ubuntu 7.10 (gcc 4.2.1, -O3, Intel Pentium 4 @ 2 GHz) > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 2.450000 > C qsort with Python style compare: 2.420000 > NumPy newqsort: 0.630000 > > While using gcc 4.1.2, I get: > > Ubuntu 7.10 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz) > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 2.450000 > C qsort with Python style compare: 2.440000 > NumPy newqsort: 0.650000 > > So, in this case (32-bit platform) gcc 4.2.1 seems to perform similarly > to 4.1.2. 
> > So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very least, > AMD Opteron architecture) and that newqsort performs really well in > general (provided that the compiler can find the best path for > optimizing its code). Anyone using a 64-bit platform and having both > gcc 4.1.2 and 4.2.1 installed can confirm this? > > Also, MSVC 7.1 32-bit (with opt level /Ox) doesn't seem to find such a > best path (the benchmark for newqsort takes 0.92s using MSVC 7.1, while > gcc 4.1.2 takes 0.65s using the same machine, a 40% faster). I don't > know whether newer versions of MSVC will do better or not, though. > Now we need someone to try ICC ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 13 15:48:20 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 13 Feb 2008 20:48:20 +0000 Subject: [Numpy-discussion] sort method raises unexpected error with axis=None In-Reply-To: <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com> References: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com> <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com> Message-ID: <1e2af89e0802131248u3e056678i2e956bb9f2d9b5f8@mail.gmail.com> Hi, > Is it possible, in fact, to do an inplace sort on an array with > axis=None (ie flat sort)? > > Should the sort method have its docstring changed to reflect the fact > that axis=None is not valid? Sorry to press on, but it would be good to resolve this somehow. Is there some reason not to: Suggestion 1: Wrap the .sort method call in a tiny python wrapper of the form: def sort(self, axis=-1, kind='quicksort', order=None): if axis=None: _c_sort(self.ravel(), axis, kind, order) else: _c_sort(self, axis, kind, order) or 2: Modify the method docstring to remove axis=None as valid option. I'm happy to do either. Matthew From matthew.brett at gmail.com Wed Feb 13 15:52:25 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 13 Feb 2008 20:52:25 +0000 Subject: [Numpy-discussion] sort method raises unexpected error with axis=None In-Reply-To: <1e2af89e0802131248u3e056678i2e956bb9f2d9b5f8@mail.gmail.com> References: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com> <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com> <1e2af89e0802131248u3e056678i2e956bb9f2d9b5f8@mail.gmail.com> Message-ID: <1e2af89e0802131252t2bf0d297y633c670a164a135a@mail.gmail.com> Ah, To answer my own question: > Suggestion 1: > Wrap the .sort method call in a tiny python wrapper of the form: > > def sort(self, axis=-1, kind='quicksort', order=None): > if axis=None: > _c_sort(self.ravel(), axis, kind, order) > else: > _c_sort(self, axis, kind, order) I guess this is not good because self.ravel might return a copy, in situations I don't think I fully grasp? Guessing that there is no other way to do a guaranteed inplace sort for axis=None, I guess that making that clear in the method docstring is the best way to go? Matthew From aisaac at american.edu Wed Feb 13 16:09:50 2008 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 13 Feb 2008 16:09:50 -0500 Subject: [Numpy-discussion] unexpected downcast Message-ID: On Tue, 12 Feb 2008, dmitrey apparently wrote: > from numpy import * > a = array((1.0, 2.0), float128) > b=asfarray(a) > type(a[0]) > # > type(b[0]) > # > __version__ > '1.0.5.dev4767' Dmitrey noted an unexpected down cast (above). Is there a reason for it? Or should there be a ticket? 
Thank you, Alan Isaac From robert.kern at gmail.com Wed Feb 13 16:13:34 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 13 Feb 2008 15:13:34 -0600 Subject: [Numpy-discussion] unexpected downcast In-Reply-To: References: Message-ID: <47B35D7E.6070700@gmail.com> Alan G Isaac wrote: > On Tue, 12 Feb 2008, dmitrey apparently wrote: >> from numpy import * >> a = array((1.0, 2.0), float128) >> b=asfarray(a) >> type(a[0]) >> # >> type(b[0]) >> # >> __version__ >> '1.0.5.dev4767' > > > Dmitrey noted an unexpected down cast (above). > Is there a reason for it? That's just what asfarray is designed to do. If you don't give it a dtype, it uses float64. Changing it would be a redesign of the function that may break code. The amount of code is probably minimal, so I'm only -0 on changing it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Wed Feb 13 16:27:08 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Feb 2008 14:27:08 -0700 Subject: [Numpy-discussion] sort method raises unexpected error with axis=None In-Reply-To: <1e2af89e0802131252t2bf0d297y633c670a164a135a@mail.gmail.com> References: <1e2af89e0802101150r37c4baaag49117c87741e1f5e@mail.gmail.com> <1e2af89e0802121354o298c6e3r5f60497c9266d580@mail.gmail.com> <1e2af89e0802131248u3e056678i2e956bb9f2d9b5f8@mail.gmail.com> <1e2af89e0802131252t2bf0d297y633c670a164a135a@mail.gmail.com> Message-ID: On Feb 13, 2008 1:52 PM, Matthew Brett wrote: > Ah, > > To answer my own question: > > > Suggestion 1: > > Wrap the .sort method call in a tiny python wrapper of the form: > > > > def sort(self, axis=-1, kind='quicksort', order=None): > > if axis=None: > > _c_sort(self.ravel(), axis, kind, order) > > else: > > _c_sort(self, axis, kind, order) > > I guess this is not good because self.ravel might return a copy, in > situations I don't think I fully grasp? Guessing that there is no > other way to do a guaranteed inplace sort for axis=None, I guess that > making that clear in the method docstring is the best way to go? > I think it is possible to make sort work with the None keyword, So I think the question is whether or not we want it to. If we do, then the current lack is a bug, if we don't, then the documentation needs to be fixed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Feb 13 16:27:48 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 13 Feb 2008 13:27:48 -0800 Subject: [Numpy-discussion] unexpected downcast In-Reply-To: <47B35D7E.6070700@gmail.com> References: <47B35D7E.6070700@gmail.com> Message-ID: <47B360D4.50609@noaa.gov> Robert Kern wrote: > That's just what asfarray is designed to do. If you don't give it a dtype, it > uses float64. For the record, it upcasts float32 arrays also. So why does it exist at all? Is is just syntactic sugar for: asarray(a, dtype=float64) Which kind of seems to be not worth it. If, on the other hand, it meant: "make this a floating point array, but keep the input precision if it's already a float type", that could be useful (and not completely trivial to write yourself). -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sransom at nrao.edu Wed Feb 13 17:19:55 2008 From: sransom at nrao.edu (Scott Ransom) Date: Wed, 13 Feb 2008 17:19:55 -0500 Subject: [Numpy-discussion] String sort In-Reply-To: <200802132037.37874.faltet@carabos.com> References: <200802081329.35470.faltet@carabos.com> <200802131821.57305.faltet@carabos.com> <200802132037.37874.faltet@carabos.com> Message-ID: <200802131719.55891.sransom@nrao.edu> On Wednesday 13 February 2008 02:37:37 pm Francesc Altet wrote: > So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very > least, AMD Opteron architecture) and that newqsort performs really > well in general (provided that the compiler can find the best path > for optimizing its code). Anyone using a 64-bit platform and having > both gcc 4.1.2 and 4.2.1 installed can confirm this? Here are results from a 64-bit Debian system using a Core2 Duo 2.66 GHz processor. I used gcc 3.4.6, 4.1.3, 4.2.3, and 4.3.0 (20080202 experimental) with -O2 and -O3. Summary: There is a big difference between -02 and -O3. gcc-4.2 seems slightly better than the other gccs. And the newqsort is a lot faster (always) than the libc version. Scott eiger:/data1$ ./sort346_O2 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.550000 C qsort with Python style compare: 0.530000 NumPy newqsort: 0.450000 eiger:/data1$ ./sort346_O3 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.550000 C qsort with Python style compare: 0.520000 NumPy newqsort: 0.350000 eiger:/data1$ ./sort413_O2 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.560000 C qsort with Python style compare: 0.530000 NumPy newqsort: 0.420000 eiger:/data1$ ./sort413_O3 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.540000 C qsort with Python style compare: 0.500000 NumPy newqsort: 0.280000 eiger:/data1$ ./sort423_O2 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.560000 C qsort with Python style compare: 0.530000 NumPy newqsort: 0.390000 eiger:/data1$ ./sort423_O3 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.530000 C qsort with Python style compare: 0.500000 NumPy newqsort: 0.270000 eiger:/data1$ ./sort43_O2 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.550000 C qsort with Python style compare: 0.530000 NumPy newqsort: 0.340000 eiger:/data1$ ./sort43_O3 Benchmark with 1000000 strings of size 15 C qsort with C style compare: 0.530000 C qsort with Python style compare: 0.510000 NumPy newqsort: 0.330000 -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From charlesr.harris at gmail.com Wed Feb 13 17:39:53 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Feb 2008 15:39:53 -0700 Subject: [Numpy-discussion] test failures Message-ID: Hi Stefan, I believe these come from your latest commit. 
File "/usr/lib/python2.5/site-packages/numpy/lib/tests/test_ufunclike.py", line 25, in test_ufunclike Failed example: nx.sign(a) Expected: array([ 1., -1., 0., 0., 1., -1.]) Got: array([ 1., -1., -1., 0., 1., -1.]) ********************************************************************** File "/usr/lib/python2.5/site-packages/numpy/lib/tests/test_ufunclike.py", line 40, in test_ufunclike Failed example: nx.sign(a, y) Expected: array([ True, True, False, False, True, True], dtype=bool) Got: array([ True, True, True, False, True, True], dtype=bool) ********************************************************************** File "/usr/lib/python2.5/site-packages/numpy/lib/tests/test_ufunclike.py", line 43, in test_ufunclike Failed example: y Expected: array([ True, True, False, False, True, True], dtype=bool) Got: array([ True, True, True, False, True, True], dtype=bool) ............................................................................................................................................... ====================================================================== FAIL: Ticket #588 ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/numpy/core/tests/test_regression.py", line 734, in check_dot_negative_stride assert_equal(np.dot(x,z),np.dot(x,y2)) File "/usr/lib/python2.5/site-packages/numpy/testing/utils.py", line 143, in assert_equal return assert_array_equal(actual, desired, err_msg) File "/usr/lib/python2.5/site-packages/numpy/testing/utils.py", line 225, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/usr/lib/python2.5/site-packages/numpy/testing/utils.py", line 217, in assert_array_compare assert cond, msg AssertionError: Arrays are not equal (mismatch 100.0%) x: array([[ 55924.]]) y: array([[ 640000.]]) ====================================================================== FAIL: Ticket #396 ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/numpy/core/tests/test_regression.py", line 600, in check_poly1d_nan_roots self.failUnlessRaises(np.linalg.LinAlgError,getattr,p,"r") AssertionError: LinAlgError not raised ---------------------------------------------------------------------- Ran 805 tests in 2.983s FAILED (failures=2) Out[1]: Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournapeau at cslab.kecl.ntt.co.jp Wed Feb 13 20:58:08 2008 From: cournapeau at cslab.kecl.ntt.co.jp (David Cournapeau) Date: Thu, 14 Feb 2008 10:58:08 +0900 Subject: [Numpy-discussion] C Extensions, CTypes and "external code & libraries In-Reply-To: <79117.1061.qm@web34414.mail.mud.yahoo.com> References: <79117.1061.qm@web34414.mail.mud.yahoo.com> Message-ID: <1202954288.29123.66.camel@bbc8> On Wed, 2008-02-13 at 08:20 -0800, Lou Pecora wrote: > --- David Cournapeau wrote: > > > But the real question is : if you are concerned with > > code bload, why > > using static lib at all ? Why not using shared > > library, which is > > exactly designed to solve what you are trying to do > > ? > > cheers, > > David > > Yes, a good question. Two reasons I started off with > the static library. One is that Gnu instructions > claimed the dynamic library did not always build > properly on the Mac OS X. If true, that's a good argument. I don't know the state of libtool of mac os X (the part of autotools which deals with building libraries in a cross platform way). 
Given the history of apple with open source, I would not be surprised if the general support was subpar compared to other unices. > So I just built the static > GSL and figured if I got that to link up to my code, I > could then spend some time trying the dynamic build. > The other reason is that I am just learning this and I > am probably backing into the "right" way to do this > rather than starting right off with the right way. > Maybe my worries about bloat and (even more) time to > load are not important for the GSL and the code will > load fast enough and not take up too much in resources > to matter. I don't know what kind of applications you are developing, but taking care of the time to load the application because of the huge number of symbols seems like really premature optimization to me. That's the kind of problems you don't see if your applications are not huge (or developed with C++, which put a huge pressure on the linker/loader tools by its very nature). Also, note that all modern OS (this includes even windows since NT) do not load the whole shared library in memory, and that two applications needing the GSL will share the same version in memory. The same "physical page" of a shared library can be "mapped" into different address spaces (for different processes). I use "", because that's a huge over-simplification, and that's where it reaches my own understanding of the thing. This sharing cannot happen for static libraries. cheers, David From dg.gmane.comp.python.numeric.general at thesamovar.net Wed Feb 13 21:12:58 2008 From: dg.gmane.comp.python.numeric.general at thesamovar.net (Dan Goodman) Date: Thu, 14 Feb 2008 02:12:58 +0000 (UTC) Subject: [Numpy-discussion] Proxy array class and units Message-ID: Hi all, I'm looking for some advice on how to implement a 'unit checking proxy array'. Background: I'm writing a package to run simulations which make extensive use of linear algebra, for which I'm using numpy. However - it is important to my package that quantities can have dimesions, so I've written a class Quantity subclassed from float which includes the physical dimensions and raises exceptions etc. if you try to add volts to kilograms or whatever. This seems to work fine, but I also wanted arrays containing physical dimensions. I wrote a class qarray subclassed from numpy.ndarray which mostly just copies the functionality of an array with dtype=object. Here is the issue. The code for my package internally doesn't use these qarray objects, it uses numpy arrays because it would obviously be terribly slow to be repeatedly checking if the dimensions were consistent all the time. However, I want to present a (somewhat) consistent strict unit-checking front to the user of this package. As one part of this, I would like a class that appears to act exactly like a qarray (which itself is supposed to act very much like a numpy array) but maintains a connection with its numpy array counterpart in my package. Suppose I have an array X in my package, then I want an object Y which the user is free to manipulate like an array, but if they do something like Y[0]=2*volt then the object Y does (1) check that Y[0] has volt units, (2) if so, update Y[0] and also X[0]. My question is about what is the best way to implement something like this. So far, what I've done is written a proxy array class (call it pqarray say) derived from qarray (which is pretty much just array with dtype=object but not exactly as it does some other things too). 
When my package creates a pqarray for the user, it does the following: take the underlying numpy array X; create Y, a qarray based on X with the correct units (this necessarily copies the data in X because the dtype is different); create Z from X and Y, a pqarray that knows that Y is based on X. The class pqarray simply overrides the __setitem__ method to update itself and X if the units are correct (or raise an exception).

Now, this does actually work, but it's not terribly elegant and there is a problem. The standard way that this proxy array will be accessed is through another object, let's just call it P. P has a property val (it has to be a property in general, because the internal numpy array could change at any moment). If you write P.val it returns a pqarray corresponding to the internal representation of the data. Let's suppose P.val has length 1000. Suppose the user writes:

    for i in range(1000):
        P.val[i] = i*volt

Now what happens is that this creates a new pqarray 1000 times, and this involves copying 1000 elements each time, for something of the order of 1M operations total when it should really only be of the order of 1000 operations total. Now, obviously given the way it's written, the user 'ought' to write:

    val = P.val
    for i in range(1000):
        val[i] = i*volt

or even better (and what you would do in practice):

    P.val = qarray(range(1000))*volt

and then it will be O(1000) operations, but (a) the user shouldn't need to know this, and (b) my pqarray class seems horrifically ugly and I want something better. It seems to me that my options are:

(1) Use my current solution, and instruct the user about the potential problem (which probably won't come up too often anyway).

(2) Write pqarray to be a subclass of numpy.ndarray so that no data is copied in creating the pqarray, and give pqarray __getitem__ and __setitem__ methods that return a Quantity with the right units and do unit checking, respectively (a rough sketch of what I mean follows below). I have actually implemented this, and it sort of works, but I have some worries about it. For example, if Y is a pqarray of the numpy array X then it seems print Y will give the same result as print X (i.e. it forgets the units). I could presumably look this up and override this behaviour, but that's a game which has no end (or at least, a very distant end that might end up with me having reimplemented the whole of numpy.ndarray).

(3) Have some sort of caching mechanism, so that P.val returns the cached value unless the underlying numpy array is changed. The problem with this is that you have to keep track of whether or not the underlying numpy array has changed, which could be fiddly and rather unsafe.

(4) Give up hoping that P.val can be a subclass of an array, make it a new class that keeps a copy of the necessary units and the underlying array, and can do some basic array operations. Since it's not an array type object itself, the user can be told not to expect it to behave like one. This would be quite a sad solution though, because I want the user to be able to write things like mean(P.val) and have them just work as they'd expect (which works at the moment).

(5) Something I haven't thought of yet...?

Any advice of any sort would be very much appreciated. I did search for relevant information but nothing much came up. Plenty about how to do subclassing of numpy arrays, but that's not really my problem here - it's more specific than that.
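To make option (2) concrete, here is roughly the shape of the class I have in mind -- only a sketch: the Quantity type with a dim attribute is from my package, and everything else here is made up for illustration:

import numpy

class pqarray(numpy.ndarray):
    # Sketch: a view of a plain numpy array that unit-checks writes.
    # 'unit' is assumed to be a Quantity carrying a 'dim' attribute
    # describing its physical dimensions.
    def __new__(cls, base, unit):
        obj = numpy.asarray(base).view(cls)  # a view, so no data is copied
        obj.unit = unit
        return obj

    def __setitem__(self, index, value):
        # refuse assignments whose physical dimensions don't match
        if getattr(value, 'dim', None) != self.unit.dim:
            raise ValueError('inconsistent units in assignment')
        # write the bare float through; since this is a view, the
        # underlying array X sees the change too
        numpy.ndarray.__setitem__(self, index, float(value))

    def __getitem__(self, index):
        # reattach the unit on the way out
        return numpy.ndarray.__getitem__(self, index) * self.unit

This also shows the printing problem: repr and str still come from ndarray unless they are overridden too, which is exactly the game with no end I was worried about.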
Many thanks in advance, -- Dan Goodman http://thesamovar.net/contact From charlesr.harris at gmail.com Wed Feb 13 22:22:39 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Feb 2008 20:22:39 -0700 Subject: [Numpy-discussion] Question for Travis Message-ID: Travis, I notice that you used PyDataMem_NEW, PyDimMem_NEW, and friends to allocate memory in the sort routines. Is there a good reason to use these rather than malloc? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Wed Feb 13 22:30:42 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Wed, 13 Feb 2008 21:30:42 -0600 Subject: [Numpy-discussion] Question for Travis In-Reply-To: References: Message-ID: <47B3B5E2.9090607@enthought.com> Charles R Harris wrote: > Travis, > > I notice that you used PyDataMem_NEW, PyDimMem_NEW, and friends to > allocate memory in the sort routines. Is there a good reason to use > these rather than malloc? Only to allow for the possibility of different allocation routines. There is an option to use the Python memory allocators, for example. For temporary memory, though malloc and free is fine. -Travis From peridot.faceted at gmail.com Wed Feb 13 23:07:58 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 13 Feb 2008 23:07:58 -0500 Subject: [Numpy-discussion] Proxy array class and units In-Reply-To: References: Message-ID: On 13/02/2008, Dan Goodman wrote: > Background: I'm writing a package to run simulations which make extensive use of > linear algebra, for which I'm using numpy. However - it is important to my > package that quantities can have dimesions, so I've written a class Quantity > subclassed from float which includes the physical dimensions and raises > exceptions etc. if you try to add volts to kilograms or whatever. This seems to > work fine, but I also wanted arrays containing physical dimensions. I wrote a > class qarray subclassed from numpy.ndarray which mostly just copies the > functionality of an array with dtype=object. Here is the issue. The code for my > package internally doesn't use these qarray objects, it uses numpy arrays > because it would obviously be terribly slow to be repeatedly checking if the > dimensions were consistent all the time. However, I want to present a (somewhat) > consistent strict unit-checking front to the user of this package. As one part > of this, I would like a class that appears to act exactly like a qarray (which > itself is supposed to act very much like a numpy array) but maintains a > connection with its numpy array counterpart in my package. I'm not sure I understand all the different layers you're talking about here. I think that I would attack your problem in the following way: Use ScientificPython's PhysicalQuantity (alone and in object arrays) or "unum"s to represent quantities with units: http://books.google.com/books?id=3nR75KSvsq4C&pg=PA169&lpg=PA169&dq=numpy+physicalquantity&source=web&ots=_cmXEC0qpx&sig=JBEz9BlegnIQJor-1BwnAdRTKLE Object arrays are fairly horrible, but having one unit per array shouldn't be very inefficient. It could cause some headaches, since it's sometimes useful to have an array with mixed units (differential equation solvers often require this when you convert a higher-order problem to first-order, for example), but it should be quite efficient computationally. So I would pull together an array that held a collection of quantities all in the same units. 
Type checking then becomes a once-per-array operation, which is relatively cheap. The way I'd implement this is (by preference) by grabbing a version off the net, or (if that fails) as a subclass of ndarray. If I want to do some heavy-duty array mangling without type checking getting in my way, I'll just convert to an ndarray (either normalizing the units or saving them), do the heavy lifting, and reapply the units afterwards.

I don't really see why you're talking about proxy classes... Perhaps your problem is actually two problems: implementing unit arrays, and doing something I haven't understood. Is there any reason not to separate the two, and use one of the existing solutions for unit arrays?

Anne

From Garry.Willgoose at newcastle.edu.au  Thu Feb 14 01:24:00 2008
From: Garry.Willgoose at newcastle.edu.au (Garry Willgoose)
Date: Thu, 14 Feb 2008 17:24:00 +1100
Subject: [Numpy-discussion] f2py: sharing F90 module data between modules
In-Reply-To: <54410.85.166.27.136.1202814653.squirrel@cens.ioc.ee>
References: <87366D56-ECB4-43C6-BE95-D17C5181A99E@newcastle.edu.au>
	<54410.85.166.27.136.1202814653.squirrel@cens.ioc.ee>
Message-ID: <752B77BF-07F1-45BD-9C5E-E827DF585004@newcastle.edu.au>

Thanks for that. The docs suggest the dl library is Unix only. Does that mean this solution will not work on Windows? Windows is on my implementation roadmap, but I'm not quite there yet to test it.

I guess I am now thinking I can assemble (using f2py) an (aggregated) shared library on the fly from the individual module shared libraries when I open my modelling environment. I could check the aggregated library's modification date against those of the component .so files and only rebuild the aggregated library if a component is newer than the aggregate (the date check itself is trivial -- see the sketch below). That appears to work (and is fast) except for one behaviour of f2py. If I give f2py a list of files that are ALL .so (i.e. no fortran files) then f2py quits without actually doing anything, even if all the component shared libraries have perfectly fine python interfaces from previous f2py builds. I can give it a trivial fortran file (module .. end module) and it works fine.

f2py -c --fcompiler=g95 --verbose -m stuff -L/Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/config -lpython2.5 aread8.so fglobal_data.so            # doesn't work

f2py -c --fcompiler=g95 --verbose -m stuff -L/Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/config -lpython2.5 aread8.so fglobal_data.so junk.f90   # does work

Why is that a problem? I can envisage a user who just wants to use the environment without writing any additional fortran modules (and who thus may not even have a fortran compiler installed). If they screw up the mod dates on the files (by, say, an ftp from one machine to another... for instance, on our cluster the compiler is only installed on one machine and only binaries are moved around the cluster), then the environment might want to reassemble (with f2py) the aggregated library because it (erroneously) thinks there is a newer component shared library. This will fail because f2py quits when asked to process ONLY .so files. If I have a trivial fortran file to force f2py, then this forces users to have a fortran compiler on their machine even if they do not want to compile a new fortran module component, simply because f2py will not operate unless it is offered at least one fortran file. Does this make sense or am I just being thick about this?
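For what it's worth, the date check mentioned above is nothing more than this (a sketch, with made-up names):

import os

def needs_rebuild(aggregate, component_libs):
    # rebuild the aggregated library if it doesn't exist yet, or if
    # any component .so is newer than it
    if not os.path.exists(aggregate):
        return True
    agg_time = os.path.getmtime(aggregate)
    return any(os.path.getmtime(so) > agg_time for so in component_libs)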
Is there a way of making f2py merge a number of existing shared libraries into a single library without having to compile any fortran? I guess I could just invoke the linker directly in the case where there are no fortran files to compile, but it is nice being able to use distutils to get away from platform dependencies.

>> according to which makes your goal unachievable because of how
>> Python loads shared libraries *by default*, see below.
>
>> Try to use sys.setdlopenflags(...) before importing f2py generated
>> extension modules and then reset the state using sys.setdlopenflags(0).
>
> I also had to do something similar for solving a different problem,
> feel free to reuse the code here. This way, you have a chance to make
> it work on many platforms. You can put this in a __init__.py, and
> next import all your extensions inside the last try/finally block.
>
> http://projects.scipy.org/mpi4py/browser/mpi4py/trunk/src/_rtld.py
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

====================================================================
Prof Garry Willgoose,
Australian Professorial Fellow in Environmental Engineering,
Director, Centre for Climate Impact Management (C2IM),
School of Engineering, The University of Newcastle,
Callaghan, 2308 Australia.

Centre webpage: www.c2im.org.au

Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574 (Fri PM-Mon)
FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal and Telluric)
Env. Engg. Secretary: (International) +61 2 4921 6042
email: garry.willgoose at newcastle.edu.au; g.willgoose at telluricresearch.com
email-for-life: garry.willgoose at alum.mit.edu
personal webpage: www.telluricresearch.com/garry

====================================================================
"Do not go where the path may lead, go instead where there is no path
and leave a trail"
                          Ralph Waldo Emerson
====================================================================

From faltet at carabos.com  Thu Feb 14 03:27:26 2008
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 14 Feb 2008 09:27:26 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802131719.55891.sransom@nrao.edu>
References: <200802081329.35470.faltet@carabos.com>
	<200802132037.37874.faltet@carabos.com>
	<200802131719.55891.sransom@nrao.edu>
Message-ID: <200802140927.27213.faltet@carabos.com>

On Wednesday 13 February 2008, Scott Ransom wrote:
> On Wednesday 13 February 2008 02:37:37 pm Francesc Altet wrote:
> > So, I'd say that the culprit is gcc 4.2.1 (or at the very least,
> > the AMD Opteron architecture) and that newqsort performs really
> > well in general (provided that the compiler can find the best path
> > for optimizing its code). Anyone using a 64-bit platform and
> > having both gcc 4.1.2 and 4.2.1 installed can confirm this?
>
> Here are results from a 64-bit Debian system using a Core2 Duo 2.66
> GHz processor.
>
> I used gcc 3.4.6, 4.1.3, 4.2.3, and 4.3.0 (20080202 experimental)
> with -O2 and -O3.
>
> Summary: There is a big difference between -O2 and -O3. gcc-4.2
> seems slightly better than the other gccs. And the newqsort is a lot
> faster (always) than the libc version.
>
> Scott
>
> eiger:/data1$ ./sort346_O2
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.550000
> C qsort with Python style compare: 0.530000
> NumPy newqsort: 0.450000
>
> eiger:/data1$ ./sort346_O3
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.550000
> C qsort with Python style compare: 0.520000
> NumPy newqsort: 0.350000
>
> eiger:/data1$ ./sort413_O2
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.560000
> C qsort with Python style compare: 0.530000
> NumPy newqsort: 0.420000
>
> eiger:/data1$ ./sort413_O3
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.540000
> C qsort with Python style compare: 0.500000
> NumPy newqsort: 0.280000
>
> eiger:/data1$ ./sort423_O2
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.560000
> C qsort with Python style compare: 0.530000
> NumPy newqsort: 0.390000
>
> eiger:/data1$ ./sort423_O3
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.530000
> C qsort with Python style compare: 0.500000
> NumPy newqsort: 0.270000
>
> eiger:/data1$ ./sort43_O2
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.550000
> C qsort with Python style compare: 0.530000
> NumPy newqsort: 0.340000
>
> eiger:/data1$ ./sort43_O3
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.530000
> C qsort with Python style compare: 0.510000
> NumPy newqsort: 0.330000

Thanks Scott. Your input is very valuable, as it seems to confirm that the problem is in gcc 4.2.1 on 64-bit (or, at the very least, on the Opteron architecture), because apparently your gcc 4.2.3 is doing very well. It's a pity that I don't have 4.2.3 available on our SuSE/Opteron machine to check whether the optimization flaw disappears. But it seems to me that the problem could be specific to 4.2.1, and apparently the GCC crew has fixed the problem in 4.2.3, which is a relief.

In any case, if anybody has access to an Opteron machine and gcc 4.2.3, it would be great if they could run the benchmark and contribute their feedback.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From jussi.enkovaara at csc.fi  Thu Feb 14 06:17:31 2008
From: jussi.enkovaara at csc.fi (Jussi Enkovaara)
Date: Thu, 14 Feb 2008 13:17:31 +0200
Subject: [Numpy-discussion] Building a static libnumpy
In-Reply-To: <4795CAF2.70404@csc.fi>
References: <479590CA.9040404@csc.fi>
	<5b8d13220801220012p6ac1b15er606e583c2e27af57@mail.gmail.com>
	<4795CAF2.70404@csc.fi>
Message-ID: <47B4234B.7050906@csc.fi>

Jussi Enkovaara wrote:
> It is of course very cumbersome, as one has to specify all the modules
> which are written in C before compiling the actual interpreter. I think
> that the whole procedure cannot be automated, but it should be possible
> to have distutils create the static library and produce maybe the lines
> to be included in Modules/Setup. Thus, one should first create a minimal
> interpreter, then build the necessary extensions statically with this
> minimal interpreter and distutils, and at the end create the
> full-featured python interpreter.
>
> At some point I tried to look at the distutils source, but I did not
> have the time to understand it properly so that I could make the
> necessary modifications.
In case someone is interested in using numpy on systems without dynamic libraries, I have made some notes in our wiki:

https://wiki.fysik.dtu.dk/gpaw/Platforms_and_Architectures#louhi-csc-fi

The tricky part is actually building the special python interpreter; after minor changes to distutils, one can create static libraries of all the numpy C extensions with a normal "python setup.py build".

Best regards,
Jussi Enkovaara

From pearu at cens.ioc.ee  Thu Feb 14 06:45:16 2008
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Thu, 14 Feb 2008 13:45:16 +0200 (EET)
Subject: [Numpy-discussion] f2py: sharing F90 module data between modules
In-Reply-To: <752B77BF-07F1-45BD-9C5E-E827DF585004@newcastle.edu.au>
References: <87366D56-ECB4-43C6-BE95-D17C5181A99E@newcastle.edu.au>
	<54410.85.166.27.136.1202814653.squirrel@cens.ioc.ee>
	<752B77BF-07F1-45BD-9C5E-E827DF585004@newcastle.edu.au>
Message-ID: <65031.85.166.27.136.1202989516.squirrel@cens.ioc.ee>

On Thu, February 14, 2008 8:24 am, Garry Willgoose wrote:
> Thanks for that. The docs suggest the dl library is Unix only. Does
> that mean this solution will not work on Windows? Windows is on my
> implementation roadmap but I'm not quite there yet to test it.

I have no idea whether it will work on Windows or not. I would try it, though, as there seem to be ways other than dl to find the needed flags, as Lisandro pointed out.

> I guess I am now thinking maybe I can assemble (using f2py) an
> (aggregated) shared library on the fly from the individual module
> shared libraries when I open my modelling environment. I could check
> the aggregated library mod dates against all the .so files of the
> components and only rebuild the aggregated library if there was a
> newer component than the aggregated library. That appears to work
> (and is fast) except for one behaviour of f2py. If I give f2py a list
> of files that are ALL .so (i.e. no fortran files) then f2py quits
> without actually doing anything, even if all the component shared
> libraries have perfectly fine python interfaces from previous
> f2py builds. I can give it a trivial fortran file (module .. end
> module) and it works fine.

Note that f2py's job is not really to perform linking tasks. It is a useful feature that it simplifies creating extension modules, but please don't complain if it cannot be used as a general linker :)

> Why is that a problem? I can envisage a user who just wants to use
> the environment without writing any additional fortran modules (and
> who thus may not even have a fortran compiler installed). If they
> screw up the mod dates on the files (by, say, an ftp from one machine
> to another... for instance, on our cluster the compiler is only
> installed on one machine and only binaries are moved around the
> cluster), then the environment might want to reassemble (with f2py)
> the aggregated library because it (erroneously) thinks there is a
> newer component shared library. This will fail because f2py quits
> when asked to process ONLY .so files. If I have a trivial fortran
> file to force f2py, then this forces users to have a fortran compiler
> on their machine even if they do not want to compile a new fortran
> module component, simply because f2py will not operate unless it is
> offered at least one fortran file.

This is not a typical task for f2py; f2py is not a general-purpose linker. It's amazing that f2py could even be used for such a task, so I don't think that the above demonstrates any bug in f2py.
However, if you are worried about whether users have fortran compilers installed, then can you assume that they have a C compiler installed? If so, then instead of a trivial Fortran file, try using the following trivial .pyf file:

python module dummy
  interface
    subroutine dummyfunc()
      fortranname
      callstatement ;
    end subroutine dummyfunc
  end interface
end python module dummy

That should force f2py to build a shared library dummy.so with no Fortran dependencies.

> Does this make sense or am I just being thick about this? Is there a
> way of making f2py merge a number of existing shared libraries into a
> single library without having to compile any fortran?

Hopefully the hint above works for you.

Pearu

From lou_boog2000 at yahoo.com  Thu Feb 14 08:57:08 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Thu, 14 Feb 2008 05:57:08 -0800 (PST)
Subject: [Numpy-discussion] C Extensions, CTypes and "external" code & libraries
In-Reply-To: <1202954288.29123.66.camel@bbc8>
Message-ID: <307055.54956.qm@web34411.mail.mud.yahoo.com>

--- David Cournapeau wrote:

> On Wed, 2008-02-13 at 08:20 -0800, Lou Pecora wrote:
> > Yes, a good question. Two reasons I started off with
> > the static library. One is that Gnu instructions
> > claimed the dynamic library did not always build
> > properly on Mac OS X.
>
> If true, that's a good argument. I don't know the state of libtool
> on Mac OS X (the part of autotools which deals with building
> libraries in a cross-platform way). Given the history of apple with
> open source, I would not be surprised if the general support was
> subpar compared to other unices.

I ran the GSL install again and allowed the dynamic lib to be built. It worked fine, so maybe Apple has done better lately. I should have tried it right from the beginning. Anyway, I have a make script that seems to work fine, and I can link my shared lib to the GSL library or, I would guess, any other library I have.

Thanks again for your helpful suggestions. I will put up my code (it's small) for others to see on this list in a separate message. That might help others not to make the same mistakes I did.

> I don't know what kind of applications you are developing, but taking
> care of the time to load the application because of the huge number of
> symbols seems like really premature optimization to me. That's the kind
> of problems you don't see if your applications are not huge (or
> developed with C++, which put a huge pressure on the linker/loader
> tools by its very nature).
>
> Also, note that all modern OS (this includes even windows since NT) do
> not load the whole shared library in memory, and that two applications
> needing the GSL will share the same version in memory. The same
> "physical page" of a shared library can be "mapped" into different
> address spaces (for different processes). I use "", because that's a
> huge over-simplification, and that's where it reaches my own
> understanding of the thing. This sharing cannot happen for static
> libraries.

Good advice. You are right. My code is not large and it loads fast.

-- Lou Pecora, my views are my own.
From albanese at fbk.eu  Thu Feb 14 10:00:20 2008
From: albanese at fbk.eu (Davide Albanese)
Date: Thu, 14 Feb 2008 16:00:20 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy based package for machine learning
Message-ID: <47B45784.80208@fbk.eu>

*Machine Learning Py* (MLPY) is a *Python/NumPy* based package for machine learning.
The package now includes:

  * *Support Vector Machines* (linear, gaussian, polynomial, terminated ramps) for 2-class problems
  * *Fisher Discriminant Analysis* for 2-class problems
  * *Iterative Relief* for feature weighting for 2-class problems
  * *Feature Ranking* methods based on Recursive Feature Elimination (rfe, onerfe, erfe, bisrfe, sqrtrfe) and Recursive Forward Selection (rfs)
  * *Input Data* functions
  * *Confidence Interval* functions

Requires Python >= 2.4 and NumPy >= 1.0.3.
*MLPY* is a project of the MPBA Group (mpa.fbk.eu) at Fondazione Bruno Kessler (www.fbk.eu).
*MLPY* is free software. It is licensed under the GNU General Public License (GPL) version 3.

HomePage: mlpy.fbk.eu

From faltet at carabos.com  Thu Feb 14 10:34:31 2008
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 14 Feb 2008 16:34:31 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To: <20080214130858.GD822@ssh.cv.nrao.edu>
References: <200802081329.35470.faltet@carabos.com>
	<200802140927.27213.faltet@carabos.com>
	<20080214130858.GD822@ssh.cv.nrao.edu>
Message-ID: <200802141634.32059.faltet@carabos.com>

On Thursday 14 February 2008, you wrote:
> > In any case, if anybody has access to an Opteron machine and gcc
> > 4.2.3, it would be great if they could run the benchmark and
> > contribute their feedback.
>
> Here it is with gcc 4.2.3 on an Opteron 246 (2.0 GHz):
>
> uller:~$ ./sort423_O2    # with -O2
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.770000
> C qsort with Python style compare: 0.740000
> NumPy newqsort: 0.630000
>
> uller:~$ ./sort423_O3    # with -O3
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.640000
> C qsort with Python style compare: 0.660000
> NumPy newqsort: 0.400000

And here are my timings with gcc 4.1.3, using an Opteron similar to yours (270 @ 2.0 GHz):

With -O2:

Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.750000
C qsort with Python style compare: 0.700000
NumPy newqsort: 0.690000

With -O3:

Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.670000
C qsort with Python style compare: 0.620000
NumPy newqsort: 0.380000

So it seems clear that the GCC people have fixed in 4.2.3 the problem with the optimizer introduced in 4.2.1. Very good!

By the way, it's nice to see the wide range of platforms that this list allows one to test :-)

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From faltet at carabos.com  Thu Feb 14 11:11:33 2008
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 14 Feb 2008 17:11:33 +0100
Subject: [Numpy-discussion] String sort
Message-ID: <200802141711.33917.faltet@carabos.com>

On Wednesday 13 February 2008, Charles R Harris wrote:
> On Feb 13, 2008 10:56 AM, Francesc Altet wrote:
> > Be warned, I'd like to stress that these are my figures for my
> > _own laptop_. It would be nice if you can verify all of this with
> > other architectures (your Core2 machine seems different enough). I
> > can run the benchmarks on Windows (installed on the same laptop)
> > too. Tell me if you are interested in me doing this.
>
> It's easy enough to test if you compile from svn, just add your new
> copy function and change the name in this line:
>
> #copy=copy_string, copy_ucs4#
>
> to use your function instead of copy_string.

I spoke too soon. I've never compiled NumPy on Windows before, and I had problems when trying to compile it using MSVC 7.1 and a recent copy of the repository. Well, in any case, I've exercised the Opteron platform using gcc 4.1.3 (i.e. the one that can optimize newqsort correctly), and this brings new light to our study.

From the plot (attached), the following conclusions can be drawn:

1) copy_string2 (the combination of manual copy and memcpy) is not better than memcpy for *any* value of the string length on our Opteron platform. Also, the improvement on Pentium4 was not that big (<20%). In consequence, I'd choose to always use memcpy and discard copy_string2.

2) Curiously enough, the indirect sort on Opterons is *always* faster than newqsort+memcpy. For large values of string lengths (> 256), the speed-up can be up to 3x, which is a lot. And I'd say that this keeps true also for most modern processors (read Core2, Barcelona). For older processors (Pentium4), the indirect method can be slower than the direct sort for small lengths, but only by a small margin (<10%).

Conclusion 2 makes me wonder whether it wouldn't be useful to introduce a new flag in sort, like:

"""
`indirect` - Use the indirect method for sorting. This requires more
memory (4/8 bytes per array element), but for sorting arrays of strings
it is almost always faster than the direct approach (default). Beware:
this is not the case when using numerical values, where the use of this
method for sorting is not recommendable.
"""

Agreed, that could introduce some confusion, but as this flag would be 'False' by default, not many people should bother about its existence, and it can definitely help people who care about string sorting performance.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

-------------- next part --------------
A non-text attachment was scrubbed...
Name: suse-opteron-newqsort.pdf
Type: application/pdf
Size: 20365 bytes
Desc: not available
URL:

From dg.gmane.comp.python.numeric.general at thesamovar.net  Thu Feb 14 11:18:44 2008
From: dg.gmane.comp.python.numeric.general at thesamovar.net (Dan Goodman)
Date: Thu, 14 Feb 2008 16:18:44 +0000 (UTC)
Subject: [Numpy-discussion] Proxy array class and units
References:
Message-ID:

Hi Anne,

Thanks for your reply. As you say, there are a few different problems in here. The first is about implementing a system of physical quantities with units. I reviewed various options for this, including the ones you mentioned (unum and ScientificPython), but they all had some important features missing, so I implemented my own class which does this (building on the ideas they used). This is finished now and I don't need to worry about it.

The problem I'm working on now is the best way to deal with arrays of quantities with units, and efficient computations. At the moment, what I'm going for is an internal implementation (the code in the package) which knows about the units of the arrays, but stores them as numpy arrays for speed.
What I want to do though is to have it so that as far as the user of the package is concerned, everything appears to have units (single quantities or arrays of quantities). One way of doing this is as you have suggested, defining an array class that has a single unit. This is quite appealing and I'm trying to implement this in a way that does everything I want it to do (Unum does this but it has some odd behaviours that make it unsuitable for what I want to do). There is still a problem though that one of the main data structures in my package is a 2d array with each column having a different unit. This wouldn't be a huge problem because I can pass the user slices of that array which have a single unit. Perhaps a more specific question I can ask is: what considerations do you need to take into account when subclassing numpy arrays in a way that changes their semantics in a substantial way like adding units to them? The documentation that I have found only seems to deal with fairly simple cases where you are just adding something to the array (like in the http://scipy.org/Subclasses document). For example, when I tried to add a single unit to a numpy array, I tried overriding __getitem__(self,i) so that it returned numpy.ndarray.__getitem__(self,i)*self.unit which kind of works except that when I print the array you don't see the units, just the underlying float values. As I said before, I can probably override this behaviour, but how many other similar things like that would I need to do? Dan From charlesr.harris at gmail.com Thu Feb 14 11:58:03 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 Feb 2008 09:58:03 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802141711.33917.faltet@carabos.com> References: <200802141711.33917.faltet@carabos.com> Message-ID: On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet wrote: > A Wednesday 13 February 2008, Charles R Harris escrigu?: > > On Feb 13, 2008 10:56 AM, Francesc Altet wrote: > > > Be warned, I'd like to stress out that these are my figures for my > > > _own laptop_. It would be nice if you can verify all of this with > > > other achitectures (your Core2 machine seems different enough). I > > > can run the benchmarks on Windows (installed in the same laptop) > > > too. Tell me if you are interested on me doing this. > > > > Its easy enough to test if you compile from svn, just add your new > > copy function and change the name in this line: > > > > #copy=copy_string, copy_ucs4# > > > > to use your function instead of copy_string. > > I've spoken too fast. I've never compiled NumPy on Windows before, and > had problems when trying to compile it using MSVC 7.1 and a recent copy > of the repository. Well, in any case, I've exercised the Opteron > platform, using gcc 4.1.3 (i.e. the one that can optimize newqsort > correctly), and this brings new light to our study. > > From the plot (attached), it can be drawn the next conclusions: > > 1) copy_string2 (the combination of manual copy and memcpy) is not > better than memcpy for *any* value of the string length in our Opteron > platform. Also, the improvements with Pentium4 was not that big > (<20%). In consequence, I'd choose to always use memcpy and discard > copy_string2. > Your copy_string2 is now the version in numpy. I'm hesitant to make memcpy the default for all string lengths because I've read that memcpy was much improved in later gcc (>= 4.1 ?), but known slow in older versions. 
So perhaps in a year or two, when the newer compilers are more widespread, would be a better time to make the change. The switch at the 16-char length shouldn't make that much difference in practice. I'll put a comment in the source so that the thought won't get lost.

BTW, using copy_string2 much improved the performance of the string mergesort, where a lot of data needs to be copied to the work array. It's now half as fast as quicksort instead of 1/3 ;) Heap sort continues in its traditional slot as the slowest of all. Slow but safe.

> 2) Curiously enough, the indirect sort on Opterons is *always* faster
> than newqsort+memcpy. For large values of string lengths (> 256), the
> speed-up can be up to 3x, which is a lot. And I'd say that this keeps
> true also for most modern processors (read Core2, Barcelona). For older
> processors (Pentium4), the indirect method can be slower than the
> direct sort for small lengths, but only by a small margin (<10%).

The new indirect quicksort for strings is faster than the old qsort based default, so perhaps that is also making a difference.

> Conclusion 2 makes me wonder whether it wouldn't be useful to introduce
> a new flag in sort, like:
>
> """
> `indirect` - Use the indirect method for sorting. This requires more
> memory (4/8 bytes per array element), but for sorting arrays of strings
> it is almost always faster than the direct approach (default). Beware:
> this is not the case when using numerical values, where the use of this
> method for sorting is not recommendable.
> """

I'm more inclined to leave this to the user. I have a todo to add a function to numpy that makes it easier to use the argsort output to sort multidimensional arrays; I'll name it argtake or some such, and it will use the argsort output along with an axis argument. It won't be quite as memory efficient for multidimensional arrays, but it should work about the same in the 1D case.

Chuck

From bsouthey at gmail.com  Thu Feb 14 12:18:06 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 14 Feb 2008 11:18:06 -0600
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802141634.32059.faltet@carabos.com>
References: <200802081329.35470.faltet@carabos.com>
	<200802140927.27213.faltet@carabos.com>
	<20080214130858.GD822@ssh.cv.nrao.edu>
	<200802141634.32059.faltet@carabos.com>
Message-ID:

Hi,

I confirmed the gcc 4.2.3 performance for the Opteron:

Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.630000
C qsort with Python style compare: 0.630000
NumPy newqsort: 0.360000

I also installed the Intel icc 10.1 compiler on my Opteron system, but I did not use any flags:

$ /opt/intel/cc/10.1.008/bin/icc sort-string-bench.c -o icc_sort
$ ./icc_sort
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 1.030000
C qsort with Python style compare: 0.960000
NumPy newqsort: 0.530000

Just glad to be able to contribute something,
Bruce

On Thu, Feb 14, 2008 at 9:34 AM, Francesc Altet wrote:
> On Thursday 14 February 2008, you wrote:
> > > In any case, if anybody has access to an Opteron machine and gcc
> > > 4.2.3, it would be great if they could run the benchmark and
> > > contribute their feedback.
> > > > Here it is with gcc 4.2.3 on an Opteron 246 (2.0 GHz): > > > > uller:~$ ./sort423_O2 # with -O2 > > > Benchmark with 1000000 strings of size 15 > > C qsort with C style compare: 0.770000 > > C qsort with Python style compare: 0.740000 > > NumPy newqsort: 0.630000 > > > > uller:~$ ./sort423_O3 # with -O3 > > > Benchmark with 1000000 strings of size 15 > > C qsort with C style compare: 0.640000 > > C qsort with Python style compare: 0.660000 > > NumPy newqsort: 0.400000 > > And here are my timings with gcc 4.1.3 and using a similar Opteron than > yours (270 @ 2.0 GHz): > > With -O2: > > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.750000 > C qsort with Python style compare: 0.700000 > NumPy newqsort: 0.690000 > > With -O3: > > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.670000 > C qsort with Python style compare: 0.620000 > NumPy newqsort: 0.380000 > > So, it seems clear that the GCC people has fixed in 4.2.3 the problem > with the optimizer introduced in 4.2.1. Very good! > > By the way, it's nice to see the wide range of platforms that this list > allows to test out :-) > > > > Cheers, > > -- > >0,0< Francesc Altet http://www.carabos.com/ > V V C?rabos Coop. V. Enjoy Data > "-" > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From faltet at carabos.com Thu Feb 14 12:46:05 2008 From: faltet at carabos.com (Francesc Altet) Date: Thu, 14 Feb 2008 18:46:05 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802141711.33917.faltet@carabos.com> Message-ID: <200802141846.05577.faltet@carabos.com> A Thursday 14 February 2008, Charles R Harris escrigu?: > On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet wrote: > > From the plot (attached), it can be drawn the next conclusions: > > > > 1) copy_string2 (the combination of manual copy and memcpy) is not > > better than memcpy for *any* value of the string length in our > > Opteron platform. Also, the improvements with Pentium4 was not > > that big (<20%). In consequence, I'd choose to always use memcpy > > and discard copy_string2. > > Your copy_string2 is now the version in numpy. I'm hesitant to make > memcpy the default for all string lengths because I've read that > memcpy was much improved in later gcc (>= 4.1 ?), but known slow in > older versions. So perhaps in a year or two when the newer compilers > are more widespread would be a better time to make the change. The > switch at the 16 char length shouldn't make that much difference in > practice. I'll put a comment in the source so that the thought won't > get lost. Well, copy_string2 is only marginally slower than memcpy on modern gcc compilers and processors, so I presume that this is fine. > BTW, using copy_string2 much improved the performance of > the string mergesort where a lot of data needs to be copied to the > work array. It's now half as fast as quicksort instead of 1/3 ;) Heap > sort continues in it's traditional slot as the slowest of all. Slow > but safe. I've seen that you have added specific code for merge and heap sorting algorithms for strings. Looks good. > > 2) Curiously enough, the indirect sort in Opterons is *always* > > faster than newqsort+memcpy. For large values of string lengths (> > > 256), the speed-up can be up to 3x, which is a lot. And I'd say > > that this keeps true also for most modern processors (read Core2, > > Barcelona). 
For older processors (Pentium4), the indirect method > > can be slower than direct plot for small lengths, but by a very few > > extent (<10%). > > The new indirect quicksort for strings is faster than the old qsort > based default, so perhaps that is also making a difference. Yes, indeed it does! > > Conclusion 2 makes me wonder if it wouldn't be useful the > > introduction of a new flag in sort, like: > > > > """ > > `indirect` - Use the indirect method for sorting. This requires > > more memory (4/8 bytes per array element), but for sorting arrays > > of strings it is almost always faster than the direct approach > > (default). Beware: this is not the case when using numerical > > values, where the use of this method for sorting is not > > recommendable. > > """ > > I'm more inclined to leave this to the user. I have a todo to add a > function to numpy that makes it easier to use the argsort output to > sort multidimensional arrays, I'll name it argtake or some such and > it will use the argsort output along with an axis argument. It won't > be quite as memory efficient for multidimensional arrays, but it > should work about the same in the 1D case. OK. I don't completely grasp what are you trying to do here, but seems like a conservative enough path (in the sense that it won't touch the current parameters of existing sorting methods). Looking forward to see the new qsort for strings in NumPy (the specific version for merge sort is very welcome too!). Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From faltet at carabos.com Thu Feb 14 12:54:56 2008 From: faltet at carabos.com (Francesc Altet) Date: Thu, 14 Feb 2008 18:54:56 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802081329.35470.faltet@carabos.com> <200802141634.32059.faltet@carabos.com> Message-ID: <200802141854.56351.faltet@carabos.com> A Thursday 14 February 2008, Bruce Southey escrigu?: > Hi, > I confirmed the gcc 4.2.3 performance for the Opteron: > > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 0.630000 > C qsort with Python style compare: 0.630000 > NumPy newqsort: 0.360000 > > I also installed the Intel icc 10.1 compiler on my Opteron system but > I did not use any flags: > $ /opt/intel/cc/10.1.008/bin/icc sort-string-bench.c -o icc_sort > $ ./icc_sort > Benchmark with 1000000 strings of size 15 > C qsort with C style compare: 1.030000 > C qsort with Python style compare: 0.960000 > NumPy newqsort: 0.530000 That's excellent Bruce. Definitely it looks like the problem with the optimizer in 4.2.1 has been fixed in 4.2.3. And why you haven't used optimization flags with ICC? just curious... Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. 
??Enjoy Data "-" From charlesr.harris at gmail.com Thu Feb 14 13:03:56 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 Feb 2008 11:03:56 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802141846.05577.faltet@carabos.com> References: <200802141711.33917.faltet@carabos.com> <200802141846.05577.faltet@carabos.com> Message-ID: On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet wrote: > A Thursday 14 February 2008, Charles R Harris escrigu?: > > On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet > wrote: > > > From the plot (attached), it can be drawn the next conclusions: > > > > > > 1) copy_string2 (the combination of manual copy and memcpy) is not > > > better than memcpy for *any* value of the string length in our > > > Opteron platform. Also, the improvements with Pentium4 was not > > > that big (<20%). In consequence, I'd choose to always use memcpy > > > and discard copy_string2. > > > > Your copy_string2 is now the version in numpy. I'm hesitant to make > > memcpy the default for all string lengths because I've read that > > memcpy was much improved in later gcc (>= 4.1 ?), but known slow in > > older versions. So perhaps in a year or two when the newer compilers > > are more widespread would be a better time to make the change. The > > switch at the 16 char length shouldn't make that much difference in > > practice. I'll put a comment in the source so that the thought won't > > get lost. > > Well, copy_string2 is only marginally slower than memcpy on modern gcc > compilers and processors, so I presume that this is fine. > > > BTW, using copy_string2 much improved the performance of > > the string mergesort where a lot of data needs to be copied to the > > work array. It's now half as fast as quicksort instead of 1/3 ;) Heap > > sort continues in it's traditional slot as the slowest of all. Slow > > but safe. > > I've seen that you have added specific code for merge and heap sorting > algorithms for strings. Looks good. > > > > 2) Curiously enough, the indirect sort in Opterons is *always* > > > faster than newqsort+memcpy. For large values of string lengths (> > > > 256), the speed-up can be up to 3x, which is a lot. And I'd say > > > that this keeps true also for most modern processors (read Core2, > > > Barcelona). For older processors (Pentium4), the indirect method > > > can be slower than direct plot for small lengths, but by a very few > > > extent (<10%). > > > > The new indirect quicksort for strings is faster than the old qsort > > based default, so perhaps that is also making a difference. > > Yes, indeed it does! > > > > Conclusion 2 makes me wonder if it wouldn't be useful the > > > introduction of a new flag in sort, like: > > > > > > """ > > > `indirect` - Use the indirect method for sorting. This requires > > > more memory (4/8 bytes per array element), but for sorting arrays > > > of strings it is almost always faster than the direct approach > > > (default). Beware: this is not the case when using numerical > > > values, where the use of this method for sorting is not > > > recommendable. > > > """ > > > > I'm more inclined to leave this to the user. I have a todo to add a > > function to numpy that makes it easier to use the argsort output to > > sort multidimensional arrays, I'll name it argtake or some such and > > it will use the argsort output along with an axis argument. It won't > > be quite as memory efficient for multidimensional arrays, but it > > should work about the same in the 1D case. > > OK. 
I don't completely grasp what are you trying to do here, but seems > like a conservative enough path (in the sense that it won't touch the > current parameters of existing sorting methods). > > Looking forward to see the new qsort for strings in NumPy (the specific > version for merge sort is very welcome too!). > I could never figure out what the merge sort was good for. I did the indirect version in numarray because I needed a stable sort to implement lexsort, which was my original aim. I just added the direct version for completeness. If you have a use for it, I would love to hear it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitrey.kroshko at scipy.org Thu Feb 14 13:43:57 2008 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Thu, 14 Feb 2008 20:43:57 +0200 Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy based package for machine learning In-Reply-To: <47B45784.80208@fbk.eu> References: <47B45784.80208@fbk.eu> Message-ID: <47B48BED.3050705@scipy.org> isn't MLPY a new name to PyML? http://mloss.org/software/view/28/ if no, I guess you'd better add link to your software to http://mloss.org/software/ ("mloss" is "machine learning open source software") Regards, D. Davide Albanese wrote: > *Machine Learning Py* (MLPY) is a *Python/NumPy* based package for > machine learning. > The package now includes: > > * *Support Vector Machines* (linear, gaussian, polinomial, > terminated ramps) for 2-class problems > * *Fisher Discriminant Analysis* for 2-class problems > * *Iterative Relief* for feature weighting for 2-class problems > * *Feature Ranking* methods based on Recursive Feature Elimination > (rfe, onerfe, erfe, bisrfe, sqrtrfe) and Recursive Forward > Selection (rfs) > * *Input Data* functions > * *Confidence Interval* functions > > Requires Python >= 2.4 and NumPy > >= 1.0.3.* > MLPY* is a project of MPBA Group (mpa.fbk.eu) at > Fondazione Bruno Kessler (www.fbk.eu). * > MLPY* is free software. It is licensed under the GNU General Public > License (GPL) version 3 . > > HomePage: mlpy.fbk.eu > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > From faltet at carabos.com Thu Feb 14 13:44:55 2008 From: faltet at carabos.com (Francesc Altet) Date: Thu, 14 Feb 2008 19:44:55 +0100 Subject: [Numpy-discussion] String sort In-Reply-To: References: <200802141711.33917.faltet@carabos.com> <200802141846.05577.faltet@carabos.com> Message-ID: <200802141944.56425.faltet@carabos.com> A Thursday 14 February 2008, Charles R Harris escrigu?: > On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet wrote: > > Looking forward to see the new qsort for strings in NumPy (the > > specific version for merge sort is very welcome too!). > > I could never figure out what the merge sort was good for. I did the > indirect version in numarray because I needed a stable sort to > implement lexsort, which was my original aim. I just added the direct > version for completeness. If you have a use for it, I would love to > hear it. Well, I must to confess that I've not used merge sorts yet, but I'd like to test them in the context of my PSI (Partially Sorted Indexes, see [1] for a white paper on a concrete implementation) work. 
My hope is that, as a merge sort keeps the order of indices of elements that are equal (this is what 'stable' means), this would allow better compression rates for indices (and hence, less I/O effort to bring the indices from disk into memory and ultimately allowing for faster lookup speed). This will probably be only important when one have data distributions with rather low cardinality, but these scenarios can be more frequent/important than one may think. [1] http://www.carabos.com/docs/OPSI-indexes.pdf -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From matthieu.brucher at gmail.com Thu Feb 14 13:48:57 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 14 Feb 2008 19:48:57 +0100 Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy based package for machine learning In-Reply-To: <47B45784.80208@fbk.eu> References: <47B45784.80208@fbk.eu> Message-ID: Hi, How does it compare to the elarn scikit, especially for the SVM part ? How was it implemented ? Matthieu 2008/2/14, Davide Albanese : > > *Machine Learning Py* (MLPY) is a *Python/NumPy* based package for > machine learning. > The package now includes: > > * *Support Vector Machines* (linear, gaussian, polinomial, > terminated ramps) for 2-class problems > * *Fisher Discriminant Analysis* for 2-class problems > * *Iterative Relief* for feature weighting for 2-class problems > * *Feature Ranking* methods based on Recursive Feature Elimination > (rfe, onerfe, erfe, bisrfe, sqrtrfe) and Recursive Forward > Selection (rfs) > * *Input Data* functions > * *Confidence Interval* functions > > Requires Python >= 2.4 and NumPy > >= 1.0.3.* > MLPY* is a project of MPBA Group (mpa.fbk.eu) at > Fondazione Bruno Kessler (www.fbk.eu). * > MLPY* is free software. It is licensed under the GNU General Public > License (GPL) version 3 . > > HomePage: mlpy.fbk.eu > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Feb 14 14:29:48 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 Feb 2008 12:29:48 -0700 Subject: [Numpy-discussion] String sort In-Reply-To: <200802141944.56425.faltet@carabos.com> References: <200802141711.33917.faltet@carabos.com> <200802141846.05577.faltet@carabos.com> <200802141944.56425.faltet@carabos.com> Message-ID: On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet wrote: > A Thursday 14 February 2008, Charles R Harris escrigu?: > > On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet > wrote: > > > Looking forward to see the new qsort for strings in NumPy (the > > > specific version for merge sort is very welcome too!). > > > > I could never figure out what the merge sort was good for. I did the > > indirect version in numarray because I needed a stable sort to > > implement lexsort, which was my original aim. I just added the direct > > version for completeness. If you have a use for it, I would love to > > hear it. 
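For concreteness, the stability in question is easy to demonstrate from Python with the kind argument that numpy's sort and argsort already accept -- a small sketch:

import numpy as np

a = np.array([3, 1, 3, 1, 3])
# mergesort is stable: equal keys keep their original relative order,
# so the indices of the tied elements come out ascending
idx = np.argsort(a, kind='mergesort')
# idx is [1, 3, 0, 2, 4]: the 1s at positions 1 and 3 first, then the
# 3s at positions 0, 2 and 4, in their original order
# quicksort and heapsort also sort correctly, but may permute ties

Whether that regularity in the tie order really buys better index compression is exactly what would need testing.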
> Well, I must confess that I haven't used merge sorts yet, but I'd
> like to test them in the context of my PSI (Partially Sorted Indexes,
> see [1] for a white paper on a concrete implementation) work. My hope
> is that, as a merge sort keeps the order of indices of elements that
> are equal (this is what 'stable' means), this would allow better
> compression rates for indices (and hence, less I/O effort to bring
> the indices from disk into memory and ultimately allowing for faster
> lookup speed). This will probably only be important when one has data
> distributions with rather low cardinality, but these scenarios can be
> more frequent/important than one may think.

Well, I take that back a bit. I think mergesort might work best for
large memory mapped arrays because it does sequential accesses, which
might be more disk efficient than random accesses. Then again, a divide
and conquer approach like quicksort eventually becomes localized too.
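(The experiment would look something like the following -- purely
hypothetical, I have not run it:

>>> import numpy as np
>>> big = np.memmap('scratch.dat', dtype='f8', mode='w+', shape=(10**8,))
>>> for i in range(0, 10**8, 10**6):
...     big[i:i + 10**6] = np.random.rand(10**6)   # fill in chunks
>>> big.sort(kind='mergesort')   # mostly sequential passes over the file

and then the same with kind='quicksort' for comparison.)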
I've never experimented with really large sorts, they might perform
differently than the sorts that fit in memory. Insertion sort is
supposed to work well for almost sorted sequences, but that application
has always seemed a bit specialized to me. Although I'll admit to being
occasionally tempted to pull the insertion sorts out of quicksort and
mergesort and make them their own (type specific) routines.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From faltet at carabos.com  Thu Feb 14 15:03:48 2008
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 14 Feb 2008 21:03:48 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To:
References: <200802141711.33917.faltet@carabos.com>
 <200802141944.56425.faltet@carabos.com>
Message-ID: <200802142103.49350.faltet@carabos.com>

A Thursday 14 February 2008, Charles R Harris escrigué:
> On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet wrote:
> > A Thursday 14 February 2008, Charles R Harris escrigué:
> > > On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet wrote:
> > > > Looking forward to seeing the new qsort for strings in NumPy
> > > > (the specific version for merge sort is very welcome too!).
> > >
> > > I could never figure out what the merge sort was good for. I did
> > > the indirect version in numarray because I needed a stable sort
> > > to implement lexsort, which was my original aim. I just added the
> > > direct version for completeness. If you have a use for it, I
> > > would love to hear it.
> >
> > Well, I must confess that I haven't used merge sorts yet, but I'd
> > like to test them in the context of my PSI (Partially Sorted
> > Indexes, see [1] for a white paper on a concrete implementation)
> > work. My hope is that, as a merge sort keeps the order of indices
> > of elements that are equal (this is what 'stable' means), this
> > would allow better compression rates for indices (and hence, less
> > I/O effort to bring the indices from disk into memory and
> > ultimately allowing for faster lookup speed). This will probably
> > only be important when one has data distributions with rather low
> > cardinality, but these scenarios can be more frequent/important
> > than one may think.
>
> Well, I take that back a bit. I think mergesort might work best for
> large memory mapped arrays because it does sequential accesses, which
> might be more disk efficient than random accesses. Then again, a
> divide and conquer approach like quicksort eventually becomes
> localized too. I've never experimented with really large sorts, they
> might perform differently than the sorts that fit in memory.

Yeah, but I don't really want to use merge sort for out-of-core
sorting; I want it just because it is stable. The main point of a PSI
indexing schema is that you don't need to completely sort your dataset
(hence the name: "Partially Sorted") in order to get a usable index,
and this normally leads to much faster index creation times.

> Insertion sort is supposed to work well for almost sorted sequences,
> but that application has always seemed a bit specialized to me.
> Although I'll admit to being occasionally tempted to pull the
> insertion sorts out of quicksort and mergesort and make them their
> own (type specific) routines.

Maybe I'd also be interested in trying insertion sort out. During the
optimization process of an OPSI index, there is a need to sort out a
slice of data that is already made of smaller chunks that are already
sorted, so chances are that insertion sort could be significantly
faster than the merge sort (or even the quick sort) in this scenario.

But this is getting off-topic. However, I'd be glad to further discuss
this privately, if you like.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From lou_boog2000 at yahoo.com  Thu Feb 14 15:14:03 2008
From: lou_boog2000 at yahoo.com (Lou Pecora)
Date: Thu, 14 Feb 2008 12:14:03 -0800 (PST)
Subject: [Numpy-discussion] Example: How to use ctypes and link to a C
 library
In-Reply-To: <200802141854.56351.faltet@carabos.com>
Message-ID: <252323.42770.qm@web34405.mail.mud.yahoo.com>

I successfully compiled a shared library for use with CTypes and
linked it to an external library (the GNU Scientific Library) on Mac
OS X 10.4. I hope this helps Mac people and anyone else who wants to
use CTypes to access their own C extensions and use other C libraries
in the process. I want to thank several people on this list who gave
me many helpful suggestions and asked me good questions. I also want
to thank the several people who kept nudging me to try CTypes even
though I was reluctant. It is much easier than programming an
extension all in C.

Below are 4 files that enable building of a C shared library on Mac
OS X (10.4) that can be used with CTypes to call a function from the
GNU Scientific Library (a Bessel function, gsl_sf_bessel_J0). You can
see that the idea is pretty simple. The code requires that you have
ctypes (in site-packages) and GSL (the dylib version in
/usr/local/lib) or your desired C library installed. I suspect that on
other platforms only the make file will be different. I do not know
enough to provide Linux or Windows versions. I'm sorry.

Note: This works best if the libraries are shared (e.g. the GSL
library to use is the dylib version). That way only the code that's
needed is loaded when the C functions are called from python.

Comments welcome. Of course, I am responsible for any and all
mistakes. So, I make no guarantees or warranties. These are examples
and should not be used where loss of property, life, or other dangers
exist.
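(A natural extension, if you want to pass whole NumPy arrays to your C
code instead of scalars, is numpy.ctypeslib.ndpointer in the argtypes.
A hypothetical, untested sketch -- there is no J0_bess_array in the
files below, you would have to write it yourself:

import numpy as N
import ctypes as C

_bess.J0_bess_array.restype = None
_bess.J0_bess_array.argtypes = [
    N.ctypeslib.ndpointer(dtype=N.float64, ndim=1, flags='C_CONTIGUOUS'),
    N.ctypeslib.ndpointer(dtype=N.float64, ndim=1, flags='C_CONTIGUOUS'),
    C.c_int]

x = N.linspace(0.0, 10.0, 101)
y = N.empty_like(x)
_bess.J0_bess_array(x, y, len(x))

The declared ndpointer types make ctypes check the dtype,
dimensionality and contiguity of the arrays before the call.)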
#==== Source code 'bess.c' ======================
#include <stdio.h>
#include "bess.h"
#include <gsl/gsl_sf_bessel.h>   /* Must include the header to define
                                    the function for the compiler */

/* ---- test fcns ------------------------- */

#ifdef __cplusplus
extern "C" {
#endif

double J0_bess (double x)
{
    /* Call the GSL Bessel function order 0 of the first kind */
    double y = gsl_sf_bessel_J0 (x);
    /* Print the value right here */
    printf ("J0(%g) = %.18e\n", x, y);
    return y;
}

#ifdef __cplusplus
}
#endif

#==== Header file 'bess.h' =====================
/* ---- Prototypes -------------------- */

#ifdef __cplusplus
extern "C" {
#endif

double J0_bess(double x);

#ifdef __cplusplus
}
#endif

#==== Make file 'bess.mak' ===================
# ---- Link to existing library in this directory ------------
bess.so: bess.o bess.mak
	gcc -bundle -flat_namespace -undefined suppress -o bess.so bess.o -lgsl

# ---- gcc C compile ------------------
bess.o: bess.c bess.h bess.mak
	gcc -c bess.c -o bess.o

#==== Python file 'bess.py' =======================
#!/usr/local/bin/pythonw

import numpy as N
import ctypes as C

# Put the name of your library in place of 'bess.so' and the path to
# it in place of the path below in load_library
_bess = N.ctypeslib.load_library('bess.so',
    '/Users/loupecora/Code_py/test_folder/ctypes_tests/test3ctypes/simplelink-GSL/')

_bess.J0_bess.restype = C.c_double
_bess.J0_bess.argtypes = [C.c_double]

def fcn_J0(x):
    return _bess.J0_bess(x)

x = 0.2
y = fcn_J0(x)
print "x, y: %e %.18e" % (x, y)

#==== Typical output ===============
# The first line is printed from the shared library function J0_bess
# The second line is from the python code that called the shared
# library function

J0(0.2) = 9.900249722395765284e-01
x, y: 2.000000e-01 9.900249722395765284e-01

--
Lou Pecora, my views are my own.

From stefan at sun.ac.za  Thu Feb 14 04:01:44 2008
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Thu, 14 Feb 2008 11:01:44 +0200
Subject: [Numpy-discussion] test failures
In-Reply-To:
References:
Message-ID: <20080214090143.GB31612@mentat.za.net>

Hi Charles

On Wed, Feb 13, 2008 at 03:39:53PM -0700, Charles R Harris wrote:
> I believe these come from your latest commit.

My changeset is here:

http://projects.scipy.org/scipy/numpy/changeset/4800

I don't see how that would have broken the tests you listed.

Regards
Stéfan

From charlesr.harris at gmail.com  Thu Feb 14 15:23:39 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 14 Feb 2008 13:23:39 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802142103.49350.faltet@carabos.com>
References: <200802141711.33917.faltet@carabos.com>
 <200802141944.56425.faltet@carabos.com>
 <200802142103.49350.faltet@carabos.com>
Message-ID:

On Thu, Feb 14, 2008 at 1:03 PM, Francesc Altet wrote:
> A Thursday 14 February 2008, Charles R Harris escrigué:
> > On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet wrote:
> > > A Thursday 14 February 2008, Charles R Harris escrigué:
> > > > On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet wrote:
> > > > > Looking forward to seeing the new qsort for strings in NumPy
> > > > > (the specific version for merge sort is very welcome too!).
> > > >
> > > > I could never figure out what the merge sort was good for.
> > > > I did the indirect version in numarray because I needed a
> > > > stable sort to implement lexsort, which was my original aim. I
> > > > just added the direct version for completeness. If you have a
> > > > use for it, I would love to hear it.
> > >
> > > Well, I must confess that I haven't used merge sorts yet, but I'd
> > > like to test them in the context of my PSI (Partially Sorted
> > > Indexes, see [1] for a white paper on a concrete implementation)
> > > work. My hope is that, as a merge sort keeps the order of indices
> > > of elements that are equal (this is what 'stable' means), this
> > > would allow better compression rates for indices (and hence, less
> > > I/O effort to bring the indices from disk into memory and
> > > ultimately allowing for faster lookup speed). This will probably
> > > only be important when one has data distributions with rather low
> > > cardinality, but these scenarios can be more frequent/important
> > > than one may think.
> >
> > Well, I take that back a bit. I think mergesort might work best for
> > large memory mapped arrays because it does sequential accesses,
> > which might be more disk efficient than random accesses. Then
> > again, a divide and conquer approach like quicksort eventually
> > becomes localized too. I've never experimented with really large
> > sorts, they might perform differently than the sorts that fit in
> > memory.
>
> Yeah, but I don't really want to use merge sort for out-of-core
> sorting; I want it just because it is stable. The main point of a PSI
> indexing schema is that you don't need to completely sort your
> dataset (hence the name: "Partially Sorted") in order to get a usable
> index, and this normally leads to much faster index creation times.
>
> > Insertion sort is supposed to work well for almost sorted
> > sequences, but that application has always seemed a bit specialized
> > to me. Although I'll admit to being occasionally tempted to pull
> > the insertion sorts out of quicksort and mergesort and make them
> > their own (type specific) routines.
>
> Maybe I'd also be interested in trying insertion sort out. During the
> optimization process of an OPSI index, there is a need to sort out a
> slice of data that is already made of smaller chunks that are already
> sorted, so chances are that insertion sort could be significantly
> faster than the merge sort (or even the quick sort) in this scenario.
>
> But this is getting off-topic. However, I'd be glad to further
> discuss this privately, if you like.

Well, I don't have much more to say. If you do decide that insertion
sort will be useful you won't have to twist my arm much to get it, but
I think it is most useful when the data never has to move far. In the
case of quicksort and mergesort it is called to deal with small
unsorted chunks, but the chunks themselves are already in the right
place. Some kind of multi-merge sort might be more appropriate to the
OPSI index.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Thu Feb 14 15:27:59 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 14 Feb 2008 13:27:59 -0700
Subject: [Numpy-discussion] test failures
In-Reply-To: <20080214090143.GB31612@mentat.za.net>
References: <20080214090143.GB31612@mentat.za.net>
Message-ID:

On Thu, Feb 14, 2008 at 2:01 AM, Stefan van der Walt wrote:
> Hi Charles
>
> On Wed, Feb 13, 2008 at 03:39:53PM -0700, Charles R Harris wrote:
> > I believe these come from your latest commit.
>
> My changeset is here:
>
> http://projects.scipy.org/scipy/numpy/changeset/4800
>
> I don't see how that would have broken the tests you listed.

Yeah, it is probably something else, sorry for the noise. I don't see
the test failures at home, so I need to see what is going on with my
work computer. The failure turned up suddenly and yours was the last
commit I (mis)remembered going by.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lxander.m at gmail.com  Thu Feb 14 15:43:46 2008
From: lxander.m at gmail.com (Alexander Michael)
Date: Thu, 14 Feb 2008 15:43:46 -0500
Subject: [Numpy-discussion] List Array References?
Message-ID: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com>

Is there a way to list all of the arrays that are referencing a given
array? Similarly, is there a way to get a list of all arrays that are
currently in memory?

Thanks,
Alex

From garry.willgoose at newcastle.edu.au  Thu Feb 14 16:21:04 2008
From: garry.willgoose at newcastle.edu.au (Garry Willgoose)
Date: Fri, 15 Feb 2008 08:21:04 +1100
Subject: [Numpy-discussion] f2py: sharing F90 module data between modules
In-Reply-To: <65031.85.166.27.136.1202989516.squirrel@cens.ioc.ee>
References: <87366D56-ECB4-43C6-BE95-D17C5181A99E@newcastle.edu.au>
 <54410.85.166.27.136.1202814653.squirrel@cens.ioc.ee>
 <752B77BF-07F1-45BD-9C5E-E827DF585004@newcastle.edu.au>
 <65031.85.166.27.136.1202989516.squirrel@cens.ioc.ee>
Message-ID:

Pearu,

Oh Pearu, I'm not complaining about deficiencies in f2py ... it's a
great piece of work that makes what I'm doing possible at all. Just
like most open source software (including my own ;-) there may be ways
to tweak it to do things that are undocumented.

>> Why is that a problem? I can envisage a user that just wants to use
>> the environment without writing any additional fortran modules (and
>> thus may not even have an installed fortran compiler) and if they
>> screw up mod dates on the files (by say an ftp from one machine to
>> another ... for instance on our cluster the compiler is only
>> installed on one machine and only binaries are moved around the
>> cluster) then the environment might want to reassemble (with f2py)
>> the aggregated library because it (erroneously) thinks there is a
>> newer component shared library. This will fail because f2py quits
>> when asked to process ONLY .so files. If I have a trivial fortran
>> file to force f2py then this forces users to have a fortran compiler
>> on their machine, even if they do not want to actually compile a new
>> fortran module component, simply because f2py will not operate
>> unless it is offered at least one fortran file.
>
> This is not a typical task for f2py. f2py is not a general purpose
> linker. It's amazing that f2py could even be used for such a task,
> so I don't think that the above demonstrates any bug of f2py.

Indeed not typical ... as I recognise ... which is why I wondered if
there was an undocumented way to tweak it to do what I want to do (I
get requests like this on my own software all the time ;-).

> However, if you are worried about whether users have fortran
> compilers installed then can you assume that they have a C compiler
> installed? If so, then instead of a trivial Fortran file try using
> the following trivial .pyf file:
>
> python module dummy
>   interface
>     subroutine dummyfunc()
>       fortranname
>       callstatement ;
>     end subroutine dummyfunc
>   end interface
> end python module dummy
>
> that should force f2py to build a shared library dummy.so
> with no Fortran dependencies.
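(If I read the f2py docs right, building that is then just

  f2py -c dummy.pyf

which should leave a dummy.so exposing a no-op dummyfunc -- I haven't
actually tried it yet.)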
Perfect ...

====================================================================
Prof Garry Willgoose,
Australian Professorial Fellow in Environmental Engineering,
Director, Centre for Climate Impact Management (C2IM),
School of Engineering, The University of Newcastle,
Callaghan, 2308 Australia.

Centre webpage: www.c3im.org.au

Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574
(Fri PM-Mon)
FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal
and Telluric)
Env. Engg. Secretary: (International) +61 2 4921 6042
email: garry.willgoose at newcastle.edu.au;
g.willgoose at telluricresearch.com
email-for-life: garry.willgoose at alum.mit.edu
personal webpage: www.telluricresearch.com/garry
====================================================================
"Do not go where the path may lead, go instead where there is no path
and leave a trail"
                          Ralph Waldo Emerson
====================================================================

From albanese at fbk.eu  Fri Feb 15 02:59:30 2008
From: albanese at fbk.eu (Davide Albanese)
Date: Fri, 15 Feb 2008 08:59:30 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To: <47B48BED.3050705@scipy.org>
References: <47B45784.80208@fbk.eu> <47B48BED.3050705@scipy.org>
Message-ID: <47B54662.8000905@fbk.eu>

No, it isn't a new name for PyML; it is a new project.
Thank you for your advice!
Regards,

/* da */

dmitrey ha scritto:
> Isn't MLPY a new name for PyML?
> http://mloss.org/software/view/28/
>
> If not, I guess you'd better add a link to your software to
> http://mloss.org/software/
> ("mloss" is "machine learning open source software")
> Regards, D.
>
> Davide Albanese wrote:
>
>> *Machine Learning Py* (MLPY) is a *Python/NumPy* based package for
>> machine learning.
>> The package now includes:
>>
>>     * *Support Vector Machines* (linear, gaussian, polynomial,
>>       terminated ramps) for 2-class problems
>>     * *Fisher Discriminant Analysis* for 2-class problems
>>     * *Iterative Relief* for feature weighting for 2-class problems
>>     * *Feature Ranking* methods based on Recursive Feature
>>       Elimination (rfe, onerfe, erfe, bisrfe, sqrtrfe) and Recursive
>>       Forward Selection (rfs)
>>     * *Input Data* functions
>>     * *Confidence Interval* functions
>>
>> Requires Python >= 2.4 and NumPy >= 1.0.3.
>> *MLPY* is a project of MPBA Group (mpa.fbk.eu) at
>> Fondazione Bruno Kessler (www.fbk.eu).
>> *MLPY* is free software. It is licensed under the GNU General Public
>> License (GPL) version 3.
>>
>> HomePage: mlpy.fbk.eu
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion at scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

From albanese at fbk.eu  Fri Feb 15 03:54:05 2008
From: albanese at fbk.eu (Davide Albanese)
Date: Fri, 15 Feb 2008 09:54:05 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To:
References: <47B45784.80208@fbk.eu>
Message-ID: <47B5532D.5000504@fbk.eu>

Dear Matthieu,
I don't know scikit very well.
The SVM is implemented via Sequential Minimal Optimization (SMO).
As for Terminated Ramps (TR) you can read this paper:
S. Merler and G. Jurman, *Terminated Ramp - Support Vector Machine: a
nonparametric data dependent kernel*, Neural Networks, 19(10),
1597-1611, 2006.

/* da */

Matthieu Brucher ha scritto:
> Hi,
>
> How does it compare to the learn scikit, especially for the SVM part?
> How was it implemented?
>
> Matthieu
>
> 2008/2/14, Davide Albanese:
> [snip]

From matthieu.brucher at gmail.com  Fri Feb 15 03:58:44 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Fri, 15 Feb 2008 09:58:44 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To: <47B5532D.5000504@fbk.eu>
References: <47B45784.80208@fbk.eu> <47B5532D.5000504@fbk.eu>
Message-ID:

Thanks for the reference :)

I should have asked in other terms: how does it compare to libsvm,
which is one of the best-known packages for SVMs?

Matthieu

2008/2/15, Davide Albanese:
> Dear Matthieu,
> I don't know scikit very well.
> The SVM is implemented via Sequential Minimal Optimization (SMO).
> As for Terminated Ramps (TR) you can read this paper:
> S. Merler and G. Jurman, *Terminated Ramp - Support Vector Machine: a
> nonparametric data dependent kernel*, Neural Networks, 19(10),
> 1597-1611, 2006.
>
> /* da */

--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From albanese at fbk.eu  Fri Feb 15 04:35:53 2008
From: albanese at fbk.eu (Davide Albanese)
Date: Fri, 15 Feb 2008 10:35:53 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To:
References: <47B45784.80208@fbk.eu> <47B5532D.5000504@fbk.eu>
Message-ID: <47B55CF9.6090207@fbk.eu>

I don't know libsvm very well either; the core of svm-mlpy is written
in C and was developed by Stefano Merler (merler at fbk.eu).
I have simply wrapped it in a svm() Python class.

Regards,

/* da */

Matthieu Brucher ha scritto:
> Thanks for the reference :)
>
> I should have asked in other terms: how does it compare to libsvm,
> which is one of the best-known packages for SVMs?
>
> Matthieu
>
> 2008/2/15, Davide Albanese:
> [snip]

From matthieu.brucher at gmail.com  Fri Feb 15 06:34:55 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Fri, 15 Feb 2008 12:34:55 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To: <47B55CF9.6090207@fbk.eu>
References: <47B45784.80208@fbk.eu> <47B5532D.5000504@fbk.eu>
 <47B55CF9.6090207@fbk.eu>
Message-ID:

OK, I'll try it then :)

Is there access to the underlying cost function? (this is mainly what
I need)

Matthieu

2008/2/15, Davide Albanese:
> I don't know libsvm very well either; the core of svm-mlpy is written
> in C and was developed by Stefano Merler (merler at fbk.eu).
> I have simply wrapped it in a svm() Python class.
>
> Regards,
>
> /* da */

--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From faltet at carabos.com  Fri Feb 15 07:09:39 2008
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 15 Feb 2008 13:09:39 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802141846.05577.faltet@carabos.com>
References: <200802141711.33917.faltet@carabos.com>
 <200802141846.05577.faltet@carabos.com>
Message-ID: <200802151309.39839.faltet@carabos.com>

Hi Chuck,

I've given more testing to the new quicksort routines for strings in
the forthcoming NumPy. I've run the indexing test units in PyTables
Pro (they stress the sorting routines a lot) against the current
version of NumPy in the repository, for the complete set of quicksort,
mergesort and heapsort of the new implementation, and I'm happy to say
that everything went very smoothly, i.e., more than 1000 tests with
different input arrays have passed flawlessly. Good job!
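(Schematically, each of those tests boils down to something like this
simplified sketch -- not the actual PyTables Pro harness:

import numpy as np

for n in (10, 1000, 100000):
    a = np.array(['%08d' % i for i in np.random.randint(0, 1000, n)])
    for kind in ('quicksort', 'mergesort', 'heapsort'):
        b = a.copy()
        b.sort(kind=kind)
        assert b.tolist() == sorted(a.tolist())       # direct sort
        idx = a.argsort(kind=kind)
        assert a[idx].tolist() == sorted(a.tolist())  # indirect sort
)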
Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From stefan at sun.ac.za  Fri Feb 15 07:12:41 2008
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Fri, 15 Feb 2008 14:12:41 +0200
Subject: [Numpy-discussion] List Array References?
In-Reply-To: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com>
References: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com>
Message-ID: <20080215121241.GD7365@mentat.za.net>

Hi Alexander

On Thu, Feb 14, 2008 at 03:43:46PM -0500, Alexander Michael wrote:
> Is there a way to list all of the arrays that are referencing a given
> array? Similarly, is there a way to get a list of all arrays that are
> currently in memory?

As far as I know, the reference count to the array is increased when
you create a view, but the views themselves are not tracked anywhere.
You can therefore say whether there are references around, but you
cannot identify the Python objects. I'm curious why you need this
information -- maybe there is an alternative solution to your problem?

Regards
Stéfan

From david at ar.media.kyoto-u.ac.jp  Fri Feb 15 07:03:32 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 15 Feb 2008 21:03:32 +0900
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
Message-ID: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>

Hi,

My head hurts trying to understand dll management with windows. I
wanted to find a sane way to use dlls for numscons, but I can't see how
to do this, so I was wondering if anyone on this ML had any deep
knowledge of how to install dlls, and reuse them with python
extensions? The problem is the following:
    - you have some dll (say MKL) in a path (say C:\Program Files\MKL).
    - how do you tell another dll (more exactly a python extension, a
      .pyd) to look for the MKL dlls in the MKL path?

It seems that there is no rpath-like capability on windows, and the
only way to load dlls is to put them into some windows directory, or to
put the dll path into PATH. Is it really the only way?
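(For completeness, the only workaround I have right now is to fix
things up at runtime from python, before the extension gets imported --
an untested sketch, and the dll and module names below are made up:

import os, ctypes

mkl_dir = r'C:\Program Files\MKL'
# make the dependent dlls findable for anything loaded afterwards
os.environ['PATH'] = mkl_dir + os.pathsep + os.environ.get('PATH', '')
# or pre-load the dependency explicitly, by full path
ctypes.CDLL(os.path.join(mkl_dir, 'mkl_lapack.dll'))  # hypothetical name

import some_extension  # hypothetical .pyd linked against the MKL dll

But that is a hack, not deployment.)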
cheers,

David

From robince at gmail.com  Fri Feb 15 07:22:18 2008
From: robince at gmail.com (Robin)
Date: Fri, 15 Feb 2008 12:22:18 +0000
Subject: [Numpy-discussion] List Array References?
In-Reply-To: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com>
References: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com>
Message-ID:

On Thu, Feb 14, 2008 at 8:43 PM, Alexander Michael wrote:
> Is there a way to list all of the arrays that are referencing a given
> array? Similarly, is there a way to get a list of all arrays that are
> currently in memory?

For the second question, if you are working interactively in IPython,
you can do
  %who ndarray
or
  %whos ndarray
to get a list of arrays.

It might be possible to get this functionality within a script/program
through the IPython API, but I'm afraid I don't know much about that.

Cheers,

Robin

From faltet at carabos.com  Fri Feb 15 07:27:00 2008
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 15 Feb 2008 13:27:00 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To:
References: <200802141711.33917.faltet@carabos.com>
 <200802142103.49350.faltet@carabos.com>
Message-ID: <200802151327.01050.faltet@carabos.com>

A Thursday 14 February 2008, Charles R Harris escrigué:
> On Thu, Feb 14, 2008 at 1:03 PM, Francesc Altet wrote:
> > Maybe I'd also be interested in trying insertion sort out. During
> > the optimization process of an OPSI index, there is a need to sort
> > out a slice of data that is already made of smaller chunks that are
> > already sorted, so chances are that insertion sort could be
> > significantly faster than the merge sort (or even the quick sort)
> > in this scenario.
> >
> > But this is getting off-topic. However, I'd be glad to further
> > discuss this privately, if you like.
>
> Well, I don't have much more to say. If you do decide that insertion
> sort will be useful you won't have to twist my arm much to get it,
> but I think it is most useful when the data never has to move far. In
> the case of quicksort and mergesort it is called to deal with small
> unsorted chunks, but the chunks themselves are already in the right
> place. Some kind of multi-merge sort might be more appropriate to the
> OPSI index.

OK, thanks, I'll keep you in mind ;) And yes, multi-way merge seems
very interesting for OPSI indeed. Eventually, it might even be a good
candidate for taking advantage of the several cores in modern CPUs, so
it would be really interesting to check out this path.

Thanks for the very insightful hints!

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From wright at esrf.fr  Fri Feb 15 07:34:37 2008
From: wright at esrf.fr (Jon Wright)
Date: Fri, 15 Feb 2008 13:34:37 +0100
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
In-Reply-To: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
References: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
Message-ID: <47B586DD.10803@esrf.fr>

David Cournapeau wrote:
> Hi,
>
> My head hurts trying to understand dll management with windows. I
> wanted to find a sane way to use dlls for numscons, but I can't see
> how to do this, so I was wondering if anyone on this ML had any deep
> knowledge of how to install dlls, and reuse them with python
> extensions? The problem is the following:
>     - you have some dll (say MKL) in a path (say C:\Program Files\MKL).
>     - how do you tell another dll (more exactly a python extension, a
>       .pyd) to look for the MKL dlls in the MKL path?

Hi,

To be honest I'd be nervous about spreading your dependencies all over
the disk; the potential interactions go as n^2. According to MSDN there
is a function which might work:

"SetDllDirectory: Adds a directory to the search path used to locate
DLLs for the application."
http://msdn2.microsoft.com/en-us/library/ms686203.aspx

Perhaps this is only for vista/xp? Otherwise you might have to copy the
mkl dll into the "directory from which the application loaded". This
would seem like a better choice, but I do wonder where that is for
python.

Hope this helps. I'm certainly not an expert, but keen to try to
support your work on helping the masses access optimised libraries.

Best,

Jon

> It seems that there is no rpath-like capability on windows, and the
> only way to load dlls is to put them into some windows directory, or
> to put the dll path into PATH. Is it really the only way?
>
> cheers,
>
> David
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From matthieu.brucher at gmail.com  Fri Feb 15 07:40:11 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Fri, 15 Feb 2008 13:40:11 +0100
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
In-Reply-To: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
References: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
Message-ID:

Once Visual Studio 2008 is used, there might be a way of using the
manifest files (which were created for a similar purpose).
For the moment, all I know is that you must put the dll in the
Windows/system32 folder or somewhere in the PATH.

Matthieu

2008/2/15, David Cournapeau:
> Hi,
>
> My head hurts trying to understand dll management with windows. I
> wanted to find a sane way to use dlls for numscons, but I can't see
> how to do this, so I was wondering if anyone on this ML had any deep
> knowledge of how to install dlls, and reuse them with python
> extensions? The problem is the following:
>     - you have some dll (say MKL) in a path (say C:\Program Files\MKL).
>     - how do you tell another dll (more exactly a python extension, a
>       .pyd) to look for the MKL dlls in the MKL path?
>
> It seems that there is no rpath-like capability on windows, and the
> only way to load dlls is to put them into some windows directory, or
> to put the dll path into PATH. Is it really the only way?
>
> cheers,
>
> David

--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From albanese at fbk.eu  Fri Feb 15 07:47:48 2008
From: albanese at fbk.eu (Davide Albanese)
Date: Fri, 15 Feb 2008 13:47:48 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To:
References: <47B45784.80208@fbk.eu> <47B5532D.5000504@fbk.eu>
 <47B55CF9.6090207@fbk.eu>
Message-ID: <47B589F4.9070502@fbk.eu>

Yes: https://mlpy.fbk.eu/wiki/MlpyExamplesWithDoc

* svm()
  Initialize the svm class.

  Inputs:
    ...
    cost - for cost-sensitive classification [-1.0, 1.0]

Matthieu Brucher ha scritto:
> OK, I'll try it then :)
>
> Is there access to the underlying cost function? (this is mainly what
> I need)
>
> Matthieu
>
> 2008/2/15, Davide Albanese:
> [snip]

From david at ar.media.kyoto-u.ac.jp  Fri Feb 15 07:44:42 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 15 Feb 2008 21:44:42 +0900
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
In-Reply-To: <47B586DD.10803@esrf.fr>
References: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
 <47B586DD.10803@esrf.fr>
Message-ID: <47B5893A.2010606@ar.media.kyoto-u.ac.jp>

Jon Wright wrote:
>
> Hi,
>
> To be honest I'd be nervous about spreading your dependencies all
> over the disk; the potential interactions go as n^2.

Well, I am more than nervous, this is totally insane, and I tried
different things (registry: per application path, etc...). But it just
looks like windows is even more f***-up than I thought. I am tired of
wasting my time on this broken platform: I am really close to giving
up on supporting MS compilers at all.

> According to MSDN there is a function which might work:
>
> "SetDllDirectory: Adds a directory to the search path used to locate
> DLLs for the application."
> http://msdn2.microsoft.com/en-us/library/ms686203.aspx

Well, this does not sound right to me either; anyway, this is a C
function which does not seem to be available from python.
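(Hmm, though now that I think about it, maybe it can be reached through
ctypes -- a wild, untested guess, assuming the function exists on the
target windows version:

import ctypes
ctypes.windll.kernel32.SetDllDirectoryW(u'C:\\Program Files\\MKL')

)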
> Otherwise you might have to copy the mkl dll into the "directory from
> which the application loaded". This would seem like a better choice,
> but I do wonder where that is for python.

This directory depends on how you start python, I think, so this is
not better. Having looked at how installers do it, there does not seem
to be much choice: either use PATH (actually, that's what the MKL is
doing), or put everything in hardcoded directories (system32 and co).
The other solution I see is to add the paths to the registry for the
python application (it seems that windows has a registry subtree App
Paths to handle per-application dll paths, but touching the registry
makes me quite nervous too).

> Hope this helps. I'm certainly not an expert, but keen to try to
> support your work on helping the masses access optimised libraries.

Thanks,

David

From albanese at fbk.eu  Fri Feb 15 07:54:40 2008
From: albanese at fbk.eu (Davide Albanese)
Date: Fri, 15 Feb 2008 13:54:40 +0100
Subject: [Numpy-discussion] List Array References?
In-Reply-To:
References: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com>
Message-ID: <47B58B90.60502@fbk.eu>

With "standard" Python:

>>> who()

Robin ha scritto:
> On Thu, Feb 14, 2008 at 8:43 PM, Alexander Michael wrote:
> > Is there a way to list all of the arrays that are referencing a
> > given array? Similarly, is there a way to get a list of all arrays
> > that are currently in memory?
>
> For the second question, if you are working interactively in IPython,
> you can do
>   %who ndarray
> or
>   %whos ndarray
> to get a list of arrays.
>
> It might be possible to get this functionality within a
> script/program through the IPython API, but I'm afraid I don't know
> much about that.
>
> Cheers,
>
> Robin

From david at ar.media.kyoto-u.ac.jp  Fri Feb 15 07:45:59 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 15 Feb 2008 21:45:59 +0900
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
In-Reply-To:
References: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
Message-ID: <47B58987.4020001@ar.media.kyoto-u.ac.jp>

Matthieu Brucher wrote:
> Once Visual Studio 2008 is used, there might be a way of using the
> manifest files (which were created for a similar purpose).
> For the moment, all I know is that you must put the dll in the
> Windows/system32 folder or somewhere in the PATH.

Do you know where to find some information about that? I tried to find
something readable, but I could only find some MSDN pages with almost
no concrete information on how to use those manifest files.
cheers,

David

From charlesr.harris at gmail.com  Fri Feb 15 07:58:24 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 15 Feb 2008 05:58:24 -0700
Subject: [Numpy-discussion] String sort
In-Reply-To: <200802151309.39839.faltet@carabos.com>
References: <200802141711.33917.faltet@carabos.com>
 <200802141846.05577.faltet@carabos.com>
 <200802151309.39839.faltet@carabos.com>
Message-ID:

On Fri, Feb 15, 2008 at 5:09 AM, Francesc Altet wrote:
> Hi Chuck,
>
> I've given more testing to the new quicksort routines for strings in
> the forthcoming NumPy. I've run the indexing test units in PyTables
> Pro (they stress the sorting routines a lot) against the current
> version of NumPy in the repository, for the complete set of
> quicksort, mergesort and heapsort of the new implementation, and I'm
> happy to say that everything went very smoothly, i.e., more than 1000
> tests with different input arrays have passed flawlessly. Good job!

Hi Francesc,

Thanks for the thorough testing. It makes me feel much more comfortable
this close to the release of 1.0.5.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lists at cheimes.de  Fri Feb 15 08:04:32 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 15 Feb 2008 14:04:32 +0100
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
In-Reply-To:
References: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
Message-ID:

Matthieu Brucher wrote:
> Once Visual Studio 2008 is used, there might be a way of using the
> manifest files (which were created for a similar purpose).
> For the moment, all I know is that you must put the dll in the
> Windows/system32 folder or somewhere in the PATH.

That's not enough for some DLLs. .NET assemblies as well as
Side-by-Side dlls (SxS) must be registered properly. You can install a
SxS dll in PATH, but simply copying the DLL isn't enough. It also
depends on the OS.

Have fun! The SxS issue has bitten us in the a.. when we delivered the
second beta of Python 3.0.

Christian

From david at ar.media.kyoto-u.ac.jp  Fri Feb 15 08:02:18 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 15 Feb 2008 22:02:18 +0900
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
In-Reply-To:
References: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
Message-ID: <47B58D5A.8060701@ar.media.kyoto-u.ac.jp>

Christian Heimes wrote:
>
> That's not enough for some DLLs. .NET assemblies as well as
> Side-by-Side dlls (SxS) must be registered properly. You can install
> a SxS dll in PATH, but simply copying the DLL isn't enough. It also
> depends on the OS.

Ah, that reminds me of something I tried, that is, using the global
assembly cache thing, with gacutil, which is supposed to work with
'bare' dlls. Of course, it did not work as described on MSDN, and I got
some obscure error message (I am starting to wonder if anything coming
from MS works at all).

> Have fun! The SxS issue has bitten us in the a.. when we delivered
> the second beta of Python 3.0.

Do you have a link to the related python ML discussion by any chance?

cheers,

David

From lxander.m at gmail.com  Fri Feb 15 08:28:08 2008
From: lxander.m at gmail.com (Alexander Michael)
Date: Fri, 15 Feb 2008 08:28:08 -0500
Subject: [Numpy-discussion] List Array References?
In-Reply-To: <20080215121241.GD7365@mentat.za.net>
References: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com>
 <20080215121241.GD7365@mentat.za.net>
Message-ID: <525f23e80802150528g2907ea64i48add3b9b84ed1ed@mail.gmail.com>

On Fri, Feb 15, 2008 at 7:12 AM, Stefan van der Walt wrote:
> As far as I know, the reference count to the array is increased when
> you create a view, but the views themselves are not tracked anywhere.
> You can therefore say whether there are references around, but you
> cannot identify the Python objects.

I would like to occasionally dynamically grow an array (i.e. add
length to existing dimensions) as I do not know the ultimately
required length beforehand, but I have a good guess, so I won't need
to reallocate that often, if at all. The only way I know how to do
this in numpy is to create a new larger array with the new dimensions
and copy the existing data from the smaller array into it.
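(Schematically, something like this untested sketch, with made-up
names:

import numpy

def grow(a, new_rows):
    # allocate a bigger array and copy the old data into it
    bigger = numpy.empty((new_rows,) + a.shape[1:], dtype=a.dtype)
    bigger[:a.shape[0]] = a
    return bigger   # any views on `a` still point at the old buffer

)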
Perhaps there is a better way that doesn't invalidate views on the
array? I want to make sure that there are no outstanding references to
the old array (i.e. views on it, etc.) so that I can raise a helpful
exception while developing.

Robin and Davide- Thanks for the pointer to the numpy.who method;
while it only searches in a specified dictionary, it was the method
that I thought I had encountered before.

From ndbecker2 at gmail.com  Fri Feb 15 08:49:25 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 15 Feb 2008 08:49:25 -0500
Subject: [Numpy-discussion] resizing arrays
Message-ID:

I have a situation where I'm going to save some data sets to plot, but
I don't know a priori how many sets there will be. I'm using this code:

try:
    shape = list(phase_plots.shape)
    shape[0] += 1
    phase_plots.resize (shape, refcheck=0)
except NameError:
    phase_plots = empty ((1, 2*iterations+1, l))

This works, I'm just wondering if this is a reasonable approach or if
maybe something else would be better (or more efficient).

From matthieu.brucher at gmail.com  Fri Feb 15 09:01:38 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Fri, 15 Feb 2008 15:01:38 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To: <47B589F4.9070502@fbk.eu>
References: <47B45784.80208@fbk.eu> <47B5532D.5000504@fbk.eu>
 <47B55CF9.6090207@fbk.eu> <47B589F4.9070502@fbk.eu>
Message-ID:

Well, this is an input parameter; I'd like to access the cost function
directly so that I can use it to follow its gradient to the limit
between the two classes.

Matthieu

2008/2/15, Davide Albanese:
> Yes: https://mlpy.fbk.eu/wiki/MlpyExamplesWithDoc
>
> * svm()
>   Initialize the svm class.
>
>   Inputs:
>     ...
>     cost - for cost-sensitive classification [-1.0, 1.0]
>
> Matthieu Brucher ha scritto:
> > OK, I'll try it then :)
> >
> > Is there access to the underlying cost function? (this is mainly
> > what I need)
> >
> > Matthieu
> >
> > [snip]

--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From albanese at fbk.eu  Fri Feb 15 09:59:21 2008
From: albanese at fbk.eu (Davide Albanese)
Date: Fri, 15 Feb 2008 15:59:21 +0100
Subject: [Numpy-discussion] MLPY - Machine Learning Py - Python/NumPy
 based package for machine learning
In-Reply-To:
References: <47B45784.80208@fbk.eu> <47B5532D.5000504@fbk.eu>
 <47B55CF9.6090207@fbk.eu> <47B589F4.9070502@fbk.eu>
Message-ID: <47B5A8C9.7090106@fbk.eu>

Ok, sorry! The cost function is embedded in the C smo() function. I
think that you cannot access it directly.

Matthieu Brucher ha scritto:
> Well, this is an input parameter; I'd like to access the cost
> function directly so that I can use it to follow its gradient to the
> limit between the two classes.
>
> Matthieu
>
> [snip]

From faltet at carabos.com  Fri Feb 15 10:15:11 2008
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 15 Feb 2008 16:15:11 +0100
Subject: [Numpy-discussion] String sort
In-Reply-To:
References: <200802141711.33917.faltet@carabos.com>
 <200802151309.39839.faltet@carabos.com>
Message-ID: <200802151615.11212.faltet@carabos.com>

A Friday 15 February 2008, Charles R Harris escrigué:
> On Fri, Feb 15, 2008 at 5:09 AM, Francesc Altet wrote:
> > Hi Chuck,
> >
> > I've given more testing to the new quicksort routines for strings
> > in the forthcoming NumPy.
> > I've run the indexing test units in PyTables Pro (they stress the
> > sorting routines a lot) against the current version of NumPy in the
> > repository, for the complete set of quicksort, mergesort and
> > heapsort of the new implementation, and I'm happy to say that
> > everything went very smoothly, i.e., more than 1000 tests with
> > different input arrays have passed flawlessly. Good job!
>
> Hi Francesc,
>
> Thanks for the thorough testing. It makes me feel much more
> comfortable this close to the release of 1.0.5.

You are welcome. I forgot to mention that I tested out both the direct
(.sort()) and indirect (.argsort()) methods.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From lists at cheimes.de  Fri Feb 15 10:43:27 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 15 Feb 2008 16:43:27 +0100
Subject: [Numpy-discussion] Is anyone knowledgeable about dll deployment
 on windows ?
In-Reply-To: <47B58D5A.8060701@ar.media.kyoto-u.ac.jp>
References: <47B57F94.4090708@ar.media.kyoto-u.ac.jp>
 <47B58D5A.8060701@ar.media.kyoto-u.ac.jp>
Message-ID:

David Cournapeau wrote:
> Do you have a link to the related python ML discussion by any chance?

No, I'm sorry. It was a private chat between Guido, Martin and me
during the release phase of Python 3.0a2. The MSDN website has some
articles about SxS DLLs though. I had to read about ten articles to
get the big picture.
The information is scattered > all over the place. :/ 10 pages, xml files and not even compatible across OS versions: it it what is meant by MS when they talk about rich API ? :) I won't read those pages to understand something which is available on unix for 20 years and understandable in 5 minutes. I wasted one half day, I won't waste more than that. Well, this is it, I give up, I won't support linking dll with MS compilers. I don't have easy access to windows, and I don't even use this crap. Thanks for your help, though David From stefan at sun.ac.za Fri Feb 15 11:51:45 2008 From: stefan at sun.ac.za (Stefan van der Walt) Date: Fri, 15 Feb 2008 18:51:45 +0200 Subject: [Numpy-discussion] List Array References? In-Reply-To: <525f23e80802150528g2907ea64i48add3b9b84ed1ed@mail.gmail.com> References: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com> <20080215121241.GD7365@mentat.za.net> <525f23e80802150528g2907ea64i48add3b9b84ed1ed@mail.gmail.com> Message-ID: <20080215165145.GG7365@mentat.za.net> On Fri, Feb 15, 2008 at 08:28:08AM -0500, Alexander Michael wrote: > On Fri, Feb 15, 2008 at 7:12 AM, Stefan van der Walt wrote: > > As far as I know, the reference count to the array is increased when > > you create a view, but the views themselves are not tracked anywhere. > > You can therefore say whether there are references around, but you > > cannot identify the Python objects. > > I would like to occasionally dynamically grow an array (i.e. add > length to existing dimensions) as I do not know th ultimately required > length beforehand, but I have a good guess so I won't be need to > reallocate that often if at all. The only way I know how to do this is > numpy is to create a new larger array with the new dimensions and copy > the existing data from the smaller array into it. Perhaps there is a > better way that doesn't invalidate views on the array? I want to make > sure that there are no outstanding references to the old array (i.e. > views on it, etc.) so that I can raise a helpful exception while > developing. Numpy does complain if you attempt to resize an array with views on it: In [8]: x = np.array([1,2,3]) In [14]: y = x[::-1] In [18]: x.resize((4,)) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /tmp/ in () ValueError: cannot resize an array that has been referenced or is referencing another array in this way. Use the resize function You can catch that exception and work from there. I hope that is what you had in mind? Regards St?fan From charlesr.harris at gmail.com Fri Feb 15 11:56:06 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 15 Feb 2008 09:56:06 -0700 Subject: [Numpy-discussion] resizing arrays In-Reply-To: References: Message-ID: On Fri, Feb 15, 2008 at 6:49 AM, Neal Becker wrote: > I have a situation where I'm going to save some data sets to plot, but I > don't know a-priori how many sets there will be. I'm using this code: > > try: > shape = list(phase_plots.shape) > shape[0] += 1 > phase_plots.resize (shape, refcheck=0) > except NameError: > phase_plots = empty ((1, 2*iterations+1, l)) > > This works, I'm just wondering if this is a reasonable approach or if > maybe > something else would be better (or more efficient). > > I'd just append the data to a list. Arrays aren't meant to be dynamic and what you end up with is a bunch of reallocations and copies. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pgmdevlist at gmail.com Fri Feb 15 12:59:38 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 15 Feb 2008 12:59:38 -0500 Subject: [Numpy-discussion] MaskedArray __setitem__ for Record Values In-Reply-To: <525f23e80802150751n2ebb284dhf730825e4a0a730a@mail.gmail.com> References: <525f23e80802150751n2ebb284dhf730825e4a0a730a@mail.gmail.com> Message-ID: <200802151259.39349.pgmdevlist@gmail.com> On Friday 15 February 2008 10:51:04 Alexander Michael wrote: > >>> d = numpy.empty((5,4), dtype=[('value', float),('source', int)]) > >>> a = ma.MaskedArray(d, mask=True, fill_value=(0.0,0)) > >>> a[0,0] = (10.0, 1) > > numpy\ma\core.py in __setitem__(self, indx, value) > 1359 return > 1360 #.... > -> 1361 dval = getdata(value).astype(self.dtype) > 1362 valmask = getmask(value) > 1363 if self._mask is nomask: > > TypeError: expected a readable buffer object Good call. The easiest is still to replace the line 1361 with: dval = narray(value, copy=False, dtype=self.dtype) The problem with the initial method was that the tuple got transformed into a (2,) array whose type could not be changed afterwards. With the new line, we directly transform value to a ndarray of the proper type. Mmh. Where should I commit the fix ? Directly to the trunk ? From lxander.m at gmail.com Fri Feb 15 14:36:09 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Fri, 15 Feb 2008 14:36:09 -0500 Subject: [Numpy-discussion] List Array References? In-Reply-To: <20080215165145.GG7365@mentat.za.net> References: <525f23e80802141243p3ea504bcr6d681b0c5b713405@mail.gmail.com> <20080215121241.GD7365@mentat.za.net> <525f23e80802150528g2907ea64i48add3b9b84ed1ed@mail.gmail.com> <20080215165145.GG7365@mentat.za.net> Message-ID: <525f23e80802151136m75391423r5fa57fe69b91f488@mail.gmail.com> On Fri, Feb 15, 2008 at 11:51 AM, Stefan van der Walt wrote: > Numpy does complain if you attempt to resize an array with views on > it: > > In [8]: x = np.array([1,2,3]) > > In [14]: y = x[::-1] > > In [18]: x.resize((4,)) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > > /tmp/ in () > > ValueError: cannot resize an array that has been referenced or is > referencing another array in this way. Use the resize function > > You can catch that exception and work from there. I hope that is what > you had in mind? Actually, I'm essentially trying to figure out how situations like that are arising. I'm not using resize, because it reshapes the array when adding to any dimension other than the first. At "the end of the day" I want to enlarge an array without leaving any "references" dangling. Thanks for the suggestion! Alex From lxander.m at gmail.com Fri Feb 15 14:41:27 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Fri, 15 Feb 2008 14:41:27 -0500 Subject: [Numpy-discussion] MaskedArray __setitem__ for Record Values In-Reply-To: <200802151259.39349.pgmdevlist@gmail.com> References: <525f23e80802150751n2ebb284dhf730825e4a0a730a@mail.gmail.com> <200802151259.39349.pgmdevlist@gmail.com> Message-ID: <525f23e80802151141y2ba00eb5tcf265d5b81f453b0@mail.gmail.com> On Fri, Feb 15, 2008 at 12:59 PM, Pierre GM wrote: > Good call. > The easiest is still to replace the line 1361 with: > dval = narray(value, copy=False, dtype=self.dtype) > > The problem with the initial method was that the tuple got transformed into a > (2,) array whose type could not be changed afterwards. With the new line, we > directly transform value to a ndarray of the proper type. 
Even better- thanks!

> Mmh. Where should I commit the fix ? Directly to the trunk ?

I hope so!

Regards,
Alex

From pgmdevlist at gmail.com  Fri Feb 15 14:56:14 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 15 Feb 2008 14:56:14 -0500
Subject: [Numpy-discussion] MaskedArray __setitem__ for Record Values
In-Reply-To: <525f23e80802151141y2ba00eb5tcf265d5b81f453b0@mail.gmail.com>
References: <525f23e80802150751n2ebb284dhf730825e4a0a730a@mail.gmail.com>
	<200802151259.39349.pgmdevlist@gmail.com>
	<525f23e80802151141y2ba00eb5tcf265d5b81f453b0@mail.gmail.com>
Message-ID: <200802151456.15825.pgmdevlist@gmail.com>

On Friday 15 February 2008 14:41:27 Alexander Michael wrote:
> Even better- thanks!

You're welcome.

> > Mmh. Where should I commit the fix ? Directly to the trunk ?

Done.

From dineshbvadhia at hotmail.com  Fri Feb 15 16:00:58 2008
From: dineshbvadhia at hotmail.com (Dinesh B Vadhia)
Date: Fri, 15 Feb 2008 13:00:58 -0800
Subject: [Numpy-discussion] import issue with new Python
Message-ID: 

I upgraded to Python 2.5.2c1 today, and got the following error for:

> import numpy
> import scipy

Traceback (most recent call last):
  File "C:\ ... .py", line 19, in
    import scipy
ImportError: No module named scipy

I'm using Numpy 1.0.4 and Scipy 0.6.  Any ideas?

Dinesh

From aisaac at american.edu  Wed Feb 13 16:34:10 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Wed, 13 Feb 2008 16:34:10 -0500
Subject: [Numpy-discussion] import issue with new Python
In-Reply-To: 
References: 
Message-ID: 

On Fri, 15 Feb 2008, Dinesh B Vadhia apparently wrote:
> I upgraded to Python 2.5.2c1 today, and got the following error for:
>> import numpy
>> import scipy
> Traceback (most recent call last):
>   File "C:\ ... .py", line 19, in
>     import scipy
> ImportError: No module named scipy
> I'm using Numpy 1.0.4 and Scipy 0.6.
> Any ideas?

If you are on Windows, check Lib\site-packages and see if NumPy and SciPy
are there: it looks like they are not. If you did a side-by-side install,
you'll have to reinstall SciPy and NumPy on your new Python.

Cheers,
Alan Isaac

From charlesr.harris at gmail.com  Fri Feb 15 16:32:38 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 15 Feb 2008 14:32:38 -0700
Subject: [Numpy-discussion] test failures when numpy built without atlas
	libraries.
Message-ID: 

When built without ATLAS, the following tests fail :

======================================================================
ERROR: check_testUfuncRegression (numpy.core.tests.test_ma.test_ufuncs)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/numpy/core/tests/test_ma.py",
line 691, in check_testUfuncRegression
    self.failUnless(eq(ur.filled(0), mr.filled(0), f))
  File "/usr/lib/python2.5/site-packages/numpy/ma/core.py", line 1552, in filled
    result = self._data.copy()
  File "/usr/lib/python2.5/site-packages/numpy/ma/core.py", line 1474, in _get_data
    return self.view(self._baseclass)
TypeError: Cannot change data-type for object array.
====================================================================== FAIL: Ticket #588 ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/numpy/core/tests/test_regression.py", line 734, in check_dot_negative_stride assert_equal(np.dot(x,z),np.dot(x,y2)) File "/usr/lib/python2.5/site-packages/numpy/testing/utils.py", line 143, in assert_equal return assert_array_equal(actual, desired, err_msg) File "/usr/lib/python2.5/site-packages/numpy/testing/utils.py", line 225, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/usr/lib/python2.5/site-packages/numpy/testing/utils.py", line 217, in assert_array_compare assert cond, msg AssertionError: Arrays are not equal (mismatch 100.0%) x: array([[ 55924.]]) y: array([[ 640000.]]) When built with ATLAS the second test passes, but the check_testUfuncRegression error persists. This is at revision 4807. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lxander.m at gmail.com Fri Feb 15 23:12:37 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Fri, 15 Feb 2008 23:12:37 -0500 Subject: [Numpy-discussion] MaskedArray __setitem__ Performance Message-ID: <525f23e80802152012u21c52a9aha060480df3f3f08d@mail.gmail.com> In part of some code I'm rewriting from carrying around a data and mask array to using MaskedArray, I read data into an array from an input stream. By its nature this a "one at a time" process, so it is basically a loop over assigning single elements (in no predetermined order) of already allocated arrays. Unfortunately, using MaskedArray in this way is significantly slower. The sample code below demonstrates that for this particular procedure, filling the MaskedArray is 32x slower than working with the two arrays I had been carying around. It appears that I can regain the fill performance by working on _data and _mask directly. I can guarantee that the MaskedArrays I'm working with have been created with a dense mask as I've done below (there are always some masked elements, so there is no gain in shrinking to nomask). Is this safe? If not, can I make it safe for this particular performance critical section? I'm assuming that doing array operations won't incur this sort of penalty when I get further into my translation. Some overhead is acceptable for the convenience of not dragging around the mask and thinking about it all of the time, but hopefully less than 2x slower. Thanks! 
Alex import numpy def get_ndarrays(): return (numpy.zeros((5000,500), dtype=float), numpy.ones((5000,500), dtype=bool)) import timeit t_base = timeit.Timer( 'a[0,0] = 1.0; m[0,0] = False', 'from __main__ import get_ndarrays; a,m = get_ndarrays()' ).timeit(1000)/1000 print t_base 6.97574691756e-007 import numpy.ma def get_maskedarray(): return numpy.ma.MaskedArray( numpy.zeros((5000,500), dtype=float), numpy.ones((5000,500), dtype=bool) ) t_ma = timeit.Timer( 'a[0,0] = 1.0', 'from __main__ import get_maskedarray; a = get_maskedarray()' ).timeit(1000)/1000 print t_ma, t_ma/t_base 2.26880790715e-005 32.5242290749 t_ma_com = timeit.Timer( 'd[0,0] = 1.0; m[0,0] = False', 'from __main__ import get_maskedarray, get_setter; a = get_maskedarray(); d,m = a._data,a._mask' ).timeit(1000)/1000 print t_ma_com, t_ma_com/t_base 7.34450886914e-007 1.05286343612 From josh8912 at yahoo.com Sat Feb 16 04:52:21 2008 From: josh8912 at yahoo.com (JJ) Date: Sat, 16 Feb 2008 01:52:21 -0800 (PST) Subject: [Numpy-discussion] changing display options for dtype info for record arrays Message-ID: <155173.81232.qm@web54012.mail.re2.yahoo.com> Hello: I am starting to use record arrays and would like to know how to keep numpy from displaying the dtype info. For example, I can make a record array containing a long tuple: mydescriptor = dtype([('first', 'f4'),('second', 'f4'), ('third', [(str(x),' References: <525f23e80802152012u21c52a9aha060480df3f3f08d@mail.gmail.com> Message-ID: <200802161225.03274.pgmdevlist@gmail.com> Alexander, You get the gist here: process your _data and _mask separately and recombine them into a MaskedArray at the end. That way, you'll skip most of the overhead costs brought by some tests in the package (in __getitem__, __setitem__...). From dmitrey.kroshko at scipy.org Sat Feb 16 14:14:15 2008 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Sat, 16 Feb 2008 21:14:15 +0200 Subject: [Numpy-discussion] best way for C code wrapping Message-ID: <47B73607.105@scipy.org> hi all, I intend to connect some C code to Python for some my purposes. What is the best software for the aim? Is it numpy.ctypes or swig or something else? IIRC ctypes are present in Python since v2.5, so it's ok to use just ctypes, not numpy.ctypes, or some difference is present? Another one question: if I'll translate a fortran code to C via f2c, which % of speed down should I expect (in average, using gfortran and gcc)? Afaik it contains operations with sparse matrices. Thank you in advance, D. From lxander.m at gmail.com Sat Feb 16 14:23:26 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Sat, 16 Feb 2008 14:23:26 -0500 Subject: [Numpy-discussion] MaskedArray __setitem__ Performance In-Reply-To: <200802161225.03274.pgmdevlist@gmail.com> References: <525f23e80802152012u21c52a9aha060480df3f3f08d@mail.gmail.com> <200802161225.03274.pgmdevlist@gmail.com> Message-ID: <525f23e80802161123h74cdbe60v70b516052c648515@mail.gmail.com> On Feb 16, 2008 12:25 PM, Pierre GM wrote: > Alexander, > You get the gist here: process your _data and _mask separately and recombine > them into a MaskedArray at the end. That way, you'll skip most of the > overhead costs brought by some tests in the package (in __getitem__, > __setitem__...). Can I safely carry around the data, mask and MaskedArray? 
I'm considering working along the lines of the following conceptual outline: d = numpy.array(shape, dtype) m = numpy.array(shape, bool) a = numpy.ma.MaskedArray(d, m) load_initial_data(d, m) for update in updates: apply_update(update, d, m) result = calculate_result(a) I guess the alternative would be like: d = numpy.array(shape, dtype) m = numpy.array(shape, bool) load_initial_data(d, m) for update in updates: apply_update(update, d, m) a = numpy.ma.MaskedArray(d, m) result = calculate_result(a) Perhaps this is cleaner in some ways, but I'm trying to squeeze the most performance out of the basic update loop I've sketched, so that the calculate_result function can afford to exchange some performance for clarity and simplicity (if desired). I haven't yet measured the overhead in creating a MaskedArray, but there probably isn't much since by default no copies are made. Thanks for your advice, Alex From fullung at gmail.com Sat Feb 16 14:26:45 2008 From: fullung at gmail.com (Albert Strasheim) Date: Sat, 16 Feb 2008 21:26:45 +0200 Subject: [Numpy-discussion] best way for C code wrapping In-Reply-To: <47B73607.105@scipy.org> References: <47B73607.105@scipy.org> Message-ID: <5eec5f300802161126n630b4b4cm60fc6d74f87dd4e5@mail.gmail.com> Hello, On Feb 16, 2008 9:14 PM, dmitrey wrote: > hi all, > I intend to connect some C code to Python for some my purposes. > What is the best software for the aim? > Is it numpy.ctypes or swig or something else? > IIRC ctypes are present in Python since v2.5, so it's ok to use just > ctypes, not numpy.ctypes, or some difference is present? I would definitely recommend ctypes. numpy.ctypes is just some extra code to make it easier to use NumPy arrays with ctypes. It's not a standalone thing. There are some example of what you can do with the ctypes support in NumPy here: http://www.scipy.org/Cookbook/Ctypes Look for ndpointer. > Another one question: if I'll translate a fortran code to C via f2c, > which % of speed down should I expect (in average, using gfortran and > gcc)? Afaik it contains operations with sparse matrices. Depending on what you want to do exactly, you might consider wrapping the Fortran code using f2py instead of translating it to C. You could also build the Fortran code as a shared library and wrap it using ctypes. Regards, Albert From matthieu.brucher at gmail.com Sat Feb 16 14:28:55 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 16 Feb 2008 20:28:55 +0100 Subject: [Numpy-discussion] best way for C code wrapping In-Reply-To: <47B73607.105@scipy.org> References: <47B73607.105@scipy.org> Message-ID: Hi, numpy.ctypes uses ctypes to work, it consists of some additional utility functions. There was a discussion on this some time ago (SWIG, ctypes, ...) with David (C.), Ga?l and others. Why translating some code to C ? Why not using f2py ? Matthieu 2008/2/16, dmitrey : > > hi all, > I intend to connect some C code to Python for some my purposes. > What is the best software for the aim? > Is it numpy.ctypes or swig or something else? > IIRC ctypes are present in Python since v2.5, so it's ok to use just > ctypes, not numpy.ctypes, or some difference is present? > > Another one question: if I'll translate a fortran code to C via f2c, > which % of speed down should I expect (in average, using gfortran and > gcc)? Afaik it contains operations with sparse matrices. > > Thank you in advance, > D. 
> _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Sat Feb 16 15:21:32 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sat, 16 Feb 2008 15:21:32 -0500 Subject: [Numpy-discussion] MaskedArray __setitem__ Performance In-Reply-To: <525f23e80802161123h74cdbe60v70b516052c648515@mail.gmail.com> References: <525f23e80802152012u21c52a9aha060480df3f3f08d@mail.gmail.com> <200802161225.03274.pgmdevlist@gmail.com> <525f23e80802161123h74cdbe60v70b516052c648515@mail.gmail.com> Message-ID: <200802161521.33324.pgmdevlist@gmail.com> > Can I safely carry around the data, mask and MaskedArray? I'm > considering working along the lines of the following conceptual > outline: That depends a lot on what calculate_results does, and whether you update the arrays in place or not. > d = numpy.array(shape, dtype) > m = numpy.array(shape, bool) > a = numpy.ma.MaskedArray(d, m) You should be able to update d and m, and have the changes passed to a (as long as you're not using copy=True). You have to make sure that m has indeed a dtype of MaskType (or bool), else you'll break the connection. Explanation: in MaskedArray.__new__, the mask argument is converted to a dtype of MaskType (bool): if the mask is originally in integer, for example, a copy is made, and the _mask of your masked array does not point to `mask`. For example: >>>d=numpy.array([1,2,3]) >>>m=numpy.array([0,0,1]) >>>x=numpy.ma.array(d,mask=m) >>>x [1 2 --] >>>d[0]=17 >>>x [17 2 --] OK, x is properly updated. If now we try to change the mask: >>>m[0]=1 >>>x [17 2 --] x is not updated, as x._mask doesn't point to m, but to a copy of m as the dtype changed from int to bool. Now, if we ensure that m is an array of booleans: >>>d=numpy.array([1,2,3]) >>>m=numpy.array([0,0,1], dtype=bool) >>>x=numpy.ma.array(d,mask=m) >>>print x [1 2 --] >>>d[0]=17 >>>print x [17 2 --] >>>m[0]=1 >>>print x [-- 2 --] m was of the correct dtype in the first place, so no copy is made, and x._mask does point to m. In short: in your example, updating d and m should work and be more efficient than updating a directly. From lxander.m at gmail.com Sat Feb 16 16:41:12 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Sat, 16 Feb 2008 16:41:12 -0500 Subject: [Numpy-discussion] MaskedArray __setitem__ Performance In-Reply-To: <200802161521.33324.pgmdevlist@gmail.com> References: <525f23e80802152012u21c52a9aha060480df3f3f08d@mail.gmail.com> <200802161225.03274.pgmdevlist@gmail.com> <525f23e80802161123h74cdbe60v70b516052c648515@mail.gmail.com> <200802161521.33324.pgmdevlist@gmail.com> Message-ID: <525f23e80802161341k65810b62v302d42c004a8b9b@mail.gmail.com> On Feb 16, 2008 3:21 PM, Pierre GM wrote: > > Can I safely carry around the data, mask and MaskedArray? I'm > > considering working along the lines of the following conceptual > > outline: > > That depends a lot on what calculate_results does, and whether you update the > arrays in place or not. 
> > > d = numpy.array(shape, dtype) > > m = numpy.array(shape, bool) > > a = numpy.ma.MaskedArray(d, m) > > You should be able to update d and m, and have the changes passed to a (as > long as you're not using copy=True). You have to make sure that m has indeed > a dtype of MaskType (or bool), else you'll break the connection. > > Explanation: in MaskedArray.__new__, the mask argument is converted to a dtype > of MaskType (bool): if the mask is originally in integer, for example, a copy > is made, and the _mask of your masked array does not point to `mask`. For > example: > >>>d=numpy.array([1,2,3]) > >>>m=numpy.array([0,0,1]) > >>>x=numpy.ma.array(d,mask=m) > >>>x > [1 2 --] > >>>d[0]=17 > >>>x > [17 2 --] > > OK, x is properly updated. If now we try to change the mask: > > >>>m[0]=1 > >>>x > [17 2 --] > > x is not updated, as x._mask doesn't point to m, but to a copy of m as the > dtype changed from int to bool. > Now, if we ensure that m is an array of booleans: > >>>d=numpy.array([1,2,3]) > >>>m=numpy.array([0,0,1], dtype=bool) > >>>x=numpy.ma.array(d,mask=m) > >>>print x > [1 2 --] > >>>d[0]=17 > >>>print x > [17 2 --] > >>>m[0]=1 > >>>print x > [-- 2 --] > m was of the correct dtype in the first place, so no copy is made, and x._mask > does point to m. > > In short: in your example, updating d and m should work and be more efficient > than updating a directly. Cool. Thanks! From ndbecker2 at gmail.com Sun Feb 17 09:41:57 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 17 Feb 2008 09:41:57 -0500 Subject: [Numpy-discussion] Help with user-defined data type Message-ID: I'm trying out user-defined data type for numpy, but I'm a bit stuck. Maybe someone can spot the problem? I'm creating a descr like this: PyArray_Descr * d = (PyArray_Descr*)(PyObject_New (PyArray_Descr, &PyArrayDescr_Type)); d->typeobj = typeptr; d->kind = 'O'; // ??? d->type = 'z'; d->byteorder = '='; d->hasobject = 0; d->elsize = sizeof (cmplx_int_t); d->alignment = __alignof__ (cmplx_int_t); d->subarray = 0; d->fields = 0; d->names = 0; PyArray_ArrFuncs * f = new PyArray_ArrFuncs; PyArray_InitArrFuncs (f); f->copyswapn = ©swapn; f->copyswap = ©swap; f->getitem = &getitem; f->setitem = &setitem; d->f = f; incref (typeptr); Then register: PyArray_RegisterDataType (d1); This seems OK, I get a typenum 256. But if I try to use it, bad things happen. This tests creating an array, both from typenum and from descr: PyArray_Descr* descr = PyArray_DescrFromType (typenum); npy_intp dims[] = {10}; PyObject * o1 = PyArray_NewFromDescr (&PyArray_Type, descr, 1, dims, 0, 0, 0, 0); PyObject * o2 = PyArray_New (&PyArray_Type, 1, dims, descr->type_num, 0, 0, 0, 0, 0); If I try this for typenum=1, it's fine. But if I use my typenum=256 I get >>> a,b = testit (256) >>> a python: Objects/typeobject.c:1418: extra_ivars: Assertion `t_size >= b_size' failed. Program received signal SIGABRT, Aborted. 0x0000003834c30ec5 in raise () from /lib64/libc.so.6 From ndbecker2 at gmail.com Sun Feb 17 16:27:52 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 17 Feb 2008 16:27:52 -0500 Subject: [Numpy-discussion] Help with user-defined data type References: Message-ID: Are there any examples of user-defined data types that I could get hold of? 
From robert.kern at gmail.com Sun Feb 17 17:46:13 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 17 Feb 2008 16:46:13 -0600 Subject: [Numpy-discussion] Help with user-defined data type In-Reply-To: References: Message-ID: <3d375d730802171446q1c20940dx4e91bb688b69e00a@mail.gmail.com> On Feb 17, 2008 3:27 PM, Neal Becker wrote: > Are there any examples of user-defined data types that I could get hold of? I think you may be the first. The problems you encounter may well be bugs rather than problems with your code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ndbecker2 at gmail.com Mon Feb 18 06:43:11 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 18 Feb 2008 06:43:11 -0500 Subject: [Numpy-discussion] Help with user-defined data type References: <3d375d730802171446q1c20940dx4e91bb688b69e00a@mail.gmail.com> Message-ID: Robert Kern wrote: > On Feb 17, 2008 3:27 PM, Neal Becker wrote: >> Are there any examples of user-defined data types that I could get hold >> of? > > I think you may be the first. The problems you encounter may well be > bugs rather than problems with your code. > There is mention in several places about 'inheritance' from existing scalar objects. What is this about? Why would I want my type to inherit from an existing type, and what does this imply about numpy handling of my type? From cournape at gmail.com Mon Feb 18 14:30:04 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 19 Feb 2008 04:30:04 +0900 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! Message-ID: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> Hi, I've just finished a new release of numscons. This version is the first one to be able to compile numpy using MS compilers and g77 compiler on windows, which was the last "big" platform not supported by numscons. This also means I can now refocus on building scipy with numscons, so that the whole numpy/scipy framework can be built with numscons on all platforms. This release is available through pypi (just use easy_install numscons, or easy_install -U numscons to upgrade a previous install), as well as on launchpad: https://launchpad.net/numpy.scons.support/0.4/0.4.1 cheers, David From fullung at gmail.com Tue Feb 19 02:42:41 2008 From: fullung at gmail.com (Albert Strasheim) Date: Tue, 19 Feb 2008 09:42:41 +0200 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> Message-ID: <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> Hello, On Feb 18, 2008 9:30 PM, David Cournapeau wrote: > Hi, > > I've just finished a new release of numscons. This version is the > first one to be able to compile numpy using MS compilers and g77 > compiler on windows, which was the last "big" platform not supported > by numscons. Good stuff. I noticed that the Launchpad page says: "I decided not to support dynamic linking against 3rd party dll. Because of intrinsics windows limitations, it is impossible to do it in a reliable way without putting too much burden on the maintainer." 
I might note that I had problems with NumPy and other C code crashing
randomly when statically linked against MKL 9.1 on Win32 (probably
when using multiple cores). Dynamically linking fixed the problem. I
haven't tested with MKL 10 yet.

Cheers,

Albert

From matthieu.brucher at gmail.com  Tue Feb 19 03:06:13 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Tue, 19 Feb 2008 09:06:13 +0100
Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS
	compiler + g77 works !
In-Reply-To: <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com>
References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com>
	<5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com>
Message-ID: 

> Good stuff. I noticed that the Launchpad page says:
>
> "I decided not to support dynamic linking against 3rd party dll.
> Because of intrinsics windows limitations, it is impossible to do it
> in a reliable way without putting too much burden on the maintainer."
>
> I might note that I had problems with NumPy and other C code crashing
> randomly when statically linked against MKL 9.1 on Win32 (probably
> when using multiple cores). Dynamically linking fixed the problem. I
> haven't tested with MKL 10 yet.

If you use an installed ATLAS/MKL/... library, I don't see what the
problem with linking against them would be :|

Matthieu

-- 
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher

From david at ar.media.kyoto-u.ac.jp  Tue Feb 19 04:03:04 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 19 Feb 2008 18:03:04 +0900
Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS
	compiler + g77 works !
In-Reply-To: <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com>
References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com>
	<5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com>
Message-ID: <47BA9B48.70509@ar.media.kyoto-u.ac.jp>

Albert Strasheim wrote:
>
> Good stuff. I noticed that the Launchpad page says:
>
> "I decided not to support dynamic linking against 3rd party dll.
> Because of intrinsics windows limitations, it is impossible to do it
> in a reliable way without putting too much burden on the maintainer."
>
> I might note that I had problems with NumPy and other C code crashing
> randomly when statically linked against MKL 9.1 on Win32 (probably
> when using multiple cores). Dynamically linking fixed the problem. I
> haven't tested with MKL 10 yet.

Yes, it is a problem, at least it is advertised as such in the MKL
release notes (static vs dll). I could use PATH and get away with it,
but I am reluctant to do that, because it is really fragile. If I could
be sure that nobody would complain about it when it does not work, then
I would do it in a minute (codewise, it is trivial; actually, the
development version of numscons does it, because it is useful for
debugging).

But in my experience with all my other packages, windows installation is
always what gets me the most emails, because windows is more fragile, and
its users are less aware of technical issues. I am okay spending time on
code, much less on a platform which I do not use, on a project which I am
working on for free.
So it really boils down to either someone pays me to do it, or someone tells me how to do it reliably (I would of course do something which works reliably, if anyone can explain to me how to). cheers, David From david at ar.media.kyoto-u.ac.jp Tue Feb 19 04:12:37 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 19 Feb 2008 18:12:37 +0900 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> Message-ID: <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> Matthieu Brucher wrote: > > If you use an installed ATLAS/MKL/... library, I don't know where is > the problem wit linking with them :| Atlas is not a problem, because if you know how to build a dll for ATLAS, you know how to handle environment variable problems. It is not a purely technical problem, but a combination of both cultural and technical. I know that if I tell you, Matthieu, to put you library path in your path, you won't tell me it does not work if the PATH is changed or something similar. But that's not how most windows users work in my experience. As for the problem: if you use PATH for dll, you need: - to use PATH inside scons, that is polluting the build environment with a variable you cannot control at all. - to add the path of the used libraries into the user environment. I can imagine many things going wrong in this scenario. And guess who a user will complain to if something is messed up ? :) cheers, David From matthieu.brucher at gmail.com Tue Feb 19 04:31:56 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 19 Feb 2008 10:31:56 +0100 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> Message-ID: 2008/2/19, David Cournapeau : > > Matthieu Brucher wrote: > > > > If you use an installed ATLAS/MKL/... library, I don't know where is > > the problem wit linking with them :| > > > Atlas is not a problem, because if you know how to build a dll for > ATLAS, you know how to handle environment variable problems. It is not a > purely technical problem, but a combination of both cultural and > technical. I know that if I tell you, Matthieu, to put you library path > in your path, you won't tell me it does not work if the PATH is changed > or something similar. But that's not how most windows users work in my > experience. Now that you provide an installer for Atlas, it may become the same problem as MKL, can't it ? As for the problem: if you use PATH for dll, you need: > - to use PATH inside scons, that is polluting the build environment > with a variable you cannot control at all. > - to add the path of the used libraries into the user environment. > I can imagine many things going wrong in this scenario. And guess who a > user will complain to if something is messed up ? :) Yes, they will complain to you for sure, but I can't see where dynamic libraries can become a problem : - are they numpy dlls ? is so there is a problem, I agree with you - are they external dlls ? 
If so, the installer must have set up everything so that everything works, but this is where people bark at the wrong people : the users of the dll and not the installers. You seem to indicate that the problem is the last dlls, am I correct ? Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Feb 19 04:34:45 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 19 Feb 2008 18:34:45 +0900 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> Message-ID: <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> Matthieu Brucher wrote: > > > Now that you provide an installer for Atlas, it may become the same > problem as MKL, can't it ? It is exactly the same problem, yes. Right now, my installer does not modify the environment at all (like MKL or ACML, actually), and you have to do it manually (add PATH, or put in system32). > > > Yes, they will complain to you for sure, but I can't see where dynamic > libraries can become a problem : > - are they numpy dlls ? is so there is a problem, I agree with you numpy 'dll' (they are technically dll, but are named .pyd on recent version of windows) are mandatory, and is not a problem: it is handled by python (whose import mechanism knows how to import dll not in the path env var). > - are they external dlls ? If so, the installer must have set up > everything so that everything works, but this is where people bark at > the wrong people : the users of the dll and not the installers. Well, neither ACML or MKL does it, at least the versions I looked into. You have to launch a special cmd.exe (like for VS2005) which import the variables. cheers, David From fullung at gmail.com Tue Feb 19 04:50:47 2008 From: fullung at gmail.com (Albert Strasheim) Date: Tue, 19 Feb 2008 11:50:47 +0200 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> Message-ID: <5eec5f300802190150m4e51eb67m30937dde25687a2f@mail.gmail.com> Hello, On Feb 19, 2008 11:34 AM, David Cournapeau wrote: > Matthieu Brucher wrote: > > > > > > Now that you provide an installer for Atlas, it may become the same > > problem as MKL, can't it ? > > It is exactly the same problem, yes. Right now, my installer does not > modify the environment at all (like MKL or ACML, actually), and you have > to do it manually (add PATH, or put in system32). Have you tried installing the DLLs to C:\Python2x or to the same directory as the numpy .pyd? As far as I know, this should work. Obviously there are some issues: multiple modules might be linked to MKL, in which case you would be duplicating a lot of files, but hey, so it goes. Ideally, all the modules should be using one module to interact with native BLAS and LAPACK. 
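To make the first option concrete, the kind of layout I have in mind would be
something like this (purely illustrative; the real MKL DLL names will differ):

C:\Python25\
    python.exe
    mkl_whatever.dll        <- option 1: next to python.exe
    Lib\site-packages\numpy\core\
        _dotblas.pyd
        mkl_whatever.dll    <- option 2: next to the extension itself

If I understand the Windows search order correctly, the loader checks the
application directory (and, with the alternate search order, the directory of
the module being loaded) before falling back to PATH, so either location could
work without touching the user's environment.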
In my opinion, modifying the PATH or installing to System32 are not ways to properly deploy DLLs on Windows. Cheers, Albert From matthieu.brucher at gmail.com Tue Feb 19 04:54:17 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 19 Feb 2008 10:54:17 +0100 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> Message-ID: > > > Now that you provide an installer for Atlas, it may become the same > > problem as MKL, can't it ? > > It is exactly the same problem, yes. Right now, my installer does not > modify the environment at all (like MKL or ACML, actually), and you have > to do it manually (add PATH, or put in system32). OK ;) > > Yes, they will complain to you for sure, but I can't see where dynamic > > libraries can become a problem : > > - are they numpy dlls ? is so there is a problem, I agree with you > > > numpy 'dll' (they are technically dll, but are named .pyd on recent > version of windows) are mandatory, and is not a problem: it is handled > by python (whose import mechanism knows how to import dll not in the > path env var). Yes, that works as long as there are no "real" dll that were built at the same time. > - are they external dlls ? If so, the installer must have set up > > everything so that everything works, but this is where people bark at > > the wrong people : the users of the dll and not the installers. > > Well, neither ACML or MKL does it, at least the versions I looked into. > You have to launch a special cmd.exe (like for VS2005) which import the > variables. > That's almost stupid... I understand now what you meant :| Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Feb 19 04:55:59 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 19 Feb 2008 18:55:59 +0900 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> Message-ID: <47BAA7AF.6010903@ar.media.kyoto-u.ac.jp> Matthieu Brucher wrote: > > Yes, that works as long as there are no "real" dll that were built at > the same time. Well, I don't see that happening unintentionally. Installed modules have a directory architecture, so this is not much of an issue, or am I missing something ? David From matthieu.brucher at gmail.com Tue Feb 19 05:06:02 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 19 Feb 2008 11:06:02 +0100 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! 
In-Reply-To: <5eec5f300802190150m4e51eb67m30937dde25687a2f@mail.gmail.com> References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> <5eec5f300802190150m4e51eb67m30937dde25687a2f@mail.gmail.com> Message-ID: > > It is exactly the same problem, yes. Right now, my installer does not > > modify the environment at all (like MKL or ACML, actually), and you have > > to do it manually (add PATH, or put in system32). > > Have you tried installing the DLLs to C:\Python2x or to the same > directory as the numpy .pyd? As far as I know, this should work. > The first solution may indeed work (perhaps in C:\Python2x\Lib as well ?) but the second does not. I tried it several times (same with Linux BTW), the library search is then the default one, not Python's one. Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Feb 19 05:02:52 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 19 Feb 2008 19:02:52 +0900 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: <5eec5f300802190150m4e51eb67m30937dde25687a2f@mail.gmail.com> References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> <5eec5f300802190150m4e51eb67m30937dde25687a2f@mail.gmail.com> Message-ID: <47BAA94C.8040907@ar.media.kyoto-u.ac.jp> Albert Strasheim wrote: > > Have you tried installing the DLLs to C:\Python2x or to the same > directory as the numpy .pyd? As far as I know, this should work. Yes, it does, I think I tried it. But this mean duplicating dll, and more worrying, filesystem manipulations, which I don't like much (windows having this extremely annoying 'feature' of locking files by default, which is another thing I would have to care about). > > Obviously there are some issues: multiple modules might be linked to > MKL, in which case you would be duplicating a lot of files, but hey, > so it goes. Ideally, all the modules should be using one module to > interact with native BLAS and LAPACK. > > In my opinion, modifying the PATH or installing to System32 are not > ways to properly deploy DLLs on Windows. I agree, but once you've said modifying PATH and system32 are not the right way, and there is no other way to do it, what's your conclusion :) cheers, David From matthieu.brucher at gmail.com Tue Feb 19 05:47:15 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 19 Feb 2008 11:47:15 +0100 Subject: [Numpy-discussion] [ANN] numscons 0.4.1. Building numpy with MS compiler + g77 works ! In-Reply-To: <47BAA94C.8040907@ar.media.kyoto-u.ac.jp> References: <5b8d13220802181130v54fb2099r2cfd6119beec424f@mail.gmail.com> <5eec5f300802182342r75eb1dc8l766cff531d7588fa@mail.gmail.com> <47BA9D85.9090508@ar.media.kyoto-u.ac.jp> <47BAA2B5.8090907@ar.media.kyoto-u.ac.jp> <5eec5f300802190150m4e51eb67m30937dde25687a2f@mail.gmail.com> <47BAA94C.8040907@ar.media.kyoto-u.ac.jp> Message-ID: > > Yes, it does, I think I tried it. 
Strange that it worked for you, it didn't for me :|

Matthieu

-- 
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ndbecker2 at gmail.com  Tue Feb 19 13:38:06 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Tue, 19 Feb 2008 13:38:06 -0500
Subject: [Numpy-discussion] partial_sum/adj_difference?
Message-ID: 

Does numpy/scipy have a partial_sum and adj_difference function?

partial_sum[i] = \sum_{j=0}^{i} x[j]
adj_diff[i] = x[i] - x[i-1] : i > 1, x[i] otherwise

From faltet at carabos.com  Tue Feb 19 13:59:48 2008
From: faltet at carabos.com (Francesc Altet)
Date: Tue, 19 Feb 2008 19:59:48 +0100
Subject: [Numpy-discussion] partial_sum/adj_difference?
In-Reply-To: 
References: 
Message-ID: <200802191959.48659.faltet@carabos.com>

A Tuesday 19 February 2008, Neal Becker escrigué:
> Does numpy/scipy have a partial_sum and adj_difference function?
>
> partial_sum[i] = \sum_{j=0}^{i} x[j]
> adj_diff[i] = x[i] - x[i-1] : i > 1, x[i] otherwise

I don't know, but by using views the next should be fairly efficient:

# Partial sum (a[:i+1] so that element i is included in the sum)
In [28]: a = numpy.arange(10)
In [29]: ps = numpy.empty(len(a), 'int')
In [30]: for i in range(len(a)): ps[i] = a[:i+1].sum()
   ....:
In [31]: ps
Out[31]: array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])

# Adj difference:
In [35]: ad = numpy.empty(len(a), 'int')
In [36]: ad[0] = a[0]
In [37]: ad[1:] = a[1:] - a[:-1]
In [38]: ad
Out[38]: array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From nadavh at visionsense.com  Tue Feb 19 14:13:03 2008
From: nadavh at visionsense.com (Nadav Horesh)
Date: Tue, 19 Feb 2008 21:13:03 +0200
Subject: [Numpy-discussion] partial_sum/adj_difference?
References: <200802191959.48659.faltet@carabos.com>
Message-ID: <710F2847B0018641891D9A21602763600B6EDB@ex3.envision.co.il>

-----Original Message-----
From: numpy-discussion-bounces at scipy.org on behalf of Francesc Altet
Sent: Tue 19-Feb-08 20:59
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] partial_sum/adj_difference?

Instead of

    for i in range(len(a)): ps[i] = a[:i+1].sum()

use

    a.cumsum()

Nadav

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat Type: application/ms-tnef Size: 3203 bytes Desc: not available URL: From charlesr.harris at gmail.com Tue Feb 19 14:40:43 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Feb 2008 12:40:43 -0700 Subject: [Numpy-discussion] partial_sum/adj_difference? In-Reply-To: References: Message-ID: On Feb 19, 2008 11:38 AM, Neal Becker wrote: > Does numpy/scipy have a partial_sum and adj_difference function? > > partial_sum[i] = \sum_{j=0}^{i} x[j] Make add.accumulate will do the trick: In [1]: add.accumulate(arange(10)) Out[1]: array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45]) In [3]: arange(10).cumsum() Out[3]: array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45]) > adj_diff[i] = x[i] - x[i-1] : i > 1, x[i] otherwise > > Well, x[1:] - x[:-1] will give the usual differences. If you need the leading x[0] prefix the x vector with a 0. In [4]: a = arange(10) In [5]: b = a[1:] - a[:-1] In [6]: b Out[6]: array([1, 1, 1, 1, 1, 1, 1, 1, 1]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Feb 19 14:41:42 2008 From: stefan at sun.ac.za (Stefan van der Walt) Date: Tue, 19 Feb 2008 21:41:42 +0200 Subject: [Numpy-discussion] partial_sum/adj_difference? In-Reply-To: References: Message-ID: <20080219194142.GK29099@mentat.za.net> Hi Neal On Tue, Feb 19, 2008 at 01:38:06PM -0500, Neal Becker wrote: > Does numpy/scipy have a partial_sum and adj_difference function? > > partial_sum[i] = \sum_{j=0}^{i} x[j] numpy.cumsum Yikes, the docstring contains "Blah, blah". I'll fix that immediately. > adj_diff[i] = x[i] - x[i-1] : i > 1, x[i] otherwise numpy.diff Regards St?fan From pav at iki.fi Tue Feb 19 14:48:22 2008 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 19 Feb 2008 21:48:22 +0200 Subject: [Numpy-discussion] partial_sum/adj_difference? In-Reply-To: References: Message-ID: <1203450502.7760.2.camel@localhost.localdomain> ti, 2008-02-19 kello 13:38 -0500, Neal Becker kirjoitti: > Does numpy/scipy have a partial_sum and adj_difference function? > > partial_sum[i] = \sum_{j=0}^{i} x[j] > adj_diff[i] = x[i] - x[i-1] : i > 1, x[i] otherwise cumsum and diff do something like this: >>> import numpy >>> a = [1,2,3,4,5,3,1] >>> numpy.cumsum(a) array([ 1, 3, 6, 10, 15, 18, 19]) >>> numpy.diff(a) array([ 1, 1, 1, 1, -2, -2]) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digitaalisesti allekirjoitettu viestin osa URL: From pgmdevlist at gmail.com Tue Feb 19 15:08:03 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 19 Feb 2008 15:08:03 -0500 Subject: [Numpy-discussion] partial_sum/adj_difference? In-Reply-To: References: Message-ID: <200802191508.04059.pgmdevlist@gmail.com> > On Feb 19, 2008 11:38 AM, Neal Becker wrote: > > adj_diff[i] = x[i] - x[i-1] : i > 1, x[i] otherwise > > Well, x[1:] - x[:-1] will give the usual differences. If you need the > leading x[0] prefix the x vector with a 0. There's also numpy.diff, and the little known numpy.ediff1d >>>x=numpy.arange(10) >>>numpy.ediff1d(x,to_begin=0) array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1]) From charlesr.harris at gmail.com Tue Feb 19 15:50:04 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Feb 2008 13:50:04 -0700 Subject: [Numpy-discussion] partial_sum/adj_difference? 
In-Reply-To: <20080219194142.GK29099@mentat.za.net> References: <20080219194142.GK29099@mentat.za.net> Message-ID: On Feb 19, 2008 12:41 PM, Stefan van der Walt wrote: > Hi Neal > > On Tue, Feb 19, 2008 at 01:38:06PM -0500, Neal Becker wrote: > > Does numpy/scipy have a partial_sum and adj_difference function? > > > > partial_sum[i] = \sum_{j=0}^{i} x[j] > > numpy.cumsum > > Yikes, the docstring contains "Blah, blah". I'll fix that > immediately. > Gosh, And here I thought you were going to fix that. Deleting the "blahs" isn't a fix, it's a coverup. Now there is no extended documentation at all. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Tue Feb 19 16:17:45 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 19 Feb 2008 22:17:45 +0100 Subject: [Numpy-discussion] partial_sum/adj_difference? In-Reply-To: References: <20080219194142.GK29099@mentat.za.net> Message-ID: On Tue, 19 Feb 2008 13:50:04 -0700 "Charles R Harris" wrote: > On Feb 19, 2008 12:41 PM, Stefan van der Walt > wrote: > >> Hi Neal >> >> On Tue, Feb 19, 2008 at 01:38:06PM -0500, Neal Becker >>wrote: >> > Does numpy/scipy have a partial_sum and adj_difference >>function? >> > >> > partial_sum[i] = \sum_{j=0}^{i} x[j] >> >> numpy.cumsum >> >> Yikes, the docstring contains "Blah, blah". I'll fix >>that >> immediately. >> > > Gosh, > > And here I thought you were going to fix that. Deleting >the "blahs" isn't a > fix, it's a coverup. Now there is no extended >documentation at all. > > Chuck ;-) >>> from numpy import cumprod >>> help (cumprod) Nils From stefan at sun.ac.za Tue Feb 19 16:20:19 2008 From: stefan at sun.ac.za (Stefan van der Walt) Date: Tue, 19 Feb 2008 23:20:19 +0200 Subject: [Numpy-discussion] partial_sum/adj_difference? In-Reply-To: References: <20080219194142.GK29099@mentat.za.net> Message-ID: <20080219212018.GM29099@mentat.za.net> On Tue, Feb 19, 2008 at 01:50:04PM -0700, Charles R Harris wrote: > And here I thought you were going to fix that. Deleting the "blahs" isn't a > fix, it's a coverup. Now there is no extended documentation at all. I wouldn't call "Blah, blah" extended documentation -- in fact, I would've been rather embarrassed if that showed up on my screen during a workshop. "Blah, blah" also doesn't strike me as the ideal TODO marker. We can maybe use some more sensible text, or write a decorator to mark these functions. I'd say we add them to the TODO list for the next doc-day and run with it... Regards St?fan From sameerslists at gmail.com Tue Feb 19 18:10:10 2008 From: sameerslists at gmail.com (Sameer DCosta) Date: Tue, 19 Feb 2008 17:10:10 -0600 Subject: [Numpy-discussion] numpy record array segfault Message-ID: <8fb8cc060802191510h29c2503x56ac582f95203501@mail.gmail.com> Hi, I'm getting a segfault when using python objects with record arrays. The code (below) basically assigns a single datetime object to a slice of a column in the record array and then python segfaults as soon as I try to access those array values. I'm using the latest svn version of numpy compiled with gcc 3.4.1 on Solaris (intel). Any ideas why this is happening? A reference counting problem maybe? Thanks for taking a look. Sameer johnh at flag:~> gcc --version gcc (GCC) 3.4.1 Copyright (C) 2004 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
johnh at flag:~> uname -a
SunOS flag 5.10 Generic_118855-15 i86pc i386 i86pc
johnh at flag:~> python -V
Python 2.4.2
johnh at flag:~> cat tmp.py
import datetime
import numpy as np

print np.__version__

def myfunc(N):
    newrec = np.empty(N, dtype=[('date', '|O4'), ('age', int)])
    newrec['date'] = datetime.date(2002,1,1)
    newrec['age'] = 22

    newrec['date'][1:12] = datetime.date(2003,1,1)
    return newrec.view(np.recarray)

if __name__=='__main__':
    newrec = myfunc(29)
    print newrec['date']

johnh at flag:~> python ~/tmp.py
1.0.5.dev4812
Segmentation Fault (core dumped)

From charlesr.harris at gmail.com  Tue Feb 19 19:36:52 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 19 Feb 2008 17:36:52 -0700
Subject: [Numpy-discussion] partial_sum/adj_difference?
In-Reply-To: <20080219212018.GM29099@mentat.za.net>
References: <20080219194142.GK29099@mentat.za.net>
	<20080219212018.GM29099@mentat.za.net>
Message-ID: 

On Feb 19, 2008 2:20 PM, Stefan van der Walt wrote:

> On Tue, Feb 19, 2008 at 01:50:04PM -0700, Charles R Harris wrote:
> > And here I thought you were going to fix that. Deleting the "blahs"
> > isn't a fix, it's a coverup. Now there is no extended documentation
> > at all.
>
> I wouldn't call "Blah, blah" extended documentation -- in fact, I
> would've been rather embarrassed if that showed up on my screen during
> a workshop.
>
> "Blah, blah" also doesn't strike me as the ideal TODO marker.  We can
> maybe use some more sensible text, or write a decorator to mark these
> functions.

It has certainly been effective in practice.

Chuck

From david at ar.media.kyoto-u.ac.jp  Wed Feb 20 01:51:43 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 20 Feb 2008 15:51:43 +0900
Subject: [Numpy-discussion] CFLAGS, LDFLAGS, appending and overwriting: what
	should be the behavior of numscons ?
Message-ID: <47BBCDFF.9010109@ar.media.kyoto-u.ac.jp>

Hi,

I would like to know what the UI should be for numscons wrt compilation /
link flags. This is an issue which has confused many people with distutils,
and something we can fix with numscons. Several approaches are possible for
numscons, but I was wondering about the preferred behavior. Right now, by
default:
- using CFLAGS will only change debug/warning/optimization flags (that is,
you can't remove -fPIC with CFLAGS on platforms which need it). I understand
that totally overwriting the flags should also be possible (e.g. the current
*distutils* behavior). Is handling new compilers the only reason to want
that? If so, a better solution is available for numscons through scons
tools.
- using LDFLAGS: I do not use it right now. Is anything needed on this
front ?

More generally, since numscons is easier to use now thanks to the merge into
the numpy trunk, and can work on all major platforms since the 0.4.1
release, I am interested in getting feedback from numpy developers as well.
Internally, I tried to make as few choices as possible, so it should be
quite flexible wrt things difficult to handle with distutils.
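To make the two behaviors concrete, here is the kind of session I have in
mind (a sketch only; the flags are arbitrary examples, and I am assuming the
setupscons.py entry point from the trunk merge for the numscons side):

# current numscons default: CFLAGS replaces only the optimization/
# warning/debug flags, mandatory flags like -fPIC are kept
CFLAGS="-O3 -funroll-loops" python setupscons.py build

# distutils-style full overwrite: the compile line contains exactly
# what you typed, so flags like -fPIC are silently lost
CFLAGS="-O3 -funroll-loops" python setup.py build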
thanks,

David

From devnew at gmail.com  Wed Feb 20 02:08:17 2008
From: devnew at gmail.com (devnew at gmail.com)
Date: Tue, 19 Feb 2008 23:08:17 -0800 (PST)
Subject: [Numpy-discussion] finding eigenvectors etc
Message-ID: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com>

hi
i was calculating eigenvalues and eigenvectors for a covariance matrix
using numpy:

adjfaces=matrix(adjarr)
faces_trans=adjfaces.transpose()
covarmat=adjfaces*faces_trans
evalues,evect=eigh(covarmat)

for a sample covarmat like

[[  1.69365981e+13, -5.44960784e+12, -9.00346400e+12, -2.48352625e+12],
 [ -5.44960784e+12,  5.08860660e+12, -8.67539205e+11,  1.22854045e+12],
 [ -9.00346400e+12, -8.67539205e+11,  1.78184943e+13, -7.94749110e+12],
 [ -2.48352625e+12,  1.22854045e+12, -7.94749110e+12,  9.20247690e+12]]

i get these evalues

[ 3.84433376e-03,  4.17099934e+12,  1.71771364e+13,  2.76980401e+13]

evect

[[ 0.5        -0.04330262  0.60041892 -0.62259297]
 [ 0.5        -0.78034307 -0.35933516  0.10928372]
 [ 0.5         0.25371931  0.3700265   0.74074753]
 [ 0.5         0.56992638 -0.61111026 -0.22743827]]

what bothers me is that for the same covarmat i get a different set of
eigenvectors and eigenvalues when i use the java library Jama's methods:

Matrix faceM = new Matrix(faces, nrfaces, length);
Matrix faceM_transpose = faceM.transpose();
Matrix covarM = faceM.times(faceM_transpose);
EigenvalueDecomposition E = covarM.eig();
double[] eigValue = diag(E.getD().getArray());
double[][] eigVector = E.getV().getArray();

here the eigValue=

[-6.835301757686207E-4, 4.170999335736721E12, 1.7177136443134865E13,
 2.7698040117669414E13]

and eigVector

[[0.5, -0.04330262221379265, 0.6004189175979487, 0.6225929700052174],
 [0.5, -0.7803430730840767, -0.3593351608695496, -0.10928371540423852],
 [0.49999999999999994, 0.2537193127299541, 0.370026504572483, -0.7407475253159538],
 [0.49999999999999994, 0.5699263825679145, -0.6111102613008821, 0.22743827071497524]]

I am quite confused by this difference in results: the first element in
eigValue is different, and the signs in the last column of the eigVectors
also differ. Can someone tell me why this happens?

thanks
dn

From matthieu.brucher at gmail.com  Wed Feb 20 02:13:57 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Wed, 20 Feb 2008 08:13:57 +0100
Subject: [Numpy-discussion] finding eigenvectors etc
In-Reply-To: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com>
References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com>
Message-ID: 

Hi,

The results are OK, they are very close. Your matrix is almost singular,
i.e. badly conditioned. But the results are very close if you check them in
a relative way.
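For instance, a quick relative check with the two sets of eigenvalues from
your post (just a sketch):

import numpy
ev_np = numpy.array([3.84433376e-03, 4.17099934e+12,
                     1.71771364e+13, 2.76980401e+13])
ev_jama = numpy.array([-6.835301757686207e-04, 4.170999335736721e+12,
                       1.7177136443134865e+13, 2.7698040117669414e+13])
print abs(ev_np - ev_jama) / abs(ev_np).max()
# every entry comes out around 1e-9 or below, i.e. the two results
# agree to the number of digits that were actually printed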
3.84433376e-03 or -6.835301757686207E-4 is the same compared to 2.76980401e+13 Matthieu 2008/2/20, devnew at gmail.com : > > hi > i was calculating eigenvalues and eigenvectors for a covariancematrix > using numpy > > adjfaces=matrix(adjarr) > faces_trans=adjfaces.transpose() > covarmat=adjfaces*faces_trans > evalues,evect=eigh(covarmat) > > for a sample covarmat like > [[ 1.69365981e+13 , -5.44960784e+12, -9.00346400e+12 , -2.48352625e > +12] > [ -5.44960784e+12, 5.08860660e+12, -8.67539205e+11 , 1.22854045e > +12] > [ -9.00346400e+12, -8.67539205e+11, 1.78184943e+13 ,-7.94749110e > +12] > [ -2.48352625e+12 , 1.22854045e+12, -7.94749110e+12 , 9.20247690e > +12]] > > i get these > evalues > [ 3.84433376e-03, 4.17099934e+12 , 1.71771364e+13 , 2.76980401e+13] > > evect > [[ 0.5 -0.04330262 0.60041892 -0.62259297] > [ 0.5 -0.78034307 -0.35933516 0.10928372] > [ 0.5 0.25371931 0.3700265 0.74074753] > [ 0.5 0.56992638 -0.61111026 -0.22743827]] > > what bothers me is that for the same covarmat i get a different set of > eigenvectors and eigenvalues when i use java library Jama's methods > Matrix faceM = new Matrix(faces, nrfaces,length); > Matrix faceM_transpose = faceM.transpose(); > Matrix covarM = faceM.times(faceM_transpose); > EigenvalueDecomposition E = covarM.eig(); > double[] eigValue = diag(E.getD().getArray()); > double[][] eigVector = E.getV().getArray(); > > here the eigValue= > [-6.835301757686207E-4, 4.170999335736721E12, 1.7177136443134865E13, > 2.7698040117669414E13] > > and eigVector > [ > [0.5, -0.04330262221379265, 0.6004189175979487, 0.6225929700052174], > [0.5, -0.7803430730840767, -0.3593351608695496, -0.10928371540423852], > [0.49999999999999994, 0.2537193127299541, 0.370026504572483, > -0.7407475253159538], > [0.49999999999999994, 0.5699263825679145, -0.6111102613008821, > 0.22743827071497524] > ] > > I am quite confused bythis difference in results ..the first element > in eigValue is different and also the signs in last column of > eigVectors are diff..can someone tell me why this happens? > thanks > dn > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Wed Feb 20 02:22:53 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 20 Feb 2008 08:22:53 +0100 Subject: [Numpy-discussion] numpy record array segfault In-Reply-To: <8fb8cc060802191510h29c2503x56ac582f95203501@mail.gmail.com> References: <8fb8cc060802191510h29c2503x56ac582f95203501@mail.gmail.com> Message-ID: On Tue, 19 Feb 2008 17:10:10 -0600 "Sameer DCosta" wrote: > Hi, > > I'm getting a segfault when using python objects with >record arrays. > The code (below) basically assigns a single datetime >object to a slice > of a column in the record array and then python >segfaults as soon as I > try to access those array values. I'm using the latest >svn version of > numpy compiled with gcc 3.4.1 on Solaris (intel). Any >ideas why this > is happening? A reference counting problem maybe? Thanks >for taking a > look. > > Sameer > > johnh at flag:~> gcc --version > gcc (GCC) 3.4.1 > Copyright (C) 2004 Free Software Foundation, Inc. 
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> johnh at flag:~> uname -a
> SunOS flag 5.10 Generic_118855-15 i86pc i386 i86pc
> johnh at flag:~> python -V
> Python 2.4.2
> johnh at flag:~> cat tmp.py
> import datetime
> import numpy as np
>
> print np.__version__
>
> def myfunc(N):
>     newrec = np.empty(N, dtype=[('date', '|O4'), ('age', int)])
>     newrec['date'] = datetime.date(2002,1,1)
>     newrec['age'] = 22
>
>     newrec['date'][1:12] = datetime.date(2003,1,1)
>     return newrec.view(np.recarray)
>
>
> if __name__=='__main__':
>     newrec = myfunc(29)
>     print newrec['date']
>
> johnh at flag:~> python ~/tmp.py
> 1.0.5.dev4812
> Segmentation Fault (core dumped)
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

No problem here

python tmp.py
1.0.5.dev4811
[2002-01-01 2003-01-01 2003-01-01 2003-01-01 2003-01-01 2003-01-01
 2003-01-01 2003-01-01 2003-01-01 2003-01-01 2003-01-01 2003-01-01
 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01
 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01
 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01]

Can you send a backtrace (gdb) ?

Nils

From focke at slac.stanford.edu Wed Feb 20 02:24:01 2008
From: focke at slac.stanford.edu (Warren Focke)
Date: Tue, 19 Feb 2008 23:24:01 -0800 (PST)
Subject: [Numpy-discussion] finding eigenvectors etc
In-Reply-To:
References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com>
Message-ID:

Yes.  Your first eigenvalue is effectively 0, the values you see are just
noise.  Different implementations produce different noise.  As for the
signs of the eigenvector components, which direction is + or - X is
arbitrary.  Different implementations follow different conventions as to
which is which.  Sometimes it's just an accident.

Nothing-to-see-here-move-along-ly,
w

On Wed, 20 Feb 2008, Matthieu Brucher wrote:

> Hi,
>
> The results are OK, they are very close. Your matrix is almost singular, is
> badly conditioned, ... But the results are very close if you check them in
> a relative way.
3.84433376e-03 or -6.835301757686207E-4 is the same compared > to 2.76980401e+13 > > Matthieu > > 2008/2/20, devnew at gmail.com : >> >> hi >> i was calculating eigenvalues and eigenvectors for a covariancematrix >> using numpy >> >> adjfaces=matrix(adjarr) >> faces_trans=adjfaces.transpose() >> covarmat=adjfaces*faces_trans >> evalues,evect=eigh(covarmat) >> >> for a sample covarmat like >> [[ 1.69365981e+13 , -5.44960784e+12, -9.00346400e+12 , -2.48352625e >> +12] >> [ -5.44960784e+12, 5.08860660e+12, -8.67539205e+11 , 1.22854045e >> +12] >> [ -9.00346400e+12, -8.67539205e+11, 1.78184943e+13 ,-7.94749110e >> +12] >> [ -2.48352625e+12 , 1.22854045e+12, -7.94749110e+12 , 9.20247690e >> +12]] >> >> i get these >> evalues >> [ 3.84433376e-03, 4.17099934e+12 , 1.71771364e+13 , 2.76980401e+13] >> >> evect >> [[ 0.5 -0.04330262 0.60041892 -0.62259297] >> [ 0.5 -0.78034307 -0.35933516 0.10928372] >> [ 0.5 0.25371931 0.3700265 0.74074753] >> [ 0.5 0.56992638 -0.61111026 -0.22743827]] >> >> what bothers me is that for the same covarmat i get a different set of >> eigenvectors and eigenvalues when i use java library Jama's methods >> Matrix faceM = new Matrix(faces, nrfaces,length); >> Matrix faceM_transpose = faceM.transpose(); >> Matrix covarM = faceM.times(faceM_transpose); >> EigenvalueDecomposition E = covarM.eig(); >> double[] eigValue = diag(E.getD().getArray()); >> double[][] eigVector = E.getV().getArray(); >> >> here the eigValue= >> [-6.835301757686207E-4, 4.170999335736721E12, 1.7177136443134865E13, >> 2.7698040117669414E13] >> >> and eigVector >> [ >> [0.5, -0.04330262221379265, 0.6004189175979487, 0.6225929700052174], >> [0.5, -0.7803430730840767, -0.3593351608695496, -0.10928371540423852], >> [0.49999999999999994, 0.2537193127299541, 0.370026504572483, >> -0.7407475253159538], >> [0.49999999999999994, 0.5699263825679145, -0.6111102613008821, >> 0.22743827071497524] >> ] >> >> I am quite confused bythis difference in results ..the first element >> in eigValue is different and also the signs in last column of >> eigVectors are diff..can someone tell me why this happens? >> thanks >> dn >> >> >> >> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > French PhD student > Website : http://matthieu-brucher.developpez.com/ > Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn : http://www.linkedin.com/in/matthieubrucher > From devnew at gmail.com Wed Feb 20 03:00:41 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Wed, 20 Feb 2008 00:00:41 -0800 (PST) Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> Message-ID: <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> > Different implementations follow different conventions as to which > is which. thank you for the replies ..the reason why i asked was that the most significant eigenvectors ( sorted according to eigenvalues) are later used in calculations and then the results obtained differ in java and python..so i was worried as to which one to use >Your matrix is almost singular, is badly conditionned, Mathew, can you explain that..i didn't quite get it.. 
dn From matthieu.brucher at gmail.com Wed Feb 20 03:19:20 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 20 Feb 2008 09:19:20 +0100 Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> Message-ID: > > >Your matrix is almost singular, is badly conditionned, > > Mathew, can you explain that..i didn't quite get it.. > dn > The condition number is the ratio between the biggest eigenvalue and the lowest one. In your case, it is 10E-16, so the precision of the double numbers. That means that some computations that depend on this ratio (like inversions) can lead to numerical errors. In your case, it is OK, but you should keep in mind this kind of trouble (read what you can on numerical computations ;)) Matthieu -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From focke at slac.stanford.edu Wed Feb 20 03:27:00 2008 From: focke at slac.stanford.edu (Warren Focke) Date: Wed, 20 Feb 2008 00:27:00 -0800 (PST) Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> Message-ID: The vectors that you used to build your covariance matrix all lay in or close to a 3-dimensional subspace of the 4-dimensional space in which they were represented. So one of the eigenvalues of the covariance matrix is 0, or close to it; the matrix is singular. Condition is the ratio of the largest eigenvalue to the smallest, large values can be troublesome. Here it is ~1e17, which is the dynamic range of doubles. Which means that the value you observe for the smallest eigenvaulue is just the result of rounding errors. w On Wed, 20 Feb 2008, devnew at gmail.com wrote: >> Different implementations follow different conventions as to which >> is which. > > thank you for the replies ..the reason why i asked was that the most > significant eigenvectors ( sorted according to eigenvalues) are later > used in calculations and then the results obtained differ in java and > python..so i was worried as to which one to use > >> Your matrix is almost singular, is badly conditionned, > > Mathew, can you explain that..i didn't quite get it.. > dn > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Wed Feb 20 03:52:08 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Feb 2008 01:52:08 -0700 Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> Message-ID: On Feb 20, 2008 1:00 AM, devnew at gmail.com wrote: > > Different implementations follow different conventions as to which > > is which. 
> > thank you for the replies ..the reason why i asked was that the most > significant eigenvectors ( sorted according to eigenvalues) are later > used in calculations and then the results obtained differ in java and > python..so i was worried as to which one to use > How are you using the values? How significant are the differences? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Feb 20 04:11:05 2008 From: stefan at sun.ac.za (Stefan van der Walt) Date: Wed, 20 Feb 2008 11:11:05 +0200 Subject: [Numpy-discussion] partial_sum/adj_difference? In-Reply-To: References: <20080219194142.GK29099@mentat.za.net> <20080219212018.GM29099@mentat.za.net> Message-ID: <20080220091105.GO29099@mentat.za.net> On Tue, Feb 19, 2008 at 05:36:52PM -0700, Charles R Harris wrote: > On Feb 19, 2008 2:20 PM, Stefan van der Walt wrote: > > On Tue, Feb 19, 2008 at 01:50:04PM -0700, Charles R Harris wrote: > > And here I thought you were going to fix that. Deleting the > > "blahs" isn't a > fix, it's a coverup. Now there is no extended > > documentation at all. > > I wouldn't call "Blah, blah" extended documentation -- in fact, I > would've been rather embarrassed if that showed up on my screen during > a workshop. > > "Blah, blah" also doesn't strike me as the ideal TODO marker. We can > maybe use some more sensible text, or write a decorator to mark these > functions. > > It has certainly been effective in practice. Yes, ironically :) http://projects.scipy.org/scipy/numpy/changeset/4813 http://projects.scipy.org/scipy/numpy/changeset/4814 The first courtesy of Matthew Brett. Regards St?fan From ndbecker2 at gmail.com Wed Feb 20 07:16:44 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 20 Feb 2008 07:16:44 -0500 Subject: [Numpy-discussion] mixed mode arithmetic Message-ID: I've been browsing the numpy source. I'm wondering about mixed-mode arithmetic on arrays. I believe the way numpy handles this is that it never does mixed arithmetic, but instead converts arrays to a common type. Arguably, that might be efficient for a mix of say, double and float. Maybe not. But for a mix of complex and a scalar type (say, CDouble * Double), it's clearly suboptimal in efficiency. So, do I understand this correctly? If so, is that something we should improve? From devnew at gmail.com Wed Feb 20 08:45:04 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Wed, 20 Feb 2008 05:45:04 -0800 (PST) Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> Message-ID: > How are you using the values? How significant are the differences? > i am using these eigenvectors to do PCA on a set of images(of faces).I sort the eigenvectors in descending order of their eigenvalues and this is multiplied with the (orig data of some images viz a matrix)to obtain a facespace. like #pseudocode... 
sortedeigenvectors=mysort(eigenvectors) facespace=sortedeigenvectors*adjfaces /* adjfaces is a matrix */ if i do this in python i get a facespace [[-1028755.44341439, 1480864.32750018, 1917712.0162213, -983526.60328021, -1662357.13091294, -499792.41540038, 208696.97376238, -916628.92613255, -1454071.95225114, -1563209.39113008, -231969.96968212 , -768417.98606125] [ -866174.88336972, 1212934.33524067, 543013.86361006, -1352625.86282073, -309872.30710619 , 466301.12884198, 216088.93319292 ,-1512378.8688779, 2581349.03171275, 1797812.01270033, 1876754.7339826 , 751781.8166291 ] [ -57026.32567001 , -69918.94570563, -399715.51441018, -233720.8360051, 188227.41229887, 177341.47889165 , -65241.23138424 , -311917.28253664, 1133399.70627111, 1089028.99019462, 684854.41725944 , 413465.86494352] [ 405955.15245412, 562832.78296479 , 864334.63457882 , 629752.80210603, 894768.52572026, 578460.80766675 , 629146.32442893 , 768037.57754708, -485157.28573271, -1718776.11176486 , -780929.18155991 , -165391.19551137]] whereas the same steps in java [ [-516653.73649950844, 274000.54127598763, -108557.2732037272, -799041.4108906921, -495577.7478765989, -49125.38109725664, -162041.57505147497, -917033.3002665655, 1207264.8912226136, 1384551.3481945703, 1056098.9289163304, 357801.9553511339], [-956064.0724430305, 1424775.0801567277, 898684.8188346579, -1385008.5401600213, -514677.038573372, 387195.56502804917, 281164.65362325957, -1512307.8891047493, 2114204.697920214, 1280391.7056360755, 1650660.0594245053, 554096.482085637], [-666313.7580419029, 1365981.2428742633, 2011095.455319733, -453217.29083790665, -1199981.2283586136, -358852.32104592584, 375855.4012532809, -311436.16701894277, -2033000.776565753, -2418152.391663846, -847661.841421182, -926486.0374297247], [593030.0669844414, 121955.63569302124, 124121.99904933537, 697146.7418886195, 1321002.514808584, 743093.1371151333, 493712.52017493406, 767889.8563902564, 487050.6874229272, -641935.1621667973, -310387.14691965195, 246026.0999929544] ] such difference causes diff results in all calculations involving the facespace dn From matthieu.brucher at gmail.com Wed Feb 20 08:53:03 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 20 Feb 2008 14:53:03 +0100 Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> Message-ID: You should have such differences, that's strange. Are you sure you're using the correct eigenvectors ? Matthieu 2008/2/20, devnew at gmail.com : > > > > How are you using the values? How significant are the differences? > > > > > i am using these eigenvectors to do PCA on a set of images(of faces).I > sort the eigenvectors in descending order of their eigenvalues and > this is multiplied with the (orig data of some images viz a matrix)to > obtain a facespace. > like > #pseudocode... 
> sortedeigenvectors=mysort(eigenvectors) > facespace=sortedeigenvectors*adjfaces /* adjfaces is a matrix */ > > if i do this in python i get a facespace > [[-1028755.44341439, 1480864.32750018, 1917712.0162213, > -983526.60328021, > -1662357.13091294, -499792.41540038, 208696.97376238, > -916628.92613255, > -1454071.95225114, -1563209.39113008, -231969.96968212 , > -768417.98606125] > [ -866174.88336972, 1212934.33524067, 543013.86361006, > -1352625.86282073, > -309872.30710619 , 466301.12884198, > 216088.93319292 ,-1512378.8688779, > 2581349.03171275, 1797812.01270033, 1876754.7339826 , > 751781.8166291 ] > [ -57026.32567001 , -69918.94570563, -399715.51441018, > -233720.8360051, > 188227.41229887, 177341.47889165 , -65241.23138424 , > -311917.28253664, > 1133399.70627111, 1089028.99019462, 684854.41725944 , > 413465.86494352] > [ 405955.15245412, 562832.78296479 , 864334.63457882 , > 629752.80210603, > 894768.52572026, 578460.80766675 , 629146.32442893 , > 768037.57754708, > -485157.28573271, -1718776.11176486 , -780929.18155991 , > -165391.19551137]] > > whereas the same steps > in java > [ > [-516653.73649950844, 274000.54127598763, -108557.2732037272, > -799041.4108906921, -495577.7478765989, -49125.38109725664, > -162041.57505147497, -917033.3002665655, 1207264.8912226136, > 1384551.3481945703, 1056098.9289163304, 357801.9553511339], > [-956064.0724430305, 1424775.0801567277, 898684.8188346579, > -1385008.5401600213, -514677.038573372, 387195.56502804917, > 281164.65362325957, -1512307.8891047493, 2114204.697920214, > 1280391.7056360755, 1650660.0594245053, 554096.482085637], > [-666313.7580419029, 1365981.2428742633, 2011095.455319733, > -453217.29083790665, -1199981.2283586136, -358852.32104592584, > 375855.4012532809, -311436.16701894277, -2033000.776565753, > -2418152.391663846, -847661.841421182, -926486.0374297247], > [593030.0669844414, 121955.63569302124, 124121.99904933537, > 697146.7418886195, 1321002.514808584, 743093.1371151333, > 493712.52017493406, 767889.8563902564, 487050.6874229272, > -641935.1621667973, -310387.14691965195, 246026.0999929544] > ] > > such difference causes diff results in all calculations involving the > facespace > > > dn > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From javier.maria.torres at ericsson.com Wed Feb 20 09:56:20 2008 From: javier.maria.torres at ericsson.com (Javier Maria Torres) Date: Wed, 20 Feb 2008 15:56:20 +0100 Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> Message-ID: <7ADF8B83DD0ED941BFA36F9AA2C140866C16D4@eesmdmw020.eemea.ericsson.se> Hi, I would also like to know what Java package you're using. I find Weka PCA differs from Matlab (whereas previous experiments with Scipy PCA didn't show significant differences from Matlab), but I'm still looking into the cause. 
Thanks, and greetings, Javier Torres -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of devnew at gmail.com Sent: mi?rcoles, 20 de febrero de 2008 14:45 To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] finding eigenvectors etc > How are you using the values? How significant are the differences? > i am using these eigenvectors to do PCA on a set of images(of faces).I sort the eigenvectors in descending order of their eigenvalues and this is multiplied with the (orig data of some images viz a matrix)to obtain a facespace. like #pseudocode... sortedeigenvectors=mysort(eigenvectors) facespace=sortedeigenvectors*adjfaces /* adjfaces is a matrix */ if i do this in python i get a facespace [[-1028755.44341439, 1480864.32750018, 1917712.0162213, -983526.60328021, -1662357.13091294, -499792.41540038, 208696.97376238, -916628.92613255, -1454071.95225114, -1563209.39113008, -231969.96968212 , -768417.98606125] [ -866174.88336972, 1212934.33524067, 543013.86361006, -1352625.86282073, -309872.30710619 , 466301.12884198, 216088.93319292 ,-1512378.8688779, 2581349.03171275, 1797812.01270033, 1876754.7339826 , 751781.8166291 ] [ -57026.32567001 , -69918.94570563, -399715.51441018, -233720.8360051, 188227.41229887, 177341.47889165 , -65241.23138424 , -311917.28253664, 1133399.70627111, 1089028.99019462, 684854.41725944 , 413465.86494352] [ 405955.15245412, 562832.78296479 , 864334.63457882 , 629752.80210603, 894768.52572026, 578460.80766675 , 629146.32442893 , 768037.57754708, -485157.28573271, -1718776.11176486 , -780929.18155991 , -165391.19551137]] whereas the same steps in java [ [-516653.73649950844, 274000.54127598763, -108557.2732037272, -799041.4108906921, -495577.7478765989, -49125.38109725664, -162041.57505147497, -917033.3002665655, 1207264.8912226136, 1384551.3481945703, 1056098.9289163304, 357801.9553511339], [-956064.0724430305, 1424775.0801567277, 898684.8188346579, -1385008.5401600213, -514677.038573372, 387195.56502804917, 281164.65362325957, -1512307.8891047493, 2114204.697920214, 1280391.7056360755, 1650660.0594245053, 554096.482085637], [-666313.7580419029, 1365981.2428742633, 2011095.455319733, -453217.29083790665, -1199981.2283586136, -358852.32104592584, 375855.4012532809, -311436.16701894277, -2033000.776565753, -2418152.391663846, -847661.841421182, -926486.0374297247], [593030.0669844414, 121955.63569302124, 124121.99904933537, 697146.7418886195, 1321002.514808584, 743093.1371151333, 493712.52017493406, 767889.8563902564, 487050.6874229272, -641935.1621667973, -310387.14691965195, 246026.0999929544] ] such difference causes diff results in all calculations involving the facespace dn _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From sameerslists at gmail.com Wed Feb 20 10:47:56 2008 From: sameerslists at gmail.com (Sameer DCosta) Date: Wed, 20 Feb 2008 09:47:56 -0600 Subject: [Numpy-discussion] numpy record array segfault In-Reply-To: References: <8fb8cc060802191510h29c2503x56ac582f95203501@mail.gmail.com> Message-ID: <8fb8cc060802200747r2f549570pacbf0b6bc71567be@mail.gmail.com> On Wed, Feb 20, 2008 at 1:22 AM, Nils Wagner wrote: > > On Tue, 19 Feb 2008 17:10:10 -0600 > "Sameer DCosta" wrote: > > Hi, > > > > I'm getting a segfault when using python objects with > >record arrays. 
> > The code (below) basically assigns a single datetime > >object to a slice > > of a column in the record array and then python > >segfaults as soon as I > > try to access those array values. I'm using the latest > >svn version of > > numpy compiled with gcc 3.4.1 on Solaris (intel). Any > >ideas why this > > is happening? A reference counting problem maybe? Thanks > >for taking a > > look. > > > > Sameer > > > > johnh at flag:~> gcc --version > > gcc (GCC) 3.4.1 > > Copyright (C) 2004 Free Software Foundation, Inc. > > This is free software; see the source for copying > >conditions. There is NO > > warranty; not even for MERCHANTABILITY or FITNESS FOR A > >PARTICULAR PURPOSE. > > > > johnh at flag:~> uname -a > > SunOS flag 5.10 Generic_118855-15 i86pc i386 i86pc > > johnh at flag:~> python -V > > Python 2.4.2 > > johnh at flag:~> cat tmp.py > > import datetime > > import numpy as np > > > > print np.__version__ > > > > def myfunc(N): > > newrec = np.empty(N, dtype=[('date', '|O4'), ('age', > >int)]) > > newrec['date'] = datetime.date(2002,1,1) > > newrec['age'] = 22 > > > > > > newrec['date'][1:12] = datetime.date(2003,1,1) > > return newrec.view(np.recarray) > > > > > > if __name__=='__main__': > > newrec = myfunc(29) > > print newrec['date'] > > > > johnh at flag:~> python ~/tmp.py > > 1.0.5.dev4812 > > Segmentation Fault (core dumped) > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > No problem here > > python tmp.py > 1.0.5.dev4811 > [2002-01-01 2003-01-01 2003-01-01 2003-01-01 2003-01-01 > 2003-01-01 > 2003-01-01 2003-01-01 2003-01-01 2003-01-01 2003-01-01 > 2003-01-01 > 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01 > 2002-01-01 > 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01 > 2002-01-01 > 2002-01-01 2002-01-01 2002-01-01 2002-01-01 2002-01-01] > > Can you send a backtrace (gdb) ? > > Nils > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Thanks for taking a look Nils. Can you try with a larger array (change the value of N from 29 to 2900)? I have attached the backtrace below. Program terminated with signal 11, Segmentation fault. #0 PyObject_Malloc (nbytes=24) at ../Objects/obmalloc.c:605 warning: Source file is more recent than executable. 
605 if ((pool->freeblock = *(block **)bp) != NULL) { (gdb) backtrace #0 PyObject_Malloc (nbytes=24) at ../Objects/obmalloc.c:605 #1 0x08084eee in _PyObject_New (tp=0x8154980) at ../Objects/object.c:188 #2 0x0810a3f2 in range_iter (seq=0x841b676) at ../Objects/rangeobject.c:266 #3 0x08064d32 in PyObject_GetIter (o=0x841b676) at ../Objects/abstract.c:2228 #4 0x080bc202 in PyEval_EvalFrame (f=0x844ac1c) at ../Python/ceval.c:2107 #5 0x080bf87d in PyEval_EvalFrame (f=0x822e7dc) at ../Python/ceval.c:3640 #6 0x080c0272 in PyEval_EvalCodeEx (co=0x83ab4e0, globals=0x2, locals=0x81373e8, args=0x822e928, argcount=6, kws=0x81be290, kwcount=0, defs=0x82cb9d8, defcount=2, closure=0x0) at ../Python/ceval.c:2736 #7 0x080bdaf8 in PyEval_EvalFrame (f=0x81be104) at ../Python/ceval.c:3650 #8 0x080c0272 in PyEval_EvalCodeEx (co=0x83ab5e0, globals=0x2, locals=0x81373e8, args=0x0, argcount=7, kws=0x81a2ef0, kwcount=0, defs=0x83ae878, defcount=6, closure=0x0) at ../Python/ceval.c:2736 #9 0x080bdaf8 in PyEval_EvalFrame (f=0x81a2d74) at ../Python/ceval.c:3650 #10 0x080c0272 in PyEval_EvalCodeEx (co=0x83a87a0, globals=0x2, locals=0x81373e8, args=0x1, argcount=1, kws=0x0, kwcount=0, defs=0x83ac628, defcount=3, closure=0x0) at ../Python/ceval.c:2736 #11 0x08108e8e in function_call (func=0x83b3a04, arg=0x81f50ac, kw=0x0) at ../Objects/funcobject.c:548 #12 0x0806412c in PyObject_Call (func=0x841b000, arg=0x81f50ac, kw=0x0) at ../Objects/abstract.c:1795 #13 0x080b8427 in PyEval_CallObjectWithKeywords (func=0x83b3a04, arg=0x81f50ac, kw=0x0) at ../Python/ceval.c:3425 #14 0xd009f393 in array_str (self=0x2) at numpy/core/src/arrayobject.c:4259 #15 0x080850d2 in PyObject_Str (v=0x2) at ../Objects/object.c:347 #16 0x08085212 in internal_print (op=0x82a6708, fp=0x815bf90, flags=1, nesting=0) at ../Objects/object.c:241 #17 0x0806eafb in PyFile_WriteObject (v=0x82a6708, f=0x82a6708, flags=1) at ../Objects/fileobject.c:2036 #18 0x080bcc84 in PyEval_EvalFrame (f=0x819ccbc) at ../Python/ceval.c:1538 #19 0x080c0272 in PyEval_EvalCodeEx (co=0x81e70a0, globals=0x2, locals=0x81373e8, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:2736 #20 0x080c03e6 in PyEval_EvalCode (co=0x81e70a0, globals=0x8175824, locals=0x8175824) at ../Python/ceval.c:484 #21 0x080e5450 in PyRun_FileExFlags (fp=0x815bfb0, filename=0x8046927 "/home/titan/johnh/tmp.py", start=257, globals=0x8175824, locals=0x8175824, closeit=1, flags=0x81e70a0) at ../Python/pythonrun.c:1265 #22 0x080e5d36 in PyRun_SimpleFileExFlags (fp=0x815bfb0, filename=0x8046927 "/home/titan/johnh/tmp.py", closeit=1, flags=0x8046628) at ../Python/pythonrun.c:860 #23 0x0805d41b in Py_Main (argc=1, argv=0x8046740) at ../Modules/main.c:492 #24 0x0805cb0b in main (argc=2, argv=0x8046740) at ../Modules/python.c:23 Sameer From sameerslists at gmail.com Wed Feb 20 16:10:16 2008 From: sameerslists at gmail.com (Sameer DCosta) Date: Wed, 20 Feb 2008 15:10:16 -0600 Subject: [Numpy-discussion] Rename record array fields Message-ID: <8fb8cc060802201310j3a3b8518naab15cc764c37a6b@mail.gmail.com> Is there a way to rename record array fields without making a copy of the whole record array? Thanks in advance for your replies. Sameer From jdh2358 at gmail.com Wed Feb 20 16:49:37 2008 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 20 Feb 2008 15:49:37 -0600 Subject: [Numpy-discussion] bug in numpy.histogram? 
Message-ID: <88e473830802201349q62aeb56bq132476d2c29f136d@mail.gmail.com>

We recently deprecated matplotlib.mlab.hist, and I am now hitting a
bug in numpy's histogram, which appears to be caused by the use of
"any" that does not exist in the namespace.  Small patch attached.

The example below exposes the bug:

Python 2.4.2 (#1, Feb 23 2006, 12:48:31)
Type "copyright", "credits" or "license" for more information.

IPython 0.8.3.svn.r2876 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: import numpy as np

In [2]: np.__file__
Out[2]: '/home/titan/johnh/dev/lib/python2.4/site-packages/numpy/__init__.pyc'

In [3]: np.__version__
Out[3]: '1.0.5.dev4814'

In [4]: x = np.random.randn(100)

In [5]: bins = np.linspace(x.min(), x.max(), 40)

In [6]: y = np.histogram(x, bins=bins)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in ?
  File "/home/titan/johnh/dev/lib/python2.4/site-packages/numpy/lib/function_base.py",
line 155, in histogram
    if(any(bins[1:]-bins[:-1] < 0)):
NameError: global name 'any' is not defined

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hist_any.diff
Type: text/x-patch
Size: 533 bytes
Desc: not available
URL:

From robert.kern at gmail.com Wed Feb 20 16:59:13 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 20 Feb 2008 15:59:13 -0600
Subject: [Numpy-discussion] Rename record array fields
In-Reply-To: <8fb8cc060802201310j3a3b8518naab15cc764c37a6b@mail.gmail.com>
References: <8fb8cc060802201310j3a3b8518naab15cc764c37a6b@mail.gmail.com>
Message-ID: <3d375d730802201359h7a3327acv52fee7ceaace2c7f@mail.gmail.com>

On Wed, Feb 20, 2008 at 3:10 PM, Sameer DCosta wrote:
> Is there a way to rename record array fields without making a copy of
> the whole record array?

Make a new dtype object with the new names. Use the .view() method on
arrays to get a view of the array with the new dtype.

In [1]: from numpy import *

In [2]: olddt = dtype([('foo', int), ('bar', float)])

In [3]: a = zeros(10, olddt)

In [4]: a
Out[4]:
array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0),
       (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)],
      dtype=[('foo', '<i4'), ('bar', '<f8')])

In [5]: newdt = dtype([('baz', int), ('quux', float)])

In [6]: b = a.view(newdt)

In [7]: b
Out[7]:
array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0),
       (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)],
      dtype=[('baz', '<i4'), ('quux', '<f8')])

From robert.kern at gmail.com Wed Feb 20 17:05 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 20 Feb 2008 16:05 -0600
Subject: [Numpy-discussion] bug in numpy.histogram?
In-Reply-To: <88e473830802201349q62aeb56bq132476d2c29f136d@mail.gmail.com>
References: <88e473830802201349q62aeb56bq132476d2c29f136d@mail.gmail.com>
Message-ID: <3d375d730802201405w5f842588i85e83ee50000bc3d@mail.gmail.com>

On Wed, Feb 20, 2008 at 3:49 PM, John Hunter wrote:
> We recently deprecated matplotlib.mlab.hist, and I am now hitting a
> bug in numpy's histogram, which appears to be caused by the use of
> "any" that does not exist in the namespace.  Small patch attached.

Fixed in SVN. Thank you.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From aisaac at american.edu Wed Feb 20 17:09:47 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Wed, 20 Feb 2008 17:09:47 -0500
Subject: [Numpy-discussion] bug in numpy.histogram?
In-Reply-To: <88e473830802201349q62aeb56bq132476d2c29f136d@mail.gmail.com>
References: <88e473830802201349q62aeb56bq132476d2c29f136d@mail.gmail.com>
Message-ID:

On Wed, 20 Feb 2008, John Hunter apparently wrote:
> File
> "/home/titan/johnh/dev/lib/python2.4/site-packages/numpy/lib/function_base.py",
> line 155, in histogram
> if(any(bins[1:]-bins[:-1] < 0)):
> NameError: global name 'any' is not defined

``any`` was introduced in Python 2.5, so you need ``np.any`` here.

Cheers,
Alan Isaac

From aisaac at american.edu Wed Feb 20 17:12:07 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Wed, 20 Feb 2008 17:12:07 -0500
Subject: [Numpy-discussion] bug in numpy.histogram?
In-Reply-To:
References: <88e473830802201349q62aeb56bq132476d2c29f136d@mail.gmail.com>
Message-ID:

On Wed, 20 Feb 2008, Alan G Isaac apparently wrote:
> so you need ``np.any``

Or I could notice you included a patch ... sorry for the noise.

Alan

From stefan at sun.ac.za Wed Feb 20 18:08:25 2008
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Thu, 21 Feb 2008 01:08:25 +0200
Subject: [Numpy-discussion] Rename record array fields
In-Reply-To: <8fb8cc060802201310j3a3b8518naab15cc764c37a6b@mail.gmail.com>
References: <8fb8cc060802201310j3a3b8518naab15cc764c37a6b@mail.gmail.com>
Message-ID: <20080220230825.GC30053@mentat.za.net>

Hi Sameer

On Wed, Feb 20, 2008 at 03:10:16PM -0600, Sameer DCosta wrote:
> Is there a way to rename record array fields without making a copy of
> the whole record array?
>
> Thanks in advance for your replies.

Simply view the array as a new dtype:

In [2]: x
Out[2]:
array([(1, 2), (3, 4)],
      dtype=[('a', '<i4'), ('b', '<i4')])

In [3]: x.view(dtype([('c', int), ('d', int)]))
Out[3]:
array([(1, 2), (3, 4)],
      dtype=[('c', '<i4'), ('d', '<i4')])

Regards
Stéfan

From oliphant at enthought.com Wed Feb 20 23:14:07 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Wed, 20 Feb 2008 22:14:07 -0600
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
Message-ID: <47BCFA8F.2020808@enthought.com>

Hi everybody,

In writing some generic code, I've encountered situations where it would
reduce code complexity to allow NumPy scalars to be "indexed" in the
same number of limited ways, that 0-d arrays support.

For example, 0-d arrays can be indexed with

* Boolean masks
* Ellipses x[...] and x[..., newaxis]
* Empty tuple x[()]

I think that numpy scalars should also be indexable in these particular
cases as well (read-only of course, i.e. no setting of the value would
be possible).

This is an easy change to implement, and I don't think it would cause
any backward compatibility issues.

Any opinions from the list?


Best regards,

-Travis O.

From faltet at carabos.com Thu Feb 21 02:41:50 2008
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 21 Feb 2008 08:41:50 +0100
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <47BCFA8F.2020808@enthought.com>
References: <47BCFA8F.2020808@enthought.com>
Message-ID: <200802210841.50836.faltet@carabos.com>

A Thursday 21 February 2008, Travis E. Oliphant wrote:
> Hi everybody,
>
> In writing some generic code, I've encountered situations where it
> would reduce code complexity to allow NumPy scalars to be "indexed"
> in the same number of limited ways, that 0-d arrays support.
>
> For example, 0-d arrays can be indexed with
>
> * Boolean masks
> * Ellipses x[...] and x[..., newaxis]
> * Empty tuple x[()]
>
> I think that numpy scalars should also be indexable in these
> particular cases as well (read-only of course, i.e. no setting of
> the value would be possible).
>
> This is an easy change to implement, and I don't think it would cause
> any backward compatibility issues.
>
> Any opinions from the list?

Well, it seems like a non-intrusive modification, but I like the scalars
to remain un-indexable, mainly because it would be useful to raise an
error when you are trying to index them.
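A minimal sketch of the split under discussion (NumPy 1.0.x-era
behaviour; exact error messages vary by version):

    import numpy as np

    z = np.array(5.0)     # 0-d array
    z[()]                 # works: returns the scalar 5.0
    z[...]                # works: returns a 0-d array

    s = np.float64(5.0)   # NumPy scalar
    s[()]                 # raises TypeError under the current behaviour;
                          # the proposal would make it return 5.0 instead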
In fact, I thought that when
you want a kind of scalar but indexable, you should use a 0-d array.

So, my vote is -0.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From konrad.hinsen at laposte.net Thu Feb 21 02:47:43 2008
From: konrad.hinsen at laposte.net (Konrad Hinsen)
Date: Thu, 21 Feb 2008 08:47:43 +0100
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <200802210841.50836.faltet@carabos.com>
References: <47BCFA8F.2020808@enthought.com>
 <200802210841.50836.faltet@carabos.com>
Message-ID: <1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net>

On 21.02.2008, at 08:41, Francesc Altet wrote:

> Well, it seems like a non-intrusive modification, but I like the
> scalars
> to remain un-indexable, mainly because it would be useful to raise an
> error when you are trying to index them. In fact, I thought that when
> you want a kind of scalar but indexable, you should use a 0-d array.

I agree. In fact, I'd rather see NumPy scalars move towards Python
scalars rather than towards NumPy arrays in behaviour. In particular,
their nasty habit of coercing everything they are combined with into
arrays is still my #1 source of compatibility problems with porting
code from Numeric to NumPy. I end up converting NumPy scalars to
Python scalars explicitly in lots of places.

Konrad.

From dmitrey.kroshko at scipy.org Thu Feb 21 02:47:58 2008
From: dmitrey.kroshko at scipy.org (dmitrey)
Date: Thu, 21 Feb 2008 09:47:58 +0200
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <47BCFA8F.2020808@enthought.com>
References: <47BCFA8F.2020808@enthought.com>
Message-ID: <47BD2CAE.1050605@scipy.org>

Travis E. Oliphant wrote:
> Hi everybody,
>
> In writing some generic code, I've encountered situations where it would
> reduce code complexity to allow NumPy scalars to be "indexed" in the
> same number of limited ways, that 0-d arrays support.
>
> For example, 0-d arrays can be indexed with
>
> * Boolean masks
> * Ellipses x[...] and x[..., newaxis]
> * Empty tuple x[()]
>
> I think that numpy scalars should also be indexable in these particular
> cases as well (read-only of course, i.e. no setting of the value would
> be possible).
>
> This is an easy change to implement, and I don't think it would cause
> any backward compatibility issues.
>
> Any opinions from the list?
>
>
> Best regards,
>
> -Travis O.
>
As for me I would be glad to see the same behavior for numbers as for
arrays at all, like it's implemented in MATLAB, i.e.

>> a = 80
>> disp(a)
80
>> disp(a(1,1))
80

OK, for numpy at least having the possibility to use

a = array(80)
print a[0]

would be very convenient; right now atleast_1d(a) is required very
often, and sometimes errors surface only later, during execution of
user-installed code, when users who usually pass multi-element arrays
suddenly pass a single-element array.

I guess it could be implemented via a simple check: if the user asks
for a[0] and a is an array of shape () (i.e. like a=array(80)), then
return a[()].

D.

From eads at soe.ucsc.edu Thu Feb 21 07:44:54 2008
From: eads at soe.ucsc.edu (Damian Eads)
Date: Thu, 21 Feb 2008 05:44:54 -0700
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <47BCFA8F.2020808@enthought.com>
References: <47BCFA8F.2020808@enthought.com>
Message-ID: <47BD7246.9030406@soe.ucsc.edu>

In MATLAB, scalars are 1x1 arrays, and thus they can be indexed.
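As an aside, a sketch of how generic code papers over this difference
today; as_indexable is just an illustrative name, not an existing API:

    import numpy as np

    def as_indexable(obj):
        # promote scalars to 0-d arrays so generic code can index either kind
        return np.asarray(obj)

    as_indexable(np.float64(80.0))[()]   # -> 80.0, via a 0-d array
    as_indexable(np.array([80.0]))[0]    # -> 80.0, arrays pass through as-is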
There have been situations in my use of Numpy when I would have liked to index scalars to make my code more general. It's not a very pressing issue for me but it is an interesting issue. Whenever I index an array with a sequence or slice I'm guaranteed to get another array out. This consistency is nice. In [1]: A=numpy.random.rand(10) In [2]: A[range(0,1)] Out[2]: array([ 0.88109759]) In [3]: A[slice(0,1)] Out[3]: array([ 0.88109759]) In [3]: A[[0]] Out[3]: array([ 0.88109759]) However, when I index an array with an integer, I can get either a sequence or a scalar out. In [4]: c1=A[0] Out[4]: 0.88109759 In [5]: B=numpy.random.rand(5,5) In [5]: c2=B[0] Out[5]: array([ 0.81589633, 0.9762584 , 0.72666631, 0.12700816, 0.40653243]) Although c1 and c2 were derived by integer-indexing two different arrays of doubles, one is a sequence and the other is a scalar. This lack of consistency might be confusing to some people, and I'd imagine it occasionally results in programming errors. Damian Travis E. Oliphant wrote: > Hi everybody, > > In writing some generic code, I've encountered situations where it would > reduce code complexity to allow NumPy scalars to be "indexed" in the > same number of limited ways, that 0-d arrays support. > > For example, 0-d arrays can be indexed with > > * Boolean masks > * Ellipses x[...] and x[..., newaxis] > * Empty tuple x[()] > > I think that numpy scalars should also be indexable in these particular > cases as well (read-only of course, i.e. no setting of the value would > be possible). > > This is an easy change to implement, and I don't think it would cause > any backward compatibility issues. > > Any opinions from the list? > > > Best regards, > > -Travis O. From oliphant at enthought.com Thu Feb 21 10:03:16 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Thu, 21 Feb 2008 09:03:16 -0600 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net> References: <47BCFA8F.2020808@enthought.com> <200802210841.50836.faltet@carabos.com> <1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net> Message-ID: <47BD92B4.1010805@enthought.com> Konrad Hinsen wrote: > On 21.02.2008, at 08:41, Francesc Altet wrote: > > >> Well, it seems like a non-intrusive modification, but I like the >> scalars >> to remain un-indexable, mainly because it would be useful to raise an >> error when you are trying to index them. In fact, I thought that when >> you want a kind of scalar but indexable, you should use a 0-d array. >> > > I agree. In fact, I'd rather see NumPy scalars move towards Python > scalars rather than towards NumPy arrays in behaviour. A good balance should be sought. I agree that improvements are needed, especially because much behavior is still just a side-effect of how things were implemented rather than specifically intentional. > In particular, > their nasty habit of coercing everything they are combined with into > arrays is still my #1 source of compatibility problems with porting > code from Numeric to NumPy. I end up converting NumPy scalars to > Python scalars explicitly in lots of places. > This bit, for example, comes from the fact that most of the math on scalars still uses ufuncs for their implementation. The numpy scalars could definitely use some improvements. However, I think my proposal for limited indexing capabilities should be considered separately from coercion behavior of NumPy scalars. 
NumPy scalars are intentionally different from Python scalars, and I see this difference growing due to where Python itself is going. For example, the int/long unification is going to change the ability for numpy.int to inherit from int. I could also forsee the Python float being an instance of a Decimal object or some other infinite precision float at some point which would prevent inheritance for the numpy.float object. The legitimate question is *how* different should they really be in each specific case. -Travis From eads at soe.ucsc.edu Thu Feb 21 10:39:15 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Thu, 21 Feb 2008 08:39:15 -0700 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <47BCFA8F.2020808@enthought.com> References: <47BCFA8F.2020808@enthought.com> Message-ID: <47BD9B23.1030009@soe.ucsc.edu> While we are on the subject of indexing... I use xranges all over the place because I tend to loop over big data sets. Thus I try avoid to avoid allocating large chunks of memory unnecessarily with range. While I try to be careful not to let xranges propagate to the ndarray's [] operator, there have been a few times when I've made a mistake. Is there any reason why adding support for xrange indexing would be a bad thing to do? All one needs to do is convert the xrange to a slice object in __getitem__. I've written some simple code to do this conversion in Python (note that in C, one can access the start, end, and step of an xrange object very easily.) def xrange_to_slice(ind): """ Converts an xrange object to a slice object. """ retval = slice(None, None, None) if type(ind) == XRangeType: # Grab a string representation of the xrange object, which takes # any of the forms: xrange(a), xrange(a,b), xrange(a,b,s). # Break it apart into a, b, and s. sind = str(ind) xr_params = [int(s) for s in sind[(sind.find('(')+1):sind.find(')')].split(",")] retval = apply(slice, xr_params) else: raise TypeError("Index must be an xrange object!") #endif return retval ---- On another note, I think it would be great if we added support for a find function, which takes a boolean array A, and returns the indices corresponding to True, but over A's flat view. In many cases, indexing with a boolean array is all one needs, making find unnecessary. However, I've encountered cases where computing the boolean array was computationally burdensome, the boolean arrays were large, and the result was needed many times throughout the broader computation. For many of my problems, storing away the flat index array uses a lot less memory than storing the boolean index arrays. I frequently define a function like def find(A): return numpy.where(A.flat)[0] Certainly, we'd need a find with more error checking, and one that handles the case when a list of booleans is passed (or a list of lists). Conceivably, one might try to index a non-flat array with the result of find. To deal with this, find could return a place holder object that the index operator checks for. Just an idea. -- I also think it'd be really useful to have a function that's like arange in that it supports floats/doubles, and also like xrange in that elements are only generated on demand. It could be implemented as a generator as shown below. 
def axrange(start, stop=None, step=1.0): if stop == None: stop = start start = 0.0 #endif (start, stop, step) = (numpy.float64(start), numpy.float64(stop), numpy.float64(step)) for i in xrange(0,numpy.ceil((stop-start)/step)): yield numpy.float64(start + step * i) #endfor Or, as a class, class axrangeiter: def __init__(self, rng): "An iterator over an axrange object." self.rng = rng self.i = 0 def next(self): "Returns the next float in the sequence." if self.i >= len(self.rng): raise StopIteration() self.i += 1 return self.rng[self.i-1] class axrange: def __init__(self, *args): """ axrange(stop) axrange(start, stop, [step]) An axrange object is an iterable numerical sequence between start and stop. Similar to arange, there are n=ceil((stop-start)/step) elements in the sequence. Elements are generated on demand, which can be more memory efficient. """ if len(args) == 1: self.start = numpy.float64(0.0) self.stop = numpy.float64(args[0]) self.step = numpy.float64(1.0) elif len(args) == 2: self.start = numpy.float64(args[0]) self.stop = numpy.float64(args[1]) self.step = numpy.float64(1.0) elif len(args) == 3: self.start = numpy.float64(args[0]) self.stop = numpy.float64(args[1]) self.step = numpy.float64(args[2]) else: raise TypeError("axrange requires 3 arguments.") #endif self.len = max(int(numpy.ceil((self.stop-self.start)/self.step)),0) def __len__(self): return self.len def __getitem__(self, i): return numpy.float64(self.start + self.step * i) def __iter__(self): return axrangeiter(self) def __repr__(self): if self.start == 0.0 and self.step == 1.0: return "axrange(%s)" % str(self.stop) elif self.step == 1.0: return "axrange(%s,%s)" % (str(self.start), str(self.stop)) else: return "axrange(%s,%s,%s)" % (str(self.start), str(self.stop), str(self.step)) #endif Travis E. Oliphant wrote: > Hi everybody, > > In writing some generic code, I've encountered situations where it would > reduce code complexity to allow NumPy scalars to be "indexed" in the > same number of limited ways, that 0-d arrays support. > > For example, 0-d arrays can be indexed with > > * Boolean masks > * Ellipses x[...] and x[..., newaxis] > * Empty tuple x[()] > > I think that numpy scalars should also be indexable in these particular > cases as well (read-only of course, i.e. no setting of the value would > be possible). > > This is an easy change to implement, and I don't think it would cause > any backward compatibility issues. > > Any opinions from the list? > > > Best regards, > > -Travis O. > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From zachary.pincus at yale.edu Thu Feb 21 10:47:13 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 21 Feb 2008 10:47:13 -0500 Subject: [Numpy-discussion] finding eigenvectors etc In-Reply-To: References: <2decd292-04b8-4082-9807-496528b52cc3@d5g2000hsc.googlegroups.com> <5148b5bc-9ac6-4daa-8cfa-95217de99a69@e25g2000prg.googlegroups.com> Message-ID: <4D4662A3-08DB-4226-ADB2-B2946C09D4AC@yale.edu> Hi all, >> How are you using the values? How significant are the differences? >> > > i am using these eigenvectors to do PCA on a set of images(of faces).I > sort the eigenvectors in descending order of their eigenvalues and > this is multiplied with the (orig data of some images viz a matrix)to > obtain a facespace. 
I've dealt with similar issues a lot -- performing PCA on data where
the dimensionality of the data is much greater than the number of data
points. (Like images.)

In this case, the maximum number of non-trivial eigenvectors of the
covariance matrix of the data is min(dimension_of_data,
number_of_data_points), so one always runs into the zero-eigenvalue
problem; the matrix is thus always ill-conditioned, but that's not a
problem in these cases.

Nevertheless, if you've got (say) 100 images that are each 100x100
pixels, to do PCA in the naive way you need to make a 10000x10000
covariance matrix and then decompose it into 10000 eigenvectors and
values just to get out the 100 non-trivial ones. That's a lot of
computation wasted calculating noise!

Fortunately, there are better ways. One is to perform the SVD on the
100x10000 data matrix. Let the centered (mean-subtracted) data matrix
be D, then the SVD provides matrices U, S, and V'. IIRC, the
eigenvectors of D'D (the covariance matrix of interest) are then packed
along the first dimension of V', and the eigenvalues are the square of
the values in S.

But! There's an even faster way (from my testing). The trick is that
instead of calculating the 10000x10000 "outer" covariance matrix D'D,
or doing the SVD on D, one can calculate the 100x100 "inner" covariance
matrix DD' and perform the eigen-decomposition thereon and then
trivially transform those eigenvalues and vectors to the ones of the
D'D matrix. This computation is often substantially faster than the
SVD. Here's how it works:

Let D, our re-centered data matrix, be of shape (n, k) -- that is, n
data points in k dimensions. We know that D has a singular value
decomposition D = USV' (no need to calculate the SVD though; just
enough to know it exists). From this, we can rewrite the covariance
matrices:

D'D = VS'SV'
DD' = USS'U'

Now, from the SVD, we know that S'S and SS' are diagonal matrices, and
V and U (and V' and U') form orthogonal bases. One way to write the
eigen-decomposition of a matrix is A = QLQ', where Q is orthogonal and
L is diagonal. Since the eigen-decomposition is unique (up to a
permutation of the columns of Q and L), we know that V must therefore
contain the eigenvectors of D'D in its columns, and U must contain the
eigenvectors of DD' in its columns. This is the origin of the SVD
recipe that I gave above.

Further, let S_hat, of shape (k, n), be the transpose of S with each
nonzero entry replaced by its reciprocal, so that SS_hat = I of shape
(n, n), provided the n singular values are all nonzero. Then, we can
solve for U or V in terms of the other (recovering the n non-trivial
columns of V):

V = D'US_hat'
U = DVS_hat

So, to get the eigenvectors and eigenvalues of D'D, we just calculate
DD' and then apply the symmetric eigen-decomposition (the symmetric
version is faster, and DD' is symmetric) to get eigenvectors U and
eigenvalues L. We know that L = SS', so S_hat = 1/sqrt(L) (where the
sqrt is taken elementwise, of course). So, the eigenvectors we're
looking for are:

V = D'US_hat'

Then, the principal components (eigenvectors) are in the columns of V
(packed along the second dimension of V).

Fortunately, I've packaged this all up into a python module for PCA
that takes care of this all. It's attached.

Zach Pincus

Postdoctoral Fellow, Lab of Dr. Frank Slack
Molecular, Cellular and Developmental Biology
Yale University

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pca.py
Type: text/x-python-script
Size: 4350 bytes
Desc: not available
URL:
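For concreteness, here is a minimal sketch of the "inner covariance"
recipe described above. It is an illustration only, not the attached
pca.py: the name pca_inner is made up, rows are assumed to be
observations, and the relative tolerance for dropping numerically-zero
eigenvalues is an arbitrary choice:

    import numpy as np

    def pca_inner(data):
        # data: (n, k) array of n observations in k dimensions, n << k
        d = data - data.mean(axis=0)          # the re-centered matrix D
        inner = np.dot(d, d.T)                # DD', only (n, n) in size
        evals, u = np.linalg.eigh(inner)      # symmetric eigendecomposition
        order = np.argsort(evals)[::-1]       # sort descending by eigenvalue
        evals, u = evals[order], u[:, order]
        keep = evals > 1e-10 * evals[0]       # drop the noise eigenvalues
        evals, u = evals[keep], u[:, keep]
        v = np.dot(d.T, u) / np.sqrt(evals)   # V = D'US_hat': columns are PCs
        # normalize the arbitrary sign: flip each column so that its
        # largest-magnitude entry is positive
        flips = np.sign(v[np.argmax(np.abs(v), axis=0), np.arange(v.shape[1])])
        return evals, v * flips

The sign normalization at the end is one conventional answer to the
java-versus-numpy differences discussed earlier in the thread:
eigenvectors are only defined up to sign, so picking a deterministic
sign makes results comparable across implementations.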
From oliphant at enthought.com Thu Feb 21 11:33:26 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Thu, 21 Feb 2008 10:33:26 -0600
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <47BD9B23.1030009@soe.ucsc.edu>
References: <47BCFA8F.2020808@enthought.com> <47BD9B23.1030009@soe.ucsc.edu>
Message-ID: <47BDA7D6.1040203@enthought.com>

Damian Eads wrote:
> While we are on the subject of indexing... I use xranges all over the
> place because I tend to loop over big data sets. Thus I try to avoid
> allocating large chunks of memory unnecessarily with range. While
> I try to be careful not to let xranges propagate to the ndarray's []
> operator, there have been a few times when I've made a mistake. Is there
> any reason why adding support for xrange indexing would be a bad thing
> to do? All one needs to do is convert the xrange to a slice object in
> __getitem__. I've written some simple code to do this conversion in
> Python (note that in C, one can access the start, end, and step of an
> xrange object very easily.)
>
I think something like this could be supported.  Basically,
interpreting an xrange object as a slice object would be my presumed
behavior.

-Travis O.

From konrad.hinsen at laposte.net Thu Feb 21 11:47:56 2008
From: konrad.hinsen at laposte.net (Konrad Hinsen)
Date: Thu, 21 Feb 2008 17:47:56 +0100
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <47BD92B4.1010805@enthought.com>
References: <47BCFA8F.2020808@enthought.com>
 <200802210841.50836.faltet@carabos.com>
 <1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net>
 <47BD92B4.1010805@enthought.com>
Message-ID:

On Feb 21, 2008, at 16:03, Travis E. Oliphant wrote:

> However, I think my proposal for limited indexing capabilities
> should be
> considered separately from coercion behavior of NumPy scalars.  NumPy
> scalars are intentionally different from Python scalars, and I see
> this
> difference growing due to where Python itself is going.  For example,
> the int/long unification is going to change the ability for
> numpy.int to
> inherit from int.

True, but this is almost an implementation detail. What I see as more
fundamental is the behaviour of Python container objects (lists, sets,
etc.). If you add an object to a container and then access it as an
element of the container, you get the original object (or something
that behaves like the original object) without any trace of the
container itself. I don't see why arrays should behave differently
from all the other Python container objects - certainly not because it
would be rather easy to implement.

NumPy has been inspired a lot by array languages like APL or Matlab.
In those languages, everything is an array, and plain numbers that
would be scalars elsewhere are considered 0-d arrays. Python is not an
array language but an OO language with the more general concepts of
containers, sequences, iterators, etc. Arrays are just one kind of
container object among many others, so they should respect the common
behaviours of containers.

Konrad.
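To make the container point concrete, a small sketch (behaviour as of
NumPy 1.x; the exact repr of the scalar type varies by version):

    import numpy as np

    f = 2.5
    box = [f]
    box[0] is f        # True: a list hands back exactly the object it held

    x = np.array([2.5])
    type(x[0])         # numpy.float64: a new scalar object that carries
                       # array-flavoured behaviour out of the container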
From aisaac at american.edu Thu Feb 21 12:08:32 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 21 Feb 2008 12:08:32 -0500 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: References: <47BCFA8F.2020808@enthought.com> <200802210841.50836.faltet@carabos.com><1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net><47BD92B4.1010805@enthought.com> Message-ID: On Thu, 21 Feb 2008, Konrad Hinsen apparently wrote: > What I see as more fundamental is the behaviour of Python container > objects (lists, sets, etc.). If you add an object to a container and > then access it as an element of the container, you get the original > object (or something that behaves like the original object) without > any trace of the container itself. I am not a CS type, but your statement seems related to a matrix behavior that I find bothersome and unnatural:: >>> M = N.mat('1 2;3 4') >>> M[0] matrix([[1, 2]]) >>> M[0][0] matrix([[1, 2]]) I do not think anyone has really defended this behavior, *but* the reply to me when I suggested that a matrix contains arrays and we should see that in its behavior was that, no, a matrix is a container of matrices so this is what you get. So a possible problem with your phrasing of the argument (from a non-CS, user point of view) is that it fails to address what is actually "contained" (as opposed to what you might wish were contained). Apologies if this proves OT. Cheers, Alan Isaac From konrad.hinsen at laposte.net Thu Feb 21 12:24:38 2008 From: konrad.hinsen at laposte.net (Konrad Hinsen) Date: Thu, 21 Feb 2008 18:24:38 +0100 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: References: <47BCFA8F.2020808@enthought.com> <200802210841.50836.faltet@carabos.com><1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net><47BD92B4.1010805@enthought.com> Message-ID: <52C43DCC-6263-4A8B-A4B7-7EC0C690F7C9@laposte.net> On Feb 21, 2008, at 18:08, Alan G Isaac wrote: > I do not think anyone has really defended this behavior, > *but* the reply to me when I suggested that a matrix > contains arrays and we should see that in its behavior > was that, no, a matrix is a container of matrices so this is > what you get. I can't say much about matrices in NumPy as I never used them, nor tried to understand them. The example you give looks weird to me. > So a possible problem with your phrasing of the argument > (from a non-CS, user point of view) > is that it fails to address what is actually "contained" > (as opposed to what you might wish were contained). Most Python container objects contain arbitrary objects. Arrays are an exception (the exception being justified by the enormous memory and performance gains) in that all its elements are necessarily of identical type. A float64 array is thus a container of float64 values. BTW, I am not a CS type either, my background is in physics. I see myself on the "user" side as well. Konrad. From efiring at hawaii.edu Thu Feb 21 12:34:07 2008 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 21 Feb 2008 07:34:07 -1000 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <47BCFA8F.2020808@enthought.com> References: <47BCFA8F.2020808@enthought.com> Message-ID: <47BDB60F.7090205@hawaii.edu> Travis E. Oliphant wrote: > Hi everybody, > > In writing some generic code, I've encountered situations where it would > reduce code complexity to allow NumPy scalars to be "indexed" in the > same number of limited ways, that 0-d arrays support. 
> > For example, 0-d arrays can be indexed with > > * Boolean masks > * Ellipses x[...] and x[..., newaxis] > * Empty tuple x[()] > > I think that numpy scalars should also be indexable in these particular > cases as well (read-only of course, i.e. no setting of the value would > be possible). > > This is an easy change to implement, and I don't think it would cause > any backward compatibility issues. > > Any opinions from the list? > > > Best regards, > > -Travis O. Travis, You have been getting mostly objections so far; maybe it would help if you gave a simple specific example of how your proposal would simplify code. Eric From aisaac at american.edu Thu Feb 21 12:40:13 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 21 Feb 2008 12:40:13 -0500 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <52C43DCC-6263-4A8B-A4B7-7EC0C690F7C9@laposte.net> References: <47BCFA8F.2020808@enthought.com> <200802210841.50836.faltet@carabos.com><1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net><47BD92B4.1010805@enthought.com><52C43DCC-6263-4A8B-A4B7-7EC0C690F7C9@laposte.net> Message-ID: On Thu, 21 Feb 2008, Konrad Hinsen apparently wrote: > A float64 array is thus a container of float64 values. Well ... ok:: >>> x = N.array([1,2],dtype='float') >>> x0 = x[0] >>> type(x0) >>> So a "float64 value" is whatever a numpy.float64 is, and that is part of what is under discussion. So it seems to me. If so, then expected behavior and use cases seem relevant. Alan PS I agree that the posted matrix behavior is "weird". For this and other reasons I think it hurts the matrix object, and I have requested that it change ... From oliphant at enthought.com Thu Feb 21 14:30:11 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Thu, 21 Feb 2008 13:30:11 -0600 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <47BDB60F.7090205@hawaii.edu> References: <47BCFA8F.2020808@enthought.com> <47BDB60F.7090205@hawaii.edu> Message-ID: <47BDD143.7000808@enthought.com> > Travis, > > You have been getting mostly objections so far; I wouldn't characterize it that way, but yes 2 people have pushed back a bit, although one not directly speaking to the proposed behavior. The issue is that [] notation does more than just "select from a container" for NumPy arrays. In particular, it is used to reshape an array to more dimensions: [..., newaxis] A common pattern is to reduce over a dimension and then re-shape the result so that it can be combined with the un-reduced object. Broadcasting makes this work if the dimension being reduced along is the first dimension. But, broadcasting is not enough if you want the reduction dimension to be arbitrary: Thus, y = add.reduce(x, axis=-1) produces an N-1 array if x is 2-d and a numpy scalar if x is 1-d. Suppose y needs to be subtracted from x. If x is 2-d, then >>> x - y[...,newaxis] is the needed code. But, if x is 1-d, then >>> x - y[..., newaxis] returns an error and a check must be done to handle the case separately. If y[..., newaxis] worked and produced a 1-d array when y was a numpy scalar, this could be avoided. -Travis O. 
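A minimal sketch of this reduce-then-broadcast pattern, including the special case that the proposed change would remove (illustrative data):

import numpy as np

def center_last_axis(x):
    y = np.add.reduce(x, axis=-1) / x.shape[-1]
    try:
        # works when y is an (N-1)-d array
        return x - y[..., np.newaxis]
    except (TypeError, IndexError):
        # needed today when x is 1-d and y is a numpy scalar
        return x - y

print(center_last_axis(np.ones((3, 4))))   # 2-d input: newaxis branch
print(center_last_axis(np.ones(4)))        # 1-d input: scalar fallback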
From charlesr.harris at gmail.com Thu Feb 21 14:58:15 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Feb 2008 12:58:15 -0700 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <47BDD143.7000808@enthought.com> References: <47BCFA8F.2020808@enthought.com> <47BDB60F.7090205@hawaii.edu> <47BDD143.7000808@enthought.com> Message-ID: On Thu, Feb 21, 2008 at 12:30 PM, Travis E. Oliphant wrote: > > > Travis, > > > > You have been getting mostly objections so far; > I wouldn't characterize it that way, but yes 2 people have pushed back a > bit, although one not directly speaking to the proposed behavior. > I need to think about it a lot more, but my initial reaction is also negative. On general principle, I think scalars should be different from arrays. Perhaps you could give some concrete examples of why you want the new behavior? Perhaps there will be other approaches that would achieve the same end. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Thu Feb 21 18:17:15 2008 From: stefan at sun.ac.za (Stefan van der Walt) Date: Fri, 22 Feb 2008 01:17:15 +0200 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: References: <47BCFA8F.2020808@enthought.com> Message-ID: <20080221231715.GC8095@mentat.za.net> On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac wrote: > On Thu, 21 Feb 2008, Konrad Hinsen apparently wrote: > > > What I see as more fundamental is the behaviour of Python container > > objects (lists, sets, etc.). If you add an object to a container and > > then access it as an element of the container, you get the original > > object (or something that behaves like the original object) without > > any trace of the container itself. > > I am not a CS type, but your statement seems related to > a matrix behavior that I find bothersome and unnatural:: > > >>> M = N.mat('1 2;3 4') > >>> M[0] > matrix([[1, 2]]) > >>> M[0][0] > matrix([[1, 2]]) This is exactly what I would expect for matrices: M[0] is the first row of the matrix. Note that you don't see this behaviour for ndarrays, since those don't insist on having a minimum of 2-dimensions. In [2]: x = np.arange(12).reshape((3,4)) In [3]: x Out[3]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) In [4]: x[0][0] Out[4]: 0 In [5]: x[0] Out[5]: array([0, 1, 2, 3]) Regards Stefan From stefan at sun.ac.za Thu Feb 21 18:33:15 2008 From: stefan at sun.ac.za (Stefan van der Walt) Date: Fri, 22 Feb 2008 01:33:15 +0200 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <47BCFA8F.2020808@enthought.com> References: <47BCFA8F.2020808@enthought.com> Message-ID: <20080221233315.GD8095@mentat.za.net> Hi Travis, On Wed, Feb 20, 2008 at 10:14:07PM -0600, Travis E. Oliphant wrote: > In writing some generic code, I've encountered situations where it would > reduce code complexity to allow NumPy scalars to be "indexed" in the > same number of limited ways, that 0-d arrays support. > > > For example, 0-d arrays can be indexed with > > * Boolean masks I've tried to use this before, but an IndexError (0-d arrays can't be indexed) is raised. > * Ellipses x[...] and x[..., newaxis] This, especially, seems like it could be very useful. > This is an easy change to implement, and I don't think it would cause > any backward compatibility issues. > > Any opinions from the list? This is maybe a fairly esoteric use case, but one I can imagine coming across. I'm in favour of implementing the change. 
Could I ask that we also consider implementing len() for 0-d arrays? numpy.asarray returns those as-is, and I would like to be able to handle them just as I do any other 1-dimensional array. I don't know if a length of 1 would be valid, given a shape of (), but there must be some consistent way of handling them.

Regards
Stefan

From peridot.faceted at gmail.com  Thu Feb 21 18:43:49 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Thu, 21 Feb 2008 18:43:49 -0500
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <20080221233315.GD8095@mentat.za.net>
References: <47BCFA8F.2020808@enthought.com> <20080221233315.GD8095@mentat.za.net>
Message-ID:

On 21/02/2008, Stefan van der Walt wrote:
> Could I ask that we also consider implementing len() for 0-d arrays?
> numpy.asarray returns those as-is, and I would like to be able to
> handle them just as I do any other 1-dimensional array. I don't know
> if a length of 1 would be valid, given a shape of (), but there must
> be some consistent way of handling them.

Well, if the length of an array is the product of all its sizes, the product of no things is customarily defined to be one... whether that is actually a useful value is another question.

Anne

From aisaac at american.edu  Thu Feb 21 19:10:24 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 21 Feb 2008 19:10:24 -0500
Subject: [Numpy-discussion] matrix wart
In-Reply-To: <20080221231715.GC8095@mentat.za.net>
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net>
Message-ID:

> On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac wrote:
>> a matrix behavior that I find bothersome and unnatural::
>> >>> M = N.mat('1 2;3 4')
>> >>> M[0]
>> matrix([[1, 2]])
>> >>> M[0][0]
>> matrix([[1, 2]])

On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
> This is exactly what I would expect for matrices: M[0] is
> the first row of the matrix.

Define what "first row" means! There is no standard definition that says this means the **submatrix** that can be created from the first row. Someone once pointed out on this list that one might consider a matrix to be a container of 1d vectors. For NumPy, however, it is natural that it be a container of 1d arrays. (See the discussion for the distinction.)

Imagine if a 2d array behaved this way. Ugh! Note that it too is 2d; you could have the same "expectation" based on its 2d-ness. Why don't you? You "expect" this matrix behavior only from experience with it, which is why I "expect" it too, while hating it. It is not what new users will expect and also not desirable. As Konrad noted, it is very odd behavior to treat a matrix as a container of matrices. You can only "expect" this behavior by learning to expect it (by use), which is undesirable.

Nobody has objected to returning matrices when getitem is fed multiple arguments: these are naturally interpreted as requests for submatrices. M[0][0] and M[:1,:1] are very different kinds of requests: the first should return the 0,0 element but does not, while M[0,0] does! Bizarre! How to guess?? If you teach, do your students expect this behavior? Mine don't!

This is a wart. The example really speaks for itself. Since Konrad is an extremely experienced user/developer, his reaction should speak volumes.
Cheers, Alan Isaac From cburns at berkeley.edu Thu Feb 21 19:17:26 2008 From: cburns at berkeley.edu (Christopher Burns) Date: Thu, 21 Feb 2008 16:17:26 -0800 Subject: [Numpy-discussion] change memmap.sync function Message-ID: <764e38540802211617w6688a74bw3ce71d892fdc431c@mail.gmail.com> Would anyone oppose deprecating the memmap.sync function and replacing it with memmap.flush? This would match python's mmap module, and I think be more intuitive. -- Christopher Burns, Software Engineer Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From efiring at hawaii.edu Fri Feb 22 00:02:24 2008 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 21 Feb 2008 19:02:24 -1000 Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars In-Reply-To: <47BDD143.7000808@enthought.com> References: <47BCFA8F.2020808@enthought.com> <47BDB60F.7090205@hawaii.edu> <47BDD143.7000808@enthought.com> Message-ID: <47BE5760.9060709@hawaii.edu> Travis E. Oliphant wrote: >> Travis, >> >> You have been getting mostly objections so far; > I wouldn't characterize it that way, but yes 2 people have pushed back a > bit, although one not directly speaking to the proposed behavior. > > The issue is that [] notation does more than just "select from a > container" for NumPy arrays. In particular, it is used to reshape an > array to more dimensions: [..., newaxis] > > A common pattern is to reduce over a dimension and then re-shape the > result so that it can be combined with the un-reduced object. > Broadcasting makes this work if the dimension being reduced along is the > first dimension. But, broadcasting is not enough if you want the > reduction dimension to be arbitrary: > > Thus, > > y = add.reduce(x, axis=-1) produces an N-1 array if x is 2-d and a > numpy scalar if x is 1-d. Why does it produce a scalar instead of a 0-d array? Wouldn't the latter take care of your use case, and be consistent with the action of reduce in removing one dimension? I'm not opposed to your suggested change--just trying to understand it. I'm certainly sympathetic to your use case, below. I dimly recall extensive and confusing (to me) discussions of numpy scalars versus 0-d arrays during your heroic push to make numpy gel, and I suspect the answer is somewhere back in those discussions. Eric > > Suppose y needs to be subtracted from x. > > If x is 2-d, then > > >>> x - y[...,newaxis] > > is the needed code. But, if x is 1-d, then > > >>> x - y[..., newaxis] > > returns an error and a check must be done to handle the case > separately. If y[..., newaxis] worked and produced a 1-d array when y > was a numpy scalar, this could be avoided. > > > -Travis O. 
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion at scipy.org
> > http://projects.scipy.org/mailman/listinfo/numpy-discussion

From konrad.hinsen at laposte.net  Fri Feb 22 02:42:40 2008
From: konrad.hinsen at laposte.net (Konrad Hinsen)
Date: Fri, 22 Feb 2008 08:42:40 +0100
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To:
References: <47BCFA8F.2020808@enthought.com> <200802210841.50836.faltet@carabos.com><1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net><47BD92B4.1010805@enthought.com><52C43DCC-6263-4A8B-A4B7-7EC0C690F7C9@laposte.net>
Message-ID: <04895A92-09C7-45DC-B9E5-4E016289957F@laposte.net>

On 21.02.2008, at 18:40, Alan G Isaac wrote:

>>>> x = N.array([1,2],dtype='float')
>>>> x0 = x[0]
>>>> type(x0)
>>>>
>
> So a "float64 value" is whatever a numpy.float64 is,
> and that is part of what is under discussion.

numpy.float64 is a very recent invention. During the first decade of numerical arrays in Python (Numeric), type(x0) was the standard Python float type. And even today, what you put into an array (via the array constructor or by assignment) is Python scalar objects, mostly int, float, and complex.

The reason for defining special types for the scalar elements of arrays was efficiency considerations. Python has only a single float type; there is no distinction between single and double precision. Extracting an array element would thus always yield a double precision float, and adding it to a single-precision array would yield a double precision result, meaning that it was extremely difficult to maintain single-precision storage across array arithmetic. For huge arrays, that was a serious problem.

However, the intention was always to have numpy's scalar objects behave as similarly as possible to Python scalars. Ideally, application code should not see a difference at all. This was largely successful, with the notable exception of the coercion problem that I mentioned a few mails ago.

Konrad.

From konrad.hinsen at laposte.net  Fri Feb 22 02:50:42 2008
From: konrad.hinsen at laposte.net (Konrad Hinsen)
Date: Fri, 22 Feb 2008 08:50:42 +0100
Subject: [Numpy-discussion] matrix wart
In-Reply-To:
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net>
Message-ID: <4C851142-ACFD-4376-9E70-80C62F142C1D@laposte.net>

On 22.02.2008, at 01:10, Alan G Isaac wrote:

> Someone once pointed out on this list that one might
> consider a matrix to be a container of 1d vectors. For NumPy,
> however, it is natural that it be a container of 1d arrays.
> (See the discussion for the distinction.)

If I were to design a Pythonic implementation of the mathematical concept of a matrix, I'd implement three classes: Matrix, ColumnVector, and RowVector. It would work like this:

m = Matrix([[1, 2], [3, 4]])

m[0, :] --> ColumnVector([1, 3])
m[:, 0] --> RowVector([1, 2])
m[0,0] --> 1 # scalar

m.shape --> (2, 2)
m[0].shape --> (2,)

However, the matrix implementation in Numeric was inspired by Matlab, where everything is a matrix. But as I said before, Python is not Matlab.

Konrad.
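For concreteness, here is a minimal Python sketch of such a three-class design (illustrative only; note that the row/column labels in the example above are swapped, as the follow-ups below point out):

import numpy as np

class RowVector(np.ndarray):
    """1-d view of a single matrix row."""

class ColumnVector(np.ndarray):
    """1-d view of a single matrix column."""

class Matrix(np.ndarray):
    def __new__(cls, data):
        return np.asarray(data).view(cls)

    def __getitem__(self, index):
        base = np.asarray(self)
        if isinstance(index, tuple) and len(index) == 2:
            i, j = index
            if isinstance(i, int) and isinstance(j, int):
                return base[i, j]                     # m[i, j] -> scalar
            if isinstance(i, int):
                return base[i, j].view(RowVector)     # m[i, :] -> row
            if isinstance(j, int):
                return base[i, j].view(ColumnVector)  # m[:, j] -> column
        if isinstance(index, int):
            return self[index, :]                     # m[i] acts like m[i, :]
        return base[index].view(Matrix)               # slices -> submatrix

m = Matrix([[1, 2], [3, 4]])
print(repr(m[0, :]))                   # RowVector([1, 2])
print(repr(m[:, 0]))                   # ColumnVector([1, 3])
print(m[0, 0], m.shape, m[0].shape)    # 1 (2, 2) (2,)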
From stefan at sun.ac.za  Fri Feb 22 03:37:07 2008
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Fri, 22 Feb 2008 10:37:07 +0200
Subject: [Numpy-discussion] matrix wart
In-Reply-To:
References: <20080221231715.GC8095@mentat.za.net>
Message-ID: <20080222083707.GA11708@mentat.za.net>

On Thu, Feb 21, 2008 at 07:10:24PM -0500, Alan G Isaac wrote:
> > On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac wrote:
> >> a matrix behavior that I find bothersome and unnatural::
>
> >> >>> M = N.mat('1 2;3 4')
> >> >>> M[0]
> >> matrix([[1, 2]])
> >> >>> M[0][0]
> >> matrix([[1, 2]])
>
> > On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
> > This is exactly what I would expect for matrices: M[0] is
> > the first row of the matrix.
>
> Define what "first row" means!
> There is no standard definition that says this means the
> **submatrix** that can be created from the first row.
> Someone once pointed out on this list that one might
> consider a matrix to be a container of 1d vectors. For NumPy,
> however, it is natural that it be a container of 1d arrays.
> (See the discussion for the distinction.)

Could you explain to me how you'd like this to be fixed? If the matrix becomes a container of 1-d arrays, then you can no longer expect x[:,0] to return a column vector -- which was one of the reasons the matrix class was created. While not entirely consistent, one workaround would be to detect when a matrix is a "vector", and then do 1-d-like indexing on it.

> You "expect" this matrix behavior only from experience with it,
> which is why I "expect" it too, while hating it.

No, really, I don't ever use the matrix class :) But it is not like the behaviour is set in stone, so I would spend less time hating and more time patching.

> The example really speaks for itself. Since Konrad is an extremely
> experienced user/developer, his reaction should speak volumes.

Of course, I meant no disrespect to Konrad. I'm just trying to understand the best way to address your concern.

Regards
Stefan

From stefan at sun.ac.za  Fri Feb 22 04:05:51 2008
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Fri, 22 Feb 2008 11:05:51 +0200
Subject: [Numpy-discussion] change memmap.sync function
In-Reply-To: <764e38540802211617w6688a74bw3ce71d892fdc431c@mail.gmail.com>
References: <764e38540802211617w6688a74bw3ce71d892fdc431c@mail.gmail.com>
Message-ID: <20080222090551.GB11708@mentat.za.net>

On Thu, Feb 21, 2008 at 04:17:26PM -0800, Christopher Burns wrote:
> Would anyone oppose deprecating the memmap.sync function and replacing
> it with memmap.flush? This would match python's mmap module, and I
> think be more intuitive.

I made the change in

http://projects.scipy.org/scipy/numpy/changeset/4817

If anyone objects, please revert. Chris, hope you don't mind that I went ahead so long -- I committed before noticing that you were working on the module earlier. Then again, you guys are probably all in bed by now!

Cheers
Stefan

From faltet at carabos.com  Fri Feb 22 07:04:25 2008
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 22 Feb 2008 13:04:25 +0100
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <20080221233315.GD8095@mentat.za.net>
References: <47BCFA8F.2020808@enthought.com> <20080221233315.GD8095@mentat.za.net>
Message-ID: <200802221304.25978.faltet@carabos.com>

On Friday 22 February 2008, Stefan van der Walt wrote:
> Hi Travis,
>
> On Wed, Feb 20, 2008 at 10:14:07PM -0600, Travis E. Oliphant wrote:
> > In writing some generic code, I've encountered situations where it
> > would reduce code complexity to allow NumPy scalars to be "indexed"
> > in the same number of limited ways, that 0-d arrays support.
> >
> > For example, 0-d arrays can be indexed with
> >
> > * Boolean masks
>
> I've tried to use this before, but an IndexError (0-d arrays can't be
> indexed) is raised.

Yes, that's true, and what's more, you can't pass a slice to a 0-d array, which is certainly problematic. I think this should be fixed.

> > * Ellipses x[...] and x[..., newaxis]
>
> This, especially, seems like it could be very useful.

Well, if you want to create a x[..., newaxis], you can always use array([x]), which also works with scalars (and python scalars too), although the latter does create a copy :-/

> Could I ask that we also consider implementing len() for 0-d arrays?
> numpy.asarray returns those as-is, and I would like to be able to
> handle them just as I do any other 1-dimensional array. I don't know
> if a length of 1 would be valid, given a shape of (), but there must
> be some consistent way of handling them.

If 0-d arrays are going to be indexable, then +1 for len(0-d) returning 1.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"

From ndbecker2 at gmail.com  Fri Feb 22 07:29:22 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 22 Feb 2008 07:29:22 -0500
Subject: [Numpy-discussion] ufunc for user-defined type
Message-ID:

Now that I have my user-defined type, I want to add some funcs. According to the numpy book, I need to use:

PyUFunc_RegisterLoopForType

The book says I first need a ufunc. The only way I see to create one is

PyUFunc_FromFuncAndData.

Is this the correct procedure? I wonder, because PyUFunc_FromFuncAndData requires 'type', and 'type' is char, but user-defined types start at 256, which doesn't fit in a char, which gives a compile warning.

From aisaac at american.edu  Fri Feb 22 09:42:07 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Fri, 22 Feb 2008 09:42:07 -0500
Subject: [Numpy-discussion] matrix wart
In-Reply-To: <20080222083707.GA11708@mentat.za.net>
References: <20080221231715.GC8095@mentat.za.net><20080222083707.GA11708@mentat.za.net>
Message-ID:

>>> On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac
>>> wrote:
>>>> a matrix behavior that I find bothersome and unnatural::
>>> M = N.mat('1 2;3 4')
>>> M[0]
matrix([[1, 2]])
>>> M[0][0]
matrix([[1, 2]])

On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
> Could you explain to me how you'd like this to be fixed? If the
> matrix becomes a container of 1-d arrays, then you can no longer
> expect x[:,0] to return a column vector -- which was one
> of the reasons the matrix class was created. While not
> entirely consistent, one workaround would be to detect
> when a matrix is a "vector", and then do 1-d-like indexing
> on it.

Let M be a matrix and A=M.A, with i and j integers. I would want two principles to be honored.

1. ordinary Python indexing produces unsurprising results, so that e.g. M[0][0] returns the first element of the matrix
2. indexing that produces a 2-d array when applied to A will produce the equivalent matrix when applied to M

There is some tension between these two requirements, and they do not address your specific example. Various reconciliations can be imagined. I believe a nice one can be achieved with a truly minimal change, as follows.

Let M[i] return a 1d array. (Unsurprising!)
This is a change: a matrix becomes a container of arrays (e.g., when iterating). Let M[:,i] and M[i,:] behave as now. In addition, as a consistency measure, one might ask that M[i,j] return a 1 x 1 matrix. (This is of secondary importance, but it follows the principle that the use of multiple indexes produces matrices.) Right now I'm operating on caffeine instead of sleep, but that looks right ... Alan Isaac From oliphant at enthought.com Fri Feb 22 09:55:41 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 22 Feb 2008 08:55:41 -0600 Subject: [Numpy-discussion] matrix wart In-Reply-To: <4C851142-ACFD-4376-9E70-80C62F142C1D@laposte.net> References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net> <4C851142-ACFD-4376-9E70-80C62F142C1D@laposte.net> Message-ID: <47BEE26D.7020102@enthought.com> Konrad Hinsen wrote: > On 22.02.2008, at 01:10, Alan G Isaac wrote: > > >> Someone once pointed out on this list that one might >> consider a matrix to be a container of 1d vectors. For NumPy, >> however, it is natural that it be a container of 1d arrays. >> (See the discussion for the distinction.) >> > > If I were to design a Pythonic implementation of the mathematical > concept of a matrix, I'd implement three classes: Matrix, > ColumnVector, and RowVector. It would work like this: > > m = Matrix([[1, 2], [3, 4]]) > > m[0, :] --> ColumnVector([1, 3]) > m[:, 0] --> RowVector([1, 2]) > These seem backward to me. I would think that m[0,:] would be the RowVector([1,2]) and m[:,0] be the ColumnVector([1,3]). > m[0,0] --> 1 # scalar > > m.shape --> (2, 2) > m[0].shape --> (2,) > What is m[0] in this case? The same as m[0, :]? > However, the matrix implementation in Numeric was inspired by Matlab, > where everything is a matrix. But as I said before, Python is not > Matlab. It should be kept in mind, however, that Matlab's matrix object is used successfully by a lot of people and should not be dismissed as irrelevant. I would like to see an improved Matrix object as a built-in type (for 1.1). I am aware of two implementations that could be referred to in creating it: CVXOPT's matrix object and NumPy's matrix object. There may be others as well. If somebody has strong feelings about this sufficient to write a matrix built-in, then the door is wide open. Best, -Travis From konrad.hinsen at laposte.net Fri Feb 22 10:14:48 2008 From: konrad.hinsen at laposte.net (Konrad Hinsen) Date: Fri, 22 Feb 2008 16:14:48 +0100 Subject: [Numpy-discussion] matrix wart In-Reply-To: <47BEE26D.7020102@enthought.com> References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net> <4C851142-ACFD-4376-9E70-80C62F142C1D@laposte.net> <47BEE26D.7020102@enthought.com> Message-ID: <85EDE606-2FFF-4704-BD72-9E58F9A5C012@laposte.net> On Feb 22, 2008, at 15:55, Travis E. Oliphant wrote: >> ColumnVector, and RowVector. It would work like this: >> >> m = Matrix([[1, 2], [3, 4]]) >> >> m[0, :] --> ColumnVector([1, 3]) >> m[:, 0] --> RowVector([1, 2]) >> > These seem backward to me. I would think that m[0,:] would be the > RowVector([1,2]) and m[:,0] be the ColumnVector([1,3]). Right. > What is m[0] in this case? The same as m[0, :]? Yes. >> However, the matrix implementation in Numeric was inspired by Matlab, >> where everything is a matrix. But as I said before, Python is not >> Matlab. > It should be kept in mind, however, that Matlab's matrix object is > used > successfully by a lot of people and should not be dismissed as > irrelevant. 
Matlab's approach is fine for Matlab, of course. All I am saying is that it is a misfit for Python. Just like 1-based indexing is used successfully by lots of Fortran programmers, but would be an eternal source of confusion if it were introduced for a specific container object in Python.

Konrad.

From oliphant at enthought.com  Fri Feb 22 10:18:38 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 22 Feb 2008 09:18:38 -0600
Subject: [Numpy-discussion] matrix wart
In-Reply-To:
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net>
Message-ID: <47BEE7CE.9020302@enthought.com>

> On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
>> This is exactly what I would expect for matrices: M[0] is
>> the first row of the matrix.
>
> Define what "first row" means!

Konrad has shown that to "get it right" you really have to introduce three separate things (matrices, row vectors, and column vectors). This is a fine direction to proceed in, but it does complicate things as well. The current implementation has the advantage that row vectors are just 1xN matrices and column vectors are Nx1 matrices, so there is only 1 kind of thing: matrices.

The expectation that M[0][0] and M[0,0] return the same thing stems from believing that all objects using [] syntax are just containers. (Think of a dictionary with keys '0' and '(0,0)', for example.) The matrix object is not a "container" object. A NumPy array, however, is. They have different behaviors, on purpose. If you don't like the matrix object, then just use the NumPy array. There are situations, however, when the matrix object is very useful. I use it in limited fashion to make expressions easier to read.

> Imagine if a 2d array behaved this way. Ugh!
> Note that it too is 2d; you could have the same
> "expectation" based on its 2d-ness. Why don't you?

The 2d-ness is not the point. The point is that a matrix object is a matrix object and *not* a generic container.

> Nobody has objected to returning matrices when getitem is
> fed multiple arguments: these are naturally interpreted as
> requests for submatrices. M[0][0] and M[:1,:1] are very
> different kinds of requests: the first should return the 0,0
> element but does not, while M[0,0] does! Bizarre!
> How to guess?? If you teach, do your students expect this
> behavior? Mine don't!

Again, stop believing that M[0][0] and M[0,0] should return the same thing. There is nothing in Python that requires this. As far as I know, the matrix object is consistent. It may not behave as you, or people that you teach, would expect, but it does have reasonable behavior. Expectations are generally "learned" based on previous experience. Our different experiences will always lead to different expectations. What somebody expects for a matrix behavior will depend on how they were taught what it means to "be" a matrix.

> This is a wart.

I disagree. It's not a wart, it is intentional.

> The example really speaks for itself. Since Konrad is an
> extremely experienced user/developer, his reaction should
> speak volumes.

I'm not as convinced by this kind of argument. I respect Konrad a great deal and am always interested to hear his opinion, and make use of all of the code that he shares with us. His example has been an important part of my Python "education." However, we do approach problems differently (probably again based on previous experiences) which leads us to promote different solutions.
I also see this in the wider Python community where there is a "diversity" of user/developers who promote different approaches as well (e.g. the PIL vs NumPy concept of Images comes to mind as well). I've heard many differing points of view on the Matrix object. Stefan's comment is most relevant: the Matrix object can be changed (in 1.1), especially because we are keen on merging CVXOPT's matrix object with NumPy's and making it a builtin type.

-Travis O.

From oliphant at enthought.com  Fri Feb 22 10:22:29 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 22 Feb 2008 09:22:29 -0600
Subject: [Numpy-discussion] matrix wart
In-Reply-To:
References: <20080221231715.GC8095@mentat.za.net><20080222083707.GA11708@mentat.za.net>
Message-ID: <47BEE8B5.90000@enthought.com>

>
>> Could you explain to me how you'd like this to be fixed? If the
>> matrix becomes a container of 1-d arrays, then you can no longer
>> expect x[:,0] to return a column vector -- which was one
>> of the reasons the matrix class was created. While not
>> entirely consistent, one workaround would be to detect
>> when a matrix is a "vector", and then do 1-d-like indexing
>> on it.
>
> Let M be a matrix and A=M.A, with i and j integers.
> I would want two principles to be honored.
>
> 1. ordinary Python indexing produces unsurprising results,
> so that e.g. M[0][0] returns the first element of the matrix
> 2. indexing that produces a 2-d array when applied to A
> will produce the equivalent matrix when applied to M
>
> There is some tension between these two requirements,
> and they do not address your specific example.
>
> Various reconciliations can be imagined.
> I believe a nice one can be achieved with
> a truly minimal change, as follows.
>
> Let M[i] return a 1d array. (Unsurprising!)
> This is a change: a matrix becomes a container
>
This is a concrete proposal and I don't immediately have a problem with it (other than it will break code and so must go into 1.1).

> Let M[:,i] and M[i,:] behave as now.
>
Some would expect M[i,:] and M[i] to be the same thing, but I would be willing to squelch those expectations if many can agree that M[i] should return an array.

> In addition, as a consistency measure, one might
> ask that M[i,j] return a 1 x 1 matrix. (This is
> of secondary importance, but it follows the
> principle that the use of multiple indexes
> produces matrices.)
>
I'm pretty sure that wasn't the original "principle", but again this is not unreasonable.

> Right now I'm operating on caffeine instead of sleep,
> but that looks right ...
>
> Alan Isaac

From aisaac at american.edu  Fri Feb 22 10:34:47 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Fri, 22 Feb 2008 10:34:47 -0500
Subject: [Numpy-discussion] matrix wart
In-Reply-To: <47BEE7CE.9020302@enthought.com>
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net><47BEE7CE.9020302@enthought.com>
Message-ID:

On Fri, 22 Feb 2008, "Travis E. Oliphant" apparently wrote:
> The point is that a matrix object is a
> matrix object and not a generic container.

I see the point a bit differently: there are costs and benefits to the abandonment of a specific and natural behavior of containers. (The kind of behavior that arrays have.) The costs outweigh the benefits.

> stop believing that M[0][0] and M[0,0] should return the
> same thing. There is nothing in Python that requires
> this.

I never suggested there is. My question "how to guess?" does not imply that.
My point is: the matrix object could have more intuitive behavior with no loss of functionality. Or so it seems to me. See my other post.

Cheers,
Alan

From oliphant at enthought.com  Fri Feb 22 10:56:22 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 22 Feb 2008 09:56:22 -0600
Subject: [Numpy-discussion] matrix wart
In-Reply-To:
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net><47BEE7CE.9020302@enthought.com>
Message-ID: <47BEF0A6.9080808@enthought.com>

Alan G Isaac wrote:
>
>> stop believing that M[0][0] and M[0,0] should return the
>> same thing. There is nothing in Python that requires
>> this.
>
> I never suggested there is.
> My question "how to guess?" does not imply that.
>
> My point is: the matrix object could have more intuitive
> behavior with no loss of functionality.
>
Do I understand correctly, that by intuitive you mean based on experience with lists, and NumPy arrays?

I agree, it is very valuable to be able to use previous understanding to navigate a new thing. That's a big part of why I could see changing the matrix object in 1.1 to behave as you described in your previous post: where M[i] returned a 1-d array and matrices were returned with 2-d (slice-involved) indexing (I would not mind M[0,0] still returning a scalar, however).

-Travis

From oliphant at enthought.com  Fri Feb 22 11:01:08 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 22 Feb 2008 10:01:08 -0600
Subject: [Numpy-discussion] ufunc for user-defined type
In-Reply-To:
References:
Message-ID: <47BEF1C4.2000608@enthought.com>

Neal Becker wrote:
> Now that I have my user-defined type, I want to add some funcs.
> According to the numpy book, I need to use:
>
> PyUFunc_RegisterLoopForType
>
> The book says I first need a ufunc. The only way I see to create one is
>
> PyUFunc_FromFuncAndData.
>
> Is this the correct procedure?
>
You'll have to do some digging because the API allowing low-level loops for user-defined types is not well tested or vetted. Also, don't be surprised if you uncover some bugs.

The concept is that you create the UFunc with the "built-in" types using PyUFunc_FromFuncAndData and then you add loops for the user-defined type using RegisterLoopForType.

So, you have the basic procedure down.

-Travis

From aisaac at american.edu  Fri Feb 22 11:29:04 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Fri, 22 Feb 2008 11:29:04 -0500
Subject: [Numpy-discussion] matrix wart
In-Reply-To: <47BEF0A6.9080808@enthought.com>
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net><47BEE7CE.9020302@enthought.com><47BEF0A6.9080808@enthought.com>
Message-ID:

On Fri, 22 Feb 2008, "Travis E. Oliphant" apparently wrote:
> Do I understand correctly, that by intuitive you mean
> based on experience with lists, and NumPy arrays?

Yes. In particular, array behavior is quite lovely and almost never surprising, so matrices should deviate from it only when there is an adequate payoff and, ideally, an easily stated principle.

Thanks!
Alan

PS If you choose to implement such changes, I would find M[0,0] returning a 1x1 matrix to be more consistent, but to be clear, for me this is *very* much a secondary issue. Not even in the same ballpark.
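For reference, a quick interactive check of the behavior the PS contrasts with (numpy 1.0.x era):

>>> import numpy as np
>>> M = np.mat('1 2;3 4')
>>> M[0, 0]        # scalar indexing currently yields the element itself
1
>>> M[:1, :1]      # while slicing yields the 1x1 submatrix
matrix([[1]])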
From faltet at carabos.com  Fri Feb 22 11:34:44 2008
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 22 Feb 2008 17:34:44 +0100
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net>
References: <47BCFA8F.2020808@enthought.com> <200802210841.50836.faltet@carabos.com> <1A14086B-3B2E-48AD-9D35-7E7FA066151C@laposte.net>
Message-ID: <200802221734.45195.faltet@carabos.com>

On Thursday 21 February 2008, Konrad Hinsen wrote:
> I agree. In fact, I'd rather see NumPy scalars move towards Python
> scalars rather than towards NumPy arrays in behaviour. In particular,
> their nasty habit of coercing everything they are combined with into
> arrays is still my #1 source of compatibility problems with porting
> code from Numeric to NumPy. I end up converting NumPy scalars to
> Python scalars explicitly in lots of places.

Yeah, that happened to me too quite frequently, and it is quite uncomfortable. Also, I find this especially unpleasant:

In [87]: numpy.int(1)/numpy.uint64(2)
Out[87]: 0.5

Is this avoidable, or is it a consequence of the coercion rules? I guess this is the same case as:

In [88]: numpy.array([1])/numpy.array([2], 'uint64')
Out[88]: array([ 0.5])

By the way:

In [89]: numpy.array(1)/numpy.array(2, 'uint64')
Out[89]: 0.5

shouldn't this be array(0.5)?

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"

From Chris.Barker at noaa.gov  Fri Feb 22 11:48:46 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Fri, 22 Feb 2008 08:48:46 -0800
Subject: [Numpy-discussion] matrix wart
In-Reply-To: <47BEF0A6.9080808@enthought.com>
References: <47BCFA8F.2020808@enthought.com> <20080221231715.GC8095@mentat.za.net> <47BEE7CE.9020302@enthought.com> <47BEF0A6.9080808@enthought.com>
Message-ID: <47BEFCEE.8040300@noaa.gov>

Travis E. Oliphant wrote:
> to behave as you described in your previous post: where M[i] returned a
> 1-d array

My thoughts on this:

As Konrad suggested, row vectors and column vectors are different beasts, and both need to be easily and intuitively available. M[i] returning a 1-d array breaks this -- that's what raw numpy arrays do, and I like it, but it's not so natural for linear algebra. If we really want to support matrixes, then no, M[i] should not return a 1-d array -- what does a 1-d array mean in the matrix/linear algebra context?

It makes me think that M[i] should not even be possible, as you would always want one of:

row vector: M[i,:]
column vector: M[:,i]
element: M[i,j]

I do like the idea of row/column vectors being different objects than matrices; then you could naturally index the elements from them. If you really want a 1-d array, you can always do: M.A[i]

What if you want to naturally iterate through all the rows, or all the columns? What about:

for row in M.rows
for column in M.columns

M.rows and M.columns would be iterators.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R           (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception
Chris.Barker at noaa.gov

From ndbecker2 at gmail.com  Fri Feb 22 11:59:16 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 22 Feb 2008 11:59:16 -0500
Subject: [Numpy-discussion] ufunc for user-defined type
References: <47BEF1C4.2000608@enthought.com>
Message-ID:

Travis E. Oliphant wrote:
> Neal Becker wrote:
>> Now that I have my user-defined type, I want to add some funcs.
>> According to the numpy book, I need to use:
>>
>> PyUFunc_RegisterLoopForType
>>
>> The book says I first need a ufunc. The only way I see to create one is
>>
>> PyUFunc_FromFuncAndData.
>>
>> Is this the correct procedure?
>>
> You'll have to do some digging because the API allowing low-level loops
> for user-defined types is not well tested or vetted. Also, don't be
> surprised if you uncover some bugs.
>
> The concept is that you create the UFunc with the "built-in" types using
> PyUFunc_FromFuncAndData and then you add loops for the user-defined type
> using RegisterLoopForType.
>
> So, you have the basic procedure down.
>
> -Travis

It looks like this works. Basically, the only thing used from the ufunc obj is: name, nin, nout, nargs, and ntypes.

// Make a generic new ufunc obj
static PyUFuncObject *
new_ufunc (char* name)
{
    PyUFuncObject * u = PyObject_New (PyUFuncObject, &PyUFunc_Type);
    u->nin = 0;
    u->nout = 0;
    u->nargs = 0;
    u->identity = 0;
    u->functions = 0;
    u->data = 0;
    u->types = 0;
    u->ntypes = 0;
    u->check_return = 0;
    u->ptr = NULL;
    u->obj = NULL;
    u->userloops = NULL;
    u->name = name;
    u->doc = "NULL";
    return u;
}

// fill in required fields
PyUFuncObject * u = new_ufunc ("magsqr");
u->nin = 1;
u->nout = 1;
u->nargs = 2;
u->ntypes = 1;
{
    int types[] = {d1->type_num, NPY_INT32};
    register_unary_loop_func (u, d1->type_num, types, mag_sqr_impl());
}

where 'register_unary_loop_func' will basically call:

PyUFunc_RegisterLoopForType (u, typenum, &loop1d_unary, types, 0);

From aisaac at american.edu  Fri Feb 22 12:52:36 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Fri, 22 Feb 2008 12:52:36 -0500
Subject: [Numpy-discussion] matrix wart
In-Reply-To: <47BEFCEE.8040300@noaa.gov>
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net><47BEE7CE.9020302@enthought.com><47BEF0A6.9080808@enthought.com><47BEFCEE.8040300@noaa.gov>
Message-ID:

On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
> It makes me think that M[i] should not even be possible,
> as you would always want one of:
> row vector: M[i,:]
> column vector: M[:,i]
> element: M[i,j]

I propose that the user-friendly question is: why deviate needlessly from array behavior? (Needlessly means: no increase in functionality.)

Cheers,
Alan Isaac

From ndbecker2 at gmail.com  Fri Feb 22 13:13:06 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 22 Feb 2008 13:13:06 -0500
Subject: [Numpy-discussion] cmp_arg_types bug?
Message-ID:

cmp_arg_types(int *arg1, int *arg2, int n)
{
    while (n--) {
        if (PyArray_EquivTypenums(*arg1, *arg2)) continue;
        if (PyArray_CanCastSafely(*arg1, *arg2))
            return -1;
        return 1;
    }
    return 0;
}

IIUC, if we can cast (arg1, arg2), we never compare other args, just return -1.

Shouldn't this be:

cmp_arg_types(int *arg1, int *arg2, int n)
{
    while (n--) {
        if (PyArray_EquivTypenums(*arg1, *arg2) or
            (PyArray_CanCastSafely(*arg1, *arg2))
            continue;
        return 1;
    }
    return 0;
}

From ndbecker2 at gmail.com  Fri Feb 22 13:45:48 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 22 Feb 2008 13:45:48 -0500
Subject: [Numpy-discussion] cmp_arg_types bug?
References:
Message-ID:

Neal Becker wrote:
> cmp_arg_types(int *arg1, int *arg2, int n)
> {
>     while (n--) {
>         if (PyArray_EquivTypenums(*arg1, *arg2)) continue;
>         if (PyArray_CanCastSafely(*arg1, *arg2))
>             return -1;
>         return 1;
>     }
>     return 0;
> }
>
> IIUC, if we can cast (arg1, arg2), we never compare other args, just return
> -1.
>
> Shouldn't this be:
>
> cmp_arg_types(int *arg1, int *arg2, int n)
> {
>     while (n--) {
>         if (PyArray_EquivTypenums(*arg1, *arg2) or
>             (PyArray_CanCastSafely(*arg1, *arg2))
>             continue;
>         return 1;
>     }
>     return 0;
> }

Oh, better make that:

static int
cmp_arg_types(int *arg1, int *arg2, int n)
{
    for (; n > 0; n--, ++arg1, ++arg2) {
        if (PyArray_EquivTypenums(*arg1, *arg2) ||
            PyArray_CanCastSafely(*arg1, *arg2)) continue;
        return 1;
    }
    return 0;
}

From Chris.Barker at noaa.gov  Fri Feb 22 13:48:02 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Fri, 22 Feb 2008 10:48:02 -0800
Subject: [Numpy-discussion] matrix wart
In-Reply-To:
References: <47BCFA8F.2020808@enthought.com> <20080221231715.GC8095@mentat.za.net> <47BEE7CE.9020302@enthought.com> <47BEF0A6.9080808@enthought.com> <47BEFCEE.8040300@noaa.gov>
Message-ID: <47BF18E2.50306@noaa.gov>

Alan G Isaac wrote:
> I propose that the user-friendly question is:
> why deviate needlessly from array behavior?

because that's the whole point of a Matrix object in the first place.

> (Needlessly means: no increase in functionality.)

Functionally, you can do everything you need to do with numpy arrays. The only reason there is a matrix class is to create a more natural, and readable way to do linear algebra. That's why the current version always returns matrices -- people don't want to have to keep converting back to matrices from arrays.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R           (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception
Chris.Barker at noaa.gov

From aisaac at american.edu  Fri Feb 22 14:11:26 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Fri, 22 Feb 2008 14:11:26 -0500
Subject: [Numpy-discussion] matrix wart
In-Reply-To: <47BF18E2.50306@noaa.gov>
References: <47BCFA8F.2020808@enthought.com><20080221231715.GC8095@mentat.za.net><47BEE7CE.9020302@enthought.com><47BEF0A6.9080808@enthought.com> <47BEFCEE.8040300@noaa.gov><47BF18E2.50306@noaa.gov>
Message-ID:

> Alan G Isaac wrote:
>> I propose that the user-friendly question is:
>> why deviate needlessly from array behavior?
>> (Needlessly means: no increase in functionality.)

On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
> because that's the whole point of a Matrix object in the
> first place.

Do you really believe that? As phrased?? (Out of curiosity: do you use matrices?)

On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
> Functionally, you can do everything you need to do with numpy arrays.

That is a pretty narrow concept of functionality, which excludes all user convenience aspects. I do not understand why you are introducing it; it seems irrelevant. If you push this line of reasoning, you should just tell me I can do it all in C.

On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
> The only reason there is a matrix class is to create
> a more natural, and readable way to do linear algebra.
> That's why the current version always returns matrices --
> people don't want to have to keep converting back to
> matrices from arrays.

You are begging the question. Of course we want to be able to conveniently extract submatrices and build new matrices. Nobody has challenged that or proposed otherwise. Or are you complaining that you would have to type M[i,:] instead of M[i]? (No, that cannot be; you were proposing that M[i] be an error...)
Alan Isaac

From ndbecker2 at gmail.com  Fri Feb 22 15:38:47 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 22 Feb 2008 15:38:47 -0500
Subject: [Numpy-discussion] user-defined types mixed arithmetic
Message-ID:

I have 2 user-defined types, and simple arithmetic working for the cases I registered with PyUFunc_RegisterLoopForType. I'd like to use automatic conversion to do mixed arithmetic between these 2 types.

I did PyArray_RegisterCastFunc, and it seems this allows explicit conversion:

>>> a
array([(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0)], dtype=cmplx_int32)
>>> b
array([(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0)], dtype=cmplx_int64)
>>> array (a,dtype=b.dtype)
array([(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0)], dtype=cmplx_int64)
>>>

But mixed mode arithmetic gives:

a+b
TypeError: function not supported for these types, and can't coerce safely to supported types

I've been trying to understand this 'coerce' without much luck. Any clues what I need to do here?

From dalcinl at gmail.com  Fri Feb 22 17:05:42 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 22 Feb 2008 19:05:42 -0300
Subject: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
In-Reply-To: <47BCFA8F.2020808@enthought.com>
References: <47BCFA8F.2020808@enthought.com>
Message-ID:

Travis, after reading all the posts on this thread, here are my comments. First of all, I'm definitely +1 on your suggestion. Below my rationale.

* I believe numpy scalars should provide all possible features needed to smooth the difference between mutable, indexable 0-d arrays and immutable, non-indexable builtin Python numeric types.

* Given that in the context of generic multi-dimensional array processing a 0-d array is a more natural and useful concept than a Python 'int' or 'float', I really think that numpy scalars should follow as much as possible the behavior of 0-d arrays (of course, retaining immutability).

* Numpy scalars already have (thanks for that!) a very, very similar API to ndarrays. You can ask for 'size', 'shape', etc. (BTW, why does scalar.fill(x) not generate any error????). Why not add indexing as well?

* However, I'm not sure about the proposal of supporting len(); I'm -0 on this point. Anyway, if this is added, then 0-d arrays should also have to support it. And then... is len(scalar) or len(0-d-array) going to return 0 (zero)?

Regards.

On 2/21/08, Travis E. Oliphant wrote:
> In writing some generic code, I've encountered situations where it would
> reduce code complexity to allow NumPy scalars to be "indexed" in the
> same number of limited ways, that 0-d arrays support.
>
> For example, 0-d arrays can be indexed with
>
> * Boolean masks
> * Ellipses x[...] and x[..., newaxis]
> * Empty tuple x[()]
>
> I think that numpy scalars should also be indexable in these particular
> cases as well (read-only of course, i.e. no setting of the value would
> be possible).
>
> This is an easy change to implement, and I don't think it would cause
> any backward compatibility issues.
>
> Any opinions from the list?
>
>
> Best regards,
>
> -Travis O.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From millman at berkeley.edu  Fri Feb 22 18:26:33 2008
From: millman at berkeley.edu (Jarrod Millman)
Date: Fri, 22 Feb 2008 15:26:33 -0800
Subject: [Numpy-discussion] Faster array version of ndindex
In-Reply-To: <463e11f90712131533u17e5a976gb1e4b31ab61107a7@mail.gmail.com>
References: <463e11f90712131533u17e5a976gb1e4b31ab61107a7@mail.gmail.com>
Message-ID:

Could you provide more details about this to the ticket I created based on your email:
http://projects.scipy.org/scipy/numpy/ticket/636

Thanks,

On Thu, Dec 13, 2007 at 3:33 PM, Jonathan Taylor wrote:
> I was needing an array representation of ndindex since ndindex only
> gives an iterator but array(list(ndindex)) takes too long. There is
> prob some obvious way to do this I am missing but if not feel free to
> include this code which is much faster.
>
> In [252]: time a=np.array(list(np.ndindex(10,10,10,10,10,10)))
> CPU times: user 11.61 s, sys: 0.09 s, total: 11.70 s
> Wall time: 11.82
>
> In [253]: time a=ndtuples(10,10,10,10,10,10)
> CPU times: user 0.32 s, sys: 0.21 s, total: 0.53 s
> Wall time: 0.60
>
> def ndtuples(*dims):
>     """Fast implementation of array(list(ndindex(*dims)))."""
>
>     # Need a list because we will go through it in reverse popping
>     # off the size of the last dimension.
>     dims = list(dims)
>
>     # N will keep track of the current length of the indices.
>     N = dims.pop()
>
>     # At the beginning the current list of indices just ranges over the
>     # last dimension.
>     cur = np.arange(N)
>     cur = cur[:,np.newaxis]
>
>     while dims != []:
>
>         d = dims.pop()
>
>         # This repeats the current set of indices d times.
>         # e.g. [0,1,2] -> [0,1,2,0,1,2,...,0,1,2]
>         cur = np.kron(np.ones((d,1)),cur)
>
>         # This ranges over the new dimension and 'stretches' it by N.
>         # e.g. [0,1,2] -> [0,0,...,0,1,1,...,1,2,2,...,2]
>         front = np.arange(d).repeat(N)[:,np.newaxis]
>
>         # This puts these two together.
>         cur = np.column_stack((front,cur))
>         N *= d
>
>     return cur
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From ondrej at certik.cz  Fri Feb 22 19:48:52 2008
From: ondrej at certik.cz (Ondrej Certik)
Date: Sat, 23 Feb 2008 01:48:52 +0100
Subject: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the wrong result with weights
Message-ID: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>

Hi,

more details in this bug report.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=467095

The bug report offers a fix for this problem. It seems to me this is not fixed even in the latest svn.
Thanks,
Ondrej

From donaldsona at battelle.org  Fri Feb 22 19:57:04 2008
From: donaldsona at battelle.org (Alex Donaldson)
Date: Sat, 23 Feb 2008 00:57:04 +0000 (UTC)
Subject: [Numpy-discussion] Problems installing numpy on Mac OS 10.4
Message-ID: 

I'm trying to install numpy 1.0.4 on my Intel MacBook Pro with Mac OS 10.4
and running Python 2.5.  When I run python setup.py build I get the errors
below.  Is this a compatibility issue?:

Running from numpy source directory.
non-existing path in 'numpy/distutils': 'site.cfg'
F2PY Version 2_4422
blas_opt_info:
  FOUND:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
    extra_compile_args = ['-faltivec',
'-I/System/Library/Frameworks/vecLib.framework/Headers']

lapack_opt_info:
  FOUND:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
    extra_compile_args = ['-faltivec']

running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
building py_modules sources
building extension "numpy.core.multiarray" sources
Generating build/src.macosx-10.3-fat-2.5/numpy/core/config.h
customize NAGFCompiler
Could not locate executable f95
customize AbsoftFCompiler
Could not locate executable f90
Could not locate executable f77
customize IBMFCompiler
Could not locate executable xlf90
Could not locate executable xlf
customize IntelFCompiler
Could not locate executable ifort
Could not locate executable ifc
customize GnuFCompiler
Could not locate executable g77
customize Gnu95FCompiler
Could not locate executable gfortran
customize G95FCompiler
Could not locate executable g95
....
SystemError: Failed to test configuration.  See previous error messages
for more information.

From robert.kern at gmail.com  Fri Feb 22 20:09:02 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 22 Feb 2008 19:09:02 -0600
Subject: [Numpy-discussion] Problems installing numpy on Mac OS 10.4
In-Reply-To: 
References: 
Message-ID: <3d375d730802221709p319cab4cu1a726402e4aee74f@mail.gmail.com>

On Fri, Feb 22, 2008 at 6:57 PM, Alex Donaldson wrote:
> I'm trying to install numpy 1.0.4 on my Intel MacBook Pro with Mac OS 10.4
>  and running Python 2.5.  When I run python setup.py build I get the
>  errors below.  Is this a compatibility issue?:

I don't know. Please provide the entire output.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From oliphant at enthought.com  Fri Feb 22 20:10:32 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 22 Feb 2008 19:10:32 -0600
Subject: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
	wrong result with weights
In-Reply-To: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>
References: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>
Message-ID: <47BF7288.7060206@enthought.com>

Ondrej Certik wrote:
> Hi,
>
> more details in this bug report.
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=467095
>
> The bug report offers a fix for this problem. It seems to me this is
> not fixed even in the latest svn.
>

Is there a ticket on the NumPy trac for this?  We won't see it if there
isn't.  Thanks for pointing us to the bug.
-Travis

From ondrej at certik.cz  Fri Feb 22 20:24:31 2008
From: ondrej at certik.cz (Ondrej Certik)
Date: Sat, 23 Feb 2008 02:24:31 +0100
Subject: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
	wrong result with weights
In-Reply-To: <47BF7288.7060206@enthought.com>
References: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>
	<47BF7288.7060206@enthought.com>
Message-ID: <85b5c3130802221724o729911beuc2c3c804c190bea1@mail.gmail.com>

On Sat, Feb 23, 2008 at 2:10 AM, Travis E. Oliphant wrote:
>
>  Ondrej Certik wrote:
>  > Hi,
>  >
>  > more details in this bug report.
>  >
>  > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=467095
>  >
>  > The bug report offers a fix for this problem. It seems to me this is
>  > not fixed even in the latest svn.
>  >
>
>  Is there a ticket on the NumPy trac for this?  We won't see it if there
>  isn't.  Thanks for pointing us to the bug.

I'll add it. I registered on the trac, as required, but I am still
denied when filling in my username and password to log in.
How can I create an account?

Ondrej

From robert.kern at gmail.com  Fri Feb 22 20:30:42 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 22 Feb 2008 19:30:42 -0600
Subject: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
	wrong result with weights
In-Reply-To: <85b5c3130802221724o729911beuc2c3c804c190bea1@mail.gmail.com>
References: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>
	<47BF7288.7060206@enthought.com>
	<85b5c3130802221724o729911beuc2c3c804c190bea1@mail.gmail.com>
Message-ID: <47BF7742.3070600@gmail.com>

Ondrej Certik wrote:
> On Sat, Feb 23, 2008 at 2:10 AM, Travis E. Oliphant
>  wrote:
>> Ondrej Certik wrote:
>> > Hi,
>> >
>> > more details in this bug report.
>> >
>> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=467095
>> >
>> > The bug report offers a fix for this problem. It seems to me this is
>> > not fixed even in the latest svn.
>> >
>> Is there a ticket on the NumPy trac for this?  We won't see it if there
>> isn't.  Thanks for pointing us to the bug.
>
> I'll add it. I registered on the trac, as required, but I am still
> denied when filling in my username and password to log in.
> How can I create an account?

That should have done it. When you say you are "denied", exactly what
happens? I've run into times when I've logged in and I get the unaltered
front page again. Logging in again usually works.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From donaldsona at battelle.org  Fri Feb 22 20:34:20 2008
From: donaldsona at battelle.org (Alex Donaldson)
Date: Sat, 23 Feb 2008 01:34:20 +0000 (UTC)
Subject: [Numpy-discussion] Problems installing numpy on Mac OS 10.4
References: <3d375d730802221709p319cab4cu1a726402e4aee74f@mail.gmail.com>
Message-ID: 

Running from numpy source directory.
non-existing path in 'numpy/distutils': 'site.cfg'
F2PY Version 2_4422
blas_opt_info:
  FOUND:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
    extra_compile_args = ['-faltivec',
'-I/System/Library/Frameworks/vecLib.framework/Headers']

lapack_opt_info:
  FOUND:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
    extra_compile_args = ['-faltivec']

running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
building py_modules sources
building extension "numpy.core.multiarray" sources
Generating build/src.macosx-10.3-fat-2.5/numpy/core/config.h
customize NAGFCompiler
Could not locate executable f95
customize AbsoftFCompiler
Could not locate executable f90
Could not locate executable f77
customize IBMFCompiler
Could not locate executable xlf90
Could not locate executable xlf
customize IntelFCompiler
Could not locate executable ifort
Could not locate executable ifc
customize GnuFCompiler
Could not locate executable g77
customize Gnu95FCompiler
Could not locate executable gfortran
customize G95FCompiler
Could not locate executable g95
don't know how to compile Fortran code on platform 'posix'
C compiler: gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd
-fno-common -dynamic -DNDEBUG -g -O3

compile options:
'-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5
-Inumpy/core/src -Inumpy/core/include
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c'
gcc: _configtest.c
sh: line 1: gcc: command not found
sh: line 1: gcc: command not found
failure.
removing: _configtest.c _configtest.o
Traceback (most recent call last):
  File "setup.py", line 89, in 
    setup_package()
  File "setup.py", line 82, in setup_package
    configuration=configuration )
  File "/Users/donaldsona/Desktop/numpy-1.0.4/numpy/distutils/core.py", line 176, in setup
    return old_setup(**new_attr)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 974, in run_commands
    self.run_command(cmd)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 994, in run_command
    cmd_obj.run()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/command/build.py", line 112, in run
    self.run_command(cmd_name)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/cmd.py", line 333, in run_command
    self.distribution.run_command(command)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 994, in run_command
    cmd_obj.run()
  File "/Users/donaldsona/Desktop/numpy-1.0.4/numpy/distutils/command/build_src.py", line 130, in run
    self.build_sources()
  File "/Users/donaldsona/Desktop/numpy-1.0.4/numpy/distutils/command/build_src.py", line 147, in build_sources
    self.build_extension_sources(ext)
  File "/Users/donaldsona/Desktop/numpy-1.0.4/numpy/distutils/command/build_src.py", line 250, in build_extension_sources
    sources = self.generate_sources(sources, ext)
  File "/Users/donaldsona/Desktop/numpy-1.0.4/numpy/distutils/command/build_src.py", line 307, in generate_sources
    source = func(extension, build_dir)
  File "numpy/core/setup.py", line 53, in generate_config_h
    raise SystemError,"Failed to test configuration. "\
SystemError: Failed to test configuration.  See previous error messages
for more information.

From robert.kern at gmail.com  Fri Feb 22 20:39:39 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 22 Feb 2008 19:39:39 -0600
Subject: [Numpy-discussion] Problems installing numpy on Mac OS 10.4
In-Reply-To: 
References: <3d375d730802221709p319cab4cu1a726402e4aee74f@mail.gmail.com>
Message-ID: <47BF795B.50105@gmail.com>

Alex Donaldson wrote:
> compile options:
> '-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5
> -Inumpy/core/src -Inumpy/core/include
> -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c'
> gcc: _configtest.c
> sh: line 1: gcc: command not found

This is your problem. You need to install the Developer Tools. I believe
they are included on your OS X 10.4 installation DVD. Alternatively, you
can download them from Apple:

  http://developer.apple.com/tools/download/

You will need Xcode 2.5. You will need to register (for free) to be part
of the Apple Developer Connection.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From Chris.Barker at noaa.gov  Fri Feb 22 22:05:32 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Fri, 22 Feb 2008 19:05:32 -0800
Subject: [Numpy-discussion] matrix wart
In-Reply-To: 
References: <47BCFA8F.2020808@enthought.com>
	<20080221231715.GC8095@mentat.za.net> <47BEE7CE.9020302@enthought.com>
	<47BEF0A6.9080808@enthought.com> <47BEFCEE.8040300@noaa.gov>
	<47BF18E2.50306@noaa.gov>
Message-ID: <47BF8D7C.7090601@noaa.gov>

Alan G Isaac wrote:
> On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
>> because that's the whole point of a Matrix object in the
>> first place.
>
> Do you really believe that?  As phrased??

yes -- the matrix object is about style, not functionality -- not that
style isn't important

> (Out of curiosity: do you use matrices?)

No. In fact, that's one of the reasons I was overjoyed to find Numeric
after using Matlab for a long time -- I hardly ever need linear algebra;
what I need is n-d arrays. So, yes, I should just shut up and leave the
discussion to those that really want to use them.

I will note, however, that in reading this list for years, I haven't
found that many people really do want matrices -- they are asked for a
lot by Matlab converts, but then the users often find that they can more
easily do what they want with arrays after all. Maybe that's because the
Matrix API needs improvement, so I guess what we really need is someone
who really wants them to champion the cause.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

From peridot.faceted at gmail.com  Sat Feb 23 00:55:32 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Sat, 23 Feb 2008 00:55:32 -0500
Subject: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
	wrong result with weights
In-Reply-To: <47BF7288.7060206@enthought.com>
References: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>
	<47BF7288.7060206@enthought.com>
Message-ID: 

On 22/02/2008, Travis E. Oliphant wrote:
> Is there a ticket on the NumPy trac for this?  We won't see it if there
> isn't.  Thanks for pointing us to the bug.

It appears to be fixed in SVN (that was quick!). But the Debian bug
report also points out a peculiar unnecessary use of eval; the code is
also slower and uses more memory than it has to. Attached is a patch to
cure that.

Anne
-------------- next part --------------
A non-text attachment was scrubbed...
Name: average.patch
Type: text/x-patch
Size: 693 bytes
Desc: not available
URL: 

From charlesr.harris at gmail.com  Sat Feb 23 22:22:01 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 23 Feb 2008 20:22:01 -0700
Subject: [Numpy-discussion] Closing tickets
Message-ID: 

I believe there are a number of tickets that can be closed. Will the
relevant folks please comment on the following:

#632: numpy.histogram fails with bin=
#568: Aligned allocator for numpy
#558: 'axis' support for numpy.median()
#555: setting random seed does not work with a numpy.int64
#551: numpy.ndarray messed up after unpickling
#69: Why numpy headers are installed using add_data_dir and not
add_headers?
#637: typo in scalarmathmodule.c.src for ctype_negative

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fullung at gmail.com  Sat Feb 23 23:39:12 2008
From: fullung at gmail.com (Albert Strasheim)
Date: Sun, 24 Feb 2008 06:39:12 +0200
Subject: [Numpy-discussion] Closing tickets
References: 
Message-ID: <005501c8769f$341cf610$c884e892@sun.ac.za>

Hello,

Here are two tickets without a milestone that could probably do with some
attention (setting a milestone would be a start):

numpy.scons branch: setuptools' develop mode broken
http://scipy.org/scipy/numpy/ticket/596

If float('123.45') works, so should numpy.float32('123.45')
http://scipy.org/scipy/numpy/ticket/630

Cheers,

Albert
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fullung at gmail.com  Sat Feb 23 23:41:16 2008
From: fullung at gmail.com (Albert Strasheim)
Date: Sun, 24 Feb 2008 06:41:16 +0200
Subject: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
	wrong result with weights
References: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>
	<47BF7288.7060206@enthought.com>
	<85b5c3130802221724o729911beuc2c3c804c190bea1@mail.gmail.com>
	<47BF7742.3070600@gmail.com>
Message-ID: <006801c8769f$7e3a3280$c884e892@sun.ac.za>

Hello,

----- Original Message -----
From: "Robert Kern"
To: "Discussion of Numerical Python"
Sent: Saturday, February 23, 2008 3:30 AM
Subject: Re: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
wrong result with weights

> Ondrej Certik wrote:
>> I'll add it. I registered on the trac, as required, but I am still
>> denied when filling in my username and password to log in.
>> How can I create an account?
>
> That should have done it. When you say you are "denied", exactly what
> happens?
> I've run into times when I've logged in and I get the unaltered front
> page again. Logging in again usually works.

There is something strange going on. Logging in on
projects.scipy.org/scipy/numpy usually redirects you to
scipy.org/scipy/numpy, at which point you need to log in again.

Cheers,

Albert

From david at ar.media.kyoto-u.ac.jp  Sun Feb 24 00:47:50 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Sun, 24 Feb 2008 14:47:50 +0900
Subject: [Numpy-discussion] Closing tickets
In-Reply-To: 
References: 
Message-ID: <47C10506.6070802@ar.media.kyoto-u.ac.jp>

Charles R Harris wrote:
> I believe there are a number of tickets that can be closed. Will the
> relevant folks please comment on the following:
>
> #632: numpy.histogram fails with bin=
> #568: Aligned allocator for numpy

This one is not done; the ticket should not be closed. There was some
discussion about it, but it did not go far enough. I don't see this
going in 1.0.5; I am less convinced now that it is useful as it is (I
certainly still believe we need an aligned allocator, though).

> #558: 'axis' support for numpy.median()
> #555: setting random seed does not work with a numpy.int64
> #551: numpy.ndarray messed up after unpickling
> #69: Why numpy headers are installed using add_data_dir and not
> add_headers?

I can't edit the main wiki page, so I cannot add the FAQ link. I would
be happy to add the last description on the ticket, if Robert thinks
this one is OK.
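For illustration, the usual trick is to over-allocate and then offset to
the alignment boundary. A Python-level sketch only, with a made-up helper
name (aligned_empty), not the C API the ticket proposes:

>>> import numpy as np
>>> def aligned_empty(n, dtype, align=16):
...     """Return an n-element array whose data pointer is align-byte aligned."""
...     itemsize = np.dtype(dtype).itemsize
...     # over-allocate by 'align' bytes, then skip to the boundary
...     buf = np.empty(n*itemsize + align, dtype=np.uint8)
...     offset = (-buf.ctypes.data) % align
...     return buf[offset:offset + n*itemsize].view(dtype)
...
>>> aligned_empty(100, np.float64).ctypes.data % 16
0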
cheers,

David

From david at ar.media.kyoto-u.ac.jp  Sun Feb 24 00:52:30 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Sun, 24 Feb 2008 14:52:30 +0900
Subject: [Numpy-discussion] Closing tickets
In-Reply-To: <005501c8769f$341cf610$c884e892@sun.ac.za>
References: <005501c8769f$341cf610$c884e892@sun.ac.za>
Message-ID: <47C1061E.5020809@ar.media.kyoto-u.ac.jp>

Albert Strasheim wrote:
> Hello,
>
> Here are two tickets without a milestone that could probably do with
> some attention (setting a milestone would be a start):
>
> numpy.scons branch: setuptools' develop mode broken
> http://scipy.org/scipy/numpy/ticket/596

I would change the ticket title to "setuptools develop mode doesn't
work", and then mark it as fixed. I don't think that develop mode works
with numscons, but that should be a separate issue. Is it OK to change
the title of a ticket?

cheers,

David

From berthe.loic at gmail.com  Sun Feb 24 01:23:59 2008
From: berthe.loic at gmail.com (Loïc BERTHE)
Date: Sun, 24 Feb 2008 07:23:59 +0100
Subject: [Numpy-discussion] numpy.test() fails if it runs after scipy.test()
Message-ID: 

Hi,

I've got one failure if I run numpy.test() after running scipy.test():

======================================================================
ERROR: Ticket #396
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/core/tests/test_regression.py", line 600, in check_poly1d_nan_roots
    self.failUnlessRaises(N.linalg.LinAlgError,getattr,p,"r")
  File "/home/loic/tmp/Linux/lib/python2.5/unittest.py", line 320, in failUnlessRaises
    callableObj(*args, **kwargs)
  File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/polynomial.py", line 623, in __getattr__
    return roots(self.coeffs)
  File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/polynomial.py", line 124, in roots
    roots = _eigvals(A)
  File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/polynomial.py", line 37, in _eigvals
    return eigvals(arg)
  File "/home/loic/tmp/Linux/lib/python2.5/site-packages/scipy/linalg/decomp.py", line 378, in eigvals
    return eig(a,b=b,left=0,right=0,overwrite_a=overwrite_a)
  File "/home/loic/tmp/Linux/lib/python2.5/site-packages/scipy/linalg/decomp.py", line 128, in eig
    a1 = asarray_chkfinite(a)
  File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/function_base.py", line 398, in asarray_chkfinite
    raise ValueError, "array must not contain infs or NaNs"
ValueError: array must not contain infs or NaNs

But I've got no error if I begin with numpy.test(). I've seen that
Ticket #396 seems closed in Trac; should I reopen it?

For more information, I've attached the results of scipy.test and
numpy.test.

Regards,

--
LB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scipy_numpy.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: numpy_scipy.txt
URL: 

From ondrej at certik.cz  Sun Feb 24 04:31:40 2008
From: ondrej at certik.cz (Ondrej Certik)
Date: Sun, 24 Feb 2008 10:31:40 +0100
Subject: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
	wrong result with weights
In-Reply-To: <006801c8769f$7e3a3280$c884e892@sun.ac.za>
References: <85b5c3130802221648w61a0e181w25f9b9acb939c76e@mail.gmail.com>
	<47BF7288.7060206@enthought.com>
	<85b5c3130802221724o729911beuc2c3c804c190bea1@mail.gmail.com>
	<47BF7742.3070600@gmail.com>
	<006801c8769f$7e3a3280$c884e892@sun.ac.za>
Message-ID: <85b5c3130802240131u6139c76ct8f9a29ed1cf6ce8@mail.gmail.com>

On Sun, Feb 24, 2008 at 5:41 AM, Albert Strasheim wrote:
> Hello,
>
> ----- Original Message -----
> From: "Robert Kern"
> To: "Discussion of Numerical Python"
> Sent: Saturday, February 23, 2008 3:30 AM
> Subject: Re: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the
> wrong result with weights
>
> > Ondrej Certik wrote:
> >> I'll add it. I registered on the trac, as required, but I am still
> >> denied when filling in my username and password to log in.
> >> How can I create an account?
> >
> > That should have done it. When you say you are "denied", exactly what
> > happens?
> > I've run into times when I've logged in and I get the unaltered front
> > page again. Logging in again usually works.
>
> There is something strange going on. Logging in on
> projects.scipy.org/scipy/numpy usually redirects you to
> scipy.org/scipy/numpy, at which point you need to log in again.

I tried it several times now - register -> login -> it doesn't accept
the password. Then it redirects to scipy.org/scipy/numpy as you said.
Then I went back to the original page and the login worked. (But I did
the same trick before and it didn't work.)

Anyway, it seems to be working now, thanks for the help.

Ondrej

From matthew.brett at gmail.com  Sun Feb 24 13:09:13 2008
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 24 Feb 2008 18:09:13 +0000
Subject: [Numpy-discussion] Closing tickets
In-Reply-To: 
References: 
Message-ID: <1e2af89e0802241009v7c8b2138jdf584d8d1c7ad17e@mail.gmail.com>

Hi,

> #558: 'axis' support for numpy.median()

This one should be fixed, and can be closed. I don't think I have wiki
login rights though.

Thanks,

Matthew

From chanley at stsci.edu  Mon Feb 25 08:33:31 2008
From: chanley at stsci.edu (Christopher Hanley)
Date: Mon, 25 Feb 2008 08:33:31 -0500
Subject: [Numpy-discussion] build issue on Solaris System
Message-ID: <47C2C3AB.3020706@stsci.edu>

Good morning,

I've just done a fresh checkout from SVN and attempted to build numpy on
a Solaris 10 system. The build actually completes normally. However,
when attempting to import numpy I get the following error:

Python 2.5.1 (r251:54863, Jan 21 2008, 11:03:00) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "", line 1, in 
  File "/data/basil5/site-packages/lib/python/numpy/__init__.py", line 51, in 
    import linalg
  File "/data/basil5/site-packages/lib/python/numpy/linalg/__init__.py", line 4, in 
    from linalg import *
  File "/data/basil5/site-packages/lib/python/numpy/linalg/linalg.py", line 28, in 
    from numpy.linalg import lapack_lite
ImportError: ld.so.1: python: fatal: relocation error: file
/data/basil5/site-packages/lib/python/numpy/linalg/lapack_lite.so:
symbol s_cat: referenced symbol not found
>>>

I haven't had these issues with lapack in the past.
Has anyone made changes to linalg dependencies lately that I must have
missed?

Thank you for your time and help,

Chris

--
Christopher Hanley
Systems Software Engineer
Space Telescope Science Institute
3700 San Martin Drive
Baltimore MD, 21218
(410) 338-4338

From bolme1234 at comcast.net  Mon Feb 25 12:28:12 2008
From: bolme1234 at comcast.net (David Bolme)
Date: Mon, 25 Feb 2008 10:28:12 -0700
Subject: [Numpy-discussion] Closing tickets
In-Reply-To: 
References: 
Message-ID: <72B0ED5E-0B61-4431-9697-CE9DE7F5B7E0@comcast.net>

> #551: numpy.ndarray messed up after unpickling

I have added a comment to this one. I don't think it should be closed.
I think there is a problem with initialization of the ndarray. The
normal constructor works fine; loading from pickle does not.

From debl2 at verizon.net  Mon Feb 25 13:44:38 2008
From: debl2 at verizon.net (debl2 at verizon.net)
Date: Mon, 25 Feb 2008 12:44:38 -0600 (CST)
Subject: [Numpy-discussion] How do I change endianness?
Message-ID: <27276504.3892761203965078678.JavaMail.root@vms075.mailsrvcs.net>

I would like to change the endianness of a large file of data written on
a PC so I can process it on a Solaris box. I see that the dtype.str
attribute is read-only.

TIA

David Lees

From robert.kern at gmail.com  Mon Feb 25 13:51:31 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 25 Feb 2008 12:51:31 -0600
Subject: [Numpy-discussion] How do I change endianness?
In-Reply-To: <27276504.3892761203965078678.JavaMail.root@vms075.mailsrvcs.net>
References: <27276504.3892761203965078678.JavaMail.root@vms075.mailsrvcs.net>
Message-ID: <3d375d730802251051v5a440bdfv63ff5e0a85dd89c3@mail.gmail.com>

On Mon, Feb 25, 2008 at 12:44 PM, wrote:
> I would like to change the endianness of a large file of data written on
> a PC so I can process it on a Solaris box. I see that the dtype.str
> attribute is read-only.

Read it in with the appropriately-endian dtype in the first place.

import numpy
dt = numpy.dtype(numpy.int32).newbyteorder('<')

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From chanley at stsci.edu  Mon Feb 25 13:57:05 2008
From: chanley at stsci.edu (Christopher Hanley)
Date: Mon, 25 Feb 2008 13:57:05 -0500
Subject: [Numpy-discussion] How do I change endianness?
In-Reply-To: <27276504.3892761203965078678.JavaMail.root@vms075.mailsrvcs.net>
References: <27276504.3892761203965078678.JavaMail.root@vms075.mailsrvcs.net>
Message-ID: <47C30F81.7070800@stsci.edu>

debl2 at verizon.net wrote:
> I would like to change the endianness of a large file of data written on
> a PC so I can process it on a Solaris box. I see that the dtype.str
> attribute is read-only.
>
> TIA
>
> David Lees
>
>

You can use the byteswap method to change the byte order of your array.

>>> obj = obj.byteswap()

--
Christopher Hanley
Systems Software Engineer
Space Telescope Science Institute
3700 San Martin Drive
Baltimore MD, 21218
(410) 338-4338

From gingekerr at gmail.com  Mon Feb 25 15:58:24 2008
From: gingekerr at gmail.com (Christopher Kerr)
Date: Mon, 25 Feb 2008 20:58:24 +0000
Subject: [Numpy-discussion] numpy.random.randint() inconsistent with plain
	random.randint()
Message-ID: 

I don't know if this is the right place to report bugs, but I couldn't
find anywhere else on the website...

random.randint(min,max) from python core returns an integer between min
and max inclusive.
The documentation on the website says that numpy.random.randint(min,max
[,size]) does this too, but it in fact only ever returns numbers strictly
less than the max, and gives an error if min is equal to max.

Python 2.4.4 (#1, Dec 20 2007, 08:43:49)
[GCC 4.1.2 20070214 ( (gdc 0.24, using dmd 1.020)) (Gentoo 4.1.2 p1.0.2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> import numpy.random
>>> numpy.version.version
'1.0.4'
>>> random.randint(3,3)
3
>>> numpy.random.randint(3,3)
Traceback (most recent call last):
  File "", line 1, in ?
  File "mtrand.pyx", line 600, in mtrand.RandomState.randint
ValueError: low >= high
>>> numpy.random.randint(1,3,(2,10))
array([[2, 1, 2, 1, 2, 1, 2, 1, 2, 1],
       [1, 2, 1, 2, 1, 1, 1, 2, 1, 2]])

From robert.kern at gmail.com  Mon Feb 25 16:39:35 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 25 Feb 2008 15:39:35 -0600
Subject: [Numpy-discussion] numpy.random.randint() inconsistent with plain
	random.randint()
In-Reply-To: 
References: 
Message-ID: <3d375d730802251339s5def4mc177932889d5a2e6@mail.gmail.com>

On Mon, Feb 25, 2008 at 2:58 PM, Christopher Kerr wrote:
> I don't know if this is the right place to report bugs, but I couldn't
> find anywhere else on the website...
>
> random.randint(min,max) from python core returns an integer between min
> and max inclusive. The documentation on the website says that
> numpy.random.randint(min,max [,size]) does this too, but it in fact only
> ever returns numbers strictly less than the max, and gives an error if
> min is equal to max

The documentation on what website? It needs to be fixed.
numpy.random.randint() is behaving correctly. numpy.random is not
intended to replace the standard library's random module.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From pisuke at blueyonder.co.uk  Mon Feb 25 18:38:42 2008
From: pisuke at blueyonder.co.uk (Francesco Anselmo)
Date: Mon, 25 Feb 2008 23:38:42 +0000
Subject: [Numpy-discussion] py2exe, pyinstaller and numpy
Message-ID: <1203982722.22567.27.camel@leviathan>

Hi all!

I have just subscribed to this mailing list and would like to say hello
to everybody and a big thank you to all the people involved in the
development of numpy and scipy.

My first question is about packaging numpy scripts into executables.
I have googled around and read mailing-list posts, but I haven't found a
definitive answer yet: is there a working method of "compiling" numpy
and scipy scripts into binaries on Windows?

At best, I manage to create the executable, but then get errors like
this one:

No scipy-style subpackage 'dft' found in
Z:\Projects\lightinglib\dist\library.zip\numpy. Ignoring: cannot import
name deprecate
Traceback (most recent call last):
  File "uv2lmap.py", line 5, in ?
  File "zipextimporter.pyc", line 82, in load_module
  File "threedee\__init__.pyc", line 5, in ?
  File "zipextimporter.pyc", line 82, in load_module
  File "threedee\mapping.pyc", line 14, in ?
  File "zipextimporter.pyc", line 82, in load_module
  File "numpy\__init__.pyc", line 35, in ?
  File "numpy\_import_tools.pyc", line 173, in __call__
  File "numpy\_import_tools.pyc", line 68, in _init_info_modules
  File "", line 1, in ?
  File "zipextimporter.pyc", line 82, in load_module
  File "numpy\random\__init__.pyc", line 3, in ?
File "zipextimporter.pyc", line 98, in load_module File "numpy.pxi", line 32, in mtrand AttributeError: 'module' object has no attribute 'dtype' Many thanks in advance. Francesco From trond at unc.edu Mon Feb 25 21:08:16 2008 From: trond at unc.edu (Trond Kristiansen) Date: Mon, 25 Feb 2008 21:08:16 -0500 Subject: [Numpy-discussion] Optimize speed of for loop using numpy In-Reply-To: Message-ID: Hi all. This is my first email to the discussion group. I have spent two days trying to get a particular loop to speed up, and the best result I got was this: tmp1=zeros((eta,xi),float) tmp2=zeros((eta,xi),float) tmp1=tmp1+10000 tmp2=tmp2+10000 for i in range(xi): for j in range(eta): for k in range(s): if z_r[k,j,i] < depth: if tmp1[j,i]==10000: if (depth - z_r[k,j,i]) <= (depth - z_r[0,j,i]): tmp1[j,i]=k else: if (depth - z_r[k,j,i]) <= (depth - z_r[int(tmp1[j,i]),j,i]): tmp1[j,i]=k elif z_r[k,j,i] >= depth: if tmp2[j,i]==10000: if abs(depth - z_r[k,j,i]) <= abs(depth - z_r[s-1,j,i]) : tmp2[j,i]=k else: if abs(depth - z_r[k,j,i]) <= abs(depth - z_r[int(tmp2[j,i]),j,i]) : tmp2[j,i]=k Not very impressive. My problem is that I can not find any numpy functions that actually can do the tests I do for each k value in the arrays. I need to identify the position (i,j) of k-values that meet the specified requirement. There are way too many if-else tests here as well. The scripts takes about 10 seconds to run (for 238MB input file), but these arrays are read from netCDF files and can be much larger and easily grow to enormous dimensions. It is therefore crucial for me to speed things up. Hope some of you can help. I really appreciate all feedback on this. I am just a rooky to numpy. Cheers and thanks for all help, Trond From robert.kern at gmail.com Mon Feb 25 21:15:36 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 25 Feb 2008 20:15:36 -0600 Subject: [Numpy-discussion] Optimize speed of for loop using numpy In-Reply-To: References: Message-ID: <3d375d730802251815u4ff9aa20o2b87ee083a3096ed@mail.gmail.com> On Mon, Feb 25, 2008 at 8:08 PM, Trond Kristiansen wrote: > > > Hi all. > This is my first email to the discussion group. I have spent two days trying > to get a particular loop to speed up, and the best result I got was this: Can you try to repost this in such a way that the indentation is preserved? Feel free to attach it as a text file. Also, can you describe at a higher level what it is you are trying to accomplish and what the arrays mean? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From trond at unc.edu Mon Feb 25 21:32:54 2008 From: trond at unc.edu (Trond Kristiansen) Date: Mon, 25 Feb 2008 21:32:54 -0500 Subject: [Numpy-discussion] Optimize speed of for loop using numpy In-Reply-To: <3d375d730802251815u4ff9aa20o2b87ee083a3096ed@mail.gmail.com> Message-ID: Hi again. I have attached the function that the FOR loop is part of as a python file. What I am trying to do is to create a set of functions that will read the output files (NetCDF) from running the ROMS model (ocean model). The output file is organized in xi (x-direction), eta (y-direction), and s (z-direction) where the s-values are vertical layers and not depth. This particular function (z_slice) will find the closest upper and lower s-layer for a given depth in meters (e.g. -100). 
Then values from the two selected layers will be interpolated to create a
new layer at the selected depth (-100). The problem is that the s-layers
follow the bathymetry, and a particular s-layer will therefore sometimes
be above and sometimes below the selected depth that we want to
interpolate to. That's why I need a quick script that searches all of the
layers and finds the upper and lower layers for a given depth value
(which is negative). The z_r is a 3D array (s,eta,xi) that is created
using the netcdf module.

The main goal of this set of functions is to move away from using Matlab,
but also to speed things up. The sliced data array will be plotted using
GMT or pyNGL.

Thanks for helping me.

Cheers, Trond

On 2/25/08 9:15 PM, "Robert Kern" wrote:

> On Mon, Feb 25, 2008 at 8:08 PM, Trond Kristiansen wrote:
>>
>>
>> Hi all.
>> This is my first email to the discussion group. I have spent two days
>> trying to speed up a particular loop, and the best result I got was this:
>
> Can you try to repost this in such a way that the indentation is
> preserved? Feel free to attach it as a text file. Also, can you
> describe at a higher level what it is you are trying to accomplish and
> what the arrays mean?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: z_slice.py
Type: application/octet-stream
Size: 1708 bytes
Desc: not available
URL: 

From hoytak at gmail.com  Mon Feb 25 21:40:19 2008
From: hoytak at gmail.com (Hoyt Koepke)
Date: Mon, 25 Feb 2008 18:40:19 -0800
Subject: [Numpy-discussion] Optimize speed of for loop using numpy
In-Reply-To: 
References: <3d375d730802251815u4ff9aa20o2b87ee083a3096ed@mail.gmail.com>
Message-ID: <4db580fd0802251840p6bfc644etbb6e2532c70badd6@mail.gmail.com>

I would definitely suggest using scipy's weave.inline for this. It seems
like this particular function can be translated into C code really
easily, which would give you a HUGE speed up. Look at some of the
examples in scipy/weave/examples to see how to do this. The numpy book
also has a section on it.

One of the reasons I've left matlab and never looked back is how easy it
is to interweave bits of compiled C code for loops like this.

--Hoyt

On Mon, Feb 25, 2008 at 6:32 PM, Trond Kristiansen wrote:
> Hi again.
>
> I have attached the function that the FOR loop is part of as a python
> file. What I am trying to do is to create a set of functions that will
> read the output files (NetCDF) from running the ROMS model (ocean model).
> The output file is organized in xi (x-direction), eta (y-direction), and
> s (z-direction), where the s-values are vertical layers and not depth.
> This particular function (z_slice) will find the closest upper and lower
> s-layer for a given depth in meters (e.g. -100). Then values from the
> two selected layers will be interpolated to create a new layer at the
> selected depth (-100). The problem is that the s-layers follow the
> bathymetry, and a particular s-layer will therefore sometimes be above
> and sometimes below the selected depth that we want to interpolate to.
> That's why I need a quick script that searches all of the layers and
> finds the upper and lower layers for a given depth value (which is
> negative). The z_r is a 3D array (s,eta,xi) that is created using the
> netcdf module.
>
> The main goal of this set of functions is to move away from using
> Matlab, but also to speed things up. The sliced data array will be
> plotted using GMT or pyNGL.
>
> Thanks for helping me.
> Cheers, Trond
>
>
>
> On 2/25/08 9:15 PM, "Robert Kern" wrote:
>
> > On Mon, Feb 25, 2008 at 8:08 PM, Trond Kristiansen wrote:
> >>
> >>
> >> Hi all.
> >> This is my first email to the discussion group. I have spent two days
> >> trying to speed up a particular loop, and the best result I got was this:
> >
> > Can you try to repost this in such a way that the indentation is
> > preserved? Feel free to attach it as a text file. Also, can you
> > describe at a higher level what it is you are trying to accomplish and
> > what the arrays mean?
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

From charlesr.harris at gmail.com  Mon Feb 25 23:40:22 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 25 Feb 2008 21:40:22 -0700
Subject: [Numpy-discussion] Optimize speed of for loop using numpy
In-Reply-To: 
References: 
Message-ID: 

On Mon, Feb 25, 2008 at 7:08 PM, Trond Kristiansen wrote:

>
>
> Hi all.
> This is my first email to the discussion group. I have spent two days
> trying to speed up a particular loop, and the best result I got was this:
>
> tmp1=zeros((eta,xi),float)
> tmp2=zeros((eta,xi),float)
> tmp1=tmp1+10000
> tmp2=tmp2+10000
>
> for i in range(xi):
>     for j in range(eta):
>         for k in range(s):
>             if z_r[k,j,i] < depth:
>                 if tmp1[j,i]==10000:
>                     if (depth - z_r[k,j,i]) <= (depth - z_r[0,j,i]):
>                         tmp1[j,i]=k
>                 else:
>                     if (depth - z_r[k,j,i]) <= (depth - z_r[int(tmp1[j,i]),j,i]):
>                         tmp1[j,i]=k
>             elif z_r[k,j,i] >= depth:
>                 if tmp2[j,i]==10000:
>                     if abs(depth - z_r[k,j,i]) <= abs(depth - z_r[s-1,j,i]):
>                         tmp2[j,i]=k
>                 else:
>                     if abs(depth - z_r[k,j,i]) <= abs(depth - z_r[int(tmp2[j,i]),j,i]):
>                         tmp2[j,i]=k
>
> Not very impressive. My problem is that I cannot find any numpy functions
> that actually can do the tests I do for each k value in the arrays. I
> need to identify the position (i,j) of k-values that meet the specified
> requirement. There are way too many if-else tests here as well. The
> script takes about 10 seconds to run (for a 238MB input file), but these
> arrays are read from netCDF files and can be much larger and easily grow
> to enormous dimensions. It is therefore crucial for me to speed things
> up. Hope some of you can help. I really appreciate all feedback on this.
> I am just a rookie to numpy.

Python if statements are certain death. Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com  Mon Feb 25 23:54:15 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 25 Feb 2008 22:54:15 -0600
Subject: [Numpy-discussion] Optimize speed of for loop using numpy
In-Reply-To: 
References: <3d375d730802251815u4ff9aa20o2b87ee083a3096ed@mail.gmail.com>
Message-ID: <3d375d730802252054v6840b9c9ua1cd127f6fa1629@mail.gmail.com>

On Mon, Feb 25, 2008 at 8:32 PM, Trond Kristiansen wrote:
> Hi again.
>
> I have attached the function that the FOR loop is part of as a python
> file. What I am trying to do is to create a set of functions that will
> read the output files (NetCDF) from running the ROMS model (ocean model).
> The output file is organized in xi (x-direction), eta (y-direction), and
> s (z-direction), where the s-values are vertical layers and not depth.
> This particular function (z_slice) will find the closest upper and lower
> s-layer for a given depth in meters (e.g. -100). Then values from the
> two selected
Then values from the two selcted > layers will be interpolated to create a new layer at the selected depth > (-100). The problem is that the s-layers follow the bathymetry and a > particular s-layer will therefore sometimes be above and sometimes below the > selected depth that we want to interpolate to. That's why I need a quick > script that searches all of the layers and find the upper and lower layers > for a given depth value (which is negative). The z_r is a 3D array > (s,eta,xi) that is created using the netcdf module. Ah, that makes things clearer. You should be able to remove the innermost k-loop by using searchsorted(). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From peridot.faceted at gmail.com Tue Feb 26 03:41:02 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 26 Feb 2008 03:41:02 -0500 Subject: [Numpy-discussion] Optimize speed of for loop using numpy In-Reply-To: References: <3d375d730802251815u4ff9aa20o2b87ee083a3096ed@mail.gmail.com> Message-ID: On 25/02/2008, Trond Kristiansen wrote: > I have attached the function that the FOR loop is part of as a python file. > What I am trying to do is to create a set of functions that will read the > output files (NetCDF) from running the ROMS model (ocean model). The output > file is organized in xi (x-direction), eta (y-direction), and s > (z-direction) where the s-values are vertical layers and not depth. This > particular function (z_slice) will find the closest upper and lower s-layer > for a given depth in meters (e.g. -100). Then values from the two selcted > layers will be interpolated to create a new layer at the selected depth > (-100). The problem is that the s-layers follow the bathymetry and a > particular s-layer will therefore sometimes be above and sometimes below the > selected depth that we want to interpolate to. That's why I need a quick > script that searches all of the layers and find the upper and lower layers > for a given depth value (which is negative). The z_r is a 3D array > (s,eta,xi) that is created using the netcdf module. If I understand what you're doing correctly, you have a 3D array of depths, indexed by two unproblematic coordinates (eta and xi), and by a third coordinate whose values are peculiar. For each pair (eta, xi), you want to find the value of the third coordinate that occurs at a depth closest to some given depth. Are you looking to do this for many values of depth? It seems like what you're trying to do is (approximately) invert a function - you have z as a function of s, and you want s as a function of z. Is the mapping between depths and s values monotonic? First of all, numpy.searchsorted() is the right tool for finding where depth occurs in a list of z values. It gives you the position of the next lower value; it's pretty straightforward to write down a linear interpolation (you should be able to do it without any for loop at all) and more sophisticated interpolation schemes may be straightforward as well. If you want a smoother interpolation scheme, you may want to look at scipy.interpolate. It is not always as vectorial as you might wish, but it implements splines (optionally with smoothing) quite efficiently. I believe scipy.ndimage may well have some suitable interpolation code as well. 
In my opinion, the value of numpy/scipy comes from the tools that allow you to express major parts of your algorithm as a line or two of code. But it is often difficult to take advantage of this by "trying to accelerate for loops". I find it really pays to step back and look at my algorithm and think how to express it as uniform operations on arrays. (Of course it helps to have a rough idea of what operations are available; the numpy documentation is sometimes sadly lacking in terms of letting you know what tools are there.) You might find useful the new http://www.scipy.org/Numpy_Functions_by_Category Good luck, Anne From gingekerr at gmail.com Tue Feb 26 05:10:30 2008 From: gingekerr at gmail.com (Christopher Kerr) Date: Tue, 26 Feb 2008 10:10:30 +0000 Subject: [Numpy-discussion] numpy.random.randint() inconsistent with plain random.randint() References: <3d375d730802251339s5def4mc177932889d5a2e6@mail.gmail.com> Message-ID: Robert Kern wrote: > On Mon, Feb 25, 2008 at 2:58 PM, Christopher Kerr > wrote: >> I don't know if this is the right place to report bugs, but I couldn't >> find >> anywhere else on the website... >> >> random.randint(min,max) from python core returns an integer between min >> and max inclusive. The documentation on the website says that >> numpy.random.randint(min,max [,size]) does this too, but it in fact only >> ever returns numbers strictly less than the max, and gives an error if >> min is equal to max > > The documentation on what website? It needs to be fixed. > numpy.random.randint() is behaving correctly. numpy.random is not > intended to replace the standard library's random module. > It looks like I might have misread the documentation the first time round. It still seems rather silly to have a function with the same name as one in the core distribution but with different semantics, though. (On the other hand, you could argue that randint() in the core distribution is inconsistent with the general policy in Python of describing ranges as half-open intervals.) Perhaps the easiest way to deal with this without breaking things add something in BIG BOLD LETTERS to the API docs saying that numpy.random.randint behaves differently to the core random.randint, so that other newbies don't hit the same problems as me (i.e. my Monte Carlo never touching one side of my array). From ndbecker2 at gmail.com Tue Feb 26 06:40:41 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 26 Feb 2008 06:40:41 -0500 Subject: [Numpy-discussion] A little help please? Message-ID: My user-defined type project has mostly gone well, but I'm stuck on mixed-type arithmetic. I have 2 types: cmplx_int32 and cmplx_int64. I have added basic arithmetic for those types, and for mix of those arrays and their respective scalars. But mixed arithmetic only partly works. In [2]: a Out[2]: array([(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0)], dtype=cmplx_int32) In [3]: b Out[3]: array([(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0)], dtype=cmplx_int64) In [4]: a+b --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/nbecker/numpy/ in () TypeError: function not supported for these types, and can't coerce safely to supported types In [5]: b+a Out[5]: array([(0,0), (2,0), (4,0), (6,0), (8,0), (10,0), (12,0), (14,0), (16,0), (18,0)], dtype=cmplx_int64) What I did: d1 is dtype cmplx_int32, d2 is dtype cmplx_int64. 1. 
PyArray_RegisterCastFunc (d1, d2->type_num, &cmplx_to_cmplx); That registers a conversion from cmplx_int32->cmplx_int64. (Docs never explain when this conversion is used, BTW) 2. PyArray_RegisterCanCast (d1, d2->type_num, PyArray_NOSCALAR); 3. d1->f->castdict = PyDict_New(); PyObject *key = PyInt_FromLong (d2->type_num); PyObject *cobj = PyCObject_FromVoidPtr ((void*)(void(* (void*,void*,npy_intp,void*,void*))&cmplx_to_cmplx, 0); PyDict_SetItem (d1->f->castdict, key, cobj); So now add (cmplx_int64, cmplx_int32) is OK, the 2nd arg is converted, but no attempt is made at radd. Obviously, I don't want to convert 64->32. Any clues what's missing here? From lxander.m at gmail.com Tue Feb 26 09:25:50 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Tue, 26 Feb 2008 09:25:50 -0500 Subject: [Numpy-discussion] Trouble With MaskedArray and Shared Masks Message-ID: <525f23e80802260625y755f8610o252171a22d27fef7@mail.gmail.com> I'm having trouble with MaskedArray's _sharedmask flag. I would like to create a sub-view of a MaskedArray, fill it, and have the modifications reflected in the original array. This works with regular ndarrays, but only works with MaskedArrays if _sharedmask is set to False. Here's an example: >>> a = numpy.ma.MaskedArray( ... data=numpy.zeros((4,5), dtype=float), ... mask=numpy.ones((4,5), dtype=numpy.ma.MaskType), ... fill_value=0.0 ... ) >>> sub_a = a[:2,:3] >>> sub_a[0,0] = 1.0 >>> print sub_a [[1.0 -- --] [-- -- --]] >>> print a [[-- -- -- -- --] [-- -- -- -- --] [-- -- -- -- --] [-- -- -- -- --]] >>> print a.data [[ 1. 0. 0. 0. 0.] [ 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 0.]] The data array receives the new value, but the mask array does not. >>> a._sharedmask = False >>> sub_a = a[:2,:3] >>> sub_a[0,0] = 1.0 >>> print sub_a [[1.0 -- --] [-- -- --]] >>> print a [[1.0 -- -- -- --] [-- -- -- -- --] [-- -- -- -- --] [-- -- -- -- --]] This sort of (for me) unexpected behavior extends to other ways I've been using numpy arrays as well: a[:] = 1.0 (set to constant); a[:] = b (copy into); a[:5] = a[-5:] (rotating copy), etc. I wasn't seeing this behavior before because I was working on an array that had already been sliced and therefore "unshared", which caused a good deal of confusion for me when I started working on an array that wasn't the product of slicing. All of this leads me to some questions. What is the rational for initializing a new MaskedArray with _sharedmask=True when its mask isn't (actively) being shared yet? Is there a better way to say: "a=MaskedArray(...); a._sharedmask=False" that does not require touching a "private" attribute? Or am I going about this all wrong? What's the correct MaskedArray idioms for these actions that doesn't cause a new mask to be created? Thanks! Alex From oliphant at enthought.com Tue Feb 26 10:13:54 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Tue, 26 Feb 2008 09:13:54 -0600 Subject: [Numpy-discussion] A little help please? In-Reply-To: References: Message-ID: <47C42CB2.7080007@enthought.com> Neal Becker wrote: > My user-defined type project has mostly gone well, but I'm stuck on > mixed-type arithmetic. > > I have 2 types: cmplx_int32 and cmplx_int64. I have added basic arithmetic > for those types, and for mix of those arrays and their respective scalars. > But mixed arithmetic only partly works This is an area that needs testing and possible fixes. The relevant code is in ufuncobject.c (select_types) and in multiarraymodule.c (PyArray_CanCoerceScalar). 
If you can go through that code you may be able to see what the problem is and let us know. I tried to support this kind of thing you are doing, but I'm not sure how well I succeeded because I didn't have time or the code to test it with. Thus, there is still some work to do. The fact that radd is not called is because ufuncs try to handle everything (the ufunc is more general than just the functions with "r" prefixes. I think one problem may be due to the fact that the first argument to a ufunc is the one that defines the search for the correctly registered function and there may be no code to allow other arguments to direct the search should that one fail. I'm actually pleased you've gotten this far. I'll keep trying to help as I get time. -Travis O. From pontikakis at gmail.com Tue Feb 26 11:18:59 2008 From: pontikakis at gmail.com (Manos Pontikakis) Date: Tue, 26 Feb 2008 10:18:59 -0600 Subject: [Numpy-discussion] #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?) Message-ID: Hello, I am trying to install numpy 1.0.4 to the following machine: $ uname -a Linux myhostname 2.6.9-55.ELsmp #1 SMP Sat Apr 21 11:16:24 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux and I am getting the error that appears in the subject of this email. It appears that BLAS and LAPACK libraries cannot be found as well, but I think numpy still shouldn't have problem with that. There must be some problem with 64 bit. Please see the complete log. ------------------------------------------------------------------------------------------- Running from numpy source directory. F2PY Version 2_4422 blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib NOT AVAILABLE atlas_blas_info: libraries f77blas,cblas,atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /usr/lib/sse2 libraries f77blas,cblas,atlas not found in /usr/lib NOT AVAILABLE /.../numpy-1.0.4/numpy/distutils/system_info.py:1340: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. 
warnings.warn(AtlasNotFoundError.__doc__) blas_info: libraries blas not found in /usr/local/lib FOUND: libraries = ['blas'] library_dirs = ['/usr/lib'] language = f77 FOUND: libraries = ['blas'] library_dirs = ['/usr/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: libraries f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries f77blas,cblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib numpy.distutils.system_info.atlas_info NOT AVAILABLE /.../numpy-1.0.4/numpy/distutils/system_info.py:1247: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) lapack_info: libraries lapack not found in /usr/local/lib FOUND: libraries = ['lapack'] library_dirs = ['/usr/lib'] language = f77 FOUND: libraries = ['lapack', 'blas'] library_dirs = ['/usr/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 running install running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src building py_modules sources building extension "numpy.core.multiarray" sources Generating build/src.linux-x86_64-2.5/numpy/core/config.h customize GnuFCompiler Could not locate executable g77 Could not locate executable f77 customize IntelFCompiler Could not locate executable ifort Could not locate executable ifc customize LaheyFCompiler Could not locate executable lf95 customize PGroupFCompiler Could not locate executable pgf90 Could not locate executable pgf77 customize AbsoftFCompiler Could not locate executable f90 customize NAGFCompiler Could not locate executable f95 customize VastFCompiler customize GnuFCompiler customize CompaqFCompiler Could not locate executable fort customize IntelItaniumFCompiler Could not locate executable efort Could not locate executable efc customize IntelEM64TFCompiler customize Gnu95FCompiler Found executable /usr/bin/gfortran customize G95FCompiler Could not locate executable g95 don't know how to compile Fortran code on platform 'posix' C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-I/.../python2.5 -Inumpy/core/src -Inumpy/core/include -I/.../python2.5 -c' gcc: _configtest.c In file included from /.../python2.5/Python.h:57, from _configtest.c:2: */.../python2.5/pyport.h:730:2: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."* _configtest.c: In function `main': _configtest.c:50: warning: int format, different 
type arg (arg 3) _configtest.c:57: warning: int format, different type arg (arg 3) _configtest.c:72: warning: int format, different type arg (arg 3) In file included from /.../python2.5/Python.h:57, from _configtest.c:2: */.../python2.5/pyport.h:730:2: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."* _configtest.c: In function `main': _configtest.c:50: warning: int format, different type arg (arg 3) _configtest.c:57: warning: int format, different type arg (arg 3) _configtest.c:72: warning: int format, different type arg (arg 3) failure. removing: _configtest.c _configtest.o Traceback (most recent call last): File "setup.py", line 89, in setup_package() File "setup.py", line 82, in setup_package configuration=configuration ) File "/.../numpy-1.0.4/numpy/distutils/core.py", line 176, in setup return old_setup(**new_attr) File "/.../python2.5/distutils/core.py", line 151, in setup dist.run_commands() File "/.../python2.5/distutils/dist.py", line 974, in run_commands self.run_command(cmd) File "/.../python2.5/distutils/dist.py", line 994, in run_command cmd_obj.run() File "/.../numpy-1.0.4/numpy/distutils/command/install.py", line 16, in run r = old_install.run(self) File "/.../python2.5/distutils/command/install.py", line 506, in run self.run_command('build') File "/.../python2.5/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/.../python2.5/distutils/dist.py", line 994, in run_command cmd_obj.run() File "/.../python2.5/distutils/command/build.py", line 112, in run self.run_command(cmd_name) File "/.../python2.5/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/.../python2.5/distutils/dist.py", line 994, in run_command cmd_obj.run() File "/.../numpy-1.0.4/numpy/distutils/command/build_src.py", line 130, in run self.build_sources() File "/.../numpy-1.0.4/numpy/distutils/command/build_src.py", line 147, in build_sources self.build_extension_sources(ext) File "/.../numpy-1.0.4/numpy/distutils/command/build_src.py", line 250, in build_extension_sources sources = self.generate_sources(sources, ext) File "/.../numpy-1.0.4/numpy/distutils/command/build_src.py", line 307, in generate_sources source = func(extension, build_dir) File "numpy/core/setup.py", line 53, in generate_config_h raise SystemError,"Failed to test configuration. "\ SystemError: Failed to test configuration. See previous error messages for more information. ------------------------------------------------------------------------------------------- Is there a specific configuration that I should apply for 64 bit machines? Thank you, Manos -------------- next part -------------- An HTML attachment was scrubbed... URL: From chanley at stsci.edu Tue Feb 26 12:50:17 2008 From: chanley at stsci.edu (Christopher Hanley) Date: Tue, 26 Feb 2008 12:50:17 -0500 Subject: [Numpy-discussion] FORTRAN compiler detection Message-ID: <47C45159.2080900@stsci.edu> Greetings, I was wondering if within the last 8 - 10 weeks anyone has made changes to the way FORTRAN compilers are detected. In the past I was able to specify which compiler was used by the F77 system variable. However, I am now having a f90 compiler that exists on my Solaris system detected regardless of the F77 value. Even unsetting the F77 variable leads to the use of the f90 compiler. I was going to look though the distutil change logs but I was hoping that someone might remember changing something off the top of their heads. 
Thank you for your time and help, Chris -- Christopher Hanley Systems Software Engineer Space Telescope Science Institute 3700 San Martin Drive Baltimore MD, 21218 (410) 338-4338 From robert.kern at gmail.com Tue Feb 26 13:02:30 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Feb 2008 12:02:30 -0600 Subject: [Numpy-discussion] FORTRAN compiler detection In-Reply-To: <47C45159.2080900@stsci.edu> References: <47C45159.2080900@stsci.edu> Message-ID: <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> On Tue, Feb 26, 2008 at 11:50 AM, Christopher Hanley wrote: > Greetings, > > I was wondering if within the last 8 - 10 weeks anyone has made changes > to the way FORTRAN compilers are detected. In the past I was able to > specify which compiler was used by the F77 system variable. However, I > am now having a f90 compiler that exists on my Solaris system detected > regardless of the F77 value. Even unsetting the F77 variable leads to > the use of the f90 compiler. > > I was going to look though the distutil change logs but I was hoping > that someone might remember changing something off the top of their heads. Which FORTRAN compilers do you have installed? What --fcompiler flag are you using? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From chanley at stsci.edu Tue Feb 26 13:09:49 2008 From: chanley at stsci.edu (Christopher Hanley) Date: Tue, 26 Feb 2008 13:09:49 -0500 Subject: [Numpy-discussion] FORTRAN compiler detection In-Reply-To: <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> References: <47C45159.2080900@stsci.edu> <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> Message-ID: <47C455ED.1060607@stsci.edu> Robert Kern wrote: > On Tue, Feb 26, 2008 at 11:50 AM, Christopher Hanley wrote: >> Greetings, >> >> I was wondering if within the last 8 - 10 weeks anyone has made changes >> to the way FORTRAN compilers are detected. In the past I was able to >> specify which compiler was used by the F77 system variable. However, I >> am now having a f90 compiler that exists on my Solaris system detected >> regardless of the F77 value. Even unsetting the F77 variable leads to >> the use of the f90 compiler. >> >> I was going to look though the distutil change logs but I was hoping >> that someone might remember changing something off the top of their heads. > > Which FORTRAN compilers do you have installed? What --fcompiler flag > are you using? > Hi Robert, Thank you for your help. I am not using the --fcompiler flag. The FORTRAN compilers I have installed are: f77: Sun WorkShop 6 update 2 FORTRAN 77 5.3 Patch 111691-07 2004/04/23 Usage: f77 [ options ] files. Use 'f77 -flags' for details basil> f90 -V f90: Sun WorkShop 6 update 2 Fortran 95 6.2 Patch 111690-10 2003/08/28 Usage: f90 [ options ] files. 
Use 'f90 -flags' for details basil> Chris -- Christopher Hanley Systems Software Engineer Space Telescope Science Institute 3700 San Martin Drive Baltimore MD, 21218 (410) 338-4338 From robert.kern at gmail.com Tue Feb 26 13:43:19 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Feb 2008 12:43:19 -0600 Subject: [Numpy-discussion] FORTRAN compiler detection In-Reply-To: <47C455ED.1060607@stsci.edu> References: <47C45159.2080900@stsci.edu> <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> <47C455ED.1060607@stsci.edu> Message-ID: <3d375d730802261043v4c4a8b26wc33568b95f095fc9@mail.gmail.com> On Tue, Feb 26, 2008 at 12:09 PM, Christopher Hanley wrote: > Robert Kern wrote: > > On Tue, Feb 26, 2008 at 11:50 AM, Christopher Hanley wrote: > >> Greetings, > >> > >> I was wondering if within the last 8 - 10 weeks anyone has made changes > >> to the way FORTRAN compilers are detected. In the past I was able to > >> specify which compiler was used by the F77 system variable. However, I > >> am now having a f90 compiler that exists on my Solaris system detected > >> regardless of the F77 value. Even unsetting the F77 variable leads to > >> the use of the f90 compiler. > >> > >> I was going to look though the distutil change logs but I was hoping > >> that someone might remember changing something off the top of their heads. > > > > Which FORTRAN compilers do you have installed? What --fcompiler flag > > are you using? > > > Hi Robert, > > Thank you for your help. > > I am not using the --fcompiler flag. Okay, use the --fcompiler flag. That is the way to tell numpy to use a particular compiler if you have multiple ones installed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From chanley at stsci.edu Tue Feb 26 13:46:54 2008 From: chanley at stsci.edu (Christopher Hanley) Date: Tue, 26 Feb 2008 13:46:54 -0500 Subject: [Numpy-discussion] FORTRAN compiler detection In-Reply-To: <3d375d730802261043v4c4a8b26wc33568b95f095fc9@mail.gmail.com> References: <47C45159.2080900@stsci.edu> <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> <47C455ED.1060607@stsci.edu> <3d375d730802261043v4c4a8b26wc33568b95f095fc9@mail.gmail.com> Message-ID: <47C45E9E.5030607@stsci.edu> Robert Kern wrote: > On Tue, Feb 26, 2008 at 12:09 PM, Christopher Hanley wrote: >> Robert Kern wrote: >> > On Tue, Feb 26, 2008 at 11:50 AM, Christopher Hanley wrote: >> >> Greetings, >> >> >> >> I was wondering if within the last 8 - 10 weeks anyone has made changes >> >> to the way FORTRAN compilers are detected. In the past I was able to >> >> specify which compiler was used by the F77 system variable. However, I >> >> am now having a f90 compiler that exists on my Solaris system detected >> >> regardless of the F77 value. Even unsetting the F77 variable leads to >> >> the use of the f90 compiler. >> >> >> >> I was going to look though the distutil change logs but I was hoping >> >> that someone might remember changing something off the top of their heads. >> > >> > Which FORTRAN compilers do you have installed? What --fcompiler flag >> > are you using? >> > >> Hi Robert, >> >> Thank you for your help. >> >> I am not using the --fcompiler flag. > > Okay, use the --fcompiler flag. That is the way to tell numpy to use a > particular compiler if you have multiple ones installed. 
> What do you do if you have FORTRAN compilers installed but don't want to use any of the compilers to build numpy? -- Christopher Hanley Systems Software Engineer Space Telescope Science Institute 3700 San Martin Drive Baltimore MD, 21218 (410) 338-4338 From robert.kern at gmail.com Tue Feb 26 14:20:26 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Feb 2008 13:20:26 -0600 Subject: [Numpy-discussion] FORTRAN compiler detection In-Reply-To: <47C45E9E.5030607@stsci.edu> References: <47C45159.2080900@stsci.edu> <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> <47C455ED.1060607@stsci.edu> <3d375d730802261043v4c4a8b26wc33568b95f095fc9@mail.gmail.com> <47C45E9E.5030607@stsci.edu> Message-ID: <3d375d730802261120t7f862529p16cf2296c43dea2b@mail.gmail.com> On Tue, Feb 26, 2008 at 12:46 PM, Christopher Hanley wrote: > Robert Kern wrote: > > On Tue, Feb 26, 2008 at 12:09 PM, Christopher Hanley wrote: > >> Robert Kern wrote: > >> > On Tue, Feb 26, 2008 at 11:50 AM, Christopher Hanley wrote: > >> >> Greetings, > >> >> > >> >> I was wondering if within the last 8 - 10 weeks anyone has made changes > >> >> to the way FORTRAN compilers are detected. In the past I was able to > >> >> specify which compiler was used by the F77 system variable. However, I > >> >> am now having a f90 compiler that exists on my Solaris system detected > >> >> regardless of the F77 value. Even unsetting the F77 variable leads to > >> >> the use of the f90 compiler. > >> >> > >> >> I was going to look though the distutil change logs but I was hoping > >> >> that someone might remember changing something off the top of their heads. > >> > > >> > Which FORTRAN compilers do you have installed? What --fcompiler flag > >> > are you using? > >> > > >> Hi Robert, > >> > >> Thank you for your help. > >> > >> I am not using the --fcompiler flag. > > > > Okay, use the --fcompiler flag. That is the way to tell numpy to use a > > particular compiler if you have multiple ones installed. > > > > What do you do if you have FORTRAN compilers installed but don't want to > use any of the compilers to build numpy? Unless if you are linking to a FORTRAN LAPACK or BLAS, no FORTRAN compiler should be used to build numpy. If you find that one is being used, please show us the full output log. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at gmail.com Tue Feb 26 14:32:05 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 26 Feb 2008 14:32:05 -0500 Subject: [Numpy-discussion] Trouble With MaskedArray and Shared Masks In-Reply-To: <525f23e80802260625y755f8610o252171a22d27fef7@mail.gmail.com> References: <525f23e80802260625y755f8610o252171a22d27fef7@mail.gmail.com> Message-ID: <200802261432.06070.pgmdevlist@gmail.com> Alexander, The rationale behind the current behavior is to avoid an accidental propagation of the mask. Consider the following example: >>>m = numpy.array([1,0,0,1,0], dtype=bool_) >>>x = numpy.array([1,2,3,4,5]) >>>y = numpy.sqrt([5,4,3,2,1]) >>>mx = masked_array(x,mask=m) >>>my = masked_array(y,mask=m) >>>mx[0] = 0 >>>print mx,my, m [0 2 3 -- 5] [-- 4 3 -- 1] [ True False False True False] At the creation, mx._sharedmask and my._sharedmask are both True. Setting mx[0]=0 forces mx._mask to be copied, so that we don't affect the mask of my. 
Now,

>>>m = numpy.array([1,0,0,1,0], dtype=bool_)
>>>x = numpy.array([1,2,3,4,5])
>>>y = numpy.sqrt([5,4,3,2,1])
>>>mx = masked_array(x,mask=m)
>>>my = masked_array(y,mask=m)
>>>mx._sharedmask = False
>>>mx[0] = 0
>>>print mx,my, m
[0 2 3 -- 5] [5 4 3 -- 1] [False False False True False]

By mx._sharedmask=False, we deceived numpy.ma into thinking that it's OK to update the mask of mx (that is, m), and my gets updated. Sometimes it's what you want (your case for example), often it is not: I've been bitten more than once before reintroducing the _sharedmask flag.

As you've observed, setting a private flag isn't a very good idea: you should use the .unshare_mask() function instead, which copies the mask and sets _sharedmask to False. OK, in your example, copying the mask is not needed, but in more general cases, it is.

At initialization, self._sharedmask is set to (not copy). That is, if you didn't specify copy=True at creation (the default being copy=False), self._sharedmask is True. Now, I recognize it's not obvious, and perhaps we could introduce yet another parameter to masked_array/array/MaskedArray, share_mask, that would take a default value of True and set self._sharedmask=(not copy)&share_mask

So: should we introduce this extra parameter ? In any case, I hope it helps.
P.

From dalcinl at gmail.com Tue Feb 26 14:44:38 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 26 Feb 2008 16:44:38 -0300
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
Message-ID:

Dear all, I believe the current 'loadtxt' function is broken if the file does not end in a newline. The problem is at the last line of this fragment:

for i,line in enumerate(fh):
    if i<skiprows: continue
    line = line[:line.find(comments)].strip()

[...]

From: aisaac at american.edu (Alan G Isaac)
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
References:
Message-ID:

On Tue, 26 Feb 2008, Lisandro Dalcin apparently wrote:
> I believe the current 'loadtxt' function is broken

I agree:
http://projects.scipy.org/pipermail/numpy-discussion/2007-November/030057.html

Cheers,
Alan Isaac

From oliphant at enthought.com Tue Feb 26 15:35:42 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Tue, 26 Feb 2008 14:35:42 -0600
Subject: [Numpy-discussion] FORTRAN compiler detection
In-Reply-To: <47C45E9E.5030607@stsci.edu>
References: <47C45159.2080900@stsci.edu> <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> <47C455ED.1060607@stsci.edu> <3d375d730802261043v4c4a8b26wc33568b95f095fc9@mail.gmail.com> <47C45E9E.5030607@stsci.edu>
Message-ID: <47C4781E.8040809@enthought.com>

Christopher Hanley wrote:
> Robert Kern wrote:
>
>> On Tue, Feb 26, 2008 at 12:09 PM, Christopher Hanley wrote:
>>
>>> Robert Kern wrote:
>>> > On Tue, Feb 26, 2008 at 11:50 AM, Christopher Hanley wrote:
>>> >> Greetings,
>>> >>
>>> >> I was wondering if within the last 8 - 10 weeks anyone has made changes
>>> >> to the way FORTRAN compilers are detected. In the past I was able to
>>> >> specify which compiler was used by the F77 system variable. However, I
>>> >> am now having a f90 compiler that exists on my Solaris system detected
>>> >> regardless of the F77 value. Even unsetting the F77 variable leads to
>>> >> the use of the f90 compiler.
>>> >>
>>> >> I was going to look though the distutil change logs but I was hoping
>>> >> that someone might remember changing something off the top of their heads.
>>> >
>>> > Which FORTRAN compilers do you have installed? What --fcompiler flag
>>> > are you using?
>>> >
>>> Hi Robert,
>>>
>>> Thank you for your help.
>>>
>>> I am not using the --fcompiler flag.
>>>
>> Okay, use the --fcompiler flag. That is the way to tell numpy to use a
>> particular compiler if you have multiple ones installed.
>> >> > > What do you do if you have FORTRAN compilers installed but don't want to > use any of the compilers to build numpy? > You need to use it in your case because you are linking against lapack and blas that were built with a Fortran compiler. The Fortran compiler is only used in the link step of lapack_lite.so -Travis O. From cburns at berkeley.edu Tue Feb 26 18:26:34 2008 From: cburns at berkeley.edu (Christopher Burns) Date: Tue, 26 Feb 2008 15:26:34 -0800 Subject: [Numpy-discussion] masked_array/matplotlib issue with memmaps Message-ID: <764e38540802261526h5d6e6df8qdf7520a8eac789b4@mail.gmail.com> If I initialize an AxesImage using a np.zeros array and then set the axes data later to a np.memmap array, I get a RuntimeError when matplotlib tries to autoscale the image. The errors continue to fill my console and I'm forced to close the shell. This bug was introduced when I switched from numpy v1.0.3.1 to the trunk v1.0.5.dev4815 The two hacks to get around this are: 1) Setting any array element to something other than zero fixes the error: zdata[0,0] = 1 2) Specify the extent and max/min values when creating the image: imgaxes = pylab.imshow(zdata, extent=(0, data_shape[1], data_shape[0], 0), vmin=0, vmax=1) Unfortunately, due to the way this errors I'm having a difficult time debugging it. I'm hoping someone with in-depth knowledge of masked_arrays will have some insight. Code and output are below. Thanks! Chris ---- script to reproduce the bug ---- import pylab import numpy as np def printinfo(imgaxes): a = imgaxes.get_array() print '\nimgaxes array info:' print 'type', type(a) print 'shape', a.shape print 'dtype', a.dtype print 'has _mmap', hasattr(a, '_mmap') data_type = 'float32' data_shape = (30, 40) zdata = np.zeros(data_shape, dtype=data_type) #zdata[0,0] = 1 # No exception raised if this line is executed imgaxes = pylab.imshow(zdata) printinfo(imgaxes) mmdata = np.memmap('foo.dat', dtype=zdata.dtype, shape=zdata.shape, mode='w+') imgaxes.set_data(mmdata) printinfo(imgaxes) # imgaxes array now has a _mmap pylab.show() ---- version info ---- In [2]: pylab.matplotlib.__version__ Out[2]: '0.91.2' In [4]: numpy.version.version Out[4]: '1.0.5.dev4817' ---- error ---- In [26]: run memmap_reassign.py imgaxes array info: type shape (30, 40) dtype float32 has _mmap False imgaxes array info: type shape (30, 40) dtype float32 has _mmap True Exception exceptions.RuntimeError: 'maximum recursion depth exceeded' in ignored ERROR: An unexpected error occurred while tokenizing input The following traceback may be corrupted or invalid The error message is: ('EOF in multi-line statement', (10, 0)) --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) /Users/cburns/local/lib/python2.5/site-packages/matplotlib/backends/backend_wx.pyc in _onPaint(self, evt) 1079 self.realize() 1080 # Render to the bitmap -> 1081 self.draw(repaint=False) 1082 # Update the display using a PaintDC 1083 self.gui_repaint(drawDC=wx.PaintDC(self)) /Users/cburns/local/lib/python2.5/site-packages/matplotlib/backends/backend_wxagg.pyc in draw(self, repaint) 59 """ 60 DEBUG_MSG("draw()", 1, self) ---> 61 FigureCanvasAgg.draw(self) 62 63 self.bitmap = _convert_agg_to_wx_bitmap(self.get_renderer(), None) /Users/cburns/local/lib/python2.5/site-packages/matplotlib/backends/backend_agg.pyc in draw(self) 356 357 self.renderer = self.get_renderer() --> 358 self.figure.draw(self.renderer) 359 360 def get_renderer(self): 
/Users/cburns/local/lib/python2.5/site-packages/matplotlib/figure.pyc in draw(self, renderer) 622 623 # render the axes --> 624 for a in self.axes: a.draw(renderer) 625 626 # render the figure text /Users/cburns/local/lib/python2.5/site-packages/matplotlib/axes.pyc in draw(self, renderer, inframe) 1303 mag = renderer.get_image_magnification() 1304 ims = [(im.make_image(mag),0,0) -> 1305 for im in self.images if im.get_visible()] 1306 1307 /Users/cburns/local/lib/python2.5/site-packages/matplotlib/image.pyc in make_image(self, magnification) 129 im.is_grayscale = False 130 else: --> 131 x = self.to_rgba(self._A, self._alpha) 132 im = _image.fromarray(x, 0) 133 if len(self._A.shape) == 2: /Users/cburns/local/lib/python2.5/site-packages/matplotlib/cm.pyc in to_rgba(self, x, alpha, bytes) 74 x = ma.asarray(x) 75 x = self.norm(x) ---> 76 x = self.cmap(x, alpha=alpha, bytes=bytes) 77 return x 78 /Users/cburns/local/lib/python2.5/site-packages/matplotlib/colors.pyc in __call__(self, X, alpha, bytes) 431 vtype = 'array' 432 xma = ma.asarray(X) --> 433 xa = xma.filled(0) 434 mask_bad = ma.getmask(xma) 435 if xa.dtype.char in npy.typecodes['Float']: /Users/cburns/local/lib/python2.5/site-packages/numpy/ma/core.pyc in filled(self, fill_value) 1542 m = self._mask 1543 if m is nomask or not m.any(): -> 1544 return self._data 1545 # 1546 if fill_value is None: /Users/cburns/local/lib/python2.5/site-packages/numpy/ma/core.pyc in _get_data(self) 1472 1473 """ -> 1474 return self.view(self._baseclass) 1475 _data = property(fget=_get_data) 1476 data = property(fget=_get_data) /Users/cburns/local/lib/python2.5/site-packages/numpy/core/memmap.pyc in __array_finalize__(self, obj) 204 self._mmap = obj._mmap 205 else: --> 206 raise ValueError, 'Cannot create a memmap from object %s'%obj 207 else: 208 self._mmap = None /Users/cburns/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __str__(self) 1614 m = self._mask 1615 if m is nomask: -> 1616 res = self._data 1617 else: 1618 if m.shape == (): /Users/cburns/local/lib/python2.5/site-packages/numpy/ma/core.pyc in _get_data(self) 1472 1473 """ -> 1474 return self.view(self._baseclass) 1475 _data = property(fget=_get_data) 1476 data = property(fget=_get_data) /Users/cburns/local/lib/python2.5/site-packages/numpy/core/memmap.pyc in __array_finalize__(self, obj) 204 self._mmap = obj._mmap 205 else: --> 206 raise ValueError, 'Cannot create a memmap from object %s'%obj 207 else: 208 self._mmap = None /Users/cburns/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __str__(self) 1614 m = self._mask 1615 if m is nomask: -> 1616 res = self._data 1617 else: 1618 if m.shape == (): /Users/cburns/local/lib/python2.5/site-packages/numpy/ma/core.pyc in _get_data(self) 1472 1473 """ -> 1474 return self.view(self._baseclass) 1475 _data = property(fget=_get_data) 1476 data = property(fget=_get_data) /Users/cburns/local/lib/python2.5/site-packages/numpy/core/memmap.pyc in __array_finalize__(self, obj) 204 self._mmap = obj._mmap 205 else: --> 206 raise ValueError, 'Cannot create a memmap from object %s'%obj 207 else: 208 self._mmap = None .... 
[snip] /Users/cburns/local/lib/python2.5/site-packages/numpy/ma/core.pyc in _get_data(self) 1472 1473 """ -> 1474 return self.view(self._baseclass) 1475 _data = property(fget=_get_data) 1476 data = property(fget=_get_data) RuntimeError: maximum recursion depth exceeded Exception exceptions.AttributeError: "'memmap' object has no attribute '_mmap'" in ignored Exception exceptions.AttributeError: "'memmap' object has no attribute '_mmap'" in ignored [snip] Exception exceptions.AttributeError: "'memmap' object has no attribute '_mmap'" in ignored Exception exceptions.AttributeError: "'memmap' object has no attribute '_mmap'" in ignored [snip] From robert.kern at gmail.com Tue Feb 26 19:18:17 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Feb 2008 18:18:17 -0600 Subject: [Numpy-discussion] masked_array/matplotlib issue with memmaps In-Reply-To: <764e38540802261526h5d6e6df8qdf7520a8eac789b4@mail.gmail.com> References: <764e38540802261526h5d6e6df8qdf7520a8eac789b4@mail.gmail.com> Message-ID: <3d375d730802261618h3514c89xe5b4dd16c0a8cc9c@mail.gmail.com> On Tue, Feb 26, 2008 at 5:26 PM, Christopher Burns wrote: > If I initialize an AxesImage using a np.zeros array and then set the > axes data later to a np.memmap array, I get a RuntimeError when > matplotlib tries to autoscale the image. The errors continue to fill > my console and I'm forced to close the shell. This bug was introduced > when I switched from numpy v1.0.3.1 to the trunk v1.0.5.dev4815 > > The two hacks to get around this are: > 1) Setting any array element to something other than zero fixes the error: > zdata[0,0] = 1 > 2) Specify the extent and max/min values when creating the image: > imgaxes = pylab.imshow(zdata, extent=(0, data_shape[1], > data_shape[0], 0), vmin=0, vmax=1) > > Unfortunately, due to the way this errors I'm having a difficult time > debugging it. I'm hoping someone with in-depth knowledge of > masked_arrays will have some insight. > Exception exceptions.AttributeError: "'memmap' object has no attribute > '_mmap'" in 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., > 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., > 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., > 0., 0.])> ignored Some operations on numpy.memmap objects create new arrays, but unfortunately, the new array objects, which should be ndarrays, are created as numpy.memmap instances even though they aren't. When they go to clean up after themselves, they fail. A workaround would be to make numpy.memmap.__del__ more robust and do nothing if ._mmap isn't present. A real fix would be to figure out how to make sure that "memmap+memmap", etc., make ndarray instances rather than memmap instances. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Tue Feb 26 22:05:33 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 27 Feb 2008 12:05:33 +0900 Subject: [Numpy-discussion] #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?) In-Reply-To: References: Message-ID: <47C4D37D.1020808@ar.media.kyoto-u.ac.jp> Manos Pontikakis wrote: > Hello, > > I am trying to install numpy 1.0.4 to the following machine: > > $ uname -a > Linux myhostname 2.6.9-55.ELsmp #1 SMP Sat Apr 21 11:16:24 EDT 2007 > x86_64 x86_64 x86_64 GNU/Linux > > > and I am getting the error that appears in the subject of this email. 
> It appears that BLAS and LAPACK libraries cannot be found as well, but > I think numpy still shouldn't have problem with that. There must be > some problem with 64 bit. Please see the complete log. > > ------------------------------------------------------------------------------------------- > Running from numpy source directory. > F2PY Version 2_4422 > blas_opt_info: > blas_mkl_info: > libraries mkl,vml,guide not found in /usr/local/lib > libraries mkl,vml,guide not found in /usr/lib > NOT AVAILABLE > > atlas_blas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib > NOT AVAILABLE > > atlas_blas_info: > libraries f77blas,cblas,atlas not found in /usr/local/lib > libraries f77blas,cblas,atlas not found in /usr/lib/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib > NOT AVAILABLE > > /.../numpy-1.0.4/numpy/distutils/system_info.py:1340: UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. > warnings.warn(AtlasNotFoundError.__doc__) > blas_info: > libraries blas not found in /usr/local/lib > FOUND: > libraries = ['blas'] > library_dirs = ['/usr/lib'] > language = f77 > > FOUND: > libraries = ['blas'] > library_dirs = ['/usr/lib'] > define_macros = [('NO_ATLAS_INFO', 1)] > language = f77 > > lapack_opt_info: > lapack_mkl_info: > mkl_info: > libraries mkl,vml,guide not found in /usr/local/lib > libraries mkl,vml,guide not found in /usr/lib > NOT AVAILABLE > > NOT AVAILABLE > > atlas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 > libraries lapack_atlas not found in /usr/lib/sse2 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > numpy.distutils.system_info.atlas_threads_info > NOT AVAILABLE > > atlas_info: > libraries f77blas,cblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries f77blas,cblas,atlas not found in /usr/lib/sse2 > libraries lapack_atlas not found in /usr/lib/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > numpy.distutils.system_info.atlas_info > NOT AVAILABLE > > /.../numpy-1.0.4/numpy/distutils/system_info.py:1247: UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. 
> warnings.warn(AtlasNotFoundError.__doc__) > lapack_info: > libraries lapack not found in /usr/local/lib > FOUND: > libraries = ['lapack'] > library_dirs = ['/usr/lib'] > language = f77 > > FOUND: > libraries = ['lapack', 'blas'] > library_dirs = ['/usr/lib'] > define_macros = [('NO_ATLAS_INFO', 1)] > language = f77 > > running install > running build > running config_cc > unifing config_cc, config, build_clib, build_ext, build commands > --compiler options > running config_fc > unifing config_fc, config, build_clib, build_ext, build commands > --fcompiler options > running build_src > building py_modules sources > building extension "numpy.core.multiarray" sources > Generating build/src.linux-x86_64-2.5/numpy/core/config.h > customize GnuFCompiler > Could not locate executable g77 > Could not locate executable f77 > customize IntelFCompiler > Could not locate executable ifort > Could not locate executable ifc > customize LaheyFCompiler > Could not locate executable lf95 > customize PGroupFCompiler > Could not locate executable pgf90 > Could not locate executable pgf77 > customize AbsoftFCompiler > Could not locate executable f90 > customize NAGFCompiler > Could not locate executable f95 > customize VastFCompiler > customize GnuFCompiler > customize CompaqFCompiler > Could not locate executable fort > customize IntelItaniumFCompiler > Could not locate executable efort > Could not locate executable efc > customize IntelEM64TFCompiler > customize Gnu95FCompiler > Found executable /usr/bin/gfortran > > customize G95FCompiler > Could not locate executable g95 > don't know how to compile Fortran code on platform 'posix' > C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall > -Wstrict-prototypes -fPIC > > compile options: '-I/.../python2.5 -Inumpy/core/src > -Inumpy/core/include -I/.../python2.5 -c' > gcc: _configtest.c > In file included from /.../python2.5/Python.h:57, > from _configtest.c:2: > */.../python2.5/pyport.h:730:2: #error "LONG_BIT definition appears > wrong for platform (bad gcc/glibc config?)."* That looks strange. First, I am surprised by /.../: is this recognized by the shell ? It also looks like you are using a python built from source. In this case, are you sure you built it correctly ? If you did not build python from sources, could you tell us your exact configuration ? (OS + distribution, CPU, etc...) ? cheers, David From david at ar.media.kyoto-u.ac.jp Tue Feb 26 22:10:52 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 27 Feb 2008 12:10:52 +0900 Subject: [Numpy-discussion] FORTRAN compiler detection In-Reply-To: <47C4781E.8040809@enthought.com> References: <47C45159.2080900@stsci.edu> <3d375d730802261002x42959cceja2003d56930abaa5@mail.gmail.com> <47C455ED.1060607@stsci.edu> <3d375d730802261043v4c4a8b26wc33568b95f095fc9@mail.gmail.com> <47C45E9E.5030607@stsci.edu> <47C4781E.8040809@enthought.com> Message-ID: <47C4D4BC.5090407@ar.media.kyoto-u.ac.jp> Travis E. Oliphant wrote: > You need to use it in your case because you are linking against lapack > and blas that were built with a Fortran compiler. The Fortran compiler > is only used in the link step of lapack_lite.so > If for some reasons, Christopher does not want to use fortran compiler at all, I just want to mention that on solaris (and linux), it should be possible to link with BLAS/LAPACK without a fortran compiler, if using sun performance libraries, since they do not depend on fortran at all (but do provide fortran calling convention). 
But it is certainly easier to use a fortran compiler, since this configuration is supported out of the box. cheers, David From efiring at hawaii.edu Wed Feb 27 02:02:34 2008 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 26 Feb 2008 21:02:34 -1000 Subject: [Numpy-discussion] Optimize speed of for loop using numpy In-Reply-To: References: Message-ID: <47C50B0A.2000001@hawaii.edu> Trond, See if the attached file contains something close to what you need. It has no loops at all; I have not timed it, but it should be quite quick. I have given it only a cursory check, so I don't guarantee it works correctly. Depending on how your particular NetCDF interface works, you might need to copy arrays from NetCDF to ensure they are genuine ndarray objects. For plotting, you might want to try matplotlib. I think you will find it easier to use than GMT, especially if you are accustomed to matlab. Eric Trond Kristiansen wrote: > Hi again. > > I have attached the function that the FOR loop is part of as a python file. > What I am trying to do is to create a set of functions that will read the > output files (NetCDF) from running the ROMS model (ocean model). The output > file is organized in xi (x-direction), eta (y-direction), and s > (z-direction) where the s-values are vertical layers and not depth. This > particular function (z_slice) will find the closest upper and lower s-layer > for a given depth in meters (e.g. -100). Then values from the two selcted > layers will be interpolated to create a new layer at the selected depth > (-100). The problem is that the s-layers follow the bathymetry and a > particular s-layer will therefore sometimes be above and sometimes below the > selected depth that we want to interpolate to. That's why I need a quick > script that searches all of the layers and find the upper and lower layers > for a given depth value (which is negative). The z_r is a 3D array > (s,eta,xi) that is created using the netcdf module. > > The main goal of these set of functions is to move away from using matlab, > but also to speed things up. The sliced data array will be plotted using GMT > or pyNGL. > > Thanks for helping me. Cheers, Trond > -------------- next part -------------- A non-text attachment was scrubbed... Name: bracket.py Type: text/x-python Size: 1673 bytes Desc: not available URL: From ndbecker2 at gmail.com Wed Feb 27 05:53:09 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 27 Feb 2008 05:53:09 -0500 Subject: [Numpy-discussion] A little help please? References: <47C42CB2.7080007@enthought.com> Message-ID: Travis E. Oliphant wrote: > Neal Becker wrote: >> My user-defined type project has mostly gone well, but I'm stuck on >> mixed-type arithmetic. >> >> I have 2 types: cmplx_int32 and cmplx_int64. I have added basic >> arithmetic for those types, and for mix of those arrays and their >> respective scalars. But mixed arithmetic only partly works > This is an area that needs testing and possible fixes. The relevant > code is in ufuncobject.c (select_types) and in multiarraymodule.c > (PyArray_CanCoerceScalar). If you can go through that code you may be > able to see what the problem is and let us know. > > I tried to support this kind of thing you are doing, but I'm not sure > how well I succeeded because I didn't have time or the code to test it > with. Thus, there is still some work to do. > > The fact that radd is not called is because ufuncs try to handle > everything (the ufunc is more general than just the functions with "r" > prefixes. 
I think one problem may be due to the fact that the first > argument to a ufunc is the one that defines the search for the correctly > registered function and there may be no code to allow other arguments to > direct the search should that one fail. > > I'm actually pleased you've gotten this far. I'll keep trying to help > as I get time. > The code for this is a bit hard to understand. It does appear that it only searches for a conversion on the 2nd argument. I don't think that's desirable behavior. What I'm wondering is, this works fine for builtin types. What is different in the handling of builtin types? From trond at unc.edu Wed Feb 27 09:05:06 2008 From: trond at unc.edu (Trond Kristiansen) Date: Wed, 27 Feb 2008 09:05:06 -0500 Subject: [Numpy-discussion] Optimize speed of for loop using numpy In-Reply-To: <47C50B0A.2000001@hawaii.edu> Message-ID: Hey all. I would just like to thank you all for extremely good feedback on my problem with optimizing loops. Thank you all for being so helpful. Cheers, Trond From lxander.m at gmail.com Wed Feb 27 09:34:41 2008 From: lxander.m at gmail.com (Alexander Michael) Date: Wed, 27 Feb 2008 09:34:41 -0500 Subject: [Numpy-discussion] Trouble With MaskedArray and Shared Masks In-Reply-To: <200802261432.06070.pgmdevlist@gmail.com> References: <525f23e80802260625y755f8610o252171a22d27fef7@mail.gmail.com> <200802261432.06070.pgmdevlist@gmail.com> Message-ID: <525f23e80802270634s74543f3eqc18e14b9616003cf@mail.gmail.com> On Tue, Feb 26, 2008 at 2:32 PM, Pierre GM wrote: > Alexander, > The rationale behind the current behavior is to avoid an accidental > propagation of the mask. Consider the following example: > > >>>m = numpy.array([1,0,0,1,0], dtype=bool_) > >>>x = numpy.array([1,2,3,4,5]) > >>>y = numpy.sqrt([5,4,3,2,1]) > >>>mx = masked_array(x,mask=m) > >>>my = masked_array(y,mask=m) > >>>mx[0] = 0 > >>>print mx,my, m > [0 2 3 -- 5] [-- 4 3 -- 1] [ True False False True False] > > At the creation, mx._sharedmask and my._sharedmask are both True. Setting > mx[0]=0 forces mx._mask to be copied, so that we don't affect the mask of my. > > Now, > >>>m = numpy.array([1,0,0,1,0], dtype=bool_) > >>>x = numpy.array([1,2,3,4,5]) > >>>y = numpy.sqrt([5,4,3,2,1]) > >>>mx = masked_array(x,mask=m) > >>>my = masked_array(y,mask=m) > >>>mx._sharedmask = False > >>>mx[0] = 0 > >>>print mx,my, m > [0 2 3 -- 5] [5 4 3 -- 1] [False False False True False] > > By mx._sharedmask=False, we deceived numpy.ma into thinking that it's OK to > update the mask of mx (that is, m), and my gets updated. Sometimes it's what > you want (your case for example), often it is not: I've been bitten more than > once before reintroducing the _sharedmask flag. > > As you've observed, setting a private flag isn't a very good idea: you should > use the .unshare_mask() function instead, that copies the mask and set the > _sharedmask to False. OK, in your example, copying the mask is not needed, > but in more general cases, it is. > > At the initialization, self._sharedmask is set to (not copy). That is, if you > didn't specify copy=True at the creation (the default being copy=False), > self._sharedmask is True. Now, I recognize it's not obvious, and perhaps we > could introduce yet another parameter to masked_array/array/MaskedArray, > share_mask, that would take a default value of True and set > self._sharedmask=(not copy)&share_mask Thank you for your thorough explanation. 
I was providing the mask array to the constructor in order to do my own allocating, mostly to ensure that the MaskedArray had a dense mask that *wouldn't* be replaced with a copy without my intentional instruction. I didn't realize that the MaskedArray was not taking ownership of the provided mask (even though copy was False), because the implied usage for providing the mask explicitly is to read-only alias another MaskedArray's mask. I was working against my own goal! Now that I understand a little better, the easiest/best thing for me to do is change the way I create the MaskedArray to:

>>> a = numpy.ma.MaskedArray(
...     data=numpy.zeros((4,5), dtype=float),
...     mask=True,
...     fill_value=0.0
...     )

This appears to cause MaskedArray to create a dense mask which persists (i.e. isn't replaced by a copy) for the lifetime of the MaskedArray.

> So: should we introduce this extra parameter ?

The propagation semantics and mechanics are definitely tricky, especially considering that it seems that the "right behavior" is context dependent. Are the mask propagation rules spelled out anywhere (aside from the code! :-))? I could see some potential value to an additional argument, but the constructor is already quite complicated so I'm reluctant to say "Yes" outright, especially with my current level of understanding. At the very least, perhaps the doc-string should be amended to include the note that if a mask is provided, it is assumed to be shared and a copy of it will be made when/if it is modified.

How does the keep_mask option play into this? I don't understand what that one does yet.

Thanks!
Alex

From oliphant at enthought.com Wed Feb 27 09:50:57 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Wed, 27 Feb 2008 08:50:57 -0600
Subject: [Numpy-discussion] A little help please?
In-Reply-To: References: <47C42CB2.7080007@enthought.com>
Message-ID: <47C578D1.5060307@enthought.com>

Neal Becker wrote:
> Travis E. Oliphant wrote:
> >
> The code for this is a bit hard to understand. It does appear that it only
> searches for a conversion on the 2nd argument. I don't think that's
> desirable behavior.
>
> What I'm wondering is, this works fine for builtin types. What is different
> in the handling of builtin types?

There are quite a few differences which lead to the current issues.

1) For built-in types there is a coercion order that can be searched more intelligently, which does not exist for user-defined types.

2) For built-in types all the 1-d loops are stored in a single C array in the same order as the signatures. The entire signature list is scanned until a signature to which all inputs can be cast is found.

3) For user-defined types the 1-d loops (functions) for a particular user-defined type are stored in a linked list that is itself stored in a Python dictionary (as a C object) attached to the ufunc and keyed by the user-defined type (of the first argument).

Thus, what is missing is code to search all the linked lists in all the entries of all the user-defined types on input (only the linked list keyed by the first user-defined type is searched at the moment). This would allow similar behavior to the built-in types (but with a bit more expensive searching).

-Travis O.
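To make the difference concrete, here is a toy model of the two lookup strategies described above. It is plain Python with invented names (select_builtin, select_user, loops_by_type), not the actual C code in ufuncobject.c; the point is only that the user-defined path never considers loops registered under the second argument's type:

# A sketch, not numpy internals: both strategies pick the first
# registered signature to which every input type can be cast.
CASTS = [("cmplx_int32", "cmplx_int64")]

def can_cast(src, dst):
    return src == dst or (src, dst) in CASTS

# Built-in path: one flat list of signatures, scanned in order.
def select_builtin(flat_signatures, in_types):
    for sig in flat_signatures:
        if all(can_cast(t, s) for t, s in zip(in_types, sig)):
            return sig
    return None

# User-defined path: loops are keyed by the *first* argument's type,
# so a loop reachable only through the second argument is never found.
def select_user(loops_by_type, in_types):
    for sig in loops_by_type.get(in_types[0], []):
        if all(can_cast(t, s) for t, s in zip(in_types, sig)):
            return sig
    return None

sig64 = ("cmplx_int64", "cmplx_int64", "cmplx_int64")
loops = {"cmplx_int64": [sig64]}

print select_builtin([sig64], ("cmplx_int32", "cmplx_int64"))  # found
print select_user(loops, ("cmplx_int64", "cmplx_int32"))       # found
print select_user(loops, ("cmplx_int32", "cmplx_int64"))       # None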
From pgmdevlist at gmail.com Wed Feb 27 10:54:35 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Wed, 27 Feb 2008 10:54:35 -0500
Subject: [Numpy-discussion] Trouble With MaskedArray and Shared Masks
In-Reply-To: <525f23e80802270634s74543f3eqc18e14b9616003cf@mail.gmail.com>
References: <525f23e80802260625y755f8610o252171a22d27fef7@mail.gmail.com> <200802261432.06070.pgmdevlist@gmail.com> <525f23e80802270634s74543f3eqc18e14b9616003cf@mail.gmail.com>
Message-ID: <200802271054.35807.pgmdevlist@gmail.com>

Alexander,

> create the MaskedArray to:
> >>> a = numpy.ma.MaskedArray(
> ...     data=numpy.zeros((4,5), dtype=float),
> ...     mask=True,
> ...     fill_value=0.0
> ...     )

By far the easiest indeed.

> > So: should we introduce this extra parameter ?
>
> The propagation semantics and mechanics are definitely tricky,
> especially considering that it seems that the "right behavior" is
> context dependent. Are the mask propagation rules spelled out anywhere
> (aside from the code! :-))?

Mmh, no: we tried to avoid mask propagation as much as possible, as it can have some fairly disastrous side-effects. In other terms: no propagation by default when a mask is shared, propagation when the mask is not shared.

> I could see some potential value to an
> additional argument, but the constructor is already quite complicated
> so I'm reluctant to say "Yes" outright, especially with my current
> level of understanding.

Yes, there are already a lot of parameters, some more useful than others:

hard_mask : if True, prevents a masked value from being accidentally unmasked.
shrink : if True, forces a mask full of False to nomask.
keep_mask : when creating a new masked_array from an existing one, specifies whether the old mask should be taken into account or not. By default, keep_mask is True.

For example:

>>>import numpy.ma as ma
>>>x=ma.array([1,2,3,4,5],mask=[1,0,0,1,0])
>>>y=ma.array(x)
>>>y
masked_array(data = [-- 2 3 -- 5],
             mask = [ True False False True False],
             fill_value=999999)

We just inherited the mask from x: y._mask and x._mask are the same object, and y._sharedmask is True. Now, let's change keep_mask to False:

>>>y=ma.array(x,keep_mask=False)
>>>y
masked_array(data = [1 2 3 4 5],
             mask = False,
             fill_value=999999)

We keep the data from x, but we force the mask to the default (viz, nomask).

Now for some more fun: remember that we keep the mask by default.

>>>y=ma.array(x,mask=[0,0,0,0,1])
>>>y
masked_array(data = [-- 2 3 -- --],
             mask = [ True False False True True],
             fill_value=999999)

We kept the mask of x ([1,0,0,1,0]) and combined it with our new mask ([0,0,0,0,1]), so y._mask=[1,0,0,1,1]. If you really want [0,0,0,0,1] as a mask, just drop the initial mask:

>>>y=ma.array(x,mask=[0,0,0,0,1], keep_mask=False)
>>>y
masked_array(data = [1 2 3 4 --],
             mask = [False False False False True],
             fill_value=999999)

> At the very least, perhaps the doc-string
> should be amended to include the note that if a mask is provided, it
> is assumed to be shared and a copy of it will be made when/if it is
> modified.

Sounds like a good idea. Is there a wiki page for MaskedArrays somewhere ? If not, Alexander, feel free to start one from your experience, I'll update it if needed.

From david.huard at gmail.com Wed Feb 27 10:50:35 2008
From: david.huard at gmail.com (David Huard)
Date: Wed, 27 Feb 2008 10:50:35 -0500
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: References: Message-ID: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>

I can look at it.
Would everyone be satisfied with a solution using regular expressions ? That is, looking for the following pattern:

pattern = re.compile(r"""
^\s* # leading white space
(.*) # Data
%s? # Zero or one comment character
(.*) # Comments
\s*$ # Trailing white space
"""%comments, re.VERBOSE)

match = pattern.search(line)
line, comment = match.groups()

instead of

line = line[:line.find(comments)].strip()

By the way, is there a test function for loadtxt and savetxt ? I couldn't find one.

David

2008/2/26, Alan G Isaac :
> On Tue, 26 Feb 2008, Lisandro Dalcin apparently wrote:
> > I believe the current 'loadtxt' function is broken
>
> I agree:
> http://projects.scipy.org/pipermail/numpy-discussion/2007-November/030057.html
>
> Cheers,
> Alan Isaac
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From david.huard at gmail.com Wed Feb 27 11:26:28 2008
From: david.huard at gmail.com (David Huard)
Date: Wed, 27 Feb 2008 11:26:28 -0500
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
Message-ID: <91cf711d0802270826ld06b816s2da2ebe7cb6bff4f@mail.gmail.com>

Lisandro,

When you have some time, could you check that this patch solves your problem (and does not introduce new ones) ?

David

Index: numpy/lib/io.py
===================================================================
--- numpy/lib/io.py	(revision 4824)
+++ numpy/lib/io.py	(working copy)
@@ -11,6 +11,7 @@
 import cStringIO
 import tempfile
 import os
+import re
 from cPickle import load as _cload, loads
 from _datasource import DataSource
@@ -291,9 +292,12 @@
     converterseq = [_getconv(dtype.fields[name][0]) \
                     for name in dtype.names]
+    # Remove comments and leading/trailing white space
+    pattern = re.compile(comments)
     for i,line in enumerate(fh):
         if i<skiprows: continue
[...]

2008/2/27, David Huard :
> I can look at it.
>
> Would everyone be satisfied with a solution using regular expressions ?
> That is, looking for the following pattern:
>
> pattern = re.compile(r"""
> ^\s* # leading white space
> (.*) # Data
> %s? # Zero or one comment character
> (.*) # Comments
> \s*$ # Trailing white space
> """%comments, re.VERBOSE)
>
> match = pattern.search(line)
> line, comment = match.groups()
>
> instead of
>
> line = line[:line.find(comments)].strip()
>
> By the way, is there a test function for loadtxt and savetxt ? I couldn't
> find one.
>
> David
>
> 2008/2/26, Alan G Isaac :
> > On Tue, 26 Feb 2008, Lisandro Dalcin apparently wrote:
> > > I believe the current 'loadtxt' function is broken
> >
> > I agree:
> > http://projects.scipy.org/pipermail/numpy-discussion/2007-November/030057.html
> >
> > Cheers,
> > Alan Isaac
> >
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion at scipy.org
> > http://projects.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Chris.Barker at noaa.gov Wed Feb 27 12:41:57 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 27 Feb 2008 09:41:57 -0800
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
Message-ID: <47C5A0E5.9050302@noaa.gov>

David Huard wrote:
> Would everyone be satisfied with a solution using regular expressions ?

Maybe it's because regular expressions make me itch, but I think it's overkill for this.

The issue here is a result of what I consider a wart in python's string methods -- string.find() returns a valid index ( -1 ) when it fails to find anything. The usual way to work with this is to test for it:

print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print line

which does seem like a lot of extra code.

In this case, that wasn't done, as most of the time there is a newline at the end that can be thrown away anyway, so the -1 index is OK. So that inspired the following solution -- just add an extra space every time:

print "simply pad the line with a space:"
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print line

an extra string creation, but simple.

> pattern = re.compile(r"""
> ^\s* # leading white space
> (.*) # Data
> %s? # Zero or one comment character
> (.*) # Comments
> \s*$ # Trailing white space
> """%comments, re.VERBOSE)

This pattern fails if the last character of the line is a comment character, and if it is a comment-only line, though I'm sure that could be fixed. I still prefer the python string methods approaches, though.

I've enclosed a little test code, that gives these results:

old way -- this fails with no comment or newline
1 2 3 4 5
1 2 3 4
1 2 3 4 5

with regular expression:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5#
# 1 2 3 4 5

simply pad the line with a space:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

test for comment not found:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

My suggestions work on all my test cases. We really should put these, and others, into a real unit test when this fix is added.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test-loadtxt.py
Type: text/x-python
Size: 1365 bytes
Desc: not available
URL:

From david.huard at gmail.com Wed Feb 27 13:22:55 2008
From: david.huard at gmail.com (David Huard)
Date: Wed, 27 Feb 2008 13:22:55 -0500
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: <47C5A0E5.9050302@noaa.gov>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com> <47C5A0E5.9050302@noaa.gov>
Message-ID: <91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>

Hi Christopher,

The advantage of using regular expressions is that in this case it gives you some flexibility that wasn't there before. For instance, if for any reason there are two types of characters that coexist in the file to mark comments, using

pattern = re.compile(comments)
for i,line in enumerate(fh):
    if i<skiprows: continue
[...]

can take care of that automatically if comments is a regular expression.
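For instance, with two comment styles in the same file (an illustrative snippet only -- the '[#%]' pattern and the sample lines are invented here, and this is not the actual patch):

import re

comments = "[#%]"            # either '#' or '%' starts a comment
pattern = re.compile(comments)

for line in ["1 2 3 # hash-style comment",
             "4 5 6 % percent-style comment",
             "7 8 9"]:
    # split on the first comment character, if any, and keep the data
    print pattern.split(line, 1)[0].strip()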
2008/2/27, Christopher Barker :
> David Huard wrote:
> > Would everyone be satisfied with a solution using regular expressions ?
>
> Maybe it's because regular expressions make me itch, but I think it's
> overkill for this.
>
> The issue here is a result of what I consider a wart in python's string
> methods -- string.find() returns a valid index ( -1 ) when it fails to
> find anything. The usual way to work with this is to test for it:
>
> print "test for comment not found:"
> for line in SampleLines:
>     i = line.find(comments)
>     if i == -1:
>         line = line.strip()
>     else:
>         line = line[:i].strip()
>     print line
>
> which does seem like a lot of extra code.
>
> In this case, that wasn't done, as most of the time there is a newline
> at the end that can be thrown away anyway, so the -1 index is OK. So
> that inspired the following solution -- just add an extra space every
> time:
>
> print "simply pad the line with a space:"
> for line in SampleLines:
>     line += " "
>     line = line[:line.find(comments)].strip()
>     print line
>
> an extra string creation, but simple.
>
> > pattern = re.compile(r"""
> > ^\s* # leading white space
> > (.*) # Data
> > %s? # Zero or one comment character
> > (.*) # Comments
> > \s*$ # Trailing white space
> > """%comments, re.VERBOSE)
>
> This pattern fails if the last character of the line is a comment
> character, and if it is a comment-only line, though I'm sure that could
> be fixed. I still prefer the python string methods approaches, though.
>
> I've enclosed a little test code, that gives these results:
>
> old way -- this fails with no comment or newline
> 1 2 3 4 5
> 1 2 3 4
> 1 2 3 4 5
>
> with regular expression:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5#
> # 1 2 3 4 5
>
> simply pad the line with a space:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5
>
> test for comment not found:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5
>
> My suggestions work on all my test cases. We really should put these,
> and others, into a real unit test when this fix is added.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aisaac at american.edu Wed Feb 27 14:58:03 2008
From: aisaac at american.edu (Alan Isaac)
Date: Wed, 27 Feb 2008 14:58:03 -0500
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: <47C5A0E5.9050302@noaa.gov>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com> <47C5A0E5.9050302@noaa.gov>
Message-ID:

On Wed, 27 Feb 2008, Christopher Barker wrote:
> The issue here is a result of what I consider a wart in python's string
> methods -- string.find() returns a valid index ( -1 ) when
> it fails to find anything.

Use index instead?

Cheers,
Alan Isaac

From Chris.Barker at noaa.gov Wed Feb 27 15:16:35 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 27 Feb 2008 12:16:35 -0800
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com> <47C5A0E5.9050302@noaa.gov>
Message-ID: <47C5C523.20505@noaa.gov>

Alan Isaac wrote:
> Use index instead?

yup, that'll work.
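For instance, a sketch of the idea (not the exact code that ended up in loadtxt):

# str.index raises ValueError instead of returning -1, so the
# no-comment case is handled explicitly rather than silently
# eating the line's last character.
def strip_comment(line, comments):
    try:
        line = line[:line.index(comments)]
    except ValueError:
        pass                      # no comment character on this line
    return line.strip()

print strip_comment("1 2 3 4 5 # a comment\n", "#")  # -> '1 2 3 4 5'
print strip_comment("1 2 3 4 5", "#")                # unchanged, even with
                                                     # no trailing newline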
enclosed is another test file, with that and one using
string.split(comments) instead.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test-loadtxt.py
Type: text/x-python
Size: 2080 bytes
Desc: not available
URL: 

From Chris.Barker at noaa.gov  Wed Feb 27 15:19:28 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 27 Feb 2008 12:19:28 -0800
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: <91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
	<47C5A0E5.9050302@noaa.gov>
	<91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>
Message-ID: <47C5C5D0.6090400@noaa.gov>

David Huard wrote:
> The advantage of using regular expressions is that in this case it gives
> you some flexibility that wasn't there before. For instance, if for any
> reason there are two types of characters that coexist in the file to mark
> comments, using
> pattern = re.compile(comments)
> can take care of that automatically if comments is a regular expression.

OK -- but loadtxt() doesn't support that now anyway. I'm not writing the
code, nor using it at the moment, so it's fine with me either way, but
the re should certainly support the examples I gave that don't work now
(plus probably others; that's not a comprehensive list of possibilities).

-CHB

> 2008/2/27, Christopher Barker
> This pattern fails if the last character of the line is a comment
> character, and if it is a comment only line

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From dalcinl at gmail.com  Wed Feb 27 16:02:18 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Wed, 27 Feb 2008 18:02:18 -0300
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: <47C5C5D0.6090400@noaa.gov>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
	<47C5A0E5.9050302@noaa.gov>
	<91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>
	<47C5C5D0.6090400@noaa.gov>
Message-ID: 

Well, after all that said, I'm also fine with either approach. Anyway,
I would say that my personal preference is for the one using
'str.index', as it is the simplest one regarding the old code.

Like Christopher, I rarely (never?) use 'loadtxt'. But this issue
drove a coworker crazy (he is a newbie in python/numpy).

BTW, I'm pretty sure that some time ago Guido agreed about the removal
of str.find for Py3k, but it is still there in py3k-repo. Feel free to
ask at python-dev if any of you consider it appropriate.

Regards,

On 2/27/08, Christopher Barker wrote:
> David Huard wrote:
> > The advantage of using regular expressions is that in this case it gives
> > you some flexibility that wasn't there before. For instance, if for any
> > reason there are two types of characters that coexist in the file to mark
> > comments, using
>
> > pattern = re.compile(comments)
>
> > can take care of that automatically if comments is a regular expression.
>
> OK -- but loadtxt() doesn't support that now anyway.
> I'm not writing the
> code, nor using it at the moment, so it's fine with me either way, but
> the re should certainly support the examples I gave that don't work now
> (plus probably others; that's not a comprehensive list of possibilities).
>
> -CHB
>
> > 2008/2/27, Christopher Barker
> > This pattern fails if the last character of the line is a comment
> > character, and if it is a comment only line
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From oliphant at enthought.com  Wed Feb 27 17:04:55 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Wed, 27 Feb 2008 16:04:55 -0600
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: 
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
	<47C5A0E5.9050302@noaa.gov>
	<91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>
	<47C5C5D0.6090400@noaa.gov>
Message-ID: <47C5DE87.9090102@enthought.com>

Lisandro Dalcin wrote:
> Well, after all that said, I'm also fine with either approach. Anyway,
> I would say that my personal preference is for the one using
> 'str.index', as it is the simplest one regarding the old code.
>
> Like Christopher, I rarely (never?) use 'loadtxt'. But this issue
> drove a coworker crazy (he is a newbie in python/numpy).
>
> BTW, I'm pretty sure that some time ago Guido agreed about the removal
> of str.find for Py3k, but it is still there in py3k-repo. Feel free to
> ask at python-dev if any of you consider it appropriate.
>
Did this discussion resolve with a fix that can go in before 1.0.5 is
released?

-Travis O.

From robert.kern at gmail.com  Wed Feb 27 17:56:57 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 27 Feb 2008 16:56:57 -0600
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: <47C5DE87.9090102@enthought.com>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
	<47C5A0E5.9050302@noaa.gov>
	<91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>
	<47C5C5D0.6090400@noaa.gov>
	<47C5DE87.9090102@enthought.com>
Message-ID: <3d375d730802271456mbce26e6k4c2dacaf232fc39d@mail.gmail.com>

On Wed, Feb 27, 2008 at 4:04 PM, Travis E. Oliphant wrote:
> Did this discussion resolve with a fix that can go in before 1.0.5 is
> released?

Fixed in r4827.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From sdb at cloud9.net  Wed Feb 27 18:10:03 2008
From: sdb at cloud9.net (Stuart Brorson)
Date: Wed, 27 Feb 2008 18:10:03 -0500 (EST)
Subject: [Numpy-discussion] Handling of numpy.power(0, )
Message-ID: 

I have been poking at the limits of NumPy's handling of powers of
zero. I find some results which are disturbing, at least to me.
Here they are: In [67]: A = numpy.array([0, 0, 0]) In [68]: B = numpy.array([-1, 0, 1+1j]) In [69]: numpy.power(A, B) Out[69]: array([ 0.+0.j, 1.+0.j, 0.+0.j]) IMO, the answers should be [Inf, NaN, and NaN]. The reasons: ** 0^-1 is 1/0, which is infinity. Not much argument here, I would think. ** 0^0: This is problematic. People smarter than I have argued for both NaN and for 1, although I understand that 1 is the preferred value nowadays. If the NumPy gurus also think so, then I buy it. ** 0^(x+y*i): This one is tricky; please bear with me and I'll walk through the reason it should be NaN. In general, one can write a^(x+y*i) = (r exp(i*theta))^(x+y*i) where r, theta, x, and y are all reals. Then, this expression can be rearranged as: (r^x) * (r^i*y) * exp(i*theta*(x+y*i)) = (r^x) * (r^i*y) * exp(i*theta*x) * exp(-theta*y) Now consider what happens to each term if r = 0. -- r^x is either 0^ = 1, or 0^ = Inf. -- r^(i*y) = exp(i*y*ln(r)). If y != 0 (i.e. complex power), then taking the ln of r = 0 is -Inf. But what's exp(i*-Inf)? It's probably NaN, since nothing else makes sense. Note that if y == 0 (real power), then this term is still NaN (y*ln(r) = 0*ln(0) = Nan). However, by convention, 0^ is something other than NaN. -- exp(i*theta*x) is just a complex number. -- exp(-theta*y) is just a real number. Therefore, for 0^ we have Inf * NaN * * , which is NaN. Another observation to chew on. I know NumPy != Matlab, but FWIW, here's what Matlab says about these values: >> A = [0, 0, 0] A = 0 0 0 >> B = [-1, 0, 1+1*i] B = -1.0000 0 1.0000 + 1.0000i >> A .^ B ans = Inf 1.0000 NaN + NaNi Any reactions to this? Does NumPy just make library calls when computing power, or does it do any trapping of corner cases? And should the returns from power conform to the above suggestions? Regards, Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/ From Chris.Barker at noaa.gov Wed Feb 27 19:31:50 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 27 Feb 2008 16:31:50 -0800 Subject: [Numpy-discussion] loadtxt broken if file does not end in newline In-Reply-To: <3d375d730802271456mbce26e6k4c2dacaf232fc39d@mail.gmail.com> References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com> <47C5A0E5.9050302@noaa.gov> <91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com> <47C5C5D0.6090400@noaa.gov> <47C5DE87.9090102@enthought.com> <3d375d730802271456mbce26e6k4c2dacaf232fc39d@mail.gmail.com> Message-ID: <47C600F6.1080108@noaa.gov> Robert Kern wrote: > Fixed in r4827. Thanks Robert. For the record, this is the fixed version: comment_start = line.find(comments) if comment_start > 0: line = line[:comments_start].strip() else: line = line.strip() Just as a matter of interest, why this, rather than line.index()? Are exceptions slower than an if test? Also, I don't see any io tests in: numpy/lib/tests Is that where they should be? It seems like a good idea to have a few... If I did find the time to write some tests -- how does one go about it for this sort of thing? Do I put a couple sample input files in SVN? Or does the test code write out the sample files, then read them in to test? Or maybe do it all in memory with sStringIO or something. Are there any examples of tests of file reading code that I could borrow from? thanks, -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From robert.kern at gmail.com Wed Feb 27 19:45:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 27 Feb 2008 18:45:20 -0600 Subject: [Numpy-discussion] Handling of numpy.power(0, ) In-Reply-To: References: Message-ID: <3d375d730802271645n46ff4152v7c766230fd04867d@mail.gmail.com> On Wed, Feb 27, 2008 at 5:10 PM, Stuart Brorson wrote: > I have been poking at the limits of NumPy's handling of powers of > zero. I find some results which are disturbing, at least to me. > Here they are: > > In [67]: A = numpy.array([0, 0, 0]) > > In [68]: B = numpy.array([-1, 0, 1+1j]) > > In [69]: numpy.power(A, B) > Out[69]: array([ 0.+0.j, 1.+0.j, 0.+0.j]) > > IMO, the answers should be [Inf, NaN, and NaN]. The reasons: > > ** 0^-1 is 1/0, which is infinity. Not much argument here, I would > think. I believe the failure is occurring because of the coercion to complex. With plain floats: In [14]: zeros(2) ** array([-1.0, 0.0]) Out[14]: array([ Inf, 1.]) > ** 0^0: This is problematic. People smarter than I have argued for > both NaN and for 1, although I understand that 1 is the preferred > value nowadays. If the NumPy gurus also think so, then I buy it. Python gives 1.0: In [12]: 0.0 ** 0.0 Out[12]: 1.0 I'm not sure about the reasons for this, but I'm willing to assume that they're acceptable. > ** 0^(x+y*i): This one is tricky; please bear with me and I'll walk > through the reason it should be NaN. > > In general, one can write a^(x+y*i) = (r exp(i*theta))^(x+y*i) where > r, theta, x, and y are all reals. Then, this expression can be > rearranged as: > > (r^x) * (r^i*y) * exp(i*theta*(x+y*i)) > > = (r^x) * (r^i*y) * exp(i*theta*x) * exp(-theta*y) > > Now consider what happens to each term if r = 0. You could probably stop the analysis here. If a=0, then theta is already undefined. I believe that NaN+NaN*j is the correct answer. The relevant function is nc_pow() in numpy/core/src/umathmodule.c. The problem is that a=(0+0j) is special-cased incorrectly: if (ar == 0. && ai == 0.) { r->real = 0.; r->imag = 0.; return; } The preceding if clause (br == 0. && bi == 0.) takes care of the (0+0j)**(0+0j) case. It's worth noting that the general case at the bottom returns the expected (NaN+NaN*j). However, we can't just remove this if-clause; it makes (0+0j)**(-1+0j) return (NaN+NaN*j). It also makes (0+0j)**(1.5+0j) give (NaN+NaN*j), too. > -- r^x is either 0^ = 1, or 0^ = Inf. > > -- r^(i*y) = exp(i*y*ln(r)). If y != 0 (i.e. complex power), then taking > the ln of r = 0 is -Inf. But what's exp(i*-Inf)? It's probably NaN, > since nothing else makes sense. > > Note that if y == 0 (real power), then this term is still NaN (y*ln(r) > = 0*ln(0) = Nan). However, by convention, 0^ is something other > than NaN. > > -- exp(i*theta*x) is just a complex number. > > -- exp(-theta*y) is just a real number. > > Therefore, for 0^ we have Inf * NaN * * , > which is NaN. > > Another observation to chew on. I know NumPy != Matlab, but FWIW, > here's what Matlab says about these values: > > >> A = [0, 0, 0] > > A = > > 0 0 0 > > >> B = [-1, 0, 1+1*i] > > B = > > -1.0000 0 1.0000 + 1.0000i > > >> A .^ B > > ans = > > Inf 1.0000 NaN + NaNi > > > > Any reactions to this? Does NumPy just make library calls when > computing power, or does it do any trapping of corner cases? 
And > should the returns from power conform to the above suggestions? In this case, I think Matlab looks about right. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Wed Feb 27 19:58:16 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 27 Feb 2008 18:58:16 -0600 Subject: [Numpy-discussion] loadtxt broken if file does not end in newline In-Reply-To: <47C600F6.1080108@noaa.gov> References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com> <47C5A0E5.9050302@noaa.gov> <91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com> <47C5C5D0.6090400@noaa.gov> <47C5DE87.9090102@enthought.com> <3d375d730802271456mbce26e6k4c2dacaf232fc39d@mail.gmail.com> <47C600F6.1080108@noaa.gov> Message-ID: <3d375d730802271658q76f48fddgc6f4d949a60ac7b6@mail.gmail.com> On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote: > Robert Kern wrote: > > Fixed in r4827. > > Thanks Robert. For the record, this is the fixed version: > > comment_start = line.find(comments) > if comment_start > 0: > line = line[:comments_start].strip() > else: > line = line.strip() > > Just as a matter of interest, why this, rather than line.index()? Are > exceptions slower than an if test? Yes. > Also, > > I don't see any io tests in: > > numpy/lib/tests > > Is that where they should be? It seems like a good idea to have a few... Yes. > If I did find the time to write some tests -- how does one go about it > for this sort of thing? Do I put a couple sample input files in SVN? Or > does the test code write out the sample files, then read them in to > test? Or maybe do it all in memory with sStringIO or something. Any of the above depending on the situation. Use cStringIO if you can. Put files into numpy/lib/tests/data/ otherwise. Locate them using os.path.join(os.path.dirname(__file__), 'data', 'mytestfile.dat'). Write things out at runtime *only* if you use tempfile correctly and are sure you clean up properly after yourself whether the test passes or fails. > Are > there any examples of tests of file reading code that I could borrow from? numpy/lib/tests/test_format.py Unfortunately, they have been written for nose, which we haven't moved to, yet, for numpy itself. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From travis at enthought.com Wed Feb 27 22:12:21 2008 From: travis at enthought.com (Travis Vaught) Date: Wed, 27 Feb 2008 21:12:21 -0600 Subject: [Numpy-discussion] ANN: Enthought Python Distribution - Beta Message-ID: Greetings, Enthought is very excited about our pending wide-release of the Enthought Python Distribution (EPD). After much effort, we finally think we're close to the first non-beta release. As one more quality check, we'd love to impose on you guys one more time to try out a just- minted beta release for Windows (EPD 2.5.2001_beta1) and give us some feedback. Any major problems will, of course, be fixed for the next release, but we're open to any suggestions for improvement for future releases as well. http://www.enthought.com/epd For those of you unfamiliar with EPD, it's a "kitchen-sink-included" distribution of Python with over 60 additional tools and libraries. 
It's bundled into a nice MSI installer on Windows and includes NumPy,
SciPy, IPython, 2D and 3D visualization, database adapters and a lot of
other tools right out of the box. We'll have support for RedHat and Mac
OS X in a general release very soon.

For academic, non-profit or hobbyist use, EPD is, and will remain, free.
We are charging an annual subscription for commercial and governmental
access to downloads and updates of EPD. Downloaded files may be used
indefinitely past the subscription term. You are welcome to try out the
beta indefinitely, regardless of your commercial/non-commercial
persuasion. When the final (non-beta) version is released, commercial
folks can try it for 30 days. You can check out the license terms
(http://www.enthought.com/products/epdlicense.php) if you're interested
in the details.

EPD is compelling because it solves a lingering packaging and
distribution problem, but also because of the libraries which it
includes. We owe many folks on this list a debt of gratitude for their
work on some really great tools. So, thanks ... and enjoy!

Best Regards,

Travis N. Vaught

From aisaac at american.edu  Thu Feb 28 01:12:02 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 28 Feb 2008 01:12:02 -0500
Subject: [Numpy-discussion] =?utf-8?q?loadtxt_broken_if_file_does_not_end_?=
	=?utf-8?q?in=09newline?=
In-Reply-To: <3d375d730802271658q76f48fddgc6f4d949a60ac7b6@mail.gmail.com>
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
	<47C5A0E5.9050302@noaa.gov>
	<91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>
	<47C5C5D0.6090400@noaa.gov>
	<47C5DE87.9090102@enthought.com>
	<3d375d730802271456mbce26e6k4c2dacaf232fc39d@mail.gmail.com>
	<47C600F6.1080108@noaa.gov>
	<3d375d730802271658q76f48fddgc6f4d949a60ac7b6@mail.gmail.com>
Message-ID: 

> On Wed, 27 Feb 2008, Robert Kern apparently wrote:
>> Fixed in r4827.

> On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote:
>> For the record, this is the fixed version:
>> comment_start = line.find(comments)
>> if comment_start > 0:
>>     line = line[:comments_start].strip()
>> else:
>>     line = line.strip()

Three problems.
1. I do not see this change here:
Am I looking in the wrong place?
2. Can I assume this was not cut and paste?
Otherwise, I see two problems.
2a. comment_start vs. comments_start (spelling)
2b. >0 instead of >=0 (e.g., "#try me!" would not be skipped)

So I think the desired lines are actually::

    comment_start = line.find(comments)
    if comment_start >= 0:
        line = line[:comment_start].strip()
    else:
        line = line.strip()
    return line

Cheers,
Alan Isaac

From aisaac at american.edu  Thu Feb 28 01:12:03 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 28 Feb 2008 01:12:03 -0500
Subject: [Numpy-discussion] Handling of numpy.power(0, )
In-Reply-To: 
References: 
Message-ID: 

On Wed, 27 Feb 2008, Stuart Brorson apparently wrote:
> ** 0^0: This is problematic.

Accessible discussion:

Cheers,
Alan Isaac

From aisaac at american.edu  Thu Feb 28 01:12:05 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 28 Feb 2008 01:12:05 -0500
Subject: [Numpy-discussion] ANN: Enthought Python Distribution - Beta
In-Reply-To: 
References: 
Message-ID: 

On Wed, 27 Feb 2008, Travis Vaught apparently wrote:
> http://www.enthought.com/epd

Looks good.
An increasing number of my students are buying Macs,
so the OSX support will be very welcome.
Cheers, Alan Isaac From robert.kern at gmail.com Thu Feb 28 01:39:51 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Feb 2008 00:39:51 -0600 Subject: [Numpy-discussion] loadtxt broken if file does not end in newline In-Reply-To: References: <47C5A0E5.9050302@noaa.gov> <91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com> <47C5C5D0.6090400@noaa.gov> <47C5DE87.9090102@enthought.com> <3d375d730802271456mbce26e6k4c2dacaf232fc39d@mail.gmail.com> <47C600F6.1080108@noaa.gov> <3d375d730802271658q76f48fddgc6f4d949a60ac7b6@mail.gmail.com> Message-ID: <3d375d730802272239y48cd5824q77e92134306e40bc@mail.gmail.com> On Thu, Feb 28, 2008 at 12:12 AM, Alan G Isaac wrote: > > On Wed, 27 Feb 2008, Robert Kern apparently wrote: > >> Fixed in r4827. > > > > > On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote: > >> For the record, this is the fixed version: > >> comment_start = line.find(comments) > >> if comment_start > 0: > >> line = line[:comments_start].strip() > >> else: > >> line = line.strip() > > > Three problems. > 1. I do not see this change here: > > Am I looking in the wrong place? I fixed the version in numpy/lib/io.py. I didn't know there was a second version lying around. It was moved there during in the lib_io branch but did not get removed from numpy/core during the merge. > 2. Can I assume this was not cut and past? > Otherwise, I see two problems. > > 2a. comment_start vs. comments_start (spelling) > 2b. >0 instead of >=0 (e.g., "#try me!" would not be skipped) > > So I think the desired lines are actually:: > > > comment_start = line.find(comments) > if comment_start >= 0: > line = line[:comment_start].strip() > else: > line = line.strip() > return line The errors were real. They are now fixed, thank you. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From devnew at gmail.com Thu Feb 28 02:17:15 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Wed, 27 Feb 2008 23:17:15 -0800 (PST) Subject: [Numpy-discussion] confusion about eigenvector Message-ID: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> i all I am learning PCA method by reading up Turk&Petland papers etc while trying out PCA on a set of greyscale images using python, and numpy I tried to create eigenvectors and facespace. i have facesarray--- an NXP numpy.ndarray that contains data of images N=numof images,P=pixels in an image avgarray --1XP array containing avg value for each pixel adjustedfaces=facesarray-avgarray adjustedmatrix=matrix(adjustedfaces) adjustedmatrix_trans=adjustedmatrix.transpose() covariancematrix =adjustedmatrix*adjustedmatrix_trans evalues,evect=eigh(covariancematrix) after sorting such that most significant eigenvectors are selected. 
evectmatrix is now my eigenvectors matrix here is a sample using 4X3 greyscale images evalues [ -1.85852801e-13 6.31143639e+02 3.31182765e+03 5.29077871e+03] evect [[ 0.5 -0.06727772 0.6496399 -0.56871936] [ 0.5 -0.77317718 -0.37697426 0.10043632] [ 0.5 0.27108233 0.31014514 0.76179023] [ 0.5 0.56937257 -0.58281078 -0.29350719]] evectmatrix (sorted according to largest evalue first) [[-0.56871936 0.6496399 -0.06727772 0.5 ] [ 0.10043632 -0.37697426 -0.77317718 0.5 ] [ 0.76179023 0.31014514 0.27108233 0.5 ] [-0.29350719 -0.58281078 0.56937257 0.5 ]] then i can create facespace by facespace=evectmat*adjustedfaces till now i 've been following the steps as mentioned in the PCA tutorial(by Lindsay smith & others) what i want to know is that in the above evectmatrix is each row ([-0.56871936 0.6496399 -0.06727772 0.5 ] etc) an eigenvector? or does a column in the above matrix represent an eigenvector? to put it differently, should i represent an eigenvector by evectmatrix[i] or by (get_column_i_of(evectmatrix)).transpose() if someone can make this clear please do D From matthieu.brucher at gmail.com Thu Feb 28 03:27:49 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 28 Feb 2008 09:27:49 +0100 Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> Message-ID: Hi, If your images are 4x3, your eigenvector must be 12 long. Matthieu 2008/2/28, devnew at gmail.com : > > i all > I am learning PCA method by reading up Turk&Petland papers etc > while trying out PCA on a set of greyscale images using python, and > numpy I tried to create eigenvectors and facespace. > > i have > facesarray--- an NXP numpy.ndarray that contains data of images > N=numof images,P=pixels in an image > avgarray --1XP array containing avg value for each pixel > adjustedfaces=facesarray-avgarray > adjustedmatrix=matrix(adjustedfaces) > adjustedmatrix_trans=adjustedmatrix.transpose() > covariancematrix =adjustedmatrix*adjustedmatrix_trans > evalues,evect=eigh(covariancematrix) > > after sorting such that most significant eigenvectors are selected. > evectmatrix is now my eigenvectors matrix > > here is a sample using 4X3 greyscale images > > evalues > [ -1.85852801e-13 6.31143639e+02 3.31182765e+03 5.29077871e+03] > evect > [[ 0.5 -0.06727772 0.6496399 -0.56871936] > [ 0.5 -0.77317718 -0.37697426 0.10043632] > [ 0.5 0.27108233 0.31014514 0.76179023] > [ 0.5 0.56937257 -0.58281078 -0.29350719]] > > evectmatrix (sorted according to largest evalue first) > [[-0.56871936 0.6496399 -0.06727772 0.5 ] > [ 0.10043632 -0.37697426 -0.77317718 0.5 ] > [ 0.76179023 0.31014514 0.27108233 0.5 ] > [-0.29350719 -0.58281078 0.56937257 0.5 ]] > > then i can create facespace by > facespace=evectmat*adjustedfaces > > till now i 've been following the steps as mentioned in the PCA > tutorial(by Lindsay smith & others) > what i want to know is that in the above evectmatrix is each row > ([-0.56871936 0.6496399 -0.06727772 0.5 ] etc) an eigenvector? > or does a column in the above matrix represent an eigenvector? 
> to put it differently, > should i represent an eigenvector by > evectmatrix[i] or by > (get_column_i_of(evectmatrix)).transpose() > > if someone can make this clear please do > D > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From devnew at gmail.com Thu Feb 28 06:54:55 2008 From: devnew at gmail.com (devnew at gmail.com) Date: Thu, 28 Feb 2008 03:54:55 -0800 (PST) Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> Message-ID: On Feb 28, 1:27 pm, "Matthieu Brucher" wrote > If your images are 4x3, your eigenvector must be 12 long. hi thanx for reply i am using 4 images each of size 4X3 the covariance matrix obtained from adjfaces*faces_trans is 4X4 in size and that produces the evalues and eigenvectors given here evalues,evect=eigh(covarmat) i will give the data i used facemat (ndarray from data of 4 images each4X3) [[ 173. 87. 88. 163. 167. 72. 75. 159. 170. 101. 88. 165.] [ 158. 103. 115. 152. 138. 58. 81. 153. 126. 68. 73. 143.] [ 180. 87. 107. 180. 167. 65. 86. 182. 113. 41. 55. 143.] [ 155. 117. 128. 147. 147. 70. 93. 146. 153. 65. 93. 155.]] avgvals [ 166.5 98.5 109.5 160.5 154.75 66.25 83.75 160. 140.5 68.75 77.25 151.5 ] adjfaces=matrix(facemat-avgvals) [[ 6.5 -11.5 -21.5 2.5 12.25 5.75 -8.75 -1. 29.5 32.25 10.75 13.5 ] [ -8.5 4.5 5.5 -8.5 -16.75 -8.25 -2.75 -7. -14.5 -0.75 -4.25 -8.5 ] [ 13.5 -11.5 -2.5 19.5 12.25 -1.25 2.25 22. -27.5 -27.75 -22.25 -8.5 ] [-11.5 18.5 18.5 -13.5 -7.75 3.75 9.25 -14. 12.5 -3.75 15.75 3.5 ]] faces_trans =adjfaces.transpose() [[ 6.5 -8.5 13.5 -11.5 ] [-11.5 4.5 -11.5 18.5 ] [-21.5 5.5 -2.5 18.5 ] [ 2.5 -8.5 19.5 -13.5 ] [ 12.25 -16.75 12.25 -7.75] [ 5.75 -8.25 -1.25 3.75] [ -8.75 -2.75 2.25 9.25] [ -1. -7. 22. -14. ] [ 29.5 -14.5 -27.5 12.5 ] [ 32.25 -0.75 -27.75 -3.75] [ 10.75 -4.25 -22.25 15.75] [ 13.5 -8.5 -8.5 3.5 ]] covarmat =adjfaces * faces_trans [[ 3111.8125 -1080.4375 -1636.4375 -394.9375] [-1080.4375 901.3125 -114.6875 293.8125] [-1636.4375 -114.6875 3435.3125 -1684.1875] [ -394.9375 293.8125 -1684.1875 1785.3125]] evalues,evectors=eigh(covarmat) evalues [ -1.85852801e-13 6.31143639e+02 3.31182765e+03 5.29077871e+03] evectors [[ 0.5 -0.06727772 0.6496399 -0.56871936] [ 0.5 -0.77317718 -0.37697426 0.10043632] [ 0.5 0.27108233 0.31014514 0.76179023] [ 0.5 0.56937257 -0.58281078 -0.29350719]] newevectmatrix [[-0.56871936 0.6496399 -0.06727772 0.5 ] [ 0.10043632 -0.37697426 -0.77317718 0.5 ] [ 0.76179023 0.31014514 0.27108233 0.5 ] [-0.29350719 -0.58281078 0.56937257 0.5 ]] i am not getting the eigenvector of length 12 as you said pls tell me if i am doing sthing wrong D From matthieu.brucher at gmail.com Thu Feb 28 07:06:10 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 28 Feb 2008 13:06:10 +0100 Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> Message-ID: OK, what you are getting are not the eigenvectors of you data, but the eigenvectors of the transposition of your data (I suppose). 
You have two options : - either you make an eigen analysis of your data and get 12 eigenvectors - either you make an eigen analysis of the transposition of your data and then you must get the 4 eignevectors into your original space (I don't remember the exact equations to do thge latter). Matthieu 2008/2/28, devnew at gmail.com : > > > > On Feb 28, 1:27 pm, "Matthieu Brucher" wrote > > > If your images are 4x3, your eigenvector must be 12 long. > > > hi > thanx for reply > i am using 4 images each of size 4X3 > the covariance matrix obtained from adjfaces*faces_trans is 4X4 in > size and that produces the evalues and eigenvectors given here > evalues,evect=eigh(covarmat) i will give the data i used > > facemat (ndarray from data of 4 images each4X3) > [[ 173. 87. 88. 163. 167. 72. 75. 159. 170. 101. 88. > 165.] > [ 158. 103. 115. 152. 138. 58. 81. 153. 126. 68. 73. > 143.] > [ 180. 87. 107. 180. 167. 65. 86. 182. 113. 41. 55. > 143.] > [ 155. 117. 128. 147. 147. 70. 93. 146. 153. 65. 93. > 155.]] > avgvals > [ 166.5 98.5 109.5 160.5 154.75 66.25 83.75 160. > 140.5 > 68.75 77.25 151.5 ] > > adjfaces=matrix(facemat-avgvals) > [[ 6.5 -11.5 -21.5 2.5 12.25 5.75 -8.75 -1. 29.5 > 32.25 > 10.75 13.5 ] > [ -8.5 4.5 5.5 -8.5 -16.75 -8.25 -2.75 -7. -14.5 > -0.75 > -4.25 -8.5 ] > [ 13.5 -11.5 -2.5 19.5 12.25 -1.25 2.25 22. -27.5 > -27.75 > -22.25 -8.5 ] > [-11.5 18.5 18.5 -13.5 -7.75 3.75 9.25 -14. 12.5 > -3.75 > 15.75 3.5 ]] > > faces_trans =adjfaces.transpose() > [[ 6.5 -8.5 13.5 -11.5 ] > [-11.5 4.5 -11.5 18.5 ] > [-21.5 5.5 -2.5 18.5 ] > [ 2.5 -8.5 19.5 -13.5 ] > [ 12.25 -16.75 12.25 -7.75] > [ 5.75 -8.25 -1.25 3.75] > [ -8.75 -2.75 2.25 9.25] > [ -1. -7. 22. -14. ] > [ 29.5 -14.5 -27.5 12.5 ] > [ 32.25 -0.75 -27.75 -3.75] > [ 10.75 -4.25 -22.25 15.75] > [ 13.5 -8.5 -8.5 3.5 ]] > covarmat =adjfaces * faces_trans > [[ 3111.8125 -1080.4375 -1636.4375 -394.9375] > [-1080.4375 901.3125 -114.6875 293.8125] > [-1636.4375 -114.6875 3435.3125 -1684.1875] > [ -394.9375 293.8125 -1684.1875 1785.3125]] > > evalues,evectors=eigh(covarmat) > > evalues > [ -1.85852801e-13 6.31143639e+02 3.31182765e+03 5.29077871e+03] > > evectors > > [[ 0.5 -0.06727772 0.6496399 -0.56871936] > [ 0.5 -0.77317718 -0.37697426 0.10043632] > [ 0.5 0.27108233 0.31014514 0.76179023] > [ 0.5 0.56937257 -0.58281078 -0.29350719]] > > newevectmatrix > > [[-0.56871936 0.6496399 -0.06727772 0.5 ] > [ 0.10043632 -0.37697426 -0.77317718 0.5 ] > [ 0.76179023 0.31014514 0.27108233 0.5 ] > [-0.29350719 -0.58281078 0.56937257 0.5 ]] > > > i am not getting the eigenvector of length 12 as you said > pls tell me if i am doing sthing wrong > > D > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdb at cloud9.net Thu Feb 28 07:46:34 2008 From: sdb at cloud9.net (Stuart Brorson) Date: Thu, 28 Feb 2008 07:46:34 -0500 (EST) Subject: [Numpy-discussion] Handling of numpy.power(0, ) In-Reply-To: References: Message-ID: >> ** 0^0: This is problematic. > > Accessible discussion: > Thanks. That was quite informative. Indeed, I communicated with a math professor at MIT who also more or less convinced me that 0^0 = 1. 
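(For what it's worth, a quick numerical look at the limit -- my own
sketch, nothing authoritative:

    >>> import numpy
    >>> x = 10.0 ** -numpy.arange(1, 6)   # 0.1, 0.01, ..., 1e-5
    >>> x ** x                            # x**x -> 1 as x -> 0 from above
    array([ 0.79432823,  0.95499259,  0.99311605,  0.99907939,  0.99988488])

which is consistent with taking 1 as the limiting value of x^x.)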
Stuart From ndbecker2 at gmail.com Thu Feb 28 08:21:54 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 28 Feb 2008 08:21:54 -0500 Subject: [Numpy-discussion] A little help please? References: <47C42CB2.7080007@enthought.com> <47C578D1.5060307@enthought.com> Message-ID: Travis E. Oliphant wrote: > Neal Becker wrote: >> Travis E. Oliphant wrote: >> >> >> >> >> The code for this is a bit hard to understand. It does appear that it >> only >> searches for a conversion on the 2nd argument. I don't think that's >> desirable behavior. >> ... > > Thus, what is missing is code to search all the linked lists in all the > entries of all the user-defined types on input (only the linked-list > keyed by the first user-defined type is searched at the moment). This > would allow similar behavior to the built-in types (but a bit more > expensive searching). > Sounds like this needs a bit of re-thinking. Given a set of function signatures: F(a,b,c) F(d,e,f) ... The user calls: F(A,B,C) (no relation between a,A ,etc) How do we find the 'best' match? I think we can start with: Rules: 1) Only allowed (at most) 1 conversion on each argument But what is the 'best' match mean? I think that can't be decided without some sort of hierarchical relation of the types. Now given a hierarchy, I still don't know the solution, but it sounds like some kind of graph algorithm. From ndbecker2 at gmail.com Thu Feb 28 08:35:55 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 28 Feb 2008 08:35:55 -0500 Subject: [Numpy-discussion] numpyx.pyx (recent svn) works? Message-ID: I tried numpyx.pyx with cython-0.9.6.12. Here's what I got: In [2]: import numpyx In [3]: numpyx.test() -=-=-=-=-=-=-=-=-=-= printing array info for ndarray at 0x0 print number of dimensions: 0 address of strides: 0x0 strides: memory dump: -1e-30 -=-=-=-=-=-=-=-=-=-= -=-=-=-=-=-=-=-=-=-= --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/nbecker/cython/ in () /home/nbecker/cython/numpyx.pyx in numpyx.test() 94 95 for arr in [arr1,arr2,arr3,arr4,arr5]: ---> 96 print arr 97 print_array_info(arr) 98 /home/nbecker/cython/numpyx.pyx in numpyx.print_array_info() 12 13 print '-='*10 ---> 14 print 'printing array info for ndarray at 0x%0lx'%(arr,) 15 print 'print number of dimensions:',arr.nd 16 print 'address of strides: 0x%0lx'%(arr.strides,) TypeError: only length-1 arrays can be converted to Python scalars From arnar.flatberg at gmail.com Thu Feb 28 08:37:00 2008 From: arnar.flatberg at gmail.com (Arnar Flatberg) Date: Thu, 28 Feb 2008 14:37:00 +0100 Subject: [Numpy-discussion] confusion about eigenvector In-Reply-To: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com> Message-ID: <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com> On Thu, Feb 28, 2008 at 8:17 AM, devnew at gmail.com wrote: > i all > I am learning PCA method by reading up Turk&Petland papers etc > while trying out PCA on a set of greyscale images using python, and > numpy I tried to create eigenvectors and facespace. 
> > i have > facesarray--- an NXP numpy.ndarray that contains data of images > N=numof images,P=pixels in an image > avgarray --1XP array containing avg value for each pixel > adjustedfaces=facesarray-avgarray > adjustedmatrix=matrix(adjustedfaces) > adjustedmatrix_trans=adjustedmatrix.transpose() > covariancematrix =adjustedmatrix*adjustedmatrix_trans > evalues,evect=eigh(covariancematrix) > > after sorting such that most significant eigenvectors are selected. > evectmatrix is now my eigenvectors matrix > > here is a sample using 4X3 greyscale images > > evalues > [ -1.85852801e-13 6.31143639e+02 3.31182765e+03 5.29077871e+03] > evect > [[ 0.5 -0.06727772 0.6496399 -0.56871936] > [ 0.5 -0.77317718 -0.37697426 0.10043632] > [ 0.5 0.27108233 0.31014514 0.76179023] > [ 0.5 0.56937257 -0.58281078 -0.29350719]] > > evectmatrix (sorted according to largest evalue first) > [[-0.56871936 0.6496399 -0.06727772 0.5 ] > [ 0.10043632 -0.37697426 -0.77317718 0.5 ] > [ 0.76179023 0.31014514 0.27108233 0.5 ] > [-0.29350719 -0.58281078 0.56937257 0.5 ]] > > then i can create facespace by > facespace=evectmat*adjustedfaces > > till now i 've been following the steps as mentioned in the PCA > tutorial(by Lindsay smith & others) > what i want to know is that in the above evectmatrix is each row > ([-0.56871936 0.6496399 -0.06727772 0.5 ] etc) an eigenvector? > or does a column in the above matrix represent an eigenvector? The eigenvectors are in columns. To ensure yourself, look at the last constant column (of 0.5's) corresponding to the zero-eigenvalue. This id due to the initial column centering. > to put it differently, > should i represent an eigenvector by > evectmatrix[i] or by > (get_column_i_of(evectmatrix)).transpose() > > if someone can make this clear please do > D > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > BTW: If your data is not extreme these simple steps would also result in what you want (Not tested): ------------- from scipy import linalg facearray-=facearray.mean(0) #mean centering u, s, vt = linalg.svd(facearray, 0) scores = u*s facespace = vt.T # reconstruction: facearray ~= dot(scores, facespace.T) explained_variance = 100*s.cumsum()/s.sum() # here is how to reconstruct an `eigen-image` from the first component # You may want to ensure this as it depends on how you created the facearray face_image0 = facespace[:,0].reshape(4,3) ----------- In case you have a large dataset (many pixels *and* many images) you may look into using the arpack eigensolver for efficiency (located in scikits and appearing in the upcomming release of scipy, 0.7) Arnar From dalcinl at gmail.com Thu Feb 28 09:18:23 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 28 Feb 2008 11:18:23 -0300 Subject: [Numpy-discussion] loadtxt broken if file does not end in newline In-Reply-To: <47C5DE87.9090102@enthought.com> References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com> <47C5A0E5.9050302@noaa.gov> <91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com> <47C5C5D0.6090400@noaa.gov> <47C5DE87.9090102@enthought.com> Message-ID: On 2/27/08, Travis E. Oliphant wrote: > Did this discussion resolve with a fix that can go in before 1.0.5 is > released? I believe the answer is yes, but we have to choose: 1- Use the regepx based solution of David. 2- Move to use 'index' instead of 'find' as proposed by Alan and implemented by Christopher in example code. 
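(For concreteness, the regexp route could be as small as the following --
my own sketch, not David's actual patch:

    import re

    # `comments` may itself be a regular expression here, e.g. r'[#%]'
    # if two kinds of comment characters coexist in one file:
    comment_re = re.compile(comments)

    def strip_comment(line):
        m = comment_re.search(line)
        if m:
            line = line[:m.start()]
        return line.strip()

and the plain '#' default would keep working unchanged.)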
In my view, (1) is more powerful regarding future improvements, but (2)
is far simpler if we are looking for just a fix. Someone will have to
make a decision about which approach should be followed.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From devnew at gmail.com  Thu Feb 28 09:41:41 2008
From: devnew at gmail.com (devnew at gmail.com)
Date: Thu, 28 Feb 2008 06:41:41 -0800 (PST)
Subject: [Numpy-discussion] confusion about eigenvector
In-Reply-To: <5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com>
References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com>
	<5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com>
Message-ID: <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com>

> Arnar wrote
> from scipy import linalg
> facearray-=facearray.mean(0) #mean centering
> u, s, vt = linalg.svd(facearray, 0)
> scores = u*s
> facespace = vt.T

hi Arnar
when i do this i get these
u = <class 'numpy.core.defmatrix.matrix'> (4, 4)
that matches the eigenvectors matrix in my previous data
s = <type 'numpy.ndarray'> (4,)
and
vt = <class 'numpy.core.defmatrix.matrix'> (4, 12)

here
scores=u*s causes a matrix not aligned error.

is there something wrong in the calculation?
D

From arnar.flatberg at gmail.com  Thu Feb 28 10:17:15 2008
From: arnar.flatberg at gmail.com (Arnar Flatberg)
Date: Thu, 28 Feb 2008 16:17:15 +0100
Subject: [Numpy-discussion] confusion about eigenvector
In-Reply-To: <19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com>
References: <38127f22-da3a-4479-90e6-fc97de31f64e@e60g2000hsh.googlegroups.com>
	<5d3194020802280537k15b31bakee9526cffa394a51@mail.gmail.com>
	<19c4cb45-1cda-4128-ba67-d1e14015d768@h25g2000hsf.googlegroups.com>
Message-ID: <5d3194020802280717m100083efu30263ce34fdc4f4@mail.gmail.com>

On Thu, Feb 28, 2008 at 3:41 PM, devnew at gmail.com wrote:
> > Arnar wrote
>
> > from scipy import linalg
> > facearray-=facearray.mean(0) #mean centering
> > u, s, vt = linalg.svd(facearray, 0)
> > scores = u*s
> > facespace = vt.T
>
> hi Arnar
> when i do this i get these
> u = <class 'numpy.core.defmatrix.matrix'> (4, 4)
> that matches the eigenvectors matrix in my previous data
> s = <type 'numpy.ndarray'> (4,)
> and
> vt = <class 'numpy.core.defmatrix.matrix'> (4, 12)
>
> here
> scores=u*s causes a matrix not aligned error.
>
> is there something wrong in the calculation?
>
> D
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

This example assumes that facearray is an ndarray (like you described in
the original post ;-)). It looks like you are using a matrix
(u = <class 'numpy.core.defmatrix.matrix'> (4, 4)). This causes the u*s
broadcasting to fail. Try again with facearray = numpy.asarray(facearray)
before the calculation.

Arnar

From mailinglist.honeypot at gmail.com  Thu Feb 28 10:18:29 2008
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Thu, 28 Feb 2008 10:18:29 -0500
Subject: [Numpy-discussion] ANN: Enthought Python Distribution - Beta
In-Reply-To: 
References: 
Message-ID: 

> On Wed, 27 Feb 2008, Travis Vaught apparently wrote:
>> http://www.enthought.com/epd
>
> Looks good.
> An increasing number of my students are buying Macs, > so the OSX support will be very welcome. Yeah ... agreed, this is great! -steve From rblove_lists at comcast.net Thu Feb 28 10:29:37 2008 From: rblove_lists at comcast.net (Robert Love) Date: Thu, 28 Feb 2008 09:29:37 -0600 Subject: [Numpy-discussion] ANN: Enthought Python Distribution - Beta In-Reply-To: References: Message-ID: I'd drive to Austin and wash and wax their cars if there was an OSX distribution. OK, mild exaggeration but I'd buy them a beer or two. On Feb 28, 2008, at 9:18 AM, Steve Lianoglou wrote: >> On Wed, 27 Feb 2008, Travis Vaught apparently wrote: >>> http://www.enthought.com/epd >> >> Looks good. >> An increasing number of my students are buying Macs, >> so the OSX support will be very welcome. > > Yeah ... agreed, this is great! > -steve > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Thu Feb 28 11:17:08 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Feb 2008 10:17:08 -0600 Subject: [Numpy-discussion] numpyx.pyx (recent svn) works? In-Reply-To: References: Message-ID: <3d375d730802280817h6271fcf6gcd027b1c774af855@mail.gmail.com> On Thu, Feb 28, 2008 at 7:35 AM, Neal Becker wrote: > I tried numpyx.pyx with cython-0.9.6.12. These were written for and still work with Pyrex. If it doesn't work with Cython then that is either a bug in Cython or an intentional incompatibility of Cython. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sameerslists at gmail.com Thu Feb 28 11:35:43 2008 From: sameerslists at gmail.com (Sameer DCosta) Date: Thu, 28 Feb 2008 10:35:43 -0600 Subject: [Numpy-discussion] Rename record array fields (with object arrays) Message-ID: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> Hi, I'm having trouble renaming record array fields if they contain object arrays in them. I followed the solutions posted by Robert Kern and Stefan van der Walt (Thanks again) but it doesn't look like this method works in all cases. For reference: http://projects.scipy.org/pipermail/numpy-discussion/2008-February/031509.html In [1]: from numpy import * In [2]: olddt = dtype([('foo', '|O4'), ('bar', float)]) In [3]: a = zeros(10, olddt) In [4]: a Out[4]: array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)], dtype=[('foo', '|O4'), ('bar', ' TypeError: Cannot change data-type for object array. -- Sameer From oliphant at enthought.com Thu Feb 28 11:48:41 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Thu, 28 Feb 2008 10:48:41 -0600 Subject: [Numpy-discussion] Rename record array fields (with object arrays) In-Reply-To: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> References: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> Message-ID: <47C6E5E9.4030201@enthought.com> Sameer DCosta wrote: > Hi, > > I'm having trouble renaming record array fields if they contain object > arrays in them. I followed the solutions posted by Robert Kern and > Stefan van der Walt (Thanks again) but it doesn't look like this > method works in all cases. 
For reference: > http://projects.scipy.org/pipermail/numpy-discussion/2008-February/031509.html > > In [1]: from numpy import * > > In [2]: olddt = dtype([('foo', '|O4'), ('bar', float)]) > > In [3]: a = zeros(10, olddt) > > In [4]: a > Out[4]: > array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), > (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)], > dtype=[('foo', '|O4'), ('bar', ' > In [5]: newdt = dtype([('notfoo', '|O4'), ('notbar', float)]) > > In [6]: b = a.view(newdt) > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > > /home/titan/sameer/projects/ > > TypeError: Cannot change data-type for object array. > This looks like a bug. We are being a bit over-zealous in protecting you from getting access to pointers and in the process making it impossible to rename Object fields. Perhaps an actual field-renaming API (which would be relatively easy) is useful. -Travis O. From tim.hochberg at ieee.org Thu Feb 28 12:57:02 2008 From: tim.hochberg at ieee.org (Timothy Hochberg) Date: Thu, 28 Feb 2008 10:57:02 -0700 Subject: [Numpy-discussion] Handling of numpy.power(0, ) In-Reply-To: References: Message-ID: On Wed, Feb 27, 2008 at 4:10 PM, Stuart Brorson wrote: > I have been poking at the limits of NumPy's handling of powers of > zero. I find some results which are disturbing, at least to me. > Here they are: [SNIP] > > ** 0^(x+y*i): This one is tricky; please bear with me and I'll walk > through the reason it should be NaN. > > In general, one can write a^(x+y*i) = (r exp(i*theta))^(x+y*i) where > r, theta, x, and y are all reals. Then, this expression can be > rearranged as: > > (r^x) * (r^i*y) * exp(i*theta*(x+y*i)) > > = (r^x) * (r^i*y) * exp(i*theta*x) * exp(-theta*y) > > Now consider what happens to each term if r = 0. > > -- r^x is either 0^ = 1, or 0^ = Inf. > > -- r^(i*y) = exp(i*y*ln(r)). If y != 0 (i.e. complex power), then taking > the ln of r = 0 is -Inf. But what's exp(i*-Inf)? It's probably NaN, > since nothing else makes sense. > > Note that if y == 0 (real power), then this term is still NaN (y*ln(r) > = 0*ln(0) = Nan). However, by convention, 0^ is something other > than NaN. > > -- exp(i*theta*x) is just a complex number. > > -- exp(-theta*y) is just a real number. > > Therefore, for 0^ we have Inf * NaN * * , > which is NaN. [SNIP] I don't buy this argument. For instance it is claimed 0*ln(0) is NaN, but IIRC the correct way to determine a value for cases like this is to take the limit. (It's been a long time since I worked through any limits, so take this with a grain of salt.) We are interested in: Limit (x^(a+bj)) as x -> 0. First consider the case where a > 0. x^(a+bj) = x^a x^bj We guess that the limit is zero as x->0. In other words |x^a x^bj - 0| -> 0 (I'll leave out all the epsilon and delta stuff and just wave my hands here. Thus we are interested in the behaviour of |x^a| |x^bj| = |x^a| which does indeed go to zero as x -> 0 for a > 0. For a < 0, the limit diverges and it could be argued that the appropriate value is either Infinity or undefined (NaN), depending on your preference. Personally, I'd opt for Inf. For a = 0, the limit neither converges nor diverges; it simply oscillates around the unit circle, so in that case the value is indeed undefined. Unless b = 0 as well, in which case, the value is 1. 
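(A quick numerical check of those limits, for what it's worth -- my own
sketch, approaching 0 from above rather than evaluating at 0 itself:

    for x in [1e-2, 1e-4, 1e-8]:
        # |x**(a+bj)| equals x**a, so the modulus goes to 0 for a > 0,
        # stays at 1.0 for a == 0, and blows up for a < 0:
        print x, abs(x ** (0.5 + 1j)), abs(x ** 1j), abs(x ** (-0.5 + 1j))

only the phase never settles down, which is exactly the NaN case above.)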
In summary:

0^(a+bj) = 0    for a > 0
0^(a+bj) = 1    for a == b == 0
0^(a+bj) = NaN  for a == 0, b != 0
0^(a+bj) = Inf  for a < 0

I suppose one could use NaN + j NaN, as some have proposed, but it seems
unnecessary; an undefined number is undefined and it's not usually
necessary to state that both the real and imaginary parts are undefined.

Similarly I suspect that someone will complain that the result is not
necessarily real for the (a < 0) case. However, as I recall, in complex
analysis, infinity usually refers to a point infinitely distant from the
origin with no particular phase associated with it. There is a certain
conflict between this mathematical view and a numerical point of view
where Inf and NaN are typically viewed as pure real numbers, so the above
results could be rejected for that reason. However, I suspect that the
phase issues don't matter -- the only time you get back into the normal
numbers when dealing with Inf is by dividing a finite number by Inf, and
in that case one always gets zero. Similarly, once you have a NaN, you
never get back to the normal real / imaginary numbers when one of the
inputs is a NaN, so NaN versus NaN + j NaN are essentially equivalent and
the simple NaN is cleaner.

--
.  __
.  |-\
.
.  tim.hochberg at ieee.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cburns at berkeley.edu  Thu Feb 28 13:19:48 2008
From: cburns at berkeley.edu (Christopher Burns)
Date: Thu, 28 Feb 2008 10:19:48 -0800
Subject: [Numpy-discussion] numpy.test() fails if it runs after scipy.test()
In-Reply-To: 
References: 
Message-ID: <764e38540802281019t34b5d659q95fc84680a836e66@mail.gmail.com>

Loic,

I was not able to reproduce this. Have you tried doing a clean install
by removing your numpy and scipy directories in the site-packages and
reinstalling? I've had old files in my install directory cause problems
in the past.

You are correct, the tests should not affect one another.
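(One quick way to see which copies are actually being imported -- just a
suggestion, not a diagnosis:

    import numpy, scipy
    print numpy.__version__, numpy.__file__
    print scipy.__version__, scipy.__file__

if either __file__ points somewhere unexpected, a stale install is the
likely culprit.)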
Chris

On Sat, Feb 23, 2008 at 10:23 PM, Loïc BERTHE wrote:
> Hi,
>
> I've got one failure if I run numpy.test() after running scipy.test():
>
> ======================================================================
> ERROR: Ticket #396
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/core/tests/test_regression.py",
> line 600, in check_poly1d_nan_roots
>     self.failUnlessRaises(N.linalg.LinAlgError,getattr,p,"r")
>   File "/home/loic/tmp/Linux/lib/python2.5/unittest.py", line 320, in
> failUnlessRaises
>     callableObj(*args, **kwargs)
>   File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/polynomial.py",
> line 623, in __getattr__
>     return roots(self.coeffs)
>   File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/polynomial.py",
> line 124, in roots
>     roots = _eigvals(A)
>   File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/polynomial.py",
> line 37, in _eigvals
>     return eigvals(arg)
>   File "/home/loic/tmp/Linux/lib/python2.5/site-packages/scipy/linalg/decomp.py",
> line 378, in eigvals
>     return eig(a,b=b,left=0,right=0,overwrite_a=overwrite_a)
>   File "/home/loic/tmp/Linux/lib/python2.5/site-packages/scipy/linalg/decomp.py",
> line 128, in eig
>     a1 = asarray_chkfinite(a)
>   File "/home/loic/tmp/Linux/lib/python2.5/site-packages/numpy/lib/function_base.py",
> line 398, in asarray_chkfinite
>     raise ValueError, "array must not contain infs or NaNs"
> ValueError: array must not contain infs or NaNs
>
> But I've got no error if I begin with numpy test.
> I've seen that this Ticket #396 seems closed in Trac, should I reopen it?
>
> For more information, I've attached the results of scipy.test and
> numpy.test.
>
> Regards,
>
> --
> LB
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Christopher Burns, Software Engineer
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From Chris.Barker at noaa.gov  Thu Feb 28 13:22:24 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Thu, 28 Feb 2008 10:22:24 -0800
Subject: [Numpy-discussion] Handling of numpy.power(0, )
In-Reply-To: 
References: 
Message-ID: <47C6FBE0.8070605@noaa.gov>

Timothy Hochberg wrote:
> I suppose one could use NaN + j NaN, as some have proposed, but it seems
> unnecessary;

Except don't we want the result to be a complex number (in terms of
data storage) so it can be part of a complex array?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov  Thu Feb 28 13:34:29 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Thu, 28 Feb 2008 10:34:29 -0800
Subject: [Numpy-discussion] loadtxt broken if file does not end in newline
In-Reply-To: 
References: <91cf711d0802270750v73d77eefm9f6f0018c1d189bb@mail.gmail.com>
	<47C5A0E5.9050302@noaa.gov>
	<91cf711d0802271022v42a99073w1145b615a9fcecf8@mail.gmail.com>
	<47C5C5D0.6090400@noaa.gov>
	<47C5DE87.9090102@enthought.com>
Message-ID: <47C6FEB5.7090909@noaa.gov>

Lisandro Dalcin wrote:
> On 2/27/08, Travis E. Oliphant wrote:
>> Did this discussion resolve with a fix that can go in before 1.0.5 is
>> released?
> > I believe the answer is yes, but we have to choose: > > 1- Use the regepx based solution of David. A good idea, but a feature expansion, and it needs more testing -- not ready for 1.0.5 > 2- Move to use 'index' instead of 'find' as proposed by Alan and > implemented by Christopher in example code. Robert's committed that, so we're done for now (though I hope to write a test or two soon...) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Glen.Mabey at swri.org Thu Feb 28 14:21:39 2008 From: Glen.Mabey at swri.org (Glen W. Mabey) Date: Thu, 28 Feb 2008 13:21:39 -0600 Subject: [Numpy-discussion] failure building numpy using icc Message-ID: <20080228192138.GA21482@swri.org> Hello, I'm using svn numpy and get the following error upon executing /usr/local/bin/python2.5 setup.py config --noisy --cc=/opt/intel/cce/10.0.025/bin/icc --compiler=intel --fcompiler=intel build_clib build_ext I see: conv_template:> build/src.linux-x86_64-2.5/numpy/core/src/scalartypes.inc Traceback (most recent call last): File "setup.py", line 96, in setup_package() File "setup.py", line 89, in setup_package configuration=configuration ) File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/core.py", line 184, in setup return old_setup(**new_attr) File "/usr/local/lib/python2.5/distutils/core.py", line 151, in setup dist.run_commands() File "/usr/local/lib/python2.5/distutils/dist.py", line 974, in run_commands self.run_command(cmd) File "/usr/local/lib/python2.5/distutils/dist.py", line 994, in run_command cmd_obj.run() File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_ext.py", line 56, in run self.run_command('build_src') File "/usr/local/lib/python2.5/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/usr/local/lib/python2.5/distutils/dist.py", line 994, in run_command cmd_obj.run() File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 130, in run self.build_sources() File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 147, in build_sources self.build_extension_sources(ext) File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 252, in build_extension_sources sources = self.template_sources(sources, ext) File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py", line 359, in template_sources outstr = process_c_file(source) File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 185, in process_file % (sourcefile, process_str(''.join(lines)))) File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 150, in process_str newstr[sub[0]:sub[1]], sub[4]) File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 117, in expand_sub % (line, template_re.sub(namerepl, substr))) File "/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py", line 113, in namerepl return names[name][thissub[0]] KeyError: 'PREFIX' And I do not see any errors when building the same svn version with gcc (on a different 
machine). I've unsuccessfully tried to follow that backtrace of functions to figure out exactly what is going on. Any hints/suggestions? Thanks, Glen Mabey From berthe.loic at gmail.com Thu Feb 28 14:33:00 2008 From: berthe.loic at gmail.com (LB) Date: Thu, 28 Feb 2008 11:33:00 -0800 (PST) Subject: [Numpy-discussion] numpy.test() fails if it runs after scipy.test() In-Reply-To: <764e38540802281019t34b5d659q95fc84680a836e66@mail.gmail.com> References: <764e38540802281019t34b5d659q95fc84680a836e66@mail.gmail.com> Message-ID: <99b6b0bc-e616-425a-a387-f48f2e59414c@41g2000hsc.googlegroups.com> That's very strange, I've made a local installation of numpy 1.0.4 and scipy 0.6 from official source, and I use the same kind of installation script as in http://www.scipy.org/Installing_SciPy/Linux/BuildingFromSource/GCC_1, so I always erase all the local directory to prevent this kind of conflict from happening. What's your configuration ? I'm under GNU/Linux debian lenny on a i686, and, I've just seen that this pb is already in the debian packages of my distribution, which are : > dpkg -l 'python-*py' Souhait=inconnU/Install?/suppRim?/Purg?/H=? garder | ?tat=Non/Install?/fichier-Config/d?paqUet?/?chec-conFig/H=semi- install? |/ Err?=(aucune)/H=? garder/besoin R?installation/X=les deux (?tat,Err: majuscule=mauvais) ||/ Nom Version Description +++-============================-============================- ======================================================================== un python-f2py (aucune description n'est disponible) ii python-ipy 1:0.56-1 Python module for handling IPv4 and IPv6 addresses and networks ii python-numpy 1:1.0.4-5 Numerical Python adds a fast array facility to the Python language ii python-scipy 0.6.0-5.1 scientific tools for Python un python-soappy (aucune description n'est disponible) -- LB From sameerslists at gmail.com Thu Feb 28 15:00:45 2008 From: sameerslists at gmail.com (Sameer DCosta) Date: Thu, 28 Feb 2008 14:00:45 -0600 Subject: [Numpy-discussion] Rename record array fields (with object arrays) In-Reply-To: <47C6E5E9.4030201@enthought.com> References: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> <47C6E5E9.4030201@enthought.com> Message-ID: <8fb8cc060802281200y614d3027yf00dab5b34fb91a7@mail.gmail.com> I think having a record array field renaming api is a good idea.. I was going to create a ticket for this, but I don't think I have permissions to do this. Can someone who has the permissions, please create it? Thanks. Sameer On Thu, Feb 28, 2008 at 10:48 AM, Travis E. Oliphant wrote: > > Sameer DCosta wrote: > > Hi, > > > > I'm having trouble renaming record array fields if they contain object > > arrays in them. I followed the solutions posted by Robert Kern and > > Stefan van der Walt (Thanks again) but it doesn't look like this > > method works in all cases. 
For reference: > > http://projects.scipy.org/pipermail/numpy-discussion/2008-February/031509.html > > > > In [1]: from numpy import * > > > > In [2]: olddt = dtype([('foo', '|O4'), ('bar', float)]) > > > > In [3]: a = zeros(10, olddt) > > > > In [4]: a > > Out[4]: > > array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), > > (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)], > > dtype=[('foo', '|O4'), ('bar', ' > > > In [5]: newdt = dtype([('notfoo', '|O4'), ('notbar', float)]) > > > > In [6]: b = a.view(newdt) > > --------------------------------------------------------------------------- > > TypeError Traceback (most recent call last) > > > > /home/titan/sameer/projects/ > > > > TypeError: Cannot change data-type for object array. > > > > This looks like a bug. We are being a bit over-zealous in protecting > you from getting access to pointers and in the process making it > impossible to rename Object fields. > > Perhaps an actual field-renaming API (which would be relatively easy) is > useful. > > -Travis O. > From robert.kern at gmail.com Thu Feb 28 15:11:48 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Feb 2008 14:11:48 -0600 Subject: [Numpy-discussion] Rename record array fields (with object arrays) In-Reply-To: <8fb8cc060802281200y614d3027yf00dab5b34fb91a7@mail.gmail.com> References: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> <47C6E5E9.4030201@enthought.com> <8fb8cc060802281200y614d3027yf00dab5b34fb91a7@mail.gmail.com> Message-ID: <3d375d730802281211x4e315c5dm66fbef1e8c34b022@mail.gmail.com> On Thu, Feb 28, 2008 at 2:00 PM, Sameer DCosta wrote: > I think having a record array field renaming api is a good idea.. I > was going to create a ticket for this, but I don't think I have > permissions to do this. Can someone who has the permissions, please > create it? Thanks. Click on the "Register" link in the upper right-hand corner. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lists at cheimes.de Thu Feb 28 15:16:23 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 28 Feb 2008 21:16:23 +0100 Subject: [Numpy-discussion] Handling of numpy.power(0, ) In-Reply-To: References: Message-ID: Stuart Brorson wrote: > I have been poking at the limits of NumPy's handling of powers of > zero. I find some results which are disturbing, at least to me. > Here they are: [SNIPP] Please checkout Mark Dickinson's and my trunk-math branch of Python 2.6. We have put lots of effort into fixing edge cases of floats, math and cmath functions. The return values are either based on the latest revision of IEEE 754 or the last public draft of the C99 standard (1124, Annex F and G). For pow the C99 says: >>> math.pow(0, 0) 1.0 >>> math.pow(0, 1) 0.0 [30859 refs] >>> math.pow(0, float("inf")) 0.0 >>> math.pow(0, float("nan")) nan >>> math.pow(0, -1) Traceback (most recent call last): File "", line 1, in ValueError: math domain error Christian From sdb at cloud9.net Thu Feb 28 15:23:34 2008 From: sdb at cloud9.net (Stuart Brorson) Date: Thu, 28 Feb 2008 15:23:34 -0500 (EST) Subject: [Numpy-discussion] Handling of numpy.power(0, ) In-Reply-To: References: Message-ID: > Please checkout Mark Dickinson's and my trunk-math branch of Python 2.6. > We have put lots of effort into fixing edge cases of floats, math and > cmath functions. 
The return values are either based on the latest > revision of IEEE 754 or the last public draft of the C99 standard (1124, > Annex F and G). Thanks for the info! > For pow the C99 says: > >>>> math.pow(0, 0) > 1.0 OK. I've come around to think this is the "right" answer. >>>> math.pow(0, 1) > 0.0 OK. > [30859 refs] >>>> math.pow(0, float("inf")) > 0.0 OK. But what about math.pow(0, -1*float("inf")) >>>> math.pow(0, float("nan")) > nan OK. >>>> math.pow(0, -1) > Traceback (most recent call last): > File "", line 1, in > ValueError: math domain error Why isn't this one inf? Also, what do these specs say about 0^? Cheers, Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/ From andrea.gavana at gmail.com Thu Feb 28 15:47:20 2008 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Thu, 28 Feb 2008 21:47:20 +0100 Subject: [Numpy-discussion] I,J,K Coordinates from Cell ID Message-ID: Hi All, I have some problems in figuring out a solution for an issue I am trying to solve. I have a 3D grid of dimension Nx, Ny, Nz; for every cell of this grid, I calculate the cell centroids (with the cell coordinates x, y, and z) and then I try to find which cell centroid is the closest to a specified point in 3D (which I supply). I still haven't figured out how to do this, even if I have some ideas (and suggestions for this problem are welcome :-D ). But the problem is another one. When I find the closest centroid, I know only the cell ID of this centroid. The cell ID is (usually) defined as: ID = (K-1)*Nx*Ny + (J-1)*Nx + I - 1 Where I, J, K are the cell indexes. Now, the problem is, how can I calculate back the I, J, K indexes knowing only the cell ID? I am trying to solve this using numpy (as my grid is stored using a numpy array), but it's something akin to the Matlab function ind2sub... Thank you for your suggestions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ From sameerslists at gmail.com Thu Feb 28 15:54:09 2008 From: sameerslists at gmail.com (Sameer DCosta) Date: Thu, 28 Feb 2008 14:54:09 -0600 Subject: [Numpy-discussion] Rename record array fields (with object arrays) In-Reply-To: <3d375d730802281211x4e315c5dm66fbef1e8c34b022@mail.gmail.com> References: <8fb8cc060802280835n65b6922dree65a10e79e6c995@mail.gmail.com> <47C6E5E9.4030201@enthought.com> <8fb8cc060802281200y614d3027yf00dab5b34fb91a7@mail.gmail.com> <3d375d730802281211x4e315c5dm66fbef1e8c34b022@mail.gmail.com> Message-ID: <8fb8cc060802281254k3e90457cj31a3e8fdd8c866b0@mail.gmail.com> On Thu, Feb 28, 2008 at 2:11 PM, Robert Kern wrote: > On Thu, Feb 28, 2008 at 2:00 PM, Sameer DCosta wrote: > > was going to create a ticket for this, but I don't think I have > > permissions to do this. Can someone who has the permissions, please > > create it? Thanks. > > Click on the "Register" link in the upper right-hand corner. > Thanks I should have read the wiki page before posting to the list. However, here is the ticket. http://scipy.org/scipy/numpy/ticket/674 Sameer From lists at cheimes.de Thu Feb 28 16:37:35 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 28 Feb 2008 22:37:35 +0100 Subject: [Numpy-discussion] Handling of numpy.power(0, ) In-Reply-To: References: Message-ID: Stuart Brorson wrote: >>>>> math.pow(0, -1) >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: math domain error > > Why isn't this one inf? 
The standard says return inf and raise a divide-by-zero floating point exception. Since we can't do both in Python we sticked to the exception part. > Also, what do these specs say about 0^? See for yourself http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf The interesting information are in Annex F.9 and Annex G.6. So far we haven't dealt with complex powers and Python doesn't support 0.**1j yet. Christian From irving at pixar.com Thu Feb 28 17:34:47 2008 From: irving at pixar.com (Geoffrey Irving) Date: Thu, 28 Feb 2008 14:34:47 -0800 Subject: [Numpy-discussion] arrays of matrices Message-ID: <20080228223447.GD5991@pixar.com> Hello, I have a large number of points (shape (n,3)), and a matching number of 3x3 matrices (shape (n,3,3)), and I want to compute the product of each matrix times the corresponding point. I can't see a way to do this operation with dot or tensordot, since these routines either sum across an index or treat it as independent between the two arguments. Is this case, I can use the fact that 3 is small to do it, but is there a clean way for handle this kind of "array of matrix" situation in general? Thanks, Geoffrey From robert.kern at gmail.com Thu Feb 28 17:40:54 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Feb 2008 16:40:54 -0600 Subject: [Numpy-discussion] Handling of numpy.power(0, ) In-Reply-To: References: Message-ID: <3d375d730802281440t5236018ci4254997463578ea2@mail.gmail.com> On Thu, Feb 28, 2008 at 3:37 PM, Christian Heimes wrote: > Stuart Brorson wrote: > > Also, what do these specs say about 0^? > > See for yourself > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf The > interesting information are in Annex F.9 and Annex G.6. Thank you! This is quite helpful. G.6.3 tells us most of the special cases for cexp() and clog(). In G.6.4, cpow(z,c) is more-or-less defined as cexp(c * clog(z)). One thing that I was reminded of is that decomposing 0 into the form r*cexp(theta*i) is *not* indeterminate in theta in IEEE floating point arithmetic as I claimed earlier. IEEE floating point models an "affine extension" of the reals. I'm not particularly clear on the more abstract mathematical consequences of that, but the practical upshot is that we have separate +Inf and -Inf as well as +0 and -0. For complex numbers, we also have (+0+0j), (-0+0j), (+0-0j), and (-0-0j) as well as a similar permutation set for the Infs. The practical use of the signed zeros is that clog(+0+0j) can be defined as the limit of clog(z) coming to the origin from a particular direction and thus have a somewhat-arbitrary, but well-defined limit. Namely, the standard defines clog(+0+0j) as (-Inf+pi*j). I'll get around to working through all of the cases at some point, and make a table. Suffice it to say, for the original problem in the thread: power(+0.0, (1+1j)) == cexp((1+1j) * clog(+0.0+0.0j)) == cexp((1+1j) * (-inf+pi*1j)) == cexp((-inf-inf*1j)) == (+-0 + (+-0j)) where the signs are unspecified by the standard. So numpy's behavior happens to be correct for power(0.0, (1+1j)). It is incorrect for power(0.0, (-1.0+0j)). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
  -- Umberto Eco

From tim.hochberg at ieee.org Thu Feb 28 18:16:13 2008
From: tim.hochberg at ieee.org (Timothy Hochberg)
Date: Thu, 28 Feb 2008 16:16:13 -0700
Subject: [Numpy-discussion] I,J,K Coordinates from Cell ID
In-Reply-To: References: Message-ID:

On Thu, Feb 28, 2008 at 1:47 PM, Andrea Gavana wrote:
> Hi All,
>
>     I have some problems in figuring out a solution for an issue I am
> trying to solve. I have a 3D grid of dimension Nx, Ny, Nz; for every
> cell of this grid, I calculate the cell centroids (with the cell
> coordinates x, y, and z) and then I try to find which cell centroid is
> the closest to a specified point in 3D (which I supply). I still
> haven't figured out how to do this, even if I have some ideas (and
> suggestions for this problem are welcome :-D ). But the problem is
> another one.
> When I find the closest centroid, I know only the cell ID of this
> centroid. The cell ID is (usually) defined as:
>
> ID = (K-1)*Nx*Ny + (J-1)*Nx + I - 1
>
> Where I, J, K are the cell indexes. Now, the problem is, how can I
> calculate back the I, J, K indexes knowing only the cell ID? I am
> trying to solve this using numpy (as my grid is stored using a numpy
> array), but it's something akin to the Matlab function ind2sub...
>
> Thank you for your suggestions.

I think you are going to want to use mod (aka '%'). Something like:

>>> def coord(id):
...     return (id % NX + 1, id // NX % NY + 1, id // (NX*NY) + 1)

should work. I believe there are a few things you could do to improve
the efficiency here, but try this and see if it works for you before
you worry about that.

Note that the above definition for cell ID is probably a little weird
in the context of numpy where all of the indexing starts at zero.

--
tim.hochberg at ieee.org

From robert.kern at gmail.com Thu Feb 28 18:57:29 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 28 Feb 2008 17:40:54 -0600
Subject: [Numpy-discussion] arrays of matrices
In-Reply-To: <20080228223447.GD5991@pixar.com>
References: <20080228223447.GD5991@pixar.com>
Message-ID: <3d375d730802281557g3ea8639frfb1d8ecd1456ca65@mail.gmail.com>

On Thu, Feb 28, 2008 at 4:34 PM, Geoffrey Irving wrote:
> Hello,
>
> I have a large number of points (shape (n,3)), and a matching
> number of 3x3 matrices (shape (n,3,3)), and I want to compute
> the product of each matrix times the corresponding point.
>
> I can't see a way to do this operation with dot or tensordot,
> since these routines either sum across an index or treat it
> as independent between the two arguments.
>
> Is this case, I can use the fact that 3 is small to do it, but
> is there a clean way for handle this kind of "array of matrix"
> situation in general?

For dot-products, yes. We can use broadcasting followed by axis summation.
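In schematic form, the same trick as a minimal self-contained sketch
(the names n, A, b here are made up, and numpy is assumed importable
as np; this is just a restatement of the idea, not the only phrasing):

import numpy as np

n = 4
A = np.random.rand(n, 3, 3)    # n stacked 3x3 matrices
b = np.random.rand(n, 3)       # n matching vectors

# broadcast each vector against the rows of its matrix, then sum over
# the last axis; row i of c equals dot(A[i], b[i])
c = (A * b[:, np.newaxis, :]).sum(axis=-1)

# sanity check against an explicit loop
assert np.allclose(c, np.array([np.dot(A[i], b[i]) for i in range(n)]))

A concrete session with small integers: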
In [20]: from numpy import *

In [21]: n = 5

In [22]: A = arange(n*3*3).reshape([n,3,3])

In [23]: b = 10*arange(n*3).reshape([n,3])

In [24]: A
Out[24]:
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]],

       [[27, 28, 29],
        [30, 31, 32],
        [33, 34, 35]],

       [[36, 37, 38],
        [39, 40, 41],
        [42, 43, 44]]])

In [25]: b
Out[25]:
array([[  0,  10,  20],
       [ 30,  40,  50],
       [ 60,  70,  80],
       [ 90, 100, 110],
       [120, 130, 140]])

In [26]: for i in range(n):
   ....:     print dot(A[i], b[i])
   ....:
   ....:
[ 50 140 230]
[1220 1580 1940]
[4010 4640 5270]
[ 8420  9320 10220]
[14450 15620 16790]

In [27]: (A * b.reshape([n, 1, 3])).sum(axis=-1)
Out[27]:
array([[   50,   140,   230],
       [ 1220,  1580,  1940],
       [ 4010,  4640,  5270],
       [ 8420,  9320, 10220],
       [14450, 15620, 16790]])

The magic is in In[27].
We reshape the array of vectors to be > compatible with the shape of the array of matrices. When we multiply > the two together, it is as if we multiplied two (n,3,3) matrices, the > latter being the vectors repeated 3 times. Then we sum along the rows > of each of the product matrices to get the desired dot product. Thanks! That'll do nicely. For large matrices, that could be problematic due to the blowup in intermediate memory, but on the other hand for large matrices a loop through the toplevel index wouldn't add much cost. > PS: Are you perchance the Geoffrey Irving I knew at CalTech, class of '03? Yep. That would answer the question I had when I started reading this email. However, it's spelled Caltech, not CalTech! Geoffrey From robert.kern at gmail.com Thu Feb 28 19:55:11 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Feb 2008 18:55:11 -0600 Subject: [Numpy-discussion] arrays of matrices In-Reply-To: <20080229004308.GH5991@pixar.com> References: <20080228223447.GD5991@pixar.com> <3d375d730802281557g3ea8639frfb1d8ecd1456ca65@mail.gmail.com> <20080229004308.GH5991@pixar.com> Message-ID: <3d375d730802281655o1d6fe4d3w4ce99c4be1ec4299@mail.gmail.com> On Thu, Feb 28, 2008 at 6:43 PM, Geoffrey Irving wrote: > > The magic is in In[27]. We reshape the array of vectors to be > > compatible with the shape of the array of matrices. When we multiply > > the two together, it is as if we multiplied two (n,3,3) matrices, the > > latter being the vectors repeated 3 times. Then we sum along the rows > > of each of the product matrices to get the desired dot product. > > Thanks! That'll do nicely. > > For large matrices, that could be problematic due to the blowup in > intermediate memory, but on the other hand for large matrices a loop > through the toplevel index wouldn't add much cost. If you really want to save memory and you can destroy A, then you could do the multiplication in-place. If you really want to get fancy and can destroy b, you can use it as storage for the summation output, too. In [11]: A *= b.reshape([n,1,3]) In [12]: c = A.sum(axis=-1, out=b) In [13]: b Out[13]: array([[ 50, 140, 230], [ 1220, 1580, 1940], [ 4010, 4640, 5270], [ 8420, 9320, 10220], [14450, 15620, 16790]]) In [14]: c is b Out[14]: True > > PS: Are you perchance the Geoffrey Irving I knew at CalTech, class of '03? > > Yep. That would answer the question I had when I started reading this email. > However, it's spelled Caltech, not CalTech! Yeah, yeah, yeah. The Wikis, they have taken over my finger ReFlexes. NumPy Rudds += 1. Take that, Tim Hochberg! :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From irving at pixar.com Thu Feb 28 20:21:27 2008 From: irving at pixar.com (Geoffrey Irving) Date: Thu, 28 Feb 2008 17:21:27 -0800 Subject: [Numpy-discussion] arrays of matrices In-Reply-To: <3d375d730802281655o1d6fe4d3w4ce99c4be1ec4299@mail.gmail.com> References: <20080228223447.GD5991@pixar.com> <3d375d730802281557g3ea8639frfb1d8ecd1456ca65@mail.gmail.com> <20080229004308.GH5991@pixar.com> <3d375d730802281655o1d6fe4d3w4ce99c4be1ec4299@mail.gmail.com> Message-ID: <20080229012126.GI5991@pixar.com> On Thu, Feb 28, 2008 at 06:55:11PM -0600, Robert Kern wrote: > On Thu, Feb 28, 2008 at 6:43 PM, Geoffrey Irving wrote: > > > The magic is in In[27]. We reshape the array of vectors to be > > > compatible with the shape of the array of matrices. 
When we multiply > > > the two together, it is as if we multiplied two (n,3,3) matrices, the > > > latter being the vectors repeated 3 times. Then we sum along the rows > > > of each of the product matrices to get the desired dot product. > > > > Thanks! That'll do nicely. > > > > For large matrices, that could be problematic due to the blowup in > > intermediate memory, but on the other hand for large matrices a loop > > through the toplevel index wouldn't add much cost. > > If you really want to save memory and you can destroy A, then you > could do the multiplication in-place. If you really want to get fancy > and can destroy b, you can use it as storage for the summation output, > too. > > In [11]: A *= b.reshape([n,1,3]) > > In [12]: c = A.sum(axis=-1, out=b) > > In [13]: b > Out[13]: > array([[ 50, 140, 230], > [ 1220, 1580, 1940], > [ 4010, 4640, 5270], > [ 8420, 9320, 10220], > [14450, 15620, 16790]]) > > In [14]: c is b > Out[14]: True By large I meant if the array shapes are (n,i,j), (n,j,k), where all indices are large. The permanent memory cost is O(nij+njk), but O(nijk) flops are required, and the broadcast/sum solution would require intermediate storage for each one. It doesn't matter in my case though, so it's just a curiosity. > > > PS: Are you perchance the Geoffrey Irving I knew at CalTech, class of '03? > > > > Yep. That would answer the question I had when I started reading this email. > > However, it's spelled Caltech, not CalTech! > > Yeah, yeah, yeah. The Wikis, they have taken over my finger ReFlexes. > > NumPy Rudds += 1. Take that, Tim Hochberg! :-) There are a bunch of techers here (from further back than us), but I may be the only one actively using numpy at the moment. It would be very painful to have to lug large mesh data in python around without it. And I wouldn't have the pleasure of writing stuff like this: # triangulate a polygon mesh, given as (counts,vertices,points) # counts is the number of vertices in each polygon, and vertices # are the concatenated vertex indices of each polygon triangleFace = numpy.arange(len(counts)).repeat(counts-2) vertex1 = (numpy.add.accumulate(counts)-counts)[triangleFace] vertex2 = numpy.arange(len(triangleFace)) + 2*triangleFace + 1 triangles = vertices[numpy.vstack([vertex1,vertex2,vertex2+1]).transpose()] Geoffrey From andrea.gavana at gmail.com Fri Feb 29 02:34:11 2008 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Fri, 29 Feb 2008 08:34:11 +0100 Subject: [Numpy-discussion] I,J,K Coordinates from Cell ID In-Reply-To: References: Message-ID: Hi Timothy, On Fri, Feb 29, 2008 at 12:16 AM, Timothy Hochberg wrote: > On Thu, Feb 28, 2008 at 1:47 PM, Andrea Gavana > wrote: > > Hi All, > > > > I have some problems in figuring out a solution for an issue I am > > trying to solve. I have a 3D grid of dimension Nx, Ny, Nz; for every > > cell of this grid, I calculate the cell centroids (with the cell > > coordinates x, y, and z) and then I try to find which cell centroid is > > the closest to a specified point in 3D (which I supply). I still > > haven't figured out how to do this, even if I have some ideas (and > > suggestions for this problem are welcome :-D ). But the problem is > > another one. > > When I find the closest centroid, I know only the cell ID of this > > centroid. The cell ID is (usually) defined as: > > > > ID = (K-1)*Nx*Ny + (J-1)*Nx + I - 1 > > > > Where I, J, K are the cell indexes. Now, the problem is, how can I > > calculate back the I, J, K indexes knowing only the cell ID? 
I am > > trying to solve this using numpy (as my grid is stored using a numpy > > array), but it's something akin to the Matlab function ind2sub... > > > > Thank you for your suggestions. > > I think you are going to want to use mod (aka '%'). Something like: > > >>> def coord(id): > ... return (id % NX + 1, id // NX % NY + 1, id // (NX*NY) + 1 ) > > should work. I believe there are a few things you could do to improve the > efficiency here, but try this and see if it works for you before you worry > about that. Thank you for the suggestion. I was so concentrated on another kind of solution that I didn't even think about your approach, which is far more elegant than mine. > Note that the above definition for cell ID is probably a little weird in the > context of numpy where all of the indexing starts at zero. Yep, but actually this is perfect for my case as I have to input those numbers in a reservoir simulator where all of the indexing starts at 1 :-D . It's perfect. Thank you very much. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ From eads at soe.ucsc.edu Fri Feb 29 06:02:05 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Fri, 29 Feb 2008 04:02:05 -0700 Subject: [Numpy-discussion] A little help please? In-Reply-To: References: <47C42CB2.7080007@enthought.com> <47C578D1.5060307@enthought.com> Message-ID: <47C7E62D.40808@soe.ucsc.edu> Neal Becker wrote: > Sounds like this needs a bit of re-thinking. > > Given a set of function signatures: > F(a,b,c) > F(d,e,f) > ... > > The user calls: > F(A,B,C) (no relation between a,A ,etc) > > How do we find the 'best' match? > > I think we can start with: > Rules: > 1) Only allowed (at most) 1 conversion on each argument > > But what is the 'best' match mean? > > I think that can't be decided without some sort of hierarchical relation of > the types. > > Now given a hierarchy, I still don't know the solution, but it sounds like > some kind of graph algorithm. Dear Neal, This response is probably a bit off-topic but I thought I'd provide this nugget to help guide your search for information. This topic has been heavily studied by theoretical computer scientists. Benjamin Pierce's Types and Programming Languages (MIT Press, 2002) is an excellent text gives a detailed review of the state of the art in the field of Type Systems. Finding the "best" match or an unambiguous one can be difficult depending on how the language is defined. Many type systems are defined by a set of typing rules and axioms. Types of expressions are often resolved by generating a proof that these rules are met. I've always found the subject of static typing interesting because it deals with how to make type guarantees of a program before it is evaluated or compiled. The subject to which you're referring is called dispatch. Many common OOP languages (e.g. C++, Java) support only a rudimentary dispatch algorithm where the method branch chosen for dispatch is determined at run-time by examining the type of the target object on which the method is invoked. The types of the arguments are only examined statically at compile or type-check time. Languages supporting multiple dispatch (variants of Java like MultiJava, Charming Python, etc.) examine the types of the arguments at run-time, and some multimethod type checkers can prove whether a program will ever encounter an ambiguous choice between branches before the program is ever run or compiled. I hope this helps. 
Damian

From pearu at cens.ioc.ee Fri Feb 29 07:12:40 2008
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Fri, 29 Feb 2008 14:12:40 +0200 (EET)
Subject: [Numpy-discussion] ANN: sympycore version 0.1 released
Message-ID: <46012.129.240.228.53.1204287160.squirrel at cens.ioc.ee>

We are proud to present a new Python package:

  sympycore - an efficient pure Python Computer Algebra System

Sympycore is available for download from

  http://sympycore.googlecode.com/

Sympycore is released under the New BSD License.

Sympycore provides efficient data structures for representing symbolic
expressions and methods to manipulate them. Sympycore uses a very clear
algebra-oriented design that can be easily extended.

Sympycore is a pure Python package with no external dependencies; it
requires Python version 2.5 or higher to run. Sympycore uses Mpmath for
fast arbitrary-precision floating-point arithmetic, which is included
in the sympycore package.

Sympycore is, to our knowledge, the most efficient pure Python
implementation of a Computer Algebra System. Its speed is comparable to
Computer Algebra Systems implemented in compiled languages. Some
comparison benchmarks are available at

  * http://code.google.com/p/sympycore/wiki/Performance
  * http://code.google.com/p/sympycore/wiki/PerformanceHistory

and it is our aim to continue seeking more efficient ways to manipulate
symbolic expressions:

  http://cens.ioc.ee/~pearu/sympycore_bench/

Sympycore version 0.1 provides the following features:

  * symbolic arithmetic operations
  * basic expression manipulation methods: expanding, substituting, and
    pattern matching
  * primitive algebra to represent unevaluated symbolic expressions
  * calculus algebra of symbolic expressions, unevaluated elementary
    functions, differentiation and polynomial integration methods
  * univariate and multivariate polynomial rings
  * matrix rings
  * expressions with physical units
  * SympyCore User's Guide and API Docs are available online

Take a look at the demo for the sympycore 0.1 release:

  http://sympycore.googlecode.com/svn/trunk/doc/html/demo0_1.html

However, one should be aware that sympycore does not implement many
features that other Computer Algebra Systems do. The version number 0.1
speaks for itself :)

Sympycore is inspired by many attempts to implement a CAS for Python
and was created to fix SymPy performance and robustness issues.
Sympycore does not yet have nearly as many features as SymPy. Our goal
is to work in the direction of merging the efforts with the SymPy
project in the near future.

Enjoy!
  * Pearu Peterson
  * Fredrik Johansson

Acknowledgments:
  * The work of Pearu Peterson on the SympyCore project is supported by
    a Center of Excellence grant from the Norwegian Research Council to
    the Center for Biomedical Computing at Simula Research Laboratory.

From devnew at gmail.com Fri Feb 29 09:54:49 2008
From: devnew at gmail.com (devnew at gmail.com)
Date: Fri, 29 Feb 2008 06:54:49 -0800 (PST)
Subject: [Numpy-discussion] image to array doubt
Message-ID: <93ea57db-12df-4531-a9d7-50a51eb8fb49@e25g2000prg.googlegroups.com>

Hi,
I came across a codebase by Rice Univ people; in it there are some
functions for conversion between image and vectors:

1.
    def image_to_vector(self, filename):
        try:
            im = Image.open(filename)
        except IOError:
            print 'couldn\'t load ' + filename
            sys.exit(1)
        self.im_size = im.size
        a = numpy.array(im.getdata())
        return a - 128

What is the purpose of a - 128 here? I couldn't quite understand.

2.
    def vector_to_image(self, v, filename):
        v.shape = (-1,)
        a, b = min(v), max(v)
        span = max(abs(b), abs(a))
        im = Image.new('L', self.im_size)
        im.putdata((v * 127. / span) + 128)
        im.save(filename)

And why the calculations of a, b, and span? Why do you need the extra
maths ops inside im.putdata()? I couldn't quite follow it. Can someone
explain this?

thanks
D

From robince at gmail.com Fri Feb 29 10:25:33 2008
From: robince at gmail.com (Robin)
Date: Fri, 29 Feb 2008 15:25:33 +0000
Subject: [Numpy-discussion] image to array doubt
In-Reply-To: <93ea57db-12df-4531-a9d7-50a51eb8fb49@e25g2000prg.googlegroups.com>
References: <93ea57db-12df-4531-a9d7-50a51eb8fb49@e25g2000prg.googlegroups.com>
Message-ID:

On Fri, Feb 29, 2008 at 2:54 PM, devnew at gmail.com wrote:
> Hi,
> I came across a codebase by Rice Univ people; in it there are some
> functions for conversion between image and vectors

I'm not an expert by any means but I thought I'd try and help...

> 1.
>     def image_to_vector(self, filename):
>         try:
>             im = Image.open(filename)
>         except IOError:
>             print 'couldn\'t load ' + filename
>             sys.exit(1)
>         self.im_size = im.size
>         a = numpy.array(im.getdata())
>         return a - 128
>
> What is the purpose of a - 128 here? I couldn't quite understand.

It looks to me like im.getdata() is returning unsigned 8-bit integers
(0 to 255), which they are converting to signed integers by subtracting
128.

> 2.
>     def vector_to_image(self, v, filename):
>         v.shape = (-1,)
>         a, b = min(v), max(v)
>         span = max(abs(b), abs(a))
>         im = Image.new('L', self.im_size)
>         im.putdata((v * 127. / span) + 128)
>         im.save(filename)
>
> And why the calculations of a, b, and span? Why do you need the extra
> maths ops inside im.putdata()? I couldn't quite follow it.

Similarly, here it looks like they are rescaling whatever is in v to an
unsigned 8-bit range (0-255). The span gives the absolute maximum value
of the input data, so v/span is rescaled to have maximum value 1, and
(v * 127. / span) is the (signed) input vector rescaled to have values
in the range [-127, 127]. Adding 128 makes it unsigned again, in
[1, 255].

I'm not sure why they would be doing this - to me it looks like they
might be using Image as a convenient way to store some other kind of
data...

HTH,
Robin
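Robin's reading of the two functions can be checked end to end with a
short round-trip script. A minimal sketch, assuming PIL is installed
and that 'face.png' is a placeholder name for an 8-bit greyscale image:

import numpy
import Image  # the Python Imaging Library

im = Image.open('face.png').convert('L')            # force 8-bit greyscale
v = numpy.asarray(im, numpy.float64).ravel() - 128  # centre 0..255 on zero

# rescale to [-127, 127], shift back into the 0..255 range, write out;
# span is assumed nonzero (not a uniformly grey image), the same
# assumption the original code makes
span = max(abs(v.min()), abs(v.max()))
out = Image.new('L', im.size)
out.putdata((v * 127. / span) + 128)
out.save('face_roundtrip.png')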
From jdh2358 at gmail.com Fri Feb 29 12:53:09 2008
From: jdh2358 at gmail.com (John Hunter)
Date: Fri, 29 Feb 2008 11:53:09 -0600
Subject: [Numpy-discussion] contiguous true
Message-ID: <88e473830802290953y1ec21d95ic437e43971b316ba@mail.gmail.com>

[apologies if this is a resend, my mail just flaked out]

I have a boolean array and would like to find the lowest index "ind"
where N contiguous elements are all True. Eg, if x is

In [101]: x = np.random.rand(20)>.4

In [102]: x
Out[102]:
array([False,  True,  True, False, False,  True,  True, False, False,
        True, False,  True, False,  True,  True,  True, False,  True,
       False,  True], dtype=bool)

I would like to find ind=1 for N=2 and ind=13 for N=3. I assume with
the right cumsum, diff and maybe repeat magic, this can be vectorized,
but the proper incantation is escaping me.

for N==3, I thought of

In [110]: x = x.astype(int)

In [112]: y = x[:-2] + x[1:-1] + x[2:]

In [125]: ind = (y==3).nonzero()[0]

In [126]: if len(ind): ind = ind[0]

In [128]: ind
Out[128]: 13

Thanks,
JDH

From devnew at gmail.com Fri Feb 29 12:55:47 2008
From: devnew at gmail.com (devnew at gmail.com)
Date: Fri, 29 Feb 2008 09:54:47 -0800 (PST)
Subject: [Numpy-discussion] image to array doubt
In-Reply-To: References: <93ea57db-12df-4531-a9d7-50a51eb8fb49@e25g2000prg.googlegroups.com>
Message-ID:

> Robin wrote
> I'm not sure why they would be doing this - to me it looks like they
> might be using Image as a convenient way to store some other kind of
> data...

thanks Robin,
I am wondering if there is a more straightforward way to do these,
especially the vector to image function.
D

From robert.kern at gmail.com Fri Feb 29 13:13:04 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 29 Feb 2008 12:13:04 -0600
Subject: [Numpy-discussion] contiguous true
In-Reply-To: <88e473830802290953y1ec21d95ic437e43971b316ba@mail.gmail.com>
References: <88e473830802290953y1ec21d95ic437e43971b316ba@mail.gmail.com>
Message-ID: <3d375d730802291013v530f61a6i6b331a1304075dce@mail.gmail.com>

On Fri, Feb 29, 2008 at 11:53 AM, John Hunter wrote:
> [apologies if this is a resend, my mail just flaked out]
>
> I have a boolean array and would like to find the lowest index "ind"
> where N contiguous elements are all True. Eg, if x is
>
> In [101]: x = np.random.rand(20)>.4
>
> In [102]: x
> Out[102]:
> array([False,  True,  True, False, False,  True,  True, False, False,
>         True, False,  True, False,  True,  True,  True, False,  True,
>        False,  True], dtype=bool)
>
> I would like to find ind=1 for N=2 and ind=13 for N=3. I assume with
> the right cumsum, diff and maybe repeat magic, this can be vectorized,
> but the proper incantation is escaping me.

For smallish N (< 100 perhaps), I'd do something like this:

In [57]: from numpy import *

In [58]: prng = random.RandomState(1234567890)

In [59]: x = prng.random_sample(50) < 0.5

In [60]: x
Out[60]:
array([False, False, False, False,  True, False,  True, False, False,
       False,  True, False,  True, False,  True,  True,  True,  True,
        True, False, False, False,  True, False,  True, False, False,
       False,  True,  True,  True,  True, False, False,  True, False,
       False, False, False, False, False, False, False,  True, False,
       False,  True, False,  True, False], dtype=bool)

In [61]: N = 2

In [62]: mask = ones(len(x) - N + 1, dtype=bool)

In [63]: for i in range(N):
   ....:     mask &= x[i:len(x)-N+1+i]
   ....:
   ....:

In [64]: mask
Out[64]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False,  True,  True,  True,  True,
       False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False], dtype=bool)

In [65]: nonzero(mask)[0][0]
Out[65]: 14

In [66]: x[13:20]
Out[66]: array([False,  True,  True,  True,  True,  True, False], dtype=bool)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
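For what it's worth, the window sum can also be written without the
explicit Python loop over N. A sketch of one variation, not anything
posted in the thread (first_run is a made-up helper name; numpy's
convolve does the windowing):

import numpy as np

def first_run(x, N):
    """Index of the first run of N consecutive True values, else None."""
    x = np.asarray(x, dtype=int)
    # window sums: sums[i] == x[i:i+N].sum()
    sums = np.convolve(x, np.ones(N, dtype=int), mode='valid')
    hits = np.nonzero(sums == N)[0]
    return hits[0] if len(hits) else None

On John's 20-element example this returns 1 for N=2 and 13 for N=3; the
convolution does the same work as the loop, so it is a readability
trade rather than a speed win.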
From strawman at astraw.com Fri Feb 29 13:18:47 2008
From: strawman at astraw.com (Andrew Straw)
Date: Fri, 29 Feb 2008 10:18:47 -0800
Subject: [Numpy-discussion] image to array doubt
In-Reply-To: References: <93ea57db-12df-4531-a9d7-50a51eb8fb49@e25g2000prg.googlegroups.com>
Message-ID: <47C84C87.7020500@astraw.com>

devnew at gmail.com wrote:
>> Robin wrote
>> I'm not sure why they would be doing this - to me it looks like they
>> might be using Image as a convenient way to store some other kind of
>> data...
>
> thanks Robin,
> I am wondering if there is a more straightforward way to do these,
> especially the vector to image function.
> D

Check out scipy.misc.pilutil.imread() and imsave()

From devnew at gmail.com Fri Feb 29 14:15:00 2008
From: devnew at gmail.com (devnew at gmail.com)
Date: Fri, 29 Feb 2008 11:15:00 -0800 (PST)
Subject: [Numpy-discussion] PCA on set of face images
Message-ID:

Hi guys,
I have a set of face images with which I want to do face recognition
using Pentland's PCA method. I gathered these steps from their docs:

1. represent the matrix of face image data
2. find the adjusted matrix by subtracting the mean face
3. calculate the covariance matrix (cov = A * A_transpose) where A is
   from step 2
4. find the eigenvectors and select those with the highest eigenvalues
5. calculate facespace = eigenvectors * A

When it comes to implementation I have doubts as to how I should
represent the matrix of face images. Using PIL image.getdata() I can
make an array of each greyscale image. Should the matrix be such that
each row contains an array representing an image? That will make a
matrix with rows=numimages and columns=numpixels.

The covariance matrix A * A_transpose will create a square matrix of
shape (numimages, numimages), and numpy.linalg.eigh(covariancematrix)
will give eigenvectors of the same shape as the covariance matrix.

I would like to know if this is the correct way to do this. I have no
big expertise in linear algebra, so I would be grateful if someone can
confirm the right way of doing this.

RoyG
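Sketched in numpy, the five steps read roughly as follows. This is a
minimal illustration with invented placeholder data (the faces array
and its sizes are made up), not a tested recognition pipeline:

import numpy as np

# rows are images, columns are pixels: (numimages, numpixels)
faces = np.random.rand(10, 361)        # e.g. ten 19x19 images, made up
A = faces - faces.mean(axis=0)         # subtract the mean face

# the small (numimages x numimages) matrix A A^T stands in for the
# huge pixel-by-pixel covariance matrix
C = np.dot(A, A.T)
evals, evecs = np.linalg.eigh(C)       # eigh since C is symmetric

# map the eigenvectors back to pixel space; each row is one eigenface
eigenfaces = np.dot(evecs.T, A)

# eigh returns eigenvalues in ascending order, so reorder largest-first
order = np.argsort(evals)[::-1]
eigenfaces = eigenfaces[order]

The A * A_transpose step is the heart of the Turk and Pentland method:
it keeps the eigenproblem at numimages x numimages instead of
numpixels x numpixels.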
From peter.skomoroch at gmail.com Fri Feb 29 14:50:14 2008
From: peter.skomoroch at gmail.com (Peter Skomoroch)
Date: Fri, 29 Feb 2008 14:50:14 -0500
Subject: [Numpy-discussion] PCA on set of face images
In-Reply-To: References: Message-ID:

RoyG,

The timing of your question couldn't be better, I just did a blog post
on this (I also plugged scipy and the EPD):

http://www.datawrangling.com/python-montage-code-for-displaying-arrays.html

The code basically replicates the matlab montage() function and
approach to handling grayscale images using matplotlib.

-Pete

On Fri, Feb 29, 2008 at 2:15 PM, devnew at gmail.com wrote:
> Hi guys,
> I have a set of face images with which I want to do face recognition
> using Pentland's PCA method. I gathered these steps from their docs:
>
> 1. represent the matrix of face image data
> 2. find the adjusted matrix by subtracting the mean face
> 3. calculate the covariance matrix (cov = A * A_transpose) where A is
>    from step 2
> 4. find the eigenvectors and select those with the highest eigenvalues
> 5. calculate facespace = eigenvectors * A
>
> When it comes to implementation I have doubts as to how I should
> represent the matrix of face images. Using PIL image.getdata() I can
> make an array of each greyscale image. Should the matrix be such that
> each row contains an array representing an image? That will make a
> matrix with rows=numimages and columns=numpixels.
>
> The covariance matrix A * A_transpose will create a square matrix of
> shape (numimages, numimages), and numpy.linalg.eigh(covariancematrix)
> will give eigenvectors of the same shape as the covariance matrix.
>
> I would like to know if this is the correct way to do this. I have no
> big expertise in linear algebra, so I would be grateful if someone can
> confirm the right way of doing this.
>
> RoyG
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Peter N. Skomoroch
peter.skomoroch at gmail.com
http://www.datawrangling.com

From peter.skomoroch at gmail.com Fri Feb 29 14:56:20 2008
From: peter.skomoroch at gmail.com (Peter Skomoroch)
Date: Fri, 29 Feb 2008 14:56:20 -0500
Subject: [Numpy-discussion] PCA on set of face images
In-Reply-To: References: Message-ID:

Here is the page I referenced for the octave version ... it includes
examples very similar to what you want. I will be posting a very
similar example in Python later this month.

I don't have any Python code on hand for the Pentland paper, but I
think the matlab example should be easy to translate to
scipy/matplotlib using the montage function:

load faces.mat
%Form covariance matrix
C=cov(faces');
%build eigenvectors and eigenvalues
[E,D] = eig(C);
%sort based on eigenvalue
[B,index] = sortrows(D');
E2=E';
E2(index,:)';
eigensorted=E2(index,:)';
%show eigenfaces
clear Z;
for i=1:length(eigensorted)
  Z(:,:,1,i)=reshape(eigensorted(:,i)-1.5*min(min(min(eigensorted))), 19,19);
end
montage(Z)
%show top 16 eigenfaces
clear Z;
for i=1:16
  Z(:,:,1,i)=reshape(eigensorted(:,i)-min(min(min(eigensorted))), 19,19);
end
montage(Z)

On Fri, Feb 29, 2008 at 2:50 PM, Peter Skomoroch wrote:
> [...]
> > > > I would like to know if this is the correct way to do this..I have no > > big expertise in linear algebra so i would be grateful if someone can > > confirm the right way of doing this > > > > RoyG > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > Peter N. Skomoroch > peter.skomoroch at gmail.com > http://www.datawrangling.com -- Peter N. Skomoroch peter.skomoroch at gmail.com http://www.datawrangling.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.skomoroch at gmail.com Fri Feb 29 14:57:18 2008 From: peter.skomoroch at gmail.com (Peter Skomoroch) Date: Fri, 29 Feb 2008 14:57:18 -0500 Subject: [Numpy-discussion] PCA on set of face images In-Reply-To: References: Message-ID: Forgot the url: http://www.cis.hut.fi/Opinnot/T-61.2010/harjoitustyo_en07.shtml On Fri, Feb 29, 2008 at 2:56 PM, Peter Skomoroch wrote: > Here is the page I referenced for the octave version ... it includes > examples very similar to what you want. I will be posting a very similar > example in Python later this month. > > I don't have any Python code on hand for the Petland paper, but I think > matlab example should be easy to translate to scipy/matplotlib using the > montage function: > > load faces.mat > %Form covariance matrix > C=cov(faces'); > %build eigenvectors and eigenvalues > [E,D] = eig(C); > %sort based on eigenvalue > [B,index] = sortrows(D'); > E2=E'; > E2(index,:)'; > eigensorted=E2(index,:)'; > %show eigenfaces > clear Z; > for i=1:length(eigensorted) > Z(:,:,1,i)=reshape(eigensorted(:,i)-1.5*min(min(min(eigensorted))), > 19,19); > end > montage(Z) > %show top 16 eigenfaces > > clear Z; > for i=1:16 > Z(:,:,1,i)=reshape(eigensorted(:,i)-min(min(min(eigensorted))), 19,19); > end > montage(Z) > > > > > On Fri, Feb 29, 2008 at 2:50 PM, Peter Skomoroch < > peter.skomoroch at gmail.com> wrote: > > > RoyG, > > > > The timing of your question couldn't be better, I just did an blog post > > on this (I also plugged scipy and the EPD): > > > > > > http://www.datawrangling.com/python-montage-code-for-displaying-arrays.html > > > > The code basically replicates the matlab montage() function and approach > > to handling grayscale images using matplotlib. > > > > -Pete > > > > > > On Fri, Feb 29, 2008 at 2:15 PM, devnew at gmail.com > > wrote: > > > > > hi guys > > > I have a set of face images with which i want to do face recognition > > > using Petland's PCA method.I gathered these steps from their docs > > > > > > 1.represent matrix of face images data > > > 2.find the adjusted matrix by substracting the mean face > > > 3.calculate covariance matrix (cov=A* A_transpose) where A is from > > > step2 > > > 4.find eigenvectors and select those with highest eigenvalues > > > 5.calculate facespace=eigenvectors*A > > > > > > > > > when it comes to implementation i have doubts as to how i should > > > represent the matrix of face images? > > > using PIL image.getdata() i can make an array of each greyscale image. > > > Should the matrix be like each row contains an array representing an > > > image? That will make a matrix with rows=numimages and > > > columns=numpixels > > > > > > cavariancematrix =A *A_transpose will create a square matrix of > > > shape(numimages,numimages) > > > Using numpy.linalg.eigh(covariancematrix) will give eigenvectors of > > > same shape as the covariance matrix. 
--
Peter N. Skomoroch
peter.skomoroch at gmail.com
http://www.datawrangling.com

From subscriber100 at rjs.org Fri Feb 29 15:04:06 2008
From: subscriber100 at rjs.org (Ray Schumacher)
Date: Fri, 29 Feb 2008 12:04:06 -0800
Subject: [Numpy-discussion] image to array doubt
In-Reply-To: References: Message-ID: <6.2.3.4.2.20080229115250.04d75c58@rjs.org>

At 10:00 AM 2/29/2008, you wrote:
> > Robin wrote
> > I'm not sure why they would be doing this - to me it looks like they
> > might be using Image as a convenient way to store some other kind of
> > data...
>
> thanks Robin,
> I am wondering if there is a more straightforward way to do these,
> especially the vector to image function.

I would normally suggest PIL, as it contains most all of these types of
array<>image functions. See:

http://www.pythonware.com/library/pil/handbook/imagemath.htm
http://www.pythonware.com/library/pil/handbook/imageops.htm

etc., as well as the built-ins for numpy array objects:

http://effbot.org/zone/pil-changes-116.htm
(frombuffer, fromstring, fromarray, tostring, etc.)
http://www.pythonware.com/library/pil/handbook/image.htm

(I've used them for home astronomy projects, myself.)

Ray Schumacher
Blue Cove Interactive
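The built-ins Ray mentions make the round trip short. A minimal
sketch, assuming PIL 1.1.6 or later (for fromarray and the array
interface) and a placeholder filename 'm51.png':

import numpy
import Image  # PIL 1.1.6+

im = Image.open('m51.png').convert('L')
a = numpy.asarray(im)                          # image -> uint8 array

# brighten and clip, then go back the other way: array -> image
brighter = numpy.clip(a.astype(int) + 40, 0, 255).astype(numpy.uint8)
Image.fromarray(brighter).save('m51_bright.png')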