From jjl at pobox.com Thu Nov 1 11:19:12 2001
From: jjl at pobox.com (John J. Lee)
Date: Thu Nov 1 11:19:12 2001
Subject: [Numpy-discussion] RE: Numeric2
In-Reply-To: 
Message-ID: 

On Tue, 30 Oct 2001, Perry Greenfield wrote:
[...]
> > What is the current status of Numeric2?
> >
> We are in the process of putting it up on sourceforge now.
[...]

What does it do??

John


From hungjunglu at yahoo.com Fri Nov 2 10:24:10 2001
From: hungjunglu at yahoo.com (Hung Jung Lu)
Date: Fri Nov 2 10:24:10 2001
Subject: [Numpy-discussion] Assembly optimized numerical packages?
Message-ID: <20011102182318.66182.qmail@web12606.mail.yahoo.com>

Hi,

This is a tangential topic. Can someone give me pointers where to find
freeware/shareware/commercial packages for linear algebra and probability
calculations (e.g.: Cholesky decomposition, eigenvalues & eigenvectors in
diagonalization, interpolation, normal distribution, beta distribution,
inverse cumulative normal function, etc.), and such that they use assembly
level optimization (I need high speed, but on mundane Pentium 3 or
Pentium 4 machines) and can be used on the Windows platform and from
Microsoft's Visual C++?

I know mtxvec from www.dewresearch.com does something along these lines,
but it seems like they are aiming at specific dev platforms (CBuilder and
Delphi).

thanks!

Hung Jung

__________________________________________________
Do You Yahoo!?
Find a job, post your resume.
http://careers.yahoo.com


From chrishbarker at home.net Fri Nov 2 11:40:07 2001
From: chrishbarker at home.net (Chris Barker)
Date: Fri Nov 2 11:40:07 2001
Subject: [Numpy-discussion] Assembly optimized numerical packages?
References: <20011102182318.66182.qmail@web12606.mail.yahoo.com>
Message-ID: <3BE2FADF.23D659E9@home.net>

Hung Jung Lu wrote:
> Can someone give me pointers where to find
> freeware/shareware/commercial packages for linear
> algebra and probability calculations (e.g.: Cholesky
> decomposition, eigenvalues & eigenvectors in
> diagonalization,

This sounds like what you are looking for is LAPACK with a good BLAS.
Do a web search, and you'll find lots of pointers.

> interpolation, normal distribution,
> beta distribution, inverse cumulative normal function,
> etc.)

I'm lost here. Perhaps someone else will have some pointers.

-Chris

--
Christopher Barker, Ph.D.                            ---      ---      ---
ChrisHBarker at home.net                             ---@@   -----@@  -----@@
http://members.home.net/barkerlohmann            ------@@@ ------@@@ ------@@@
Oil Spill Modeling                               ------  @ ------  @ ------  @
Water Resources Engineering                      -------   ---------  --------
Coastal and Fluvial Hydrodynamics                --------------------------------------
------------------------------------------------------------------------


From jsaenz at wm.lc.ehu.es Mon Nov 5 00:56:09 2001
From: jsaenz at wm.lc.ehu.es (Jon Saenz)
Date: Mon Nov 5 00:56:09 2001
Subject: [Numpy-discussion] Assembly optimized numerical packages?
In-Reply-To: <20011102182318.66182.qmail@web12606.mail.yahoo.com>
Message-ID: 

On Fri, 2 Nov 2001, Hung Jung Lu wrote:

> Can someone give me pointers where to find
> freeware/shareware/commercial packages for linear
> algebra and probability calculations (e.g.: Cholesky
> decomposition, eigenvalues & eigenvectors in
> diagonalization, interpolation, normal distribution,
> beta distribution, inverse cumulative normal function,
> etc.), and such that they use assembly level
> optimization (I need high speed, but on mundane Pentium
> 3 or Pentium 4 machines) and can be used on the Windows
> platform and from Microsoft's Visual C++?
For statistical distribution functions, you can check DCDFLIB.C:
http://odin.mdacc.tmc.edu/anonftp/page_2.html

It is C, not assembler.

Jon Saenz.                    | Tfno: +34 946012445
Depto. Fisica Aplicada II     | Fax:  +34 944648500
Facultad de Ciencias. \\ Universidad del Pais Vasco \\
Apdo. 644 \\ 48080 - Bilbao \\ SPAIN


From R.M.Everson at exeter.ac.uk Tue Nov 6 13:05:05 2001
From: R.M.Everson at exeter.ac.uk (R.M.Everson)
Date: Tue Nov 6 13:05:05 2001
Subject: [Numpy-discussion] Sparse matrices
Message-ID: 

Hello,

Does anyone have a working sparse matrix module for Numeric 20.2.0 and
Python 2.1 (or similar)? I'm trying to get the version in the SciPy
CVS tree to work - so far without success.

I don't want anything particularly fancy -- not even sparse matrix
inversion. Addition and multiplication would be fine.

Thanks for any ideas/pointers/software etc!

Cheers,

Richard.

--
Department of Computer Science, Exeter University    Voice: +44 1392 264065
R.M.Everson at exeter.ac.uk                         Secretary: +44 1392 264061
http://www.dcs.ex.ac.uk/people/reverson                Fax: +44 1392 264067


From vanandel at atd.ucar.edu Tue Nov 6 13:15:04 2001
From: vanandel at atd.ucar.edu (Joe Van Andel)
Date: Tue Nov 6 13:15:04 2001
Subject: [Numpy-discussion] MA - math operations do not preserve fill_value
Message-ID: <3BE852CA.A18F9E5C@atd.ucar.edu>

Using Python 2.1 and Numeric 20.2.1 on Redhat Linux 7.1.

Shouldn't masked arrays preserve the fill value of their operands, if
both operands have the same fill value? Otherwise, if I want to preserve
the fill_value, I have to write expressions like:

d = masked_values((a+b), a.fill_value())

Here's a demonstration of the problem:

>>> a = masked_values((1.0,2.0,3.0,4.0,-999.0), -999)
>>> b = masked_values((-999.0,1.0,2.0,3.0,4.0), -999)
>>> a
array(data = [   1.,   2.,   3.,   4.,-999.,],
      mask = [0,0,0,0,1,],
      fill_value=-999)
>>> b
array(data = [-999.,   1.,   2.,   3.,   4.,],
      mask = [1,0,0,0,0,],
      fill_value=-999)
>>> c=a+b
>>> c
array(data = [  1.00000002e+20,  3.00000000e+00,  5.00000000e+00,
                7.00000000e+00,  1.00000002e+20,],
      mask = [1,0,0,0,1,],
      fill_value=[ 1.00000002e+20,])
>>> d=masked_values((a+b),a.fill_value())
>>> d
array(data = [-999.,   3.,   5.,   7.,-999.,],
      mask = [1,0,0,0,1,],
      fill_value=-999)

--
Joe VanAndel
National Center for Atmospheric Research
http://www.atd.ucar.edu/~vanandel/
Internet: vanandel at ucar.edu


From roitblat at hawaii.edu Tue Nov 6 17:05:03 2001
From: roitblat at hawaii.edu (Herbert L. Roitblat)
Date: Tue Nov 6 17:05:03 2001
Subject: [Numpy-discussion] Sparse matrices
References: 
Message-ID: <055701c16727$b57fed90$8fd6afcf@pixi.com>

Travis Oliphant has one.

H.

----- Original Message -----
From: "R.M.Everson" 
To: 
Sent: Tuesday, November 06, 2001 11:03 AM
Subject: [Numpy-discussion] Sparse matrices

> Hello,
>
> Does anyone have a working sparse matrix module for Numeric 20.2.0 and
> Python 2.1 (or similar)? I'm trying to get the version in the SciPy
> CVS tree to work - so far without success.
>
> I don't want anything particularly fancy -- not even sparse matrix
> inversion. Addition and multiplication would be fine.
>
> Thanks for any ideas/pointers/software etc!
>
> Cheers,
>
> Richard.
>
> --
> Department of Computer Science, Exeter University    Voice: +44 1392 264065
> R.M.Everson at exeter.ac.uk                         Secretary: +44 1392 264061
> http://www.dcs.ex.ac.uk/people/reverson                Fax: +44 1392 264067
>
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>


From jochen at jochen-kuepper.de Tue Nov 6 18:57:02 2001
From: jochen at jochen-kuepper.de (Jochen Küpper)
Date: Tue Nov 6 18:57:02 2001
Subject: [Numpy-discussion] Sparse matrices
In-Reply-To: <055701c16727$b57fed90$8fd6afcf@pixi.com>
References: <055701c16727$b57fed90$8fd6afcf@pixi.com>
Message-ID: 

On Tue, 6 Nov 2001 15:01:18 -1000 Herbert L Roitblat wrote:

Herbert> Travis Oliphant has one.

Isn't that the one in SciPy?

Herbert> ----- Original Message -----
Herbert> From: "R.M.Everson" 
Herbert> To: 
Herbert> Sent: Tuesday, November 06, 2001 11:03 AM
Herbert> Subject: [Numpy-discussion] Sparse matrices

>> Does anyone have a working sparse matrix module for Numeric 20.2.0
>> and Python 2.1 (or similar)? I'm trying to get the version in the
>> SciPy CVS tree to work - so far without success.

Herbert, this inverse citing really is counterproductive on mailing lists.

Greetings,
Jochen

--
Einigkeit und Recht und Freiheit          http://www.Jochen-Kuepper.de
Liberté, Égalité, Fraternité                     GnuPG key: 44BCCD8E
Sex, drugs and rock-n-roll


From nwagner at mecha.uni-stuttgart.de Sun Nov 11 07:32:03 2001
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Sun Nov 11 07:32:03 2001
Subject: [Numpy-discussion] RandomArray - random
Message-ID: <3BEEA88E.742E9225@mecha.uni-stuttgart.de>

Hi,

I tried to produce a random matrix, say Q (2ndof \times nsamp+1), with
Numpy 20.2 and Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3
20010315 (SuSE)] on linux2.

Traceback (most recent call last):
  File "modal.py", line 192, in ?
    Q = 2.0*random((2*ndof,nsamp+1))-ones((2*ndof,nsamp+1))
TypeError: random() takes exactly 1 argument (2 given)

Does it require a new syntax to obtain a matrix consisting of uniformly
distributed random numbers in the range +/- 1?

Nils


From paul at pfdubois.com Sun Nov 11 09:14:02 2001
From: paul at pfdubois.com (Paul F. Dubois)
Date: Sun Nov 11 09:14:02 2001
Subject: [Numpy-discussion] RandomArray - random
In-Reply-To: <3BEEA88E.742E9225@mecha.uni-stuttgart.de>
Message-ID: <000001c16ad3$f3e688a0$3d01a8c0@plstn1.sfba.home.com>

Your reference to random is not fully qualified, so I suppose you could
be picking up some other random. But I just tried
RandomArray.random((2,3)) and it worked fine.

BTW, you could just do 2.0*random((n,m))-1.0.

-----Original Message-----
From: numpy-discussion-admin at lists.sourceforge.net
[mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Nils Wagner
Sent: Sunday, November 11, 2001 8:34 AM
To: numpy-discussion at lists.sourceforge.net
Subject: [Numpy-discussion] RandomArray - random

Hi,

I tried to produce a random matrix, say Q (2ndof \times nsamp+1), with
Numpy 20.2 and Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3
20010315 (SuSE)] on linux2.

Traceback (most recent call last):
  File "modal.py", line 192, in ?
    Q = 2.0*random((2*ndof,nsamp+1))-ones((2*ndof,nsamp+1))
TypeError: random() takes exactly 1 argument (2 given)

Does it require a new syntax to obtain a matrix consisting of uniformly
distributed random numbers in the range +/- 1?

Nils

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


From nwagner at mecha.uni-stuttgart.de Mon Nov 12 04:01:03 2001
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Mon Nov 12 04:01:03 2001
Subject: [Numpy-discussion] RandomArray - random
References: <000001c16ad3$f3e688a0$3d01a8c0@plstn1.sfba.home.com>
Message-ID: <3BEFC88E.F87F363E@mecha.uni-stuttgart.de>

"Paul F. Dubois" wrote:
>
> Your reference to random is not fully qualified, so I suppose you could
> be picking up some other random. But I just tried
> RandomArray.random((2,3)) and it worked fine.
>
> BTW, you could just do 2.0*random((n,m))-1.0.

It seems to be a conflict with VPython, formerly Visual Python.
http://cil.andrew.cmu.edu/projects/visual/index.html

Python 2.1.1 (#1, Sep 24 2001, 05:28:47)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from Numeric import *
>>> from RandomArray import *
>>> random((2,3))
array([[ 0.68769461,  0.33015978,  0.07285815],
       [ 0.20514929,  0.81925279,  0.50694615]])
>>> from visual import *
Visual-2001-09-24
>>> random((2,3))
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: random() takes exactly 1 argument (2 given)
>>>

Nils

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net
> [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Nils
> Wagner
> Sent: Sunday, November 11, 2001 8:34 AM
> To: numpy-discussion at lists.sourceforge.net
> Subject: [Numpy-discussion] RandomArray - random
>
> Hi,
>
> I tried to produce a random matrix, say Q (2ndof \times nsamp+1), with
> Numpy 20.2 and Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3
> 20010315 (SuSE)] on linux2.
>
> Traceback (most recent call last):
>   File "modal.py", line 192, in ?
>     Q = 2.0*random((2*ndof,nsamp+1))-ones((2*ndof,nsamp+1))
> TypeError: random() takes exactly 1 argument (2 given)
>
> Does it require a new syntax to obtain a matrix consisting of uniformly
> distributed random numbers in the range +/- 1?
>
> Nils
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion


From neelk at cswcasa.com Mon Nov 12 09:24:02 2001
From: neelk at cswcasa.com (Krishnaswami, Neel)
Date: Mon Nov 12 09:24:02 2001
Subject: [Numpy-discussion] Building Numeric with Intel KML and mingw32
Message-ID: 

Hello,

I'm trying to rebuild Numeric with the Intel Kernel Math Libraries.
I've gotten Numeric building normally with the default BLAS libraries,
but I'm not sure what I need to put into the libraries_dir_list and
libraries_list variables in the setup.py file.

I have the directories mkl\ia32\bin (contains the DLLs), mkl\ia32\lib
(contains the lib*.a files), and mkl\include (contains the *.h files).
Can anyone tell me what goes where?
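(For concreteness, a sketch of the kind of fragment in question -- the
directory names follow the listing above, but the library names are
guesses that would need checking against the MKL documentation:)

# hypothetical setup.py fragment for linking Numeric against MKL
library_dirs_list = ['C:\\mkl\\ia32\\lib']   # where the lib*.a files live
libraries_list = ['mkl_lapack', 'mkl_p3']    # assumed names -- verify
include_dirs = ['C:\\mkl\\include']          # the *.h files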
--
Neel Krishnaswami
neelk at cswcasa.com


From nwagner at mecha.uni-stuttgart.de Tue Nov 13 02:22:01 2001
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Tue Nov 13 02:22:01 2001
Subject: [Numpy-discussion] Total least squares problem
Message-ID: <3BF102C6.8C651D9E@mecha.uni-stuttgart.de>

Hi,

How do I solve a total least squares problem in Numpy? A small example
would be appreciated.

The TLS problem assumes an overdetermined set of linear equations
AX = B, where both the data matrix A and the observation matrix B
are inaccurate.

Nils

Reference: R.D. Fierro, G.H. Golub, P.C. Hansen, D.P. O'Leary,
Regularization by truncated total least squares, SIAM J. Sci. Comput.
18(4), 1997, pp. 1223-1241.


From barnard at stat.harvard.edu Tue Nov 13 06:42:03 2001
From: barnard at stat.harvard.edu (barnard at stat.harvard.edu)
Date: Tue Nov 13 06:42:03 2001
Subject: [Numpy-discussion] Small Bug in multiarray.c
Message-ID: <15345.13522.866400.686203@aragorn.stat.harvard.edu>

When attempting to compile the CVS version of Numpy using MSVC 6 under
Windows 2000, I found a small error in multiarray.c: the doc string for
arange contains newlines. The offending code begins on line 1168.
Simply removing the newlines from the string fixes the error.

John

********************************
* John Barnard, Ph.D.
* Senior Research Statistician
* deCODE genetics
* 1000 Winter Str., Suite 3100
* Waltham, MA 02451
* Phone (Direct) : (781) 290-5771 Ext. 27
* Phone (General) : (781) 466-8833
* Fax : (781) 466-8686
* Email: j.barnard at decode.com
********************************


From oliphant at ee.byu.edu Tue Nov 13 11:25:03 2001
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Nov 13 11:25:03 2001
Subject: [Numpy-discussion] Total least squares problem
In-Reply-To: <3BF102C6.8C651D9E@mecha.uni-stuttgart.de>
Message-ID: 

> How do I solve a total least squares problem in Numpy?
> A small example would be appreciated.
>
> The TLS problem assumes an overdetermined set of linear equations
> AX = B, where both the data matrix A and the observation
> matrix B are inaccurate.

X, resids, rank, s = LinearAlgebra.linear_least_squares(A,B)

-Travis


From R.M.Everson at exeter.ac.uk Tue Nov 13 13:53:01 2001
From: R.M.Everson at exeter.ac.uk (R.M.Everson)
Date: Tue Nov 13 13:53:01 2001
Subject: [Numpy-discussion] BLAS and innerproduct
Message-ID: 

Hello,

So far as I can tell, Numeric.dot(), which uses innerproduct() from
multiarraymodule.c, doesn't call the BLAS, even if Numeric was compiled
against a native BLAS. This means (at least on my machine) that

X = ones((150, 16384), 'd')
C = dot(X, transpose(X))

is about 15 times as slow as the comparable operations in Matlab (v6),
which does, I think, use the native BLAS.

I guess that multiarray.c is not particularly optimised to use the BLAS
because of the difficulties of coping with all sorts of types (float32,
int64 etc), and with non-contiguous arrays.

The inner product is so basic to most of the work I use Numeric for that
a speed-up here would make a big difference. I'm thinking of patching
multiarray.c to use the BLAS when it can, but before I start, are there
good reasons for doing something different?

Any advice gratefully received!

Cheers,

Richard.
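(A rough way to quantify the gap described here, using the array sizes
from the example above -- absolute times will of course depend on the
machine and the BLAS:)

# timing sketch for the dot() call discussed above
import time
from Numeric import ones, dot, transpose

X = ones((150, 16384), 'd')
t0 = time.time()
C = dot(X, transpose(X))
print 'dot took %.2f seconds, result shape %s' % (time.time() - t0, C.shape)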
--
Department of Computer Science, Exeter University    Voice: +44 1392 264065
R.M.Everson at exeter.ac.uk                         Secretary: +44 1392 264061
http://www.dcs.ex.ac.uk/people/reverson                Fax: +44 1392 264067


From nwagner at mecha.uni-stuttgart.de Wed Nov 14 04:44:03 2001
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Wed Nov 14 04:44:03 2001
Subject: [Numpy-discussion] Total least squares problem
References: 
Message-ID: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de>

Travis Oliphant wrote:
>
> > How do I solve a total least squares problem in Numpy?
> > A small example would be appreciated.
> >
> > The TLS problem assumes an overdetermined set of linear equations
> > AX = B, where both the data matrix A and the observation
> > matrix B are inaccurate.
>
> X, resids, rank, s = LinearAlgebra.linear_least_squares(A,B)
>
> -Travis

Travis,

There is a difference between classical least squares (Numpy) and TLS
(total least squares). I am attaching a small example for illustration.

Nils

-------------- next part --------------
from Numeric import *
from LinearAlgebra import *

A = zeros((6,3),Float)
b = zeros((6,1),Float)
#
# Example by Van Huffel
# http://www.netlib.org/vanhuffel/dtls-doc
#
A[0,0] = 0.80010002
A[0,1] = 0.39985167
A[0,2] = 0.60005390
A[1,0] = 0.29996484
A[1,1] = 0.69990689
A[1,2] = 0.39997269
A[2,0] = 0.49994235
A[2,1] = 0.60003167
A[2,2] = 0.20012361
A[3,0] = 0.90013643
A[3,1] = 0.20016919
A[3,2] = 0.79995025
A[4,0] = 0.39998539
A[4,1] = 0.80006338
A[4,2] = 0.49985474
A[5,0] = 0.20002274
A[5,1] = 0.90007114
A[5,2] = 0.70009777

b[0] = 0.89999446
b[1] = 0.82997570
b[2] = 0.79011189
b[3] = 0.85002662
b[4] = 0.99016399
b[5] = 0.10299439

print 'Solution of an overdetermined system of linear equations A x = b'
print
print 'A'
print
print A
#
print 'b'
print
print b
#
x, resids, rank, s = linear_least_squares(A,b)
print
print 'Least squares solution (Numpy)'
print
print x
print
print 'Computed rank',rank
print
print 'Sum of the squared residuals', resids
print
print 'Singular values of A in descending order'
print
print s
#
xtls = zeros((3,1),Float)
#
# total least squares solution given by Van Huffel
# http://www.netlib.org/vanhuffel/dtls-doc
#
xtls[0] = 0.500254
xtls[1] = 0.800251
xtls[2] = 0.299492

print
print 'Total least squares solution'
print
print xtls
print
print 'Residuals of LS (Numpy)'
print
print matrixmultiply(A,x)-b
print
print 'Residuals of TLS'
print
print matrixmultiply(A,xtls)-b
print
#
# Least squares in Numpy  A^\top A x = A^\top b
#
Atb = matrixmultiply(transpose(A),b)
AtA = matrixmultiply(transpose(A),A)
xls = solve_linear_equations(AtA,Atb)
print
print 'Least squares solution via normal equation'
print
print xls


From hinsen at cnrs-orleans.fr Wed Nov 14 05:30:07 2001
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Wed Nov 14 05:30:07 2001
Subject: [Numpy-discussion] Total least squares problem
In-Reply-To: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de>
References: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de>
Message-ID: 

Nils Wagner writes:

> There is a difference between classical least squares (Numpy)
> and TLS (total least squares).

Algorithmically speaking it is even a very different problem. I'd say
the only reasonable (i.e. efficient) solution for NumPy is to implement
the TLS algorithm in a C subroutine calling LAPACK routines for SVD etc.

Konrad.
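(Functionally, though, a pure-Python sketch is possible today with the
SVD already exposed by LinearAlgebra -- not efficient, but it shows the
algorithm. This assumes the generic case: the smallest singular value of
the augmented matrix [A b] is simple and the last component of its right
singular vector is nonzero; see Golub & Van Loan for the degenerate
cases.)

# minimal total least squares sketch via the SVD of the augmented matrix
from Numeric import concatenate
from LinearAlgebra import singular_value_decomposition

def tls(A, b):
    n = A.shape[1]
    C = concatenate((A, b), 1)      # m x (n+1) augmented matrix [A b]
    u, s, vt = singular_value_decomposition(C)
    v = vt[n]                       # right singular vector of smallest s
    return -v[:n] / v[n]            # classical TLS solution

Applied to the Van Huffel data in the attachment above, this should
reproduce (up to rounding) the xtls values quoted there.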
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------


From nwagner at mecha.uni-stuttgart.de Wed Nov 14 06:13:07 2001
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Wed Nov 14 06:13:07 2001
Subject: [Numpy-discussion] Total least squares problem
References: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de>
Message-ID: <3BF28365.53373B65@mecha.uni-stuttgart.de>

Konrad Hinsen wrote:
>
> Nils Wagner writes:
>
> > There is a difference between classical least squares (Numpy)
> > and TLS (total least squares).
>
> Algorithmically speaking it is even a very different problem. I'd say
> the only reasonable (i.e. efficient) solution for NumPy is to implement
> the TLS algorithm in a C subroutine calling LAPACK routines for SVD etc.
>
> Konrad.
> --

There are two Fortran implementations of the TLS algorithm already
available via http://www.netlib.org/vanhuffel/. Moreover, there is a
tool called f2py that generates Python C/API modules for wrapping
Fortran 77/90/95 codes to Python. Unfortunately I am not very familiar
with this tool; therefore I need some advice for this.

Thanks in advance

Nils

> -------------------------------------------------------------------------------
> Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
> Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
> Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
> 45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
> France                                   | Nederlands/Francais
> -------------------------------------------------------------------------------


From nwagner at mecha.uni-stuttgart.de Thu Nov 15 01:14:01 2001
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Thu Nov 15 01:14:01 2001
Subject: [Numpy-discussion] Numpy, BLAS, LAPACK, f2py
Message-ID: <3BF395DB.DA1C34A4@mecha.uni-stuttgart.de>

Hi,

I have installed f2py on my system for wrapping existing FORTRAN 77
codes to Python. Then I have gone through the following steps, an
example for using a TLS (total least squares) routine from
http://www.netlib.org/vanhuffel/:

1) Get dtls.f with dependencies
2) Run

   f2py dtls.f -m foo -h foo.pyf only: dtls
        \       \        \             \________ just wrap the dtls function
         \       \        \______ create signature file
          \       \____ python module name
           \_____ Fortran 77 code

3) Edit foo.pyf to your specific needs (optional)
4) Run

   f2py foo.pyf
   \_____________ this will create the Python C/API module foomodule.c

5) Run

   make -f Makefile-foo
   \_____________ this will build the module

6) In python:

Python 2.1.1 (#1, Sep 24 2001, 05:28:47)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import foo
Traceback (most recent call last):
  File "", line 1, in ?
ImportError: ./foomodule.so: undefined symbol: dcopy_
>>>

Any suggestions to solve this problem?
Nils

There are prebuilt libraries of LAPACK and BLAS in /usr/lib:

-rw-r--r--  1 root root  657706 Sep 24 01:00 libblas.a
lrwxrwxrwx  1 root root      12 Okt 22 19:27 libblas.so -> libblas.so.2
lrwxrwxrwx  1 root root      16 Okt 22 19:27 libblas.so.2 -> libblas.so.2.2.0
-rwxr-xr-x  1 root root  559600 Sep 24 01:01 libblas.so.2.2.0
-rw-r--r--  1 root root 5763150 Sep 24 01:00 liblapack.a
lrwxrwxrwx  1 root root      14 Okt 22 19:27 liblapack.so -> liblapack.so.3
lrwxrwxrwx  1 root root      18 Okt 22 19:27 liblapack.so.3 -> liblapack.so.3.0.0
-rwxr-xr-x  1 root root 4826626 Sep 24 01:01 liblapack.so.3.0.0


From gvermeul at labs.polycnrs-gre.fr Thu Nov 15 01:28:02 2001
From: gvermeul at labs.polycnrs-gre.fr (Gerard Vermeulen)
Date: Thu Nov 15 01:28:02 2001
Subject: [Numpy-discussion] Numpy, BLAS, LAPACK, f2py
In-Reply-To: <3BF395DB.DA1C34A4@mecha.uni-stuttgart.de>
References: <3BF395DB.DA1C34A4@mecha.uni-stuttgart.de>
Message-ID: <01111510271301.11576@taco.polycnrs-gre.fr>

Hi,

Try to link in the blas library (there is a dcopy_ in my blas library,
but better check the README first).

best regards -- Gerard

On Thursday 15 November 2001 11:15, Nils Wagner wrote:
> Hi,
>
> I have installed f2py on my system for wrapping existing FORTRAN 77
> codes to Python. Then I have gone through the following steps, an
> example for using a TLS (total least squares) routine from
> http://www.netlib.org/vanhuffel/:
>
> 1) Get dtls.f with dependencies
> 2) Run
>
>    f2py dtls.f -m foo -h foo.pyf only: dtls
>         \       \        \             \________ just wrap the dtls function
>          \       \        \______ create signature file
>           \       \____ python module name
>            \_____ Fortran 77 code
>
> 3) Edit foo.pyf to your specific needs (optional)
> 4) Run
>
>    f2py foo.pyf
>    \_____________ this will create the Python C/API module foomodule.c
>
> 5) Run
>
>    make -f Makefile-foo
>    \_____________ this will build the module
>
> 6) In python:
>
> Python 2.1.1 (#1, Sep 24 2001, 05:28:47)
> [GCC 2.95.3 20010315 (SuSE)] on linux2
> Type "copyright", "credits" or "license" for more information.
> >>> import foo
> Traceback (most recent call last):
>   File "", line 1, in ?
> ImportError: ./foomodule.so: undefined symbol: dcopy_
>
> Any suggestions to solve this problem?
>
> Nils
>
> There are prebuilt libraries of LAPACK and BLAS in /usr/lib:
>
> -rw-r--r--  1 root root  657706 Sep 24 01:00 libblas.a
> lrwxrwxrwx  1 root root      12 Okt 22 19:27 libblas.so -> libblas.so.2
> lrwxrwxrwx  1 root root      16 Okt 22 19:27 libblas.so.2 -> libblas.so.2.2.0
> -rwxr-xr-x  1 root root  559600 Sep 24 01:01 libblas.so.2.2.0
> -rw-r--r--  1 root root 5763150 Sep 24 01:00 liblapack.a
> lrwxrwxrwx  1 root root      14 Okt 22 19:27 liblapack.so -> liblapack.so.3
> lrwxrwxrwx  1 root root      18 Okt 22 19:27 liblapack.so.3 -> liblapack.so.3.0.0
> -rwxr-xr-x  1 root root 4826626 Sep 24 01:01 liblapack.so.3.0.0
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion


From perry at stsci.edu Fri Nov 16 14:33:02 2001
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Nov 16 14:33:02 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
Message-ID: 

We have been working on a reimplementation of Numeric, the numeric array
manipulation extension module for Python. The reimplementation is
virtually a complete rewrite, and because it is not completely backwards
compatible with Numeric, we have dubbed it numarray to prevent confusion.
While we think this version is not yet mature enough for most to use in
everyday projects, we are interested in feedback on the user interface
and the open issues (see the documents on the web page shown below). We
also welcome those who would like to contribute to this effort by helping
with the development or adding libraries.

An early beta version is available on sourceforge as the package
Numarray (http://sourceforge.net/projects/numpy/)

Information on the goals, changes in user interface, open issues, and
design can be found at http://aten.stsci.edu/numarray


From pete at shinners.org Fri Nov 16 15:12:02 2001
From: pete at shinners.org (Pete Shinners)
Date: Fri Nov 16 15:12:02 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
References: 
Message-ID: <3BF59D10.2070107@shinners.org>

Perry Greenfield wrote:
> An early beta version is available on sourceforge as the
> package Numarray (http://sourceforge.net/projects/numpy/)
>
> Information on the goals, changes in user interface, open issues,
> and design can be found at http://aten.stsci.edu/numarray

you ask a few questions on the information website, here are some of my
answers for things i "care" about. note that my main use of numpy is as
a pixel buffer for images. some of the changes like avoiding type
promotion sound really good to me :]

5) should the implementation be bulletproof for private vars?
i don't think you should worry about this. as long as the interface is
well defined, i wouldn't worry about protecting users from themselves.
i think it will be the rare numarray user who will be in a situation
where they need to modify the internal C data.

7) necessary to add other types?
yes. i really want unsigned int16 and unsigned int32. all my operations
are on pixel data, and things can just get messy when i need to treat
packed color values as signed integers.

8) negative and out-of-range indices?
i'd prefer them to be kept as similar to python as can be. the current
implementation in Numeric is nice for me.

one other thing i'd like there to be a little focus on is adding my own
new ufunc operators. for image manipulation i'd like new ufunc operators
that clamp the results to legal values. i'd be happy to do this myself,
but i don't believe it's possible with the current Numeric.

the last thing i really really want is for this to be rolled into the
standard python distribution. that is perhaps the most important aspect
for me. i do not like requiring the extra dependency for generic numeric
arrays. :]


From oliphant.travis at ieee.org Fri Nov 16 18:42:02 2001
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri Nov 16 18:42:02 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
In-Reply-To: 
References: 
Message-ID: 

>
> While we think this version is not yet mature enough for
> most to use in everyday projects, we are interested in
> feedback on the user interface and the open issues (see
> the documents on the web page shown below). We also welcome
> those who would like to contribute to this effort by helping
> with the development or adding libraries.
>

What I've seen looks great. You've all done some good work here.

Of course, I do have some feedback. I haven't looked at everything;
these points have just caught my eye.

Complex Types:
==============

1) I don't like the idea of complex types being a separate subclass of
ndarray. This makes them "different."
Unless this "difference" can be completely hidden (which I doubt), I would prefer complex types to be on the same level as other numeric types. 2) Also, in your C-API, you have a different pointer to the imaginary data. I much prefer the way it is done currently to have complex numbers represented as an 8-byte, or 16-byte chunk of contiguous memory. Index Arrays: =========== 1) For what it's worth, my initial reaction to your indexing scheme is negative. I would prefer that if a = [[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]] then a[[1,3],[0,3]] returns the sub-matrix: [[ 4, 6], [ 12, 14] i.e. the cross-product of [1,3] x [0,3] This is the way MATLAB works. I'm not sure what IDL does. If I understand your scheme, right now, then I would have to append an extra dimension to my indexing arrays to get this behavior, right? 2) I would like to be able to index the array in a flattenned sense as well (is that possible?) in other words, it would be nice if a[flat(9,10,11)] or something got me the elements 9,10,11 in a one-dimensional interpretation of the array. 3) Why can't you combine slice notation and indexing? Just interpret the slice as index array that would be created from using tha range operator on the same start, stop, and step objects. Is this the plan? That's all for now. I don't mean to be critical, I'm really impressed with what works so far. These are just some concerns I have right now. -Travis Oliphant From europax at home.com Sat Nov 17 08:06:02 2001 From: europax at home.com (Rob) Date: Sat Nov 17 08:06:02 2001 Subject: [Numpy-discussion] Numeric Python EM Project may need mirror Message-ID: <3BF68A67.C4963807@home.com> Hi all, I just got an email from @home yesterday, saying that all customers should back up their web pages, email, etc etc. I know they are in bankruptcy, but this email sounded ominous. I'm wondering if there is some kindly soul who would want to mirror this site. I'd really love to have this site on Starship Python, but haven't had any responses to emails to them. I'm continuously working on more code for the site so I'd hate to see it go down, even if temporarily. Sincerely, Rob. -- The Numeric Python EM Project www.members.home.net/europax From greenfield at home.com Sat Nov 17 14:58:02 2001 From: greenfield at home.com (Perry Greenfield) Date: Sat Nov 17 14:58:02 2001 Subject: FW: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download Message-ID: -----Original Message----- > > What I've seen looks great. You've all done some good work here. > Thanks, you were origin of some of the ideas used. > Of course, I do have some feedback. I haven't looked at > everything, these > points have just caught my eye. > > Complex Types: > ============== > > 1) I don't like the idea of complex types being a separate subclass of > ndarray. This makes them "different." Unless this "difference" can be > completely hidden (which I doubt), I would prefer complex types > to be on the > same level as other numeric types. > I think that we also don't like that, and after doing the original, somewhat incomplete, implementation using the subclassed approach, I began to feel that implementing it in C (albeit using a different approach for the code generation) was probably easier and more elegant than what was done here. So you are very likely to see it integrated as a regular numeric type, with a more C-based implementation. > 2) Also, in your C-API, you have a different pointer to the > imaginary data. 
> I much prefer the way it is done currently, to have complex numbers
> represented as an 8-byte or 16-byte chunk of contiguous memory.

Any reason not to allow both? (The pointer to the real can be
interpreted as either a pointer to 8-byte or 16-byte quantities.) It is
true that figuring out the imaginary pointer from the real is trivial,
so I suppose it really isn't necessary.

> Index Arrays:
> =============
>
> 1) For what it's worth, my initial reaction to your indexing scheme is
> negative. I would prefer that if
>
> a = [[ 1,  2,  3,  4],
>      [ 5,  6,  7,  8],
>      [ 9, 10, 11, 12],
>      [13, 14, 15, 16]]
>
> then
>
> a[[1,3],[0,3]] returns the sub-matrix:
>
> [[ 4,  6],
>  [12, 14]]
>
> i.e. the cross-product of [1,3] x [0,3]. This is the way MATLAB
> works. I'm not sure what IDL does.

I'm afraid I don't understand the example. Could you elaborate a bit
more how this is supposed to work? (Or is it possible there is an error?
I would understand it if the result were [[5, 8],[13,16]], corresponding
to the index pairs [[(1,0),(1,3)],[(3,0),(3,3)]].)

> If I understand your scheme, right now, then I would have to append an
> extra dimension to my indexing arrays to get this behavior, right?
>
> 2) I would like to be able to index the array in a flattened sense as
> well (is that possible?) In other words, it would be nice if
> a[flat(9,10,11)] or something got me the elements 9,10,11 in a
> one-dimensional interpretation of the array.

Why not:

ravel(a)[[9,10,11]] ?

> 3) Why can't you combine slice notation and indexing? Just interpret the
> slice as the index array that would be created from using the range
> operator on the same start, stop, and step objects. Is this the plan?

I think that allowing slicing could be possible. But things were getting
pretty complex as they were, and we wanted to see if there was agreement
on how it was being done so far. It could be extended to handle slices,
if there was a well defined interpretation. (I think there may be at
least two possible interpretations considered.) As for the above, sure,
but of course the slice would have to be shape consistent with the other
index arrays (under the current scheme).

> That's all for now. I don't mean to be critical, I'm really impressed
> with what works so far. These are just some concerns I have right now.
>
> -Travis Oliphant

Thanks Travis, we're looking for constructive feedback, positive or
negative.

Perry


From greenfield at home.com Sat Nov 17 16:28:02 2001
From: greenfield at home.com (Perry Greenfield)
Date: Sat Nov 17 16:28:02 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
In-Reply-To: 
Message-ID: 

> > I think that we also don't like that, and after doing the original,
> > somewhat incomplete, implementation using the subclassed approach,
> > I began to feel that implementing it in C (albeit using a different
> > approach for the code generation) was probably easier and more
> > elegant than what was done here. So you are very likely to see
> > it integrated as a regular numeric type, with a more C-based
> > implementation.
>
> Sounds good. Is development going to take place on the CVS tree? If so,
> I could help out by committing changes directly.
>
> > > 2) Also, in your C-API, you have a different pointer to the
> > > imaginary data.
> > > I much prefer the way it is done currently, to have complex numbers
> > > represented as an 8-byte or 16-byte chunk of contiguous memory.
> >
> > Any reason not to allow both?
> > (The pointer to the real can be interpreted as either a pointer to
> > 8-byte or 16-byte quantities.) It is true that figuring out the
> > imaginary pointer from the real is trivial, so I suppose it really
> > isn't necessary.
>
> I guess the way you've structured the ndarray, it is possible. I figured
> some operations might be faster, but perhaps not if you have two pointers
> running at the same time, anyway.

Well, the C implementation I was thinking of would only use one pointer.
The API could supply both if some algorithms would find it useful to
access the imaginary data alone. But as mentioned, I don't think it is
important to include, so we could easily get rid of it (and probably
should).

> > > Index Arrays:
> > > =============
> > >
> > > 1) For what it's worth, my initial reaction to your indexing scheme is
> > > negative. I would prefer that if
> > >
> > > a = [[ 1,  2,  3,  4],
> > >      [ 5,  6,  7,  8],
> > >      [ 9, 10, 11, 12],
> > >      [13, 14, 15, 16]]
> > >
> > > then
> > >
> > > a[[1,3],[0,3]] returns the sub-matrix:
> > >
> > > [[ 4,  6],
> > >  [12, 14]]
> > >
> > > i.e. the cross-product of [1,3] x [0,3]. This is the way MATLAB
> > > works. I'm not sure what IDL does.
> >
> > I'm afraid I don't understand the example. Could you elaborate
> > a bit more how this is supposed to work? (Or is it possible
> > there is an error? I would understand it if the result were
> > [[5, 8],[13,16]], corresponding to the index pairs
> > [[(1,0),(1,3)],[(3,0),(3,3)]].)

> The idea is to consider indexing with arrays of integers to be a
> generalization of slice index notation. Simply interpret the slice as an
> array of integers that would be formed by using the range operator.
>
> For example, I would like to see
>
> a[1:5,1:3] be the same thing as a[[1,2,3,4],[1,2]]
>
> a[1:5,1:3] selects the 2-d subarray consisting of rows 1 to 4 and
> columns 1 to 2 (inclusive, starting with the first row being row 0).
> In other words, the indices used to select the elements of a are
> ordered pairs taken from the cross-product of the index sets:
>
> [1,2,3,4] x [1,2] = [(1,1), (1,2), (2,1), (2,2), (3,1), (3,2), (4,1), (4,2)]
>
> and these selected elements are structured as a 2-d array of shape (4,2).
>
> Does this make more sense? Indexing would be a natural extension of this
> behavior but allowing sets that can't necessarily be formed from the
> range function.

I understand this (but is the example in the first message consistent
with this?). This is certainly a reasonable interpretation. But if this
is the way multiple index arrays are interpreted, how does one easily
specify scattered points in a multidimensional array? The only other
alternative I can think of is to use some of the dimensions of a
multidimensional index array as indices for each of the dimensions. For
example, if one wanted to index random points in a 2d array, then
supplying an nx2 array would provide a list of n such points. But I see
this as a more limiting way to do this (and there are often benefits to
being able to keep the indices for different dimensions in separate
arrays).

But I think doing what you would like to do is straightforward even with
the existing implementation. For example, if x is a 2d array we could
easily develop a function such that:

x[outer_index_product([1,3,4],[1,5])] # with a better function name!

The function outer_index_product would return a tuple of two index
arrays each with a shape of 3x2.
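(A rough pure-Python sketch of such a helper -- the name is the
hypothetical one used above. Unlike the zero-stride trick described
next, this version simply materializes the index arrays:)

# hypothetical helper: index arrays selecting the cross product of i and j
from Numeric import array, ones, multiply

def outer_index_product(i, j):
    i, j = array(i), array(j)
    ii = multiply.outer(i, ones(len(j), 'i'))   # row indices, shape (len(i), len(j))
    jj = multiply.outer(ones(len(i), 'i'), j)   # column indices, same shape
    return ii, jj

Under the numarray indexing described here,
x[outer_index_product([1,3,4],[1,5])] would then select the 3x2
cross-product submatrix.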
These arrays would not take up more space than the original arrays even
though they appear to have a much larger size (the one dimension is
replicated by use of a 0 stride size, so the data buffer is the same as
the original). Would this be acceptable?

In the end, all these indexing behaviors can be provided by different
functions. So it isn't really a question of which one to have and which
not to have. The question is: what is supported by the indexing
notation? For us, the behavior we have implemented is far more useful
for our applications than the one you propose. But perhaps we are in the
minority, so I'd be very interested in hearing which indexing
interpretation is most useful to the general community.

> > Why not:
> >
> > ravel(a)[[9,10,11]] ?
>
> sure, that would work, especially if ravel doesn't make a copy of the
> data (which I presume it does not).

Correct.

Perry


From greenfield at home.com Sat Nov 17 17:23:06 2001
From: greenfield at home.com (Perry Greenfield)
Date: Sat Nov 17 17:23:06 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
In-Reply-To: 
Message-ID: 

From: Pete Shinners

> 7) necessary to add other types?
> yes. i really want unsigned int16 and unsigned int32. all my operations
> are on pixel data, and things can just get messy when i need to treat
> packed color values as signed integers.

Unsigned int16 is already supported. UInt32 could be done, but raises
some interesting issues with regard to combining with Int32. I don't
believe the current implementation prevents you from carrying around
unsigned data in Int32 arrays. If you are using them as packed color
values, do you ever do any arithmetic operations on them other than to
pack and unpack them?

> one other thing i'd like there to be a little focus on is adding my own
> new ufunc operators. for image manipulation i'd like new ufunc operators
> that clamp the results to legal values. i'd be happy to do this myself,
> but i don't believe it's possible with the current Numeric.

It will be possible for users to add their own ufuncs. We will
eventually document how to do so (and it should be fairly simple to do
once we give a few example templates).

Perry


From alessandro.mirone at wanadoo.fr Sun Nov 18 07:42:01 2001
From: alessandro.mirone at wanadoo.fr (Alessandro Mirone)
Date: Sun Nov 18 07:42:01 2001
Subject: [Numpy-discussion] Heigenvalues is broken
Message-ID: <3BF7E462.A473B686@wanadoo.fr>

Is it a problem of lapack 3.0 or of LinearAlgebra.py?
(Eigenvalues should be (0,2))

>>> a=array([[1,0],[0,1]])
>>> b=array([[0,1],[-1,0]])
>>> M=a+b*complex(0,1.0)
>>> Heigenvalues(M)
array([-2.30277564,  1.30277564])
>>> print M
[[ 1.+0.j  0.+1.j]
 [ 0.-1.j  1.+0.j]]
>>>


From oliphant.travis at ieee.org Sun Nov 18 19:01:01 2001
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun Nov 18 19:01:01 2001
Subject: [Numpy-discussion] Heigenvalues is broken
In-Reply-To: <3BF7E462.A473B686@wanadoo.fr>
References: <3BF7E462.A473B686@wanadoo.fr>
Message-ID: 

On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote:
> Is it a problem of lapack 3.0 or of
> LinearAlgebra.py?
> (Eigenvalues should be (0,2))
>
> >>> a=array([[1,0],[0,1]])
> >>> b=array([[0,1],[-1,0]])
> >>> M=a+b*complex(0,1.0)
> >>> Heigenvalues(M)

I suspect it is your lapack. On an Athlon running Mandrake Linux with
the lapack-3.0-9 package, I get:
>>> a=array([[1,0],[0,1]])
>>> b=array([[0,1],[-1,0]])
>>> M=a+b*complex(0,1.0)
>>> Heigenvalues(M)
array([ 0.,  2.])

-Travis


From nwagner at mecha.uni-stuttgart.de Sun Nov 18 23:58:01 2001
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Sun Nov 18 23:58:01 2001
Subject: [Numpy-discussion] Heigenvalues is broken
References: <3BF7E462.A473B686@wanadoo.fr>
Message-ID: <3BF8C9FA.97B3AEB1@mecha.uni-stuttgart.de>

Alessandro Mirone wrote:
>
> Is it a problem of lapack 3.0 or of
> LinearAlgebra.py?
> (Eigenvalues should be (0,2))
>
> >>> a=array([[1,0],[0,1]])
> >>> b=array([[0,1],[-1,0]])
> >>> M=a+b*complex(0,1.0)
> >>> Heigenvalues(M)
> array([-2.30277564,  1.30277564])
> >>> print M
> [[ 1.+0.j  0.+1.j]
>  [ 0.-1.j  1.+0.j]]
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

On an Athlon running SuSE Linux 7.3 with the lapack-3.0-0 package, I get:

[-2.30277564  1.30277564]

Nils


From Peter.Verveer at embl-heidelberg.de Mon Nov 19 02:45:02 2001
From: Peter.Verveer at embl-heidelberg.de (Peter Verveer)
Date: Mon Nov 19 02:45:02 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
In-Reply-To: <3BF59D10.2070107@shinners.org>
References: <3BF59D10.2070107@shinners.org>
Message-ID: 

On Saturday 17 November 2001 00:11 am, you wrote:

> note that my main use of numpy is as a pixel buffer for images. some of
> the changes like avoiding type promotion sound really good to me :]

I have exactly the same application, so I agree with this.

> 7) necessary to add other types?
> yes. i really want unsigned int16 and unsigned int32. all my operations
> are on pixel data, and things can just get messy when i need to treat
> packed color values as signed integers.

Yes please! One of the things that irritates me most about the original
Numeric is that some types are lacking. I think the whole range of data
types should be supported, even if some may be seldom used by most
people.

> one other thing i'd like there to be a little focus on is adding my own
> new ufunc operators. for image manipulation i'd like new ufunc operators
> that clamp the results to legal values. i'd be happy to do this myself,
> but i don't believe it's possible with the current Numeric.

I write functions in C that directly access the numeric data. I don't
use the ufunc API. One reason that I do that is that I want my library
of routines to be useful independent of Numeric, so I only have a tiny
glue between my C routines and Numeric. I hope that it will still be
possible to do this in the new version.

> the last thing i really really want is for this to be rolled into the
> standard python distribution. that is perhaps the most important aspect
> for me. i do not like requiring the extra dependency for generic numeric
> arrays. :]

I second that!

Cheers, Peter

--
Dr. Peter J. Verveer
Bastiaens Group
Cell Biology and Cell Biophysics Programme
EMBL
Meyerhofstrasse 1
D-69117 Heidelberg
Germany
Tel. : +49 6221 387245
Fax : +49 6221 387242
Email: Peter.Verveer at embl-heidelberg.de


From tpitts at accentopto.com Mon Nov 19 05:58:03 2001
From: tpitts at accentopto.com (Todd Alan Pitts, Ph.D.)
Date: Mon Nov 19 05:58:03 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
In-Reply-To: ; from oliphant.travis@ieee.org on Fri, Nov 16, 2001 at 07:43:41PM -0700
References: 
Message-ID: <20011119065758.B11653@fermi.accentopto.com>

Thanks for all of your work. Things seem to be shaping up nicely. I just
wanted to second some of the concerns below:

> Complex Types:
> ==============
>
> 1) I don't like the idea of complex types being a separate subclass of
> ndarray. This makes them "different." Unless this "difference" can be
> completely hidden (which I doubt), I would prefer complex types to be
> on the same level as other numeric types.
>
> 2) Also, in your C-API, you have a different pointer to the imaginary
> data. I much prefer the way it is done currently, to have complex
> numbers represented as an 8-byte or 16-byte chunk of contiguous memory.

The second comment above is really critical for accessing the utility
available in a very large number of numerical libraries. In my view this
would "break" the utility of numpy severely -- recopying arrays both on
the way out and the way in would be extremely cumbersome.

-Todd Alan Pitts


From jh at oobleck.astro.cornell.edu Mon Nov 19 08:47:02 2001
From: jh at oobleck.astro.cornell.edu (Joe Harrington)
Date: Mon Nov 19 08:47:02 2001
Subject: [Numpy-discussion] Re: Numpy-discussion digest, Vol 1 #345 - 4 msgs
In-Reply-To: (numpy-discussion-request@lists.sourceforge.net)
References: 
Message-ID: <200111191646.fAJGkCL28182@oobleck.astro.cornell.edu>

Just to fill in the blanks, here's what IDL does:

IDL> a = [[1,2,3,4], $
IDL>      [5,6,7,8], $
IDL>      [9,10,11,12], $
IDL>      [13,14,15,16]]
IDL> print,a
       1       2       3       4
       5       6       7       8
       9      10      11      12
      13      14      15      16
IDL> print, a[[1,3],[0,3]]
       2      16

--jh--


From jsw at cdc.noaa.gov Mon Nov 19 11:37:05 2001
From: jsw at cdc.noaa.gov (Jeff Whitaker)
Date: Mon Nov 19 11:37:05 2001
Subject: [Numpy-discussion] Heigenvalues is broken
In-Reply-To: 
Message-ID: 

On Sun, 18 Nov 2001, Travis Oliphant wrote:

> On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote:
> > Is it a problem of lapack 3.0 or of
> > LinearAlgebra.py?
> > (Eigenvalues should be (0,2))
> >
> > >>> a=array([[1,0],[0,1]])
> > >>> b=array([[0,1],[-1,0]])
> > >>> M=a+b*complex(0,1.0)
> > >>> Heigenvalues(M)
>
> I suspect it is your lapack. On an Athlon running Mandrake Linux with
> the lapack-3.0-9 package, I get:
>
> >>> a=array([[1,0],[0,1]])
> >>> b=array([[0,1],[-1,0]])
> >>> M=a+b*complex(0,1.0)
> >>> Heigenvalues(M)
> array([ 0.,  2.])

This is definitely a hardware/compiler dependent feature. I get the
"right" answer on Solaris (with the Forte compiler) but the same "wrong"
answer as Alessandro on MacOS X/gcc.
I've tried fiddling with compiler options on my OS X box, to no avail.

-Jeff

--
Jeffrey S. Whitaker         Phone : (303)497-6313
Meteorologist               FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1         Email : jsw at cdc.noaa.gov
325 Broadway                Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328 Office: Skaggs Research Cntr 1D-124


From ransom at physics.mcgill.ca Mon Nov 19 11:47:02 2001
From: ransom at physics.mcgill.ca (Scott Ransom)
Date: Mon Nov 19 11:47:02 2001
Subject: [Numpy-discussion] Heigenvalues is broken
In-Reply-To: 
References: 
Message-ID: 

On November 19, 2001 02:36 pm, Jeff Whitaker wrote:
>
> This is definitely a hardware/compiler dependent feature. I get the
> "right" answer on Solaris (with the Forte compiler) but the same "wrong"
> answer as Alessandro on MacOS X/gcc. I've tried fiddling with compiler
> options on my OS X box, to no avail.

But seemingly it is even stranger than this. Here are my results from
Debian unstable using Lapack 3.0 on an Athlon system:

Python 2.1.1 (#1, Nov 11 2001, 18:19:24)
[GCC 2.95.4 20011006 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from LinearAlgebra import *
>>> a=array([[1,0],[0,1]])
>>> b=array([[0,1],[-1,0]])
>>> M=a+b*complex(0,1.0)
>>> Heigenvalues(M)
array([ 0.,  2.])

Scott

> On Sun, 18 Nov 2001, Travis Oliphant wrote:
> > On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote:
> > > Is it a problem of lapack 3.0 or of
> > > LinearAlgebra.py?
> > > (Eigenvalues should be (0,2))
> > >
> > > >>> a=array([[1,0],[0,1]])
> > > >>> b=array([[0,1],[-1,0]])
> > > >>> M=a+b*complex(0,1.0)
> > > >>> Heigenvalues(M)
> >
> > I suspect it is your lapack. On an Athlon running Mandrake Linux with
> > the lapack-3.0-9 package, I get:
> >
> > >>> a=array([[1,0],[0,1]])
> > >>> b=array([[0,1],[-1,0]])
> > >>> M=a+b*complex(0,1.0)
> > >>> Heigenvalues(M)
> >
> > array([ 0.,  2.])
>
> This is definitely a hardware/compiler dependent feature. I get the
> "right" answer on Solaris (with the Forte compiler) but the same "wrong"
> answer as Alessandro on MacOS X/gcc.
>
> -Jeff
>
> --
> Jeffrey S. Whitaker         Phone : (303)497-6313
> Meteorologist               FAX   : (303)497-6449
> NOAA/OAR/CDC R/CDC1         Email : jsw at cdc.noaa.gov
> 325 Broadway                Web   : www.cdc.noaa.gov/~jsw
> Boulder, CO, USA 80303-3328 Office : Skaggs Research Cntr 1D-124
>
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

--
Scott M. Ransom                   Address: McGill Univ. Physics Dept.
Phone: (514) 398-6492                      3600 University St., Rm 338
email: ransom at physics.mcgill.ca         Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989


From Barrett at stsci.edu Mon Nov 19 14:12:02 2001
From: Barrett at stsci.edu (Paul Barrett)
Date: Mon Nov 19 14:12:02 2001
Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
References: 
Message-ID: <3BF98336.9010500@STScI.Edu>

Perry Greenfield wrote:
>
> An early beta version is available on sourceforge as the
> package Numarray (http://sourceforge.net/projects/numpy/)
>
> Information on the goals, changes in user interface, open issues,
> and design can be found at http://aten.stsci.edu/numarray

6) Should array properties be accessible as public attributes instead
of through accessor methods?

We don't currently allow public array attributes to make the Python
code simpler and faster (otherwise we will be forced to use __setattr__
and such). This results in incompatibility with previous code that uses
such attributes.

I prefer the use of public attributes over accessor methods.

--
Paul Barrett, PhD          Space Telescope Science Institute
Phone: 410-338-4475        ESS/Science Software Group
FAX: 410-338-4767          Baltimore, MD 21218


From perry at stsci.edu Tue Nov 20 12:29:13 2001
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Nov 20 12:29:13 2001
Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download
In-Reply-To: 
Message-ID: 

> 6) Should array properties be accessible as public attributes
> instead of through accessor methods?
> >
> > We don't currently allow public array attributes to make
> > the Python code simpler and faster (otherwise we will
> > be forced to use __setattr__ and such). This results in
> > incompatibility with previous code that uses such attributes.
>
> I prefer the use of public attributes over accessor methods.
>
> --
> Paul Barrett, PhD          Space Telescope Science Institute

The issue of efficiency may not be a problem with Python 2.2 or later,
since it provides new mechanisms that avoid the need to use __setattr__
to solve this problem (e.g. __slots__, property, __get__, and __set__).
So it becomes more of an issue of which style people prefer, rather than
simplicity and speed of the code.

Perry


From chrishbarker at home.net Tue Nov 20 15:23:12 2001
From: chrishbarker at home.net (Chris Barker)
Date: Tue Nov 20 15:23:12 2001
Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download
References: 
Message-ID: <3BFAEA19.3153B495@home.net>

Perry Greenfield wrote:

> > One major comment that isn't directly addressed on the web page is the
> > ease of writing new functions, I suppose Ufuncs, although I don't
> > usually care if they work on anything other than Arrays. I hope the new
> > system will make it easier to write new ones.

> Absolutely. We will provide examples of how to write new ufuncs. It should
> be very simple in one sense (requiring few lines of code) if our code
> generator machinery is used (but context is important here, so this
> is why examples or a template is extremely important). But it isn't
> particularly hard to do without the code generator. And such ufuncs
> will handle *all* the generality of arrays including slices, non-aligned
> arrays, byteswapped arrays, and type conversion. I'd like to provide
> examples of writing ufuncs within a few weeks (along with examples
> of other kinds of functions using the C-API as well).

This sounds great! The code generating machinery sounds very promising,
and examples are, of course, key. I found digging through the NumPy
source to figure out how to do things very treacherous. Making writing
Ufuncs easy will encourage a lot more C Ufuncs to be written, which
should help performance.

> > Also, I can't help wondering if this could leverage more existing code.
> > The blitz++ package being used by Eric Jones in the SciPy.compiler
> > project looks very promising. It's probably too late, but I'm wondering
> > what the reasons are for re-inventing such a general purpose wheel.

> I'm not sure which "wheel" you are talking about :-)

The wheel I'm talking about is multi-dimensional array objects...

> We certainly
> aren't trying to replicate what Eric Jones has done with the
> SciPy.compiler approach (which is very interesting in its own right).

I know, I just think using an existing set of C++ classes for
multiple-type multidimensional arrays would make sense, although I
imagine it is too late now!

> If the issue is why we are redoing Numeric:

Actually, I think I had a pretty good idea why you were working on this.

> 1) it has to be rewritten to be acceptable to Guido before it can be
> part of the Standard Library.
> 2) to add new types (e.g. unsigned) and representations (e.g., non-aligned,
> byteswapped, odd strides, etc). Using memory mapped data requires some
> of these.
> 3) to make it more memory efficient with large arrays.
> 4) to make it more generally extensible I'm particularly excited about 1) and 4) > > As a whole I have found that I would like the transition from Python to > > Compiled languages to be smoother. The standard answer to Python > > performance is to profile, and then re-write the computationally intensive > > portions in C. This would be a whole lot easier if Python used datatypes > > that are easy to use from C/C++ as well as Python. I hope NumPy2 can > > move in this direction. > > > What do you see as missing in numarray in that sense? Aside from UInt32 > I'm not aware of any missing type that is available on all platforms. > There is the issue of Float128 and such. Adding these is not hard. > The real issue is how to deal with the platforms that don't support them. I used poor wording. When I wrote "datatypes", I meant data types in a much higher order sense. Perhaps structures or classes would be a better term. What I mean is that it should be easy to use and manipulate the same multidimensional arrays from both Python and C/C++. In the current Numeric, most folks generate a contiguous array, and then just use the array->data pointer to get what is essentially a C array. That's fine if you are using it in a traditional C way, with fixed dimension, one datatype, etc. What I'm imagining is having an object in C or C++ that could be easily used as a multidimensional array. I'm thinking C++ would probably be necessary, and probably templates as well, which is why blitz++ looked promising. Of course, blitz++ only compiles with a few up-to-date compilers, so you'd never get it into the standard library that way! This could also lead the way to being able to compile NumPy code.... > I think it is pretty easy to install since it uses distutils. I agree, but from the newsgroup, it is clear that a lot of folks are very reluctant to use something that is not part of the standard library. > > > We estimate > > > that numarray is probably another order of magnitude worse, > > > i.e., that 20K element arrays are at half the asymptotic > > > speed. How much should this be improved? > > A lot. I use arrays smaller than that most of the time! > > > What is good enough? As fast as current Numeric? As fast as current Numeric would be "good enough" for me. It would be a shame to go backwards in performance! > (IDL does much > better than that for example). My personal benchmark is MATLAB, which I imagine is similar to IDL in performance. > 10 element arrays will never be > close to C speed in any array based language embedded in an > interpreted environment. Well, sure, I'm not expecting that > 100, maybe, but will be very hard. > 1000 should be possible with some work. I suppose MATLAB has it easier, as all arrays are doubles, and (until recently anyway) all variables were arrays, and all arrays were 2-d. NumPy is a lot more flexible than that. Is it the type and size checking that takes the time? > Another approach is to try to cast many of the functions as being > able to broadcast over repeated small arrays. After all, if one > is only doing a computation on one small array, it seems unlikely > that the overhead of Python will be objectionable. Only if you > have many such arrays to repeat calculations on, should it be > a problem (or am I wrong about that). You are probably right about that. > If these repeated calculations > can be "assembled" into a higher dimensionality array (which > I understand isn't always possible) and operated on in that sense, > the efficiency issue can be dealt with.
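A minimal sketch of the "assembling" idea in the quoted paragraph, assuming the repeated calculation is a small 2-D coordinate transform (all names and shapes here are hypothetical):

from math import cos, sin
from Numeric import array, reshape, arange, matrixmultiply

# 1000 separate 2-element problems assembled into one (1000, 2) array;
# a single Numeric call then replaces a Python loop over 1000 small arrays.
pts = reshape(arange(2000.0), (1000, 2))      # 1000 points, 2 coordinates each
theta = 0.1
rot = array([[cos(theta), -sin(theta)],
             [sin(theta),  cos(theta)]])
transformed = matrixmultiply(pts, rot)        # all 1000 points at once
shifted = transformed + array([10.0, -5.0])   # broadcast over the leading axis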
I do that when possible, but it's not always possible. > But I guess this can only > be seen with specific existing examples and programs. I would > be interested in seeing the kinds of applications you have now > to gauge what the most effective solution would be. One of the things I do a lot with are coordinates of points and polygons. Sets of points I can handle easily as an NX2 array, but polygons don't work so well, as each polygon has a different number of points, so I use a list of arrays, which I have to loop over. Each polygon can have from about 10 to thousands of points (mostly 10-20, however). One way I have dealt with this is to store a polygon set as a large array of all the points, and another array with the indexes of the start and end of each polygon. That way I can transform the coordinates of all the polygons in one operation. It works OK, but sometimes it is more useful to have them in a sequence. > As mentioned, > we tend to deal with large data sets and so I don't think we have > a lot of such examples ourselves. I know large datasets were one of your driving factors, but I really don't want to make performance on smaller datasets secondary. I hope I'll get a chance to play with it soon.... -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------
From nwagner at mecha.uni-stuttgart.de Thu Nov 22 02:43:06 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Thu Nov 22 02:43:06 2001 Subject: [Numpy-discussion] Numpy for FORTRAN users Message-ID: <3BFCE508.E6C365DF@mecha.uni-stuttgart.de> Hi, Currently users must be aware of the fact that multi-dimensional arrays are stored differently in Python and Fortran. Is there any progress so that users do not need to worry about this rather confusing and technical detail? Nils
From martin.wiechert at gmx.de Thu Nov 22 05:23:02 2001 From: martin.wiechert at gmx.de (Martin Wiechert) Date: Thu Nov 22 05:23:02 2001 Subject: [Numpy-discussion] Numpy2 and GSL Message-ID: Hi! Just an uneducated question. Are there any plans to wrap GSL for Numpy2? I did not actually try it (It's not Python ;-)), but it looks clean and powerful. Regards, Martin.
From hinsen at cnrs-orleans.fr Thu Nov 22 06:29:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Thu Nov 22 06:29:02 2001 Subject: [Numpy-discussion] Numpy2 and GSL In-Reply-To: References: Message-ID: Martin Wiechert writes: > Are there any plans to wrap GSL for Numpy2? > I did not actually try it (It's not Python ;-)), > but it looks clean and powerful. I have heard that several projects decided not to use it for legal reasons; GSL is GPL, not LGPL. Personally I don't see the problem for Python/NumPy, but then I am not a lawyer... And I haven't used GSL either, but it looks good from the description. Konrad.
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From edcjones at erols.com Thu Nov 22 17:30:10 2001 From: edcjones at erols.com (Edward C. Jones) Date: Thu Nov 22 17:30:10 2001 Subject: [Numpy-discussion] Numeric & changes in Python division Message-ID: <3BFDA742.5080109@erols.com> # Python 2.2b1, Numeric 20.2.0 from __future__ import division import Numeric arr = Numeric.ones((2,2), 'f') arr = arr/2.0 #Traceback (most recent call last): # File "bug.py", line 6, in ? #arr = arr/2.0 #TypeError: unsupported operand type(s) for / From paul at pfdubois.com Thu Nov 22 18:51:01 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Thu Nov 22 18:51:01 2001 Subject: [Numpy-discussion] Numeric & changes in Python division In-Reply-To: <3BFDA742.5080109@erols.com> Message-ID: <000201c173c9$606902c0$3d01a8c0@plstn1.sfba.home.com> You know what the doctor said: if it hurts when you do that, don't do that. Seriously, I have not the slightest idea what you're doing here. My project won't get to 2.2 until well into the new year. Especially if stuff like this has to be fixed. I haven't even read most of the 2.2 changes. I understand this is also an issue with CXX. Barry Scott runs CXX now since I am no longer in a job where I use C++. When he will get to this I don't know. I need to demote myself on the CXX website. You haven't seen any recent changes to Numpy, or comments from me on numarray, because I have a release to get out at my job. -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Edward C. Jones Sent: Thursday, November 22, 2001 5:33 PM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] Numeric & changes in Python division # Python 2.2b1, Numeric 20.2.0 from __future__ import division import Numeric arr = Numeric.ones((2,2), 'f') arr = arr/2.0 #Traceback (most recent call last): # File "bug.py", line 6, in ? #arr = arr/2.0 #TypeError: unsupported operand type(s) for / _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From siopis at umich.edu Fri Nov 23 20:59:01 2001 From: siopis at umich.edu (Christos Siopis ) Date: Fri Nov 23 20:59:01 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? In-Reply-To: Message-ID: [ This message got longer than i had initially thought, but these thoughts have been bugging me for so long that i cannot resist the temptation to push the send button! Apologies in advance to those not interested... ] On Mon, 26 Nov 2001, Martin Wiechert wrote: > Hi! > > Just an uneducated question. > Are there any plans to wrap GSL for Numpy2? > I did not actually try it (It's not Python ;-)), > but it looks clean and powerful. > > Regards, > Martin. I actually think that this question has come up before in this list, perhaps more than once. And i think it brings up a bigger issue, which is: to what extent is it useful for the numerical community to have multiple numerical libraries, and to what extent does this constitute a waste of resources? 
Numpy (Python), PDL (Perl), GSL (C), and a rather large number of other libraries usually have to re-implement the same old numerical algorithms, but offered under a different interface each time. However, there is such a big body of numerical algorithms out there that it's a daunting effort to integrate them into every language's numerical library (anyone want to implement LAPACK's functionality in Numpy?) The compromise that is usually made is to wrap one library around another. While this may be "better than nothing", it is usually not a pleasant situation as it leads to inconsistencies in the interface, inconsistencies in the error handling, difficulties in the installation, problems with licensing,... Since i have been a beneficiary rather than a contributor to the numerical open-source community, i feel somewhat hesitant to file this "complaint", but i really do think that there are relatively few people out there who are both willing and capable of building quality open-source numerical software, while there are too many algorithms to implement, so the community should be vigilant to minimize waste of resources! Don't take me wrong, i am not saying that Numpy, PDL, GSL & co. should be somehow "merged" --obviously, one needs different wrappers to call numerical routines from Python, Perl, C, C++ or Java. But there should be a way so that the actual *implementation* of the numerical algorithms is only done once and for all. So what i envision, in some sense, is a super-library of "all"/as many as possible numerical algorithms, which will present appropriate (but consistent) APIs for different programming languages, so that no matter what language i use, i can expect consistent interface, consistent numerical behavior, consistent error handling etc. Furthermore, different levels of access should allow the application developer to access low-level or high-level routines as needed (and could object orientation be efficiently built as a higher-level wrapper?) This way, the programmer won't have to worry whether the secant root finder that s/he is using handles exceptions well or how NaNs are treated. Perhaps most importantly, people would feel compelled to go into the pain of "translating" existing libraries such as LAPACK into this common framework, because they will know that this will benefit the entire community and won't go away when the next scripting language du jour eclipses their current favorite. Over time, this may lead to a truly precious resource for the numerical community. Now, i do realize that this may sound like a "holy grail" of numerical computing, that it is something which is very difficult, if not impossible to accomplish. It certainly does not seem like a project that the next ambitious programmer or lab group would want to embark into on a rainy day. Rather, it would require a number of important requirements and architectural decisions to be made first, and trade-offs considered. This would perhaps be best coordinated by the numerical community at large, perhaps under the auspices of some organization. But this would be time well-spent, for it would form the foundations on which a truly universal numerical library could be built. Experience gained from all the numerical projects to this day would obviously be invaluable in such an endeavor. 
I suspect that this list may not be the best place to discuss such a topic, but i think that some of the most active people in the field lurk here, and i would love to hear their thoughts and understand why i am wrong :) If there is a more appropriate forum to discuss such issues, i would be glad to be pointed to it --in which case, please disregard this posting! *************************************************************** / Christos Siopis | Tel : 734-764-3440 \ / Postdoctoral Research Fellow | \ / Department of Astronomy | FAX : 734-763-6317 \ / University of Michigan | \ / Ann Arbor, MI 48109-1090 | E-mail : siopis at umich.edu \ / U.S.A. _____________________| \ / / http://www.astro.lsa.umich.edu/People/siopis.html \ ***************************************************************
From jh at oobleck.astro.cornell.edu Sat Nov 24 19:14:02 2001 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Sat Nov 24 19:14:02 2001 Subject: [Numpy-discussion] Re: Meta: too many numerical libraries doing the same thing? In-Reply-To: (numpy-discussion-request@lists.sourceforge.net) References: Message-ID: <200111250313.fAP3DUL21168@oobleck.astro.cornell.edu> Yes, this issue has been raised here before. It was the main conclusion of Paul Barrett's and my BOF session at ADASS 5 years ago (see our report at http://oobleck.astro.cornell.edu/jh/ast/papers/idae96.ps). The main problems are that we scientists are too individualistic to get organized around a single library, too pushed by job pressures to commit much concentrated time to it ourselves, and too poor to pay the architects, coders, doc writers, testers, etc. to write it for us. Socially, we *want* to reinvent the wheel, because we want to be riding on our own wheels. Once we are riding reasonably well for our own needs, our interest and commitment vanishes. We're off to write the next paper. Following that conference, I took a poll on this list looking for help to implement the library. About half a dozen people responded that they could put in up to 10 hours a week, which in my experience isn't enough, once things get hard and attrition sets in. Nonetheless, Paul and I proposed to the NASA Astrophysics Data Analysis Program to hire some people to write it, but we were turned down. We proposed the idea to the head of the High Energy Astrophysics group at NASA Goddard, and he agreed -- as long as what we were really doing was writing software for his group's special needs. The frustrating thing is how many hundreds of astronomy projects hire people to do their 10% of this problem, and how unwilling they are to pool resources to do the general problem. A few of the volunteers in my query to this list have gone on to do SciPy, to their credit, but I don't see them moving in the direction we outlined. Still, they have the capacity to do it right in Python and compiled code written explicitly for Python. They won't solve the general problem, but they may solve the first problem, namely getting a data analysis environment that is OSS and as good as IDL et al. in terms of end-to-end functionality, completeness, and documentation. I like the notion that the present list is for designing and building the underlying language capabilities into Python, and for getting them standardized, tested, and included in the main Python distribution. It is also a good place for debating the merits of different implementations of particular functionality.
That leaves the job of building coherent end-user data analysis packages (which necessarily have to pick one routine to be called "fft", one device-independent graphics subsystem, etc.) to application groups like SciPy. There can be more than one of these, if that's necessary, but they should all use the same underlying numerical language capability. I hope that the application groups from several array-based OSS languages will someday get together and collaborate on an ueberlibrary of numerical and graphics routines (the latter being the real sticking point) that are easily wrapped by most languages. That seems backwards, but I think the social reality is that that's the way it is going to be, if it ever happens at all. --jh--
From paul at pfdubois.com Sat Nov 24 19:59:01 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Sat Nov 24 19:59:01 2001 Subject: [Numpy-discussion] Re: Meta: too many numerical libraries doing the same thing? In-Reply-To: <200111250313.fAP3DUL21168@oobleck.astro.cornell.edu> Message-ID: <000101c17565$12af2760$3d01a8c0@plstn1.sfba.home.com> There is more to this issue than meets the eye, both technically and historically. For numerical algorithms to be available independent of language, they would have to be packaged as components such as COM objects. While there is research in this field, nobody knows whether it can be done in a way that is efficient enough. For a given language like C, C++, Eiffel or Fortran used as the speed-demon base for wrapping up in Python, there are some difficult technical issues. Reusable numerical software needs context to operate and there is no decent way to supply the context in a non-object-oriented language. Geoff Furnish wrote a good paper about the issue for C++ showing the way to truly reusable libraries in that language, and recent improvements in Eiffel make it easier to do there now. In C or Fortran you simply can't do it. (Note that Eiffel or C++ versions of some NAG routines typically have methods with one or two arguments while the C or Fortran ones have 15 or more; a routine is not reusable if you have to understand that many arguments to try it. There are also important issues with regard to error handling and memory). The second issue is the algorithmic issue: most scientists do NOT know the right algorithms to use, and the ones they do use are often inferior. The good algorithms are for the most part in commercial libraries, and the numerical analysis literature, where they were written by numerical analysts. Often the coding from both sources is unavailable for free use, in the wrong language, and/or wretched. The commercial libraries also exist because some companies have requirements for fiduciary responsibility; in effect, they need a guarantor of the software to show that they have not carelessly depended on software of unknown quality. In short, computer scientists are not going to be able to write such a library without an army of numerical analysts familiar with the literature, and the numerical analysts aren't going to write it unless they are OO-experienced, which almost all of them aren't, so far. Most people when they discuss mathematical software think of leaves on the call tree. In fact the most useful mathematical software, in the sense that it incorporates the most expertise, is middleware such as ODE solvers, integrators, root finders, etc. The algorithm itself will have many controls, optional outputs, etc. This requires a library-wide design motif.
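The "context" point can be made concrete with a toy sketch (entirely hypothetical code, not from NAG or any real library): an object-oriented solver carries its controls as state, so each call stays small, where a Fortran-style interface would thread every control, workspace and flag through every call.

class RootFinder:
    def __init__(self, tol=1.0e-10, max_iter=100):
        self.tol = tol                # the "context" lives on the object...
        self.max_iter = max_iter
    def solve(self, f, a, b):         # ...so the call needs 3 arguments, not 15
        # plain bisection; assumes f(a) and f(b) bracket a root
        for i in range(self.max_iter):
            m = 0.5 * (a + b)
            if abs(f(m)) < self.tol or (b - a) < self.tol:
                return m
            if f(a) * f(m) <= 0:
                b = m
            else:
                a = m
        raise RuntimeError("no convergence")

finder = RootFinder(tol=1.0e-12)
root = finder.solve(lambda x: x*x - 2.0, 0.0, 2.0)   # ~1.41421356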
I thus feel there are perfectly good reasons not to expect such a library soon. The Python community could do a good OO-design using what is available (such as LAPACK) but we haven't -- all the contributions are functional.
From hinsen at cnrs-orleans.fr Sun Nov 25 04:45:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Sun Nov 25 04:45:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? Message-ID: <200111251244.fAPCiIj01855@localhost.localdomain> "Christos Siopis " writes: > Don't take me wrong, i am not saying that Numpy, PDL, GSL & co. should be > somehow "merged" --obviously, one needs different wrappers to call > numerical routines from Python, Perl, C, C++ or Java. But there should be > a way so that the actual *implementation* of the numerical algorithms is > only done once and for all. I agree that sounds nice in theory. But even if it were technically feasible (which I doubt) given the language differences, it would be a development project that is simply too big for scientists to handle as a side job, even if they were willing (which again I doubt). My impression is that the organizational aspects of software development are often neglected. Some people are good programmers but can't work well in teams. Others can work in teams, but are not good coordinators. A big project requires at least one, if not several, people who are good scientists and programmers, have coordinator skills, and a job description that permits them to take up the task. Plus a larger number of people who are good scientists and programmers and can work in teams. Finally, all of these have to agree on languages, design principles, etc. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
From tim.hochberg at ieee.org Sun Nov 25 10:50:02 2001 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Sun Nov 25 10:50:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download References: <3BF98336.9010500@STScI.Edu> Message-ID: <01fd01c175e1$e6ae7990$87740918@cx781526b> From: "Paul Barrett" > Perry Greenfield wrote: > > > > An early beta version is available on sourceforge as the > > package Numarray (http://sourceforge.net/projects/numpy/) > > > > Information on the goals, changes in user interface, open issues, > > and design can be found at http://aten.stsci.edu/numarray > > > 6) Should array properties be accessible as public attributes > instead of through accessor methods? > > We don't currently allow public array attributes to make > the Python code simpler and faster (otherwise we will > be forced to use __setattr__ and such). This results in > incompatibility with previous code that uses such attributes. > > > I prefer the use of public attributes over accessor methods. As do I. As of Python 2.2, __getattr__/__setattr__ should not be required anyway: new style classes allow this to be done in a more pleasant way. (I'm still too fuzzy on the details to describe it coherently here though).
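A minimal sketch of the Python 2.2 mechanism being alluded to (hypothetical names): property exposes an array attribute with plain attribute syntax while still running accessor code, and without a global __setattr__ hook.

class Array(object):                  # new-style class required for property
    def __init__(self, shape):
        self._shape = tuple(shape)
    def _get_shape(self):
        return self._shape
    def _set_shape(self, value):
        self._shape = tuple(value)    # validation could go here
    shape = property(_get_shape, _set_shape)

a = Array((3, 4))
print a.shape        # attribute syntax, accessor semantics
a.shape = (4, 3)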
-tim
From nwagner at mecha.uni-stuttgart.de Mon Nov 26 01:55:03 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Mon Nov 26 01:55:03 2001 Subject: [Numpy-discussion] Sort , Complex array Message-ID: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> Hi, How can I sort an array of complex eigenvalues with respect to the imaginary part (in ascending order) in Numpy? All eigenvalues appear in complex conjugate pairs. Nils
From hinsen at cnrs-orleans.fr Mon Nov 26 02:46:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 02:46:02 2001 Subject: [Numpy-discussion] Sort , Complex array In-Reply-To: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> References: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> Message-ID: Nils Wagner writes: > How can I sort an array of complex eigenvalues with respect to the > imaginary part > (in ascending order) in Numpy? > All eigenvalues appear in complex conjugate pairs. indices = argsort(eigenvalues.imag) eigenvalues = take(eigenvalues, indices) Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
From gvermeul at labs.polycnrs-gre.fr Mon Nov 26 02:48:02 2001 From: gvermeul at labs.polycnrs-gre.fr (Gerard Vermeulen) Date: Mon Nov 26 02:48:02 2001 Subject: [Numpy-discussion] Sort , Complex array In-Reply-To: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> References: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> Message-ID: <01112611475600.19933@taco.polycnrs-gre.fr> On Monday 26 November 2001 11:53, Nils Wagner wrote: > Hi, > > How can I sort an array of complex eigenvalues with respect to the > imaginary part > (in ascending order) in Numpy? > All eigenvalues appear in complex conjugate pairs. > > Nils > I have solved that like this: >>> from Numeric import * >>> a = array([3+3j, 1+1j, 2+2j]) >>> b = a.imag >>> print take(a, argsort(b)) [ 1.+1.j 2.+2.j 3.+3.j] >>> Best regards -- Gerard
From nwagner at mecha.uni-stuttgart.de Mon Nov 26 07:03:06 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Mon Nov 26 07:03:06 2001 Subject: [Numpy-discussion] Augmented matrix Message-ID: <3C026834.E56CE70@mecha.uni-stuttgart.de> Hi, How can I build an augmented matrix [A,b] in Numpy, where A is an m * n matrix (m>n) and b is an m*1 vector? Nils
From hinsen at cnrs-orleans.fr Mon Nov 26 08:34:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 08:34:02 2001 Subject: [Numpy-discussion] Augmented matrix In-Reply-To: <3C026834.E56CE70@mecha.uni-stuttgart.de> References: <3C026834.E56CE70@mecha.uni-stuttgart.de> Message-ID: Nils Wagner writes: > How can I build an augmented matrix [A,b] in Numpy, > where A is an m * n matrix (m>n) and b is an m*1 vector? AB = concatenate((A, b[:, NewAxis]), -1) (assuming b is of rank 1) Konrad.
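An illustrative session for the recipe above, with arbitrarily chosen shapes (m=4, n=2):

>>> from Numeric import *
>>> A = reshape(arange(8.0), (4, 2))     # m=4, n=2
>>> b = array([1., 2., 3., 4.])          # rank-1, length m
>>> AB = concatenate((A, b[:, NewAxis]), -1)
>>> AB.shape
(4, 3)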
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
From chrishbarker at home.net Mon Nov 26 10:30:02 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 10:30:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> Message-ID: <3C028E87.82C57211@home.net> Another factor that complicates things is open source philosophy and the licenses that go with it. The GSL project looks very promising, and the ultimate goals of that project appear to be to create a coherent and complete numerical library. This kind of thing NEEDS to be open source, and the GSL folks have chosen a license (GPL) that guarantees that it remains that way. That is a good thing. The license also makes it impossible to use the library in closed source projects, which is a deal killer for a lot of people, but it is also an important attribute for many folks that don't think there should be closed source projects at all. I believe that that will greatly stifle the potential of the project, but it fits with the philosophy of its creators. Personally I think the LGPL would have guaranteed the future openness of the source, and allowed a much greater user (and therefore contributor) base. BTW, IANAL either, but my reading of the GPL and Python's "GPL compatible" license is that GSL could be used with Python, but the result would have to be released under the GPL. That means it could not be embedded in a closed source project. As a rule, Python itself and most of the libraries I have seen for it (Numeric, wxPython, etc.) are released under licences that allow proprietary use, so we probably don't want to make Numeric or SciPy GPL. Too bad. On another note, it looks like the blitz++ library might be a good basis for a general Numerical library (and NumPy 3) as well. It does come with a flexible license. Any thoughts? -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------
From hinsen at cnrs-orleans.fr Mon Nov 26 11:40:03 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 11:40:03 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> <3C028E87.82C57211@home.net> Message-ID: <200111261938.fAQJcmd01426@localhost.localdomain> Chris Barker writes: > On another note, it looks like the blitz++ library might be a good basis > for a general Numerical library (and NumPy 3) as well. It does come > with a flexible license. Any thoughts? I think the major question is whether we are willing to move to C++. And if we want to keep up any pretensions for Numeric becoming part of the Python core, this translates into whether Guido will accept C++ code in the Python core.
From a more pragmatic point of view, I wonder what the implications for efficiency would be. C++ compilers used to be very different in their optimization abilities, is that still the case? Even more pragmatically, is blitz++ reasonably efficient with g++? Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
From chrishbarker at home.net Mon Nov 26 12:43:02 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 12:43:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> <3C028E87.82C57211@home.net> <200111261938.fAQJcmd01426@localhost.localdomain> Message-ID: <3C02ADB3.E314B8FB@home.net> Konrad Hinsen wrote: > Chris Barker writes: > > On another note, it looks like the blitz++ library might be a good basis > > for a general Numerical library (and NumPy 3) as well. It does come > > with a flexible license. Any thoughts? > I think the major question is whether we are willing to move to C++. > And if we want to keep up any pretensions for Numeric becoming part of > the Python core, this translates into whether Guido will accept C++ > code in the Python core. Actually, it's worse than that. Blitz++ makes heavy use of templates, and thus only works with compilers that support that well. The current Python core can compile under a very wide variety of compilers. I doubt that Guido would want to change that. Personally, I'm torn. I would very much like to see NumPy arrays become part of the core Python, but don't want to have to compromise what it could be to do that. Another idea is to extend the SciPy project to become a complete Python distribution, that would clearly include Numeric. One download, and you have all you need. > From a more pragmatic point of view, I wonder what the implications > for efficiency would be. C++ compilers used to be very different in their > optimization abilities, is that still the case? Even more > pragmatically, is blitz++ reasonably efficient with g++? I know g++ is supported (and I think it is their primary development platform). From the web site: Is there a way to soup up C++ so that we can keep the advanced language features but ditch the poor performance? This is the goal of the Blitz++ project: to develop techniques which will enable C++ to rival -- and in some cases even exceed -- the speed of Fortran for numerical computing, while preserving an object-oriented interface. The Blitz++ Numerical Library is being constructed as a testbed for these techniques. Recent benchmarks show C++ encroaching steadily on Fortran's high-performance monopoly, and for some benchmarks, C++ is even faster than Fortran! These results are being obtained not through better optimizing compilers, preprocessors, or language extensions, but through the use of template techniques. By using templates cleverly, optimizations such as loop fusion, unrolling, tiling, and algorithm specialization can be performed automatically at compile time. see: http://www.oonumerics.org/blitz/whatis.html for more info. I haven't messed with it myself, but from the web page, it seems the answer is yes, C++ can produce high performance code.
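The loop-fusion point is easy to see from the Python side (sizes hypothetical): every binary operation in Numeric allocates a full temporary, which is exactly what Blitz++-style expression templates fuse away at compile time. Numeric's optional ufunc output argument at least lets the storage be reused by hand:

from Numeric import zeros, add, Float

n = 100000
y = zeros(n, Float) + 1.0
z = zeros(n, Float) + 2.0
w = zeros(n, Float) + 3.0

x = y + z + w        # allocates a (y+z) temporary, then the result array

out = zeros(n, Float)
add(y, z, out)       # out = y + z, no extra temporary
add(out, w, out)     # accumulate in place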
-- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------
From hinsen at cnrs-orleans.fr Mon Nov 26 12:52:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 12:52:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? In-Reply-To: <000301c176b7$26da0680$3d01a8c0@plstn1.sfba.home.com> (paul@pfdubois.com) References: <000301c176b7$26da0680$3d01a8c0@plstn1.sfba.home.com> Message-ID: <200111262050.fAQKoxB01580@localhost.localdomain> > We had some meetings to discuss using blitz and the truth is that as > wrapped by Python there is not much to gain. The efficiency of blitz > comes up when you do an array expression in C++. Then x = y + z + w + a > + b gets compiled into one loop with no temporary objects created. But That could still be of interest to extension module writers. And it seems conceivable to write some limited Python-C compiler for numerical expressions that generates extension modules, although this is more than a weekend project. Still, I agree that what most people care about is the speed of NumPy operations. Some lazy evaluation scheme might be more promising to eliminate the creation of intermediate objects, but that isn't exactly trivial either... Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
From perry at stsci.edu Mon Nov 26 12:59:03 2001 From: perry at stsci.edu (Perry Greenfield) Date: Mon Nov 26 12:59:03 2001 Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: Message-ID: > From: Chris Barker > To: Perry Greenfield , > numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] Re: Re-implementation of Python > Numerical arrays (Numeric) available > for download > > I used poor wording. When I wrote "datatypes", I meant data types in a > much higher order sense. Perhaps structures or classes would be a better > term. What I mean is that it should be easy to use and manipulate the > same multidimensional arrays from both Python and C/C++. In the current > Numeric, most folks generate a contiguous array, and then just use the > array->data pointer to get what is essentially a C array. That's fine if > you are using it in a traditional C way, with fixed dimension, one > datatype, etc. What I'm imagining is having an object in C or C++ that > could be easily used as a multidimensional array. I'm thinking C++ would > probably be necessary, and probably templates as well, which is why blitz++ > looked promising. Of course, blitz++ only compiles with a few up-to-date > compilers, so you'd never get it into the standard library that way! > Yes, that was an important issue (C++ and the Python Standard Library). And yes, it is not terribly convenient to access multi-dimensional arrays in C (of varying sizes).
We don't solve that problem in the way a C++ library could. But I suppose that some might say that C++ libraries may introduce their own, new problems. But coming up with the one solution to all scientific computing appears well beyond our grasp at the moment. If someone does see that solution, let us know! > I agree, but from the newsgroup, it is clear that a lot of folks are > very reluctant to use something that is not part of the standard > library. > We agree that getting into the standard library is important. > > > > We estimate > > > > that numarray is probably another order of magnitude worse, > > > > i.e., that 20K element arrays are at half the asymptotic > > > > speed. How much should this be improved? > > > > > > A lot. I use arrays smaller than that most of the time! > > > > > What is good enough? As fast as current Numeric? > > As fast as current Numeric would be "good enough" for me. It would be a > shame to go backwards in performance! > > > (IDL does much > > better than that for example). > > My personal benchmark is MATLAB, which I imagine is similar to IDL in > performance. > We'll see if we can match current performance (or at least present usable alternative approaches that are faster). > > 10 element arrays will never be > > close to C speed in any array based language embedded in an > > interpreted environment. > > Well, sure, I'm not expecting that > Good :-) > > 100, maybe, but will be very hard. > > 1000 should be possible with some work. > > I suppose MATLAB has it easier, as all arrays are doubles, and (until > recently anyway) all variables were arrays, and all arrays were 2-d. > NumPy is a lot more flexible than that. Is it the type and size checking > that takes the time? > Probably, but we haven't started serious benchmarking yet so I wouldn't put much stock in what I say now. > One of the things I do a lot with are coordinates of points and > polygons. Sets of points I can handle easily as an NX2 array, but > polygons don't work so well, as each polygon has a different number of > points, so I use a list of arrays, which I have to loop over. Each > polygon can have from about 10 to thousands of points (mostly 10-20, > however). One way I have dealt with this is to store a polygon set as a > large array of all the points, and another array with the indexes of the > start and end of each polygon. That way I can transform the coordinates > of all the polygons in one operation. It works OK, but sometimes it is > more useful to have them in a sequence. > This is a good example of an ensemble of variable sized arrays. > > As mentioned, > > we tend to deal with large data sets and so I don't think we have > > a lot of such examples ourselves. > > I know large datasets were one of your driving factors, but I really > don't want to make performance on smaller datasets secondary. > > -- > Christopher Barker, That's why we are asking, and it seems so far that there are enough of those that do care about small arrays to spend the effort to significantly improve the performance. Perry
From chrishbarker at home.net Mon Nov 26 13:03:02 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 13:03:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <000301c176b7$26da0680$3d01a8c0@plstn1.sfba.home.com> Message-ID: <3C02B298.E1F0E661@home.net> "Paul F. Dubois" wrote: > We had some meetings to discuss using blitz and the truth is that as > wrapped by Python there is not much to gain.
The efficiency of blitz > comes up when you do an array expression in C++. Then x = y + z + w + a > + b gets compiled into one loop with no temporary objects created. But > this trick is possible because you can bind the assignment. In python > you cannot bind the assignment so you cannot do a lazy evaluation of the > operations, unless you are willing to go with some sort of function call > like x = evaluate(y + z + w). Immediate evaluation means creating > temporaries, and performance is dead. > > The only gain then would be when you passed a Python-wrapped blitz array > back to C++ and did a bunch of operations there. Personally, I think this could be a big gain. At the moment, if you don't get the performance you need with NumPy, you have to write some of your code in C, and using the Numeric and Python C API is a whole lot of work, particularly if you want your function to work on non-contiguous arrays and/or arrays of any type. I don't know much C++, and I have no idea if Blitz++ fits this bill, but it seemed to me that using an object oriented framework that could take care of reference counting, and allow you to work with generic arrays, and index them naturally, etc, would be a great improvement, even if the performance was the same as the current C API. Perhaps NumPy2 has accomplished that, it sounds like it is a step in the right direction, at least. In a sentence: the most important reason for using a C++ object oriented multi-dimensional array package would be ease of use, not speed. It's nice to hear Blitz++ was considered, it was probably rejected for good reason, but it just looked very promising to me. -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------
From oliphant at ee.byu.edu Mon Nov 26 13:24:11 2001 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Nov 26 13:24:11 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? In-Reply-To: <3C02B298.E1F0E661@home.net> Message-ID: > In a sentence: the most important reason for using a C++ object oriented > multi-dimensional array package would be ease of use, not speed. > > It's nice to hear Blitz++ was considered, it was probably rejected for > good reason, but it just looked very promising to me. I believe that Eric's "compiler" module included in SciPy uses Blitz++ to optimize Numeric expressions. You have others who also share your admiration of Blitz++ -Travis
From chrishbarker at home.net Mon Nov 26 15:31:05 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 15:31:05 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: Message-ID: <3C02D510.E7454CCA@home.net> Travis Oliphant wrote: > I believe that Eric's "compiler" module included in SciPy uses Blitz++ to > optimize Numeric expressions. You have others who also share your > admiration of Blitz++ Yes, it does. That's where I heard about it. That also brings up a good point. Paul mentioned that using something like Blitz++ would only help performance if you could pass it an entire expression, like: x = a+b+c+d.
That is exactly what Eric's compiler module does, and it would sure be easier if NumPy already used Blitz++! In fact, I suppose Eric's compiler is a start towards a tool that could compile an entire NumPy function or module. I'd love to be able to just do that (with some tweaking perhaps) rather than having to code it all by hand. My fantasies continue... -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------
From jochen at jochen-kuepper.de Mon Nov 26 16:34:01 2001 From: jochen at jochen-kuepper.de (Jochen =?iso-8859-1?q?K=FCpper?=) Date: Mon Nov 26 16:34:01 2001 Subject: [Numpy-discussion] Re: Numpy2 and GSL In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 26 Nov 2001 08:21:40 +0100 Martin Wiechert wrote: Martin> Are there any plans to wrap GSL for Numpy2? Martin> I did not actually try it (It's not Python ;-)), Martin> but it looks clean and powerful. There is actually a project to wrap gsl for python: http://pygsl.sourceforge.net/ It only provides wrappers for the special functions, but more is to come. (Hopefully Achim will put the cvs on sf soon.) Yes, I agree, PyGSL should be fully integrated with Numpy2, but it should probably also remain a separate project -- as Numpy should stay a base layer for all kinds of numerical stuff and hopefully make it into core python at some point (my personal wish, no more, AFAICT!). I think when PyGSL will fully go to SF (or anything similar) more people would start contributing and we should have a fine general numerical algorithms library for python soon! Greetings, Jochen - -- Einigkeit und Recht und Freiheit http://www.Jochen-Kuepper.de Liberté, Égalité, Fraternité GnuPG key: 44BCCD8E Sex, drugs and rock-n-roll -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: Processed by Mailcrypt and GnuPG iD8DBQE8At88iJ/aUUS8zY4RAikdAJ9184yaCSH+GtkDz2mLVlrSh7mjEQCdGSqA 2uhmBKRCFBb9eeq3gmmn9/Q= =64gm -----END PGP SIGNATURE-----
From europax at home.com Mon Nov 26 17:36:16 2001 From: europax at home.com (Rob) Date: Mon Nov 26 17:36:16 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> <3C028E87.82C57211@home.net> <200111261938.fAQJcmd01426@localhost.localdomain> <3C02ADB3.E314B8FB@home.net> Message-ID: <3C02ED76.F02F17D8@home.com> I'm currently testing the SciPy Blitz++ features with FDTD. Should have some comparisons soon. Right now my statements are compiling, but not giving the right answers :( I think they might have it fixed soon. Rob. Chris Barker wrote: > > Konrad Hinsen wrote: > > Chris Barker writes: > > > On another note, it looks like the blitz++ library might be a good basis > > > for a general Numerical library (and NumPy 3) as well. It does come > > > with a flexible license. Any thoughts? > > > I think the major question is whether we are willing to move to C++. > > And if we want to keep up any pretensions for Numeric becoming part of > > the Python core, this translates into whether Guido will accept C++ > > code in the Python core. > > Actually, it's worse than that.
Blitz++ makes heavy use of templates, > and thus only works with compilers that support that well. The current > Python core can compile under a very wide variety of compilers. I doubt > that Guido would want to change that. > > Personally, I'm torn. I would very much like to see NumPy arrays become > part of the core Python, but don't want to have to compromise what it > could be to do that. Another idea is to extend the SciPy project to > become a complete Python distribution, that would clearly include > Numeric. One download, and you have all you need. > > > From a more pragmatic point of view, I wonder what the implications > > for efficiency would be. C++ compilers used to be very different in their > > optimization abilities, is that still the case? Even more > > pragmatically, is blitz++ reasonably efficient with g++? > > I know g++ is supported (and I think it is their primary development > platform). From the web site: > > Is there a way to soup up C++ so that we can keep the advanced language > features but ditch the poor performance? This is the goal of the > Blitz++ project: to develop techniques which will enable C++ to rival -- > and in some cases even exceed -- the speed of Fortran for numerical > computing, while preserving an object-oriented interface. The Blitz++ > Numerical Library is being constructed as a testbed for these > techniques. > > Recent benchmarks show C++ encroaching steadily on Fortran's > high-performance monopoly, and for some benchmarks, C++ is even faster > than Fortran! These results are being obtained not through better > optimizing compilers, preprocessors, or language extensions, but through > the > use of template techniques. By using templates cleverly, optimizations > such as loop fusion, unrolling, tiling, and algorithm specialization can > be > performed automatically at compile time. > > see: http://www.oonumerics.org/blitz/whatis.html for more info. > > I haven't messed with it myself, but from the web page, it seems the > answer is yes, C++ can produce high performance code. > > -- > Christopher Barker, > Ph.D. > ChrisHBarker at home.net --- --- --- > http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ > ------@@@ ------@@@ ------@@@ > Oil Spill Modeling ------ @ ------ @ ------ @ > Water Resources Engineering ------- --------- -------- > Coastal and Fluvial Hydrodynamics -------------------------------------- > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- The Numeric Python EM Project www.members.home.net/europax
From Achim.Gaedke at uni-koeln.de Tue Nov 27 00:20:02 2001 From: Achim.Gaedke at uni-koeln.de (Achim Gaedke) Date: Tue Nov 27 00:20:02 2001 Subject: [Numpy-discussion] Re: Numpy2 and GSL References: Message-ID: <3C034BFA.FBB64E94@uni-koeln.de> Ok, there is a clear need for the facility of easy contribution. Please be patient until Friday, December 7th. Then I have time to let it happen. It is right that the official site for this project is at pygsl.sourceforge.net (Brian Gough, can you change the link on the gsl homepage, thanks :-) ) But I will show some discussion points that must be clear before a cvs release: - Is the file and directory structure fully expandable, can several persons work in parallel? - Should classes be created with excellent working objects or should it be a 1:1 wrapper?
- should there be one interface dynamic library or more than one? - Is there another way except that of the GPL (personally preferred, but other opinions should be discussed before the contribution of source) Some questions of minor weight: - Is the tuple return value for (value,error) ok in the sf module? - Test cases are needed These questions are the reason why I do not simply "copy" my code into cvs. Jochen Küpper wrote: > > It only provides wrappers for the special functions, but more is to > come. (Hopefully Achim will put the cvs on sf soon.) > > Yes, I agree, PyGSL should be fully integrated with Numpy2, but it > should probably also remain a separate project -- as Numpy should stay > a base layer for all kinds of numerical stuff and hopefully make it > into core python at some point (my personal wish, no more, AFAICT!). > > I think when PyGSL will fully go to SF (or anything similar) more > people would start contributing and we should have a fine general > numerical algorithms library for python soon! > I agree with Jochen and I'd like to move to the core of Python too. But this is far away and I hate monolithic distributions. If there is the need to discuss separately about PyGSL we can do that here or at the gsl-discuss list mailto:gsl-discuss at sources.redhat.com . But there is also the possibility of a mailing list at pygsl.sourceforge.net . Please let me know.
From neelk at cswcasa.com Tue Nov 27 05:52:05 2001 From: neelk at cswcasa.com (Krishnaswami, Neel) Date: Tue Nov 27 05:52:05 2001 Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download Message-ID: Perry Greenfield [mailto:perry at stsci.edu] wrote: > > > > I know large datasets were one of your driving factors, but I really > > don't want to make performance on smaller datasets secondary. > > That's why we are asking, and it seems so far that there are enough > of those that do care about small arrays to spend the effort to > significantly improve the performance. Well, here's my application. I do data mining work, and one of the techniques I want to use Numpy for is to implement robust regression algorithms like least-trimmed-squares. Now for a k-variable regression, the best-of-breed algorithm for this involves taking hundreds of thousands of k-element samples and calculating the fitting hyperplane through them. Small matrix performance is thus something this program lives or dies by, and right now it seems like 'dies' is the right measure -- it is about 10x slower than the Gauss program that does the same thing. :( When I profiled it, it seemed like Numpy was spending almost all of its time in _castCopyAndTranspose. Switching to the Intel MKL LAPACK had no performance effect, but changing _castCopyAndTranspose into a C function was a 20% speed increase. If Numpy2 is even slower on small matrices I'd have to give up using it, and that's a shame: it's a *much* nicer environment than Gauss is. -- Neel Krishnaswami neelk at cswcasa.com
From hungjunglu at yahoo.com Tue Nov 27 08:28:06 2001 From: hungjunglu at yahoo.com (Hung Jung Lu) Date: Tue Nov 27 08:28:06 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation In-Reply-To: Message-ID: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Hi, Thanks to Jon Saenz and Chris Barker for helping out with fast linear algebra and statistical distribution routines. Again, I have a tangential question.
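A sketch of the kind of harness that surfaces a hotspot like the _castCopyAndTranspose report above; the workload and names below are hypothetical, only the standard profile/pstats modules and the Numeric-era calls are assumed:

import profile, pstats
import RandomArray, LinearAlgebra

def run(trials=2000, k=5):
    # many small k-variable solves -- the regime the report is about
    for i in range(trials):
        A = RandomArray.random((k, k))
        b = RandomArray.random((k,))
        LinearAlgebra.solve_linear_equations(A, b)

profile.run('run()', 'small_mats.prof')
stats = pstats.Stats('small_mats.prof')
stats.sort_stats('cumulative').print_stats(10)   # where the time actually goes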
I am hitting the physical limit of the CPU (meaning things have been optimized down to assembly level), in order to achieve even higher performance, the only way to go is hardware. Is there any recommendation for fast machines at the price range of a few thousand dollars? (I cannot afford supercomputers or connection machines.) My purpose is to run Monte Carlo simulation. This means that a lot of scenarios can be run in parallel fashion. Of course I can just use regular cheap Pentium boxes... but they are kind of bulky, and I don't need any of the video, audio, USB features (I think 10 machines at 1GHz each would be the size of calculation power I need, or equivalently, a single machine at an equivalent 10GHz. Heck, if there are some specialized racks/boxes, I can wire the motherboards myself.) I am wondering what you people do for heavy number crunching? Are there any cheap yet specialized machines? What about machines with dual processor? I would imagine a lot of people in the number crunching world run into my situation, and since the number crunching machines don't require much beyond a motherboard and a small hard-drive, maybe there are already some cheap solutions out there. thanks! Hung Jung __________________________________________________ Do You Yahoo!? Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1 From rossini at blindglobe.net Tue Nov 27 09:44:02 2001 From: rossini at blindglobe.net (A.J. Rossini) Date: Tue Nov 27 09:44:02 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation In-Reply-To: <20011127162705.40865.qmail@web12604.mail.yahoo.com> References: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Message-ID: <87vgfwdsao.fsf@jeeves.blindglobe.net> >>>>> "HJL" == Hung Jung Lu writes: HJL> Again, I have a tangential question. I am hitting the HJL> physical limit of the CPU (meaning things have been optimized HJL> down to assembly level), in order to achieve even higher HJL> performance, the only way to go is hardware. HJL> Is there any recommendation for fast machines at the price HJL> range of a few thousand dollars? (I cannot afford HJL> supercomputers or connection machines.) My purpose is to run HJL> Monte Carlo simulation. This means that a lot of scenarios HJL> can be run in parallel fashion. Of course I can just use HJL> regular cheap Pentium boxes... but they are kind of bulky, HJL> and I don't need any of the video, audio, USB features (I HJL> think 10 machines at 1GHz each would be the size of HJL> calculation power I need, or equivalently, a single machine HJL> at an equivalent 10GHz. Heck, if there are some specialized HJL> racks/boxes, I can wire the motherboards myself.) I am HJL> wondering what you people do for heavy number crunching? Are HJL> there any cheap yet specialized machines? What about machines HJL> with dual processor? I would imagine a lot of people in the HJL> number crunching world run into my situation, and since the HJL> number crunching machines don't require much beyond a HJL> motherboard and a small hard-drive, maybe there are already HJL> some cheap solutions out there. The usual way is to build some "blackboxes", i.e. mobo/cpu/memory/NIC, diskless or nearly diskless (you don't want to maintain machines :-). Connect them using 100bT or faster networks (though 100bT should be fine). Do such things exist? Sort of -- they tend to be more expensive than building them yourself, but if you've got a reliable local supplier, they can build them fairly cheaply for you. 
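A sketch of why this workload splits so cleanly across cheap boxes (the harness is hypothetical): each machine runs an independent chunk with its own seed, and the partial results are simply averaged afterwards, so nothing needs to communicate mid-run.

import RandomArray
from Numeric import sum

def chunk(n, seed):
    RandomArray.seed(seed, seed + 1)   # a distinct stream per machine
    x = RandomArray.random((n,))
    return sum(x * x) / n              # toy payoff: estimate E[x^2] = 1/3

# with 10 boxes, give each a different seed and average the partials
partials = [chunk(100000, s) for s in range(1, 11)]
estimate = sum(partials) / len(partials)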
I'd go with single or dual Athlons, myself :-). If power and maintenance are an issue, duals, and if not, maybe singles. We use MOSIX (www.mosix.org) for transparent load balancing between Linux machines, and it could be used on the machines I described (using a floppy or CD to boot). The next question is whether some form of parallel RNG will help. The answer is "maybe". I worked with a student who evaluated coupled chains, and we couldn't do too much better. And then, after that, there is the question of how you want to post-process the results. If you want to automate the whole thing (and it isn't clear that it would be worth it, but...), you could use PyPVM to front-end the sub-processes distributed on the network, load-balanced at the system level by MOSIX. Now for the problems -- MOSIX seems to have difficulties with Python. Severe difficulties. I don't know if this still holds true for recent MOSIX releases. (note that I use R (www.r-project.org) for most of my simulation work these days, but am looking at Python for stat analyses, of which MCMC tools are of interest). best, -tony -- A.J. Rossini Rsrch. Asst. Prof. of Biostatistics U. of Washington Biostatistics rossini at u.washington.edu FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org -------------- http://software.biostat.washington.edu/ -------------- FHCRC: M-W: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email UW: T-Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX Rosen: (Mullins' Lab) Fridays, and I'm unreachable except by email. From chrishbarker at home.net Tue Nov 27 10:28:01 2001 From: chrishbarker at home.net (Chris Barker) Date: Tue Nov 27 10:28:01 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation References: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Message-ID: <3C03DF8D.3725E2A2@home.net> Hung Jung Lu wrote: > Is there any recommendation for fast machines in the > price range of a few thousand dollars? (I cannot > afford supercomputers or connection machines.) My > purpose is to run Monte Carlo simulation. This means > that a lot of scenarios can be run in parallel. > Of course I can just use regular cheap > Pentium boxes... but they are kind of bulky, and I > don't need any of the video, audio, USB features (I I've been looking into setting up a system to do similar work, and it looks to me like the best bang for the buck right now is dual Athlon systems. If space is an important consideration, you can get dual Athlon 1U rack-mount systems for less than $2000. I'm pretty sure the only dual Athlon board currently available (Tyan K7 Thunder) has on-board video, Ethernet and SCSI, which means it costs a little more than it could, but these systems are still a pretty good deal if you get one without a hard drive (or a very cheap one). I just did a quick web search, and EPoX is supposed to be coming out with a dual board as well, so there may be cheaper options soon. -Chris -- Christopher Barker, Ph.D.
ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ From wsryu at fas.harvard.edu Tue Nov 27 15:52:04 2001 From: wsryu at fas.harvard.edu (William Ryu) Date: Tue Nov 27 15:52:04 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation In-Reply-To: <3C03DF8D.3725E2A2@home.net> References: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Message-ID: <5.1.0.14.2.20011127184457.00aa3850@pop.fas.harvard.edu> At 10:46 AM 11/27/2001 -0800, Chris Barker wrote: >Hung Jung Lu wrote: > > Is there any recommendation for fast machines in the > > price range of a few thousand dollars? (I cannot > > afford supercomputers or connection machines.) My > > purpose is to run Monte Carlo simulation. This means > > that a lot of scenarios can be run in parallel. > > Of course I can just use regular cheap > > Pentium boxes... but they are kind of bulky, and I > > don't need any of the video, audio, USB features (I > >I've been looking into setting up a system to do similar work, and it >looks to me like the best bang for the buck right now is dual Athlon >systems. If space is an important consideration, you can get dual Athlon >1U rack-mount systems for less than $2000. I'm pretty sure the only dual >Athlon board currently available (Tyan K7 Thunder) has on-board video, >Ethernet and SCSI, which means it costs a little more than it could, but >these systems are still a pretty good deal if you get one without a hard >drive (or a very cheap one). I just did a quick web search, and EPoX is >supposed to be coming out with a dual board as well, so there may be >cheaper options soon. > >-Chris There is a cheaper dual-CPU Tyan board which uses the same motherboard chipset. It's the Tyan Tiger-MP S2460, which doesn't have SCSI, onboard video, or Ethernet, but is half the price (around $200). -willryu From eric at enthought.com Tue Nov 27 16:16:02 2001 From: eric at enthought.com (eric) Date: Tue Nov 27 16:16:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same Message-ID: <051001c17799$8bfa68b0$777ba8c0@ericlaptop> Hey group, Blitz++ is very cool, but I'm not sure it would make a very good underpinning for reimplementing Numeric. There are 2 (well maybe 3) main points. 1. The first issue deals with how you declare arrays in Blitz++: Array<float,3> A(N,N,N); The big deal here is that the dimensionality of Array is a template parameter, not a constructor parameter. In other words, 2D arrays are effectively a different type than 3D arrays. Numeric, on the other hand, represents arrays of all dimensions with a single class/type. For Python, this makes the most sense. I think you could finagle some way of getting blitz to work, but I'm not sure it would be the desired elegant solution. I've also tinkered with building a simple C++ templated (non-blitz) implementation of Numeric for kicks, but kept coming back to using the dreaded void* to store the data arrays. I still haven't completely given up on a templated solution, but it wasn't as obvious as I thought it would be. 2. Compiling Blitz++ is slooooow. scipy.compiler spits out 200-300 line extension modules at the most.
Depending on how complicated the expressions are, it can take 0.5-1.5 minutes to compile a single extension function on an 850 MHz PIII. I can't imagine how long it would take to compile Numeric arrays for 1 through 11 dimensions (the most Blitz supports, as I remember) for all the different data types with 100s of extension functions. The cost wouldn't be linear because you do pay a one-time hit for some of the template instantiation. Also, I've heard gcc 3.0 might be better. Still, it'd be a painful development process. 3. Portability. This comes at two levels. The first is that Blitz++ has heavy-duty requirements of the compiler. gcc works fine, which is a huge plus, but a lot of other compilers don't. MSVC is the most notable of these because it is so heavily used on Windows. The second level is the portability of C++ extension modules in general. I've run into this on Windows, but I think it is an issue pretty much everywhere. For example, MSVC- and GCC-compiled C extension libraries can call each other on Windows because they are binary compatible. C++ classes are _not_ binary compatible. This has come up for me with wxPython. The standard version that Robin Dunn distributes is compiled with MSVC. If you build a small extension with gcc that makes wxPython calls, it'll link just fine, but seg-faults during execution. Does anyone know if the same sorta thing is true on the Unices? If it is, and Numeric was written in C++, then you'd have to compile extension modules that use Numeric arrays with the same compiler that was used to compile Numeric. This can lead to all sorts of hassles, and it has made me lean back towards C as the preferred language for something as fundamental as Numeric. (Note that I do like C++ for modules that don't really define an API called by other modules). Ok, so maybe there's a 4th point. Paul D. pointed out that blitz isn't much of a win unless you have lazy evaluation (which scipy.compiler already provides). I also think improved speed _isn't_ the biggest goal of a reimplementation (although it can't be sacrificed either). I'm more excited about a code base that more people can comprehend. Perry G. et al's mixed Python/C implementation with the code generators is a very good idea and a step in this direction. I hope the speed issues for small arrays can be solved. I also hope the memory-mapped aspect doesn't complicate the code base much. see ya, eric From hinsen at cnrs-orleans.fr Wed Nov 28 00:09:03 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed Nov 28 00:09:03 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same Message-ID: <200111280808.fAS889g08217@localhost.localdomain> "eric" writes: > The standard version that Robin Dunn distributes is compiled with MSVC. If > you build a small > extension with gcc that makes wxPython calls, it'll link just fine, but > seg-faults during execution. > Does anyone know if the same sorta thing is true on the Unices? If it is, > and Numeric was written in C++ then you'd have to compile extension modules > that use Numeric arrays with the same compiler that was used to compile > Numeric. This can lead to all sorts of hassles, and it has made me lean If you rely on dynamic linking for cross-module calls, you'd have the same problem with Unix, as different compilers use different name-mangling schemes. One way around this would be to limit cross-module calls to C functions compiled with "C" linking.
Better yet, don't rely on dynamic linking at all and export a module's C API via a Python CObject, as described in the extension manual, and declare all symbols as static (except for the module initialization function of course). In my experience that is the only method that works on all platforms, with all compilers. Of course this also assumes that interfaces are at the C level. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From sag at hydrosphere.com Wed Nov 28 09:02:05 2001 From: sag at hydrosphere.com (Sue Giller) Date: Wed Nov 28 09:02:05 2001 Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array Message-ID: <20011128170140921.AAA253@mail.climatedata.com@SUEW2000> I posted the following inquiry to python-list at python.org earlier this week, but got no responses, so I thought I'd try a more focused group. I assume the MA module falls under the NumPy area. I am using 2 (and more) dimensional masked arrays with some numeric data, and using the reduce functionality on the arrays. I use the masking because some of the values in the arrays are 'missing' and should not be included in the results of the reduction. For example, assume a 2 x 5 array, with masked values for the 4th entry for both of the 2nd dimension cells. If I want to sum along the 2nd dimension, I would expect to get a 'missing' value for the 4th entry because both of the entries for the sum are 'missing'. Instead, I get 0, which might be a valid number in my data space, and the returned 1-dimensional array has no mask associated with it. Is this expected behavior for masked arrays, or a bug, or am I misusing the mask concept? Does anyone know how to get the reduction to produce a masked value? Example Code: >>> import MA >>> a = MA.array([[1,2,3,-99,5],[10,20,30,-99,50]]) >>> a [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]] >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) >>> r = MA.sum(m) >>> r array([11,22,33, 0,55,]) >>> t = MA.getmask(r) >>> print t None From paul at pfdubois.com Wed Nov 28 20:31:03 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Wed Nov 28 20:31:03 2001 Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array In-Reply-To: <20011128170140921.AAA253@mail.climatedata.com@SUEW2000> Message-ID: <000201c1788e$60359ce0$3d01a8c0@plstn1.sfba.home.com> [dubois at ldorritt ~]$ pydoc MA.sum Python Library Documentation: function sum in MA sum(a, axis=0, fill_value=0) Sum of elements along a certain axis using fill_value for missing. If you use add.reduce, you'll get what you want. >>> print m [[1 ,2 ,3 ,-- ,5 ,] [10 ,20 ,30 ,-- ,50 ,]] >>> MA.sum(m) array([11,22,33, 0,55,]) >>> MA.add.reduce(m) array(data = [ 11, 22, 33,-99, 55,], mask = [0,0,0,1,0,], fill_value=-99) In other words, sum(m, axis, fill_value) = add.reduce(filled(m, fill_value), axis) Surprising in your case. Still, both uses are quite common, so I probably was thinking to myself that since add.reduce already does one of the jobs, I might as well make sum do the other one.
One could have just as well argued that one was a synonym for the other and so it is revolting to have them be different. Well, MA users, is this something I should change, or not? -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Sue Giller Sent: Wednesday, November 28, 2001 9:03 AM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array I posted the following inquiry to python-list at python.org earlier this week, but got no responses, so I thought I'd try a more focused group. I assume the MA module falls under the NumPy area. I am using 2 (and more) dimensional masked arrays with some numeric data, and using the reduce functionality on the arrays. I use the masking because some of the values in the arrays are 'missing' and should not be included in the results of the reduction. For example, assume a 2 x 5 array, with masked values for the 4th entry for both of the 2nd dimension cells. If I want to sum along the 2nd dimension, I would expect to get a 'missing' value for the 4th entry because both of the entries for the sum are 'missing'. Instead, I get 0, which might be a valid number in my data space, and the returned 1-dimensional array has no mask associated with it. Is this expected behavior for masked arrays, or a bug, or am I misusing the mask concept? Does anyone know how to get the reduction to produce a masked value? Example Code: >>> import MA >>> a = MA.array([[1,2,3,-99,5],[10,20,30,-99,50]]) >>> a [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]] >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) >>> r = MA.sum(m) >>> r array([11,22,33, 0,55,]) >>> t = MA.getmask(r) >>> print t None _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From giulio.bottazzi at libero.it Thu Nov 29 02:10:03 2001 From: giulio.bottazzi at libero.it (Giulio Bottazzi) Date: Thu Nov 29 02:10:03 2001 Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array References: <000201c1788e$60359ce0$3d01a8c0@plstn1.sfba.home.com> Message-ID: <3C05FDA2.AD9C5DCC@libero.it> My answer is yes: the difference between the two behaviors could be confusing for the user. If I can dare to express a "general rule", I would say that the masks in MA arrays should not disappear if not EXPLICITLY required to do so! Of course you can interpret a provided value for the fill_value parameter in the sum function as such a request... but if a value is not provided, then I would say that the correct approach would be to keep the mask on (after all, what's special about the value 0? For instance, if you have to take a logarithm in the next step of the calculation, it is a rather bad choice!) Giulio. "Paul F. Dubois" wrote: > > [dubois at ldorritt ~]$ pydoc MA.sum > Python Library Documentation: function sum in MA > > sum(a, axis=0, fill_value=0) > Sum of elements along a certain axis using fill_value for missing. > > If you use add.reduce, you'll get what you want.
> >>> print m > [[1 ,2 ,3 ,-- ,5 ,] > [10 ,20 ,30 ,-- ,50 ,]] > >>> MA.sum(m) > array([11,22,33, 0,55,]) > >>> MA.add.reduce(m) > array(data = > [ 11, 22, 33,-99, 55,], > mask = > [0,0,0,1,0,], > fill_value=-99) > > In other words, > sum(m, axis, fill_value) = add.reduce(filled(m, fill_value), axis) > > Surprising in your case. Still, both uses are quite common, so I > probably was thinking to myself that since add.reduce already does one > of the jobs, I might as well make sum do the other one. One could have > just as well argued that one was a synonym for the other and so it is > revolting to have them be different. > > Well, MA users, is this something I should change, or not? > > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Sue > Giller > Sent: Wednesday, November 28, 2001 9:03 AM > To: numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked > array > > I posted the following inquiry to python-list at python.org earlier this > week, but got no responses, so I thought I'd try a more focused > group. I assume the MA module falls under the NumPy area. > > I am using 2 (and more) > dimensional masked arrays with some > numeric data, and using the reduce functionality on the arrays. I > use the masking because some of the values in the arrays are > 'missing' and should not be included in the results of the reduction. > > For example, assume a 2 x 5 array, with masked values for the 4th > entry for both of the 2nd dimension cells. If I want to sum along the > 2nd dimension, I would expect to get a 'missing' value for the 4th > entry because both of the entries for the sum are 'missing'. Instead, > I get 0, which might be a valid number in my data space, and the > returned 1-dimensional array has no mask associated with it. > > Is this expected behavior for masked arrays, or a bug, or am I > misusing the mask concept? Does anyone know how to get the > reduction to produce a masked value? > > Example Code: > >>> import MA > >>> a = MA.array([[1,2,3,-99,5],[10,20,30,-99,50]]) > >>> a > [[ 1, 2, 3,-99, 5,] > [ 10, 20, 30,-99, 50,]] > >>> m = MA.masked_values(a, -99) > >>> m > array(data = > [[ 1, 2, 3,-99, 5,] > [ 10, 20, 30,-99, 50,]], > mask = > [[0,0,0,1,0,] > [0,0,0,1,0,]], > fill_value=-99) > > >>> r = MA.sum(m) > >>> r > array([11,22,33, 0,55,]) > >>> t = MA.getmask(r) > >>> print t > None > > _______________________________________________ > Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From sag at hydrosphere.com Thu Nov 29 09:49:02 2001 From: sag at hydrosphere.com (Sue Giller) Date: Thu Nov 29 09:49:02 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <3C05FDA2.AD9C5DCC@libero.it> Message-ID: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Thanks for the pointer. The sum operation I gave is merely an example - I could also be doing other manipulations such as min, max, average, etc. I see that the MA.<op>.reduce functions will do what I want, but to do an average, I will need to do two steps since the MA.average function will have the original 'unexpected' behavior that I don't want.
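(The two steps would be something along these lines -- a rough, untested sketch on the same 2 x 5 example, using float data to avoid integer-division truncation; where a column is entirely masked, the total is already masked, so the result stays masked there too:)

import MA
m = MA.masked_values([[1.,2.,3.,-99.,5.],[10.,20.,30.,-99.,50.]], -99)
total = MA.add.reduce(m)   # stays masked where every addend was masked
avg = total / m.count(0)   # count(0) = number of valid entries per column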
That raises the question of how to determine a count of valid values in a masked array. Can I assume that I can do 'math' on the mask array itself, for example to sum along a given axis and have the masked cells add up? In my original example, I would expect a sum along the second axis to return [0,0,0,2,0]. Can I rely on this? I would suggest that a .count operator would be very useful in working with masked arrays (count valid and count masked). >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) To add an opinion on the question from Paul about 'expected' behavior, I was working off the documentation for Numerical Python, and there were no caveats in there about MA.<op> working one way, and MA.<op>.reduce working another. The answer is always in the documentation, especially for users like me who don't have time or knowledge to go reading thru all the code modules to try and figure out what is happening. From a purely user standpoint, I would expect a masked array to retain its mask-edness at all times, unless I explicitly tell it not to. In that case, I would still expect it to replace the 'masked' cells with the original masked value, and not just arbitrarily assign some other value, such as 0. Thanks again for the prompt reply. From reggie at merfinllc.com Thu Nov 29 10:36:01 2001 From: reggie at merfinllc.com (Reggie Dugard) Date: Thu Nov 29 10:36:01 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Message-ID: > That raises the question of how to determine a count of valid values > in a masked array. Can I assume that I can do 'math' on the mask > array itself, for example to sum along a given axis and have the > masked cells add up? > > In my original example, I would expect a sum along the second axis > to return [0,0,0,2,0]. Can I rely on this? I would suggest that a > .count operator would be very useful in working with masked arrays > (count valid and count masked). Actually masked arrays already have a count method that does what you want: Python 2.2b2 (#26, Nov 16 2001, 11:44:11) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from pydoc import help >>> import MA >>> x = MA.arange(10) >>> help(x.count) Help on method count in module MA.MA: count(self, axis=None) method of MA.MA.MaskedArray instance Count of the non-masked elements in a, or along a certain axis. >>> x.count() 10 >>> From paul at pfdubois.com Thu Nov 29 12:54:02 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Thu Nov 29 12:54:02 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Message-ID: <000201c17917$ac5efec0$3d01a8c0@plstn1.sfba.home.com> You have misread my reply. It is not true that MA.op works one way and MA.op.reduce is different. sum and add.reduce are different, and the documentation for sum DOES say the right thing for sum. The function sum is a special case in that its native meaning was the same as add.reduce and so the function is redundant. I believe you are in error wrt average; average works the way you want. Function count can tell you the number of non-masked values either in the whole array or axis-wise if you give an axis argument. Function size gives you the total number, so #invalid is size(x)-count(x).
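For instance, with the 2 x 5 example from this thread (a rough, untested sketch -- the same size-minus-count idea applied along an axis):

import MA
m = MA.masked_values([[1,2,3,-99,5],[10,20,30,-99,50]], -99)
print m.count(0)               # valid entries per column:  [2 2 2 0 2]
print m.shape[0] - m.count(0)  # masked entries per column: [0 0 0 2 0]

The second line is exactly the [0,0,0,2,0] asked about above.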
maximum and minimum (don't use max and min, they are built-ins that don't know about Numeric) have two forms. When called with one argument they return the overall max or min of the whole array, returning masked only if all entries are masked. For two arguments, you get element-wise extrema, and the mask is on where any one of the arguments was masked. >>> print x [[1 ,-- ,3 ,] [11 ,-- ,-- ,]] >>> print average(x) [6.0 ,-- ,3.0 ,] >>> y array( [[ 6, 7, 8,] [ 9,10,11,]]) >>> print maximum(x,y) [[6 ,-- ,8 ,] [11 ,-- ,-- ,]] >>> y[0,0]=masked >>> print maximum(x,y) [[-- ,-- ,8 ,] [11 ,-- ,-- ,]] -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Sue Giller Sent: Thursday, November 29, 2001 9:50 AM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array Thanks for the pointer. The sum operation I gave is merely an example - I could also be doing other manipulations such as min, max, average, etc. I see that the MA.<op>.reduce functions will do what I want, but to do an average, I will need to do two steps since the MA.average function will have the original 'unexpected' behavior that I don't want. That raises the question of how to determine a count of valid values in a masked array. Can I assume that I can do 'math' on the mask array itself, for example to sum along a given axis and have the masked cells add up? In my original example, I would expect a sum along the second axis to return [0,0,0,2,0]. Can I rely on this? I would suggest that a .count operator would be very useful in working with masked arrays (count valid and count masked). >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) To add an opinion on the question from Paul about 'expected' behavior, I was working off the documentation for Numerical Python, and there were no caveats in there about MA.<op> working one way, and MA.<op>.reduce working another. The answer is always in the documentation, especially for users like me who don't have time or knowledge to go reading thru all the code modules to try and figure out what is happening. From a purely user standpoint, I would expect a masked array to retain its mask-edness at all times, unless I explicitly tell it not to. In that case, I would still expect it to replace the 'masked' cells with the original masked value, and not just arbitrarily assign some other value, such as 0. Thanks again for the prompt reply. _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From sag at hydrosphere.com Thu Nov 29 15:21:04 2001 From: sag at hydrosphere.com (Sue Giller) Date: Thu Nov 29 15:21:04 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <000201c17917$ac5efec0$3d01a8c0@plstn1.sfba.home.com> References: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Message-ID: <20011129232011546.AAA269@mail.climatedata.com@SUEW2000> Paul, Well, you're right. I did misunderstand your reply, as well as what the various functions were supposed to do. I was mis-using sum, minimum, and maximum as tho they were MA.<op>.reduce, and my test case didn't point out the difference. I should always have been doing the .reduce versions. I apologize for this!
I found a section on page 45 of the Numerical Python text (PDF form, July 13, 2001) that defines sum as 'The sum function is a synonym for the reduce method of the add ufunc. It returns the sum of all the elements in the sequence given along the specified axis (first axis by default).' This is where I would expect to see a caveat about it not retaining any mask-edness. I was misusing MA.minimum and MA.maximum as tho they were the .reduce versions. My bad. MA.average does produce a masked array, but it has changed the 'missing value' to fill_value=[ 1.00000002e+020,]. I do find this a bit odd, since the other reductions didn't change the fill value. Anyway, I can now get the stats I want in a format I want, and I understand better the various functions for arrays/masked arrays. Thanks for the comments/input. sue From romberg at fsl.noaa.gov Fri Nov 30 11:30:04 2001 From: romberg at fsl.noaa.gov (Mike Romberg) Date: Fri Nov 30 11:30:04 2001 Subject: [Numpy-discussion] equal() and complex Message-ID: <15367.56879.54329.654575@smaug.fsl.noaa.gov> I'm wondering if there is some good reason why equal(), not_equal(), nonzero() and the like do not work with numeric arrays of type complex. I can see why operators like less() and less_equal() do not work. But the pure equality ones seem like they should work. Or am I missing something :). Thanks, Mike Romberg (romberg at fsl.noaa.gov) From hinsen at cnrs-orleans.fr Fri Nov 30 12:17:04 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri Nov 30 12:17:04 2001 Subject: [Numpy-discussion] equal() and complex References: <15367.56879.54329.654575@smaug.fsl.noaa.gov> Message-ID: <200111302016.fAUKG9X01351@localhost.localdomain> Mike Romberg writes: > I'm wondering if there is some good reason why equal(), not_equal(), > nonzero() and the like do not work with numeric arrays of type > complex. I can see why operators like less() and less_equal() do not > work. But the pure equality ones seem like they should work. Or am I > missing something :). Before Python 2.1, comparison couldn't be implemented for equality only. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From europax at home.com Fri Nov 30 17:35:03 2001 From: europax at home.com (Rob) Date: Fri Nov 30 17:35:03 2001 Subject: [Numpy-discussion] Numeric Python EM Project has moved Message-ID: <3C083356.31E66685@home.com> It's now at www.pythonemproject.com. I can be reached at rob at pythonemproject.com. All this has come about since @home is possibly suspending operation at midnight tonight :( Rob. Looks like I need to change my sig too :) -- The Numeric Python EM Project www.members.home.net/europax
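(One workaround for the equal()-and-complex question above, until equality comparisons are implemented for complex arrays, is to compare the real and imaginary parts separately -- a rough, untested sketch:)

from Numeric import equal, logical_and

def complex_equal(a, b):
    # elementwise a == b for two complex arrays, via the .real/.imag views
    return logical_and(equal(a.real, b.real), equal(a.imag, b.imag))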
From jochen at jochen-kuepper.de Tue Nov 6 18:57:02 2001 From: jochen at jochen-kuepper.de (Jochen Küpper) Date: Tue Nov 6 18:57:02 2001 Subject: [Numpy-discussion] Sparse matrices In-Reply-To: <055701c16727$b57fed90$8fd6afcf@pixi.com> References: <055701c16727$b57fed90$8fd6afcf@pixi.com> Message-ID: On Tue, 6 Nov 2001 15:01:18 -1000 Herbert L Roitblat wrote: Herbert> Travis Oliphant has one. Isn't that the one in SciPy? Herbert> ----- Original Message ----- Herbert> From: "R.M.Everson" Herbert> To: Herbert> Sent: Tuesday, November 06, 2001 11:03 AM Herbert> Subject: [Numpy-discussion] Sparse matrices >> Does anyone have a working sparse matrix module for Numeric 20.2.0 >> and Python 2.1 (or similar). I'm trying to get the version in the >> SciPy CVS tree to work - so far without success. Herbert, this inverse citing really is counterproductive on mailing lists. Greetings, Jochen -- Einigkeit und Recht und Freiheit http://www.Jochen-Kuepper.de Liberté, Égalité, Fraternité GnuPG key: 44BCCD8E Sex, drugs and rock-n-roll From nwagner at mecha.uni-stuttgart.de Sun Nov 11 07:32:03 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Sun Nov 11 07:32:03 2001 Subject: [Numpy-discussion] RandomArray - random Message-ID: <3BEEA88E.742E9225@mecha.uni-stuttgart.de> Hi, I tried to produce a random matrix, say Q (2*ndof x nsamp+1), with Numpy 20.2 and Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3 20010315 (SuSE)] on linux2 Type "copyright", "credits" or "license" for more information. Traceback (most recent call last): File "modal.py", line 192, in ? Q = 2.0*random((2*ndof,nsamp+1))-ones((2*ndof,nsamp+1)) TypeError: random() takes exactly 1 argument (2 given) Does it require a new syntax to obtain a matrix consisting of uniformly distributed random numbers in the range +/- 1 ? Nils From paul at pfdubois.com Sun Nov 11 09:14:02 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Sun Nov 11 09:14:02 2001 Subject: [Numpy-discussion] RandomArray - random In-Reply-To: <3BEEA88E.742E9225@mecha.uni-stuttgart.de> Message-ID: <000001c16ad3$f3e688a0$3d01a8c0@plstn1.sfba.home.com> Your reference to random is not fully qualified so I suppose you could be picking up some other random. But I just tried RandomArray.random((2,3)) and it worked fine. BTW you could just do 2.0*random((n,m))-1.0. -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Nils Wagner Sent: Sunday, November 11, 2001 8:34 AM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] RandomArray - random Hi, I tried to produce a random matrix, say Q (2*ndof x nsamp+1), with Numpy 20.2 and Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3 20010315 (SuSE)] on linux2 Type "copyright", "credits" or "license" for more information. Traceback (most recent call last): File "modal.py", line 192, in ?
Q = 2.0*random((2*ndof,nsamp+1))-ones((2*ndof,nsamp+1)) TypeError: random() takes exactly 1 argument (2 given) Does it require a new syntax to obtain a matrix consisting of uniformly distributed random numbers in the range +/- 1 ? Nils _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From nwagner at mecha.uni-stuttgart.de Mon Nov 12 04:01:03 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Mon Nov 12 04:01:03 2001 Subject: [Numpy-discussion] RandomArray - random References: <000001c16ad3$f3e688a0$3d01a8c0@plstn1.sfba.home.com> Message-ID: <3BEFC88E.F87F363E@mecha.uni-stuttgart.de> "Paul F. Dubois" schrieb: > > Your reference to random is not fully qualified so I suppose you could > be picking up some other random. But I just tried > RandomArray.random((2,3)) and it worked fine. > > BTW you could just do 2.0*random((n,m))-1.0. > It seems to be a conflict with VPython, formerly Visual Python. http://cil.andrew.cmu.edu/projects/visual/index.html Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3 20010315 (SuSE)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from Numeric import * >>> from RandomArray import * >>> random((2,3)) array([[ 0.68769461, 0.33015978, 0.07285815], [ 0.20514929, 0.81925279, 0.50694615]]) >>> from visual import * Visual-2001-09-24 >>> random((2,3)) Traceback (most recent call last): File "", line 1, in ? TypeError: random() takes exactly 1 argument (2 given) >>> Nils > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Nils > Wagner > Sent: Sunday, November 11, 2001 8:34 AM > To: numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] RandomArray - random > > Hi, > > I tried to produce a random matrix, say Q (2*ndof x nsamp+1), with > Numpy 20.2 and Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3 > 20010315 (SuSE)] on linux2 Type "copyright", "credits" or "license" for > more information. > > Traceback (most recent call last): > File "modal.py", line 192, in ? > Q = 2.0*random((2*ndof,nsamp+1))-ones((2*ndof,nsamp+1)) > TypeError: random() takes exactly 1 argument (2 given) > > Does it require a new syntax to obtain a matrix consisting of uniformly > distributed random numbers in the range +/- 1 ? > > Nils > > _______________________________________________ > Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From neelk at cswcasa.com Mon Nov 12 09:24:02 2001 From: neelk at cswcasa.com (Krishnaswami, Neel) Date: Mon Nov 12 09:24:02 2001 Subject: [Numpy-discussion] Building Numeric with Intel MKL and mingw32 Message-ID: Hello, I'm trying to rebuild Numeric with the Intel Math Kernel Library (MKL). I've gotten Numeric building normally with the default BLAS libraries, but I'm not sure what I need to put into the libraries_dir_list and libraries_list variables in the setup.py file. I have the directories mkl\ia32\bin (contains the DLLs), mkl\ia32\lib (contains the lib*.a files), and mkl\include (contains the *.h files). Can anyone tell me what goes where?
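My best guess so far is something along these lines in setup.py, but the MKL library names below are only placeholders -- I don't know which of the lib*.a stems are the right ones to link:

# untested guess; the install path and library names are placeholders
libraries_dir_list = ['C:\\mkl\\ia32\\lib']  # where the lib*.a files live
libraries_list = ['mkl_lapack', 'mkl']       # substitute the actual lib*.a stems

with the mkl\ia32\bin directory on PATH at run time so the DLLs are found.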
-- Neel Krishnaswami neelk at cswcasa.com From nwagner at mecha.uni-stuttgart.de Tue Nov 13 02:22:01 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Tue Nov 13 02:22:01 2001 Subject: [Numpy-discussion] Total least squares problem Message-ID: <3BF102C6.8C651D9E@mecha.uni-stuttgart.de> Hi, How do I solve a Total Least Squares problem in Numpy ? A small example would be appreciated. The TLS problem assumes an overdetermined set of linear equations AX = B, where both the data matrix A as well as the observation matrix B are inaccurate. Nils Reference: R.D. Fierro, G.H. Golub, P.C. Hansen, D.P. O'Leary, Regularization by truncated total least squares, SIAM J. Sci. Comput. Vol. 18(4), 1997, pp. 1223-1241 From barnard at stat.harvard.edu Tue Nov 13 06:42:03 2001 From: barnard at stat.harvard.edu (barnard at stat.harvard.edu) Date: Tue Nov 13 06:42:03 2001 Subject: [Numpy-discussion] Small Bug in multiarray.c Message-ID: <15345.13522.866400.686203@aragorn.stat.harvard.edu> When attempting to compile the CVS version of Numpy using MSVC 6 under Windows 2000 I found a small error in multiarray.c: the doc string for arange contains newlines. The offending code begins on line #1168. Simply removing the newlines from the string fixes the error. John ******************************** * John Barnard, Ph.D. * Senior Research Statistician * deCODE genetics * 1000 Winter Str., Suite 3100 * Waltham, MA 02451 * Phone (Direct) : (781) 290-5771 Ext. 27 * Phone (General) : (781) 466-8833 * Fax : (781) 466-8686 * Email: j.barnard at decode.com ******************************** From oliphant at ee.byu.edu Tue Nov 13 11:25:03 2001 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Nov 13 11:25:03 2001 Subject: [Numpy-discussion] Total least squares problem In-Reply-To: <3BF102C6.8C651D9E@mecha.uni-stuttgart.de> Message-ID: > > How do I solve a Total Least Squares problem in Numpy ? > A small example would be appreciated. > > The TLS problem assumes an overdetermined set of linear equations > AX = B, where both the data matrix A as well as the observation > matrix B are inaccurate. X, resids, rank, s = LinearAlgebra.linear_least_squares(A,B) -Travis From R.M.Everson at exeter.ac.uk Tue Nov 13 13:53:01 2001 From: R.M.Everson at exeter.ac.uk (R.M.Everson) Date: Tue Nov 13 13:53:01 2001 Subject: [Numpy-discussion] BLAS and innerproduct Message-ID: Hello, So far as I can tell Numeric.dot(), which uses innerproduct() from multiarraymodule.c, doesn't call the BLAS, even if Numeric was compiled against a native BLAS. This means (at least on my machine) that X = ones((150, 16384), 'd') C = dot(X, transpose(X)) is about 15 times as slow as the comparable operations in Matlab (v6), which does, I think, use the native BLAS. I guess that multiarray.c is not particularly optimised to use the BLAS because of the difficulties of coping with all sorts of types (float32, int64 etc), and with non-contiguous arrays. The innerproduct is so basic to most of the work I use Numeric for that a speed-up here would make a big difference. I'm thinking of patching multiarray.c to use the BLAS when it can, but before I start, are there good reasons for doing something different? Any advice gratefully received! Cheers, Richard.
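(For concreteness, a rough sketch of the comparison -- untested as written, but trivial to rerun:)

import time
from Numeric import ones, dot, transpose

X = ones((150, 16384), 'd')
t0 = time.time()
C = dot(X, transpose(X))
print 'dot took %.1f seconds' % (time.time() - t0)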
-- Department of Computer Science, Exeter University Voice: +44 1392 264065 R.M.Everson at exeter.ac.uk Secretary: +44 1392 264061 http://www.dcs.ex.ac.uk/people/reverson Fax: +44 1392 264067 From nwagner at mecha.uni-stuttgart.de Wed Nov 14 04:44:03 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Wed Nov 14 04:44:03 2001 Subject: [Numpy-discussion] Total least squares problem References: Message-ID: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de> Travis Oliphant schrieb: > > > > > How do I solve a Total Least Squares problem in Numpy ? > > A small example would be appreciated. > > > > The TLS problem assumes an overdetermined set of linear equations > > AX = B, where both the data matrix A as well as the observation > > matrix B are inaccurate: > > X, resids, rank, s = LinearAlgebra.linear_least_squares(A,B) > > -Travis Travis, There is a difference between classical least squares (Numpy) and TLS (total least squares). I am attaching a small example for illustration. Nils -------------- next part -------------- from Numeric import * from LinearAlgebra import * A = zeros((6,3),Float) b = zeros((6,1),Float) # # Example by Van Huffel # http://www.netlib.org/vanhuffel/dtls-doc # A[0,0] = 0.80010002 A[0,1] = 0.39985167 A[0,2] = 0.60005390 A[1,0] = 0.29996484 A[1,1] = 0.69990689 A[1,2] = 0.39997269 A[2,0] = 0.49994235 A[2,1] = 0.60003167 A[2,2] = 0.20012361 A[3,0] = 0.90013643 A[3,1] = 0.20016919 A[3,2] = 0.79995025 A[4,0] = 0.39998539 A[4,1] = 0.80006338 A[4,2] = 0.49985474 A[5,0] = 0.20002274 A[5,1] = 0.90007114 A[5,2] = 0.70009777 b[0] = 0.89999446 b[1] = 0.82997570 b[2] = 0.79011189 b[3] = 0.85002662 b[4] = 0.99016399 b[5] = 0.10299439 print 'Solution of an overdetermined system of linear equations A x = b' print print 'A' print print A # print 'b' print print b # x, resids, rank, s = linear_least_squares(A,b) print print 'Least squares solution (Numpy)' print print x print print 'Computed rank',rank print print 'Sum of the squared residuals', resids print print 'Singular values of A in descending order' print print s # xtls = zeros((3,1),Float) # # total least squares solution given by Van Huffel # http://www.netlib.org/vanhuffel/dtls-doc # xtls[0] = 0.500254 xtls[1] = 0.800251 xtls[2] = 0.299492 print print 'Total least squares solution' print print xtls print print 'Residuals of LS (Numpy)' print print matrixmultiply(A,x)-b print print 'Residuals of TLS' print print matrixmultiply(A,xtls)-b print # # Least squares in Numpy A^\top A x = A^\top b # Atb = matrixmultiply(transpose(A),b) AtA = matrixmultiply(transpose(A),A) xls = solve_linear_equations(AtA,Atb) print print 'Least squares solution via normal equation' print print xls From hinsen at cnrs-orleans.fr Wed Nov 14 05:30:07 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed Nov 14 05:30:07 2001 Subject: [Numpy-discussion] Total least squares problem In-Reply-To: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de> References: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de> Message-ID: Nils Wagner writes: > There is a difference between classical least squares (Numpy) > and TLS (total least squares). Algorithmically speaking it is even a very different problem. I'd say the only reasonable (i.e. efficient) solution for NumPy is to implement the TLS algorithm in a C subroutine calling LAPACK routines for SVD etc. Konrad. 
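(In the meantime, for the basic single-right-hand-side case, a pure-Python version built on LinearAlgebra's SVD is short. A rough, untested sketch of the classical SVD-based TLS algorithm, assuming b is an m x 1 column as in the attached example; it ignores the degenerate case where the bottom-right element of V vanishes:)

from Numeric import concatenate, transpose
from LinearAlgebra import singular_value_decomposition

def tls(A, b):
    # classical TLS: take the SVD of the augmented matrix [A b] and
    # rescale the last right singular vector so its final entry is -1
    n = A.shape[1]
    u, s, vt = singular_value_decomposition(concatenate((A, b), 1))
    v = transpose(vt)
    return -v[:n, n] / v[n, n]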
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From nwagner at mecha.uni-stuttgart.de Wed Nov 14 06:13:07 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Wed Nov 14 06:13:07 2001 Subject: [Numpy-discussion] Total least squares problem References: <3BF27591.EC1BF4EA@mecha.uni-stuttgart.de> Message-ID: <3BF28365.53373B65@mecha.uni-stuttgart.de> Konrad Hinsen schrieb: > > Nils Wagner writes: > > > There is a difference between classical least squares (Numpy) > > and TLS (total least squares). > > Algorithmically speaking it is even a very different problem. I'd say > the only reasonable (i.e. efficient) solution for NumPy is to > implement the TLS algorithm in a C subroutine calling LAPACK routines > for SVD etc. > > Konrad. > -- There are two Fortran implementations of the TLS algorithm already available via http://www.netlib.org/vanhuffel/ . Moreover there is a tool called f2py that generates Python C/API modules for wrapping Fortran 77/90/95 codes to Python. Unfortunately I am not very familiar with this tool. Therefore I need some advice on this. Thanks in advance Nils > ------------------------------------------------------------------------------- > Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr > Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 > Rue Charles Sadron | Fax: +33-2.38.63.15.17 > 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ > France | Nederlands/Francais > ------------------------------------------------------------------------------- From nwagner at mecha.uni-stuttgart.de Thu Nov 15 01:14:01 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Thu Nov 15 01:14:01 2001 Subject: [Numpy-discussion] Numpy, BLAS, LAPACK, f2py Message-ID: <3BF395DB.DA1C34A4@mecha.uni-stuttgart.de> Hi, I have installed f2py on my system for wrapping existing FORTRAN 77 codes to Python. Then I have gone through the following steps: 1) An example for using a TLS (total least squares) routine: http://www.netlib.org/vanhuffel/ 2) Get dtls.f with dependencies 3) Run f2py dtls.f -m foo -h foo.pyf only: dtls \ \ \ \________ just wrap the dtls function \ \ \______ create signature file \ \____ python module name \_____ Fortran 77 code 4) Edit foo.pyf to your specific needs (optional) 5) Run f2py foo.pyf \_____________ this will create Python C/API module foomodule.c 6) Run make -f Makefile-foo \_____________ this will build the module 7) In python: Python 2.1.1 (#1, Sep 24 2001, 05:28:47) [GCC 2.95.3 20010315 (SuSE)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import foo Traceback (most recent call last): File "", line 1, in ? ImportError: ./foomodule.so: undefined symbol: dcopy_ >>> Any suggestions to solve this problem ?
Nils There are prebuilt libraries of LAPACK and BLAS in /usr/lib -rw-r--r-- 1 root root 657706 Sep 24 01:00 libblas.a lrwxrwxrwx 1 root root 12 Okt 22 19:27 libblas.so -> libblas.so.2 lrwxrwxrwx 1 root root 16 Okt 22 19:27 libblas.so.2 -> libblas.so.2.2.0 -rwxr-xr-x 1 root root 559600 Sep 24 01:01 libblas.so.2.2.0 -rw-r--r-- 1 root root 5763150 Sep 24 01:00 liblapack.a lrwxrwxrwx 1 root root 14 Okt 22 19:27 liblapack.so -> liblapack.so.3 lrwxrwxrwx 1 root root 18 Okt 22 19:27 liblapack.so.3 -> liblapack.so.3.0.0 -rwxr-xr-x 1 root root 4826626 Sep 24 01:01 liblapack.so.3.0.0 From gvermeul at labs.polycnrs-gre.fr Thu Nov 15 01:28:02 2001 From: gvermeul at labs.polycnrs-gre.fr (Gerard Vermeulen) Date: Thu Nov 15 01:28:02 2001 Subject: [Numpy-discussion] Numpy, BLAS, LAPACK, f2py In-Reply-To: <3BF395DB.DA1C34A4@mecha.uni-stuttgart.de> References: <3BF395DB.DA1C34A4@mecha.uni-stuttgart.de> Message-ID: <01111510271301.11576@taco.polycnrs-gre.fr> Hi, Try to link in the BLAS library (there is a dcopy_ in my BLAS library, but better check the README first). best regards -- Gerard On Thursday 15 November 2001 11:15, Nils Wagner wrote: > Hi, > > I have installed f2py on my system for wrapping existing FORTRAN 77 > codes to Python. > Then I have gone through the following steps: > > 1) An example for using a TLS (total least squares) routine: > http://www.netlib.org/vanhuffel/ > > 2) Get dtls.f with dependencies > 3) Run > f2py dtls.f -m foo -h foo.pyf only: dtls > \ \ \ \________ just wrap the dtls function > \ \ \______ create signature file > \ \____ python module name > \_____ Fortran 77 code > 4) Edit foo.pyf to your specific needs (optional) > 5) Run > f2py foo.pyf > \_____________ this will create Python C/API module foomodule.c > 6) Run > make -f Makefile-foo > \_____________ this will build the module > 7) In python: > > Python 2.1.1 (#1, Sep 24 2001, 05:28:47) > [GCC 2.95.3 20010315 (SuSE)] on linux2 > Type "copyright", "credits" or "license" for more information. > > >>> import foo > > Traceback (most recent call last): > File "", line 1, in ? > ImportError: ./foomodule.so: undefined symbol: dcopy_ > > > Any suggestions to solve this problem ? > > Nils > > There are prebuilt libraries of LAPACK and BLAS in /usr/lib > > -rw-r--r-- 1 root root 657706 Sep 24 01:00 libblas.a > lrwxrwxrwx 1 root root 12 Okt 22 19:27 libblas.so -> > libblas.so.2 > lrwxrwxrwx 1 root root 16 Okt 22 19:27 libblas.so.2 -> > libblas.so.2.2.0 > -rwxr-xr-x 1 root root 559600 Sep 24 01:01 libblas.so.2.2.0 > -rw-r--r-- 1 root root 5763150 Sep 24 01:00 liblapack.a > lrwxrwxrwx 1 root root 14 Okt 22 19:27 liblapack.so -> > liblapack.so.3 > lrwxrwxrwx 1 root root 18 Okt 22 19:27 liblapack.so.3 > -> liblapack.so.3.0.0 > -rwxr-xr-x 1 root root 4826626 Sep 24 01:01 > liblapack.so.3.0.0 > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From perry at stsci.edu Fri Nov 16 14:33:02 2001 From: perry at stsci.edu (Perry Greenfield) Date: Fri Nov 16 14:33:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download Message-ID: We have been working on a reimplementation of Numeric, the numeric array manipulation extension module for Python. The reimplementation is virtually a complete rewrite, and because it is not completely backwards compatible with Numeric, we have dubbed it numarray to prevent confusion.
While we think this version is not yet mature enough for most to use in everyday projects, we are interested in feedback on the user interface and the open issues (see the documents on the web page shown below). We also welcome those who would like to contribute to this effort by helping with the development or adding libraries. An early beta version is available on sourceforge as the package Numarray (http://sourceforge.net/projects/numpy/) Information on the goals, changes in user interface, open issues, and design can be found at http://aten.stsci.edu/numarray From pete at shinners.org Fri Nov 16 15:12:02 2001 From: pete at shinners.org (Pete Shinners) Date: Fri Nov 16 15:12:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download References: Message-ID: <3BF59D10.2070107@shinners.org> Perry Greenfield wrote: > An early beta version is available on sourceforge as the > package Numarray (http://sourceforge.net/projects/numpy/) > > Information on the goals, changes in user interface, open issues, > and design can be found at http://aten.stsci.edu/numarray you ask a few questions on the information website, here are some of my answers for things i "care" about. note that my main use of numpy is as a pixel buffer for images. some of the changes like avoiding type promotion sound really good to me :] 5) should the implementation be bulletproof for private vars? i don't think you should worry about this. as long as the interface is well defined, i wouldn't worry about protecting users from themselves. i think it will be the rare numarray user who will be in a situation where they need to modify the internal C data. 7) necessary to add other types? yes. i really want unsigned int16 and unsigned int32. all my operations are on pixel data, and things can just get messy when i need to treat packed color values as signed integers. 8) negative and out-of-range indices? i'd prefer them to be kept as similar to python as can be. the current implementation in Numeric is nice for me. one other thing i'd like there to be a little focus on is adding my own new ufunc operators. for image manipulation i'd like new ufunc operators that clamp the results to legal values. i'd be happy to do this myself, but i don't believe it's possible with the current Numeric. the last thing i really really want is for this to be rolled into the standard python distribution. that is perhaps the most important aspect for me. i do not like requiring the extra dependency for generic numeric arrays. :] From oliphant.travis at ieee.org Fri Nov 16 18:42:02 2001 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri Nov 16 18:42:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: References: Message-ID: > > While we think this version is not yet mature enough for > most to use in everyday projects, we are interested in > feedback on the user interface and the open issues (see > the documents on the web page shown below). We also welcome > those who would like to contribute to this effort by helping > with the development or adding libraries. > What I've seen looks great. You've all done some good work here. Of course, I do have some feedback. I haven't looked at everything, these points have just caught my eye. Complex Types: ============== 1) I don't like the idea of complex types being a separate subclass of ndarray. This makes them "different."
Unless this "difference" can be completely hidden (which I doubt), I would prefer complex types to be on the same level as other numeric types. 2) Also, in your C-API, you have a different pointer to the imaginary data. I much prefer the way it is done currently to have complex numbers represented as an 8-byte, or 16-byte chunk of contiguous memory. Index Arrays: =========== 1) For what it's worth, my initial reaction to your indexing scheme is negative. I would prefer that if a = [[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]] then a[[1,3],[0,3]] returns the sub-matrix: [[ 4, 6], [ 12, 14] i.e. the cross-product of [1,3] x [0,3] This is the way MATLAB works. I'm not sure what IDL does. If I understand your scheme, right now, then I would have to append an extra dimension to my indexing arrays to get this behavior, right? 2) I would like to be able to index the array in a flattenned sense as well (is that possible?) in other words, it would be nice if a[flat(9,10,11)] or something got me the elements 9,10,11 in a one-dimensional interpretation of the array. 3) Why can't you combine slice notation and indexing? Just interpret the slice as index array that would be created from using tha range operator on the same start, stop, and step objects. Is this the plan? That's all for now. I don't mean to be critical, I'm really impressed with what works so far. These are just some concerns I have right now. -Travis Oliphant From europax at home.com Sat Nov 17 08:06:02 2001 From: europax at home.com (Rob) Date: Sat Nov 17 08:06:02 2001 Subject: [Numpy-discussion] Numeric Python EM Project may need mirror Message-ID: <3BF68A67.C4963807@home.com> Hi all, I just got an email from @home yesterday, saying that all customers should back up their web pages, email, etc etc. I know they are in bankruptcy, but this email sounded ominous. I'm wondering if there is some kindly soul who would want to mirror this site. I'd really love to have this site on Starship Python, but haven't had any responses to emails to them. I'm continuously working on more code for the site so I'd hate to see it go down, even if temporarily. Sincerely, Rob. -- The Numeric Python EM Project www.members.home.net/europax From greenfield at home.com Sat Nov 17 14:58:02 2001 From: greenfield at home.com (Perry Greenfield) Date: Sat Nov 17 14:58:02 2001 Subject: FW: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download Message-ID: -----Original Message----- > > What I've seen looks great. You've all done some good work here. > Thanks, you were origin of some of the ideas used. > Of course, I do have some feedback. I haven't looked at > everything, these > points have just caught my eye. > > Complex Types: > ============== > > 1) I don't like the idea of complex types being a separate subclass of > ndarray. This makes them "different." Unless this "difference" can be > completely hidden (which I doubt), I would prefer complex types > to be on the > same level as other numeric types. > I think that we also don't like that, and after doing the original, somewhat incomplete, implementation using the subclassed approach, I began to feel that implementing it in C (albeit using a different approach for the code generation) was probably easier and more elegant than what was done here. So you are very likely to see it integrated as a regular numeric type, with a more C-based implementation. > 2) Also, in your C-API, you have a different pointer to the > imaginary data. 
> I much prefer the way it is done currently to have complex numbers > represented as an 8-byte, or 16-byte chunk of contiguous memory. > Any reason not to allow both? (The pointer to the real can be interpreted as either a pointer to 8-byte or 16-byte quantities). It is true that figuring out the imaginary pointer from the real is trivial so I suppose it really isn't necessary. > > Index Arrays: > =========== > > 1) For what it's worth, my initial reaction to your indexing scheme is > negative. I would prefer that if > > a = [[1,2,3,4], > [5,6,7,8], > [9,10,11,12], > [13,14,15,16]] > > then > > a[[1,3],[0,3]] returns the sub-matrix: > > [[ 4, 6], > [ 12, 14] > > i.e. the cross-product of [1,3] x [0,3] This is the way MATLAB > works. I'm > not sure what IDL does. > I'm afraid I don't understand the example. Could you elaborate a bit more how this is supposed to work? (Or is it possible there is an error? I would understand it if the result were [[5, 8],[13,16]] corresponding to the index pairs [[(1,0),(1,3)],[(3,0),(3,3)]]) > If I understand your scheme, right now, then I would have to > append an extra > dimension to my indexing arrays to get this behavior, right? > > 2) I would like to be able to index the array in a flattenned > sense as well > (is that possible?) in other words, it would be nice if > a[flat(9,10,11)] or > something got me the elements 9,10,11 in a one-dimensional > interpretation of > the array. > Why not: ravel(a)[[9,10,11]] ? > 3) Why can't you combine slice notation and indexing? Just interpret the > slice as index array that would be created from using tha range > operator on > the same start, stop, and step objects. Is this the plan? > I think that allowing slicing could be possible. But things were getting pretty complex as they were, and we wanted to see if there was agreement on how it was being done so far. It could be extended to handle slices, if there was a well defined interpretation. (I think there may be at least two possible interpretations considered). As for the above, sure, but of course the slice would have to be shape consistent with the other index arrays (under the current scheme). > That's all for now. I don't mean to be critical, I'm really > impressed with > what works so far. These are just some concerns I have right now. > > -Travis Oliphant > Thanks Travis, we're looking for constructive feedback, positive or negative. Perry From greenfield at home.com Sat Nov 17 16:28:02 2001 From: greenfield at home.com (Perry Greenfield) Date: Sat Nov 17 16:28:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: Message-ID: > > I think that we also don't like that, and after doing the original, > > somewhat incomplete, implementation using the subarray approach, > > I began to feel that implementing it in C (albiet using a different > > approach for the code generation) was probably easier and more > > elegant than what was done here. So you are very likely to see > > it integrated as a regular numeric type, with a more C-based > > implementation. > > Sounds good. Is development going to take place on the CVS > tree. If so, I > could help out by comitting changes directly. > > > > > > 2) Also, in your C-API, you have a different pointer to the > > > imaginary data. > > > I much prefer the way it is done currently to have complex numbers > > > represented as an 8-byte, or 16-byte chunk of contiguous memory. > > > > Any reason not to allow both? 
(The pointer to the real can be > interpreted > > as either a pointer to 8-byte or 16-byte quantities). It is true > > that figuring out the imaginary pointer from the real is trivial > > so I suppose it really isn't necessary. > > I guess the way you've structured the ndarray, it is possible. I figured > some operations might be faster, but perhaps not if you have two pointers > running at the same time, anyway. > Well, the C implementation I was thinking of would only use one pointer. The API could supply both if some algorithms would find it useful to just access the imaginary data alone. But as mentioned, I don't think it is important to include, so we could easily get rid of it (and probably should) > > > > > Index Arrays: > > > =========== > > > > > > 1) For what it's worth, my initial reaction to your indexing > scheme is > > > negative. I would prefer that if > > > > > > a = [[1,2,3,4], > > > [5,6,7,8], > > > [9,10,11,12], > > > [13,14,15,16]] > > > > > > then > > > > > > a[[1,3],[0,3]] returns the sub-matrix: > > > > > > [[ 4, 6], > > > [ 12, 14] > > > > > > i.e. the cross-product of [1,3] x [0,3] This is the way MATLAB > > > works. I'm > > > not sure what IDL does. > > > > I'm afraid I don't understand the example. Could you elaborate > > a bit more how this is supposed to work? (Or is it possible > > there is an error? I would understand it if the result were > > [[5, 8],[13,16]] corresponding to the index pairs > > [[(1,0),(1,3)],[(3,0),(3,3)]]) > > > > The idea is to consider indexing with arrays of integers to be a > generalization of slice index notation. Simply interpret the > slice as an > array of integers that would be formed by using the range operator. > > For example, I would like to see > > a[1:5,1:3] be the same thing as a[[1,2,3,4],[1,2]] > > a[1:5,1:3] selects the 2-d subarray consisting of rows 1 to 4 and > columns 1 > to 2 (inclusive starting with the first row being row 0). In > other words, > the indices used to select the elements of a are ordered-pairs > taken from the > cross-product of the index set: > > [1,2,3,4] x [1,2] = [(1,1), (1,2), (2,1), (2,2), (3,1), (3,2), > (4,1), (4,2)] > and these selected elements are structured as a 2-d array of shape (4,2) > > Does this make more sense? Indexing would be a natural extension of this > behavior but allowing sets that can't be necessarily formed from > the range > function. > I understand this (but is the example in the first message consistent with this?). This is certainly a reasonable interpretation. But if this is the way multiple index arrays are interpreted, how does one easily specify scattered points in a multidimensional array? The only other alternative I can think of is to use some of the dimensions of a multidimensional index array as indices for each of the dimensions. For example, if one wanted to index random points in a 2d array, then supplying an nx2 array would provide a list of n such points. But I see this as a more limiting way to do this (and there are often benefits to being able to keep the indices for different dimensions in separate arrays). But I think doing what you would like to do is straightforward even with the existing implementation. For example, if x is a 2d array we could easily develop a function such that: x[outer_index_product([1,3,4],[1,5])] # with a better function name! The function outer_index_product would return a tuple of two index arrays each with a shape of 3x2.
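To make the two interpretations concrete, here is a minimal sketch of the cross-product selection in classic Numeric terms (cross_index is a hypothetical helper written only for this example, since plain Numeric subscripts do not accept index arrays, and outer_index_product above is likewise just a placeholder name):

from Numeric import array, take

def cross_index(x, rows, cols):
    # pick the given rows, then the given columns of the result:
    # the cross-product reading of (rows, cols)
    return take(take(x, rows, 0), cols, 1)

a = array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12],
           [13, 14, 15, 16]])
print cross_index(a, [1, 3], [0, 3])
# [[ 5  8]
#  [13 16]]

Under the pairwise interpretation, the same subscript a[[1,3],[0,3]] would instead yield the one-dimensional array [5, 16], the elements at (1,0) and (3,3); outer_index_product would bridge the two schemes by expanding its arguments into the pair of broadcast index arrays just described.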
These arrays would not take up more space than the original arrays even though they appear to have a much larger size (the one dimension is replicated by use of a 0 stride size so the data buffer is the same as the original). Would this be acceptable? In the end, all these indexing behaviors can be provided by different functions. So it isn't really a question of which one to have and which not to have. The question is what is supported by the indexing notation? For us, the behavior we have implemented is far more useful for our applications than the one you propose. But perhaps we are in the minority, so I'd be very interested in hearing which indexing interpretation is most useful to the general community. > > Why not: > > > > ravel(a)[[9,10,11]] ? > > sure, that would work, especially if ravel doesn't make a copy of > the data > (which I presume it does not). > Correct. Perry From greenfield at home.com Sat Nov 17 17:23:06 2001 From: greenfield at home.com (Perry Greenfield) Date: Sat Nov 17 17:23:06 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: Message-ID: From: Pete Shinners > 7) necessary to add other types? > yes. i really want unsigned int16 and unsigned int32. all my operations > are on pixel data, and things can just get messy when i need to treat > packed color values as signed integers. > Unsigned int16 is already supported. UInt32 could be done, but raises some interesting issues with regard to combining with Int32. I don't believe the current implementation prevents you from carrying around unsigned data in Int32 arrays. If you are using them as packed color values, do you ever do any arithmetic operations on them other than to pack and unpack them? > one other thing i'd like there to be a little focus on is adding my own > new ufunc operators. for image manipulation i'd like new ufunc operators > that clamp the results to legal values. i'd be happy to do this myself, > but i don't believe it's possible with the current Numeric. > It will be possible for users to add their own ufuncs. We will eventually document how to do so (and it should be fairly simple to do once we give a few example templates). Perry > From alessandro.mirone at wanadoo.fr Sun Nov 18 07:42:01 2001 From: alessandro.mirone at wanadoo.fr (Alessandro Mirone) Date: Sun Nov 18 07:42:01 2001 Subject: [Numpy-discussion] Heigenvalues is broken Message-ID: <3BF7E462.A473B686@wanadoo.fr> Is it a problem of lapack3.0 of of LinearAlgebra.py? ..................... ==> (Eigenvalues should be (0,2)) >>> a=array([[1,0],[0,1]]) >>> b=array([[0,1],[-1,0]]) >>> M=a+b*complex(0,1.0) >>> Heigenvalues(M) array([-2.30277564, 1.30277564]) >>> print M [[ 1.+0.j 0.+1.j] [ 0.-1.j 1.+0.j]] >>> From oliphant.travis at ieee.org Sun Nov 18 19:01:01 2001 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sun Nov 18 19:01:01 2001 Subject: [Numpy-discussion] Heigenvalues is broken In-Reply-To: <3BF7E462.A473B686@wanadoo.fr> References: <3BF7E462.A473B686@wanadoo.fr> Message-ID: On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote: > Is it a problem of lapack3.0 of of > LinearAlgebra.py? > ..................... ==> (Eigenvalues should be (0,2)) > > >>> a=array([[1,0],[0,1]]) > >>> b=array([[0,1],[-1,0]]) > >>> M=a+b*complex(0,1.0) > >>> Heigenvalues(M) I suspect it is your lapack. On an Athlon running Mandrake Linux with the lapack-3.0-9 package, I get. 
>>> a=array([[1,0],[0,1]]) >>> b=array([[0,1],[-1,0]]) >>> M=a+b*complex(0,1.0) >>> Heigenvalues(M) array([ 0., 2.]) -Travis From nwagner at mecha.uni-stuttgart.de Sun Nov 18 23:58:01 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Sun Nov 18 23:58:01 2001 Subject: [Numpy-discussion] Heigenvalues is broken References: <3BF7E462.A473B686@wanadoo.fr> Message-ID: <3BF8C9FA.97B3AEB1@mecha.uni-stuttgart.de> Alessandro Mirone schrieb: > > Is it a problem of lapack3.0 of of > LinearAlgebra.py? > ..................... ==> (Eigenvalues should be (0,2)) > > >>> a=array([[1,0],[0,1]]) > >>> b=array([[0,1],[-1,0]]) > >>> M=a+b*complex(0,1.0) > >>> Heigenvalues(M) > array([-2.30277564, 1.30277564]) > >>> print M > [[ 1.+0.j 0.+1.j] > [ 0.-1.j 1.+0.j]] > >>> > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion On an Athlon running SuSE Linux 7.3 with the lapack-3.0-0 package, I get. [-2.30277564 1.30277564] Nils From Peter.Verveer at embl-heidelberg.de Mon Nov 19 02:45:02 2001 From: Peter.Verveer at embl-heidelberg.de (Peter Verveer) Date: Mon Nov 19 02:45:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: <3BF59D10.2070107@shinners.org> References: <3BF59D10.2070107@shinners.org> Message-ID: On Saturday 17 November 2001 00:11 am, you wrote: > note that my main use of numpy is as a pixel buffer for images. some of > the changes like avoiding type promotion sounds really good to me :] I have exactly the same application so I agree with this. > 7) necessary to add other types? > yes. i really want unsigned int16 and unsigned int32. all my operations > are on pixel data, and things can just get messy when i need to treat > packed color values as signed integers. Yes please! One of the things that irritates me most on the original Numeric is that some types are lacking. I think the whole range of data types should be supported, even if some may be seldom used by most people. > one other thing i'd like there to be a little focus on is adding my own > new ufunc operators. for image manipulation i'd like new ufunc operators > that clamp the results to legal values. i'd be happy to do this myself, > but i don't believe it's possible with the current Numeric. I write functions in C that directly access the numeric data. I don't use the ufunc api. One reason that I do that is that I want my libary of routines to be useful independent of Numeric, so I only have a tiny glue between my C routines and Numeric. I hope that it will be still possible to do this in the new version. > the last thing i really really want is for this to be rolled into the > standard python distribution. that is perhaps the most important aspect > for me. i do not like requiring the extra dependency for generic numeric > arrays. :] I second that! Cheers, Peter -- Dr. Peter J. Verveer Bastiaens Group Cell Biology and Cell Biophysics Programme EMBL Meyerhofstrasse 1 D-69117 Heidelberg Germany Tel. : +49 6221 387245 Fax : +49 6221 387242 Email: Peter.Verveer at embl-heidelberg.de From tpitts at accentopto.com Mon Nov 19 05:58:03 2001 From: tpitts at accentopto.com (Todd Alan Pitts, Ph.D.) 
Date: Mon Nov 19 05:58:03 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: ; from oliphant.travis@ieee.org on Fri, Nov 16, 2001 at 07:43:41PM -0700 References: Message-ID: <20011119065758.B11653@fermi.accentopto.com> Thanks for all of your work. Things seem to be shaping up nicely. I just wanted to second some of the concerns below: > Complex Types: > ============== > > 1) I don't like the idea of complex types being a separate subclass of > ndarray. This makes them "different." Unless this "difference" can be > completely hidden (which I doubt), I would prefer complex types to be on the > same level as other numeric types. > > 2) Also, in your C-API, you have a different pointer to the imaginary data. > I much prefer the way it is done currently to have complex numbers > represented as an 8-byte, or 16-byte chunk of contiguous memory. > The second comment above is really critical for accessing utility available in a very large number of numerical libraries. In my view this would "break" the utility of numpy severely -- recopying arrays both on the way out and the way in would be extremely cumbersome. -Todd Alan Pitts From jh at oobleck.astro.cornell.edu Mon Nov 19 08:47:02 2001 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Mon Nov 19 08:47:02 2001 Subject: [Numpy-discussion] Re: Numpy-discussion digest, Vol 1 #345 - 4 msgs In-Reply-To: (numpy-discussion-request@lists.sourceforge.net) References: Message-ID: <200111191646.fAJGkCL28182@oobleck.astro.cornell.edu> Just to fill in the blanks, here's what IDL does: IDL> a = [[1,2,3,4], $ IDL> [5,6,7,8], $ IDL> [9,10,11,12], $ IDL> [13,14,15,16]] IDL> print,a 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 IDL> print, a[[1,3],[0,3]] 2 16 --jh-- From jsw at cdc.noaa.gov Mon Nov 19 11:37:05 2001 From: jsw at cdc.noaa.gov (Jeff Whitaker) Date: Mon Nov 19 11:37:05 2001 Subject: [Numpy-discussion] Heigenvalues is broken In-Reply-To: Message-ID: On Sun, 18 Nov 2001, Travis Oliphant wrote: > On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote: > > Is it a problem of lapack3.0 of of > > LinearAlgebra.py? > > ..................... ==> (Eigenvalues should be (0,2)) > > > > >>> a=array([[1,0],[0,1]]) > > >>> b=array([[0,1],[-1,0]]) > > >>> M=a+b*complex(0,1.0) > > >>> Heigenvalues(M) > > I suspect it is your lapack. On an Athlon running Mandrake Linux with the > lapack-3.0-9 package, I get. > > >>> a=array([[1,0],[0,1]]) > >>> b=array([[0,1],[-1,0]]) > >>> M=a+b*complex(0,1.0) > >>> Heigenvalues(M) > array([ 0., 2.]) This is definitely a hardware/compiler dependant feature. I get the "right" answer on Solaris (with the forte compiler) but the same "wrong" answer as Alessandro on MacOS X/gcc. I've tried fiddling with compiler options on my OS X box, to no avail. -Jeff -- Jeffrey S. Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/CDC R/CDC1 Email : jsw at cdc.noaa.gov 325 Broadway Web : www.cdc.noaa.gov/~jsw Boulder, CO, USA 80303-3328 Office : Skaggs Research Cntr 1D-124 From ransom at physics.mcgill.ca Mon Nov 19 11:47:02 2001 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Mon Nov 19 11:47:02 2001 Subject: [Numpy-discussion] Heigenvalues is broken In-Reply-To: References: Message-ID: On November 19, 2001 02:36 pm, Jeff Whitaker wrote: > > This is definitely a hardware/compiler dependant feature. I get the > "right" answer on Solaris (with the forte compiler) but the same "wrong" > answer as Alessandro on MacOS X/gcc. 
I've tried fiddling with compiler > options on my OS X box, to no avail. But seemingly it is even stranger than this. Here are my results from Debian unstable using Lapack 3.0 on an Athlon system: Python 2.1.1 (#1, Nov 11 2001, 18:19:24) [GCC 2.95.4 20011006 (Debian prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from LinearAlgebra import * >>> a=array([[1,0],[0,1]]) >>> b=array([[0,1],[-1,0]]) >>> M=a+b*complex(0,1.0) >>> Heigenvalues(M) array([ 0., 2.]) Scott > On Sun, 18 Nov 2001, Travis Oliphant wrote: > > On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote: > > > Is it a problem of lapack3.0 of of > > > LinearAlgebra.py? > > > ..................... ==> (Eigenvalues should be (0,2)) > > > > > > >>> a=array([[1,0],[0,1]]) > > > >>> b=array([[0,1],[-1,0]]) > > > >>> M=a+b*complex(0,1.0) > > > >>> Heigenvalues(M) > > > > I suspect it is your lapack. On an Athlon running Mandrake Linux with > > the lapack-3.0-9 package, I get. > > > > >>> a=array([[1,0],[0,1]]) > > >>> b=array([[0,1],[-1,0]]) > > >>> M=a+b*complex(0,1.0) > > >>> Heigenvalues(M) > > > > array([ 0., 2.]) > > This is definitely a hardware/compiler dependant feature. I get the > "right" answer on Solaris (with the forte compiler) but the same "wrong" > answer as Alessandro on MacOS X/gcc. I've tried fiddling with compiler > options on my OS X box, to no avail. > > -Jeff > > -- > Jeffrey S. Whitaker Phone : (303)497-6313 > Meteorologist FAX : (303)497-6449 > NOAA/OAR/CDC R/CDC1 Email : jsw at cdc.noaa.gov > 325 Broadway Web : www.cdc.noaa.gov/~jsw > Boulder, CO, USA 80303-3328 Office : Skaggs Research Cntr 1D-124 > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From Barrett at stsci.edu Mon Nov 19 14:12:02 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Mon Nov 19 14:12:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download References: Message-ID: <3BF98336.9010500@STScI.Edu> Perry Greenfield wrote: > > An early beta version is available on sourceforge as the > package Numarray (http://sourceforge.net/projects/numpy/) > > Information on the goals, changes in user interface, open issues, > and design can be found at http://aten.stsci.edu/numarray 6) Should array properties be accessible as public attributes instead of through accessor methods? We don't currently allow public array attributes to make the Python code simpler and faster (otherwise we will be forced to use __setattr__ and such). This results in incompatibilty with previous code that uses such attributes. I prefer the use of public attributes over accessor methods. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From perry at stsci.edu Tue Nov 20 12:29:13 2001 From: perry at stsci.edu (Perry Greenfield) Date: Tue Nov 20 12:29:13 2001 Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: Message-ID: > 6) Should array properties be accessible as public attributes > instead of through accessor methods? 
> > We don't currently allow public array attributes to make > the Python code simpler and faster (otherwise we will > be forced to use __setattr__ and such). This results in > incompatibility with previous code that uses such attributes. > > > I prefer the use of public attributes over accessor methods. > > > -- > Paul Barrett, PhD Space Telescope Science Institute The issue of efficiency may not be a problem with Python 2.2 or later since it provides new mechanisms that avoid the need to use __setattr__ to solve this problem (e.g. __slots__, property, __get__, and __set__). So it becomes more of an issue of which style people prefer rather than simplicity and speed of the code. Perry From chrishbarker at home.net Tue Nov 20 15:23:12 2001 From: chrishbarker at home.net (Chris Barker) Date: Tue Nov 20 15:23:12 2001 Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download References: Message-ID: <3BFAEA19.3153B495@home.net> Perry Greenfield wrote: > > One major comment that isn't directly addressed on the web page is the > > ease of writing new functions, I suppose Ufuncs, although I don't > > usually care if they work on anything other than Arrays. I hope the new > > system will make it easier to write new ones. > Absolutely. We will provide examples of how to write new ufuncs. It should > be very simple in one sense (requiring few lines of code) if our code > generator machinery is used (but context is important here so this > is why examples or a template is extremely important). But it isn't > particularly hard to do without the code generator. And such ufuncs > will handle *all* the generality of arrays including slices, non-aligned > arrays, byteswapped arrays, and type conversion. I'd like to provide > examples of writing ufuncs within a few weeks (along with examples > of other kinds of functions using the C-API as well). This sounds great! The code generating machinery sounds very promising, and examples are, of course, key. I found digging through the NumPy source to figure out how to do things very treacherous. Making writing Ufuncs easy will encourage a lot more C Ufuncs to be written, which should help performance. > > Also, I can't help wondering if this could leverage more existing code. > > The blitz++ package being used by Eric Jones in the SciPy.compiler > > project looks very promising. It's probably too late, but I'm wondering > > what the reasons are for re-inventing such a general purpose wheel. > > > I'm not sure which "wheel" you are talking about :-) The wheel I'm talking about is multi-dimensional array objects... > We certainly > aren't trying to replicate what Eric Jones has done with the > SciPy.compiler approach (which is very interesting in its own right). I know, I just think using an existing set of C++ classes for multiple typed multidimensional arrays would make sense, although I imagine it is too late now! > If the issue is why we are redoing Numeric: Actually, I think I had a pretty good idea why you were working on this. > 1) it has to be rewritten to be acceptable to Guido before it can be > part of the Standard Library. > 2) to add new types (e.g. unsigned) and representations (e.g., non-aligned, > byteswapped, odd strides, etc). Using memory mapped data requires some > of these. > 3) to make it more memory efficient with large arrays.
> 4) to make it more generally extensible I'm particularly excited about 1) and 4) > > As a whole I have found that I would like the transition from Python to > > Compiled languages to be smoother. The standard answer to Python > > performance is to profile, and then re-write the computationally intensive > > portions in C. This would be a whole lot easier if Python used datatypes > > that are easy to use from C/C++ as well as Python. I hope NumPy2 can > > move in this direction. > > > What do you see as missing in numarray in that sense? Aside from UInt32 > I'm not aware of any missing type that is available on all platforms. > There is the issue of Float128 and such. Adding these is not hard. > The real issue is how to deal with the platforms that don't support them. I used poor wording. When I wrote "datatypes", I meant data types in a much higher order sense. Perhaps structures or classes would be a better term. What I mean is that it should be easy to use and manipulate the same multidimensional arrays from both Python and C/C++. In the current Numeric, most folks generate a contiguous array, and then just use the array->data pointer to get what is essentially a C array. That's fine if you are using it in a traditional C way, with fixed dimensions, one datatype, etc. What I'm imagining is having an object in C or C++ that could be easily used as a multidimensional array. I'm thinking C++ would probably be necessary, and probably templates as well, which is why blitz++ looked promising. Of course, blitz++ only compiles with a few up-to-date compilers, so you'd never get it into the standard library that way! This could also lead the way to being able to compile NumPy code.... > I think it is pretty easy to install since it uses distutils. I agree, but from the newsgroup, it is clear that a lot of folks are very reluctant to use something that is not part of the standard library. > > > We estimate > > > that numarray is probably another order of magnitude worse, > > > i.e., that 20K element arrays are at half the asymptotic > > > speed. How much should this be improved? > > > > A lot. I use arrays smaller than that most of the time! > > > What is good enough? As fast as current Numeric? As fast as current Numeric would be "good enough" for me. It would be a shame to go backwards in performance! > (IDL does much > better than that for example). My personal benchmark is MATLAB, which I imagine is similar to IDL in performance. > 10 element arrays will never be > close to C speed in any array based language embedded in an > interpreted environment. Well, sure, I'm not expecting that > 100, maybe, but will be very hard. > 1000 should be possible with some work. I suppose MATLAB has it easier, as all arrays are doubles, and (until recently anyway) all variables were arrays, and all arrays were 2-d. NumPy is a lot more flexible than that. Is it the type and size checking that takes the time? > Another approach is to try to cast many of the functions as being > able to broadcast over repeated small arrays. After all, if one > is only doing a computation on one small array, it seems unlikely > that the overhead of Python will be objectionable. Only if you > have many such arrays to repeat calculations on, should it be > a problem (or am I wrong about that). You are probably right about that. > If these repeated calculations > can be "assembled" into a higher dimensionality array (which > I understand isn't always possible) and operated on in that sense, > the efficiency issue can be dealt with.
I do that when possible, but it's not always possible. > But I guess this can only > be seen with specific existing examples and programs. I would > be interested in seeing the kinds of applications you have now > to gauge what the most effective solution would be. One of the things I do a lot with are coordinates of points and polygons. Sets of points I can handle easily as an NX2 array, but polygons don't work so well, as each polygon has a different number of points, so I use a list of arrays, which I have to loop over. Each polygon can have from about 10 to thousands of points (mostly 10-20, however). One way I have dealt with this is to store a polygon set as a large array of all the points, and another array with the indexes of the start and end of each polygon. That way I can transform the coordinates of all the polygons in one operation. It works OK, but sometimes it is more useful to have them in a sequence. > As mentioned, > we tend to deal with large data sets and so I don't think we have > a lot of such examples ourselves. I know large datasets were one of your driving factors, but I really don't want to make performance on smaller datasets secondary. I hope I'll get a chance to play with it soon.... -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ From nwagner at mecha.uni-stuttgart.de Thu Nov 22 02:43:06 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Thu Nov 22 02:43:06 2001 Subject: [Numpy-discussion] Numpy for FORTRAN users Message-ID: <3BFCE508.E6C365DF@mecha.uni-stuttgart.de> Hi, Currently users must be aware of the fact that multi-dimensional arrays are stored differently in Python and Fortran. Is there any progress so that users will not need to worry about this rather confusing and technical detail? Nils From martin.wiechert at gmx.de Thu Nov 22 05:23:02 2001 From: martin.wiechert at gmx.de (Martin Wiechert) Date: Thu Nov 22 05:23:02 2001 Subject: [Numpy-discussion] Numpy2 and GSL Message-ID: Hi! Just an uneducated question. Are there any plans to wrap GSL for Numpy2? I did not actually try it (It's not Python ;-)), but it looks clean and powerful. Regards, Martin. From hinsen at cnrs-orleans.fr Thu Nov 22 06:29:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Thu Nov 22 06:29:02 2001 Subject: [Numpy-discussion] Numpy2 and GSL In-Reply-To: References: Message-ID: Martin Wiechert writes: > Are there any plans to wrap GSL for Numpy2? > I did not actually try it (It's not Python ;-)), > but it looks clean and powerful. I have heard that several projects decided not to use it for legal reasons; GSL is GPL, not LGPL. Personally I don't see the problem for Python/NumPy, but then I am not a lawyer... And I haven't used GSL either, but it looks good from the description. Konrad.
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From edcjones at erols.com Thu Nov 22 17:30:10 2001 From: edcjones at erols.com (Edward C. Jones) Date: Thu Nov 22 17:30:10 2001 Subject: [Numpy-discussion] Numeric & changes in Python division Message-ID: <3BFDA742.5080109@erols.com> # Python 2.2b1, Numeric 20.2.0 from __future__ import division import Numeric arr = Numeric.ones((2,2), 'f') arr = arr/2.0 #Traceback (most recent call last): # File "bug.py", line 6, in ? #arr = arr/2.0 #TypeError: unsupported operand type(s) for / From paul at pfdubois.com Thu Nov 22 18:51:01 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Thu Nov 22 18:51:01 2001 Subject: [Numpy-discussion] Numeric & changes in Python division In-Reply-To: <3BFDA742.5080109@erols.com> Message-ID: <000201c173c9$606902c0$3d01a8c0@plstn1.sfba.home.com> You know what the doctor said: if it hurts when you do that, don't do that. Seriously, I have not the slightest idea what you're doing here. My project won't get to 2.2 until well into the new year. Especially if stuff like this has to be fixed. I haven't even read most of the 2.2 changes. I understand this is also an issue with CXX. Barry Scott runs CXX now since I am no longer in a job where I use C++. When he will get to this I don't know. I need to demote myself on the CXX website. You haven't seen any recent changes to Numpy, or comments from me on numarray, because I have a release to get out at my job. -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Edward C. Jones Sent: Thursday, November 22, 2001 5:33 PM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] Numeric & changes in Python division # Python 2.2b1, Numeric 20.2.0 from __future__ import division import Numeric arr = Numeric.ones((2,2), 'f') arr = arr/2.0 #Traceback (most recent call last): # File "bug.py", line 6, in ? #arr = arr/2.0 #TypeError: unsupported operand type(s) for / _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From siopis at umich.edu Fri Nov 23 20:59:01 2001 From: siopis at umich.edu (Christos Siopis ) Date: Fri Nov 23 20:59:01 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? In-Reply-To: Message-ID: [ This message got longer than i had initially thought, but these thoughts have been bugging me for so long that i cannot resist the temptation to push the send button! Apologies in advance to those not interested... ] On Mon, 26 Nov 2001, Martin Wiechert wrote: > Hi! > > Just an uneducated question. > Are there any plans to wrap GSL for Numpy2? > I did not actually try it (It's not Python ;-)), > but it looks clean and powerful. > > Regards, > Martin. I actually think that this question has come up before in this list, perhaps more than once. And i think it brings up a bigger issue, which is: to what extent is it useful for the numerical community to have multiple numerical libraries, and to what extent does this constitute a waste of resources? 
Numpy (Python), PDL (Perl), GSL (C), and a rather large number of other libraries usually have to re-implement the same old numerical algorithms, but offered under a different interface each time. However, there is such a big body of numerical algorithms out there that it's a daunting effort to integrate them into every language's numerical library (anyone want to implement LAPACK's functionality in Numpy?) The compromise that is usually made is to wrap one library around another. While this may be "better than nothing", it is usually not a pleasant situation as it leads to inconsistencies in the interface, inconsistencies in the error handling, difficulties in the installation, problems with licensing,... Since i have been a beneficiary rather than a contributor to the numerical open-source community, i feel somewhat hesitant to file this "complaint", but i really do think that there are relatively few people out there who are both willing and capable of building quality open-source numerical software, while there are too many algorithms to implement, so the community should be vigilant to minimize waste of resources! Don't take me wrong, i am not saying that Numpy, PDL, GSL & co. should be somehow "merged" --obviously, one needs different wrappers to call numerical routines from Python, Perl, C, C++ or Java. But there should be a way so that the actual *implementation* of the numerical algorithms is only done once and for all. So what i envision, in some sense, is a super-library of "all"/as many as possible numerical algorithms, which will present appropriate (but consistent) APIs for different programming languages, so that no matter what language i use, i can expect consistent interface, consistent numerical behavior, consistent error handling etc. Furthermore, different levels of access should allow the application developer to access low-level or high-level routines as needed (and could object orientation be efficiently built as a higher-level wrapper?) This way, the programmer won't have to worry whether the secant root finder that s/he is using handles exceptions well or how NaNs are treated. Perhaps most importantly, people would feel compelled to go into the pain of "translating" existing libraries such as LAPACK into this common framework, because they will know that this will benefit the entire community and won't go away when the next scripting language du jour eclipses their current favorite. Over time, this may lead to a truly precious resource for the numerical community. Now, i do realize that this may sound like a "holy grail" of numerical computing, that it is something which is very difficult, if not impossible to accomplish. It certainly does not seem like a project that the next ambitious programmer or lab group would want to embark into on a rainy day. Rather, it would require a number of important requirements and architectural decisions to be made first, and trade-offs considered. This would perhaps be best coordinated by the numerical community at large, perhaps under the auspices of some organization. But this would be time well-spent, for it would form the foundations on which a truly universal numerical library could be built. Experience gained from all the numerical projects to this day would obviously be invaluable in such an endeavor. 
I suspect that this list may not be the best place to discuss such a topic, but i think that some of the most active people in the field lurk here, and i would love to hear their thoughts and understand why i am wrong :) If there is a more appropriate forum to discuss such issues, i would be glad to be pointed to it --in which case, please disregard this posting! *************************************************************** / Christos Siopis | Tel : 734-764-3440 \ / Postdoctoral Research Fellow | \ / Department of Astronomy | FAX : 734-763-6317 \ / University of Michigan | \ / Ann Arbor, MI 48109-1090 | E-mail : siopis at umich.edu \ / U.S.A. _____________________| \ / / http://www.astro.lsa.umich.edu/People/siopis.html \ *************************************************************** From jh at oobleck.astro.cornell.edu Sat Nov 24 19:14:02 2001 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Sat Nov 24 19:14:02 2001 Subject: [Numpy-discussion] Re: Meta: too many numerical libraries doing the same thing? In-Reply-To: (numpy-discussion-request@lists.sourceforge.net) References: Message-ID: <200111250313.fAP3DUL21168@oobleck.astro.cornell.edu> Yes, this issue has been raised here before. It was the main conclusion of Paul Barrett's and my BOF session at ADASS a 5 years ago (see our report at http://oobleck.astro.cornell.edu/jh/ast/papers/idae96.ps). The main problems are that we scientists are too individualistic to get organized around a single library, too pushed by job pressures to commit much concentrated time to it ourselves, and too poor to pay the architects, coders, doc writers, testers, etc. to write it for us. Socially, we *want* to reinvent the wheel, because we want to be riding on our own wheels. Once we are riding reasonably well for our own needs, our interest and commitment vanishes. We're off to write the next paper. Following that conference, I took a poll on this list looking for help to implement the library. About half a dozen people responded that they could put in up to 10 hours a week, which in my experience isn't enough, once things get hard and attrition sets in. Nonetheless, Paul and I proposed to the NASA Astrophysics Data Analysis Program to hire some people to write it, but we were turned down. We proposed the idea to the head of the High Energy Astrophysics group at NASA Goddard, and he agreed -- as long as what we were really doing was writing software for his group's special needs. The frustrating thing is how many hundreds of astronomy projects hire people to do their 10% of this problem, and how unwilling they are to pool resources to do the general problem. A few of the volunteers in my query to this list have gone on to do SciPy, to their credit, but I don't see them moving in the direction we outlined. Still, they have the capacity to do it right in Python and compiled code written explicitly for Python. They won't solve the general problem, but they may solve the first problem, namely getting a data analysis environment that is OSS and as good as IDL et al. in terms of end-to-end functionality, completeness, and documentation. I like the notion that the present list is for designing and building the underlying language capabilities into Python, and for getting them standardized, tested, and included in the main Python distribution. It is also a good place for debating the merits of different implementations of particular functionality. 
That leaves the job of building coherent end-user data analysis packages (which necessarily have to pick one routine to be called "fft", one device-independent graphics subsystem, etc.) to application groups like SciPy. There can be more than one of these, if that's necessary, but they should all use the same underlying numerical language capability. I hope that the application groups from several array-based OSS languages will someday get together and collaborate on an ueberlibrary of numerical and graphics routines (the latter being the real sticking point) that are easily wrapped by most languages. That seems backwards, but I think the social reality is that that's the way it is going to be, if it ever happens at all. --jh-- From paul at pfdubois.com Sat Nov 24 19:59:01 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Sat Nov 24 19:59:01 2001 Subject: [Numpy-discussion] Re: Meta: too many numerical libraries doing the same thing? In-Reply-To: <200111250313.fAP3DUL21168@oobleck.astro.cornell.edu> Message-ID: <000101c17565$12af2760$3d01a8c0@plstn1.sfba.home.com> There is more to this issue than meets the eye, both technically and historically. For numerical algorithms to be available independent of language, they would have to be packaged as components such as COM objects. While there is research in this field, nobody knows whether it can be done in a way that is efficient enough. For a given language like C, C++, Eiffel or Fortran used as the speed-demon base for wrapping up in Python, there are some difficult technical issues. Reusable numerical software needs context to operate and there is no decent way to supply the context in a non-object-oriented language. Geoff Furnish wrote a good paper about the issue for C++ showing the way to truly reusable libraries in that language, and recent improvements in Eiffel make it easier to do there now. In C or Fortran you simply can't do it. (Note that Eiffel or C++ versions of some NAG routines typically have methods with one or two arguments while the C or Fortran ones have 15 or more; a routine is not reusable if you have to understand that many arguments to try it. There are also important issues with regard to error handling and memory.) The second issue is the algorithmic issue: most scientists do NOT know the right algorithms to use, and the ones they do use are often inferior. The good algorithms are for the most part in commercial libraries, and the numerical analysis literature, where they were written by numerical analysts. Often the coding from both sources is unavailable for free use, in the wrong language, and/or wretched. The commercial libraries also exist because some companies have requirements for fiduciary responsibility; in effect, they need a guarantor of the software to show that they have not carelessly depended on software of unknown quality. In short, computer scientists are not going to be able to write such a library without an army of numerical analysts familiar with the literature, and the numerical analysts aren't going to write it unless they are OO-experienced, which almost all of them aren't, so far. Most people when they discuss mathematical software think of leaves on the call tree. In fact the most useful mathematical software, in the sense that it incorporates the most expertise, is middleware such as ODE solvers, integrators, root finders, etc. The algorithm itself will have many controls, optional outputs, etc. This requires a library-wide design motif.
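As a toy illustration of that point (hypothetical names, not any actual library's API): a context-carrying solver object needs only one or two arguments per call, where an equivalent C or Fortran routine would take every control and error flag on each invocation.

class RootFinder:
    # the "context": controls travel with the object instead of
    # appearing as extra arguments in every call
    def __init__(self, tolerance=1.0e-10, max_iterations=100):
        self.tolerance = tolerance
        self.max_iterations = max_iterations
    def solve(self, f, x0, x1):
        # secant iteration; failure raises an exception rather than
        # returning an error code the caller may forget to check
        for i in range(self.max_iterations):
            f0, f1 = f(x0), f(x1)
            if abs(f1) < self.tolerance:
                return x1
            x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        raise ArithmeticError("no convergence")

print RootFinder().solve(lambda x: x * x - 2.0, 1.0, 2.0)   # 1.41421356..., i.e. sqrt(2)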
I thus feel there are perfectly good reasons not to expect such a library soon. The Python community could do a good OO-design using what is available (such as LAPACK) but we haven't -- all the contributions are functional. From hinsen at cnrs-orleans.fr Sun Nov 25 04:45:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Sun Nov 25 04:45:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? Message-ID: <200111251244.fAPCiIj01855@localhost.localdomain> "Christos Siopis " writes: > Don't take me wrong, i am not saying that Numpy, PDL, GSL & co. should be > somehow "merged" --obviously, one needs different wrappers to call > numerical routines from Python, Perl, C, C++ or Java. But there should be > a way so that the actual *implementation* of the numerical algorithms is > only done once and for all. I agree that sounds nice in theory. But even if it were technically feasible (which I doubt) given the language differences, it would be a development project that is simply too big for scientists to handle as a side job, even if they were willing (which again I doubt). My impression is that the organizational aspects of software development are often neglected. Some people are good programmers but can't work well in teams. Others can work in teams, but are not good coordinators. A big project requires at least one, if not several, people who are good scientists and programmers, have coordinator skills, and a job description that permits them to take up the task. Plus a larger number of people who are good scientists and programmers and can work in teams. Finally, all of these have to agree on languages, design principles, etc. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From tim.hochberg at ieee.org Sun Nov 25 10:50:02 2001 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Sun Nov 25 10:50:02 2001 Subject: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download References: <3BF98336.9010500@STScI.Edu> Message-ID: <01fd01c175e1$e6ae7990$87740918@cx781526b> From: "Paul Barrett" > Perry Greenfield wrote: > > > > > An early beta version is available on sourceforge as the > > package Numarray (http://sourceforge.net/projects/numpy/) > > > > Information on the goals, changes in user interface, open issues, > > and design can be found at http://aten.stsci.edu/numarray > > > 6) Should array properties be accessible as public attributes > instead of through accessor methods? > > We don't currently allow public array attributes to make > the Python code simpler and faster (otherwise we will > be forced to use __setattr__ and such). This results in > incompatibility with previous code that uses such attributes. > > > I prefer the use of public attributes over accessor methods. As do I. As of Python 2.2, __getattr__/__setattr__ should not be required anyway: new style classes allow this to be done in a more pleasant way. (I'm still too fuzzy on the details to describe it coherently here though.)
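The 2.2 details Tim alludes to look roughly like this (a minimal sketch using the new property builtin on a new-style class; Demo is a stand-in name, not numarray's actual class):

class Demo(object):
    def __init__(self, shape):
        self._shape = tuple(shape)
    def _get_shape(self):
        return self._shape
    def _set_shape(self, value):
        # validation runs on every assignment, but only for this
        # attribute -- no general __setattr__ hook is needed
        self._shape = tuple(value)
    shape = property(_get_shape, _set_shape)

d = Demo((2, 3))
d.shape = [3, 2]
print d.shape        # (3, 2)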
-tim From nwagner at mecha.uni-stuttgart.de Mon Nov 26 01:55:03 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Mon Nov 26 01:55:03 2001 Subject: [Numpy-discussion] Sort , Complex array Message-ID: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> Hi, How can I sort an array of complex eigenvalues with respect to the imaginary part (in ascending order) in Numpy ? All eigenvalues appear in complex conjugate pairs. Nils From hinsen at cnrs-orleans.fr Mon Nov 26 02:46:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 02:46:02 2001 Subject: [Numpy-discussion] Sort , Complex array In-Reply-To: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> References: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> Message-ID: Nils Wagner writes: > How can I sort an array of complex eigenvalues with respect to the > imaginary part > (in ascending order) in Numpy ? > All eigenvalues appear in complex conjugate pairs. indices = argsort(eigenvalues.imag) eigenvalues = take(eigenvalues, indices) Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From gvermeul at labs.polycnrs-gre.fr Mon Nov 26 02:48:02 2001 From: gvermeul at labs.polycnrs-gre.fr (Gerard Vermeulen) Date: Mon Nov 26 02:48:02 2001 Subject: [Numpy-discussion] Sort , Complex array In-Reply-To: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> References: <3C021F19.E0D869CB@mecha.uni-stuttgart.de> Message-ID: <01112611475600.19933@taco.polycnrs-gre.fr> On Monday 26 November 2001 11:53, Nils Wagner wrote: > Hi, > > How can I sort an array of complex eigenvalues with respect to the > imaginary part > (in ascending order) in Numpy ? > All eigenvalues appear in complex conjugate pairs. > > Nils > I have solved that like this: >>> from Numeric import * >>> a = array([3+3j, 1+1j, 2+2j]) >>> b = a.imag >>> print take(a, argsort(b)) [ 1.+1.j 2.+2.j 3.+3.j] >>> Best regards -- Gerard From nwagner at mecha.uni-stuttgart.de Mon Nov 26 07:03:06 2001 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Mon Nov 26 07:03:06 2001 Subject: [Numpy-discussion] Augmented matrix Message-ID: <3C026834.E56CE70@mecha.uni-stuttgart.de> Hi, How can I build an augmented matrix [A,b] in Numpy, where A is an m * n matrix (m>n) and b is an m*1 vector? Nils From hinsen at cnrs-orleans.fr Mon Nov 26 08:34:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 08:34:02 2001 Subject: [Numpy-discussion] Augmented matrix In-Reply-To: <3C026834.E56CE70@mecha.uni-stuttgart.de> References: <3C026834.E56CE70@mecha.uni-stuttgart.de> Message-ID: Nils Wagner writes: > How can I build an augmented matrix [A,b] in Numpy, > where A is an m * n matrix (m>n) and b is an m*1 vector AB = concatenate((A, b[:, NewAxis]), -1) (assuming b is of rank 1) Konrad.
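For concreteness, a small worked example of that recipe (the shapes are chosen arbitrarily; any m x n A and length-m b would do):

>>> from Numeric import ones, zeros, concatenate, NewAxis, Float
>>> A = ones((4, 2), Float)
>>> b = zeros(4, Float)
>>> AB = concatenate((A, b[:, NewAxis]), -1)
>>> AB.shape
(4, 3)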
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From chrishbarker at home.net Mon Nov 26 10:30:02 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 10:30:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> Message-ID: <3C028E87.82C57211@home.net> Another factor that complicates things is open source philosophy and the licenses that go with it. The GSL project looks very promising, and the ultimate goals of that project appear to be to create a coherent and complete numerical library. This kind of thing NEEDS to be open source, and the GSL folks have chosen a license (GPL) that guarantees that it remains that way. That is a good thing. The license also make it impossible to use the library in closed source projects, which is a deal killer for a lot of people, but it is also an important attribute for many folks that don't think there should be closed source projects at all. I believe that that will greatly stifle the potential of the project, but it fits with the philosophy iof it's creators. Personally I think the LGPL would have guaranteed the future openness of the source, and allowed a much greater user (and therefor contributer) base. BTW, IANAL either, but my reading of the GPL and Python's "GPL compatable" license, is that GSL could be used with Python, but the result would have to be released under the GPL. That means it could not be imbedded in a closed source project. As a rule, Python itself and most of the libraries I have seen for it (Numeric, wxPython, etc.) are released under licences that allow propriatary use, so we probably don't want to make Numeric, or SciPy GPL. too bad. On another note, it looks like the blitz++ library might be a good basis for a general Numerical library (and NumPy 3) as well. It does come with a flexible license. Any thoughts? -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ From hinsen at cnrs-orleans.fr Mon Nov 26 11:40:03 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 11:40:03 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> <3C028E87.82C57211@home.net> Message-ID: <200111261938.fAQJcmd01426@localhost.localdomain> Chris Barker writes: > On another note, it looks like the blitz++ library might be a good basis > for a general Numerical library (and NumPy 3) as well. It does come > with a flexible license. Any thoughts? I think the major question is whether we are willing to move to C++. And if we want to keep up any pretentions for Numeric becoming part of the Python core, this translates into whether Guido will accept C++ code in the Python core. 
>From a more pragmatic point of view, I wonder what the implications for efficiency would be. C++ used to be very different in their optimization abilities, is that still the case? Even more pragmatically, is blitz++ reasonably efficient with g++? Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From chrishbarker at home.net Mon Nov 26 12:43:02 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 12:43:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> <3C028E87.82C57211@home.net> <200111261938.fAQJcmd01426@localhost.localdomain> Message-ID: <3C02ADB3.E314B8FB@home.net> Konrad Hinsen wrote: > Chris Barker writes: > > On another note, it looks like the blitz++ library might be a good basis > > for a general Numerical library (and NumPy 3) as well. It does come > > with a flexible license. Any thoughts? > I think the major question is whether we are willing to move to C++. > And if we want to keep up any pretentions for Numeric becoming part of > the Python core, this translates into whether Guido will accept C++ > code in the Python core. Actually, It's worse than that. Blitz++ makes heavy use of templates, and thus only works with compilers that support that well. The current Python core can compile under a very wide variety of compilers. I doubt that Guido would want to change that. Personally, I'm torn. I would very much like to see NumPy arrays become part of the core Python, but don't want to have to compromise what it could be to do that. Another idea is to extend the SciPy project to become a complete Python distribution, that would clearly include Numeric. One download, and you have all you need. > >From a more pragmatic point of view, I wonder what the implications > for efficiency would be. C++ used to be very different in their > optimization abilities, is that still the case? Even more > pragmatically, is blitz++ reasonably efficient with g++? I know g++ is supported (and I think it is their primary development platform). From the web site: Is there a way to soup up C++ so that we can keep the advanced language features but ditch the poor performance? This is the goal of the Blitz++ project: to develop techniques which will enable C++ to rival -- and in some cases even exceed -- the speed of Fortran for numerical computing, while preserving an object-oriented interface. The Blitz++ Numerical Library is being constructed as a testbed for these techniques. Recent benchmarks show C++ encroaching steadily on Fortran's high-performance monopoly, and for some benchmarks, C++ is even faster than Fortran! These results are being obtained not through better optimizing compilers, preprocessors, or language extensions, but through the use of template techniques. By using templates cleverly, optimizations such as loop fusion, unrolling, tiling, and algorithm specialization can be performed automatically at compile time. see: http://www.oonumerics.org/blitz/whatis.html for more info. I havn't messed with it myself, but from the web page, it seems the answer is yes, C++ can produce high performance code. 
-- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ From hinsen at cnrs-orleans.fr Mon Nov 26 12:52:02 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Nov 26 12:52:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? In-Reply-To: <000301c176b7$26da0680$3d01a8c0@plstn1.sfba.home.com> (paul@pfdubois.com) References: <000301c176b7$26da0680$3d01a8c0@plstn1.sfba.home.com> Message-ID: <200111262050.fAQKoxB01580@localhost.localdomain> > We had some meetings to discuss using blitz and the truth is that as > wrapped by Python there is not much to gain. The efficiency of blitz > comes up when you do an array expression in C++. Then x = y + z + w + a > + b gets compiled into one loop with no temporary objects created. But That could still be of interest to extension module writers. And it seems conceivable to write some limited Python-C compiler for numerical expressions that generates extension modules, although this is more than a weekend project. Still, I agree that what most people care about is the speed of NumPy operations. Some lazy evaluation scheme might be more promising to eliminate the creation of intermediate objects, but that isn't exactly trivial either... Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From perry at stsci.edu Mon Nov 26 12:59:03 2001 From: perry at stsci.edu (Perry Greenfield) Date: Mon Nov 26 12:59:03 2001 Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download In-Reply-To: Message-ID: > From: Chris Barker > To: Perry Greenfield , > numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] Re: Re-implementation of Python > Numerical arrays (Numeric) available > for download > > I used Poor wording. When I wrote "datatypes", I meant data types in a > much higher order sense. Perhaps structures or classes would be a better > term. What I mean is that is should be easy to use an manipulate the > same multidimensional arrays from both Python and C/C++. In the current > Numeric, most folks generate a contiguous array, and then just use the > array->data pointer to get what is essentially a C array. That's fine if > you are using it in a traditional C way, with fixed dimension, one > datatype, etc. What I'm imagining is having an object in C or C++ that > could be easily used as a multidimentional array. I'm thinking C++ would > probably neccesary, and probably templates as well, which is why blitz++ > looked promising. Of course, blitz++ only compiles with a few up-to-date > compilers, so you'd never get it into the standard library that way! > Yes, that was an important issue (C++ and the Python Standard Library). And yes, it is not terribly convenient to access multi-dimensional arrays in C (of varying sizes). 
We don't solve that problem in the way a C++ library could. But I suppose that some might say that C++ libraries may introduce their own, new problems. But coming up with the one solution to all scientific computing appears well beyond our grasp at the moment. If someone does see that solution, let us know! > I agree, but from the newsgroup, it is clear that a lot of folks are > very reluctant to use something that is not part of the standard > library. > We agree that getting into the standard library is important. > > > > We estimate > > > > that numarray is probably another order of magnitude worse, > > > > i.e., that 20K element arrays are at half the asymptotic > > > > speed. How much should this be improved? > > > > > > A lot. I use arrays smaller than that most of the time! > > > > > What is good enough? As fast as current Numeric? > > As fast as current Numeric would be "good enough" for me. It would be a > shame to go backwards in performance! > > > (IDL does much > > better than that for example). > > My personal benchmark is MATLAB, which I imagine is similar to IDL in > performance. > We'll see if we can match current performance (or at least present usable alternative approaches that are faster). > > 10 element arrays will never be > > close to C speed in any array based language embedded in an > > interpreted environment. > > Well, sure, I'm not expecting that > Good :-) > > 100, maybe, but will be very hard. > > 1000 should be possible with some work. > > I suppose MATLAB has it easier, as all arrays are doubles, and, (untill > recently anyway), all variable where arrays, and all arrays were 2-d. > NumPy is a lot more flexible that that. Is is the type and size checking > that takes the time? > Probably, but we haven't started serious benchmarking yet so I wouldn't put much stock in what I say now. > One of the things I do a lot with are coordinates of points and > polygons. Sets if points I can handle easily as an NX2 array, but > polygons don't work so well, as each polgon has a different number of > points, so I use a list of arrays, which I have to loop over. Each > polygon can have from about 10 to thousands of points (mostly 10-20, > however). One way I have dealt with this is to store a polygon set as a > large array of all the points, and another array with the indexes of the > start and end of each polygon. That way I can transform the coordinates > of all the polygons in one operation. It works OK, but sometimes it is > more useful to have them in a sequence. > This is a good example of an ensemble of variable sized arrays. > > As mentioned, > > we tend to deal with large data sets and so I don't think we have > > a lot of such examples ourselves. > > I know large datasets were one of your driving factors, but I really > don't want to make performance on smaller datasets secondary. > > -- > Christopher Barker, That's why we are asking, and it seems so far that there are enough of those that do care about small arrays to spend the effort to significantly improve the performance. Perry From chrishbarker at home.net Mon Nov 26 13:03:02 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 13:03:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <000301c176b7$26da0680$3d01a8c0@plstn1.sfba.home.com> Message-ID: <3C02B298.E1F0E661@home.net> "Paul F. Dubois" wrote: > We had some meetings to discuss using blitz and the truth is that as > wrapped by Python there is not much to gain. 
The efficiency of blitz > comes up when you do an array expression in C++. Then x = y + z + w + a > + b gets compiled into one loop with no temporary objects created. But > this trick is possible because you can bind the assignment. In python > you cannot bind the assignment so you cannot do a lazy evaluation of the > operations, unless you are willing to go with some sort of function call > like x = evaluate(y + z + w). Immediate evaluation means creating > temporaries, and performance is dead. > > The only gain then would be when you passed a Python-wrapped blitz array > back to C++ and did a bunch of operations there. Personally, I think this could be a big gain. At the moment, if you don't get the performance you need with NumPy, you have to write some of your code in C, and using the Numeric and Python C API is a whole lot of work, particularly if you want your function to work on non-contiguous arrays and/or arrays of any type. I don't know much C++, and I have no idea if Blitz++ fits this bill, but it seemed to me that using an object oriented framework that could take care of reference counting, and allow you to work with generic arrays, and index them naturally, etc., would be a great improvement, even if the performance was the same as the current C API. Perhaps NumPy2 has accomplished that; it sounds like it is a step in the right direction, at least. In a sentence: the most important reason for using a C++ object oriented multi-dimensional array package would be ease of use, not speed. It's nice to hear Blitz++ was considered; it was probably rejected for good reason, but it just looked very promising to me. -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ From oliphant at ee.byu.edu Mon Nov 26 13:24:11 2001 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Nov 26 13:24:11 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? In-Reply-To: <3C02B298.E1F0E661@home.net> Message-ID: > In a sentence: the most important reason for using a C++ object oriented > multi-dimensional array package would be ease of use, not speed. > > It's nice to hear Blitz++ was considered; it was probably rejected for > good reason, but it just looked very promising to me. I believe that Eric's "compiler" module included in SciPy uses Blitz++ to optimize Numeric expressions. You have others who also share your admiration of Blitz++. -Travis From chrishbarker at home.net Mon Nov 26 15:31:05 2001 From: chrishbarker at home.net (Chris Barker) Date: Mon Nov 26 15:31:05 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: Message-ID: <3C02D510.E7454CCA@home.net> Travis Oliphant wrote: > I believe that Eric's "compiler" module included in SciPy uses Blitz++ to > optimize Numeric expressions. You have others who also share your > admiration of Blitz++. Yes, it does. That's where I heard about it. That also brings up a good point. Paul mentioned that using something like Blitz++ would only help performance if you could pass it an entire expression, like: x = a+b+c+d.
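As I understand it, plain Numeric has no choice but to evaluate that pairwise, allocating a full-size temporary at every step -- in effect (an untested sketch of what happens under the hood):

t1 = a + b      # full-size temporary
t2 = t1 + c     # another temporary
x = t2 + d      # and finally the result array

You can dodge some of the temporaries by hand with the ufunc output argument -- if I remember the trick right, add(x, c, x) stores the result back into x -- but the real win comes from handing the whole expression to something that can fuse it into a single loop.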
That is exactly what Eric's compiler module does, and it would sure be easier if NumPy already used Blitz++! In fact, I suppose Eric's compiler is a start towards a tool that could compile an entire NumPy function or module. I'd love to be able to just do that (with some tweaking perhaps) rather than having to code it all by hand. My fantasies continue... -Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ From jochen at jochen-kuepper.de Mon Nov 26 16:34:01 2001 From: jochen at jochen-kuepper.de (Jochen =?iso-8859-1?q?K=FCpper?=) Date: Mon Nov 26 16:34:01 2001 Subject: [Numpy-discussion] Re: Numpy2 and GSL In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 26 Nov 2001 08:21:40 +0100 Martin Wiechert wrote: Martin> Are there any plans to wrap GSL for Numpy2? Martin> I did not actually try it (It's not Python ;-)), Martin> but it looks clean and powerful. There is actually a project to wrap gsl for python: http://pygsl.sourceforge.net/ It only provides wrappers for the special functions, but more is to come. (Hopefully Achim will put the cvs on sf soon.) Yes, I agree, PyGSL should be fully integrated with Numpy2, but it should probably also remain a separate project -- as Numpy should stay a base layer for all kinds of numerical stuff and hopefully make it into core python at some point (my personal wish, no more, AFAICT!). I think when PyGSL fully goes to SF (or anything similar) more people will start contributing and we should have a fine general numerical algorithms library for python soon! Greetings, Jochen - -- Einigkeit und Recht und Freiheit http://www.Jochen-Kuepper.de Liberté, Égalité, Fraternité GnuPG key: 44BCCD8E Sex, drugs and rock-n-roll -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: Processed by Mailcrypt and GnuPG iD8DBQE8At88iJ/aUUS8zY4RAikdAJ9184yaCSH+GtkDz2mLVlrSh7mjEQCdGSqA 2uhmBKRCFBb9eeq3gmmn9/Q= =64gm -----END PGP SIGNATURE----- From europax at home.com Mon Nov 26 17:36:16 2001 From: europax at home.com (Rob) Date: Mon Nov 26 17:36:16 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? References: <200111251244.fAPCiIj01855@localhost.localdomain> <3C028E87.82C57211@home.net> <200111261938.fAQJcmd01426@localhost.localdomain> <3C02ADB3.E314B8FB@home.net> Message-ID: <3C02ED76.F02F17D8@home.com> I'm currently testing the SciPy Blitz++ features with FDTD. Should have some comparisons soon. Right now my statements are compiling, but not giving the right answers :( I think they might have it fixed soon. Rob. Chris Barker wrote: > > Konrad Hinsen wrote: > > Chris Barker writes: > > > On another note, it looks like the blitz++ library might be a good basis > > > for a general Numerical library (and NumPy 3) as well. It does come > > > with a flexible license. Any thoughts? > > > I think the major question is whether we are willing to move to C++. > > And if we want to keep up any pretentions for Numeric becoming part of > > the Python core, this translates into whether Guido will accept C++ > > code in the Python core. > > Actually, It's worse than that.
Blitz++ makes heavy use of templates, > and thus only works with compilers that support that well. The current > Python core can compile under a very wide variety of compilers. I doubt > that Guido would want to change that. > > Personally, I'm torn. I would very much like to see NumPy arrays become > part of the core Python, but don't want to have to compromise what it > could be to do that. Another idea is to extend the SciPy project to > become a complete Python distribution, that would clearly include > Numeric. One download, and you have all you need. > > > >From a more pragmatic point of view, I wonder what the implications > > for efficiency would be. C++ used to be very different in their > > optimization abilities, is that still the case? Even more > > pragmatically, is blitz++ reasonably efficient with g++? > > I know g++ is supported (and I think it is their primary development > platform). From the web site: > > Is there a way to soup up C++ so that we can keep the advanced language > features but ditch the poor performance? This is the goal of the > Blitz++ project: to develop techniques which will enable C++ to rival -- > and in some cases even exceed -- the speed of Fortran for numerical > computing, while preserving an object-oriented interface. The Blitz++ > Numerical Library is being constructed as a testbed for these > techniques. > > Recent benchmarks show C++ encroaching steadily on Fortran's > high-performance monopoly, and for some benchmarks, C++ is even faster > than Fortran! These results are being obtained not through better > optimizing compilers, preprocessors, or language extensions, but through > the > use of template techniques. By using templates cleverly, optimizations > such as loop fusion, unrolling, tiling, and algorithm specialization can > be > performed automatically at compile time. > > see: http://www.oonumerics.org/blitz/whatis.html for more info. > > I havn't messed with it myself, but from the web page, it seems the > answer is yes, C++ can produce high performance code. > > -- > Christopher Barker, > Ph.D. > ChrisHBarker at home.net --- --- --- > http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ > ------@@@ ------@@@ ------@@@ > Oil Spill Modeling ------ @ ------ @ ------ @ > Water Resources Engineering ------- --------- -------- > Coastal and Fluvial Hydrodynamics -------------------------------------- > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- The Numeric Python EM Project www.members.home.net/europax From Achim.Gaedke at uni-koeln.de Tue Nov 27 00:20:02 2001 From: Achim.Gaedke at uni-koeln.de (Achim Gaedke) Date: Tue Nov 27 00:20:02 2001 Subject: [Numpy-discussion] Re: Numpy2 and GSL References: Message-ID: <3C034BFA.FBB64E94@uni-koeln.de> Ok, there is a clear need for the facility of easy contribution. Please be patient until Friday, December 7th. Then I have time to let it happen. It is right that the oficial site for this project is at pygsl.sourcefogrge.net (Brian Gough, can you change the link on the gsl homepage, thanks :-) ) But I will show some discussion points that must be clear before a cvs release: - Is the file and directory structure fully expandable, can several persons work parallel? - Should classes be created with excellent working objects or should it be a 1:1 wrapper? 
- Should there be one interface dynamic library or more than one? - Is there another way except that of the GPL (personally preferred, but other opinions should be discussed before the contribution of source)? Some questions of minor weight: - Is the tuple return value for (value,error) ok in the sf module? - Test cases are needed. These questions are the reason why I do not simply "copy" my code into cvs. Jochen Küpper wrote: > > It only provides wrappers for the special functions, but more is to > come. (Hopefully Achim will put the cvs on sf soon.) > > Yes, I agree, PyGSL should be fully integrated with Numpy2, but it > should probably also remain a separate project -- as Numpy should stay > a base layer for all kinds of numerical stuff and hopefully make it > into core python at some point (my personal wish, no more, AFAICT!). > > I think when PyGSL fully goes to SF (or anything similar) more > people will start contributing and we should have a fine general > numerical algorithms library for python soon! > I agree with Jochen and I'd like to move to the core of Python too. But this is far away and I hate monolithic distributions. If there is the need to discuss separately about PyGSL we can do that here or at the gsl-discuss list mailto:gsl-discuss at sources.redhat.com . But there is also the possibility of a mailing list at pygsl.sourceforge.net . Please let me know. From neelk at cswcasa.com Tue Nov 27 05:52:05 2001 From: neelk at cswcasa.com (Krishnaswami, Neel) Date: Tue Nov 27 05:52:05 2001 Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download Message-ID: Perry Greenfield [mailto:perry at stsci.edu] wrote: > > > > I know large datasets were one of your driving factors, but I really > > don't want to make performance on smaller datasets secondary. > > That's why we are asking, and it seems so far that there are enough > of those that do care about small arrays to spend the effort to > significantly improve the performance. Well, here's my application. I do data mining work, and one of the techniques I want to use Numpy for is to implement robust regression algorithms like least-trimmed-squares. Now for a k-variable regression, the best-of-breed algorithm for this involves taking hundreds of thousands of k-element samples and calculating the fitting hyperplane through them. Small matrix performance is thus something this program lives or dies by, and right now it seems like 'dies' is the right measure -- it is about 10x slower than the Gauss program that does the same thing. :( When I profiled it, it seemed like Numpy was spending almost all of its time in _castCopyAndTranspose. Switching to the Intel MKL LAPACK had no performance effect, but changing _castCopyAndTranspose into a C function was a 20% speed increase. If Numpy2 is even slower on small matrices I'd have to give up using it, and that's a shame: it's a *much* nicer environment than Gauss is. -- Neel Krishnaswami neelk at cswcasa.com From hungjunglu at yahoo.com Tue Nov 27 08:28:06 2001 From: hungjunglu at yahoo.com (Hung Jung Lu) Date: Tue Nov 27 08:28:06 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation In-Reply-To: Message-ID: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Hi, Thanks to Jon Saenz and Chris Barker for helping out with fast linear algebra and statistical distribution routines. Again, I have a tangential question.
I am hitting the physical limit of the CPU (meaning things have been optimized down to assembly level), in order to achieve even higher performance, the only way to go is hardware. Is there any recommendation for fast machines at the price range of a few thousand dollars? (I cannot afford supercomputers or connection machines.) My purpose is to run Monte Carlo simulation. This means that a lot of scenarios can be run in parallel fashion. Of course I can just use regular cheap Pentium boxes... but they are kind of bulky, and I don't need any of the video, audio, USB features (I think 10 machines at 1GHz each would be the size of calculation power I need, or equivalently, a single machine at an equivalent 10GHz. Heck, if there are some specialized racks/boxes, I can wire the motherboards myself.) I am wondering what you people do for heavy number crunching? Are there any cheap yet specialized machines? What about machines with dual processor? I would imagine a lot of people in the number crunching world run into my situation, and since the number crunching machines don't require much beyond a motherboard and a small hard-drive, maybe there are already some cheap solutions out there. thanks! Hung Jung __________________________________________________ Do You Yahoo!? Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1 From rossini at blindglobe.net Tue Nov 27 09:44:02 2001 From: rossini at blindglobe.net (A.J. Rossini) Date: Tue Nov 27 09:44:02 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation In-Reply-To: <20011127162705.40865.qmail@web12604.mail.yahoo.com> References: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Message-ID: <87vgfwdsao.fsf@jeeves.blindglobe.net> >>>>> "HJL" == Hung Jung Lu writes: HJL> Again, I have a tangential question. I am hitting the HJL> physical limit of the CPU (meaning things have been optimized HJL> down to assembly level), in order to achieve even higher HJL> performance, the only way to go is hardware. HJL> Is there any recommendation for fast machines at the price HJL> range of a few thousand dollars? (I cannot afford HJL> supercomputers or connection machines.) My purpose is to run HJL> Monte Carlo simulation. This means that a lot of scenarios HJL> can be run in parallel fashion. Of course I can just use HJL> regular cheap Pentium boxes... but they are kind of bulky, HJL> and I don't need any of the video, audio, USB features (I HJL> think 10 machines at 1GHz each would be the size of HJL> calculation power I need, or equivalently, a single machine HJL> at an equivalent 10GHz. Heck, if there are some specialized HJL> racks/boxes, I can wire the motherboards myself.) I am HJL> wondering what you people do for heavy number crunching? Are HJL> there any cheap yet specialized machines? What about machines HJL> with dual processor? I would imagine a lot of people in the HJL> number crunching world run into my situation, and since the HJL> number crunching machines don't require much beyond a HJL> motherboard and a small hard-drive, maybe there are already HJL> some cheap solutions out there. The usual way is to build some "blackboxes", i.e. mobo/cpu/memory/NIC, diskless or nearly diskless (you don't want to maintain machines :-). Connect them using 100bT or faster networks (though 100bT should be fine). Do such things exist? Sort of -- they tend to be more expensive than building them yourself, but if you've got a reliable local supplier, they can build them fairly cheaply for you. 
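The software half is the easy part, since this kind of Monte Carlo is embarrassingly parallel: each batch is an independent job that can be shipped to whichever box is free. A toy, untested sketch of what a single job might look like (names and numbers made up for illustration):

import random

def batch(seed, n):
    # one independent Monte Carlo batch (here: a crude pi estimate)
    rng = random.Random(seed)          # per-job generator state
    hits = 0
    for i in xrange(n):
        x, y = rng.random(), rng.random()
        if x*x + y*y < 1.0:
            hits = hits + 1
    return 4.0 * hits / n

# one (seed, n) pair per CPU; MOSIX or PyPVM would do the farming
results = [batch(seed, 100000) for seed in range(10)]

Whether the per-seed streams are independent enough is exactly the parallel RNG question I come back to below.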
I'd go with single or dual athlons, myself :-). If power and maintenance is an issue, duals, and if not, maybe singles. We use MOSIX (www.mosix.org) for transparent load balancing between linux machines, and it could be used on the machines I described (using a floppy or CD to boot). The next question is whether some form of parallel RNG will help. The answer is "maybe". I worked with a student who evaluated coupled chains, and we couldn't do too much better. And then after that, is whether you want to figure out how to post-process the results. If you want to automate the whole thing (and it isn't clear that it would be worth it, but...), you could use PyPVM to front-end the sub-processes distributed on the network, load-balanced at the system level by MOSIX. Now for the problems -- MOSIX seems to have difficulties with Python. Severe difficulties. I don't know if it still holds true for recent MOSIX releases. (note that I use R (www.r-project.org) for most of my simulation work these days, but am looking at Python for stat analyses, of which MCMC tools are of interest). best, -tony -- A.J. Rossini Rsrch. Asst. Prof. of Biostatistics U. of Washington Biostatistics rossini at u.washington.edu FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org -------------- http://software.biostat.washington.edu/ -------------- FHCRC: M-W: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email UW: T-Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX Rosen: (Mullins' Lab) Fridays, and I'm unreachable except by email. From chrishbarker at home.net Tue Nov 27 10:28:01 2001 From: chrishbarker at home.net (Chris Barker) Date: Tue Nov 27 10:28:01 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation References: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Message-ID: <3C03DF8D.3725E2A2@home.net> Hung Jung Lu wrote: > Is there any recommendation for fast machines at the > price range of a few thousand dollars? (I cannot > afford supercomputers or connection machines.) My > purpose is to run Monte Carlo simulation. This means > that a lot of scenarios can be run in parallel > fashion. Of course I can just use regular cheap > Pentium boxes... but they are kind of bulky, and I > don't need any of the video, audio, USB features (I I've been looking into setting up a system to do similar work, and it looks to me like the best bang for the buck right now are dual Athlon systems. If space is an important consideration, you can get dual Athlon 1U rack mount systems for less than $2000. I'm pretty sure the only dual Athlon board currently available (Tyan K7 thunder) has on board video, ethernet and SCSI, which means it cost a little more than it could, but these systems are still a pretty good deal if you get one without a hard drive (or a very cheap one). I just did quick web search, and epox is supposed to be coming out with a dual board as well, so there may be cheaper options soon. -Chris -- Christopher Barker, Ph.D. 
ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ From wsryu at fas.harvard.edu Tue Nov 27 15:52:04 2001 From: wsryu at fas.harvard.edu (William Ryu) Date: Tue Nov 27 15:52:04 2001 Subject: [Numpy-discussion] Hardware for Monte Carlo simulation In-Reply-To: <3C03DF8D.3725E2A2@home.net> References: <20011127162705.40865.qmail@web12604.mail.yahoo.com> Message-ID: <5.1.0.14.2.20011127184457.00aa3850@pop.fas.harvard.edu> At 10:46 AM 11/27/2001 -0800, Chris Barker wrote: >Hung Jung Lu wrote: > > Is there any recommendation for fast machines at the > > price range of a few thousand dollars? (I cannot > > afford supercomputers or connection machines.) My > > purpose is to run Monte Carlo simulation. This means > > that a lot of scenarios can be run in parallel > > fashion. Of course I can just use regular cheap > > Pentium boxes... but they are kind of bulky, and I > > don't need any of the video, audio, USB features (I > >I've been looking into setting up a system to do similar work, and it >looks to me like the best bang for the buck right now are dual Athlon >systems. If space is an important consideration, you can get dual Athlon >1U rack mount systems for less than $2000. I'm pretty sure the only dual >Athlon board currently available (Tyan K7 thunder) has on board video, >ethernet and SCSI, which means it cost a little more than it could, but >these systems are still a pretty good deal if you get one without a hard >drive (or a very cheap one). I just did quick web search, and epox is >supposed to be coming out with a dual board as well, so there may be >cheaper options soon. > >-Chris There is a cheaper dual CPU Tyan board which uses the same motherboard chipset. Its the Tyan Tiger-MP S2460, which doesn't have SCSI, onboard video, or Ethernet, but is half the price (around $200). -willryu From eric at enthought.com Tue Nov 27 16:16:02 2001 From: eric at enthought.com (eric) Date: Tue Nov 27 16:16:02 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing thesame Message-ID: <051001c17799$8bfa68b0$777ba8c0@ericlaptop> Hey group, Blitz++ is very cool, but I'm not sure it would make a very good underpinning for reimplementing Numeric. There are 2 (well maybe 3) main points. 1. Blitz++ declares arrays in the following way: The first issue deals with how you declare arrays in Blitz++. Array A(N,N,N); The big deal here is that the dimensionality of Array is a template parameter, not a constructor parameter. In other words, 2D arrays are effectively a different type than 3D arrays. Numeric, on the other hand represents arrays of all dimensions with a single class/type. For Python, this makes the most sense. I think you could fanagle some way of getting blitz to work, but I'm not sure it would be the desired elegant solution. I've also tinkered with building a simple C++ templated (non-blitz) implementation of Numeric for kicks, but kept coming back to using the dreaded void* to store the data arrays. I still haven't completely given up on a templated solution, but it wasn't as obvious as I thought it would be. 2. Compiling Blitz++ is slooooow. scipy.compiler spits out 200-300 line extension modules at the most. 
Depending on how complicated expressions are, it can take .5-1.5 minutes to compile a single extension function on an 850 MHz PIII. I can't imagine how long it would take to compile Numeric arrays for 1 through 11 dimensions (the most blitz supports as I remember) for all the different data types with 100s of extension functions. The cost wouldn't be linear because you do pay a one-time hit for some of the template instantiation. Also, I've heard gcc 3.0 might be better. Still, it'd be a painful development process. 3. Portability. This comes at two levels. The first is that blitz++ has heavy duty requirements of the compiler. gcc works fine, which is a huge plus, but a lot of other compilers don't. MSVC is the most notable of these because it is so heavily used on windows. The second level is the portability of C++ extension modules in general. I've run into this on windows, but I think it is an issue pretty much everywhere. For example, MSVC and GCC compiled C extension libraries can call each other on Windows because they are binary compatible. C++ classes are _not_ binary compatible. This has come up for me with wxPython. The standard version that Robin Dunn distributes is compiled with MSVC. If you build a small extension with gcc that makes wxPython calls, it'll link just fine, but seg-faults during execution. Does anyone know if the same sorta thing is true on the Unices? If it is, and Numeric was written in C++, then you'd have to compile extension modules that use Numeric arrays with the same compiler that was used to compile Numeric. This can lead to all sorts of hassles, and it has made me lean back towards C as the preferred language for something as fundamental as Numeric. (Note that I do like C++ for modules that don't really define an API called by other modules.) Ok, so maybe there's a 4th point. Paul D. pointed out that blitz isn't much of a win unless you have lazy evaluation (which scipy.compiler already provides). I also think improved speed _isn't_ the biggest goal of a reimplementation (although it can't be sacrificed either). I'm more excited about a code base that more people can comprehend. Perry G. et al's mixed Python/C implementation with the code generators is a very good idea and a step in this direction. I hope the speed issues for small arrays can be solved. I also hope the memory-mapped aspect doesn't complicate the code base much. see ya, eric From hinsen at cnrs-orleans.fr Wed Nov 28 00:09:03 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed Nov 28 00:09:03 2001 Subject: [Numpy-discussion] Meta: too many numerical libraries doing the same thing? Message-ID: <200111280808.fAS889g08217@localhost.localdomain> "eric" writes: > The standard version that Robin Dunn distributes is compiled with MSVC. If > you build a small > extension with gcc that makes wxPython calls, it'll link just fine, but > seg-faults during execution. > Does anyone know if the same sorta thing is true on the Unices? If it is, > and Numeric was written in C++, then you'd have to compile extension modules > that use Numeric arrays with the same compiler that was used to compile > Numeric. This can lead to all sorts of hassles, and it has made me lean If you rely on dynamic linking for cross-module calls, you'd have the same problem with Unix, as different compilers use different name-mangling schemes. One way around this would be to limit cross-module calls to C functions compiled with "C" linking.
Better yet, don't rely on dynamic linking at all and export a module's C API via a Python CObject, as described in the extension manual, and declare all symbols as static (except for the module initialization function of course). In my experience that is the only method that works on all platforms, with all compilers. Of course this also assumes that interfaces are at the C level. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From sag at hydrosphere.com Wed Nov 28 09:02:05 2001 From: sag at hydrosphere.com (Sue Giller) Date: Wed Nov 28 09:02:05 2001 Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array Message-ID: <20011128170140921.AAA253@mail.climatedata.com@SUEW2000> I posted the following inquiry to python-list at python.org earlier this week, but got no responses, so I thought I'd try a more focused group. I assume MA module falls under NumPy area. I am using 2 (and more) dimensional masked arrays with some numeric data, and using the reduce functionality on the arrays. I use the masking because some of the values in the arrays are 'missing' and should not be included in the results of the reduction. For example, assume a 5 x 2 array, with masked values for the 4th entry for both of the 2nd dimension cells. If I want to sum along the 2nd dimension, I would expect to get a 'missing' value for the 4th entry because both of the entries for the sum are 'missing'. Instead, I get 0, which might be a valid number in my data space, and the returned 1 dimensional array has no mask associated with it. Is this expected behavior for masked arrays or a bug or am I misusing the mask concept? Does anyone know how to get the reduction to produce a masked value? Example Code: >>> import MA >>> a = MA.array([[1,2,3,-99,5],[10,20,30,-99,50]]) >>> a [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]] >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) >>> r = MA.sum(m) >>> r array([11,22,33, 0,55,]) >>> t = MA.getmask(r) >>> print t None From paul at pfdubois.com Wed Nov 28 20:31:03 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Wed Nov 28 20:31:03 2001 Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array In-Reply-To: <20011128170140921.AAA253@mail.climatedata.com@SUEW2000> Message-ID: <000201c1788e$60359ce0$3d01a8c0@plstn1.sfba.home.com> [dubois at ldorritt ~]$ pydoc MA.sum Python Library Documentation: function sum in MA sum(a, axis=0, fill_value=0) Sum of elements along a certain axis using fill_value for missing. If you use add.reduce, you'll get what you want. >>> print m [[1 ,2 ,3 ,-- ,5 ,] [10 ,20 ,30 ,-- ,50 ,]] >>> MA.sum(m) array([11,22,33, 0,55,]) >>> MA.add.reduce(m) array(data = [ 11, 22, 33,-99, 55,], mask = [0,0,0,1,0,], fill_value=-99) In other words, sum(m, axis, fill_value) = add.reduce(filled(m, fill_value), axis) Surprising in your case. Still, both uses are quite common, so I probably was thinking to myself that since add.reduce already does one of the jobs, I might as well make sum do the other one. 
One could have just as well argued that one was a synonym for the other and so it is revolting to have them be different. Well, MA users, is this something I should change, or not? -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Sue Giller Sent: Wednesday, November 28, 2001 9:03 AM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array I posted the following inquiry to python-list at python.org earlier this week, but got no responses, so I thought I'd try a more focused group. I assume MA module falls under NumPy area. I am using 2 (and more) dimensional masked arrays with some numeric data, and using the reduce functionality on the arrays. I use the masking because some of the values in the arrays are 'missing' and should not be included in the results of the reduction. For example, assume a 5 x 2 array, with masked values for the 4th entry for both of the 2nd dimension cells. If I want to sum along the 2nd dimension, I would expect to get a 'missing' value for the 4th entry because both of the entries for the sum are 'missing'. Instead, I get 0, which might be a valid number in my data space, and the returned 1 dimensional array has no mask associated with it. Is this expected behavior for masked arrays or a bug or am I misusing the mask concept? Does anyone know how to get the reduction to produce a masked value? Example Code: >>> import MA >>> a = MA.array([[1,2,3,-99,5],[10,20,30,-99,50]]) >>> a [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]] >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) >>> r = MA.sum(m) >>> r array([11,22,33, 0,55,]) >>> t = MA.getmask(r) >>> print t None _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From giulio.bottazzi at libero.it Thu Nov 29 02:10:03 2001 From: giulio.bottazzi at libero.it (Giulio Bottazzi) Date: Thu Nov 29 02:10:03 2001 Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array References: <000201c1788e$60359ce0$3d01a8c0@plstn1.sfba.home.com> Message-ID: <3C05FDA2.AD9C5DCC@libero.it> My answer is yes: the difference between the two behaviors could be confusing for the user. If I can dare to express a "general rule", I would say that the masks in MA arrays should not disappear if not EXPLICITLY required to do so! Of course you can interpret a provided value for the fill_value parameter in the sum function as such a request... but if value is not provided, than I would say that the correct approach would be to keep the mask on (after all, what special about the value 0? For instance, if you have to take logarithm in the next step of the calculation, it is a rather bad choice!) Giulio. "Paul F. Dubois" wrote: > > [dubois at ldorritt ~]$ pydoc MA.sum > Python Library Documentation: function sum in MA > > sum(a, axis=0, fill_value=0) > Sum of elements along a certain axis using fill_value for missing. > > If you use add.reduce, you'll get what you want. 
> >>> print m > [[1 ,2 ,3 ,-- ,5 ,] > [10 ,20 ,30 ,-- ,50 ,]] > >>> MA.sum(m) > array([11,22,33, 0,55,]) > >>> MA.add.reduce(m) > array(data = > [ 11, 22, 33,-99, 55,], > mask = > [0,0,0,1,0,], > fill_value=-99) > > In other words, > sum(m, axis, fill_value) = add.reduce(filled(m, fill_value), axis) > > Surprising in your case. Still, both uses are quite common, so I > probably was thinking to myself that since add.reduce already does one > of the jobs, I might as well make sum do the other one. One could have > just as well argued that one was a synonym for the other and so it is > revolting to have them be different. > > Well, MA users, is this something I should change, or not? > > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Sue > Giller > Sent: Wednesday, November 28, 2001 9:03 AM > To: numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked > array > > I posted the following inquiry to python-list at python.org earlier this > week, but got no responses, so I thought I'd try a more focused > group. I assume MA module falls under NumPy area. > > I am using 2 (and more) dimensional masked arrays with some > numeric data, and using the reduce functionality on the arrays. I > use the masking because some of the values in the arrays are > 'missing' and should not be included in the results of the reduction. > > For example, assume a 5 x 2 array, with masked values for the 4th > entry for both of the 2nd dimension cells. If I want to sum along the > 2nd dimension, I would expect to get a 'missing' value for the 4th > entry because both of the entries for the sum are 'missing'. Instead, > I get 0, which might be a valid number in my data space, and the > returned 1 dimensional array has no mask associated with it. > > Is this expected behavior for masked arrays or a bug or am I > misusing the mask concept? Does anyone know how to get the > reduction to produce a masked value? > > Example Code: > >>> import MA > >>> a = MA.array([[1,2,3,-99,5],[10,20,30,-99,50]]) > >>> a > [[ 1, 2, 3,-99, 5,] > [ 10, 20, 30,-99, 50,]] > >>> m = MA.masked_values(a, -99) > >>> m > array(data = > [[ 1, 2, 3,-99, 5,] > [ 10, 20, 30,-99, 50,]], > mask = > [[0,0,0,1,0,] > [0,0,0,1,0,]], > fill_value=-99) > > >>> r = MA.sum(m) > >>> r > array([11,22,33, 0,55,]) > >>> t = MA.getmask(r) > >>> print t > None > > _______________________________________________ > Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From sag at hydrosphere.com Thu Nov 29 09:49:02 2001 From: sag at hydrosphere.com (Sue Giller) Date: Thu Nov 29 09:49:02 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <3C05FDA2.AD9C5DCC@libero.it> Message-ID: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Thanks for the pointer. The example I gave using the sum operation is merely an example - I could also be doing other manipulations such as min, max, average, etc. I see that the MA..reduce functions will do what I want, but to do an average, I will need to do two steps since the MA.average function will have the original 'unexpected' behavior that I don't want. 
That raises the question of how to determine a count of valid values in a masked array. Can I assume that I can do 'math' on the mask array itself, for example to sum along a given axis and have the masked cells add up? In my original example, I would expect a sum along the second axis to return [0,0,0,2,0]. Can I rely on this? I would suggest that a .count operator would be very useful in working with masked arrays (count valid and count masked). >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) To add an opinion on the question from Paul about 'expected' behavior, I was working off the documentation for Numerical Python, and there were no caveats in there about MA. working one way, and MA..reduce working another. The answer is always in the documentation, especially for users like me who don't have time or knkowledge to go reading thru all the code modules to try and figure out what is happening. From a purely user standpoint, I would expect a masked array to retain it's mask-edness at all times, unless I explicitly tell it not to. In that case, I would still expect it to replace the 'masked' cells with the original masked value, and not just arbitrarily assign some other value, such as 0. Thanks again for the prompt reply. From reggie at merfinllc.com Thu Nov 29 10:36:01 2001 From: reggie at merfinllc.com (Reggie Dugard) Date: Thu Nov 29 10:36:01 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Message-ID: > That raises the question of how to determine a count of valid values > in a masked array. Can I assume that I can do 'math' on the mask > array itself, for example to sum along a given axis and have the > masked cells add up? > > In my original example, I would expect a sum along the second axis > to return [0,0,0,2,0]. Can I rely on this? I would suggest that a > .count operator would be very useful in working with masked arrays > (count valid and count masked). Actually masked arrays already have a count method that does what you want: Python 2.2b2 (#26, Nov 16 2001, 11:44:11) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from pydoc import help >>> import MA >>> x = MA.arange(10) >>> help(x.count) Help on method count in module MA.MA: count(self, axis=None) method of MA.MA.MaskedArray instance Count of the non-masked elements in a, or along a certain axis. >>> x.count() 10 >>> From paul at pfdubois.com Thu Nov 29 12:54:02 2001 From: paul at pfdubois.com (Paul F. Dubois) Date: Thu Nov 29 12:54:02 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Message-ID: <000201c17917$ac5efec0$3d01a8c0@plstn1.sfba.home.com> You have misread my reply. It is not true that MA.op works one way and MA.op.reduce is different. sum and add.reduce are different, and the documentation for sum DOES say the right thing for sum. The function sum is a special case in that its native meaning was the same as add.reduce and so the function is redundant. I believe you are in error wrt average; average works the way you want. Function count can tell you the number of non-masked values either in the whole array or axis-wise if you give an axis argument. Function size gives you the total number, so #invalid is size(x)-count(x). 
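For the 2x5 example earlier in this thread, that would go something like this (untested, from memory, so take the exact output format with a grain of salt):

>>> m.count()
8
>>> m.count(axis=0)
array([2,2,2,0,2,])
>>> size(m) - m.count()
2

so the per-column number of masked cells you asked about is just 2 - m.count(axis=0), i.e. [0,0,0,2,0].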
maximum and minimum (don't use max and min, they are built-ins that don't know about Numeric) have two forms. When called with one argument they return the overall max or min of the whole array, returning masked only if all entries are masked. For two arguments, you get element-wise extrema, and the mask is on where any one of the arguments was masked. >>> print x [[1 ,-- ,3 ,] [11 ,-- ,-- ,]] >>> print average(x) [6.0 ,-- ,3.0 ,] >>> y array( [[ 6, 7, 8,] [ 9,10,11,]]) >>> print maximum(x,y) [[6 ,-- ,8 ,] [11 ,-- ,-- ,]] >>> y[0,0]=masked >>> print maximum(x,y) [[-- ,-- ,8 ,] [11 ,-- ,-- ,]] -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Sue Giller Sent: Thursday, November 29, 2001 9:50 AM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array Thanks for the pointer. The example I gave using the sum operation is merely an example - I could also be doing other manipulations such as min, max, average, etc. I see that the MA..reduce functions will do what I want, but to do an average, I will need to do two steps since the MA.average function will have the original 'unexpected' behavior that I don't want. That raises the question of how to determine a count of valid values in a masked array. Can I assume that I can do 'math' on the mask array itself, for example to sum along a given axis and have the masked cells add up? In my original example, I would expect a sum along the second axis to return [0,0,0,2,0]. Can I rely on this? I would suggest that a .count operator would be very useful in working with masked arrays (count valid and count masked). >>> m = MA.masked_values(a, -99) >>> m array(data = [[ 1, 2, 3,-99, 5,] [ 10, 20, 30,-99, 50,]], mask = [[0,0,0,1,0,] [0,0,0,1,0,]], fill_value=-99) To add an opinion on the question from Paul about 'expected' behavior, I was working off the documentation for Numerical Python, and there were no caveats in there about MA. working one way, and MA..reduce working another. The answer is always in the documentation, especially for users like me who don't have time or knkowledge to go reading thru all the code modules to try and figure out what is happening. From a purely user standpoint, I would expect a masked array to retain it's mask-edness at all times, unless I explicitly tell it not to. In that case, I would still expect it to replace the 'masked' cells with the original masked value, and not just arbitrarily assign some other value, such as 0. Thanks again for the prompt reply. _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From sag at hydrosphere.com Thu Nov 29 15:21:04 2001 From: sag at hydrosphere.com (Sue Giller) Date: Thu Nov 29 15:21:04 2001 Subject: [Numpy-discussion] Re: Using Reduce with Multi-dimensional Masked array In-Reply-To: <000201c17917$ac5efec0$3d01a8c0@plstn1.sfba.home.com> References: <20011129174809062.AAA210@mail.climatedata.com@SUEW2000> Message-ID: <20011129232011546.AAA269@mail.climatedata.com@SUEW2000> Paul, Well, you're right. I did misunderstand your reply, as well as what the various functions were supposed to do. I was mis-using the sum, minimum, maximum as tho they were MA..reduce, and my test case didn't point out the difference. I should always have been doing the .reduce version. I apologize for this! 
I found a section on page 45 of the Numerical Python text (PDF form, July 13, 2001) that defines sum as 'The sum function is a synonym for the reduce method of the add ufunc. It returns the sum of all the elements in the sequence given along the specified axis (first axis by default).' This is where I would expect to see a caveat about it not retaining any mask-edness. I was misussing the MA.minimum and MA.maximum as tho they were .reduce version. My bad. The MA.average does produce a masked array, but it has changed the 'missing value' to fill_value=[ 1.00000002e+020,]). I do find this a bit odd, since the other reductions didn't change the fill value. Anyway, I can now get the stats I want in a format I want, and I understand better the various functions for array/masked array. Thanks for the comments/input. sue From romberg at fsl.noaa.gov Fri Nov 30 11:30:04 2001 From: romberg at fsl.noaa.gov (Mike Romberg) Date: Fri Nov 30 11:30:04 2001 Subject: [Numpy-discussion] equal() and complex Message-ID: <15367.56879.54329.654575@smaug.fsl.noaa.gov> I'm wondering if there is some good reason why equal(), not_equal(), nonzero() and the like do not work with numeric arrays of tyep complex. I can see why operators like less() and less_equal() do not work. But the pure equality ones seem like they should work. Or am I missing something :). Thanks, Mike Romberg (romberg at fsl.noaa.gov) From hinsen at cnrs-orleans.fr Fri Nov 30 12:17:04 2001 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri Nov 30 12:17:04 2001 Subject: [Numpy-discussion] equal() and complex References: <15367.56879.54329.654575@smaug.fsl.noaa.gov> Message-ID: <200111302016.fAUKG9X01351@localhost.localdomain> Mike Romberg writes: > I'm wondering if there is some good reason why equal(), not_equal(), > nonzero() and the like do not work with numeric arrays of tyep > complex. I can see why operators like less() and less_equal() do not > work. But the pure equality ones seem like they should work. Or am I > missing something :). Before Python 2.1, comparison couldn't be implemented for equality only. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From europax at home.com Fri Nov 30 17:35:03 2001 From: europax at home.com (Rob) Date: Fri Nov 30 17:35:03 2001 Subject: [Numpy-discussion] Numeric Python EM Project has moved Message-ID: <3C083356.31E66685@home.com> Its now at www.pythonemproject.com. I can be reached at rob at pythonemproject.com. All this has come about since @home is possibly suspending operation at midnite tonight :( Rob. Looks like I need to change my sig too :) -- The Numeric Python EM Project www.members.home.net/europax