From just at letterror.com  Sat Apr 1 11:54:26 2000
From: just at letterror.com (Just van Rossum)
Date: Sat, 1 Apr 2000 17:54:26 +0100
Subject: [Numpy-discussion] multiarray & ufuncs in Python core?
Message-ID: 

Folks,

Whatever happened to the plans to fold multiarray and ufuncs into the core Python distribution? Did Guido decide he didn't want it? Or is it just that nobody came around to fulfilling the requirements, whatever they may be? Since Python 1.6 is only in alpha, it may not be too late... (Sorry if this has been answered before and I missed it.)

Just

From cgw at fnal.gov  Sat Apr 1 19:59:57 2000
From: cgw at fnal.gov (Charles G Waldman)
Date: Sat, 1 Apr 2000 18:59:57 -0600
Subject: [Numpy-discussion] setup.py vs python1.6
Message-ID: <200004020059.SAA11573@buffalo.fnal.gov>

I'm playing around with the brand-new Python 1.6 alpha (formerly known as 1.5.2+) and noticed a problem when installing Numeric into the new python1.6 directories - the "setup.py" that comes with Numeric has a hardcoded "python1.5" in it. Patch uploaded to the SourceForge patch page.

From vanandel at atd.ucar.edu  Mon Apr 3 19:33:00 2000
From: vanandel at atd.ucar.edu (Joe Van Andel)
Date: Mon, 03 Apr 2000 17:33:00 -0600
Subject: [Numpy-discussion] arraytypes.c : can't read descrs table
Message-ID: <38E92A2C.A0D6E724@atd.ucar.edu>

Using Python 1.5.2, gcc 2.95.2 and Numeric CVS as of 4/3/2000 on Solaris 7 (Sparc). When I attempt to import Numeric, I get a segmentation fault. After 'multiarray', '_numpy' and 'umath' are imported, the stack trace is:

#0 PyArray_DescrFromType (type=70) at Src/arraytypes.c:593
#1 0xfe768dd0 in PyArray_FromDims (nd=1, d=0xffbecfc8, type=70) at Src/arrayobject.c:416
#2 0xfe7952f0 in array_zeros (ignored=0x1, args=0x1) at Src/multiarraymodule.c:961
#3 0x1f72c in call_builtin (func=0xe6928, arg=0xedd40, kw=0x0) at ceval.c:2359
#4 0x1f5f8 in PyEval_CallObjectWithKeywords (func=0xe6928, arg=0xedd40, kw=0x0) at ceval.c:2324
#5 0x1dde0 in eval_code2 (co=0xe7ec8, globals=0x0, locals=0x83, args=0xffffffff, argcount=944424, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at ceval.c:1654
#6 0x1dc88 in eval_code2 (co=0xc9dc0, globals=0xfda98, locals=0xffffffff, args=0x1, argcount=1015296, kws=0x0, kwcount=0, defs=0xe6b2c, defcount=1, owner=0x0) at ceval.c:1612
#7 0x1dc88 in eval_code2 (co=0xf78f0, globals=0xfdc7c, locals=0x1, args=0x1, argcount=1014976, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at ceval.c:1612
#8 0x1b8a0 in PyEval_EvalCode (co=0xf78f0, globals=0xf0f70, locals=0xf0f70) at ceval.c:324
#9 0x277a8 in PyImport_ExecCodeModuleEx (name=0xffbede80 "Precision", co=0xf78f0, pathname=0xffbed4a0 "/usr/local/lib/python1.5/site-packages/Numeric/Precision.pyc") at import.c:485

gdb shows the fault right after line 596 in Src/arraytypes.c:

    if (type < PyArray_NTYPES) {
        return descrs[type];    /* type = 'F' */
    } else {
        switch(type) {
        case 'c': return descrs[PyArray_CHAR];
        case 'b': return descrs[PyArray_UBYTE];
        case '1': return descrs[PyArray_SBYTE];
        case 's': return descrs[PyArray_SHORT];
        case 'i': return descrs[PyArray_INT];
        case 'l': return descrs[PyArray_LONG];
        case 'f': return descrs[PyArray_FLOAT];
        case 'd': return descrs[PyArray_DOUBLE];
        case 'F': return descrs[PyArray_CFLOAT];

If I try to examine descrs[0], gdb says:

    (gdb) print descrs[0]
    Cannot access memory at address 0x0.

This is probably shared library weirdness, but I'm not sure how to fix it. Any ideas?
--
Joe VanAndel
National Center for Atmospheric Research
http://www.atd.ucar.edu/~vanandel/
Internet: vanandel at ucar.edu

From tchur at bigpond.com  Sat Apr 8 11:07:13 2000
From: tchur at bigpond.com (Tim Churches)
Date: Sun, 09 Apr 2000 01:07:13 +1000
Subject: [Numpy-discussion] NumPy and None (null, NaN, missing)
Message-ID: <38EF4B21.10D4438F@bigpond.com>

I'm a new user of NumPy so forgive me if this is a FAQ. I would normally check the list archives but I'm on holidays at the moment in Manila and the speed of the Internet connection here does not permit much Web browsing...

I've been experimenting with using Gary Strangman's excellent stats.py functions. The speed of these functions when operating on NumPy arrays and the ability of NumPy to swallow very large arrays is remarkable. However, one deficiency I have noticed is the lack of the ability to represent nulls (i.e. missing values, None or NaN [Not-a-Number]) in NumPy arrays. Missing values commonly occur in real-life statistical data and although they are usually excluded from most statistical calculations, it is important to be able to keep track of the number of missing data elements and report this.

Because NumPy arrays can't represent missing data via a special value, it is necessary to exclude missing data elements from NumPy arrays and keep track of them elsewhere (in standard Python lists). This is messy. Also, it is quite common to use various imputation techniques to estimate the values of missing data elements - the ability to represent missing data in a NumPy array and then change it to an imputed value would be a real boon.

Regards, Tim C
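A minimal sketch of the parallel-mask workaround described above, keeping the data and a validity mask in two Numeric arrays; the names and the sentinel value are illustrative, not from any posted code:

    import Numeric

    data = Numeric.array([1.2, 3.4, -1.0, 5.6], 'd')   # -1.0 stands in for "missing"
    valid = Numeric.array([1, 1, 0, 1])                # 1 = present, 0 = missing

    present = Numeric.compress(valid, data)            # drop the missing elements
    n_missing = len(data) - Numeric.sum(valid)
    mean = Numeric.sum(present) / len(present)

    # imputation: overwrite the missing slot, then mark it valid
    data[2] = mean
    valid[2] = 1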
From tchur at bigpond.com  Sat Apr 8 10:32:02 2000
From: tchur at bigpond.com (Tim Churches)
Date: Sun, 09 Apr 2000 00:32:02 +1000
Subject: [Numpy-discussion] NumPy, Python DB-API and MySQL
Message-ID: <38EF42E2.BD6946EF@bigpond.com>

I've been experimenting with pulling quantitative data out of a MySQL table into NumPy arrays via Andy Dustman's excellent MySQLdb module and then calculating various statistics from the data using Gary Strangman's excellent stats.py functions, which when operating on NumPy arrays are lightning-fast.

The problem is the speed with which data can be extracted from a column of a MySQL (or any other SQL database) query result set and stuffed into a NumPy array. This inevitably involves forming a Python list and then assigning that to a NumPy array. This is both slow and memory-hungry, especially with large datasets (I have been playing with a few million rows).

I was wondering if it would be feasible to initially add a method to the _mysql class in the MySQLdb module which iterated through a result set using a C routine (rather than a Python routine) and stuffed the data directly into a NumPy array (or arrays - one for each column in the result set) in one fell swoop (or even iterating row-by-row but in C)? I suspect that such a facility would be much faster than having to move the data into NumPy via a standard Python list (or actually via tuples within a list, which is the way the Python DB-API returns results).

If this direct MySQL-to-NumPy interface worked well, it might be desirable to add it to the Python DB-API specification for optional implementation in the other database modules which conform to the API. There are probably other extensions which would make the DB-API more useful for statistical applications, which tend to be set (column)-oriented rather than row-oriented - will post to the list as these occur to me.

Cheers, Tim Churches

PS I will be away for the next week so apologies in advance for not replying immediately to any follow-ups to this posting.

TC

From cgw at fnal.gov  Sat Apr 8 18:54:20 2000
From: cgw at fnal.gov (Charles G Waldman)
Date: Sat, 8 Apr 2000 17:54:20 -0500 (CDT)
Subject: [Numpy-discussion] __version__ hack in Matrix.py is busted
Message-ID: <14575.47260.499027.194394@buffalo.fnal.gov>

I can't import "Matrix" due to the following cruft:

    __id__ = """
    $Id: Matrix.py,v 1.1.1.1 2000/01/13 21:23:06 dubois Exp $
    """[1:-1]
    import string
    __version__ = int(__id__[string.index(__id__, '#')+1:-1])

You can't count on the CVS ID having a "#" character in it; each time the file is checked in and out of CVS the Id is rewritten. I don't think this trick can be made to work. I think what is needed is either a simpler, more failsafe way of setting __version__, or simply to eliminate __version__ altogether.
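One failsafe alternative, sketched here for discussion (this is not code from the NumPy tree): pull the revision number out of the keyword string with a regular expression and fall back gracefully when the keyword has not been expanded:

    import re

    __id__ = "$Id: Matrix.py,v 1.1.1.1 2000/01/13 21:23:06 dubois Exp $"

    # Look for the ",v MAJOR.MINOR..." part of the CVS keyword; every
    # expanded $Id$ contains it, unlike the "#" the current hack expects.
    _m = re.search(r',v (\d+(?:\.\d+)*)', __id__)
    __version__ = _m and _m.group(1) or 'unknown'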
From maechler at stat.math.ethz.ch  Mon Apr 10 04:29:17 2000
From: maechler at stat.math.ethz.ch (Martin Maechler)
Date: Mon, 10 Apr 2000 10:29:17 +0200 (CEST)
Subject: [Numpy-discussion] Re: NumPy and None (null, NaN, missing)
In-Reply-To: <200004081921.MAA29128@lists.sourceforge.net>
References: <200004081921.MAA29128@lists.sourceforge.net>
Message-ID: <14577.37085.686348.645721@lynne.ethz.ch>

>>>>> "TimC" == gestalt-system-discuss-admin writes:

TimC> Date: Sun, 09 Apr 2000 01:07:13 +1000
TimC> From: Tim Churches
TimC> Organization: Gestalt Institute
TimC> To: strang at nmr.mgh.harvard.edu, strang at bucky.nmr.mgh.harvard.edu,
TimC> gestalt-system-discuss at lists.sourceforge.net,
TimC> numpy-discussion at lists.sourceforge.net

TimC> I'm a new user of NumPy so forgive me if this is a FAQ. ......

TimC> I've been experimenting with using Gary Strangman's excellent stats.py
TimC> functions. The speed of these functions when operating on NumPy arrays
TimC> and the ability of NumPy to swallow very large arrays is remarkable.
TimC> However, one deficiency I have noticed is the lack of the ability
TimC> to represent nulls (i.e. missing values, None or NaN
TimC> [Not-a-Number]) in NumPy arrays. Missing values commonly occur in
TimC> real-life statistical data and although they are usually excluded
TimC> from most statistical calculations, it is important to be able to
TimC> keep track of the number of missing data elements and report
TimC> this.

I'm just a recent "listener" on gestalt-system-discuss, and don't even have any python experience. I'm a member of the R core team (www.r-project.org). In R (and even in S-plus, but almost invisibly there), we even differentiate between "NA" (missing / not available) and "NaN" (the IEEE result of 0/0, etc.). I'd very much like to have these differentiated as in R. I think our implementation of these is quite efficient, implementing NA as one particular bit pattern from the whole possible NaN set. We use code like the following (R source, src/main/arithmetic.c):

    static double R_ValueOfNA(void)
    {
        ieee_double x;
        x.word[hw] = 0x7ff00000;
        x.word[lw] = 1954;
        return x.value;
    }

    int R_IsNA(double x)
    {
        if (isnan(x)) {
            ieee_double y;
            y.value = x;
            return (y.word[lw] == 1954);
        }
        return 0;
    }

Martin Maechler http://stat.ethz.ch/~maechler/

TimC> Because NumPy arrays can't represent missing data via a
TimC> special value, it is necessary to exclude missing data elements
TimC> from NumPy arrays and keep track of them elsewhere (in standard
TimC> Python lists). This is messy. Also, it is quite common to use
TimC> various imputation techniques to estimate the values of missing
TimC> data elements - the ability to represent missing data in a NumPy
TimC> array and then change it to an imputed value would be a real
TimC> boon.
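The same bit pattern can be built and tested from Python with the struct module. In the R code above, hw and lw are the high/low 32-bit word indices of an IEEE double, so a rough Python equivalent is the following sketch (note that some FPUs may silently rewrite NaN payloads in transit, so this is illustrative rather than guaranteed portable):

    import struct

    def make_na():
        # high word 0x7ff00000 (NaN exponent), low word 1954 -- R's NA payload
        return struct.unpack('>d', struct.pack('>II', 0x7ff00000, 1954))[0]

    def is_na(x):
        if x != x:  # true only for NaNs
            hi, lo = struct.unpack('>II', struct.pack('>d', x))
            return lo == 1954
        return 0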
From pauldubois at home.com  Mon Apr 10 12:01:55 2000
From: pauldubois at home.com (Paul F. Dubois)
Date: Mon, 10 Apr 2000 09:01:55 -0700
Subject: [Numpy-discussion] Re: NumPy and None (null, NaN, missing)
In-Reply-To: <14577.37085.686348.645721@lynne.ethz.ch>
Message-ID: 

I have sent this out before but here it is again. It is a beta of a missing-observation class. Please help me refine it and complete it. I intend to add it to the numpy distribution since this facility is much-requested. MAtest.py shows how to use it. The intention is that it is used the same way you use a Numeric array, and in fact, if there are no masked values, there isn't a lot of overhead.

The basic concept is that each MA holds an array and a mask that indicates which values of the array are valid. Note the change in semantics for indexing shown below. Later I imagine creating a compiled extension class for bit masks to improve the space and time efficiency.

Paul

    # Note copy semantics here differ from Numeric
    def __getitem__(self, i):
        m = self.__mask
        if m is None:
            return Numeric.array(self.__data[i])
        else:
            return MA(Numeric.array(self.__data[i]), Numeric.array(m[i]))

    def __getslice__(self, i, j):
        m = self.__mask
        if m is None:
            return Numeric.array(self.__data[i:j])
        else:
            return MA(Numeric.array(self.__data[i:j]), Numeric.array(m[i:j]))
    # --------

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: MA.py
URL: 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: MAtest.py
URL: 

From adustman at comstar.net  Mon Apr 10 12:27:03 2000
From: adustman at comstar.net (Andy Dustman)
Date: Mon, 10 Apr 2000 12:27:03 -0400 (EDT)
Subject: [Numpy-discussion] Re: NumPy, Python DB-API and MySQL
In-Reply-To: <38EF42E2.BD6946EF@bigpond.com>
Message-ID: 

On Sun, 9 Apr 2000, Tim Churches wrote:

> I've been experimenting with pulling quantitative data out of a MySQL
> table into NumPy arrays via Andy Dustman's excellent MySQLdb module and
> then calculating various statistics from the data using Gary Strangman's
> excellent stats.py functions, which when operating on NumPy arrays are
> lightning-fast.
>
> The problem is the speed with which data can be extracted from a column
> of a MySQL (or any other SQL database) query result set and stuffed into
> a NumPy array. This inevitably involves forming a Python list and then
> assigning that to a NumPy array. This is both slow and memory-hungry,
> especially with large datasets (I have been playing with a few million
> rows).
>
> I was wondering if it would be feasible to initially add a method to the
> _mysql class in the MySQLdb module which iterated through a result set
> using a C routine (rather than a Python routine) and stuffed the data
> directly into a NumPy array (or arrays - one for each column in the
> result set) in one fell swoop (or even iterating row-by-row but in C)? I
> suspect that such a facility would be much faster than having to move
> the data into NumPy via a standard Python list (or actually via tuples
> within a list, which is the way the Python DB-API returns results).
>
> If this direct MySQL-to-NumPy interface worked well, it might be
> desirable to add it to the Python DB-API specification for optional
> implementation in the other database modules which conform to the API.
> There are probably other extensions which would make the DB-API more
> useful for statistical applications, which tend to be set
> (column)-oriented rather than row-oriented - will post to the list as
> these occur to me.

It might be possible to do something like this. I would prefer that such a feature work as a separate module (i.e. I don't think it is generally applicable to MySQLdb/_mysql). Or perhaps it could be a compile-time option for _mysql (-DUSE_NUMPY). The object that you want to mess with is the _mysql result object. It contains an attribute MYSQL_RES *result, which is a pointer to the actual MySQL structure. I don't remember if NumPy arrays are extensible or not, i.e. can rows be appended? That would affect the design.
If they are not extensible, then you are probably limited to using mysql_store_result() (result set stored on the client side), as opposed to mysql_use_result() (result set stored on the server side). mysql_store_result is probably preferable in this case anyway, so extensibility doesn't matter, as we can find the size of the result set in advance with mysql_num_rows(). Then we know the full size of the array.

However, with very large result sets, it may be necessary to use mysql_use_result(), in which case the array will need to be extended, possibly row-by-row. I could do this, but I need to know how to create and assign values to a NumPy array from within C. Or perhaps an initial (empty) array with the correct number of columns can be passed. I am pretty sure NumPy arrays look like sequences (of sequences), so assignment should not be a big problem. Easiest solution (for me, and puts least bloat in _mysql) would be for the user to pass in a NumPy array.

Question: Would it be adequate to put all columns returned into the array? If label columns need to be returned, this could pose a problem. They may have to be returned as a separate query. Or else non-numeric columns would be excluded and returned in a list of tuples (this would be harder).

I suspect the existing cursor.executemany() is capable of INSERTing and UPDATEing NumPy arrays.

--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!"

From kern at its.caltech.edu  Tue Apr 11 19:16:19 2000
From: kern at its.caltech.edu (Robert Kern)
Date: Tue, 11 Apr 2000 16:16:19 -0700 (PDT)
Subject: [Numpy-discussion] Request for Datasets
Message-ID: 

Hello,

I'm working on a Multipack module to use ODRPACK for non-linear regression problems. ODRPACK is available from NETLIB if you want information: http://www.netlib.org/odrpack/index.html

I'm in the debugging phase right now, and I want to be able to test my interfaces to most if not all of ODRPACK's capabilities. Consequently, I need datasets with some of the following properties:

* multiple inputs (or a vector of inputs)
* multiple responses (or a vector of responses)
* errors/weights on either the responses, inputs, or both
* covariance matrices for the errors on responses/inputs/both (in the case of multiple inputs/responses)
* any differentiable functional form that I can make a Python function compute using Numpy and SpecialFuncs ufuncs (and maybe a few others)
* problems where it is sensible to fix particular parameters and estimate the others
* problems where it is sensible to fix particular datapoints (e.g. boundary conditions)
* problems where some datapoints should be removed

I would be much obliged if any of you could send me datasets that have some of these characteristics. I would prefer them to be in something parsable by Python, either a simple plaintext format or even NetCDF to goad me into learning how to use Konrad's NetCDF interface. A description of the function to fit to is necessary, and a brief description of the problem and perhaps even the expected answers would be nice.

*** Please e-mail these to me and not the list. If you would like more information or clarification, please e-mail me.

Many thanks for your time and possible contribution.
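For anyone preparing such a dataset, one plaintext layout that is trivially parsable by Python is whitespace-separated columns (here x, y, weight; the layout is only a suggestion, not anything ODRPACK or Multipack prescribes), with a loader along these lines:

    import Numeric

    def load_dataset(filename):
        # File format: one "x y weight" triple per line; '#' starts a comment.
        x, y, w = [], [], []
        for line in open(filename).readlines():
            line = line.strip()
            if not line or line[0] == '#':
                continue
            fields = line.split()
            x.append(float(fields[0]))
            y.append(float(fields[1]))
            w.append(float(fields[2]))
        return Numeric.array(x), Numeric.array(y), Numeric.array(w)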
--
Robert Kern
kern at caltech.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From godzilla at netmeg.net  Wed Apr 12 15:59:20 2000
From: godzilla at netmeg.net (Les Schaffer)
Date: Wed, 12 Apr 2000 15:59:20 -0400 (EDT)
Subject: [Numpy-discussion] compiling extensions with cygwin???
Message-ID: <14580.54680.536718.237210@gargle.gargle.HOWL>

i need to compile a simple Python extension, which uses NumPy/C API, on WinXX. does anyone have (bad/good) experience compiling NumPy extensions with the cygwin or mingw32 compilers? i am trying to decide whether i need to purchase VC++ or not. many thanks

les schaffer

From kern at its.caltech.edu  Wed Apr 12 18:17:52 2000
From: kern at its.caltech.edu (Robert Kern)
Date: Wed, 12 Apr 2000 15:17:52 -0700 (PDT)
Subject: [Numpy-discussion] compiling extensions with cygwin???
In-Reply-To: <14580.54680.536718.237210@gargle.gargle.HOWL>
Message-ID: 

On Wed, 12 Apr 2000, Les Schaffer wrote:

> i need to compile a simple Python extension, which uses NumPy/C API,
> on WinXX.
>
> does anyone have (bad/good) experience compiling NumPy extensions with
> the cygwin or mingw32 compilers? i am trying to decide whether i need to
> purchase VC++ or not.

http://starship.python.net/crew/kernr/mingw32/Notes.html

Don't bother buying anything.

> many thanks
>
> les schaffer

--
Robert Kern
kern at caltech.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From jbaddor at physics.mcgill.ca  Thu Apr 13 14:02:46 2000
From: jbaddor at physics.mcgill.ca (Jean-Bernard Addor)
Date: Thu, 13 Apr 2000 14:02:46 -0400 (EDT)
Subject: [Numpy-discussion] multiprocessor machine
Message-ID: 

Hey Numpy people! I just put a second processor in my computer and it seems Numpy doesn't use it. Is Numpy able to use 2 processors? From which version does this work?

Jean-Bernard

From tchur at bigpond.com  Fri Apr 14 07:25:33 2000
From: tchur at bigpond.com (Tim Churches)
Date: Fri, 14 Apr 2000 21:25:33 +1000
Subject: [Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL
References: 
Message-ID: <38F7002C.BE9610AA@bigpond.com>

Andy Dustman wrote:
>
> On Sun, 9 Apr 2000, Tim Churches wrote:
>
> > I've been experimenting with pulling quantitative data out of a MySQL
> > table into NumPy arrays via Andy Dustman's excellent MySQLdb module and
> > then calculating various statistics from the data using Gary Strangman's
> > excellent stats.py functions, which when operating on NumPy arrays are
> > lightning-fast.
> >
> [...snip...]
>
> It might be possible to do something like this. I would prefer that such a
> feature work as a separate module (i.e. I don't think it is generally
> applicable to MySQLdb/_mysql). Or perhaps it could be a compile-time
> option for _mysql (-DUSE_NUMPY).

The latter sounds good. I agree that most users of MySQLdb would not need it, so they shouldn't be burdened with it.

> The object that you want to mess with is the _mysql result object. It
> contains an attribute MYSQL_RES *result, which is a pointer to the actual
> MySQL structure. I don't remember if NumPy arrays are extensible or not,
> i.e. can rows be appended?

No they can't. I suspect that is the price to be paid for the efficient storage offered by NumPy arrays.

> That would affect the design. If they are not
> extensible, then you are probably limited to using mysql_store_result()
> (result set stored on the client side), as opposed to mysql_use_result()
> (result set stored on the server side).
> mysql_store_result is probably preferable in this case anyway, so
> extensibility doesn't matter, as we can find the size of the result set
> in advance with mysql_num_rows(). Then we know the full size of the array.

Yes, but the problem with mysql_store_result() is the large amount of memory required to store the result set. Couldn't the user be responsible for predetermining the size of the array via a query such as "select count(*) from sometable where...." and then pass this value as a parameter to the executeNumPy() method? In MySQL at least such count(*) queries are resolved very quickly so such an approach wouldn't take twice the time. Then mysql_use_result() could be used to populate the initialised NumPy array with data row by row, so there is only ever one complete copy of the data in memory, and that copy is in the NumPy array.

> However, with very large result sets, it may be necessary to use
> mysql_use_result(), in which case the array will need to be extended,
> possibly row-by-row.
>
> I could do this, but I need to know how to create and assign values to a
> NumPy array from within C. Or perhaps an initial (empty) array with the
> correct number of columns can be passed. I am pretty sure NumPy arrays
> look like sequences (of sequences), so assignment should not be a big
> problem. Easiest solution (for me, and puts least bloat in _mysql) would
> be for the user to pass in a NumPy array.

I'll look at the NumPy docs re this. Can any of the NumPy developers give some clues re this?

> Question: Would it be adequate to put all columns returned into the array?
> If label columns need to be returned, this could pose a problem. They may
> have to be returned as a separate query. Or else non-numeric columns would
> be excluded and returned in a list of tuples (this would be harder).

Yes, more thought needed here - my initial thought was one NumPy array per column, particularly since NumPy arrays must be homogeneous wrt data type. Each NumPy array could be named the same as the column from which it is derived.

Cheers, Tim C

From hinsen at cnrs-orleans.fr  Fri Apr 14 12:45:07 2000
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri, 14 Apr 2000 18:45:07 +0200
Subject: [Numpy-discussion] multiprocessor machine
In-Reply-To:  (message from Jean-Bernard Addor on Thu, 13 Apr 2000 14:02:46 -0400 (EDT))
References: 
Message-ID: <200004141645.SAA09618@chinon.cnrs-orleans.fr>

> I just put a second processor in my computer and it seems Numpy
> doesn't use it.
>
> Is Numpy able to use 2 processors? From which version does this work?

NumPy uses only one processor, and I am not even sure I'd want to change that. I use biprocessor machines as well and I have adapted my time-critical code to them (parallelization via threading), but the parallelization is almost always at a higher level than NumPy operations. In other words, I give one NumPy operation to each processor rather than have both work on the same NumPy operation.

I'd prefer to build a parallelizing general-purpose library on top of NumPy, ideally supporting message passing as well. Would anyone else be interested in this? I have a nicely packaged MPI support module for Python (to be released next week in a new version of ScientificPython), so that part is already done.

Which reminds me: there once was a parallelization project mailing list on the old Starship, which disappeared during the move due to a minor accident. Is there interest to revive it?
I now have a cluster of 20 biprocessors to feed, and I'd like to provide it with only the best: Python code ;-)

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------
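A bare-bones illustration of the higher-level split Konrad describes -- hand each processor its own chunk and its own NumPy operation. This is only a sketch of the pattern, with illustrative names: the second thread can actually run in parallel only if the underlying operation releases the global interpreter lock, which plain Numeric ufuncs generally do not, so any speedup is conditional on that:

    import threading
    import Numeric

    def work(a, out, slot):
        out[slot] = Numeric.sum(a * a)   # stand-in for a time-critical operation

    data = Numeric.arange(2000000) * 1.0
    half = int(len(data) / 2)
    results = [0.0, 0.0]

    t = threading.Thread(target=work, args=(data[:half], results, 0))
    t.start()
    work(data[half:], results, 1)        # main thread takes the other half
    t.join()
    total = results[0] + results[1]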
From ransom at cfa.harvard.edu  Fri Apr 14 12:53:38 2000
From: ransom at cfa.harvard.edu (Scott M. Ransom)
Date: Fri, 14 Apr 2000 12:53:38 -0400
Subject: [Numpy-discussion] multiprocessor machine
References: <200004141645.SAA09618@chinon.cnrs-orleans.fr>
Message-ID: <38F74D12.DF81F59A@cfa.harvard.edu>

Konrad Hinsen wrote:
>
> NumPy uses only one processor, and I am not even sure I'd want to
> change that. I use biprocessor machines as well and I have adapted my
> time-critical code to them (parallelization via threading), but
> the parallelization is almost always at a higher level than
> NumPy operations. In other words, I give one NumPy operation to each
> processor rather than have both work on the same NumPy operation.

I do the same thing. And I agree about not wanting it the other way (although an option for this might be nice...)

> I'd prefer to build a parallelizing general-purpose library on top
> of NumPy, ideally supporting message passing as well. Would anyone
> else be interested in this? I have a nicely packaged MPI support
> module for Python (to be released next week in a new version of
> ScientificPython), so that part is already done.

I am certainly interested. In fact, I have also written an MPI support module. Maybe when I see yours I will be able to add some stuff... I'm making the assumption that yours is probably more flexible than mine...

> Which reminds me: there once was a parallelization project mailing
> list on the old Starship, which disappeared during the move due to a
> minor accident. Is there interest to revive it? I now have a
> cluster of 20 biprocessors to feed, and I'd like to provide it with
> only the best: Python code ;-)

Once again, I'm in...

Scott
--
Scott M. Ransom
Address: Harvard-Smithsonian CfA, 60 Garden St. MS 10, Cambridge, MA 02138
Phone: (617) 495-4142
email: ransom at cfa.harvard.edu
PGP Fingerprint: D2 0E D0 10 CD 95 06 DA EF 78 FE 2B CB 3A D3 53

From eq3pvl at eq.uc.pt  Tue Apr 18 09:36:32 2000
From: eq3pvl at eq.uc.pt (Pedro Vale Lima)
Date: Tue, 18 Apr 2000 14:36:32 +0100
Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"!
References: <200004181322.GAA09316@lists.sourceforge.net>
Message-ID: <38FC64E0.E1D2EFCD@eq.uc.pt>

Hello, can someone give me a hand? I'm porting some code and I need to do QR decomposition. I couldn't find such a function in Numpy. As I remember LAPACK has one, isn't it part of the python interface?

thanks, pedro vale lima

--
University of Coimbra, Portugal
eq3pvl at eq.uc.pt

From hinsen at cnrs-orleans.fr  Tue Apr 18 11:18:42 2000
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Tue, 18 Apr 2000 17:18:42 +0200
Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"!
In-Reply-To: <38FC64E0.E1D2EFCD@eq.uc.pt> (message from Pedro Vale Lima on Tue, 18 Apr 2000 14:36:32 +0100)
References: <200004181322.GAA09316@lists.sourceforge.net> <38FC64E0.E1D2EFCD@eq.uc.pt>
Message-ID: <200004181518.RAA20451@chinon.cnrs-orleans.fr>

> Can someone give me a hand? I'm porting some code and I need to do
> QR decomposition. I couldn't find such a function in Numpy.
> As I remember LAPACK has one, isn't it part of the python interface?

There is a lot more in LAPACK than is covered by the high-level Python interface (Module LinearAlgebra). There is, however, a complete low-level interface to all of LAPACK and BLAS, written eons ago by Doug Heisterkamp. You can pick up an updated copy at ftp://dirac.cnrs-orleans.fr/pub/PyLapack.tar.gz

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------

From eq3pvl at eq.uc.pt  Tue Apr 18 11:43:58 2000
From: eq3pvl at eq.uc.pt (Pedro Vale Lima)
Date: Tue, 18 Apr 2000 16:43:58 +0100
Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"!
References: <200004181322.GAA09316@lists.sourceforge.net> <38FC64E0.E1D2EFCD@eq.uc.pt> <200004181518.RAA20451@chinon.cnrs-orleans.fr>
Message-ID: <38FC82BE.4E51F15B@eq.uc.pt>

Konrad Hinsen wrote:
> > Can someone give me a hand? I'm porting some code and I need to do
> > QR decomposition. I couldn't find such a function in Numpy.
> > As I remember LAPACK has one, isn't it part of the python interface?
> There is a lot more in LAPACK than is covered by the high-level Python
> interface (Module LinearAlgebra). There is, however, a complete low-level
> interface to all of LAPACK and BLAS, written eons ago by Doug Heisterkamp.
> You can pick up an updated copy at
> ftp://dirac.cnrs-orleans.fr/pub/PyLapack.tar.gz
>
> Konrad.

Thanks. Meanwhile I wrote QR in python. I'll change to that interface to get some speed improvement. Just to satisfy my curiosity, is it a design decision to keep LinearAlgebra small, or just waiting for someone to contribute more bindings?

pedro
--
Pedro Vale Lima
University of Coimbra

From vanandel at atd.ucar.edu  Tue Apr 18 11:59:49 2000
From: vanandel at atd.ucar.edu (Joe Van Andel)
Date: Tue, 18 Apr 2000 09:59:49 -0600
Subject: [Numpy-discussion] Numeric Python release?
Message-ID: <38FC8675.B41F2099@atd.ucar.edu>

sourceforge.net shows the latest release of Numeric Python as v 15.2, dated 1/19/2000. Could I encourage the Numeric Python developers to release a more recent version of Numeric Python? I know the latest is always available from CVS, but I'm sure that there are people who aren't ready to deal with CVS, just to get a current version of Numeric.

--
Joe VanAndel
National Center for Atmospheric Research
http://www.atd.ucar.edu/~vanandel/
Internet: vanandel at ucar.edu

From hinsen at dirac.cnrs-orleans.fr  Wed Apr 19 09:45:11 2000
From: hinsen at dirac.cnrs-orleans.fr (hinsen at dirac.cnrs-orleans.fr)
Date: Wed, 19 Apr 2000 15:45:11 +0200
Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"!
In-Reply-To: <38FC82BE.4E51F15B@eq.uc.pt> (message from Pedro Vale Lima on Tue, 18 Apr 2000 16:43:58 +0100)
References: <200004181322.GAA09316@lists.sourceforge.net> <38FC64E0.E1D2EFCD@eq.uc.pt> <200004181518.RAA20451@chinon.cnrs-orleans.fr> <38FC82BE.4E51F15B@eq.uc.pt>
Message-ID: <200004191345.PAA26395@chinon.cnrs-orleans.fr>

> Meanwhile I wrote QR in python. I'll change to that interface to get
> some speed improvement. Just to satisfy my curiosity, is it a design
> decision to keep LinearAlgebra small, or just waiting for someone to
> contribute more bindings?

Not speaking for the NumPy maintainers, but I am sure it's the latter. I don't see what harm could be done by having more operations in LinearAlgebra. On the other hand, if I wanted to support everything in LAPACK, I would use several modules or, better yet, a package.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------
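For the curious, a QR factorization like the one Pedro describes can be written compactly in pure Python with Numeric using modified Gram-Schmidt. This is an unoptimized sketch for illustration only -- it is not Pedro's code and not what PyLapack provides:

    import math
    import Numeric

    def qr(a):
        # a is an m x n array with m >= n and full column rank;
        # returns q (m x n, orthonormal columns) and r (n x n, upper triangular)
        a = Numeric.array(a, 'd')
        m, n = a.shape
        q = Numeric.zeros((m, n), 'd')
        r = Numeric.zeros((n, n), 'd')
        for j in range(n):
            v = Numeric.array(a[:, j])       # working copy of column j
            for i in range(j):
                r[i, j] = Numeric.dot(q[:, i], v)
                v = v - r[i, j] * q[:, i]
            r[j, j] = math.sqrt(Numeric.dot(v, v))
            q[:, j] = v / r[j, j]
        return q, r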
From hinsen at cnrs-orleans.fr  Fri Apr 21 08:48:25 2000
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri, 21 Apr 2000 14:48:25 +0200
Subject: [Numpy-discussion] ScientificPython 2.1.0 with MPI interface
Message-ID: <200004211248.OAA31247@chinon.cnrs-orleans.fr>

I have just put ScientificPython 2.0.1 and 2.1.0 on my FTP server, ftp://dirac.cnrs-orleans.fr/pub/ScientificPython/ while Starship is recovering. Version 2.0.1 is mostly a bugfix release, with only minor additions. 2.1.0 is identical to 2.0.1 except for the addition of an MPI interface module. I have tested this on only one platform (Linux/Intel running MPICH), so I would be very interested in feedback from people running different MPI platforms.

MPI support in ScientificPython is still very basic; there are probably more complete MPI interfaces out there. The strong point of ScientificPython's interface is the integration into Python: communicators are Python objects, all communication happens via methods defined on communicator objects, and support is provided for sending and receiving both string and NumPy array objects. Moreover, Python scripts can rather easily be written in such a way that they work both with and without MPI support, of course using only a single processor when no MPI is available. Finally, there is a full C API as well, which means that other C modules can make use of MPI without having to link to the MPI library, which is particularly useful for dynamic library modules. It also facilitates packaging of MPI-based code, which doesn't need to know anything at all about the MPI library.

Happy Easter, Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------
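A sketch of the "works with or without MPI" pattern the announcement describes. The module path and the rank/size attributes below follow the announcement's description, but treat every name here as an assumption to be checked against the released module:

    import Numeric
    try:
        from Scientific import MPI
        comm = MPI.world.duplicate()   # assumed communicator object
        rank, size = comm.rank, comm.size
    except ImportError:
        comm = None                    # no MPI: fall back to one processor
        rank, size = 0, 1

    # Each process works on its own strided slice of the data.
    data = Numeric.arange(1000000) * 1.0
    partial = Numeric.sum(data[rank::size] ** 2)

    # Combining the partial sums across processes would use the
    # communicator's send/receive methods -- again, check the released
    # module for the exact spelling.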
From adustman at comstar.net  Mon Apr 24 16:19:44 2000
From: adustman at comstar.net (Andy Dustman)
Date: Mon, 24 Apr 2000 16:19:44 -0400 (EDT)
Subject: [Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL
In-Reply-To: <38F7002C.BE9610AA@bigpond.com>
Message-ID: 

On Fri, 14 Apr 2000, Tim Churches wrote:
> Andy Dustman wrote: [...snip...]
>
> Yes, but the problem with mysql_store_result() is the large amount of
> memory required to store the result set. Couldn't the user be
> responsible for predetermining the size of the array via a query such as
> "select count(*) from sometable where...." and then pass this value as a
> parameter to the executeNumPy() method? In MySQL at least such count(*)
> queries are resolved very quickly so such an approach wouldn't take
> twice the time. Then mysql_use_result() could be used to populate the
> initialised NumPy array with data row by row, so there is only ever one
> complete copy of the data in memory, and that copy is in the NumPy
> array.

After some more thought on this subject, and some poking around at NumPy, I came to the following conclusions: Since NumPy arrays are fixed-size, but otherwise sequences (in the multi-dimensional case, sequences of sequences), the best approach would be for the user to pass in a pre-sized array (i.e. from zeros(), and btw, the docstring for zeros is way wrong), and _mysql would simply access it through the Sequence object protocol, and update as many values as it could: If you passed a 100-row array, it would fill 100 rows or as many as were in the result set, whichever is less. Since this requires no special knowledge of NumPy, it could be a standard addition (no conditional compilation required). This method (tentatively _mysql.fetch_rows_into_array(array)) would return the array argument as the result. IndexError would likely be raised if the array was too narrow (too many columns in result set). Probably this would not be a MySQLdb.Cursor method, but perhaps I can have a separate module with a cursor subclass which returns NumPy arrays.

> > Question: Would it be adequate to put all columns returned into the array?
> > If label columns need to be returned, this could pose a problem. They may
> > have to be returned as a separate query. Or else non-numeric columns would
> > be excluded and returned in a list of tuples (this would be harder).
>
> Yes, more thought needed here - my initial thought was one NumPy array
> per column, particularly since NumPy arrays must be homogeneous wrt data
> type. Each NumPy array could be named the same as the column from which
> it is derived.

Okay, I think I know what you mean here. You are wanting to return each column as a (vertical) vector, whereas I am thinking along the lines of returning the result set as a matrix. Is that correct? Since it appears you can efficiently slice out column vectors as a[:,n], is my idea acceptable? i.e.

    >>> a=Numeric.multiarray.zeros( (2,2),'d')
    >>> a[1,1]=2
    >>> a[0,1]=-1
    >>> a[1,0]=-3
    >>> a
    array([[ 0., -1.],
           [-3.,  2.]])
    >>> a[:,0]
    array([ 0., -3.])
    >>> a[:,1]
    array([-1.,  2.])

--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!"

From tchur at bigpond.com  Fri Apr 28 18:32:44 2000
From: tchur at bigpond.com (Tim Churches)
Date: Sat, 29 Apr 2000 08:32:44 +1000
Subject: [Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL
References: 
Message-ID: <390A118C.FB60517D@bigpond.com>

Andy Dustman wrote:
[...snip...]
>
> Okay, I think I know what you mean here. You are wanting to return each
> column as a (vertical) vector, whereas I am thinking along the lines of
> returning the result set as a matrix. Is that correct?

Yes, exactly.

> Since it appears
> you can efficiently slice out column vectors as a[:,n], is my idea
> acceptable? i.e.
>
> >>> a=Numeric.multiarray.zeros( (2,2),'d')
> >>> a[1,1]=2
> >>> a[0,1]=-1
> >>> a[1,0]=-3
> >>> a
> array([[ 0., -1.],
>        [-3.,  2.]])
> >>> a[:,0]
> array([ 0., -3.])
> >>> a[:,1]
> array([-1.,  2.])

The only problem is that NumPy arrays must be homogeneous wrt type, which means that, say, a categorical column containing just a few distinct values stored as an integer would have to be upcast to a double in the NumPy matrix if it was part of a query which also returned a float.

Would it be possible to extend your idea of passing in an array to the query? Perhaps the user could pass in a list of pre-existing, pre-sized sequence objects (which might be rank-1 NumPy arrays of various appropriate data types or Python tuples) which correspond to the columns which are to be returned by the SQL query. It would be up to the user to determine the correct type for each NumPy array and to size the array or tuples correctly. The reason for needing tuples as well as NumPy arrays is that, as you mention, NumPy arrays only support numbers. The intention would be for all of this to be wrapped in a class which may issue a number of small queries to determine the number of rows to be returned and the data types of the columns, so the user is shielded from having to work out these details. The only bit that has to be written in C is the function which takes the sequence of sequences (NumPy arrays or Python tuples) in which to store the query results, column-wise, and stuffs the value for each column for each row of the result set into the appropriate passed-in sequence object. I would be more than happy to assist with the Python code, testing and documentation but my C skills aren't up to helping with the guts of it. In other words, making this part of the low-level _mysql interface would be sufficient.

Cheers, Tim C
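A pure-Python sketch of the per-column wrapper Tim outlines, pre-sizing one Numeric array per numeric column and filling row by row. The proposed C routine would replace the inner loop; the function name, the typecode argument, and the use of count(*) to pre-size are illustrative assumptions, not an actual MySQLdb API:

    import string
    import Numeric

    def fetch_columns(conn, table, columns, typecodes, where="1=1"):
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM %s WHERE %s" % (table, where))
        n = int(cur.fetchone()[0])
        arrays = []
        for code in typecodes:              # e.g. 'l' for integer, 'd' for double
            arrays.append(Numeric.zeros(n, code))
        cur.execute("SELECT %s FROM %s WHERE %s"
                    % (string.join(columns, ", "), table, where))
        for i in range(n):
            row = cur.fetchone()            # one row at a time keeps memory low
            for j in range(len(columns)):
                arrays[j][i] = row[j]
        return arrays

The single complete copy of the data then lives in the returned arrays, matching the mysql_use_result() scheme discussed above.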
From vanroose at ruca.ua.ac.be  Sat Apr 29 08:58:29 2000
From: vanroose at ruca.ua.ac.be (Vanroose Wim)
Date: Sat, 29 Apr 2000 14:58:29 +0200
Subject: [Numpy-discussion] Gnu Scientific Library
Message-ID: <390ADC75.CAABEABA@ruca.ua.ac.be>

Dear Madam, Sir,

I recently started to use the GSL library from http://sourceware.cygnus.com/gsl/. They have an interesting collection of special functions, and I started to wrap several of them into my python programs. Does anybody have experience with GSL? Wouldn't it be beautiful to produce a python module based on the GSL special functions? Did somebody already do it?

Wim Vanroose

From jhauser at ifm.uni-kiel.de  Sat Apr 29 11:50:24 2000
From: jhauser at ifm.uni-kiel.de (Janko Hauser)
Date: Sat, 29 Apr 2000 17:50:24 +0200 (CEST)
Subject: [Numpy-discussion] Gnu Scientific Library
In-Reply-To: <390ADC75.CAABEABA@ruca.ua.ac.be>
References: <390ADC75.CAABEABA@ruca.ua.ac.be>
Message-ID: <20000429155024.24770.qmail@lisboa.ifm.uni-kiel.de>

A very complete set of special functions is already wrapped by Travis Oliphant; look for the cephes module. But there are numerous other functions in GSL which would be worthwhile to connect to NumPy. One benefit of using such a general library covering different areas is that with one form of interface a whole slew of functions can be wrapped, which also makes the packaging a lot easier. Also, I think the library is designed with wrapping to other languages in mind. Just to mention another library with a similar scope, but which is older and perhaps more mature, there is also SLATEC from netlib.

__Janko

From tchur at bigpond.com  Sat Apr 29 19:12:11 2000
From: tchur at bigpond.com (Tim Churches)
Date: Sun, 30 Apr 2000 09:12:11 +1000
Subject: [Numpy-discussion] Gnu Scientific Library
References: <390ADC75.CAABEABA@ruca.ua.ac.be>
Message-ID: <390B6C4B.58080AED@bigpond.com>

Vanroose Wim wrote:
>
> Dear Madam, Sir,
>
> I recently started to use the GSL library from
> http://sourceware.cygnus.com/gsl/. They have an interesting collection of
> special functions, and I started to wrap several of them into my python
> programs. Does anybody have experience with GSL?
>
> Wouldn't it be beautiful to produce a python module based
> on the GSL special functions? Did somebody already do it?
>
> Wim Vanroose

Wim,

Have a look at http://gestalt-system.sourceforge.net/gestalt_manifesto_exp_097.html and search for the string "GSL". As you will see, we originally proposed to wrap at least the statistical functions of GSL as User-Defined Functions and/or User-Defined Procedures for MySQL. (Note that the GNU Goose library which we mention is no longer under separate, active development, having been rolled back into the GNOME Guppy project, it seems.) We still hope to wrap GSL for use directly in MySQL, but this now has a lower priority after experiencing how fast and memory-efficient NumPy is for basic exploratory statistics when used in conjunction with Gary Strangman's stats.py package.

Nevertheless, it would be useful to have the GSL library available in Python - is it feasible to make it work with NumPy arrays as well as other Python sequences? We are most interested in the statistical aspects of the library but all the functions are potentially useful.
My C skills are not up to the task but perhaps someone else on the GS-discuss mailing list might be able to assist?

Regards, Tim Churches
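No GSL wrapper existed on the list at the time; for readers wanting to experiment today, one quick route that avoids writing extension code is ctypes (which post-dates this thread). The shared-library names below and the need to preload the CBLAS shim are platform assumptions:

    from ctypes import CDLL, RTLD_GLOBAL, c_double

    # GSL expects a CBLAS implementation to be resolvable at load time.
    CDLL("libgslcblas.so", mode=RTLD_GLOBAL)
    gsl = CDLL("libgsl.so")

    gsl.gsl_sf_bessel_J0.restype = c_double
    gsl.gsl_sf_bessel_J0.argtypes = [c_double]

    print(gsl.gsl_sf_bessel_J0(5.0))   # approx -0.1775967713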
'_numpy' and 'umath' are imported, the stack trace is: #0 PyArray_DescrFromType (type=70) at Src/arraytypes.c:593 #1 0xfe768dd0 in PyArray_FromDims (nd=1, d=0xffbecfc8, type=70) at Src/arrayobject.c:416 #2 0xfe7952f0 in array_zeros (ignored=0x1, args=0x1) at Src/multiarraymodule.c:961 #3 0x1f72c in call_builtin (func=0xe6928, arg=0xedd40, kw=0x0) at ceval.c:2359 #4 0x1f5f8 in PyEval_CallObjectWithKeywords (func=0xe6928, arg=0xedd40, kw=0x0) at ceval.c:2324 #5 0x1dde0 in eval_code2 (co=0xe7ec8, globals=0x0, locals=0x83, args=0xffffffff, argcount=944424, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at ceval.c:1654 #6 0x1dc88 in eval_code2 (co=0xc9dc0, globals=0xfda98, locals=0xffffffff, args=0x1, argcount=1015296, kws=0x0, kwcount=0, defs=0xe6b2c, defcount=1, owner=0x0) at ceval.c:1612 #7 0x1dc88 in eval_code2 (co=0xf78f0, globals=0xfdc7c, locals=0x1, args=0x1, argcount=1014976, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at ceval.c:1612 #8 0x1b8a0 in PyEval_EvalCode (co=0xf78f0, globals=0xf0f70, locals=0xf0f70) at ceval.c:324 #9 0x277a8 in PyImport_ExecCodeModuleEx (name=0xffbede80 "Precision", co=0xf78f0, pathname=0xffbed4a0 "/usr/local/lib/python1.5/site-packages/Numeric/Precision.pyc") at import.c:485 gdb shows the fault right after line 596 in Src/arraytypes.c : if (type < PyArray_NTYPES) { return descrs[type]; # type = 'F' } else { switch(type) { case 'c': return descrs[PyArray_CHAR]; case 'b': return descrs[PyArray_UBYTE]; case '1': return descrs[PyArray_SBYTE]; case 's': return descrs[PyArray_SHORT]; case 'i': return descrs[PyArray_INT]; case 'l': return descrs[PyArray_LONG]; case 'f': return descrs[PyArray_FLOAT]; case 'd': return descrs[PyArray_DOUBLE]; case 'F': return descrs[PyArray_CFLOAT]; If I try to examine descrs[0], gdb says: (gdb) print descrs[0] Cannot access memory at address 0x0. This is probably shared library weardness, but I'm not sure how to fix it. Any ideas? -- Joe VanAndel National Center for Atmospheric Research http://www.atd.ucar.edu/~vanandel/ Internet: vanandel at ucar.edu From tchur at bigpond.com Sat Apr 8 11:07:13 2000 From: tchur at bigpond.com (Tim Churches) Date: Sun, 09 Apr 2000 01:07:13 +1000 Subject: [Numpy-discussion] NumPy and None (null, NaN, missing) Message-ID: <38EF4B21.10D4438F@bigpond.com> I'm a new user of MumPy so forgive me if this is a FAQ. I would normally check the list archives but I'm on holidays at the moment in Manila and the speed of the Internet connection here does not permit much Web browsing... I've been experimenting with using Gary Strangman's excellent stats.py functions. The spped of these functions when operating on NumPy arrays and the ability of NumPy to swallow very large arrays is remarkable. However, one deficiency I have noticed is the lack of the ability to represent nulls (i.e. missing values, None or NaN [Not-a-Number] in NumPy arrays. Missing values commonly occur in real-life statistical data and although they are usually excluded from most statistical calculations, it is important to be able to keep track of the number of missing data elements and report this. ecause NumPy arrays can't represent missing data via a special value, it is necessary to exclude missing data elements from NumPy arrays and keep track of them elsewhere (in standard Python lists). This is messy. Also, it is quite common to use various imputation techniques to estimate the values of missing data elements - the ability to represent missing data in a NumPy array and then change it to an imputed value would be a real boon. 
Regards, Tim C . The speed of these functions arelightning-fast. The problem is the speed with which data can be extracted from a column of a MySQL (or any other SQL database) query result set and stuffed into a NumPy array. This inevitably involves forming a Python list and then assigning that to a NumPy array. This is both slow and memory-hungry, especially with large datsets (I have een playing with a few million rows). I was wondering if it would be feasible to initially add a method to the _mysql class in the MySQLdb module which iterated through a result set using a C routine (rather than a Python routine) and stuffed the data directly into a NumPy array (or arrays - one for each column in the result set) in one fell swoop (or even iterating row-by-row but in C)? I suspect that such a facility would be much faster than having to move the data into NumPy via a standard Python list (or actually via tuples within a list, which i sthe way the Python DB-API returns results). If this direct MySQL-to-NumPy interface worked well, it might be desirable to add it to the Python DB-API specification for optional implementation in the other database modules which conform to the API. There are probably other extensions which would make the DB-API more useful for statistical applications, which tend to be set (column)-oriented rather than row-oriented - will post to the list as these occur to me. Cheers, Tim Churches PS I will be away for the next week so apologies in advance for not replying immediately to any follow-ups to this posting. TC From tchur at bigpond.com Sat Apr 8 10:32:02 2000 From: tchur at bigpond.com (Tim Churches) Date: Sun, 09 Apr 2000 00:32:02 +1000 Subject: [Numpy-discussion] NumPy, Python DB-API and MySQL Message-ID: <38EF42E2.BD6946EF@bigpond.com> I've been experimenting with pulling quantitative data out of a MySQL table into NumPy arrays via Andy Dustman's excellent MySQLdb module and then calculating various statistics from the data using Gary Strangman's excellent stats.py functions, which when operating on NumPy arrays are lightning-fast. The problem is the speed with which data can be extracted from a column of a MySQL (or any other SQL database) query result set and stuffed into a NumPy array. This inevitably involves forming a Python list and then assigning that to a NumPy array. This is both slow and memory-hungry, especially with large datsets (I have een playing with a few million rows). I was wondering if it would be feasible to initially add a method to the _mysql class in the MySQLdb module which iterated through a result set using a C routine (rather than a Python routine) and stuffed the data directly into a NumPy array (or arrays - one for each column in the result set) in one fell swoop (or even iterating row-by-row but in C)? I suspect that such a facility would be much faster than having to move the data into NumPy via a standard Python list (or actually via tuples within a list, which i sthe way the Python DB-API returns results). If this direct MySQL-to-NumPy interface worked well, it might be desirable to add it to the Python DB-API specification for optional implementation in the other database modules which conform to the API. There are probably other extensions which would make the DB-API more useful for statistical applications, which tend to be set (column)-oriented rather than row-oriented - will post to the list as these occur to me. 
Cheers, Tim Churches PS I will be away for the next week so apologies in advance for not replying immediately to any follow-ups to this posting. TC From cgw at fnal.gov Sat Apr 8 18:54:20 2000 From: cgw at fnal.gov (Charles G Waldman) Date: Sat, 8 Apr 2000 17:54:20 -0500 (CDT) Subject: [Numpy-discussion] __version__ hack in Matrix.py is busted Message-ID: <14575.47260.499027.194394@buffalo.fnal.gov> I can't import "Matrix" due to the following cruft: __id__ = """ $Id: Matrix.py,v 1.1.1.1 2000/01/13 21:23:06 dubois Exp $ """[1:-1] import string __version__ = int(__id__[string.index(__id__, '#')+1:-1]) You can't count on the CVS ID having a "#" character in it; each time the file is checked in and out of CVS the Id is rewritten. I don't think this trick can be made to work. I think what is needed is either a simpler more failsafe way of setting __version__, or simply to eliminate __version__ altogether. From maechler at stat.math.ethz.ch Mon Apr 10 04:29:17 2000 From: maechler at stat.math.ethz.ch (Martin Maechler) Date: Mon, 10 Apr 2000 10:29:17 +0200 (CEST) Subject: [Numpy-discussion] Re: NumPy and None (null, NaN, missing) In-Reply-To: <200004081921.MAA29128@lists.sourceforge.net> References: <200004081921.MAA29128@lists.sourceforge.net> Message-ID: <14577.37085.686348.645721@lynne.ethz.ch> >>>>> "TimC" == gestalt-system-discuss-admin writes: TimC> Date: Sun, 09 Apr 2000 01:07:13 +1000 TimC> From: Tim Churches TimC> Organization: Gestalt Institute TimC> To: strang at nmr.mgh.harvard.edu, strang at bucky.nmr.mgh.harvard.edu, TimC> gestalt-system-discuss at lists.sourceforge.net, TimC> numpy-discussion at lists.sourceforge.net TimC> I'm a new user of MumPy so forgive me if this is a FAQ. ...... TimC> I've been experimenting with using Gary Strangman's excellent stats.py TimC> functions. The spped of these functions when operating on NumPy arrays TimC> and the ability of NumPy to swallow very large arrays is remarkable. TimC> However, one deficiency I have noticed is the lack of the ability TimC> to represent nulls (i.e. missing values, None or NaN TimC> [Not-a-Number] in NumPy arrays. Missing values commonly occur in TimC> real-life statistical data and although they are usually excluded TimC> from most statistical calculations, it is important to be able to TimC> keep track of the number of missing data elements and report TimC> this. I'm just a recent "listener" on gestalt-system-discuss, and don't even have any python experience. I'm member of the R core team (www.r-project.org). In R (and even in S-plus, but almost invisibly there), we even do differentiate between "NA" (missing / not available) and "NaN" (IEEE result of 0/0, etc). I'd very much like to have these different as in R. I think our implementation of these is quite efficient, implementing NA as one particular bit pattern from the whole possible NaN set. We use code like the following (R source, src/main/arithmetic.c ) : static double R_ValueOfNA(void) { ieee_double x; x.word[hw] = 0x7ff00000; x.word[lw] = 1954; return x.value; } int R_IsNA(double x) { if (isnan(x)) { ieee_double y; y.value = x; return (y.word[lw] == 1954); } return 0; } Martin Maechler http://stat.ethz.ch/~maechler/ TimC> Because NumPy arrays can't represent missing data via a TimC> special value, it is necessary to exclude missing data elements TimC> from NumPy arrays and keep track of them elsewhere (in standard TimC> Python lists). This is messy. 
From maechler at stat.math.ethz.ch Mon Apr 10 04:29:17 2000 From: maechler at stat.math.ethz.ch (Martin Maechler) Date: Mon, 10 Apr 2000 10:29:17 +0200 (CEST) Subject: [Numpy-discussion] Re: NumPy and None (null, NaN, missing) In-Reply-To: <200004081921.MAA29128@lists.sourceforge.net> References: <200004081921.MAA29128@lists.sourceforge.net> Message-ID: <14577.37085.686348.645721@lynne.ethz.ch>

>>>>> "TimC" == gestalt-system-discuss-admin writes:

TimC> Date: Sun, 09 Apr 2000 01:07:13 +1000
TimC> From: Tim Churches
TimC> Organization: Gestalt Institute
TimC> To: strang at nmr.mgh.harvard.edu, strang at bucky.nmr.mgh.harvard.edu, gestalt-system-discuss at lists.sourceforge.net, numpy-discussion at lists.sourceforge.net

TimC> I'm a new user of NumPy so forgive me if this is a FAQ. ......

TimC> I've been experimenting with using Gary Strangman's excellent stats.py functions. The speed of these functions when operating on NumPy arrays and the ability of NumPy to swallow very large arrays is remarkable. However, one deficiency I have noticed is the lack of the ability to represent nulls (i.e. missing values, None or NaN [Not-a-Number]) in NumPy arrays. Missing values commonly occur in real-life statistical data and although they are usually excluded from most statistical calculations, it is important to be able to keep track of the number of missing data elements and report this.

I'm just a recent "listener" on gestalt-system-discuss, and don't even have any Python experience. I'm a member of the R core team (www.r-project.org). In R (and even in S-plus, but almost invisibly there), we even differentiate between "NA" (missing / not available) and "NaN" (the IEEE result of 0/0, etc.). I'd very much like to have these distinguished as in R. I think our implementation of these is quite efficient, implementing NA as one particular bit pattern from the whole possible NaN set. We use code like the following (R source, src/main/arithmetic.c):

    static double R_ValueOfNA(void)
    {
        ieee_double x;
        x.word[hw] = 0x7ff00000;
        x.word[lw] = 1954;
        return x.value;
    }

    int R_IsNA(double x)
    {
        if (isnan(x)) {
            ieee_double y;
            y.value = x;
            return (y.word[lw] == 1954);
        }
        return 0;
    }

Martin Maechler http://stat.ethz.ch/~maechler/

TimC> Because NumPy arrays can't represent missing data via a special value, it is necessary to exclude missing data elements from NumPy arrays and keep track of them elsewhere (in standard Python lists). This is messy. Also, it is quite common to use various imputation techniques to estimate the values of missing data elements - the ability to represent missing data in a NumPy array and then change it to an imputed value would be a real boon.

From pauldubois at home.com Mon Apr 10 12:01:55 2000 From: pauldubois at home.com (Paul F. Dubois) Date: Mon, 10 Apr 2000 09:01:55 -0700 Subject: [Numpy-discussion] Re: NumPy and None (null, NaN, missing) In-Reply-To: <14577.37085.686348.645721@lynne.ethz.ch> Message-ID:

I have sent this out before but here it is again. It is a beta of a missing-observation class. Please help me refine it and complete it. I intend to add it to the NumPy distribution since this facility is much-requested. MAtest.py shows how to use it. The intention is that it is used the same way you use a Numeric array, and in fact if there are no masked values there isn't a lot of overhead.

The basic concept is that each MA holds an array and a mask that indicates which values of the array are valid. Note the change in semantics for indexing shown below. Later I imagine creating a compiled extension class for bit masks to improve the space and time efficiency. Paul

    # Note copy semantics here differ from Numeric
    def __getitem__(self, i):
        m = self.__mask
        if m is None:
            return Numeric.array(self.__data[i])
        else:
            return MA(Numeric.array(self.__data[i]), Numeric.array(m[i]))

    def __getslice__(self, i, j):
        m = self.__mask
        if m is None:
            return Numeric.array(self.__data[i:j])
        else:
            return MA(Numeric.array(self.__data[i:j]), Numeric.array(m[i:j]))

[Attachments MA.py and MAtest.py were scrubbed from the archive.]
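Since the MA.py attachment itself was lost to the archive, here is a minimal sketch of the underlying data-plus-mask idea with plain Numeric (not Paul's actual class): missing elements are flagged in a parallel mask array, counted, and excluded from a calculation.

    import Numeric

    data = Numeric.array([1.0, 2.0, 3.0, 4.0])
    mask = Numeric.array([0, 1, 0, 0])    # 1 marks a missing value

    # Report how many elements are missing, as Tim asks for.
    n_missing = Numeric.add.reduce(mask)

    # Compute a mean over the valid elements only.
    valid = Numeric.compress(Numeric.equal(mask, 0), data)
    mean = Numeric.add.reduce(valid) / len(valid)

Imputation then becomes a matter of assigning into the data array and clearing the corresponding mask entries.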
From adustman at comstar.net Mon Apr 10 12:27:03 2000 From: adustman at comstar.net (Andy Dustman) Date: Mon, 10 Apr 2000 12:27:03 -0400 (EDT) Subject: [Numpy-discussion] Re: NumPy, Python DB-API and MySQL In-Reply-To: <38EF42E2.BD6946EF@bigpond.com> Message-ID:

On Sun, 9 Apr 2000, Tim Churches wrote:

> I've been experimenting with pulling quantitative data out of a MySQL table into NumPy arrays via Andy Dustman's excellent MySQLdb module and then calculating various statistics from the data using Gary Strangman's excellent stats.py functions, which when operating on NumPy arrays are lightning-fast.
>
> The problem is the speed with which data can be extracted from a column of a MySQL (or any other SQL database) query result set and stuffed into a NumPy array. This inevitably involves forming a Python list and then assigning that to a NumPy array. This is both slow and memory-hungry, especially with large datasets (I have been playing with a few million rows).
>
> I was wondering if it would be feasible to initially add a method to the _mysql class in the MySQLdb module which iterated through a result set using a C routine (rather than a Python routine) and stuffed the data directly into a NumPy array (or arrays - one for each column in the result set) in one fell swoop (or even iterating row-by-row but in C)? I suspect that such a facility would be much faster than having to move the data into NumPy via a standard Python list (or actually via tuples within a list, which is the way the Python DB-API returns results).
>
> If this direct MySQL-to-NumPy interface worked well, it might be desirable to add it to the Python DB-API specification for optional implementation in the other database modules which conform to the API. There are probably other extensions which would make the DB-API more useful for statistical applications, which tend to be set (column)-oriented rather than row-oriented - will post to the list as these occur to me.

It might be possible to do something like this. I would prefer that such a feature work as a separate module (i.e. I don't think it is generally applicable to MySQLdb/_mysql). Or perhaps it could be a compile-time option for _mysql (-DUSE_NUMPY).

The object that you want to mess with is the _mysql result object. It contains an attribute MYSQL_RES *result, which is a pointer to the actual MySQL structure. I don't remember if NumPy arrays are extensible or not, i.e. can rows be appended? That would affect the design. If they are not extensible, then you are probably limited to using mysql_store_result() (result set stored on the client side), as opposed to mysql_use_result() (result set stored on the server side). mysql_store_result() is probably preferable in this case anyway, so extensibility doesn't matter, as we can find the size of the result set in advance with mysql_num_rows(). Then we know the full size of the array. However, with very large result sets, it may be necessary to use mysql_use_result(), in which case the array will need to be extended, possibly row-by-row.

I could do this, but I need to know how to create and assign values to a NumPy array from within C. Or perhaps an initial (empty) array with the correct number of columns can be passed. I am pretty sure NumPy arrays look like sequences (of sequences), so assignment should not be a big problem. Easiest solution (for me, and puts least bloat in _mysql) would be for the user to pass in a NumPy array.

Question: Would it be adequate to put all columns returned into the array? If label columns need to be returned, this could pose a problem. They may have to be returned as a separate query. Or else non-numeric columns would be excluded and returned in a list of tuples (this would be harder).

I suspect the existing cursor.executemany() is capable of INSERTing and UPDATEing NumPy arrays.

--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!"
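On Andy's question of creating and filling a NumPy array from C: a minimal, untested sketch using the Numeric C API (PyArray_FromDims, the same call visible in the arraytypes.c backtrace earlier in this archive). The MySQL-specific filling is left as a comment since it depends on the _mysql internals.

    /* Sketch only: assumes import_array() has been called in the
       module's init function, as the Numeric C API requires. */
    #include "Python.h"
    #include "arrayobject.h"

    static PyObject *make_result_array(int nrows, int ncols)
    {
        int dims[2];
        PyArrayObject *array;
        double *data;
        int i, j;

        dims[0] = nrows;
        dims[1] = ncols;
        array = (PyArrayObject *)PyArray_FromDims(2, dims, PyArray_DOUBLE);
        if (array == NULL)
            return NULL;

        /* A freshly created array is contiguous and row-major. */
        data = (double *)array->data;
        for (i = 0; i < nrows; i++)
            for (j = 0; j < ncols; j++)
                data[i*ncols + j] = 0.0;   /* fill from a MYSQL_ROW here */

        return (PyObject *)array;
    }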
From kern at its.caltech.edu Tue Apr 11 19:16:19 2000 From: kern at its.caltech.edu (Robert Kern) Date: Tue, 11 Apr 2000 16:16:19 -0700 (PDT) Subject: [Numpy-discussion] Request for Datasets Message-ID:

Hello, I'm working on a Multipack module to use ODRPACK for non-linear regression problems. ODRPACK is available from NETLIB if you want information: http://www.netlib.org/odrpack/index.html

I'm in the debugging phase right now, and I want to be able to test my interfaces to most if not all of ODRPACK's capabilities. Consequently, I need datasets with some of the following properties:

* multiple inputs (or a vector of inputs)
* multiple responses (or a vector of responses)
* errors/weights on either the responses, inputs, or both
* covariance matrices for the errors on responses/inputs/both (in the case of multiple inputs/responses)
* any differentiable functional form that I can make a Python function compute using NumPy and SpecialFuncs ufuncs (and maybe a few others)
* problems where it is sensible to fix particular parameters and estimate the others
* problems where it is sensible to fix particular datapoints (e.g. boundary conditions)
* problems where some datapoints should be removed

I would be much obliged if any of you could send me datasets that have some of these characteristics. I would prefer them to be in something parsable by Python, either a simple plaintext format or even NetCDF to goad me into learning how to use Konrad's NetCDF interface. A description of the function to fit to is necessary, and a brief description of the problem and perhaps even the expected answers would be nice.

*** Please e-mail these to me and not the list. If you would like more information or clarification, please e-mail me. Many thanks for your time and possible contribution.

--
Robert Kern kern at caltech.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From godzilla at netmeg.net Wed Apr 12 15:59:20 2000 From: godzilla at netmeg.net (Les Schaffer) Date: Wed, 12 Apr 2000 15:59:20 -0400 (EDT) Subject: [Numpy-discussion] compiling extensions with cygwin??? Message-ID: <14580.54680.536718.237210@gargle.gargle.HOWL>

i need to compile a simple Python extension, which uses the NumPy/C API, on WinXX. does anyone have (bad/good) experience compiling NumPy extensions with the cygwin or mingw32 compilers? i am trying to decide whether i need to purchase VC++ or not. many thanks

les schaffer

From kern at its.caltech.edu Wed Apr 12 18:17:52 2000 From: kern at its.caltech.edu (Robert Kern) Date: Wed, 12 Apr 2000 15:17:52 -0700 (PDT) Subject: [Numpy-discussion] compiling extensions with cygwin??? In-Reply-To: <14580.54680.536718.237210@gargle.gargle.HOWL> Message-ID:

On Wed, 12 Apr 2000, Les Schaffer wrote:

> i need to compile a simple Python extension, which uses the NumPy/C API, on WinXX. does anyone have (bad/good) experience compiling NumPy extensions with the cygwin or mingw32 compilers? i am trying to decide whether i need to purchase VC++ or not.

http://starship.python.net/crew/kernr/mingw32/Notes.html

Don't bother buying anything.

> many thanks
>
> les schaffer

--
Robert Kern kern at caltech.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From jbaddor at physics.mcgill.ca Thu Apr 13 14:02:46 2000 From: jbaddor at physics.mcgill.ca (Jean-Bernard Addor) Date: Thu, 13 Apr 2000 14:02:46 -0400 (EDT) Subject: [Numpy-discussion] multiprocessor machine Message-ID:

Hey NumPy people! I just put a second processor in my computer and it seems NumPy doesn't use it. Is NumPy able to use 2 processors? If not, from which version on does it work?

Jean-Bernard
From tchur at bigpond.com Fri Apr 14 07:25:33 2000 From: tchur at bigpond.com (Tim Churches) Date: Fri, 14 Apr 2000 21:25:33 +1000 Subject: [Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL References: Message-ID: <38F7002C.BE9610AA@bigpond.com>

Andy Dustman wrote:
>
> On Sun, 9 Apr 2000, Tim Churches wrote:
>
> > I've been experimenting with pulling quantitative data out of a MySQL table into NumPy arrays via Andy Dustman's excellent MySQLdb module and then calculating various statistics from the data using Gary Strangman's excellent stats.py functions, which when operating on NumPy arrays are lightning-fast.
>
> [...snip...]
>
> It might be possible to do something like this. I would prefer that such a feature work as a separate module (i.e. I don't think it is generally applicable to MySQLdb/_mysql). Or perhaps it could be a compile-time option for _mysql (-DUSE_NUMPY).

The latter sounds good. I agree that most users of MySQLdb would not need it, so they shouldn't be burdened with it.

> The object that you want to mess with is the _mysql result object. It contains an attribute MYSQL_RES *result, which is a pointer to the actual MySQL structure. I don't remember if NumPy arrays are extensible or not, i.e. can rows be appended?

No they can't. I suspect that is the price to be paid for the efficient storage offered by NumPy arrays.

> That would affect the design. If they are not extensible, then you are probably limited to using mysql_store_result() (result set stored on the client side), as opposed to mysql_use_result() (result set stored on the server side). mysql_store_result() is probably preferable in this case anyway, so extensibility doesn't matter, as we can find the size of the result set in advance with mysql_num_rows(). Then we know the full size of the array.

Yes, but the problem with mysql_store_result() is the large amount of memory required to store the result set. Couldn't the user be responsible for predetermining the size of the array via a query such as "select count(*) from sometable where...." and then pass this value as a parameter to the executeNumPy() method? In MySQL at least such count(*) queries are resolved very quickly, so such an approach wouldn't take twice the time. Then mysql_use_result() could be used to populate the initialised NumPy array with data row by row, so there is only ever one complete copy of the data in memory, and that copy is in the NumPy array.

> However, with very large result sets, it may be necessary to use mysql_use_result(), in which case the array will need to be extended, possibly row-by-row.
>
> I could do this, but I need to know how to create and assign values to a NumPy array from within C. Or perhaps an initial (empty) array with the correct number of columns can be passed. I am pretty sure NumPy arrays look like sequences (of sequences), so assignment should not be a big problem. Easiest solution (for me, and puts least bloat in _mysql) would be for the user to pass in a NumPy array.

I'll look at the NumPy docs re this. Can any of the NumPy developers give some clues re this?

> Question: Would it be adequate to put all columns returned into the array? If label columns need to be returned, this could pose a problem. They may have to be returned as a separate query. Or else non-numeric columns would be excluded and returned in a list of tuples (this would be harder).

Yes, more thought needed here - my initial thought was one NumPy array per column, particularly since NumPy arrays must be homogeneous wrt data type. Each NumPy array could be named the same as the column from which it is derived.

Cheers, Tim C
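In Python terms, the two-query approach Tim sketches would look something like this (executeNumPy() does not exist; the table, column, and size here are hypothetical):

    import Numeric
    import MySQLdb

    db = MySQLdb.connect(db="test")
    c = db.cursor()

    # First query: a cheap count to pre-size the array.
    c.execute("SELECT COUNT(*) FROM readings WHERE station = 42")
    nrows = int(c.fetchone()[0])
    result = Numeric.zeros((nrows, 2), 'd')

    # Second query: fill the pre-sized array row by row, so only one
    # complete copy of the data is ever in memory.
    c.execute("SELECT temperature, pressure FROM readings WHERE station = 42")
    for i in range(nrows):
        row = c.fetchone()
        result[i,0] = row[0]
        result[i,1] = row[1]

The row-by-row loop is still in Python here; the whole point of the proposal is to push exactly this loop down into C.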
From hinsen at cnrs-orleans.fr Fri Apr 14 12:45:07 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri, 14 Apr 2000 18:45:07 +0200 Subject: [Numpy-discussion] multiprocessor machine In-Reply-To: (message from Jean-Bernard Addor on Thu, 13 Apr 2000 14:02:46 -0400 (EDT)) References: Message-ID: <200004141645.SAA09618@chinon.cnrs-orleans.fr>

> I just put a second processor in my computer and it seems NumPy doesn't use it.
>
> Is NumPy able to use 2 processors?

NumPy uses only one processor, and I am not even sure I'd want to change that. I use biprocessor machines as well and I have adapted my time-critical code to them (parallelization via threading), but the parallelization is almost always at a higher level than NumPy operations. In other words, I give one NumPy operation to each processor rather than have both work on the same NumPy operation.

I'd prefer to build a parallelizing general-purpose library on top of NumPy, ideally supporting message passing as well. Would anyone else be interested in this? I have a nicely packaged MPI support module for Python (to be released next week in a new version of ScientificPython), so that part is already done.

Which reminds me: there once was a parallelization project mailing list on the old Starship, which disappeared during the move due to a minor accident. Is there interest to revive it? I now have a cluster of 20 biprocessors to feed, and I'd like to provide it with only the best: Python code ;-)

Konrad.

--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
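A minimal sketch of the higher-level threading pattern Konrad describes - one whole-array operation per processor. Note a real two-CPU speedup requires that the operations release the global interpreter lock (e.g. inside C extension code); plain Numeric ufuncs of this era hold the lock, so treat this as an illustration of the structure rather than a guaranteed win.

    import threading
    import Numeric

    a = Numeric.ones((1000, 1000), 'd')
    b = Numeric.ones((1000, 1000), 'd')
    results = [None, None]

    def work(i, x):
        # Each thread performs an independent whole-array operation.
        results[i] = Numeric.add.reduce(Numeric.add.reduce(Numeric.sin(x)))

    threads = [threading.Thread(target=work, args=(0, a)),
               threading.Thread(target=work, args=(1, b))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()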
From ransom at cfa.harvard.edu Fri Apr 14 12:53:38 2000 From: ransom at cfa.harvard.edu (Scott M. Ransom) Date: Fri, 14 Apr 2000 12:53:38 -0400 Subject: [Numpy-discussion] multiprocessor machine References: <200004141645.SAA09618@chinon.cnrs-orleans.fr> Message-ID: <38F74D12.DF81F59A@cfa.harvard.edu>

Konrad Hinsen wrote:
>
> NumPy uses only one processor, and I am not even sure I'd want to change that. I use biprocessor machines as well and I have adapted my time-critical code to them (parallelization via threading), but the parallelization is almost always at a higher level than NumPy operations. In other words, I give one NumPy operation to each processor rather than have both work on the same NumPy operation.

I do the same thing. And I agree about not wanting it the other way (although an option for this might be nice...)

> I'd prefer to build a parallelizing general-purpose library on top of NumPy, ideally supporting message passing as well. Would anyone else be interested in this? I have a nicely packaged MPI support module for Python (to be released next week in a new version of ScientificPython), so that part is already done.

I am certainly interested. In fact, I have also written an MPI support module. Maybe when I see yours I will be able to add some stuff...I'm making the assumption that yours is probably more flexible than mine...

> Which reminds me: there once was a parallelization project mailing list on the old Starship, which disappeared during the move due to a minor accident. Is there interest to revive it? I now have a cluster of 20 biprocessors to feed, and I'd like to provide it with only the best: Python code ;-)

Once again, I'm in...

Scott

--
Scott M. Ransom                       Address: Harvard-Smithsonian CfA
Phone: (617) 495-4142                          60 Garden St. MS 10
email: ransom at cfa.harvard.edu               Cambridge, MA 02138
PGP Fingerprint: D2 0E D0 10 CD 95 06 DA EF 78 FE 2B CB 3A D3 53

From eq3pvl at eq.uc.pt Tue Apr 18 09:36:32 2000 From: eq3pvl at eq.uc.pt (Pedro Vale Lima) Date: Tue, 18 Apr 2000 14:36:32 +0100 Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"! References: <200004181322.GAA09316@lists.sourceforge.net> Message-ID: <38FC64E0.E1D2EFCD@eq.uc.pt>

Hello, Can someone give me a hand? I'm porting some code and I need to do QR decomposition. I couldn't find such a function in NumPy. As I remember LAPACK has one; isn't it part of the Python interface?

thanks, pedro vale lima

-- University of Coimbra, Portugal eq3pvl at eq.uc.pt
From hinsen at cnrs-orleans.fr Tue Apr 18 11:18:42 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue, 18 Apr 2000 17:18:42 +0200 Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"! In-Reply-To: <38FC64E0.E1D2EFCD@eq.uc.pt> (message from Pedro Vale Lima on Tue, 18 Apr 2000 14:36:32 +0100) References: <200004181322.GAA09316@lists.sourceforge.net> <38FC64E0.E1D2EFCD@eq.uc.pt> Message-ID: <200004181518.RAA20451@chinon.cnrs-orleans.fr>

> Can someone give me a hand? I'm porting some code and I need to do QR decomposition. I couldn't find such a function in NumPy. As I remember LAPACK has one; isn't it part of the Python interface?

There is a lot more in LAPACK than is covered by the high-level Python interface (Module LinearAlgebra). There is, however, a complete low-level interface to all of LAPACK and BLAS, written eons ago by Doug Heisterkamp. You can pick up an updated copy at ftp://dirac.cnrs-orleans.fr/pub/PyLapack.tar.gz

Konrad.

--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From eq3pvl at eq.uc.pt Tue Apr 18 11:43:58 2000 From: eq3pvl at eq.uc.pt (Pedro Vale Lima) Date: Tue, 18 Apr 2000 16:43:58 +0100 Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"! References: <200004181322.GAA09316@lists.sourceforge.net> <38FC64E0.E1D2EFCD@eq.uc.pt> <200004181518.RAA20451@chinon.cnrs-orleans.fr> Message-ID: <38FC82BE.4E51F15B@eq.uc.pt>

Konrad Hinsen wrote:
> > Can someone give me a hand? I'm porting some code and I need to do QR decomposition. I couldn't find such a function in NumPy. As I remember LAPACK has one; isn't it part of the Python interface?
>
> There is a lot more in LAPACK than is covered by the high-level Python interface (Module LinearAlgebra). There is, however, a complete low-level interface to all of LAPACK and BLAS, written eons ago by Doug Heisterkamp. You can pick up an updated copy at ftp://dirac.cnrs-orleans.fr/pub/PyLapack.tar.gz
>
> Konrad.
> --

Thanks. Meanwhile I wrote QR in Python. I'll change to that interface to get some speed improvement. Just to satisfy my curiosity, is it a design decision to keep LinearAlgebra small, or just waiting for someone to contribute more bindings?

pedro

-- Pedro Vale Lima University of Coimbra
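For readers in the same position before switching to PyLapack, a pure-Python QR along the lines Pedro mentions writing might look like this - a minimal classical Gram-Schmidt sketch (numerically naive; LAPACK's Householder-based routine is the robust choice):

    import Numeric

    def qr(a):
        # a is an m x n Numeric array with linearly independent columns.
        m, n = a.shape
        q = Numeric.zeros((m, n), 'd')
        r = Numeric.zeros((n, n), 'd')
        for j in range(n):
            v = Numeric.array(a[:,j]).astype('d')
            for i in range(j):
                r[i,j] = Numeric.dot(q[:,i], a[:,j])
                v = v - r[i,j] * q[:,i]
            r[j,j] = Numeric.sqrt(Numeric.dot(v, v))
            q[:,j] = v / r[j,j]
        return q, r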
From vanandel at atd.ucar.edu Tue Apr 18 11:59:49 2000 From: vanandel at atd.ucar.edu (Joe Van Andel) Date: Tue, 18 Apr 2000 09:59:49 -0600 Subject: [Numpy-discussion] Numeric Python release? Message-ID: <38FC8675.B41F2099@atd.ucar.edu>

sourceforge.net shows the latest release of Numeric Python as v 15.2, dated 1/19/2000. Could I encourage the Numeric Python developers to release a more recent version of Numeric Python? I know the latest is always available from CVS, but I'm sure that there are people who aren't ready to deal with CVS just to get a current version of Numeric.

--
Joe VanAndel National Center for Atmospheric Research http://www.atd.ucar.edu/~vanandel/ Internet: vanandel at ucar.edu

From hinsen at dirac.cnrs-orleans.fr Wed Apr 19 09:45:11 2000 From: hinsen at dirac.cnrs-orleans.fr (hinsen at dirac.cnrs-orleans.fr) Date: Wed, 19 Apr 2000 15:45:11 +0200 Subject: [Numpy-discussion] Re: Welcome To "Numpy-discussion"! In-Reply-To: <38FC82BE.4E51F15B@eq.uc.pt> (message from Pedro Vale Lima on Tue, 18 Apr 2000 16:43:58 +0100) References: <200004181322.GAA09316@lists.sourceforge.net> <38FC64E0.E1D2EFCD@eq.uc.pt> <200004181518.RAA20451@chinon.cnrs-orleans.fr> <38FC82BE.4E51F15B@eq.uc.pt> Message-ID: <200004191345.PAA26395@chinon.cnrs-orleans.fr>

> Meanwhile I wrote QR in Python. I'll change to that interface to get some speed improvement. Just to satisfy my curiosity, is it a design decision to keep LinearAlgebra small, or just waiting for someone to contribute more bindings?

Not speaking for the NumPy maintainers, but I am sure it's the latter. I don't see what harm could be done by having more operations in LinearAlgebra. On the other hand, if I wanted to support everything in LAPACK, I would use several modules or, better yet, a package.

Konrad.

--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr Fri Apr 21 08:48:25 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri, 21 Apr 2000 14:48:25 +0200 Subject: [Numpy-discussion] ScientificPython 2.1.0 with MPI interface Message-ID: <200004211248.OAA31247@chinon.cnrs-orleans.fr>

I have just put ScientificPython 2.0.1 and 2.1.0 on my FTP server, ftp://dirac.cnrs-orleans.fr/pub/ScientificPython/ while Starship is recovering. Version 2.0.1 is mostly a bugfix release, with only minor additions. 2.1.0 is identical to 2.0.1 except for the addition of an MPI interface module. I have tested this on only one platform (Linux/Intel running MPICH), so I would be very interested in feedback from people running different MPI platforms.

MPI support in ScientificPython is still very basic; there are probably more complete MPI interfaces out there. The strong point of ScientificPython's interface is the integration into Python: communicators are Python objects, all communication happens via methods defined on communicator objects, and support is provided for sending and receiving both string and NumPy array objects. Moreover, Python scripts can rather easily be written in such a way that they work both with and without MPI support, of course using only a single processor when no MPI is available. Finally, there is a full C API as well, which means that other C modules can make use of MPI without having to link to the MPI library, which is particularly useful for dynamic library modules. It also facilitates packaging of MPI-based code, which doesn't need to know anything at all about the MPI library.

Happy Easter, Konrad.

--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
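A sketch of the script style Konrad describes, where the same program degrades gracefully to a single processor when MPI is absent. The module and attribute names here (Scientific.MPI, world, rank, size) and the send/receive signatures are assumptions for illustration, not taken from the released package:

    import Numeric
    try:
        from Scientific import MPI
        communicator = MPI.world          # assumed name of the default communicator
    except ImportError:
        communicator = None

    data = Numeric.arange(10.0)

    if communicator is not None and communicator.size > 1:
        if communicator.rank == 0:
            # (data, destination, tag) - signature assumed for illustration
            communicator.send(data, 1, 42)
        elif communicator.rank == 1:
            data = communicator.receive(0, 42)   # (source, tag) - assumed
    # With no MPI available, the script falls through and simply works
    # on its local copy of data, on a single processor.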
From adustman at comstar.net Mon Apr 24 16:19:44 2000 From: adustman at comstar.net (Andy Dustman) Date: Mon, 24 Apr 2000 16:19:44 -0400 (EDT) Subject: [Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL In-Reply-To: <38F7002C.BE9610AA@bigpond.com> Message-ID:

On Fri, 14 Apr 2000, Tim Churches wrote:

> Andy Dustman wrote:
>
> Yes, but the problem with mysql_store_result() is the large amount of memory required to store the result set. Couldn't the user be responsible for predetermining the size of the array via a query such as "select count(*) from sometable where...." and then pass this value as a parameter to the executeNumPy() method? In MySQL at least such count(*) queries are resolved very quickly, so such an approach wouldn't take twice the time. Then mysql_use_result() could be used to populate the initialised NumPy array with data row by row, so there is only ever one complete copy of the data in memory, and that copy is in the NumPy array.

After some more thought on this subject, and some poking around at NumPy, I came to the following conclusions: Since NumPy arrays are fixed-size, but otherwise sequences (in the multi-dimensional case, sequences of sequences), the best approach would be for the user to pass in a pre-sized array (i.e. from zeros(), and btw, the docstring for zeros is way wrong), and _mysql would simply access it through the Sequence object protocol, and update as many values as it could: if you passed a 100-row array, it would fill 100 rows or as many as were in the result set, whichever is less. Since this requires no special knowledge of NumPy, it could be a standard addition (no conditional compilation required). This method (tentatively _mysql.fetch_rows_into_array(array)) would return the array argument as the result. IndexError would likely be raised if the array was too narrow (too many columns in result set). Probably this would not be a MySQLdb.Cursor method, but perhaps I can have a separate module with a cursor subclass which returns NumPy arrays.

> > Question: Would it be adequate to put all columns returned into the array? If label columns need to be returned, this could pose a problem. They may have to be returned as a separate query. Or else non-numeric columns would be excluded and returned in a list of tuples (this would be harder).
>
> Yes, more thought needed here - my initial thought was one NumPy array per column, particularly since NumPy arrays must be homogeneous wrt data type. Each NumPy array could be named the same as the column from which it is derived.

Okay, I think I know what you mean here. You are wanting to return each column as a (vertical) vector, whereas I am thinking along the lines of returning the result set as a matrix. Is that correct? Since it appears you can efficiently slice out column vectors as a[:,n], is my idea acceptable? i.e.

    >>> a = Numeric.multiarray.zeros((2,2), 'd')
    >>> a[1,1] = 2
    >>> a[0,1] = -1
    >>> a[1,0] = -3
    >>> a
    array([[ 0., -1.],
           [-3.,  2.]])
    >>> a[:,0]
    array([ 0., -3.])
    >>> a[:,1]
    array([-1.,  2.])

--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!"
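Putting Andy's pieces together, intended usage of the proposed method might look like this. fetch_rows_into_array() is his tentative name and does not exist; the surrounding calls follow the low-level _mysql interface as I understand it, and the table is hypothetical:

    import Numeric
    import _mysql

    db = _mysql.connect(db="test")
    db.query("SELECT temperature, pressure FROM readings")
    result = db.store_result()

    # Pre-size the array; the method would fill as many rows as it can
    # and return the same array object.
    rows = Numeric.zeros((100, 2), 'd')
    result.fetch_rows_into_array(rows)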
From tchur at bigpond.com Fri Apr 28 18:32:44 2000 From: tchur at bigpond.com (Tim Churches) Date: Sat, 29 Apr 2000 08:32:44 +1000 Subject: [Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL References: Message-ID: <390A118C.FB60517D@bigpond.com>

Andy Dustman wrote: [...snip...]

> Okay, I think I know what you mean here. You are wanting to return each column as a (vertical) vector, whereas I am thinking along the lines of returning the result set as a matrix. Is that correct?

Yes, exactly.

> Since it appears you can efficiently slice out column vectors as a[:,n], is my idea acceptable? i.e.
>
>     >>> a = Numeric.multiarray.zeros((2,2), 'd')
>     >>> a[1,1] = 2
>     >>> a[0,1] = -1
>     >>> a[1,0] = -3
>     >>> a
>     array([[ 0., -1.],
>            [-3.,  2.]])
>     >>> a[:,0]
>     array([ 0., -3.])
>     >>> a[:,1]
>     array([-1.,  2.])

The only problem is that NumPy arrays must be homogeneous wrt type, which means that, say, a categorical column containing just a few distinct values stored as an integer would have to be upcast to a double in the NumPy matrix if it was part of a query which also returned a float.

Would it be possible to extend your idea of passing in an array to the query? Perhaps the user could pass in a list of pre-existing, pre-sized sequence objects (which might be rank-1 NumPy arrays of various appropriate data types or Python tuples) which correspond to the columns which are to be returned by the SQL query. It would be up to the user to determine the correct type for each NumPy array and to size the arrays or tuples correctly. The reason for needing tuples as well as NumPy arrays is that, as you mention, NumPy arrays only support numbers.

The intention would be for all of this to be wrapped in a class which may issue a number of small queries to determine the number of rows to be returned and the data types of the columns, so the user is shielded from having to work out these details. The only bit that has to be written in C is the function which takes the sequence of sequences (NumPy arrays or Python tuples) in which to store the query results column-wise, and stuffs the value for each column for each row of the result set into the appropriate passed-in sequence object.

I would be more than happy to assist with the Python code, testing and documentation, but my C skills aren't up to helping with the guts of it. In other words, making this part of the low-level _mysql interface would be sufficient.

Cheers, Tim C

From vanroose at ruca.ua.ac.be Sat Apr 29 08:58:29 2000 From: vanroose at ruca.ua.ac.be (Vanroose Wim) Date: Sat, 29 Apr 2000 14:58:29 +0200 Subject: [Numpy-discussion] Gnu Scientific Library Message-ID: <390ADC75.CAABEABA@ruca.ua.ac.be>

Dear Madam, Sir, I recently started to use the GSL library from http://soureware.cygnus.com/gsl/ . They have an interesting collection of special functions, and I started to wrap several of them into my Python programs. Does anybody have experience with GSL? Wouldn't it be beautiful to produce a Python module based on the GSL special functions? Did somebody already do it?

Wim Vanroose
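Hand-wrapping a single GSL special function is short enough to sketch here. This assumes the modern-style name gsl_sf_bessel_J0 and header layout, which may differ in the GSL snapshot of the time; treat it as a pattern rather than working code for that release:

    /* Build as a Python extension module linked against -lgsl. */
    #include "Python.h"
    #include <gsl/gsl_sf_bessel.h>

    static PyObject *py_bessel_J0(PyObject *self, PyObject *args)
    {
        double x;
        if (!PyArg_ParseTuple(args, "d", &x))
            return NULL;
        /* Regular cylindrical Bessel function of order zero at x. */
        return Py_BuildValue("d", gsl_sf_bessel_J0(x));
    }

    static PyMethodDef gslsf_methods[] = {
        {"bessel_J0", py_bessel_J0, METH_VARARGS},
        {NULL, NULL}
    };

    void initgslsf(void)
    {
        Py_InitModule("gslsf", gslsf_methods);
    }

Making such wrappers operate elementwise on NumPy arrays, as Tim asks below, is what ufunc machinery is for - the cephes module Janko mentions next does exactly that for another special-function library.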
From jhauser at ifm.uni-kiel.de Sat Apr 29 11:50:24 2000 From: jhauser at ifm.uni-kiel.de (Janko Hauser) Date: Sat, 29 Apr 2000 17:50:24 +0200 (CEST) Subject: [Numpy-discussion] Gnu Scientific Library In-Reply-To: <390ADC75.CAABEABA@ruca.ua.ac.be> References: <390ADC75.CAABEABA@ruca.ua.ac.be> Message-ID: <20000429155024.24770.qmail@lisboa.ifm.uni-kiel.de>

A very complete set of special functions is already wrapped by Travis Oliphant; look for the cephes module. But there are numerous other functions in GSL which would be worthwhile to connect to NumPy. One benefit of using such a general library covering different areas is that with one form of interface a whole slew of functions can be wrapped, which also makes the packaging a lot easier. Also, I think the library is designed with wrapping for other languages in mind. Just to mention another library with a similar scope, but which is older and perhaps more mature, there is also SLATEC from netlib.

__Janko

From tchur at bigpond.com Sat Apr 29 19:12:11 2000 From: tchur at bigpond.com (Tim Churches) Date: Sun, 30 Apr 2000 09:12:11 +1000 Subject: [Numpy-discussion] Gnu Scientific Library References: <390ADC75.CAABEABA@ruca.ua.ac.be> Message-ID: <390B6C4B.58080AED@bigpond.com>

Vanroose Wim wrote:
>
> Dear Madam, Sir, I recently started to use the GSL library from http://soureware.cygnus.com/gsl/ . They have an interesting collection of special functions, and I started to wrap several of them into my Python programs. Does anybody have experience with GSL?
>
> Wouldn't it be beautiful to produce a Python module based on the GSL special functions? Did somebody already do it?
>
> Wim Vanroose

Wim, Have a look at http://gestalt-system.sourceforge.net/gestalt_manifesto_exp_097.html and search for the string "GSL". As you will see, we originally proposed to wrap at least the statistical functions of GSL as User-Defined Functions and/or User-Defined Procedures for MySQL. (Note that the GNU Goose library which we mention is no longer under separate, active development, having been rolled back into the GNOME Guppy project, it seems.)

We still hope to wrap GSL for use directly in MySQL, but this now has a lower priority after experiencing how fast and memory-efficient NumPy is for basic exploratory statistics when used in conjunction with Gary Strangman's stats.py package. Nevertheless, it would be useful to have the GSL library available in Python - is it feasible to make it work with NumPy arrays as well as other Python sequences? We are most interested in the statistical aspects of the library, but all the functions are potentially useful. My C skills are not up to the task, but perhaps someone else on the GS-discuss mailing list might be able to assist?

Regards, Tim Churches