From vanandel at atd.ucar.edu  Wed Feb  2 13:35:34 2000
From: vanandel at atd.ucar.edu (Joe Van Andel)
Date: Wed, 02 Feb 2000 11:35:34 -0700
Subject: [Numpy-discussion] single precision routines in NumPy?
Message-ID: <389878F6.B7F2DAF@atd.ucar.edu>

I would like a single precision version of 'interp' in the Numeric Core. (I want such a routine because I'm operating on huge single precision arrays that I don't want promoted to double precision.)

I've written such a routine, but Paul Dubois and I are discussing the best way of integrating it into the core. One solution is to simply add a new function 'interpf' to arrayfnsmodule.c. Another solution is to add a typecode=Float option to interp.

Any opinions on how this single precision version should be handled?

--
Joe VanAndel
National Center for Atmospheric Research
http://www.atd.ucar.edu/~vanandel/
Internet: vanandel at ucar.edu

From tla at research.nj.nec.com  Thu Feb  3 16:57:41 2000
From: tla at research.nj.nec.com (Tom Adelman)
Date: Thu, 03 Feb 2000 16:57:41 -0500
Subject: [Numpy-discussion] newbie: PyArray_Check difficulties
Message-ID: <3.0.1.32.20000203165741.00958d00@zingo.nj.nec.com>

I'm having a problem with PyArray_Check. If I just call PyArray_Check(args) I don't have a problem, but if I try to assign the result to anything, etc., it crashes (due to an access violation). So, for example, the code at the end of this note doesn't work, yet I know an array is being passed and I can, for example, calculate its trace correctly if I type cast it as a PyArrayObject*.

Also, a more general question: is this the recommended way to input NumPy arrays when using SWIG, or do most people find it easier to use more elaborate typemaps, or something else?

Finally, I apologize if this is the wrong forum to post this question. Please let me know.

Thanks, Tom

Method from C++ class:

    PyObject * Test01::trace(PyObject * args)
    {
        if (!(PyArray_Check(args))) {   // <- crashes here
            PyErr_SetString(PyExc_ValueError, "must use NumPy array");
            return NULL;
        }
        return NULL;
    }

Swig file (where typemaps are the ones included with the most recent SWIG):

    /* TMatrix.i */
    %module Ptest
    %include "typemaps.i"
    %{
    #include "Test01.h"
    %}

    class Test01 {
    public:
        PyObject * trace(PyObject *INPUT);
        Test01();
        virtual ~Test01();
    };

Python code:

    import Ptest
    t = Ptest.Test01()
    import Numeric
    a = Numeric.arange(1.1, 2.7, .1)
    b = Numeric.reshape(a, (4,4))
    x = t.trace(b)

From Oliphant.Travis at mayo.edu  Fri Feb  4 15:49:34 2000
From: Oliphant.Travis at mayo.edu (Travis Oliphant)
Date: Fri, 4 Feb 2000 14:49:34 -0600 (CST)
Subject: [Numpy-discussion] Re: Numpy-discussion digest, Vol 1 #7 - 1 msg
In-Reply-To: <200002042005.MAA15424@lists.sourceforge.net>
Message-ID: 

> I'm having a problem with PyArray_Check. If I just call
> PyArray_Check(args) I don't have a problem, but if I try to assign the
> result to anything, etc., it crashes (due to an access violation). So, for
> example the code at the end of this note doesn't work, yet I know an array
> is being passed and I can, for example, calculate its trace correctly if I
> type cast it as a PyArrayObject*.
>
> Also, a more general question: is this the recommended way to input NumPy
> arrays when using SWIG, or do most people find it easier to use more
> elaborate typemaps, or something else?

I have some experience with SWIG, but it is not my favorite method for using Numerical Python with C, since you have so little control over how things get allocated.

Your problem is probably due to the fact that you do not run import_array() in the module header.
There is a directive in SWIG that lets you specify commands to run at module initialization. Try this in your *.i file:

    %init %{
    import_array();
    %}

This may help.

Best,

Travis

From Oliphant.Travis at mayo.edu  Mon Feb  7 19:08:43 2000
From: Oliphant.Travis at mayo.edu (Travis Oliphant)
Date: Mon, 7 Feb 2000 18:08:43 -0600 (CST)
Subject: [Numpy-discussion] An Experiment in code-cleanup.
Message-ID: 

I wanted to let users of the community know (so they can help if they want, or offer criticism or comments) that over the next several months I will be experimenting with a branch of the main Numerical source tree and endeavoring to "clean up" the code for Numerical Python.

I have in mind a few (in my opinion minor) alterations to the current code base which necessitate a branch. Guido has made some good suggestions for improving the code base, and both David Ascher and Paul Dubois have expressed concerns over the current state of the source code and given suggestions as to how to improve it. That said, I should emphasize that my work is not authorized, or endorsed, by any of the people mentioned above. It is simply my little experiment.

My intent is not to re-create Numerical Python --- I like most of the current functionality --- but merely to clean up the code, comment it, change the underlying structure just a bit, and add some features I want. One goal I have is to create something that can go into Python 1.7 at some future point, so this incarnation of Numerical Python may not be completely C-source compatible with current Numerical Python (but it will be close). This means C extensions that access the underlying structure of the current arrayobject may need some alterations to use this experimental branch if it ever becomes useful.

I don't know how long this will take me. I'm not promising anything. The purpose of this announcement is just to invite interested parties into the discussion.

These are the (somewhat negotiable) directions I will be pursuing:

1) Still written in C but heavily (in my opinion) commented.

2) Addition of bit-types and unsigned integer types.

3) Facility for memory-mapped dataspace in arrays.

4) Slices become copies, with the addition of methods for the current strict referencing behavior.

5) Handling of sliceobjects which consist of sequences of indices (so that setting and getting elements of arrays using their index is possible).

6) Rank-0 arrays will not be autoconverted to Python scalars, but will still behave as Python scalars whenever Python allows general scalar-like objects in its operations. Methods will allow user-controlled conversion to the Python scalars.

7) Addition of attributes so that different users can configure aspects of the math behavior, to their heart's content.

If there is anyone interested in helping in this "unofficial branch work", let me know and we'll see about setting up someplace to work. Be warned, however, that I like actual code or code-templates more than just great ideas (truly great ideas are never turned away however ;-) )

If something I do benefits the current NumPy source in a non-invasive, backwards compatible way, I will try to place it in the current CVS tree, but that won't be a priority, as my time does have limitations and I'm scratching my own itch at this point.

Best regards,

Travis Oliphant

From dubois1 at llnl.gov  Mon Feb  7 19:22:45 2000
From: dubois1 at llnl.gov (Paul F. Dubois)
Date: Mon, 7 Feb 2000 16:22:45 -0800
Subject: [Numpy-discussion] RE: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: 
Message-ID: 

Travis says that I don't necessarily endorse his goals, but in fact I do, strongly. If I understand right, he intends to make a CVS branch for this experiment, and that is fine with me.

The only goal I didn't quite understand was:

    Addition of attributes so that different users can configure aspects
    of the math behavior, to their heart's content.

In a world of reusable components the situation is complicated. I would not like to support a dot-product routine, for example, if the user could turn off any double precision behind my back. My needs for precision are local to my algorithm.

From archiver at db.geocrawler.com  Tue Feb  8 10:52:47 2000
From: archiver at db.geocrawler.com (John Travers)
Date: Tue, 8 Feb 2000 07:52:47 -0800
Subject: [Numpy-discussion] Re: A proposal for dot product
Message-ID: <200002081552.HAA10267@www.geocrawler.com>

This message was sent from Geocrawler.com by "John Travers". Be sure to reply to that address.

If the above was implemented, I would be very happy indeed. As a maths student, I use NumPy a lot, and get infuriated with the current implementation.

John.

Geocrawler.com - The Knowledge Archive

From hinsen at cnrs-orleans.fr  Tue Feb  8 12:12:56 2000
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Tue, 8 Feb 2000 18:12:56 +0100
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: (message from Travis Oliphant on Mon, 7 Feb 2000 18:08:43 -0600 (CST))
Message-ID: <200002081712.SAA03158@chinon.cnrs-orleans.fr>

> 3) Facility for memory-mapped dataspace in arrays.

I'd really like to have that...

> 4) Slices become copies with the addition of methods for current strict
> referencing behavior.

This will break a lot of code, and in a way that will be difficult to debug. In fact, this is the only point you mention which would be reason enough for me not to use your modified version; going through all of my code to check what effect this might have sounds like a nightmare.

I see the point of having a copying version as well, but why not implement the copying behaviour as methods and leave indexing as it is?

> 5) Handling of sliceobjects which consist of sequences of indices (so that
> setting and getting elements of arrays using their index is possible).

Sounds good as well...

> 6) Rank-0 arrays will not be autoconverted to Python scalars, but will
> still behave as Python scalars whenever Python allows general scalar-like
> objects in its operations. Methods will allow user-controlled
> conversion to the Python scalars.

I suspect that full behaviour-compatibility with scalars is impossible, but I am willing to be proven wrong. For example, Python scalars are immutable, arrays aren't. This also means that rank-0 arrays can't be used as keys in dictionaries.

How do you plan to implement mixed arithmetic with scalars? If the return value is a rank-0 array, then a single library returning a rank-0 array somewhere could mess up a program well enough that debugging becomes a nightmare.

> 7) Addition of attributes so that different users can configure aspects of
> the math behavior, to their heart's content.

You mean global attributes? That could be the end of universally usable library modules, supposing that people actually use them.

> If there is anyone interested in helping in this "unofficial branch
> work", let me know and we'll see about setting up someplace to work.

I don't have much time at the moment, but I could still help out with testing etc.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
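Konrad's dictionary-key point is easy to check with plain Python today: keys must be hashable, and mutable objects are not. A minimal sketch (the values are arbitrary):

    d = {}
    d[(1, 2)] = 'ok'      # a tuple is immutable, hence hashable
    d[[1, 2]] = 'boom'    # a list is mutable: TypeError, unhashable type

A mutable rank-0 array would fail in exactly the way the list does, which is why full scalar compatibility is doubtful.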
From Oliphant.Travis at mayo.edu  Tue Feb  8 12:38:26 2000
From: Oliphant.Travis at mayo.edu (Travis Oliphant)
Date: Tue, 8 Feb 2000 11:38:26 -0600 (CST)
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: <200002081712.SAA03158@chinon.cnrs-orleans.fr>
Message-ID: 

> > 3) Facility for memory-mapped dataspace in arrays.
>
> I'd really like to have that...

This is pretty easy to add, but it does require some changes to the underlying structure, so you can expect it.

> > 4) Slices become copies with the addition of methods for current strict
> > referencing behavior.
>
> This will break a lot of code, and in a way that will be difficult to
> debug. In fact, this is the only point you mention which would be
> reason enough for me not to use your modified version; going through
> all of my code to check what effect this might have sounds like a
> nightmare.

I know this will be a sticky point. I'm not sure what to do exactly, but the current behavior and implementation make the semantics for slicing an array using a sequence problematic, since I don't see a way to represent a reference to a sequence of indices in the underlying structure of an array. So such slices would have to be copies and not references, which makes for inconsistent code.

> I see the point of having a copying version as well, but why not
> implement the copying behaviour as methods and leave indexing as it
> is?

I want to agree with you, but I think we may need to change the behavior eventually, so when is it going to happen?

> > 5) Handling of sliceobjects which consist of sequences of indices (so that
> > setting and getting elements of arrays using their index is possible).
>
> Sounds good as well...

This facility is already embedded in the underlying structure. My plan is to go with the original idea that Jim Hugunin and Chris Chase had for slice objects. The sliceobject in Python is already general enough for this to work.

> > 6) Rank-0 arrays will not be autoconverted to Python scalars, but will
> > still behave as Python scalars whenever Python allows general scalar-like
> > objects in its operations. Methods will allow user-controlled
> > conversion to the Python scalars.
>
> I suspect that full behaviour-compatibility with scalars is
> impossible, but I am willing to be proven wrong. For example, Python
> scalars are immutable, arrays aren't. This also means that rank-0
> arrays can't be used as keys in dictionaries.
>
> How do you plan to implement mixed arithmetic with scalars? If the
> return value is a rank-0 array, then a single library returning
> a rank-0 array somewhere could mess up a program well enough that
> debugging becomes a nightmare.

Mixed arithmetic in general is another sticky point. I went back and read the discussion of this point which occurred in 1995-1996. It was very interesting reading and a lot of points were made. Now we have several years of experience and we should apply what we've learned (of course we've all learned different things :-) ).

Konrad, you had a lot to say on this point 4 years ago. I've had a long discussion with a colleague who is starting to "get in" to Numerical Python, and he has really been annoyed with the current mixed arithmetic rules. They seem to try to outguess the user. The spacesaving concept helps, but it still seems like a hack to me.

I know there are several opinions, so I'll offer mine. We need simple rules that are easy to teach a newcomer. Right now the rule is fairly simple in that coercion always proceeds up. But mixed arithmetic with a float and a double does not produce something with genuinely double precision -- yet that's our rule. I think any automatic conversion should go the other way.

Konrad, 4 years ago you talked about unexpected losses of precision if this were allowed to happen, but I couldn't understand how.
To me, it is unexpected to have double precision arrays which are really only carrying single-precision results. My idea of the coercion hierarchy is shown below, with conversion always happening down when called for. The Python scalars get mapped to the "largest precision" in their category and then normal coercion rules take place. The casual user will never use single precision arrays and so will not even notice they are there unless they request them. If they do request them, they don't want them suddenly changing precision. That is my take anyway.

    Boolean
    Character
    Unsigned
        long
        int
        short
    Signed
        long
        int
        short
    Real
        /* long double */
        double
        float
    Complex
        /* __complex__ long double */
        __complex__ double
        __complex__ float
    Object

> > 7) Addition of attributes so that different users can configure aspects of
> > the math behavior, to their heart's content.
>
> You mean global attributes? That could be the end of universally
> usable library modules, supposing that people actually use them.

I thought I did, but I've changed my mind after reading the discussion in 1995. I don't like global attributes either, so I'm not going there.

> > If there is anyone interested in helping in this "unofficial branch
> > work", let me know and we'll see about setting up someplace to work.
>
> I don't have much time at the moment, but I could still help out with
> testing etc.

Konrad, you were very instrumental in getting NumPy off the ground in the first place and I will always appreciate your input.

From pauldubois at home.com  Tue Feb  8 12:56:11 2000
From: pauldubois at home.com (Paul F. Dubois)
Date: Tue, 8 Feb 2000 09:56:11 -0800
Subject: [Numpy-discussion] precision isn't just precision
In-Reply-To: 
Message-ID: 

Before we all rattle on too long about precision, I'd like to point out that selecting a precision actually carries two consequences in the context of computer languages:

1. Expected: the number of digits of accuracy in the representation of a floating point number.

2. Unexpected: the range of numbers that can be represented by this type.

Thus, to a scientist it is perfectly logical that if d is a double and f is a single, d * f has only single precision validity. Unfortunately, in a computer, if you hold this answer in a single, it may fail if the contents of d include numbers outside the single range, even if f is 1.0. Thus the rules in C and Fortran that coercion is UP had to do as much with range as with precision.

From pearu at ioc.ee  Tue Feb  8 14:46:16 2000
From: pearu at ioc.ee (Pearu Peterson)
Date: Tue, 8 Feb 2000 21:46:16 +0200 (EET)
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: 
Message-ID: 

On Tue, 8 Feb 2000, Travis Oliphant wrote:

> I know there are several opinions, so I'll offer mine. We need
> simple rules that are easy to teach a newcomer. Right now the rule is
> fairly simple in that coercion always proceeds up. But mixed arithmetic
> with a float and a double does not produce something with genuinely double
> precision -- yet that's our rule. I think any automatic conversion should
> go the other way.

Remark: If you are consistent, then you say here that mixed arithmetic with an int and a float/double produces an int?! Right? (I hope that I am wrong.)

> Boolean
> Character
> Unsigned
>     long
>     int
>     short
> Signed
>     long
>     int
>     short

How about `/* long long */'? Is this left out intentionally?

> Real
>     /* long double */
>     double
>     float

Travis, while you are doing revision on NumPy, could you also estimate the degree of difficulty of introducing column-major order arrays?

Pearu
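For readers who haven't bumped into the coercion-up behavior being debated, a minimal sketch of the status quo (assuming a stock Numeric build of this era, without the spacesaver flag; 'f' is single precision, 'd' double):

    import Numeric

    a = Numeric.array([1.0, 2.0, 3.0], 'f')  # single precision array
    b = a + 1.0          # a Python float scalar is a C double...
    print b.typecode()   # ...so coercion up yields 'd';
                         # Travis's proposal would keep 'f' here

Pearu's remark probes the edge of the proposal: if conversion always goes "down" the hierarchy, an int mixed with a float would also have to go down, which is presumably not intended.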
From hinsen at cnrs-orleans.fr  Tue Feb  8 14:56:21 2000
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Tue, 8 Feb 2000 20:56:21 +0100
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: (message from Travis Oliphant on Tue, 8 Feb 2000 11:38:26 -0600 (CST))
Message-ID: <200002081956.UAA03241@chinon.cnrs-orleans.fr>

> I know this will be a sticky point. I'm not sure what to do exactly, but
> the current behavior and implementation make the semantics for slicing an
> array using a sequence problematic, since I don't see a way to represent a

You are right there. But is it really necessary to extend the meaning of slices? Of course everyone wants the functionality of indexing with a sequence, but I'd be perfectly happy to have it implemented as a method. Indexing would remain as it is (by reference), and a new method would provide copying behaviour for element extraction and also permit more generalized sequence indices.

In addition to backwards compatibility, there is another argument for keeping indexing behaviour as it is: compatibility with other Python sequence types. If you have a list of lists, which in many ways behaves like a 2D array, and extract the third element (which is thus a list), then this data is shared with the full nested list.
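Konrad's sharing point in miniature, with plain Python lists:

    x = [[1, 2], [3, 4], [5, 6]]
    row = x[2]     # extracting an element does not copy it...
    row[0] = 99
    print x        # [[1, 2], [3, 4], [99, 6]] -- the change shows through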
> > How do you plan to implement mixed arithmetic with scalars? If the
> > return value is a rank-0 array, then a single library returning
> > a rank-0 array somewhere could mess up a program well enough that
> > debugging becomes a nightmare.
>
> Mixed arithmetic in general is another sticky point. I went back and read
> the discussion of this point which occurred in 1995-1996. It was very

What I meant was not mixed-precision arithmetic, but arithmetic in which one operand is a scalar and the other one a rank-0 array.

Which reminds me: rank-0 arrays are also incompatible with the nested-list view of arrays. The elements of a list of numbers are numbers, not number-like sequence objects.

But back to precision, which is also a popular subject:

> discussion with a colleague who is starting to "get in" to Numerical
> Python, and he has really been annoyed with the current mixed arithmetic
> rules. They seem to try to outguess the user. The spacesaving concept
> helps, but it still seems like a hack to me.

I wouldn't say that the current system tries to outguess the user. It simply gives precision a higher priority than memory space. That might not coincide with what a particular user wants, but it is consistent and easy to understand.

> I know there are several opinions, so I'll offer mine. We need
> simple rules that are easy to teach a newcomer. Right now the rule is
> fairly simple in that coercion always proceeds up. But mixed arithmetic

Like in Python (for scalars), C, Fortran, and all other languages that I can think of.

> Konrad, 4 years ago you talked about unexpected losses of precision if
> this were allowed to happen, but I couldn't understand how. To me, it is
> unexpected to have double precision arrays which are really only
> carrying single-precision results. My idea of the coercion hierarchy is

I think this is a confusion of two different meanings of "precision". In numerical algorithms, precision refers to the deviation between an ideal and a real numerical value. In programming languages, it refers to the *maximum* precision that can be stored in a given data type (and is in fact often combined with a difference in range).

The upcasting rule thus ensures that

1) No precision is lost accidentally. If you multiply a float by a double, the float might contain the exact number 2, and thus have infinite precision. The language can't know this, so it acts conservatively and chooses the "bigger" type.

2) No overflow occurs unless it is unavoidable (the range problem).

> The casual user will never use single precision arrays and so will not
> even notice they are there unless they request them. If they do request

There are many ways in which single-precision arrays can creep into a program without a user's attention. Suppose you send me some data in a pickled array, which happens to be single-precision. Or I call a library routine that does some internal calculation on huge data arrays, which it keeps at single precision, and (intentionally or by error) returns a single-precision result.

I think your "active flag" solution is a rather good solution to the casting problem, because it gives access to a different behaviour in a very explicit way. So unless future experience points out problems, I'd propose to keep it.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From Barrett at stsci.edu  Tue Feb  8 15:10:39 2000
From: Barrett at stsci.edu (Paul Barrett)
Date: Tue, 8 Feb 2000 15:10:39 -0500 (EST)
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: 
References: <14496.16890.698835.619131@nem-srvr.stsci.edu>
Message-ID: <14496.26037.829754.450187@nem-srvr.stsci.edu>

Travis Oliphant writes:
> >
> > 1) The re-use of temporary arrays -- to conserve memory.
>
> Please elaborate about this request.

When Python evaluates the expression:

>>> Y = B*X + A

where A, B, X, and Y are all arrays, B*X creates a temporary array, T. A new array, Y, will be created to hold the result of T + A, and T will be deleted. If T and Y have the same shape and typecode, then instead of creating Y, T can be re-used to conserve memory.
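Some of this saving is already available by hand, at the cost of readability. A sketch, assuming the optional output-array argument that Numeric's ufuncs accept:

    import Numeric

    A = Numeric.ones((5,), 'd')
    B = Numeric.ones((5,), 'd')
    X = Numeric.ones((5,), 'd')

    T = B * X               # one temporary, created explicitly...
    Numeric.add(T, A, T)    # ...then reused as the output array
    Y = T                   # Y = B*X + A with a single allocation

The proposal is for the interpreter to perform this reuse automatically when shapes and typecodes match.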
> >
> > 2) A copy-on-write option -- to enhance performance.
>
> I need more explanation of this as well.

This would be an advanced feature of arrays that use memory-mapping or access their arrays from disk. It is similar to the secondary cache of a CPU. The data is held in memory until a write request is made.

> >
> > 3) The initialization of arrays by default -- to help novices.
>
> What kind of initialization are you talking about (we have zeros and ones
> and random already).

For mixed-type (or object) arrays containing strings, zeros() and ones() would be confusing. Therefore, by default, integer and floating types would be initialized to 0 and string types to ' ', and the option would be available to not initialize the array, for performance.

> >
> > 4) The creation of a standard API -- which I guess is assumed, if it
> > is to be part of the Python standard distribution.
>
> Any suggestions as to what needs to be changed in the already somewhat
> standard API.

No, not exactly. But the last time I looked, I thought some improvements could be made to it.

> >
> > 5) The inclusion of IEEE support.
>
> This was supposed to be there from the beginning, but it didn't get
> finished. Jim's original idea was to have two math modules, one which
> checked and gave errors for 1/0 and another that returned IEEE inf for
> 1/0.
>
> The current umath does both with different types, which is annoying.

When I last spoke to Jim about this at IPC6, I was under the impression that IEEE support was not fully implemented and much work still needed to be done. Has this situation changed since then?

> >
> > And
> >
> > 6) Enhanced support for mixed-types or objects.
> >
> > This last issue is very important to me and the astronomical community,
> > since we routinely store data as (multi-dimensional) arrays of fixed
> > length records or C-structures. A current deficiency of NumPy is that
> > the object typecode does not work with the fromstring() method, so
> > importing arrays of records from a binary file is just not possible.
> > I've been developing my own C-extension type to handle this situation
> > and have come to realize that my record type is really just a
> > generalization of NumPy's types.
>
> I would like to see the code for your generalized type, which would help me
> see if there were some relatively painless way the two could be merged.

recordmodule.c is part of my PyFITS module for dealing with FITS files. You can find it here:

ftp://ra.stsci.edu/pub/barrett/PyFITS_0.3.tgz

I use NumPy to access fixed-type arrays and the record type for accessing mixed-type arrays. A common example is accessing the second element of a mixed-type (i.e. an object) from the entire array. This returns a record type with a single element, which is equivalent to a NumPy array of fixed type. Therefore users expect this object to be a NumPy array, and it isn't. They have to convert it to one.

> > two C-extension types merged. I think this enhancement can be done
> > with minimal change to the current NumPy behavior and minor changes to
> > the typecode system.
>
> If you already see how to do it, then great.

Note that NumPy already has some support for an Object type. It has been proposed that it be removed, because it is not well supported and hence few people use it. I have the contrary opinion and feel we should enhance the Object type and make it much more usable. If you don't need it, then you don't have to use it. This enhancement really shouldn't get in the way of those who only use fixed-type arrays.

So what changes to NumPy are needed?

1) Instead of a typecode (or in addition to the typecode for backward compatibility), I suggest an optional format keyword, which can be used to specify the mixed-type or object format. Namely, format = 'i, f, s10', where 'i' is an integer type, 'f' a floating point type, and s10 is a string of 10 characters.

2) Array access will be the same as it is now. For example:

    # Create a 10x10 mixed-type array.
    A = array((10, 10), format = 'i, f, s10')

    # Create a 10x10 fixed-type array.
    B = array((10, 10), typecode = 'i')

    # Print a 5x5 subarray of mixed-type.
    print A[:5,:5]

    # Print a 5x5 subarray of fixed-type.
    print B[:5,:5]
    # Or
    # (Note that the 3rd index is optional for fixed-type arrays; it
    # always defaults to 0.)
    print B[:5,:5,0]

    # Print the second element of the mixed-type of the entire array.
    # Note that this is now an array of fixed-type.
    print A[:,:,1]

The major thorn that I see at this point is how to reconcile the behavior of numbers and strings during operations. But I don't see this as an intractable problem.
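Until something like this exists, the only pure-Python route to such records is the struct module, one record at a time -- workable, but no substitute for an array type. A sketch (the field values are made up, and the struct format string 'if10s' stands in for the proposed 'i, f, s10'):

    import struct

    # One record: a C int, a C float, and a 10-byte string field.
    fmt = 'if10s'
    rec = struct.pack(fmt, 42, 3.25, 'NGC1365')
    print struct.unpack(fmt, rec)   # (42, 3.25, 'NGC1365\x00\x00\x00')

Reading a whole binary file of such records this way is exactly the loop that fromstring() support for mixed types would eliminate.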
I actually believe this enhancement will encourage us to create a better and more generic multi-dimensional array module by concentrating on the behavioral aspects of this extension type. Note that J, which NumPy is based upon, allows such mixed types.

--
Dr. Paul Barrett        Space Telescope Science Institute
Phone: 410-516-6714     DESD/DPT
FAX:   410-516-8615     Baltimore, MD 21218

From pauldubois at home.com  Tue Feb  8 15:16:55 2000
From: pauldubois at home.com (Paul F. Dubois)
Date: Tue, 8 Feb 2000 12:16:55 -0800
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: <200002081956.UAA03241@chinon.cnrs-orleans.fr>
Message-ID: 

Konrad wrote:
>
> In addition to backwards compatibility, there is another argument for
> keeping indexing behaviour as it is: compatibility with other Python
> sequence types.

I claim the current Numeric is INconsistent with other Python sequence types:

>>> x = [1, 2, 3, 4, 5]
>>> y = x[2:5]
>>> x
[1, 2, 3, 4, 5]
>>> y
[3, 4, 5]
>>> y[1] = 7
>>> y
[3, 7, 5]
>>> x
[1, 2, 3, 4, 5]

So, y is a copy of x[2:5], not a reference.

From DavidA at ActiveState.com  Tue Feb  8 15:30:23 2000
From: DavidA at ActiveState.com (David Ascher)
Date: Tue, 8 Feb 2000 12:30:23 -0800
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: <14496.26037.829754.450187@nem-srvr.stsci.edu>
Message-ID: <000101bf7273$531e3f80$c355cfc0@ski.org>

> So what changes to NumPy are needed?
>
> 1) Instead of a typecode (or in addition to the typecode for backward
> compatibility), I suggest an optional format keyword, which can be
> used to specify the mixed-type or object format. Namely, format =
> 'i, f, s10', where 'i' is an integer type, 'f' a floating point
> type, and s10 is a string of 10 characters.

I'd suggest going all the way and making it a real object, not just a string. That object can then have useful attributes, like size in bytes, maxval, minval, some indication of precision, etc. Logically, itemsize should be an attribute of the numeric type of an array, not of the array itself.

--david ascher

From beausol at exch.hpl.hp.com  Tue Feb  8 16:31:30 2000
From: beausol at exch.hpl.hp.com (Beausoleil, Raymond)
Date: Tue, 8 Feb 2000 13:31:30 -0800
Subject: [Numpy-discussion] RE: [Matrix-SIG] An Experiment in code-cleanup.
Message-ID: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com>

I've been reading the posts on this topic with considerable interest. For a moment, I want to emphasize the "code-cleanup" angle more literally than the functionality mods suggested so far.

Several months ago, I hacked my personal copy of the NumPy distribution so that I could use the Intel Math Kernel Library for Windows. The IMKL is
(1) freely available from Intel at http://developer.intel.com/vtune/perflibst/mkl/index.htm;
(2) basically BLAS and LAPACK, with an FFT or two thrown in for good measure;
(3) optimized for the different x86 processors (e.g., generic x86, Pentium II & III);
(4) configured to use 1, 2, or 4 processors; and
(5) configured to use multithreading.
It is an impressive, fast implementation. I'm sure there are similar native libraries available on other platforms.

Probably due to my inexperience with both Python and NumPy, it took me a couple of days to successfully tear out the f2c'd stuff and get the IMKL linking correctly. The parts I've used work fine, but there are probably other features that I haven't tested yet that still aren't up to snuff. In any case, the resulting code wasn't very pretty.
As long as the NumPy code is going to be commented and cleaned up, I'd be glad to help make sure that the process of using a native BLAS/LAPACK distribution (which was probably compiled using Fortran storage and naming conventions) is more straightforward. Among the more tedious issues to consider are:

(1) The extent of the support for LAPACK. Do we want to stick with LAPACK Lite?

(2) The storage format. If we've still got row-ordered matrices under the hood, and we want to use native LAPACK libraries that were compiled using column-major format, then we'll have to be careful to set all of the flags correctly. This isn't going to be a big deal, _unless_ NumPy will support more of LAPACK when a native library is available. Then, of course, there are the special cases: the IMKL has both a C and a Fortran interface to the BLAS.

(3) Through the judicious use of header files with compiler-dependent flags, we could accommodate the various naming conventions used when the FORTRAN libraries were compiled (e.g., sgetrf_ or SGETRF).

The primary output of this effort would be an expansion of the "Compilation Notes" subsection of Section 15 of the NumPy documentation, and some header files that made the recompilation easier than it is now.

Regards,

Ray

============================
Ray Beausoleil
Hewlett-Packard Laboratories
mailto:beausol at hpl.hp.com
Vox: 425-883-6648
Fax: 425-883-2535
HP Telnet: 957-4951
============================

From Oliphant.Travis at mayo.edu  Tue Feb  8 16:32:57 2000
From: Oliphant.Travis at mayo.edu (Travis Oliphant)
Date: Tue, 8 Feb 2000 15:32:57 -0600 (CST)
Subject: [Numpy-discussion] Come take an informal survey.
In-Reply-To: <200002082004.MAA26529@lists.sourceforge.net>
Message-ID: 

In an effort to get data about what users' attitudes are toward Numerical Python, I'm conducting a survey at sourceforge.net.

If you would like to participate in the survey, please go to http://www.sourceforge.net, log in with your sourceforge id, and go to the numpy page:

http://sourceforge.net/project/?group_id=1369

In the Public Survey section there is a short survey you can fill out.

Thank you,

Travis Oliphant
NumPy Developer

From phil at geog.ubc.ca  Tue Feb  8 18:33:18 2000
From: phil at geog.ubc.ca (Phil Austin)
Date: Tue, 8 Feb 2000 15:33:18 -0800 (PST)
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: 
References: 
Message-ID: <14496.42942.5355.849670@brant.geog.ubc.ca>

Travis Oliphant writes:
>
> 3) Facility for memory-mapped dataspace in arrays.
>

For the NumPy users who are as ignorant about mmap, msync, and madvise as I am, I've put a couple of documents on my web site:

1) http://www.geog.ubc.ca/~phil/mmap/mmap.pdf

A pdf version of Kevin Sheehan's paper "Why aren't you using mmap yet?" (19-page Frame postscript original, page order back to front). He gives a good discussion of the SV4 VM model, with some mmap examples in C.

2) http://www.geog.ubc.ca/~phil/mmap/threads.html

An archived email exchange (initially on the F90 mailing list) between Kevin (who is an independent Solaris consultant) and Brian Sumner (SGI) about the pros and cons of using mmap.

Executive summary:

i) mmap on Solaris can be a very big win (see the bottom of http://www.geog.ubc.ca/~phil/mmap/msg00003.html) when used in combination with WILLNEED/WONTNEED madvise calls to guide the page prefetching.
ii) IRIX and some other Unices (Linux 2.2 in particular) haven't implemented madvise, and naive use of mmap without madvise can produce lots of page faulting and much slower I/O than, say, asynchronous I/O calls on IRIX. (http://www.geog.ubc.ca/~phil/mmap/msg00009.html)

So I'd love to see mmap in Numpy, but we may need to produce a tutorial outlining the tradeoffs, and giving some examples of madvise/msync/mmap used together (with a few benchmarks). Any mmap module would need to include member functions that call madvise/msync for the mmapped array (but these may be no-ops on several popular OSes).

Regards, Phil

From jrwebb at goodnet.com  Tue Feb  8 01:03:42 2000
From: jrwebb at goodnet.com (James R. Webb)
Date: Mon, 7 Feb 2000 23:03:42 -0700
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
References: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com>
Message-ID: <001801bf71fa$41f681a0$01f936d1@janus>

There is now a native Linux BLAS available through links at http://www.cs.utk.edu/~ghenry/distrib/, courtesy of the ASCI Option Red Project.

There is also ATLAS (http://www.netlib.org/atlas/).

Either library seems to link into NumPy without a hitch.

----- Original Message -----
From: "Beausoleil, Raymond"
To: 
Cc: 
Sent: Tuesday, February 08, 2000 2:31 PM
Subject: RE: [Matrix-SIG] An Experiment in code-cleanup.

> I've been reading the posts on this topic with considerable interest. For a
> moment, I want to emphasize the "code-cleanup" angle more literally than the
> functionality mods suggested so far.
>
> Several months ago, I hacked my personal copy of the NumPy distribution so
> that I could use the Intel Math Kernel Library for Windows. The IMKL is
> (1) freely available from Intel at
> http://developer.intel.com/vtune/perflibst/mkl/index.htm;
> (2) basically BLAS and LAPACK, with an FFT or two thrown in for good
> measure;
> (3) optimized for the different x86 processors (e.g., generic x86, Pentium
> II & III);
> (4) configured to use 1, 2, or 4 processors; and
> (5) configured to use multithreading.
> It is an impressive, fast implementation. I'm sure there are similar native
> libraries available on other platforms.
>
> Probably due to my inexperience with both Python and NumPy, it took me a
> couple of days to successfully tear out the f2c'd stuff and get the IMKL
> linking correctly. The parts I've used work fine, but there are probably
> other features that I haven't tested yet that still aren't up to snuff. In
> any case, the resulting code wasn't very pretty.
>
> As long as the NumPy code is going to be commented and cleaned up, I'd be
> glad to help make sure that the process of using a native BLAS/LAPACK
> distribution (which was probably compiled using Fortran storage and naming
> conventions) is more straightforward. Among the more tedious issues to
> consider are:
> (1) The extent of the support for LAPACK. Do we want to stick with LAPACK
> Lite?
> (2) The storage format. If we've still got row-ordered matrices under the
> hood, and we want to use native LAPACK libraries that were compiled using
> column-major format, then we'll have to be careful to set all of the flags
> correctly. This isn't going to be a big deal, _unless_ NumPy will support
> more of LAPACK when a native library is available. Then, of course, there
> are the special cases: the IMKL has both a C and a Fortran interface to the
> BLAS.
> (3) Through the judicious use of header files with compiler-dependent flags,
> we could accommodate the various naming conventions used when the FORTRAN
> libraries were compiled (e.g., sgetrf_ or SGETRF).
>
> The primary output of this effort would be an expansion of the "Compilation
> Notes" subsection of Section 15 of the NumPy documentation, and some header
> files that made the recompilation easier than it is now.
>
> Regards,
>
> Ray
>
> ============================
> Ray Beausoleil
> Hewlett-Packard Laboratories
> mailto:beausol at hpl.hp.com
> Vox: 425-883-6648
> Fax: 425-883-2535
> HP Telnet: 957-4951
> ============================
>
> _______________________________________________
> Matrix-SIG maillist - Matrix-SIG at python.org
> http://www.python.org/mailman/listinfo/matrix-sig

From amullhau at zen-pharaohs.com  Wed Feb  9 01:51:09 2000
From: amullhau at zen-pharaohs.com (Andrew P. Mullhaupt)
Date: Wed, 9 Feb 2000 01:51:09 -0500
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
References: <200002081956.UAA03241@chinon.cnrs-orleans.fr>
Message-ID: <03f401bf72ca$0e0608e0$5063cb0a@amullhau>

> I'd be perfectly happy to have it implemented as a
> method. Indexing would remain as it is (by reference), and a new
> method would provide copying behaviour for element extraction and also
> permit more generalized sequence indices.

I think I can live with that, as long as it _syntactically_ looks like indexing. This is one case where the syntax is more important than the functionality. There are things you want to index with indices, etc., and composition with parenthesis-like (Dyck language) syntax has proved to be one of the few readable ways to do it.

> In addition to backwards compatibility, there is another argument for
> keeping indexing behaviour as it is: compatibility with other Python
> sequence types. If you have a list of lists, which in many ways
> behaves like a 2D array, and extract the third element (which is thus
> a list), then this data is shared with the full nested list.

_Avoiding_ data sharing will eventually be more important than supporting data sharing, since memory continues to get cheaper but memory bandwidth and latency do not improve at the same rate. Locality of reference is hard to control when there is a lot of default data sharing, and performance suffers, yet it becomes important on more and more scales as memory systems become more and more hierarchical. Ultimately, the _semantics_ we like will be implemented efficiently by code which copies and references as it sees fit, keeping track of which copies are "really" references and which references are really "copies". I've thought this through for the "everything gets copied" languages and it isn't too mentally distressing - you simply reference count the fake copies. The "everything is a reference" languages are less clean, but the database people have confronted that problem.

> Which reminds me: rank-0 arrays are also incompatible with the
> nested-list view of arrays.

There are ways out of that trap. Most post-ISO APLs provide examples of how to cope.

> > I know there are several opinions, so I'll offer mine. We need
> > simple rules that are easy to teach a newcomer. Right now the rule is
> > fairly simple in that coercion always proceeds up. But mixed arithmetic
>
> Like in Python (for scalars), C, Fortran, and all other languages that
> I can think of.

And that is not a bad thing. But which way is "up"? (See example below.)
> > Konrad, 4 years ago you talked about unexpected losses of precision if
> > this were allowed to happen, but I couldn't understand how. To me, it is
> > unexpected to have double precision arrays which are really only
> > carrying single-precision results.

Most people always hate, and only sometimes detect, when that happens. It specifically contravenes the Geneva conventions on programming mental hygiene.

> The upcasting rule thus ensures that
>
> 1) No precision is lost accidentally.

More or less.

More precisely, it depends on what you call an accident. What happens when you add the IEEE single precision floating point value 1.0 to the 32-bit integer 2^30? A _lot_ of people don't expect to get the IEEE single precision floating point value 2.0^30, but that is what happens in some languages. Is that an "upcast"? Would the 32-bit integer 2^30 make more sense? Now what about the case where the 32-bit integer is signed and adding one to it will "wrap around" if the value remains an integer? Because these two examples might make double precision or a wider integer (if available) seem the correct answer, suppose it's only one element of a gigantic array? Let's now talk about complex values....

There are plenty of rough edges like this when you mix numerical types. It's guaranteed that everybody's ox will get gored somewhere.

> 2) No overflow occurs unless it is unavoidable (the range problem).
>
> > The casual user will never use single precision arrays and so will not
> > even notice they are there unless they request them. If they do request
>
> There are many ways in which single-precision arrays can creep into a
> program without a user's attention.

Absolutely.

> Suppose you send me some data in a
> pickled array, which happens to be single-precision. Or I call a
> library routine that does some internal calculation on huge data
> arrays, which it keeps at single precision, and (intentionally or by
> error) returns a single-precision result.

And the worst one is when the accuracy of the result is single precision, but the _type_ of the result is double precision. There is a function in S-plus which does this (without documenting it, of course), and man, was that a pain in the neck to sort out. Today I found another bug in one of the S-plus functions - it turns out that if you hand a complex triangular matrix and a real right hand side to the triangular solver (backsubstitution), it doesn't cast the right hand side to complex, and it uses whatever values are subsequent in memory to the right hand side as if they were part of the vector. Obviously, when testing the function, they didn't try this mixed-type case.

But interpreters are really convenient for writing code so that you _don't_ have to think about types all the time and do your own casting. Which is why stubbing your head on an unexpected cast is so unlooked for.

> I think your "active flag" solution is a rather good solution to the
> casting problem, because it gives access to a different behaviour in a
> very explicit way. So unless future experience points out problems,
> I'd propose to keep it.

Is there a simple way to ensure that no active arrays are ever activated at any time when I use Numerical Python?

Later,
Andrew Mullhaupt
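Andrew's 1.0 + 2^30 surprise is easy to reproduce in plain Python by round-tripping the sum through an IEEE single -- a minimal sketch using only the struct module:

    import struct

    x = float(2**30) + 1.0     # exact as a C double (53-bit mantissa)
    # Store it in an IEEE single (24-bit mantissa) and read it back:
    y = struct.unpack('f', struct.pack('f', x))[0]
    print x, y                 # 1073741825.0 1073741824.0 -- the 1.0 is gone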
From amullhau at zen-pharaohs.com  Wed Feb  9 02:17:39 2000
From: amullhau at zen-pharaohs.com (Andrew P. Mullhaupt)
Date: Wed, 9 Feb 2000 02:17:39 -0500
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
References: <14496.42942.5355.849670@brant.geog.ubc.ca>
Message-ID: <03fe01bf72cd$c040d640$5063cb0a@amullhau>

> Travis Oliphant writes:
> >
> > 3) Facility for memory-mapped dataspace in arrays.
>
> For the NumPy users who are as ignorant about mmap, msync,
> and madvise as I am, I've put a couple of documents on
> my web site:

I have Kevin's "Why Aren't You Using mmap() Yet?" on my site. Kevin is working on a new edition (11th anniversary? 1xth anniversary?). By the way, Uresh Vahalia's book on Unix internals is a very good read for anyone not yet familiar with modern operating systems, especially Unices. Kevin is extremely knowledgeable on this subject, and several others.

> Executive summary:
>
> i) mmap on Solaris can be a very big win

Orders of magnitude.

> (see bottom of
> http://www.geog.ubc.ca/~phil/mmap/msg00003.html) when
> used in combination with WILLNEED/WONTNEED madvise calls to
> guide the page prefetching.

And with the newer versions of Solaris, madvise() is a good way to go. madvise is _not_ SVR4 (not in SVID3), but it _is_ in the OSF/1 AES, which means it is _not_ vendor specific. The standard part of madvise is that it is a "hint"; everything it actually _does_ when you hint the kernel with madvise is usually specific to certain versions of an operating system. There are tricks to get around madvise not doing everything you want. (WONTNEED didn't work in Solaris for a long time; Kevin found a trick that worked really well instead. Kevin knows people at Sun, since he was one of the very earliest employees there, and the trick Kevin used to suggest has since been found to be the implementation of WONTNEED in Solaris.)

And that trick is well worth understanding. It happens that msync() is a good call to know. It has an undocumented behavior on Solaris: when you msync a memory region with MS_INVALIDATE | MS_ASYNC, the dirty pages are queued for writing and backing store is available immediately, or, if dirty, as soon as written out. This means that the pager doesn't have to run at all to scavenge the pages. Linux didn't do this last time I looked. I suggested it to the kernel guys and the idea got some positive response, but I don't know if they did it.

> ii) IRIX and some other Unices (Linux 2.2 in particular), haven't
> implemented madvise, and naive use of mmap without madvise can produce
> lots of page faulting and much slower io than, say, asynchronous io
> calls on IRIX. (http://www.geog.ubc.ca/~phil/mmap/msg00009.html)

IRIX has an awful implementation of mmap, and SGI people go around badmouthing mmap; not that they don't have cause, but they are usually very surprised to see how big the win is with a good implementation. Of course, the msync() trick doesn't work on IRIX last I looked, which leads the SGI people to believe that mmap() is brain damaged because it runs the pager into the ground. It's a point of view that is bound to come up.

HP/UX was really whacked last time I looked. They had a version (10) which supported the full mmap() on one series of workstations (700, 7000, I forget, let's say 7e+?) and didn't support it, except in the non-useful SVR3.2 way, on another series of workstations (8e+?). The reason was that the 8e+? workstations were multiprocessor and they hadn't figured out how to get the newer kernel flying on the multiprocessors. I know Konrad had HP systems at one point; maybe he has the scoop on those.
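For the curious, one can already get partway there by hand. A sketch, assuming a POSIX system, a Python recent enough to ship the mmap module, and a hypothetical file data.bin full of C doubles (note that fromstring() still copies -- a true memory-mapped array, Travis's point 3, would not):

    import mmap, os
    import Numeric

    fd = os.open('data.bin', os.O_RDONLY)
    nbytes = os.fstat(fd)[6]    # stat tuple index 6 is st_size
    m = mmap.mmap(fd, nbytes, mmap.MAP_SHARED, mmap.PROT_READ)
    a = Numeric.fromstring(m[:], 'd')   # the copy a mapped array would avoid
    print a.shape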
> So I'd love to see mmap in Numpy, but we may need to produce a
> tutorial outlining the tradeoffs, and giving some examples of
> madvise/msync/mmap used together (with a few benchmarks). Any mmap
> module would need to include member functions that call madvise/msync
> for the mmapped array (but these may be no-ops on several popular OSes.)

I don't know if you want a separate module; maybe what you want is for the normal allocation of memory for all Numerical Python objects to be handled in a way that makes sense for each operating system. The approach I took when I was writing portable code for this sort of thing was to write a wrapper for the memory operation semantics and then implement the operations as a small library that would be OS specific, although not _that_ specific. It was possible to write single source code for SVID3 and OSF/AES1 systems with sparing use of conditional defines.

Unfortunately, that code is the intellectual property of another firm, or else I'd donate it as an example for people who want to learn stuff about mmap. As it stands, there was some similar code I was able to produce at some point. I forget who here has a copy, maybe Konrad, maybe David Ascher.

Later,
Andrew Mullhaupt

From skaller at maxtal.com.au  Wed Feb  9 11:12:49 2000
From: skaller at maxtal.com.au (skaller)
Date: Thu, 10 Feb 2000 03:12:49 +1100
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
References: <200002081956.UAA03241@chinon.cnrs-orleans.fr>
Message-ID: <38A19201.8A43EC01@maxtal.com.au>

Konrad Hinsen wrote:

> But back to precision, which is also a popular subject:

but one which even numerical programmers don't seem to understand ...

> The upcasting rule thus ensures that
>
> 1) No precision is lost accidentally. If you multiply a float by
> a double, the float might contain the exact number 2, and thus
> have infinite precision. The language can't know this, so it
> acts conservatively and chooses the "bigger" type.
>
> 2) No overflow occurs unless it is unavoidable (the range problem).

.. which is all wrong. It is NOT safe to convert floating point from a lower to a higher number of bits. ALL such conversions should be removed for this reason: any conversions should have to be explicit.

The reason is that whether a conversion to a larger number is safe or not is context dependent (and so it should NEVER be done silently). Consider a function:

    k0 = 100
    k = 99
    while k < k0:
        ...
        k0 = k
        k = ...

which refines a calculation until the measure k stops decreasing. This algorithm may terminate when k is a float, but _fail_ when k is a double -- the extra precision may cause the algorithm
-- John (Max) Skaller, mailto:skaller at maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 homepage: http://www.maxtal.com.au/~skaller download: ftp://ftp.cs.usyd.edu/au/jskaller From gpk at bell-labs.com Wed Feb 9 11:23:47 2000 From: gpk at bell-labs.com (Greg Kochanski) Date: Wed, 09 Feb 2000 11:23:47 -0500 Subject: [Numpy-discussion] Re: Numpy-discussion digest, Vol 1 #10 - 10 msgs References: <200002091617.IAA28931@lists.sourceforge.net> Message-ID: <38A19493.6E2394E6@bell-labs.com> > From: "Andrew P. Mullhaupt" > > The upcasting rule thus ensures that > > > > 1) No precision is lost accidentally. > > More or less. > > More precisely, it depends on what you call an accident. What happens when > you add the IEEE single precision floating point value 1.0 to the 32-bit > integer 2^30? A _lot_ of people don't expect to get the IEEE single > precision floating point value 2.0^30, but that is what happens in some > languages. Is that an "upcast"? Would the 32 bit integer 2^30 make more > sense? Now what about the case where the 32 bit integer is signed and adding > one to it will "wrap around" if the value remains an integer? Because these > two examples might make double precision or a wider integer (if available) > seem the correct answer, suppose it's only one element of a gigantic array? > Let's now talk about complex values.... > It's most important that the rules be simple, and (preferably) close to common languages. I'd suggest C. In my book, anyone who carelessly mixes floats and ints deserves whatever punishment the language metes out. I've done numeric work in languages where casting was by request _only_ (e.g., Limbo, for Inferno), and I found, to my surprise, that automatic type casting these type casting is only a mild convenience. Writing code with manual typecasting is surprisingly easy. Since automatic typecasting only buys a small improvement in ease of use, I'd want to be extremely sure that it doesn't cause many problems. It's very easy to write some complicated set of rules that wastes more time (in the form of unexpected, untraceable bugs) than it saves. By the way, automatic downcasting has a hidden problems if python is ever set to trap underflow errors. I had a program that would randomly crash every 10th (or so) time I ran it with a large dataset (1000x1000 linear algebra). After days of hair-pulling, I found that the matrix was being converted from double to float at one step, and about 1 in 10,000,000 of the entries was too small to represent as a single precision number. That very rare event would underflow, be trapped, and crash the program with a floating point exception. From hinsen at cnrs-orleans.fr Wed Feb 9 12:17:30 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 9 Feb 2000 18:17:30 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <38A19201.8A43EC01@maxtal.com.au> (message from skaller on Thu, 10 Feb 2000 03:12:49 +1100) References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> <38A19201.8A43EC01@maxtal.com.au> Message-ID: <200002091717.SAA10604@chinon.cnrs-orleans.fr> > silently). Consider a function > > k0 = 100 > k = 99 > while k < k0: > .. > k0 = k > k = ... > > which refines a calculation until the measure k stops decreasing. > This algorithm may terminate when k is a float, but _fail_ when > k is a double -- the extra precision may cause the algorithm I'd call this a buggy implementation. 
Convergence criteria should be explicit and not rely on the internal representation of data types. Neither Python nor C guarantees you any absolute bounds for precision and value range, and even languages that do (such as Fortran 9x) only promise to give you a data type that is *at least* as big as your specification. > programming is all about. Numerical programmers need to know > how big numbers are, and how much significance they have, > and optimise calculations accordingly -- sometimes by _using_ > the precision of the working types to advantage. If you care at all about portability, you shouldn't even think about this. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From amullhau at zen-pharaohs.com Wed Feb 9 12:21:37 2000 From: amullhau at zen-pharaohs.com (Andrew P. Mullhaupt) Date: Wed, 9 Feb 2000 12:21:37 -0500 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> <38A19201.8A43EC01@maxtal.com.au> Message-ID: <054301bf7322$23cbbbe0$5063cb0a@amullhau> > Konrad Hinsen wrote: > > > But back to precision, which is also a popular subject: > > but one which even numerical programmers don't seem to > understand ... Some do, some don't. > It is NOT safe to convert floating point from a lower to a higher > number > of bits. It is usually safe. Extremely safe. Safe enough that code in which it is _not_ safe is badly designed. > ALL such conversions should be removed for this reason: any > conversions should have to be explicit. I really hope not. A generic function with six different arguments becomes an interesting object in a language without automatic conversions. Usually, a little table driven piece of code has to cast the arguments into conformance, and then multiple versions of similar code are applied. > which refines a calculation until the measure k stops decreasing. > This algorithm may terminate when k is a float, but _fail_ when > k is a double -- the extra precision may cause the algorithm > to perform many useless iterations, in which the precision > of the result is in fact _lost_ due to rounding error. This is a classic bad programming practice and _it_ is what should be eliminated. It is a good, (and required, if you work for me), practice that: 1. All iterations should have termination conditions which are correct; that is, prevent extra iterations. This is typically precision sensitive. But that is simply something that has to be taken into account when writing the termination condition. 2. All iterations should be protected against an unexpectedly large number of iterates taking place. There are examples of iterations which are intrinsically stable in lower precision and not in higher precision (Brun's algorithm) but those are quite rare in practice. (Note that the Fergueson-Forcade algorithm, as implemented by Lenstra, Odlyzko, and others, has completely supplanted any need to use Brun's algorithm as well.) When an algorithm converges because of lack of precision, it is because the rounding error regularizes the problem. This is normally referred to in the trade as "idiot regularization". 
It is in my experience, invariably better to actually choose a regularization that is specific to the computation than to rely on rounding effects which might be different from machine to machine. In particular, your type of example is in for serious programmer enjoyment hours on Intel or AMD machines, which have 80 bit wide registers for all the floating point arithmetic. Supporting needless machine dependency is not something to argue for, either, since the Cray style floating point arithmetic has a bad error model. Even Cray has been beaten into submission on this, finally releasing IEEE compliant processors, but only just recently. > to put this another way, it is generally bad to keep more digits (bits) > or precision than you actually have I couldn't agree less. The exponential function and inner product accumulation are famous examples of why extra bits are important in intermediate computations. It's almost impossible to have an accurate exponential function without using extra precision - which is one reason why so many machines have extra bits in their FPUs and there is an IEEE "extended" precision type. The storage history effects which result from temporarily increased precision are well understood, mild in that they violate no common error models used in numerical analysis. And for those few cases where testing for equality is needed for debugging purposes, many systems permit you to impose truncation and eliminate storage history issues. Later, Andrew Mullhaupt From hinsen at cnrs-orleans.fr Wed Feb 9 12:31:00 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 9 Feb 2000 18:31:00 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: References: Message-ID: <200002091731.SAA10614@chinon.cnrs-orleans.fr> > > In addition to backwards compatibility, there is another argument for > > keeping indexing behaviour as it is: compatibility with other Python > > sequence types. > > I claim the current Numeric is INconsistent with other Python sequence > types: > > >>> x = [1, 2, 3, 4, 5] > >>> y = x[2:5] > >>> x > [1, 2, 3, 4, 5] > >>> y > [3, 4, 5] > >>> y[1] = 7 > >>> y > [3, 7, 5] > >>> x > [1, 2, 3, 4, 5] > > So, y is a copy of x[2:5], not a reference. Good point. So we can't be consistent with all properties of other Python sequence types. Which reminds me of some very different compatibility problem in NumPy that can (and should) be removed: the rules for integer division and remainders for negative arguments are not the same. NumPy inherits the C rules, Python has its own. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From gpk at bell-labs.com Wed Feb 9 13:25:00 2000 From: gpk at bell-labs.com (Greg Kochanski) Date: Wed, 09 Feb 2000 13:25:00 -0500 Subject: [Numpy-discussion] Re: [Matrix-SIG] Re: Matrix-SIG digest, Vol 1 #364 - 9 msgs References: <20000209064911.9404C1CD3C@dinsdale.python.org> <38A19236.51D1879F@bell-labs.com> <053b01bf731f$30118c20$5063cb0a@amullhau> Message-ID: <38A1B0FC.168FDCB5@bell-labs.com> "Andrew P. Mullhaupt" wrote: > > It's most important that the rules be simple, and (preferably) close > > to common languages. I'd suggest C. 
> > That is a good example of a language which has a pretty weird history on > this particular matter. True. The only real advantage of C is that so many people are used to it. Don't forget the human element. FORTRAN would also be a reasonable choice. There's a big cost to learning a new language; if it gets too big, people simply won't use Python. > > Since automatic typecasting only buys a small improvement > > in ease of use, I'd want to be extremely sure that it doesn't cause > > many problems. > > Au contraire. It is a huge win. Try writing a "generic" function with six > arguments which can sensibly be integers, or single or double precision > variables. If you have to test variables to see what they are, then you have > to essentially write a table driven typecaster. If, as in Fortran, you have > to write different functions for different argument types then you have the > dangerous programming practice of having several different pieces of code > which do essentially the same computation. While that's nice to say, it doesn't really translate completely to practice. A lot of functions don't make sense with arbitrary objects; and some require subtle changes. For instance, who wants a matrix inversion function that operates on integers, using integer division inside? Lots of functions have DOUBLE_EPS or FLOAT_EPS embedded inside them. One has to change the small number when you change the data type. I'll grant you that running things with both doubles or floats is often useful. I'd be happy with automatic upcasting among them. I'd be moderately happy with upcasting among the integers. I really don't see any crying need to mix integers with floating point numbers. I'd like some examples to make me believe that mixing ints and floats is a 'huge win'. From hinsen at cnrs-orleans.fr Wed Feb 9 13:39:46 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 9 Feb 2000 19:39:46 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com> (beausol@exch.hpl.hp.com) References: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com> Message-ID: <200002091839.TAA10737@chinon.cnrs-orleans.fr> > (1) The extent of the support for LAPACK. Do we want to stick with LAPACK > Lite? There has been a full LAPACK interface for a long while, of which LAPACK Lite is just the subset that is needed for supporting the high-level routines in the module LinearAlgebra. I seem to have lost the URL to the full version, but it's on my disk, so I can put it onto my FTP server if there is a need. > (2) The storage format. If we've still got row-ordered matrices under the > hood, and we want to use native LAPACK libraries that were compiled using > column-major format, then we'll have to be careful to set all of the flags > correctly. This isn't going to be a big deal, _unless_ NumPy will support > more of LAPACK when a native library is available. Then, of course, there The low-level interface routines don't take care of this. It's the high-level Python code (module LinearAlgebra) that sets the transposition argument correctly. That looks like a good compromise to me. > (3) Through the judicious use of header files with compiler-dependent flags, > we could accommodate the various naming conventions used when the FORTRAN > libraries were compiled (e.g., sgetrf_ or SGETRF). That's already done! Konrad. 
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From pitts.todd at mayo.edu Wed Feb 9 14:05:02 2000 From: pitts.todd at mayo.edu (Pitts, Todd A., Ph.D.) Date: Wed, 9 Feb 2000 13:05:02 -0600 (CST) Subject: [Numpy-discussion] Upcasting Message-ID: Here are my two cents worth on the subject... Most of what has been said in this thread (at least what I have read) I find very valuable. Apparently, many people have been thinking about the subject. I view this problem as inherent in a language without an Lvalue (like C) that allows a very explicit and clear definition from the programmer's point of view as to the size of the container you are going put things in. The language in many cases simply returns an object to you and has made some decision as to what you "needed" or "wanted". Of course, this is one of the things that makes languages such an Numerical Python, Matlab, IDL, etc. very nice for protyping and investigating. In many cases this decision will be adequate or acceptable. In some, quite simply, it will not be. At this point the programmer has to have a good means of managing this decision himself. If memory is not a constraint, I can think of very few situations (none, actually) where I would choose to go with something other than the Numerical Python default of double. In general, that is what you get when creating python arrays unless you make some effort to obtain some other type. However, in some important (read "cases that affect the author") situations memory is a very critical constraint. Typically, in Numerical Python I use 4-byte floats. In fact, one of the reasons I use Numerical Python is because I don't *need* doubles and matlab for example is really only setup to work gracefully with doubles. I do *need* to conserve memory as I deal with very large data sets. It seems the question we are discussing is not really "what *should* be done in terms of casting?" but "what provides good enough decisions much of the time *and* a gracefull way to manage the decisions when "good enough" no longer applies to you?" Currently, this is not a trivial thing to manage. Reading in a 100 MB data set and multiplying by the python scalar 2 produces a 200 MB data set. I manage this by wrapping the 2 in an array. This happens, of course, all the time. Having to do this once is not a big deal, but doing everywhere in code that uses floats makes for cluttered code -- not something which I expect to have to write in an otherwise very elegant and concise language. Also, I often find myself trudging through code looking for the subtlety that converted my floats to doubles, doubled my memory usage and then caused subsequent float only routines to error out. To those who are constrained to use floats this is awkward and time consuming. To those who are not I would say -- use doubles. The flag that causes an array to be a "space saving array" seems to be a temporary fix (that doesn't mean it was a bad idea -- just that it feels messy and effectively adds complexity that shouldn't be there). It also, mearly postpones the problem as I understand it -- what happens when I multiply two space saving arrays? 
We simply will never get away from situations where we have to manage the interaction ourselves and so we should be careful not to make that management so awkward (and currently I think it is awkward) that the floats, bytes, shorts, etc. become marginalized in their utility. My suggestion is to go with the rule that a simple hirearchy (in which downcasting is the rule) longs integers shorts bytes cardinals booleans doubles complex doubles <--- default floats complex floats for the most part makes good decisions: Principally because people who are not constrained to conserve memory will use the larger, default types all the time and not wince. They don't *need* floats or bytes. If anyone gives them a float a simple astype('d') or astype('D') to make sure it becomes a double lets them go on their way. Types like integers and bytes are effectively treated as being precise. If you are constrained to conserve memory by staying with floats or bytes instead of just reading things in from disk and making them doubles it will not be so awkward to manage the types in large programs. If I use someones code and they have a scalar anywhere in it at some point, even if I (or they) cast the output, memory usage swells at least for intermediate calculations. Effectively, python *has* 4-byte floats but programming with them is awkward. This means, of course, that multiplying a float array by a double array produces a float. Multiplying a double array by anything above it produces a double. etc. For my work, if I have a float anywhere in the calculation I don't believe precision beyond that in the output so getting a float back is reasonable. I know that some operations produce "more precision" and so I would cast the array if I needed to take advantage of that. Perhaps the downcasting is *not* the way to go. However, I definately think the current awkwardness should be eliminated. I hope my comments will not be percieved as being critical of the original language designers. I find python to be very useful or I wouldn't have bothered to make the comments at all. -Todd Pitts From beausol at exch.hpl.hp.com Wed Feb 9 14:16:58 2000 From: beausol at exch.hpl.hp.com (Beausoleil, Raymond) Date: Wed, 9 Feb 2000 11:16:58 -0800 Subject: [Numpy-discussion] RE: [Matrix-SIG] An Experiment in code-cleanup. Message-ID: <34E36C05935CD311AE5000A0C9B6B0BF07D16F@hplex3.hpl.hp.com> From: Konrad Hinsen [mailto:hinsen at cnrs-orleans.fr] > > (1) The extent of the support for LAPACK. Do we want to stick > > with LAPACK Lite? > > There has been a full LAPACK interface for a long while, of which > LAPACK Lite is just the subset that is needed for supporting the > high-level routines in the module LinearAlgebra. I seem to have lost > the URL to the full version, but it's on my disk, so I can put it > onto my FTP server if there is a need. Yes, I'd like to get a copy! You can simply e-mail it to me, if you'd prefer. > > (2) The storage format. If we've still got row-ordered matrices > > under the hood, and we want to use native LAPACK libraries that > > were compiled using column-major format, then we'll have to be > > careful to set all of the flags correctly. This isn't going to > > be a big deal, _unless_ NumPy will support more of LAPACK when a > > native library is available. Then, of course, there ... > > The low-level interface routines don't take care of this. It's the > high-level Python code (module LinearAlgebra) that sets the > transposition argument correctly. That looks like a good compromise > to me. 
I'll have to look at this more carefully. Due to my relative lack of Python experience, I hacked the C code so that Fortran routines could be called instead, producing the expected results. > > (3) Through the judicious use of header files with compiler- > > dependent flags, we could accommodate the various naming > > conventions used when the FORTRAN libraries were compiled (e.g., > > sgetrf_ or SGETRF). > > That's already done! Where? Even in the latest f2c'd source code that I downloaded from SourceForge, I see all names written using the lower-case-trailing-underscore convention (e.g., dgeqrf_). The Intel MKL was compiled from Fortran source using the upper-case-no-underscore convention (e.g., DGEQRF). If I replace dgeqrf_ with DGEQRF in dlapack_lite.c (and a few other tweaks), then the subsequent link with the IMKL succeeds. ============================ Ray Beausoleil Hewlett-Packard Laboratories mailto:beausol at hpl.hp.com Vox: 425-883-6648 Fax: 425-883-2535 HP Telnet: 957-4951 ============================ From godzilla at netmeg.net Wed Feb 9 15:47:34 2000 From: godzilla at netmeg.net (Les Schaffer) Date: Wed, 9 Feb 2000 15:47:34 -0500 (EST) Subject: [Numpy-discussion] digest Message-ID: <14497.53862.853166.521584@gargle.gargle.HOWL> just switched over from matrix-sig to numpy-discussion. in the process i changed to the digest version and got my first issue. is it possible to distribute the digests properly formatted as multipart/digests as per rfc822 and company? having such a formatted digest makes it very easy when using an emailer like VM in emacs: VM automatically displays the digest as a virtual folder, allowing one to browse all the posts in a given digest very quickly and easily. don't know whether the other (lacklusters) emailers out there will handle it so nicely, but i don't think the extra required markers will interfere with your reading of the digests at all. highly recommended. i'd be glad to work with whoever has control over this to ensure that the proper markers get placed into the digests. les schaffer From hinsen at cnrs-orleans.fr Wed Feb 9 15:58:47 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 9 Feb 2000 21:58:47 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <34E36C05935CD311AE5000A0C9B6B0BF07D16F@hplex3.hpl.hp.com> (beausol@exch.hpl.hp.com) References: <34E36C05935CD311AE5000A0C9B6B0BF07D16F@hplex3.hpl.hp.com> Message-ID: <200002092058.VAA10798@chinon.cnrs-orleans.fr> > > onto my FTP server if there is a need. > > Yes, I'd like to get a copy! You can simply e-mail it to me, if you'd > prefer. OK, coming soon... > I'll have to look at this more carefully. Due to my relative lack of Python > experience, I hacked the C code so that Fortran routines could be called > instead, producing the expected results. That's fine, you can simply replace the f2c-generated code by Fortran-compiled code, as long as the calling conventions are the same. I have used optimized BLAS as well on some machines. > Where? Even in the latest f2c'd source code that I downloaded from > SourceForge, I see all names written using the > lower-case-trailing-underscore convention (e.g., dgeqrf_). The Intel MKL was Sure, f2c generates the underscores. But the LAPACK interface code (the one I'll send you, and also LAPACK Lite) supports both conventions, controlled by the preprocessor symbol NO_APPEND_FORTRAN (maybe not the most obvious name). 
On the other hand, there is no support for uppercase names; that convention is not used in the Unix world. But I suppose it could be added by machine transformation of the code. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From da at ski.org Wed Feb 9 16:23:22 2000 From: da at ski.org (David Ascher) Date: Wed, 9 Feb 2000 13:23:22 -0800 Subject: [Numpy-discussion] digest References: <14497.53862.853166.521584@gargle.gargle.HOWL> Message-ID: <00cd01bf7343$ea55ae30$0100000a@ski.org> > just switched over from matrix-sig to numpy-discussion. in the process > i changed to the digest version and got my first issue. > > is it possible to distribute the digests properly formatted as > multipart/digests as per rfc822 and company? Did you try to edit your configuration on the mailman control panel? There is a choice between MIME and plain-text digests. --david ascher From skaller at maxtal.com.au Wed Feb 9 17:04:13 2000 From: skaller at maxtal.com.au (skaller) Date: Thu, 10 Feb 2000 09:04:13 +1100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> <38A19201.8A43EC01@maxtal.com.au> <200002091717.SAA10604@chinon.cnrs-orleans.fr> Message-ID: <38A1E45D.6E0EF317@maxtal.com.au> Konrad Hinsen wrote: > > > silently). Consider a function > > > > k0 = 100 > > k = 99 > > while k < k0: > > .. > > k0 = k > > k = ... > > > > which refines a calculation until the measure k stops decreasing. > > This algorithm may terminate when k is a float, but _fail_ when > > k is a double -- the extra precision may cause the algorithm > > I'd call this a buggy implementation. Convergence criteria should be > explicit and not rely on the internal representation of data > types. > If you care at all about portability, you shouldn't even think about > this. But sometimes you DON'T care about portability. Sometimes, you want the best result the architecture can support, and so you need to perform a portable computation of an architecture dependent value. -- John (Max) Skaller, mailto:skaller at maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 homepage: http://www.maxtal.com.au/~skaller download: ftp://ftp.cs.usyd.edu/au/jskaller From da at ski.org Wed Feb 9 18:50:03 2000 From: da at ski.org (David Ascher) Date: Wed, 9 Feb 2000 15:50:03 -0800 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. References: <14496.42942.5355.849670@brant.geog.ubc.ca> <03fe01bf72cd$c040d640$5063cb0a@amullhau> Message-ID: <037d01bf7358$630b8980$0100000a@ski.org> > it as an example for people who want to learn stuff about mmap. As it > stands, there was some similar code I was able to produce at some point. I > forget who here has a copy, maybe Konrad, maybe David Ascher. > > Later, > Andrew Mullhaupt I did have some of that code, but it was almost 3 years ago and five computers ago. In other words, it's *somewhere*. I'll start a grep, but don't hold your breath... 
--da From da at ski.org Thu Feb 10 01:52:21 2000 From: da at ski.org (David Ascher) Date: Wed, 9 Feb 2000 22:52:21 -0800 Subject: [Numpy-discussion] Binary distribution available References: <14496.42942.5355.849670@brant.geog.ubc.ca> <03fe01bf72cd$c040d640$5063cb0a@amullhau> Message-ID: <051f01bf7393$61bff4e0$0100000a@ski.org> With Travis' wise advice, I appear to have succeeded in putting forth a binary installation of Numerical-15.2. Due to a bug in distutils, this is an 'install in place' package, instead of a 'run python setup.py install' package. So, unzip the file in your main Python tree, and it should 'work'. Let me (and Paul and Travis) know if it doesn't. Download is available from the main page (http://sourceforge.net/project/?group_id=1369 look for [zip]) or from http://download.sourceforge.net/numpy/python-numpy-15.2.zip --david ascher From gvwilson at nevex.com Thu Feb 10 13:28:51 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 10 Feb 2000 13:28:51 -0500 (EST) Subject: [Numpy-discussion] re: scientific Python publishing venue Message-ID: Hi, folks. A former colleague of mine is now editing a magazine devoted to scientific computing, and is looking for articles. If you're doing something scientific with Python, and want to tell the world about it, please give me a shout, and I'll forward more information. Greg Wilson http://www.software-carpentry.com From archiver at db.geocrawler.com Thu Feb 17 12:34:11 2000 From: archiver at db.geocrawler.com (andrew x swan) Date: Thu, 17 Feb 2000 09:34:11 -0800 Subject: [Numpy-discussion] more speed? Message-ID: <200002171734.JAA08011@www.geocrawler.com> This message was sent from Geocrawler.com by "andrew x swan" Be sure to reply to that address. hi - i've only just started using python and numpy... the program i wrote below runs much more slowly than a fortran equivalent. ie. on a dataset where the order of the matrix is (3325,3325), python took this long: 362.25user 0.74system 6:09.78elapsed 98%CPU and fortran took this long: 2.68user 1.12system 0:03.89elapsed 97%CPU is this because the element by element calculations involved are contained in python for loops? thanks #!/usr/bin/python from Numeric import * def nrm(pedigree): n_animals = len(pedigree) + 1 nrm = zeros((n_animals,n_animals),Float) for i in xrange(1,n_animals): isire = pedigree[i-1][1] idam = pedigree[i-1][2] nrm[i,i] = 1.0 + 0.5 * nrm[isire,idam] for j in xrange(i+1,n_animals): jsire = pedigree[j-1][1] jdam = pedigree[j-1][2] nrm[j,i] = 0.5 * (nrm[jsire,i] + nrm[jdam,i]) nrm[i,j] = nrm[j,i] return nrm if __name__ == '__main__': test_ped = [(1,0,0),(2,0,0),(3,1,0),(4,1,2), (5,3,4),(6,1,4),(7,5,6)] a = nrm(test_ped) print a Geocrawler.com - The Knowledge Archive From da at ski.org Thu Feb 17 18:25:57 2000 From: da at ski.org (David Ascher) Date: Thu, 17 Feb 2000 15:25:57 -0800 Subject: [Numpy-discussion] more speed? References: <200002171734.JAA08011@www.geocrawler.com> Message-ID: <04d501bf799e$5a82abd0$0100000a@ski.org> From: andrew x swan > python took this long: > > 362.25user 0.74system 6:09.78elapsed 98%CPU > > and fortran took this long: > > 2.68user 1.12system 0:03.89elapsed 97%CPU > > is this because the element by element > calculations involved are contained in python for > loops? yes. --david ascher From syrus at long.ucsd.edu Thu Feb 17 18:27:29 2000 From: syrus at long.ucsd.edu (Syrus Nemat-Nasser) Date: Thu, 17 Feb 2000 15:27:29 -0800 (PST) Subject: [Numpy-discussion] more speed? 
In-Reply-To: <200002171734.JAA08011@www.geocrawler.com> Message-ID: On Thu, 17 Feb 2000, andrew x swan wrote: > is this because the element by element > calculations involved are contained in python for > loops? Hi Andrew! I've only just begun using Numeric Python, but I'm a long-time user of GNU Octave and a sporadic user of MatLab. In general, for loops kill the execution speed of interpretive environments like Numpy and Octave. The high-speed comes when one uses vector operations such as Matrix multiplication. If you can vectorize your code, meaning replace all the loops with matrix operations, you should see equivalent speed to Fortran for large data sets. As far as I know, you will never see an interpreted language match a compiled one in the execution of for loops. Thanks. Syrus. -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Syrus Nemat-Nasser UCSD Physics Dept. From peter at eexpc.eee.nott.ac.uk Fri Feb 18 12:05:06 2000 From: peter at eexpc.eee.nott.ac.uk (Peter Chang) Date: Fri, 18 Feb 2000 17:05:06 +0000 (GMT) Subject: [Numpy-discussion] numpy documentation - alternative format? Message-ID: Hi there, I've just started to use python and numpy and want to print out the numpy document but the PDF file has a strange aspect ratio which makes it hard to print it as 2up on A4 paper. (I've tried hacking about with the postscript generated by xpdf but it seems that there is no global setting for page size!) Could the authors please provide alternative formats for the doc, eg. as postscript files sized for A4 and letter so that people can print them out easier? Thanks Peter From roitblat at hawaii.edu Fri Feb 18 12:14:22 2000 From: roitblat at hawaii.edu (Herbert L. Roitblat) Date: Fri, 18 Feb 2000 07:14:22 -1000 Subject: [Numpy-discussion] numpy documentation - alternative format? Message-ID: <03fd01bf7a33$9b046320$8fd6afcf@0gl1u.pixi.com> Adobe Acrobat has a shrink to fit option in their print menu. I'm not sure if it comes with their free-reader. Try printing as a 1up. It seems a small adaptation. HLR -----Original Message----- From: Peter Chang To: Numpy-discussion at lists.sourceforge.net Date: Friday, February 18, 2000 7:09 AM Subject: [Numpy-discussion] numpy documentation - alternative format? > >Hi there, > >I've just started to use python and numpy and want to print out the numpy >document but the PDF file has a strange aspect ratio which makes it hard >to print it as 2up on A4 paper. (I've tried hacking about with the >postscript generated by xpdf but it seems that there is no global setting >for page size!) > >Could the authors please provide alternative formats for the doc, eg. >as postscript files sized for A4 and letter so that people can print them >out easier? > >Thanks > Peter > > > >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >http://lists.sourceforge.net/mailman/listinfo/numpy-discussion > From peter at eexpc.eee.nott.ac.uk Fri Feb 18 12:18:46 2000 From: peter at eexpc.eee.nott.ac.uk (Peter Chang) Date: Fri, 18 Feb 2000 17:18:46 +0000 (GMT) Subject: [Numpy-discussion] numpy documentation - alternative format? In-Reply-To: <03fd01bf7a33$9b046320$8fd6afcf@0gl1u.pixi.com> Message-ID: On Fri, 18 Feb 2000, Herbert L. Roitblat wrote: > Adobe Acrobat has a shrink to fit option in their print menu. I'm not sure > if it comes with their free-reader. Is it available for Linux? I'll check it out... > Try printing as a 1up. It seems a small adaptation. I'm trying to save dead trees, i.e. 
print out 40 odd pages instead of 90 odd. Peter From sanner at scripps.edu Sat Feb 19 22:50:16 2000 From: sanner at scripps.edu (Michel Sanner) Date: Sat, 19 Feb 2000 19:50:16 -0800 Subject: [Numpy-discussion] Numeric Python under IRIX646 Message-ID: <1000219195017.ZM77150@noah.scripps.edu> Hi There, I just tried to add SGI running IRIX6.5 to the collection of Unix boxes I will support and I ran into the following problem: If I compile Python -O2 loading the Numeric extensions dumps the core, if I compile Python -g it works just fine and this regardless if Numeric is compile -g or -O2. After I re-compiled Objects/complexobject.o using -g (everything else being compiled -O2) I got it to work ... did anyone else out there see this kind of behavior ? I also post this to psa-members just in case this might be Python related -Michel -- ----------------------------------------------------------------------- >>>>>>>>>> AREA CODE CHANGE <<<<<<<<< we are now 858 !!!!!!! Michel F. Sanner Ph.D. The Scripps Research Institute Assistant Professor Department of Molecular Biology 10550 North Torrey Pines Road Tel. (858) 784-2341 La Jolla, CA 92037 Fax. (858) 784-2860 sanner at scripps.edu http://www.scripps.edu/sanner ----------------------------------------------------------------------- From mitch.chapman at mciworld.com Mon Feb 21 13:01:59 2000 From: mitch.chapman at mciworld.com (Mitch Chapman) Date: Mon, 21 Feb 2000 11:01:59 -0700 Subject: [Numpy-discussion] Re: [PSA MEMBERS] Numeric Python under IRIX646 In-Reply-To: <1000219195017.ZM77150@noah.scripps.edu> References: <1000219195017.ZM77150@noah.scripps.edu> Message-ID: <00022111060701.00593@mchapmanpc> On Sat, 19 Feb 2000, Michel Sanner wrote: > Hi There, > > I just tried to add SGI running IRIX6.5 to the collection of Unix boxes I will > support and I ran into the following problem: > > If I compile Python -O2 loading the Numeric extensions dumps the core, > if I compile Python -g it works just fine and this regardless if Numeric is > compile -g or -O2. > > After I re-compiled Objects/complexobject.o using -g (everything else being > compiled -O2) I got it to work ... > > did anyone else out there see this kind of behavior ? I saw exactly this behavior just last Friday afternoon. After all of Python was recompiled with -g the bus error went away. Thanks for pointing out that only complexobject needs to be compiled with -g. It didn't occur to me to try this, despite the location of the bus error, because it was possible to exercise complex objects interactively w. no problems. BTW I don't know whether you were compiling N32 or N64. In our case N32 created the bus error. -- Mitch Chapman mitch.chapman at mciworld.com From hinsen at cnrs-orleans.fr Fri Feb 25 07:26:58 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri, 25 Feb 2000 13:26:58 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] Numeric Array: adding a 0-D array to a cell in a 2-D array In-Reply-To: <019901bf7e93$8771d5e0$8fd6afcf@0gl1u.pixi.com> (roitblat@hawaii.edu) References: <019901bf7e93$8771d5e0$8fd6afcf@0gl1u.pixi.com> Message-ID: <200002251226.NAA14777@chinon.cnrs-orleans.fr> > We get the type error from trying to set the matrix element with a matrix > element (apparently). In the old version (1.9) on our NT box, > temp=a[kwd,kwd] results in temp being an int type. How can we either cast > the temp to an int or enable what we really want, which is to add an int to > a[kwd,kwd], as in a[kwd,kwd] = a[kwd,kwd] + jwd ? > > Do we have a bad version of Numeric? 
Maybe an experimental version. If you check the archives of this mailing list, you can find a recent discussion about proposed modifications. One of them was to eliminate the automatic conversion of rank-0 arrays to scalars, in order to prevent type promotion. Perhaps this proposal was implemented in the version you have. Note to the NumPy maintainers: please announce all new releases on this list, mentioning changes, especially those that affect backward compatibility. As a maintainer of code that makes heavy use of NumPy, I keep getting questions and bug reports caused by some new NumPy release that I haven't even heard of. A recent example is the change of location of the header files; C modules using arrays now have to include Numeric/arrayobject.h instead of simply arrayobject.h. I can understand this change (although I am not sure it's important enough to break compatibility), but I'd have preferred to learn about it directly and as early as possible. It's really no fun working through a 2 KB bug report sent by someone with zero knowledge of C. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From Oliphant.Travis at mayo.edu Fri Feb 25 15:23:01 2000 From: Oliphant.Travis at mayo.edu (Travis Oliphant) Date: Fri, 25 Feb 2000 14:23:01 -0600 (CST) Subject: [Numpy-discussion] Array-casting problem. Message-ID: Hi Herb, It has taken awhile for me to respond to this, but your problem's here illustrate exactly the kinds of difficulties one encounters with the current NumPy coercion rules: You do not have a bad version of Numeric. The behavior you describe is exactly what "should" happen though it needs to be fixed. I'll trace for you exactly what is going on as it could be illustrative to others: >>> a = zeros((5,5),'b') # You've just created a 5x5 byte array that follows "normal" coercion # rules filled with zeros. >>> a[3,3] = 8 # This line copies the rank-0 array of type 'b' created from the Python # Integer 8 (by a direct coercion in C) into element (3,3) of matrix a >>> temp = a[3,3] # This selects out the rank-0 array of typecode 'b' at position (3,3). As # of 15.2 this is nolonger changed to a scalar. Note that rank-0 arrays # act alot like scalars, but because there is not a one-to-one # correspondence between the Python Scalars and rank-0 arrays, this is not # automatically converted to a Python scalar (this is a change in 15.2) >>> temp = temp + 3 # This is the problem line for you right here. Something is wrong though, # since it should not be, a problem. # You are adding a rank-0 array of typecode 'b' to a Python Integer which # is interpreted by Numeric as a rank-0 array of typecode 'l'. The result # should be a Python Integer. For some reason this is returning an array # of typecode 'i' (which does not get automatically converted to a Python # scalar). >>> a[3,3] = temp # This would work fine if temp were the Python scalar it should be. # Right now, assignment doesn't let you assign an array of a "larger" type # to elements of a smaller type (except for Python scalars). Since temp # is (incorrectly I think) a type 'i' rank-0 array, it does not let you # make the assignment. 
At any rate it is inconsistent to let you assign # Python scalars but not rank-0 arrays of arbitrary precision, this should # be fixed. It is also a problem that temp + 3 returns an array of # typecode 'i'. I will look into fixing the above problems this example points out. Of course, it could also be fixed by having long integers lower in the coercion tree than byte arrays. Thanks for the feedback, Travis Oliphant From Oliphant.Travis at mayo.edu Fri Feb 25 15:57:34 2000 From: Oliphant.Travis at mayo.edu (Travis Oliphant) Date: Fri, 25 Feb 2000 14:57:34 -0600 (CST) Subject: [Numpy-discussion] Casting problems with new version of NumPy. Message-ID: The code sent by Herbert Roitblat pointed out some inconsistencies in the current NumPy, that I've fixed with two small changes: 1) Long's can no longer be safely cast to Int's (this is not safe on 64-bit machines anyway) -- this makes Numeric more consistent with how it interprets Python integers. 2) Automatic casting will be done when using rank-0 arrays to set elements of a Numeric array to be consistent with the behavior for Python scalars. The changes are in CVS right now, but are simple to change back if there is a problem. -Travis From collins at rushe.aero.org Mon Feb 28 13:17:38 2000 From: collins at rushe.aero.org (JEFFERY COLLINS) Date: Mon, 28 Feb 2000 10:17:38 -0800 Subject: [Numpy-discussion] Matrix.py problem Message-ID: <200002281817.KAA04027@rushe.aero.org> I installed the Numpy 15.2 and got the following error during the import of Matrix. Apparently, the version number is no longer embedded in the module doc string following the # sign. >>> import Matrix Traceback (innermost last): File "", line 1, in ? File "/usr/local/lib/python1.5/site-packages/Numeric/Matrix.py", line 5, in ? __version__ = int(__id__[string.index(__id__, '#')+1:-1]) File "/usr/local/lib/python1.5/string.py", line 138, in index return _apply(s.index, args) ValueError: substring not found in string.index Jeff From vanandel at atd.ucar.edu Wed Feb 2 13:35:34 2000 From: vanandel at atd.ucar.edu (Joe Van Andel) Date: Wed, 02 Feb 2000 11:35:34 -0700 Subject: [Numpy-discussion] single precision routines in NumPy? Message-ID: <389878F6.B7F2DAF@atd.ucar.edu> I would like a single precision version of 'interp' in the Numeric Core. (I want such a routine since I'm operating on huge single precision arrays, that I don't want promoted to double precision.) I've written such a routine, but Paul Dubois and I are discussing the best way of integrating it into the core. One solution is to simply add a new function 'interpf' to arrayfnsmodule.c . Another solution is to add a typecode=Float option to interp. Any opinions on how this single precision version be handled? -- Joe VanAndel National Center for Atmospheric Research http://www.atd.ucar.edu/~vanandel/ Internet: vanandel at ucar.edu From tla at research.nj.nec.com Thu Feb 3 16:57:41 2000 From: tla at research.nj.nec.com (Tom Adelman) Date: Thu, 03 Feb 2000 16:57:41 -0500 Subject: [Numpy-discussion] newbie: PyArray_Check difficulties Message-ID: <3.0.1.32.20000203165741.00958d00@zingo.nj.nec.com> I'm having a problem with PyArray_Check. If I just call PyArray_Check(args) I don't have a problem, but if I try to assign the result to anything, etc., it crashes (due to acces violation). So, for example the code at the end of this note doesn't work, yet I know an array is being passed and I can, for example, calculate its trace correctly if I type cast it as a PyArrayObject*. 
Also, a more general question: is this the recommended way to input numpy arrays when using swig, or do most people find it easier to use more elaborate typemaps, or something else? Finally, I apologize if this is the wrong forum to post this question. Please let me know. Thanks, Tom Method from C++ class: PyObject * Test01::trace(PyObject * args) { if (!(PyArray_Check(args))) { // <- crashes here PyErr_SetString(PyExc_ValueError, "must use NumPy array"); return NULL; } return NULL; } Swig file: (where typemaps are the ones included with most recent swig) /* TMatrix.i */ %module Ptest %include "typemaps.i" %{ #include "Test01.h" %} class Test01 { public: PyObject * trace(PyObject *INPUT); Test01(); virtual ~Test01(); }; Python code: import Ptest t = Ptest.Test01() import Numeric a = Numeric.arange(1.1, 2.7, .1) b = Numeric.reshape(a, (4,4)) x = t.trace(b) From Oliphant.Travis at mayo.edu Fri Feb 4 15:49:34 2000 From: Oliphant.Travis at mayo.edu (Travis Oliphant) Date: Fri, 4 Feb 2000 14:49:34 -0600 (CST) Subject: [Numpy-discussion] Re: Numpy-discussion digest, Vol 1 #7 - 1 msg In-Reply-To: <200002042005.MAA15424@lists.sourceforge.net> Message-ID: > I'm having a problem with PyArray_Check. If I just call > PyArray_Check(args) I don't have a problem, but if I try to assign the > result to anything, etc., it crashes (due to acces violation). So, for > example the code at the end of this note doesn't work, yet I know an array > is being passed and I can, for example, calculate its trace correctly if I > type cast it as a PyArrayObject*. > > Also, a more general question: is this the recommended way to input numpy > arrays when using swig, or do most people find it easier to use more > elaborate typemaps, or something else? I have some experience with SWIG but it is not my favorite method to use Numerical Python with C, since you have so little control over how things get allocated. Your problem is probably due to the fact that you do not run import_array() in the module header. There is a typemap in SWIG that let's you put commands to run at module initialization. Try this in your *.i file. %init %{ import_array(); %} This may help. Best, Travis From Oliphant.Travis at mayo.edu Mon Feb 7 19:08:43 2000 From: Oliphant.Travis at mayo.edu (Travis Oliphant) Date: Mon, 7 Feb 2000 18:08:43 -0600 (CST) Subject: [Numpy-discussion] An Experiment in code-cleanup. Message-ID: I wanted to let users of the community know (so they can help if they want, or offer criticism or comments) that over the next several months I will be experimenting with a branch of the main Numerical source tree and endeavoring to "clean-up" the code for Numerical Python. I have in mind a few (in my opinion minor) alterations to the current code-base which necessitates a branch. Guido has made some good suggestions for improving the code base, and both David Ascher and Paul Dubois have expressed concerns over the current state of the source code and given suggestions as to how to improve it. That said, I should emphasize that my work is not authorized, or endorsed, by any of the people mentioned above. It is simply my little experiment. My intent is not to re-create Numerical Python --- I like most of the current functionality --- but to merely, clean-up the code, comment it, and change the underlying structure just a bit and add some features I want. 
One goal I have is to create something that can go into Python 1.7 at some future point, so this incarnation of Numerical Python may not be completely C-source compatible with current Numerical Python (but it will be close). This means C-extensions that access the underlying structure of the current arrayobject may need some alterations to use this experimental branch if it every becomes useful. I don't know how long this will take me. I'm not promising anything. The purpose of this announcement is just to invite interested parties into the discussion. These are the (somewhat negotiable) directions I will be pursuing. 1) Still written in C but heavily (in my opinion) commented. 2) Addition of bit-types and unsigned integer types. 3) Facility for memory-mapped dataspace in arrays. 4) Slices become copies with the addition of methods for current strict referencing behavior. 5) Handling of sliceobjects which consist of sequences of indices (so that setting and getting elements of arrays using their index is possible). 6) Rank-0 arrays will not be autoconverted to Python scalars, but will still behave as Python scalars whenever Python allows general scalar-like objects in it's operations. Methods will allow the user-controlled conversion to the Python scalars. 7) Addition of attributes so that different users can configure aspects of the math behavior, to their hearts content. If their is anyone interested in helping in this "unofficial branch work" let me know and we'll see about setting up someplace to work. Be warned, however, that I like actual code or code-templates more than just great ideas (truly great ideas are never turned away however ;-) ) If something I do benefits the current NumPy source in a non-invasive, backwards compatible way, I will try to place it in the current CVS tree, but that won't be a priority, as my time does have limitations, and I'm scratching my own itch at this point. Best regards, Travis Oliphant From dubois1 at llnl.gov Mon Feb 7 19:22:45 2000 From: dubois1 at llnl.gov (Paul F. Dubois) Date: Mon, 7 Feb 2000 16:22:45 -0800 Subject: [Numpy-discussion] RE: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: Message-ID: Travis says that I don't necessarily endorse his goals but in fact I do, strongly. If I understand right he intends to make a CVS branch for this experiment and that is fine with me. The only goal I didn't quite understand was: Addition of attributes so that different users can configure aspects of the math behavior, to their hearts content. In a world of reusable components the situation is complicated. I would not like to support a dot-product routine, for example, if the user could turn off any double precision behind my back. My needs for precision are local to my algorithm. From archiver at db.geocrawler.com Tue Feb 8 10:52:47 2000 From: archiver at db.geocrawler.com (John Travers) Date: Tue, 8 Feb 2000 07:52:47 -0800 Subject: [Numpy-discussion] Re: A proposal for dot product Message-ID: <200002081552.HAA10267@www.geocrawler.com> This message was sent from Geocrawler.com by "John Travers" Be sure to reply to that address. If the above was implemented, I would be very happy indeed. As a maths student, I use NumPy a lot. And get infuriated with the current implementation. John. Geocrawler.com - The Knowledge Archive From hinsen at cnrs-orleans.fr Tue Feb 8 12:12:56 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue, 8 Feb 2000 18:12:56 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. 
In-Reply-To: (message from Travis Oliphant on Mon, 7 Feb 2000 18:08:43 -0600 (CST)) References: Message-ID: <200002081712.SAA03158@chinon.cnrs-orleans.fr> > 3) Facility for memory-mapped dataspace in arrays. I'd really like to have that... > 4) Slices become copies with the addition of methods for current strict > referencing behavior. This will break a lot of code, and in a way that will be difficult to debug. In fact, this is the only point you mention which would be reason enough for me not to use your modified version; going through all of my code to check what effect this might have sounds like a nightmare. I see the point of having a copying version as well, but why not implement the copying behaviour as methods and leave indexing as it is? > 5) Handling of sliceobjects which consist of sequences of indices (so that > setting and getting elements of arrays using their index is possible). Sounds good as well... > 6) Rank-0 arrays will not be autoconverted to Python scalars, but will > still behave as Python scalars whenever Python allows general scalar-like > objects in it's operations. Methods will allow the > user-controlled conversion to the Python scalars. I suspect that full behaviour-compatibility with scalars is impossible, but I am willing to be proven wrong. For example, Python scalars are immutable, arrays aren't. This also means that rank-0 arrays can't be used as keys in dictionaries. How do you plan to implement mixed arithmetic with scalars? If the return value is a rank-0 array, then a single library returning a rank-0 array somewhere could mess up a program well enough that debugging becomes a nightmare. > 7) Addition of attributes so that different users can configure aspects of > the math behavior, to their hearts content. You mean global attributes? That could be the end of universally usable library modules, supposing that people actually use them. > If their is anyone interested in helping in this "unofficial branch > work" let me know and we'll see about setting up someplace to work. Be I don't have much time at the moment, but I could still help out with testing etc. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen at dirac.cnrs-orleans.fr Tue Feb 8 12:13:20 2000 From: hinsen at dirac.cnrs-orleans.fr (hinsen at dirac.cnrs-orleans.fr) Date: Tue, 8 Feb 2000 18:13:20 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: (message from Travis Oliphant on Mon, 7 Feb 2000 18:08:43 -0600 (CST)) References: Message-ID: <200002081713.SAA03161@chinon.cnrs-orleans.fr> > 3) Facility for memory-mapped dataspace in arrays. I'd really like to have that... > 4) Slices become copies with the addition of methods for current strict > referencing behavior. This will break a lot of code, and in a way that will be difficult to debug. In fact, this is the only point you mention which would be reason enough for me not to use your modified version; going through all of my code to check what effect this might have sounds like a nightmare. I see the point of having a copying version as well, but why not implement the copying behaviour as methods and leave indexing as it is? 
> 5) Handling of sliceobjects which consist of sequences of indices (so that > setting and getting elements of arrays using their index is possible). Sounds good as well... > 6) Rank-0 arrays will not be autoconverted to Python scalars, but will > still behave as Python scalars whenever Python allows general scalar-like > objects in it's operations. Methods will allow the > user-controlled conversion to the Python scalars. I suspect that full behaviour-compatibility with scalars is impossible, but I am willing to be proven wrong. For example, Python scalars are immutable, arrays aren't. This also means that rank-0 arrays can't be used as keys in dictionaries. How do you plan to implement mixed arithmetic with scalars? If the return value is a rank-0 array, then a single library returning a rank-0 array somewhere could mess up a program well enough that debugging becomes a nightmare. > 7) Addition of attributes so that different users can configure aspects of > the math behavior, to their hearts content. You mean global attributes? That could be the end of universally usable library modules, supposing that people actually use them. > If their is anyone interested in helping in this "unofficial branch > work" let me know and we'll see about setting up someplace to work. Be I don't have much time at the moment, but I could still help out with testing etc. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From Oliphant.Travis at mayo.edu Tue Feb 8 12:38:26 2000 From: Oliphant.Travis at mayo.edu (Travis Oliphant) Date: Tue, 8 Feb 2000 11:38:26 -0600 (CST) Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <200002081712.SAA03158@chinon.cnrs-orleans.fr> Message-ID: > > 3) Facility for memory-mapped dataspace in arrays. > > I'd really like to have that... This is pretty easy to add but it does require some changes to the underlying structure, So you can expect it. > > > 4) Slices become copies with the addition of methods for current strict > > referencing behavior. > > This will break a lot of code, and in a way that will be difficult to > debug. In fact, this is the only point you mention which would be > reason enough for me not to use your modified version; going through > all of my code to check what effect this might have sounds like a > nightmare. I know this will be a sticky point. I'm not sure what to do exactly, but the current behavior and implementation makes the semantics for slicing an array using a sequence problematic since I don't see a way to represent a reference to a sequence of indices in the underlying structure of an array. So such slices would have to be copies and not references, which makes for an inconsistent code. > > I see the point of having a copying version as well, but why not > implement the copying behaviour as methods and leave indexing as it > is? I want to agree with you, but I think we may need to change the behavior eventually so when is it going to happen? > > > 5) Handling of sliceobjects which consist of sequences of indices (so that > > setting and getting elements of arrays using their index is possible). > > Sounds good as well... 
This facility is already embedded in the underlying structure. My plan is to go with the original idea that Jim Hugunin and Chris Chase had for slice objects. The sliceobject in python is already general enough for this to work. > > > 6) Rank-0 arrays will not be autoconverted to Python scalars, but will > > still behave as Python scalars whenever Python allows general scalar-like > > objects in it's operations. Methods will allow the > > user-controlled conversion to the Python scalars. > > I suspect that full behaviour-compatibility with scalars is > impossible, but I am willing to be proven wrong. For example, Python > scalars are immutable, arrays aren't. This also means that rank-0 > arrays can't be used as keys in dictionaries. > > How do you plan to implement mixed arithmetic with scalars? If the > return value is a rank-0 array, then a single library returning > a rank-0 array somewhere could mess up a program well enough that > debugging becomes a nightmare. > Mixed arithmetic in general is another sticky point. I went back and read the discussion of this point which occured 1995-1996. It was very interesting reading and a lot of points were made. Now we have several years of experience and we should apply what we've learned (of course we've all learned different things :-) ). Konrad, you had a lot to say on this point 4 years ago. I've had a long discussion with a colleague who is starting to "get in" to Numerical Python and he has really been annoyed with the current mixed arithmetic rules. The seem to try to outguess the user. The spacesaving concept helps, but it still seem's like a hack to me. I know there are several opinions, so I'll offer mine. We need simple rules that are easy to teach a newcomer. Right now the rule is farily simple in that coercion always proceeds up. But, mixed arithmetic with a float and a double does not produce something with double precision -- yet that's our rule. I think any automatic conversion should go the other way. Konrad, 4 years ago, you talked about unexpected losses of precision if this were allowed to happen, but I couldn't understand how. To me, it is unexpected to have double precision arrays which are really only carrying single-precision results. My idea of the coercion hierchy is shown below with conversion always happening down when called for. The Python scalars get mapped to the "largest precision" in their category and then normal coercions rules take place. The casual user will never use single precision arrays and so will not even notice they are there unless they request them. If they do request them, they don't want them suddenly changing precision. That is my take anyway. Boolean Character Unsigned long int short Signed long int short Real /* long double */ double float Complex /* __complex__ long double */ __complex__ double __complex__ float Object > > 7) Addition of attributes so that different users can configure aspects of > > the math behavior, to their hearts content. > > You mean global attributes? That could be the end of universally > usable library modules, supposing that people actually use them. I thought I did, but I've changed my mind after reading the discussion in 1995. I don't like global attributes either, so I'm not going there. > > > If their is anyone interested in helping in this "unofficial branch > > work" let me know and we'll see about setting up someplace to work. Be > > I don't have much time at the moment, but I could still help out with > testing etc. 
Konrad, you were very instrumental in getting NumPy off the ground in the first place and I will always appreciate your input. From pauldubois at home.com Tue Feb 8 12:56:11 2000 From: pauldubois at home.com (Paul F. Dubois) Date: Tue, 8 Feb 2000 09:56:11 -0800 Subject: [Numpy-discussion] precision isn't just precision In-Reply-To: Message-ID: Before we all rattle on too long about precision, I'd like to point out that selecting a precision actually carries two consequences in the context of computer languages:

1. Expected: The number of digits of accuracy in the representation of a floating point number.
2. Unexpected: The range of numbers that can be represented by this type.

Thus, to a scientist it is perfectly logical that if d is a double and f is a single, d * f has only single precision validity. Unfortunately, in a computer, if you hold this answer in a single then it may fail if the contents of d include numbers outside the single range, even if f is 1.0. Thus the rules in C and Fortran that coercion is UP had to do as much with range as precision. From pearu at ioc.ee Tue Feb 8 14:46:16 2000 From: pearu at ioc.ee (Pearu Peterson) Date: Tue, 8 Feb 2000 21:46:16 +0200 (EET) Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: Message-ID: On Tue, 8 Feb 2000, Travis Oliphant wrote: > I know there are several opinions, so I'll offer mine. We need > simple rules that are easy to teach a newcomer. Right now the rule is > fairly simple in that coercion always proceeds up. But, mixed arithmetic > with a float and a double does not produce something with double > precision -- yet that's our rule. I think any automatic conversion should > go the other way. Remark: If you are consistent, then you say here that mixed arithmetic with an int and a float/double produces an int?! Right? (I hope that I am wrong.)

> Boolean
> Character
> Unsigned
>     long
>     int
>     short
> Signed
>     long
>     int
>     short

How about `/* long long */'? Is this left out intentionally?

> Real
>     /* long double */
>     double
>     float

Travis, while you are doing revision on NumPy, could you also estimate the degree of difficulty of introducing column-major order arrays? Pearu From hinsen at cnrs-orleans.fr Tue Feb 8 14:56:21 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue, 8 Feb 2000 20:56:21 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: (message from Travis Oliphant on Tue, 8 Feb 2000 11:38:26 -0600 (CST)) References: Message-ID: <200002081956.UAA03241@chinon.cnrs-orleans.fr> > I know this will be a sticky point. I'm not sure what to do exactly, but > the current behavior and implementation makes the semantics for slicing an > array using a sequence problematic since I don't see a way to represent a You are right there. But is it really necessary to extend the meaning of slices? Of course everyone wants the functionality of indexing with a sequence, but I'd be perfectly happy to have it implemented as a method. Indexing would remain as it is (by reference), and a new method would provide copying behaviour for element extraction and also permit more generalized sequence indices. In addition to backwards compatibility, there is another argument for keeping indexing behaviour as it is: compatibility with other Python sequence types. If you have a list of lists, which in many ways behaves like a 2D array, and you extract the third element (which is thus a list), then this data is shared with the full nested list.
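The sharing Konrad refers to is easy to see with plain nested lists (a short session, no Numeric involved):

>>> x = [[1, 2], [3, 4], [5, 6]]
>>> third = x[2]      # extract the third element, a list
>>> third[0] = 99     # mutate it...
>>> x                 # ...and the change shows through the nested list
[[1, 2], [3, 4], [99, 6]]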
> > How do you plan to implement mixed arithmetic with scalars? If the > > return value is a rank-0 array, then a single library returning > > a rank-0 array somewhere could mess up a program well enough that > > debugging becomes a nightmare. > > > > Mixed arithmetic in general is another sticky point. I went back and read > the discussion of this point which occurred in 1995-1996. It was very What I meant was not mixed-precision arithmetic, but arithmetic in which one operand is a scalar and the other one a rank-0 array. Which reminds me: rank-0 arrays are also incompatible with the nested-list view of arrays. The elements of a list of numbers are numbers, not number-like sequence objects. But back to precision, which is also a popular subject: > discussion with a colleague who is starting to "get in" to Numerical > Python and he has really been annoyed with the current mixed arithmetic > rules. They seem to try to outguess the user. The spacesaving concept > helps, but it still seems like a hack to me. I wouldn't say that the current system tries to outguess the user. It simply gives precision a higher priority than memory space. That might not coincide with what a particular user wants, but it is consistent and easy to understand. > I know there are several opinions, so I'll offer mine. We need > simple rules that are easy to teach a newcomer. Right now the rule is > fairly simple in that coercion always proceeds up. But, mixed arithmetic Like in Python (for scalars), C, Fortran, and all other languages that I can think of. > Konrad, 4 years ago you talked about unexpected losses of precision if > this were allowed to happen, but I couldn't understand how. To me, it is > unexpected to have double precision arrays which are really only > carrying single-precision results. My idea of the coercion hierarchy is I think this is a confusion of two different meanings of "precision". In numerical algorithms, precision refers to the deviation between an ideal and a real numerical value. In programming languages, it refers to the *maximum* precision that can be stored in a given data type (and is in fact often combined with a difference in range). The upcasting rule thus ensures that

1) No precision is lost accidentally. If you multiply a float by a double, the float might contain the exact number 2, and thus have infinite precision. The language can't know this, so it acts conservatively and chooses the "bigger" type.

2) No overflow occurs unless it is unavoidable (the range problem).

> The casual user will never use single precision arrays and so will not > even notice they are there unless they request them. If they do request There are many ways in which single-precision arrays can creep into a program without a user's attention. Suppose you send me some data in a pickled array, which happens to be single-precision. Or I call a library routine that does some internal calculation on huge data arrays, which it keeps at single precision, and (intentionally or by error) returns a single-precision result. I think your "active flag" solution is a rather good solution to the casting problem, because it gives access to a different behaviour in a very explicit way. So unless future experience points out problems, I'd propose to keep it. Konrad.
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From Barrett at stsci.edu Tue Feb 8 15:10:39 2000 From: Barrett at stsci.edu (Paul Barrett) Date: Tue, 8 Feb 2000 15:10:39 -0500 (EST) Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: References: <14496.16890.698835.619131@nem-srvr.stsci.edu> Message-ID: <14496.26037.829754.450187@nem-srvr.stsci.edu> Travis Oliphant writes: > > > > 1) The re-use of temporary arrays -- to conserve memory. > > Please elaborate about this request. When Python evaluates the expression:

>>> Y = B*X + A

where A, B, X, and Y are all arrays, B*X creates a temporary array, T. A new array, Y, will be created to hold the result of T + A, and T will be deleted. If T and Y have the same shape and typecode, then instead of creating Y, T can be re-used to conserve memory.
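For concreteness, this is what that reuse looks like when done by hand. The sketch assumes the optional output-array argument of Numeric's ufuncs; the automatic version Paul asks for would do this bookkeeping behind the scenes:

>>> from Numeric import array, multiply, add
>>> A = array([1., 2.]); B = array([3., 4.]); X = array([5., 6.])
>>> T = multiply(B, X)    # the temporary B*X
>>> Y = add(T, A, T)      # reuse T to hold T + A instead of allocating Y
>>> Y
array([ 16., 26.])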
> > > > 2) A copy-on-write option -- to enhance performance. > > > > I need more explanation of this as well. This would be an advanced feature of arrays that use memory-mapping or access their arrays from disk. It is similar to the secondary cache of a CPU. The data is held in memory until a write request is made. > > > > 3) The initialization of arrays by default -- to help novices. > > What kind of initialization are you talking about (we have zeros and ones and random already). For mixed-type (or object) arrays containing strings, zeros() and ones() would be confusing. Therefore by default, integer and floating types are initialized to 0 and string types to ' ', and the option would be available to not initialize the array, for performance. > > > > 4) The creation of a standard API -- which I guess is assumed, if it > > is to be part of the Python standard distribution. > > Any suggestions as to what needs to be changed in the already somewhat standard API. No, not exactly. But the last time I looked, I thought some improvements could be made to it. > > > > 5) The inclusion of IEEE support. > > This was supposed to be there from the beginning, but it didn't get > finished. Jim's original idea was to have two math modules, one which > checked and gave errors for 1/0 and another that returned IEEE inf for > 1/0. > > The current umath does both with different types, which is annoying. When I last spoke to Jim about this at IPC6, I was under the impression that IEEE support was not fully implemented and much work still needed to be done. Has this situation changed since then? > > > > And > > > > 6) Enhanced support for mixed-types or objects. > > > > This last issue is very important to me and the astronomical community, > > since we routinely store data as (multi-dimensional) arrays of fixed > > length records or C-structures. A current deficiency of NumPy is that > > the object typecode does not work with the fromstring() method, so > > importing arrays of records from a binary file is just not possible. > > I've been developing my own C-extension type to handle this situation > > and have come to realize that my record type is really just a > > generalization of NumPy's types. > > > I would like to see the code for your generalized type, which would help me > see if there were some relatively painless way the two could be merged. recordmodule.c is part of my PyFITS module for dealing with FITS files. You can find it here: ftp://ra.stsci.edu/pub/barrett/PyFITS_0.3.tgz I use NumPy to access fixed-type arrays and the record type for accessing mixed-type arrays. A common example is accessing the second element of a mixed-type (i.e., an object) from the entire array. This returns a record type with a single element, which is equivalent to a NumPy array of fixed type. Therefore users expect this object to be a NumPy array, and it isn't. They have to convert it to one. > > two C-extension types merged. I think this enhancement can be done > > with minimal change to the current NumPy behavior and minor changes to > > the typecode system. > > If you already see how to do it, then great. Note that NumPy already has some support for an Object type. It has been proposed that it be removed, because it is not well supported and hence few people use it. I have the contrary opinion and feel we should enhance the Object type and make it much more usable. If you don't need it, then you don't have to use it. This enhancement really shouldn't get in the way of those who only use fixed-type arrays. So what changes to NumPy are needed?

1) Instead of a typecode (or in addition to the typecode for backward compatibility), I suggest an optional format keyword, which can be used to specify the mixed-type or object format. Namely, format = 'i, f, s10', where 'i' is an integer type, 'f' a floating point type, and s10 is a string of 10 characters.

2) Array access will be the same as it is now. For example:

   # Create a 10x10 mixed-type array.
   A = array((10, 10), format = 'i, f, 10s')
   # Create a 10x10 fixed-type array.
   B = array((10, 10), typecode = 'i')
   # Print a 5x5 subarray of mixed-type.
   print A[:5,:5]
   # Print a 5x5 subarray of fixed-type.
   print B[:5,:5]
   # Or
   # (Note that the 3rd index is optional for fixed-type arrays; it
   # always defaults to 0.)
   print B[:5,:5,0]
   # Print the second element of the mixed-type of the entire array.
   # Note that this is now an array of fixed-type.
   print A[:,:,1]

The major thorn that I see at this point is how to reconcile the behavior of numbers and strings during operations. But I don't see this as an intractable problem. I actually believe this enhancement will encourage us to create a better and more generic multi-dimensional array module by concentrating on the behavioral aspects of this extension type. Note that J, which NumPy is based upon, allows such mixed-types. -- Dr. Paul Barrett Space Telescope Science Institute Phone: 410-516-6714 DESD/DPT FAX: 410-516-8615 Baltimore, MD 21218 From pauldubois at home.com Tue Feb 8 15:16:55 2000 From: pauldubois at home.com (Paul F. Dubois) Date: Tue, 8 Feb 2000 12:16:55 -0800 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <200002081956.UAA03241@chinon.cnrs-orleans.fr> Message-ID: Konrad wrote: > > In addition to backwards compatibility, there is another argument for > keeping indexing behaviour as it is: compatibility with other Python > sequence types. I claim the current Numeric is INconsistent with other Python sequence types:

>>> x = [1, 2, 3, 4, 5]
>>> y = x[2:5]
>>> x
[1, 2, 3, 4, 5]
>>> y
[3, 4, 5]
>>> y[1] = 7
>>> y
[3, 7, 5]
>>> x
[1, 2, 3, 4, 5]

So, y is a copy of x[2:5], not a reference.
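And the contrasting Numeric session, for completeness, where a slice is a reference into the original data (then-current Numeric behaviour):

>>> from Numeric import array
>>> a = array([1, 2, 3, 4, 5])
>>> b = a[2:5]
>>> b[1] = 7     # writing through the slice...
>>> a            # ...alters the original array
array([1, 2, 3, 7, 5])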
From DavidA at ActiveState.com Tue Feb 8 15:30:23 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 8 Feb 2000 12:30:23 -0800 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <14496.26037.829754.450187@nem-srvr.stsci.edu> Message-ID: <000101bf7273$531e3f80$c355cfc0@ski.org> > So what changes to NumPy are needed? > > 1) Instead of a typecode (or in addition to the typecode for backward > compatibility), I suggest an optional format keyword, which can be > used to specify the mixed-type or object format. Namely, format = > 'i, f, s10', where 'i' is an integer type, 'f' a floating point > type, and s10 is a string of 10 characters. I'd suggest going all the way and making it a real object, not just a string. That object can then have useful attributes, like size in bytes, maxval, minval, some indication of precision, etc. Logically, itemsize should be an attribute of the numeric type of an array, not of the array itself. --david ascher
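A rough sketch of such a type object; the class name and attribute set below are invented for illustration and go no further than the attributes David lists:

# Hypothetical sketch only -- not an API from this thread.
class NumericType:
    def __init__(self, name, itemsize, minval, maxval):
        self.name = name            # e.g. 'Float32'
        self.itemsize = itemsize    # size in bytes
        self.minval = minval        # smallest representable value
        self.maxval = maxval        # largest representable value

Float32 = NumericType('Float32', 4, -3.402823e38, 3.402823e38)
print Float32.itemsize              # 4 -- a property of the type, not of an array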
From beausol at exch.hpl.hp.com Tue Feb 8 16:31:30 2000 From: beausol at exch.hpl.hp.com (Beausoleil, Raymond) Date: Tue, 8 Feb 2000 13:31:30 -0800 Subject: [Numpy-discussion] RE: [Matrix-SIG] An Experiment in code-cleanup. Message-ID: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com> I've been reading the posts on this topic with considerable interest. For a moment, I want to emphasize the "code-cleanup" angle more literally than the functionality mods suggested so far. Several months ago, I hacked my personal copy of the NumPy distribution so that I could use the Intel Math Kernel Library for Windows. The IMKL is

(1) freely available from Intel at http://developer.intel.com/vtune/perflibst/mkl/index.htm;
(2) basically BLAS and LAPACK, with an FFT or two thrown in for good measure;
(3) optimized for the different x86 processors (e.g., generic x86, Pentium II & III);
(4) configured to use 1, 2, or 4 processors; and
(5) configured to use multithreading.

It is an impressive, fast implementation. I'm sure there are similar native libraries available on other platforms. Probably due to my inexperience with both Python and NumPy, it took me a couple of days to successfully tear out the f2c'd stuff and get the IMKL linking correctly. The parts I've used work fine, but there are probably other features that I haven't tested yet that still aren't up to snuff. In any case, the resulting code wasn't very pretty. As long as the NumPy code is going to be commented and cleaned up, I'd be glad to help make sure that the process of using a native BLAS/LAPACK distribution (which was probably compiled using Fortran storage and naming conventions) is more straightforward. Among the more tedious issues to consider are:

(1) The extent of the support for LAPACK. Do we want to stick with LAPACK Lite?
(2) The storage format. If we've still got row-ordered matrices under the hood, and we want to use native LAPACK libraries that were compiled using column-major format, then we'll have to be careful to set all of the flags correctly. This isn't going to be a big deal, _unless_ NumPy will support more of LAPACK when a native library is available. Then, of course, there are the special cases: the IMKL has both a C and a Fortran interface to the BLAS.
(3) Through the judicious use of header files with compiler-dependent flags, we could accommodate the various naming conventions used when the FORTRAN libraries were compiled (e.g., sgetrf_ or SGETRF).

The primary output of this effort would be an expansion of the "Compilation Notes" subsection of Section 15 of the NumPy documentation, and some header files that made the recompilation easier than it is now. Regards, Ray ============================ Ray Beausoleil Hewlett-Packard Laboratories mailto:beausol at hpl.hp.com Vox: 425-883-6648 Fax: 425-883-2535 HP Telnet: 957-4951 ============================ From Oliphant.Travis at mayo.edu Tue Feb 8 16:32:57 2000 From: Oliphant.Travis at mayo.edu (Travis Oliphant) Date: Tue, 8 Feb 2000 15:32:57 -0600 (CST) Subject: [Numpy-discussion] Come take an informal survey. In-Reply-To: <200002082004.MAA26529@lists.sourceforge.net> Message-ID: In an effort to get data about what users' attitudes are toward Numerical Python, I'm conducting a survey at sourceforge.net. If you would like to participate in the survey, please go to http://www.sourceforge.net, log in with your sourceforge id, and go to the numpy page: http://sourceforge.net/project/?group_id=1369 In the Public Survey section there is a short survey you can fill out. Thank you, Travis Oliphant NumPy Developer From phil at geog.ubc.ca Tue Feb 8 18:33:18 2000 From: phil at geog.ubc.ca (Phil Austin) Date: Tue, 8 Feb 2000 15:33:18 -0800 (PST) Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: References: Message-ID: <14496.42942.5355.849670@brant.geog.ubc.ca> Travis Oliphant writes: > > 3) Facility for memory-mapped dataspace in arrays. > For the NumPy users who are as ignorant about mmap, msync, and madvise as I am, I've put a couple of documents on my web site:

1) http://www.geog.ubc.ca/~phil/mmap/mmap.pdf A pdf version of Kevin Sheehan's paper: "Why aren't you using mmap yet?" (19-page Frame postscript original, page order back to front). He gives a good discussion of the SVR4 VM model, with some mmap examples in C.

2) http://www.geog.ubc.ca/~phil/mmap/threads.html An archived email exchange (initially on the F90 mailing list) between Kevin (who is an independent Solaris consultant) and Brian Sumner (SGI) about the pros and cons of using mmap.

Executive summary:

i) mmap on Solaris can be a very big win (see bottom of http://www.geog.ubc.ca/~phil/mmap/msg00003.html) when used in combination with WILLNEED/WONTNEED madvise calls to guide the page prefetching.

ii) IRIX and some other Unices (Linux 2.2 in particular) haven't implemented madvise, and naive use of mmap without madvise can produce lots of page faulting and much slower io than, say, asynchronous io calls on IRIX. (http://www.geog.ubc.ca/~phil/mmap/msg00009.html)

So I'd love to see mmap in Numpy, but we may need to produce a tutorial outlining the tradeoffs, and giving some examples of madvise/msync/mmap used together (with a few benchmarks). Any mmap module would need to include member functions that call madvise/msync for the mmapped array (but these may be no-ops on several popular OSes.) Regards, Phil From jrwebb at goodnet.com Tue Feb 8 01:03:42 2000 From: jrwebb at goodnet.com (James R. Webb) Date: Mon, 7 Feb 2000 23:03:42 -0700 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. References: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com> Message-ID: <001801bf71fa$41f681a0$01f936d1@janus> There is now a Linux native BLAS available through links at http://www.cs.utk.edu/~ghenry/distrib/ courtesy of the ASCI Option Red Project. There is also ATLAS (http://www.netlib.org/atlas/). Either library seems to link into NumPy without a hitch.
From amullhau at zen-pharaohs.com Wed Feb 9 01:51:09 2000 From: amullhau at zen-pharaohs.com (Andrew P. Mullhaupt) Date: Wed, 9 Feb 2000 01:51:09 -0500 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> Message-ID: <03f401bf72ca$0e0608e0$5063cb0a@amullhau> > I'd be perfectly happy to have it implemented as a
> method. Indexing would remain as it is (by reference), and a new > method would provide copying behaviour for element extraction and also > permit more generalized sequence indices. I think I can live with that, as long as it _syntactically_ looks like indexing. This is one case where the syntax is more important than functionality. There are things you want to index with indices, etc., and the composition with parenthesis-like (Dyck language) syntax has proved to be one of the few readable ways to do it. > In addition to backwards compatibility, there is another argument for > keeping indexing behaviour as it is: compatibility with other Python > sequence types. If you have a list of lists, which in many ways > behaves like a 2D array, and extract the third element (which is thus > a list), then this data is shared with the full nested list. _Avoiding_ data sharing will eventually be more important than supporting data sharing, since memory continues to get cheaper but memory bandwidth and latency do not improve at the same rate. Locality of reference is hard to control when there is a lot of default data sharing, and performance suffers, yet it becomes important on more and more scales as memory systems become more and more hierarchical. Ultimately, the _semantics_ we like will be implemented efficiently by emulating references and copies in code which copies and references as it sees fit and keeps track of which copies are "really" references and which references are really "copies". I've thought this through for the "everything gets copied" languages and it isn't too mentally distressing - you simply reference count fake copies. The "everything is a reference" languages are less clean, but the database people have confronted that problem. > Which reminds me: rank-0 arrays are also incompatible with the > nested-list view of arrays. There are ways out of that trap. Most post-ISO APLs provide examples of how to cope. > > I know there are several opinions, so I'll offer mine. We need > > simple rules that are easy to teach a newcomer. Right now the rule is > > fairly simple in that coercion always proceeds up. But, mixed arithmetic > Like in Python (for scalars), C, Fortran, and all other languages that > I can think of. And that is not a bad thing. But which way is "up"? (See example below.) > > Konrad, 4 years ago you talked about unexpected losses of precision if > > this were allowed to happen, but I couldn't understand how. To me, it is > > unexpected to have double precision arrays which are really only > > carrying single-precision results. Most people always hate, and only sometimes detect, when that happens. It specifically contravenes the Geneva conventions on programming mental hygiene. > The upcasting rule thus ensures that > > 1) No precision is lost accidentally. More or less. More precisely, it depends on what you call an accident. What happens when you add the IEEE single precision floating point value 1.0 to the 32-bit integer 2^30? A _lot_ of people don't expect to get the IEEE single precision floating point value 2.0^30, but that is what happens in some languages. Is that an "upcast"? Would the 32-bit integer 2^30 make more sense? Now what about the case where the 32-bit integer is signed and adding one to it will "wrap around" if the value remains an integer? Because these two examples might make double precision or a wider integer (if available) seem the correct answer, suppose it's only one element of a gigantic array?
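Andrew's 2^30 case takes only a couple of lines to reproduce, assuming 'f' arrays are IEEE singles with 24 bits of mantissa (the 1.0 is wrapped in an array so the sum stays single precision):

>>> from Numeric import array
>>> a = array([2**30], 'f')
>>> b = a + array(1.0, 'f')    # stay in single precision
>>> b[0] == 2.0**30            # the added 1.0 vanished in rounding
1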
Let's now talk about complex values.... There are plenty of rough edges like this when you mix numerical types. It's guaranteed that everybody's ox will get gored somewhere. > 2) No overflow occurs unless it is unavoidable (the range problem). > > > The casual user will never use single precision arrays and so will not > > even notice they are there unless they request them. If they do request > There are many ways in which single-precision arrays can creep into a > program without a user's attention. Absolutely. > Suppose you send me some data in a > pickled array, which happens to be single-precision. Or I call a > library routine that does some internal calculation on huge data > arrays, which it keeps at single precision, and (intentionally or by > error) returns a single-precision result. And the worst one is when the accuracy of the result is single precision, but the _type_ of the result is double precision. There is a function in S-plus which does this (without documenting it, of course) and man, was that a pain in the neck to sort out. Today, I just found another bug in one of the S-plus functions - turns out that if you hand a complex triangular matrix and a real right hand side to the triangular solver (backsubstitution), it doesn't cast the right hand side to complex, and uses whatever values are subsequent in memory to the right hand side as if they were part of the vector. Obviously, when testing the function, they didn't try this mixed type case. But interpreters are really convenient for writing code so that you _don't_ have to think about types all the time and do your own casting. Which is why stubbing your head on an unexpected cast is so unlooked for. > I think your "active flag" solution is a rather good solution to the > casting problem, because it gives access to a different behaviour in a > very explicit way. So unless future experience points out problems, > I'd propose to keep it. Is there a simple way to ensure that no active arrays are ever activated at any time when I use Numerical Python? Later, Andrew Mullhaupt From amullhau at zen-pharaohs.com Wed Feb 9 02:17:39 2000 From: amullhau at zen-pharaohs.com (Andrew P. Mullhaupt) Date: Wed, 9 Feb 2000 02:17:39 -0500 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. References: <14496.42942.5355.849670@brant.geog.ubc.ca> Message-ID: <03fe01bf72cd$c040d640$5063cb0a@amullhau> > Travis Oliphant writes: > > > > 3) Facility for memory-mapped dataspace in arrays. > > For the NumPy users who are as ignorant about mmap, msync, > and madvise as I am, I've put a couple of documents on > my web site: I have Kevin's "Why Aren't You Using mmap() Yet?" on my site. Kevin is working on a new (11th anniversary edition? 1xth anniversary edition?). By the way, Uresh Vahalia's book on Unix Internals is a very good idea for anyone not yet familiar with modern operating systems, especially Unices. Kevin is extremely knowledgeable on this subject, and several others. > Executive summary: > > i) mmap on Solaris can be a very big win Orders of magnitude. > (see bottom of > http://www.geog.ubc.ca/~phil/mmap/msg00003.html) when > used in combination with WILLNEED/WONTNEED madvise calls to > guide the page prefetching. And with the newer versions of Solaris, madvise() is a good way to go. madvise is _not_ SVR4 (not in SVID3) but it _is_ in the OSF/1 AES, which means it is _not_ vendor specific. But the standard part of madvise is that it is a "hint".
However, everything madvise actually _does_ when you hint the kernel is usually specific to particular versions of an operating system. There are tricks to get around madvise not doing everything you want (WONTNEED didn't work in Solaris for a long time. Kevin found a trick that worked really well instead. Kevin knows people at Sun, since he was one of the very earliest employees there, and so the trick Kevin used to suggest has now been found to be the implementation of WONTNEED in Solaris.) And that trick is well worth understanding. It happens that msync() is a good call to know. It has an undocumented behavior on Solaris: when you msync a memory region with MS_INVALIDATE | MS_ASYNC, the dirty pages are queued for writing and backing store is available immediately, or, if dirty, as soon as written out. This means that the pager doesn't have to run at all to scavenge the pages. Linux didn't do this last time I looked. I suggested it to the kernel guys and the idea got some positive response, but I don't know if they did it. > ii) IRIX and some other Unices (Linux 2.2 in particular), haven't > implemented madvise, and naive use of mmap without madvise can produce > lots of page faulting and much slower io than, say, asynchronous io > calls on IRIX. (http://www.geog.ubc.ca/~phil/mmap/msg00009.html) IRIX has an awful implementation of mmap. And SGI people go around badmouthing mmap; not that they don't have cause, but they are usually very surprised to see how big the win is with a good implementation. Of course, the msync() trick doesn't work on IRIX last I looked, which leads to the SGI people believing that mmap() is brain damaged because it runs the pager into the ground. It's a point of view that is bound to come up. HP/UX was really whacked last time I looked. They had a version (10) which supported the full mmap() on one series of workstations (700, 7000, I forget, let's say 7e+?) and didn't support it except in the non-useful SVR3.2 way on another series of workstations (8e+?). The reason was that the 8e+? workstations were multiprocessor and they hadn't figured out how to get the newer kernel flying on the multiprocessors. I know Konrad had HP systems at one point; maybe he has the scoop on those. > So I'd love to see mmap in Numpy, but we may need to produce a > tutorial outlining the tradeoffs, and giving some examples of > madvise/msync/mmap used together (with a few benchmarks). Any mmap > module would need to include member functions that call madvise/msync > for the mmapped array (but these may be no-ops on several popular OSes.) I don't know if you want a separate module; maybe what you want is the normal allocation of memory for all Numerical Python objects to be handled in a way that makes sense for each operating system. The approach I took when I was writing portable code for this sort of thing was to write a wrapper for the memory operation semantics and then implement the operations as a small library that would be OS specific, although not _that_ specific. It was possible to write single-source code for SVID3 and OSF/AES1 systems with sparing use of conditional defines. Unfortunately, that code is the intellectual property of another firm, or else I'd donate it as an example for people who want to learn stuff about mmap. As it stands, there was some similar code I was able to produce at some point. I forget who here has a copy, maybe Konrad, maybe David Ascher. Later, Andrew Mullhaupt
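For readers who want to experiment, a minimal sketch of the mapping side as seen from Python. It uses the mmap module, which was not yet in the standard library when this thread was written; madvise has no Python-level binding in this sketch, so only mapping and an msync-style flush are shown, and 'data.bin' is a stand-in file name:

import mmap

f = open('data.bin', 'r+b')
m = mmap.mmap(f.fileno(), 0)   # length 0 maps the whole file
first = m[0]                   # reading a byte pages it in lazily
m[0] = first                   # dirty the first page...
m.flush()                      # ...and msync() it back to the file
m.close()
f.close()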
From skaller at maxtal.com.au Wed Feb 9 11:12:49 2000 From: skaller at maxtal.com.au (skaller) Date: Thu, 10 Feb 2000 03:12:49 +1100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> Message-ID: <38A19201.8A43EC01@maxtal.com.au> Konrad Hinsen wrote: > But back to precision, which is also a popular subject: but one which even numerical programmers don't seem to understand ... > The upcasting rule thus ensures that > > 1) No precision is lost accidentally. If you multiply a float by > a double, the float might contain the exact number 2, and thus > have infinite precision. The language can't know this, so it > acts conservatively and chooses the "bigger" type. > > 2) No overflow occurs unless it is unavoidable (the range problem). .. which is all wrong. It is NOT safe to convert floating point from a lower to a higher number of bits. ALL such conversions should be removed for this reason: any conversions should have to be explicit. The reason is that whether a conversion to a larger number is safe or not is context dependent (and so it should NEVER be done silently). Consider a function

k0 = 100
k = 99
while k < k0:
    ..
    k0 = k
    k = ...

which refines a calculation until the measure k stops decreasing. This algorithm may terminate when k is a float, but _fail_ when k is a double -- the extra precision may cause the algorithm to perform many useless iterations, in which the precision of the result is in fact _lost_ due to rounding error. What is happening is that the real comparison is probably:

k - k0 < epsilon

where epsilon was 0.0 in floating point, and thus omitted. My point is that throwing away information is what numerical programming is all about. Numerical programmers need to know how big numbers are, and how much significance they have, and optimise calculations accordingly -- sometimes by _using_ the precision of the working types to advantage. To put this another way, it is generally bad to keep more digits (bits) of precision than you actually have: it can be misleading. So a language should not assume that it is OK to add more precision. It may not be. -- John (Max) Skaller, mailto:skaller at maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 homepage: http://www.maxtal.com.au/~skaller download: ftp://ftp.cs.usyd.edu/au/jskaller From gpk at bell-labs.com Wed Feb 9 11:23:47 2000 From: gpk at bell-labs.com (Greg Kochanski) Date: Wed, 09 Feb 2000 11:23:47 -0500 Subject: [Numpy-discussion] Re: Numpy-discussion digest, Vol 1 #10 - 10 msgs References: <200002091617.IAA28931@lists.sourceforge.net> Message-ID: <38A19493.6E2394E6@bell-labs.com> > From: "Andrew P. Mullhaupt" > > The upcasting rule thus ensures that > > > > 1) No precision is lost accidentally. > > More or less. > > More precisely, it depends on what you call an accident. What happens when > you add the IEEE single precision floating point value 1.0 to the 32-bit > integer 2^30? A _lot_ of people don't expect to get the IEEE single > precision floating point value 2.0^30, but that is what happens in some > languages. Is that an "upcast"? Would the 32-bit integer 2^30 make more > sense? Now what about the case where the 32-bit integer is signed and adding > one to it will "wrap around" if the value remains an integer? Because these > two examples might make double precision or a wider integer (if available) > seem the correct answer, suppose it's only one element of a gigantic array?
> Let's now talk about complex values.... > It's most important that the rules be simple, and (preferably) close to common languages. I'd suggest C. In my book, anyone who carelessly mixes floats and ints deserves whatever punishment the language metes out. I've done numeric work in languages where casting was by request _only_ (e.g., Limbo, for Inferno), and I found, to my surprise, that automatic type casting is only a mild convenience. Writing code with manual typecasting is surprisingly easy. Since automatic typecasting only buys a small improvement in ease of use, I'd want to be extremely sure that it doesn't cause many problems. It's very easy to write some complicated set of rules that wastes more time (in the form of unexpected, untraceable bugs) than it saves. By the way, automatic downcasting has hidden problems if Python is ever set to trap underflow errors. I had a program that would randomly crash every 10th (or so) time I ran it with a large dataset (1000x1000 linear algebra). After days of hair-pulling, I found that the matrix was being converted from double to float at one step, and about 1 in 10,000,000 of the entries was too small to represent as a single precision number. That very rare event would underflow, be trapped, and crash the program with a floating point exception. From hinsen at cnrs-orleans.fr Wed Feb 9 12:17:30 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 9 Feb 2000 18:17:30 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <38A19201.8A43EC01@maxtal.com.au> (message from skaller on Thu, 10 Feb 2000 03:12:49 +1100) References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> <38A19201.8A43EC01@maxtal.com.au> Message-ID: <200002091717.SAA10604@chinon.cnrs-orleans.fr> > silently). Consider a function
>
> k0 = 100
> k = 99
> while k < k0:
>     ..
>     k0 = k
>     k = ...
>
> which refines a calculation until the measure k stops decreasing. > This algorithm may terminate when k is a float, but _fail_ when > k is a double -- the extra precision may cause the algorithm I'd call this a buggy implementation. Convergence criteria should be explicit and not rely on the internal representation of data types. Neither Python nor C guarantees you any absolute bounds for precision and value range, and even languages that do (such as Fortran 9x) only promise to give you a data type that is *at least* as big as your specification. > programming is all about. Numerical programmers need to know > how big numbers are, and how much significance they have, > and optimise calculations accordingly -- sometimes by _using_ > the precision of the working types to advantage. If you care at all about portability, you shouldn't even think about this. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From amullhau at zen-pharaohs.com Wed Feb 9 12:21:37 2000 From: amullhau at zen-pharaohs.com (Andrew P. Mullhaupt) Date: Wed, 9 Feb 2000 12:21:37 -0500 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> <38A19201.8A43EC01@maxtal.com.au> Message-ID: <054301bf7322$23cbbbe0$5063cb0a@amullhau> > Konrad Hinsen wrote: > > > But back to precision, which is also a popular subject: > > but one which even numerical programmers don't seem to > understand ... Some do, some don't. > It is NOT safe to convert floating point from a lower to a higher > number > of bits. It is usually safe. Extremely safe. Safe enough that code in which it is _not_ safe is badly designed. > ALL such conversions should be removed for this reason: any > conversions should have to be explicit. I really hope not. A generic function with six different arguments becomes an interesting object in a language without automatic conversions. Usually, a little table-driven piece of code has to cast the arguments into conformance, and then multiple versions of similar code are applied. > which refines a calculation until the measure k stops decreasing. > This algorithm may terminate when k is a float, but _fail_ when > k is a double -- the extra precision may cause the algorithm > to perform many useless iterations, in which the precision > of the result is in fact _lost_ due to rounding error. This is a classic bad programming practice and _it_ is what should be eliminated. It is a good (and required, if you work for me) practice that:

1. All iterations should have termination conditions which are correct; that is, prevent extra iterations. This is typically precision sensitive. But that is simply something that has to be taken into account when writing the termination condition.

2. All iterations should be protected against an unexpectedly large number of iterates taking place.

There are examples of iterations which are intrinsically stable in lower precision and not in higher precision (Brun's algorithm), but those are quite rare in practice. (Note that the Ferguson-Forcade algorithm, as implemented by Lenstra, Odlyzko, and others, has completely supplanted any need to use Brun's algorithm as well.) When an algorithm converges because of lack of precision, it is because the rounding error regularizes the problem. This is normally referred to in the trade as "idiot regularization". It is, in my experience, invariably better to actually choose a regularization that is specific to the computation than to rely on rounding effects which might be different from machine to machine. In particular, your type of example is in for serious programmer enjoyment hours on Intel or AMD machines, which have 80-bit wide registers for all the floating point arithmetic. Supporting needless machine dependency is not something to argue for, either, since the Cray-style floating point arithmetic has a bad error model. Even Cray has been beaten into submission on this, finally releasing IEEE compliant processors, but only just recently. > to put this another way, it is generally bad to keep more digits (bits) > of precision than you actually have I couldn't agree less. The exponential function and inner product accumulation are famous examples of why extra bits are important in intermediate computations. It's almost impossible to have an accurate exponential function without using extra precision - which is one reason why so many machines have extra bits in their FPUs and there is an IEEE "extended" precision type. The storage history effects which result from temporarily increased precision are well understood and mild, in that they violate no common error models used in numerical analysis.
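To make the two practices concrete, here is the k/k0 refinement loop from earlier in the thread with both safeguards added. This is only a sketch: the Newton step for sqrt(2) stands in for the real refinement, and the tolerance value is arbitrary.

eps = 1e-10                   # explicit convergence tolerance
maxit = 100                   # guard against runaway iteration
k0, k = 100.0, 99.0
i = 0
while k0 - k > eps and i < maxit:
    k0 = k
    k = 0.5 * (k + 2.0 / k)   # illustrative refinement step
    i = i + 1
print k, i                    # converges to sqrt(2) well before maxit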
And for those few cases where testing for equality is needed for debugging purposes, many systems permit you to impose truncation and eliminate storage history issues. Later, Andrew Mullhaupt From hinsen at cnrs-orleans.fr Wed Feb 9 12:31:00 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 9 Feb 2000 18:31:00 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: References: Message-ID: <200002091731.SAA10614@chinon.cnrs-orleans.fr> > > In addition to backwards compatibility, there is another argument for > > keeping indexing behaviour as it is: compatibility with other Python > > sequence types. > > I claim the current Numeric is INconsistent with other Python sequence > types:
>
> >>> x = [1, 2, 3, 4, 5]
> >>> y = x[2:5]
> >>> x
> [1, 2, 3, 4, 5]
> >>> y
> [3, 4, 5]
> >>> y[1] = 7
> >>> y
> [3, 7, 5]
> >>> x
> [1, 2, 3, 4, 5]
>
> So, y is a copy of x[2:5], not a reference. Good point. So we can't be consistent with all properties of other Python sequence types. Which reminds me of a very different compatibility problem in NumPy that can (and should) be removed: the rules for integer division and remainders for negative arguments are not the same. NumPy inherits the C rules, Python has its own. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From gpk at bell-labs.com Wed Feb 9 13:25:00 2000 From: gpk at bell-labs.com (Greg Kochanski) Date: Wed, 09 Feb 2000 13:25:00 -0500 Subject: [Numpy-discussion] Re: [Matrix-SIG] Re: Matrix-SIG digest, Vol 1 #364 - 9 msgs References: <20000209064911.9404C1CD3C@dinsdale.python.org> <38A19236.51D1879F@bell-labs.com> <053b01bf731f$30118c20$5063cb0a@amullhau> Message-ID: <38A1B0FC.168FDCB5@bell-labs.com> "Andrew P. Mullhaupt" wrote: > > It's most important that the rules be simple, and (preferably) close > > to common languages. I'd suggest C. > > That is a good example of a language which has a pretty weird history on > this particular matter. True. The only real advantage of C is that so many people are used to it. Don't forget the human element. FORTRAN would also be a reasonable choice. There's a big cost to learning a new language; if it gets too big, people simply won't use Python. > > Since automatic typecasting only buys a small improvement > > in ease of use, I'd want to be extremely sure that it doesn't cause > > many problems. > > Au contraire. It is a huge win. Try writing a "generic" function with six > arguments which can sensibly be integers, or single or double precision > variables. If you have to test variables to see what they are, then you have > to essentially write a table-driven typecaster. If, as in Fortran, you have > to write different functions for different argument types, then you have the > dangerous programming practice of having several different pieces of code > which do essentially the same computation. While that's nice to say, it doesn't really translate completely to practice. A lot of functions don't make sense with arbitrary objects; and some require subtle changes. For instance, who wants a matrix inversion function that operates on integers, using integer division inside?
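The division mismatch is easy to demonstrate; the Numeric lines below assume the C-style truncation Konrad describes above, while plain Python floors:

>>> -7 / 2                  # Python floors toward negative infinity
-4
>>> -7 % 2
1
>>> from Numeric import array
>>> (array([-7]) / 2)[0]    # Numeric inherits C truncation toward zero
-3
>>> (array([-7]) % 2)[0]
-1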
Lots of functions have DOUBLE_EPS or FLOAT_EPS embedded inside them. One has to change the small number when you change the data type. I'll grant you that running things with both doubles and floats is often useful. I'd be happy with automatic upcasting among them. I'd be moderately happy with upcasting among the integers. I really don't see any crying need to mix integers with floating point numbers. I'd like some examples to make me believe that mixing ints and floats is a 'huge win'. From hinsen at cnrs-orleans.fr Wed Feb 9 13:39:46 2000 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 9 Feb 2000 19:39:46 +0100 Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup. In-Reply-To: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com> (beausol@exch.hpl.hp.com) References: <34E36C05935CD311AE5000A0C9B6B0BF07D16D@hplex3.hpl.hp.com> Message-ID: <200002091839.TAA10737@chinon.cnrs-orleans.fr> > (1) The extent of the support for LAPACK. Do we want to stick with LAPACK > Lite? There has been a full LAPACK interface for a long while, of which LAPACK Lite is just the subset that is needed for supporting the high-level routines in the module LinearAlgebra. I seem to have lost the URL to the full version, but it's on my disk, so I can put it onto my FTP server if there is a need. > (2) The storage format. If we've still got row-ordered matrices under the > hood, and we want to use native LAPACK libraries that were compiled using > column-major format, then we'll have to be careful to set all of the flags > correctly. This isn't going to be a big deal, _unless_ NumPy will support > more of LAPACK when a native library is available. Then, of course, there The low-level interface routines don't take care of this. It's the high-level Python code (module LinearAlgebra) that sets the transposition argument correctly. That looks like a good compromise to me. > (3) Through the judicious use of header files with compiler-dependent flags, > we could accommodate the various naming conventions used when the FORTRAN > libraries were compiled (e.g., sgetrf_ or SGETRF). That's already done! Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From pitts.todd at mayo.edu Wed Feb 9 14:05:02 2000 From: pitts.todd at mayo.edu (Pitts, Todd A., Ph.D.) Date: Wed, 9 Feb 2000 13:05:02 -0600 (CST) Subject: [Numpy-discussion] Upcasting Message-ID: Here are my two cents' worth on the subject... Most of what has been said in this thread (at least what I have read) I find very valuable. Apparently, many people have been thinking about the subject. I view this problem as inherent in a language without an Lvalue (like C has) that allows a very explicit and clear definition, from the programmer's point of view, as to the size of the container you are going to put things in. The language in many cases simply returns an object to you and has made some decision as to what you "needed" or "wanted". Of course, this is one of the things that makes languages such as Numerical Python, Matlab, IDL, etc. very nice for prototyping and investigating. In many cases this decision will be adequate or acceptable. In some, quite simply, it will not be.
At this point the programmer has to have a good means of managing this decision himself. If memory is not a constraint, I can think of very few situations (none, actually) where I would choose to go with something other than the Numerical Python default of double. In general, that is what you get when creating python arrays unless you make some effort to obtain some other type. However, in some important (read "cases that affect the author") situations memory is a very critical constraint. Typically, in Numerical Python I use 4-byte floats. In fact, one of the reasons I use Numerical Python is because I don't *need* doubles, and Matlab, for example, is really only set up to work gracefully with doubles. I do *need* to conserve memory as I deal with very large data sets. It seems the question we are discussing is not really "what *should* be done in terms of casting?" but "what provides good enough decisions much of the time *and* a graceful way to manage the decisions when 'good enough' no longer applies to you?" Currently, this is not a trivial thing to manage. Reading in a 100 MB data set and multiplying by the Python scalar 2 produces a 200 MB data set. I manage this by wrapping the 2 in an array. This happens, of course, all the time. Having to do this once is not a big deal, but doing it everywhere in code that uses floats makes for cluttered code -- not something which I expect to have to write in an otherwise very elegant and concise language. Also, I often find myself trudging through code looking for the subtlety that converted my floats to doubles, doubled my memory usage and then caused subsequent float-only routines to error out. To those who are constrained to use floats this is awkward and time consuming. To those who are not I would say -- use doubles. The flag that causes an array to be a "space saving array" seems to be a temporary fix (that doesn't mean it was a bad idea -- just that it feels messy and effectively adds complexity that shouldn't be there). It also merely postpones the problem as I understand it -- what happens when I multiply two space saving arrays? We simply will never get away from situations where we have to manage the interaction ourselves, and so we should be careful not to make that management so awkward (and currently I think it is awkward) that the floats, bytes, shorts, etc. become marginalized in their utility.
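For concreteness, a short session showing both the promotion Todd describes and the wrapping trick, assuming the coercion rules discussed in this thread (a bare Python scalar counts as double precision):

>>> from Numeric import array, zeros
>>> a = zeros((3,), 'f')                 # single precision
>>> (a * 2.0).typecode()                 # a bare Python scalar promotes...
'd'
>>> (a * array(2.0, 'f')).typecode()     # ...wrapping it in an array does not
'f'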
My suggestion is to go with the rule that a simple hierarchy (in which downcasting is the rule)

longs
integers
shorts
bytes
cardinals
booleans
doubles
complex doubles   <--- default
floats
complex floats

for the most part makes good decisions: principally because people who are not constrained to conserve memory will use the larger, default types all the time and not wince. They don't *need* floats or bytes. If anyone gives them a float, a simple astype('d') or astype('D') to make sure it becomes a double lets them go on their way. Types like integers and bytes are effectively treated as being precise. If you are constrained to conserve memory by staying with floats or bytes instead of just reading things in from disk and making them doubles, it will not be so awkward to manage the types in large programs. If I use someone's code and they have a scalar anywhere in it at some point, even if I (or they) cast the output, memory usage swells at least for intermediate calculations. Effectively, Python *has* 4-byte floats but programming with them is awkward. This means, of course, that multiplying a float array by a double array produces a float. Multiplying a double array by anything above it produces a double, etc. For my work, if I have a float anywhere in the calculation I don't believe precision beyond that in the output, so getting a float back is reasonable. I know that some operations produce "more precision" and so I would cast the array if I needed to take advantage of that. Perhaps the downcasting is *not* the way to go. However, I definitely think the current awkwardness should be eliminated. I hope my comments will not be perceived as being critical of the original language designers. I find python to be very useful or I wouldn't have bothered to make the comments at all. -Todd Pitts From beausol at exch.hpl.hp.com Wed Feb 9 14:16:58 2000 From: beausol at exch.hpl.hp.com (Beausoleil, Raymond) Date: Wed, 9 Feb 2000 11:16:58 -0800 Subject: [Numpy-discussion] RE: [Matrix-SIG] An Experiment in code-cleanup. Message-ID: <34E36C05935CD311AE5000A0C9B6B0BF07D16F@hplex3.hpl.hp.com> From: Konrad Hinsen [mailto:hinsen at cnrs-orleans.fr] > > (1) The extent of the support for LAPACK. Do we want to stick > > with LAPACK Lite? > > There has been a full LAPACK interface for a long while, of which > LAPACK Lite is just the subset that is needed for supporting the > high-level routines in the module LinearAlgebra. I seem to have lost > the URL to the full version, but it's on my disk, so I can put it > onto my FTP server if there is a need. Yes, I'd like to get a copy! You can simply e-mail it to me, if you'd prefer. > > (2) The storage format. If we've still got row-ordered matrices > > under the hood, and we want to use native LAPACK libraries that > > were compiled using column-major format, then we'll have to be > > careful to set all of the flags correctly. This isn't going to > > be a big deal, _unless_ NumPy will support more of LAPACK when a > > native library is available. Then, of course, there ... > > The low-level interface routines don't take care of this. It's the > high-level Python code (module LinearAlgebra) that sets the > transposition argument correctly. That looks like a good compromise > to me. I'll have to look at this more carefully. Due to my relative lack of Python experience, I hacked the C code so that Fortran routines could be called instead, producing the expected results. > > (3) Through the judicious use of header files with compiler- > > dependent flags, we could accommodate the various naming > > conventions used when the FORTRAN libraries were compiled (e.g., > > sgetrf_ or SGETRF). > > That's already done! Where? Even in the latest f2c'd source code that I downloaded from SourceForge, I see all names written using the lower-case-trailing-underscore convention (e.g., dgeqrf_). The Intel MKL was compiled from Fortran source using the upper-case-no-underscore convention (e.g., DGEQRF). If I replace dgeqrf_ with DGEQRF in dlapack_lite.c (and a few other tweaks), then the subsequent link with the IMKL succeeds. ============================ Ray Beausoleil Hewlett-Packard Laboratories mailto:beausol at hpl.hp.com Vox: 425-883-6648 Fax: 425-883-2535 HP Telnet: 957-4951 ============================ From godzilla at netmeg.net Wed Feb 9 15:47:34 2000 From: godzilla at netmeg.net (Les Schaffer) Date: Wed, 9 Feb 2000 15:47:34 -0500 (EST) Subject: [Numpy-discussion] digest Message-ID: <14497.53862.853166.521584@gargle.gargle.HOWL> just switched over from matrix-sig to numpy-discussion. in the process i changed to the digest version and got my first issue.
is it possible to distribute the digests properly formatted as multipart/digests as per rfc822 and company?

having such a formatted digest makes it very easy when using an emailer like VM in emacs: VM automatically displays the digest as a virtual folder, allowing one to browse all the posts in a given digest very quickly and easily.

don't know whether the other (lackluster) emailers out there will handle it so nicely, but i don't think the extra required markers will interfere with your reading of the digests at all. highly recommended.

i'd be glad to work with whoever has control over this to ensure that the proper markers get placed into the digests.

les schaffer

From hinsen at cnrs-orleans.fr Wed Feb 9 15:58:47 2000
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Wed, 9 Feb 2000 21:58:47 +0100
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
In-Reply-To: <34E36C05935CD311AE5000A0C9B6B0BF07D16F@hplex3.hpl.hp.com> (beausol@exch.hpl.hp.com)
References: <34E36C05935CD311AE5000A0C9B6B0BF07D16F@hplex3.hpl.hp.com>
Message-ID: <200002092058.VAA10798@chinon.cnrs-orleans.fr>

> > onto my FTP server if there is a need.
>
> Yes, I'd like to get a copy! You can simply e-mail it to me, if you'd
> prefer.

OK, coming soon...

> I'll have to look at this more carefully. Due to my relative lack of Python
> experience, I hacked the C code so that Fortran routines could be called
> instead, producing the expected results.

That's fine, you can simply replace the f2c-generated code by Fortran-compiled code, as long as the calling conventions are the same. I have used optimized BLAS as well on some machines.

> Where? Even in the latest f2c'd source code that I downloaded from
> SourceForge, I see all names written using the
> lower-case-trailing-underscore convention (e.g., dgeqrf_). The Intel MKL was

Sure, f2c generates the underscores. But the LAPACK interface code (the one I'll send you, and also LAPACK Lite) supports both conventions, controlled by the preprocessor symbol NO_APPEND_FORTRAN (maybe not the most obvious name). On the other hand, there is no support for uppercase names; that convention is not used in the Unix world. But I suppose it could be added by machine transformation of the code.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From da at ski.org Wed Feb 9 16:23:22 2000
From: da at ski.org (David Ascher)
Date: Wed, 9 Feb 2000 13:23:22 -0800
Subject: [Numpy-discussion] digest
References: <14497.53862.853166.521584@gargle.gargle.HOWL>
Message-ID: <00cd01bf7343$ea55ae30$0100000a@ski.org>

> just switched over from matrix-sig to numpy-discussion. in the process
> i changed to the digest version and got my first issue.
>
> is it possible to distribute the digests properly formatted as
> multipart/digests as per rfc822 and company?

Did you try to edit your configuration on the Mailman control panel? There is a choice between MIME and plain-text digests.

--david ascher

From skaller at maxtal.com.au Wed Feb 9 17:04:13 2000
From: skaller at maxtal.com.au (skaller)
Date: Thu, 10 Feb 2000 09:04:13 +1100
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
References: <200002081956.UAA03241@chinon.cnrs-orleans.fr> <38A19201.8A43EC01@maxtal.com.au> <200002091717.SAA10604@chinon.cnrs-orleans.fr>
Message-ID: <38A1E45D.6E0EF317@maxtal.com.au>

Konrad Hinsen wrote:
>
> > silently). Consider a function
> >
> > k0 = 100
> > k = 99
> > while k < k0:
> >     ..
> >     k0 = k
> >     k = ...
> >
> > which refines a calculation until the measure k stops decreasing.
> > This algorithm may terminate when k is a float, but _fail_ when
> > k is a double -- the extra precision may cause the algorithm
>
> I'd call this a buggy implementation. Convergence criteria should be
> explicit and not rely on the internal representation of data types.
> If you care at all about portability, you shouldn't even think about
> this.

But sometimes you DON'T care about portability. Sometimes, you want the best result the architecture can support, and so you need to perform a portable computation of an architecture dependent value.

--
John (Max) Skaller, mailto:skaller at maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
voice: 61-2-9660-0850
homepage: http://www.maxtal.com.au/~skaller
download: ftp://ftp.cs.usyd.edu/au/jskaller

From da at ski.org Wed Feb 9 18:50:03 2000
From: da at ski.org (David Ascher)
Date: Wed, 9 Feb 2000 15:50:03 -0800
Subject: [Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.
References: <14496.42942.5355.849670@brant.geog.ubc.ca> <03fe01bf72cd$c040d640$5063cb0a@amullhau>
Message-ID: <037d01bf7358$630b8980$0100000a@ski.org>

> it as an example for people who want to learn stuff about mmap. As it
> stands, there was some similar code I was able to produce at some point. I
> forget who here has a copy, maybe Konrad, maybe David Ascher.
>
> Later,
> Andrew Mullhaupt

I did have some of that code, but it was almost 3 years ago and five computers ago. In other words, it's *somewhere*. I'll start a grep, but don't hold your breath...

--da

From da at ski.org Thu Feb 10 01:52:21 2000
From: da at ski.org (David Ascher)
Date: Wed, 9 Feb 2000 22:52:21 -0800
Subject: [Numpy-discussion] Binary distribution available
References: <14496.42942.5355.849670@brant.geog.ubc.ca> <03fe01bf72cd$c040d640$5063cb0a@amullhau>
Message-ID: <051f01bf7393$61bff4e0$0100000a@ski.org>

With Travis' wise advice, I appear to have succeeded in putting forth a binary installation of Numerical-15.2. Due to a bug in distutils, this is an 'install in place' package, instead of a 'run python setup.py install' package. So, unzip the file in your main Python tree, and it should 'work'. Let me (and Paul and Travis) know if it doesn't.

Download is available from the main page (http://sourceforge.net/project/?group_id=1369 look for [zip]) or from http://download.sourceforge.net/numpy/python-numpy-15.2.zip

--david ascher

From gvwilson at nevex.com Thu Feb 10 13:28:51 2000
From: gvwilson at nevex.com (gvwilson at nevex.com)
Date: Thu, 10 Feb 2000 13:28:51 -0500 (EST)
Subject: [Numpy-discussion] re: scientific Python publishing venue
Message-ID:

Hi, folks. A former colleague of mine is now editing a magazine devoted to scientific computing, and is looking for articles. If you're doing something scientific with Python, and want to tell the world about it, please give me a shout, and I'll forward more information.

Greg Wilson
http://www.software-carpentry.com

From archiver at db.geocrawler.com Thu Feb 17 12:34:11 2000
From: archiver at db.geocrawler.com (andrew x swan)
Date: Thu, 17 Feb 2000 09:34:11 -0800
Subject: [Numpy-discussion] more speed?
Message-ID: <200002171734.JAA08011@www.geocrawler.com>

This message was sent from Geocrawler.com by "andrew x swan".
Be sure to reply to that address.

hi - i've only just started using python and numpy... the program i wrote below runs much more slowly than a fortran equivalent. ie. on a dataset where the order of the matrix is (3325,3325), python took this long:

362.25user 0.74system 6:09.78elapsed 98%CPU

and fortran took this long:

2.68user 1.12system 0:03.89elapsed 97%CPU

is this because the element by element calculations involved are contained in python for loops?

thanks

#!/usr/bin/python
from Numeric import *

def nrm(pedigree):
    # build the relationship matrix from a pedigree of
    # (animal, sire, dam) triples; 0 marks an unknown parent
    n_animals = len(pedigree) + 1
    nrm = zeros((n_animals, n_animals), Float)
    for i in xrange(1, n_animals):
        isire = pedigree[i-1][1]
        idam = pedigree[i-1][2]
        nrm[i,i] = 1.0 + 0.5 * nrm[isire,idam]
        for j in xrange(i+1, n_animals):
            jsire = pedigree[j-1][1]
            jdam = pedigree[j-1][2]
            nrm[j,i] = 0.5 * (nrm[jsire,i] + nrm[jdam,i])
            nrm[i,j] = nrm[j,i]
    return nrm

if __name__ == '__main__':
    test_ped = [(1,0,0),(2,0,0),(3,1,0),(4,1,2),
                (5,3,4),(6,1,4),(7,5,6)]
    a = nrm(test_ped)
    print a

Geocrawler.com - The Knowledge Archive

From da at ski.org Thu Feb 17 18:25:57 2000
From: da at ski.org (David Ascher)
Date: Thu, 17 Feb 2000 15:25:57 -0800
Subject: [Numpy-discussion] more speed?
References: <200002171734.JAA08011@www.geocrawler.com>
Message-ID: <04d501bf799e$5a82abd0$0100000a@ski.org>

From: andrew x swan
> python took this long:
>
> 362.25user 0.74system 6:09.78elapsed 98%CPU
>
> and fortran took this long:
>
> 2.68user 1.12system 0:03.89elapsed 97%CPU
>
> is this because the element by element
> calculations involved are contained in python for
> loops?

yes.

--david ascher

From syrus at long.ucsd.edu Thu Feb 17 18:27:29 2000
From: syrus at long.ucsd.edu (Syrus Nemat-Nasser)
Date: Thu, 17 Feb 2000 15:27:29 -0800 (PST)
Subject: [Numpy-discussion] more speed?
In-Reply-To: <200002171734.JAA08011@www.geocrawler.com>
Message-ID:

On Thu, 17 Feb 2000, andrew x swan wrote:
> is this because the element by element
> calculations involved are contained in python for
> loops?

Hi Andrew! I've only just begun using Numeric Python, but I'm a long-time user of GNU Octave and a sporadic user of MatLab. In general, for loops kill the execution speed of interpreted environments like Numpy and Octave. The high speed comes when one uses vector operations such as matrix multiplication. If you can vectorize your code, meaning replace all the loops with matrix operations, you should see speed equivalent to Fortran for large data sets. As far as I know, you will never see an interpreted language match a compiled one in the execution of for loops.

Thanks. Syrus.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Syrus Nemat-Nasser    UCSD Physics Dept.
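Syrus's advice can be made concrete with a toy comparison. The sketch below uses made-up operands and does not attempt to vectorize the pedigree code above (its later rows depend on earlier ones, which makes that loop genuinely hard to vectorize); it only shows the general pattern of trading an element-by-element Python loop for whole-array operations:

    from Numeric import arange, zeros, Float

    a = arange(100000) * 1.0    # made-up operands, promoted to doubles
    b = arange(100000) * 2.0

    # loop form: one trip through the interpreter per element
    c = zeros(a.shape, Float)
    for i in xrange(len(a)):
        c[i] = a[i] + 2.0 * b[i]

    # vectorized form: the same arithmetic carried out in C
    c = a + 2.0 * b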
From peter at eexpc.eee.nott.ac.uk Fri Feb 18 12:05:06 2000
From: peter at eexpc.eee.nott.ac.uk (Peter Chang)
Date: Fri, 18 Feb 2000 17:05:06 +0000 (GMT)
Subject: [Numpy-discussion] numpy documentation - alternative format?
Message-ID:

Hi there,

I've just started to use python and numpy and want to print out the numpy document but the PDF file has a strange aspect ratio which makes it hard to print it as 2up on A4 paper. (I've tried hacking about with the postscript generated by xpdf but it seems that there is no global setting for page size!)

Could the authors please provide alternative formats for the doc, e.g. as postscript files sized for A4 and letter so that people can print them out more easily?

Thanks
Peter

From roitblat at hawaii.edu Fri Feb 18 12:14:22 2000
From: roitblat at hawaii.edu (Herbert L. Roitblat)
Date: Fri, 18 Feb 2000 07:14:22 -1000
Subject: [Numpy-discussion] numpy documentation - alternative format?
Message-ID: <03fd01bf7a33$9b046320$8fd6afcf@0gl1u.pixi.com>

Adobe Acrobat has a shrink-to-fit option in its print menu. I'm not sure if it comes with their free reader. Try printing as a 1up. It seems a small adaptation.

HLR

-----Original Message-----
From: Peter Chang
To: Numpy-discussion at lists.sourceforge.net
Date: Friday, February 18, 2000 7:09 AM
Subject: [Numpy-discussion] numpy documentation - alternative format?

>Hi there,
>
>I've just started to use python and numpy and want to print out the numpy
>document but the PDF file has a strange aspect ratio which makes it hard
>to print it as 2up on A4 paper. (I've tried hacking about with the
>postscript generated by xpdf but it seems that there is no global setting
>for page size!)
>
>Could the authors please provide alternative formats for the doc, e.g.
>as postscript files sized for A4 and letter so that people can print them
>out more easily?
>
>Thanks
> Peter
>
>_______________________________________________
>Numpy-discussion mailing list
>Numpy-discussion at lists.sourceforge.net
>http://lists.sourceforge.net/mailman/listinfo/numpy-discussion
>

From peter at eexpc.eee.nott.ac.uk Fri Feb 18 12:18:46 2000
From: peter at eexpc.eee.nott.ac.uk (Peter Chang)
Date: Fri, 18 Feb 2000 17:18:46 +0000 (GMT)
Subject: [Numpy-discussion] numpy documentation - alternative format?
In-Reply-To: <03fd01bf7a33$9b046320$8fd6afcf@0gl1u.pixi.com>
Message-ID:

On Fri, 18 Feb 2000, Herbert L. Roitblat wrote:
> Adobe Acrobat has a shrink-to-fit option in its print menu. I'm not sure
> if it comes with their free reader.

Is it available for Linux? I'll check it out...

> Try printing as a 1up. It seems a small adaptation.

I'm trying to save dead trees, i.e. print out 40-odd pages instead of 90-odd.

Peter

From sanner at scripps.edu Sat Feb 19 22:50:16 2000
From: sanner at scripps.edu (Michel Sanner)
Date: Sat, 19 Feb 2000 19:50:16 -0800
Subject: [Numpy-discussion] Numeric Python under IRIX646
Message-ID: <1000219195017.ZM77150@noah.scripps.edu>

Hi There,

I just tried to add SGI running IRIX6.5 to the collection of Unix boxes I will support, and I ran into the following problem:

If I compile Python -O2, loading the Numeric extensions dumps core; if I compile Python -g, it works just fine, and this regardless of whether Numeric is compiled -g or -O2.

After I re-compiled Objects/complexobject.o using -g (everything else being compiled -O2) I got it to work... did anyone else out there see this kind of behavior?

I also post this to psa-members just in case this might be Python-related.

-Michel

--
-----------------------------------------------------------------------
>>>>>>>>>> AREA CODE CHANGE <<<<<<<<< we are now 858 !!!!!!!

Michel F. Sanner Ph.D.                   The Scripps Research Institute
Assistant Professor                      Department of Molecular Biology
                                         10550 North Torrey Pines Road
Tel. (858) 784-2341                      La Jolla, CA 92037
Fax. (858) 784-2860

sanner at scripps.edu
http://www.scripps.edu/sanner
-----------------------------------------------------------------------

From mitch.chapman at mciworld.com Mon Feb 21 13:01:59 2000
From: mitch.chapman at mciworld.com (Mitch Chapman)
Date: Mon, 21 Feb 2000 11:01:59 -0700
Subject: [Numpy-discussion] Re: [PSA MEMBERS] Numeric Python under IRIX646
In-Reply-To: <1000219195017.ZM77150@noah.scripps.edu>
References: <1000219195017.ZM77150@noah.scripps.edu>
Message-ID: <00022111060701.00593@mchapmanpc>

On Sat, 19 Feb 2000, Michel Sanner wrote:
> Hi There,
>
> I just tried to add SGI running IRIX6.5 to the collection of Unix boxes I
> will support, and I ran into the following problem:
>
> If I compile Python -O2, loading the Numeric extensions dumps core;
> if I compile Python -g, it works just fine, and this regardless of whether
> Numeric is compiled -g or -O2.
>
> After I re-compiled Objects/complexobject.o using -g (everything else being
> compiled -O2) I got it to work...
>
> did anyone else out there see this kind of behavior?

I saw exactly this behavior just last Friday afternoon. After all of Python was recompiled with -g the bus error went away. Thanks for pointing out that only complexobject needs to be compiled with -g. It didn't occur to me to try this, despite the location of the bus error, because it was possible to exercise complex objects interactively with no problems.

BTW I don't know whether you were compiling N32 or N64. In our case N32 created the bus error.

--
Mitch Chapman
mitch.chapman at mciworld.com

From hinsen at cnrs-orleans.fr Fri Feb 25 07:26:58 2000
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri, 25 Feb 2000 13:26:58 +0100
Subject: [Numpy-discussion] Re: [Matrix-SIG] Numeric Array: adding a 0-D array to a cell in a 2-D array
In-Reply-To: <019901bf7e93$8771d5e0$8fd6afcf@0gl1u.pixi.com> (roitblat@hawaii.edu)
References: <019901bf7e93$8771d5e0$8fd6afcf@0gl1u.pixi.com>
Message-ID: <200002251226.NAA14777@chinon.cnrs-orleans.fr>

> We get the type error from trying to set the matrix element with a matrix
> element (apparently). In the old version (1.9) on our NT box,
> temp=a[kwd,kwd] results in temp being an int type. How can we either cast
> the temp to an int or enable what we really want, which is to add an int to
> a[kwd,kwd], as in a[kwd,kwd] = a[kwd,kwd] + jwd ?
>
> Do we have a bad version of Numeric?

Maybe an experimental version. If you check the archives of this mailing list, you can find a recent discussion about proposed modifications. One of them was to eliminate the automatic conversion of rank-0 arrays to scalars, in order to prevent type promotion. Perhaps this proposal was implemented in the version you have.

Note to the NumPy maintainers: please announce all new releases on this list, mentioning changes, especially those that affect backward compatibility. As a maintainer of code that makes heavy use of NumPy, I keep getting questions and bug reports caused by some new NumPy release that I haven't even heard of. A recent example is the change of location of the header files; C modules using arrays now have to include Numeric/arrayobject.h instead of simply arrayobject.h. I can understand this change (although I am not sure it's important enough to break compatibility), but I'd have preferred to learn about it directly and as early as possible. It's really no fun working through a 2 KB bug report sent by someone with zero knowledge of C.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From Oliphant.Travis at mayo.edu Fri Feb 25 15:23:01 2000
From: Oliphant.Travis at mayo.edu (Travis Oliphant)
Date: Fri, 25 Feb 2000 14:23:01 -0600 (CST)
Subject: [Numpy-discussion] Array-casting problem.
Message-ID:

Hi Herb,

It has taken a while for me to respond to this, but your problems here illustrate exactly the kinds of difficulties one encounters with the current NumPy coercion rules: you do not have a bad version of Numeric. The behavior you describe is exactly what "should" happen, though it needs to be fixed. I'll trace for you exactly what is going on, as it could be illustrative to others:

>>> a = zeros((5,5),'b')

# You've just created a 5x5 byte array that follows "normal" coercion
# rules, filled with zeros.

>>> a[3,3] = 8

# This line copies the rank-0 array of type 'b' created from the Python
# integer 8 (by a direct coercion in C) into element (3,3) of matrix a.

>>> temp = a[3,3]

# This selects out the rank-0 array of typecode 'b' at position (3,3).
# As of 15.2 this is no longer changed to a scalar. Note that rank-0
# arrays act a lot like scalars, but because there is not a one-to-one
# correspondence between the Python scalars and rank-0 arrays, this is
# not automatically converted to a Python scalar (this is a change in
# 15.2).

>>> temp = temp + 3

# This is the problem line for you right here. Something is wrong,
# though, since it should not be a problem.
# You are adding a rank-0 array of typecode 'b' to a Python integer,
# which is interpreted by Numeric as a rank-0 array of typecode 'l'.
# The result should be a Python integer. For some reason this is
# returning an array of typecode 'i' (which does not get automatically
# converted to a Python scalar).

>>> a[3,3] = temp

# This would work fine if temp were the Python scalar it should be.
# Right now, assignment doesn't let you assign an array of a "larger"
# type to elements of a smaller type (except for Python scalars). Since
# temp is (incorrectly, I think) a type 'i' rank-0 array, it does not
# let you make the assignment. At any rate, it is inconsistent to let
# you assign Python scalars but not rank-0 arrays of arbitrary
# precision; this should be fixed. It is also a problem that temp + 3
# returns an array of typecode 'i'.

I will look into fixing the above problems this example points out. Of course, it could also be fixed by having long integers lower in the coercion tree than byte arrays.

Thanks for the feedback,

Travis Oliphant

From Oliphant.Travis at mayo.edu Fri Feb 25 15:57:34 2000
From: Oliphant.Travis at mayo.edu (Travis Oliphant)
Date: Fri, 25 Feb 2000 14:57:34 -0600 (CST)
Subject: [Numpy-discussion] Casting problems with new version of NumPy.
Message-ID:

The code sent by Herbert Roitblat pointed out some inconsistencies in the current NumPy that I've fixed with two small changes:

1) Longs can no longer be safely cast to Ints (this is not safe on
   64-bit machines anyway) -- this makes Numeric more consistent with
   how it interprets Python integers.

2) Automatic casting will be done when using rank-0 arrays to set
   elements of a Numeric array, to be consistent with the behavior for
   Python scalars.
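For illustration, here is the kind of session these two changes are meant to make work (a hypothetical sketch of the intended behavior, not output captured from the patched build):

>>> from Numeric import zeros
>>> a = zeros((5,5), 'b')
>>> temp = a[3,3] + 3   # rank-0 'b' array plus a Python integer
>>> a[3,3] = temp       # rank-0 result is now cast down to 'b' on assignment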
The changes are in CVS right now, but are simple to change back if there is a problem.

-Travis

From collins at rushe.aero.org Mon Feb 28 13:17:38 2000
From: collins at rushe.aero.org (JEFFERY COLLINS)
Date: Mon, 28 Feb 2000 10:17:38 -0800
Subject: [Numpy-discussion] Matrix.py problem
Message-ID: <200002281817.KAA04027@rushe.aero.org>

I installed Numpy 15.2 and got the following error during the import of Matrix. Apparently, the version number is no longer embedded in the module doc string following the # sign.

>>> import Matrix
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python1.5/site-packages/Numeric/Matrix.py", line 5, in ?
    __version__ = int(__id__[string.index(__id__, '#')+1:-1])
  File "/usr/local/lib/python1.5/string.py", line 138, in index
    return _apply(s.index, args)
ValueError: substring not found in string.index

Jeff
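The failing line in Matrix.py could be guarded so that a doc string without the '#' marker no longer breaks the import. A minimal sketch, assuming it is acceptable for __version__ to be None in that case:

import string

try:
    __version__ = int(__id__[string.index(__id__, '#')+1:-1])
except ValueError:
    # the doc string no longer carries a trailing '#<version>' marker
    __version__ = None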