From pascucci@cs.utexas.edu Tue Jun 1 15:50:38 1999 From: pascucci@cs.utexas.edu (pascucci) Date: Tue, 01 Jun 1999 09:50:38 -0500 Subject: [Matrix-SIG] swig - visual studio References: Message-ID: <3753F33E.4CC5A47E@cs.utexas.edu> How can I set a visual studio project to preprocess a "filename.i" with SWIG and generate automatically the appropriate wrapper? thanks Valerio From pascucci@cs.utexas.edu Tue Jun 1 18:43:24 1999 From: pascucci@cs.utexas.edu (pascucci) Date: Tue, 01 Jun 1999 12:43:24 -0500 Subject: [Matrix-SIG] swig - visual studio References: <199906011556.KAA18543@smtp1.gte.net> Message-ID: <37541BBB.A62721BD@cs.utexas.edu> Thank you. It worked. I have one more question: how can I debug a piece of code in a dll starting from the python environment? If the code crashes the debugger seems to start automatically ... but what if I want to stop in another breakpoint before the error occurs? Valerio Roger Burnham wrote: > On 1 Jun 99, at 9:50, pascucci mused: > > > How can I set a visual studio project to preprocess a "filename.i" with SWIG > > and generate automatically the appropriate wrapper? > > thanks > > Valerio > > > > > > Using VC++ 6.0: > > Add a def for SWIG_EXE to your autoexec.bat: > set SWIG_EXE=C:\DevSrc\Swig\SWIG1.1p4\bin\swig.exe > > Add the .i file to the project. > > In the FileView, right-click on the .i, choose properties, and add a custom build > step: > > Description: Swigging up .c... > Commands: $(SWIG_EXE) -python -o .c .i > Outputs: .c > > Cheers, > > Roger Burnham > Cambridge Research & Instrumentation > rburnham@cri-inc.com > http://www.cri-inc.com/ > http://starship.python.net/crew/roger/ > PGP Key: http://www.nai.com/default_pgp.asp > PGP Fingerprint: 5372 729A 9557 5F36 177F 084A 6C64 BE27 0BC4 CF2D From dubois1@llnl.gov Tue Jun 1 20:02:01 1999 From: dubois1@llnl.gov (Paul F. 
Dubois) Date: Tue, 1 Jun 1999 12:02:01 -0700 Subject: [Matrix-SIG] CXX and Numeric Message-ID: <000a01beac61$3b0c0040$f4160218@plstn1.sfba.home.com> This is a multi-part message in MIME format. ------=_NextPart_000_0007_01BEAC26.8E732C80 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable As part of one of the demos for CXX there is an example defines an Array = object. Here is a portion of the=20 header file in the Include directory. This is only a demo. I did not = have the time to do a complete wrap on Numerical's API. A better version = of this class would be welcome. Please send mail about CXX direct to me at dubois1@llnl.gov. A friend = happened to forward your message but otherwise I missed it. Due to the = large amount of traffic I do not read the python list that carefully. I = gather you had some complaint about the robustness of an example and my = failure to do much more on the documentation. I regret my work = assignment has limited my attention to CXX although it is important to = me.=20 class Array: public Sequence { public: virtual bool accepts (PyObject *pyob) const { return pyob && PyArray_Check (pyob); } =20 explicit Array (PyObject *pyob): Sequence(pyob) { validate(); } =20 Array(const Object& other): Sequence(*other) { validate(); } =20 Array& operator=3D (const Object& rhs) { return (*this =3D *rhs); } =20 Array& operator=3D (PyObject* rhsp) { if(ptr() =3D=3D rhsp) return *this; set(rhsp); return *this; } =20 explicit Array (int n=3D0, PyArray_TYPES t =3D PyArray_DOUBLE) : Sequence(FromAPI(PyArray_FromDims(1, &n, t))) { validate(); } Array clone() const { PyObject *p =3D PyArray_CopyFromObject(ptr(), species(), rank(), = rank()); return Array(p); } int species() const { return PyArray_ObjectType(ptr(), 0); } int rank() const { return ((PyArrayObject*) ptr())->nd; } int dimension(int i) const { if (1 <=3D i && i <=3D rank()) { return ((PyArrayObject*) ptr())->dimensions[i-1]; } else { return 1; } } int 
is_contiguous() const { return PyArray_ISCONTIGUOUS ((PyArrayObject*) ptr()); } char* to_C() const { return ((PyArrayObject*) ptr())->data; } Array as_contiguous() { if (is_contiguous()) return Array(ptr()); return Array((PyObject*)PyArray_ContiguousFromObject(ptr(), = species(), 1, 0)); } =20 }; ------=_NextPart_000_0007_01BEAC26.8E732C80 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
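The is_contiguous()/as_contiguous() pair above wraps a distinction that is easy to see from the Python side. A minimal sketch using modern NumPy (the successor of the Numeric module being wrapped here, so the spelling differs from the C API above):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)    # freshly created arrays are C-contiguous
view = a[:, ::2]                   # a strided slice shares data but is NOT contiguous
print(view.flags['C_CONTIGUOUS'])          # False

contig = np.ascontiguousarray(view)        # same role as Array::as_contiguous()
print(contig.flags['C_CONTIGUOUS'])        # True
print((contig == view).all())              # True: same values, different layout
```

A contiguous copy is what C or Fortran code expects when it walks ->data as a flat buffer, which is why the wrapper makes one before handing the pointer out.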
From HYoon@exchange.ml.com Tue Jun 1 20:25:53 1999 From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading)) Date: Tue, 1 Jun 1999 15:25:53 -0400 Subject: [Matrix-SIG] RE: CXX and Numeric Message-ID:

Paul,

Thanks for your response. I do not have a great deal to complain about at the moment. As with much Python-related stuff, I was fully expecting things to be a work in progress, as usual; I was merely trying to gather info if available. If you have any simple CXX code that takes a Numeric array as an argument and returns one back, I would appreciate it. As I understand it, CXX is currently for one dimension only? Is any improvement in the works?

Anyway, CXX is a wonderful change from the C interface, which I could never figure out. I don't know C++, but it took me basically 2-3 hours to go through the example, and I am writing a fairly complicated data interface with it already. I am so happy about finally being able to wrap stuff in C.

One small question, if you don't mind: how does one append a list to a list (or Tuple)? So far this is what I have come up with. a.append(aux) apparently appends the address of aux, not its values. This is quite different from the way Python normally does it. This will do what I want it to, but it is a little bit ugly.

List a, Nada;
List ans, aux;
aux.append(Int(3));
aux.append(Float(6.0));
a.append(Float(0.0));
a.append(aux);
a.append(Tuple(aux));
aux.append(Float(99.0));
aux = Nada;
aux.append(Float(111.1));
a.append(Tuple(aux));

Thank you much for everything.

**************************************************************
S. Hoon Yoon (Quant) Merrill Lynch Equity Trading
yelled@yahoo.com hoon@bigfoot.com(w)
"Miracle is always only few standard deviations away, but so is catastrophe."
* Expressed opinions are often my own, but NOT my employer's.
"I feel like a fugitive from the law of averages."
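Incidentally, plain Python lists behave the same way as the CXX List here: append stores a reference, and wrapping in a tuple takes a snapshot of the current values. A pure-Python illustration of the same idiom:

```python
a = []
aux = [3, 6.0]

a.append(aux)          # appends a reference to aux, not a copy
a.append(tuple(aux))   # tuple(aux) snapshots the current values

aux.append(99.0)       # later mutation of aux...

print(a[0])            # [3, 6.0, 99.0] -- the reference sees the change
print(a[1])            # (3, 6.0)       -- the snapshot does not
```

So the Tuple(aux) trick in the message above is the same one you would use in Python itself.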
Mauldin ************************************************************** From KCAZA@cymbolic.com Thu Jun 3 00:04:03 1999 From: KCAZA@cymbolic.com (Kevin Cazabon) Date: Wed, 02 Jun 1999 16:04:03 -0700 Subject: [Matrix-SIG] NumPy on Mac... Message-ID: I'm having problems using Numerical Py on a Mac... I've used the standard Python 1.5.1 installer (custom install including NumPy) that is SUPPOSED to install NumPy properly, and added NumPy to the PythonPath... but still getting errors. Mostly, it can't find multiarray.py and umath.py (I've run the macmkalias.py script too).... what am I missing here? Thanks, Kevin Cazabon. From just@letterror.com Thu Jun 3 00:54:00 1999 From: just@letterror.com (Just van Rossum) Date: Thu, 3 Jun 1999 01:54:00 +0200 Subject: [Matrix-SIG] NumPy on Mac... In-Reply-To: Message-ID: At 4:04 PM -0700 6/2/99, Kevin Cazabon wrote: >I'm having problems using Numerical Py on a Mac... I've used the standard >Python >1.5.1 installer (custom install including NumPy) that is SUPPOSED to >install NumPy >properly, and added NumPy to the PythonPath... but still getting errors. > >Mostly, it can't find multiarray.py and umath.py (I've run the >macmkalias.py script >too).... > >what am I missing here? I think you need to add these two lines $(PYTHON):Extensions:NumPy $(PYTHON):Extensions:NumPy:NumPy to your sys.path prefs with EditPythonPrefs. (This is only for 1.5.1, later versions will/do run NumPy out of the box) Just From rburnham@cri-inc.com Thu Jun 3 02:53:35 1999 From: rburnham@cri-inc.com (Roger Burnham) Date: Wed, 2 Jun 1999 17:53:35 -800 Subject: [Matrix-SIG] UnsignedInt16 support Message-ID: <199906030054.TAA03362@smtp1.gte.net> Hi all, Any objections to my adding UnsignedInt16 (PyArray_USHORT, "w") storage type to Numeric??? Now that I've added 16bit luminance images to PIL, I need this type to do array manipulations on the image data. 
Cheers, Roger Burnham Cambridge Research & Instrumentation rburnham@cri-inc.com http://www.cri-inc.com/ http://starship.python.net/crew/roger/ PGP Key: http://www.nai.com/default_pgp.asp PGP Fingerprint: 5372 729A 9557 5F36 177F 084A 6C64 BE27 0BC4 CF2D

From Oliphant.Travis@mayo.edu Sat Jun 5 21:15:25 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Sat, 5 Jun 1999 15:15:25 -0500 (CDT) Subject: [Matrix-SIG] Multipack 0.6.1 released Message-ID:

I'm announcing the release of Multipack 0.6.1, available at http://oliphant.netpedia.net While it's only a minor upgrade for the front-end user, quite a few changes have been taking place in the backend. The biggest change is the inclusion of some excellent documentation from Pearu Peterson for fsolve and leastsq. Development is now taking place on a CVS server hosted by Pearu and administered by me. If you would like to contribute to this project, contact me and we can see about arranging access. There have been several improvements to the interface and some bug fixes thanks to Pearu's tireless testing and suggestions. I am really quite excited about what Python and NumPy are becoming for me. (Constructive) feedback is always appreciated. Travis Oliphant

From dars@fook.mechanoid.soton.ac.uk Mon Jun 7 16:05:23 1999 From: dars@fook.mechanoid.soton.ac.uk (Dave Stinchcombe) Date: Mon, 7 Jun 1999 16:05:23 +0100 Subject: [Matrix-SIG] british standard graphs Message-ID: <19990607160523.A12333@fook.mechanoid.soton.ac.uk>

Hi folks, I have been producing graphs for my work using the pygist module. Until now there has been no problem. However, I have discovered that there is a British Standard for graphs. It is seen as essential that I now produce graphs which comply with this standard (as well as converting all my old ones). Essentially the standard requires the graph to be placed in a drawn box (not a virtual box), with the dashing on the inside of the box, and the numbers on the outside.
Does anyone know the simplest way to sort this out? One possibility I'm considering is writing a program to munge the PostScript output from pygist and turn it into British Standard form. Does this seem like a sane option? Thanks for your time. Yours, Dave

From Oliphant.Travis@mayo.edu Mon Jun 7 20:47:42 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Mon, 7 Jun 1999 14:47:42 -0500 (CDT) Subject: [Matrix-SIG] Multipack is growing.... Message-ID:

Many of you have noticed that I have been quite active on this list for quite a little while, releasing several different packages, all aimed to be quite general. My goal in doing this has been mainly to satisfy my own needs, but by releasing incremental updates in a variety of areas I have also sought to generate interest in adding some computational facilities to Numeric to make it attractive as an interactive data analysis environment. This is the idea recently advocated by Joe Harrington and tossed around a bit on this list several months ago. The conclusion I drew from that discussion was that while a general, scripting-language-independent approach was certainly a laudable ultimate goal, it would be difficult to know how to proceed without a reference implementation which gathered some of the available code together into Python first. That is basically what I've been trying to do over the past several months.

As I mentioned in an earlier announcement, Pearu Peterson has graciously made available a CVS server where development of Multipack is occurring. I've decided to incorporate all of my packages under the umbrella of Multipack due to some significant changes that Pearu has made to the organization of that module which allow for its extensibility. For example, the CVS version of Multipack now has the spline packages from Dierckx (FITPACK), which were contributed by Pearu, and will soon (over the next few months) contain cddlib, an enhanced FFTW interface, and a sparse matrix toolbox.
After that an optimization toolbox is planned (by me at least) and later a wavelet toolbox. I am also planning on distributing Janko Hauser's wonderful interactive processing shell along with this package.

The reason that I am posting this message is to let those of you who were/are interested in contributing to making Python into a "real" interactive data analysis package know that your contributions are welcome. Even if all you can do is look at the package and say, "Hey, the way you implement XXX is stupid, you should do it like this..." it would be appreciated. Of course real source code contributions are even more appreciated. One area that will need continued help, of course, is cross-platform portability. The license I've decided on for my code is the LGPL.

Eventually I could see placing LinearAlgebra, RandomArray, and FFT all under the Multipack umbrella and leaving Numeric to hold the objects themselves with their methods. This is a very tentative eventually, of course, and for now isn't worth the effort.

Comments encouraged, Travis Oliphant

From alawhead@vcn.bc.ca Mon Jun 7 22:09:19 1999 From: alawhead@vcn.bc.ca (Alexander Lawhead) Date: Mon, 7 Jun 1999 14:09:19 -0700 (PDT) Subject: [Matrix-SIG] Multipack is growing.... In-Reply-To: Message-ID:

On Mon, 7 Jun 1999, Travis Oliphant wrote:
>
> Many of you have noticed that I have been quite active on this list for
> quite a little while releasing several different packages all aimed to be
> quite general.

The work you've been doing is excellent! The only reservation I have is that the distribution appears to be limited to the source code and binary implementations for Linux. If this truly is going to develop into the standard set of extensions to NumPy, will there be binary distributions for other platforms (read Windows XX!)? Compiling source on a Unix box is usually trivial. For those people stuck using less flexible platforms, having binaries available is an absolute must! Are there plans to do this?
Would it be a relatively simple thing or is this going to take a concerted effort by many people? Issues like this should probably be tackled early on in the process... Alexander

From dubois1@llnl.gov Mon Jun 7 22:41:45 1999 From: dubois1@llnl.gov (Paul F. Dubois) Date: Mon, 7 Jun 1999 14:41:45 -0700 Subject: [Matrix-SIG] Multipack is growing.... References: Message-ID: <001f01beb12e$8a6f1420$f4160218@plstn1.sfba.home.com>

----- Original Message ----- From: Travis Oliphant To: Sent: Monday, June 07, 1999 12:47 PM Subject: [Matrix-SIG] Multipack is growing....

> Eventually I could see placing LinearAlgebra, RandomArray, and FFT all
> under the Multipack umbrella and leaving Numeric to hold the objects
> themselves with their methods. This is a very tentative eventually, of
> course, and for now isn't worth the effort.

Actually, I think this is a very good idea. Strip the standard Numerical distribution down to the basics and have all the application code elsewhere. The truth is that there are several problems with the current setup. For one thing, there are competing choices for some of these packages, such as random numbers. They should all be on an even footing. For another, I lack the resources to maintain and test this stuff; with David's help we are managing to keep our heads above water on the core parts, but that's it. The rationale for the present setup is that people could get everything in one go and have a really useful set of functions out of the box. But the fact is that just the core itself is quite capable and useful.

While I'm on the subject of NumPy, I wondered if anyone has looked into the question of making numerical arrays use ExtensionObject.h? Would this be hard? I know it would be useful.

From HYoon@exchange.ml.com Mon Jun 7 23:04:58 1999 From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading)) Date: Mon, 7 Jun 1999 18:04:58 -0400 Subject: [Matrix-SIG] Multipack is growing....
Message-ID:

>From: Alexander
The work you've been doing is excellent! The only reservation I have is that the distribution appears to be limited to the source code and binary implementations for Linux. If this truly is going to develop into the standard set of extensions to NumPy, will there be binary distributions for other platforms (read Windows XX!)?

I second that. NT often seems an afterthought, and even with Linux's popularity, NT still rules desktops. Some numeric modules, like netCDF, were not compilable, at least during my few attempts. And the NaN implementation is still different. Cross-platform compatibility is one of the greatest strengths of Python. Great job otherwise. I really love just punching away at my numeric problems from Python.

-------------------

On a side note, does anyone know how to create a NaN value from a C++ extension module? This is how I used to do this in Gauss:

// Unix missing values
#define ISMISSINGVAL(A) ( (*((unsigned short *)(A)) & (unsigned short)0x7FFF) > (unsigned short)0x7FF0 )
#define MKMISSINGVAL(A) ( *((unsigned int *)(A)) = 0x7fffffff, *((unsigned int *)(A)+1) = (unsigned int)0 )

// NT missing values
#define ISMISSINGVAL(A) ( (*((unsigned short *)(A)+3) & (unsigned short)0x7FFF) > (unsigned short)0x7FF0 )
#define MKMISSINGVAL(A) ( *((unsigned int *)(A)+1) = 0x7fffffff, *((unsigned int *)(A)) = (unsigned int)0 )

        case DT_DOUBLE:
            holdLst.append(Float(*(double *)pData));
            break;
        case DT_CHAR:
            tmpStr = (string)pData;
            tmpStr = tmpStr.substr(0, cch);
            tmpStr = tmpStr.substr(0, tmpStr.find_last_not_of(' ') + 1);
            holdLst.append((String)tmpStr);
            break;
        }
    }
    else
        holdLst.append(Float(-9999.99));  // currently I am marking with this ugly thing
        // (unsigned short)0x7FF0));  // 0x7fffffff
    return holdLst;
}

I am finally writing extensions, thanks to CXX. Hopefully I can contribute more in the future (I have a few pretty nasty models that need to be hijacked from Splus).

**************************************************************
S.
Hoon Yoon (Quant) Merrill Lynch Equity Trading yelled@yahoo.com hoon@bigfoot.com(w) "Miracle is always only few standard deviations away, but so is catastrophe." * Expressed opinions are often my own, but NOT my employer's. "I feel like a fugitive from the law of averages." Mauldin ************************************************************** > -----Original Message----- > From: Alexander Lawhead > Sent: Monday, June 07, 1999 5:09 PM > To: matrix-sig@python.org > Subject: Re: [Matrix-SIG] Multipack is growing.... > > On Mon, 7 Jun 1999, Travis Oliphant wrote: > > > > > Many of you have noticed that I have been quite active on this list for > > quite a little while releasing several different packages all aimed to > be > > quite general. > > The work you've been doing is excellent! The only reservation I have is > that the distribution appears to be limited to the source code and binary > implementations for linux. If this truly is going to develop into the > standard set of extensions to NumPy will there be binary distributions for > other platforms (read Windows XX!)? > > Compiling source on a unix box is usually trivial. For those people stuck > using less flexible platforms, having binaries available is an absolute > must! Are there plans to do this? Would it be a relatively simple thing or > is this going to take a concerted effort by many people? Issues like this > should probably be tackled early on in the process... > > Alexander > > > > _______________________________________________ > Matrix-SIG maillist - Matrix-SIG@python.org > http://www.python.org/mailman/listinfo/matrix-sig From da@ski.org Tue Jun 8 00:07:00 1999 From: da@ski.org (David Ascher) Date: Mon, 7 Jun 1999 16:07:00 -0700 (Pacific Daylight Time) Subject: [Matrix-SIG] Multipack is growing.... In-Reply-To: <001f01beb12e$8a6f1420$f4160218@plstn1.sfba.home.com> Message-ID: On Mon, 7 Jun 1999, Paul F. Dubois wrote: > > Actually, I think this is a very good idea. 
Strip the standard Numerical I have no problem with this. We'd have to be careful about licenses. The LGPL is different from the existing licenses, I think. > While I'm on the subject of NumPy, I wondered if anyone has looked > into the question of making numerical arrays use ExtensionObject.h? > Would this be hard? I know it would be useful. Yes, I've done it in the past (and in the process found a bug in ExtensionObject, which Jim reportedly since fixed). I can do it again. In fact, I shall. This week, if all goes well. --david From da@ski.org Tue Jun 8 00:08:30 1999 From: da@ski.org (David Ascher) Date: Mon, 7 Jun 1999 16:08:30 -0700 (Pacific Daylight Time) Subject: [Matrix-SIG] Multipack is growing.... In-Reply-To: Message-ID: On Mon, 7 Jun 1999, Alexander Lawhead wrote: > The work you've been doing is excellent! The only reservation I have is > that the distribution appears to be limited to the source code and binary > implementations for linux. If this truly is going to develop into the > standard set of extensions to NumPy will there be binary distributions for > other platforms (read Windows XX!)? I admit I've been behind in testing Multipack, but it might be possible to use my compile.py tool to do a build. I'll investigate. --david From da@ski.org Tue Jun 8 00:14:09 1999 From: da@ski.org (David Ascher) Date: Mon, 7 Jun 1999 16:14:09 -0700 (Pacific Daylight Time) Subject: [Matrix-SIG] Multipack is growing.... In-Reply-To: Message-ID: On Mon, 7 Jun 1999, David Ascher wrote: > On Mon, 7 Jun 1999, Alexander Lawhead wrote: > > > The work you've been doing is excellent! The only reservation I have is > > that the distribution appears to be limited to the source code and binary > > implementations for linux. If this truly is going to develop into the > > standard set of extensions to NumPy will there be binary distributions for > > other platforms (read Windows XX!)? 
> > I admit I've been behind in testing Multipack, but it might be possible to
> > use my compile.py tool to do a build. I'll investigate.

Alas, I don't have a FORTRAN compiler on Windows, so I can't compile Multipack. --david

From Oliphant.Travis@mayo.edu Tue Jun 8 03:13:07 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Mon, 7 Jun 1999 21:13:07 -0500 (CDT) Subject: [Matrix-SIG] Multipack and Windows Message-ID:

I empathize with users who are stuck with Windows, but as I spend no time on that platform I am not inclined to spend much time making a Windows binary. Of course, I am not opposed to the idea of a Windows binary either. People interested in getting some of the functionality of Multipack may be interested in the Cygnus tools, such as the cygnus-win32 development environment (including the same GNU Fortran compiler that I use). Some of the subroutines (from MINPACK and QUADPACK) are in the SLATEC library, which is available in binary form at http://www.geocities.com/Athens/Olympus/5564/slatec.htm The current Makefile in CVS allows for compilation of separate modules and linking with your own libraries, so you could use this binary library and a C compiler to get a Windows binary of some of the functionality without "too much" trouble, I suspect. Or, you could just install Linux... :-) Best of luck, Travis Oliphant

From Oliphant.Travis@mayo.edu Tue Jun 8 03:24:27 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Mon, 7 Jun 1999 21:24:27 -0500 (CDT) Subject: [Matrix-SIG] Have you looked at SLATEC In-Reply-To: Message-ID:

Pearu and others interested, What do you know about SLATEC? This is a very large library of FORTRAN routines that seems to cover much of what we want included in Multipack. There are binary versions of the SLATEC library for Windows available on the net which might help some people, too... Has anybody used SLATEC who could comment on its applicability for this project?
Travis

From Oliphant.Travis@mayo.edu Tue Jun 8 07:04:02 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Tue, 8 Jun 1999 01:04:02 -0500 (CDT) Subject: [Matrix-SIG] Lapack, fftpack, and ranlib becoming part of Multipack Message-ID:

I looked over the source to the lapack, fftpack, and ranlib modules and see that it will be quite simple to add them to the current Multipack distribution. In fact, they fit in quite nicely with the changes that Pearu has made. As Paul seemed to think that moving these out of the Numeric core will simplify his life as the maintainer of NumPy (a good thing...), I will proceed.

My preferred approach is to rely on full lapack libraries. Are these available for Windows? I would also like to have binary releases for several platforms but will need help getting those together from interested users. Any entrepreneur out there willing to put together CD-ROMs of binaries of what this package will be? I think many people would be interested in binary copies and be willing to pay a small fee for them...

Travis

From r.hooft@euromail.net Tue Jun 8 09:57:07 1999 From: r.hooft@euromail.net (Rob Hooft) Date: Tue, 8 Jun 1999 10:57:07 +0200 (MZT) Subject: [Matrix-SIG] Multipack is growing.... In-Reply-To: <001f01beb12e$8a6f1420$f4160218@plstn1.sfba.home.com> References: <001f01beb12e$8a6f1420$f4160218@plstn1.sfba.home.com> Message-ID: <14172.56035.908142.362082@octopus.chem.uu.nl>

I'm just responding to express my concern about moving towards the LGPL. The LGPL is quite different from the Python license. Everybody working in scientific Python programming must applaud the efforts that are being put into Multipack. I myself have heard of a number of features that I'd like to try. But whereas I have convinced my employer that it is good to use (and help develop) free software, doing the same for the LGPL might take some more effort.
One step further: moving the current LA package out of Numeric into Multipack with LGPL is not something I'd like to see, as I'm already using it now. Regards, Rob Hooft. -- ===== R.Hooft@EuroMail.net http://www.xs4all.nl/~hooft/rob/ ===== ===== R&D, Nonius BV, Delft http://www.nonius.nl/ ===== ===== PGPid 0xFA19277D ========================== Use Linux! ========= From spillar@uwyo.edu Tue Jun 8 16:17:01 1999 From: spillar@uwyo.edu (Earl Spillar) Date: Tue, 08 Jun 1999 09:17:01 -0600 Subject: [Matrix-SIG] Lapack, fftpack, and ranlib becoming part of Multipack References: Message-ID: <9906081517.AA00416@galaxies.uwyo.edu> Folks: I have no objections to moving all of these packages to Multipack either, except for a variation on one worry already expressed: those of us who are stuck (or is it blessed?) with other platforms, such as NT, Apple, or Openstep^h^h^h^h^h^h^h^h Mac OS X, etc. need to build versions too. Of course Travis need not be responsible for all of these ports, but I think we need an archive that can hold alternative binaries, and some volunteers to work on them. As usual, I am busy the next couple of months- changing jobs- but I hope I will be able to contribute after that. I have found the whole Numerical Python experience extremely positive and useful, and I need to give something back! Earl Spillar spillar@uwyo.edu From dubois1@llnl.gov Tue Jun 8 16:45:59 1999 From: dubois1@llnl.gov (Paul F. Dubois) Date: Tue, 8 Jun 1999 08:45:59 -0700 Subject: [Matrix-SIG] Re: Lapack, fftpack, and ranlib becoming part of Multipack References: Message-ID: <99060808570300.21361@almanac> On Mon, 07 Jun 1999, you wrote: >... > > As Paul seemed to think that moving these out of the Numeric core will > simplify his life as the maintainer of NumPy (a good thing...) then I will > proceed. Don't do anything yet. We need to get a release out first. > > My preferred approach is to rely on full lapack libraries. Are these > available for Windows? 
> I would also like to have binary releases for
> several platforms but will need help getting those together from
> interested users. Any entrepreneur out there willing to put together
> CD-ROMs of binaries of what this package will be? I think many people
> would be interested in binary copies and be willing to pay a small fee for
> it...

The version in NumPy is a C version of a portion of LAPACK and it is badly written; it produces many warnings. It should be possible to get the full LAPACK from www.netlib.org. However, one of the "bugs" we have had trouble with is that on a platform that already has LAPACK you should use the native version, which often has optimized BLAS. I think there are two separate jobs here: maintaining and distributing the source, and making and distributing binaries for different platforms. Let's hope distutils takes care of the compiling issues so that the latter job becomes easy.

Paul

From kernr@ncifcrf.gov Tue Jun 8 20:04:37 1999 From: kernr@ncifcrf.gov (Robert Kern) Date: Tue, 08 Jun 1999 15:04:37 -0400 Subject: [Matrix-SIG] Multipack on Windows with mingw32 Message-ID: <375D6945.8C0641D@mail.ncifcrf.gov>

I have had near-perfect success using Mumit Khan's egcs-1.1.2/mingw32 distribution to compile Multipack 0.6 and Travis' other modules. Mumit's distribution comes with gcc, g++, g77, and obj-c (I think). One has to use Paul Sokolovsky's Python headers (http://www.infoservice.lg.ua/~paul/devel/python_headers.html), which are slightly patched from the distributed version to declare structures correctly and provide the right config.h information. My only (minor) failure was with the FFTW wrappers. They compile fine, but they are slightly slower than the NumPy FFTPACK wrapper. This problem may be a fault in my configuration of the FFTW libraries on my machine, though. I will be placing instructions and compilation helpers on my starship account very soon. Once I compile Multipack 0.6.1, I will place binaries there, too.
--
Robert Kern           |
----------------------|"In the fields of Hell where the grass grows high
This space            | Are the graves of dreams allowed to die."
intentionally         | - Richard Harter
left blank.           |

From HYoon@exchange.ml.com Tue Jun 8 20:53:11 1999
From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading))
Date: Tue, 8 Jun 1999 15:53:11 -0400
Subject: [Matrix-SIG] To make NaN work on NT (was) Multipack on Windows with mingw32
Message-ID:

Has anyone made CXX work with mingw32?

Plus: I made a slight improvement to Janko's NaN-finding syntax to make it work on NT and Windows platforms. Somehow the INF trick below works:

    comparA = not_equal(d,INF)

Not sure why, but it works and I am happy.

----------------------
from Numeric import *
import sys

INF = 1e1000**1000000
NA = (INF/INF)

def packr(M, axis=1):
    """ Returns only rows or columns which are not NaN
        (packr is a Gauss by Aptech function).
        I am not sure this works for n-dim matrices, but it works for 2-D.
        axis is a bit confusing for a Gauss person """
    return take(M, isanx(M))

def isanx(d, axis=1):
    if sys.platform == 'win32':
        comparA = not_equal(d, INF)
        if len(d.shape) == 2:
            return nonzero(equal(add.reduce(comparA, axis), d.shape[axis]))
        elif len(d.shape) == 1:
            return nonzero(comparA)
        else:
            raise IndexError, "function: packr is for up to 2D matrices only"
    else:
        if len(d.shape) == 2:
            return nonzero(equal(add.reduce(equal(d, d), axis), d.shape[axis]))
        elif len(d.shape) == 1:
            return nonzero(equal(d, d))
        else:
            raise IndexError, "function: packr is for up to 2D matrices only"

if __name__ == '__main__':
    x = zeros((3,6), 'f')
    x[2,4] = NA
    print x
    print packr(x)

**************************************************************
S. Hoon Yoon (Quant) Merrill Lynch
Equity Trading
yelled@yahoo.com hoon@bigfoot.com(w)
"Miracle is always only few standard deviations away, but so is catastrophe."
* Expressed opinions are often my own, but NOT my employer's.
"I feel like a fugitive from the law of averages."
Mauldin
**************************************************************

From tom@spirit.gcrc.upenn.edu Wed Jun 9 22:05:15 1999
From: tom@spirit.gcrc.upenn.edu (Tom Fenn)
Date: Wed, 09 Jun 1999 17:05:15 -0400
Subject: [Matrix-SIG] Would like references to current Python usage.
Message-ID: <375ED70B.5B745A79@spirit.gcrc.upenn.edu>

Hello,

Next Monday I'm giving a talk about Python at the ASMS Conference on Mass Spectrometry. I would like to mention some previous cases where Python has been used for scientific projects, but except for Konrad Hinsen's MMTK, I am ignorant of specifics. If anyone has a project they would like to have mentioned, or feel is a particularly convincing example of Python's applicability to scientific computing, please let me know about it.

Thanks,

Tom Fenn

From Oliphant.Travis@mayo.edu Thu Jun 10 17:45:06 1999
From: Oliphant.Travis@mayo.edu (Travis Oliphant)
Date: Thu, 10 Jun 1999 11:45:06 -0500 (CDT)
Subject: [Matrix-SIG] Advice on an how to handle errors in extension module.
Message-ID:

I'm writing some Python-C interfaces to some mathematical libraries in FORTRAN. In particular, I'm working on interfacing QUADPACK with Python. The code is basically complete and an old version can be found in Multipack at http://oliphant.netpedia.net

The problem is that the FORTRAN code requires that there be an (external) function defined which computes the integrand. This function must be of a certain type. The function I use is just a wrapper that handles calling the user-defined Python function (stored in a static global variable).

The question I have is: if a Python error occurs during evaluation of the user-defined function, I'm not sure what to do inside this wrapper function, which can only return a double. Currently, I'm printing the error (if it hasn't been printed before) and returning a zero. This won't stop the integrator from calling the function again but it will have the effect of treating the function as if it were zero.
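One common way out of the dilemma Travis describes can be sketched in present-day Python (this is an editor's illustration of the general pattern, not code from Multipack): the wrapper catches the exception, records it instead of printing, returns a neutral value to keep the library happy, and re-raises once the library call returns. The `integrate` function below is a stand-in for the FORTRAN routine.

```python
import math

class CallbackGuard:
    """Wraps a Python callback handed to a library that cannot propagate
    exceptions: errors are recorded, a neutral value is returned to the
    library, and the recorded error is re-raised after the call finishes."""

    def __init__(self, func, neutral=0.0):
        self.func = func
        self.neutral = neutral
        self.error = None

    def __call__(self, x):
        if self.error is not None:       # already failed: short-circuit
            return self.neutral
        try:
            return self.func(x)
        except Exception as e:
            self.error = e               # remember it, don't print it
            return self.neutral

    def check(self):
        """Re-raise the stored Python error, if any."""
        if self.error is not None:
            raise self.error

def integrate(f, a, b, n=100):
    """Stand-in for the FORTRAN integrator: fixed-step midpoint rule."""
    h = (b - a) / float(n)
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

guard = CallbackGuard(math.sqrt)         # raises ValueError for x < 0
result = integrate(guard, -1.0, 1.0)     # the library call completes anyway
try:
    guard.check()                        # now surface the Python error
except ValueError:
    print("integrand raised; result is unreliable")
```

The short-circuit in `__call__` also answers the efficiency worry: after the first failure, the integrator's remaining calls cost almost nothing.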
This actually works really well for 1-D integration. Lately, though, I've just made the C-interface so that it can be re-entrant (so the Python function itself could call an integration routine --- allows for easy multiple integration). But, this exposes the inelegance of my solution for handling a Python error. I'm just fishing for ideas from more knowledgeable people on this list. Travis Oliphant From robin@jessikat.demon.co.uk Thu Jun 10 19:29:17 1999 From: robin@jessikat.demon.co.uk (Robin Becker) Date: Thu, 10 Jun 1999 19:29:17 +0100 Subject: [Matrix-SIG] Advice on an how to handle errors in extension module. In-Reply-To: References: Message-ID: <$dQe0FA9PAY3EwCu@jessikat.demon.co.uk> In article , Travis Oliphant writes > >I'm writing some Python-C interfaces to some mathematical libraries in >FORTRAN. In particular, I'm working on interfacing QUADPACK with Python. >The code is basically complete and an old version can be found in >Multipack at http://oliphant.netpedia.net > >The problem is that the FORTRAN code requires that there be an (external) >function defined which computes the integrand. This function must be of a >certain type. The function I use is a just wrapper that handles calling >the user defined Python function (stored in a static global variable). > >The question I have is that if a Python error occurs while evaluation of >the user-defined function, I'm not sure what to do inside this wrapper the real problem here (I assume) is that the fortran code doesn't know about errors in the integrand function. Well written fortran used to have error returns etc etc., but the quadpack doesn't. The correct way to do this would be to rewrite the calls to the function to allow for an error return. Alternatively what I did for the tcl dll caller was to allow for an error return using a setjump in the main interface. How are you dealing with errors in the quadpack code? >function which can only return a double. 
Currently, I'm printing the
>error (if it hasn't been printed before) and returning a zero. This won't
>stop the integrator from calling the function again but it will have the
>effect of treating the function as if it were zero.
>
>This actually works really well for 1-D integration. Lately, though, I've
>just made the C-interface so that it can be re-entrant (so the Python
>function itself could call an integration routine --- allows for easy
>multiple integration). But, this exposes the inelegance of my solution
>for handling a Python error.
>
>I'm just fishing for ideas from more knowledgeable people on this list.
>
>Travis Oliphant
>
>
>_______________________________________________
>Matrix-SIG maillist - Matrix-SIG@python.org
>http://www.python.org/mailman/listinfo/matrix-sig
--
Robin Becker

From HYoon@exchange.ml.com Fri Jun 11 15:28:27 1999
From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading))
Date: Fri, 11 Jun 1999 10:28:27 -0400
Subject: [Matrix-SIG] NumPy and XEmacs Crashes on Error on NT
Message-ID:

Hi,

Just wanted to let all you NTXEmacs users out there know that the following problem no longer crashes pyshell on NTXEmacs. Just download the latest release at:

ftp://ftp.ese-metz.fr/pub/xemacs/win32/xemacs-21.2b15-win32.zip

"This is a bulk version of the latest xemacs compiled natively for win32 platforms. In order to use it, please make EMACSPACKAGEPATH point at the XEmacs directory. -Fabrice Popineau"

It also fixes a lot of other problems as well. So, please upgrade to this June 5 release. Still beta, but a lot better.

**************************************************************
S. Hoon Yoon (Quant) Merrill Lynch
Equity Trading
yelled@yahoo.com hoon@bigfoot.com(w)
"Miracle is always only few standard deviations away, but so is catastrophe."
* Expressed opinions are often my own, but NOT my employer's.
"I feel like a fugitive from the law of averages."
Mauldin ************************************************************** > -----Original Message----- > From: Yoon, Hoon (CICG - NY Program Trading) > Sent: Tuesday, May 25, 1999 2:56 PM > Cc: matrix-sig@python.org > Subject: [Matrix-SIG] NumPy and XEmacs Crashes on Error on NT > > from Numeric import * > from LinearAlgebra import * > matrixmultiply(array([[1,0],[0,1]], 'f'),array([[2],[1]], 'f')) > matrixmultiply(array([[1,0],[0,1]], 'f'),array([2],[1], 'f')) # crash! > > Hi, > > For some reason, while using NTEmacs beta [version 21.0; Aug 1998] by > Charles W. I am getting crashes on many call errors as above and python > mode > gets killed. > Is there a newer version of NTEmacs that does not crash? It just brings > up > exception as expected on python console or pythonwin: > Traceback (innermost last): > File "", line 0, in ? > TypeError: illegal argument type for built-in operation > Thanks, > > ************************************************************** > S. Hoon Yoon (Quant) Merrill Lynch > Equity Trading > yelled@yahoo.com hoon@bigfoot.com(w) > "Miracle is always only few standard deviations away, but so is > catastrophe." > * Expressed opinions are often my own, but NOT my employer's. > "I feel like a fugitive from the law of averages." Mauldin > ************************************************************** > > > -----Original Message----- > > From: Jannie Hofmeyr > > Sent: Friday, May 21, 1999 2:52 AM > > To: neelk@alum.mit.edu > > Cc: matrix-sig@python.org > > Subject: Re: [Matrix-SIG] Small patch to UserArray.py in > > LLNLDistribution11 > > > > Neel Krishnaswami wrote: > > > > > I just downloaded and built the Numeric package in LLNL Distribution > > > 11. I found a small problem in the UserArray.py module though -- it > > > wasn't working correctly because of inconsistent indentation in the > > > UserArray.__init__() method. (The indentation incorrectly defined all > > > the methods *inside* __init__()'s scope.) 
> > > > The problem is that the def __init__ is indented with spaces while the
> > rest of the class is indented with tabs.
> >
> > Jannie
> >
> >
> >
> > _______________________________________________
> > Matrix-SIG maillist - Matrix-SIG@python.org
> > http://www.python.org/mailman/listinfo/matrix-sig
>
> _______________________________________________
> Matrix-SIG maillist - Matrix-SIG@python.org
> http://www.python.org/mailman/listinfo/matrix-sig

From HYoon@exchange.ml.com Fri Jun 11 14:57:24 1999
From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading))
Date: Fri, 11 Jun 1999 09:57:24 -0400
Subject: [Matrix-SIG] Generating NaN and catching exceptions
Message-ID:

Hi,

I was wondering how I can generate and use Python NaN's from my CXX code. Currently I am using the code below (which generates this 1.#QNAN thing on NT). I would like to change it to what I generate for Python by

    INF = 1e1000**1000000
    NA = (INF/INF)

I tried passing this into the CXX extension as an arg, but that does not quite work. Not to mention that I would like to assign this to a variable rather than a function call: pyRTL.NA, not NA = pyRTL.mkMsg(). I guess I am really looking for the equivalent of what's in #define ISMISSING and MKMISSING.

One other thing: I tried catch(const Exception&) { // Catch ValueError? but what I really want is catch(const ValueError&) whenever I pass in something like a String as an arg to ex_tst_msg(); it does not seem to catch it. I thought that was strange, but C++ is a temperamental mistress.

Any sample code or help will be appreciated greatly!
hn_rtl.h file---------

//Unix Missing Values
//#define ISMISSINGVAL(A) ( (*((unsigned short *)(A)) & (unsigned short)0x7FFF) > (unsigned short)0x7FF0 )
//#define MKMISSINGVAL(A) ( *((unsigned int *)(A)) = 0x7fffffff, *((unsigned int *)(A)+1) = (unsigned int)0 )
// NT missing
#define ISMISSINGVAL(A) ( (*((unsigned short *)(A)+3) & (unsigned short)0x7FFF) > (unsigned short)0x7FF0 )
#define MKMISSINGVAL(A) ( *((unsigned int *)(A)+1) = 0x7fffffff, *((unsigned int *)(A)) = (unsigned int)0 )

in my cxx prg-----------------

static Float mkMsg() {
    double *msgV;
    double msgv;
    msgv = 0.0;
    msgV = &msgv;
    MKMISSINGVAL(msgV);
    return Float(msgv);
}

static PyObject * ex_mkMsg(PyObject* self, PyObject* args) {
    return new_reference_to(mkMsg());
}

static PyObject * ex_tst_msg(PyObject* self, PyObject* args) {
    try {
        Tuple tArgs(args);
        Float isMsgV = Float(tArgs[0]);
        double msgv = (double) isMsgV;
        double *msgV = &msgv;
        if ISMISSINGVAL(msgV)
            return new_reference_to(Int(1));
        else
            return new_reference_to(Int(0));
        // if (NAval == isMsgV) return new_reference_to(Int(0));
    }
    catch(const Exception&) { // Catch ValueError?
        return new_reference_to(Int(0));
        //cout << "Exception in clearing all ref.\n";
        //return Null ();
    }
}

**************************************************************
S. Hoon Yoon (Quant) Merrill Lynch
Equity Trading
yelled@yahoo.com hoon@bigfoot.com(w)
"Miracle is always only few standard deviations away, but so is catastrophe."
* Expressed opinions are often my own, but NOT my employer's.
"I feel like a fugitive from the law of averages."
Mauldin ************************************************************** From Oliphant.Travis@mayo.edu Mon Jun 14 21:46:07 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Mon, 14 Jun 1999 15:46:07 -0500 (CDT) Subject: [Matrix-SIG] New Multipack release (0.7) Message-ID: I'm announcing the release of Multipack-0.7 at http://oliphant.netpedia.net/ Multipack is a collection of Python extension modules which use the Numeric extension module API to provide a number of FORTRAN routines to the Numeric Python user. Included in this release are routines to numerically: - solve N nonlinear equations in N unknowns. - minimize m nonlinear equations in n unknowns (Levenberg-Marquardt) - integrate an ordinary differential equation (stiff or nonstiff) - integrate a function of 1, 2, or 3 variables. - fit a set of points to a 1 or 2-D spline and find derivatives, integrals, interpolations, etc. of those splines. (thanks Pearu Peterson!) Documentation has been updated and improved for this release and the way to contribute a module has been streamlined and detailed. It is now quite easy to add your own interfaces to the package. A FORTRAN compiler (or f2c) is required to compile the source. Comments and contributions are welcome. Travis Oliphant Oliphant.Travis@altavista.net From kernr@ncifcrf.gov Tue Jun 15 05:07:54 1999 From: kernr@ncifcrf.gov (Robert Kern) Date: Tue, 15 Jun 1999 00:07:54 -0400 Subject: [Matrix-SIG] Win32 Python Extensions with egcs Message-ID: <3765D19A.86517E3E@mail.ncifcrf.gov> Hello All, My Starship cabin is now (mostly) up. The instructions, notes, and tools for compiling Python extensions on Win32 with egcs are located at http://starship.python.net/crew/kernr/mingw32/Notes.html I will have Win32 binaries of relevant packages (e.g. Multipack 0.7) up soon. Please give me feedback if the page is incomprehensible. :-) Have fun! 
--
Robert Kern           |
----------------------|"In the fields of Hell where the grass grows high
This space            | Are the graves of dreams allowed to die."
intentionally         | - Richard Harter
left blank.           |

From jbaddor@sca.uqam.ca Tue Jun 15 23:53:47 1999
From: jbaddor@sca.uqam.ca (Jean-Bernard ADDOR)
Date: Tue, 15 Jun 1999 18:53:47 -0400 (EDT)
Subject: [Matrix-SIG] auto-correlation matrix in NumPy?
Message-ID:

Hi!

I need to compute the auto-correlation matrix of a matrix. I have just seen there is a function corrcoef() in the Mlab module. I have no idea whether it would work for me. If yes, where can I find the Mlab module, as it is not in my LLNL 11 distribution? Can I find this function somewhere else?

Jean-Bernard

From janne@avocado.pc.helsinki.fi Wed Jun 16 07:30:44 1999
From: janne@avocado.pc.helsinki.fi (Janne Sinkkonen)
Date: 16 Jun 1999 09:30:44 +0300
Subject: [Matrix-SIG] auto-correlation matrix in NumPy?
In-Reply-To: Jean-Bernard ADDOR's message of "Tue, 15 Jun 1999 18:53:47 -0400 (EDT)"
References:
Message-ID:

Jean-Bernard ADDOR writes:

> I need to compute the auto-correlation matrix of a matrix.

If you mean you want the correlation matrix of a data matrix, you could do something like

def covariance(X):
    N = X.shape[0]
    mX = sum(X)/N
    return dot(transpose(X), X)/N - multiply.outer(mX, mX)

def correlation(X):
    C = covariance(X)
    V = diagonal(C)
    return C/sqrt(multiply.outer(V, V))

--
Janne

From HYoon@exchange.ml.com Wed Jun 16 20:37:23 1999
From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading))
Date: Wed, 16 Jun 1999 15:37:23 -0400
Subject: [Matrix-SIG] rotater?
Message-ID:

Hi,

Gauss has this thing called rotater which, given an array, rotates it by another array of shift values:

x = [[1,2,3], [2,3,4]]
shft = [1,2]
rotater(x, shft)
will give you
[[2,3,1], [4,2,3]]

Easy enough to do in a loop, but can I avoid this somehow?

**************************************************************
S.
Hoon Yoon (Quant) Merrill Lynch
Equity Trading
yelled@yahoo.com hoon@bigfoot.com(w)
"Miracle is always only few standard deviations away, but so is catastrophe."
* Expressed opinions are often my own, but NOT my employer's.
"I feel like a fugitive from the law of averages." Mauldin
**************************************************************

From tim.hochberg@ieee.org Wed Jun 16 20:30:43 1999
From: tim.hochberg@ieee.org (Tim Hochberg)
Date: Wed, 16 Jun 1999 13:30:43 -0600
Subject: [Matrix-SIG] rotater?
Message-ID: <014901beb82e$d6befdc0$783fa4cd@R20CAREY.MAYO.EDU>

This is a multi-part message in MIME format.

------=_NextPart_000_0144_01BEB7FC.6F547480
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Yoon, Hoon (CICG - NY Program Trading) wrote:

> Gauss has this thing called rotater, which given array rotates it by
>another value in array
>
>x = [[1,2,3], [2,3,4]]
>shft = [1,2]
>rotater(x, shft)
>will give you
>[[2,3,1], [4,2,3]]
> Easy enough to do in a loop, but can I avoid this somehow?

It's pretty easy to avoid the inner loop that shifts all of the elements; you may still have to loop over the other axes, but perhaps this will get you started:

-tim

------=_NextPart_000_0144_01BEB7FC.6F547480
Content-Type: text/plain; name="rotater.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="rotater.py"

from Numeric import *

def rotater(a, n):
    """Rotates the array a by n"""
    a = asarray(a)
    # Extension to multiple dimensions left as an exercise to the reader.
    result = zeros(shape(a), a.typecode())
    # Make rotation positive
    n = n % len(a)
    if n == 0:
        return a
    result[:n] = a[-n:]
    result[n:] = a[:-n]
    return result

x = [1,2,3,4,5]
print rotater(x, 1)
print rotater(x, 2)
print rotater(x, -1)
print rotater(x, -2)

------=_NextPart_000_0144_01BEB7FC.6F547480--

From kernr@ncifcrf.gov Wed Jun 16 20:16:35 1999
From: kernr@ncifcrf.gov (Robert Kern)
Date: Wed, 16 Jun 1999 15:16:35 -0400
Subject: [Matrix-SIG] Multipack 0.70 Win32 Binaries
Message-ID: <3767F813.19FFEC77@ncifcrf.gov>

Hello again,

To justify my last post here, I have compiled Travis Oliphant's Multipack modules for Windows 95/98 (maybe NT) systems. They are located at (http://starship.python.net/crew/kernr/binaries/Multipack-0.70w.zip). I have only tested them on a Pentium-2 Windows 95 machine and a Pentium-2 Windows 98 machine so far. They run tst.py satisfactorily (as far as I can tell). They may not work on NT. If anyone has an NT system handy, please try them and tell me if they work.

To install, just unzip the archive into a directory. It will create a "Multipack" subdirectory if you unzip it to preserve directory structure. All the files are located in this directory or a subdirectory thereof. In addition to binary extension modules, this ZIP also includes the Python files for running Multipack, the User's Guide in the "doc" subdirectory, some files I used to compile it in "src", and the original README as well as a README.w32 for these binaries in the root directory.

Note: These binaries were compiled against the SLATEC library as compiled for mingw32 at (http://www.geocities.com/Athens/Olympus/5564/slatec.html); it includes Pentium-optimized, assembly-coded BLAS1 routines. If this causes anyone problems, I can do an all-FORTRAN compile of the libraries included in Multipack.

Also, I have mirrored the multipack-0.7.tgz file at (http://starship.python.net/crew/kernr/source/multipack-0.7.tgz) because I could only get a corrupted file from Travis' site through Netscape on Windows.
Have fun and tell me if it works.

Robert Kern

From HYoon@exchange.ml.com Wed Jun 16 21:47:48 1999
From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading))
Date: Wed, 16 Jun 1999 16:47:48 -0400
Subject: [Matrix-SIG] rotater?
Message-ID:

Tim,

Thanks much for your answer. It partially answers my question. Unfortunately, this still means I need to loop over the axes, as you said. Probably C extension time. Just one of those things one runs into when converting from one language to another: most things are easy, but a few are still easier in the old language.

**************************************************************
S. Hoon Yoon (Quant) Merrill Lynch
Equity Trading
yelled@yahoo.com hoon@bigfoot.com(w)
"Miracle is always only few standard deviations away, but so is catastrophe."
* Expressed opinions are often my own, but NOT my employer's.
"I feel like a fugitive from the law of averages." Mauldin
**************************************************************

> -----Original Message-----
> From: Tim Hochberg
> Sent: Wednesday, June 16, 1999 3:31 PM
> To: Yoon, Hoon (CICG - NY Program Trading)
> Cc: matrix-sig@python.org
> Subject: Re: [Matrix-SIG] rotater?
>
> Yoon, Hoon (CICG - NY Program Trading) wrote:
>
> > Gauss has this thing called rotater, which given array rotates it by
> >another value in array
> >
> >x = [[1,2,3], [2,3,4]]
> >shft = [1,2]
> >rotater(x, shft)
> >will give you
> >[[2,3,1], [4,2,3]]
> > Easy enough to do in a loop, but can I avoid this somehow?
> > > It pretty easy to avoid the inner loop due to shifting all of the > elements, > you still may have to loop over the other axes, but perhaps this will get > you started: > > -tim > << File: rotater.py >> From da@ski.org Wed Jun 16 21:53:14 1999 From: da@ski.org (David Ascher) Date: Wed, 16 Jun 1999 13:53:14 -0700 (Pacific Daylight Time) Subject: [Matrix-SIG] Multipack 0.70 Win32 Binaries In-Reply-To: <3767F813.19FFEC77@ncifcrf.gov> Message-ID: On Wed, 16 Jun 1999, Robert Kern wrote: > > To justify my last post here, I have compiled Travis Oliphant's > Multipack modules Windows 95/98 (maybe NT) systems. They are located at > (http://starship.python.net/crew/kernr/binaries/Multipack-0.70w.zip). I > have only tested them on Pentium-2 Windows 95 and a Pentium-2 Windows 98 > machines so far. They run tst.py satisfactorily (as far as I can > tell). They may not work on NT. If anyone has an NT system handy, > please try them and tell me if they work. Seems to work for me, NT4SP3. Thanks again! --david From vanandel@ucar.edu Wed Jun 16 21:56:33 1999 From: vanandel@ucar.edu (Joe Van Andel) Date: Wed, 16 Jun 1999 14:56:33 -0600 Subject: [Matrix-SIG] Memory leaks in netCDFmodule? Message-ID: <37680F81.AEF4D479@ucar.edu> This is a multi-part message in MIME format. --------------27E54049BB21808E8F01A259 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I'm using Konrad Hinsen's netCDFmodule-1.0.3 with Python 1.5.2 and Numeric Python (LLNLDistribution11). Now that my application is nearly completed, I'm trying to find and fix the various memory leaks, such that I can process an arbitrary number of files without running out of memory. One of the leaks is associated with the netCDFmodule. When I open a file, e.g: a1file=NetCDFFile(file, 'r') the "C" routine netcdf_variable_new is called for each variable. However, when I'm done with this file, the memory for these variables should go away, and it doesn't. 
(PyNetCDFVariableObject_dealloc is never called.) I did find an interesting workaround. Immediately after creating a netCDF file object, >>> sys.getrefcount(a1file) 12 Reading the source code, some of these references are caused by creating the variables that reside in the file. If I explicitly free each variable, using code like: for v in a1file.variables.keys(): a1file.variables[v] = None Would everyone agree that the .close() method should explicitly free these variables, so all memory is recovered? Thanks much. I've attached a context diff that fixes 1 memory leak in netcdfmodule.c, I'm still chasing others. -- Joe VanAndel National Center for Atmospheric Research http://www.atd.ucar.edu/~vanandel/ Internet: vanandel@ucar.edu --------------27E54049BB21808E8F01A259 Content-Type: text/plain; charset=us-ascii; name="cdiff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="cdiff" *** netcdfmodule.orig Wed Jun 16 14:52:57 1999 --- netcdfmodule.c.clean Wed Jun 16 14:54:00 1999 *************** *** 171,176 **** --- 171,177 ---- int nattrs; { char name[MAX_NC_NAME]; + nc_type type; int length; int py_type; *************** *** 186,191 **** --- 187,193 ---- ncattget(fileid, varid, name, s); s[length] = '\0'; string = PyString_FromString(s); + free(s); if (string != NULL) { PyDict_SetItemString(attributes, name, string); Py_DECREF(string); *************** *** 273,278 **** --- 275,282 ---- PyNetCDFFileObject_dealloc(self) PyNetCDFFileObject *self; { + + if (self->open) PyNetCDFFile_Close(self); Py_XDECREF(self->dimensions); *************** *** 809,816 **** { if (self->dimids != NULL) free(self->dimids); ! if (self->name != NULL) free(self->name); Py_XDECREF(self->file); PyMem_DEL(self); } --- 813,822 ---- { if (self->dimids != NULL) free(self->dimids); ! 
if (self->name != NULL) { free(self->name); + } + Py_XDECREF(self->file); PyMem_DEL(self); } *************** *** 827,835 **** --- 833,843 ---- int *dimids; int nattrs; { + PyNetCDFVariableObject *self; int recdim; int i; + if (check_if_open(file, -1)) { self = PyObject_NEW(PyNetCDFVariableObject, &PyNetCDFVariable_Type); if (self == NULL) --------------27E54049BB21808E8F01A259-- From hinsen@cnrs-orleans.fr Thu Jun 17 09:49:03 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 17 Jun 1999 10:49:03 +0200 Subject: [Matrix-SIG] Re: Memory leaks in netCDFmodule? In-Reply-To: <37680F81.AEF4D479@ucar.edu> (message from Joe Van Andel on Wed, 16 Jun 1999 14:56:33 -0600) References: <37680F81.AEF4D479@ucar.edu> Message-ID: <199906170849.KAA17873@chinon.cnrs-orleans.fr> > One of the leaks is associated with the netCDFmodule. When I open a > file, e.g: > > a1file=NetCDFFile(file, 'r') > > the "C" routine netcdf_variable_new is called for each variable. > However, when I'm done with this file, the memory for these variables > should go away, and it doesn't. Right, because there are circular references (the file object keeps a list of variables, and each variable has a reference to the file object). This has been a known problem since the first version, but it was never important for me, as I never use more than two netCDF files in a program. But of course it should be fixed ultimately. > I did find an interesting workaround. > > Immediately after creating a netCDF file object, > >>> sys.getrefcount(a1file) > 12 > > Reading the source code, some of these references are caused by creating > the variables that reside in the file. If I explicitly free each > variable, using code like: > > for v in a1file.variables.keys(): > a1file.variables[v] = None I don't think this is sufficient in general. There might be other references to the variable objects elsewhere, for example in user code. 
On the other hand, the variable objects are useless once the file has been closed, so one could remove all references from the variables to the file without loss of functionality. Then all objects (file and variables) would be freed unless references remain in user code. And that's all one could hope for anyway. > I've attached a context diff that fixes 1 memory leak in netcdfmodule.c, > I'm still chasing others. Thanks, I'll try to work them into the current version. Which is in fact already available (but not yet advertised) on my FTP server: ftp://dirac.cnrs-orleans.fr/pub/ScientificPython-2.0a5.tar.gz (I decided to integrate the netCDF module into the ScientificPython package in order to have fewer distributions to worry about). Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From janne@avocado.pc.helsinki.fi Thu Jun 17 12:07:46 1999 From: janne@avocado.pc.helsinki.fi (Janne Sinkkonen) Date: 17 Jun 1999 14:07:46 +0300 Subject: [Matrix-SIG] rotater? In-Reply-To: "Yoon, Hoon's message of "Wed, 16 Jun 1999 16:47:48 -0400" References: Message-ID: "Yoon, Hoon (CICG - NY Program Trading)" writes: > Thanks much for your answer. It partially answer my Q. Unfortunately, this > still means I need to loop over axis as you have said. Probably C extension > time. You can avoid looping over other axes by first swapping the target axis to position -1 (bottom-most), then rotating by result[...,:n] = a[...,-n:] result[...,n:] = a[...,:-n] and then swapping back. Swap axes by Numeric.transpose (official, cumbersome) or Numeric.swapaxes(a,axis0,axis1) (undocumented, easier). 
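Janne's recipe — move the target axis to the end, rotate with two slice assignments, move it back — can be illustrated without Numeric at all. In the plain-Python sketch below (an editor's illustration; `zip(*a)` stands in for `Numeric.swapaxes`), rotating along either axis of a 2-D array reduces to the same 1-D slicing:

```python
def rotate1d(row, n):
    """Right-rotate a sequence by n places using the slicing trick."""
    n = n % len(row)
    return list(row[-n:]) + list(row[:-n])

def rotate2d(a, n, axis=1):
    """Rotate every 1-D slice of a 2-D list along the given axis."""
    if axis == 1:                       # rotate within each row
        return [rotate1d(r, n) for r in a]
    # axis == 0: "swap axes" with zip, rotate the columns, swap back
    cols = [list(c) for c in zip(*a)]
    return [list(r) for r in zip(*[rotate1d(c, n) for c in cols])]

x = [[1, 2, 3],
     [2, 3, 4]]
print(rotate2d(x, 1, axis=1))   # -> [[3, 1, 2], [4, 2, 3]]
print(rotate2d(x, 1, axis=0))   # -> [[2, 3, 4], [1, 2, 3]]
```

In Numeric the axis swap is nearly free because no data is copied, so this pattern avoids per-element Python loops entirely.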
Swapping axes or transposing in general is computationally cheaper than it appears, because array data is not copied.

--
Janne

From Warren B. Focke
Message-ID:

On 17 Jun 1999, Janne Sinkkonen wrote:

> Swapping axes or transposing in general is computationally
> cheaper than it appears, because array data is not copied.

Careful. The data is not copied when the axes are swapped, but it is, in many cases, copied to a contiguous temporary object before any use is made of it. The copying happens at C speed, though, so it still usually beats Python loops for speed.

Warren Focke

From hinsen@dirac.cnrs-orleans.fr Thu Jun 17 14:20:04 1999
From: hinsen@dirac.cnrs-orleans.fr (hinsen@dirac.cnrs-orleans.fr)
Date: Thu, 17 Jun 1999 15:20:04 +0200
Subject: [Matrix-SIG] Lapack, fftpack, and ranlib becoming part of Multipack
Message-ID: <199906171320.PAA18389@chinon.cnrs-orleans.fr>

> I looked over the source to the lapack, fftpack, and ranlib modules
> and see that it will be quite simple to add them to the current
> Multipack distribution. In fact, they fit in quite nicely with the
> changes that Pearu has made.

Sorry for the late reply; I just discovered that my subscription to the Matrix-SIG was deactivated for some reason unknown to me, so I missed all of this month's traffic.

I agree that it makes sense to move the non-essential parts of NumPy somewhere else, but there are also some compatibility considerations, since many other packages already use these modules.

First of all, NumPy contains C versions of everything, whereas Multipack uses the original Fortran versions. It is certainly better to use the Fortran versions wherever possible, but it also makes installation significantly more difficult (or even impossible, for those who don't have a Fortran compiler).
Installation problems are already the single most frequent cause of questions I get about my packages, and therefore I would definitely refuse to make any of my published code dependent on other modules that require Fortran. In fact, I'd prefer to see Multipack move towards a "Fortran if possible, C if not" approach. Running the Fortran modules through f2c is not much work. Then a (yet-to-be-written) installation script would decide if it can use the Fortran version on a given machine, and if not would use the C translations. The best of both worlds: maximum performance and maximum portability. Another problem is licenses. As I understand it, LGPL is more restrictive than the Python/NumPy license, although I don't understand the details. In fact, I don't understand at all what LGPL (as opposed to GPL) means for Python code, which is not linked to anything! I'd just like to make sure that licensing problems don't prevent non-academic users from using the code. Finally, a suggestion for Multipack: Instead of importing everything into one top-level module, I'd prefer a package structure. In fact, I'd prefer to have each "XXXpack" as one module within Numeric (that doesn't require all the code to be distributed with NumPy, but it does require some changes to the current NumPy distribution). Konrad. 
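Konrad's yet-to-be-written installation script would essentially make one decision: build the Fortran sources when a compiler is available, otherwise fall back to the f2c translations. A minimal sketch of that decision in present-day Python (the compiler names and directory names here are illustrative assumptions, not part of any actual Multipack script):

```python
import shutil

# Candidate Fortran compiler names to probe for (assumed, not exhaustive).
FORTRAN_COMPILERS = ("g77", "f77", "gfortran")

def choose_sources():
    """Return ('fortran', compiler) if a Fortran compiler is on PATH,
    else ('c', None), meaning: build the f2c-translated C sources."""
    for fc in FORTRAN_COMPILERS:
        if shutil.which(fc):
            return ("fortran", fc)
    return ("c", None)

kind, compiler = choose_sources()
print("building the %s sources" % kind)
```

The same probe-and-fall-back idea works for the native-LAPACK question: prefer a system library with optimized BLAS when one is found, and compile the bundled sources only as a last resort.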
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Thu Jun 17 14:43:17 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 17 Jun 1999 15:43:17 +0200 Subject: [Matrix-SIG] NetCDF memory leak Message-ID: <199906171343.PAA18648@chinon.cnrs-orleans.fr> After looking again at the netCDF interface code with respect to circular reference, I settled for a minimal solution that has the advantage of not changing the externally visible behaviour at all: when a file is closed, the references from the variable objects to the file objects are removed. However, the variable objects will still be in the variable dictionary of the file object. This means that no variable object will be deallocated before the file object is deallocated. And of course no object will be deallocated as long as there are references to it in user code. So if you use many netCDF files, make sure that no references to files are kept after they are closed. The fixed version will be in the next alpha release of Scientific Python (2.0a6), available later today or tomorrow on my FTP server (ftp://dirac.cnrs-orleans.fr). 
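The reference-breaking scheme described above can be illustrated with a minimal sketch; the class and attribute names below are hypothetical stand-ins, not the actual Scientific Python netCDF code:

```python
class NetCDFVariable:
    # Hypothetical stand-in for the real netCDF variable object.
    def __init__(self, name, file):
        self.name = name
        self.file = file          # back-reference that creates the cycle

class NetCDFFile:
    # Hypothetical stand-in for the real netCDF file object.
    def __init__(self):
        self.variables = {}
        self.closed = False

    def create_variable(self, name):
        v = NetCDFVariable(name, self)
        self.variables[name] = v
        return v

    def close(self):
        # Break only the variable -> file references; the file's
        # variable dictionary stays intact, so variables live as
        # long as the file object does.
        for v in self.variables.values():
            v.file = None
        self.closed = True

f = NetCDFFile()
temp = f.create_variable("temperature")
f.close()
assert temp.file is None             # cycle broken
assert "temperature" in f.variables  # but the dictionary survives
```

Because only the variable-to-file references are dropped, closing the file removes the reference cycle that the purely refcounting Python of that era could never collect, without changing what user code sees.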
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From Oliphant.Travis@mayo.edu Thu Jun 17 16:59:22 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Thu, 17 Jun 1999 10:59:22 -0500 (CDT) Subject: [Matrix-SIG] Multipack Message-ID: Thanks for contributing your comments, Konrad. I'd been wondering where you had gone; given that you have played such an important role in the development of NumPy, I was sure you'd have some comments on my latest proposal. > In fact, I'd prefer to see Multipack move towards a "Fortran if > possible, C if not" approach. Running the Fortran modules through f2c > is not much work. Then a (yet-to-be-written) installation script would > decide if it can use the Fortran version on a given machine, and if > not would use the C translations. The best of both worlds: maximum > performance and maximum portability. This is a good idea, so do we require availability of f2c, or distribute it with the package? I guess what you are saying is to deliver a set of translated-to-C sources with the distribution. This gets pretty big; the distribution will already be quite large. Perhaps two distributions? You could download either the Fortran-only or the C-translated version. With the availability of g77 on many platforms (including Windows) is it that much of a problem to require a Fortran compiler to compile? This does not preclude someone from making a binary available for platforms that don't have easy-to-install Fortran compilers (as, e.g., Robert Kern has done). I admit I have little experience with the Mac so I'm not sure what the Fortran compiler scene looks like there. > Another problem is licenses.
As I understand it, LGPL is more > restrictive than the Python/NumPy license, although I don't understand > the details. In fact, I don't understand at all what LGPL (as opposed > to GPL) means for Python code, which is not linked to anything! I'd > just like to make sure that licensing problems don't prevent > non-academic users from using the code. There seems to be quite a bit of fear of the GPL floating around. If all you care about is using the code then the GPL has nothing to say. You can use it to do anything you want. The GPL and LGPL (lesser GPL) only limit what you can do when you want to distribute code derived from it. I picked the LGPL because it allows you to link against it (i.e. import the shared module into Python) without delivering the source code to the module that imports it. As far as I know companies can still write Python code which depends on the LGPL'd module code and not distribute their own sources. What a company cannot do is change the LGPL'd module, distribute the changed module as a binary only, and not release code back to the public. I really don't see how this gets in the way of anybody except people who don't want to play nicely. I can see that how people (like bosses) feel about the LGPL may be a problem for some, so my position on this issue is not immovable. At this point any changes will have to be approved by Pearu Peterson too. I'm anxious to hear feedback from people with actual problems using the code because of the license, not just ideas about hypothetical problems. > Finally, a suggestion for Multipack: Instead of importing everything > into one top-level module, I'd prefer a package structure. In fact, > I'd prefer to have each "XXXpack" as one module within Numeric (that > doesn't require all the code to be distributed with NumPy, but it does > require some changes to the current NumPy distribution). I'm not exactly sure what you mean here. 
Version 0.7 of Multipack is compiled as separate C-code modules and separate python modules. There is one Python module called Multipack that does nothing but import all of the other modules. This allows one to interact with all the code through Multipack or in a more modular fashion according to taste. I am proposing making LinearAlgebra and friends one more module underneath Multipack (or whatever other name people find more appealing). Thanks again for your feedback, Travis -------------------------------------------------- Travis Oliphant 200 First St SW Rochester MN 55905 Ultrasound Research Lab (507) 286-5293 Mayo Graduate School Oliphant.Travis@mayo.edu From hinsen@cnrs-orleans.fr Thu Jun 17 19:27:19 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 17 Jun 1999 20:27:19 +0200 Subject: [Matrix-SIG] Multipack In-Reply-To: (message from Travis Oliphant on Thu, 17 Jun 1999 10:59:22 -0500 (CDT)) References: Message-ID: <199906171827.UAA18782@chinon.cnrs-orleans.fr> > This is a good idea, so do we require availability of f2c or distribute it > with it? I guess what you are saying is deliver a set of translated-to-C > sources with the distribution. This gets pretty big, the distribution > will already be quite large. Perhaps two distributions? You can either > download the Fortran only or the C-translated versions? That could be a solution, but then again the user must know what to do. Believe it or not, I have had reports of beginning MMTK users who sent me "weird error messages during installation" which pointed out nothing else than that there was no C compiler installed! So you can't even expect some users to know their system installation, although I hope this is exceptional! > With the availability of g77 on many platforms (including Windows) is it > that much of a problem to require a Fortran compiler to compile? This If you don't have g77, then installing it is a major task. 
I know many scientific users who have workstations without Fortran compilers. I wish they'd all switch to Linux where life is easy! But the more important problem is not having a Fortran compiler but knowing how to link Fortran and C code together, especially in shared libraries. It took me an hour on some machines to figure out how to do it, and that with plenty of experience in this business. On some systems (e.g. DigitalUnix, but that was two years ago) I didn't succeed at all. > does not preclude someone from making a binary available for platforms > that don't have easy-to-install Fortran compilers (e.g. Robert Kern I don't know how realistic binary distributions are for Unix systems. They are beginning to be a problem for Linux, with so many incompatible versions of libc. > What a company cannot do is change the LGPL'd module, distribute the > changed module as a binary only, and not release code back to the public. > I really don't see how this gets in the way of anybody except people who > don't want to play nicely. Me neither. But then what is all the fuss about? Some companies seem to consider GPL/LGPL as an evil worse than Microsoft! > compiled as separate C-code modules and separate python modules. There is > one Python module called Multipack that does nothing but import all of the > other modules. This allows one to interact with all the code through > Multipack or in a more modular fashion according to taste. Exactly. I'd prefer to have several modules under Numeric, i.e. Numeric.MINPACK, Numeric.ODEPACK, etc. If you keep on adding to Minpack as it is now, sooner or later there will be a name conflict between different "packs". Konrad. 
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From kernr@ncifcrf.gov Fri Jun 18 05:16:28 1999 From: kernr@ncifcrf.gov (Robert Kern) Date: Fri, 18 Jun 1999 00:16:28 -0400 Subject: [Matrix-SIG] Win32 Binaries of cephesmodule and signaltools Message-ID: <3769C81C.55B2F8DF@mail.ncifcrf.gov> Grab'em while they're hot! http://starship.python.net/crew/kernr/binaries/Binaries.html Documentation on my part is still a little sparse, but that shouldn't be a huge problem unless you want to compile them for yourselves. The packages are ZIP archives containing the binaries, whatever documentation came with the sources, any Python wrappers or Python test cases, and maybe the Setup file I used to compile. Signaltools includes numpyio.pyd; I have decided, for the moment, to keep them bundled. I'm flexible, though. NOTE for NumpyIO Users: Always open files for reading and writing in binary mode. NumpyIO won't work otherwise. Please try them if you can. If they don't work right for you, tell me. -- Robert Kern | ----------------------|"In the fields of Hell where the grass grows high This space | Are the graves of dreams allowed to die." intentionally | - Richard Harter left blank. | From vanandel@ucar.edu Fri Jun 18 16:14:48 1999 From: vanandel@ucar.edu (Joe Van Andel) Date: Fri, 18 Jun 1999 09:14:48 -0600 Subject: [Matrix-SIG] EGCS now compiles CXX Message-ID: <376A6268.8A17ECFF@ucar.edu> This is a multi-part message in MIME format. 
--------------9ECE09EF751D959377350EC6 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Some time ago, I said that EGCS (recent snapshots, not the official release) now compiles CXX. EGCS no longer requires patches to compile CXX, but CXX_Objects.h does need the attached patch. EGCS compile speed for CXX is disappointing, but I'm told that the EGCS maintainers will eventually speed up template handling. -- Joe VanAndel National Center for Atmospheric Research http://www.atd.ucar.edu/~vanandel/ Internet: vanandel@ucar.edu --------------9ECE09EF751D959377350EC6 Content-Type: text/plain; charset=us-ascii; name="cxx.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="cxx.patch" Patch to CXX_Objects.h

*** 1.1 1999/04/20 13:52:05
--- 1.2 1999/04/23 20:39:46
***************
*** 920,926 ****
  }
  seqref front() {
! return seqref(this, 0);
  }
  const T back () const {
--- 920,926 ----
  }
  seqref front() {
! return seqref(this, 0);
  }
  const T back () const {
***************
*** 928,934 ****
  }
  seqref back() {
! return seqref(this, size()-1);
  }
  void verify_length(size_type required_size)
--- 928,934 ----
  }
  seqref back() {
! return seqref(this, size()-1);
  }
  void verify_length(size_type required_size)
***************
*** 945,951 ****
  }
  class iterator
! : public STD::iterator, int> {
  protected:
  friend class SeqBase;
  SeqBase* seq;
--- 945,951 ----
  }
  class iterator
! : public random_access_iterator, int> {
  protected:
  friend class SeqBase;
  SeqBase* seq;
***************
*** 1055,1061 ****
  }
  class const_iterator
! : public STD::iterator {
  protected:
  friend class SeqBase;
  const SeqBase* seq;
--- 1055,1061 ----
  }
  class const_iterator
! : random_access_iterator {
  protected:
  friend class SeqBase;
  const SeqBase* seq;

--------------9ECE09EF751D959377350EC6-- From alawhead@vcn.bc.ca Fri Jun 18 22:53:45 1999 From: alawhead@vcn.bc.ca (Alexander Lawhead) Date: Fri, 18 Jun 1999 14:53:45 -0700 (PDT) Subject: [Matrix-SIG] RNG module in latest distribution? In-Reply-To: <376A6268.8A17ECFF@ucar.edu> Message-ID: Just a quick question: was RNG not packaged in the latest LLNLDistribution for Windows? It appears that some of the files are there but not the pyd... Alexander From amullhau@zen-pharaohs.com Sat Jun 19 18:16:19 1999 From: amullhau@zen-pharaohs.com (Andrew P. Mullhaupt) Date: Sat, 19 Jun 1999 13:16:19 -0400 Subject: [Matrix-SIG] Multipack References: Message-ID: <024f01beba77$7384a520$99a0720a@amullhau> > With the availability of g77 on many platforms (including Windows) is it > that much of a problem to require a Fortran compiler to compile? It can be. The performance of most Fortran packages is extremely dependent on compiler switches. The only justification for using the original Fortran is performance. Our experience with the Sun Performance Library is that orders of magnitude of performance can be lost by failure to use exactly the right switches, and one has to use ones which are not necessarily compatible with all the other switches. We've resorted to rewriting some of the Sun Performance Library BLAS to get optimal performance, and as it happens we were unable to get the best performance with Fortran code, so we have a sandwich of C code calling Fortran calling our C code. We gained a factor of between 3 and 5 on the QR decomposition of large matrices, so it was worth it.
Later, Andrew Mullhaupt From perry@stsci.edu Mon Jun 21 14:58:16 1999 From: perry@stsci.edu (Perry Greenfield) Date: Mon, 21 Jun 1999 09:58:16 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits Message-ID: <199906211358.JAA12539@eclipse.stsci.edu> We are engaged in a project to use Python as the basis for a scripting language for our existing data analysis software. To date we have made great progress, and recently we have begun looking at creating an integrated environment that allows the use of Numeric along with our old software. In examining the Numeric capabilities, we see some significant shortcomings that we think could be rectified in the standard Numeric distribution. We realize that at least some of these topics have been raised before, but hope that this message will serve as a starting point for resolving these apparent problems (if perhaps only to show us that we are wrongheaded about what we perceive as problems). We realize some of our suggestions may be controversial. Occasionally we have seen suggestions like ours met with responses something like, "If you want to do image processing and care about efficiency, you shouldn't use Python/Numeric anyway." We don't agree with that, however -- image processing is in many cases the ideal application for an interpreted language like Python, because nearly all the compute time is spent doing vectorized calculations on millions of pixels, and the interpreter overhead is often negligible. Given the right additions to Numeric (and not that many are needed), Python could be competitive with any image/data processing language in existence for efficiency (and better than any other as a programming language.) We have a great deal of experience with IDL, and while the shortcomings of the current version of Numeric keep it from being competitive with IDL now, with a few changes Numeric could easily replace IDL for our applications. 
Here is a brief summary of the things we think are needed:

(1) Improved control over memory allocation and over promotion of arrays to larger types (e.g., Float32 to Float64).

(2) A "put" function (inverse of take) to scatter values into a set of array elements.

(3) Improved or new standard functions:
    (a) C versions (for speed and memory efficiency) of more standard functions, including arrayrange, nonzero, compress, clip, and where.
    (b) A new function to rebin an array, either reducing the array size by summing adjacent elements or increasing the size by repeating elements.
    (c) A new function to compute histograms for arrays using a memory- and CPU-efficient algorithm.

We discuss each of these in more detail below.

1. Memory allocation

A lot of our data analysis applications involve processing large images. There are astronomical cameras in use that produce images of 16k by 16k pixels (2**28 = 268 million pixels) or larger. Handling these large quantities of data is a challenge, and Numeric's tendency to increase the number of bytes per pixel through datatype promotion makes it even harder. It is very difficult to keep arrays in their original datatype if they do not correspond to Python ints (32-bit) or floats (64-bit). Operations with Python literals or scalar values automatically upcast the results. Even operations with integers cause Float32 arrays to be converted to Float64 -- e.g., ones(10,Float32)/2 is a Float64, which is a very surprising result for users of other languages. The suggested solution (using rank-0 Numeric arrays instead of simple Python constants) often does not work because any Python operation on those arrays turns them back into (promoted) Python scalars. E.g., after

    a = array(1.0, Float32)
    b = -a

b is a standard Python 64-bit scalar rather than a Float32.
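For comparison, modern NumPy lets one work around the promotion problem with explicitly typed scalars. Here np.float32 stands in for Numeric's Float32; this is a sketch of the workaround in today's spelling, not of 1999 Numeric behaviour:

```python
import numpy as np

a = np.ones(10, dtype=np.float32)

# Dividing by an explicitly typed scalar keeps the result in float32,
# whatever the promotion rules for plain Python scalars happen to be.
b = a / np.float32(2)
assert b.dtype == np.float32

# A typed scalar also survives unary negation, unlike the rank-0
# Numeric array in the example above.
x = np.float32(1.0)
y = -x
assert y.dtype == np.float32
```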
In the applications we have developed using Numeric, the majority of our effort appears to be spent on trying to work around this promotion behavior, and the resulting code is not very readable (and probably not very maintainable either). We wonder whether anyone has seriously used Numeric for 16-bit ints or 32-bit floats in cases (like ours) where the memory penalty of converting these to 32-bit ints and doubles is not acceptable. The nicest solution from the numerical point of view would be for Python to support these other types. We do not expect that to happen (nor should it happen). But the problem could be ameliorated by changing Numeric's default behavior in casting constants. If combining an array and a Python scalar of similar types (some form of int with some form of int, or float with double or int), then the default behavior should be to cast the Python type to the array type, even if it means downcasting. The rules for promotion of ints to floats/doubles would remain as is. We consider this a very important issue. The effect of the current situation is to render Numeric types that have no corresponding Python type virtually useless except for conversion purposes. Another change that would make Numeric's memory use more efficient would be better use of temporary arrays. It is possible to avoid excess creation of temporary arrays by using the 3-argument forms of multiply(), add(), etc., e.g., replacing

    b = a*(a+1)

by

    b = a+1
    multiply(a,b,b)

Not only does the second version take less memory, but it runs about 10% faster (presumably because it spends less time allocating memory). The question is, is there a way for Numeric to determine that one of the operands is a temporary? It looks to us as if the reference count for a temporary result (which will be discarded after the operation) is smaller than for a variable operand.
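The three-argument trick just described looks like this in modern NumPy spelling, where the output array is passed as the out= keyword (a sketch of the idiom in today's syntax, not the original Numeric code):

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Naive form: a + 1 allocates one temporary, and a * (...) a second array.
b1 = a * (a + 1)

# Reusing the intermediate as the output buffer saves one allocation.
b2 = a + 1
np.multiply(a, b2, out=b2)

assert np.array_equal(b1, b2)
```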
If so, it should be possible for Numeric to reuse the temporary for the output in cases where the temporary is the same type and shape as the output. Maybe there is something tricky here that we don't understand, but if not it seems like a fairly simple change would make Numeric a lot less memory-hungry. 2. Inverse of take function We call this out as a separate item because we consider it essential to make Numeric effective. This topic has arisen repeatedly in the SIG, yet nothing has come of it for various reasons which seem to include: 1) a lack of agreement on what the general solution should be. 2) no one willing to implement any solution, agreed upon or not. The absence of an agreed upon general solution or someone to do the work should not prevent the adoption and distribution of less-than-general solutions (such as array_set, that comes with Gist) with the standard distribution. Surely a less-than-general solution is preferable to none. While an addition to the Numeric syntax that allows arrays to be indexed by array is preferable (so that one can use an indexing syntax rather than a function call), we can't see why the simple function call can't be added right away even if it is not completely general and even if it will ultimately be replaced by something more powerful and convenient. Perhaps progress on the more limited fronts may spur progress on the most general solution. While we of course would prefer someone else do the work (partly because we do not yet consider ourselves competent to muck around with Numeric), we are willing to do it ourselves if no one else will. 3. C implementations of standard functions We were surprised that some of the most heavily used functions in Numeric are implemented in Python rather than C, despite their apparent simplicity. At least some of these really need to be implemented in C to make them decently efficient: arrayrange, nonzero, compress, clip, where. 
The current Python versions are especially costly in memory (when working with large images) because they usually create several temporary arrays before finally producing the result. Note that the standard Python version of clip could be replaced by this much more efficient version (which is about 10 times faster):

    def clip(m, m_min, m_max):
        """clip(m, m_min, m_max) = every entry in m that is less than m_min
        is replaced by m_min, and every entry greater than m_max is replaced
        by m_max.
        """
        b = maximum(m, m_min)
        return minimum(b, m_max)

We also see a need for a function that rebins an array, e.g., zooming the array by pixel replication or dezooming it by summing blocks of pixels (not simply by slicing to select every N-th pixel). Both of these would change the shape of the array by multiplying or dividing by an integral factor. These functions are very useful in image processing; repeat provides a somewhat clumsy way to expand an array, but it is not obvious how to efficiently reduce it. There are several Python functions for computing histograms from Numeric arrays, but there still seems to be a need for a simple, general histogram function written in C for efficiency. This means not just CPU efficiency but also memory efficiency -- the Python versions we have seen typically sort the array, which is costly in memory for big images. We think Numeric has great potential, and it would be a shame if the scientific community were discouraged by the lack of a few features and conveniences they have come to expect in array languages. Rick White Perry Greenfield Paul Barrett Space Telescope Science Institute From sonntag@lion-ag.de Mon Jun 21 11:05:15 1999 From: sonntag@lion-ag.de (Christian Sonntag) Date: Mon, 21 Jun 1999 12:05:15 +0200 Subject: [Matrix-SIG] segmentation fault in Numeric Message-ID: <376E0E5B.6DD28CD0@lion-ag.de> Hello, if I import Numeric and call array([1],[2]) I get a reproducible seg fault on Linux, LLNL v11, Python 1.5.2. Is that a bug?
Christian From HYoon@exchange.ml.com Mon Jun 21 17:29:44 1999 From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading)) Date: Mon, 21 Jun 1999 12:29:44 -0400 Subject: [Matrix-SIG] segmentation fault in Numeric Message-ID: I think this will do the job. Thanks to Charles. I found a few bugs like that in NumPy that crash it. ************************************************************** S. Hoon Yoon (Quant) Merrill Lynch Equity Trading yelled@yahoo.com hoon@bigfoot.com(w) "Miracle is always only few standard deviations away, but so is catastrophe." * Expressed opinions are often my own, but NOT my employer's. "I feel like a fugitive from the law of averages." Mauldin **************************************************************

> -----Original Message-----
> From: Charles G Waldman
> Sent: Wednesday, May 26, 1999 12:53 PM
> To: David Ascher
> Cc: matrix-sig@python.org
> Subject: [Matrix-SIG] Patch for multiarraymodule.c
>
> The symptom:
>
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> from Matrix import *
> >>> m = Matrix([1,1],[1,0])
> Segmentation fault (core dumped)
>
> The cure:
>
> --- multiarraymodule.c 1999/05/26 16:43:37 1.1
> +++ multiarraymodule.c 1999/05/26 16:50:59
> @@ -864,6 +864,11 @@
>      if (tpo == Py_None) {
>          type = PyArray_NOTYPE;
>      } else {
> +        if (!PyString_Check(tpo)){
> +            PyErr_SetString(PyExc_TypeError,
> +                            "typecode must be a character");
> +            return NULL;
> +        }
>          tp = PyString_AsString(tpo);
>          if (tp[0] == 0) type = PyArray_NOTYPE;
>          else type = tp[0];
>
> _______________________________________________
> Matrix-SIG maillist - Matrix-SIG@python.org
> http://www.python.org/mailman/listinfo/matrix-sig

From cgw@fnal.gov Mon Jun 21 17:50:14 1999 From: cgw@fnal.gov (Charles G Waldman) Date: Mon, 21 Jun 1999 11:50:14 -0500 (CDT) Subject: [Matrix-SIG] segmentation fault in Numeric In-Reply-To: <376E0E5B.6DD28CD0@lion-ag.de> References: <376E0E5B.6DD28CD0@lion-ag.de> Message-ID:
<14190.27974.513243.552084@buffalo.fnal.gov> Christian Sonntag writes: > Hello > > if I import Numeric and > > call array([1],[2]) > > I get a reproducible seg fault on Linux, LLNL v11, Python 1.5.2. > > Is that a bug? Yes, and I submitted a patch several weeks ago that fixes this. See: http://www.python.org/pipermail/matrix-sig/1999-May/002802.html From dubois1@llnl.gov Mon Jun 21 18:20:32 1999 From: dubois1@llnl.gov (Paul F. Dubois) Date: Mon, 21 Jun 1999 10:20:32 -0700 Subject: [Matrix-SIG] About "put"... Message-ID: <99062110313700.09053@almanac> I was hoping that we could implement "put" as assignment, at least for 1-D arrays, viz.,

    ix = [1, 3, 9, 2]
    v[ix] = ....

Unfortunately the coding for v[something] and v[something] = something is particularly nasty, and so it required more time than I was able to spend to be sure that I was not breaking something while implementing this. For more than 1-D you have the additional question of what the semantics should be. For example, if v[ix, 3] = ... means the obvious thing, what does v[ix, [3,4, 2, 1]] mean? Is this an assignment to four components of v, or to 16, or something else? For the record, I think it has to mean four ([1,3], [3,4], [9,2], [2,1]). But the logic is very convoluted. For example, if v is 2-D, is v[ix] allowed the way it is when ix is a scalar? Implementing this would seem to need a whole new kind of object. From jhauser@ifm.uni-kiel.de Mon Jun 21 19:38:30 1999 From: jhauser@ifm.uni-kiel.de (jhauser@ifm.uni-kiel.de) Date: Mon, 21 Jun 1999 20:38:30 +0200 (CEST) Subject: [Matrix-SIG] About "put"... In-Reply-To: <99062110313700.09053@almanac> References: <99062110313700.09053@almanac> Message-ID: <14190.34130.844644.197584@lisboa> I think the 1-D way of assignment is ok, because the most obvious way to get lots of indices is nonzero(), which is also only 1-D. It would be useful to have a fast mapping function from 1-D to N-D indices and vice versa.
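The fast 1-D/N-D index mapping Janko asks for exists in modern NumPy as np.ravel_multi_index and np.unravel_index (named here as today's equivalents; Numeric itself had no such functions):

```python
import numpy as np

shape = (4, 5)

# N-D -> flat: the element at row 2, column 3 of a 4x5 array
flat = np.ravel_multi_index((2, 3), shape)
assert flat == 13              # 2 * 5 + 3

# flat -> N-D: and back again
assert np.unravel_index(13, shape) == (2, 3)

# Combined with nonzero(), this lets 1-D put/take style code
# address elements of an N-D array via its flat view.
a = np.zeros(shape)
a.flat[flat] = 1.0
assert a[2, 3] == 1.0
```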
I want to agree that a standard form of doing it 1-D is better than nothing. And it's great that something along these lines is coming. __Janko From Oliphant.Travis@mayo.edu Tue Jun 22 05:26:52 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Mon, 21 Jun 1999 23:26:52 -0500 (CDT) Subject: [Matrix-SIG] N-D put. Message-ID: There is probably no optimal N-D put for all situations, but as a reference you might be interested to note that MATLAB would assign 16 positions for the case

    ix = [3,4,5,6]
    d[ix,[1,2,5,7]] = 1

One possibility is to use 1-D indexing to put into N-D arrays. -Travis From da@ski.org Tue Jun 22 06:21:25 1999 From: da@ski.org (David Ascher) Date: Mon, 21 Jun 1999 22:21:25 -0700 (Pacific Daylight Time) Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <199906211358.JAA12539@eclipse.stsci.edu> Message-ID: On Mon, 21 Jun 1999, Perry Greenfield wrote: > The question is, is there a way for Numeric to determine that one of > operands is a temporary? It looks to us as if the reference count for > a temporary result (which will be discarded after the operation) is > smaller than for a variable operand. If so, it should be possible for > Numeric to reuse the temporary for the output in cases where the > temporary is the same type and shape as the output. Maybe there is > something tricky here that we don't understand, but if not it seems like > a fairly simple change would make Numeric a lot less memory-hungry. This is an issue I've thought about a little, but alas I believe the solution requires experience in fields I've got a provably bad record in, such as parsing, code tree manipulations, etc., so thinking is all I'll do on the topic. The problem is that there is no way to have NumPy do the detection you mention above because the assignment process is one that Python does not give any 'hooks' into (yet).
An alternative idea I did have is that one could do analysis of the bytecode of specific functions and automatically detect the patterns involved based on the lexical analysis, as long as one limited oneself to the simple patterns. I think that would be worthwhile, and doable, if somewhat hackish. One would then write things like:

    def foo(x,y):
        x = x + 1
        y = y * 3
        x = x + y
        return x
    foo = optimize(foo)

and the new foo would contain bytecode which would be equivalent to that generated by an equivalent (but faster & smaller):

    def foo(x,y):
        add(x, 1, x)
        multiply(y, 3, y)
        add(x, y, x)
        return x

Doing this completely is very hard. Doing it partially (with problems of the kind determined above) is just hard. Someone with the right kind of bent might find it an interesting problem, though. Resources worth investigating for this are bytecodehacks, Skip's pipeline optimizer, the parser module, etc. --david ascher From hinsen@cnrs-orleans.fr Tue Jun 22 11:41:33 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Tue, 22 Jun 1999 12:41:33 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <199906211358.JAA12539@eclipse.stsci.edu> (perry@stsci.edu) References: <199906211358.JAA12539@eclipse.stsci.edu> Message-ID: <199906221041.MAA05475@chinon.cnrs-orleans.fr> > Occasionally we have seen suggestions like ours met with responses > something like, "If you want to do image processing and care about > efficiency, you shouldn't use Python/Numeric anyway." We don't agree > with that, however -- image processing is in many cases the ideal > application for an interpreted language like Python, because nearly all > the compute time is spent doing vectorized calculations on millions of Did you try the Python Imaging Library (PIL)? It contains specialized array-like objects for dealing with images. Ideally NumPy and PIL would use a common low-level array object, but we don't live in a perfect world!
> It is very difficult to keep arrays in their original datatype if they > do not correspond to python ints (32-bit) or floats (64-bit). Indeed. Unfortunately there is a conflict of interest between different applications and different types of users. NumPy's current behaviour is as compatible as possible with the behaviour of standard Python data types, which I think is an important advantage. > The suggested solution (using rank-0 Numeric arrays instead of simple > Python constants) often does not work because any Python operation on > those arrays turns them back into (promoted) Python scalars. E.g., after > a = array(1.0, Float32) > b = -a > b is a standard Python 64-bit scalar rather than a Float32. That sounds like a problem that can be solved easily. Instead of using rank-0 arrays, one could use special "Float32 Python scalars", which would of course have to be implemented, presumably in NumPy, but that is not much work. You would still have to create such objects with somewhat clumsy syntax (e.g. Float32(3.5)), because extension modules can't introduce new syntax rules, but there would be no more problems with constant upcasting once you have made sure that all data is in some float32 type. Another solution to the upcasting problem could be the introduction of a variant of Float32 arrays that are higher up in the upcasting hierarchy than Float64. Then mixed expressions would automatically end up in Float32. Whoever uses this type would of course have to watch out for precision loss, but it seems to me that in your kind of application this is not a problem. > very maintainable either). We wonder whether anyone has seriously used > Numeric for 16 bit ints or 32 bit floats in cases (like ours) where the > memory penalty of converting these to 32 bit ints and doubles is not > acceptable. Probably not! > The nicest solution from the numerical point of view would be for > Python to support these other types. 
We do not expect that to happen > (nor should it happen). Why not? As long as you don't expect to get special syntax for literals of the new type, this looks like a relatively simple solution (see above). > But the problem could be ameliorated by changing Numeric's default > behavior in casting constants. If combining an array and a python > scalar of similar types (some form of int with some form of int, or > float with double or int), then the default behavior should be to cast the > python type to the array type, even if it means down casting. But this would be incompatible with both the current NumPy behaviour (possibly breaking existing code) and with standard Python behaviour, causing headaches to new users. I think that the principle "never lose precision without clearly saying that you want it" is a good one. > The question is, is there a way for Numeric to determine that one of > operands is a temporary? It looks to us as if the reference count for > a temporary result (which will be discarded after the operation) is > smaller than for a variable operand. If so, it should be possible for > Numeric to reuse the temporary for the output in cases where the > temporary is the same type and shape as the output. Maybe there is > something tricky here that we don't understand, but if not it seems like > a fairly simple change would make Numeric a lot less memory-hungry. Working with reference counts is almost by definition something tricky. I won't comment on this idea (which looks interesting, but too difficult to verify quickly), but why don't you try it out in plain Python, by creating a Python wrapper around array objects which implements the optimizations you propose? (You can get the reference count via the function sys.getrefcount().) 
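Konrad's suggestion, prototyping the temporary-reuse optimization in plain Python with sys.getrefcount(), can be sketched as follows. All class and variable names here are hypothetical, and the exact reference counts are a CPython implementation detail, so the sketch calibrates the "temporary" threshold at runtime instead of hard-coding it:

```python
import sys

def _temp_refcount():
    # Calibrate: how many references does a genuinely temporary object
    # carry while its __add__ runs?  The exact count is a CPython
    # implementation detail, so we measure it instead of guessing.
    class Probe:
        def __add__(self, other):
            return sys.getrefcount(self)
    return Probe() + 0

TEMP_REFS = _temp_refcount()

class Wrapped:
    """Toy stand-in for a wrapped Numeric array (hypothetical)."""
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        if sys.getrefcount(self) <= TEMP_REFS:
            # No name refers to us, so we are a mid-expression temporary:
            # reuse our own storage instead of allocating a new result.
            self.data = [x + y for x, y in zip(self.data, other.data)]
            return self
        # A named operand: allocate a fresh result, as Numeric does today.
        return Wrapped(x + y for x, y in zip(self.data, other.data))

a = Wrapped([1, 2, 3])
b = Wrapped([10, 20, 30])
tmp_reused = (a + b) + b   # the (a + b) temporary is reused in place
named_kept = a + b         # 'a' is named, so a fresh object is allocated
```

The results are correct either way; the heuristic only decides whether storage gets reused. A real implementation would also have to check type and shape, and could not assume that the caller is the interpreter at all.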
[about "put" function] > The absence of an agreed upon general solution or someone to do the > work should not prevent the adoption and distribution of > less-than-general solutions (such as array_set, that comes with Gist) > with the standard distribution. Surely a less-than-general solution is > preferable to none. Except that it would create a future obligation to maintain compatibility. Perhaps NumPy should contain a special module with "quick and dirty" code which might be replaced by incompatible improvements in the future. > We were surprised that some of the most heavily used functions in > Numeric are implemented in Python rather than C, despite their apparent > simplicity. At least some of these really need to be implemented in C > to make them decently efficient: arrayrange, nonzero, compress, clip, > where. The current Python versions are especially costly in memory Perhaps no one else ever had efficiency problems with them; I certainly didn't! > We also see a need for a function that rebins an array, e.g., zooming > the array by pixel replication or dezooming it by summing blocks of > pixels (not simply by slicing to select every N-th pixel.) Both of The first can be done by repeat() (at least along one axis). The second problem may be what the mysterious reduceat() operation does, although nobody seems to understand it well enough to decide! But like in any case of presumed missing functionality, I am sure that a code contribution would be happily accepted by the maintainers... > an integral factor. These functions are very useful in image > processing; repeat provides a somewhat clumsy way to expand an array, > but it is not obvious how to efficiently reduce it. How about reshaping and adding?
For a 1D array for example:

    a = array([2, 3, 1, 7, 3, -8])
    b = add.reduce(reshape(a, (3, 2)), -1)

> There are several Python functions for computing histograms from > Numeric arrays, but there still seems to be a need for a simple, > general histogram function written in C for efficiency. This means not I have never found a need for a C function. Please try the histogram class in version 2 of my ScientificPython package (available at ftp://dirac.cnrs-orleans.fr/pub/); it does not sort the array and doesn't create huge intermediates. It has performed well for all my applications. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Tue Jun 22 13:08:55 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Tue, 22 Jun 1999 14:08:55 +0200 Subject: [Matrix-SIG] About "put"... In-Reply-To: <99062110313700.09053@almanac> (dubois1@llnl.gov) References: <99062110313700.09053@almanac> Message-ID: <199906221208.OAA05498@chinon.cnrs-orleans.fr> > sure that I was not breaking something while implementing this. For > more than 1-D you have the additional question of what semantics > should be. For example, if v[ix, 3] = ... means the obvious thing, > what does v[ix, [3,4, 2, 1]] mean? Is this an assignment to four > components of v or to 16 or ? I'd say 16. The logic is that for the special case of a list index equivalent to a range or slice, the result should be the same as for a range or slice, i.e.
a[[0, 1, 2], [2, 4, 6]] should be equivalent to a[0:3, 2:8:2] However, there is an implementation problem: if the result should really be equivalent, then sequence indexing should also produce an array that shares its data space with the original array. And that is impossible with the current array type implementation. But perhaps an additional type is not that difficult to implement. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From HYoon@exchange.ml.com Tue Jun 22 14:42:42 1999 From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading)) Date: Tue, 22 Jun 1999 09:42:42 -0400 Subject: [Matrix-SIG] array([19990609],'f') bug? Message-ID: Hi, Can anyone explain this strange behavior? NT svcpack 3 running version 11 of NumPy >>> array([19990609],'f') array([ 19990608.],'f') Anyone have a fix? ************************************************************** S. Hoon Yoon (Quant) Merrill Lynch Equity Trading yelled@yahoo.com hoon@bigfoot.com(w) "Miracle is always only few standard deviations away, but so is catastrophe." * Expressed opinions are often my own, but NOT my employer's. "I feel like a fugitive from the law of averages." Mauldin ************************************************************** From neel@cswv.com Tue Jun 22 15:20:42 1999 From: neel@cswv.com (Neel Krishnaswami) Date: Tue, 22 Jun 1999 10:20:42 -0400 Subject: [Matrix-SIG] Numeric semantics question.... Message-ID: <3.0.1.32.19990622102042.00776aa8@emerald.cswv.com> Hi, I wanted to confirm that if I assign a subsequence of an array object, the entire array is kept in memory. 
That is, if I wrote some code like this:

-*-*-*-
import Numeric

def memory_waster():
    x = Numeric.zeros((1000000,1), Numeric.Float)
    return x[1]

def hose_system(n):
    a = []
    for i in range(n):
        a.append( memory_waster() )
    return a

x = hose_system(500)
-*-*-*-

Numeric would try to allocate memory for 500 million floats, right? Does making the following change correctly get around the problem?

    def memory_waster():
        x = Numeric.zeros((1000000,1), Numeric.Float)
        return copy.copy(x[1])

I've tried it and it seems to work, but I'd appreciate advice about the subtleties from anyone more experienced than I. -- Neel Krishnaswami neelk@cswv.com From robin@jessikat.demon.co.uk Tue Jun 22 16:04:13 1999 From: robin@jessikat.demon.co.uk (Robin Becker) Date: Tue, 22 Jun 1999 16:04:13 +0100 Subject: [Matrix-SIG] array([19990609],'f') bug? In-Reply-To: References: Message-ID: In article , Yoon, Hoon (CICG - NY Program Trading) writes >Hi, > >Can anyone explain this strange behavior? >NT svcpack 3 running version 11 of NumPy >>>> array([19990609],'f') > array([ 19990608.],'f') >Anyone have a fix? > I don't think this is fixable. I think the number of decimal digits in a float will be somewhere between 7 and 8. ie

    >>> array([ 9990608.],'f')
    array([ 9990608.],'f')

ie 7 digits is ok. >************************************************************** >S. Hoon Yoon (Quant) Merrill Lynch >Equity Trading >yelled@yahoo.com hoon@bigfoot.com(w) >"Miracle is always only few standard deviations away, but so is >catastrophe." >* Expressed opinions are often my own, but NOT my employer's. >"I feel like a fugitive from the law of averages."
Mauldin >************************************************************** > >_______________________________________________ >Matrix-SIG maillist - Matrix-SIG@python.org >http://www.python.org/mailman/listinfo/matrix-sig -- Robin Becker From rlw@stsci.edu Tue Jun 22 18:57:40 1999 From: rlw@stsci.edu (Rick White) Date: Tue, 22 Jun 1999 13:57:40 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits Message-ID: <199906221757.NAA05761@sundog.stsci.edu> On Tue, 22 Jun 1999, David Ascher wrote: >> The question is, is there a way for Numeric to determine that one of >> operands is a temporary? It looks to us as if the reference count for >> a temporary result (which will be discarded after the operation) is >> smaller than for a variable operand. If so, it should be possible for >> Numeric to reuse the temporary for the output in cases where the >> temporary is the same type and shape as the output. Maybe there is >> something tricky here that we don't understand, but if not it seems like >> a fairly simple change would make Numeric a lot less memory-hungry. > >This is an issue I've thought about a little, but alas I believe the >solution requires experience in fields I've got a provably bad record in, >such as parsing, code tree manipulations, etc. so thinking is all I'll do >on the topic. > >The problem is that there is no way to have NumPy do the detection you >mention above because the assignment process is one that Python does not >give any 'hooks' into (yet). It appears to me it might be a lot simpler than this, though my understanding of the Python/Numeric code is not very deep so I certainly could be wrong. For example, here is the Python C code to do a multiply operation:

    case BINARY_MULTIPLY:
        w = POP();
        v = POP();
        x = PyNumber_Multiply(v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        PUSH(x);
        if (x != NULL) continue;
        break;

Presumably if v and/or w are Numeric arrays, PyNumber_Multiply eventually calls the Numeric multiply routine.
In the current Numeric implementation, a new Numeric array is allocated, filled with the product, and returned for assignment to x. Every operation results in the creation of another temporary array. Suppose v is the intermediate result of another expression. Then its reference count will be 1, so that after returning from the multiply routine its memory will be released by the DECREF. Couldn't the Numeric multiply routine check the reference count of v and, if it is 1 and if the datatype and size of v are appropriate, put the result into v instead of into a new array? Then the ref count for v would be incremented and v would be returned as the function result. This doesn't require any knowledge about the parser or the context in which the expression is being evaluated. All that is required is the ability to modify one of the input arrays and return it as the result. I figure the only change to Numeric would be to check the reference counts, types, and sizes of the input arguments at the point when the result array is about to be allocated, and to reuse one of the input arrays if possible. As I said, maybe I'm being naive in my assumptions -- is there some reason this would not work? From da@ski.org Tue Jun 22 19:18:10 1999 From: da@ski.org (David Ascher) Date: Tue, 22 Jun 1999 11:18:10 -0700 (Pacific Daylight Time) Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <199906221757.NAA05761@sundog.stsci.edu> Message-ID: On Tue, 22 Jun 1999, Rick White wrote: > It appears to me it might be a lot simpler than this, though my > understanding of the Python/Numeric code is not very deep so I certainly > could be wrong. For example, here is the Python C code to do a multiply > operation: > > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > x = PyNumber_Multiply(v, w); > Py_DECREF(v); > Py_DECREF(w); > PUSH(x); > if (x != NULL) continue; > break; > > Presumably if v and/or w are Numeric arrays, PyNumber_Multiply > eventually calls the Numeric multiply routine. 
In the current Numeric > implementation, a new Numeric array is allocated, filled with the > product, and returned for assignment to x. Every operation results in > the creation of another temporary array. > > Suppose v is the intermediate result of another expression. Then its > reference count will be 1, so that after returning from the multiply > routine its memory will be released by the DECREF. Couldn't the > Numeric multiply routine check the reference count of v and, if it is 1 > and if the datatype and size of v are appropriate, put the result into > v instead of into a new array? Then the ref count for v would be > incremented and v would be returned as the function result. It's an interesting approach. However, I don't think that the "Suppose ..." statement above corresponds to an "if and only if", which I believe is needed. Consider: a = arange(10) b = a * 3 return a, b At the point of the PyArray_Multiply, the refcount of its first argument (a) is 1. However, it is not a temporary variable. Or did I miss something? --david ascher From rlw@stsci.edu Tue Jun 22 19:30:47 1999 From: rlw@stsci.edu (Rick White) Date: Tue, 22 Jun 1999 14:30:47 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <199906221757.NAA05761@sundog.stsci.edu> Message-ID: <199906221830.OAA05939@sundog.stsci.edu> >On Tue, 22 Jun 1999, David Ascher wrote: > >It's an interesting approach. However, I don't think that the "Suppose >..." statement above corresponds to an "if and only if", which I believe >is needed. Consider: > > a = arange(10) > b = a * 3 > return a, b > >At the point of the PyArray_Multiply, the refcount of its first argument >(a) is 1. However, it is not a temporary variable. Or did I miss >something? I think the refcount for (a) would be 2, one for the reference held by the variable 'a' and one for the reference on the stack. If the refcount were only 1, then the DECREF would reduce it to zero, allowing a to be released. 
I think the only time an argument to multiply (or other operator) can have a refcount of 1 is when it is temporary storage about to be released. E.g. a = (b*c)*d During the first multiply (b*c), both arguments have refcount > 1 so a new Numeric array gets allocated for the result. But for the second multiply the first argument will have a refcount of 1, indicating that it is available for reuse. I'm hoping somebody will tell me if I'm wrong (or even if I'm right!) As Konrad Hinsen suggested, I'm playing a bit with UserArray to see if I can get it to reuse temporaries like this. From da@ski.org Tue Jun 22 19:37:22 1999 From: da@ski.org (David Ascher) Date: Tue, 22 Jun 1999 11:37:22 -0700 (Pacific Daylight Time) Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <199906221830.OAA05939@sundog.stsci.edu> Message-ID: On Tue, 22 Jun 1999, Rick White wrote: > > a = arange(10) > > b = a * 3 > > return a, b > > > >At the point of the PyArray_Multiply, the refcount of its first argument > >(a) is 1. However, it is not a temporary variable. Or did I miss > >something? > > I think the refcount for (a) would be 2, one for the reference > held by the variable 'a' and one for the reference on the stack. Gotcha. You're right, of course. --david From chase@att.com Tue Jun 22 20:14:36 1999 From: chase@att.com (Chase, Christopher J (Chris), ALNTK) Date: Tue, 22 Jun 1999 15:14:36 -0400 Subject: [Matrix-SIG] Numeric Nits Message-ID: <15BF137B61C7D211BAF50000C0589CFA0BC4CA@njc240po02.ho.att.com> > -----Original Message----- > From: rlw@stsci.edu [mailto:rlw@stsci.edu] > Sent: Tuesday, June 22, 1999 1:58 PM > To: da@ski.org > Cc: barrett@stsci.edu; matrix-sig@python.org; perry@stsci.edu; > rlw@stsci.edu > Subject: Re: [Matrix-SIG] Numeric Nits > > > On Tue, 22 Jun 1999, David Ascher wrote: > > >> The question is, is there a way for Numeric to determine > that one of > >> operands is a temporary? 
It looks to us as if the > reference count for > >> a temporary result (which will be discarded after the operation) is > >> smaller than for a variable operand. If so, it should be > possible for > >> Numeric to reuse the temporary for the output in cases where the > >> temporary is the same type and shape as the output. Maybe there is > >> something tricky here that we don't understand, but if not > it seems like > >> a fairly simple change would make Numeric a lot less memory-hungry. > > > >This is an issue I've thought about a little, but alas I believe the > >solution requires experience in fields I'm got a provably > bad record in, > >such as parsing, code tree manipulations, etc. so thinking > is all I'll do > >on the topic. > > > >The problem is that there is no way to have NumPy do the > detection you > >mention above because the assignment process is one that > Python does not > >give any 'hooks' into (yet). > > It appears to me it might be a lot simpler than this, though my > understanding of the Python/Numeric code is not very deep so > I certainly > could be wrong. For example, here is the Python C code to do > a multiply > operation: > > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > x = PyNumber_Multiply(v, w); > Py_DECREF(v); > Py_DECREF(w); > PUSH(x); > if (x != NULL) continue; > break; > > Presumably if v and/or w are Numeric arrays, PyNumber_Multiply > eventually calls the Numeric multiply routine. In the current Numeric > implementation, a new Numeric array is allocated, filled with the > product, and returned for assignment to x. Every operation results in > the creation of another temporary array. > > Suppose v is the intermediate result of another expression. Then its > reference count will be 1, so that after returning from the multiply > routine its memory will be released by the DECREF. 
Couldn't the > Numeric multiply routine check the reference count of v, > if it is 1 > and if the datatype and size of v are appropriate, put the result into > v instead of into a new array? Then the ref count for v would be > incremented and v would be returned as the function result. > I was thinking of the same approach. The problem is that the caller owns the reference. The numeric multiply routine has no idea who the caller is and what the caller will do with the reference, e.g. whether the caller is going to do a DECREF on return. Also the caller is not guaranteed to be the interpreter. Rather it could be a program using the C API. One possible solution is if Python objects could support a TEMPORARY attribute, perhaps in the basic object_HEAD structure. Then a caller that will release an object after a call can mark the object as TEMPORARY, meaning "I (the caller) will not access this object and will delete my reference to it after the call". Another meaning for TEMPORARY might be "pending DECREF on return". A called function receiving an object marked TEMPORARY and with a reference count of one is allowed to overwrite or reuse the object. This could work but is neither a pretty nor an elegant approach. But I don't know if there is a better approach given the caller-owns-reference model of Python. Chris > This doesn't require any knowledge about the parser or the context > in which the expression is being evaluated. All that is required > is the ability to modify one of the input arrays and return it > as the result. I figure the only change to Numeric would be to > check the reference counts, types, and sizes of the input arguments > at the point when the result array is about to be allocated, > and to reuse one of the input arrays if possible. > > As I said, maybe I'm being naive in my assumptions -- is there > some reason this would not work?
From rlw@stsci.edu Tue Jun 22 20:36:49 1999 From: rlw@stsci.edu (Rick White) Date: Tue, 22 Jun 1999 15:36:49 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits Message-ID: <199906221936.PAA06150@sundog.stsci.edu> On Tue, 22 Jun 1999, Christopher J. Chase wrote: >The problem is that the caller owns the reference. The numeric multiply >routine has no idea who the caller is and what the caller will do with the >reference, e.g. whether the caller is going to do a DECREF on return. Also >the caller is not guaranteed to be the interpreter. Rather it could be a >program using the C API. Ah, I hadn't considered that possibility. You're right, that does complicate things. Suppose we just defined the behavior of the Numeric multiply, add, etc. routines as depending on the reference count of the arguments, and included this reuse of temporaries as part of the behavior? Existing C code that calls the Numeric routines directly without maintaining reference counts on arrays would have to change, and maybe there is enough such code to make it a bad idea. But I see this as a very desirable behavior that would be well worth adding if the cost is not too high. >One possible solution is if python objects could support an TEMPORARY >attribute perhaps in the basic object_HEAD structure. Then a caller that >will release an object after a call can mark an object as TEMPORARY, meaning >"I (the caller) will not access this object and delete my reference to the >object after the call)". Another meaning for TEMPORARY might be "Pending >DECREF on return". > >A called function receiving an object as TEMPORARY and a reference count of >one is allowed to overwrite or reuse the object. This would require changing Python itself, not just Numeric, right?
Maybe we could convince Guido that this is worth dirtying up the Python code, but I would probably resist it if I were him. Reuse of temporaries is probably only an issue for Numeric (and maybe PIL and other image modules), so it would be nice if it could be handled by Numeric itself. - Rick From Barrett@stsci.edu Tue Jun 22 21:01:04 1999 From: Barrett@stsci.edu (Paul Barrett) Date: Tue, 22 Jun 1999 16:01:04 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <15BF137B61C7D211BAF50000C0589CFA0BC4CA@njc240po02.ho.att.com> References: <15BF137B61C7D211BAF50000C0589CFA0BC4CA@njc240po02.ho.att.com> Message-ID: <14191.59199.620985.989598@nem-srvr.stsci.edu> Chase, Christopher J (Chris), ALNTK writes: > > > > case BINARY_MULTIPLY: > > w = POP(); > > v = POP(); > > x = PyNumber_Multiply(v, w); > > Py_DECREF(v); > > Py_DECREF(w); > > PUSH(x); > > if (x != NULL) continue; > > break; > > I was thinking of the same approach. > > The problem is that the caller owns the reference. The numeric multiply > routine has no idea who the caller is and what the caller will do with the > reference, e.g. whether the caller is going to do a DECREF on return. Also > the caller is not guaranteed to be the interpreter. Rather it could be a > program using the C API. Yes! I tried to make this point the other day to Rick and Perry, but did not mention the point of the caller not being the interpreter. > One possible solution is if python objects could support an TEMPORARY > attribute perhaps in the basic object_HEAD structure. Then a caller that > will release an object after a call can mark an object as TEMPORARY, meaning > "I (the caller) will not access this object and delete my reference to the > object after the call)". Another meaning for TEMPORARY might be "Pending > DECREF on return". This is the solution that I would advocate for a future version of Python. 
What's nice about this is that a module can ignore this flag and continue to work as it has always done, but any new module can take advantage of this knowledge to use memory wisely. (Of course, all modules will have to be recompiled to work properly.) > A called function receiving an object as TEMPORARY and a reference count of > one is allowed to overwrite or reuse the object. > > This could work but is not a pretty nor elegant approach. But I don't know > if there is a better approach given the caller owns reference approach of > Python. Why isn't this an elegant approach? What's wrong with having a general purpose flags structure which can be extended to enable modules to make wise use of resources? Python already uses this technique to indicate that keyword arguments are being passed. Just my $.02 worth. -- Paul From dubois1@llnl.gov Tue Jun 22 21:08:33 1999 From: dubois1@llnl.gov (Paul F. Dubois) Date: Tue, 22 Jun 1999 13:08:33 -0700 Subject: [Matrix-SIG] the temporary problem Message-ID: <99062213283001.24051@almanac> The discussion so far has been an interesting one. Perhaps those of you struggling with your thoughts would like to know that this is nothing peculiar to Python. In fact, every OO language suffers from it if it allows user classes to overload operators. In the initial days of using C++ for science it was quickly found that those who wrote beautiful classes representing abstractions could write really nice code that looked just like a physics book, but mysteriously ran very slowly and when profiled was found to be spending 90% of its time in the heap manager. Part of this was poor understanding of C++'s propensity to copy things, as explained in Scott Meyers's books. But part of it was unavoidable; Eiffel has the problem, Python has the problem, and Java would have the problem if it allowed you to overload. The problem is that in an OO language x = a + (b + c) is really assign('x', a.add(b.add(c))) or something like that.
The "assign" in Python is internal and you cannot get at it, but in C++ you can. This problem has been solved in C++ by using expression templates. The idea is that the result of an addition is an object that has a strange type that we might call result-of-adding-two-arrays. The assign operator triggers the actual evaluation and the result is that the actual expression is evaluated exactly as if it were Fortran 90: one loop, no temporaries. For vectors above a length of about 10 the result takes the same time as it would if you wrote out the loop by hand in C or Fortran. So, in effect, the result of operators is to build a parse tree for the expression rather than to carry out the operations. The final assignment hook is used to trigger the evaluation. Naturally, there is extra fun awaiting as one figures out how to accommodate scalar operands, sqrt, cos, etc. For C++ this has been done. The existing NumPy does something similar with indexing, in that it is essentially a "lazy" evaluation. I actually think this is mostly a "bad" thing and I did not agree with it when we designed NumPy. It is a source of zillions of gotchas. I suspect any tampering of the type suggested in the recent messages would suffer a similar consequence. Given the ease of dropping in some "real" C for a few time-critical spots, I don't think this area is NumPy's most pressing problem. I believe someone has previously posted to this group a parse-tree approach that they partially implemented as a Python class. The next release of Numerical will allow you to subclass Numerical arrays IN PYTHON so those of you who want to play with this might have some fun. Note also the "attributes" hook I added. You could use that for denoting a temporary.
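The parse-tree idea can be prototyped in ordinary Python with operator overloading. The sketch below uses made-up names and is of course not the C++ expression-template machinery, but the shape is the same: operators only build nodes, and a final evaluation call (standing in for the assignment hook) runs one fused loop with no temporary arrays.

```python
class Expr:
    # Operators build a parse tree instead of computing anything.
    def __add__(self, other):
        return Node('+', self, wrap(other))
    def __mul__(self, other):
        return Node('*', self, wrap(other))
    __radd__ = __add__
    __rmul__ = __mul__

class Leaf(Expr):
    def __init__(self, data):
        self.data = list(data)
    def at(self, i):
        return self.data[i]

class Scalar(Expr):
    def __init__(self, v):
        self.v = v
    def at(self, i):
        return self.v

class Node(Expr):
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs
    def at(self, i):
        x, y = self.lhs.at(i), self.rhs.at(i)
        return x + y if self.op == '+' else x * y

def wrap(x):
    return x if isinstance(x, Expr) else Scalar(x)

def evaluate(expr, n):
    # The "assignment hook": one loop, no intermediate arrays.
    return [expr.at(i) for i in range(n)]

a, b, c = Leaf([1, 2]), Leaf([10, 20]), Leaf([100, 200])
result = evaluate(a + b * c + 1, 2)   # -> [1002, 4003]
```

Scalar operands fall out naturally from wrap(); functions like sqrt or cos would become unary nodes, which is exactly the "extra fun" mentioned above.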
From perry@stsci.edu Tue Jun 22 21:27:59 1999 From: perry@stsci.edu (Perry Greenfield) Date: Tue, 22 Jun 1999 16:27:59 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits Message-ID: <199906222027.QAA19028@eclipse.stsci.edu> > From hinsen@cnrs-orleans.fr Tue Jun 22 06:43 EDT 1999 > Date: Tue, 22 Jun 1999 12:41:33 +0200 > To: perry@stsci.edu > Subject: Re: [Matrix-SIG] Numeric Nits > > > Did you try the Python Imaging Library (PIL)? It contains specialized > array-like objects for dealing with images. Ideally NumPy and PIL > would use a common low-level array object, but we don't live in a > perfect world! > We have looked at it but Numeric comes far closer to having the features needed for astronomical image processing than does PIL (though it may prove very useful for displaying such images). We need the full range of mathematical capabilities of Numeric. > > The suggested solution (using rank-0 Numeric arrays instead of simple > > Python constants) often does not work because any Python operation on > > those arrays turns them back into (promoted) Python scalars. E.g., after > > a = array(1.0, Float32) > > b = -a > > b is a standard Python 64-bit scalar rather than a Float32. > > That sounds like a problem that can be solved easily. Instead of using > rank-0 arrays, one could use special "Float32 Python scalars", which > would of course have to be implemented, presumably in NumPy, but that > is not much work. You would still have to create such objects with > somewhat clumsy syntax (e.g. Float32(3.5)), because extension modules > can't introduce new syntax rules, but there would be no more problems > with constant upcasting once you have made sure that all data is > in some float32 type. > Yes, it is a possibility, but we believe it is not as good a solution for reasons given a little later. > Another solution to the upcasting problem could be the introduction of > a variant of Float32 arrays that are higher up in the upcasting > hierarchy than Float64.
Then mixed expressions would automatically end > up in Float32. Whoever uses this type would of course have to watch > out for precision loss, but it seems to me that in your kind of > application this is not a problem. > This is close to one idea we had tossed around of creating a variant of numeric arrays that would not be upcast. Still... > > very maintainable either). We wonder whether anyone has seriously used > > Numeric for 16 bit ints or 32 bit floats in cases (like ours) where the > > memory penalty of converting these to 32 bit ints and doubles is not > > acceptable. > > Probably not! > Then your comment below regarding incompatibility is not likely applicable! :-) > > The nicest solution from the numerical point of view would be for > > Python to support these other types. We do not expect that to happen > > (nor should it happen). > > Why not? As long as you don't expect to get special syntax for > literals of the new type, this looks like a relatively simple > solution (see above). > > > But the problem could be ameliorated by changing Numeric's default > > behavior in casting constants. If combining an array and a python > > scalar of similar types (some form of int with some form of int, or > > float with double or int), then the default behavior should be to cast the > > python type to the array type, even if it means down casting. > > But this would be incompatible with both the current NumPy behaviour > (possibly breaking existing code) and with standard Python behaviour, > causing headaches to new users. I think that the principle "never lose > precision without clearly saying that you want it" is a good one. > If you agree that very few people are using Float32 and Int16, then it is likely that little code will be broken. But even if there were substantial amounts of code that would be broken, there is a larger issue to be addressed.
We believe that a large segment of the scientific community expects to be able to operate with arrays of these types; these expectations are met with most other array-based languages available to them. We are arguing that if Python/Numeric is to be seen as a viable alternative to things like IDL and Matlab, this capability has to be available in a relatively painless way. Every alternative discussed so far has some pros and cons. The following summarizes the proposed solutions.

1) Current situation: consistent with python upcasting behavior, but use of Float32 and Int16 so painful as to be virtually unusable. This locks out all users that desire to use these types.

2) Add special scalar types to NumPy. This is better than option 1) yet still leads to rather ugly code that will put off many scientific users. (e.g., b = Float32(2.)*(a - Float32(1.)) instead of b = 2*(a-1) )

3) Add new array types that don't upcast with scalars. Better than 1) and 2) in our opinion but now we are complicating the array types available (and if new types are introduced to handle indexing vs mask variants, it becomes even worse). We are not in a position to be sure but this appears to probably require more work to implement than changing the existing default behavior for upcasting. But it certainly is worth considering.

4) Change default scalar casting behavior. Main drawback is that it introduces casting behavior different than people (Python or otherwise) are used to. Well, at one level anyway.

Consider a new user of Python and NumPy. They type a = b + 1. where b is a single precision array. They are used to expecting that 1. is single precision and that the result ought to be also. We argue that the naive user is more likely to be surprised by getting a double result than a single precision result, despite the fact that they expect the usual upcasting rules.
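For what it is worth, option 4 is essentially the behaviour modern NumPy eventually adopted: a Python scalar in a mixed expression no longer upcasts a float32 or int16 array. A quick check, assuming a current NumPy installation:

```python
import numpy as np

a = np.ones(3, dtype=np.float32)
b = 2 * (a - 1)                  # Python scalars adopt the array's precision
assert b.dtype == np.float32     # not upcast to float64

c = np.ones(3, dtype=np.int16)
assert (c + 1).dtype == np.int16
```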
The fundamental problem is that because Python doesn't support single precision floats there is going to be unexpected behavior no matter what. We believe that treating arrays differently in this respect is likely what most users want. Those that have no use for single precision arrays really should be little affected. They are unlikely to be using them now. All the default array creators give you the default Python type. People who don't use the lower precision types don't need to worry much about accidentally casting down. After all, there are other aspects of Numeric arrays which are not compatible with Python either. For example, the semantics of a[0:5] with regard to copying are completely different. Treating arrays as the heavyweight object that dictates casting behavior strikes us as the compromise that is most reasonable. We believe that changing the default behavior will likely break little code (but if that isn't true, whoever depends on it now, please let us know). But even if it did, we think it is worth it to make Python/Numeric acceptable to a much larger community. This is a comparatively good time to make such a change. ********************* > Working with reference counts is almost by definition something tricky. > I won't comment on this idea (which looks interesting, but too > difficult to verify quickly), but why don't you try it out in plain > Python, by creating a Python wrapper around array objects which > implements the optimizations you propose? (You can get the reference > count via the function sys.getrefcount().) > Hmmm. That is something that we are trying right now. > [about "put" function] > > > The absence of an agreed upon general solution or someone to do the > > work should not prevent the adoption and distribution of > > less-than-general solutions (such as array_set, that comes with Gist) > > with the standard distribution. Surely a less-than-general solution is preferable to none.
> > Except that it would create a future obligation to maintain > compatibility. Perhaps NumPy should contain a special module with > "quick and dirty" code which might be replaced by incompatible > improvements in the future. > True, but we can use different names for the prototype function and a final one. It's still no excuse for not having any. > > We were surprised that some of the most heavily used functions in > > Numeric are implemented in Python rather than C, despite their apparent > > simplicity. At least some of these really need to be implemented in C > > to make them decently efficient: arrayrange, nonzero, compress, clip, > > where. The current Python versions are especially costly in memory > > Perhaps no one else ever had efficiency problems with them; I > certainly didn't! > We're sure they are good enough for many uses, but with large arrays it does make a big difference. > > We also see a need for a function that rebins an array, e.g., zooming > > the array by pixel replication or dezooming it by summing blocks of > > pixels (not simply by slicing to select every N-th pixel.) Both of > > The first can be done by repeat() (at least along one axis). The second > problem may be what the mysterious reduceat() operation does, although > nobody seems to understand it well enough to decide! > > But like in any case of presumed missing functionality, I am sure that > a code contribution would be happily accepted by the maintainers... > We are willing to contribute. The purpose of our list was to elicit comments about whether someone had already solved these problems or had better suggestions for solutions. But if necessary, we will be willing to do some or all of this work. > > > There are several Python functions for computing histograms from > > Numeric arrays, but there still seems to be a need for a simple, > > general histogram function written in C for efficiency. This means not > I have never found a need for a C function.
Please try the histogram > class in version 2 of my ScientificPython package (available at > ftp://dirac.cnrs-orleans.fr/pub/); it does not sort the array > and doesn't create huge intermediates. It has performed well for > all my applications. > -- Actually, this is a good illustration of how performance is context dependent. What is acceptable in one circumstance may not be suitable for large datasets. We tried the above routine on a large dataset and found that it took 87 seconds to compute a 1000 bin histogram on a 100,000 element double array with random values on a lowly Sparc4. By comparison, IDL (which does use underlying C code to do the histogram) took only 0.14 seconds. This is approaching three orders of magnitude in speed. Since we will be dealing with arrays far larger than this, it matters a lot to us. From rlw@stsci.edu Tue Jun 22 23:39:14 1999 From: rlw@stsci.edu (Rick White) Date: Tue, 22 Jun 1999 18:39:14 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits Message-ID: <199906222239.SAA06735@sundog.stsci.edu> On Tue, 22 Jun 1999, Paul F. Dubois wrote: > [... interesting stuff about C++ expression templates ...] > >Given the ease of dropping in some "real" C for a few time-critical spots, I >don't think this area is NumPy's most pressing problem. I agree that this is not the most pressing problem -- unwanted promotions are a much bigger memory sink. If Numeric were changed so that it reused temporaries, I believe that it would save only one temporary array worth of memory. While even one extra image in memory can sometimes be a problem, it pales compared with the cost of changing many Float32 arrays to Float64 (and the associated effort of trying to prevent that.) Still, if temporaries could be reused it would be nice. I'd like to object to the solution of dropping in "real C" for time-critical spots, though. That is a fine approach if you are developing production codes that will be used many times. 
In interactive data analysis, however, the actual computation that is carried out is different every time. We explore the data, trying many different approaches to try to understand what we can learn from it. In most cases, the time to write a compiled C module would just not be justified for a one-shot application. That's why we want to use Python, after all -- the development time is much faster than C. Our experience with other array languages (principally IDL) is that the vast majority of data analysis exploration can be carried out very effectively without ever using a compiled language. The overheads that come from making a pass through the data for each operation are relatively modest and are acceptable in exchange for easy access to the data. Our goal is to make Python comparable in its efficiency to the other array languages. Compiled modules can be used to add major new capabilities to the toolbox (FFTs are a good example), and that is an important advantage of using Python. But the basic array operations really need to be made as fast, simple-to-use, and memory-efficient as possible. From tim.hochberg@ieee.org Tue Jun 22 23:54:19 1999 From: tim.hochberg@ieee.org (Tim Hochberg) Date: Tue, 22 Jun 1999 16:54:19 -0600 Subject: [Matrix-SIG] Numeric Nits Message-ID: <014401bebd02$2c3975a0$3c3fa4cd@R20CAREY.MAYO.EDU> Two thoughts regarding promotion of scalars to zero-sized arrays of doubles: First, perhaps Numeric should be modified so that zero-sized arrays do not get turned into scalars when they are manipulated. So, for example, "-array(5)" would result in a zero-sized array containing -5, not the scalar negative five. I think this is feasible (at least in JNumeric where I'm familiar with the internals), but I haven't investigated it in depth. It's also possible it will break code. Something to think about anyway. Second, two static functions could be added to Numeric to set the precision of Scalar values.
By default, floats would be converted to type 'd' when they are converted to zero-d arrays, as they are now, but a call to:

>>> Numeric.setFloatPrecision('f')

would result in floats being converted to type 'f' instead. For example:

>>> arange(5, 'f') * 5
array([0, 5, 10, 15, 20], 'f')

Similarly,

>>> Numeric.setIntPrecision('s')

could be used to set the default conversion of Python integers to arrays. I'm not sure if this should affect array creation as well. For example, should

>>> Numeric.setFloatPrecision('f')
>>> a = array([1.,2.,3.])

result in an array of type 'f' or an array of type 'd'? The big advantage of this approach is that it's guaranteed not to break any code. It also has what I consider good (i.e., unsurprising) default behaviour, while allowing more memory efficiency if necessary. -tim From ransom@cfa.harvard.edu Tue Jun 22 22:32:18 1999 From: ransom@cfa.harvard.edu (Scott M. Ransom) Date: Tue, 22 Jun 1999 21:32:18 +0000 Subject: [Matrix-SIG] Numeric Nits References: <014401bebd02$2c3975a0$3c3fa4cd@R20CAREY.MAYO.EDU> Message-ID: <377000E2.D4917DC6@cfa.harvard.edu> Tim Hochberg wrote: > Second, two static functions could be added to Numeric to set the precision > of Scalar values. By default, floats would be converted to type 'd' when > they are converted to zero-d arrays, as they are now, but a call to: > > >>> Numeric.setFloatPrecision('f') > > ...... Is there a way to make this a more general solution when working with Numeric? To change the casting rules as a whole -- instead of just for zero-d arrays? Or would this not be possible (i.e. since it goes against the core Python rules)? As another astronomer trying to wean myself from IDL, I have to agree with Rick and Perry that the memory use/efficiency issues with Numeric are quite important. I frequently deal with _huge_ (~10**8 point) 1-D time series (I am using a parallelized version of Python on an IBM SP2 among others), all of which are single precision float or single precision complex.
I am currently working around the upcasts by using a few 'C' routines to perform some simple vector math and/or casting. So my code would certainly be amongst the rare bits that would be broken by changes to the casting or precision rules. But I would actually really like to see such changes. I completely agree that an ugly solution will turn away substantial numbers of scientific users -- something that I don't believe is good for any of us. With Tim's proposal, should it be possible to extend it to all numeric arrays, we could keep the majority of legacy code un-broken and still allow those of us working with un-gainly sized arrays the opportunity to prod and poke at our data sets interactively instead of making us run back to good-ole-'C'. Scott -- Scott M. Ransom Phone: (781) 320-9867 Address: 75 Sanderson Ave. email: ransom@cfa.harvard.edu Dedham, MA 02026 PGP Fingerprint: D2 0E D0 10 CD 95 06 DA EF 78 FE 2B CB 3A D3 53 From tim_one@email.msn.com Wed Jun 23 06:14:17 1999 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 23 Jun 1999 01:14:17 -0400 Subject: [Matrix-SIG] array([19990609],'f') bug? In-Reply-To: Message-ID: <000a01bebd37$3e465440$cb9e2299@tim> [Hoon Yoon] > Can anyone explain this strange behavior? > NT svcpack 3 running version 11 of NumPy > >>> array([19990609],'f') > array([ 19990608.],'f') > Anyone have a fix? IEEE floats (single-precision) have 24 mantissa bits, so the dense range of representable integers is [-2**24, 2**24] = [-16777216, 16777216]. Your input number is outside that range, but is in the next larger binade (i.e. +/- 2**25), where only every *second* integer is exactly representable (and in the next binade after that, only every fourth integer is representable; and in the next binade after that, only every eighth; etc). That's why it got chopped to an even integer. 
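The chopping Tim describes can be checked from plain Python with the struct module: packing a double as IEEE single precision and unpacking it back applies exactly the single-precision rounding that Numeric's 'f' arrays do.

```python
import struct

def to_float32(x):
    # Round-trip through 4 bytes of IEEE single precision.
    return struct.unpack('f', struct.pack('f', x))[0]

to_float32(16777216.0)   # 2**24 is still exactly representable
to_float32(19990609.0)   # odd, above 2**24: rounds to 19990608.0
```

The second call reproduces the original report: 19990609 is exactly halfway between the two representable neighbors 19990608 and 19990610, and round-half-to-even picks 19990608.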
So long as you're using floats, you'll either get 19990608 or 19990610 (ANSI C doesn't define which one you'll get back, but the important point is that it's impossible to get back 19990609). so-use-doubles-or-ints-or-stick-to-float-ints-divisible-by-large-powers-of-2-ly y'rs - tim From HYoon@exchange.ml.com Wed Jun 23 14:06:16 1999 From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading)) Date: Wed, 23 Jun 1999 09:06:16 -0400 Subject: [Matrix-SIG] array([19990609],'f') bug? Message-ID: Tim, Actually, I am using double already. I thought this should bring up an error message or a NaN, not an unexpected numeric change. Given that I often deal with large trade sizes, this is not unusual. The conversion could cause hard-to-find errors. ************************************************************** S. Hoon Yoon (Quant) Merrill Lynch Equity Trading yelled@yahoo.com hoon@bigfoot.com(w) "Miracle is always only few standard deviations away, but so is catastrophe." * Expressed opinions are often my own, but NOT my employer's. "I feel like a fugitive from the law of averages." Mauldin ************************************************************** > -----Original Message----- > From: Tim Peters [SMTP:tim_one@email.msn.com] > Sent: Wednesday, June 23, 1999 1:14 AM > To: matrix-sig@python.org > Subject: RE: [Matrix-SIG] array([19990609],'f') bug? > > [Hoon Yoon] > > Can anyone explain this strange behavior? > > NT svcpack 3 running version 11 of NumPy > > >>> array([19990609],'f') > > array([ 19990608.],'f') > > Anyone have a fix? > > IEEE floats (single-precision) have 24 mantissa bits, so the dense range > of > representable integers is [-2**24, 2**24] = [-16777216, 16777216]. Your > input number is outside that range, but is in the next larger binade (i.e. > +/- 2**25), where only every *second* integer is exactly representable > (and > in the next binade after that, only every fourth integer is representable; > and in the next binade after that, only every eighth; etc).
That's why it > got chopped to an even integer. > > So long as you're using floats, you'll either get 19990608 or 19990610 > (ANSI > C doesn't define which one you'll get back, but the important point is > that > it's impossible to get back 19990609). > > so-use-doubles-or-ints-or-stick-to-float-ints-divisible-by-large-powers-of-2-ly y'rs - tim > > > > _______________________________________________ > Matrix-SIG maillist - Matrix-SIG@python.org > http://www.python.org/mailman/listinfo/matrix-sig From Oliphant.Travis@mayo.edu Wed Jun 23 16:44:07 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Wed, 23 Jun 1999 10:44:07 -0500 (CDT) Subject: [Matrix-SIG] Conversion of scalars. Message-ID: After looking at the source code it seems like it would be a "somewhat" straightforward thing to implement Tim Hochberg's suggestion of defining a function (or an attribute) to set the desired behavior for the precision of scalars. It seems like what needs to be changed is the code in PyArray_ObjectType, which all the coercion sections of code look to in order to determine what type to make a Python object. In particular, the last part of this code is called whenever the Python object is one of the numeric types. Here is the code:

int PyArray_ObjectType(PyObject *op, int minimum_type) {
    if (PyInt_Check(op)) {
        return max(minimum_type, (int)PyArray_LONG);
    } else {
        if (PyFloat_Check(op)) {
            return max(minimum_type, (int)PyArray_DOUBLE);
        } else {
            if (PyComplex_Check(op)) {
                return max(minimum_type, (int)PyArray_CDOUBLE);
            } else {
                return (int)PyArray_OBJECT;
            }
        }
    }
}

As you can see, right now the default is to return PyArray_LONG, PyArray_DOUBLE, and PyArray_CDOUBLE. It seems this could be changed quite straightforwardly to return the lower precision versions. Setting up some way to alter these return types would also have the effect of changing the way sequences of these types were returned, so array([1.,3.,4.]) would become an array of type 'f'.
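A pure-Python model of this function makes the "user settable" idea concrete. The type codes and setter name below are made up for illustration (real Numeric's codes are likewise ordered so that max() walks up the cast hierarchy); this is a sketch of the proposal, not the actual implementation.

```python
# Toy type codes, ordered like a cast hierarchy so max() upcasts.
SHORT, LONG, FLOAT, DOUBLE, CFLOAT, CDOUBLE, OBJECT = range(7)

# Defaults mirror the current hard-coded returns in PyArray_ObjectType.
_scalar_types = {int: LONG, float: DOUBLE, complex: CDOUBLE}

def set_float_precision(code):
    # Hypothetical user-settable default for Python float scalars.
    _scalar_types[float] = code

def object_type(op, minimum_type=SHORT):
    """Pure-Python model of PyArray_ObjectType's scalar branch."""
    for pytype, code in _scalar_types.items():
        if isinstance(op, pytype) and not isinstance(op, bool):
            return max(minimum_type, code)
    return OBJECT

object_type(1.5, FLOAT)   # DOUBLE today: the Python scalar forces an upcast
set_float_precision(FLOAT)
object_type(1.5, FLOAT)   # now FLOAT: an 'f' array combined with 1.5 stays 'f'
```

Because the default table is only consulted for Python scalars and sequences, a double array argument (minimum_type = DOUBLE) still yields DOUBLE via max(), which is the backward-compatibility point Travis argues below.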
I like the idea of having this return type be user settable. It would be easy to do by just having this code return what the user settable type is for the integer, float, and complex objects of Python. Aside from introducing a global variable floating around, I don't see how to do this without an extra argument to PyArray_ObjectType, which isn't too bad except it requires all Numeric extensions which use this function to be changed (I'd be willing to change all of mine, though.) This seems like a pretty easy feature to add, and as I use Float32 all of the time (the big reason I left MATLAB), I would like to see it added. I would submit a patch myself, except it sounds like Paul has been making some changes to the source and is going to release a new version at some point... Just for fun, it would be interesting to see the effect of a hard-coded PyArray_SHORT, PyArray_FLOAT, and PyArray_CFLOAT here... Travis From Oliphant.Travis@mayo.edu Wed Jun 23 16:55:55 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Wed, 23 Jun 1999 10:55:55 -0500 (CDT) Subject: [Matrix-SIG] Hard coded results. Message-ID: Well, it was just too much temptation, so I tried hard-coding the PyArray_SHORT, PyArray_FLOAT, PyArray_CFLOAT into PyArray_ObjectType and recompiled. It works as expected. Here are the results.

>>> d = array([1,2,3])
>>> d
array([1, 2, 3],'s')
>>> 2*d
array([2, 4, 6],'s')
>>> 3.0*d
array([ 3., 6., 9.],'f')
>>> 3j*d
Segmentation fault

Oops! Someone else should try this, as I have problems with using egcs-compiled Numeric with pgcc-compiled Python (which is what this was tested under). Note that also:

>>> d = array([1j,3,4])
>>> d
array([ 0.+1.j, 3.+0.j, 4.+0.j],'F')

Travis From hinsen@cnrs-orleans.fr Wed Jun 23 18:02:48 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 23 Jun 1999 19:02:48 +0200 Subject: [Matrix-SIG] Numeric semantics question....
In-Reply-To: <3.0.1.32.19990622102042.00776aa8@emerald.cswv.com> (neel@cswv.com) References: <3.0.1.32.19990622102042.00776aa8@emerald.cswv.com> Message-ID: <199906231702.TAA06387@chinon.cnrs-orleans.fr> > I wanted to confirm that if I assign a subsequence of an array object, the > entire array is kept in memory. That is, if I wrote some code like this: > [code deleted] > > Numeric would try to allocate memory for 500 million floats, right? Yes. > Does making the following change correctly get around the problem? > > def memory_waster(): > x = Numeric.zeros((1000000,1), Numeric.Float) > return copy.copy(x[1]) Yes. Internally, an array object is actually two objects: an array data space object and an array reference object (the first one is not strictly speaking a Python object, but it works in the same way). Indexing creates a new reference object that points to the same data object, and the data object is only released when there is no more reference object pointing to it. That's why no memory is released in the first example. When you make a copy, a new data object is created, so the first one can be released. 
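Konrad's two-object picture can be sketched in plain Python. These are toy stand-in classes, not the real C structures, but they show why a slice pins the whole buffer while a copy releases it.

```python
class DataSpace:
    """Stands in for the array's data buffer (the malloc'd block)."""
    def __init__(self, nbytes):
        self.buf = bytearray(nbytes)

class ArrayRef:
    """Stands in for the array reference object: indexing shares the
    data space, copying allocates a fresh one."""
    def __init__(self, data, offset=0, length=None):
        self.data = data           # this reference keeps the buffer alive
        self.offset = offset
        self.length = len(data.buf) if length is None else length
    def __getitem__(self, i):
        # "Slicing": a new reference object into the SAME data space.
        return ArrayRef(self.data, self.offset + i, 1)
    def copy(self):
        # Like copy.copy(x[i]) above: a fresh, minimal data space.
        new = DataSpace(self.length)
        new.buf[:] = self.data.buf[self.offset:self.offset + self.length]
        return ArrayRef(new)

big = ArrayRef(DataSpace(1000000))
leaky = big[1]          # still pins the whole megabyte buffer
small = big[1].copy()   # one-byte buffer; the big one can now be freed
```

Holding only `leaky` keeps the full buffer alive, exactly the memory_waster behavior in the question; holding only `small` does not.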
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Wed Jun 23 18:18:32 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 23 Jun 1999 19:18:32 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <014401bebd02$2c3975a0$3c3fa4cd@R20CAREY.MAYO.EDU> (tim.hochberg@ieee.org) References: <014401bebd02$2c3975a0$3c3fa4cd@R20CAREY.MAYO.EDU> Message-ID: <199906231718.TAA06400@chinon.cnrs-orleans.fr> > First, perhaps the Numeric should be modified to so that zero-size arrays do > not get turned to scalars when they are manipulated. So, for example, > "-array(5)" would result in a zero sized array containing -5, not the scalar The original design decision was that rank-0 arrays should never be returned from any common operation because of their somewhat weird behaviour. In principle rank-0 arrays *are* scalars, and in fact rank-0 arrays practically behave like Python scalars, but many routines that expect scalars will not accept rank-0 arrays. It would be very confusing to have two kinds of data that are almost the same. Another problem is that there is no clean way to extract a scalar from a rank-0 array. Indexing requires a number of indices which is at most equal to the rank, but you can't index with zero indices! In practice, rank-0 arrays can be indexed with one index that has to be zero, but that's a kludge rather than good design. > negative five. I think this is feasible (at least in JNumeric where I'm > familiar with the internals), but I haven't investigated it in depth. It's > also possible it will break code. Something to think about anyway. 
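The extraction question can be made concrete with a toy rank-0 class (hypothetical code, not Numeric itself). It accepts the two spellings Tim Hochberg proposes later in this thread, a[()] and a[...], and rejects the a[0] kludge.

```python
class Rank0:
    """Toy rank-0 array: one value, shape ()."""
    def __init__(self, value):
        self.value = value
        self.shape = ()
    def __getitem__(self, index):
        # Indexing with zero indices: a[()] passes the empty tuple,
        # a[...] passes Ellipsis; anything else is rejected.
        if index == () or index is Ellipsis:
            return self.value
        raise IndexError("rank-0 array takes zero indices")
    def __neg__(self):
        return Rank0(-self.value)   # negation stays rank-0

a = Rank0(5)
(-a)[()]     # -5
(-a)[...]    # -5
```

Note that Python delivers `a[()]` to __getitem__ as the empty tuple, so "indexing with zero indices" is expressible after all, which is the substance of Tim's suggestion.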
It would break tons of code; for example, any routine written in C and expecting a number would raise an exception if it were fed a rank-0 array! > Second, two static functions could be added to Numeric to set the precision > of Scalar values. By default, floats would be converted to type 'd' when > they are converted to zero-d arrays, as they are now, but a call to: > > >>> Numeric.setFloatPrecision('f') > > would result in floats being converted to type 'f' instead. For example: That would create all the well-known problems of a global interpreter state. What if you ask for single-precision and then call a subroutine that expects double? What if you call a subroutine that changes the precision? And what about thread-safety? > The big advantage of this approach is that it's guaranteed not to break any code. Only if it's not used! ;-) Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Wed Jun 23 18:28:03 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 23 Jun 1999 19:28:03 +0200 Subject: [Matrix-SIG] Conversion of scalars. In-Reply-To: (message from Travis Oliphant on Wed, 23 Jun 1999 10:44:07 -0500 (CDT)) References: Message-ID: <199906231728.TAA06404@chinon.cnrs-orleans.fr> > Setting up some way to alter these return types would also have the effect > of changing the way sequences of these types were returned so > array([1.,3.,4.]) would become an array of type 'f'. If I understood your idea correctly, you would *never* get double precision arrays from any operation.
If that's what you want, just compile your private version of NumPy in such a way that PyArray_DOUBLE corresponds to a C float! > I like the idea of having this return type be user settable. It would be > easy to do by just having this code return what the user settable type is > for the integer, float, and complex objects of Python. As a library developer, I don't like at all that the accuracy of my calculations depends on some user settings! I have plenty of routines that would not work correctly in single precision. > Aside from introducing a global variable floating around I don't see how > to do this without an extra argument to PyArray_ObjectType, which isn't > too bad except it requires all Numeric extensions which use this function > to be changed (I'd be willing to change all of mine, though.) I wouldn't. Not because of the effort in changing the code, but because most of my NumPy code is published and has a non-negligible user base. I already spend too much time answering questions about installation problems, so I don't need the additional trouble of explaining which version of my code works with which version of NumPy. Backwards compatibility has always been a strong point of Python! Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From Oliphant.Travis@mayo.edu Wed Jun 23 19:07:20 1999 From: Oliphant.Travis@mayo.edu (Travis Oliphant) Date: Wed, 23 Jun 1999 13:07:20 -0500 (CDT) Subject: [Matrix-SIG] Conversion of scalars. In-Reply-To: <199906231728.TAA06404@chinon.cnrs-orleans.fr> Message-ID: > > If I understood your idea correctly, you would *never* get double > precision arrays from any operation.
If that's what you want, just > compile your private version of NumPy in such a way that PyArray_DOUBLE > corresponds to a C float! No that's not correct. You would get doubles whenever you have doubles as arguments to the operation. All it changes is how Python scalars and sequences are handled by default. Even if the proposed attribute is set you can still make sure they are returned as double precision by requesting it either in Python with array([3,4],Float), or in C with any of the Array construction function for which you specify a required type (which is generally always done when it is known that the routine needs double precision). > > As a library developer, I don't like at all that the accuracy of my > calculations depends on some user settings! I have plenty of routines > that would not work correctly in single precision. Help me understand a bit better, I don't see what libraries this would break. If a double array is ever introduced into the operation then everything immediately becomes double precision. The suggestion does not break that behavior it just changes the default handling of Python scalars (for which there are no single precision versions). If the user does not want single precision he/she wouldn't have to use it. The default would be as it is now. > I wouldn't. Not because of the effort in changing the code, but > because most of my NumPy is published and has a non-neglectable user > base. I already spend too much time answering questions about > installation problems, so I don't need the additional trouble of > explaining which version of my code works with which version of NumPy. > Backwards compatibility has always been a strong point of Python! I think this can be done in a backward compatible way, anyway, so changing code should not be necessary. 
Travis From tim.hochberg@ieee.org Wed Jun 23 18:27:15 1999 From: tim.hochberg@ieee.org (Tim Hochberg) Date: Wed, 23 Jun 1999 11:27:15 -0600 Subject: [Matrix-SIG] Numeric Nits Message-ID: <00bf01bebd9d$a7d19d00$623fa4cd@R20CAREY.MAYO.EDU> >The original design decision was that rank-0 arrays should never be >returned from any common operation because of their somewhat weird >behaviour. In principle rank-0 arrays *are* scalars, and in fact >rank-0 arrays practically behave like Python scalars, but many >routines that expect scalars will not accept rank-0 arrays. It would >be very confusing to have two kinds of data that are almost the same. I'll concede that virtually all the operations should go ahead and return scalars. However, the original poster referred to '-array([5])' returning a scalar. I haven't been able to think of any situations where people are going to be using negative on a rank-0 array without expecting the result to also be a rank-0 array. The chance of the rank-0 arrays produced this way getting loose and hurting people seems slim. I suppose consistency would demand that '~' also get changed. I'm not sure whether this would be a good idea or not, but I figured it's worth discussing in light of the recent flurry of posts regarding upcasting. >Another problem is that there is no clean way to extract a scalar from >a rank-0 array. Indexing requires a number of indices which is at most >equal to the rank, but you can't index with zero indices! In practice, >rank-0 arrays can be indexed with one index that has to be zero, but >that's a kludge rather than good design. Actually, I've disallowed this in JPython because it is, as you say, a kludge. There are, however, two ways of getting at the value in a rank-0 array:

a[()]
a[...]

I prefer the first (not just because it's shorter...). [SNIP] >That would create all the well-known problems of a global interpreter >state. What if you ask for single-precision and then call a subroutine >that expects double?
What if you call a subroutine that changes the >precision? And what about thread-safety? It's guilty of all these things. It was suggested as a way to make using the interpreter in interactive mode with floats less painful. I suppose it could cause lots of trouble if misused though. This isn't a very good solution in many ways, but it's the best one I've seen so far. Using floats is enough of a pain that it would probably kill you in interactive mode, so I think the lack of support for floats is a legitimate complaint. I would love to see a better solution than this one though. >> The big advantage of this approach is that it's guaranteed not to break any code. > >Only if it's not used! ;-) That's true of almost anything I suppose. -tim From hinsen@cnrs-orleans.fr Wed Jun 23 19:29:14 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 23 Jun 1999 20:29:14 +0200 Subject: [Matrix-SIG] Conversion of scalars. In-Reply-To: (message from Travis Oliphant on Wed, 23 Jun 1999 13:07:20 -0500 (CDT)) References: Message-ID: <199906231829.UAA06457@chinon.cnrs-orleans.fr> > No, that's not correct. You would get doubles whenever you have doubles as > arguments to the operation. All it changes is how Python scalars and > sequences are handled by default. Even if the proposed attribute is set But in many real-life cases, Python arrays are created from other sequences. In fact, most NumPy functions accept other sequences as well and convert them silently to arrays. All these conversions would yield single-precision arrays, and the only way to get double precision is to request it explicitly. I'd expect much existing code to break with this behaviour. > > As a library developer, I don't like at all that the accuracy of my > > calculations depends on some user settings! I have plenty of routines > > that would not work correctly in single precision. > > Help me understand a bit better, I don't see what libraries this would > break.
If a double array is ever introduced into the operation then > everything immediately becomes double precision. The suggestion does not In many cases the input is a Python scalar or a list of Python scalars, which is at some point converted to an array. > break that behavior it just changes the default handling of Python scalars > (for which there are no single precision versions). If the user does not > want single precision he/she wouldn't have to use it. The default would > be as it is now. Fine, but if much existing code doesn't work with the single-precision setting, it won't be useful. And using code that needs double precision together with code that needs single precision in one application seems like a much bigger mess than all the other proposals I have seen recently. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Wed Jun 23 19:40:45 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Wed, 23 Jun 1999 20:40:45 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <00bf01bebd9d$a7d19d00$623fa4cd@R20CAREY.MAYO.EDU> (tim.hochberg@ieee.org) References: <00bf01bebd9d$a7d19d00$623fa4cd@R20CAREY.MAYO.EDU> Message-ID: <199906231840.UAA06466@chinon.cnrs-orleans.fr> > I'll concede that virtually all the operations should go ahead and return > scalars. However, the original poster referred to '-array([5])' returning a > scalar. I haven't been able to think of any situations where people are > going to be using negative on a rank-0 array without expecting the result to > also be a rank-0 array. 
The chance of the rank-0 arrays produced this way > getting loose and hurting people seems slim. Fine, but wouldn't you also want addition, multiplication, etc. of rank-0 arrays to return rank-0 arrays for consistency? And how should the NumPy code decide if it was called with an explicit rank-0 array or with a result of a previous operation that just happens to be a rank-0 array and that the user wants converted to a scalar? > Actually, I've disallowed this in JPython because it is, as you say, a > kludge. There are however two ways of getting at the value in a rank-0 > array. > > a[()] > a[...] > > I prefer the first (not just because it's shorter...). I agree, but this is still far from obvious (try to write an explanation for the NumPy tutorial!) > This isn't a very good solution in many ways, but it's the best one I've > seen so far. Using floats is enough of a pain that it would probably kill > you in interactive mode, so I think the lack of support for floats is a > legitimate complaint. I would love to see a better solution than this one > though. I'd prefer the "special-float-type-which-is-higher-in-the-cast-hierarchy-than-double" solution. At least the "weird" property is attached to arrays, and not to a global variable. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From ransom@cfa.harvard.edu Wed Jun 23 20:16:18 1999 From: ransom@cfa.harvard.edu (Scott M.
Ransom) Date: Wed, 23 Jun 1999 15:16:18 -0400 Subject: [Matrix-SIG] Numeric Nits References: <002401bebd22$2384dc40$5a323fd1@R20CAREY.MAYO.EDU> Message-ID: <37713282.5627FF23@cfa.harvard.edu> Tim Hochberg wrote: > > I'm curious, what other casting rules you would like to change? I'm one of > the people who rarely uses arrays of type 'f', so I don't have deep insight > into what "you people" want.... Upcasting is fine most of the time -- as you know it is the "right" thing to do in order to not lose accuracy in many circumstances. The more I think about it, maybe Konrad et al. are correct with the solution of declaring certain 'f' arrays higher in the cast hierarchy than doubles. These arrays would never be up-cast to type double. This would allow the use of the same simple syntax (i.e. arange(3.0, typecode = 'f') * 2.0 gives array([ 0., 2., 4.],'f') _not_ array([ 0., 2., 4.]). The user would still have to be careful about sending the 'f' arrays to functions that expect double arrays, but we have to do that now anyways. So I don't see that as a major problem. Scott -- Scott M. Ransom Address: Harvard-Smithsonian CfA Phone: (617) 496-2526 60 Garden St. MS 10 email: ransom@cfa.harvard.edu Cambridge, MA 02138 PGP Fingerprint: D2 0E D0 10 CD 95 06 DA EF 78 FE 2B CB 3A D3 53 From rlw@stsci.edu Wed Jun 23 20:30:12 1999 From: rlw@stsci.edu (Rick White) Date: Wed, 23 Jun 1999 15:30:12 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits Message-ID: <199906231930.PAA09537@sundog.stsci.edu> Scott Ransom wrote: >The more I think about it, maybe Konrad et al. are correct with the >solution of declaring certain 'f' arrays higher in the cast hierarchy >than doubles. These arrays would never be up-cast to type double. This >would allow the use of the same simple syntax (i.e. arange(3.0, typecode >= 'f') * 2.0 gives array([ 0., 2., 4.],'f') _not_ array([ 0., 2., >4.]). 
> >The user would still have to be careful about sending the 'f' arrays to >functions that expect double arrays, but we have to do that now >anyways. So I don't see that as a major problem. I also agree that this is probably the best solution. Presumably this would be applied to all the types, so that there would be special non-promoting versions of bytes, shorts, etc. too. The only drawback I see to this approach is that the proliferation of types will make Numeric C code messier. If that is acceptable, though, this looks like it should satisfy everyone. From rlw@stsci.edu Wed Jun 23 22:12:07 1999 From: rlw@stsci.edu (Rick White) Date: Wed, 23 Jun 1999 17:12:07 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits Message-ID: <199906232112.RAA10630@sundog.stsci.edu> Tim Hochberg wrote: % Would it be possible and/or desirable to just have this special type modify % the ways it casts scalars and not other arrays. For example, to behave as: % % >>> a = arange(5, 'f2') % >>> 5 * a % array([0,5,10,15,20], 'f2') % >>> arange(5, 'd') * a % array([0,1,4,9,16], 'd') % % Or is that even more confusing.... I think it would be simpler both to implement and to understand if there was just a 'd2' datatype that is above 'd' and if the promotion hierarchy is defined so that binary operations between 'f2' and 'd' get promoted to 'f2', but binary operations between 'f2' and 'd2' get promoted to 'd2'. Python scalars would get turned into arrays of type 'd', and everything would work just fine (he said optimistically.) I figured if such a second hierarchy were defined, for my applications I would *always* use the higher types 'f2', 'd2', 'i2' etc. for every array I create. Then the effect would be what you want, which is that scalars would not cause promotion. I can see some tricky issues though involving scalar floats (type 'd') and non-promoting integers (type 'i2'). 
I think I want this behavior:

>>> a = arange(5,'i2')
>>> a
array([0, 1, 2, 3, 4], 'i2')
>>> a*3.14
array([ 0.  ,  3.14,  6.28,  9.42, 12.56], 'f2')

I.e. a 'd' scalar gets converted to type 'f2' when it is used with any type 2 Numeric array except for type 'd2'. I can imagine it may be hard to define this behavior so everyone agrees on it. From Christos Siopis Wed Jun 23 22:34:13 1999 From: Christos Siopis (Christos Siopis) Date: Wed, 23 Jun 1999 17:34:13 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <37713282.5627FF23@cfa.harvard.edu> Message-ID: On Wed, 23 Jun 1999, Scott M. Ransom wrote: > The more I think about it, maybe Konrad et al. are correct with the > solution of declaring certain 'f' arrays higher in the cast hierarchy > than doubles. These arrays would never be up-cast to type double. This > would allow the use of the same simple syntax (i.e. arange(3.0, typecode > = 'f') * 2.0 gives array([ 0., 2., 4.],'f') _not_ array([ 0., 2., > 4.]). However, just to play devil's advocate, suppose a new user *does* know that Python natively only supports doubles and that upcasting is the normal behavior --then s/he might be confused that the result is a float. Alternatively, what if one *does* want the result of arange(3.0, typecode = 'f') * 2.0 to be a double? (ok here one could simply use 'd' instead of 'f' but that could be an issue in more complex expressions, where one might be forced to convert the whole expression back to double). Furthermore, if types such as bytes, shorts, etc. are also introduced, figuring out the resulting type might get messy, e.g. would array([0,2,4],'Int32') * 2 be an Int32 (because Int64 is the Python default, in analogy with Float32 and Float64)? How about: array([0,2,4],'Int32') + array([9,10,11],'Int64') or even array([0.,2.,4.],'Float32') + array([9.,10.,11.],'Float64') (i.e., would the 'f'-arrays-higher-in-the-cast-hierarchy apply only with respect to scalar doubles or also with respect to array doubles?)
Even if such rules could be clearly set out, would they not make Numeric programming more complicated and error-prone? I was thinking that using single-letter appendices at the end of numbers to designate non-default types might serve as a solution. E.g.:

2.4f , 2.4e03f : float (single precision)
2.4d == 2.4 , 2.4e03 : double (Python default)
32b : 8-bit integer
32s : 16-bit integer
32i : 32-bit integer
32 : 64-bit integer (Python default)

The advantage of this is that we need not change upcasting rules and no old code would break. Simply, new code that wants to take advantage of this facility might do so. E.g., arange(3.0, typecode = 'f') * 2.0 would be a double, as it is now. To get a float, use instead: arange(3.0, typecode = 'f') * 2.0f and so on. Please do not jump on me if this is impossible to implement for some reason that has to do with Python's innards (I am new to all this) :) (would it require syntax extensions which are not permitted to modules?) but it seems to me that, if something like this is doable, it would be a nice compromise between clarity, adding new features and not breaking old code. Christos Siopis From dubois1@llnl.gov Wed Jun 23 22:51:45 1999 From: dubois1@llnl.gov (Paul F. Dubois) Date: Wed, 23 Jun 1999 14:51:45 -0700 Subject: [Matrix-SIG] Numeric Nits References: Message-ID: <000701bebdc2$96cd61e0$f4160218@plstn1.sfba.home.com>

> I was thinking that using single-letter appendices at the end of numbers
> to designate non-default types might serve as a solution. E.g.:
>
> 2.4f , 2.4e03f : float (single precision)
> 2.4d == 2.4 , 2.4e03 : double (Python default)
> 32b : 8-bit integer
> 32s : 16-bit integer
> 32i : 32-bit integer
> 32 : 64-bit integer (Python default)
>
> Christos Siopis

And there you almost have the Fortran 90 solution:

integer, parameter:: normal = selected_real_kind(6,35)
integer, parameter:: precise = selected_real_kind(14, 100)
real(normal) x
real(precise) y
y = real (x, precise)  ! conversion operator
x = 1.23_normal        ! constant, normal precision
real z                 ! default real precision, processor dependent
z = 3.                 ! default real constant, processor dependent

Here normal and precise are called "kind parameters". The intrinsic selected_real_kind is a compile-time-callable function. The function selected_real_kind returns a "kind parameter" that promises the given number of decimal digits of accuracy and at least that much exponent range (e.g. 1e-35 to 1e+35). If one is going to do anything to Python or Numeric's precision we ought to go for the gold and be the SECOND language in which it is possible to do portable numerical programming. The current definition of a default real in Python is that it is the same as a C double. That is worth knowing but not actually a guarantee of anything. Basing your programming on the representation of the number as occupying a certain number of bits is very odd, if you think about it. From ransom@cfa.harvard.edu Wed Jun 23 19:47:48 1999 From: ransom@cfa.harvard.edu (Scott M. Ransom) Date: Wed, 23 Jun 1999 18:47:48 +0000 Subject: [Matrix-SIG] Numeric Nits References: Message-ID: <37712BD4.31D78D26@cfa.harvard.edu> Christos Siopis wrote: > > On Wed, 23 Jun 1999, Scott M. Ransom wrote: > > > The more I think about it, maybe Konrad et al. are correct with the > > solution of declaring certain 'f' arrays higher in the cast hierarchy > > than doubles. These arrays would never be up-cast to type double. This > > would allow the use of the same simple syntax (i.e. arange(3.0, typecode > = 'f') * 2.0 gives array([ 0., 2., 4.],'f') _not_ array([ 0., 2., > 4.]). > > However, just to play devil's advocate, suppose a new user *does* know > that Python natively only supports doubles and that upcasting is the > normal behavior --then s/he might be confused that the result is a float. > Alternatively, what if one *does* want the result of arange(3.0, typecode > = 'f') * 2.0 to be a double?
(ok here one could simply use 'd' instead of > 'f' but that could be an issue in more complex expressions, where one > might be forced to convert the whole expression back to double). > > Furthermore, if types such as bytes, shorts, etc. are also introduced, > figuring out the resulting type might get messy, e.g. would I guess I didn't state this very clearly, but I was thinking of a _new_ type code for single-precision floats that would never be up-cast. Call the type 'fnc' for float no-cast or something. Then a standard 'f' type array performs as usual, but an 'fnc' array would not be up-cast. So my example above should read: (i.e. arange(3.0, typecode = 'fnc') * 2.0 gives array([ 0., 2., 4.],'fnc') _not_ array([ 0., 2., 4.]). I believe that the greatest need for this type is on the floating point side of things. Since Python uses Int32 as the standard length integer (or am I wrong about this on 64bit machines...), casts from 'int' to 'long' are not a problem. If people are concerned with short and byte arrays on the other hand, it should be possible to create no-cast versions of these types (although now we are talking about a whole lotta complication). Scott -- Scott M. Ransom Phone: (781) 320-9867 Address: 75 Sanderson Ave. email: ransom@cfa.harvard.edu Dedham, MA 02026 PGP Fingerprint: D2 0E D0 10 CD 95 06 DA EF 78 FE 2B CB 3A D3 53 From dlr@postbox.ius.cs.cmu.edu Thu Jun 24 00:46:25 1999 From: dlr@postbox.ius.cs.cmu.edu (dlr@postbox.ius.cs.cmu.edu) Date: Wed, 23 Jun 99 19:46:25 EDT Subject: [Matrix-SIG] Conversion of scalars. Message-ID: <199906232346.TAA21928@python.org> > [Much discussion of how to prevent upcasting of float arrays to > double.] Wasn't there a recent post promising clean ways to subclass Numpy arrays? If there is a way to cleanly subclass arrays, could we get some mileage out of creating a non-promoting subclass? 
The non-promoting version could define operators which type-check their arguments, demote them if necessary, and then call the operators of the parent class. Or am I confused? David From hinsen@cnrs-orleans.fr Thu Jun 24 13:31:28 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 24 Jun 1999 14:31:28 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: (message from Christos Siopis on Wed, 23 Jun 1999 17:34:13 -0400 (EDT)) References: Message-ID: <199906241231.OAA07003@chinon.cnrs-orleans.fr> > However, just to play devil's advocate, suppose a new user *does* know > that Python natively only supports doubles and that upcasting is the > normal behavior --then s/he might be confused that the result is a float. That's why this would be a special-purpose *additional* float32 type. New users wouldn't know about it and not use it, and those who do use it are supposed to know why! > Alternatively, what if one *does* want the result of arange(3.0, typecode > = 'f') * 2.0 to be a double? (ok here one could simply use 'd' instead of You'd cast it to a standard float32 first; that operation could be implemented without copying the array data, so without serious memory overhead. > Furthermore, if types such as bytes, shorts, etc. are also introduced, > figuring out the resulting type might get messy, e.g. would Indeed. Again, new users are not supposed to mess around with this. > How about: > > array([0,2,4],'Int32') + array([9,10,11],'Int64') > > or even > > array([0.,2.,4.],'Float32') + array([9.,10.,11.],'Float64') > > (i.e., would the 'f'-arrays-higher-in-the-case-hierarchy apply only with > respect to scalar doubles or also with respect to array doubles?) Scalar and array double are always treated in the same way, and I don't think this should be changed. > Even if such rules could be clearly set out, would they not make > Numeric programming more complicated and error-prone? Of course, but no more than any other proposition made so far. 
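[The scalar behaviour argued for in this thread is essentially what NumPy later adopted: a Python scalar does not upcast a float32 array, while a double array still does. A minimal sketch against modern numpy, not the 1999 Numeric API:]

```python
import numpy as np

a = np.arange(3.0, dtype=np.float32)   # the 'f' array from the examples above

# A Python scalar no longer promotes the array to double...
print((a * 2.0).dtype)             # float32

# ...but mixing in a double *array* still does:
print((a + np.arange(3.0)).dtype)  # float64
```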
> I was thinking that using single-letter appendices at the end of numbers > to designate non-default types might serve as a solution. E.g.: > > 2.4f , 2.4e03f : float (single precision) > 2.4d == 2.4 , 2.4e03 : double (Python default) Fine, but that requires a change to the Python interpreter (implementation of new data types plus special syntax for literals). Not impossible, but I think Guido will want to see some very good arguments. > (would it require syntax extensions which are not permitted to modules?) Yes. Modules can implement new data types, but no new literals. The Python parser would have to know that 2.0f is legal syntax, for example. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Thu Jun 24 13:44:09 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 24 Jun 1999 14:44:09 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <000701bebdc2$96cd61e0$f4160218@plstn1.sfba.home.com> (dubois1@llnl.gov) References: <000701bebdc2$96cd61e0$f4160218@plstn1.sfba.home.com> Message-ID: <199906241244.OAA07039@chinon.cnrs-orleans.fr> > If one is going to do anything to Python or Numeric's precision we ought to > go for the gold and be the SECOND language in which it is possible to do > portable numerical programming. The current definition of a default real in I like that idea - but how would one implement it in portable C? And what would Python do to satisfy "unreasonable" precision requests? Use some arbitrary-precision library? In fact, what do Fortran 90 compilers do? 
BTW, I'd also prefer Python integers to be of the same size on all machines, most of all to be able to deal with binary files in a portable way. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From dubois1@llnl.gov Thu Jun 24 14:56:25 1999 From: dubois1@llnl.gov (Paul F. Dubois) Date: Thu, 24 Jun 1999 06:56:25 -0700 Subject: [Matrix-SIG] Numeric Nits References: <000701bebdc2$96cd61e0$f4160218@plstn1.sfba.home.com> <199906241244.OAA07039@chinon.cnrs-orleans.fr> Message-ID: <002401bebe49$597236e0$f4160218@plstn1.sfba.home.com> ----- Original Message ----- From: Konrad Hinsen To: Cc: Sent: Thursday, June 24, 1999 5:44 AM Subject: Re: [Matrix-SIG] Numeric Nits > > If one is going to do anything to Python or Numeric's precision we ought to > > go for the gold and be the SECOND language in which it is possible to do > > portable numerical programming. The current definition of a default real in > > I like that idea - but how would one implement it in portable C? And > what would Python do to satisfy "unreasonable" precision requests? Use > some arbitrary-precision library? In fact, what do Fortran 90 compilers > do? > > BTW, I'd also prefer Python integers to be of the same size on all > machines, most of all to be able to deal with binary files in a > portable way. > -- Fortran 90 compilers get an error if your request is "unreasonable" for that compiler. The same kind syntax is also used for controlling integer variable size. I presume that a routine to calculate the word sizes is possible or that something can be done with configure, but I haven't thought it out. 
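[The run-time probing Paul alludes to is straightforward with today's Python standard library; a sketch, with illustrative function names rather than any actual Numeric API:]

```python
import struct
import sys

def float_properties():
    # Properties of the C double underlying Python floats
    return {'decimal_digits': sys.float_info.dig,
            'mantissa_bits': sys.float_info.mant_dig,
            'max_exp10': sys.float_info.max_10_exp}

def c_int_sizes():
    # struct reports the byte sizes of the platform's C integer types
    return {name: struct.calcsize(code)
            for name, code in [('short', 'h'), ('int', 'i'), ('long', 'l')]}

print(float_properties())
print(c_int_sizes())
```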
From jh@oobleck.tn.cornell.edu Thu Jun 24 16:08:17 1999 From: jh@oobleck.tn.cornell.edu (Joe Harrington) Date: Thu, 24 Jun 1999 11:08:17 -0400 Subject: [Matrix-SIG] RE: Numeric Nits Message-ID: <199906241508.LAA15305@oobleck.tn.cornell.edu> A good posting from the STScI folks. I agree that Numeric must pay attention to speed and memory usage issues, even if Python doesn't. In an ideal world, these practical concerns and their solutions would be fed back to Guido and future language authors, whose professional experience with these issues is evidently rather limited. People have been talking about these problems since at least the late 1960s, when "double" became the default type for scientific notation in C and Fortran programs and all casts were up. That said, working with 16k**2 images is inherently hard, and to do so well will require a solution like what Saoimage, Gimp, Adobe Illustrator, and other image programs provide, namely indexed subsampling. You make a subsampled image that the user looks at on screen and does sample calculations on. When you're happy with the results there, you do the calculations on the full image. In visualization programs, you can run a magnifying glass around the subsampled image and see a full-resolution image section in another window. This would seem inherently easy to do in a class, such that you could just set a switch to tell whether to use the subsampled image or the real thing. A method could give back a local blowup given a location in the subsample and the desired size. On another note, I, too, got unsubscribed against my will, and was out of action for at least 2 months. I missed most of the discussion regarding my previous postings on a wrappable numeric library. I don't know how this happened but perhaps it might be worth looking into, given that it's happened to many of us. 
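[The indexed-subsampling scheme described above can be sketched as a tiny class: a coarse view for interactive work, plus a method that maps a subsample location back to a full-resolution section. A hypothetical illustration, not any existing package's API:]

```python
class SubsampledImage:
    """Work on a coarse view of a large image; pull full-resolution
    sections on demand (illustrative sketch)."""
    def __init__(self, data, step):
        self.data = data          # full-resolution image, a list of rows
        self.step = step
    def subsampled(self):
        # every step-th pixel of every step-th row
        return [row[::self.step] for row in self.data[::self.step]]
    def blowup(self, sub_row, sub_col, size):
        # map a subsample location back to full resolution and return
        # a size x size full-resolution section starting there
        r, c = sub_row * self.step, sub_col * self.step
        return [row[c:c + size] for row in self.data[r:r + size]]

img = [[10 * r + c for c in range(8)] for r in range(8)]
view = SubsampledImage(img, step=4)
print(view.subsampled())      # [[0, 4], [40, 44]]
print(view.blowup(1, 1, 2))   # [[44, 45], [54, 55]]
```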
--jh-- From hinsen@cnrs-orleans.fr Thu Jun 24 18:59:19 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 24 Jun 1999 19:59:19 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <002401bebe49$597236e0$f4160218@plstn1.sfba.home.com> (dubois1@llnl.gov) References: <000701bebdc2$96cd61e0$f4160218@plstn1.sfba.home.com> <199906241244.OAA07039@chinon.cnrs-orleans.fr> <002401bebe49$597236e0$f4160218@plstn1.sfba.home.com> Message-ID: <199906241759.TAA09024@chinon.cnrs-orleans.fr> > Fortran 90 compilers get an error if your request is "unreasonable" for that > compiler. > The same kind syntax is also used for controlling integer variable size. > > I presume that a routine to calculate the word sizes is possible or that > something can be done with configure, but I haven't thought it out. Both integer and float type properties can be analyzed at configure time or at run time, that's no problem. For floats suitable code could be stolen from LAPACK, and for integers it is very simple to write. All this sounds like a good proposal for Python 2.0, together with a general rework of the number/arithmetic system. Maybe the Matrix-SIG could work out a detailed proposal to be submitted to Guido. Here are some features that I would like to see: - Support of different precisions in a portable way (see above). - Integration of arrays into the Python core, equivalence between scalars and rank-0 arrays. - integration of all number-like data types into a single data type, with integers, floats, etc. being only different internal representations of numbers. That would remove oddities such as 1/2 == 0 (the single most surprising feature in Python for newcomers in my opinion). Konrad. 
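[The scalar/rank-0 distinction in the wish list above is easy to demonstrate with today's numpy, where both kinds of object exist side by side; modern API, not 1999 Numeric:]

```python
import numpy as np

b = np.array(5)        # a rank-0 array: shape ()
print(b.shape)         # ()
print(b[()])           # 5 -- the empty-tuple index extracts the scalar
print(b[...].shape)    # () -- Ellipsis returns another rank-0 array
print((-np.array([5])).shape)  # (1,) -- negation preserves the rank-1 shape
```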
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Thu Jun 24 19:07:08 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 24 Jun 1999 20:07:08 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <37712BD4.31D78D26@cfa.harvard.edu> (ransom@cfa.harvard.edu) References: <37712BD4.31D78D26@cfa.harvard.edu> Message-ID: <199906241807.UAA09043@chinon.cnrs-orleans.fr> > side of things. Since Python uses Int32 as the standard length integer > (or am I wrong about this on 64bit machines...), casts from 'int' to Python ints are C longs. On Alphas, for example, this means 64 bit numbers. And yes, this does create compatibility problems, it is for example impossible to read integer arrays pickled on an Alpha on any other machine. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Thu Jun 24 19:10:52 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 24 Jun 1999 20:10:52 +0200 Subject: [Matrix-SIG] Conversion of scalars. In-Reply-To: <199906232346.TAA21928@python.org> (dlr@postbox.ius.cs.cmu.edu) References: <199906232346.TAA21928@python.org> Message-ID: <199906241810.UAA09047@chinon.cnrs-orleans.fr> > If there is a way to cleanly subclass arrays, could we get some > mileage out of creating a non-promoting subclass? 
The non-promoting > version could define operators which type-check their arguments, > demote them if necessary, and then call the operators of the parent > class. That's certainly doable. Depending on how the subclassing will work (is it via ExtensionTypes?), it could mean reimplementing all arithmetic operations for non-promoting arrays. And C routines that expect array arguments would perhaps refuse non-promoting array arguments. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From emmanuel.viennet@lipn.univ-paris13.fr Thu Jun 24 21:28:28 1999 From: emmanuel.viennet@lipn.univ-paris13.fr (Emmanuel Viennet) Date: Thu, 24 Jun 1999 22:28:28 +0200 Subject: [Matrix-SIG] Numeric Pickling Message-ID: <377294EC.772F9510@lipn.univ-paris13.fr> The current (LLNL11) Numeric pickling mechanism seems broken: it no longer handles byte-ordering issues. E.g., if we write data on a BigEndian machine:

>>> x = array( [ 1234567, -1234567 ] )
>>> pickle.dump( x, open( 'test.pck', 'w' ) )

and then read the file on a LittleEndian machine (e.g. a PC):

>>> x = pickle.load( open('test.pck') )

we get x == array([-2016013824, 2032791039]). Of course, a call to x.byteswapped() fixes the problem, but pickled objects are supposed to be portable across different architectures! The problem is that there is currently no information in the pickle file about byte-ordering.
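[The scrambled numbers in the example above have a simple mechanical explanation, which the standard library can reproduce directly; a sketch with struct, independent of Numeric:]

```python
import struct
import sys

value = 1234567
big_endian_bytes = struct.pack('>i', value)   # as a BigEndian machine stores it

# Reading the bytes back with the declared order recovers the value:
print(struct.unpack('>i', big_endian_bytes)[0])   # 1234567

# Reading them with the wrong order yields exactly the garbage above:
print(struct.unpack('<i', big_endian_bytes)[0])   # -2016013824

print(sys.byteorder)   # 'little' or 'big': the tag a pickle would need to carry
```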
I propose to change the pickling code (at the end of Numeric.py) to:

# --
import copy_reg

def array_constructor(shape, typecode, thestr, Endian=LittleEndian):
    print 'Endian=%s' % Endian
    x = fromstring(thestr, typecode)
    x.shape = shape
    if LittleEndian != Endian:
        return x.byteswapped()
    else:
        return x

def pickle_array(a):
    return array_constructor, (a.shape, a.typecode(), a.tostring(),
                               LittleEndian)

copy_reg.pickle(ArrayType, pickle_array, array_constructor)
# --

This solution makes it possible to read existing files, defaulting to the current (buggy) behavior (assume native byte ordering if there is no information in the pickle). Emmanuel -- Emmanuel Viennet: LIPN - Institut Galilee - Universite Paris-Nord 93430 Villetaneuse - France http://www-lipn.univ-paris13.fr/~viennet/ From siopis@astro.ufl.edu Fri Jun 25 06:33:06 1999 From: siopis@astro.ufl.edu (Christos Siopis) Date: Fri, 25 Jun 1999 01:33:06 -0400 (EDT) Subject: [Matrix-SIG] Numeric Nits In-Reply-To: <199906241231.OAA07003@chinon.cnrs-orleans.fr> Message-ID: On Thu, 24 Jun 1999, Konrad Hinsen wrote: > > However, just to play devil's advocate, suppose a new user *does* know > > that Python natively only supports doubles and that upcasting is the > > normal behavior --then s/he might be confused that the result is a float. > > That's why this would be a special-purpose *additional* float32 type. > New users wouldn't know about it and not use it, and those who do use > it are supposed to know why! > > .... > > > Furthermore, if types such as bytes, shorts, etc. are also introduced, > > figuring out the resulting type might get messy, e.g. would > > Indeed. Again, new users are not supposed to mess around with this. I would be somewhat reluctant to make this distinction between new and experienced users. I think it has more to do with what one is working on rather than his/her level of experience.
Someone doing number crunching, whether new or experienced, would most probably not have to deal with multiple data types beyond what NumPy currently supports. However, someone doing data analysis could well need access to 8-bit or 16-bit integers from day one (e.g., if I have an RGB color datacube with 8-bit depth, I might get annoyed if it is *automatically* converted to Int32 and then I'd have to convert it back to Int8 to save it and so on). Christos From hinsen@cnrs-orleans.fr Fri Jun 25 11:04:35 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Fri, 25 Jun 1999 12:04:35 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: (message from Christos Siopis on Fri, 25 Jun 1999 00:29:38 -0400 (EDT)) References: Message-ID: <199906251004.MAA09629@chinon.cnrs-orleans.fr> > > - Support of different precisions in a portable way (see above). > > Though this would certainly be desirable for numerical work, does it > endanger in any way the ability to request a fixed-bitlength type? For No. As an alternative for requesting a particular precision, you could request a particular storage size. > > - integration of all number-like data types into a single > > data type, with integers, floats, etc. being only different > > internal representations of numbers. That would remove oddities > > such as 1/2 == 0 (the single most surprising feature in Python > > for newcomers in my opinion). > > I admit I fail to see why the aforementioned integration of all number-like > data types would remove the "oddity" that 1/2 == 0, but I humbly submit Not directly. But the main argument for requiring 1/2 to be zero has always been that an operation between two objects of integer type should return a result of the same type. Removing that argument is a first step towards sanity ;-) > that this is what all Fortran and C programmers have come to accept (and > often take advantage of!) and so I see it as a feature rather than a bug.
To quote a visiting graduate student who has recently started Python after previous Fortran experience, and who spent three days searching for a bug that turned out to be an integer division problem: "I knew this problem from Fortran, but Python is a so much nicer language that I was sure they had fixed this." During the last year I have worked with three students who started with Python, and the only problem that all of them encountered and considered unreasonable was integer division. Of course an integer division operation should exist, but it should not be the one associated with the general division operator. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen@cnrs-orleans.fr Fri Jun 25 11:09:49 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Fri, 25 Jun 1999 12:09:49 +0200 Subject: [Matrix-SIG] Numeric Nits In-Reply-To: (message from Christos Siopis on Fri, 25 Jun 1999 01:33:06 -0400 (EDT)) References: Message-ID: <199906251009.MAA09638@chinon.cnrs-orleans.fr> > > > Furthermore, if types such as bytes, shorts, etc. are also introduced, > > > figuring out the resulting type might get messy, e.g. would > > > > Indeed. Again, new users are not supposed to mess around with this. > > I would be somewhat reluctant to make this distinction between new and > experienced users. I think it has more to do with what one is working on > rather than his/her level of experience. Someone doing number crunching, I understand "new user" as "new to data size and casting problems", not "new to Python". Konrad. 
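[This is in fact how the language later resolved it: PEP 238 gives the general operator true division and makes integer division its own operator. Shown here with modern Python, long after this thread:]

```python
# The division Konrad objects to, and the later fix (PEP 238):
print(1 / 2)     # 0.5  -- the general operator now does true division
print(1 // 2)    # 0    -- integer (floor) division is a separate operator
print(-7 // 2)   # -4   -- floor division rounds toward negative infinity
```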
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen@cnrs-orleans.fr Fri Jun 25 15:56:24 1999
From: hinsen@cnrs-orleans.fr (Konrad Hinsen)
Date: Fri, 25 Jun 1999 16:56:24 +0200
Subject: [Matrix-SIG] Numeric Pickling
In-Reply-To: <377294EC.772F9510@lipn.univ-paris13.fr> (message from Emmanuel Viennet on Thu, 24 Jun 1999 22:28:28 +0200)
References: <377294EC.772F9510@lipn.univ-paris13.fr>
Message-ID: <199906251456.QAA10174@chinon.cnrs-orleans.fr>

> The current (LLNL11) Numeric pickling mechanism
> seems broken: it no longer handles byte-ordering
> issues.

I am quite certain that this works for me. I have used pickle files
containing array data on various architectures and I never had
problems. The last such massive data movement I did was last week,
using LLNL11 on all machines involved.

> The problem is that there is currently no information in the
> pickle file about byte-ordering.
>
> I propose to change the pickling code (at the end of Numeric.py) to:

Ah, I see where the difference is. Apparently someone added support for
pickling with recent Python pickle versions to NumPy. Being a NumPy
veteran, I used (and still use) the subclassed pickler and unpickler
from Numeric. I didn't even know that the newer mechanism had been
implemented!

Anyway, the "old" pickle code works. So it seems we have two pickle
implementations, which don't use the same file format! And only one of
them works correctly.

Maybe this is a good time to work on a better implementation of array
pickling, supporting cPickle as well as pickle, and not causing the
tremendous memory overhead of calling tostring().
Plus of course reading both earlier file formats for compatibility!

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From da@ski.org Fri Jun 25 17:43:58 1999
From: da@ski.org (David Ascher)
Date: Fri, 25 Jun 1999 09:43:58 -0700 (Pacific Daylight Time)
Subject: [Matrix-SIG] Numeric Pickling
In-Reply-To: <199906251456.QAA10174@chinon.cnrs-orleans.fr>
Message-ID: 

On Fri, 25 Jun 1999, Konrad Hinsen wrote:

> Ah, I see where the difference is. Apparently someone added support
> for pickling with recent Python pickle versions to NumPy. Being a
> NumPy veteran, I used (and still use) the subclassed pickler and
> unpickler from Numeric. I didn't even know that the newer mechanism
> had been implemented!

I suspect that I'm the guilty party. I apologize.

> Anyway, the "old" pickle code works. So it seems we have two pickle
> implementations, which don't use the same file format! And only one
> of them works correctly.
>
> Maybe this is a good time to work on a better implementation of array
> pickling, supporting cPickle as well as pickle, and not causing the
> tremendous memory overhead of calling tostring(). Plus of course reading
> both earlier file formats for compatibility!

How about we start a conversation off-line (Konrad, Emmanuel and me) to
hash this out? I only have access to one byte-ordering kind, so testing
this is nontrivial for me. The rest of you, stay tuned.
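[The repair Emmanuel proposes, recording byte order in the pickle stream, can be sketched in modern NumPy terms; the function names and on-disk format below are invented for illustration and are not the Numeric code under discussion:]

```python
import io
import pickle
import numpy as np

def dump_array(a, f):
    # Store the shape plus a dtype string such as '<f8': the '<' / '>'
    # prefix records the byte order explicitly -- exactly the
    # information missing from the broken pickle format.
    le = a.astype(a.dtype.newbyteorder('<'))   # normalize to little-endian
    pickle.dump((a.shape, le.dtype.str, le.tobytes()), f)

def load_array(f):
    shape, dtype, data = pickle.load(f)
    # frombuffer honors the byte-order prefix, so the array is
    # reconstructed correctly on machines of either endianness.
    return np.frombuffer(data, dtype=dtype).reshape(shape)

# Round-trip check on this machine.
x = np.arange(6, dtype=np.float64).reshape(2, 3)
buf = io.BytesIO()
dump_array(x, buf)
buf.seek(0)
y = load_array(buf)
assert (x == y).all()
```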
--david

From frohne@gci.net Fri Jun 25 19:19:22 1999
From: frohne@gci.net (Ivan Frohne)
Date: Fri, 25 Jun 1999 10:19:22 -0800
Subject: [Matrix-SIG] Numeric Nits
In-Reply-To: <199906251004.MAA09629@chinon.cnrs-orleans.fr>
Message-ID: <000001bebf37$3fac1020$f4529fd0@dellxpsd333>

> Of course an integer division operation should exist, but it should
> not be the one associated with the general division operator.

Fortran, and other strongly typed languages, may safely utilize a
general division operator. If J and K are Fortran integers, then so is
J / K. But in Python, who knows what type J and K are? You may intend
them to be integers, but if the previous assignment to J was, for
example, J = math.sqrt(100), J is now a double. If you want to be sure
that J / K is an integer, you must use int(J) / int(K). Or, if you
don't want 1 / 2 == 0, use float(J) / float(K). In Python, the general
division operator is unsafe.

But to require a different division operator for every numerical type
(integer, float and complex) would be inconvenient. And maybe
convenience ranks higher than safety. That's above my pay grade.

--Ivan Frohne

From kernr@ncifcrf.gov Tue Jun 29 02:48:35 1999
From: kernr@ncifcrf.gov (Robert Kern)
Date: Mon, 28 Jun 1999 21:48:35 -0400
Subject: [Matrix-SIG] FFTW Numpy 0.6 Binaries for Win32
Message-ID: <377825F3.2C28EEEE@mail.ncifcrf.gov>

I have compiled Travis Oliphant's FFTW wrappers for Win32 systems. They
are compiled against FFTW version 2.1.2, and the underlying libraries
have been optimized for Pentium processors (to an extent). Unlike my
previous attempts, these binaries are now faster than the FFTPACK
wrapper distributed in LLNLDistribution11 on my Pentium and Pentium II
machines (Win95 and Win98, respectively). At least, according to the
benchFFT2.py script that Travis provides.
The archive is at:

  http://starship.python.net/crew/kernr/binaries/fftw-numpy-0.6w.zip

My collection of other binaries is at:

  http://starship.python.net/crew/kernr/binaries/Binaries.html

Have fun and tell me if they fail for you.

--
Robert Kern           |
----------------------|"In the fields of Hell where the grass grows high
This space            | Are the graves of dreams allowed to die."
intentionally         |  - Richard Harter
left blank.           |

From HYoon@exchange.ml.com Tue Jun 29 15:40:18 1999
From: HYoon@exchange.ml.com (Yoon, Hoon (CICG - NY Program Trading))
Date: Tue, 29 Jun 1999 10:40:18 -0400
Subject: [Matrix-SIG] Free Fortran Compiler for F90
Message-ID: 

http://www.lahey.com/elfform.htm
http://www.indowsway.com/demo.htm

Note that the Lahey compiler is for F90 only. Indowsway might help with
that. The final code may be slower than g77, however.

**************************************************************
S. Hoon Yoon (Quant)            Merrill Lynch Equity Trading
yelled@yahoo.com  hoon@bigfoot.com(w)
"Miracle is always only few standard deviations away, but so is
catastrophe."
* Expressed opinions are often my own, but NOT my employer's.
"I feel like a fugitive from the law of averages." Mauldin
**************************************************************