From charlesr.harris at gmail.com  Fri Dec  1 00:12:03 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 30 Nov 2006 22:12:03 -0700
Subject: [Numpy-discussion] save a matrix
In-Reply-To:
References:
Message-ID:

On 11/30/06, Keith Goodman wrote:
>
> What's a good way to save matrix objects to file for later use? I just
> need something quick for debugging.
>
> I saw two suggestions on this list from Francesc Altet (2006-05-22):
>
> 1. Use tofile and fromfile and save the meta data yourself.
>
> 2. pytables
>
> Any suggestions for #3?

Is this what you want?

In [14]: a
Out[14]:
matrix([[2, 3],
        [4, 5]])

In [15]: b
Out[15]:
matrix([[2, 3],
        [4, 5]])

In [16]: f = open('dump.pkl','w')

In [17]: pickle.dump(a,f)

In [18]: pickle.dump(b,f)

In [19]: f.close()

In [20]: f = open('dump.pkl','r')

In [21]: x = pickle.load(f)

In [22]: y = pickle.load(f)

In [23]: f.close()

In [24]: x
Out[24]:
matrix([[2, 3],
        [4, 5]])

In [25]: y
Out[25]:
matrix([[2, 3],
        [4, 5]])

Chuck

From kwgoodman at gmail.com  Fri Dec  1 00:17:29 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Thu, 30 Nov 2006 21:17:29 -0800
Subject: [Numpy-discussion] save a matrix
In-Reply-To:
References:
Message-ID:

On 11/30/06, Charles R Harris wrote:
>
> On 11/30/06, Keith Goodman wrote:
> > What's a good way to save matrix objects to file for later use? I just
> > need something quick for debugging.
> >
> > I saw two suggestions on this list from Francesc Altet (2006-05-22):
> >
> > 1. Use tofile and fromfile and save the meta data yourself.
> >
> > 2. pytables
> >
> > Any suggestions for #3?
>
> Is this what you want?
>
> In [14]: a
> Out[14]:
> matrix([[2, 3],
>         [4, 5]])
>
> In [15]: b
> Out[15]:
> matrix([[2, 3],
>         [4, 5]])
>
> In [16]: f = open('dump.pkl','w')
>
> In [17]: pickle.dump(a,f)
>
> In [18]: pickle.dump(b,f)
>
> In [19]: f.close()
>
> In [20]: f = open('dump.pkl','r')
>
> In [21]: x = pickle.load(f)
>
> In [22]: y = pickle.load(f)
>
> In [23]: f.close()
>
> In [24]: x
> Out[24]:
> matrix([[2, 3],
>         [4, 5]])
>
> In [25]: y
> Out[25]:
> matrix([[2, 3],
>         [4, 5]])

Yes. That will do very well. You got me out of a pickle.

From charlesr.harris at gmail.com  Fri Dec  1 00:19:24 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 30 Nov 2006 22:19:24 -0700
Subject: [Numpy-discussion] save a matrix
In-Reply-To:
References:
Message-ID:

On 11/30/06, Charles R Harris wrote:
>
> On 11/30/06, Keith Goodman wrote:
> >
> > What's a good way to save matrix objects to file for later use? I just
> > need something quick for debugging.
> >
> > I saw two suggestions on this list from Francesc Altet (2006-05-22):
> >
> > 1. Use tofile and fromfile and save the meta data yourself.
> >
> > 2. pytables
> >
> > Any suggestions for #3?
>
> Is this what you want?
>
> In [14]: a
> Out[14]:
> matrix([[2, 3],
>         [4, 5]])
>
> In [15]: b
> Out[15]:
> matrix([[2, 3],
>         [4, 5]])
>
> In [16]: f = open('dump.pkl','w')
>
> In [17]: pickle.dump(a,f)
>
> In [18]: pickle.dump(b,f)
>
> In [19]: f.close()
>
> In [20]: f = open('dump.pkl','r')
>
> In [21]: x = pickle.load(f)
>
> In [22]: y = pickle.load(f)
>
> In [23]: f.close()
>
> In [24]: x
> Out[24]:
> matrix([[2, 3],
>         [4, 5]])
>
> In [25]: y
> Out[25]:
> matrix([[2, 3],
>         [4, 5]])
>

It is also possible to put the variables of interest in a dictionary, then
pickle the dictionary. That way you can also store the variable names.
In [27]: f = open('dump.pkl','w')

In [28]: pickle.dump( {'a':a,'b':b}, f)

In [29]: f.close()

In [30]: f = open('dump.pkl','r')

In [31]: mystuff = pickle.load(f)

In [32]: f.close()

In [34]: mystuff
Out[34]:
{'a': matrix([[2, 3],
        [4, 5]]), 'b': matrix([[2, 3],
        [4, 5]])}

I think you can actually pickle the whole environment, but I don't
recall how.

Chuck

From kwgoodman at gmail.com  Fri Dec  1 00:30:25 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Thu, 30 Nov 2006 21:30:25 -0800
Subject: [Numpy-discussion] save a matrix
In-Reply-To:
References:
Message-ID:

On 11/30/06, Charles R Harris wrote:
>
> It is also possible to put the variables of interest in a dictionary, then
> pickle the dictionary. That way you can also store the variable names.
>
> In [27]: f = open('dump.pkl','w')
>
> In [28]: pickle.dump( {'a':a,'b':b}, f)
>
> In [29]: f.close()
>
> In [30]: f = open('dump.pkl','r')
>
> In [31]: mystuff = pickle.load(f)
>
> In [32]: f.close()
>
> In [34]: mystuff
> Out[34]:
> {'a': matrix([[2, 3],
>         [4, 5]]), 'b': matrix([[2, 3],
>         [4, 5]])}

I think I could use that to write a function savematrix(filename, a, b,
c,...)

Is there a way to write a loadmatrix(filename) that doesn't return
anything but makes the matrices a, b, c, ... available?

Probably not a good function design. But useful for quick things.

From charlesr.harris at gmail.com  Fri Dec  1 01:33:05 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 30 Nov 2006 23:33:05 -0700
Subject: [Numpy-discussion] save a matrix
In-Reply-To:
References:
Message-ID:

On 11/30/06, Keith Goodman wrote:
>
> On 11/30/06, Charles R Harris wrote:
> >
> > It is also possible to put the variables of interest in a dictionary,
> then
> > pickle the dictionary. That way you can also store the variable names.
> >
> > In [27]: f = open('dump.pkl','w')
> >
> > In [28]: pickle.dump( {'a':a,'b':b}, f)
> >
> > In [29]: f.close()
> >
> > In [30]: f = open('dump.pkl','r')
> >
> > In [31]: mystuff = pickle.load(f)
> >
> > In [32]: f.close()
> >
> > In [34]: mystuff
> > Out[34]:
> > {'a': matrix([[2, 3],
> >         [4, 5]]), 'b': matrix([[2, 3],
> >         [4, 5]])}
>
> I think I could use that to write a function savematrix(filename, a, b,
> c,...)
>
> Is there a way to write a loadmatrix(filename) that doesn't return
> anything but makes the matrices a, b, c, ... available?

I think there is, that is why I mentioned the saving-the-environment
thingee. IIRC, I saw code for something like that a couple of years back
but I don't recall the details. Maybe something like:

In [80]: globals()['x'] = [1,2]

In [81]: x
Out[81]: [1, 2]

Then you just have to merge the pickled dictionary with globals(). Like
this:

>>> globals().update(mystuff)

where mystuff is the dictionary where you have your stuff. This could
probably also go something like

>>> globals().update(load(f))

where f contains the pickled dictionary.

Chuck

From charlesr.harris at gmail.com  Fri Dec  1 02:02:19 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Dec 2006 00:02:19 -0700
Subject: [Numpy-discussion] save a matrix
In-Reply-To:
References:
Message-ID:

On 11/30/06, Charles R Harris wrote:
>
> On 11/30/06, Keith Goodman wrote:
> >
> > On 11/30/06, Charles R Harris wrote:
> > >
> > > It is also possible to put the variables of interest in a dictionary,
> > then
> > > pickle the dictionary.
> > > That way you can also store the variable names.
> > >
> > > In [27]: f = open('dump.pkl','w')
> > >
> > > In [28]: pickle.dump( {'a':a,'b':b}, f)
> > >
> > > In [29]: f.close()
> > >
> > > In [30]: f = open('dump.pkl','r')
> > >
> > > In [31]: mystuff = pickle.load(f)
> > >
> > > In [32]: f.close()
> > >
> > > In [34]: mystuff
> > > Out[34]:
> > > {'a': matrix([[2, 3],
> > >         [4, 5]]), 'b': matrix([[2, 3],
> > >         [4, 5]])}
> >
> > I think I could use that to write a function savematrix(filename, a, b,
> > c,...)
> >
> > Is there a way to write a loadmatrix(filename) that doesn't return
> > anything but makes the matrices a, b, c, ... available?
>
> I think there is, that is why I mentioned the saving-the-environment
> thingee. IIRC, I saw code for something like that a couple of years back
> but I don't recall the details. Maybe something like:
>
> In [80]: globals()['x'] = [1,2]
>
> In [81]: x
> Out[81]: [1, 2]
>
> Then you just have to merge the pickled dictionary with globals(). Like
> this:
>
> >>> globals().update(mystuff)
>
> where mystuff is the dictionary where you have your stuff. This could
> probably also go something like
>
> >>> globals().update(load(f))
>
> where f contains the pickled dictionary.
>

You could probably dump the entire environment from a subroutine,
cPickle.dump(globals(), f), which might be a good way to save everything
off, but not very efficient.

Chuck
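Putting the thread's pieces together, a minimal sketch of the two helpers
Keith described (the names savematrix/loadmatrix are his; keyword arguments
stand in for his positional a, b, c so the variable names survive the round
trip, and the namespace argument is the globals()-updating trick Chuck
outlines -- best kept to quick debugging sessions):

    import pickle

    def savematrix(filename, **named):
        # e.g. savematrix('dump.pkl', a=a, b=b)
        f = open(filename, 'wb')
        pickle.dump(named, f)
        f.close()

    def loadmatrix(filename, namespace):
        # e.g. loadmatrix('dump.pkl', globals()) re-creates a, b, ...
        # by merging the pickled dict into the given namespace
        f = open(filename, 'rb')
        namespace.update(pickle.load(f))
        f.close()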
From faltet at carabos.com  Fri Dec  1 04:54:36 2006
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 1 Dec 2006 10:54:36 +0100
Subject: [Numpy-discussion] bare bones numpy extension code
In-Reply-To: <87d574n3v5.fsf@peds-pc311.bsd.uchicago.edu>
References: <87d574n3v5.fsf@peds-pc311.bsd.uchicago.edu>
Message-ID: <200612011054.36567.faltet@carabos.com>

On Thursday 30 November 2006 at 20:14, John Hunter wrote:
> A colleague of mine wants to write some numpy extension code. I
> pointed him to lots of examples in the matplotlib src dir, but the
> build environment is more complicated than he needs with all the
> numpy/numeric/numarray switches, etc. Does someone have the basic
> "hello world" of numpy extensions that includes src code and a basic
> setup.py that I can pass on to him. It might be nice to include
> something like that in a numpy "examples" directory.

Hi,

In case your colleague is going to use Pyrex to do her extensions (which
I do recommend, especially for new users), you can find some simple but
nice examples in the numpy/doc/pyrex/ directory of the NumPy
distribution.

Also interesting for beginners is:

http://www.scipy.org/Cookbook/Pyrex_and_NumPy

HTH,

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
 "-"

From lroubeyrie at limair.asso.fr  Fri Dec  1 06:19:42 2006
From: lroubeyrie at limair.asso.fr (Lionel Roubeyrie)
Date: Fri, 1 Dec 2006 12:19:42 +0100
Subject: [Numpy-discussion] subclass
Message-ID: <200612011219.42733.lroubeyrie@limair.asso.fr>

Hi all,
is it possible to subclass numpy.array to add extra functionality and
change the behavior of other methods? I can't find any docs on that.
Thanks

--
Lionel Roubeyrie - lroubeyrie at limair.asso.fr
LIMAIR
http://www.limair.asso.fr

From pgmdevlist at gmail.com  Fri Dec  1 06:58:02 2006
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 1 Dec 2006 06:58:02 -0500
Subject: [Numpy-discussion] subclass
In-Reply-To: <200612011219.42733.lroubeyrie@limair.asso.fr>
References: <200612011219.42733.lroubeyrie@limair.asso.fr>
Message-ID: <200612010658.02749.pgmdevlist@gmail.com>

On Friday 01 December 2006 06:19, Lionel Roubeyrie wrote:
> Hi all,
> is it possible to subclass numpy.array to add extra functionality and
> change the behavior of other methods? I can't find any docs on that.

Did you really look ;) ?
http://www.scipy.org/Subclasses

Check also the new implementation of MaskedArray, where masked arrays
are implemented as a subclass of ndarray:
http://projects.scipy.org/scipy/numpy/wiki/MaskedArray.

If you have some particular features in mind, let us know.
P.

From tgrav at mac.com  Fri Dec  1 07:38:38 2006
From: tgrav at mac.com (Tommy Grav)
Date: Fri, 1 Dec 2006 07:38:38 -0500
Subject: [Numpy-discussion] ScipySuperpack for Mac (PowerPC)
Message-ID: <590C35C8-95B4-480C-8B08-92AEC624E27C@mac.com>

I installed the Mac ScipySuperpack (from http://www.scipy.org/Download).
However it seems that the version of matplotlib in there is not
compatible with their version of numpy

[tgrav@******] ch2/pbcd -> python
ActivePython 2.4.3 Build 11 (ActiveState Software Inc.) based on
Python 2.4.3 (#1, Apr 3 2006, 18:07:18)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import pylab
>>> x = range(1,10)
>>> y = range(1,10)
>>> pylab.plot(x,y)
[]
>>> pylab.show()
alloc: invalid block: 0xa08bcd8: a 68 0
Abort
[tgrav@*****] ch2/pbcd ->

Anyone know how to fix this?

Cheers
Tommy

From faltet at carabos.com  Fri Dec  1 07:54:38 2006
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 1 Dec 2006 13:54:38 +0100
Subject: [Numpy-discussion] subclass
In-Reply-To: <200612011219.42733.lroubeyrie@limair.asso.fr>
References: <200612011219.42733.lroubeyrie@limair.asso.fr>
Message-ID: <200612011354.39298.faltet@carabos.com>

On Friday 01 December 2006 at 12:19, Lionel Roubeyrie wrote:
> Hi all,
> is it possible to subclass numpy.array to add extra functionality and
> change the behavior of other methods? I can't find any docs on that.
> Thanks

If what you want is extending the functionality of ndarray at C level,
there is a complete section dedicated to this ('Subtyping the ndarray
in C', chapter 15) in Travis' Guide to NumPy [1].

[1] http://www.tramy.us/

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
 "-"
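For reference, the Python-level route that the pages above describe comes
down to two special methods; a minimal sketch (the InfoArray name and its
single info attribute are made up for illustration, not Lionel's eventual
implementation):

    import numpy as np

    class InfoArray(np.ndarray):
        """ndarray subclass that carries one extra attribute around."""
        def __new__(cls, input_array, info=None):
            # view the input data as our subclass, then attach the attribute
            obj = np.asarray(input_array).view(cls)
            obj.info = info
            return obj

        def __array_finalize__(self, obj):
            # called for explicit construction, views and slices alike,
            # so the attribute survives operations like a[1:]
            self.info = getattr(obj, 'info', None)

    a = InfoArray([[1, 2, 3], [4, 5, 6]], info='test data')
    b = a[0]        # a view; b.info is 'test data' too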
From lroubeyrie at limair.asso.fr  Fri Dec  1 08:04:35 2006
From: lroubeyrie at limair.asso.fr (Lionel Roubeyrie)
Date: Fri, 1 Dec 2006 14:04:35 +0100
Subject: [Numpy-discussion] subclass
In-Reply-To: <200612010658.02749.pgmdevlist@gmail.com>
References: <200612011219.42733.lroubeyrie@limair.asso.fr>
 <200612010658.02749.pgmdevlist@gmail.com>
Message-ID: <200612011404.35872.lroubeyrie@limair.asso.fr>

Arg! I really didn't see that! thanks

On Friday 01 December 2006 at 12:58, Pierre GM wrote:
> On Friday 01 December 2006 06:19, Lionel Roubeyrie wrote:
> > Hi all,
> > is it possible to subclass numpy.array to add extra functionality and
> > change the behavior of other methods? I can't find any docs on that.
>
> Did you really look ;) ?
> http://www.scipy.org/Subclasses
>
> Check also the new implementation of MaskedArray, where masked arrays
> are implemented as a subclass of ndarray:
> http://projects.scipy.org/scipy/numpy/wiki/MaskedArray.
>
> If you have some particular features in mind, let us know.
> P.

--
Lionel Roubeyrie - lroubeyrie at limair.asso.fr
LIMAIR
http://www.limair.asso.fr

From lroubeyrie at limair.asso.fr  Fri Dec  1 08:20:31 2006
From: lroubeyrie at limair.asso.fr (Lionel Roubeyrie)
Date: Fri, 1 Dec 2006 14:20:31 +0100
Subject: [Numpy-discussion] subclass
In-Reply-To: <200612011354.39298.faltet@carabos.com>
References: <200612011219.42733.lroubeyrie@limair.asso.fr>
 <200612011354.39298.faltet@carabos.com>
Message-ID: <200612011420.31501.lroubeyrie@limair.asso.fr>

I'm looking to handle time series by associating dates with a masked
array, with no computations (sum, max, ...) done directly on the dates,
and with the possibility to search/select data entries by date.

On Friday 01 December 2006 at 13:54, Francesc Altet wrote:
> If what you want is extending the functionality of ndarray at C level,
> there is a complete section dedicated to this ('Subtyping the ndarray
> in C', chapter 15) in Travis' Guide to NumPy [1].
>
> [1] http://www.tramy.us/
>
> Cheers,

--
Lionel Roubeyrie - lroubeyrie at limair.asso.fr
LIMAIR
http://www.limair.asso.fr

From Chris.Barker at noaa.gov  Fri Dec  1 15:22:12 2006
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Fri, 01 Dec 2006 12:22:12 -0800
Subject: [Numpy-discussion] setting the dtype for where...
Message-ID: <45708EF4.1070202@noaa.gov>

Hi all,

I'd like to set the data type for what numpy.where creates. For example:

import numpy as N

N.where(a >= 5, 5, 0)

creates an integer array, which makes sense.

N.where(a >= 5, 5.0, 0)

creates a float64 array, which also makes sense, but I'd like a float32
array, so I tried:

N.where(a >= 5, array(5.0, dtype=N.float32), 0)

but I got a float64 array again.

How can I get a float32 array? where doesn't take a dtype argument --
maybe it should?

numpy version 1.0

thanks,
-Chris

From robert.kern at gmail.com  Fri Dec  1 16:04:34 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 01 Dec 2006 15:04:34 -0600
Subject: [Numpy-discussion] setting the dtype for where...
In-Reply-To: <45708EF4.1070202@noaa.gov>
References: <45708EF4.1070202@noaa.gov>
Message-ID: <457098E2.6070307@gmail.com>

Chris Barker wrote:
> Hi all,
>
> I'd like to set the data type for what numpy.where creates. For example:
>
> import numpy as N
>
> N.where(a >= 5, 5, 0)
>
> creates an integer array, which makes sense.
>
> N.where(a >= 5, 5.0, 0)
>
> creates a float64 array, which also makes sense, but I'd like a float32
> array, so I tried:
>
> N.where(a >= 5, array(5.0, dtype=N.float32), 0)
>
> but I got a float64 array again.

Well, it's consistent with all of the other coercion rules:

In [6]: (array(5.0, dtype=float32) + 0).dtype
Out[6]: dtype('float64')

float64 is the lowest floating point dtype that can hold the full range
of int32 values (much less int64) without losing precision. Since both
operands ("coercands"?) are scalars, they both get a say in the final
dtype (unlike a full array being coerced together with a scalar; only
the array gets a say).

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
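Two quick follow-on checks that make the distinction visible, continuing
Robert's session (a sketch; the values are arbitrary):

In [7]: (array([5.0], dtype=float32) + 0).dtype   # 1-d array: the array wins
Out[7]: dtype('float32')

In [8]: (float32(5.0) + float32(0)).dtype         # two float32 scalars
Out[8]: dtype('float32')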
From kwgoodman at gmail.com  Fri Dec  1 16:46:39 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 1 Dec 2006 13:46:39 -0800
Subject: [Numpy-discussion] nan functions convert matrix to array
Message-ID:

The first line of the nan functions (such as nansum, nanmin, nanmax) is

y = array(a)

That leads to matrix in, array out.

Is there some way to make it matrix in, matrix out?

Here, for example, is nansum:

def nansum(a, axis=None):
    """Sum the array over the given axis, treating NaNs as 0.
    """
    y = array(a)
    if not issubclass(y.dtype.type, _nx.integer):
        y[isnan(a)] = 0
    return y.sum(axis)

From mattknox_ca at hotmail.com  Fri Dec  1 17:07:15 2006
From: mattknox_ca at hotmail.com (Matt Knox)
Date: Fri, 1 Dec 2006 17:07:15 -0500
Subject: [Numpy-discussion] efficient way to get first index of first
 masked/non-masked value in a masked array
Message-ID:

all I can come up with is dumb brute force methods by iterating through
all the values. Anyone got any tricks I can use?

Thanks,

- Matt Knox

From robert.kern at gmail.com  Fri Dec  1 17:13:41 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 01 Dec 2006 16:13:41 -0600
Subject: [Numpy-discussion] efficient way to get first index of first
 masked/non-masked value in a masked array
In-Reply-To:
References:
Message-ID: <4570A915.90206@gmail.com>

Matt Knox wrote:
> all I can come up with is dumb brute force methods by iterating through
> all the values. Anyone got any tricks I can use?

import numpy as np

def first_masked(m):
    idx = np.where(m.mask)[0]
    if len(idx) != 0:
        return idx[0]
    else:
        raise ValueError("no masked data")

first_unmasked() is left as an exercise for the reader.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From pgmdevlist at gmail.com  Fri Dec  1 17:42:22 2006
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 1 Dec 2006 17:42:22 -0500
Subject: [Numpy-discussion] nan functions convert matrix to array
In-Reply-To:
References:
Message-ID: <200612011742.23145.pgmdevlist@gmail.com>

On Friday 01 December 2006 16:46, Keith Goodman wrote:
> The first line of the nan functions (such as nansum, nanmin, nanmax) is
> Is there some way to make it matrix in, matrix out?

Quick workaround: Overwrite these functions with your own, where 'array'
or 'asarray' in the first line is replaced by 'asanyarray'.
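Spelled out, that workaround looks something like this (a sketch of a
drop-in replacement, not the numpy source; the copy() matters because
asanyarray, unlike array, does not copy its input):

    import numpy as N

    def nansum(a, axis=None):
        """Like numpy's nansum, but matrix in gives matrix out."""
        y = N.asanyarray(a).copy()      # keeps the subclass (e.g. matrix)
        if not issubclass(y.dtype.type, N.integer):
            y[N.isnan(y)] = 0           # zero out the NaNs before summing
        return y.sum(axis)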
From Chris.Barker at noaa.gov  Fri Dec  1 18:55:41 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Fri, 01 Dec 2006 15:55:41 -0800
Subject: [Numpy-discussion] setting the dtype for where...
In-Reply-To: <457098E2.6070307@gmail.com>
References: <45708EF4.1070202@noaa.gov> <457098E2.6070307@gmail.com>
Message-ID: <4570C0FD.7060003@noaa.gov>

Robert Kern wrote:
> Well, it's consistent with all of the other coercion rules:
>
> In [6]: (array(5.0, dtype=float32) + 0).dtype
> Out[6]: dtype('float64')

duh! of course. If I use a float32 scalar for BOTH the operands, then I
get a float32 array out.

Thanks,
-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From thibault at physics.cornell.edu  Fri Dec  1 23:09:32 2006
From: thibault at physics.cornell.edu (Pierre Thibault)
Date: Fri, 1 Dec 2006 23:09:32 -0500
Subject: [Numpy-discussion] numpy.fft.rfftn -> non-contiguous
Message-ID: <1b1c766f0612012009q3ed87bd5i8b084915a6deda72@mail.gmail.com>

Hello!

I'm a little confused about what rfftn is doing: It seems to me that the
best would be for it to return a C-contiguous array with the first
dimension reduced to half (plus one), so that one can easily obtain the
non-repeated slices. What I get is the following:

In [1]: from numpy import *

In [2]: a = random.rand(8,8,8)

In [3]: fa = fft.rfftn(a)

In [4]: fa.shape
Out[4]: (8, 8, 5)

In [5]: fa.flags
Out[5]:
  CONTIGUOUS : False
  FORTRAN : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [6]: fa.swapaxes(-1,-2).flags
Out[6]:
  CONTIGUOUS : True
  FORTRAN : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

What I would like to have is

fa.shape: (5,8,8)
fa.flags: CONTIGUOUS : True

So, now fa[0] is a contiguous block containing the data that is not
supposed to appear twice in the complex fft (that is, it would be nice
if fft.rfftn(a)[0] == fft.fftn(a)[0]). I tried playing with the axes
argument in rfftn, to no avail.

I can do without, but at least it looks kind of ugly to me that the
array returned by a built-in function is not contiguous.

By the way, I am using the official debian unstable numpy package. It
doesn't look like it is using fftw, and I don't know if any of this
behaviour would be different if I had compiled numpy by myself - I am
not even sure using fftw is an option with numpy.

Thanks for any answer/comments on that!

Pierre

--
Pierre Thibault
616 Clark Hall, Cornell University
(607) 255-5522

From arildna at stud.ntnu.no  Sun Dec  3 10:56:49 2006
From: arildna at stud.ntnu.no (=?ISO-8859-1?Q?Arild_B._N=E6ss?=)
Date: Sun, 03 Dec 2006 16:56:49 +0100
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To:
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
Message-ID:

On 5 Nov 2006 at 20:26, Steve Lianoglou wrote:

>> I'm sorry, I was a tad too quick typing there. I meant to say "And do
>> I even need to [install Xcode] to run numpy?" Robert pointed out that
>> a lot of things mentioned in the install guide were necessary to run
>> scipy, but that you could run numpy without them.
>>
>> Therefore I was wondering if installing the newest Xcode package was
>> likely to fix the error message I am now getting when trying to
>> install numpy:
>
> I think Robert may have suggested to install the newest XCode because
> it will give you a newer gcc that can have a better chance compiling
> numpy correctly (or at least will remove another "unknown" to help
> find your true problem).
>
> Maybe there'd be some "Universal Binary-aware"ness that the old xcode
> gcc might be missing that you'll get w/ the new one and since Python
> 2.5 is universal, this might be it. Getting the new xcode would be
> the simplest part of the install anyway, so .. why not :-)
>
> -steve

Hi again,

I had to make do without numpy for what I was originally planning to use
it for, and I've been busy for a while, as well as fed up of not getting
this thing to work.
I've realized I'm really going to need it if I am to continue using
python though, so I've installed the new XCode and given it another try.

This gets me further, actually the installation seems to complete.
However, when I type

>> import Numeric

in Python, I get the usual ImportError: No module named Numeric.

>> import numpy

works, but

>>> a= array([[1,2,3],[4,5,6]])

tells me array is not defined. So the installation obviously hasn't
worked.

I'm sure some of you are as tired of hearing about this as I am of
writing about it, but I really have no idea what to do here. The
installation output in the terminal window is quite long, so I have only
copied in the parts that seem to contain some kind of error message (see
below). First it says that g77, f77, gfortran and f95 are missing, then
I've copied in a long part where there are a lot of small errors:
- a series of errors in configtests
- 4 instances of "nothing done with h_files=..."
- some more failing configtests with an "#error No _WIN32"

Hope somebody can help.

regards,
Arild Næss

-------------------------------------------------------------------------------------------

Could not locate executable g77
Could not locate executable f77
Could not locate executable gfortran
Could not locate executable f95

...

compile options: '-Inumpy/core/src -Inumpy/core/include
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c'
gcc: _configtest.c
_configtest.c: In function 'main':
_configtest.c:4: error: 'isnan' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
_configtest.c: In function 'main':
_configtest.c:4: error: 'isnan' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
lipo: can't figure out the architecture type of: /var/tmp//cczgPhx0.out
_configtest.c: In function 'main':
_configtest.c:4: error: 'isnan' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
_configtest.c: In function 'main':
_configtest.c:4: error: 'isnan' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
lipo: can't figure out the architecture type of: /var/tmp//cczgPhx0.out
failure.
removing: _configtest.c _configtest.o
C compiler: gcc -arch ppc -arch i386 -isysroot
/Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double
-no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3

compile options: '-Inumpy/core/src -Inumpy/core/include
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c'
gcc: _configtest.c
_configtest.c: In function 'main':
_configtest.c:4: error: 'isinf' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
_configtest.c: In function 'main':
_configtest.c:4: error: 'isinf' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
lipo: can't figure out the architecture type of: /var/tmp//ccEAgr9A.out
_configtest.c: In function 'main':
_configtest.c:4: error: 'isinf' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
_configtest.c: In function 'main':
_configtest.c:4: error: 'isinf' undeclared (first use in this function)
_configtest.c:4: error: (Each undeclared identifier is reported only once
_configtest.c:4: error: for each function it appears in.)
lipo: can't figure out the architecture type of: /var/tmp//ccEAgr9A.out
failure.
removing: _configtest.c _configtest.o
C compiler: gcc -arch ppc -arch i386 -isysroot
/Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double
-no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3

compile options: '-Inumpy/core/src -Inumpy/core/include
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c'
gcc: _configtest.c
gcc _configtest.o -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/config.h' to sources.
executing numpy/core/code_generators/generate_array_api.py
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/__multiarray_api.h' to sources.
creating build/src.macosx-10.3-fat-2.5/numpy/core/src
conv_template:> build/src.macosx-10.3-fat-2.5/numpy/core/src/scalartypes.inc
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/src' to include_dirs.
conv_template:> build/src.macosx-10.3-fat-2.5/numpy/core/src/arraytypes.inc
numpy.core - nothing done with h_files=
['build/src.macosx-10.3-fat-2.5/numpy/core/src/scalartypes.inc',
'build/src.macosx-10.3-fat-2.5/numpy/core/src/arraytypes.inc',
'build/src.macosx-10.3-fat-2.5/numpy/core/config.h',
'build/src.macosx-10.3-fat-2.5/numpy/core/__multiarray_api.h']
building extension "numpy.core.umath" sources
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/config.h' to sources.
executing numpy/core/code_generators/generate_ufunc_api.py
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/__ufunc_api.h' to sources.
conv_template:> build/src.macosx-10.3-fat-2.5/numpy/core/src/umathmodule.c
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/src' to include_dirs.
numpy.core - nothing done with h_files=
['build/src.macosx-10.3-fat-2.5/numpy/core/src/scalartypes.inc',
'build/src.macosx-10.3-fat-2.5/numpy/core/src/arraytypes.inc',
'build/src.macosx-10.3-fat-2.5/numpy/core/config.h',
'build/src.macosx-10.3-fat-2.5/numpy/core/__ufunc_api.h']
building extension "numpy.core._sort" sources
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/config.h' to sources.
executing numpy/core/code_generators/generate_array_api.py
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/__multiarray_api.h' to sources.
conv_template:> build/src.macosx-10.3-fat-2.5/numpy/core/src/_sortmodule.c
numpy.core - nothing done with h_files=
['build/src.macosx-10.3-fat-2.5/numpy/core/config.h',
'build/src.macosx-10.3-fat-2.5/numpy/core/__multiarray_api.h']
building extension "numpy.core.scalarmath" sources
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/config.h' to sources.
executing numpy/core/code_generators/generate_array_api.py
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/__multiarray_api.h' to sources.
executing numpy/core/code_generators/generate_ufunc_api.py
adding 'build/src.macosx-10.3-fat-2.5/numpy/core/__ufunc_api.h' to sources.
conv_template:> build/src.macosx-10.3-fat-2.5/numpy/core/src/scalarmathmodule.c
numpy.core - nothing done with h_files=
['build/src.macosx-10.3-fat-2.5/numpy/core/config.h',
'build/src.macosx-10.3-fat-2.5/numpy/core/__multiarray_api.h',
'build/src.macosx-10.3-fat-2.5/numpy/core/__ufunc_api.h']
building extension "numpy.core._dotblas" sources
adding 'numpy/core/blasdot/_dotblas.c' to sources.
building extension "numpy.lib._compiled_base" sources
building extension "numpy.numarray._capi" sources
building extension "numpy.fft.fftpack_lite" sources
building extension "numpy.linalg.lapack_lite" sources
creating build/src.macosx-10.3-fat-2.5/numpy/linalg
adding 'numpy/linalg/lapack_litemodule.c' to sources.
building extension "numpy.random.mtrand" sources
creating build/src.macosx-10.3-fat-2.5/numpy/random
C compiler: gcc -arch ppc -arch i386 -isysroot
/Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double
-no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3

compile options: '-Inumpy/core/src -Inumpy/core/include
-I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c'
gcc: _configtest.c
_configtest.c:7:2: error: #error No _WIN32
_configtest.c:7:2: error: #error No _WIN32
lipo: can't figure out the architecture type of: /var/tmp//ccojlBrt.out
_configtest.c:7:2: error: #error No _WIN32
_configtest.c:7:2: error: #error No _WIN32
lipo: can't figure out the architecture type of: /var/tmp//ccojlBrt.out
failure.
removing: _configtest.c _configtest.o

From gael.varoquaux at normalesup.org  Sun Dec  3 11:00:16 2006
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sun, 3 Dec 2006 17:00:16 +0100
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To:
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
Message-ID: <20061203160012.GG20167@clipper.ens.fr>

On Sun, Dec 03, 2006 at 04:56:49PM +0100, Arild B. Næss wrote:
> This gets me further, actually the installation seems to complete.
> However, when I type
> >> import Numeric
> in Python, I get the usual ImportError: No module named Numeric.

That's normal, you installed numpy.

> >> import numpy
> works, but
> >>> a= array([[1,2,3],[4,5,6]])
> tells me array is not defined.

Hmm, you can do either:

from numpy import *
a= array([[1,2,3],[4,5,6]])

or

import numpy
a= numpy.array([[1,2,3],[4,5,6]])

Gaël

From arildna at stud.ntnu.no  Sun Dec  3 11:36:04 2006
From: arildna at stud.ntnu.no (=?ISO-8859-1?Q?Arild_B._N=E6ss?=)
Date: Sun, 03 Dec 2006 17:36:04 +0100
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To: <20061203160012.GG20167@clipper.ens.fr>
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
 <20061203160012.GG20167@clipper.ens.fr>
Message-ID: <53B5F7E9-C152-418C-B872-BD0331030875@stud.ntnu.no>

On 3 Dec 2006 at 17:00, Gael Varoquaux wrote:
> On Sun, Dec 03, 2006 at 04:56:49PM +0100, Arild B. Næss wrote:
>> This gets me further, actually the installation seems to complete.
>> However, when I type
>>>> import Numeric
>> in Python, I get the usual ImportError: No module named Numeric.
>
> That's normal, you installed numpy.

Hm, this doesn't work for you either?
It says on this page that "import Numeric" is a normal test:
http://numpy.scipy.org/numpydoc/numpy-3.html

>>> import numpy
> works, but
>>>> a= array([[1,2,3],[4,5,6]])
> tells me array is not defined.
>
> Hmm, you can do either:
>
> from numpy import *
> a= array([[1,2,3],[4,5,6]])
>
> or
>
> import numpy
> a= numpy.array([[1,2,3],[4,5,6]])

You got my hopes up for a second there, but I can do neither:

Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from numpy import *
Running from numpy source directory.
>>> a= array([[1,2,3],[4,5,6]])
Traceback (most recent call last):
  File "", line 1, in
NameError: name 'array' is not defined
>>> import numpy
>>> a = numpy.array([[1,2],[3,4]])
Traceback (most recent call last):
  File "", line 1, in
AttributeError: 'module' object has no attribute 'array'

regards,
Arild Næss

From erin.sheldon at gmail.com  Sun Dec  3 11:59:37 2006
From: erin.sheldon at gmail.com (Erin Sheldon)
Date: Sun, 3 Dec 2006 11:59:37 -0500
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To: <53B5F7E9-C152-418C-B872-BD0331030875@stud.ntnu.no>
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
 <20061203160012.GG20167@clipper.ens.fr>
 <53B5F7E9-C152-418C-B872-BD0331030875@stud.ntnu.no>
Message-ID: <331116dc0612030859h1bff2553j646c457368449087@mail.gmail.com>

On 12/3/06, Arild B. Næss wrote:
>
> On 3 Dec 2006 at 17:00, Gael Varoquaux wrote:
> > On Sun, Dec 03, 2006 at 04:56:49PM +0100, Arild B. Næss wrote:
> > This gets me further, actually the installation seems to complete.
> > However, when I type
> > >> import Numeric
> > in Python, I get the usual ImportError: No module named Numeric.
> >
> > That's normal, you installed numpy.
>
> Hm, this doesn't work for you either? It says on this page that "import
> Numeric" is a normal test:
> http://numpy.scipy.org/numpydoc/numpy-3.html

That document looks out of date. If you just install numpy you
won't be able to import Numeric

-SNIP-

> You got my hopes up for a second there, but I can do neither:
>
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from numpy import *
> Running from numpy source directory.
> >>> a= array([[1,2,3],[4,5,6]])
> Traceback (most recent call last):
>   File "", line 1, in
> NameError: name 'array' is not defined

You will get this error running from the numpy source
directory. cd to somewhere else.

From pgmdevlist at gmail.com  Sun Dec  3 12:07:18 2006
From: pgmdevlist at gmail.com (Pierre GM)
Date: Sun, 3 Dec 2006 12:07:18 -0500
Subject: [Numpy-discussion] nan functions convert matrix to array
In-Reply-To:
References: <200612011742.23145.pgmdevlist@gmail.com>
Message-ID: <200612031207.19054.pgmdevlist@gmail.com>

On Friday 01 December 2006 17:56, Keith Goodman wrote:
...
> Would it break anything to change the first line of the nan functions from
> a = array(a)
> to
> a = asanyarray(a)
> ?

Seeing what the nan functions do, I don't think that would be a problem.
An exception would be raised if the operation could not be performed
anyway (like a N.sum on a record array). But I'm no judge, so I'll let
the powers in place decide of that.
Along the same lines, I'm bumping a post of mine: would it be possible to
get 'asanyarray' in 'apply_along_axis', 'apply_over_axes', 'vectorize'?

From lists.steve at arachnedesign.net  Sun Dec  3 12:23:35 2006
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Sun, 3 Dec 2006 12:23:35 -0500
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To: <331116dc0612030859h1bff2553j646c457368449087@mail.gmail.com>
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
 <20061203160012.GG20167@clipper.ens.fr>
 <53B5F7E9-C152-418C-B872-BD0331030875@stud.ntnu.no>
 <331116dc0612030859h1bff2553j646c457368449087@mail.gmail.com>
Message-ID:

>> You got my hopes up for a second there, but I can do neither:
>>
>> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
>> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>> Type "help", "copyright", "credits" or "license" for more
>> information.
>>>>> from numpy import *
>> Running from numpy source directory.
>>>>> a= array([[1,2,3],[4,5,6]])
>> Traceback (most recent call last):
>> File "", line 1, in
>> NameError: name 'array' is not defined
>
> You will get this error running from the numpy source
> directory. cd to somewhere else.

Lastly .. you don't have to "guess" (too much) if numpy installed
correctly.

Once you're not running from the source dir (I just checked here, it
completely doesn't work when I'm in the source dir also), run numpy's
test suite:

[you@/Users/yourhomedir] $ python

In [1]: import numpy
In [2]: numpy.test(1,1)

You should see the tests fly by and finally get something like:

----------------------------------------------------------------------
Ran 517 tests in 0.450s

OK
Out[2]:

-steve

From arildna at stud.ntnu.no  Sun Dec  3 14:19:01 2006
From: arildna at stud.ntnu.no (=?ISO-8859-1?Q?Arild_B._N=E6ss?=)
Date: Sun, 03 Dec 2006 20:19:01 +0100
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To:
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
 <20061203160012.GG20167@clipper.ens.fr>
 <53B5F7E9-C152-418C-B872-BD0331030875@stud.ntnu.no>
 <331116dc0612030859h1bff2553j646c457368449087@mail.gmail.com>
Message-ID: <680A53F2-B763-4A6D-99A1-83005F5CB1DA@stud.ntnu.no>

On 3 Dec 2006 at 18:23, Steve Lianoglou wrote:

>>> You got my hopes up for a second there, but I can do neither:
>>>
>>> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
>>> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> Type "help", "copyright", "credits" or "license" for more
>>> information.
>>>>>> from numpy import *
>>> Running from numpy source directory.
>>>>>> a= array([[1,2,3],[4,5,6]])
>>> Traceback (most recent call last):
>>> File "", line 1, in
>>> NameError: name 'array' is not defined
>>
>> You will get this error running from the numpy source
>> directory. cd to somewhere else.
>
> Lastly .. you don't have to "guess" (too much) if numpy installed
> correctly.
>
> Once you're not running from the source dir (I just checked here, it
> completely doesn't work when I'm in the source dir also), run numpy's
> test suite:
>
> [you@/Users/yourhomedir] $ python
>
> In [1]: import numpy
> In [2]: numpy.test(1,1)
>
> You should see the tests fly by and finally get something like:
>
> ----------------------------------------------------------------------
> Ran 517 tests in 0.450s
>
> OK
> Out[2]:

It seems running from the source dir has been the main problem all
along. It works fine outside (I guess).

I get one error in the test Steve recommends though. But hey, 519 out
of 520 ain't so bad, is it?

regards,
Arild Næss

======================================================================
FAIL: Ticket #112
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/tests/test_regression.py",
line 220, in check_longfloat_repr
    assert(str(a)[1:9] == str(a[0])[:8])
AssertionError

----------------------------------------------------------------------
Ran 520 tests in 8.141s

FAILED (failures=1)

From mforbes at phys.washington.edu  Mon Dec  4 00:27:08 2006
From: mforbes at phys.washington.edu (Michael McNeil Forbes)
Date: Sun, 03 Dec 2006 21:27:08 -0800
Subject: [Numpy-discussion] take semantics (bug?)
Message-ID:

What are the semantics of the "take" function?

I would have expected that the following have the same shape and size:

>>> a = array([1,2,3])
>>> inds = a.nonzero()
>>> a[inds]
array([1, 2, 3])
>>> a.take(inds)
array([[1, 2, 3]])

Is there a bug somewhere here or is this intentional?

Michael.

From robert.kern at gmail.com  Mon Dec  4 00:41:48 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sun, 03 Dec 2006 23:41:48 -0600
Subject: [Numpy-discussion] take semantics (bug?)
In-Reply-To:
References:
Message-ID: <4573B51C.7050209@gmail.com>

Michael McNeil Forbes wrote:
> What are the semantics of the "take" function?
>
> I would have expected that the following have the same shape and size:
>
>>>> a = array([1,2,3])
>>>> inds = a.nonzero()
>>>> a[inds]
> array([1, 2, 3])
>>>> a.take(inds)
> array([[1, 2, 3]])
>
> Is there a bug somewhere here or is this intentional?

It's a result of a.nonzero() returning a tuple.

In [3]: a.nonzero()
Out[3]: (array([0, 1, 2]),)

__getitem__ interprets tuples specially: a[1,2,3] == a[(1,2,3)], also
a[0,] == a[0]. .take() doesn't; it simply tries to convert its argument
into an array. It can convert (array([0, 1, 2]),) into
array([[0, 1, 2]]), so it does.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
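A quick check, continuing Michael's session: unwrapping the tuple that
nonzero() returns restores the expected shape.

>>> a.take(inds[0])
array([1, 2, 3])
>>> a.take(inds[0]).shape == a[inds].shape
True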
From Chris.Barker at noaa.gov  Mon Dec  4 12:09:56 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Mon, 04 Dec 2006 09:09:56 -0800
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To: <680A53F2-B763-4A6D-99A1-83005F5CB1DA@stud.ntnu.no>
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
 <20061203160012.GG20167@clipper.ens.fr>
 <53B5F7E9-C152-418C-B872-BD0331030875@stud.ntnu.no>
 <331116dc0612030859h1bff2553j646c457368449087@mail.gmail.com>
 <680A53F2-B763-4A6D-99A1-83005F5CB1DA@stud.ntnu.no>
Message-ID: <45745664.8050301@noaa.gov>

Arild B. Næss wrote:
> It seems running from the source dir has been the main problem all
> along. It works fine outside (I guess).

I'm glad you got it working.

> FAIL: Ticket #112
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/tests/test_regression.py",
> line 220, in check_longfloat_repr
>     assert(str(a)[1:9] == str(a[0])[:8])
> AssertionError

I think that's a known error, that has to do with 64-bit G5 issues, and
IIRC, it's a test error, not a real bug.

Good luck!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov  Mon Dec  4 12:18:53 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Mon, 04 Dec 2006 09:18:53 -0800
Subject: [Numpy-discussion] Forward from PIL list...
Message-ID: <4574587D.7090907@noaa.gov>

Hi all,

PIL 1.1.6 final was just released, which includes support for the numpy
array interface. However, this just came out on the list:

Zachary Pincus wrote:
> - The 'fromarray' command is a bit broken in Image.py:
> Specifically, the following stanza is incorrect --
>     if mode is None:
>         typestr = arr['typestr']
>         if not (typestr[0] == '|' or typestr[0] == _ENDIAN or
>                 typestr[1:] not in ['u1', 'b1', 'i4', 'f4']):
>             raise TypeError("cannot handle data-type")
>         typestr = typestr[:2]
>         if typestr == 'i4':
>             ...
>
> The error is that 'typestr = typestr[:2]' should instead be 'typestr
> = typestr[1:]'

that code was contributed by the numpy developers, but it sure looks
broken: for a typestr like '<f4', typestr[:2] keeps '<f' where the
comparisons that follow need 'f4'. hmm. Maybe someone who knows better
than me can send a note to the PIL list.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From jameseflowers1000 at yahoo.com  Mon Dec  4 18:04:02 2006
From: jameseflowers1000 at yahoo.com (James Flowers)
Date: Mon, 4 Dec 2006 15:04:02 -0800 (PST)
Subject: [Numpy-discussion] Overlapping copy with object_ arrays
Message-ID: <20061204230402.39447.qmail@web37210.mail.mud.yahoo.com>

Hello,

Having a problem with overlapping copies. Memory being freed twice ???
See below:

ActivePython 2.4.3 Build 11 (ActiveState Software Inc.) based on
Python 2.4.3 (#1, Apr 3 2006, 18:07:14)
[GCC 4.0.1 (Apple Computer, Inc. build 5247)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print numpy.__version__
1.0.1
>>> x = numpy.zeros(10, numpy.object_)
>>> x[:] = [],          # set the array to empty lists
>>> x[0] is x[1]        # everyone is of course identical
True
>>> x[3:-1] = x[4:]     # overlapping copy
>>> x                   # all is right in the universe
array([[], [], [], [], [], [], [], [], [], []], dtype=object)
>>> for i in range(10): x[i] = []   # set the array with a loop
...
>>> x[0] is x[1]        # everyone is of course different
False
>>> x[3:-1] = x[4:]     # overlapping copy
>>> x   # oops, situation not OK, heap apparently corrupted by overlapping copy
Bus error

Jim
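Until the underlying bug is fixed, one possible way around the crash Jim
reports is to break the overlap by copying the right-hand side first (a
guess, not verified against that exact build):

>>> x[3:-1] = x[4:].copy()   # temporary buffer, so source and
...                          # destination no longer overlap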
From david.bogen at icecube.wisc.edu  Tue Dec  5 11:07:28 2006
From: david.bogen at icecube.wisc.edu (David Bogen)
Date: Tue, 05 Dec 2006 10:07:28 -0600
Subject: [Numpy-discussion] Numpy and Python 2.2 on RHEL3
Message-ID: <45759940.8080000@icecube.wisc.edu>

All:

Is it possible to build Numpy using Python 2.2? I haven't been able to
find anything that explicitly lists the versions of Python with which
Numpy functions so I've been working under the assumption that the two
bits will mesh together somehow.

When I try to build Numpy 1.0.1 on RedHat Enterprise Linux 3 using
Python 2.2.3, I get the following error:

$ /usr/bin/python2.2 setup.py build
Running from numpy source directory.
Traceback (most recent call last):
  File "setup.py", line 89, in ?
    setup_package()
  File "setup.py", line 59, in setup_package
    from numpy.distutils.core import setup
  File "numpy/distutils/__init__.py", line 5, in ?
    import ccompiler
  File "numpy/distutils/ccompiler.py", line 11, in ?
    import log
  File "numpy/distutils/log.py", line 4, in ?
    from distutils.log import *
ImportError: No module named log

Through extensive trial and error I've been able to hack the distutils
files enough to make that error go away, but then I start getting an
error describing an invalid syntax with the directive "yield os.path"
which seems to be a deeper, more complex error to fix.

Am I attempting the impossible here or am I just doing something
fundamentally and obviously wrong?

David

--
David Bogen :: (608) 263-0168
Unix SysAdmin :: IceCube Project
david.bogen at icecube.wisc.edu

From wfspotz at sandia.gov  Tue Dec  5 11:15:33 2006
From: wfspotz at sandia.gov (Bill Spotz)
Date: Tue, 5 Dec 2006 09:15:33 -0700
Subject: [Numpy-discussion] Numpy and Python 2.2 on RHEL3
In-Reply-To: <45759940.8080000@icecube.wisc.edu>
References: <45759940.8080000@icecube.wisc.edu>
Message-ID: <0C449298-28F7-4D04-9966-24487F0E5F61@sandia.gov>

From http://docs.python.org/ref/yield.html

you might try

    from __future__ import generators

On Dec 5, 2006, at 9:07 AM, David Bogen wrote:

> All:
>
> Is it possible to build Numpy using Python 2.2? I haven't been able to
> find anything that explicitly lists the versions of Python with which
> Numpy functions so I've been working under the assumption that the two
> bits will mesh together somehow.
>
> When I try to build Numpy 1.0.1 on RedHat Enterprise Linux 3 using
> Python 2.2.3, I get the following error:
>
> $ /usr/bin/python2.2 setup.py build
> Running from numpy source directory.
> Traceback (most recent call last):
>   File "setup.py", line 89, in ?
>     setup_package()
>   File "setup.py", line 59, in setup_package
>     from numpy.distutils.core import setup
>   File "numpy/distutils/__init__.py", line 5, in ?
>     import ccompiler
>   File "numpy/distutils/ccompiler.py", line 11, in ?
>     import log
>   File "numpy/distutils/log.py", line 4, in ?
>     from distutils.log import *
> ImportError: No module named log
>
> Through extensive trial and error I've been able to hack the distutils
> files enough to make that error go away, but then I start getting an
> error describing an invalid syntax with the directive "yield os.path"
> which seems to be a deeper, more complex error to fix.
>
> Am I attempting the impossible here or am I just doing something
> fundamentally and obviously wrong?
> David
>
> --
> David Bogen :: (608) 263-0168
> Unix SysAdmin :: IceCube Project
> david.bogen at icecube.wisc.edu

** Bill Spotz                                              **
** Sandia National Laboratories  Voice: (505)845-0170      **
** P.O. Box 5800                 Fax:   (505)284-5451      **
** Albuquerque, NM 87185-0370    Email: wfspotz at sandia.gov **

From david.bogen at icecube.wisc.edu  Tue Dec  5 11:23:10 2006
From: david.bogen at icecube.wisc.edu (David Bogen)
Date: Tue, 05 Dec 2006 10:23:10 -0600
Subject: [Numpy-discussion] Numpy and Python 2.2 on RHEL3
In-Reply-To: <0C449298-28F7-4D04-9966-24487F0E5F61@sandia.gov>
References: <45759940.8080000@icecube.wisc.edu>
 <0C449298-28F7-4D04-9966-24487F0E5F61@sandia.gov>
Message-ID: <45759CEE.6070702@icecube.wisc.edu>

Bill Spotz wrote:
>
> you might try
>
>     from __future__ import generators

Some research did turn up that alternative, but then I started getting
this error:

$ /usr/bin/python2.2 setup.py build
Running from numpy source directory.
Traceback (most recent call last):
  File "setup.py", line 90, in ?
    setup_package()
  File "setup.py", line 60, in setup_package
    from numpy.distutils.core import setup
  File "numpy/distutils/__init__.py", line 5, in ?
    import ccompiler
  File "numpy/distutils/ccompiler.py", line 12, in ?
    from exec_command import exec_command
  File "numpy/distutils/exec_command.py", line 56, in ?
    from numpy.distutils.misc_util import is_sequence
  File "numpy/distutils/misc_util.py", line 12, in ?
    from sets import Set as set
ImportError: No module named sets

Given the number of walls I was hitting, it just seemed that I was
traveling down the wrong path.

David

--
David Bogen :: (608) 263-0168
Unix SysAdmin :: IceCube Project
david.bogen at icecube.wisc.edu

From oliphant.travis at ieee.org  Fri Dec  1 15:33:45 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri, 01 Dec 2006 13:33:45 -0700
Subject: [Numpy-discussion] setting the dtype for where...
In-Reply-To: <45708EF4.1070202@noaa.gov>
References: <45708EF4.1070202@noaa.gov>
Message-ID: <457091A9.1030501@ieee.org>

Chris Barker wrote:
> Hi all,
>
> I'd like to set the data type for what numpy.where creates. For example:
>
> import numpy as N
>
> N.where(a >= 5, 5, 0)
>
> creates an integer array, which makes sense.
>
> N.where(a >= 5, 5.0, 0)
>
> creates a float64 array, which also makes sense, but I'd like a float32
> array, so I tried:
>
> N.where(a >= 5, array(5.0, dtype=N.float32), 0)
>
> but I got a float64 array again.
>
> How can I get a float32 array? where doesn't take a dtype argument --
> maybe it should?
>

You need to do

N.where(a >= 5, N.float32(5), N.float32(0))

The rules are the same as for ufuncs: The returned array for mixed-type
operations uses the "largest" type unless one is a scalar and one is an
array (then the scalar is ignored unless the "kind" is different). In
this case, you have two scalars (a 0-d array is considered a scalar in
this context).

-Travis

From fsenkel at verizon.net  Sun Dec  3 22:28:00 2006
From: fsenkel at verizon.net (fsenkel at verizon.net)
Date: Sun, 03 Dec 2006 21:28:00 -0600 (CST)
Subject: [Numpy-discussion] How to speed up this function?
Message-ID: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net>

Hello,

I'm taking a CFD class, one of the codes I wrote runs very slow. When I
look at hotshot it says the function below is the problem. Since this is
an explicit step, the for loops are only traversed once, so I think it's
caused by memory usage, but I'm not sure if it's the local variables or
the loop? I can vectorize the inner loop, would declaring the data
structures in the calling routine and passing them in be a better idea
than using local storage?

I'm new at python and numpy, I need to look at how to get profiling
information for the lines within a function.

Thank you,

Frank

PS
I tried to post this via google groups, but it didn't seem to go through,
sorry if it ends up as multiple postings

def findw(wnext,wprior,phiprior,uprior,vprior):
    #format here is x[i,j] where i's are rows, j's columns, use flipud()
    #to get the print out consistent with the spacial up-down directions

    #assign local names that are more
    #inline with the class notation
    w = wprior
    phi = phiprior
    u = uprior
    v = vprior

    #three of the BC are known so just set them
    #symetry plane
    wnext[0,0:gcols] = 0.0

    #upper wall
    wnext[gN,0:gcols] = 2.0/gdy**2 * (phi[gN,0:gcols] - phi[gN-1,0:gcols])

    #inlet, off the walls
    wnext[1:grows-1,0] = 0.0

    upos = where(u>0)
    vpos = where(v>0)

    Sx = ones_like(u)
    Sx[upos] = 0.0

    Sy = ones_like(v)
    Sy[vpos] = 0.0

    uw = u*w
    vw = v*w

    #interior nodes
    for j in range(1,gasizej-1):
        for i in range(1,gasizei-1):

            wnext[i,j] =( w[i,j] + gnu*gdt/gdx**2 * (w[i,j-1] - 2.0*w[i,j] + w[i,j+1]) +
                          gnu*gdt/gdy**2 * (w[i-1,j] - 2.0*w[i,j] + w[i+1,j]) -
                          (1.0 - Sx[i,j]) * gdt/gdx * (uw[i,j] - uw[i,j-1]) -
                          Sx[i,j] * gdt/gdx * (uw[i,j+1] - uw[i,j]) -
                          (1.0 - Sy[i,j]) * gdt/gdy * (vw[i,j] - vw[i-1,j]) -
                          Sy[i,j] * gdt/gdy * (vw[i+1,j] - vw[i,j]) )

##            print "***wnext****"
##            print "i: ", i, "j: ", j, "wnext[i,j]: ", wnext[i,j]

    #final BC at outlet, off walls
    wnext[1:grows-1,gM] = wnext[1:grows-1,gM-1]

From oliphant.travis at ieee.org  Sun Dec  3 23:11:53 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun, 03 Dec 2006 21:11:53 -0700
Subject: [Numpy-discussion] problems installing NumPy on OSX
In-Reply-To: <680A53F2-B763-4A6D-99A1-83005F5CB1DA@stud.ntnu.no>
References: <20061104222219.5v1dj889q8cgsg8c@webmail.ntnu.no>
 <20061105091727.ouibxoltwk40cksg@webmail.ntnu.no>
 <20061105193259.lbxo4gkzk44wsw4c@webmail.ntnu.no>
 <20061203160012.GG20167@clipper.ens.fr>
 <53B5F7E9-C152-418C-B872-BD0331030875@stud.ntnu.no>
 <331116dc0612030859h1bff2553j646c457368449087@mail.gmail.com>
 <680A53F2-B763-4A6D-99A1-83005F5CB1DA@stud.ntnu.no>
Message-ID: <4573A009.3070505@ieee.org>

> It seems running from the source dir has been the main problem all
> along. It works fine outside (I guess).
>
> I get one error in the test Steve recommends though. But hey, 519 out
> of 520 ain't so bad, is it?
>

Don't worry about the failing test. It's a bad test on your platform.
If you don't use long doubles you won't care.

-Travis

From koara at atlas.cz  Mon Dec  4 15:37:05 2006
From: koara at atlas.cz (koara at atlas.cz)
Date: Mon, 04 Dec 2006 21:37:05 +0100
Subject: [Numpy-discussion] ValueError: dimensions too large.
Message-ID: <18eda9c6e3b5467cb0bc95aea4c7b518@atlas.cz>

Hello, I tried to create a 2d array, but encountered:

ValueError: dimensions too large.

Does this refer to insufficient memory, or is there really a limit on
dimension sizes? Cheers.
From charlesr.harris at gmail.com Tue Dec 5 13:02:17 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Dec 2006 11:02:17 -0700 Subject: [Numpy-discussion] dot operations on multidimensional arrays In-Reply-To: <45655665.7020302@fysik.dtu.dk> References: <45655665.7020302@fysik.dtu.dk> Message-ID: On 11/23/06, Carsten Rostgaard wrote: > > Hi! > I am trying to use the "dot" method on multi-(more than 2)-dimensional > arrays. > > Specifically I do > >> y = dot(a, b) > where a is a 2D array and b is a 3D array. > > using numpy I get the the help: > " > dot(...) > dot(a,v) returns matrix-multiplication between a and b. > The product-sum is over the last dimension of a and the > second-to-last dimension of b. > " > I then expect that > >> y[i, j, k] = sum(a[i, :] * b[j, :, k]) > which is actually what I get. > > The question is then: > 1) Is there any way to change the axis for which the product-sum is > performed. This can of course be done by a swapaxis before and after the > operation, but this makes the array non-contiguous, in which case the > dot operation often makes bugs (at least in Numeric). > 2) For complicated reasons we still use Numeric in our software package, > and in this, "dot" behaves very strangely. > According to the Numeric help: In Numpy tensordot(a, b, axes=2) tensordot returns the product for any (ndim >= 1) arrays. r_{xxx, yyy} = \sum_k a_{xxx,k} b_{k,yyy} where the axes to be summed over are given by the axes argument. the first element of the sequence determines the axis or axes in arr1 to sum over, and the second element in axes argument sequence determines the axis or axes in arr2 to sum over. When there is more than one axis to sum over, the corresponding arguments to axes should be sequences of the same length with the first axis to sum over given first in both sequences, the second axis second, and so forth. If the axes argument is an integer, N, then the last N dimensions of a and first N dimensions of b are summed over. I don't know about numeric. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.hochberg at ieee.org Tue Dec 5 13:16:05 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Tue, 05 Dec 2006 11:16:05 -0700 Subject: [Numpy-discussion] How to speed up this function? In-Reply-To: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net> References: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net> Message-ID: <4575B765.1040505@ieee.org> fsenkel at verizon.net wrote: > Hello, > > I'm taking a CFD class, one of the codes I wrote runs very slow. When I look at hotshot is says the function below is the problem. Since this is an explicit step, the for loops are only traversed once, so I think it's caused by memory usage, but I'm not sure if it's the local variables or the loop? I can vectorize the inner loop, would declaring the data structures in the calling routine and passing them in be a better idea than using local storage? > > I'm new at python and numpy, I need to look at how to get profiling information for the lines within a function. 
>
> Thank you,
>
> Frank
>
> PS
> I tried to post this via google groups, but it didn't seem to go through, sorry if it ends up as multiple postings
>
> def findw(wnext,wprior,phiprior,uprior,vprior):
>     #format here is x[i,j] where i's are rows, j's columns, use flipud() to get the
>     #print out consistent with the spatial up-down directions
>
>     #assign local names that are more
>     #in line with the class notation
>     w = wprior
>     phi = phiprior
>     u = uprior
>     v = vprior
>
>     #three of the BC are known so just set them
>     #symmetry plane
>     wnext[0,0:gcols] = 0.0
>
>     #upper wall
>     wnext[gN,0:gcols] = 2.0/gdy**2 * (phi[gN,0:gcols] - phi[gN-1,0:gcols])
>
>     #inlet, off the walls
>     wnext[1:grows-1,0] = 0.0
>
>     upos = where(u>0)
>     vpos = where(v>0)
>
>     Sx = ones_like(u)
>     Sx[upos] = 0.0
>
>     Sy = ones_like(v)
>     Sy[vpos] = 0.0
>
>     uw = u*w
>     vw = v*w
>
>     #interior nodes
>     for j in range(1,gasizej-1):
>         for i in range(1,gasizei-1):
>
>             wnext[i,j] = ( w[i,j] + gnu*gdt/gdx**2 * (w[i,j-1] - 2.0*w[i,j] + w[i,j+1]) +
>                            gnu*gdt/gdy**2 * (w[i-1,j] - 2.0*w[i,j] + w[i+1,j]) -
>                            (1.0 - Sx[i,j]) * gdt/gdx * (uw[i,j] - uw[i,j-1]) -
>                            Sx[i,j] * gdt/gdx * (uw[i,j+1] - uw[i,j]) -
>                            (1.0 - Sy[i,j]) * gdt/gdy * (vw[i,j] - vw[i-1,j]) -
>                            Sy[i,j] * gdt/gdy * (vw[i+1,j] - vw[i,j]) )
>
I imagine that this loop is what is killing you. Remove at least the
inner loop, if not both (try removing just the inner loop as well as
both, since sometimes it's faster to remove only the inner one due to
memory usage issues). Removing both will look something like:

wnext[1:-1,1:-1] = ( w[1:-1,1:-1] +
                     gnu*gdt/gdx**2 * (w[1:-1,0:-2] - 2.0*w[1:-1,1:-1] + w[1:-1,2:]) +
                     ... )

etc., etc. When you're done with that, note also that you have the same
array term present multiple times. You could save more time by collapsing
those terms and using different scalar multipliers. Occasionally that is
numerically unwise, but I doubt it in this case.

There are all sorts of other things that you can do, such as using
inplace operations, etc. But try vectorizing the loop first.

-tim

> ##            print "***wnext****"
> ##            print "i: ", i, "j: ", j, "wnext[i,j]: ", wnext[i,j]
>
>     #final BC at outlet, off walls
>     wnext[1:grows-1,gM] = wnext[1:grows-1,gM-1]
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

From tim.hochberg at ieee.org Tue Dec 5 13:20:29 2006
From: tim.hochberg at ieee.org (Tim Hochberg)
Date: Tue, 05 Dec 2006 11:20:29 -0700
Subject: [Numpy-discussion] Precision in Python
In-Reply-To: 
References: 
Message-ID: <4575B86D.7020600@ieee.org>

Elton Mendes wrote:
> Hi.
> I'm having a precision problem in python
>
> Example:
>
> >>> a = 5.14343434
> >>> b = round(a,1)
> >>> b
> 5.0999999999999996
> >>>
>
> It's possible to round the number exactly to 5.1

Read this:

http://www.python.org/infogami-faq/general/why-are-floating-point-calculations-so-inaccurate/

From charlesr.harris at gmail.com Tue Dec 5 13:18:47 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 5 Dec 2006 11:18:47 -0700
Subject: [Numpy-discussion] How to speed up this function?
In-Reply-To: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net>
References: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net>
Message-ID: 

On 12/3/06, fsenkel at verizon.net wrote:
>
> Hello,
>
> I'm taking a CFD class, one of the codes I wrote runs very slow. When I
> look at hotshot it says the function below is the problem. Since this is an
> explicit step, the for loops are only traversed once, so I think it's caused
> by memory usage, but I'm not sure if it's the local variables or the loop. I
> can vectorize the inner loop, would declaring the data structures in the
> calling routine and passing them in be a better idea than using local
> storage?
>
> I'm new at python and numpy, I need to look at how to get profiling
> information for the lines within a function.
>
> Thank you,
>
> Frank
>
> PS
> I tried to post this via google groups, but it didn't seem to go through,
> sorry if it ends up as multiple postings
>
> def findw(wnext,wprior,phiprior,uprior,vprior):
>     #format here is x[i,j] where i's are rows, j's columns, use flipud() to get the
>     #print out consistent with the spatial up-down directions
>
>     #assign local names that are more
>     #in line with the class notation
>     w = wprior
>     phi = phiprior
>     u = uprior
>     v = vprior
>
>     #three of the BC are known so just set them
>     #symmetry plane
>     wnext[0,0:gcols] = 0.0
>
>     #upper wall
>     wnext[gN,0:gcols] = 2.0/gdy**2 * (phi[gN,0:gcols] - phi[gN-1,0:gcols])
>
>     #inlet, off the walls
>     wnext[1:grows-1,0] = 0.0
>
>     upos = where(u>0)
>     vpos = where(v>0)
>
>     Sx = ones_like(u)
>     Sx[upos] = 0.0
>
>     Sy = ones_like(v)
>     Sy[vpos] = 0.0
>
>     uw = u*w
>     vw = v*w
>
>     #interior nodes
>     for j in range(1,gasizej-1):
>         for i in range(1,gasizei-1):
>
>             wnext[i,j] = ( w[i,j] + gnu*gdt/gdx**2 * (w[i,j-1] - 2.0*w[i,j] + w[i,j+1]) +
>                            gnu*gdt/gdy**2 * (w[i-1,j] - 2.0*w[i,j] + w[i+1,j]) -
>                            (1.0 - Sx[i,j]) * gdt/gdx * (uw[i,j] - uw[i,j-1]) -
>                            Sx[i,j] * gdt/gdx * (uw[i,j+1] - uw[i,j]) -
>                            (1.0 - Sy[i,j]) * gdt/gdy * (vw[i,j] - vw[i-1,j]) -
>                            Sy[i,j] * gdt/gdy * (vw[i+1,j] - vw[i,j]) )
>
> ##            print "***wnext****"
> ##            print "i: ", i, "j: ", j, "wnext[i,j]: ", wnext[i,j]
>
>     #final BC at outlet, off walls
>     wnext[1:grows-1,gM] = wnext[1:grows-1,gM-1]
>

Explicit indexing tends to be very slow. I note what looks to be a lot of
differencing in the code, so I suspect what you have here is a PDE. Your
best bet in the short term is to vectorize as many of these operations as
possible, but because the expression is so complicated it is a bit of a
chore to see just how. If your CFD class allows it, there are probably
tools in scipy that are adapted to this sort of problem, and in particular
to CFD. Sandia also puts out PyTrilinos,
http://software.sandia.gov/trilinos/packages/pytrilinos/, which provides
interfaces to distributed and parallel PDE solvers. It's big iron software
for serious problems, so might be a bit of overkill for your applications.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bpederse at gmail.com Tue Dec 5 13:30:41 2006
From: bpederse at gmail.com (Brent Pedersen)
Date: Tue, 5 Dec 2006 10:30:41 -0800
Subject: [Numpy-discussion] How to speed up this function?
In-Reply-To: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net>
References: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net>
Message-ID: 

It looks like you could use weave.blitz() without much change to your
code, or weave.inline() if needed. See this page:
http://scipy.org/PerformancePython

On 12/3/06, fsenkel at verizon.net wrote:
>
> Hello,
>
> I'm taking a CFD class, one of the codes I wrote runs very slow. When I
> look at hotshot it says the function below is the problem.
Since this is an > explicit step, the for loops are only traversed once, so I think it's caused > by memory usage, but I'm not sure if it's the local variables or the loop? I > can vectorize the inner loop, would declaring the data structures in the > calling routine and passing them in be a better idea than using local > storage? > > I'm new at python and numpy, I need to look at how to get profiling > information for the lines within a function. > > > Thank you, > > Frank > > > PS > I tried to post this via google groups, but it didn't seem to go through, > sorry if it ends up as multiple postings > > > def findw(wnext,wprior,phiprior,uprior,vprior): > #format here is x[i,j] where i's are rows, j's columns, use flipud() to > get the > #print out consistent with the spacial up-down directions > > #assign local names that are more > #inline with the class notation > w = wprior > phi = phiprior > u = uprior > v = vprior > > > #three of the BC are known so just set them > #symetry plane > wnext[0,0:gcols] = 0.0 > > #upper wall > wnext[gN,0:gcols] = 2.0/gdy**2 * (phi[gN,0:gcols] - phi[gN-1,0:gcols]) > > #inlet, off the walls > wnext[1:grows-1,0] = 0.0 > > > upos = where(u>0) > vpos = where(v>0) > > Sx = ones_like(u) > Sx[upos] = 0.0 > > Sy = ones_like(v) > Sy[vpos] = 0.0 > > uw = u*w > vw = v*w > > #interior nodes > for j in range(1,gasizej-1): > for i in range(1,gasizei-1): > > wnext[i,j] =( w[i,j] + gnu*gdt/gdx**2 * (w[i,j-1] - 2.0*w[i,j] > + w[i,j+1]) + > gnu*gdt/gdy**2 * (w[i-1,j] - 2.0*w[i,j] + > w[i+1,j]) - > (1.0 - Sx[i,j]) * gdt/gdx * (uw[i,j] - uw[i,j-1]) > - > Sx[i,j] * gdt/gdx * (uw[i,j+1] - uw[i,j]) - > (1.0 - Sy[i,j]) * gdt/gdy * (vw[i,j] - vw[i-1,j]) > - > Sy[i,j] * gdt/gdy * (vw[i+1,j] - vw[i,j]) ) > > ## print "***wnext****" > ## print "i: ", i, "j: ", j, "wnext[i,j]: ", wnext[i,j] > > #final BC at outlet, off walls > wnext[1:grows-1,gM] = wnext[1:grows-1,gM-1] > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 5 13:33:30 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Dec 2006 11:33:30 -0700 Subject: [Numpy-discussion] How to speed up this function? In-Reply-To: References: <5263831.1901231165202880646.JavaMail.root@vms062.mailsrvcs.net> Message-ID: On 12/5/06, Charles R Harris wrote: > > > > On 12/3/06, fsenkel at verizon.net wrote: > > > > Hello, > > > > I'm taking a CFD class, one of the codes I wrote runs very slow. When I > > look at hotshot is says the function below is the problem. Since this is an > > explicit step, the for loops are only traversed once, so I think it's caused > > by memory usage, but I'm not sure if it's the local variables or the loop? I > > can vectorize the inner loop, would declaring the data structures in the > > calling routine and passing them in be a better idea than using local > > storage? > > > > I'm new at python and numpy, I need to look at how to get profiling > > information for the lines within a function. > > > > > > Thank you, > > > > Frank > > > > > > PS > > I tried to post this via google groups, but it didn't seem to go > > through, sorry if it ends up as multiple postings > > > Explicit indexing tends to be very slow. I note what looks to be a lot of > differencing in the code, so I suspect what you have here is a PDE. 
Your
> best bet in the short term is to vectorize as many of these operations as
> possible, but because the expression is so complicated it is a bit of a
> chore to see just how. If your CFD class allows it, there are probably
> tools in scipy that are adapted to this sort of problem, and in particular
> to CFD. Sandia also puts out PyTrilinos,
> http://software.sandia.gov/trilinos/packages/pytrilinos/, which provides
> interfaces to distributed and parallel PDE solvers. It's big iron software
> for serious problems, so might be a bit of overkill for your applications.
>

If it is a PDE, you might also want to look into sparse matrices. Other
folks here can tell you more about that.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com Tue Dec 5 13:39:49 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 5 Dec 2006 11:39:49 -0700
Subject: [Numpy-discussion] Precision in Python
In-Reply-To: 
References: 
Message-ID: 

On 11/27/06, Elton Mendes wrote:
>
> Hi.
> I'm having a precision problem in python
>
> Example:
>
> >>> a = 5.14343434
> >>> b = round(a,1)
> >>> b
> 5.0999999999999996
> >>>
>
> It's possible to round the number exactly to 5.1

Short answer, no. The number 5.1 can't be exactly represented as a binary
fraction, i.e., it can't be expressed in the form int/power_of_two. If all
you are worried about is appearance, then the print routine will round it
to 5.1 if you restrict the precision of the output.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Tue Dec 5 14:19:27 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 05 Dec 2006 13:19:27 -0600
Subject: [Numpy-discussion] Numpy and Python 2.2 on RHEL3
In-Reply-To: <45759940.8080000@icecube.wisc.edu>
References: <45759940.8080000@icecube.wisc.edu>
Message-ID: <4575C63F.8050509@gmail.com>

David Bogen wrote:
> All:
>
> Is it possible to build Numpy using Python 2.2? I haven't been able to
> find anything that explicitly lists the versions of Python with which
> Numpy functions so I've been working under the assumption that the two
> bits will mesh together somehow.

numpy requires Python 2.3.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From a.u.r.e.l.i.a.n at gmx.net Tue Dec 5 15:51:06 2006
From: a.u.r.e.l.i.a.n at gmx.net (Johannes Loehnert)
Date: Tue, 5 Dec 2006 21:51:06 +0100
Subject: [Numpy-discussion] dot operations on multidimensional arrays
In-Reply-To: <45655665.7020302@fysik.dtu.dk>
References: <45655665.7020302@fysik.dtu.dk>
Message-ID: <200612052151.07223.a.u.r.e.l.i.a.n@gmx.net>

Hi,

> The question is then:
> 1) Is there any way to change the axis for which the product-sum is
> performed. This can of course be done by a swapaxis before and after the
> operation, but this makes the array non-contiguous, in which case the
> dot operation often makes bugs (at least in Numeric).
> 2) For complicated reasons we still use Numeric in our software package,
> and in this, "dot" behaves very strangely.

The behaviour for >2D arrays has a bug which was fixed for numpy long ago.
(I was the one who found it. :-)) It led exactly to the behaviour you found
(first row is correct, rest is garbage). I do not know if it was fixed in
Numeric; maybe updating to the latest version will help.
Otherwise, maybe the best workaround is to use a for loop and calculate dot
elementwise.

Johannes

From gnchen at cortechs.net Tue Dec 5 16:27:55 2006
From: gnchen at cortechs.net (Gennan Chen)
Date: Tue, 05 Dec 2006 13:27:55 -0800
Subject: [Numpy-discussion] compile scipy by using intel compiler
Message-ID: <1165354075.6742.5.camel@cortechs25.cortechs.net>

Hi! All,

I have a dual Opteron 285 machine with 8G of RAM, running FC6 x86_64. I did
manage to get numpy (from svn) compiled by using icc 9.1.0.45 and mkl 9.0
(I got 3 errors when I ran the test). But no such luck for scipy (from
svn). Below is the error:

Lib/special/cephes/mconf.h(137): remark #193: zero used for undefined preprocessing identifier
  #if WORDS_BIGENDIAN /* Defined in pyconfig.h */
      ^
Lib/special/cephes/const.c(92): error: floating-point operation result is out of range
  double INFINITY = 1.0/0.0; /* 99e999; */
                       ^
Lib/special/cephes/const.c(97): error: floating-point operation result is out of range
  double NAN = 1.0/0.0 - 1.0/0.0;
                  ^
Lib/special/cephes/const.c(97): error: floating-point operation result is out of range
  double NAN = 1.0/0.0 - 1.0/0.0;
                            ^
compilation aborted for Lib/special/cephes/const.c (code 2)
error: Command "icc -O2 -g -fomit-frame-pointer -mcpu=pentium4 -mtune=pentium4 -march=pentium4 -msse3 -axW -Wall -fPIC -c Lib/special/cephes/const.c -o build/temp.linux-x86_64-2.4/Lib/special/cephes/const.o" failed with exit status 2

Does anyone have a solution for this?

BTW, the 3 errors I got from numpy are:

File "/usr/lib64/python2.4/site-packages/numpy/lib/tests/test_ufunclike.py", line 25, in test_ufunclike
Failed example:
    nx.sign(a)
Expected:
    array([ 1., -1.,  0.,  0.,  1., -1.])
Got:
    array([ 1., -1., -1.,  0.,  1., -1.])
**********************************************************************
File "/usr/lib64/python2.4/site-packages/numpy/lib/tests/test_ufunclike.py", line 40, in test_ufunclike
Failed example:
    nx.sign(a, y)
Expected:
    array([True, True, False, False, True, True], dtype=bool)
Got:
    array([True, True, True, False, True, True], dtype=bool)
**********************************************************************
File "/usr/lib64/python2.4/site-packages/numpy/lib/tests/test_ufunclike.py", line 43, in test_ufunclike
Failed example:
    y
Expected:
    array([True, True, False, False, True, True], dtype=bool)
Got:
    array([True, True, True, False, True, True], dtype=bool)

Are these errors serious? Or should I go back to gcc? Has anyone gotten a
good speedup by using icc and mkl?

--
Gen-Nan Chen, PhD
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kwgoodman at gmail.com Tue Dec 5 16:46:42 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Tue, 5 Dec 2006 13:46:42 -0800
Subject: [Numpy-discussion] numpy in debian
Message-ID: 

I'm impressed with how easy it is to compile numpy. I'm even more
impressed with how easy it is to let someone else compile it and just
apt-get it.

Does anyone know the numpy plans for debian? It is currently at 1.0rc1.

I'm afraid to ask the debian numpy maintainers since they have already
done me such a big favor by packaging numpy and I haven't even thanked
them yet.

Oh, yeah. Then there's that other project I should thank. Thank you.
Numpy is a joy to use.

From mforbes at phys.washington.edu Tue Dec 5 21:57:10 2006
From: mforbes at phys.washington.edu (Michael McNeil Forbes)
Date: Tue, 05 Dec 2006 18:57:10 -0800
Subject: [Numpy-discussion] take semantics (bug?)
References: <4573B51C.7050209@gmail.com> Message-ID: Robert Kern wrote: > Michael McNeil Forbes wrote: > > What are the semantics of the "take" function? > > > > I would have expected that the following have the same shape and size: > > > >>>> a = array([1,2,3]) > >>>> inds = a.nonzero() > >>>> a[inds] > > array([1, 2, 3]) > >>>> a.take(inds) > > array([[1, 2, 3]]) > > > > Is there a bug somewhere here or is this intentional? > > It's a result of a.nonzero() returning a tuple. ... > __getitem__ interprets tuples specially: a[1,2,3] == a[(1,2,3)], also a[0,] > == a[0]. > > .take() doesn't; it simply tries to convert its argument into an array. It > can > convert (array([0, 1, 2]),) into array([[0, 1, 2]]), so it does. Okay. I understand why this happens from the code. 1) Is there a design reason for this inconsistent treatment of "indices"? 2) If so, is there some place (perhaps on the Wiki or in some source code I cannot find) that design decisions like this are discussed? (I have several other inconsistencies I would like to address, but would like to find out if they are "intentional" before wasting people's time.) Thanks, Michael. From robert.kern at gmail.com Tue Dec 5 22:19:03 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 05 Dec 2006 21:19:03 -0600 Subject: [Numpy-discussion] take semantics (bug?) In-Reply-To: References: <4573B51C.7050209@gmail.com> Message-ID: <457636A7.6040408@gmail.com> Michael McNeil Forbes wrote: > Robert Kern wrote: >> Michael McNeil Forbes wrote: >>> What are the semantics of the "take" function? >>> >>> I would have expected that the following have the same shape and size: >>> >>>>>> a = array([1,2,3]) >>>>>> inds = a.nonzero() >>>>>> a[inds] >>> array([1, 2, 3]) >>>>>> a.take(inds) >>> array([[1, 2, 3]]) >>> >>> Is there a bug somewhere here or is this intentional? >> It's a result of a.nonzero() returning a tuple. > ... >> __getitem__ interprets tuples specially: a[1,2,3] == a[(1,2,3)], also a[0,] >> == a[0]. >> >> .take() doesn't; it simply tries to convert its argument into an array. It >> can >> convert (array([0, 1, 2]),) into array([[0, 1, 2]]), so it does. > > Okay. I understand why this happens from the code. > > 1) Is there a design reason for this inconsistent treatment of "indices"? Indexing needs to handle tuples of indices separately from other objects in order to support x[i, j] .take() does not support multidimensional indexing, so it shouldn't try to go through the special cases that __getitem__ does. Instead, it follows the rules that nearly every other method uses (i.e. "just turn it into an array"). > 2) If so, is there some place (perhaps on the Wiki or in some source > code I cannot find) that design decisions like this are discussed? (I > have several other inconsistencies I would like to address, but would > like to find out if they are "intentional" before wasting people's time.) If they're recorded outside of the code, _The Guide to NumPy_, or the mailing list, they're here: http://projects.scipy.org/scipy/numpy -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From nadavh at visionsense.com Wed Dec 6 01:42:47 2006 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 6 Dec 2006 08:42:47 +0200 Subject: [Numpy-discussion] dot operations on multidimensional arrays Message-ID: <07C6A61102C94148B8104D42DE95F7E8CC1FA9@exchange2k.envision.co.il> Try numpy.tensordot Nadav -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Carsten Rostgaard Sent: Thursday, November 23, 2006 10:06 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] dot operations on multidimensional arrays Hi! I am trying to use the "dot" method on multi-(more than 2)-dimensional arrays. Specifically I do >> y = dot(a, b) where a is a 2D array and b is a 3D array. using numpy I get the the help: " dot(...) dot(a,v) returns matrix-multiplication between a and b. The product-sum is over the last dimension of a and the second-to-last dimension of b. " I then expect that >> y[i, j, k] = sum(a[i, :] * b[j, :, k]) which is actually what I get. The question is then: 1) Is there any way to change the axis for which the product-sum is performed. This can of course be done by a swapaxis before and after the operation, but this makes the array non-contiguous, in which case the dot operation often makes bugs (at least in Numeric). 2) For complicated reasons we still use Numeric in our software package, and in this, "dot" behaves very strangely. According to the Numeric help: " dot(a, b) dot(a,b) returns matrix-multiplication between a and b. The product-sum is over the last dimension of a and the second-to-last dimension of b. " so I would have expected again that y[i, j, k] = sum(a[i, :] * b[j, :, k]), and the dimensions actually fit, i.e. y.shape = (a.shape[0], b.shape[0], b.shape[2]), but only some rows of the result has these values!! Does anyone know what Numeric.dot(a, b) actually does when b has more than two dimensions? I use the following test script: ---------------------BEGIN SCRIPT----------------------- import Numeric as num # import numpy as num # make 'random' input arrays a = num.zeros((2, 5)) b = num.zeros((3, 5, 4)) a.flat[:] = num.arange(len(a.flat)) - 3 b.flat[:] = num.arange(len(b.flat)) + 5 # built-in dot product y1 = num.dot(a, b) # manual dot product y2 = num.zeros((a.shape[0], b.shape[0], b.shape[2])) for i in range(a.shape[0]): for j in range(b.shape[0]): for k in range(b.shape[2]): y2[i, j, k] = num.sum(a[i,:] * b[j, :, k]) # test for consistency print y1 == y2 ---------------------END SCRIPT------------------------ with the result: [[[1 1 1 1] [0 0 0 0] [0 0 0 0]] [[1 1 1 1] [0 0 0 0] [0 0 0 0]]] thanks a lot, Carsten Rostgaard Carsten.Rostgaard at fysik.dtu.dk _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From nadavh at visionsense.com Wed Dec 6 01:52:35 2006 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 6 Dec 2006 08:52:35 +0200 Subject: [Numpy-discussion] Precision in Python Message-ID: <07C6A61102C94148B8104D42DE95F7E8C8F12C@exchange2k.envision.co.il> -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Elton Mendes Sent: Monday, November 27, 2006 13:57 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] Precision in Python Hi. 
I'm having a precision problem in python

Example:

>>> a = 5.14343434
>>> b = round(a,1)
>>> b
5.0999999999999996
>>>

It's possible to round the number exactly to 5.1

No. 5.1 cannot be represented exactly as a native machine float. The only
way I know to represent this value exactly is to use the decimal module.
Usually you do not want to do this.

Nadav.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nadavh at visionsense.com Wed Dec 6 02:05:06 2006
From: nadavh at visionsense.com (Nadav Horesh)
Date: Wed, 6 Dec 2006 09:05:06 +0200
Subject: [Numpy-discussion] How to speed up this function?
Message-ID: <07C6A61102C94148B8104D42DE95F7E8C8F12D@exchange2k.envision.co.il>

You can speed it up easily by avoiding the loop. The main idea is to
replace indexing of the type [i+1,j], [i-1,j], [i,j+1], [i,j-1] by the
appropriate slicing. For example

for i in xrange(1,n):
    for j in xrange(1,m):
        a[i,j] = b[i-1,j] + c[i,j+1]

can be replaced by

a[1:n, 1:m] = b[0:n-1, 1:m] + c[1:n, 2:m+1]

Nadav.

-----Original Message-----
From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of fsenkel at verizon.net
Sent: Monday, December 04, 2006 05:28
To: numpy-discussion at scipy.org
Subject: [Numpy-discussion] How to speed up this function?

Hello,

I'm taking a CFD class, one of the codes I wrote runs very slow. When I
look at hotshot it says the function below is the problem. Since this is an
explicit step, the for loops are only traversed once, so I think it's
caused by memory usage, but I'm not sure if it's the local variables or
the loop. I can vectorize the inner loop, would declaring the data
structures in the calling routine and passing them in be a better idea
than using local storage?

I'm new at python and numpy, I need to look at how to get profiling
information for the lines within a function.
Thank you,

Frank

PS
I tried to post this via google groups, but it didn't seem to go through,
sorry if it ends up as multiple postings

def findw(wnext,wprior,phiprior,uprior,vprior):
    #format here is x[i,j] where i's are rows, j's columns, use flipud() to get the
    #print out consistent with the spatial up-down directions

    #assign local names that are more
    #in line with the class notation
    w = wprior
    phi = phiprior
    u = uprior
    v = vprior

    #three of the BC are known so just set them
    #symmetry plane
    wnext[0,0:gcols] = 0.0

    #upper wall
    wnext[gN,0:gcols] = 2.0/gdy**2 * (phi[gN,0:gcols] - phi[gN-1,0:gcols])

    #inlet, off the walls
    wnext[1:grows-1,0] = 0.0

    upos = where(u>0)
    vpos = where(v>0)

    Sx = ones_like(u)
    Sx[upos] = 0.0

    Sy = ones_like(v)
    Sy[vpos] = 0.0

    uw = u*w
    vw = v*w

    #interior nodes
    for j in range(1,gasizej-1):
        for i in range(1,gasizei-1):

            wnext[i,j] = ( w[i,j] + gnu*gdt/gdx**2 * (w[i,j-1] - 2.0*w[i,j] + w[i,j+1]) +
                           gnu*gdt/gdy**2 * (w[i-1,j] - 2.0*w[i,j] + w[i+1,j]) -
                           (1.0 - Sx[i,j]) * gdt/gdx * (uw[i,j] - uw[i,j-1]) -
                           Sx[i,j] * gdt/gdx * (uw[i,j+1] - uw[i,j]) -
                           (1.0 - Sy[i,j]) * gdt/gdy * (vw[i,j] - vw[i-1,j]) -
                           Sy[i,j] * gdt/gdy * (vw[i+1,j] - vw[i,j]) )

##            print "***wnext****"
##            print "i: ", i, "j: ", j, "wnext[i,j]: ", wnext[i,j]

    #final BC at outlet, off walls
    wnext[1:grows-1,gM] = wnext[1:grows-1,gM-1]
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion at scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

From giorgio.luciano at chimica.unige.it Wed Dec 6 10:41:23 2006
From: giorgio.luciano at chimica.unige.it (Giorgio Luciano)
Date: Wed, 06 Dec 2006 16:41:23 +0100
Subject: [Numpy-discussion] equivalent to isempty command in matlab (newbie question)
Message-ID: <4576E4A3.7040004@chimica.unige.it>

Today I also posted a question to the scipy list, because I thought I had
found a solution. This works fine:

bar(N, b1[:,0], width, color='r', yerr=binterv)
############
s3=find(sig1[:,arange(ini,c)]<=0.001)
b1=b.flatten()
#if s3!=[]:
for i3 in arange(len(s3)):
    text(s3[i3], b1[s3[i3]+ini],'***')
s2=find(logical_and(sig1[:,arange(ini,c)]>0.001,sig1[:,arange(ini,c)]<=0.01))
for i2 in arange(len(s2)):
    text(s2[i2], b1[s2[i2]+ini],'**')
s1=find(logical_and(sig1[:,arange(ini,c)]>0.01,sig1[:,arange(ini,c)]<=0.05))
for i1 in arange(len(s1)):
    text(s1[i1], b1[s1[i1]+ini],'*')
title('Plot of the coefficients of the model')

but when I uncomment the "if s3!=[]:" part it does not. So in this case I
have solved the problem, but is there an equivalent of Matlab's isempty
command in numpy?

Thanks in advance for the reply

Giorgio

From david.huard at gmail.com Wed Dec 6 10:21:44 2006
From: david.huard at gmail.com (David Huard)
Date: Wed, 6 Dec 2006 10:21:44 -0500
Subject: [Numpy-discussion] Resizing without allocating additional memory
Message-ID: <91cf711d0612060721t3a7c0bc1y8ae72fd72d66f556@mail.gmail.com>

Hi,

I have Fortran subroutines wrapped with f2py that take arrays as
arguments, and I often need to use resize(a, N) to pass an array of copies
of an element. The resize call, however, is becoming the speed bottleneck,
so my question is: is it possible to create a (1xN) array from a scalar
without allocating additional memory for the array, i.e. just return a new
"view" of the array where all elements point to the same scalar?

Thanks,

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From christian at marquardt.sc Wed Dec 6 11:28:39 2006 From: christian at marquardt.sc (Christian Marquardt) Date: Wed, 6 Dec 2006 17:28:39 +0100 (CET) Subject: [Numpy-discussion] Indices returned by where() Message-ID: <31940.193.17.11.23.1165422519.squirrel@webmail.marquardt.sc> Dear list, apologies if the answer to my question is obvious... Is the following intentional? $>python Python 2.4 (#1, Mar 22 2005, 21:42:42) [GCC 3.3.5 20050117 (prerelease) (SUSE Linux)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> print np.__version__ 1.0 >>> x = np.array([1., 2., 3., 4., 5.]) >>> idx = np.where(x > 6.) >>> print len(idx) 1 The reason is of course that where() returns a tuple of index arrays instead of simply an index array: >>> print idx (array([], dtype=int32),) Does that mean that one always has to explicitely request the first element of the returned tuple in order to check how many matches were found, even for 1d arrays? What's the reason for designing it that way? Many thanks, Christian From robert.kern at gmail.com Wed Dec 6 12:01:00 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 06 Dec 2006 11:01:00 -0600 Subject: [Numpy-discussion] equivalent to isempty command in matlab (newbie question) In-Reply-To: <4576E4A3.7040004@chimica.unige.it> References: <4576E4A3.7040004@chimica.unige.it> Message-ID: <4576F74C.3080700@gmail.com> Giorgio Luciano wrote: > Today I've also posted a question to scipy groups because I've thought > I've found a solution but > > this work good > > bar(N, b1[:,0], width, color='r', yerr=binterv) > ############ > s3=find(sig1[:,arange(ini,c)]<=0.001) > b1=b.flatten() > #if s3!=[]: > for i3 in arange(len(s3)): > text(s3[i3], b1[s3[i3]+ini],'***') > s2=find(logical_and(sig1[:,arange(ini,c)]>0.001,sig1[:,arange(ini,c)]<=0.01)) > for i2 in arange(len(s2)): > text(s2[i2], b1[s2[i2]+ini],'**') > s1=find(logical_and(sig1[:,arange(ini,c)]>0.01,sig1[:,arange(ini,c)]<=0.05)) > for i1 in arange(len(s1)): > text(s1[i1], b1[s1[i1]+ini],'*') > title('Plot of the coefficients of the model') > > and when i uncomment the ifs3!=[] part it does not... > so in this case I've solve the problem.. but is there an equivalent for > isempty matlab command in numpy ? Use (len(s3) != 0) instead of (s3 != []). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From filip.wasilewski at gmail.com Wed Dec 6 12:09:52 2006 From: filip.wasilewski at gmail.com (Filip Wasilewski) Date: Wed, 6 Dec 2006 18:09:52 +0100 Subject: [Numpy-discussion] equivalent to isempty command in matlab (newbie question) In-Reply-To: <4576E4A3.7040004@chimica.unige.it> References: <4576E4A3.7040004@chimica.unige.it> Message-ID: On 12/6/06, Giorgio Luciano wrote: > Today I've also posted a question to scipy groups because I've thought > I've found a solution but > > this work good > > bar(N, b1[:,0], width, color='r', yerr=binterv) > ############ > s3=find(sig1[:,arange(ini,c)]<=0.001) Just a few tips before I answer your question. Is sig1 a global constant? It is a good practice to write constant names in uppercase. Otherwise consider passing it as a function argument. > b1=b.flatten() > #if s3!=[]: if s3: ... > for i3 in arange(len(s3)): Although this works, a no-surprise way is to use standard xrange: for i3 in xrange(len(s3)): ... or enumerate: for i3, elem in enumerate(s3): ... 
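For example, the first loop above could then drop the explicit indexing
entirely (a sketch reusing your names, untested; the loop index turns out
not to be needed here, since each element of s3 is used both as the
x-coordinate and as the index offset):

for elem in s3:
    text(elem, b1[elem + ini], '***')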
> text(s3[i3], b1[s3[i3]+ini],'***')
> s2=find(logical_and(sig1[:,arange(ini,c)]>0.001,sig1[:,arange(ini,c)]<=0.01))

Boolean operators are also ok; just remember the parentheses and operator
precedence:

(sig1[:,arange(ini,c)]>0.001) & (sig1[:,arange(ini,c)]<=0.01)

> for i2 in arange(len(s2)):
>     text(s2[i2], b1[s2[i2]+ini],'**')
> s1=find(logical_and(sig1[:,arange(ini,c)]>0.01,sig1[:,arange(ini,c)]<=0.05))
> for i1 in arange(len(s1)):
>     text(s1[i1], b1[s1[i1]+ini],'*')
> title('Plot of the coefficients of the model')
>
> but when I uncomment the "if s3!=[]:" part it does not.

I think you have just discovered a bug, or an inconsistency I didn't
know of?

>>> print numpy.array([1,1]) == [], numpy.array([1,1]) != []
False True
>>> print numpy.array([1]) == [], numpy.array([1]) != []
[] []
>>> print numpy.array([]) == [], numpy.array([]) != []
[] []
>>> numpy.__version__
'1.0'

> So in this case I have solved the problem, but is there an equivalent of
> Matlab's isempty command in numpy?

Just like for other Python objects:

if s3:
    print "not empty"

or check whether the .size attribute is positive.

cheers,
fw

From robert.kern at gmail.com Wed Dec 6 12:20:19 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 06 Dec 2006 11:20:19 -0600
Subject: [Numpy-discussion] equivalent to isempty command in matlab (newbie question)
In-Reply-To: 
References: <4576E4A3.7040004@chimica.unige.it>
Message-ID: <4576FBD3.3090601@gmail.com>

Filip Wasilewski wrote:

> Just like for other Python objects:
>
> if s3:
>     print "not empty"

No, that doesn't work. numpy arrays do not have a truth value. They raise an
error when you try to use them in such a context.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

From oliphant at ee.byu.edu Wed Dec 6 13:45:03 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed, 06 Dec 2006 11:45:03 -0700
Subject: [Numpy-discussion] Resizing without allocating additional memory
In-Reply-To: <91cf711d0612060721t3a7c0bc1y8ae72fd72d66f556@mail.gmail.com>
References: <91cf711d0612060721t3a7c0bc1y8ae72fd72d66f556@mail.gmail.com>
Message-ID: <45770FAF.1030009@ee.byu.edu>

David Huard wrote:

> Hi,
>
> I have Fortran subroutines wrapped with f2py that take arrays as
> arguments, and I often need to use resize(a, N) to pass an array of
> copies of an element. The resize call, however, is becoming the speed
> bottleneck, so my question is:
> is it possible to create a (1xN) array from a scalar without
> allocating additional memory for the array, i.e. just return a new
> "view" of the array where all elements point to the same scalar?
>
I don't think this would be possible in Fortran, because Fortran does not
provide a facility for using arbitrary striding (maybe later versions of
Fortran using pointers do, though).

If you can use arbitrary striding in your code, then you can construct
such a view using appropriate strides (i.e. a stride of 0). You can do
this with the ndarray constructor:

a = array(5)
g = ndarray(shape=(1,10), dtype=int, buffer=a, strides=(0,0))

But, notice you will get interesting results using

g += 1

Explain why the result of this is an array of 15 (Hint: look at the
value of a).
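For the record, a quick interactive sketch of what happens (the output
here is inferred from the explanation above, not re-run):

>>> from numpy import array, ndarray
>>> a = array(5)
>>> g = ndarray(shape=(1,10), dtype=int, buffer=a, strides=(0,0))
>>> g += 1    # all 10 zero-stride elements alias the same memory
>>> a
array(15)

Each of the 10 elements of g points at the single integer behind a, so
the in-place add touches that one memory location 10 times: 5 + 10*1 == 15.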
-Travis

From filip.wasilewski at gmail.com Wed Dec 6 14:04:35 2006
From: filip.wasilewski at gmail.com (Filip Wasilewski)
Date: Wed, 6 Dec 2006 20:04:35 +0100
Subject: [Numpy-discussion] equivalent to isempty command in matlab (newbie question)
In-Reply-To: <4576FBD3.3090601@gmail.com>
References: <4576E4A3.7040004@chimica.unige.it> <4576FBD3.3090601@gmail.com>
Message-ID: 

On 12/6/06, Robert Kern wrote:
> Filip Wasilewski wrote:
>
> > Just like for other Python objects:
> >
> > if s3:
> >     print "not empty"
>
> No, that doesn't work. numpy arrays do not have a truth value. They raise an
> error when you try to use them in such a context.

Right! I could swear I had checked this before posting. Evidently I got
bitten by this:

>>> bool(numpy.array([]))
False
>>> bool(numpy.array([1]))
True
>>> bool(numpy.array([0]))
False
>>> bool(numpy.array([1,1]))
Traceback (most recent call last):
  File "", line 1, in -toplevel-
    bool(numpy.array([1,1]))
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()

So depending on the situation one can use len or size:

>>> len(numpy.array([[],[]]))
2
>>> numpy.array([[],[]]).size
0

And how should one understand the following?

>>> print numpy.array([1,1]) == [], numpy.array([1,1]) != []
False True
>>> print `numpy.array([1]) == []`, `numpy.array([1]) != []`
array([], dtype=bool) array([], dtype=bool)
>>> print bool(numpy.array([1]) == []), bool(numpy.array([1]) != [])
False False

cheers,
fw

From david.huard at gmail.com Wed Dec 6 15:15:04 2006
From: david.huard at gmail.com (David Huard)
Date: Wed, 6 Dec 2006 15:15:04 -0500
Subject: [Numpy-discussion] Resizing without allocating additional memory
In-Reply-To: <45770FAF.1030009@ee.byu.edu>
References: <91cf711d0612060721t3a7c0bc1y8ae72fd72d66f556@mail.gmail.com>
	<45770FAF.1030009@ee.byu.edu>
Message-ID: <91cf711d0612061215j1e0acba9q1bf2223cd265c19@mail.gmail.com>

Thanks Travis,

I guess we'll have to tweak the Fortran subroutines. It would have been
neat though.

David

Answer: Since g += 1 adds one to all N elements of g, the buffer a gets
incremented N times. So

a = array(i)
g = ndarray(shape=(1,N), dtype=int, buffer=a, strides=(0,0))
g += M

returns i + M*N

2006/12/6, Travis Oliphant :
>
> David Huard wrote:
>
> > Hi,
> >
> > I have Fortran subroutines wrapped with f2py that take arrays as
> > arguments, and I often need to use resize(a, N) to pass an array of
> > copies of an element. The resize call, however, is becoming the speed
> > bottleneck, so my question is:
> > is it possible to create a (1xN) array from a scalar without
> > allocating additional memory for the array, i.e. just return a new
> > "view" of the array where all elements point to the same scalar?
> >
> I don't think this would be possible in Fortran, because Fortran does not
> provide a facility for using arbitrary striding (maybe later versions of
> Fortran using pointers do, though).
>
> If you can use arbitrary striding in your code, then you can construct
> such a view using appropriate strides (i.e. a stride of 0). You can do
> this with the ndarray constructor:
>
> a = array(5)
> g = ndarray(shape=(1,10), dtype=int, buffer=a, strides=(0,0))
>
> But, notice you will get interesting results using
>
> g += 1
>
> Explain why the result of this is an array of 15 (Hint: look at the
> value of a).
> -Travis
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oliphant at ee.byu.edu Wed Dec 6 15:27:43 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed, 06 Dec 2006 13:27:43 -0700
Subject: [Numpy-discussion] Overlapping copy with object_ arrays
In-Reply-To: <20061204230402.39447.qmail@web37210.mail.mud.yahoo.com>
References: <20061204230402.39447.qmail@web37210.mail.mud.yahoo.com>
Message-ID: <457727BF.6040601@ee.byu.edu>

James Flowers wrote:

> Hello,
>
> Having a problem with overlapping copies. Memory being freed twice???
> See below:

Thanks for the test. This problem is fixed and will be checked into SVN
as soon as I can figure out why I'm not able to access SVN from my work
machine.

The problem is that object array copies were done by first decrementing
the reference count of all elements of the destination array and then
incrementing the reference count of all elements of the destination
array once the copy was complete. For over-lapping copies (containing
only a single reference to an object), this created a problem, as the
reference count went to 0 before the copy occurred.

I've changed the code so that the reference count of the source is
increased and the reference count of the destination is decreased before
the copy is made. Then, the reference counts are correct after the copy
is completed, even for over-lapping copies.

-Travis

From charlesr.harris at gmail.com Thu Dec 7 01:17:52 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 6 Dec 2006 23:17:52 -0700
Subject: [Numpy-discussion] Indices returned by where()
In-Reply-To: <31940.193.17.11.23.1165422519.squirrel@webmail.marquardt.sc>
References: <31940.193.17.11.23.1165422519.squirrel@webmail.marquardt.sc>
Message-ID: 

On 12/6/06, Christian Marquardt wrote:
>
> Dear list,
>
> apologies if the answer to my question is obvious...
>
> Is the following intentional?
>
> $>python
>
> Python 2.4 (#1, Mar 22 2005, 21:42:42)
> [GCC 3.3.5 20050117 (prerelease) (SUSE Linux)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> import numpy as np
> >>> print np.__version__
> 1.0
>
> >>> x = np.array([1., 2., 3., 4., 5.])
>
> >>> idx = np.where(x > 6.)
> >>> print len(idx)
> 1
>
> The reason is of course that where() returns a tuple of index arrays
> instead of simply an index array:
>
> >>> print idx
> (array([], dtype=int32),)
>
> Does that mean that one always has to explicitly request the first
> element of the returned tuple in order to check how many matches were
> found, even for 1d arrays? What's the reason for designing it that way?

Fancy indexing.

In [1]: a = arange(10).reshape(2,5)

In [2]: i = where(a>3)

In [3]: i
Out[3]: (array([0, 1, 1, 1, 1, 1]), array([4, 0, 1, 2, 3, 4]))

In [4]: a[i] = 0

In [5]: a
Out[5]:
array([[0, 1, 2, 3, 0],
       [0, 0, 0, 0, 0]])

If you just want a count, try

In [6]: a = arange(10).reshape(2,5)

In [7]: sum(a>3)
Out[7]: 6

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From olivetti at itc.it Thu Dec 7 05:30:15 2006
From: olivetti at itc.it (Emanuele Olivetti)
Date: Thu, 07 Dec 2006 11:30:15 +0100
Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1
Message-ID: <4577ED37.6020208@itc.it>

I'm running numpy 1.0 and 1.0.1 on several hosts, and today I've found
that pickling arrays in 1.0.1 causes problems for 1.0. An example:

--- numpy 1.0.1 ---
import numpy
import pickle
a = numpy.array([1,2,3])
f=open('test1.pickle','w')
pickle.dump(a,f)
f.close()
---

If I unpickle test1.pickle in numpy 1.0 I get:

--- numpy 1.0
>>> import numpy
>>> import pickle
>>> f=open('test1.pickle')
>>> a=pickle.load(f)
Traceback (most recent call last):
  File "", line 1, in 
  File "/hardmnt/virgo0/sra/olivetti/myapps/lib/python2.5/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/hardmnt/virgo0/sra/olivetti/myapps/lib/python2.5/pickle.py", line 858, in load
    dispatch[key](self)
  File "/hardmnt/virgo0/sra/olivetti/myapps/lib/python2.5/pickle.py", line 1217, in load_build
    setstate(state)
TypeError: argument 1 must be sequence of length 5, not 8
-----------------

How can I make pickled arrays created with numpy 1.0.1 readable by numpy
1.0? Help!

Thanks in advance,

Emanuele

From giorgio.luciano at chimica.unige.it Thu Dec 7 05:36:03 2006
From: giorgio.luciano at chimica.unige.it (Giorgio Luciano)
Date: Thu, 07 Dec 2006 11:36:03 +0100
Subject: [Numpy-discussion] equivalent to isempty command in matlab (newbie question), (Robert Kern)
In-Reply-To: 
References: 
Message-ID: <4577EE93.4060904@chimica.unige.it>

> Today's Topics:
>
> 1. Re: equivalent to isempty command in matlab (newbie question)
> (Robert Kern)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 06 Dec 2006 11:20:19 -0600
> From: Robert Kern
> Subject: Re: [Numpy-discussion] equivalent to isempty command in
> matlab (newbie question)
> To: Discussion of Numerical Python
> Message-ID: <4576FBD3.3090601 at gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Filip Wasilewski wrote:
>
>> Just like for other Python objects:
>>
>> if s3:
>>     print "not empty"
>>
>
> No, that doesn't work. numpy arrays do not have a truth value. They raise an
> error when you try to use them in such a context.
>

Is there a workaround to make numpy recognize when an array is empty?

Giorgio

From faltet at carabos.com Thu Dec 7 06:15:49 2006
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 07 Dec 2006 12:15:49 +0100
Subject: [Numpy-discussion] equivalent to isempty command in matlab (newbie question), (Robert Kern)
In-Reply-To: <4577EE93.4060904@chimica.unige.it>
References: <4577EE93.4060904@chimica.unige.it>
Message-ID: <1165490149.2588.28.camel@localhost.localdomain>

On Thu, 2006-12-07 at 11:36 +0100, Giorgio Luciano wrote:
> Is there a workaround to make numpy recognize when an array is empty?
>
> Giorgio

I guess there should be many. One possibility is to use .size:

In [9]:a=numpy.array([])
In [10]:a.size == False
Out[10]:True
In [11]:a=numpy.array([1])
In [12]:a.size == False
Out[12]:False

Cheers,

--
Francesc Altet   | Be careful about using the following code --
Carabos Coop. V. | I've only proven that it works,
www.carabos.com  | I haven't tested it.
 -- Donald Knuth

From alexandre.fayolle at logilab.fr Thu Dec 7 10:50:21 2006
From: alexandre.fayolle at logilab.fr (Alexandre Fayolle)
Date: Thu, 7 Dec 2006 16:50:21 +0100
Subject: [Numpy-discussion] Numeric memory leak when building Numeric.array from numarray.array
Message-ID: <20061207155020.GA18896@crater.logilab.fr>

Hi,

I'm facing a memory leak in an application that has to use both numarray
and Numeric (because of external dependencies).

The problem occurs when building a Numeric array from a numarray array:

import Numeric
import numarray
import sys
atest = numarray.arange(200)
temp = Numeric.array(atest)
print sys.getrefcount(atest) # prints 3
print sys.getrefcount(temp) # prints 2

I'm running numarray 1.5.2 and Numeric 24.2.

I can work around this by using an intermediate string representation:

temp = Numeric.fromstring(atest.tostring(), atest.typecode())
temp.shape = atest.shape

--
Alexandre Fayolle LOGILAB, Paris (France)
Python, Zope, Plone, Debian training courses: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
Scientific computing: http://www.logilab.fr/science
Takeover and maintenance of CPS sites: http://www.migration-cms.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 481 bytes
Desc: Digital signature
URL: 

From kxroberto at googlemail.com Thu Dec 7 11:11:14 2006
From: kxroberto at googlemail.com (Robert)
Date: Thu, 07 Dec 2006 17:11:14 +0100
Subject: [Numpy-discussion] Future Python 2.3 support ? - Re: Numpy and Python 2.2 on RHEL3
In-Reply-To: <4575C63F.8050509@gmail.com>
References: <45759940.8080000@icecube.wisc.edu> <4575C63F.8050509@gmail.com>
Message-ID: 

Robert Kern wrote:
> David Bogen wrote:
>> All:
>>
>> Is it possible to build Numpy using Python 2.2? I haven't been able to
>> find anything that explicitly lists the versions of Python with which
>> Numpy functions so I've been working under the assumption that the two
>> bits will mesh together somehow.
>
> numpy requires Python 2.3.
>

I hope Python 2.3 support will not be dropped too early. There is not much
cost overall in going from 2.2 to 2.3. Yet from Py2.3 to Py2.4 there is a
tremendous increase in memory footprint, distributable file sizes, load
time, CGI start time, compiler troubles on Windows, etc., not really
balanced by comparable improvements. For most types of practical
applications, Py2.3 is still the "good Python" for me.

Robert

From faltet at carabos.com Thu Dec 7 11:36:22 2006
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 07 Dec 2006 17:36:22 +0100
Subject: [Numpy-discussion] Numeric memory leak when building Numeric.array from numarray.array
In-Reply-To: <20061207155020.GA18896@crater.logilab.fr>
References: <20061207155020.GA18896@crater.logilab.fr>
Message-ID: <1165509382.2588.54.camel@localhost.localdomain>

On Thu, 2006-12-07 at 16:50 +0100, Alexandre Fayolle wrote:
> Hi,
>
> I'm facing a memory leak in an application that has to use both numarray
> and Numeric (because of external dependencies).
>
> The problem occurs when building a Numeric array from a numarray array:
>
> import Numeric
> import numarray
> import sys
> atest = numarray.arange(200)
> temp = Numeric.array(atest)
> print sys.getrefcount(atest) # prints 3
> print sys.getrefcount(temp) # prints 2
>
> I'm running numarray 1.5.2 and Numeric 24.2

Yeah, it seems like the array protocol implementation in Numeric is
leaking.
Unfortunately, as Numeric maintenance has been dropped, there is little
chance that this will be fixed in the future.

> I can work around this by using an intermediate string representation:
>
> temp = Numeric.fromstring(atest.tostring(), atest.typecode())
> temp.shape = atest.shape

Another (faster) workaround would be:

temp2 = Numeric.fromstring(atest._data, typecode=atest.typecode())

which is pretty fast:

In [20]:Timer("Numeric.fromstring(atest._data, typecode=atest.typecode())",
"import numarray, Numeric; atest=numarray.arange(200)").repeat(3,10000)
Out[20]:[0.18092107772827148, 0.13870906829833984, 0.13995194435119629]

i.e. more than 2x faster than your current solution:

In [21]:Timer("Numeric.fromstring(atest.tostring(), typecode=atest.typecode())",
"import numarray, Numeric; atest=numarray.arange(200)").repeat(3,10000)
Out[21]:[0.37756705284118652, 0.32852792739868164, 0.32704305648803711]

and similar in speed to the native .array() and .asarray() based on the
array protocol:

In [22]:Timer("Numeric.array(atest)",
"import numarray, Numeric; atest=numarray.arange(200)").repeat(3,10000)
Out[22]:[0.17277789115905762, 0.12470793724060059, 0.12530016899108887]

In [23]:Timer("Numeric.asarray(atest)",
"import numarray, Numeric; atest=numarray.arange(200)").repeat(3,10000)
Out[23]:[0.20457005500793457, 0.15211081504821777, 0.15212082862854004]

As an aside, and curiously enough, Numeric.array() (where a copy is done)
is faster than Numeric.asarray() (where a copy shouldn't be done) :-/

HTH,

--
Francesc Altet   | Be careful about using the following code --
Carabos Coop. V. | I've only proven that it works,
www.carabos.com  | I haven't tested it.
 -- Donald Knuth

From ddrake at brontes3d.com Thu Dec 7 11:45:05 2006
From: ddrake at brontes3d.com (Daniel Drake)
Date: Thu, 07 Dec 2006 11:45:05 -0500
Subject: [Numpy-discussion] numarray-1.5.2 and Py_NONE refcount crash
Message-ID: <1165509905.26874.30.camel@systems03.lan.brontes3d.com>

Hi,

I know that numarray is outdated now, but it's too big a job to change to
numpy right now. On the off chance that someone can help:

After upgrading from numarray-1.3.1 to numarray-1.5.2, we get occasional
crashes where Python tries to free Py_None.

I'm aware of the NA_FromDimsStridesDescrAndData fix at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=399440
however this doesn't solve the problem here, and that particular function
doesn't seem to be in our codepath anyway.

Any ideas? Thanks

--
Daniel Drake
Brontes Technologies, A 3M Company

From kwgoodman at gmail.com Thu Dec 7 12:05:31 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Thu, 7 Dec 2006 09:05:31 -0800
Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1
In-Reply-To: <4577ED37.6020208@itc.it>
References: <4577ED37.6020208@itc.it>
Message-ID: 

On 12/7/06, Emanuele Olivetti wrote:
> How can I make pickled arrays created with numpy 1.0.1 readable by numpy 1.0?

If you pickle in 1.0.1, I bet you can read it in 1.0.

I don't know why the pickle format keeps changing. But I understand
why an old version of software can't always read data generated by a
new version of software.

From kwgoodman at gmail.com Thu Dec 7 12:06:31 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Thu, 7 Dec 2006 09:06:31 -0800
Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1
In-Reply-To: 
References: <4577ED37.6020208@itc.it>
Message-ID: 

On 12/7/06, Keith Goodman wrote:
> On 12/7/06, Emanuele Olivetti wrote:
> > How can I make pickled arrays created with numpy 1.0.1 readable by numpy 1.0?
> > If you pickle in 1.0.1, I bet you can read it in 1.0. > > I don't know why the pickle format keeps changing. But I understand > why an old version of software can't always read data generated by a > new version of software. Sorry. I meant if you pickle in 1.0, I bet you can read it in 1.0.1. From charlesr.harris at gmail.com Thu Dec 7 12:42:06 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Dec 2006 10:42:06 -0700 Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1 In-Reply-To: References: <4577ED37.6020208@itc.it> Message-ID: On 12/7/06, Keith Goodman wrote: > > On 12/7/06, Keith Goodman wrote: > > On 12/7/06, Emanuele Olivetti wrote: > > > How can I let access pickled arrays made in numpy 1.0.1 to numpy 1.0 ? > > > > If you pickle in 1.0.1, I bet you can read it in 1.0. > > > > I don't know why the pickle format keeps changing. But I understand > > why an old version of software can't always read data generated by a > > new version of software. The 1.0.x versions are supposed to be compatible. I don't see any changes to pickle in svn since before the 1.0 release, so there might be another problem here. Are there other differences between the machines? Python version, OS, endianess, 32 vs 64 bit, compiler, etc. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Dec 7 12:49:56 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Dec 2006 10:49:56 -0700 Subject: [Numpy-discussion] Strange numpy behaviour with pickle In-Reply-To: <4565B9C5.1080108@xrce.xerox.com> References: <4565B9C5.1080108@xrce.xerox.com> Message-ID: On 11/23/06, Jerome Fuselier wrote: > > Hello, > I've discovered a small problem when I tried to save a numpy array with > the pickle module. The dumped array is not always equal to the loaded one > and the error is not always here, depending on the way I express matrix > operations. > > I illustrated the error with a small script attached with this message, > the 4th test corresponds to what I did first which didn't do what I was > expecting. > > Am I missing something or is it a bug ? > The tests all work for me. What version of numpy are you using? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Dec 7 13:26:37 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 07 Dec 2006 12:26:37 -0600 Subject: [Numpy-discussion] Future Python 2.3 support ? - Re: Numpy and Python 2.2 on RHEL3 In-Reply-To: References: <45759940.8080000@icecube.wisc.edu> <4575C63F.8050509@gmail.com> Message-ID: <45785CDD.5060004@gmail.com> Robert wrote: > Robert Kern wrote: >> numpy requires Python 2.3 . > > hope Python2.3 support will not be dropped too early. There is not much cost overall when going from 2.2 to 2.3. It won't. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From emanuele at relativita.com Thu Dec 7 16:19:00 2006 From: emanuele at relativita.com (emanuele at relativita.com) Date: Thu, 7 Dec 2006 22:19:00 +0100 (CET) Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1 In-Reply-To: References: <4577ED37.6020208@itc.it> Message-ID: <38164.194.242.201.223.1165526340.squirrel@webmail.relativita.com> On Thu, December 7, 2006 6:42 pm, Charles R Harris wrote: > The 1.0.x versions are supposed to be compatible. I don't see any changes > to > pickle in svn since before the 1.0 release, so there might be another > problem here. Are there other differences between the machines? Python > version, OS, endianness, 32 vs 64 bit, compiler, etc. Both hosts are 32bit i386 with python 2.5 and different versions of numpy. Gcc version is pretty similar: - host1 : gcc version 3.4.5 20051201 (Red Hat 3.4.5-2) - host2 : gcc version 3.4.6 20060404 (Red Hat 3.4.6-3) Try my example yourself and tell me if it works for you. Thanks, Emanuele From oliphant.travis at ieee.org Thu Dec 7 19:45:48 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu, 07 Dec 2006 17:45:48 -0700 Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1 In-Reply-To: <4577ED37.6020208@itc.it> References: <4577ED37.6020208@itc.it> Message-ID: Emanuele Olivetti wrote: > I'm running numpy 1.0 and 1.0.1 on several hosts and > today I've found that pickling arrays in 1.0.1 generates > problems for 1.0. An example: > --- numpy 1.0.1 --- > import numpy > import pickle > a = numpy.array([1,2,3]) > f=open('test1.pickle','w') > pickle.dump(a,f) > f.close() > --- > > If I unpickle test1.pickle in numpy 1.0 I get: > --- numpy 1.0 > >>> import numpy > >>> import pickle > >>> f=open('test1.pickle') > >>> a=pickle.load(f) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/hardmnt/virgo0/sra/olivetti/myapps/lib/python2.5/pickle.py", line 1370, in load > return Unpickler(file).load() > File "/hardmnt/virgo0/sra/olivetti/myapps/lib/python2.5/pickle.py", line 858, in load > dispatch[key](self) > File "/hardmnt/virgo0/sra/olivetti/myapps/lib/python2.5/pickle.py", line 1217, in load_build > setstate(state) > TypeError: argument 1 must be sequence of length 5, not 8 > ----------------- Please show which version of numpy you are using. There were no changes to pickle from released numpy 1.0 to 1.0.1 (at least that I'm aware of). There might, however, be bugs. -Travis From Jerome.Fuselier at xrce.xerox.com Fri Dec 8 03:33:40 2006 From: Jerome.Fuselier at xrce.xerox.com (Jerome Fuselier) Date: Fri, 08 Dec 2006 09:33:40 +0100 Subject: [Numpy-discussion] Strange numpy behaviour with pickle In-Reply-To: References: <4565B9C5.1080108@xrce.xerox.com> Message-ID: <45792364.5090408@xrce.xerox.com> An HTML attachment was scrubbed... URL: From oliphant.travis at ieee.org Fri Dec 8 03:40:13 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Fri, 08 Dec 2006 01:40:13 -0700 Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1 In-Reply-To: <4577ED37.6020208@itc.it> References: <4577ED37.6020208@itc.it> Message-ID: Emanuele Olivetti wrote: > I'm running numpy 1.0 and 1.0.1 on several hosts and > today I've found that pickling arrays in 1.0.1 generates > problems for 1.0. An example: I correct my previous statement. Yes, this is true. Pickles generated with 1.0.1 cannot be read by version 1.0. However, pickles generated with 1.0 can be read by 1.0.1.
It is typically not the case that pickles created with newer versions of the code will work with older versions. I obviously didn't think that was something to be concerned about because it slipped my mind. Changeset 3411 is the reason: the hasobject member of the data-type object was given more bits (and therefore needed to be saved). Sorry for the trouble, -Travis From emanuele at relativita.com Fri Dec 8 10:06:41 2006 From: emanuele at relativita.com (Emanuele Olivetti) Date: Fri, 08 Dec 2006 16:06:41 +0100 Subject: [Numpy-discussion] pickling arrays: numpy 1.0 can't unpickle numpy 1.0.1 In-Reply-To: References: <4577ED37.6020208@itc.it> Message-ID: <45797F81.8060204@relativita.com> Travis E. Oliphant wrote: > I correct my previous statement. Yes, this is true. Pickles generated > with 1.0.1 cannot be read by version 1.0. > > However, pickles generated with 1.0 can be read by 1.0.1. It is > typically not the case that pickles created with newer versions of the > code will work with older versions. I obviously didn't think that was > something to be concerned about because it slipped my mind. > > Changeset 3411 is the reason: the hasobject member of the data-type > object was given more bits (and therefore needed to be saved). > Thank you for the detailed explanation. I can easily upgrade the central host that collects results from other hosts in order to be able to read all results (1.0 or 1.0.1). Emanuele From weili at jimmy.harvard.edu Thu Dec 7 21:52:34 2006 From: weili at jimmy.harvard.edu (Wei) Date: Thu, 7 Dec 2006 21:52:34 -0500 Subject: [Numpy-discussion] installation error in cygwin Message-ID: <008401c71a73$ea39b480$a32d349b@WeiLiLaptop> Hi, I just got my new intel core duo laptop. So I downloaded the new cygwin (including everything) but couldn't get the numarray or numpy modules installed correctly. I always got the following error. Can someone help? Many thanks! Wei python setup.py install Using EXTRA_COMPILE_ARGS = [] Using builtin 'lite' BLAS and LAPACK running config Wrote config.h running install running build running build_py copying Lib/numinclude.py -> build/lib.cygwin-1.5.22-i686-2.4/numarray running build_ext building 'numarray.libnumarray' extension gcc -shared -Wl,--enable-auto-image-base build/temp.cygwin-1.5.22-i686-2.4/Src/libnumarraymodule.o -L/usr/lib/python2.4/config -lpython2.4 -o build/lib.cygwin-1.5.22-i686-2.4/numarray/libnumarray.dll -lm -L/lib -lm -lc -lgcc -L/lib/mingw -lmingwex /lib/mingw/libmingwex.a(feclearexcept.o):feclearexcept.c:(.text+0x21): undefined reference to `___cpu_features' /lib/mingw/libmingwex.a(fetestexcept.o):fetestexcept.c:(.text+0x7): undefined reference to `___cpu_features' collect2: ld returned 1 exit status error: command 'gcc' failed with exit status 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomh at kurage.nimh.nih.gov Fri Dec 8 06:47:41 2006 From: tomh at kurage.nimh.nih.gov (Tom Holroyd) Date: Fri, 8 Dec 2006 06:47:41 -0500 (EST) Subject: [Numpy-discussion] shuffle bug Message-ID: This is certainly a bug. It has been mentioned before, but there was no comment. shuffle() doesn't handle multidimensional arrays. >>> from numpy import * >>> from numpy.random import * >>> a = arange(12) >>> a.shape = (6,2) >>> a array([[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11]]) >>> shuffle(a) >>> a array([[ 0, 1], [ 2, 3], [ 2, 3], [ 0, 1], [ 4, 5], [10, 11]]) This is with numpy 1.0. The [0, 1] row was duplicated. That's not right. Tom Holroyd, Ph.D.
We experience the world not as it is, but as we expect it to be. From robert.kern at gmail.com Fri Dec 8 21:07:34 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 08 Dec 2006 20:07:34 -0600 Subject: [Numpy-discussion] shuffle bug In-Reply-To: References: Message-ID: <457A1A66.7090602@gmail.com> Tom Holroyd wrote: > This is certainly a bug. It has been mentioned before, but there > was no comment. Yes, there was. http://projects.scipy.org/pipermail/numpy-discussion/2006-November/024783.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Fri Dec 8 21:52:33 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri, 08 Dec 2006 19:52:33 -0700 Subject: [Numpy-discussion] installation error in cygwin In-Reply-To: <008401c71a73$ea39b480$a32d349b@WeiLiLaptop> References: <008401c71a73$ea39b480$a32d349b@WeiLiLaptop> Message-ID: <457A24F1.1030608@ieee.org> Wei wrote: > > Hi, > > I just got my new intel core duo laptop. So I downloaded the new > cygwin (including everything) but couldn't get the numarray or numpy > modules installed correctly. I always got the following error. Can > someone help? > > Many thanks! > > Wei > > python setup.py install > > Using EXTRA_COMPILE_ARGS = [] > > Using builtin 'lite' BLAS and LAPACK > > running config > > Wrote config.h > > running install > > running build > > running build_py > > copying Lib/numinclude.py -> build/lib.cygwin-1.5.22-i686-2.4/numarray > > running build_ext > > building 'numarray.libnumarray' extension > > gcc -shared -Wl,--enable-auto-image-base > build/temp.cygwin-1.5.22-i686-2.4/Src/libnumarraymodule.o > -L/usr/lib/python2.4/config -lpython2.4 -o > build/lib.cygwin-1.5.22-i686-2.4/numarray/libnumarray.dll -lm -L/lib > -lm -lc -lgcc -L/lib/mingw -lmingwex > > /lib/mingw/libmingwex.a(feclearexcept.o):feclearexcept.c:(.text+0x21): > undefined reference to `___cpu_features' > > /lib/mingw/libmingwex.a(fetestexcept.o):fetestexcept.c:(.text+0x7): > undefined reference to `___cpu_features' > > collect2: ld returned 1 exit status > > error: command 'gcc' failed with exit status 1 > You don't need cygwin to install NumPy. I use the mingw compiler to compile windows binaries. This error looks like a problem with the platform-dependent code, but it's showing up in an odd place. I'm not sure what the issue is. -Travis From charlesr.harris at gmail.com Fri Dec 8 22:58:20 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 8 Dec 2006 20:58:20 -0700 Subject: [Numpy-discussion] installation error in cygwin In-Reply-To: <457A24F1.1030608@ieee.org> References: <008401c71a73$ea39b480$a32d349b@WeiLiLaptop> <457A24F1.1030608@ieee.org> Message-ID: On 12/8/06, Travis Oliphant wrote: > > Wei wrote: > > > > Hi, > > > > I just got my new intel core duo laptop. So I downloaded the new > > cygwin (including everything) but couldn't get the numarray or numpy > > modules installed correctly. I always got the following error. Can > > someone help? > > > > Many thanks!
> > > > Wei > > python setup.py install > > Using EXTRA_COMPILE_ARGS = [] > > Using builtin 'lite' BLAS and LAPACK > > running config > > Wrote config.h > > running install > > running build > > running build_py > > copying Lib/numinclude.py -> build/lib.cygwin-1.5.22-i686-2.4/numarray > > running build_ext > > building 'numarray.libnumarray' extension > > gcc -shared -Wl,--enable-auto-image-base > > build/temp.cygwin-1.5.22-i686-2.4/Src/libnumarraymodule.o > > -L/usr/lib/python2.4/config -lpython2.4 -o > > build/lib.cygwin-1.5.22-i686-2.4/numarray/libnumarray.dll -lm -L/lib > > -lm -lc -lgcc -L/lib/mingw -lmingwex > > > > /lib/mingw/libmingwex.a(feclearexcept.o):feclearexcept.c:(.text+0x21): > > undefined reference to `___cpu_features' > > > > /lib/mingw/libmingwex.a(fetestexcept.o):fetestexcept.c:(.text+0x7): > > undefined reference to `___cpu_features' > > > > collect2: ld returned 1 exit status > > > > error: command 'gcc' failed with exit status 1 > > Looks almost like a mix of mingw and cygwin. The two aren't compatible, so I wonder if the python you have was compiled with cygwin or with vc. If the latter, mingw is the proper compiler to use. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Sat Dec 9 00:31:47 2006 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 9 Dec 2006 00:31:47 -0500 Subject: [Numpy-discussion] shuffle bug In-Reply-To: <457A1A66.7090602@gmail.com> References: <457A1A66.7090602@gmail.com> Message-ID: > Tom Holroyd wrote: >> there was no comment. On Fri, 08 Dec 2006, Robert Kern apparently wrote: > Yes, there was. > http://projects.scipy.org/pipermail/numpy-discussion/2006-November/024783.html Also see the previous comment: http://projects.scipy.org/pipermail/numpy-discussion/2006-November/024782.html fwiw, Alan Isaac From cssmwbs at gmail.com Sat Dec 9 01:20:31 2006 From: cssmwbs at gmail.com (WB) Date: Fri, 8 Dec 2006 22:20:31 -0800 Subject: [Numpy-discussion] compile scipy by using intel compiler In-Reply-To: <1165354075.6742.5.camel@cortechs25.cortechs.net> References: <1165354075.6742.5.camel@cortechs25.cortechs.net> Message-ID: <7c13686f0612082220w56ddf420xbeb6cc27588739e6@mail.gmail.com> hi gen, have you tried the compiler designed specifically for the opteron rather than the intel? you can download it here: http://developer.amd.com/acml.jsp don't know if it will get rid of any of your errors or if it will help compile scipy, but it may be worth a try anyway. wb On 12/5/06, Gennan Chen <gnchen at cortechs.net> wrote: > > Hi! All, > > I have a dual opteron 285 with 8G ram machine. And I ran FC6 x86_64 on > that. I did manage to get numpy (from svn) compiled by using icc 9.1.0.45 and mkl > 9.0 (got 3 errors when I ran the test). But no such luck for scipy (from > svn).
Below is the error: > > Lib/special/cephes/mconf.h(137): remark #193: zero used for undefined > preprocessing identifier > #if WORDS_BIGENDIAN /* Defined in pyconfig.h */ > ^ > > Lib/special/cephes/const.c(92): error: floating-point operation result is > out of range > double INFINITY = 1.0/0.0; /* 99e999; */ > ^ > > Lib/special/cephes/const.c(97): error: floating-point operation result is > out of range > double NAN = 1.0/0.0 - 1.0/0.0; > ^ > > Lib/special/cephes/const.c(97): error: floating-point operation result is > out of range > double NAN = 1.0/0.0 - 1.0/0.0; > ^ > > compilation aborted for Lib/special/cephes/const.c (code 2) > error: Command "icc -O2 -g -fomit-frame-pointer -mcpu=pentium4 > -mtune=pentium4 -march=pentium4 -msse3 -axW -Wall -fPIC -c > Lib/special/cephes/const.c -o build/temp.linux-x86_64-2.4/Lib/special/cephes/const.o" > failed with exit status 2 > > Does anyone have a solution for this? > > BTW, the 3 errors I got from numpy are: > File > "/usr/lib64/python2.4/site-packages/numpy/lib/tests/test_ufunclike.py", line > 25, in test_ufunclike > Failed example: > nx.sign(a) > Expected: > array([ 1., -1., 0., 0., 1., -1.]) > Got: > array([ 1., -1., -1., 0., 1., -1.]) > ********************************************************************** > File > "/usr/lib64/python2.4/site-packages/numpy/lib/tests/test_ufunclike.py", line > 40, in test_ufunclike > Failed example: > nx.sign(a, y) > Expected: > array([True, True, False, False, True, True], dtype=bool) > Got: > array([True, True, True, False, True, True], dtype=bool) > ********************************************************************** > File > "/usr/lib64/python2.4/site-packages/numpy/lib/tests/test_ufunclike.py", line > 43, in test_ufunclike > Failed example: > y > Expected: > array([True, True, False, False, True, True], dtype=bool) > Got: > array([True, True, True, False, True, True], dtype=bool) > > > Are these errors serious?? > > Or maybe I should get back to gcc? Anyone got a good speedup by using icc > and mkl? > > -- > Gen-Nan Chen, PhD > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From morten.bjerkaas at gmail.com Sun Dec 10 07:40:28 2006 From: morten.bjerkaas at gmail.com (=?ISO-8859-1?Q?Morten_Bjerk=E5s?=) Date: Sun, 10 Dec 2006 13:40:28 +0100 Subject: [Numpy-discussion] griddata in python from x,y,z coordinates Message-ID: Dear list, I wonder if you know of a command in numpy similar to the griddata command in MATLAB. This command makes a grid out of a set of x,y,z coordinates. best regards Morten -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Dec 10 07:45:40 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Dec 2006 06:45:40 -0600 Subject: [Numpy-discussion] griddata in python from x,y,z coordinates In-Reply-To: References: Message-ID: <457C0174.9010207@gmail.com> Morten Bjerkås wrote: > Dear list, > I wonder if you know of a command in numpy similar to the > griddata command in MATLAB. This command makes a grid out of a set of > x,y,z coordinates.
http://www.scipy.org/Cookbook/Matplotlib/Gridding_irregularly_spaced_data -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at gmail.com Sun Dec 10 15:06:12 2006 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 10 Dec 2006 15:06:12 -0500 Subject: [Numpy-discussion] ANN: An alternative to numpy.core.ma Message-ID: <200612101506.12873.pgmdevlist@gmail.com> All, I just posted on the DeveloperZone of the wiki the latest version of maskedarray, an alternative to numpy.core.ma. You can download it here: http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/maskedarray-1.00.dev0040.tar.gz The package has three modules: core (with the basic functions of numpy.core.ma), extras (which adds some functions, such as apply_along_axis, or the concatenator mr_), and testutils (which adds maskedarray support to the test functions). It also comes with its test suite (available in the tests subdirectory). For those of you who were not aware of it, the new MaskedArray is a subclass of ndarray, and it accepts any subclass of ndarray as data. You can use it as you would numpy.core.ma.MaskedArray. For those of you who already tested the package, the main modifications are: - the reorganization of the initial module into core+extras. - Data are now shared by default (in other words, the copy flag defaults to False in MaskedArray.__new__), for consistency with the rest of numpy. - An additional boolean flag has been introduced: keep_mask (with a default of True). This flag is useful when trying to mask a masked array: it tells __new__ whether to keep the initial mask (in that case, the new mask will be combined with the old mask) or not (in that case, the new mask replaces the old one). - Some functions/routines that were missing have been added (any/all...) As always, this is a work in progress. In particular, I should really check for the bottlenecks: would anybody have some pointers? If you wanna be on the safe, optimized side, stick to numpy.core.ma. Otherwise, please try this new implementation, and don't forget to give me some feedback! PS: Technical question: how can I delete some files in the DeveloperZone wiki? The maskedarray.py, test_maskedarray.py, test_subclasses.py files are out of date and should be replaced by the package. Thanks a lot in advance! From david at icps.u-strasbg.fr Mon Dec 11 06:32:09 2006 From: david at icps.u-strasbg.fr (R. David) Date: Mon, 11 Dec 2006 12:32:09 +0100 Subject: [Numpy-discussion] lapack_lite dgesv Message-ID: <457D41B9.1030804@icps.u-strasbg.fr> Hello, I am trying to use the lapack_lite dgesv routine. The following sample code: from numpy import * [....] a=zeros((nbrows,nbcols),float,order='C') [....] ipiv=zeros((DIM),int,order='C') [....] linalg.lapack_lite.dgesv(DIM,1,a,DIM,asarray(ipiv),b,DIM,info) leads to the following error message: lapack_lite.LapackError: Parameter ipiv is not of type PyArray_INT in lapack_lite.dgesv I don't understand the type problem for ipiv! Indeed, the type of 'a' is OK, and ipiv is created the same way as a, but something goes wrong. Do you have a clue for this? Regards, Romaric -- -------------------------------------- R. David - david at icps.u-strasbg.fr Tel.
: 03 90 24 45 48 (Fax 45 47) -------------------------------------- From alexandre.fayolle at logilab.fr Mon Dec 11 08:16:21 2006 From: alexandre.fayolle at logilab.fr (Alexandre Fayolle) Date: Mon, 11 Dec 2006 14:16:21 +0100 Subject: [Numpy-discussion] Numeric memory leak when building Numeric.array from numarray.array In-Reply-To: <1165509382.2588.54.camel@localhost.localdomain> References: <20061207155020.GA18896@crater.logilab.fr> <1165509382.2588.54.camel@localhost.localdomain> Message-ID: <20061211131621.GE18685@crater.logilab.fr> On Thu, Dec 07, 2006 at 05:36:22PM +0100, Francesc Altet wrote: > On Thu, 07 Dec 2006 at 16:50 +0100, Alexandre Fayolle wrote: > > Hi, > > > > I'm facing a memory leak on an application that has to use numarray and > > Numeric (because of external dependencies). > > > > The problem occurs when building a Numeric array from a numarray array: > > > > import Numeric > > import numarray > > import sys > > atest = numarray.arange(200) > > temp = Numeric.array(atest) > > print sys.getrefcount(atest) # prints 3 > > print sys.getrefcount(temp) # prints 2 > > > > I'm running numarray 1.5.2 and Numeric 24.2 > > Yeah, it seems like the array protocol implementation in Numeric is > leaking. Unfortunately, as Numeric maintenance has been dropped, there > is little chance that this will be fixed in the future. > > > > > I can work around this by using an intermediate string representation: > > > > temp = Numeric.fromstring(atest.tostring(), atest.typecode()) > > temp.shape = atest.shape > > Another (faster) workaround would be: > > temp2 = Numeric.fromstring(atest._data, typecode=atest.typecode()) Nice! Thanks Francesc. -- Alexandre Fayolle LOGILAB, Paris (France) Python, Zope, Plone, Debian training: http://www.logilab.fr/formations Custom software development: http://www.logilab.fr/services Scientific computing: http://www.logilab.fr/science Takeover and maintenance of CPS sites: http://www.migration-cms.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 481 bytes Desc: Digital signature URL: From tim.hochberg at ieee.org Mon Dec 11 08:20:02 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Mon, 11 Dec 2006 06:20:02 -0700 Subject: [Numpy-discussion] lapack_lite dgesv In-Reply-To: <457D41B9.1030804@icps.u-strasbg.fr> References: <457D41B9.1030804@icps.u-strasbg.fr> Message-ID: <457D5B02.2000501@ieee.org> R. David wrote: > Hello, > > I am trying to use the lapack_lite dgesv routine. > > The following sample code: > > from numpy import * > [....] > a=zeros((nbrows,nbcols),float,order='C') > [....] > ipiv=zeros((DIM),int,order='C') > [....] > linalg.lapack_lite.dgesv(DIM,1,a,DIM,asarray(ipiv),b,DIM,info) > > leads to the following error message: > lapack_lite.LapackError: Parameter ipiv is not of type PyArray_INT in lapack_lite.dgesv > > I don't understand the type problem for ipiv! > Indeed, the type of 'a' is OK, and ipiv is created the same way as a, but something goes > wrong. > Do you have a clue for this? > The problem is probably your definition of ipiv. "(DIM)" is just a parenthesized scalar; what you probably want is "(DIM,)", which is a one-tuple. Personally, I'd recommend using list notation ("[nbrows, nbcols]", "[DIM]") rather than tuple notation since it's both easier to read and avoids this type of mistake. Regards, -tim From david at icps.u-strasbg.fr Mon Dec 11 08:32:44 2006 From: david at icps.u-strasbg.fr (R.
David) Date: Mon, 11 Dec 2006 14:32:44 +0100 Subject: [Numpy-discussion] lapack_lite dgesv In-Reply-To: <457D5B02.2000501@ieee.org> References: <457D41B9.1030804@icps.u-strasbg.fr> <457D5B02.2000501@ieee.org> Message-ID: <457D5DFC.2060109@icps.u-strasbg.fr> Hello Tim, > > The problem is probably your definition of ipiv. "(DIM)" is just a > parenthesized scalar; what you probably want is "(DIM,)", which is a > one-tuple. Personally, I'd recommend using list notation ("[nbrows, > nbcols]", "[DIM]") rather than tuple notation since it's both easier to > read and avoids this type of mistake. I tried both notations and neither works. In the meantime, I tried extending the ipiv array to a 2-dimensional one (as if I had more than one right-hand side, for instance), but I still get the error message. Romaric From tim.hochberg at ieee.org Mon Dec 11 08:48:55 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Mon, 11 Dec 2006 06:48:55 -0700 Subject: [Numpy-discussion] lapack_lite dgesv In-Reply-To: <457D5DFC.2060109@icps.u-strasbg.fr> References: <457D41B9.1030804@icps.u-strasbg.fr> <457D5B02.2000501@ieee.org> <457D5DFC.2060109@icps.u-strasbg.fr> Message-ID: <457D61C7.8050308@ieee.org> R. David wrote: > Hello Tim, >> The problem is probably your definition of ipiv. "(DIM)" is just a >> parenthesized scalar; what you probably want is "(DIM,)", which is a >> one-tuple. Personally, I'd recommend using list notation ("[nbrows, >> nbcols]", "[DIM]") rather than tuple notation since it's both easier to >> read and avoids this type of mistake. >> > > I tried both notations and neither works. > > In the meantime, I tried extending the ipiv array to a 2-dimensional > one (as if I had more than one right-hand side, for instance), but I still > get the error message. > > Romaric, Try replacing 'int' with intc (or numpy.intc if you are not using 'import *'). The following 'works' for me in the sense that it doesn't throw any errors (although I imagine the results are nonsense): from numpy import * nbrows, nbcols = 10, 10 a=zeros([nbrows,nbcols],float,order='C') b = zeros([nbcols], float, order='C') DIM = nbrows info = 0 ipiv=zeros([DIM],intc,order='C') linalg.lapack_lite.dgesv(DIM,1,a,DIM,ipiv,b,DIM,info) [In the future, could you try including self-contained examples, so others don't have to go back and figure out sensible values for DIMS and info and whatnot? Ideally we'd just be able to throw them into a file and run them and get the same error that you are getting] Hope that solves it. -tim From david at icps.u-strasbg.fr Mon Dec 11 09:00:52 2006 From: david at icps.u-strasbg.fr (R. David) Date: Mon, 11 Dec 2006 15:00:52 +0100 Subject: [Numpy-discussion] lapack_lite dgesv In-Reply-To: <457D61C7.8050308@ieee.org> References: <457D41B9.1030804@icps.u-strasbg.fr> <457D5B02.2000501@ieee.org> <457D5DFC.2060109@icps.u-strasbg.fr> <457D61C7.8050308@ieee.org> Message-ID: <457D6494.3080103@icps.u-strasbg.fr> Hello, > Try replacing 'int' with intc (or numpy.intc if you are not using > 'import *'). The following 'works' for me in the sense that it doesn't > throw any errors (although I imagine the results are nonsense): Thanks, it works now!
Sorry for not including the whole code, I just didn't want to annoy the whole list with bunches of code :-) Regards, Romaric From tim.hochberg at ieee.org Mon Dec 11 09:01:43 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Mon, 11 Dec 2006 07:01:43 -0700 Subject: [Numpy-discussion] lapack_lite dgesv In-Reply-To: <457D6494.3080103@icps.u-strasbg.fr> References: <457D41B9.1030804@icps.u-strasbg.fr> <457D5B02.2000501@ieee.org> <457D5DFC.2060109@icps.u-strasbg.fr> <457D61C7.8050308@ieee.org> <457D6494.3080103@icps.u-strasbg.fr> Message-ID: <457D64C7.6030905@ieee.org> R. David wrote: > Hello, > > > >> Try replacing 'int' with intc (or numpy.intc if you are not using >> 'import *'). The following 'works' for me in the sense that it doesn't >> throw any errors (although I imagine the results are nonsense): >> > Thanks, it works now! > Great. Glad that helped. > Sorry for not including the whole code, I just didn't want to annoy the whole list > with bunches of code :-) > Don't include the whole code -- that would probably annoy somebody. What you included was almost right, except that it should run on its own. So nbrows, nbcols, DIMS and info all needed to be defined. That's all. -tim From faltet at carabos.com Mon Dec 11 10:33:18 2006 From: faltet at carabos.com (Francesc Altet) Date: Mon, 11 Dec 2006 16:33:18 +0100 Subject: [Numpy-discussion] Numeric memory leak when building Numeric.array from numarray.array In-Reply-To: <20061211131621.GE18685@crater.logilab.fr> References: <20061207155020.GA18896@crater.logilab.fr> <1165509382.2588.54.camel@localhost.localdomain> <20061211131621.GE18685@crater.logilab.fr> Message-ID: <1165851198.2847.12.camel@localhost.localdomain> On Mon, 11 Dec 2006 at 14:16 +0100, Alexandre Fayolle wrote: > > > I can work around this by using an intermediate string representation: > > > > > > temp = Numeric.fromstring(atest.tostring(), atest.typecode()) > > > temp.shape = atest.shape > > > > Another (faster) workaround would be: > > > > temp2 = Numeric.fromstring(atest._data, typecode=atest.typecode()) > > Nice! Well, I have to say that this approach only works for contiguous, non-offset arrays, as can be seen in: In [59]:atest = numarray.arange(10) In [60]:Numeric.fromstring(atest[5:]._data, typecode=atest.typecode()) Out[60]:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],'i') # wrong! In [61]:Numeric.fromstring(atest[5:].tostring(), atest.typecode()) Out[61]:array([5, 6, 7, 8, 9],'i') # good In [62]:Numeric.fromstring(atest[::2]._data, typecode=atest.typecode()) Out[62]:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],'i') # wrong! In [63]:Numeric.fromstring(atest[::2].tostring(), atest.typecode()) Out[63]:array([0, 2, 4, 6, 8],'i') # good So, be careful when using it. I'd rather keep using your approach, which is the faster one that is completely general. -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth From abli at freemail.hu Mon Dec 11 12:51:29 2006 From: abli at freemail.hu (Abel Daniel) Date: Mon, 11 Dec 2006 17:51:29 +0000 (UTC) Subject: [Numpy-discussion] a==b for numpy arrays Message-ID: > Hi! My unittests got broken because 'a==b' for numpy arrays returns an array instead of returning True or False: >>> import numpy >>> a = numpy.array([1, 2]) >>> b = numpy.array([1, 4]) >>> a==b array([True, False], dtype=bool) This means, for example: >>> if a==b: ... print 'equal' ... Traceback (most recent call last): File "<stdin>", line 1, in ?
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() >>> Now, I think that having a way of getting an element-wise comparison (i.e. getting an array of bools) is great. _But_ why make that the result of a '==' comparison? Is there any actual code that does, for example >>> result_array = a==b or any variant thereof? Thanks in advance, Daniel From david.huard at gmail.com Mon Dec 11 13:18:18 2006 From: david.huard at gmail.com (David Huard) Date: Mon, 11 Dec 2006 13:18:18 -0500 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: References: Message-ID: <91cf711d0612111018tbf35b22h1b0b5f26a2fa1264@mail.gmail.com> Hi Daniel, Just out of curiosity, what's wrong with if all(a==b): ... ? Cheers, David 2006/12/11, Abel Daniel : > > > > Hi! > > My unittests got broken because 'a==b' for numpy arrays returns an > array instead of returning True or False: > > >>> import numpy > >>> a = numpy.array([1, 2]) > >>> b = numpy.array([1, 4]) > >>> a==b > array([True, False], dtype=bool) > > This means, for example: > >>> if a==b: > ... print 'equal' > ... > Traceback (most recent call last): > File "<stdin>", line 1, in ? > ValueError: The truth value of an array with more than one element is > ambiguous. > Use a.any() or a.all() > >>> > > > Now, I think that having a way of getting an element-wise comparison > (i.e. getting an array of bools) is great. _But_ why make that the > result of a '==' comparison? Is there any actual code that does, for > example > >>> result_array = a==b > or any variant thereof? > > Thanks in advance, > Daniel > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Dec 11 14:32:27 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Dec 2006 13:32:27 -0600 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: References: Message-ID: <457DB24B.6020500@gmail.com> Abel Daniel wrote: > Hi! > > My unittests got broken because 'a==b' for numpy arrays returns an > array instead of returning True or False: > >>>> import numpy >>>> a = numpy.array([1, 2]) >>>> b = numpy.array([1, 4]) >>>> a==b > array([True, False], dtype=bool) > > This means, for example: >>>> if a==b: > ... print 'equal' > ... > Traceback (most recent call last): > File "<stdin>", line 1, in ? > ValueError: The truth value of an array with more than one element is ambiguous. > Use a.any() or a.all() > > > Now, I think that having a way of getting an element-wise comparison > (i.e. getting an array of bools) is great. _But_ why make that the > result of a '==' comparison? Is there any actual code that does, for > example >>>> result_array = a==b > or any variant thereof? Yes, a lot. Rich comparisons have been in Numeric for quite a long time now. I'm not sure what version of Numeric you were transitioning from that didn't do this, but it must have been extremely old. I suspect, however, that you were using a relatively recent version of Numeric and simply did not know that the rich comparisons were taking place. Now, what did change from Numeric to numpy (also, from Numeric to numarray) is that arrays can no longer be used as a truth value. It used to be that Numeric arrays' truth value was the same as Numeric.sometrue(arr).
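In numpy, both readings are spelled out explicitly as methods; a quick illustration with the a and b from above:

>>> (a == b).any()   # the old Numeric truth value
True
>>> (a == b).all()
False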
It is likely that your unit tests were expecting (a == b) to be the same as Numeric.alltrue(a == b). Since this is not the case, your unit tests had bugs. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From abli at freemail.hu Mon Dec 11 15:41:05 2006 From: abli at freemail.hu (Abel Daniel) Date: Mon, 11 Dec 2006 20:41:05 +0000 (UTC) Subject: [Numpy-discussion] a==b for numpy arrays References: <457DB24B.6020500@gmail.com> Message-ID: Robert Kern gmail.com> writes: > > Abel Daniel wrote: > > Now, I think that having a way of getting an element-wise comparison > > (i.e. getting an array of bools) is great. _But_ why make that the > > result of a '==' comparison? Is there any actual code that does, for > > example > >>>> result_array = a==b > > or any variant thereof? > > Yes, a lot. > And would it be much more cumbersome to use something like numpy.eq_as_array(a,b) or a.eq_as_array(b) in these cases? Could you show an example so that I can better appreciate the difference? The thing I can't get into my head is that '=' in the mathematical sense has a well-defined meaning for matrices; this seems to be broken by the current behaviour. That is, what "A+B" on a blackboard in a math class means maps nicely to what 'a+b' means with a and b being numpy arrays. But 'A=B' means something completely different than 'a==b'. I tried to dig up something about this "'a==b' returns an array" decision from the discussion surrounding PEP 207 (on comp.lang.python or on python-dev) but I got lost in that thread. -- Daniel From tim.hochberg at ieee.org Mon Dec 11 16:09:50 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Mon, 11 Dec 2006 14:09:50 -0700 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: References: <457DB24B.6020500@gmail.com> Message-ID: <457DC91E.8070509@ieee.org> Abel Daniel wrote: > Robert Kern gmail.com> writes: > > >> Abel Daniel wrote: >> >>> Now, I think that having a way of getting an element-wise comparison >>> (i.e. getting an array of bools) is great. _But_ why make that the >>> result of a '==' comparison? Is there any actual code that does, for >>> example >>> >>>>>> result_array = a==b >>>>>> >>> or any variant thereof? >>> >> Yes, a lot. >> >> > And would it be much more cumbersome to use something like > numpy.eq_as_array(a,b) or a.eq_as_array(b) in these cases? > Yes. > Could you show an example so that I can better appreciate the difference? > # Replace all zeros with something safe so some calculation doesn't go insane. a[a==0] = DELTA Keep in mind also that all of the comparison operators are overloaded. It would be difficult to explain if "a<=0" returned an array, but "a==0" returned a scalar. > The thing I can't get into my head is that '=' in the mathematical sense has a > well-defined meaning for matrices; this seems to be broken by the current > behaviour. Numpy is not really about matrices. Numpy is about arrays, which are different and, for the most part, more powerful. You can use matrices inside numpy if you insist, but I personally think you're better off just learning to use arrays. Tastes vary though. > That is, what "A+B" on a blackboard in a math class means maps nicely > to what 'a+b' means with a and b being numpy arrays. But 'A=B' means something > completely different than 'a==b'.
> One thing to keep in mind is that what you have in mind, which is equivalent to numpy.all(a==b), is almost always a bad idea when using floating point. -tim From robert.kern at gmail.com Mon Dec 11 16:17:39 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Dec 2006 15:17:39 -0600 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: References: <457DB24B.6020500@gmail.com> Message-ID: <457DCAF3.5080106@gmail.com> Abel Daniel wrote: > Robert Kern gmail.com> writes: > >> Abel Daniel wrote: >>> Now, I think that having a way of getting an element-wise comparison >>> (i.e. getting an array of bools) is great. _But_ why make that the >>> result of a '==' comparison? Is there any actual code that does, for >>> example >>>>>> result_array = a==b >>> or any variant thereof? >> Yes, a lot. >> > And would it be much more cumbersome to use something like > numpy.eq_as_array(a,b) or a.eq_as_array(b) in these cases? numpy.equal() is the ufunc corresponding to the == operation. > Could you show an example so that I can better appreciate the difference? a[a < 0] = 0 a[less(a, 0)] = 0 ma.masked_array(crufty_data, mask=(crufty_data==9999)) ma.masked_array(crufty_data, mask=equal(crufty_data, 9999)) (a >= b) & (a <= c) greater_equal(a,b) & less_equal(a,c) null_space = a[s <= eps] null_space = u[less_equal(s, eps)] > The thing I can't get into my head is that '=' in the mathematical sense has a > well-defined meaning for matrices; this seems to be broken by the current > behaviour. That is, what "A+B" on a blackboard in a math class means maps nicely > to what 'a+b' means with a and b being numpy arrays. But 'A=B' means something > completely different than 'a==b'. Well, yes. Computer languages reuse symbols that have other meanings in other contexts. For that matter 'a = b' in Python is definitely not the same thing as 'A = B' on the blackboard. Suffice it to say that a large majority of people felt that rich comparisons (and specifically rich comparisons for Numeric arrays) were enough of an improvement over the use of functions to do the same thing that we got the language changed to support it. Perhaps it is simply a matter of taste as to whether or not one thinks it is an improvement, but enough people think it is that it won't be changing back. > I tried to dig up something about this "'a==b' returns an array" decision from > the discussion surrounding PEP 207 (on comp.lang.python or on python-dev) but I > got lost in that thread. Most of the results of that discussion are in the PEP itself. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From a.u.r.e.l.i.a.n at gmx.net Mon Dec 11 18:18:18 2006 From: a.u.r.e.l.i.a.n at gmx.net (Johannes Loehnert) Date: Tue, 12 Dec 2006 00:18:18 +0100 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: References: <457DB24B.6020500@gmail.com> Message-ID: <200612120018.18739.a.u.r.e.l.i.a.n@gmx.net> Hi, > current behaviour. That is, what "A+B" on a blackboard in a math class > means maps nicely to what 'a+b' means with a and b being numpy arrays. But > 'A=B' means something completely different than 'a==b'. This mapping is dangerous; I think A+B and A-B might be the only cases where it actually works. A*B and A/B (=A*inv(B)) are completely different from a*b in Python. As it is, you only have to remember that every binary operator works element-wise.
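A quick illustration (plain numpy; the values are arbitrary):

>>> import numpy
>>> A = numpy.array([[1., 2.], [3., 4.]])
>>> B = numpy.array([[0., 1.], [1., 0.]])
>>> A * B              # element-wise, not the blackboard matrix product
array([[ 0.,  2.],
       [ 3.,  0.]])
>>> numpy.dot(A, B)    # the blackboard A*B
array([[ 2.,  1.],
       [ 4.,  3.]])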
Reasoning aside, just wrap an all(...) around your if-comparisons and you will be fine. :-) Johannes From hirzel at resonon.com Mon Dec 11 18:26:56 2006 From: hirzel at resonon.com (Tim Hirzel) Date: Mon, 11 Dec 2006 18:26:56 -0500 Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile() Message-ID: <457DE940.1060807@resonon.com> Hi, Does anyone know how to get fromfile and tofile to work from a tempfile.TemporaryFile? Or if it's not possible? I am getting this: >>> import tempfile >>> f = tempfile.TemporaryFile() >>> f <open file '<fdopen>', mode 'w+b' at 0x01EE1728> >>> a = numpy.arange(10) >>> a.tofile(f) Traceback (most recent call last): File "<stdin>", line 1, in ? IOError: first argument must be a string or open file thanks! tim From lists.steve at arachnedesign.net Mon Dec 11 18:37:24 2006 From: lists.steve at arachnedesign.net (Steve Lianoglou) Date: Mon, 11 Dec 2006 18:37:24 -0500 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: <457DCAF3.5080106@gmail.com> References: <457DB24B.6020500@gmail.com> <457DCAF3.5080106@gmail.com> Message-ID: Hi, It's not relevant to the point of this discussion all that much, but: > a[a < 0] = 0 > a[less(a, 0)] = 0 Instead I've been doing something like: a[where(a < 0)] = 0 I didn't realize you could do it the other way. Is there a difference somewhere between the two, or are they interchangeable? I kind of like the shorter way (sans where clause) better ... Thanks, -steve From lists.steve at arachnedesign.net Mon Dec 11 18:40:40 2006 From: lists.steve at arachnedesign.net (Steve Lianoglou) Date: Mon, 11 Dec 2006 18:40:40 -0500 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: References: <457DB24B.6020500@gmail.com> <457DCAF3.5080106@gmail.com> Message-ID: <0E86BF16-C8C7-40B1-95A3-8AC8E2CC9FF6@arachnedesign.net> > It's not relevant to the point of this discussion all that much, but: > >> a[a < 0] = 0 >> a[less(a, 0)] = 0 > > Instead I've been doing something like: > > a[where(a < 0)] = 0 > > I didn't realize you could do it the other way. Is there a > difference somewhere between the two, or are they interchangeable? Ah ... I see, w/o the where it returns a boolean array. I reckon that's actually better to use than the where clause for cases like this since (for one) it'll take up less memory than arrays of ints. Sorry for talking to myself ... -steve From kwgoodman at gmail.com Mon Dec 11 18:48:41 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 11 Dec 2006 15:48:41 -0800 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: <0E86BF16-C8C7-40B1-95A3-8AC8E2CC9FF6@arachnedesign.net> References: <457DB24B.6020500@gmail.com> <457DCAF3.5080106@gmail.com> <0E86BF16-C8C7-40B1-95A3-8AC8E2CC9FF6@arachnedesign.net> Message-ID: On 12/11/06, Steve Lianoglou wrote: > > It's not relevant to the point of this discussion all that much, but: > > > >> a[a < 0] = 0 > >> a[less(a, 0)] = 0 > > > > Instead I've been doing something like: > > > > a[where(a < 0)] = 0 > > > > I didn't realize you could do it the other way. Is there a > > difference somewhere between the two, or are they interchangeable? > > Ah ... I see, w/o the where it returns a boolean array. I reckon that's > actually better to use than the where clause for cases like this > since (for one) it'll take up less memory than arrays of ints. These are different: a[a[:,0] >0, :] a[where(a[:,0].A >0)[0],:] I think it would be great if the former gave the same result as the latter.
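In the meantime I work around it with explicit integer row indices, something like this (just a sketch; a is a matrix here, which is why .A is needed to get at the underlying array):

>>> import numpy
>>> rows = numpy.where(a.A[:, 0] > 0)[0]   # integer indices of the wanted rows
>>> a[rows, :]                             # selects whole rows predictably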
From charlesr.harris at gmail.com Mon Dec 11 18:50:45 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 Dec 2006 16:50:45 -0700 Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile() In-Reply-To: <457DE940.1060807@resonon.com> References: <457DE940.1060807@resonon.com> Message-ID: On 12/11/06, Tim Hirzel wrote: > > Hi, > Does anyone know how to get fromfile and tofile to work from a > tempfile.TemporaryFile? Or if it's not possible? > > I am getting this: > >>> import tempfile > >>> f = tempfile.TemporaryFile() > >>> f > <open file '<fdopen>', mode 'w+b' at 0x01EE1728> > >>> a = numpy.arange(10) > >>> a.tofile(f) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > IOError: first argument must be a string or open file Works for me: In [16]: f = tempfile.TemporaryFile() In [17]: a = ones(10) In [18]: a.tofile(f) In [19]: f.seek(0) In [20]: b = fromfile(f) In [21]: b Out[21]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) In [22]: f.close() What version of numpy are you running? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Mon Dec 11 19:35:54 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 11 Dec 2006 16:35:54 -0800 Subject: [Numpy-discussion] a==b for numpy arrays In-Reply-To: <0E86BF16-C8C7-40B1-95A3-8AC8E2CC9FF6@arachnedesign.net> References: <457DB24B.6020500@gmail.com> <457DCAF3.5080106@gmail.com> <0E86BF16-C8C7-40B1-95A3-8AC8E2CC9FF6@arachnedesign.net> Message-ID: <457DF96A.3080602@noaa.gov> Steve Lianoglou wrote: >> a[where(a < 0)] = 0 > Ah ... I see, w/o the where it returns a boolean array. I reckon that's > actually better to use than the where clause for cases like this > since (for one) it'll take up less memory than arrays of ints. Not to mention that you're creating an entire temporary array for no reason when you use where. The above statement creates a boolean array for a < 0, then creates another array with the where statement. Where is very handy when you want a new array, created according to some element-wise condition: b = where(a > 0, 10, 0) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
To determine if the comparison returns true for every element, all one has to do is use the 'all' method - not a huge amount of overhead, and now rather ubiquitous (in my experience) throughout the numerical software community (not to mention that rich comparison is _much_ more flexible, and in that, powerful). Oh, and another convenience method of which you should be aware is 'any', which returns true if any of the element-wise comparisons are true. DG > I tried to dig up something about this "'a==b' returns an array" decision from > the discussion surrounding PEP 207 (on comp.lang.python or on python-dev) but I > got lost in that thread. > > From david at ar.media.kyoto-u.ac.jp Tue Dec 12 02:49:30 2006 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 12 Dec 2006 16:49:30 +0900 Subject: [Numpy-discussion] Definition of correlation, correlate and so on ? Message-ID: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> Hi, I am polishing some code to compute autocorrelation using fft, and when testing the code against numpy.correlate, I realised that I am not sure about the definition... There are various functions related to correlation as far as numpy/scipy is concerned: numpy.correlate numpy.corrcoef scipy.signal.correlate For me, the correlation between two sequences X and Y at lag t is the sum(X[i] * Y*[i+lag]) where Y* is the complex conjugate of Y. numpy.correlate does not use the conjugate, and neither does scipy.signal.correlate; and I don't understand numpy.corrcoef. I've never seen complex correlation used without the conjugate, so I was curious why this definition was used? It is incompatible with the correlation as a scalar product, for example. Could someone give the definition used by those functions? cheers, David From svetosch at gmx.net Tue Dec 12 06:18:50 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Tue, 12 Dec 2006 12:18:50 +0100 Subject: [Numpy-discussion] numpy book In-Reply-To: <454A1197.1020606@ee.byu.edu> References: <4549E34A.4060507@gmx.net> <454A1197.1020606@ee.byu.edu> Message-ID: <457E901A.1030907@gmx.net> Travis Oliphant schrieb: >> Note that this is not a request to Travis to send me the latest version >> by private email. That would be inefficient and my need is not that >> urgent. Nevertheless I think that issue should be settled. >> >> > There will be an update, soon. I'm currently working on the index, > corrections, and formatting issues. > > The update will be sent in conjunction with the release of 1.0.1 which I > am targeting in 2 weeks. > I don't want to be a PITA, but should I have received something now that 1.0.1 is out? I can also offer help with formatting/editing the LyX source file if that's the problem. cheers, sven From charlesr.harris at gmail.com Tue Dec 12 08:12:16 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Dec 2006 06:12:16 -0700 Subject: [Numpy-discussion] Definition of correlation, correlate and so on ? In-Reply-To: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> Message-ID: On 12/12/06, David Cournapeau wrote: > > Hi, > > I am polishing some code to compute autocorrelation using fft, and > when testing the code against numpy.correlate, I realised that I am not > sure about the definition...
There are various functions related to > correlation as far as numpy/scipy is concerned: > > numpy.correlate > numpy.corrcoef > scipy.signal.correlate > > For me, the correlation between two sequences X and Y at lag t is > the sum(X[i] * Y*[i+lag]) where Y* is the complex conjugate of Y. > numpy.correlate does not use the conjugate, and neither does > scipy.signal.correlate; and I don't understand numpy.corrcoef. I've never seen complex > correlation used without the conjugate, so I was curious why this Neither have I, it is one of those oddities that may have been inherited from Numeric. I wouldn't mind seeing it changed but it is probably a bit late for that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Dec 12 08:17:43 2006 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 12 Dec 2006 22:17:43 +0900 Subject: [Numpy-discussion] Definition of correlation, correlate and so on ? In-Reply-To: References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> Message-ID: <457EABF7.3090302@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On 12/12/06, *David Cournapeau* > wrote: > > Hi, > > I am polishing some code to compute autocorrelation using fft, and > when testing the code against numpy.correlate, I realised that I > am not > sure about the definition... There are various functions related to > correlation as far as numpy/scipy is concerned: > > numpy.correlate > numpy.corrcoef > scipy.signal.correlate > > For me, the correlation between two sequences X and Y at lag t is > the sum(X[i] * Y*[i+lag]) where Y* is the complex conjugate of Y. > numpy.correlate does not use the conjugate, and neither does > scipy.signal.correlate; and I don't understand numpy.corrcoef. I've never seen complex > correlation used without the conjugate, so I was curious why this > > > Neither have I, it is one of those oddities that may have been > inherited from Numeric. I wouldn't mind seeing it changed but it is > probably a bit late for that. Well, I would myself call this a bug, not a feature, unless at least the doc specifies the behaviour; the point of my question was to get the opinion of others on this point. Anyway, a function that implements the 'real' cross-correlation as defined in signal processing and statistics is a must-have IMHO, David From aisaac at american.edu Tue Dec 12 09:40:02 2006 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 12 Dec 2006 09:40:02 -0500 Subject: [Numpy-discussion] Definition of correlation, correlate and so on ? In-Reply-To: References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> Message-ID: > On 12/12/06, David Cournapeau wrote: >> I am polishing some code to compute autocorrelation using >> fft, and when testing the code against numpy.correlate, >> I realised that I am not sure about the definition... >> There are various functions related to correlation as far >> as numpy/scipy is concerned: >> numpy.correlate >> numpy.corrcoef >> scipy.signal.correlate >> For me, the correlation between two sequences X and Y at lag t is >> the sum(X[i] * Y*[i+lag]) where Y* is the complex conjugate of Y. >> numpy.correlate does not use the conjugate, and neither does >> scipy.signal.correlate; and I don't understand numpy.corrcoef. I've never seen complex >> correlation used without the conjugate, so I was curious why this
I wouldn't mind seeing it > changed but it is probably a bit late for that. I hope that "too late" is not a determining argument! I hope the argument will address the following: - was there a justification for the extant behavior? If so, what was it, and does it still seem valid? - is the current definition reasonable; does it match definitions in use in at least some domain? http://mathworld.wolfram.com/Cross-Correlation.html http://en.wikipedia.org/wiki/Cross-correlation - if not, is this behavior so unexpected as to be considered a bug? - are many existing applications depending on it? The worst case is: it is a bug, but many existing users depend on the current behavior. I am not taking a position, but that seems the current view on this list. I hope that *if* that is the assessment, then a transition path will be plotted. For example, a keyword could be added, with a proper default, and a warning emitted when it is not set. Cheers, Alan Isaac From david.huard at gmail.com Tue Dec 12 10:02:12 2006 From: david.huard at gmail.com (David Huard) Date: Tue, 12 Dec 2006 10:02:12 -0500 Subject: [Numpy-discussion] Definition of correlation, correlate and so on ? In-Reply-To: References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> Message-ID: <91cf711d0612120702t1e7926e2y6aebd65454accc62@mail.gmail.com> > > - if not, is this behavior so unexpected as to be considered > a bug? > - are many existing applications depending on it? > > The worst case is: > it is a bug, but many existing users depend on the current behavior. > I am not taking a position, but that seems the current view on this list. > I hope that *if* that is the assessment, then a transition > path will be plotted. For example, a keyword could be > added, with a proper default, and a warning emitted when it > is not set. > +1 for a change. I'm not using the current implementation. Since it was undocumented, I prefered coding my own. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From hirzel at resonon.com Tue Dec 12 11:26:33 2006 From: hirzel at resonon.com (Tim Hirzel) Date: Tue, 12 Dec 2006 11:26:33 -0500 Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile() In-Reply-To: References: <457DE940.1060807@resonon.com> Message-ID: <457ED839.4080604@resonon.com> Hi Chuck, Thanks for checking that. I am running numpy 1.0.1 in python 2.4 on win32 (xp). Are you on linux? I double checked the behavior in 1.0 and 1.0.1, just to be extra sure, and it thows the IOError in both cases. tim Charles R Harris wrote: > > > On 12/11/06, *Tim Hirzel* > wrote: > > Hi, > Does anyone know how to get fromfile and tofile to work from a > tempfile.TemporaryFile? Or if its not possible? > > I am getting this: > >>> import tempfile > >>> f = tempfile.TemporaryFile () > >>> f > ', mode 'w+b' at 0x01EE1728> > >>> a = numpy.arange(10) > >>> a.tofile(f) > Traceback (most recent call last): > File "", line 1, in ? > IOError: first argument must be a string or open file > > > Works for me: > > In [16]: f = tempfile.TemporaryFile() > > In [17]: a = ones(10) > > In [18]: a.tofile(f) > > In [19]: f.seek(0) > > In [20]: b = fromfile(f) > > In [21]: b > Out[21]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) > > In [22]: f.close() > > What version of numpy are you running? 
>
> Chuck
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

From pgmdevlist at gmail.com Tue Dec 12 11:38:19 2006
From: pgmdevlist at gmail.com (Pierre GM)
Date: Tue, 12 Dec 2006 11:38:19 -0500
Subject: [Numpy-discussion] Definition of correlation, correlate and so on ?
In-Reply-To: <91cf711d0612120702t1e7926e2y6aebd65454accc62@mail.gmail.com>
References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> <91cf711d0612120702t1e7926e2y6aebd65454accc62@mail.gmail.com>
Message-ID: <200612121138.19906.pgmdevlist@gmail.com>

> +1 for a change. I'm not using the current implementation. Since it was
> undocumented, I preferred coding my own.

Same case as David. I found it easier to code something with FFTs than
trying to understand what was going on.

From tim.hochberg at ieee.org Tue Dec 12 12:07:24 2006
From: tim.hochberg at ieee.org (Tim Hochberg)
Date: Tue, 12 Dec 2006 10:07:24 -0700
Subject: [Numpy-discussion] Definition of correlation, correlate and so on ?
In-Reply-To: <457EABF7.3090302@ar.media.kyoto-u.ac.jp>
References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> <457EABF7.3090302@ar.media.kyoto-u.ac.jp>
Message-ID: <457EE1CC.9080806@ieee.org>

David Cournapeau wrote:
> Charles R Harris wrote:
>
>> On 12/12/06, *David Cournapeau* wrote:
>>
>>     Hi,
>>
>>     I am polishing some code to compute autocorrelation using fft, and
>>     when testing the code against numpy.correlate, I realised that I
>>     am not sure about the definition... There are various functions
>>     related to correlation as far as numpy/scipy is concerned:
>>
>>     numpy.correlate
>>     numpy.corrcoef
>>     scipy.signal.correlate
>>
>>     For me, the correlation between two sequences X and Y at lag t is
>>     the sum(X[i] * Y*[i+lag]) where Y* is the complex conjugate of Y.
>>     numpy.correlate does not use the conjugate, scipy.signal.correlate as
>>     well, and I don't understand numpy.corrcoef. I've never seen complex
>>     correlation used without the conjugate, so I was curious why this
>>
>>
>> Neither have I, it is one of those oddities that may have been
>> inherited from Numeric. I wouldn't mind seeing it changed but it is
>> probably a bit late for that.
>>
> Well, I would myself call this a bug, not a feature, unless at least the
> doc specifies the behaviour; the point of my question was to get the
> opinion of others on this point. Anyway, a function that implements the
> 'real' cross correlation as defined in signal processing and statistics
> is a must have IMHO,
>
It's unfriendly to modify the behavior of a function like this in a
point release. And, this particular type of modification is particularly
unfriendly since any code that depends on the current behavior won't
break cleanly, but will start producing failures, possibly intermittent,
data-dependent failures, which are especially troublesome.

In addition, neither the name correlation nor its docstring is strongly,
cough, correlated with cross-correlation. The docstring claims that it's
the "discrete, linear correlation", which appears to mean nothing in my
far from exhaustive web search.

So rather than "fixing" the function, I would first propose introducing
a function with a more descriptive name and docstring, for example you
could steal the name 'xcorr' from matlab.
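For concreteness, a minimal sketch of the conjugated definition David
gives, sum(X[i] * Y*[i+lag]), under such a name (a hypothetical 'xcorr',
not an existing numpy function; it handles non-negative lags only):

    import numpy

    def xcorr(x, y, lag):
        # cross-correlation at one non-negative lag, using the
        # conjugate of y: sum(x[i] * conj(y[i + lag]))
        x, y = numpy.asarray(x), numpy.asarray(y)
        n = max(0, min(len(x), len(y) - lag))   # overlapping samples
        return (x[:n] * numpy.conj(y[lag:lag + n])).sum()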
Then if in fact the behavior
of correlate is deemed to be an error, deprecate it and start issuing a
warning in the next point release, then remove it in the next major
release.

Even better, IMO, would be if someone who cares about this stuff pulls
together all the related signal processing stuff and moves them to a
submodule so we could actually find what signal processing primitives
are available. At the same time, more informative docstrings would be
great.

-tim

From charlesr.harris at gmail.com Tue Dec 12 13:00:38 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 12 Dec 2006 11:00:38 -0700
Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile()
In-Reply-To: <457ED839.4080604@resonon.com>
References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com>
Message-ID:

On 12/12/06, Tim Hirzel wrote:
>
> Hi Chuck,
> Thanks for checking that. I am running numpy 1.0.1 in python 2.4 on
> win32 (xp). Are you on linux? I double checked the behavior in 1.0
> and 1.0.1, just to be extra sure, and it throws the IOError in both cases.
> tim

I'm running linux and the current svn version of numpy. Maybe the
problem is with the tempfile module on windows. Do fromfile and tofile
work for files opened normally?

Chuck

From aisaac at american.edu Tue Dec 12 13:06:03 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Tue, 12 Dec 2006 13:06:03 -0500
Subject: [Numpy-discussion] Definition of correlation, correlate and so on ?
In-Reply-To: <457EE1CC.9080806@ieee.org>
References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> <457EABF7.3090302@ar.media.kyoto-u.ac.jp> <457EE1CC.9080806@ieee.org>
Message-ID:

On Tue, 12 Dec 2006, Tim Hochberg apparently wrote:
> So rather than "fixing" the function, I would first
> propose introducing a function with a more descriptive
> name and docstring, for example you could steal the name
> 'xcorr' from matlab. Then if in fact the behavior of
> correlate is deemed to be an error, deprecate it and start
> issuing a warning in the next point release, then remove
> it in the next major release.

This is also a good way forward.
The important thing is to find a way forward.

Cheers,
Alan Isaac

From oliphant.travis at ieee.org Tue Dec 12 13:30:45 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue, 12 Dec 2006 11:30:45 -0700
Subject: [Numpy-discussion] Definition of correlation, correlate and so on ?
In-Reply-To:
References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp>
Message-ID: <457EF555.5010401@ieee.org>

>
> On 12/12/06, *David Cournapeau* wrote:
>
>     Hi,
>
>     I am polishing some code to compute autocorrelation using fft, and
>     when testing the code against numpy.correlate, I realised that I
>     am not sure about the definition... There are various functions
>     related to correlation as far as numpy/scipy is concerned:
>
>     numpy.correlate
>     numpy.corrcoef
>     scipy.signal.correlate
>
>     For me, the correlation between two sequences X and Y at lag t is
>     the sum(X[i] * Y*[i+lag]) where Y* is the complex conjugate of Y.
>     numpy.correlate does not use the conjugate, scipy.signal.correlate as
>     well, and I don't understand numpy.corrcoef. I've never seen complex
>     correlation used without the conjugate, so I was curious why this
>
>
> Neither have I, it is one of those oddities that may have been
> inherited from Numeric. I wouldn't mind seeing it changed but it is
> probably a bit late for that.
It is inherited from Numeric and can't really change. We can move
forward with a different function, however, that uses the conjugate for
complex data.

The non-conjugated version is still well-defined. Convolution, for
example, is defined without the conjugation, and the correlate function
is the basis for that computation. So, it is not a good idea to change
it.

The scipy.signal.correlate function is a generalization to N-D of the
numpy.correlate function, which is 1-d only; the numpy.corrcoef function
is completely different and just computes the correlation coefficients
from the covariance matrix, assuming observations of random vectors.

-Travis

From hirzel at resonon.com Tue Dec 12 14:24:31 2006
From: hirzel at resonon.com (Tim Hirzel)
Date: Tue, 12 Dec 2006 14:24:31 -0500
Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile()
In-Reply-To:
References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com>
Message-ID: <457F01EF.9040600@resonon.com>

> I'm running linux and the current svn version of numpy. Maybe the
> problem is with the tempfile module on windows. Do fromfile and tofile
> work for files opened normally?
>
> Chuck

fromfile and tofile work fine on regular files. From skimming the code
a bit, it's hard to imagine numpy code is the culprit, since it must be
getting a NULL pointer back from PyFile_AsFile(file)... Perhaps this is
a question for a python dev list? My gut says it's probably something
in the windows tempfile module. But perhaps in the PyFile_AsFile(file)
implementation. Seems one of those isn't playing nice. It's all quite
mysterious to me...

tim

From gamercier at yahoo.com Tue Dec 12 15:16:52 2006
From: gamercier at yahoo.com (Gustavo Mercier)
Date: Tue, 12 Dec 2006 12:16:52 -0800 (PST)
Subject: [Numpy-discussion] Installation - Numpy 1.0.1 Suse Linux 10.1
Message-ID: <261866.81649.qm@web31803.mail.mud.yahoo.com>

Hi!

I am trying to install Numpy 1.0.1 on a Linux box (Suse Linux 10.1; gcc
4.1x). I have done this successfully with previous versions up to 1.0b5.
However, I now run into problems.

It compiles and installs OK, but upon opening python and importing numpy
it hangs with an unresolved reference when loading linalg. The reference
is to a _gfortran...function.

I use Atlas, and this was compiled with gfortran, following the
instructions to combine lapack with atlas. blas was also compiled the
same way. I see that Numpy is being compiled with g77.

I am trying to add the reference to the gfortran library object in
/usr/lib, but I don't see how to do this as an option in distutils. I
also tried to change the compiler from g77 to gfortran, but I failed.
Setting the --fcompiler and --ccompiler options led to failures due to
trying to compile C code with the fortran compiler.

Any suggestions? Thanks for your help!

--
Gustavo A. Mercier, Jr. MD,PhD
gamercier at yahoo.com
469-396-6750 - cell

From Chris.Barker at noaa.gov Tue Dec 12 15:25:29 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Tue, 12 Dec 2006 12:25:29 -0800
Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile()
In-Reply-To: <457F01EF.9040600@resonon.com>
References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com> <457F01EF.9040600@resonon.com>
Message-ID: <457F1039.2080508@noaa.gov>

did you try reading and writing to/from that temp file with regular old
python functions?
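A check along those lines might be (a minimal sketch, plain Python 2
file I/O only, no numpy involved; if this round trip succeeds where
tofile fails, the failure is specific to how tofile gets at the
underlying file):

    import tempfile

    f = tempfile.TemporaryFile()
    f.write("0123456789")      # ordinary write, no numpy
    f.seek(0)
    assert f.read() == "0123456789"
    f.close()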
-Chris

Tim Hirzel wrote:
>> I'm running linux and the current svn version of numpy. Maybe the
>> problem is with the tempfile module on windows. Do fromfile and tofile
>> work for files opened normally?
>>
>> Chuck
>
> fromfile and tofile work fine on regular files. From skimming the code
> a bit, it's hard to imagine numpy code is the culprit, since it must be
> getting a NULL pointer back from PyFile_AsFile(file)... Perhaps this is
> a question for a python dev list? My gut says it's probably something
> in the windows tempfile module. But perhaps in the PyFile_AsFile(file)
> implementation. Seems one of those isn't playing nice. It's all quite
> mysterious to me...
>
> tim
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From hirzel at resonon.com Tue Dec 12 16:56:08 2006
From: hirzel at resonon.com (Tim Hirzel)
Date: Tue, 12 Dec 2006 16:56:08 -0500
Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile()
In-Reply-To: <457F1039.2080508@noaa.gov>
References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com> <457F01EF.9040600@resonon.com> <457F1039.2080508@noaa.gov>
Message-ID: <457F2578.4040703@resonon.com>

Good thought Chris,
Normal reading and writing does seem to work...
But, my friend Daniel figured out a workaround when I asked him to
confirm this behavior on his windows setup (and it does behave the same
for him). The first clue was this:

>>> f = tempfile.TemporaryFile()
>>> type(f)
<type 'instance'>

>>> g = open("temp","w+b")
>>> type(g)
<type 'file'>

so Daniel did a dir(f) and found the 'file' attribute

so if you do (where 'a' is a numpy array)
>>> a.tofile(f.file)
It works!

writing to the "file" attribute of the TemporaryFile, it works fine! So
that's good, but still a little hinky. Especially since it works on
linux...
on a linux platform, what does type(tempfile.TemporaryFile()) return? I
assume a <type 'file'> as well...

anyway, so at least there is a quick repair for now. Good news is, I
assume using 'f.file' would work on linux too in terms of having a
single cross-platform solution.

cheers,
tim

Christopher Barker wrote:
> did you try reading and writing to/from that temp file with regular old
> python functions?
>
> -Chris
>
>
> Tim Hirzel wrote:
>
>>> I'm running linux and the current svn version of numpy. Maybe the
>>> problem is with the tempfile module on windows. Do fromfile and tofile
>>> work for files opened normally?
>>>
>>> Chuck
>>>
>> fromfile and tofile work fine on regular files. From skimming the code
>> a bit, it's hard to imagine numpy code is the culprit, since it must be
>> getting a NULL pointer back from PyFile_AsFile(file)... Perhaps this is
>> a question for a python dev list? My gut says it's probably something
>> in the windows tempfile module. But perhaps in the PyFile_AsFile(file)
>> implementation. Seems one of those isn't playing nice. It's all quite
>> mysterious to me...
>>
>> tim
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From oliphant at ee.byu.edu Tue Dec 12 17:26:44 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue, 12 Dec 2006 15:26:44 -0700
Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile()
In-Reply-To: <457F2578.4040703@resonon.com>
References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com> <457F01EF.9040600@resonon.com> <457F1039.2080508@noaa.gov> <457F2578.4040703@resonon.com>
Message-ID: <457F2CA4.2050402@ee.byu.edu>

Tim Hirzel wrote:

>Good thought Chris,
>Normal reading and writing does seem to work...
>But, my friend Daniel figured out a workaround when I asked him to
>confirm this behavior on his windows setup (and it does behave the same
>for him). The first clue was this:
>
> >>> f = tempfile.TemporaryFile()
> >>> type(f)
> <type 'instance'>
>
> >>> g = open("temp","w+b")
> >>> type(g)
> <type 'file'>
>
>so Daniel did a dir(f) and found the 'file' attribute
>
>so if you do (where 'a' is a numpy array)
> >>> a.tofile(f.file)
>It works!
>
>writing to the "file" attribute of the TemporaryFile, it works fine! So
>that's good, but still a little hinky. Especially since it works on
>linux...
>on a linux platform, what does type(tempfile.TemporaryFile()) return? I
>assume a <type 'file'> as well...
>
>anyway, so at least there is a quick repair for now. Good news is, I
>assume using 'f.file' would work on linux too in terms of having a
>single cross-platform solution.
>
>
There is no file attribute on Linux. On linux you get

>>> type(f)
<type 'file'>

So, you might have to do something like:

    if not isinstance(f, file):
        f = f.file

before passing f to the tofile method.

It seems to me that the temporary file mechanism on Windows is a little
odd.

-Travis

From dalcinl at gmail.com Tue Dec 12 18:31:33 2006
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 12 Dec 2006 20:31:33 -0300
Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile()
In-Reply-To: <457F2CA4.2050402@ee.byu.edu>
References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com> <457F01EF.9040600@resonon.com> <457F1039.2080508@noaa.gov> <457F2578.4040703@resonon.com> <457F2CA4.2050402@ee.byu.edu>
Message-ID:

> It seems to me that the temporary file mechanism on Windows is a little
> odd.
>

Indeed. Looking at the sources, the posix version uses the mkstemp/unlink
idiom, but on win it uses a bit of hackery: it seems opened files cannot
be unlinked.

if _os.name != 'posix' or _os.sys.platform == 'cygwin':
    # On non-POSIX and Cygwin systems, assume that we cannot unlink a file
    # while it is open.
    TemporaryFile = NamedTemporaryFile
else:
    def TemporaryFile(mode='w+b', bufsize=-1, suffix="",
                      prefix=template, dir=None):
        ............
        (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
        try:
            _os.unlink(name)
            return _os.fdopen(fd, mode, bufsize)
        except:
            _os.close(fd)
            raise

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From charlesr.harris at gmail.com Tue Dec 12 19:02:38 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 12 Dec 2006 17:02:38 -0700
Subject: [Numpy-discussion] a.T
Message-ID:

Hi all,

I'm curious about the error thrown when I use a.T as the left side of a
multiply assign. In the following, I am multiplying each of the rows of
'a' by the corresponding element of arange(n), i.e., broadcasting from
the left. The result looks fine, but an error is thrown after the
operation is complete.

In [62]: a = arange(12).reshape(3,2,2)

In [63]: a
Out[63]:
array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]]])

In [64]: a.T *= arange(3)
---------------------------------------------------------------------------
exceptions.TypeError                 Traceback (most recent call last)

/home/charris/workspace/microsat/tycho-work/<ipython console>

TypeError: attribute 'T' of 'numpy.ndarray' objects is not writable

In [65]: a
Out[65]:
array([[[ 0,  0],
        [ 0,  0]],

       [[ 4,  5],
        [ 6,  7]],

       [[16, 18],
        [20, 22]]])

Chuck

From charlesr.harris at gmail.com Tue Dec 12 19:09:11 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 12 Dec 2006 17:09:11 -0700
Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile()
In-Reply-To: <457F2CA4.2050402@ee.byu.edu>
References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com> <457F01EF.9040600@resonon.com> <457F1039.2080508@noaa.gov> <457F2578.4040703@resonon.com> <457F2CA4.2050402@ee.byu.edu>
Message-ID:

On 12/12/06, Travis Oliphant wrote:
>
> Tim Hirzel wrote:
> [...]
>
> There is no file attribute on Linux. On linux you get
>
> >>> type(f)
> <type 'file'>
>
> So, you might have to do something like:
>
>     if not isinstance(f, file):
>         f = f.file
>
> before passing f to the tofile method.
>
> It seems to me that the temporary file mechanism on Windows is a little
> odd.

Looks like a tempfile bug to me. Python should be cross platform, and
since the file attribute is correctly set on windows, I don't see why
the tempfile can't be made to behave correctly.
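Putting Travis's check into a small helper gives a workable shim in the
meantime (a sketch; 'as_real_file' is a hypothetical name, and 'file'
is the Python 2 builtin type):

    import tempfile

    def as_real_file(f):
        # tofile/fromfile need a real builtin file object; on Windows,
        # tempfile.TemporaryFile() returns a wrapper whose underlying
        # file object lives in its .file attribute
        if isinstance(f, file):
            return f
        return f.file

    # usage sketch: a.tofile(as_real_file(f))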
Chuck

From pgmdevlist at gmail.com Tue Dec 12 19:11:35 2006
From: pgmdevlist at gmail.com (Pierre GM)
Date: Tue, 12 Dec 2006 19:11:35 -0500
Subject: [Numpy-discussion] a.T
In-Reply-To:
References:
Message-ID: <200612121911.36014.pgmdevlist@gmail.com>

On Tuesday 12 December 2006 19:02, Charles R Harris wrote:
> Hi all,
>
> I'm curious about the error thrown when I use a.T as the left side of a
> multiply assign.

Hi Chuck,
If you keep in mind that .T is a shortcut for .transpose(), you'll
understand why you can't assign to a function call. (The in-place
multiply runs through the view first, which is why 'a' was modified; it
is the attempt to rebind the 'T' attribute afterwards that raises, so
binding the view to a name first, at = a.T; at *= arange(3), works.)

From david at ar.media.kyoto-u.ac.jp Tue Dec 12 22:19:20 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 13 Dec 2006 12:19:20 +0900
Subject: [Numpy-discussion] Definition of correlation, correlate and so on ?
In-Reply-To: <457EE1CC.9080806@ieee.org>
References: <457E5F0A.4080707@ar.media.kyoto-u.ac.jp> <457EABF7.3090302@ar.media.kyoto-u.ac.jp> <457EE1CC.9080806@ieee.org>
Message-ID: <457F7138.9010401@ar.media.kyoto-u.ac.jp>

Tim Hochberg wrote:
>
> So rather than "fixing" the function, I would first propose introducing
> a function with a more descriptive name and docstring, for example you
> could steal the name 'xcorr' from matlab. Then if in fact the behavior
> of correlate is deemed to be an error, deprecate it and start issuing a
> warning in the next point release, then remove it in the next major
> release.

That was my idea too: specify in the docstring that this does not
compute the correlation, and put a new function xcorr (or whatever
name). The good news being this function is already done for rank up
to 2, with basic tests... :)

>
> Even better, IMO, would be if someone who cares about this stuff pulls
> together all the related signal processing stuff and moves them to a
> submodule so we could actually find what signal processing primitives
> are available. At the same time, more informative docstrings would be
> great.
>

Do you mean signal functions in numpy or scipy? For scipy, this is
already done (the scipy.signal module),

David

From cameron.walsh at gmail.com Tue Dec 12 22:27:34 2006
From: cameron.walsh at gmail.com (Cameron Walsh)
Date: Wed, 13 Dec 2006 12:27:34 +0900
Subject: [Numpy-discussion] Histograms of extremely large data sets
Message-ID: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>

Hi all,

I'm trying to generate histograms of extremely large datasets. I've
tried a few methods, listed below, all with their own shortcomings.
Mailing-list archive and google searches have not revealed any
solutions.

Method 1:

import numpy
import pylab

data=numpy.empty((489,1000,1000),dtype="uint8")
# Replace this line with actual data samples, but the size and types
# are correct.

histogram = pylab.hist(data, bins=range(0,256))
pylab.xlim(0,256)
pylab.show()

The problem with this method is it appears to never finish. It is,
however, extremely fast for smaller data sets, like 5x1000x1000 (1-2
seconds) instead of 500x1000x1000.

Method 2:

import numpy
import pylab

data=numpy.empty((489,1000,1000),dtype="uint8")
# Replace this line with actual data samples, but the size and types
# are correct.
bins=numpy.zeros((256),dtype="uint32")
for val in data.flat:
    bins[val]+=1
barchart = pylab.bar(xrange(256),bins,align="center")
pylab.xlim(0,256)
pylab.show()

The problem with this method is it is incredibly slow, taking up to 30
seconds for a 1x1000x1000 sample; I have neither the patience nor the
inclination to time a 500x1000x1000 sample.

Method 3:

import numpy

data=numpy.empty((489,1000,1000),dtype="uint8")
# Replace this line with actual data samples, but the size and types
# are correct.

a=numpy.histogram(data,256)

The problem with this one is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/site-packages/numpy/lib/function_base.py",
line 96, in histogram
    n = sort(a).searchsorted(bins)
ValueError: dimensions too large.

It seems that iterating over the entire array and doing it manually is
the slowest possible method, but that the rest are not much better.
Is there a faster method available, or do I have to implement method 2
in C and submit the change as a patch?

Thanks and best regards,

Cameron.

From eric at enthought.com Wed Dec 13 02:42:09 2006
From: eric at enthought.com (eric jones)
Date: Wed, 13 Dec 2006 01:42:09 -0600
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
Message-ID: <457FAED1.6010803@enthought.com>

Hey Cameron,

I wrote a simple weave based histogram function that should work for
your problem. It should work for any array input data type. The needed
files (and a few tests and examples) are attached.

Below is the output from the histogram_speed.py file attached. The test
takes about 10 seconds to bin a uniformly distributed set of data from a
1000x1000x100 uint8 array into 256 bins. It compares Travis' nifty new
iterator based indexing in numpy to raw C indexing of a contiguous
array. The two algorithms give identical results, and the speed
difference is negligible. That's cool, because the iterator based stuff
makes this sort of algorithm quite easy to handle in N dimensions.

Hope that helps,
eric

ps. For those who care, I had to make a minor change to the array type
converters so that they can be used with the iterator interface more
easily. Later this will be folded into weave, but for now I sub-classed
the standard array converter and made the modifications.

# speed test output.
c:\eric\code\histogram> histogram_speed.py type: uint8 millions of elements: 100.0 sec (C indexing based): 9.52776707654 [390141 390352 390598 389706 390985 390856 389785 390262 389929 391024 391854 390243 391255 390723 390525 391751 389842 391612 389601 391210 390799 391674 390693 390381 390460 389839 390185 390909 390215 391271 390934 390818 390528 389990 389982 389667 391035 390317 390616 390916 390191 389771 391448 390325 390556 391333 390148 390894 389611 390511 390614 390999 389646 391255 391284 391214 392106 391067 391480 389991 391091 390271 389801 390044 391459 390644 391309 390450 390200 391537 390907 390160 391117 390738 391638 391200 390815 390611 390355 389925 390939 390932 391569 390287 389987 389545 391140 391280 389773 389794 389559 390085 389991 391372 390189 391010 390863 390432 390743 390959 389271 390210 390967 390999 391177 389777 391748 390623 391597 392009 389308 390557 390213 390930 390449 390327 390600 390626 389985 390816 389671 390187 390595 390973 390921 390599 390167 391196 390381 391345 392166 389709 390656 389886 390646 390355 391273 391342 390234 390751 390515 390048 390455 391122 391069 390968 390488 390708 391027 391179 391110 390453 390632 390825 391369 390844 390001 391487 390778 390788 390609 390254 389907 391803 391508 391414 391012 389987 389284 390699 391094 390658 390463 390291 390848 389616 390894 389561 390971 391165 391378 391698 389434 390591 390027 391088 390787 391165 390169 391212 389799 389829 389764 390435 391158 391834 391206 390041 391537 390237 390253 391025 392336 391081 390005 391057 390226 390240 390197 389906 391164 391157 390639 391501 389125 389922 390961 390012 389832 389650 390018 390461 390695 390140 390939 389089 391094 390076 391123 389518 391340 390039 390786 391751 391133 390675 392305 390667 391243 389889 390103 390438 389215 389805 392180 391351 389923 390932 390136 390556 389684 390324 390152 390982 391355] sec (numpy iteration based): 10.3055525213 [390141 390352 390598 389706 390985 390856 389785 390262 389929 391024 391854 390243 391255 390723 390525 391751 389842 391612 389601 391210 390799 391674 390693 390381 390460 389839 390185 390909 390215 391271 390934 390818 390528 389990 389982 389667 391035 390317 390616 390916 390191 389771 391448 390325 390556 391333 390148 390894 389611 390511 390614 390999 389646 391255 391284 391214 392106 391067 391480 389991 391091 390271 389801 390044 391459 390644 391309 390450 390200 391537 390907 390160 391117 390738 391638 391200 390815 390611 390355 389925 390939 390932 391569 390287 389987 389545 391140 391280 389773 389794 389559 390085 389991 391372 390189 391010 390863 390432 390743 390959 389271 390210 390967 390999 391177 389777 391748 390623 391597 392009 389308 390557 390213 390930 390449 390327 390600 390626 389985 390816 389671 390187 390595 390973 390921 390599 390167 391196 390381 391345 392166 389709 390656 389886 390646 390355 391273 391342 390234 390751 390515 390048 390455 391122 391069 390968 390488 390708 391027 391179 391110 390453 390632 390825 391369 390844 390001 391487 390778 390788 390609 390254 389907 391803 391508 391414 391012 389987 389284 390699 391094 390658 390463 390291 390848 389616 390894 389561 390971 391165 391378 391698 389434 390591 390027 391088 390787 391165 390169 391212 389799 389829 389764 390435 391158 391834 391206 390041 391537 390237 390253 391025 392336 391081 390005 391057 390226 390240 390197 389906 391164 391157 390639 391501 389125 389922 390961 390012 389832 389650 390018 390461 390695 390140 390939 389089 391094 390076 391123 
389518 391340 390039 390786 391751 391133 390675 392305 390667 391243
389889 390103 390438 389215 389805 392180 391351 389923 390932 390136
390556 389684 390324 390152 390982 391355]
0

Cameron Walsh wrote:
> Hi all,
>
> I'm trying to generate histograms of extremely large datasets. I've
> tried a few methods, listed below, all with their own shortcomings.
> Mailing-list archive and google searches have not revealed any
> solutions.
> [...]
>
> Thanks and best regards,
>
> Cameron.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: weave_histogram.py
Type: text/x-python
Size: 2533 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: histogram_speed.py
Type: text/x-python
Size: 702 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_weave_histogram.py
Type: text/x-python
Size: 2170 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: typed_array_converter.py
Type: text/x-python
Size: 1582 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: weave_contiguous_histogram.py Type: text/x-python Size: 2388 bytes Desc: not available URL: From cameron.walsh at gmail.com Wed Dec 13 03:27:22 2006 From: cameron.walsh at gmail.com (Cameron Walsh) Date: Wed, 13 Dec 2006 17:27:22 +0900 Subject: [Numpy-discussion] Histograms of extremely large data sets In-Reply-To: <457FAED1.6010803@enthought.com> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <457FAED1.6010803@enthought.com> Message-ID: <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> On 13/12/06, eric jones wrote 290 lines of awesome code and a fantastic explanation: > Hey Cameron, > > I wrote a simple weave based histogram function that should work for > your problem. It should work for any array input data type. The needed > files (and a few tests and examples) are attached. Thank you very much, they seem to be exactly what I need. I haven't yet been able to test it all completely, as for some reason I'm missing the zlib module. That might have to wait till tomorrow depending on how the next half hour goes. > > Below is the output from the histogram_speed.py file attached. The test > takes about 10 seconds to bin a uniformly distributed set of data from a > 1000x1000x100 uint8 array into 256 bins. It compares Travis' nifty new > iterator based indexing in numpy to raw C indexing of a contiguous > array. The two algorithms give identical results, and the speed > difference is negligible. That's cool because the iterator based stuff > makes this sort of algorithms quite easy to handle for N-dimensional. If that's the case, assuming our machines are the same, your new code is around 5 times faster. That brings it back to a reasonable time frame. I'll let you know as soon as I can how it all works. > > Hope that helps, > eric It certainly does! Cameron. > > ps. For those who care, I had to make a minor change to the array type > converters so that they can be used with the iterator interface more > easily. Later this will be folded into weave, but for now I sub-classed > the standard array converter and made the modifications. > > # speed test output. 
> c:\eric\code\histogram> histogram_speed.py
> type: uint8
> millions of elements: 100.0
> sec (C indexing based): 9.52776707654
> [256 bin counts, as quoted in full above]
> sec (numpy iteration based): 10.3055525213
> [256 bin counts, identical to the C indexing results]
> 0
>
> Cameron Walsh wrote:
> > Hi all,
> >
> > I'm trying to generate histograms of extremely large datasets. I've
> > tried a few methods, listed below, all with their own shortcomings.
> > Mailing-list archive and google searches have not revealed any
> > solutions.
> > [...]
> >
> > Thanks and best regards,
> >
> > Cameron.

From faltet at carabos.com Wed Dec 13 12:28:01 2006
From: faltet at carabos.com (Francesc Altet)
Date: Wed, 13 Dec 2006 18:28:01 +0100
Subject: [Numpy-discussion] .byteswap() and copy/view dilemma
Message-ID: <1166030881.2846.64.camel@localhost.localdomain>

Hi,

I'm a bit confused about the cases in which the .byteswap() method in
NumPy returns a copy and those in which it returns a view. From the
docstrings:

"""
a.byteswap(False) -> View or copy. Swap the bytes in the array.

Swap the bytes in the array. Return the byteswapped array. If the first
argument is TRUE, byteswap in-place and return a reference to self.
""" >From the above description, it is not clear to me whether the .byteswap(False) would return a copy or a view. However, the official NumPy book seems to explain this clearer: """ Byteswap the elements of the array and return the byteswapped array. If the argument is True, then byteswap in-place and return a reference to self. Otherwise, return a copy of the array with the elements byteswapped. The data-type descriptor is not changed so the array will have changed numbers. """ My experiments also show that .byteswap(False) does do a copy: In [154]:a=numpy.array([1,2,3]) In [155]:b=a.byteswap() In [156]:b Out[156]:array([16777216, 33554432, 50331648]) In [157]:a[1]=0 In [158]:b Out[158]:array([16777216, 33554432, 50331648]) I'm wondering if I'm the only one that finds this docstring confusing. Regards, -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth From pgmdevlist at gmail.com Wed Dec 13 15:22:06 2006 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 13 Dec 2006 15:22:06 -0500 Subject: [Numpy-discussion] rollaxis Message-ID: <200612131522.06193.pgmdevlist@gmail.com> All, I have a ND array whose axes I want to reorganize, so that axis "i" is at the end while the others stay in their relative position. What's the easiest ? From peridot.faceted at gmail.com Wed Dec 13 15:29:03 2006 From: peridot.faceted at gmail.com (A. M. Archibald) Date: Wed, 13 Dec 2006 15:29:03 -0500 Subject: [Numpy-discussion] rollaxis In-Reply-To: <200612131522.06193.pgmdevlist@gmail.com> References: <200612131522.06193.pgmdevlist@gmail.com> Message-ID: On 13/12/06, Pierre GM wrote: > All, > I have a ND array whose axes I want to reorganize, so that axis "i" is at the > end while the others stay in their relative position. What's the easiest ? Generate an axis-permutation tuple and use transpose: s = list(A.shape) s.remove(i) s.append(i) B = A.transpose(s) A. M. Archibald From hirzel at resonon.com Wed Dec 13 16:28:02 2006 From: hirzel at resonon.com (Tim Hirzel) Date: Wed, 13 Dec 2006 16:28:02 -0500 Subject: [Numpy-discussion] fromfile and tofile access with a tempfile.TemporaryFile() In-Reply-To: References: <457DE940.1060807@resonon.com> <457ED839.4080604@resonon.com> <457F01EF.9040600@resonon.com> <457F1039.2080508@noaa.gov> <457F2578.4040703@resonon.com> <457F2CA4.2050402@ee.byu.edu> Message-ID: <45807062.3070603@resonon.com> Thanks for everyone's help and thoughts. I agree that this behavior is buggy. I submitted a bug report to the python project at sourceforge, with a link to this thread. Hopefully the report will be helpful. tim Charles R Harris wrote: > > > On 12/12/06, *Travis Oliphant* > wrote: > > Tim Hirzel wrote: > > >Good thought Chris, > >Normal reading and writing does seem to work. .. > >But, my friend Daniel figured out a workaround when I asked to > confirm > >this behavior on his windows setup (and it is does behave the > same for > >him). The first clue was this: > > > > >>> f = tempfile.TemporaryFile() > > >>> type(f) > > > > >>> g = open("temp","w+b") > > >>> type(g) > > > > > >so Daniel did a dir(f) and found the 'file' attribute > > > >so if you do (where 'a' is a numpy array) > > >>> a.tofile(f.file) > >It works! > > > >writing to the "file" attribute of the TemporaryFile, it works > fine! So > >that's good, but still a little hinky. Especially since it works on > >linux... > >on a linux platform, what does type( tempfile.TemporaryFile()) > return? 
I > >assume an as well... > > > >anways, so at least there is a quick repair for now. Good news is, I > >assume using 'f.file' would work on linux too in terms of having a > >single cross-platform solution. > > > > > There is no file attribute on Linux. On linux you get > > >>> type(f) > > > So, you might have to do something like: > > if not isinstance(f, file): > f = f.file > > before passing f to the tofile method. > > It seems to me that the temporary file mechanism on Windows is a > little > odd. > > > Looks like a tempfile bug to me. Python should be cross platform and > since the file attribute is correctly set on windows, I don't see why > the tempfile can't be made to behave correctly. > > Chuck > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Wed Dec 13 16:44:35 2006 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 13 Dec 2006 16:44:35 -0500 Subject: [Numpy-discussion] rollaxis In-Reply-To: References: <200612131522.06193.pgmdevlist@gmail.com> Message-ID: <200612131644.36536.pgmdevlist@gmail.com> On Wednesday 13 December 2006 15:29, A. M. Archibald wrote: > Generate an axis-permutation tuple and use transpose: Ah OK. It took me a little while to get it running: instead of s=list(A.shape) in your example, one should read s=range(A.ndim) But it does the trick, thanks a lot! And now, double or nothing: Samething, but with rows or columns: For example, put a 5th column in the far right, without modifying the relative positions of the others. Thanks in advance ! From pgmdevlist at gmail.com Wed Dec 13 17:54:55 2006 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 13 Dec 2006 17:54:55 -0500 Subject: [Numpy-discussion] rollaxis In-Reply-To: References: <200612131522.06193.pgmdevlist@gmail.com> <200612131644.36536.pgmdevlist@gmail.com> Message-ID: <200612131754.55240.pgmdevlist@gmail.com> > Generate a column-permutation tuple and use fancy indexing: Works like a charm, thanks a lot ! From cameron.walsh at gmail.com Wed Dec 13 20:32:05 2006 From: cameron.walsh at gmail.com (Cameron Walsh) Date: Thu, 14 Dec 2006 10:32:05 +0900 Subject: [Numpy-discussion] Histograms of extremely large data sets In-Reply-To: <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <457FAED1.6010803@enthought.com> <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> Message-ID: <106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com> On 13/12/06, Cameron Walsh wrote: > On 13/12/06, eric jones wrote 290 lines of > awesome code and a fantastic explanation: > > > Hey Cameron, > > > > I wrote a simple weave based histogram function that should work for > > your problem. It should work for any array input data type. The needed > > files (and a few tests and examples) are attached. [...] 
Hi Eric,

I've run test_weave_histogram.py and histogram_speed.py, but each one
seems to fail on calling typed_array_converter.declaration_code() as
follows:

Traceback (most recent call last):
  File "histogram_speed.py", line 26, in <module>
    res2 = histogram(data, bins)
  File "/home/cameron/repos/wavesmaker/trunk/code/gui/process_modules/eric_histo/weave_histogram.py",
line 67, in histogram
    compiler='gcc')
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/inline_tools.py",
line 339, in inline
    **kw)
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/inline_tools.py",
line 447, in compile_function
    verbose=verbose, **kw)
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/ext_tools.py",
line 353, in compile
    kw,file = self.build_kw_and_file(location,kw)
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/ext_tools.py",
line 334, in build_kw_and_file
    file = self.generate_file(location=location)
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/ext_tools.py",
line 295, in generate_file
    code = self.module_code()
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/ext_tools.py",
line 203, in module_code
    self.function_code(),
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/ext_tools.py",
line 269, in function_code
    all_function_code += func.function_code()
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/inline_tools.py",
line 83, in function_code
    decl_code = indent(self.arg_declaration_code(),4)
  File "/usr/local/lib/python2.5/site-packages/scipy/weave/inline_tools.py",
line 62, in arg_declaration_code
    arg_strings.append(arg.declaration_code(inline=1))
  File "/home/cameron/repos/wavesmaker/trunk/code/gui/process_modules/eric_histo/typed_array_converter.py",
line 18, in declaration_code
    code = super(typed_array_converter, self).declaration_code(templatize,
TypeError: super() argument 1 must be type, not classobj

I've tried this with Python2.4 and Python2.5 with the same results.

What do I need to change, since it seems to have worked for you but not
for me?

Thanks and best regards,

Cameron.

From wbaxter at gmail.com Wed Dec 13 21:06:30 2006
From: wbaxter at gmail.com (Bill Baxter)
Date: Thu, 14 Dec 2006 11:06:30 +0900
Subject: [Numpy-discussion] linalg.lstsq for complex
In-Reply-To:
References:
Message-ID:

Is this code from linalg.lstsq for the complex case correct?

        lapack_routine = lapack_lite.zgelsd
        lwork = 1
        rwork = zeros((lwork,), real_t)
        work = zeros((lwork,),t)
        results = lapack_routine(m, n, n_rhs, a, m, bstar, ldb, s, rcond,
                                 0, work, -1, rwork, iwork, 0)
        lwork = int(abs(work[0]))
        rwork = zeros((lwork,),real_t)
        a_real = zeros((m,n),real_t)
        bstar_real = zeros((ldb,n_rhs,),real_t)
        results = lapack_lite.dgelsd(m, n, n_rhs, a_real, m, bstar_real,
                                     ldb, s, rcond, 0, rwork, -1, iwork, 0)
        lrwork = int(rwork[0])
        work = zeros((lwork,), t)
        rwork = zeros((lrwork,), real_t)
        results = lapack_routine(m, n, n_rhs, a, m, bstar, ldb, s, rcond,

The middle call to dgelsd looks unnecessary to me. At the very least,
allocating a_real and bstar_real shouldn't be necessary, since they
aren't referenced anywhere else in the lstsq function. The lapack
documentation for zgelsd also doesn't mention any need to call dgelsd
to compute the size of the work array.
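If zgelsd's workspace query behaves as the LAPACK docs describe (with
lwork = -1 it returns the optimal complex workspace in work[0] and the
minimum real and integer workspace sizes in rwork[0] and iwork[0]), the
three calls could in principle collapse into one query, along these
lines (an untested sketch against the code quoted above, with iwork
assumed allocated as in the surrounding function; if some older LAPACKs
failed to fill rwork on the query, that would explain the extra dgelsd
call):

    # one workspace query, then the real solve
    work = zeros((1,), t)
    rwork = zeros((1,), real_t)
    results = lapack_routine(m, n, n_rhs, a, m, bstar, ldb, s, rcond,
                             0, work, -1, rwork, iwork, 0)
    lwork = int(abs(work[0]))    # optimal complex workspace size
    lrwork = int(rwork[0])       # minimum real workspace size
    work = zeros((lwork,), t)
    rwork = zeros((lrwork,), real_t)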
--bb

From eric at enthought.com Wed Dec 13 21:17:26 2006
From: eric at enthought.com (eric jones)
Date: Wed, 13 Dec 2006 20:17:26 -0600
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <457FAED1.6010803@enthought.com> <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> <106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com>
Message-ID: <4580B436.8040203@enthought.com>

Hmmm. Not sure.

Change that line to this instead, which should work as well.

    code = array_converter.declaration_code(self, templatize, inline)

Both work for me.

eric

Cameron Walsh wrote:
> Hi Eric,
>
> I've run test_weave_histogram.py and histogram_speed.py, but each one
> seems to fail on calling typed_array_converter.declaration_code() as
> follows:
> [...]
> TypeError: super() argument 1 must be type, not classobj
>
> I've tried this with Python2.4 and Python2.5 with the same results.
>
> What do I need to change, since it seems to have worked for you but not
> for me?
>
> Thanks and best regards,
>
> Cameron.

From cameron.walsh at gmail.com Wed Dec 13 22:03:50 2006
From: cameron.walsh at gmail.com (Cameron Walsh)
Date: Thu, 14 Dec 2006 12:03:50 +0900
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <4580B436.8040203@enthought.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <457FAED1.6010803@enthought.com> <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> <106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com> <4580B436.8040203@enthought.com>
Message-ID: <106309950612131903l43b1d67fsaf71aba632725780@mail.gmail.com>

Thanks very much, Eric. That line fixed it for me, although I'm still
not sure why it broke with the last line.

Your weave_histogram works a charm and is around 16 times faster than
any of the other options I've tried. On my laptop it took 30 seconds
to generate a histogram from 500 million numbers, which is fine.

Thanks and best regards,

Cameron.

On 14/12/06, eric jones wrote:
> Hmmm. Not sure.
>
> Change that line to this instead, which should work as well.
>
>     code = array_converter.declaration_code(self, templatize, inline)
>
> Both work for me.
>
> eric
> [...]

From eric at enthought.com Wed Dec 13 23:35:38 2006
From: eric at enthought.com (eric jones)
Date: Wed, 13 Dec 2006 22:35:38 -0600
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <106309950612131903l43b1d67fsaf71aba632725780@mail.gmail.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <457FAED1.6010803@enthought.com> <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> <106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com> <4580B436.8040203@enthought.com> <106309950612131903l43b1d67fsaf71aba632725780@mail.gmail.com>
Message-ID: <4580D49A.70303@enthought.com>

Glad to hear it worked for you.

see ya,
eric

Cameron Walsh wrote:
> Thanks very much, Eric. That line fixed it for me, although I'm still
> not sure why it broke with the last line.
>
> Your weave_histogram works a charm and is around 16 times faster than
> any of the other options I've tried. On my laptop it took 30 seconds
> to generate a histogram from 500 million numbers, which is fine.
>
> Thanks and best regards,
>
> Cameron.
> [...]

From rlw at stsci.edu Wed Dec 13 23:40:03 2006
From: rlw at stsci.edu (Rick White)
Date: Wed, 13 Dec 2006 23:40:03 -0500
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
Message-ID: <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu>

On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote:

> I'm trying to generate histograms of extremely large datasets. I've
> tried a few methods, listed below, all with their own shortcomings.
> Mailing-list archive and google searches have not revealed any
> solutions.
The numpy.histogram function can be modified to use memory much more
efficiently when the input array is large, and the modification turns
out to be faster even for smallish arrays (in my tests, anyway).
Below is a modified version of the histogram function from
function_base.py. It is almost identical, but it avoids doing the
sort of the entire input array simply by dividing it into blocks.
(It would be even better to avoid the call to ravel too.) The only
other messy detail is that the builtin range function is shadowed by
the 'range' parameter.

In my timing tests this is about the same speed for arrays about the
same size as the block size and is faster than the current version by
30-40% for large arrays. The speed difference increases as the array
size increases.

I haven't compared this to Eric's weave function, but this has the
advantages of being pure Python and of being much simpler. On my
machine (MacBook Pro) it takes about 4 seconds for an array with 100
million elements. The time increases perfectly linearly with array
size for arrays larger than a million elements.
				Rick

from numpy import *

# keep a reference to the builtin range, which is shadowed
# by the 'range' argument of histogram below
lrange = range

def histogram(a, bins=10, range=None, normed=False):
    a = asarray(a).ravel()
    if not iterable(bins):
        if range is None:
            range = (a.min(), a.max())
        mn, mx = [mi+0.0 for mi in range]
        if mn == mx:
            mn -= 0.5
            mx += 0.5
        bins = linspace(mn, mx, bins, endpoint=False)

    # best block size probably depends on processor cache size
    block = 65536
    n = sort(a[:block]).searchsorted(bins)
    for i in lrange(block, len(a), block):
        n += sort(a[i:i+block]).searchsorted(bins)
    n = concatenate([n, [len(a)]])
    n = n[1:] - n[:-1]

    if normed:
        db = bins[1] - bins[0]
        return 1.0/(a.size*db) * n, bins
    else:
        return n, bins

From eric at enthought.com  Thu Dec 14 01:03:45 2006
From: eric at enthought.com (eric jones)
Date: Thu, 14 Dec 2006 00:03:45 -0600
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
	<9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu>
Message-ID: <4580E941.4030805@enthought.com>

Looks to me like Rick's version is simpler and faster. It looks like it
offers a speed-up of about 1.6 on my machine over the weave version. I
believe this is because the sorting approach results in quite a few fewer
compares than the algorithm I used.

Very cool. I vote that his version go into numpy.

eric

Rick White wrote:
> On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote:
>
>> I'm trying to generate histograms of extremely large datasets. I've
>> tried a few methods, listed below, all with their own shortcomings.
>> Mailing-list archive and google searches have not revealed any
>> solutions.
>>
>
> The numpy.histogram function can be modified to use memory much more
> efficiently when the input array is large, and the modification turns
> out to be faster even for smallish arrays (in my tests, anyway).
> Below is a modified version of the histogram function from
> function_base.py. It is almost identical, but it avoids doing the
> sort of the entire input array simply by dividing it into blocks.
> (It would be even better to avoid the call to ravel too.) The only
> other messy detail is that the builtin range function is shadowed by
> the 'range' parameter.
>
> In my timing tests this is about the same speed for arrays about the
> same size as the block size and is faster than the current version by
> 30-40% for large arrays.
The speed difference increases as the array > size increases. > > I haven't compared this to Eric's weave function, but this has the > advantages of being pure Python and of being much simpler. On my > machine (MacBook Pro) it takes about 4 seconds for an array with 100 > million elements. The time increases perfectly linearly with array > size for arrays larger than a million elements. > Rick > > from numpy import * > > lrange = range > def histogram(a, bins=10, range=None, normed=False): > a = asarray(a).ravel() > if not iterable(bins): > if range is None: > range = (a.min(), a.max()) > mn, mx = [mi+0.0 for mi in range] > if mn == mx: > mn -= 0.5 > mx += 0.5 > bins = linspace(mn, mx, bins, endpoint=False) > > # best block size probably depends on processor cache size > block = 65536 > n = sort(a[:block]).searchsorted(bins) > for i in lrange(block,len(a),block): > n += sort(a[i:i+block]).searchsorted(bins) > n = concatenate([n, [len(a)]]) > n = n[1:]-n[:-1] > > if normed: > db = bins[1] - bins[0] > return 1.0/(a.size*db) * n, bins > else: > return n, bins > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From cameron.walsh at gmail.com Thu Dec 14 02:56:09 2006 From: cameron.walsh at gmail.com (Cameron Walsh) Date: Thu, 14 Dec 2006 16:56:09 +0900 Subject: [Numpy-discussion] Histograms of extremely large data sets In-Reply-To: <4580E941.4030805@enthought.com> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu> <4580E941.4030805@enthought.com> Message-ID: <106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com> Hi all, Absolutely gorgeous, I confirm the 1.6x speed-up over the weave version, i.e. a 25x speed-up over the existing version. It would be good if the redefinition of the range function could be changed in the numpy modules, before it goes into subversion, to avoid the need for Rick's line lrange=range before the new histogram function. At some point I might try and test different cache sizes for different data-set sizes and see what the effect is. For now, 65536 seems a good number and I would be happy to see this replace the current numpy.histogram. Thanks very much Eric and Rick, you've both taught me a lot, as well as solving the original problem. I'm sure this will be of use to others in the future, so if there's anything I can do to assist in getting this into the next numpy release, please let me know. Best regards, Cameron. On 14/12/06, eric jones wrote: > Looks to me like Rick's version is simpler and faster.It looks like it > offers a speed-up of about 1.6 on my machine over the weave version. I > believe this is because the sorting approach results in quite a few less > compares than the algorithm I used. > > Very cool. I vote that his version go into numpy. > > eric > > > > Rick White wrote: > > On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote: > > > > > >> I'm trying to generate histograms of extremely large datasets. I've > >> tried a few methods, listed below, all with their own shortcomings. > >> Mailing-list archive and google searches have not revealed any > >> solutions. > >> > > > > The numpy.histogram function can be modified to use memory much more > > efficiently when the input array is large, and the modification turns > > out to be faster even for smallish arrays (in my tests, anyway). 
> > Below is a modified version of the histogram function from > > function_base.py. It is almost identical, but it avoids doing the > > sort of the entire input array simply by dividing it into blocks. > > (It would be even better to avoid the call to ravel too.) The only > > other messy detail is that the builtin range function is shadowed by > > the 'range' parameter. > > > > In my timing tests this is about the same speed for arrays about the > > same size as the block size and is faster than the current version by > > 30-40% for large arrays. The speed difference increases as the array > > size increases. > > > > I haven't compared this to Eric's weave function, but this has the > > advantages of being pure Python and of being much simpler. On my > > machine (MacBook Pro) it takes about 4 seconds for an array with 100 > > million elements. The time increases perfectly linearly with array > > size for arrays larger than a million elements. > > Rick > > > > from numpy import * > > > > lrange = range > > def histogram(a, bins=10, range=None, normed=False): > > a = asarray(a).ravel() > > if not iterable(bins): > > if range is None: > > range = (a.min(), a.max()) > > mn, mx = [mi+0.0 for mi in range] > > if mn == mx: > > mn -= 0.5 > > mx += 0.5 > > bins = linspace(mn, mx, bins, endpoint=False) > > > > # best block size probably depends on processor cache size > > block = 65536 > > n = sort(a[:block]).searchsorted(bins) > > for i in lrange(block,len(a),block): > > n += sort(a[i:i+block]).searchsorted(bins) > > n = concatenate([n, [len(a)]]) > > n = n[1:]-n[:-1] > > > > if normed: > > db = bins[1] - bins[0] > > return 1.0/(a.size*db) * n, bins > > else: > > return n, bins > > > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From rlw at stsci.edu Thu Dec 14 08:30:05 2006 From: rlw at stsci.edu (Rick White) Date: Thu, 14 Dec 2006 08:30:05 -0500 Subject: [Numpy-discussion] Histograms of extremely large data sets In-Reply-To: <106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu> <4580E941.4030805@enthought.com> <106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com> Message-ID: <6897A9E2-4B52-4B2F-942F-BDF16AA03779@stsci.edu> On Dec 14, 2006, at 2:56 AM, Cameron Walsh wrote: > At some point I might try and test > different cache sizes for different data-set sizes and see what the > effect is. For now, 65536 seems a good number and I would be happy to > see this replace the current numpy.histogram. I experimented a little on my machine and found that 64k was a good size, but it is fairly insensitive to the size over a wide range (16000 to 1e6). I'd be interested to hear how this scales on other machines -- I'm pretty sure that the ideal size will keep the piece of the array being sorted smaller than the on-chip cache. Just so we don't get too smug about the speed, if I do this in IDL on the same machine it is 10 times faster (0.28 seconds instead of 4 seconds). I'm sure the IDL version uses the much faster approach of just sweeping through the array once, incrementing counts in the appropriate bins. 
It only handles equal-sized bins, so it is not as general as the numpy
version -- but equal-sized bins is a very common case. I'd still like
to see a C version of histogram (which I guess would need to be a
ufunc) go into the core numpy.
				Rick

From giorgio.luciano at chimica.unige.it  Thu Dec 14 08:31:47 2006
From: giorgio.luciano at chimica.unige.it (Giorgio Luciano)
Date: Thu, 14 Dec 2006 14:31:47 +0100
Subject: [Numpy-discussion] empty data matrix (are they really empty ?)
In-Reply-To: <4580B436.8040203@enthought.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
	<457FAED1.6010803@enthought.com>
	<106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com>
	<106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com>
	<4580B436.8040203@enthought.com>
Message-ID: <45815243.9070406@chimica.unige.it>

I was converting a matlab file to my new favorite scientific language,
numpy :)
In the old file I created a matrix on the fly. I know that numpy and
Python cannot do that, so I found a workaround. Here's the code:

lev2=empty((1,h))
ir=1
for j in arange(1,nstep+2):
    #a=gr[[arange(ir-1,ir+nstep)],:]
    a2=gr[arange(ir-1,ir+nstep)]
    #flev=dot(a2,dot(disper,a2.transpose()))
    clev=diag(dot(a2,dot(disper,a2.transpose())))
    lev2=vstack((lev2,clev))
    #print ir
    #print clev
    #print h
    ir=ir+nstep+1
lev=lev2[1:,]
print lev

So:
First I create the empty matrix
Second perform the calculation
Third take the matrix and exclude the first line since it has "dummy"
values, because after I need to plot it with

contour(lev)
H,K = meshgrid(lab,lab)
fig=p.figure()
ax=p3.Axes3D(fig)
ax.plot_wireframe(H,K,lev)
p.show()
p.close

Everything works fine... but is this really necessary? Could not an
empty matrix just be "really empty"?
Thanks for the answers
Cheers
Giorgio

From svetosch at gmx.net  Thu Dec 14 09:19:42 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Thu, 14 Dec 2006 15:19:42 +0100
Subject: [Numpy-discussion] empty data matrix (are they really empty ?)
In-Reply-To: <45815243.9070406@chimica.unige.it>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
	<457FAED1.6010803@enthought.com>
	<106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com>
	<106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com>
	<4580B436.8040203@enthought.com>
	<45815243.9070406@chimica.unige.it>
Message-ID: <45815D7E.3090109@gmx.net>

[you probably should have started a new thread instead of replying to
another one...]

Giorgio Luciano schrieb:
> In the old file I created a matrix on the fly. I know that numpy and
> Python cannot do that, so I found a workaround.

I'm not sure what you mean that numpy cannot do, but...

> Here's the code:
>
> lev2=empty((1,h))
> lev=lev2[1:,]
> So:
> First I create the empty matrix
> Second perform the calculation
> Third take the matrix and exclude the first line since it has "dummy"
...
>
> Everything works fine... but is this really necessary? Could not an
> empty matrix just be "really empty"?
> Thanks for the answers

you can use

lev2 = empty((0,h))

as a starting point for adding rows, it works and then nothing
"dummy"-like will be in lev2

hth,
sven
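[A quick self-contained check of the empty((0,h)) suggestion above; h, nstep
and the stand-in row are made-up values, not Giorgio's real data:]

from numpy import empty, vstack, zeros

h = 4
nstep = 3

# start with zero rows instead of one dummy row
lev2 = empty((0, h))
for j in range(nstep + 1):
    clev = zeros(h) + j           # stand-in for the real diag(...) row
    lev2 = vstack((lev2, clev))   # grows by one row on each pass

print lev2.shape                  # (4, 4): no dummy first row to strip

[Note that vstack copies the whole array on every pass, so for many rows the
preallocation approach in the following reply should be faster.]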
From Chris.Barker at noaa.gov  Thu Dec 14 12:40:43 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Thu, 14 Dec 2006 09:40:43 -0800
Subject: [Numpy-discussion] empty data matrix (are they really empty ?)
In-Reply-To: <45815D7E.3090109@gmx.net>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
	<457FAED1.6010803@enthought.com>
	<106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com>
	<106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com>
	<4580B436.8040203@enthought.com>
	<45815243.9070406@chimica.unige.it>
	<45815D7E.3090109@gmx.net>
Message-ID: <45818C9B.5060404@noaa.gov>

Sven Schreiber wrote:
>> In the old file I created a matrix on the fly. I know that numpy and
>> Python cannot do that, so I found a workaround.

numpy can create matrices on the fly, in fact, you are doing that with
this code! The only thing it doesn't do is have a literal that joins
matrices the way matlab does -- you need to use vstack and the like.

>> First I create the empty matrix

To get better performance, you could create the entire empty matrix, not
just one row -- this is the same as MATLAB -- if you know how big your
matrix is going to be, it's better to create it first with "zeros". In
numpy you can use either zeros or empty - just make sure that if you use
empty, you fill the whole thing later, or you'll get garbage.

Your code:

lev2=empty((1,h))   # you've just created an empty single row
.
.
.
lev2=vstack((lev2,clev))  # now you are creating a whole new array,
                          # with one more row than before.

The alternative:

lev2=empty((nstep+1,h))   # create the whole empty array
ir=1
for j in arange(1,nstep+2):
    a2=gr[arange(ir-1,ir+nstep)]
    clev=diag(dot(a2,dot(disper,a2.transpose())))
    lev2[j-1,:] = clev    # fill in the row you've just calculated
    ir=ir+nstep+1
print lev2

I may have got some of the indexing wrong, but I hope you get the idea.

By the way, if you sent a complete, runnable sample, we can test out
suggestions, and you'll get better answers.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov

From ellisonbg.net at gmail.com  Thu Dec 14 12:55:00 2006
From: ellisonbg.net at gmail.com (Brian Granger)
Date: Thu, 14 Dec 2006 10:55:00 -0700
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
	<9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu>
Message-ID: <6ce0ac130612140955i6e1c79b9n60b17e169589b173@mail.gmail.com>

This same idea could be used to parallelize the histogram computation.
Then you could really get into large (many Gb/TB/PB) data sets. I might
try to find time to do this with ipython1, but someone else could do
this as well.

Brian

On 12/13/06, Rick White wrote:
> On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote:
>
> > I'm trying to generate histograms of extremely large datasets. I've
> > tried a few methods, listed below, all with their own shortcomings.
> > Mailing-list archive and google searches have not revealed any
> > solutions.
>
> The numpy.histogram function can be modified to use memory much more
> efficiently when the input array is large, and the modification turns
> out to be faster even for smallish arrays (in my tests, anyway).
> Below is a modified version of the histogram function from
> function_base.py. It is almost identical, but it avoids doing the
> sort of the entire input array simply by dividing it into blocks.
> (It would be even better to avoid the call to ravel too.)
The only > other messy detail is that the builtin range function is shadowed by > the 'range' parameter. > > In my timing tests this is about the same speed for arrays about the > same size as the block size and is faster than the current version by > 30-40% for large arrays. The speed difference increases as the array > size increases. > > I haven't compared this to Eric's weave function, but this has the > advantages of being pure Python and of being much simpler. On my > machine (MacBook Pro) it takes about 4 seconds for an array with 100 > million elements. The time increases perfectly linearly with array > size for arrays larger than a million elements. > Rick > > from numpy import * > > lrange = range > def histogram(a, bins=10, range=None, normed=False): > a = asarray(a).ravel() > if not iterable(bins): > if range is None: > range = (a.min(), a.max()) > mn, mx = [mi+0.0 for mi in range] > if mn == mx: > mn -= 0.5 > mx += 0.5 > bins = linspace(mn, mx, bins, endpoint=False) > > # best block size probably depends on processor cache size > block = 65536 > n = sort(a[:block]).searchsorted(bins) > for i in lrange(block,len(a),block): > n += sort(a[i:i+block]).searchsorted(bins) > n = concatenate([n, [len(a)]]) > n = n[1:]-n[:-1] > > if normed: > db = bins[1] - bins[0] > return 1.0/(a.size*db) * n, bins > else: > return n, bins > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From tim.hochberg at ieee.org Thu Dec 14 13:21:25 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Thu, 14 Dec 2006 11:21:25 -0700 Subject: [Numpy-discussion] Pyrex and numpy Message-ID: <45819625.8040803@ieee.org> I was just going to try pyrex out with numpy to see how it compares with weave (which is cool but quirky). My first attempt ended in failure: I tried to compile the demo in in numpy/doc/pyrex and got this error: c_numpy.pxd:99:22: Array element cannot be a Python object Does anyone who uses pyrex see this? Does anyone know what it's from? Not that I deleted numpyx.c, since otherwise pyrex isn't invoked at all? -tim From eric at enthought.com Thu Dec 14 14:25:20 2006 From: eric at enthought.com (eric jones) Date: Thu, 14 Dec 2006 13:25:20 -0600 Subject: [Numpy-discussion] Histograms of extremely large data sets In-Reply-To: <6897A9E2-4B52-4B2F-942F-BDF16AA03779@stsci.edu> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu> <4580E941.4030805@enthought.com> <106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com> <6897A9E2-4B52-4B2F-942F-BDF16AA03779@stsci.edu> Message-ID: <4581A520.4030806@enthought.com> Rick White wrote: > Just so we don't get too smug about the speed, if I do this in IDL on > the same machine it is 10 times faster (0.28 seconds instead of 4 > seconds). I'm sure the IDL version uses the much faster approach of > just sweeping through the array once, incrementing counts in the > appropriate bins. It only handles equal-sized bins, so it is not as > general as the numpy version -- but equal-sized bins is a very common > case. I'd still like to see a C version of histogram (which I guess > would need to be a ufunc) go into the core numpy. > Yes, this gets rid of the search, and indices can just be caluclated from offsets. I've attached a modified weaved histogram that takes this approach. 
Running the snippet below on my machine takes .118 sec for the evenly binned weave algorithm and 0.385 sec for Rick's algorithm on 5 million elements. That is close to 4x faster (but not 10x...), so there is indeed some speed to be gained for the common special case. I don't know if the code I wrote has a 2x gain left in it, but I've spent zero time optimizing it. I'd bet it can be improved substantially. eric ### test_weave_even_histogram.py from numpy import arange, product, sum, zeros, uint8 from numpy.random import randint import weave_even_histogram import time shape = 1000,1000,5 size = product(shape) data = randint(0,256,size).astype(uint8) bins = arange(256+1) print 'type:', data.dtype print 'millions of elements:', size/1e6 bin_start = 0 bin_size = 1 bin_count = 256 t1 = time.clock() res = weave_even_histogram.histogram(data, bin_start, bin_size, bin_count) t2 = time.clock() print 'sec (evenly spaced):', t2-t1, sum(res) print res > Rick > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- A non-text attachment was scrubbed... Name: weave_even_histogram.py Type: text/x-python Size: 1726 bytes Desc: not available URL: From eric at enthought.com Thu Dec 14 14:27:35 2006 From: eric at enthought.com (eric jones) Date: Thu, 14 Dec 2006 13:27:35 -0600 Subject: [Numpy-discussion] Histograms of extremely large data sets In-Reply-To: <4581A520.4030806@enthought.com> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu> <4580E941.4030805@enthought.com> <106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com> <6897A9E2-4B52-4B2F-942F-BDF16AA03779@stsci.edu> <4581A520.4030806@enthought.com> Message-ID: <4581A5A7.5030406@enthought.com> I just noticed a bug in this code. "PyArray_ITER_NEXT(iter);" should be moved out of the if statement. eric eric jones wrote: > > > Rick White wrote: >> Just so we don't get too smug about the speed, if I do this in IDL >> on the same machine it is 10 times faster (0.28 seconds instead of >> 4 seconds). I'm sure the IDL version uses the much faster approach >> of just sweeping through the array once, incrementing counts in the >> appropriate bins. It only handles equal-sized bins, so it is not as >> general as the numpy version -- but equal-sized bins is a very >> common case. I'd still like to see a C version of histogram (which >> I guess would need to be a ufunc) go into the core numpy. >> > Yes, this gets rid of the search, and indices can just be caluclated > from offsets. I've attached a modified weaved histogram that takes > this approach. Running the snippet below on my machine takes .118 sec > for the evenly binned weave algorithm and 0.385 sec for Rick's > algorithm on 5 million elements. That is close to 4x faster (but not > 10x...), so there is indeed some speed to be gained for the common > special case. I don't know if the code I wrote has a 2x gain left in > it, but I've spent zero time optimizing it. I'd bet it can be > improved substantially. 
> > eric > > ### test_weave_even_histogram.py > > from numpy import arange, product, sum, zeros, uint8 > from numpy.random import randint > > import weave_even_histogram > > import time > > shape = 1000,1000,5 > size = product(shape) > data = randint(0,256,size).astype(uint8) > bins = arange(256+1) > > print 'type:', data.dtype > print 'millions of elements:', size/1e6 > > bin_start = 0 > bin_size = 1 > bin_count = 256 > t1 = time.clock() > res = weave_even_histogram.histogram(data, bin_start, bin_size, > bin_count) > t2 = time.clock() > print 'sec (evenly spaced):', t2-t1, sum(res) > print res > > >> Rick >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> > > ------------------------------------------------------------------------ > > from numpy import array, zeros, asarray, sort, int32 > from scipy import weave > from typed_array_converter import converters > > def histogram(ary, bin_start, bin_size, bin_count): > > ary = asarray(ary) > > # Create an array to hold the histogram count results. > results = zeros(bin_count,dtype=int32) > > # The C++ code that actually does the histogramming. > code = """ > PyArrayIterObject *iter = (PyArrayIterObject*)PyArray_IterNew(py_ary); > > while(iter->index < iter->size) > { > > ////////////////////////////////////////////////////////// > // binary search > ////////////////////////////////////////////////////////// > > // This requires an update to weave > ary_data_type value = *((ary_data_type*)iter->dataptr); > if (value>=bin_start) > { > int bin_index = (int)((value-bin_start)/bin_size); > > ////////////////////////////////////////////////////////// > // Bin counter increment > ////////////////////////////////////////////////////////// > > // If the value was found, increment the counter for that bin. > if (bin_index < bin_count) > { > results[bin_index]++; > } > PyArray_ITER_NEXT(iter); > } > } > """ > weave.inline(code, ['ary', 'bin_start', 'bin_size','bin_count', 'results'], > type_converters=converters, > compiler='gcc') > > return results > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From faltet at carabos.com Thu Dec 14 14:30:47 2006 From: faltet at carabos.com (Francesc Altet) Date: Thu, 14 Dec 2006 20:30:47 +0100 Subject: [Numpy-discussion] Pyrex and numpy In-Reply-To: <45819625.8040803@ieee.org> References: <45819625.8040803@ieee.org> Message-ID: <1166124647.2645.38.camel@localhost.localdomain> El dj 14 de 12 del 2006 a les 11:21 -0700, en/na Tim Hochberg va escriure: > > I was just going to try pyrex out with numpy to see how it compares with > weave (which is cool but quirky). My first attempt ended in failure: I > tried to compile the demo in in numpy/doc/pyrex and got this error: > > c_numpy.pxd:99:22: Array element cannot be a Python object > > > Does anyone who uses pyrex see this? Does anyone know what it's from? > Not that I deleted numpyx.c, since otherwise pyrex isn't invoked at all? > Mmm, I can compile and run the example just fine. That's strange because your Pyrex error seems to tell that NPY_MAXDIMS is a Python object instead of an integer. 
But in my numpy installation, NPY_MAXDIMS is defined in ndarrayobject.h, which should be imported automatically by Pyrex in: cdef extern from "numpy/arrayobject.h": block (which should include ndarrayobject.h). Sorry for not being able to help more, -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth From david.huard at gmail.com Thu Dec 14 14:45:24 2006 From: david.huard at gmail.com (David Huard) Date: Thu, 14 Dec 2006 14:45:24 -0500 Subject: [Numpy-discussion] Histograms of extremely large data sets In-Reply-To: <4581A520.4030806@enthought.com> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu> <4580E941.4030805@enthought.com> <106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com> <6897A9E2-4B52-4B2F-942F-BDF16AA03779@stsci.edu> <4581A520.4030806@enthought.com> Message-ID: <91cf711d0612141145u2fce5b3bwc09bcc191be5f8f1@mail.gmail.com> Hi, I spent some time a while ago on an histogram function for numpy. It uses digitize and bincount instead of sorting the data. If I remember right, it was significantly faster than numpy's histogram, but I don't know how it will behave with very large data sets. I attached the file if you want to take a look, or if you me the benchmark, I'll add it to it and report the results. Cheers, David 2006/12/14, eric jones : > > > > Rick White wrote: > > Just so we don't get too smug about the speed, if I do this in IDL on > > the same machine it is 10 times faster (0.28 seconds instead of 4 > > seconds). I'm sure the IDL version uses the much faster approach of > > just sweeping through the array once, incrementing counts in the > > appropriate bins. It only handles equal-sized bins, so it is not as > > general as the numpy version -- but equal-sized bins is a very common > > case. I'd still like to see a C version of histogram (which I guess > > would need to be a ufunc) go into the core numpy. > > > Yes, this gets rid of the search, and indices can just be caluclated > from offsets. I've attached a modified weaved histogram that takes this > approach. Running the snippet below on my machine takes .118 sec for > the evenly binned weave algorithm and 0.385 sec for Rick's algorithm on > 5 million elements. That is close to 4x faster (but not 10x...), so > there is indeed some speed to be gained for the common special case. I > don't know if the code I wrote has a 2x gain left in it, but I've spent > zero time optimizing it. I'd bet it can be improved substantially. 
>
> eric
>
> ### test_weave_even_histogram.py
>
> from numpy import arange, product, sum, zeros, uint8
> from numpy.random import randint
>
> import weave_even_histogram
>
> import time
>
> shape = 1000,1000,5
> size = product(shape)
> data = randint(0,256,size).astype(uint8)
> bins = arange(256+1)
>
> print 'type:', data.dtype
> print 'millions of elements:', size/1e6
>
> bin_start = 0
> bin_size = 1
> bin_count = 256
> t1 = time.clock()
> res = weave_even_histogram.histogram(data, bin_start, bin_size, bin_count)
> t2 = time.clock()
> print 'sec (evenly spaced):', t2-t1, sum(res)
> print res
>
>> Rick
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: histogram1d.py
Type: text/x-python
Size: 6826 bytes
Desc: not available
URL:

From eric at enthought.com  Thu Dec 14 15:15:00 2006
From: eric at enthought.com (eric jones)
Date: Thu, 14 Dec 2006 14:15:00 -0600
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <4581A5A7.5030406@enthought.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
	<9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu>
	<4580E941.4030805@enthought.com>
	<106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com>
	<6897A9E2-4B52-4B2F-942F-BDF16AA03779@stsci.edu>
	<4581A520.4030806@enthought.com>
	<4581A5A7.5030406@enthought.com>
Message-ID: <4581B0C4.9050605@enthought.com>

I've attached the newest version of my benchmarking code, and added one
more evenly spaced version that works on contiguous arrays. Turns out
that the numpy iterator interface overhead is noticeably slower when we
are using faster algorithms. The fastest (but least flexible) version is
5x faster than Rick's. So, we're still not at the 10x that IDL gives you.

eric

C:\eric\code\histogram>c:\Python24\python.exe histogram_speed.py
type: uint8
millions of elements: 10.0
sec (C indexing based): 0.944789631084 10000000
sec (numpy iteration based): 1.04402933892 10000000
sec (rick's pure python): 0.703124279762 10000000
sec (evenly spaced): 0.231293921434 10000000
sec (evenly spaced): 0.139521643114 10000000

Summary:
case                   sec        speed-up
weave_1d_arbitrary     0.944790   0.744213
weave_nd_arbitrary     1.044029   0.673472
ricks_arbitrary        0.703124   1.000000
weave_nd_even          0.231294   3.039960
weave_1d_even          0.139522   5.039536

eric jones wrote:
> I just noticed a bug in this code. "PyArray_ITER_NEXT(iter);" should be
> moved out of the if statement.
>
> eric
>
> eric jones wrote:
>
>> Rick White wrote:
>>
>>> Just so we don't get too smug about the speed, if I do this in IDL
>>> on the same machine it is 10 times faster (0.28 seconds instead of
>>> 4 seconds). I'm sure the IDL version uses the much faster approach
>>> of just sweeping through the array once, incrementing counts in the
>>> appropriate bins. It only handles equal-sized bins, so it is not as
>>> general as the numpy version -- but equal-sized bins is a very
>>> common case. I'd still like to see a C version of histogram (which
>>> I guess would need to be a ufunc) go into the core numpy.
>>> >>> >> Yes, this gets rid of the search, and indices can just be caluclated >> from offsets. I've attached a modified weaved histogram that takes >> this approach. Running the snippet below on my machine takes .118 sec >> for the evenly binned weave algorithm and 0.385 sec for Rick's >> algorithm on 5 million elements. That is close to 4x faster (but not >> 10x...), so there is indeed some speed to be gained for the common >> special case. I don't know if the code I wrote has a 2x gain left in >> it, but I've spent zero time optimizing it. I'd bet it can be >> improved substantially. >> >> eric >> >> ### test_weave_even_histogram.py >> >> from numpy import arange, product, sum, zeros, uint8 >> from numpy.random import randint >> >> import weave_even_histogram >> >> import time >> >> shape = 1000,1000,5 >> size = product(shape) >> data = randint(0,256,size).astype(uint8) >> bins = arange(256+1) >> >> print 'type:', data.dtype >> print 'millions of elements:', size/1e6 >> >> bin_start = 0 >> bin_size = 1 >> bin_count = 256 >> t1 = time.clock() >> res = weave_even_histogram.histogram(data, bin_start, bin_size, >> bin_count) >> t2 = time.clock() >> print 'sec (evenly spaced):', t2-t1, sum(res) >> print res >> >> >> >>> Rick >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion at scipy.org >>> http://projects.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >> ------------------------------------------------------------------------ >> >> from numpy import array, zeros, asarray, sort, int32 >> from scipy import weave >> from typed_array_converter import converters >> >> def histogram(ary, bin_start, bin_size, bin_count): >> >> ary = asarray(ary) >> >> # Create an array to hold the histogram count results. >> results = zeros(bin_count,dtype=int32) >> >> # The C++ code that actually does the histogramming. >> code = """ >> PyArrayIterObject *iter = (PyArrayIterObject*)PyArray_IterNew(py_ary); >> >> while(iter->index < iter->size) >> { >> >> ////////////////////////////////////////////////////////// >> // binary search >> ////////////////////////////////////////////////////////// >> >> // This requires an update to weave >> ary_data_type value = *((ary_data_type*)iter->dataptr); >> if (value>=bin_start) >> { >> int bin_index = (int)((value-bin_start)/bin_size); >> >> ////////////////////////////////////////////////////////// >> // Bin counter increment >> ////////////////////////////////////////////////////////// >> >> // If the value was found, increment the counter for that bin. >> if (bin_index < bin_count) >> { >> results[bin_index]++; >> } >> PyArray_ITER_NEXT(iter); >> } >> } >> """ >> weave.inline(code, ['ary', 'bin_start', 'bin_size','bin_count', 'results'], >> type_converters=converters, >> compiler='gcc') >> >> return results >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: histogram.zip Type: application/x-zip-compressed Size: 5823 bytes Desc: not available URL: From ellisonbg.net at gmail.com Thu Dec 14 20:06:14 2006 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 14 Dec 2006 18:06:14 -0700 Subject: [Numpy-discussion] Can we change how fortran compiler version strings are handled?! Message-ID: <6ce0ac130612141706r147c3208i687de9625051a0e6@mail.gmail.com> Hi, I have been doing quite a bit of numpy evangelism here at my work and slowly people are starting to use it. One of the main things people are interested in is f2py. But, I am finding that there is one persistent problem that keeps coming up when people try to install numpy on various systems: In three cases I have found that numpy failed to find and use a fortran compiler because the version string didn't match what was hardcoded into numpy.distutils. The reality is that version strings are in no way "standardized". In the most recent cases, we had a version of the lahey compiler that had the extra word "Express" and in another case, the xlf version string on a supercomputer was completely different. What is crazy to me is that this simple mismatch prevents numpy from even trying the compiler. Can we please change how Numpy handles the version string of fortran compilers? My suggestion would be to simply print the version string, but to attempt to use the compiler no matter what the version string is. That way, the success or failure of using the fortran compiler will be determined by the actual compiler, not its version string. There could be some other smart way of handling this, but I think it should be dealt with to make the installation process easier. I am willing to work up a patch if there is agreement on what should be done. Oh, the other difficult thing is that in the current arrangement, numpy.distutils doesn't print an error message that is easy to debug. It just silently does find the compiler rather than saying why. Thanks Brian From robert.kern at gmail.com Thu Dec 14 20:14:22 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Dec 2006 17:14:22 -0800 Subject: [Numpy-discussion] Can we change how fortran compiler version strings are handled?! In-Reply-To: <6ce0ac130612141706r147c3208i687de9625051a0e6@mail.gmail.com> References: <6ce0ac130612141706r147c3208i687de9625051a0e6@mail.gmail.com> Message-ID: <4581F6EE.5070006@gmail.com> Brian Granger wrote: > Can we please change how Numpy handles the version string of fortran > compilers? Yes, please. I'll be happy to apply any patch you might provide for this. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
   -- Umberto Eco

From cameron.walsh at gmail.com  Thu Dec 14 22:43:46 2006
From: cameron.walsh at gmail.com (Cameron Walsh)
Date: Fri, 15 Dec 2006 12:43:46 +0900
Subject: [Numpy-discussion] Histograms of extremely large data sets
In-Reply-To: <4581B0C4.9050605@enthought.com>
References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com>
	<9936E3D8-CF84-4605-BFB9-757D8FCC7912@stsci.edu>
	<4580E941.4030805@enthought.com>
	<106309950612132356r30b01b5ale19a928bc27ce7c6@mail.gmail.com>
	<6897A9E2-4B52-4B2F-942F-BDF16AA03779@stsci.edu>
	<4581A520.4030806@enthought.com>
	<4581A5A7.5030406@enthought.com>
	<4581B0C4.9050605@enthought.com>
Message-ID: <106309950612141943p405a3e44i975aa7dd99fa90@mail.gmail.com>

Using Eric's latest speed-testing, here are David's results:

cameron at cameron-laptop:~/code_snippets/histogram$ python histogram_speed.py
type: uint8
millions of elements: 100.0
sec (C indexing based): 8.44 100000000
sec (numpy iteration based): 8.91 100000000
sec (rick's pure python): 6.4 100000000
sec (nd evenly spaced): 2.1 100000000
sec (1d evenly spaced): 1.33 100000000
sec (david huard): 35.84 100000000

Summary:
case                   sec         speed-up
weave_1d_arbitrary     8.440000    0.758294
weave_nd_arbitrary     8.910000    0.718294
ricks_arbitrary        6.400000    1.000000
weave_nd_even          2.100000    3.047619
weave_1d_even          1.330000    4.812030
david_huard            35.840000   0.178571

I also tried this on an equal-sized sample of my real-world data: 100
image slices, 8 bits/sample, 1000x1000 pixels per image. The full data
set is 489 image slices, but I was unable to randomly generate 489
million data samples because I ran out of memory and started thrashing
the page file, ruining any results. So I've compared like with like and
got the following results with real-world data:

type: uint8
millions of elements: 100.0
sec (C indexing based): 6.1 100000000
sec (numpy iteration based): 7.07 100000000
sec (rick's pure python): 4.77 100000000
sec (nd evenly spaced): 2.12 100000000
sec (1d evenly spaced): 1.33 100000000
sec (david huard): 16.47 100000000

Summary:
case                   sec         speed-up
weave_1d_arbitrary     6.100000    0.781967
weave_nd_arbitrary     7.070000    0.674682
ricks_arbitrary        4.770000    1.000000
weave_nd_even          2.120000    2.250000
weave_1d_even          1.330000    3.586466
david_huard            16.470000   0.289617

Note how much faster some of the algorithms run on the non-random,
real-world data. I assume this is due to variations in the scaling of
the quick-sort algorithm depending on the starting order of the data?
Scaling with the full data set was similar.

Unfortunately, David's code was not able to load the entire 489 image
slices, throwing the same error as that mentioned in the first email in
this thread.

Later parts of the project I am working on will probably require
iteration over the entire data set, and iteration seems to be slowing
down several of these histogram algorithms, requiring the sort()
approach. I'll have a look at the iterator, and see if there's anything
that can be done there instead. I'm hoping that it will be possible to
use a C-based iterator for a numpy multiarray, as this would allow many
data processing algorithms to run faster, not just the histogram.

Once again, thanks to everyone for all your input. This seems to have
generated more discussion and action than I anticipated, for which I am
very grateful.

Best regards,

Cameron.
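[For the equal-width case benchmarked above, a single-sweep histogram can
also be written in pure numpy with bincount, avoiding both sorting and
compiled code. A minimal sketch, assuming the data fits in memory; the
function name even_histogram is made up here, and this is not the attached
weave version or David's histogram1d.py:]

from numpy import asarray, bincount, floor, zeros, int32

def even_histogram(a, bin_start, bin_size, bin_count):
    """One-pass histogram for equal-width bins (sketch)."""
    a = asarray(a).ravel()
    # map each value to its bin index in a single sweep
    idx = floor((a - bin_start) / float(bin_size)).astype(int32)
    # keep only values that land inside the requested bins
    idx = idx[(idx >= 0) & (idx < bin_count)]
    counts = bincount(idx)
    # bincount's result can be shorter than bin_count; pad it out
    out = zeros(bin_count, dtype=int32)
    out[:len(counts)] = counts
    return out

# e.g. for the uint8 case above: counts = even_histogram(data, 0, 1, 256)

[The temporary idx array costs extra memory, so for very large inputs it
would presumably be combined with the block-at-a-time loop from Rick's
version.]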
From giorgio.luciano at chimica.unige.it Fri Dec 15 03:14:28 2006 From: giorgio.luciano at chimica.unige.it (Giorgio Luciano) Date: Fri, 15 Dec 2006 09:14:28 +0100 Subject: [Numpy-discussion] empty data matrix (are they really empty ?) In-Reply-To: <45815D7E.3090109@gmx.net> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <457FAED1.6010803@enthought.com> <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> <106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com> <4580B436.8040203@enthought.com> <45815243.9070406@chimica.unige.it> <45815D7E.3090109@gmx.net> Message-ID: <45825964.1000702@chimica.unige.it> Ok thanks Swen I will try and sorry for not starting a new thread.. I'm still a newbie with mailing list too :) Giorgio > > > From giorgio.luciano at chimica.unige.it Fri Dec 15 03:59:06 2006 From: giorgio.luciano at chimica.unige.it (Giorgio Luciano) Date: Fri, 15 Dec 2006 09:59:06 +0100 Subject: [Numpy-discussion] empty data matrix (are they really empty ?) In-Reply-To: <45818C9B.5060404@noaa.gov> References: <106309950612121927o6b50ee7fj4996c52d5ff2d250@mail.gmail.com> <457FAED1.6010803@enthought.com> <106309950612130027m7c454b19r1255670114d5cb33@mail.gmail.com> <106309950612131732j1e48dccfnbd60ee1c17b0c6c9@mail.gmail.com> <4580B436.8040203@enthought.com> <45815243.9070406@chimica.unige.it> <45815D7E.3090109@gmx.net> <45818C9B.5060404@noaa.gov> Message-ID: <458263DA.30303@chimica.unige.it> Here's the runnable example Everything work fine with installed python 2.5 matplotlib 0.87.7 numpy 1.01scipy 0.5.2 Cheers Giorgio this module plots leverage for a regression module (only two steps in the grid since it's only a try to compare with a matlab file I have) -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pyplotlev.py URL: From meesters at uni-mainz.de Fri Dec 15 05:52:26 2006 From: meesters at uni-mainz.de (Christian Meesters) Date: Fri, 15 Dec 2006 11:52:26 +0100 Subject: [Numpy-discussion] migrating to numpy Message-ID: <200612151152.26804.meesters@uni-mainz.de> Hi I was looking for a guide how to migrate old numeric code to numpy and couldn't find anything on the web. Any pointers for me? TIA and sorry for bothering, since this surely was discussed already, Christian From pjssilva at ime.usp.br Fri Dec 15 08:27:21 2006 From: pjssilva at ime.usp.br (Paulo Jose da Silva e Silva) Date: Fri, 15 Dec 2006 11:27:21 -0200 Subject: [Numpy-discussion] Automatic matrices Message-ID: <1166189241.17633.27.camel@localhost.localdomain> Hello, If a numpy user is specially concerned with numerical linear algebra (or more generally with Math), it may find unconvenient the use of the dot function instead of the * operator. This behavior may be specially unpleasant for someone migrating from Matlab. I believe that this is the may reason for the existence of the matrix class within numpy. However, after trying to use the matrix class I have came across a major roadblock: many numpy/scipy functions return an array by default and not matrices. Then, we need then to add many conversion calls to the 'mat' function in our code. This is also unconvenient. I have had the idea of trying to write some code to automatically call the mat function for me. I have a very simple and inefficient prototype now that can be downloaded at: http://www.ime.usp.br/~pjssilva/matrix_import.py The use is simple. Instead of importing a numerical module with import, use the special class MatrixfiedModule from the above file. 
Let me give an example: --- ipython session --- In [2]:import matrix_import as mi In [3]:num = mi.MatrixfiedModule('numpy') Importing numpy In [4]:la = mi.MatrixfiedModule('scipy.linalg') Importing scipy.linalg In [5]:A = num.random.rand(3,4) Importing numpy.random In [6]:Q, R = la.qr(A) In [7]:la.norm(Q*R - A) Out[7]:6.0555516793379748e-16 ----- End session ----- For now the solution is very inefficient: every function call to a MatrixfiedModule function is wrapped on the fly to search for array return values ad convert them to matrix. This can certainly be improved the wrapping all the functions in the original module first. I plan to add this possibility soon. It is also incomplete: The automatic conversion only happens for return values of module function. It doesn't try to deal with special objects like finfo(float).eps or mgrid[0:9.,0:6.]. I am not sure how to deal with this. I can donate the code to scipy if there is any interest. Any comments? Best, Paulo -- Paulo Jos? da Silva e Silva Professor Assistente, Dep. de Ci?ncia da Computa??o (Assistant Professor, Computer Science Dept.) Universidade de S?o Paulo - Brazil e-mail: pjssilva at ime.usp.br Web: http://www.ime.usp.br/~pjssilva Teoria ? o que n?o entendemos o (Theory is something we don't) suficiente para chamar de pr?tica. (understand well enough to call) (practice) From evan.lapisky at gmail.com Fri Dec 15 09:37:26 2006 From: evan.lapisky at gmail.com (Evan Lapisky) Date: Fri, 15 Dec 2006 09:37:26 -0500 Subject: [Numpy-discussion] Pyrex and numpy Message-ID: <4052c5140612150637m126d7ca2u5537f3a7b7a7178d@mail.gmail.com> > I was just going to try pyrex out with numpy to see how it compares with > weave (which is cool but quirky). My first attempt ended in failure: I > tried to compile the demo in in numpy/doc/pyrex and got this error: > > c_numpy.pxd:99:22: Array element cannot be a Python object > > > Does anyone who uses pyrex see this? Does anyone know what it's from? > Not that I deleted numpyx.c, since otherwise pyrex isn't invoked at all? > I had the same problem. I don't know why it didn't work, but the pyrex example from http://scipy.org/PerformancePython worked just fine. -Evan From robert.kern at gmail.com Fri Dec 15 10:05:35 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 15 Dec 2006 07:05:35 -0800 Subject: [Numpy-discussion] migrating to numpy In-Reply-To: <200612151152.26804.meesters@uni-mainz.de> References: <200612151152.26804.meesters@uni-mainz.de> Message-ID: <4582B9BF.2070003@gmail.com> Christian Meesters wrote: > Hi > > I was looking for a guide how to migrate old numeric code to numpy and > couldn't find anything on the web. Any pointers for me? http://www.scipy.org/Converting_from_Numeric Also, chapter 2.6 of the _Guide to NumPy_ in the freely available sample chapters covers this. http://www.tramy.us/numpybooksample.pdf -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From tim.hochberg at ieee.org Fri Dec 15 11:05:05 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Fri, 15 Dec 2006 09:05:05 -0700 Subject: [Numpy-discussion] Pyrex and numpy In-Reply-To: <4052c5140612150637m126d7ca2u5537f3a7b7a7178d@mail.gmail.com> References: <4052c5140612150637m126d7ca2u5537f3a7b7a7178d@mail.gmail.com> Message-ID: <4582C7B1.2090406@ieee.org> Evan Lapisky wrote: >> I was just going to try pyrex out with numpy to see how it compares with >> weave (which is cool but quirky). My first attempt ended in failure: I >> tried to compile the demo in in numpy/doc/pyrex and got this error: >> >> c_numpy.pxd:99:22: Array element cannot be a Python object >> >> >> Does anyone who uses pyrex see this? Does anyone know what it's from? >> Not that I deleted numpyx.c, since otherwise pyrex isn't invoked at all? >> >> > > I had the same problem. I don't know why it didn't work, but the pyrex > example from http://scipy.org/PerformancePython worked just fine. > > Hmmm. Thanks. I'll give it a try. -tim From svetosch at gmx.net Fri Dec 15 17:37:57 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Fri, 15 Dec 2006 23:37:57 +0100 Subject: [Numpy-discussion] Automatic matrices In-Reply-To: <1166189241.17633.27.camel@localhost.localdomain> References: <1166189241.17633.27.camel@localhost.localdomain> Message-ID: <458323C5.7010808@gmx.net> Paulo Jose da Silva e Silva schrieb: > > However, after trying to use the matrix class I have came across a major > roadblock: many numpy/scipy functions return an array by default and not > matrices. Then, we need then to add many conversion calls to the 'mat' > function in our code. This is also unconvenient. > scipy I don't know, but in numpy as a matrix user I'm glad that such behavior has been treated as bugs on the way to 1.0 -- so could you please send a list with the affected numpy functions? -sven From pjssilva at ime.usp.br Fri Dec 15 19:21:37 2006 From: pjssilva at ime.usp.br (Paulo Jose da Silva e Silva) Date: Fri, 15 Dec 2006 22:21:37 -0200 Subject: [Numpy-discussion] Automatic matrices In-Reply-To: <458323C5.7010808@gmx.net> References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> Message-ID: <1166228497.17633.61.camel@localhost.localdomain> Em Sex, 2006-12-15 ?s 23:37 +0100, Sven Schreiber escreveu: > Paulo Jose da Silva e Silva schrieb: > > > > > However, after trying to use the matrix class I have came across a major > > roadblock: many numpy/scipy functions return an array by default and not > > matrices. Then, we need then to add many conversion calls to the 'mat' > > function in our code. This is also unconvenient. > > > > scipy I don't know, but in numpy as a matrix user I'm glad that such > behavior has been treated as bugs on the way to 1.0 -- so could you > please send a list with the affected numpy functions? > -sven Ops... I did not try to imply that there are some functions in numpy that return array when receiving matrices. What I meant is that there are functions in numpy that always return arrays. Hence they ask for an explicit conversion to matrices. Good examples is the whole numpy.random sub-module. So if you want a random matrix you need to type: A = mat(numpy.random.rand(4,4)) Hence, a matrix user of numpy module still have to be aware of such conversions. Note that in my code, after importing numpy using the special module, I can write A = num.random.rand(4,4) There is no special case. 
best,

Paulo

P.S.: I remember reading somewhere in the list that we can change the
behavior of numpy to make it return matrices by default, even in calls
to functions like zeros or ones. I don't have the reference now. Anyhow,
I wanted a solution that can make any module play nice with matrices.

From kwgoodman at gmail.com Fri Dec 15 19:44:25 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 15 Dec 2006 16:44:25 -0800
Subject: [Numpy-discussion] Automatic matrices
In-Reply-To: <1166228497.17633.61.camel@localhost.localdomain>
References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain>
Message-ID:

On 12/15/06, Paulo Jose da Silva e Silva wrote:
> I did not try to imply that there are some functions in numpy that
> return arrays when receiving matrices. What I meant is that there are
> functions in numpy that always return arrays, hence they ask for an
> explicit conversion to matrices. A good example is the whole
> numpy.random sub-module. So if you want a random matrix you need to
> type:

There are many numpy functions that will take a matrix as input but
return an array.

The nan functions (nanmin, nanmax, nanargmin, nanargmax, nansum) are an
example. The first line in these functions is

y = array(a)

which converts the matrix input into an array. A more matrix-friendly
alternative would be

y = asanyarray(a)

But are there any unintended consequences of changing from array to
asanyarray?

From robert.kern at gmail.com Fri Dec 15 21:49:49 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 15 Dec 2006 20:49:49 -0600
Subject: [Numpy-discussion] Automatic matrices
In-Reply-To:
References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain>
Message-ID: <45835ECD.1060703@gmail.com>

Keith Goodman wrote:
> But are there any unintended consequences of changing from array to
> asanyarray?

Not by itself, no. That entails that the implementations cannot rely on
any particular behavior of the arrays. The correct(ish) approach looks
something like the following, I believe:

def foo(input):
    input_arr = asanyarray(input)
    wrapper = input_arr.__array_wrap__
    input_arr = asarray(input_arr)
    # Do stuff on input_arr to get output_arr (an ndarray).
    return wrapper(output_arr)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

From svetosch at gmx.net Sat Dec 16 15:24:41 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Sat, 16 Dec 2006 21:24:41 +0100
Subject: [Numpy-discussion] Automatic matrices
In-Reply-To:
References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain>
Message-ID: <45845609.40705@gmx.net>

Keith Goodman wrote:
>
> There are many numpy functions that will take a matrix as input but
> return an array.
>
> The nan functions (nanmin, nanmax, nanargmin, nanargmax, nansum) are an
> example.

So that would be a bug IMHO and should be filed as a ticket. I will do
that eventually if nobody stops me first...
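For the record, the difference is easy to see interactively (a sketch from
memory, not re-checked against current svn, so take the exact reprs with a
grain of salt):

>>> import numpy
>>> m = numpy.mat(numpy.ones((2,2)))
>>> type(numpy.array(m))        # what the nan functions do today
<type 'numpy.ndarray'>
>>> type(numpy.asanyarray(m))   # the subclass-preserving alternative
<class 'numpy.core.defmatrix.matrix'>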
-sven From svetosch at gmx.net Sat Dec 16 15:27:14 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Sat, 16 Dec 2006 21:27:14 +0100 Subject: [Numpy-discussion] Automatic matrices In-Reply-To: <1166228497.17633.61.camel@localhost.localdomain> References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain> Message-ID: <458456A2.6010900@gmx.net> Paulo Jose da Silva e Silva schrieb: > > Obs: I remember reading somewhere in the list that we can change the > behavior of numpy to make it return matrices as default, even in calls > for functions like zeros or ones. I don't have the reference now. Anyhow > I wanted a solution that can make any module play nice with matrices. > Yes, from numpy.matlib import ones, zeros, empty, rand, eye should cover most cases (at least for me) -sven From cjw at sympatico.ca Sat Dec 16 19:55:46 2006 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat, 16 Dec 2006 19:55:46 -0500 Subject: [Numpy-discussion] Subclasses - use of __finalize__ Message-ID: An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Dec 17 00:59:08 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 16 Dec 2006 22:59:08 -0700 Subject: [Numpy-discussion] Automatic matrices In-Reply-To: <458456A2.6010900@gmx.net> References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain> <458456A2.6010900@gmx.net> Message-ID: Testing, please disregard.... -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Dec 17 15:25:20 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 17 Dec 2006 13:25:20 -0700 Subject: [Numpy-discussion] Anyone have a "little" shooting-method function to share In-Reply-To: <1163094437.6376.6.camel@localhost.localdomain> References: <45527171.8030903@noaa.gov> <1163094437.6376.6.camel@localhost.localdomain> Message-ID: On 11/9/06, Pauli Virtanen wrote: > > ke, 2006-11-08 kello 16:08 -0800, David L Goldsmith kirjoitti: > > Hi! I tried to send this earlier: it made it into my sent mail folder, > > but does not appear to have made it to the list. > > > > I need to numerically solve: > > (1-t)x" + x' - x = f(t), x(0) = x0, x(1) = x1 > > I've been trying to use (because it's the approach I inherited) an > > elementary finite-difference discretization, but unit tests have shown > > that that approach isn't working. I thought I would try a Chebyshev method on this problem. My solution with order ten (degree 9) Chebyshev polynomials goes as follows. In [111]: import chebyshev as c In [112]: t = c.modified_points(10,0,1) # use 10 sample points In [113]: D = c.modified_derivative(10,0,1) # derivative operator In [114]: op = (1.0 - t)[:,newaxis]*dot(D,D) + D - eye(10) # differential equation In [115]: op[0] = 0 # set up boundary condition y(0) = y0 In [116]: op[0,0] = 1 In [117]: op[9] = 0 # set up boundary condition y(1) = y1 In [118]: op[9,9] = 1 In [119]: opinv = alg.inv(op) # invert the operator In [120]: f = exp(t) # try f(t) = exp(t) In [121]: f[0] = 2 # y0 = 2 In [122]: f[9] = 1 # y1 = 1 In [123]: soln = dot(opinv,f) # solve equation In [124]: plot(t,soln) Out[124]: [] The plot is rather rough with only 10 points. Replot with more. In [125]: tsmp = linspace(0,1) In [126]: interp = c.modified_values(tsmp, 10, 0, 0, 1) In [127]: plot(tsmp, dot(interp, soln)) Out[127]: [] Looks OK here. 
You can save opinv as it doesn't change with f. Likewise, if you always
want to interpolate the result, then save dot(interp, opinv). I've attached
a plot of the solution I got along with the chebyshev module I use.

Chuck
-------------- next part --------------
A non-text attachment was scrubbed...
Name: solution.jpg.zip
Type: application/zip
Size: 16196 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chebyshev.py.zip
Type: application/zip
Size: 3997 bytes
Desc: not available

From zhangyunfeng at gmail.com Sun Dec 17 21:26:41 2006
From: zhangyunfeng at gmail.com (zhang yunfeng)
Date: Mon, 18 Dec 2006 10:26:41 +0800
Subject: [Numpy-discussion] sum of two arrays with different shape?
Message-ID: <4ff46d8f0612171826q262231f0kc3d5f2e8070046f9@mail.gmail.com>

Hi, I'm a newbie to Numpy.

When reading the tutorials at http://www.scipy.org/Tentative_NumPy_Tutorial,
I found a snippet about the addition of two arrays with different shapes.
Does it make sense? If the array shapes are not the same, why doesn't it
throw an error? See the code below (taken from the above webpage):
a.shape is (4,) and y.shape is (3,4), and yet a+y works?

-------------------------------------------
>>> y = arange(12)
>>> y
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> y.shape = 3,4 # does not modify the total number of elements
>>> y
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])

It is possible to operate with arrays of different dimensions as long as
they fit well.

>>> 3*a # multiply each element of a by 3
array([ 30, 60, 90, 120])
>>> a+y # sum a to each row of y
array([[10, 21, 32, 43],
       [14, 25, 36, 47],
       [18, 29, 40, 51]])
--------------------------------------------

--
http://my.opera.com/zhangyunfeng

From robert.kern at gmail.com Sun Dec 17 21:39:30 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sun, 17 Dec 2006 20:39:30 -0600
Subject: [Numpy-discussion] sum of two arrays with different shape?
In-Reply-To: <4ff46d8f0612171826q262231f0kc3d5f2e8070046f9@mail.gmail.com>
References: <4ff46d8f0612171826q262231f0kc3d5f2e8070046f9@mail.gmail.com>
Message-ID: <4585FF62.10701@gmail.com>

zhang yunfeng wrote:
> Hi, I'm a newbie to Numpy.
>
> When reading the tutorials at
> http://www.scipy.org/Tentative_NumPy_Tutorial
> I found a snippet about the addition of two arrays with different shapes.
> Does it make sense? If the array shapes are not the same, why doesn't it
> throw an error?

When two arrays of different shapes are operated against each other, numpy
tries to "broadcast" them to a compatible shape according to certain rules.
This is a fairly powerful concept, and it provides quite a lot of
convenience. The following wiki page has an explanation of the broadcasting
rules:

http://www.scipy.org/EricsBroadcastingDoc

It still refers to Numeric, numpy's predecessor, but the concepts still
apply (change "Numeric" to "numpy" and "NewAxis" to "newaxis", and I
believe all of the code examples will be correct).
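For example, a quick sketch of both the compatible and the incompatible
case:

>>> import numpy
>>> a = numpy.array([10, 20, 30, 40])     # shape (4,)
>>> y = numpy.arange(12).reshape(3, 4)    # shape (3,4)
>>> a + y                                 # a is broadcast across each row
array([[10, 21, 32, 43],
       [14, 25, 36, 47],
       [18, 29, 40, 51]])
>>> a + numpy.arange(3)                   # (4,) and (3,) don't fit: error
Traceback (most recent call last):
  ...
ValueError: shape mismatch: objects cannot be broadcast to a single shape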
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

From david at ar.media.kyoto-u.ac.jp Mon Dec 18 02:17:08 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Mon, 18 Dec 2006 16:17:08 +0900
Subject: [Numpy-discussion] slow numpy.clip ?
Message-ID: <45864074.6090203@ar.media.kyoto-u.ac.jp>

Hi,

When trying to speed up some matplotlib routines with the matplotlib
dev team, I noticed that numpy.clip is pretty slow: clip(data, m, M) is
slower than a direct numpy implementation (that is, data[data<m] = m;
data[data>M] = M; return data.copy()). My understanding is that the code
does the same thing, right ?

Below, a small script which shows the difference (twice slower for a
8000x256 array on my workstation):

import numpy as N

#==========================
# To benchmark imshow alone
#==========================
def generate_data_2d(fr, nwin, hop, len):
    nframes = 1.0 * fr / hop * len
    return N.random.randn(nframes, nwin)

def bench_clip():
    m = -1.
    M = 1.
    # 2 minutes (120 sec) of sounds @ 8 kHz with 256 samples with 50 % overlap
    data = generate_data_2d(8000, 256, 128, 120)

    def clip1_bench(data, niter):
        for i in range(niter):
            blop = N.clip(data, m, M)
    def clip2_bench(data, niter):
        for i in range(niter):
            data[data<m] = m
            data[data>M] = M
            blop = data.copy()

    clip1_bench(data, 10)
    clip2_bench(data, 10)

if __name__ == '__main__':
    # test clip
    import hotshot, hotshot.stats
    profile_file = 'clip.prof'
    prof = hotshot.Profile(profile_file, lineevents=1)
    prof.runcall(bench_clip)
    p = hotshot.stats.load(profile_file)
    print p.sort_stats('cumulative').print_stats(20)
    prof.close()

cheers,

David

From Mark.Hoffmann at dk.manbw.com Mon Dec 18 02:30:20 2006
From: Mark.Hoffmann at dk.manbw.com (Mark Hoffmann)
Date: Mon, 18 Dec 2006 08:30:20 +0100
Subject: [Numpy-discussion] Unexpected output using numpy.ndarray and __radd__
Message-ID: <1A0F0517C2D6894282F07FAF76FA3612020AD01F@CPH-EXCH-SG4.manbw.dk>

Hi,

The following issue has puzzled me for a while. I want to add a
numpy.ndarray and an instance of my own class. I define this operation
by implementing the methods __add__ and __radd__. My programme
(including output) looks like:

#!/usr/local/bin/python

import numpy

class Cyclehist:
    def __init__(self,vals):
        self.valuearray = numpy.array(vals)

    def __str__(self):
        return 'Cyclehist object: valuearray = '+str(self.valuearray)

    def __add__(self,other):
        print "__add__ : ",self,other
        return self.valuearray + other

    def __radd__(self,other):
        print "__radd__ : ",self,other
        return other + self.valuearray

c = Cyclehist([1.0,-21.2,3.2])
a = numpy.array([-1.0,2.2,-2.2])
print c + a
print a + c

# ---------- OUTPUT ----------
#
# addprob $ addprob.py
# __add__ : Cyclehist object: valuearray = [ 1. -21.2 3.2] [-1. 2.2 -2.2]
# [ 0. -19. 1.]
# __radd__ : Cyclehist object: valuearray = [ 1. -21.2 3.2] -1.0
# __radd__ : Cyclehist object: valuearray = [ 1. -21.2 3.2] 2.2
# __radd__ : Cyclehist object: valuearray = [ 1. -21.2 3.2] -2.2
# [[ 0. -22.2 2.2] [ 3.2 -19. 5.4] [ -1.2 -23.4 1. ]]
# addprob $
#
# ----------------------------

I expected the output of "c+a" and "a+c" to be identical, however, the
output of "a+c" gets nested in an elementwise fashion. Can anybody
explain this? Is it a bug or a feature? I'm using Python 2.4.4c1 and
numpy 1.0. I tried the programme using an older version of Python and
numpy and there the results of "c+a" and "a+c" were identical.

Regards,

Mark Hoffmann

From efiring at hawaii.edu Mon Dec 18 03:24:25 2006
From: efiring at hawaii.edu (Eric Firing)
Date: Sun, 17 Dec 2006 22:24:25 -1000
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <45864074.6090203@ar.media.kyoto-u.ac.jp>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
Message-ID: <45865039.3070500@hawaii.edu>

David Cournapeau wrote:
> Hi,
>
> When trying to speed up some matplotlib routines with the matplotlib
> dev team, I noticed that numpy.clip is pretty slow: clip(data, m, M) is
> slower than a direct numpy implementation (that is, data[data<m] = m;
> data[data>M] = M; return data.copy()). My understanding is that the code
> does the same thing, right ?
>
> Below, a small script which shows the difference (twice slower for a
> 8000x256 array on my workstation):

I think there was a bug in your clip2_bench that was making it
artificially fast.
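To see why: the second loop modifies data in place, so after the first
iteration nothing is left below m or above M, and the fancy-indexing
assignments in the remaining nine passes have almost no work to do. A
quick way to convince yourself (a sketch; the exact counts will vary from
run to run):

>>> import numpy as N
>>> data = N.random.randn(1000)
>>> (data < -1.).sum(), (data > 1.).sum()   # plenty of values to clip
(158, 166)
>>> data[data < -1.] = -1.
>>> data[data > 1.] = 1.
>>> (data < -1.).sum(), (data > 1.).sum()   # nothing left after one pass
(0, 0)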
Attached is a script that I think gives a fairer comparison, in which
clip1 and clip2 are nearly identical, and includes a third version using
putmask which is faster than either of the others:

15 function calls in 6.450 CPU seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 6.450 6.450 cliptest.py:10(bench_clip)
1 2.302 2.302 2.302 2.302 cliptest.py:19(clip2_bench)
1 0.013 0.013 2.280 2.280 cliptest.py:15(clip1_bench)
10 2.267 0.227 2.267 0.227 /usr/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:357(clip)
1 1.498 1.498 1.498 1.498 cliptest.py:25(clip3_bench)
1 0.366 0.366 0.366 0.366 cliptest.py:6(generate_data_2d)
0 0.000 0.000 profile:0(profiler)

Eric

> import numpy as N
>
> #==========================
> # To benchmark imshow alone
> #==========================
> def generate_data_2d(fr, nwin, hop, len):
>     nframes = 1.0 * fr / hop * len
>     return N.random.randn(nframes, nwin)
>
> def bench_clip():
>     m = -1.
>     M = 1.
>     # 2 minutes (120 sec) of sounds @ 8 kHz with 256 samples with 50 % overlap
>     data = generate_data_2d(8000, 256, 128, 120)
>
>     def clip1_bench(data, niter):
>         for i in range(niter):
>             blop = N.clip(data, m, M)
>     def clip2_bench(data, niter):
>         for i in range(niter):
>             data[data<m] = m
>             data[data>M] = M
>             blop = data.copy()
>
>     clip1_bench(data, 10)
>     clip2_bench(data, 10)
>
> if __name__ == '__main__':
>     # test clip
>     import hotshot, hotshot.stats
>     profile_file = 'clip.prof'
>     prof = hotshot.Profile(profile_file, lineevents=1)
>     prof.runcall(bench_clip)
>     p = hotshot.stats.load(profile_file)
>     print p.sort_stats('cumulative').print_stats(20)
>     prof.close()
>
> cheers,
>
> David
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cliptest.py
Type: text/x-python
Size: 1123 bytes
Desc: not available

From stefan at sun.ac.za Mon Dec 18 03:27:56 2006
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Mon, 18 Dec 2006 10:27:56 +0200
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <45864074.6090203@ar.media.kyoto-u.ac.jp>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
Message-ID: <20061218082756.GV2180@mentat.za.net>

Hi David

The benchmark below isn't quite correct. In clip2_bench the data is
effectively only clipped once. I attach a slightly modified version,
for which the benchmark results look like this:

4 function calls in 4.631 CPU seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 4.631 4.631 clipb.py:10(bench_clip)
1 2.149 2.149 2.149 2.149 clipb.py:16(clip1_bench)
1 2.070 2.070 2.070 2.070 clipb.py:19(clip2_bench)
1 0.409 0.409 0.409 0.409 clipb.py:6(generate_data_2d)
0 0.000 0.000 profile:0(profiler)

The remaining difference is probably a cache effect.
If I change the order, so that clip1_bench is executed last, I see:

4 function calls in 5.250 CPU seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 5.250 5.250 clipb.py:10(bench_clip)
1 2.588 2.588 2.588 2.588 clipb.py:19(clip2_bench)
1 2.148 2.148 2.148 2.148 clipb.py:16(clip1_bench)
1 0.512 0.512 0.512 0.512 clipb.py:6(generate_data_2d)
0 0.000 0.000 profile:0(profiler)

Regards
Stéfan

On Mon, Dec 18, 2006 at 04:17:08PM +0900, David Cournapeau wrote:
> Hi,
>
> When trying to speed up some matplotlib routines with the matplotlib
> dev team, I noticed that numpy.clip is pretty slow: clip(data, m, M) is
> slower than a direct numpy implementation (that is, data[data<m] = m;
> data[data>M] = M; return data.copy()). My understanding is that the code
> does the same thing, right ?
>
> Below, a small script which shows the difference (twice slower for a
> 8000x256 array on my workstation):
> [...]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clipb.py
Type: text/x-python
Size: 1029 bytes
Desc: not available

From david at ar.media.kyoto-u.ac.jp Mon Dec 18 03:45:09 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Mon, 18 Dec 2006 17:45:09 +0900
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <20061218082756.GV2180@mentat.za.net>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218082756.GV2180@mentat.za.net>
Message-ID: <45865515.1000208@ar.media.kyoto-u.ac.jp>

Stefan van der Walt wrote:
> Hi David
>
> The benchmark below isn't quite correct. In clip2_bench the data is
> effectively only clipped once. I attach a slightly modified version,
> for which the benchmark results look like this:

Yes, I of course mistyped the < and the copy.
But the function is still > moderately faster on my workstation: > > ncalls tottime percall cumtime percall filename:lineno(function) > 1 0.003 0.003 3.944 3.944 slowclip.py:10(bench_clip) > 1 0.011 0.011 2.001 2.001 slowclip.py:16(clip1_bench) > 10 1.990 0.199 1.990 0.199 > /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:372(clip) > 1 1.682 1.682 1.682 1.682 slowclip.py:19(clip2_bench) > 1 0.258 0.258 0.258 0.258 > slowclip.py:6(generate_data_2d) > 0 0.000 0.000 profile:0(profiler) Did you try swapping the order of execution (i.e. clip1 second)? Cheers St?fan From stefan at sun.ac.za Mon Dec 18 04:35:57 2006 From: stefan at sun.ac.za (Stefan van der Walt) Date: Mon, 18 Dec 2006 11:35:57 +0200 Subject: [Numpy-discussion] Unexpected output using numpy.ndarray and __radd__ In-Reply-To: <1A0F0517C2D6894282F07FAF76FA3612020AD01F@CPH-EXCH-SG4.manbw.dk> References: <1A0F0517C2D6894282F07FAF76FA3612020AD01F@CPH-EXCH-SG4.manbw.dk> Message-ID: <20061218093557.GX2180@mentat.za.net> Hi Mark On Mon, Dec 18, 2006 at 08:30:20AM +0100, Mark Hoffmann wrote: > The following issue has puzzled me for a while. I want to add a numpy.ndarray > and an instance of my own class. I define this operation by implementing the > methods __add__ and __radd__. My programme (including output) looks like: > > #!/usr/local/bin/python > > import numpy > > class Cyclehist: > def __init__(self,vals): > self.valuearray = numpy.array(vals) > > def __str__(self): > return 'Cyclehist object: valuearray = '+str(self.valuearray) > > def __add__(self,other): > print "__add__ : ",self,other > return self.valuearray + other > > def __radd__(self,other): > print "__radd__ : ",self,other > return other + self.valuearray > > c = Cyclehist([1.0,-21.2,3.2]) > a = numpy.array([-1.0,2.2,-2.2]) > print c + a > print a + c In the first instance, c.__add__(a) is called, which works fine. In the second, a.__add__(c) is executed, which is your problem, since you rather want c.__radd__(a) to be executed. A documentation snippets: """For instance, to evaluate the expression x-y, where y is an instance of a class that has an __rsub__() method, y.__rsub__(x) is called if x.__sub__(y) returns NotImplemented. Note: If the right operand's type is a subclass of the left operand's type and that subclass provides the reflected method for the operation, this method will be called before the left operand's non-reflected method. This behavior allows subclasses to override their ancestors' operations.""" Since a.__add__ does not return NotImplemented, c.__radd__ is not called where you expect it to be. I am not sure why broadcasting takes place here, maybe someone else on the list can elaborate. To solve your problem, you may want to look into subclassing ndarrays, as described at http://www.scipy.org/Subclasses. Cheers St?fan From david at ar.media.kyoto-u.ac.jp Mon Dec 18 05:09:52 2006 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 18 Dec 2006 19:09:52 +0900 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: <20061218091710.GW2180@mentat.za.net> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218082756.GV2180@mentat.za.net> <45865515.1000208@ar.media.kyoto-u.ac.jp> <20061218091710.GW2180@mentat.za.net> Message-ID: <458668F0.90604@ar.media.kyoto-u.ac.jp> Stefan van der Walt wrote: > On Mon, Dec 18, 2006 at 05:45:09PM +0900, David Cournapeau wrote: >> Yes, I of course mistyped the < and the copy. 
But the function is still >> moderately faster on my workstation: >> >> ncalls tottime percall cumtime percall filename:lineno(function) >> 1 0.003 0.003 3.944 3.944 slowclip.py:10(bench_clip) >> 1 0.011 0.011 2.001 2.001 slowclip.py:16(clip1_bench) >> 10 1.990 0.199 1.990 0.199 >> /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:372(clip) >> 1 1.682 1.682 1.682 1.682 slowclip.py:19(clip2_bench) >> 1 0.258 0.258 0.258 0.258 >> slowclip.py:6(generate_data_2d) >> 0 0.000 0.000 profile:0(profiler) > > Did you try swapping the order of execution (i.e. clip1 second)? Yes, I tried different orders, etc... and it showed the same pattern. The thing is, this kind of thing is highly CPU dependent in my experience; I don't have the time right now to update numpy.scipy on my laptop, but it happens that profiles results are quite different between my workstation (P4 xeon) and my laptop (pentium m). anyway, contrary to what I thought first, the real problem is the copy, so this is where I should investigate in matplotlib case, David From hermann.ruckerbauer at qimonda.com Mon Dec 18 05:44:35 2006 From: hermann.ruckerbauer at qimonda.com (hermann.ruckerbauer at qimonda.com) Date: Mon, 18 Dec 2006 11:44:35 +0100 Subject: [Numpy-discussion] Python 2.5 on win and Gnuplot and alter_code1 Message-ID: <00314E7F9E92594E84D353E2F112FE4F029C3831@mucse306.eu.infineon.com> Hello *, Sorry for this maybe stupid question, I might still miss some understanding of pyton or other stuff ... I have problems getting Gnuplot running under python 2.5 on Windows. My understanding is, that Gnuplot requires Numeric. Unfortionally there is only a numeric for python 2.4 available on the net. This binary does not install under python 2.5. Now I checked numpy, and it seems to able to act as replacement for the numeric package. I'm not really sure if this is the old.numeric module (just replace each numeric with numpy.old.numeric) or if this requires some more changes on the source files (e. g. exchang numeric with numpy and change some of the function calls). So I tried to convert my Gnuplot .py files with the alter_code1. But somehow I'm too stupid to get it running . First absolutel nothing happende when doing e.g. import numpy.oldnumeric.alter_code1 as noa noa.converttree() No error message, but also no file was touched, altough the command took a while to be executed. As I had no idea how to give the top level I tried a little bit around and copied the alter_code1.py into the gnuplot directory. And this file was touched ... It got the comment: At the beginning of alter_code1.py, but non of the other files have been touched. After some copy and paste trials i added on all .py files in the Gnuplot folder the comment from the beginning of the alter_code1.py. And now magically all files in this folders have been touched by running the converttree. BUT: only the comment "## Automatically adapted for numpy.oldnumeric Dec 15, 2006 by " Was added at the beginning of each file, none of the commands have been touched ... Does anybody have some hints for me to get gnuplot running under python 2.5 for windows ? 
Thanks and regards Hermann From faltet at carabos.com Mon Dec 18 07:15:35 2006 From: faltet at carabos.com (Francesc Altet) Date: Mon, 18 Dec 2006 13:15:35 +0100 Subject: [Numpy-discussion] Python 2.5 on win and Gnuplot and alter_code1 In-Reply-To: <00314E7F9E92594E84D353E2F112FE4F029C3831@mucse306.eu.infineon.com> References: <00314E7F9E92594E84D353E2F112FE4F029C3831@mucse306.eu.infineon.com> Message-ID: <1166444135.2680.7.camel@localhost.localdomain> El dl 18 de 12 del 2006 a les 11:44 +0100, en/na hermann.ruckerbauer at qimonda.com va escriure: > Hello *, > > Sorry for this maybe stupid question, I might still miss some > understanding of pyton or other stuff ... > > I have problems getting Gnuplot running under python 2.5 on Windows. Do you have a special need to use Gnuplot? As it requires Numeric and because it has issues with python2.5 and is not maintained anymore, I don't recommend you using it any longer. Please, have a look at matplotlib (http://matplotlib.sourceforge.net/) and try to compile it against numpy (http://numpy.scipy.org/). Note that both of these are currently maintained and work great with python 2.5. Hope that helps, -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth From hermann.ruckerbauer at qimonda.com Mon Dec 18 07:18:45 2006 From: hermann.ruckerbauer at qimonda.com (hermann.ruckerbauer at qimonda.com) Date: Mon, 18 Dec 2006 13:18:45 +0100 Subject: [Numpy-discussion] Python 2.5 on win and Gnuplot and alter_code1 In-Reply-To: <1166444135.2680.7.camel@localhost.localdomain> Message-ID: <00314E7F9E92594E84D353E2F112FE4F029C394A@mucse306.eu.infineon.com> Thanks for the feedback, The reason why I'm trying to use Gnuplot is, that I got quite some old scripts with Gnuplot that i would like to reuse. Everything I create now is matplotlib based ... This is working fine in my config. Regards Hermann Hermann Ruckerbauer Module Design Engineer QAG PD ICD EDS Qimonda AG Phone: +49 89 234 60088 2021 Fax: +49 89 234 60088 44 2021 e-mail: Hermann.Ruckerbauer at qimonda.com Postal Address: Qimonda AG, P.O. Box 800949, D-81739 Munich Visitor Address: Am Campeon 1 - 12, D-85579 Neubiberg, Room 10.02.358 *** visit our homepage at http://www.qimonda.com *** -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Francesc Altet Sent: Monday, December 18, 2006 1:16 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Python 2.5 on win and Gnuplot and alter_code1 El dl 18 de 12 del 2006 a les 11:44 +0100, en/na hermann.ruckerbauer at qimonda.com va escriure: > Hello *, > > Sorry for this maybe stupid question, I might still miss some > understanding of pyton or other stuff ... > > I have problems getting Gnuplot running under python 2.5 on Windows. Do you have a special need to use Gnuplot? As it requires Numeric and because it has issues with python2.5 and is not maintained anymore, I don't recommend you using it any longer. Please, have a look at matplotlib (http://matplotlib.sourceforge.net/) and try to compile it against numpy (http://numpy.scipy.org/). Note that both of these are currently maintained and work great with python 2.5. Hope that helps, -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. 
-- Donald Knuth _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From Mark.Hoffmann at dk.manbw.com Mon Dec 18 07:29:57 2006 From: Mark.Hoffmann at dk.manbw.com (Mark Hoffmann) Date: Mon, 18 Dec 2006 13:29:57 +0100 Subject: [Numpy-discussion] Unexpected output using numpy.ndarray and__radd__ Message-ID: <1A0F0517C2D6894282F07FAF76FA3612020AD023@CPH-EXCH-SG4.manbw.dk> I appreciate the answer and the solution suggestion. I see that it is possible to make a work around by subclassing from ndarray. Still, in the "print a+c" statement, I don't understand why a.__add__(c) doesn't return NotImplemented (because ndarray shouldn't recognize the Cyclehist class) and directly call c.__radd__(a) implemented in my Cyclehist class. I tried the exactly same programme using Python 2.4.1 and Scipy 0.3.2 (based on numeric/numarray) and the result of the "print a+c" didn't get nested as I expect. Regards, Mark -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Stefan van der Walt Sent: 18. december 2006 10:36 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Unexpected output using numpy.ndarray and__radd__ Hi Mark On Mon, Dec 18, 2006 at 08:30:20AM +0100, Mark Hoffmann wrote: > The following issue has puzzled me for a while. I want to add a > numpy.ndarray and an instance of my own class. I define this operation > by implementing the methods __add__ and __radd__. My programme (including output) looks like: > > #!/usr/local/bin/python > > import numpy > > class Cyclehist: > def __init__(self,vals): > self.valuearray = numpy.array(vals) > > def __str__(self): > return 'Cyclehist object: valuearray = '+str(self.valuearray) > > def __add__(self,other): > print "__add__ : ",self,other > return self.valuearray + other > > def __radd__(self,other): > print "__radd__ : ",self,other > return other + self.valuearray > > c = Cyclehist([1.0,-21.2,3.2]) > a = numpy.array([-1.0,2.2,-2.2]) > print c + a > print a + c In the first instance, c.__add__(a) is called, which works fine. In the second, a.__add__(c) is executed, which is your problem, since you rather want c.__radd__(a) to be executed. A documentation snippets: """For instance, to evaluate the expression x-y, where y is an instance of a class that has an __rsub__() method, y.__rsub__(x) is called if x.__sub__(y) returns NotImplemented. Note: If the right operand's type is a subclass of the left operand's type and that subclass provides the reflected method for the operation, this method will be called before the left operand's non-reflected method. This behavior allows subclasses to override their ancestors' operations.""" Since a.__add__ does not return NotImplemented, c.__radd__ is not called where you expect it to be. I am not sure why broadcasting takes place here, maybe someone else on the list can elaborate. To solve your problem, you may want to look into subclassing ndarrays, as described at http://www.scipy.org/Subclasses. 
Cheers St?fan _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From tim.hochberg at ieee.org Mon Dec 18 08:05:34 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Mon, 18 Dec 2006 06:05:34 -0700 Subject: [Numpy-discussion] Unexpected output using numpy.ndarray and__radd__ In-Reply-To: <1A0F0517C2D6894282F07FAF76FA3612020AD023@CPH-EXCH-SG4.manbw.dk> References: <1A0F0517C2D6894282F07FAF76FA3612020AD023@CPH-EXCH-SG4.manbw.dk> Message-ID: <4586921E.1030802@ieee.org> Mark Hoffmann wrote: > I appreciate the answer and the solution suggestion. I see that it is possible to make a work around by subclassing from ndarray. Still, in the "print a+c" statement, I don't understand why a.__add__(c) doesn't return NotImplemented (because ndarray shouldn't recognize the Cyclehist class) and directly call c.__radd__(a) implemented in my Cyclehist class. I tried the exactly same programme using Python 2.4.1 and Scipy 0.3.2 (based on numeric/numarray) and the result of the "print a+c" didn't get nested as I expect. > > Regards, > Mark > I'm not sure what this is doing -- it looks kind of bizzare -- however, you can fix this case without resorting to subclassing to ndarray. Just toss an '__array_priority__ = 10' up at the top of the class definition and it will use your __methods__ in preference to the ndarrays. I don't have time to look into this further right now unfortunately. -tim > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Stefan van der Walt > Sent: 18. december 2006 10:36 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Unexpected output using numpy.ndarray and__radd__ > > Hi Mark > > On Mon, Dec 18, 2006 at 08:30:20AM +0100, Mark Hoffmann wrote: > >> The following issue has puzzled me for a while. I want to add a >> numpy.ndarray and an instance of my own class. I define this operation >> by implementing the methods __add__ and __radd__. My programme (including output) looks like: >> >> #!/usr/local/bin/python >> >> import numpy >> >> class Cyclehist: >> def __init__(self,vals): >> self.valuearray = numpy.array(vals) >> >> def __str__(self): >> return 'Cyclehist object: valuearray = '+str(self.valuearray) >> >> def __add__(self,other): >> print "__add__ : ",self,other >> return self.valuearray + other >> >> def __radd__(self,other): >> print "__radd__ : ",self,other >> return other + self.valuearray >> >> c = Cyclehist([1.0,-21.2,3.2]) >> a = numpy.array([-1.0,2.2,-2.2]) >> print c + a >> print a + c >> > > In the first instance, c.__add__(a) is called, which works fine. In the second, a.__add__(c) is executed, which is your problem, since you rather want c.__radd__(a) to be executed. A documentation snippets: > > """For instance, to evaluate the expression x-y, where y is an instance of a class that has an __rsub__() method, y.__rsub__(x) is called if x.__sub__(y) returns NotImplemented. > > Note: If the right operand's type is a subclass of the left operand's type and that subclass provides the reflected method for the operation, this method will be called before the left operand's non-reflected method. This behavior allows subclasses to override their ancestors' operations.""" > > Since a.__add__ does not return NotImplemented, c.__radd__ is not called where you expect it to be. 
I am not sure why broadcasting takes place here, maybe someone else on the list can elaborate. > > To solve your problem, you may want to look into subclassing ndarrays, as described at http://www.scipy.org/Subclasses. > > Cheers > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > From Mark.Hoffmann at dk.manbw.com Mon Dec 18 08:20:57 2006 From: Mark.Hoffmann at dk.manbw.com (Mark Hoffmann) Date: Mon, 18 Dec 2006 14:20:57 +0100 Subject: [Numpy-discussion] Unexpected output usingnumpy.ndarray and__radd__ Message-ID: <1A0F0517C2D6894282F07FAF76FA3612020AD024@CPH-EXCH-SG4.manbw.dk> Excellent, thank you - it solved the problem! /Mark -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Tim Hochberg Sent: 18. december 2006 14:06 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Unexpected output usingnumpy.ndarray and__radd__ Mark Hoffmann wrote: > I appreciate the answer and the solution suggestion. I see that it is possible to make a work around by subclassing from ndarray. Still, in the "print a+c" statement, I don't understand why a.__add__(c) doesn't return NotImplemented (because ndarray shouldn't recognize the Cyclehist class) and directly call c.__radd__(a) implemented in my Cyclehist class. I tried the exactly same programme using Python 2.4.1 and Scipy 0.3.2 (based on numeric/numarray) and the result of the "print a+c" didn't get nested as I expect. > > Regards, > Mark > I'm not sure what this is doing -- it looks kind of bizzare -- however, you can fix this case without resorting to subclassing to ndarray. Just toss an '__array_priority__ = 10' up at the top of the class definition and it will use your __methods__ in preference to the ndarrays. I don't have time to look into this further right now unfortunately. -tim > -----Original Message----- > From: numpy-discussion-bounces at scipy.org > [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Stefan van > der Walt > Sent: 18. december 2006 10:36 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Unexpected output using numpy.ndarray > and__radd__ > > Hi Mark > > On Mon, Dec 18, 2006 at 08:30:20AM +0100, Mark Hoffmann wrote: > >> The following issue has puzzled me for a while. I want to add a >> numpy.ndarray and an instance of my own class. I define this >> operation by implementing the methods __add__ and __radd__. My programme (including output) looks like: >> >> #!/usr/local/bin/python >> >> import numpy >> >> class Cyclehist: >> def __init__(self,vals): >> self.valuearray = numpy.array(vals) >> >> def __str__(self): >> return 'Cyclehist object: valuearray = '+str(self.valuearray) >> >> def __add__(self,other): >> print "__add__ : ",self,other >> return self.valuearray + other >> >> def __radd__(self,other): >> print "__radd__ : ",self,other >> return other + self.valuearray >> >> c = Cyclehist([1.0,-21.2,3.2]) >> a = numpy.array([-1.0,2.2,-2.2]) >> print c + a >> print a + c >> > > In the first instance, c.__add__(a) is called, which works fine. In the second, a.__add__(c) is executed, which is your problem, since you rather want c.__radd__(a) to be executed. 
A documentation snippets: > > """For instance, to evaluate the expression x-y, where y is an instance of a class that has an __rsub__() method, y.__rsub__(x) is called if x.__sub__(y) returns NotImplemented. > > Note: If the right operand's type is a subclass of the left operand's type and that subclass provides the reflected method for the operation, this method will be called before the left operand's non-reflected method. This behavior allows subclasses to override their ancestors' operations.""" > > Since a.__add__ does not return NotImplemented, c.__radd__ is not called where you expect it to be. I am not sure why broadcasting takes place here, maybe someone else on the list can elaborate. > > To solve your problem, you may want to look into subclassing ndarrays, as described at http://www.scipy.org/Subclasses. > > Cheers > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From aisaac at american.edu Mon Dec 18 10:03:03 2006 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 18 Dec 2006 10:03:03 -0500 Subject: [Numpy-discussion] =?utf-8?q?Python_2=2E5_on_win_and_Gnuplot_and?= =?utf-8?q?=09alter_code1?= In-Reply-To: <1166444135.2680.7.camel@localhost.localdomain> References: <00314E7F9E92594E84D353E2F112FE4F029C3831@mucse306.eu.infineon.com><1166444135.2680.7.camel@localhost.localdomain> Message-ID: On Mon, 18 Dec 2006, Francesc Altet apparently wrote to Hermann: > Do you have a special need to use Gnuplot? As it requires > Numeric and because it has issues with python2.5 and is > not maintained anymore, I don't recommend you using it any > longer. Have you tried SVN? My recollection is that Gnuplot.py now works with numpy. Ask on the mailing list: there was definitely talk about doing this. Cheers, Alan Isaac From pgmdevlist at gmail.com Mon Dec 18 13:12:50 2006 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 18 Dec 2006 13:12:50 -0500 Subject: [Numpy-discussion] Subclasses - use of __finalize__ In-Reply-To: References: Message-ID: <200612181312.50883.pgmdevlist@gmail.com> On Saturday 16 December 2006 19:55, Colin J. Williams wrote: Colin, First of all, a disclaimer: I'm a (bad) hydrologist, not a computer scientist. I learned python/numpy by playing around, and really got into subclassing since 3-4 months ago. My explanations might not be completely accurate, I'll ask more experienced users to correct me if I'm wrong. `__new__` is the class constructor method. A call to `__new__(cls,...)` creates a new instance of the class `cls`, but doesn't initialize the instance, that's the role of the `__init__` method. According to the python documentation, If __new__() returns an instance of cls, then the new instance's __init__() method will be invoked like "__init__(self[, ...])", where self is the new instance and the remaining arguments are the same as were passed to __new__(). If __new__() does not return an instance of cls, then the new instance's __init__() method will not be invoked. 
__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation. It turns out that ndarrays behaves as immutable types, therefore an `__init__` method is never called. How can we initialize the instance, then ? By calling `__array_finalize__`. `__array_finalize__` is called automatically once an instance is created with `__new__`. Moreover, it is called each time a new array is returned by a method, even if the method doesn't specifically call `__new__`. For example, the `__add__`, `__iadd__`, `reshape` return new arrays, so `__array_finalize` is called. Note that these methods do not create a new array from scratch, so there is no call to `__new__`. As another example, we can also modify the shape of the array with `resize`. However, this method works in place, so a new array is NOT created. About the `obj` argument in `__array_finalize__`: The first time a subarray is created, `__array_finalize__` is called with the argument `obj` as a regular ndarray. Afterwards, when a new array is returned without ccall to `__new__`, the `obj` argument is the initial subarray (the one calling the method). The easier is to try and see what happens. Here's a small script that defines a `InfoArray` class: just a ndarray with a tag attached. That's basically the class of the wiki, with messages printed in `__new__` and `__array_finalize__`. I join some doctest to illustrate some of the concepts, I hope it will be explanatory enough. Please let me know whether it helps. If it does, I'll update the wiki page ############################################## """ Let us define a new InfoArray object >>> x = InfoArray(N.arange(10), info={'name':'x'}) __new__ received __new__ sends as __array_finalize__ received __array_finalize__ defined Let's get the first element: >>> x[0] 0 We expect a scalar, we get a scalar, everything's fine. If now we want all the elements, we can use `x[:]`, which calls `__getslice__` and returns a new array. Therefore, we expect `__array_finalize__` to get called: >>> x[:] __array_finalize__ received __array_finalize__ defined InfoArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) Let's add 1 to the array: this operation calls the `__add__` method, which returns a new array from `x` >>> x+1 __array_finalize__ received __array_finalize__ defined InfoArray([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) Let us change the shape of the array from *(10,)* to *(2,5)* with the `reshape` method. The method returns a new array, so we expect a call to `array_finalize`: >>> y = x.reshape((2,5)) __array_finalize__ received __array_finalize__ defined If now we print y, we call the __repr__ method, which in turns defines as many arrays as rows: we expect 2 calls to `__array_finalize__`: >>> print y __array_finalize__ received __array_finalize__ defined __array_finalize__ received __array_finalize__ defined [[0 1 2 3 4] [5 6 7 8 9]] Let's change the shape of `y` back to *(10,)*, but using the `resize` method this time. `resize` works in place, so a new array isn't be created, and `array_finalize` is not called. >>> y.resize((10,)) >>> y.shape (10,) OK, and what about `transpose` ? Well, it returns a new array (1 call), plus as we print it, we have *rows* calls to `array_finalize`, a total of *rows+1* calls >>> y.resize((5,2)) >>> print y.T __array_finalize__ received __array_finalize__ defined __array_finalize__ received __array_finalize__ defined __array_finalize__ received __array_finalize__ defined [[0 1 2 3 4] [5 6 7 8 9]] Now let's create a new array from scratch. 
`__new__` is called, but as the argument is already an InfoArray, the *__new__ sends...* line is bypassed. Moreover, if we don't precise the type, we call `data.astype` which in turn calls `__array_finalize__`. Then, `__array_finalize__` is called a second time, this time to initialize the new object. >>> z = InfoArray(x) __new__ received __new__ saw another dtype. __array_finalize__ received __array_finalize__ defined __array_finalize__ received __array_finalize__ defined Note that if we precise the dtype, we don't have to call `data.astype`, and `__array_finalize`` gets called once: >>> z = InfoArray(x, dtype=x.dtype) __new__ received __new__ saw the same dtype. __array_finalize__ received __array_finalize__ defined """ import numpy as N class InfoArray(N.ndarray): def __new__(subtype, data, info=None, dtype=None, copy=False): # When data is an InfoArray print "__new__ received %s" % type(data) if isinstance(data, InfoArray): if not copy and dtype==data.dtype: print "__new__ saw the same dtype." return data.view(subtype) else: print "__new__ saw another dtype." return data.astype(dtype).view(subtype) subtype._info = info subtype.info = subtype._info print "__new__ sends %s as %s" % (type(N.asarray(data)), subtype) return N.array(data).view(subtype) def __array_finalize__(self,obj): print "__array_finalize__ received %s" % type(obj) if hasattr(obj, "info"): # The object already has an info tag: just use it self.info = obj.info else: # The object has no info tag: use the default self.info = self._info print "__array_finalize__ defined %s" % type(self) def _test(): import doctest doctest.testmod(verbose=True) if __name__ == "__main__": _test() From efiring at hawaii.edu Mon Dec 18 13:53:40 2006 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 18 Dec 2006 08:53:40 -1000 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: <458668F0.90604@ar.media.kyoto-u.ac.jp> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218082756.GV2180@mentat.za.net> <45865515.1000208@ar.media.kyoto-u.ac.jp> <20061218091710.GW2180@mentat.za.net> <458668F0.90604@ar.media.kyoto-u.ac.jp> Message-ID: <4586E3B4.2040609@hawaii.edu> David, I think my earlier post got lost in the exchange between you and Stefan, so I will reiterate the central point: numpy.clip *is* slow, in that an implementation using putmask is substantially faster: def fastclip(a, vmin, vmax): a = a.copy() putmask(a, a<=vmin, vmin) putmask(a, a>=vmax, vmax) return a Using the equivalent of this in a modification of your benchmark, the time using the native clip on *or* your alternative on my machine was about 2.3 s, versus 1.5 s for the putmask-based equivalent. It seems that putmask is quite a bit faster than boolean indexing. Obviously, the function above could be implemented as a method, and a copy kwarg could be used to make the copy optional--often one does not need a copy. It is also clear that it should be possible to make a much faster native clip function that does everything in one pass with no intermediate arrays at all. Whether this is something numpy devels would want to do, and how much effort it would take, are entirely different questions. I looked at the present code in clip (and part of the way through the chain of functions it invokes) and was quite baffled. Eric David Cournapeau wrote: > Stefan van der Walt wrote: >> On Mon, Dec 18, 2006 at 05:45:09PM +0900, David Cournapeau wrote: >>> Yes, I of course mistyped the < and the copy. 
But the function is still >>> moderately faster on my workstation: >>> >>> ncalls tottime percall cumtime percall filename:lineno(function) >>> 1 0.003 0.003 3.944 3.944 slowclip.py:10(bench_clip) >>> 1 0.011 0.011 2.001 2.001 slowclip.py:16(clip1_bench) >>> 10 1.990 0.199 1.990 0.199 >>> /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:372(clip) >>> 1 1.682 1.682 1.682 1.682 slowclip.py:19(clip2_bench) >>> 1 0.258 0.258 0.258 0.258 >>> slowclip.py:6(generate_data_2d) >>> 0 0.000 0.000 profile:0(profiler) >> Did you try swapping the order of execution (i.e. clip1 second)? > Yes, I tried different orders, etc... and it showed the same pattern. > The thing is, this kind of thing is highly CPU dependent in my > experience; I don't have the time right now to update numpy.scipy on my > laptop, but it happens that profiles results are quite different between > my workstation (P4 xeon) and my laptop (pentium m). > > anyway, contrary to what I thought first, the real problem is the copy, > so this is where I should investigate in matplotlib case, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From zhangyunfeng at gmail.com Mon Dec 18 20:45:04 2006 From: zhangyunfeng at gmail.com (zhang yunfeng) Date: Tue, 19 Dec 2006 09:45:04 +0800 Subject: [Numpy-discussion] sum of two arrays with different shape? In-Reply-To: <4585FF62.10701@gmail.com> References: <4ff46d8f0612171826q262231f0kc3d5f2e8070046f9@mail.gmail.com> <4585FF62.10701@gmail.com> Message-ID: <4ff46d8f0612181745r7cc11f75ne471e7e8862191ed@mail.gmail.com> 2006/12/18, Robert Kern : > > zhang yunfeng wrote: > > Hi, I'm newbie to Numpy. > > > > When reading tutorials at > > http://www.scipy.org/Tentative_NumPy_Tutorial > > , I found a snippet about > > addition of two arrays with different shape, Does it make sense? If > > array shapes are not same, why it doesn't throw out an error? > > When two arrays of different shapes are operated against each other, numpy > tries > to "broadcast" them to a compatible shape according to certain rules. This > is a > fairly powerful concept, and it provides quite a lot of convenience. The > following wiki page has an explanation of the broadcasting rules: > > http://www.scipy.org/EricsBroadcastingDoc > > Yes, It seems powerful. But If one happened to add two incompatible array by mistake, Does the result make sense? May be the broadcast feature should be limited in a certain range not to mess normal operation. -- http://my.opera.com/zhangyunfeng -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Dec 18 21:03:00 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Dec 2006 20:03:00 -0600 Subject: [Numpy-discussion] sum of two arrays with different shape? In-Reply-To: <4ff46d8f0612181745r7cc11f75ne471e7e8862191ed@mail.gmail.com> References: <4ff46d8f0612171826q262231f0kc3d5f2e8070046f9@mail.gmail.com> <4585FF62.10701@gmail.com> <4ff46d8f0612181745r7cc11f75ne471e7e8862191ed@mail.gmail.com> Message-ID: <45874854.7090702@gmail.com> zhang yunfeng wrote: > Yes, It seems powerful. But If > one happened to add two incompatible array by > mistake, Does the result make sense? It may. The array object can't read your mind and know that you didn't intend it to do what you (accidentally) told it to do. 
> May be the broadcast feature should be limited in a certain range not
> to mess normal operation.

Broadcasting is really a fundamental feature of numpy. It *is* normal
operation as much as anything else is.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

From david at ar.media.kyoto-u.ac.jp Tue Dec 19 00:10:29 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 19 Dec 2006 14:10:29 +0900
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4586E3B4.2040609@hawaii.edu>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218082756.GV2180@mentat.za.net> <45865515.1000208@ar.media.kyoto-u.ac.jp> <20061218091710.GW2180@mentat.za.net> <458668F0.90604@ar.media.kyoto-u.ac.jp> <4586E3B4.2040609@hawaii.edu>
Message-ID: <45877445.20508@ar.media.kyoto-u.ac.jp>

Eric Firing wrote:
> David,
>
> I think my earlier post got lost in the exchange between you and Stefan,
> so I will reiterate the central point: numpy.clip *is* slow, in that an
> implementation using putmask is substantially faster:
>
> def fastclip(a, vmin, vmax):
>     a = a.copy()
>     putmask(a, a<=vmin, vmin)
>     putmask(a, a>=vmax, vmax)
>     return a
>
> Using the equivalent of this in a modification of your benchmark, the
> time using the native clip on *or* your alternative on my machine was
> about 2.3 s, versus 1.5 s for the putmask-based equivalent. It seems
> that putmask is quite a bit faster than boolean indexing.
>
> Obviously, the function above could be implemented as a method, and a
> copy kwarg could be used to make the copy optional--often one does not
> need a copy.
>
> It is also clear that it should be possible to make a much faster native
> clip function that does everything in one pass with no intermediate
> arrays at all. Whether this is something numpy devels would want to do,
> and how much effort it would take, are entirely different questions. I
> looked at the present code in clip (and part of the way through the
> chain of functions it invokes) and was quite baffled.

Well, this is something I would be willing to try *if* this is the main
bottleneck of imshow/show. I am still unsure about the problem, because
if I change numpy.clip to my function, including a copy, I really get a
big difference myself:

val = ma.array(nx.clip(val.filled(vmax), vmin, vmax), mask=mask)

vs

def myclip(b, m, M):
    a = b.copy()
    a[a<m] = m
    a[a>M] = M
    return a
val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)

By trying the best result, I get 0.888 ms vs 0.784 for a show() call,
which is already a 10 % improvement, and I get almost a 15 % if I remove
the copy. I am updating numpy/scipy/mpl on my laptop to see if this is
specific to the CPU of my workstation (big cache, high frequency clock,
bi CPU with HT enabled). I would really like to see the imshow/show calls
go in the range of a few hundred ms; for interactive plotting, this
really changes a lot in my opinion.

cheers,

David

From gael.varoquaux at normalesup.org Tue Dec 19 02:13:38 2006
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 19 Dec 2006 08:13:38 +0100
Subject: [Numpy-discussion] slow numpy.clip ?
From gael.varoquaux at normalesup.org  Tue Dec 19 02:13:38 2006
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 19 Dec 2006 08:13:38 +0100
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <45877445.20508@ar.media.kyoto-u.ac.jp>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
Message-ID: <20061219071338.GA21179@clipper.ens.fr>

On Tue, Dec 19, 2006 at 02:10:29PM +0900, David Cournapeau wrote:
> I would really like to see the imshow/show calls go into the range of a
> few hundred ms; for interactive plotting, this really changes a lot in
> my opinion.

I think this is strongly dependent on some parameters. I did some
interactive plotting on both a pentium 2, linux, WxAgg (thus Gtk behind
Wx), and a pentium 4, windows, WxAgg (thus MFC behind Wx), and there was
a huge difference between the speeds. The speed difference was a few
orders of magnitude. I couldn't explain it, but it was a good surprise,
as the application was developed for the windows box.

Gaël

From david at ar.media.kyoto-u.ac.jp  Tue Dec 19 02:12:34 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 19 Dec 2006 16:12:34 +0900
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
Message-ID: <458790E2.8040607@ar.media.kyoto-u.ac.jp>

Hi,

    Following the discussion on clip and other functions which *may* be
slow in numpy, I would like to know if there is a way to easily profile
numpy, i.e. functions which are written in C.

    For example, I am not sure I understand why a function like
take(a, b), with a a double 256x4 array and b a 8000x256 int array,
takes almost 200 ms on a fairly fast CPU; in the source code, I can see
that numpy uses memmove, and I know memmove to be slower than memcpy.
Is there an easy way to check that this is coming from memmove (case in
which nothing much can be done to improve the situation, I guess), and
not from something else?

cheers,

David
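Before reaching for a C-level profiler, the suspect call can at least be
bracketed from Python; a sketch with timeit, using the array shapes
quoted above (the timings, of course, are machine-dependent):

    import timeit

    setup = """
    import numpy as np
    lut = np.random.randn(256, 4)                  # the double 256x4 array
    idx = np.random.randint(0, 256, (8000, 256))   # the 8000x256 int array
    """

    for stmt in ("np.take(lut, idx, axis=0)", "lut[idx]"):
        t = timeit.Timer(stmt, setup)
        print(stmt, min(t.repeat(repeat=3, number=10)))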
From efiring at hawaii.edu  Tue Dec 19 02:19:21 2006
From: efiring at hawaii.edu (Eric Firing)
Date: Mon, 18 Dec 2006 21:19:21 -1000
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <45877445.20508@ar.media.kyoto-u.ac.jp>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
Message-ID: <45879279.6030707@hawaii.edu>

David Cournapeau wrote:
> Eric Firing wrote:
>> David,
>>
>> I think my earlier post got lost in the exchange between you and
>> Stefan, so I will reiterate the central point: numpy.clip *is* slow,
>> in that an implementation using putmask is substantially faster:
>>
>> def fastclip(a, vmin, vmax):
>>     a = a.copy()
>>     putmask(a, a<=vmin, vmin)
>>     putmask(a, a>=vmax, vmax)
>>     return a
>>
>> Using the equivalent of this in a modification of your benchmark, the
>> time using the native clip *or* your alternative on my machine was
>> about 2.3 s, versus 1.5 s for the putmask-based equivalent. It seems
>> that putmask is quite a bit faster than boolean indexing.
>>
>> Obviously, the function above could be implemented as a method, and a
>> copy kwarg could be used to make the copy optional--often one does not
>> need a copy.
>>
>> It is also clear that it should be possible to make a much faster
>> native clip function that does everything in one pass with no
>> intermediate arrays at all. Whether this is something numpy devels
>> would want to do, and how much effort it would take, are entirely
>> different questions. I looked at the present code in clip (and part
>> of the way through the chain of functions it invokes) and was quite
>> baffled.
> Well, this is something I would be willing to try *if* this is the main
> bottleneck of imshow/show. I am still unsure about the problem, because
> if I change numpy.clip to my function, including a copy, I really get a
> big difference myself:
>
>     val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
>                    mask=mask)
>
> vs
>
>     def myclip(b, m, M):
>         a = b.copy()
>         a[a < m] = m
>         a[a > M] = M
>         return a
>     val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)
>
> By taking the best result, I get 0.888 ms vs 0.784 ms for a show()
> call, which is already a 10 % improvement, and I get almost 15 % if I
> remove the copy. I am updating numpy/scipy/mpl on my laptop to see if
> this is specific to the CPU of my workstation (big cache, high
> frequency clock, dual CPU with HT enabled).

Please try the putmask version without the copy on your machines; I
expect it will be quite a bit faster on both machines. The relative
speeds of the versions may differ widely depending on how many values
actually get changed, though.

Eric

From david at ar.media.kyoto-u.ac.jp  Tue Dec 19 03:17:01 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 19 Dec 2006 17:17:01 +0900
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <20061219071338.GA21179@clipper.ens.fr>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<20061219071338.GA21179@clipper.ens.fr>
Message-ID: <45879FFD.8070408@ar.media.kyoto-u.ac.jp>

Gael Varoquaux wrote:
> On Tue, Dec 19, 2006 at 02:10:29PM +0900, David Cournapeau wrote:
>> I would really like to see the imshow/show calls go into the range of
>> a few hundred ms; for interactive plotting, this really changes a lot
>> in my opinion.
>
> I think this is strongly dependent on some parameters. I did some
> interactive plotting on both a pentium 2, linux, WxAgg (thus Gtk behind
> Wx), and a pentium 4, windows, WxAgg (thus MFC behind Wx), and there
> was a huge difference between the speeds. The speed difference was a
> few orders of magnitude. I couldn't explain it, but it was a good
> surprise, as the application was developed for the windows box.

I started to investigate the problem because under matlab, plotting a
spectrogram is negligible compared to computing it, whereas in
matplotlib with the numpy array backend, plotting it takes as much time
as computing it, which didn't make sense to me. Most of the computing
time is spent in code which is independent of the backend, that is,
during the conversion from the rank 2 array to rgba (60 % of the time on
my fast workstation, 85 % of the time on my laptop with a pentium M @
1.2 Ghz), so I don't think the GUI backend makes any difference.

cheers,

David

From david at ar.media.kyoto-u.ac.jp  Tue Dec 19 03:56:06 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 19 Dec 2006 17:56:06 +0900
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <45879279.6030707@hawaii.edu>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
Message-ID: <4587A926.7070401@ar.media.kyoto-u.ac.jp>

Eric Firing wrote:
> David Cournapeau wrote:
>> Well, this is something I would be willing to try *if* this is the
>> main bottleneck of imshow/show. I am still unsure about the problem,
>> because if I change numpy.clip to my function, including a copy, I
>> really get a big difference myself:
>>
>>     val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
>>                    mask=mask)
>>
>> vs
>>
>>     def myclip(b, m, M):
>>         a = b.copy()
>>         a[a < m] = m
>>         a[a > M] = M
>>         return a
>>     val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)
>>
>> By taking the best result, I get 0.888 ms vs 0.784 ms for a show()
>> call, which is already a 10 % improvement, and I get almost 15 % if I
>> remove the copy. I am updating numpy/scipy/mpl on my laptop to see if
>> this is specific to the CPU of my workstation (big cache, high
>> frequency clock, dual CPU with HT enabled).
>
> Please try the putmask version without the copy on your machines; I
> expect it will be quite a bit faster on both machines. The relative
> speeds of the versions may differ widely depending on how many values
> actually get changed, though.

On my workstation (dual xeon; I ran each corresponding script 5 times
and took the best result):
    - nx.clip takes ~ 170 ms (of 920 ms for the whole show call)
    - your fast clip, with copy: ~ 50 ms (of ~ 820 ms)
    - mine, with copy: ~ 50 ms (of ~ 830 ms)
    - yours without copy: ~ 30 ms (of 830 ms)
    - mine without copy: ~ 40 ms (of 830 ms)

Same on my laptop (pentium M @ 1.2 Ghz):
    - nx.clip takes ~ 230 ms (of 1460 ms)
    - mine with copy: ~ 70 ms (of 1200 ms)
    - mine without copy: ~ 55 ms (of 1300 ms)
    - yours with copy: ~ 80 ms (of 1300 ms)
    - yours without copy: ~ 67 ms (of 1300 ms)

Basically, at least from those figures, both versions are pretty
similar, and not worth improving much anyway for matplotlib. There is
something funny with the numpy version, though.

cheers,

David

From robert.kern at gmail.com  Tue Dec 19 04:37:51 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 19 Dec 2006 03:37:51 -0600
Subject: [Numpy-discussion] slow numpy.clip ?
""" selector = ufunc.less(m, m_min)+2*ufunc.greater(m, m_max) return choose(selector, (m, m_min, m_max)) Creating that integer selector array is probably the most expensive part. Copying the array, then using putmask() or similar is certainly a better approach, and I can see no drawbacks to it. If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Tue Dec 19 04:41:16 2006 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 19 Dec 2006 18:41:16 +0900 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: <4587B2EF.7010803@gmail.com> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218082756.GV2180@mentat.za.net> <45865515.1000208@ar.media.kyoto-u.ac.jp> <20061218091710.GW2180@mentat.za.net> <458668F0.90604@ar.media.kyoto-u.ac.jp> <4586E3B4.2040609@hawaii.edu> <45877445.20508@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> Message-ID: <4587B3BC.5040802@ar.media.kyoto-u.ac.jp> Robert Kern wrote: > > Looking at the code, it's certainly not surprising that the current > implementation of clip() is slow. It is a direct numpy C API translation of the > following (taken from numarray, but it is the same in Numeric): > > > def clip(m, m_min, m_max): > """clip() returns a new array with every entry in m that is less than m_min > replaced by m_min, and every entry greater than m_max replaced by m_max. > """ > selector = ufunc.less(m, m_min)+2*ufunc.greater(m, m_max) > return choose(selector, (m, m_min, m_max)) > > > Creating that integer selector array is probably the most expensive part. > Copying the array, then using putmask() or similar is certainly a better > approach, and I can see no drawbacks to it. > > If anyone is up to translating their faster clip() into C, I'm more than happy > to check it in. I might also entertain adding a copy=True keyword argument, but > I'm not entirely certain we should be expanding the API during the 1.0.x series. > I would be happy to code the function; for new code to be added to numpy, is there another branch than the current one ? What is the approach for a 1.1.x version of numpy ? For now, putting the function with a copy (the current behaviour ?) would be ok, right ? The copy part is a much smaller problem than the rest of the function anyway, at least from my modest benchmarking, David From robert.kern at gmail.com Tue Dec 19 04:49:05 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Dec 2006 03:49:05 -0600 Subject: [Numpy-discussion] slow numpy.clip ? 
In-Reply-To: <4587B3BC.5040802@ar.media.kyoto-u.ac.jp>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4587B3BC.5040802@ar.media.kyoto-u.ac.jp>
Message-ID: <4587B591.3060308@gmail.com>

David Cournapeau wrote:
> I would be happy to code the function; for new code to be added to
> numpy, is there another branch than the current one? What is the
> approach for a 1.1.x version of numpy?

I don't think we've decided on one, yet.

> For now, putting the function with a copy (the current behaviour?)
> would be ok, right? The copy part is a much smaller problem than the
> rest of the function anyway, at least from my modest benchmarking,

I'd prefer that you simply modify PyArray_Clip to use a better approach
than to make an entirely new function. In that case, it certainly must
make a copy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From faltet at carabos.com  Tue Dec 19 09:03:30 2006
From: faltet at carabos.com (Francesc Altet)
Date: Tue, 19 Dec 2006 15:03:30 +0100
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
Message-ID: <200612191503.32995.faltet@carabos.com>

On Tuesday 19 December 2006 08:12, David Cournapeau wrote:
> Hi,
>
>     Following the discussion on clip and other functions which *may* be
> slow in numpy, I would like to know if there is a way to easily profile
> numpy, i.e. functions which are written in C.
>     For example, I am not sure I understand why a function like
> take(a, b), with a a double 256x4 array and b a 8000x256 int array,
> takes almost 200 ms on a fairly fast CPU; in the source code, I can see
> that numpy uses memmove, and I know memmove to be slower than memcpy.
> Is there an easy way to check that this is coming from memmove (case in
> which nothing much can be done to improve the situation, I guess), and
> not from something else?

For doing profiles on C extensions, you can use cProfile, which has
been included in Python 2.5. See an example of your benchmark using
cProfile below. I've run it against clip1_bench and clip2_bench. Here
are my results (using a Pentium4 Mobile @ 2 GHz).

For clip1 (i.e. clip from numpy):

17 function calls in 5.131 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    4.638    0.464    4.638    0.464 {method 'clip' of 'numpy.ndarray' objects}
        1    0.453    0.453    0.453    0.453 {method 'randn' of 'mtrand.RandomState' objects}
        1    0.020    0.020    4.658    4.658 clipb2.py:16(clip1_bench)
        1    0.002    0.002    5.113    5.113 clipb2.py:10(bench_clip)
        1    0.002    0.002    5.115    5.115 <string>:1(<module>)
        1    0.000    0.000    0.453    0.453 clipb2.py:6(generate_data_2d)
        1    0.000    0.000    0.000    0.000 {range}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

[you can see the C extensions between curly brackets]
For clip2 (i.e. your hand-made clip equivalent):

17 function calls in 3.371 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.485    2.485    2.911    2.911 clipb2.py:19(clip2_bench)
        1    0.456    0.456    0.456    0.456 {method 'randn' of 'mtrand.RandomState' objects}
       10    0.426    0.043    0.426    0.043 {method 'copy' of 'numpy.ndarray' objects}
        1    0.003    0.003    3.369    3.369 clipb2.py:10(bench_clip)
        1    0.002    0.002    3.371    3.371 <string>:1(<module>)
        1    0.000    0.000    0.456    0.456 clipb2.py:6(generate_data_2d)
        1    0.000    0.000    0.000    0.000 {range}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

From these timings, one can see that most of the time in clip1 is
wasted in the .clip() method of numpy (nothing really new). So, cProfile
is only showing where the time is spent for the first-level calls into
extension code. If we want more introspection on the C stack, and you
are running on Linux, oprofile (http://oprofile.sourceforge.net) is a
very nice profiler. Here are the outputs for the above routines on my
machine.

For clip1:

Profiling through timer interrupt
samples  %        image name      symbol name
643      54.6769  libc-2.3.6.so   memmove
151      12.8401  multiarray.so   PyArray_Choose
35        2.9762  umath.so        BYTE_multiply
34        2.8912  umath.so        DOUBLE_greater
32        2.7211  mtrand.so       rk_random
32        2.7211  umath.so        DOUBLE_less
30        2.5510  libc-2.3.6.so   memcpy

For clip2:

Profiling through timer interrupt
samples  %        image name      symbol name
188      24.5111  libc-2.3.6.so   memmove
143      18.6441  multiarray.so   _nonzero_indices
126      16.4276  multiarray.so   PyArray_MapIterNext
37        4.8240  umath.so        DOUBLE_greater
36        4.6936  mtrand.so       rk_gauss
33        4.3025  umath.so        DOUBLE_less
24        3.1291  libc-2.3.6.so   memcpy

So, it seems like you are right: the bottleneck is in calling the
memmove routine. Looking at the code in PyArray_Choose
(multiarraymodule.c), I've replaced the memmove call with a memcpy one.
Here is the patch:

--- numpy/core/src/multiarraymodule.c   (revision 3487)
+++ numpy/core/src/multiarraymodule.c   (working copy)
@@ -2126,7 +2126,7 @@
             }
             offset = i*elsize;
             if (offset >= sizes[mi]) {offset = offset % sizes[mi]; }
-            memmove(ret_data, mps[mi]->data+offset, elsize);
+            memcpy(ret_data, mps[mi]->data+offset, elsize);
             ret_data += elsize;
             self_data++;
         }

With this patch applied, we have, for clip1:

Profiling through timer interrupt
samples  %        image name      symbol name
659      55.2389  libc-2.3.6.so   memcpy
184      15.4233  multiarray.so   PyArray_Choose
46        3.8558  mtrand.so       rk_gauss
37        3.1014  umath.so        BYTE_multiply
34        2.8500  umath.so        DOUBLE_greater
34        2.8500  umath.so        DOUBLE_less
24        2.0117  libm-2.3.6.so   __ieee754_log

So, it seems clear that the use of memcpy hasn't accelerated the
computations at all. This is somewhat striking, because in most
situations memcpy should perform better (see [1] for a practical example
of this). My guess is that the real bottleneck is in calling memmove so
many times (once per element in the array). Perhaps the algorithm can be
changed to do a block copy at the beginning and then modify only the
places on which the clip should act (kind of the same thing that you
have done in Python, but at C level).

[1]

Cheers,

------------------------------------------------------------------
# Example of clip using cProfile
import numpy as N

#==========================
# To benchmark imshow alone
#==========================
def generate_data_2d(fr, nwin, hop, length):
    # note: the frame count must be an integer for randn
    nframes = int(1.0 * fr / hop * length)
    return N.random.randn(nframes, nwin)

def bench_clip():
    m = -1.
    M = 1.
    # 2 minutes (120 sec) of sound @ 8 kHz with 256 samples with 50 % overlap
    data = generate_data_2d(8000, 256, 128, 120)

    def clip1_bench(data, niter):
        for i in range(niter):
            blop = data.clip(m, M)

    def clip2_bench(data, niter):
        for i in range(niter):
            blop = data.copy()
            blop[blop < m] = m
            blop[blop > M] = M

    #clip2_bench(data, 10)
    clip1_bench(data, 10)

if __name__ == '__main__':
    # test clip
    import pstats
    import cProfile as prof

    profile_wanted = False
    if not profile_wanted:
        bench_clip()
    else:
        profile_file = 'clip.prof'
        prof.run('bench_clip()', profile_file)
        stats = pstats.Stats(profile_file)
        stats.strip_dirs()
        stats.sort_stats('time', 'calls')
        stats.print_stats()
-------------------------------------------------------------------

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From charlesr.harris at gmail.com  Tue Dec 19 09:33:37 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 19 Dec 2006 07:33:37 -0700
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <200612191503.32995.faltet@carabos.com>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
Message-ID:

On 12/19/06, Francesc Altet <faltet at carabos.com> wrote:
>
> On Tuesday 19 December 2006 08:12, David Cournapeau wrote:
> > Hi,
>
> My guess is that the real bottleneck is in calling memmove so many
> times (once per element in the array). Perhaps the algorithm can be
> changed to do a block copy at the beginning and then modify only the
> places on which the clip should act (kind of the same thing that you
> have done in Python, but at C level).

IIRC, doing a simple type-specific assignment is faster than either
memmove or memcpy. If speed is really of the essence it would probably
be worth writing a type-specific version of clip. A special function
combining clip with RGB conversion might do even better.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From asafdav2 at gmail.com  Tue Dec 12 09:06:53 2006
From: asafdav2 at gmail.com (asaf david)
Date: Tue, 12 Dec 2006 16:06:53 +0200
Subject: [Numpy-discussion] A question about argmax and argsort
In-Reply-To:
References:
Message-ID:

Hello

Let's say I have an N sized array, and I want to get the positions of
the K largest items. For K = 1 this is simply argmax. Is there any way
to generalize it for K != 1? Currently I use argsort and take only K
items from it, but I'm paying an additional ~lg(N) factor...

Thanks in advance, asaf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
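For the record, the argsort-based approach described above, next to the
O(N) selection that numpy only grew much later (argpartition was added
in numpy 1.8, long after this thread):

    import numpy as np

    a = np.random.randn(1000)
    K = 5

    # Full sort: O(N log N) -- the extra ~lg(N) factor mentioned above.
    top_sorted = np.argsort(a)[-K:]

    # Partial selection: O(N); the K indices come back unordered.
    top_part = np.argpartition(a, len(a) - K)[-K:]

    assert set(top_sorted) == set(top_part)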
From Mark.Hoffmann at dk.manbw.com  Fri Dec 15 10:42:24 2006
From: Mark.Hoffmann at dk.manbw.com (Mark Hoffmann)
Date: Fri, 15 Dec 2006 16:42:24 +0100
Subject: [Numpy-discussion] Unexpected output using numpy.ndarray and __radd__
Message-ID: <1A0F0517C2D6894282F07FAF76FA3612020AD01E@CPH-EXCH-SG4.manbw.dk>

Hi,

The following issue has puzzled me for a while. I want to add a
numpy.ndarray and an instance of my own class. I define this operation
by implementing the methods __add__ and __radd__. My programme
(including output) looks like:

#!/usr/local/bin/python

import numpy

class Cyclehist:
    def __init__(self, vals):
        self.valuearray = numpy.array(vals)

    def __str__(self):
        return 'Cyclehist object: valuearray = ' + str(self.valuearray)

    def __add__(self, other):
        print "__add__ : ", self, other
        return self.valuearray + other

    def __radd__(self, other):
        print "__radd__ : ", self, other
        return other + self.valuearray

c = Cyclehist([1.0, -21.2, 3.2])
a = numpy.array([-1.0, 2.2, -2.2])
print c + a
print a + c

# ---------- OUTPUT ----------
#
# addprob $ addprob.py
# __add__ :  Cyclehist object: valuearray = [  1.  -21.2   3.2] [-1.   2.2  -2.2]
# [  0.  -19.    1.]
# __radd__ :  Cyclehist object: valuearray = [  1.  -21.2   3.2] -1.0
# __radd__ :  Cyclehist object: valuearray = [  1.  -21.2   3.2] 2.2
# __radd__ :  Cyclehist object: valuearray = [  1.  -21.2   3.2] -2.2
# [[  0.  -22.2   2.2] [  3.2 -19.    5.4] [ -1.2 -23.4   1. ]]
# addprob $
#
# ----------------------------

I expected the output of "c+a" and "a+c" to be identical; however, the
output of "a+c" gets nested in an elementwise fashion. Can anybody
explain this? Is it a bug or a feature? I'm using Python 2.4.4c1 and
numpy 1.0. I tried the programme using an older version of Python and
numpy, and there the results of "c+a" and "a+c" are identical.

Regards,

Mark Hoffmann
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
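What the output shows is ndarray.__add__ treating the Cyclehist instance
as a scalar-like object and invoking c.__radd__ once per element, which
nests the result. A hedged sketch of one commonly suggested remedy -- an
__array_priority__ attribute asking numpy to defer the whole operation
to the right-hand operand; whether it helps depends on the numpy
version:

    class Cyclehist:
        # assumed mechanism: a high priority makes numpy's binary ops
        # return NotImplemented so Python calls our __radd__ once
        __array_priority__ = 10.0
        ...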
From charlesr.harris at gmail.com  Sun Dec 17 13:31:33 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 17 Dec 2006 11:31:33 -0700
Subject: [Numpy-discussion] Anyone have a "little" shooting-method function to share
In-Reply-To: <45539769.6030306@noaa.gov>
References: <45527171.8030903@noaa.gov>
	<1163094437.6376.6.camel@localhost.localdomain>
	<45539769.6030306@noaa.gov>
Message-ID:

I apologize if this shows up twice.

On 11/9/06, David L Goldsmith wrote:
>
> Wow, thanks!
>
> DG
>
> Pauli Virtanen wrote:
> > ke, 2006-11-08 kello 16:08 -0800, David L Goldsmith kirjoitti:
> >
> >> Hi! I tried to send this earlier: it made it into my sent mail
> >> folder, but does not appear to have made it to the list.
> >>
> >> I need to numerically solve:
> >>     (1-t)x" + x' - x = f(t),  x(0) = x0,  x(1) = x1
> >> I've been trying to use (because it's the approach I inherited) an
> >> elementary finite-difference discretization, but unit tests have
> >> shown that that approach isn't working.

My solution with order ten (degree 9) Chebyshev polynomials goes as
follows.

In [111]: import chebyshev as c

In [112]: t = c.modified_points(10,0,1)      # use 10 sample points

In [113]: D = c.modified_derivative(10,0,1)  # derivative operator

In [114]: op = (1.0 - t)[:,newaxis]*dot(D,D) + D - eye(10) # differential equation

In [115]: op[0] = 0       # set up boundary condition y(0) = y0

In [116]: op[0,0] = 1

In [117]: op[9] = 0       # set up boundary condition y(1) = y1

In [118]: op[9,9] = 1

In [119]: opinv = alg.inv(op)   # invert the operator

In [120]: f = exp(t)      # try f(t) = exp(t)

In [121]: f[0] = 2        # y0 = 2

In [122]: f[9] = 1        # y1 = 1

In [123]: soln = dot(opinv,f)   # solve equation

In [124]: plot(t,soln)
Out[124]: []

The plot is rather rough with only 10 points. Replot with more.

In [125]: tsmp = linspace(0,1)

In [126]: interp = c.modified_values(tsmp, 10, 0, 0, 1)

In [127]: plot(tsmp, dot(interp, soln))
Out[127]: []

Looks OK here. You can save opinv as it doesn't change with f. Likewise,
if you always want to interpolate the result, then save
dot(interp, opinv).

I've attached a plot of the solution I got along with the chebyshev
module I use.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: solution.zip
Type: application/zip
Size: 30179 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chebyshev.py
Type: text/x-python
Size: 18573 bytes
Desc: not available
URL:
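The same boundary-value formulation can be sketched with an ordinary
second-order finite-difference discretization using only numpy; this
illustrates the build-the-operator-and-solve idea, not the attached
chebyshev module (note also that the ODE is singular at t = 1, where the
x'' coefficient vanishes, which the boundary row sidesteps):

    import numpy as np

    n = 101
    t = np.linspace(0.0, 1.0, n)
    h = t[1] - t[0]

    # Rows for (1-t)x'' + x' - x = f(t) at the interior points.
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        c = 1.0 - t[i]
        A[i, i - 1] = c / h**2 - 1.0 / (2 * h)
        A[i, i]     = -2.0 * c / h**2 - 1.0
        A[i, i + 1] = c / h**2 + 1.0 / (2 * h)

    # Boundary conditions x(0) = 2, x(1) = 1 as identity rows,
    # mirroring the f[0] = 2, f[9] = 1 assignments above.
    A[0, 0] = A[-1, -1] = 1.0

    f = np.exp(t)
    f[0], f[-1] = 2.0, 1.0

    x = np.linalg.solve(A, f)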
From Derek.Bandler at gs.com  Tue Dec 19 08:24:23 2006
From: Derek.Bandler at gs.com (Bandler, Derek)
Date: Tue, 19 Dec 2006 08:24:23 -0500
Subject: [Numpy-discussion] (no subject)
Message-ID:

Hi,

I would like to get information on the software licenses for numpy &
numeric. On the sourceforge home for the packages, the listed license is
OSI-Approved Open Source. Is it possible to get more information on
this? A copy of the document would be useful. Thank you.

Best regards,
Derek Bandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gregwillden at gmail.com  Tue Dec 19 14:33:40 2006
From: gregwillden at gmail.com (Greg Willden)
Date: Tue, 19 Dec 2006 13:33:40 -0600
Subject: [Numpy-discussion] (no subject)
In-Reply-To:
References:
Message-ID: <903323ff0612191133r224044e8o15ac7cf94fb72050@mail.gmail.com>

Hi Derek,

Like all Free & Open Source Software (FOSS) projects, the license is
distributed with the source code. There is a file called LICENSE.txt in
the numpy tar archive. Here are the contents of that file.

Copyright (c) 2005, NumPy Developers

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.

    * Neither the name of the NumPy Developers nor the names of any
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Greg

On 12/19/06, Bandler, Derek <Derek.Bandler at gs.com> wrote:
>
> Hi,
>
> I would like to get information on the software licenses for numpy &
> numeric. On the sourceforge home for the packages, the listed license
> is OSI-Approved Open Source. Is it possible to get more information on
> this? A copy of the document would be useful. Thank you.
>
> Best regards,
> Derek Bandler
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

--
Linux. Because rebooting is for adding hardware.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert.kern at gmail.com  Tue Dec 19 14:35:52 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 19 Dec 2006 13:35:52 -0600
Subject: [Numpy-discussion] (no subject)
In-Reply-To:
References:
Message-ID: <45883F18.3020204@gmail.com>

Bandler, Derek wrote:
> Hi,
>
> I would like to get information on the software licenses for numpy &
> numeric. On the sourceforge home for the packages, the listed license
> is OSI-Approved Open Source. Is it possible to get more information on
> this? A copy of the document would be useful. Thank you.

They are both BSD-like licenses.

http://projects.scipy.org/scipy/numpy/browser/trunk/LICENSE.txt
http://projects.scipy.org/scipy/scipy/browser/trunk/LICENSE.txt

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
From oliphant at ee.byu.edu  Tue Dec 19 20:18:06 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue, 19 Dec 2006 18:18:06 -0700
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4587B3BC.5040802@ar.media.kyoto-u.ac.jp>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4587B3BC.5040802@ar.media.kyoto-u.ac.jp>
Message-ID: <45888F4E.7070707@ee.byu.edu>

David Cournapeau wrote:
> Robert Kern wrote:
>> Looking at the code, it's certainly not surprising that the current
>> implementation of clip() is slow. It is a direct numpy C API
>> translation of the following (taken from numarray, but it is the same
>> in Numeric):
>>
>> def clip(m, m_min, m_max):
>>     """clip() returns a new array with every entry in m that is less
>>     than m_min replaced by m_min, and every entry greater than m_max
>>     replaced by m_max.
>>     """
>>     selector = ufunc.less(m, m_min)+2*ufunc.greater(m, m_max)
>>     return choose(selector, (m, m_min, m_max))
>>
> I would be happy to code the function; for new code to be added to
> numpy, is there another branch than the current one? What is the
> approach for a 1.1.x version of numpy?

The idea is to make a 1.0.x branch as soon as the trunk changes the
C-API. The guarantee is that extension modules won't have to be rebuilt
until 1.1. I don't know that we've specified if there will be *no* API
changes. For example, there have already been some backward-compatible
extensions to the 1.0.X series.

I like the idea of being able to add functions to the 1.0.X series but
without breaking compatibility. I also don't mind adding new keywords to
functions (but not to C-API calls, as that would require a re-compile of
extension modules).

-Travis

From oliphant at ee.byu.edu  Tue Dec 19 20:21:31 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue, 19 Dec 2006 18:21:31 -0700
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4587B2EF.7010803@gmail.com>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
Message-ID: <4588901B.5030904@ee.byu.edu>

Robert Kern wrote:
> David Cournapeau wrote:
>> Basically, at least from those figures, both versions are pretty
>> similar, and not worth improving much anyway for matplotlib. There is
>> something funny with the numpy version, though.
>
> Looking at the code, it's certainly not surprising that the current
> implementation of clip() is slow. It is a direct numpy C API
> translation of the following (taken from numarray, but it is the same
> in Numeric):
>
> def clip(m, m_min, m_max):
>     """clip() returns a new array with every entry in m that is less
>     than m_min replaced by m_min, and every entry greater than m_max
>     replaced by m_max.
>     """
>     selector = ufunc.less(m, m_min)+2*ufunc.greater(m, m_max)
>     return choose(selector, (m, m_min, m_max))

There are a lot of functions that are essentially this. Many things
were done to just get something working. It would seem like a good idea
to re-code many of these to speed them up.

> Creating that integer selector array is probably the most expensive
> part. Copying the array, then using putmask() or similar is certainly
> a better approach, and I can see no drawbacks to it.
>
> If anyone is up to translating their faster clip() into C, I'm more
> than happy to check it in. I might also entertain adding a copy=True
> keyword argument, but I'm not entirely certain we should be expanding
> the API during the 1.0.x series.

The problem with the copy=True keyword is that it would imply needing to
expand the C-API for PyArray_Clip and should not be done until 1.1 IMHO.

We would probably be better off not expanding the keyword arguments to
methods as well until that time.

-Travis

From robert.kern at gmail.com  Tue Dec 19 20:56:15 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 19 Dec 2006 19:56:15 -0600
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4588901B.5030904@ee.byu.edu>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4588901B.5030904@ee.byu.edu>
Message-ID: <4588983F.5000902@gmail.com>

Travis Oliphant wrote:
> The problem with the copy=True keyword is that it would imply needing
> to expand the C-API for PyArray_Clip and should not be done until 1.1
> IMHO.

I don't think we have to change the signature of PyArray_Clip() at all.
PyArray_Clip() takes an "out" argument.
Currently, this is only set to something other than NULL if explicitly
provided as a keyword "out=" argument to numpy.ndarray.clip(). All we
have to do is modify the implementation of array_clip() to parse a
"copy=" argument and set "out = self" before calling PyArray_Clip().

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From robert.kern at gmail.com  Tue Dec 19 20:57:46 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 19 Dec 2006 19:57:46 -0600
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4588901B.5030904@ee.byu.edu>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4588901B.5030904@ee.byu.edu>
Message-ID: <4588989A.1040506@gmail.com>

Travis Oliphant wrote:
> There are a lot of functions that are essentially this. Many things
> were done to just get something working. It would seem like a good
> idea to re-code many of these to speed them up.

Off the top of your head, do you have a list of these?

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From tim.hochberg at ieee.org  Tue Dec 19 21:15:17 2006
From: tim.hochberg at ieee.org (Tim Hochberg)
Date: Tue, 19 Dec 2006 19:15:17 -0700
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4588983F.5000902@gmail.com>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4588901B.5030904@ee.byu.edu>
	<4588983F.5000902@gmail.com>
Message-ID: <45889CB5.4090604@ieee.org>

Robert Kern wrote:
> Travis Oliphant wrote:
>
>> The problem with the copy=True keyword is that it would imply needing
>> to expand the C-API for PyArray_Clip and should not be done until 1.1
>> IMHO.
>
> I don't think we have to change the signature of PyArray_Clip() at all.
> PyArray_Clip() takes an "out" argument. Currently, this is only set to
> something other than NULL if explicitly provided as a keyword "out="
> argument to numpy.ndarray.clip(). All we have to do is modify the
> implementation of array_clip() to parse a "copy=" argument and set
> "out = self" before calling PyArray_Clip().

I admit to not following the clip discussion very closely, but if
PyArray_Clip already supports 'out', why use a copy parameter at all?
Why not just expose 'out' at the python level? This allows in-place
operations: "clip(m, m_min, m_max, out=m)", it is more flexible than a
copy argument, and it matches the interface of a whole pile of other
functions.

My $0.02

-tim
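The out= form Tim describes is indeed already reachable from Python; a
small usage sketch:

    import numpy as np

    a = np.random.randn(1000)

    b = a.clip(-1.0, 1.0)        # allocates and returns a new array
    a.clip(-1.0, 1.0, out=a)     # clips in place; no copy is made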
From david at ar.media.kyoto-u.ac.jp  Tue Dec 19 21:30:45 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 20 Dec 2006 11:30:45 +0900
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4588901B.5030904@ee.byu.edu>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4588901B.5030904@ee.byu.edu>
Message-ID: <4588A055.5070707@ar.media.kyoto-u.ac.jp>

Travis Oliphant wrote:
> Robert Kern wrote:
>> David Cournapeau wrote:
>>> Basically, at least from those figures, both versions are pretty
>>> similar, and not worth improving much anyway for matplotlib. There is
>>> something funny with the numpy version, though.
>>
>> Looking at the code, it's certainly not surprising that the current
>> implementation of clip() is slow. It is a direct numpy C API
>> translation of the following (taken from numarray, but it is the same
>> in Numeric):
>>
>> def clip(m, m_min, m_max):
>>     """clip() returns a new array with every entry in m that is less
>>     than m_min replaced by m_min, and every entry greater than m_max
>>     replaced by m_max.
>>     """
>>     selector = ufunc.less(m, m_min)+2*ufunc.greater(m, m_max)
>>     return choose(selector, (m, m_min, m_max))
>
> There are a lot of functions that are essentially this. Many things
> were done to just get something working. It would seem like a good
> idea to re-code many of these to speed them up.
>
>> Creating that integer selector array is probably the most expensive
>> part. Copying the array, then using putmask() or similar is certainly
>> a better approach, and I can see no drawbacks to it.
>>
>> If anyone is up to translating their faster clip() into C, I'm more
>> than happy to check it in. I might also entertain adding a copy=True
>> keyword argument, but I'm not entirely certain we should be expanding
>> the API during the 1.0.x series.
>
> The problem with the copy=True keyword is that it would imply needing
> to expand the C-API for PyArray_Clip and should not be done until 1.1
> IMHO.
>
> We would probably be better off not expanding the keyword arguments to
> methods as well until that time.

When I went back home, I started taking a close look at the numpy/core C
sources, with the help of the numpy ebook. The huge source files make it
really difficult for me to follow some things: I was wondering if there
is some rationale behind it, or if this is just a remnant of older numpy
development.

The main problem I have with those huge files is that I am confused
about which functions are part of the public API, which ones are there
for backward compatibility, etc... I wanted to extract the
PyArray_TakeFrom function to see where the time is spent, but this is
quite difficult, because of various dependencies.

My question is then: is there any plan to change this? If not, is this
for some reason I don't see, or is this just because of lack of
manpower?

cheers,

David
From robert.kern at gmail.com  Tue Dec 19 21:36:05 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 19 Dec 2006 20:36:05 -0600
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <45889CB5.4090604@ieee.org>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218082756.GV2180@mentat.za.net>
	<45865515.1000208@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4588901B.5030904@ee.byu.edu>
	<4588983F.5000902@gmail.com>
	<45889CB5.4090604@ieee.org>
Message-ID: <4588A195.3060809@gmail.com>

Tim Hochberg wrote:
> Robert Kern wrote:
>> Travis Oliphant wrote:
>>
>>> The problem with the copy=True keyword is that it would imply
>>> needing to expand the C-API for PyArray_Clip and should not be done
>>> until 1.1 IMHO.
>>
>> I don't think we have to change the signature of PyArray_Clip() at
>> all. PyArray_Clip() takes an "out" argument. Currently, this is only
>> set to something other than NULL if explicitly provided as a keyword
>> "out=" argument to numpy.ndarray.clip(). All we have to do is modify
>> the implementation of array_clip() to parse a "copy=" argument and
>> set "out = self" before calling PyArray_Clip().
>
> I admit to not following the clip discussion very closely, but if
> PyArray_Clip already supports 'out', why use a copy parameter at all?
> Why not just expose 'out' at the python level? This allows in-place
> operations: "clip(m, m_min, m_max, out=m)", it is more flexible than a
> copy argument, and it matches the interface of a whole pile of other
> functions.

It's already exposed. I just didn't know that before I proposed
copy=True (and when I learned it, my brain was already stuck in that
mode).

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
From david at ar.media.kyoto-u.ac.jp  Tue Dec 19 21:36:15 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 20 Dec 2006 11:36:15 +0900
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <200612191503.32995.faltet@carabos.com>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
Message-ID: <4588A19F.3050405@ar.media.kyoto-u.ac.jp>

Francesc Altet wrote:
> On Tuesday 19 December 2006 08:12, David Cournapeau wrote:
>> Hi,
>>
>>     Following the discussion on clip and other functions which *may*
>> be slow in numpy, I would like to know if there is a way to easily
>> profile numpy, i.e. functions which are written in C.
>>     For example, I am not sure I understand why a function like
>> take(a, b), with a a double 256x4 array and b a 8000x256 int array,
>> takes almost 200 ms on a fairly fast CPU; in the source code, I can
>> see that numpy uses memmove, and I know memmove to be slower than
>> memcpy. Is there an easy way to check that this is coming from
>> memmove (case in which nothing much can be done to improve the
>> situation, I guess), and not from something else?
>
> For doing profiles on C extensions, you can use cProfile, which has
> been included in Python 2.5. See an example of your benchmark using
> cProfile below.

I haven't used python2.5's cProfile yet, mainly because of a really
annoying bug of ubuntu with python2.5 and ctypes, which makes it
unusable for me. I totally forgot about oprofile, which I tried once
some time ago; I really liked what I saw then. Thank you for the tip!

Concerning the memmove vs memcpy question: the problem I was speaking
about is in another function (numpy take), where the problem is much
bigger speed-wise. I will be on holiday starting from tomorrow, with a
big flight from Osaka to France, so I will have some time in the plane
to investigate those :)

cheers,

David

From david at ar.media.kyoto-u.ac.jp  Tue Dec 19 21:41:02 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 20 Dec 2006 11:41:02 +0900
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To:
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
Message-ID: <4588A2BE.4030801@ar.media.kyoto-u.ac.jp>

Charles R Harris wrote:
>> My guess is that the real bottleneck is in calling memmove so many
>> times (once per element in the array). Perhaps the algorithm can be
>> changed to do a block copy at the beginning and then modify only the
>> places on which the clip should act (kind of the same thing that you
>> have done in Python, but at C level).
>
> IIRC, doing a simple type-specific assignment is faster than either
> memmove or memcpy. If speed is really of the essence it would probably
> be worth writing a type-specific version of clip. A special function
> combining clip with RGB conversion might do even better.

At the end, in the original context (speeding up the drawing of
spectrograms), this is the problem. Even if multiple backends/toolkits
obviously have an impact on performance, I really don't see why a numpy
function to convert an array to an RGB representation should be 10-20
times slower than matlab on the same machine.

I will take into account all those helpful messages, and hopefully come
up with something by the end of the week :)

cheers,

David
From charlesr.harris at gmail.com  Tue Dec 19 22:51:32 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 19 Dec 2006 20:51:32 -0700
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <4588A055.5070707@ar.media.kyoto-u.ac.jp>
References: <45864074.6090203@ar.media.kyoto-u.ac.jp>
	<20061218091710.GW2180@mentat.za.net>
	<458668F0.90604@ar.media.kyoto-u.ac.jp>
	<4586E3B4.2040609@hawaii.edu>
	<45877445.20508@ar.media.kyoto-u.ac.jp>
	<45879279.6030707@hawaii.edu>
	<4587A926.7070401@ar.media.kyoto-u.ac.jp>
	<4587B2EF.7010803@gmail.com>
	<4588901B.5030904@ee.byu.edu>
	<4588A055.5070707@ar.media.kyoto-u.ac.jp>
Message-ID:

On 12/19/06, David Cournapeau <david at ar.media.kyoto-u.ac.jp> wrote:
>
> Travis Oliphant wrote:
> > Robert Kern wrote:
> >> David Cournapeau wrote:
>
> When I went back home, I started taking a close look at the numpy/core
> C sources, with the help of the numpy ebook. The huge source files
> make it really difficult for me to follow some things: I was wondering
> if there is some rationale behind it, or if this is just a remnant of
> older numpy development.
>
> The main problem I have with those huge files is that I am confused
> about which functions are part of the public API, which ones are there
> for backward compatibility, etc... I wanted to extract the
> PyArray_TakeFrom function to see where the time is spent, but this is
> quite difficult, because of various dependencies.
>
> My question is then: is there any plan to change this? If not, is this
> for some reason I don't see, or is this just because of lack of
> manpower?

I raised the possibility of breaking up the files before and Travis was
agreeable to the idea. It is still in the back of my mind but I haven't
got around to doing anything about it. Maybe we should put together a
step by step approach: agree on some file names for the new files, fix
the build so it loads in the new stub files in the correct order, and
then start moving stuff. My own original desire was to break out the
keyword parsers into a separate file, but I think Travis had different
priorities.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jdhunter at ace.bsd.uchicago.edu  Tue Dec 19 23:28:58 2006
From: jdhunter at ace.bsd.uchicago.edu (John Hunter)
Date: Tue, 19 Dec 2006 22:28:58 -0600
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <4588A2BE.4030801@ar.media.kyoto-u.ac.jp> (David Cournapeau's
	message of "Wed, 20 Dec 2006 11:41:02 +0900")
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
	<4588A2BE.4030801@ar.media.kyoto-u.ac.jp>
Message-ID: <87ac1j9omt.fsf@peds-pc311.bsd.uchicago.edu>

>>>>> "David" == David Cournapeau <david at ar.media.kyoto-u.ac.jp> writes:

    David> At the end, in the original context (speeding up the drawing
    David> of spectrograms), this is the problem. Even if multiple
    David> backends/toolkits obviously have an impact on performance, I
    David> really don't see why a numpy function to convert an array to
    David> an RGB representation should be 10-20 times slower than
    David> matlab on the same machine.

This isn't exactly right. When matplotlib converts a 2D grayscale
array to rgba, a lot goes on under the hood. It's all numpy, but it's
far from a single function, and it involves many passes through the
data. In principle, this could be done with one or two passes through
the data. In practice, our normalization and colormapping abstractions
are so abstract that it is difficult (though not impossible) to
special-case and optimize.

The top-level routine is

    def to_rgba(self, x, alpha=1.0):
        '''Return a normalized rgba array corresponding to x.
        If x is already an rgb or rgba array, return it unchanged.
        '''
        if hasattr(x, 'shape') and len(x.shape)>2: return x
        x = ma.asarray(x)
        x = self.norm(x)
        x = self.cmap(x, alpha)
        return x

which implies at a minimum two passes through the data, one for norm
and one for cmap.

In 99% of the use cases, cmap is a LinearSegmentedColormap, though
users can define their own as long as it is callable. My guess is
that the expensive part is Colormap.__call__, the base class for
LinearSegmentedColormap. We could probably write some extension code
that does the following routine in one pass through the data. But it
would be hairy. In a quick look and rough count, I see about 10
passes through the data in the function below.

If you are interested in optimizing colormapping in mpl, I'd start
here. I suspect there may be some low hanging fruit.

    def __call__(self, X, alpha=1.0):
        """
        X is either a scalar or an array (of any dimension).
        If scalar, a tuple of rgba values is returned, otherwise
        an array with the new shape = oldshape+(4,). If the X-values
        are integers, then they are used as indices into the array.
        If they are floating point, then they must be in the
        interval (0.0, 1.0).
        Alpha must be a scalar.
        """
        if not self._isinit: self._init()
        alpha = min(alpha, 1.0) # alpha must be between 0 and 1
        alpha = max(alpha, 0.0)
        self._lut[:-3, -1] = alpha
        mask_bad = None
        if not iterable(X):
            vtype = 'scalar'
            xa = array([X])
        else:
            vtype = 'array'
            xma = ma.asarray(X)
            xa = xma.filled(0)
            mask_bad = ma.getmask(xma)
        if typecode(xa) in typecodes['Float']:
            putmask(xa, xa==1.0, 0.9999999) # Treat 1.0 as slightly less than 1.
            xa = (xa * self.N).astype(Int)
        # Set the over-range indices before the under-range;
        # otherwise the under-range values get converted to over-range.
        putmask(xa, xa>self.N-1, self._i_over)
        putmask(xa, xa<0, self._i_under)
        if mask_bad is not None and mask_bad.shape == xa.shape:
            putmask(xa, mask_bad, self._i_bad)
        rgba = take(self._lut, xa)
        if vtype == 'scalar':
            rgba = tuple(rgba[0,:])
        return rgba

    David> I will take into account all those helpful messages, and
    David> hopefully come up with something by the end of the week :)

    David> cheers

    David> David
    David> _______________________________________________
    David> Numpy-discussion mailing list
    David> Numpy-discussion at scipy.org
    David> http://projects.scipy.org/mailman/listinfo/numpy-discussion
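Stripped of the masking and scalar special cases, the core lookup John
describes reduces to a few whole-array passes; a schematic sketch, not
the matplotlib implementation:

    import numpy as np

    N = 256
    lut = np.random.rand(N, 4)     # stand-in for the colormap's RGBA table

    def apply_cmap(x, lut):
        # x assumed normalized to [0, 1): scale to table indices ...
        xa = (x * len(lut)).astype(int)
        # ... clamp out-of-range indices in place ...
        np.putmask(xa, xa > len(lut) - 1, len(lut) - 1)
        np.putmask(xa, xa < 0, 0)
        # ... and gather the colors -- the take() call that dominates
        # the profiles quoted in this thread.
        return np.take(lut, xa, axis=0)

    img = apply_cmap(np.random.rand(480, 640), lut)   # shape (480, 640, 4)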
> If scalar, a tuple of rgba values is returned, otherwise > an array with the new shape = oldshape+(4,). If the X-values > are integers, then they are used as indices into the array. > If they are floating point, then they must be in the > interval (0.0, 1.0). > Alpha must be a scalar. > """ > if not self._isinit: self._init() > alpha = min(alpha, 1.0) # alpha must be between 0 and 1 > alpha = max(alpha, 0.0) > self._lut[:-3, -1] = alpha > mask_bad = None > if not iterable(X): > vtype = 'scalar' > xa = array([X]) > else: > vtype = 'array' > xma = ma.asarray(X) > xa = xma.filled(0) > mask_bad = ma.getmask(xma) > if typecode(xa) in typecodes['Float']: > putmask(xa, xa==1.0, 0.9999999) #Treat 1.0 as slightly less than 1. > xa = (xa * self.N).astype(Int) > # Set the over-range indices before the under-range; > # otherwise the under-range values get converted to over-range. > putmask(xa, xa>self.N-1, self._i_over) > putmask(xa, xa<0, self._i_under) > if mask_bad is not None and mask_bad.shape == xa.shape: > putmask(xa, mask_bad, self._i_bad) > rgba = take(self._lut, xa) > if vtype == 'scalar': > rgba = tuple(rgba[0,:]) > return rgba > > > > > > David> I will take into account all those helpful messages, and > David> hopefully come with something for the end of the week :), > > David> cheers > > David> David _______________________________________________ > David> Numpy-discussion mailing list Numpy-discussion at scipy.org > David> http://projects.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From david at ar.media.kyoto-u.ac.jp Wed Dec 20 01:13:17 2006 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 20 Dec 2006 15:13:17 +0900 Subject: [Numpy-discussion] Profiling numpy ? (parts written in C) In-Reply-To: <87ac1j9omt.fsf@peds-pc311.bsd.uchicago.edu> References: <458790E2.8040607@ar.media.kyoto-u.ac.jp> <200612191503.32995.faltet@carabos.com> <4588A2BE.4030801@ar.media.kyoto-u.ac.jp> <87ac1j9omt.fsf@peds-pc311.bsd.uchicago.edu> Message-ID: <4588D47D.6080206@ar.media.kyoto-u.ac.jp> John Hunter wrote: >>>>>> "David" == David Cournapeau writes: > David> At the end, in the original context (speeding the drawing > David> of spectrogram), this is the problem. Even if multiple > David> backend/toolkits have obviously an impact in performances, > David> I really don't see why a numpy function to convert an array > David> to a RGB representation should be 10-20 times slower than > David> matlab on the same machine. > > This isn't exactly right. When matplotlib converts a 2D grayscale > array to rgba, a lot goes on under the hood. It's all numpy, but it's > far from single function and it involves many passes through the > data. In principle, this could be done with one or two passes through > the data. In practice, our normalization and colormapping abstractions > are so abstract that it is difficult (though not impossible) to > special case and optimize. > Well, we managed to have more than a 100 % speed increase for to_rgba function already, with the help from Eric Firing. 
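(To make this concrete for readers following the thread: the norm step is
essentially a clip followed by an affine rescale into [0, 1]. A minimal
sketch of the idea -- with made-up names, not mpl's actual code -- would be:

import numpy

def normalize(x, vmin, vmax):
    # One pass over the data for the clip, then one pass each for the
    # subtraction and the division; every operation walks the whole array.
    x = numpy.clip(x, vmin, vmax)
    return (x - vmin) / float(vmax - vmin)

Each of those array operations is a separate pass, which is why the speed
of numpy.clip matters so much here.)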
Before, both functors Normalize and Colormap were expensive; I think we got something like a 2-3x speed increase in normalize (now, the main problem is the clip function, which lead to the discussion about numpy clip), and now, the Colormap functor takes 75 % of the time of to_rgba function with an array of 8000x256 samples: 1 0.000 0.000 0.832 0.832 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:275(expose_event) 1 0.010 0.010 0.831 0.831 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:71(_render_figure) 1 0.000 0.000 0.821 0.821 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_agg.py:385(draw) 1 0.000 0.000 0.819 0.819 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:511(draw) 1 0.000 0.000 0.817 0.817 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:1043(draw) 1 0.000 0.000 0.648 0.648 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:1927(imshow) 3 0.000 0.000 0.573 0.191 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:883(gca) 1 0.000 0.000 0.572 0.572 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:950(ishold) 1 0.007 0.007 0.510 0.510 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:173(draw) 1 0.109 0.109 0.503 0.503 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:109(make_image) 4 0.000 0.000 0.491 0.123 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:903(gcf) 1 0.000 0.000 0.491 0.491 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:818(figure) 1 0.000 0.000 0.491 0.491 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:36(new_figure_manager) 1 0.024 0.024 0.482 0.482 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:401(__init__) 1 0.000 0.000 0.458 0.458 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:25(_get_toolbar) 1 0.001 0.001 0.458 0.458 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:496(__init__) 1 0.000 0.000 0.458 0.458 /home/david/local/lib/python2.4/site-packages/matplotlib/backend_bases.py:1112(__init__) 1 0.010 0.010 0.458 0.458 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:557(_init_toolbar) 1 0.029 0.029 0.448 0.448 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:595(_init_toolbar2_4) 1 0.419 0.419 0.419 0.419 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:967(__init__) 1 0.002 0.002 0.393 0.393 /home/david/local/lib/python2.4/site-packages/matplotlib/cm.py:50(to_rgba) 1 0.111 0.111 0.307 0.307 /home/david/local/lib/python2.4/site-packages/matplotlib/colors.py:570(__call__) Of this 300 ms spent in Colormap functor, 200 ms are taken by the take function: this is the function which I think can be speed up considerably. > The top-level routine is > > def to_rgba(self, x, alpha=1.0): > '''Return a normalized rgba array corresponding to x. > If x is already an rgb or rgba array, return it unchanged. > ''' > if hasattr(x, 'shape') and len(x.shape)>2: return x > x = ma.asarray(x) > x = self.norm(x) > x = self.cmap(x, alpha) > return x > > which implies at a minimum two passes through the data, one for norm > and one for cmap. > > In 99% of the use cases, cmap is a LinearSegmentedColormap though > users can define their own as long as it is callable. 
My guess is
> that the expensive part is Colormap.__call__, the base class for
> LinearSegmentedColormap. We could probably write some extension code
> that does the following routine in one pass through the data. But it
> would be hairy. In a quick look and rough count, I see about 10
> passes through the data in the function below.

So, given the above points, I agree that self.norm and self.cmap are the
slow parts, but I think this can be much improved by improving the
corresponding numpy functions. I think there is room to improve things
without touching matplotlib.

cheers,

David

From david at ar.media.kyoto-u.ac.jp  Wed Dec 20 01:59:50 2006
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 20 Dec 2006 15:59:50 +0900
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <200612191503.32995.faltet@carabos.com>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
Message-ID: <4588DF66.5000005@ar.media.kyoto-u.ac.jp>

Francesc Altet wrote:
>
> So, cProfile is only showing where the time is spent at the
> first-level calls into extensions. If we want more introspection on
> the C stack, and you are running on Linux, oprofile
> (http://oprofile.sourceforge.net) is a very nice profiler. Here are
> the outputs for the above routines on my machine.
>
> For clip1:
>
> Profiling through timer interrupt
> samples  %        image name       symbol name
> 643      54.6769  libc-2.3.6.so    memmove
> 151      12.8401  multiarray.so    PyArray_Choose
> 35        2.9762  umath.so         BYTE_multiply
> 34        2.8912  umath.so         DOUBLE_greater
> 32        2.7211  mtrand.so        rk_random
> 32        2.7211  umath.so         DOUBLE_less
> 30        2.5510  libc-2.3.6.so    memcpy
>
>
> For clip2:
>
> Profiling through timer interrupt
> samples  %        image name       symbol name
> 188      24.5111  libc-2.3.6.so    memmove
> 143      18.6441  multiarray.so    _nonzero_indices
> 126      16.4276  multiarray.so    PyArray_MapIterNext
> 37        4.8240  umath.so         DOUBLE_greater
> 36        4.6936  mtrand.so        rk_gauss
> 33        4.3025  umath.so         DOUBLE_less
> 24        3.1291  libc-2.3.6.so    memcpy

Could you detail a bit how you did the profiling with oprofile? I don't
manage to get the same results as you (that is, on a per-application
basis, when the application is a python script and not a 'binary' program).

Thank you,

David

From faltet at carabos.com  Wed Dec 20 03:48:48 2006
From: faltet at carabos.com (Francesc Altet)
Date: Wed, 20 Dec 2006 09:48:48 +0100
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <4588DF66.5000005@ar.media.kyoto-u.ac.jp>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
	<4588DF66.5000005@ar.media.kyoto-u.ac.jp>
Message-ID: <200612200948.48753.faltet@carabos.com>

On Wednesday 20 December 2006 at 07:59, David Cournapeau wrote:
> Could you detail a bit how you did the profiling with oprofile? I don't
> manage to get the same results as you (that is, on a per-application
> basis, when the application is a python script and not a 'binary' program)

Sure. First you need to start the profiler with:

opcontrol --start

then run your application, for example:

python2.5 /tmp/clipb2.py

after this you should instruct oprofile to stop collecting samples:

opcontrol --stop

now, you need to tell oprofile that you want a report on the binary
you have run (i.e. your interpreter):

opreport -l /usr/local/bin/python2.5   # put there your actual path

That's all.
Remember to reset all the samples in oprofile each time you want to start a new run (otherwise the samples will accumulate from run to run): opcontrol --reset You can get more info in: http://oprofile.sourceforge.net/docs/ HTH, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From oliphant at ee.byu.edu Wed Dec 20 04:03:16 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed, 20 Dec 2006 02:03:16 -0700 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218091710.GW2180@mentat.za.net> <458668F0.90604@ar.media.kyoto-u.ac.jp> <4586E3B4.2040609@hawaii.edu> <45877445.20508@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> <4588901B.5030904@ee.byu.edu> <4588A055.5070707@ar.media.kyoto-u.ac.jp> Message-ID: <4588FC54.7040106@ee.byu.edu> > My question is then: is there any plan to change this ? If not, is > this > for some reasons I don't see, or is this just because of lack of > manpower ? > > > I raised the possibility of breaking up the files before and Travis > was agreeable to the idea. It is still in the back of my mind but I > haven't got around to doing anything about it. Maybe we should put > together a step by step approach, agree on some file names for the new > files, fix the build so it loads in the new stub files in the correct > order, and then start moving stuff. My own original desire was to > break out the keyword parsers into a separate file but I think Travis > had different priorities. The problem with separate files is (and has always been) the NumPy C-API. I tried to use separate files to some extent (and then use #include to make it all one big file). The C-API is exposed by filling in a table of function pointers. You will notice that when arrayobject.h is included for an extension module, all of the C-API is defined to pull a particular function pointer out of a table that is stored in a Python CObject in the multiarray module extension itself. Basically, NumPy is following the standard Python advice (as Numeric and Numarray did) about how to expose a C-API, but it's just gotten a bit big. Solutions to that problem are always welcome. -Travis From oliphant at ee.byu.edu Wed Dec 20 04:05:21 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed, 20 Dec 2006 02:05:21 -0700 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: <4588A055.5070707@ar.media.kyoto-u.ac.jp> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218082756.GV2180@mentat.za.net> <45865515.1000208@ar.media.kyoto-u.ac.jp> <20061218091710.GW2180@mentat.za.net> <458668F0.90604@ar.media.kyoto-u.ac.jp> <4586E3B4.2040609@hawaii.edu> <45877445.20508@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> <4588901B.5030904@ee.byu.edu> <4588A055.5070707@ar.media.kyoto-u.ac.jp> Message-ID: <4588FCD1.10800@ee.byu.edu> David Cournapeau wrote: > en I went back to home, I started taking a close look a numpy/core C > sources, with the help of the numpy ebook. The huge source files make it > really difficult for me to follow some things: I was wondering if there > is some rationale behind it, or if this is just a remain of old > developments of numpy. > > The main problem I have with those huge files is that I am confused > between the functions parts of the public API, the one for backward > compatibility, etc... 
I wanted to extract the PyArray_TakeFom function > to see where the time is spent, but this is quite difficult, because of > various dependencies. > > My question is then: is there any plan to change this ? If not, is this > for some reasons I don't see, or is this just because of lack of manpower ? > I'm not sure what you mean by "this". I have no plans to change the infrastructure, but naturally suggestions are always welcome. You just have to understand and figure out the limitations of trying to expose a C-API. -Travis From david at ar.media.kyoto-u.ac.jp Wed Dec 20 04:16:36 2006 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 20 Dec 2006 18:16:36 +0900 Subject: [Numpy-discussion] Profiling numpy ? (parts written in C) In-Reply-To: <200612200948.48753.faltet@carabos.com> References: <458790E2.8040607@ar.media.kyoto-u.ac.jp> <200612191503.32995.faltet@carabos.com> <4588DF66.5000005@ar.media.kyoto-u.ac.jp> <200612200948.48753.faltet@carabos.com> Message-ID: <4588FF74.5070703@ar.media.kyoto-u.ac.jp> Francesc Altet wrote: > A Dimecres 20 Desembre 2006 07:59, David Cournapeau escrigu?: >> Could you detail a bit how you did the profiling with oprofile ? I don't >> manage to get the same results than you (that is on per application >> basis when the application is a python script and not a 'binary' program) > > Sure. You need first to start the profiler with: > > opcontrol --start > > then run your application, for example: > > python2.5 /tmp/clipb2.py > > after this you should instruct oprofile to stop collecting samples: > > opcontrol --stop > > now, you need to tell oprofile that you want a report on the binary > you have run (i.e. your interpreter): > > opreport -l /usr/local/bin/python2.5 # put there your actual path > Ok, I am a bit stupid, I should have thought about using the python interpreter instead of my script. But if I do this, I have only one line, which corresponds to the time spend to python: opreport -l /usr/bin/python2.5 6520 100.00 (no symbols) I guess the problem is that oprofile has no way to know that code spend into eg umath.so was called by scripts run python2.5. How do you do that ? Do you need a specially compiled interpreter (with -g ?) cheers, David From david at ar.media.kyoto-u.ac.jp Wed Dec 20 04:27:44 2006 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 20 Dec 2006 18:27:44 +0900 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: <4588FCD1.10800@ee.byu.edu> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <20061218082756.GV2180@mentat.za.net> <45865515.1000208@ar.media.kyoto-u.ac.jp> <20061218091710.GW2180@mentat.za.net> <458668F0.90604@ar.media.kyoto-u.ac.jp> <4586E3B4.2040609@hawaii.edu> <45877445.20508@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> <4588901B.5030904@ee.byu.edu> <4588A055.5070707@ar.media.kyoto-u.ac.jp> <4588FCD1.10800@ee.byu.edu> Message-ID: <45890210.7090205@ar.media.kyoto-u.ac.jp> Travis Oliphant wrote: > David Cournapeau wrote: >> en I went back to home, I started taking a close look a numpy/core C >> sources, with the help of the numpy ebook. The huge source files make it >> really difficult for me to follow some things: I was wondering if there >> is some rationale behind it, or if this is just a remain of old >> developments of numpy. >> >> The main problem I have with those huge files is that I am confused >> between the functions parts of the public API, the one for backward >> compatibility, etc... 
I wanted to extract the PyArray_TakeFom function >> to see where the time is spent, but this is quite difficult, because of >> various dependencies. >> >> My question is then: is there any plan to change this ? If not, is this >> for some reasons I don't see, or is this just because of lack of manpower ? >> > > I'm not sure what you mean by "this". I have no plans to change the > infrastructure, but naturally suggestions are always welcome. You just > have to understand and figure out the limitations of trying to expose a > C-API. "this" was just about the big source files, and I was wondering if there was a rationale or not. Your previous email answered this: there is a rationale. I don't have much experience in pure C python modules, and if this is the standard python way of doing things, I guess there is no other easy way of doing things. Thank you for your explanation, David From faltet at carabos.com Wed Dec 20 04:32:07 2006 From: faltet at carabos.com (Francesc Altet) Date: Wed, 20 Dec 2006 10:32:07 +0100 Subject: [Numpy-discussion] Profiling numpy ? (parts written in C) In-Reply-To: <4588FF74.5070703@ar.media.kyoto-u.ac.jp> References: <458790E2.8040607@ar.media.kyoto-u.ac.jp> <200612200948.48753.faltet@carabos.com> <4588FF74.5070703@ar.media.kyoto-u.ac.jp> Message-ID: <200612201032.08487.faltet@carabos.com> A Dimecres 20 Desembre 2006 10:16, David Cournapeau escrigu?: > Francesc Altet wrote: > > A Dimecres 20 Desembre 2006 07:59, David Cournapeau escrigu?: > >> Could you detail a bit how you did the profiling with oprofile ? I don't > >> manage to get the same results than you (that is on per application > >> basis when the application is a python script and not a 'binary' > >> program) > > > > Sure. You need first to start the profiler with: > > > > opcontrol --start > > > > then run your application, for example: > > > > python2.5 /tmp/clipb2.py > > > > after this you should instruct oprofile to stop collecting samples: > > > > opcontrol --stop > > > > now, you need to tell oprofile that you want a report on the binary > > you have run (i.e. your interpreter): > > > > opreport -l /usr/local/bin/python2.5 # put there your actual path > > Ok, I am a bit stupid, I should have thought about using the python > interpreter instead of my script. But if I do this, I have only one > line, which corresponds to the time spend to python: > > opreport -l /usr/bin/python2.5 > > 6520 100.00 (no symbols) > > I guess the problem is that oprofile has no way to know that code spend > into eg umath.so was called by scripts run python2.5. How do you do that > ? Do you need a specially compiled interpreter (with -g ?) No, I don't think so (at least, if you are not going to profile python itself). I think that the only thing you need should be to specify the -g when compiling the libraries that you are going to profile; in this case: NumPy. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From ivilata at carabos.com Wed Dec 20 07:02:26 2006 From: ivilata at carabos.com (Ivan Vilata i Balaguer) Date: Wed, 20 Dec 2006 13:02:26 +0100 Subject: [Numpy-discussion] Type of 1st argument in Numexpr where() Message-ID: <20061220120226.GC11105@tardis.terramar.selidor.net> Hi all, I noticed that the set of ``where()`` functions defined by Numexpr all have a signature like ``xfxx``, i.e. the first argument is a float and the return, second and third arguments are of the same type (whatever it is). 
Since the first argument effectively represents a condition, wouldn't it make more sense for it to be a boolean? Booleans are already supported by Numexpr, maybe the old signatures are just a legacy from the time when Numexpr didn't support them. I have attached a patch to the latest version of Numexpr which implements this. Cheers, PS: It seems that http://numpy.scipy.org/ still points to the old SourceForge list address. :: Ivan Vilata i Balaguer >qo< http://www.carabos.com/ C?rabos Coop. V. V V Enjoy Data "" -------------- next part -------------- Index: interp_body.c =================================================================== --- interp_body.c (revisi?n: 2439) +++ interp_body.c (copia de trabajo) @@ -155,7 +155,7 @@ case OP_POW_III: VEC_ARG2(i_dest = (i2 < 0) ? (1 / i1) : (long)pow(i1, i2)); case OP_MOD_III: VEC_ARG2(i_dest = i1 % i2); - case OP_WHERE_IFII: VEC_ARG3(i_dest = f1 ? i2 : i3); + case OP_WHERE_IBII: VEC_ARG3(i_dest = b1 ? i2 : i3); case OP_CAST_FB: VEC_ARG1(f_dest = (long)b1); case OP_CAST_FI: VEC_ARG1(f_dest = (double)(i1)); @@ -175,7 +175,7 @@ case OP_SQRT_FF: VEC_ARG1(f_dest = sqrt(f1)); case OP_ARCTAN2_FFF: VEC_ARG2(f_dest = atan2(f1, f2)); - case OP_WHERE_FFFF: VEC_ARG3(f_dest = f1 ? f2 : f3); + case OP_WHERE_FBFF: VEC_ARG3(f_dest = b1 ? f2 : f3); case OP_FUNC_FF: VEC_ARG1(f_dest = functions_f[arg2](f1)); case OP_FUNC_FFF: VEC_ARG2(f_dest = functions_ff[arg3](f1, f2)); @@ -206,8 +206,8 @@ case OP_EQ_BCC: VEC_ARG2(b_dest = (c1r == c2r && c1i == c2i) ? 1 : 0); case OP_NE_BCC: VEC_ARG2(b_dest = (c1r != c2r || c1i != c2i) ? 1 : 0); - case OP_WHERE_CFCC: VEC_ARG3(cr_dest = f1 ? c2r : c3r; - ci_dest = f1 ? c2i : c3i); + case OP_WHERE_CBCC: VEC_ARG3(cr_dest = b1 ? c2r : c3r; + ci_dest = b1 ? c2i : c3i); case OP_FUNC_CC: VEC_ARG1(ca.real = c1r; ca.imag = c1i; functions_cc[arg2](&ca, &ca); Index: tests/test_numexpr.py =================================================================== --- tests/test_numexpr.py (revisi?n: 2439) +++ tests/test_numexpr.py (copia de trabajo) @@ -186,8 +186,8 @@ 'sinh(a)', '2*a + (cos(3)+5)*sinh(cos(b))', '2*a + arctan2(a, b)', - 'where(a, 2, b)', - 'where((a-10).real, a, 2)', + 'where(a != 0.0, 2, b)', + 'where((a-10).real != 0.0, a, 2)', 'cos(1+1)', '1+1', '1', Index: interpreter.c =================================================================== --- interpreter.c (revisi?n: 2439) +++ interpreter.c (copia de trabajo) @@ -45,7 +45,7 @@ OP_DIV_III, OP_POW_III, OP_MOD_III, - OP_WHERE_IFII, + OP_WHERE_IBII, OP_CAST_FB, OP_CAST_FI, @@ -63,7 +63,7 @@ OP_TAN_FF, OP_SQRT_FF, OP_ARCTAN2_FFF, - OP_WHERE_FFFF, + OP_WHERE_FBFF, OP_FUNC_FF, OP_FUNC_FFF, @@ -80,7 +80,7 @@ OP_SUB_CCC, OP_MUL_CCC, OP_DIV_CCC, - OP_WHERE_CFCC, + OP_WHERE_CBCC, OP_FUNC_CC, OP_FUNC_CCC, @@ -148,9 +148,9 @@ case OP_POW_III: if (n == 0 || n == 1 || n == 2) return 'i'; break; - case OP_WHERE_IFII: + case OP_WHERE_IBII: if (n == 0 || n == 2 || n == 3) return 'i'; - if (n == 1) return 'f'; + if (n == 1) return 'b'; break; case OP_CAST_FB: if (n == 0) return 'f'; @@ -178,8 +178,9 @@ case OP_ARCTAN2_FFF: if (n == 0 || n == 1 || n == 2) return 'f'; break; - case OP_WHERE_FFFF: - if (n == 0 || n == 1 || n == 2 || n == 3) return 'f'; + case OP_WHERE_FBFF: + if (n == 0 || n == 2 || n == 3) return 'f'; + if (n == 1) return 'b'; break; case OP_FUNC_FF: if (n == 0 || n == 1) return 'f'; @@ -217,9 +218,9 @@ case OP_DIV_CCC: if (n == 0 || n == 1 || n == 2) return 'c'; break; - case OP_WHERE_CFCC: + case OP_WHERE_CBCC: if (n == 0 || n == 2 || n == 3) return 'c'; - if (n == 1) return 'f'; + 
if (n == 1) return 'b'; break; case OP_FUNC_CC: if (n == 0 || n == 1) return 'c'; @@ -1320,7 +1321,7 @@ add_op("div_iii", OP_DIV_III); add_op("pow_iii", OP_POW_III); add_op("mod_iii", OP_MOD_III); - add_op("where_ifii", OP_WHERE_IFII); + add_op("where_ibii", OP_WHERE_IBII); add_op("cast_fb", OP_CAST_FB); add_op("cast_fi", OP_CAST_FI); @@ -1339,7 +1340,7 @@ add_op("tan_ff", OP_TAN_FF); add_op("sqrt_ff", OP_SQRT_FF); add_op("arctan2_fff", OP_ARCTAN2_FFF); - add_op("where_ffff", OP_WHERE_FFFF); + add_op("where_fbff", OP_WHERE_FBFF); add_op("func_ff", OP_FUNC_FF); add_op("func_fff", OP_FUNC_FFF); @@ -1356,7 +1357,7 @@ add_op("sub_ccc", OP_SUB_CCC); add_op("mul_ccc", OP_MUL_CCC); add_op("div_ccc", OP_DIV_CCC); - add_op("where_cfcc", OP_WHERE_CFCC); + add_op("where_cbcc", OP_WHERE_CBCC); add_op("func_cc", OP_FUNC_CC); add_op("func_ccc", OP_FUNC_CCC); Index: timing.py =================================================================== --- timing.py (revisi?n: 2439) +++ timing.py (copia de trabajo) @@ -88,13 +88,13 @@ """ % ((array_size,)*3) expr5 = 'where(0.1*a > arctan2(a, b), 2*a, arctan2(a,b))' -expr6 = 'where(a, 2, b)' +expr6 = 'where(a != 0.0, 2, b)' -expr7 = 'where(a-10, a, 2)' +expr7 = 'where(a-10 != 0.0, a, 2)' -expr8 = 'where(a%2, b+5, 2)' +expr8 = 'where(a%2 != 0.0, b+5, 2)' -expr9 = 'where(a%2, 2, b+5)' +expr9 = 'where(a%2 != 0.0, 2, b+5)' expr10 = 'a**2 + (b+1)**-2.5' -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 309 bytes Desc: Digital signature URL: From faltet at carabos.com Wed Dec 20 07:08:34 2006 From: faltet at carabos.com (Francesc Altet) Date: Wed, 20 Dec 2006 13:08:34 +0100 Subject: [Numpy-discussion] Profiling numpy ? (parts written in C) In-Reply-To: <4588A19F.3050405@ar.media.kyoto-u.ac.jp> References: <458790E2.8040607@ar.media.kyoto-u.ac.jp> <200612191503.32995.faltet@carabos.com> <4588A19F.3050405@ar.media.kyoto-u.ac.jp> Message-ID: <200612201308.36419.faltet@carabos.com> A Dimecres 20 Desembre 2006 03:36, David Cournapeau escrigu?: > Francesc Altet wrote: > > A Dimarts 19 Desembre 2006 08:12, David Cournapeau escrigu?: > >> Hi, > >> > >> Following the discussion on clip and other functions which *may* be > >> slow in numpy, I would like to know if there is a way to easily profile > >> numpy, ie functions which are written in C. > >> For example, I am not sure to understand why a function like take(a, > >> b) with a a double 256x4 array and b a 8000x256 int array takes almost > >> 200 ms on a fairly fast CPU; in the source code, I can see that numpy > >> uses memmove, and I know memmove to be slower than memcpy. Is there an > >> easy way to check that this is coming from memmove (case in which > >> nothing much can be done to improve the situation I guess), and not from > >> something else ? > > Concerning the memmove vs memcpy: the problem I was speaking about is in > another function (numpy take), where the problem is much bigger speed wise. Ops. Yes, you are right. Out of curiosity, I've looked into this as well, and created a small script to benchmark take(), which is listed at the end of the message. I've used a = double 256x4 and b = int 80x256 (my b is quite smaller than yours, mainly because of the time that it takes to generate with my current approach; by the way, if anybody knows an easy way to do a cartesian product with numpy, please, tell me), but I don't think this is going to influence the timings. 
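Regarding the cartesian product aside above: for the two-array case used
here, a numpy-only approach might do -- repeat() for the first axis and a
cyclic resize() for the second. This is only a sketch (the function name
is made up, and I have only considered 1-d inputs of the same dtype):

import numpy

def cartesian2(a, b):
    # Cartesian product of two 1-d arrays as a (len(a)*len(b), 2) array:
    # column 0 repeats each element of a len(b) times, while column 1
    # cycles through b.
    na, nb = len(a), len(b)
    out = numpy.empty((na * nb, 2), dtype=a.dtype)
    out[:, 0] = numpy.repeat(a, nb)
    out[:, 1] = numpy.resize(b, na * nb)
    return out

For the index generation, cartesian2(iaxis, jaxis) should produce the same
pairs as the cartesian() helper in the script at the end of this message.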
With it, here is the cProfile for take(a,b) (1000 iterations):

 2862 function calls in 6.907 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1000    6.679    0.007    6.679    0.007 {method 'take' of 'numpy.ndarray' objects}
      1    0.138    0.138    0.138    0.138 {numpy.core.multiarray.array}
      1    0.059    0.059    6.907    6.907 prova.py:30(bench_take)

and here is the output from oprofile for the same benchmark:

Profiling through timer interrupt
samples  %        image name       symbol name
1360     83.0281  libc-2.3.6.so    memmove
167      10.1954  multiarray.so    PyArray_TakeFrom
11        0.6716  multiarray.so    .plt

and, if we replace memmove by memcpy:

Profiling through timer interrupt
samples  %        image name       symbol name
1307     82.0980  libc-2.3.6.so    memcpy
178      11.1809  multiarray.so    PyArray_TakeFrom
13        0.8166  multiarray.so    .plt

So, again, it seems like we have the same pattern as with the .clip()
method: the bottleneck is in calling memmove (the same would go for
memcpy) for every element in the array a that has to be taken.

A workaround for speeding this up (as suggested by Travis in his
excellent book) is to replace:

c = take(a, b)

by

c = a.flat[b]

Here are the cProfile results for the version using fancy indexing:

 862 function calls in 3.455 CPU seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    3.300    3.300    3.455    3.455 prova.py:30(bench_take)
      1    0.132    0.132    0.132    0.132 {numpy.core.multiarray.array}
    257    0.018    0.000    0.018    0.000 {map}

which is 2x faster than the take approach.

The oprofile output for the fancy indexing approach:

samples  %        image name       symbol name
426      53.2500  multiarray.so    iter_subscript
277      34.6250  multiarray.so    DOUBLE_copyswap
9         1.1250  python2.5        PyString_FromFormatV

seems to tell us that memmove/memcpy are not called at all, but that the
DOUBLE_copyswap function is instead. This is in fact only apparent,
because if we look at the code of DOUBLE_copyswap (found in
arraytypes.inc.src):

@fname@_copyswap (void *dst, void *src, int swap, void *arr)
{

    if (src != NULL) /* copy first if needed */
        memcpy(dst, src, sizeof(@type@));

[where the numpy code generator is replacing @fname@ by DOUBLE]

we see that memcpy is called under the hood (I don't know why oprofile
is not able to detect this call anymore).

After looking at the function, and remembering what Charles Harris
said in a previous message about the convenience of using a simple type
specific assignment, I've ended up replacing the memcpy. Here is the
patch:

--- numpy/core/src/arraytypes.inc.src	(revision 3487)
+++ numpy/core/src/arraytypes.inc.src	(working copy)
@@ -997,11 +997,11 @@
 }

 static void
-@fname@_copyswap (void *dst, void *src, int swap, void *arr)
+@fname@_copyswap (@type@ *dst, @type@ *src, int swap, void *arr)
 {

     if (src != NULL) /* copy first if needed */
-        memcpy(dst, src, sizeof(@type@));
+        *dst = *src;

     if (swap) {
         register char *a, *b, c;

and after this, timings seem to improve a bit. With cProfile:

 862 function calls in 3.251 CPU seconds

 Ordered by: internal time, call count

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    3.092    3.092    3.251    3.251 prova.py:31(bench_take)
      1    0.135    0.135    0.135    0.135 {numpy.core.multiarray.array}
    257    0.018    0.000    0.018    0.000 {map}

which is around 6% faster. With oprofile:

samples  %        image name       symbol name
525      64.7349  multiarray.so    iter_subscript
186      22.9346  multiarray.so    DOUBLE_copyswap
8         0.9864  python2.5        PyString_FromFormatV

so, DOUBLE_copyswap seems around 50% faster (186 samples vs 277) now
due to the use of the type-specific assignment trick.
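For anybody who wants to reproduce the take() vs. fancy indexing
comparison without setting up a profiler, a plain timeit version along
these lines should do (just a sketch; shapes as above, with the index
generation simplified to randint instead of the cartesian product):

import timeit

setup = """
import numpy
a = numpy.random.randn(256, 4)
b = numpy.random.randint(0, a.size, (80, 256))
"""
# take(a, b) flattens a and copies element by element through copyswap;
# a.flat[b] exercises the iter_subscript path discussed here.
for stmt in ("numpy.take(a, b)", "a.flat[b]"):
    timer = timeit.Timer(stmt, setup)
    print stmt, min(timer.repeat(3, 1000))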
It seems to me that the above patch is safe, and besides, the complete test suite in numpy passes (in fact, it runs around a 6% faster), so perhaps it would be a nice thing to apply it. In this sense, it would be good to do a overhauling of the NumPy code so as to discover other places where this trick can be applied. There is still a long way to get an optimal replacement for the take() method (can iter_subscript be further optimised?), but this may help a bit. Cheers, -------------------------------------------------------------------------- import numpy niter = 1000 d0, d1 = (256, 4) #d0, d1 = (6, 4) def generate_data_2d(d0, d1): return numpy.random.randn(d0, d1) def cartesian(Lists): import operator if Lists: result = map(lambda I: (I,), Lists[0]) for list in Lists[1:]: curr = [] for item in list: new = map(operator.add, result, [(item,)]*len(result)) curr[len(curr):] = new result = curr else: result = [] return result def generate_data_indices(d0, d1, N1, N2): iaxis = numpy.random.randint(0, d0, N1) jaxis = numpy.random.randint(0, d1, N2) return numpy.array(cartesian([iaxis.tolist(), jaxis.tolist()])) def bench_take(): a = generate_data_2d(d0, d1) b = generate_data_indices(d0, d1, 80, 256) #b = generate_data_indices(d0, d1, 8, 2) for i in range(niter): #c = numpy.take(a, b) c = a.flat[b] # equivalent using fancy indexing if __name__ == '__main__': # test take import pstats import cProfile as prof profile_wanted = False if not profile_wanted: bench_take() else: profile_file = 'take.prof' prof.run('bench_take()', profile_file) stats = pstats.Stats(profile_file) stats.strip_dirs() stats.sort_stats('time', 'calls') stats.print_stats() -------------------------------------------------------------------------- -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From charlesr.harris at gmail.com Wed Dec 20 10:25:13 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Dec 2006 08:25:13 -0700 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: <4588FC54.7040106@ee.byu.edu> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <4586E3B4.2040609@hawaii.edu> <45877445.20508@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> <4588901B.5030904@ee.byu.edu> <4588A055.5070707@ar.media.kyoto-u.ac.jp> <4588FC54.7040106@ee.byu.edu> Message-ID: On 12/20/06, Travis Oliphant wrote: > > > > My question is then: is there any plan to change this ? If not, is > > this > > for some reasons I don't see, or is this just because of lack of > > manpower ? > > > > > > I raised the possibility of breaking up the files before and Travis > > was agreeable to the idea. It is still in the back of my mind but I > > haven't got around to doing anything about it. Maybe we should put > > together a step by step approach, agree on some file names for the new > > files, fix the build so it loads in the new stub files in the correct > > order, and then start moving stuff. My own original desire was to > > break out the keyword parsers into a separate file but I think Travis > > had different priorities. > > The problem with separate files is (and has always been) the NumPy > C-API. I tried to use separate files to some extent (and then use > #include to make it all one big file). The C-API is exposed by filling > in a table of function pointers. 
You will notice that when
> arrayobject.h is included for an extension module, all of the C-API is
> defined to pull a particular function pointer out of a table that is
> stored in a Python CObject in the multiarray module extension itself.
> Basically, NumPy is following the standard Python advice (as Numeric and
> Numarray did) about how to expose a C-API, but it's just gotten a bit big.
>
> Solutions to that problem are always welcome.

I've been thinking about that a bit. One solution is to have a small
python program that takes all the pieces and writes one big build file;
I think something like that happens now. Another might be to use
includes in a base file; there is nothing sacred about not including .c
files or not putting code in .h files, it is just a convention, and we
could even choose another extension. I also wonder if we couldn't just
link in object files. The table of function pointers just needs some
addresses and, while the python convention of hiding all the function
names by using static functions is nice, it is probably not required.
Maybe we could use ctypes in some way?

I am not pushing any of these alternatives at the moment, just putting
them down. Maybe there are others?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jdhunter at ace.bsd.uchicago.edu  Wed Dec 20 10:40:49 2006
From: jdhunter at ace.bsd.uchicago.edu (John Hunter)
Date: Wed, 20 Dec 2006 09:40:49 -0600
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <4588D47D.6080206@ar.media.kyoto-u.ac.jp> (David Cournapeau's
	message of "Wed, 20 Dec 2006 15:13:17 +0900")
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
	<4588A2BE.4030801@ar.media.kyoto-u.ac.jp>
	<87ac1j9omt.fsf@peds-pc311.bsd.uchicago.edu>
	<4588D47D.6080206@ar.media.kyoto-u.ac.jp>
Message-ID: <873b7algn2.fsf@peds-pc311.bsd.uchicago.edu>

>>>>> "David" == David Cournapeau writes:

    David> Of this 300 ms spent in Colormap functor, 200 ms are taken
    David> by the take function: this is the function which I think
    David> can be speed up considerably.

Sorry I had missed this in the previous conversations. It is
impressive that take is taking such a big chunk of the __call__ time,
because there is a lot of other stuff going on in that function! You
might want to run this against numarray for comparison -- in a few
instances Travis has been able to find some big wins by borrowing from
numarray.

JDH

From tim.hochberg at ieee.org  Wed Dec 20 11:20:01 2006
From: tim.hochberg at ieee.org (Tim Hochberg)
Date: Wed, 20 Dec 2006 09:20:01 -0700
Subject: [Numpy-discussion] Type of 1st argument in Numexpr where()
In-Reply-To: <20061220120226.GC11105@tardis.terramar.selidor.net>
References: <20061220120226.GC11105@tardis.terramar.selidor.net>
Message-ID: <458962B1.9010005@ieee.org>

Ivan Vilata i Balaguer wrote:
> Hi all,
>
> I noticed that the set of ``where()`` functions defined by Numexpr all
> have a signature like ``xfxx``, i.e. the first argument is a float and
> the return, second and third arguments are of the same type (whatever it
> is).
>
> Since the first argument effectively represents a condition, wouldn't it
> make more sense for it to be a boolean?  Booleans are already supported
> by Numexpr, maybe the old signatures are just a legacy from the time
> when Numexpr didn't support them.
>
>
Actually, this is on purpose. Numpy.where (and most other switching
constructs in Python) will switch on almost anything.
In particular, any
number that is nonzero is considered True, zero is considered False. By
changing the signature, you're restricting where to accepting only
booleans. Since booleans and ints can be freely cast to doubles in
numexpr, always using float for the condition saves us a couple of opcodes.

[I just realized that numpy.where also handles complex conditions, and I
suspect that numexpr.where will refuse those. That should probably be
fixed at some point I suppose]

Anyway, in theory it would be more efficient to supply a separate boolean
version of the opcode in *addition* to the float version (and potentially
an int version as well although that is less compelling), since it would
save a cast. However, I'm always worried that increasing the opcode count
is going to slow down the numexpr interpreter, so I tend to push back on
those unless it's demonstrably a speed win.

regards

-tim

From strawman at astraw.com  Wed Dec 20 13:32:37 2006
From: strawman at astraw.com (Andrew Straw)
Date: Wed, 20 Dec 2006 10:32:37 -0800
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <200612201308.36419.faltet@carabos.com>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
	<4588A19F.3050405@ar.media.kyoto-u.ac.jp>
	<200612201308.36419.faltet@carabos.com>
Message-ID: <458981C5.3070700@astraw.com>

I added a ticket for Francesc's enhancement:
http://projects.scipy.org/scipy/numpy/ticket/403

From ivilata at carabos.com  Wed Dec 20 13:51:55 2006
From: ivilata at carabos.com (Ivan Vilata i Balaguer)
Date: Wed, 20 Dec 2006 19:51:55 +0100
Subject: [Numpy-discussion] Type of 1st argument in Numexpr where()
In-Reply-To: <458962B1.9010005@ieee.org>
References: <20061220120226.GC11105@tardis.terramar.selidor.net>
	<458962B1.9010005@ieee.org>
Message-ID: <20061220185155.GE11105@tardis.terramar.selidor.net>

Tim Hochberg (on 2006-12-20 at 09:20:01 -0700) wrote::

> Actually, this is on purpose. Numpy.where (and most other switching
> constructs in Python) will switch on almost anything. In particular, any
> number that is nonzero is considered True, zero is considered False. By
> changing the signature, you're restricting where to accepting only
> booleans. Since booleans and ints can be freely cast to doubles in
> numexpr, always using float for the condition saves us a couple of opcodes.
> [...]

Yes, I understand the reasons you lay out here. Now that you've brought
the topic up, I'm curious about what "always using float for the
condition saves us a couple of opcodes" means. Could you explain this?
Just out of curiosity. :)

::

	Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
	       Cárabos Coop. V.  V  V   Enjoy Data
	                         ""

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 309 bytes
Desc: Digital signature
URL: 

From oliphant at ee.byu.edu  Wed Dec 20 13:58:23 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed, 20 Dec 2006 11:58:23 -0700
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <200612201308.36419.faltet@carabos.com>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
	<4588A19F.3050405@ar.media.kyoto-u.ac.jp>
	<200612201308.36419.faltet@carabos.com>
Message-ID: <458987CF.2050800@ee.byu.edu>

Francesc Altet wrote:

>seems to tell us that memmove/memcopy are not called at all, but
>instead the DOUBLE_copyswap function.
This is in fact an apparence, >because if we look at the code of DOUBLE_copyswap (found in >arraytypes.inc.src): > >@fname at _copyswap (void *dst, void *src, int swap, void *arr) >{ > > if (src != NULL) /* copy first if needed */ > memcpy(dst, src, sizeof(@type@)); > >[where the numpy code generator is replacing @fname@ by DOUBLE] > >we see that memcpy is called under the hood (I don't know why oprofile >is not able to detect this call anymore). > >After looking at the function, and remembering what Charles Harris >said in a previous message about the convenience to use a simple type >specific assignment, I've ended replacing the memcpy. Here it is the >patch: > >--- numpy/core/src/arraytypes.inc.src (revision 3487) >+++ numpy/core/src/arraytypes.inc.src (working copy) >@@ -997,11 +997,11 @@ > } > > static void >- at fname@_copyswap (void *dst, void *src, int swap, void *arr) >+ at fname@_copyswap (@type@ *dst, @type@ *src, int swap, void *arr) > { > > if (src != NULL) /* copy first if needed */ >- memcpy(dst, src, sizeof(@type@)); >+ *dst = *src; > > if (swap) { > register char *a, *b, c; > >and after this, timings seems to improve a bit. With CProfile: > > 862 function calls in 3.251 CPU seconds > > Ordered by: internal time, call count > > ncalls tottime percall cumtime percall filename:lineno(function) > 1 3.092 3.092 3.251 3.251 prova.py:31(bench_take) > 1 0.135 0.135 0.135 0.135 {numpy.core.multiarray.array} > 257 0.018 0.000 0.018 0.000 {map} > >which is around a 6% faster. With oprofile: > >samples % image name symbol name >525 64.7349 multiarray.so iter_subscript >186 22.9346 multiarray.so DOUBLE_copyswap >8 0.9864 python2.5 PyString_FromFormatV > >so, DOUBLE_copyswap seems around a 50% faster (186 samples vs 277) now >due to the use of the type specific assignment trick. > >It seems to me that the above patch is safe, and besides, the complete >test suite in numpy passes (in fact, it runs around a 6% faster), so >perhaps it would be a nice thing to apply it. In this sense, it would >be good to do a overhauling of the NumPy code so as to discover other >places where this trick can be applied. > > This is a good idea. We've used this trick in the general-purpose copying code. Compilers seem to do a better job of handling the direct assignment than using general-purpose memcpy. I suspect we should look at every use of memcpy and see if it can't be improved. -Travis From faltet at carabos.com Wed Dec 20 14:09:01 2006 From: faltet at carabos.com (Francesc Altet) Date: Wed, 20 Dec 2006 20:09:01 +0100 Subject: [Numpy-discussion] Profiling numpy ? (parts written in C) In-Reply-To: <458981C5.3070700@astraw.com> References: <458790E2.8040607@ar.media.kyoto-u.ac.jp> <200612201308.36419.faltet@carabos.com> <458981C5.3070700@astraw.com> Message-ID: <200612202009.02736.faltet@carabos.com> A Dimecres 20 Desembre 2006 19:32, Andrew Straw escrigu?: > I added a ticket for Francesc's enhancement: > http://projects.scipy.org/scipy/numpy/ticket/403 Thanks Andrew, but I realized that my patch is not safe for dealing with unaligned arrays (Sun machines would segfault). After thinking several alternatives, I've ended modifying the iter_subscript_* funtions instead (see the new patch below). For this, I've created a small function named assign_behaved() that only will get called when the arrays source and destination are well behaved (i.e. aligned and in native byteorder), and small enough so that optimizers can easily inline it (this is key so as to achieve the new speed-up). 
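For reference, the "well behaved" condition tested by the
PyArray_ISBEHAVED_RO macro corresponds roughly to what the array flags
expose at the Python level. A tiny sketch of the equivalent check, just
to illustrate what the fast path requires:

import numpy

def is_behaved_ro(a):
    # Aligned data in native byte order; the writeable requirement is
    # only part of PyArray_ISBEHAVED, not of the read-only variant.
    return a.flags.aligned and a.dtype.isnative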
The results are quite good, as one can achieve almost a 2x speedup over
the original functions. Here is the time for a.flat[b] (in original
numpy):

 862 function calls in 3.482 CPU seconds

 Ordered by: internal time, call count

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    3.325    3.325    3.482    3.482 prova.py:31(bench_take)
      1    0.133    0.133    0.133    0.133 {numpy.core.multiarray.array}
    257    0.017    0.000    0.017    0.000 {map}

and here with the new patch applied:

 862 function calls in 1.815 CPU seconds

 Ordered by: internal time, call count

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    1.662    1.662    1.815    1.815 prova.py:31(bench_take)
      1    0.131    0.131    0.131    0.131 {numpy.core.multiarray.array}
    257    0.016    0.000    0.016    0.000 {map}

We can compare this against the original take(a, b):

 2862 function calls in 7.030 CPU seconds

 Ordered by: internal time, call count

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1000    6.792    0.007    6.792    0.007 {method 'take' of 'numpy.ndarray' objects}
      1    0.142    0.142    0.142    0.142 {numpy.core.multiarray.array}
      1    0.063    0.063    7.030    7.030 prova.py:31(bench_take)

So, the iterator approach plus the patch is more than 4x faster.

Given these results, the iterator provided by Travis is becoming very
useful for dealing with a wider range of situations without losing
performance (or even gaining quite a lot, as in the above example).

Below is the patch. I've checked that it passes all the tests in numpy,
but still, maybe Travis could see if I forgot something important. Also,
it would be nice to look into other places in the code that can benefit
from the new assign_behaved() function.

Index: numpy/core/src/arrayobject.c
===================================================================
--- numpy/core/src/arrayobject.c	(revision 3487)
+++ numpy/core/src/arrayobject.c	(working copy)
@@ -8988,6 +8988,19 @@
     return self->size;
 }

+/* Specific function that accelerates the copy of some types through
+   assignments */
+static void
+assign_behaved(void *dest, void *src, size_t itemsize)
+{
+    switch (itemsize) {
+    case 1: *((npy_int8 *)dest) = *((npy_int8 *)src); break;
+    case 2: *((npy_int16 *)dest) = *((npy_int16 *)src); break;
+    case 4: *((npy_int32 *)dest) = *((npy_int32 *)src); break;
+    /* npy_float64 is more efficient than npy_int64 in assignments */
+    case 8: *((npy_float64 *)dest) = *((npy_float64 *)src); break;
+    default: memcpy(dest, src, itemsize); break;
+    }
+}

 static PyObject *
 iter_subscript_Bool(PyArrayIterObject *self, PyArrayObject *ind)
@@ -8996,7 +9009,7 @@
     intp count=0;
     char *dptr, *optr;
     PyObject *r;
-    int swap;
+    int swap, isbehaved;
     PyArray_CopySwapFunc *copyswap;
@@ -9038,6 +9051,9 @@
     swap = (PyArray_ISNOTSWAPPED(self->ao) != PyArray_ISNOTSWAPPED(r));
     while(index--) {
         if (*((Bool *)dptr) != 0) {
+            if (isbehaved)
+                assign_behaved(optr, self->dataptr, itemsize);
+            else
                 copyswap(optr, self->dataptr, swap, self->ao);
             optr += itemsize;
         }
@@ -9055,9 +9071,9 @@
     PyObject *r;
     PyArrayIterObject *ind_it;
     int itemsize;
-    int swap;
+    int swap, isbehaved;
     char *optr;
-    int index;  /* shouldn't be intp? 
*/ PyArray_CopySwapFunc *copyswap; itemsize = self->ao->descr->elsize; @@ -9092,6 +9108,7 @@ index = ind_it->size; copyswap = PyArray_DESCR(r)->f->copyswap; swap = (PyArray_ISNOTSWAPPED(r) != PyArray_ISNOTSWAPPED(self->ao)); + isbehaved = PyArray_ISBEHAVED(r) && PyArray_ISBEHAVED_RO(self->ao); while(index--) { num = *((intp *)(ind_it->dataptr)); if (num < 0) num += self->size; @@ -9106,7 +9123,10 @@ return NULL; } PyArray_ITER_GOTO1D(self, num); - copyswap(optr, self->dataptr, swap, r); + if (isbehaved) + assign_behaved(optr, self->dataptr, itemsize); + else + copyswap(optr, self->dataptr, swap, r); optr += itemsize; PyArray_ITER_NEXT(ind_it); } ---------------------------------------------------------------------- Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From tim.hochberg at ieee.org Wed Dec 20 16:29:57 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Wed, 20 Dec 2006 14:29:57 -0700 Subject: [Numpy-discussion] Type of 1st argument in Numexpr where() In-Reply-To: <20061220185155.GE11105@tardis.terramar.selidor.net> References: <20061220120226.GC11105@tardis.terramar.selidor.net> <458962B1.9010005@ieee.org> <20061220185155.GE11105@tardis.terramar.selidor.net> Message-ID: <4589AB55.5080602@ieee.org> Ivan Vilata i Balaguer wrote: > Tim Hochberg (el 2006-12-20 a les 09:20:01 -0700) va dir:: > > >> Actually, this is on purpose. Numpy.where (and most other switching >> constructs in Python) will switch on almost anything. In particular, any >> number that is nonzero is considered True, zero is considered False. By >> changing the signature, you're restricting where to only accepting >> booleans. Since booleans and ints can by freely cast to doubles in >> numexpr, always using float for the condition saves us a couple of opcodes. >> [...] >> > > Yes, I understand the reasons you expose here. Nou you brought the > topic about, I'm curious about what does "always using float for the > condition saves us a couple of opcodes" mean. Could you explain this? > Just for curiosity. :) > Let's look at simpler than where, which is a confusing function. How about *sin*. Also, let's pretend complex numbers don't exist to make things still simpler. There is only a single *sin* function defined in the numexpr interpreter, and it operates on floats. This works because the numexpr compiler is smart enough to insert cast opcodes to convert boolean or integer types to floats before operating on the with the *sin* opcode which strictly works on floats (remember we are pretending complex numbers don't exist). The situation with the first argument to where is analogous. Booleans and ints are automagically promoted to floats. Since the opcode is designed to work on floats everything works great. And, we only need a single opcode to treat bools, ints and float. That is where "saving a couple of opcodes" comes in. However:: 1. Booleans are probably more common than floats as the argument to where. At present floats are the most efficient case; other cases incur some extra overhead due to casting. 2. It doesn't work for complex values. Problem #2 is easily fixable, should we so desire, simply by adding another opcode. Problem #1 is not so easy. It would be possible to adapt your original idea. We could do the following: 1. Add a function boolean() to the numexpr namespace. This would cast it's argument to an array of bools. 2. Tweak the compile (actually, probably where_func in expressions.py) to compile where(x,a,b) as where(bool(x),a,b) 3. 
Change where to take bools as the first argument.

Or, maybe it would be cleaner to instead change the casting rules so
that casting to bool happens automagically. Having cycles in the casting
rules frightens me a bit, but it could probably be made to work.

So, in summary, I think that the general idea you proposed could be made
to work with some more effort. Conceptually, it's cleaner and it could
be made more efficient for the common case. On the downside, this would
require three new opcodes, as opposed to a single new opcode to do the
simple-minded fix. So, I'm still a bit up in the air as to whether it's
a good idea.

-tim

From charlesr.harris at gmail.com  Wed Dec 20 17:22:54 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 20 Dec 2006 15:22:54 -0700
Subject: [Numpy-discussion] Profiling numpy ? (parts written in C)
In-Reply-To: <200612201308.36419.faltet@carabos.com>
References: <458790E2.8040607@ar.media.kyoto-u.ac.jp>
	<200612191503.32995.faltet@carabos.com>
	<4588A19F.3050405@ar.media.kyoto-u.ac.jp>
	<200612201308.36419.faltet@carabos.com>
Message-ID: 

On 12/20/06, Francesc Altet wrote:
>
> On Wednesday 20 December 2006 at 03:36, David Cournapeau wrote:
> > Francesc Altet wrote:
> > > On Tuesday 19 December 2006 at 08:12, David Cournapeau wrote:
> > >> Hi,
> > >>
>
> @fname@_copyswap (void *dst, void *src, int swap, void *arr)
> {
>
>     if (src != NULL) /* copy first if needed */
>         memcpy(dst, src, sizeof(@type@));
>
> [where the numpy code generator is replacing @fname@ by DOUBLE]
>
> we see that memcpy is called under the hood (I don't know why oprofile
> is not able to detect this call anymore).
>
> After looking at the function, and remembering what Charles Harris
> said in a previous message about the convenience of using a simple type
> specific assignment, I've ended up replacing the memcpy. Here is the
> patch:
>
> --- numpy/core/src/arraytypes.inc.src	(revision 3487)
> +++ numpy/core/src/arraytypes.inc.src	(working copy)
> @@ -997,11 +997,11 @@
>  }
>
>  static void
> -@fname@_copyswap (void *dst, void *src, int swap, void *arr)
> +@fname@_copyswap (@type@ *dst, @type@ *src, int swap, void *arr)
>  {
>
>      if (src != NULL) /* copy first if needed */
> -        memcpy(dst, src, sizeof(@type@));
> +        *dst = *src;
>
>      if (swap) {
>          register char *a, *b, c;

We could get rid of the register keyword too; it is considered obsolete
these days. Also, for most architectures

#if SIZEOF_@fsize@ == 4
        b = a + 3;
        c = *a; *a++ = *b; *b-- = c;
        c = *a; *a++ = *b; *b = c;

will be notably slower than

#if SIZEOF_@fsize@ == 4
        c = a[0]; a[0] = a[3]; a[3] = c;
        c = a[1]; a[1] = a[2]; a[2] = c;

because loading the indexed addresses is a single instruction if a is in
a register. Inlining would also be good, but can be tricky and compiler
dependent. If all the code is in one big chunk, things aren't so bad and
a simple inline directive should do the trick. We would also want to
break the subroutine up into smaller pieces so that the common case was
inlined and the more complicated cases remained function calls.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mjanikas at esri.com  Wed Dec 20 17:48:06 2006
From: mjanikas at esri.com (Mark Janikas)
Date: Wed, 20 Dec 2006 14:48:06 -0800
Subject: [Numpy-discussion] Newbie Question, Probability
Message-ID: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com>

Hello all,

Is there a way to get probability values for the various families of
distributions in numpy?  I.e.
a la R:

> pnorm(1.96, mean = 0 , sd = 1)
[1] 0.9750021   # for the normal
> pt(1.65, df=100)
[1] 0.9489597   # for student t

Any suggestions would be greatly appreciated.

Mark Janikas
Product Engineer
ESRI, Geoprocessing
380 New York St.
Redlands, CA 92373
909-793-2853 (2563)
mjanikas at esri.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tom.denniston at alum.dartmouth.org  Wed Dec 20 18:02:41 2006
From: tom.denniston at alum.dartmouth.org (Tom Denniston)
Date: Wed, 20 Dec 2006 17:02:41 -0600
Subject: [Numpy-discussion] A question about argmax and argsort
In-Reply-To: 
References: 
Message-ID: 

If you want the n largest items, I would recommend quicksort, but at
each partition you only recurse into the side of the pivot that has the
values you care about. This is easy to determine because you know how
many items are on either side of the pivot and you know that you want
the nth item. This makes it take N + N/2 + N/4 + ... time, which
telescopes to 2N, i.e. O(N), rather than the O(N log N) of sorting
algorithms. I've found empirically this beats the pants off quicksorting
and taking the nth value.

I don't know of a way to do this in numpy. I think it would require
adding a C function to numpy. Perhaps an "argnth" function?

Does anyone else know of an existing mechanism?

On 12/12/06, asaf david wrote:
> Hello
> Let's say i have an N sized array, and i want to get the positions of the K
> largest items. for K = 1 this is simply argmax. is there any way to
> generalize it for k !=1? currently I use argsort and take only K items from
> it, but I'm paying an additional ~lg(N)...
>
> Thanks in advance, asaf
>
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
>
>

From pgmdevlist at gmail.com  Wed Dec 20 18:17:54 2006
From: pgmdevlist at gmail.com (Pierre GM)
Date: Wed, 20 Dec 2006 18:17:54 -0500
Subject: [Numpy-discussion] A question about argmax and argsort
In-Reply-To: 
References: 
Message-ID: <200612201817.54428.pgmdevlist@gmail.com>

On Wednesday 20 December 2006 18:02, Tom Denniston wrote:
> If you want the n largest items, I would recommend quicksort
...
> I don't know of a way to do this in numpy. I think it would require
> adding a C function to numpy. Perhaps an "argnth" function?
>
> Does anyone else know of an existing mechanism?

Is it really needed when you have argsort ?

>>> x=N.array([1,3,5,2,4])
>>> ax=N.argsort(x)
>>> ax
array([0, 3, 1, 4, 2])
>>> x[ax[0]], x[ax[-1]], x[ax[-3]]
(1, 5, 3)

Or am I once again missing the point entirely ?

From robert.kern at gmail.com  Wed Dec 20 18:30:40 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 20 Dec 2006 17:30:40 -0600
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com>
References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com>
Message-ID: <4589C7A0.6030008@gmail.com>

Mark Janikas wrote:
> Hello all,
>
> Is there a way to get probability values for the various families of
> distributions in numpy?  I.e. a la R:

We have a full complement of PDFs, CDFs, etc. in scipy.
In [1]: from scipy import stats In [2]: stats.norm.pdf(1.96, loc=0.0, scale=1.0) Out[2]: array(0.058440944333451469) In [3]: stats.norm.cdf(1.96, loc=0.0, scale=1.0) Out[3]: array(0.97500210485177952) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Wed Dec 20 18:34:32 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 20 Dec 2006 17:34:32 -0600 Subject: [Numpy-discussion] A question about argmax and argsort In-Reply-To: <200612201817.54428.pgmdevlist@gmail.com> References: <200612201817.54428.pgmdevlist@gmail.com> Message-ID: <4589C888.9010604@gmail.com> Pierre GM wrote: > On Wednesday 20 December 2006 18:02, Tom Denniston wrote: >> If you want the n largest item i would recommend quicksort > ... >> I don't know of a way to do this in numpy. I think it would require >> adding a cfunction to numpy. Perhaps an "argnth" function? >> >> Does anyone else know of an existing mechanism? > > Is it really needed when you have argsort ? >>>> x=N.array([1,3,5,2,4]) >>>> ax=N.argsort(x) >>>> ax > array([0, 3, 1, 4, 2]) >>>> x[ax[0]], x[ax[-1]], x[ax-3]] > 1, 5, 3 > > Or am I once again missing the point entirely ? There are algorithms that can be faster if you can ignore the bulk of the irrelevant data. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gnchen at mac.com Wed Dec 20 19:44:07 2006 From: gnchen at mac.com (Gennan Chen) Date: Wed, 20 Dec 2006 16:44:07 -0800 Subject: [Numpy-discussion] PyArray_DIMS problem Message-ID: <4F7A1316-B04F-4C3C-929B-B2420F97A139@mac.com> Hi! I have problem with this function call under FC6 X86_64 for my own numpy extension printf("\n %d %d %d", PyArray_DIM(imgi,0),PyArray_DIM(imgi, 1),PyArray_DIM(imgi,2)) it gave me 166 256 256 if I tried: int *dim; dim = PyArray_DIMS(imgi) printf("\n %d %d %d", dim[0], dim[1], dim[2]); it gave me 166 0 256 Numpy version: In [2]: numpy.__version__ Out[2]: '1.0.2.dev3487' I did test it under OS X 10.4.8 on MacPro. Those two methods gave me the exact results. So, what happens here ?? Gen-Nan Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at ee.byu.edu Wed Dec 20 20:25:23 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed, 20 Dec 2006 18:25:23 -0700 Subject: [Numpy-discussion] PyArray_DIMS problem In-Reply-To: <4F7A1316-B04F-4C3C-929B-B2420F97A139@mac.com> References: <4F7A1316-B04F-4C3C-929B-B2420F97A139@mac.com> Message-ID: <4589E283.7000603@ee.byu.edu> Gennan Chen wrote: > Hi! > > I have problem with this function call under FC6 X86_64 for my own > numpy extension > > printf("\n %d %d %d", > PyArray_DIM(imgi,0),PyArray_DIM(imgi,1),PyArray_DIM(imgi,2)) > > it gave me > > 166 256 256 > > if I tried: > > int *dim; > dim = PyArray_DIMS(imgi) > printf("\n %d %d %d", dim[0], dim[1], dim[2]); > > it gave me 166 0 256 > > Numpy version: > > In [2]: numpy.__version__ > Out[2]: '1.0.2.dev3487' > > I did test it under OS X 10.4.8 on MacPro. Those two methods gave me > the exact results. So, what happens here ?? > No idea. You should try PyArray_DIMS(imgi)[0], PyArray_DIMS(imgi)[1], PyArray_DIMS(imgi)[2] and see what that does. 
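(A likely explanation, sketched from the Python side under the assumption of
a little-endian 64-bit platform like FC6 x86_64: npy_intp is 8 bytes there
while int is 4, so reading the dimensions member through an int* sees each
64-bit value as a pair of 32-bit words:

import numpy as N

# what the C-level dimensions member holds: three npy_intp values
dims = N.array([166, 256, 256], dtype=N.intp)

# the same bytes reinterpreted as 32-bit ints -- each value is
# followed by its zero high word, which is what an int* walks over
print dims.view(N.int32)[:3]    # -> [166   0 256]

This reproduces the reported "166 0 256" exactly, so declaring the pointer
as npy_intp *dim rather than int *dim should make both methods agree.)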
-Travis

From aisaac at american.edu  Wed Dec 20 20:41:29 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Wed, 20 Dec 2006 20:41:29 -0500
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <4589C7A0.6030008@gmail.com>
References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com><4589C7A0.6030008@gmail.com>
Message-ID:

On Wed, 20 Dec 2006, Robert Kern apparently wrote:
> We have a full complement of PDFs, CDFs, etc. in scipy.

This is my "most missed" functionality in NumPy.
(For now I feel I cannot ask students to install SciPy.)
Although it is a slippery slope, and I definitely do not
want NumPy to slide down it, I would certainly not complain
if this basic functionality were moved to NumPy...

Cheers,
Alan Isaac

From cjw at sympatico.ca  Wed Dec 20 18:11:24 2006
From: cjw at sympatico.ca (Colin J. Williams)
Date: Wed, 20 Dec 2006 18:11:24 -0500
Subject: [Numpy-discussion] sum of two arrays with different shape?
In-Reply-To: <4ff46d8f0612171826q262231f0kc3d5f2e8070046f9@mail.gmail.com>
References: <4ff46d8f0612171826q262231f0kc3d5f2e8070046f9@mail.gmail.com>
Message-ID:

zhang yunfeng wrote:
> Hi, I'm a newbie to Numpy.
>
> When reading the tutorial at
> http://www.scipy.org/Tentative_NumPy_Tutorial
> I found a snippet about the addition of two arrays with different
> shapes. Does it make sense? If the array shapes are not the same, why
> doesn't it throw an error?
>
I'm not sure what the rules are but this example throws an error, which
it should.

[Dbg]>>> x= N.array([1, 2, 3])
[Dbg]>>> y= x+[1, 2, 3, 4]
Traceback (most recent call last):
  File "", line 1, in
ValueError: shape mismatch: objects cannot be broadcast to a single shape
[Dbg]>>>

> see the code below (taken from the above webpage)
> array a.shape is (4,) and y.shape is (3,4) and a+y ?
>
> -------------------------------------------
> >>> y = arange(12)
> >>> y
> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
> >>> y.shape = 3,4    # does not modify the total number of elements
> >>> y
> array([[ 0, 1, 2, 3],
>        [ 4, 5, 6, 7],
>        [ 8, 9, 10, 11]])
>
> It is possible to operate with arrays of different dimensions as long
> as they fit well.
> >>> 3*a    # multiply each element of a by 3
> array([ 30, 60, 90, 120])
> >>> a+y    # sum a to each row of y
> array([[10, 21, 32, 43],
>        [14, 25, 36, 47],
>        [18, 29, 40, 51]])
> --------------------------------------------
>
This seems a reasonable operation.

Colin W.

> --
> http://my.opera.com/zhangyunfeng
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From lists.steve at arachnedesign.net  Wed Dec 20 21:35:44 2006
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 20 Dec 2006 21:35:44 -0500
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To:
References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com><4589C7A0.6030008@gmail.com>
Message-ID: <3FA28F05-9CCA-4F67-B1EF-2D2AF92D4BBA@arachnedesign.net>

On Dec 20, 2006, at 8:41 PM, Alan G Isaac wrote:
> On Wed, 20 Dec 2006, Robert Kern apparently wrote:
>> We have a full complement of PDFs, CDFs, etc. in scipy.
>
> This is my "most missed" functionality in NumPy.
> (For now I feel I cannot ask students to install SciPy.)

If they're already installing numpy, isn't 98% of the work already
done at that point?
I'm pretty sure you can install scipy w/o the fortran dependency if you're not concerned about speed and what not, right? It should be a pretty easy install. Besides ... what else do they have to do during the first week-and-a- half of the semester anyway, right? ;-) -steve From haase at msg.ucsf.edu Wed Dec 20 22:43:28 2006 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Wed, 20 Dec 2006 19:43:28 -0800 Subject: [Numpy-discussion] PyArray_DIMS problem In-Reply-To: <4F7A1316-B04F-4C3C-929B-B2420F97A139@mac.com> References: <4F7A1316-B04F-4C3C-929B-B2420F97A139@mac.com> Message-ID: On 12/20/06, Gennan Chen wrote: > Hi! > > > I have problem with this function call under FC6 X86_64 for my own numpy > extension > > > printf("\n %d %d %d", > PyArray_DIM(imgi,0),PyArray_DIM(imgi,1),PyArray_DIM(imgi,2)) > > > it gave me > > > 166 256 256 > > > if I tried: > > > int *dim; > dim = PyArray_DIMS(imgi) > printf("\n %d %d %d", dim[0], dim[1], dim[2]); > > > it gave me 166 0 256 > Hi - maybe I'm dense here - but how is this /supposed/ to work ? Is PyArray_DIMS allocating some memory that never gets freed !? I thought "tuples" in C had to always be passed into a function, so that that function could modify it, as in: const int maxNDim = 20; int dim[maxNDim]; PyArray_DIMS(imgi, dim); What am I missing ... ? -Sebastian -------------- next part -------------- An HTML attachment was scrubbed... URL: From gnchen at mac.com Wed Dec 20 23:48:58 2006 From: gnchen at mac.com (Gennan Chen) Date: Wed, 20 Dec 2006 20:48:58 -0800 Subject: [Numpy-discussion] PyArray_DIMS problem In-Reply-To: References: <4F7A1316-B04F-4C3C-929B-B2420F97A139@mac.com> Message-ID: Here is the definition of that call from ndarrayobject.h #define PyArray_DIMS(obj) (((PyArrayObject *)(obj))->dimensions) I believe the memory has been allocated. It just return a pointer. Gen On Dec 20, 2006, at 7:43 PM, Sebastian Haase wrote: > > > On 12/20/06, Gennan Chen wrote: > Hi! > > > I have problem with this function call under FC6 X86_64 for my own > numpy extension > > > printf("\n %d %d %d", PyArray_DIM(imgi,0),PyArray_DIM(imgi, > 1),PyArray_DIM(imgi,2)) > > > it gave me > > > 166 256 256 > > > if I tried: > > > int *dim; > dim = PyArray_DIMS(imgi) > printf("\n %d %d %d", dim[0], dim[1], dim[2]); > > > it gave me 166 0 256 > > > Hi - > maybe I'm dense here - > but how is this /supposed/ to work ? Is PyArray_DIMS allocating > some memory that never gets freed !? > I thought "tuples" in C had to always be passed into a function, > so that that function could modify it, as in: > > const int maxNDim = 20; > int dim[maxNDim]; > PyArray_DIMS(imgi, dim); > > > What am I missing ... ? > -Sebastian > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Wed Dec 20 23:59:47 2006 From: peridot.faceted at gmail.com (A. M. Archibald) Date: Wed, 20 Dec 2006 23:59:47 -0500 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com> <4589C7A0.6030008@gmail.com> Message-ID: On 20/12/06, Alan G Isaac wrote: > On Wed, 20 Dec 2006, Robert Kern apparently wrote: > > We have a full complement of PDFs, CDFs, etc. in scipy. > > This is my "most missed" functionality in NumPy. > (For now I feel cannot ask students to install SciPy.) 
> Although it is a slippery slope, and I definitely do not > want NumPy to slide down it, I would certainly not complain > if this basic functionaltiy were moved to NumPy... This is silly. If it were up to me I would rip out much of the fancy features from numpy and put them in scipy. It's really not very difficult to install, particularly if you don't much care how fast it is, or are using (say) a Linux distribution that packages it. It seems to me that numpy should include only tools for basic calculations on arrays of numbers. The ufuncs, simple wrappers (dot, for example). Anything that requires nontrivial amounts of math (matrix inversion, statistical functions, generating random numbers from exponential distributions, and so on) should go in scipy. If numpy were to satisfy everyone who says, "I like numpy, but I wish it included [their favourite feature from scipy] because I don't want to install scipy", numpy would grow to include everything in scipy. Perhaps an alternative criterion would be "it can go in numpy if it has no external requirements". I think this is a mistake, since it means users have a monstrous headache figuring out what is in which package (for example, some of scipy.integrate depends on external tools and some does not). Moreover it damages the performance of numpy. For example, dot would be faster (for arrays that happen to be matrix-shaped, and possibly in general) if it could use ATLAS' routine from BLAS. Of course, numpy is currently fettered by the need to maintain some sort of compatibility with Numeric and numarray; shortly it will have to worry about compatibility with previous versions of numpy as well. A. M. Archibald From svetosch at gmx.net Thu Dec 21 09:30:31 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Thu, 21 Dec 2006 15:30:31 +0100 Subject: [Numpy-discussion] Automatic matrices In-Reply-To: <45845609.40705@gmx.net> References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain> <45845609.40705@gmx.net> Message-ID: <458A9A87.2000302@gmx.net> Sven Schreiber schrieb: > Keith Goodman schrieb: > >> There are many numpy functions that will take a matrix as input but >> return an array. >> >> The nan functions (nanmin, nanmax, nanargmin, nanargmax, nansum) are an example. >> > > So that would be a bug IMHO and should be filed as a ticket. I will do > that eventually if nobody stops me first... > This is now ticket #405. Keith, you said "example", do you know of any other functions? Maybe you could add a comment to the ticket. cheers, sven From svetosch at gmx.net Thu Dec 21 10:10:17 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Thu, 21 Dec 2006 16:10:17 +0100 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com> <4589C7A0.6030008@gmail.com> Message-ID: <458AA3D9.6060805@gmx.net> A. M. Archibald schrieb: > On 20/12/06, Alan G Isaac wrote: >> This is my "most missed" functionality in NumPy. >> (For now I feel cannot ask students to install SciPy.) >> Although it is a slippery slope, and I definitely do not >> want NumPy to slide down it, I would certainly not complain >> if this basic functionaltiy were moved to NumPy... ... > If numpy were to satisfy everyone who says, "I like numpy, but I wish > it included [their favourite feature from scipy] because I don't want > to install scipy", numpy would grow to include everything in scipy. 
> Well my package manager just reported something like 800K for numpy and 20M for scipy, so I think we're not quite at the point of numpy taking over everything yet (if those numbers are actually meaningful, probably I'm missing something ?). I would also welcome if some functionality could be moved to numpy if the size requirements are reasonably small. Currently I try to avoid to depend on the scipy package to make my programs more portable, and I'm mostly successful, but not always. The p-value stuff in numpy would be helpful here, as Alan already said. Now I don't know if that stuff passes the size criterion, some expert would know that. But if it does, it would be nice if you could consider moving it over eventually. Of course you need to strike a balance, and the optimum is debatable. But again, if scipy is really more than 20 times the size of numpy, and some frequently used things are not in numpy, is there really an urgent need to freeze numpy's set of functionality? just a user's thought, sven From kwgoodman at gmail.com Thu Dec 21 11:13:54 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 21 Dec 2006 08:13:54 -0800 Subject: [Numpy-discussion] Automatic matrices In-Reply-To: <458A9A87.2000302@gmx.net> References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain> <45845609.40705@gmx.net> <458A9A87.2000302@gmx.net> Message-ID: On 12/21/06, Sven Schreiber wrote: > Sven Schreiber schrieb: > > Keith Goodman schrieb: > > > >> There are many numpy functions that will take a matrix as input but > >> return an array. > >> > >> The nan functions (nanmin, nanmax, nanargmin, nanargmax, nansum) are an example. > >> > > > > So that would be a bug IMHO and should be filed as a ticket. I will do > > that eventually if nobody stops me first... > > > > This is now ticket #405. Keith, you said "example", do you know of any > other functions? Maybe you could add a comment to the ticket. I'll look for more examples. How about diag? >> x matrix([[ 0.82553498, 0.89115156], [ 0.106748 , 0.21844565]]) >> M.diag(x) array([ 0.82553498, 0.21844565]) >> M.asmatrix(M.diag(x)).T matrix([[ 0.82553498], [ 0.21844565]]) >> From kwgoodman at gmail.com Thu Dec 21 11:30:17 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 21 Dec 2006 08:30:17 -0800 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com> <4589C7A0.6030008@gmail.com> Message-ID: On 12/20/06, A. M. Archibald wrote: > Moreover it damages the performance of > numpy. For example, dot would be faster (for arrays that happen to be > matrix-shaped, and possibly in general) if it could use ATLAS' routine > from BLAS. I thought numpy uses ATLAS. Matrix multiplication in numpy is about as fast as in Octave. So it must be using ATLAS. From mjanikas at esri.com Thu Dec 21 11:43:31 2006 From: mjanikas at esri.com (Mark Janikas) Date: Thu, 21 Dec 2006 08:43:31 -0800 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: <458AA3D9.6060805@gmx.net> Message-ID: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> Thanks for all the input so far. The only thing that seems odd about the omission of probability or quantile functions in NumPy is that all the random number generators are present in RandomArray. At any rate, hopefully this bit of functionality will be present in the future, but for now, IMO the library is awesome..... 
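(Those generators do offer a crude scipy-free stopgap, by the way: estimate
the probability by simulation. A rough sketch using only numpy's own random
module -- good to a few digits at best, with error on the order of
1/sqrt(n):

import numpy as N

n = 1000000
z = N.random.standard_normal(n)
t = N.random.standard_t(100, size=n)

print (z < 1.96).mean()    # ~0.975, cf. R's pnorm(1.96)
print (t < 1.65).mean()    # ~0.949, cf. R's pt(1.65, df=100)

No substitute for real CDFs, but it needs nothing beyond numpy.)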
I am used to using R for math routines, and all my sparse matrix stuff is
WAAAAAAY faster using the Python-NumPy Combo!

Thanks to all for their insight,

MJ

From faltet at carabos.com  Thu Dec 21 11:45:02 2006
From: faltet at carabos.com (Francesc Altet)
Date: Thu, 21 Dec 2006 17:45:02 +0100
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To:
References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com>
Message-ID: <200612211745.03602.faltet@carabos.com>

On Thursday 21 December 2006 05:59, A. M. Archibald wrote:
> On 20/12/06, Alan G Isaac wrote:
> > On Wed, 20 Dec 2006, Robert Kern apparently wrote:
> > > We have a full complement of PDFs, CDFs, etc. in scipy.
> >
> > This is my "most missed" functionality in NumPy.
> > (For now I feel I cannot ask students to install SciPy.)
> > Although it is a slippery slope, and I definitely do not
> > want NumPy to slide down it, I would certainly not complain
> > if this basic functionality were moved to NumPy...
>
> This is silly.
>
> If it were up to me I would rip out much of the fancy features from
> numpy and put them in scipy. It's really not very difficult to
> install, particularly if you don't much care how fast it is, or are
> using (say) a Linux distribution that packages it.
>
> It seems to me that numpy should include only tools for basic
> calculations on arrays of numbers. The ufuncs, simple wrappers (dot,
> for example). Anything that requires nontrivial amounts of math
> (matrix inversion, statistical functions, generating random numbers
> from exponential distributions, and so on) should go in scipy.
>
> If numpy were to satisfy everyone who says, "I like numpy, but I wish
> it included [their favourite feature from scipy] because I don't want
> to install scipy", numpy would grow to include everything in scipy.
>
> Perhaps an alternative criterion would be "it can go in numpy if it
> has no external requirements". I think this is a mistake, since it
> means users have a monstrous headache figuring out what is in which
> package (for example, some of scipy.integrate depends on external
> tools and some does not). Moreover it damages the performance of
> numpy. For example, dot would be faster (for arrays that happen to be
> matrix-shaped, and possibly in general) if it could use ATLAS' routine
> from BLAS.
>
> Of course, numpy is currently fettered by the need to maintain some
> sort of compatibility with Numeric and numarray; shortly it will have
> to worry about compatibility with previous versions of numpy as well.

I agree with most of the arguments above, so +1

--
>0,0<   Francesc Altet   http://www.carabos.com/
V V   Cárabos Coop. V.   Enjoy Data
 "-"

From aisaac at american.edu  Thu Dec 21 12:23:41 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 21 Dec 2006 12:23:41 -0500
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <200612211745.03602.faltet@carabos.com>
References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com><200612211745.03602.faltet@carabos.com>
Message-ID:

On Thursday 21 December 2006 05:59, A. M. Archibald wrote:
> It seems to me that numpy should include only tools for
> basic calculations on arrays of numbers. The ufuncs,
> simple wrappers (dot, for example). Anything that requires
> nontrivial amounts of math (matrix inversion, statistical
> functions, generating random numbers from exponential
> distributions, and so on) should go in scipy.

As a user, I suggest that this becomes a reasonable goal
when up to date SciPy installers are maintained for all
target platforms. Unless you wish to exclude everyone who
is intimidated when installation is less than trivial...

Until then, I suggest, the question of the proper
functionality bundle with NumPy remains open. Of course as
a user I do not pretend to resolve such a question---recall
that I mentioned the slippery slope in my post---but I do
object to it being dismissed as "silly" when I offered
a straightforward explanation.

It is well understood that the current view of the
developers is that, if anything, too much is already in NumPy.
Any user comments are taking place within that context.

Alan Isaac

PS A question: is it a good thing if more students start
using NumPy *now*? It looks to me like building community
size is an important current goal for NumPy. Strip it down
like you suggest and, aside from Windows users (and Macs are
increasingly popular among my students), you'll have only the
few who are not intimidated by building SciPy (which still
has no installer for Python 2.5).
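(For readers in the same boat, the normal CDF at least is small enough to
carry around in pure Python. A minimal sketch assuming only the standard
library -- norm_cdf is just an illustrative name, and the constants are the
Abramowitz & Stegun 7.1.26 rational approximation of erf, good to about
1.5e-7:

import math

def norm_cdf(x, mean=0.0, sd=1.0):
    # normal CDF via the A&S 7.1.26 approximation of erf
    z = (x - mean) / (sd * math.sqrt(2.0))
    t = 1.0 / (1.0 + 0.3275911 * abs(z))
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
           + t * (-1.453152027 + t * 1.061405429))))
    e = 1.0 - poly * math.exp(-z * z)    # erf(|z|)
    if z < 0.0:
        e = -e
    return 0.5 * (1.0 + e)

print norm_cdf(1.96)    # 0.97500..., matching R's pnorm(1.96)

The Student t CDF is harder -- it needs the incomplete beta function, which
is exactly the kind of special-function code that lives in scipy.)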
From faltet at carabos.com Thu Dec 21 12:52:37 2006 From: faltet at carabos.com (Francesc Altet) Date: Thu, 21 Dec 2006 18:52:37 +0100 Subject: [Numpy-discussion] ANN: PyTables 1.4 released Message-ID: <200612211852.38035.faltet@carabos.com> =========================== Announcing PyTables 1.4 =========================== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. It is based on the HDF5 library for doing the I/O and leverages the numarray/NumPy/Numeric packages so as to deliver the data to the user in convenient in-memory containers. This is a new major release of PyTables, and probably the last major one of the 1.x series (i.e. with numarray at the core). On it, we have implemented better code to deal with table buffers, enhanced the capability for reading native HDF5 files, enhanced support for 64-bit platforms (but not with Python 2.5: see ``Special Warning`` section below), better support for AIX, optional automatic parent creation and the traditional amount of bug fixes. Go to the PyTables web site for downloading the beast: http://www.pytables.org/ or keep reading for more info about the new features and bugs fixed. Changes more in depth ===================== Improvements: - Table buffers code refactored: now each Row read iterator has its own buffers, completely independent of their table (although write iterators still share a single buffer in the same table). This separation makes the logic of buffering much more clear and less prone to errors (in fact, some of them have been solved). Performance and memory consumption are more or less equal than before. - When flushing the complete file (i.e. when calling File.flush()), only the buffers of those nodes that are alive (i.e. referenced from user code) are actually flushed. This brings much better efficiency (and also stability) to situations where one has to flush (and hence, close) files with many nodes on it. - Better support for AIX by renaming the internal LONLONG_MAX C constant (it was used internally by the xlc compiler). Thanks to Brian Granger for the report. - Added optional automatic parent creation support during node creation, copying and moving operations. See the release notes for more information. - Improved support for Python2.4 and 64-bit platforms (but beware, there are still known issues when using Python2.5 in combination with 64-bit platforms). Thanks to Gerard Vermeulen for his patches for Win64 platforms. - Implemented a workaround for a leak present in numarray --> Numeric conversions when using the array protocol, as can be seen in: http://comments.gmane.org/gmane.comp.python.numeric.general/12563 The workaround can potentially be far slower than the array protocol (because a copy of the arrays is always made), but at least the new code doesn't leak anymore. Bug fixes: - Previously, when the size for memory compounds type was less than the size of the type on disk (for example, when one have padding or aligned fields), PyTables was unable to read info on them. This has been fixed. This allows reading general compound types in HDF5 files written with other tools than PyTables. - When many tables with indexed columns were created simultaneously, a bug make PyTables to crash. This has been fixed (for more info, see bug #26). - Fixed a typo in the code that prevented recognizing complex data in non-PyTables files. - Table.createIndex() now refuses to index complex columns. 
- Now, it is possible to index several nested columns that hangs from the same column parent. Fixes bug #24. - Fixed a typo in nctoh5 utility that prevented using filters properly. Thanks to Lou Wicker for reporting this. - When setting/appending an array in-memory to an Array (or descendant) object and they have mismatched byteorders, the array was set/appended without being byteswapped first. This has been fixed. Thanks to Elias Collas for the report. Deprecated features: - None Backward-incompatible changes: - Please, see ``RELEASE-NOTES.txt`` file. Special Warning for Python 2.5 and 64-bit platforms users ========================================================= Unfortunately, and due to problems with the combination numarray 1.5.2, Python2.5 and 64-bit platforms, PyTables cannot be safely used yet in such scenario. This will be solved either when numarray can address this issue (hopefully with numarray 1.5.3), or when PyTables 2.x series (with NumPy at its core) will be out. Important note for Windows users ================================ If you are willing to use PyTables with Python 2.4 or 2.5 in Windows platforms, you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET 2003. It can be found at: ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win-net.ZIP Users of Python 2.3 on Windows will have to download the version of HDF5 compiled with MSVC 6.0 available in: ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win.ZIP Platforms ========= This version has been extensively checked on quite a few platforms, like Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on Intel64 (Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and PowerPC64) and MacOSX on PowerPC. For other platforms, chances are that the code can be easily compiled and run without further issues. Please, contact us in case you are experiencing problems. Resources ========= Go to the PyTables web site for more details: http://www.pytables.org About the HDF5 library: http://hdf.ncsa.uiuc.edu/HDF5/ About numarray: http://www.stsci.edu/resources/software_hardware/numarray About NumPy: http://numpy.scipy.org/ To know more about the company behind the PyTables development, see: http://www.carabos.com/ Acknowledgments =============== Thanks to various the users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Many thanks also to SourceForge who have helped to make and distribute this package! And last but not least, a big thank you to Acusim (http://www.acusim.com/) for sponsoring many of the job done for releasing this version of PyTables. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- **Enjoy data!** -- The PyTables Team From kwgoodman at gmail.com Thu Dec 21 13:28:40 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 21 Dec 2006 10:28:40 -0800 Subject: [Numpy-discussion] Assignment when there is nothing to assign Message-ID: I have the following two lines in a for loop: if len(idx): x[idx,i] = x[idx,i-1] where idx is the output of a where statement. I need the 'if len(idx)' line to prevent an error when idx is empty. Would it make sense to allow x[idx,i] = x[idx,i-1] when idx is empty instead of raising an error? 
The error I get is

ValueError: array is not broadcastable to correct shape

From peridot.faceted at gmail.com  Thu Dec 21 13:41:51 2006
From: peridot.faceted at gmail.com (A. M. Archibald)
Date: Thu, 21 Dec 2006 13:41:51 -0500
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To:
References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com>
	<200612211745.03602.faltet@carabos.com>
Message-ID:

On 21/12/06, Alan G Isaac wrote:
> On Thursday 21 December 2006 05:59, A. M. Archibald wrote:
> > It seems to me that numpy should include only tools for
> > basic calculations on arrays of numbers. The ufuncs,
> > simple wrappers (dot, for example). Anything that requires
> > nontrivial amounts of math (matrix inversion, statistical
> > functions, generating random numbers from exponential
> > distributions, and so on) should go in scipy.
>
> As a user, I suggest that this becomes a reasonable goal
> when up to date SciPy installers are maintained for all
> target platforms. Unless you wish to exclude everyone who
> is intimidated when installation is less than trivial...
>
> Until then, I suggest, the question of the proper
> functionality bundle with NumPy remains open. Of course as
> a user I do not pretend to resolve such a question---recall
> that I mentioned the slippery slope in my post---but I do
> object to it being dismissed as "silly" when I offered
> a straightforward explanation.
>
> It is well understood that the current view of the
> developers is that, if anything, too much is already in NumPy.
> Any user comments are taking place within that context.

Just to be clear: I am not a developer. I am a user who is frustrated
with the difficulty of telling whether to look for a given feature in
numpy or in scipy. (I have also never really had much difficulty
installing scipy either from the packages in one of several linux
distributions or compiling it from scratch.)

I suppose the basic difference of opinions here is that I think numpy
has already taken too many steps down the slippery slope. Also I don't
think 20 megabytes is enough disk space to care about, and I think it
is better in the long term to encourage the scipy developers to get
the installers working than it is to jam all kinds of scientific
functionality into this array package to avoid having to install the
scientific computing package.

> PS A question: is it a good thing if more students start
> using NumPy *now*? It looks to me like building community
> size is an important current goal for NumPy. Strip it down
> like you suggest and, aside from Windows users (and Macs are
> increasingly popular among my students), you'll have only the
> few who are not intimidated by building SciPy (which still
> has no installer for Python 2.5).

I didn't have to build scipy (though I have, it's not hard), and I
don't use Windows. But no, I don't think it can be stripped down yet;
the backward compatibility issue is currently important. I think
moving scientific functionality from scipy to numpy is a step in the
wrong direction, though.

A. M. Archibald

From kwgoodman at gmail.com  Thu Dec 21 14:17:20 2006
From: kwgoodman at gmail.com (Keith Goodman)
Date: Thu, 21 Dec 2006 11:17:20 -0800
Subject: [Numpy-discussion] Assignment when there is nothing to assign
In-Reply-To:
References:
Message-ID:

On 12/21/06, Keith Goodman wrote:
> I have the following two lines in a for loop:
>
> if len(idx):
>     x[idx,i] = x[idx,i-1]
>
> where idx is the output of a where statement.
>
> I need the 'if len(idx)' line to prevent an error when idx is empty.
>
> Would it make sense to allow
>
> x[idx,i] = x[idx,i-1]
>
> when idx is empty instead of raising an error?
>
> The error I get is
>
> ValueError: array is not broadcastable to correct shape

Why doesn't this work?

>> y
matrix([[ 0.93473209,  0.34122426],
        [ 0.68353656,  0.77589206],
        [ 0.50677768,  0.25089722]])
>> idx = M.array([1, 2])
>> y[idx,0]   <------ this works
matrix([[ 0.68353656],
        [ 0.50677768]])
>> y[idx,0] = y[idx,1]   <------ but this doesn't
---------------------------------------------------------------------------
exceptions.ValueError   Traceback (most recent call last)
ValueError: array is not broadcastable to correct shape

But this works:

>> idx = M.asarray(M.asmatrix(idx).T)
>> y[idx,0] = y[idx,1]

From svetosch at gmx.net  Thu Dec 21 15:33:50 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Thu, 21 Dec 2006 21:33:50 +0100
Subject: [Numpy-discussion] Automatic matrices
In-Reply-To:
References: <1166189241.17633.27.camel@localhost.localdomain>
	<458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain>
	<45845609.40705@gmx.net> <458A9A87.2000302@gmx.net>
Message-ID: <458AEFAE.4040609@gmx.net>

Keith Goodman schrieb:

> How about diag?

There was a thread about this (in which you participated, I believe);
for matrices you should now use m.diagonal() I think. So diag doesn't
qualify.

-sven

From oliphant at ee.byu.edu  Thu Dec 21 16:10:37 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu, 21 Dec 2006 14:10:37 -0700
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com>
References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com>
Message-ID: <458AF84D.1060400@ee.byu.edu>

Mark Janikas wrote:
> Thanks for all the input so far. The only thing that seems odd about
> the omission of probability or quantile functions in NumPy is that all
> the random number generators are present in RandomArray.

A big part of the issue is that getting many of those pdfs into NumPy
would require putting many special functions into NumPy (some of which
are actually coded in Fortran).
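(To make the dependency concrete -- a small sketch, assuming scipy is
importable: the scipy.stats distributions are thin layers over compiled
special functions such as these,

from scipy import special

print special.ndtr(1.96)          # standard normal CDF: 0.9750...
print special.stdtr(100, 1.65)    # Student t CDF, df=100: 0.9489...

so moving the distributions over without also moving the special-function
machinery underneath them isn't really an option.)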
I much prefer to make SciPy an easy install for as many people as possible and/or work on breaking up SciPy into modular components that can be installed separately if needed. This was my original intention --- to make NumPy as small as possible. It's current size is driven by backwards compatibility, only. -Travis From Chris.Barker at noaa.gov Thu Dec 21 16:08:15 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 21 Dec 2006 13:08:15 -0800 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com> <200612211745.03602.faltet@carabos.com> Message-ID: <458AF7BF.9020109@noaa.gov> A key thing to remember here is that each user has their particular set of "small things" that are all they need from scipy -- put us all together, and you have SciPy -- that's what it is for. > As a user, I suggest that this becomes a reasonable goal > when up to date SciPy installers are maintained for all > target platforms. All it takes is someone to do it. Also, there was talk of "modularizing" scipy so that it would be easy to install only those bits you need -- in particular, the non-Fortran stuff should be trivial to build. > Macs are > increasingly popular among my students It can be a pain to build this kind of thing on OS-X, as Apple has not supported a Fortran compiler yet, but it can (and has) been done. IN fact, the Mac is a great target for pre-built binaries as there is only a small variety of hardware to support, and Apple supplies LAPACK/BLAS libs with the system. As for distributing it, the archive at: pythonmac.org/packages takes submissions from anyone -- just send a note to the pythonmac list -- that list is a great help in figuring out how to build stuff too. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Thu Dec 21 16:13:17 2006 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 21 Dec 2006 16:13:17 -0500 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: <458AF84D.1060400@ee.byu.edu> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> Message-ID: <200612211613.18347.pgmdevlist@gmail.com> On Thursday 21 December 2006 16:10, Travis Oliphant wrote: > I much prefer to make SciPy an easy install for as many people as > possible and/or work on breaking up SciPy into modular components that > can be installed separately if needed. Talking about that, what happened to these projects of modular installation of scipy ? Robert promised us last month to explain what went wrong with his approach, but never had the time... From robert.kern at gmail.com Thu Dec 21 16:43:28 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 21 Dec 2006 15:43:28 -0600 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: <200612211613.18347.pgmdevlist@gmail.com> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> Message-ID: <458B0000.8070909@gmail.com> Pierre GM wrote: > On Thursday 21 December 2006 16:10, Travis Oliphant wrote: > >> I much prefer to make SciPy an easy install for as many people as >> possible and/or work on breaking up SciPy into modular components that >> can be installed separately if needed. 
> > Talking about that, what happened to these projects of modular installation > of scipy ? Robert promised us last month to explain what went wrong with his > approach, but never had the time... I created a module (scipy_subpackages.py, IIRC) next to setup.py that essentially just served as a global configuration to inform all of the setup.py's what subpackages they were supposed to build (mostly just Lib/setup.py, actually). I then had a script run through the various collections of subpackages that I wanted to build, set the appropriate values in scipy_subpackages, and run setup() with the appropriate parameters to build an egg for each collection. However, build/ apparently needs to be cleaned out between each egg, otherwise you contaminate later eggs. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From kwgoodman at gmail.com Thu Dec 21 17:00:55 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 21 Dec 2006 14:00:55 -0800 Subject: [Numpy-discussion] Automatic matrices In-Reply-To: <458AEFAE.4040609@gmx.net> References: <1166189241.17633.27.camel@localhost.localdomain> <458323C5.7010808@gmx.net> <1166228497.17633.61.camel@localhost.localdomain> <45845609.40705@gmx.net> <458A9A87.2000302@gmx.net> <458AEFAE.4040609@gmx.net> Message-ID: On 12/21/06, Sven Schreiber wrote: > Keith Goodman schrieb: > > > How about diag? > > > > There was a thread about this (in which you participated, I believe); > for matrices you should now use m.diagonal() I think. So diag doesn't > qualify. I think the different results returned by x.diagonal and M.diagonal(x) is confusing: >> x matrix([[-0.87175207, 1.57394765], [-1.7135918 , -1.5183181 ]]) >> x.diagonal() matrix([[-0.87175207, -1.5183181 ]]) <-----matrix >> M.diagonal(x) array([-0.87175207, -1.5183181 ]) <-----array 321 def diagonal(a, offset=0, axis1=0, axis2=1): 322 """diagonal(a, offset=0, axis1=0, axis2=1) returns the given diagonals 323 defined by the last two dimensions of the array. 324 """ 325 return asarray(a).diagonal(offset, axis1, axis2) Maybe this asarray could be changed to asanyarray? From svetosch at gmx.net Fri Dec 22 04:47:33 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Fri, 22 Dec 2006 10:47:33 +0100 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: <458B0000.8070909@gmail.com> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> <458B0000.8070909@gmail.com> Message-ID: <458BA9B5.6050709@gmx.net> Robert Kern schrieb: > Pierre GM wrote: >> Talking about that, what happened to these projects of modular installation >> of scipy ? Robert promised us last month to explain what went wrong with his >> approach, but never had the time... > > I created a module (scipy_subpackages.py, IIRC) next to setup.py that > essentially just served as a global configuration to inform all of the > setup.py's what subpackages they were supposed to build (mostly just > Lib/setup.py, actually). I then had a script run through the various collections > of subpackages that I wanted to build, set the appropriate values in > scipy_subpackages, and run setup() with the appropriate parameters to build an > egg for each collection. > > However, build/ apparently needs to be cleaned out between each egg, otherwise > you contaminate later eggs. 
>

So, to put it "pointedly" (if that's the right word...?):
Numpy should not get small functions from scipy -> because the size of
scipy doesn't matter -> because scipy's modules will be installable as
add-ons separately (and because there will be ready-to-use installers);
however, nobody knows how to actually do that in practice ?

Well that will convince my colleagues!

Please don't feel offended, I just want to make the point (as usual)
that this way numpy is going to be a good library for other software
projects, but not super-attractive for direct users (aka "matlab
converts", although I personally don't come from matlab).

cheers,
sven

From gael.varoquaux at normalesup.org  Fri Dec 22 05:14:12 2006
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 22 Dec 2006 11:14:12 +0100
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <458BA9B5.6050709@gmx.net>
References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com>
	<458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com>
	<458B0000.8070909@gmail.com> <458BA9B5.6050709@gmx.net>
Message-ID: <20061222101410.GA22958@clipper.ens.fr>

On Fri, Dec 22, 2006 at 10:47:33AM +0100, Sven Schreiber wrote:
> Please don't feel offended, I just want to make the point (as usual)
> that this way numpy is going to be a good library for other software
> projects, but not super-attractive for direct users (aka "matlab
> converts", although I personally don't come from matlab).

I think that the equivalent of MatLab is more scipy than numpy. I think
there is a misunderstanding here.

Gaël

From ivilata at carabos.com  Fri Dec 22 06:27:40 2006
From: ivilata at carabos.com (Ivan Vilata i Balaguer)
Date: Fri, 22 Dec 2006 12:27:40 +0100
Subject: [Numpy-discussion] Type of 1st argument in Numexpr where()
In-Reply-To: <4589AB55.5080602@ieee.org>
References: <20061220120226.GC11105@tardis.terramar.selidor.net>
	<458962B1.9010005@ieee.org> <20061220185155.GE11105@tardis.terramar.selidor.net>
	<4589AB55.5080602@ieee.org>
Message-ID: <20061222112740.GF11105@tardis.terramar.selidor.net>

Tim Hochberg (on 2006-12-20 at 14:29:57 -0700) wrote::

> Let's look at something simpler than where, which is a confusing
> function. How about *sin*.
> [...]

Ok, I think I already get the idea about the need of adding extra
opcodes if ``where()`` only accepted booleans as first arguments.
Thanks for the explanation! :) ::

> It would be possible to adapt your original idea. We could do the following:
>
>    1. Add a function boolean() to the numexpr namespace. This would cast
>       its argument to an array of bools.
>    2. Tweak the compile (actually, probably where_func in
>       expressions.py) to compile where(x,a,b) as where(bool(x),a,b)
>    3. Change where to take bools as the first argument.
>
> Or, maybe it would be cleaner to instead change the casting rules so
> that casting to bool happens automagically. Having cycles in the casting
> rules frightens me a bit, but it could probably be made to work.

I understand that the new ``boolean()`` function and the "downwards"
casting to boolean are functionally equivalent and require the same
number of new opcodes. However, if cycles in casting rules are frowned
upon (though I don't see any problem at first sight), I would opt for
the fully explicit ``boolean()`` function. ::

> So, in summary, I think that the general idea you proposed could be made
> to work with some more effort. Conceptually, it's cleaner and it could
> be made more efficient for the common case.
On the downside, this would > require three new opcodes, as opposed to a single new opcode to do the > simple minded fix. So, I'm still a bit up in the air as to whether it's > a good idea. Well, my previous patch works without adding new opcodes; however, one is no longer able to use ``where(x, a, b)`` with ``x`` being something other than a boolean array, so one should use ``where(x != 0, a, b)``, which is more explicit about its meaning. Nonetheless, I see using non-booleans as boolean conditions as an idiom, and I don't know how frequently that feature will be used in numerical computations. So I'm also in the air about the idea. ;) Cheers, :: Ivan Vilata i Balaguer >qo< http://www.carabos.com/ C?rabos Coop. V. V V Enjoy Data "" -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 309 bytes Desc: Digital signature URL: From Chris.Barker at noaa.gov Fri Dec 22 12:08:59 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 22 Dec 2006 09:08:59 -0800 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: <458BA9B5.6050709@gmx.net> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> <458B0000.8070909@gmail.com> <458BA9B5.6050709@gmx.net> Message-ID: <458C112B.4090400@noaa.gov> Sven Schreiber wrote: > So, to put it "pointedly" (if that's the right word...?): > Numpy should not get small functions from scipy -> because the size of > scipy doesn't matter -> because scipy's modules will be installable as > add-ons separately (and because there will be ready-to-use installers); > however, nobody knows how to actually do that in practice ? > I just want to make the point (as usual) > that this way numpy is going to be a good library for other software > projects, but not super-attractive for direct users (aka "matlab > converts" No one is denying that there is still work to do. I also think there are a lot of us (from "just users" to the major contributers), that WOULD like to see an easy to install package that can do everything (and more, and better) that MATLAB can. The only questions are: A) How best to accomplish this (or work toward it anyway)? B) Who's going to do the work? As for (A), I think the consensus is pretty clear -- keep numpy focused on the basic array package, with some extras for backwards compatibility, and work towards a SciPy that has all the bells and whistles, preferably installable as separate packages (like Matlab "toolboxes" I suppose). As for the above comments, if we are looking at the "Matlab converts", or more to the point, people looking for a comprehensive scientific/engineering computation package: -- "The size of SciPy doesn't matter" -- True, after all, how big is MATLAB? -- "Scipy's modules will be installable as add-ons separately" -- this is a good goal, and I think there has been progress there. -- "Nobody knows how to actually do that in practice" -- well, it's not so much that nobody knows how to do it, as that nobody has done it -- it's going to take work, but adding extra stuff to numpy takes work too, it's a matter of where you're going to focus the work. Given the size of disk drives and the speed of Internet connections these days, I'm not sure it's that important to have the "core" part of SciPy very small -- but it does need to have easy installers. 
That approach provides opportunity though -- breaking SciPy down into smaller packages requires expertise and consensus among the developers. However, building an installer requires only one person to take the time to do it. Yes, SciPy is too hard to build and install for an average newbie -- but it's gotten better, and it's not too hard for a savvy user that is willing to put some time it. The kind of person who is willing to put the time in to post to discussions on this group, for instance. Please, rather than complaining that core developers aren't putting your personally desired "small function" into numpy , just take the time to build the installer you need -- we need one for OS-X, build it and put it up on pythonmac -- it's not that hard, and there are a lot of people here and on the scipy and python-mac lists that will help. Now my rant: Please, please, please, could at least a few of the people that build packages take the time to make a simple installer for them and put them up on the web somewhere? Now a question: One of the key difficulties to building SciPy is that parts of it depend on Fortran and LAPACK. We'd all like LAPACK to be built to support our particular hardware for best performance. However, would it be that hard to have Scipy build by default with generic LAPACK (kind of like numpy), and put installers built that way up on the web, along with instructions for those that want to re-build and optimize? For that matter, is it possible (let's say on Windows) to deliver SciPy with a set of multiple dlls for LAPACK/BLAS, and have the appropriate ones chosen at install or run time? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From kwgoodman at gmail.com Fri Dec 22 12:27:07 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Dec 2006 09:27:07 -0800 Subject: [Numpy-discussion] repr of bool matrix Message-ID: Bool matrices and arrays don't line up when you display them because False has 5 letters and True has 4. After several columns it can become impossible to tell which column an element belongs to. Could the repr of bool matrices print 'True ' instead of 'True'? Some truths are more true than others. From robert.kern at gmail.com Fri Dec 22 16:40:39 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 22 Dec 2006 15:40:39 -0600 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <4586E3B4.2040609@hawaii.edu> <45877445.20508@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> <4588901B.5030904@ee.byu.edu> <4588A055.5070707@ar.media.kyoto-u.ac.jp> <4588FC54.7040106@ee.byu.edu> Message-ID: <458C50D7.7080207@gmail.com> Charles R Harris wrote: > I've been thinking about that a bit. One solution is to have a small > python program that takes all the pieces and writes one big build file, > I think something like that happens now. Another might be to use > includes in a base file; there is nothing sacred about not including .c > files or not putting code in .h files, it is just a convention, we could > even chose another extension. I also wonder if we couldn't just link in > object files. 
The table of function pointers just needs some addresses > and, while the python convention of hiding all the function names by > using static functions is nice, it is probably not required. Maybe we > could use ctypes in some way? > > I am not pushing any of these alternatives at the moment, just putting > them down. Maybe there are others? None that I want to think about. #including separate .c files, leaving the extension alone, is best, IMO. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Fri Dec 22 18:14:58 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 22 Dec 2006 17:14:58 -0600 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: <458BA9B5.6050709@gmx.net> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> <458B0000.8070909@gmail.com> <458BA9B5.6050709@gmx.net> Message-ID: <458C66F2.9020206@gmail.com> Sven Schreiber wrote: > So, to put it "pointedly" (if that's the right word...?): > Numpy should not get small functions from scipy -> because the size of > scipy doesn't matter -> because scipy's modules will be installable as > add-ons separately (and because there will be ready-to-use installers); > however, nobody knows how to actually do that in practice ? Rather, to put it accurately, numpy should not get large chunks of scipy functionality that require FORTRAN dependencies for reasons that should be obvious from that description. scipy.stats.distributions is just such a chunk. The ancillary point is that I think that, for those who do find the largeness and difficult-to-installness of scipy onerous, the best path forward is to work on the build process of scipy. And it will take *work* not wishes nor complaints nor tags. And honestly, the more I see the latter, the less motivated I am to bother with the former. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Fri Dec 22 18:45:34 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 22 Dec 2006 17:45:34 -0600 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com><200612211745.03602.faltet@carabos.com> Message-ID: <458C6E1E.7070602@gmail.com> Alan G Isaac wrote: > PS A question: is it a good thing if more students start > using NumPy *now*? It looks to me like building community > size is an important current goal for NumPy. Strip it down > like you suggest and aside from Windows users (and Macs are > increasingly popular among my students) you'll have only the > few that are not intimidated by building SciPy (which still > has no intaller for Python 2.5). You mean a Windows installer? Yes, it does. http://sourceforge.net/project/showfiles.php?group_id=27747 -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco

From aisaac at american.edu Fri Dec 22 19:05:14 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Fri, 22 Dec 2006 19:05:14 -0500
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <458C6E1E.7070602@gmail.com> References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com><200612211745.03602.faltet@carabos.com> <458C6E1E.7070602@gmail.com>
Message-ID:

> Alan G Isaac wrote:
>> Strip it down like you suggest and, aside from Windows users (and Macs are increasingly popular among my students), you'll have only the few that are not intimidated by building SciPy (which still has no installer for Python 2.5).

On Fri, 22 Dec 2006, Robert Kern apparently wrote:
> You mean a Windows installer? Yes, it does.
> http://sourceforge.net/project/showfiles.php?group_id=27747

1. No, I meant a Mac installer. Sorry that was unclear. And let me be clear that I understand that if I really want one for my students, I should learn how to build one. (And if I get a moment to breathe, I'd like to learn how.) My point was only that lack of availability does have implications.

2. Re: the other message, I was not aware that moving scipy.stats.distributions into NumPy would complicate the NumPy build process (which is currently delightfully easy).

Thank you,
Alan Isaac

From oliphant at ee.byu.edu Sat Dec 23 00:12:47 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri, 22 Dec 2006 22:12:47 -0700
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <458AF7BF.9020109@noaa.gov> References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com> <200612211745.03602.faltet@carabos.com> <458AF7BF.9020109@noaa.gov>
Message-ID: <458CBACF.7000508@ee.byu.edu>

Christopher Barker wrote:
> It can be a pain to build this kind of thing on OS-X, as Apple has not supported a Fortran compiler yet, but it can (and has) been done. In fact, the Mac is a great target for pre-built binaries, as there is only a small variety of hardware to support, and Apple supplies LAPACK/BLAS libs with the system. As for distributing it, the archive at:

I'm always confused about how to distribute something like SciPy for the Mac. What exactly should be distributed? Is it possible to use distutils to get it done?

I'd love to provide Mac binaries of SciPy / NumPy. Right now, we rely on people like Chris for that.

-Travis

From oliphant at ee.byu.edu Sat Dec 23 01:26:45 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri, 22 Dec 2006 23:26:45 -0700
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <458BA9B5.6050709@gmx.net> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> <458B0000.8070909@gmail.com> <458BA9B5.6050709@gmx.net>
Message-ID: <458CCC25.7050507@ee.byu.edu>

Sven Schreiber wrote:
> Robert Kern schrieb:
>> Pierre GM wrote:
>
> So, to put it "pointedly" (if that's the right word...?): Numpy should not get small functions from scipy -> because the size of scipy doesn't matter -> because scipy's modules will be installable as add-ons separately (and because there will be ready-to-use installers); however, nobody knows how to actually do that in practice?
> > Please don't feel offended, I just want to make the point (as usual) > that this way numpy is going to be a good library for other software > projects, but not super-attractive for direct users (aka "matlab > converts", although I personally don't come from matlab). > Don't worry about offending, we all recognize the weaknesses. I think you are pointing out something most of us already see. It is the combination of SciPy+NumPy+Matplotlib+IPython (+ perhaps a good IDE) that can succeed at being a MATLAB/IDL replacement for a lot of people. NumPy by itself isn't usually enough, and it's also important to keep NumPy as a library that can be used for other development. What is also needed is a good "package" of it all --- like the Enthon distribution. This requires quite a bit of thankless work. Enthought has done quite a bit in this direction for Windows but they have not had client demand to get it wrapped up for other platforms. I think people are hoping that eggs will help here but it hasn't come to fruition. This is an area that SciPy could really use someone stepping up and taking charge. I like the discussions that have taken place regarding documentation issues, as well as the work that went into making all of SciPy compile with gfortran (though perhaps not bug free...). These are important steps that are much appreciated. Best regards, -Travis From robert.kern at gmail.com Sat Dec 23 01:30:30 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 Dec 2006 00:30:30 -0600 Subject: [Numpy-discussion] Newbie Question, Probability In-Reply-To: <458CBACF.7000508@ee.byu.edu> References: <627102C921CD9745B070C3B10CB8199B010EBF67@hardwire.esri.com> <200612211745.03602.faltet@carabos.com> <458AF7BF.9020109@noaa.gov> <458CBACF.7000508@ee.byu.edu> Message-ID: <458CCD06.5040205@gmail.com> Travis Oliphant wrote: > I'm always confused about how to distribute something like SciPy for the > MAC. What exactly should be distributed? Is it possible to use > distutils to get it done? To get a package format that is actually useful (bdist_dumb just doesn't cut it on any platform, really), you need to install something else. I prefer building eggs instead of mpkgs. So this is what I do: Install setuptools. I then create a script (I usually call it ./be) in the numpy/ and scipy/ directories to hold all of the options I use for building: #!/bin/sh pythonw2.5 -c "import setuptools; execfile('setup.py')" build_src build_clib --fcompiler=gnu95 build_ext --fcompiler=gnu95 build "$@" Then, to build an egg: $ ./be bdist_egg You can then upload it to the Package Index (maybe. I had trouble uploading the Windows scipy binary that Gary Pajer sent me. I suspect that the Index rejected it because it was too large). Here are the outstanding issues as I see them: * Using scipy requires that the FORTRAN runtime libraries that you compiled against be installed in the appropriate place, i.e. /usr/local/lib. This is annoying, since there are currently only tarballs available, so the user needs root access to install them. If an enterprising individual wants to make this situation better, he might try to make a framework out of the necessary libraries such that we can simply link against those. Frameworks are easier to install to different locations with less hassle. http://hpc.sourceforge.net/ * g77 does not work with the Universal Python build process, so we are stuck with gfortran. * The GNU FORTRAN compilers that are available are architecture-specific. 
For us, that means that we cannot build Universal scipy binaries. If you build on an Intel Mac, the binaries will only work on Intel Macs; if you build on a PPC Mac, likewise, the resulting binaries only work on PPC Macs.

* In a related problem, I cannot link against ATLAS on Intels, and possibly not on PPCs, either (I haven't tried building with a Universal Python on PPC). The Universal compile flags (notably "-arch ppc -arch intel") are used when compiling the numpy.distutils test programs for discovering ATLAS's version. Using a single-architecture ATLAS library causes those test programs to not link (since they are missing the _ATL_buildinfo symbol for the missing architecture). I've tried using lipo(1) to assemble a Universal ATLAS library from a PPC-built library and an Intel-built library, but this did not change the symptomology. Fortunately, part of ATLAS is already built into the Accelerate.framework provided with the OS and is automatically recognized by numpy.distutils. It's missing the C versions of LAPACK functions, so scipy.linalg.clapack will be empty. Also, I think numpy.distutils won't recognize that it is otherwise an ATLAS (_ATL_buildinfo is also missing from the framework), so it may not try to compile the C BLAS interfaces, either.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From oliphant at ee.byu.edu Sat Dec 23 02:13:34 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Sat, 23 Dec 2006 00:13:34 -0700
Subject: [Numpy-discussion] repr of bool matrix
In-Reply-To: References: Message-ID: <458CD71E.1080107@ee.byu.edu>

Keith Goodman wrote:
> Bool matrices and arrays don't line up when you display them because False has 5 letters and True has 4.
>
> After several columns it can become impossible to tell which column an element belongs to.
>
> Could the repr of bool matrices print 'True ' instead of 'True'?

Done in SVN. True prints as ' True'

-Travis

From svetosch at gmx.net Sun Dec 24 04:54:52 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Sun, 24 Dec 2006 10:54:52 +0100
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <458C66F2.9020206@gmail.com> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> <458B0000.8070909@gmail.com> <458BA9B5.6050709@gmx.net> <458C66F2.9020206@gmail.com>
Message-ID: <458E4E6C.4000504@gmx.net>

Robert Kern schrieb:
> Rather, to put it accurately, numpy should not get large chunks of scipy functionality that require FORTRAN dependencies for reasons that should be obvious from that description. scipy.stats.distributions is just such a chunk.

I was probably not very clear; I was referring to "small" functions. As you and others have pointed out, for p-value stuff this doesn't apply, apparently. Ok.

> The ancillary point is that I think that, for those who do find the largeness and difficult-to-installness of scipy onerous, the best path forward is to work on the build process of scipy. And it will take *work* not wishes nor complaints nor tags. And honestly, the more I see the latter, the less motivated I am to bother with the former.

My impression of the discussion was that many people said _nothing_ at all should ever be added into numpy, which sounded kind of fundamentalistic to me.
And part of the justification given was features of scipy that don't exist yet (fair enough), and that nobody is even working on. Of course I understand the lack of manpower, but then I think this state of affairs should be properly taken into account when arguing against moving (only small!) features from scipy into numpy. Hence my earlier post.

I also try to contribute to open-source projects where I can, and believe me, it would probably help my career more to just have my faculty pay for matlab and forget about numpy et al. You know, users' time is valuable, too. Unfortunately I don't have the skills to help with modularizing scipy (nor the time to acquire those skills).

Btw, that's why I like the idea of paying for stuff like documentation and other things that open-source projects often forget about because it's not fun for the developers. (Hey, I'm an economist...) So I would be willing to donate money for some of the dull tasks, for example. (I'm fully aware that would not cover the real cost of the work, just like in Travis' case with the numpy guide.)

Ok, that's enough, happy holidays,
Sven

From Norbert.Nemec.list at gmx.de Sun Dec 24 07:57:25 2006
From: Norbert.Nemec.list at gmx.de (Norbert Nemec)
Date: Sun, 24 Dec 2006 13:57:25 +0100
Subject: [Numpy-discussion] Bug in numarray<->numpy interaction
Message-ID: <458E7935.109@gmx.de>

The following snippet demonstrates a problem in the interaction of numarray 1.5.2 with numpy 1.0.1 (and older versions):

-------------------
#!/usr/bin/env python

import numarray, numpy

na = numarray.array(0.)
np = numpy.array(0.)

na[...] = np
-------------------

The last line causes the error "TypeError: NA_setFromPythonScalar: bad value type."

The problem occurred in a perfectly normal piece of code combining PyTables (internally based on numarray) with calculations done in NumPy. AFAICS, it should work but simply is a bug somewhere in numarray or numpy.

Thanks,
Norbert

From charlesr.harris at gmail.com Sun Dec 24 13:14:14 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 24 Dec 2006 11:14:14 -0700
Subject: [Numpy-discussion] Bug in numarray<->numpy interaction
In-Reply-To: <458E7935.109@gmx.de> References: <458E7935.109@gmx.de>
Message-ID:

On 12/24/06, Norbert Nemec wrote:
>
> The following snippet demonstrates a problem in the interaction of numarray 1.5.2 with numpy 1.0.1 (and older versions):
>
> -------------------
> #!/usr/bin/env python
>
> import numarray, numpy
>
> na = numarray.array(0.)
> np = numpy.array(0.)
>
> na[...] = np
> -------------------
>
> the last line causes the error "TypeError: NA_setFromPythonScalar: bad value type."
>
> The problem occurred in a perfectly normal piece of code combining PyTables (internally based on numarray) with calculations done in NumPy. AFAICS, it should work but simply is a bug somewhere in numarray or numpy.

Recent versions of PyTables work fine with numpy. Unless you have old tables using the numarray flavor, I suggest making the change. It is probably possible to read the numarray data into numpy by specifying a particular flavor in the read statement, but Francesc could probably tell you more about that.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Sun Dec 24 13:20:18 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 24 Dec 2006 11:20:18 -0700
Subject: [Numpy-discussion] Bug in numarray<->numpy interaction
In-Reply-To: <458E7935.109@gmx.de> References: <458E7935.109@gmx.de>
Message-ID:

On 12/24/06, Norbert Nemec wrote:
>
> The following snippet demonstrates a problem in the interaction of numarray 1.5.2 with numpy 1.0.1 (and older versions):
>
> -------------------
> #!/usr/bin/env python
>
> import numarray, numpy
>
> na = numarray.array(0.)
> np = numpy.array(0.)
>
> na[...] = np

As a workaround, do

In [5]: na[...] = int(np)

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mkg at cs.nyu.edu Sun Dec 24 15:28:50 2006
From: mkg at cs.nyu.edu (Matthew Koichi Grimes)
Date: Sun, 24 Dec 2006 15:28:50 -0500
Subject: [Numpy-discussion] nested recarrays
Message-ID: <458EE302.1000703@cs.nyu.edu>

(Newbie alert.)

I'm having trouble making a nested record array. I'm trying to work from the following example on the scipy.org examples page:

>>> mydescriptor = dtype([('x', 'f4'),('y', 'f4'),  # nested recarray
...                       ('nested', [('i', 'i2'),('j','i2')])])
>>> myarr = array([(1.0, 2.0, (1,2))], dtype=mydescriptor)

... but this isn't really a nested recarray, since you can't refer to fields 'x', 'y', or 'nested' as attributes:

>>> myarr.x
AttributeError: 'numpy.ndarray' object has no attribute 'x'

You have to use the more cumbersome bracket notation:

>>> myarr['x']
array([ 1.], dtype=float32)

When I try modifying the above example by simply replacing the 'array' constructor with 'recarray', I get the following error, which I haven't really grokked yet:

>>> myrecarr = N.recarray([(1.0, 2.0, (1,2))], dtype=dt)
---------------------------------------------------------------------------
exceptions.TypeError            Traceback (most recent call last)

/home/mkg/

/usr/lib/python2.4/site-packages/numpy/core/records.py in __new__(subtype, shape, dtype, buf, offset, strides, formats, names, titles, byteorder, aligned)
    176
    177         if buf is None:
--> 178             self = sb.ndarray.__new__(subtype, shape, (record, descr))
    179         else:
    180             self = sb.ndarray.__new__(subtype, shape, (record, descr),

TypeError: an integer is required

I'm trying to make a record array that stores two record arrays called "states" and "controls", each of which store two float arrays "x" and "dx". The dtype would be something like:

dtype([('states', [('x', 'f4'), ('dx', 'f4')]),
       ('controls', [('x', 'f4'), ('dx', 'f4')])])

It'd be great if I could address elements as:

myarr.states.x[0]

as opposed to

myarr['states']['x'][0]

Any tips would be greatly appreciated. Once I figure this out, I'd be glad to post the solution as an example under the "recarray()" entry in the examples list.

-- Matt
From oliphant at ee.byu.edu Sun Dec 24 20:22:05 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Sun, 24 Dec 2006 18:22:05 -0700
Subject: [Numpy-discussion] nested recarrays
In-Reply-To: <458EE302.1000703@cs.nyu.edu> References: <458EE302.1000703@cs.nyu.edu>
Message-ID: <458F27BD.1050006@ee.byu.edu>

Matthew Koichi Grimes wrote:
> (Newbie alert.)
>
> I'm having trouble making a nested record array. I'm trying to work from the following example on the scipy.org examples page:
>
> >>> mydescriptor = dtype([('x', 'f4'),('y', 'f4'),  # nested recarray
> ...                       ('nested', [('i', 'i2'),('j','i2')])])
> >>> myarr = array([(1.0, 2.0, (1,2))], dtype=mydescriptor)
>
> ... but this isn't really a nested recarray, since you can't refer to fields 'x', 'y', or 'nested' as attributes:

It is nested, but yes, it's not as convenient as attribute access.

> When I try modifying the above example by simply replacing the 'array' constructor with 'recarray', I get the following error, which I haven't really grokked yet:

The problem is that recarray is not analogous to array. It is analogous to ndarray. Use N.rec.array instead.

-Travis
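For instance, something along these lines should give the attribute access Matt is after (an untested sketch using the field layout from his mail, not code from the thread):

import numpy as N

dt = N.dtype([('states',   [('x', 'f4'), ('dx', 'f4')]),
              ('controls', [('x', 'f4'), ('dx', 'f4')])])

# rec.array returns a recarray, which exposes fields as attributes;
# nested structured fields should come back as recarrays as well.
arr = N.rec.array([((1.0, 2.0), (3.0, 4.0))], dtype=dt)
print arr.states.x    # instead of arr['states']['x']
print arr.controls.dx

The bracket notation still works on the result, so nothing is lost by building the array this way.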
""" if n == 0: return a if n < 0: raise ValueError, 'order must be non-negative but got ' + repr(n) a = asanyarray(a) nd = len(a.shape) slice1 = [slice(None)]*nd slice2 = [slice(None)]*nd slice1[axis] = slice(1, None) slice2[axis] = slice(None, -1) slice1 = tuple(slice1) slice2 = tuple(slice2) if n > 1: return diff(a[slice1]-a[slice2], n-1, axis=axis) else: return a[slice1]-a[slice2] def lag(a, n=1, lag_axis=0, concat_axis=1): """ Calculate the nth order discrete lag along given axis. Note: axis=-1 means 'last dimension'. This is the default for the diff function. However, the first dimension (0) may be preferred for time-series analysis. """ a = asanyarray(a) n = ravel(n) # convert input to an array nmax = n.max() # determines length of array to be returned nd = len(a.shape) # number of dimentions in array a s = [slice(None)]*nd # lag for 1st element in n s[lag_axis] = slice(nmax-n[0],-n[0]) ret_a = a[tuple(s)] # array to be returned # lags for other elements in n for i in n[1:]: s[lag_axis] = slice(nmax-i,-i) ret_a = concatenate((ret_a,a[tuple(s)]), concat_axis) return ret_a # testing lag function # test 1 data = arange(10) print "=" * 30 print "test 1 - data" print data print "\nlag 2" print lag(data,2) print "\nlag 1,2,3" print lag(data,range(1,4)) print "=" * 30 + "\n" # test 2 data = arange(10) data = vstack((data,data)).T print "=" * 30 print "test 2 - data" print data print "\nlag 2" print lag(data,2) print "\nlag 1,2,3" print lag(data,range(1,4)) print "=" * 30 + "\n" # test 3 data = arange(10) data = vstack((data,data)).T data = dstack((data,data)) print "=" * 30 print "test 3 - data" print data print "\nlag 2" print lag(data,2) print "\nlag 1,2,3," print lag(data,range(1,4)) print "=" * 30 + "\n" From cournape at gmail.com Tue Dec 26 03:16:38 2006 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Dec 2006 09:16:38 +0100 Subject: [Numpy-discussion] slow numpy.clip ? In-Reply-To: <458C50D7.7080207@gmail.com> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> <4588901B.5030904@ee.byu.edu> <4588A055.5070707@ar.media.kyoto-u.ac.jp> <4588FC54.7040106@ee.byu.edu> <458C50D7.7080207@gmail.com> Message-ID: <5b8d13220612260016y79112a70o5897da68ce53250b@mail.gmail.com> On 12/22/06, Robert Kern wrote: > Charles R Harris wrote: > > > I've been thinking about that a bit. One solution is to have a small > > python program that takes all the pieces and writes one big build file, > > I think something like that happens now. Another might be to use > > includes in a base file; there is nothing sacred about not including .c > > files or not putting code in .h files, it is just a convention, we could > > even chose another extension. I also wonder if we couldn't just link in > > object files. The table of function pointers just needs some addresses > > and, while the python convention of hiding all the function names by > > using static functions is nice, it is probably not required. Maybe we > > could use ctypes in some way? > > > > I am not pushing any of these alternatives at the moment, just putting > > them down. Maybe there are others? > > None that I want to think about. #including separate .c files, leaving the > extension alone, is best, IMO. > > I've studied a bit how exposing C api from python extensions work at the python.org website. 
My understanding is that the problem when splitting into different files is that the C standard has no storage class equivalent to a "shared static", e.g. using a function in several C files of the same shared library without the function being exposed in the shared library. One elegant solution for this is unfortunately not portable: recent gcc versions have this functionality, the new C++ visibility support, which also works for C source files.

http://gcc.gnu.org/wiki/Visibility

This document explains the different ways available for limiting symbols in a DSO:

http://people.redhat.com/drepper/dsohowto.pdf

Having several #includes of C files is the easiest way, and I guess this would be the safest way to start splitting the source files. A better way can always be adopted afterwards anyway, I guess.

The question would then be: how do people think one should split the files? By topic (e.g. one file for array construction/destruction, one file for elementary operations, one file for the C API, etc.)?

I am willing to spend some time on this, if this is considered useful.

cheers,

David

From ivilata at carabos.com Tue Dec 26 04:00:19 2006
From: ivilata at carabos.com (Ivan Vilata i Balaguer)
Date: Tue, 26 Dec 2006 10:00:19 +0100
Subject: [Numpy-discussion] Fixes to Numexpr under 64 bit platforms
Message-ID: <20061226090019.GH11105@tardis.terramar.selidor.net>

Hi all, here you have a patch that fixes some type declaration bugs which cause Numexpr to crash under 64 bit platforms. All of them are confusions between the ``int`` and ``intp`` types, which happen to be the same under 32 bit platforms but not under 64 bit ones, which caused garbage values to be used as shapes and strides.

The errors were easy to spot by looking at the warnings yielded by the compiler. Changes have been tested under a Dual Core AMD Opteron 270 running SuSE 10.0 X86-64 with Python 2.4 and 2.5.

Have nice holidays,

::

	Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
	Cárabos Coop. V.   V  V   Enjoy Data
	                   ""

-------------- next part --------------
Index: interpreter.c
===================================================================
--- interpreter.c	(revision 2465)
+++ interpreter.c	(working copy)
@@ -704,7 +704,7 @@
     rawmemsize = BLOCK_SIZE1 * (size_from_sig(constsig) + size_from_sig(tempsig));
     mem = PyMem_New(char *, 1 + n_inputs + n_constants + n_temps);
     rawmem = PyMem_New(char, rawmemsize);
-    memsteps = PyMem_New(int, 1 + n_inputs + n_constants + n_temps);
+    memsteps = PyMem_New(intp, 1 + n_inputs + n_constants + n_temps);
     if (!mem || !rawmem || !memsteps) {
         Py_DECREF(constants);
         Py_DECREF(constsig);
@@ -822,8 +822,8 @@
     int count;
     int size;
     int findex;
-    int *shape;
-    int *strides;
+    intp *shape;
+    intp *strides;
     int *index;
     char *buffer;
 };
@@ -956,7 +956,7 @@
     PyObject *output = NULL, *a_inputs = NULL;
     struct index_data *inddata = NULL;
     unsigned int n_inputs, n_dimensions = 0;
-    int shape[MAX_DIMS];
+    intp shape[MAX_DIMS];
     int i, j, size, r, pc_error;
     char **inputs = NULL;
     intp strides[MAX_DIMS]; /* clean up XXX */
@@ -1032,7 +1032,7 @@
     for (i = 0; i < n_inputs; i++) {
         PyObject *a = PyTuple_GET_ITEM(a_inputs, i);
         PyObject *b;
-        int strides[MAX_DIMS];
+        intp strides[MAX_DIMS];
         int delta = n_dimensions - PyArray_NDIM(a);
         if (PyArray_NDIM(a)) {
             for (j = 0; j < n_dimensions; j++)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 309 bytes
Desc: Digital signature
URL:
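For reference, the Python-level counterpart of the int/intp distinction the patch is about can be seen directly (a small illustrative sketch, not part of the patch):

import numpy

# intp tracks the pointer size, so its width is platform dependent;
# a C int stays 32 bits on common 64-bit (LP64) platforms.
print numpy.dtype(numpy.intp).itemsize   # 8 on an Opteron like the one above
print numpy.dtype(numpy.int32).itemsize  # 4 everywhere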
From faltet at carabos.com Tue Dec 26 04:46:06 2006
From: faltet at carabos.com (Francesc Altet)
Date: Tue, 26 Dec 2006 10:46:06 +0100
Subject: [Numpy-discussion] Fixes to Numexpr under 64 bit platforms
In-Reply-To: <20061226090019.GH11105@tardis.terramar.selidor.net> References: <20061226090019.GH11105@tardis.terramar.selidor.net>
Message-ID: <200612261046.07278.faltet@carabos.com>

Hey! That's bloody brilliant!

On Tuesday 26 December 2006 10:00, Ivan Vilata i Balaguer wrote:
> Hi all, here you have a patch that fixes some type declaration bugs which cause Numexpr to crash under 64 bit platforms. All of them are confusions between the ``int`` and ``intp`` types, which happen to be the same under 32 bit platforms but not under 64 bit ones, which caused garbage values to be used as shapes and strides.
>
> The errors were easy to spot by looking at the warnings yielded by the compiler. Changes have been tested under a Dual Core AMD Opteron 270 running SuSE 10.0 X86-64 with Python 2.4 and 2.5.
>
> Have nice holidays,
>
> Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
> Cárabos Coop. V.   V  V   Enjoy Data
>                    ""

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From ivilata at carabos.com Tue Dec 26 06:31:43 2006
From: ivilata at carabos.com (Ivan Vilata i Balaguer)
Date: Tue, 26 Dec 2006 12:31:43 +0100
Subject: [Numpy-discussion] Different numpy.int64 in 64 bit platforms?
Message-ID: <20061226113143.GJ11105@tardis.terramar.selidor.net>

I have come across this strange behaviour of NumPy 1.0.1 under a 64 bit AMD Opteron:

>>> import numpy
>>> numpy.__version__
'1.0.1'
>>> numpy.dtype(int).type
>>> numpy.dtype(int).type is numpy.int64
True
>>> numpy.dtype('int64').type
>>> numpy.dtype('int64').type is numpy.int64
True
>>> numpy.dtype(long).type
>>> numpy.dtype(long).type is numpy.int64  # strange, but ok
False
>>> issubclass(numpy.dtype(long).type, numpy.int64)  # what?
False

I.e. the NumPy type used for ``long`` is not the same as for ``int`` or an explicit ``'int64'``, not even an instance. Is this a bug, or is this kind of type comparison discouraged? Thanks!

::

	Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
	Cárabos Coop. V.   V  V   Enjoy Data
	                   ""

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 309 bytes
Desc: Digital signature
URL:

From oliphant at ee.byu.edu Tue Dec 26 19:05:20 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue, 26 Dec 2006 17:05:20 -0700
Subject: [Numpy-discussion] Different numpy.int64 in 64 bit platforms?
In-Reply-To: <20061226113143.GJ11105@tardis.terramar.selidor.net> References: <20061226113143.GJ11105@tardis.terramar.selidor.net>
Message-ID: <4591B8C0.9070703@ee.byu.edu>

Ivan Vilata i Balaguer wrote:
> I have come across this strange behaviour of NumPy 1.0.1 under a 64 bit AMD Opteron:
>
> I.e. the NumPy type used for ``long`` is not the same as for ``int`` or an explicit ``'int64'``, not even an instance. Is this a bug, or is this kind of type comparison discouraged? Thanks!

There is a NumPy type for each underlying c-type (i.e. int, long, short, longlong). How many bits these have is platform dependent. Which of these gets mapped to the name numpy.int64 is also platform dependent.

It is rare that you should be doing is-type comparisons on the type of the array scalar.
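For example (an illustrative sketch along the lines of Ivan's session; which C type owns the name numpy.int64 varies by platform):

>>> import numpy
>>> a = numpy.array([1, 2], dtype=long)
>>> a.dtype.type is numpy.int64           # fragile: False on Ivan's Opteron
False
>>> a.dtype == numpy.dtype(numpy.int64)   # robust: compares the data-types
True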
Equality testing on the data-type object (the thing returned by the dtype attribute of the ndarray and the array scalar) returns true if the data-types are compatible. This is a more reliable test.

So, it's not a bug, but it should be a FAQ.

-Travis

From oliphant at ee.byu.edu Tue Dec 26 19:27:45 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue, 26 Dec 2006 17:27:45 -0700
Subject: [Numpy-discussion] slow numpy.clip ?
In-Reply-To: <5b8d13220612260016y79112a70o5897da68ce53250b@mail.gmail.com> References: <45864074.6090203@ar.media.kyoto-u.ac.jp> <45879279.6030707@hawaii.edu> <4587A926.7070401@ar.media.kyoto-u.ac.jp> <4587B2EF.7010803@gmail.com> <4588901B.5030904@ee.byu.edu> <4588A055.5070707@ar.media.kyoto-u.ac.jp> <4588FC54.7040106@ee.byu.edu> <458C50D7.7080207@gmail.com> <5b8d13220612260016y79112a70o5897da68ce53250b@mail.gmail.com>
Message-ID: <4591BE01.2070808@ee.byu.edu>

David Cournapeau wrote:
> On 12/22/06, Robert Kern wrote:
>> Charles R Harris wrote:
>>
>>> I've been thinking about that a bit. One solution is to have a small python program that takes all the pieces and writes one big build file; I think something like that happens now. Another might be to use includes in a base file; there is nothing sacred about not including .c files or not putting code in .h files -- it is just a convention, we could even choose another extension. I also wonder if we couldn't just link in object files. The table of function pointers just needs some addresses and, while the python convention of hiding all the function names by using static functions is nice, it is probably not required. Maybe we could use ctypes in some way?
>>>
>>> I am not pushing any of these alternatives at the moment, just putting them down. Maybe there are others?
>>
>> None that I want to think about. #including separate .c files, leaving the extension alone, is best, IMO.
>
> The question would then be: how do people think one should split the files? By topic (e.g. one file for array construction/destruction, one file for elementary operations, one file for the C API, etc.)?

I think it's useful, but I don't have time to think very much about it. I suspect anything that's semi-coherent that results in smaller files will be beneficial for editing purposes. The only real opinion I have at this point is that I'd like to see multiarraymodule.c contain little more than include statements (of headers and other .c files) and comments.

-Travis

From ivilata at carabos.com Wed Dec 27 03:23:26 2006
From: ivilata at carabos.com (Ivan Vilata i Balaguer)
Date: Wed, 27 Dec 2006 09:23:26 +0100
Subject: [Numpy-discussion] Small fix to Numexpr getType()
Message-ID: <20061227082326.GK11105@tardis.terramar.selidor.net>

Hi all,

According to Travis' advice in a previous thread (see http://www.mail-archive.com/numpy-discussion%40scipy.org/msg00442.html), I have modified the ``compiler.getType()`` function in Numexpr so that it uses the ``dtype.kind`` attribute instead of ``issubclass()``. The patch is attached.

By the way, what is the proper casing of "Numexpr"? :)

::

	Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
	Cárabos Coop. V.   V  V   Enjoy Data
	                   ""
-------------- next part --------------
Index: compiler.py
===================================================================
--- compiler.py	(revision 2465)
+++ compiler.py	(working copy)
@@ -554,14 +554,14 @@

 def getType(a):
-    t = a.dtype.type
-    if issubclass(t, numpy.bool_):
+    kind = a.dtype.kind
+    if kind == 'b':
         return bool
-    if issubclass(t, numpy.integer):
+    if kind in 'iu':
         return int
-    if issubclass(t, numpy.floating):
+    if kind == 'f':
         return float
-    if issubclass(t, numpy.complexfloating):
+    if kind == 'c':
         return complex
     raise ValueError("unkown type %s" % a.dtype.name)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 309 bytes
Desc: Digital signature
URL:

From gnchen at mac.com Wed Dec 27 12:06:11 2006
From: gnchen at mac.com (Gennan Chen)
Date: Wed, 27 Dec 2006 09:06:11 -0800
Subject: [Numpy-discussion] which fft I should use
Message-ID: <4FB3C0EC-6C5C-4AE7-A8BF-D9433DB3B279@mac.com>

Hi! all,

There are so many fft routines in Scipy/Numpy. Does anyone know which one should be used officially?

Gen

From robert.kern at gmail.com Wed Dec 27 12:58:24 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 27 Dec 2006 12:58:24 -0500
Subject: [Numpy-discussion] which fft I should use
In-Reply-To: <4FB3C0EC-6C5C-4AE7-A8BF-D9433DB3B279@mac.com> References: <4FB3C0EC-6C5C-4AE7-A8BF-D9433DB3B279@mac.com>
Message-ID: <4592B440.2010201@gmail.com>

Gennan Chen wrote:
> Hi! all,
>
> There are so many fft routines in Scipy/Numpy. Does anyone know which one should be used officially?

For maximum portability and speed, use numpy.dual.fft() and its friends. That will use the optimized functions in scipy.fftpack if it is available, and numpy.fft otherwise.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
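For example (a small sketch of the pattern Robert describes; the call is identical with or without scipy installed):

import numpy
from numpy.dual import fft   # resolves to scipy.fftpack if available, else numpy.fft

x = numpy.random.rand(8)
X = fft(x)                   # same call either way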
From Chris.Barker at noaa.gov Wed Dec 27 14:49:06 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 27 Dec 2006 11:49:06 -0800
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <458CCC25.7050507@ee.byu.edu> References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> <458B0000.8070909@gmail.com> <458BA9B5.6050709@gmx.net> <458CCC25.7050507@ee.byu.edu>
Message-ID: <4592CE32.3060700@noaa.gov>

Travis Oliphant wrote:
> It is the combination of SciPy+NumPy+Matplotlib+IPython (+ perhaps a good IDE) that can succeed at being a MATLAB/IDL replacement for a lot of people.
>
> What is also needed is a good "package" of it all --- like the Enthon distribution. This requires quite a bit of thankless work.

I know Robert put some serious effort into "MacEnthon" a while back, but he is no longer maintaining it, which doesn't surprise me a bit -- that looked like a LOT of work.

However, MacEnthon was much bigger than just the packages Travis listed above, and I think Travis has that right -- those are the key ones to do. Let's "just do it!" -- first we need to solve the Fortran+Universal binary problems, though -- that seems to be the technical sticking point on OS-X.

Also, while the Enthon distribution is fabulous, they do tend to stay behind the bleeding edge a fair bit -- it would be nice to have the core packages with the latest and greatest on Windows and Linux too, all as one easy installer (or rpm or .deb or whatever for Linux).

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

Chris.Barker at noaa.gov

From pearu at cens.ioc.ee Wed Dec 27 15:39:18 2006
From: pearu at cens.ioc.ee (pearu at cens.ioc.ee)
Date: Wed, 27 Dec 2006 22:39:18 +0200 (EET)
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <4592CE32.3060700@noaa.gov>
Message-ID:

On Wed, 27 Dec 2006, Christopher Barker wrote:
> Travis Oliphant wrote:
>> It is the combination of SciPy+NumPy+Matplotlib+IPython (+ perhaps a good IDE) that can succeed at being a MATLAB/IDL replacement for a lot of people.
>>
>> What is also needed is a good "package" of it all --- like the Enthon distribution. This requires quite a bit of thankless work.
>
> I know Robert put some serious effort into "MacEnthon" a while back, but he is no longer maintaining it, which doesn't surprise me a bit -- that looked like a LOT of work.
>
> However, MacEnthon was much bigger than just the packages Travis listed above, and I think Travis has that right -- those are the key ones to do. Let's "just do it!" -- first we need to solve the Fortran+Universal binary problems, though -- that seems to be the technical sticking point on OS-X.

Let me add a comment on the Fortran problem (which I assume to be the (lack of) Fortran compiler problem, right?).

I have been working on the f2py rewrite to support wrapping Fortran 90 types among other F90 constructs, and as a result we have almost a complete Fortran parser in Python. It is relatively easy to use this parser to automatically convert Fortran 77 codes that we have in scipy to C codes whenever no Fortran compiler is available. Due to lack of funding this work has been frozen for now, but I'd say there is hope of resolving the Fortran compiler issues for any platform in the future.

Pearu

From Chris.Barker at noaa.gov Wed Dec 27 16:09:42 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 27 Dec 2006 13:09:42 -0800
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: References: Message-ID: <4592E116.4080602@noaa.gov>

pearu at cens.ioc.ee wrote:
> I have been working on the f2py rewrite to support wrapping Fortran 90 types among other F90 constructs, and as a result we have almost a complete Fortran parser in Python. It is relatively easy to use this parser to automatically convert Fortran 77 codes that we have in scipy to C codes whenever no Fortran compiler is available.

Cool!

How is this different from/better than the old standby f2c?

One issue with f2c is that it required a pretty good set of libs to support stuff that Fortran had that C didn't -- complex numbers come to mind; I'm not sure what else is in libf2c.

In fact, I've often wondered why scipy doesn't use f2c.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

Chris.Barker at noaa.gov

From robert.kern at gmail.com Wed Dec 27 16:35:31 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 27 Dec 2006 16:35:31 -0500
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <4592E116.4080602@noaa.gov> References: <4592E116.4080602@noaa.gov>
Message-ID: <4592E723.8020600@gmail.com>

Christopher Barker wrote:
> pearu at cens.ioc.ee wrote:
>> I have been working on the f2py rewrite to support wrapping Fortran 90 types among other F90 constructs, and as a result we have almost a complete Fortran parser in Python. It is relatively easy to use this parser to automatically convert Fortran 77 codes that we have in scipy to C codes whenever no Fortran compiler is available.
>
> Cool!
>
> How is this different from/better than the old standby f2c?
>
> One issue with f2c is that it required a pretty good set of libs to support stuff that Fortran had that C didn't -- complex numbers come to mind; I'm not sure what else is in libf2c.
>
> In fact, I've often wondered why scipy doesn't use f2c.

Generally speaking, g77 was always more likely to work on more platforms with less hassle.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From svetosch at gmx.net Wed Dec 27 19:17:35 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Thu, 28 Dec 2006 01:17:35 +0100
Subject: [Numpy-discussion] Time series: lag function
In-Reply-To: References: Message-ID: <45930D1F.2040801@gmx.net>

Vincent Nijs schrieb:
> I am trying to convert some of my time-series code written in Ox to scipy/numpy (e.g., unit root tests, IRFs, cointegration, etc). Two key functions I need for this are 'lag' and 'diff'. 'diff' is available but 'lag' is apparently not.
>
> Below is my attempt at a lag function. I tried to be somewhat consistent with the diff function which is part of numpy (also listed for convenience). It seems to work fine for a 2-d array but not for a 1-d or 3-d array (see tests at bottom of email). I'd appreciate any suggestions you may have.

Great to see somebody converting from Ox to numpy, I see synergies ahead!

> def lag(a, n=1, lag_axis=0, concat_axis=1):
>     """Calculate the nth order discrete lag along given axis.
>
>     Note: axis=-1 means 'last dimension'. This is the default
>     for the diff function. However, the first dimension (0)
>     may be preferred for time-series analysis.
>     """
>     a = asanyarray(a)
>
>     n = ravel(n)  # convert input to an array

Why don't you leave n as an integer? Maybe you're trying to be too clever here. I think it's a good idea to have lag resemble the existing diff function, and then a single number n should be enough.

(And I'm not sure about your concat_axis -- e.g., what does axis=1 mean for a 1-d array?)

Do you get your errors also for integer n?

cheers,
sven

From v-nijs at kellogg.northwestern.edu Wed Dec 27 20:01:55 2006
From: v-nijs at kellogg.northwestern.edu (Vincent Nijs)
Date: Wed, 27 Dec 2006 19:01:55 -0600
Subject: [Numpy-discussion] Time series: lag function
In-Reply-To: <45930D1F.2040801@gmx.net>
Message-ID:

Sven:

I simplified the function to create lags only along axis 0 (see attached). I am using c_ now, which seems to play nice with 1-d and 2-d arrays.
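The core of the approach looks roughly like this (a stripped-down sketch of the idea only -- not the attached lag.py itself):

import numpy as num

def lag(a, n=1):
    """Lag along axis 0; n may be an int or a sequence of lags.
    (Illustrative sketch -- not the attached lag.py.)"""
    a = num.asanyarray(a)
    n = num.ravel(n)   # accept an int or a list of lags
    nmax = n.max()     # the longest lag fixes the output length
    # slice out each lagged copy and glue them column-wise with c_
    return num.c_[tuple(a[nmax - i : len(a) - i] for i in n)]

With that, lag(data, 2) and lag(data, range(1, 4)) both work on 1-d and 2-d input, which is what motivates the ravel(n) discussed below.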
The reason I am using 'n = ravel(n)' in the code is that I want to be able to pass integers as well as lists. For example, I want each of the following to work:

lag(a,2)
lag(a,range(1,3))
lag(a,[1]+range(4,6))

I'd actually also like the following to work:

lag(a,1,3,6,8)

I could do that with *n, but then I don't think I can use range(x,y) in the same function call -- for example, lag(a,1,3,range(6,9)). You probably don't need this flexibility when calling diff() since, at least in TS applications, I only ever need diff(a,1) or diff(a,2).

Thanks,

Vincent

On 12/27/06 6:17 PM, "Sven Schreiber" wrote:

> Vincent Nijs schrieb:
>> I am trying to convert some of my time-series code written in Ox to scipy/numpy (e.g., unit root tests, IRFs, cointegration, etc). Two key functions I need for this are 'lag' and 'diff'. 'diff' is available but 'lag' is apparently not.
>>
>> Below is my attempt at a lag function. I tried to be somewhat consistent with the diff function which is part of numpy (also listed for convenience). It seems to work fine for a 2-d array but not for a 1-d or 3-d array (see tests at bottom of email). I'd appreciate any suggestions you may have.
>
> Great to see somebody converting from Ox to numpy, I see synergies ahead!
>
>> def lag(a, n=1, lag_axis=0, concat_axis=1):
>>     """Calculate the nth order discrete lag along given axis.
>>     ...
>>     """
>>     a = asanyarray(a)
>>
>>     n = ravel(n)  # convert input to an array
>
> Why don't you leave n as an integer? Maybe you're trying to be too clever here. I think it's a good idea to have lag resemble the existing diff function, and then a single number n should be enough.
>
> (And I'm not sure about your concat_axis -- e.g., what does axis=1 mean for a 1-d array?)
>
> Do you get your errors also for integer n?
>
> cheers,
> sven

--
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lag.py
Type: application/octet-stream
Size: 479 bytes
Desc: not available
URL:

From v-nijs at kellogg.northwestern.edu Wed Dec 27 20:10:45 2006
From: v-nijs at kellogg.northwestern.edu (Vincent Nijs)
Date: Wed, 27 Dec 2006 19:10:45 -0600
Subject: [Numpy-discussion] newbie: attempt at data frame
In-Reply-To: Message-ID:

I just started working on a time-series module/class in scipy/numpy, and it seemed useful to have some of the R data-frame functionality (i.e., select columns of data based on variable names). I tried rec-arrays but couldn't get them to work the way I wanted. I also looked at the Dataframe class by Andrew Straw, but at over 400 lines of code that seemed pretty complicated, to me at least.

I searched the mailing-list archives and found a discussion on 'Table like array' (see excerpt below). To get the minimal functionality discussed, I wrote a simple class (see attached) to try and implement X.get('a','c'), where 'a' and 'c' are variable names linked to columns of data in X. I added some test code, so if you run the code in the attachment you will see that it seems to work. However, since this is my first class, I'd appreciate your input on the approach I used and any suggestions on how to improve the class (or use something else).
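The core of such a class can be quite small -- something like this (a bare-bones sketch of the idea, not the attached dbase.py):

import numpy as num

class dbase(object):
    """Bare-bones data frame: columns of a 2-d array addressed by name.
    (Illustrative sketch -- not the attached dbase.py.)"""
    def __init__(self, varnames, data):
        self.varnames = list(varnames)   # one label per column
        self.data = num.asarray(data)    # observations in rows
    def get(self, *names):
        """Return the columns for the given variable names."""
        cols = [self.varnames.index(nm) for nm in names]
        return self.data[:, cols]

With data holding the observations, X = dbase(['a', 'b', 'c'], data) then makes X.get('a', 'c') return the first and third columns.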
I'd like to read the data and variable names directly from a single csv file. I tried this through the python csv module, but it would read all data as strings and I couldn't figure out how to easily separate the variable names and the data.

Thanks,
Vincent

> [Numpy-discussion] Re: [SciPy-user] Table like array
> Paul Barrett pebarrett at gmail.com
> Wed Mar 1 06:45:02 CST 2006
>
> On 3/1/06, Travis Oliphant wrote:
>>
>> How many people would like to see x['f1','f2','f5'] return a new array with a new data-type descriptor constructed from the provided fields?
>
> I'm surprised that it's not already available.
>
> -- Paul

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dbase.py.zip
Type: application/octet-stream
Size: 7219 bytes
Desc: not available
URL:

From nadavh at visionsense.com Thu Dec 28 08:35:03 2006
From: nadavh at visionsense.com (Nadav Horesh)
Date: Thu, 28 Dec 2006 15:35:03 +0200
Subject: [Numpy-discussion] cpuinfo fails to recognize Core2 cpu on linux
Message-ID: <07C6A61102C94148B8104D42DE95F7E8C8F162@exchange2k.envision.co.il>

System: gentoo linux (64 bits) on Core2 Duo.

scipy fails to install since cpuinfo identified the cpu as i686.

Solution: I changed the linux_cpuinfo._is_Nocona() method to:

    def _is_Nocona(self):
        # return self.is_PentiumIV() and self.is_64bit()
        return re.match(r'Intel.*?Core.*\b',
                        self.info[0]['model name']) is not None

I am not sure that this solution is general enough. Here is my "cat /proc/cpuinfo" result:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping        : 6
cpu MHz         : 1596.000
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 4797.90
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping        : 6
cpu MHz         : 2394.000
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 4795.32
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Nadav.

From emsellem at obs.univ-lyon1.fr Thu Dec 28 10:33:35 2006
From: emsellem at obs.univ-lyon1.fr (Eric Emsellem)
Date: Thu, 28 Dec 2006 16:33:35 +0100
Subject: [Numpy-discussion] extracting values from an array
Message-ID: <4593E3CF.9060005@obs.univ-lyon1.fr>

Hi,

I have a simple problem: extracting a subarray from a bigger one, for which I don't find an elegant/easy solution. I tried using searchsorted and other tricks, but I always end up with many "if" statements testing all cases (also because searchsorted does not work on arrays which are sorted in decreasing order). There must be an elegant solution, but I cannot find it.

==> I have one array of floats which has, e.g.,
regular steps (as produced by arange) but can either be decreasing or increasing, as in:

x = arange(0.,1.,0.1)    ## or
x = arange(0.,-1.,-0.1)

I also have another "data" y array, which has the same length as x. And I would like to extract from these arrays the subarrays corresponding to the range x1 -- x2, going from x1 to x2. The output array should have its first element starting from the element in x closest to x1 (and in the range defined by x1, x2), and then going towards x2, for all cases (decreasing or increasing order for x, and x1 >= x2 or x1 <= x2).

So here is the output I would like to get in 3 different simple examples:

### Increasing order in x, and x1 <= x2 :
x = arange(0.,1.,0.1)
x1 = 0.1
x2 = 0.55
### the output I would like is simply: array([ 0.1, 0.2, 0.3, 0.4, 0.5])

### decreasing order in x, and x1 <= x2 :
x = arange(0.,-1.,-0.1)
x1 = -0.55
x2 = -0.1
### what I would like is then: array([ -0.5, -0.4, -0.3, -0.2, -0.1])

### decreasing order in x, and x1 >= x2 :
x = arange(0.,-1.,-0.1)
x1 = -0.1
x2 = -0.55
### what I would like is then: array([ -0.1, -0.2, -0.3, -0.4, -0.5])

etc....

And it should work if both x1 and x2 are outside the given range provided in x (the output should then be an empty array).

Note that I also need to extract the corresponding subarray from the data array (same indices as the ones I extract from x).

I hope this is clear. It is a very simple problem, but I cannot see a simple solution without involving lots of stupid "if" statements. It would be great if these "if" statements were hidden in some efficient numpy tricks/functions.

thanks for any input.

Eric

From gregwillden at gmail.com Thu Dec 28 11:31:44 2006
From: gregwillden at gmail.com (Greg Willden)
Date: Thu, 28 Dec 2006 10:31:44 -0600
Subject: [Numpy-discussion] extracting values from an array
In-Reply-To: <4593E3CF.9060005@obs.univ-lyon1.fr> References: <4593E3CF.9060005@obs.univ-lyon1.fr>
Message-ID: <903323ff0612280831x1d6aa833w3c566575e5c40596@mail.gmail.com>

Hi Eric,

Here are ways of doing this, starting with

import numpy as N

On 12/28/06, Eric Emsellem wrote:
>
> ### Increasing order in x, and x1 <= x2 :
> x = arange(0.,1.,0.1)
> x1 = 0.1
> x2 = 0.55
> ### the output I would like is simply: array([ 0.1, 0.2, 0.3, 0.4, 0.5])

How about this?

x = N.arange(0.,1.,0.1)
x[ (x>=0.1) & (x<=0.55) ]

> ### decreasing order in x, and x1 <= x2 :
> x = arange(0.,-1.,-0.1)
> x1 = -0.55
> x2 = -0.1
> ### what I would like is then: array([ -0.5, -0.4, -0.3, -0.2, -0.1])

x = N.arange(0.,-1.,-0.1)
N.sort( x[ (x<=-0.1) & (x>=-0.55) ] )

or

x[(x<=-0.1)&(x>=-0.55)][::-1]

which just reverses the returned array.

> ### decreasing order in x, and x1 >= x2 :
> x = arange(0.,-1.,-0.1)
> x1 = -0.1
> x2 = -0.55
> ### what I would like is then: array([ -0.1, -0.2, -0.3, -0.4, -0.5])

x = N.arange(0.,-1.,-0.1)
x[ (x<=-0.1) & (x>=-0.55) ]

A few comments, because I'm not totally clear on what you want to do:

(x<=-0.1)&(x>=-0.55)

will give you a boolean array of the same length as x;

find((x<=-0.1)&(x>=-0.55))

will return the list of indices where the argument is true.

Regards,
Greg

--
Linux. Because rebooting is for adding hardware.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From emsellem at obs.univ-lyon1.fr Thu Dec 28 11:56:37 2006
From: emsellem at obs.univ-lyon1.fr (Eric Emsellem)
Date: Thu, 28 Dec 2006 17:56:37 +0100
Subject: [Numpy-discussion] extracting values from an array
In-Reply-To: <903323ff0612280831x1d6aa833w3c566575e5c40596@mail.gmail.com> References: <4593E3CF.9060005@obs.univ-lyon1.fr> <903323ff0612280831x1d6aa833w3c566575e5c40596@mail.gmail.com>
Message-ID: <4593F745.5050804@obs.univ-lyon1.fr>

An HTML attachment was scrubbed...
URL:

From gregwillden at gmail.com Thu Dec 28 12:38:23 2006
From: gregwillden at gmail.com (Greg Willden)
Date: Thu, 28 Dec 2006 11:38:23 -0600
Subject: [Numpy-discussion] extracting values from an array
In-Reply-To: <4593F745.5050804@obs.univ-lyon1.fr> References: <4593E3CF.9060005@obs.univ-lyon1.fr> <903323ff0612280831x1d6aa833w3c566575e5c40596@mail.gmail.com> <4593F745.5050804@obs.univ-lyon1.fr>
Message-ID: <903323ff0612280938h750aafefyebafab3608b82c0e@mail.gmail.com>

Hi Eric,

Well, I think that you have the parts that you need. Perhaps something like this is what you want: put x1 and x2 into an array and sort it, then access it from the sorted array.

x = N.arange(0.,-1.,-0.1);
xs = sort(array([-0.1, -0.55]));
sort(x[(x >= xs[0]) & (x <= xs[1])])

returns: [-0.5,-0.4,-0.3,-0.2,-0.1,]

x = N.arange(0.,1.,0.1);
xs = sort(array([0.1, 0.55]));
sort(x[(x >= xs[0]) & (x <= xs[1])])

returns: [ 0.1, 0.2, 0.3, 0.4, 0.5,]

Same code, just different x and different limits going into xs.

Cheers,
Greg

On 12/28/06, Eric Emsellem wrote:
>
> Hi,
> thanks for the answer, but I guess my request was not clear. What I want is something which works in ALL cases, so that function(x, x1, x2) provides the output I mentioned... What you propose (as far as I can see) depends on the values of x1, x2, their order and the order of x (decreasing, increasing)...
>
> if you have a hint on how to do this without TESTING how x is ordered (dec, inc) and which of x1 or x2 is larger...
> thanks
>
> Eric
>
> Greg Willden wrote:
>
> Hi Eric,
> Here are ways of doing this, starting with
> import numpy as N
>
> On 12/28/06, Eric Emsellem wrote:
>>
>> ### Increasing order in x, and x1 <= x2 :
>> x = arange(0.,1.,0.1)
>> x1 = 0.1
>> x2 = 0.55
>> ### the output I would like is simply: array([ 0.1, 0.2, 0.3, 0.4, 0.5])
>
> How about this?
> x = N.arange(0.,1.,0.1)
> x[ (x>=0.1) & (x<=0.55) ]
>
>> ### decreasing order in x, and x1 <= x2 :
>> x = arange(0.,-1.,-0.1)
>> x1 = -0.55
>> x2 = -0.1
>> ### what I would like is then: array([ -0.5, -0.4, -0.3, -0.2, -0.1])
>
> x = N.arange(0.,-1.,-0.1)
> N.sort( x[ (x<=-0.1) & (x>=-0.55) ] )
> or
> x[(x<=-0.1)&(x>=-0.55)][::-1]
> which just reverses the returned array.
>
>> ### decreasing order in x, and x1 >= x2 :
>> x = arange(0.,-1.,-0.1)
>> x1 = -0.1
>> x2 = -0.55
>> ### what I would like is then: array([ -0.1, -0.2, -0.3, -0.4, -0.5])
>
> x = N.arange(0.,-1.,-0.1)
> x[ (x<=-0.1) & (x>=-0.55) ]
>
> A few comments, because I'm not totally clear on what you want to do:
> (x<=-0.1)&(x>=-0.55)
> will give you a boolean array of the same length as x;
> find((x<=-0.1)&(x>=-0.55))
> will return the list of indices where the argument is true.
>
> Regards,
> Greg
>
> --
> Linux. Because rebooting is for adding hardware.
>
> ------------------------------
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> --
> ====================================================================
> Eric Emsellem emsellem at obs.univ-lyon1.fr
> Centre de Recherche Astrophysique de Lyon
> 9 av. Charles-Andre tel: +33 (0)4 78 86 83 84
> 69561 Saint-Genis Laval Cedex fax: +33 (0)4 78 86 83 86
> France http://www-obs.univ-lyon1.fr/eric.emsellem
> ====================================================================
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Linux. Because rebooting is for adding hardware.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From emsellem at obs.univ-lyon1.fr Thu Dec 28 12:59:01 2006
From: emsellem at obs.univ-lyon1.fr (Eric Emsellem)
Date: Thu, 28 Dec 2006 18:59:01 +0100
Subject: [Numpy-discussion] extracting values from an array
In-Reply-To: <903323ff0612280938h750aafefyebafab3608b82c0e@mail.gmail.com>
References: <4593E3CF.9060005@obs.univ-lyon1.fr> <903323ff0612280831x1d6aa833w3c566575e5c40596@mail.gmail.com> <4593F745.5050804@obs.univ-lyon1.fr> <903323ff0612280938h750aafefyebafab3608b82c0e@mail.gmail.com>
Message-ID: <459405E5.4060303@obs.univ-lyon1.fr>

looks ok, except that I don't want to sort the output but keep the right order depending on x0 and x1, so I then have to add the order I want for the output array, maybe with something like:

import numpy as num

## init
x0 = -0.55
x1 = -0.1
start, stop, step = 0., -1., -0.1
x = num.arange(start, stop, step)

## getting the right output
xs = num.sort(num.array([x0, x1]))
x[(x >= xs[0]) & (x <= xs[1])][::int(num.sign(x1 - x0) * num.sign(step))]

should work. I don't see a simpler way here...
thanks to get me on track!!!

Eric

Greg Willden wrote:
> Hi Eric,
> Well I think that you have the parts that you need.
> Perhaps something like this is what you want. Put x1 and x2 into an array
> and sort it, then access it from the sorted array.
>
> x = N.arange(0.,-1.,-0.1)
> xs = N.sort(N.array([-0.1, -0.55]))
> N.sort(x[(x >= xs[0]) & (x <= xs[1])])
>
> returns: [-0.5,-0.4,-0.3,-0.2,-0.1,]
>
> x = N.arange(0.,1.,0.1)
> xs = N.sort(N.array([0.1, 0.55]))
> N.sort(x[(x >= xs[0]) & (x <= xs[1])])
>
> returns: [ 0.1, 0.2, 0.3, 0.4, 0.5,]

--
====================================================================
Eric Emsellem emsellem at obs.univ-lyon1.fr
Centre de Recherche Astrophysique de Lyon
9 av. Charles-Andre tel: +33 (0)4 78 86 83 84
69561 Saint-Genis Laval Cedex fax: +33 (0)4 78 86 83 86
France http://www-obs.univ-lyon1.fr/eric.emsellem
====================================================================

From eike.welk at gmx.net Thu Dec 28 13:54:55 2006
From: eike.welk at gmx.net (Eike Welk)
Date: Thu, 28 Dec 2006 19:54:55 +0100
Subject: [Numpy-discussion] newbie: attempt at data frame
In-Reply-To: References: Message-ID: <200612281954.56003.eike.welk@gmx.net>

If your main concern is to store scientific data on disk you might try:
http://www.pytables.org/moin

However, it uses numarray internally and a C library, which you have to build from source. (You use a Mac, right?)

Concerning your code:
- Your two file solution seems impractical to me. I think you should just pickle your whole dbase object.
- Maybe you should write 'load' and 'store' methods that create the temporary file, Pickler and Unpickler objects.
- The __init__ method should then construct the object from a list of variable names and an array.
- Of course you need a set method.

More ideas:
- A special variable name 'time'. Then you can implement a getAtTime(varNameList, timePoint) method with interpolation.
- A 'plot' method that works like matplotlib's plot function.
- An extract(varNameList) method, that returns a new dbase object with only the selected variables.
- A companion class that can hold several time series at once to compare different experiments.

Finally, post the code to the mailing list. At least I would like to use such a class :-).

Yours
Eike.
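A minimal sketch of the 'load' and 'store' methods Eike suggests, assuming the dbase object keeps its variable names and its data array in attributes called varnames and data (both attribute names are made up here):

import cPickle

class dbase:
    # ... rest of the class as posted ...

    def store(self, fname):
        # pickle the variable names and the data array together
        f = open(fname, 'wb')
        cPickle.dump((self.varnames, self.data), f, 2)
        f.close()

    def load(self, fname):
        # restore the variable names and the data array
        f = open(fname, 'rb')
        self.varnames, self.data = cPickle.load(f)
        f.close()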
From v-nijs at kellogg.northwestern.edu Thu Dec 28 14:40:12 2006
From: v-nijs at kellogg.northwestern.edu (Vincent Nijs)
Date: Thu, 28 Dec 2006 13:40:12 -0600
Subject: [Numpy-discussion] newbie: attempt at data frame
In-Reply-To: <200612281954.56003.eike.welk@gmx.net>
Message-ID:

Thanks for the input Eike.

I will add load and store methods to Pickle/UnPickle the object. First, however, I have to get the data into the class from an ascii file (txt or csv).

I'd like to read the data and variable names directly from a single csv file. I tried this through the python csv module but it would read all data as strings and I couldn't figure out how to easily separate the variable names and the data. If you have any suggestion on how I might do this please let me know.

Unfortunately I don't know what a 'set' method is or would do :) Could you point to an example perhaps?

I like your ideas for extending the class. I'll look into that when I get the basic class working.

Best,

Vincent

On 12/28/06 12:54 PM, "Eike Welk" wrote:

> If your main concern is to store scientific data on disk you might try:
> http://www.pytables.org/moin
>
> However, it uses numarray internally and a C library, which you have
> to build from source. (You use a Mac, right?)
>
> Concerning your code:
> - Your two file solution seems impractical to me. I think you should
> just pickle your whole dbase object.
> - Maybe you should write 'load' and 'store' methods that create the
> temporary file, Pickler and Unpickler objects.
> - The __init__ method should then construct the object from a list of
> variable names and an array.
> - Of course you need a set method.
>
> More ideas:
> - A special variable name 'time'. Then you can implement a
> getAtTime(varNameList, timePoint) method with interpolation.
> - A 'plot' method that works like matplotlib's plot function.
> - An extract(varNameList) method, that returns a new dbase object with
> only the selected variables.
> - A companion class that can hold several time series at once to
> compare different experiments.
>
> Finally, post the code to the mailing list. At least I would like to
> use such a class :-).
>
> Yours
> Eike.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Vincent R. Nijs
Assistant Professor of Marketing
Kellogg School of Management, Northwestern University
2001 Sheridan Road, Evanston, IL 60208-2001
Phone: +1-847-491-4574 Fax: +1-847-491-2498
E-mail: v-nijs at kellogg.northwestern.edu
Skype: vincentnijs
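A minimal sketch of one way to read such a file with the csv module, assuming a header row of variable names followed by purely numeric rows (the file name and function name are hypothetical):

import csv
import numpy as N

def loadcsv(fname):
    # first row holds the variable names, the rest is numeric data
    reader = csv.reader(open(fname, 'r'))
    varnames = reader.next()
    data = N.array([[float(v) for v in row] for row in reader])
    return varnames, data

# usage: varnames, data = loadcsv('mydata.csv')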
From v-nijs at kellogg.northwestern.edu Thu Dec 28 18:39:12 2006
From: v-nijs at kellogg.northwestern.edu (Vincent Nijs)
Date: Thu, 28 Dec 2006 17:39:12 -0600
Subject: [Numpy-discussion] newbie: attempt at data frame
In-Reply-To: Message-ID:

Based on Eike's input the dbase class can now also load and dump (simple) csv and pickle files. See the tests at the bottom of the file and the doc-strings.

If there is an easy way to read array data + variable names using the csv module it would be great if that could be added to cookbook/InputOutput. I couldn't figure out how to do it.

Eike:
I think I can figure out how to add a plot method. However, if you have some more suggestions on how to implement the getAtTime, extract, and set methods you mentioned that would be great.

Vincent

On 12/28/06 1:40 PM, "Vincent Nijs" wrote:

> Thanks for the input Eike.
>
> I will add load and store methods to Pickle/UnPickle the object. First,
> however, I have to get the data into the class from an ascii file (txt or
> csv).
>
> I'd like to read the data and variable names directly from a single csv
> file. I tried this through the python csv module but it would read all data
> as strings and I couldn't figure out how to easily separate the variable
> names and the data. If you have any suggestion on how I might do this please
> let me know.
>
> Unfortunately I don't know what a 'set' method is or would do :) Could you
> point to an example perhaps?
>
> I like your ideas for extending the class. I'll look into that when I get
> the basic class working.
>
> Best,
>
> Vincent
>
> On 12/28/06 12:54 PM, "Eike Welk" wrote:
>
>> If your main concern is to store scientific data on disk you might try:
>> http://www.pytables.org/moin
>>
>> However, it uses numarray internally and a C library, which you have
>> to build from source. (You use a Mac, right?)
>>
>> Concerning your code:
>> - Your two file solution seems impractical to me. I think you should
>> just pickle your whole dbase object.
>> - Maybe you should write 'load' and 'store' methods that create the
>> temporary file, Pickler and Unpickler objects.
>> - The __init__ method should then construct the object from a list of
>> variable names and an array.
>> - Of course you need a set method.
>>
>> More ideas:
>> - A special variable name 'time'. Then you can implement a
>> getAtTime(varNameList, timePoint) method with interpolation.
>> - A 'plot' method that works like matplotlib's plot function.
>> - An extract(varNameList) method, that returns a new dbase object with
>> only the selected variables.
>> - A companion class that can hold several time series at once to
>> compare different experiments.
>>
>> Finally, post the code to the mailing list. At least I would like to
>> use such a class :-).
>>
>> Yours
>> Eike.
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Vincent R. Nijs
Assistant Professor of Marketing
Kellogg School of Management, Northwestern University
2001 Sheridan Road, Evanston, IL 60208-2001
Phone: +1-847-491-4574 Fax: +1-847-491-2498
E-mail: v-nijs at kellogg.northwestern.edu
Skype: vincentnijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dbase.py
Type: application/octet-stream
Size: 2800 bytes
Desc: not available
URL:
From v-nijs at kellogg.northwestern.edu Thu Dec 28 20:43:42 2006
From: v-nijs at kellogg.northwestern.edu (Vincent Nijs)
Date: Thu, 28 Dec 2006 19:43:42 -0600
Subject: [Numpy-discussion] newbie: attempt at data frame
In-Reply-To: Message-ID:

Sorry for the extra post. There were a few errors in the previous attachment.

Vincent

On 12/28/06 5:39 PM, "Vincent Nijs" wrote:

> Based on Eike's input the dbase class can now also load and dump (simple)
> csv and pickle files. See the tests at the bottom of the file and the
> doc-strings.
>
> If there is an easy way to read array data + variable names using the csv
> module it would be great if that could be added to cookbook/InputOutput. I
> couldn't figure out how to do it.
>
> Eike:
> I think I can figure out how to add a plot method. However, if you have some
> more suggestions on how to implement the getAtTime, extract, and set methods
> you mentioned that would be great.
>
> Vincent
>
> On 12/28/06 1:40 PM, "Vincent Nijs" wrote:
>
>> Thanks for the input Eike.
>>
>> I will add load and store methods to Pickle/UnPickle the object. First,
>> however, I have to get the data into the class from an ascii file (txt or
>> csv).
>>
>> I'd like to read the data and variable names directly from a single csv
>> file. I tried this through the python csv module but it would read all data
>> as strings and I couldn't figure out how to easily separate the variable
>> names and the data. If you have any suggestion on how I might do this please
>> let me know.
>>
>> Unfortunately I don't know what a 'set' method is or would do :) Could you
>> point to an example perhaps?
>>
>> I like your ideas for extending the class. I'll look into that when I get
>> the basic class working.
>>
>> Best,
>>
>> Vincent
>>
>> On 12/28/06 12:54 PM, "Eike Welk" wrote:
>>
>>> If your main concern is to store scientific data on disk you might try:
>>> http://www.pytables.org/moin
>>>
>>> However, it uses numarray internally and a C library, which you have
>>> to build from source. (You use a Mac, right?)
>>>
>>> Concerning your code:
>>> - Your two file solution seems impractical to me. I think you should
>>> just pickle your whole dbase object.
>>> - Maybe you should write 'load' and 'store' methods that create the
>>> temporary file, Pickler and Unpickler objects.
>>> - The __init__ method should then construct the object from a list of
>>> variable names and an array.
>>> - Of course you need a set method.
>>>
>>> More ideas:
>>> - A special variable name 'time'. Then you can implement a
>>> getAtTime(varNameList, timePoint) method with interpolation.
>>> - A 'plot' method that works like matplotlib's plot function.
>>> - An extract(varNameList) method, that returns a new dbase object with
>>> only the selected variables.
>>> - A companion class that can hold several time series at once to
>>> compare different experiments.
>>>
>>> Finally, post the code to the mailing list. At least I would like to
>>> use such a class :-).
>>>
>>> Yours
>>> Eike.
>>>
>>> _______________________________________________
>>> Numpy-discussion mailing list
>>> Numpy-discussion at scipy.org
>>> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Vincent R. Nijs
Assistant Professor of Marketing
Kellogg School of Management, Northwestern University
2001 Sheridan Road, Evanston, IL 60208-2001
Phone: +1-847-491-4574 Fax: +1-847-491-2498
E-mail: v-nijs at kellogg.northwestern.edu
Skype: vincentnijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dbase.py
Type: application/octet-stream
Size: 2862 bytes
Desc: not available
URL:

From nadavh at visionsense.com Fri Dec 29 01:02:07 2006
From: nadavh at visionsense.com (Nadav Horesh)
Date: Fri, 29 Dec 2006 08:02:07 +0200
Subject: [Numpy-discussion] optimize.fmin_cg bug
Message-ID: <07C6A61102C94148B8104D42DE95F7E8C8F163@exchange2k.envision.co.il>

optimize.fmin_cg fails when the function to be minimized returns a numpy scalar:

>>> from numpy import *
>>> from scipy import optimize
>>> f = lambda x: exp(x)-x
>>> df = lambda x: exp(x)-1.0
>>> scipy.optimize.fmin(f, [0.2])
Traceback (most recent call last):
  File "", line 1, in
    scipy.optimize.fmin(f, [0.2])
NameError: name 'scipy' is not defined
>>> optimize.fmin(f, [0.2])
Optimization terminated successfully.
         Current function value: 1.000000
         Iterations: 15
         Function evaluations: 30
array([ 3.88578059e-16])
>>> optimize.fmin_cg(f, [0.2], df)
Traceback (most recent call last):
  File "", line 1, in
    optimize.fmin_cg(f, [0.2], df)
  File "C:\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 855, in fmin_cg
    old_fval_backup,old_old_fval_backup)
  File "C:\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 471, in line_search
    phi0, derphi0, c1, c2)
  File "C:\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 359, in zoom
    a_j = _cubicmin(a_lo, phi_lo, derphi_lo, a_hi, phi_hi, a_rec, phi_rec)
  File "C:\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 309, in _cubicmin
    [A,B] = numpy.dot([[dc**2, -db**2],[-dc**3, db**3]],[fb-fa-C*db,fc-fa-C*dc])
ValueError: objects are not aligned

# Here is the cure: Make f return a python float
>>> f = lambda x: float(exp(x)-x)
>>> optimize.fmin_cg(f, [0.2], df)
Optimization terminated successfully.
         Current function value: 1.000000
         Iterations: 2
         Function evaluations: 14
         Gradient evaluations: 8
array([ -8.15285339e-14])

I had this error with scipy 0.5.2 and scipy from svn.

Nadav.

From mail at stevesimmons.com Fri Dec 29 04:05:23 2006
From: mail at stevesimmons.com (Stephen Simmons)
Date: Fri, 29 Dec 2006 03:05:23 -0600
Subject: [Numpy-discussion] Advice please on efficient subtotal function
Message-ID:

Hi,

I'm looking for efficient ways to subtotal a 1-d array onto a 2-D grid. This is more easily explained in code than words, thus:

for n in xrange(len(data)):
    totals[ i[n], j[n] ] += data[n]

data comes from a series of PyTables files with ~200m rows. Each row has ~20 cols, and I use the first three columns (which are 1-3 char strings) to form the indexing functions i[] and j[], then want to calc averages of the remaining 17 numerical cols.

I have tried various indirect ways of doing this with searchsorted and bincount, but intuitively they feel overly complex solutions to what is essentially a very simple problem.

My work involves comparing the subtotals for various different segmentation strategies (the i[] and j[] indexing functions). Efficient solutions are important because I need to make many passes through the 200m rows of data.
Memory usage is the easiest thing for me to adjust by changing how many rows of data to read in for each pass and then reusing the same array data buffers.

Thanks in advance for any suggestions!

Stephen

From gregwillden at gmail.com Fri Dec 29 09:04:00 2006
From: gregwillden at gmail.com (Greg Willden)
Date: Fri, 29 Dec 2006 08:04:00 -0600
Subject: [Numpy-discussion] Advice please on efficient subtotal function
In-Reply-To: References: Message-ID: <903323ff0612290604m62f7193bwb0451f656093d214@mail.gmail.com>

Hi Stephen,
If you want to sum/average down a column or across a row you can use sum(). The optional axis={0,1} parameter determines whether you are summing down a column (the default, or axis=0) or across a row (axis=1).
Greg

On 12/29/06, Stephen Simmons wrote:
>
> Hi,
>
> I'm looking for efficient ways to subtotal a 1-d array onto a 2-D grid.
> This is more easily explained in code than words, thus:
>
> for n in xrange(len(data)):
>     totals[ i[n], j[n] ] += data[n]
>
> data comes from a series of PyTables files with ~200m rows. Each row has
> ~20 cols, and I use the first three columns (which are 1-3 char strings) to
> form the indexing functions i[] and j[], then want to calc averages of the
> remaining 17 numerical cols.
>
> I have tried various indirect ways of doing this with searchsorted and
> bincount, but intuitively they feel overly complex solutions to what is
> essentially a very simple problem.
>
> My work involves comparing the subtotals for various different
> segmentation strategies (the i[] and j[] indexing functions). Efficient
> solutions are important because I need to make many passes through the
> 200m rows of data.
> Memory usage is the easiest thing for me to adjust by changing how many
> rows of data to read in for each pass and then reusing the same array
> data buffers.
>
> Thanks in advance for any suggestions!
>
> Stephen
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Linux. Because rebooting is for adding hardware.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From faltet at carabos.com Fri Dec 29 10:05:28 2006
From: faltet at carabos.com (Francesc Altet)
Date: Fri, 29 Dec 2006 16:05:28 +0100
Subject: [Numpy-discussion] Advice please on efficient subtotal function
In-Reply-To: References: Message-ID: <200612291605.29095.faltet@carabos.com>

On Friday 29 December 2006 10:05, Stephen Simmons wrote:
> Hi,
>
> I'm looking for efficient ways to subtotal a 1-d array onto a 2-D grid.
> This is more easily explained in code than words, thus:
>
> for n in xrange(len(data)):
>     totals[ i[n], j[n] ] += data[n]
>
> data comes from a series of PyTables files with ~200m rows. Each row has
> ~20 cols, and I use the first three columns (which are 1-3 char strings) to
> form the indexing functions i[] and j[], then want to calc averages of the
> remaining 17 numerical cols.
>
> I have tried various indirect ways of doing this with searchsorted and
> bincount, but intuitively they feel overly complex solutions to what is
> essentially a very simple problem.
>
> My work involves comparing the subtotals for various different segmentation
> strategies (the i[] and j[] indexing functions). Efficient solutions are
> important because I need to make many passes through the 200m rows of data.
> Memory usage is the easiest thing for me to adjust by changing how many
> rows of data to read in for each pass and then reusing the same array data
> buffers.

Well, from your words I guess you should already have tested this, but just in case. As PyTables saves data in tables row-wise, it is always faster using the complete row for computations in each iteration than using just a single column. This is shown in the small benchmark that I'm attaching at the end of the message. Here is its output for a table with 1m rows:

time for creating the file--> 12.044
time for using column reads --> 46.407
time for using the row wise iterator--> 73.036
time for using block reads (row wise)--> 5.156

So, using block reads (in case you can use them) is your best bet.

HTH,

--------------------------------------------------------------------------------------
import tables
import numpy
from time import time

nrows = 1000*1000

# Create a table definition with 17 double cols and 3 string cols
coltypes = numpy.dtype("f8,"*17 + "S3,"*3)

t1 = time()
# Create a file with an empty table. Use compression to minimize file size.
f = tables.openFile("/tmp/prova.h5", 'w')
table = f.createTable(f.root, 'table', numpy.empty(0, coltypes),
                      filters=tables.Filters(complevel=1, complib='lzo'))
# Fill the table with default values (empty strings and zeros)
row = table.row
for nrow in xrange(nrows):
    row.append()
f.close()
print "time for creating the file-->", round(time()-t1, 3)

# *********** Start benchmarks **************************
f = tables.openFile("/tmp/prova.h5", 'r')
table = f.root.table
colnames = table.colnames[:-3]  # exclude the string cols

# Loop over the table using column reads
t1 = time(); cum = numpy.zeros(17)
for ncol, colname in enumerate(colnames):
    col = table.read(0, nrows, field=colname)
    cum[ncol] += col.sum()
print "time for using column reads -->", round(time()-t1, 3)

# Loop over the table using its row iterator
t1 = time(); cum = numpy.zeros(17)
for row in table:
    for ncol, colname in enumerate(colnames):
        cum[ncol] += row[colname]
print "time for using the row iterator-->", round(time()-t1, 3)

# Loop over the table using block reads (row wise)
t1 = time(); cum = numpy.zeros(17)
step = 10000
for nrow in xrange(0, nrows, step):
    ra = table[nrow:nrow+step]
    for ncol, colname in enumerate(colnames):
        cum[ncol] += ra[colname].sum()
print "time for using block reads (row wise)-->", round(time()-t1, 3)

f.close()

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   ¡¡Enjoy Data
 "-"

From Chris.Barker at noaa.gov Fri Dec 29 13:29:45 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Fri, 29 Dec 2006 10:29:45 -0800
Subject: [Numpy-discussion] Newbie Question, Probability
In-Reply-To: <4592CE32.3060700@noaa.gov>
References: <627102C921CD9745B070C3B10CB8199B010EBF69@hardwire.esri.com> <458AF84D.1060400@ee.byu.edu> <200612211613.18347.pgmdevlist@gmail.com> <458B0000.8070909@gmail.com> <458BA9B5.6050709@gmx.net> <458CCC25.7050507@ee.byu.edu> <4592CE32.3060700@noaa.gov>
Message-ID: <45955E99.9070802@noaa.gov>

I just discovered the Scipy Superpack for OS X:

http://trichech.us/?page_id=4

Maybe this will help folks looking for an OS X Scipy build.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From mail at stevesimmons.com Fri Dec 29 19:22:16 2006
From: mail at stevesimmons.com (Stephen Simmons)
Date: Fri, 29 Dec 2006 18:22:16 -0600
Subject: [Numpy-discussion] Advice please on efficient subtotal function
In-Reply-To: <200612291605.29095.faltet@carabos.com>
References: <200612291605.29095.faltet@carabos.com>
Message-ID:

Thanks Francesc, but I am already planning to read the data block-wise as you suggest. My question is rather how best to update the subtotals for each block in a parallel way using numpy efficiently, rather than with a simplistic and slow element-by-element loop. I can't use a simple sum(), as in your benchmark example or Greg's reply, because I need to do:

for n in xrange(len(data)):
    totals[ i[n], j[n] ] += data[n]

and not

for n in xrange(len(data)):
    totals[n] += data[n]

My best solution so far is roughly like this:
- read in the next block of 100k or so rows (taking into account the PyTables table's _v_maxTuples and _v_chunksize)
- calculate the subtotal index arrays i and j
- do a lexsort() on [i, j, n]
- partition the sorted [i, j, n] into subsets where the i and j arrays change values. The k_th such subset is thus s_k = [ i_k, j_k, [n_k0, ..., n_kN] ]
- update the subtotals for each subset in the block: totals[i_k, j_k] += sum(data[n_k0, ..., n_kN])

This should be reasonably efficient, but it's messy, and I'm not familiar enough with numpy's indexing tricks to get this right the first time.

Maybe instead I'll have a go at writing a Pyrex function that implements the simple loop at C speed:

subtotal2d(data_array, idx_array, out=None, dtype=None)

where data_array is Nx1, idx_array is NxM and out is M-dimensional.

Incidentally, there's one other function I'd find useful here in forming the index arrays i[] and j[], a fast translate-from-dict function:

arr2 = fromiter((d[a] for a in arr1), dtype)

My initial impression is that a C version would be substantially faster; maybe I should do some benchmarking to see whether a pure Python/numpy approach is actually faster than I expect.

Cheers, and thanks for any further suggestions,

Stephen

Francesc Altet wrote:
> On Friday 29 December 2006 10:05, Stephen Simmons wrote:
> > Hi,
> >
> > I'm looking for efficient ways to subtotal a 1-d array onto a 2-D grid.
> > This is more easily explained in code than words, thus:
> >
> > for n in xrange(len(data)):
> >     totals[ i[n], j[n] ] += data[n]
> >
> > data comes from a series of PyTables files with ~200m rows. Each row has
> > ~20 cols, and I use the first three columns (which are 1-3 char strings) to
> > form the indexing functions i[] and j[], then want to calc averages of the
> > remaining 17 numerical cols.
> >
> > I have tried various indirect ways of doing this with searchsorted and
> > bincount, but intuitively they feel overly complex solutions to what is
> > essentially a very simple problem.
> >
> > My work involves comparing the subtotals for various different segmentation
> > strategies (the i[] and j[] indexing functions). Efficient solutions are
> > important because I need to make many passes through the 200m rows of data.
> > Memory usage is the easiest thing for me to adjust by changing how many
> > rows of data to read in for each pass and then reusing the same array data
> > buffers.
>
> Well, from your words I guess you should already have tested this, but just in
> case. As PyTables saves data in tables row-wise, it is always faster using
> the complete row for computations in each iteration than using just a single
> column. This is shown in the small benchmark that I'm attaching at the end of
> the message. Here is its output for a table with 1m rows:
>
> time for creating the file--> 12.044
> time for using column reads --> 46.407
> time for using the row wise iterator--> 73.036
> time for using block reads (row wise)--> 5.156
>
> So, using block reads (in case you can use them) is your best bet.
>
> HTH,
>
> --------------------------------------------------------------------------------------
> import tables
> import numpy
> from time import time
>
> nrows = 1000*1000
>
> # Create a table definition with 17 double cols and 3 string cols
> coltypes = numpy.dtype("f8,"*17 + "S3,"*3)
>
> t1 = time()
> # Create a file with an empty table. Use compression to minimize file size.
> f = tables.openFile("/tmp/prova.h5", 'w')
> table = f.createTable(f.root, 'table', numpy.empty(0, coltypes),
>                       filters=tables.Filters(complevel=1, complib='lzo'))
> # Fill the table with default values (empty strings and zeros)
> row = table.row
> for nrow in xrange(nrows):
>     row.append()
> f.close()
> print "time for creating the file-->", round(time()-t1, 3)
>
> # *********** Start benchmarks **************************
> f = tables.openFile("/tmp/prova.h5", 'r')
> table = f.root.table
> colnames = table.colnames[:-3]  # exclude the string cols
>
> # Loop over the table using column reads
> t1 = time(); cum = numpy.zeros(17)
> for ncol, colname in enumerate(colnames):
>     col = table.read(0, nrows, field=colname)
>     cum[ncol] += col.sum()
> print "time for using column reads -->", round(time()-t1, 3)
>
> # Loop over the table using its row iterator
> t1 = time(); cum = numpy.zeros(17)
> for row in table:
>     for ncol, colname in enumerate(colnames):
>         cum[ncol] += row[colname]
> print "time for using the row iterator-->", round(time()-t1, 3)
>
> # Loop over the table using block reads (row wise)
> t1 = time(); cum = numpy.zeros(17)
> step = 10000
> for nrow in xrange(0, nrows, step):
>     ra = table[nrow:nrow+step]
>     for ncol, colname in enumerate(colnames):
>         cum[ncol] += ra[colname].sum()
> print "time for using block reads (row wise)-->", round(time()-t1, 3)
>
> f.close()
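One loop-free possibility for the accumulation step Stephen describes is bincount on a flattened cell index. A sketch, assuming i and j are already integer index arrays and the grid shape is known (the function name subtotal2d echoes Stephen's proposal but this numpy version is made up here):

import numpy as N

def subtotal2d(data, i, j, shape):
    # accumulate data[n] into totals[i[n], j[n]] without a Python loop
    flat = i * shape[1] + j                  # combined (row, col) cell index
    sums = N.bincount(flat, weights=data)    # per-cell sums
    totals = N.zeros(shape)
    totals.flat[:len(sums)] = sums           # bincount output may be shorter
    return totals

# the counts needed for averages come the same way:
# counts = N.bincount(flat)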
From eike.welk at gmx.net Fri Dec 29 20:22:24 2006
From: eike.welk at gmx.net (Eike Welk)
Date: Sat, 30 Dec 2006 02:22:24 +0100
Subject: [Numpy-discussion] newbie: attempt at data frame
In-Reply-To: References: Message-ID: <200612300222.25274.eike.welk@gmx.net>

On Friday 29 December 2006 00:39, Vincent Nijs wrote:
> Eike:
> I think I can figure out how to add a plot method. However, if you
> have some more suggestions on how to implement the getAtTime,
> extract, and set methods you mentioned that would be great.

Set method: I thought of a method to change the data. Something like:
myDb.set(varNameList, dataArray)

Extract method: A way to get another dbase object with a subset of variables. Tomorrow I'll propose an implementation. Because your __init__ method wants a file name, it needs to be changed too.

GetAtTime: Maybe your data are samples from some continuous process or function. Then you might want to have values between the stored timepoints. You could compute them through interpolation. The following class from scipy will do the job:
http://www.scipy.org/doc/api_docs/scipy.interpolate.interpolate.interp1d.html

Yours
Eike.
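A minimal sketch of the getAtTime idea with interp1d, assuming the dbase object stores a 'time' variable alongside its data and has a get method returning one variable as an array (the method and attribute names here are hypothetical):

import numpy as N
from scipy.interpolate import interp1d

def getAtTime(db, varNameList, timePoint):
    # interpolate the named variables at an arbitrary time point
    t = db.get('time')                       # sample times, shape (n,)
    rows = [db.get(name) for name in varNameList]
    f = interp1d(t, N.array(rows))           # interpolates along the last axis
    return f(timePoint)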
From bthom at cs.hmc.edu Fri Dec 29 21:04:21 2006
From: bthom at cs.hmc.edu (belinda thom)
Date: Fri, 29 Dec 2006 18:04:21 -0800
Subject: [Numpy-discussion] test issue
Message-ID:

Hello,

I've been going thru Dave Kuhlman's "SciPy Course Outline" (http://www.rexx.com/~dkuhlman/scipy_course_01.html) and found out about test functions -- very cool. Except that on my end, not all tests pass (appended below). Is this a problem for other people? Is it something I should worry about?

Here's my setup: Mac G5 w/OS X 10.4.8, using MacPython 2.4, numpy.__version__ is 1.0, matplotlib.__version__ 0.87.7 and Numeric.__version__ 24.2

Thanks,

--b

==========

In [94]: import numpy

In [95]: numpy.test()
Found 13 tests for numpy.core.umath
Found 9 tests for numpy.lib.arraysetops
Found 3 tests for numpy.fft.helper
Found 1 tests for numpy.lib.ufunclike
Found 4 tests for numpy.ctypeslib
Found 2 tests for numpy.lib.polynomial
Found 8 tests for numpy.core.records
Found 26 tests for numpy.core.numeric
Found 5 tests for numpy.distutils.misc_util
Found 3 tests for numpy.lib.getlimits
Found 31 tests for numpy.core.numerictypes
Found 4 tests for numpy.core.scalarmath
Found 12 tests for numpy.lib.twodim_base
Found 47 tests for numpy.lib.shape_base
Found 4 tests for numpy.lib.index_tricks
Found 32 tests for numpy.linalg.linalg
Found 42 tests for numpy.lib.type_check
Found 184 tests for numpy.core.multiarray
Found 36 tests for numpy.core.ma
Found 10 tests for numpy.core.defmatrix
Found 41 tests for numpy.lib.function_base
Found 0 tests for __main__
........................................................................
........................................................................
........................................................................
........................................................................
........................................................................
F.......................................................................
........................................................................
.............
======================================================================
FAIL: Ticket #112
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/tests/test_regression.py", line 220, in check_longfloat_repr
    assert(str(a)[1:9] == str(a[0])[:8])
AssertionError

----------------------------------------------------------------------
Ran 517 tests in 1.241s

FAILED (failures=1)

From efiring at hawaii.edu Fri Dec 29 21:58:43 2006
From: efiring at hawaii.edu (Eric Firing)
Date: Fri, 29 Dec 2006 16:58:43 -1000
Subject: [Numpy-discussion] test issue
In-Reply-To: References: Message-ID: <4595D5E3.7000002@hawaii.edu>

belinda thom wrote:
> Hello,
>
> I've been going thru Dave Kuhlman's "SciPy Course Outline"
> (http://www.rexx.com/~dkuhlman/scipy_course_01.html) and found out about test
> functions -- very cool. Except that on my end, not all tests pass
> (appended below). Is this a problem for other people? Is it something
> I should worry about?

Not a real problem. The test has been commented out in svn with the notation:

    # Longfloat support is not consistent enough across
    # platforms for this test to be meaningful.
Eric

>
> Here's my setup: Mac G5 w/OS X 10.4.8, using MacPython 2.4,
> numpy.__version__ is 1.0, matplotlib.__version__ 0.87.7 and
> Numeric.__version__ 24.2
>
> Thanks,
>
> --b
>
> ==========
>
> In [94]: import numpy
>
> In [95]: numpy.test()
> Found 13 tests for numpy.core.umath
> Found 9 tests for numpy.lib.arraysetops
> Found 3 tests for numpy.fft.helper
> Found 1 tests for numpy.lib.ufunclike
> Found 4 tests for numpy.ctypeslib
> Found 2 tests for numpy.lib.polynomial
> Found 8 tests for numpy.core.records
> Found 26 tests for numpy.core.numeric
> Found 5 tests for numpy.distutils.misc_util
> Found 3 tests for numpy.lib.getlimits
> Found 31 tests for numpy.core.numerictypes
> Found 4 tests for numpy.core.scalarmath
> Found 12 tests for numpy.lib.twodim_base
> Found 47 tests for numpy.lib.shape_base
> Found 4 tests for numpy.lib.index_tricks
> Found 32 tests for numpy.linalg.linalg
> Found 42 tests for numpy.lib.type_check
> Found 184 tests for numpy.core.multiarray
> Found 36 tests for numpy.core.ma
> Found 10 tests for numpy.core.defmatrix
> Found 41 tests for numpy.lib.function_base
> Found 0 tests for __main__
> ........................................................................
> ........................................................................
> ........................................................................
> ........................................................................
> ........................................................................
> F.......................................................................
> ........................................................................
> .............
> ======================================================================
> FAIL: Ticket #112
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/tests/test_regression.py", line 220, in check_longfloat_repr
>     assert(str(a)[1:9] == str(a[0])[:8])
> AssertionError
>
> ----------------------------------------------------------------------
> Ran 517 tests in 1.241s
>
> FAILED (failures=1)
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From bthom at cs.hmc.edu Fri Dec 29 23:02:45 2006
From: bthom at cs.hmc.edu (belinda thom)
Date: Fri, 29 Dec 2006 20:02:45 -0800
Subject: [Numpy-discussion] test issue
In-Reply-To: <4595D5E3.7000002@hawaii.edu>
References: <4595D5E3.7000002@hawaii.edu>
Message-ID: <2A589751-77C9-4F8E-8B7F-0112A1876EA5@cs.hmc.edu>

Eric,

Thanks for the well-thought-out answers to some of my recent posts.

I've been using:

http://pythonmac.org/packages/py24-fat/index.html

for installing scipy, numpy, and matplotlib, as I didn't feel as confident installing things manually.

Should I be using svn instead? (Is that what most users do?) And if so, is there a two-minute tutorial on what I'd need to do to get that stuff running on my machine? (The code I end up using needs to be stable enough for classroom use).

Thanks again,

--b

On Dec 29, 2006, at 6:58 PM, Eric Firing wrote:

> belinda thom wrote:
>> Hello,
>>
>> I've been going thru Dave Kuhlman's "SciPy Course Outline"
>> (http://www.rexx.com/~dkuhlman/scipy_course_01.html) and found out about test
>> functions -- very cool. Except that on my end, not all tests pass
>> (appended below).
>> Is this a problem for other people? Is it something
>> I should worry about?
>
> Not a real problem. The test has been commented out in svn with the
> notation:
>
>     # Longfloat support is not consistent enough across
>     # platforms for this test to be meaningful.
>
> Eric
>
>>
>> Here's my setup: Mac G5 w/OS X 10.4.8, using MacPython 2.4,
>> numpy.__version__ is 1.0, matplotlib.__version__ 0.87.7 and
>> Numeric.__version__ 24.2
>>
>> Thanks,
>>
>> --b
>>
>> ==========
>>
>> In [94]: import numpy
>>
>> In [95]: numpy.test()
>> Found 13 tests for numpy.core.umath
>> Found 9 tests for numpy.lib.arraysetops
>> Found 3 tests for numpy.fft.helper
>> Found 1 tests for numpy.lib.ufunclike
>> Found 4 tests for numpy.ctypeslib
>> Found 2 tests for numpy.lib.polynomial
>> Found 8 tests for numpy.core.records
>> Found 26 tests for numpy.core.numeric
>> Found 5 tests for numpy.distutils.misc_util
>> Found 3 tests for numpy.lib.getlimits
>> Found 31 tests for numpy.core.numerictypes
>> Found 4 tests for numpy.core.scalarmath
>> Found 12 tests for numpy.lib.twodim_base
>> Found 47 tests for numpy.lib.shape_base
>> Found 4 tests for numpy.lib.index_tricks
>> Found 32 tests for numpy.linalg.linalg
>> Found 42 tests for numpy.lib.type_check
>> Found 184 tests for numpy.core.multiarray
>> Found 36 tests for numpy.core.ma
>> Found 10 tests for numpy.core.defmatrix
>> Found 41 tests for numpy.lib.function_base
>> Found 0 tests for __main__
>> ........................................................................
>> ........................................................................
>> ........................................................................
>> ........................................................................
>> ........................................................................
>> F.......................................................................
>> ........................................................................
>> .............
>> ======================================================================
>> FAIL: Ticket #112
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/tests/test_regression.py", line 220, in check_longfloat_repr
>>     assert(str(a)[1:9] == str(a[0])[:8])
>> AssertionError
>>
>> ----------------------------------------------------------------------
>> Ran 517 tests in 1.241s
>>
>> FAILED (failures=1)
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From efiring at hawaii.edu Fri Dec 29 23:47:56 2006
From: efiring at hawaii.edu (Eric Firing)
Date: Fri, 29 Dec 2006 18:47:56 -1000
Subject: [Numpy-discussion] test issue
In-Reply-To: <2A589751-77C9-4F8E-8B7F-0112A1876EA5@cs.hmc.edu>
References: <4595D5E3.7000002@hawaii.edu> <2A589751-77C9-4F8E-8B7F-0112A1876EA5@cs.hmc.edu>
Message-ID: <4595EF7C.6050508@hawaii.edu>

belinda thom wrote:
> Eric,
>
> Thanks for the well-thought-out answers to some of my recent posts.
> I've been using:
>
> http://pythonmac.org/packages/py24-fat/index.html
>
> for installing scipy, numpy, and matplotlib, as I didn't feel as
> confident installing things manually.
>
> Should I be using svn instead? (Is that what most users do?) And if
> so, is there a two-minute tutorial on what I'd need to do to get that
> stuff running on my machine? (The code I end up using needs to be
> stable enough for classroom use).

Belinda,

I think the great majority of mpl, numpy, and scipy users install from packages, not from svn or tarballs. I am in the minority. I use linux (presently Ubuntu Edgy, previously Mandriva), and in general installation from svn and tarballs is easy with linux for all three of these packages. There is an initial learning curve when one has to get the right libraries and devel packages installed, but once that is done then subsequent updates from svn are not a problem at all.

I do not use Windows or OSX so I do not have personal experience, but based on what I have seen on the mailing lists it seems pretty clear that building from source--any source--on either of these platforms is much more daunting, and very few people do it. As far as I know, your pythonmac package source is a good choice. I'm sure one of the many Mac users on this list can elaborate.

Eric

From bthom at cs.hmc.edu Fri Dec 29 23:54:18 2006
From: bthom at cs.hmc.edu (belinda thom)
Date: Fri, 29 Dec 2006 20:54:18 -0800
Subject: [Numpy-discussion] test issue
In-Reply-To: <4595EF7C.6050508@hawaii.edu>
References: <4595D5E3.7000002@hawaii.edu> <2A589751-77C9-4F8E-8B7F-0112A1876EA5@cs.hmc.edu> <4595EF7C.6050508@hawaii.edu>
Message-ID:

Thanks again for the input. You've been really helpful.

On Dec 29, 2006, at 8:47 PM, Eric Firing wrote:

> As far as I know, your pythonmac package
> source is a good choice. I'm sure one of the many Mac users on this
> list can elaborate.

I won't go into the long tirade of problems I've run into when trying to use these packages w/Mac OS X 10.4. But just in case you're interested in the flavor: the wx package doesn't seem to work w/ matplotlib's WXAgg, and setting matplotlib's numerix to Numeric breaks plotting. The list goes on.

Your comments about installing from svn are like others I've heard. Unfortunately, it seems I really might have to bite the bullet at some point, as the wx / matplotlib problem is likely related to the fact that various pieces were built with different versions of the same compiler. (Sigh).

--b

From bthom at cs.hmc.edu Sat Dec 30 01:24:48 2006
From: bthom at cs.hmc.edu (belinda thom)
Date: Fri, 29 Dec 2006 22:24:48 -0800
Subject: [Numpy-discussion] numpy install on mac os x 10.4
Message-ID: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu>

Hi,

I just used easy_install to get the latest version of numpy. Is this the "preferred" method for installing?

I'm on a G5 w/mac OS X 10.4 and MacPython 2.4.

I'm wondering why the os-related file names that easy_install creates have macosx-10.3 in them (as opposed to 10.4), e.g.

creating /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg

Is this something I should be concerned about?
Thanks,

--b

From robert.kern at gmail.com Sat Dec 30 17:25:50 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 30 Dec 2006 17:25:50 -0500
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu>
Message-ID: <4596E76E.60908@gmail.com>

belinda thom wrote:
> Hi,
>
> I just used easy_install to get the latest version of numpy. Is this
> the "preferred" method for installing?
>
> I'm on a G5 w/mac OS X 10.4 and MacPython 2.4.
>
> I'm wondering why the os-related file names that easy_install creates
> have macosx-10.3 in them (as opposed to 10.4), e.g.
>
> creating /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg
>
> Is this something I should be concerned about?

No.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From bthom at cs.hmc.edu Sat Dec 30 17:44:33 2006
From: bthom at cs.hmc.edu (belinda thom)
Date: Sat, 30 Dec 2006 14:44:33 -0800
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <4596E76E.60908@gmail.com>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com>
Message-ID: <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu>

>> I'm wondering why the os-related file names that easy_install creates
>> have macosx-10.3 in them (as opposed to 10.4), e.g.
>>
>> creating /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg
>>
>> Is this something I should be concerned about?
>
> No.

One of the reasons I was dorking around w/numpy again is because of a problem I ran into wrt scipy (this was when I was obtaining both from http://www.macpython.org/packages/py24-fat/index.html). Turns out that doing:

>>> import scipy
>>> scipy.test()

produced:

Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
history mechanism set up
>>> from scipy import *
RuntimeError: module compiled against version 1000002 of C-API but this version of numpy is 1000009
Traceback (most recent call last):
  File "", line 1, in ?
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/scipy/io/__init__.py", line 8, in ?
    from numpyio import packbits, unpackbits, bswap, fread, fwrite, \
ImportError: numpy.core.multiarray failed to import
>>>

So I decided to try easy_install on both. Interestingly, easy_install removed a sole (non-important) error when running numpy.test(), but I can't even get scipy to install into the site-packages directory; easy_install fails before it can do that (appended below). I am at my wits' end.

Advice on a painless way to install scipy on my G5 OS X 10.4.8 mac greatly appreciated.
--b

easy_install ~/Download/scipy-0.5.2.tar
Processing scipy-0.5.2.tar
Running scipy-0.5.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-nO-aHA/scipy-0.5.2/egg-dist-tmp-qKoMO9
non-existing path in '/private/tmp/easy_install-nO-aHA/scipy-0.5.2/Lib/linsolve': 'tests'
/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg/numpy/distutils/system_info.py:401: UserWarning:
    UMFPACK sparse solver (http://www.cise.ufl.edu/research/sparse/umfpack/)
    not found. Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [umfpack]) or by setting
creating build/temp.macosx-10.3-fat-2.4/build
creating build/temp.macosx-10.3-fat-2.4/build/src.macosx-10.3-fat-2.4
creating build/temp.macosx-10.3-fat-2.4/build/src.macosx-10.3-fat-2.4/Lib
creating build/temp.macosx-10.3-fat-2.4/build/src.macosx-10.3-fat-2.4/Lib/fftpack
creating build/temp.macosx-10.3-fat-2.4/private/tmp/easy_install-nO-aHA/scipy-0.5.2/Lib/fftpack
creating build/temp.macosx-10.3-fat-2.4/private/tmp/easy_install-nO-aHA/scipy-0.5.2/Lib/fftpack/src
compile options: '-DSCIPY_FFTW3_H -I/opt/local/include -Ibuild/src.macosx-10.3-fat-2.4 -I/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4 -c'
gcc: build/src.macosx-10.3-fat-2.4/fortranobject.c
gcc: /private/tmp/easy_install-nO-aHA/scipy-0.5.2/Lib/fftpack/src/zrfft.c
gcc: build/src.macosx-10.3-fat-2.4/Lib/fftpack/_fftpackmodule.c
gcc: /private/tmp/easy_install-nO-aHA/scipy-0.5.2/Lib/fftpack/src/zfftnd.c
gcc: /private/tmp/easy_install-nO-aHA/scipy-0.5.2/Lib/fftpack/src/drfft.c
gcc: /private/tmp/easy_install-nO-aHA/scipy-0.5.2/Lib/fftpack/src/zfft.c
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/Current/bin/easy_install", line 7, in ?
    sys.exit(
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 1588, in main
    with_ei_usage(lambda:
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 1577, in with_ei_usage
    return f()
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 1592, in
    distclass=DistributionWithoutHelpCommands, **kw
  File "/Library/Frameworks/Python.framework/Versions/2.4//lib/python2.4/distutils/core.py", line 149, in setup
    dist.run_commands()
  File "/Library/Frameworks/Python.framework/Versions/2.4//lib/python2.4/distutils/dist.py", line 946, in run_commands
    self.run_command(cmd)
  File "/Library/Frameworks/Python.framework/Versions/2.4//lib/python2.4/distutils/dist.py", line 966, in run_command
    cmd_obj.run()
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 211, in run
    self.easy_install(spec, not self.no_deps)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 427, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 471, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 655, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 930, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/easy_install.py", line 919, in run_setup
    run_setup(setup_script, args)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/sandbox.py", line 26, in run_setup
    DirectorySandbox(setup_dir).run(
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/sandbox.py", line 63, in run
    return func()
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/sandbox.py", line 29, in
    {'__file__':setup_script, '__name__':'__main__'}
  File "setup.py", line 55, in ?
File "setup.py", line 47, in setup_package File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg/numpy/ distutils/core.py", line 174, in setup return old_setup(**new_attr) File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/core.py", line 149, in setup dist.run_commands() File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/dist.py", line 946, in run_commands self.run_command(cmd) File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/ bdist_egg.py", line 174, in run cmd = self.call_command('install_lib', warn_dir=0) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/ bdist_egg.py", line 161, in call_command self.run_command(cmdname) File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/site-packages/setuptools-0.6c3-py2.4.egg/setuptools/command/ install_lib.py", line 20, in run self.build() File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/command/install_lib.py", line 110, in build self.run_command('build_ext') File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg/numpy/ distutils/command/build_ext.py", line 121, in run self.build_extensions() File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/distutils/command/build_ext.py", line 405, in build_extensions self.build_extension(ext) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/site-packages/numpy-1.0.1-py2.4-macosx-10.3-fat.egg/numpy/ distutils/command/build_ext.py", line 312, in build_extension link = self.fcompiler.link_shared_object AttributeError: 'NoneType' object has no attribute 'link_shared_object' From gnata at obs.univ-lyon1.fr Sat Dec 30 19:38:50 2006 From: gnata at obs.univ-lyon1.fr (Xavier Gnata) Date: Sun, 31 Dec 2006 01:38:50 +0100 Subject: [Numpy-discussion] PyArray_FromDims segfault?? Message-ID: <4597069A.4050705@obs.univ-lyon1.fr> Hello, I would like to use PyObject_CallObject to call the imshow function from matplolib from a C code. So, the first step is to create a 2D PyArrayObject from a C array. I have read the Numpy book which is great but now I'm puzzled: My goal is to convert a C array of doubles into a 2D numpy array object. PyArray_SimpleNewFromData(nd, dims, typenum, data) seems to be the function I need but I did not manage to write/find any compiling *and* not segfaulting code performing this convertion :( Ok, maybe PyArray_SimpleNewFromData is to complex to begin with. Let's try with PyArray_FromDims... 
segfault also :(

My code looks so simple... I cannot see what I'm doing wrong :(

#include <Python.h>
#include <numpy/arrayobject.h>

int
main (int argc, char *argv[])
{
    PyArrayObject *array;
    int i, length = 100;
    double* a;
    double x;
    Py_Initialize ();

    array = (PyArrayObject *)
        PyArray_FromDims(1, &length, PyArray_DOUBLE);

    a = new double[100];
    for (i = 0; i < length; i++) {
        x = (double) i/(length-1);
        a[i] = x;
    }
    Py_Finalize ();

    return 0;
}

g++ imshow.cpp -o imshow -lpython2.4 -I/usr/lib/python2.4/site-packages/numpy/core/include/

./imshow -> Segmentation fault

I really need the Py_Initialize/Py_Finalize calls because it is 'embedding python' (and not extending :)).

To sum up, any code taking a double* and 2 ints standing for the dimX and the dimY and returning the corresponding PyObject would be very much appreciated :) I don't care if it has to copy the data. The simpler the better.

In [2]: numpy.__version__
Out[2]: '1.0.2.dev3491'

Xavier.

ps : I really want to use only the C API first of all to learn it and because it really looks like the simple way to fit my needs. Ok, PyObject_CallObject needs quite a lot of additional code but it is always almost the same :)

--
############################################
Xavier Gnata
CRAL - Observatoire de Lyon
9, avenue Charles André
69561 Saint Genis Laval cedex
Phone: +33 4 78 86 85 28
Fax: +33 4 78 86 83 86
E-mail: gnata at obs.univ-lyon1.fr
############################################

From Chris.Barker at noaa.gov Sun Dec 31 02:02:19 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Sat, 30 Dec 2006 23:02:19 -0800
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu>
Message-ID: <4597607B.408@noaa.gov>

belinda thom wrote:

> Advice on a painless way to install scipy on my G5 OS X 10.4.8 mac
> greatly appreciated.

Sorry, there isn't one at this point -- I think numpy, SciPy, MPL, and wx are all fairly stable right now, so it's a pretty good time to do it, but it's a challenge because:

1) SciPy, MPL, numpy and wx all have to be compatible -- so it's best if the same person does them all, or at least communicates enough to make sure the packages at pythonmac all match.

2) Building MPL requires the Universal version of a few libs (though libpng may be the only one now -- or is it libjpeg? I don't have my Mac handy), as there MAY be a version of libfreetype that works provided by Apple now.

This is the hardest one:
3) SciPy (or at least parts of it) requires Fortran. Apple has not released a gcc Fortran, and the ones that do exist are not Universal, and require libs in inconvenient places. This makes it hard to build a Universal, easy to install, binary of SciPy -- it's still hard to build one yourself, but if you don't need it universal, it is doable by mere mortals. I'd love to see a Universal one in the pythonmac repository (and I think with the right incantations of lipo, it should be doable), but in the meantime, maybe we should at least have separate PPC and Intel versions -- and is there any chance of either statically linking or putting the libs in the Python tree somewhere?
So, I'd love to have someone:

Start with Python 2.5 from pythonmac (2.4 would be nice too -- but let's focus on 2.5)

1) Get the latest wxPython for OS-X (2.8.*)
2) Build the latest numpy
3) Build MPL against the above (and Numeric and numarray, if possible)
4) Build SciPy for both Intel and PPC (probably separately)
5) Put all that up on pythonmac.

I'd like to do it, but I'm not the least bit sure when I'll be able to -- someone please beat me to it!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

From bthom at cs.hmc.edu Sun Dec 31 02:16:18 2006
From: bthom at cs.hmc.edu (belinda thom)
Date: Sat, 30 Dec 2006 23:16:18 -0800
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <4597607B.408@noaa.gov>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu> <4597607B.408@noaa.gov>
Message-ID:

Not sure if this helps, but I stumbled upon the following trick.

Do the following:

1) install g77 via the instructions at:

http://www.scipy.org/Installing_SciPy/Mac_OS_X

In particular, download:

http://prdownloads.sf.net/hpc/g77v3.4-bin.tar.gz?download

and then do:

sudo tar -xvf g77v3.4-bin.tar -C /

which installs everything in /usr/local, most importantly creating:

lrwxrwxrwx 1 root wheel 18 Dec 30 15:29 libg2c.0.dylib@ -> libg2c.0.0.0.dylib
lrwxrwxrwx 1 root wheel 18 Dec 30 15:29 libg2c.dylib@ -> libg2c.0.0.0.dylib
lrwxrwxrwx 1 root wheel 18 Dec 30 15:29 libgcc_s.dylib@ -> libgcc_s.1.0.dylib

in /usr/local/lib/.

2) once this was done, magically scipy could work. In particular, these libg2c dylibs need to be on my machine.

Not sure if this is equivalent to "getting the fortran stuff to work", but it at least allowed me to run all of scipy.test().

On Dec 30, 2006, at 11:02 PM, Christopher Barker wrote:

> belinda thom wrote:
>
>> Advice on a painless way to install scipy on my G5 OS X 10.4.8 mac
>> greatly appreciated.
>
> Sorry, there isn't one at this point -- I think numpy, SciPy, MPL, and
> wx are all fairly stable right now, so it's a pretty good time to do it,
> but it's a challenge because:
>
> 1) SciPy, MPL, numpy and wx all have to be compatible -- so it's best
> if the same person builds them all, or at least communicates enough to
> make sure the packages at pythonmac all match.
>
> 2) Building MPL requires the Universal version of a few libs (though
> libpng may be the only one now (or is it libjpeg? -- don't have my Mac
> handy)), as there MAY be a version of libfreetype that works provided
> by Apple now.
>
> This is the hardest one:
> 3) SciPy (or at least parts of it) requires Fortran. Apple has not
> released a gcc Fortran, and the ones that do exist are not Universal,
> and require libs in inconvenient places. This makes it hard to build a
> Universal, easy-to-install binary of SciPy -- it's still hard to build
> one yourself, but if you don't need it Universal, it is doable by mere
> mortals. I'd love to see a Universal one in the pythonmac repository

I think more on this point could really be helpful.

I'm working off of Python 2.4 because some AI-related code crashes on 2.5, so I hope this can still remain a priority.

For me, the following combo seems to work well enough:

1) get matplotlib from www.macpython.org/packages/py24-fat

(The superpack at scipy fails b/c a TkAgg library can't be found)

2) use the superpack for installing numpy and scipy.
(These will only work for me if I've already done the g77 trick.)

3) use ipython gotten via easy_install

(The superpack version is broken b/c it doesn't provide an executable)

> (and I think with the right incantations of lipo, it should be doable),
> but in the meantime, maybe we should at least have separate PPC and
> Intel versions -- and is there any chance of either statically linking
> or putting the libs in the Python tree somewhere?

A "scipy" package that actually works w/ipython, matplotlib, numpy, and scipy is seriously needed. Most of my friends think I'm crazy to have wasted all my time on this stuff. It's ridiculous.

> So, I'd love to have someone:
>
> Start with Python 2.5 from pythonmac (2.4 would be nice too -- but
> let's focus on 2.5)
>
> 1) Get the latest wxPython for OS-X (2.8.*)
> 2) Build the latest numpy
> 3) Build MPL against the above (and Numeric and numarray, if possible)
> 4) Build SciPy for both Intel and PPC (probably separately)
> 5) Put all that up on pythonmac.
>
> I'd like to do it, but I'm not the least bit sure when I'll be able
> to -- someone please beat me to it!

I would like to say I could help, but: i) I'm pretty new to all this, ii) I've wasted so much time getting something running on my machine that I'm running out of time (I have a class I teach that I need to prepare for). Hopefully my comments here are at least helpful :-)

--b

p.s. I'd like to thank you for all the work you've done in this regard. I'm beginning to realize how time consuming this open source stuff can be...

From oliphant at ee.byu.edu Sun Dec 31 02:33:09 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Sun, 31 Dec 2006 00:33:09 -0700
Subject: [Numpy-discussion] PyArray_FromDims segfault??
In-Reply-To: <4597069A.4050705@obs.univ-lyon1.fr>
References: <4597069A.4050705@obs.univ-lyon1.fr>
Message-ID: <459767B5.8010200@ee.byu.edu>

Xavier Gnata wrote:

> Hello,
>
> I would like to use PyObject_CallObject to call the imshow function
> from matplotlib from C code.
> So, the first step is to create a 2D PyArrayObject from a C array.
>
> I have read the Numpy book, which is great, but now I'm puzzled.
> My goal is to convert a C array of doubles into a 2D numpy array object.
> PyArray_SimpleNewFromData(nd, dims, typenum, data) seems to be the
> function I need, but I did not manage to write/find any compiling *and*
> non-segfaulting code performing this conversion :(
>
> Ok, maybe PyArray_SimpleNewFromData is too complex to begin with.
> Let's try with PyArray_FromDims... segfault also :(
>
> My code looks so simple... I cannot see what I'm doing the wrong way :(
>
> #include <Python.h>
> #include <numpy/arrayobject.h>
>
> int
> main (int argc, char *argv[])
> {
>     PyArrayObject *array;
>     int i, length = 100;
>     double* a;
>     double x;
>
>     Py_Initialize ();
>
>     array = (PyArrayObject *) PyArray_FromDims(1, &length, PyArray_DOUBLE);
>
>     a = new double[100];
>     for (i = 0; i < length; i++) {
>         x = (double) i/(length-1);
>         a[i] = x;
>     }
>
>     Py_Finalize ();
>     return 0;
> }
>
> g++ imshow.cpp -o imshow -lpython2.4 -I/usr/lib/python2.4/site-packages/numpy/core/include/
>
> ./imshow -> Segmentation fault
>
> I really need the Py_Initialize/Py_Finalize calls because it is
> 'embedding python' (and not extending :)).
>
> To sum up, any code taking a double* and 2 ints standing for the dimX
> and the dimY and returning the corresponding PyObject would be very
> much appreciated :) I don't care if it has to copy the data. The simpler
> the better.
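For reference, a minimal corrected sketch of the code above, folding in the fix the reply below identifies (the missing import_array() call) and keeping the buffer alive while the array object is in use. This is an untested illustration, not code from the original thread:

#include <Python.h>
#include <numpy/arrayobject.h>

int
main (int argc, char *argv[])
{
    npy_intp dims[1] = {100};   /* PyArray_SimpleNewFromData takes npy_intp dims */
    int i;
    double *a;
    PyObject *array;

    Py_Initialize ();
    import_array1 (-1);         /* set up the numpy C API; returns -1 from main on failure */

    a = new double[dims[0]];
    for (i = 0; i < dims[0]; i++)
        a[i] = (double) i / (dims[0] - 1);

    /* Wrap the existing buffer: numpy does NOT copy or take ownership,
       so 'a' must stay alive for as long as 'array' is in use. */
    array = PyArray_SimpleNewFromData (1, dims, NPY_DOUBLE, a);

    /* ... hand 'array' to PyObject_CallObject here ... */

    Py_DECREF (array);
    delete [] a;
    Py_Finalize ();
    return 0;
}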
>

You *always* need to use import_array() or one of its variants somewhere in your code. In this case, because you are returning from main, you need to use:

import_array1(-1)

in order to return -1 on import error. The import_array call sets up the C-API so it can be used.

Also, this code is not doing anything with the memory you just created for array. If you want to use pre-existing memory, then PyArray_SimpleNewFromData will work and construct a PyObject * (an ndarray object) where the memory is the pointer you pass in for the data-area to that function. You must be sure that the memory is not released before the returned object is used up or you will get segmentation faults.

-Travis

From erin.sheldon at gmail.com Sun Dec 31 12:27:24 2006
From: erin.sheldon at gmail.com (Erin Sheldon)
Date: Sun, 31 Dec 2006 12:27:24 -0500
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To:
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu> <4597607B.408@noaa.gov>
Message-ID: <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com>

Hi All -

You can do this quite simply with fink if you have the patience to wait for the compilations to finish. This works on my ppc mac with XCode and fink installed (12/31/2006):

fink install scipy-py24
sudo apt-get install gettext-dev=0.10.40-25 gettext=0.10.40-25
fink install matplotlib-py24

For more details see this page I set up:
http://howdy.physics.nyu.edu/index.php/Numpy_For_Mac_Using_Fink

Erin

On 12/31/06, belinda thom wrote:
> Not sure if this helps, but I stumbled upon the following trick.
>
> Do the following:
>
> 1) install g77 via the instructions at:
>
> http://www.scipy.org/Installing_SciPy/Mac_OS_X
>
> In particular, download:
>
> http://prdownloads.sf.net/hpc/g77v3.4-bin.tar.gz?download
>
> and then do:
>
> sudo tar -xvf g77v3.4-bin.tar -C /
>
> which installs everything in /usr/local, most importantly creating:
>
> lrwxrwxrwx 1 root wheel 18 Dec 30 15:29 libg2c.0.dylib@ -> libg2c.0.0.0.dylib
> lrwxrwxrwx 1 root wheel 18 Dec 30 15:29 libg2c.dylib@ -> libg2c.0.0.0.dylib
> lrwxrwxrwx 1 root wheel 18 Dec 30 15:29 libgcc_s.dylib@ -> libgcc_s.1.0.dylib
>
> in /usr/local/lib/.
>
> 2) once this was done, magically scipy could work. In particular,
> these libg2c dylibs need to be on my machine.
>
> Not sure if this is equivalent to "getting the fortran stuff to
> work", but it at least allowed me to run all of scipy.test().
>
> On Dec 30, 2006, at 11:02 PM, Christopher Barker wrote:
>
> > belinda thom wrote:
> >
> >> Advice on a painless way to install scipy on my G5 OS X 10.4.8 mac
> >> greatly appreciated.
> >
> > Sorry, there isn't one at this point -- I think numpy, SciPy, MPL, and
> > wx are all fairly stable right now, so it's a pretty good time to do
> > it, but it's a challenge because:
> >
> > 1) SciPy, MPL, numpy and wx all have to be compatible -- so it's best
> > if the same person builds them all, or at least communicates enough to
> > make sure the packages at pythonmac all match.
> >
> > 2) Building MPL requires the Universal version of a few libs (though
> > libpng may be the only one now (or is it libjpeg? -- don't have my Mac
> > handy)), as there MAY be a version of libfreetype that works provided
> > by Apple now.
> >
> > This is the hardest one:
> > 3) SciPy (or at least parts of it) requires Fortran.
> > Apple has not released a gcc Fortran, and the ones that do exist are
> > not Universal, and require libs in inconvenient places. This makes it
> > hard to build a Universal, easy-to-install binary of SciPy -- it's
> > still hard to build one yourself, but if you don't need it Universal,
> > it is doable by mere mortals. I'd love to see a Universal one in the
> > pythonmac repository
>
> I think more on this point could really be helpful.
>
> I'm working off of Python 2.4 because some AI-related code crashes on
> 2.5, so I hope this can still remain a priority.
>
> For me, the following combo seems to work well enough:
>
> 1) get matplotlib from www.macpython.org/packages/py24-fat
>
> (The superpack at scipy fails b/c a TkAgg library can't be found)
>
> 2) use the superpack for installing numpy and scipy.
>
> (These will only work for me if I've already done the g77 trick.)
>
> 3) use ipython gotten via easy_install
>
> (The superpack version is broken b/c it doesn't provide an executable)
>
> > (and I think with the right incantations of lipo, it should be
> > doable), but in the meantime, maybe we should at least have separate
> > PPC and Intel versions -- and is there any chance of either statically
> > linking or putting the libs in the Python tree somewhere?
>
> A "scipy" package that actually works w/ipython, matplotlib, numpy,
> and scipy is seriously needed. Most of my friends think I'm crazy to
> have wasted all my time on this stuff. It's ridiculous.
>
> > So, I'd love to have someone:
> >
> > Start with Python 2.5 from pythonmac (2.4 would be nice too -- but
> > let's focus on 2.5)
> >
> > 1) Get the latest wxPython for OS-X (2.8.*)
> > 2) Build the latest numpy
> > 3) Build MPL against the above (and Numeric and numarray, if possible)
> > 4) Build SciPy for both Intel and PPC (probably separately)
> > 5) Put all that up on pythonmac.
> >
> > I'd like to do it, but I'm not the least bit sure when I'll be able
> > to -- someone please beat me to it!
>
> I would like to say I could help, but: i) I'm pretty new to all this,
> ii) I've wasted so much time getting something running on my machine
> that I'm running out of time (I have a class I teach that I need to
> prepare for). Hopefully my comments here are at least helpful :-)
>
> --b
>
> p.s. I'd like to thank you for all the work you've done in this regard.
> I'm beginning to realize how time consuming this open source stuff can
> be...
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

From jswhit at fastmail.fm Sun Dec 31 13:10:35 2006
From: jswhit at fastmail.fm (Jeff Whitaker)
Date: Sun, 31 Dec 2006 11:10:35 -0700
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu> <4597607B.408@noaa.gov> <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com>
Message-ID: <4597FD1B.9000504@fastmail.fm>

Erin Sheldon wrote:
> Hi All -
>
> You can do this quite simply with fink if you have the patience to
> wait for the compilations to finish.
> This works on my ppc mac with XCode and fink installed (12/31/2006):
>
> fink install scipy-py24
> sudo apt-get install gettext-dev=0.10.40-25 gettext=0.10.40-25
> fink install matplotlib-py24
>
> For more details see this page I set up:
> http://howdy.physics.nyu.edu/index.php/Numpy_For_Mac_Using_Fink
>
> Erin
>

Erin: Nice tutorial. I recommend one extra step though - right after installing fink, add 'unstable/main' to the 'Trees:' line in /sw/etc/fink.conf, and run 'fink selfupdate'. That way you will get the latest versions of all the packages.

Also, if you want the python 2.5 versions, substitute 'py25' for 'py24'.

-Jeff

--
Jeffrey S. Whitaker         Phone : (303)497-6313
NOAA/OAR/CDC R/PSD1         FAX   : (303)497-6449
325 Broadway                Boulder, CO, USA 80305-3328

From erin.sheldon at gmail.com Sun Dec 31 13:20:55 2006
From: erin.sheldon at gmail.com (Erin Sheldon)
Date: Sun, 31 Dec 2006 13:20:55 -0500
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <4597FD1B.9000504@fastmail.fm>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu> <4597607B.408@noaa.gov> <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com> <4597FD1B.9000504@fastmail.fm>
Message-ID: <331116dc0612311020m160efb7ejcf1ad2d8c96c5b7e@mail.gmail.com>

On 12/31/06, Jeff Whitaker wrote:
> Erin Sheldon wrote:
> > Hi All -
> >
> > You can do this quite simply with fink if you have the patience to
> > wait for the compilations to finish. This works on my ppc mac with
> > XCode and fink installed (12/31/2006):
> >
> > fink install scipy-py24
> > sudo apt-get install gettext-dev=0.10.40-25 gettext=0.10.40-25
> > fink install matplotlib-py24
> >
> > For more details see this page I set up:
> > http://howdy.physics.nyu.edu/index.php/Numpy_For_Mac_Using_Fink
> >
> > Erin
> >
> Erin: Nice tutorial. I recommend one extra step though - right after
> installing fink, add 'unstable/main' to the 'Trees:' line in
> /sw/etc/fink.conf, and run 'fink selfupdate'. That way you will get the
> latest versions of all the packages.

Right, thanks. That was explained in the tutorial, but I only described how to use FinkCommander to enable the unstable branch. See "Installing NumPy and SciPy". BTW, it is a wiki, so feel free to edit.

>
> Also, if you want the python 2.5 versions, substitute 'py25' for 'py24'.
>
> -Jeff

From Chris.Barker at noaa.gov Sun Dec 31 13:45:21 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Sun, 31 Dec 2006 10:45:21 -0800
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu> <4597607B.408@noaa.gov> <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com>
Message-ID: <45980541.5070307@noaa.gov>

Erin Sheldon wrote:
> You can do this quite simply with fink

I've generally stayed away from fink, as it felt like kind of a separate system within OS-X, rather than integrated -- kind of like cygwin.

In particular, if you use Fink Python, can you:

1) Write apps that use the native GUI (not X), in particular, PyObjC, wx-Mac, and TK-aqua.

2) Bundle up apps with Py2App, or otherwise create self-contained application bundles?

3) Universal (PPC+Intel) anything.

Apart from "feel", I think those are the concrete reasons to use MacPython, rather than fink.
Please correct me if I've got a wrong (or outdated) impression.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

From jswhit at fastmail.fm Sun Dec 31 15:01:36 2006
From: jswhit at fastmail.fm (Jeff Whitaker)
Date: Sun, 31 Dec 2006 13:01:36 -0700
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <45980541.5070307@noaa.gov>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu> <4597607B.408@noaa.gov> <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com> <45980541.5070307@noaa.gov>
Message-ID: <45981720.90809@fastmail.fm>

Christopher Barker wrote:
> Erin Sheldon wrote:
>
>> You can do this quite simply with fink
>>
>
> I've generally stayed away from fink, as it felt like kind of a
> separate system within OS-X, rather than integrated -- kind of like
> cygwin.
>
> In particular, if you use Fink Python, can you:
>
> 1) Write apps that use the native GUI (not X), in particular, PyObjC,
> wx-Mac, and TK-aqua.
>
> 2) Bundle up apps with Py2App, or otherwise create self-contained
> application bundles?
>
> 3) Universal (PPC+Intel) anything.
>
> Apart from "feel", I think those are the concrete reasons to use
> MacPython, rather than fink. Please correct me if I've got a wrong (or
> outdated) impression.
>
> -Chris
>

Chris: The answer is No for all three. But for some scientists like me, who are used to working on linux/unix workstations, fink works well. I like being able to just run 'fink update scipy-py25 matplotlib-py25' to get the latest versions of everything. Also, being able to run stuff remotely via an ssh X11 tunnel to my office mac, and have the windows display back to my home mac, is a useful feature. It all comes down to what you feel comfortable with. Choice is good.

-Jeff

--
Jeffrey S. Whitaker         Phone : (303)497-6313
NOAA/OAR/CDC R/PSD1         FAX   : (303)497-6449
325 Broadway                Boulder, CO, USA 80305-3328

From erin.sheldon at gmail.com Sun Dec 31 15:17:20 2006
From: erin.sheldon at gmail.com (Erin Sheldon)
Date: Sun, 31 Dec 2006 15:17:20 -0500
Subject: [Numpy-discussion] numpy install on mac os x 10.4
In-Reply-To: <45980541.5070307@noaa.gov>
References: <03A1F20C-B0B0-4BFF-A780-B2C393F76160@cs.hmc.edu> <4596E76E.60908@gmail.com> <58A567CA-684C-46E0-8B50-FEE485213A16@cs.hmc.edu> <4597607B.408@noaa.gov> <331116dc0612310927m36bf776dg1558c30250c85c59@mail.gmail.com> <45980541.5070307@noaa.gov>
Message-ID: <331116dc0612311217r4b9e0a03y629b15172693017@mail.gmail.com>

On 12/31/06, Christopher Barker wrote:
> Erin Sheldon wrote:
> > You can do this quite simply with fink
>
> I've generally stayed away from fink, as it felt like kind of a
> separate system within OS-X, rather than integrated -- kind of like
> cygwin.
>
> In particular, if you use Fink Python, can you:
>
> 1) Write apps that use the native GUI (not X), in particular, PyObjC,
> wx-Mac, and TK-aqua.
>
> 2) Bundle up apps with Py2App, or otherwise create self-contained
> application bundles?
>
> 3) Universal (PPC+Intel) anything.
>
> Apart from "feel", I think those are the concrete reasons to use
> MacPython, rather than fink. Please correct me if I've got a wrong (or
> outdated) impression.

Hi Chris -

I think you are correct. The solution I posted is not a long term solution for the eventual average numpy/scipy user.
It was just a response to Belinda's original need for "Advice on a painless way to install scipy on my G5 OS X 10.4.8 mac". I don't mind waiting for things to compile, so it seems painless to me.

Erin

From bthom at cs.hmc.edu Fri Dec 29 17:29:17 2006
From: bthom at cs.hmc.edu (belinda thom)
Date: Fri, 29 Dec 2006 14:29:17 -0800
Subject: [Numpy-discussion] test issue
Message-ID:

Hello,

I've been going through Dave Kuhlman's "SciPy Course Outline" and found out about test functions -- very cool. Except that on my end, not all tests pass (appended below). Is this a problem for other people? Is it something I should worry about?

Here's my setup: Mac G5 w/OS X 10.4.8, using MacPython 2.4, numpy.__version__ is 1.0, matplotlib.__version__ is 0.87.7 and Numeric.__version__ is 24.2

Thanks,

--b

==========

In [94]: import numpy

In [95]: numpy.test()
Found 13 tests for numpy.core.umath
Found 9 tests for numpy.lib.arraysetops
Found 3 tests for numpy.fft.helper
Found 1 tests for numpy.lib.ufunclike
Found 4 tests for numpy.ctypeslib
Found 2 tests for numpy.lib.polynomial
Found 8 tests for numpy.core.records
Found 26 tests for numpy.core.numeric
Found 5 tests for numpy.distutils.misc_util
Found 3 tests for numpy.lib.getlimits
Found 31 tests for numpy.core.numerictypes
Found 4 tests for numpy.core.scalarmath
Found 12 tests for numpy.lib.twodim_base
Found 47 tests for numpy.lib.shape_base
Found 4 tests for numpy.lib.index_tricks
Found 32 tests for numpy.linalg.linalg
Found 42 tests for numpy.lib.type_check
Found 184 tests for numpy.core.multiarray
Found 36 tests for numpy.core.ma
Found 10 tests for numpy.core.defmatrix
Found 41 tests for numpy.lib.function_base
Found 0 tests for __main__
........................................................................
........................................................................
........................................................................
........................................................................
........................................................................
F.......................................................................
........................................................................
.............
======================================================================
FAIL: Ticket #112
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/tests/test_regression.py", line 220, in check_longfloat_repr
    assert(str(a)[1:9] == str(a[0])[:8])
AssertionError

----------------------------------------------------------------------
Ran 517 tests in 1.241s

FAILED (failures=1)

From frbeaxs at earthlink.net Wed Dec 27 21:51:22 2006
From: frbeaxs at earthlink.net (frbeaxs)
Date: Wed, 27 Dec 2006 18:51:22 -0800
Subject: [Numpy-discussion] Scipy - Numpy incompatibility when calling upon Old Numeric
Message-ID:

I am using Python 2.4 with Numpy 0.9.8. Matplotlib graphs function under these two versions, except when old Numeric must be called to utilize the spline function in contouring the graphs, leading to the error:

Traceback (most recent call last):
  File "C:\File\XXX", line 3, in ?
    from scipy.sandbox.delaunay import *
  File "C:\Python24\lib\site-packages\scipy\__init__.py", line 32, in ?
    from numpy import oldnumeric
  File "C:\Python24\lib\site-packages\numpy\oldnumeric\__init__.py", line 3, in ?
    from compat import *
  File "C:\Python24\lib\site-packages\numpy\oldnumeric\compat.py", line 133, in ?
    from numpy import deprecate
ImportError: cannot import name deprecate

Scipy 0.5.0 seems to be compatible only with Numpy 1.0.1b, not version 0.9.8, and I have to switch versions of Numpy should I need to utilize Scipy; however, certain matplotlib graphs will then not function. Upgrading to Scipy 0.5.2 and Numpy 1.0.1 has no negative effects on the programs I wrote or the matplotlib sample programs, but the following error message replaces the ImportError above:

Matrix = matrix
NameError: name matrix is not defined

Downgrading Numpy to version 0.9.6 produces even more problems. Numpy 0.9.8 used with SciPy 0.5.0 seems to be the best combination, however it prohibits calling upon oldnumeric, which seems to be necessary to use the spline function. Does anyone have any idea how to get around this?

frbeaxs
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Andreas.Eisele at dfki.de Sat Dec 30 09:45:48 2006
From: Andreas.Eisele at dfki.de (Andreas Eisele)
Date: Sat, 30 Dec 2006 15:45:48 +0100 (CET)
Subject: [Numpy-discussion] Advice please on efficient subtotal function
In-Reply-To:
References:
Message-ID: <6448907.1167489948431.SLOX.WebMail.wwwrun@corp-102>

Hi Stephen,

> I'm looking for efficient ways to subtotal a 1-d array onto a 2-D
> grid. This is more easily explained in code than words, thus:
>
> for n in xrange(len(data)):
>     totals[ i[n], j[n] ] += data[n]
>
> data comes from a series of PyTables files with ~200m rows. Each row
> has ~20 cols, and I use the first three columns (which are 1-3 char
> strings) to form the indexing functions i[] and j[], then want to
> calculate averages of the remaining 17 numerical cols.
>
> I have tried various indirect ways of doing this with searchsorted and
> bincount, but intuitively they feel overly complex solutions to what
> is essentially a very simple problem.
>
> My work involves comparing the subtotals for various different
> segmentation strategies (the i[] and j[] indexing functions).
> Efficient solutions are important because I need to make many passes
> through the 200m rows of data. Memory usage is the easiest thing for
> me to adjust by changing how many rows of data to read in for each
> pass and then reusing the same array data buffers.

It looks as if the values in your i and j columns come from a limited range, so you may consider encoding pairs of (i,j) values into one int using a suitable encoding function (e.g. ij = i+K*j if both i and j are non-negative and K = max(i)+1). You could then use bincount(ij, data) to get the sums per encoded (i,j) pair. This should be efficient, and the complexity is only in the encoding/decoding steps.

Best regards,
Andreas

----
Dr. Andreas Eisele,  Senior Researcher
DFKI GmbH, Language Technology Lab,  eisele at dfki.de
Stuhlsatzenhausweg 3,  tel: +49-681-302-5285
D-66123 Saarbrücken,  fax: +49-681-302-5338
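An untested sketch of the encoding/decoding approach above; the function name, the grid dimensions K and nj, and the chunked-accumulation loop at the bottom are illustrative, not from the original posts:

import numpy

def grid_subtotals(i, j, data, K, nj):
    # Encode each (i, j) pair as one non-negative int, as suggested above.
    ij = i + K * j
    sums = numpy.bincount(ij, weights=data)   # sum of data per encoded pair
    counts = numpy.bincount(ij)               # number of rows per encoded pair
    # bincount only reaches the largest encoded value seen, so pad to the
    # full grid before reshaping.
    full_sums = numpy.zeros(K * nj)
    full_counts = numpy.zeros(K * nj)
    full_sums[:len(sums)] = sums
    full_counts[:len(counts)] = counts
    # Flat index i + K*j is C order for an (nj, K) array; transpose so the
    # result is indexed as [i_value, j_value].
    return full_sums.reshape(nj, K).T, full_counts.reshape(nj, K).T

# Accumulating over chunks read from PyTables, then dividing once at the
# end, gives the averages without a second pass:
#
#     totals = numpy.zeros((K, nj))
#     counts = numpy.zeros((K, nj))
#     for i, j, data in chunks:
#         s, c = grid_subtotals(i, j, data, K, nj)
#         totals += s
#         counts += c
#     averages = totals / numpy.where(counts == 0, 1, counts)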