From ralf.gommers at googlemail.com Mon Jan 2 15:21:14 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 2 Jan 2012 21:21:14 +0100 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Sat, Dec 31, 2011 at 6:43 AM, Jaidev Deshpande < deshpande.jaidev at gmail.com> wrote: > Hi Chris > > > Documentation is specificsly excluded from GSoC (at least it was a > > couple years ago when I last was involved) > > Documentation wasn't excluded last year from GSoC, there were quite a > few projects that required a lot of documentation. > But yes, there was no "documentation only" project. > > Anyhow, it seems reasonable that testing alone can't be a project. > What about benchmarking and the related statistics? Does that qualify > as a worthwhile project (again, GSoC or otherwise)? > > That's certainly worth doing, and doing well. You could start with investigating what Wes has done with vbench so far, and look at how to get the output of that into http://speed.pypy.org/. I have the feeling it's not enough work for a GSoC project though, and with a project like starting scikits.signal you'd have a better chance. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 21:44:02 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2012 19:44:02 -0700 Subject: [Numpy-discussion] polynomial package update Message-ID: Hi All, I've made a pull request for a rather large update of the polynomial package. The new features are 1) Bug fixes 2) Improved documentation in the numpy reference 3) Preliminary support for multi-dimensional coefficient arrays 4) Support for NA in the fitting routines 5) Improved testing and test coverage 6) Gauss quadrature 7) Weight functions 8) (Mostly) Symmetrized companion matrices 9) Add cast and basis as static functions of convenience classes 10) Remove deprecated import from package *init*.py If anyone has an interest in that package, please take some time and review it here . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 3 00:46:16 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 3 Jan 2012 00:46:16 -0500 Subject: [Numpy-discussion] polynomial package update In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 9:44 PM, Charles R Harris wrote: > Hi All, > > I've made a pull request for a? rather large update of the polynomial > package. The new features are > > 1) Bug fixes > 2) Improved documentation in the numpy reference > 3) Preliminary support for multi-dimensional coefficient arrays > 4) Support for NA in the fitting routines > 5) Improved testing and test coverage > 6) Gauss quadrature > 7) Weight functions > 8) (Mostly) Symmetrized companion matrices > 9) Add cast and basis as static functions of convenience classes > 10) Remove deprecated import from package init.py > > If anyone has an interest in that package, please take some time and review > it here. (Since I'm not setup for compiling numpy I cannot try it out. Just some spotty reading of the code.) The two things I'm most interested in are the 2d, 3d enhancements and the quadrature. What's the return of the 2d vander functions? 
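(To make the question concrete, this is roughly the call I have in mind -- I am
guessing the name polyvander2d and the deg=[degx, degy] argument from my spotty
reading of the branch, so the final API may well differ:

>>> import numpy as np
>>> from numpy.polynomial import polynomial as P
>>> x = np.array([1., 2., 3.])
>>> y = np.array([4., 5., 6.])
>>> V = P.polyvander2d(x, y, [2, 4])   # columns are the products x**i * y**j
>>> V.shape                            # one row per sample point, (2+1)*(4+1) columns
(3, 15)

so each row would hold the flattened product basis evaluated at one (x, y) point.)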
If I read it correctly, it's: >>> xyn = np.array([['x^%d*y^%d'%(px,py) for py in range(5)] for px in range(3)]) >>> xyn array([['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4'], ['x^1*y^0', 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4'], ['x^2*y^0', 'x^2*y^1', 'x^2*y^2', 'x^2*y^3', 'x^2*y^4']], dtype='|S7') >>> xyn.ravel() array(['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4', 'x^1*y^0', 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4', 'x^2*y^0', 'x^2*y^1', 'x^2*y^2', 'x^2*y^3', 'x^2*y^4'], dtype='|S7') Are the normalization constants available in explicit form to get an orthonormal basis? The test_100 look like good recipes for getting the normalization and the integration constants. Are the quads weights and points the same as in scipy.special (up to floating point differences)? Looks very useful and I'm looking forward to trying it out, and I will borrow some code like test_100 as recipes. (For densities, I still need mostly orthonormal basis and integration normalized to 1.) Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chaoyuejoy at gmail.com Tue Jan 3 04:01:39 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 3 Jan 2012 10:01:39 +0100 Subject: [Numpy-discussion] strange nan in np.ma.average Message-ID: Dear all numpy users, I have 10 90X720 arrays. let's say they are in a list 'a' with each element a 90X720 numpy masked array. then I create a new empty ndarray: data data=np.empty([10,90,720]) ##first I store all the 10 ndarray in a 10X90X720 array: for i,d in enumerate(a): data[i]=a data.shape=(10, 90, 720) then I use data_av=np.ma.average(data, axis=0) to get the average. The strange thing is, I don't have any 'nan' in all the 10 90X720 array, but I have nan value in the final data_av. how does this come? In [26]: np.nonzero(np.isnan(data_av)) Out[26]: (array([ 0, 0, 2, 2, 3, 5, 5, 6, 6, 6, 6, 7, 8, 8, 8, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 17, 17, 19, 22, 22, 44, 63, 64, 64, 67, 68, 71, 72, 73, 76, 77, 77, 78, 79, 80, 80, 81, 82, 82, 84, 85, 85, 86, 86, 87, 87, 88, 89, 89, 89]), array([159, 541, 497, 548, 90, 97, 170, 244, 267, 587, 590, 150, 126, 168, 477, 240, 271, 277, 588, 99, 179, 528, 52, 256, 230, 109, 190, 617, 377, 389, 707, 539, 193, 361, 262, 465, 100, 232, 206, 90, 87, 93, 522, 229, 200, 482, 325, 195, 239, 228, 159, 194, thanks, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jan 3 08:21:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 3 Jan 2012 06:21:04 -0700 Subject: [Numpy-discussion] polynomial package update In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 10:46 PM, wrote: > On Mon, Jan 2, 2012 at 9:44 PM, Charles R Harris > wrote: > > Hi All, > > > > I've made a pull request for a rather large update of the polynomial > > package. 
The new features are > > > > 1) Bug fixes > > 2) Improved documentation in the numpy reference > > 3) Preliminary support for multi-dimensional coefficient arrays > > 4) Support for NA in the fitting routines > > 5) Improved testing and test coverage > > 6) Gauss quadrature > > 7) Weight functions > > 8) (Mostly) Symmetrized companion matrices > > 9) Add cast and basis as static functions of convenience classes > > 10) Remove deprecated import from package init.py > > > > If anyone has an interest in that package, please take some time and > review > > it here. > > (Since I'm not setup for compiling numpy I cannot try it out. Just > some spotty reading of the code.) > > The two things I'm most interested in are the 2d, 3d enhancements and > the quadrature. > > What's the return of the 2d vander functions? > > If I read it correctly, it's: > > >>> xyn = np.array([['x^%d*y^%d'%(px,py) for py in range(5)] for px in > range(3)]) > >>> xyn > array([['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4'], > ['x^1*y^0', 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4'], > ['x^2*y^0', 'x^2*y^1', 'x^2*y^2', 'x^2*y^3', 'x^2*y^4']], > dtype='|S7') > >>> xyn.ravel() > array(['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4', 'x^1*y^0', > 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4', 'x^2*y^0', 'x^2*y^1', > 'x^2*y^2', 'x^2*y^3', 'x^2*y^4'], > dtype='|S7') > > Yes, that's right. > Are the normalization constants available in explicit form to get an > orthonormal basis? > No, not at the moment. I haven't quite figured out how I want to expose them but I agree that they should be available. > The test_100 look like good recipes for getting the normalization and > the integration constants. > > Yes, that works. There are also explicit formulas, but I don't know that they would work better. Some of the factors get very large, for Laguerre of degree 100 the can be up in the 10^100 range > Are the quads weights and points the same as in scipy.special (up to > floating point differences)? > > Yes, but more accurate. For instance, the scipy.special values for Gauss-Laguerre integration die at around degree 40. > Looks very useful and I'm looking forward to trying it out, and I will > borrow some code like test_100 as recipes. > (For densities, I still need mostly orthonormal basis and integration > normalized to 1.) > > Let me know what would be useful and I'll try to put it in. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Jan 3 09:42:26 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 03 Jan 2012 09:42:26 -0500 Subject: [Numpy-discussion] choose -> segfault Message-ID: I made 2 mistakes here, the 1st argument had the wrong shape, and I really wanted to use 'where', not 'choose'. But shouldn't segfault: ValueError: Need between 2 and (32) array objects (inclusive). Segmentation fault (core dumped) From robert.kern at gmail.com Tue Jan 3 09:44:37 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 3 Jan 2012 14:44:37 +0000 Subject: [Numpy-discussion] choose -> segfault In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 14:42, Neal Becker wrote: > I made 2 mistakes here, the 1st argument had the wrong shape, and I really > wanted to use 'where', not 'choose'. ?But shouldn't segfault: > > ValueError: Need between 2 and (32) array objects (inclusive). > Segmentation fault (core dumped) Can you provide an example that replicates the crash? 
Since it looks like you have a core dump handy, can you get a gdb backtrace to show us where the crash is? Platform details would also be handy. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From ognen at enthought.com Tue Jan 3 12:46:29 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Tue, 3 Jan 2012 11:46:29 -0600 Subject: [Numpy-discussion] Enum type Message-ID: Hello, I am playing with adding an enum dtype to numpy (to get my feet wet in numpy really). I have looked at the https://github.com/martinling/numpy_quaternion and I feel comfortable with my understanding of adding a simple type to numpy in technical terms. I am mostly a C programmer and have programmed in Python but not at the level where my code wcould be considered "pretty" or maybe even "pythonic". I know enums from C and have browsed around a few python enum implementations online. Most of them use hash tables or lists to associate names to numbers - these approaches just feel "heavy" to me. What would be a proper "numpy approach" to this? I am looking mostly for direction and advice as I would like to do the work myself :-) Any input appreciated :-) Ognen From jsalvati at u.washington.edu Tue Jan 3 12:49:48 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 3 Jan 2012 09:49:48 -0800 Subject: [Numpy-discussion] nested_iters does not accept length zero nest (also doesn't have documentation) Message-ID: Hellow, while using the nested_iters function, I've noticed that it does not accept length zero nestings. For example, the following fails: nested_iters([ones(3),ones(3)], [[], [0]]) with "ValueError: If 'op_axes' or 'itershape' is not NULL in theiterator constructor, 'oa_ndim' must be greater than zero" This makes a certain amount of sense to me, but I think having the iterator with the empty axes have a single iteration would be more useful. For example, if you are using nested_iters to ally a function along a specific set of axes, you'll otherwise have to special case the case where those axes take up the whole array (which is my use case). This is not much of a hassle for me, but I thought other people might like to know. Also, I could not find any nested_iters documentation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Tue Jan 3 13:05:10 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 3 Jan 2012 19:05:10 +0100 Subject: [Numpy-discussion] strange nan in np.ma.average In-Reply-To: References: Message-ID: the problem is here, data=np.empty([10,90,720]) you should always use np.ma.empty if you want to construct a masked empty array. Chao 2012/1/3 Chao YUE > Dear all numpy users, > > I have 10 90X720 arrays. let's say they are in a list 'a' with each > element a 90X720 numpy masked array. > then I create a new empty ndarray: data > > data=np.empty([10,90,720]) > > ##first I store all the 10 ndarray in a 10X90X720 array: > for i,d in enumerate(a): > data[i]=a > > data.shape=(10, 90, 720) > then I use data_av=np.ma.average(data, axis=0) to get the average. > > > The strange thing is, I don't have any 'nan' in all the 10 90X720 array, > but I have nan value in the final data_av. > how does this come? 
> > > In [26]: np.nonzero(np.isnan(data_av)) > Out[26]: > (array([ 0, 0, 2, 2, 3, 5, 5, 6, 6, 6, 6, 7, 8, 8, 8, 9, 10, > 10, 10, 11, 11, 12, 13, 13, 14, 17, 17, 19, 22, 22, 44, 63, 64, 64, > 67, 68, 71, 72, 73, 76, 77, 77, 78, 79, 80, 80, 81, 82, 82, 84, 85, > 85, 86, 86, 87, 87, 88, 89, 89, 89]), > array([159, 541, 497, 548, 90, 97, 170, 244, 267, 587, 590, 150, 126, > 168, 477, 240, 271, 277, 588, 99, 179, 528, 52, 256, 230, 109, > 190, 617, 377, 389, 707, 539, 193, 361, 262, 465, 100, 232, 206, > 90, 87, 93, 522, 229, 200, 482, 325, 195, 239, 228, 159, 194, > > thanks, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Tue Jan 3 13:05:37 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 3 Jan 2012 13:05:37 -0500 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 12:46 PM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. > > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. > > What would be a proper "numpy approach" to this? I am looking mostly > for direction and advice as I would like to do the work myself :-) > > Any input appreciated :-) > Ognen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion You should use a hash table internally in my opinion. I've started using khash from klib (https://github.com/attractivechaos/klib) which has excellent memory usage (more than 50% less than Python dict with large hash tables) and good performance characteristics. With the enum dtype you can avoid reference counting with primitive types, not sure about object dtype. If enum arrays are mutable this will be very tricky. - Wes From jim.vickroy at noaa.gov Tue Jan 3 13:06:39 2012 From: jim.vickroy at noaa.gov (Jim Vickroy) Date: Tue, 03 Jan 2012 11:06:39 -0700 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: <4F0343AF.9010103@noaa.gov> On 1/3/2012 10:46 AM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). 
I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. > > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. > > What would be a proper "numpy approach" to this? I am looking mostly > for direction and advice as I would like to do the work myself :-) > > Any input appreciated :-) > Ognen Does "enumerate" (http://docs.python.org/library/functions.html#enumerate) work for you? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Tue Jan 3 13:41:28 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 3 Jan 2012 10:41:28 -0800 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Fri, Dec 30, 2011 at 9:43 PM, Jaidev Deshpande wrote: >> Documentation is specificsly excluded from GSoC (at least it was a >> couple years ago when I last was involved) > > Documentation wasn't excluded last year from GSoC, there were quite a > few projects that required a lot of documentation. sure -- it's certanly encouraged to docuemnt code that gets written, but... > But yes, there was no "documentation only" project. exactly -- from the 2011 GSoC FAQ: 12. Are proposals for documentation work eligible for Google Summer of Code? While we greatly appreciate the value of documentation, this program is an exercise in developing code; we can't accept proposals for documentation-only work at this time. > Anyhow, it seems reasonable that testing alone can't be a project. > What about benchmarking and the related statistics? Does that qualify > as a worthwhile project (again, GSoC or otherwise)? I didn't find a specific RAQ, but from the above, I suspect that all projects must be primarily about producing code: not documenting, testing, or benchmarking. Those, of course, should all be part of code development, but not the focus. - Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From wesmckinn at gmail.com Tue Jan 3 13:46:11 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 3 Jan 2012 13:46:11 -0500 Subject: [Numpy-discussion] Enum type In-Reply-To: <4F0343AF.9010103@noaa.gov> References: <4F0343AF.9010103@noaa.gov> Message-ID: On Tue, Jan 3, 2012 at 1:06 PM, Jim Vickroy wrote: > On 1/3/2012 10:46 AM, Ognen Duzlevski wrote: >> Hello, >> >> I am playing with adding an enum dtype to numpy (to get my feet wet in >> numpy really). I have looked at the >> https://github.com/martinling/numpy_quaternion and I feel comfortable >> with my understanding of adding a simple type to numpy in technical >> terms. >> >> I am mostly a C programmer and have programmed in Python but not at >> the level where my code wcould be considered "pretty" or maybe even >> "pythonic". I know enums from C and have browsed around a few python >> enum implementations online. 
Most of them use hash tables or lists to >> associate names to numbers - these approaches just feel "heavy" to me. >> >> What would be a proper "numpy approach" to this? I am looking mostly >> for direction and advice as I would like to do the work myself :-) >> >> Any input appreciated :-) >> Ognen > > Does "enumerate" > (http://docs.python.org/library/functions.html#enumerate) work for you? > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion That's not exactly what he means. The R lingo for this concept is "factor" or a bit more common "categorical variable": http://stat.ethz.ch/R-manual/R-patched/library/base/html/factor.html FWIW R's factor type is implemented using hash tables. I do the same in pandas. - Wes From ognen at enthought.com Tue Jan 3 13:52:30 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Tue, 3 Jan 2012 12:52:30 -0600 Subject: [Numpy-discussion] Enum type In-Reply-To: References: <4F0343AF.9010103@noaa.gov> Message-ID: On Tue, Jan 3, 2012 at 12:46 PM, Wes McKinney wrote: > On Tue, Jan 3, 2012 at 1:06 PM, Jim Vickroy wrote: >> On 1/3/2012 10:46 AM, Ognen Duzlevski wrote: >>> Hello, >>> >>> I am playing with adding an enum dtype to numpy (to get my feet wet in >>> numpy really). I have looked at the >>> https://github.com/martinling/numpy_quaternion and I feel comfortable >>> with my understanding of adding a simple type to numpy in technical >>> terms. >>> >>> I am mostly a C programmer and have programmed in Python but not at >>> the level where my code wcould be considered "pretty" or maybe even >>> "pythonic". I know enums from C and have browsed around a few python >>> enum implementations online. Most of them use hash tables or lists to >>> associate names to numbers - these approaches just feel "heavy" to me. >>> >>> What would be a proper "numpy approach" to this? I am looking mostly >>> for direction and advice as I would like to do the work myself :-) >>> >>> Any input appreciated :-) >>> Ognen >> >> Does "enumerate" >> (http://docs.python.org/library/functions.html#enumerate) work for you? > That's not exactly what he means. The R lingo for this concept is > "factor" or a bit more common "categorical variable": > > http://stat.ethz.ch/R-manual/R-patched/library/base/html/factor.html > > FWIW R's factor type is implemented using hash tables. I do the same in pandas. > > - Wes Wes, You are right, "categorical variable" is what I am after. Thanks for the pointer, I will go the klib route you suggested and see what comes out. I may be "old fashioned" a bit in the sense that adding dependencies on external libraries is something I am reluctant to do - this is why I said using hashes may have felt a bit "heavy". But that may be my shortcoming :-) Ognen From d.s.seljebotn at astro.uio.no Tue Jan 3 14:29:44 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 03 Jan 2012 20:29:44 +0100 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: <4F035728.4060509@astro.uio.no> On 01/03/2012 06:46 PM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). 
I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. > > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. If you want the enum values to be stored efficiently (using 1, 2 or 4-byte integers), and want a mapping between string names and such integers, then you need to map between them somehow, right? I.e., when printing the repr() of each element, you at least need a list in order to go from enum values to names (and that doesn't feel 'heavy' to me -- it's the minimal possible solution for the job!) It's unclear whether you mean heavy on the CPU, in the API, in the C code, or whatever, so difficult to give more feedback. As far as the API goes, you could probably do something like: colors = np.enum(['red', 'green', 'blue']) arr = np.asarray([colors.red, colors.red, colors.red, colors.blue]) assert arr[0] == colors.red assert np.all(arr.view(np.int8) == [0, 0, 0, 2]) So the strings are only needed in the API in the constructor of the enum type. They are needed there though. Dag Sverre From nitroamos at gmail.com Tue Jan 3 14:58:13 2012 From: nitroamos at gmail.com (Amos Anderson) Date: Tue, 3 Jan 2012 11:58:13 -0800 Subject: [Numpy-discussion] numpy dgemm link error Message-ID: Hello -- I've been having some problems building numpy. The first problem I had "error: unrecognizable insn", I was able to fix by following these instructions: http://www.mail-archive.com/numpy-discussion at scipy.org/msg34238.html > Could you try the following, at line 38, to add the following: > > #define EINSUM_USE_SSE1 0 > #define EINSUM_USE_SSE2 0 But now when I compile, I get a linker error against dgemm. I've pasted the full build output below. Amos. Running from numpy source directory.non-existing path in 'numpy/distutils': 'site.cfg' usage: svnversion [OPTIONS] WC_PATH [TRAIL_URL] Produce a compact "version number" for the working copy path WC_PATH. TRAIL_URL is the trailing portion of the URL used to determine if WC_PATH itself is switched (detection of switches within WC_PATH does not rely on TRAIL_URL). The version number is written to standard output. For example: $ svnversion . /repos/svn/trunk 4168 The version number will be a single number if the working copy is single revision, unmodified, not switched and with an URL that matches the TRAIL_URL argument. If the working copy is unusual the version number will be more complex: 4123:4168 mixed revision working copy 4168M modified working copy 4123S switched working copy 4123:4168MS mixed revision, modified, switched working copy If invoked on a directory that is not a working copy, an exported directory say, the program will output "exported". 
Valid options: -n [--no-newline] : do not output the trailing newline -c [--committed] : last changed rather than current revisions --version : show version information F2PY Version 2 blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in /home/amosa/triad/tools/python/lib libraries mkl,vml,guide not found in /usr/local/lib64 libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib64 libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /home/amosa/triad/tools/python/lib libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib NOT AVAILABLE atlas_blas_info: libraries f77blas,cblas,atlas not found in /home/amosa/triad/tools/python/lib libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib customize GnuFCompiler Found executable /usr/bin/g77 gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using config compiling '_configtest.c': /* This file is generated from numpy/distutils/system_info.py */ void ATL_buildinfo(void); int main(void) { ATL_buildinfo(); return 0; } C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-c' gcc: _configtest.c gcc -pthread _configtest.o -L/usr/lib64 -lf77blas -lcblas -latlas -o _configtest ATLAS version 3.7.11 built by root on Mon Jun 5 10:14:12 EDT 2006: UNAME : Linux intel1.lsf.platform.com 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux INSTFLG : MMDEF : /export/madison/src/roll/hpc/BUILD/ATLAS/CONFIG/ARCHS/P4E64SSE3/gcc/gemm ARCHDEF : /export/madison/src/roll/hpc/BUILD/ATLAS/CONFIG/ARCHS/P4E64SSE3/gcc/misc F2CDEFS : -DAdd__ -DStringSunStyle CACHEEDGE: 393216 F77 : /usr/bin/g77, version GNU Fortran (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) F77FLAGS : -fomit-frame-pointer -O -m64 CC : /usr/bin/gcc, version gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) CC FLAGS : -fomit-frame-pointer -O3 -funroll-all-loops -m64 MCC : /usr/bin/gcc, version gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) MCCFLAGS : -fomit-frame-pointer -O -m64 success! removing: _configtest.c _configtest.o _configtest FOUND: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = c define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] FOUND: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = c define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] usage: svnversion [OPTIONS] WC_PATH [TRAIL_URL] Produce a compact "version number" for the working copy path WC_PATH. TRAIL_URL is the trailing portion of the URL used to determine if WC_PATH itself is switched (detection of switches within WC_PATH does not rely on TRAIL_URL). The version number is written to standard output. For example: $ svnversion . /repos/svn/trunk 4168 The version number will be a single number if the working copy is single revision, unmodified, not switched and with an URL that matches the TRAIL_URL argument. 
If the working copy is unusual the version number will be more complex: 4123:4168 mixed revision working copy 4168M modified working copy 4123S switched working copy 4123:4168MS mixed revision, modified, switched working copy If invoked on a directory that is not a working copy, an exported directory say, the program will output "exported". Valid options: -n [--no-newline] : do not output the trailing newline -c [--committed] : last changed rather than current revisions --version : show version information lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in /home/amosa/triad/tools/python/lib libraries mkl,vml,guide not found in /usr/local/lib64 libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib64 libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /home/amosa/triad/tools/python/lib libraries lapack_atlas not found in /home/amosa/triad/tools/python/lib libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: libraries f77blas,cblas,atlas not found in /home/amosa/triad/tools/python/lib libraries lapack_atlas not found in /home/amosa/triad/tools/python/lib libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/lib64 numpy.distutils.system_info.atlas_info FOUND: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = f77 define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] FOUND: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = f77 define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building py_modules sources building library "npymath" sources customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using config C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest success! 
removing: _configtest.c _configtest.o _configtest C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function 'exp' gcc -pthread _configtest.o -o _configtest _configtest.o(.text+0x5): In function `main': /home/amosa/triad/tools/numpy/numpy-1.6.1/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status _configtest.o(.text+0x5): In function `main': /home/amosa/triad/tools/numpy/numpy-1.6.1/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status failure. removing: _configtest.c _configtest.o C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function 'exp' gcc -pthread _configtest.o -lm -o _configtest success! removing: _configtest.c _configtest.o _configtest building extension "numpy.core._sort" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h'] building extension "numpy.core.multiarray" sources conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/arraytypes.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/nditer.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/lowlevel_strided_loops.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/einsum.c adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h'] building extension "numpy.core.umath" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_ufunc_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/src/umath' to include_dirs. 
numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/src/umath/funcs.inc', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h'] building extension "numpy.core.scalarmath" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. executing numpy/core/code_generators/generate_ufunc_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h'] building extension "numpy.core._dotblas" sources adding 'numpy/core/blasdot/_dotblas.c' to sources. building extension "numpy.core.umath_tests" sources building extension "numpy.core.multiarray_tests" sources building extension "numpy.lib._compiled_base" sources building extension "numpy.numarray._capi" sources building extension "numpy.fft.fftpack_lite" sources building extension "numpy.linalg.lapack_lite" sources adding 'numpy/linalg/lapack_litemodule.c' to sources. adding 'numpy/linalg/python_xerbla.c' to sources. building extension "numpy.random.mtrand" sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest _configtest failure. 
removing: _configtest.c _configtest.o _configtest building data_files sources build_src: building npy-pkg config files running build_py copying numpy/version.py -> build/lib.linux-x86_64-2.7/numpy copying build/src.linux-x86_64-2.7/numpy/__config__.py -> build/lib.linux-x86_64-2.7/numpy copying build/src.linux-x86_64-2.7/numpy/distutils/__config__.py -> build/lib.linux-x86_64-2.7/numpy/distutils running build_clib customize UnixCCompiler customize UnixCCompiler using build_clib running build_ext customize UnixCCompiler customize UnixCCompiler using build_ext customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using build_ext building 'numpy.core.multiarray' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: numpy/core/src/multiarray/multiarraymodule_onefile.c In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:42: numpy/core/src/multiarray/einsum.c.src:39:1: warning: "EINSUM_USE_SSE1" redefined numpy/core/src/multiarray/einsum.c.src:25:1: warning: this is the location of the previous definition numpy/core/src/multiarray/einsum.c.src:40:1: warning: "EINSUM_USE_SSE2" redefined numpy/core/src/multiarray/einsum.c.src:34:1: warning: this is the location of the previous definition numpy/core/src/multiarray/descriptor.c: In function `_convert_divisor_to_multiple': numpy/core/src/multiarray/descriptor.c:606: warning: 'q' might be used uninitialized in this function numpy/core/src/multiarray/nditer.c.src: In function `npyiter_allocate_arrays': numpy/core/src/multiarray/nditer.c.src:4923: warning: 'innershape' might be used uninitialized in this function numpy/core/src/multiarray/multiarraymodule_onefile.c: At top level: numpy/core/src/multiarray/scalartypes.c.src:2550: warning: 'longlong_arrtype_hash' defined but not used numpy/core/src/multiarray/mapping.c:75: warning: '_array_ass_item' defined but not used build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h:227: warning: '_import_umath' defined but not used gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/multiarray.so building 'numpy.core.umath' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating build/temp.linux-x86_64-2.7/numpy/core/src/umath compile options: '-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray 
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: numpy/core/src/umath/umathmodule_onefile.c numpy/core/include/numpy/npy_3kcompat.h:392: warning: 'simple_capsule_dtor' defined but not used numpy/core/src/private/lowlevel_strided_loops.h:36: warning: 'PyArray_FreeStridedTransferData' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:43: warning: 'PyArray_CopyStridedTransferData' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:64: warning: 'PyArray_GetStridedCopyFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:78: warning: 'PyArray_GetStridedCopySwapFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:92: warning: 'PyArray_GetStridedCopySwapPairFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:109: warning: 'PyArray_GetStridedZeroPadCopyFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:120: warning: 'PyArray_GetStridedNumericCastFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:174: warning: 'PyArray_GetDTypeTransferFunction' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:227: warning: 'PyArray_TransferNDimToStrided' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:237: warning: 'PyArray_TransferStridedToNDim' declared `static' but never defined gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/src/umath/umathmodule_onefile.o -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/umath.so building 'numpy.core.scalarmath' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: build/src.linux-x86_64-2.7/numpy/core/src/scalarmathmodule.c numpy/core/src/scalarmathmodule.c.src:1054: warning: function declaration isn't a prototype numpy/core/include/numpy/npy_3kcompat.h:392: warning: 'simple_capsule_dtor' defined but not used gcc -pthread -shared build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/scalarmathmodule.o -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/scalarmath.so building 'numpy.core._dotblas' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating build/temp.linux-x86_64-2.7/numpy/core/blasdot compile options: '-DATLAS_INFO="\"3.7.11\"" -Inumpy/core/blasdot -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: 
numpy/core/blasdot/_dotblas.c numpy/core/blasdot/_dotblas.c: In function `dotblas_matrixproduct': numpy/core/blasdot/_dotblas.c:239: warning: comparison of distinct pointer types lacks a cast numpy/core/blasdot/_dotblas.c:257: warning: passing arg 3 of pointer to function from incompatible pointer type numpy/core/blasdot/_dotblas.c:292: warning: passing arg 3 of pointer to function from incompatible pointer type gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/blasdot/_dotblas.o -L/usr/lib64 -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lf77blas -lcblas -latlas -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so /usr/bin/ld: /usr/lib64/libcblas.a(cblas_dgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC /usr/lib64/libcblas.a: could not read symbols: Bad value collect2: ld returned 1 exit status /usr/bin/ld: /usr/lib64/libcblas.a(cblas_dgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC /usr/lib64/libcblas.a: could not read symbols: Bad value collect2: ld returned 1 exit status error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/blasdot/_dotblas.o -L/usr/lib64 -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lf77blas -lcblas -latlas -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so" failed with exit status 1 From njs at pobox.com Tue Jan 3 15:02:24 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 3 Jan 2012 12:02:24 -0800 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. Hi Ognen, I'm in the middle of an intercontinental move, so I can't help much, but I'd also love to see a proper enum/categorical type in numpy, so here are a few notes: - I wrote a simple cython implementation of this last year, which might be useful -- code attached. - The barrier I ran into, which you'll surely run into as well, is a flaw in the ufunc API in numpy. Currently, ufunc inner loops do not have any way to access the dtype of the array they are being called on. For most dtypes, this isn't an issue -- the inner loop for adding together int32's knows that it is being called on an array of int32's, it doesn't need to see the dtype to figure that out. But with enums, each array has a different set of possible categories, and these will be attached to the dtype object somehow. So if you want to do, say, equality comparison between an enum-array and a string-array: np.enumarray(["a"", "b", "c"]) == ["a", "c", "b"] -> np.array([True, False, True]) ...you can't actually make this work in current numpy. The solution is that the ufunc API needs to be changed to make dtype's somehow available to inner loops. (Probably by passing a pointer to the array object, like all the PyArray_ArrFuncs do.) See this thread: http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html - Both the statistical folk (pandas, statsmodels) and the hdf5 folk (pytables, h5py) have reasons to want better enum support. (Maybe there are other use cases too -- anyone I'm forgetting?) 
You should make sure to talk to both groups to make sure what you come up with will work for them. Cheers, -- Nathaniel > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. > > What would be a proper "numpy approach" to this? I am looking mostly > for direction and advice as I would like to do the work myself :-) > > Any input appreciated :-) > Ognen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: npenum.pyx Type: application/octet-stream Size: 12481 bytes Desc: not available URL: From ognen at enthought.com Tue Jan 3 17:34:42 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Tue, 3 Jan 2012 16:34:42 -0600 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: Nathaniel, On Tue, Jan 3, 2012 at 2:02 PM, Nathaniel Smith wrote: > On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski wrote: >> Hello, >> >> I am playing with adding an enum dtype to numpy (to get my feet wet in >> numpy really). I have looked at the >> https://github.com/martinling/numpy_quaternion and I feel comfortable >> with my understanding of adding a simple type to numpy in technical >> terms. > > Hi Ognen, > > I'm in the middle of an intercontinental move, so I can't help much, > but I'd also love to see a proper enum/categorical type in numpy, so > here are a few notes: > > - I wrote a simple cython implementation of this last year, which > might be useful -- code attached. > > - The barrier I ran into, which you'll surely run into as well, is a > flaw in the ufunc API in numpy. Currently, ufunc inner loops do not > have any way to access the dtype of the array they are being called > on. For most dtypes, this isn't an issue -- the inner loop for adding > together int32's knows that it is being called on an array of int32's, > it doesn't need to see the dtype to figure that out. But with enums, > each array has a different set of possible categories, and these will > be attached to the dtype object somehow. So if you want to do, say, > equality comparison between an enum-array and a string-array: > ?np.enumarray(["a"", "b", "c"]) == ["a", "c", "b"] -> np.array([True, > False, True]) > ...you can't actually make this work in current numpy. The solution is > that the ufunc API needs to be changed to make dtype's somehow > available to inner loops. (Probably by passing a pointer to the array > object, like all the PyArray_ArrFuncs do.) > > See this thread: > http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html > > - Both the statistical folk (pandas, statsmodels) and the hdf5 folk > (pytables, h5py) have reasons to want better enum support. (Maybe > there are other use cases too -- anyone I'm forgetting?) You should > make sure to talk to both groups to make sure what you come up with > will work for them. > > Cheers, > -- Nathaniel Thanks! The above input is exactly what I was looking for (in addition to my original question). 
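(For the "categorical variable" case Wes and Nathaniel describe, the behaviour I
want to support is roughly what np.unique with return_inverse=True already gives
today -- integer codes plus a small table of labels. This is only a sketch of the
semantics, not the dtype API I would propose:

>>> import numpy as np
>>> raw = np.array(['red', 'green', 'red', 'blue', 'green'])
>>> categories, codes = np.unique(raw, return_inverse=True)
>>> categories                  # each label stored once
array(['blue', 'green', 'red'], dtype='|S5')
>>> codes                       # compact integer representation of the data
array([2, 1, 2, 0, 1])
>>> categories[codes]           # maps back to the original strings
array(['red', 'green', 'red', 'blue', 'green'], dtype='|S5')

The open question is how to attach the categories to the dtype so the codes array
can stand on its own.)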
This "corner case" knowledge is good to have ;) Ognen From wonjunchoi001 at gmail.com Tue Jan 3 22:56:52 2012 From: wonjunchoi001 at gmail.com (Wonjun, Choi) Date: Tue, 3 Jan 2012 19:56:52 -0800 (PST) Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? Message-ID: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> hello, what is the best way to pass c, c++ array to numpy in cython? or what is the best way to pass fortran multi-dimensional array to numpy in cython? Wonjun, Choi From questions.anon at gmail.com Tue Jan 3 23:10:43 2012 From: questions.anon at gmail.com (questions anon) Date: Wed, 4 Jan 2012 15:10:43 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: References: Message-ID: Thanks for your responses but I am still having difficuties with this problem. Using argmax gives me one very large value and I am not sure what it is. There shouldn't be any issues with the shape. The latitude and longitude are the same shape always (covering a state) and the temperature (TSFC) data are hourly for a whole month. Are there any other ideas for finding the location and time of the maximum value in an array? Thanks On Wed, Dec 21, 2011 at 3:38 PM, Benjamin Root wrote: > > > On Tuesday, December 20, 2011, questions anon > wrote: > > ok thanks, a quick try at using it resulted in: > > IndexError: index out of bounds > > but I may need to do abit more investigating to understand how it works. > > thanks > > The assumption is that these arrays are all the same shape. If not, then > extra work is needed to figure out how to map indices of the temperature > array to the indices of the lat and Lon arrays. > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Wed Jan 4 01:07:31 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Wed, 4 Jan 2012 00:07:31 -0600 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: A categorical type (or enum type) is an important dtype to add to NumPy. It would be very nice if the option existed to make the categorical dtype "dynamic" in that the categories can grow as more data is added or inserted into the array. This would effectively allow binning of data on insertion into the array. The option would need to exist to have both "fixed" and "dynamic" dtypes because there are important use-cases for both. -Travis On Jan 3, 2012, at 2:02 PM, Nathaniel Smith wrote: > On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski wrote: >> Hello, >> >> I am playing with adding an enum dtype to numpy (to get my feet wet in >> numpy really). I have looked at the >> https://github.com/martinling/numpy_quaternion and I feel comfortable >> with my understanding of adding a simple type to numpy in technical >> terms. > > Hi Ognen, > > I'm in the middle of an intercontinental move, so I can't help much, > but I'd also love to see a proper enum/categorical type in numpy, so > here are a few notes: > > - I wrote a simple cython implementation of this last year, which > might be useful -- code attached. > > - The barrier I ran into, which you'll surely run into as well, is a > flaw in the ufunc API in numpy. Currently, ufunc inner loops do not > have any way to access the dtype of the array they are being called > on. 
For most dtypes, this isn't an issue -- the inner loop for adding > together int32's knows that it is being called on an array of int32's, > it doesn't need to see the dtype to figure that out. But with enums, > each array has a different set of possible categories, and these will > be attached to the dtype object somehow. So if you want to do, say, > equality comparison between an enum-array and a string-array: > np.enumarray(["a"", "b", "c"]) == ["a", "c", "b"] -> np.array([True, > False, True]) > ...you can't actually make this work in current numpy. The solution is > that the ufunc API needs to be changed to make dtype's somehow > available to inner loops. (Probably by passing a pointer to the array > object, like all the PyArray_ArrFuncs do.) > > See this thread: > http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html > > - Both the statistical folk (pandas, statsmodels) and the hdf5 folk > (pytables, h5py) have reasons to want better enum support. (Maybe > there are other use cases too -- anyone I'm forgetting?) You should > make sure to talk to both groups to make sure what you come up with > will work for them. > > Cheers, > -- Nathaniel > >> I am mostly a C programmer and have programmed in Python but not at >> the level where my code wcould be considered "pretty" or maybe even >> "pythonic". I know enums from C and have browsed around a few python >> enum implementations online. Most of them use hash tables or lists to >> associate names to numbers - these approaches just feel "heavy" to me. >> >> What would be a proper "numpy approach" to this? I am looking mostly >> for direction and advice as I would like to do the work myself :-) >> >> Any input appreciated :-) >> Ognen >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From dhruvkaran at gmail.com Wed Jan 4 01:25:10 2012 From: dhruvkaran at gmail.com (Dhruvkaran Mehta) Date: Tue, 3 Jan 2012 22:25:10 -0800 Subject: [Numpy-discussion] SParse feature vector generation Message-ID: Hi numpy users, *Is there a convenient way in numpy to go from "string" features like:* "uc_berkeley", "google", 1 "stanford", "intel", 1 . . . "uiuc", "texas_instruments", 0 *to a numpy matrix like:* "uc_berkeley", "stanford", ..., "uiuc", "google", "intel", "texas_instruments", "bool" 1 0 ... 0 1 0 0 1 0 1 ... 0 0 1 0 1 : 0 0 ... 1 0 0 1 0 I really appreciate you taking the time to help! Thanks! --Dhruv -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jan 4 01:28:45 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 4 Jan 2012 07:28:45 +0100 Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? In-Reply-To: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> Message-ID: <20120104062845.GB22809@phare.normalesup.org> On Tue, Jan 03, 2012 at 07:56:52PM -0800, Wonjun, Choi wrote: > what is the best way to pass c, c++ array to numpy in cython? 
I don't know if it is the best way, but I wrote a self-contained example a little while ago, to explain to people one way of doing it: http://gael-varoquaux.info/blog/?p=157 For multidimensional arrays, all you have to do is to pass in the full shape and number of dimensions in the call to PyArray_SimpleNewFromData. Hope this helps, Gael From xantares09 at hotmail.com Wed Jan 4 05:22:39 2012 From: xantares09 at hotmail.com (xantares 09) Date: Wed, 4 Jan 2012 10:22:39 +0000 Subject: [Numpy-discussion] PyInt and Numpy's int64 conversion In-Reply-To: References: , , , Message-ID: > From: wesmckinn at gmail.com > Date: Sat, 24 Dec 2011 19:51:06 -0500 > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion > > On Sat, Dec 24, 2011 at 3:11 AM, xantares 09 wrote: > > > > > >> From: wesmckinn at gmail.com > >> Date: Fri, 23 Dec 2011 12:31:45 -0500 > >> To: numpy-discussion at scipy.org > >> Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion > > > >> > >> On Fri, Dec 23, 2011 at 4:37 AM, xantares 09 > >> wrote: > >> > Hi, > >> > > >> > I'm using Numpy from the C python api side while tweaking my SWIG > >> > interface > >> > to work with numpy array types. > >> > I want to convert a numpy array of integers (whose elements are numpy's > >> > 'int64') > >> > The problem is that it this int64 type is not compatible with the > >> > standard > >> > python integer type: > >> > I cannot use PyInt_Check, and PyInt_AsUnsignedLongMask to check and > >> > convert > >> > from int64: basically PyInt_Check returns false. > >> > I checked the numpy config header and npy_int64 does have a size of 8o, > >> > which should be the same as int on my x86_64. > >> > What is the correct way to do that ? > >> > I checked for a Int64_Check function and didn't find any in numpy > >> > headers. > >> > > >> > Regards, > >> > > >> > x. > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > >> hello, > >> > >> I think you'll want to use the C macro PyArray_IsIntegerScalar, e.g. > >> in pandas I have the following function exposed to my Cython code: > >> > >> PANDAS_INLINE int > >> is_integer_object(PyObject* obj) { > >> return PyArray_IsIntegerScalar(obj); > >> } > >> > >> last time I checked that macro detects Python int, long, and all of > >> the NumPy integer hierarchy (int8, 16, 32, 64). If you ONLY want to > >> check for int64 I am not 100% sure the best way. > >> > >> - Wes > > > > Hi, > > > > Thank you for your reply ! > > > > That's the thing : I want to check/convert every type of integer, numpy's > > int64 and also python standard ints. > > Is there a way to avoid to use only the python api ? ( and avoid to depend > > on numpy's PyArray_* functions ) > > > > Regards. > > > > x. > > > > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > No. All of the PyTypeObject objects for the NumPy array scalars are > explicitly part of the NumPy C API so you have no choice but to depend > on that (to get the best performance). 
If you want to ONLY check for > int64 at the C API level, I did a bit of digging and the relevant type > definitions are in > > https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/npy_common.h > > so you'll want to do: > > int is_int64(PyObject* obj){ > return PyObject_TypeCheck(obj, &PyInt64ArrType_Type); > } > > and that will *only* detect np.int64 > > - Wes Ok many thanks ! One last thing, do you happen to know how to actually convert an np int64 to a C int ? - x. -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Wed Jan 4 06:29:36 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 4 Jan 2012 12:29:36 +0100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: References: Message-ID: On 04.01.2012, at 5:10AM, questions anon wrote: > Thanks for your responses but I am still having difficuties with this problem. Using argmax gives me one very large value and I am not sure what it is. > There shouldn't be any issues with the shape. The latitude and longitude are the same shape always (covering a state) and the temperature (TSFC) data are hourly for a whole month. There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == LON.shape One needs more information on the structure of these data to say anything definite, but if e.g. your TSFC data have a time and a location dimension, argmax will per default return the index for the flattened array (see the argmax documentation for details, and how to use the axis keyword to get a different output). This might be the very large value you mention, and if your location data have fewer dimensions, the index will easily be out of range. As Ben wrote, you'd need extra work to find the maximum location, depending on what maximum you are actually looking for. As a speculative example, let's assume you have the temperature data in an array(ntime, nloc) and the position data in array(nloc). Then TSFC.argmax(axis=1) would give you the index for the hottest place for each hour of the month (i.e. actually an array of ntime indices, and pointer to so many different locations). To locate the maximum temperature for the entire month, your best way would probably be to first extract the array of (monthly) maximum temperatures in each location as tmax = TSFC.max(axis=0) which would have (in this example) the shape (nloc,), so you could directly use it to index LAT[tmax.argmax()] etc. Cheers, Derek From wonjunchoi001 at gmail.com Wed Jan 4 21:26:05 2012 From: wonjunchoi001 at gmail.com (=?EUC-KR?B?w9a/+MHY?=) Date: Thu, 5 Jan 2012 11:26:05 +0900 Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? In-Reply-To: <20120104062845.GB22809@phare.normalesup.org> References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> <20120104062845.GB22809@phare.normalesup.org> Message-ID: it seems like you recommend below way. Cython example of exposing C-computed arrays in Python without data copies https://gist.github.com/1249305 but it uses malloc. isn't it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Jan 5 02:03:38 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 5 Jan 2012 08:03:38 +0100 Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? 
In-Reply-To: 
References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com>
	<20120104062845.GB22809@phare.normalesup.org>
Message-ID: <20120105070338.GD21804@phare.normalesup.org>

On Thu, Jan 05, 2012 at 11:26:05AM +0900, Wonjun Choi wrote:
> it seems like you recommend below way.
> Cython example of exposing C-computed arrays in Python without data copies
> [1]https://gist.github.com/1249305
> but it uses malloc. isn't it?

In this example, the data can be allocated the way you want in C. Malloc is
just an implementation detail, you can write the code you want instead.

Gael

From wonjunchoi001 at gmail.com  Thu Jan  5 02:04:52 2012
From: wonjunchoi001 at gmail.com (=?EUC-KR?B?w9a/+MHY?=)
Date: Thu, 5 Jan 2012 16:04:52 +0900
Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython?
In-Reply-To: <20120105070338.GD21804@phare.normalesup.org>
References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com>
	<20120104062845.GB22809@phare.normalesup.org>
	<20120105070338.GD21804@phare.normalesup.org>
Message-ID: 

can I pass the array without malloc?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gael.varoquaux at normalesup.org  Thu Jan  5 04:08:44 2012
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 5 Jan 2012 10:08:44 +0100
Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython?
In-Reply-To: 
References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com>
	<20120104062845.GB22809@phare.normalesup.org>
	<20120105070338.GD21804@phare.normalesup.org>
Message-ID: <20120105090844.GA17920@phare.normalesup.org>

On Thu, Jan 05, 2012 at 04:04:52PM +0900, Wonjun Choi wrote:
> can I pass the array without malloc?

An array is a pointer in C, so yes you can do what you want.

G

From barthelemy at crans.org  Thu Jan  5 09:02:30 2012
From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9l=E9my?=)
Date: Thu, 5 Jan 2012 15:02:30 +0100
Subject: [Numpy-discussion] Error in numpy.load example?
Message-ID: 

Hi all,

the doc http://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
contains the following example:

Store compressed data to disk, and load it again:

>>> np.savez('/tmp/123.npz', a=np.array([[1, 2, 3], [4, 5, 6]]), b=np.array([1, 2]))
>>> data = np.load('/tmp/123.npy')

However
http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html
says:

numpy.savez(file, *args, **kwds)

Save several arrays into a single file in uncompressed .npz format.

Moreover, this last page points to an undocumented numpy.savez_compressed
function, which is also non-existent in my version of numpy (1.5.1-2ubuntu2).

That's quite confusing.

From the following thread, it seems the arrays are stored uncompressed in a
zipfile.
http://article.gmane.org/gmane.comp.python.numeric.general/38378

Can somebody confirm that/fix the docs?

Cheers

-- 
Sébastien

From jsalvati at u.washington.edu  Thu Jan  5 14:22:50 2012
From: jsalvati at u.washington.edu (John Salvatier)
Date: Thu, 5 Jan 2012 11:22:50 -0800
Subject: [Numpy-discussion] "Symbol table not found" compiling numpy from git repository on Windows
Message-ID: 

Hello, I'm trying to compile numpy on Windows 7 using the command:

"python setup.py config --compiler=mingw32 build" but I get an error about
a symbol table not found. Anyone know how to work around this or what to
look into?
building library "npymath" sources Building msvcr library: "C:\Python26\libs\libmsvcr90.a" (from C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll) objdump.exe: C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll: File format not recognized Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Python26\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "C:\Python26\lib\distutils\command\build.py", line 134, in run self.run_command(cmd_name) File "C:\Python26\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 646, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "C:\Python26\lib\distutils\command\config.py", line 257, in try_link self._check_compiler() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\config.py", line 45, in _check_compiler old_config._check_compiler(self) File "C:\Python26\lib\distutils\command\config.py", line 107, in _check_compiler dry_run=self.dry_run, force=1) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\ccompiler.py", line 560, in new_compiler compiler = klass(None, dry_run, force) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 94, in __init__ msvcr_success = build_msvcr_library() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 362, in build_msvcr_library generate_def(dll_file, def_file) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 282, in generate_def raise ValueError("Symbol table not found") ValueError: Symbol table not found Thank you, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From schut at sarvision.nl Fri Jan 6 04:52:41 2012 From: schut at sarvision.nl (Vincent Schut) Date: Fri, 6 Jan 2012 09:52:41 +0000 (UTC) Subject: [Numpy-discussion] find location of maximum values References: Message-ID: On Wed, 04 Jan 2012 12:29:36 +0100, Derek Homeier wrote: > On 04.01.2012, at 5:10AM, questions anon wrote: > >> Thanks for your responses but I am still having difficuties with this >> problem. Using argmax gives me one very large value and I am not sure >> what it is. 
it is the index in the flattened array. To translate this into a multidimensional index, use numpy.unravel_index(i, original_shape). Cheers, Vincent. From dkoepfer at gmx.de Fri Jan 6 07:15:22 2012 From: dkoepfer at gmx.de (=?iso-8859-1?Q?=22David_K=F6pfer=22?=) Date: Fri, 06 Jan 2012 13:15:22 +0100 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: References: Message-ID: <20120106121522.203240@gmx.net> Dear numpy community, I'm trying to create an array of type object. A = empty(9, dtype=object) A[ array(0,1,2) ] = MyObject(1) A[ array(3,4,5) ] = MyObject(2) A[ array(6,7,8) ] = MyObject(3) This has worked well until MyObject has gotten an __getitem__ method. Now python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], [1] to MyObject(1)[1] and so on. Is there any way to just get a reference of the instance of MyObject into every entry of the array slice? Thank you for any help on this problem David From ralf.gommers at googlemail.com Fri Jan 6 10:15:09 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 6 Jan 2012 16:15:09 +0100 Subject: [Numpy-discussion] numpy 1.7.0 release? In-Reply-To: References: Message-ID: On Tue, Dec 20, 2011 at 9:28 PM, Ralf Gommers wrote: > > > On Tue, Dec 20, 2011 at 3:18 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi Ralf, >> >> On Mon, Dec 5, 2011 at 12:43 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi all, >>> >>> It's been a little over 6 months since the release of 1.6.0 and the NA >>> debate has quieted down, so I'd like to ask your opinion on the timing of >>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and small >>> improvements, plus three larger chucks of work: >>> >>> - datetime >>> - NA >>> - Bento support >>> >>> My impression is that both datetime and NA are releasable, but should be >>> labeled "tech preview" or something similar, because they may still see >>> significant changes. Please correct me if I'm wrong. >>> >>> There's still some maintenance work to do and pull requests to merge, >>> but a beta release by Christmas should be feasible. What do you all think? >>> >>> >> I'm now thinking that is too optimistic. There are a fair number of >> tickets that need to be looked at, including some for einsum and the >> iterator, and I think the number of pull requests needs to be reduced. How >> about sometime in the beginning of January? >> >> > Yes, it certainly was. Besides the tickets and pull requests, we also need > the support for MinGW 4.x that David is looking at. If that goes smoothly > then the first week of January may be feasible, otherwise it'll have to be > February (I'm traveling for most of Jan). Or someone else has to volunteer > to be the release manager for this release. > There isn't really much progress here. Besides a few smaller issues that still need attention, I think the MinGW 4.x issue is a blocker and needs to be resolved. This can be done either by making it work, or deciding to stick with 3.x. In the latter case numpy.datetime should be fixed somehow. For the next three weeks I'm traveling and won't be able to do any work on numpy. I propose to keep master in a state that's (close to being) releasable until the blocker issue is resolved and we can create a 1.7.x branch. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wesmckinn at gmail.com Fri Jan 6 16:00:00 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 6 Jan 2012 16:00:00 -0500 Subject: [Numpy-discussion] PyInt and Numpy's int64 conversion In-Reply-To: References: Message-ID: On Wed, Jan 4, 2012 at 5:22 AM, xantares 09 wrote: > > >> From: wesmckinn at gmail.com >> Date: Sat, 24 Dec 2011 19:51:06 -0500 > >> To: numpy-discussion at scipy.org >> Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion >> >> On Sat, Dec 24, 2011 at 3:11 AM, xantares 09 >> wrote: >> > >> > >> >> From: wesmckinn at gmail.com >> >> Date: Fri, 23 Dec 2011 12:31:45 -0500 >> >> To: numpy-discussion at scipy.org >> >> Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion >> > >> >> >> >> On Fri, Dec 23, 2011 at 4:37 AM, xantares 09 >> >> wrote: >> >> > Hi, >> >> > >> >> > I'm using Numpy from the C python api side while tweaking my SWIG >> >> > interface >> >> > to work with numpy array types. >> >> > I want to convert a numpy array of integers (whose elements are >> >> > numpy's >> >> > 'int64') >> >> > The problem is that it this int64 type is not compatible with the >> >> > standard >> >> > python integer type: >> >> > I cannot use PyInt_Check, and PyInt_AsUnsignedLongMask to check and >> >> > convert >> >> > from int64: basically PyInt_Check returns false. >> >> > I checked the numpy config header and npy_int64 does have a size of >> >> > 8o, >> >> > which should be the same as int on my x86_64. >> >> > What is the correct way to do that ? >> >> > I checked for a Int64_Check function and didn't find any in numpy >> >> > headers. >> >> > >> >> > Regards, >> >> > >> >> > x. >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> >> >> hello, >> >> >> >> I think you'll want to use the C macro PyArray_IsIntegerScalar, e.g. >> >> in pandas I have the following function exposed to my Cython code: >> >> >> >> PANDAS_INLINE int >> >> is_integer_object(PyObject* obj) { >> >> return PyArray_IsIntegerScalar(obj); >> >> } >> >> >> >> last time I checked that macro detects Python int, long, and all of >> >> the NumPy integer hierarchy (int8, 16, 32, 64). If you ONLY want to >> >> check for int64 I am not 100% sure the best way. >> >> >> >> - Wes >> > >> > Hi, >> > >> > Thank you for your reply ! >> > >> > That's the thing : I want to check/convert every type of integer, >> > numpy's >> > int64 and also python standard ints. >> > Is there a way to avoid to use only the python api ? ( and avoid to >> > depend >> > on numpy's PyArray_* functions ) >> > >> > Regards. >> > >> > x. >> > >> > >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> No. All of the PyTypeObject objects for the NumPy array scalars are >> explicitly part of the NumPy C API so you have no choice but to depend >> on that (to get the best performance). 
If you want to ONLY check for >> int64 at the C API level, I did a bit of digging and the relevant type >> definitions are in >> >> >> https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/npy_common.h >> >> so you'll want to do: >> >> int is_int64(PyObject* obj){ >> return PyObject_TypeCheck(obj, &PyInt64ArrType_Type); >> } >> >> and that will *only* detect np.int64 >> >> - Wes > > Ok many thanks ! > > One last thing, do you happen to know how to actually convert an np int64 to > a C int ? > > - x. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Not sure off-hand. You'll have to look at the NumPy scalar API in the C code From travis at continuum.io Fri Jan 6 16:31:45 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 6 Jan 2012 15:31:45 -0600 Subject: [Numpy-discussion] PyInt and Numpy's int64 conversion In-Reply-To: References: Message-ID: >>> >>> No. All of the PyTypeObject objects for the NumPy array scalars are >>> explicitly part of the NumPy C API so you have no choice but to depend >>> on that (to get the best performance). If you want to ONLY check for >>> int64 at the C API level, I did a bit of digging and the relevant type >>> definitions are in >>> >>> >>> https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/npy_common.h >>> >>> so you'll want to do: >>> >>> int is_int64(PyObject* obj){ >>> return PyObject_TypeCheck(obj, &PyInt64ArrType_Type); >>> } >>> >>> and that will *only* detect np.int64 >>> >>> - Wes >> >> Ok many thanks ! >> >> One last thing, do you happen to know how to actually convert an np int64 to >> a C int ? >> >> - x. > > Not sure off-hand. You'll have to look at the NumPy scalar API in the C code What is it you want to do? Do you want to get the C int out of the np.int64 *Python* object? If so, you do: npy_int64 val PyArray_ScalarAsCtype(obj, &val); If you want to get the C int of a *different* type out of the scalar Python object, you do: npy_int32 val PyArray_Descr * outcode = PyArray_DescrFromType(NPY_INT32); PyArray_CastScalarToCtype(obj, &val, outcode); Py_DECREF(outcode); -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Fri Jan 6 20:45:31 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 6 Jan 2012 19:45:31 -0600 Subject: [Numpy-discussion] numpy 1.7.0 release? In-Reply-To: References: Message-ID: On Fri, Jan 6, 2012 at 9:15 AM, Ralf Gommers wrote: > > > On Tue, Dec 20, 2011 at 9:28 PM, Ralf Gommers > wrote: >> >> >> >> On Tue, Dec 20, 2011 at 3:18 PM, Charles R Harris >> wrote: >>> >>> Hi Ralf, >>> >>> On Mon, Dec 5, 2011 at 12:43 PM, Ralf Gommers >>> wrote: >>>> >>>> Hi all, >>>> >>>> It's been a little over 6 months since the release of 1.6.0 and the NA >>>> debate has quieted down, so I'd like to ask your opinion on the timing of >>>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and small >>>> improvements, plus three larger chucks of work: >>>> >>>> - datetime >>>> - NA >>>> - Bento support >>>> >>>> My impression is that both datetime and NA are releasable, but should be >>>> labeled "tech preview" or something similar, because they may still see >>>> significant changes. Please correct me if I'm wrong. >>>> >>>> There's still some maintenance work to do and pull requests to merge, >>>> but a beta release by Christmas should be feasible. What do you all think? 
>>>> >>> >>> I'm now thinking that is too optimistic. There are a fair number of >>> tickets that need to be looked at, including some for einsum and the >>> iterator, and I think the number of pull requests needs to be reduced. How >>> about sometime in the beginning of January? >>> >> >> Yes, it certainly was. Besides the tickets and pull requests, we also need >> the support for MinGW 4.x that David is looking at. If that goes smoothly >> then the first week of January may be feasible, otherwise it'll have to be >> February (I'm traveling for most of Jan). Or someone else has to volunteer >> to be the release manager for this release. > > > There isn't really much progress here. Besides a few smaller issues that > still need attention, I think the MinGW 4.x issue is a blocker and needs to > be resolved. This can be done either by making it work, or deciding to stick > with 3.x. In the latter case numpy.datetime should be fixed somehow. > > For the next three weeks I'm traveling and won't be able to do any work on > numpy. I propose to keep master in a state that's (close to being) > releasable until the blocker issue is resolved and we can create a 1.7.x > branch. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I think that my ticket 1973 (http://projects.scipy.org/numpy/ticket/1973) "Can not display a masked array containing np.NA values even if masked" that is due to the astype function not handling the NA object is also a blocker. Bruce From ralf.gommers at googlemail.com Sat Jan 7 03:11:15 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 7 Jan 2012 09:11:15 +0100 Subject: [Numpy-discussion] numpy 1.7.0 release? In-Reply-To: References: Message-ID: On Sat, Jan 7, 2012 at 2:45 AM, Bruce Southey wrote: > On Fri, Jan 6, 2012 at 9:15 AM, Ralf Gommers > wrote: > > > > > > On Tue, Dec 20, 2011 at 9:28 PM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> > >> > >> On Tue, Dec 20, 2011 at 3:18 PM, Charles R Harris > >> wrote: > >>> > >>> Hi Ralf, > >>> > >>> On Mon, Dec 5, 2011 at 12:43 PM, Ralf Gommers > >>> wrote: > >>>> > >>>> Hi all, > >>>> > >>>> It's been a little over 6 months since the release of 1.6.0 and the NA > >>>> debate has quieted down, so I'd like to ask your opinion on the > timing of > >>>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and > small > >>>> improvements, plus three larger chucks of work: > >>>> > >>>> - datetime > >>>> - NA > >>>> - Bento support > >>>> > >>>> My impression is that both datetime and NA are releasable, but should > be > >>>> labeled "tech preview" or something similar, because they may still > see > >>>> significant changes. Please correct me if I'm wrong. > >>>> > >>>> There's still some maintenance work to do and pull requests to merge, > >>>> but a beta release by Christmas should be feasible. What do you all > think? > >>>> > >>> > >>> I'm now thinking that is too optimistic. There are a fair number of > >>> tickets that need to be looked at, including some for einsum and the > >>> iterator, and I think the number of pull requests needs to be reduced. > How > >>> about sometime in the beginning of January? > >>> > >> > >> Yes, it certainly was. Besides the tickets and pull requests, we also > need > >> the support for MinGW 4.x that David is looking at. 
If that goes > smoothly > >> then the first week of January may be feasible, otherwise it'll have to > be > >> February (I'm traveling for most of Jan). Or someone else has to > volunteer > >> to be the release manager for this release. > > > > > > There isn't really much progress here. Besides a few smaller issues that > > still need attention, I think the MinGW 4.x issue is a blocker and needs > to > > be resolved. This can be done either by making it work, or deciding to > stick > > with 3.x. In the latter case numpy.datetime should be fixed somehow. > > > > For the next three weeks I'm traveling and won't be able to do any work > on > > numpy. I propose to keep master in a state that's (close to being) > > releasable until the blocker issue is resolved and we can create a 1.7.x > > branch. > > > > Ralf > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > I think that my ticket 1973 > (http://projects.scipy.org/numpy/ticket/1973) "Can not display a > masked array containing np.NA values even if masked" that is due to > the astype function not handling the NA object is also a blocker. > > I've set it to Milestone 1.7.0. This should be done for all tickets that are important for this release, so we can keep track of it. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Sun Jan 8 05:47:26 2012 From: faltet at gmail.com (Francesc Alted) Date: Sun, 8 Jan 2012 11:47:26 +0100 Subject: [Numpy-discussion] ANN: Numexpr 2.0 released In-Reply-To: References: <201111271400.48560.faltet@pytables.org> Message-ID: Hi srean, Sorry for being late answering, the latest weeks have been really crazy for me. See my comments below. 2011/12/13 srean : > This is great news, I hope this gets included in the epd distribution soon. > > I had mailed a few questions about numexpr sometime ago. I am still > curious about those. I have included the relevant parts below. In > addition, I have another question. There was a numexpr branch that > allows a "out=blah" parameer to build the output in place, has that > been merged or its functionality incorporated ? Yes, the `out` parameter is fully supported in 2.0 series, as well as new `order` and `casting` ones. These are fully documented in docstrings in forthcoming 2.0.1, as well as in the new User's Guide wiki page at: http://code.google.com/p/numexpr/wiki/UsersGuide Thanks for pointing this out! > This goes without saying, but, thanks for numexpr. > > -- ?from old mail -- > > What I find somewhat encumbering is that there is no single piece of > document that lists all the operators and functions that numexpr can > parse. For a new user this will be very useful There is a list in the > wiki page entitled "overview" but it seems incomplete (for instance it > does not describe the reduction operations available). I do not know > enough to know how incomplete it is. The reduction functions are just `sum()` and `prod()` and are fully documented in the new User's Guide. > > Is there any plan to implement the reduction like enhancements that > ufuncs provide: namely reduce_at, accumulate, reduce ? It is entirely > possible that they are already in there but I could not figure out how > to use them. If they aren't it would be great to have them. 
No, these are not implemented, but we will gladly accept contributions ;) -- Francesc Alted From faltet at gmail.com Sun Jan 8 05:49:53 2012 From: faltet at gmail.com (Francesc Alted) Date: Sun, 8 Jan 2012 11:49:53 +0100 Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released Message-ID: ========================== Announcing Numexpr 2.0.1 ========================== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's VML library, which allows for squeezing the last drop of performance out of your multi-core processors. What's new ========== In this release, better docstrings for `evaluate` and reduction methods (`sum`, `prod`) is in place. Also, compatibility with Python 2.5 has been restored (2.4 is definitely not supported anymore). In case you want to know more in detail what has changed in this version, see: http://code.google.com/p/numexpr/wiki/ReleaseNotes or have a look at RELEASE_NOTES.txt in the tarball. Where I can find Numexpr? ========================= The project is hosted at Google code in: http://code.google.com/p/numexpr/ You can get the packages from PyPI as well: http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy! -- Francesc Alted From nadavh at visionsense.com Sun Jan 8 06:11:47 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 8 Jan 2012 03:11:47 -0800 Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released In-Reply-To: References: Message-ID: <26FC23E7C398A64083C980D16001012D261E514754@VA3DIAXVS361.RED001.local> What about python3 support? Thanks Nadav. ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Francesc Alted [faltet at gmail.com] Sent: 08 January 2012 12:49 To: Discussion of Numerical Python; numexpr Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released ========================== Announcing Numexpr 2.0.1 ========================== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's VML library, which allows for squeezing the last drop of performance out of your multi-core processors. What's new ========== In this release, better docstrings for `evaluate` and reduction methods (`sum`, `prod`) is in place. Also, compatibility with Python 2.5 has been restored (2.4 is definitely not supported anymore). In case you want to know more in detail what has changed in this version, see: http://code.google.com/p/numexpr/wiki/ReleaseNotes or have a look at RELEASE_NOTES.txt in the tarball. Where I can find Numexpr? ========================= The project is hosted at Google code in: http://code.google.com/p/numexpr/ You can get the packages from PyPI as well: http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy! 
-- 
Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From faltet at gmail.com  Sun Jan  8 08:08:11 2012
From: faltet at gmail.com (Francesc Alted)
Date: Sun, 8 Jan 2012 14:08:11 +0100
Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released
In-Reply-To: <26FC23E7C398A64083C980D16001012D261E514754@VA3DIAXVS361.RED001.local>
References: <26FC23E7C398A64083C980D16001012D261E514754@VA3DIAXVS361.RED001.local>
Message-ID: 

Python3 is not on my radar yet. Perhaps others might be interested in
doing the port.

Francesc

2012/1/8 Nadav Horesh :
> What about python3 support?
>
>  Thanks
>
>    Nadav.
>
> ________________________________________
> From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Francesc Alted [faltet at gmail.com]
> Sent: 08 January 2012 12:49
> To: Discussion of Numerical Python; numexpr
> Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released
>
> ==========================
>  Announcing Numexpr 2.0.1
> ==========================
>
> Numexpr is a fast numerical expression evaluator for NumPy.  With it,
> expressions that operate on arrays (like "3*a+4*b") are accelerated
> and use less memory than doing the same calculation in Python.
>
> It wears multi-threaded capabilities, as well as support for Intel's
> VML library, which allows for squeezing the last drop of performance
> out of your multi-core processors.
>
> What's new
> ==========
>
> In this release, better docstrings for `evaluate` and reduction
> methods (`sum`, `prod`) is in place.  Also, compatibility with Python
> 2.5 has been restored (2.4 is definitely not supported anymore).
>
> In case you want to know more in detail what has changed in this
> version, see:
>
> http://code.google.com/p/numexpr/wiki/ReleaseNotes
>
> or have a look at RELEASE_NOTES.txt in the tarball.
>
> Where I can find Numexpr?
> =========================
>
> The project is hosted at Google code in:
>
> http://code.google.com/p/numexpr/
>
> You can get the packages from PyPI as well:
>
> http://pypi.python.org/pypi/numexpr
>
> Share your experience
> =====================
>
> Let us know of any bugs, suggestions, gripes, kudos, etc. you may
> have.
>
>
> Enjoy!
>
> --
> Francesc Alted
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
Francesc Alted

From pierre.haessig at crans.org  Sun Jan  8 08:56:36 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Sun, 08 Jan 2012 14:56:36 +0100
Subject: [Numpy-discussion] Error in numpy.load example?
In-Reply-To: 
References: 
Message-ID: <4F09A094.9050304@crans.org>

Hi Sebastien,

On 05/01/2012 15:02, Sébastien Barthélémy wrote:
> However http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html
> says:
>
> numpy.savez(file, *args, **kwds)
>
> Save several arrays into a single file in uncompressed .npz format.
>
> Moreover, this last page points to an undocumented numpy.savez_compressed
> function, which is also non-existent in my version of numpy (1.5.1-2ubuntu2).
Indeed, this online doc is not consistent with the numpy.savez *version
1.5*... [3] It seems there was an API update between 1.5 and 1.6 ([1][2]).
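For readers hitting the same confusion, here is a minimal sketch of the
1.6-era behaviour. The file names are made up, and `savez_compressed` is
assumed to be available (i.e. NumPy >= 1.6; on 1.5.1 only `savez` exists):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([1, 2])

# savez writes a single .npz (zip) archive; the arrays inside are *not* compressed
np.savez('/tmp/arrays.npz', a=a, b=b)

# savez_compressed writes the same kind of archive with compressed members
# (assumed available only in NumPy >= 1.6)
np.savez_compressed('/tmp/arrays_small.npz', a=a, b=b)

# load on a .npz file returns an NpzFile object, indexed by the keyword names
data = np.load('/tmp/arrays.npz')
print(data.files)    # the names of the stored arrays, e.g. ['a', 'b']
print(data['a'])     # the 2x3 array saved above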
(As a Debian testing user, I was also unaware of this change. Thanks for pointing that out, at least for me ! ) Now, I think Sebastien's question raises an interesting practical issue : If I'm correct, there is no place in the HTML page where the numpy version is written, except in (and except the root of the manual http://docs.scipy.org/doc/numpy/reference/index.html ) Maybe, there should be some "top page version indicator". And possibly some way (a drop down menu ??) to switch between versions. (I don't know how this would fit in the Sphinx template system however...) What do other people think ? Best, Pierre [1] http://docs.scipy.org/doc/numpy-1.5.x/reference/generated/numpy.savez.html#numpy.savez [2] http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.savez.html#numpy.savez [3] There is even a ticket about this ! http://projects.scipy.org/numpy/ticket/1696 From shish at keba.be Sun Jan 8 16:16:33 2012 From: shish at keba.be (Olivier Delalleau) Date: Sun, 8 Jan 2012 16:16:33 -0500 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: <20120106121522.203240@gmx.net> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <je6g99$mbf$1@dough.gmane.org> <20120106121522.203240@gmx.net> Message-ID: <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> You could try A[...].fill(MyObject(...)). I haven't tried it myself, so not sure it would work though... -=- Olivier 2012/1/6 "David K?pfer" <dkoepfer at gmx.de> > Dear numpy community, > > I'm trying to create an array of type object. > > A = empty(9, dtype=object) > A[ array(0,1,2) ] = MyObject(1) > A[ array(3,4,5) ] = MyObject(2) > A[ array(6,7,8) ] = MyObject(3) > > This has worked well until MyObject has gotten an __getitem__ method. Now > python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], [1] > to MyObject(1)[1] and so on. > > Is there any way to just get a reference of the instance of MyObject into > every entry of the array slice? > > Thank you for any help on this problem > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120108/7d434a03/attachment.html> From dkoepfer at gmx.de Mon Jan 9 03:21:35 2012 From: dkoepfer at gmx.de (=?iso-8859-1?Q?=22David_K=F6pfer=22?=) Date: Mon, 09 Jan 2012 09:21:35 +0100 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <je6g99$mbf$1@dough.gmane.org> <20120106121522.203240@gmx.net> <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> Message-ID: <20120109082135.203220@gmx.net> Hi Oliver, thank you very much for your reply, sadly it is not working as you and I hoped. The array still stays at None even after the code. I've also tried A[X] = [MyObject(...)]*len(X) but that just results in a Memory error. So is there really no way to avoid this broadcasting? David -------- Original-Nachricht -------- > Datum: Sun, 8 Jan 2012 16:16:33 -0500 > Von: Olivier Delalleau <shish at keba.be> > An: Discussion of Numerical Python <numpy-discussion at scipy.org> > Betreff: Re: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method > You could try A[...].fill(MyObject(...)). I haven't tried it myself, so > not > sure it would work though... > > -=- Olivier > > 2012/1/6 "David K?pfer" <dkoepfer at gmx.de> > > > Dear numpy community, > > > > I'm trying to create an array of type object. > > > > A = empty(9, dtype=object) > > A[ array(0,1,2) ] = MyObject(1) > > A[ array(3,4,5) ] = MyObject(2) > > A[ array(6,7,8) ] = MyObject(3) > > > > This has worked well until MyObject has gotten an __getitem__ method. > Now > > python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], > [1] > > to MyObject(1)[1] and so on. > > > > Is there any way to just get a reference of the instance of MyObject > into > > every entry of the array slice? 
> > > > Thank you for any help on this problem > > David > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From shish at keba.be Mon Jan 9 10:39:21 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 9 Jan 2012 10:39:21 -0500 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: <20120109082135.203220@gmx.net> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <je6g99$mbf$1@dough.gmane.org> <20120106121522.203240@gmx.net> <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> <20120109082135.203220@gmx.net> Message-ID: <CAFXk4bo20OFa1i17cXGRJN-nAjk-HvAMWEQo+KaPufaJ-+h0nA@mail.gmail.com> Oh, sorry, I hadn't paid enough attention to the way you are indexing A: if you are using an array to index, it creates a copy, so using ".fill" will fill the copy and you won't see the result. Instead, use A[0:3], A[3:6], etc. -=- Olivier 2012/1/9 "David K?pfer" <dkoepfer at gmx.de> > Hi Oliver, > > thank you very much for your reply, sadly it is not working as you and I > hoped. The array still stays at None even after the code. > > I've also tried A[X] = [MyObject(...)]*len(X) but that just results in a > Memory error. > > So is there really no way to avoid this broadcasting? > > David > > > -------- Original-Nachricht -------- > > Datum: Sun, 8 Jan 2012 16:16:33 -0500 > > Von: Olivier Delalleau <shish at keba.be> > > An: Discussion of Numerical Python <numpy-discussion at scipy.org> > > Betreff: Re: [Numpy-discussion] filling an alice of array of object with > a reference to an object that has a __getitem__ method > > > You could try A[...].fill(MyObject(...)). I haven't tried it myself, so > > not > > sure it would work though... > > > > -=- Olivier > > > > 2012/1/6 "David K?pfer" <dkoepfer at gmx.de> > > > > > Dear numpy community, > > > > > > I'm trying to create an array of type object. > > > > > > A = empty(9, dtype=object) > > > A[ array(0,1,2) ] = MyObject(1) > > > A[ array(3,4,5) ] = MyObject(2) > > > A[ array(6,7,8) ] = MyObject(3) > > > > > > This has worked well until MyObject has gotten an __getitem__ method. > > Now > > > python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], > > [1] > > > to MyObject(1)[1] and so on. > > > > > > Is there any way to just get a reference of the instance of MyObject > > into > > > every entry of the array slice? > > > > > > Thank you for any help on this problem > > > David > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/787b5e5a/attachment.html> From questions.anon at gmail.com Mon Jan 9 17:31:41 2012 From: questions.anon at gmail.com (questions anon) Date: Tue, 10 Jan 2012 09:31:41 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> Message-ID: <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> thanks for the responses. Unfortunately they are not matching shapes >>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape (721, 106, 193) (721,) (106,) (193,) So I still receive index out of bounds error: >>>tmax=TSFC.max(axis=0) numpy array of max values for the month >>>maxindex=tmax.argmax() 2928 >>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() 35.5 (degrees celcius) >>>latloc=LAT[tmax.argmax()] IndexError: index out of bounds lonloc=LON[tmax.argmax()] timeloc=TIME[tmax.argmax()] Any other ideas for this type of situation? thanks On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > On 04.01.2012, at 5:10AM, questions anon wrote: > > > Thanks for your responses but I am still having difficuties with this > problem. Using argmax gives me one very large value and I am not sure what > it is. > > There shouldn't be any issues with the shape. The latitude and longitude > are the same shape always (covering a state) and the temperature (TSFC) > data are hourly for a whole month. > > There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == > LON.shape > > One needs more information on the structure of these data to say anything > definite, > but if e.g. your TSFC data have a time and a location dimension, argmax > will > per default return the index for the flattened array (see the argmax > documentation > for details, and how to use the axis keyword to get a different output). > This might be the very large value you mention, and if your location data > have fewer > dimensions, the index will easily be out of range. As Ben wrote, you'd > need extra work to > find the maximum location, depending on what maximum you are actually > looking for. > > As a speculative example, let's assume you have the temperature data in an > array(ntime, nloc) and the position data in array(nloc). Then > > TSFC.argmax(axis=1) > > would give you the index for the hottest place for each hour of the month > (i.e. actually an array of ntime indices, and pointer to so many different > locations). > > To locate the maximum temperature for the entire month, your best way > would probably > be to first extract the array of (monthly) maximum temperatures in each > location as > > tmax = TSFC.max(axis=0) > > which would have (in this example) the shape (nloc,), so you could > directly use it to index > > LAT[tmax.argmax()] etc. 
> > Cheers, > Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/6ab1563a/attachment.html> From ben.root at ou.edu Mon Jan 9 18:22:39 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 9 Jan 2012 17:22:39 -0600 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> Message-ID: <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> On Monday, January 9, 2012, questions anon <questions.anon at gmail.com> wrote: > thanks for the responses. > Unfortunately they are not matching shapes >>>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape > (721, 106, 193) (721,) (106,) (193,) > > So I still receive index out of bounds error: >>>>tmax=TSFC.max(axis=0) > numpy array of max values for the month >>>>maxindex=tmax.argmax() > 2928 >>>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() > 35.5 (degrees celcius) > >>>>latloc=LAT[tmax.argmax()] > IndexError: index out of bounds > > lonloc=LON[tmax.argmax()] > timeloc=TIME[tmax.argmax()] > > > Any other ideas for this type of situation? > thanks Right, we realize they are not the same shape. When you use argmax on the temperature data, take that index number and use unravel_index(index, TSFC.shape) to get a three-element tuple, each being the index in the TIME, LAT, LON arrays, respectively. Cheers, Ben Root > > On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: >> >> On 04.01.2012, at 5:10AM, questions anon wrote: >> >> > Thanks for your responses but I am still having difficuties with this problem. Using argmax gives me one very large value and I am not sure what it is. >> > There shouldn't be any issues with the shape. The latitude and longitude are the same shape always (covering a state) and the temperature (TSFC) data are hourly for a whole month. >> >> There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == LON.shape >> >> One needs more information on the structure of these data to say anything definite, >> but if e.g. your TSFC data have a time and a location dimension, argmax will >> per default return the index for the flattened array (see the argmax documentation >> for details, and how to use the axis keyword to get a different output). >> This might be the very large value you mention, and if your location data have fewer >> dimensions, the index will easily be out of range. As Ben wrote, you'd need extra work to >> find the maximum location, depending on what maximum you are actually looking for. >> >> As a speculative example, let's assume you have the temperature data in an >> array(ntime, nloc) and the position data in array(nloc). 
Then >> >> TSFC.argmax(axis=1) >> >> would give you the index for the hottest place for each hour of the month >> (i.e. actually an array of ntime indices, and pointer to so many different locations). >> >> To locate the maximum temperature for the entire month, your best way would probably >> be to first extract the array of (monthly) maximum temperatures in each location as >> >> tmax = TSFC.max(axis=0) >> >> which would have (in this example) the shape (nloc,), so you could directly use it to index >> >> LAT[tmax.argmax()] etc. >> >> Cheers, >> Derek >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/4b1e0db4/attachment.html> From questions.anon at gmail.com Mon Jan 9 20:59:00 2012 From: questions.anon at gmail.com (questions anon) Date: Tue, 10 Jan 2012 12:59:00 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> Message-ID: <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> thank you, I seem to have made some progress (with lots of help)!! I still seem to be having trouble with the time. Because it is hourly data for a whole month I assume that is where my problem lies. When I run the following code I alwayes receive the first timestamp of the file. Not sure how to get around this: tmax=TSFC.max(axis=0) maxindex=tmax.argmax() maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() print maxindex, maxtemp val=N.unravel_index(maxindex, TSFC.shape) listval=list(val) print listval timelocation=TIME[listval[0]] latlocation=LAT[listval[1]] lonlocation=LON[listval[2]] print latlocation, lonlocation cdftime=utime('seconds since 1970-01-01 00:00:00') ncfiletime=cdftime.num2date(timelocation) print ncfiletime On Tue, Jan 10, 2012 at 10:22 AM, Benjamin Root <ben.root at ou.edu> wrote: > > > On Monday, January 9, 2012, questions anon <questions.anon at gmail.com> > wrote: > > thanks for the responses. > > Unfortunately they are not matching shapes > >>>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape > > (721, 106, 193) (721,) (106,) (193,) > > > > So I still receive index out of bounds error: > >>>>tmax=TSFC.max(axis=0) > > numpy array of max values for the month > >>>>maxindex=tmax.argmax() > > 2928 > >>>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() > > 35.5 (degrees celcius) > > > >>>>latloc=LAT[tmax.argmax()] > > IndexError: index out of bounds > > > > lonloc=LON[tmax.argmax()] > > timeloc=TIME[tmax.argmax()] > > > > > > Any other ideas for this type of situation? > > thanks > > Right, we realize they are not the same shape. 
When you use argmax on the > temperature data, take that index number and use unravel_index(index, > TSFC.shape) to get a three-element tuple, each being the index in the TIME, > LAT, LON arrays, respectively. > > Cheers, > Ben Root > > > > > > On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < > derek at astro.physik.uni-goettingen.de> wrote: > >> > >> On 04.01.2012, at 5:10AM, questions anon wrote: > >> > >> > Thanks for your responses but I am still having difficuties with this > problem. Using argmax gives me one very large value and I am not sure what > it is. > >> > There shouldn't be any issues with the shape. The latitude and > longitude are the same shape always (covering a state) and the temperature > (TSFC) data are hourly for a whole month. > >> > >> There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == > LON.shape > >> > >> One needs more information on the structure of these data to say > anything definite, > >> but if e.g. your TSFC data have a time and a location dimension, argmax > will > >> per default return the index for the flattened array (see the argmax > documentation > >> for details, and how to use the axis keyword to get a different output). > >> This might be the very large value you mention, and if your location > data have fewer > >> dimensions, the index will easily be out of range. As Ben wrote, you'd > need extra work to > >> find the maximum location, depending on what maximum you are actually > looking for. > >> > >> As a speculative example, let's assume you have the temperature data in > an > >> array(ntime, nloc) and the position data in array(nloc). Then > >> > >> TSFC.argmax(axis=1) > >> > >> would give you the index for the hottest place for each hour of the > month > >> (i.e. actually an array of ntime indices, and pointer to so many > different locations). > >> > >> To locate the maximum temperature for the entire month, your best way > would probably > >> be to first extract the array of (monthly) maximum temperatures in each > location as > >> > >> tmax = TSFC.max(axis=0) > >> > >> which would have (in this example) the shape (nloc,), so you could > directly use it to index > >> > >> LAT[tmax.argmax()] etc. > >> > >> Cheers, > >> Derek > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/f538bc81/attachment.html> From shish at keba.be Mon Jan 9 22:39:52 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 9 Jan 2012 22:39:52 -0500 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> Message-ID: <CAFXk4bqfLCrNWyTTxMCszu+K2KpRDHucjxN636be_UcPTOySSg@mail.gmail.com> Do you mean that listval[0] is systematically equal to 0, or is it something else? -=- Olivier 2012/1/9 questions anon <questions.anon at gmail.com> > thank you, I seem to have made some progress (with lots of help)!! > I still seem to be having trouble with the time. Because it is hourly data > for a whole month I assume that is where my problem lies. > When I run the following code I alwayes receive the first timestamp of the > file. Not sure how to get around this: > > tmax=TSFC.max(axis=0) > maxindex=tmax.argmax() > maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() > print maxindex, maxtemp > val=N.unravel_index(maxindex, TSFC.shape) > listval=list(val) > print listval > timelocation=TIME[listval[0]] > latlocation=LAT[listval[1]] > lonlocation=LON[listval[2]] > print latlocation, lonlocation > > cdftime=utime('seconds since 1970-01-01 00:00:00') > ncfiletime=cdftime.num2date(timelocation) > print ncfiletime > > > > On Tue, Jan 10, 2012 at 10:22 AM, Benjamin Root <ben.root at ou.edu> wrote: > >> >> >> On Monday, January 9, 2012, questions anon <questions.anon at gmail.com> >> wrote: >> > thanks for the responses. >> > Unfortunately they are not matching shapes >> >>>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape >> > (721, 106, 193) (721,) (106,) (193,) >> > >> > So I still receive index out of bounds error: >> >>>>tmax=TSFC.max(axis=0) >> > numpy array of max values for the month >> >>>>maxindex=tmax.argmax() >> > 2928 >> >>>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() >> > 35.5 (degrees celcius) >> > >> >>>>latloc=LAT[tmax.argmax()] >> > IndexError: index out of bounds >> > >> > lonloc=LON[tmax.argmax()] >> > timeloc=TIME[tmax.argmax()] >> > >> > >> > Any other ideas for this type of situation? >> > thanks >> >> Right, we realize they are not the same shape. When you use argmax on >> the temperature data, take that index number and use unravel_index(index, >> TSFC.shape) to get a three-element tuple, each being the index in the TIME, >> LAT, LON arrays, respectively. >> >> Cheers, >> Ben Root >> >> >> > >> > On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < >> derek at astro.physik.uni-goettingen.de> wrote: >> >> >> >> On 04.01.2012, at 5:10AM, questions anon wrote: >> >> >> >> > Thanks for your responses but I am still having difficuties with >> this problem. Using argmax gives me one very large value and I am not sure >> what it is. >> >> > There shouldn't be any issues with the shape. 
The latitude and >> longitude are the same shape always (covering a state) and the temperature >> (TSFC) data are hourly for a whole month. >> >> >> >> There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == >> LON.shape >> >> >> >> One needs more information on the structure of these data to say >> anything definite, >> >> but if e.g. your TSFC data have a time and a location dimension, >> argmax will >> >> per default return the index for the flattened array (see the argmax >> documentation >> >> for details, and how to use the axis keyword to get a different >> output). >> >> This might be the very large value you mention, and if your location >> data have fewer >> >> dimensions, the index will easily be out of range. As Ben wrote, you'd >> need extra work to >> >> find the maximum location, depending on what maximum you are actually >> looking for. >> >> >> >> As a speculative example, let's assume you have the temperature data >> in an >> >> array(ntime, nloc) and the position data in array(nloc). Then >> >> >> >> TSFC.argmax(axis=1) >> >> >> >> would give you the index for the hottest place for each hour of the >> month >> >> (i.e. actually an array of ntime indices, and pointer to so many >> different locations). >> >> >> >> To locate the maximum temperature for the entire month, your best way >> would probably >> >> be to first extract the array of (monthly) maximum temperatures in >> each location as >> >> >> >> tmax = TSFC.max(axis=0) >> >> >> >> which would have (in this example) the shape (nloc,), so you could >> directly use it to index >> >> >> >> LAT[tmax.argmax()] etc. >> >> >> >> Cheers, >> >> Derek >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/4bed4563/attachment.html> From aronne.merrelli at gmail.com Mon Jan 9 23:28:52 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Mon, 9 Jan 2012 22:28:52 -0600 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> Message-ID: <CAHNdQ4JUCbNXszRMhTXAQF_+51Xdf6+x3FFNpa7J=RiTZ--zVw@mail.gmail.com> On Mon, Jan 9, 2012 at 7:59 PM, questions anon <questions.anon at gmail.com>wrote: > thank you, I seem to have made some progress (with lots of help)!! > I still seem to be having trouble with the time. Because it is hourly data > for a whole month I assume that is where my problem lies. > When I run the following code I alwayes receive the first timestamp of the > file. Not sure how to get around this: > > tmax=TSFC.max(axis=0) > maxindex=tmax.argmax() > You are computing max(axis=0) first. So, tmax is an array containing the maximum temperature at each lat/lon grid point, over the set of 721 months. It will be a [106, 193] array. So the argmax of tmax is an element in a shape [106,193] array (the number of latitude/number of longitude) not the original three dimension [721, 106, 193] array. Thus when you unravel it you can only get the first time value. I re-read your original post but I don't understand what number you need. Are you trying to get the single max value over the entire array? Or max value for each month? (a 721 element vector)? or something else? Cheers, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/3d31eaaf/attachment.html> From questions.anon at gmail.com Mon Jan 9 23:40:02 2012 From: questions.anon at gmail.com (questions anon) Date: Tue, 10 Jan 2012 15:40:02 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAHNdQ4JUCbNXszRMhTXAQF_+51Xdf6+x3FFNpa7J=RiTZ--zVw@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> <CAHNdQ4JUCbNXszRMhTXAQF_+51Xdf6+x3FFNpa7J=RiTZ--zVw@mail.gmail.com> Message-ID: <CAN_=ogvBwmYDduqF2EK3TiVxWkL2bcmZk9v4ZQ9oozV5qA9CzQ@mail.gmail.com> Thank you, thank you, thank you! 
I needed to find the max value (and corresponding TIME and LAT, LON) for the entire month but I shouldn't have been using the tmax, instead I needed to use the entire array. Below code works for those needing to do something similar. Thanks for all your help everyone! tmax=TSFC.max(axis=0) maxindex=TSFC.argmax() maxtemp=TSFC.ravel()[maxindex] #or maxtemp=TSFC.max() print maxindex, maxtemp val=N.unravel_index(maxindex, TSFC.shape) listval=list(val) print listval timelocation=TIME[listval[0]] latlocation=LAT[listval[1]] lonlocation=LON[listval[2]] print latlocation, lonlocation cdftime=utime('seconds since 1970-01-01 00:00:00') ncfiletime=cdftime.num2date(timelocation) print ncfiletime On Tue, Jan 10, 2012 at 3:28 PM, Aronne Merrelli <aronne.merrelli at gmail.com>wrote: > > > On Mon, Jan 9, 2012 at 7:59 PM, questions anon <questions.anon at gmail.com>wrote: > >> thank you, I seem to have made some progress (with lots of help)!! >> I still seem to be having trouble with the time. Because it is hourly >> data for a whole month I assume that is where my problem lies. >> When I run the following code I alwayes receive the first timestamp of >> the file. Not sure how to get around this: >> >> tmax=TSFC.max(axis=0) >> maxindex=tmax.argmax() >> > > You are computing max(axis=0) first. So, tmax is an array containing the > maximum temperature at each lat/lon grid point, over the set of 721 months. > It will be a [106, 193] array. > > So the argmax of tmax is an element in a shape [106,193] array (the number > of latitude/number of longitude) not the original three dimension [721, > 106, 193] array. Thus when you unravel it you can only get the first time > value. > > I re-read your original post but I don't understand what number you need. > Are you trying to get the single max value over the entire array? Or max > value for each month? (a 721 element vector)? or something else? > > > Cheers, > Aronne > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/2ace75cc/attachment.html> From scipy at samueljohn.de Tue Jan 10 10:24:41 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 10 Jan 2012 16:24:41 +0100 Subject: [Numpy-discussion] SParse feature vector generation In-Reply-To: <CAO=18DY=fpfWjX0axfce-GZhwZOAffwC+48CdxUdAp5hV4U3Hw@mail.gmail.com> References: <CAO=18DY=fpfWjX0axfce-GZhwZOAffwC+48CdxUdAp5hV4U3Hw@mail.gmail.com> Message-ID: <11FDB528-2255-4D76-90A7-0FC013E4E12A@samueljohn.de> I would just use a lookup dict: names = [ "uc_berkeley", "stanford", "uiuc", "google", "intel", "texas_instruments", "bool"] lookup = dict( zip( names, range(len(names)) ) ) Now, given you have n entries: S = numpy.zeros( (n, len(names)) ,dtype=numpy.int32) for k in ["uc_berkeley", "google", "bool"]: S[0,lookup[k]] += 1 for k in ["stanford", "intel","bool"]: S[1,lookup[k]] += 1 ... and so forth. so lookup[k] returns the index to use. Hope this helps. I am not aware of an automatic that does this. I may be wrong. cheers, Samuel On 04.01.2012, at 07:25, Dhruvkaran Mehta wrote: > Hi numpy users, > > Is there a convenient way in numpy to go from "string" features like: > > "uc_berkeley", "google", 1 > "stanford", "intel", 1 > . > . > .
> "uiuc", "texas_instruments", 0 > > to a numpy matrix like: > > "uc_berkeley", "stanford", ..., "uiuc", "google", "intel", "texas_instruments", "bool" > 1 0 ... 0 1 0 0 1 > 0 1 ... 0 0 1 0 1 > : > 0 0 ... 1 0 0 1 0 > > I really appreciate you taking the time to help! > Thanks! > --Dhruv > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scipy at samueljohn.de Tue Jan 10 11:29:04 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 10 Jan 2012 17:29:04 +0100 Subject: [Numpy-discussion] Ufuncs and flexible types, CAPI In-Reply-To: <CAE8bXEmGPrnjzVcK_TZ8QUnNeVnUEtsopz0+5ptKXWwSUSa21g@mail.gmail.com> References: <CAE8bXEmGPrnjzVcK_TZ8QUnNeVnUEtsopz0+5ptKXWwSUSa21g@mail.gmail.com> Message-ID: <18968058-9EDB-4CAA-9DF3-64DC069BD619@samueljohn.de> [sorry for duplicate - I used the wrong mail address] I am afraid, I didn't quite get the question. What is the scenario? What is the benefit that would weight out the performance hit of checking whether there is a callback or not. This has to be evaluated quite a lot. Oh well ... and 1.3.0 is pretty old :-) cheers, Samuel On 31.12.2011, at 07:48, Val Kalatsky wrote: > > Hi folks, > > First post, may not follow the standards, please bear with me. > > Need to define a ufunc that takes care of various type. > Fixed - no problem, userdef - no problem, flexible - problem. > It appears that the standard ufunc loop does not provide means to > deliver the size of variable size items. > Questions and suggestions: > > 1) Please no laughing: I have to code for NumPy 1.3.0. > Perhaps this issue has been resolved, then the discussion becomes moot. > If so please direct me to the right link. > > 2) A reasonable approach here would be to use callbacks and to give the user (read programmer) > a chance to intervene at least twice: OnInit and OnFail (OnFinish may not be unreasonable as well). > > OnInit: before starting the type resolution the user is given a chance to do something (e.g. check for > that pesky type and take control then return a flag indicating a stop) before the resolution starts > OnFail: the resolution took place and did not succeed, the user is given a chance to fix it. > In most of the case these callbacks are NULLs. > > I could patch numpy with a generic method that does it, but it's a shame not to use the good ufunc machine. > > Thanks for tips and suggestions. > > Val Kalatsky > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From kalatsky at gmail.com Tue Jan 10 13:26:17 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Tue, 10 Jan 2012 12:26:17 -0600 Subject: [Numpy-discussion] Ufuncs and flexible types, CAPI In-Reply-To: <18968058-9EDB-4CAA-9DF3-64DC069BD619@samueljohn.de> References: <CAE8bXEmGPrnjzVcK_TZ8QUnNeVnUEtsopz0+5ptKXWwSUSa21g@mail.gmail.com> <18968058-9EDB-4CAA-9DF3-64DC069BD619@samueljohn.de> Message-ID: <CAE8bXEnz_Q1WVVFk_7QOxNB5--zyu1TVwVGRCK74tLHKBVoYng@mail.gmail.com> Hi Samuel, Thanks for the reply. I hoped somebody will prove me wrong on ufuncs' limitation: no flexible type support. Also wanted to bring up a discussion on changing ufunc API. I think adding another parameter that delivers pointers to arrays to the loops would not lead to any undesirable consequences. 
Yep, 1.3.0 is old, but 1.7 has same loop prototype (with some minor cosmetic change): (char **args, intp *dimensions, intp *steps, void *func) -> (char **args, intp *dimensions, intp *steps, void *NPY_UNUSED(func)) it probably has not change from the conception. Thanks Val On Tue, Jan 10, 2012 at 10:29 AM, Samuel John <scipy at samueljohn.de> wrote: > [sorry for duplicate - I used the wrong mail address] > > I am afraid, I didn't quite get the question. > What is the scenario? What is the benefit that would weight out the > performance hit of checking whether there is a callback or not. This has to > be evaluated quite a lot. > > Oh well ... and 1.3.0 is pretty old :-) > > cheers, > Samuel > > On 31.12.2011, at 07:48, Val Kalatsky wrote: > > > > > Hi folks, > > > > First post, may not follow the standards, please bear with me. > > > > Need to define a ufunc that takes care of various type. > > Fixed - no problem, userdef - no problem, flexible - problem. > > It appears that the standard ufunc loop does not provide means to > > deliver the size of variable size items. > > Questions and suggestions: > > > > 1) Please no laughing: I have to code for NumPy 1.3.0. > > Perhaps this issue has been resolved, then the discussion becomes moot. > > If so please direct me to the right link. > > > > 2) A reasonable approach here would be to use callbacks and to give the > user (read programmer) > > a chance to intervene at least twice: OnInit and OnFail (OnFinish may > not be unreasonable as well). > > > > OnInit: before starting the type resolution the user is given a chance > to do something (e.g. check for > > that pesky type and take control then return a flag indicating a stop) > before the resolution starts > > OnFail: the resolution took place and did not succeed, the user is given > a chance to fix it. > > In most of the case these callbacks are NULLs. > > > > I could patch numpy with a generic method that does it, but it's a shame > not to use the good ufunc machine. > > > > Thanks for tips and suggestions. > > > > Val Kalatsky > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/9732d65c/attachment.html> From madsipsen at gmail.com Tue Jan 10 15:14:02 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 10 Jan 2012 21:14:02 +0100 Subject: [Numpy-discussion] Index update Message-ID: <4F0C9C0A.5030809@gmail.com> Hi, Suppose you have N items, say N = 10. Now a subset of these items are selected given by a list A of indices. Lets say that items A = [2,5,7] are selected. Assume now that you delete some of the items given by the indices S = [1,4,8]. This means that the list of indices A must be updated, since items have been deleted. For this particular case the updated selection list A becomes A = [1,3,5]. Is there some smart numpy way of doing this index update of the selected items in A without looping? Typically N is large. Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. 
tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/90877fe9/attachment.html> From kalatsky at gmail.com Tue Jan 10 15:45:46 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Tue, 10 Jan 2012 14:45:46 -0600 Subject: [Numpy-discussion] Index update In-Reply-To: <4F0C9C0A.5030809@gmail.com> References: <4F0C9C0A.5030809@gmail.com> Message-ID: <CAE8bXE=9FpyopC7DydcAwdMRyr-YAyP5i-8ZzGKfqS6crfvGhQ@mail.gmail.com> A - np.digitize(A, S) Should do the trick, just make sure that S is sorted and A and S do not overlap, if they do remove those items from A using set operations. Val On Tue, Jan 10, 2012 at 2:14 PM, Mads Ipsen <madsipsen at gmail.com> wrote: > ** > Hi, > > Suppose you have N items, say N = 10. > > Now a subset of these items are selected given by a list A of indices. > Lets say that items A = [2,5,7] are selected. Assume now that you delete > some of the items given by the indices S = [1,4,8]. This means that the > list of indices A must be updated, since items have been deleted. For this > particular case the updated selection list A becomes A = [1,3,5]. > > Is there some smart numpy way of doing this index update of the selected > items in A without looping? Typically N is large. > > Best regards, > > Mads > > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. tv | | > | DK-2500 Valby | phone: +45-29716388 | > | Denmark | email: mads.ipsen at gmail.com | > +----------------------+------------------------------+ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/df087851/attachment.html> From madsipsen at gmail.com Tue Jan 10 15:53:52 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 10 Jan 2012 21:53:52 +0100 Subject: [Numpy-discussion] Index update In-Reply-To: <CAE8bXE=9FpyopC7DydcAwdMRyr-YAyP5i-8ZzGKfqS6crfvGhQ@mail.gmail.com> References: <4F0C9C0A.5030809@gmail.com> <CAE8bXE=9FpyopC7DydcAwdMRyr-YAyP5i-8ZzGKfqS6crfvGhQ@mail.gmail.com> Message-ID: <4F0CA560.7080807@gmail.com> Thanks - very cool! On 10/01/2012 21:45, Val Kalatsky wrote: > > A - np.digitize(A, S) > Should do the trick, just make sure that S is sorted and A and S do > not overlap, > if they do remove those items from A using set operations. > Val > > On Tue, Jan 10, 2012 at 2:14 PM, Mads Ipsen <madsipsen at gmail.com > <mailto:madsipsen at gmail.com>> wrote: > > Hi, > > Suppose you have N items, say N = 10. > > Now a subset of these items are selected given by a list A of > indices. Lets say that items A = [2,5,7] are selected. Assume now > that you delete some of the items given by the indices S = > [1,4,8]. This means that the list of indices A must be updated, > since items have been deleted. For this particular case the > updated selection list A becomes A = [1,3,5]. > > Is there some smart numpy way of doing this index update of the > selected items in A without looping? Typically N is large. 
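A small self-contained check of the digitize trick above, using the numbers from the question just quoted (A = [2,5,7] selected, S = [1,4,8] deleted; S sorted and disjoint from A, as Val requires; A_updated is just an illustrative name):

import numpy as np

A = np.array([2, 5, 7])   # indices of the currently selected items
S = np.array([1, 4, 8])   # indices of the deleted items, sorted, disjoint from A

# For each entry of A, np.digitize(A, S) counts how many deleted indices
# lie below it; subtracting that count shifts the selection down accordingly.
A_updated = A - np.digitize(A, S)
print(A_updated)          # -> [1 3 5]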
> > Best regards, > > Mads > > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. tv | | > | DK-2500 Valby | phone:+45-29716388 <tel:%2B45-29716388> | > | Denmark | email:mads.ipsen at gmail.com <mailto:mads.ipsen at gmail.com> | > +----------------------+------------------------------+ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/ff6500e6/attachment.html> From mikehulluk at googlemail.com Wed Jan 11 04:41:22 2012 From: mikehulluk at googlemail.com (Michael Hull) Date: Wed, 11 Jan 2012 09:41:22 +0000 Subject: [Numpy-discussion] Numpy 'groupby' Message-ID: <CABzAe0YuOPkAO9mXfV-Q91EAF3PtwUX77Xk+8nNuC-U1zow6ZA@mail.gmail.com> Hi Everyone, First off, thanks for all your hard work on numpy, its a really great help! I was wondering if there was a standard 'groupby' in numpy, that similar to that in itertools. I know its not hard to write with np.diff, but I have found myself writing it on more than a couple of occasions, and wondered if there was a 'standarised' version I was missing out on?? Thanks, Mike From ndbecker2 at gmail.com Wed Jan 11 07:05:33 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 11 Jan 2012 07:05:33 -0500 Subject: [Numpy-discussion] Numpy 'groupby' References: <CABzAe0YuOPkAO9mXfV-Q91EAF3PtwUX77Xk+8nNuC-U1zow6ZA@mail.gmail.com> Message-ID: <jejtud$abh$1@dough.gmane.org> Michael Hull wrote: > Hi Everyone, > First off, thanks for all your hard work on numpy, its a really great help! > I was wondering if there was a standard 'groupby' in numpy, that > similar to that in itertools. > I know its not hard to write with np.diff, but I have found myself > writing it on more than a couple of occasions, and wondered if > there was a 'standarised' version I was missing out on?? > Thanks, > > > Mike I've played with groupby in pandas. From hmgaudecker at gmail.com Wed Jan 11 10:12:02 2012 From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker) Date: Wed, 11 Jan 2012 16:12:02 +0100 Subject: [Numpy-discussion] Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 In-Reply-To: <mailman.6468.1324246451.1086.numpy-discussion@scipy.org> References: <mailman.6468.1324246451.1086.numpy-discussion@scipy.org> Message-ID: <472327CE-59F4-4DB5-B80C-D7EC2FFBBAF3@gmail.com> I recently upgraded to Lion and just faced the same problem with both Python 2.7.2 and Python 3.2.2 installed via the python.org installers. My hunch is that the errors are related to the fact that Apple dropped gcc-4.2 from XCode 4.2. I got gcc-4.2 via [1] then, still the same error -- who knows what else got lost in that upgrade... 
Previous successful builds with gcc-4.2 might have been with XCode 4.1 (or 4.2 installed on top of it). In the end I decided to re-install both Python versions via homebrew, nicely described here [2] and everything seems to work fine using LLVM. Test outputs for NumPy master under 2.7.2 and 3.2.2 are below in case they are of interest. Best, Hans-Martin [1] https://github.com/kennethreitz/osx-gcc-installer [2] http://www.thisisthegreenroom.com/2011/installing-python-numpy-scipy-matplotlib-and-ipython-on-lion/#numpy >>> numpy.test() Running unit tests for numpy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy Python version 2.7.2 (default, Jan 11 2012, 15:34:30) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 ........................./usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py:1317: UserWarning: pytz not found, pytz compatibility tests skipped warnings.warn("pytz not found, pytz compatibility tests skipped") ......................................................................................................................................................F..................................................................................................................................................................................................................................................................S.........................................................................................................................................................../usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:2020: RuntimeWarning: invalid value encountered in absolute return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) 
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K....................................................................................................K...SK.S.......S........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. 
====================================================================== FAIL: test_einsum_sums_clongdouble (test_einsum.TestEinSum) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/tests/test_einsum.py", line 479, in test_einsum_sums_clongdouble self.check_einsum_sums(np.clongdouble); File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/tests/test_einsum.py", line 231, in check_einsum_sums np.sum(a, axis=0).astype(dtype)) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 256, in assert_equal return assert_array_equal(actual, desired, err_msg, verbose) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 753, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 677, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 100.0%) x: array([[ 48.0+0.0j, 50.0+0.0j, 52.0+0.0j, 54.0+0.0j, 56.0+0.0j, 58.0+0.0j, 60.0+0.0j, 62.0+0.0j, 64.0+0.0j, 66.0+0.0j, 68.0+0.0j, 70.0+0.0j, 72.0+0.0j, 74.0+0.0j, 76.0+0.0j,... y: array([[ 0.0+0.0j, 1.0+0.0j, 2.0+0.0j, 3.0+0.0j, 4.0+0.0j, 5.0+0.0j, 6.0+0.0j, 7.0+0.0j, 8.0+0.0j, 9.0+0.0j, 10.0+0.0j, 11.0+0.0j, 12.0+0.0j, 13.0+0.0j, 14.0+0.0j, 15.0+0.0j],... ====================================================================== FAIL: test_prod (test_defmatrix.TestProperties) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/matrixlib/tests/test_defmatrix.py", line 78, in test_prod assert_equal(x.prod(0), matrix([[4,10,18]])) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 256, in assert_equal return assert_array_equal(actual, desired, err_msg, verbose) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 753, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 677, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 100.0%) x: matrix([[4611686018427387904, 4611686018427387904, 4449894403]]) y: matrix([[ 4, 10, 18]]) ---------------------------------------------------------------------- Ran 3553 tests in 51.254s FAILED (KNOWNFAIL=3, SKIP=5, failures=2) <nose.result.TextTestResult run=3553 errors=0 failures=2> >>> import numpy >>> numpy.test() Running unit tests for numpy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy Python version 3.2.2 (default, Jan 11 2012, 15:30:18) [GCC 4.2.1 (Based on Apple Inc. 
build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 ........................./usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/core/tests/test_datetime.py:1317: UserWarning: pytz not found, pytz compatibility tests skipped warnings.warn("pytz not found, pytz compatibility tests skipped") ...................................................................................................................................................................................................................................................................................E..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K...........................................................................................................................................................................................................K....................................................................................................K...SK.S.......S............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/lib/format.py:575: ResourceWarning: unclosed file <_io.BufferedReader name='/var/folders/vx/v51x110x36zcd9lppn78v0qw0000gn/T/tmpenv3j3'> mode=mode, offset=offset) ...........................................................................................................................................................................................................................S............................................................................................................................................................................................................................................................................................................/usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/ma/core.py:4778: RuntimeWarning: invalid value encountered in power np.power(out, 0.5, out=out, casting='unsafe') 
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ====================================================================== ERROR: test_multiarray.TestFromBuffer.test_empty('', array([], dtype=float64), {}) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/nose-1.1.2-py3.2.egg/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/core/tests/test_multiarray.py", line 1446, in tst_basic assert_array_equal(np.frombuffer(buffer,**kwargs),expected) AttributeError: 'str' object has no attribute '__buffer__' ---------------------------------------------------------------------- Ran 3552 tests in 44.427s FAILED (KNOWNFAIL=4, SKIP=4, errors=1) <nose.result.TextTestResult run=3552 errors=1 failures=0> On 18 Dec 2011, at 23:14, numpy-discussion-request at scipy.org wrote: > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 > (McNicol, Adam) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 18 Dec 2011 22:13:48 -0000 > From: "McNicol, Adam" <amcnicol at longroad.ac.uk> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: <numpy-discussion at scipy.org> > Message-ID: <2128d5dc6c318f2d07c027b5c6c4c0ef5b01bf31 at localhost> > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > > Definitely have the sdk installed. In the Developer/SDKs directory I have one for 10.6 and another for 10.7 - no idea where a second 10.6 would be coming from =( > > > Adam. 
> > > -----Original Message----- > From: numpy-discussion-request at scipy.org [mailto:numpy-discussion-request at scipy.org] > Sent: Sun 12/18/2011 9:52 PM > To: numpy-discussion at scipy.org > Subject: NumPy-Discussion Digest, Vol 63, Issue 55 > > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 > (McNicol, Adam) > 2. Re: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 > (Ralf Gommers) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 18 Dec 2011 18:48:47 -0000 > From: "McNicol, Adam" <amcnicol at longroad.ac.uk> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: <numpy-discussion at scipy.org> > Message-ID: <fd29dc5c77c7ee4f0f0622b3459e91a5d77badc3 at localhost> > Content-Type: text/plain; charset="iso-8859-1" > > Hi Ralf, > > Thanks for the response. I tried reinstalling Xcode 4.2.1 and the GCC/Fortran installer from http://r.research.att.com/tools/ (gcc-42-5666.3-darwin11.pkg) before installing the distribute package that you suggested. > > I then reran the numpy installer being sure to enter the three export lines as suggested on the numpy installation guide for Lion. > > Still no success. I guess I'll just have to wait for more official support for my configuration. > > I have included the output from terminal just in case it is useful as there were a few lines in red that suggest something isn't quite right with something. I have placed ** before the lines that appear in red. > > I appreciate the suggestions, > > Thanks again, > > > Adam. 
> > running build > running config_cc > unifing config_cc, config, build_clib, build_ext, build commands --compiler options > running config_fc > unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options > running build_src > build_src > building py_modules sources > creating build > creating build/src.macosx-10.6-intel-3.2 > creating build/src.macosx-10.6-intel-3.2/numpy > creating build/src.macosx-10.6-intel-3.2/numpy/distutils > building library "npymath" sources > customize NAGFCompiler > **Could not locate executable f95 > customize AbsoftFCompiler > **Could not locate executable f90 > **Could not locate executable f77 > customize IBMFCompiler > **Could not locate executable xlf90 > **Could not locate executable xlf > customize IntelFCompiler > **Could not locate executable fort > **Could not locate executable ifc > customize GnuFCompiler > **Could not locate executable g77 > customize Gnu95FCompiler > **Could not locate executable gfortran > customize G95FCompiler > **Could not locate executable g95 > customize PGroupFCompiler > **Could not locate executable pgf90 > **Could not locate executable pgf77 > **don't know how to compile Fortran code on platform 'posix' > C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' > gcc-4.2: _configtest.c > gcc-4.2 _configtest.o -o _configtest > success! > removing: _configtest.c _configtest.o _configtest > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' > gcc-4.2: _configtest.c > _configtest.c:1: warning: conflicting types for built-in function ?exp? > _configtest.c:1: warning: conflicting types for built-in function ?exp? > gcc-4.2 _configtest.o -o _configtest > success! 
> removing: _configtest.c _configtest.o _configtest > creating build/src.macosx-10.6-intel-3.2/numpy/core > creating build/src.macosx-10.6-intel-3.2/numpy/core/src > creating build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath > conv_template:> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math.c > conv_template:> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/ieee754.c > conv_template:> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math_complex.c > building extension "numpy.core._sort" sources > Generating build/src.macosx-10.6-intel-3.2/numpy/core/include/numpy/config.h > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' > gcc-4.2: _configtest.c > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > lipo: can't figure out the architecture type of: /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > lipo: can't figure out the architecture type of: /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out > failure. 
> removing: _configtest.c _configtest.o > Running from numpy source directory.Traceback (most recent call last): > File "setup.py", line 196, in <module> > setup_package() > File "setup.py", line 189, in setup_package > configuration=configuration ) > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/core.py", line 186, in setup > return old_setup(**new_attr) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/core.py", line 148, in setup > dist.run_commands() > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", line 917, in run_commands > self.run_command(cmd) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", line 936, in run_command > cmd_obj.run() > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build.py", line 37, in run > old_build.run(self) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/command/build.py", line 126, in run > self.run_command(cmd_name) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/cmd.py", line 313, in run_command > self.distribution.run_command(command) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", line 936, in run_command > cmd_obj.run() > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 152, in run > self.build_sources() > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 169, in build_sources > self.build_extension_sources(ext) > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 328, in build_extension_sources > sources = self.generate_sources(sources, ext) > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 385, in generate_sources > source = func(extension, build_dir) > File "numpy/core/setup.py", line 410, in generate_config_h > moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir) > File "numpy/core/setup.py", line 41, in check_types > out = check_types(*a, **kw) > File "numpy/core/setup.py", line 271, in check_types > "Cannot compile 'Python.h'. Perhaps you need to "\ > SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel. > > > > Message: 3 > Date: Sun, 18 Dec 2011 09:49:00 +0100 > From: Ralf Gommers <ralf.gommers at googlemail.com> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: Discussion of Numerical Python <numpy-discussion at scipy.org> > Message-ID: > <CABL7CQh-+t+6p5z_S_JVycRCB-FWK0Q3wiAWjazgmDp8AJkKqw at mail.gmail.com> > Content-Type: text/plain; charset="iso-8859-1" > > On Sat, Dec 17, 2011 at 12:59 PM, McNicol, Adam <amcnicol at longroad.ac.uk>wrote: > >> ** >> >> Hi There, >> >> Thanks for the responses. >> >> At this point I would settle from just being able to install matplotlib. >> Even if some of the functionality isn't present currently that is fine. >> >> I'm afraid my knowledge of Python falls down about here as well. I >> installed Python 3.2.2 via the installer from Python.org so I have no idea >> whether Python.h is present or where indeed I would find it or how I would >> add it to the search path. >> >> Do I have to install from source or something like that? >> > > No, your Python install should be fine if you just got the dmg installer > from python.org. 
I recommend you install the OS X SDKs and distribute ( > http://pypi.python.org/pypi/distribute), as I said before, and try again to > compile numpy. > > Unfortunately you have chosen a difficult combination of OS and Python > version, so we don't have binary installers you can use (yet). > > Ralf > > >> Thanks again, >> >> >> Adam. >> >> >> -----Original Message----- >> From: McNicol, Adam >> Sent: Fri 12/16/2011 11:07 PM >> To: numpy-discussion at scipy.org >> Subject: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 >> >> Hi There, >> >> I am very new to numpy and have really only started investigating it as >> one of my students needs some functionality from matplotlib. I have managed >> to install everything under Windows for work in class but I use a Mac at >> home and have been struggling all night to get it to build and install. >> >> I should mention that I am using Python 3.2.2 both in school and at home >> and it isn't an option to use Python 2.7 as all of the rest of my class is >> taught in Python 3. I also have the most recent version of Xcode installed. >> >> I have installed the correct build of gcc-4.2 with Fortran (gcc-4.2 (Apple >> build 5666.3) with GNU Fortran 4.2.4 for Mac OS X 10.7 (Lion)) from >> http://r.research.att.com/tools/ >> >> I then followed the install instructions but the build fails with the >> following message: >> >> File "numpy/core/setup.py", line 271, in check_types >> "Cannot compile 'Python.h'. Perhaps you need to "\ >> SystemError: Cannot compile 'Python.h'. Perhaps you need to install >> python-dev|python-devel. >> >> I have got no idea what to do with this error message. Any help would be >> much appreciated. >> >> Kind Regards, >> >> >> Adam. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/0747fb12/attachment-0001.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 63, Issue 54 > ************************************************ > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/ms-tnef > Size: 7394 bytes > Desc: not available > Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/02269314/attachment-0001.bin > > ------------------------------ > > Message: 2 > Date: Sun, 18 Dec 2011 22:53:35 +0100 > From: Ralf Gommers <ralf.gommers at googlemail.com> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: Discussion of Numerical Python <numpy-discussion at scipy.org> > Message-ID: > <CABL7CQgYJWK2ejek0fbQbs2tjT60kHPQjtPuPBTAX4-doT7YtA at mail.gmail.com> > Content-Type: text/plain; charset="windows-1252" > > On Sun, Dec 18, 2011 at 7:48 PM, McNicol, Adam <amcnicol at longroad.ac.uk>wrote: > >> Hi Ralf, >> >> Thanks for the response. I tried reinstalling Xcode 4.2.1 and the >> GCC/Fortran installer from http://r.research.att.com/tools/(gcc-42-5666.3-darwin11.pkg) before installing the distribute package that >> you suggested. 
>> >> I then reran the numpy installer being sure to enter the three export >> lines as suggested on the numpy installation guide for Lion. >> >> Still no success. I guess I'll just have to wait for more official support >> for my configuration. >> >> I have included the output from terminal just in case it is useful as >> there were a few lines in red that suggest something isn't quite right with >> something. I have placed ** before the lines that appear in red. >> >> Your compile flags have "-isysroot /Developer/SDKs/MacOSX10.6.sdk" in it > twice. Can you confirm you have installed this SDK? If so, I think the > problem is that it appears twice. Not sure what's causing it though. > > Ralf > > > I appreciate the suggestions, >> >> Thanks again, >> >> >> Adam. >> >> running build >> running config_cc >> unifing config_cc, config, build_clib, build_ext, build commands >> --compiler options >> running config_fc >> unifing config_fc, config, build_clib, build_ext, build commands >> --fcompiler options >> running build_src >> build_src >> building py_modules sources >> creating build >> creating build/src.macosx-10.6-intel-3.2 >> creating build/src.macosx-10.6-intel-3.2/numpy >> creating build/src.macosx-10.6-intel-3.2/numpy/distutils >> building library "npymath" sources >> customize NAGFCompiler >> **Could not locate executable f95 >> customize AbsoftFCompiler >> **Could not locate executable f90 >> **Could not locate executable f77 >> customize IBMFCompiler >> **Could not locate executable xlf90 >> **Could not locate executable xlf >> customize IntelFCompiler >> **Could not locate executable fort >> **Could not locate executable ifc >> customize GnuFCompiler >> **Could not locate executable g77 >> customize Gnu95FCompiler >> **Could not locate executable gfortran >> customize G95FCompiler >> **Could not locate executable g95 >> customize PGroupFCompiler >> **Could not locate executable pgf90 >> **Could not locate executable pgf77 >> **don't know how to compile Fortran code on platform 'posix' >> C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g >> -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 >> -isysroot /Developer/SDKs/MacOSX10.6.sdk >> >> compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/include >> -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' >> gcc-4.2: _configtest.c >> gcc-4.2 _configtest.o -o _configtest >> success! 
>> removing: _configtest.c _configtest.o _configtest >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g >> -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 >> -isysroot /Developer/SDKs/MacOSX10.6.sdk >> >> compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/include >> -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' >> gcc-4.2: _configtest.c >> _configtest.c:1: warning: conflicting types for built-in function ?exp? >> _configtest.c:1: warning: conflicting types for built-in function ?exp? >> gcc-4.2 _configtest.o -o _configtest >> success! >> removing: _configtest.c _configtest.o _configtest >> creating build/src.macosx-10.6-intel-3.2/numpy/core >> creating build/src.macosx-10.6-intel-3.2/numpy/core/src >> creating build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath >> conv_template:> >> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math.c >> conv_template:> >> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/ieee754.c >> conv_template:> >> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math_complex.c >> building extension "numpy.core._sort" sources >> Generating >> build/src.macosx-10.6-intel-3.2/numpy/core/include/numpy/config.h >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g >> -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 >> -isysroot /Developer/SDKs/MacOSX10.6.sdk >> >> compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/include >> -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' >> gcc-4.2: _configtest.c >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from _configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from 
_configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> lipo: can't figure out the architecture type of: >> /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from _configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from _configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> lipo: can't figure out the architecture type of: >> /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out >> failure. >> removing: _configtest.c _configtest.o >> Running from numpy source directory.Traceback (most recent call last): >> File "setup.py", line 196, in <module> >> setup_package() >> File "setup.py", line 189, in setup_package >> configuration=configuration ) >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/core.py", >> line 186, in setup >> return old_setup(**new_attr) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/core.py", >> line 148, in setup >> dist.run_commands() >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", >> line 917, in run_commands >> self.run_command(cmd) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", >> line 936, in run_command >> cmd_obj.run() >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build.py", >> line 37, in run >> old_build.run(self) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/command/build.py", >> line 126, in run >> self.run_command(cmd_name) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/cmd.py", >> line 313, in run_command >> self.distribution.run_command(command) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", >> line 936, in run_command >> cmd_obj.run() >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 152, in run >> self.build_sources() >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 169, in build_sources >> self.build_extension_sources(ext) >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 328, in build_extension_sources >> sources = self.generate_sources(sources, ext) >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 385, in generate_sources >> source = func(extension, build_dir) >> File "numpy/core/setup.py", line 410, in generate_config_h >> moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir) >> File "numpy/core/setup.py", line 41, in check_types >> out = check_types(*a, **kw) >> File "numpy/core/setup.py", line 271, in check_types >> "Cannot compile 'Python.h'. Perhaps you need to "\ >> SystemError: Cannot compile 'Python.h'. 
Perhaps you need to install >> python-dev|python-devel. >> >> >> >> Message: 3 >> Date: Sun, 18 Dec 2011 09:49:00 +0100 >> From: Ralf Gommers <ralf.gommers at googlemail.com> >> Subject: Re: [Numpy-discussion] Problem installing NumPy with Python >> 3.2.2/MacOS X 10.7.2 >> To: Discussion of Numerical Python <numpy-discussion at scipy.org> >> Message-ID: >> <CABL7CQh-+t+6p5z_S_JVycRCB-FWK0Q3wiAWjazgmDp8AJkKqw at mail.gmail.com >>> >> Content-Type: text/plain; charset="iso-8859-1" >> >> On Sat, Dec 17, 2011 at 12:59 PM, McNicol, Adam <amcnicol at longroad.ac.uk >>> wrote: >> >>> ** >>> >>> Hi There, >>> >>> Thanks for the responses. >>> >>> At this point I would settle from just being able to install matplotlib. >>> Even if some of the functionality isn't present currently that is fine. >>> >>> I'm afraid my knowledge of Python falls down about here as well. I >>> installed Python 3.2.2 via the installer from Python.org so I have no >> idea >>> whether Python.h is present or where indeed I would find it or how I >> would >>> add it to the search path. >>> >>> Do I have to install from source or something like that? >>> >> >> No, your Python install should be fine if you just got the dmg installer >> from python.org. I recommend you install the OS X SDKs and distribute ( >> http://pypi.python.org/pypi/distribute), as I said before, and try again >> to >> compile numpy. >> >> Unfortunately you have chosen a difficult combination of OS and Python >> version, so we don't have binary installers you can use (yet). >> >> Ralf >> >> >>> Thanks again, >>> >>> >>> Adam. >>> >>> >>> -----Original Message----- >>> From: McNicol, Adam >>> Sent: Fri 12/16/2011 11:07 PM >>> To: numpy-discussion at scipy.org >>> Subject: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 >>> >>> Hi There, >>> >>> I am very new to numpy and have really only started investigating it as >>> one of my students needs some functionality from matplotlib. I have >> managed >>> to install everything under Windows for work in class but I use a Mac at >>> home and have been struggling all night to get it to build and install. >>> >>> I should mention that I am using Python 3.2.2 both in school and at home >>> and it isn't an option to use Python 2.7 as all of the rest of my class >> is >>> taught in Python 3. I also have the most recent version of Xcode >> installed. >>> >>> I have installed the correct build of gcc-4.2 with Fortran (gcc-4.2 >> (Apple >>> build 5666.3) with GNU Fortran 4.2.4 for Mac OS X 10.7 (Lion)) from >>> http://r.research.att.com/tools/ >>> >>> I then followed the install instructions but the build fails with the >>> following message: >>> >>> File "numpy/core/setup.py", line 271, in check_types >>> "Cannot compile 'Python.h'. Perhaps you need to "\ >>> SystemError: Cannot compile 'Python.h'. Perhaps you need to install >>> python-dev|python-devel. >>> >>> I have got no idea what to do with this error message. Any help would be >>> much appreciated. >>> >>> Kind Regards, >>> >>> >>> Adam. >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: >> http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/0747fb12/attachment-0001.html >> >> ------------------------------ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> End of NumPy-Discussion Digest, Vol 63, Issue 54 >> ************************************************ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/35cbac93/attachment.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 63, Issue 55 > ************************************************ > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/ms-tnef > Size: 13714 bytes > Desc: not available > Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/4ae2f625/attachment.bin > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 63, Issue 56 > ************************************************ From wesmckinn at gmail.com Wed Jan 11 20:27:46 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 11 Jan 2012 20:27:46 -0500 Subject: [Numpy-discussion] Numpy 'groupby' In-Reply-To: <jejtud$abh$1@dough.gmane.org> References: <CABzAe0YuOPkAO9mXfV-Q91EAF3PtwUX77Xk+8nNuC-U1zow6ZA@mail.gmail.com> <jejtud$abh$1@dough.gmane.org> Message-ID: <CAJPUwMBncdJJPWUpijictfthqa_D-c2vu2o3EVt5xgz4f=An5g@mail.gmail.com> On Wed, Jan 11, 2012 at 7:05 AM, Neal Becker <ndbecker2 at gmail.com> wrote: > Michael Hull wrote: > >> Hi Everyone, >> First off, thanks for all your hard work on numpy, its a really great help! >> I was wondering if there was a standard 'groupby' in numpy, that >> similar to that in itertools. >> I know its not hard to write with np.diff, but I have found myself >> writing it on more than a couple of occasions, and wondered if >> ?there was a 'standarised' version I was missing out on?? >> Thanks, >> >> >> Mike > > I've played with groupby in pandas. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I agree (unsurprisingly) that pandas is your best bet: http://pandas.sourceforge.net/groupby.html I've had it on my TODO list to extend the pandas groupby engine (which has grown fairly sophisticated) to work with generic ndarrays and record arrays: https://github.com/wesm/pandas/issues/123 It shouldn't actually be that hard for most simple cases. I could imagine the results of a groupby being somewhat difficult to interpret without axis labeling/indexing, though. 
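For the simple cases, a groupby-style reduction over a plain ndarray can already be pieced together from np.unique and np.bincount. A rough sketch (the two-column layout is just for illustration):

import numpy as np

# first column: group key, second column: value to aggregate
data = np.array([[0, 1.0],
                 [1, 2.0],
                 [0, 3.0],
                 [1, 4.0]])

keys, inverse = np.unique(data[:, 0], return_inverse=True)
sums = np.bincount(inverse, weights=data[:, 1])
counts = np.bincount(inverse)
means = sums / counts        # one group mean per entry of `keys`

With pandas the same thing is essentially DataFrame(data, columns=['key', 'val']).groupby('key').mean().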
cheers, Wes From ivan.oseledets at gmail.com Thu Jan 12 09:21:32 2012 From: ivan.oseledets at gmail.com (Ivan Oseledets) Date: Thu, 12 Jan 2012 18:21:32 +0400 Subject: [Numpy-discussion] Question on F/C-ordering in numpy svd Message-ID: <CANSLWcS4tgUZRTf=TCzdn=-A6kNUnZztHbCyF8s24FpgPx0jiA@mail.gmail.com> Dear all! I quite new to numpy and python. I am a matlab user, my work is mainly on multidimensional arrays, and I have a question on the svd function from numpy.linalg It seems that u,s,v=svd(a,full_matrices=False) returns u and v in the F-contiguous format. That is not in a good agreement with other numpy stuff, where C-ordering is default. For example, matrix multiplication, dot() ignores ordering and returns result always in C-ordering. (which is documented), but the svd feature is not documented. With best wishes, Ivan From langton2 at llnl.gov Thu Jan 12 20:13:41 2012 From: langton2 at llnl.gov (Asher Langton) Date: Thu, 12 Jan 2012 17:13:41 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance Message-ID: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> Hi all, (I originally posted this to the BayPIGgies list, where Fernando Perez suggested I send it to the NumPy list as well. My apologies if you're receiving this email twice.) I work on a Python/C++ scientific code that runs as a number of independent Python processes communicating via MPI. Unfortunately, as some of you may have experienced, module importing does not scale well in Python/MPI applications. For 32k processes on BlueGene/P, importing 100 trivial C-extension modules takes 5.5 hours, compared to 35 minutes for all other interpreter loading and initialization. We developed a simple pure-Python module (based on knee.py, a hierarchical import example) that cuts the import time from 5.5 hours to 6 minutes. The code is available here: https://github.com/langton/MPI_Import Usage, implementation details, and limitations are described in a docstring at the beginning of the file (just after the mandatory legalese). I've talked with a few people who've faced the same problem and heard about a variety of approaches, which range from putting all necessary files in one directory to hacking the interpreter itself so it distributes the module-loading over MPI. Last summer, I had a student intern try a few of these approaches. It turned out that the problem wasn't so much the simultaneous module loads, but rather the huge number of failed open() calls (ENOENT) as the interpreter tries to find the module files. In the MPI_Import module, we have rank 0 perform the module lookups and then broadcast the locations to the rest of the processes. For our real-world scientific applications written in Python and C++, this has meant that we can start a problem and actually make computational progress before the batch allocation ends. If you try out the code, I'd appreciate any feedback you have: performance results, bugfixes/feature-additions, or alternate approaches to solving this problem. Thanks! -Asher From pearu.peterson at gmail.com Fri Jan 13 03:05:55 2012 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Fri, 13 Jan 2012 10:05:55 +0200 Subject: [Numpy-discussion] Question on F/C-ordering in numpy svd In-Reply-To: <CANSLWcS4tgUZRTf=TCzdn=-A6kNUnZztHbCyF8s24FpgPx0jiA@mail.gmail.com> References: <CANSLWcS4tgUZRTf=TCzdn=-A6kNUnZztHbCyF8s24FpgPx0jiA@mail.gmail.com> Message-ID: <4F0FE5E3.2050502@cens.ioc.ee> On 01/12/2012 04:21 PM, Ivan Oseledets wrote: > Dear all! 
> > I quite new to numpy and python. > I am a matlab user, my work is mainly > on multidimensional arrays, and I have a question on the svd function > from numpy.linalg > > It seems that > > u,s,v=svd(a,full_matrices=False) > > returns u and v in the F-contiguous format. The reason for this is that the underlying computational routine is in Fortran (when using system lapack library, for instance) that requires and returns F-contiguous arrays and the current behaviour guarantees the most memory efficient computation of svd. > That is not in a good agreement with other numpy stuff, where > C-ordering is default. > For example, matrix multiplication, dot() ignores ordering and returns > result always in C-ordering. > (which is documented), but the svd feature is not documented. In generic numpy operation, the particular ordering of arrays should not matter as the underlying code should know how to compute array operation results from different input orderings efficiently. This behaviour of svd should be documented. However, one should check that when using the svd from numpy lapack_lite (which is f2c code and could use also C-ordering, in principle), F-contiguous arrays are actually returned. Regards, Pearu From mmueller at python-academy.de Fri Jan 13 05:32:45 2012 From: mmueller at python-academy.de (=?ISO-8859-15?Q?Mike_M=FCller?=) Date: Fri, 13 Jan 2012 11:32:45 +0100 Subject: [Numpy-discussion] Python for Scientists - courses in Germany and US Message-ID: <4F10084D.1030708@python-academy.de> Learn NumPy and Much More ========================= Scientists like Python. If you would like to learn more about important libraries for scientific applications, you might be interested in these courses. The course in Germany covers: - Overview of libraries - NumPy - Data storage with text files, Excel, netCDF and HDF5 - matplotlib - Object oriented programming for scientists - Problem solving session The course in the USA covers all this plus: - Extending Python in other languages - Version control - Unit testing More details below. If you have any questions about the courses, please contact me. Mike Python for Scientists and Engineers (Germany) --------------------------------------------- A three-day course covering all the basic tools scientists and engineers need. This course requires basic Python knowledge. Date: 19.01.-21.01.2012 Location: Leipzig, Germany Trainer: Mike M?ller Course Language: English Link: http://www.python-academy.com/courses/python_course_scientists.html Python for Scientists and Engineers (USA) ----------------------------------------- This is an extend version of our well-received course for scientists and engineers. Five days of intensive training will give you a solid basis for using Python for scientific an technical problems. The course is hosted by David Beazley (http://www.dabeaz.com). 
Date: 27.02.-02.03.2012 Location: Chicago, IL, USA Trainer: Mike M?ller Course Language: English Link: http://www.dabeaz.com/chicago/science.html From ischnell at enthought.com Fri Jan 13 12:57:56 2012 From: ischnell at enthought.com (Ilan Schnell) Date: Fri, 13 Jan 2012 11:57:56 -0600 Subject: [Numpy-discussion] Python for Scientists - courses in Germany and US In-Reply-To: <4F10084D.1030708@python-academy.de> References: <4F10084D.1030708@python-academy.de> Message-ID: <CAAUn5qK1z-4m-oUrBzCUVq6a+7Ntrw7dXUws8G5N3AuorHdXGA@mail.gmail.com> By the way, Enthought is also offering Python training and we have just updated out training calendar for this year: http://www.enthought.com/training/enthought_training_calendar.php We are offering about 20 open Python classes in the US and Europe this year. - Ilan On Fri, Jan 13, 2012 at 4:32 AM, Mike M?ller <mmueller at python-academy.de> wrote: > Learn NumPy and Much More > ========================= > > Scientists like Python. If you would like to learn more about > important libraries for scientific applications, you might be > interested in these courses. > > The course in Germany covers: > > - Overview of libraries > - NumPy > - Data storage with text files, Excel, netCDF and HDF5 > - matplotlib > - Object oriented programming for scientists > - Problem solving session > > The course in the USA covers all this plus: > > - Extending Python in other languages > - Version control > - Unit testing > > > More details below. > > If you have any questions about the courses, please contact me. > > Mike > > > Python for Scientists and Engineers (Germany) > --------------------------------------------- > > A three-day course covering all the basic tools scientists and engineers need. > This course requires basic Python knowledge. > > Date: 19.01.-21.01.2012 > Location: Leipzig, Germany > Trainer: Mike M?ller > Course Language: English > Link: http://www.python-academy.com/courses/python_course_scientists.html > > > Python for Scientists and Engineers (USA) > ----------------------------------------- > > This is an extend version of our well-received course for > scientists and engineers. Five days of intensive training > will give you a solid basis for using Python for scientific > an technical problems. > > The course is hosted by David Beazley (http://www.dabeaz.com). > > Date: 27.02.-02.03.2012 > Location: Chicago, IL, USA > Trainer: Mike M?ller > Course Language: English > Link: http://www.dabeaz.com/chicago/science.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Fri Jan 13 14:41:19 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Jan 2012 20:41:19 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> Message-ID: <4F1088DF.6090201@molden.no> Den 13.01.2012 02:13, skrev Asher Langton: > intern try a few of these approaches. It turned out that the problem > wasn't so much the simultaneous module loads, but rather the huge > number of failed open() calls (ENOENT) as the interpreter tries to > find the module files. It sounds like there is a scalability problem with imp.find_module. I'd report this on python-dev or python-ideas. 
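For anyone who hasn't looked at the code yet: the pattern Asher describes (rank 0 does the filesystem lookup, everyone else just receives the answer) can be sketched in a few lines of mpi4py. This is only an illustration of the idea, not the actual MPI_Import implementation:

from mpi4py import MPI
import imp

comm = MPI.COMM_WORLD

def find_module_path(name):
    # only rank 0 touches the filesystem; top-level module names only
    path = None
    if comm.Get_rank() == 0:
        f, path, desc = imp.find_module(name)
        if f is not None:
            f.close()
    return comm.bcast(path, root=0)

That is one set of open()/stat() calls per module instead of one per process, which is where the ENOENT storm comes from.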
Sturla From robert.kern at gmail.com Fri Jan 13 14:53:44 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Jan 2012 19:53:44 +0000 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F1088DF.6090201@molden.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1088DF.6090201@molden.no> Message-ID: <CAF6FJit110_vLUwThKR5+3EPP3yV_kvQOv4QW4NEAbxz4BWf-w@mail.gmail.com> On Fri, Jan 13, 2012 at 19:41, Sturla Molden <sturla at molden.no> wrote: > Den 13.01.2012 02:13, skrev Asher Langton: >> intern try a few of these approaches. It turned out that the problem >> wasn't so much the simultaneous module loads, but rather the huge >> number of failed open() calls (ENOENT) as the interpreter tries to >> find the module files. > > It sounds like there is a scalability problem with imp.find_module. I'd > report > this on python-dev or python-ideas. It's well-known. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From d.s.seljebotn at astro.uio.no Fri Jan 13 15:19:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Jan 2012 21:19:11 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> Message-ID: <4F1091BF.4020801@astro.uio.no> On 01/13/2012 02:13 AM, Asher Langton wrote: > Hi all, > > (I originally posted this to the BayPIGgies list, where Fernando Perez > suggested I send it to the NumPy list as well. My apologies if you're > receiving this email twice.) > > I work on a Python/C++ scientific code that runs as a number of > independent Python processes communicating via MPI. Unfortunately, as > some of you may have experienced, module importing does not scale well > in Python/MPI applications. For 32k processes on BlueGene/P, importing > 100 trivial C-extension modules takes 5.5 hours, compared to 35 > minutes for all other interpreter loading and initialization. We > developed a simple pure-Python module (based on knee.py, a > hierarchical import example) that cuts the import time from 5.5 hours > to 6 minutes. > > The code is available here: > > https://github.com/langton/MPI_Import > > Usage, implementation details, and limitations are described in a > docstring at the beginning of the file (just after the mandatory > legalese). > > I've talked with a few people who've faced the same problem and heard > about a variety of approaches, which range from putting all necessary > files in one directory to hacking the interpreter itself so it > distributes the module-loading over MPI. Last summer, I had a student > intern try a few of these approaches. It turned out that the problem > wasn't so much the simultaneous module loads, but rather the huge > number of failed open() calls (ENOENT) as the interpreter tries to > find the module files. In the MPI_Import module, we have rank 0 > perform the module lookups and then broadcast the locations to the > rest of the processes. For our real-world scientific applications > written in Python and C++, this has meant that we can start a problem > and actually make computational progress before the batch allocation > ends. This is great news! I've forwarded to the mpi4py mailing list which despairs over this regularly. 
Another idea: Given your diagnostics, wouldn't dumping the output of "find" of every path in sys.path to a single text file work well? Then each node download that file once and consult it when looking for modules, instead of network file metadata. (In fact I think "texhash" does the same for LaTeX?) The disadvantage is that one would need to run "update-python-paths" every time a package is installed to update the text file. But I'm not sure if that that disadvantage is larger than remembering to avoid diverging import paths between nodes; hopefully one could put a reminder to run update-python-paths in the ImportError string. > If you try out the code, I'd appreciate any feedback you have: > performance results, bugfixes/feature-additions, or alternate > approaches to solving this problem. Thanks! I didn't try it myself, but forwarding this from the mpi4py mailing list: """ I'm testing it now and actually running into some funny errors with unittest on Python 2.7 causing infinite recursion. If anyone is able to get this going, and could report successes back to the group, that would be very helpful. """ Dag Sverre From d.s.seljebotn at astro.uio.no Fri Jan 13 15:21:30 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Jan 2012 21:21:30 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F1091BF.4020801@astro.uio.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1091BF.4020801@astro.uio.no> Message-ID: <4F10924A.4080600@astro.uio.no> On 01/13/2012 09:19 PM, Dag Sverre Seljebotn wrote: > On 01/13/2012 02:13 AM, Asher Langton wrote: >> Hi all, >> >> (I originally posted this to the BayPIGgies list, where Fernando Perez >> suggested I send it to the NumPy list as well. My apologies if you're >> receiving this email twice.) >> >> I work on a Python/C++ scientific code that runs as a number of >> independent Python processes communicating via MPI. Unfortunately, as >> some of you may have experienced, module importing does not scale well >> in Python/MPI applications. For 32k processes on BlueGene/P, importing >> 100 trivial C-extension modules takes 5.5 hours, compared to 35 >> minutes for all other interpreter loading and initialization. We >> developed a simple pure-Python module (based on knee.py, a >> hierarchical import example) that cuts the import time from 5.5 hours >> to 6 minutes. >> >> The code is available here: >> >> https://github.com/langton/MPI_Import >> >> Usage, implementation details, and limitations are described in a >> docstring at the beginning of the file (just after the mandatory >> legalese). >> >> I've talked with a few people who've faced the same problem and heard >> about a variety of approaches, which range from putting all necessary >> files in one directory to hacking the interpreter itself so it >> distributes the module-loading over MPI. Last summer, I had a student >> intern try a few of these approaches. It turned out that the problem >> wasn't so much the simultaneous module loads, but rather the huge >> number of failed open() calls (ENOENT) as the interpreter tries to >> find the module files. In the MPI_Import module, we have rank 0 >> perform the module lookups and then broadcast the locations to the >> rest of the processes. For our real-world scientific applications >> written in Python and C++, this has meant that we can start a problem >> and actually make computational progress before the batch allocation >> ends. > > This is great news! 
I've forwarded to the mpi4py mailing list which > despairs over this regularly. > > Another idea: Given your diagnostics, wouldn't dumping the output of > "find" of every path in sys.path to a single text file work well? Then > each node download that file once and consult it when looking for > modules, instead of network file metadata. > > (In fact I think "texhash" does the same for LaTeX?) > > The disadvantage is that one would need to run "update-python-paths" > every time a package is installed to update the text file. But I'm not > sure if that that disadvantage is larger than remembering to avoid > diverging import paths between nodes; hopefully one could put a reminder > to run update-python-paths in the ImportError string. I meant "diverging code paths during imports between nodes".. Dag > > >> If you try out the code, I'd appreciate any feedback you have: >> performance results, bugfixes/feature-additions, or alternate >> approaches to solving this problem. Thanks! > > I didn't try it myself, but forwarding this from the mpi4py mailing list: > > """ > I'm testing it now and actually > running into some funny errors with unittest on Python 2.7 causing > infinite recursion. If anyone is able to get this going, and could > report successes back to the group, that would be very helpful. > """ > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Fri Jan 13 15:38:50 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Jan 2012 21:38:50 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10924A.4080600@astro.uio.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1091BF.4020801@astro.uio.no> <4F10924A.4080600@astro.uio.no> Message-ID: <4F10965A.4040605@molden.no> Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: > Another idea: Given your diagnostics, wouldn't dumping the output of > "find" of every path in sys.path to a single text file work well? It probably would, and would also be less prone to synchronization problems than using an MPI broadcast. Another possibility would be to use a bsddb (or sqlite?) file as a persistent dict for caching the output of imp.find_module. Sturla From langton2 at llnl.gov Fri Jan 13 16:20:23 2012 From: langton2 at llnl.gov (Langton, Asher) Date: Fri, 13 Jan 2012 13:20:23 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10965A.4040605@molden.no> Message-ID: <CB35D7D9.A659%langton2@llnl.gov> On 1/13/12 12:38 PM, Sturla Molden wrote: >Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >> Another idea: Given your diagnostics, wouldn't dumping the output of >> "find" of every path in sys.path to a single text file work well? > >It probably would, and would also be less prone to synchronization >problems than using an MPI broadcast. Another possibility would be to >use a bsddb (or sqlite?) file as a persistent dict for caching the >output of imp.find_module. We tested something along those lines. Tim Kadich, a summer student at LLNL, wrote a module that went through the path and built up a dict of module->location mappings for a subset of module types. My recollection is that it worked well, and as you note, it didn't have the synchronization issues that MPI_Import has. 
We didn't fully implement it, since to handle complicated packages correctly, it looked like we'd either have to re-implement a lot of the internal Python import code or modify the interpreter itself. I don't think that MPI_Import is ultimately the "right" solution, but it shows how easily we can reap significant gains. Two better approaches that come to mind are: 1) Fixing this bottleneck at the interpreter level (pre-computing and caching the locations) 2) More generally, dealing with this as well as other library-loading issues at the system level, perhaps by putting a small disk near a node or small collection of nodes, along with a command to push (broadcast) some portions of the filesystem to these (more-)local disks. Basically, the idea would be to let the user specify those directories or objects that will be accessed by most of the processes and treated as read-only so that those objects can be cached near the node. -Asher From robert.kern at gmail.com Fri Jan 13 16:24:11 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Jan 2012 21:24:11 +0000 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CB35D7D9.A659%langton2@llnl.gov> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> Message-ID: <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> On Fri, Jan 13, 2012 at 21:20, Langton, Asher <langton2 at llnl.gov> wrote: > 2) More generally, dealing with this as well as other library-loading > issues at the system level, perhaps by putting a small disk near a node or > small collection of nodes, along with a command to push (broadcast) some > portions of the filesystem to these (more-)local disks. Basically, the > idea would be to let the user specify those directories or objects that > will be accessed by most of the processes and treated as read-only so that > those objects can be cached near the node. Do these systems have a ramdisk capability? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sturla at molden.no Fri Jan 13 16:42:25 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Jan 2012 22:42:25 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> Message-ID: <4F10A541.60305@molden.no> Den 13.01.2012 22:24, skrev Robert Kern: > Do these systems have a ramdisk capability? I assume you have seen this as well :) http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf Sturla From travis at continuum.io Fri Jan 13 16:48:51 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 13 Jan 2012 15:48:51 -0600 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10965A.4040605@molden.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1091BF.4020801@astro.uio.no> <4F10924A.4080600@astro.uio.no> <4F10965A.4040605@molden.no> Message-ID: <88C67BE4-5FE5-45FC-AD3C-540F269B6D50@continuum.io> It is a straightforward thing to implement a "registry mechanism" for Python that by-passes imp.find_module (i.e. using sys.meta_path). 
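In outline, such a hook is just an object on sys.meta_path with find_module/load_module methods plus a dict mapping module names to file locations. A sketch for plain .py modules (the registry contents below are made up; a real importer would also have to handle packages and extension modules):

import imp
import sys

class RegistryImporter(object):
    """Resolve imports from a precomputed {name: filename} map
    instead of scanning every directory on sys.path."""

    def __init__(self, registry):
        self.registry = registry

    def find_module(self, fullname, path=None):
        return self if fullname in self.registry else None

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        filename = self.registry[fullname]
        with open(filename, 'U') as f:
            return imp.load_module(fullname, f, filename,
                                   ('.py', 'U', imp.PY_SOURCE))

# hypothetical registry, e.g. read once from a file pushed to every node
sys.meta_path.insert(0, RegistryImporter({'mymodule': '/shared/site/mymodule.py'}))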
You could imagine creating the registry file for a package or distribution (much like Dag described) and push that to every node during distribution. The registry file would have the map between package_name : file_location which would avoid all the failed open calls. You would need to keep the registry updated as Dag describes, but this seems like a fairly simple approach that should help. -Travis On Jan 13, 2012, at 2:38 PM, Sturla Molden wrote: > Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >> Another idea: Given your diagnostics, wouldn't dumping the output of >> "find" of every path in sys.path to a single text file work well? > > It probably would, and would also be less prone to synchronization > problems than using an MPI broadcast. Another possibility would be to > use a bsddb (or sqlite?) file as a persistent dict for caching the > output of imp.find_module. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From langton2 at llnl.gov Fri Jan 13 16:52:29 2012 From: langton2 at llnl.gov (Langton, Asher) Date: Fri, 13 Jan 2012 13:52:29 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> Message-ID: <CB35E3B6.A6A5%langton2@llnl.gov> On 1/13/12 1:24 PM, Robert Kern wrote: >On Fri, Jan 13, 2012 at 21:20, Langton, Asher <langton2 at llnl.gov> wrote: > >> 2) More generally, dealing with this as well as other library-loading >> issues at the system level, perhaps by putting a small disk near a node >>or >> small collection of nodes, along with a command to push (broadcast) some >> portions of the filesystem to these (more-)local disks. Basically, the >> idea would be to let the user specify those directories or objects that >> will be accessed by most of the processes and treated as read-only so >>that >> those objects can be cached near the node. > >Do these systems have a ramdisk capability? That was another thing we looked at (but didn't implement): broadcasting the modules to each node and putting them in a ramdisk. The drawback (for us) is that we're already struggling with the amount of available memory per core, and according to the vendors, the situation will only get worse on future systems. The ramdisk approach might work well when there are lots of small objects that will be accessed. On 1/13/12 1:42 PM, Sturla Molden wrote: >Den 13.01.2012 22:24, skrev Robert Kern: >>Do these systems have a ramdisk capability? > >I assume you have seen this as well :) > >http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final >.pdf I hadn't. Thanks! -Asher From jlounds at dynamiteinc.com Fri Jan 13 16:56:59 2012 From: jlounds at dynamiteinc.com (Jeremy Lounds) Date: Fri, 13 Jan 2012 16:56:59 -0500 Subject: [Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude Message-ID: <20120113165659.25556@web003.nyc1.bluetie.com> Hello, I am looking for some help extracting a subset of data from a large dataset. The data is being read from a wgrib2 (World Meterological Organization standard gridded data) using the pygrib library. The data values, latitudes and longitudes are in separate lists (arrays?), and I would like a regional subset. The budget is not very large, but I am hoping that this is pretty simple job. I am just way too green at Python / numpy to know how to proceed, or even what to search for on Google. 
If interested, please e-mail jlounds at dynamiteinc.com Thank you! Jeremy Lounds DynamiteInc.com 1-877-762-7723, ext 711 Fax: 877-202-3014 From d.s.seljebotn at astro.uio.no Fri Jan 13 16:58:56 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Jan 2012 22:58:56 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CB35D7D9.A659%langton2@llnl.gov> References: <CB35D7D9.A659%langton2@llnl.gov> Message-ID: <4F10A920.1050109@astro.uio.no> On 01/13/2012 10:20 PM, Langton, Asher wrote: > On 1/13/12 12:38 PM, Sturla Molden wrote: >> Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >>> Another idea: Given your diagnostics, wouldn't dumping the output of >>> "find" of every path in sys.path to a single text file work well? >> >> It probably would, and would also be less prone to synchronization >> problems than using an MPI broadcast. Another possibility would be to >> use a bsddb (or sqlite?) file as a persistent dict for caching the >> output of imp.find_module. > > We tested something along those lines. Tim Kadich, a summer student at > LLNL, wrote a module that went through the path and built up a dict of > module->location mappings for a subset of module types. My recollection is > that it worked well, and as you note, it didn't have the synchronization > issues that MPI_Import has. We didn't fully implement it, since to handle > complicated packages correctly, it looked like we'd either have to > re-implement a lot of the internal Python import code or modify the > interpreter itself. I don't think that MPI_Import is ultimately the > "right" solution, but it shows how easily we can reap significant gains. > Two better approaches that come to mind are: It's actually not too difficult to do something like LD_PRELOAD=myhack.so python something.py and have myhack.so intercept the filesystem calls Python makes (to libc) and do whatever it wants. That's a solution that doesn't interfer with how Python does its imports at all, it simply changes how Python perceives the world around it ("emulation", though much, much lighter). It does require some low-level C code, but there are several examples on the net. I know Ondrej Certik just implemented something similar. Note, I'm just brainstorming here and recording possible (and perhaps impossible) ideas in this thread -- the solution you have found is indeed a great step forward! Dag Sverre > > 1) Fixing this bottleneck at the interpreter level (pre-computing and > caching the locations) > > 2) More generally, dealing with this as well as other library-loading > issues at the system level, perhaps by putting a small disk near a node or > small collection of nodes, along with a command to push (broadcast) some > portions of the filesystem to these (more-)local disks. Basically, the > idea would be to let the user specify those directories or objects that > will be accessed by most of the processes and treated as read-only so that > those objects can be cached near the node. 
> > -Asher > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From langton2 at llnl.gov Fri Jan 13 17:09:56 2012 From: langton2 at llnl.gov (Langton, Asher) Date: Fri, 13 Jan 2012 14:09:56 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10A920.1050109@astro.uio.no> Message-ID: <CB35E9D3.A6D1%langton2@llnl.gov> On 1/13/12 1:58 PM, Dag Sverre Seljebotn wrote: > >It's actually not too difficult to do something like > >LD_PRELOAD=myhack.so python something.py > >and have myhack.so intercept the filesystem calls Python makes (to libc) >and do whatever it wants. That's a solution that doesn't interfer with >how Python does its imports at all, it simply changes how Python >perceives the world around it ("emulation", though much, much lighter). > >It does require some low-level C code, but there are several examples on >the net. I know Ondrej Certik just implemented something similar. One of my colleagues suggested the LD_PRELOAD trick. I asked around here at LLNL, and I seem to recall hearing that the LD_PRELOAD trick didn't work on BlueGene/P, which is where the import bottleneck is the worst. That might have been incorrect though, since LD_PRELOAD is mentioned on Argonne's BG/P wiki. I'll have to look into this some more. -Asher From robert.kern at gmail.com Fri Jan 13 17:11:07 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Jan 2012 22:11:07 +0000 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10A541.60305@molden.no> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> <4F10A541.60305@molden.no> Message-ID: <CAF6FJistOx6srWumnetxNw7usa5TtChZz9fEwiC=zc7GHRHgwg@mail.gmail.com> On Fri, Jan 13, 2012 at 21:42, Sturla Molden <sturla at molden.no> wrote: > Den 13.01.2012 22:24, skrev Robert Kern: >> Do these systems have a ramdisk capability? > > I assume you have seen this as well :) > > http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf I hadn't, actually! Good find! Actually, this same problem came up at the last SciPy conference from several people (Blue Genes are more common than I expected!), and the ramdisk was just my first idea. I'm glad people have evaluated it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From chaoyuejoy at gmail.com Fri Jan 13 17:31:58 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 13 Jan 2012 23:31:58 +0100 Subject: [Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude In-Reply-To: <20120113165659.25556@web003.nyc1.bluetie.com> References: <20120113165659.25556@web003.nyc1.bluetie.com> Message-ID: <CAAN-aREKAz4wC9Y9rGZJD2J1bw0Z80gKKaKQVYtYN46XW9TMAw@mail.gmail.com> Hi, I don't know if numpy has ready tool for this. I also have this use in my study. So I write a simple code for my personal use. It might no be great. I hope others can also respond as this is very basic function in earth data analysis. 
############################################################3 import numpy as np lat=np.arange(89.75,-90,-0.5) lon=np.arange(-179.75,180,0.5) lon0,lat0=np.meshgrid(lon,lat) #crate the grid from demonstration def Get_GridValue(data,(vlat1,vlat2),(vlon1,vlon2)): index_lat=np.nonzero((lat[:]>=vlat1)&(lat[:]<=vlat2))[0] index_lon=np.nonzero((lon[:]>=vlon1)&(lon[:]<=vlon2))[0] target=data[...,index_lat[0]:index_lat[-1]+1,index_lon[0]:index_lon[-1]+1] return target Get_GridValue(lat0,(40,45),(-30,-25)) Get_GridValue(lon0,(40,45),(-30,-25)) ############################################################ Chao 2012/1/13 Jeremy Lounds <jlounds at dynamiteinc.com> > Hello, > > I am looking for some help extracting a subset of data from a large > dataset. The data is being read from a wgrib2 (World Meterological > Organization standard gridded data) using the pygrib library. > > The data values, latitudes and longitudes are in separate lists (arrays?), > and I would like a regional subset. > > The budget is not very large, but I am hoping that this is pretty simple > job. I am just way too green at Python / numpy to know how to proceed, or > even what to search for on Google. > > If interested, please e-mail jlounds at dynamiteinc.com > > Thank you! > > Jeremy Lounds > DynamiteInc.com > 1-877-762-7723, ext 711 > Fax: 877-202-3014 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120113/4a1816b4/attachment.html> From sturla at molden.no Fri Jan 13 18:28:39 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 14 Jan 2012 00:28:39 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10A541.60305@molden.no> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> <4F10A541.60305@molden.no> Message-ID: <4F10BE27.6010601@molden.no> Den 13.01.2012 22:42, skrev Sturla Molden: > Den 13.01.2012 22:24, skrev Robert Kern: >> Do these systems have a ramdisk capability? > I assume you have seen this as well :) > > http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf > This paper also repeats a common mistake about the GIL: "A future challenge is the increasing number of CPU cores per node, which is normally addressed by hybrid thread and message passing based parallelization. Whereas message passing can be used transparently by both on Python and C level, the global interpreter lock in CPython limits the thread based parallelization to the C-extensions only. We are currently investigating hybrid OpenMP/MPI implementation with the hope that limiting threading to only C-extension provides enough performance." This is NOT true. Python threads are native OS threads. They can be used for parallel computing on multi-core CPUs. The only requirement is that the Python code calls a C extension that releases the GIL. 
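A minimal illustration with plain threading.Thread, assuming the heavy call in the worker releases the GIL (np.dot typically does when NumPy is linked against an optimized BLAS; any other GIL-releasing extension call works the same way):

import threading
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
results = [None, None]

def worker(i):
    # the heavy lifting happens inside the BLAS call, with the GIL released
    results[i] = np.dot(a, b)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()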
We can use threads in C or Python code: OpenMP and threading.Thread perform equally well, but if we use threading.Thread the GIL must be released for parallel execution. OpenMP is typically better for fine-grained parallelism in C code and threading.Thread is better for course-grained parallelism in Python code. The latter is also where mpi4py and multiprocessing can be used. Sturla From d.s.seljebotn at astro.uio.no Sat Jan 14 02:21:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Jan 2012 08:21:32 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10BE27.6010601@molden.no> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> <4F10A541.60305@molden.no> <4F10BE27.6010601@molden.no> Message-ID: <4F112CFC.307@astro.uio.no> On 01/14/2012 12:28 AM, Sturla Molden wrote: > Den 13.01.2012 22:42, skrev Sturla Molden: >> Den 13.01.2012 22:24, skrev Robert Kern: >>> Do these systems have a ramdisk capability? >> I assume you have seen this as well :) >> >> http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf >> > > This paper also repeats a common mistake about the GIL: > > "A future challenge is the increasing number of CPU cores per node, > which is normally addressed by hybrid thread and message passing based > parallelization. Whereas message passing can be used transparently by > both on Python and C level, the global interpreter lock in CPython > limits the thread based parallelization to the C-extensions only. We are > currently investigating hybrid OpenMP/MPI implementation with the hope > that limiting threading to only C-extension provides enough performance." > > This is NOT true. > > Python threads are native OS threads. They can be used for parallel > computing on multi-core CPUs. The only requirement is that the Python > code calls a C extension that releases the GIL. We can use threads in C > or Python code: OpenMP and threading.Thread perform equally well, but if > we use threading.Thread the GIL must be released for parallel execution. > OpenMP is typically better for fine-grained parallelism in C code and > threading.Thread is better for course-grained parallelism in Python > code. The latter is also where mpi4py and multiprocessing can be used. I don't see how you contradict their statement. The only code that can run without the GIL is in C-extensions (even if it is written in, say, Cython). Dag Sverre From totonixsame at gmail.com Sat Jan 14 15:52:55 2012 From: totonixsame at gmail.com (Thiago Franco de Moraes) Date: Sat, 14 Jan 2012 18:52:55 -0200 Subject: [Numpy-discussion] Calculating density based on distance Message-ID: <4F11EB27.2020909@gmail.com> Hi all, I have the following problem: Given a array with dimension Nx3, where N is generally greater than 1.000.000, for each item in this array I have to calculate its density, Where its density is the number of items from the same array with distance less than a given r. The items are the rows from the array. I was not able to think a solution to this using one or two functions of Numpy. Then I wrote this code http://pastebin.com/iQV0bMNy . The problem it so slow. So I tried to implement it in Cython, here the result http://pastebin.com/zTywzjyM , but it is very slow yet. Is there a better and faster way of doing that? Is there something in my Cython implementation I can do to perform better? Thanks! 
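Both replies below point to scipy.spatial, and with a kd-tree the density count collapses to a few lines. A sketch, assuming a SciPy recent enough that cKDTree has query_ball_point (older releases only provide it on the slower pure-Python KDTree):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1000, 3)   # stand-in for the real Nx3 array
r = 0.05

tree = cKDTree(points)
neighbours = tree.query_ball_point(points, r)
# each entry lists the points within r, including the point itself
density = np.array([len(idx) - 1 for idx in neighbours])

This runs in O(N log N) instead of the O(N**2) pairwise loop, which is what makes it feasible for N above a million.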
From ben.root at ou.edu Sat Jan 14 16:07:02 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 14 Jan 2012 15:07:02 -0600 Subject: [Numpy-discussion] Calculating density based on distance In-Reply-To: <4F11EB27.2020909@gmail.com> References: <4F11EB27.2020909@gmail.com> Message-ID: <CANNq6FkH8R7NBbs4aeczTsf38JA3XzFLD4_EHZucXMiUHYU+Mw@mail.gmail.com> On Saturday, January 14, 2012, Thiago Franco de Moraes < totonixsame at gmail.com> wrote: > Hi all, > > I have the following problem: > > Given a array with dimension Nx3, where N is generally greater than > 1.000.000, for each item in this array I have to calculate its density, > Where its density is the number of items from the same array with > distance less than a given r. The items are the rows from the array. > > I was not able to think a solution to this using one or two functions of > Numpy. Then I wrote this code http://pastebin.com/iQV0bMNy . The problem > it so slow. So I tried to implement it in Cython, here the result > http://pastebin.com/zTywzjyM , but it is very slow yet. > > Is there a better and faster way of doing that? Is there something in my > Cython implementation I can do to perform better? > > Thanks! Have you looked at scipy.spatial.KDTree? It can efficiently load up a data structure that lets you easily determine the spatial relationship between datapoints. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/bc7f2737/attachment.html> From sturla at molden.no Sat Jan 14 16:21:48 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 14 Jan 2012 22:21:48 +0100 Subject: [Numpy-discussion] Calculating density based on distance In-Reply-To: <4F11EB27.2020909@gmail.com> References: <4F11EB27.2020909@gmail.com> Message-ID: <4F11F1EC.5020704@molden.no> Den 14.01.2012 21:52, skrev Thiago Franco de Moraes: > Is there a better and faster way of doing that? Is there something in my > Cython implementation I can do to perform better? > > You need to use a kd-tree to make the computation run in O(n log n) time instead of O(n**2). scipy.spatial.cKDTree is very fast. Sturla From charlesr.harris at gmail.com Sat Jan 14 17:12:15 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 15:12:15 -0700 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. Message-ID: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> This sort of makes sense, but is it the 'correct' behavior? In [20]: zeros(2, 'S') Out[20]: array(['', ''], dtype='|S1') It might be more consistent to return '0' instead, as in In [3]: zeros(2, int).astype('S') Out[3]: array(['0', '0'], dtype='|S24') Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/35805be6/attachment.html> From ben.root at ou.edu Sat Jan 14 17:16:28 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 14 Jan 2012 16:16:28 -0600 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> Message-ID: <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris <charlesr.harris at gmail.com > wrote: > This sort of makes sense, but is it the 'correct' behavior? 
> > In [20]: zeros(2, 'S') > Out[20]: > array(['', ''], > dtype='|S1') > > It might be more consistent to return '0' instead, as in > > In [3]: zeros(2, int).astype('S') > Out[3]: > array(['0', '0'], > dtype='|S24') > > Chuck > > Whatever it should be, numpy is currently inconsistent: >>> np.empty(2, 'S') array(['0', '\xd4'], dtype='|S1') >>> np.zeros(2, 'S') array(['', ''], dtype='|S1') >>> np.ones(2, 'S') array(['1', '1'], dtype='|S1') I would expect '0''s for the call to zeros() and empty strings for the call to empty(). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/c9906a01/attachment.html> From ben.root at ou.edu Sat Jan 14 17:25:05 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 14 Jan 2012 16:25:05 -0600 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> Message-ID: <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: > On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> This sort of makes sense, but is it the 'correct' behavior? >> >> In [20]: zeros(2, 'S') >> Out[20]: >> array(['', ''], >> dtype='|S1') >> >> It might be more consistent to return '0' instead, as in >> >> In [3]: zeros(2, int).astype('S') >> Out[3]: >> array(['0', '0'], >> dtype='|S24') >> >> Chuck >> >> > Whatever it should be, numpy is currently inconsistent: > > >>> np.empty(2, 'S') > array(['0', '\xd4'], > dtype='|S1') > >>> np.zeros(2, 'S') > > array(['', ''], > dtype='|S1') > >>> np.ones(2, 'S') > array(['1', '1'], > dtype='|S1') > > I would expect '0''s for the call to zeros() and empty strings for the > call to empty(). > > Ben Root > > On the other hand, it is fairly standard to assume that the values in the array returned by empty() to be random, uninitialized junk. So, maybe empty()'s current behavior is ok, but certainly zeros()'s and ones()'s behaviors need to be looked at. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/4cd78e7a/attachment.html> From charlesr.harris at gmail.com Sat Jan 14 17:31:07 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 15:31:07 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 Message-ID: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> I've put up a pull request for a fix to ticket #1973. Currently the fix simply propagates the maskna flag when the *.astype method is called. A more complicated option would be to add a maskna keyword to specify whether the output is masked or not or propagates the type of the source, but that seems overly complex to me. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/28bfe98c/attachment.html> From nathan.faggian at gmail.com Sat Jan 14 18:53:52 2012 From: nathan.faggian at gmail.com (Nathan Faggian) Date: Sun, 15 Jan 2012 10:53:52 +1100 Subject: [Numpy-discussion] Negative indexing. 
Message-ID: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Hi, I am finding it less than useful to have the negative index wrapping on nd-arrays. Here is a short example: import numpy as np a = np.zeros((3, 3)) a[:,2] = 1000 print a[0,-1] print a[0,-1] print a[-1,-1] In all cases 1000 is printed out. What I am after is a way to say "please don't wrap around" and have negative indices behave in a way I choose. I know this is a standard thing - but is there a way to override that behaviour that doesn't involve cython or rolling my own resampler? Kind Regards, Nathan. From josef.pktd at gmail.com Sat Jan 14 19:21:55 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 14 Jan 2012 19:21:55 -0500 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> Message-ID: <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> On Sat, Jan 14, 2012 at 5:25 PM, Benjamin Root <ben.root at ou.edu> wrote: > On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: >> >> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris >> <charlesr.harris at gmail.com> wrote: >>> >>> This sort of makes sense, but is it the 'correct' behavior? >>> >>> In [20]: zeros(2, 'S') >>> Out[20]: >>> array(['', ''], >>> ????? dtype='|S1') >>> >>> It might be more consistent to return '0' instead, as in >>> >>> In [3]: zeros(2, int).astype('S') >>> Out[3]: >>> array(['0', '0'], >>> ????? dtype='|S24') I would be surprised if zeros is not an empty string, since an empty string is the "zero" for string addition. multiplication for strings doesn't exist, so ones can be anything even literally '1' >>> a = np.zeros(5,'S4') >>> a[:] = 'b' >>> reduce(lambda x,y: x+y, a) 'bbbbb' >>> a = np.zeros(1,'S100') >>> for i in range(5): a[:] = a.item() + 'a' ... >>> a array(['aaaaa'], dtype='|S100') just as a logical argument, I have no idea what's practical since last time I tried to use numpy strings, I didn't find string addition and went back to double and triple list comprehension. Josef >>> >>> Chuck >>> >> >> Whatever it should be, numpy is currently inconsistent: >> >> >>> np.empty(2, 'S') >> array(['0', '\xd4'], >> ????? dtype='|S1') >> >>> np.zeros(2, 'S') >> >> array(['', ''], >> ????? dtype='|S1') >> >>> np.ones(2, 'S') >> array(['1', '1'], >> ????? dtype='|S1') >> >> I would expect '0''s for the call to zeros() and empty strings for the >> call to empty(). >> >> Ben Root >> > > On the other hand, it is fairly standard to assume that the values in the > array returned by empty() to be random, uninitialized junk.? So, maybe > empty()'s current behavior is ok, but certainly zeros()'s and ones()'s > behaviors need to be looked at. 
> > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat Jan 14 21:02:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 19:02:49 -0700 Subject: [Numpy-discussion] GSOC In-Reply-To: <CABL7CQhV5kd7JFrYswK=neDqpdd4qWEDm+FqbnZdZgXf7HMzZw@mail.gmail.com> References: <CAB6mnxLYF65T+hwYJSm6-PYOeMitUM6qZcM2WjgjvTsa_yL9ZA@mail.gmail.com> <CABL7CQhV5kd7JFrYswK=neDqpdd4qWEDm+FqbnZdZgXf7HMzZw@mail.gmail.com> Message-ID: <CAB6mnx+qf4F0awbG67D-7JKio_r4mhuL9nfLWq5i8ujueW=YPA@mail.gmail.com> On Thu, Dec 29, 2011 at 2:36 PM, Ralf Gommers <ralf.gommers at googlemail.com>wrote: > > > On Thu, Dec 29, 2011 at 9:50 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> I thought I'd raise this topic just to get some ideas out there. At the >> moment I see two areas that I'd like to see addressed. >> >> >> 1. Documentation editor. This would involve looking at the generated >> documentation and it's organization/coverage as well such things as style >> and maybe reviewing stuff on the documentation site. This would be more >> technical writing than coding. >> 2. Test coverage. There are a lot of areas of numpy that are not well >> tested as well as some tests that are still doc tests and should probably >> be updated. This is a substantial amount of work and would require some >> familiarity with numpy as well as a willingness to ping developers for >> clarification of some topics. >> >> Thoughts? >> > First thought: very useful, but probably not GSOC topics by themselves. > > For a very good student, I'd think topics like implementing NA bit masks > or improved user-defined dtypes would be interesting. In SciPy there's also > a lot to do, and that's probably a better project for students who prefer > to work in Python. > > Besides NA bit masks, the new iterator isn't used in a lot of places it could be. Maybe replacing all uses of the old iterator? I'll admit, that smacks more of maintenance than developing new code and might be a hard sell. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/87af1221/attachment.html> From totonixsame at gmail.com Sat Jan 14 21:23:47 2012 From: totonixsame at gmail.com (Thiago Franco Moraes) Date: Sun, 15 Jan 2012 00:23:47 -0200 Subject: [Numpy-discussion] Calculating density based on distance In-Reply-To: <CANNq6FkH8R7NBbs4aeczTsf38JA3XzFLD4_EHZucXMiUHYU+Mw@mail.gmail.com> References: <4F11EB27.2020909@gmail.com> <CANNq6FkH8R7NBbs4aeczTsf38JA3XzFLD4_EHZucXMiUHYU+Mw@mail.gmail.com> Message-ID: <CAMmoLX8VqnaUaD-DqyO4mKXAWywsJDOt-x28-yWddUtrpt7ZnA@mail.gmail.com> No dia S?bado, 14 de Janeiro de 2012, Benjamin Rootben.root at ou.edu escreveu: > > > On Saturday, January 14, 2012, Thiago Franco de Moraes < totonixsame at gmail.com> wrote: >> Hi all, >> >> I have the following problem: >> >> Given a array with dimension Nx3, where N is generally greater than >> 1.000.000, for each item in this array I have to calculate its density, >> Where its density is the number of items from the same array with >> distance less than a given r. The items are the rows from the array. >> >> I was not able to think a solution to this using one or two functions of >> Numpy. Then I wrote this code http://pastebin.com/iQV0bMNy . The problem >> it so slow. 
So I tried to implement it in Cython, here the result >> http://pastebin.com/zTywzjyM , but it is very slow yet. >> >> Is there a better and faster way of doing that? Is there something in my >> Cython implementation I can do to perform better? >> >> Thanks! > > Have you looked at scipy.spatial.KDTree? It can efficiently load up a data structure that lets you easily determine the spatial relationship between datapoints. > > Ben Root Thanks, Ben, I'm going to do that. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/f89691f3/attachment.html> From charlesr.harris at gmail.com Sat Jan 14 23:01:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 21:01:09 -0700 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> Message-ID: <CAB6mnxKDM3OTbsCfR6OszA7BncZB8K9v4oiv6t6L6On=zdgzuQ@mail.gmail.com> On Sat, Jan 14, 2012 at 5:21 PM, <josef.pktd at gmail.com> wrote: > On Sat, Jan 14, 2012 at 5:25 PM, Benjamin Root <ben.root at ou.edu> wrote: > > On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: > >> > >> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris > >> <charlesr.harris at gmail.com> wrote: > >>> > >>> This sort of makes sense, but is it the 'correct' behavior? > >>> > >>> In [20]: zeros(2, 'S') > >>> Out[20]: > >>> array(['', ''], > >>> dtype='|S1') > >>> > >>> It might be more consistent to return '0' instead, as in > >>> > >>> In [3]: zeros(2, int).astype('S') > >>> Out[3]: > >>> array(['0', '0'], > >>> dtype='|S24') > > > > I would be surprised if zeros is not an empty string, since an empty > string is the "zero" for string addition. > multiplication for strings doesn't exist, so ones can be anything even > literally '1' > > >>> a = np.zeros(5,'S4') > >>> a[:] = 'b' > >>> reduce(lambda x,y: x+y, a) > 'bbbbb' > > > >>> a = np.zeros(1,'S100') > >>> for i in range(5): a[:] = a.item() + 'a' > ... > >>> a > array(['aaaaa'], > dtype='|S100') > > > just as a logical argument, I have no idea what's practical since last > time I tried to use numpy strings, I didn't find string addition and > went back to double and triple list comprehension. > > I don't think it was quite so cleverly reasoned out ;) The functions works as expected for object arrays, but that is the only exception. For all other types the allocated space is simply filled with zero bytes. Too bad this isn't done in python like ones, it would be easier to fix. <snip> Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/b8784270/attachment.html> From paul.anton.letnes at gmail.com Sun Jan 15 02:39:50 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 15 Jan 2012 08:39:50 +0100 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. 
In-Reply-To: <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> Message-ID: <20CF3C33-B239-4AE0-BE93-AC02E637956E@gmail.com> On 15. jan. 2012, at 01:21, josef.pktd at gmail.com wrote: > On Sat, Jan 14, 2012 at 5:25 PM, Benjamin Root <ben.root at ou.edu> wrote: >> On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: >>> >>> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris >>> <charlesr.harris at gmail.com> wrote: >>>> >>>> This sort of makes sense, but is it the 'correct' behavior? >>>> >>>> In [20]: zeros(2, 'S') >>>> Out[20]: >>>> array(['', ''], >>>> dtype='|S1') >>>> >>>> It might be more consistent to return '0' instead, as in >>>> >>>> In [3]: zeros(2, int).astype('S') >>>> Out[3]: >>>> array(['0', '0'], >>>> dtype='|S24') > > > > I would be surprised if zeros is not an empty string, since an empty > string is the "zero" for string addition. > multiplication for strings doesn't exist, so ones can be anything even > literally '1' My python disagrees. In [1]: 2 * 'spam ham ' Out[1]: 'spam ham spam ham ' Not sure what the element-wise numpy array equivalent would be, though. Paul From njs at pobox.com Sun Jan 15 03:15:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 15 Jan 2012 00:15:41 -0800 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> Message-ID: <CAPJVwBmrhpOAFX9BGNOarfX+BZHJsH8BRRKpqd-xo5mzs3Qckw@mail.gmail.com> On Sat, Jan 14, 2012 at 2:12 PM, Charles R Harris <charlesr.harris at gmail.com> wrote: > This sort of makes sense, but is it the 'correct' behavior? > > In [20]: zeros(2, 'S') > Out[20]: > array(['', ''], > ????? dtype='|S1') I think of numpy strings as raw fixed-length byte arrays (since, well, that's what they are), so I would expect np.zeros to return all-NUL strings, like it does. (Not just 'empty' strings, which just means the first byte is NUL -- I expect all-NUL.) Maybe I've spent too much time working with C data structures, but that's my $0.02 :-) -- Nathaniel From daniele at grinta.net Sun Jan 15 07:30:33 2012 From: daniele at grinta.net (Daniele Nicolodi) Date: Sun, 15 Jan 2012 13:30:33 +0100 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Message-ID: <4F12C6E9.6070804@grinta.net> On 15/01/12 00:53, Nathan Faggian wrote: > Hi, > > I am finding it less than useful to have the negative index wrapping > on nd-arrays. Here is a short example: > > import numpy as np > a = np.zeros((3, 3)) > a[:,2] = 1000 > print a[0,-1] > print a[0,-1] > print a[-1,-1] > > In all cases 1000 is printed out. What else would you expect? > What I am after is a way to say "please don't wrap around" and have > negative indices behave in a way I choose. I know this is a standard > thing - but is there a way to override that behaviour that doesn't > involve cython or rolling my own resampler? What other behavior would you choose? I don't see any other that would make sense and that would be consistent with positive indexing. 
Cheers, -- Daniele From cournape at gmail.com Sun Jan 15 07:54:59 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 15 Jan 2012 12:54:59 +0000 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Message-ID: <CAGY4rcV-33B5WJQGRcUm41k-1jPYEXorKt+oQzU_hJqWuVFP0Q@mail.gmail.com> On Sat, Jan 14, 2012 at 11:53 PM, Nathan Faggian <nathan.faggian at gmail.com> wrote: > Hi, > > I am finding it less than useful to have the negative index wrapping on nd-arrays. Here is a short example: > > import numpy as np > a = np.zeros((3, 3)) > a[:,2] = 1000 > print a[0,-1] > print a[0,-1] > print a[-1,-1] > > In all cases 1000 is printed out. > > What I am after is a way to say "please don't wrap around" and have negative indices behave in a way I choose. ?I know this is a standard thing - but is there a way to override that behaviour that doesn't involve cython or rolling my own resampler? Although it could be possible with lots of work, it would most likely be a bad idea. You will need to wrap something around your model/data/etc... Could you explain a bit more what you have in mind ? David From josef.pktd at gmail.com Sun Jan 15 08:00:57 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 15 Jan 2012 08:00:57 -0500 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAPJVwBmrhpOAFX9BGNOarfX+BZHJsH8BRRKpqd-xo5mzs3Qckw@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CAPJVwBmrhpOAFX9BGNOarfX+BZHJsH8BRRKpqd-xo5mzs3Qckw@mail.gmail.com> Message-ID: <CAMMTP+COp7xE-ffwyYw7xjcwjAjomjqVZqqbtk8u=+v3hABjVQ@mail.gmail.com> On Sun, Jan 15, 2012 at 3:15 AM, Nathaniel Smith <njs at pobox.com> wrote: > On Sat, Jan 14, 2012 at 2:12 PM, Charles R Harris > <charlesr.harris at gmail.com> wrote: >> This sort of makes sense, but is it the 'correct' behavior? >> >> In [20]: zeros(2, 'S') >> Out[20]: >> array(['', ''], >> ????? dtype='|S1') > > I think of numpy strings as raw fixed-length byte arrays (since, well, > that's what they are), so I would expect np.zeros to return all-NUL > strings, like it does. (Not just 'empty' strings, which just means the > first byte is NUL -- I expect all-NUL.) Maybe I've spent too much time > working with C data structures, but that's my $0.02 :-) Since I'm not coding in C: can a fixed-length empty string, '', be represented as only first byte is NUL? The following with the current behavior looks all reasonable to me >>> np.zeros(2).view('S4') array(['', '', '', ''], dtype='|S4') >>> np.zeros(4, 'S4').view(float) array([ 0., 0.]) >>> np.zeros(4, 'S4').view(int) array([0, 0, 0, 0]) >>> np.zeros(4, 'S4').view('S16') array([''], dtype='|S16') np.zeros(2, float).view('S4') array(['', '', '', ''], dtype='|S4') instead of astype >>> np.zeros(2, float).astype('S4') array(['0.0', '0.0'], dtype='|S4') my 2c (with trying to understand what's the question) Josef > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From apo at pdauf.de Sun Jan 15 10:45:48 2012 From: apo at pdauf.de (apo at pdauf.de) Date: Sun, 15 Jan 2012 16:45:48 +0100 (CET) Subject: [Numpy-discussion] Counting the Colors of RGB-Image Message-ID: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de> An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/0310a1ac/attachment.html> -------------- next part -------------- Counting the Colors of RGB-Image, nameit im0 with im0.shape = 2500,3500,3 with this code: tab0 = zeros( (256,256,256) , dtype=int) tt = im0.view() tt.shape = -1,3 for r,g,b in tt: tab0[r,g,b] += 1 Question: Is there a faster way in numpy to get this result? MfG elodw From tsyu80 at gmail.com Sun Jan 15 11:03:29 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Sun, 15 Jan 2012 11:03:29 -0500 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de> References: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de> Message-ID: <CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com> On Sun, Jan 15, 2012 at 10:45 AM, <apo at pdauf.de> wrote: > > Counting the Colors of RGB-Image, > nameit im0 with im0.shape = 2500,3500,3 > with this code: > > tab0 = zeros( (256,256,256) , dtype=int) > tt = im0.view() > tt.shape = -1,3 > for r,g,b in tt: > tab0[r,g,b] += 1 > > Question: > > Is there a faster way in numpy to get this result? > > > MfG elodw > Assuming that your image is made up of integer values (which I guess they'd have to be if you're indexing into `tab0`), then you could write: >>> rgb_unique = set(tuple(rgb) for rgb in tt) I'm not sure if it's any faster than your loop, but I would assume it is. -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/7c6aa5b9/attachment.html> From nadavh at visionsense.com Sun Jan 15 12:40:44 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 15 Jan 2012 09:40:44 -0800 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com> References: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de>, <CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com> Message-ID: <26FC23E7C398A64083C980D16001012D261E514763@VA3DIAXVS361.RED001.local> im_flat = im0[...,0]*65536 + im[...,1]*256 +im[...,2] colours = np.unique(im_flat) Nadav ________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Tony Yu [tsyu80 at gmail.com] Sent: 15 January 2012 18:03 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Counting the Colors of RGB-Image On Sun, Jan 15, 2012 at 10:45 AM, <apo at pdauf.de<mailto:apo at pdauf.de>> wrote: Counting the Colors of RGB-Image, nameit im0 with im0.shape = 2500,3500,3 with this code: tab0 = zeros( (256,256,256) , dtype=int) tt = im0.view() tt.shape = -1,3 for r,g,b in tt: tab0[r,g,b] += 1 Question: Is there a faster way in numpy to get this result? MfG elodw Assuming that your image is made up of integer values (which I guess they'd have to be if you're indexing into `tab0`), then you could write: >>> rgb_unique = set(tuple(rgb) for rgb in tt) I'm not sure if it's any faster than your loop, but I would assume it is. -Tony -------------- next part -------------- An HTML attachment was scrubbed... 
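Putting the packing trick and a counting pass together gives the same (256, 256, 256) table as the original loop in one vectorized step. This is a sketch rather than code from the thread; the shift constants and the np.bincount call are the added parts, and im0 is assumed to be an unsigned 8-bit RGB image:

import numpy as np

def colour_counts(im0):
    # Pack each (r, g, b) triple into a single integer in [0, 256**3)
    tt = im0.reshape(-1, 3).astype(np.int64)
    packed = (tt[:, 0] << 16) | (tt[:, 1] << 8) | tt[:, 2]
    # One counting pass, then reshape back into the 256x256x256 table
    return np.bincount(packed, minlength=256 ** 3).reshape(256, 256, 256)

im0 = np.random.randint(0, 256, (500, 500, 3)).astype(np.uint8)
tab0 = colour_counts(im0)
print(tab0.sum() == 500 * 500)   # every pixel is counted exactly once

If only the distinct colours are wanted rather than their counts, np.unique(packed) plays the same role as the set() version above, without the Python-level loop.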
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/65eefcac/attachment.html> From numpy-discussion at maubp.freeserve.co.uk Sun Jan 15 14:10:34 2012 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Sun, 15 Jan 2012 19:10:34 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays Message-ID: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> Hello all, Is there a recommended (and ideally cross platform) way to load the frames of a QuickTime movie (*.mov file) in Python as NumPy arrays? I'd be happy with an iterator based approach, but random access to the frames would be a nice bonus. My aim is to try some image analysis in Python, if there is any sound in the files I don't care about it. I had a look at OpenCV which has Python bindings, http://opencv.willowgarage.com/documentation/python/index.html however I had no joy compiling this on Mac OS X with QuickTime support. Is this the best bet? Thanks, Peter From robert.kern at gmail.com Sun Jan 15 14:12:11 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 15 Jan 2012 19:12:11 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays In-Reply-To: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> References: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> Message-ID: <CAF6FJiuocxCrV7Hogq2-mHMB_X+T=FWi1hy_wmNfLmycrbpJQQ@mail.gmail.com> On Sun, Jan 15, 2012 at 19:10, Peter <numpy-discussion at maubp.freeserve.co.uk> wrote: > Hello all, > > Is there a recommended (and ideally cross platform) > way to load the frames of a QuickTime movie (*.mov > file) in Python as NumPy arrays? I'd be happy with > an iterator based approach, but random access to > the frames would be a nice bonus. > > My aim is to try some image analysis in Python, if > there is any sound in the files I don't care about it. > > I had a look at OpenCV which has Python bindings, > http://opencv.willowgarage.com/documentation/python/index.html > however I had no joy compiling this on Mac OS X > with QuickTime support. Is this the best bet? I've had luck with pyffmpeg, though I haven't tried QuickTime .mov files: http://code.google.com/p/pyffmpeg/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From bsouthey at gmail.com Mon Jan 16 10:37:57 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 16 Jan 2012 09:37:57 -0600 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> Message-ID: <4F144455.9020904@gmail.com> On 01/14/2012 04:31 PM, Charles R Harris wrote: > I've put up a pull request for a fix to ticket #1973. Currently the > fix simply propagates the maskna flag when the *.astype method is > called. A more complicated option would be to add a maskna keyword to > specify whether the output is masked or not or propagates the type of > the source, but that seems overly complex to me. > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks for the correction and as well as the fix. 
While it worked for integer and floats (not complex ones), I got an error when using complex dtypes. This error that is also present in array creation of complex dtypes. Is this known or a new bug? If it is new, then we need to identify what functionality should handle np.NA but are not working. Bruce $ python Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.__version__ # pull request version '2.0.0.dev-88f9276' >>> np.array([1,2], dtype=np.complex) array([ 1.+0.j, 2.+0.j]) >>> np.array([1,2, np.NA], dtype=np.complex) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1445, in array_repr ', ', "array(") File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 459, in array2string separator, prefix, formatter=formatter) File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 263, in _array2string suppress_small), File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 724, in __init__ self.real_format = FloatFormat(x.real, precision, suppress_small) ValueError: Cannot construct a view of data together with the NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> ca=np.array([1,2], dtype=np.complex, maskna=True) >>> ca[1]=np.NA >>> ca Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1445, in array_repr ', ', "array(") File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 459, in array2string separator, prefix, formatter=formatter) File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 263, in _array2string suppress_small), File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 724, in __init__ self.real_format = FloatFormat(x.real, precision, suppress_small) ValueError: Cannot construct a view of data together with the NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/b40cb3fb/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 10:52:10 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 08:52:10 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <4F144455.9020904@gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> Message-ID: <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com> wrote: > ** > On 01/14/2012 04:31 PM, Charles R Harris wrote: > > I've put up a pull request for a fix to ticket #1973. Currently the fix > simply propagates the maskna flag when the *.astype method is called. A > more complicated option would be to add a maskna keyword to specify whether > the output is masked or not or propagates the type of the source, but that > seems overly complex to me. > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > Thanks for the correction and as well as the fix. 
While it worked for > integer and floats (not complex ones), I got an error when using complex > dtypes. This error that is also present in array creation of complex > dtypes. Is this known or a new bug? > > If it is new, then we need to identify what functionality should handle > np.NA but are not working. > > Bruce > > $ python > Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) > [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np > >>> np.__version__ # pull request version > '2.0.0.dev-88f9276' > >>> np.array([1,2], dtype=np.complex) > array([ 1.+0.j, 2.+0.j]) > >>> np.array([1,2, np.NA], dtype=np.complex) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> ca=np.array([1,2], dtype=np.complex, maskna=True) > >>> ca[1]=np.NA > >>> ca > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> > > Looks like a different bug involving the *.real and *.imag views. I'll take a look. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/d4609e11/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 11:14:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 09:14:22 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> Message-ID: <CAB6mnxKgHGQQnmLXNxDj158cwURw=41WWcE2yV5MfwsK5vV_XA@mail.gmail.com> On Mon, Jan 16, 2012 at 8:52 AM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com> wrote: > >> ** >> On 01/14/2012 04:31 PM, Charles R Harris wrote: >> >> I've put up a pull request for a fix to ticket #1973. Currently the fix >> simply propagates the maskna flag when the *.astype method is called. 
A >> more complicated option would be to add a maskna keyword to specify whether >> the output is masked or not or propagates the type of the source, but that >> seems overly complex to me. >> >> Thoughts? >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> Thanks for the correction and as well as the fix. While it worked for >> integer and floats (not complex ones), I got an error when using complex >> dtypes. This error that is also present in array creation of complex >> dtypes. Is this known or a new bug? >> >> If it is new, then we need to identify what functionality should handle >> np.NA but are not working. >> >> Bruce >> >> $ python >> Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) >> [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import numpy as np >> >>> np.__version__ # pull request version >> '2.0.0.dev-88f9276' >> >>> np.array([1,2], dtype=np.complex) >> array([ 1.+0.j, 2.+0.j]) >> >>> np.array([1,2, np.NA], dtype=np.complex) >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >> 1445, in array_repr >> ', ', "array(") >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 459, in array2string >> separator, prefix, formatter=formatter) >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 263, in _array2string >> suppress_small), >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 724, in __init__ >> self.real_format = FloatFormat(x.real, precision, suppress_small) >> ValueError: Cannot construct a view of data together with the >> NPY_ARRAY_MASKNA flag, the NA mask must be added later >> >>> ca=np.array([1,2], dtype=np.complex, maskna=True) >> >>> ca[1]=np.NA >> >>> ca >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >> 1445, in array_repr >> ', ', "array(") >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 459, in array2string >> separator, prefix, formatter=formatter) >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 263, in _array2string >> suppress_small), >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 724, in __init__ >> self.real_format = FloatFormat(x.real, precision, suppress_small) >> ValueError: Cannot construct a view of data together with the >> NPY_ARRAY_MASKNA flag, the NA mask must be added later >> >>> >> >> > Looks like a different bug involving the *.real and *.imag views. I'll > take a look. > > Looks like views of masked arrays have other problems: In [13]: a = ones(3, int16, maskna=1) In [14]: a.view(int8) Out[14]: array([1, 0, 1, NA, 1, NA], dtype=int8) I'm not sure what the policy should be here. One could construct a new mask adapted to the view, raise an error when the types don't align (I think the real/imag parts should be considered aligned), or just let the view unmask the array. The last seems dangerous. Hmm... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
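The NA-mask/maskna machinery discussed in this thread was experimental and was removed from NumPy before the 1.7 release. As a point of comparison only, numpy.ma already behaves the way the ticket asks for: astype carries the mask over. A small sketch using numpy.ma, not the branch under discussion:

import numpy as np

a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
b = a.astype(np.float32)
print(b)        # [1.0 -- 3.0], the mask is carried over by astype
print(b.dtype)  # float32
print(b.mask)   # [False  True False]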
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/c7d4dd5c/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 13:20:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 11:20:21 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <4F144455.9020904@gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> Message-ID: <CAB6mnxJ4ukkr+B8aiN3j0KPduvUY2unB1zTcah0Ks+917JU1EQ@mail.gmail.com> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com> wrote: > ** > On 01/14/2012 04:31 PM, Charles R Harris wrote: > > I've put up a pull request for a fix to ticket #1973. Currently the fix > simply propagates the maskna flag when the *.astype method is called. A > more complicated option would be to add a maskna keyword to specify whether > the output is masked or not or propagates the type of the source, but that > seems overly complex to me. > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > Thanks for the correction and as well as the fix. While it worked for > integer and floats (not complex ones), I got an error when using complex > dtypes. This error that is also present in array creation of complex > dtypes. Is this known or a new bug? > > If it is new, then we need to identify what functionality should handle > np.NA but are not working. > > Bruce > > $ python > Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) > [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np > >>> np.__version__ # pull request version > '2.0.0.dev-88f9276' > >>> np.array([1,2], dtype=np.complex) > array([ 1.+0.j, 2.+0.j]) > >>> np.array([1,2, np.NA], dtype=np.complex) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> ca=np.array([1,2], dtype=np.complex, maskna=True) > >>> ca[1]=np.NA > >>> ca > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> > > The location of this problem is easy to 
find, but the fix isn't completely trivial as it seems there is no easy way to copy masks between arrays. There may be, but I haven't found it. Also there is the unfortunate fact that real and imag are array methods and work for non-complex arrays In [6]: a = ones(3, 'S') In [7]: a.real Out[7]: array(['1', '1', '1'], dtype='|S1') In [8]: a.imag Out[8]: array(['', '', ''], dtype='|S1') which makes a simple view impractical. Not that views seem to work. Another complication of the NA stuff is that there two types of NA, a potential multivalued NA, and a simple boolean NA. I think we need to pick between the two as supporting both makes a mess. Because of the common complaint about memory usage, I vote for simple boolean which offers the option of bit arrays for the masks. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/50be8526/attachment.html> From tmp50 at ukr.net Mon Jan 16 15:07:54 2012 From: tmp50 at ukr.net (Dmitrey) Date: Mon, 16 Jan 2012 22:07:54 +0200 Subject: [Numpy-discussion] [ANN] global constrained solver with discrete variables Message-ID: <17671.1326744474.16853089562548174848@ffe6.ukr.net> hi all, I've done support of discrete variables for interalg - free (license: BSD) solver with specifiable accuracy, you can take a look at an example here It is written in Python + NumPy, and I hope it's speed will be essentially increased when PyPy (Python with dynamic compilation) support for NumPy will be done (some parts of code are not vectorized and still use CPython cycles). Also, NumPy funcs like vstack or append produce only copy of data, and it also slows the solver very much (for mature problems). Maybe some bugs still present somewhere - interalg code already became very long, but since it already works, you could be interested in trying to use it right now. Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/7afbf75e/attachment.html> From ben.root at ou.edu Mon Jan 16 16:05:17 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 16 Jan 2012 15:05:17 -0600 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <CAGY4rcV-33B5WJQGRcUm41k-1jPYEXorKt+oQzU_hJqWuVFP0Q@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAGY4rcV-33B5WJQGRcUm41k-1jPYEXorKt+oQzU_hJqWuVFP0Q@mail.gmail.com> Message-ID: <CANNq6Fk-Z_ydvyaSkeATooBbZ=YsNvDMw80O6QdabHQ9YKHk3A@mail.gmail.com> On Sun, Jan 15, 2012 at 6:54 AM, David Cournapeau <cournape at gmail.com>wrote: > On Sat, Jan 14, 2012 at 11:53 PM, Nathan Faggian > <nathan.faggian at gmail.com> wrote: > > Hi, > > > > I am finding it less than useful to have the negative index wrapping on > nd-arrays. Here is a short example: > > > > import numpy as np > > a = np.zeros((3, 3)) > > a[:,2] = 1000 > > print a[0,-1] > > print a[0,-1] > > print a[-1,-1] > > > > In all cases 1000 is printed out. > > > > What I am after is a way to say "please don't wrap around" and have > negative indices behave in a way I choose. I know this is a standard thing > - but is there a way to override that behaviour that doesn't involve cython > or rolling my own resampler? > > Although it could be possible with lots of work, it would most likely > be a bad idea. You will need to wrap something around your > model/data/etc... Could you explain a bit more what you have in mind ? 
> > David > Another approach that might be useful, depending on the needs, is to use `np.ravel_multi_index()`, in which ndim coords can be passed in and flatten coords are returned. It has options of 'raise', 'wrap' and 'clip' for handling out-of-bounds indices. It wouldn't be built directly into the arrays, but if that isn't needed, this might work. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/e9cce650/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 16:24:08 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 14:24:08 -0700 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Message-ID: <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian <nathan.faggian at gmail.com>wrote: > Hi, > > I am finding it less than useful to have the negative index wrapping on > nd-arrays. Here is a short example: > > import numpy as np > a = np.zeros((3, 3)) > a[:,2] = 1000 > print a[0,-1] > print a[0,-1] > print a[-1,-1] > > In all cases 1000 is printed out. > > Looks right to me, the whole last column is 1000. What exactly do you want to do and what is the problem? <snip> Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/0783c008/attachment.html> From ben.root at ou.edu Mon Jan 16 16:30:27 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 16 Jan 2012 15:30:27 -0600 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> Message-ID: <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> On Mon, Jan 16, 2012 at 3:24 PM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian <nathan.faggian at gmail.com>wrote: > >> Hi, >> >> I am finding it less than useful to have the negative index wrapping on >> nd-arrays. Here is a short example: >> >> import numpy as np >> a = np.zeros((3, 3)) >> a[:,2] = 1000 >> print a[0,-1] >> print a[0,-1] >> print a[-1,-1] >> >> In all cases 1000 is printed out. >> >> > Looks right to me, the whole last column is 1000. What exactly do you want > to do and what is the problem? > > <snip> > > Chuck > > I would imagine that it is some sort of image processing use-case, where sometimes you want the data to reflect at the boundaries, or be constant, or have some other value used for access outside the domain. So, for reflect, I would guess that he would have wanted 0.0 for the first two and 1000 for the last one. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/ec9d39f2/attachment.html> From ben.root at ou.edu Mon Jan 16 16:42:39 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 16 Jan 2012 15:42:39 -0600 Subject: [Numpy-discussion] Negative indexing. 
In-Reply-To: <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> Message-ID: <CANNq6FnmNpShMRJj7KMgwAUPRi5YF9EU3UHtz-tLDOSQhqBU=Q@mail.gmail.com> On Mon, Jan 16, 2012 at 3:30 PM, Benjamin Root <ben.root at ou.edu> wrote: > > > On Mon, Jan 16, 2012 at 3:24 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian <nathan.faggian at gmail.com >> > wrote: >> >>> Hi, >>> >>> I am finding it less than useful to have the negative index wrapping on >>> nd-arrays. Here is a short example: >>> >>> import numpy as np >>> a = np.zeros((3, 3)) >>> a[:,2] = 1000 >>> print a[0,-1] >>> print a[0,-1] >>> print a[-1,-1] >>> >>> In all cases 1000 is printed out. >>> >>> >> Looks right to me, the whole last column is 1000. What exactly do you >> want to do and what is the problem? >> >> <snip> >> >> Chuck >> >> > I would imagine that it is some sort of image processing use-case, where > sometimes you want the data to reflect at the boundaries, or be constant, > or have some other value used for access outside the domain. So, for > reflect, I would guess that he would have wanted 0.0 for the first two and > 1000 for the last one. > > Ben Root > > Errr, I mean 0.0 for the last one. I can't think today. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/1d523065/attachment.html> From nathan.faggian at gmail.com Mon Jan 16 19:23:24 2012 From: nathan.faggian at gmail.com (Nathan Faggian) Date: Tue, 17 Jan 2012 11:23:24 +1100 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <CANNq6FnmNpShMRJj7KMgwAUPRi5YF9EU3UHtz-tLDOSQhqBU=Q@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> <CANNq6FnmNpShMRJj7KMgwAUPRi5YF9EU3UHtz-tLDOSQhqBU=Q@mail.gmail.com> Message-ID: <CAN1J6jUzwBAxafSodrGLwgohRDXhfasGVY0O-mOyLZkueJjv6A@mail.gmail.com> Hi, I am sorry for the late reply. Benjamin has hit the nail on the head. I guess I am seeing numpy "fancy indexing" as equivalent to integer based coordinate sampling and trying to compare numpy's fancy indexing to something like map_coordinates in scipy. I have never used np.ravel_multi_index() and will have a look at this now. -N On 17 January 2012 08:42, Benjamin Root <ben.root at ou.edu> wrote: > On Mon, Jan 16, 2012 at 3:30 PM, Benjamin Root <ben.root at ou.edu> wrote: >> >> >> >> On Mon, Jan 16, 2012 at 3:24 PM, Charles R Harris >> <charlesr.harris at gmail.com> wrote: >>> >>> >>> >>> On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian >>> <nathan.faggian at gmail.com> wrote: >>>> >>>> Hi, >>>> >>>> I am finding it less than useful to have the negative index wrapping on >>>> nd-arrays. Here is a short example: >>>> >>>> import numpy as np >>>> a = np.zeros((3, 3)) >>>> a[:,2] = 1000 >>>> print a[0,-1] >>>> print a[0,-1] >>>> print a[-1,-1] >>>> >>>> In all cases 1000 is printed out. >>>> >>> >>> Looks right to me, the whole last column is 1000. What exactly do you >>> want to do and what is the problem? 
>>> >>> <snip> >>> >>> Chuck >>> >> >> I would imagine that it is some sort of image processing use-case, where >> sometimes you want the data to reflect at the boundaries, or be constant, or >> have some other value used for access outside the domain.? So, for reflect, >> I would guess that he would have wanted 0.0 for the first two and 1000 for >> the last one. >> >> Ben Root >> > > Errr, I mean 0.0 for the last one.? I can't think today. > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From lists at informa.tiker.net Tue Jan 17 00:11:10 2012 From: lists at informa.tiker.net (Andreas Kloeckner) Date: Tue, 17 Jan 2012 00:11:10 -0500 Subject: [Numpy-discussion] dtype comparison, hash In-Reply-To: <CAF6FJiuPGdg06LO-7B5v5gb741-t-OtN3H2Rn7h_HzcHUfkOMg@mail.gmail.com> References: <878vlyu7uq.fsf@ding.tiker.net> <CAF6FJit4zMEibGi6NACKUNCOZh5viucqB=Pf5+S1CvHDMLqdvA@mail.gmail.com> <87wr9drios.fsf@ding.tiker.net> <CAF6FJiuPGdg06LO-7B5v5gb741-t-OtN3H2Rn7h_HzcHUfkOMg@mail.gmail.com> Message-ID: <87ty3u7w29.fsf@ding.tiker.net> Hi Robert, On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern <robert.kern at gmail.com> wrote: > On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner > <lists at informa.tiker.net> wrote: > > Hi Robert, > > > > On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern <robert.kern at gmail.com> wrote: > >> On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner > >> <lists at informa.tiker.net> wrote: > >> > Hi all, > >> > > >> > Two questions: > >> > > >> > - Are dtypes supposed to be comparable (i.e. implement '==', '!=')? > >> > >> Yes. > >> > >> > - Are dtypes supposed to be hashable? > >> > >> Yes, with caveats. Strictly speaking, we violate the condition that > >> objects that equal each other should hash equal since we define == to > >> be rather free. Namely, > >> > >> ? np.dtype(x) == x > >> > >> for all objects x that can be converted to a dtype. > >> > >> ? np.dtype(float) == np.dtype('float') > >> ? np.dtype(float) == float > >> ? np.dtype(float) == 'float' > >> > >> Since hash(float) != hash('float') we cannot implement > >> np.dtype.__hash__() to follow the stricture that objects that compare > >> equal should hash equal. > >> > >> However, if you restrict the domain of objects to just dtypes (i.e. > >> only consider dicts that use only actual dtype objects as keys instead > >> of arbitrary mixtures of objects), then the stricture is obeyed. This > >> is a useful domain that is used internally in numpy. > >> > >> Is this the problem that you found? > > > > Thanks for the reply. > > > > It doesn't seem like this is our issue--instead, we're encountering two > > different dtype objects that claim to be float64, compare as equal, but > > don't hash to the same value. > > > > I've asked the user who encountered the user to investigate, and I'll > > be back with more detail in a bit. > > I think we've run into this before and tried to fix it. Try to find > the version of numpy the user has and a minimal example, if you can. This is what Thomas found: http://projects.scipy.org/numpy/ticket/2017 Hope this helps, Andreas -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120117/a4c2e34f/attachment.sig> From robert.kern at gmail.com Tue Jan 17 09:28:21 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Jan 2012 14:28:21 +0000 Subject: [Numpy-discussion] dtype comparison, hash In-Reply-To: <87ty3u7w29.fsf@ding.tiker.net> References: <878vlyu7uq.fsf@ding.tiker.net> <CAF6FJit4zMEibGi6NACKUNCOZh5viucqB=Pf5+S1CvHDMLqdvA@mail.gmail.com> <87wr9drios.fsf@ding.tiker.net> <CAF6FJiuPGdg06LO-7B5v5gb741-t-OtN3H2Rn7h_HzcHUfkOMg@mail.gmail.com> <87ty3u7w29.fsf@ding.tiker.net> Message-ID: <CAF6FJiu_Y=tKyNuggU6RXQK8tU6-GPJJkrOpz_DDs1Fxkw8U=g@mail.gmail.com> On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner <lists at informa.tiker.net> wrote: > Hi Robert, > > On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern <robert.kern at gmail.com> wrote: >> On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner >> <lists at informa.tiker.net> wrote: >> > Hi Robert, >> > >> > On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern <robert.kern at gmail.com> wrote: >> >> On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner >> >> <lists at informa.tiker.net> wrote: >> >> > Hi all, >> >> > >> >> > Two questions: >> >> > >> >> > - Are dtypes supposed to be comparable (i.e. implement '==', '!=')? >> >> >> >> Yes. >> >> >> >> > - Are dtypes supposed to be hashable? >> >> >> >> Yes, with caveats. Strictly speaking, we violate the condition that >> >> objects that equal each other should hash equal since we define == to >> >> be rather free. Namely, >> >> >> >> ? np.dtype(x) == x >> >> >> >> for all objects x that can be converted to a dtype. >> >> >> >> ? np.dtype(float) == np.dtype('float') >> >> ? np.dtype(float) == float >> >> ? np.dtype(float) == 'float' >> >> >> >> Since hash(float) != hash('float') we cannot implement >> >> np.dtype.__hash__() to follow the stricture that objects that compare >> >> equal should hash equal. >> >> >> >> However, if you restrict the domain of objects to just dtypes (i.e. >> >> only consider dicts that use only actual dtype objects as keys instead >> >> of arbitrary mixtures of objects), then the stricture is obeyed. This >> >> is a useful domain that is used internally in numpy. >> >> >> >> Is this the problem that you found? >> > >> > Thanks for the reply. >> > >> > It doesn't seem like this is our issue--instead, we're encountering two >> > different dtype objects that claim to be float64, compare as equal, but >> > don't hash to the same value. >> > >> > I've asked the user who encountered the user to investigate, and I'll >> > be back with more detail in a bit. >> >> I think we've run into this before and tried to fix it. Try to find >> the version of numpy the user has and a minimal example, if you can. > > This is what Thomas found: > > http://projects.scipy.org/numpy/ticket/2017 It looks like the .flags attribute is different between np.uintp and np.uint32. The .flags attribute forms part of the hashed information about the dtype (or PyArray_Descr at the C-level). [~] |15> np.dtype(np.uintp).flags 1536 [~] |16> np.dtype(np.uint32).flags 2048 The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so unlike the comment in the ticket, they do have different hashes for me. However, diving through the source a bit, I'm not entirely sure I trust the values being given at the Python level. It appears that the flag member of the PyArray_Descr struct is declared as a char. 
However, it is exposed as a T_INT member in the PyMemberDef table by direct addressing. Basically, a Python descriptor gets added to the np.dtype type that will look up sizeof(long) bytes from the starting position of the flags member in the struct. This includes 3 bytes of the following type_num member. Obviously, 2048 does not fit into a char. Nonetheless, the type_num is also part of the hash, so either the flags member or the type_num member is different between the two. Two bugs for the price of one! -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From adam at lambdafoundry.com Tue Jan 17 10:57:29 2012 From: adam at lambdafoundry.com (Adam Klein) Date: Tue, 17 Jan 2012 10:57:29 -0500 Subject: [Numpy-discussion] segfault on searchsorted (1.6.2.dev-396dbb9) Message-ID: <CANxW0rbJAKczXEiS_-7k5FBmHFgk4NhtPzVG4SoFyRMC_2xpyw@mail.gmail.com> Hello, I get a segfault here: In [1]: x = np.array([1,2,3], dtype='M') In [2]: x.searchsorted(2, side='left') But it's fine here: In [1]: x = np.array([1,2,3], dtype='M') In [2]: x.view('i8').searchsorted(2, side='left') Out[2]: 1 This segfaults again: x.view('i8').searchsorted(np.datetime64(2), side='left') GDB gets me this far: Program received signal SIGSEGV, Segmentation fault. PyArray_SearchSorted (op1=0x1b8dd70, op2=0x17dfac0, side=NPY_SEARCHLEFT) at numpy/core/src/multiarray/item_selection.c:1463 1463 Py_INCREF(dtype); -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120117/90b3ac63/attachment.html> From charlesr.harris at gmail.com Tue Jan 17 11:53:37 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 17 Jan 2012 09:53:37 -0700 Subject: [Numpy-discussion] segfault on searchsorted (1.6.2.dev-396dbb9) In-Reply-To: <CANxW0rbJAKczXEiS_-7k5FBmHFgk4NhtPzVG4SoFyRMC_2xpyw@mail.gmail.com> References: <CANxW0rbJAKczXEiS_-7k5FBmHFgk4NhtPzVG4SoFyRMC_2xpyw@mail.gmail.com> Message-ID: <CAB6mnxL-FC7PQkD5hz9T5mSDP16aeUAE-TVrkvV5xn89OMjQUQ@mail.gmail.com> On Tue, Jan 17, 2012 at 8:57 AM, Adam Klein <adam at lambdafoundry.com> wrote: > Hello, > > I get a segfault here: > > In [1]: x = np.array([1,2,3], dtype='M') > In [2]: x.searchsorted(2, side='left') > > But it's fine here: > > In [1]: x = np.array([1,2,3], dtype='M') > In [2]: x.view('i8').searchsorted(2, side='left') > Out[2]: 1 > > This segfaults again: > > x.view('i8').searchsorted(np.datetime64(2), side='left') > > GDB gets me this far: > > Program received signal SIGSEGV, Segmentation fault. > PyArray_SearchSorted (op1=0x1b8dd70, op2=0x17dfac0, side=NPY_SEARCHLEFT) > at numpy/core/src/multiarray/item_selection.c:1463 > 1463 Py_INCREF(dtype); > > > Confirmed in current development. Note that things have changed and the initial array creation will fail (no unit). The searchsorted will work if searching for a datetime: In [10]: x = np.array([1,2,3], 'datetime64[D]') In [11]: x.searchsorted(datetime64(2,'D')) Out[11]: 1 So the failure is one of raising an appropriate error message. Please open a ticket. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120117/6ab37a02/attachment.html>

From chris.barker at noaa.gov  Tue Jan 17 15:20:37 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 17 Jan 2012 12:20:37 -0800
Subject: [Numpy-discussion] Counting the Colors of RGB-Image
In-Reply-To: <26FC23E7C398A64083C980D16001012D261E514763@VA3DIAXVS361.RED001.local>
References: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de>
	<CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com>
	<26FC23E7C398A64083C980D16001012D261E514763@VA3DIAXVS361.RED001.local>
Message-ID: <CALGmxEKPOJB8gmg3Z4B_U9o8z0+dVfGh89E28ETfzU2De7R8sw@mail.gmail.com>

Here's a thought: too bad numpy doesn't have a 24 bit integer, but you
could tack a 0 on, making your image 32 bit, then use histogram to count
the colors. Something like (untested):

# create the 32 bit image
im32 = np.zeros((w, h), dtype=np.uint32)
view = im32.view(dtype=np.uint8).reshape((w, h, 4))
view[:, :, :3] = im

# histogram it:
# this is the trick -- setting your bins right; remember that histogram is
# designed for floats, so your bin boundaries should fall between the
# integer values you want, e.g.
bins = np.arange(2**24 + 1) - 0.5
colors = np.histogram(im32, bins=bins)

NOTE: the image processing scikit may well have something already --
histogramming an image is a common process.

-Chris

On Sun, Jan 15, 2012 at 9:40 AM, Nadav Horesh <nadavh at visionsense.com> wrote:
> im_flat = im0[...,0]*65536 + im0[...,1]*256 + im0[...,2]
> colours = np.unique(im_flat)
>
>    Nadav
>
> ________________________________
> From: numpy-discussion-bounces at scipy.org
> [numpy-discussion-bounces at scipy.org] On Behalf Of Tony Yu [tsyu80 at gmail.com]
> Sent: 15 January 2012 18:03
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Counting the Colors of RGB-Image
>
>
> On Sun, Jan 15, 2012 at 10:45 AM,  <apo at pdauf.de> wrote:
>>
>> Counting the Colors of RGB-Image,
>> nameit im0 with im0.shape = 2500,3500,3
>> with this code:
>>
>> tab0 = zeros( (256,256,256) , dtype=int)
>> tt = im0.view()
>> tt.shape = -1,3
>> for r,g,b in tt:
>>     tab0[r,g,b] += 1
>>
>> Question:
>>
>> Is there a faster way in numpy to get this result?
>>
>> MfG elodw
>
>
> Assuming that your image is made up of integer values (which I guess they'd
> have to be if you're indexing into `tab0`), then you could write:
>
>>>> rgb_unique = set(tuple(rgb) for rgb in tt)
>
> I'm not sure if it's any faster than your loop, but I would assume it is.
>
> -Tony
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From sturla at molden.no  Wed Jan 18 00:26:10 2012
From: sturla at molden.no (Sturla Molden)
Date: Wed, 18 Jan 2012 06:26:10 +0100
Subject: [Numpy-discussion] Strange numpy behaviour (bug?)
Message-ID: <4F1657F2.2020803@molden.no>

While "playing" with a point-in-polygon test, I have discovered a
failure mode that I cannot make sense of.

The algorithm is vectorized for NumPy from a C and Python implementation
I found on the net (see links below). It is written to process a large
dataset in chunks.
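A minimal sketch of that chunked-evaluation pattern, for readers following along (the helper name and sizes below are made up for illustration; Sturla's actual code, including his __chunk helper, appears further down):

import numpy as np

def chunk_bounds(n, size):
    # consecutive half-open (start, stop) spans covering n items,
    # including a short tail chunk when size does not divide n evenly
    edges = list(range(0, n, size)) + [n]
    return zip(edges[:-1], edges[1:])

result = np.zeros(100000, dtype=bool)
for i, j in chunk_bounds(result.shape[0], 8192):
    result[i:j] = True   # stand-in for the per-chunk vectorized test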
I'm rather happy with it, it can test 100,000 x,y points against a
non-convex pentagon in just 50 ms.

Anyway, here is something very strange (or at least I think so):

If I use a small chunk size, it sometimes fails. I know I shouldn't
blame it on NumPy, because it is in all likelihood my mistake. But it
does not make any sense, as the parameter should not affect the
computation.

Observed behavior:

1. Processing the whole dataset in one big chunk always works.

2. Processing the dataset in big chunks (e.g. 8192 points) always works.

3. Processing the dataset in small chunks (e.g. 32 points) sometimes fails.

4. Processing the dataset element-wise always works.

5. The scalar version behaves like the numpy version: fine for large
chunks, sometimes it fails for small. That is, when list comprehensions
are used for chunks. Big list comprehensions always work, small ones
might fail.

It looks like the numerical robustness of the algorithm depends on a
parameter that has nothing to do with the algorithm at all. For example
in (5), we might think that calling a function from a nested loop makes
it fail, depending on the length of the inner loop. But calling it from
a single loop works just fine.

???

So I wonder:

Could there be a bug in numpy that only shows up when taking a huge
number of short slices?

I don't know... But try it if you care.

In the function "inpolygon", change the call that says __chunk(n,8192)
to e.g. __chunk(n,32) to see it fail (or at least it does on my
computer, running Enthought 7.2-1 on Win64).

Regards,
Sturla Molden



def __inpolygon_scalar(x,y,poly):

    # Source code taken from:
    # http://paulbourke.net/geometry/insidepoly
    # http://www.ariel.com.au/a/python-point-int-poly.html

    n = len(poly)
    inside = False
    p1x,p1y = poly[0]
    xinters = 0
    for i in range(n+1):
        p2x,p2y = poly[i % n]
        if y > min(p1y,p2y):
            if y <= max(p1y,p2y):
                if x <= max(p1x,p2x):
                    if p1y != p2y:
                        xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xinters:
                        inside = not inside
        p1x,p1y = p2x,p2y
    return inside


# the rest is (C) Sturla Molden, 2012
# University of Oslo

def __inpolygon_numpy(x,y,poly):
    """ numpy vectorized version """
    n = len(poly)
    inside = np.zeros(x.shape[0], dtype=bool)
    xinters = np.zeros(x.shape[0], dtype=float)
    p1x,p1y = poly[0]
    for i in range(n+1):
        p2x,p2y = poly[i % n]
        mask = (y > min(p1y,p2y)) & (y <= max(p1y,p2y)) & (x <= max(p1x,p2x))
        if p1y != p2y:
            xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
        if p1x == p2x:
            inside[mask] = ~inside[mask]
        else:
            mask2 = x[mask] <= xinters[mask]
            idx, = np.where(mask)
            idx2, = np.where(mask2)
            idx = idx[idx2]
            inside[idx] = ~inside[idx]
        p1x,p1y = p2x,p2y
    return inside

def __chunk(n,size):
    x = range(0,n,size)
    if (n%size):
        x.append(n)
    return zip(x[:-1],x[1:])

def inpolygon(x, y, poly):
    """
    point-in-polygon test
    x and y are numpy arrays
    polygon is a list of (x,y) vertex tuples
    """
    if np.isscalar(x) and np.isscalar(y):
        return __inpolygon_scalar(x, y, poly)
    else:
        x = np.asarray(x)
        y = np.asarray(y)
        n = x.shape[0]
        z = np.zeros(n, dtype=bool)
        for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ???
if j-i > 1: z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly) else: z[i] = __inpolygon_scalar(x[i], y[i], poly) return z if __name__ == "__main__": import matplotlib import matplotlib.pyplot as plt from time import clock n = 100000 polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)] xp = [x for x,y in polygon] yp = [y for x,y in polygon] x = np.random.rand(n) y = np.random.rand(n) t0 = clock() inside = inpolygon(x,y,polygon) t1 = clock() print 'elapsed time %.3g ms' % ((t0-t1)*1E3,) plt.figure() plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g') plt.axis([0,1,0,1]) plt.show() From sturla at molden.no Wed Jan 18 00:57:38 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Jan 2012 06:57:38 +0100 Subject: [Numpy-discussion] Strange numpy behaviour (bug?) In-Reply-To: <4F1657F2.2020803@molden.no> References: <4F1657F2.2020803@molden.no> Message-ID: <4F165F52.6070300@molden.no> Never mind this, it was my own mistake as I expected :-) def __chunk(n,size): x = range(0,n,size) x.append(n) return zip(x[:-1],x[1:]) makes it a lot better :) Sturla Den 18.01.2012 06:26, skrev Sturla Molden: > While "playing" with a point-in-polygon test, I have discovered some a > failure mode that I cannot make sence of. > > The algorithm is vectorized for NumPy from a C and Python implementation > I found on the net (see links below). It is written to process a large > dataset in chunks. I'm rather happy with it, it can test 100,000 x,y > points against a non-convex pentagon in just 50 ms. > > Anyway, here is something very strange (or at least I think so): > > If I use a small chunk size, it sometimes fails. I know I shouldn't > blame it on NumPy, beacuse it is by all likelood my mistake. But it does > not make any sence, as the parameter should not affect the computation. > > Observed behavior: > > 1. Processing the whole dataset in one big chunk always works. > > 2. Processing the dataset in big chunks (e.g. 8192 points) always works. > > 3. Processing the dataset in small chunks (e.g. 32 points) sometimes fail. > > 4. Processing the dataset element-wise always work. > > 5. The scalar version behaves like the numpy version: fine for large > chunks, sometimes it fails for small. That is, when list comprehensions > is used for chunks. Big list comprehensions always work, small ones > might fail. > > It looks like the numerical robstness of the alorithm depends on a > parameter that has nothing to do with the algorithm at all. For example > in (5), we might think that calling a function from a nested loop makes > it fail, depending on the length of the inner loop. But calling it from > a single loop works just fine. > > ??? > > So I wonder: > > Could there be a bug in numpy that only shows up only when taking a huge > number of short slices? > > I don't know... But try it if you care. > > In the function "inpolygon", change the call that says __chunk(n,8192) > to e.g. __chunk(n,32) to see it fail (or at least it does on my > computer, running Enthought 7.2-1 on Win64). 
> > > Regards, > Sturla Molden > > > > > > def __inpolygon_scalar(x,y,poly): > > # Source code taken from: > # http://paulbourke.net/geometry/insidepoly > # http://www.ariel.com.au/a/python-point-int-poly.html > > n = len(poly) > inside = False > p1x,p1y = poly[0] > xinters = 0 > for i in range(n+1): > p2x,p2y = poly[i % n] > if y> min(p1y,p2y): > if y<= max(p1y,p2y): > if x<= max(p1x,p2x): > if p1y != p2y: > xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x > if p1x == p2x or x<= xinters: > inside = not inside > p1x,p1y = p2x,p2y > return inside > > > # the rest is (C) Sturla Molden, 2012 > # University of Oslo > > def __inpolygon_numpy(x,y,poly): > """ numpy vectorized version """ > n = len(poly) > inside = np.zeros(x.shape[0], dtype=bool) > xinters = np.zeros(x.shape[0], dtype=float) > p1x,p1y = poly[0] > for i in range(n+1): > p2x,p2y = poly[i % n] > mask = (y> min(p1y,p2y))& (y<= max(p1y,p2y))& (x<= > max(p1x,p2x)) > if p1y != p2y: > xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x > if p1x == p2x: > inside[mask] = ~inside[mask] > else: > mask2 = x[mask]<= xinters[mask] > idx, = np.where(mask) > idx2, = np.where(mask2) > idx = idx[idx2] > inside[idx] = ~inside[idx] > p1x,p1y = p2x,p2y > return inside > > def __chunk(n,size): > x = range(0,n,size) > if (n%size): > x.append(n) > return zip(x[:-1],x[1:]) > > def inpolygon(x, y, poly): > """ > point-in-polygon test > x and y are numpy arrays > polygon is a list of (x,y) vertex tuples > """ > if np.isscalar(x) and np.isscalar(y): > return __inpolygon_scalar(x, y, poly) > else: > x = np.asarray(x) > y = np.asarray(y) > n = x.shape[0] > z = np.zeros(n, dtype=bool) > for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ??? > if j-i> 1: > z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly) > else: > z[i] = __inpolygon_scalar(x[i], y[i], poly) > return z > > > > if __name__ == "__main__": > > import matplotlib > import matplotlib.pyplot as plt > from time import clock > > n = 100000 > polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)] > xp = [x for x,y in polygon] > yp = [y for x,y in polygon] > x = np.random.rand(n) > y = np.random.rand(n) > t0 = clock() > inside = inpolygon(x,y,polygon) > t1 = clock() > print 'elapsed time %.3g ms' % ((t0-t1)*1E3,) > plt.figure() > plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g') > plt.axis([0,1,0,1]) > plt.show() > > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Wed Jan 18 04:22:57 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 18 Jan 2012 01:22:57 -0800 Subject: [Numpy-discussion] Download page still points to SVN Message-ID: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> Hi folks, I was just pointing a colleague to the 'official download page' for numpy so he could find how to grab current sources: http://new.scipy.org/download.html but I was quite surprised to find that it still points to SVN for both numpy and scipy. It would probably not be a bad idea to update those and point them to github... Cheers, f From apo at pdauf.de Wed Jan 18 04:26:25 2012 From: apo at pdauf.de (apo at pdauf.de) Date: Wed, 18 Jan 2012 10:26:25 +0100 (CET) Subject: [Numpy-discussion] Counting the Colors of RGB-Image Message-ID: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/8fbd4065/attachment.html> -------------- next part -------------- Sorry, that i use this way to send an answer to Tony Yu , Nadav Horesh , Chris Barker. When iam direct answering on Your e-mail i get an error 5. I think i did a mistake. Your ideas are very helpfull and the code is very fast. Thank You elodw From scott.sinclair.za at gmail.com Wed Jan 18 05:18:49 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 18 Jan 2012 12:18:49 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> Message-ID: <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> On 18 January 2012 11:22, Fernando Perez <fperez.net at gmail.com> wrote: > I was just pointing a colleague to the 'official download page' for > numpy so he could find how to grab current sources: > > http://new.scipy.org/download.html > > but I was quite surprised to find that it still points to SVN for both > numpy and scipy. ?It would probably not be a bad idea to update those > and point them to github... It's rather confusing having two websites. The "official" page at http://www.scipy.org/Download points to github. There hasn't been much maintenance effort for new.scipy.org, and there was some recent discussion about taking it offline. I'm not sure if a firm conclusion was reached. Cheers, Scott From numpy-discussion at maubp.freeserve.co.uk Wed Jan 18 05:19:21 2012 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Jan 2012 10:19:21 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays In-Reply-To: <CAKVJ-_5vmFrxmjRezmozwimfFS4OGKVxyt2AVaTTufRO-zOjgw@mail.gmail.com> References: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> <CAF6FJiuocxCrV7Hogq2-mHMB_X+T=FWi1hy_wmNfLmycrbpJQQ@mail.gmail.com> <CAKVJ-_5vmFrxmjRezmozwimfFS4OGKVxyt2AVaTTufRO-zOjgw@mail.gmail.com> Message-ID: <CAKVJ-_7puzauUqSHSUNSD1tFZz0ySZ6ePrD1RmOGDCict2G4kQ@mail.gmail.com> Sending this again (sorry Robert, this will be the second time for you) since I sent from a non-subscribed email address the first time. On Sun, Jan 15, 2012 at 7:12 PM, Robert Kern wrote: > On Sun, Jan 15, 2012 at 19:10, Peter wrote: >> Hello all, >> >> Is there a recommended (and ideally cross platform) >> way to load the frames of a QuickTime movie (*.mov >> file) in Python as NumPy arrays? ... > > I've had luck with pyffmpeg, though I haven't tried > QuickTime .mov files: > > ?http://code.google.com/p/pyffmpeg/ Thanks for the suggestion. Sadly right now pyffmpeg won't install on Mac OS X, at least not with the version of Cython I have installed: http://code.google.com/p/pyffmpeg/issues/detail?id=44 There doesn't seem to have been any activity on the official repository for some time either. 
Peter From robert.kern at gmail.com Wed Jan 18 05:36:39 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Jan 2012 10:36:39 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays In-Reply-To: <CAKVJ-_7puzauUqSHSUNSD1tFZz0ySZ6ePrD1RmOGDCict2G4kQ@mail.gmail.com> References: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> <CAF6FJiuocxCrV7Hogq2-mHMB_X+T=FWi1hy_wmNfLmycrbpJQQ@mail.gmail.com> <CAKVJ-_5vmFrxmjRezmozwimfFS4OGKVxyt2AVaTTufRO-zOjgw@mail.gmail.com> <CAKVJ-_7puzauUqSHSUNSD1tFZz0ySZ6ePrD1RmOGDCict2G4kQ@mail.gmail.com> Message-ID: <CAF6FJiuAM1F_4vrNOdr9DHNOABvnLiZyuBAeO-j6Bi1kCwxJxw@mail.gmail.com> On Wed, Jan 18, 2012 at 10:19, Peter <numpy-discussion at maubp.freeserve.co.uk> wrote: > Sending this again (sorry Robert, this will be the second time > for you) since I sent from a non-subscribed email address the > first time. > > On Sun, Jan 15, 2012 at 7:12 PM, Robert Kern wrote: >> On Sun, Jan 15, 2012 at 19:10, Peter wrote: >>> Hello all, >>> >>> Is there a recommended (and ideally cross platform) >>> way to load the frames of a QuickTime movie (*.mov >>> file) in Python as NumPy arrays? ... >> >> I've had luck with pyffmpeg, though I haven't tried >> QuickTime .mov files: >> >> ?http://code.google.com/p/pyffmpeg/ > > Thanks for the suggestion. > > Sadly right now pyffmpeg won't install on Mac OS X, > at least not with the version of Cython I have installed: > http://code.google.com/p/pyffmpeg/issues/detail?id=44 > > There doesn't seem to have been any activity on the > official repository for some time either. Oh, right, I had to fix those, too. I've attached the patches that I used. I used MacPorts to install the ffmpeg libraries, so I modified the paths in the setup.py appropriately. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco -------------- next part -------------- A non-text attachment was scrubbed... Name: setup-fix.diff Type: application/octet-stream Size: 1103 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/ec5b0211/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: cinit-fix.diff Type: application/octet-stream Size: 2427 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/ec5b0211/attachment-0001.obj> From malcolm.reynolds at gmail.com Wed Jan 18 09:59:26 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Wed, 18 Jan 2012 14:59:26 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) Message-ID: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> Hi, I've built a system which allocates numpy arrays and processes them in C++ code (this is because I'm building a native code module using boost.python and it makes sense to use numpy data storage to then deal with outputs in python, without having to do any copying). Everything seems fine except when I parallelise the main loop, (openmp and TBB give the same results) in which case I see a whole bunch of messages saying "reference count error detected: an attempt was made to deallocate 12 (d)" sometimes during the running of the program, sometimes all at the end (presumably when all the destructors in my program run). 
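The "12 (d)" in that message is the type number and single-character code of numpy's builtin float64 descriptor -- the dtype whose reference count, as Robert explains further down in the thread, should never reach zero because numpy itself holds references to it. A quick way to check, assuming a stock numpy build:

>>> import numpy as np
>>> d = np.dtype('d')
>>> d.num, d.char, d.name
(12, 'd', 'float64')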
To clarify, the loop I am now running parallel takes read-only parameters (enforced by the C++ compiler using 'const') and as far as I can tell there are no race conditions with multiple threads writing to the same numpy arrays at once or anything obvious like that. I recompiled numpy (I'm using 1.6.1 from the official git repository) to print out some extra information with the reference count message, namely a pointer to the thing which is being erroneously deallocated. Surprisingly, it is always the same address for any run of the program, considering this is a message printed out hundreds of times. I've looked into this a little with GDB and as far as I can see the object which the message pertains to is an "array descriptor", or at least that's what I conclude from backtraces similar to the following: Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 1501 fprintf(stderr, "*** Reference count error detected: \n" \ (gdb) bt #0 arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 #1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 #2 0x0000000103e592d7 in boost::detail::sp_counted_impl_p<garf::multivariate_normal<double> const>::dispose (this=<value temporarily unavailable, due to optimizations>) at refcount.hpp:36 #3 .... my code Obviously I can turn off the parallelism to make this problem go away, but since my underlying algorithm is trivially parallelisable I was counting on being able to achieve linear speedup across cores.. Currently I can, and as far as I know there are no actual incorrect results being produced by the program. However, in my field (Machine Learning) it's difficult enough to know whether the numbers calculated are sensible even without the presence of these kind of warnings, so I'd like to get a handle on at least why this is happening so I'd know know whether I can safely ignore it. My guess at what might be happening is that the multiple threads are dealing with some object concurrently and the updates to the reference count are not processed atomically, meaning that there are too many DECREFs which happen later on. I had presumed that allocated different numpy matrices in different threads, and then all reading from central numpy matrices would work fine, but apparently there is something I missed, pertaining to descriptors.. Can anyone offer any guidance, or at least tell me this is safe to ignore? I can reproduce the problem reliably, so if you need me to do some digging with GDB at the point the error takes place I can do that. Many thanks, Malcolm From robert.kern at gmail.com Wed Jan 18 10:15:31 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Jan 2012 15:15:31 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) In-Reply-To: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> References: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> Message-ID: <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> On Wed, Jan 18, 2012 at 14:59, Malcolm Reynolds <malcolm.reynolds at gmail.com> wrote: > Hi, > > I've built a system which allocates numpy arrays and processes them in > C++ code (this is because I'm building a native code module using > boost.python and it makes sense to use numpy data storage to then deal > with outputs in python, without having to do any copying). 
Everything > seems fine except when I parallelise the main loop, (openmp and TBB > give the same results) in which case I see a whole bunch of messages > saying > > "reference count error detected: an attempt was made to deallocate 12 (d)" > > sometimes during the running of the program, sometimes all at the end > (presumably when all the destructors in my program run). > > To clarify, the loop I am now running parallel takes read-only > parameters (enforced by the C++ compiler using 'const') and as far as > I can tell there are no race conditions with multiple threads writing > to the same numpy arrays at once or anything obvious like that. > > I recompiled numpy (I'm using 1.6.1 from the official git repository) > to print out some extra information with the reference count message, > namely a pointer to the thing which is being erroneously deallocated. > Surprisingly, it is always the same address for any run of the > program, considering this is a message printed out hundreds of times. > > I've looked into this a little with GDB and as far as I can see the > object which the message pertains to is an "array descriptor", or at > least that's what I conclude from backtraces similar to the following: > > Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 > 1501 ? ? ? ? ? ?fprintf(stderr, "*** Reference count error detected: \n" \ > (gdb) bt > #0 ?arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 > #1 ?0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 > #2 ?0x0000000103e592d7 in > boost::detail::sp_counted_impl_p<garf::multivariate_normal<double> > const>::dispose (this=<value temporarily unavailable, due to > optimizations>) at refcount.hpp:36 > #3 .... my code I suspect there is some problem with the reference counting that you are doing at the C++ level that is causing you to do too many Py_DECREFs to the numpy objects, and this is being identified by the arraydescr_dealloc() routine. (By the way, arraydescrs are the C-level implementation of dtype objects.) Reading the comments just before descriptor.c:1501 points out that this warning is being printed because something is trying to deallocate the builtin np.dtype('d') == np.dtype('float64') dtype. This should never happen. The refcount for these objects should always be > 0 because numpy itself holds references to them. I suspect that you are obtaining the numpy object (1 Py_INCREF) before you split into multiple threads but releasing them in each thread (multiple Py_DECREFs). This is probably being hidden from you by the boost.python interface and/or the boost::detail::sp_counted_impl_p<> smart(ish) pointer. Check the backtrace where your code starts to verify if this looks to be the case. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From malcolm.reynolds at gmail.com Wed Jan 18 11:14:32 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Wed, 18 Jan 2012 16:14:32 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) In-Reply-To: <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> References: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> Message-ID: <CAO1Gn58gcaodhVfVma94m6GsVmVxCN6j6Lr2XujkUCcJYsNB_A@mail.gmail.com> > > I suspect that you are obtaining the numpy object (1 Py_INCREF) before > you split into multiple threads but releasing them in each thread > (multiple Py_DECREFs). This is probably being hidden from you by the > boost.python interface and/or the boost::detail::sp_counted_impl_p<> > smart(ish) pointer. Check the backtrace where your code starts to > verify if this looks to be the case. Thankyou for your quick reply. This makes a lot of sense, I'm just having trouble seeing where this could be happening as everything I pass into each parallel computation strand is pass down as either pointer-to-consts or reference-to-const - the only things that need to be modified (for example random number generator objects) are created uniquely inside each iteration of the for loop so it can't be that. This information about which object has the reference count problem helps though, I will keep digging. I'm vaguely planning on trying to track every incref and decref so I can pin down which object has an unbalanced amount - to do this I want to know the address of the array, rather than the associated datatype descriptor - I assume I want to pay attention to the (self=0x117e0e850) in this line, and that is the address of the array I am mishandling? #1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 Malcolm > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ? -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Wed Jan 18 11:54:53 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Jan 2012 16:54:53 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) In-Reply-To: <CAO1Gn58gcaodhVfVma94m6GsVmVxCN6j6Lr2XujkUCcJYsNB_A@mail.gmail.com> References: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> <CAO1Gn58gcaodhVfVma94m6GsVmVxCN6j6Lr2XujkUCcJYsNB_A@mail.gmail.com> Message-ID: <CAF6FJiuCE_CUBmmehKgM+RDYWr_MfPdg7+k2jmWqYJmceX6hmg@mail.gmail.com> On Wed, Jan 18, 2012 at 16:14, Malcolm Reynolds <malcolm.reynolds at gmail.com> wrote: >> >> I suspect that you are obtaining the numpy object (1 Py_INCREF) before >> you split into multiple threads but releasing them in each thread >> (multiple Py_DECREFs). This is probably being hidden from you by the >> boost.python interface and/or the boost::detail::sp_counted_impl_p<> >> smart(ish) pointer. Check the backtrace where your code starts to >> verify if this looks to be the case. > > Thankyou for your quick reply. 
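One crude way to watch for that kind of imbalance from the Python side, before digging into the C++ layer, is to compare an array's reference count before and after the threaded section. This is only a sketch with a dummy read-only worker, not the boost.python code under discussion:

import sys
import threading
import numpy as np

a = np.zeros((1000, 1000))

def worker(arr):
    arr.sum()   # read-only use; must leave the reference count balance intact

before = sys.getrefcount(a)
threads = [threading.Thread(target=worker, args=(a,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
after = sys.getrefcount(a)
# 'before' and 'after' should be equal; a smaller value afterwards means
# something released a reference it never acquired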
This makes a lot of sense, I'm just > having trouble seeing where this could be happening as everything I > pass into each parallel computation strand is pass down as either > pointer-to-consts or reference-to-const - the only things that need to > be modified (for example random number generator objects) are created > uniquely inside each iteration of the for loop so it can't be that. My C++-fu is fairly weak, so I'm never really sure what the smart pointers are doing when. If there are tracing features that you can turn on, try that. Is this deallocation of the smart pointer to the "garf::multivariate_normal<double> const" being done inside the loop or outside back in the main thread? Where did it get created? > This information about which object has the reference count problem > helps though, I will keep digging. I'm vaguely planning on trying to > track every incref and decref so I can pin down which object has an > unbalanced amount - to do this I want to know the address of the > array, rather than the associated datatype descriptor - I assume I > want to pay attention to the (self=0x117e0e850) in this line, and that > is the address of the array I am mishandling? > > #1 ?0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 Yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From olivier.grisel at ensta.org Wed Jan 18 14:54:12 2012 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 18 Jan 2012 20:54:12 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 Message-ID: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> Hi all, Just a quick email to advertise this year's PyCon tutorials as they are very focused on HPC & data analytics. In particular the numpy / scipy ecosystem is well covered, see: https://us.pycon.org/2012/schedule/tutorials/ Here is a selection of tutorials with an abstracts that mention numpy or a related project (scipy, ipython, matplotlib...): - Bayesian statistics made (as) simple (as possible) - Allen Downey https://us.pycon.org/2012/schedule/presentation/10/ - IPython in-depth: high-productivity interactive and parallel python - Fernando P?rez , Brian E. Granger , Min Ragan-Kelley https://us.pycon.org/2012/schedule/presentation/121/ - Faster Python Programs through Optimization - Mike M?ller https://us.pycon.org/2012/schedule/presentation/245/ - Graph Analysis from the Ground Up - Van Lindberg https://us.pycon.org/2012/schedule/presentation/228/ - Data analysis in Python with pandas - Wes McKinney https://us.pycon.org/2012/schedule/presentation/427/ - Social Network Analysis with Python - Maksim Tsvetovat https://us.pycon.org/2012/schedule/presentation/15/ - High Performance Python I - Ian Ozsvald https://us.pycon.org/2012/schedule/presentation/174/ - Plotting with matplotlib - Mike M?ller https://us.pycon.org/2012/schedule/presentation/238/ - Introduction to Interactive Predictive Analytics in Python with scikit-learn - Olivier Grisel https://us.pycon.org/2012/schedule/presentation/195/ - High Performance Python II - Travis Oliphant https://us.pycon.org/2012/schedule/presentation/343/ Also the main conference has also very interesting talks: https://us.pycon.org/2012/schedule/ The early birds rate for the PyCOn ends on Jan 25. 
See you in PyCon in March, -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From fperez.net at gmail.com Wed Jan 18 17:44:55 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 18 Jan 2012 14:44:55 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> Message-ID: <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair <scott.sinclair.za at gmail.com> wrote: > It's rather confusing having two websites. The "official" page at > http://www.scipy.org/Download points to github. The problem is that this page, which looks pretty official to just about anyone: http://numpy.scipy.org/ takes you to the one at new.scipy... So as far as traps for the unwary go, this one was pretty cleverly laid out ;) Best, f From chaoyuejoy at gmail.com Wed Jan 18 17:51:50 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 18 Jan 2012 23:51:50 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 In-Reply-To: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> References: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> Message-ID: <CAAN-aRGLV8EPFWpKXC25whbuOLj5cbDK-R1RCgtPeFOMAVWLvw@mail.gmail.com> Does anybody know if there is similar chance for training in Paris? (or other places of France)/ the price is nice, just because it's in US.... thanks, Chao 2012/1/18 Olivier Grisel <olivier.grisel at ensta.org> > Hi all, > > Just a quick email to advertise this year's PyCon tutorials as they > are very focused on HPC & data analytics. In particular the numpy / > scipy ecosystem is well covered, see: > > https://us.pycon.org/2012/schedule/tutorials/ > > Here is a selection of tutorials with an abstracts that mention numpy > or a related project (scipy, ipython, matplotlib...): > > - Bayesian statistics made (as) simple (as possible) - Allen Downey > https://us.pycon.org/2012/schedule/presentation/10/ > > - IPython in-depth: high-productivity interactive and parallel python > - Fernando P?rez , Brian E. Granger , Min Ragan-Kelley > https://us.pycon.org/2012/schedule/presentation/121/ > > - Faster Python Programs through Optimization - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/245/ > > - Graph Analysis from the Ground Up - Van Lindberg > https://us.pycon.org/2012/schedule/presentation/228/ > > - Data analysis in Python with pandas - Wes McKinney > https://us.pycon.org/2012/schedule/presentation/427/ > > - Social Network Analysis with Python - Maksim Tsvetovat > https://us.pycon.org/2012/schedule/presentation/15/ > > - High Performance Python I - Ian Ozsvald > https://us.pycon.org/2012/schedule/presentation/174/ > > - Plotting with matplotlib - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/238/ > > - Introduction to Interactive Predictive Analytics in Python with > scikit-learn - Olivier Grisel > https://us.pycon.org/2012/schedule/presentation/195/ > > - High Performance Python II - Travis Oliphant > https://us.pycon.org/2012/schedule/presentation/343/ > > Also the main conference has also very interesting talks: > > https://us.pycon.org/2012/schedule/ > > The early birds rate for the PyCOn ends on Jan 25. 
> > See you in PyCon in March, > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/7d8bed27/attachment.html> From scott.sinclair.za at gmail.com Thu Jan 19 01:19:24 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 19 Jan 2012 08:19:24 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> Message-ID: <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> On 19 January 2012 00:44, Fernando Perez <fperez.net at gmail.com> wrote: > On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair > <scott.sinclair.za at gmail.com> wrote: >> It's rather confusing having two websites. The "official" page at >> http://www.scipy.org/Download points to github. > > The problem is that this page, which looks pretty official to just about anyone: > > http://numpy.scipy.org/ > > takes you to the one at new.scipy... ?So as far as traps for the > unwary go, this one was pretty cleverly laid out ;) It certainly is. I think (as usual), the problem is that fixing the situation lies on the shoulders of people who are already heavily overburdened.. There is a pull request updating the offending page at https://github.com/scipy/scipy.org-new/pull/1 if any overburdened types feel like merging, building and uploading the revised html. Cheers, Scott From fperez.net at gmail.com Thu Jan 19 01:39:29 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 18 Jan 2012 22:39:29 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> Message-ID: <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> On Wed, Jan 18, 2012 at 10:19 PM, Scott Sinclair <scott.sinclair.za at gmail.com> wrote: > I think (as usual), the problem is that fixing the situation lies on > the shoulders of people who are already heavily overburdened.. I certainly understand that problem, as I'm eternally behind on a million things regarding ipython. But the only solution to these problems is delegation, not asking the already overburdened few to work even harder than they already do. I wonder if we could distribute the process of managing the websites a little more for numpy/scipy, so this didn't bottleneck as much. 
Furthermore, managing those is the kind of task that can be accomplished by someone who may not feel comfortable touching the numpy C core, and yet it's a *great* way to help the project out. In ipython, we've moved to github-pages hosting for everything, which means that now having a web team is as easy as clicking on the github interface a couple of times, and that's one more task we can get help on from others. In fairness, right now the ipython-web team is the same people as the core, but at least things are in place to accept new hands helping should they become available, without any conflict with core development. Just a thought. Cheers, f From staticfloat at gmail.com Thu Jan 19 01:50:04 2012 From: staticfloat at gmail.com (Elliot Saba) Date: Wed, 18 Jan 2012 22:50:04 -0800 Subject: [Numpy-discussion] Cross-covariance function Message-ID: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> Greetings, I recently needed to calculate the cross-covariance of two random vectors, (e.g. I have two matricies, X and Y, the columns of which are observations of one variable, and I wish to generate a matrix pairing each value of X and Y) and so I wrote a small utility function to do so, and I'd like to try and get it integrated into numpy core, if it is deemed useful. I have never submitted a patch to numpy before, so I'm not sure as to the protocol; do I ask someone on this list to review the code? Are there conventions I should be aware of? Etc... Thank you all, -E -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/b9c4abf7/attachment.html> From d.s.seljebotn at astro.uio.no Thu Jan 19 04:11:27 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Jan 2012 10:11:27 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 In-Reply-To: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> References: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> Message-ID: <4F17DE3F.7070104@astro.uio.no> On 01/18/2012 08:54 PM, Olivier Grisel wrote: > Hi all, > > Just a quick email to advertise this year's PyCon tutorials as they > are very focused on HPC& data analytics. In particular the numpy / > scipy ecosystem is well covered, see: > > https://us.pycon.org/2012/schedule/tutorials/ > > Here is a selection of tutorials with an abstracts that mention numpy > or a related project (scipy, ipython, matplotlib...): > > - Bayesian statistics made (as) simple (as possible) - Allen Downey > https://us.pycon.org/2012/schedule/presentation/10/ > > - IPython in-depth: high-productivity interactive and parallel python > - Fernando P?rez , Brian E. 
Granger , Min Ragan-Kelley > https://us.pycon.org/2012/schedule/presentation/121/ > > - Faster Python Programs through Optimization - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/245/ > > - Graph Analysis from the Ground Up - Van Lindberg > https://us.pycon.org/2012/schedule/presentation/228/ > > - Data analysis in Python with pandas - Wes McKinney > https://us.pycon.org/2012/schedule/presentation/427/ > > - Social Network Analysis with Python - Maksim Tsvetovat > https://us.pycon.org/2012/schedule/presentation/15/ > > - High Performance Python I - Ian Ozsvald > https://us.pycon.org/2012/schedule/presentation/174/ > > - Plotting with matplotlib - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/238/ > > - Introduction to Interactive Predictive Analytics in Python with > scikit-learn - Olivier Grisel > https://us.pycon.org/2012/schedule/presentation/195/ > > - High Performance Python II - Travis Oliphant > https://us.pycon.org/2012/schedule/presentation/343/ > Also two of the Cython devs (me and Mark Florisson) will attend with a poster on Cython. Dag Sverre From olivier.grisel at ensta.org Thu Jan 19 04:22:44 2012 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 19 Jan 2012 10:22:44 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 In-Reply-To: <CAAN-aRGLV8EPFWpKXC25whbuOLj5cbDK-R1RCgtPeFOMAVWLvw@mail.gmail.com> References: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> <CAAN-aRGLV8EPFWpKXC25whbuOLj5cbDK-R1RCgtPeFOMAVWLvw@mail.gmail.com> Message-ID: <CAFvE7K5fexJ1MMfyL7pzTrsOOEKf6S6U-6WX=tif+qL0Pw6TtA@mail.gmail.com> 2012/1/18 Chao YUE <chaoyuejoy at gmail.com>: > Does anybody know if there is similar chance for training in Paris? (or > other places of France)/ > the price is nice, just because it's in US.... The next EuroScipy will take place in Brussels. Just 1h25m train ride from Paris. http://www.euroscipy.org/conference/euroscipy2012 -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From markbak at gmail.com Thu Jan 19 04:37:00 2012 From: markbak at gmail.com (Mark Bakker) Date: Thu, 19 Jan 2012 10:37:00 +0100 Subject: [Numpy-discussion] swaxes(0, 1) 10% faster than transpose on 2D matrix? Message-ID: <CAEX=yaZ9AEvP_71secpnBVLSdewANtm41J5hAiE5CypsYMYfQA@mail.gmail.com> Hello List, I noticed that swapaxes(0,1) is consistently (on my system) 10% faster than transpose on a 2D matrix. Any reason why? Any reason why the swapaxes algorithm is not used in transpose? Just wondering. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120119/00cfbeba/attachment.html> From travis at continuum.io Thu Jan 19 12:21:31 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 19 Jan 2012 11:21:31 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> Message-ID: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> I think the problem here is one of delegation and information. I'm not even sure how the web-pages get updated at this point. 
Does anyone on this list know? I think it would be a great idea to move to github pages for the NumPy project at least. -Travis On Jan 19, 2012, at 12:39 AM, Fernando Perez wrote: > On Wed, Jan 18, 2012 at 10:19 PM, Scott Sinclair > <scott.sinclair.za at gmail.com> wrote: >> I think (as usual), the problem is that fixing the situation lies on >> the shoulders of people who are already heavily overburdened.. > > I certainly understand that problem, as I'm eternally behind on a > million things regarding ipython. > > But the only solution to these problems is delegation, not asking the > already overburdened few to work even harder than they already do. I > wonder if we could distribute the process of managing the websites a > little more for numpy/scipy, so this didn't bottleneck as much. > > Furthermore, managing those is the kind of task that can be > accomplished by someone who may not feel comfortable touching the > numpy C core, and yet it's a *great* way to help the project out. > > In ipython, we've moved to github-pages hosting for everything, which > means that now having a web team is as easy as clicking on the github > interface a couple of times, and that's one more task we can get help > on from others. In fairness, right now the ipython-web team is the > same people as the core, but at least things are in place to accept > new hands helping should they become available, without any conflict > with core development. > > Just a thought. > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Thu Jan 19 12:57:40 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 19 Jan 2012 18:57:40 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> Message-ID: <jf9lik$lu4$1@dough.gmane.org> 19.01.2012 18:21, Travis Oliphant kirjoitti: > I think the problem here is one of delegation and information. > > I'm not even sure how the web-pages get updated at this point. > Does anyone on this list know? I think it would be a great idea > to move to github pages for the NumPy project at least. The main scipy.org web page is the wiki. I'm not sure who apart from Enthought's IT staff has access to the machine running it. The pages at numpy.scipy.org and new.scipy.org are hosted on new.scipy.org as static files -- they're just generated by sphinx and uploaded manually. In addition to that, the machine also runs the Trac, the doc editor, and the conference.scipy.org and docs.scipy.org websites. A couple of people (including at least me and Jarrod + Enthought IT staff) have access to that machine. Moving the stuff at numpy.scipy.org to Github pages would make sense, as those are only static files. IMO, the stuff at new.scipy.org should be taken down --- the idea was to revise the scipy.org front page during Scipy '09 conference, and make it rely less on the wiki, but the work was not finished. 
I think I don't have the necessary unix permissions to put the site down or edit it, though. Pauli From kwgoodman at gmail.com Thu Jan 19 13:53:17 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 19 Jan 2012 10:53:17 -0800 Subject: [Numpy-discussion] swaxes(0, 1) 10% faster than transpose on 2D matrix? In-Reply-To: <CAEX=yaZ9AEvP_71secpnBVLSdewANtm41J5hAiE5CypsYMYfQA@mail.gmail.com> References: <CAEX=yaZ9AEvP_71secpnBVLSdewANtm41J5hAiE5CypsYMYfQA@mail.gmail.com> Message-ID: <CAB6Y534P757tFKgcjTw4=RU0wCJYuLy2EryenEDYZpmgp_d_Vg@mail.gmail.com> On Thu, Jan 19, 2012 at 1:37 AM, Mark Bakker <markbak at gmail.com> wrote: > I noticed that swapaxes(0,1) is consistently (on my system) 10% faster than > transpose on a 2D matrix. Transpose is faster for me. And a.T is faster than a.transpose() perhaps because a.transpose() checks that the inputs make sense? My guess is that they all do the same thing. It's just a matter of which function has the least overhead. I[10] a = np.random.rand(1000,1000) I[11] timeit a.T 10000000 loops, best of 3: 153 ns per loop I[12] timeit a.transpose() 10000000 loops, best of 3: 171 ns per loop I[13] timeit a.swapaxes(0,1) 1000000 loops, best of 3: 227 ns per loop I[14] timeit np.transpose(a) 1000000 loops, best of 3: 308 ns per loop From fperez.net at gmail.com Thu Jan 19 14:48:12 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 19 Jan 2012 11:48:12 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> Message-ID: <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> On Thu, Jan 19, 2012 at 9:21 AM, Travis Oliphant <travis at continuum.io> wrote: > I'm not even sure how the web-pages get updated at this point. ? Does anyone on this list know? ? ?I think it would be a great idea to move to github pages for the NumPy project at least. We've moved to the following setup with ipython, which works very well for us so far: 1. ipython.org: Main website with only static content, manged as a repo in github (https://github.com/ipython/ipython-website) and updated with a gh-pages build (https://github.com/ipython/ipython.github.com). 2. wiki.ipython.org: a mediawiki instance we run on a server I personally pay for. 3. archive.ipython.org: static hosting of content such as downloads of release candidates, same server as #2. We also keep main releases here as an alternative, but I think most people get the releases from pypi these days. With this setup, the only thing that requires actual ssh access is #3, and I simply uploaded the keys of a few developers to that server. But having to upload content there is fairly rare, and the large majority of content that needs update lives in #1 and #2, both of which have access control mechanisms that make job delegation extremely easy. At this point, our only real bottleneck is that I'm still the sole release manager so far. But now that we're hitting a more regular release pace I plan to change that soon, and start rotating this job too, so it doesn't depend on my time. 
We used to release so infrequently that this wasn't really an issue, and the 0.11 release was so big that I wouldn't foist it on anyone else (it took ~2 weeks just to do the release work), but moving forward this job should also be easy to delegate and we'll do so soon. I'm happy to share any other details that may help smooth out the workflow for numpy and scipy. I certainly think that the current setup with a very outdated wiki as the main site and a new-but-semi-invalid rst one needs fixing; it's kind of a shame to have the crown jewels of the scientific python ecosystem with such a poor web presence. But fortunately the problem isn't too hard to fix these days (the github machinery really plays a key part in helping here). Cheers, f From ognen at enthought.com Thu Jan 19 15:14:05 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Thu, 19 Jan 2012 14:14:05 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> Message-ID: <CAA6U3WC--PC9BsCbhRR6WQeGTeg9eyJjk32D8d_ejMZS=xCgZQ@mail.gmail.com> On Thu, Jan 19, 2012 at 1:48 PM, Fernando Perez <fperez.net at gmail.com> wrote: > On Thu, Jan 19, 2012 at 9:21 AM, Travis Oliphant <travis at continuum.io> wrote: >> I'm not even sure how the web-pages get updated at this point. ? Does anyone on this list know? ? ?I think it would be a great idea to move to github pages for the NumPy project at least. > > We've moved to the following setup with ipython, which works very well > for us so far: > > 1. ipython.org: Main website with only static content, manged as a > repo in github (https://github.com/ipython/ipython-website) and > updated with a gh-pages build > (https://github.com/ipython/ipython.github.com). > > 2. wiki.ipython.org: a mediawiki instance we run on a server I > personally pay for. > > 3. archive.ipython.org: static hosting of content such as downloads of > release candidates, same server as #2. ?We also keep main releases > here as an alternative, but I think most people get the releases from > pypi these days. > > With this setup, the only thing that requires actual ssh access is #3, > and I simply uploaded the keys of a few developers to that server. > But having to upload content there is fairly rare, and the large > majority of content that needs update lives in #1 and #2, both of > which have access control mechanisms that make job delegation > extremely easy. > > At this point, our only real bottleneck is that I'm still the sole > release manager so far. ?But now that we're hitting a more regular > release pace I plan to change that soon, and start rotating this job > too, so it doesn't depend on my time. ?We used to release so > infrequently that this wasn't really an issue, and the 0.11 release > was so big that I wouldn't foist it on anyone else (it took ~2 weeks > just to do the release work), but moving forward this job should also > be easy to delegate and we'll do so soon. 
> > I'm happy to share any other details that may help smooth out the > workflow for numpy and scipy. ?I certainly think that the current > setup with a very outdated wiki as the main site and a > new-but-semi-invalid rst one needs fixing; it's kind of a shame to > have the crown jewels of the scientific python ecosystem with such a > poor web presence. ?But fortunately the problem isn't too hard to fix > these days (the github machinery really plays a key part in helping > here). ipython.org used to live on scipy.org machine - as far as I can tell the only thing still on the scipy.org machine related to ipython are the dev and user mailing lists (via mailman) hosted at projects.scipy.org. Ognen From ognen at enthought.com Thu Jan 19 15:18:03 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Thu, 19 Jan 2012 14:18:03 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <jf9lik$lu4$1@dough.gmane.org> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <jf9lik$lu4$1@dough.gmane.org> Message-ID: <CAA6U3WDejzJzNzAyws7PxGCk=ngbXcxLRekOXc8C1yeC=-0NyA@mail.gmail.com> On Thu, Jan 19, 2012 at 11:57 AM, Pauli Virtanen <pav at iki.fi> wrote: > 19.01.2012 18:21, Travis Oliphant kirjoitti: >> I think the problem here is one of delegation and information. >> >> I'm not even sure how the web-pages get updated at this point. >> Does anyone on this list know? I think it would be a great idea >> to move to github pages for the NumPy project at least. > > The main scipy.org web page is the wiki. I'm not sure who apart from > Enthought's IT staff has access to the machine running it. This machine is slated to move to Amazon EC2 no later than end of March. I am doing it myself. The problem I ran into is one of accumulated crust (for lack of better expression). There are a zillion apache .conf files and virtual www sites hosted off that box, just deciding what it still live and what is not is a big task (I don't want to shut something off by accident). I would personally be in favour of moving as much as we can to github or whatever other place you may think of. The current scipy.org machine bogs down randomly and apache needs a kick almost daily. Whenever I log into the box to restart it - the load is in the 17-20 range. The scipy.org machine is actually an OpenVZ container living on an underpowered (imho) linux box. Hence, I decided to get a large Amazon instance with plenty of memory. At the same time, this is the perfect opportunity for cleanup. If someone is willing to assist me, I have no problems getting more involved into moving things and reorganizing them. 
Ognen From fperez.net at gmail.com Thu Jan 19 15:37:34 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 19 Jan 2012 12:37:34 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAA6U3WC--PC9BsCbhRR6WQeGTeg9eyJjk32D8d_ejMZS=xCgZQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> <CAA6U3WC--PC9BsCbhRR6WQeGTeg9eyJjk32D8d_ejMZS=xCgZQ@mail.gmail.com> Message-ID: <CAHAreOq=LOiENAWY83GqAKC-q72LQWtyQf2JjaFR0JqEqdAGOg@mail.gmail.com> On Thu, Jan 19, 2012 at 12:14 PM, Ognen Duzlevski <ognen at enthought.com> wrote: > ipython.org used to live on scipy.org machine - as far as I can tell > the only thing still on the scipy.org machine related to ipython are > the dev and user mailing lists (via mailman) hosted at > projects.scipy.org. Yup, we've now moved everything but the mailing lists (as you point out next, the load on that box was so awful all the time that trying to use it for anything was nothing but pain). Technically, when we were on scipy our domain was ipython.scipy.org, the ipython.org domain has been from the start hosted outside of the Enthought infrastructure; but that's just a nitpick :) Thanks for tackling the problem of cleaning up all that accumulated cruft, and for the always responsive support you gave us in the past. Cheers, f From ruby185 at gmail.com Thu Jan 19 23:28:08 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Thu, 19 Jan 2012 23:28:08 -0500 Subject: [Numpy-discussion] getting position index from array Message-ID: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> hi, all I am a newbie on numpy ... I am trying to figure out, given an array, how to get back position value based on some conditions. Say, array([1, 0, 0, 0 1], and I want to get a list of indices where it is none-zero, [ 0 , 4 ] The closest thing I can find from the doc is select(), but I can't figure out how to use it properly. Thanks for your help. Ruby From ben.root at ou.edu Thu Jan 19 23:33:02 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 19 Jan 2012 22:33:02 -0600 Subject: [Numpy-discussion] getting position index from array In-Reply-To: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> References: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> Message-ID: <CANNq6FnS5OGtmyVVcCVo8x+6o5HuVCSijTCUa5Sb+3Qk7SAUQA@mail.gmail.com> On Thursday, January 19, 2012, Ruby Stevenson <ruby185 at gmail.com> wrote: > hi, all > > I am a newbie on numpy ... I am trying to figure out, given an array, > how to get back position value based on some conditions. > Say, array([1, 0, 0, 0 1], and I want to get a list of indices where > it is none-zero, [ 0 , 4 ] > > The closest thing I can find from the doc is select(), but I can't > figure out how to use it properly. > > Thanks for your help. > > Ruby > np.nonzero() Note that you typically use it with a Boolean array result like "a >= 4". Also note that it returns a tuple of index lists, on for each dimension. This can the be feed back into the array to get the values as a flat array. 
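For instance, here is a minimal sketch of what that looks like in
practice (the array and its values are made up purely for illustration):

>>> import numpy as np
>>> a = np.array([1, 0, 0, 0, 1])
>>> np.nonzero(a)              # tuple with one index array per dimension
(array([0, 4]),)
>>> idx = np.nonzero(a >= 1)   # same idea with a Boolean condition
>>> a[idx]                     # feed the indices back in to get the values
array([1, 1])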
Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120119/5fbd06f0/attachment.html> From scott.sinclair.za at gmail.com Fri Jan 20 02:49:13 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 20 Jan 2012 09:49:13 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> Message-ID: <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> On 19 January 2012 21:48, Fernando Perez <fperez.net at gmail.com> wrote: > We've moved to the following setup with ipython, which works very well > for us so far: > > 1. ipython.org: Main website with only static content, manged as a > repo in github (https://github.com/ipython/ipython-website) and > updated with a gh-pages build > (https://github.com/ipython/ipython.github.com). I like this idea, and to get the ball rolling I've stripped out the www directory of the scipy.org-new repo into it's own repository using git filter-branch (posted here: https://github.com/scottza/scipy_website) and created https://github.com/scottza/scottza.github.com. This puts a copy of the new scipy website at http://scottza.github.com as a proof of concept. Since there seems to be some agreement on rehosting numpy's website on github, I'd be happy to do as much of the legwork as I can in getting the numpy.scipy.org content hosted at numpy.github.com. I don't have permission to create new repos for the Numpy organization, so someone would have to create an empty https://github.com/numpy/numpy.github.com and give me push permission on that repo. It would be great to see scipy go the same way and make updating the site easier. I know that David Warde-Farley, Pauli and others put in a lot of work scraping content off the wiki to produce the new website, it would be fantastic to see the fruits of that effort. Issues with scipy "Trac, the doc editor, and the conference.scipy.org and docs.scipy.org" as mentioned by Pauli. There is also the cookbook on the wiki to consider (perhaps http://scipy-central.org/ could play a role there). Cheers, Scott From valentin.haenel at epfl.ch Fri Jan 20 05:25:36 2012 From: valentin.haenel at epfl.ch (=?iso-8859-1?Q?H=E4nel?= Nikolaus Valentin) Date: Fri, 20 Jan 2012 11:25:36 +0100 Subject: [Numpy-discussion] (no subject) Message-ID: <20120120102536.GB18683@kudu.in-berlin.de> Hi, I would like to make a sanity test to check that calling the same function with different parameters actually gives different results. I am currently using:: try: npt.assert_almost_equal(numpy_result, result) except AssertionError: assert True else: assert False But maybe you have a better way? I couldn't find a 'assert_not_equal' and the above just feels stupid. thanks for your advice. 
V- -- Valentin H?nel Scientific Software Developer Blue Brain Project http://bluebrain.epfl.ch/ From shish at keba.be Fri Jan 20 06:53:04 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 20 Jan 2012 06:53:04 -0500 Subject: [Numpy-discussion] (no subject) In-Reply-To: <20120120102536.GB18683@kudu.in-berlin.de> References: <20120120102536.GB18683@kudu.in-berlin.de> Message-ID: <CAFXk4bp2nnZRh9v7kMdaSz=-gEfN61qbUpd+e5W4a=qb2yuWaQ@mail.gmail.com> Not sure if there's a better way, but you can do it with assert not numpy.allclose(numpy_result, result) -=- Olivier 2012/1/20 H?nel Nikolaus Valentin <valentin.haenel at epfl.ch> > Hi, > > I would like to make a sanity test to check that calling the same > function with different parameters actually gives different results. > > I am currently using:: > > try: > npt.assert_almost_equal(numpy_result, result) > except AssertionError: > assert True > else: > assert False > > But maybe you have a better way? I couldn't find a 'assert_not_equal' > and the above just feels stupid. > > thanks for your advice. > > V- > > -- > Valentin H?nel > Scientific Software Developer > Blue Brain Project http://bluebrain.epfl.ch/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120120/e1592eeb/attachment.html> From david.verelst at gmail.com Fri Jan 20 06:53:32 2012 From: david.verelst at gmail.com (David Verelst) Date: Fri, 20 Jan 2012 12:53:32 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> Message-ID: <4F1955BC.4000207@gmail.com> I would like to assist on the website. Although I have not made any code contributions to Numpy/SciPy (yet), I do follow the mailing lists and try to keep up to date on the scientific python scene. However, I need to hold my breath until the end of my wind tunnel test campaign mid February. And I do like the sound of the gihub workflow as currently done by the ipython team. Regards, David On 20/01/12 08:49, Scott Sinclair wrote: > On 19 January 2012 21:48, Fernando Perez<fperez.net at gmail.com> wrote: >> We've moved to the following setup with ipython, which works very well >> for us so far: >> >> 1. ipython.org: Main website with only static content, manged as a >> repo in github (https://github.com/ipython/ipython-website) and >> updated with a gh-pages build >> (https://github.com/ipython/ipython.github.com). > I like this idea, and to get the ball rolling I've stripped out the > www directory of the scipy.org-new repo into it's own repository using > git filter-branch (posted here: > https://github.com/scottza/scipy_website) and created > https://github.com/scottza/scottza.github.com. 
This puts a copy of the > new scipy website at http://scottza.github.com as a proof of concept. > > Since there seems to be some agreement on rehosting numpy's website on > github, I'd be happy to do as much of the legwork as I can in getting > the numpy.scipy.org content hosted at numpy.github.com. I don't have > permission to create new repos for the Numpy organization, so someone > would have to create an empty > https://github.com/numpy/numpy.github.com and give me push permission > on that repo. > > It would be great to see scipy go the same way and make updating the > site easier. I know that David Warde-Farley, Pauli and others put in a > lot of work scraping content off the wiki to produce the new website, > it would be fantastic to see the fruits of that effort. > > Issues with scipy "Trac, the doc editor, and the conference.scipy.org > and docs.scipy.org" as mentioned by Pauli. There is also the cookbook > on the wiki to consider (perhaps http://scipy-central.org/ could play > a role there). > > Cheers, > Scott > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From valentin.haenel at epfl.ch Fri Jan 20 07:27:36 2012 From: valentin.haenel at epfl.ch (=?iso-8859-1?Q?H=E4nel?= Nikolaus Valentin) Date: Fri, 20 Jan 2012 13:27:36 +0100 Subject: [Numpy-discussion] (no subject) In-Reply-To: <CAFXk4bp2nnZRh9v7kMdaSz=-gEfN61qbUpd+e5W4a=qb2yuWaQ@mail.gmail.com> References: <20120120102536.GB18683@kudu.in-berlin.de> <CAFXk4bp2nnZRh9v7kMdaSz=-gEfN61qbUpd+e5W4a=qb2yuWaQ@mail.gmail.com> Message-ID: <20120120122736.GD18683@kudu.in-berlin.de> * Olivier Delalleau <shish at keba.be> [120120]: > Not sure if there's a better way, but you can do it with > > assert not numpy.allclose(numpy_result, result) Okay, thats already better than what I have. thanks V- From pierre.haessig at crans.org Fri Jan 20 07:39:00 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 20 Jan 2012 13:39:00 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> References: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> Message-ID: <4F196064.5090508@crans.org> Hi Eliot, Le 19/01/2012 07:50, Elliot Saba a ?crit : > I recently needed to calculate the cross-covariance of two random > vectors, (e.g. I have two matricies, X and Y, the columns of which are > observations of one variable, and I wish to generate a matrix pairing > each value of X and Y) I don't see how does your function relates to numpy.cov [1]. Is it an "extended case" function or is there a difference in the underlying math ? Best, Pierre [1] numpy.cov docstring : http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html From ruby185 at gmail.com Fri Jan 20 09:21:10 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Fri, 20 Jan 2012 09:21:10 -0500 Subject: [Numpy-discussion] getting position index from array In-Reply-To: <CANNq6FnS5OGtmyVVcCVo8x+6o5HuVCSijTCUa5Sb+3Qk7SAUQA@mail.gmail.com> References: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> <CANNq6FnS5OGtmyVVcCVo8x+6o5HuVCSijTCUa5Sb+3Qk7SAUQA@mail.gmail.com> Message-ID: <CAA=a5iNWKBfgjLF9FYLkJU5KunxMr0B0__XSynCqs9zQReg=fw@mail.gmail.com> Exactly what I need - thank you very much. 
Ruby On Thu, Jan 19, 2012 at 11:33 PM, Benjamin Root <ben.root at ou.edu> wrote: > > > On Thursday, January 19, 2012, Ruby Stevenson <ruby185 at gmail.com> wrote: >> hi, all >> >> I am a newbie on numpy ... I am trying to figure out, given an array, >> how to get back position value based on some conditions. >> Say, array([1, 0, 0, 0 1], and I want to get a list of indices where >> it is none-zero, [ 0 , 4 ] >> >> The closest thing I can find from the doc is select(), but I can't >> figure out how to use it properly. >> >> Thanks for your help. >> >> Ruby >> > > np.nonzero() > > Note that you typically use it with a Boolean array result like "a >= 4". > ?Also note that it returns a tuple of index lists, on for each dimension. > ?This can the be feed back into the array to get the values as a flat array. > > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ruby185 at gmail.com Fri Jan 20 09:41:30 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Fri, 20 Jan 2012 09:41:30 -0500 Subject: [Numpy-discussion] condense array along one dimension Message-ID: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> hi, all Say I have a three dimension array, X, Y, Z, how can I condense into two dimensions: for example, compute 2-D array with (X, Z) and summarize along Y dimensions ... is it possible? thanks Ruby From shish at keba.be Fri Jan 20 09:50:30 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 20 Jan 2012 09:50:30 -0500 Subject: [Numpy-discussion] condense array along one dimension In-Reply-To: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> References: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> Message-ID: <CAFXk4bp7X1H9Qe3-jb5AMCL_ufuK5ASx0emczpLfFYV5f6r7ZQ@mail.gmail.com> What do you mean by "summarize"? If for instance you want to sum along Y, just do my_array.sum(axis=1) -=- Olivier 2012/1/20 Ruby Stevenson <ruby185 at gmail.com> > hi, all > > Say I have a three dimension array, X, Y, Z, how can I condense into > two dimensions: for example, compute 2-D array with (X, Z) and > summarize along Y dimensions ... is it possible? > > thanks > > Ruby > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120120/c12b180c/attachment.html> From sturla at molden.no Fri Jan 20 10:30:42 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 20 Jan 2012 16:30:42 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F196064.5090508@crans.org> References: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> <4F196064.5090508@crans.org> Message-ID: <4F1988A2.5000701@molden.no> Den 20.01.2012 13:39, skrev Pierre Haessig: > I don't see how does your function relates to numpy.cov [1]. Is it an > "extended case" function or is there a difference in the underlying math ? > If X is rank n x p, then np.cov(X, rowvar=False) is equal to S after cX = X - X.mean(axis=0)[np.newaxis,:] S = np.dot(cX.T, cX)/(n-1.) 
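A quick, made-up check of that equivalence (random data, purely for
illustration):

>>> import numpy as np
>>> n, p = 50, 3
>>> X = np.random.rand(n, p)
>>> cX = X - X.mean(axis=0)[np.newaxis, :]
>>> S = np.dot(cX.T, cX) / (n - 1.)
>>> np.allclose(S, np.cov(X, rowvar=False))
True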
If we also have Y of rank n x p, then np.cov(X, y=Y, rowvar=False) is equal to S after

XY = np.hstack((X, Y))
cXY = XY - XY.mean(axis=0)[np.newaxis,:]
S = np.dot(cXY.T, cXY)/(n-1.)

Thus we can see that the total covariance is composed of four parts
(with cY = Y - Y.mean(axis=0)[np.newaxis,:], i.e. Y centered the same way as X):

S[:p,:p] = np.dot(cX.T, cX)/(n-1.)  # upper left
S[:p,p:] = np.dot(cX.T, cY)/(n-1.)  # upper right
S[p:,:p] = np.dot(cY.T, cX)/(n-1.)  # lower left
S[p:,p:] = np.dot(cY.T, cY)/(n-1.)  # lower right

Often we just want the upper-right p x p quadrant. Thus we can save
75 % of the cpu time by not computing the rest.

Sturla

From pierre.haessig at crans.org Fri Jan 20 13:04:26 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Fri, 20 Jan 2012 19:04:26 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <4F1988A2.5000701@molden.no>
References: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> <4F196064.5090508@crans.org> <4F1988A2.5000701@molden.no>
Message-ID: <4F19ACAA.2060707@crans.org>

Le 20/01/2012 16:30, Sturla Molden a écrit :
> Often we just want the upper-right p x p quadrant.
Thanks for the explanation. If I understood it correctly, you're
interested in the *cross*-covariance block of the matrix (and now I
understand better Elliot's message).

Actually, I thought that was the behavior of the np.cov function! But
you're right, it's not! [source code] The second 'y' argument just gets
concatenated with the first one 'm'.

I would go further and ask why it is so. People around may have use cases
in mind, because I have not. Otherwise, I feel that the default behavior
of cov when called with two arguments should be what Sturla and Elliot
just described.

Best,
Pierre

(that is something like this :

def cov(X, Y=None):
    if Y is None:
        Y = X
    else:
        assert Y.shape == X.shape  # or something like that
    # [...jumping to the end of the existing code...]
    if not rowvar:
        return (dot(X.T, Y.conj()) / fact).squeeze()
    else:
        return (dot(X, Y.T.conj()) / fact).squeeze()
)

[source code]
https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py

From fperez.net at gmail.com Fri Jan 20 16:26:33 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Fri, 20 Jan 2012 13:26:33 -0800
Subject: [Numpy-discussion] Download page still points to SVN
In-Reply-To: <4F1955BC.4000207@gmail.com>
References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> <4F1955BC.4000207@gmail.com>
Message-ID: <CAHAreOo5BaB8VSAhsOH_4Tn2PqL=QZs7jrvh7XRvKjc3Ri=s-w@mail.gmail.com>

On Fri, Jan 20, 2012 at 3:53 AM, David Verelst <david.verelst at gmail.com> wrote:
> I would like to assist on the website. Although I have not made any code
> contributions to Numpy/SciPy (yet), I do follow the mailing lists and
> try to keep up to date on the scientific python scene. However, I need
> to hold my breath until the end of my wind tunnel test campaign mid
> February.

Fantastic, thanks. I think the ideal setup would be to create a web
team in the numpy org. so that this team can have permissions over the
website repos (source and build).
I don't belong to the org so I can't do it myself. > And I do like the sound of the gihub workflow as currently done by the > ipython team. Don't hesitate to ask us if you have any questions. In particular, it's important *not* to use gh-pages like they originally suggest, but instead like we do it in ipython: the build should be a separate repo altogether, not just a branch in the official source repo. Ours has the makefile targets and scripts already for that, let me know if any of it doesn't make sense. Cheers, f From lamblinp at iro.umontreal.ca Fri Jan 20 16:55:43 2012 From: lamblinp at iro.umontreal.ca (Pascal Lamblin) Date: Fri, 20 Jan 2012 22:55:43 +0100 Subject: [Numpy-discussion] Upgrade to 1.6.x: frompyfunc() ufunc casting issue In-Reply-To: <CANZ39W3ZS1LNVRxanjr34p+P9O65O5vaHxSNLmOsaVgew8N=Lw@mail.gmail.com> References: <mailman.687.1327095181.1085.numpy-discussion@scipy.org> Message-ID: <20120120215543.GC10327@bob.blip.be> Hi everyone, A long time ago, Aditya Sethi <ady.sethi at gmail... wrote: > I am facing an issue upgrading numpy from 1.5.1 to 1.6.1. > In numPy 1.6, the casting behaviour for ufunc has changed and has become > stricter. > > Can someone advise how to implement the below simple example which worked in > 1.5.1 but fails in 1.6.1? > > >>> import numpy as np > >>> def add(a,b): > ... return (a+b) > >>> uadd = np.frompyfunc(add,2,1) > >>> uadd > <ufunc 'add (vectorized)'> > >>> uadd.accumulate([1,2,3]) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ValueError: could not find a matching type for add (vectorized).accumulate, > requested type has type code 'l' Here's the workaround I found to that problem: >>> uadd.accumulate([1,2,3], dtype='object') array([1, 3, 6], dtype=object) It seems like "accumulate" infers that 'l' is the required output dtype, but does not have the appropriate implementation: >>> uadd.types ['OO->O'] Forcing the output dtype to be 'object' (the only supported dtype) seems to do the trick. Hope this helps, -- Pascal From torgil.svensson at gmail.com Sat Jan 21 08:49:24 2012 From: torgil.svensson at gmail.com (Torgil Svensson) Date: Sat, 21 Jan 2012 14:49:24 +0100 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> References: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> Message-ID: <CA+RwOBWf6hfGtRX2ps+Gx0pj2fDozKMQTOUz+k4GPCXdosjOQw@mail.gmail.com> unique has an option to get indexes out which you can use in combination with sort to get the actual counts out. tab0 = zeros( 256*256*256 , dtype=int) col=ravel(((im0[...,0].astype('u4')*256+im0[...,1])*256)+im0[...,2]) col,idx=unique(sort(col),True) idx=hstack([idx,[2500*2500]]) tab0[col]=idx[1:]-idx[:-1] tab0.shape=(256,256,256) As Chris pointed out, if each pixel were 4 bytes you could probably just use im0.view('>u4') for histogram values. //Torgil On Wed, Jan 18, 2012 at 10:26 AM, <apo at pdauf.de> wrote: > > Sorry, > > that i use this way to send an answer to Tony Yu , Nadav Horesh , Chris Barker. > When iam direct answering on Your e-mail i get an error 5. > I think i did a mistake. > > Your ideas are very helpfull and the code is very fast. 
> > Thank You > > elodw > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat Jan 21 12:06:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 21 Jan 2012 10:06:09 -0700 Subject: [Numpy-discussion] views and mask NA In-Reply-To: <CAB6mnxL=sFZOT5Wih09UinGHz2RHB3vn=MQy8L5cdEx9A6kmgA@mail.gmail.com> References: <CAB6mnxL=sFZOT5Wih09UinGHz2RHB3vn=MQy8L5cdEx9A6kmgA@mail.gmail.com> Message-ID: <CAB6mnxL5Vv8DS4DPqDOd-_oRMeOqx2rNA1kdF6BE61sCt7AULA@mail.gmail.com> Hi All, I'd like some feedback on how mask NA should interact with views. The immediate problem is how to deal with the real and imaginary parts of complex numbers. If the original has a masked value, it should show up as masked in the real and imaginary parts. But what should happen on assignment to one of the masked views? This should probably clear the NA in the real/imag part, but not in the complex original. However, that does allow touching things under the mask, so to speak. Things get more complicated if the complex original is viewed as reals. In this case the mask needs to be "doubled" up, and there is again the possibility of touching things beneath the mask in the original. Viewing the original as bytes leads to even greater duplication. My thought is that touching the underlying data needs to be allowed in these cases, but the original mask can only be cleared by assignment to the original. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/438da4a5/attachment.html> From staticfloat at gmail.com Sat Jan 21 15:47:36 2012 From: staticfloat at gmail.com (Elliot Saba) Date: Sat, 21 Jan 2012 12:47:36 -0800 Subject: [Numpy-discussion] Cross-covariance function Message-ID: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> Thank you Sturla, that's exactly what I want. I'm sorry that I was not able to reply for so long, but Pierre's code is similar to what I have already implemented, and I am in support of changing the functionality of cov(). I am unaware of any arguments for a covariance function that works in this way, except for the fact that the MATLAB cov() function behaves in the same way. [1] MATLAB, however, has an xcov() function, which is similar to what we have been discussing. [2] Unless you all wish to retain compatibility with MATLAB, I feel that the behaviour of cov() suggested by Pierre is the most straightforward method, and that if users wish to calculate the covariance of X concatenated with Y, then they may simply concatenate the matrices explicitly before passing into cov(), as this way the default method does not use 75% more CPU time. Again, if there is an argument for this functionality, I would love to learn of it! -E [1] http://www.mathworks.com/help//techdoc/ref/cov.html [2] http://www.mathworks.com/help/toolbox/signal/ref/xcov.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/040fa1c9/attachment.html> From jsalvati at u.washington.edu Sat Jan 21 18:26:30 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Sat, 21 Jan 2012 15:26:30 -0800 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> Message-ID: <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> I ran into this a while ago and was confused why cov did not behave the way pierre suggested. On Jan 21, 2012 12:48 PM, "Elliot Saba" <staticfloat at gmail.com> wrote: > Thank you Sturla, that's exactly what I want. > > I'm sorry that I was not able to reply for so long, but Pierre's code is > similar to what I have already implemented, and I am in support of changing > the functionality of cov(). I am unaware of any arguments for a covariance > function that works in this way, except for the fact that the MATLAB cov() > function behaves in the same way. [1] > > MATLAB, however, has an xcov() function, which is similar to what we have > been discussing. [2] > > Unless you all wish to retain compatibility with MATLAB, I feel that the > behaviour of cov() suggested by Pierre is the most straightforward method, > and that if users wish to calculate the covariance of X concatenated with > Y, then they may simply concatenate the matrices explicitly before passing > into cov(), as this way the default method does not use 75% more CPU time. > > Again, if there is an argument for this functionality, I would love to > learn of it! > -E > > [1] http://www.mathworks.com/help//techdoc/ref/cov.html > [2] http://www.mathworks.com/help/toolbox/signal/ref/xcov.html > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/8c93072e/attachment.html> From josef.pktd at gmail.com Sat Jan 21 19:40:34 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Jan 2012 19:40:34 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> Message-ID: <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> On Sat, Jan 21, 2012 at 6:26 PM, John Salvatier <jsalvati at u.washington.edu> wrote: > I ran into this a while ago and was confused why cov did not behave the way > pierre suggested. same here, When I rewrote scipy.stats.spearmanr, I matched the numpy behavior for two arrays, while R only returns the cross-correlation part. Josef > > On Jan 21, 2012 12:48 PM, "Elliot Saba" <staticfloat at gmail.com> wrote: >> >> Thank you Sturla, that's exactly what I want. >> >> I'm sorry that I was not able to reply for so long, but Pierre's code is >> similar to what I have already implemented, and I am in support of changing >> the functionality of cov(). ?I am unaware of any arguments for a covariance >> function that works in this way, except for the fact that the MATLAB cov() >> function behaves in the same way. 
[1] >> >> MATLAB, however, has an xcov() function, which is similar to what we have >> been discussing. [2] >> >> Unless you all wish to retain compatibility with MATLAB, I feel that the >> behaviour of cov() suggested by Pierre is the most straightforward method, >> and that if users wish to calculate the covariance of X concatenated with Y, >> then they may simply concatenate the matrices explicitly before passing into >> cov(), as this way the default method does not use 75% more CPU time. >> >> Again, if there is an argument for this functionality, I would love to >> learn of it! >> -E >> >> [1]?http://www.mathworks.com/help//techdoc/ref/cov.html >> [2]?http://www.mathworks.com/help/toolbox/signal/ref/xcov.html >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ondrej.certik at gmail.com Sat Jan 21 22:55:10 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sat, 21 Jan 2012 19:55:10 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> Message-ID: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Hi, I read the Mandelbrot code using NumPy at this page: http://mentat.za.net/numpy/intro/intro.html but when I run it, it gives me integer overflows. As such, I have fixed the code, so that it doesn't overflow here: https://gist.github.com/1655320 and I have also written an equivalent Fortran program. You can compare both source codes to see that that it is pretty much one-to-one translation. The main idea in the above gist is to take an algorithm written in NumPy, and translate it directly to Fortran, without any special optimizations. So the above is my first try in Fortran. You can plot the result using this simple script (you can also just click on this gist to see the image there): https://gist.github.com/1655377 Here are my timings: Python Fortran Speedup Calculation 12.749 00.784 16.3x Saving 01.904 01.456 1.3x Total 14.653 02.240 6.5x I save the matrices to disk in an ascii format, so it's quite slow in both cases. The pure computation is however 16x faster in Fortran (in gfortran, I didn't even try Intel Fortran, that will probably be even faster). As such, I wonder how the NumPy version could be sped up? I have compiled NumPy with Lapack+Blas from source. Would anyone be willing to run the NumPy version? Just copy+paste should do it. If you want to run the Fortran version, the above gist uses some of my other modules that I use in my other programs, my goal was to see how much more complicated the Fortran code gets, compared to NumPy. As such, I put here https://gist.github.com/1655350 a single file with all the dependencies, just compile it like this: gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90 and run: $ ./a.out Iteration 1 Iteration 2 ... Iteration 100 Saving... Times: Calculation: 0.74804599999999999 Saving: 1.3640850000000002 Total: 2.1121310000000002 Let me know if you figure out something. 
I think the "mask" thing is quite slow, but the problem is that it needs to be there, to catch overflows (and it is there in Fortran as well, see the "where" statement, which does the same thing). Maybe there is some other way to write the same thing in NumPy? Ondrej From pengyu.ut at gmail.com Sat Jan 21 23:25:22 2012 From: pengyu.ut at gmail.com (Peng Yu) Date: Sat, 21 Jan 2012 22:25:22 -0600 Subject: [Numpy-discussion] Easy module installation with less human intervention. Message-ID: <CABrM6w=rehitc+w5aLzPqf8w10U3s16-tKDjC7SaX4=B10eOZQ@mail.gmail.com> Hi, Perl has something like ppm so that I can just use one command to download and install perl modules. But I don't find such thing in python. As shown on http://docs.python.org/install/index.html, it seems that I have to download the packages first unzip it then install it. I'm wondering if there is a better way to install python packages that require less human intervention. Thanks! NAME ppm - Perl Package Manager, version 4.14 SYNOPSIS Invoke the graphical user interface: ppm ppm gui Install, upgrade and remove packages: ppm install [--area <area>] [--force] <pkg> ppm install [--area <area>] [--force] <module> ppm install [--area <area>] <url> ppm install [--area <area>] <file>.ppmx ppm install [--area <area>] <file>.ppd ppm install [--area <area>] <num> ppm upgrade [--install] ppm upgrade <pkg> ppm upgrade <module> ppm remove [--area <area>] [--force] <pkg> -- Regards, Peng From shish at keba.be Sat Jan 21 23:34:57 2012 From: shish at keba.be (Olivier Delalleau) Date: Sat, 21 Jan 2012 23:34:57 -0500 Subject: [Numpy-discussion] Easy module installation with less human intervention. In-Reply-To: <CABrM6w=rehitc+w5aLzPqf8w10U3s16-tKDjC7SaX4=B10eOZQ@mail.gmail.com> References: <CABrM6w=rehitc+w5aLzPqf8w10U3s16-tKDjC7SaX4=B10eOZQ@mail.gmail.com> Message-ID: <CAFXk4bq=Qg1mZEMLtmrwiZRF+hd9NtS7dumJ4Q_ozcA8YEmoUw@mail.gmail.com> You can try easy_install or pip. -=- Olivier 2012/1/21 Peng Yu <pengyu.ut at gmail.com> > Hi, > > > Perl has something like ppm so that I can just use one command to > download and install perl modules. But I don't find such thing in > python. As shown on http://docs.python.org/install/index.html, it > seems that I have to download the packages first unzip it then install > it. I'm wondering if there is a better way to install python packages > that require less human intervention. Thanks! > > > NAME > ppm - Perl Package Manager, version 4.14 > > SYNOPSIS > Invoke the graphical user interface: > > ppm > ppm gui > > Install, upgrade and remove packages: > > ppm install [--area <area>] [--force] <pkg> > ppm install [--area <area>] [--force] <module> > ppm install [--area <area>] <url> > ppm install [--area <area>] <file>.ppmx > ppm install [--area <area>] <file>.ppd > ppm install [--area <area>] <num> > ppm upgrade [--install] > ppm upgrade <pkg> > ppm upgrade <module> > ppm remove [--area <area>] [--force] <pkg> > > > > -- > Regards, > Peng > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/b61dcf55/attachment.html> From nadavh at visionsense.com Sun Jan 22 05:28:53 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 22 Jan 2012 02:28:53 -0800 Subject: [Numpy-discussion] Strange error raised by scipy.special.erf Message-ID: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local> With N.seterr(all='raise'): >>> from scipy import special >>> import scipy >>> special.erf(26.6) 1.0 >>> scipy.__version__ '0.11.0.dev-81dc505' >>> import numpy as N >>> N.seterr(all='raise') {'over': 'warn', 'divide': 'warn', 'invalid': 'warn', 'under': 'ignore'} >>> special.erf(26.5) 1.0 >>> special.erf(26.6) Traceback (most recent call last): File "<pyshell#7>", line 1, in <module> special.erf(26.6) FloatingPointError: underflow encountered in erf >>> special.erf(26.7) 1.0 What is so special in 26.6? I have this error also with previous versions of scipy Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120122/444ea5ab/attachment.html> From seb.haase at gmail.com Sun Jan 22 06:13:32 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Sun, 22 Jan 2012 12:13:32 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> How does the?algorithm and timing compare to this one: http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f The author of original version is Dan Goodman # FAST FRACTALS WITH PYTHON AND NUMPY -Sebastian Haase 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com> > > Hi, > > I read the Mandelbrot code using NumPy at this page: > > http://mentat.za.net/numpy/intro/intro.html > > but when I run it, it gives me integer overflows. As such, I have > fixed the code, so that it doesn't overflow here: > > https://gist.github.com/1655320 > > and I have also written an equivalent Fortran program. > > You can compare both source codes to see > that that it is pretty much one-to-one translation. > The main idea in the above gist is to take an > algorithm written in NumPy, and translate > it directly to Fortran, without any special > optimizations. So the above is my first try > in Fortran. You can plot the result > using this simple script (you > can also just click on this gist to > see the image there): > > https://gist.github.com/1655377 > > Here are my timings: > > ? ? ? ? ? ? ? Python ?Fortran Speedup > Calculation ? ? 12.749 ?00.784 ?16.3x > Saving ?01.904 ?01.456 ?1.3x > Total ? ? ? ? ?14.653 ? 02.240 ?6.5x > > I save the matrices to disk in an ascii format, > so it's quite slow in both cases. The pure computation > is however 16x faster in Fortran (in gfortran, > I didn't even try Intel Fortran, that will probably be > even faster). > > As such, I wonder how the NumPy version could be sped up? > I have compiled NumPy with Lapack+Blas from source. > > Would anyone be willing to run the NumPy version? Just copy+paste > should do it. 
> > If you want to run the Fortran version, the above gist uses > some of my other modules that I use in my other programs, my goal > was to see how much more complicated the Fortran code gets, > compared to NumPy. As such, I put here > > https://gist.github.com/1655350 > > a single file > with all the dependencies, just compile it like this: > > gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90 > > and run: > > $ ./a.out > Iteration 1 > Iteration 2 > ... > Iteration 100 > ?Saving... > ?Times: > ?Calculation: ?0.74804599999999999 > ?Saving: ? 1.3640850000000002 > ?Total: ? 2.1121310000000002 > > > Let me know if you figure out something. I think the "mask" thing is > quite slow, but the problem is that it needs to be there, to catch > overflows (and it is there in Fortran as well, see the > "where" statement, which does the same thing). Maybe there is some > other way to write the same thing in NumPy? > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Sun Jan 22 12:29:20 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 22 Jan 2012 18:29:20 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <4F1C4770.2040603@astro.uio.no> On 01/22/2012 04:55 AM, Ond?ej ?ert?k wrote: > Hi, > > I read the Mandelbrot code using NumPy at this page: > > http://mentat.za.net/numpy/intro/intro.html > > but when I run it, it gives me integer overflows. As such, I have > fixed the code, so that it doesn't overflow here: > > https://gist.github.com/1655320 > > and I have also written an equivalent Fortran program. > > You can compare both source codes to see > that that it is pretty much one-to-one translation. > The main idea in the above gist is to take an > algorithm written in NumPy, and translate > it directly to Fortran, without any special > optimizations. So the above is my first try > in Fortran. You can plot the result > using this simple script (you > can also just click on this gist to > see the image there): > > https://gist.github.com/1655377 > > Here are my timings: > > Python Fortran Speedup > Calculation 12.749 00.784 16.3x > Saving 01.904 01.456 1.3x > Total 14.653 02.240 6.5x > > I save the matrices to disk in an ascii format, > so it's quite slow in both cases. The pure computation > is however 16x faster in Fortran (in gfortran, > I didn't even try Intel Fortran, that will probably be > even faster). > > As such, I wonder how the NumPy version could be sped up? > I have compiled NumPy with Lapack+Blas from source. This is a pretty well known weakness with NumPy. In the Python code at least, each of c and z are about 15 MB, and the mask about 1 MB. So that doesn't fit in CPU cache, and so each and every statement you do in the loop transfer that data in and out of CPU cache the memory bus. There's no quick fix -- you can try to reduce the working set so that it fits in CPU cache, but then the Python overhead often comes into play. Solutions include numexpr and Theano -- and as often as not, Cython and Fortran. 
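To make that concrete, a rough numexpr sketch of the escape-time loop
could look like the following (this is not code from the thread; it assumes
numexpr's complex128 support and its real()/imag() functions, and the
function and variable names are invented for illustration):

import numpy as np
import numexpr as ne

def mandelbrot_counts(c, maxit=100):
    # c: complex128 grid of candidate points; returns escape-time counts
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=np.int32)
    for _ in range(maxit):
        # one blocked pass over the whole grid for the update ...
        z = ne.evaluate("z*z + c")
        # ... and one for the divergence test; points that have escaped
        # overflow to inf/nan and simply fail the test from then on
        bounded = ne.evaluate("real(z)**2 + imag(z)**2 <= 4.0")
        counts += bounded
    return counts

Each evaluate() call processes the arrays in cache-sized blocks, so the
intermediates of a compound expression never have to stream through main
memory as full 15 MB temporaries -- which is exactly the working-set
problem described above.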
It's a good example, thanks!, Dag Sverre From ondrej.certik at gmail.com Sun Jan 22 14:01:41 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 22 Jan 2012 11:01:41 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> Message-ID: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase <seb.haase at gmail.com> wrote: > How does the?algorithm and timing compare to this one: > > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f > > The author of original version is ?Dan Goodman > # FAST FRACTALS WITH PYTHON AND NUMPY Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with the same dimensions/iterations. It uses a better datastructures -- it only keeps track of points that still need to be iterated --- very clever. If I have time, I'll try to provide an equivalent Fortran version too, for comparison. Ondrej From chris.barker at noaa.gov Sun Jan 22 23:31:30 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Sun, 22 Jan 2012 20:31:30 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> Message-ID: <CALGmxEJtKUEanrcipSYeH=he8Gc7BTU64pjNJZfQWdrueUA8yg@mail.gmail.com> 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com>: > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. > > Ondrej here is a Cython example: http://wiki.cython.org/examples/mandelbrot I haven't looked to see if it's the same algorithm, but it may be instructive, none the less. -Chris -- -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From jrocher at enthought.com Sun Jan 22 23:35:03 2012 From: jrocher at enthought.com (Jonathan Rocher) Date: Sun, 22 Jan 2012 22:35:03 -0600 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> Message-ID: <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> Hi all, I was reading this while learning about Pytables in more details and the origin of its efficiency. This sounds like a problem where out of core computation using pytables would shine since the dataset doesn't fit into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course C/Cythonizing the problem would be another good way... HTH, Jonathan 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com> > On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase <seb.haase at gmail.com> > wrote: > > How does the algorithm and timing compare to this one: > > > > > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f > > > > The author of original version is Dan Goodman > > # FAST FRACTALS WITH PYTHON AND NUMPY > > Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with > the same dimensions/iterations. > > It uses a better datastructures -- it only keeps track of points that > still need to be iterated --- very clever. > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120122/7eb4e4f6/attachment.html> From dg.gmane at thesamovar.net Sun Jan 22 23:39:35 2012 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Mon, 23 Jan 2012 05:39:35 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> Message-ID: <jfioa6$tuo$1@dough.gmane.org> On 22/01/2012 20:01, Ond?ej ?ert?k wrote: > On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase<seb.haase at gmail.com> wrote: >> How does the algorithm and timing compare to this one: >> >> http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f >> >> The author of original version is Dan Goodman >> # FAST FRACTALS WITH PYTHON AND NUMPY > > Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with > the same dimensions/iterations. > > It uses a better datastructures -- it only keeps track of points that > still need to be iterated --- very clever. > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. I spent a little while trying to optimise my algorithm using only numpy and couldn't get it running much faster than that. Given the relatively low number of iterations it's probably not a problem of Python overheads, so I guess it is indeed memory access that is the problem. One way to get round this using numexpr would be something like this. Write f(z)=z^2+c and then f(n+1,z)=f(n,f(z)). Now try out instead of computing z->f(z) each iteration, write down the formula for z->f(n,z) for a few different n and use that in numexpr, e.g. z->f(2,z) or z->(z^2+c)^2+c. This amounts to doing several iterations per step, but it means that you'll be spending more time doing floating point ops and less time waiting for memory operations so it might get closer to fortran/C speeds. Actually, my curiosity was piqued so I tried it out. On my laptop I get that using the idea above gives a maximum speed increase for n=8, and after that you start to get overflow errors so it runs slower. At n=8 it runs about 4.5x faster than the original version. So if you got the same speedup it would be running in about 0.6s compared to your fortran 0.7s. However it's not a fair comparison as numexpr is using multiple cores (but only about 60% peak on my dual core laptop), but still nice to see what can be achieved with numexpr. :) Code attached. Dan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fastermandel.py URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120123/e1d93a12/attachment.ksh> From d.s.seljebotn at astro.uio.no Mon Jan 23 04:04:36 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 23 Jan 2012 10:04:36 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> Message-ID: <4F1D22A4.8010009@astro.uio.no> On 01/23/2012 05:35 AM, Jonathan Rocher wrote: > Hi all, > > I was reading this while learning about Pytables in more details and the > origin of its efficiency. This sounds like a problem where out of core > computation using pytables would shine since the dataset doesn't fit > into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course > C/Cythonizing the problem would be another good way... Well, since the data certainly fits in RAM, one would use numexpr directly (which is what pytables also uses). Dag Sverre > > HTH, > Jonathan > > 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com > <mailto:ondrej.certik at gmail.com>> > > On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase > <seb.haase at gmail.com <mailto:seb.haase at gmail.com>> wrote: > > How does the algorithm and timing compare to this one: > > > > > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f > <http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f> > > > > The author of original version is Dan Goodman > > # FAST FRACTALS WITH PYTHON AND NUMPY > > Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with > the same dimensions/iterations. > > It uses a better datastructures -- it only keeps track of points that > still need to be iterated --- very clever. > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Jonathan Rocher, PhD > Scientific software developer > Enthought, Inc. > jrocher at enthought.com <mailto:jrocher at enthought.com> > 1-512-536-1057 > http://www.enthought.com <http://www.enthought.com/> > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From wardefar at iro.umontreal.ca Mon Jan 23 05:23:28 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 05:23:28 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
Message-ID: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: > a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') > b = numpy.random.randint(5000000,size=(4993210,)) > c = a[b] It seems c is not getting filled in full, namely: > In [14]: c[1000000:].sum() > Out[14]: 0 I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? It smells like some kind of integer overflow bug. Thanks, David From sturla at molden.no Mon Jan 23 06:23:01 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 23 Jan 2012 12:23:01 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D22A4.8010009@astro.uio.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> Message-ID: <4F1D4315.5080602@molden.no> Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: > On 01/23/2012 05:35 AM, Jonathan Rocher wrote: >> Hi all, >> >> I was reading this while learning about Pytables in more details and the >> origin of its efficiency. This sounds like a problem where out of core >> computation using pytables would shine since the dataset doesn't fit >> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course >> C/Cythonizing the problem would be another good way... > Well, since the data certainly fits in RAM, one would use numexpr > directly (which is what pytables also uses). > > Personally I feel this debate is asking the wrong question. It is not uncommon for NumPy code to be 16x slower than C or Fortran. But that is not really interesting. This is what I think matters: - Is the NumPy code FAST ENOUGH? If not, then go ahead and optimize. If it's fast enough, then just leave it. In this case, it seems Python takes ~13 seconds compared to ~1 second for Fortran. Sure, those extra 12 seconds could be annoying. But how much coding time should we spend to avoid them? 15 minutes? An hour? Two hours? Taking the time spent optimizing into account, then perhaps Python is 'faster' anyway? It is common to ask what is fastest for the computer. But we should really be asking what is fastest for our selves. For example: I have a computation that will take a day in Fortran or a month in Python (estimated). And I am going to run this code several times (20 or so, I think). In this case, yes, coding the bottlenecks in Fortran matters to me. But 13 seconds versus 1 second? I find that hardly interesting. 
Sturla From seb.haase at gmail.com Mon Jan 23 07:09:58 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 23 Jan 2012 13:09:58 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D4315.5080602@molden.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> Message-ID: <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> On Mon, Jan 23, 2012 at 12:23 PM, Sturla Molden <sturla at molden.no> wrote: > Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: >> On 01/23/2012 05:35 AM, Jonathan Rocher wrote: >>> Hi all, >>> >>> I was reading this while learning about Pytables in more details and the >>> origin of its efficiency. This sounds like a problem where out of core >>> computation using pytables would shine since the dataset doesn't fit >>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course >>> C/Cythonizing the problem would be another good way... >> Well, since the data certainly fits in RAM, one would use numexpr >> directly (which is what pytables also uses). >> >> > > Personally I feel this debate is asking the wrong question. > > It is not uncommon for NumPy code to be 16x slower than C or Fortran. > But that is not really interesting. > > This is what I think matters: > > - Is the NumPy code FAST ENOUGH? ?If not, then go ahead and optimize. If > it's fast enough, then just leave it. > > In this case, it seems Python takes ~13 seconds compared to ~1 second > for Fortran. Sure, those extra 12 seconds could be annoying. But how > much coding time should we spend to avoid them? 15 minutes? An hour? Two > hours? > > Taking the time spent optimizing into account, then perhaps Python is > 'faster' anyway? It is common to ask what is fastest for the computer. > But we should really be asking what is fastest for our selves. > > For example: I have a computation that will take a day in Fortran or a > month in Python (estimated). And I am going to run this code several > times (20 or so, I think). In this case, yes, coding the bottlenecks in > Fortran matters to me. But 13 seconds versus 1 second? I find that > hardly interesting. > > Sturla I would think that interactive zooming would be quite nice ("illuminating") .... and for that 13 secs would not be tolerable.... Well... it's not at the top of my priority list ... 
;-) -Sebastian Haase From d.s.seljebotn at astro.uio.no Mon Jan 23 07:40:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 23 Jan 2012 13:40:42 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D4315.5080602@molden.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> Message-ID: <4F1D554A.4040902@astro.uio.no> On 01/23/2012 12:23 PM, Sturla Molden wrote: > Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: >> On 01/23/2012 05:35 AM, Jonathan Rocher wrote: >>> Hi all, >>> >>> I was reading this while learning about Pytables in more details and the >>> origin of its efficiency. This sounds like a problem where out of core >>> computation using pytables would shine since the dataset doesn't fit >>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course >>> C/Cythonizing the problem would be another good way... >> Well, since the data certainly fits in RAM, one would use numexpr >> directly (which is what pytables also uses). >> >> > > Personally I feel this debate is asking the wrong question. > > It is not uncommon for NumPy code to be 16x slower than C or Fortran. > But that is not really interesting. > > This is what I think matters: > > - Is the NumPy code FAST ENOUGH? If not, then go ahead and optimize. If > it's fast enough, then just leave it. > > In this case, it seems Python takes ~13 seconds compared to ~1 second > for Fortran. Sure, those extra 12 seconds could be annoying. But how > much coding time should we spend to avoid them? 15 minutes? An hour? Two > hours? > > Taking the time spent optimizing into account, then perhaps Python is > 'faster' anyway? It is common to ask what is fastest for the computer. > But we should really be asking what is fastest for our selves. > > For example: I have a computation that will take a day in Fortran or a > month in Python (estimated). And I am going to run this code several > times (20 or so, I think). In this case, yes, coding the bottlenecks in > Fortran matters to me. But 13 seconds versus 1 second? I find that > hardly interesting. You, me, Ondrej, and many more are happy to learn 4 languages and use them where they are most appropriate. But most scientists only want to learn and use one tool. And most scientists have both problems where performance doesn't matter, and problems where it does. So as long as examples like this exists, many people will prefer Fortran for *all* their tasks. (Of course, that's why I got involved in Cython...) 
Dag Sverre From sturla at molden.no Mon Jan 23 07:51:42 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 23 Jan 2012 13:51:42 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> Message-ID: <4F1D57DE.3030507@molden.no> Den 23.01.2012 13:09, skrev Sebastian Haase: > > I would think that interactive zooming would be quite nice > ("illuminating") .... and for that 13 secs would not be tolerable.... > Well... it's not at the top of my priority list ... ;-) > Sure, that comes under the 'fast enough' issue. But even Fortran might be too slow here? For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader (which would be a text string in Python): madelbrot_fragment_shader = """ uniform sampler1D tex; uniform vec2 center; uniform float scale; uniform int iter; void main() { vec2 z, c; c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x; c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; int i; z = c; for(i=0; i<iter; i++) { float x = (z.x * z.x - z.y * z.y) + c.x; float y = (z.y * z.x + z.x * z.y) + c.y; if((x * x + y * y)> 4.0) break; z.x = x; z.y = y; } gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); } """ The rest is just boiler-plate OpenGL... Sources: http://nuclear.mutantstargoat.com/articles/sdr_fract/ http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml Sturla From cimrman3 at ntc.zcu.cz Mon Jan 23 08:02:54 2012 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 23 Jan 2012 14:02:54 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D57DE.3030507@molden.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> <4F1D57DE.3030507@molden.no> Message-ID: <4F1D5A7E.4080604@ntc.zcu.cz> On 01/23/12 13:51, Sturla Molden wrote: > Den 23.01.2012 13:09, skrev Sebastian Haase: >> >> I would think that interactive zooming would be quite nice >> ("illuminating") .... and for that 13 secs would not be tolerable.... >> Well... it's not at the top of my priority list ... ;-) >> > > Sure, that comes under the 'fast enough' issue. But even Fortran might > be too slow here? 
> > For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader > (which would be a text string in Python): > > madelbrot_fragment_shader = """ > > uniform sampler1D tex; > uniform vec2 center; > uniform float scale; > uniform int iter; > void main() { > vec2 z, c; > c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x; > c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; > int i; > z = c; > for(i=0; i<iter; i++) { > float x = (z.x * z.x - z.y * z.y) + c.x; > float y = (z.y * z.x + z.x * z.y) + c.y; > if((x * x + y * y)> 4.0) break; > z.x = x; > z.y = y; > } > gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); > } > > """ > > The rest is just boiler-plate OpenGL... > > Sources: > > http://nuclear.mutantstargoat.com/articles/sdr_fract/ > > http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos had interactive, extremely fast a fluid fractal zooming more than 10 (or 15?) years ago (-> on a laughable hardware by today's standards). r. [1] http://wmi.math.u-szeged.hu/xaos/doku.php From scipy at samueljohn.de Mon Jan 23 11:35:29 2012 From: scipy at samueljohn.de (Samuel John) Date: Mon, 23 Jan 2012 17:35:29 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D5A7E.4080604@ntc.zcu.cz> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> <4F1D57DE.3030507@molden.no> <4F1D5A7E.4080604@ntc.zcu.cz> Message-ID: <20E62AD8-0A7B-4D32-868C-97C62DE9F9AB@samueljohn.de> I'd like to add http://git.tiker.net/pyopencl.git/blob/HEAD:/examples/demo_mandelbrot.py to the discussion, since I use pyopencl (http://mathema.tician.de/software/pyopencl) with great success in my daily scientific computing. Install with pip. PyOpenCL does understand numpy arrays. You write a kernel (small c-program) directly into a python triple quoted strings and get a pythonic way to program GPU and core i5 and i7 CPUs with python Exception if something goes wrong. Whenever I hit a speed bottleneck that I cannot solve with pure numpy, I code a little part of the computation for GPU. The compilation is done just in time when you run the python code. Especially for the mandelbrot this may be a _huge_ gain in speed since its embarrassingly parallel. Samuel On 23.01.2012, at 14:02, Robert Cimrman wrote: > On 01/23/12 13:51, Sturla Molden wrote: >> Den 23.01.2012 13:09, skrev Sebastian Haase: >>> >>> I would think that interactive zooming would be quite nice >>> ("illuminating") .... and for that 13 secs would not be tolerable.... >>> Well... it's not at the top of my priority list ... ;-) >>> >> >> Sure, that comes under the 'fast enough' issue. But even Fortran might >> be too slow here? 
>> >> For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader >> (which would be a text string in Python): >> >> madelbrot_fragment_shader = """ >> >> uniform sampler1D tex; >> uniform vec2 center; >> uniform float scale; >> uniform int iter; >> void main() { >> vec2 z, c; >> c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x; >> c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; >> int i; >> z = c; >> for(i=0; i<iter; i++) { >> float x = (z.x * z.x - z.y * z.y) + c.x; >> float y = (z.y * z.x + z.x * z.y) + c.y; >> if((x * x + y * y)> 4.0) break; >> z.x = x; >> z.y = y; >> } >> gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); >> } >> >> """ >> >> The rest is just boiler-plate OpenGL... >> >> Sources: >> >> http://nuclear.mutantstargoat.com/articles/sdr_fract/ >> >> http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml > > Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos > had interactive, extremely fast a fluid fractal zooming more than 10 (or 15?) > years ago (-> on a laughable hardware by today's standards). > > r. > > [1] http://wmi.math.u-szeged.hu/xaos/doku.php > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Mon Jan 23 12:17:41 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 23 Jan 2012 09:17:41 -0800 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> References: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> Message-ID: <CALGmxEKT5tS276kwnvUTN23CYG07Zrg2rH5-0kZJuPZmCwr=tQ@mail.gmail.com> On Wed, Jan 18, 2012 at 1:26 AM, <apo at pdauf.de> wrote: > Your ideas are very helpfull and the code is very fast. I'm curios -- a number of ideas were floated here -- what did you end up using? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From wardefar at iro.umontreal.ca Mon Jan 23 13:55:52 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 13:55:52 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> Message-ID: <20120123185552.GA27535@ravage> I've reproduced this (rather serious) bug myself and confirmed that it exists in master, and as far back as 1.4.1. I'd really appreciate if someone could reproduce and confirm on another machine, as so far all my testing has been on our single high-memory machine. Thanks, David On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote: > A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: > > > a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') > > b = numpy.random.randint(5000000,size=(4993210,)) > > c = a[b] > > It seems c is not getting filled in full, namely: > > > In [14]: c[1000000:].sum() > > Out[14]: 0 > > I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? 
It smells like some kind of integer overflow bug. > > Thanks, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From teoliphant at gmail.com Mon Jan 23 14:33:42 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Mon, 23 Jan 2012 13:33:42 -0600 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120123185552.GA27535@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> Message-ID: <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> Can you determine where the problem is, precisely. In other words, can you verify that c is not getting filled in correctly? You are no doubt going to get overflow in the summation as you have a uint8 parameter. But, having that overflow be exactly '0' would be surprising. Can you verify that a and b are getting created correctly? Also, 'c' should be a 2-d array, can you verify that? Can you take the sum along the -1 axis and the 0 axis separately: print a.shape print b.shape print c.shape c[1000000:].sum(axis=0) d = c[1000000:].sum(axis=-1) print d[:100] print d[-100:] On Jan 23, 2012, at 12:55 PM, David Warde-Farley wrote: > I've reproduced this (rather serious) bug myself and confirmed that it exists > in master, and as far back as 1.4.1. > > I'd really appreciate if someone could reproduce and confirm on another > machine, as so far all my testing has been on our single high-memory machine. > > Thanks, > David > > On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote: >> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: >> >>> a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') >>> b = numpy.random.randint(5000000,size=(4993210,)) >>> c = a[b] >> >> It seems c is not getting filled in full, namely: >> >>> In [14]: c[1000000:].sum() >>> Out[14]: 0 >> >> I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? It smells like some kind of integer overflow bug. >> >> Thanks, >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robince at gmail.com Mon Jan 23 14:38:44 2012 From: robince at gmail.com (Robin) Date: Mon, 23 Jan 2012 20:38:44 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120123185552.GA27535@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> Message-ID: <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley <wardefar at iro.umontreal.ca> wrote: > I've reproduced this (rather serious) bug myself and confirmed that it exists > in master, and as far back as 1.4.1. > > I'd really appreciate if someone could reproduce and confirm on another > machine, as so far all my testing has been on our single high-memory machine. I see the same behaviour on a Winodows machine with numpy 1.6.1. 
But I don't think it is an indexing problem - rather something with the random number creation. a itself is already zeros for high indexes. ?? In [8]: b[1000000:1000010] Out[8]: array([3429029, 1251819, 4292918, 2249483, 757620, 3977130, 3455449, 2005054, 2565207, 3114930]) In [9]: a[b[1000000:1000010]] Out[9]: array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8) In [41]: a[581350:,0].sum() Out[41]: 0� Cheers Robin > > Thanks, > David > > On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote: >> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: >> >> > a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') >> > b = numpy.random.randint(5000000,size=(4993210,)) >> > c = a[b] >> >> It seems c is not getting filled in full, namely: >> >> > In [14]: c[1000000:].sum() >> > Out[14]: 0 >> >> I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? It smells like some kind of integer overflow bug. >> >> Thanks, >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From apo at pdauf.de Mon Jan 23 14:40:21 2012 From: apo at pdauf.de (elodw) Date: Mon, 23 Jan 2012 20:40:21 +0100 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <CALGmxEKT5tS276kwnvUTN23CYG07Zrg2rH5-0kZJuPZmCwr=tQ@mail.gmail.com> References: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> <CALGmxEKT5tS276kwnvUTN23CYG07Zrg2rH5-0kZJuPZmCwr=tQ@mail.gmail.com> Message-ID: <4F1DB7A5.3050208@pdauf.de> Am 23.01.2012 18:17, schrieb Chris Barker: > On Wed, Jan 18, 2012 at 1:26 AM,<apo at pdauf.de> wrote: >> Your ideas are very helpfull and the code is very fast. > I'm curios -- a number of ideas were floated here -- what did you end up using? > > -Chris > > I'am sorry but when i see the code of Torgil Svenson, I think, "the game is over". I use the follow. code: t0=clock() tt = n_im2.view() tt.shape = -1,3 ifl = tt[...,0].astype(np.int)*256*256 + tt[...,1].astype(np.int)*256 + tt[...,2].astype(np.int) colors, inv = np.unique(ifl,return_inverse=True) zus = np.array([colors[-1]+1]) colplus = np.hstack((colors,zus)) ccnt = np.histogram(ifl,colplus)[0] t1=clock() print (t1-t0) t0=t1 > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From aronne.merrelli at gmail.com Mon Jan 23 15:04:22 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Mon, 23 Jan 2012 14:04:22 -0600 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> Message-ID: <CAHNdQ4LO_9B4v-PtEWisUrKGQv0spwMT8C=Z=WJB+=mmcP4phQ@mail.gmail.com> On Mon, Jan 23, 2012 at 1:33 PM, Travis Oliphant <teoliphant at gmail.com>wrote: > Can you determine where the problem is, precisely. In other words, can > you verify that c is not getting filled in correctly? > > You are no doubt going to get overflow in the summation as you have a > uint8 parameter. But, having that overflow be exactly '0' would be > surprising. > > Can you verify that a and b are getting created correctly? Also, 'c' > should be a 2-d array, can you verify that? Can you take the sum along the > -1 axis and the 0 axis separately: > > print a.shape > print b.shape > print c.shape > > c[1000000:].sum(axis=0) > d = c[1000000:].sum(axis=-1) > print d[:100] > print d[-100:] > I am getting the same results as David. It looks like c just "stopped filling in" partway through the array. I don't think there is any overflow issue, since the result of sum() is up-promoted to uint64 when I do that. Travis, here are the outputs at my end - I cut out many zeros for brevity: In [7]: print a.shape (5000000, 972) In [8]: print b.shape (4993210,) In [9]: print c.shape (4993210, 972) In [10]: c[1000000:].sum(axis=0) Out[10]: array([0, 0, 0, .... , 0]) In [11]: d = c[1000000:].sum(axis=-1) In [12]: print d[:100] [0 0 0 ... 0 0] In [13]: print d[-100:] [0 0 0 ... 0 0 0] I looked at sparse subsamples with matplotlib - specifically, imshow(a[::1000, :]) - and the a array looks correct (random values everywhere), but c is zero past a certain row number. In fact, it looks like it becomes zero at row 575419 - I think for all rows in c beyond row 574519, the values will be zero. For lower row numbers, I think they are correctly filled (at least, by the sparse view in matplotlib). In [15]: a[b[574519], 350:360] Out[15]: array([143, 155, 11, 30, 212, 149, 110, 164, 165, 120], dtype=uint8) In [16]: c[574519, 350:360] Out[16]: array([143, 155, 11, 30, 212, 149, 0, 0, 0, 0], dtype=uint8) I'm using EPD 7.1, numpy 1.6.1, Linux installation (I don't know the kernel details) HTH, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120123/ad3e9033/attachment.html> From emayssat at gmail.com Mon Jan 23 15:15:56 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Mon, 23 Jan 2012 12:15:56 -0800 Subject: [Numpy-discussion] Saving and loading a structured array from a TEXT file Message-ID: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> Is there a way to save a structured array in a text file? My problem is not so much in the saving procedure, but rather in the 'reloading' procedure. 
See below In [3]: import numpy as np In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')]) In [5]: r.tofile('toto.txt',sep='\n') bash-4.2$ cat toto.txt ('1', 1, 1.0) ('1', 1, 1.0) ('1', 1, 1.0) In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /home/cls1fs/clseng/10/<ipython-input-7-b07ba265ede7> in <module>() ----> 1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) ValueError: Unable to read character files of that array type -- Emmanuel From wardefar at iro.umontreal.ca Mon Jan 23 15:21:32 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 15:21:32 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> Message-ID: <20120123202131.GC28091@ravage> Hi Travis, Thanks for your reply. On Mon, Jan 23, 2012 at 01:33:42PM -0600, Travis Oliphant wrote: > Can you determine where the problem is, precisely. In other words, can you verify that c is not getting filled in correctly? > > You are no doubt going to get overflow in the summation as you have a uint8 parameter. But, having that overflow be exactly '0' would be surprising. I've already looked at this actually. The last 4400000 or so rows of c are all zero, however 'a' seems to be filled in fine: >>> import numpy >>> a = numpy.array(numpy.random.randint(256,size=(5000000,972)), >>> dtype=numpy.uint8) >>> b = numpy.random.randint(5000000,size=(4993210,)) >>> c = a[b] >>> print c [[186 215 204 ..., 170 98 198] [ 56 98 112 ..., 32 233 1] [ 44 133 171 ..., 163 35 51] ..., [ 0 0 0 ..., 0 0 0] [ 0 0 0 ..., 0 0 0] [ 0 0 0 ..., 0 0 0]] >>> print a [[ 30 182 56 ..., 133 162 173] [112 100 69 ..., 3 147 80] [124 70 232 ..., 114 177 11] ..., [ 22 42 31 ..., 141 196 134] [ 74 47 167 ..., 38 193 9] [162 228 190 ..., 150 18 1]] So it seems to have nothing to do with the sum, but rather the advanced indexing operation. The zeros seem to start in the middle of row 574519, in particular at element 356. This is reproducible with different random vectors of indices, it seems. So 558432824th element things go awry. I can't say it makes any sense to me why this would be the magic number. David From wardefar at iro.umontreal.ca Mon Jan 23 15:33:53 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 15:33:53 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> Message-ID: <20120123203353.GD28091@ravage> On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote: > On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley > <wardefar at iro.umontreal.ca> wrote: > > I've reproduced this (rather serious) bug myself and confirmed that it exists > > in master, and as far back as 1.4.1. > > > > I'd really appreciate if someone could reproduce and confirm on another > > machine, as so far all my testing has been on our single high-memory machine. > > I see the same behaviour on a Winodows machine with numpy 1.6.1. 
But I > don't think it is an indexing problem - rather something with the > random number creation. a itself is already zeros for high indexes. > ?? > In [8]: b[1000000:1000010] > Out[8]: > array([3429029, 1251819, 4292918, 2249483, 757620, 3977130, 3455449, > 2005054, 2565207, 3114930]) > > In [9]: a[b[1000000:1000010]] > Out[9]: > array([[0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0], > ..., > [0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0]], dtype=uint8) > > In [41]: a[581350:,0].sum() > Out[41]: 0 Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being filled in -- the problem arises with c alone. So, another Windows-specific bug to add to the pile, perhaps? :( David From derek at astro.physik.uni-goettingen.de Mon Jan 23 16:07:11 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Mon, 23 Jan 2012 22:07:11 +0100 Subject: [Numpy-discussion] Saving and loading a structured array from a TEXT file In-Reply-To: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> References: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> Message-ID: <619AF526-C5B8-4FEE-91D9-58B0842B67AC@astro.physik.uni-goettingen.de> On 23 Jan 2012, at 21:15, Emmanuel Mayssat wrote: > Is there a way to save a structured array in a text file? > My problem is not so much in the saving procedure, but rather in the > 'reloading' procedure. > See below > > > In [3]: import numpy as np > > In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')]) > > In [5]: r.tofile('toto.txt',sep='\n') > > bash-4.2$ cat toto.txt > ('1', 1, 1.0) > ('1', 1, 1.0) > ('1', 1, 1.0) > > In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > /home/cls1fs/clseng/10/<ipython-input-7-b07ba265ede7> in <module>() > ----> 1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) > > ValueError: Unable to read character files of that array type I think most of the np.fromfile functionality works for binary input; for reading text input np.loadtxt and np.genfromtxt are the (currently) recommended functions. It is bit tricky to read the format generated by tofile() in the above example, but the following should work: cnv = {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')} r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype) Generally loadtxt works more smoothly together with savetxt, but the latter unfortunately does not offer an easy way to save structured arrays (note to self and others currently working on npyio: definitely room for improvement!). HTH, Derek From cgohlke at uci.edu Mon Jan 23 16:08:03 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Mon, 23 Jan 2012 13:08:03 -0800 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <20120123203353.GD28091@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> Message-ID: <4F1DCC33.6090101@uci.edu> On 1/23/2012 12:33 PM, David Warde-Farley wrote: > On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote: >> On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley >> <wardefar at iro.umontreal.ca> wrote: >>> I've reproduced this (rather serious) bug myself and confirmed that it exists >>> in master, and as far back as 1.4.1. >>> >>> I'd really appreciate if someone could reproduce and confirm on another >>> machine, as so far all my testing has been on our single high-memory machine. >> >> I see the same behaviour on a Winodows machine with numpy 1.6.1. But I >> don't think it is an indexing problem - rather something with the >> random number creation. a itself is already zeros for high indexes. >> ?? >> In [8]: b[1000000:1000010] >> Out[8]: >> array([3429029, 1251819, 4292918, 2249483, 757620, 3977130, 3455449, >> 2005054, 2565207, 3114930]) >> >> In [9]: a[b[1000000:1000010]] >> Out[9]: >> array([[0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0], >> ..., >> [0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0]], dtype=uint8) >> >> In [41]: a[581350:,0].sum() >> Out[41]: 0 > > Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being > filled in -- the problem arises with c alone. > > So, another Windows-specific bug to add to the pile, perhaps? :( > > David Maybe this explains the win-amd64 behavior: There are a couple of places in mtrand where array indices and sizes are C long instead of npy_intp, for example in the randint function: <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> Christoph From derek at astro.physik.uni-goettingen.de Mon Jan 23 16:28:47 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Mon, 23 Jan 2012 22:28:47 +0100 Subject: [Numpy-discussion] Saving and loading a structured array from a TEXT file In-Reply-To: <619AF526-C5B8-4FEE-91D9-58B0842B67AC@astro.physik.uni-goettingen.de> References: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> <619AF526-C5B8-4FEE-91D9-58B0842B67AC@astro.physik.uni-goettingen.de> Message-ID: <D3CA6FF9-D9C1-4014-955A-2F0A32EA9960@astro.physik.uni-goettingen.de> On 23 Jan 2012, at 22:07, Derek Homeier wrote: >> In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')]) >> >> In [5]: r.tofile('toto.txt',sep='\n') >> >> bash-4.2$ cat toto.txt >> ('1', 1, 1.0) >> ('1', 1, 1.0) >> ('1', 1, 1.0) >> > > cnv = {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')} > r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype) > > Generally loadtxt works more smoothly together with savetxt, but the latter unfortunately > does not offer an easy way to save structured arrays (note to self and others currently > working on npyio: definitely room for improvement!). For the record, in that example np.savetxt('toto.txt', r, fmt='%s,%d,%f') would work as well, saving you the custom converter for loadtxt - it could just become tedious to work out the format for more complex structures, so an option to construct this automatically from r.dtype could certainly be a nice enhancement. Just wondering, is there something like the inverse operator to np.format_parser, i.e. 
mapping each dtype to a default print format specifier? Cheers, Derek From emayssat at gmail.com Mon Jan 23 18:26:08 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Mon, 23 Jan 2012 15:26:08 -0800 Subject: [Numpy-discussion] 'Advanced' save and restore operation Message-ID: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> After having saved data, I need to know/remember the data dtype to restore it correctly. Is there a way to save the dtype with the data? (I guess the header parameter of savedata could help, but they are only available in v2.0+ ) I would like to save several related structured array and a dictionary of parameters into a TEXT file. Is there an easy way to do that? (maybe xml file, or maybe archive zip file of other files, or ..... ) Any recommendation is helpful. Regards, -- Emmanuel From deshpande.jaidev at gmail.com Mon Jan 23 18:42:02 2012 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Tue, 24 Jan 2012 05:12:02 +0530 Subject: [Numpy-discussion] Working with MATLAB Message-ID: <CAB=suE=F7GangCGiZ84oNd4m=4Oq9=SLkVB-RGveQszM4G7Thg@mail.gmail.com> Dear List, I frequently work with MATLAB and it is necessary for me many a times to adapt MATLAB codes for NumPy arrays. While for most practical purposes it works fine, I think there might be a lot of 'under the hood' things that I might be missing when I make the translations from MATLAB to Python. Are there any 'best practices' for working on this transition? Thanks From deshpande.jaidev at gmail.com Mon Jan 23 18:52:42 2012 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Tue, 24 Jan 2012 05:22:42 +0530 Subject: [Numpy-discussion] Working with MATLAB In-Reply-To: <CAB=suE=F7GangCGiZ84oNd4m=4Oq9=SLkVB-RGveQszM4G7Thg@mail.gmail.com> References: <CAB=suE=F7GangCGiZ84oNd4m=4Oq9=SLkVB-RGveQszM4G7Thg@mail.gmail.com> Message-ID: <CAB=suEk6X_AkezMhkE6c3V+CBAWkwaRxsUViCwBR-Q8xj_xwUg@mail.gmail.com> Please ignore my question. I found what I needed on the scipy website. I asked the question in haste. I'm sorry. Thanks From shish at keba.be Mon Jan 23 19:45:09 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 23 Jan 2012 19:45:09 -0500 Subject: [Numpy-discussion] 'Advanced' save and restore operation In-Reply-To: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> References: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> Message-ID: <CAFXk4brnKe0xigc0rqx01bmdQMkEyO-rSYUP1rJudT0Zy1kskw@mail.gmail.com> Note sure if there's a better way, but you can do it with some custom load and save functions: >>> with open('f.txt', 'w') as f: ... f.write(str(x.dtype) + '\n') ... numpy.savetxt(f, x) >>> with open('f.txt') as f: ... dtype = f.readline().strip() ... y = numpy.loadtxt(f).astype(dtype) I'm not sure how that'd work with structured arrays though. For the dict of parameters you'd have to write your own load/save piece of code too if you need a clean text file. -=- Olivier 2012/1/23 Emmanuel Mayssat <emayssat at gmail.com> > After having saved data, I need to know/remember the data dtype to > restore it correctly. > Is there a way to save the dtype with the data? > (I guess the header parameter of savedata could help, but they are > only available in v2.0+ ) > > I would like to save several related structured array and a dictionary > of parameters into a TEXT file. > Is there an easy way to do that? > (maybe xml file, or maybe archive zip file of other files, or ..... ) > > Any recommendation is helpful. 
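One sketch of how the header idea above might carry over to a structured array plus a parameter dictionary (an illustration only; the file name, the 'params' dict and the per-field format string are made up, and it assumes every field can be round-tripped through a plain '%s'/'%d'/'%f' text representation):

import ast
import numpy as np

r = np.ones(3, dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')])
params = {'run': 1, 'comment': 'example'}      # made-up dictionary of parameters

with open('data.txt', 'w') as f:
    f.write(repr(r.dtype.descr) + '\n')        # dtype header, literal_eval-able
    f.write(repr(params) + '\n')               # parameter dict as a second header line
    np.savetxt(f, r, fmt='%s,%d,%f')           # one format specifier per field

with open('data.txt') as f:
    dtype = np.dtype(ast.literal_eval(f.readline()))
    params2 = ast.literal_eval(f.readline())
    r2 = np.loadtxt(f, dtype=dtype, delimiter=',')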
> > Regards, > -- > Emmanuel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120123/113c608a/attachment.html> From derek at astro.physik.uni-goettingen.de Mon Jan 23 20:00:16 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 24 Jan 2012 02:00:16 +0100 Subject: [Numpy-discussion] 'Advanced' save and restore operation In-Reply-To: <CAFXk4brnKe0xigc0rqx01bmdQMkEyO-rSYUP1rJudT0Zy1kskw@mail.gmail.com> References: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> <CAFXk4brnKe0xigc0rqx01bmdQMkEyO-rSYUP1rJudT0Zy1kskw@mail.gmail.com> Message-ID: <7E29BD99-5064-4009-93BC-64AC87956779@astro.physik.uni-goettingen.de> On 24 Jan 2012, at 01:45, Olivier Delalleau wrote: > Note sure if there's a better way, but you can do it with some custom load and save functions: > > >>> with open('f.txt', 'w') as f: > ... f.write(str(x.dtype) + '\n') > ... numpy.savetxt(f, x) > > >>> with open('f.txt') as f: > ... dtype = f.readline().strip() > ... y = numpy.loadtxt(f).astype(dtype) > > I'm not sure how that'd work with structured arrays though. For the dict of parameters you'd have to write your own load/save piece of code too if you need a clean text file. > > -=- Olivier > > 2012/1/23 Emmanuel Mayssat <emayssat at gmail.com> > After having saved data, I need to know/remember the data dtype to > restore it correctly. > Is there a way to save the dtype with the data? > (I guess the header parameter of savedata could help, but they are > only available in v2.0+ ) > > I would like to save several related structured array and a dictionary > of parameters into a TEXT file. > Is there an easy way to do that? > (maybe xml file, or maybe archive zip file of other files, or ..... ) > > Any recommendation is helpful. asciitable might be of some help, but to implement all of your required functionality, you'd probably still have to implement your own Reader class: http://cxc.cfa.harvard.edu/contrib/asciitable/ Cheers, Derek From sturla at molden.no Mon Jan 23 23:35:20 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 05:35:20 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1DCC33.6090101@uci.edu> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> Message-ID: <4F1E3508.1070405@molden.no> Den 23.01.2012 22:08, skrev Christoph Gohlke: > Maybe this explains the win-amd64 behavior: There are a couple of places > in mtrand where array indices and sizes are C long instead of npy_intp, > for example in the randint function: > > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> > > AFAIK, on AMD64 a C long is 64 bit on Linux (gcc) and 32 bit on Windows (gcc and MSVC). Sturla From sturla at molden.no Tue Jan 24 00:00:05 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 06:00:05 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <4F1DCC33.6090101@uci.edu> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> Message-ID: <4F1E3AD5.9000507@molden.no> Den 23.01.2012 22:08, skrev Christoph Gohlke: > > Maybe this explains the win-amd64 behavior: There are a couple of places > in mtrand where array indices and sizes are C long instead of npy_intp, > for example in the randint function: > > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> > > Both i and length could overflow here. It should overflow on allocation of more than 2 GB. There is also a lot of C longs in the internal state (line 55-105), as well as the other functions. Producing 2 GB of random ints twice fails: >>> import numpy as np >>> np.random.randint(5000000,size=(2*1024**3,)) array([0, 0, 0, ..., 0, 0, 0]) >>> np.random.randint(5000000,size=(2*1024**3,)) Traceback (most recent call last): File "<pyshell#3>", line 1, in <module> np.random.randint(5000000,size=(2*1024**3,)) File "mtrand.pyx", line 881, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:6040) MemoryError >>> Sturla From sturla at molden.no Tue Jan 24 00:32:14 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 06:32:14 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E3AD5.9000507@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> Message-ID: <4F1E425E.20203@molden.no> Den 24.01.2012 06:00, skrev Sturla Molden: > Both i and length could overflow here. It should overflow on > allocation of more than 2 GB. There is also a lot of C longs in the > internal state (line 55-105), as well as the other functions. The use of C long affects all the C and Pyrex source code in mtrand module, not just mtrand.pyx. All of it is fubar on Win64. From the C standard, a C long is only quarranteed to be "at least 32 bits wide". Thus a C long can only be expected to index up to 2**31 - 1, and it is not a Windows specific problem. So it seems there are hundreds of places in the mtrand module where integers can overflow on 64-bit Python. Also the crappy old Pyrex code should be updated to some more recent Cython. Sturla From sturla at molden.no Tue Jan 24 03:21:00 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 09:21:00 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E425E.20203@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> Message-ID: <4F1E69EC.4020408@molden.no> On 24.01.2012 06:32, Sturla Molden wrote: > The use of C long affects all the C and Pyrex source code in mtrand > module, not just mtrand.pyx. All of it is fubar on Win64. randomkit.c handles C long correctly, I think. There are different codes for 32 and 64 bit C long, and buffer sizes are size_t. 
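A quick way to see the width difference under discussion from Python (a sketch; the printed values depend on platform and compiler):

import numpy as np

# np.int_ corresponds to a C long, np.intp to npy_intp (pointer-sized).
print(np.dtype(np.int_).itemsize * 8)    # 32 on win-amd64, 64 on linux-amd64
print(np.dtype(np.intp).itemsize * 8)    # 64 on any 64-bit build
# the 5000000 x 972 uint8 array from the indexing thread has more elements
# than a 32-bit C long can count:
print(5000000 * 972 > 2**31 - 1)         # True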
Sturla From sturla at molden.no Tue Jan 24 03:37:26 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 09:37:26 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E69EC.4020408@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> Message-ID: <4F1E6DC6.6040904@molden.no> On 24.01.2012 09:21, Sturla Molden wrote: > randomkit.c handles C long correctly, I think. There are different codes > for 32 and 64 bit C long, and buffer sizes are size_t. distributions.c take C longs as parameters e.g. for the binomial distribution. mtrand.pyx correctly handles this, but it can give an unexpected overflow error on 64-bit Windows: In [1]: np.random.binomial(2**31, .5) --------------------------------------------------------------------------- OverflowError Traceback (most recent call last) C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>() ----> 1 np.random.binomial(2**31, .5) C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)() OverflowError: Python int too large to convert to C long On systems where C longs are 64 bit, this is likely not to produce an error. This begs the question if also randomkit.c and districutions.c should be changed to use npy_intp for consistency across all platforms. (I assume we are not supporting 16 bit NumPy, in which case we will need C long there...) Sturla From sturla at molden.no Tue Jan 24 03:47:01 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 09:47:01 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E425E.20203@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> Message-ID: <4F1E7005.1090801@molden.no> On 24.01.2012 06:32, Sturla Molden wrote: > Den 24.01.2012 06:00, skrev Sturla Molden: >> Both i and length could overflow here. It should overflow on >> allocation of more than 2 GB. There is also a lot of C longs in the >> internal state (line 55-105), as well as the other functions. > > The use of C long affects all the C and Pyrex source code in mtrand > module, not just mtrand.pyx. All of it is fubar on Win64. The coding is also inconsistent, compare for example: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180 https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201 Sturla From robert.kern at gmail.com Tue Jan 24 04:15:01 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 24 Jan 2012 09:15:01 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <4F1E6DC6.6040904@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> Message-ID: <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> On Tue, Jan 24, 2012 at 08:37, Sturla Molden <sturla at molden.no> wrote: > On 24.01.2012 09:21, Sturla Molden wrote: > >> randomkit.c handles C long correctly, I think. There are different codes >> for 32 and 64 bit C long, and buffer sizes are size_t. > > distributions.c take C longs as parameters e.g. for the binomial > distribution. mtrand.pyx correctly handles this, but it can give an > unexpected overflow error on 64-bit Windows: > > > In [1]: np.random.binomial(2**31, .5) > --------------------------------------------------------------------------- > OverflowError ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call last) > C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>() > ----> 1 np.random.binomial(2**31, .5) > > C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in > mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)() > > OverflowError: Python int too large to convert to C long > > > On systems where C longs are 64 bit, this is likely not to produce an > error. > > This begs the question if also randomkit.c and districutions.c should be > changed to use npy_intp for consistency across all platforms. There are two different uses of long that you need to distinguish. One is for sizes, and one is for parameters and values. The sizes should definitely be upgraded to npy_intp. The latter shouldn't; these should remain as the default integer type of Python and numpy, a C long. The reason longs are used for sizes is that I wrote mtrand for Numeric and Python 2.4 before numpy was even announced (and I don't think we had npy_intp at the time I merged it into numpy, but I could be wrong). Using longs for sizes was the order of the day. I don't think I had even touched a 64-bit machine that wasn't a DEC Alpha at the time, so I knew very little about the issues. So yes, please, fix whatever you can. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From robert.kern at gmail.com Tue Jan 24 04:16:48 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 24 Jan 2012 09:16:48 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E7005.1090801@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E7005.1090801@molden.no> Message-ID: <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> On Tue, Jan 24, 2012 at 08:47, Sturla Molden <sturla at molden.no> wrote: > The coding is also inconsistent, compare for example: > > https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180 > > https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201 I'm sorry, what are you demonstrating there? 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sturla at molden.no Tue Jan 24 04:19:29 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 10:19:29 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E7005.1090801@molden.no> <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> Message-ID: <4F1E77A1.3090504@molden.no> On 24.01.2012 10:16, Robert Kern wrote: > I'm sorry, what are you demonstrating there? Both npy_intp and C long are used for sizes and indexing. Sturla From robert.kern at gmail.com Tue Jan 24 04:23:22 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 24 Jan 2012 09:23:22 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E77A1.3090504@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E7005.1090801@molden.no> <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> <4F1E77A1.3090504@molden.no> Message-ID: <CAF6FJiu7wwvz_XDZYTVanDm0n=5c485s=8ZajGoSKG_TvVFzBg@mail.gmail.com> On Tue, Jan 24, 2012 at 09:19, Sturla Molden <sturla at molden.no> wrote: > On 24.01.2012 10:16, Robert Kern wrote: > >> I'm sorry, what are you demonstrating there? > > Both npy_intp and C long are used for sizes and indexing. Ah, yes. I think Travis added the multiiter code to cont1_array(), which does broadcasting, so he used npy_intp as is proper (and necessary to pass into the multiiter API). The other functions don't do broadcasting, so he didn't touch them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sturla at molden.no Tue Jan 24 05:01:53 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 11:01:53 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> Message-ID: <4F1E8191.3010300@molden.no> On 24.01.2012 10:15, Robert Kern wrote: > There are two different uses of long that you need to distinguish. One > is for sizes, and one is for parameters and values. The sizes should > definitely be upgraded to npy_intp. The latter shouldn't; these should > remain as the default integer type of Python and numpy, a C long. 
Ok, that makes sence. > The reason longs are used for sizes is that I wrote mtrand for Numeric > and Python 2.4 before numpy was even announced (and I don't think we > had npy_intp at the time I merged it into numpy, but I could be > wrong). Using longs for sizes was the order of the day. I don't think > I had even touched a 64-bit machine that wasn't a DEC Alpha at the > time, so I knew very little about the issues. On amd64 the "native" datatypes are actually a 64 bit pointer with a 32 bit offset (contrary to what we see in Python and NumPy C sources), which is one reason why C longs are still 32 bits in MSVC. Thus an array size (size_t) should be 64 bits, but array indices (C long) should be 32 bits. But nobody likes to code like that (e.g. we would need an extra 64 bit pointer as cursor if the buffer size overflows a C long), and I don't think using a non-native 64-bit offset incur a lot of extra overhead for the CPU. :-) Sturla From pierre.haessig at crans.org Tue Jan 24 09:01:44 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Tue, 24 Jan 2012 15:01:44 +0100 Subject: [Numpy-discussion] Strange error raised by scipy.special.erf In-Reply-To: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local> References: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local> Message-ID: <4F1EB9C8.9070309@crans.org> Le 22/01/2012 11:28, Nadav Horesh a ?crit : > >>> special.erf(26.5) > 1.0 > >>> special.erf(26.6) > Traceback (most recent call last): > File "<pyshell#7>", line 1, in <module> > special.erf(26.6) > FloatingPointError: underflow encountered in erf > >>> special.erf(26.7) > 1.0 > I can confirm this same behaviour with numpy 1.5.1/scipy 0.9.0 Indeed 26.5 and 26.7 works, while 26.6 raises the underflow... weird enough ! -- Pierre From gammelmark at gmail.com Tue Jan 24 09:32:30 2012 From: gammelmark at gmail.com (=?ISO-8859-1?Q?S=F8ren_Gammelmark?=) Date: Tue, 24 Jan 2012 15:32:30 +0100 Subject: [Numpy-discussion] einsum evaluation order Message-ID: <CAJO1x6qDxz5qe-=h-tDNwyk3Ywz-2i-_w7z5UnVfKHFHLWrYCQ@mail.gmail.com> Dear all, I was just looking into numpy.einsum and encountered an issue which might be worth pointing out in the documentation. Let us say you wish to evaluate something like this (repeated indices a summed) D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime, sigma] * C[beta, betaprime] with einsum as einsum('abs,cds,bd->ac', A, B, C) then it is not exactly clear which order einsum evaluates the contractions (or if it does it in one go). This can be very important since you can do it in several ways, one of which has the least computational complexity. The most efficient way of doing it is to contract e.g. A and C and then contract that with B (or exchange A and B). A quick test on my labtop says 2.6 s with einsum and 0.13 s for two tensordots with A and B being D x D x 2 and C is D x D for D = 96. This scaling seems to explode for higher dimensions, whereas it is much better with the two independent contractions (i believe it should be O(D^3)).For D = 512 I could do it in 5 s with two contractions, whereas I stopped waiting after 60 s for einsum (i guess einsum probably is O(D^4) in this case). I had in fact thought of making a function similar to einsum for a while, but after I noticed it dropped it. I think, however, that there might still be room for a tool for evaluating more complicated expressions efficiently. 
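A minimal sketch of the two-step contraction described above (illustrative only; the array names and the D x D x 2 test sizes are assumed rather than taken from the thread):

    import numpy as np

    D = 96
    A = np.random.rand(D, D, 2)
    B = np.random.rand(D, D, 2)
    C = np.random.rand(D, D)

    # single einsum call, evaluated naively: roughly O(D**4) work
    naive = np.einsum('abs,cds,bd->ac', A, B, C)

    # pairwise contractions: roughly O(D**3) work
    T = np.tensordot(A, C, axes=([1], [0]))           # T[a,s,d] = sum_b A[a,b,s] * C[b,d]
    fast = np.tensordot(T, B, axes=([1, 2], [2, 1]))  # result[a,c] = sum_{s,d} T[a,s,d] * B[c,d,s]

    print(np.allclose(naive, fast))   # True

The intermediate T is built in O(D**3) operations and the second tensordot is O(D**3) as well, which matches the scaling difference reported above.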
I think the best way would be for the user to enter an expression like the one above which is then evaluated in the optimal order. I know how to do this (theoretically) if all the repeated indices only occur twice (like the expression above), but for the more general expression supported by einsum I om not sure how to do it (haven't thought about it). Here I am thinking about stuff like x[i] = a[i] * b[i] and their more general counterparts (at first glance this seems to be a simpler problem than full contractions). Do you think there is a need/interest for this kind of thing? In that case I would like the write it / help write it. Much of it, I think, can be reduced to decomposing the expression into existing numpy operations (e.g. tensordot). How to incorporate issues of storage layout etc, however, I have no idea. In any case I think it might be nice to write explicitly how the expression in einsum is evaluated in the docs. S?ren Gammelmark PhD-student Department of Physics and Astronomy Aarhus University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/ea1cf5a0/attachment.html> From Kathleen.M.Tacina at nasa.gov Tue Jan 24 10:29:16 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Tue, 24 Jan 2012 15:29:16 +0000 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type Message-ID: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers between 2**63 and 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return uint64. Instead, I get object. Further experimenting showed that the largest integer for which np.min_scalar_type will return uint64 is 2**63-1. Is this expected behavior? On python 2.7.2 on a 64-bit linux machine: >>> import numpy as np >>> np.version.full_version '2.0.0.dev-55472ca' >>> np.min_scalar_type(2**8-1) dtype('uint8') >>> np.min_scalar_type(2**16-1) dtype('uint16') >>> np.min_scalar_type(2**32-1) dtype('uint32') >>> np.min_scalar_type(2**64-1) dtype('O') >>> np.min_scalar_type(2**63-1) dtype('uint64') >>> np.min_scalar_type(2**63) dtype('O') I get the same results on a Windows XP machine running python 2.7.2 and numpy 1.6.1. Kathy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/68c4af44/attachment.html> From nadavh at visionsense.com Tue Jan 24 10:37:09 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 24 Jan 2012 07:37:09 -0800 Subject: [Numpy-discussion] Strange error raised by scipy.special.erf In-Reply-To: <4F1EB9C8.9070309@crans.org> References: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local>, <4F1EB9C8.9070309@crans.org> Message-ID: <26FC23E7C398A64083C980D16001012D261F0D937A@VA3DIAXVS361.RED001.local> I filed a ticket (#1590). Thank you for the verification. Nadav. 
________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Pierre Haessig [pierre.haessig at crans.org] Sent: 24 January 2012 16:01 To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] Strange error raised by scipy.special.erf Le 22/01/2012 11:28, Nadav Horesh a ?crit : > >>> special.erf(26.5) > 1.0 > >>> special.erf(26.6) > Traceback (most recent call last): > File "<pyshell#7>", line 1, in <module> > special.erf(26.6) > FloatingPointError: underflow encountered in erf > >>> special.erf(26.7) > 1.0 > I can confirm this same behaviour with numpy 1.5.1/scipy 0.9.0 Indeed 26.5 and 26.7 works, while 26.6 raises the underflow... weird enough ! -- Pierre _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From wardefar at iro.umontreal.ca Tue Jan 24 11:19:21 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 11:19:21 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> Message-ID: <20120124161921.GA31456@ravage> On Tue, Jan 24, 2012 at 09:15:01AM +0000, Robert Kern wrote: > On Tue, Jan 24, 2012 at 08:37, Sturla Molden <sturla at molden.no> wrote: > > On 24.01.2012 09:21, Sturla Molden wrote: > > > >> randomkit.c handles C long correctly, I think. There are different codes > >> for 32 and 64 bit C long, and buffer sizes are size_t. > > > > distributions.c take C longs as parameters e.g. for the binomial > > distribution. mtrand.pyx correctly handles this, but it can give an > > unexpected overflow error on 64-bit Windows: > > > > > > In [1]: np.random.binomial(2**31, .5) > > --------------------------------------------------------------------------- > > OverflowError ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call last) > > C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>() > > ----> 1 np.random.binomial(2**31, .5) > > > > C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in > > mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)() > > > > OverflowError: Python int too large to convert to C long > > > > > > On systems where C longs are 64 bit, this is likely not to produce an > > error. > > > > This begs the question if also randomkit.c and districutions.c should be > > changed to use npy_intp for consistency across all platforms. > > There are two different uses of long that you need to distinguish. One > is for sizes, and one is for parameters and values. The sizes should > definitely be upgraded to npy_intp. The latter shouldn't; these should > remain as the default integer type of Python and numpy, a C long. Hmm. Seeing as the width of a C long is inconsistent, does this imply that the random number generator will produce different results on different platforms? Or do the state dynamics prevent it from ever growing in magnitude to the point where that's an issue? 
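For reference, the platform difference behind this question is easy to inspect from Python (a small sketch, not from the thread; the commented values are what one would expect on a typical 64-bit Linux build versus 64-bit Windows):

    import ctypes
    import numpy as np

    # width of a C long: 8 bytes on typical 64-bit Linux/OS X, 4 on 64-bit Windows
    print(ctypes.sizeof(ctypes.c_long))

    # width of npy_intp: 8 bytes on any 64-bit build
    print(np.dtype(np.intp).itemsize)

    # a parameter that fits a 64-bit C long but not a 32-bit one, which is why
    # np.random.binomial(2**31, .5) only overflows where C long is 32 bits
    print(2**31 > np.iinfo(np.dtype('l')).max)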
David From wardefar at iro.umontreal.ca Tue Jan 24 12:24:24 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 12:24:24 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E3AD5.9000507@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> Message-ID: <20120124172424.GB31456@ravage> On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: > Den 23.01.2012 22:08, skrev Christoph Gohlke: > > > > Maybe this explains the win-amd64 behavior: There are a couple of places > > in mtrand where array indices and sizes are C long instead of npy_intp, > > for example in the randint function: > > > > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> > > > > > > Both i and length could overflow here. It should overflow on allocation > of more than 2 GB. > > There is also a lot of C longs in the internal state (line 55-105), as > well as the other functions. > > Producing 2 GB of random ints twice fails: Sturla, since you seem to have access to Win64 machines, do you suppose you could try this code: >>> a = numpy.ones((1, 972)) >>> b = numpy.zeros((4993210,), dtype=int) >>> c = a[b] and verify that there's a whole lot of 0s in the matrix, specifically, >>> c[574519:].sum() 356.0 >>> c[574520:].sum() 0.0 is the case on Linux 64-bit; is it the case on Windows 64? Thanks a lot, David From robince at gmail.com Tue Jan 24 12:37:12 2012 From: robince at gmail.com (Robin) Date: Tue, 24 Jan 2012 18:37:12 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124172424.GB31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> Message-ID: <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> On Tue, Jan 24, 2012 at 6:24 PM, David Warde-Farley <wardefar at iro.umontreal.ca> wrote: > On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: >> Den 23.01.2012 22:08, skrev Christoph Gohlke: >> > >> > Maybe this explains the win-amd64 behavior: There are a couple of places >> > in mtrand where array indices and sizes are C long instead of npy_intp, >> > for example in the randint function: >> > >> > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> >> > >> > >> >> Both i and length could overflow here. It should overflow on allocation >> of more than 2 GB. >> >> There is also a lot of C longs in the internal state (line 55-105), as >> well as the other functions. >> >> Producing 2 GB of random ints twice fails: > > Sturla, since you seem to have access to Win64 machines, do you suppose you > could try this code: > >>>> a = numpy.ones((1, 972)) >>>> b = numpy.zeros((4993210,), dtype=int) >>>> c = a[b] > > and verify that there's a whole lot of 0s in the matrix, specifically, > >>>> c[574519:].sum() > 356.0 >>>> c[574520:].sum() > 0.0 > > is the case on Linux 64-bit; is it the case on Windows 64? Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. 
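A quick sanity check that is consistent with the 32-bit counter overflow identified later in the thread (plain arithmetic on the shapes quoted above):

    # elements in c = a[b]: b selects 4993210 rows of 972 values each
    n_elements = 4993210 * 972
    print(n_elements)              # 4853400120
    print(n_elements > 2**31 - 1)  # True, exceeds a signed 32-bit counter
    print(n_elements > 2**32 - 1)  # True, exceeds an unsigned one as well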
Cheers Robin From wardefar at iro.umontreal.ca Tue Jan 24 13:02:44 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 13:02:44 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> Message-ID: <20120124180244.GD31456@ravage> On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. Alright, so that rules out platform specific effects. I'll try and hunt the bug down when I have some time, if someone more familiar with the indexing code doesn't beat me to it. David From kmichael.aye at gmail.com Tue Jan 24 13:33:30 2012 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Tue, 24 Jan 2012 19:33:30 +0100 Subject: [Numpy-discussion] bug in numpy.mean() ? Message-ID: <jfmthq$jhh$1@dough.gmane.org> I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be: 2-D data file here: http://dl.dropbox.com/u/139035/data.npy Then: In [3]: data.mean() Out[3]: 3067.0243839999998 In [4]: data.max() Out[4]: 3052.4343 In [5]: data.shape Out[5]: (1000, 1000) In [6]: data.min() Out[6]: 3040.498 In [7]: data.dtype Out[7]: dtype('float32') A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results. Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD. Best regards, Michael From bsouthey at gmail.com Tue Jan 24 13:50:31 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 24 Jan 2012 12:50:31 -0600 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <4F1EFD77.2010004@gmail.com> On 01/24/2012 12:33 PM, K.-Michael Aye wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. 
> > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding the 32-bitfloat precision and, if you can, you need to increase precision of the accumulator in np.mean() or change the input dtype: >>> a.mean(dtype=np.float32) # default and lacks precision 3067.0243839999998 >>> a.mean(dtype=np.float64) 3045.747251076416 >>> a.mean(dtype=np.float128) 3045.7472510764160156 >>> b=a.astype(np.float128) >>> b.mean() 3045.7472510764160156 Otherwise you are left to using some alternative approach to calculate the mean. Bruce From Kathleen.M.Tacina at nasa.gov Tue Jan 24 13:53:27 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Tue, 24 Jan 2012 18:53:27 +0000 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <1327431207.6882.111.camel@MOSES.grc.nasa.gov> I have confirmed this on a 64-bit linux machine running python 2.7.2 with the development version of numpy. It seems to be related to using float32 instead of float64. If the array is first converted to a 64-bit float (via astype), mean gives an answer that agrees with your looped-calculation value: 3045.7472500000002. With the original 32-bit array, averaging successively on one axis and then on the other gives answers that agree with the 64-bit float answer to the second decimal place. In [125]: d = np.load('data.npy') In [126]: d.mean() Out[126]: 3067.0243839999998 In [127]: d64 = d.astype('float64') In [128]: d64.mean() Out[128]: 3045.747251076416 In [129]: d.mean(axis=0).mean() Out[129]: 3045.7487500000002 In [130]: d.mean(axis=1).mean() Out[130]: 3045.7444999999998 In [131]: np.version.full_version Out[131]: '2.0.0.dev-55472ca' -- On Tue, 2012-01-24 at 12:33 -0600, K.-MichaelA wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. > > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- -------------------------------------------------- Kathleen M. Tacina NASA Glenn Research Center MS 5-10 21000 Brookpark Road Cleveland, OH 44135 Telephone: (216) 433-6660 Fax: (216) 433-5802 -------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/9c41e8fd/attachment.html> From zachary.pincus at yale.edu Tue Jan 24 13:58:57 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 24 Jan 2012 13:58:57 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <919864B3-F3D0-4C18-8669-BB4DCB16172F@yale.edu> On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. I get the same result: In [1]: import numpy In [2]: data = numpy.load('data.npy') In [3]: data.mean() Out[3]: 3067.0243839999998 In [4]: data.max() Out[4]: 3052.4343 In [5]: data.min() Out[5]: 3040.498 In [6]: numpy.version.version Out[6]: '2.0.0.dev-433b02a' This on OS X 10.7.2 with Python 2.7.1, on an intel Core i7. Running python as a 32 vs. 64-bit process doesn't make a difference. The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological? This is fun too: In [12]: data.mean() Out[12]: 3067.0243839999998 In [13]: (data/3000).mean()*3000 Out[13]: 3020.8074375000001 In [15]: (data/2).mean()*2 Out[15]: 3067.0243839999998 In [16]: (data/200).mean()*200 Out[16]: 3013.6754000000001 Zach From kalatsky at gmail.com Tue Jan 24 14:01:40 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Tue, 24 Jan 2012 13:01:40 -0600 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> Just what Bruce said. 
You can run the following to confirm: np.mean(data - data.mean()) If for some reason you do not want to convert to float64 you can add the result of the previous line to the "bad" mean: bad_mean = data.mean() good_mean = bad_mean + np.mean(data - bad_mean) Val On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye <kmichael.aye at gmail.com>wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. > > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/32c5136b/attachment.html> From zachary.pincus at yale.edu Tue Jan 24 14:05:50 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 24 Jan 2012 14:05:50 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <4F1EFD77.2010004@gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <4F1EFD77.2010004@gmail.com> Message-ID: <57F1B495-BF3C-4C9B-9767-04A4F2AFF1DD@yale.edu> > You have a million 32-bit floating point numbers that are in the > thousands. Thus you are exceeding the 32-bitfloat precision and, if you > can, you need to increase precision of the accumulator in np.mean() or > change the input dtype: >>>> a.mean(dtype=np.float32) # default and lacks precision > 3067.0243839999998 >>>> a.mean(dtype=np.float64) > 3045.747251076416 >>>> a.mean(dtype=np.float128) > 3045.7472510764160156 >>>> b=a.astype(np.float128) >>>> b.mean() > 3045.7472510764160156 > > Otherwise you are left to using some alternative approach to calculate > the mean. > > Bruce Interesting -- I knew that float64 accumulators were used with integer arrays, and I had just assumed that 64-bit or higher accumulators would be used with floating-point arrays too, instead of the array's dtype. This is actually quite a bit of a gotcha for floating-point imaging-type tasks -- good to know! Zach From kmichael.aye at gmail.com Tue Jan 24 14:48:31 2012 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Tue, 24 Jan 2012 20:48:31 +0100 Subject: [Numpy-discussion] bug in numpy.mean() ? References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> Message-ID: <jfn1ug$kve$1@dough.gmane.org> Thank you Bruce and all, I knew I was doing something wrong (should have read the mean method doc more closely). Am of course glad that's so easy understandable. 
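The axis-wise means quoted earlier in the thread stay close to the true value because each partial float32 sum remains small; the same idea can be written out explicitly (a sketch, not from the thread; the function name and block size are made up):

    import numpy as np

    def blocked_mean(a, block=1024):
        # mean of a float32 array built from many short float32 block sums;
        # each block sum stays small enough to keep nearly full precision,
        # and the block sums are combined in Python floats (doubles)
        flat = a.ravel()
        total = sum(float(flat[i:i + block].sum())
                    for i in range(0, flat.size, block))
        return total / flat.size

For data like the file above this agrees with a.mean(dtype=np.float64) to well below the differences being discussed, while only ever forming small float32 sums.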
But: If the error can get so big, wouldn't it be a better idea for the accumulator to always be of type 'float64' and then convert later to the type of the original array? As one can see in this case, the result would be much closer to the true value. Michael On 2012-01-24 19:01:40 +0000, Val Kalatsky said: > > Just what Bruce said.? > > You can run the following to confirm: > np.mean(data - data.mean()) > > If for some reason you do not want to convert to float64 you can add > the result of the previous line to the "bad" mean: > bad_mean =?data.mean() > good_mean =?bad_mean +?np.mean(data - bad_mean) > > Val > > On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye > <kmichael.aye at gmail.com> wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. > > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/98be8f9e/attachment.html> From mmueller at python-academy.de Tue Jan 24 15:49:43 2012 From: mmueller at python-academy.de (=?ISO-8859-15?Q?Mike_M=FCller?=) Date: Tue, 24 Jan 2012 21:49:43 +0100 Subject: [Numpy-discussion] Course "Python for Scientists and Engineers" in Chicago Message-ID: <4F1F1967.3030209@python-academy.de> Course "Python for Scientists and Engineers" in Chicago ======================================================= There will be a comprehensive Python course for scientists and engineers in Chicago end of February / beginning of March 2012. It consists of a 3-day intro and a 2-day advanced section. Both sections can be taken separately or combined. More details below and here: http://www.dabeaz.com/chicago/science.html Please let friends or colleagues who might be interested in such a course know about it. 3-Day Intro Section ------------------- - Overview of Scientific and Technical Libraries for Python. - Numerical Calculations with NumPy - Storage and Processing of Large Amounts of Data - Graphical Presentation of Scientific Data with matplotlib - Object Oriented Programming for Scientific and Technical Projects - Open Time for Problem Solving 2-Day Advanced Section ---------------------- - Extending Python with Other Languages - Unit Testing - Version Control with Mercurial The Details ----------- The course is hosted by David Beazley (http://www.dabeaz.com). 
Date: Feb 27 - Mar 2, 2012 Location: Chicago, IL, USA Trainer: Mike M?ller Course Language: English Link: http://www.dabeaz.com/chicago/science.html From wardefar at iro.umontreal.ca Tue Jan 24 17:30:32 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 17:30:32 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124180244.GD31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> <20120124180244.GD31456@ravage> Message-ID: <20120124223032.GG31456@ravage> On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote: > On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > > > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. > > Alright, so that rules out platform specific effects. > > I'll try and hunt the bug down when I have some time, if someone more > familiar with the indexing code doesn't beat me to it. I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap is using an int for a counter variable where it should be using an npy_intp. I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a regression test. David From scipy at samueljohn.de Tue Jan 24 17:32:50 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 24 Jan 2012 23:32:50 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> Message-ID: <A75E31A0-FD6A-4DD9-89B4-6CEA57FF0468@samueljohn.de> On 23.01.2012, at 11:23, David Warde-Farley wrote: >> a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') >> b = numpy.random.randint(5000000,size=(4993210,)) >> c = a[b] >> In [14]: c[1000000:].sum() >> Out[14]: 0 Same here. Python 2.7.2, 64bit, Mac OS X (Lion), 8GB RAM, numpy.__version__ = 2.0.0.dev-55472ca [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)] Numpy built without llvm. From scipy at samueljohn.de Tue Jan 24 17:36:48 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 24 Jan 2012 23:36:48 +0100 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type In-Reply-To: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> References: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> Message-ID: <AD0C30C5-03F6-48D8-9E67-3EA70DB24628@samueljohn.de> I get the same results as you, Kathy. *surprised* (On OS X (Lion), 64 bit, numpy 2.0.0.dev-55472ca, Python 2.7.2. On 24.01.2012, at 16:29, Kathleen M Tacina wrote: > I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers between 2**63 and 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return uint64. Instead, I get object. Further experimenting showed that the largest integer for which np.min_scalar_type will return uint64 is 2**63-1. Is this expected behavior? 
> > On python 2.7.2 on a 64-bit linux machine: > >>> import numpy as np > >>> np.version.full_version > '2.0.0.dev-55472ca' > >>> np.min_scalar_type(2**8-1) > dtype('uint8') > >>> np.min_scalar_type(2**16-1) > dtype('uint16') > >>> np.min_scalar_type(2**32-1) > dtype('uint32') > >>> np.min_scalar_type(2**64-1) > dtype('O') > >>> np.min_scalar_type(2**63-1) > dtype('uint64') > >>> np.min_scalar_type(2**63) > dtype('O') > > I get the same results on a Windows XP machine running python 2.7.2 and numpy 1.6.1. > > Kathy > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scipy at samueljohn.de Tue Jan 24 17:44:10 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 24 Jan 2012 23:44:10 +0100 Subject: [Numpy-discussion] 'Advanced' save and restore operation In-Reply-To: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> References: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> Message-ID: <24F05A4F-D070-47D5-8F00-B5F9314B239A@samueljohn.de> I know you wrote that you want "TEXT" files, but never-the-less, I'd like to point to http://code.google.com/p/h5py/ . There are viewers for hdf5 and it is stable and widely used. Samuel On 24.01.2012, at 00:26, Emmanuel Mayssat wrote: > After having saved data, I need to know/remember the data dtype to > restore it correctly. > Is there a way to save the dtype with the data? > (I guess the header parameter of savedata could help, but they are > only available in v2.0+ ) > > I would like to save several related structured array and a dictionary > of parameters into a TEXT file. > Is there an easy way to do that? > (maybe xml file, or maybe archive zip file of other files, or ..... ) > > Any recommendation is helpful. > > Regards, > -- > Emmanuel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scipy at samueljohn.de Tue Jan 24 18:06:08 2012 From: scipy at samueljohn.de (Samuel John) Date: Wed, 25 Jan 2012 00:06:08 +0100 Subject: [Numpy-discussion] installing matplotlib in MacOs 10.6.8. In-Reply-To: <CAMW75YuPwz1BP+qPt++Ovu9R9y0WQvo1E25Qig0BV_eDcsGFeg@mail.gmail.com> References: <CAMW75YuPwz1BP+qPt++Ovu9R9y0WQvo1E25Qig0BV_eDcsGFeg@mail.gmail.com> Message-ID: <5C5F4BC6-E4F7-433F-8A45-BE6D44080DB9@samueljohn.de> Sorry for the late answer. But at least for the record: If you are using eclipse, I assume you have also installed the eclipse plugin [pydev](http://pydev.org/). Is use it myself, it's good. Then you have to go to the preferences->pydev->PythonInterpreter and select the python version you want to use by searching for the "Python" executable. I am not familiar with the pre-built versions of matplotlib. Perhaps they miss the 64bit intel versions? Perhaps you can find a lib (.so file) in matplotlib and use the "file" command to see the architectures, it was built for. You should be able to install matplotlib also with `pip install matplotlib`. (if you have pip) Samuel On 26.12.2011, at 06:40, Alex Ter-Sarkissov wrote: > hi everyone, I run python 2.7.2. in Eclipse (recently upgraded from 2.6). I have a problem with installing matplotlib (I found the version for python 2.7. MacOs 10.3, no later versions). If I run python in terminal using arch -i386 python, and then > > from matplotlib.pylab import * > > and similar stuff, everything works fine. 
If I run python in eclipse or just without arch -i386, I can import matplotlib as > > from matplotlib import * > > but actually nothing gets imported. If I do it in the same way as above, I get the message > > no matching architecture in universal wrapper > > which means there's conflict of versions or something like that. I tried reinstalling the interpreter and adding matplotlib to forced built-ins, but nothing helped. For some reason I didn't have this problem with numpy and tkinter. > > Any suggestions are appreciated. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From e.antero.tammi at gmail.com Tue Jan 24 18:12:06 2012 From: e.antero.tammi at gmail.com (eat) Date: Wed, 25 Jan 2012 01:12:06 +0200 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfn1ug$kve$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> Message-ID: <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> Hi, Oddly, but numpy 1.6 seems to behave more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 3045.7471999999998 In []: d.mean(dtype= np.float32) Out[]: 3045.7471999999998 In []: d.mean(dtype= np.float64) Out[]: 3045.747251076416 In []: (d- d.min()).mean()+ d.min() Out[]: 3045.7472508750002 In []: d.mean(axis= 0).mean() Out[]: 3045.7472499999999 In []: d.mean(axis= 1).mean() Out[]: 3045.7472499999999 Or does the results of calculations depend more on the platform? My 2 cents, eat -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/912a52b8/attachment.html> From Kathleen.M.Tacina at nasa.gov Tue Jan 24 18:21:35 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Tue, 24 Jan 2012 23:21:35 +0000 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail .com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> Message-ID: <1327447295.6882.139.camel@MOSES.grc.nasa.gov> I found something similar, with a very simple example. 
On 64-bit linux, python 2.7.2, numpy development version: In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) In [23]: a.mean() Out[23]: 4034.16357421875 In [24]: np.version.full_version Out[24]: '2.0.0.dev-55472ca' But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: >>>a = np.ones((1024,1024),dtype=np.float32) >>>a.mean() 4000.0 >>>np.version.full_version '1.6.1' On Tue, 2012-01-24 at 17:12 -0600, eat wrote: > Hi, > > > > Oddly, but numpy 1.6 seems to behave more consistent manner: > > > In []: sys.version > Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit > (Intel)]' > In []: np.version.version > Out[]: '1.6.0' > > > In []: d= np.load('data.npy') > In []: d.dtype > Out[]: dtype('float32') > > > In []: d.mean() > Out[]: 3045.7471999999998 > In []: d.mean(dtype= np.float32) > Out[]: 3045.7471999999998 > In []: d.mean(dtype= np.float64) > Out[]: 3045.747251076416 > In []: (d- d.min()).mean()+ d.min() > Out[]: 3045.7472508750002 > In []: d.mean(axis= 0).mean() > Out[]: 3045.7472499999999 > In []: d.mean(axis= 1).mean() > Out[]: 3045.7472499999999 > > > Or does the results of calculations depend more on the platform? > > > > > My 2 cents, > eat -- -------------------------------------------------- Kathleen M. Tacina NASA Glenn Research Center MS 5-10 21000 Brookpark Road Cleveland, OH 44135 Telephone: (216) 433-6660 Fax: (216) 433-5802 -------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/5cc422b3/attachment.html> From wardefar at iro.umontreal.ca Tue Jan 24 18:22:26 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 18:22:26 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> Message-ID: <20120124232226.GH31456@ravage> On Wed, Jan 25, 2012 at 01:12:06AM +0200, eat wrote: > Or does the results of calculations depend more on the platform? Floating point operations often do, sadly (not saying that this is the case here, but you'd need to try both versions on the same machine [or at least architecture/bit-width]/same platform to be certain). David From mwwiebe at gmail.com Tue Jan 24 18:49:05 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 15:49:05 -0800 Subject: [Numpy-discussion] einsum evaluation order In-Reply-To: <CAJO1x6qDxz5qe-=h-tDNwyk3Ywz-2i-_w7z5UnVfKHFHLWrYCQ@mail.gmail.com> References: <CAJO1x6qDxz5qe-=h-tDNwyk3Ywz-2i-_w7z5UnVfKHFHLWrYCQ@mail.gmail.com> Message-ID: <CAMRnEmqMKkCBg5vXVeEApOSGRNz8Dv9dW49YEeftYd_8GF-UCA@mail.gmail.com> On Tue, Jan 24, 2012 at 6:32 AM, S?ren Gammelmark <gammelmark at gmail.com>wrote: > Dear all, > > I was just looking into numpy.einsum and encountered an issue which might > be worth pointing out in the documentation. > > Let us say you wish to evaluate something like this (repeated indices a > summed) > > D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime, > sigma] * C[beta, betaprime] > > with einsum as > > einsum('abs,cds,bd->ac', A, B, C) > > then it is not exactly clear which order einsum evaluates the contractions > (or if it does it in one go). 
This can be very important since you can do > it in several ways, one of which has the least computational complexity. > The most efficient way of doing it is to contract e.g. A and C and then > contract that with B (or exchange A and B). A quick test on my labtop says > 2.6 s with einsum and 0.13 s for two tensordots with A and B being D x D x > 2 and C is D x D for D = 96. This scaling seems to explode for higher > dimensions, whereas it is much better with the two independent contractions > (i believe it should be O(D^3)).For D = 512 I could do it in 5 s with two > contractions, whereas I stopped waiting after 60 s for einsum (i guess > einsum probably is O(D^4) in this case). > You are correct, einsum presently uses the most naive evaluation. > I had in fact thought of making a function similar to einsum for a while, > but after I noticed it dropped it. I think, however, that there might still > be room for a tool for evaluating more complicated expressions efficiently. > I think the best way would be for the user to enter an expression like the > one above which is then evaluated in the optimal order. I know how to do > this (theoretically) if all the repeated indices only occur twice (like the > expression above), but for the more general expression supported by einsum > I om not sure how to do it (haven't thought about it). Here I am thinking > about stuff like x[i] = a[i] * b[i] and their more general counterparts (at > first glance this seems to be a simpler problem than full contractions). Do > you think there is a need/interest for this kind of thing? In that case I > would like the write it / help write it. Much of it, I think, can be > reduced to decomposing the expression into existing numpy operations (e.g. > tensordot). How to incorporate issues of storage layout etc, however, I > have no idea. > I think a good approach would be to modify einsum so it decomposes the expression into multiple products. It may even just be a simple dynamic programming problem, but I haven't given it much thought. In any case I think it might be nice to write explicitly how the expression > in einsum is evaluated in the docs. > That's a good idea, yes. Thanks, Mark > > S?ren Gammelmark > PhD-student > Department of Physics and Astronomy > Aarhus University > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/6b83d15a/attachment.html> From e.antero.tammi at gmail.com Tue Jan 24 19:21:10 2012 From: e.antero.tammi at gmail.com (eat) Date: Wed, 25 Jan 2012 02:21:10 +0200 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <1327447295.6882.139.camel@MOSES.grc.nasa.gov> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> Message-ID: <CAKa=AYQT6o5No5BC-jqaQcvtWXrpJBAp2fzEDTLF5VWYpVMsvQ@mail.gmail.com> Hi On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina < Kathleen.M.Tacina at nasa.gov> wrote: > ** > I found something similar, with a very simple example. 
> > On 64-bit linux, python 2.7.2, numpy development version: > > In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) > > In [23]: a.mean() > Out[23]: 4034.16357421875 > > In [24]: np.version.full_version > Out[24]: '2.0.0.dev-55472ca' > > > But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: > >>>a = np.ones((1024,1024),dtype=np.float32) > >>>a.mean() > 4000.0 > >>>np.version.full_version > '1.6.1' > This indeed looks very nasty, regardless of whether it is a version or platform related problem. -eat > > > > On Tue, 2012-01-24 at 17:12 -0600, eat wrote: > > Hi, > > > > Oddly, but numpy 1.6 seems to behave more consistent manner: > > > > In []: sys.version > > Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit > (Intel)]' > > In []: np.version.version > > Out[]: '1.6.0' > > > > In []: d= np.load('data.npy') > > In []: d.dtype > > Out[]: dtype('float32') > > > > In []: d.mean() > > Out[]: 3045.7471999999998 > > In []: d.mean(dtype= np.float32) > > Out[]: 3045.7471999999998 > > In []: d.mean(dtype= np.float64) > > Out[]: 3045.747251076416 > > In []: (d- d.min()).mean()+ d.min() > > Out[]: 3045.7472508750002 > > In []: d.mean(axis= 0).mean() > > Out[]: 3045.7472499999999 > > In []: d.mean(axis= 1).mean() > > Out[]: 3045.7472499999999 > > > > Or does the results of calculations depend more on the platform? > > > > > > My 2 cents, > > eat > > -- > -------------------------------------------------- > Kathleen M. Tacina > NASA Glenn Research Center > MS 5-10 > 21000 Brookpark Road > Cleveland, OH 44135 > Telephone: (216) 433-6660 > Fax: (216) 433-5802 > -------------------------------------------------- > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/4104b4db/attachment.html> From questions.anon at gmail.com Tue Jan 24 19:22:52 2012 From: questions.anon at gmail.com (questions anon) Date: Wed, 25 Jan 2012 11:22:52 +1100 Subject: [Numpy-discussion] numpy.percentile multiple arrays Message-ID: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> I need some help understanding how to loop through many arrays to calculate the 95th percentile. I can easily do this by using numpy.concatenate to make one big array and then finding the 95th percentile using numpy.percentile but this causes a memory error when I want to run this on 100's of netcdf files (see code below). Any alternative methods will be greatly appreciated. all_TSFC=[] for (path, dirs, files) in os.walk(MainFolder): for dir in dirs: print dir path=path+'/' for ncfile in files: if ncfile[-3:]=='.nc': print "dealing with ncfiles:", ncfile ncfile=os.path.join(path,ncfile) ncfile=Dataset(ncfile, 'r+', 'NETCDF4') TSFC=ncfile.variables['T_SFC'][:] ncfile.close() all_TSFC.append(TSFC) big_array=N.ma.concatenate(all_TSFC) Percentile95th=N.percentile(big_array, 95, axis=0) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/f0d2e515/attachment.html> From mwwiebe at gmail.com Tue Jan 24 19:33:44 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 16:33:44 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <CAMRnEmoWRVCYpMqR9JeFYWS0uM9t4HwJbfHS5ZH3PGMMUu5RBw@mail.gmail.com> 2012/1/21 Ond?ej ?ert?k <ondrej.certik at gmail.com> > <snip> > > Let me know if you figure out something. I think the "mask" thing is > quite slow, but the problem is that it needs to be there, to catch > overflows (and it is there in Fortran as well, see the > "where" statement, which does the same thing). Maybe there is some > other way to write the same thing in NumPy? > In the current master, you can replace z[mask] *= z[mask] z[mask] += c[mask] with np.multiply(z, z, out=z, where=mask) np.add(z, c, out=z, where=mask) The performance of this alternate syntax is still not great, but it is significantly faster than what it replaces. For a particular choice of mask, I get In [40]: timeit z[mask] *= z[mask] 10 loops, best of 3: 29.1 ms per loop In [41]: timeit np.multiply(z, z, out=z, where=mask) 100 loops, best of 3: 4.2 ms per loop -Mark > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/0a9719a7/attachment.html> From marc.shivers at gmail.com Tue Jan 24 19:55:48 2012 From: marc.shivers at gmail.com (Marc Shivers) Date: Tue, 24 Jan 2012 19:55:48 -0500 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> Message-ID: <CAGFio5ZC3WYK1c4O=OZ30PenfoPNQXxE-4veSQLsR0Y5kkgXeg@mail.gmail.com> This is probably not the best way to do it, but I think it would work: Your could take two passes through your data, first calculating and storing the median for each file and the number of elements in each file. From those data, you can get a lower bound on the 95th percentile of the combined dataset. For example, if all the files are the same size, and you've got 100 of them, then the 95th percentile of the full dataset would be at least as large as the 90th percentile of the individual file median values. Once you've got that cut-off value, go back through your files and just pull out the values larger than your cut-off value. Then you'd just need to figure out what percentile in this subset would correspond to the 95th percentile in the full dataset. HTH, Marc On Tue, Jan 24, 2012 at 7:22 PM, questions anon <questions.anon at gmail.com>wrote: > I need some help understanding how to loop through many arrays to > calculate the 95th percentile. > I can easily do this by using numpy.concatenate to make one big array and > then finding the 95th percentile using numpy.percentile but this causes a > memory error when I want to run this on 100's of netcdf files (see code > below). 
> Any alternative methods will be greatly appreciated. > > > all_TSFC=[] > for (path, dirs, files) in os.walk(MainFolder): > for dir in dirs: > print dir > path=path+'/' > for ncfile in files: > if ncfile[-3:]=='.nc': > print "dealing with ncfiles:", ncfile > ncfile=os.path.join(path,ncfile) > ncfile=Dataset(ncfile, 'r+', 'NETCDF4') > TSFC=ncfile.variables['T_SFC'][:] > ncfile.close() > all_TSFC.append(TSFC) > > big_array=N.ma.concatenate(all_TSFC) > Percentile95th=N.percentile(big_array, 95, axis=0) > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/bbb161f8/attachment.html> From mwwiebe at gmail.com Tue Jan 24 19:56:34 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 16:56:34 -0800 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type In-Reply-To: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> References: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> Message-ID: <CAMRnEmo=4OYisJKtqTnjnqDYV6KN6VDAEwFAFhD5QgfMwZWM2A@mail.gmail.com> On Tue, Jan 24, 2012 at 7:29 AM, Kathleen M Tacina < Kathleen.M.Tacina at nasa.gov> wrote: > ** > I was experimenting with np.min_scalar_type to make sure it worked as > expected, and found some unexpected results for integers between 2**63 and > 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return > uint64. Instead, I get object. Further experimenting showed that the > largest integer for which np.min_scalar_type will return uint64 is > 2**63-1. Is this expected behavior? > This is a bug in how numpy detects the dtype of python objects. https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/common.c#L18 You can see there it's only checking for a signed long long, not accounting for the unsigned case. I created a ticket for you here: http://projects.scipy.org/numpy/ticket/2028 -Mark > > On python 2.7.2 on a 64-bit linux machine: > >>> import numpy as np > >>> np.version.full_version > '2.0.0.dev-55472ca' > >>> np.min_scalar_type(2**8-1) > dtype('uint8') > >>> np.min_scalar_type(2**16-1) > dtype('uint16') > >>> np.min_scalar_type(2**32-1) > dtype('uint32') > >>> np.min_scalar_type(2**64-1) > dtype('O') > >>> np.min_scalar_type(2**63-1) > dtype('uint64') > >>> np.min_scalar_type(2**63) > dtype('O') > > I get the same results on a Windows XP machine running python 2.7.2 and > numpy 1.6.1. > > Kathy > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
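A sketch of the two-pass idea Marc describes, written for a list of in-memory arrays standing in for the per-file data (the helper name is made up, per-file 90th percentiles are used instead of medians to get a tighter cut-off, and heavy ties at the cut-off value are assumed away; in the real case each array would be read from its netcdf file inside the two loops, one file at a time):

    import numpy as np

    def percentile95_two_pass(arrays):
        # pass 1: total count and a cut-off that cannot exceed the global
        # 95th percentile (the minimum of the per-array 90th percentiles)
        n_total = sum(a.size for a in arrays)
        cutoff = min(np.percentile(a, 90) for a in arrays)
        # pass 2: keep only the tail above the cut-off, which is far smaller
        tail = np.concatenate([a[a > cutoff] for a in arrays])
        # map the global 95th-percentile rank onto the retained tail
        k = n_total - tail.size   # values discarded below the cut-off
        q = 100.0 * (0.95 * (n_total - 1) - k) / (tail.size - 1)
        return np.percentile(tail, q)

Absent ties at the cut-off, this reproduces np.percentile(np.concatenate(arrays), 95) while only ever concatenating the top part of the data.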
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/c4ea701a/attachment.html> From mwwiebe at gmail.com Tue Jan 24 19:59:22 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 16:59:22 -0800 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <CAB6mnxKgHGQQnmLXNxDj158cwURw=41WWcE2yV5MfwsK5vV_XA@mail.gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> <CAB6mnxKgHGQQnmLXNxDj158cwURw=41WWcE2yV5MfwsK5vV_XA@mail.gmail.com> Message-ID: <CAMRnEmqif+dS-EEJbL+tEQNH1oa_=XzEE9QPaG5fRjsTEC2uQw@mail.gmail.com> On Mon, Jan 16, 2012 at 8:14 AM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Mon, Jan 16, 2012 at 8:52 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com>wrote: >> >>> ** >>> On 01/14/2012 04:31 PM, Charles R Harris wrote: >>> >>> I've put up a pull request for a fix to ticket #1973. Currently the fix >>> simply propagates the maskna flag when the *.astype method is called. A >>> more complicated option would be to add a maskna keyword to specify whether >>> the output is masked or not or propagates the type of the source, but that >>> seems overly complex to me. >>> >>> Thoughts? >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> Thanks for the correction and as well as the fix. While it worked for >>> integer and floats (not complex ones), I got an error when using complex >>> dtypes. This error that is also present in array creation of complex >>> dtypes. Is this known or a new bug? >>> >>> If it is new, then we need to identify what functionality should handle >>> np.NA but are not working. >>> >>> Bruce >>> >>> $ python >>> Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) >>> [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. 
>>> >>> import numpy as np >>> >>> np.__version__ # pull request version >>> '2.0.0.dev-88f9276' >>> >>> np.array([1,2], dtype=np.complex) >>> array([ 1.+0.j, 2.+0.j]) >>> >>> np.array([1,2, np.NA], dtype=np.complex) >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >>> 1445, in array_repr >>> ', ', "array(") >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 459, in array2string >>> separator, prefix, formatter=formatter) >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 263, in _array2string >>> suppress_small), >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 724, in __init__ >>> self.real_format = FloatFormat(x.real, precision, suppress_small) >>> ValueError: Cannot construct a view of data together with the >>> NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> >>> ca=np.array([1,2], dtype=np.complex, maskna=True) >>> >>> ca[1]=np.NA >>> >>> ca >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >>> 1445, in array_repr >>> ', ', "array(") >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 459, in array2string >>> separator, prefix, formatter=formatter) >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 263, in _array2string >>> suppress_small), >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 724, in __init__ >>> self.real_format = FloatFormat(x.real, precision, suppress_small) >>> ValueError: Cannot construct a view of data together with the >>> NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> >>> >>> >>> >> Looks like a different bug involving the *.real and *.imag views. I'll >> take a look. >> >> > Looks like views of masked arrays have other problems: > > In [13]: a = ones(3, int16, maskna=1) > > In [14]: a.view(int8) > Out[14]: array([1, 0, 1, NA, 1, NA], dtype=int8) > > This looks like a serious bug to me, to avoid memory corruption issues it should raise an exception. -Mark > > I'm not sure what the policy should be here. One could construct a new > mask adapted to the view, raise an error when the types don't align (I > think the real/imag parts should be considered aligned), or just let the > view unmask the array. The last seems dangerous. Hmm... > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/dbb841ef/attachment.html> From brett.olsen at gmail.com Tue Jan 24 21:26:28 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Tue, 24 Jan 2012 20:26:28 -0600 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> Message-ID: <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> On Tue, Jan 24, 2012 at 6:22 PM, questions anon <questions.anon at gmail.com> wrote: > I need some help understanding how to loop through many arrays to calculate > the 95th percentile. 
> I can easily do this by using numpy.concatenate to make one big array and > then finding the 95th percentile using numpy.percentile but this causes a > memory error when I want to run this on 100's of netcdf files (see code > below). > Any alternative methods will be greatly appreciated. > > > all_TSFC=[] > for (path, dirs, files) in os.walk(MainFolder): >     for dir in dirs: >         print dir >     path=path+'/' >     for ncfile in files: >         if ncfile[-3:]=='.nc': >             print "dealing with ncfiles:", ncfile >             ncfile=os.path.join(path,ncfile) >             ncfile=Dataset(ncfile, 'r+', 'NETCDF4') >             TSFC=ncfile.variables['T_SFC'][:] >             ncfile.close() >             all_TSFC.append(TSFC) > > big_array=N.ma.concatenate(all_TSFC) > Percentile95th=N.percentile(big_array, 95, axis=0) If the range of your data is known and limited (i.e., you have a comparatively small number of possible values, but a number of repeats of each value) then you could do this by keeping a running cumulative distribution function as you go through each of your files. For each file, calculate a cumulative distribution function --- at each possible value, record the fraction of that population strictly less than that value --- and then it's straightforward to combine the cumulative distribution functions from two separate files: cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2) Then once you've gone through all the files, look for the value where your cumulative distribution function is equal to 0.95. If your data isn't structured with repeated values, though, this won't work, because your cumulative distribution function will become too big to hold into memory. In that case, what I would probably do would be an iterative approach: make an approximation to the exact function by removing some fraction of the possible values, which will provide a limited range for the exact percentile you want, and then walk through the files again calculating the function more exactly within the limited range, repeating until you have the value to the desired precision. ~Brett From questions.anon at gmail.com Tue Jan 24 22:49:46 2012 From: questions.anon at gmail.com (questions anon) Date: Wed, 25 Jan 2012 14:49:46 +1100 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> Message-ID: <CAN_=ogtfu=K2QxPSiooX9ekOPkX-2SweUTM_R72Aw6Q2kvk7-Q@mail.gmail.com> thanks for your responses, because of the size of the dataset I will still end up with the memory error if I calculate the median for each file, additionally the files are not all the same size. I believe this memory problem will still arise with the cumulative distribution calculation and not sure I understand how to write the second suggestion about the iterative approach but will have a go. Thanks again On Wed, Jan 25, 2012 at 1:26 PM, Brett Olsen <brett.olsen at gmail.com> wrote: > On Tue, Jan 24, 2012 at 6:22 PM, questions anon > <questions.anon at gmail.com> wrote: > > I need some help understanding how to loop through many arrays to > calculate > > the 95th percentile.
> > I can easily do this by using numpy.concatenate to make one big array and > > then finding the 95th percentile using numpy.percentile but this causes a > > memory error when I want to run this on 100's of netcdf files (see code > > below). > > Any alternative methods will be greatly appreciated. > > > > > > all_TSFC=[] > > for (path, dirs, files) in os.walk(MainFolder): > > for dir in dirs: > > print dir > > path=path+'/' > > for ncfile in files: > > if ncfile[-3:]=='.nc': > > print "dealing with ncfiles:", ncfile > > ncfile=os.path.join(path,ncfile) > > ncfile=Dataset(ncfile, 'r+', 'NETCDF4') > > TSFC=ncfile.variables['T_SFC'][:] > > ncfile.close() > > all_TSFC.append(TSFC) > > > > big_array=N.ma.concatenate(all_TSFC) > > Percentile95th=N.percentile(big_array, 95, axis=0) > > If the range of your data is known and limited (i.e., you have a > comparatively small number of possible values, but a number of repeats > of each value) then you could do this by keeping a running cumulative > distribution function as you go through each of your files. For each > file, calculate a cumulative distribution function --- at each > possible value, record the fraction of that population strictly less > than that value --- and then it's straightforward to combine the > cumulative distribution functions from two separate files: > cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2) > > Then once you've gone through all the files, look for the value where > your cumulative distribution function is equal to 0.95. If your data > isn't structured with repeated values, though, this won't work, > because your cumulative distribution function will become too big to > hold into memory. In that case, what I would probably do would be an > iterative approach: make an approximation to the exact function by > removing some fraction of the possible values, which will provide a > limited range for the exact percentile you want, and then walk through > the files again calculating the function more exactly within the > limited range, repeating until you have the value to the desired > precision. > > ~Brett > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/761fa871/attachment.html> From shish at keba.be Tue Jan 24 23:00:10 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 24 Jan 2012 23:00:10 -0500 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAN_=ogtfu=K2QxPSiooX9ekOPkX-2SweUTM_R72Aw6Q2kvk7-Q@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> <CAN_=ogtfu=K2QxPSiooX9ekOPkX-2SweUTM_R72Aw6Q2kvk7-Q@mail.gmail.com> Message-ID: <CAFXk4bpDUUwHk+d95wDBDQpiKscaTGKrezPqW=aGG7dOKS_3kQ@mail.gmail.com> Note that if you are ok with an approximate solution, and you can assume your data is somewhat shuffled, a simple online algorithm that uses no memory consists in: - choosing a small step size delta - initializing your percentile p to a more or less random value (a meaningful guess is better though) - iterate through your samples, updating p after each sample by p += 19 * delta if sample > p, and p -= delta otherwise The idea is that the 95th percentile is such that 5% of the data is higher, and 95% (19 times more) is lower, so if p is equal to this value, on average it should remain constant through the online update. You may do multiple passes if you are not confident in your initial value, possibly reducing delta over time to improve accuracy. -=- Olivier 2012/1/24 questions anon <questions.anon at gmail.com> > thanks for your responses, > because of the size of the dataset I will still end up with the memory > error if I calculate the median for each file, additionally the files are > not all the same size. I believe this memory problem will still arise with > the cumulative distribution calculation and not sure I understand how to > write the second suggestion about the iterative approach but will have a go. > Thanks again > > > On Wed, Jan 25, 2012 at 1:26 PM, Brett Olsen <brett.olsen at gmail.com>wrote: > >> On Tue, Jan 24, 2012 at 6:22 PM, questions anon >> <questions.anon at gmail.com> wrote: >> > I need some help understanding how to loop through many arrays to >> calculate >> > the 95th percentile. >> > I can easily do this by using numpy.concatenate to make one big array >> and >> > then finding the 95th percentile using numpy.percentile but this causes >> a >> > memory error when I want to run this on 100's of netcdf files (see code >> > below). >> > Any alternative methods will be greatly appreciated. >> > >> > >> > all_TSFC=[] >> > for (path, dirs, files) in os.walk(MainFolder): >> > for dir in dirs: >> > print dir >> > path=path+'/' >> > for ncfile in files: >> > if ncfile[-3:]=='.nc': >> > print "dealing with ncfiles:", ncfile >> > ncfile=os.path.join(path,ncfile) >> > ncfile=Dataset(ncfile, 'r+', 'NETCDF4') >> > TSFC=ncfile.variables['T_SFC'][:] >> > ncfile.close() >> > all_TSFC.append(TSFC) >> > >> > big_array=N.ma.concatenate(all_TSFC) >> > Percentile95th=N.percentile(big_array, 95, axis=0) >> >> If the range of your data is known and limited (i.e., you have a >> comparatively small number of possible values, but a number of repeats >> of each value) then you could do this by keeping a running cumulative >> distribution function as you go through each of your files. 
For each >> file, calculate a cumulative distribution function --- at each >> possible value, record the fraction of that population strictly less >> than that value --- and then it's straightforward to combine the >> cumulative distribution functions from two separate files: >> cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2) >> >> Then once you've gone through all the files, look for the value where >> your cumulative distribution function is equal to 0.95. If your data >> isn't structured with repeated values, though, this won't work, >> because your cumulative distribution function will become too big to >> hold into memory. In that case, what I would probably do would be an >> iterative approach: make an approximation to the exact function by >> removing some fraction of the possible values, which will provide a >> limited range for the exact percentile you want, and then walk through >> the files again calculating the function more exactly within the >> limited range, repeating until you have the value to the desired >> precision. >> >> ~Brett >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/f1723fc7/attachment.html> From josef.pktd at gmail.com Tue Jan 24 23:40:02 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 24 Jan 2012 23:40:02 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAKa=AYQT6o5No5BC-jqaQcvtWXrpJBAp2fzEDTLF5VWYpVMsvQ@mail.gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> <CAKa=AYQT6o5No5BC-jqaQcvtWXrpJBAp2fzEDTLF5VWYpVMsvQ@mail.gmail.com> Message-ID: <CAMMTP+DB=y5TJrigfZEkv97m96EQbJr+s-V0k9NtqEJEQWoJjw@mail.gmail.com> On Tue, Jan 24, 2012 at 7:21 PM, eat <e.antero.tammi at gmail.com> wrote: > Hi > > On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina < > Kathleen.M.Tacina at nasa.gov> wrote: > >> ** >> I found something similar, with a very simple example. >> >> On 64-bit linux, python 2.7.2, numpy development version: >> >> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) >> >> In [23]: a.mean() >> Out[23]: 4034.16357421875 >> >> In [24]: np.version.full_version >> Out[24]: '2.0.0.dev-55472ca' >> >> >> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: >> >>>a = np.ones((1024,1024),dtype=np.float32) >> >>>a.mean() >> 4000.0 >> >>>np.version.full_version >> '1.6.1' >> > This indeed looks very nasty, regardless of whether it is a version or > platform related problem. 
> Looks like platform specific, same result as -eat Windows 7, Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32 >>> a = np.ones((1024,1024),dtype=np.float32) >>> a.mean() 1.0 >>> (4000*a).dtype dtype('float32') >>> (4000*a).mean() 4000.0 >>> b = np.load("data.npy") >>> b.mean() 3045.7471999999998 >>> b.shape (1000, 1000) >>> b.mean(0).mean(0) 3045.7472499999999 >>> _.dtype dtype('float64') >>> b.dtype dtype('float32') >>> b.mean(dtype=np.float32) 3045.7471999999998 Josef > > -eat > >> >> >> >> On Tue, 2012-01-24 at 17:12 -0600, eat wrote: >> >> Hi, >> >> >> >> Oddly, but numpy 1.6 seems to behave more consistent manner: >> >> >> >> In []: sys.version >> >> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >> (Intel)]' >> >> In []: np.version.version >> >> Out[]: '1.6.0' >> >> >> >> In []: d= np.load('data.npy') >> >> In []: d.dtype >> >> Out[]: dtype('float32') >> >> >> >> In []: d.mean() >> >> Out[]: 3045.7471999999998 >> >> In []: d.mean(dtype= np.float32) >> >> Out[]: 3045.7471999999998 >> >> In []: d.mean(dtype= np.float64) >> >> Out[]: 3045.747251076416 >> >> In []: (d- d.min()).mean()+ d.min() >> >> Out[]: 3045.7472508750002 >> >> In []: d.mean(axis= 0).mean() >> >> Out[]: 3045.7472499999999 >> >> In []: d.mean(axis= 1).mean() >> >> Out[]: 3045.7472499999999 >> >> >> >> Or does the results of calculations depend more on the platform? >> >> >> >> >> >> My 2 cents, >> >> eat >> >> -- >> -------------------------------------------------- >> Kathleen M. Tacina >> NASA Glenn Research Center >> MS 5-10 >> 21000 Brookpark Road >> Cleveland, OH 44135 >> Telephone: (216) 433-6660 >> Fax: (216) 433-5802 >> -------------------------------------------------- >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/8df4f297/attachment.html> From charlesr.harris at gmail.com Wed Jan 25 00:03:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Jan 2012 22:03:49 -0700 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <1327447295.6882.139.camel@MOSES.grc.nasa.gov> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> Message-ID: <CAB6mnx+PQJwotuqFZVb0YXcv28SBuwojPwy0nWX-ELbWeAJdfQ@mail.gmail.com> On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina < Kathleen.M.Tacina at nasa.gov> wrote: > ** > I found something similar, with a very simple example. > > On 64-bit linux, python 2.7.2, numpy development version: > > In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) > > In [23]: a.mean() > Out[23]: 4034.16357421875 > > In [24]: np.version.full_version > Out[24]: '2.0.0.dev-55472ca' > > > But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: > >>>a = np.ones((1024,1024),dtype=np.float32) > >>>a.mean() > 4000.0 > >>>np.version.full_version > '1.6.1' > > > Yes, the results are platform/compiler dependent. 
The 32 bit platforms tend to use extended precision accumulators and the x87 instruction set. The 64 bit platforms tend to use sse2+. Different precisions, even though you might think they are the same. <snip> Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/c3c92e65/attachment.html> From josef.pktd at gmail.com Wed Jan 25 00:16:55 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Jan 2012 00:16:55 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAB6mnx+PQJwotuqFZVb0YXcv28SBuwojPwy0nWX-ELbWeAJdfQ@mail.gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> <CAB6mnx+PQJwotuqFZVb0YXcv28SBuwojPwy0nWX-ELbWeAJdfQ@mail.gmail.com> Message-ID: <CAMMTP+CRw7cqTzLxhvFTQNk_fST2Sj6mCTq_TbdNvswnQ8Gj9Q@mail.gmail.com> On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris <charlesr.harris at gmail.com> wrote: > > > On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina > <Kathleen.M.Tacina at nasa.gov> wrote: >> >> I found something similar, with a very simple example. >> >> On 64-bit linux, python 2.7.2, numpy development version: >> >> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) >> >> In [23]: a.mean() >> Out[23]: 4034.16357421875 >> >> In [24]: np.version.full_version >> Out[24]: '2.0.0.dev-55472ca' >> >> >> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: >> >>>a = np.ones((1024,1024),dtype=np.float32) >> >>>a.mean() >> 4000.0 >> >>>np.version.full_version >> '1.6.1' >> >> > > Yes, the results are platform/compiler dependent. The 32 bit platforms tend > to use extended precision accumulators and the x87 instruction set. The 64 > bit platforms tend to use sse2+. Different precisions, even though you might > think they are the same. just to confirm, same computer as before but the python 3.2 version is 64 bit, now I get the "Linux" result Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32 >>> import numpy as np >>> np.__version__ '1.5.1' >>> a = 4000*np.ones((1024,1024),dtype=np.float32) >>> a.mean() 4034.16357421875 >>> a.mean(0).mean(0) 4000.0 >>> a.mean(dtype=np.float64) 4000.0 Josef > > <snip> > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Wed Jan 25 04:10:22 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 25 Jan 2012 10:10:22 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124223032.GG31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> <20120124180244.GD31456@ravage> <20120124223032.GG31456@ravage> Message-ID: <4F1FC6FE.3040906@molden.no> On 24.01.2012 23:30, David Warde-Farley wrote: > I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap > is using an int for a counter variable where it should be using an npy_intp. 
> > I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a > regression test. That is great :) Now we just need to fix mtrand.pyx and all this will be gone. Sturla From charlesr.harris at gmail.com Wed Jan 25 07:40:23 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 25 Jan 2012 05:40:23 -0700 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124223032.GG31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> <20120124180244.GD31456@ravage> <20120124223032.GG31456@ravage> Message-ID: <CAB6mnxJA_zW8KOcLv0U-L=Z-qDFx-oTba3DhTXnstFJW8Fasnw@mail.gmail.com> On Tue, Jan 24, 2012 at 3:30 PM, David Warde-Farley < wardefar at iro.umontreal.ca> wrote: > On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote: > > On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > > > > > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. > > > > Alright, so that rules out platform specific effects. > > > > I'll try and hunt the bug down when I have some time, if someone more > > familiar with the indexing code doesn't beat me to it. > > I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap > is using an int for a counter variable where it should be using an > npy_intp. > > I've filed a pull request at https://github.com/numpy/numpy/pull/188 with > a > regression test. > > I think this bug, or one like it, was reported a couple of years ago. But I don't recall if there was ever a ticket opened. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/a7e6d505/attachment.html> From edcjones at comcast.net Wed Jan 25 10:12:23 2012 From: edcjones at comcast.net (Edward C. Jones) Date: Wed, 25 Jan 2012 10:12:23 -0500 Subject: [Numpy-discussion] Permuting sparse arrays Message-ID: <4F201BD7.7000301@comcast.net> I have a vector of bits where there are many more zeros than one. I store the array as a sorted list of the indexes where the bit is one. If the bit array is (0, 1, 0, 0, 0, 1, 1), it is stored as (1, 5, 6). If the bit array, b, has length n, and p is a random permutation of arange(n), then I can permute the bit array using fancy indexing: b[p]. Is there some neat trick I can use to permute an array while leaving it in the list-of-indexes form? Currently I am doing it with a Python loop but I am looking for a faster way. From robert.kern at gmail.com Wed Jan 25 10:23:51 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 25 Jan 2012 15:23:51 +0000 Subject: [Numpy-discussion] Permuting sparse arrays In-Reply-To: <4F201BD7.7000301@comcast.net> References: <4F201BD7.7000301@comcast.net> Message-ID: <CAF6FJiu7bxd-5-5eY5P4qBFPhyFjHAO99+CA5P-dBRLvjw4k9w@mail.gmail.com> On Wed, Jan 25, 2012 at 15:12, Edward C. Jones <edcjones at comcast.net> wrote: > I have a vector of bits where there are many more zeros than one. ?I > store the array as a sorted list of the indexes where the bit is one. > If the bit array is (0, 1, 0, 0, 0, 1, 1), it is stored as (1, 5, 6). 
> If the bit array, b, has length n, and p is a random permutation of > arange(n), then I can permute the bit array using fancy indexing: b[p]. > Is there some neat trick I can use to permute an array while leaving it > in the list-of-indexes form? ?Currently I am doing it with a Python loop > but I am looking for a faster way. Use argsort() to get the "inverse" of the permutation. Then fancy-index the inverse with the list-of-indexes array. [~/scratch] |28> b array([0, 1, 0, 0, 0, 1, 1]) [~/scratch] |29> loi array([1, 5, 6]) [~/scratch] |30> p = np.random.permutation(len(b)) [~/scratch] |31> ps = p.argsort() [~/scratch] |41> p array([2, 3, 5, 4, 6, 1, 0]) [~/scratch] |42> ps array([6, 5, 0, 1, 3, 2, 4]) [~/scratch] |43> ps[loi] array([5, 2, 4]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From mmueller at python-academy.de Wed Jan 25 10:47:07 2012 From: mmueller at python-academy.de (=?UTF-8?B?TWlrZSBNw7xsbGVy?=) Date: Wed, 25 Jan 2012 16:47:07 +0100 Subject: [Numpy-discussion] Matplotlib and optimization tutorials at PyCon US Message-ID: <4F2023FB.7000108@python-academy.de> Hi, I will be giving a matplotlib and a optimization tutorial at PyCon in March. The first tutorial is a compact introduction to matplotlib. The optimization tutorial gives an overview over this topic. BTW, the early bird deadline is today. Mike Plotting with matplotlib ------------------------ Instructor: Mike M?ller Type:Tutorial Audience level:Novice Category:Useful libraries March 8th 9 a.m. ? 12:20 p.m. https://us.pycon.org/2012/schedule/presentation/238/ When it comes to plotting with Python many people think about matplotlib. It is widely used and provides a simple interface for creating a wide variety of plots from very simple diagrams to sophisticated animations. This tutorial is a hands-on introduction that teaches the basics of matplotlib. Students will learn how to create publication-ready plots with just a few lines of Python. Faster Python Programs through Optimization ------------------------------------------- Instructor: Mike M?ller Type:Tutorial Audience level:Experienced Category:Best Practices/Patterns March 7th 9 a.m. ? 12:20 p.m. https://us.pycon.org/2012/schedule/presentation/245/ This tutorial provides an overview of techniques to improve the performance of Python programs. The focus is on concepts such as profiling, difference of data structures and algorithms as well as a selection of tools and libraries that help to speed up Python. From emayssat at gmail.com Wed Jan 25 13:10:27 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Wed, 25 Jan 2012 10:10:27 -0800 Subject: [Numpy-discussion] array metadata Message-ID: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> Is there a way to store metadata for an array? For example, date the samples were collected, name of the operator, etc. Regards, -- Emmanuel From kalatsky at gmail.com Wed Jan 25 13:47:28 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Wed, 25 Jan 2012 12:47:28 -0600 Subject: [Numpy-discussion] array metadata In-Reply-To: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> Message-ID: <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> I believe there are no provisions made for that in ndarray. But you can subclass ndarray. 
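For illustration, a minimal sketch of that subclassing approach (the class name and metadata keys below are invented, not an established API); it follows the pattern from the numpy basics.subclassing documentation, using __array_finalize__ so the metadata dict also survives views and slices:

import numpy as np

class MetaArray(np.ndarray):
    # hypothetical ndarray subclass that carries a metadata dict
    def __new__(cls, input_array, metadata=None):
        obj = np.asarray(input_array).view(cls)
        obj.metadata = {} if metadata is None else dict(metadata)
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # propagate the dict to views and slices of the array
        self.metadata = getattr(obj, 'metadata', {})

a = MetaArray(np.arange(5.0), metadata={'operator': 'someone', 'date': '2012-01-25'})
print(a[1:3].metadata)    # a view keeps the same metadata dict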
Val On Wed, Jan 25, 2012 at 12:10 PM, Emmanuel Mayssat <emayssat at gmail.com>wrote: > Is there a way to store metadata for an array? > For example, date the samples were collected, name of the operator, etc. > > Regards, > -- > Emmanuel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/b0d61ae1/attachment.html> From chris.barker at noaa.gov Wed Jan 25 14:27:10 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 25 Jan 2012 11:27:10 -0800 Subject: [Numpy-discussion] view of a structured array? Message-ID: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> HI folks, Is there a way to get a view of a subset of a structured array? I know that an arbitrary subset will not fit into the numpy "strides"offsets" model, but some will, and it would be nice to have a view: For example, here we have a stuctured array: In [56]: a Out[56]: array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) if I pull out one "field" a get a view: In [57]: b = a['f1'] In [58]: b[0] = 1000 In [59]: a Out[59]: array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) However, if I pull out more than one field, I get a copy: In [60]: b = a[['f1','f2']] In [61]: b Out[61]: array([(1000.0, 3.0), (8.0, 9.0), (123.4, 7.0), (10.0, 11.0), (14.0, 15.0)], dtype=[('f1', '<f8'), ('f2', '<f8')]) In [62]: b[1] = (2000,3000) In [63]: b Out[63]: array([(1000.0, 3.0), (2000.0, 3000.0), (123.4, 7.0), (10.0, 11.0), (14.0, 15.0)], dtype=[('f1', '<f8'), ('f2', '<f8')]) In [64]: a Out[64]: array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) However, in this case, the two fields are contiguous, and thus I'm pretty sure one could build a numpy array that was a view. Is there any way to do so? Ideally without manipulating the strides by hand, but I may want to do that if it's the only way. -Chris -- -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Wed Jan 25 14:33:34 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Jan 2012 14:33:34 -0500 Subject: [Numpy-discussion] view of a structured array? In-Reply-To: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> References: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> Message-ID: <CAMMTP+DLo_P8Kc98vQCPsm+dsjceRS0D1a5R6Lpy1U+T0yX4UA@mail.gmail.com> On Wed, Jan 25, 2012 at 2:27 PM, Chris Barker <chris.barker at noaa.gov> wrote: > HI folks, > > Is there a way to get a view of a subset of a structured array? 
I know > that an arbitrary subset will not fit into the numpy "strides"offsets" > model, but some will, and it would be nice to have a view: > > For example, here we have a stuctured array: > > In [56]: a > Out[56]: > array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > > if I pull out one "field" a get a view: > > In [57]: b = a['f1'] > > In [58]: b[0] = 1000 > > In [59]: a > Out[59]: > array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > However, if I pull out more than one field, I get a copy: > > In [60]: b = a[['f1','f2']] > > In [61]: b > Out[61]: > array([(1000.0, 3.0), (8.0, 9.0), (123.4, 7.0), (10.0, 11.0), (14.0, 15.0)], > ? ? ?dtype=[('f1', '<f8'), ('f2', '<f8')]) > > In [62]: b[1] = (2000,3000) > > In [63]: b > Out[63]: > array([(1000.0, 3.0), (2000.0, 3000.0), (123.4, 7.0), (10.0, 11.0), > ? ? ? (14.0, 15.0)], > ? ? ?dtype=[('f1', '<f8'), ('f2', '<f8')]) > > In [64]: a > Out[64]: > array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > > However, in this case, the two fields are contiguous, and thus I'm > pretty sure one could build a numpy array that was a view. Is there > any way to do so? Ideally without manipulating the strides by hand, > but I may want to do that if it's the only way. > > -Chris > that's what I would try: >>> b = a.view(dtype=[('i', '<i4'), ('fl',[('f1', '<f8'), ('f2', '<f8')]), ('i2', '<i4')]) >>> b['fl'] array([(2.0, 3.0), (8.0, 9.0), (123.40000000000001, 7.0), (10.0, 11.0), (14.0, 15.0)], dtype=[('f1', '<f8'), ('f2', '<f8')]) >>> b['fl'][2]= (200, 500) >>> a array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 200.0, 500.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) Josef > > > > -- > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice > 7600 Sand Point Way NE ??(206) 526-6329?? fax > Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From Kathleen.M.Tacina at nasa.gov Wed Jan 25 15:30:39 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Wed, 25 Jan 2012 20:30:39 +0000 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type In-Reply-To: <CAMRnEmo=4OYisJKtqTnjnqDYV6KN6VDAEwFAFhD5QgfMwZWM2A@mail.gmail .com> References: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> <CAMRnEmo=4OYisJKtqTnjnqDYV6KN6VDAEwFAFhD5QgfMwZWM2A@mail.gmail.com> Message-ID: <1327523439.6882.249.camel@MOSES.grc.nasa.gov> Thanks! It was interesting to see why that happened. Kathy On Tue, 2012-01-24 at 18:56 -0600, Mark Wiebe wrote: > On Tue, Jan 24, 2012 at 7:29 AM, Kathleen M Tacina > <Kathleen.M.Tacina at nasa.gov> wrote: > > I was experimenting with np.min_scalar_type to make sure it > worked as expected, and found some unexpected results for > integers between 2**63 and 2**64-1. I would have expected > np.min_scalar_type(2**64-1) to return uint64. Instead, I get > object. 
Further experimenting showed that the largest integer > for which np.min_scalar_type will return uint64 is 2**63-1. > Is this expected behavior? > > > > This is a bug in how numpy detects the dtype of python objects. > > > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/common.c#L18 > > > You can see there it's only checking for a signed long long, not > accounting for the unsigned case. I created a ticket for you here: > > > http://projects.scipy.org/numpy/ticket/2028 > > > -Mark > > > On python 2.7.2 on a 64-bit linux machine: > >>> import numpy as np > >>> np.version.full_version > '2.0.0.dev-55472ca' > >>> np.min_scalar_type(2**8-1) > dtype('uint8') > >>> np.min_scalar_type(2**16-1) > dtype('uint16') > >>> np.min_scalar_type(2**32-1) > dtype('uint32') > >>> np.min_scalar_type(2**64-1) > dtype('O') > >>> np.min_scalar_type(2**63-1) > dtype('uint64') > >>> np.min_scalar_type(2**63) > dtype('O') > > I get the same results on a Windows XP machine running python > 2.7.2 and numpy 1.6.1. > > Kathy > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/961cc37a/attachment.html> From chris.barker at noaa.gov Wed Jan 25 18:19:33 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 25 Jan 2012 15:19:33 -0800 Subject: [Numpy-discussion] view of a structured array? In-Reply-To: <CAMMTP+DLo_P8Kc98vQCPsm+dsjceRS0D1a5R6Lpy1U+T0yX4UA@mail.gmail.com> References: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> <CAMMTP+DLo_P8Kc98vQCPsm+dsjceRS0D1a5R6Lpy1U+T0yX4UA@mail.gmail.com> Message-ID: <CALGmxELqtthmQLNDFrKu+kXMmuPFCn8HvtmPN3cKt8EsNaYB1w@mail.gmail.com> On Wed, Jan 25, 2012 at 11:33 AM, <josef.pktd at gmail.com> wrote: > that's what I would try: > >>>> b = a.view(dtype=[('i', '<i4'), ('fl',[('f1', '<f8'), ('f2', '<f8')]), ('i2', '<i4')]) ah yes, I forgot about nesting dtypes -- very nice, thanks! -Chris >>>> b['fl'] > array([(2.0, 3.0), (8.0, 9.0), (123.40000000000001, 7.0), (10.0, 11.0), > ? ? ? (14.0, 15.0)], > ? ? ?dtype=[('f1', '<f8'), ('f2', '<f8')]) >>>> b['fl'][2]= (200, 500) >>>> a > array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 200.0, 500.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > Josef >> >> >> >> -- >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice >> 7600 Sand Point Way NE ??(206) 526-6329?? fax >> Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception >> >> Chris.Barker at noaa.gov >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From ondrej.certik at gmail.com Wed Jan 25 21:56:31 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Wed, 25 Jan 2012 18:56:31 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAMRnEmoWRVCYpMqR9JeFYWS0uM9t4HwJbfHS5ZH3PGMMUu5RBw@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAMRnEmoWRVCYpMqR9JeFYWS0uM9t4HwJbfHS5ZH3PGMMUu5RBw@mail.gmail.com> Message-ID: <CADDwiVA71zL1WMjX-89_3vaWye9dXTVzDKr2ix+2DoKaM1c6bg@mail.gmail.com> On Tue, Jan 24, 2012 at 4:33 PM, Mark Wiebe <mwwiebe at gmail.com> wrote: > 2012/1/21 Ond?ej ?ert?k <ondrej.certik at gmail.com> >> >> <snip> >> >> >> Let me know if you figure out something. I think the "mask" thing is >> quite slow, but the problem is that it needs to be there, to catch >> overflows (and it is there in Fortran as well, see the >> "where" statement, which does the same thing). Maybe there is some >> other way to write the same thing in NumPy? > > > In the current master, you can replace > > ? ? z[mask] *= z[mask] > ? ? z[mask] += c[mask] > with > ? ? np.multiply(z, z, out=z, where=mask) > ? ? np.add(z, c, out=z, where=mask) I am getting: Traceback (most recent call last): File "b.py", line 19, in <module> np.multiply(z, z, out=z, where=mask) TypeError: 'where' is an invalid keyword to ufunc 'multiply' I assume it is a new feature in numpy? > > The performance of this alternate syntax is still not great, but it is > significantly faster than what it replaces. For a particular choice of mask, > I get > > In [40]: timeit z[mask] *= z[mask] > > 10 loops, best of 3: 29.1 ms per loop > > In [41]: timeit np.multiply(z, z, out=z, where=mask) > > 100 loops, best of 3: 4.2 ms per loop That looks like 7x faster to me. If it works for you, can you run the mandelbrot example with and without your patch? That way we'll know the actual speedup. Ondrej From sturla at molden.no Thu Jan 26 04:19:25 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 26 Jan 2012 10:19:25 +0100 Subject: [Numpy-discussion] OT: MS C++ AMP library Message-ID: <4F211A9D.3060108@molden.no> When we have nice libraries like OpenCL, OpenGL and OpenMP, I am so glad we have Microsoft to screw it up. Congratulations to Redmond: Another C++ API I cannot read, and a scientific compute library I hopefully never have to use. http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx The annoying part is, with this crap there will never be a standard OpenCL DLL in Windows. Sturla Molden From matthieu.brucher at gmail.com Thu Jan 26 04:24:58 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 26 Jan 2012 10:24:58 +0100 Subject: [Numpy-discussion] OT: MS C++ AMP library In-Reply-To: <4F211A9D.3060108@molden.no> References: <4F211A9D.3060108@molden.no> Message-ID: <CAHCaCk+6N2o0FHw4V7ghFFgxWZmqjT9R9rGzXUWJ3sNSWSLgjw@mail.gmail.com> Hi Sturla, It has been several months now since AMP is there, I wouldn't care about it. You also forgot about OpenAcc, the accelerator sister of OpenMP, Intel's PBB (with TBB, IPP, ArBB that will soon make a step in Numpy's world), OmpSS, and so many others. I wouldn't blame MS for this, IMHO Intel does a far better job at the moment, and we are only starting consolidation now that everyone has shown its cards. 
Cheers, Matthieu 2012/1/26 Sturla Molden <sturla at molden.no> > > When we have nice libraries like OpenCL, OpenGL and OpenMP, I am so glad > we have Microsoft to screw it up. > > Congratulations to Redmond: Another C++ API I cannot read, and a > scientific compute library I hopefully never have to use. > > http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx > > The annoying part is, with this crap there will never be a standard > OpenCL DLL in Windows. > > Sturla Molden > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120126/54554c73/attachment.html> From paul.anton.letnes at gmail.com Thu Jan 26 07:30:44 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 26 Jan 2012 13:30:44 +0100 Subject: [Numpy-discussion] array metadata In-Reply-To: <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> Message-ID: <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> If by "store" you mean "store on disk", I recommend h5py datasets and attributes. Reportedly pytables is also good but I don't have any first hand experience there. Both python modules use the hdf5 library, written in C/C++/Fortran. Paul On Wed, Jan 25, 2012 at 7:47 PM, Val Kalatsky <kalatsky at gmail.com> wrote: > > I believe there are no provisions made for that in ndarray. > But you can subclass ndarray. > Val > > > On Wed, Jan 25, 2012 at 12:10 PM, Emmanuel Mayssat <emayssat at gmail.com> > wrote: >> >> Is there a way to store metadata for an array? >> For example, date the samples were collected, name of the operator, etc. >> >> Regards, >> -- >> Emmanuel >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From scipy at samueljohn.de Thu Jan 26 08:04:49 2012 From: scipy at samueljohn.de (Samuel John) Date: Thu, 26 Jan 2012 14:04:49 +0100 Subject: [Numpy-discussion] OT: MS C++ AMP library In-Reply-To: <4F211A9D.3060108@molden.no> References: <4F211A9D.3060108@molden.no> Message-ID: <D680AB77-58AB-4A28-A5F8-6192BD1F563D@samueljohn.de> Yes, I agree 100%. On 26.01.2012, at 10:19, Sturla Molden wrote: > When we have nice libraries like OpenCL, OpenGL and OpenMP, I am so glad > we have Microsoft to screw it up. > > Congratulations to Redmond: Another C++ API I cannot read, and a > scientific compute library I hopefully never have to use. > > http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx > > The annoying part is, with this crap there will never be a standard > OpenCL DLL in Windows. 
> > Sturla Molden From pierre.haessig at crans.org Thu Jan 26 08:19:18 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 26 Jan 2012 14:19:18 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> Message-ID: <4F2152D6.10303@crans.org> Le 22/01/2012 01:40, josef.pktd at gmail.com a ?crit : > same here, > When I rewrote scipy.stats.spearmanr, I matched the numpy behavior for > two arrays, while R only returns the cross-correlation part. Since I've seen no negative feedback, I jumped to the next step by creating a Trac account and posting a new ticket : http://projects.scipy.org/numpy/ticket/2031 If people feel ok with this proposal, I can try to expand the proposed implementation skeleton to something more serious. But maybe Elliot has already something ready to pull-request on GitHub ? Pierre From derek at astro.physik.uni-goettingen.de Thu Jan 26 08:49:58 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Thu, 26 Jan 2012 14:49:58 +0100 Subject: [Numpy-discussion] array metadata In-Reply-To: <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> Message-ID: <21E941F9-86A1-4468-9645-3F48B54BC43B@astro.physik.uni-goettingen.de> On 26 Jan 2012, at 13:30, Paul Anton Letnes wrote: > If by "store" you mean "store on disk", I recommend h5py datasets and > attributes. Reportedly pytables is also good but I don't have any > first hand experience there. Both python modules use the hdf5 library, > written in C/C++/Fortran. > > Paul > > On Wed, Jan 25, 2012 at 7:47 PM, Val Kalatsky <kalatsky at gmail.com> wrote: >> >> I believe there are no provisions made for that in ndarray. >> But you can subclass ndarray. > You could probably use structured arrays with string and datetype fields for the metadata and multidimensional fields (i.e. effectively subarrays within the structured array) for the actual data. For file storage, they could probably be directly saved as .npy, if interoperability is not a concern. Otherwise I'd also highly recommend hdf5; with both h5py and pytables allowing quite transparent conversion of structured arrays to datasets in the HDF5, but you also have the option to store other objects, like dictionary elements, within the same data structure. Pytables is generally regarded as having a more database-oriented approach, while h5py appears more straightforward to use from a numerics background (at least in my experience). 
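As a rough illustration of the h5py route (the file name, dataset name and attribute keys here are invented): the samples go into a dataset and the metadata ride along as HDF5 attributes next to it, e.g.

import numpy as np
import h5py

data = np.random.rand(90, 720)

with h5py.File('samples.h5', 'w') as f:
    dset = f.create_dataset('T_SFC', data=data)
    dset.attrs['date'] = '2012-01-25'      # metadata stored with the dataset
    dset.attrs['operator'] = 'someone'

with h5py.File('samples.h5', 'r') as f:
    print(f['T_SFC'].attrs['operator'])

A structured array can be passed to create_dataset in the same way, since h5py maps compound dtypes to HDF5 compound types.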
Cheers, Derek From bsouthey at gmail.com Thu Jan 26 09:57:06 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 08:57:06 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F2152D6.10303@crans.org> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> Message-ID: <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> On Thu, Jan 26, 2012 at 7:19 AM, Pierre Haessig <pierre.haessig at crans.org> wrote: > Le 22/01/2012 01:40, josef.pktd at gmail.com a ?crit : >> same here, >> When I rewrote scipy.stats.spearmanr, I matched the numpy behavior for >> two arrays, while R only returns the cross-correlation part. > Since I've seen no negative feedback, I jumped to the next step by > creating a Trac account and posting a new ticket : > > http://projects.scipy.org/numpy/ticket/2031 > > If people feel ok with this proposal, I can try to expand the proposed > implementation skeleton to something more serious. But maybe Elliot has > already something ready to pull-request on GitHub ? > > Pierre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Really I do not understand what you want to do especially when the ticket contains some very basic errors. Can you please provide a couple of real examples with expected output that clearly show what you want? >From a statistical viewpoint, np.cov is correct because it outputs the variance/covariance matrix. Also I believe that changing the np.cov function will cause major havoc because numpy and people's code depend on the current behavior. Bruce From pav at iki.fi Thu Jan 26 10:50:17 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 26 Jan 2012 16:50:17 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> Message-ID: <jfrsnq$6i8$1@dough.gmane.org> 26.01.2012 15:57, Bruce Southey kirjoitti: [clip] > Also I believe that changing the np.cov > function will cause major havoc because numpy and people's code depend > on the current behavior. Changing the behavior of `cov` is IMHO not really possible at this point --- the current behavior is not a bug, but a documented feature that has been around probably already since Numeric. However, adding a new function could be possible. 
-- Pauli Virtanen From pierre.haessig at crans.org Thu Jan 26 11:07:15 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 26 Jan 2012 17:07:15 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> Message-ID: <4F217A33.6020806@crans.org> Le 26/01/2012 15:57, Bruce Southey a écrit : > Can you please provide a > couple of real examples with expected output that clearly show what > you want? > Hi Bruce, Thanks for your ticket feedback ! It's precisely because I see a big potential impact of the proposed change that I sent first a ML message, second a ticket, before jumping to a pull-request like one of Sergio Leone's cowboys (sorry, I watched "for a few dollars more" last weekend...) Now, I realize that in the ticket writing I made the wrong trade-off between conciseness and accuracy, which led to some of the errors you raised. Let me use your example to try to share what I have in mind. > >> X = array([-2.1, -1. , 4.3]) > >> Y = array([ 3. , 1.1 , 0.12]) Indeed, with today's cov behavior we have a 2x2 array: > >> cov(X,Y) array([[ 11.71 , -4.286 ], [ -4.286 , 2.14413333]]) Now, when I used the word 'concatenation', I wasn't precise enough because I meant assembling X and Y in the sense of 2 vectors of observations from 2 random variables X and Y. This is achieved by concatenate(X,Y) *when properly playing with dimensions* (which I didn't mention) : > >> XY = np.concatenate((X[None, :], Y[None, :])) array([[-2.1 , -1. , 4.3 ], [ 3. , 1.1 , 0.12]]) In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". > >> np.cov(XY) array([[ 11.71 , -4.286 ], [ -4.286 , 2.14413333]]) (And indeed, the actual cov Python code does use concatenate() ) Now let me come back to my assertion about this behavior's *usefulness*. You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 simple scalar blocks). * diagonal blocks are just cov(X) and cov(Y) (which in this case come to var(X) and var(Y) when setting ddof to 1) * off-diagonal blocks are symmetric and are actually the covariance estimate of the X, Y observations (from http://en.wikipedia.org/wiki/Covariance) that is : > >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) -4.2860000000000005 The new proposed behaviour for cov is that cov(X,Y) would return : array(-4.2860000000000005) instead of the 2x2 matrix. * This would be in line with the cov(X,Y) mathematical definition, as well as with R behavior. * This would save memory and computing resources. (and therefore help save the planet ;-) ) However, I do understand that the impact for this change may be big. This indeed requires careful reviewing.
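To make the alternative concrete, here is one possible sketch of a separate cross-covariance helper (the name xcov and its signature are only illustrative, not an agreed interface): it returns just the off-diagonal block that np.cov(m, y) currently embeds in the full joint matrix.

import numpy as np

def xcov(m, y, ddof=1):
    # illustrative only: rows are variables, columns are observations
    m = np.atleast_2d(m)
    y = np.atleast_2d(y)
    mc = m - m.mean(axis=1)[:, None]
    yc = y - y.mean(axis=1)[:, None]
    return np.dot(mc, yc.T) / (m.shape[1] - ddof)

X = np.array([-2.1, -1. , 4.3])
Y = np.array([ 3. , 1.1 , 0.12])
print(xcov(X, Y))    # [[-4.286]], the off-diagonal block of np.cov(X, Y)

For the 1-d inputs above this gives a 1x1 array rather than a scalar; whether such a function should squeeze its result is exactly the kind of API detail that would need review.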
Pierre

From pierre.haessig at crans.org  Thu Jan 26 11:25:38 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Thu, 26 Jan 2012 17:25:38 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <jfrsnq$6i8$1@dough.gmane.org>
References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com>
	<CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com>
	<CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com>
	<4F2152D6.10303@crans.org>
	<CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com>
	<jfrsnq$6i8$1@dough.gmane.org>
Message-ID: <4F217E82.2050002@crans.org>

On 26/01/2012 16:50, Pauli Virtanen wrote:
> the current behavior is not a bug,
I completely agree that numpy.cov(m, y) does what it says! I (and
apparently some other people) am only questioning why there is such a
behavior.

Indeed, the second variable `y` is presented as "An additional set of
variables and observations". This raises two different questions for me:
 * What is the use case for such an additional set of variables that
   could just be concatenated to the first set `m`?
 * Or, if this sort of integrated concatenation really is useful, why
   add just one "additional set" and not several "additional sets", like
   >>> cov(m, y1, y2, y3, ...) ?

But I would understand that numpy's responsibility to provide a stable
computing API would prevent any change in cov's behavior. You have the
long-term experience to judge that. (I certainly don't ;-))

However, in case this change is not possible, I would see this solution:
 * add an xcov function that does what Elliot and Sturla and I described
 * possibly deprecate the `y` 2nd argument of cov, because I feel it
   brings more definitional complication than real programming benefit
(but I still find that changing cov would lead to a leaner numpy API,
which was my motivation for reacting to Elliot's first message)

Pierre

From sturla at molden.no  Thu Jan 26 12:26:32 2012
From: sturla at molden.no (Sturla Molden)
Date: Thu, 26 Jan 2012 18:26:32 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <4F217E82.2050002@crans.org>
References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com>
	<CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com>
	<CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com>
	<4F2152D6.10303@crans.org>
	<CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com>
	<jfrsnq$6i8$1@dough.gmane.org>
	<4F217E82.2050002@crans.org>
Message-ID: <4F218CC8.9070602@molden.no>

On 26.01.2012 17:25, Pierre Haessig wrote:
> However, in case this change is not possible, I would see this
> solution:
> * add an xcov function that does what Elliot and Sturla and I
> described

The current np.cov implementation returns the cross-covariance the way
it is commonly used in statistics. If MATLAB does not, then that is
MATLAB's problem, I think.
http://www.stat.washington.edu/research/reports/2001/tr391.pdf Sturla From emayssat at gmail.com Thu Jan 26 12:37:23 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Thu, 26 Jan 2012 09:37:23 -0800 Subject: [Numpy-discussion] array metadata In-Reply-To: <21E941F9-86A1-4468-9645-3F48B54BC43B@astro.physik.uni-goettingen.de> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> <21E941F9-86A1-4468-9645-3F48B54BC43B@astro.physik.uni-goettingen.de> Message-ID: <CACB6ZmBjt=iTCvMFnarZ=2pdezrv1UWwBDXWdE-0pk_V+joNbg@mail.gmail.com> subclassing is what I was looking for. Indeed the code is almost available at http://docs.scipy.org/doc/numpy/user/basics.subclassing.html#simple-example-adding-an-extra-attribute-to-ndarray I just created a dictionary variable which I called 'metadata' I had to overload the __repr__ method to print my parameters in the python shell. As far as saving the data on the disk.... let me start a new thread ;-) -- Emmanuel On Thu, Jan 26, 2012 at 5:49 AM, Derek Homeier <derek at astro.physik.uni-goettingen.de> wrote: > On 26 Jan 2012, at 13:30, Paul Anton Letnes wrote: > >> If by "store" you mean "store on disk", I recommend h5py datasets andhttp://docs.scipy.org/doc/numpy/user/basics.subclassing.html >> attributes. Reportedly pytables is also good but I don't have any >> first hand experience there. Both python modules use the hdf5 library, >> written in C/C++/Fortran. >> >> Paul >> >> On Wed, Jan 25, 2012 at 7:47 PM, Val Kalatsky <kalatsky at gmail.com> wrote: >>> >>> I believe there are no provisions made for that in ndarray. >>> But you can subclass ndarray. >> > You could probably use structured arrays with string and datetype fields for the > metadata and multidimensional fields (i.e. effectively subarrays within the > structured array) for the actual data. For file storage, they could probably be directly > saved as .npy, if interoperability is not a concern. Otherwise I'd also highly recommend > hdf5; with both h5py and pytables allowing quite transparent conversion of structured > arrays to datasets in the HDF5, but you also have the option to store other objects, > like dictionary elements, within the same data structure. > Pytables is generally regarded as having a more database-oriented approach, > while h5py appears more straightforward to use from a numerics background > (at least in my experience). > > Cheers, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Thu Jan 26 12:39:07 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 26 Jan 2012 18:39:07 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <20120124161921.GA31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> <20120124161921.GA31456@ravage> Message-ID: <4F218FBB.5080301@molden.no> Den 24.01.2012 17:19, skrev David Warde-Farley: > > Hmm. Seeing as the width of a C long is inconsistent, does this imply that > the random number generator will produce different results on different > platforms? If it does, it is a C programming mistake. C code should never depend on the exact size of a long, only it's minimum size. ISO C defines other datatypes if an exact integer size is needed (include stdint.h), but ANSI C used for NumPy does not. Sturla From scipy at samueljohn.de Thu Jan 26 13:01:22 2012 From: scipy at samueljohn.de (Samuel John) Date: Thu, 26 Jan 2012 19:01:22 +0100 Subject: [Numpy-discussion] Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 In-Reply-To: <472327CE-59F4-4DB5-B80C-D7EC2FFBBAF3@gmail.com> References: <mailman.6468.1324246451.1086.numpy-discussion@scipy.org> <472327CE-59F4-4DB5-B80C-D7EC2FFBBAF3@gmail.com> Message-ID: <AD984E1A-D359-4BA0-BDD4-81826454EFC2@samueljohn.de> Hi Hans-Martin! You could try my instructions recently posted to this list http://thread.gmane.org/gmane.comp.python.scientific.devel/15956/ Basically, using llvm-gcc scipy segfaults when scipy.test() (on my system at least). Therefore, I created the homebrew install formula. They work for whatever "which python" you have. But I have tested this for 2.7.2 on MacOS X 10.7.2. Samuel On 11.01.2012, at 16:12, Hans-Martin v. Gaudecker wrote: > I recently upgraded to Lion and just faced the same problem with both Python 2.7.2 and Python 3.2.2 installed via the python.org installers. My hunch is that the errors are related to the fact that Apple dropped gcc-4.2 from XCode 4.2. I got gcc-4.2 via [1] then, still the same error -- who knows what else got lost in that upgrade... Previous successful builds with gcc-4.2 might have been with XCode 4.1 (or 4.2 installed on top of it). > > In the end I decided to re-install both Python versions via homebrew, nicely described here [2] and everything seems to work fine using LLVM. Test outputs for NumPy master under 2.7.2 and 3.2.2 are below in case they are of interest. > > Best, > Hans-Martin > > [1] https://github.com/kennethreitz/osx-gcc-installer > [2] http://www.thisisthegreenroom.com/2011/installing-python-numpy-scipy-matplotlib-and-ipython-on-lion/#numpy The instructions at [2] lead to a segfault in scipy.test() for me, because it used llvm-gcc (which is the default on Lion). 
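In case it helps with diagnosing this kind of problem, here is a small
check one can run (just a diagnostic sketch, not part of the build
recipe) to see which compiler the running interpreter reports and how
numpy was configured:

import platform
print(platform.python_compiler())
# e.g. 'GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)'
# Note: this is the compiler that built Python itself; extension modules
# such as scipy may have been built with a different one.

import numpy as np
np.__config__.show()    # BLAS/LAPACK configuration numpy was built against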
From josef.pktd at gmail.com Thu Jan 26 13:19:11 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Jan 2012 13:19:11 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F218CC8.9070602@molden.no> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org> <4F218CC8.9070602@molden.no> Message-ID: <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com> On Thu, Jan 26, 2012 at 12:26 PM, Sturla Molden <sturla at molden.no> wrote: > Den 26.01.2012 17:25, skrev Pierre Haessig: >> However, in the case this change is not possible, I would see this >> solution : >> * add and xcov function that does what Elliot and Sturla and I >> described, because > > The current np.cov implementation returns the cross-covariance the way > it is commonly used in statistics. If MATLAB does not, then that is > MATLAB's problem I think. The discussion had this reversed, numpy matches the behavior of MATLAB, while R (statistics) only returns the cross covariance part as proposed. If there is a new xcov, then I think there should also be a xcorrcoef. This case needs a different implementation than corrcoef, since the xcov doesn't contain the variances and they need to be calculated separately. Josef > > http://www.stat.washington.edu/research/reports/2001/tr391.pdf > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Thu Jan 26 13:25:55 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 12:25:55 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F217A33.6020806@crans.org> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> Message-ID: <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig <pierre.haessig at crans.org> wrote: > Le 26/01/2012 15:57, Bruce Southey a ?crit : >> Can you please provide a >> couple of real examples with expected output that clearly show what >> you want? >> > Hi Bruce, > > Thanks for your ticket feedback ! It's precisely because I see a big > potential impact of the proposed change that I send first a ML message, > second a ticket before jumping to a pull-request like a Sergio Leone's > cowboy (sorry, I watched "for a few dollars more" last weekend...) > > Now, I realize that in the ticket writing I made the wrong trade-off > between conciseness and accuracy which led to some of the errors you > raised. Let me try to use your example to try to share what I have in mind. > >> >> X = array([-2.1, -1. , ?4.3]) >> >> Y = array([ 3. ?, ?1.1 , ?0.12]) > > Indeed, with today's cov behavior we have a 2x2 array: >> >> cov(X,Y) > array([[ 11.71 ? ? ?, ?-4.286 ? ? ], > ? ? ? ?[ -4.286 ? ? , ? 
2.14413333]]) > > Now, when I used the word 'concatenation', I wasn't precise enough > because I meant assembling X and Y in the sense of 2 vectors of > observations from 2 random variables X and Y. > This is achieved by concatenate(X,Y) *when properly playing with > dimensions* (which I didn't mentioned) : >> >> XY = np.concatenate((X[None, :], Y[None, :])) > array([[-2.1 , -1. ?, ?4.3 ], > ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) In this context, I find stacking, np.vstack((X,Y)), more appropriate than concatenate. > > In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >> >> np.cov(XY) > array([[ 11.71 ? ? ?, ?-4.286 ? ? ], > ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) > Sure the resulting array is the same but whole process is totally different. > (And indeed, the actual cov Python code does use concatenate() ) Yes, but the user does not see that. Whereas you are forcing the user to do the stacking in the correct dimensions. > > > Now let me come back to my assertion about this behavior *usefulness*. > You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 > simple scalars blocks). No there are not '4' blocks just rows and columns. > ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes > to var(X) and var(Y) when setting ddof to 1) Sure but variances are still covariances. > ?* off diagonal blocks are symetric and are actually the covariance > estimate of X, Y observations (from > http://en.wikipedia.org/wiki/Covariance) Sure > > that is : >> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) > -4.2860000000000005 > > The new proposed behaviour for cov is that cov(X,Y) would return : > array(-4.2860000000000005) ?instead of the 2*2 matrix. But how you interpret an 2D array where the rows are greater than 2? >>> Z=Y+X >>> np.cov(np.vstack((X,Y,Z))) array([[ 11.71 , -4.286 , 7.424 ], [ -4.286 , 2.14413333, -2.14186667], [ 7.424 , -2.14186667, 5.28213333]]) > > ?* This would be in line with the cov(X,Y) mathematical definition, as > well as with R behavior. I don't care what R does because I am using Python and Python is infinitely better than R is! But I think that is only in the 1D case. > ?* This would save memory and computing resources. (and therefore help > save the planet ;-) ) Nothing that you have provided shows that it will. > > However, I do understand that the impact for this change may be big. > This indeed requires careful reviewing. 
> > Pierre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Bruce From josef.pktd at gmail.com Thu Jan 26 13:45:46 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Jan 2012 13:45:46 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> Message-ID: <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: > On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig > <pierre.haessig at crans.org> wrote: >> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>> Can you please provide a >>> couple of real examples with expected output that clearly show what >>> you want? >>> >> Hi Bruce, >> >> Thanks for your ticket feedback ! It's precisely because I see a big >> potential impact of the proposed change that I send first a ML message, >> second a ticket before jumping to a pull-request like a Sergio Leone's >> cowboy (sorry, I watched "for a few dollars more" last weekend...) >> >> Now, I realize that in the ticket writing I made the wrong trade-off >> between conciseness and accuracy which led to some of the errors you >> raised. Let me try to use your example to try to share what I have in mind. >> >>> >> X = array([-2.1, -1. , ?4.3]) >>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >> >> Indeed, with today's cov behavior we have a 2x2 array: >>> >> cov(X,Y) >> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >> >> Now, when I used the word 'concatenation', I wasn't precise enough >> because I meant assembling X and Y in the sense of 2 vectors of >> observations from 2 random variables X and Y. >> This is achieved by concatenate(X,Y) *when properly playing with >> dimensions* (which I didn't mentioned) : >>> >> XY = np.concatenate((X[None, :], Y[None, :])) >> array([[-2.1 , -1. ?, ?4.3 ], >> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) > > In this context, I find stacking, ?np.vstack((X,Y)), more appropriate > than concatenate. > >> >> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>> >> np.cov(XY) >> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >> > Sure the resulting array is the same but whole process is totally different. > > >> (And indeed, the actual cov Python code does use concatenate() ) > Yes, but the user does not see that. Whereas you are forcing the user > to do the stacking in the correct dimensions. > > >> >> >> Now let me come back to my assertion about this behavior *usefulness*. >> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >> simple scalars blocks). > No there are not '4' blocks just rows and columns. Sturla showed the 4 blocks in his first message. > >> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >> to var(X) and var(Y) when setting ddof to 1) > Sure but variances are still covariances. 
> >> ?* off diagonal blocks are symetric and are actually the covariance >> estimate of X, Y observations (from >> http://en.wikipedia.org/wiki/Covariance) > Sure >> >> that is : >>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >> -4.2860000000000005 >> >> The new proposed behaviour for cov is that cov(X,Y) would return : >> array(-4.2860000000000005) ?instead of the 2*2 matrix. > > But how you interpret an 2D array where the rows are greater than 2? >>>> Z=Y+X >>>> np.cov(np.vstack((X,Y,Z))) > array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], > ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], > ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) > > >> >> ?* This would be in line with the cov(X,Y) mathematical definition, as >> well as with R behavior. > I don't care what R does because I am using Python and Python is > infinitely better than R is! > > But I think that is only in the 1D case. I just checked R to make sure I remember correctly > xx = matrix((1:20)^2, nrow=4) > xx [,1] [,2] [,3] [,4] [,5] [1,] 1 25 81 169 289 [2,] 4 36 100 196 324 [3,] 9 49 121 225 361 [4,] 16 64 144 256 400 > cov(xx, 2*xx[,1:2]) [,1] [,2] [1,] 86.0000 219.3333 [2,] 219.3333 566.0000 [3,] 352.6667 912.6667 [4,] 486.0000 1259.3333 [5,] 619.3333 1606.0000 > cov(xx) [,1] [,2] [,3] [,4] [,5] [1,] 43.0000 109.6667 176.3333 243.0000 309.6667 [2,] 109.6667 283.0000 456.3333 629.6667 803.0000 [3,] 176.3333 456.3333 736.3333 1016.3333 1296.3333 [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 > >> ?* This would save memory and computing resources. (and therefore help >> save the planet ;-) ) > Nothing that you have provided shows that it will. I don't know about saving the planet, but if X and Y have the same number of columns, we save 3 quarters of the calculations, as Sturla also explained in his first message. Josef > >> >> However, I do understand that the impact for this change may be big. >> This indeed requires careful reviewing. >> >> Pierre >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Thu Jan 26 15:35:58 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 26 Jan 2012 20:35:58 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F218FBB.5080301@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> <20120124161921.GA31456@ravage> <4F218FBB.5080301@molden.no> Message-ID: <CAF6FJivkKg=aTO5ELPLFbvX5kzd4Efr1BS5sMe4YdGyfnsQ3Eg@mail.gmail.com> On Thu, Jan 26, 2012 at 17:39, Sturla Molden <sturla at molden.no> wrote: > Den 24.01.2012 17:19, skrev David Warde-Farley: >> >> Hmm. Seeing as the width of a C long is inconsistent, does this imply that >> the random number generator will produce different results on different >> platforms? > > If it does, it is a C programming mistake. 
C code should never depend on > the exact size of a long, only it's minimum size. ?ISO C defines other > datatypes if an exact integer size is needed (include stdint.h), but > ANSI C used for NumPy does not. I think you're subtly misunderstanding his question. He's not asking if the code is written such that it semantically requires a long to have one specific size or another (and indeed, it is not). However, it is true that the code may behave differently for the same inputs on different machines with different long sizes. Namely, some part of the computation may overflow on 32-bit longs while giving an accurate answer with 64-bit longs. They just have different domains of accuracy over their inputs. It is not necessarily a mistake to take advantage of the extra room when it is available. That is the reason that Python ints are C longs and why numpy follows suit. But unfortunately, it is true that at least some of the distributions do have different behavior when using 64-bit longs than when using 32-bit longs. Here is an example of drawing from a binomial distribution with a large N on a 32-bit process and comparing it with results from a 64-bit process: [~]$ $PYBIN/python Enthought Python Distribution -- www.enthought.com Version: 7.1-2 (32-bit) Python 2.7.2 |EPD 7.1-2 (32-bit)| (default, Jul 27 2011, 13:29:32) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "packages", "demo" or "enthought" for more information. >>> import numpy as np >>> prng = np.random.RandomState(1234567890) >>> x32 = prng.binomial(500000000, 0.5, size=10000) >>> x64 = np.load('x64.npy') >>> np.save('x32.npy', x32) >>> bad = (x64 != x32) >>> bad.sum() 9449 >>> bad[-9449:] array([ True, True, True, ..., True, True, True], dtype=bool) >>> bad[-9449:].sum() 9449 binomial() is using a rejection algorithm for this set of inputs. For each draw, it is going to generate random numbers until they meet a certain condition. Both the 32-bit process and the 64-bit process draw the same exact numbers until the 552nd draw. Then, I suspect, there is an integer overflow in the 32-bit process causing the rejection algorithm to terminate either earlier or later than it otherwise should. Since the two processes have consumed different amounts of random numbers, the underlying uniform PRNG is no longer in the same state, so all of the numbers thereafter will be different. It's not clear to me how problematic this is. I haven't seen any difference when using reasonable input values (N=500000000 is a ridiculous number to be using with a binomial distribution). If I'm right that there is an overflow when using the 32-bit longs, then the results should not be trusted anyways, so there is no point in comparing them to the 64-bit results. It's just that the domain of validity with a 32-bit long is a bit smaller than when using a 64-bit long. The deviation of x32[551] from the mean is larger than the maximum deviation from the 64-bit results, so it is reasonably likely that the draw is just bogus. >>> np.max(abs(x64 - 250000000)) 44519 >>> x32[551] - 250000000 47368 Often, the acceptance criterion is something of the form (X < something) while expecting X to be positive. An integer overflow would introduce a negative value somewhere in the computation and could easily "pass" this acceptance criterion when it really shouldn't have if the intermediate computations had been done without overflow. 
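As a toy illustration only -- this is not the code from distributions.c,
just the same kind of intermediate arithmetic done at two integer widths
-- an acceptance test of that form can flip when the product wraps around
in 32 bits:

import numpy as np

n32 = np.array([500000000], dtype=np.int32)
k32 = np.array([50000], dtype=np.int32)

prod32 = (n32 * k32)[0]                   # wraps around to -1004630016
prod64 = (n32.astype(np.int64) * k32)[0]  # exact value, 25000000000000

bound = 10**13
print(prod32 < bound)   # True  -- the check "passes" spuriously
print(prod64 < bound)   # False -- with 64-bit arithmetic it does not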
If anyone wants to debug this more thoroughly, this bit of code will get the PRNG into exactly the right state to see the difference on the next binomial() draw: >>> import numpy as np >>> prng = np.random.RandomState(1234567890) >>> blah = prng.binomial(500000000, 0.5, size=551) If you run python under gdb, you can then set a breakpoint in rk_binomial_btpe() in distributions.c to step through the next call to prng.binomial(). Sometimes you can fix these issues in a rejection algorithm by checking for overflow and rejecting those cases. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From bsouthey at gmail.com Thu Jan 26 15:58:45 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 14:58:45 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> Message-ID: <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> On Thu, Jan 26, 2012 at 12:45 PM, <josef.pktd at gmail.com> wrote: > On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: >> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig >> <pierre.haessig at crans.org> wrote: >>> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>>> Can you please provide a >>>> couple of real examples with expected output that clearly show what >>>> you want? >>>> >>> Hi Bruce, >>> >>> Thanks for your ticket feedback ! It's precisely because I see a big >>> potential impact of the proposed change that I send first a ML message, >>> second a ticket before jumping to a pull-request like a Sergio Leone's >>> cowboy (sorry, I watched "for a few dollars more" last weekend...) >>> >>> Now, I realize that in the ticket writing I made the wrong trade-off >>> between conciseness and accuracy which led to some of the errors you >>> raised. Let me try to use your example to try to share what I have in mind. >>> >>>> >> X = array([-2.1, -1. , ?4.3]) >>>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >>> >>> Indeed, with today's cov behavior we have a 2x2 array: >>>> >> cov(X,Y) >>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>> >>> Now, when I used the word 'concatenation', I wasn't precise enough >>> because I meant assembling X and Y in the sense of 2 vectors of >>> observations from 2 random variables X and Y. >>> This is achieved by concatenate(X,Y) *when properly playing with >>> dimensions* (which I didn't mentioned) : >>>> >> XY = np.concatenate((X[None, :], Y[None, :])) >>> array([[-2.1 , -1. ?, ?4.3 ], >>> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) >> >> In this context, I find stacking, ?np.vstack((X,Y)), more appropriate >> than concatenate. >> >>> >>> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>>> >> np.cov(XY) >>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>> ? ? ? ?[ -4.286 ? ? , ? 
2.14413333]]) >>> >> Sure the resulting array is the same but whole process is totally different. >> >> >>> (And indeed, the actual cov Python code does use concatenate() ) >> Yes, but the user does not see that. Whereas you are forcing the user >> to do the stacking in the correct dimensions. >> >> >>> >>> >>> Now let me come back to my assertion about this behavior *usefulness*. >>> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >>> simple scalars blocks). >> No there are not '4' blocks just rows and columns. > > Sturla showed the 4 blocks in his first message. > Well, I could not follow that because the code is wrong. X = np.array([-2.1, -1. , 4.3]) >>> cX = X - X.mean(axis=0)[np.newaxis,:] Traceback (most recent call last): File "<pyshell#6>", line 1, in <module> cX = X - X.mean(axis=0)[np.newaxis,:] IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index whoops! Anyhow, variance-covariance matrix is symmetric but numpy or scipy lacks lapac's symmetrix matrix (http://www.netlib.org/lapack/explore-html/de/d9e/group___s_y.html) >> >>> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >>> to var(X) and var(Y) when setting ddof to 1) >> Sure but variances are still covariances. >> >>> ?* off diagonal blocks are symetric and are actually the covariance >>> estimate of X, Y observations (from >>> http://en.wikipedia.org/wiki/Covariance) >> Sure >>> >>> that is : >>>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >>> -4.2860000000000005 >>> >>> The new proposed behaviour for cov is that cov(X,Y) would return : >>> array(-4.2860000000000005) ?instead of the 2*2 matrix. >> >> But how you interpret an 2D array where the rows are greater than 2? >>>>> Z=Y+X >>>>> np.cov(np.vstack((X,Y,Z))) >> array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], >> ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], >> ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) >> >> >>> >>> ?* This would be in line with the cov(X,Y) mathematical definition, as >>> well as with R behavior. >> I don't care what R does because I am using Python and Python is >> infinitely better than R is! >> >> But I think that is only in the 1D case. > > I just checked R to make sure I remember correctly > >> xx = matrix((1:20)^2, nrow=4) >> xx > ? ? [,1] [,2] [,3] [,4] [,5] > [1,] ? ?1 ? 25 ? 81 ?169 ?289 > [2,] ? ?4 ? 36 ?100 ?196 ?324 > [3,] ? ?9 ? 49 ?121 ?225 ?361 > [4,] ? 16 ? 64 ?144 ?256 ?400 >> cov(xx, 2*xx[,1:2]) > ? ? ? ? [,1] ? ? ?[,2] > [1,] ?86.0000 ?219.3333 > [2,] 219.3333 ?566.0000 > [3,] 352.6667 ?912.6667 > [4,] 486.0000 1259.3333 > [5,] 619.3333 1606.0000 >> cov(xx) > ? ? ? ? [,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] > [1,] ?43.0000 109.6667 ?176.3333 ?243.0000 ?309.6667 > [2,] 109.6667 283.0000 ?456.3333 ?629.6667 ?803.0000 > [3,] 176.3333 456.3333 ?736.3333 1016.3333 1296.3333 > [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 > [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 > > >> >>> ?* This would save memory and computing resources. (and therefore help >>> save the planet ;-) ) >> Nothing that you have provided shows that it will. > > I don't know about saving the planet, but if X and Y have the same > number of columns, we save 3 quarters of the calculations, as Sturla > also explained in his first message. 
> Can not figure those savings: For a 2 by 2 output has 3 covariances (so 3/4 =0.75 is 'needed' not 25%) a 3 by 3 output has 6 covariances a 5 by 5 output 15 covariances If you want to save memory and calculation then use symmetric storage and associated methods. Bruce > Josef > >> >>> >>> However, I do understand that the impact for this change may be big. >>> This indeed requires careful reviewing. >>> >>> Pierre >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> Bruce >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From hmgaudecker at gmail.com Thu Jan 26 16:12:58 2012 From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker) Date: Thu, 26 Jan 2012 22:12:58 +0100 Subject: [Numpy-discussion] Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 (Samuel John) In-Reply-To: <mailman.7386.1327611481.1086.numpy-discussion@scipy.org> References: <mailman.7386.1327611481.1086.numpy-discussion@scipy.org> Message-ID: <96DDFBB9-8332-4DE3-A2C2-AF93EE914346@gmail.com> Hi Samuel, I realised that a couple of days ago as well? Same on Python 2.7.2 (full output from both below FWIW). I usually only need a minimal subset of SciPy, so still hoping it's only in places I don't need it. Else I shall be happy to come back to your formulas, thanks for making them! Best, Hans-Martin python Python 2.7.2 (default, Jan 11 2012, 16:23:50) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import scipy sci>>> scipy.test(verbose=10) Running unit tests for scipy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy SciPy version 0.11.0.dev-600e81f SciPy is installed in /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy Python version 2.7.2 (default, Jan 11 2012, 16:23:50) [GCC 4.2.1 (Based on Apple Inc. 
build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/fftpack/convolve.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/integrate/vode.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/interpolate/dfitpack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/interpolate/interpnd.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/io/matlab/mio5_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/io/matlab/mio_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/io/matlab/streams.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/blas/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/blas/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/clapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/clapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/optimize/minpack2.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/optimize/moduleTNC.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/signal/sigtools.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/signal/spectral.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/signal/spline.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/spatial/ckdtree.so is executable; skipped nose.selector: INFO: 
/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/spatial/qhull.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/special/lambertw.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/special/orthogonal_eval.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/special/specfun.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/futil.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/mvn.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/statlib.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/vonmises_cython.so is executable; skipped Tests cophenet(Z) on tdist data set. ... ok Tests cophenet(Z, Y) on tdist data set. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) with empty linkage and condensed distance matrix. ... ok Tests num_obs_linkage with observation matrices of multiple sizes. ... ok Tests fcluster(Z, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests from_mlab_linkage on empty linkage array. ... ok Tests from_mlab_linkage on linkage array with multiple rows. ... ok Tests from_mlab_linkage on linkage array with single row. ... ok Tests inconsistency matrix calculation (depth=1) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=3) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=1, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=2, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=3, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=4, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=1) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=3) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a single linkage. ... ok Tests is_isomorphic on test case #1 (one flat cluster, different labellings) ... ok Tests is_isomorphic on test case #2 (two flat clusters, different labelings) ... ok Tests is_isomorphic on test case #3 (no flat clusters) ... 
ok Tests is_isomorphic on test case #4A (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #4B (3 flat clusters, different labelings, nonisomorphic) ... ok Tests is_isomorphic on test case #4C (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling, slightly non-isomorphic.) Run 3 times. ... ok Tests is_monotonic(Z) on 1x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting False. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 1). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 2). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 3). Expecting False ... ok Tests is_monotonic(Z) on 3x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on an empty linkage. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on Iris data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Perturbing. Expecting False. ... ok Tests is_valid_im(R) on im over 2 observations. ... ok Tests is_valid_im(R) on im over 3 observations. ... ok Tests is_valid_im(R) with 3 columns. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link counts. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height means. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height standard deviations. ... ok Tests is_valid_im(R) with 5 columns. ... ok Tests is_valid_im(R) with empty inconsistency matrix. ... ok Tests is_valid_im(R) with integer type. ... ok Tests is_valid_linkage(Z) on linkage over 2 observations. ... ok Tests is_valid_linkage(Z) on linkage over 3 observations. ... ok Tests is_valid_linkage(Z) with 3 columns. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative counts. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative distances. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (left). ... 
ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (right). ... ok Tests is_valid_linkage(Z) with 5 columns. ... ok Tests is_valid_linkage(Z) with empty linkage. ... ok Tests is_valid_linkage(Z) with integer type. ... ok Tests leaders using a flat clustering generated by single linkage. ... ok Tests leaves_list(Z) on a 1x4 linkage. ... ok Tests leaves_list(Z) on a 2x4 linkage. ... ok Tests leaves_list(Z) on the Iris data set using average linkage. ... ok Tests leaves_list(Z) on the Iris data set using centroid linkage. ... ok Tests leaves_list(Z) on the Iris data set using complete linkage. ... ok Tests leaves_list(Z) on the Iris data set using median linkage. ... ok Tests leaves_list(Z) on the Iris data set using single linkage. ... ok Tests leaves_list(Z) on the Iris data set using ward linkage. ... ok Tests linkage(Y, 'average') on the tdist data set. ... ok Tests linkage(Y, 'centroid') on the Q data set. ... ok Tests linkage(Y, 'complete') on the Q data set. ... ok Tests linkage(Y, 'complete') on the tdist data set. ... ok Tests linkage(Y) where Y is a 0x4 linkage matrix. Exception expected. ... ok Tests linkage(Y, 'single') on the Q data set. ... ok Tests linkage(Y, 'single') on the tdist data set. ... ok Tests linkage(Y, 'weighted') on the Q data set. ... ok Tests linkage(Y, 'weighted') on the tdist data set. ... ok Tests maxdists(Z) on the Q data set using centroid linkage. ... ok Tests maxdists(Z) on the Q data set using complete linkage. ... ok Tests maxdists(Z) on the Q data set using median linkage. ... ok Tests maxdists(Z) on the Q data set using single linkage. ... ok Tests maxdists(Z) on the Q data set using Ward linkage. ... ok Tests maxdists(Z) on empty linkage. Expecting exception. ... ok Tests maxdists(Z) on linkage with one cluster. ... ok Tests maxinconsts(Z, R) on the Q data set using centroid linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using complete linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using median linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using single linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using Ward linkage. ... ok Tests maxinconsts(Z, R) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxinconsts(Z, R) on empty linkage. Expecting exception. ... ok Tests maxinconsts(Z, R) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 0) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 0) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 1) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 1) on linkage and inconsistency matrices with different numbers of clusters. 
Expecting exception. ... ok Tests maxRstat(Z, R, 1) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 1) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 2) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 2) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 3) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3.3). Expecting exception. ... ok Tests maxRstat(Z, R, -1). Expecting exception. ... ok Tests maxRstat(Z, R, 4). Expecting exception. ... ok Tests num_obs_linkage(Z) on linkage over 2 observations. ... ok Tests num_obs_linkage(Z) on linkage over 3 observations. ... ok Tests num_obs_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests num_obs_linkage(Z) with empty linkage. ... ok Tests to_mlab_linkage on linkage array with multiple rows. ... ok Tests to_mlab_linkage on empty linkage array. ... ok Tests to_mlab_linkage on linkage array with single row. ... ok test_hierarchy.load_testing_files ... ok Ticket #505. ... ok Testing that kmeans2 init methods work. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 and its results. ... ok Regression test for #546: fail when k arg is 0. ... ok This will cause kmean to have a cluster with no points. ... ok test_kmeans_simple (test_vq.TestKMean) ... ok test_large_features (test_vq.TestKMean) ... ok test_py_vq (test_vq.TestVq) ... ok test_py_vq2 (test_vq.TestVq) ... ok test_vq (test_vq.TestVq) ... ok Test special rank 1 vq algo, python implementation. ... ok nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/cluster/tests/vq_test.py is executable; skipped test_codata.test_find ... ok test_codata.test_basic_table_parse ... ok test_codata.test_basic_lookup ... ok test_codata.test_find_all ... ok test_codata.test_find_single ... ok test_codata.test_2002_vs_2006 ... ok Check that updating stored values with exact ones worked. ... ok test_constants.test_fahrenheit_to_celcius ... ok test_constants.test_celcius_to_kelvin ... ok test_constants.test_kelvin_to_celcius ... ok test_constants.test_fahrenheit_to_kelvin ... ok test_constants.test_kelvin_to_fahrenheit ... ok test_constants.test_celcius_to_fahrenheit ... ok test_constants.test_lambda_to_nu ... ok test_constants.test_nu_to_lambda ... ok test_definition (test_basic.TestDoubleFFT) ... ok test_djbfft (test_basic.TestDoubleFFT) ... 
ok test_n_argument_real (test_basic.TestDoubleFFT) ... ok test_definition (test_basic.TestDoubleIFFT) ... FAIL test_definition_real (test_basic.TestDoubleIFFT) ... ok test_djbfft (test_basic.TestDoubleIFFT) ... FAIL test_random_complex (test_basic.TestDoubleIFFT) ... python(30168) malloc: *** error for object 0x104cdce88: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug Abort trap: 6 Python 3.2.2 (default, Jan 11 2012, 16:48:20) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.test(verbose=10) Running unit tests for scipy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy SciPy version 0.11.0.dev-600e81f SciPy is installed in /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy Python version 3.2.2 (default, Jan 11 2012, 16:48:20) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/fftpack/convolve.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/integrate/vode.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/interpolate/dfitpack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/interpolate/interpnd.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/io/matlab/mio5_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/io/matlab/mio_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/io/matlab/streams.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/blas/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/blas/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/clapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/clapack.so is executable; skipped 
nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/optimize/minpack2.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/optimize/moduleTNC.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/signal/sigtools.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/signal/spectral.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/signal/spline.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/spatial/ckdtree.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/spatial/qhull.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/special/lambertw.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/special/orthogonal_eval.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/special/specfun.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/futil.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/mvn.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/statlib.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/vonmises_cython.so is executable; skipped Tests cophenet(Z) on tdist data set. ... ok Tests cophenet(Z, Y) on tdist data set. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) with empty linkage and condensed distance matrix. ... ok Tests num_obs_linkage with observation matrices of multiple sizes. ... ok Tests fcluster(Z, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests from_mlab_linkage on empty linkage array. ... ok Tests from_mlab_linkage on linkage array with multiple rows. ... ok Tests from_mlab_linkage on linkage array with single row. ... ok Tests inconsistency matrix calculation (depth=1) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a complete linkage. ... 
ok Tests inconsistency matrix calculation (depth=3) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=1, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=2, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=3, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=4, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=1) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=3) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a single linkage. ... ok Tests is_isomorphic on test case #1 (one flat cluster, different labellings) ... ok Tests is_isomorphic on test case #2 (two flat clusters, different labelings) ... ok Tests is_isomorphic on test case #3 (no flat clusters) ... ok Tests is_isomorphic on test case #4A (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #4B (3 flat clusters, different labelings, nonisomorphic) ... ok Tests is_isomorphic on test case #4C (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling, slightly non-isomorphic.) Run 3 times. ... ok Tests is_monotonic(Z) on 1x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting False. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 1). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 2). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 3). Expecting False ... ok Tests is_monotonic(Z) on 3x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on an empty linkage. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on Iris data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Perturbing. Expecting False. ... ok Tests is_valid_im(R) on im over 2 observations. ... ok Tests is_valid_im(R) on im over 3 observations. ... ok Tests is_valid_im(R) with 3 columns. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link counts. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height means. ... 
ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height standard deviations. ... ok Tests is_valid_im(R) with 5 columns. ... ok Tests is_valid_im(R) with empty inconsistency matrix. ... ok Tests is_valid_im(R) with integer type. ... ok Tests is_valid_linkage(Z) on linkage over 2 observations. ... ok Tests is_valid_linkage(Z) on linkage over 3 observations. ... ok Tests is_valid_linkage(Z) with 3 columns. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative counts. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative distances. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (left). ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (right). ... ok Tests is_valid_linkage(Z) with 5 columns. ... ok Tests is_valid_linkage(Z) with empty linkage. ... ok Tests is_valid_linkage(Z) with integer type. ... ok Tests leaders using a flat clustering generated by single linkage. ... ok Tests leaves_list(Z) on a 1x4 linkage. ... ok Tests leaves_list(Z) on a 2x4 linkage. ... ok Tests leaves_list(Z) on the Iris data set using average linkage. ... ok Tests leaves_list(Z) on the Iris data set using centroid linkage. ... ok Tests leaves_list(Z) on the Iris data set using complete linkage. ... ok Tests leaves_list(Z) on the Iris data set using median linkage. ... ok Tests leaves_list(Z) on the Iris data set using single linkage. ... ok Tests leaves_list(Z) on the Iris data set using ward linkage. ... ok Tests linkage(Y, 'average') on the tdist data set. ... ok Tests linkage(Y, 'centroid') on the Q data set. ... ok Tests linkage(Y, 'complete') on the Q data set. ... ok Tests linkage(Y, 'complete') on the tdist data set. ... ok Tests linkage(Y) where Y is a 0x4 linkage matrix. Exception expected. ... ok Tests linkage(Y, 'single') on the Q data set. ... ok Tests linkage(Y, 'single') on the tdist data set. ... ok Tests linkage(Y, 'weighted') on the Q data set. ... ok Tests linkage(Y, 'weighted') on the tdist data set. ... ok Tests maxdists(Z) on the Q data set using centroid linkage. ... ok Tests maxdists(Z) on the Q data set using complete linkage. ... ok Tests maxdists(Z) on the Q data set using median linkage. ... ok Tests maxdists(Z) on the Q data set using single linkage. ... ok Tests maxdists(Z) on the Q data set using Ward linkage. ... ok Tests maxdists(Z) on empty linkage. Expecting exception. ... ok Tests maxdists(Z) on linkage with one cluster. ... ok Tests maxinconsts(Z, R) on the Q data set using centroid linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using complete linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using median linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using single linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using Ward linkage. ... ok Tests maxinconsts(Z, R) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxinconsts(Z, R) on empty linkage. Expecting exception. ... ok Tests maxinconsts(Z, R) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 0) on the Q data set using centroid linkage. ... 
ok Tests maxRstat(Z, R, 0) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 0) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 1) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 1) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 1) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 1) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 2) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 2) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 3) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3.3). Expecting exception. ... ok Tests maxRstat(Z, R, -1). Expecting exception. ... ok Tests maxRstat(Z, R, 4). Expecting exception. ... ok Tests num_obs_linkage(Z) on linkage over 2 observations. ... ok Tests num_obs_linkage(Z) on linkage over 3 observations. ... ok Tests num_obs_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests num_obs_linkage(Z) with empty linkage. ... ok Tests to_mlab_linkage on linkage array with multiple rows. ... ok Tests to_mlab_linkage on empty linkage array. ... ok Tests to_mlab_linkage on linkage array with single row. ... ok test_hierarchy.load_testing_files ... ok Ticket #505. ... ok Testing that kmeans2 init methods work. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 and its results. ... ok Regression test for #546: fail when k arg is 0. ... ok This will cause kmean to have a cluster with no points. ... ok test_kmeans_simple (test_vq.TestKMean) ... ok test_large_features (test_vq.TestKMean) ... ok test_py_vq (test_vq.TestVq) ... 
ok test_py_vq2 (test_vq.TestVq) ... ok test_vq (test_vq.TestVq) ... ok Test special rank 1 vq algo, python implementation. ... ok test_codata.test_find ... ok test_codata.test_basic_table_parse ... ok test_codata.test_basic_lookup ... ok test_codata.test_find_all ... ok test_codata.test_find_single ... ok test_codata.test_2002_vs_2006 ... ok Check that updating stored values with exact ones worked. ... ok test_constants.test_fahrenheit_to_celcius ... ok test_constants.test_celcius_to_kelvin ... ok test_constants.test_kelvin_to_celcius ... ok test_constants.test_fahrenheit_to_kelvin ... ok test_constants.test_kelvin_to_fahrenheit ... ok test_constants.test_celcius_to_fahrenheit ... ok test_constants.test_lambda_to_nu ... ok test_constants.test_nu_to_lambda ... ok test_definition (test_basic.TestDoubleFFT) ... ok test_djbfft (test_basic.TestDoubleFFT) ... ok test_n_argument_real (test_basic.TestDoubleFFT) ... ok test_definition (test_basic.TestDoubleIFFT) ... python3(30179) malloc: *** error for object 0x1050ae058: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug Abort trap: 6 > Date: Thu, 26 Jan 2012 19:01:22 +0100 > From: Samuel John <scipy at samueljohn.de> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: Discussion of Numerical Python <numpy-discussion at scipy.org> > Message-ID: <AD984E1A-D359-4BA0-BDD4-81826454EFC2 at samueljohn.de> > Content-Type: text/plain; charset=us-ascii > > Hi Hans-Martin! > > You could try my instructions recently posted to this list http://thread.gmane.org/gmane.comp.python.scientific.devel/15956/ > Basically, using llvm-gcc scipy segfaults when scipy.test() (on my system at least). > > Therefore, I created the homebrew install formula. > They work for whatever "which python" you have. But I have tested this for 2.7.2 on MacOS X 10.7.2. > > Samuel > > > On 11.01.2012, at 16:12, Hans-Martin v. Gaudecker wrote: > >> I recently upgraded to Lion and just faced the same problem with both Python 2.7.2 and Python 3.2.2 installed via the python.org installers. My hunch is that the errors are related to the fact that Apple dropped gcc-4.2 from XCode 4.2. I got gcc-4.2 via [1] then, still the same error -- who knows what else got lost in that upgrade... Previous successful builds with gcc-4.2 might have been with XCode 4.1 (or 4.2 installed on top of it). >> >> In the end I decided to re-install both Python versions via homebrew, nicely described here [2] and everything seems to work fine using LLVM. Test outputs for NumPy master under 2.7.2 and 3.2.2 are below in case they are of interest. >> >> Best, >> Hans-Martin >> >> [1] https://github.com/kennethreitz/osx-gcc-installer >> [2] http://www.thisisthegreenroom.com/2011/installing-python-numpy-scipy-matplotlib-and-ipython-on-lion/#numpy > > The instructions at [2] lead to a segfault in scipy.test() for me, because it used llvm-gcc (which is the default on Lion). 
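For anyone trying to narrow down the same crash, one generic approach (a sketch, not taken from any of the posts above) is to print the build configuration and re-run only the fftpack tests, since that is where the abort shows up in the logs:

    import numpy
    import scipy.fftpack

    numpy.__config__.show()          # shows which BLAS/LAPACK the build picked up
    scipy.fftpack.test(verbose=10)   # run just the fftpack tests instead of the full suite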
From josef.pktd at gmail.com Thu Jan 26 19:43:14 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Jan 2012 19:43:14 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> Message-ID: <CAMMTP+CnUiQiAOwOyMkeQMB4rRo478iWvnUazS+Cp1RiZDrFuQ@mail.gmail.com> On Thu, Jan 26, 2012 at 3:58 PM, Bruce Southey <bsouthey at gmail.com> wrote: > On Thu, Jan 26, 2012 at 12:45 PM, ?<josef.pktd at gmail.com> wrote: >> On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: >>> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig >>> <pierre.haessig at crans.org> wrote: >>>> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>>>> Can you please provide a >>>>> couple of real examples with expected output that clearly show what >>>>> you want? >>>>> >>>> Hi Bruce, >>>> >>>> Thanks for your ticket feedback ! It's precisely because I see a big >>>> potential impact of the proposed change that I send first a ML message, >>>> second a ticket before jumping to a pull-request like a Sergio Leone's >>>> cowboy (sorry, I watched "for a few dollars more" last weekend...) >>>> >>>> Now, I realize that in the ticket writing I made the wrong trade-off >>>> between conciseness and accuracy which led to some of the errors you >>>> raised. Let me try to use your example to try to share what I have in mind. >>>> >>>>> >> X = array([-2.1, -1. , ?4.3]) >>>>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >>>> >>>> Indeed, with today's cov behavior we have a 2x2 array: >>>>> >> cov(X,Y) >>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>> >>>> Now, when I used the word 'concatenation', I wasn't precise enough >>>> because I meant assembling X and Y in the sense of 2 vectors of >>>> observations from 2 random variables X and Y. >>>> This is achieved by concatenate(X,Y) *when properly playing with >>>> dimensions* (which I didn't mentioned) : >>>>> >> XY = np.concatenate((X[None, :], Y[None, :])) >>>> array([[-2.1 , -1. ?, ?4.3 ], >>>> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) >>> >>> In this context, I find stacking, ?np.vstack((X,Y)), more appropriate >>> than concatenate. >>> >>>> >>>> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>>>> >> np.cov(XY) >>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>> >>> Sure the resulting array is the same but whole process is totally different. >>> >>> >>>> (And indeed, the actual cov Python code does use concatenate() ) >>> Yes, but the user does not see that. Whereas you are forcing the user >>> to do the stacking in the correct dimensions. >>> >>> >>>> >>>> >>>> Now let me come back to my assertion about this behavior *usefulness*. >>>> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >>>> simple scalars blocks). >>> No there are not '4' blocks just rows and columns. >> >> Sturla showed the 4 blocks in his first message. 
>> > Well, I could not follow that because the code is wrong. > X = np.array([-2.1, -1. , ?4.3]) >>>> cX = X - X.mean(axis=0)[np.newaxis,:] > > Traceback (most recent call last): > ?File "<pyshell#6>", line 1, in <module> > ? ?cX = X - X.mean(axis=0)[np.newaxis,:] > IndexError: 0-d arrays can only use a single () or a list of newaxes > (and a single ...) as an index > ?whoops! > > Anyhow, variance-covariance matrix is symmetric but numpy or scipy > lacks ?lapac's symmetrix matrix > (http://www.netlib.org/lapack/explore-html/de/d9e/group___s_y.html) > >>> >>>> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >>>> to var(X) and var(Y) when setting ddof to 1) >>> Sure but variances are still covariances. >>> >>>> ?* off diagonal blocks are symetric and are actually the covariance >>>> estimate of X, Y observations (from >>>> http://en.wikipedia.org/wiki/Covariance) >>> Sure >>>> >>>> that is : >>>>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >>>> -4.2860000000000005 >>>> >>>> The new proposed behaviour for cov is that cov(X,Y) would return : >>>> array(-4.2860000000000005) ?instead of the 2*2 matrix. >>> >>> But how you interpret an 2D array where the rows are greater than 2? >>>>>> Z=Y+X >>>>>> np.cov(np.vstack((X,Y,Z))) >>> array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], >>> ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], >>> ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) >>> >>> >>>> >>>> ?* This would be in line with the cov(X,Y) mathematical definition, as >>>> well as with R behavior. >>> I don't care what R does because I am using Python and Python is >>> infinitely better than R is! >>> >>> But I think that is only in the 1D case. >> >> I just checked R to make sure I remember correctly >> >>> xx = matrix((1:20)^2, nrow=4) >>> xx >> ? ? [,1] [,2] [,3] [,4] [,5] >> [1,] ? ?1 ? 25 ? 81 ?169 ?289 >> [2,] ? ?4 ? 36 ?100 ?196 ?324 >> [3,] ? ?9 ? 49 ?121 ?225 ?361 >> [4,] ? 16 ? 64 ?144 ?256 ?400 >>> cov(xx, 2*xx[,1:2]) >> ? ? ? ? [,1] ? ? ?[,2] >> [1,] ?86.0000 ?219.3333 >> [2,] 219.3333 ?566.0000 >> [3,] 352.6667 ?912.6667 >> [4,] 486.0000 1259.3333 >> [5,] 619.3333 1606.0000 >>> cov(xx) >> ? ? ? ? [,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] >> [1,] ?43.0000 109.6667 ?176.3333 ?243.0000 ?309.6667 >> [2,] 109.6667 283.0000 ?456.3333 ?629.6667 ?803.0000 >> [3,] 176.3333 456.3333 ?736.3333 1016.3333 1296.3333 >> [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 >> [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 >> >> >>> >>>> ?* This would save memory and computing resources. (and therefore help >>>> save the planet ;-) ) >>> Nothing that you have provided shows that it will. >> >> I don't know about saving the planet, but if X and Y have the same >> number of columns, we save 3 quarters of the calculations, as Sturla >> also explained in his first message. >> > Can not figure those savings: > For a 2 by 2 output has 3 covariances (so 3/4 =0.75 is 'needed' not 25%) > a 3 by 3 ?output has 6 covariances > a 5 by 5 output 15 covariances what numpy calculates are 4, 9 and 25 covariances, we might care only about 1, 2 and 4 of them. > > If you want to save memory and calculation then use symmetric storage > and associated methods. actually for covariance matrix we stilll need to subtract means, so we won't save 75%, but we save 75% in the cross-product. 
suppose X and Y are (nobs, k_x) and (nobs, k_y) (means already subtracted) (and ignoring that numpy "likes" rows instead of columns) the partitioned dot product [X,Y]'[X,Y] is [[ X'X, X'Y], [Y'X, Y'Y]] X'Y is (n_x, n_y) total shape is (n_x + n_y, n_x + n_y) If we are only interested in X'Y, we don't need the other three submatrices. If n_x = 99 and n_y is 1, we save .... ? (we get a (99,1) instead of a (100, 100) matrix) and X'Y , np.dot(X, Y), doesn't have any duplicated symmetry, so exploiting symmetry is a different issue. Josef > > Bruce > >> Josef >> >>> >>>> >>>> However, I do understand that the impact for this change may be big. >>>> This indeed requires careful reviewing. >>>> >>>> Pierre >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> Bruce >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Thu Jan 26 21:45:49 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 20:45:49 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAMMTP+CnUiQiAOwOyMkeQMB4rRo478iWvnUazS+Cp1RiZDrFuQ@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> <CAMMTP+CnUiQiAOwOyMkeQMB4rRo478iWvnUazS+Cp1RiZDrFuQ@mail.gmail.com> Message-ID: <CAAea2paQ3hGgtaroKiZ9MX_P=EEJAz_eoSozZ2TFR-0BnonKdQ@mail.gmail.com> On Thu, Jan 26, 2012 at 6:43 PM, <josef.pktd at gmail.com> wrote: > On Thu, Jan 26, 2012 at 3:58 PM, Bruce Southey <bsouthey at gmail.com> wrote: >> On Thu, Jan 26, 2012 at 12:45 PM, ?<josef.pktd at gmail.com> wrote: >>> On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: >>>> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig >>>> <pierre.haessig at crans.org> wrote: >>>>> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>>>>> Can you please provide a >>>>>> couple of real examples with expected output that clearly show what >>>>>> you want? >>>>>> >>>>> Hi Bruce, >>>>> >>>>> Thanks for your ticket feedback ! It's precisely because I see a big >>>>> potential impact of the proposed change that I send first a ML message, >>>>> second a ticket before jumping to a pull-request like a Sergio Leone's >>>>> cowboy (sorry, I watched "for a few dollars more" last weekend...) >>>>> >>>>> Now, I realize that in the ticket writing I made the wrong trade-off >>>>> between conciseness and accuracy which led to some of the errors you >>>>> raised. Let me try to use your example to try to share what I have in mind. 
>>>>> >>>>>> >> X = array([-2.1, -1. , ?4.3]) >>>>>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >>>>> >>>>> Indeed, with today's cov behavior we have a 2x2 array: >>>>>> >> cov(X,Y) >>>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>>> >>>>> Now, when I used the word 'concatenation', I wasn't precise enough >>>>> because I meant assembling X and Y in the sense of 2 vectors of >>>>> observations from 2 random variables X and Y. >>>>> This is achieved by concatenate(X,Y) *when properly playing with >>>>> dimensions* (which I didn't mentioned) : >>>>>> >> XY = np.concatenate((X[None, :], Y[None, :])) >>>>> array([[-2.1 , -1. ?, ?4.3 ], >>>>> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) >>>> >>>> In this context, I find stacking, ?np.vstack((X,Y)), more appropriate >>>> than concatenate. >>>> >>>>> >>>>> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>>>>> >> np.cov(XY) >>>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>>> >>>> Sure the resulting array is the same but whole process is totally different. >>>> >>>> >>>>> (And indeed, the actual cov Python code does use concatenate() ) >>>> Yes, but the user does not see that. Whereas you are forcing the user >>>> to do the stacking in the correct dimensions. >>>> >>>> >>>>> >>>>> >>>>> Now let me come back to my assertion about this behavior *usefulness*. >>>>> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >>>>> simple scalars blocks). >>>> No there are not '4' blocks just rows and columns. >>> >>> Sturla showed the 4 blocks in his first message. >>> >> Well, I could not follow that because the code is wrong. >> X = np.array([-2.1, -1. , ?4.3]) >>>>> cX = X - X.mean(axis=0)[np.newaxis,:] >> >> Traceback (most recent call last): >> ?File "<pyshell#6>", line 1, in <module> >> ? ?cX = X - X.mean(axis=0)[np.newaxis,:] >> IndexError: 0-d arrays can only use a single () or a list of newaxes >> (and a single ...) as an index >> ?whoops! >> >> Anyhow, variance-covariance matrix is symmetric but numpy or scipy >> lacks ?lapac's symmetrix matrix >> (http://www.netlib.org/lapack/explore-html/de/d9e/group___s_y.html) >> >>>> >>>>> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >>>>> to var(X) and var(Y) when setting ddof to 1) >>>> Sure but variances are still covariances. >>>> >>>>> ?* off diagonal blocks are symetric and are actually the covariance >>>>> estimate of X, Y observations (from >>>>> http://en.wikipedia.org/wiki/Covariance) >>>> Sure >>>>> >>>>> that is : >>>>>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >>>>> -4.2860000000000005 >>>>> >>>>> The new proposed behaviour for cov is that cov(X,Y) would return : >>>>> array(-4.2860000000000005) ?instead of the 2*2 matrix. >>>> >>>> But how you interpret an 2D array where the rows are greater than 2? >>>>>>> Z=Y+X >>>>>>> np.cov(np.vstack((X,Y,Z))) >>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], >>>> ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], >>>> ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) >>>> >>>> >>>>> >>>>> ?* This would be in line with the cov(X,Y) mathematical definition, as >>>>> well as with R behavior. >>>> I don't care what R does because I am using Python and Python is >>>> infinitely better than R is! >>>> >>>> But I think that is only in the 1D case. >>> >>> I just checked R to make sure I remember correctly >>> >>>> xx = matrix((1:20)^2, nrow=4) >>>> xx >>> ? ? [,1] [,2] [,3] [,4] [,5] >>> [1,] ? ?1 ? 25 ? 
81 ?169 ?289 >>> [2,] ? ?4 ? 36 ?100 ?196 ?324 >>> [3,] ? ?9 ? 49 ?121 ?225 ?361 >>> [4,] ? 16 ? 64 ?144 ?256 ?400 >>>> cov(xx, 2*xx[,1:2]) >>> ? ? ? ? [,1] ? ? ?[,2] >>> [1,] ?86.0000 ?219.3333 >>> [2,] 219.3333 ?566.0000 >>> [3,] 352.6667 ?912.6667 >>> [4,] 486.0000 1259.3333 >>> [5,] 619.3333 1606.0000 >>>> cov(xx) >>> ? ? ? ? [,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] >>> [1,] ?43.0000 109.6667 ?176.3333 ?243.0000 ?309.6667 >>> [2,] 109.6667 283.0000 ?456.3333 ?629.6667 ?803.0000 >>> [3,] 176.3333 456.3333 ?736.3333 1016.3333 1296.3333 >>> [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 >>> [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 >>> >>> >>>> >>>>> ?* This would save memory and computing resources. (and therefore help >>>>> save the planet ;-) ) >>>> Nothing that you have provided shows that it will. >>> >>> I don't know about saving the planet, but if X and Y have the same >>> number of columns, we save 3 quarters of the calculations, as Sturla >>> also explained in his first message. >>> >> Can not figure those savings: >> For a 2 by 2 output has 3 covariances (so 3/4 =0.75 is 'needed' not 25%) >> a 3 by 3 ?output has 6 covariances >> a 5 by 5 output 15 covariances > > what numpy calculates are 4, 9 and 25 covariances, we might care only > about 1, 2 and 4 of them. > >> >> If you want to save memory and calculation then use symmetric storage >> and associated methods. > > actually for covariance matrix we stilll need to subtract means, so we > won't save 75%, but we save 75% in the cross-product. > > suppose X and Y are (nobs, k_x) and (nobs, k_y) ? (means already subtracted) > (and ignoring that numpy "likes" rows instead of columns) > > the partitioned dot product ?[X,Y]'[X,Y] is > > [[ X'X, X'Y], > ?[Y'X, Y'Y]] > > X'Y is (n_x, n_y) > total shape is (n_x + n_y, n_x + n_y) > > If we are only interested in X'Y, we don't need the other three submatrices. > > If n_x = 99 and n_y is 1, we save .... ? > (we get a (99,1) instead of a (100, 100) matrix) > > and X'Y , np.dot(X, Y), doesn't have any duplicated symmetry, so > exploiting symmetry is a different issue. > > Josef > >> >> Bruce >> >>> Josef >>> >>>> >>>>> >>>>> However, I do understand that the impact for this change may be big. >>>>> This indeed requires careful reviewing. >>>>> >>>>> Pierre >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> Bruce >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks for someone to clearly state what they want. But still lacks evidence that it will save the world - when nobs is large, n_x and n_y are meaningless and thus (99,1) vs (100, 100) is also meaningless. Further dealing separately with the two arrays also bring additional overhead - small not zero. 
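To make the blocks being discussed concrete, here is a small illustration (a sketch only, using np.cov's default rowvar=True convention; the shapes and names are made up):

    import numpy as np

    x = np.random.randn(3, 10)   # 3 "X" variables, 10 observations (rows are variables)
    y = np.random.randn(2, 10)   # 2 "Y" variables, same 10 observations

    full = np.cov(x, y)          # (3+2) x (3+2) matrix: all four blocks are computed
    cross = full[:3, 3:]         # the (3, 2) cross-covariance block

    # the cross block alone, without forming the full matrix
    xc = x - x.mean(axis=1)[:, np.newaxis]
    yc = y - y.mean(axis=1)[:, np.newaxis]
    cross_direct = np.dot(xc, yc.T) / (10 - 1)

    print(np.allclose(cross, cross_direct))   # True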
Bruce From wcmclen at gmail.com Fri Jan 27 00:09:09 2012 From: wcmclen at gmail.com (William McLendon) Date: Thu, 26 Jan 2012 22:09:09 -0700 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) Message-ID: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> Hi, I am trying to install NumPy (using numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has 32-bit Python 2.7 installed on it using the latest installer (python-2.7.2.msi). Python is installed into the default location, C:\Python27, and as far as I can tell the registry knows about it -- or at least the windows uninstaller in the control panel does... The installation fails because the NumPy installer cannot find the Python installation. I am then prompted with a screen that should allow me to type in the location of my python installation, but the text-boxes where I should type this do not allow input so I'm kind of stuck. I did look into trying to build from source, but I don't have a C compiler on this system so setup.py died a horrible death. I'd prefer to avoid having to install Visual C++ Express on this system. Does anyone have any suggestions that might be helpful? Thanks! -William -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120126/e22023cf/attachment.html> From kalatsky at gmail.com Fri Jan 27 00:29:32 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Thu, 26 Jan 2012 23:29:32 -0600 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> Message-ID: <CAE8bXEmvriDkh43ct11r===iZ-LHxLDJKr5jZ90WC6-nWfTdDg@mail.gmail.com> To avoid all the hassle I suggest getting EPD: http://enthought.com/products/epd.php You'd get way more than just NumPy, which may or may not be what you need. I have installed various NumPy's on linux only and from source only which did require compilation (gcc), so I am not a good help for your setup. On the hand, I've done multiple EPD installations on various platforms and never had problems. Val On Thu, Jan 26, 2012 at 11:09 PM, William McLendon <wcmclen at gmail.com>wrote: > Hi, > > I am trying to install NumPy (using > numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has > 32-bit Python 2.7 installed on it using the latest installer > (python-2.7.2.msi). Python is installed into the default location, > C:\Python27, and as far as I can tell the registry knows about it -- or at > least the windows uninstaller in the control panel does... > > The installation fails because the NumPy installer cannot find the Python > installation. I am then prompted with a screen that should allow me to > type in the location of my python installation, but the text-boxes where I > should type this do not allow input so I'm kind of stuck. > > I did look into trying to build from source, but I don't have a C compiler > on this system so setup.py died a horrible death. I'd prefer to avoid > having to install Visual C++ Express on this system. > > Does anyone have any suggestions that might be helpful? > > Thanks! 
> -William
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120126/e3598028/attachment.html>

From pierre.haessig at crans.org Fri Jan 27 05:09:53 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Fri, 27 Jan 2012 11:09:53 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com>
References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com>
	<CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com>
	<CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com>
	<4F2152D6.10303@crans.org>
	<CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com>
	<jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org>
	<4F218CC8.9070602@molden.no>
	<CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com>
Message-ID: <4F2277F1.4050407@crans.org>

Le 26/01/2012 19:19, josef.pktd at gmail.com a écrit :
> The discussion had this reversed, numpy matches the behavior of
> MATLAB, while R (statistics) only returns the cross covariance part as
> proposed.
>
I would also say that there was an attempt to match MATLAB behavior.
However, there is a big difference with numpy.cov because of the default
value `rowvar` being True. Most software packages and textbooks I know
consider that, in a 2D context, matrix rows are observations while columns
are the variables.

Any idea why the "transposed" convention was selected in np.cov ?
(This question, I'm raising for informative purposes only... ;-) )

I also compared with octave to see how it works :
-- Function File: cov (X, Y)
     Compute covariance.

     If each row of X and Y is an observation and each column is a
     variable, the (I, J)-th entry of `cov (X, Y)' is the covariance
     between the I-th variable in X and the J-th variable in Y. If
     called with one argument, compute `cov (X, X)'.

(http://www.gnu.org/software/octave/doc/interpreter/Correlation-and-Regression-Analysis.html)
I like the clear tone of this description. But strangely enough, this is a
bit different from Matlab.
(http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a)

> If there is a new xcov, then I think there should also be a xcorrcoef.
> This case needs a different implementation than corrcoef, since the
> xcov doesn't contain the variances and they need to be calculated
> separately.
Adding xcorrcoeff as well would make sense. Using np.var with the `axis`
and `ddof` arguments set to appropriate values should then bring the
variances needed for the normalization.

In the end, if adding xcov is the path of least resistance, this may be
the way to go. What do people think ?
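For concreteness, a rough sketch of what such an xcov could look like (the name, the default ddof and the rows-are-variables convention are placeholders here, not an agreed API):

    import numpy as np

    def xcov(x, y, ddof=1):
        """Cross-covariance block between the variables in x and in y (sketch only).

        Rows are variables and columns are observations, as in np.cov's default.
        Returns the (n_xvars, n_yvars) block instead of the full matrix that
        np.cov(x, y) builds.
        """
        x = np.atleast_2d(np.asarray(x, dtype=float))
        y = np.atleast_2d(np.asarray(y, dtype=float))
        n = x.shape[1]
        xc = x - x.mean(axis=1)[:, np.newaxis]
        yc = y - y.mean(axis=1)[:, np.newaxis]
        return np.dot(xc, yc.T) / (n - ddof)

With the example used earlier in the thread:

    >>> X = np.array([-2.1, -1. ,  4.3])
    >>> Y = np.array([ 3. ,  1.1 ,  0.12])
    >>> xcov(X, Y)
    array([[-4.286]])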
Pierre From shish at keba.be Fri Jan 27 06:55:24 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 27 Jan 2012 06:55:24 -0500 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> Message-ID: <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> It seems weird that it wouldn't work, as this is a pretty standard setup. Here's a few ideas of things to check: - Double-check it's really 32 bit Python (checking sys.maxint) - Is there another Python installation that may cause some conflicts? - Did you download the numpy superpack from the official website? - Reboot Unlikely to be helpful, but I can't think of something else right now :/ -=- Olivier 2012/1/27 William McLendon <wcmclen at gmail.com> > Hi, > > I am trying to install NumPy (using > numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has > 32-bit Python 2.7 installed on it using the latest installer > (python-2.7.2.msi). Python is installed into the default location, > C:\Python27, and as far as I can tell the registry knows about it -- or at > least the windows uninstaller in the control panel does... > > The installation fails because the NumPy installer cannot find the Python > installation. I am then prompted with a screen that should allow me to > type in the location of my python installation, but the text-boxes where I > should type this do not allow input so I'm kind of stuck. > > I did look into trying to build from source, but I don't have a C compiler > on this system so setup.py died a horrible death. I'd prefer to avoid > having to install Visual C++ Express on this system. > > Does anyone have any suggestions that might be helpful? > > Thanks! > -William > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/35b6a90f/attachment.html> From wcmclen at gmail.com Fri Jan 27 08:26:33 2012 From: wcmclen at gmail.com (William McLendon) Date: Fri, 27 Jan 2012 06:26:33 -0700 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> Message-ID: <CAA3_nJW09Rk63Cr66qdRaZGWjcJhPGsOtOAgMREws2vL6N7G1A@mail.gmail.com> Yup, it's 32-bit python: Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. >>> I've only got one python instance installed here :D Here's where I got the numpy installer, http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/, as far as I can tell this should be the right place. Python has been installed on this system for a while and it's been rebooted numerous times, I can't imagine that it wouldn't be there. Matplotlib's installer had no trouble finding Python. Thanks! -William On Fri, Jan 27, 2012 at 4:55 AM, Olivier Delalleau <shish at keba.be> wrote: > It seems weird that it wouldn't work, as this is a pretty standard setup. 
> Here's a few ideas of things to check: > - Double-check it's really 32 bit Python (checking sys.maxint) > - Is there another Python installation that may cause some conflicts? > - Did you download the numpy superpack from the official website? > - Reboot > > Unlikely to be helpful, but I can't think of something else right now :/ > > -=- Olivier > > 2012/1/27 William McLendon <wcmclen at gmail.com> > >> Hi, >> >> I am trying to install NumPy (using >> numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has >> 32-bit Python 2.7 installed on it using the latest installer >> (python-2.7.2.msi). Python is installed into the default location, >> C:\Python27, and as far as I can tell the registry knows about it -- or at >> least the windows uninstaller in the control panel does... >> >> The installation fails because the NumPy installer cannot find the Python >> installation. I am then prompted with a screen that should allow me to >> type in the location of my python installation, but the text-boxes where I >> should type this do not allow input so I'm kind of stuck. >> >> I did look into trying to build from source, but I don't have a C >> compiler on this system so setup.py died a horrible death. I'd prefer to >> avoid having to install Visual C++ Express on this system. >> >> Does anyone have any suggestions that might be helpful? >> >> Thanks! >> -William >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/c800c411/attachment.html> From chaoyuejoy at gmail.com Fri Jan 27 08:52:55 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 27 Jan 2012 14:52:55 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? Message-ID: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> Dear all, suppose I have a ndarray a: In [66]: a Out[66]: array([0, 1, 2, 3, 4]) how can use it as 5X1 array without doing a=a.reshape(5,1)? thanks Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/a118b884/attachment.html> From d.s.seljebotn at astro.uio.no Fri Jan 27 08:56:38 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 27 Jan 2012 14:56:38 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> Message-ID: <4F22AD16.3090800@astro.uio.no> On 01/27/2012 02:52 PM, Chao YUE wrote: > Dear all, > > suppose I have a ndarray a: > > In [66]: a > Out[66]: array([0, 1, 2, 3, 4]) > > how can use it as 5X1 array without doing a=a.reshape(5,1)? 
a[:, np.newaxis] a[:, None] np.newaxis is None Dag From paul.anton.letnes at gmail.com Fri Jan 27 09:28:56 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Fri, 27 Jan 2012 15:28:56 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> Message-ID: <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> On 27. jan. 2012, at 14:52, Chao YUE wrote: > Dear all, > > suppose I have a ndarray a: > > In [66]: a > Out[66]: array([0, 1, 2, 3, 4]) > > how can use it as 5X1 array without doing a=a.reshape(5,1)? Several ways, this is one, although not much simpler. In [6]: a Out[6]: array([0, 1, 2, 3, 4]) In [7]: a.shape = 5, 1 In [8]: a Out[8]: array([[0], [1], [2], [3], [4]]) Paul From tsyu80 at gmail.com Fri Jan 27 09:36:46 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Fri, 27 Jan 2012 09:36:46 -0500 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> Message-ID: <CAEym_HqJvtz7b3eEMDSA5CZQaudt68m4QhqM=+51JNR1_69RFw@mail.gmail.com> On Fri, Jan 27, 2012 at 9:28 AM, Paul Anton Letnes < paul.anton.letnes at gmail.com> wrote: > > On 27. jan. 2012, at 14:52, Chao YUE wrote: > > > Dear all, > > > > suppose I have a ndarray a: > > > > In [66]: a > > Out[66]: array([0, 1, 2, 3, 4]) > > > > how can use it as 5X1 array without doing a=a.reshape(5,1)? > > Several ways, this is one, although not much simpler. > In [6]: a > Out[6]: array([0, 1, 2, 3, 4]) > > In [7]: a.shape = 5, 1 > > In [8]: a > Out[8]: > array([[0], > [1], > [2], > [3], > [4]]) > > Paul > > I'm assuming your issue with that call to reshape is that you need to know the dimensions beforehand. An alternative is to call: >>> a.reshape(-1, 1) The "-1" allows numpy to "infer" the length based on the given sizes. Another alternative is: >>> a[:, np.newaxis] -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/f49ea0d0/attachment.html> From ben.root at ou.edu Fri Jan 27 10:00:02 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 27 Jan 2012 09:00:02 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F2277F1.4050407@crans.org> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org> <4F218CC8.9070602@molden.no> <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com> <4F2277F1.4050407@crans.org> Message-ID: <CANNq6Fmdx7OJvv2e4Nz=FOde+1hr7pyfans058Ye+hXjiX5kEA@mail.gmail.com> On Friday, January 27, 2012, Pierre Haessig <pierre.haessig at crans.org> wrote: > Le 26/01/2012 19:19, josef.pktd at gmail.com a ?crit : >> The discussion had this reversed, numpy matches the behavior of >> MATLAB, while R (statistics) only returns the cross covariance part as >> proposed. >> > I would also say that there was an attempt to match MATLAB behavior. 
> However, there is big difference with numpy.cov because of the default > value `rowvar` being True. Most softwares and textbooks I know consider > that, in a 2D context, matrix rows are obvervations while columns are > the variables. > > Any idea why the "transposed" convention was selected in np.cov ? > (This question, I'm raising for informative purpose only... ;-) ) > > I also compared with octave to see how it works : > -- Function File: cov (X, Y) > Compute covariance. > > If each row of X and Y is an observation and each column is a > variable, the (I, J)-th entry of `cov (X, Y)' is the covariance > between the I-th variable in X and the J-th variable in Y. If > called with one argument, compute `cov (X, X)'. > > ( http://www.gnu.org/software/octave/doc/interpreter/Correlation-and-Regression-Analysis.html ) > I like the clear tone of this description. But strangely enough, this a > bit different from Matlab. > ( http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a ) > >> If there is a new xcov, then I think there should also be a xcorrcoef. >> This case needs a different implementation than corrcoef, since the >> xcov doesn't contain the variances and they need to be calculated >> separately. > Adding xcorrcoeff as well would make sense. The use of the np.var when > setting the `axis` and `??ddof` arguments to appropriate values should the > bring variances needed for the normalization. > > In the end, if adding xcov is the path of least resistance, this may be > the way to go. What do people think ? > > Pierre > My vote is for xcov() and xcorrcoeff(). It won't break compatibility, and the name of the function makes it clear what it does. It would also make sense to add "seealso" references to each other in the docstrings. The documentation for xcov() should also make it clear the differences between cov() and xcov() with examples and show how to get equivalent results using just cov() for those with older versions of numpy. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/04c1e9a3/attachment.html> From chaoyuejoy at gmail.com Fri Jan 27 10:45:49 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 27 Jan 2012 16:45:49 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <CAEym_HqJvtz7b3eEMDSA5CZQaudt68m4QhqM=+51JNR1_69RFw@mail.gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> <CAEym_HqJvtz7b3eEMDSA5CZQaudt68m4QhqM=+51JNR1_69RFw@mail.gmail.com> Message-ID: <CAAN-aRFnj7byMJ2Jpy31NON0ViT8bCyRq+zAb2w87D0GYZUN=g@mail.gmail.com> Thanks all. chao 2012/1/27 Tony Yu <tsyu80 at gmail.com> > > > On Fri, Jan 27, 2012 at 9:28 AM, Paul Anton Letnes < > paul.anton.letnes at gmail.com> wrote: > >> >> On 27. jan. 2012, at 14:52, Chao YUE wrote: >> >> > Dear all, >> > >> > suppose I have a ndarray a: >> > >> > In [66]: a >> > Out[66]: array([0, 1, 2, 3, 4]) >> > >> > how can use it as 5X1 array without doing a=a.reshape(5,1)? >> >> Several ways, this is one, although not much simpler. 
>> In [6]: a >> Out[6]: array([0, 1, 2, 3, 4]) >> >> In [7]: a.shape = 5, 1 >> >> In [8]: a >> Out[8]: >> array([[0], >> [1], >> [2], >> [3], >> [4]]) >> >> Paul >> >> > I'm assuming your issue with that call to reshape is that you need to know > the dimensions beforehand. An alternative is to call: > > >>> a.reshape(-1, 1) > > The "-1" allows numpy to "infer" the length based on the given sizes. > > Another alternative is: > > >>> a[:, np.newaxis] > > -Tony > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/47996868/attachment.html> From bsouthey at gmail.com Fri Jan 27 11:28:53 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 27 Jan 2012 10:28:53 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CANNq6Fmdx7OJvv2e4Nz=FOde+1hr7pyfans058Ye+hXjiX5kEA@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org> <4F218CC8.9070602@molden.no> <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com> <4F2277F1.4050407@crans.org> <CANNq6Fmdx7OJvv2e4Nz=FOde+1hr7pyfans058Ye+hXjiX5kEA@mail.gmail.com> Message-ID: <4F22D0C5.2050807@gmail.com> On 01/27/2012 09:00 AM, Benjamin Root wrote: > > > On Friday, January 27, 2012, Pierre Haessig <pierre.haessig at crans.org > <mailto:pierre.haessig at crans.org>> wrote: > > Le 26/01/2012 19:19, josef.pktd at gmail.com > <mailto:josef.pktd at gmail.com> a ?crit : > >> The discussion had this reversed, numpy matches the behavior of > >> MATLAB, while R (statistics) only returns the cross covariance part as > >> proposed. > >> > > I would also say that there was an attempt to match MATLAB behavior. > > However, there is big difference with numpy.cov because of the default > > value `rowvar` being True. Most softwares and textbooks I know consider > > that, in a 2D context, matrix rows are obvervations while columns are > > the variables. > > > > Any idea why the "transposed" convention was selected in np.cov ? > > (This question, I'm raising for informative purpose only... ;-) ) > > > > I also compared with octave to see how it works : > > -- Function File: cov (X, Y) > > Compute covariance. > > > > If each row of X and Y is an observation and each column is a > > variable, the (I, J)-th entry of `cov (X, Y)' is the covariance > > between the I-th variable in X and the J-th variable in Y. If > > called with one argument, compute `cov (X, X)'. > > > > > (http://www.gnu.org/software/octave/doc/interpreter/Correlation-and-Regression-Analysis.html) > > I like the clear tone of this description. But strangely enough, this a > > bit different from Matlab. 
> > > (http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a > <http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a>) > > > >> If there is a new xcov, then I think there should also be a xcorrcoef. > >> This case needs a different implementation than corrcoef, since the > >> xcov doesn't contain the variances and they need to be calculated > >> separately. > > Adding xcorrcoeff as well would make sense. The use of the np.var when > > setting the `axis` and `??ddof` arguments to appropriate values > should the > > bring variances needed for the normalization. > > > > In the end, if adding xcov is the path of least resistance, this may be > > the way to go. What do people think ? > > > > Pierre > > > > My vote is for xcov() and xcorrcoeff(). It won't break compatibility, > and the name of the function makes it clear what it does. It would > also make sense to add "seealso" references to each other in the > docstrings. The documentation for xcov() should also make it clear > the differences between cov() and xcov() with examples and show how to > get equivalent results using just cov() for those with older versions > of numpy. > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -1 because these are too close to cross-correlation as used by signal processing. The output is still a covariance so do we really need yet another set of very similar functions to maintain? Or can we get away with a new keyword? If speed really matters to you guys then surely moving np.cov into C would have more impact on 'saving the world' than this proposal. That also ignores algorithm used ( http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance). Actually np.cov also is deficient in that it does not have the dtype argument so it is prone to numerical precision errors (especially getting the mean of the array). Probably should be a ticket... Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/a3bad208/attachment.html> From shish at keba.be Fri Jan 27 12:28:53 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 27 Jan 2012 12:28:53 -0500 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAA3_nJW09Rk63Cr66qdRaZGWjcJhPGsOtOAgMREws2vL6N7G1A@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> <CAA3_nJW09Rk63Cr66qdRaZGWjcJhPGsOtOAgMREws2vL6N7G1A@mail.gmail.com> Message-ID: <CAFXk4bq1M2dn13Zx2+7Lk4ErNFxDcUJyB6VAzZykxWWkDCBJwA@mail.gmail.com> Sorry then, I'm afraid I'm out of (simple ideas). Out of curiosity, I tried to install Python 2.7.2 and numpy 1.6.1 on a Windows 7 computer and it worked just fine, so it must be something with your specific setup... -=- Olivier 2012/1/27 William McLendon <wcmclen at gmail.com> > Yup, it's 32-bit python: > Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] > on win32 > Type "copyright", "credits" or "license()" for more information. 
> >>> > > I've only got one python instance installed here :D > > Here's where I got the numpy installer, > http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/, as far as I can > tell this should be the right place. > > Python has been installed on this system for a while and it's been > rebooted numerous times, I can't imagine that it wouldn't be there. > Matplotlib's installer had no trouble finding Python. > > Thanks! > -William > > > > > > On Fri, Jan 27, 2012 at 4:55 AM, Olivier Delalleau <shish at keba.be> wrote: > >> It seems weird that it wouldn't work, as this is a pretty standard setup. >> Here's a few ideas of things to check: >> - Double-check it's really 32 bit Python (checking sys.maxint) >> - Is there another Python installation that may cause some conflicts? >> - Did you download the numpy superpack from the official website? >> - Reboot >> >> Unlikely to be helpful, but I can't think of something else right now :/ >> >> -=- Olivier >> >> 2012/1/27 William McLendon <wcmclen at gmail.com> >> >>> Hi, >>> >>> I am trying to install NumPy (using >>> numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has >>> 32-bit Python 2.7 installed on it using the latest installer >>> (python-2.7.2.msi). Python is installed into the default location, >>> C:\Python27, and as far as I can tell the registry knows about it -- or at >>> least the windows uninstaller in the control panel does... >>> >>> The installation fails because the NumPy installer cannot find the >>> Python installation. I am then prompted with a screen that should allow me >>> to type in the location of my python installation, but the text-boxes where >>> I should type this do not allow input so I'm kind of stuck. >>> >>> I did look into trying to build from source, but I don't have a C >>> compiler on this system so setup.py died a horrible death. I'd prefer to >>> avoid having to install Visual C++ Express on this system. >>> >>> Does anyone have any suggestions that might be helpful? >>> >>> Thanks! >>> -William >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/4e9ddf0e/attachment.html> From b.telenczuk at biologie.hu-berlin.de Fri Jan 27 15:46:06 2012 From: b.telenczuk at biologie.hu-berlin.de (Bartosz Telenczuk) Date: Fri, 27 Jan 2012 21:46:06 +0100 Subject: [Numpy-discussion] preferred way of testing empty arrays Message-ID: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> I have been using numpy for several years and I am very impressed with its flexibility. However, there is one problem that has always bothered me. Quite often I need to test consistently whether a variable is any of the following: an empty list, an empty array or None. Since both arrays and lists are ordered sequences I usually allow for both, and convert if necessary. 
However, when the (optional) argument is an empty list/array or None, I skip its processing and do nothing. Now, how should I test for 'emptiness'? PEP8 recommends: For sequences, (strings, lists, tuples), use the fact that empty sequences are false. >> seq = [] >> if not seq: ... print 'Hello' It works for empty numpy arrays: >> a = np.array(seq) >> if not a: ... print 'Hello" Hello but if 'a' is non-empty it raises an exception: >> a = np.array([1,2]) >> if not a: ... print 'Hello" ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() One solution is to test lengths: >> if len(seq) > 0: .... ... >> if len(a) > 0: ... ... but for None it fails again: >> opt = None >> if len(opt): ... TypeError: object of type 'NoneType' has no len() even worse we can not test for None, because it will fail if someone accidentally wraps None in an array: >> a = np.array(opt) >> if opt is not None: ... print 'hello' hello Although this behaviour is expected, it may be very confusing and it easily leads to errors. Even worse it adds unnecessary complexity in the code, because arrays, lists and None have to be handled differently. I hoped the I managed to explain the problem well. Is there a recommended way to test for empty arrays? Cheers, Bartosz From ben.root at ou.edu Fri Jan 27 15:57:52 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 27 Jan 2012 14:57:52 -0600 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> Message-ID: <CANNq6FnmUzN0SNf11YAKLDNJE20e_Ajjf-6BFrMBc9uwmeoR2g@mail.gmail.com> On Fri, Jan 27, 2012 at 2:46 PM, Bartosz Telenczuk < b.telenczuk at biologie.hu-berlin.de> wrote: > I have been using numpy for several years and I am very impressed with its > flexibility. However, there is one problem that has always bothered me. > > Quite often I need to test consistently whether a variable is any of the > following: an empty list, an empty array or None. Since both arrays and > lists are ordered sequences I usually allow for both, and convert if > necessary. However, when the (optional) argument is an empty list/array or > None, I skip its processing and do nothing. > > Now, how should I test for 'emptiness'? > > PEP8 recommends: > > For sequences, (strings, lists, tuples), use the fact that empty sequences > are false. > > >> seq = [] > >> if not seq: > ... print 'Hello' > > It works for empty numpy arrays: > > >> a = np.array(seq) > >> if not a: > ... print 'Hello" > Hello > > but if 'a' is non-empty it raises an exception: > > >> a = np.array([1,2]) > >> if not a: > ... print 'Hello" > ValueError: The truth value of an array with more than one element is > ambiguous. Use a.any() or a.all() > > One solution is to test lengths: > > >> if len(seq) > 0: > .... ... > >> if len(a) > 0: > ... ... > > but for None it fails again: > > >> opt = None > >> if len(opt): > ... > TypeError: object of type 'NoneType' has no len() > > even worse we can not test for None, because it will fail if someone > accidentally wraps None in an array: > > >> a = np.array(opt) > >> if opt is not None: > ... print 'hello' > hello > > Although this behaviour is expected, it may be very confusing and it > easily leads to errors. Even worse it adds unnecessary complexity in the > code, because arrays, lists and None have to be handled differently. > > I hoped the I managed to explain the problem well. 
Is there a recommended > way to test for empty arrays? > > Cheers, > > Bartosz > > Don't know if it is recommended, but this is used frequently within matplotlib: if np.prod(a.shape) == 0 : print "Is Empty!" Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/c15099ef/attachment.html> From robert.kern at gmail.com Fri Jan 27 15:59:03 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Jan 2012 20:59:03 +0000 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> Message-ID: <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> On Fri, Jan 27, 2012 at 20:46, Bartosz Telenczuk <b.telenczuk at biologie.hu-berlin.de> wrote: > I have been using numpy for several years and I am very impressed with its flexibility. However, there is one problem that has always bothered me. > > Quite often I need to test consistently whether a variable is any of the following: an empty list, an empty array or None. Since both arrays and lists are ordered sequences I usually allow for both, and convert if necessary. However, when the (optional) argument is an empty list/array or None, ?I skip its processing and do nothing. > > Now, how should I test for 'emptiness'? > > PEP8 recommends: > > For sequences, (strings, lists, tuples), use the fact that empty sequences are false. > >>> seq = [] >>> if not seq: > ... ? ?print 'Hello' > > It works for empty numpy arrays: > >>> a = np.array(seq) >>> if not a: > ... ? ? print 'Hello" > Hello > > but if 'a' is non-empty it raises an exception: > >>> a = np.array([1,2]) >>> if not a: > ... ? ? print 'Hello" > ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() > > One solution is to test lengths: > >>> if len(seq) > 0: > .... ? ?... >>> if len(a) > 0: > ... ? ? ... > > but for None it fails again: > >>> opt = None >>> if len(opt): > ... > TypeError: object of type 'NoneType' has no len() > > even worse we can not test for None, because it will fail if someone accidentally wraps None in an array: > >>> a = np.array(opt) >>> if opt is not None: > ... ? ? ?print 'hello' > hello > > Although this behaviour is expected, it may be very confusing and it easily leads to errors. Even worse it adds unnecessary complexity in the code, because arrays, lists and None have to be handled differently. > > I hoped the I managed to explain the problem well. Is there a recommended way to test for empty arrays? [~] |5> x = np.zeros([0]) [~] |6> x array([], dtype=float64) [~] |7> x.size == 0 True Note that checking for len(x) will fail for some empty arrays: [~] |8> x = np.zeros([10, 0]) [~] |9> x.size == 0 True [~] |10> len(x) 10 There is no way to test all of the cases (empty sequence, empty array, None) in the same way. Usually, it's a bad idea to conflate the three. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From emayssat at gmail.com Fri Jan 27 16:17:36 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Fri, 27 Jan 2012 13:17:36 -0800 Subject: [Numpy-discussion] bug in array instanciation? 
Message-ID: <CACB6ZmB-b6uSKdDeFojmS4Y7hEodyV=CVhGcr2ttApcv+Hpqvw@mail.gmail.com> In [20]: dt_knobs = [('pvName',(str,40)),('start','float'),('stop','float'),('mode',(str,10))] In [21]: r_knobs = np.recarray([],dtype=dt_knobs) In [22]: r_knobs Out[22]: rec.array(('\xa0\x8c\xc9\x02\x00\x00\x00\x00(\xc8v\x02\x00\x00\x00\x00\x00\xd3\x86\x02\x00\x00\x00\x00\x10\xdeJ\x02\x00\x00\x00\x00\x906\xb9\x02', 1.63e-322, 1.351330465085e-312, '\x90\xc6\xa3\x02\x00\x00\x00\x00P'), dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), ('mode', '|S10')]) why is the array not empty? -- E From howard at renci.org Fri Jan 27 16:18:06 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 16:18:06 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question Message-ID: <4F23148E.80902@renci.org> Hi all I am a fairly recent convert to python and I have got a question that's got me stumped. I hope this is the right mailing list: here goes :) I am reading some time series data out of a netcdf file a single timestep at a time. If the data is NaN, I want to reset it to the minimum of the dataset over all timesteps (which I already know). The data is in a variable of type numpy.ma.core.MaskedArray called modelData. If I do this: for i in range(len(modelData)): if math.isnan(modelData[i]): modelData[i] = dataMin I get the effect I want, If I do this: modelData[np.isnan(modelData)] = dataMin it doesn't seem to be working. Of course I could just do the first one, but len(modelData) is about 3.5 million, and it's taking about 20 seconds to run. This is happening inside of a rendering loop, so I'd like it to be as fast as possible, and I thought the second one might be faster, and maybe it is, but it doesn't seem to be working! :) Any ideas would be much appreciated. Thanks Howard -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/2576f2bc/attachment.html> From robert.kern at gmail.com Fri Jan 27 16:22:31 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Jan 2012 21:22:31 +0000 Subject: [Numpy-discussion] bug in array instanciation? In-Reply-To: <CACB6ZmB-b6uSKdDeFojmS4Y7hEodyV=CVhGcr2ttApcv+Hpqvw@mail.gmail.com> References: <CACB6ZmB-b6uSKdDeFojmS4Y7hEodyV=CVhGcr2ttApcv+Hpqvw@mail.gmail.com> Message-ID: <CAF6FJiu_m7HQwgh8EFZTM8gcFRiQmT+Dm7Ca+1dyqwWKjCbYLg@mail.gmail.com> On Fri, Jan 27, 2012 at 21:17, Emmanuel Mayssat <emayssat at gmail.com> wrote: > In [20]: dt_knobs = > [('pvName',(str,40)),('start','float'),('stop','float'),('mode',(str,10))] > > In [21]: r_knobs = np.recarray([],dtype=dt_knobs) > > In [22]: r_knobs > Out[22]: > rec.array(('\xa0\x8c\xc9\x02\x00\x00\x00\x00(\xc8v\x02\x00\x00\x00\x00\x00\xd3\x86\x02\x00\x00\x00\x00\x10\xdeJ\x02\x00\x00\x00\x00\x906\xb9\x02', > 1.63e-322, 1.351330465085e-312, '\x90\xc6\xa3\x02\x00\x00\x00\x00P'), > ? ? ?dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), > ('mode', '|S10')]) > > why is the array not empty? The shape [] creates a rank-0 array, which is essentially a scalar. 
[~] |1> x = np.array(10) [~] |2> x array(10) [~] |3> x.shape () If you want an empty array, you need at least one dimension of size 0: [~] |7> r_knobs = np.recarray([0], dtype=dt_knobs) [~] |8> r_knobs rec.array([], dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), ('mode', '|S10')]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From b.telenczuk at biologie.hu-berlin.de Fri Jan 27 16:24:52 2012 From: b.telenczuk at biologie.hu-berlin.de (Bartosz Telenczuk) Date: Fri, 27 Jan 2012 22:24:52 +0100 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> Message-ID: <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> Thank you for your tips. I was not aware of the possible problems with len. > There is no way to test all of the cases (empty sequence, empty array, > None) in the same way. Usually, it's a bad idea to conflate the three. I agree that this should be avoided. However, there are cases in which it is not possible or hard. My case is that I get some extra data to add to my plots from a database. The dataset may be undefined (which means None), empty array or empty list. In all cases the data should not be plotted. If I want to test for all the cases, my program becomes quite complex. In fact, Python provides False values for most empty objects, but NumPy seems to ignore this. It might be a good idea to have a helper function which handles all objects consistently. Yours, Bartosz From robert.kern at gmail.com Fri Jan 27 16:29:39 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Jan 2012 21:29:39 +0000 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> Message-ID: <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> On Fri, Jan 27, 2012 at 21:24, Bartosz Telenczuk <b.telenczuk at biologie.hu-berlin.de> wrote: > Thank you for your tips. I was not aware of the possible problems with len. > >> There is no way to test all of the cases (empty sequence, empty array, >> None) in the same way. Usually, it's a bad idea to conflate the three. > > I agree that this should be avoided. However, there are cases in which it is not possible or hard. My case is that I get some extra data to add to my plots from a database. The dataset may be undefined (which means None), empty array or empty list. In all cases the data should not be plotted. If I want to test for all the cases, my program becomes quite complex. Well, if you really need to do this in more than one place, define a utility function and call it a day. def should_not_plot(x): if x is None: return True elif isinstance(x, np.ndarray): return x.size == 0 else: return bool(x) > In fact, Python provides False values for most empty objects, but NumPy seems to ignore this. It might be a good idea to have a helper function which handles all objects consistently. 
np.asarray(x).size == 0 None should rarely be treated the same as an empty list or a 0-size array, so that should be left to application-specific code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From b.telenczuk at biologie.hu-berlin.de Fri Jan 27 16:37:32 2012 From: b.telenczuk at biologie.hu-berlin.de (Bartosz Telenczuk) Date: Fri, 27 Jan 2012 22:37:32 +0100 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> Message-ID: <FDDAFAA6-B8DF-40A1-AB9B-8DB4E183E212@biologie.hu-berlin.de> This will be indeed very helpful. Thanks. > Well, if you really need to do this in more than one place, define a > utility function and call it a day. > > def should_not_plot(x): > if x is None: > return True > elif isinstance(x, np.ndarray): > return x.size == 0 > else: > return bool(x) Bartosz From shish at keba.be Fri Jan 27 16:42:59 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 27 Jan 2012 16:42:59 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23148E.80902@renci.org> References: <4F23148E.80902@renci.org> Message-ID: <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> What are the types and shapes of modelData and dataMin? (it works for me with modelData a (3, 4) numpy array and dataMin a Python float, with numpy 1.6.1) -=- Olivier 2012/1/27 Howard <howard at renci.org> > Hi all > > I am a fairly recent convert to python and I have got a question that's > got me stumped. I hope this is the right mailing list: here goes :) > > I am reading some time series data out of a netcdf file a single timestep > at a time. If the data is NaN, I want to reset it to the minimum of the > dataset over all timesteps (which I already know). The data is in a > variable of type numpy.ma.core.MaskedArray called modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the first one, > but len(modelData) is about 3.5 million, and it's taking about 20 seconds > to run. This is happening inside of a rendering loop, so I'd like it to be > as fast as possible, and I thought the second one might be faster, and > maybe it is, but it doesn't seem to be working! :) > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/0cde807b/attachment.html> From howard at renci.org Fri Jan 27 16:54:13 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 16:54:13 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> References: <4F23148E.80902@renci.org> <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> Message-ID: <4F231D05.7090705@renci.org> Hi Olivier I added this to the code: print "modelData:", type(modelData), modelData.shape, modelData.size print "dataMin:", type(dataMin) and got modelData: <class 'numpy.ma.core.MaskedArray'> (1767734,) 1767734 dataMin: <type 'float'> What's funny is I tried the example from http://docs.scipy.org/doc/numpy-1.6.0/numpy-user.pdf and it works fine for me. Maybe 1.7 million is over some threshhold? Thanks Howard >>> myarr = np.ma.core.MaskedArray([1., 0., np.nan, 3.]) >>> myarr[np.isnan(myarr)] = 30 >>> myarr masked_array(data = [ 1. 0. 30. 3.], mask = False, fill_value = 1e+20) On 1/27/12 4:42 PM, Olivier Delalleau wrote: > What are the types and shapes of modelData and dataMin? (it works for > me with modelData a (3, 4) numpy array and dataMin a Python float, > with numpy 1.6.1) > > -=- Olivier > > 2012/1/27 Howard <howard at renci.org <mailto:howard at renci.org>> > > Hi all > > I am a fairly recent convert to python and I have got a question > that's got me stumped. I hope this is the right mailing list: > here goes :) > > I am reading some time series data out of a netcdf file a single > timestep at a time. If the data is NaN, I want to reset it to the > minimum of the dataset over all timesteps (which I already know). > The data is in a variable of type numpy.ma.core.MaskedArray called > modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the > first one, but len(modelData) is about 3.5 million, and it's > taking about 20 seconds to run. This is happening inside of a > rendering loop, so I'd like it to be as fast as possible, and I > thought the second one might be faster, and maybe it is, but it > doesn't seem to be working! :) > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <mailto:howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/35c576ec/attachment.html> From howard at renci.org Fri Jan 27 16:58:05 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 16:58:05 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F231D05.7090705@renci.org> References: <4F23148E.80902@renci.org> <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> <4F231D05.7090705@renci.org> Message-ID: <4F231DED.60001@renci.org> Oh, one other thing I should mention: I did the install of numpy yesterday and I also have 1.6.1 Howard On 1/27/12 4:54 PM, Howard wrote: > Hi Olivier > > I added this to the code: > > print "modelData:", type(modelData), modelData.shape, modelData.size > print "dataMin:", type(dataMin) > > and got > > modelData: <class 'numpy.ma.core.MaskedArray'> (1767734,) 1767734 > dataMin: <type 'float'> > > What's funny is I tried the example from > > http://docs.scipy.org/doc/numpy-1.6.0/numpy-user.pdf > > and it works fine for me. Maybe 1.7 million is over some threshhold? > > Thanks > Howard > > >>> myarr = np.ma.core.MaskedArray([1., 0., np.nan, 3.]) > >>> myarr[np.isnan(myarr)] = 30 > >>> myarr > masked_array(data = [ 1. 0. 30. 3.], > mask = False, > fill_value = 1e+20) > > > On 1/27/12 4:42 PM, Olivier Delalleau wrote: >> What are the types and shapes of modelData and dataMin? (it works for >> me with modelData a (3, 4) numpy array and dataMin a Python float, >> with numpy 1.6.1) >> >> -=- Olivier >> >> 2012/1/27 Howard <howard at renci.org <mailto:howard at renci.org>> >> >> Hi all >> >> I am a fairly recent convert to python and I have got a question >> that's got me stumped. I hope this is the right mailing list: >> here goes :) >> >> I am reading some time series data out of a netcdf file a single >> timestep at a time. If the data is NaN, I want to reset it to >> the minimum of the dataset over all timesteps (which I already >> know). The data is in a variable of type >> numpy.ma.core.MaskedArray called modelData. >> >> If I do this: >> >> for i in range(len(modelData)): >> if math.isnan(modelData[i]): >> modelData[i] = dataMin >> >> I get the effect I want, If I do this: >> >> modelData[np.isnan(modelData)] = dataMin >> >> it doesn't seem to be working. Of course I could just do the >> first one, but len(modelData) is about 3.5 million, and it's >> taking about 20 seconds to run. This is happening inside of a >> rendering loop, so I'd like it to be as fast as possible, and I >> thought the second one might be faster, and maybe it is, but it >> doesn't seem to be working! :) >> >> Any ideas would be much appreciated. 
>> >> Thanks >> Howard >> >> -- >> Howard Lander <mailto:howard at renci.org> >> Senior Research Software Developer >> Renaissance Computing Institute (RENCI) <http://www.renci.org> >> The University of North Carolina at Chapel Hill >> Duke University >> North Carolina State University >> 100 Europa Drive >> Suite 540 >> Chapel Hill, NC 27517 >> 919-445-9651 >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Howard Lander <mailto:howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/de08b3ed/attachment.html> From efiring at hawaii.edu Fri Jan 27 17:21:04 2012 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 27 Jan 2012 12:21:04 -1000 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23148E.80902@renci.org> References: <4F23148E.80902@renci.org> Message-ID: <4F232350.4090702@hawaii.edu> On 01/27/2012 11:18 AM, Howard wrote: > Hi all > > I am a fairly recent convert to python and I have got a question that's > got me stumped. I hope this is the right mailing list: here goes :) > > I am reading some time series data out of a netcdf file a single > timestep at a time. If the data is NaN, I want to reset it to the > minimum of the dataset over all timesteps (which I already know). The > data is in a variable of type numpy.ma.core.MaskedArray called modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the first one, > but len(modelData) is about 3.5 million, and it's taking about 20 > seconds to run. This is happening inside of a rendering loop, so I'd > like it to be as fast as possible, and I thought the second one might be > faster, and maybe it is, but it doesn't seem to be working! :) It would help if you would say explicitly what you mean by "doesn't seem to be working", ideally by providing a minimal complete example illustrating the problem. Does modelData have masked values that you want to keep separate from your NaN values? If not, you can do this: y = np.ma.masked_invalid(modelData).filled(dataMin) Then y will be an ordinary ndarray. If this is not satisfactory because you need to keep separate some initially masked values, then you may need to save the initial mask and use it to turn y back into a masked array. You may be running into trouble with your initial approach because using np.isnan on a masked array is giving a masked array, and I think trying to index with a masked array is not advised. 
In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True])) Out[2]: masked_array(data = [False True --], mask = [False False True], fill_value = True) Eric > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <mailto:howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From howard at renci.org Fri Jan 27 17:37:35 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 17:37:35 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F232350.4090702@hawaii.edu> References: <4F23148E.80902@renci.org> <4F232350.4090702@hawaii.edu> Message-ID: <4F23272F.30601@renci.org> On 1/27/12 5:21 PM, Eric Firing wrote: > On 01/27/2012 11:18 AM, Howard wrote: >> Hi all >> >> I am a fairly recent convert to python and I have got a question that's >> got me stumped. I hope this is the right mailing list: here goes :) >> >> I am reading some time series data out of a netcdf file a single >> timestep at a time. If the data is NaN, I want to reset it to the >> minimum of the dataset over all timesteps (which I already know). The >> data is in a variable of type numpy.ma.core.MaskedArray called modelData. >> >> If I do this: >> >> for i in range(len(modelData)): >> if math.isnan(modelData[i]): >> modelData[i] = dataMin >> >> I get the effect I want, If I do this: >> >> modelData[np.isnan(modelData)] = dataMin >> >> it doesn't seem to be working. Of course I could just do the first one, >> but len(modelData) is about 3.5 million, and it's taking about 20 >> seconds to run. This is happening inside of a rendering loop, so I'd >> like it to be as fast as possible, and I thought the second one might be >> faster, and maybe it is, but it doesn't seem to be working! :) > It would help if you would say explicitly what you mean by "doesn't seem > to be working", ideally by providing a minimal complete example > illustrating the problem. Hi Eric Thanks for the reply. Yes, I can be a little more specific about the issue. I am reading data from a storm surge model out of a NetCDF file so I can render it with tricontourf. The model data has both a triangulation and a set of lat, lon points that are invariant for the entire model run, as well as data for each time step. As the model runs, triangles in the coastal plain wet and dry: the dry values are indicated by NaN values in the data and should not be rendered. Those I mask off previous to this code. I have found, in using tricontourf, that in the mapping from data values to color values, the range of the data seems to include even the data from the masked triangles. This causes the data to be either monochromatic or bi-chromatic (the high and low colors in the map). However, once the triangles are masked, if I set the corresponding data values to the known dataMin (or in fact, any value in the valid data range) the render proceeds correctly. So in the case of the first piece of code, I get reasonable images: using the second I do not. > > Does modelData have masked values that you want to keep separate from > your NaN values? If not, you can do this: No I don't think so. 
> > y = np.ma.masked_invalid(modelData).filled(dataMin) > > Then y will be an ordinary ndarray. If this is not satisfactory because > you need to keep separate some initially masked values, then you may > need to save the initial mask and use it to turn y back into a masked array. > > You may be running into trouble with your initial approach because using > np.isnan on a masked array is giving a masked array, and I think trying > to index with a masked array is not advised. This could certainly be be the issue. I will look into this Monday. Thanks very much for taking the time to reply. Howard > > In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True])) > Out[2]: > masked_array(data = [False True --], > mask = [False False True], > fill_value = True) > > Eric > >> Any ideas would be much appreciated. >> >> Thanks >> Howard >> >> -- >> Howard Lander<mailto:howard at renci.org> >> Senior Research Software Developer >> Renaissance Computing Institute (RENCI)<http://www.renci.org> >> The University of North Carolina at Chapel Hill >> Duke University >> North Carolina State University >> 100 Europa Drive >> Suite 540 >> Chapel Hill, NC 27517 >> 919-445-9651 >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/d89bdb08/attachment.html> From ben.root at ou.edu Fri Jan 27 17:46:04 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 27 Jan 2012 16:46:04 -0600 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23272F.30601@renci.org> References: <4F23148E.80902@renci.org> <4F232350.4090702@hawaii.edu> <4F23272F.30601@renci.org> Message-ID: <CANNq6FnZJi7rA7yHAP2eFQ5e5HdQ-ximBTbwBebz4yf+sUuw6A@mail.gmail.com> On Fri, Jan 27, 2012 at 4:37 PM, Howard <howard at renci.org> wrote: > I have found, in using tricontourf, that in the mapping from data values > to color values, the range of the data seems to include even the data from > the masked triangles. This causes the data to be either monochromatic or > bi-chromatic (the high and low colors in the map). However, once the > triangles are masked, if I set the corresponding data values to the known > dataMin (or in fact, any value in the valid data range) the render proceeds > correctly. So in the case of the first piece of code, I get reasonable > images: using the second I do not. > > > This sounds like a bug in tricontourf. It should not be doing that. If you could report it to the matplotlib-devel list with an example demonstrating your problem, I can see to it that it gets resolved. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/91b39ec1/attachment.html> From stefan at sun.ac.za Fri Jan 27 23:28:57 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 27 Jan 2012 20:28:57 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <CABDkGQk-Us4Fe0kxR95VZPQMDaVxip2hL0cA-RDYG8B2H-zHtg@mail.gmail.com> Hey, Ond?ej 2012/1/21 Ond?ej ?ert?k <ondrej.certik at gmail.com>: > I read the Mandelbrot code using NumPy at this page: > > http://mentat.za.net/numpy/intro/intro.html I wrote this as a tutorial for beginners, so the emphasis is on simplicity. Do you have any suggestions on how to improve the code without obfuscating the tutorial? St?fan From shish at keba.be Sat Jan 28 00:28:42 2012 From: shish at keba.be (Olivier Delalleau) Date: Sat, 28 Jan 2012 00:28:42 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23272F.30601@renci.org> References: <4F23148E.80902@renci.org> <4F232350.4090702@hawaii.edu> <4F23272F.30601@renci.org> Message-ID: <CAFXk4bpSoNFNuctJwV6CRFcGzKTowS=TNGwoY0c048KWBuq2Gg@mail.gmail.com> Eric's probably right and it's indexing with a masked array that's causing you trouble. Since you seem to say your NaN values correspond to your mask, you should be able to simply do: modelData[modeData.mask] = dataMin Note that in further processing it may then make more sense to remove the mask, since your array is now full with valid data: modelData = modelData.data -=- Olivier Le 27 janvier 2012 17:37, Howard <howard at renci.org> a ?crit : > On 1/27/12 5:21 PM, Eric Firing wrote: > > On 01/27/2012 11:18 AM, Howard wrote: > > Hi all > > I am a fairly recent convert to python and I have got a question that's > got me stumped. I hope this is the right mailing list: here goes :) > > I am reading some time series data out of a netcdf file a single > timestep at a time. If the data is NaN, I want to reset it to the > minimum of the dataset over all timesteps (which I already know). The > data is in a variable of type numpy.ma.core.MaskedArray called modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the first one, > but len(modelData) is about 3.5 million, and it's taking about 20 > seconds to run. This is happening inside of a rendering loop, so I'd > like it to be as fast as possible, and I thought the second one might be > faster, and maybe it is, but it doesn't seem to be working! :) > > It would help if you would say explicitly what you mean by "doesn't seem > to be working", ideally by providing a minimal complete example > illustrating the problem. > > Hi Eric > > Thanks for the reply. Yes, I can be a little more specific about the > issue. I am reading data from a storm surge model out of a NetCDF file so > I can render it with tricontourf. The model data has both a triangulation > and a set of lat, lon points that are invariant for the entire model run, > as well as data for each time step. 
As the model runs, triangles in the > coastal plain wet and dry: the dry values are indicated by NaN values in > the data and should not be rendered. Those I mask off previous to this > code. I have found, in using tricontourf, that in the mapping from data > values to color values, the range of the data seems to include even the > data from the masked triangles. This causes the data to be either > monochromatic or bi-chromatic (the high and low colors in the map). > However, once the triangles are masked, if I set the corresponding data > values to the known dataMin (or in fact, any value in the valid data range) > the render proceeds correctly. So in the case of the first piece of code, > I get reasonable images: using the second I do not. > > > Does modelData have masked values that you want to keep separate from > your NaN values? If not, you can do this: > > > No I don't think so. > > y = np.ma.masked_invalid(modelData).filled(dataMin) > > Then y will be an ordinary ndarray. If this is not satisfactory because > you need to keep separate some initially masked values, then you may > need to save the initial mask and use it to turn y back into a masked array. > > You may be running into trouble with your initial approach because using > np.isnan on a masked array is giving a masked array, and I think trying > to index with a masked array is not advised. > > This could certainly be be the issue. I will look into this Monday. > > Thanks very much for taking the time to reply. > Howard > > > In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True])) > Out[2]: > masked_array(data = [False True --], > mask = [False False True], > fill_value = True) > > Eric > > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <mailto:howard at renci.org> <howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Howard Lander <howard at renci.org> > > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120128/7101e4c5/attachment.html> From e.antero.tammi at gmail.com Sat Jan 28 13:15:37 2012 From: e.antero.tammi at gmail.com (eat) Date: Sat, 28 Jan 2012 20:15:37 +0200 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
Message-ID: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> Hi, Short demonstration of the issue: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: from numpy.polynomial import Polynomial as Poly In []: def p_tst(c): ..: p= Poly(c) ..: r= p.roots() ..: return sort(abs(p(r))) ..: Now I would expect a result more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) be the case, but actually most result seems to be more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) In []: p_tst(randn(123))[-3:] Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) So, does this phenomena imply that - I'm testing with too high order polynomials (if so, does there exists a definite upper limit of polynomial order I'll not face this issue) or - it's just the 'nature' of computations with float values (if so, probably I should be able to tackle this regardless of the polynomial order) or - it's a nasty bug in class Polynomial Regards, eat -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120128/c059c837/attachment.html> From charlesr.harris at gmail.com Sat Jan 28 16:14:17 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 28 Jan 2012 14:14:17 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
In-Reply-To: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> Message-ID: <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: > Hi, > > Short demonstration of the issue: > In []: sys.version > Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' > In []: np.version.version > Out[]: '1.6.0' > > In []: from numpy.polynomial import Polynomial as Poly > In []: def p_tst(c): > ..: p= Poly(c) > ..: r= p.roots() > ..: return sort(abs(p(r))) > ..: > > Now I would expect a result more like: > In []: p_tst(randn(123))[-3:] > Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) > > be the case, but actually most result seems to be more like: > In []: p_tst(randn(123))[-3:] > Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) > > So, does this phenomena imply that > - I'm testing with too high order polynomials (if so, does there exists a > definite upper limit of polynomial order I'll not face this issue) > or > - it's just the 'nature' of computations with float values (if so, > probably I should be able to tackle this regardless of the polynomial order) > or > - it's a nasty bug in class Polynomial > > It's a defect. You will get all the roots and the number will equal the degree. I haven't decided what the best way to deal with this is, but my thoughts have trended towards specifying an interval with the default being the domain. If you have other thoughts I'd be glad for the feedback. For the problem at hand, note first that you are specifying the coefficients, not the roots as was the case with poly1d. Second, as a rule of thumb, plain old polynomials will generally only be good for degree < 22 due to being numerically ill conditioned. If you are really looking to use high degrees, Chebyshev or Legendre will work better, although you will probably need to explicitly specify the domain. If you want to specify the polynomial using roots, do Poly.fromroots(...). Third, for the high degrees you are probably screwed anyway for degree 123, since the accuracy of the root finding will be limited, especially for roots that can cluster, and any root that falls even a little bit outside the interval [-1,1] (the default domain) is going to evaluate to a big number simply because the polynomial is going to h*ll at a rate you wouldn't believe ;) For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, things look good for degree 50, get a bit loose at degree 75 but can be fixed up with one iteration of Newton, and blow up at degree 100. I think that's pretty good, actually, doing better would require a lot more work. There are some zero finding algorithms out there that might do better if someone wants to give it a shot. 
In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) In [21]: sort(abs(p(p.roots()))) Out[21]: array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, 2.85411753e-21, 2.85411753e-21]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120128/d47a62d2/attachment.html> From warren.weckesser at enthought.com Mon Jan 30 02:17:27 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 30 Jan 2012 01:17:27 -0600 Subject: [Numpy-discussion] ufunc delegation to object method Message-ID: <CAM-+wY8M7m7y=+nXXhELkqQ9=wsrNzROFAUL8a1K2SNTao0mwA@mail.gmail.com> In the following code, numpy.sin() calls the object's sin() function: In [2]: class Foo(object): ...: def sin(self): ...: return "spam" ...: In [3]: f = Foo() In [4]: np.sin(f) Out[4]: 'spam' Is this, in fact, guaranteed behavior for a ufunc? It does not appear to be documented. This question came up in the discussion of SciPy pull request 138 ( https://github.com/scipy/scipy/pull/138), where the idea is to add numpy unary ufunc support to SciPy's sparse arrays. (Sorry if this email shows up twice. I sent it the first time while the Enthought servers were down, and eventually got an email back saying it had not been sent.) Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/86109698/attachment.html> From e.antero.tammi at gmail.com Sun Jan 29 12:03:55 2012 From: e.antero.tammi at gmail.com (eat) Date: Sun, 29 Jan 2012 19:03:55 +0200 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
In-Reply-To: <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> Message-ID: <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: > >> Hi, >> >> Short demonstration of the issue: >> In []: sys.version >> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >> (Intel)]' >> In []: np.version.version >> Out[]: '1.6.0' >> >> In []: from numpy.polynomial import Polynomial as Poly >> In []: def p_tst(c): >> ..: p= Poly(c) >> ..: r= p.roots() >> ..: return sort(abs(p(r))) >> ..: >> >> Now I would expect a result more like: >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >> >> be the case, but actually most result seems to be more like: >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >> >> So, does this phenomena imply that >> - I'm testing with too high order polynomials (if so, does there exists a >> definite upper limit of polynomial order I'll not face this issue) >> or >> - it's just the 'nature' of computations with float values (if so, >> probably I should be able to tackle this regardless of the polynomial order) >> or >> - it's a nasty bug in class Polynomial >> >> > It's a defect. You will get all the roots and the number will equal the > degree. I haven't decided what the best way to deal with this is, but my > thoughts have trended towards specifying an interval with the default being > the domain. If you have other thoughts I'd be glad for the feedback. > > For the problem at hand, note first that you are specifying the > coefficients, not the roots as was the case with poly1d. Second, as a rule > of thumb, plain old polynomials will generally only be good for degree < 22 > due to being numerically ill conditioned. If you are really looking to use > high degrees, Chebyshev or Legendre will work better, although you will > probably need to explicitly specify the domain. If you want to specify the > polynomial using roots, do Poly.fromroots(...). Third, for the high degrees > you are probably screwed anyway for degree 123, since the accuracy of the > root finding will be limited, especially for roots that can cluster, and > any root that falls even a little bit outside the interval [-1,1] (the > default domain) is going to evaluate to a big number simply because the > polynomial is going to h*ll at a rate you wouldn't believe ;) > > For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, things > look good for degree 50, get a bit loose at degree 75 but can be fixed up > with one iteration of Newton, and blow up at degree 100. I think that's > pretty good, actually, doing better would require a lot more work. There > are some zero finding algorithms out there that might do better if someone > wants to give it a shot. 
> > In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) > > In [21]: sort(abs(p(p.roots()))) > Out[21]: > array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, > 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, > 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, > 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, > 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, > 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, > 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, > 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, > 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, > 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, > 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, > 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, > 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, > 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, > 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, > 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, > 2.85411753e-21, 2.85411753e-21]) > Thanks, for a very informative feedback. I'll study those orthogonal polynomials more detail. Regards, - eat > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120129/3e982447/attachment.html> From charlesr.harris at gmail.com Mon Jan 30 08:55:18 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 30 Jan 2012 06:55:18 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? In-Reply-To: <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> Message-ID: <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> On Sun, Jan 29, 2012 at 10:03 AM, eat <e.antero.tammi at gmail.com> wrote: > On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: >> >>> Hi, >>> >>> Short demonstration of the issue: >>> In []: sys.version >>> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >>> (Intel)]' >>> In []: np.version.version >>> Out[]: '1.6.0' >>> >>> In []: from numpy.polynomial import Polynomial as Poly >>> In []: def p_tst(c): >>> ..: p= Poly(c) >>> ..: r= p.roots() >>> ..: return sort(abs(p(r))) >>> ..: >>> >>> Now I would expect a result more like: >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >>> >>> be the case, but actually most result seems to be more like: >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >>> >>> So, does this phenomena imply that >>> - I'm testing with too high order polynomials (if so, does there exists >>> a definite upper limit of 
polynomial order I'll not face this issue) >>> or >>> - it's just the 'nature' of computations with float values (if so, >>> probably I should be able to tackle this regardless of the polynomial order) >>> or >>> - it's a nasty bug in class Polynomial >>> >>> >> It's a defect. You will get all the roots and the number will equal the >> degree. I haven't decided what the best way to deal with this is, but my >> thoughts have trended towards specifying an interval with the default being >> the domain. If you have other thoughts I'd be glad for the feedback. >> >> For the problem at hand, note first that you are specifying the >> coefficients, not the roots as was the case with poly1d. Second, as a rule >> of thumb, plain old polynomials will generally only be good for degree < 22 >> due to being numerically ill conditioned. If you are really looking to use >> high degrees, Chebyshev or Legendre will work better, although you will >> probably need to explicitly specify the domain. If you want to specify the >> polynomial using roots, do Poly.fromroots(...). Third, for the high degrees >> you are probably screwed anyway for degree 123, since the accuracy of the >> root finding will be limited, especially for roots that can cluster, and >> any root that falls even a little bit outside the interval [-1,1] (the >> default domain) is going to evaluate to a big number simply because the >> polynomial is going to h*ll at a rate you wouldn't believe ;) >> >> For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, >> things look good for degree 50, get a bit loose at degree 75 but can be >> fixed up with one iteration of Newton, and blow up at degree 100. I think >> that's pretty good, actually, doing better would require a lot more work. >> There are some zero finding algorithms out there that might do better if >> someone wants to give it a shot. >> >> In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) >> >> In [21]: sort(abs(p(p.roots()))) >> Out[21]: >> array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, >> 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, >> 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, >> 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, >> 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, >> 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, >> 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, >> 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, >> 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, >> 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, >> 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, >> 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, >> 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, >> 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, >> 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, >> 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, >> 2.85411753e-21, 2.85411753e-21]) >> > Thanks, > > for a very informative feedback. I'll study those orthogonal polynomials > more detail. > > That said, I'm thinking it might be possible to get a more accurate polynomial representation from the zeros by going through a barycentric form rather than simply multiplying the factors together as is done now. Hmm... For evenly spaced roots the polynomial grows in amplitude rapidly at the ends which leads to numerical problems because a small error in the zeros turns into a large error in value because of the steepness of the curve at the zeroes. I've attached a semilogy plot of the absolute values of the polynomial with 30 equally spaced zeroes from -1 to 1. 
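For anyone who wants to poke at this locally, here is a rough sketch of that kind of check (just an illustration, not the script behind the attached plot; the number of roots is made up):

import numpy as np
from numpy.polynomial import Polynomial, Chebyshev

roots = np.linspace(-1, 1, 30)              # evenly spaced roots in the default domain
for cls in (Polynomial, Chebyshev):
    p = cls.fromroots(roots)                # build the series directly from its roots
    resid = np.sort(np.abs(p(p.roots())))   # residuals at the recovered roots
    print("%s: %s" % (cls.__name__, resid[-3:]))

Pushing the number of roots up toward the degrees discussed above makes the difference between the power basis and the Chebyshev basis obvious.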
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/51018011/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: polyplot.png Type: image/png Size: 42262 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/51018011/attachment.png> From rainexpected at theo.to Mon Jan 30 10:25:26 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 10:25:26 -0500 Subject: [Numpy-discussion] Addressing arrays Message-ID: <4F26B666.4090108@theo.to> Hi, Is there some straightforward way to access an array by values across a subset of its dimensions? For example, if I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y? Thanks, Ted From chaoyuejoy at gmail.com Mon Jan 30 10:27:06 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Mon, 30 Jan 2012 16:27:06 +0100 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26B666.4090108@theo.to> References: <4F26B666.4090108@theo.to> Message-ID: <CAAN-aRHHHW2iEJ+sJ5CGWNM6fL1_K4LkCzACU3PqP2ENXH-HyQ@mail.gmail.com> I am afraid you have to write index inquire function by yourself. I did like this. chao 2012/1/30 Ted To <rainexpected at theo.to> > Hi, > > Is there some straightforward way to access an array by values across a > subset of its dimensions? For example, if I have a three dimensional > array a=(x,y,z), can I look at the values of z given particular values > for x and y? > > Thanks, > Ted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/a3c81e4b/attachment.html> From malcolm.reynolds at gmail.com Mon Jan 30 10:26:59 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Mon, 30 Jan 2012 15:26:59 +0000 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26B666.4090108@theo.to> References: <4F26B666.4090108@theo.to> Message-ID: <CAO1Gn59xe-znR_3x8-4G1hAJmSDD61pQjKcQaB8_xXHyp+f-XQ@mail.gmail.com> On Mon, Jan 30, 2012 at 3:25 PM, Ted To <rainexpected at theo.to> wrote: > Is there some straightforward way to access an array by values across a > subset of its dimensions? ?For example, if I have a three dimensional > array a=(x,y,z), can I look at the values of z given particular values > for x and y? a[x, y, :] should get you what you want I believe.. 
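For example, with a small made-up array:

import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # toy array indexed as (x, y, z)
print(a[1, 2, :])                           # all z values at x=1, y=2 -> [20 21 22 23]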
Malcolm From zachary.pincus at yale.edu Mon Jan 30 10:28:38 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 30 Jan 2012 10:28:38 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26B666.4090108@theo.to> References: <4F26B666.4090108@theo.to> Message-ID: <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> a[x,y,:] Read the slicing part of the tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial (section 1.6) And the documentation: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html On Jan 30, 2012, at 10:25 AM, Ted To wrote: > Hi, > > Is there some straightforward way to access an array by values across a > subset of its dimensions? For example, if I have a three dimensional > array a=(x,y,z), can I look at the values of z given particular values > for x and y? > > Thanks, > Ted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chaoyuejoy at gmail.com Mon Jan 30 10:33:05 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Mon, 30 Jan 2012 16:33:05 +0100 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> Message-ID: <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> he is not asking for slicing. he is asking for how to index array by element value but not element index. 2012/1/30 Zachary Pincus <zachary.pincus at yale.edu> > a[x,y,:] > > Read the slicing part of the tutorial: > http://www.scipy.org/Tentative_NumPy_Tutorial > (section 1.6) > > And the documentation: > http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html > > > > On Jan 30, 2012, at 10:25 AM, Ted To wrote: > > > Hi, > > > > Is there some straightforward way to access an array by values across a > > subset of its dimensions? For example, if I have a three dimensional > > array a=(x,y,z), can I look at the values of z given particular values > > for x and y? > > > > Thanks, > > Ted > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/cbd0da1e/attachment.html> From zachary.pincus at yale.edu Mon Jan 30 10:50:01 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 30 Jan 2012 10:50:01 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> Message-ID: <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> Ted, can you clarify what you're asking for? 
Maybe give a trivial example of an array and the desired output? I'm pretty sure this is a slicing question though: > If I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y? Given that element values are scalars in this case, and indices are (x,y,z) triples, it seems likely that looking for "values of z" given an (x,y) pair is an slicing-by-index question, no? For indexing-by-value, "fancy indexing" with boolean masks is usually the way to go... again, Ted (or Chao), if you can describe your indexing needs in a bit more detail, it's often easy to find a compact slicing and/or fancy-indexing strategy that works well and reasonably efficiently. Zach On Jan 30, 2012, at 10:33 AM, Chao YUE wrote: > he is not asking for slicing. he is asking for how to index array by element value but not element index. > > 2012/1/30 Zachary Pincus <zachary.pincus at yale.edu> > a[x,y,:] > > Read the slicing part of the tutorial: > http://www.scipy.org/Tentative_NumPy_Tutorial > (section 1.6) > > And the documentation: > http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html > > > > On Jan 30, 2012, at 10:25 AM, Ted To wrote: > > > Hi, > > > > Is there some straightforward way to access an array by values across a > > subset of its dimensions? For example, if I have a three dimensional > > array a=(x,y,z), can I look at the values of z given particular values > > for x and y? > > > > Thanks, > > Ted > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > ************************************************************************************ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From rainexpected at theo.to Mon Jan 30 11:57:12 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 11:57:12 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> Message-ID: <4F26CBE8.6090703@theo.to> Sure thing. To keep it simple suppose I have just a two dimensional array (time,output): [(1,2),(2,3),(3,4)] I would like to look at all values of output for which, for example time==2. My actual application has a six dimensional array and I'd like to look at the contents using one or more of the first three dimensions. Many thanks, Ted On 01/30/2012 10:50 AM, Zachary Pincus wrote: > Ted, can you clarify what you're asking for? Maybe give a trivial example of an array and the desired output? > > I'm pretty sure this is a slicing question though: >> If I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y? 
> Given that element values are scalars in this case, and indices are (x,y,z) triples, it seems likely that looking for "values of z" given an (x,y) pair is an slicing-by-index question, no? > > For indexing-by-value, "fancy indexing" with boolean masks is usually the way to go... again, Ted (or Chao), if you can describe your indexing needs in a bit more detail, it's often easy to find a compact slicing and/or fancy-indexing strategy that works well and reasonably efficiently. > > Zach > > > > On Jan 30, 2012, at 10:33 AM, Chao YUE wrote: > >> he is not asking for slicing. he is asking for how to index array by element value but not element index. >> >> 2012/1/30 Zachary Pincus <zachary.pincus at yale.edu> >> a[x,y,:] >> >> Read the slicing part of the tutorial: >> http://www.scipy.org/Tentative_NumPy_Tutorial >> (section 1.6) >> >> And the documentation: >> http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html >> >> >> >> On Jan 30, 2012, at 10:25 AM, Ted To wrote: >> >>> Hi, >>> >>> Is there some straightforward way to access an array by values across a >>> subset of its dimensions? For example, if I have a three dimensional >>> array a=(x,y,z), can I look at the values of z given particular values >>> for x and y? >>> >>> Thanks, >>> Ted >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> -- >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> ************************************************************************************ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From brett.olsen at gmail.com Mon Jan 30 12:13:33 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Mon, 30 Jan 2012 11:13:33 -0600 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26CBE8.6090703@theo.to> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> Message-ID: <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> On Mon, Jan 30, 2012 at 10:57 AM, Ted To <rainexpected at theo.to> wrote: > Sure thing. ?To keep it simple suppose I have just a two dimensional > array (time,output): > [(1,2),(2,3),(3,4)] > I would like to look at all values of output for which, for example time==2. > > My actual application has a six dimensional array and I'd like to look > at the contents using one or more of the first three dimensions. 
> > Many thanks, > Ted Couldn't you just do something like this with boolean indexing: In [1]: import numpy as np In [2]: a = np.array([(1,2),(2,3),(3,4)]) In [3]: a Out[3]: array([[1, 2], [2, 3], [3, 4]]) In [4]: mask = a[:,0] == 2 In [5]: mask Out[5]: array([False, True, False], dtype=bool) In [6]: a[mask,1] Out[6]: array([3]) ~Brett From rainexpected at theo.to Mon Jan 30 12:31:55 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 12:31:55 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> Message-ID: <4F26D40B.4040303@theo.to> On 01/30/2012 12:13 PM, Brett Olsen wrote: > On Mon, Jan 30, 2012 at 10:57 AM, Ted To <rainexpected at theo.to> wrote: >> Sure thing. To keep it simple suppose I have just a two dimensional >> array (time,output): >> [(1,2),(2,3),(3,4)] >> I would like to look at all values of output for which, for example time==2. >> >> My actual application has a six dimensional array and I'd like to look >> at the contents using one or more of the first three dimensions. >> >> Many thanks, >> Ted > > Couldn't you just do something like this with boolean indexing: > > In [1]: import numpy as np > > In [2]: a = np.array([(1,2),(2,3),(3,4)]) > > In [3]: a > Out[3]: > array([[1, 2], > [2, 3], > [3, 4]]) > > In [4]: mask = a[:,0] == 2 > > In [5]: mask > Out[5]: array([False, True, False], dtype=bool) > > In [6]: a[mask,1] > Out[6]: array([3]) > > ~Brett Thanks! That works great if I only want to search over one index but I can't quite figure out what to do with more than a single index. So suppose I have a labeled, multidimensional array with labels 'month', 'year' and 'quantity'. a[['month','year']] gives me an array of indices but "a[['month','year']]==(1,1960)" produces "False". I'm sure I simply don't know the proper syntax and I apologize for that -- I'm kind of new to numpy. Ted From zachary.pincus at yale.edu Mon Jan 30 13:29:38 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 30 Jan 2012 13:29:38 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26D40B.4040303@theo.to> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> <4F26D40B.4040303@theo.to> Message-ID: <E42CB68E-F4D5-4E34-817C-46A89053CA8B@yale.edu> > Thanks! That works great if I only want to search over one index but I > can't quite figure out what to do with more than a single index. So > suppose I have a labeled, multidimensional array with labels 'month', > 'year' and 'quantity'. a[['month','year']] gives me an array of indices > but "a[['month','year']]==(1,1960)" produces "False". I'm sure I simply > don't know the proper syntax and I apologize for that -- I'm kind of new > to numpy. I think that your best bet is to form the boolean masks independently and then logical-and them together: mask = (a['month'] == 1) & (a['year'] == 1960) jan_60 = a[mask] Someone might have more insight here. 
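As a self-contained toy version (the field names and numbers are just made up to match the description above):

import numpy as np

a = np.array([(1, 1960, 10.0), (2, 1960, 12.5), (1, 1961, 9.0)],
             dtype=[('month', int), ('year', int), ('quantity', float)])

mask = (a['month'] == 1) & (a['year'] == 1960)
print(a[mask]['quantity'])   # -> [ 10.]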
Though I should note that if you have large data and are doing lots of "queries" like this, a more database-ish approach might be better. Something like sqlite's python bindings, or PyTables. Alternately, if your data are all time-series based things, PANDAS might be worth looking at. But the above approach should be just fine for non-huge datasets... Zach From brett.olsen at gmail.com Mon Jan 30 13:30:58 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Mon, 30 Jan 2012 12:30:58 -0600 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26D40B.4040303@theo.to> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> <4F26D40B.4040303@theo.to> Message-ID: <CAFq1z2VfqZL-x1vFxxpEEH25sYaxzr4bHa01KQLSUJ-B912W=A@mail.gmail.com> On Mon, Jan 30, 2012 at 11:31 AM, Ted To <rainexpected at theo.to> wrote: > On 01/30/2012 12:13 PM, Brett Olsen wrote: >> On Mon, Jan 30, 2012 at 10:57 AM, Ted To <rainexpected at theo.to> wrote: >>> Sure thing. ?To keep it simple suppose I have just a two dimensional >>> array (time,output): >>> [(1,2),(2,3),(3,4)] >>> I would like to look at all values of output for which, for example time==2. >>> >>> My actual application has a six dimensional array and I'd like to look >>> at the contents using one or more of the first three dimensions. >>> >>> Many thanks, >>> Ted >> >> Couldn't you just do something like this with boolean indexing: >> >> In [1]: import numpy as np >> >> In [2]: a = np.array([(1,2),(2,3),(3,4)]) >> >> In [3]: a >> Out[3]: >> array([[1, 2], >> ? ? ? ?[2, 3], >> ? ? ? ?[3, 4]]) >> >> In [4]: mask = a[:,0] == 2 >> >> In [5]: mask >> Out[5]: array([False, ?True, False], dtype=bool) >> >> In [6]: a[mask,1] >> Out[6]: array([3]) >> >> ~Brett > > Thanks! ?That works great if I only want to search over one index but I > can't quite figure out what to do with more than a single index. ?So > suppose I have a labeled, multidimensional array with labels 'month', > 'year' and 'quantity'. ?a[['month','year']] gives me an array of indices > but "a[['month','year']]==(1,1960)" produces "False". ?I'm sure I simply > don't know the proper syntax and I apologize for that -- I'm kind of new > to numpy. > > Ted You'd want to update your mask appropriately to get everything you want to select, one criteria at a time e.g.: mask = a[:,0] == 1 mask &= a[:,1] == 1960 Alternatively: mask = (a[:,0] == 1) & (a[:,1] == 1960) but be careful with the parens, & and | are normally high-priority bitwise operators and if you leave the parens out, it will try to bitwise-and 1 and a[:,1] and throw an error. 
If you've got a ton of parameters, you can combine these more aesthetically with: mask = (a[:,[0,1]] == [1, 1960]).all(axis=1) ~Brett From rainexpected at theo.to Mon Jan 30 13:39:13 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 13:39:13 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <CAFq1z2VfqZL-x1vFxxpEEH25sYaxzr4bHa01KQLSUJ-B912W=A@mail.gmail.com> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> <4F26D40B.4040303@theo.to> <CAFq1z2VfqZL-x1vFxxpEEH25sYaxzr4bHa01KQLSUJ-B912W=A@mail.gmail.com> Message-ID: <4F26E3D1.2000700@theo.to> > You'd want to update your mask appropriately to get everything you > want to select, one criteria at a time e.g.: > mask = a[:,0] == 1 > mask &= a[:,1] == 1960 > > Alternatively: > mask = (a[:,0] == 1) & (a[:,1] == 1960) > but be careful with the parens, & and | are normally high-priority > bitwise operators and if you leave the parens out, it will try to > bitwise-and 1 and a[:,1] and throw an error. > > If you've got a ton of parameters, you can combine these more > aesthetically with: > mask = (a[:,[0,1]] == [1, 1960]).all(axis=1) > > ~Brett Zach and Brett, Many thanks -- that is exactly what I need. Cheers, Ted From ruby185 at gmail.com Mon Jan 30 14:21:03 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Mon, 30 Jan 2012 14:21:03 -0500 Subject: [Numpy-discussion] histogram help Message-ID: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> hi, all I am trying to figure out how to do histogram with numpy I have a three-dimension array A[x,y,z], another array (bins) has been allocated along Z dimension, z' how can I get the histogram of H[ x, y, z' ]? thanks for your help. Ruby From ruby185 at gmail.com Mon Jan 30 14:21:43 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Mon, 30 Jan 2012 14:21:43 -0500 Subject: [Numpy-discussion] condense array along one dimension In-Reply-To: <CAFXk4bp7X1H9Qe3-jb5AMCL_ufuK5ASx0emczpLfFYV5f6r7ZQ@mail.gmail.com> References: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> <CAFXk4bp7X1H9Qe3-jb5AMCL_ufuK5ASx0emczpLfFYV5f6r7ZQ@mail.gmail.com> Message-ID: <CAA=a5iPCWp-JxBk49KazeXrar6XR-R2kFvtaGk9+ekCsHm5Tmw@mail.gmail.com> I think this is exactly what I need. Thanks for your help, Olivier. Ruby On Fri, Jan 20, 2012 at 9:50 AM, Olivier Delalleau <shish at keba.be> wrote: > What do you mean by "summarize"? > If for instance you want to sum along Y, just do > ? my_array.sum(axis=1) > > -=- Olivier > > 2012/1/20 Ruby Stevenson <ruby185 at gmail.com> >> >> hi, all >> >> Say I have a three dimension array, X, Y, Z, ?how can I condense into >> two dimensions: for example, compute 2-D array with (X, Z) and >> summarize along Y dimensions ... is it possible? 
>> >> thanks >> >> Ruby >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ruby185 at gmail.com Mon Jan 30 14:27:15 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Mon, 30 Jan 2012 14:27:15 -0500 Subject: [Numpy-discussion] histogram help In-Reply-To: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> References: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> Message-ID: <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> Sorry, I realize I didn't describe the problem completely clear or correct. the (x,y) in this case is just many co-ordinates, and each coordinate has a list of values (Z value) associated with it. The bins are allocated for the Z. I hope this clarify things a little. Thanks again. Ruby On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson <ruby185 at gmail.com> wrote: > hi, all > > I am trying to figure out how to do histogram with numpy > > I have a three-dimension array A[x,y,z], ?another array (bins) has > been allocated along Z dimension, z' > > how can I get the histogram of H[ x, y, z' ]? > > thanks for your help. > > Ruby From charlesr.harris at gmail.com Mon Jan 30 15:15:47 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 30 Jan 2012 13:15:47 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? In-Reply-To: <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> Message-ID: <CAB6mnx+vo4Fj2GK6kpB2j22=m+vo=rz4wdPW=hReHvKLjJ4aPA@mail.gmail.com> On Mon, Jan 30, 2012 at 6:55 AM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Sun, Jan 29, 2012 at 10:03 AM, eat <e.antero.tammi at gmail.com> wrote: > >> On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: >>> >>>> Hi, >>>> >>>> Short demonstration of the issue: >>>> In []: sys.version >>>> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >>>> (Intel)]' >>>> In []: np.version.version >>>> Out[]: '1.6.0' >>>> >>>> In []: from numpy.polynomial import Polynomial as Poly >>>> In []: def p_tst(c): >>>> ..: p= Poly(c) >>>> ..: r= p.roots() >>>> ..: return sort(abs(p(r))) >>>> ..: >>>> >>>> Now I would expect a result more like: >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >>>> >>>> be the case, but actually most result seems to be more like: >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: 
array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >>>> >>>> So, does this phenomena imply that >>>> - I'm testing with too high order polynomials (if so, does there exists >>>> a definite upper limit of polynomial order I'll not face this issue) >>>> or >>>> - it's just the 'nature' of computations with float values (if so, >>>> probably I should be able to tackle this regardless of the polynomial order) >>>> or >>>> - it's a nasty bug in class Polynomial >>>> >>>> >>> It's a defect. You will get all the roots and the number will equal the >>> degree. I haven't decided what the best way to deal with this is, but my >>> thoughts have trended towards specifying an interval with the default being >>> the domain. If you have other thoughts I'd be glad for the feedback. >>> >>> For the problem at hand, note first that you are specifying the >>> coefficients, not the roots as was the case with poly1d. Second, as a rule >>> of thumb, plain old polynomials will generally only be good for degree < 22 >>> due to being numerically ill conditioned. If you are really looking to use >>> high degrees, Chebyshev or Legendre will work better, although you will >>> probably need to explicitly specify the domain. If you want to specify the >>> polynomial using roots, do Poly.fromroots(...). Third, for the high degrees >>> you are probably screwed anyway for degree 123, since the accuracy of the >>> root finding will be limited, especially for roots that can cluster, and >>> any root that falls even a little bit outside the interval [-1,1] (the >>> default domain) is going to evaluate to a big number simply because the >>> polynomial is going to h*ll at a rate you wouldn't believe ;) >>> >>> For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, >>> things look good for degree 50, get a bit loose at degree 75 but can be >>> fixed up with one iteration of Newton, and blow up at degree 100. I think >>> that's pretty good, actually, doing better would require a lot more work. >>> There are some zero finding algorithms out there that might do better if >>> someone wants to give it a shot. >>> >>> In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) >>> >>> In [21]: sort(abs(p(p.roots()))) >>> Out[21]: >>> array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, >>> 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, >>> 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, >>> 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, >>> 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, >>> 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, >>> 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, >>> 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, >>> 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, >>> 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, >>> 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, >>> 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, >>> 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, >>> 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, >>> 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, >>> 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, >>> 2.85411753e-21, 2.85411753e-21]) >>> >> Thanks, >> >> for a very informative feedback. I'll study those orthogonal polynomials >> more detail. >> >> > That said, I'm thinking it might be possible to get a more accurate > polynomial representation from the zeros by going through a barycentric > form rather than simply multiplying the factors together as is done now. > Hmm... 
> > For evenly spaced roots the polynomial grows in amplitude rapidly at the > ends which leads to numerical problems because a small error in the zeros > turns into a large error in value because of the steepness of the curve at > the zeroes. I've attached a semilogy plot of the absolute values of the > polynomial with 30 equally spaced zeroes from -1 to 1. > > I've attached a plot of the Chebyshev coefficients for the monic polynomial with 50 zeros evenly spaced from -1, 1. The odd coefficients should be zero, so their value tells you what the error in the coefficient determination was (I used Gauss-Chebyshev integration). The value of the resulting Chebyshev series cannot be evaluated with sufficient accuracy in double precision due to the dynamic range of the coefficients and I expect that simple inability of double precision to correctly represent the values extends to the root finding. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/7a88b60e/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: chebcoef-deg50.png Type: image/png Size: 41467 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/7a88b60e/attachment.png> From scipy at samueljohn.de Mon Jan 30 17:06:22 2012 From: scipy at samueljohn.de (Samuel John) Date: Mon, 30 Jan 2012 23:06:22 +0100 Subject: [Numpy-discussion] histogram help In-Reply-To: <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> References: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> Message-ID: <F1916220-7CB1-4DE8-A22F-BE3709E56EC7@samueljohn.de> Hi Ruby, I still do not fully understand your question but what I do in such cases is to construct a very simple array and test the functions. The help of numpy.histogram2d or numpy.histogramdd (for more than two dims) might help here. So I guess, basically you want to ignore the x,y positions and just look at the combined distribution of the Z values? In this case, you would just need the numpy.histogram (the 1d version). Note that the histogram returns the numbers and the bin-borders. bests Samuel On 30.01.2012, at 20:27, Ruby Stevenson wrote: > Sorry, I realize I didn't describe the problem completely clear or correct. > > the (x,y) in this case is just many co-ordinates, and each coordinate > has a list of values (Z value) associated with it. The bins are > allocated for the Z. > > I hope this clarify things a little. Thanks again. > > Ruby > > > > > On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson <ruby185 at gmail.com> wrote: >> hi, all >> >> I am trying to figure out how to do histogram with numpy >> >> I have a three-dimension array A[x,y,z], another array (bins) has >> been allocated along Z dimension, z' >> >> how can I get the histogram of H[ x, y, z' ]? >> >> thanks for your help. >> >> Ruby > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Mon Jan 30 17:20:57 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 30 Jan 2012 15:20:57 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
In-Reply-To: <CAB6mnx+vo4Fj2GK6kpB2j22=m+vo=rz4wdPW=hReHvKLjJ4aPA@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> <CAB6mnx+vo4Fj2GK6kpB2j22=m+vo=rz4wdPW=hReHvKLjJ4aPA@mail.gmail.com> Message-ID: <CAB6mnxJ5rkJcUfT8bH5g6-yDnYN_KUc9U_EnEQFEUk_-9kh5Wg@mail.gmail.com> On Mon, Jan 30, 2012 at 1:15 PM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Mon, Jan 30, 2012 at 6:55 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, Jan 29, 2012 at 10:03 AM, eat <e.antero.tammi at gmail.com> wrote: >> >>> On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> >>>> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> Short demonstration of the issue: >>>>> In []: sys.version >>>>> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >>>>> (Intel)]' >>>>> In []: np.version.version >>>>> Out[]: '1.6.0' >>>>> >>>>> In []: from numpy.polynomial import Polynomial as Poly >>>>> In []: def p_tst(c): >>>>> ..: p= Poly(c) >>>>> ..: r= p.roots() >>>>> ..: return sort(abs(p(r))) >>>>> ..: >>>>> >>>>> Now I would expect a result more like: >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >>>>> >>>>> be the case, but actually most result seems to be more like: >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >>>>> >>>>> So, does this phenomena imply that >>>>> - I'm testing with too high order polynomials (if so, does there >>>>> exists a definite upper limit of polynomial order I'll not face this issue) >>>>> or >>>>> - it's just the 'nature' of computations with float values (if so, >>>>> probably I should be able to tackle this regardless of the polynomial order) >>>>> or >>>>> - it's a nasty bug in class Polynomial >>>>> >>>>> >>>> It's a defect. You will get all the roots and the number will equal the >>>> degree. I haven't decided what the best way to deal with this is, but my >>>> thoughts have trended towards specifying an interval with the default being >>>> the domain. If you have other thoughts I'd be glad for the feedback. >>>> >>>> For the problem at hand, note first that you are specifying the >>>> coefficients, not the roots as was the case with poly1d. Second, as a rule >>>> of thumb, plain old polynomials will generally only be good for degree < 22 >>>> due to being numerically ill conditioned. If you are really looking to use >>>> high degrees, Chebyshev or Legendre will work better, although you will >>>> probably need to explicitly specify the domain. If you want to specify the >>>> polynomial using roots, do Poly.fromroots(...). 
Third, for the high degrees >>>> you are probably screwed anyway for degree 123, since the accuracy of the >>>> root finding will be limited, especially for roots that can cluster, and >>>> any root that falls even a little bit outside the interval [-1,1] (the >>>> default domain) is going to evaluate to a big number simply because the >>>> polynomial is going to h*ll at a rate you wouldn't believe ;) >>>> >>>> For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, >>>> things look good for degree 50, get a bit loose at degree 75 but can be >>>> fixed up with one iteration of Newton, and blow up at degree 100. I think >>>> that's pretty good, actually, doing better would require a lot more work. >>>> There are some zero finding algorithms out there that might do better if >>>> someone wants to give it a shot. >>>> >>>> In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) >>>> >>>> In [21]: sort(abs(p(p.roots()))) >>>> Out[21]: >>>> array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, >>>> 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, >>>> 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, >>>> 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, >>>> 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, >>>> 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, >>>> 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, >>>> 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, >>>> 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, >>>> 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, >>>> 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, >>>> 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, >>>> 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, >>>> 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, >>>> 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, >>>> 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, >>>> 2.85411753e-21, 2.85411753e-21]) >>>> >>> Thanks, >>> >>> for a very informative feedback. I'll study those orthogonal polynomials >>> more detail. >>> >>> >> That said, I'm thinking it might be possible to get a more accurate >> polynomial representation from the zeros by going through a barycentric >> form rather than simply multiplying the factors together as is done now. >> Hmm... >> >> For evenly spaced roots the polynomial grows in amplitude rapidly at the >> ends which leads to numerical problems because a small error in the zeros >> turns into a large error in value because of the steepness of the curve at >> the zeroes. I've attached a semilogy plot of the absolute values of the >> polynomial with 30 equally spaced zeroes from -1 to 1. >> >> > > I've attached a plot of the Chebyshev coefficients for the monic > polynomial with 50 zeros evenly spaced from -1, 1. The odd coefficients > should be zero, so their value tells you what the error in the coefficient > determination was (I used Gauss-Chebyshev integration). The value of the > resulting Chebyshev series cannot be evaluated with sufficient accuracy in > double precision due to the dynamic range of the coefficients and I expect > that simple inability of double precision to correctly represent the values > extends to the root finding. > > Oops, that was erroneous. The proximate cause of the problem seems to be poor precision in obtaining the coefficients from the roots. That can be improved. I've attached a few more plots ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/af411cce/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: polycoef.png Type: image/png Size: 36547 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/af411cce/attachment.png> -------------- next part -------------- A non-text attachment was scrubbed... Name: polyval.png Type: image/png Size: 57555 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/af411cce/attachment-0001.png> From chris.barker at noaa.gov Mon Jan 30 19:35:05 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 30 Jan 2012 16:35:05 -0800 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> Message-ID: <CALGmxELAhzRU7PUkxbdsxyF+akbAN_qvxc6Gx+jOB29WF83W8Q@mail.gmail.com> On Fri, Jan 27, 2012 at 1:29 PM, Robert Kern <robert.kern at gmail.com> wrote:
> Well, if you really need to do this in more than one place, define a
> utility function and call it a day.
>
> def should_not_plot(x):
>     if x is None:
>         return True
>     elif isinstance(x, np.ndarray):
>         return x.size == 0
>     else:
>         return bool(x)

I tend to do things like:

def convert_to_plotable(x):
    if x is None:
        return None
    else:
        x = np.asarray(x)
        if x.size == 0:
            return None
        return x

It does mean you need to check for None later anyway, but I like to convert to an array early in the process -- then you know you have either an array or None at that point. NOTE: you could also raise and handle an exception instead. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From opossumnano at gmail.com Tue Jan 31 07:42:21 2012 From: opossumnano at gmail.com (Tiziano Zito) Date: Tue, 31 Jan 2012 13:42:21 +0100 Subject: [Numpy-discussion] [ANN] Summer School "Advanced Scientific Programming in Python" in Kiel, Germany Message-ID: <20120131124221.GD12374@multivac.zonafranca>

Advanced Scientific Programming in Python
=========================================

a Summer School by the G-Node and the Institute of Experimental and Applied Physics, Christian-Albrechts-Universität zu Kiel

Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only a few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques, incorporating theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We use the Python programming language for the entire course.
Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist.

This school is targeted at Master or PhD students and Post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic knowledge of Python is assumed. Participants without any prior experience with Python should work through the proposed introductory materials before the course.

Date and Location
=================
September 2-7, 2012. Kiel, Germany.

Preliminary Program
===================

Day 0 (Sun Sept 2) - Best Programming Practices
- Best Practices, Development Methodologies and the Zen of Python
- Version control with git
- Object-oriented programming & design patterns

Day 1 (Mon Sept 3) - Software Carpentry
- Test-driven development, unit testing & quality assurance
- Debugging, profiling and benchmarking techniques
- Best practices in data visualization
- Programming in teams

Day 2 (Tue Sept 4) - Scientific Tools for Python
- Advanced NumPy
- The Quest for Speed (intro): Interfacing to C with Cython
- Advanced Python I: idioms, useful built-in data structures, generators

Day 3 (Wed Sept 5) - The Quest for Speed
- Writing parallel applications in Python
- Programming project

Day 4 (Thu Sept 6) - Efficient Memory Management
- When parallelization does not help: the starving CPUs problem
- Advanced Python II: decorators and context managers
- Programming project

Day 5 (Fri Sept 7) - Practical Software Development
- Programming project
- The Pelita Tournament

Every evening we will have the tutors' consultation hour: Tutors will answer your questions and give suggestions for your own projects.

Applications
============
You can apply on-line at http://python.g-node.org

Applications must be submitted before 23:59 UTC, May 1, 2012. Notifications of acceptance will be sent by June 1, 2012. No fee is charged but participants should take care of travel, living, and accommodation expenses. Candidates will be selected on the basis of their profile. Places are limited: acceptance rate last time was around 20%.

Prerequisites: You are supposed to know the basics of Python to participate in the lectures. You are encouraged to go through the introductory material available on the website.

Faculty
=======
- Francesc Alted, Continuum Analytics Inc., USA
- Pietro Berkes, Enthought Inc., UK
- Valentin Haenel, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
- Zbigniew Jędrzejewski-Szmek, Faculty of Physics, University of Warsaw, Poland
- Eilif Muller, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
- Emanuele Olivetti, NeuroInformatics Laboratory, Fondazione Bruno Kessler and University of Trento, Italy
- Rike-Benjamin Schuppner, Technologit GbR, Germany
- Bartosz Teleńczuk, Unité de Neurosciences Information et Complexité, Centre National de la Recherche Scientifique, France
- Stéfan van der Walt, Helen Wills Neuroscience Institute, University of California Berkeley, USA
- Bastian Venthur, Berlin Institute of Technology and Bernstein Focus Neurotechnology, Germany
- Niko Wilbert, TNG Technology Consulting GmbH, Germany
- Tiziano Zito, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany

Organized by Christian T. Steigies and Christian Drews of the Institute of Experimental and Applied Physics, Christian-Albrechts-Universität zu Kiel, and by Zbigniew Jędrzejewski-Szmek and Tiziano Zito for the German Neuroinformatics Node of the INCF.

Website: http://python.g-node.org
Contact: python-info at g-node.org

From ndbecker2 at gmail.com Tue Jan 31 08:26:34 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 31 Jan 2012 08:26:34 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) Message-ID: <jg8q6a$j3e$2@dough.gmane.org> I was just bitten by this unexpected behavior:

In [24]: all ([i> 0 for i in xrange (10)])
Out[24]: False

In [25]: all (i> 0 for i in xrange (10))
Out[25]: True

Turns out:
In [31]: all is numpy.all
Out[31]: True

So numpy.all doesn't seem to do what I would expect when given a generator. Bug?

From nadavh at visionsense.com Tue Jan 31 04:42:55 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 31 Jan 2012 01:42:55 -0800 Subject: [Numpy-discussion] histogram help In-Reply-To: <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> References: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com>, <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> Message-ID: <26FC23E7C398A64083C980D16001012D261F0D938C@VA3DIAXVS361.RED001.local> Do you want a histogram of z for each (x,y)? Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Ruby Stevenson [ruby185 at gmail.com] Sent: 30 January 2012 21:27 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] histogram help Sorry, I realize I didn't describe the problem completely clear or correct. the (x,y) in this case is just many co-ordinates, and each coordinate has a list of values (Z value) associated with it. The bins are allocated for the Z. I hope this clarify things a little. Thanks again. Ruby On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson <ruby185 at gmail.com> wrote: > hi, all > > I am trying to figure out how to do histogram with numpy > > I have a three-dimension array A[x,y,z], another array (bins) has > been allocated along Z dimension, z' > > how can I get the histogram of H[ x, y, z' ]? > > thanks for your help. > > Ruby _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From madsipsen at gmail.com Tue Jan 31 03:29:23 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 31 Jan 2012 09:29:23 +0100 Subject: [Numpy-discussion] Unexpected reorganization of internal data Message-ID: <4F27A663.4000403@gmail.com> Hi, I am confused. Here's the reason: The following structure is a representation of N points in 3D space:

U = numpy.array([[x1,y1,z1], [x2,y2,z2],...,[xn,yn,zn]])

So the array U has shape (N,3). This order makes sense to me since U[i] will give you the i'th point in the set. Now, I want to pass this array to a C++ function that does some stuff with the points.
Here's how I do that:

void Foo::doStuff(int n, PyObject * numpy_data)
{
    // Get pointer to data
    double * const positions = (double *) PyArray_DATA(numpy_data);

    // Print positions
    for (int i=0; i<n; ++i)
    {
        float x = static_cast<float>(positions[3*i+0]);
        float y = static_cast<float>(positions[3*i+1]);
        float z = static_cast<float>(positions[3*i+2]);

        printf("Pos[%d] = %f %f %f\n", x, y, z);
    }
}

When I call this routine, using a swig wrapped Python interface to the C++ class, everything prints out nicely. Now, I want to apply a rotation to all the positions. So I set up some rotation matrix R like this:

R = numpy.array([[r11,r12,r13], [r21,r22,r23], [r31,r32,r33]])

To apply the matrix to the data in one crunch, I do

V = numpy.dot(R, U.transpose()).transpose()

Now when I call my C++ function from the Python side, all the data in V is printed, but it has been transposed. So apparently the internal data structure handled by numpy has been reorganized, even though I called transpose() twice, which I would expect to cancel each other out. However, if I do:

V = numpy.array(U.transpose()).transpose()

and call the C++ routine, everything is perfectly fine, i.e. the data structure is as expected. What went wrong?

Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/c50bf518/attachment.html> From robert.kern at gmail.com Tue Jan 31 09:07:31 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 14:07:31 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <jg8q6a$j3e$2@dough.gmane.org> References: <jg8q6a$j3e$2@dough.gmane.org> Message-ID: <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> On Tue, Jan 31, 2012 at 13:26, Neal Becker <ndbecker2 at gmail.com> wrote:
> I was just bitten by this unexpected behavior:
>
> In [24]: all ([i> 0 for i in xrange (10)])
> Out[24]: False
>
> In [25]: all (i> 0 for i in xrange (10))
> Out[25]: True
>
> Turns out:
> In [31]: all is numpy.all
> Out[31]: True
>
> So numpy.all doesn't seem to do what I would expect when given a generator.
> Bug?

Expected behavior. numpy.all(), like nearly all numpy functions, converts the input to an array using numpy.asarray(). numpy.asarray() knows nothing special about generators and other iterables that are not sequences, so it thinks it's a single scalar object. This scalar object happens to have a __nonzero__() method that returns True like most Python objects that don't override this. In order to use generic iterators that are not sequences, you need to explicitly use numpy.fromiter() to convert them to ndarrays. asarray() and array() can't do it in general because they need to autodiscover the shape and dtype all at the same time. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco From d.s.seljebotn at astro.uio.no Tue Jan 31 09:14:24 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 31 Jan 2012 15:14:24 +0100 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> Message-ID: <4F27F740.9060202@astro.uio.no> On 01/31/2012 03:07 PM, Robert Kern wrote: > On Tue, Jan 31, 2012 at 13:26, Neal Becker<ndbecker2 at gmail.com> wrote: >> I was just bitten by this unexpected behavior: >> >> In [24]: all ([i> 0 for i in xrange (10)]) >> Out[24]: False >> >> In [25]: all (i> 0 for i in xrange (10)) >> Out[25]: True >> >> Turns out: >> In [31]: all is numpy.all >> Out[31]: True >> >> So numpy.all doesn't seem to do what I would expect when given a generator. >> Bug? > > Expected behavior. numpy.all(), like nearly all numpy functions, > converts the input to an array using numpy.asarray(). numpy.asarray() > knows nothing special about generators and other iterables that are > not sequences, so it thinks it's a single scalar object. This scalar > object happens to have a __nonzero__() method that returns True like > most Python objects that don't override this. > > In order to use generic iterators that are not sequences, you need to > explicitly use numpy.fromiter() to convert them to ndarrays. asarray() > and array() can't do it in general because they need to autodiscover > the shape and dtype all at the same time. Perhaps np.asarray could specifically check for a generator argument and raise an exception? I imagine that would save people some time when running into this... If you really want In [7]: x = np.asarray(None) In [8]: x[()] = (i for i in range(10)) In [9]: x Out[9]: array(<generator object <genexpr> at 0x4553fa0>, dtype=object) ...then one can type it out? Dag From malcolm.reynolds at gmail.com Tue Jan 31 09:14:14 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Tue, 31 Jan 2012 14:14:14 +0000 Subject: [Numpy-discussion] Unexpected reorganization of internal data In-Reply-To: <4F27A663.4000403@gmail.com> References: <4F27A663.4000403@gmail.com> Message-ID: <CAO1Gn5-4D9APhwMXu4WQ8W843OYudKvTTuT1jPbMPtwPfkw8kQ@mail.gmail.com> Not exactly an answer to your question, but I can highly recommend using Boost.python, PyUblas and Ublas for your C++ vectors and matrices. It gives you a really good interface on the C++ side to numpy arrays and matrices, which can be passed in both directions over the language threshold with no copying. If I had to guess I'd say sometimes when transposing numpy simply sets a flag internally to avoid copying the data, but in some cases (such as perhaps when multiplication needs to take place) the data has to be placed in a new object. Accessing the data via raw pointers in C++ may not be checking for the 'transpose' flag and therefore you see an unexpected result. Disclaimer: this is just a guess, someone more familiar with Numpy internals will no doubt be able to correct me. 
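If that is what's going on, a cheap way to guard against it from the Python side (again just a sketch in the same guessing spirit -- the U, R and foo.doStuff names below are only stand-ins for whatever you actually have) is to ask numpy for a C-contiguous version of the array before the buffer crosses into C++:

import numpy as np

U = np.random.rand(5, 3)                  # N points, shape (N, 3)
R = np.eye(3)                             # some rotation matrix
V = np.dot(R, U.transpose()).transpose()
print(V.flags['C_CONTIGUOUS'])            # False: V is a transposed view of the dot() result
V = np.ascontiguousarray(V)               # makes a C-ordered copy only when one is needed
print(V.flags['C_CONTIGUOUS'])            # True: buffer is now laid out x1,y1,z1,x2,y2,z2,...
# foo.doStuff(len(V), V)                  # hypothetical call into the swig-wrapped extension

ascontiguousarray() should be a no-op when the data is already in C order, so it ought to cost nothing in the common case.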
Malcolm From ndbecker2 at gmail.com Tue Jan 31 09:33:55 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 31 Jan 2012 09:33:55 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) References: <jg8q6a$j3e$2@dough.gmane.org> <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> <4F27F740.9060202@astro.uio.no> Message-ID: <jg8u4j$j21$1@dough.gmane.org> Dag Sverre Seljebotn wrote: > On 01/31/2012 03:07 PM, Robert Kern wrote: >> On Tue, Jan 31, 2012 at 13:26, Neal Becker<ndbecker2 at gmail.com> wrote: >>> I was just bitten by this unexpected behavior: >>> >>> In [24]: all ([i> 0 for i in xrange (10)]) >>> Out[24]: False >>> >>> In [25]: all (i> 0 for i in xrange (10)) >>> Out[25]: True >>> >>> Turns out: >>> In [31]: all is numpy.all >>> Out[31]: True >>> >>> So numpy.all doesn't seem to do what I would expect when given a generator. >>> Bug? >> >> Expected behavior. numpy.all(), like nearly all numpy functions, >> converts the input to an array using numpy.asarray(). numpy.asarray() >> knows nothing special about generators and other iterables that are >> not sequences, so it thinks it's a single scalar object. This scalar >> object happens to have a __nonzero__() method that returns True like >> most Python objects that don't override this. >> >> In order to use generic iterators that are not sequences, you need to >> explicitly use numpy.fromiter() to convert them to ndarrays. asarray() >> and array() can't do it in general because they need to autodiscover >> the shape and dtype all at the same time. > > Perhaps np.asarray could specifically check for a generator argument and > raise an exception? I imagine that would save people some time when > running into this... > > If you really want > > In [7]: x = np.asarray(None) > > In [8]: x[()] = (i for i in range(10)) > > In [9]: x > Out[9]: array(<generator object <genexpr> at 0x4553fa0>, dtype=object) > > ...then one can type it out? > > Dag The reason it surprised me, is that python 'all' doesn't behave as numpy 'all' in this respect - and using ipython, I didn't even notice that 'all' was numpy.all rather than standard python all. All in all, rather unfortunate :) From matthew.brett at gmail.com Tue Jan 31 09:40:59 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 31 Jan 2012 14:40:59 +0000 Subject: [Numpy-discussion] Unexpected reorganization of internal data In-Reply-To: <4F27A663.4000403@gmail.com> References: <4F27A663.4000403@gmail.com> Message-ID: <CAH6Pt5oH1WEvPkfHSNJn7QBM+SwmhCndOoi14UnRDFefOn+Lvw@mail.gmail.com> Hi, On Tue, Jan 31, 2012 at 8:29 AM, Mads Ipsen <madsipsen at gmail.com> wrote: > Hi, > > I am confused. Here's the reason: > > The following structure is a representation of N points in 3D space: > > U = numpy.array([[x1,y1,z1], [x1,y1,z1],...,[xn,yn,zn]]) > > So the array U has shape (N,3). This order makes sense to me since U[i] will > give you the i'th point in the set. Now, I want to pass this array to a C++ > function that does some stuff with the points. Here's how I do that > > void Foo::doStuff(int n, PyObject * numpy_data) > { > ??? // Get pointer to data > ??? double * const positions = (double *) PyArray_DATA(numpy_data); > > ??? // Print positions > ??? for (int i=0; i<n; ++i) > ??? { > ??? float x = static_cast<float>(positions[3*i+0]) > ??? float y = static_cast<float>(positions[3*i+1]) > ??? float z = static_cast<float>(positions[3*i+2]) > > ??? printf("Pos[%d] = %f %f %f\n", x, y, z); > ??? 
} > } > > When I call this routine, using a swig wrapped Python interface to the C++ > class, everything prints out nice. > > Now, I want to apply a rotation to all the positions. So I set up some > rotation matrix R like this: > > R = numpy.array([[r11,r12,r13], > ???????????????? [r21,r22,r23], > ???????????????? [r31,r32,r33]]) > > To apply the matrix to the data in one crunch, I do > > V = numpy.dot(R, U.transpose()).transpose() > > Now when I call my C++ function from the Python side, all the data in V is > printed, but it has been transposed. So apparently the internal data > structure handled by numpy has been reorganized, even though I called > transpose() twice, which I would expect to cancel out each other. > > However, if I do: > > V = numpy.array(U.transpose()).transpose() > > and call the C++ routine, everything is perfectly fine, ie. the data > structure is as expected. > > What went wrong? The numpy array reserves the right to organize its data internally. For example, a numpy array can be in Fortran order in memory, or C order in memory, and many more complicated schemes. You might want to have a look at: http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray If you depend on a particular order for your array memory, you might want to look at: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ascontiguousarray.html Best, Matthew From alan.isaac at gmail.com Tue Jan 31 10:03:55 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 31 Jan 2012 10:03:55 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <jg8q6a$j3e$2@dough.gmane.org> References: <jg8q6a$j3e$2@dough.gmane.org> Message-ID: <4F2802DB.3060705@gmail.com> On 1/31/2012 8:26 AM, Neal Becker wrote: > I was just bitten by this unexpected behavior: > > In [24]: all ([i> 0 for i in xrange (10)]) > Out[24]: False > > In [25]: all (i> 0 for i in xrange (10)) > Out[25]: True > > Turns out: > In [31]: all is numpy.all > Out[31]: True >>> np.array([i> 0 for i in xrange (10)]) array([False, True, True, True, True, True, True, True, True, True], dtype=bool) >>> np.array(i> 0 for i in xrange (10)) array(<generator object <genexpr> at 0x0267A210>, dtype=object) >>> import this Cheers, Alan From ben.root at ou.edu Tue Jan 31 10:13:54 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 31 Jan 2012 09:13:54 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <4F2802DB.3060705@gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> Message-ID: <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> On Tuesday, January 31, 2012, Alan G Isaac <alan.isaac at gmail.com> wrote: > On 1/31/2012 8:26 AM, Neal Becker wrote: >> I was just bitten by this unexpected behavior: >> >> In [24]: all ([i> 0 for i in xrange (10)]) >> Out[24]: False >> >> In [25]: all (i> 0 for i in xrange (10)) >> Out[25]: True >> >> Turns out: >> In [31]: all is numpy.all >> Out[31]: True > > >>>> np.array([i> 0 for i in xrange (10)]) > array([False, True, True, True, True, True, True, True, True, True], dtype=bool) >>>> np.array(i> 0 for i in xrange (10)) > array(<generator object <genexpr> at 0x0267A210>, dtype=object) >>>> import this > > > Cheers, > Alan > Is np.all() using np.array() or np.asanyarray()? If the latter, I would expect it to return a numpy array from a generator. If the former, why isn't it using asanyarray()? 
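For what it's worth, the only way I know of to get the "list comprehension" answer out of a generator right now is the fromiter() route Robert mentioned, where you spell out the dtype yourself (quick sketch, typed from memory):

>>> import numpy as np
>>> np.fromiter((i > 0 for i in xrange(10)), dtype=bool)
array([False, True, True, True, True, True, True, True, True, True], dtype=bool)
>>> np.all(np.fromiter((i > 0 for i in xrange(10)), dtype=bool))
False

That works, but it isn't something a casual all() user is likely to reach for.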
Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/b5df357a/attachment.html> From robert.kern at gmail.com Tue Jan 31 10:18:26 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 15:18:26 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> Message-ID: <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > Is np.all() using np.array() or np.asanyarray()? ?If the latter, I would > expect it to return a numpy array from a generator. Why would you expect that? [~/scratch] |37> np.asanyarray(i>5 for i in range(10)) array(<generator object <genexpr> at 0xdc24a08>, dtype=object) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From d.s.seljebotn at astro.uio.no Tue Jan 31 10:19:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 31 Jan 2012 16:19:29 +0100 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> Message-ID: <4F280681.3020700@astro.uio.no> On 01/31/2012 04:13 PM, Benjamin Root wrote: > > > On Tuesday, January 31, 2012, Alan G Isaac <alan.isaac at gmail.com > <mailto:alan.isaac at gmail.com>> wrote: > > On 1/31/2012 8:26 AM, Neal Becker wrote: > >> I was just bitten by this unexpected behavior: > >> > >> In [24]: all ([i> 0 for i in xrange (10)]) > >> Out[24]: False > >> > >> In [25]: all (i> 0 for i in xrange (10)) > >> Out[25]: True > >> > >> Turns out: > >> In [31]: all is numpy.all > >> Out[31]: True > > > > > >>>> np.array([i> 0 for i in xrange (10)]) > > array([False, True, True, True, True, True, True, True, True, > True], dtype=bool) > >>>> np.array(i> 0 for i in xrange (10)) > > array(<generator object <genexpr> at 0x0267A210>, dtype=object) > >>>> import this > > > > > > Cheers, > > Alan > > > > Is np.all() using np.array() or np.asanyarray()? If the latter, I would > expect it to return a numpy array from a generator. If the former, why > isn't it using asanyarray()? 
Your expectation is probably wrong: In [12]: np.asanyarray(i for i in range(10)) Out[12]: array(<generator object <genexpr> at 0x455d9b0>, dtype=object) Dag Sverre From ben.root at ou.edu Tue Jan 31 10:35:38 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 31 Jan 2012 09:35:38 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> Message-ID: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> wrote: > On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > > > Is np.all() using np.array() or np.asanyarray()? If the latter, I would > > expect it to return a numpy array from a generator. > > Why would you expect that? > > [~/scratch] > |37> np.asanyarray(i>5 for i in range(10)) > array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > > -- > Robert Kern > What possible use-case could there be for a numpy array of generators? Furthermore, from the documentation: numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, ownmaskna=False) Convert the input to an ndarray, but pass ndarray subclasses through. Parameters ---------- a : array_like *Input data, in any form that can be converted to an array*. This includes scalars, lists, lists of tuples, tuples, tuples of tuples, tuples of lists, and ndarrays. Emphasis mine. A generator is an input that could be converted into an array. (Setting aside the issue of non-terminating generators such as those from cycle()). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/ec1d9d85/attachment.html> From d.s.seljebotn at astro.uio.no Tue Jan 31 10:46:01 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 31 Jan 2012 16:46:01 +0100 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <4F280CB9.9070509@astro.uio.no> On 01/31/2012 04:35 PM, Benjamin Root wrote: > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com > <mailto:robert.kern at gmail.com>> wrote: > > On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu > <mailto:ben.root at ou.edu>> wrote: > > > Is np.all() using np.array() or np.asanyarray()? If the latter, > I would > > expect it to return a numpy array from a generator. > > Why would you expect that? > > [~/scratch] > |37> np.asanyarray(i>5 for i in range(10)) > array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > > -- > Robert Kern > > > What possible use-case could there be for a numpy array of generators? > Furthermore, from the documentation: > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > ownmaskna=False) > Convert the input to an ndarray, but pass ndarray subclasses through. 
> > Parameters > ---------- > a : array_like > *Input data, in any form that can be converted to an array*. This > includes scalars, lists, lists of tuples, tuples, tuples of > tuples, > tuples of lists, and ndarrays. > > Emphasis mine. A generator is an input that could be converted into an > array. (Setting aside the issue of non-terminating generators such as > those from cycle()). Splitting semantic hairs doesn't help here -- it *does* return an array, it just happens to be a completely useless 0-dimensional one. The question is, is the current confusing and less than useful? (I vot for "yes"). list and tuple are special-cased, why not generators (at least to raise an exception) Going OT, look at this gem: ???? In [3]: a Out[3]: array([1, 2, 3], dtype=object) In [4]: a.shape Out[4]: () ??? In [9]: b Out[9]: array([1, 2, 3], dtype=object) In [10]: b.shape Out[10]: (3,) Figuring out the "???" is left as an exercise to the reader :-) Dag Sverre From alan.isaac at gmail.com Tue Jan 31 10:48:05 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 31 Jan 2012 10:48:05 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <4F280D35.6020409@gmail.com> On 1/31/2012 10:35 AM, Benjamin Root wrote: > A generator is an input that could be converted into an array. def mygen(): i = 0 while True: yield i i += 1 Alan Isaac From robert.kern at gmail.com Tue Jan 31 10:50:15 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 15:50:15 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> wrote: >> >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: >> >> > Is np.all() using np.array() or np.asanyarray()? ?If the latter, I would >> > expect it to return a numpy array from a generator. >> >> Why would you expect that? >> >> [~/scratch] >> |37> np.asanyarray(i>5 for i in range(10)) >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) >> >> -- >> Robert Kern > > > What possible use-case could there be for a numpy array of generators? Not many. This isn't an intentional feature, just a logical consequence of all of the other intentional features being applied consistently. > Furthermore, from the documentation: > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > ownmaskna=False) > ???? Convert the input to an ndarray, but pass ndarray subclasses through. > > ???? Parameters > ???? ---------- > ???? a : array_like > ???????? Input data, in any form that can be converted to an array.? This > ???????? 
includes scalars, lists, lists of tuples, tuples, tuples of tuples, > ???????? tuples of lists, and ndarrays. > > Emphasis mine.? A generator is an input that could be converted into an > array.? (Setting aside the issue of non-terminating generators such as those > from cycle()). I'm sorry, but this is not true. In general, it's too hard to do all of the magic autodetermination that asarray() and array() do when faced with an indeterminate-length iterable. We tried. That's why we have fromiter(). By restricting the domain to an iterable yielding scalars and requiring that the user specify the desired dtype, fromiter() can figure out the rest. Like it or not, "array_like" is practically defined by the behavior of np.asarray(), not vice-versa. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From shish at keba.be Tue Jan 31 11:05:34 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 31 Jan 2012 11:05:34 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> Message-ID: <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> Le 31 janvier 2012 10:50, Robert Kern <robert.kern at gmail.com> a ?crit : > On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > > > > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> > wrote: > >> > >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > >> > >> > Is np.all() using np.array() or np.asanyarray()? If the latter, I > would > >> > expect it to return a numpy array from a generator. > >> > >> Why would you expect that? > >> > >> [~/scratch] > >> |37> np.asanyarray(i>5 for i in range(10)) > >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > >> > >> -- > >> Robert Kern > > > > > > What possible use-case could there be for a numpy array of generators? > > Not many. This isn't an intentional feature, just a logical > consequence of all of the other intentional features being applied > consistently. > > > Furthermore, from the documentation: > > > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > > ownmaskna=False) > > Convert the input to an ndarray, but pass ndarray subclasses > through. > > > > Parameters > > ---------- > > a : array_like > > Input data, in any form that can be converted to an array. This > > includes scalars, lists, lists of tuples, tuples, tuples of > tuples, > > tuples of lists, and ndarrays. > > > > Emphasis mine. A generator is an input that could be converted into an > > array. (Setting aside the issue of non-terminating generators such as > those > > from cycle()). > > I'm sorry, but this is not true. In general, it's too hard to do all > of the magic autodetermination that asarray() and array() do when > faced with an indeterminate-length iterable. We tried. That's why we > have fromiter(). 
By restricting the domain to an iterable yielding > scalars and requiring that the user specify the desired dtype, > fromiter() can figure out the rest. > > Like it or not, "array_like" is practically defined by the behavior of > np.asarray(), not vice-versa. In that case I agree with whoever said ealier it would be best to detect this case and throw an exception, as it'll probably save some headaches. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/30bf81ab/attachment.html> From robert.kern at gmail.com Tue Jan 31 11:11:19 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 16:11:19 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <CAF6FJis3WrvvdWX8Z9VypGzSLFd+o8rvVbMkGLKwzEt-qK2=Pg@mail.gmail.com> On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > Furthermore, from the documentation: > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > ownmaskna=False) > ???? Convert the input to an ndarray, but pass ndarray subclasses through. > > ???? Parameters > ???? ---------- > ???? a : array_like > ???????? Input data, in any form that can be converted to an array.? This > ???????? includes scalars, lists, lists of tuples, tuples, tuples of tuples, > ???????? tuples of lists, and ndarrays. I should also add that this verbiage is also in np.asarray(). The only additional feature of np.asanyarray() is that is does not convert ndarray subclasses like matrix to ndarray objects. np.asanyarray() does not accept more types of objects than np.asarray(). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From ben.root at ou.edu Tue Jan 31 11:46:56 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 31 Jan 2012 10:46:56 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> Message-ID: <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> On Tue, Jan 31, 2012 at 10:05 AM, Olivier Delalleau <shish at keba.be> wrote: > Le 31 janvier 2012 10:50, Robert Kern <robert.kern at gmail.com> a ?crit : > >> On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: >> > >> > >> > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> >> wrote: >> >> >> >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: >> >> >> >> > Is np.all() using np.array() or np.asanyarray()? 
If the latter, I >> would >> >> > expect it to return a numpy array from a generator. >> >> >> >> Why would you expect that? >> >> >> >> [~/scratch] >> >> |37> np.asanyarray(i>5 for i in range(10)) >> >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) >> >> >> >> -- >> >> Robert Kern >> > >> > >> > What possible use-case could there be for a numpy array of generators? >> >> Not many. This isn't an intentional feature, just a logical >> consequence of all of the other intentional features being applied >> consistently. >> >> > Furthermore, from the documentation: >> > >> > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, >> > ownmaskna=False) >> > Convert the input to an ndarray, but pass ndarray subclasses >> through. >> > >> > Parameters >> > ---------- >> > a : array_like >> > Input data, in any form that can be converted to an array. >> This >> > includes scalars, lists, lists of tuples, tuples, tuples of >> tuples, >> > tuples of lists, and ndarrays. >> > >> > Emphasis mine. A generator is an input that could be converted into an >> > array. (Setting aside the issue of non-terminating generators such as >> those >> > from cycle()). >> >> I'm sorry, but this is not true. In general, it's too hard to do all >> of the magic autodetermination that asarray() and array() do when >> faced with an indeterminate-length iterable. We tried. That's why we >> have fromiter(). By restricting the domain to an iterable yielding >> scalars and requiring that the user specify the desired dtype, >> fromiter() can figure out the rest. >> >> Like it or not, "array_like" is practically defined by the behavior of >> np.asarray(), not vice-versa. > > > In that case I agree with whoever said ealier it would be best to detect > this case and throw an exception, as it'll probably save some headaches. > > -=- Olivier > > I'll agree with this statement. This bug has popped up a few times in the mpl bug tracker due to the pylab mode. While I would prefer if it were possible to evaluate the generator into an array, silently returning True incorrectly for all() and any() is probably far worse. That said, is it still impossible to make np.all() and np.any() special to have similar behavior to the built-in all() and any()? Maybe it could catch the above exception and then return the result from python's built-ins? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/9aa56e31/attachment.html> From chris.barker at noaa.gov Tue Jan 31 12:07:20 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 31 Jan 2012 09:07:20 -0800 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <jg8u4j$j21$1@dough.gmane.org> References: <jg8q6a$j3e$2@dough.gmane.org> <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> <4F27F740.9060202@astro.uio.no> <jg8u4j$j21$1@dough.gmane.org> Message-ID: <CALGmxEJiZXmLSJ7jYwfca6susvGF2kOt4ESxF0Nn4Ek_2RVWZw@mail.gmail.com> On Tue, Jan 31, 2012 at 6:33 AM, Neal Becker <ndbecker2 at gmail.com> wrote: > The reason it surprised me, is that python 'all' doesn't behave as numpy 'all' > in this respect - and using ipython, I didn't even notice that 'all' was > numpy.all rather than standard python all. "namespaces are one honking great idea" -- sorry, I couldn't help myself.... -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? 
voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Tue Jan 31 12:23:58 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 31 Jan 2012 09:23:58 -0800 Subject: [Numpy-discussion] Unexpected reorganization of internal data In-Reply-To: <CAO1Gn5-4D9APhwMXu4WQ8W843OYudKvTTuT1jPbMPtwPfkw8kQ@mail.gmail.com> References: <4F27A663.4000403@gmail.com> <CAO1Gn5-4D9APhwMXu4WQ8W843OYudKvTTuT1jPbMPtwPfkw8kQ@mail.gmail.com> Message-ID: <CALGmxE+47X1rmmrm9wqmEiVGHOb3DwbuZ1oRmWY=g1MKXmbQrg@mail.gmail.com> On Tue, Jan 31, 2012 at 6:14 AM, Malcolm Reynolds <malcolm.reynolds at gmail.com> wrote: > Not exactly an answer to your question, but I can highly recommend > using Boost.python, PyUblas and Ublas for your C++ vectors and > matrices. It gives you a really good interface on the C++ side to > numpy arrays and matrices, which can be passed in both directions over > the language threshold with no copying. or use Cython... > If I had to guess I'd say sometimes when transposing numpy simply sets > a flag internally to avoid copying the data, but in some cases (such > as perhaps when multiplication needs to take place) the data has to be > placed in a new object. good guess: > V = numpy.dot(R, U.transpose()).transpose() >>> a array([[1, 2], [3, 4], [5, 6]]) >>> a.flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> b = a.transpose() >>> b.flags C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False so the transpose() simple re-arranges the strides to Fortran order, rather than changing anything in memory. np.dot() produces a new array, so it is C-contiguous, then you transpose it, so you get a fortran-ordered array. > Now when I call my C++ function from the Python side, all the data in V is printed, but it has been transposed. as mentioned, if you are working with arrays in C++ (or fortran, orC, or...) and need to count on the ordering of the data, you need to check it in your extension code. There are utilities for this. > However, if I do: > V = numpy.array(U.transpose()).transpose() right: In [7]: a.flags Out[7]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [8]: a.transpose().flags Out[8]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [9]: np.array( a.transpose() ).flags Out[9]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False so the np.array call doesn't re-arrange the order if it doesn't need to. If you want to force it, you can specify the order: In [10]: np.array( a.transpose(), order='C' ).flags Out[10]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False (note: this does surprise me a bit, as it is making a copy, but there you go -- if order matters, specify it) In general, numpy does a lot of things for the sake of efficiency -- avoiding copies when it can, for instance -- this give efficiency and flexibility, but you do need to be careful, particularly when interfacing with the binary data directly. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From travis at continuum.io Tue Jan 31 17:17:28 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 31 Jan 2012 16:17:28 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> Message-ID: <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> I also agree that an exception should be raised at the very least. It might also be possible to make the NumPy any, all, and sum functions behave like the builtins when given a generator. It seems worth exploring at least. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jan 31, 2012, at 10:46 AM, Benjamin Root <ben.root at ou.edu> wrote: > > On Tue, Jan 31, 2012 at 10:05 AM, Olivier Delalleau <shish at keba.be> wrote: > Le 31 janvier 2012 10:50, Robert Kern <robert.kern at gmail.com> a ?crit : > On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > > > > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> wrote: > >> > >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > >> > >> > Is np.all() using np.array() or np.asanyarray()? If the latter, I would > >> > expect it to return a numpy array from a generator. > >> > >> Why would you expect that? > >> > >> [~/scratch] > >> |37> np.asanyarray(i>5 for i in range(10)) > >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > >> > >> -- > >> Robert Kern > > > > > > What possible use-case could there be for a numpy array of generators? > > Not many. This isn't an intentional feature, just a logical > consequence of all of the other intentional features being applied > consistently. > > > Furthermore, from the documentation: > > > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > > ownmaskna=False) > > Convert the input to an ndarray, but pass ndarray subclasses through. > > > > Parameters > > ---------- > > a : array_like > > Input data, in any form that can be converted to an array. This > > includes scalars, lists, lists of tuples, tuples, tuples of tuples, > > tuples of lists, and ndarrays. > > > > Emphasis mine. A generator is an input that could be converted into an > > array. (Setting aside the issue of non-terminating generators such as those > > from cycle()). > > I'm sorry, but this is not true. In general, it's too hard to do all > of the magic autodetermination that asarray() and array() do when > faced with an indeterminate-length iterable. We tried. That's why we > have fromiter(). By restricting the domain to an iterable yielding > scalars and requiring that the user specify the desired dtype, > fromiter() can figure out the rest. > > Like it or not, "array_like" is practically defined by the behavior of > np.asarray(), not vice-versa. > > In that case I agree with whoever said ealier it would be best to detect this case and throw an exception, as it'll probably save some headaches. > > -=- Olivier > > > I'll agree with this statement. 
This bug has popped up a few times in the mpl bug tracker due to the pylab mode. While I would prefer if it were possible to evaluate the generator into an array, silently returning True incorrectly for all() and any() is probably far worse. > > That said, is it still impossible to make np.all() and np.any() special to have similar behavior to the built-in all() and any()? Maybe it could catch the above exception and then return the result from python's built-ins? > > Cheers! > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/92da353e/attachment.html> From robert.kern at gmail.com Tue Jan 31 17:22:17 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 22:22:17 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> Message-ID: <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> wrote: > I also agree that an exception should be raised at the very least. > > It might also be possible to make the NumPy any, all, and sum functions > behave like the builtins when given a generator. ?It seems worth exploring > at least. I would rather we deprecate the all() and any() functions in favor of the alltrue() and sometrue() aliases that date back to Numeric. Renaming them to match the builtin names was a mistake. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From warren.weckesser at enthought.com Tue Jan 31 17:25:33 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 31 Jan 2012 16:25:33 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> Message-ID: <CAM-+wY8a7fO0dqVEo4cJVUMQpk3hDRqH7tCeumMk-C1v5ocCVA@mail.gmail.com> On Tue, Jan 31, 2012 at 4:22 PM, Robert Kern <robert.kern at gmail.com> wrote: > On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> > wrote: > > I also agree that an exception should be raised at the very least. > > > > It might also be possible to make the NumPy any, all, and sum functions > > behave like the builtins when given a generator. It seems worth > exploring > > at least. > > I would rather we deprecate the all() and any() functions in favor of > the alltrue() and sometrue() aliases that date back to Numeric. > +1 (Maybe 'anytrue' for consistency? (And a royal blue bike shed?)) Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/f4af3d62/attachment.html> From travis at continuum.io Tue Jan 31 17:35:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 31 Jan 2012 16:35:18 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> Message-ID: <07157234-E610-4A5F-8B18-AB6940129E96@continuum.io> Actually i believe the NumPy 'any' and 'all' names pre-date the Python usage which first appeared in Python 2.5 I agree with Chris that namespaces are a great idea. I don't agree with deprecating 'any' and 'all' It also seems useful to revisit under what conditions 'array' could correctly interpret a generator expression, but in the context of streaming or deferred arrays. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jan 31, 2012, at 4:22 PM, Robert Kern <robert.kern at gmail.com> wrote: > On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> wrote: >> I also agree that an exception should be raised at the very least. 
>> >> It might also be possible to make the NumPy any, all, and sum functions >> behave like the builtins when given a generator. It seems worth exploring >> at least. > > I would rather we deprecate the all() and any() functions in favor of > the alltrue() and sometrue() aliases that date back to Numeric. > Renaming them to match the builtin names was a mistake. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Tue Jan 31 20:45:52 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 31 Jan 2012 20:45:52 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <07157234-E610-4A5F-8B18-AB6940129E96@continuum.io> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> <07157234-E610-4A5F-8B18-AB6940129E96@continuum.io> Message-ID: <CAMMTP+CPOmM7WoaJuELCOMaMctJdR7ezA-=x6KQzhwQn9-JnxA@mail.gmail.com> On Tue, Jan 31, 2012 at 5:35 PM, Travis Oliphant <travis at continuum.io> wrote: > Actually i believe the NumPy 'any' and 'all' names pre-date the Python usage which first appeared in Python 2.5 > > I agree with Chris that namespaces are a great idea. ?I don't agree with deprecating 'any' and 'all' I completely agree here. I also like to keep np.all, np.any, np.max, ... >>> np.max((i> 0 for i in xrange (10))) <generator object <genexpr> at 0x046493F0> >>> max((i> 0 for i in xrange (10))) True I used an old-style matplotlib example as recipe yesterday, and the first thing I did is getting rid of the missing name spaces, and I had to think twice what amax and amin are. aall, aany ??? ;) Josef > > It also seems useful to revisit under what conditions 'array' could correctly interpret a generator expression, but in the context of streaming or deferred arrays. > > Travis > > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Jan 31, 2012, at 4:22 PM, Robert Kern <robert.kern at gmail.com> wrote: > >> On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> wrote: >>> I also agree that an exception should be raised at the very least. >>> >>> It might also be possible to make the NumPy any, all, and sum functions >>> behave like the builtins when given a generator. ?It seems worth exploring >>> at least. >> >> I would rather we deprecate the all() and any() functions in favor of >> the alltrue() and sometrue() aliases that date back to Numeric. >> Renaming them to match the builtin names was a mistake. >> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." >> ? 
-- Umberto Eco >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
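P.S. Until something changes, the defensive pattern I use in scripts (just a sketch, nothing official) is to listify anything that might be a generator before it gets near the numpy reductions:

>>> import numpy as np
>>> np.all(list(i > 0 for i in xrange(10)))
False
>>> np.max(list(i > 0 for i in xrange(10)))
True

list() costs an extra copy, but at least the answers match the builtins.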