From enzomich at gmail.com Sat Jan 1 06:23:37 2011 From: enzomich at gmail.com (Enzo Michelangeli) Date: Sat, 1 Jan 2011 19:23:37 +0800 Subject: [Numpy-discussion] Optimization suggestion sought References: Message-ID: ----- Original Message ----- From: "Robert Bradshaw" Sent: Wednesday, December 29, 2010 4:47 PM [...] >> Regarding Justin's suggestion, before trying Cython (which, according to >> http://wiki.cython.org/tutorials/numpy , seems to require a bit of work >> to >> handle numpy arrays properly) > > Cython doesn't have to be that complicated. For your example, you just > have to unroll the vectorization (and account for the fact that the > result is mutated in place, which was your original goal). Thanks, but the full de-vectorization forces to give up any use of BLAS (I suppose that for array products numpy relies on its routines). In my tests, the performance in terms of speed is more or less the same as the original pure-numpy code (which may be made less memory-hungry with the chunking suggested by Josef). Instead, it would be nice to have a native function able to perform evaluation of arbitrary numpy expressions without converting the intermediate results in Python format (a sort of "better weave.blitz", able to understand slicing, broadcasting rules etc.). That would give us the best of both worlds: code execution at BLAS speeds, and savings in unnecessary conversions and temporary variable allocations. Such "numpy calculator" could also be a simple interpreter, avoiding the complexities and site dependencies deriving from the use of a C compiler: it should build temporary C data structures for the parameters in input, call the relevant C ATLAS/BLAS/LAPACK functions in the right order (possibly allocating temporary C arrays), and convert only the final result back to a Python object. Enzo From ralf.gommers at googlemail.com Sat Jan 1 06:40:58 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 1 Jan 2011 19:40:58 +0800 Subject: [Numpy-discussion] OS X binaries. In-Reply-To: References: Message-ID: On Sat, Jan 1, 2011 at 5:44 AM, Gideon wrote: > I noticed that 1.5.1 was released, and sourceforge is suggesting I use > the package numpy-1.5.1-py2.6-python.org-macosx10.3.dmg. However, I > have an OS X 10.6 machine. > > Can/should I use this binary? > Yes you can. The naming scheme corresponds to the one used by Python itself on python.org. For 2.6 the ..macosx10.3.dmg works for all supported versions of OS X. For 2.7 you have the choice of 2 versions if you are on 10.6, depending on whether or not you want 32-bit or 64-bit. Cheers, Ralf > > Should I just compile from source? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Sat Jan 1 14:23:22 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Sat, 1 Jan 2011 11:23:22 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: This thread is a bit old, but since it's not possible to use the C-API is possible to accomplish this same thing with the Python API? On Tue, Dec 21, 2010 at 5:12 PM, Mark Wiebe wrote: > On Mon, Dec 20, 2010 at 1:42 PM, John Salvatier > wrote: > >> A while ago, I asked a whether it was possible to multi-iterate over >> several ndarrays but exclude a certain axis( >> http://www.mail-archive.com/numpy-discussion at scipy.org/msg29204.html), >> sort of a combination of PyArray_IterAllButAxis and PyArray_MultiIterNew. My >> goal was to allow creation of relatively complex ufuncs that can allow >> reduction or directionally dependent computation and still use broadcasting >> (for example a moving averaging ufunc that can have changing averaging >> parameters). I didn't get any solutions, which I take to mean that no one >> knew how to do this. >> >> I am thinking about trying to make a numpy patch with this functionality, >> and I have some questions: 1) How difficult would this kind of task be for >> someone with non-expert C knowledge and good numpy knowledge? 2) Does anyone >> have advice on how to do this kind of thing? >> > > You may be able to do what you would like with the new iterator I've > written. In particular, it supports nesting multiple iterators by providing > either pointers or offsets, and allowing you to specify any subset of the > axes to iterate. Here's how the code to do this in a simple 3D case might > look, for making axis 1 the inner loop: > > PyArrayObject *op[2] = {a,b}; > npy_intp axes_outer[2] = {0,2}}; > npy_intp *op_axes[2]; > npy_intp axis_inner = 1; > npy_int32 flags[2] = {NPY_ITER_READONLY, NPY_ITER_READONLY}; > NpyIter *outer, *inner; > NpyIter_IterNext_Fn oiternext, iiternext; > npy_intp *ooffsets; > char **idataptrs; > > op_axes[0] = op_axes[1] = axes_outer; > outer = NpyIter_MultiNew(2, op, NPY_ITER_OFFSETS, > NPY_KEEPORDER, NPY_NO_CASTING, flags, NULL, 2, > op_axes, 0); > op_axes[0] = op_axes[1] = &axis_inner; > inner = NpyIter_MultiNew(2, op, 0, NPY_KEEPORDER, NPY_NO_CASTING, flags, > NULL, 1, op_axes, 0); > > oiternext = NpyIter_GetIterNext(outer); > iiternext = NpyIter_GetIterNext(inner); > > ooffsets = (npy_intp *)NpyIter_GetDataPtrArray(outer); > idataptrs = NpyIter_GetDataPtrArray(inner); > > do { > do { > char *a_data = idataptrs[0] + ooffsets[0], *b_data = idataptrs[0] + > ooffsets[0]; > /* Do stuff with the data */ > } while(iiternext()); > NpyIter_Reset(inner); > } while(oiternext()); > > NpyIter_Deallocate(outer); > NpyIter_Deallocate(inner); > > Extending to more dimensions, or making both the inner and outer loops have > multiple dimensions, isn't too crazy. Is this along the lines of what you > need? > > If you check out my code, note that it currently isn't exposed as NumPy API > yet, but you can try a lot of things with the Python exposure. > > Cheers, > Mark > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From enzomich at gmail.com Sat Jan 1 20:42:02 2011 From: enzomich at gmail.com (Enzo Michelangeli) Date: Sun, 2 Jan 2011 09:42:02 +0800 Subject: [Numpy-discussion] Arrays with aliased elements? Message-ID: Is there any way, not involving compilation of C code, to define ndarrays where some rows or columns share the same data buffers? For example, something built by a hypothetical variant of the np.repeat() function, such that, if a = array([2,3]), calling: b = np.aliasedrepeat(x, [1, 2], axis=0) would return in b: array([[2, 3], [2, 3], [2, 3]]) ...with the understanding that the three rows would actually share the same data, so setting e.g.: b[0,1] = 5 ...would change b into: array([[2, 5], [2, 5], [2, 5]]) In other words, something with a behaviour similar to a list of lists: >>> a = [2,3] >>> b = [a,a,a] >>> b [[2, 3], [2, 3], [2, 3]] >>> b[0][1] = 5 >>> b [[2, 5], [2, 5], [2, 5]] This would save memory (and time spent in unnecessary copying) in some applications with large arrays, and would allow to cope with the current inability of weave.blitz to understand broadcasting rules, e.g. for calculating outer products (I mentioned this in a previous thread). Enzo From zachary.pincus at yale.edu Sat Jan 1 20:53:13 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 1 Jan 2011 20:53:13 -0500 Subject: [Numpy-discussion] Arrays with aliased elements? In-Reply-To: References: Message-ID: <7DBBF9B7-9AE2-44C4-B126-5039F1B5AC67@yale.edu> def repeat(arr, num): arr = numpy.asarray(arr) return numpy.ndarray(arr.shape+(num,), dtype=arr.dtype, buffer=arr, strides=arr.strides+(0,)) There are limits to what these sort of stride tricks can accomplish, but repeating as above, or similar, is feasible. On Jan 1, 2011, at 8:42 PM, Enzo Michelangeli wrote: > Is there any way, not involving compilation of C code, to define > ndarrays > where some rows or columns share the same data buffers? For example, > something built by a hypothetical variant of the np.repeat() > function, such > that, if a = array([2,3]), calling: > > b = np.aliasedrepeat(x, [1, 2], axis=0) > > would return in b: > > array([[2, 3], > [2, 3], > [2, 3]]) > > ...with the understanding that the three rows would actually share > the same > data, so setting e.g.: > > b[0,1] = 5 > > ...would change b into: > > array([[2, 5], > [2, 5], > [2, 5]]) > > In other words, something with a behaviour similar to a list of lists: > >>>> a = [2,3] >>>> b = [a,a,a] >>>> b > [[2, 3], [2, 3], [2, 3]] >>>> b[0][1] = 5 >>>> b > [[2, 5], [2, 5], [2, 5]] > > This would save memory (and time spent in unnecessary copying) in some > applications with large arrays, and would allow to cope with the > current > inability of weave.blitz to understand broadcasting rules, e.g. for > calculating outer products (I mentioned this in a previous thread). > > Enzo > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Sat Jan 1 21:08:05 2011 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 1 Jan 2011 20:08:05 -0600 Subject: [Numpy-discussion] Arrays with aliased elements? In-Reply-To: References: Message-ID: On Sat, Jan 1, 2011 at 19:42, Enzo Michelangeli wrote: > Is there any way, not involving compilation of C code, to define ndarrays > where some rows or columns share the same data buffers? For example, > something built by a hypothetical variant of the np.repeat() function, such > that, if a = array([2,3]), calling: > > ? b = np.aliasedrepeat(x, [1, 2], axis=0) > > would return in b: > > ? array([[2, 3], > ? ? ? ? ?[2, 3], > ? ? ? ? ?[2, 3]]) > > ...with the understanding that the three rows would actually share the same > data, so setting e.g.: > > ? b[0,1] = 5 > > ...would change b into: > > ? array([[2, 5], > ? ? ? ? ?[2, 5], > ? ? ? ? ?[2, 5]]) > > In other words, something with a behaviour similar to a list of lists: > >>>> a = [2,3] >>>> b = [a,a,a] >>>> b > [[2, 3], [2, 3], [2, 3]] >>>> b[0][1] = 5 >>>> b > [[2, 5], [2, 5], [2, 5]] > > This would save memory (and time spent in unnecessary copying) in some > applications with large arrays, and would allow to cope with the current > inability of weave.blitz to understand broadcasting rules, e.g. for > calculating outer products (I mentioned this in a previous thread). See numpy.lib.stride_tricks for tools to do this, specifically the as_strided() function. See numpy.broadcast_arrays() for the latter functionality. http://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast_arrays.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From enzomich at gmail.com Sun Jan 2 02:01:24 2011 From: enzomich at gmail.com (Enzo Michelangeli) Date: Sun, 2 Jan 2011 15:01:24 +0800 Subject: [Numpy-discussion] Arrays with aliased elements? References: Message-ID: <63A2AB5B2FCE4426B473D469EE06ADC1@EMLT> Thanks. Meanwhile, I had arrived to a solution similar to the one suggested by Zachary: >>> a = array([2,3]) >>> ndarray((3,a.shape[0]), strides=(0,a.itemsize), buffer = a, offset=0, >>> dtype=a.dtype) array([[2, 3], [2, 3], [2, 3]]) ...but I'd say that numpy.broadcast_arrays is the cleanest way of obtaining pre-broadcasted views to pass to weave.blitz(). But alas, it appears that blitz doesn't work well with such non-contiguous views: tsb, pivb = broadcast_arrays(tableau[:,cand:cand+1], pivot) tableau = tableau - tsb * pivb ...works, but: tsb, pivb = broadcast_arrays(tableau[:,cand:cand+1], pivot) weave.blitz('tableau = tableau - tsb * pivb') ...returns wrong results. And, of course, converting them to contiguous through the array() function defeats the intended savings in memory and CPU cycles... Enzo ----- Original Message ----- From: "Robert Kern" To: "Discussion of Numerical Python" Sent: Sunday, January 02, 2011 10:08 AM Subject: Re: [Numpy-discussion] Arrays with aliased elements? > On Sat, Jan 1, 2011 at 19:42, Enzo Michelangeli > wrote: >> Is there any way, not involving compilation of C code, to define ndarrays >> where some rows or columns share the same data buffers? For example, >> something built by a hypothetical variant of the np.repeat() function, >> such >> that, if a = array([2,3]), calling: >> >> b = np.aliasedrepeat(x, [1, 2], axis=0) >> >> would return in b: >> >> array([[2, 3], >> [2, 3], >> [2, 3]]) >> >> ...with the understanding that the three rows would actually share the >> same >> data, so setting e.g.: >> >> b[0,1] = 5 >> >> ...would change b into: >> >> array([[2, 5], >> [2, 5], >> [2, 5]]) >> >> In other words, something with a behaviour similar to a list of lists: >> >>>>> a = [2,3] >>>>> b = [a,a,a] >>>>> b >> [[2, 3], [2, 3], [2, 3]] >>>>> b[0][1] = 5 >>>>> b >> [[2, 5], [2, 5], [2, 5]] >> >> This would save memory (and time spent in unnecessary copying) in some >> applications with large arrays, and would allow to cope with the current >> inability of weave.blitz to understand broadcasting rules, e.g. for >> calculating outer products (I mentioned this in a previous thread). > > See numpy.lib.stride_tricks for tools to do this, specifically the > as_strided() function. See numpy.broadcast_arrays() for the latter > functionality. > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast_arrays.html > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Mon Jan 3 01:46:24 2011 From: cournape at gmail.com (David Cournapeau) Date: Mon, 3 Jan 2011 15:46:24 +0900 Subject: [Numpy-discussion] Prime size FFT: bluestein transform vs general chirp/z transform ? Message-ID: Hi, I finally took the time to clean up my code to speed up prime-size FFT (which use a O(N^2) algo in both numpy and scipy). The code is there: https://github.com/cournape/numpy/tree/bluestein (most of the code is tests, because numpy.fft had almost none). Bottom line: it is used only for prime numbers, and is faster than the current code for complex transforms > 500. Because of python + inherent bluestein overhead, this is mostly useful for "long" fft (where the speed up is significant - already 100x speed up for prime size ~ 50000). Several comments: - the overhead is pretty significant (on my machine, bluestein transfrom is slower for prime size < 500) - it could be used as such for real transforms, but the overhead would be even more significant (there is no bluestein transform for real transforms, so one needs to re-rexpress real transforms in term of complex ones, multiplying the overhead by 2x). There are several alternatives to make things faster (Rader-like transform, as used by fftw), but I think this would be quite hard to do in python without significant slowdown, because the code cannot be vectorized. - one could also decide to provide a chirp-z transform, of which Bluestein transform is a special case. Maybe this is more adapted to scipy ? - more generic code will require a few simple (but not trivial) arithmetic-like functions (find prime factors, find generator of Z/nZ groups with n prime, etc...). Where should I put those ? cheers, David From seb.haase at gmail.com Mon Jan 3 05:13:25 2011 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 3 Jan 2011 11:13:25 +0100 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: Hi Erik, This is really neat ! Do I understand correctly, that you mean by "stride tricks", that your rolling_window is _not_ allocating any new memory ? IOW, If I have a large array using 500MB of memory, say of float32 of shape 125,1000,1000 and I want the last axis rolling of window size 11, what would the peak memory usage of that operation be ? How about renaming the option `window` to `window_size` (first I was thinking of things like hamming and hanning windows...)... ? Thanks, Sebastian Haase On Sat, Jan 1, 2011 at 5:29 AM, Erik Rigtorp wrote: > Hi, > > Implementing moving average, moving std and other functions working > over rolling windows using python for loops are slow. This is a > effective stride trick I learned from Keith Goodman's > Bottleneck code but generalized into arrays of > any dimension. This trick allows the loop to be performed in C code > and in the future hopefully using multiple cores. > > import numpy as np > > def rolling_window(a, window): > ? ?""" > ? ?Make an ndarray with a rolling window of the last dimension > > ? ?Parameters > ? ?---------- > ? ?a : array_like > ? ? ? ?Array to add rolling window to > ? ?window : int > ? ? ? ?Size of rolling window > > ? ?Returns > ? ?------- > ? ?Array that is a view of the original array with a added dimension > ? ?of size w. > > ? ?Examples > ? ?-------- > ? ?>>> x=np.arange(10).reshape((2,5)) > ? ?>>> rolling_window(x, 3) > ? ?array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]], > ? ? ? ? ? [[5, 6, 7], [6, 7, 8], [7, 8, 9]]]) > > ? ?Calculate rolling mean of last dimension: > ? ?>>> np.mean(rolling_window(x, 3), -1) > ? ?array([[ 1., ?2., ?3.], > ? ? ? ? ? [ 6., ?7., ?8.]]) > > ? ?""" > ? ?if window < 1: > ? ? ? ?raise ValueError, "`window` must be at least 1." > ? ?if window > a.shape[-1]: > ? ? ? ?raise ValueError, "`window` is too long." > ? ?shape = a.shape[:-1] + (a.shape[-1] - window + 1, window) > ? ?strides = a.strides + (a.strides[-1],) > ? ?return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides) > > > Using np.swapaxes(-1, axis) rolling aggregations over any axis can be computed. > > I submitted a pull request to add this to the stride_tricks module. > > Erik From erik at rigtorp.com Mon Jan 3 08:37:21 2011 From: erik at rigtorp.com (Erik Rigtorp) Date: Mon, 3 Jan 2011 08:37:21 -0500 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: On Mon, Jan 3, 2011 at 05:13, Sebastian Haase wrote: > Hi Erik, > This is really neat ! ?Do I understand correctly, that you mean by > "stride tricks", that your rolling_window is _not_ allocating any new > memory ? Yes, it's only a view. > IOW, If I have a large array using 500MB of memory, say of float32 of > shape 125,1000,1000 and I want the last axis rolling of window size > 11, what would the peak memory usage of that operation be ? It's only a view of the array, no copying is done. Though some operations like np.std() will copy the array, but that's more of a bug. In general It's hard to imagine any speedup gains by copying a 10GB array. > How about renaming the option `window` to `window_size` ?(first I was > thinking of things like hamming and hanning windows...)... ? Sounds fare. From kwgoodman at gmail.com Mon Jan 3 10:32:44 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 3 Jan 2011 07:32:44 -0800 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: On Fri, Dec 31, 2010 at 8:29 PM, Erik Rigtorp wrote: > Implementing moving average, moving std and other functions working > over rolling windows using python for loops are slow. This is a > effective stride trick I learned from Keith Goodman's > Bottleneck code but generalized into arrays of > any dimension. This trick allows the loop to be performed in C code > and in the future hopefully using multiple cores. I like using strides for moving window functions. The one downside I found is that it is slow when window * (arr.shape[axis] - window) is large: >> a = np.random.rand(1000000) >> b = rolling_window(a, 5000) >> import bottleneck as bn >> timeit bn.nanmean(b, axis=1) 1 loops, best of 3: 7.1 s per loop >> timeit bn.move_nanmean(a, window=5000, axis=0) 100 loops, best of 3: 7.99 ms per loop From kwgoodman at gmail.com Mon Jan 3 10:36:47 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 3 Jan 2011 07:36:47 -0800 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: On Mon, Jan 3, 2011 at 5:37 AM, Erik Rigtorp wrote: > It's only a view of the array, no copying is done. Though some > operations like np.std() ?will copy the array, but that's more of a > bug. In general It's hard to imagine any speedup gains by copying a > 10GB array. I don't think that np.std makes a copy of the input data if the input is an array. If the input is, for example, a list, then an array is created. From erik at rigtorp.com Mon Jan 3 10:41:11 2011 From: erik at rigtorp.com (Erik Rigtorp) Date: Mon, 3 Jan 2011 10:41:11 -0500 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: On Mon, Jan 3, 2011 at 10:36, Keith Goodman wrote: > On Mon, Jan 3, 2011 at 5:37 AM, Erik Rigtorp wrote: > >> It's only a view of the array, no copying is done. Though some >> operations like np.std() ?will copy the array, but that's more of a >> bug. In general It's hard to imagine any speedup gains by copying a >> 10GB array. > > I don't think that np.std makes a copy of the input data if the input > is an array. If the input is, for example, a list, then an array is > created. When I tried it on a big array, it tried to allocate a huge amount of memory. As I said it's probably a bug. From kwgoodman at gmail.com Mon Jan 3 10:52:28 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 3 Jan 2011 07:52:28 -0800 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: On Mon, Jan 3, 2011 at 7:41 AM, Erik Rigtorp wrote: > On Mon, Jan 3, 2011 at 10:36, Keith Goodman wrote: >> On Mon, Jan 3, 2011 at 5:37 AM, Erik Rigtorp wrote: >> >>> It's only a view of the array, no copying is done. Though some >>> operations like np.std() ?will copy the array, but that's more of a >>> bug. In general It's hard to imagine any speedup gains by copying a >>> 10GB array. >> >> I don't think that np.std makes a copy of the input data if the input >> is an array. If the input is, for example, a list, then an array is >> created. > > When I tried it on a big array, it tried to allocate a huge amount of > memory. As I said it's probably a bug. Yes, that would be a big bug. np.std does have to initialize the output array. If the window size is small compared to arr.shape[axis] then the memory taken by the output array is of the same order as that of the input array. Could that be what you are seeing? >> a = np.arange(10) Small window, output array shape (8,): >> rolling_window(a, 2) array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]]) Big window, output array shape (2,): >> rolling_window(a, 9) array([[0, 1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8, 9]]) From erik at rigtorp.com Mon Jan 3 10:55:32 2011 From: erik at rigtorp.com (Erik Rigtorp) Date: Mon, 3 Jan 2011 10:55:32 -0500 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: On Mon, Jan 3, 2011 at 10:52, Keith Goodman wrote: > On Mon, Jan 3, 2011 at 7:41 AM, Erik Rigtorp wrote: >> On Mon, Jan 3, 2011 at 10:36, Keith Goodman wrote: >>> On Mon, Jan 3, 2011 at 5:37 AM, Erik Rigtorp wrote: >>> >>>> It's only a view of the array, no copying is done. Though some >>>> operations like np.std() ?will copy the array, but that's more of a >>>> bug. In general It's hard to imagine any speedup gains by copying a >>>> 10GB array. >>> >>> I don't think that np.std makes a copy of the input data if the input >>> is an array. If the input is, for example, a list, then an array is >>> created. >> >> When I tried it on a big array, it tried to allocate a huge amount of >> memory. As I said it's probably a bug. > > Yes, that would be a big bug. > > np.std does have to initialize the output array. If the window size is > small compared to arr.shape[axis] then the memory taken by the output > array is of the same order as that of the input array. Could that be > what you are seeing? > >>> a = np.arange(10) > > Small window, output array shape (8,): > >>> rolling_window(a, 2) > array([[0, 1], > ? ? ? [1, 2], > ? ? ? [2, 3], > ? ? ? [3, 4], > ? ? ? [4, 5], > ? ? ? [5, 6], > ? ? ? [6, 7], > ? ? ? [7, 8], > ? ? ? [8, 9]]) > > Big window, output array shape (2,): > >>> rolling_window(a, 9) > array([[0, 1, 2, 3, 4, 5, 6, 7, 8], > ? ? ? [1, 2, 3, 4, 5, 6, 7, 8, 9]]) No the array was (500,2000) and i did np.std(rolling_window(a,252),-1) and it started to allocate > 2GB. From efiring at hawaii.edu Mon Jan 3 11:26:37 2011 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 03 Jan 2011 06:26:37 -1000 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: Message-ID: <4D21F8BD.60003@hawaii.edu> On 12/31/2010 06:29 PM, Erik Rigtorp wrote: > Hi, > > Implementing moving average, moving std and other functions working > over rolling windows using python for loops are slow. This is a > effective stride trick I learned from Keith Goodman's > Bottleneck code but generalized into arrays of > any dimension. This trick allows the loop to be performed in C code > and in the future hopefully using multiple cores. > An alternative is to go straight to C, with a cython interface. If you look in the num/src subdirectory of http://currents.soest.hawaii.edu/hgstage/hgwebdir.cgi/pycurrents/ you will find this approach, labeled "ringbuf" and "runstats". See pycurrents/setup.py, and its driver, pycurrents/runsetup.py, to see how runstats is presently being built. Instead of calculating statistics independently each time the window is advanced one data point, the statistics are updated. I have not done any benchmarking, but I expect this approach to be quick. The code is old; I have not tried to update it to take advantage of cython's advances over pyrex. If I were writing it now, I might not bother with the C level at all; it could all be done in cython, probably with no speed penalty, and maybe even with reduced overhead. Eric From erik at rigtorp.com Mon Jan 3 11:32:07 2011 From: erik at rigtorp.com (Erik Rigtorp) Date: Mon, 3 Jan 2011 11:32:07 -0500 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: <4D21F8BD.60003@hawaii.edu> References: <4D21F8BD.60003@hawaii.edu> Message-ID: On Mon, Jan 3, 2011 at 11:26, Eric Firing wrote: > Instead of calculating statistics independently each time the window is > advanced one data point, the statistics are updated. ?I have not done > any benchmarking, but I expect this approach to be quick. This might accumulate numerical errors. But could be fine for many applications. > The code is old; I have not tried to update it to take advantage of > cython's advances over pyrex. ?If I were writing it now, I might not > bother with the C level at all; it could all be done in cython, probably > with no speed penalty, and maybe even with reduced overhead. > No doubt this would be faster, I just wanted to offer a general way to this in NumPy. From pivanov314 at gmail.com Mon Jan 3 16:44:09 2011 From: pivanov314 at gmail.com (Paul Ivanov) Date: Mon, 3 Jan 2011 13:44:09 -0800 Subject: [Numpy-discussion] numpy installation In-Reply-To: <962444.76273.qm@web29613.mail.ird.yahoo.com> References: <962444.76273.qm@web29613.mail.ird.yahoo.com> Message-ID: <20110103214409.GC17029@ykcyc> Waqar Rashid, on 2011-01-02 00:38, wrote: > Hi, > > trying to install numpy on MacOS with python 3.1 > > Having installation issues. Has anyone managed to install this on the Mac? > > regards > Waqar - you sent this to the IPython-User list, but I think you probably meant to send it to the numpy-discussion list, since your question does not pertain to IPython itself, so I'm forwarding your email there. Also, can you be more specific about what issues you are having? best, -- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: Digital signature URL: From SSharma84 at slb.com Tue Jan 4 02:31:00 2011 From: SSharma84 at slb.com (Sachin Kumar Sharma) Date: Tue, 4 Jan 2011 07:31:00 +0000 Subject: [Numpy-discussion] newbie question (curve fitting Z=f(X,Y)) Message-ID: <75C2FED246299A478280FA1470EDA4430C9025E4@NL0230MBX06N1.DIR.slb.com> Hi, Absolute basic question I want to do following * Read data from excel sheet (three columns - X,Y,Z) * Curve fit a function Z=F(X,Y) * Plot X Vs Z (from data) and plot X Vs Z (from curve fit) Kindly advise me how to write a basic python script for the same. Thanks & Regards Sachin ************************************************************************ Sachin Kumar Sharma Senior Geomodeler -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpscipy at gmail.com Tue Jan 4 03:30:24 2011 From: jpscipy at gmail.com (Justin Peel) Date: Tue, 4 Jan 2011 01:30:24 -0700 Subject: [Numpy-discussion] Submitting patches Message-ID: Hi all, I've been submitting some patches recently just by putting them on Trac. However, I noticed in the Numpy Developer Guide that it says: The recommended way to proceed is either to attach these files to an enhancement ticket in the Numpy Trac and send a mail about the enhancement to the NumPy mailing list. This line is rather confusing. Either the 'either' should be removed or 'and'->'or'. In other words, is it sufficient to submit the patch in Trac or should I also be emailing the Numpy mailing list about each patch I submit? Justin From cournape at gmail.com Tue Jan 4 07:34:19 2011 From: cournape at gmail.com (David Cournapeau) Date: Tue, 4 Jan 2011 21:34:19 +0900 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <4D115D2B.7070904@silveregg.co.jp> Message-ID: On Wed, Dec 22, 2010 at 11:20 AM, Mark Wiebe wrote: > On Tue, Dec 21, 2010 at 6:06 PM, David wrote: >> >> >> >> This looks pretty cool. I hope to be able to take a look at it during >> the christmas holidays. > > Thanks! Ok, I took some time to look into it, but I am far from understanding everything yet. I will need more time. One design issue which bothers me a bit is the dynamically created structure for the iterator - do you have some benchmarks which show that this design is significantly better than a plain old C data structure with a couple of dynamically allocated arrays ? Besides bypassing the compiler type checks, I am a bit worried about the ability to extend the iterator through "inheritence in C" like I did with neighborhood iterator, but maybe I should just try it. I think the code would benefit from smaller functions, too - 500+ lines functions is just too much IMO, it should be split up. To get a deeper understanding of the code, I am starting to implement several benchmarks to compare old and new iterator - do you already have some of them handy ? Thanks for the hard work, that's a really nice piece of code, David From brockp at umich.edu Tue Jan 4 10:40:08 2011 From: brockp at umich.edu (Brock Palen) Date: Tue, 4 Jan 2011 10:40:08 -0500 Subject: [Numpy-discussion] NumPy on HPC podcast Message-ID: <466AE4A6-89BF-40FE-8A95-3DC88D32C6ED@umich.edu> I host and HPC podcast with Jeff Squyres of OpenMPI fame: www.rce-cast.com We would like to have a developer or two from NumPy on the show to represent the project. We do this over phone or skype and takes about an hour. Feel free to contact me out of band. I hope to hear from you soon! Brock Palen www.umich.edu/~brockp Center for Advanced Computing brockp at umich.edu (734)936-1985 From seb.haase at gmail.com Tue Jan 4 11:06:05 2011 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 4 Jan 2011 17:06:05 +0100 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: <4D21F8BD.60003@hawaii.edu> Message-ID: On Mon, Jan 3, 2011 at 5:32 PM, Erik Rigtorp wrote: > On Mon, Jan 3, 2011 at 11:26, Eric Firing wrote: >> Instead of calculating statistics independently each time the window is >> advanced one data point, the statistics are updated. ?I have not done >> any benchmarking, but I expect this approach to be quick. > > This might accumulate numerical errors. But could be fine for many applications. > >> The code is old; I have not tried to update it to take advantage of >> cython's advances over pyrex. ?If I were writing it now, I might not >> bother with the C level at all; it could all be done in cython, probably >> with no speed penalty, and maybe even with reduced overhead. >> > > No doubt this would be faster, I just wanted to offer a general way to > this in NumPy. > _______________________________________________ BTW, some of these operations can be done using scipy's ndimage - right ? Any comments ? How does the performance compare ? ndimage might have more options regarding edge handling, or ? Cheers, Sebastian Haase From kwgoodman at gmail.com Tue Jan 4 11:14:58 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 4 Jan 2011 08:14:58 -0800 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: <4D21F8BD.60003@hawaii.edu> Message-ID: On Tue, Jan 4, 2011 at 8:06 AM, Sebastian Haase wrote: > On Mon, Jan 3, 2011 at 5:32 PM, Erik Rigtorp wrote: >> On Mon, Jan 3, 2011 at 11:26, Eric Firing wrote: >>> Instead of calculating statistics independently each time the window is >>> advanced one data point, the statistics are updated. ?I have not done >>> any benchmarking, but I expect this approach to be quick. >> >> This might accumulate numerical errors. But could be fine for many applications. >> >>> The code is old; I have not tried to update it to take advantage of >>> cython's advances over pyrex. ?If I were writing it now, I might not >>> bother with the C level at all; it could all be done in cython, probably >>> with no speed penalty, and maybe even with reduced overhead. >>> >> >> No doubt this would be faster, I just wanted to offer a general way to >> this in NumPy. >> _______________________________________________ > > BTW, some of these operations can be done using scipy's ndimage ?- right ? > Any comments ? ?How does the performance compare ? > ndimage might have more options regarding edge handling, or ? Take a look at the moving window function in the development version of the la package: https://github.com/kwgoodman/la/blob/master/la/farray/mov.py Many of the moving window functions offer three calculation methods: filter (ndimage), strides (the strides trick discussed in this thread), and loop (a simple python loop). For example: >> a = np.random.rand(500,2000) >> timeit la.farray.mov_max(a, window=252, axis=-1, method='filter') 1 loops, best of 3: 336 ms per loop >> timeit la.farray.mov_max(a, window=252, axis=-1, method='strides') 1 loops, best of 3: 609 ms per loop >> timeit la.farray.mov_max(a, window=252, axis=-1, method='loop') 1 loops, best of 3: 638 ms per loop No one method is best for all situations. That is one of the reasons I started the Bottleneck package. I figured Cython could beat them all. From jpscipy at gmail.com Tue Jan 4 13:49:32 2011 From: jpscipy at gmail.com (Justin Peel) Date: Tue, 4 Jan 2011 11:49:32 -0700 Subject: [Numpy-discussion] Question regarding submitting patches Message-ID: Hi all, I've been submitting some patches recently just by putting them on Trac. However, I noticed in the Numpy Developer Guide that it says: The recommended way to proceed is either to attach these files to an enhancement ticket in the Numpy Trac and send a mail about the enhancement to the NumPy mailing list. This line is rather confusing. Either the 'either' should be removed or 'and'->'or'. In other words, is it sufficient to submit the patch in Trac or should I also be emailing the Numpy mailing list about each patch I submit? Justin From mwwiebe at gmail.com Tue Jan 4 15:04:34 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 4 Jan 2011 12:04:34 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: On Sat, Jan 1, 2011 at 11:23 AM, John Salvatier wrote: > This thread is a bit old, but since it's not possible to use the C-API is > possible to accomplish this same thing with the Python API? > I've committed Python exposure for nested iteration to the new_iterator branch. In doing so, I also changed the mechanism in C. I found that it was simpler to expose to Python if I added a Reset function which gives new base data pointers, and this also simplifies C code using nested iterators. The Python code a = arange(2).reshape(2,1) b = arange(3).reshape(1,3) i, j = np.nested_iters([a,b], [[0],[1]]) for x in i: print "inner:" for y in j: print y[0], y[1] gives inner: 0 0 0 1 0 2 inner: 1 0 1 1 1 2 and C code for nested iteration looks something like this: NpyIter *iter1, *iter1; NpyIter_IterNext_Fn iternext1, iternext2; char **dataptrs1; /* * With the exact same operands, no copies allowed, and * no axis in op_axes used both in iter1 and iter2. * Buffering may be enabled for iter2, but not for iter1. */ iter1 = ...; iter2 = ...; iternext1 = NpyIter_GetIterNext(iter1); iternext2 = NpyIter_GetIterNext(iter2); dataptrs1 = NpyIter_GetDataPtrArray(iter1); do { NpyIter_ResetBasePointers(iter2, dataptrs1); do { /* Use the iter2 values */ } while (iternext2(iter2)); } while (iternext1(iter1)); Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Tue Jan 4 15:15:56 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 4 Jan 2011 12:15:56 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: Wow, great! I'm excited to try this. I think your patch significantly increases the extendability of numpy. Is the C-API exposed currently? Can you use the API from Cython (meaning is the numpy.pxd file updated)? On Tue, Jan 4, 2011 at 12:04 PM, Mark Wiebe wrote: > On Sat, Jan 1, 2011 at 11:23 AM, John Salvatier > wrote: > >> This thread is a bit old, but since it's not possible to use the C-API is >> possible to accomplish this same thing with the Python API? >> > > I've committed Python exposure for nested iteration to the new_iterator > branch. In doing so, I also changed the mechanism in C. I found that it > was simpler to expose to Python if I added a Reset function which gives new > base data pointers, and this also simplifies C code using nested iterators. > > The Python code > > a = arange(2).reshape(2,1) > b = arange(3).reshape(1,3) > > i, j = np.nested_iters([a,b], [[0],[1]]) > for x in i: > print "inner:" > for y in j: > print y[0], y[1] > > > gives > > inner: > 0 0 > 0 1 > 0 2 > inner: > 1 0 > 1 1 > 1 2 > > > and C code for nested iteration looks something like this: > > NpyIter *iter1, *iter1; > NpyIter_IterNext_Fn iternext1, iternext2; > char **dataptrs1; > > /* > * With the exact same operands, no copies allowed, and > * no axis in op_axes used both in iter1 and iter2. > * Buffering may be enabled for iter2, but not for iter1. > */ > iter1 = ...; iter2 = ...; > > iternext1 = NpyIter_GetIterNext(iter1); > iternext2 = NpyIter_GetIterNext(iter2); > dataptrs1 = NpyIter_GetDataPtrArray(iter1); > > do { > NpyIter_ResetBasePointers(iter2, dataptrs1); > do { > /* Use the iter2 values */ > } while (iternext2(iter2)); > } while (iternext1(iter1)); > > Cheers, > Mark > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Jan 4 15:59:48 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 4 Jan 2011 12:59:48 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: On Tue, Jan 4, 2011 at 12:15 PM, John Salvatier wrote: > Wow, great! I'm excited to try this. I think your patch significantly > increases the extendability of numpy. > > Is the C-API exposed currently? Can you use the API from Cython (meaning is > the numpy.pxd file updated)? > The C-API isn't exposed yet, but that won't be too difficult since it's mostly a matter of adding all the functions to the arrays in the python setup files. I thought I might do that and look at plugging it into numexpr at the same time, since to be able to use the iterator's buffering and numexpr's multithreading together will require some small additions to the iterator. Cheers, Mark On Tue, Jan 4, 2011 at 12:04 PM, Mark Wiebe wrote: > >> On Sat, Jan 1, 2011 at 11:23 AM, John Salvatier < >> jsalvati at u.washington.edu> wrote: >> >>> This thread is a bit old, but since it's not possible to use the C-API is >>> possible to accomplish this same thing with the Python API? >>> >> >> I've committed Python exposure for nested iteration to the new_iterator >> branch. In doing so, I also changed the mechanism in C. I found that it >> was simpler to expose to Python if I added a Reset function which gives new >> base data pointers, and this also simplifies C code using nested iterators. >> >> The Python code >> >> a = arange(2).reshape(2,1) >> b = arange(3).reshape(1,3) >> >> i, j = np.nested_iters([a,b], [[0],[1]]) >> for x in i: >> print "inner:" >> for y in j: >> print y[0], y[1] >> >> >> gives >> >> inner: >> 0 0 >> 0 1 >> 0 2 >> inner: >> 1 0 >> 1 1 >> 1 2 >> >> >> and C code for nested iteration looks something like this: >> >> NpyIter *iter1, *iter1; >> NpyIter_IterNext_Fn iternext1, iternext2; >> char **dataptrs1; >> >> /* >> * With the exact same operands, no copies allowed, and >> * no axis in op_axes used both in iter1 and iter2. >> * Buffering may be enabled for iter2, but not for iter1. >> */ >> iter1 = ...; iter2 = ...; >> >> iternext1 = NpyIter_GetIterNext(iter1); >> iternext2 = NpyIter_GetIterNext(iter2); >> dataptrs1 = NpyIter_GetDataPtrArray(iter1); >> >> do { >> NpyIter_ResetBasePointers(iter2, dataptrs1); >> do { >> /* Use the iter2 values */ >> } while (iternext2(iter2)); >> } while (iternext1(iter1)); >> >> Cheers, >> Mark >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Jan 4 16:01:44 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 4 Jan 2011 13:01:44 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: Oh, and I'm not sure about Cython, since I've never looked into its details. I imagine Cython will want to short circuit some of the Python exposure code, since accessing the iterator values creates new array objects. -Mark On Tue, Jan 4, 2011 at 12:59 PM, Mark Wiebe wrote: > On Tue, Jan 4, 2011 at 12:15 PM, John Salvatier > wrote: > >> Wow, great! I'm excited to try this. I think your patch significantly >> increases the extendability of numpy. >> >> Is the C-API exposed currently? Can you use the API from Cython (meaning >> is the numpy.pxd file updated)? >> > > The C-API isn't exposed yet, but that won't be too difficult since it's > mostly a matter of adding all the functions to the arrays in the python > setup files. I thought I might do that and look at plugging it into numexpr > at the same time, since to be able to use the iterator's buffering and > numexpr's multithreading together will require some small additions to the > iterator. > > Cheers, > Mark > > On Tue, Jan 4, 2011 at 12:04 PM, Mark Wiebe wrote: >> >>> On Sat, Jan 1, 2011 at 11:23 AM, John Salvatier < >>> jsalvati at u.washington.edu> wrote: >>> >>>> This thread is a bit old, but since it's not possible to use the C-API >>>> is possible to accomplish this same thing with the Python API? >>>> >>> >>> I've committed Python exposure for nested iteration to the new_iterator >>> branch. In doing so, I also changed the mechanism in C. I found that it >>> was simpler to expose to Python if I added a Reset function which gives new >>> base data pointers, and this also simplifies C code using nested iterators. >>> >>> The Python code >>> >>> a = arange(2).reshape(2,1) >>> b = arange(3).reshape(1,3) >>> >>> i, j = np.nested_iters([a,b], [[0],[1]]) >>> for x in i: >>> print "inner:" >>> for y in j: >>> print y[0], y[1] >>> >>> >>> gives >>> >>> inner: >>> 0 0 >>> 0 1 >>> 0 2 >>> inner: >>> 1 0 >>> 1 1 >>> 1 2 >>> >>> >>> and C code for nested iteration looks something like this: >>> >>> NpyIter *iter1, *iter1; >>> NpyIter_IterNext_Fn iternext1, iternext2; >>> char **dataptrs1; >>> >>> /* >>> * With the exact same operands, no copies allowed, and >>> * no axis in op_axes used both in iter1 and iter2. >>> * Buffering may be enabled for iter2, but not for iter1. >>> */ >>> iter1 = ...; iter2 = ...; >>> >>> iternext1 = NpyIter_GetIterNext(iter1); >>> iternext2 = NpyIter_GetIterNext(iter2); >>> dataptrs1 = NpyIter_GetDataPtrArray(iter1); >>> >>> do { >>> NpyIter_ResetBasePointers(iter2, dataptrs1); >>> do { >>> /* Use the iter2 values */ >>> } while (iternext2(iter2)); >>> } while (iternext1(iter1)); >>> >>> Cheers, >>> Mark >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Tue Jan 4 16:05:44 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 4 Jan 2011 13:05:44 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: Cython just has interfaces to the C-API, I think. On Tue, Jan 4, 2011 at 1:01 PM, Mark Wiebe wrote: > Oh, and I'm not sure about Cython, since I've never looked into its > details. I imagine Cython will want to short circuit some of the Python > exposure code, since accessing the iterator values creates new array > objects. > > -Mark > > > On Tue, Jan 4, 2011 at 12:59 PM, Mark Wiebe wrote: > >> On Tue, Jan 4, 2011 at 12:15 PM, John Salvatier < >> jsalvati at u.washington.edu> wrote: >> >>> Wow, great! I'm excited to try this. I think your patch significantly >>> increases the extendability of numpy. >>> >>> Is the C-API exposed currently? Can you use the API from Cython (meaning >>> is the numpy.pxd file updated)? >>> >> >> The C-API isn't exposed yet, but that won't be too difficult since it's >> mostly a matter of adding all the functions to the arrays in the python >> setup files. I thought I might do that and look at plugging it into numexpr >> at the same time, since to be able to use the iterator's buffering and >> numexpr's multithreading together will require some small additions to the >> iterator. >> >> Cheers, >> Mark >> >> On Tue, Jan 4, 2011 at 12:04 PM, Mark Wiebe wrote: >>> >>>> On Sat, Jan 1, 2011 at 11:23 AM, John Salvatier < >>>> jsalvati at u.washington.edu> wrote: >>>> >>>>> This thread is a bit old, but since it's not possible to use the C-API >>>>> is possible to accomplish this same thing with the Python API? >>>>> >>>> >>>> I've committed Python exposure for nested iteration to the new_iterator >>>> branch. In doing so, I also changed the mechanism in C. I found that it >>>> was simpler to expose to Python if I added a Reset function which gives new >>>> base data pointers, and this also simplifies C code using nested iterators. >>>> >>>> The Python code >>>> >>>> a = arange(2).reshape(2,1) >>>> b = arange(3).reshape(1,3) >>>> >>>> i, j = np.nested_iters([a,b], [[0],[1]]) >>>> for x in i: >>>> print "inner:" >>>> for y in j: >>>> print y[0], y[1] >>>> >>>> >>>> gives >>>> >>>> inner: >>>> 0 0 >>>> 0 1 >>>> 0 2 >>>> inner: >>>> 1 0 >>>> 1 1 >>>> 1 2 >>>> >>>> >>>> and C code for nested iteration looks something like this: >>>> >>>> NpyIter *iter1, *iter1; >>>> NpyIter_IterNext_Fn iternext1, iternext2; >>>> char **dataptrs1; >>>> >>>> /* >>>> * With the exact same operands, no copies allowed, and >>>> * no axis in op_axes used both in iter1 and iter2. >>>> * Buffering may be enabled for iter2, but not for iter1. >>>> */ >>>> iter1 = ...; iter2 = ...; >>>> >>>> iternext1 = NpyIter_GetIterNext(iter1); >>>> iternext2 = NpyIter_GetIterNext(iter2); >>>> dataptrs1 = NpyIter_GetDataPtrArray(iter1); >>>> >>>> do { >>>> NpyIter_ResetBasePointers(iter2, dataptrs1); >>>> do { >>>> /* Use the iter2 values */ >>>> } while (iternext2(iter2)); >>>> } while (iternext1(iter1)); >>>> >>>> Cheers, >>>> Mark >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Jan 4 16:37:01 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 4 Jan 2011 13:37:01 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <4D115D2B.7070904@silveregg.co.jp> Message-ID: On Tue, Jan 4, 2011 at 4:34 AM, David Cournapeau wrote: > > Ok, I took some time to look into it, but I am far from understanding > everything yet. I will need more time. > Yeah, it ended up being pretty large. I think the UFunc code will shrink substantially when it uses this iterator, which is something I was targeting. One design issue which bothers me a bit is the dynamically created > structure for the iterator - do you have some benchmarks which show > that this design is significantly better than a plain old C data > structure with a couple of dynamically allocated arrays ? Besides > bypassing the compiler type checks, I am a bit worried about the > ability to extend the iterator through "inheritence in C" like I did > with neighborhood iterator, but maybe I should just try it. > I know what you mean - if I could use C++ templates the implementation could probably have the best of both worlds, but seeing as NumPy is in C I tried to compromise mostly towards higher performance. I don't have benchmarks showing that the implementation is faster, but I did validate that the compiler does the optimizations I want it to do. For example, the specialized iternext function for 1 operand and 1 dimension, a common case because of dimension coalescing, looks like this on my machine: 0: 48 83 47 58 01 addq $0x1,0x58(%rdi) 5: 48 8b 47 60 mov 0x60(%rdi),%rax 9: 48 01 47 68 add %rax,0x68(%rdi) d: 48 8b 47 50 mov 0x50(%rdi),%rax 11: 48 39 47 58 cmp %rax,0x58(%rdi) 15: 0f 9c c0 setl %al 18: 0f b6 c0 movzbl %al,%eax 1b: c3 retq The function has no branches and all memory accesses are directly offset from the iter pointer %rdi, something I think is pretty good. If this data was in separately allocated arrays, I think it would hurt locality as well as add some more instructions. In the implementation, I tried to structure the data access macros so errors are easy to spot. Accessing the bufferdata and the axisdata isn't typed, but I can think of ways to do that. I was viewing the implementation as fully opaque to any non-iterator code, even within NumPy, do you think such access will be necessary? I think the code would benefit from smaller functions, too - 500+ > lines functions is just too much IMO, it should be split up. > I definitely agree, I've been splitting things up as they got large, but that's not finished. I also think the main iterator .c file is too large and needs splitting up. To get a deeper understanding of the code, I am starting to implement > several benchmarks to compare old and new iterator - do you already > have some of them handy ? > So far I've just done timing with the Python exposure, C-based benchmarking is welcome. Where possible, NPY_ITER_NO_INNER_ITERATION should be used, since it exposes the possibility of longer inner loops with no function calls. An example where this is not possible is when coordinates are required. I should probably put together a collection of copy/paste templates for typical use. Thanks for the hard work, that's a really nice piece of code, > Thanks for taking the time to look into it, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Tue Jan 4 17:21:43 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 4 Jan 2011 16:21:43 -0600 Subject: [Numpy-discussion] NumPy on HPC podcast In-Reply-To: <466AE4A6-89BF-40FE-8A95-3DC88D32C6ED@umich.edu> References: <466AE4A6-89BF-40FE-8A95-3DC88D32C6ED@umich.edu> Message-ID: <1DA9923A-4AB9-444F-948D-8627EA5E014D@enthought.com> Hi Brock, I would be happy to participate if I can. When is it? -Travis On Jan 4, 2011, at 9:40 AM, Brock Palen wrote: > I host and HPC podcast with Jeff Squyres of OpenMPI fame: > > www.rce-cast.com > > We would like to have a developer or two from NumPy on the show to represent the project. We do this over phone or skype and takes about an hour. > > Feel free to contact me out of band. I hope to hear from you soon! > > Brock Palen > www.umich.edu/~brockp > Center for Advanced Computing > brockp at umich.edu > (734)936-1985 > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From brockp at umich.edu Tue Jan 4 19:06:28 2011 From: brockp at umich.edu (Brock Palen) Date: Tue, 4 Jan 2011 19:06:28 -0500 Subject: [Numpy-discussion] NumPy on HPC podcast In-Reply-To: <1DA9923A-4AB9-444F-948D-8627EA5E014D@enthought.com> References: <466AE4A6-89BF-40FE-8A95-3DC88D32C6ED@umich.edu> <1DA9923A-4AB9-444F-948D-8627EA5E014D@enthought.com> Message-ID: We record the show in advance, edit and then release. We would hope to record in the next week or two. Brock Palen www.umich.edu/~brockp Center for Advanced Computing brockp at umich.edu (734)936-1985 On Jan 4, 2011, at 5:21 PM, Travis Oliphant wrote: > Hi Brock, > > I would be happy to participate if I can. When is it? > > -Travis > > On Jan 4, 2011, at 9:40 AM, Brock Palen wrote: > >> I host and HPC podcast with Jeff Squyres of OpenMPI fame: >> >> www.rce-cast.com >> >> We would like to have a developer or two from NumPy on the show to represent the project. We do this over phone or skype and takes about an hour. >> >> Feel free to contact me out of band. I hope to hear from you soon! >> >> Brock Palen >> www.umich.edu/~brockp >> Center for Advanced Computing >> brockp at umich.edu >> (734)936-1985 >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > --- > Travis Oliphant > Enthought, Inc. > oliphant at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From faltet at pytables.org Wed Jan 5 07:26:08 2011 From: faltet at pytables.org (Francesc Alted) Date: Wed, 5 Jan 2011 13:26:08 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <4D115D2B.7070904@silveregg.co.jp> Message-ID: 2011/1/4, Mark Wiebe : >> To get a deeper understanding of the code, I am starting to implement >> several benchmarks to compare old and new iterator - do you already >> have some of them handy ? >> > > So far I've just done timing with the Python exposure, C-based benchmarking > is welcome. Where possible, NPY_ITER_NO_INNER_ITERATION should be used, > since it exposes the possibility of longer inner loops with no function > calls. An example where this is not possible is when coordinates are > required. I should probably put together a collection of copy/paste > templates for typical use. Sorry for the naive question, but I use the numpy.fromiter() iterator quite a few in my projects. and I'm curious on whether this new iterator would allow numpy.fromiter() to go faster (I mean, in Python space). Any hint? -- Francesc Alted From jsseabold at gmail.com Wed Jan 5 10:30:08 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 5 Jan 2011 10:30:08 -0500 Subject: [Numpy-discussion] Question regarding submitting patches In-Reply-To: References: Message-ID: On Tue, Jan 4, 2011 at 1:49 PM, Justin Peel wrote: > Hi all, > > I've been submitting some patches recently just by putting them on > Trac. However, I noticed in the Numpy Developer Guide that it says: > > ? The recommended way to proceed is either to attach these files to > an enhancement ticket in the Numpy Trac and send a mail about the > enhancement to the NumPy mailing list. > > This line is rather confusing. Either the 'either' should be removed > or 'and'->'or'. In other words, is it sufficient to submit the patch > in Trac or should I also be emailing the Numpy mailing list about each > patch I submit? > I am not positive, but I think that having them on Trac ensures that they're not lost and is sufficient. An e-mail serves to draw some attention (or not) for a review or speedier inclusion. Should this recommendation in the docs be changed or amended with the switch to git? Skipper From ben.root at ou.edu Wed Jan 5 11:07:13 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 5 Jan 2011 10:07:13 -0600 Subject: [Numpy-discussion] Question regarding submitting patches In-Reply-To: References: Message-ID: On Wed, Jan 5, 2011 at 9:30 AM, Skipper Seabold wrote: > On Tue, Jan 4, 2011 at 1:49 PM, Justin Peel wrote: > > Hi all, > > > > I've been submitting some patches recently just by putting them on > > Trac. However, I noticed in the Numpy Developer Guide that it says: > > > > The recommended way to proceed is either to attach these files to > > an enhancement ticket in the Numpy Trac and send a mail about the > > enhancement to the NumPy mailing list. > > > > This line is rather confusing. Either the 'either' should be removed > > or 'and'->'or'. In other words, is it sufficient to submit the patch > > in Trac or should I also be emailing the Numpy mailing list about each > > patch I submit? > > > > I am not positive, but I think that having them on Trac ensures that > they're not lost and is sufficient. An e-mail serves to draw some > attention (or not) for a review or speedier inclusion. > > Should this recommendation in the docs be changed or amended with the > switch to git? > > Skipper > At the very least, the wording is grammatically incorrect and should be fixed. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdh2358 at gmail.com Wed Jan 5 11:27:32 2011 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 5 Jan 2011 10:27:32 -0600 Subject: [Numpy-discussion] segfault on complex array on solaris x86 Message-ID: johnh at udesktop253:~> gcc --version gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath) Copyright (C) 2004 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. johnh at udesktop253:~> uname -a SunOS udesktop253 5.10 Generic_142910-17 i86pc i386 i86pc johnh at udesktop253:~> cat test.py import numpy as np print np.__version__ fs = 1000 t = np.linspace(0, 0.3, 301) A = np.array([2, 8]).reshape(-1, 1) f = np.array([150, 140]).reshape(-1, 1) xn = (A * np.exp(2j * np.pi * f * t)).sum(axis=0) johnh at udesktop253:~> python test.py 2.0.0.dev-9451260 Segmentation Fault (core dumped) johnh at udesktop253:~> johnh at udesktop253:~> sudo pstack /var/core/core.python.957 core '/var/core/core.python.957' of 9397: python test.py febf1928 cexp (0, 0, 0, 0, 8060ab0, 84321ac) + 1b0 fe9657e0 npy_cexp (80458e0, 0, 0, 0, 0, 84e2530) + 30 fe95064f nc_exp (8045920, 84e72a0, 8045978, 8045920, 10, 10) + 3f fe937d5b PyUFunc_D_D (84e2530, 84e20f4, 84e25b0, fe950610, 1, 0) + 5b fe95e818 PyUFunc_GenericFunction (81e96e0, 807deac, 0, 80460b8, 2, 2) + 448 fe95fb10 ufunc_generic_call (81e96e0, 807deac, 0, fe98a820) + 70 feeb2d78 PyObject_Call (81e96e0, 807deac, 0, 80a24ec, 8061c08, 0) + 28 fef11900 PyEval_EvalFrame (80a2394, 81645a0, 8079824, 8079824) + 146c fef17708 PyEval_EvalCodeEx (81645a0, 8079824, 8079824, 0, 0, 0) + 620 fef178af PyEval_EvalCode (81645a0, 8079824, 8079824, 8061488, fef3d9ee, 0) + 2f fef3d095 PyRun_FileExFlags (feb91c98, 804687b, 101, 8079824, 8079824, 1) + 75 fef3d9ee PyRun_SimpleFileExFlags (feb91c98, 804687b, 1, 80465a8, fef454a1, 804687b) + 172 fef3e4fd PyRun_AnyFileExFlags (feb91c98, 804687b, 1, 80465a8) + 61 fef454a1 Py_Main (1, 80466b8, feb1cf35, fea935a1, 29, feb96750) + 9d9 08050862 main (2, 80466b8, 80466c4) + 22 08050758 _start (2, 8046874, 804687b, 0, 8046883, 80468ad) + 60 -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Wed Jan 5 12:50:16 2011 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 5 Jan 2011 18:50:16 +0100 Subject: [Numpy-discussion] Question regarding submitting patches In-Reply-To: References: Message-ID: On Tue, Jan 4, 2011 at 7:49 PM, Justin Peel wrote: > I've been submitting some patches recently just by putting them on > Trac. However, I noticed in the Numpy Developer Guide that it says: > > ? The recommended way to proceed is either to attach these files to > an enhancement ticket in the Numpy Trac and send a mail about the > enhancement to the NumPy mailing list. For now, we should just remove the 'either'. I will take a look at how to reintegrate the changes into the master gitwash document later tonight: https://github.com/matthew-brett/gitwash/blob/master/gitwash/patching.rst Thanks for pointing out the grammatical error. Jarrod PS. Just to be clear, the developer doc you are referring to is: http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html I just want to make sure that there isn't some old wiki page somewhere that needs to be deleted. From mwwiebe at gmail.com Wed Jan 5 13:01:18 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 5 Jan 2011 10:01:18 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <4D115D2B.7070904@silveregg.co.jp> Message-ID: On Wed, Jan 5, 2011 at 4:26 AM, Francesc Alted wrote: > Sorry for the naive question, but I use the numpy.fromiter() iterator > quite a few in my projects. and I'm curious on whether this new > iterator would allow numpy.fromiter() to go faster (I mean, in Python > space). Any hint? > The new iterator doesn't offer any help to fromiter in general, but if the iterator being given to the function is a new iterator it would be possible to handle it specially and get a big speedup. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Jan 5 21:06:18 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 5 Jan 2011 20:06:18 -0600 Subject: [Numpy-discussion] newbie question (curve fitting Z=f(X,Y)) In-Reply-To: <75C2FED246299A478280FA1470EDA4430C9025E4@NL0230MBX06N1.DIR.slb.com> References: <75C2FED246299A478280FA1470EDA4430C9025E4@NL0230MBX06N1.DIR.slb.com> Message-ID: On Tue, Jan 4, 2011 at 1:31 AM, Sachin Kumar Sharma wrote: > Hi, > > > > Absolute basic question I want to do following > > > > ? Read data from excel sheet (three columns ? X,Y,Z) > > ? Curve fit a function Z=F(X,Y) > > ? Plot X Vs Z (from data) and plot X Vs Z (from curve fit) > > > > > > Kindly advise me how to write a basic python script for the same. > > > > Thanks & Regards > > > > Sachin > > Sachin, If you need to read data from excel files directly, there is a tool called python-excel: http://www.python-excel.org/ Although, personally, I would just simply recommend exporting the excel data into a csv file and then use numpy's loadtxt() function to read the text file. As for curve-fitting, you are likely want to use scipy's optimize toolkit: http://docs.scipy.org/doc/scipy/reference/optimize.html Finally, you use matplotlib for plotting. This is very high-level and doesn't go into much detail, but I am sure if you read up on how to use these tools, you will be able to get what you want. If you have any specific questions, then try asking either here in the numpy mailing list (for numpy-related issues), or the scipy-users mailing list (for scipy related issues) or the matplotlib-users mailing list (for plotting issues). I hope that helps! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Thu Jan 6 04:48:18 2011 From: faltet at pytables.org (Francesc Alted) Date: Thu, 6 Jan 2011 10:48:18 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <4D115D2B.7070904@silveregg.co.jp> Message-ID: 2011/1/5, Mark Wiebe : > On Wed, Jan 5, 2011 at 4:26 AM, Francesc Alted wrote: > >> Sorry for the naive question, but I use the numpy.fromiter() iterator >> quite a few in my projects. and I'm curious on whether this new >> iterator would allow numpy.fromiter() to go faster (I mean, in Python >> space). Any hint? >> > > The new iterator doesn't offer any help to fromiter in general, but if the > iterator being given to the function is a new iterator it would be possible > to handle it specially and get a big speedup. Ah, that's what I thought. Thanks for the clarification. -- Francesc Alted From josef.pktd at gmail.com Thu Jan 6 05:14:15 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 6 Jan 2011 05:14:15 -0500 Subject: [Numpy-discussion] aa.astype(int) truncates and doesn't round Message-ID: just something I bumped into and wasn't aware of >>> aa array([ 1., 1., 1., 1., 1.]) >>> aa.astype(int) array([0, 1, 0, 0, 0]) >>> aa - 1 array([ -2.22044605e-16, 2.22044605e-16, -2.22044605e-16, -3.33066907e-16, -3.33066907e-16]) >>> np.round(aa).astype(int) array([1, 1, 1, 1, 1]) Josef From kwgoodman at gmail.com Thu Jan 6 10:40:57 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 6 Jan 2011 07:40:57 -0800 Subject: [Numpy-discussion] aa.astype(int) truncates and doesn't round In-Reply-To: References: Message-ID: On Thu, Jan 6, 2011 at 2:14 AM, wrote: > just something I bumped into and wasn't aware of > >>>> aa > array([ 1., ?1., ?1., ?1., ?1.]) >>>> aa.astype(int) > array([0, 1, 0, 0, 0]) >>>> aa - 1 > array([ -2.22044605e-16, ? 2.22044605e-16, ?-2.22044605e-16, > ? ? ? ?-3.33066907e-16, ?-3.33066907e-16]) >>>> np.round(aa).astype(int) > array([1, 1, 1, 1, 1]) >> a = np.ones(100) >> a.astype(int) array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) My default numpy int is 64 bits. Try 32 bits: >> a = np.ones(100, np.int32) >> a.astype(int) array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) From josef.pktd at gmail.com Thu Jan 6 12:04:56 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 6 Jan 2011 12:04:56 -0500 Subject: [Numpy-discussion] aa.astype(int) truncates and doesn't round In-Reply-To: References: Message-ID: On Thu, Jan 6, 2011 at 10:40 AM, Keith Goodman wrote: > On Thu, Jan 6, 2011 at 2:14 AM, ? wrote: >> just something I bumped into and wasn't aware of >> >>>>> aa >> array([ 1., ?1., ?1., ?1., ?1.]) >>>>> aa.astype(int) >> array([0, 1, 0, 0, 0]) >>>>> aa - 1 >> array([ -2.22044605e-16, ? 2.22044605e-16, ?-2.22044605e-16, >> ? ? ? ?-3.33066907e-16, ?-3.33066907e-16]) >>>>> np.round(aa).astype(int) >> array([1, 1, 1, 1, 1]) > >>> a = np.ones(100) >>> a.astype(int) > array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1]) > > My default numpy int is 64 bits. Try 32 bits: > >>> a = np.ones(100, np.int32) >>> a.astype(int) > > array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1]) the full exercise includes some calculations first, so we are not precisely at 1 >>> d_ array([[ 1., 0., 0., 0., 0.], [ 1., -1., 0., 0., 0.], [ 1., 0., -1., 0., 0.], [ 1., 0., 0., -1., 0.], [ 1., 0., 0., 0., -1.]]) >>> np.set_printoptions(precision=2) >>> np.linalg.pinv(d_) array([[ 1.00e+00, -2.34e-16, -8.50e-17, -7.63e-17, -8.50e-17], [ 1.00e+00, -1.00e+00, -1.06e-16, 2.19e-16, -6.18e-17], [ 1.00e+00, -2.21e-16, -1.00e+00, -2.27e-16, 9.38e-18], [ 1.00e+00, -6.40e-17, -2.84e-17, -1.00e+00, -7.65e-17], [ 1.00e+00, -1.70e-16, -9.55e-17, -1.52e-17, -1.00e+00]]) >>> np.linalg.pinv(d_).astype(int) array([[ 0, 0, 0, 0, 0], [ 0, -1, 0, 0, 0], [ 0, 0, -1, 0, 0], [ 0, 0, 0, -1, 0], [ 0, 0, 0, 0, -1]]) >>> np.linalg.inv(d_).astype(int) array([[ 1, 0, 0, 0, 0], [ 1, -1, 0, 0, 0], [ 1, 0, -1, 0, 0], [ 1, 0, 0, -1, 0], [ 1, 0, 0, 0, -1]]) Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Thu Jan 6 12:09:00 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 6 Jan 2011 11:09:00 -0600 Subject: [Numpy-discussion] aa.astype(int) truncates and doesn't round In-Reply-To: References: Message-ID: On Thu, Jan 6, 2011 at 09:40, Keith Goodman wrote: > On Thu, Jan 6, 2011 at 2:14 AM, ? wrote: >> just something I bumped into and wasn't aware of >> >>>>> aa >> array([ 1., ?1., ?1., ?1., ?1.]) >>>>> aa.astype(int) >> array([0, 1, 0, 0, 0]) >>>>> aa - 1 >> array([ -2.22044605e-16, ? 2.22044605e-16, ?-2.22044605e-16, >> ? ? ? ?-3.33066907e-16, ?-3.33066907e-16]) >>>>> np.round(aa).astype(int) >> array([1, 1, 1, 1, 1]) > >>> a = np.ones(100) >>> a.astype(int) > array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1]) > > My default numpy int is 64 bits. Try 32 bits: > >>> a = np.ones(100, np.int32) >>> a.astype(int) > > array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > ? ? ? 1, 1, 1, 1, 1, 1, 1, 1]) He's not pointing out a bug. His array does not have 1s in them, but values very close to 1, some slightly above and some slightly below, such that numpy's default printing rounds them to 1. See the "aa - 1" line. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From robert.kern at gmail.com Thu Jan 6 12:11:20 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 6 Jan 2011 11:11:20 -0600 Subject: [Numpy-discussion] aa.astype(int) truncates and doesn't round In-Reply-To: References: Message-ID: On Thu, Jan 6, 2011 at 04:14, wrote: > just something I bumped into and wasn't aware of > >>>> aa > array([ 1., ?1., ?1., ?1., ?1.]) >>>> aa.astype(int) > array([0, 1, 0, 0, 0]) >>>> aa - 1 > array([ -2.22044605e-16, ? 2.22044605e-16, ?-2.22044605e-16, > ? ? ? ?-3.33066907e-16, ?-3.33066907e-16]) >>>> np.round(aa).astype(int) > array([1, 1, 1, 1, 1]) This is behavior inherited from C and matches Python's behavior. int(aa[0]) == 0. Similarly, inside the C code, (int)(1.0 - 2.22e-16) == 0. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From brian.murphy at unitn.it Fri Jan 7 06:33:59 2011 From: brian.murphy at unitn.it (Brian Murphy) Date: Fri, 7 Jan 2011 12:33:59 +0100 Subject: [Numpy-discussion] code for multidimensional scaling? In-Reply-To: <4D26F96C.3070607@unitn.it> References: <4D26F96C.3070607@unitn.it> Message-ID: <4D26FA27.8000100@unitn.it> Hi, I'm new to the list, so I hope my question is appropriate (I've already sent the same posting to the SciPy Users list). I'm looking for code that implements multi-dimensional scaling (e.g. like Matlab's mdscale command) in Python. My best guess was that I would find it in the Scikit Learn package, but couldn't turn anything up. Any suggestions? thanks and regards, Brian -- Brian Murphy Post-Doctoral Researcher Language, Interaction and Computation Lab Centre for Mind/Brain Sciences University of Trento http://clic.cimec.unitn.it/brian/ From Thomas.EMMEL at 3ds.com Fri Jan 7 10:58:04 2011 From: Thomas.EMMEL at 3ds.com (EMMEL Thomas) Date: Fri, 7 Jan 2011 15:58:04 +0000 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array Message-ID: <3A0080EEBFB19C4993C24098DD0A78D108D12609@EU-DCC-MBX01.dsone.3ds.com> Hi, There are some discussions on the speed of numpy compared to Numeric in this list, however I have a topic I don't understand in detail, maybe someone can enlighten me... I use python 2.6 on a SuSE installation and test this: #Python 2.6 (r26:66714, Mar 30 2010, 00:29:28) #[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2 #Type "help", "copyright", "credits" or "license" for more information. import timeit #creation of arrays and tuples (timeit number=1000000 by default) timeit.Timer('a((1.,2.,3.))','from numpy import array as a').timeit() #8.2061841487884521 timeit.Timer('a((1.,2.,3.))','from Numeric import array as a').timeit() #9.6958281993865967 timeit.Timer('a((1.,2.,3.))','a=tuple').timeit() #0.13814711570739746 #Result: tuples - of course - are much faster than arrays and numpy is a bit faster in creating arrays than Numeric #working with arrays timeit.Timer('d=x1-x2;sum(d*d)','from Numeric import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() #3.263314962387085 timeit.Timer('d=x1-x2;sum(d*d)','from numpy import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() #9.7236979007720947 #Result: Numeric is three times faster than numpy! Why? #working with components: timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','a=tuple; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() #0.64785194396972656 timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','from numpy import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() #3.4181499481201172 timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','from Numeric import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() #0.97426199913024902 Result: tuples are again the fastest variant, Numeric is faster than numpy and both are faster than the variant above using the high-level functions! Why? For various reasons I need to use numpy in the future where I used Numeric before. Is there any better solution in numpy I missed? Kind regards and thanks in advance Thomas This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email.For other languages, go to http://www.3ds.com/terms/email-disclaimer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Fri Jan 7 11:09:41 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Fri, 7 Jan 2011 08:09:41 -0800 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array In-Reply-To: <3A0080EEBFB19C4993C24098DD0A78D108D12609@EU-DCC-MBX01.dsone.3ds.com> References: <3A0080EEBFB19C4993C24098DD0A78D108D12609@EU-DCC-MBX01.dsone.3ds.com> Message-ID: Did you try larger arrays/tuples? I would guess that makes a significant difference. On Fri, Jan 7, 2011 at 7:58 AM, EMMEL Thomas wrote: > Hi, > > There are some discussions on the speed of numpy compared to Numeric in > this list, however I have a topic > I don't understand in detail, maybe someone can enlighten me... > I use python 2.6 on a SuSE installation and test this: > > #Python 2.6 (r26:66714, Mar 30 2010, 00:29:28) > #[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2 > #Type "help", "copyright", "credits" or "license" for more information. > > import timeit > > #creation of arrays and tuples (timeit number=1000000 by default) > > timeit.Timer('a((1.,2.,3.))','from numpy import array as a').timeit() > #8.2061841487884521 > timeit.Timer('a((1.,2.,3.))','from Numeric import array as a').timeit() > #9.6958281993865967 > timeit.Timer('a((1.,2.,3.))','a=tuple').timeit() > #0.13814711570739746 > > #Result: tuples - of course - are much faster than arrays and numpy is a > bit faster in creating arrays than Numeric > > #working with arrays > > timeit.Timer('d=x1-x2;sum(d*d)','from Numeric import array as a; > x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #3.263314962387085 > timeit.Timer('d=x1-x2;sum(d*d)','from numpy import array as a; > x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #9.7236979007720947 > > #Result: Numeric is three times faster than numpy! Why? > > #working with components: > > timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','a=tuple; > x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #0.64785194396972656 > timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','from > numpy import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #3.4181499481201172 > timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','from > Numeric import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #0.97426199913024902 > > Result: tuples are again the fastest variant, Numeric is faster than numpy > and both are faster than the variant above using the high-level functions! > Why? > > For various reasons I need to use numpy in the future where I used Numeric > before. > Is there any better solution in numpy I missed? > > Kind regards and thanks in advance > > Thomas > > This email and any attachments are intended solely for the use of the > individual or entity to whom it is addressed and may be confidential and/or > privileged. > > If you are not one of the named recipients or have received this email in > error, > > (i) you should not read, disclose, or copy it, > > (ii) please notify sender of your receipt by reply email and delete this > email and all attachments, > > (iii) Dassault Systemes does not accept or assume any liability or > responsibility for any use of or reliance on this email. > > For other languages, Click Here > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Jan 7 12:49:44 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 7 Jan 2011 11:49:44 -0600 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array In-Reply-To: <3A0080EEBFB19C4993C24098DD0A78D108D12609@EU-DCC-MBX01.dsone.3ds.com> References: <3A0080EEBFB19C4993C24098DD0A78D108D12609@EU-DCC-MBX01.dsone.3ds.com> Message-ID: On Fri, Jan 7, 2011 at 9:58 AM, EMMEL Thomas wrote: > Hi, > > There are some discussions on the speed of numpy compared to Numeric in > this list, however I have a topic > I don't understand in detail, maybe someone can enlighten me... > I use python 2.6 on a SuSE installation and test this: > > #Python 2.6 (r26:66714, Mar 30 2010, 00:29:28) > #[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2 > #Type "help", "copyright", "credits" or "license" for more information. > > import timeit > > #creation of arrays and tuples (timeit number=1000000 by default) > > timeit.Timer('a((1.,2.,3.))','from numpy import array as a').timeit() > #8.2061841487884521 > timeit.Timer('a((1.,2.,3.))','from Numeric import array as a').timeit() > #9.6958281993865967 > timeit.Timer('a((1.,2.,3.))','a=tuple').timeit() > #0.13814711570739746 > > #Result: tuples - of course - are much faster than arrays and numpy is a > bit faster in creating arrays than Numeric > > #working with arrays > > timeit.Timer('d=x1-x2;sum(d*d)','from Numeric import array as a; > x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #3.263314962387085 > timeit.Timer('d=x1-x2;sum(d*d)','from numpy import array as a; > x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #9.7236979007720947 > > #Result: Numeric is three times faster than numpy! Why? > > #working with components: > > timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','a=tuple; > x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #0.64785194396972656 > timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','from > numpy import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #3.4181499481201172 > timeit.Timer('d0=x1[0]-x2[0];d1=x1[1]-x2[1];d2=x1[2]-x2[2];d0*d0+d1*d1+d2*d2','from > Numeric import array as a; x1=a((1.,2.,3.));x2=a((2.,4.,6.))').timeit() > #0.97426199913024902 > > Result: tuples are again the fastest variant, Numeric is faster than numpy > and both are faster than the variant above using the high-level functions! > Why? > > For various reasons I need to use numpy in the future where I used Numeric > before. > Is there any better solution in numpy I missed? > > Kind regards and thanks in advance > > Thomas > Don't know how much of an impact it would have, but those timeit statements for array creation include the import process, which are going to be different for each module and are probably not indicative of the speed of array creation. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedrichromstedt at gmail.com Sat Jan 8 16:32:48 2011 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Sat, 8 Jan 2011 22:32:48 +0100 Subject: [Numpy-discussion] bincount question In-Reply-To: <4D10F7BB.3000905@gmail.com> References: <4D10F7BB.3000905@gmail.com> Message-ID: 2010/12/21 Alan G Isaac : > :: > > ? ? >>> np.bincount([]) > ? ? Traceback (most recent call last): > ? ? ? File "", line 1, in > ? ? ValueError: The first argument cannot be empty. > > Why not? > (I.e., why isn't an empty array the right answer?) >From the (i.e. "a", or, even more precise, "my") mathematical pov: Define the "bincount" sequence, which will mostly consist of trailing zeros for large indices. Then, the return value is the smallest sequence, s.t. there are no non-zero items left outside the return chunk of the sequence, and of course it must include the zeroth bincount sequence element. So, yes, [] would be the correct answer. >From the algorithmic point of view: Define the length of the sequence returned by the max() of the array handed in + 1. So, since max([]) is undefined, such is bincount in that case. I'm a bit in favour of the mathematical approach. But unfortunately, I cannot fix it, although I think it will break nothing because nothing should rely on this corner case yielding an Exception (but I might be proven wrong, I don't really know). In any case, it might be worth documenting this, by adding it to the ValueError section of the "Raises" part (http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html?highlight=bincount#numpy.bincount). This is something I might be able to do. Friedrich From mwwiebe at gmail.com Sun Jan 9 17:45:02 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 9 Jan 2011 14:45:02 -0800 Subject: [Numpy-discussion] numexpr with the new iterator Message-ID: As a benchmark of C-based iterator usage and to make it work properly in a multi-threaded context, I've updated numexpr to use the new iterator. In addition to some performance improvements, this also made it easy to add optional out= and order= parameters to the evaluate function. The numexpr repository with this update is available here: https://github.com/m-paradox/numexpr To use it, you need the new_iterator branch of NumPy from here: https://github.com/m-paradox/numpy In all cases tested, the iterator version of numexpr's evaluate function matches or beats the standard version. The timing results are below, with some explanatory comments placed inline: -Mark In [1]: import numexpr as ne # numexpr front page example In [2]: a = np.arange(1e6) In [3]: b = np.arange(1e6) In [4]: timeit a**2 + b**2 + 2*a*b 1 loops, best of 3: 121 ms per loop In [5]: ne.set_num_threads(1) # iterator version performance matches standard version In [6]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") 10 loops, best of 3: 24.8 ms per loop In [7]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") 10 loops, best of 3: 24.3 ms per loop In [8]: ne.set_num_threads(2) # iterator version performance matches standard version In [9]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") 10 loops, best of 3: 21 ms per loop In [10]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") 10 loops, best of 3: 20.5 ms per loop # numexpr front page example with a 10x bigger array In [11]: a = np.arange(1e7) In [12]: b = np.arange(1e7) In [13]: ne.set_num_threads(2) # the iterator version performance improvement is due to # a small task scheduler tweak In [14]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") 1 loops, best of 3: 282 ms per loop In [15]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") 1 loops, best of 3: 255 ms per loop # numexpr front page example with a Fortran contiguous array In [16]: a = np.arange(1e7).reshape(10,100,100,100).T In [17]: b = np.arange(1e7).reshape(10,100,100,100).T In [18]: timeit a**2 + b**2 + 2*a*b 1 loops, best of 3: 3.22 s per loop In [19]: ne.set_num_threads(1) # even with a C-ordered output, the iterator version performs better In [20]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") 1 loops, best of 3: 3.74 s per loop In [21]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") 1 loops, best of 3: 379 ms per loop In [22]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C') 1 loops, best of 3: 2.03 s per loop In [23]: ne.set_num_threads(2) # the standard version just uses 1 thread here, I believe # the iterator version performs the same as for the flat 1e7-sized array In [24]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") 1 loops, best of 3: 3.92 s per loop In [25]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") 1 loops, best of 3: 254 ms per loop In [26]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C') 1 loops, best of 3: 1.74 s per loop -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Sun Jan 9 18:33:41 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Sun, 9 Jan 2011 15:33:41 -0800 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: References: Message-ID: Is evaluate_iter basically numpexpr but using your numpy branch or are there other changes? On Sun, Jan 9, 2011 at 2:45 PM, Mark Wiebe wrote: > As a benchmark of C-based iterator usage and to make it work properly in a > multi-threaded context, I've updated numexpr to use the new iterator. In > addition to some performance improvements, this also made it easy to add > optional out= and order= parameters to the evaluate function. The numexpr > repository with this update is available here: > > https://github.com/m-paradox/numexpr > > To use it, you need the new_iterator branch of NumPy from here: > > https://github.com/m-paradox/numpy > > In all cases tested, the iterator version of numexpr's evaluate function > matches or beats the standard version. The timing results are below, with > some explanatory comments placed inline: > > -Mark > > In [1]: import numexpr as ne > > # numexpr front page example > > In [2]: a = np.arange(1e6) > In [3]: b = np.arange(1e6) > > In [4]: timeit a**2 + b**2 + 2*a*b > 1 loops, best of 3: 121 ms per loop > > In [5]: ne.set_num_threads(1) > > # iterator version performance matches standard version > > In [6]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") > 10 loops, best of 3: 24.8 ms per loop > In [7]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") > 10 loops, best of 3: 24.3 ms per loop > > In [8]: ne.set_num_threads(2) > > # iterator version performance matches standard version > > In [9]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") > 10 loops, best of 3: 21 ms per loop > In [10]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") > 10 loops, best of 3: 20.5 ms per loop > > # numexpr front page example with a 10x bigger array > > In [11]: a = np.arange(1e7) > In [12]: b = np.arange(1e7) > > In [13]: ne.set_num_threads(2) > > # the iterator version performance improvement is due to > # a small task scheduler tweak > > In [14]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") > 1 loops, best of 3: 282 ms per loop > In [15]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") > 1 loops, best of 3: 255 ms per loop > > # numexpr front page example with a Fortran contiguous array > > In [16]: a = np.arange(1e7).reshape(10,100,100,100).T > In [17]: b = np.arange(1e7).reshape(10,100,100,100).T > > In [18]: timeit a**2 + b**2 + 2*a*b > 1 loops, best of 3: 3.22 s per loop > > In [19]: ne.set_num_threads(1) > > # even with a C-ordered output, the iterator version performs better > > In [20]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") > 1 loops, best of 3: 3.74 s per loop > In [21]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") > 1 loops, best of 3: 379 ms per loop > In [22]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C') > 1 loops, best of 3: 2.03 s per loop > > In [23]: ne.set_num_threads(2) > > # the standard version just uses 1 thread here, I believe > # the iterator version performs the same as for the flat 1e7-sized array > > In [24]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") > 1 loops, best of 3: 3.92 s per loop > In [25]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") > 1 loops, best of 3: 254 ms per loop > In [26]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C') > 1 loops, best of 3: 1.74 s per loop > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sun Jan 9 21:23:43 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 9 Jan 2011 18:23:43 -0800 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: References: Message-ID: That's right, essentially all I've done is replaced the code that handled preparing the arrays and producing blocks of values for the inner loops. There are three new parameters to evaluate_iter as well. It has an "out=" parameter just like ufuncs do, an "order=" parameter which controls the layout of the output if it's created by the function, and a "casting=" parameter which controls what kind of data conversions are permitted. -Mark On Sun, Jan 9, 2011 at 3:33 PM, John Salvatier wrote: > Is evaluate_iter basically numpexpr but using your numpy branch or are > there other changes? > > On Sun, Jan 9, 2011 at 2:45 PM, Mark Wiebe wrote: > >> As a benchmark of C-based iterator usage and to make it work properly in a >> multi-threaded context, I've updated numexpr to use the new iterator. In >> addition to some performance improvements, this also made it easy to add >> optional out= and order= parameters to the evaluate function. The numexpr >> repository with this update is available here: >> >> https://github.com/m-paradox/numexpr >> >> To use it, you need the new_iterator branch of NumPy from here: >> >> https://github.com/m-paradox/numpy >> >> In all cases tested, the iterator version of numexpr's evaluate function >> matches or beats the standard version. The timing results are below, with >> some explanatory comments placed inline: >> >> -Mark >> >> In [1]: import numexpr as ne >> >> # numexpr front page example >> >> In [2]: a = np.arange(1e6) >> In [3]: b = np.arange(1e6) >> >> In [4]: timeit a**2 + b**2 + 2*a*b >> 1 loops, best of 3: 121 ms per loop >> >> In [5]: ne.set_num_threads(1) >> >> # iterator version performance matches standard version >> >> In [6]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") >> 10 loops, best of 3: 24.8 ms per loop >> In [7]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") >> 10 loops, best of 3: 24.3 ms per loop >> >> In [8]: ne.set_num_threads(2) >> >> # iterator version performance matches standard version >> >> In [9]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") >> 10 loops, best of 3: 21 ms per loop >> In [10]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") >> 10 loops, best of 3: 20.5 ms per loop >> >> # numexpr front page example with a 10x bigger array >> >> In [11]: a = np.arange(1e7) >> In [12]: b = np.arange(1e7) >> >> In [13]: ne.set_num_threads(2) >> >> # the iterator version performance improvement is due to >> # a small task scheduler tweak >> >> In [14]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") >> 1 loops, best of 3: 282 ms per loop >> In [15]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") >> 1 loops, best of 3: 255 ms per loop >> >> # numexpr front page example with a Fortran contiguous array >> >> In [16]: a = np.arange(1e7).reshape(10,100,100,100).T >> In [17]: b = np.arange(1e7).reshape(10,100,100,100).T >> >> In [18]: timeit a**2 + b**2 + 2*a*b >> 1 loops, best of 3: 3.22 s per loop >> >> In [19]: ne.set_num_threads(1) >> >> # even with a C-ordered output, the iterator version performs better >> >> In [20]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") >> 1 loops, best of 3: 3.74 s per loop >> In [21]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") >> 1 loops, best of 3: 379 ms per loop >> In [22]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C') >> 1 loops, best of 3: 2.03 s per loop >> >> In [23]: ne.set_num_threads(2) >> >> # the standard version just uses 1 thread here, I believe >> # the iterator version performs the same as for the flat 1e7-sized array >> >> In [24]: timeit ne.evaluate("a**2 + b**2 + 2*a*b") >> 1 loops, best of 3: 3.92 s per loop >> In [25]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b") >> 1 loops, best of 3: 254 ms per loop >> In [26]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C') >> 1 loops, best of 3: 1.74 s per loop >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.EMMEL at 3ds.com Mon Jan 10 03:09:17 2011 From: Thomas.EMMEL at 3ds.com (EMMEL Thomas) Date: Mon, 10 Jan 2011 08:09:17 +0000 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array Message-ID: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> To John: > Did you try larger arrays/tuples? I would guess that makes a significant > difference. No I didn't, due to the fact that these values are coordinates in 3D (x,y,z). In fact I work with a list/array/tuple of arrays with 100000 to 1M of elements or more. What I need to do is to calculate the distance of each of these elements (coordinates) to a given coordinate and filter for the nearest. The brute force method would look like this: #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ def bruteForceSearch(points, point): minpt = min([(vec2Norm(pt, point), pt, i) for i, pt in enumerate(points)], key=itemgetter(0)) return sqrt(minpt[0]), minpt[1], minpt[2] #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ def vec2Norm(pt1,pt2): xDis = pt1[0]-pt2[0] yDis = pt1[1]-pt2[1] zDis = pt1[2]-pt2[2] return xDis*xDis+yDis*yDis+zDis*zDis I have a more clever method but it still takes a lot of time in the vec2norm-function. If you like I can attach a running example. To Ben: > Don't know how much of an impact it would have, but those timeit statements > for array creation include the import process, which are going to be > different for each module and are probably not indicative of the speed of > array creation. No, the timeit statements counts the time for the statement in the first argument only, the import-thing isn't included in the time. Thomas This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email.For other languages, go to http://www.3ds.com/terms/email-disclaimer. From cournape at gmail.com Mon Jan 10 03:53:01 2011 From: cournape at gmail.com (David Cournapeau) Date: Mon, 10 Jan 2011 17:53:01 +0900 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array In-Reply-To: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> Message-ID: On Mon, Jan 10, 2011 at 5:09 PM, EMMEL Thomas wrote: > To John: > >> Did you try larger arrays/tuples? I would guess that makes a significant >> difference. > > No I didn't, due to the fact that these values are coordinates in 3D (x,y,z). > In fact I work with a list/array/tuple of arrays with 100000 to 1M of elements or more. > What I need to do is to calculate the distance of each of these elements (coordinates) > to a given coordinate and filter for the nearest. Note that for this exact problem, there are much better methods than brute force (O(N^2) for N vectors), through e.g. kd-trees, which work very well in low-dimension. This will matter much more than numeric vs numpy cheers, David From Thomas.EMMEL at 3ds.com Mon Jan 10 04:04:48 2011 From: Thomas.EMMEL at 3ds.com (EMMEL Thomas) Date: Mon, 10 Jan 2011 09:04:48 +0000 Subject: [Numpy-discussion] speed of numpy.ndarray compared toNumeric.array In-Reply-To: References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> Message-ID: <3A0080EEBFB19C4993C24098DD0A78D108D1281D@EU-DCC-MBX01.dsone.3ds.com> > On Mon, Jan 10, 2011 at 5:09 PM, EMMEL Thomas > wrote: > > To John: > > > >> Did you try larger arrays/tuples? I would guess that makes a > significant > >> difference. > > > > No I didn't, due to the fact that these values are coordinates in 3D > (x,y,z). > > In fact I work with a list/array/tuple of arrays with 100000 to 1M of > elements or more. > > What I need to do is to calculate the distance of each of these > elements (coordinates) > > to a given coordinate and filter for the nearest. > > Note that for this exact problem, there are much better methods than > brute force (O(N^2) for N vectors), through e.g. kd-trees, which work > very well in low-dimension. This will matter much more than numeric vs > numpy > > cheers, > > David David, Yes, of course and my real implementation uses exactly these methods, but there are still issues with the arrays. Example: If I would use brute-force it will take ~5000s for a particular example to find all points in a list of points. Theoretically it should be possible to come to O(N*log(N)) with would mean ~2s in my case. My method need ~28s with tuples, but it takes ~30s with Numeric arrays and ~60s and more with numpy.ndarrays! I just use the brute-force method since it delivers the most reusable results for performance testing, the other methods are a bit dependent on the distribution of points in space. Thomas This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email.For other languages, go to http://www.3ds.com/terms/email-disclaimer. From cournape at gmail.com Mon Jan 10 04:14:38 2011 From: cournape at gmail.com (David Cournapeau) Date: Mon, 10 Jan 2011 18:14:38 +0900 Subject: [Numpy-discussion] speed of numpy.ndarray compared toNumeric.array In-Reply-To: <3A0080EEBFB19C4993C24098DD0A78D108D1281D@EU-DCC-MBX01.dsone.3ds.com> References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> <3A0080EEBFB19C4993C24098DD0A78D108D1281D@EU-DCC-MBX01.dsone.3ds.com> Message-ID: On Mon, Jan 10, 2011 at 6:04 PM, EMMEL Thomas wrote: > > Yes, of course and my real implementation uses exactly these methods, > but there are still issues with the arrays. Did you try kd-trees in scipy ? David From Thomas.EMMEL at 3ds.com Mon Jan 10 04:42:10 2011 From: Thomas.EMMEL at 3ds.com (EMMEL Thomas) Date: Mon, 10 Jan 2011 09:42:10 +0000 Subject: [Numpy-discussion] speed of numpy.ndarray comparedtoNumeric.array In-Reply-To: References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com><3A0080EEBFB19C4993C24098DD0A78D108D1281D@EU-DCC-MBX01.dsone.3ds.com> Message-ID: <3A0080EEBFB19C4993C24098DD0A78D108D1284B@EU-DCC-MBX01.dsone.3ds.com> > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- > bounces at scipy.org] On Behalf Of David Cournapeau > Sent: Montag, 10. Januar 2011 10:15 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] speed of numpy.ndarray compared > toNumeric.array > > On Mon, Jan 10, 2011 at 6:04 PM, EMMEL Thomas > wrote: > > > > > Yes, of course and my real implementation uses exactly these methods, > > but there are still issues with the arrays. > > Did you try kd-trees in scipy ? > > David David, No, I didn't, however, my method is very similar and as far as I understood kd-trees, they need some time for pre-conditioning the search-area and this is the same as I did. In fact I think my method is more or less the same as a kd-tree. The problem remains that I need to calculate the distance of some points at a certain point in my code (when I am in a leaf of a kd-tree). For example when I use 100000 points I end up in a leaf of my kd-tree where I need to calculate the distance for only 100 points or less (depends on the tree). The problem still remains and I use cProfile to get into the details. Most of the time it takes is in vec2Norm, everything else is very short but I need to call it as often as I have points (again 100000) and this is why 100000*0.001s takes some time. For numpy.ndarray this is 0.002s-0.003s, for Numeric.array 0.001-0.002s and for tuple ~0.001s (values from cProfile). And, by the way, the same problem appears when I need to calculate the cross-product of several vectors. In this case I have a geometry in 3D with a surface of thousands of triangles and I need to calculate the normal of each of these triangles. Again, doing a loop over tuples is faster than arrays, although in this case numpy.cross is twice as fast as Numeric.cross_product. Thomas This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email.For other languages, go to http://www.3ds.com/terms/email-disclaimer. This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email.For other languages, go to http://www.3ds.com/terms/email-disclaimer. From faltet at pytables.org Mon Jan 10 05:05:27 2011 From: faltet at pytables.org (Francesc Alted) Date: Mon, 10 Jan 2011 11:05:27 +0100 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: References: Message-ID: <201101101105.27421.faltet@pytables.org> A Sunday 09 January 2011 23:45:02 Mark Wiebe escrigu?: > As a benchmark of C-based iterator usage and to make it work properly > in a multi-threaded context, I've updated numexpr to use the new > iterator. In addition to some performance improvements, this also > made it easy to add optional out= and order= parameters to the > evaluate function. The numexpr repository with this update is > available here: > > https://github.com/m-paradox/numexpr > > To use it, you need the new_iterator branch of NumPy from here: > > https://github.com/m-paradox/numpy > > In all cases tested, the iterator version of numexpr's evaluate > function matches or beats the standard version. The timing results > are below, with some explanatory comments placed inline: [clip] Your patch looks mostly fine to my eyes; good job! Unfortunately, I've been unable to compile your new_iterator branch of NumPy: numpy/core/src/multiarray/multiarraymodule.c:45:33: fatal error: new_iterator_pywrap.h: El fitxer o directori no existeix Apparently, you forgot to add the new_iterator_pywrap.h file. My idea would be to merge your patch in numexpr and make the new `evaluate_iter()` the default (i.e. make it `evaluate()`). However, by looking into the code, it seems to me that unaligned arrays (this is an important use case when operating with columns of structured arrays) may need more fine-tuning for Intel platforms. When I can compile the new_iterator branch, I'll give a try at unaligned data benchs. Also, I'd like to try out the new thread scheduling that you suggested to me privately (i.e. T0T1T0T1... vs T0T0...T1T1...). Thanks! -- Francesc Alted From sebastian at sipsolutions.net Mon Jan 10 06:22:50 2011 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 10 Jan 2011 12:22:50 +0100 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array In-Reply-To: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> Message-ID: <1294658570.2490.33.camel@sebastian> Hey, On Mon, 2011-01-10 at 08:09 +0000, EMMEL Thomas wrote: > #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > def bruteForceSearch(points, point): > > minpt = min([(vec2Norm(pt, point), pt, i) > for i, pt in enumerate(points)], key=itemgetter(0)) > return sqrt(minpt[0]), minpt[1], minpt[2] > > #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > def vec2Norm(pt1,pt2): > xDis = pt1[0]-pt2[0] > yDis = pt1[1]-pt2[1] > zDis = pt1[2]-pt2[2] > return xDis*xDis+yDis*yDis+zDis*zDis > > I have a more clever method but it still takes a lot of time in the vec2norm-function. > If you like I can attach a running example. > if you use the vec2Norm function as you wrote it there, this code is not vectorized at all, and as such of course numpy would be slowest as it has the most overhead and no advantages for non vectorized code, you simply can't write python code like that and expect it to be fast for these kind of calculations. Your function should look more like this: import numpy as np def bruteForceSearch(points, point): dists = points - point # that may need point[None,:] or such for broadcasting to work dists *= dists dists = dists.sum(1) I = np.argmin(dists) return sqrt(dists[I]), points[I], I If points is small, this may not help much (though compared to this exact code my guess is it probably would), if points is larger it should speed up things tremendously (unless you run into RAM problems). It may be that you need to fiddle around with axes, I did not check the code. If this is not good enough for you (you will need to port it (and maybe the next outer loop as well) to Cython or write it in C/C++ and make sure it can optimize things right. Also I think somewhere in scipy there were some distance tools that may be already in C and nice fast, but not sure. I hope I got this right and it helps, Sebastian From faltet at pytables.org Mon Jan 10 06:55:16 2011 From: faltet at pytables.org (Francesc Alted) Date: Mon, 10 Jan 2011 12:55:16 +0100 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: <201101101105.27421.faltet@pytables.org> References: <201101101105.27421.faltet@pytables.org> Message-ID: <201101101255.16286.faltet@pytables.org> A Monday 10 January 2011 11:05:27 Francesc Alted escrigu?: > Also, I'd like to try out the new thread scheduling that you > suggested to me privately (i.e. T0T1T0T1... vs T0T0...T1T1...). I've just implemented the new partition schema in numexpr (T0T0...T1T1..., being the original T0T1T0T1...). I'm attaching the patch for this. The results are a bit confusing. For example, using the attached benchmark (poly.py), I get these results for a common dual- core machine, non-NUMA machine: With the T0T1...T0T1... (original) schema: Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points Using numpy: *** Time elapsed: 3.497 Using numexpr: *** Time elapsed for 1 threads: 1.279000 *** Time elapsed for 2 threads: 0.688000 With the T0T0...T1T1... (new) schema: Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points Using numpy: *** Time elapsed: 3.454 Using numexpr: *** Time elapsed for 1 threads: 1.268000 *** Time elapsed for 2 threads: 0.754000 which is around a 10% slower (2 threads) than the original partition. The results are a bit different on a NUMA machine (8 physical cores, 16 logical cores via hyper-threading): With the T0T1...T0T1... (original) partition: Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points Using numpy: *** Time elapsed: 3.005 Using numexpr: *** Time elapsed for 1 threads: 1.109000 *** Time elapsed for 2 threads: 0.677000 *** Time elapsed for 3 threads: 0.496000 *** Time elapsed for 4 threads: 0.394000 *** Time elapsed for 5 threads: 0.324000 *** Time elapsed for 6 threads: 0.287000 *** Time elapsed for 7 threads: 0.247000 *** Time elapsed for 8 threads: 0.234000 *** Time elapsed for 9 threads: 0.242000 *** Time elapsed for 10 threads: 0.239000 *** Time elapsed for 11 threads: 0.241000 *** Time elapsed for 12 threads: 0.235000 *** Time elapsed for 13 threads: 0.226000 *** Time elapsed for 14 threads: 0.214000 *** Time elapsed for 15 threads: 0.235000 *** Time elapsed for 16 threads: 0.218000 With the T0T0...T1T1... (new) partition: Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points Using numpy: *** Time elapsed: 3.003 Using numexpr: *** Time elapsed for 1 threads: 1.106000 *** Time elapsed for 2 threads: 0.617000 *** Time elapsed for 3 threads: 0.442000 *** Time elapsed for 4 threads: 0.345000 *** Time elapsed for 5 threads: 0.296000 *** Time elapsed for 6 threads: 0.257000 *** Time elapsed for 7 threads: 0.237000 *** Time elapsed for 8 threads: 0.260000 *** Time elapsed for 9 threads: 0.245000 *** Time elapsed for 10 threads: 0.261000 *** Time elapsed for 11 threads: 0.238000 *** Time elapsed for 12 threads: 0.210000 *** Time elapsed for 13 threads: 0.218000 *** Time elapsed for 14 threads: 0.200000 *** Time elapsed for 15 threads: 0.235000 *** Time elapsed for 16 threads: 0.198000 In this case, the performance is similar, with perhaps a slight advantage for the new partition scheme, but I don't know if it is worth to make it the default (probably not, as this partition performs clearly worse on non-NUMA machines). At any rate, both partitions perform very close to the aggregated memory bandwidth of NUMA machines (around 10 GB/s in the above case). In general, I don't think there is much point in using Intel's TBB in numexpr because the existing implementation already hits memory bandwidth limits pretty early (around 10 threads in the latter example). -- Francesc Alted -------------- next part -------------- A non-text attachment was scrubbed... Name: new_partition.diff Type: text/x-patch Size: 3778 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: poly.py Type: text/x-python Size: 1620 bytes Desc: not available URL: From totonixsame at gmail.com Mon Jan 10 08:53:05 2011 From: totonixsame at gmail.com (totonixsame at gmail.com) Date: Mon, 10 Jan 2011 11:53:05 -0200 Subject: [Numpy-discussion] Drawing circles in a numpy array Message-ID: Hi all, I have this problem: Given some point draw a circle centered in this point with radius r. I'm doing that using numpy this way (Snippet code from here [1]): >>> # Create the initial black and white image >>> import numpy as np >>> from scipy import ndimage >>> a = np.zeros((512, 512)).astype(uint8) #unsigned integer type needed by watershed >>> y, x = np.ogrid[0:512, 0:512] >>> m1 = ((y-200)**2 + (x-100)**2 < 30**2) >>> m2 = ((y-350)**2 + (x-400)**2 < 20**2) >>> m3 = ((y-260)**2 + (x-200)**2 < 20**2) >>> a[m1+m2+m3]=1 >>> imshow(a, cmap = cm.gray)# left plot in the image above The problem is that it have to evaluate all values from 0 to image size (in snippet, from 0 to 512 in X and Y dimensions). There is a faster way of doing that? Without evaluate all that values? For example: only evaluate from 0 to 30, in a circle centered in (0, 0) with radius 30. Thanks! Thiago Franco de Moraes [1] - http://www.scipy.org/Cookbook/Watershed From Thomas.EMMEL at 3ds.com Mon Jan 10 08:54:50 2011 From: Thomas.EMMEL at 3ds.com (EMMEL Thomas) Date: Mon, 10 Jan 2011 13:54:50 +0000 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array In-Reply-To: <1294658570.2490.33.camel@sebastian> References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> <1294658570.2490.33.camel@sebastian> Message-ID: <3A0080EEBFB19C4993C24098DD0A78D121073232@EU-DCC-MBX02.dsone.3ds.com> Hey back... > > > #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~ > ~~~ > > def bruteForceSearch(points, point): > > > > minpt = min([(vec2Norm(pt, point), pt, i) > > for i, pt in enumerate(points)], key=itemgetter(0)) > > return sqrt(minpt[0]), minpt[1], minpt[2] > > > > > #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~ > ~~~~ > > def vec2Norm(pt1,pt2): > > xDis = pt1[0]-pt2[0] > > yDis = pt1[1]-pt2[1] > > zDis = pt1[2]-pt2[2] > > return xDis*xDis+yDis*yDis+zDis*zDis > > > > I have a more clever method but it still takes a lot of time in the > vec2norm-function. > > If you like I can attach a running example. > > > > if you use the vec2Norm function as you wrote it there, this code is > not vectorized at all, and as such of course numpy would be slowest as > it has the most overhead and no advantages for non vectorized code, > you simply can't write python code like that and expect it to be fast > for these kind of calculations. > > Your function should look more like this: > > import numpy as np > > def bruteForceSearch(points, point): > dists = points - point > # that may need point[None,:] or such for broadcasting to work > dists *= dists > dists = dists.sum(1) > I = np.argmin(dists) > return sqrt(dists[I]), points[I], I > > If points is small, this may not help much (though compared to this > exact code my guess is it probably would), if points is larger it > should speed up things tremendously (unless you run into RAM > problems). It may be that you need to fiddle around with axes, I did > not check the code. > If this is not good enough for you (you will need to port it (and > maybe the next outer loop as well) to Cython or write it in C/C++ and > make sure it can optimize things right. Also I think somewhere in > scipy there were some distance tools that may be already in C and nice > fast, but not sure. > > I hope I got this right and it helps, > > Sebastian > I see the point and it was very helpful to understand the behavior of the arrays a bit better. And your attempt improved the bruteForceSearch which is up to 6 times faster. But in case of a leaf in a kd-tree you end up with 50, 20, 10 or less points where the speed-up is reversed. In this particular case 34000 runs take 90s with your method and 50s with mine (not the bruteForce). I see now the limits of the arrays but of course I see the chances and - coming back to my original question - it seems that Numeric arrays were faster for my kind of application but they might be slower for larger amounts of data. Regards Thomas This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email.For other languages, go to http://www.3ds.com/terms/email-disclaimer. From pascal22p at parois.net Mon Jan 10 10:25:33 2011 From: pascal22p at parois.net (Pascal) Date: Mon, 10 Jan 2011 16:25:33 +0100 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array In-Reply-To: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> Message-ID: <4D2B24ED.10105@parois.net> Hi, On 01/10/2011 09:09 AM, EMMEL Thomas wrote: > > No I didn't, due to the fact that these values are coordinates in 3D (x,y,z). > In fact I work with a list/array/tuple of arrays with 100000 to 1M of elements or more. > What I need to do is to calculate the distance of each of these elements (coordinates) > to a given coordinate and filter for the nearest. > The brute force method would look like this: > > > #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > def bruteForceSearch(points, point): > > minpt = min([(vec2Norm(pt, point), pt, i) > for i, pt in enumerate(points)], key=itemgetter(0)) > return sqrt(minpt[0]), minpt[1], minpt[2] > > #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > def vec2Norm(pt1,pt2): > xDis = pt1[0]-pt2[0] > yDis = pt1[1]-pt2[1] > zDis = pt1[2]-pt2[2] > return xDis*xDis+yDis*yDis+zDis*zDis > I am not sure I understood the problem properly but here what I would use to calculate a distance from horizontally stacked vectors (big): ref=numpy.array([0.1,0.2,0.3]) big=numpy.random.randn(1000000, 3) big=numpy.add(big,-ref) distsquared=numpy.sum(big**2, axis=1) Pascal From n.becker at amolf.nl Mon Jan 10 11:08:38 2011 From: n.becker at amolf.nl (Nils Becker) Date: Mon, 10 Jan 2011 17:08:38 +0100 Subject: [Numpy-discussion] indexing of rank-0 structured arrays: why not? Message-ID: <4D2B2F06.6090100@amolf.nl> Hi, I noticed that I can index into a dtype when I take an element of a rank-1 array but not if I make a rank-0 array directly. This seems inconsistent. A bug? Nils In [76]: np.version.version Out[76]: '1.5.1' In [78]: dt = np.dtype([('x', ' in () IndexError: 0-d arrays can't be indexed In [87]: a_rank_0['x'] Out[87]: array(0.0) From renesd at gmail.com Mon Jan 10 11:23:05 2011 From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=) Date: Mon, 10 Jan 2011 16:23:05 +0000 Subject: [Numpy-discussion] speed of numpy.ndarray compared to Numeric.array In-Reply-To: <4D2B24ED.10105@parois.net> References: <3A0080EEBFB19C4993C24098DD0A78D108D1275F@EU-DCC-MBX01.dsone.3ds.com> <4D2B24ED.10105@parois.net> Message-ID: Hi, Spatial hashes are the common solution. Another common optimization is using the distance squared for collision detection. Since you do not need the expensive sqrt for this calc. cu. On Mon, Jan 10, 2011 at 3:25 PM, Pascal wrote: > Hi, > > On 01/10/2011 09:09 AM, EMMEL Thomas wrote: >> >> No I didn't, due to the fact that these values are coordinates in 3D (x,y,z). >> In fact I work with a list/array/tuple of arrays with 100000 to 1M of elements or more. >> What I need to do is to calculate the distance of each of these elements (coordinates) >> to a given coordinate and filter for the nearest. >> The brute force method would look like this: >> >> >> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> def bruteForceSearch(points, point): >> >> ? ? ?minpt = min([(vec2Norm(pt, point), pt, i) >> ? ? ? ? ? ? ? ? ? for i, pt in enumerate(points)], key=itemgetter(0)) >> ? ? ?return sqrt(minpt[0]), minpt[1], minpt[2] >> >> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> def vec2Norm(pt1,pt2): >> ? ? ?xDis = pt1[0]-pt2[0] >> ? ? ?yDis = pt1[1]-pt2[1] >> ? ? ?zDis = pt1[2]-pt2[2] >> ? ? ?return xDis*xDis+yDis*yDis+zDis*zDis >> > > I am not sure I understood the problem properly but here what I would > use to calculate a distance from horizontally stacked vectors (big): > > ref=numpy.array([0.1,0.2,0.3]) > big=numpy.random.randn(1000000, 3) > > big=numpy.add(big,-ref) > distsquared=numpy.sum(big**2, axis=1) > > Pascal > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mwwiebe at gmail.com Mon Jan 10 11:54:16 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 10 Jan 2011 08:54:16 -0800 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: <201101101105.27421.faltet@pytables.org> References: <201101101105.27421.faltet@pytables.org> Message-ID: On Mon, Jan 10, 2011 at 2:05 AM, Francesc Alted wrote: > > > Your patch looks mostly fine to my eyes; good job! Unfortunately, I've > been unable to compile your new_iterator branch of NumPy: > > numpy/core/src/multiarray/multiarraymodule.c:45:33: fatal error: > new_iterator_pywrap.h: El fitxer o directori no existeix > > Apparently, you forgot to add the new_iterator_pywrap.h file. > Oops, that's added now. > My idea would be to merge your patch in numexpr and make the new > `evaluate_iter()` the default (i.e. make it `evaluate()`). However, by > looking into the code, it seems to me that unaligned arrays (this is an > important use case when operating with columns of structured arrays) may > need more fine-tuning for Intel platforms. When I can compile the > new_iterator branch, I'll give a try at unaligned data benchs. > The aligned case should just be a matter of conditionally removing the NPY_ITER_ALIGNED flag in two places. The new code also needs support for the reduce operation. I didn't look too closely at the code for that, but a nested iteration pattern is probably appropriate. If the inner loop is just allowed to be one dimension, it could be done without actually creating the inner iterator. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 10 12:05:45 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 10 Jan 2011 11:05:45 -0600 Subject: [Numpy-discussion] indexing of rank-0 structured arrays: why not? In-Reply-To: <4D2B2F06.6090100@amolf.nl> References: <4D2B2F06.6090100@amolf.nl> Message-ID: On Mon, Jan 10, 2011 at 10:08, Nils Becker wrote: > Hi, > > I noticed that I can index into a dtype when I take an element > of a rank-1 array but not if I make a rank-0 array directly. This seems > inconsistent. A bug? Not a bug. Since there is no axis, you cannot use integers to index into a rank-0 array. Use an empty tuple instead. [~] |1> dt = np.dtype([('x', ' a_rank_0 = np.zeros((), dtype=dt) [~] |3> a_rank_0[()] (0.0, 0.0) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From faltet at pytables.org Mon Jan 10 12:47:02 2011 From: faltet at pytables.org (Francesc Alted) Date: Mon, 10 Jan 2011 18:47:02 +0100 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: References: <201101101105.27421.faltet@pytables.org> Message-ID: <201101101847.02890.faltet@pytables.org> A Monday 10 January 2011 17:54:16 Mark Wiebe escrigu?: > > Apparently, you forgot to add the new_iterator_pywrap.h file. > > Oops, that's added now. Excellent. It works now. > The aligned case should just be a matter of conditionally removing > the NPY_ITER_ALIGNED flag in two places. Wow, the support for unaligned in current `evaluate_iter()` seems pretty nice already: $ python unaligned-simple.py -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Numexpr version: 1.5.dev NumPy version: 2.0.0.dev-ebc963d Python version: 2.6.1 (r261:67515, Feb 3 2009, 17:34:37) [GCC 4.3.2 [gcc-4_3-branch revision 141291]] Platform: linux2-x86_64 AMD/Intel CPU? True VML available? False Detected cores: 2 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- NumPy aligned: 0.658 s NumPy unaligned: 1.597 s Numexpr aligned: 0.59 s Numexpr aligned (new iter): 0.59 s Numexpr unaligned: 0.51 s Numexpr unaligned (new_iter): 0.528 s so, the new code is just < 5% slower. I suppose that removing the NPY_ITER_ALIGNED flag would give us a bit more performance, but that's great as it is now. How did you do that? Your new_iter branch in NumPy already deals with unaligned data, right? > The new code also needs support for the reduce operation. I didn't > look too closely at the code for that, but a nested iteration > pattern is probably appropriate. If the inner loop is just allowed > to be one dimension, it could be done without actually creating the > inner iterator. Well, if you can support reduce operations with your patch that would be extremely good news as I'm afraid that the current reduce code is a bit broken in Numexpr (at least, I vaguely remember seeing it working badly in some cases). -- Francesc Alted From n.becker at amolf.nl Mon Jan 10 13:15:13 2011 From: n.becker at amolf.nl (Nils Becker) Date: Mon, 10 Jan 2011 19:15:13 +0100 Subject: [Numpy-discussion] indexing of rank-0 structured arrays: why not? Message-ID: <4D2B4CB1.2090404@amolf.nl> Robert, your answer does work: after indexing with () I can then further index into the datatype. In [115]: a_rank_0[()][0] Out[115]: 0.0 I guess I just found the fact confusing that a_rank_1[0] and a_rank_0 compare and print equal but behave differently under indexing. More precisely if I do In [117]: b = a_rank_1[0] then In [118]: b.shape Out[118]: () and In [120]: a_rank_0 == b Out[120]: True but In [119]: b[0] Out[119]: 0.0 works but a_rank_0[0] doesn't. I thought b is a rank-0 array which it apparently is not since it can be indexed. So maybe b[0] should fail for consistency? N. From mwwiebe at gmail.com Mon Jan 10 13:29:33 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 10 Jan 2011 10:29:33 -0800 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: <201101101847.02890.faltet@pytables.org> References: <201101101105.27421.faltet@pytables.org> <201101101847.02890.faltet@pytables.org> Message-ID: On Mon, Jan 10, 2011 at 9:47 AM, Francesc Alted wrote: > > > so, the new code is just < 5% slower. I suppose that removing the > NPY_ITER_ALIGNED flag would give us a bit more performance, but that's > great as it is now. How did you do that? Your new_iter branch in NumPy > already deals with unaligned data, right? > Take a look at lowlevel_strided_loops.c.src. In this case, the buffering setup code calls PyArray_GetDTypeTransferFunction, which in turn calls PyArray_GetStridedCopyFn, which on an x86 platform returns _aligned_strided_to_contig_size8. This function has a simple loop of copies using a npy_uint64 data type. > The new code also needs support for the reduce operation. I didn't > > look too closely at the code for that, but a nested iteration > > pattern is probably appropriate. If the inner loop is just allowed > > to be one dimension, it could be done without actually creating the > > inner iterator. > > Well, if you can support reduce operations with your patch that would be > extremely good news as I'm afraid that the current reduce code is a bit > broken in Numexpr (at least, I vaguely remember seeing it working badly > in some cases). > Cool, I'll take a look at some point. I imagine with the most obvious implementation small reductions would perform poorly. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 10 14:16:02 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 10 Jan 2011 13:16:02 -0600 Subject: [Numpy-discussion] indexing of rank-0 structured arrays: why not? In-Reply-To: <4D2B4CB1.2090404@amolf.nl> References: <4D2B4CB1.2090404@amolf.nl> Message-ID: On Mon, Jan 10, 2011 at 12:15, Nils Becker wrote: > Robert, > > your answer does work: after indexing with () I can then further index > into the datatype. > > In [115]: a_rank_0[()][0] > Out[115]: 0.0 > > I guess I just found the fact confusing that a_rank_1[0] and a_rank_0 > compare and print equal but behave differently under indexing. They do not print equal. Many things compare equal but do not behave the same. > More precisely if I do > In [117]: b = a_rank_1[0] > > then > > In [118]: b.shape > Out[118]: () > > and > > In [120]: a_rank_0 == b > Out[120]: True > > but > > In [119]: b[0] > Out[119]: 0.0 > > works but a_rank_0[0] doesn't. I thought b is a rank-0 array which it > apparently is not since it can be indexed. So maybe b[0] should fail for > consistency? No, b is a record scalar. It can be indexed because it is often convient to treat such records like tuples. This replaces the default indexing behavior of scalars (which is to simply disallow indexing). a_rank_0 is an array, so the array indexing semantics are the default, and we do not change them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From mwwiebe at gmail.com Mon Jan 10 14:35:08 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 10 Jan 2011 11:35:08 -0800 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: <201101101255.16286.faltet@pytables.org> References: <201101101105.27421.faltet@pytables.org> <201101101255.16286.faltet@pytables.org> Message-ID: I'm a bit curious why the jump from 1 to 2 threads is scaling so poorly. Your timings have improvement factors of 1.85, 1.68, 1.64, and 1.79. Since the computation is trivial data parallelism, and I believe it's still pretty far off the memory bandwidth limit, I would expect a speedup of 1.95 or higher. One reason I suggest TBB is that it can produce a pretty good schedule while still adapting to load produced by other processes and threads. Numexpr currently does that well, but simply dividing the data into one piece per thread doesn't handle that case very well, and makes it possible that one thread spends a fair bit of time finishing up while the others idle at the end. Perhaps using Cilk would be a better option than TBB, since the code could remain in C. -Mark On Mon, Jan 10, 2011 at 3:55 AM, Francesc Alted wrote: > A Monday 10 January 2011 11:05:27 Francesc Alted escrigu?: > > Also, I'd like to try out the new thread scheduling that you > > suggested to me privately (i.e. T0T1T0T1... vs T0T0...T1T1...). > > I've just implemented the new partition schema in numexpr > (T0T0...T1T1..., being the original T0T1T0T1...). I'm attaching the > patch for this. The results are a bit confusing. For example, using > the attached benchmark (poly.py), I get these results for a common dual- > core machine, non-NUMA machine: > > With the T0T1...T0T1... (original) schema: > > Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points > Using numpy: > *** Time elapsed: 3.497 > Using numexpr: > *** Time elapsed for 1 threads: 1.279000 > *** Time elapsed for 2 threads: 0.688000 > > With the T0T0...T1T1... (new) schema: > > Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points > Using numpy: > *** Time elapsed: 3.454 > Using numexpr: > *** Time elapsed for 1 threads: 1.268000 > *** Time elapsed for 2 threads: 0.754000 > > which is around a 10% slower (2 threads) than the original partition. > > The results are a bit different on a NUMA machine (8 physical cores, 16 > logical cores via hyper-threading): > > With the T0T1...T0T1... (original) partition: > > Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points > Using numpy: > *** Time elapsed: 3.005 > Using numexpr: > *** Time elapsed for 1 threads: 1.109000 > *** Time elapsed for 2 threads: 0.677000 > *** Time elapsed for 3 threads: 0.496000 > *** Time elapsed for 4 threads: 0.394000 > *** Time elapsed for 5 threads: 0.324000 > *** Time elapsed for 6 threads: 0.287000 > *** Time elapsed for 7 threads: 0.247000 > *** Time elapsed for 8 threads: 0.234000 > *** Time elapsed for 9 threads: 0.242000 > *** Time elapsed for 10 threads: 0.239000 > *** Time elapsed for 11 threads: 0.241000 > *** Time elapsed for 12 threads: 0.235000 > *** Time elapsed for 13 threads: 0.226000 > *** Time elapsed for 14 threads: 0.214000 > *** Time elapsed for 15 threads: 0.235000 > *** Time elapsed for 16 threads: 0.218000 > > With the T0T0...T1T1... (new) partition: > > Computing: '((.25*x + .75)*x - 1.5)*x - 2' with 100000000 points > Using numpy: > *** Time elapsed: 3.003 > Using numexpr: > *** Time elapsed for 1 threads: 1.106000 > *** Time elapsed for 2 threads: 0.617000 > *** Time elapsed for 3 threads: 0.442000 > *** Time elapsed for 4 threads: 0.345000 > *** Time elapsed for 5 threads: 0.296000 > *** Time elapsed for 6 threads: 0.257000 > *** Time elapsed for 7 threads: 0.237000 > *** Time elapsed for 8 threads: 0.260000 > *** Time elapsed for 9 threads: 0.245000 > *** Time elapsed for 10 threads: 0.261000 > *** Time elapsed for 11 threads: 0.238000 > *** Time elapsed for 12 threads: 0.210000 > *** Time elapsed for 13 threads: 0.218000 > *** Time elapsed for 14 threads: 0.200000 > *** Time elapsed for 15 threads: 0.235000 > *** Time elapsed for 16 threads: 0.198000 > > In this case, the performance is similar, with perhaps a slight > advantage for the new partition scheme, but I don't know if it is worth > to make it the default (probably not, as this partition performs clearly > worse on non-NUMA machines). At any rate, both partitions perform very > close to the aggregated memory bandwidth of NUMA machines (around 10 > GB/s in the above case). > > In general, I don't think there is much point in using Intel's TBB in > numexpr because the existing implementation already hits memory > bandwidth limits pretty early (around 10 threads in the latter example). > > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Jan 11 00:45:28 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 10 Jan 2011 21:45:28 -0800 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: References: <201101101105.27421.faltet@pytables.org> <201101101255.16286.faltet@pytables.org> Message-ID: On Mon, Jan 10, 2011 at 11:35 AM, Mark Wiebe wrote: > I'm a bit curious why the jump from 1 to 2 threads is scaling so poorly. > Your timings have improvement factors of 1.85, 1.68, 1.64, and 1.79. Since > the computation is trivial data parallelism, and I believe it's still pretty > far off the memory bandwidth limit, I would expect a speedup of 1.95 or > higher. It looks like it is the memory bandwidth which is limiting the scalability. The slower operations scale much better than faster ones. Below are some timings of successively faster operations. When the operation is slow enough, it scales like I was expecting... -Mark Computing: 'cos(x**1.1) + sin(x**1.3) + tan(x**2.3)' with 20000000 points Using numpy: *** Time elapsed: 14.47 Using numexpr: *** Time elapsed for 1 threads: 12.659000 *** Time elapsed for 2 threads: 6.357000 *** Ratio from 1 to 2 threads: 1.991348 Using numexpr_iter: *** Time elapsed for 1 threads: 12.573000 *** Time elapsed for 2 threads: 6.398000 *** Ratio from 1 to 2 threads: 1.965145 Computing: 'x**2.345' with 20000000 points Using numpy: *** Time elapsed: 3.506 Using numexpr: *** Time elapsed for 1 threads: 3.375000 *** Time elapsed for 2 threads: 1.747000 *** Ratio from 1 to 2 threads: 1.931883 Using numexpr_iter: *** Time elapsed for 1 threads: 3.266000 *** Time elapsed for 2 threads: 1.760000 *** Ratio from 1 to 2 threads: 1.855682 Computing: '1*x+2*x+3*x+4*x+5*x+6*x+7*x+8*x+9*x+10*x+11*x+12*x+13*x+14*x' with 20000000 points Using numpy: *** Time elapsed: 9.774 Using numexpr: *** Time elapsed for 1 threads: 1.314000 *** Time elapsed for 2 threads: 0.703000 *** Ratio from 1 to 2 threads: 1.869132 Using numexpr_iter: *** Time elapsed for 1 threads: 1.257000 *** Time elapsed for 2 threads: 0.683000 *** Ratio from 1 to 2 threads: 1.840410 Computing: 'x+2.345' with 20000000 points Using numpy: *** Time elapsed: 0.343 Using numexpr: *** Time elapsed for 1 threads: 0.348000 *** Time elapsed for 2 threads: 0.300000 *** Ratio from 1 to 2 threads: 1.160000 Using numexpr_iter: *** Time elapsed for 1 threads: 0.354000 *** Time elapsed for 2 threads: 0.293000 *** Ratio from 1 to 2 threads: 1.208191 -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Tue Jan 11 05:44:15 2011 From: faltet at pytables.org (Francesc Alted) Date: Tue, 11 Jan 2011 11:44:15 +0100 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: References: <201101101847.02890.faltet@pytables.org> Message-ID: <201101111144.15649.faltet@pytables.org> A Monday 10 January 2011 19:29:33 Mark Wiebe escrigu?: > > so, the new code is just < 5% slower. I suppose that removing the > > NPY_ITER_ALIGNED flag would give us a bit more performance, but > > that's great as it is now. How did you do that? Your new_iter > > branch in NumPy already deals with unaligned data, right? > > Take a look at lowlevel_strided_loops.c.src. In this case, the > buffering setup code calls PyArray_GetDTypeTransferFunction, which > in turn calls PyArray_GetStridedCopyFn, which on an x86 platform > returns > _aligned_strided_to_contig_size8. This function has a simple loop of > copies using a npy_uint64 data type. I see. Brilliant! > > Well, if you can support reduce operations with your patch that > > would be extremely good news as I'm afraid that the current reduce > > code is a bit broken in Numexpr (at least, I vaguely remember > > seeing it working badly in some cases). > > Cool, I'll take a look at some point. I imagine with the most > obvious implementation small reductions would perform poorly. IMO, reductions like sum() or prod() are mainly limited my memory access, so my advise would be to not try to over-optimize here, and just make use of the new iterator. We can refine performance later on. -- Francesc Alted From faltet at pytables.org Tue Jan 11 06:58:27 2011 From: faltet at pytables.org (Francesc Alted) Date: Tue, 11 Jan 2011 12:58:27 +0100 Subject: [Numpy-discussion] numexpr with the new iterator In-Reply-To: References: Message-ID: <201101111258.27489.faltet@pytables.org> A Tuesday 11 January 2011 06:45:28 Mark Wiebe escrigu?: > On Mon, Jan 10, 2011 at 11:35 AM, Mark Wiebe wrote: > > I'm a bit curious why the jump from 1 to 2 threads is scaling so > > poorly. > > > > Your timings have improvement factors of 1.85, 1.68, 1.64, and > > 1.79. Since > > > > the computation is trivial data parallelism, and I believe it's > > still pretty far off the memory bandwidth limit, I would expect a > > speedup of 1.95 or higher. > > It looks like it is the memory bandwidth which is limiting the > scalability. Indeed, this is an increasingly important problem for modern computers. You may want to read: http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf ;-) > The slower operations scale much better than faster > ones. Below are some timings of successively faster operations. > When the operation is slow enough, it scales like I was expecting... [clip] Yeah, for another example on this with more threads, see: http://code.google.com/p/numexpr/wiki/MultiThreadVM OTOH, I was curious about the performance of the new iterator with Intel's VML, but it seems to work decently too: $ python bench/vml_timing.py (original numexpr, *no* VML support) *************** Numexpr vs NumPy speed-ups ******************* Contiguous case: 1.72 (mean), 0.92 (min), 3.07 (max) Strided case: 2.1 (mean), 0.98 (min), 3.52 (max) Unaligned case: 2.35 (mean), 1.35 (min), 3.31 (max) $ python bench/vml_timing.py (original numexpr, VML support) *************** Numexpr vs NumPy speed-ups ******************* Contiguous case: 3.83 (mean), 1.1 (min), 10.19 (max) Strided case: 3.21 (mean), 0.98 (min), 7.45 (max) Unaligned case: 3.6 (mean), 1.47 (min), 7.87 (max) $ python bench/vml_timing.py (new iter numexpr, VML support) *************** Numexpr vs NumPy speed-ups ******************* Contiguous case: 3.56 (mean), 1.12 (min), 7.38 (max) Strided case: 2.37 (mean), 0.09 (min), 7.63 (max) Unaligned case: 3.56 (mean), 2.08 (min), 5.88 (max) However, there a couple of quirks here. 1) The original Numexpr performs generally faster than the iter version. 2) The strided case is quite worse for the iter version. I've isolated the tests that performs worse for the iter version, and here are a couple of samples: *************** Expression: exp(f3) numpy: 0.0135 numpy strided: 0.0144 numpy unaligned: 0.0200 numexpr: 0.0020 Speed-up of numexpr over numpy: 6.6584 numexpr strided: 0.1495 Speed-up of numexpr over numpy: 0.0962 numexpr unaligned: 0.0049 Speed-up of numexpr over numpy: 4.0859 *************** Expression: sin(f3)>cos(f4) numpy: 0.0291 numpy strided: 0.0366 numpy unaligned: 0.0407 numexpr: 0.0166 Speed-up of numexpr over numpy: 1.7518 numexpr strided: 0.1551 Speed-up of numexpr over numpy: 0.2361 numexpr unaligned: 0.0175 Speed-up of numexpr over numpy: 2.3246 Maybe you can shed some light on what's going on here (shall we discuss this off-the-list so as to not bore people too much?). -- Francesc Alted From totonixsame at gmail.com Tue Jan 11 11:13:03 2011 From: totonixsame at gmail.com (totonixsame at gmail.com) Date: Tue, 11 Jan 2011 14:13:03 -0200 Subject: [Numpy-discussion] Drawing circles in a numpy array In-Reply-To: References: Message-ID: On Mon, Jan 10, 2011 at 11:53 AM, totonixsame at gmail.com wrote: > Hi all, > > I have this problem: Given some point draw a circle centered in this > point with radius r. I'm doing that using numpy this way (Snippet code > from here [1]): > >>>> # Create the initial black and white image >>>> import numpy as np >>>> from scipy import ndimage >>>> a = np.zeros((512, 512)).astype(uint8) #unsigned integer type needed by watershed >>>> y, x = np.ogrid[0:512, 0:512] >>>> m1 = ((y-200)**2 + (x-100)**2 < 30**2) >>>> m2 = ((y-350)**2 + (x-400)**2 < 20**2) >>>> m3 = ((y-260)**2 + (x-200)**2 < 20**2) >>>> a[m1+m2+m3]=1 >>>> imshow(a, cmap = cm.gray)# left plot in the image above > > The problem is that it have to evaluate all values from 0 to image > size (in snippet, from 0 to 512 in X and Y dimensions). There is a > faster way of doing that? Without evaluate all that values? For > example: only evaluate from 0 to 30, in a circle centered in (0, 0) > with radius 30. > > Thanks! > Thiago Franco de Moraes > > [1] - http://www.scipy.org/Cookbook/Watershed Hi, I've just seen I can do something like this: >>> radius = 10 >>> a = np.zeros((512, 512)).astype('uint8') >>> cx, cy = 100, 100 # The center of circle >>> y, x = np.ogrid[-radius: radius, -radius: radius] >>> index = x**2 + y**2 <= radius**2 >>> a[cy-radius:cy+radius, cx-radius:cx+radius][index] = 255 Numpy is very cool! Is there other way of doing that? Only to know ... Thanks! Thiago Franco de Moraes From kwgoodman at gmail.com Tue Jan 11 13:46:21 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 11 Jan 2011 10:46:21 -0800 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) In-Reply-To: References: <4D21F8BD.60003@hawaii.edu> Message-ID: On Tue, Jan 4, 2011 at 8:14 AM, Keith Goodman wrote: > On Tue, Jan 4, 2011 at 8:06 AM, Sebastian Haase wrote: >> On Mon, Jan 3, 2011 at 5:32 PM, Erik Rigtorp wrote: >>> On Mon, Jan 3, 2011 at 11:26, Eric Firing wrote: >>>> Instead of calculating statistics independently each time the window is >>>> advanced one data point, the statistics are updated. ?I have not done >>>> any benchmarking, but I expect this approach to be quick. >>> >>> This might accumulate numerical errors. But could be fine for many applications. >>> >>>> The code is old; I have not tried to update it to take advantage of >>>> cython's advances over pyrex. ?If I were writing it now, I might not >>>> bother with the C level at all; it could all be done in cython, probably >>>> with no speed penalty, and maybe even with reduced overhead. >>>> >>> >>> No doubt this would be faster, I just wanted to offer a general way to >>> this in NumPy. >>> _______________________________________________ >> >> BTW, some of these operations can be done using scipy's ndimage ?- right ? >> Any comments ? ?How does the performance compare ? >> ndimage might have more options regarding edge handling, or ? > > Take a look at the moving window function in the development version > of the la package: > > https://github.com/kwgoodman/la/blob/master/la/farray/mov.py > > Many of the moving window functions offer three calculation methods: > filter (ndimage), strides (the strides trick discussed in this > thread), and loop (a simple python loop). > > For example: > >>> a = np.random.rand(500,2000) >>> timeit la.farray.mov_max(a, window=252, axis=-1, method='filter') > 1 loops, best of 3: 336 ms per loop >>> timeit la.farray.mov_max(a, window=252, axis=-1, method='strides') > 1 loops, best of 3: 609 ms per loop >>> timeit la.farray.mov_max(a, window=252, axis=-1, method='loop') > 1 loops, best of 3: 638 ms per loop > > No one method is best for all situations. That is one of the reasons I > started the Bottleneck package. I figured Cython could beat them all. I added four new function to Bottleneck: move_min, move_max, move_nanmin, move_nanmax. They are much faster than using SciPy's ndimage.maximum_filter1d or the strides trick: >> a = np.random.rand(500,2000) >> timeit la.farray.mov_max(a, window=252, axis=-1, method='filter') # ndimage 1 loops, best of 3: 336 ms per loop >> timeit bn.move_max(a, window=252, axis=-1) # bottleneck 100 loops, best of 3: 14.1 ms per loop That looks too good to be true. Are the outputs the same? >> a1 = la.farray.mov_max(a, window=252, axis=-1, method='filter') >> a2 = bn.move_max(a, window=252, axis=-1) >> np.testing.assert_array_almost_equal(a1, a2) >> Yes. From mfrank at ari.uni-heidelberg.de Tue Jan 11 14:21:40 2011 From: mfrank at ari.uni-heidelberg.de (Matthias Frank) Date: Tue, 11 Jan 2011 20:21:40 +0100 Subject: [Numpy-discussion] histogram2d and decreasing bin edges Message-ID: <4D2CADC4.1010100@ari.uni-heidelberg.de> Hi all, I've noticed a change in numpy.histogram2d between (possibly very much) older versions and the current one: The function can no longer handle the situation where bin edges decrease instead of increasing monotonically. The reason for this seems to be the handling of outliers histogramdd, see the output of minimal example below. If I understand correctly, this is the only place where histogramdd implicitly assumes monotonically increasing bin edges. If so, this could be fixed to work with increasing and decreasing bin edges by taking abs(dedges[i]).min() when calculating the rounding precision. If not, it might be more consistent, and produce a more meaningful error message, if histogram2d asserted that bin edges increase monotonically and otherwise raised an AttributeError as the 1-d histogram() function does in that case (see below) Matthias In [1]: import numpy In [2]: numpy.__version__ Out[2]: '1.5.1' In [3]: ascending=numpy.array([0,1]) In [4]: descending=numpy.array([1,0]) In [5]: numpy.histogram2d([0.5],[0.5],bins=(ascending,ascending)) Out[5]: (array([[ 1.]]), array([ 0., 1.]), array([ 0., 1.])) In [6]: numpy.histogram2d([0.5],[0.5],bins=(descending,descending)) Warning: invalid value encountered in log10 --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /lib/python2.6/site-packages/numpy/lib/twodim_base.pyc in histogram2d(x, y, bins, range, normed, weights) 613 xedges = yedges = asarray(bins, float) 614 bins = [xedges, yedges] --> 615 hist, edges = histogramdd([x,y], bins, range, normed, weights) 616 return hist, edges[0], edges[1] 617 /lib/python2.6/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights) 312 for i in arange(D): 313 # Rounding precision --> 314 decimal = int(-log10(dedges[i].min())) +6 315 # Find which points are on the rightmost edge. 316 on_edge = where(around(sample[:,i], decimal) == around(edges[i][-1], ValueError: cannot convert float NaN to integer Behavior of the 1-d histogram() In [8]: numpy.histogram([0.5],bins=ascending) Out[8]: (array([1]), array([0, 1])) In [9]: numpy.histogram([0.5],bins=descending) --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /lib/python2.6/site-packages/numpy/lib/function_base.pyc in histogram(a, bins, range, normed, weights) 160 if (np.diff(bins) < 0).any(): 161 raise AttributeError( --> 162 'bins must increase monotonically.') 163 164 # Histogram is an integer or a float array depending on the weights. AttributeError: bins must increase monotonically. From bje at air.net.au Wed Jan 12 06:45:54 2011 From: bje at air.net.au (Ben Elliston) Date: Wed, 12 Jan 2011 22:45:54 +1100 Subject: [Numpy-discussion] mapping a function to a masked array Message-ID: <20110112114554.GA23259@air.net.au> I have a masked array of values that I would like to transform through a user-defined function. Naturally, I want to ignore any values that are masked in the initial array. The user-defined function examines other points around the value in question, so I need to use ndenumerate (or similar) to get the array index as I iterate over the array. So, I have two questions: how to make this run without looping in Python, and how to avoid masked values. Here is the clunky solution I have so far: result = ma.copy (data) for i, val in ndenumerate (data): if not data.mask[i]: result[i] = myfunc (data, i, val) Any suggestions? Thanks, Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From pgmdevlist at gmail.com Wed Jan 12 07:04:02 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 12 Jan 2011 13:04:02 +0100 Subject: [Numpy-discussion] mapping a function to a masked array In-Reply-To: <20110112114554.GA23259@air.net.au> References: <20110112114554.GA23259@air.net.au> Message-ID: <8528B7DB-F803-418F-93D1-DEDA86E8657B@gmail.com> On Jan 12, 2011, at 12:45 PM, Ben Elliston wrote: > I have a masked array of values that I would like to transform through > a user-defined function. Naturally, I want to ignore any values that > are masked in the initial array. > > The user-defined function examines other points around the value in > question, so I need to use ndenumerate (or similar) to get the array > index as I iterate over the array. Can your function accept arrays as input ? > > So, I have two questions: how to make this run without looping in > Python, and how to avoid masked values. Here is the clunky solution I > have so far: > > result = ma.copy (data) > for i, val in ndenumerate (data): > if not data.mask[i]: > result[i] = myfunc (data, i, val) `result` doesn't have to be a masked array, right ? result = np.empty_like(data) ndata = data.data for (i, (v, m)) in enumerate(zip(ndata, data.mask)): if not m: result[i] = myfunc(ndata, i, v) The main point is to avoid looping on the masked array itself. Instead, you loop on the `data` and `mask` attributes , that are regular ndarrays only. Should be far more efficient that way. Same thing for myfunc: don't call it on the masked array, just on the data part. About looping: well, if you can vectorize your function, you may avoid the loop. You may also wanna try a list comprehension: >>> result = [myfunc(ndata,i,v) for (i,(v,m)) in enumerate(zip(ndata,data.mask)) if not m] and retransform result to a ndarray afterwards. Or use fromiterator ? Let me know how it goes Cheers P. From dstaley at usgs.gov Wed Jan 12 10:31:31 2011 From: dstaley at usgs.gov (dstaley) Date: Wed, 12 Jan 2011 07:31:31 -0800 (PST) Subject: [Numpy-discussion] Variable in an array name? Message-ID: <30645276.post@talk.nabble.com> Is it possible to use a variable in an array name? I am looping through a bunch of calculations, and need to have each array as a separate entity. I'm pretty new to python and numpy, so forgive my ignorance. I'm sure there is a simple answer, but I can't seem to find it. let's say i have a variable 'i': i = 5 I would like my array to have the name array5 I know how I could do this manually, but not in a loop where i is redefined several times. any thoughts/comments/suggestions are appreciated. Thanks. -DS -- View this message in context: http://old.nabble.com/Variable-in-an-array-name--tp30645276p30645276.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From zachary.pincus at yale.edu Wed Jan 12 10:34:31 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 Jan 2011 10:34:31 -0500 Subject: [Numpy-discussion] Variable in an array name? In-Reply-To: <30645276.post@talk.nabble.com> References: <30645276.post@talk.nabble.com> Message-ID: <5D4AFFDB-9E6C-414B-AE41-48EBDB827A3B@yale.edu> > Is it possible to use a variable in an array name? I am looping > through a > bunch of calculations, and need to have each array as a separate > entity. > I'm pretty new to python and numpy, so forgive my ignorance. I'm > sure there > is a simple answer, but I can't seem to find it. > > let's say i have a variable 'i': > > i = 5 > > I would like my array to have the name array5 > > I know how I could do this manually, but not in a loop where i is > redefined > several times. There are ways to do this, but what you likely actually want is just to put several arrays in a python list and then index into the list, instead of constructing numbered names. e.g.: array_list = [] for whatever: array_list.append(numpy.array(whatever)) for array in array_list: do_something(array) given_array = array_list[i] From dstaley at usgs.gov Wed Jan 12 10:40:51 2011 From: dstaley at usgs.gov (dstaley) Date: Wed, 12 Jan 2011 07:40:51 -0800 (PST) Subject: [Numpy-discussion] Variable in an array name? In-Reply-To: <5D4AFFDB-9E6C-414B-AE41-48EBDB827A3B@yale.edu> References: <30645276.post@talk.nabble.com> <5D4AFFDB-9E6C-414B-AE41-48EBDB827A3B@yale.edu> Message-ID: <30654306.post@talk.nabble.com> Zachary Pincus-2 wrote: > >> Is it possible to use a variable in an array name? I am looping >> through a >> bunch of calculations, and need to have each array as a separate >> entity. >> I'm pretty new to python and numpy, so forgive my ignorance. I'm >> sure there >> is a simple answer, but I can't seem to find it. >> >> let's say i have a variable 'i': >> >> i = 5 >> >> I would like my array to have the name array5 >> >> I know how I could do this manually, but not in a loop where i is >> redefined >> several times. > > There are ways to do this, but what you likely actually want is just > to put several arrays in a python list and then index into the list, > instead of constructing numbered names. > > e.g.: > > array_list = [] > > for whatever: > array_list.append(numpy.array(whatever)) > > for array in array_list: > do_something(array) > > given_array = array_list[i] > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Thank you very much for the prompt response. I have already done what you have suggested, but there are a few cases where I do need to have an array named with a variable (looping through large numbers of unrelated files and calculations that need to be dumped into different analyses). It would be extraordinarily helpful if someone could post a solution to this problem, regardless of inefficiency of the method. Thanks a ton for any additional help. -- View this message in context: http://old.nabble.com/Variable-in-an-array-name--tp30645276p30654306.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From paul.anton.letnes at gmail.com Wed Jan 12 11:02:46 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Wed, 12 Jan 2011 17:02:46 +0100 Subject: [Numpy-discussion] Variable in an array name? In-Reply-To: <30654306.post@talk.nabble.com> References: <30645276.post@talk.nabble.com> <5D4AFFDB-9E6C-414B-AE41-48EBDB827A3B@yale.edu> <30654306.post@talk.nabble.com> Message-ID: <8EE54CE7-CD8A-4756-8E12-608022F8A1FC@gmail.com> On 12. jan. 2011, at 16.40, dstaley wrote: > > > Zachary Pincus-2 wrote: >> >>> Is it possible to use a variable in an array name? I am looping >>> through a >>> bunch of calculations, and need to have each array as a separate >>> entity. >>> I'm pretty new to python and numpy, so forgive my ignorance. I'm >>> sure there >>> is a simple answer, but I can't seem to find it. >>> >>> let's say i have a variable 'i': >>> >>> i = 5 >>> >>> I would like my array to have the name array5 >>> >>> I know how I could do this manually, but not in a loop where i is >>> redefined >>> several times. >> >> There are ways to do this, but what you likely actually want is just >> to put several arrays in a python list and then index into the list, >> instead of constructing numbered names. >> >> e.g.: >> >> array_list = [] >> >> for whatever: >> array_list.append(numpy.array(whatever)) >> >> for array in array_list: >> do_something(array) >> >> given_array = array_list[i] >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > Thank you very much for the prompt response. I have already done what you > have suggested, but there are a few cases where I do need to have an array > named with a variable (looping through large numbers of unrelated files and > calculations that need to be dumped into different analyses). It would be > extraordinarily helpful if someone could post a solution to this problem, > regardless of inefficiency of the method. Thanks a ton for any additional > help. > -- This may be obvious, but I sometimes forget myself: have you tried python dicts? >>> from numpy import * >>> a = linspace(0,10) >>> b = a.copy() >>> d = {'array1':a, 'array2':b} >>> for key in d: ... dosomething(d[key]) That way, you can assign a name / key for each array variable, and use this name for file names, or whatever you need names for. Cheers Paul. From zachary.pincus at yale.edu Wed Jan 12 11:05:58 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 Jan 2011 11:05:58 -0500 Subject: [Numpy-discussion] Variable in an array name? In-Reply-To: <30654306.post@talk.nabble.com> References: <30645276.post@talk.nabble.com> <5D4AFFDB-9E6C-414B-AE41-48EBDB827A3B@yale.edu> <30654306.post@talk.nabble.com> Message-ID: > Thank you very much for the prompt response. I have already done > what you > have suggested, but there are a few cases where I do need to have an > array > named with a variable (looping through large numbers of unrelated > files and > calculations that need to be dumped into different analyses). It > would be > extraordinarily helpful if someone could post a solution to this > problem, > regardless of inefficiency of the method. Thanks a ton for any > additional > help. You could store arrays associated with string names, or other identifiers, (as opposed to integer indices) in a python dict. Global and local namespaces are also just dicts that you can grab with globals() and locals(), if you really want to look up variable names algorithmically, but I promise you that this is really not what you want to be doing. Zach From bsouthey at gmail.com Wed Jan 12 11:20:39 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 12 Jan 2011 10:20:39 -0600 Subject: [Numpy-discussion] Output dtype In-Reply-To: References: <4D067FF1.9090001@gmail.com> Message-ID: <4D2DD4D7.8020509@gmail.com> On 12/13/2010 04:53 PM, Keith Goodman wrote: > On Mon, Dec 13, 2010 at 12:20 PM, Bruce Southey wrote: > >> Unless something has changed since the docstring was written, this is >> probably an inherited 'bug' from np.mean() as the author expected that >> the docstring of mean was correct. For my 'old' 2.0 dev version: >> >> >>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32'), axis=1).dtype >> dtype('float32') >> >>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32')).dtype >> dtype('float64') > Are you saying the bug is in the doc string, the output, or both? I > think it is both; I expect the second result above to be float32. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Sorry as I filed a bug for this as 1710 http://projects.scipy.org/numpy/ticket/1710 but this is the same as ticket 518 that is listed as won't fix: http://projects.scipy.org/numpy/ticket/518 My expectation is that the internal and output dtypes should not depend on the axis argument. Related to this, I also think that internal dtypes should be the same as the output dtype (see ticket 465 regarding the internal precision http://projects.scipy.org/numpy/ticket/465). If the consensus is still won't fix then I or someone needs to edit the documentation to clearly reflect these situations. Bruce From josef.pktd at gmail.com Wed Jan 12 11:22:49 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 12 Jan 2011 11:22:49 -0500 Subject: [Numpy-discussion] Variable in an array name? In-Reply-To: References: <30645276.post@talk.nabble.com> <5D4AFFDB-9E6C-414B-AE41-48EBDB827A3B@yale.edu> <30654306.post@talk.nabble.com> Message-ID: On Wed, Jan 12, 2011 at 11:05 AM, Zachary Pincus wrote: >> Thank you very much for the prompt response. ?I have already done >> what you >> have suggested, but there are a few cases where I do need to have an >> array >> named with a variable (looping through large numbers of unrelated >> files and >> calculations that need to be dumped into different analyses). ?It >> would be >> extraordinarily helpful if someone could post a solution to this >> problem, >> regardless of inefficiency of the method. ?Thanks a ton for any >> additional >> help. > > You could store arrays associated with string names, or other > identifiers, (as opposed to integer indices) in a python dict. > > Global and local namespaces are also just dicts that you can grab with > globals() and locals(), if you really want to look up variable names > algorithmically, but I promise you that this is really not what you > want to be doing. or (pretending to translate matlab) >>> a = 5 >>> for i in range(5): exec('var_%02d = np.array([%d])'%(i, a+i)) >>> [i for i in globals() if i[:3] == 'var'] ['var_00', 'var_01', 'var_02', 'var_03', 'var_04'] >>> var_00 array([5]) >>> var_01 array([6]) not very pythonic (?) Josef > > Zach > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthewturk at gmail.com Wed Jan 12 11:28:47 2011 From: matthewturk at gmail.com (Matthew Turk) Date: Wed, 12 Jan 2011 11:28:47 -0500 Subject: [Numpy-discussion] Autosummary using numpydoc Message-ID: Hi there, I've been trying to take the numpy docstring and apply the same methodology to a different project I work on, but there are a couple details that I think I'm unclear on, and I was hoping for some pointers or at least reassurances that it's working as intended, despite Sphinx's protests. The process of applying the numpy docstring method seems to take a few steps: 0) Have sphinx >=1.0 installed. 1) Write a numpydoc-compliant docstring in a class and its methods. 2) Make available to sphinx the numpydoc extension (in this case, I am using the current numpy git tip, which self-reports as 0.4) 3) Add the sphinx.ext.autosummary and numpydoc extensions to conf.py in the appropriate sphinx project 4) Copy the autosummary/class.rst template to the appropriate _template directory in the sphinx project 5) Set up the autosummary_generate variable in conf.py to contain all the files containing autosummary directives. 6) Build docs As a few other quick notes, my reading of the overridden class.rst template is that it comments out the individual method inclusions, using the HACK (comment) directive. The build mostly proceeds correctly when these steps have been taken, but a copious number of warnings are emitted. (The same character of warnings are emitted by the numpy documentation build.) These mostly show up, for instance, as: WARNING: toctree contains reference to nonexisting document 'reference/api/generated/add_phase_object' In this case, add_phase_object is a method hanging off an autosummary'ed class. It looks like a warning gets emitted for every method on every autosummary'ed class -- this can number into the thousands very easily. Additionally, warnings that look like: reference/api/generated/yt.visualization.api.PlotCollection.add_phase_object.rst:: WARNING: document isn't included in any toctree show up, where these reflect the full method name and class name. It looks like the toctree is gaining references to the method names without prefixing them with the class names, but the generated docs all have the full name resolution in their filenames. (There also seems to be a glitch in the output when using the ".. HACK" class.rst template, as within my code it appears to strip the one-line descriptions from the methods.) Is this behaving as expected, or have I perhaps gone through the steps wrong? Having so many warnings can confuse debugging other portions of the build. I appreciate any suggestions you might have -- thanks very much! Best, Matt From kwgoodman at gmail.com Wed Jan 12 12:28:05 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 12 Jan 2011 09:28:05 -0800 Subject: [Numpy-discussion] Output dtype In-Reply-To: <4D2DD4D7.8020509@gmail.com> References: <4D067FF1.9090001@gmail.com> <4D2DD4D7.8020509@gmail.com> Message-ID: On Wed, Jan 12, 2011 at 8:20 AM, Bruce Southey wrote: > On 12/13/2010 04:53 PM, Keith Goodman wrote: >> On Mon, Dec 13, 2010 at 12:20 PM, Bruce Southey ?wrote: >> >>> Unless something has changed since the docstring was written, this is >>> probably an inherited 'bug' from np.mean() as the author expected that >>> the docstring of mean was correct. For my 'old' 2.0 dev version: >>> >>> ? >>> ?np.mean( np.array([[0,1,2,3,4,5]], dtype='float32'), axis=1).dtype >>> dtype('float32') >>> ? >>> ?np.mean( np.array([[0,1,2,3,4,5]], dtype='float32')).dtype >>> dtype('float64') >> Are you saying the bug is in the doc string, the output, or both? I >> think it is both; I expect the second result above to be float32. >> >> > Sorry as I filed a bug for this as 1710 > http://projects.scipy.org/numpy/ticket/1710 > but this is the same as ticket 518 that is listed as won't fix: > http://projects.scipy.org/numpy/ticket/518 I fixed ticket 518 in bottleneck: >> a = np.array([1,2,3], dtype='float32') >> bn.median(a).dtype dtype('float32') >> np.median(a).dtype dtype('float64') Not sure I would have done that if I knew that numpy has a won't fix on it. From gregory.guyomarch at gmail.com Wed Jan 12 12:34:32 2011 From: gregory.guyomarch at gmail.com (=?utf-8?b?R3LDqWdvcnk=?= Guyomarc'h) Date: Wed, 12 Jan 2011 17:34:32 +0000 (UTC) Subject: [Numpy-discussion] Non-deterministic floating point behavior in numpy 1.5.1 ? Message-ID: Hello, I have noticed strange non-deterministic behaviours with numpy 1.5.1 when using floating point arrays. The following script were run on 6 different machines, all Intel Core i7: - 3 of them running numpy 1.5.1 with either Python 2.7.1 (x86) or 2.5.2 (x86) and, - 3 of them running numpy 1.3 and Python 2.5.2.(x86). import numpy x = numpy.array([[0.00010876945607980702], [0.22568137594619658], [5.6435218858623557]]) for i in range(10): m = numpy.array([[36.0 * 36.0, 36.0, 1.0] for j in range(6)]) y = (numpy.dot(m, x) - 13.90901663) * 1000.0 print y[0] The output on each machine running 1.5.1 are similar to this one: [ 5.00486230e-06] [ 5.00486408e-06] [ 5.00486230e-06] [ 5.00486408e-06] [ 5.00486230e-06] [ 5.00486408e-06] [ 5.00486230e-06] [ 5.00486408e-06] [ 5.00486230e-06] [ 5.00486408e-06] I cannot make sense of the changes of the least significant digits across different iterations of the for loop since its body is actually constant. Note that this behavior is hard to reproduce: on some machines I had to insert dummy print statements here and there to reproduce the bug or increase the number of iterations or the length of the arrays inside the loop. Also, I could not reproduce it with older versions of numpy such as 1.3. Is this behavior expected ? Is there a way to make sure the results of a numpy floating point computations remains the same for multiple runs ? Thanks, Gregory. From pav at iki.fi Wed Jan 12 13:05:59 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 12 Jan 2011 18:05:59 +0000 (UTC) Subject: [Numpy-discussion] Non-deterministic floating point behavior in numpy 1.5.1 ? References: Message-ID: Wed, 12 Jan 2011 17:34:32 +0000, Gr?gory Guyomarc'h wrote: [clip] > y = (numpy.dot(m, x) - 13.90901663) * 1000.0 print y[0] [clip] > Also, I could not reproduce it with older versions of numpy such as 1.3. > Is this behavior expected ? Is there a way to make sure the results of a > numpy floating point computations remains the same for multiple runs? There are essentially no changes in the dot() routine since 1.3.0 in Numpy. The non-determinism is probably in the BLAS linear algebra library you have linked Numpy with. What platform are you using? (Windows? Linux? Where did you obtain Numpy binaries?) What do you get if you replace `numpy.dot` with `numpy.core.multiarray.dot` (which does not use BLAS)? There's another thread on a similar issue here: http://permalink.gmane.org/gmane.comp.python.scientific.user/27444 -- Pauli Virtanen From davecortesi at gmail.com Wed Jan 12 15:57:58 2011 From: davecortesi at gmail.com (David Cortesi) Date: Wed, 12 Jan 2011 12:57:58 -0800 Subject: [Numpy-discussion] Numpy 1.5.1 - Mac - with Activestate Python 3 Message-ID: I have installed ActiveState's Python 3 packages on Mac OS X 10.6.6. There exists: /Library/Frameworks/Python.framework/Versions/Current/Python* When I run the Mac OS installer it shows all disks as ineligible and the error message, "numpy 1.5.1 can't be installed on this disk. numpy requires System Python 2.6 to install." What can I do to persuade numpy to install? Must I build it from source to get it to use Python 3? Sorry for the Noob question, Dave Cortesi -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Jan 12 21:32:00 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed, 12 Jan 2011 18:32:00 -0800 Subject: [Numpy-discussion] Numpy 1.5.1 - Mac - with Activestate Python 3 In-Reply-To: References: Message-ID: <4D2E6420.4000700@noaa.gov> On 1/12/2011 12:57 PM, David Cortesi wrote: > I have installed ActiveState's Python 3 packages on Mac OS X 10.6.6. > When I run the Mac OS installer it shows all disks as ineligible and > the error message, "numpy 1.5.1 can't be installed on this disk. numpy > requires System Python 2.6 to install." Sorry, that is is a bad error message. What I'm pretty sure it means to say is: "numpy requires the Python 2.6 binary from python.org " I looked at this error message ages ago, and It's less trivial to fix that you'd think -- but I thought it had been fixed. > What can I do to persuade numpy to install? Must I build it from source > to get it to use Python 3? You *may* need the python.org binary, rather than ActiveState, but it looks like you're trying to install a numpy binary for 2.6 -- that's not going to work on 3.* -- look for a binary for 3.* -- I'm not sure it exists, though. NOTE: if you're still confused, tell us exactly what file you are trying to install from, and where you downloaded it from. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From seb.haase at gmail.com Thu Jan 13 04:06:16 2011 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 13 Jan 2011 10:06:16 +0100 Subject: [Numpy-discussion] Numpy 1.5.1 - Mac - with Activestate Python 3 In-Reply-To: <4D2E6420.4000700@noaa.gov> References: <4D2E6420.4000700@noaa.gov> Message-ID: On Thu, Jan 13, 2011 at 3:32 AM, Chris Barker wrote: > On 1/12/2011 12:57 PM, David Cortesi wrote: >> I have installed ActiveState's Python 3 packages on Mac OS X 10.6.6. > >> When I run the Mac OS installer it shows all disks as ineligible and >> the error message, "numpy 1.5.1 can't be installed on this disk. numpy >> requires System Python 2.6 to install." > > Sorry, that is is a bad error message. What I'm pretty sure it means to > say is: > > "numpy requires the Python 2.6 binary from python.org " > > I looked at this error message ages ago, and It's less trivial to fix > that you'd think -- but I thought it had been fixed. > >> What can I do to persuade numpy to install? Must I build it from source >> to get it to use Python 3? > > You *may* need the python.org binary, rather than ActiveState, but it > looks like you're trying to install a numpy binary for 2.6 -- that's not > going to work on 3.* -- look for a binary for 3.* -- I'm not sure it > exists, though. > > NOTE: if you're still confused, tell us exactly what file you are trying > to install from, and where you downloaded it from. > > -Chris > Hi David, the simple answer you might be looking for is: it's easier to stay with Python 2.x for a while... Can you deinstall the ActiveState 3 version ? Cheers, - Sebastian Haase From ralf.gommers at googlemail.com Thu Jan 13 05:09:41 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 13 Jan 2011 18:09:41 +0800 Subject: [Numpy-discussion] Numpy 1.5.1 - Mac - with Activestate Python 3 In-Reply-To: References: <4D2E6420.4000700@noaa.gov> Message-ID: On Thu, Jan 13, 2011 at 5:06 PM, Sebastian Haase wrote: > On Thu, Jan 13, 2011 at 3:32 AM, Chris Barker > wrote: > > On 1/12/2011 12:57 PM, David Cortesi wrote: > >> I have installed ActiveState's Python 3 packages on Mac OS X 10.6.6. > > > >> When I run the Mac OS installer it shows all disks as ineligible and > >> the error message, "numpy 1.5.1 can't be installed on this disk. numpy > >> requires System Python 2.6 to install." > > > > Sorry, that is is a bad error message. What I'm pretty sure it means to > > say is: > > > > "numpy requires the Python 2.6 binary from python.org " > > > > I looked at this error message ages ago, and It's less trivial to fix > > that you'd think -- but I thought it had been fixed. > That message comes from bdist_mpkg. I fixed it on my machine, and a fix was also committed to the svn repo. However, I think there was no new bdist_mpkg release on pypi (I did get a "why don't you just use eggs instead?") and the 1.5.1 binaries were not made on my machine. So the problem returned. > > > >> What can I do to persuade numpy to install? Must I build it from source > >> to get it to use Python 3? > > Yes, there is no binary for Python 3 at the moment. But unless you have a specific need/desire to use 3.1/3.2 I'd suggest staying with 2.6 or 2.7 from python.org for now. With Activestate you have to compile yourself or use their (paid?) pypm repo. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.EMMEL at 3ds.com Thu Jan 13 11:04:33 2011 From: Thomas.EMMEL at 3ds.com (EMMEL Thomas) Date: Thu, 13 Jan 2011 16:04:33 +0000 Subject: [Numpy-discussion] Any idea to run the dot-product on many arrays Message-ID: <3A0080EEBFB19C4993C24098DD0A78D1226CC110@EU-DCC-MBX01.dsone.3ds.com> Hi, I need to rotate many vectors (x,y,z) with a given rotation matrix (3x3). I can always do for v in vectors: tv += np.dot(mat, v) where mat is my fixed matrix (or array of arrays) and v is a single array. Is there any efficient way to use an array of vectors to do the transfomation for all of these vectors at once? Kind regards Thomas This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email.For other languages, go to http://www.3ds.com/terms/email-disclaimer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pascal22p at parois.net Thu Jan 13 11:13:24 2011 From: pascal22p at parois.net (Pascal) Date: Thu, 13 Jan 2011 17:13:24 +0100 Subject: [Numpy-discussion] Any idea to run the dot-product on many arrays In-Reply-To: <3A0080EEBFB19C4993C24098DD0A78D1226CC110@EU-DCC-MBX01.dsone.3ds.com> References: <3A0080EEBFB19C4993C24098DD0A78D1226CC110@EU-DCC-MBX01.dsone.3ds.com> Message-ID: <4D2F24A4.6050807@parois.net> On 01/13/2011 05:04 PM, EMMEL Thomas wrote: > Hi, > > I need to rotate many vectors (x,y,z) with a given rotation matrix (3x3). > I can always do > > for v in vectors: > tv += np.dot(mat, v) > > where mat is my fixed matrix (or array of arrays) and v is a single array. > Is there any efficient way to use an array of vectors to do the > transfomation > for all of these vectors at once? numpy.dot(rotationmatrix , coordinates.T).T Where coordinates is a n*3 matrix of n stacked vectors in rows. It works with vectors stacked in column without the the two transpose. It's even possible to apply a symmetry operation to a bunch of second rank tensors in one go. Pascal From davecortesi at gmail.com Thu Jan 13 13:49:17 2011 From: davecortesi at gmail.com (David Cortesi) Date: Thu, 13 Jan 2011 10:49:17 -0800 Subject: [Numpy-discussion] Is python 3 supported or not? Message-ID: I asked about getting numpy to install on OS X with Activestate Python 3. I got thoughtful & responsive replies from three of you, many thanks to all! I am sad that the consistent message was, "forget it." Chris said, "...look for a binary for 3.* -- I'm not sure it exists, though." Sebastian said, "the simple answer you might be looking for is: it's easier to stay with Python 2.x..." Ralf said, "?unless you have a specific need/desire to use 3.1/3.2 I'd suggest staying with 2.6 or 2.7 from python.org for now..." I would like to point out that the wikipedia article on numpy says, "The release version 1.5 of NumPy is compatible with Python versions 2.4?2.7 and Python 3," citing the release note of september 2010, which itself opens with the following lines: > Highlights > > Python 3 compatibility > > This is the first NumPy release which is compatible with Python 3. There is an obvious disconnect here. Is it or isn't it? This is an important question because of the large number of packages at PyPI that depend on numpy. Numpy is a major gateway, or bottleneck, on the way to Python 3. I came looking for numpy because I want to work with an audio package, and all the audio packages at PyPI seem to have numpy dependencies. Ditto the packages for dealing with FITS data format, etc. etc. As to using Activestate's versus python.org's distro, *regardless* of which I use, the package will end up located in /Library/Frameworks/Python.framework/*. It will not be installed in /System/Library/etc. as the Apple distribution is; but it WILL be located at a known location with an executable named Python under Versions/Current. Not that it matters, but the reason I'm using Activestate is because I also needed their up to date version of Tcl/Tk, and python.org python3 wouldn't work with that. As to why I'm using Python 3, it's because I'm starting a new project with no prior dependencies and want the current and future language -- which is now TWO FRAKKIN' YEARS OLD! -- but that's a rant for another time. Thanks again for your attention, Dave Cortesi From numpy-discussion at maubp.freeserve.co.uk Thu Jan 13 14:24:41 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Thu, 13 Jan 2011 19:24:41 +0000 Subject: [Numpy-discussion] Is python 3 supported or not? In-Reply-To: References: Message-ID: On Thu, Jan 13, 2011 at 6:49 PM, David Cortesi wrote: > > I asked about getting numpy to install on OS X with Activestate Python > 3. I got thoughtful & responsive replies from three of you, many > thanks to all! I am sad that the consistent message was, "forget it." I thought the message was since there isn't the easy option of a binary installer provided for Python 3 (yet), you should just install NumPy from source if you really want to use Python 3. That works for me fine on Mac OS X 10.6 (using both Python 3.1 and the current beta of Python 3.2, both themselves compiled from source). Peter P.S. You forgot to reference the thread, for those that missed it see: http://mail.scipy.org/pipermail/numpy-discussion/2011-January/054486.html From pav at iki.fi Thu Jan 13 14:25:47 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 13 Jan 2011 19:25:47 +0000 (UTC) Subject: [Numpy-discussion] Is python 3 supported or not? References: Message-ID: On Thu, 13 Jan 2011 10:49:17 -0800, David Cortesi wrote: [clip] >> Highlights >> >> Python 3 compatibility >> >> This is the first NumPy release which is compatible with Python 3. > > There is an obvious disconnect here. Is it or isn't it? There is no disconnect. The fact just is that nobody has yet built easily redistributable binary packages for OSX. If you really want to run Python 3, just build it yourself from the sources. -- Pauli Virtanen From pav at iki.fi Thu Jan 13 14:27:12 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 13 Jan 2011 19:27:12 +0000 (UTC) Subject: [Numpy-discussion] Is python 3 supported or not? References: Message-ID: On Thu, 13 Jan 2011 19:25:47 +0000, Pauli Virtanen wrote: [clip] > If you really want to run Python 3, just build it yourself from the > sources. Of course, this should have been: "..., just build Numpy from the sources." From Chris.Barker at noaa.gov Thu Jan 13 17:20:17 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu, 13 Jan 2011 14:20:17 -0800 Subject: [Numpy-discussion] Is python 3 supported or not? In-Reply-To: References: Message-ID: <4D2F7AA1.4010304@noaa.gov> On 1/13/2011 10:49 AM, David Cortesi wrote: > I would like to point out that the wikipedia article on numpy says, > "The release version 1.5 of NumPy is compatible with Python versions > 2.4?2.7 and Python 3," citing the release note of september 2010, > which itself opens with the following lines: > There is an obvious disconnect here. Is it or isn't it? Support is not an absolute thing -- Python 2 is certainly better supported at this point in many ways, but yes, numpy works with Python3 > the large number of packages at PyPI > that depend on numpy. Numpy is a major gateway, or bottleneck, on the > way to Python 3. yes, but I doubt that many (any) of the packages that require numpy don't work on 2.0. Indeed, many of them probably are not yet ported to 3. Personally, I can't move 'till PIL and wxPython are ported, and maybe Pylons (Pyramid), too. > I came looking for numpy because I want to work with > an audio package, and all the audio packages at PyPI seem to have > numpy dependencies. Ditto the packages for dealing with FITS data > format, etc. etc. I'd make darn sure EVERYTHING you think you'll need is py3 compatible. > As to using Activestate's versus python.org's distro, *regardless* of > which I use, the package will end up located in > /Library/Frameworks/Python.framework/*. It will not be installed in > /System/Library/etc. as the Apple distribution is; but it WILL be > located at a known location with an executable named Python under > Versions/Current. That's just a link -- I Hope the actuall package is not in exactly the same place as the python,org binary gets installed -- but maybe it is -- in the past, they have generally been pretty compatible. There are way, way too many ways to get Python on the Mac -- varierty is good, but it is very confusing for newbies, and difficult for anyone that wants to distribute binaries. In general, the community tried to build binaries for the python.org builds, so there are advantages there. > As to why I'm using Python 3, it's because I'm starting a new project > with no prior dependencies and want the current and future language -- > which is now TWO FRAKKIN' YEARS OLD! -- but that's a rant for another > time. umm isn't that amazing, py3 has only been around for two years, and numpy and many other packages already support it! Fabulous! How long has python been around? how long numpy (and numeric before it?) How much work have you done to port things to Py3? > you should just install > NumPy from source if you really want to use Python 3. That works > for me fine on Mac OS X 10.6 (using both Python 3.1 and the > current beta of Python 3.2, both themselves compiled from source). yup -- as it happens, Apple delivers LAPACK, and has a standard, and freely available compiler -- building numpy on OS-X is a piece of cake. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gael.varoquaux at normalesup.org Fri Jan 14 03:26:11 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 14 Jan 2011 09:26:11 +0100 Subject: [Numpy-discussion] Is python 3 supported or not? In-Reply-To: References: Message-ID: <20110114082611.GA19526@phare.normalesup.org> On Thu, Jan 13, 2011 at 10:49:17AM -0800, David Cortesi wrote: > As to why I'm using Python 3, it's because I'm starting a new project > with no prior dependencies and want the current and future language -- > which is now TWO FRAKKIN' YEARS OLD! -- but that's a rant for another > time. Oh, you're saying that you'd like to help with building and distributing Python 3 binaries of numpy? G :$ From seb.haase at gmail.com Fri Jan 14 03:47:58 2011 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 14 Jan 2011 09:47:58 +0100 Subject: [Numpy-discussion] Is python 3 supported or not? In-Reply-To: <20110114082611.GA19526@phare.normalesup.org> References: <20110114082611.GA19526@phare.normalesup.org> Message-ID: On Fri, Jan 14, 2011 at 9:26 AM, Gael Varoquaux wrote: > On Thu, Jan 13, 2011 at 10:49:17AM -0800, David Cortesi wrote: >> As to why I'm using Python 3, it's because I'm starting a new project >> with no prior dependencies and want the current and future language -- >> which is now TWO FRAKKIN' YEARS OLD! -- but that's a rant for another >> time. > > Oh, you're saying that you'd like to help with building and distributing > Python 3 binaries of numpy? > > G :$ David, One of the greatest things about Python - I found - is that it doesn't change every year. The fact that 3.0 came out 2 years ago does not change the fact that everyone says they are still committed to support Python 2 for 10 more years to come. (I hope this is the right number, but it is certainly is > 5 yrs) Python 3 is somewhat of a "bigger change" and the various sub-project communities where reluctant to switch right away. Don't confuse the degree of change with "Perl 6" - for what I have heard, that "change" is rather a new language, .... while in Python - as example - 1/2 while now be .5 and you would have to write 1//2 to get the old results of 0 . My answer, I gave you few days ago, was kept as general as possible - since you didn't say at the time what your actual needs/plans were. The fact that Numpy is now ready for Python 3 does nowhere imply that everything you might likely want to use with it (SciPy) is also as stable and well tested with Python 3 as Numpy is. Finally - let me teach you some python: (take it with a grain of salt ;-) ) if you write in Python 2(!!) from __future__ import division from __future__ import print_function from __future__ importabsolute_import at the beginning of each module you(!) write you can essentially already use most (many) features of Python 3 in Python 2. This way you can use all packages as they are available for Python 2 and already write your new modules "the Python 3 way". [see also e.g. http://stackoverflow.com/questions/388069/python-graceful-future-feature-future-import ] I'm sorry to tell you that this is not the list for flame wars, but rather the list of the bunch of most helpful people I found. Cheers, Sebastian From joonpyro at gmail.com Fri Jan 14 15:03:16 2011 From: joonpyro at gmail.com (Joon Ro) Date: Fri, 14 Jan 2011 14:03:16 -0600 Subject: [Numpy-discussion] NaN value processing in weave.inline code Message-ID: Hi, I was wondering if it is possible to process (in if statement - check if the given value is NaN) numpy NaN value inside the weave.inline c code. testcode = ''' if (test(0)) { return_val = test(0); } ''' err = weave.inline(testcode, ['test'], type_converters = converters.blitz, force = 0, verbose = 1) with test(0) = nan returns err = nan correctly, but I don't know how to check the nan value inside the c inline c code. Is there any way I can get similar functionality as isnan? Thank you, Joon -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jan 14 15:06:42 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 14 Jan 2011 12:06:42 -0800 Subject: [Numpy-discussion] NaN value processing in weave.inline code In-Reply-To: References: Message-ID: On Fri, Jan 14, 2011 at 12:03 PM, Joon Ro wrote: > Hi, > I was wondering if it is possible to process (in if statement - check if the > given value is NaN) numpy NaN value inside the weave.inline c code. > > testcode = ''' > if (test(0)) { > ? ? ? return_val = test(0); > } > ''' > > err = weave.inline(testcode, > ?['test'], > type_converters = converters.blitz, force = 0, verbose = 1) > > > with test(0) = nan returns err = nan correctly, but I don't know how to > check the nan value inside the c inline c code. Is there any way I can get > similar functionality as isnan? To check if a scalar, x, is NaN: if x == x: # No, it is not a NaN else: # Yes, it is a NaN From joonpyro at gmail.com Fri Jan 14 15:13:43 2011 From: joonpyro at gmail.com (Joon Ro) Date: Fri, 14 Jan 2011 14:13:43 -0600 Subject: [Numpy-discussion] NaN value processing in weave.inline code In-Reply-To: References: Message-ID: Oops .. I guess isnan() inside the weave code just works fine. Should have tried this first. By the way, is there any speed lost doing this? Should I convert all NaN values into a integer and use it inside the weave inline c code? -Joon On Fri, 14 Jan 2011 14:03:16 -0600, Joon Ro wrote: > Hi, > > I was wondering if it is possible to process (in if statement - check if > the given value is NaN) numpy NaN value inside the weave.inline c code. > > > testcode = ''' > if (test(0)) { > return_val = test(0); > } > ''' > > err = weave.inline(testcode, > ['test'], > type_converters = converters.blitz, force = 0, verbose = 1) > > > with test(0) = nan returns err = nan correctly, but I don't know how to > check the nan value inside the c inline c code. Is there any way I can > get similar functionality as isnan? > > Thank you, > Joon > -- > -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jan 14 15:33:03 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 Jan 2011 15:33:03 -0500 Subject: [Numpy-discussion] isposinf returns array, isinf doesn't Message-ID: maybe just cosmetic, I just found this >>> stats.poisson.b 1.#INF >>> np.isinf(stats.poisson.b) True >>> np.isinf(-stats.poisson.b) True >>> np.isposinf(stats.poisson.b) array(True, dtype=bool) >>> np.isneginf(stats.poisson.b) array(False, dtype=bool) >>> np.isneginf(-stats.poisson.b) array(True, dtype=bool) but shape is the same >>> np.isneginf(stats.poisson.b).shape () >>> np.isinf(stats.poisson.b).shape () >>> type(np.isneginf(stats.poisson.b)) >>> type(np.isinf(stats.poisson.b)) Josef From dstaley at usgs.gov Fri Jan 14 17:52:45 2011 From: dstaley at usgs.gov (dstaley) Date: Fri, 14 Jan 2011 14:52:45 -0800 (PST) Subject: [Numpy-discussion] NOOB Alert: Looping Through Text Files... Message-ID: <30676099.post@talk.nabble.com> Warning, I am a python noob. Not only do I not know python, I really don't know anything about programming outside of ArcInfo and the ancient AML language. Regardless, here is my problem.... Let's say I have three text files (test1.txt, test2.txt and test3.txt). Each text file has 1 line of text in it "This is my text file", to which I want to add (append) a new line of text saying "I suck at Python and need help", and then save the file with a suffix attached (eg test1_modified.txt, test2_modified.txt, test3_modified.txt). I guess this is the equivalent of making a change in MS Word and using the "Save As..." command to maintain the integrity of the original file, and save the changes in a new file. But, I want to also do that in a loop (this is a simplified example of something I want to do with hundreds of text files). Now, I understand how to add this line to an existing text file: text_file = open("test1.txt", "a") text_file.write("\nI suck at Python and need help") text_file.close() While this adds a line of text, it saves the change to the original file (does not add the _modified.txt suffix to the file name), nor does it allow me to loop through all three of the files. I'm sure this is an easy thing to do, and is online in a million places. Unfortunately, I just cannot seem to find the answer. Here is my thought process: First i would define the list from which I would loop: textlist = ["test1.txt", "test2.txt", "test3.txt"] for i in textlist: text_file = open(textlist, "a") text_file.write("\nI suck at Python and need help") text_file.close() But, this doesn't work. It gives me the error: coercing to Unicode: need string or buffer, list found SO, I guess I need to do this from something other than a list? Even if it did work, it does not suit my needs as it does not create a new file and does not allow me to add the _modified.txt suffix, which will allow me to keep the original file intact. >From a responses to a previous post, this seems as if it may have something to do with a python dictionary, but I'm not sure. I'm probably totally off on how this should even be written, so any advice or suggestions would be greatly appreciated. Thanks in advance for your help! -DS -- View this message in context: http://old.nabble.com/NOOB-Alert%3A-Looping-Through-Text-Files...-tp30676099p30676099.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From dstaley at usgs.gov Fri Jan 14 17:57:10 2011 From: dstaley at usgs.gov (dstaley) Date: Fri, 14 Jan 2011 14:57:10 -0800 (PST) Subject: [Numpy-discussion] NOOB Alert: Looping Through Text Files... Message-ID: <30676099.post@talk.nabble.com> Warning, I am a python noob. Not only do I not know python, I really don't know anything about programming outside of ArcInfo and the ancient AML language. Regardless, here is my problem.... Let's say I have three text files (test1.txt, test2.txt and test3.txt). Each text file has 1 line of text in it "This is my text file", to which I want to add (append) a new line of text saying "I suck at Python and need help", and then save the file with a suffix attached (eg test1_modified.txt, test2_modified.txt, test3_modified.txt). I guess this is the equivalent of making a change in MS Word and using the "Save As..." command to maintain the integrity of the original file, and save the changes in a new file. But, I want to also do that in a loop (this is a simplified example of something I want to do with hundreds of text files). Now, I understand how to add this line to an existing text file: text_file = open("test1.txt", "a") text_file.write("\nI suck at Python and need help") text_file.close() While this adds a line of text, it saves the change to the original file (does not add the _modified.txt suffix to the file name), nor does it allow me to loop through all three of the files. I'm sure this is an easy thing to do, and is online in a million places. Unfortunately, I just cannot seem to find the answer. Here is my thought process: First i would define the list from which I would loop: textlist = ["test1.txt", "test2.txt", "test3.txt"] for i in textlist: text_file = open(textlist, "a") text_file.write("\nI suck at Python and need help") text_file.close() But, this doesn't work. It gives me the error: coercing to Unicode: need string or buffer, list found SO, I guess I need to do this from something other than a list? Even if it did work, it does not suit my needs as it does not create a new file and does not allow me to add the _modified.txt suffix, which will allow me to keep the original file intact. >From a responses to a previous post, this seems as if it may have something to do with a python dictionary, but I'm not sure. I'm probably totally off on how this should even be written, so any advice or suggestions would be greatly appreciated. Thanks in advance for your help! -DS EDIT: I posted this to the NUMPY list because my writing the new line of text is really a numpy function. I apologize if this is an improper forum for this posting. -- View this message in context: http://old.nabble.com/NOOB-Alert%3A-Looping-Through-Text-Files...-tp30676099p30676099.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From zachary.pincus at yale.edu Fri Jan 14 19:09:38 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 14 Jan 2011 19:09:38 -0500 Subject: [Numpy-discussion] NOOB Alert: Looping Through Text Files... In-Reply-To: <30676099.post@talk.nabble.com> References: <30676099.post@talk.nabble.com> Message-ID: > textlist = ["test1.txt", "test2.txt", "test3.txt"] > > for i in textlist: > text_file = open(textlist, "a") > text_file.write("\nI suck at Python and need help") > text_file.close() > > But, this doesn't work. It gives me the error: > > coercing to Unicode: need string or buffer, list found Yeah, it's probably the wrong list; still, easy enough to answer... You want this: text_file = open(i, "a") to open the individual file, as opposed to: text_file = open(textlist, "a") which tries to open the whole list of filenames. Python cannot figure out how to turn the list into a single (unicode) filename, hence the error. As for your wanting to write files with new names: for txtfile in txtlist: f = open(txtfile, 'r') data = f.read() f.close() fnew = open(get_new_name(txtfile), 'w') fnew.write(data) fnew.write('\nHelp is on the way.') fnew.close() where get_new_name() or whatever is defined appropriately to transform the old name ('test1.txt', say) into 'test1_appended.txt'... Zach From josef.pktd at gmail.com Fri Jan 14 22:31:38 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 Jan 2011 22:31:38 -0500 Subject: [Numpy-discussion] special function xlogy with 0log0=0 Message-ID: I'm trying to fix again a case with x*log(y) which was a converted log(0**0), where x and y should broadcast. Is it possible to get a special function for this. It shows up very often, and any cheap fix, e.g. term = x*np.log(y) term[(x==0)*(y==0)] = 0 raises a warning (which I now also see) x*log(y+1e-300) would be easier, but I don't know whether there are no cases where numerical precision deteriorates. In another version, Skipper clipped y What's the best way? Josef From ben.root at ou.edu Sat Jan 15 00:16:02 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 14 Jan 2011 23:16:02 -0600 Subject: [Numpy-discussion] NOOB Alert: Looping Through Text Files... In-Reply-To: <30676099.post@talk.nabble.com> References: <30676099.post@talk.nabble.com> Message-ID: On Friday, January 14, 2011, dstaley wrote: > > Warning, I am a python noob. ?Not only do I not know python, I really don't > know anything about programming outside of ArcInfo and the ancient AML > language. ?Regardless, here is my problem.... > > Let's say I have three text files (test1.txt, test2.txt and test3.txt). > Each text file has 1 line of text in it "This is my text file", to which I > want to add (append) a new line of text saying "I suck at Python and need > help", and then save the file with a suffix attached (eg test1_modified.txt, > test2_modified.txt, test3_modified.txt). > > I guess this is the equivalent of making a change in MS Word and using the > "Save As..." command to maintain the integrity of the original file, and > save the changes in a new file. ?But, I want to also do that in a loop (this > is a simplified example of something I want to do with hundreds of text > files). > > Now, I understand how to add this line to an existing text file: > > text_file = open("test1.txt", "a") > text_file.write("\nI suck at Python and need help") > text_file.close() > > While this adds a line of text, it saves the change to the original file > (does not add the _modified.txt suffix to the file name), nor does it allow > me to loop through all three of the files. > > I'm sure this is an easy thing to do, and is online in a million places. > Unfortunately, I just cannot seem to find the answer. ?Here is my thought > process: > > First i would define the list from which I would loop: > > textlist = ["test1.txt", "test2.txt", "test3.txt"] > > for i in textlist: > ?? ? ? ?text_file = open(textlist, "a") > ?? ? ? ?text_file.write("\nI suck at Python and need help") > ?? ? ? ?text_file.close() > > But, this doesn't work. ?It gives me the error: > > coercing to Unicode: need string or buffer, list found > > SO, I guess I need to do this from something other than a list? > > Even if it did work, it does not suit my needs as it does not create a new > file and does not allow me to add the _modified.txt suffix, which will allow > me to keep the original file intact. > > >From a responses to a previous post, this seems as if it may have something > to do with a python dictionary, but I'm not sure. > > I'm probably totally off on how this should even be written, so any advice > or suggestions would be greatly appreciated. > > Thanks in advance for your help! > > -DS > DS, First, the problem with your loop is that you should pass 'i' not 'textlist' to the call to open. Second, to do what you want, you need to copy the file (maybe something in is.sys?) and then append the text to that copied file. I hope that helps > -- > View this message in context: http://old.nabble.com/NOOB-Alert%3A-Looping-Through-Text-Files...-tp30676099p30676099.html > Sent from the Numpy-discussion mailing list archive at Nabble.com. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From imhexyl at gmail.com Sat Jan 15 04:15:07 2011 From: imhexyl at gmail.com (Hexyl Chan) Date: Sat, 15 Jan 2011 17:15:07 +0800 Subject: [Numpy-discussion] Is Savitzky-Golay filter provided by cookbook different from matlab's? Message-ID: I have used the Savitzky-Golay filter provided by cookbook, but the result is different from the one given by matlab function sgolayfilt(x,2,15). I attached the numpy code below: #! /usr/bin/env python # -*- coding:utf-8 -*- import numpy as np def savitzky_golay(y, window_size, order, deriv=0): r"""Smooth (and optionally differentiate) data with a Savitzky-Golay filter. The Savitzky-Golay filter removes high frequency noise from data. It has the advantage of preserving the original shape and features of the signal better than other types of filtering approaches, such as moving averages techhniques. Parameters ---------- y : array_like, shape (N,) the values of the time history of the signal. window_size : int the length of the window. Must be an odd integer number. order : int the order of the polynomial used in the filtering. Must be less then `window_size` - 1. deriv: int the order of the derivative to compute (default = 0 means only smoothing) Returns ------- ys : ndarray, shape (N) the smoothed signal (or it's n-th derivative). Notes ----- The Savitzky-Golay is a type of low-pass filter, particularly suited for smoothing noisy data. The main idea behind this approach is to make for each point a least-square fit with a polynomial of high order over a odd-sized window centered at the point. Examples -------- t = np.linspace(-4, 4, 500) y = np.exp( -t**2 ) + np.random.normal(0, 0.05, t.shape) ysg = savitzky_golay(y, window_size=31, order=4) import matplotlib.pyplot as plt plt.plot(t, y, label='Noisy signal') plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal') plt.plot(t, ysg, 'r', label='Filtered signal') plt.legend() plt.show() References ---------- .. [1] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 1964, 36 (8), pp 1627-1639. .. [2] Numerical Recipes 3rd Edition: The Art of Scientific Computing W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery Cambridge University Press ISBN-13: 9780521880688 """ try: window_size = np.abs(np.int(window_size)) order = np.abs(np.int(order)) except ValueError, msg: raise ValueError("window_size and order have to be of type int") if window_size % 2 != 1 or window_size < 1: raise TypeError("window_size size must be a positive odd number") if window_size < order + 2: raise TypeError("window_size is too small for the polynomials order") order_range = range(order+1) half_window = (window_size -1) // 2 # precompute coefficients b = np.mat([[k**i for i in order_range] for k in range(-half_window, half_window+1)]) m = np.linalg.pinv(b).A[deriv] # pad the signal at the extremes with # values taken from the signal itself firstvals = y[0] - np.abs( y[1:half_window+1][::-1] - y[0] ) lastvals = y[-1] + np.abs(y[-half_window-1:-1][::-1] - y[-1]) y = np.concatenate((firstvals, y, lastvals)) return np.convolve( m, y, mode='valid') if __name__ == "__main__": rrstype = np.dtype({'names':['wavelenth', 'rrs'], 'formats':['f','f']}) record_list = [] record_file = open("200607NO1.txt") for line in record_file: item_list = line.strip().split('\t') record_list.append((item_list[0], item_list[1])) record_file.close() record_array = np.array(record_list,dtype=rrstype) rrs_array = record_array["rrs"] rrs_array_2 = savitzky_golay(rrs_array, 15, 2) rrs_array_2.tofile("savitzky-golay-result.txt", "\n") -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Jan 15 08:28:57 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 15 Jan 2011 21:28:57 +0800 Subject: [Numpy-discussion] segfault on complex array on solaris x86 In-Reply-To: References: Message-ID: I've opened http://projects.scipy.org/numpy/ticket/1713 so this doesn't get lost. Ralf On Thu, Jan 6, 2011 at 12:27 AM, John Hunter wrote: > johnh at udesktop253:~> gcc --version > gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath) > Copyright (C) 2004 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > johnh at udesktop253:~> uname -a > SunOS udesktop253 5.10 Generic_142910-17 i86pc i386 i86pc > > johnh at udesktop253:~> cat test.py > import numpy as np > print np.__version__ > fs = 1000 > t = np.linspace(0, 0.3, 301) > A = np.array([2, 8]).reshape(-1, 1) > f = np.array([150, 140]).reshape(-1, 1) > xn = (A * np.exp(2j * np.pi * f * t)).sum(axis=0) > > johnh at udesktop253:~> python test.py > 2.0.0.dev-9451260 > Segmentation Fault (core dumped) > johnh at udesktop253:~> > > johnh at udesktop253:~> sudo pstack /var/core/core.python.957 > core '/var/core/core.python.957' of 9397: python test.py > febf1928 cexp (0, 0, 0, 0, 8060ab0, 84321ac) + 1b0 > fe9657e0 npy_cexp (80458e0, 0, 0, 0, 0, 84e2530) + 30 > fe95064f nc_exp (8045920, 84e72a0, 8045978, 8045920, 10, 10) + 3f > fe937d5b PyUFunc_D_D (84e2530, 84e20f4, 84e25b0, fe950610, 1, 0) + 5b > fe95e818 PyUFunc_GenericFunction (81e96e0, 807deac, 0, 80460b8, 2, 2) + > 448 > fe95fb10 ufunc_generic_call (81e96e0, 807deac, 0, fe98a820) + 70 > feeb2d78 PyObject_Call (81e96e0, 807deac, 0, 80a24ec, 8061c08, 0) + 28 > fef11900 PyEval_EvalFrame (80a2394, 81645a0, 8079824, 8079824) + 146c > fef17708 PyEval_EvalCodeEx (81645a0, 8079824, 8079824, 0, 0, 0) + 620 > fef178af PyEval_EvalCode (81645a0, 8079824, 8079824, 8061488, fef3d9ee, 0) > + 2f > fef3d095 PyRun_FileExFlags (feb91c98, 804687b, 101, 8079824, 8079824, 1) + > 75 > fef3d9ee PyRun_SimpleFileExFlags (feb91c98, 804687b, 1, 80465a8, fef454a1, > 804687b) + 172 > fef3e4fd PyRun_AnyFileExFlags (feb91c98, 804687b, 1, 80465a8) + 61 > fef454a1 Py_Main (1, 80466b8, feb1cf35, fea935a1, 29, feb96750) + 9d9 > 08050862 main (2, 80466b8, 80466c4) + 22 > 08050758 _start (2, 8046874, 804687b, 0, 8046883, 80468ad) + 60 > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jan 15 15:27:26 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 Jan 2011 15:27:26 -0500 Subject: [Numpy-discussion] compatibility for supporting more than 2 versions of numpy Message-ID: After upgrading to numpy 1.5.1 I got caught by some depreciated features. Given the depreciation policy of numpy, if we want to support more than two versions of numpy, then we need some conditional execution. Does anyone have any compatibility functions? I haven't looked at it carefully yet, but statsmodels might need things like the following if we want to support numpy 1.3 if np.__version__ < '1.5': freq,hsupp = np.histogram(rvs, histsupp, new=True) else: freq,hsupp = np.histogram(rvs,histsupp) matplotlib says it supports numpy >=1.1 but I didn't see any compatibility code that I could "borrow". Or do I worry for nothing? The compatibility.py in statsmodels is still almost empty. Josef From ralf.gommers at googlemail.com Sun Jan 16 00:22:02 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 16 Jan 2011 13:22:02 +0800 Subject: [Numpy-discussion] Prime size FFT: bluestein transform vs general chirp/z transform ? In-Reply-To: References: Message-ID: On Mon, Jan 3, 2011 at 2:46 PM, David Cournapeau wrote: > Hi, > > I finally took the time to clean up my code to speed up prime-size FFT > (which use a O(N^2) algo in both numpy and scipy). The code is there: > https://github.com/cournape/numpy/tree/bluestein (most of the code is > tests, because numpy.fft had almost none). > Bottom line: it is used only for prime numbers, and is faster than the > current code for complex transforms > 500. Because of python + > inherent bluestein overhead, this is mostly useful for "long" fft > (where the speed up is significant - already 100x speed up for prime > size ~ 50000). > Very nice, works like a charm for me! > > Several comments: > - the overhead is pretty significant (on my machine, bluestein > transfrom is slower for prime size < 500) > - it could be used as such for real transforms, but the overhead > would be even more significant (there is no bluestein transform for > real transforms, so one needs to re-rexpress real transforms in term > of complex ones, multiplying the overhead by 2x). There are several > alternatives to make things faster (Rader-like transform, as used by > fftw), but I think this would be quite hard to do in python without > significant slowdown, because the code cannot be vectorized. > - one could also decide to provide a chirp-z transform, of which > Bluestein transform is a special case. Maybe this is more adapted to > scipy ? > This is just terminology, but according to Wikipedia the Bluestein transform is the chirp-z transform, which is a special case of the z-transform. Is that what you meant? A z-transform may also be useful for digital filter design and other applications, scipy seems like the right place for it. > - more generic code will require a few simple (but not trivial) > arithmetic-like functions (find prime factors, find generator of Z/nZ > groups with n prime, etc...). Where should I put those ? > > I'm guessing you are talking about code that allows you to use the Bluestein algorithm also for non-prime sizes where it makes sense, for example to speed up the second case of this: In [24]: x = np.random.random(5879) # a large prime In [25]: %timeit np.fft.fft(x) 100 loops, best of 3: 8.65 ms per loop In [26]: x = np.random.random(5879*2) # Bluestein not used In [27]: %timeit np.fft.fft(x) 1 loops, best of 3: 241 ms per loop Probably just keep it in fft/helper.py is it's not too much code? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ater1980 at gmail.com Sun Jan 16 03:24:26 2011 From: ater1980 at gmail.com (Alex Ter-Sarkissov) Date: Sun, 16 Jan 2011 21:24:26 +1300 Subject: [Numpy-discussion] float conversion Message-ID: hi every1, I got the following issue: I wrote a function that converts binary strings into a decimal value (binary expansion). When I write type(x) to find out the type of the value I get NoneType. Therefore I can't convert it into anything else, such as float numbers, since command float(x) returns TypeError: float() argument must be a string or a number Any ideas what to do with this (e.g. convert to floating numbers)? cheers, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckkart at hoc.net Sun Jan 16 06:05:24 2011 From: ckkart at hoc.net (Christian K.) Date: Sun, 16 Jan 2011 12:05:24 +0100 Subject: [Numpy-discussion] float conversion In-Reply-To: References: Message-ID: Am 16.01.11 09:24, schrieb Alex Ter-Sarkissov: > hi every1, > > I got the following issue: I wrote a function that converts binary > strings into a decimal value (binary expansion). When I write > > type(x) > > to find out the type of the value I get NoneType. Therefore I can't Your function most probably returns None. Show us your code and we will able to help. Regards, Christian From hector1618 at gmail.com Sun Jan 16 08:09:41 2011 From: hector1618 at gmail.com (Hector troy) Date: Sun, 16 Jan 2011 18:39:41 +0530 Subject: [Numpy-discussion] Getting a clone copy of the NumPy repository. Message-ID: Hello everyone, I am a newbie on this open source world, and sincerely trying to make contribution to the development of Numpy. I was trying to learn about making patches from http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html but unable to get the clone of Numpy repository. In the terminal error massage shown is - $hector at hector:~$ sudo git clone git://github.com/numpy/numpy.git [sudo] password for hector: Initialized empty Git repository in /home/hector/numpy/.git/ github.com[0: 207.97.227.239]: errno=Connection timed out fatal: unable to connect a socket (Connection timed out) $hector at hector:~$ My internet is quite good at the moment but unable to understand why am I getting this error. Any help regarding this will be extremely helpful and encouraging for me. Thanking you in anticipation. -- -Regards Hector Whenever you think you can or you can't, in either way you are right. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Jan 16 10:01:29 2011 From: cournape at gmail.com (David Cournapeau) Date: Mon, 17 Jan 2011 00:01:29 +0900 Subject: [Numpy-discussion] Getting a clone copy of the NumPy repository. In-Reply-To: References: Message-ID: On Sun, Jan 16, 2011 at 10:09 PM, Hector troy wrote: > Hello everyone, I am a newbie on this open source world, and sincerely > trying to make contribution to the development of Numpy. I was trying to > learn about making patches from > http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html? but unable to get > the clone of Numpy repository. > > In the terminal error massage shown is - > > ??? $hector at hector:~$ sudo git clone git://github.com/numpy/numpy.git > ??? [sudo] password for hector: > ??? Initialized empty Git repository in /home/hector/numpy/.git/ > ??? github.com[0: 207.97.227.239]: errno=Connection timed out > ??? fatal: unable to connect a socket (Connection timed out) > ??? $hector at hector:~$ You should try through http: it may be that your network blocks the git protocol/port: git clone http://github.com/numpy/numpy.git You should also avoid using sudo to clone repositories (although it is unlikely to be the cause of your issue). cheers, David From hector1618 at gmail.com Sun Jan 16 10:19:19 2011 From: hector1618 at gmail.com (Hector troy) Date: Sun, 16 Jan 2011 20:49:19 +0530 Subject: [Numpy-discussion] Getting a clone copy of the NumPy repository. In-Reply-To: References: Message-ID: On Sun, Jan 16, 2011 at 8:31 PM, David Cournapeau wrote: > On Sun, Jan 16, 2011 at 10:09 PM, Hector troy > wrote: > > Hello everyone, I am a newbie on this open source world, and sincerely > > trying to make contribution to the development of Numpy. I was trying to > > learn about making patches from > > http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html but unable to > get > > the clone of Numpy repository. > > > > In the terminal error massage shown is - > > > > $hector at hector:~$ sudo git clone git://github.com/numpy/numpy.git > > [sudo] password for hector: > > Initialized empty Git repository in /home/hector/numpy/.git/ > > github.com[0: 207.97.227.239]: errno=Connection timed out > > fatal: unable to connect a socket (Connection timed out) > > $hector at hector:~$ > > You should try through http: it may be that your network blocks the > git protocol/port: > > git clone http://github.com/numpy/numpy.git > > You should also avoid using sudo to clone repositories (although it is > unlikely to be the cause of your issue). > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Mr. David, Thanks a lot, it worked. Hope you will help me with future quarries. Thanks again, -- -Regards Hector Whenever you think you can or you can't, in either way you are right. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zauborg at yahoo.com Sun Jan 16 13:50:12 2011 From: zauborg at yahoo.com (zb) Date: Sun, 16 Jan 2011 10:50:12 -0800 (PST) Subject: [Numpy-discussion] defmatrix 1.3 versus 1.5 Message-ID: <962629.93144.qm@web121604.mail.ne1.yahoo.com> Hi I am trying to compile a py2exe app. It works with numpy 1.3 but if I try numpy 1.5 I get an error when loading the app.exe Traceback (most recent call last): File "artisan.pyw", line 109, in File "numpy\__init__.pyo", line 136, in File "numpy\add_newdocs.pyo", line 9, in File "numpy\lib\__init__.pyo", line 5, in File "numpy\lib\index_tricks.pyo", line 15, in ImportError: No module named defmatrix I've noticed that there is a change in the location of defmatrix from 1.3 to the newer 1.5. 1.3 numpy/core/defmatrix 1.5 numpy/matrixlib/defmatrix How could I make py2exe work? Any tips? Thanks From cycomanic at gmail.com Sun Jan 16 18:23:59 2011 From: cycomanic at gmail.com (=?ISO-8859-1?Q?Jochen_Schr=F6der?=) Date: Mon, 17 Jan 2011 10:23:59 +1100 Subject: [Numpy-discussion] NOOB Alert: Looping Through Text Files... In-Reply-To: <30676099.post@talk.nabble.com> References: <30676099.post@talk.nabble.com> Message-ID: <4D337E0F.3070206@gmail.com> On 15/01/11 09:52, dstaley wrote: > > Warning, I am a python noob. Not only do I not know python, I really don't > know anything about programming outside of ArcInfo and the ancient AML > language. Regardless, here is my problem.... > > Let's say I have three text files (test1.txt, test2.txt and test3.txt). > Each text file has 1 line of text in it "This is my text file", to which I > want to add (append) a new line of text saying "I suck at Python and need > help", and then save the file with a suffix attached (eg test1_modified.txt, > test2_modified.txt, test3_modified.txt). > > I guess this is the equivalent of making a change in MS Word and using the > "Save As..." command to maintain the integrity of the original file, and > save the changes in a new file. But, I want to also do that in a loop (this > is a simplified example of something I want to do with hundreds of text > files). > > Now, I understand how to add this line to an existing text file: > > text_file = open("test1.txt", "a") > text_file.write("\nI suck at Python and need help") > text_file.close() > > While this adds a line of text, it saves the change to the original file > (does not add the _modified.txt suffix to the file name), nor does it allow > me to loop through all three of the files. > > I'm sure this is an easy thing to do, and is online in a million places. > Unfortunately, I just cannot seem to find the answer. Here is my thought > process: > > First i would define the list from which I would loop: > > textlist = ["test1.txt", "test2.txt", "test3.txt"] > > for i in textlist: > text_file = open(textlist, "a") ^^^^^^^^ This is your problem. You create the textfile list, and then loop over the list. Now the is are your elements of the list, however you are passing the list to the open function. That is what the error says, it expects a string but found a list. The better way to do this would be: for filename in textlist: text_file = open(filename, "a") ... > text_file.write("\nI suck at Python and need help") > text_file.close() > > But, this doesn't work. It gives me the error: > > coercing to Unicode: need string or buffer, list found > > SO, I guess I need to do this from something other than a list? > > Even if it did work, it does not suit my needs as it does not create a new > file and does not allow me to add the _modified.txt suffix, which will allow > me to keep the original file intact. There are a number of ways to to do this and what the best way is might depend on your text files. If they are very short the easiest way is to just read the content of your text files and write the content to a different file. something like this: for fn in textlist: fp = open(fn, 'r') fp.close() s = fp.read() s += "I suck at Python and need help") fp_new = open(fn[:-4]+'_modified.txt','w') fp_new.write(s) fp_new.close() Just for the future this is the numpy list, which is for discussing issues relating to the numpy python module, the next time you might want to post a question like this to one of the python beginners lists. This link might get you started: http://wiki.python.org/moin/BeginnersGuide Cheers Jochen > >> From a responses to a previous post, this seems as if it may have something > to do with a python dictionary, but I'm not sure. > > I'm probably totally off on how this should even be written, so any advice > or suggestions would be greatly appreciated. > > Thanks in advance for your help! > > -DS > From ater1980 at gmail.com Mon Jan 17 02:25:45 2011 From: ater1980 at gmail.com (Alex Ter-Sarkissov) Date: Mon, 17 Jan 2011 20:25:45 +1300 Subject: [Numpy-discussion] http://mail.scipy.org/pipermail/numpy-discussion/2011-January/054512.html Message-ID: hi thanks I've sorted ou the issue Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.holderness at newcastle.ac.uk Mon Jan 17 07:35:56 2011 From: tom.holderness at newcastle.ac.uk (Tom Holderness) Date: Mon, 17 Jan 2011 12:35:56 +0000 Subject: [Numpy-discussion] find machine maximum numpy array size In-Reply-To: <8643A9A5465D9C4F8C6DB03185B82A277240F0687E@EXSAN01.campus.ncl.ac.uk> References: <8643A9A5465D9C4F8C6DB03185B82A277240F0687A@EXSAN01.campus.ncl.ac.uk> <8643A9A5465D9C4F8C6DB03185B82A277240F0687E@EXSAN01.campus.ncl.ac.uk> Message-ID: <8643A9A5465D9C4F8C6DB03185B82A277240F0687F@EXSAN01.campus.ncl.ac.uk> Hi, How do I find the maximum possible array size for a given data type on a given architecture? For example if I do the following on a 32-bit Windows machine: matrix = np.zeros((8873,9400),np.dtype('f8')) I get, Traceback (most recent call last): File "", line 1, in matrix = np.zeros((8873,9400),np.dtype('f8')) MemoryError If I reduce the matrix size then it works. However, if I run the original command on an equivalent 32-bit Linux machine this works fine (presumably some limit of memory allocation in the Windows kernel? I tested increasing the available RAM and it doesn't solve the problem). Is there a way I can find this limit? When distributing software to users (who all run different architectures) it would be great if we could check this before running the process and catch the error before the user hits "run". Many thanks in advance, Tom From numpy-discussion at maubp.freeserve.co.uk Mon Jan 17 09:05:41 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Mon, 17 Jan 2011 14:05:41 +0000 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? Message-ID: Hi all, Are there plans to provide official 64bit Windows installers for NumPy? I'm aware that Christoph Gohlke had been able to do this, since he offers unofficial plain builds and MKL builds for NumPy here: http://www.lfd.uci.edu/~gohlke/pythonlibs/ Regards, Peter From robert.kern at gmail.com Mon Jan 17 10:29:27 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 17 Jan 2011 09:29:27 -0600 Subject: [Numpy-discussion] find machine maximum numpy array size In-Reply-To: <8643A9A5465D9C4F8C6DB03185B82A277240F0687F@EXSAN01.campus.ncl.ac.uk> References: <8643A9A5465D9C4F8C6DB03185B82A277240F0687A@EXSAN01.campus.ncl.ac.uk> <8643A9A5465D9C4F8C6DB03185B82A277240F0687E@EXSAN01.campus.ncl.ac.uk> <8643A9A5465D9C4F8C6DB03185B82A277240F0687F@EXSAN01.campus.ncl.ac.uk> Message-ID: On Mon, Jan 17, 2011 at 06:35, Tom Holderness wrote: > Hi, > > How do I find the maximum possible array size for a given data type on a given architecture? > For example if I do the following on a 32-bit Windows machine: > > matrix = np.zeros((8873,9400),np.dtype('f8')) > > I get, > Traceback (most recent call last): > ?File "", line 1, in > ? ?matrix = np.zeros((8873,9400),np.dtype('f8')) > MemoryError > > If I reduce the matrix size then it works. > However, if I run the original command on an equivalent 32-bit Linux machine this works fine (presumably some limit of memory allocation in the Windows kernel? I tested increasing the available RAM and it doesn't solve the problem). > > Is there a way I can find this limit? When distributing software to users (who all run different architectures) it would be great if we could check this before running the process and catch the error before the user hits "run". No, there is no way to know a priori. It's not just dependent on the system, but also on where memory is currently allocated. You could in principle determine the maximum size of the address space from the system. However, if you have, say, 3 Gb of address space for your process, and you allocate two blocks of 1 Gb each, those allocations may not be right next to each other. If the blocks start on 0.5 Gb and 2.0 Gb respectively, there is 1 Gb of free address space, but broken up into two 0.5 Gb blocks. You could not allocate a third 1 Gb block because there is nowhere to put it. You can detect this problem early by doing an np.empty() of the right size early in the program. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From domors at gmx.net Mon Jan 17 11:02:43 2011 From: domors at gmx.net (Stefan Reiterer) Date: Mon, 17 Jan 2011 17:02:43 +0100 Subject: [Numpy-discussion] Strange behaviour with for loops + numpy arrays Message-ID: <20110117160243.164370@gmx.net> Hi all! I made some "performance" tests with numpy to compare numpy on one cpu with mpi on 4 processesors, and something appears quite strange to me: I have the following code: N = 2**10*4 K = 16000 x = numpy.random.randn(N).astype(numpy.float32) x *= 10**10 print "x:", x t1 = time.time() #do something... for k in xrange(K): x *= 0.99 print "altered x:", x t = time.time() - t1 print "# loops:", K, "time needed:", t, " s " # loops: 1000 time needed: 0.0134310722351 s # loops: 2000 time needed: 0.028107881546 s # loops: 4000 time needed: 0.0367569923401 s # loops: 8000 time needed: 0.075756072998 s # loops: 16000 time needed: 2.11396384239 s So for K = 16000 it didn't need twice the amount of time as expected, it took 20 x more time! After that jump it seem to "normalize" # loops: 32000 time needed: 8.25508499146 s # loops: 64000 time needed: 20.5365290642 s First I suspected xrange was the culprit, but if I tried k = 0 while k < K: x *= 0.99 it changed anything. When I tried simply a=0 for k in xrange(K): a = a+1 none of the effects above triggered, so I suspect that numpy has to be involved. My Hardware is 2.3 GHz Intel Dual Core, 2 GB Ram and Ubuntu 10.04. For my tests I tried it with Python 2.6, and Sage 4.6. (which uses 2.6 too) Also changing the size of arrays or changing the computer didn't help. Has anyone an Idea what had could happen? Kind regards, maldun -- Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de From opossumnano at gmail.com Mon Jan 17 11:01:28 2011 From: opossumnano at gmail.com (Tiziano Zito) Date: Mon, 17 Jan 2011 17:01:28 +0100 Subject: [Numpy-discussion] MDP release 3.0 Message-ID: <20110117160128.GC25627@tulpenbaum.cognition.tu-berlin.de> We are glad to announce release 3.0 of the Modular toolkit for Data Processing (MDP). MDP is a Python library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software. The base of available algorithms includes signal processing methods (Principal Component Analysis, Independent Component Analysis, Slow Feature Analysis), manifold learning methods ([Hessian] Locally Linear Embedding), several classifiers, probabilistic methods (Factor Analysis, RBM), data pre-processing methods, and many others. What's new in version 3.0? -------------------------- - Python 3 support - New extensions: caching and gradient - Automatically generated wrappers for scikits.learn algorithms - Shogun and libsvm wrappers - New algorithms: convolution, several classifiers and several user-contributed nodes - Several new examples on the homepage - Improved and expanded tutorial - Several improvements and bug fixes - New license: MDP goes BSD! Resources --------- Download: http://sourceforge.net/projects/mdp-toolkit/files Homepage: http://mdp-toolkit.sourceforge.net Mailing list: http://lists.sourceforge.net/mailman/listinfo/mdp-toolkit-users Acknowledgments --------------- We thank the contributors to this release: Sven D?hne, Alberto Escalante, Valentin Haenel, Yaroslav Halchenko, Sebastian H?fer, Michael Hull, Samuel John, Jos? Quesada, Ariel Rokem, Benjamin Schrauwen, David Verstraeten, Katharina Maria Zeiner. The MDP developers, Pietro Berkes Zbigniew J?drzejewski-Szmek Rike-Benjamin Schuppner Niko Wilbert Tiziano Zito From josef.pktd at gmail.com Mon Jan 17 11:28:56 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Jan 2011 11:28:56 -0500 Subject: [Numpy-discussion] compatibility for supporting more than 2 versions of numpy In-Reply-To: References: Message-ID: On Sat, Jan 15, 2011 at 3:27 PM, wrote: > After upgrading to numpy 1.5.1 I got caught by some depreciated > features. Given the depreciation policy of numpy, if we want to > support more than two versions of numpy, then we need some conditional > execution. > > Does anyone have any compatibility functions? > > I haven't looked at it carefully yet, but statsmodels might need > things like the following if we want to support numpy 1.3 > > ? ?if np.__version__ < '1.5': > ? ? ? ?freq,hsupp = np.histogram(rvs, histsupp, new=True) > ? ?else: > ? ? ? ?freq,hsupp = np.histogram(rvs,histsupp) > > matplotlib says it supports numpy >=1.1 but I didn't see any > compatibility code that I could "borrow". > Or do I worry for nothing? The compatibility.py in statsmodels is > still almost empty. for scipy.linalg, in numdifftools, I changed this in core (in my copy) if numpy.__version__ < '1.5': [qromb,rromb] = linalg.qr(rmat, econ=True) else: [qromb,rromb] = linalg.qr(rmat, mode='economic') > > Josef > From josef.pktd at gmail.com Mon Jan 17 11:32:12 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Jan 2011 11:32:12 -0500 Subject: [Numpy-discussion] compatibility for supporting more than 2 versions of numpy In-Reply-To: References: Message-ID: On Mon, Jan 17, 2011 at 11:28 AM, wrote: > On Sat, Jan 15, 2011 at 3:27 PM, ? wrote: >> After upgrading to numpy 1.5.1 I got caught by some depreciated >> features. Given the depreciation policy of numpy, if we want to >> support more than two versions of numpy, then we need some conditional >> execution. >> >> Does anyone have any compatibility functions? >> >> I haven't looked at it carefully yet, but statsmodels might need >> things like the following if we want to support numpy 1.3 >> >> ? ?if np.__version__ < '1.5': >> ? ? ? ?freq,hsupp = np.histogram(rvs, histsupp, new=True) >> ? ?else: >> ? ? ? ?freq,hsupp = np.histogram(rvs,histsupp) >> >> matplotlib says it supports numpy >=1.1 but I didn't see any >> compatibility code that I could "borrow". >> Or do I worry for nothing? The compatibility.py in statsmodels is >> still almost empty. > > > for scipy.linalg, in numdifftools, I changed this in core (in my copy) > > ? ? ? ?if numpy.__version__ < '1.5': > ? ? ? ? ? ?[qromb,rromb] = linalg.qr(rmat, econ=True) > ? ? ? ?else: > ? ? ? ? ? ?[qromb,rromb] = linalg.qr(rmat, mode='economic') > which is of course silly, since this is for the scipy update >> >> Josef >> > From bsouthey at gmail.com Mon Jan 17 12:18:14 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 17 Jan 2011 11:18:14 -0600 Subject: [Numpy-discussion] compatibility for supporting more than 2 versions of numpy In-Reply-To: References: Message-ID: <4D3479D6.5020803@gmail.com> On 01/17/2011 10:32 AM, josef.pktd at gmail.com wrote: > On Mon, Jan 17, 2011 at 11:28 AM, wrote: >> On Sat, Jan 15, 2011 at 3:27 PM, wrote: >>> After upgrading to numpy 1.5.1 I got caught by some depreciated >>> features. Given the depreciation policy of numpy, if we want to >>> support more than two versions of numpy, then we need some conditional >>> execution. >>> >>> Does anyone have any compatibility functions? >>> >>> I haven't looked at it carefully yet, but statsmodels might need >>> things like the following if we want to support numpy 1.3 >>> >>> if np.__version__< '1.5': >>> freq,hsupp = np.histogram(rvs, histsupp, new=True) >>> else: >>> freq,hsupp = np.histogram(rvs,histsupp) >>> >>> matplotlib says it supports numpy>=1.1 but I didn't see any >>> compatibility code that I could "borrow". >>> Or do I worry for nothing? The compatibility.py in statsmodels is >>> still almost empty. >> >> for scipy.linalg, in numdifftools, I changed this in core (in my copy) >> >> if numpy.__version__< '1.5': >> [qromb,rromb] = linalg.qr(rmat, econ=True) >> else: >> [qromb,rromb] = linalg.qr(rmat, mode='economic') >> > which is of course silly, since this is for the scipy update >>> Josef Scipy release notes usually state the supported numpy version eg from the current 0.8.0 release notes "This release requires Python 2.4 - 2.6 and NumPy 1.4.1 or greater." Consequently if you want to support different numpy versions, then you will need to maintain your own branch with that type of patch. That can get rather complex to maintain. It would be better that you change the code calling numpy/scipy functions rather than the functions themselves such as passing the appropriate *args and **kwargs to the function. I would expect that a try/except block would be more general as well as numpy.__version__ being a str. Bruce From faltet at pytables.org Mon Jan 17 12:22:17 2011 From: faltet at pytables.org (Francesc Alted) Date: Mon, 17 Jan 2011 18:22:17 +0100 Subject: [Numpy-discussion] Strange behaviour with for loops + numpy arrays In-Reply-To: <20110117160243.164370@gmx.net> References: <20110117160243.164370@gmx.net> Message-ID: <201101171822.17398.faltet@pytables.org> A Monday 17 January 2011 17:02:43 Stefan Reiterer escrigu?: > Hi all! > > I made some "performance" tests with numpy to compare numpy on one > cpu with mpi on 4 processesors, and something appears quite strange > to me: > > I have the following code: > > N = 2**10*4 > K = 16000 > > x = numpy.random.randn(N).astype(numpy.float32) > x *= 10**10 > print "x:", x > t1 = time.time() > > #do something... > for k in xrange(K): > x *= 0.99 > > print "altered x:", x > > t = time.time() - t1 > print "# loops:", K, "time needed:", t, " s " > > # loops: 1000 time needed: 0.0134310722351 s > # loops: 2000 time needed: 0.028107881546 s > # loops: 4000 time needed: 0.0367569923401 s > # loops: 8000 time needed: 0.075756072998 s > # loops: 16000 time needed: 2.11396384239 s > > So for K = 16000 it didn't need twice the amount of time as expected, > it took 20 x more time! After that jump it seem to "normalize" > # loops: 32000 time needed: 8.25508499146 s > # loops: 64000 time needed: 20.5365290642 s > > First I suspected xrange was the culprit, but if I tried > k = 0 > while k < K: > x *= 0.99 > > it changed anything. > When I tried simply > a=0 > for k in xrange(K): > a = a+1 > > none of the effects above triggered, so I suspect that numpy has to > be involved. My Hardware is 2.3 GHz Intel Dual Core, 2 GB Ram and > Ubuntu 10.04. For my tests I tried it with Python 2.6, and Sage 4.6. > (which uses 2.6 too) > > Also changing the size of arrays or changing the computer didn't > help. > > Has anyone an Idea what had could happen? You are generating denormalized numbers: http://en.wikipedia.org/wiki/Denormal_number Many processors cannot deal efficiently with these beasts in hardware. You may want to convert these numbers to zero if you want more speed. -- Francesc Alted From ndbecker2 at gmail.com Mon Jan 17 13:12:38 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 17 Jan 2011 13:12:38 -0500 Subject: [Numpy-discussion] Prime size FFT: bluestein transform vs general chirp/z transform ? References: Message-ID: I just took a look at http://www.katjaas.nl/chirpZ/chirpZ2.html I'm VERY interested in the zoom. Does the code https://github.com/cournape/numpy/tree/bluestein implement the zoom feature? From zauborg at yahoo.com Mon Jan 17 13:21:05 2011 From: zauborg at yahoo.com (zb) Date: Mon, 17 Jan 2011 10:21:05 -0800 (PST) Subject: [Numpy-discussion] defmatrix 1.3 versus 1.5 In-Reply-To: <962629.93144.qm@web121604.mail.ne1.yahoo.com> Message-ID: <577377.49969.qm@web121601.mail.ne1.yahoo.com> I resolved the problem by commenting out two lines in my setup.py #"optimize":1, #"bundle_files": 2, The defmatrix lib was inside \lib\library.zip. However, the program.exe could not find it. Cheers --- On Sun, 1/16/11, zb wrote: > From: zb > Subject: [Numpy-discussion] defmatrix 1.3 versus 1.5 > To: numpy-discussion at scipy.org > Date: Sunday, January 16, 2011, 1:50 PM > Hi > > I am trying to compile a py2exe app. It works with numpy > 1.3 but if I try numpy 1.5 I get an error when loading the > app.exe > > Traceback (most recent call last): > ? File "artisan.pyw", line 109, in > ? File "numpy\__init__.pyo", line 136, in > > ? File "numpy\add_newdocs.pyo", line 9, in > > ? File "numpy\lib\__init__.pyo", line 5, in > > ? File "numpy\lib\index_tricks.pyo", line 15, in > > ImportError: No module named defmatrix > > I've noticed that there is a change in the location of > defmatrix from 1.3 to the newer 1.5. > > 1.3 numpy/core/defmatrix > > 1.5 numpy/matrixlib/defmatrix > > > How could I make py2exe work? Any tips? > > Thanks > > > > ? ? ? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Jan 17 13:27:08 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Jan 2011 13:27:08 -0500 Subject: [Numpy-discussion] compatibility for supporting more than 2 versions of numpy In-Reply-To: <4D3479D6.5020803@gmail.com> References: <4D3479D6.5020803@gmail.com> Message-ID: On Mon, Jan 17, 2011 at 12:18 PM, Bruce Southey wrote: > On 01/17/2011 10:32 AM, josef.pktd at gmail.com wrote: >> On Mon, Jan 17, 2011 at 11:28 AM, ?wrote: >>> On Sat, Jan 15, 2011 at 3:27 PM, ?wrote: >>>> After upgrading to numpy 1.5.1 I got caught by some depreciated >>>> features. Given the depreciation policy of numpy, if we want to >>>> support more than two versions of numpy, then we need some conditional >>>> execution. >>>> >>>> Does anyone have any compatibility functions? >>>> >>>> I haven't looked at it carefully yet, but statsmodels might need >>>> things like the following if we want to support numpy 1.3 >>>> >>>> ? ? if np.__version__< ?'1.5': >>>> ? ? ? ? freq,hsupp = np.histogram(rvs, histsupp, new=True) >>>> ? ? else: >>>> ? ? ? ? freq,hsupp = np.histogram(rvs,histsupp) >>>> >>>> matplotlib says it supports numpy>=1.1 but I didn't see any >>>> compatibility code that I could "borrow". >>>> Or do I worry for nothing? The compatibility.py in statsmodels is >>>> still almost empty. >>> >>> for scipy.linalg, in numdifftools, I changed this in core (in my copy) >>> >>> ? ? ? ? if numpy.__version__< ?'1.5': >>> ? ? ? ? ? ? [qromb,rromb] = linalg.qr(rmat, econ=True) >>> ? ? ? ? else: >>> ? ? ? ? ? ? [qromb,rromb] = linalg.qr(rmat, mode='economic') >>> >> which is of course silly, since this is for the scipy update >>>> Josef > > Scipy release notes usually state the supported numpy version eg from > the current 0.8.0 release notes > "This release requires Python 2.4 - 2.6 and NumPy 1.4.1 or greater." > Consequently if you want to support different numpy versions, then you > will need to maintain your own branch with that type of patch. That can > get rather complex to maintain. I'm not doing the work of maintaining a scipy that conflicts with numpy. But *if* we want to support users that run numpy 1.3 with scipy 0.7, then we need to use different arguments for calls into numpy and scipy for depreciated and changed function arguments. > > It would be better that you change the code calling numpy/scipy > functions rather than the functions themselves such as passing the > appropriate *args and **kwargs to the function. > > I would expect that a try/except block would be more general as well as > numpy.__version__ being a str. Comparing strings is not a good idea, but I couldn't find anymore the function that parses a version string. As it might be obvious on the mailing list, I'm not a fan of frequent updates. With semi-annual releases, two versions only last a year. Josef > > > Bruce > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mmueller at python-academy.de Mon Jan 17 15:03:10 2011 From: mmueller at python-academy.de (=?ISO-8859-1?Q?Mike_M=FCller?=) Date: Mon, 17 Jan 2011 21:03:10 +0100 Subject: [Numpy-discussion] Scientific tools tutorial at PyCon US, March 9, 2011 In-Reply-To: References: Message-ID: <4D34A07E.4040705@python-academy.de> Scientific Python Tools not only for Scientists and Engineers ============================================================= This is the title of my three-hour tutorial at PyCon US: http://us.pycon.org/2011/schedule/sessions/164/ It is a compressed version of my much longer course about: * NumPy * SciPy * matplotlib/IPython * extensions with C and Fortran So if your are new to these tools and go to PyCon, you might consider taking the tutorial. Also, if you know somebody who would likely be interested in this tutorial, please spread the word. Thanks. Mike -- Mike M?ller mmueller at python-academy.de From domors at gmx.net Mon Jan 17 16:35:24 2011 From: domors at gmx.net (Stefan Reiterer) Date: Mon, 17 Jan 2011 22:35:24 +0100 Subject: [Numpy-discussion] Strange behaviour with for loops + numpy arrays In-Reply-To: <201101171822.17398.faltet@pytables.org> References: <20110117160243.164370@gmx.net> <201101171822.17398.faltet@pytables.org> Message-ID: <20110117213524.152930@gmx.net> Thanks that was the problem! You never stop to learn =) -------- Original-Nachricht -------- > Datum: Mon, 17 Jan 2011 18:22:17 +0100 > Von: Francesc Alted > An: Discussion of Numerical Python > Betreff: Re: [Numpy-discussion] Strange behaviour with for loops + numpy arrays > A Monday 17 January 2011 17:02:43 Stefan Reiterer escrigu?: > > Hi all! > > > > I made some "performance" tests with numpy to compare numpy on one > > cpu with mpi on 4 processesors, and something appears quite strange > > to me: > > > > I have the following code: > > > > N = 2**10*4 > > K = 16000 > > > > x = numpy.random.randn(N).astype(numpy.float32) > > x *= 10**10 > > print "x:", x > > t1 = time.time() > > > > #do something... > > for k in xrange(K): > > x *= 0.99 > > > > print "altered x:", x > > > > t = time.time() - t1 > > print "# loops:", K, "time needed:", t, " s " > > > > # loops: 1000 time needed: 0.0134310722351 s > > # loops: 2000 time needed: 0.028107881546 s > > # loops: 4000 time needed: 0.0367569923401 s > > # loops: 8000 time needed: 0.075756072998 s > > # loops: 16000 time needed: 2.11396384239 s > > > > So for K = 16000 it didn't need twice the amount of time as expected, > > it took 20 x more time! After that jump it seem to "normalize" > > # loops: 32000 time needed: 8.25508499146 s > > # loops: 64000 time needed: 20.5365290642 s > > > > First I suspected xrange was the culprit, but if I tried > > k = 0 > > while k < K: > > x *= 0.99 > > > > it changed anything. > > When I tried simply > > a=0 > > for k in xrange(K): > > a = a+1 > > > > none of the effects above triggered, so I suspect that numpy has to > > be involved. My Hardware is 2.3 GHz Intel Dual Core, 2 GB Ram and > > Ubuntu 10.04. For my tests I tried it with Python 2.6, and Sage 4.6. > > (which uses 2.6 too) > > > > Also changing the size of arrays or changing the computer didn't > > help. > > > > Has anyone an Idea what had could happen? > > You are generating denormalized numbers: > > http://en.wikipedia.org/wiki/Denormal_number > > Many processors cannot deal efficiently with these beasts in hardware. > You may want to convert these numbers to zero if you want more speed. > > -- > Francesc Alted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- GMX DSL Doppel-Flat ab 19,99 Euro/mtl.! Jetzt mit gratis Handy-Flat! http://portal.gmx.net/de/go/dsl From charlesr.harris at gmail.com Tue Jan 18 08:02:16 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Jan 2011 06:02:16 -0700 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: I believe the problem has been been 64 bit fortran for ATLAS, the mingw version has/had problems. A plain build using the MS compilers works fine without ATLAS. Chuck On Mon, Jan 17, 2011 at 7:05 AM, Peter < numpy-discussion at maubp.freeserve.co.uk> wrote: > Hi all, > > Are there plans to provide official 64bit Windows installers for NumPy? > > I'm aware that Christoph Gohlke had been able to do this, since > he offers unofficial plain builds and MKL builds for NumPy here: > http://www.lfd.uci.edu/~gohlke/pythonlibs/ > > Regards, > > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Jan 18 10:09:45 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 18 Jan 2011 23:09:45 +0800 Subject: [Numpy-discussion] compatibility for supporting more than 2 versions of numpy In-Reply-To: References: <4D3479D6.5020803@gmail.com> Message-ID: On Tue, Jan 18, 2011 at 2:27 AM, wrote: > On Mon, Jan 17, 2011 at 12:18 PM, Bruce Southey > wrote: > > > > Scipy release notes usually state the supported numpy version eg from > > the current 0.8.0 release notes > > "This release requires Python 2.4 - 2.6 and NumPy 1.4.1 or greater." > > Consequently if you want to support different numpy versions, then you > > will need to maintain your own branch with that type of patch. That can > > get rather complex to maintain. > > I'm not doing the work of maintaining a scipy that conflicts with numpy. > But *if* we want to support users that run numpy 1.3 with scipy 0.7, > then we need to use different arguments for calls into numpy and > scipy for depreciated and changed function arguments. > > > > > It would be better that you change the code calling numpy/scipy > > functions rather than the functions themselves such as passing the > > appropriate *args and **kwargs to the function. > > > > I would expect that a try/except block would be more general as well as > > numpy.__version__ being a str. > > Comparing strings is not a good idea, but I couldn't find anymore the > function that parses a version string. > There's parse_numpy_version in pavement.py. The relevant lines are: a = re.compile("^([0-9]+)\.([0-9]+)\.([0-9]+)") return tuple([int(i) for i in a.match(out).groups()[:3]]) > As it might be obvious on the mailing list, I'm not a fan of frequent > updates. With semi-annual releases, two versions only last a year. > > Maybe you don't like the deprecation policy, but how can frequent (if semi-annual can be called frequent) releases be a bad thing? No one likes to write code that doesn't get released for ages. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 18 10:22:58 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 18 Jan 2011 10:22:58 -0500 Subject: [Numpy-discussion] compatibility for supporting more than 2 versions of numpy In-Reply-To: References: <4D3479D6.5020803@gmail.com> Message-ID: On Tue, Jan 18, 2011 at 10:09 AM, Ralf Gommers wrote: > > > On Tue, Jan 18, 2011 at 2:27 AM, wrote: >> >> On Mon, Jan 17, 2011 at 12:18 PM, Bruce Southey >> wrote: >> > >> > Scipy release notes usually state the supported numpy version eg from >> > the current 0.8.0 release notes >> > "This release requires Python 2.4 - 2.6 and NumPy 1.4.1 or greater." >> > Consequently if you want to support different numpy versions, then you >> > will need to maintain your own branch with that type of patch. That can >> > get rather complex to maintain. >> >> I'm not doing the work of maintaining a scipy that conflicts with numpy. >> But *if* we want to support users that run numpy 1.3 with scipy 0.7, >> then we need to use different arguments for calls into numpy and >> scipy for depreciated and changed function arguments. >> >> > >> > It would be better that you change the code calling numpy/scipy >> > functions rather than the functions themselves such as passing the >> > appropriate *args and **kwargs to the function. >> > >> > I would expect that a try/except block would be more general as well as >> > numpy.__version__ being a str. >> >> Comparing strings is not a good idea, but I couldn't find anymore the >> function that parses a version string. > > There's parse_numpy_version in pavement.py. The relevant lines are: > ? a = re.compile("^([0-9]+)\.([0-9]+)\.([0-9]+)") > ? return tuple([int(i) for i in a.match(out).groups()[:3]]) > > >> >> As it might be obvious on the mailing list, I'm not a fan of frequent >> updates. With semi-annual releases, two versions only last a year. >> > Maybe you don't like the deprecation policy, but how can frequent (if > semi-annual can be called frequent) releases be a bad thing? No one likes to > write code that doesn't get released for ages. Sorry, this was an ambiguous phrasing. I meant I don't like to update *my* computer very often, because I never know how much time it will take to get everything compatible again. I'm not criticizing the release policy, and I think you are doing a very good job (much better than we do with statsmodels.) Josef > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From jhtu at Princeton.EDU Tue Jan 18 11:39:16 2011 From: jhtu at Princeton.EDU (Jonathan Tu) Date: Tue, 18 Jan 2011 11:39:16 -0500 Subject: [Numpy-discussion] Installing numpy Message-ID: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> Hi, I need to reinstall numpy because the cluster I am using was recently overhauled. I am wondering if numpy works with Python 2.7 now. Also, I would like numpy to run as fast as possible. The last time I did this, I was advised to install ATLAS by hand, as the one that comes with RHEL is not suitable. The first time I tried this, I kept running into problems that I think were due to mismatched fortran compilers. Is there a good resource for how to do this? I am fairly new to Linux. Thanks, Jonathan Tu From numpy-discussion at maubp.freeserve.co.uk Tue Jan 18 12:24:13 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Tue, 18 Jan 2011 17:24:13 +0000 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: On Tue, Jan 18, 2011 at 1:02 PM, Charles R Harris wrote: > I believe the problem has been been 64 bit fortran for ATLAS, the mingw > version has/had problems. A plain build using the MS compilers works fine > without ATLAS. > > Chuck Do you think there would be interest/demand for official non-ATLAS binaries as a short term solution? I'm thinking also of 3rd party Python libraries that use NumPy, and if they/we can ship a win64 installer if NumPy doesn't. Peter From ischnell at enthought.com Tue Jan 18 15:28:26 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Tue, 18 Jan 2011 14:28:26 -0600 Subject: [Numpy-discussion] Installing numpy In-Reply-To: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> References: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> Message-ID: Hello Jonathan, yes, numpy work fine under Python 2.7 now. I don't see why building numpy against the system ATLAS should not work, as long as you install the developer version with the header files, and make sure that you edit the site.cfg file correct. - Ilan On Tue, Jan 18, 2011 at 10:39 AM, Jonathan Tu wrote: > Hi, > > I need to reinstall numpy because the cluster I am using was recently > overhauled. ?I am wondering if numpy works with Python 2.7 now. > > Also, I would like numpy to run as fast as possible. ?The last time I > did this, I was advised to install ATLAS by hand, as the one that > comes with RHEL is not suitable. ?The first time I tried this, I kept > running into problems that I think were due to mismatched fortran > compilers. ?Is there a good resource for how to do this? ?I am fairly > new to Linux. > > > > Thanks, > > > > Jonathan Tu > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jhtu at princeton.edu Tue Jan 18 15:31:21 2011 From: jhtu at princeton.edu (Jonathan Tu) Date: Tue, 18 Jan 2011 15:31:21 -0500 Subject: [Numpy-discussion] Installing numpy In-Reply-To: References: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> Message-ID: <08A709B5-3F86-4B28-8E9E-99D711024A62@princeton.edu> Hi, I realized that my cluster has MKL installed. I've been trying to install against MKL, but am having trouble getting this to work. After it finishes, I do import numpy numpy.show_config() and nothing about the MKL libraries shows up. I have edited site.cfg to read like this: [mkl] library_dirs = /opt/intel/mkl/10.2.4.032/lib/em64t include_dirs = /opt/intel/mkl/10.2.4.032/include lapack_libs = mkl_lapack mkl_libs = mkl, guide My cluster is using Intel Xeon processors, and I edited cc_exe as follows cc_exe = 'icc -O2 -fPIC' Then I did python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install --prefix=/home/jhtu/local Jonathan Tu On Jan 18, 2011, at 3:28 PM, Ilan Schnell wrote: > Hello Jonathan, > > yes, numpy work fine under Python 2.7 now. I don't see why building > numpy against the system ATLAS should not work, as long as you > install the developer version with the header files, and make sure > that > you edit the site.cfg file correct. > > - Ilan > > On Tue, Jan 18, 2011 at 10:39 AM, Jonathan Tu > wrote: >> Hi, >> >> I need to reinstall numpy because the cluster I am using was recently >> overhauled. I am wondering if numpy works with Python 2.7 now. >> >> Also, I would like numpy to run as fast as possible. The last time I >> did this, I was advised to install ATLAS by hand, as the one that >> comes with RHEL is not suitable. The first time I tried this, I kept >> running into problems that I think were due to mismatched fortran >> compilers. Is there a good resource for how to do this? I am fairly >> new to Linux. >> >> >> >> Thanks, >> >> >> >> Jonathan Tu >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ischnell at enthought.com Tue Jan 18 15:39:16 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Tue, 18 Jan 2011 14:39:16 -0600 Subject: [Numpy-discussion] Installing numpy In-Reply-To: <08A709B5-3F86-4B28-8E9E-99D711024A62@princeton.edu> References: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> <08A709B5-3F86-4B28-8E9E-99D711024A62@princeton.edu> Message-ID: The MKL configuration looks right, except that I had to use: mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5 During the build process, it should tell you what it is linking aginast. Look at the compiler options passed to icc. - Ilan On Tue, Jan 18, 2011 at 2:31 PM, Jonathan Tu wrote: > Hi, > > I realized that my cluster has MKL installed. ?I've been trying to > install against MKL, but am having trouble getting this to work. > After it finishes, I do > > import numpy > numpy.show_config() > > and nothing about the MKL libraries shows up. ?I have edited site.cfg > to read like this: > > [mkl] > library_dirs = /opt/intel/mkl/10.2.4.032/lib/em64t > include_dirs = /opt/intel/mkl/10.2.4.032/include > lapack_libs = mkl_lapack > mkl_libs = mkl, guide > > My cluster is using Intel Xeon processors, and I edited cc_exe as > follows > > cc_exe = 'icc -O2 -fPIC' > > Then I did > > python setup.py config --compiler=intel build_clib --compiler=intel > build_ext --compiler=intel install --prefix=/home/jhtu/local > > > > > Jonathan Tu > > > > > On Jan 18, 2011, at 3:28 PM, Ilan Schnell wrote: > >> Hello Jonathan, >> >> yes, numpy work fine under Python 2.7 now. ?I don't see why building >> numpy against the system ATLAS should not work, as long as you >> install the developer version with the header files, and make sure >> that >> you edit the site.cfg file correct. >> >> - Ilan >> >> On Tue, Jan 18, 2011 at 10:39 AM, Jonathan Tu >> wrote: >>> Hi, >>> >>> I need to reinstall numpy because the cluster I am using was recently >>> overhauled. ?I am wondering if numpy works with Python 2.7 now. >>> >>> Also, I would like numpy to run as fast as possible. ?The last time I >>> did this, I was advised to install ATLAS by hand, as the one that >>> comes with RHEL is not suitable. ?The first time I tried this, I kept >>> running into problems that I think were due to mismatched fortran >>> compilers. ?Is there a good resource for how to do this? ?I am fairly >>> new to Linux. >>> >>> >>> >>> Thanks, >>> >>> >>> >>> Jonathan Tu >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jhtu at princeton.edu Tue Jan 18 16:29:14 2011 From: jhtu at princeton.edu (Jonathan Tu) Date: Tue, 18 Jan 2011 16:29:14 -0500 Subject: [Numpy-discussion] Installing numpy In-Reply-To: References: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> <08A709B5-3F86-4B28-8E9E-99D711024A62@princeton.edu> Message-ID: Hi, I have installed numpy but the unit tests fail. When I ran them, I got Traceback (most recent call last): File "/home/jhtu/local/lib/python2.7/site-packages/numpy/testing/ decorators.py", line 215, in knownfailer return f(*args, **kwargs) File "/home/jhtu/local/lib/python2.7/site-packages/numpy/core/tests/ test_umath_complex.py", line 312, in test_special_values assert_almost_equal(np.log(np.conj(xa[i])), np.conj(np.log(xa[i]))) File "/home/jhtu/local/lib/python2.7/site-packages/numpy/testing/ utils.py", line 443, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([-inf+3.14159265j]) DESIRED: array([-inf-3.14159265j]) This was with numpy built against MKL. To install I modified site.cfg to read [mkl] library_dirs = /opt/intel/mkl/10.2.4.032/lib/em64t include_dirs = /opt/intel/mkl/10.2.4.032/include lapack_libs = mkl_lapack mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core My cluster is using Intel Xeon processors, and I edited cc_exe as follows cc_exe = 'icc -O2 -fPIC' I installed using python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install --prefix=/home/jhtu/local Jonathan Tu On Jan 18, 2011, at 3:39 PM, Ilan Schnell wrote: > The MKL configuration looks right, except that I had to use: > mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5 > > During the build process, it should tell you what it is linking > aginast. Look at the compiler options passed to icc. > > - Ilan > > On Tue, Jan 18, 2011 at 2:31 PM, Jonathan Tu > wrote: >> Hi, >> >> I realized that my cluster has MKL installed. I've been trying to >> install against MKL, but am having trouble getting this to work. >> After it finishes, I do >> >> import numpy >> numpy.show_config() >> >> and nothing about the MKL libraries shows up. I have edited site.cfg >> to read like this: >> >> [mkl] >> library_dirs = /opt/intel/mkl/10.2.4.032/lib/em64t >> include_dirs = /opt/intel/mkl/10.2.4.032/include >> lapack_libs = mkl_lapack >> mkl_libs = mkl, guide >> >> My cluster is using Intel Xeon processors, and I edited cc_exe as >> follows >> >> cc_exe = 'icc -O2 -fPIC' >> >> Then I did >> >> python setup.py config --compiler=intel build_clib --compiler=intel >> build_ext --compiler=intel install --prefix=/home/jhtu/local >> >> >> >> >> Jonathan Tu >> >> >> >> >> On Jan 18, 2011, at 3:28 PM, Ilan Schnell wrote: >> >>> Hello Jonathan, >>> >>> yes, numpy work fine under Python 2.7 now. I don't see why building >>> numpy against the system ATLAS should not work, as long as you >>> install the developer version with the header files, and make sure >>> that >>> you edit the site.cfg file correct. >>> >>> - Ilan >>> >>> On Tue, Jan 18, 2011 at 10:39 AM, Jonathan Tu >>> wrote: >>>> Hi, >>>> >>>> I need to reinstall numpy because the cluster I am using was >>>> recently >>>> overhauled. I am wondering if numpy works with Python 2.7 now. >>>> >>>> Also, I would like numpy to run as fast as possible. The last >>>> time I >>>> did this, I was advised to install ATLAS by hand, as the one that >>>> comes with RHEL is not suitable. The first time I tried this, I >>>> kept >>>> running into problems that I think were due to mismatched fortran >>>> compilers. Is there a good resource for how to do this? I am >>>> fairly >>>> new to Linux. >>>> >>>> >>>> >>>> Thanks, >>>> >>>> >>>> >>>> Jonathan Tu >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bergstrj at iro.umontreal.ca Tue Jan 18 17:19:28 2011 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Tue, 18 Jan 2011 17:19:28 -0500 Subject: [Numpy-discussion] numpy.max(0, 1e-6) == 0 (!?) In-Reply-To: References: Message-ID: I find that "numpy.max(0, 1e-6) == 0" is confusing, because it makes bugs hard to spot. The doc says that the second argument to max is an optional integer. My bad. But could the function raise an error if it is passed an invalid 'axis' argument? That would have helped. James -- http://www-etud.iro.umontreal.ca/~bergstrj -------------- next part -------------- An HTML attachment was scrubbed... URL: From vvoznesensky at gmail.com Wed Jan 19 06:02:28 2011 From: vvoznesensky at gmail.com (Vladimir Voznesensky) Date: Wed, 19 Jan 2011 14:02:28 +0300 Subject: [Numpy-discussion] OpenMP-ficated loops Message-ID: <4D36C4C4.6020506@gmail.com> Hello. I've hacked numpy/core/src/umath/loops.c.src to use OMP acceleration. I's a little bit hairy, some things are very suboptimal, but it helps me, so I've decided to share my work as is. Note, that with errstate(...) does not work with my stuff. Use -O2 or less for Intel compiler. Always run test suite after re-compilation. Please, found the hacked file in the attachement. Cheers, VV -------------- next part -------------- A non-text attachment was scrubbed... Name: loops.c.src Type: application/x-wais-source Size: 44812 bytes Desc: not available URL: From cournape at gmail.com Wed Jan 19 09:36:52 2011 From: cournape at gmail.com (David Cournapeau) Date: Wed, 19 Jan 2011 23:36:52 +0900 Subject: [Numpy-discussion] Prime size FFT: bluestein transform vs general chirp/z transform ? In-Reply-To: References: Message-ID: On Sun, Jan 16, 2011 at 2:22 PM, Ralf Gommers wrote: > > > On Mon, Jan 3, 2011 at 2:46 PM, David Cournapeau wrote: >> >> Hi, >> >> I finally took the time to clean up my code to speed up prime-size FFT >> (which use a O(N^2) algo in both numpy and scipy). The code is there: >> https://github.com/cournape/numpy/tree/bluestein (most of the code is >> tests, because numpy.fft had almost none). >> >> >> Bottom line: it is used only for prime numbers, and is faster than the >> current code for complex transforms > 500. Because of python + >> inherent bluestein overhead, this is mostly useful for "long" fft >> (where the speed up is significant - already 100x speed up for prime >> size ~ 50000). > > Very nice, works like a charm for me! > >> >> Several comments: >> ?- the overhead is pretty significant (on my machine, bluestein >> transfrom is slower for prime size < 500) >> ?- it could be used as such for real transforms, but the overhead >> would be even more significant (there is no bluestein transform for >> real transforms, so one needs to re-rexpress real transforms in term >> of complex ones, multiplying the overhead by 2x). There are several >> alternatives to make things faster (Rader-like transform, as used by >> fftw), but I think this would be quite hard to do in python without >> significant slowdown, because the code cannot be vectorized. >> ?- one could also decide to provide a chirp-z transform, of which >> Bluestein transform is a special case. Maybe this is more adapted to >> scipy ? > > This is just terminology, but according to Wikipedia the Bluestein transform > is the chirp-z transform, which is a special case of the z-transform. Is > that what you meant? Right, I meant z-transform. > I'm guessing you are talking about code that allows you to use the Bluestein > algorithm also for non-prime sizes where it makes sense, for example to > speed up the second case of this: > > In [24]: x = np.random.random(5879)? # a large prime This is for the real transform case, where the reindexing step requires to find a generator of Z/nZ. As for dealing with non prime sizes, I am not sure what to do: there is the speed issue, but the precision issue as well. I am currently doing some more thorough tests across various sizes to make sure bluestein transforms do not cause loss of precisions cheers, David From jsalvati at u.washington.edu Wed Jan 19 10:43:00 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 19 Jan 2011 07:43:00 -0800 Subject: [Numpy-discussion] adding numexpr user-provided ufunc evaluation Message-ID: I am thinking about building a patch to implement user-provided ufunc evaluation for numexpr as was discussed in this thread http://www.mail-archive.com/numpy-discussion at scipy.org/msg26292.html. Does anyone have any advice for learning/understanding the numexpr source and/or suggestions about implementation? All advice is helpful :) Francesc Alted mentioned that Mark Wiebe's other NEP https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/deferred-ufunc-evaluation.rst, would provide this feature (along with many other benefits). It looks really amazing, but I don't have a good sense of how developed it is. Best Regards, John Salvatier -------------- next part -------------- An HTML attachment was scrubbed... URL: From jhtu at princeton.edu Wed Jan 19 17:24:41 2011 From: jhtu at princeton.edu (Jonathan Tu) Date: Wed, 19 Jan 2011 17:24:41 -0500 Subject: [Numpy-discussion] MKL libraries can't be found In-Reply-To: References: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> <08A709B5-3F86-4B28-8E9E-99D711024A62@princeton.edu> Message-ID: Hi, I am trying to install numpy with the MKL libraries available on my cluster. Most of the libraries are available in one directory, but the iomp5 library is in another. /opt/intel/Compiler/11.1/072/mkl/lib/em64t ---> mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_def, mkl_mc /opt/intel/Compiler/11.1/072/lib/intel64 ---> iomp5 Using an older MKL library that was available, I found that when all libraries are in one directory, the install went through fine. But in this case it says the libraries cannot be found, even if I list both under the library_dirs in site.cfg [mkl] library_dirs = /opt/intel/Compiler/11.1/072/mkl/lib/em64t:/opt/intel/Compiler/11.1/072/lib/intel64 include_dirs = /opt/intel/Compiler/11.1/072/mkl/include lapack_libs = mkl_lapack mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_def, mkl_mc, iomp5 If I try to install without iomp5, then when I import numpy I get the following error /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.so: undefined symbol: omp_in_parallel Any ideas? I tried to put symbolic links to both library directories in one place, but that didn't work either. I'm trying to avoid creating a directory of symbolic links to every necessary library. Jonathan Tu From jhtu at princeton.edu Wed Jan 19 18:29:59 2011 From: jhtu at princeton.edu (Jonathan Tu) Date: Wed, 19 Jan 2011 18:29:59 -0500 Subject: [Numpy-discussion] Tests Fail with MKL Installed In-Reply-To: References: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> <08A709B5-3F86-4B28-8E9E-99D711024A62@princeton.edu> Message-ID: Hi, I installed numpy with MKL and found that the unit tests fail. In particular, I get the error message FAIL: test_special_values (test_umath_complex.TestClog) It says that this is a "known failure," specifically KNOWNFAIL=4. Is this ok? I saw from Googling that "The test failure indicates that your platform has a non-C99 compliant implementation of clog. Not fatal, but the test should be marked as a known failure on the platform." I'm not sure what this means. Would it be safer for my work to use a package w/o MKL that passes the tests? I'm currently benchmarking to see what the slowdown would really be. Jonathan Tu From cournape at gmail.com Wed Jan 19 19:59:12 2011 From: cournape at gmail.com (David Cournapeau) Date: Thu, 20 Jan 2011 09:59:12 +0900 Subject: [Numpy-discussion] Prime size FFT: bluestein transform vs general chirp/z transform ? In-Reply-To: References: Message-ID: On Tue, Jan 18, 2011 at 3:12 AM, Neal Becker wrote: > I just took a look at > > http://www.katjaas.nl/chirpZ/chirpZ2.html > > I'm VERY interested in the zoom. ?Does the code > > https://github.com/cournape/numpy/tree/bluestein > > implement the zoom feature? No - the current code is only an implementation optimization to deal with fft size which cannot be factorized in small factors. cheers, David From charlesr.harris at gmail.com Wed Jan 19 20:26:00 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 19 Jan 2011 18:26:00 -0700 Subject: [Numpy-discussion] Tests Fail with MKL Installed In-Reply-To: References: <91254948-7598-4793-9055-80EC9D427DD6@Princeton.EDU> <08A709B5-3F86-4B28-8E9E-99D711024A62@princeton.edu> Message-ID: On Wed, Jan 19, 2011 at 4:29 PM, Jonathan Tu wrote: > Hi, > > I installed numpy with MKL and found that the unit tests fail. In > particular, I get the error message > > FAIL: test_special_values (test_umath_complex.TestClog) > > It says that this is a "known failure," specifically KNOWNFAIL=4. Is this > ok? I saw from Googling that "The test failure indicates that your platform > has a non-C99 compliant implementation of clog. Not fatal, but the test > should be marked as a known failure on the platform." > > I'm not sure what this means. Would it be safer for my work to use a > package w/o MKL that passes the tests? I'm currently benchmarking to see > what the slowdown would really be. > > > Don't worry about it, the test that failed is a corner case. Few, if any, libraries are fully c99 compliant for corner cases. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Thu Jan 20 05:11:38 2011 From: opossumnano at gmail.com (Tiziano Zito) Date: Thu, 20 Jan 2011 11:11:38 +0100 Subject: [Numpy-discussion] [Ann] EuroScipy 2011 - Call for papers Message-ID: <20110120101138.GE31049@tulpenbaum.cognition.tu-berlin.de> ========================= Announcing EuroScipy 2011 ========================= --------------------------------------------- The 4th European meeting on Python in Science --------------------------------------------- **Paris, Ecole Normale Sup?rieure, August 25-28 2011** We are happy to announce the 4th EuroScipy meeting, in Paris, August 2011. The EuroSciPy meeting is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research. This event strives to bring together both users and developers of scientific tools, as well as academic research and state of the art industry. Main topics =========== - Presentations of scientific tools and libraries using the Python language, including but not limited to: - vector and array manipulation - parallel computing - scientific visualization - scientific data flow and persistence - algorithms implemented or exposed in Python - web applications and portals for science and engineering. - Reports on the use of Python in scientific achievements or ongoing projects. - General-purpose Python tools that can be of special interest to the scientific community. Tutorials ========= There will be two tutorial tracks at the conference, an introductory one, to bring up to speed with the Python language as a scientific tool, and an advanced track, during which experts of the field will lecture on specific advanced topics such as advanced use of numpy, scientific visualization, software engineering... Keynote Speaker: Fernando Perez =============================== We are excited to welcome Fernando Perez (UC Berkeley, Helen Wills Neuroscience Institute, USA) as our keynote speaker. Fernando Perez is the original author of the enhanced interactive python shell IPython and a very active contributor to the Python for Science ecosystem. Important dates =============== Talk submission deadline: Sunday May 8 Program announced: Sunday May 29 Tutorials tracks: Thursday August 25 - Friday August 26 Conference track: Saturday August 27 - Sunday August 28 Call for papers =============== We are soliciting talks that discuss topics related to scientific computing using Python. These include applications, teaching, future development directions, and research. We welcome contributions from the industry as well as the academic world. Indeed, industrial research and development as well academic research face the challenge of mastering IT tools for exploration, modeling and analysis. We look forward to hearing your recent breakthroughs using Python! Submission guidelines ===================== - We solicit talk proposals in the form of a one-page long abstract. - Submissions whose main purpose is to promote a commercial product or service will be refused. - All accepted proposals must be presented at the EuroSciPy conference by at least one author. The one-page long abstracts are for conference planing and selection purposes only. We will later select papers for publication of post-proceedings in a peer-reviewed journal. How to submit an abstract ========================= To submit a talk to the EuroScipy conference follow the instructions here: http://www.euroscipy.org/card/euroscipy2011_call_for_papers Organizers ========== Chairs: - Ga?l Varoquaux (INSERM, Unicog team, and INRIA, Parietal team) - Nicolas Chauvat (Logilab) Local organization committee: - Emmanuelle Gouillart (Saint-Gobain Recherche) - Jean-Philippe Chauvat (Logilab) Tutorial chair: - Valentin Haenel (MKP, Technische Universit?t Berlin) Program committee: - Chair: Tiziano Zito (MKP, Technische Universit?t Berlin) - Romain Brette (ENS Paris, DEC) - Emmanuelle Gouillart (Saint-Gobain Recherche) - Eric Lebigot (Laboratoire Kastler Brossel, Universit? Pierre et Marie Curie) - Konrad Hinsen (Soleil Synchrotron, CNRS) - Hans Petter Langtangen (Simula laboratories) - Jarrod Millman (UC Berkeley, Helen Wills NeuroScience institute) - Mike M?ller (Python Academy) - Didrik Pinte (Enthought Inc) - Marc Poinot (ONERA) - Christophe Pradal (CIRAD/INRIA, Virtual Plantes team) - Andreas Schreiber (DLR) - St?fan van der Walt (University of Stellenbosch) Website ======= http://www.euroscipy.org/conference/euroscipy_2011 From dagss at student.matnat.uio.no Thu Jan 20 08:22:31 2011 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 20 Jan 2011 14:22:31 +0100 Subject: [Numpy-discussion] Cython workshop Message-ID: <4D383717.7060207@student.matnat.uio.no> As we have funding for it, we're talking about organizing a Cython workshop sometimes this year (possibly in Munich, Germany, though it's not decided yet). It's still not clear how user-centric vs. developer-centric the workshop will be, or how strong a role numerical computation will have vs. more general language features. We're just getting in touch with people potentially interested in joining the workshop, and then we'll take it from there. Respond on the wiki page or on cython-dev. http://wiki.cython.org/workshop1 Dag Sverre -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Jan 22 07:28:47 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 22 Jan 2011 20:28:47 +0800 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: On Wed, Jan 19, 2011 at 1:24 AM, Peter < numpy-discussion at maubp.freeserve.co.uk> wrote: > On Tue, Jan 18, 2011 at 1:02 PM, Charles R Harris > wrote: > > I believe the problem has been been 64 bit fortran for ATLAS, the mingw > > version has/had problems. A plain build using the MS compilers works fine > > without ATLAS. > > > > Chuck > > Do you think there would be interest/demand for official non-ATLAS > binaries as a short term solution? > I doubt this would add much to what's currently available unofficially. > > I'm thinking also of 3rd party Python libraries that use NumPy, and if > they/we can ship a win64 installer if NumPy doesn't. > This is no problem of course. If I were you though, I would first consider if it's not better to refer your users to the Enthought version, or the builds provided by Christoph Gohlke for example. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From numpy-discussion at maubp.freeserve.co.uk Sat Jan 22 09:20:43 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Sat, 22 Jan 2011 14:20:43 +0000 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: On Sat, Jan 22, 2011 at 12:28 PM, Ralf Gommers wrote: > > On Wed, Jan 19, 2011 at 1:24 AM, Peter > wrote: >> >> On Tue, Jan 18, 2011 at 1:02 PM, Charles R Harris >> wrote: >> > I believe the problem has been been 64 bit fortran for ATLAS, the mingw >> > version has/had problems. A plain build using the MS compilers works >> > fine without ATLAS. >> > >> > Chuck >> >> Do you think there would be interest/demand for official non-ATLAS >> binaries as a short term solution? > > I doubt this would add much to what's currently available unofficially. But it would be "official", which counts for something - especially in commercial setting. >> >> I'm thinking also of 3rd party Python libraries that use NumPy, and if >> they/we can ship a win64 installer if NumPy doesn't. > > This is no problem of course. If I were you though, I would first consider > if it's not better to refer your users to the Enthought version, or the > builds provided by Christoph Gohlke for example. We're currently pointing people on 64 bit Windows towards Christoph Gohlke's unofficial builds. I'd be quite happy if Christoph's 64bit NumPy installer was blessed as official and distributed via the NumPy website (but there may be technical issues I'm unaware of). Peter From ralf.gommers at googlemail.com Sat Jan 22 12:43:42 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 23 Jan 2011 01:43:42 +0800 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: On Sat, Jan 22, 2011 at 10:20 PM, Peter < numpy-discussion at maubp.freeserve.co.uk> wrote: > On Sat, Jan 22, 2011 at 12:28 PM, Ralf Gommers > wrote: > > > > On Wed, Jan 19, 2011 at 1:24 AM, Peter > > wrote: > >> > >> On Tue, Jan 18, 2011 at 1:02 PM, Charles R Harris > >> wrote: > >> > I believe the problem has been been 64 bit fortran for ATLAS, the > mingw > >> > version has/had problems. A plain build using the MS compilers works > >> > fine without ATLAS. > >> > > >> > Chuck > >> > >> Do you think there would be interest/demand for official non-ATLAS > >> binaries as a short term solution? > > > > I doubt this would add much to what's currently available unofficially. > > But it would be "official", which counts for something - especially in > commercial setting. > > >> > >> I'm thinking also of 3rd party Python libraries that use NumPy, and if > >> they/we can ship a win64 installer if NumPy doesn't. > > > > This is no problem of course. If I were you though, I would first > consider > > if it's not better to refer your users to the Enthought version, or the > > builds provided by Christoph Gohlke for example. > > We're currently pointing people on 64 bit Windows towards Christoph > Gohlke's unofficial builds. I'd be quite happy if Christoph's 64bit NumPy > installer was blessed as official and distributed via the NumPy website > (but there may be technical issues I'm unaware of). > The plain builds don't work with scipy as I think you know, which IMHO means they should not be official. The MKL ones should not be official because they're non-free. That said, if others feel that plain official builds are useful *and* someone steps up to create and troubleshoot them, then of course that's fine with me. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From numpy-discussion at maubp.freeserve.co.uk Sat Jan 22 14:26:23 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Sat, 22 Jan 2011 19:26:23 +0000 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: On Sat, Jan 22, 2011 at 5:43 PM, Ralf Gommers wrote: > > On Sat, Jan 22, 2011 at 10:20 PM, Peter wrote: >> We're currently pointing people on 64 bit Windows towards Christoph >> Gohlke's unofficial builds. I'd be quite happy if Christoph's 64bit NumPy >> installer was blessed as official and distributed via the NumPy website >> (but there may be technical issues I'm unaware of). > > The plain builds don't work with scipy as I think you know, which IMHO > means they should not be official. I see (on closer inspection) that Christoph only has 64bit SciPy for the Intel MKL version of 64bit NumPy (not the plain version), which is presumably what you are refering to? Is this down to some problem in SciPy or NumPy (or both)? I'm not familiar with the problem. > > The MKL ones should not be official because they're non-free. > Of course - Intel's licensing must be respected there. > That said, if others feel that plain official builds are useful *and* > someone steps up to create and troubleshoot them, then of > course that's fine with me. > > Cheers, > Ralf :) Peter From cournape at gmail.com Sat Jan 22 19:34:02 2011 From: cournape at gmail.com (David Cournapeau) Date: Sun, 23 Jan 2011 09:34:02 +0900 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: On Sun, Jan 23, 2011 at 4:26 AM, Peter wrote: > > I see (on closer inspection) that Christoph only has 64bit SciPy for > the Intel MKL version of 64bit NumPy (not the plain version), which > is presumably what you are refering to? Is this down to some problem > in SciPy or NumPy (or both)? I'm not familiar with the problem. The main issue is that there is no easy way to get a working implementation of blas/lapack for windows 64 bits, nor a working, freely available fortran compiler. This is not so much a limitation of numpy/scipy, but rather one of lack of open source tools maturity on that platform. As Intel license requires developers to get a license of their tools, that's not an acceptable solution for the official version of numpy/scipy. cheers, David From ischnell at enthought.com Sat Jan 22 20:19:12 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Sat, 22 Jan 2011 19:19:12 -0600 Subject: [Numpy-discussion] 64 bit Windows installers for NumPy? In-Reply-To: References: Message-ID: But if you have an Intel license, you are allows to redistribute the MKL runtime. This is the reason why Enthought can distribute EPD which includes numpy, scipy and numexpr linked against MKL. The next EPD 7.0, will include Python 2.7.1 and numpy 1.5.1 and be released on February 8. - Ilan On Sat, Jan 22, 2011 at 6:34 PM, David Cournapeau wrote: > On Sun, Jan 23, 2011 at 4:26 AM, Peter > wrote: > >> >> I see (on closer inspection) that Christoph only has 64bit SciPy for >> the Intel MKL version of 64bit NumPy (not the plain version), which >> is presumably what you are refering to? Is this down to some problem >> in SciPy or NumPy (or both)? I'm not familiar with the problem. > > The main issue is that there is no easy way to get a working > implementation of blas/lapack for windows 64 bits, nor a working, > freely available fortran compiler. > > This is not so much a limitation of numpy/scipy, but rather one of > lack of open source tools maturity on that platform. As Intel license > requires developers to get a license of their tools, that's not an > acceptable solution for the official version of numpy/scipy. > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From vvoznesensky at gmail.com Sun Jan 23 03:58:34 2011 From: vvoznesensky at gmail.com (Vladimir Voznesensky) Date: Sun, 23 Jan 2011 11:58:34 +0300 Subject: [Numpy-discussion] OpenMP-fication of loops. Message-ID: <4D3BEDBA.1040401@gmail.com> Hello. I've hacked loops.c.src to use OpenMP. Is anybody here interested in my hacks? Cheers. - VV From tmp50 at ukr.net Sun Jan 23 05:00:38 2011 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 23 Jan 2011 12:00:38 +0200 Subject: [Numpy-discussion] Is numpy/scipy linux apt or PYPI installation linked with ACML? Message-ID: Hi all, I have AMD processor and I would like to get to know what's the easiest way to install numpy/scipy linked with ACML. Is it possible to link linux apt or PYPI installation linked with ACML? Answer for the same question about MKL also would be useful, however, AFAIK it has commercial license and thus can't be handled in the ways. Thank you in advance, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Jan 23 05:07:29 2011 From: cournape at gmail.com (David Cournapeau) Date: Sun, 23 Jan 2011 19:07:29 +0900 Subject: [Numpy-discussion] Is numpy/scipy linux apt or PYPI installation linked with ACML? In-Reply-To: References: Message-ID: 2011/1/23 Dmitrey : > Hi all, > I have AMD processor and I would like to get to know what's the easiest way > to install numpy/scipy linked with ACML. > Is it possible to link linux apt or PYPI installation linked with ACML? > Answer for the same question about MKL also would be useful, however, AFAIK > it has commercial license and thus can't be handled in the ways. For the MKL, the easiest solution is to get EPD, or to build numpy/scipy by yourself, although the later is not that easy. For ACML, I don't know how difficult it is, but I would be surprised if it worked out of the box. cheers, David From tmp50 at ukr.net Sun Jan 23 05:27:35 2011 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 23 Jan 2011 12:27:35 +0200 Subject: [Numpy-discussion] Is numpy/scipy linux apt or PYPI installation linked with ACML? In-Reply-To: References: Message-ID: Are free EPD distributions linked with MKL and ACML? Does anyone know is SAGE or PythonXY already linked with ACML or MKL? Thanks, D. --- ???????? ????????? --- ?? ????: "David Cournapeau" ????: "Discussion of Numerical Python" ????: 23 ?????? 2011, 12:07:29 ????: Re: [Numpy-discussion] Is numpy/scipy linux apt or PYPI installation linked with ACML? 2011/1/23 Dmitrey < tmp50 at ukr.net >: > Hi all, > I have AMD processor and I would like to get to know what's the easiest way > to install numpy/scipy linked with ACML. > Is it possible to link linux apt or PYPI installation linked with ACML? > Answer for the same question about MKL also would be useful, however, AFAIK > it has commercial license and thus can't be handled in the ways. For the MKL, the easiest solution is to get EPD, or to build numpy/scipy by yourself, although the later is not that easy. For ACML, I don't know how difficult it is, but I would be surprised if it worked out of the box. cheers, David _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecarlson at eng.ua.edu Sun Jan 23 09:54:22 2011 From: ecarlson at eng.ua.edu (Eric Carlson) Date: Sun, 23 Jan 2011 08:54:22 -0600 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: <4D3BEDBA.1040401@gmail.com> References: <4D3BEDBA.1040401@gmail.com> Message-ID: As a user, I am very interested. That said, do you any tests or examples or benchmarks that give a ballpark estimate of performance improvements? From vvoznesensky at gmail.com Sun Jan 23 11:36:17 2011 From: vvoznesensky at gmail.com (Vladimir Voznesensky) Date: Sun, 23 Jan 2011 19:36:17 +0300 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: References: <4D3BEDBA.1040401@gmail.com> Message-ID: <4D3C5901.8000900@gmail.com> My computer has 12 hyperthreaded cores. My application uses dot multiplication from Intel MKL, that accelerated it by ~ 5 times. After OpenMP-fication of loops.c.src, my app was accelerated by ~12-15 times. So, it hardly depends on your computer ;) . 2ALL: How could I propagate the hacked file to the interested parties? Eric Carlson ?????: > As a user, I am very interested. That said, do you any tests or examples > or benchmarks that give a ballpark estimate of performance improvements? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ischnell at enthought.com Sun Jan 23 11:41:48 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Sun, 23 Jan 2011 10:41:48 -0600 Subject: [Numpy-discussion] Is numpy/scipy linux apt or PYPI installation linked with ACML? In-Reply-To: References: Message-ID: Yes, also the free academic EPDs are linked to MKL. I don't know about sage, but probably not. - Ilan 2011/1/23 Dmitrey : > Are free EPD distributions linked with MKL and ACML? > Does anyone know is SAGE or PythonXY already linked with ACML or MKL? > > Thanks, D. > > --- ???????? ????????? --- > ?? ????: "David Cournapeau" > ????: "Discussion of Numerical Python" > ????: 23 ?????? 2011, 12:07:29 > ????: Re: [Numpy-discussion] Is numpy/scipy linux apt or PYPI installation > linked with ACML? > > > > 2011/1/23 Dmitrey : >> Hi all, >> I have AMD processor and I would like to get to know what's the easiest >> way >> to install numpy/scipy linked with ACML. >> Is it possible to link linux apt or PYPI installation linked with ACML? >> Answer for the same question about MKL also would be useful, however, >> AFAIK >> it has commercial license and thus can't be handled in the ways. > > For the MKL, the easiest solution is to get EPD, or to build > numpy/scipy by yourself, although the later is not that easy. For > ACML, I don't know how difficult it is, but I would be surprised if it > worked out of the box. > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From matthieu.brucher at gmail.com Sun Jan 23 11:49:05 2011 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 23 Jan 2011 17:49:05 +0100 Subject: [Numpy-discussion] Is numpy/scipy linux apt or PYPI installation linked with ACML? In-Reply-To: References: Message-ID: I think the main issue is that ACML didn't have an official CBLAS interface, so you have to check if they provide one now. If thy do, it should be almost easy to link against it. Matthieu 2011/1/23 David Cournapeau > 2011/1/23 Dmitrey : > > Hi all, > > I have AMD processor and I would like to get to know what's the easiest > way > > to install numpy/scipy linked with ACML. > > Is it possible to link linux apt or PYPI installation linked with ACML? > > Answer for the same question about MKL also would be useful, however, > AFAIK > > it has commercial license and thus can't be handled in the ways. > > For the MKL, the easiest solution is to get EPD, or to build > numpy/scipy by yourself, although the later is not that easy. For > ACML, I don't know how difficult it is, but I would be surprised if it > worked out of the box. > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecarlson at eng.ua.edu Sun Jan 23 15:52:55 2011 From: ecarlson at eng.ua.edu (Eric Carlson) Date: Sun, 23 Jan 2011 14:52:55 -0600 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: <4D3C5901.8000900@gmail.com> References: <4D3BEDBA.1040401@gmail.com> <4D3C5901.8000900@gmail.com> Message-ID: On 1/23/2011 10:36 AM, Vladimir Voznesensky wrote: > My computer has 12 hyperthreaded cores. > My application uses dot multiplication from Intel MKL, that accelerated > it by ~ 5 times. > After OpenMP-fication of loops.c.src, my app was accelerated by ~12-15 > times. > I was greatly disappointed in the parallel performance on a new workstation for some of my programs. I could not get better than about a factor of 5 on my dual xeon with 24 threads. Last Fall, I stumbled across this example of OpenMP with f2py, https://gist.github.com/226473 that I built on Ubuntu 10.04 x64 using (slightly different than the instructions): f2py -c -m deemingomp periodogram.f90 --f90flags="-fopenmp " -lgomp -lf77blas -lcblas -latlas On my machine for larger array sizes, I saw speed-ups of 20x over single thread in the example program. Indeed, the example serves as an excellent way to test the thermal stability of the workstation. I did not get a chance to follow this up yet, but if you can get 12x improvement with normal numpy codes, I am very interested... Cheers, EC From vvoznesensky at gmail.com Sun Jan 23 15:57:04 2011 From: vvoznesensky at gmail.com (Vladimir Voznesensky) Date: Sun, 23 Jan 2011 23:57:04 +0300 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: References: <4D3BEDBA.1040401@gmail.com> <4D3C5901.8000900@gmail.com> Message-ID: <4D3C9620.3060105@gmail.com> Dear Eric, Sure, I will give you my code, but who will "follow this up"? Eric Carlson ?????: > On 1/23/2011 10:36 AM, Vladimir Voznesensky wrote: > >> My computer has 12 hyperthreaded cores. >> My application uses dot multiplication from Intel MKL, that accelerated >> it by ~ 5 times. >> After OpenMP-fication of loops.c.src, my app was accelerated by ~12-15 >> times. >> >> > > I was greatly disappointed in the parallel performance on a new > workstation for some of my programs. I could not get better than about a > factor of 5 on my dual xeon with 24 threads. > > Last Fall, I stumbled across this example of OpenMP with f2py, > > https://gist.github.com/226473 > > that I built on Ubuntu 10.04 x64 using (slightly different than the > instructions): > > f2py -c -m deemingomp periodogram.f90 --f90flags="-fopenmp " -lgomp > -lf77blas -lcblas -latlas > > On my machine for larger array sizes, I saw speed-ups of 20x over single > thread in the example program. Indeed, the example serves as an > excellent way to test the thermal stability of the workstation. > > I did not get a chance to follow this up yet, but if you can get 12x > improvement with normal numpy codes, I am very interested... > > Cheers, > EC > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecarlson at eng.ua.edu Sun Jan 23 16:49:37 2011 From: ecarlson at eng.ua.edu (Eric Carlson) Date: Sun, 23 Jan 2011 15:49:37 -0600 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: <4D3C9620.3060105@gmail.com> References: <4D3BEDBA.1040401@gmail.com> <4D3C5901.8000900@gmail.com> <4D3C9620.3060105@gmail.com> Message-ID: >On 1/23/2011 2:57 PM, Vladimir Voznesensky wrote: > > Sure, I will give you my code, but who will "follow this up"? > Hey Vladimir, A good question. At this point, I am most curious about the difficulties of using this as a standard built into numpy. EC From charlesr.harris at gmail.com Sun Jan 23 19:48:54 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 23 Jan 2011 17:48:54 -0700 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: <4D3C9620.3060105@gmail.com> References: <4D3BEDBA.1040401@gmail.com> <4D3C5901.8000900@gmail.com> <4D3C9620.3060105@gmail.com> Message-ID: On Sun, Jan 23, 2011 at 1:57 PM, Vladimir Voznesensky < vvoznesensky at gmail.com> wrote: > Dear Eric, > > Sure, I will give you my code, but who will "follow this up"? > > I suggest you start with an account on github and put your modified code in a branch. You can then post a link, and if things go well, maybe you can post a pull request to numpy at some point. At the moment there is a fair amount of churn in the pipeline for ufuncs and so you might have to wait until mid summer, but I think a lot of folks will be interested in how to speed things up at the loop level. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jan 23 20:09:11 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 23 Jan 2011 18:09:11 -0700 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: References: <4D3BEDBA.1040401@gmail.com> <4D3C5901.8000900@gmail.com> <4D3C9620.3060105@gmail.com> Message-ID: On Sun, Jan 23, 2011 at 5:48 PM, Charles R Harris wrote: > > > On Sun, Jan 23, 2011 at 1:57 PM, Vladimir Voznesensky < > vvoznesensky at gmail.com> wrote: > >> Dear Eric, >> >> Sure, I will give you my code, but who will "follow this up"? >> >> > I suggest you start with an account on github and put your modified code in > a branch. You can then post a link, and if things go well, maybe you can > post a pull request to numpy at some point. At the moment there is a fair > amount of churn in the pipeline for ufuncs and so you might have to wait > until mid summer, but I think a lot of folks will be interested in how to > speed things up at the loop level. > > > > More explicit instructions on setting up a github account are are here, look at the git for development section. Once you have things up on github you can post a link to the branch, that will make it easy for folks to download and test your code. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From vvoznesensky at gmail.com Sun Jan 23 23:55:15 2011 From: vvoznesensky at gmail.com (Vladimir Voznesensky) Date: Mon, 24 Jan 2011 07:55:15 +0300 Subject: [Numpy-discussion] OpenMP-fication of loops. In-Reply-To: References: <4D3BEDBA.1040401@gmail.com> <4D3C5901.8000900@gmail.com> <4D3C9620.3060105@gmail.com> Message-ID: <4D3D0633.6080205@gmail.com> Sorry, the cross is too heavy for me. I must feed my family. For people who would and could try my hack and do this git matter: please, write me a letter. Charles R Harris ?????: > > > On Sun, Jan 23, 2011 at 5:48 PM, Charles R Harris > > wrote: > > > > On Sun, Jan 23, 2011 at 1:57 PM, Vladimir Voznesensky > > wrote: > > Dear Eric, > > Sure, I will give you my code, but who will "follow this up"? > > > I suggest you start with an account on github and put your > modified code in a branch. You can then post a link, and if things > go well, maybe you can post a pull request to numpy at some point. > At the moment there is a fair amount of churn in the pipeline for > ufuncs and so you might have to wait until mid summer, but I think > a lot of folks will be interested in how to speed things up at the > loop level. > > > > > More explicit instructions on setting up a github account are are here > , look at the git for > development section. Once you have things up on github you can post a > link to the branch, that will make it easy for folks to download and > test your code. > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Mon Jan 24 07:22:08 2011 From: cool-rr at cool-rr.com (cool-RR) Date: Mon, 24 Jan 2011 14:22:08 +0200 Subject: [Numpy-discussion] How can I install numpy on Python 3.1 in Ubuntu? Message-ID: Hello folks, I have Ubuntu 10.10 server on EC2. I installed Python 3.1, and now I want to install NumPy on it. How do I do it? I tried `easy_install-3.1 numpy` but got this error: [...] RefactoringTool: Refactored /tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/core/defchararray.py RefactoringTool: Files that were modified: RefactoringTool: /tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/compat/py3k.py RefactoringTool: /tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/core/defchararray.py Running from numpy source directory.Traceback (most recent call last): File "/usr/local/bin/easy_install-3.1", line 9, in load_entry_point('distribute==0.6.14', 'console_scripts', 'easy_install-3.1')() File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 1855, in main with_ei_usage(lambda: File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 1836, in with_ei_usage return f() File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 1859, in distclass=DistributionWithoutHelpCommands, **kw File "/usr/lib/python3.1/distutils/core.py", line 149, in setup dist.run_commands() File "/usr/lib/python3.1/distutils/dist.py", line 919, in run_commands self.run_command(cmd) File "/usr/lib/python3.1/distutils/dist.py", line 938, in run_command cmd_obj.run() File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 342, in run self.easy_install(spec, not self.no_deps) File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 582, in easy_install return self.install_item(spec, dist.location, tmpdir, deps) File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 612, in install_item dists = self.install_eggs(spec, download, tmpdir) File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 802, in install_eggs return self.build_and_install(setup_script, setup_base) File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 1079, in build_and_install self.run_setup(setup_script, setup_base, args) File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/command/easy_install.py", line 1068, in run_setup run_setup(setup_script, args) File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/sandbox.py", line 30, in run_setup lambda: exec(compile(open( File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/sandbox.py", line 71, in run return func() File "/usr/local/lib/python3.1/dist-packages/distribute-0.6.14-py3.1.egg/setuptools/sandbox.py", line 33, in {'__file__':setup_script, '__name__':'__main__'}) File "setup.py", line 211, in File "setup.py", line 204, in setup_package File "/tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/distutils/core.py", line 152, in setup File "setup.py", line 151, in configuration File "/tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/distutils/misc_util.py", line 972, in add_subpackage File "/tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/distutils/misc_util.py", line 941, in get_subpackage File "/tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/distutils/misc_util.py", line 878, in _get_configuration_from_setup_py File "numpy/setup.py", line 5, in configuration File "/tmp/easy_install-MiUli2/numpy-1.5.1/build/py3k/numpy/distutils/misc_util.py", line 713, in __init__ ValueError: 'build/py3k/numpy' is not a directory What can I do? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Jan 24 08:23:29 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 24 Jan 2011 21:23:29 +0800 Subject: [Numpy-discussion] How can I install numpy on Python 3.1 in Ubuntu? In-Reply-To: References: Message-ID: On Mon, Jan 24, 2011 at 8:22 PM, cool-RR wrote: > Hello folks, > > I have Ubuntu 10.10 server on EC2. I installed Python 3.1, and now I want > to install NumPy on it. How do I do it? I tried `easy_install-3.1 numpy` but > got this error: > Just do "python3.1 setup.py install". That's always a better idea for numpy/scipy than trying to use easy_install. Also you need to make sure some packages are installed first. From http://www.scipy.org/Installing_SciPy/Linux: sudo apt-get install build-essential python-dev swig gfortran python-nose Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From washakie at gmail.com Mon Jan 24 08:53:29 2011 From: washakie at gmail.com (John) Date: Mon, 24 Jan 2011 14:53:29 +0100 Subject: [Numpy-discussion] need a better way to fill a grid Message-ID: Hello, I'm trying to cycle over some vectors (lat,lon,emissions) of irregularly spaced lat/lon spots, and values. I need to sum the values each contributing to grid on a regular lat lon grid. This is what I have presently, but it is too slow. Is there a more efficient way to do this? I would prefer not to create an external module (f2py, cython) unless there is really no way to make this more efficient... it's the looping through the grid I guess that takes so long. Thanks, john def grid_emissions(lon,lat,emissions,grid.dx, grid.dy, grid.outlat0, grid.outlon0, grid.nxmax, grid.nymax): """ sample the emissions into a grid to fold into model output """ dx = grid.dxout dy = grid.dyout # Generate a regular grid to fill with the sum of emissions xi = np.linspace(grid.outlon0, grid.outlon0+(grid.nxmax*grid.d), grid.nxmax) yi = np.linspace(grid.outlat0, grid.outlat0+(grid.nymax*grid.dy), grid.nymax) X, Y = np.meshgrid(yi, xi) Z = np.zeros(X.shape) for i,x in enumerate(xi): for j,y in enumerate(yi): Z[i,j] = np.sum( emissions[\ np.where(((lat>y-dy) & (latx-dx) & (lon References: Message-ID: Hi John, Since you have a regular grid, you should be able to find the x and y indices without np.where, ie something like I = (lon-grid.outlon0 / grid.dx).astype(int) J = (lat-grid.outlat0 / grid.dy).astype(int) for i, j, e in zip(I, J, emissions): Z[i,j] += e David On Mon, Jan 24, 2011 at 8:53 AM, John wrote: > Hello, > > I'm trying to cycle over some vectors (lat,lon,emissions) of > irregularly spaced lat/lon spots, and values. I need to sum the values > each contributing to grid on a regular lat lon grid. > > This is what I have presently, but it is too slow. Is there a more > efficient way to do this? I would prefer not to create an external > module (f2py, cython) unless there is really no way to make this more > efficient... it's the looping through the grid I guess that takes so > long. > > Thanks, > john > > > > ? ?def grid_emissions(lon,lat,emissions,grid.dx, grid.dy, > grid.outlat0, grid.outlon0, grid.nxmax, grid.nymax): > ? ? ? ?""" sample the emissions into a grid to fold into model output > ? ? ? ?""" > > ? ? ? ?dx = grid.dxout > ? ? ? ?dy = grid.dyout > > ? ? ? ?# Generate a regular grid to fill with the sum of emissions > ? ? ? ?xi = np.linspace(grid.outlon0, > grid.outlon0+(grid.nxmax*grid.d), grid.nxmax) > ? ? ? ?yi = np.linspace(grid.outlat0, > grid.outlat0+(grid.nymax*grid.dy), grid.nymax) > > ? ? ? ?X, Y = np.meshgrid(yi, xi) > ? ? ? ?Z = np.zeros(X.shape) > > ? ? ? ?for i,x in enumerate(xi): > ? ? ? ? ? ?for j,y in enumerate(yi): > ? ? ? ? ? ? ? ?Z[i,j] = np.sum( emissions[\ > ? ? ? ? ? ? ? ? ? ? ? ? np.where(((lat>y-dy) & (lat ((lon>x-dx) & (lon > ? ? ? ?return Z > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From schut at sarvision.nl Mon Jan 24 09:56:15 2011 From: schut at sarvision.nl (Vincent Schut) Date: Mon, 24 Jan 2011 15:56:15 +0100 Subject: [Numpy-discussion] need a better way to fill a grid In-Reply-To: References: Message-ID: On 01/24/2011 02:53 PM, John wrote: > Hello, > > I'm trying to cycle over some vectors (lat,lon,emissions) of > irregularly spaced lat/lon spots, and values. I need to sum the values > each contributing to grid on a regular lat lon grid. > > This is what I have presently, but it is too slow. Is there a more > efficient way to do this? I would prefer not to create an external > module (f2py, cython) unless there is really no way to make this more > efficient... it's the looping through the grid I guess that takes so > long. Use np.histogram2d with weights=emissions, and lat and lon as your x and y to histogram. Choose the bins to match your grid, and it will effectively sum the emission values for each grid cell. Vincent. > > Thanks, > john > > > > def grid_emissions(lon,lat,emissions,grid.dx, grid.dy, > grid.outlat0, grid.outlon0, grid.nxmax, grid.nymax): > """ sample the emissions into a grid to fold into model output > """ > > dx = grid.dxout > dy = grid.dyout > > # Generate a regular grid to fill with the sum of emissions > xi = np.linspace(grid.outlon0, > grid.outlon0+(grid.nxmax*grid.d), grid.nxmax) > yi = np.linspace(grid.outlat0, > grid.outlat0+(grid.nymax*grid.dy), grid.nymax) > > X, Y = np.meshgrid(yi, xi) > Z = np.zeros(X.shape) > > for i,x in enumerate(xi): > for j,y in enumerate(yi): > Z[i,j] = np.sum( emissions[\ > np.where(((lat>y-dy)& (lat ((lon>x-dx)& (lon > return Z From washakie at gmail.com Mon Jan 24 09:57:50 2011 From: washakie at gmail.com (John) Date: Mon, 24 Jan 2011 15:57:50 +0100 Subject: [Numpy-discussion] need a better way to fill a grid In-Reply-To: References: Message-ID: I know we're not supposed to 'broadcast' thanks, but Thanks! This works much more efficiently! On Mon, Jan 24, 2011 at 3:50 PM, David Huard wrote: > Hi John, > > Since you have a regular grid, you should be able to find the x and y > indices without np.where, ie something like > > I = (lon-grid.outlon0 / grid.dx).astype(int) > J = (lat-grid.outlat0 / grid.dy).astype(int) > > for i, j, e in zip(I, J, emissions): > ? ?Z[i,j] += e > > > David > > On Mon, Jan 24, 2011 at 8:53 AM, John wrote: >> Hello, >> >> I'm trying to cycle over some vectors (lat,lon,emissions) of >> irregularly spaced lat/lon spots, and values. I need to sum the values >> each contributing to grid on a regular lat lon grid. >> >> This is what I have presently, but it is too slow. Is there a more >> efficient way to do this? I would prefer not to create an external >> module (f2py, cython) unless there is really no way to make this more >> efficient... it's the looping through the grid I guess that takes so >> long. >> >> Thanks, >> john >> >> >> >> ? ?def grid_emissions(lon,lat,emissions,grid.dx, grid.dy, >> grid.outlat0, grid.outlon0, grid.nxmax, grid.nymax): >> ? ? ? ?""" sample the emissions into a grid to fold into model output >> ? ? ? ?""" >> >> ? ? ? ?dx = grid.dxout >> ? ? ? ?dy = grid.dyout >> >> ? ? ? ?# Generate a regular grid to fill with the sum of emissions >> ? ? ? ?xi = np.linspace(grid.outlon0, >> grid.outlon0+(grid.nxmax*grid.d), grid.nxmax) >> ? ? ? ?yi = np.linspace(grid.outlat0, >> grid.outlat0+(grid.nymax*grid.dy), grid.nymax) >> >> ? ? ? ?X, Y = np.meshgrid(yi, xi) >> ? ? ? ?Z = np.zeros(X.shape) >> >> ? ? ? ?for i,x in enumerate(xi): >> ? ? ? ? ? ?for j,y in enumerate(yi): >> ? ? ? ? ? ? ? ?Z[i,j] = np.sum( emissions[\ >> ? ? ? ? ? ? ? ? ? ? ? ? np.where(((lat>y-dy) & (lat> ((lon>x-dx) & (lon> >> ? ? ? ?return Z >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Configuration `````````````````````````` Plone 2.5.3-final, CMF-1.6.4, Zope (Zope 2.9.7-final, python 2.4.4, linux2), Python 2.6 PIL 1.1.6 Mailman 2.1.9 Postfix 2.4.5 Procmail v3.22 2001/09/10 Basemap: 1.0 Matplotlib: 1.0.0 From washakie at gmail.com Mon Jan 24 09:58:45 2011 From: washakie at gmail.com (John) Date: Mon, 24 Jan 2011 15:58:45 +0100 Subject: [Numpy-discussion] need a better way to fill a grid In-Reply-To: References: Message-ID: I will try this as well and report back with a timing... On Mon, Jan 24, 2011 at 3:56 PM, Vincent Schut wrote: > On 01/24/2011 02:53 PM, John wrote: >> Hello, >> >> I'm trying to cycle over some vectors (lat,lon,emissions) of >> irregularly spaced lat/lon spots, and values. I need to sum the values >> each contributing to grid on a regular lat lon grid. >> >> This is what I have presently, but it is too slow. Is there a more >> efficient way to do this? I would prefer not to create an external >> module (f2py, cython) unless there is really no way to make this more >> efficient... it's the looping through the grid I guess that takes so >> long. > > Use np.histogram2d with weights=emissions, and lat and lon as your x and > y to histogram. Choose the bins to match your grid, and it will > effectively sum the emission values for each grid cell. > > Vincent. > >> >> Thanks, >> john >> >> >> >> ? ? ?def grid_emissions(lon,lat,emissions,grid.dx, grid.dy, >> grid.outlat0, grid.outlon0, grid.nxmax, grid.nymax): >> ? ? ? ? ?""" sample the emissions into a grid to fold into model output >> ? ? ? ? ?""" >> >> ? ? ? ? ?dx = grid.dxout >> ? ? ? ? ?dy = grid.dyout >> >> ? ? ? ? ?# Generate a regular grid to fill with the sum of emissions >> ? ? ? ? ?xi = np.linspace(grid.outlon0, >> grid.outlon0+(grid.nxmax*grid.d), grid.nxmax) >> ? ? ? ? ?yi = np.linspace(grid.outlat0, >> grid.outlat0+(grid.nymax*grid.dy), grid.nymax) >> >> ? ? ? ? ?X, Y = np.meshgrid(yi, xi) >> ? ? ? ? ?Z = np.zeros(X.shape) >> >> ? ? ? ? ?for i,x in enumerate(xi): >> ? ? ? ? ? ? ?for j,y in enumerate(yi): >> ? ? ? ? ? ? ? ? ?Z[i,j] = np.sum( emissions[\ >> ? ? ? ? ? ? ? ? ? ? ? ? ? np.where(((lat>y-dy)& ?(lat> ((lon>x-dx)& ?(lon> >> ? ? ? ? ?return Z > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Configuration `````````````````````````` Plone 2.5.3-final, CMF-1.6.4, Zope (Zope 2.9.7-final, python 2.4.4, linux2), Python 2.6 PIL 1.1.6 Mailman 2.1.9 Postfix 2.4.5 Procmail v3.22 2001/09/10 Basemap: 1.0 Matplotlib: 1.0.0 From washakie at gmail.com Mon Jan 24 10:46:56 2011 From: washakie at gmail.com (John) Date: Mon, 24 Jan 2011 16:46:56 +0100 Subject: [Numpy-discussion] need a better way to fill a grid In-Reply-To: References: Message-ID: Thanks again everyone, Just for completeness. First, there seems to be a problem with my original method, but it must have to do with indexing. Apart from being slow, it reports back maximum values a factor of two greater than the other two methods, so something is amiss there. The other two methods provide identical results in terms of the sums. Original method: ~ 13.3 seconds Pure Python per David: ~ 0.017 seconds Numpy histogramdd per Vincent: ~ 0.007 seconds Thanks, john From jsalvati at u.washington.edu Mon Jan 24 12:47:58 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 24 Jan 2011 09:47:58 -0800 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements Message-ID: Hello, I have discovered a strange bug with numexpr. numexpr.evaluate gives randomized results on arrays larger than 2047 elements. The following program demonstrates this: from numpy import * from numexpr import evaluate def func(x): return evaluate("sum(x, axis = 0)") x = zeros(2048)+.01 print evaluate("sum(x, axis = 0)") print evaluate("sum(x, axis = 0)") For me this prints different results each time, for example: 11.67 14.84 If we set the size to 2047 I get consistent results. 20.47 20.47 Interestingly, if I do not add .01 to x, it consistently sums to 0. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Mon Jan 24 13:19:19 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 24 Jan 2011 10:19:19 -0800 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: Forgot to mention that I am using numexpr 1.4.1 and numpy 1.5.1 On Mon, Jan 24, 2011 at 9:47 AM, John Salvatier wrote: > Hello, > > I have discovered a strange bug with numexpr. numexpr.evaluate gives > randomized results on arrays larger than 2047 elements. The following > program demonstrates this: > > from numpy import * > from numexpr import evaluate > > def func(x): > > return evaluate("sum(x, axis = 0)") > > > x = zeros(2048)+.01 > > print evaluate("sum(x, axis = 0)") > print evaluate("sum(x, axis = 0)") > > For me this prints different results each time, for example: > > 11.67 > 14.84 > > If we set the size to 2047 I get consistent results. > > 20.47 > 20.47 > > Interestingly, if I do not add .01 to x, it consistently sums to 0. -------------- next part -------------- An HTML attachment was scrubbed... URL: From totonixsame at gmail.com Mon Jan 24 13:23:06 2011 From: totonixsame at gmail.com (totonixsame at gmail.com) Date: Mon, 24 Jan 2011 16:23:06 -0200 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: I have the same problem here. I'm using numexpr 1.4.1 and numpy 1.3.0. On Mon, Jan 24, 2011 at 4:19 PM, John Salvatier wrote: > Forgot to mention that I am using numexpr 1.4.1 and numpy 1.5.1 > > On Mon, Jan 24, 2011 at 9:47 AM, John Salvatier > wrote: >> >> Hello, >> I have discovered a strange bug with numexpr. numexpr.evaluate gives >> randomized results on arrays larger than 2047 elements. The following >> program demonstrates this: >> >> from numpy import * >> from numexpr import evaluate >> def func(x): >> ?? ?return evaluate("sum(x, axis = 0)") >> >> x = zeros(2048)+.01 >> print evaluate("sum(x, axis = 0)") >> print evaluate("sum(x, axis = 0)") >> >> For me this prints different results each time, for example: >> >> 11.67 >> 14.84 >> >> If we set the size to 2047 I get consistent results. >> >> 20.47 >> 20.47 >> >> Interestingly, if I do not add .01 to x, it consistently sums to 0. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From warren.weckesser at enthought.com Mon Jan 24 13:23:18 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 24 Jan 2011 12:23:18 -0600 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: I see the same "randomness", but at a different array size: In [23]: numpy.__version__ Out[23]: '1.4.0' In [24]: import numexpr In [25]: numexpr.__version__ Out[25]: '1.4.1' In [26]: x = zeros(8192)+0.01 In [27]: print evaluate('sum(x, axis=0)') 72.97 In [28]: print evaluate('sum(x, axis=0)') 66.92 In [29]: print evaluate('sum(x, axis=0)') 67.9 In [30]: x = zeros(8193)+0.01 In [31]: print evaluate('sum(x, axis=0)') 72.63 In [32]: print evaluate('sum(x, axis=0)') 71.74 In [33]: print evaluate('sum(x, axis=0)') 81.93 In [34]: x = zeros(8191)+0.01 In [35]: print evaluate('sum(x, axis=0)') 81.91 In [36]: print evaluate('sum(x, axis=0)') 81.91 Warren On Mon, Jan 24, 2011 at 12:19 PM, John Salvatier wrote: > Forgot to mention that I am using numexpr 1.4.1 and numpy 1.5.1 > > > On Mon, Jan 24, 2011 at 9:47 AM, John Salvatier > wrote: > >> Hello, >> >> I have discovered a strange bug with numexpr. numexpr.evaluate gives >> randomized results on arrays larger than 2047 elements. The following >> program demonstrates this: >> >> from numpy import * >> from numexpr import evaluate >> >> def func(x): >> >> return evaluate("sum(x, axis = 0)") >> >> >> x = zeros(2048)+.01 >> >> print evaluate("sum(x, axis = 0)") >> print evaluate("sum(x, axis = 0)") >> >> For me this prints different results each time, for example: >> >> 11.67 >> 14.84 >> >> If we set the size to 2047 I get consistent results. >> >> 20.47 >> 20.47 >> >> Interestingly, if I do not add .01 to x, it consistently sums to 0. > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Mon Jan 24 13:29:41 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 24 Jan 2011 10:29:41 -0800 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: I also get the same issue with prod() On Mon, Jan 24, 2011 at 10:23 AM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > I see the same "randomness", but at a different array size: > > In [23]: numpy.__version__ > Out[23]: '1.4.0' > > In [24]: import numexpr > > In [25]: numexpr.__version__ > Out[25]: '1.4.1' > > In [26]: x = zeros(8192)+0.01 > > In [27]: print evaluate('sum(x, axis=0)') > 72.97 > > In [28]: print evaluate('sum(x, axis=0)') > 66.92 > > In [29]: print evaluate('sum(x, axis=0)') > 67.9 > > In [30]: x = zeros(8193)+0.01 > > In [31]: print evaluate('sum(x, axis=0)') > 72.63 > > In [32]: print evaluate('sum(x, axis=0)') > 71.74 > > In [33]: print evaluate('sum(x, axis=0)') > 81.93 > > In [34]: x = zeros(8191)+0.01 > > In [35]: print evaluate('sum(x, axis=0)') > 81.91 > > In [36]: print evaluate('sum(x, axis=0)') > 81.91 > > > Warren > > > > On Mon, Jan 24, 2011 at 12:19 PM, John Salvatier < > jsalvati at u.washington.edu> wrote: > >> Forgot to mention that I am using numexpr 1.4.1 and numpy 1.5.1 >> >> >> On Mon, Jan 24, 2011 at 9:47 AM, John Salvatier < >> jsalvati at u.washington.edu> wrote: >> >>> Hello, >>> >>> I have discovered a strange bug with numexpr. numexpr.evaluate gives >>> randomized results on arrays larger than 2047 elements. The following >>> program demonstrates this: >>> >>> from numpy import * >>> from numexpr import evaluate >>> >>> def func(x): >>> >>> return evaluate("sum(x, axis = 0)") >>> >>> >>> x = zeros(2048)+.01 >>> >>> print evaluate("sum(x, axis = 0)") >>> print evaluate("sum(x, axis = 0)") >>> >>> For me this prints different results each time, for example: >>> >>> 11.67 >>> 14.84 >>> >>> If we set the size to 2047 I get consistent results. >>> >>> 20.47 >>> 20.47 >>> >>> Interestingly, if I do not add .01 to x, it consistently sums to 0. >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Mon Jan 24 14:13:43 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 24 Jan 2011 11:13:43 -0800 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: Looks like this is related to issue 41 ( http://code.google.com/p/numexpr/issues/detail?id=41&can=1). On Mon, Jan 24, 2011 at 10:29 AM, John Salvatier wrote: > I also get the same issue with prod() > > > On Mon, Jan 24, 2011 at 10:23 AM, Warren Weckesser < > warren.weckesser at enthought.com> wrote: > >> I see the same "randomness", but at a different array size: >> >> In [23]: numpy.__version__ >> Out[23]: '1.4.0' >> >> In [24]: import numexpr >> >> In [25]: numexpr.__version__ >> Out[25]: '1.4.1' >> >> In [26]: x = zeros(8192)+0.01 >> >> In [27]: print evaluate('sum(x, axis=0)') >> 72.97 >> >> In [28]: print evaluate('sum(x, axis=0)') >> 66.92 >> >> In [29]: print evaluate('sum(x, axis=0)') >> 67.9 >> >> In [30]: x = zeros(8193)+0.01 >> >> In [31]: print evaluate('sum(x, axis=0)') >> 72.63 >> >> In [32]: print evaluate('sum(x, axis=0)') >> 71.74 >> >> In [33]: print evaluate('sum(x, axis=0)') >> 81.93 >> >> In [34]: x = zeros(8191)+0.01 >> >> In [35]: print evaluate('sum(x, axis=0)') >> 81.91 >> >> In [36]: print evaluate('sum(x, axis=0)') >> 81.91 >> >> >> Warren >> >> >> >> On Mon, Jan 24, 2011 at 12:19 PM, John Salvatier < >> jsalvati at u.washington.edu> wrote: >> >>> Forgot to mention that I am using numexpr 1.4.1 and numpy 1.5.1 >>> >>> >>> On Mon, Jan 24, 2011 at 9:47 AM, John Salvatier < >>> jsalvati at u.washington.edu> wrote: >>> >>>> Hello, >>>> >>>> I have discovered a strange bug with numexpr. numexpr.evaluate gives >>>> randomized results on arrays larger than 2047 elements. The following >>>> program demonstrates this: >>>> >>>> from numpy import * >>>> from numexpr import evaluate >>>> >>>> def func(x): >>>> >>>> return evaluate("sum(x, axis = 0)") >>>> >>>> >>>> x = zeros(2048)+.01 >>>> >>>> print evaluate("sum(x, axis = 0)") >>>> print evaluate("sum(x, axis = 0)") >>>> >>>> For me this prints different results each time, for example: >>>> >>>> 11.67 >>>> 14.84 >>>> >>>> If we set the size to 2047 I get consistent results. >>>> >>>> 20.47 >>>> 20.47 >>>> >>>> Interestingly, if I do not add .01 to x, it consistently sums to 0. >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Mon Jan 24 14:35:55 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 24 Jan 2011 13:35:55 -0600 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: On Mon, Jan 24, 2011 at 1:13 PM, John Salvatier wrote: > Looks like this is related to issue 41 ( > http://code.google.com/p/numexpr/issues/detail?id=41&can=1). That might not be the same issue. You can fix the "randomness" by setting the number of threads to 1, as in input [6] here: In [1]: import numexpr as ne In [2]: x = zeros(8192)+0.01 In [3]: ne.evaluate('sum(x, axis=0)') Out[3]: array(71.119999999999479) In [4]: ne.evaluate('sum(x, axis=0)') Out[4]: array(81.920000000005004) In [5]: ne.evaluate('sum(x, axis=0)') Out[5]: array(68.379999999998077) In [6]: ne.set_num_threads(1) In [7]: ne.evaluate('sum(x, axis=0)') Out[7]: array(81.920000000005004) In [8]: ne.evaluate('sum(x, axis=0)') Out[8]: array(81.920000000005004) In [9]: ne.evaluate('sum(x, axis=0)') Out[9]: array(81.920000000005004) Warren > > > On Mon, Jan 24, 2011 at 10:29 AM, John Salvatier < > jsalvati at u.washington.edu> wrote: > >> I also get the same issue with prod() >> >> >> On Mon, Jan 24, 2011 at 10:23 AM, Warren Weckesser < >> warren.weckesser at enthought.com> wrote: >> >>> I see the same "randomness", but at a different array size: >>> >>> In [23]: numpy.__version__ >>> Out[23]: '1.4.0' >>> >>> In [24]: import numexpr >>> >>> In [25]: numexpr.__version__ >>> Out[25]: '1.4.1' >>> >>> In [26]: x = zeros(8192)+0.01 >>> >>> In [27]: print evaluate('sum(x, axis=0)') >>> 72.97 >>> >>> In [28]: print evaluate('sum(x, axis=0)') >>> 66.92 >>> >>> In [29]: print evaluate('sum(x, axis=0)') >>> 67.9 >>> >>> In [30]: x = zeros(8193)+0.01 >>> >>> In [31]: print evaluate('sum(x, axis=0)') >>> 72.63 >>> >>> In [32]: print evaluate('sum(x, axis=0)') >>> 71.74 >>> >>> In [33]: print evaluate('sum(x, axis=0)') >>> 81.93 >>> >>> In [34]: x = zeros(8191)+0.01 >>> >>> In [35]: print evaluate('sum(x, axis=0)') >>> 81.91 >>> >>> In [36]: print evaluate('sum(x, axis=0)') >>> 81.91 >>> >>> >>> Warren >>> >>> >>> >>> On Mon, Jan 24, 2011 at 12:19 PM, John Salvatier < >>> jsalvati at u.washington.edu> wrote: >>> >>>> Forgot to mention that I am using numexpr 1.4.1 and numpy 1.5.1 >>>> >>>> >>>> On Mon, Jan 24, 2011 at 9:47 AM, John Salvatier < >>>> jsalvati at u.washington.edu> wrote: >>>> >>>>> Hello, >>>>> >>>>> I have discovered a strange bug with numexpr. numexpr.evaluate gives >>>>> randomized results on arrays larger than 2047 elements. The following >>>>> program demonstrates this: >>>>> >>>>> from numpy import * >>>>> from numexpr import evaluate >>>>> >>>>> def func(x): >>>>> >>>>> return evaluate("sum(x, axis = 0)") >>>>> >>>>> >>>>> x = zeros(2048)+.01 >>>>> >>>>> print evaluate("sum(x, axis = 0)") >>>>> print evaluate("sum(x, axis = 0)") >>>>> >>>>> For me this prints different results each time, for example: >>>>> >>>>> 11.67 >>>>> 14.84 >>>>> >>>>> If we set the size to 2047 I get consistent results. >>>>> >>>>> 20.47 >>>>> 20.47 >>>>> >>>>> Interestingly, if I do not add .01 to x, it consistently sums to 0. >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Mon Jan 24 14:53:14 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 24 Jan 2011 11:53:14 -0800 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: You're right, I got the same behavior. Interesting. On Mon, Jan 24, 2011 at 11:35 AM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > > On Mon, Jan 24, 2011 at 1:13 PM, John Salvatier > wrote: > >> Looks like this is related to issue 41 ( >> http://code.google.com/p/numexpr/issues/detail?id=41&can=1). > > > > That might not be the same issue. > > You can fix the "randomness" by setting the number of threads to 1, as in > input [6] here: > > In [1]: import numexpr as ne > > In [2]: x = zeros(8192)+0.01 > > In [3]: ne.evaluate('sum(x, axis=0)') > Out[3]: array(71.119999999999479) > > In [4]: ne.evaluate('sum(x, axis=0)') > Out[4]: array(81.920000000005004) > > In [5]: ne.evaluate('sum(x, axis=0)') > Out[5]: array(68.379999999998077) > > In [6]: ne.set_num_threads(1) > > In [7]: ne.evaluate('sum(x, axis=0)') > Out[7]: array(81.920000000005004) > > In [8]: ne.evaluate('sum(x, axis=0)') > Out[8]: array(81.920000000005004) > > In [9]: ne.evaluate('sum(x, axis=0)') > Out[9]: array(81.920000000005004) > > > Warren > > > >> >> >> On Mon, Jan 24, 2011 at 10:29 AM, John Salvatier < >> jsalvati at u.washington.edu> wrote: >> >>> I also get the same issue with prod() >>> >>> >>> On Mon, Jan 24, 2011 at 10:23 AM, Warren Weckesser < >>> warren.weckesser at enthought.com> wrote: >>> >>>> I see the same "randomness", but at a different array size: >>>> >>>> In [23]: numpy.__version__ >>>> Out[23]: '1.4.0' >>>> >>>> In [24]: import numexpr >>>> >>>> In [25]: numexpr.__version__ >>>> Out[25]: '1.4.1' >>>> >>>> In [26]: x = zeros(8192)+0.01 >>>> >>>> In [27]: print evaluate('sum(x, axis=0)') >>>> 72.97 >>>> >>>> In [28]: print evaluate('sum(x, axis=0)') >>>> 66.92 >>>> >>>> In [29]: print evaluate('sum(x, axis=0)') >>>> 67.9 >>>> >>>> In [30]: x = zeros(8193)+0.01 >>>> >>>> In [31]: print evaluate('sum(x, axis=0)') >>>> 72.63 >>>> >>>> In [32]: print evaluate('sum(x, axis=0)') >>>> 71.74 >>>> >>>> In [33]: print evaluate('sum(x, axis=0)') >>>> 81.93 >>>> >>>> In [34]: x = zeros(8191)+0.01 >>>> >>>> In [35]: print evaluate('sum(x, axis=0)') >>>> 81.91 >>>> >>>> In [36]: print evaluate('sum(x, axis=0)') >>>> 81.91 >>>> >>>> >>>> Warren >>>> >>>> >>>> >>>> On Mon, Jan 24, 2011 at 12:19 PM, John Salvatier < >>>> jsalvati at u.washington.edu> wrote: >>>> >>>>> Forgot to mention that I am using numexpr 1.4.1 and numpy 1.5.1 >>>>> >>>>> >>>>> On Mon, Jan 24, 2011 at 9:47 AM, John Salvatier < >>>>> jsalvati at u.washington.edu> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I have discovered a strange bug with numexpr. numexpr.evaluate gives >>>>>> randomized results on arrays larger than 2047 elements. The following >>>>>> program demonstrates this: >>>>>> >>>>>> from numpy import * >>>>>> from numexpr import evaluate >>>>>> >>>>>> def func(x): >>>>>> >>>>>> return evaluate("sum(x, axis = 0)") >>>>>> >>>>>> >>>>>> x = zeros(2048)+.01 >>>>>> >>>>>> print evaluate("sum(x, axis = 0)") >>>>>> print evaluate("sum(x, axis = 0)") >>>>>> >>>>>> For me this prints different results each time, for example: >>>>>> >>>>>> 11.67 >>>>>> 14.84 >>>>>> >>>>>> If we set the size to 2047 I get consistent results. >>>>>> >>>>>> 20.47 >>>>>> 20.47 >>>>>> >>>>>> Interestingly, if I do not add .01 to x, it consistently sums to 0. >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Mon Jan 24 16:29:19 2011 From: e.antero.tammi at gmail.com (eat) Date: Mon, 24 Jan 2011 21:29:19 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?How_to_improve_performance_of_slow_t?= =?utf-8?q?ri*=5Findices_calculations=3F?= Message-ID: Hi, Running on: In []: np.__version__ Out[]: '1.5.1' In []: sys.version Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' For the reference: In []: X= randn(10, 125) In []: timeit dot(X.T, X) 10000 loops, best of 3: 170 us per loop In []: X= randn(10, 250) In []: timeit dot(X.T, X) 1000 loops, best of 3: 671 us per loop In []: X= randn(10, 500) In []: timeit dot(X.T, X) 100 loops, best of 3: 5.15 ms per loop In []: X= randn(10, 1000) In []: timeit dot(X.T, X) 100 loops, best of 3: 20 ms per loop In []: X= randn(10, 2000) In []: timeit dot(X.T, X) 10 loops, best of 3: 80.7 ms per loop Performance of triu_indices: In []: timeit triu_indices(125) 1000 loops, best of 3: 662 us per loop In []: timeit triu_indices(250) 100 loops, best of 3: 2.55 ms per loop In []: timeit triu_indices(500) 100 loops, best of 3: 15 ms per loop In []: timeit triu_indices(1000) 10 loops, best of 3: 59.8 ms per loop In []: timeit triu_indices(2000) 1 loops, best of 3: 239 ms per loop So the tri*_indices calculations seems to be unreasonable slow compared to for example calculations of inner products. Now, just to compare for a very naive implementation of triu indices. In []: def iut(n): ..: r= np.empty(n* (n+ 1)/ 2, dtype= int) ..: c= r.copy() ..: a= np.arange(n) ..: m= 0 ..: for i in xrange(n): ..: ni= n- i ..: mni= m+ ni ..: r[m: mni]= i ..: c[m: mni]= a[i: n] ..: m+= ni ..: return (r, c) ..: Are we really calculating the same thing? In []: triu_indices(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) In []: iut(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) Seems so, and then its performance: In []: timeit iut(125) 1000 loops, best of 3: 992 us per loop In []: timeit iut(250) 100 loops, best of 3: 2.03 ms per loop In []: timeit iut(500) 100 loops, best of 3: 5.3 ms per loop In []: timeit iut(1000) 100 loops, best of 3: 13.9 ms per loop In []: timeit iut(2000) 10 loops, best of 3: 39.8 ms per loop Even the naive implementation is very slow, but allready outperforms triu_indices, when n is > 250! So finally my question is how one could substantially improve the performance of indices calculations? Regards, eat From jsseabold at gmail.com Mon Jan 24 17:47:59 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 24 Jan 2011 17:47:59 -0500 Subject: [Numpy-discussion] bug in genfromtxt with missing values? Message-ID: Am I misreading the docs or missing something? Consider the following adapted from here: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html from StringIO import StringIO import numpy as np data = "1, 2, 3\n4, ,5" np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", missing_values=" ", filling_values=0) array([(1.0, 2.0, 3.0), (4.0, nan, 5.0)], dtype=[('a', ' References: Message-ID: On Mon, Jan 24, 2011 at 4:29 PM, eat wrote: > Hi, > > Running on: > In []: np.__version__ > Out[]: '1.5.1' > In []: sys.version > Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' > > For the reference: > In []: X= randn(10, 125) > In []: timeit dot(X.T, X) > 10000 loops, best of 3: 170 us per loop > In []: X= randn(10, 250) > In []: timeit dot(X.T, X) > 1000 loops, best of 3: 671 us per loop > In []: X= randn(10, 500) > In []: timeit dot(X.T, X) > 100 loops, best of 3: 5.15 ms per loop > In []: X= randn(10, 1000) > In []: timeit dot(X.T, X) > 100 loops, best of 3: 20 ms per loop > In []: X= randn(10, 2000) > In []: timeit dot(X.T, X) > 10 loops, best of 3: 80.7 ms per loop > > Performance of triu_indices: > In []: timeit triu_indices(125) > 1000 loops, best of 3: 662 us per loop > In []: timeit triu_indices(250) > 100 loops, best of 3: 2.55 ms per loop > In []: timeit triu_indices(500) > 100 loops, best of 3: 15 ms per loop > In []: timeit triu_indices(1000) > 10 loops, best of 3: 59.8 ms per loop > In []: timeit triu_indices(2000) > 1 loops, best of 3: 239 ms per loop > > So the tri*_indices calculations seems to be unreasonable slow compared to for > example calculations of inner products. > > Now, just to compare for a very naive implementation of triu indices. > In []: def iut(n): > ? ..: ? ? r= np.empty(n* (n+ 1)/ 2, dtype= int) > ? ..: ? ? c= r.copy() > ? ..: ? ? a= np.arange(n) > ? ..: ? ? m= 0 > ? ..: ? ? for i in xrange(n): > ? ..: ? ? ? ? ni= n- i > ? ..: ? ? ? ? mni= m+ ni > ? ..: ? ? ? ? r[m: mni]= i > ? ..: ? ? ? ? c[m: mni]= a[i: n] > ? ..: ? ? ? ? m+= ni > ? ..: ? ? return (r, c) > ? ..: > > Are we really calculating the same thing? > In []: triu_indices(5) > Out[]: > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > ?array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > In []: iut(5) > Out[]: > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > ?array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > > Seems so, and then its performance: > In []: timeit iut(125) > 1000 loops, best of 3: 992 us per loop > In []: timeit iut(250) > 100 loops, best of 3: 2.03 ms per loop > In []: timeit iut(500) > 100 loops, best of 3: 5.3 ms per loop > In []: timeit iut(1000) > 100 loops, best of 3: 13.9 ms per loop > In []: timeit iut(2000) > 10 loops, best of 3: 39.8 ms per loop > > Even the naive implementation is very slow, but allready outperforms > triu_indices, when n is > 250! > > So finally my question is how one could substantially improve the performance > of indices calculations? What's the timing of this version (taken from nitime) ? it builds a full intermediate array m = np.ones((n,n),int) a = np.triu(m,k) np.where(a != 0) >>> n=5 >>> m = np.ones((n,n),int) >>> np.where(np.triu(m,0) != 0) (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) or maybe variations on it that all build a full intermediate matrix >>> np.where(1-np.tri(n,n,-1)) (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) >>> np.where(np.subtract.outer(np.arange(n), np.arange(n))<=0) (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) Josef > > > Regards, > eat > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Jan 24 19:01:16 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 24 Jan 2011 19:01:16 -0500 Subject: [Numpy-discussion] How to improve performance of slow tri*_indices calculations? In-Reply-To: References: Message-ID: On Mon, Jan 24, 2011 at 6:49 PM, wrote: > On Mon, Jan 24, 2011 at 4:29 PM, eat wrote: >> Hi, >> >> Running on: >> In []: np.__version__ >> Out[]: '1.5.1' >> In []: sys.version >> Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' >> >> For the reference: >> In []: X= randn(10, 125) >> In []: timeit dot(X.T, X) >> 10000 loops, best of 3: 170 us per loop >> In []: X= randn(10, 250) >> In []: timeit dot(X.T, X) >> 1000 loops, best of 3: 671 us per loop >> In []: X= randn(10, 500) >> In []: timeit dot(X.T, X) >> 100 loops, best of 3: 5.15 ms per loop >> In []: X= randn(10, 1000) >> In []: timeit dot(X.T, X) >> 100 loops, best of 3: 20 ms per loop >> In []: X= randn(10, 2000) >> In []: timeit dot(X.T, X) >> 10 loops, best of 3: 80.7 ms per loop >> >> Performance of triu_indices: >> In []: timeit triu_indices(125) >> 1000 loops, best of 3: 662 us per loop >> In []: timeit triu_indices(250) >> 100 loops, best of 3: 2.55 ms per loop >> In []: timeit triu_indices(500) >> 100 loops, best of 3: 15 ms per loop >> In []: timeit triu_indices(1000) >> 10 loops, best of 3: 59.8 ms per loop >> In []: timeit triu_indices(2000) >> 1 loops, best of 3: 239 ms per loop >> >> So the tri*_indices calculations seems to be unreasonable slow compared to for >> example calculations of inner products. >> >> Now, just to compare for a very naive implementation of triu indices. >> In []: def iut(n): >> ? ..: ? ? r= np.empty(n* (n+ 1)/ 2, dtype= int) >> ? ..: ? ? c= r.copy() >> ? ..: ? ? a= np.arange(n) >> ? ..: ? ? m= 0 >> ? ..: ? ? for i in xrange(n): >> ? ..: ? ? ? ? ni= n- i >> ? ..: ? ? ? ? mni= m+ ni >> ? ..: ? ? ? ? r[m: mni]= i >> ? ..: ? ? ? ? c[m: mni]= a[i: n] >> ? ..: ? ? ? ? m+= ni >> ? ..: ? ? return (r, c) >> ? ..: >> >> Are we really calculating the same thing? >> In []: triu_indices(5) >> Out[]: >> (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), >> ?array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) >> In []: iut(5) >> Out[]: >> (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), >> ?array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) >> >> Seems so, and then its performance: >> In []: timeit iut(125) >> 1000 loops, best of 3: 992 us per loop >> In []: timeit iut(250) >> 100 loops, best of 3: 2.03 ms per loop >> In []: timeit iut(500) >> 100 loops, best of 3: 5.3 ms per loop >> In []: timeit iut(1000) >> 100 loops, best of 3: 13.9 ms per loop >> In []: timeit iut(2000) >> 10 loops, best of 3: 39.8 ms per loop >> >> Even the naive implementation is very slow, but allready outperforms >> triu_indices, when n is > 250! >> >> So finally my question is how one could substantially improve the performance >> of indices calculations? > > What's the timing of this version (taken from nitime) ? it builds a > full intermediate array I should have checked the numpy source first, that's exactly the implementation of triu_indices in numpy 1.5.1 Josef > > ? ?m = np.ones((n,n),int) > ? ?a = np.triu(m,k) > ? ?np.where(a != 0) > >>>> n=5 >>>> m = np.ones((n,n),int) >>>> np.where(np.triu(m,0) != 0) > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > > or maybe variations on it that all build a full intermediate matrix > >>>> np.where(1-np.tri(n,n,-1)) > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > >>>> np.where(np.subtract.outer(np.arange(n), np.arange(n))<=0) > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > > Josef > >> >> >> Regards, >> eat >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > From e.antero.tammi at gmail.com Tue Jan 25 08:08:26 2011 From: e.antero.tammi at gmail.com (E. Antero Tammi) Date: Tue, 25 Jan 2011 15:08:26 +0200 Subject: [Numpy-discussion] How to improve performance of slow tri*_indices calculations? In-Reply-To: References: Message-ID: Hi, On Tue, Jan 25, 2011 at 1:49 AM, wrote: > On Mon, Jan 24, 2011 at 4:29 PM, eat wrote: > > Hi, > > > > Running on: > > In []: np.__version__ > > Out[]: '1.5.1' > > In []: sys.version > > Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit > (Intel)]' > > > > For the reference: > > In []: X= randn(10, 125) > > In []: timeit dot(X.T, X) > > 10000 loops, best of 3: 170 us per loop > > In []: X= randn(10, 250) > > In []: timeit dot(X.T, X) > > 1000 loops, best of 3: 671 us per loop > > In []: X= randn(10, 500) > > In []: timeit dot(X.T, X) > > 100 loops, best of 3: 5.15 ms per loop > > In []: X= randn(10, 1000) > > In []: timeit dot(X.T, X) > > 100 loops, best of 3: 20 ms per loop > > In []: X= randn(10, 2000) > > In []: timeit dot(X.T, X) > > 10 loops, best of 3: 80.7 ms per loop > > > > Performance of triu_indices: > > In []: timeit triu_indices(125) > > 1000 loops, best of 3: 662 us per loop > > In []: timeit triu_indices(250) > > 100 loops, best of 3: 2.55 ms per loop > > In []: timeit triu_indices(500) > > 100 loops, best of 3: 15 ms per loop > > In []: timeit triu_indices(1000) > > 10 loops, best of 3: 59.8 ms per loop > > In []: timeit triu_indices(2000) > > 1 loops, best of 3: 239 ms per loop > > > > So the tri*_indices calculations seems to be unreasonable slow compared > to for > > example calculations of inner products. > > > > Now, just to compare for a very naive implementation of triu indices. > > In []: def iut(n): > > ..: r= np.empty(n* (n+ 1)/ 2, dtype= int) > > ..: c= r.copy() > > ..: a= np.arange(n) > > ..: m= 0 > > ..: for i in xrange(n): > > ..: ni= n- i > > ..: mni= m+ ni > > ..: r[m: mni]= i > > ..: c[m: mni]= a[i: n] > > ..: m+= ni > > ..: return (r, c) > > ..: > > > > Are we really calculating the same thing? > > In []: triu_indices(5) > > Out[]: > > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > > In []: iut(5) > > Out[]: > > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > > > > Seems so, and then its performance: > > In []: timeit iut(125) > > 1000 loops, best of 3: 992 us per loop > > In []: timeit iut(250) > > 100 loops, best of 3: 2.03 ms per loop > > In []: timeit iut(500) > > 100 loops, best of 3: 5.3 ms per loop > > In []: timeit iut(1000) > > 100 loops, best of 3: 13.9 ms per loop > > In []: timeit iut(2000) > > 10 loops, best of 3: 39.8 ms per loop > > > > Even the naive implementation is very slow, but allready outperforms > > triu_indices, when n is > 250! > > > > So finally my question is how one could substantially improve the > performance > > of indices calculations? > > What's the timing of this version (taken from nitime) ? it builds a > full intermediate array > > m = np.ones((n,n),int) > a = np.triu(m,k) > np.where(a != 0) > > >>> n=5 > >>> m = np.ones((n,n),int) > >>> np.where(np.triu(m,0) != 0) > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) This is ~20% slower than triu_indices. > > or maybe variations on it that all build a full intermediate matrix > > >>> np.where(1-np.tri(n,n,-1)) > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) This ~5% faster than triu_indicies. > > >>> np.where(np.subtract.outer(np.arange(n), np.arange(n))<=0) > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) Clever, and its 50% faster than triu_indices, but the naive implementaion is still almost 3x faster. However it seems that some 80% of time is spent in where. So simple subtract.outer(arange(n), arange(n))<= 0 and logical indenxing outperforms slightly the naive version. I was hoping to find a way to do this at least 10x faster than naive implementation, but meanwhile I'll stick with logical indexing. Thanks, eat > > Josef > > > > > > > Regards, > > eat > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Tue Jan 25 09:08:15 2011 From: faltet at pytables.org (Francesc Alted) Date: Tue, 25 Jan 2011 15:08:15 +0100 Subject: [Numpy-discussion] Numexpr giving randomized results on arrays larger than 2047 elements In-Reply-To: References: Message-ID: <201101251508.15074.faltet@pytables.org> A Monday 24 January 2011 18:47:58 John Salvatier escrigu?: > Hello, > > I have discovered a strange bug with numexpr. numexpr.evaluate gives > randomized results on arrays larger than 2047 elements. The following > program demonstrates this: > > from numpy import * > from numexpr import evaluate > > def func(x): > > return evaluate("sum(x, axis = 0)") > > > x = zeros(2048)+.01 > > print evaluate("sum(x, axis = 0)") > print evaluate("sum(x, axis = 0)") > > For me this prints different results each time, for example: > > 11.67 > 14.84 > > If we set the size to 2047 I get consistent results. > > 20.47 > 20.47 > > Interestingly, if I do not add .01 to x, it consistently sums to 0. I'm about to release Numexpr 1.4.2 that should fix this. Could you give it a try at the tarball in?: python setup.py sdist Thanks, -- Francesc Alted From pgmdevlist at gmail.com Tue Jan 25 11:17:55 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 25 Jan 2011 17:17:55 +0100 Subject: [Numpy-discussion] bug in genfromtxt with missing values? In-Reply-To: References: Message-ID: <66102999-E2CE-4666-8D76-1B2AC3ED9235@gmail.com> On Jan 24, 2011, at 11:47 PM, Skipper Seabold wrote: > Am I misreading the docs or missing something? Consider the following > adapted from here: > http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html > > from StringIO import StringIO > import numpy as np > > data = "1, 2, 3\n4, ,5" > > np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", > missing_values=" ", filling_values=0) > array([(1.0, 2.0, 3.0), (4.0, nan, 5.0)], > dtype=[('a', ' > np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", > missing_values={'b':" "}, filling_values={'b' : 0}) > array([(1.0, 2.0, 3.0), (4.0, 0.0, 5.0)], > dtype=[('a', ' > Unless I use the dict for missing_values, it doesn't fill them in. > It's probably a bug . Mind opening a ticket ? I'll try to care of it when I can. Thx in advance P. From charlesr.harris at gmail.com Tue Jan 25 11:42:01 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Jan 2011 09:42:01 -0700 Subject: [Numpy-discussion] Numpy 2.0 schedule Message-ID: Hi All, Just thought it was time to start discussing a release schedule for numpy 2.0 so we have something to aim at. I'm thinking sometime in the period April-June might be appropriate. There is a lot coming with the next release: the Enthought's numpy refactoring, Mark's float16 and iterator work, and support for IronPython. How do things look to the folks involved in those projects? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Tue Jan 25 11:56:55 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 25 Jan 2011 11:56:55 -0500 Subject: [Numpy-discussion] bug in genfromtxt with missing values? In-Reply-To: <66102999-E2CE-4666-8D76-1B2AC3ED9235@gmail.com> References: <66102999-E2CE-4666-8D76-1B2AC3ED9235@gmail.com> Message-ID: On Tue, Jan 25, 2011 at 11:17 AM, Pierre GM wrote: > > On Jan 24, 2011, at 11:47 PM, Skipper Seabold wrote: > >> Am I misreading the docs or missing something? ?Consider the following >> adapted from here: >> http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html >> >> from StringIO import StringIO >> import numpy as np >> >> data = "1, 2, 3\n4, ,5" >> >> np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", >> missing_values=" ", filling_values=0) >> array([(1.0, 2.0, 3.0), (4.0, nan, 5.0)], >> ? ? ?dtype=[('a', '> >> np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", >> missing_values={'b':" "}, filling_values={'b' : 0}) >> array([(1.0, 2.0, 3.0), (4.0, 0.0, 5.0)], >> ? ? ?dtype=[('a', '> >> Unless I use the dict for missing_values, it doesn't fill them in. >> > > It's probably a bug . Mind opening a ticket ? I'll try to care of it when I can. > Thx in advance > P. http://projects.scipy.org/numpy/ticket/1722 Forgot to use the code formatting, and it doesn't look like I can edit. Thanks, Skipper From faltet at pytables.org Tue Jan 25 12:39:11 2011 From: faltet at pytables.org (Francesc Alted) Date: Tue, 25 Jan 2011 18:39:11 +0100 Subject: [Numpy-discussion] ANN: Numexpr 1.4.2 released Message-ID: <201101251839.11824.faltet@pytables.org> ========================== Announcing Numexpr 1.4.2 ========================== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. What's new ========== This is a maintenance release. The most annying issues have been fixed (including the reduction bugs introduced in 1.4 series). Also, several performance enhancements are included too. In case you want to know more in detail what has changed in this version, see: http://code.google.com/p/numexpr/wiki/ReleaseNotes or have a look at RELEASE_NOTES.txt in the tarball. Where I can find Numexpr? ========================= The project is hosted at Google code in: http://code.google.com/p/numexpr/ You can get the packages from PyPI as well: http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy! -- Francesc Alted From bsouthey at gmail.com Tue Jan 25 15:06:12 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 25 Jan 2011 14:06:12 -0600 Subject: [Numpy-discussion] bug in genfromtxt with missing values? In-Reply-To: References: <66102999-E2CE-4666-8D76-1B2AC3ED9235@gmail.com> Message-ID: <4D3F2D34.4050304@gmail.com> On 01/25/2011 10:56 AM, Skipper Seabold wrote: > On Tue, Jan 25, 2011 at 11:17 AM, Pierre GM wrote: >> On Jan 24, 2011, at 11:47 PM, Skipper Seabold wrote: >> >>> Am I misreading the docs or missing something? Consider the following >>> adapted from here: >>> http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html >>> >>> from StringIO import StringIO >>> import numpy as np >>> >>> data = "1, 2, 3\n4, ,5" >>> >>> np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", >>> missing_values=" ", filling_values=0) >>> array([(1.0, 2.0, 3.0), (4.0, nan, 5.0)], >>> dtype=[('a', '>> >>> np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", >>> missing_values={'b':" "}, filling_values={'b' : 0}) >>> array([(1.0, 2.0, 3.0), (4.0, 0.0, 5.0)], >>> dtype=[('a', '>> >>> Unless I use the dict for missing_values, it doesn't fill them in. >>> >> It's probably a bug . Mind opening a ticket ? I'll try to care of it when I can. >> Thx in advance >> P. > http://projects.scipy.org/numpy/ticket/1722 > > Forgot to use the code formatting, and it doesn't look like I can edit. > > Thanks, > > Skipper Hi, Your filling_values is zero so there is this line (1295?) in the code: user_filling_values = filling_values or [] Which of cause presumes your filling_values is not something like 0 or [0]. Now it can be a code bug or just undocumented feature that filling_values can not a single zero. Thus something like these work: np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", filling_values=-90) np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", filling_values=[0,0]) np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c", filling_values=[0,0,0]) Bruce From oliphant at enthought.com Tue Jan 25 15:13:40 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 25 Jan 2011 14:13:40 -0600 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Jan 25, 2011, at 10:42 AM, Charles R Harris wrote: > Hi All, > > Just thought it was time to start discussing a release schedule for numpy 2.0 so we have something to aim at. I'm thinking sometime in the period April-June might be appropriate. There is a lot coming with the next release: the Enthought's numpy refactoring, Mark's float16 and iterator work, and support for IronPython. How do things look to the folks involved in those projects? I would target June / July at this point ;-) I know I deserve a "I told you so" from Chuck --- I will take it. There is a bit of work that Mark is doing that would be good to include, also some modifications to the re-factoring that will support better small array performance. It may make sense for a NumPy 1.6 to come out in March / April in the interim. Thoughts? -Travis > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From daniele at grinta.net Tue Jan 25 16:04:42 2011 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 25 Jan 2011 22:04:42 +0100 Subject: [Numpy-discussion] Wiener filter / decorrelation Message-ID: <4D3F3AEA.9050600@grinta.net> Hello, I'm trying to write a numerical implementation of Wiener filtering / decorrelation (extraction of a signal from noisy time series). What I'm trying to do is the construction of the time domain filter from a measurement of the power spectrum of the noise and the shape of the signal. However I'm encountering some problems that are beyond my knowledge of the matter. Can someone suggest me a reference text book, or other resource? Thank you in advance. Cheers, -- Daniele From jrocher at enthought.com Tue Jan 25 16:12:59 2011 From: jrocher at enthought.com (Jonathan Rocher) Date: Tue, 25 Jan 2011 15:12:59 -0600 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: <4D3F3AEA.9050600@grinta.net> References: <4D3F3AEA.9050600@grinta.net> Message-ID: Hi Daniele, I would recommend the Numerical recipes in Fortran 77, obviously not for the language but for its mathematical sections and its discussions of coding algorithms efficiently. Section 13.3 is about wiener filtering with FFT. Hope this helps, Jonathan On Tue, Jan 25, 2011 at 3:04 PM, Daniele Nicolodi wrote: > Hello, > > I'm trying to write a numerical implementation of Wiener filtering / > decorrelation (extraction of a signal from noisy time series). What I'm > trying to do is the construction of the time domain filter from a > measurement of the power spectrum of the noise and the shape of the signal. > > However I'm encountering some problems that are beyond my knowledge of > the matter. Can someone suggest me a reference text book, or other > resource? > > Thank you in advance. Cheers, > -- > Daniele > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Jonathan Rocher, Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Jan 25 16:20:02 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 25 Jan 2011 21:20:02 +0000 (UTC) Subject: [Numpy-discussion] Wiener filter / decorrelation References: <4D3F3AEA.9050600@grinta.net> Message-ID: Hi, On Tue, 25 Jan 2011 22:04:42 +0100, Daniele Nicolodi wrote: [clip] > However I'm encountering some problems that are beyond my knowledge of > the matter. Can someone suggest me a reference text book, or other > resource? Scipy-user list would be more appropriate for queries not directly involving Numpy. -- Pauli Virtanen From daniele at grinta.net Tue Jan 25 16:26:50 2011 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 25 Jan 2011 22:26:50 +0100 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: References: <4D3F3AEA.9050600@grinta.net> Message-ID: <4D3F401A.3060203@grinta.net> On 25/01/11 22:12, Jonathan Rocher wrote: > I would recommend the Numerical recipes in Fortran 77, obviously not for > the language but for its mathematical sections and its discussions of > coding algorithms efficiently. Section 13.3 is about wiener filtering > with FFT. Thank you, Jonathan. I took at look at my university library catalog, and the Fortran 77 version of Numerical Recipes is not available (I would have to get it at the engineering faculty). There is available the Fortran 90, and the C editions, plus another edition whose title is simply "Numerical Recipes". Is the content of the different editions equivalent, or should I look for this specific edition? Thank you again. Cheers, -- Daniele From daniele at grinta.net Tue Jan 25 16:29:35 2011 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 25 Jan 2011 22:29:35 +0100 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: References: <4D3F3AEA.9050600@grinta.net> Message-ID: <4D3F40BF.7090303@grinta.net> On 25/01/11 22:20, Pauli Virtanen wrote: >> However I'm encountering some problems that are beyond my knowledge of >> the matter. Can someone suggest me a reference text book, or other >> resource? > > Scipy-user list would be more appropriate for queries not directly > involving Numpy. Sorry. I thought that the -discussion suffix meant a broader topic than numpy only. I'll address other similar questions to scipy-user. Thank. Cheers, -- Daniele From pgmdevlist at gmail.com Tue Jan 25 16:58:37 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 25 Jan 2011 22:58:37 +0100 Subject: [Numpy-discussion] bug in genfromtxt with missing values? In-Reply-To: <4D3F2D34.4050304@gmail.com> References: <66102999-E2CE-4666-8D76-1B2AC3ED9235@gmail.com> <4D3F2D34.4050304@gmail.com> Message-ID: <54D5EE9F-788E-4CAF-BD3A-8449A510FD90@gmail.com> On Jan 25, 2011, at 9:06 PM, Bruce Southey wrote: > Your filling_values is zero so there is this line (1295?) in the code: > user_filling_values = filling_values or [] > > Which of cause presumes your filling_values is not something like 0 or [0]. That's the bug. I forgot that filling_values could be 0. (I was more thinking of None) so it should be if filling_values is None: filling_values = [] user_filling_values = filling_values. > Now it can be a code bug or just undocumented feature that > filling_values can not a single zero. Thus something like these work: You're too kind. That's just sloppy coding... If you correct it before i do, don't forget to add a test case... Thx again P. From charlesr.harris at gmail.com Tue Jan 25 17:00:15 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Jan 2011 15:00:15 -0700 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: <4D3F401A.3060203@grinta.net> References: <4D3F3AEA.9050600@grinta.net> <4D3F401A.3060203@grinta.net> Message-ID: On Tue, Jan 25, 2011 at 2:26 PM, Daniele Nicolodi wrote: > On 25/01/11 22:12, Jonathan Rocher wrote: > > I would recommend the Numerical recipes in Fortran 77, obviously not for > > the language but for its mathematical sections and its discussions of > > coding algorithms efficiently. Section 13.3 is about wiener filtering > > with FFT. > > Thank you, Jonathan. > > I took at look at my university library catalog, and the Fortran 77 > version of Numerical Recipes is not available (I would have to get it at > the engineering faculty). There is available the Fortran 90, and the C > editions, plus another edition whose title is simply "Numerical > Recipes". Is the content of the different editions equivalent, or should > I look for this specific edition? > > The edition/lanquage doesn't matter much. The older editions are available online, just google numerical recipes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrocher at enthought.com Tue Jan 25 17:21:59 2011 From: jrocher at enthought.com (Jonathan Rocher) Date: Tue, 25 Jan 2011 16:21:59 -0600 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: References: <4D3F3AEA.9050600@grinta.net> <4D3F401A.3060203@grinta.net> Message-ID: Actually I believe the version does matter: I have seen a C version of num rec that doesn't contain all the algorithmic part but only the codes. I cannot remember exactly which ones are the light versions. If I had to guess, the F90 is also a light version and that's why I bought the F77 book. Jonathan On Tue, Jan 25, 2011 at 4:00 PM, Charles R Harris wrote: > > > On Tue, Jan 25, 2011 at 2:26 PM, Daniele Nicolodi wrote: > >> On 25/01/11 22:12, Jonathan Rocher wrote: >> > I would recommend the Numerical recipes in Fortran 77, obviously not for >> > the language but for its mathematical sections and its discussions of >> > coding algorithms efficiently. Section 13.3 is about wiener filtering >> > with FFT. >> >> Thank you, Jonathan. >> >> I took at look at my university library catalog, and the Fortran 77 >> version of Numerical Recipes is not available (I would have to get it at >> the engineering faculty). There is available the Fortran 90, and the C >> editions, plus another edition whose title is simply "Numerical >> Recipes". Is the content of the different editions equivalent, or should >> I look for this specific edition? >> >> > The edition/lanquage doesn't matter much. The older editions are available > online, just google numerical recipes. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Jonathan Rocher, Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Tue Jan 25 20:05:17 2011 From: david at silveregg.co.jp (David) Date: Wed, 26 Jan 2011 10:05:17 +0900 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: <4D3F734D.3000902@silveregg.co.jp> On 01/26/2011 01:42 AM, Charles R Harris wrote: > Hi All, > > Just thought it was time to start discussing a release schedule for > numpy 2.0 so we have something to aim at. I'm thinking sometime in the > period April-June might be appropriate. There is a lot coming with the > next release: the Enthought's numpy refactoring, Mark's float16 and > iterator work, and support for IronPython. How do things look to the > folks involved in those projects? One thing which I was wondering about numpy 2.0: what's the story for the C-API compared to 1.x for extensions. Is it fundamentally different so that extensions will need to be rewritten ? I especially wonder about scipy and cython's codegen backend, cheers, David From charlesr.harris at gmail.com Tue Jan 25 20:12:16 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Jan 2011 18:12:16 -0700 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: <4D3F734D.3000902@silveregg.co.jp> References: <4D3F734D.3000902@silveregg.co.jp> Message-ID: On Tue, Jan 25, 2011 at 6:05 PM, David wrote: > On 01/26/2011 01:42 AM, Charles R Harris wrote: > > Hi All, > > > > Just thought it was time to start discussing a release schedule for > > numpy 2.0 so we have something to aim at. I'm thinking sometime in the > > period April-June might be appropriate. There is a lot coming with the > > next release: the Enthought's numpy refactoring, Mark's float16 and > > iterator work, and support for IronPython. How do things look to the > > folks involved in those projects? > > One thing which I was wondering about numpy 2.0: what's the story for > the C-API compared to 1.x for extensions. Is it fundamentally different > so that extensions will need to be rewritten ? I especially wonder about > scipy and cython's codegen backend, > > The C-API looks the same but anything hard coded type numbers and such will have problems. I would like to see the initial parts of the merge go in as early as possible so we can start chasing down any problems that turn up. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jan 25 20:18:24 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Jan 2011 18:18:24 -0700 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Tue, Jan 25, 2011 at 1:13 PM, Travis Oliphant wrote: > > On Jan 25, 2011, at 10:42 AM, Charles R Harris wrote: > > > Hi All, > > > > Just thought it was time to start discussing a release schedule for numpy > 2.0 so we have something to aim at. I'm thinking sometime in the period > April-June might be appropriate. There is a lot coming with the next > release: the Enthought's numpy refactoring, Mark's float16 and iterator > work, and support for IronPython. How do things look to the folks involved > in those projects? > > I would target June / July at this point ;-) I know I deserve a "I told > you so" from Chuck --- I will take it. > > How much remains to get done? > There is a bit of work that Mark is doing that would be good to include, > also some modifications to the re-factoring that will support better small > array performance. > > Not everything needs to go into first release as long as the following releases are backward compatible. So the ABI needs it's final form as soon as possible. Is it still in flux? > It may make sense for a NumPy 1.6 to come out in March / April in the > interim. > > Pulling out the changes to attain backward compatibility isn't getting any easier. I'd rather shoot for 2.0 in June. What can the rest of us do to help move things along? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Jan 25 23:28:37 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 25 Jan 2011 20:28:37 -0800 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Tue, Jan 25, 2011 at 5:18 PM, Charles R Harris wrote: > > > On Tue, Jan 25, 2011 at 1:13 PM, Travis Oliphant wrote: > >> >> On Jan 25, 2011, at 10:42 AM, Charles R Harris wrote: >> >> > Hi All, >> > >> > Just thought it was time to start discussing a release schedule for >> numpy 2.0 so we have something to aim at. I'm thinking sometime in the >> period April-June might be appropriate. There is a lot coming with the next >> release: the Enthought's numpy refactoring, Mark's float16 and iterator >> work, and support for IronPython. How do things look to the folks involved >> in those projects? >> > My suggestion is to do a 1.6 relatively soon, as the current trunk feels pretty stable to me, and it would be nice to release the features without having to go through the whole merging process. > >> I would target June / July at this point ;-) I know I deserve a "I told >> you so" from Chuck --- I will take it. >> >> > How much remains to get done? > My changes probably make merging the refactor more challenging too. > > >> There is a bit of work that Mark is doing that would be good to include, >> also some modifications to the re-factoring that will support better small >> array performance. >> >> > Not everything needs to go into first release as long as the following > releases are backward compatible. So the ABI needs it's final form as soon > as possible. Is it still in flux? > I would suggest it is - there are a number of things I think could be improved in it, and it would be nice to bake in the underlying support features to make lazy/deferred evaluation of array expressions possible. > It may make sense for a NumPy 1.6 to come out in March / April in the >> interim. >> >> > Pulling out the changes to attain backward compatibility isn't getting any > easier. I'd rather shoot for 2.0 in June. What can the rest of us do to help > move things along? > I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc was the main issue, then that might be done. An ABI compatible 1.6 with the datetime and half types should be doable, just some extensions might get confused if they encounter arrays made with the new data types. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Wed Jan 26 04:47:28 2011 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 26 Jan 2011 10:47:28 +0100 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: <4D3F734D.3000902@silveregg.co.jp> References: <4D3F734D.3000902@silveregg.co.jp> Message-ID: <4D3FEDB0.5020700@student.matnat.uio.no> On 01/26/2011 02:05 AM, David wrote: > On 01/26/2011 01:42 AM, Charles R Harris wrote: > >> Hi All, >> >> Just thought it was time to start discussing a release schedule for >> numpy 2.0 so we have something to aim at. I'm thinking sometime in the >> period April-June might be appropriate. There is a lot coming with the >> next release: the Enthought's numpy refactoring, Mark's float16 and >> iterator work, and support for IronPython. How do things look to the >> folks involved in those projects? >> > One thing which I was wondering about numpy 2.0: what's the story for > the C-API compared to 1.x for extensions. Is it fundamentally different > so that extensions will need to be rewritten ? I especially wonder about > scipy and cython's codegen backend, > For CPython, my understanding is that extensions that access struct fields directly without accessor macros need to be changed, but not much else. There's a "backwards-compatability" PyArray_* API for CPython. That doesn't work for .NET, but neither does anything else in C extensions. So in the SciPy port to .NET there's my efforts to replace f2py with fwrap/Cython, and many SciPy C extensions will be rewritten in Cython. These will use the Npy_* interface (or backwards-compatability PyArray_* wrappers in numpy.pxd, but these only work in Cython under .NET, not in C, due to typing issues (what is "object" and so on)). Dag Sverre From ralf.gommers at googlemail.com Wed Jan 26 05:23:48 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 26 Jan 2011 18:23:48 +0800 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 12:28 PM, Mark Wiebe wrote: > On Tue, Jan 25, 2011 at 5:18 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Jan 25, 2011 at 1:13 PM, Travis Oliphant wrote: >> >>> >>> It may make sense for a NumPy 1.6 to come out in March / April in the >>> interim. >>> >>> >> Pulling out the changes to attain backward compatibility isn't getting any >> easier. I'd rather shoot for 2.0 in June. What can the rest of us do to help >> move things along? >> > > Focusing on 2.0 makes sense to me too. Besides that, March/April is bad timing for me so someone else should volunteer to be the release manager if we go for a 1.6. I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc was > the main issue, then that might be done. An ABI compatible 1.6 with the > datetime and half types should be doable, just some extensions might get > confused if they encounter arrays made with the new data types. > > Even if you fixed the ABI incompatibility (I don't know enough about the issue to confirm that), I'm not sure how much value there is in a release with as main new feature two dtypes that are not going to work well with scipy/other binaries compiled against 1.5. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Wed Jan 26 07:22:09 2011 From: e.antero.tammi at gmail.com (eat) Date: Wed, 26 Jan 2011 14:22:09 +0200 Subject: [Numpy-discussion] tril, triu, document/ implementation conflict Message-ID: Hi, I just noticed a document/ implementation conflict with tril and triu. According tril documentation it should return of same shape and data-type as called. But this is not the case at least with dtype bool. The input shape is referred as (M, N) in tril and triu, but as (N, M) in tri. Inconsistent? Also I'm not very happy with the performance, at least dtype bool can be accelerated as follows. In []: M= ones((2000, 3000), dtype= bool) In []: timeit triu(M) 10 loops, best of 3: 173 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 107 ms per loop In []: M= asarray(M, dtype= int) In []: timeit triu(M) 10 loops, best of 3: 160 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 163 ms per loop In []: M= asarray(M, dtype= float) In []: timeit triu(M) 10 loops, best of 3: 195 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 157 ms per loop I have attached a crude 'fix' incase someone is interested. Regards, eat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: twodim_base_fix.py Type: application/octet-stream Size: 3886 bytes Desc: not available URL: From josef.pktd at gmail.com Wed Jan 26 07:35:44 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 Jan 2011 07:35:44 -0500 Subject: [Numpy-discussion] tril, triu, document/ implementation conflict In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 7:22 AM, eat wrote: > Hi, > > I just noticed a document/ implementation conflict with tril and triu. > According tril documentation it should return of same shape and data-type as > called. But this is not the case at least with dtype bool. > > The input shape is referred as (M, N) in tril and triu, but as (N, M) in > tri. > Inconsistent? > > Also I'm not very happy with the performance, at least dtype bool can be > accelerated as follows. > > In []: M= ones((2000, 3000), dtype= bool) > In []: timeit triu(M) > 10 loops, best of 3: 173 ms per loop > In []: timeit triu_(M) > 10 loops, best of 3: 107 ms per loop > > In []: M= asarray(M, dtype= int) > In []: timeit triu(M) > 10 loops, best of 3: 160 ms per loop > In []: timeit triu_(M) > 10 loops, best of 3: 163 ms per loop > > In []: M= asarray(M, dtype= float) > In []: timeit triu(M) > 10 loops, best of 3: 195 ms per loop > In []: timeit triu_(M) > 10 loops, best of 3: 157 ms per loop > > I have attached a crude 'fix' incase someone is interested. You could open a ticket for this. just one comment: I don't think this is readable, especially if we only look at the source of the function with np.source out= mul(ge(so(ar(m.shape[0]), ar(m.shape[1])), -k), m) from np.source(np.tri) with numpy 1.5.1 m = greater_equal(subtract.outer(arange(N), arange(M)),-k) Josef > > Regards, > eat > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From e.antero.tammi at gmail.com Wed Jan 26 08:12:46 2011 From: e.antero.tammi at gmail.com (eat) Date: Wed, 26 Jan 2011 15:12:46 +0200 Subject: [Numpy-discussion] tril, triu, document/ implementation conflict In-Reply-To: References: Message-ID: Hi, On Wed, Jan 26, 2011 at 2:35 PM, wrote: > On Wed, Jan 26, 2011 at 7:22 AM, eat wrote: > > Hi, > > > > I just noticed a document/ implementation conflict with tril and triu. > > According tril documentation it should return of same shape and data-type > as > > called. But this is not the case at least with dtype bool. > > > > The input shape is referred as (M, N) in tril and triu, but as (N, M) in > > tri. > > Inconsistent? > Any comments about the names for rows and cols. I prefer (M, N). > > > > Also I'm not very happy with the performance, at least dtype bool can be > > accelerated as follows. > > > > In []: M= ones((2000, 3000), dtype= bool) > > In []: timeit triu(M) > > 10 loops, best of 3: 173 ms per loop > > In []: timeit triu_(M) > > 10 loops, best of 3: 107 ms per loop > > > > In []: M= asarray(M, dtype= int) > > In []: timeit triu(M) > > 10 loops, best of 3: 160 ms per loop > > In []: timeit triu_(M) > > 10 loops, best of 3: 163 ms per loop > > > > In []: M= asarray(M, dtype= float) > > In []: timeit triu(M) > > 10 loops, best of 3: 195 ms per loop > > In []: timeit triu_(M) > > 10 loops, best of 3: 157 ms per loop > > > > I have attached a crude 'fix' incase someone is interested. > > You could open a ticket for this. > > just one comment: > I don't think this is readable, especially if we only look at the > source of the function with np.source > > out= mul(ge(so(ar(m.shape[0]), ar(m.shape[1])), -k), m) > > from np.source(np.tri) with numpy 1.5.1 > m = greater_equal(subtract.outer(arange(N), arange(M)),-k) I agree, thats why I called it crude. Before opening a ticket I'll try to figure out if there exists somewhere in numpy .astype functionality, but not copying if allready proper dtype. Also I'm afraid that I can't produce sufficient testing. Regards, eat > > Josef > > > > > Regards, > > eat > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Wed Jan 26 08:40:56 2011 From: cournape at gmail.com (David Cournapeau) Date: Wed, 26 Jan 2011 22:40:56 +0900 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: <4D3FEDB0.5020700@student.matnat.uio.no> References: <4D3F734D.3000902@silveregg.co.jp> <4D3FEDB0.5020700@student.matnat.uio.no> Message-ID: On Wed, Jan 26, 2011 at 6:47 PM, Dag Sverre Seljebotn wrote: > On 01/26/2011 02:05 AM, David wrote: >> On 01/26/2011 01:42 AM, Charles R Harris wrote: >> >>> Hi All, >>> >>> Just thought it was time to start discussing a release schedule for >>> numpy 2.0 so we have something to aim at. I'm thinking sometime in the >>> period April-June might be appropriate. There is a lot coming with the >>> next release: the Enthought's numpy refactoring, Mark's float16 and >>> iterator work, and support for IronPython. How do things look to the >>> folks involved in those projects? >>> >> One thing which I was wondering about numpy 2.0: what's the story for >> the C-API compared to 1.x for extensions. Is it fundamentally different >> so that extensions will need to be rewritten ? I especially wonder about >> scipy and cython's codegen backend, >> > > For CPython, my understanding is that extensions that access struct > fields directly without accessor macros need to be changed, but not much > else. There's a "backwards-compatability" PyArray_* API for CPython. > > That doesn't work for .NET, but neither does anything else in C > extensions. So in the SciPy port to .NET there's my efforts to replace > f2py with fwrap/Cython, and many SciPy C extensions will be rewritten in > Cython. These will use the Npy_* interface (or backwards-compatability > PyArray_* wrappers in numpy.pxd, but these only work in Cython under > .NET, not in C, due to typing issues (what is "object" and so on)). Ok, good to know. A good test would be to continuously build numpy + scipy on top of it ASAP. Do you think cython (or is it sage) could donate some CPU resources on the cython CI server for numpy ? I could spend some time to make that work, cheers, David > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From scipy at SamuelJohn.de Wed Jan 26 09:08:32 2011 From: scipy at SamuelJohn.de (Samuel John) Date: Wed, 26 Jan 2011 15:08:32 +0100 Subject: [Numpy-discussion] How to tell if I succeeded to build numpy with amd, umfpack and lapack Message-ID: Hi there! I have successfully built numpy 1.5 on ubuntu lucid (32 for now). I think I got ATLAS/lapack/BLAS support, and if I > ldd linalg/lapack_lite.so I see that my libptf77blas.so etc. are successfully linked. :-) However, how to I find out, if (and where) libamd.a and libumfpack.a have been found and (statically) linked. As far as I understand, I they are not present, a fallback in pure python is used, right? Is there a recommended way, I can query against which libs numpy has been built? So I can be sure numpy uses my own compiled versions of libamd, lapack and so forth. And the fftw3 is no longer supported, I guess (even if it is still mentioned in the site.cfg.example) Bests, Samuel -- Dipl.-Inform. Samuel John - - - - - - - - - - - - - - - - - - - - - - - - - PhD student, CoR-Lab(.de) and Neuroinformatics Group, Faculty of Technology, D33594 Bielefeld in cooperation with the HONDA Research Institute Europe GmbH jabber: samueljohn at jabber.org - - - - - - - - - - - - - - - - - - - - - - - - - From bsouthey at gmail.com Wed Jan 26 11:51:02 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 26 Jan 2011 10:51:02 -0600 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: <4D4050F6.6060603@gmail.com> On 01/25/2011 10:28 PM, Mark Wiebe wrote: > On Tue, Jan 25, 2011 at 5:18 PM, Charles R Harris > > wrote: > > > > On Tue, Jan 25, 2011 at 1:13 PM, Travis Oliphant > > wrote: > > > On Jan 25, 2011, at 10:42 AM, Charles R Harris wrote: > > > Hi All, > > > > Just thought it was time to start discussing a release > schedule for numpy 2.0 so we have something to aim at. I'm > thinking sometime in the period April-June might be > appropriate. There is a lot coming with the next release: the > Enthought's numpy refactoring, Mark's float16 and iterator > work, and support for IronPython. How do things look to the > folks involved in those projects? > > > My suggestion is to do a 1.6 relatively soon, as the current trunk > feels pretty stable to me, and it would be nice to release the > features without having to go through the whole merging process. > > > I would target June / July at this point ;-) I know I > deserve a "I told you so" from Chuck --- I will take it. > > > How much remains to get done? > > > My changes probably make merging the refactor more challenging too. > > > There is a bit of work that Mark is doing that would be good > to include, also some modifications to the re-factoring that > will support better small array performance. > > > Not everything needs to go into first release as long as the > following releases are backward compatible. So the ABI needs it's > final form as soon as possible. Is it still in flux? > > > I would suggest it is - there are a number of things I think could be > improved in it, and it would be nice to bake in the underlying support > features to make lazy/deferred evaluation of array expressions possible. > > It may make sense for a NumPy 1.6 to come out in March / April > in the interim. > > > Pulling out the changes to attain backward compatibility isn't > getting any easier. I'd rather shoot for 2.0 in June. What can the > rest of us do to help move things along? > > > I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc > was the main issue, then that might be done. An ABI compatible 1.6 > with the datetime and half types should be doable, just some > extensions might get confused if they encounter arrays made with the > new data types. > > -Mark > I do understand that it may take time for the 'dust to settle' but there is the opportunity to implement aspects that may require 'significant' notification or least start the process for any appropriate changes. So, would it be possible to start developing some strategic plan of the changes that will occur? The type of things I think are in terms of: 1) Notifying/warning users of the API changes that will occur. I agree with Chuck that other 'eyes' need to see it. 2) Add any desired depreciation warnings but I do not know of any. Perhaps the files in numpy/oldnumeric and numpy/numarray - if these are still important then these should have a better home since both have not had a release since mid 2006. 3) Changes or reorganization of the namespace. My personal one is my ticket 1051 (Renaming and removing NaN and related IEEE754 special cases): http://projects.scipy.org/numpy/ticket/1051 Hopefully some of it will be applied. 4) Changes in functions. Examples: Ticket 1262 (genfromtxt: dtype should be None by default) http://projects.scipy.org/numpy/ticket/1262 Tickets 465 and 518 related to the accumulator dtype argument issues because this topic keeps appearing on the list. http://projects.scipy.org/numpy/ticket/518 http://projects.scipy.org/numpy/ticket/465 For example, perhaps changing the default arguments of mean in numpy/core/fromnumeric.py as that allows the old behavior to remain by changing the dtype argument: Change: def mean(a, axis=None, dtype=None, out=None): To: def mean(a, axis=None, dtype=float, out=None): 5) Adding any enhancement patches like median of Ticket 1213 http://projects.scipy.org/numpy/ticket/1213 Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From liuhuanjim013 at gmail.com Wed Jan 26 12:48:48 2011 From: liuhuanjim013 at gmail.com (Huan Liu) Date: Wed, 26 Jan 2011 17:48:48 +0000 (UTC) Subject: [Numpy-discussion] 3d plane to point cloud fitting using SVD References: <7f9779a7b1b4b8c8be07c5663ca74c50@mmb.pcb.ub.es> Message-ID: Hi, I just confirmed Stefan's answer on one of the examples in http://www.mathworks.co.jp/matlabcentral/newsreader/view_thread/262996 matlab: A = randn(100,2)*[2 0;3 0;-1 2]'; A = A + randn(size(A))/3; [U,S,V] = svd(A); X = V(:,end) python: from numpy import * A = random.randn(100,2)*mat([[2,3,-1],[0,0,2]]) A = A + random.randn(100,3)/3.0 u,s,vh = linalg.linalg.svd(A) v = vh.conj().transpose() print v[:,-1] It works! Thanks Peter for bringing this up and Stefan for answering! Huan From mwwiebe at gmail.com Wed Jan 26 15:10:32 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 12:10:32 -0800 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 2:23 AM, Ralf Gommers wrote: > On Wed, Jan 26, 2011 at 12:28 PM, Mark Wiebe wrote: > >> On Tue, Jan 25, 2011 at 5:18 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> On Tue, Jan 25, 2011 at 1:13 PM, Travis Oliphant >> > wrote: >>> >>>> >>>> It may make sense for a NumPy 1.6 to come out in March / April in the >>>> interim. >>>> >>>> >>> Pulling out the changes to attain backward compatibility isn't getting >>> any easier. I'd rather shoot for 2.0 in June. What can the rest of us do to >>> help move things along? >>> >> >> > Focusing on 2.0 makes sense to me too. Besides that, March/April is bad > timing for me so someone else should volunteer to be the release manager if > we go for a 1.6. > I think sooner than March/April might be a possibility. I've gotten the ABI working so this succeeds on my machine: * Build SciPy against NumPy 1.5.1 * Build NumPy trunk * Run NumPy trunk with the 1.5.1-built SciPy - all tests pass except for one (PIL image resize, which tests all float types and half lacks the precisions necessary) I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc was >> the main issue, then that might be done. An ABI compatible 1.6 with the >> datetime and half types should be doable, just some extensions might get >> confused if they encounter arrays made with the new data types. >> >> Even if you fixed the ABI incompatibility (I don't know enough about the > issue to confirm that), I'm not sure how much value there is in a release > with as main new feature two dtypes that are not going to work well with > scipy/other binaries compiled against 1.5. > I've recently gotten the faster ufunc NEP implementation finished except for generalized ufuncs, and most things work the same or faster with it. Below are some timings of 1.5.1 vs the new_iterator branch. In particular, the overhead on small arrays hasn't gotten worse, but the output memory layout speeds up some operations by a lot. To exercise the iterator a bit, and try to come up with a better approach than the generalized ufuncs, I came up with a new function, 'einsum' for the Einstein summation convention. I'll send another email about it, but it for instance solves the problem discussed here: http://mail.scipy.org/pipermail/numpy-discussion/2006-May/020506.html as "c = np.einsum('rij,rjk->rik', a, b)" -Mark The timings: In [1]: import numpy as np In [2]: np.version.version Out[2]: '1.5.1' In [3]: a = np.arange(9.).reshape(3,3); b = a.copy() In [4]: timeit a + b 100000 loops, best of 3: 3.48 us per loop In [5]: timeit 2 * a 100000 loops, best of 3: 6.07 us per loop In [6]: timeit np.sum(a) 100000 loops, best of 3: 7.19 us per loop In [7]: a = np.arange(1000000).reshape(100,100,100); b = a.copy() In [8]: timeit a + b 100 loops, best of 3: 17.1 ms per loop In [9]: a = np.arange(1920*1080*3).reshape(1080,1920,3).swapaxes(0,1) In [10]: timeit a * a 1 loops, best of 3: 794 ms per loop In [1]: import numpy as np In [2]: np.version.version Out[2]: '2.0.0.dev-c97e9d5' In [3]: a = np.arange(9.).reshape(3,3); b = a.copy() In [4]: timeit a + b 100000 loops, best of 3: 3.24 us per loop In [5]: timeit 2 * a 100000 loops, best of 3: 6.12 us per loop In [6]: timeit np.sum(a) 100000 loops, best of 3: 6.6 us per loop In [7]: a = np.arange(1000000).reshape(100,100,100); b = a.copy() In [8]: timeit a + b 100 loops, best of 3: 17 ms per loop In [9]: a = np.arange(1920*1080*3).reshape(1080,1920,3).swapaxes(0,1) In [10]: timeit a * a 10 loops, best of 3: 116 ms per loop -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jan 26 15:27:51 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 12:27:51 -0800 Subject: [Numpy-discussion] einsum Message-ID: I wrote a new function, einsum, which implements Einstein summation notation, and I'd like comments/thoughts from people who might be interested in this kind of thing. In testing it, it is also faster than many of NumPy's built-in functions, except for dot and inner. At the bottom of this email you can find the documentation blurb I wrote for it, and here are some timings: In [1]: import numpy as np In [2]: a = np.arange(25).reshape(5,5) In [3]: timeit np.einsum('ii', a) 100000 loops, best of 3: 3.45 us per loop In [4]: timeit np.trace(a) 100000 loops, best of 3: 9.8 us per loop In [5]: timeit np.einsum('ii->i', a) 1000000 loops, best of 3: 1.19 us per loop In [6]: timeit np.diag(a) 100000 loops, best of 3: 7 us per loop In [7]: b = np.arange(30).reshape(5,6) In [8]: timeit np.einsum('ij,jk', a, b) 10000 loops, best of 3: 11.4 us per loop In [9]: timeit np.dot(a, b) 100000 loops, best of 3: 2.8 us per loop In [10]: a = np.arange(10000.) In [11]: timeit np.einsum('i->', a) 10000 loops, best of 3: 22.1 us per loop In [12]: timeit np.sum(a) 10000 loops, best of 3: 25.5 us per loop -Mark The documentation: einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe') Evaluates the Einstein summation convention on the operands. Using the Einstein summation convention, many common multi-dimensional array operations can be represented in a simple fashion. This function provides a way compute such summations. The best way to understand this function is to try the examples below, which show how many common NumPy functions can be implemented as calls to einsum. The subscripts string is a comma-separated list of subscript labels, where each label refers to a dimension of the corresponding operand. Repeated subscripts labels in one operand take the diagonal. For example, ``np.einsum('ii', a)`` is equivalent to ``np.trace(a)``. Whenever a label is repeated, it is summed, so ``np.einsum('i,i', a, b)`` is equivalent to ``np.inner(a,b)``. If a label appears only once, it is not summed, so ``np.einsum('i', a)`` produces a view of ``a`` with no changes. The order of labels in the output is by default alphabetical. This means that ``np.einsum('ij', a)`` doesn't affect a 2D array, while ``np.einsum('ji', a)`` takes its transpose. The output can be controlled by specifying output subscript labels as well. This specifies the label order, and allows summing to be disallowed or forced when desired. The call ``np.einsum('i->', a)`` is equivalent to ``np.sum(a, axis=-1)``, and ``np.einsum('ii->i', a)`` is equivalent to ``np.diag(a)``. It is also possible to control how broadcasting occurs using an ellipsis. To take the trace along the first and last axes, you can do ``np.einsum('i...i', a)``, or to do a matrix-matrix product with the left-most indices instead of rightmost, you can do ``np.einsum('ij...,jk...->ik...', a, b)``. When there is only one operand, no axes are summed, and no output parameter is provided, a view into the operand is returned instead of a new array. Thus, taking the diagonal as ``np.einsum('ii->i', a)`` produces a view. Parameters ---------- subscripts : string Specifies the subscripts for summation. operands : list of array_like These are the arrays for the operation. out : None or array If provided, the calculation is done into this array. dtype : None or data type If provided, forces the calculation to use the data type specified. Note that you may have to also give a more liberal ``casting`` parameter to allow the conversions. order : 'C', 'F', 'A', or 'K' Controls the memory layout of the output. 'C' means it should be Fortran contiguous. 'F' means it should be Fortran contiguous, 'A' means it should be 'F' if the inputs are all 'F', 'C' otherwise. 'K' means it should be as close to the layout as the inputs as is possible, including arbitrarily permuted axes. casting : 'no', 'equiv', 'safe', 'same_kind', 'unsafe' Controls what kind of data casting may occur. Setting this to 'unsafe' is not recommended, as it can adversely affect accumulations. 'no' means the data types should not be cast at all. 'equiv' means only byte-order changes are allowed. 'safe' means only casts which can preserve values are allowed. 'same_kind' means only safe casts or casts within a kind, like float64 to float32, are allowed. 'unsafe' means any data conversions may be done. Returns ------- output : ndarray The calculation based on the Einstein summation convention. See Also -------- dot, inner, outer, tensordot Examples -------- >>> a = np.arange(25).reshape(5,5) >>> b = np.arange(5) >>> c = np.arange(6).reshape(2,3) >>> np.einsum('ii', a) 60 >>> np.trace(a) 60 >>> np.einsum('ii->i', a) array([ 0, 6, 12, 18, 24]) >>> np.diag(a) array([ 0, 6, 12, 18, 24]) >>> np.einsum('ij,j', a, b) array([ 30, 80, 130, 180, 230]) >>> np.dot(a, b) array([ 30, 80, 130, 180, 230]) >>> np.einsum('ji', c) array([[0, 3], [1, 4], [2, 5]]) >>> c.T array([[0, 3], [1, 4], [2, 5]]) >>> np.einsum(',', 3, c) array([[ 0, 3, 6], [ 9, 12, 15]]) >>> np.multiply(3, c) array([[ 0, 3, 6], [ 9, 12, 15]]) >>> np.einsum('i,i', b, b) 30 >>> np.inner(b,b) 30 >>> np.einsum('i,j', np.arange(2)+1, b) array([[0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]) >>> np.outer(np.arange(2)+1, b) array([[0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]) >>> np.einsum('i...->', a) array([50, 55, 60, 65, 70]) >>> np.sum(a, axis=0) array([50, 55, 60, 65, 70]) >>> a = np.arange(60.).reshape(3,4,5) >>> b = np.arange(24.).reshape(4,3,2) >>> np.einsum('ijk,jil->kl', a, b) array([[ 4400., 4730.], [ 4532., 4874.], [ 4664., 5018.], [ 4796., 5162.], [ 4928., 5306.]]) >>> np.tensordot(a,b, axes=([1,0],[0,1])) array([[ 4400., 4730.], [ 4532., 4874.], [ 4664., 5018.], [ 4796., 5162.], [ 4928., 5306.]]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josh.holbrook at gmail.com Wed Jan 26 16:36:45 2011 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 26 Jan 2011 12:36:45 -0900 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 11:27 AM, Mark Wiebe wrote: > I wrote a new function, einsum, which implements Einstein summation > notation, and I'd like comments/thoughts from people who might be interested > in this kind of thing. This sounds really cool! I've definitely considered doing something like this previously, but never really got around to seriously figuring out any sensible API. Do you have the source up somewhere? I'd love to try it out myself. --Josh From mwwiebe at gmail.com Wed Jan 26 16:48:24 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 13:48:24 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 1:36 PM, Joshua Holbrook wrote: > On Wed, Jan 26, 2011 at 11:27 AM, Mark Wiebe wrote: > > I wrote a new function, einsum, which implements Einstein summation > > notation, and I'd like comments/thoughts from people who might be > interested > > in this kind of thing. > > This sounds really cool! I've definitely considered doing something > like this previously, but never really got around to seriously > figuring out any sensible API. > > Do you have the source up somewhere? I'd love to try it out myself. > You can check out the new_iterator branch from here: https://github.com/m-paradox/numpy $ git clone https://github.com/m-paradox/numpy.git Cloning into numpy... -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From josh.holbrook at gmail.com Wed Jan 26 17:01:14 2011 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 26 Jan 2011 13:01:14 -0900 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 12:48 PM, Mark Wiebe wrote: > On Wed, Jan 26, 2011 at 1:36 PM, Joshua Holbrook > wrote: >> >> On Wed, Jan 26, 2011 at 11:27 AM, Mark Wiebe wrote: >> > I wrote a new function, einsum, which implements Einstein summation >> > notation, and I'd like comments/thoughts from people who might be >> > interested >> > in this kind of thing. >> >> This sounds really cool! I've definitely considered doing something >> like this previously, but never really got around to seriously >> figuring out any sensible API. >> >> Do you have the source up somewhere? I'd love to try it out myself. > > You can check out the new_iterator branch from here: > https://github.com/m-paradox/numpy > $ git clone https://github.com/m-paradox/numpy.git > Cloning into numpy... > -Mark > Thanks for the link! How closely coupled is this new code with numpy's internals? That is, could you factor it out into its own package? If so, then people could have immediate use out of it without having to integrate it into numpy proper. --Josh From mwwiebe at gmail.com Wed Jan 26 17:43:14 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 14:43:14 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 2:01 PM, Joshua Holbrook wrote: > > How closely coupled is this new code with numpy's internals? That is, > could you factor it out into its own package? If so, then people could > have immediate use out of it without having to integrate it into numpy > proper. The code depends heavily on the iterator I wrote, and I think the idea itself depends on having a good dynamic multi-dimensional array library. When the numpy-refactor branch is complete, this would be part of libndarray, and could be used directly from C without depending on Python. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jan 26 17:48:24 2011 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Jan 2011 16:48:24 -0600 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 16:43, Mark Wiebe wrote: > On Wed, Jan 26, 2011 at 2:01 PM, Joshua Holbrook > wrote: >> >> >> How closely coupled is this new code with numpy's internals? That is, >> could you factor it out into its own package? If so, then people could >> have immediate use out of it without having to integrate it into numpy >> proper. > > The code depends heavily on the iterator I wrote, and I think the idea > itself depends on having a good dynamic multi-dimensional array library. > ?When the numpy-refactor branch is complete, this would be part of > libndarray, and could be used directly from C without depending on Python. It think his real question is whether einsum() and the iterator stuff can live in a separate module that *uses* a released version of numpy rather than a development branch. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From josh.holbrook at gmail.com Wed Jan 26 18:05:33 2011 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 26 Jan 2011 14:05:33 -0900 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: > > It think his real question is whether einsum() and the iterator stuff > can live in a separate module that *uses* a released version of numpy > rather than a development branch. > > -- > Robert Kern > Indeed, I would like to be able to install and use einsum() without having to install another version of numpy. Even if it depends on features of a new numpy, it'd be nice to have it be a separate module. --Josh From klemm at phys.ethz.ch Wed Jan 26 18:18:30 2011 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Thu, 27 Jan 2011 00:18:30 +0100 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: Mark, interesting idea. Given the fact that in 2-d euclidean metric, the Einstein summation conventions are only a way to write out conventional matrix multiplications, do you consider at some point to include a non-euclidean metric in this thing? (As you have in special relativity, for example) Something along the lines: eta = np.diag(-1,1,1,1) a = np.array(1,2,3,4) b = np.array(1,1,1,1) such that einsum('i,i', a,b, metric=eta) = -1 + 2 + 3 + 4 I don't know how useful it would be, just a thought, Hanno Am 26.01.2011 um 21:27 schrieb Mark Wiebe: > I wrote a new function, einsum, which implements Einstein summation > notation, and I'd like comments/thoughts from people who might be > interested in this kind of thing. From gael.varoquaux at normalesup.org Wed Jan 26 18:21:06 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 27 Jan 2011 00:21:06 +0100 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: <20110126232106.GB32531@phare.normalesup.org> On Thu, Jan 27, 2011 at 12:18:30AM +0100, Hanno Klemm wrote: > interesting idea. Given the fact that in 2-d euclidean metric, the > Einstein summation conventions are only a way to write out > conventional matrix multiplications, do you consider at some point to > include a non-euclidean metric in this thing? (As you have in special > relativity, for example) In my experience, Einstein summation conventions are quite incomprehensible for people who haven't studies relativity (they aren't used much outside some narrow fields of physics). If you start adding metrics, you'll make it even harder for people to follow. My 2 cents, Ga?l From mwwiebe at gmail.com Wed Jan 26 18:21:55 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 15:21:55 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 3:05 PM, Joshua Holbrook wrote: > > > > It think his real question is whether einsum() and the iterator stuff > > can live in a separate module that *uses* a released version of numpy > > rather than a development branch. > > > > -- > > Robert Kern > > > > Indeed, I would like to be able to install and use einsum() without > having to install another version of numpy. Even if it depends on > features of a new numpy, it'd be nice to have it be a separate module. > > --Josh > Ah, sorry for misunderstanding. That would actually be very difficult, as the iterator required a fair bit of fixes and adjustments to the core. The new_iterator branch should be 1.5 ABI compatible, if that helps. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jan 26 18:29:28 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 15:29:28 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 3:18 PM, Hanno Klemm wrote: > > Mark, > > interesting idea. Given the fact that in 2-d euclidean metric, the > Einstein summation conventions are only a way to write out > conventional matrix multiplications, do you consider at some point to > include a non-euclidean metric in this thing? (As you have in special > relativity, for example) > > Something along the lines: > > eta = np.diag(-1,1,1,1) > > a = np.array(1,2,3,4) > b = np.array(1,1,1,1) > > such that > > einsum('i,i', a,b, metric=eta) = -1 + 2 + 3 + 4 > This particular example is already doable as follows: >>> eta = np.diag([-1,1,1,1]) >>> eta array([[-1, 0, 0, 0], [ 0, 1, 0, 0], [ 0, 0, 1, 0], [ 0, 0, 0, 1]]) >>> a = np.array([1,2,3,4]) >>> b = np.array([1,1,1,1]) >>> np.einsum('i,j,ij', a, b, eta) 8 I think that's right, did I understand you correctly? Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From klemm at phys.ethz.ch Wed Jan 26 18:48:44 2011 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Thu, 27 Jan 2011 00:48:44 +0100 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: <120CEF14-89D1-4750-A309-8665DE30C44B@phys.ethz.ch> Am 27.01.2011 um 00:29 schrieb Mark Wiebe: > On Wed, Jan 26, 2011 at 3:18 PM, Hanno Klemm > wrote: > > Mark, > > interesting idea. Given the fact that in 2-d euclidean metric, the > Einstein summation conventions are only a way to write out > conventional matrix multiplications, do you consider at some point to > include a non-euclidean metric in this thing? (As you have in special > relativity, for example) > > Something along the lines: > > eta = np.diag(-1,1,1,1) > > a = np.array(1,2,3,4) > b = np.array(1,1,1,1) > > such that > > einsum('i,i', a,b, metric=eta) = -1 + 2 + 3 + 4 > > This particular example is already doable as follows: > > >>> eta = np.diag([-1,1,1,1]) > >>> eta > array([[-1, 0, 0, 0], > [ 0, 1, 0, 0], > [ 0, 0, 1, 0], > [ 0, 0, 0, 1]]) > >>> a = np.array([1,2,3,4]) > >>> b = np.array([1,1,1,1]) > >>> np.einsum('i,j,ij', a, b, eta) > 8 > > I think that's right, did I understand you correctly? > > Cheers, > Mark Yes, that's what I had in mind. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Jan 26 19:35:31 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 26 Jan 2011 18:35:31 -0600 Subject: [Numpy-discussion] einsum In-Reply-To: <20110126232106.GB32531@phare.normalesup.org> References: <20110126232106.GB32531@phare.normalesup.org> Message-ID: On Wednesday, January 26, 2011, Gael Varoquaux wrote: > On Thu, Jan 27, 2011 at 12:18:30AM +0100, Hanno Klemm wrote: >> interesting idea. Given the fact that in 2-d euclidean metric, the >> Einstein summation conventions are only a way to write out >> conventional matrix multiplications, do you consider at some point to >> include a non-euclidean metric in this thing? (As you have in special >> relativity, for example) > > In my experience, Einstein summation conventions are quite > incomprehensible for people who haven't studies relativity (they aren't > used much outside some narrow fields of physics). If you start adding > metrics, you'll make it even harder for people to follow. > > My 2 cents, > > Ga?l > Just to dispel the notion that Einstein notation is only used in the study of relativity, I can personally attest that Einstein notation is used in the field of fluid dynamics and some aspects of meteorology. This is really a neat idea and I support the idea of packaging it as a separate module. Ben Root From josh.holbrook at gmail.com Wed Jan 26 20:02:40 2011 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 26 Jan 2011 16:02:40 -0900 Subject: [Numpy-discussion] einsum In-Reply-To: References: <20110126232106.GB32531@phare.normalesup.org> Message-ID: > Ah, sorry for misunderstanding. That would actually be very difficult, > as the iterator required a fair bit of fixes and adjustments to the core. > The new_iterator branch should be 1.5 ABI compatible, if that helps. I see. Perhaps the fixes and adjustments can/should be included with numpy standard, even if the Einstein notation package is made a separate module. > Just to dispel the notion that Einstein notation is only used in the > study of relativity, I can personally attest that Einstein notation is > used in the field of fluid dynamics and some aspects of meteorology. Einstein notation is also used in solid mechanics. --Josh From tjhnson at gmail.com Wed Jan 26 20:18:53 2011 From: tjhnson at gmail.com (T J) Date: Wed, 26 Jan 2011 17:18:53 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: <20110126232106.GB32531@phare.normalesup.org> Message-ID: On Wed, Jan 26, 2011 at 5:02 PM, Joshua Holbrook wrote: >> Ah, sorry for misunderstanding. ?That would actually be very difficult, >> as the iterator required a fair bit of fixes and adjustments to the core. >> The new_iterator branch should be 1.5 ABI compatible, if that helps. > > I see. Perhaps the fixes and adjustments can/should be included with > numpy standard, even if the Einstein notation package is made a > separate module. > > Indeed, I would like to be able to install and use einsum() without > having to install another version of numpy. Even if it depends on > features of a new numpy, it'd be nice to have it be a separate module. I don't really understand the desire to have this single function exist in a separate package. If it requires the new version of NumPy, then you'll have to install/upgrade either way...and if it comes as part of that new NumPy, then you are already set. Doesn't a separate package complicate things unnecessarily? It make sense to me if einsum consisted of many functions (such as Bottleneck). From josef.pktd at gmail.com Wed Jan 26 20:23:01 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 Jan 2011 20:23:01 -0500 Subject: [Numpy-discussion] einsum In-Reply-To: References: <20110126232106.GB32531@phare.normalesup.org> Message-ID: On Wed, Jan 26, 2011 at 7:35 PM, Benjamin Root wrote: > On Wednesday, January 26, 2011, Gael Varoquaux > wrote: >> On Thu, Jan 27, 2011 at 12:18:30AM +0100, Hanno Klemm wrote: >>> interesting idea. Given the fact that in 2-d euclidean metric, the >>> Einstein summation conventions are only a way to write out >>> conventional matrix multiplications, do you consider at some point to >>> include a non-euclidean metric in this thing? (As you have in special >>> relativity, for example) >> >> In my experience, Einstein summation conventions are quite >> incomprehensible for people who haven't studies relativity (they aren't >> used much outside some narrow fields of physics). If you start adding >> metrics, you'll make it even harder for people to follow. >> >> My 2 cents, >> >> Ga?l >> > > Just to dispel the notion that Einstein notation is only used in the > study of relativity, I can personally attest that Einstein notation is > used in the field of fluid dynamics and some aspects of meteorology. > This is really a neat idea and I support the idea of packaging it as a > separate module. So, if I read the examples correctly we finally get dot along an axis np.einsum('ijk,ji->', a, b) np.einsum('ijk,jik->k', a, b) or something like this. the notation might require getting used to but it doesn't look worse than figuring out what tensordot does. The only disadvantage I see, is that choosing the axes to operate on in a program or function requires string manipulation. Josef > > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jrocher at enthought.com Wed Jan 26 21:41:02 2011 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 26 Jan 2011 20:41:02 -0600 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: Nice function, and wonderful that it speeds some tasks up. some feedback: the following notation is a little counter intuitive to me: >>> np.einsum('i...->', a) array([50, 55, 60, 65, 70]) >>> np.sum(a, axis=0) array([50, 55, 60, 65, 70]) Since there is nothing after the ->, I expected a scalar not a vector. I might suggest 'i...->...' Just noticed also a typo in the doc: order : 'C', 'F', 'A', or 'K' Controls the memory layout of the output. 'C' means it should be Fortran contiguous. 'F' means it should be Fortran contiguous, should be changed to order : 'C', 'F', 'A', or 'K' Controls the memory layout of the output. 'C' means it should be C contiguous. 'F' means it should be Fortran contiguous, Hope this helps, Jonathan On Wed, Jan 26, 2011 at 2:27 PM, Mark Wiebe wrote: > I wrote a new function, einsum, which implements Einstein summation > notation, and I'd like comments/thoughts from people who might be interested > in this kind of thing. > > In testing it, it is also faster than many of NumPy's built-in functions, > except for dot and inner. At the bottom of this email you can find the > documentation blurb I wrote for it, and here are some timings: > > In [1]: import numpy as np > In [2]: a = np.arange(25).reshape(5,5) > > In [3]: timeit np.einsum('ii', a) > 100000 loops, best of 3: 3.45 us per loop > In [4]: timeit np.trace(a) > 100000 loops, best of 3: 9.8 us per loop > > In [5]: timeit np.einsum('ii->i', a) > 1000000 loops, best of 3: 1.19 us per loop > In [6]: timeit np.diag(a) > 100000 loops, best of 3: 7 us per loop > > In [7]: b = np.arange(30).reshape(5,6) > > In [8]: timeit np.einsum('ij,jk', a, b) > 10000 loops, best of 3: 11.4 us per loop > In [9]: timeit np.dot(a, b) > 100000 loops, best of 3: 2.8 us per loop > > In [10]: a = np.arange(10000.) > > In [11]: timeit np.einsum('i->', a) > 10000 loops, best of 3: 22.1 us per loop > In [12]: timeit np.sum(a) > 10000 loops, best of 3: 25.5 us per loop > > -Mark > > The documentation: > > einsum(subscripts, *operands, out=None, dtype=None, order='K', > casting='safe') > > Evaluates the Einstein summation convention on the operands. > > Using the Einstein summation convention, many common multi-dimensional > array operations can be represented in a simple fashion. This function > provides a way compute such summations. > > The best way to understand this function is to try the examples below, > which show how many common NumPy functions can be implemented as > calls to einsum. > > The subscripts string is a comma-separated list of subscript labels, > where each label refers to a dimension of the corresponding operand. > Repeated subscripts labels in one operand take the diagonal. For > example, > ``np.einsum('ii', a)`` is equivalent to ``np.trace(a)``. > > Whenever a label is repeated, it is summed, so ``np.einsum('i,i', a, > b)`` > is equivalent to ``np.inner(a,b)``. If a label appears only once, > it is not summed, so ``np.einsum('i', a)`` produces a view of ``a`` > with no changes. > > The order of labels in the output is by default alphabetical. This > means that ``np.einsum('ij', a)`` doesn't affect a 2D array, while > ``np.einsum('ji', a)`` takes its transpose. > > The output can be controlled by specifying output subscript labels > as well. This specifies the label order, and allows summing to be > disallowed or forced when desired. The call ``np.einsum('i->', a)`` > is equivalent to ``np.sum(a, axis=-1)``, and > ``np.einsum('ii->i', a)`` is equivalent to ``np.diag(a)``. > > It is also possible to control how broadcasting occurs using > an ellipsis. To take the trace along the first and last axes, > you can do ``np.einsum('i...i', a)``, or to do a matrix-matrix > product with the left-most indices instead of rightmost, you can do > ``np.einsum('ij...,jk...->ik...', a, b)``. > > When there is only one operand, no axes are summed, and no output > parameter is provided, a view into the operand is returned instead > of a new array. Thus, taking the diagonal as ``np.einsum('ii->i', a)`` > produces a view. > > Parameters > ---------- > subscripts : string > Specifies the subscripts for summation. > operands : list of array_like > These are the arrays for the operation. > out : None or array > If provided, the calculation is done into this array. > dtype : None or data type > If provided, forces the calculation to use the data type specified. > Note that you may have to also give a more liberal ``casting`` > parameter to allow the conversions. > order : 'C', 'F', 'A', or 'K' > Controls the memory layout of the output. 'C' means it should > be Fortran contiguous. 'F' means it should be Fortran contiguous, > 'A' means it should be 'F' if the inputs are all 'F', 'C' > otherwise. > 'K' means it should be as close to the layout as the inputs as > is possible, including arbitrarily permuted axes. > casting : 'no', 'equiv', 'safe', 'same_kind', 'unsafe' > Controls what kind of data casting may occur. Setting this to > 'unsafe' is not recommended, as it can adversely affect > accumulations. > 'no' means the data types should not be cast at all. 'equiv' means > only byte-order changes are allowed. 'safe' means only casts > which can preserve values are allowed. 'same_kind' means only > safe casts or casts within a kind, like float64 to float32, are > allowed. 'unsafe' means any data conversions may be done. > > Returns > ------- > output : ndarray > The calculation based on the Einstein summation convention. > > See Also > -------- > dot, inner, outer, tensordot > > > Examples > -------- > > >>> a = np.arange(25).reshape(5,5) > >>> b = np.arange(5) > >>> c = np.arange(6).reshape(2,3) > > >>> np.einsum('ii', a) > 60 > >>> np.trace(a) > 60 > > >>> np.einsum('ii->i', a) > array([ 0, 6, 12, 18, 24]) > >>> np.diag(a) > array([ 0, 6, 12, 18, 24]) > > >>> np.einsum('ij,j', a, b) > array([ 30, 80, 130, 180, 230]) > >>> np.dot(a, b) > array([ 30, 80, 130, 180, 230]) > > >>> np.einsum('ji', c) > array([[0, 3], > [1, 4], > [2, 5]]) > >>> c.T > array([[0, 3], > [1, 4], > [2, 5]]) > > >>> np.einsum(',', 3, c) > array([[ 0, 3, 6], > [ 9, 12, 15]]) > >>> np.multiply(3, c) > array([[ 0, 3, 6], > [ 9, 12, 15]]) > > >>> np.einsum('i,i', b, b) > 30 > >>> np.inner(b,b) > 30 > > >>> np.einsum('i,j', np.arange(2)+1, b) > array([[0, 1, 2, 3, 4], > [0, 2, 4, 6, 8]]) > >>> np.outer(np.arange(2)+1, b) > array([[0, 1, 2, 3, 4], > [0, 2, 4, 6, 8]]) > > >>> np.einsum('i...->', a) > array([50, 55, 60, 65, 70]) > >>> np.sum(a, axis=0) > array([50, 55, 60, 65, 70]) > > >>> a = np.arange(60.).reshape(3,4,5) > >>> b = np.arange(24.).reshape(4,3,2) > >>> np.einsum('ijk,jil->kl', a, b) > array([[ 4400., 4730.], > [ 4532., 4874.], > [ 4664., 5018.], > [ 4796., 5162.], > [ 4928., 5306.]]) > >>> np.tensordot(a,b, axes=([1,0],[0,1])) > array([[ 4400., 4730.], > [ 4532., 4874.], > [ 4664., 5018.], > [ 4796., 5162.], > [ 4928., 5306.]]) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Jonathan Rocher, Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jan 26 22:09:39 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 26 Jan 2011 20:09:39 -0700 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 1:10 PM, Mark Wiebe wrote: > On Wed, Jan 26, 2011 at 2:23 AM, Ralf Gommers > wrote: > >> On Wed, Jan 26, 2011 at 12:28 PM, Mark Wiebe wrote: >> >>> On Tue, Jan 25, 2011 at 5:18 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> On Tue, Jan 25, 2011 at 1:13 PM, Travis Oliphant < >>>> oliphant at enthought.com> wrote: >>>> >>>>> >>>>> It may make sense for a NumPy 1.6 to come out in March / April in the >>>>> interim. >>>>> >>>>> >>>> Pulling out the changes to attain backward compatibility isn't getting >>>> any easier. I'd rather shoot for 2.0 in June. What can the rest of us do to >>>> help move things along? >>>> >>> >>> >> Focusing on 2.0 makes sense to me too. Besides that, March/April is bad >> timing for me so someone else should volunteer to be the release manager if >> we go for a 1.6. >> > > I think sooner than March/April might be a possibility. I've gotten the > ABI working so this succeeds on my machine: > > If we go with a 1.6 I have some polynomial stuff I want to put in, probably a weekend or two of work, and there are tickets and pull requests to look through, so to me March-April looks like a good time. It sounds like Ralf has stuff scheduled for the rest of the spring after the scipy release. IIRC, there was at least one other person interested in managing a release when David left for Silveregg, do we have any volunteers for a 1.6? If we do go for 1.6 I would like to keep 2.0 in sight. If datetime, the new iterator, einsum, and float16 are in 1.6 then 2.0 looks more like a cleanup the library/inteface and support IronPython release and there isn't as much pressure to get it out soon. Also it is important to get the ABI right so we don't need to change it again soon and doing that might take a bit of trial and error. Does September seem reasonable? * Build SciPy against NumPy 1.5.1 > * Build NumPy trunk > * Run NumPy trunk with the 1.5.1-built SciPy - all tests pass except for > one (PIL image resize, which tests all float types and half lacks the > precisions necessary) > > I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc was >>> the main issue, then that might be done. An ABI compatible 1.6 with the >>> datetime and half types should be doable, just some extensions might get >>> confused if they encounter arrays made with the new data types. >>> >>> Even if you fixed the ABI incompatibility (I don't know enough about the >> issue to confirm that), I'm not sure how much value there is in a release >> with as main new feature two dtypes that are not going to work well with >> scipy/other binaries compiled against 1.5. >> > > I've recently gotten the faster ufunc NEP implementation finished except > for generalized ufuncs, and most things work the same or faster with > it. Below are some timings of 1.5.1 vs the new_iterator branch. In > particular, the overhead on small arrays hasn't gotten worse, but the output > memory layout speeds up some operations by a lot. > > Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jan 26 22:54:15 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 19:54:15 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: Message-ID: On Wed, Jan 26, 2011 at 6:41 PM, Jonathan Rocher wrote: > Nice function, and wonderful that it speeds some tasks up. > > some feedback: the following notation is a little counter intuitive to me: > >>> np.einsum('i...->', a) > array([50, 55, 60, 65, 70]) > >>> np.sum(a, axis=0) > array([50, 55, 60, 65, 70]) > Since there is nothing after the ->, I expected a scalar not a vector. I > might suggest 'i...->...' > Hmm, the dimension that's left is a a broadcast dimension, and the dimension labeled 'i' did go away. I suppose disallowing the empty output string and forcing a '...' is reasonable. Would disallowing broadcasting by default be a good approach? Then, einsum('ii->i', a) would only except two dimensional inputs, and you would have to specify einsum('...ii->...i', a) to get the current default behavior for it. Just noticed also a typo in the doc: > > order : 'C', 'F', 'A', or 'K' > Controls the memory layout of the output. 'C' means it should > be Fortran contiguous. 'F' means it should be Fortran contiguous, > should be changed to > order : 'C', 'F', 'A', or 'K' > Controls the memory layout of the output. 'C' means it should > be C contiguous. 'F' means it should be Fortran contiguous, > > Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jan 26 23:06:49 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 20:06:49 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: <20110126232106.GB32531@phare.normalesup.org> Message-ID: On Wed, Jan 26, 2011 at 5:23 PM, wrote: > > So, if I read the examples correctly we finally get dot along an axis > > np.einsum('ijk,ji->', a, b) > np.einsum('ijk,jik->k', a, b) > > or something like this. > > the notation might require getting used to but it doesn't look worse > than figuring out what tensordot does. > I thought of various extensions to the notation, but the idea is tricky enough as is I think. Decoding a regex-like syntax probably wouldn't help. The only disadvantage I see, is that choosing the axes to operate on > in a program or function requires string manipulation. > One possibility would be for the Python exposure to accept lists or tuples of integers. The subscript 'ii' could be [(0,0)], and 'ij,jk->ik' could be [(0,1), (1,2), (0,2)]. Internally it would convert this directly to a C-string to pass to the API function. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From josh.holbrook at gmail.com Wed Jan 26 23:29:38 2011 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 26 Jan 2011 19:29:38 -0900 Subject: [Numpy-discussion] einsum In-Reply-To: References: <20110126232106.GB32531@phare.normalesup.org> Message-ID: >> >> The only disadvantage I see, is that choosing the axes to operate on >> in a program or function requires string manipulation. > > > One possibility would be for the Python exposure to accept lists or tuples > of integers. ?The subscript 'ii' could be [(0,0)], and 'ij,jk->ik' could be > [(0,1), (1,2), (0,2)]. ?Internally it would convert this directly to a > C-string to pass to the API function. > -Mark > What if you made objects i, j, etc. such that i*j = (0, 1) and etcetera? Maybe you could generate them with something like (i, j, k) = einstein((1, 2, 3)) . Feel free to disregard me since I haven't really thought too hard about things and might not even really understand what the problem is *anyway*. I'm just trying to help brainstorm. :) --Josh From mwwiebe at gmail.com Wed Jan 26 23:45:33 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Jan 2011 20:45:33 -0800 Subject: [Numpy-discussion] einsum In-Reply-To: References: <20110126232106.GB32531@phare.normalesup.org> Message-ID: On Wed, Jan 26, 2011 at 8:29 PM, Joshua Holbrook wrote: > >> > >> The only disadvantage I see, is that choosing the axes to operate on > >> in a program or function requires string manipulation. > > > > > > One possibility would be for the Python exposure to accept lists or > tuples > > of integers. The subscript 'ii' could be [(0,0)], and 'ij,jk->ik' could > be > > [(0,1), (1,2), (0,2)]. Internally it would convert this directly to a > > C-string to pass to the API function. > > -Mark > > > > What if you made objects i, j, etc. such that i*j = (0, 1) and > etcetera? Maybe you could generate them with something like (i, j, k) > = einstein((1, 2, 3)) . > > Feel free to disregard me since I haven't really thought too hard > about things and might not even really understand what the problem is > *anyway*. I'm just trying to help brainstorm. :) > No worries. :) The problem is that someone will probably want to dynamically generate the axes to process in a loop, rather than having them hardcoded beforehand. For example, generalizing the diag function as follows. Within Python, creating lists and tuples is probably more natural. -Mark >>> def diagij(x, i, j): ... ss = "" ... so = "" ... # should error check i, j ... fill = ord('b') ... for k in range(x.ndim): ... if k in [i, j]: ... ss += 'a' ... else: ... ss += chr(fill) ... so += chr(fill) ... fill += 1 ... ss += '->' + so + 'a' ... return np.einsum(ss, x) ... >>> x = np.arange(3*3*3).reshape(3,3,3) >>> diagij(x, 0, 1) array([[ 0, 12, 24], [ 1, 13, 25], [ 2, 14, 26]]) >>> [np.diag(x[:,:,i]) for i in range(3)] [array([ 0, 12, 24]), array([ 1, 13, 25]), array([ 2, 14, 26])] >>> diagij(x, 1, 2) array([[ 0, 4, 8], [ 9, 13, 17], [18, 22, 26]]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pivanov314 at gmail.com Thu Jan 27 01:09:51 2011 From: pivanov314 at gmail.com (Paul Ivanov) Date: Wed, 26 Jan 2011 22:09:51 -0800 Subject: [Numpy-discussion] How to tell if I succeeded to build numpy with amd, umfpack and lapack In-Reply-To: References: Message-ID: <20110127060951.GB21623@ykcyc> Samuel John, on 2011-01-26 15:08, wrote: > Hi there! > > I have successfully built numpy 1.5 on ubuntu lucid (32 for now). > I think I got ATLAS/lapack/BLAS support, and if I > > ldd linalg/lapack_lite.so > I see that my libptf77blas.so etc. are successfully linked. :-) > > However, how to I find out, if (and where) libamd.a and libumfpack.a > have been found and (statically) linked. > As far as I understand, I they are not present, a fallback in pure > python is used, right? > > Is there a recommended way, I can query against which libs numpy has > been built? > So I can be sure numpy uses my own compiled versions of libamd, lapack > and so forth. Hi Samuel, take a look at numpy.show_config() and scipy.show_config() best, -- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: Digital signature URL: From scipy at SamuelJohn.de Thu Jan 27 03:43:18 2011 From: scipy at SamuelJohn.de (Samuel John) Date: Thu, 27 Jan 2011 09:43:18 +0100 Subject: [Numpy-discussion] How to tell if I succeeded to build numpy with amd, umfpack and lapack In-Reply-To: <20110127060951.GB21623@ykcyc> References: <20110127060951.GB21623@ykcyc> Message-ID: Hi Paul, thanks for your answer! I was not aware of numpy.show_config(). However, it does not say anything about libamd.a and libumfpack.a, right? How do I know if they were successfully linked (statically)? Does anybody have a clue? greetings Samuel From gnurser at gmail.com Thu Jan 27 05:04:14 2011 From: gnurser at gmail.com (George Nurser) Date: Thu, 27 Jan 2011 10:04:14 +0000 Subject: [Numpy-discussion] Einstein summation convention Message-ID: Hi Mark, I was very interested to see that you had written an implementation of the Einstein summation convention for numpy. I'd thought about this last year, and wrote some notes on what I thought might be a reasonable interface. Unfortunately I was not in a position to actually implement it myself, so I left it. But I'll set them out here in case they are useful to you, or anyone else. The discussion about the dot product earlier last year e.g. http://mail.scipy.org/pipermail/numpy-discussion/2010-April/050160.html, http://mail.scipy.org/pipermail/numpy-discussion/2010-April/050202.html and the more recent work on tensor manipulation http://mail.scipy.org/pipermail/numpy-discussion/2010-June/050945.html and named arrays, http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes suggested to me that there is a lot of interest in writing tensor and vector products more explicitly. Rather than defining a new binary operator for matrix multiplication, perhaps the most pressing need is for an easy, intuitive, notation to define the axes that are being summed over. My initial thought was that perhaps one could do something like a[:,:*]*b[:*,:] = dot(a,b) where the * would denote the axis that was being summed over. So a[:,:*]*b[:,:*] = dot(a,b.T) and perhaps allow more lables, e.g. & a[:&,:*]*b[:&,:*] = tensordot(a,b,axes=([0,1],[0,1])) I'm not sure whether this is possible to implement, though. Even if it was, I suppose it would require major changes to the python slicing mechanism. A rather more cumbersome, though more powerful idea, might be to somehow parse operations written in terms of the Einstein summation convention. So in this case a function einst might be implemented such that e.g. supposing a.shape = (M,N), b.shape = (N,P), c.shape = (M,N) d.shape = (M,), e.shape = (M,), f.shape = (M,M), g.shape = (M,N,N,P) from numpy import einst as E The dot product would be taken where the same letter is found in lowerscript and upperscript, but no sum would be taken where the same letter was in lowerscript both times. So E(d,'i',e,'I') = sum_i d_{i}*b_{i} = dot(a,b) E(a,'ik',b,'Kj') = sum_k a_{ik}*b_{kj} = dot(a,b) E(f,'ik',a,'IK') = sum_{ik} f_{ik}*a_{ik} = tensordot(a,b,axes=([0,1],[0,1])) E(f,'ki',a,'IK') = sum_{ik} f_{ki}*a_{ik} = tensordot(a,b,axes=([1,0],[0,1])) Contraction, diagonalization and outer product emerge naturally E(f,'iI') = sum_i a_{ii} = f.trace() E(f,'ii') = a_{ii} = f.diagonal() E(d,'i',e,'k') = outer(d,e) Multiplication of more than two tensors could be performed: E(d,'i',f,'Ik',e,'K') = sum_{ik} d_{i}*f_{ik}*e_{k} = dot(d,dot(f,e)) = d.dot(f.dot(e)) E(a,'ik',g,'IKlm',b,'LM') = sum_{iklm} a_{ik}*g_{iklm}*b{lm} = tensordot(a,tensordot(g,b,axes=([2,3],[0,1])),axes=([0,1],[0,1])) Multiplication term by term without summation is implied by repeated letters of the same case. Multiplication over unequal axis lengths would raise an error. E(d,'j',e,'j') = d_{j}*e_{j} = d*e E(a,'ij',c,'ij') = a_{ij}*c_{ij} = a*c E(a,'ij',f,'ij') undefined, since a.shape[0] != f.shape[0] E(f,'ij',f,'ji') = f_{ij}*f_{ji} = f*f.T Broadcasting would be explicit E(f,'ij',d,'j') =a_{ij}*d_j = a*d E(f,'ij',d,'i') =a_{ij}*d_i = a*d[:,None] If a definite order of precedence of sums were established (e.g. doing the rightmost multiplication first) dotting and term by term multiplication could be mixed E(d,'I',d,'i',e,'i') = sum_i d_i*d_i*e_i I hope this is of some use. --George Nurser From markbak at gmail.com Thu Jan 27 05:40:00 2011 From: markbak at gmail.com (Mark Bakker) Date: Thu, 27 Jan 2011 11:40:00 +0100 Subject: [Numpy-discussion] Error in tanh for large complex argument Message-ID: Hello list, When computing tanh for large complex argument I get unexpected nans: tanh works fine for large real values: In [84]: tanh(1000) Out[84]: 1.0 Not for large complex values: In [85]: tanh(1000+0j) Out[85]: (nan+nan*j) While the correct answer is: In [86]: (1.0-exp(-2.0*(1000+0j)))/(1.0+exp(-2.0*(1000+0j))) Out[86]: 1.0 Bug? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Thu Jan 27 05:55:21 2011 From: cool-rr at cool-rr.com (cool-RR) Date: Thu, 27 Jan 2011 12:55:21 +0200 Subject: [Numpy-discussion] How can I install numpy on Python 3.1 in Ubuntu? In-Reply-To: References: Message-ID: On Mon, Jan 24, 2011 at 3:23 PM, Ralf Gommers wrote: > > > On Mon, Jan 24, 2011 at 8:22 PM, cool-RR wrote: > >> Hello folks, >> >> I have Ubuntu 10.10 server on EC2. I installed Python 3.1, and now I want >> to install NumPy on it. How do I do it? I tried `easy_install-3.1 numpy` but >> got this error: >> > > Just do "python3.1 setup.py install". That's always a better idea for > numpy/scipy than trying to use easy_install. Also you need to make sure some > packages are installed first. From > http://www.scipy.org/Installing_SciPy/Linux: > sudo apt-get install build-essential python-dev swig gfortran python-nose > > Cheers, > Ralf > > Thanks, I'll give it a try. Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jan 27 06:11:54 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 27 Jan 2011 11:11:54 +0000 (UTC) Subject: [Numpy-discussion] Error in tanh for large complex argument References: Message-ID: Thu, 27 Jan 2011 11:40:00 +0100, Mark Bakker wrote: [clip] > Not for large complex values: > > In [85]: tanh(1000+0j) > Out[85]: (nan+nan*j) Yep, it's a bug. Care to file a ticket? The implementation is just sinh/cosh, which overflows. The fix is to provide an asymptotic expansion (sgn Re z), although around the imaginary axis the switch is perhaps somewhat messy to handle. OTOH, the glibc-provided C99 function doesn't fare too well either: #include #include #include int main() { complex double z = 1000; double x, y; z = ctanh(z); x = creal(z); y = cimag(z); printf("%g %g\n", x, y); return 0; } ### -> Prints 0 0 on glibc 2.12.1 From nadavh at visionsense.com Thu Jan 27 06:37:33 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 27 Jan 2011 03:37:33 -0800 Subject: [Numpy-discussion] Error in tanh for large complex argument In-Reply-To: References: , Message-ID: <26FC23E7C398A64083C980D16001012D1AD93B941F@VA3DIAXVS361.RED001.local> The C code return the right result with glibc 2.12.2 (linux 64 + gcc 4.52). However I get the same nan+nan*j with python. Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Pauli Virtanen [pav at iki.fi] Sent: 27 January 2011 13:11 To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] Error in tanh for large complex argument Thu, 27 Jan 2011 11:40:00 +0100, Mark Bakker wrote: [clip] > Not for large complex values: > > In [85]: tanh(1000+0j) > Out[85]: (nan+nan*j) Yep, it's a bug. Care to file a ticket? The implementation is just sinh/cosh, which overflows. The fix is to provide an asymptotic expansion (sgn Re z), although around the imaginary axis the switch is perhaps somewhat messy to handle. OTOH, the glibc-provided C99 function doesn't fare too well either: #include #include #include int main() { complex double z = 1000; double x, y; z = ctanh(z); x = creal(z); y = cimag(z); printf("%g %g\n", x, y); return 0; } ### -> Prints 0 0 on glibc 2.12.1 _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Thu Jan 27 06:47:24 2011 From: cournape at gmail.com (David Cournapeau) Date: Thu, 27 Jan 2011 20:47:24 +0900 Subject: [Numpy-discussion] Error in tanh for large complex argument In-Reply-To: <26FC23E7C398A64083C980D16001012D1AD93B941F@VA3DIAXVS361.RED001.local> References: <26FC23E7C398A64083C980D16001012D1AD93B941F@VA3DIAXVS361.RED001.local> Message-ID: On Thu, Jan 27, 2011 at 8:37 PM, Nadav Horesh wrote: > The C code return the right result with glibc 2.12.2 (linux 64 + gcc 4.52). Same for me on mac os x (not sure which C library it is using, the freebsd one ?) for ppc, i386 and amd64, cheers, David From ralf.gommers at googlemail.com Thu Jan 27 10:09:29 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 27 Jan 2011 23:09:29 +0800 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 11:09 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Wed, Jan 26, 2011 at 1:10 PM, Mark Wiebe wrote: > >> On Wed, Jan 26, 2011 at 2:23 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> On Wed, Jan 26, 2011 at 12:28 PM, Mark Wiebe wrote: >>> >>>> On Tue, Jan 25, 2011 at 5:18 PM, Charles R Harris < >>>> charlesr.harris at gmail.com> wrote: >>>> >>>>> On Tue, Jan 25, 2011 at 1:13 PM, Travis Oliphant < >>>>> oliphant at enthought.com> wrote: >>>>> >>>>>> >>>>>> It may make sense for a NumPy 1.6 to come out in March / April in >>>>>> the interim. >>>>>> >>>>>> >>>>> Pulling out the changes to attain backward compatibility isn't getting >>>>> any easier. I'd rather shoot for 2.0 in June. What can the rest of us do to >>>>> help move things along? >>>>> >>>> >>>> >>> Focusing on 2.0 makes sense to me too. Besides that, March/April is bad >>> timing for me so someone else should volunteer to be the release manager if >>> we go for a 1.6. >>> >> >> I think sooner than March/April might be a possibility. I've gotten the >> ABI working so this succeeds on my machine: >> >> > If we go with a 1.6 I have some polynomial stuff I want to put in, probably > a weekend or two of work, and there are tickets and pull requests to look > through, so to me March-April looks like a good time. It sounds like Ralf > has stuff scheduled for the rest of the spring after the scipy release. > IIRC, there was at least one other person interested in managing a release > when David left for Silveregg, do we have any volunteers for a 1.6? > > If we do go for 1.6 I would like to keep 2.0 in sight. If datetime, the new > iterator, einsum, and float16 are in 1.6 then 2.0 looks more like a cleanup > the library/inteface and support IronPython release and there isn't as much > pressure to get it out soon. Also it is important to get the ABI right so we > don't need to change it again soon and doing that might take a bit of trial > and error. Does September seem reasonable? > > * Build SciPy against NumPy 1.5.1 >> * Build NumPy trunk >> * Run NumPy trunk with the 1.5.1-built SciPy - all tests pass except for >> one (PIL image resize, which tests all float types and half lacks the >> precisions necessary) >> > The PIL test can still be fixed before the final 0.9.0 release, it looks like we will need another RC anyway. Does anyone have time for this in the next few days? > >> I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc was >>>> the main issue, then that might be done. An ABI compatible 1.6 with the >>>> datetime and half types should be doable, just some extensions might get >>>> confused if they encounter arrays made with the new data types. >>>> >>>> Even if you fixed the ABI incompatibility (I don't know enough about the >>> issue to confirm that), I'm not sure how much value there is in a release >>> with as main new feature two dtypes that are not going to work well with >>> scipy/other binaries compiled against 1.5. >>> >> >> I've recently gotten the faster ufunc NEP implementation finished except >> for generalized ufuncs, and most things work the same or faster with >> it. Below are some timings of 1.5.1 vs the new_iterator branch. In >> particular, the overhead on small arrays hasn't gotten worse, but the output >> memory layout speeds up some operations by a lot. >> >> Your new additions indeed look quite promising. I tried your new_iterator branch but ran into a segfault immediately on running the tests on OS X. I opened a ticket for it, to not mix it into this discussion about releases too much: http://projects.scipy.org/numpy/ticket/1724. Before we decide on a 1.6 release I would suggest to do at least the following: - review of ABI fixes by someone very familiar with the problem that occurred in 1.4.0 (David, Pauli, Charles?) - test on Linux, OS X and Windows 32-bit and 64-bit. Also with an MSVC build on Windows, since that exposes more issues each release. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Jan 27 11:17:08 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 27 Jan 2011 08:17:08 -0800 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 7:09 AM, Ralf Gommers wrote: > > The PIL test can still be fixed before the final 0.9.0 release, it looks > like we will need another RC anyway. Does anyone have time for this in the > next few days? > I've attached a patch which fixes it for me. > I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc was >>>>> the main issue, then that might be done. An ABI compatible 1.6 with the >>>>> datetime and half types should be doable, just some extensions might get >>>>> confused if they encounter arrays made with the new data types. >>>>> >>>>> Even if you fixed the ABI incompatibility (I don't know enough about >>>> the issue to confirm that), I'm not sure how much value there is in a >>>> release with as main new feature two dtypes that are not going to work well >>>> with scipy/other binaries compiled against 1.5. >>>> >>> >>> I've recently gotten the faster ufunc NEP implementation finished except >>> for generalized ufuncs, and most things work the same or faster with >>> it. Below are some timings of 1.5.1 vs the new_iterator branch. In >>> particular, the overhead on small arrays hasn't gotten worse, but the output >>> memory layout speeds up some operations by a lot. >>> >>> Your new additions indeed look quite promising. I tried your new_iterator > branch but ran into a segfault immediately on running the tests on OS X. I > opened a ticket for it, to not mix it into this discussion about releases > too much: http://projects.scipy.org/numpy/ticket/1724. > Is that a non-Intel platform? While I tried to get aligned access right, it's likely there's a bug in it somewhere. Before we decide on a 1.6 release I would suggest to do at least the > following: > - review of ABI fixes by someone very familiar with the problem that > occurred in 1.4.0 (David, Pauli, Charles?) > - test on Linux, OS X and Windows 32-bit and 64-bit. Also with an MSVC > build on Windows, since that exposes more issues each release. > All tests pass for me now, maybe it's a good time to merge the branch into the trunk so we can run it on the buildbot? -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: scipy_piltest.patch Type: text/x-patch Size: 537 bytes Desc: not available URL: From f.pollastri at inrim.it Thu Jan 27 11:50:49 2011 From: f.pollastri at inrim.it (Fabrizio Pollastri) Date: Thu, 27 Jan 2011 16:50:49 +0000 (UTC) Subject: [Numpy-discussion] sort descending with NaNs Message-ID: Hello, when one has to find a given number of highest values in an array containing NaNs, the sort function (always ascending) is uncomfortable. Since numpy >= 1.4.0 NaNs are sorted to the end, so the searched values are just before the first NaN in a unpredictable position and one has to do another search for the first NaN position. Sorting descending will solve the problem, but there is no option with numpy sort. There is any other trick to avoid this second search? TIA, Fabrizio Pollastri From charlesr.harris at gmail.com Thu Jan 27 12:10:17 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Jan 2011 10:10:17 -0700 Subject: [Numpy-discussion] sort descending with NaNs In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 9:50 AM, Fabrizio Pollastri wrote: > Hello, > > when one has to find a given number of highest values in an array > containing > NaNs, the sort function (always ascending) is uncomfortable. > > Since numpy >= 1.4.0 NaNs are sorted to the end, so the searched values are > just > before the first NaN in a unpredictable position and one has to do another > search for the first NaN position. > > Sorting descending will solve the problem, but there is no option with > numpy > sort. There is any other trick to avoid this second search? > If you just want to reverse the result, try a[::-1]. I think you may still need to find the boundaries of the nan's just to make sure they aren't included among the largest values. Searchsorted is pretty quick in any case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jan 27 12:17:54 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Jan 2011 10:17:54 -0700 Subject: [Numpy-discussion] sort descending with NaNs In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 10:10 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Jan 27, 2011 at 9:50 AM, Fabrizio Pollastri wrote: > >> Hello, >> >> when one has to find a given number of highest values in an array >> containing >> NaNs, the sort function (always ascending) is uncomfortable. >> >> Since numpy >= 1.4.0 NaNs are sorted to the end, so the searched values >> are just >> before the first NaN in a unpredictable position and one has to do another >> search for the first NaN position. >> >> Sorting descending will solve the problem, but there is no option with >> numpy >> sort. There is any other trick to avoid this second search? >> > > If you just want to reverse the result, try a[::-1]. I think you may still > need to find the boundaries of the nan's just to make sure they aren't > included among the largest values. Searchsorted is pretty quick in any case. > > To sort in descending order sort the negatives, i.e. In [1]: -sort(-array((0,1,2,3,4,nan))) Out[1]: array([ 4., 3., 2., 1., 0., nan]) I still think a.searchsorted(nan) would be faster. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jan 27 12:36:34 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Jan 2011 10:36:34 -0700 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 9:17 AM, Mark Wiebe wrote: > On Thu, Jan 27, 2011 at 7:09 AM, Ralf Gommers > wrote: > >> >> The PIL test can still be fixed before the final 0.9.0 release, it looks >> like we will need another RC anyway. Does anyone have time for this in the >> next few days? >> > > I've attached a patch which fixes it for me. > > >> I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc >>>>>> was the main issue, then that might be done. An ABI compatible 1.6 with the >>>>>> datetime and half types should be doable, just some extensions might get >>>>>> confused if they encounter arrays made with the new data types. >>>>>> >>>>>> Even if you fixed the ABI incompatibility (I don't know enough about >>>>> the issue to confirm that), I'm not sure how much value there is in a >>>>> release with as main new feature two dtypes that are not going to work well >>>>> with scipy/other binaries compiled against 1.5. >>>>> >>>> >>>> I've recently gotten the faster ufunc NEP implementation finished except >>>> for generalized ufuncs, and most things work the same or faster with >>>> it. Below are some timings of 1.5.1 vs the new_iterator branch. In >>>> particular, the overhead on small arrays hasn't gotten worse, but the output >>>> memory layout speeds up some operations by a lot. >>>> >>>> Your new additions indeed look quite promising. I tried your >> new_iterator branch but ran into a segfault immediately on running the tests >> on OS X. I opened a ticket for it, to not mix it into this discussion about >> releases too much: http://projects.scipy.org/numpy/ticket/1724. >> > > Is that a non-Intel platform? While I tried to get aligned access right, > it's likely there's a bug in it somewhere. > > Before we decide on a 1.6 release I would suggest to do at least the >> following: >> - review of ABI fixes by someone very familiar with the problem that >> occurred in 1.4.0 (David, Pauli, Charles?) >> - test on Linux, OS X and Windows 32-bit and 64-bit. Also with an MSVC >> build on Windows, since that exposes more issues each release. >> > > All tests pass for me now, maybe it's a good time to merge the branch into > the trunk so we can run it on the buildbot? > > Might be better to merge your unadulterated stuff into master, make a 1.6 branch, and add the compatibility fixes in the branch. You can test branches on the buildbot I think, at least that worked for svn, I haven't tried it with github. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dewald.pieterse at gmail.com Thu Jan 27 16:03:22 2011 From: dewald.pieterse at gmail.com (Dewald Pieterse) Date: Thu, 27 Jan 2011 16:03:22 -0500 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop Message-ID: I am processing two csv files against another, my first implementation used python list of lists and list.append to generate a new list while looping all the data including the non-relevant data (can't determine location of specific data element in a list of list). So I re-implented the exact same code but using numpy.array's (2d arrays) using numpy.where to prevent looping over an entire dataset needlessly but the numpy.array based code is about 7.6 times slower? relevant list of list code: > starttime = time.clock() > #NI_data_list room_eqp_list > NI_data_list_new = [] > for NI_row in NI_data_list: > treelevel = NI_row[0] > elevation = NI_row[1] > locater = NI_row[2] > area = NI_row[3] > NIroom = NI_row[4] > #Write appropriate equipment models and drawing into new list > if NIroom != '': > #Write appropriate equipment models and drawing into new list > for row in room_eqp_list: > eqp_room = row[0] > if len(eqp_room) == 5: > eqp_drawing = row[1] > if NIroom == eqp_room: > newrow = > [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing] > NI_data_list_new.append(newrow) > #Write appropriate piping info into the new list > for prow in unique_piping_list: > pipe_room = prow[0] > if len(pipe_room) == 5: > pipe_drawing = prow[1] > if pipe_room == NIroom: > piperow = > [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing] > NI_data_list_new.append(piperow) > #Write appropriate equipment models and drawing into new list > if (locater != '' and NIroom == ''): > #Write appropriate equipment models and drawing into new list > for row in room_eqp_list: > eqp_locater = row[0] > if len(eqp_locater) == 4: > eqp_drawing = row[1] > if locater == eqp_locater: > newrow = > [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing] > NI_data_list_new.append(newrow) > #Write appropriate piping info into the new list > for prow in unique_piping_list: > pipe_locater = prow[0] > if len(pipe_locater) == 4: > pipe_drawing = prow[1] > if pipe_locater == locater: > piperow = > [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing] > NI_data_list_new.append(piperow) > #Rewrite NI_data to new list > if NIroom == '': > NI_data_list_new.append(NI_row) > > print (time.clock()-starttime) > relevant numpy.array code: > NI_data_write_url = reports_dir + 'NI_data_room2.csv' > NI_data_list_file = open(NI_data_write_url, 'wb') > NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',', > quotechar='"') > starttime = time.clock() > #NI_data_list room_eqp_list > NI_data_list_new = numpy.array([['TreeDepth', 'Elevation', > 'BuildingLocater', 'Area', 'Room', 'Item']]) > for NI_row in NI_data_list: > treelevel = NI_row[0] > elevation = NI_row[1] > locater = NI_row[2] > area = NI_row[3] > NIroom = NI_row[4] > #Write appropriate equipment models and drawing into new array > if NIroom != '': > #Write appropriate equipment models and drawing into new array > (rowtest, columntest) = numpy.where(room_eqp_list==NIroom) > for row_iter in rowtest: > eqp_room = room_eqp_list[row_iter,0] > if len(eqp_room) == 5: > eqp_drawing = room_eqp_list[row_iter,1] > if NIroom == eqp_room: > newrow = > numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]]) > NI_data_list_new = numpy.append(NI_data_list_new, > newrow, 0) > > #Write appropriate piping info into the new array > (rowtest, columntest) = > numpy.where(unique_room_piping_list==NIroom) > for row_iter in rowtest: #unique_room_piping_list > pipe_room = unique_room_piping_list[row_iter,0] > if len(pipe_room) == 5: > pipe_drawing = unique_room_piping_list[row_iter,1] > if pipe_room == NIroom: > piperow = > numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]]) > NI_data_list_new = numpy.append(NI_data_list_new, > piperow, 0) > #Write appropriate equipment models and drawing into new array > if (locater != '' and NIroom == ''): > #Write appropriate equipment models and drawing into new array > (rowtest, columntest) = numpy.where(room_eqp_list==locater) > for row_iter in rowtest: > eqp_locater = room_eqp_list[row_iter,0] > if len(eqp_locater) == 4: > eqp_drawing = room_eqp_list[row_iter,1] > if locater == eqp_locater: > newrow = > numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]]) > NI_data_list_new = numpy.append(NI_data_list_new, > newrow, 0) > #Write appropriate piping info into the new array > (rowtest, columntest) = numpy.where(unique_room_eqp_list==locater) > for row_iter in rowtest: > pipe_locater = unique_room_piping_list[row_iter,0] > if len(pipe_locater) == 4: > pipe_drawing = unique_room_piping_list[row_iter,1] > if pipe_locater == locater: > piperow = > numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]]) > NI_data_list_new = numpy.append(NI_data_list_new, > piperow, 0) > #Rewrite NI_data to new list > if NIroom == '': > NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0) > > print (time.clock()-starttime) > some relevant output > >>> print NI_data_list_new > [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item'] > ['0' '' '1000' '' '' ''] > ['1' '' '1000' '' '' 'docname Rev 0'] > ..., > ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08'] > ['4' '6' '1164' '4' '' 'anotherdoc Rev A'] > ['0' '' '' '' '' '']] > Is numpy.append so slow? or is the culprit numpy.where? Dewald Pieterse "A democracy is nothing more than mob rule, where fifty-one percent of the people take away the rights of the other forty-nine." ~ Thomas Jefferson -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Jan 27 16:19:43 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 27 Jan 2011 13:19:43 -0800 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: References: Message-ID: <4D41E16F.5060505@noaa.gov> On 1/27/11 1:03 PM, Dewald Pieterse wrote: > I am processing two csv files against another, my first implementation > used python list of lists and list.append to generate a new list while > looping all the data including the non-relevant data (can't determine > location of specific data element in a list of list). So I re-implented > the exact same code but using numpy.array's (2d arrays) using > numpy.where to prevent looping over an entire dataset needlessly but the > numpy.array based code is about 7.6 times slower? Didn't look at your code in any detail, but: numpy arrays are not designed to be re-sizable, so numpy.append actually creates a new array, and copies the old to the new, along with the new stuff. It's a convenience function, but it means you are re-allocating and copying all your data with each call. python lists, on the other hand, are designed to be re-sizable, so they pre-allocate extra room, so that appending can be fast. In general, the recommended solution in this sort of situation is to build up your data in a python list, then convert it to an array. If I'm right about what you're doing you could keep the "rows" as numpy arrays, but put them in a list while building it up. Also, a numpy array of strings isn't necessarily a great dats structure for this kind of data. YOu might want to look at structured arrays. I wrote an appendable numpy array class a while back, to address this. It has some advantages, though, as it it written, not as much as you'd think. It does have some benifits for structured arrays, though. Code enclosed -Chris > relevant list of list code: > > starttime = time.clock() > #NI_data_list room_eqp_list > NI_data_list_new = [] > for NI_row in NI_data_list: > treelevel = NI_row[0] > elevation = NI_row[1] > locater = NI_row[2] > area = NI_row[3] > NIroom = NI_row[4] > #Write appropriate equipment models and drawing into new list > if NIroom != '': > #Write appropriate equipment models and drawing into new list > for row in room_eqp_list: > eqp_room = row[0] > if len(eqp_room) == 5: > eqp_drawing = row[1] > if NIroom == eqp_room: > newrow = > [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing] > NI_data_list_new.append(newrow) > #Write appropriate piping info into the new list > for prow in unique_piping_list: > pipe_room = prow[0] > if len(pipe_room) == 5: > pipe_drawing = prow[1] > if pipe_room == NIroom: > piperow = > [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing] > NI_data_list_new.append(piperow) > #Write appropriate equipment models and drawing into new list > if (locater != '' and NIroom == ''): > #Write appropriate equipment models and drawing into new list > for row in room_eqp_list: > eqp_locater = row[0] > if len(eqp_locater) == 4: > eqp_drawing = row[1] > if locater == eqp_locater: > newrow = > [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing] > NI_data_list_new.append(newrow) > #Write appropriate piping info into the new list > for prow in unique_piping_list: > pipe_locater = prow[0] > if len(pipe_locater) == 4: > pipe_drawing = prow[1] > if pipe_locater == locater: > piperow = > [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing] > NI_data_list_new.append(piperow) > #Rewrite NI_data to new list > if NIroom == '': > NI_data_list_new.append(NI_row) > > print (time.clock()-starttime) > > > relevant numpy.array code: > > NI_data_write_url = reports_dir + 'NI_data_room2.csv' > NI_data_list_file = open(NI_data_write_url, 'wb') > NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',', > quotechar='"') > starttime = time.clock() > #NI_data_list room_eqp_list > NI_data_list_new = numpy.array([['TreeDepth', 'Elevation', > 'BuildingLocater', 'Area', 'Room', 'Item']]) > for NI_row in NI_data_list: > treelevel = NI_row[0] > elevation = NI_row[1] > locater = NI_row[2] > area = NI_row[3] > NIroom = NI_row[4] > #Write appropriate equipment models and drawing into new array > if NIroom != '': > #Write appropriate equipment models and drawing into new array > (rowtest, columntest) = numpy.where(room_eqp_list==NIroom) > for row_iter in rowtest: > eqp_room = room_eqp_list[row_iter,0] > if len(eqp_room) == 5: > eqp_drawing = room_eqp_list[row_iter,1] > if NIroom == eqp_room: > newrow = > numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]]) > NI_data_list_new = > numpy.append(NI_data_list_new, newrow, 0) > > #Write appropriate piping info into the new array > (rowtest, columntest) = > numpy.where(unique_room_piping_list==NIroom) > for row_iter in rowtest: #unique_room_piping_list > pipe_room = unique_room_piping_list[row_iter,0] > if len(pipe_room) == 5: > pipe_drawing = unique_room_piping_list[row_iter,1] > if pipe_room == NIroom: > piperow = > numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]]) > NI_data_list_new = > numpy.append(NI_data_list_new, piperow, 0) > #Write appropriate equipment models and drawing into new array > if (locater != '' and NIroom == ''): > #Write appropriate equipment models and drawing into new array > (rowtest, columntest) = numpy.where(room_eqp_list==locater) > for row_iter in rowtest: > eqp_locater = room_eqp_list[row_iter,0] > if len(eqp_locater) == 4: > eqp_drawing = room_eqp_list[row_iter,1] > if locater == eqp_locater: > newrow = > numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]]) > NI_data_list_new = > numpy.append(NI_data_list_new, newrow, 0) > #Write appropriate piping info into the new array > (rowtest, columntest) = > numpy.where(unique_room_eqp_list==locater) > for row_iter in rowtest: > pipe_locater = unique_room_piping_list[row_iter,0] > if len(pipe_locater) == 4: > pipe_drawing = unique_room_piping_list[row_iter,1] > if pipe_locater == locater: > piperow = > numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]]) > NI_data_list_new = > numpy.append(NI_data_list_new, piperow, 0) > #Rewrite NI_data to new list > if NIroom == '': > NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0) > > print (time.clock()-starttime) > > > some relevant output > > >>> print NI_data_list_new > [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item'] > ['0' '' '1000' '' '' ''] > ['1' '' '1000' '' '' 'docname Rev 0'] > ..., > ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08'] > ['4' '6' '1164' '4' '' 'anotherdoc Rev A'] > ['0' '' '' '' '' '']] > > > Is numpy.append so slow? or is the culprit numpy.where? > > Dewald Pieterse > > "A democracy is nothing more than mob rule, where fifty-one percent of > the people take away the rights of the other forty-nine." ~ Thomas Jefferson > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: Accumulator.zip Type: application/zip Size: 4703 bytes Desc: not available URL: From mwwiebe at gmail.com Thu Jan 27 12:56:50 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 27 Jan 2011 09:56:50 -0800 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 9:36 AM, Charles R Harris wrote: > >> All tests pass for me now, maybe it's a good time to merge the branch into >> the trunk so we can run it on the buildbot? >> >> > Might be better to merge your unadulterated stuff into master, make a 1.6 > branch, and add the compatibility fixes in the branch. You can test branches > on the buildbot I think, at least that worked for svn, I haven't tried it > with github. > I'm inclined to put the ABI fixes in trunk as well for the time being. The two changes of note, moving the 'cast' array to the end of PyArray_ArrFuncs and making 'flags' in PyArray_Descr bigger, can be reapplied if the 2.0 refactor ends up needing them. I think for 2.0, more extensive future-proofing will be desirable anyway, so trunk may as well be ABI compatible until it's clear what changes are necessary. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From dewald.pieterse at gmail.com Thu Jan 27 16:33:43 2011 From: dewald.pieterse at gmail.com (Dewald Pieterse) Date: Thu, 27 Jan 2011 16:33:43 -0500 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: <4D41E16F.5060505@noaa.gov> References: <4D41E16F.5060505@noaa.gov> Message-ID: On Thu, Jan 27, 2011 at 4:19 PM, Christopher Barker wrote: > On 1/27/11 1:03 PM, Dewald Pieterse wrote: > >> I am processing two csv files against another, my first implementation >> used python list of lists and list.append to generate a new list while >> looping all the data including the non-relevant data (can't determine >> location of specific data element in a list of list). So I re-implented >> the exact same code but using numpy.array's (2d arrays) using >> numpy.where to prevent looping over an entire dataset needlessly but the >> numpy.array based code is about 7.6 times slower? >> > > Didn't look at your code in any detail, but: > > numpy arrays are not designed to be re-sizable, so numpy.append actually > creates a new array, and copies the old to the new, along with the new > stuff. It's a convenience function, but it means you are re-allocating and > copying all your data with each call. > > python lists, on the other hand, are designed to be re-sizable, so they > pre-allocate extra room, so that appending can be fast. > > In general, the recommended solution in this sort of situation is to build > up your data in a python list, then convert it to an array. > > If I'm right about what you're doing you could keep the "rows" as numpy > arrays, but put them in a list while building it up. > Thanks Chris, I believe this is the problem then, I can continue to use the arrays as reference data but build list instead, the only reason I used the arrays was to be able to use numpy.where, I just use both data types, best of both worlds. As I already have row arrays I will do a build a list or arrays. > Also, a numpy array of strings isn't necessarily a great dats structure for > this kind of data. YOu might want to look at structured arrays. > Atm, I use : comit_eqp_reader = csv.reader(comit_eqp_file, delimiter=',', quotechar='"') comit_eqp_lt = numpy.array([[col for col in row] for row in comit_eqp_reader]) to setup the arrays, I will look at using structured arrays > > I wrote an appendable numpy array class a while back, to address this. It > has some advantages, though, as it it written, not as much as you'd think. > It does have some benifits for structured arrays, though. > > > Code enclosed > > -Chris > > > > relevant list of list code: >> >> starttime = time.clock() >> #NI_data_list room_eqp_list >> NI_data_list_new = [] >> for NI_row in NI_data_list: >> treelevel = NI_row[0] >> elevation = NI_row[1] >> locater = NI_row[2] >> area = NI_row[3] >> NIroom = NI_row[4] >> #Write appropriate equipment models and drawing into new list >> if NIroom != '': >> #Write appropriate equipment models and drawing into new list >> for row in room_eqp_list: >> eqp_room = row[0] >> if len(eqp_room) == 5: >> eqp_drawing = row[1] >> if NIroom == eqp_room: >> newrow = >> [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing] >> NI_data_list_new.append(newrow) >> #Write appropriate piping info into the new list >> for prow in unique_piping_list: >> pipe_room = prow[0] >> if len(pipe_room) == 5: >> pipe_drawing = prow[1] >> if pipe_room == NIroom: >> piperow = >> [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing] >> NI_data_list_new.append(piperow) >> #Write appropriate equipment models and drawing into new list >> if (locater != '' and NIroom == ''): >> #Write appropriate equipment models and drawing into new list >> for row in room_eqp_list: >> eqp_locater = row[0] >> if len(eqp_locater) == 4: >> eqp_drawing = row[1] >> if locater == eqp_locater: >> newrow = >> [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing] >> NI_data_list_new.append(newrow) >> #Write appropriate piping info into the new list >> for prow in unique_piping_list: >> pipe_locater = prow[0] >> if len(pipe_locater) == 4: >> pipe_drawing = prow[1] >> if pipe_locater == locater: >> piperow = >> [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing] >> NI_data_list_new.append(piperow) >> #Rewrite NI_data to new list >> if NIroom == '': >> NI_data_list_new.append(NI_row) >> >> print (time.clock()-starttime) >> >> >> relevant numpy.array code: >> >> NI_data_write_url = reports_dir + 'NI_data_room2.csv' >> NI_data_list_file = open(NI_data_write_url, 'wb') >> NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',', >> quotechar='"') >> starttime = time.clock() >> #NI_data_list room_eqp_list >> NI_data_list_new = numpy.array([['TreeDepth', 'Elevation', >> 'BuildingLocater', 'Area', 'Room', 'Item']]) >> for NI_row in NI_data_list: >> treelevel = NI_row[0] >> elevation = NI_row[1] >> locater = NI_row[2] >> area = NI_row[3] >> NIroom = NI_row[4] >> #Write appropriate equipment models and drawing into new array >> if NIroom != '': >> #Write appropriate equipment models and drawing into new array >> (rowtest, columntest) = numpy.where(room_eqp_list==NIroom) >> for row_iter in rowtest: >> eqp_room = room_eqp_list[row_iter,0] >> if len(eqp_room) == 5: >> eqp_drawing = room_eqp_list[row_iter,1] >> if NIroom == eqp_room: >> newrow = >> >> numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]]) >> NI_data_list_new = >> numpy.append(NI_data_list_new, newrow, 0) >> >> #Write appropriate piping info into the new array >> (rowtest, columntest) = >> numpy.where(unique_room_piping_list==NIroom) >> for row_iter in rowtest: #unique_room_piping_list >> pipe_room = unique_room_piping_list[row_iter,0] >> if len(pipe_room) == 5: >> pipe_drawing = unique_room_piping_list[row_iter,1] >> if pipe_room == NIroom: >> piperow = >> >> numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]]) >> NI_data_list_new = >> numpy.append(NI_data_list_new, piperow, 0) >> #Write appropriate equipment models and drawing into new array >> if (locater != '' and NIroom == ''): >> #Write appropriate equipment models and drawing into new array >> (rowtest, columntest) = numpy.where(room_eqp_list==locater) >> for row_iter in rowtest: >> eqp_locater = room_eqp_list[row_iter,0] >> if len(eqp_locater) == 4: >> eqp_drawing = room_eqp_list[row_iter,1] >> if locater == eqp_locater: >> newrow = >> >> numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]]) >> NI_data_list_new = >> numpy.append(NI_data_list_new, newrow, 0) >> #Write appropriate piping info into the new array >> (rowtest, columntest) = >> numpy.where(unique_room_eqp_list==locater) >> for row_iter in rowtest: >> pipe_locater = unique_room_piping_list[row_iter,0] >> if len(pipe_locater) == 4: >> pipe_drawing = unique_room_piping_list[row_iter,1] >> if pipe_locater == locater: >> piperow = >> >> numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]]) >> NI_data_list_new = >> numpy.append(NI_data_list_new, piperow, 0) >> #Rewrite NI_data to new list >> if NIroom == '': >> NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0) >> >> print (time.clock()-starttime) >> >> >> some relevant output >> >> >>> print NI_data_list_new >> [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item'] >> ['0' '' '1000' '' '' ''] >> ['1' '' '1000' '' '' 'docname Rev 0'] >> ..., >> ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08'] >> ['4' '6' '1164' '4' '' 'anotherdoc Rev A'] >> ['0' '' '' '' '' '']] >> >> >> Is numpy.append so slow? or is the culprit numpy.where? >> >> Dewald Pieterse >> >> "A democracy is nothing more than mob rule, where fifty-one percent of >> the people take away the rights of the other forty-nine." ~ Thomas >> Jefferson >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Dewald Pieterse "A democracy is nothing more than mob rule, where fifty-one percent of the people take away the rights of the other forty-nine." ~ Thomas Jefferson -------------- next part -------------- An HTML attachment was scrubbed... URL: From dewald.pieterse at gmail.com Thu Jan 27 16:47:43 2011 From: dewald.pieterse at gmail.com (Dewald Pieterse) Date: Thu, 27 Jan 2011 16:47:43 -0500 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: References: <4D41E16F.5060505@noaa.gov> Message-ID: On Thu, Jan 27, 2011 at 4:33 PM, Dewald Pieterse wrote: > > > On Thu, Jan 27, 2011 at 4:19 PM, Christopher Barker > wrote: > >> On 1/27/11 1:03 PM, Dewald Pieterse wrote: >> >>> I am processing two csv files against another, my first implementation >>> used python list of lists and list.append to generate a new list while >>> looping all the data including the non-relevant data (can't determine >>> location of specific data element in a list of list). So I re-implented >>> the exact same code but using numpy.array's (2d arrays) using >>> numpy.where to prevent looping over an entire dataset needlessly but the >>> numpy.array based code is about 7.6 times slower? >>> >> >> Didn't look at your code in any detail, but: >> >> numpy arrays are not designed to be re-sizable, so numpy.append actually >> creates a new array, and copies the old to the new, along with the new >> stuff. It's a convenience function, but it means you are re-allocating and >> copying all your data with each call. >> >> python lists, on the other hand, are designed to be re-sizable, so they >> pre-allocate extra room, so that appending can be fast. >> >> In general, the recommended solution in this sort of situation is to build >> up your data in a python list, then convert it to an array. >> >> If I'm right about what you're doing you could keep the "rows" as numpy >> arrays, but put them in a list while building it up. >> > > Thanks Chris, I believe this is the problem then, I can continue to use the > arrays as reference data but build list instead, the only reason I used the > arrays was to be able to use numpy.where, I just use both data types, best > of both worlds. As I already have row arrays I will do a build a list or > arrays. > Now my code is nearly 4 times faster than the list of lists implementation! Wonderful, thanks. > >> Also, a numpy array of strings isn't necessarily a great dats structure >> for this kind of data. YOu might want to look at structured arrays. >> > > Atm, I use : > comit_eqp_reader = csv.reader(comit_eqp_file, delimiter=',', quotechar='"') > comit_eqp_lt = numpy.array([[col for col in row] for row in > comit_eqp_reader]) > to setup the arrays, I will look at using structured arrays > >> >> I wrote an appendable numpy array class a while back, to address this. It >> has some advantages, though, as it it written, not as much as you'd think. >> It does have some benifits for structured arrays, though. >> >> >> Code enclosed >> >> -Chris >> >> >> >> relevant list of list code: >>> >>> starttime = time.clock() >>> #NI_data_list room_eqp_list >>> NI_data_list_new = [] >>> for NI_row in NI_data_list: >>> treelevel = NI_row[0] >>> elevation = NI_row[1] >>> locater = NI_row[2] >>> area = NI_row[3] >>> NIroom = NI_row[4] >>> #Write appropriate equipment models and drawing into new list >>> if NIroom != '': >>> #Write appropriate equipment models and drawing into new list >>> for row in room_eqp_list: >>> eqp_room = row[0] >>> if len(eqp_room) == 5: >>> eqp_drawing = row[1] >>> if NIroom == eqp_room: >>> newrow = >>> [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing] >>> NI_data_list_new.append(newrow) >>> #Write appropriate piping info into the new list >>> for prow in unique_piping_list: >>> pipe_room = prow[0] >>> if len(pipe_room) == 5: >>> pipe_drawing = prow[1] >>> if pipe_room == NIroom: >>> piperow = >>> [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing] >>> NI_data_list_new.append(piperow) >>> #Write appropriate equipment models and drawing into new list >>> if (locater != '' and NIroom == ''): >>> #Write appropriate equipment models and drawing into new list >>> for row in room_eqp_list: >>> eqp_locater = row[0] >>> if len(eqp_locater) == 4: >>> eqp_drawing = row[1] >>> if locater == eqp_locater: >>> newrow = >>> [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing] >>> NI_data_list_new.append(newrow) >>> #Write appropriate piping info into the new list >>> for prow in unique_piping_list: >>> pipe_locater = prow[0] >>> if len(pipe_locater) == 4: >>> pipe_drawing = prow[1] >>> if pipe_locater == locater: >>> piperow = >>> [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing] >>> NI_data_list_new.append(piperow) >>> #Rewrite NI_data to new list >>> if NIroom == '': >>> NI_data_list_new.append(NI_row) >>> >>> print (time.clock()-starttime) >>> >>> >>> relevant numpy.array code: >>> >>> NI_data_write_url = reports_dir + 'NI_data_room2.csv' >>> NI_data_list_file = open(NI_data_write_url, 'wb') >>> NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',', >>> quotechar='"') >>> starttime = time.clock() >>> #NI_data_list room_eqp_list >>> NI_data_list_new = numpy.array([['TreeDepth', 'Elevation', >>> 'BuildingLocater', 'Area', 'Room', 'Item']]) >>> for NI_row in NI_data_list: >>> treelevel = NI_row[0] >>> elevation = NI_row[1] >>> locater = NI_row[2] >>> area = NI_row[3] >>> NIroom = NI_row[4] >>> #Write appropriate equipment models and drawing into new array >>> if NIroom != '': >>> #Write appropriate equipment models and drawing into new >>> array >>> (rowtest, columntest) = numpy.where(room_eqp_list==NIroom) >>> for row_iter in rowtest: >>> eqp_room = room_eqp_list[row_iter,0] >>> if len(eqp_room) == 5: >>> eqp_drawing = room_eqp_list[row_iter,1] >>> if NIroom == eqp_room: >>> newrow = >>> >>> numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]]) >>> NI_data_list_new = >>> numpy.append(NI_data_list_new, newrow, 0) >>> >>> #Write appropriate piping info into the new array >>> (rowtest, columntest) = >>> numpy.where(unique_room_piping_list==NIroom) >>> for row_iter in rowtest: #unique_room_piping_list >>> pipe_room = unique_room_piping_list[row_iter,0] >>> if len(pipe_room) == 5: >>> pipe_drawing = unique_room_piping_list[row_iter,1] >>> if pipe_room == NIroom: >>> piperow = >>> >>> numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]]) >>> NI_data_list_new = >>> numpy.append(NI_data_list_new, piperow, 0) >>> #Write appropriate equipment models and drawing into new array >>> if (locater != '' and NIroom == ''): >>> #Write appropriate equipment models and drawing into new >>> array >>> (rowtest, columntest) = numpy.where(room_eqp_list==locater) >>> for row_iter in rowtest: >>> eqp_locater = room_eqp_list[row_iter,0] >>> if len(eqp_locater) == 4: >>> eqp_drawing = room_eqp_list[row_iter,1] >>> if locater == eqp_locater: >>> newrow = >>> >>> numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]]) >>> NI_data_list_new = >>> numpy.append(NI_data_list_new, newrow, 0) >>> #Write appropriate piping info into the new array >>> (rowtest, columntest) = >>> numpy.where(unique_room_eqp_list==locater) >>> for row_iter in rowtest: >>> pipe_locater = unique_room_piping_list[row_iter,0] >>> if len(pipe_locater) == 4: >>> pipe_drawing = unique_room_piping_list[row_iter,1] >>> if pipe_locater == locater: >>> piperow = >>> >>> numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]]) >>> NI_data_list_new = >>> numpy.append(NI_data_list_new, piperow, 0) >>> #Rewrite NI_data to new list >>> if NIroom == '': >>> NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0) >>> >>> print (time.clock()-starttime) >>> >>> >>> some relevant output >>> >>> >>> print NI_data_list_new >>> [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item'] >>> ['0' '' '1000' '' '' ''] >>> ['1' '' '1000' '' '' 'docname Rev 0'] >>> ..., >>> ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08'] >>> ['4' '6' '1164' '4' '' 'anotherdoc Rev A'] >>> ['0' '' '' '' '' '']] >>> >>> >>> Is numpy.append so slow? or is the culprit numpy.where? >>> >>> Dewald Pieterse >>> >>> "A democracy is nothing more than mob rule, where fifty-one percent of >>> the people take away the rights of the other forty-nine." ~ Thomas >>> Jefferson >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> -- >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Dewald Pieterse > > "A democracy is nothing more than mob rule, where fifty-one percent of the > people take away the rights of the other forty-nine." ~ Thomas Jefferson > -- Dewald Pieterse "A democracy is nothing more than mob rule, where fifty-one percent of the people take away the rights of the other forty-nine." ~ Thomas Jefferson -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Jan 27 16:47:22 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 27 Jan 2011 22:47:22 +0100 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: References: Message-ID: <4D41E7EA.6080603@molden.no> Den 27.01.2011 22:03, skrev Dewald Pieterse: > Is numpy.append so slow? or is the culprit numpy.where? Please observe that appending to a Python list is amortized O(1), whereas appending to a numpy array is O(N**2). Sturla From sturla at molden.no Thu Jan 27 16:53:40 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 27 Jan 2011 22:53:40 +0100 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: <4D41E7EA.6080603@molden.no> References: <4D41E7EA.6080603@molden.no> Message-ID: <4D41E964.5030100@molden.no> Den 27.01.2011 22:47, skrev Sturla Molden: > > Please observe that appending to a Python list is amortized O(1), > whereas appending to a numpy array is O(N**2). > Sorry, one append to a numpy array is O(N). But N appends are O(N) for lists and O(N*N) for arrays. S.M. From sturla at molden.no Thu Jan 27 17:47:47 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 27 Jan 2011 23:47:47 +0100 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: References: <4D3F3AEA.9050600@grinta.net> <4D3F401A.3060203@grinta.net> Message-ID: <4D41F613.9080700@molden.no> Den 25.01.2011 23:21, skrev Jonathan Rocher: > Actually I believe the version does matter: I have seen a C version of > num rec that doesn't contain all the algorithmic part but only the > codes. I cannot remember exactly which ones are the light versions. If > I had to guess, the F90 is also a light version and that's why I > bought the F77 book. The F90 version is meant to be read in conjunction with the F77 version, not alone. It is very useful for NumPy programmers, as it is one of few text books that deals with vectorisation of algorithms. (F90 is an array-oriented language like Matlab and NumPy.) It is also the NR version with the "cleanest" source code examples. NR in C uses a nasty (and illegal) hack to get base-1 arrays in C. It is also notorious for numerically unstable code, and should never have been published. That is why the authors later published "NR in C++" to rescue their image. NR's third edition is utterly atrocious. It uses C++ OOP for code obfuscation, such as inheritance and functors (objects instead of functions), which is not instructive at all in explaining "numerical methods". They also play with methods and inheritance in structs, not just classes, which can confuse readers not knowing the dusty corners of C++. The text is also messier to read, less organized, and some of it is bady written compared to previous editions. But the scope is more extensive. It has many valuable details that should have been covered in previous versions, but it is presented in a way that makes me barf. Also beware of common NR pitfalls like unstable SVD, slow FFTs, bad PRNGs, etc. Always use proper libraries like LAPACK, BLAS, FFTW, et al. NR code is just for inspiration. :-) Sturla From sturla at molden.no Thu Jan 27 17:57:48 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 27 Jan 2011 23:57:48 +0100 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: <4D41F613.9080700@molden.no> References: <4D3F3AEA.9050600@grinta.net> <4D3F401A.3060203@grinta.net> <4D41F613.9080700@molden.no> Message-ID: <4D41F86C.40507@molden.no> Den 27.01.2011 23:47, skrev Sturla Molden: > The F90 version is meant to be read in conjunction with the F77 version, > not alone. It is very useful for NumPy programmers, as it is one of few > text books that deals with vectorisation of algorithms. (F90 is an > array-oriented language like Matlab and NumPy.) It is also the NR > version with the "cleanest" source code examples. BTW, they are available here: http://www.nrbook.com/a/bookfpdf.php http://www.nrbook.com/a/bookf90pdf.php Sturla From oliphant at enthought.com Thu Jan 27 18:15:13 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 27 Jan 2011 17:15:13 -0600 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: <0A6387EF-1227-46C0-BB60-C5AA7A024D68@enthought.com> My $0.02 on the NumPy 2.0 schedule: NumPy 2.0 is for ABI-incompatible changes like datetime support, and .NET support. It would be ideal, if at the same time we could future-proof the ABI some-what so that future changes can be made in an ABI-compatible way. I also think it would be a good idea to incorporate Mark's small-array improvements into the C-structure of NumPy arrays. If Mark has time to work on this, I have some hope we can get there. I have been wanting to propose a "generator array" for some time now, but have not had time to write it up. I have the outline of a design that overlaps but I think generalizes Mark's deferred arrays. Mark's deferred arrays would be a particular realization of the generator array, but other realizations are possible as well. There is much that has to be fleshed out for it to really work, and I think it will have to be in NumPy 2.0 because it will create ABI changes. I don't have the time to personally implement the design. If there are others out there that have the time, I would love to talk with them about it. However, I don't want to distract from this scheduling thread to discuss the ideas (I will post something else for that). The reason for a NumPy 1.6 suggestion, is that Mark (and others it would seem) have additional work and features that do not need to wait for the NumPy 2.0 ABI design to finalize in order to get out there. If someone is willing to manage the release of NumPy 1.6, then it sounds like a great idea to me. -Travis Basically the reason for On Jan 27, 2011, at 11:56 AM, Mark Wiebe wrote: > On Thu, Jan 27, 2011 at 9:36 AM, Charles R Harris wrote: > > All tests pass for me now, maybe it's a good time to merge the branch into the trunk so we can run it on the buildbot? > > > Might be better to merge your unadulterated stuff into master, make a 1.6 branch, and add the compatibility fixes in the branch. You can test branches on the buildbot I think, at least that worked for svn, I haven't tried it with github. > > I'm inclined to put the ABI fixes in trunk as well for the time being. The two changes of note, moving the 'cast' array to the end of PyArray_ArrFuncs and making 'flags' in PyArray_Descr bigger, can be reapplied if the 2.0 refactor ends up needing them. I think for 2.0, more extensive future-proofing will be desirable anyway, so trunk may as well be ABI compatible until it's clear what changes are necessary. > > -Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Thu Jan 27 18:17:20 2011 From: teoliphant at gmail.com (Travis Oliphant) Date: Thu, 27 Jan 2011 17:17:20 -0600 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list Message-ID: Hey all, What is the thought about having two separate NumPy lists (one for development discussions and one for user discussions)? -Travis From robert.kern at gmail.com Thu Jan 27 18:23:00 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 27 Jan 2011 17:23:00 -0600 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 17:17, Travis Oliphant wrote: > > Hey all, > > What is the thought about having two separate NumPy lists (one for development discussions and one for user discussions)? We've resisted it for years. I don't think the split has done scipy much good. But that may just be my perspective because I'm subscribed to both and filter them both to the same folder. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From chanley at stsci.edu Thu Jan 27 18:24:42 2011 From: chanley at stsci.edu (Christopher Hanley) Date: Thu, 27 Jan 2011 18:24:42 -0500 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 6:23 PM, Robert Kern wrote: > On Thu, Jan 27, 2011 at 17:17, Travis Oliphant > wrote: > > > > Hey all, > > > > What is the thought about having two separate NumPy lists (one for > development discussions and one for user discussions)? > > We've resisted it for years. I don't think the split has done scipy > much good. But that may just be my perspective because I'm subscribed > to both and filter them both to the same folder. > I do the same as Robert. I don't see much value in creating separate lists. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jan 27 18:32:22 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Jan 2011 16:32:22 -0700 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: References: Message-ID: On Thu, Jan 27, 2011 at 4:23 PM, Robert Kern wrote: > On Thu, Jan 27, 2011 at 17:17, Travis Oliphant > wrote: > > > > Hey all, > > > > What is the thought about having two separate NumPy lists (one for > development discussions and one for user discussions)? > > We've resisted it for years. I don't think the split has done scipy > much good. But that may just be my perspective because I'm subscribed > to both and filter them both to the same folder. > > Me too. I don't think there is so much traffic that a distinction needs to be made. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Jan 27 18:33:36 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 27 Jan 2011 15:33:36 -0800 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: <4D41E964.5030100@molden.no> References: <4D41E7EA.6080603@molden.no> <4D41E964.5030100@molden.no> Message-ID: <4D4200D0.7070007@noaa.gov> On 1/27/11 1:53 PM, Sturla Molden wrote: > But N appends are O(N) for lists and O(N*N) for arrays. hmmm - that doesn't seem quite right -- lists still have to re-allocate and copy, they just do it every n times (where n grows with the list), so I wouldn't expect exactly O(N). But you never know 'till you profile. See the enclosed code and figures. Interestingly both appear to be pretty linear, though the constant is Much larger for numpy arrays. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: append_time.py Type: application/x-python Size: 1006 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: append_timing.png Type: image/png Size: 40428 bytes Desc: not available URL: From oliphant at enthought.com Thu Jan 27 18:35:38 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 27 Jan 2011 17:35:38 -0600 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: References: Message-ID: <6C754EFA-AD24-4AE1-86F7-EFD4321A5F71@enthought.com> I think for me, the trouble is I don't have time to read all the messages, but I want to see developer-centric discussions. Sometimes, I can tell that from the subject (but I miss it). I agree that traffic is probably not too heavy at this point (but it does create some difficulty in keeping up). I know we have resisted it for years. I appreciate the comments. -Travis On Jan 27, 2011, at 5:32 PM, Charles R Harris wrote: > > > On Thu, Jan 27, 2011 at 4:23 PM, Robert Kern wrote: > On Thu, Jan 27, 2011 at 17:17, Travis Oliphant wrote: > > > > Hey all, > > > > What is the thought about having two separate NumPy lists (one for development discussions and one for user discussions)? > > We've resisted it for years. I don't think the split has done scipy > much good. But that may just be my perspective because I'm subscribed > to both and filter them both to the same folder. > > > Me too. I don't think there is so much traffic that a distinction needs to be made. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Jan 27 18:54:03 2011 From: sturla at molden.no (Sturla Molden) Date: Fri, 28 Jan 2011 00:54:03 +0100 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: <4D4200D0.7070007@noaa.gov> References: <4D41E7EA.6080603@molden.no> <4D41E964.5030100@molden.no> <4D4200D0.7070007@noaa.gov> Message-ID: <4D42059B.5080106@molden.no> Den 28.01.2011 00:33, skrev Christopher Barker: > > hmmm - that doesn't seem quite right -- lists still have to > re-allocate and copy, they just do it every n times (where n grows > with the list), so I wouldn't expect exactly O(N). Lists allocate empty slots at their back, proportional to their size. So as lists grows, re-allocations become rarer and rarer. Then on average the complexity per append becomes O(1), which is the "amortised" complexity. Appending N items to a list thus has the amortized complexity O(N). The advantage of this implementation over linked lists is that indexing will be O(1) as well. NumPy arrays are designed to be fixed size, and not designed to amortize the complexity of appends. So if you want to use arrays as efficient re-sizeable containers, you must code this logic yourself. Sturla From oliphant at enthought.com Thu Jan 27 19:01:27 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 27 Jan 2011 18:01:27 -0600 Subject: [Numpy-discussion] Generator arrays Message-ID: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> Just to start the conversation, and to find out who is interested, I would like to informally propose generator arrays for NumPy 2.0. This concept has as one use-case, the deferred arrays that Mark Wiebe has proposed. But, it also allows for "compressed arrays", on-the-fly computed arrays, and streamed or generated arrays. Basically, the modification I would like to make is to have an array flag (MEMORY) that when set means that the data attribute of a numpy array is a pointer to the address in memory where the data begins with the strides attribute pointing to a C-array of integers (in other words, all current arrays are MEMORY arrays) But, when the MEMORY flag is not set, the data attribute instead points to a length-2 C-array of pointers to functions [read(N, output_address, self->index_iter, self->extra), write(N, input_address, self->index_iter, self->extra)] Either of these could then be NULL (i.e. if write is NULL, then the array must be read-only). When the MEMORY flag is not set, the strides member of the ndarray structure is a pointer to the index_iter object (which could be anything that the particular read and write methods need it to be). The array structure should also get a member to hold the "extra" argument (which would hold any state that the array needed to hold on to in order to correctly perform the read or write operations --- i.e. it could hold an execution graph for deferred evaluation). The index_iter structure is anything that the read and write methods need to correctly identify *where* to write. Now, clearly, we could combine index_iter and extra into just one "structure" that holds all needed state for read and write to work correctly. The reason I propose two slots is because at least mentally in the use case of having these structures be calculation graphs, one of these structures is involved in "computing the location to read/write" and the other is involved in "computing what to read/write" The idea is fairly simple, but with some very interesting potential features: * lazy evaluation (of indexing, ufuncs, etc.) * fancy indexing as views instead of copies (really just another example of lazy evaluation) * compressed arrays * generated arrays (from computation or streamed data) * infinite arrays * computed arrays * missing-data arrays * ragged arrays (shape would be the bounding box --- which makes me think of ragged arrays as examples of masked arrays). * arrays that view PIL data. One could build an array with a (logically) infinite number of elements (we could use -2 in the shape tuple to indicate that). We don't need examples of all of these features for NumPy 2.0 to be released, because to really make this useful, we would need to modify all "calculation" code to produce a NON MEMORY array. What to do here still needs a lot of thought and experimentation. But, I can think about a situation where all NumPy calculations that produce arrays provide the option that when they are done inside of a particular context, a user-supplied behavior over-rides the default return. I want to study what Mark is proposing and understand his new iterator at a deeper level before providing more thoughts here. That's the gist of what I am thinking about. I would love feedback and comments. The other things I would like to see in NumPy 2.0 that have not been discussed lately (that could affect the ABI) are: * a geometry member to the data structure (that allows labels to dimensions and axes to be provided -- ala data_array) * small array performance improvements that Mark Wiebe has suggested (including the addition of an optional low-level loop that is used when you have contiguous data) * completed datetime implementation * pointer data-types (i.e. the memory location holds a pointer to another part of an ndarray) --- very useful for "join" - type arrays If anybody is interested in helping with any of these (and has time to do it, let me know). Some of this I could fund (especially if you are willing to come to Austin and be an intern for Enthought). Best regards, -Travis P.S. I hope to have more time this year to hang-out here on the numpy-discussion list (but we will see....) From sturla at molden.no Thu Jan 27 19:07:40 2011 From: sturla at molden.no (Sturla Molden) Date: Fri, 28 Jan 2011 01:07:40 +0100 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: References: Message-ID: <4D4208CC.60607@molden.no> Den 28.01.2011 00:23, skrev Robert Kern: > We've resisted it for years. I don't think the split has done scipy > much good. The scope of NumPy is narrower development-wise and wider user-wise. While SciPy does not benefit, as use and development are still quite entangled, this is not be the case for NumPy. Perhaps we could split the NumPy list and merge the SciPy lists? Sturla From brennan.williams at visualreservoir.com Thu Jan 27 19:12:40 2011 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 28 Jan 2011 13:12:40 +1300 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: <4D4208CC.60607@molden.no> References: <4D4208CC.60607@molden.no> Message-ID: <4D4209F8.4080901@visualreservoir.com> On 28/01/2011 1:07 p.m., Sturla Molden wrote: > Den 28.01.2011 00:23, skrev Robert Kern: >> We've resisted it for years. I don't think the split has done scipy >> much good. > The scope of NumPy is narrower development-wise and wider user-wise. > While SciPy does not benefit, as use and development are still quite > entangled, this is not be the case for NumPy. > > Perhaps we could split the NumPy list and merge the SciPy lists? As a user of NumPy and SciPy (and there are probably a lot of people who use both) why not have one developer list for both NumPy and SciPy and one user list for both NumPy and SciPy? It might not work from a developer point of view but I think it does from a user point of view - mind you I just put everything of interest from both mailing lists into one folder. Brennan From jlhouchin at gmail.com Thu Jan 27 19:34:48 2011 From: jlhouchin at gmail.com (Jimmie Houchin) Date: Thu, 27 Jan 2011 18:34:48 -0600 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: <6C754EFA-AD24-4AE1-86F7-EFD4321A5F71@enthought.com> References: <6C754EFA-AD24-4AE1-86F7-EFD4321A5F71@enthought.com> Message-ID: On 1/27/2011 5:35 PM, Travis Oliphant wrote: > I think for me, the trouble is I don't have time to read all the > messages, but I want to see developer-centric discussions. Sometimes, I > can tell that from the subject (but I miss it). > > I agree that traffic is probably not too heavy at this point (but it > does create some difficulty in keeping up). > > I know we have resisted it for years. I appreciate the comments. > > -Travis Maybe a convention can voluntarily be adopted using a DEV or some such subject prefix by the author when an author submits such a developer oriented message. It wouldn't be perfect but could possibly aid in scanning subjects of interest when pressed for time if the convention became sufficiently adopted. Possibly such a convention could be put into an FAQ on the website about the mailing list. Also included in a welcome message to subscribers. Just a thought. Jimmie From charlesr.harris at gmail.com Thu Jan 27 19:37:00 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Jan 2011 17:37:00 -0700 Subject: [Numpy-discussion] Generator arrays In-Reply-To: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> Message-ID: On Thu, Jan 27, 2011 at 5:01 PM, Travis Oliphant wrote: > > Just to start the conversation, and to find out who is interested, I would > like to informally propose generator arrays for NumPy 2.0. This concept > has as one use-case, the deferred arrays that Mark Wiebe has proposed. But, > it also allows for "compressed arrays", on-the-fly computed arrays, and > streamed or generated arrays. > > Basically, the modification I would like to make is to have an array flag > (MEMORY) that when set means that the data attribute of a numpy array is a > pointer to the address in memory where the data begins with the strides > attribute pointing to a C-array of integers (in other words, all current > arrays are MEMORY arrays) > > But, when the MEMORY flag is not set, the data attribute instead points to > a length-2 C-array of pointers to functions > > [read(N, output_address, self->index_iter, self->extra), write(N, > input_address, self->index_iter, self->extra)] > > Either of these could then be NULL (i.e. if write is NULL, then the array > must be read-only). > > When the MEMORY flag is not set, the strides member of the ndarray > structure is a pointer to the index_iter object (which could be anything > that the particular read and write methods need it to be). > > The array structure should also get a member to hold the "extra" argument > (which would hold any state that the array needed to hold on to in order to > correctly perform the read or write operations --- i.e. it could hold an > execution graph for deferred evaluation). > > The index_iter structure is anything that the read and write methods need > to correctly identify *where* to write. Now, clearly, we could combine > index_iter and extra into just one "structure" that holds all needed state > for read and write to work correctly. The reason I propose two slots is > because at least mentally in the use case of having these structures be > calculation graphs, one of these structures is involved in "computing the > location to read/write" and the other is involved in "computing what to > read/write" > > The idea is fairly simple, but with some very interesting potential > features: > > * lazy evaluation (of indexing, ufuncs, etc.) > * fancy indexing as views instead of copies (really just another > example of lazy evaluation) > * compressed arrays > * generated arrays (from computation or streamed data) > * infinite arrays > * computed arrays > * missing-data arrays > * ragged arrays (shape would be the bounding box --- which makes me > think of ragged arrays as examples of masked arrays). > * arrays that view PIL data. > > One could build an array with a (logically) infinite number of elements (we > could use -2 in the shape tuple to indicate that). > > We don't need examples of all of these features for NumPy 2.0 to be > released, because to really make this useful, we would need to modify all > "calculation" code to produce a NON MEMORY array. What to do here still > needs a lot of thought and experimentation. > > But, I can think about a situation where all NumPy calculations that > produce arrays provide the option that when they are done inside of a > particular context, a user-supplied behavior over-rides the default return. > I want to study what Mark is proposing and understand his new iterator at > a deeper level before providing more thoughts here. > > That's the gist of what I am thinking about. I would love feedback and > comments. > > The other things I would like to see in NumPy 2.0 that have not been > discussed lately (that could affect the ABI) are: > > * a geometry member to the data structure (that allows labels to > dimensions and axes to be provided -- ala data_array) > * small array performance improvements that Mark Wiebe has suggested > (including the addition of an optional low-level loop that is used when you > have contiguous data) > * completed datetime implementation > * pointer data-types (i.e. the memory location holds a pointer to > another part of an ndarray) --- very useful for "join" - type arrays > > If anybody is interested in helping with any of these (and has time to do > it, let me know). Some of this I could fund (especially if you are willing > to come to Austin and be an intern for Enthought). > > Best regards, > > I'd kind of like to keep arrays simple, they are already pretty complex objects. Perhaps a higher level interface to lower level objects with a common API would be an easier way to go, that way functionality could be added piecewise as the need arises. I think would be good to stick to need driven additions as otherwise it is easy to get sucked into the quagmire of trying to design for every need and eventuality and projects like that never finish. What happens to the buffer API/persistence with all those additions? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Jan 27 19:42:40 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 27 Jan 2011 16:42:40 -0800 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: <6C754EFA-AD24-4AE1-86F7-EFD4321A5F71@enthought.com> References: <6C754EFA-AD24-4AE1-86F7-EFD4321A5F71@enthought.com> Message-ID: <4D421100.5040503@noaa.gov> On 1/27/11 3:35 PM, Travis Oliphant wrote: >>> What is the thought about having two separate NumPy lists (one for development discussions and one for user discussions)? Speaking as someone who hasn't contributed code to numpy itself, I still really like to follow the development discussion, so I'll subscribe to both lists anyway, and filter them to the same place in my email. So it makes little difference to me. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Jan 27 19:46:13 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 27 Jan 2011 16:46:13 -0800 Subject: [Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop In-Reply-To: <4D42059B.5080106@molden.no> References: <4D41E7EA.6080603@molden.no> <4D41E964.5030100@molden.no> <4D4200D0.7070007@noaa.gov> <4D42059B.5080106@molden.no> Message-ID: <4D4211D5.3060704@noaa.gov> On 1/27/11 3:54 PM, Sturla Molden wrote: > Lists allocate empty slots at their back, proportional to their size. So > as lists grows, re-allocations become rarer and rarer. Then on average > the complexity per append becomes O(1), which is the "amortised" > complexity. Appending N items to a list thus has the amortized > complexity O(N). I think I get that now... > NumPy arrays are designed to be fixed size, and not designed to amortize > the complexity of appends. So if you want to use arrays as efficient > re-sizeable containers, you must code this logic yourself. And I do get that. And yet, experimentally, appending numpy arrays (on that one simple example) appeared to be O(N). Granted, a much larger constant that for lists, but it sure looks linear to me. Should it be O(N^2)? Maybe I need to run it for larger N , but I got impatient as it is. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ben.root at ou.edu Thu Jan 27 22:02:07 2011 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 27 Jan 2011 21:02:07 -0600 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: <4D421100.5040503@noaa.gov> References: <6C754EFA-AD24-4AE1-86F7-EFD4321A5F71@enthought.com> <4D421100.5040503@noaa.gov> Message-ID: On Thursday, January 27, 2011, Christopher Barker wrote: > On 1/27/11 3:35 PM, Travis Oliphant wrote: > >>>> What is the thought about having two separate NumPy lists (one for development discussions and one for user discussions)? > > Speaking as someone who hasn't contributed code to numpy itself, I still > really like to follow the development discussion, so I'll subscribe to > both lists anyway, and filter them to the same place in my email. So it > makes little difference to me. > > -Chris > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Personally, I like separating the two lists and I think it has served matplotlib well. In gmail, I can tag them separately and with different colors. It makes it very easy to spot which emails I feel like handling at the moment. But, as I have been told, I am weird... Ben Root From bsouthey at gmail.com Thu Jan 27 22:23:32 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 27 Jan 2011 21:23:32 -0600 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: References: <6C754EFA-AD24-4AE1-86F7-EFD4321A5F71@enthought.com> <4D421100.5040503@noaa.gov> Message-ID: On Thu, Jan 27, 2011 at 9:02 PM, Benjamin Root wrote: > On Thursday, January 27, 2011, Christopher Barker wrote: >> On 1/27/11 3:35 PM, Travis Oliphant wrote: >> >>>>> What is the thought about having two separate NumPy lists (one for development discussions and one for user discussions)? >> >> Speaking as someone who hasn't contributed code to numpy itself, I still >> really like to follow the development discussion, so I'll subscribe to >> both lists anyway, and filter them to the same place in my email. So it >> makes little difference to me. >> >> -Chris >> >> -- >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice >> 7600 Sand Point Way NE ? (206) 526-6329 ? fax >> Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception >> >> Chris.Barker at noaa.gov >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Personally, I like separating the two lists and I think it has served > matplotlib well. ?In gmail, I can tag them separately and with > different colors. ?It makes it very easy to spot which emails I feel > like handling at the moment. > > But, as I have been told, I am weird... > > Ben Root > You bring up a good point, for those people that only use numpy or scipy then it makes sense to have separate lists. I have no problem with the lists as they are now. I also tag them differently so I tend to focus first on numpy then scipy-dev and finally scipy-user. For the most part people do post to the correct list so usually there is no confusion. Usually scipy-user tends to be very different from the other two but I do not see many major numpy threads that would really be the same as scipy-user. Also I think scipy-user has it's own vibe as well as scipy-dev being different than numpy list. Bruce From charlesr.harris at gmail.com Thu Jan 27 22:46:22 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Jan 2011 20:46:22 -0700 Subject: [Numpy-discussion] Should we make the master branch backward compatible. Message-ID: Hi All, Mark Wiebe has proposed making the master branch backward compatible with 1.5. The argument for doing this is that 1) removing the new bits for new releases is a chore as the refactor schedule slips and 2) the new ABI isn't settled and keeping the current code in won't help with the merge. Mark thinks it is possible to keep the datetime types along with the new half types while restoring backward compatibility, and if so we won't lose anything by making the change. I'm in favor of this change, but I may have overlooked something. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Jan 27 23:58:12 2011 From: cournape at gmail.com (David Cournapeau) Date: Fri, 28 Jan 2011 13:58:12 +0900 Subject: [Numpy-discussion] Should we make the master branch backward compatible. In-Reply-To: References: Message-ID: On Fri, Jan 28, 2011 at 12:46 PM, Charles R Harris wrote: > Hi All, > > Mark Wiebe has proposed making the master branch backward compatible with > 1.5. The argument for doing this is that 1) removing the new bits for new > releases is a chore as the refactor schedule slips and 2) the new ABI isn't > settled and keeping the current code in won't help with the merge. Mark > thinks it is possible to keep the datetime types along with the new half > types while restoring backward compatibility, and if so we won't lose > anything by making the change. I'm in favor of this change, but I may have > overlooked something. Thoughts? I would be in favor too, but having not being able to code much in numpy the last few months, my opinion should not carry too much weight. I don't know how many people install numpy from github nowadays (which are the first "victims" when ABI breaks) cheers, David From oliphant at enthought.com Fri Jan 28 01:37:16 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Fri, 28 Jan 2011 00:37:16 -0600 Subject: [Numpy-discussion] Generator arrays In-Reply-To: References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> Message-ID: <57B6A8C8-7480-4B90-A45F-D28AE4B5D151@enthought.com> > > What happens to the buffer API/persistence with all those additions? I understand the desire to keep things simple, which is why I am only proposing a rather small change to the array object with *huge* implications --- encompassing the very cool deferred arrays that Mark Wiebe is proposing. As Einstein said, "everything should be as simple as possilbe, *but not simpler*". While now arrays have a data-pointer that always points to memory and an accompanying strides array, all I'm suggesting is that they allow for "indirect" or "computed arrays" in a fairly simple, but general-purpose way. Generators have been such a huge feature in Python, I really think we need to figure out how to have "generated arrays" in NumPy as well --- and it turns out to have huge features that right now are difficult with NumPy (including deferred evaluation). I guess it's debatable how complex the array object is. I actually see the array object itself as quite simple even with the changes. What is complicated is how calculations are done and scattered in an ad hoc fashion between ufuncs and other array functions. I like the idea of unifying the calculation framework using ideas like Mark's iterators and the generic functions that were added earlier to ufuncs. I don't like the data-types holding on to the "calculation structures". I think all calculations in NumPy should fit under a common rubric. To me this would be an important part of any change. Obviously the buffer API could only be implemented for MEMORY arrays (other arrays would raise an error). What to do with persistence is a good question, but resolvable I think. Initially, I would also raise an error for trying to pickle arrays that are not MEMORY arrays --- simply calling "copy" on an array gives you something that can be persisted. Having this kind of functionality on the base NumPy object would be transformational for NumPy use. Yes, you could do similar things with other approaches, but there is a lot of benefit of having a powerful fundamental object that is a shared-place to mange the expression of data calculations. Another approach is to introduce another object as you suggest which is the "generator array". This could work, especially if there were hooks in the calculation engine that allowed it to be produced by array operations (say in an appropriate context as described before). My main conerns are that in practice having a whole slew of different "array objects" (i.e. masked arrays, data arrays, labeled arrays, etc.) tends to cause code to be much bulkier to read in-practice (as you are doing a lot of conversions back and forth to take advantage of APIs that require one array or another. Having code that is written to a single object is unifying and really assists with code re-use and code readability. One of the things I see happening is a tool like Cython being used to generate the call-graphs or read-write functions that are being proposed. I could be convinced, though, that leaving array objects alone and creating a better calculation object (i.e. something like an array vector machine) embracing and extending ufuncs is a better way to go. But, I haven't seen that proposal. -Travis From markbak at gmail.com Fri Jan 28 05:25:19 2011 From: markbak at gmail.com (Mark Bakker) Date: Fri, 28 Jan 2011 11:25:19 +0100 Subject: [Numpy-discussion] Error in tanh for large complex argument Message-ID: I'll file a ticket. Incidentally, if tanh(z) is simply programmed as (1.0 - exp(-2.0*z)) / (1.0 + exp(-2.0*z)) the problem is fixed. Thanks, Mark [clip] > > Not for large complex values: > > > > In [85]: tanh(1000+0j) > > Out[85]: (nan+nan*j) > > Yep, it's a bug. Care to file a ticket? > > The implementation is just sinh/cosh, which overflows. > The fix is to provide an asymptotic expansion (sgn Re z), > although around the imaginary axis the switch is perhaps > somewhat messy to handle. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Jan 28 05:26:27 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 28 Jan 2011 10:26:27 +0000 (UTC) Subject: [Numpy-discussion] Should we make the master branch backward compatible. References: Message-ID: Thu, 27 Jan 2011 20:46:22 -0700, Charles R Harris wrote: > Mark Wiebe has proposed making the master branch backward compatible > with 1.5. The argument for doing this is that 1) removing the new bits > for new releases is a chore as the refactor schedule slips and 2) the > new ABI isn't settled and keeping the current code in won't help with > the merge. Mark thinks it is possible to keep the datetime types along > with the new half types while restoring backward compatibility, and if > so we won't lose anything by making the change. I'm in favor of this > change, but I may have overlooked something. Thoughts? +1 from me, if that is possible. Pauli From pav at iki.fi Fri Jan 28 05:30:00 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 28 Jan 2011 10:30:00 +0000 (UTC) Subject: [Numpy-discussion] Error in tanh for large complex argument References: Message-ID: Fri, 28 Jan 2011 11:25:19 +0100, Mark Bakker wrote: > I'll file a ticket. > > Incidentally, if tanh(z) is simply programmed as > > (1.0 - exp(-2.0*z)) / (1.0 + exp(-2.0*z)) This will overflow as z -> -\infty. The solution is probably to use a different expression for Re(z) < 0, and to check how other libraries do this in case the above still misses something. Pauli From markbak at gmail.com Fri Jan 28 05:45:09 2011 From: markbak at gmail.com (Mark Bakker) Date: Fri, 28 Jan 2011 11:45:09 +0100 Subject: [Numpy-discussion] Error in tanh for large complex argument Message-ID: Good point, so we need a better solution that fixes all cases >> I'll file a ticket. >> >> Incidentally, if tanh(z) is simply programmed as >> >> (1.0 - exp(-2.0*z)) / (1.0 + exp(-2.0*z)) >This will overflow as z -> -\infty. The solution is probably to use a >different expression for Re(z) < 0, and to check how other libraries do >this in case the above still misses something. > > Pauli -------------- next part -------------- An HTML attachment was scrubbed... URL: From markbak at gmail.com Fri Jan 28 05:49:34 2011 From: markbak at gmail.com (Mark Bakker) Date: Fri, 28 Jan 2011 11:49:34 +0100 Subject: [Numpy-discussion] incorrect behavior when complex number with zero imaginary part is multiplied by inf Message-ID: When I multiply a complex number with inf, I get inf + inf*j: In [17]: inf * (1+1j) Out[17]: (inf+inf*j) Even when the imaginary part is really small: In [18]: inf * (1+1e-100j) Out[18]: (inf+inf*j) Yet when the imaginary part is zero (and it really is a real number), the imaginary part is nan: In [19]: inf * (1+0j) Out[19]: (inf+nan*j) That is not correct. It should really given (inf+0*j). (I know where it comes from, inf*0 is not defined, but in this case it is, as 1+0j is really a real number and inf is by definition real as well). If there is consensus I can file a ticket. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Fri Jan 28 06:37:33 2011 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 28 Jan 2011 12:37:33 +0100 Subject: [Numpy-discussion] Generator arrays In-Reply-To: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> Message-ID: <4D42AA7D.1000304@student.matnat.uio.no> On 01/28/2011 01:01 AM, Travis Oliphant wrote: > Just to start the conversation, and to find out who is interested, I would like to informally propose generator arrays for NumPy 2.0. This concept has as one use-case, the deferred arrays that Mark Wiebe has proposed. But, it also allows for "compressed arrays", on-the-fly computed arrays, and streamed or generated arrays. > > Basically, the modification I would like to make is to have an array flag (MEMORY) that when set means that the data attribute of a numpy array is a pointer to the address in memory where the data begins with the strides attribute pointing to a C-array of integers (in other words, all current arrays are MEMORY arrays) > > But, when the MEMORY flag is not set, the data attribute instead points to a length-2 C-array of pointers to functions > > [read(N, output_address, self->index_iter, self->extra), write(N, input_address, self->index_iter, self->extra)] > > Either of these could then be NULL (i.e. if write is NULL, then the array must be read-only). > > When the MEMORY flag is not set, the strides member of the ndarray structure is a pointer to the index_iter object (which could be anything that the particular read and write methods need it to be). > > The array structure should also get a member to hold the "extra" argument (which would hold any state that the array needed to hold on to in order to correctly perform the read or write operations --- i.e. it could hold an execution graph for deferred evaluation). > > The index_iter structure is anything that the read and write methods need to correctly identify *where* to write. Now, clearly, we could combine index_iter and extra into just one "structure" that holds all needed state for read and write to work correctly. The reason I propose two slots is because at least mentally in the use case of having these structures be calculation graphs, one of these structures is involved in "computing the location to read/write" and the other is involved in "computing what to read/write" > > The idea is fairly simple, but with some very interesting potential features: > > * lazy evaluation (of indexing, ufuncs, etc.) > * fancy indexing as views instead of copies (really just another example of lazy evaluation) > * compressed arrays > * generated arrays (from computation or streamed data) > * infinite arrays > * computed arrays > * missing-data arrays > * ragged arrays (shape would be the bounding box --- which makes me think of ragged arrays as examples of masked arrays). > * arrays that view PIL data. > > One could build an array with a (logically) infinite number of elements (we could use -2 in the shape tuple to indicate that). > > We don't need examples of all of these features for NumPy 2.0 to be released, because to really make this useful, we would need to modify all "calculation" code to produce a NON MEMORY array. What to do here still needs a lot of thought and experimentation. > > But, I can think about a situation where all NumPy calculations that produce arrays provide the option that when they are done inside of a particular context, a user-supplied behavior over-rides the default return. I want to study what Mark is proposing and understand his new iterator at a deeper level before providing more thoughts here. > > That's the gist of what I am thinking about. I would love feedback and comments. > I guess my reaction is along the lines of Charles': Why can't "a + b", where a and b are NumPy arrays, simply return an object of a different type that is lazily evaluated? Why can't infinite arrays simply be yet another type? Of course, much useful functionality should then be refactored into a new "abstract array" class, and iterators etc. be given an API that works with more than one type. A special-case flag and function pointers seems a bit like reinventing OO to me, and OO is already provided by Python. Dag Sverre From dagss at student.matnat.uio.no Fri Jan 28 06:43:24 2011 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 28 Jan 2011 12:43:24 +0100 Subject: [Numpy-discussion] Generator arrays In-Reply-To: <4D42AA7D.1000304@student.matnat.uio.no> References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> <4D42AA7D.1000304@student.matnat.uio.no> Message-ID: <4D42ABDC.9060508@student.matnat.uio.no> On 01/28/2011 12:37 PM, Dag Sverre Seljebotn wrote: > On 01/28/2011 01:01 AM, Travis Oliphant wrote: > >> Just to start the conversation, and to find out who is interested, I would like to informally propose generator arrays for NumPy 2.0. This concept has as one use-case, the deferred arrays that Mark Wiebe has proposed. But, it also allows for "compressed arrays", on-the-fly computed arrays, and streamed or generated arrays. >> >> Basically, the modification I would like to make is to have an array flag (MEMORY) that when set means that the data attribute of a numpy array is a pointer to the address in memory where the data begins with the strides attribute pointing to a C-array of integers (in other words, all current arrays are MEMORY arrays) >> >> But, when the MEMORY flag is not set, the data attribute instead points to a length-2 C-array of pointers to functions >> >> [read(N, output_address, self->index_iter, self->extra), write(N, input_address, self->index_iter, self->extra)] >> >> Either of these could then be NULL (i.e. if write is NULL, then the array must be read-only). >> >> When the MEMORY flag is not set, the strides member of the ndarray structure is a pointer to the index_iter object (which could be anything that the particular read and write methods need it to be). >> >> The array structure should also get a member to hold the "extra" argument (which would hold any state that the array needed to hold on to in order to correctly perform the read or write operations --- i.e. it could hold an execution graph for deferred evaluation). >> >> The index_iter structure is anything that the read and write methods need to correctly identify *where* to write. Now, clearly, we could combine index_iter and extra into just one "structure" that holds all needed state for read and write to work correctly. The reason I propose two slots is because at least mentally in the use case of having these structures be calculation graphs, one of these structures is involved in "computing the location to read/write" and the other is involved in "computing what to read/write" >> >> The idea is fairly simple, but with some very interesting potential features: >> >> * lazy evaluation (of indexing, ufuncs, etc.) >> * fancy indexing as views instead of copies (really just another example of lazy evaluation) >> * compressed arrays >> * generated arrays (from computation or streamed data) >> * infinite arrays >> * computed arrays >> * missing-data arrays >> * ragged arrays (shape would be the bounding box --- which makes me think of ragged arrays as examples of masked arrays). >> * arrays that view PIL data. >> >> One could build an array with a (logically) infinite number of elements (we could use -2 in the shape tuple to indicate that). >> >> We don't need examples of all of these features for NumPy 2.0 to be released, because to really make this useful, we would need to modify all "calculation" code to produce a NON MEMORY array. What to do here still needs a lot of thought and experimentation. >> >> But, I can think about a situation where all NumPy calculations that produce arrays provide the option that when they are done inside of a particular context, a user-supplied behavior over-rides the default return. I want to study what Mark is proposing and understand his new iterator at a deeper level before providing more thoughts here. >> >> That's the gist of what I am thinking about. I would love feedback and comments. >> >> > I guess my reaction is along the lines of Charles': Why can't "a + b", > where a and b are NumPy arrays, simply return an object of a different > type that is lazily evaluated? Why can't infinite arrays simply be yet > another type? > > Of course, much useful functionality should then be refactored into a > new "abstract array" class, and iterators etc. be given an API that > works with more than one type. > > A special-case flag and function pointers seems a bit like reinventing > OO to me, and OO is already provided by Python. > Whoops. I spend too much time with Cython. Cython provides this kind of (fast, C-level) OO, but not Python. Sorry! Dag Sverre From markbak at gmail.com Fri Jan 28 06:57:18 2011 From: markbak at gmail.com (Mark Bakker) Date: Fri, 28 Jan 2011 12:57:18 +0100 Subject: [Numpy-discussion] Error in tanh for large complex argument In-Reply-To: References: Message-ID: Follow up: The behavior is correct for real argument: In [20]: sinh(1000) Out[20]: inf In [21]: cosh(1000) Out[21]: inf In [22]: tanh(1000) Out[22]: 1.0 So maybe we should look there for good logic, Mark On Fri, Jan 28, 2011 at 11:45 AM, Mark Bakker wrote: > Good point, so we need a better solution that fixes all cases > > > >> I'll file a ticket. > >> > >> Incidentally, if tanh(z) is simply programmed as > >> > >> (1.0 - exp(-2.0*z)) / (1.0 + exp(-2.0*z)) > > >This will overflow as z -> -\infty. The solution is probably to use a > >different expression for Re(z) < 0, and to check how other libraries do > >this in case the above still misses something. > > > > > Pauli > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Jan 28 07:07:06 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 28 Jan 2011 12:07:06 +0000 (UTC) Subject: [Numpy-discussion] Error in tanh for large complex argument References: Message-ID: Fri, 28 Jan 2011 12:57:18 +0100, Mark Bakker wrote: > Follow up: > > The behavior is correct for real argument: [clip] > So maybe we should look there for good logic, In the real case you can do "if (abs(z) > cutoff) return sgn(z)", which is not the right thing to do for complex numbers. Anyway, Python's cmath functions correctly, so I'm first going to look there, and then at the glibc/gfortran implementation. Pauli From ralf.gommers at googlemail.com Fri Jan 28 07:26:46 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 28 Jan 2011 20:26:46 +0800 Subject: [Numpy-discussion] Numpy 2.0 schedule In-Reply-To: References: Message-ID: On Fri, Jan 28, 2011 at 1:36 AM, Charles R Harris wrote: > > > On Thu, Jan 27, 2011 at 9:17 AM, Mark Wiebe wrote: > >> On Thu, Jan 27, 2011 at 7:09 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> >>> The PIL test can still be fixed before the final 0.9.0 release, it looks >>> like we will need another RC anyway. Does anyone have time for this in the >>> next few days? >>> >> >> I've attached a patch which fixes it for me. >> > Thanks, I'll check and apply it. > >> >>> I took a shot at fixing the ABI compatibility, and if PyArray_ArrFunc >>>>>>> was the main issue, then that might be done. An ABI compatible 1.6 with the >>>>>>> datetime and half types should be doable, just some extensions might get >>>>>>> confused if they encounter arrays made with the new data types. >>>>>>> >>>>>>> Even if you fixed the ABI incompatibility (I don't know enough about >>>>>> the issue to confirm that), I'm not sure how much value there is in a >>>>>> release with as main new feature two dtypes that are not going to work well >>>>>> with scipy/other binaries compiled against 1.5. >>>>>> >>>>> >>>>> I've recently gotten the faster ufunc NEP implementation finished >>>>> except for generalized ufuncs, and most things work the same or faster with >>>>> it. Below are some timings of 1.5.1 vs the new_iterator branch. In >>>>> particular, the overhead on small arrays hasn't gotten worse, but the output >>>>> memory layout speeds up some operations by a lot. >>>>> >>>>> Your new additions indeed look quite promising. I tried your >>> new_iterator branch but ran into a segfault immediately on running the tests >>> on OS X. I opened a ticket for it, to not mix it into this discussion about >>> releases too much: http://projects.scipy.org/numpy/ticket/1724. >>> >> >> Is that a non-Intel platform? While I tried to get aligned access right, >> it's likely there's a bug in it somewhere. >> > No, standard Intel and i386 Python. > >> Before we decide on a 1.6 release I would suggest to do at least the >>> following: >>> - review of ABI fixes by someone very familiar with the problem that >>> occurred in 1.4.0 (David, Pauli, Charles?) >>> - test on Linux, OS X and Windows 32-bit and 64-bit. Also with an MSVC >>> build on Windows, since that exposes more issues each release. >>> >> >> All tests pass for me now, maybe it's a good time to merge the branch into >> the trunk so we can run it on the buildbot? >> >> > Might be better to merge your unadulterated stuff into master, make a 1.6 > branch, and add the compatibility fixes in the branch. You can test branches > on the buildbot I think, at least that worked for svn, I haven't tried it > with github. > > The buildbot is not working with github yet. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbednar at inf.ed.ac.uk Fri Jan 28 08:29:55 2011 From: jbednar at inf.ed.ac.uk (James A. Bednar) Date: Fri, 28 Jan 2011 13:29:55 +0000 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: References: Message-ID: <19778.50387.907465.111907@cortex.inf.ed.ac.uk> | Date: Thu, 27 Jan 2011 16:32:22 -0700 | From: Charles R Harris | | On Thu, Jan 27, 2011 at 4:23 PM, Robert Kern wrote: | | > On Thu, Jan 27, 2011 at 17:17, Travis Oliphant wrote: | > | > > Hey all, | > > | > > What is the thought about having two separate NumPy lists (one | > > for development discussions and one for user discussions)? | > | > We've resisted it for years. I don't think the split has done | > scipy much good. But that may just be my perspective because I'm | > subscribed to both and filter them both to the same folder. | | Me too. I don't think there is so much traffic that a distinction | needs to be made. I'm subscribed to the numpy digest, and I have 8 digest emails from yesterday (27 January), i.e. one single day, sitting in my inbox. These 8 digests represent who knows how many separate emails. If that is not heavy traffic, I really wouldn't know what is! As someone who uses numpy heavily (I manage a large numpy-based software project) but is not a numpy developer, I would very much appreciate having a separate user list. I can't bring myself to unsubscribe from the current list, for fear of not noticing some important new features, related packages, or serious issues, but sorting out those things from the rest of the posts does take significant work. None of my actual developers subscribe any more, as they found the volume of posts overwhelming, so I've sacrificed myself so that I can try to notice anything important and bring it to their attention. Anything that would help that would be greatly appreciated! Jim -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From josef.pktd at gmail.com Fri Jan 28 08:56:05 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 Jan 2011 08:56:05 -0500 Subject: [Numpy-discussion] Developer NumPy list versus User NumPy list In-Reply-To: <19778.50387.907465.111907@cortex.inf.ed.ac.uk> References: <19778.50387.907465.111907@cortex.inf.ed.ac.uk> Message-ID: On Fri, Jan 28, 2011 at 8:29 AM, James A. Bednar wrote: > | ?Date: Thu, 27 Jan 2011 16:32:22 -0700 > | ?From: Charles R Harris > | > | ?On Thu, Jan 27, 2011 at 4:23 PM, Robert Kern wrote: > | > | ?> On Thu, Jan 27, 2011 at 17:17, Travis Oliphant wrote: > | ?> > | ?> > Hey all, > | ?> > > | ?> > What is the thought about having two separate NumPy lists (one > | ?> > for development discussions and one for user discussions)? > | ?> > | ?> We've resisted it for years. I don't think the split has done > | ?> scipy much good. But that may just be my perspective because I'm > | ?> subscribed to both and filter them both to the same folder. > | > | ?Me too. I don't think there is so much traffic that a distinction > | ?needs to be made. > > I'm subscribed to the numpy digest, and I have 8 digest emails from > yesterday (27 January), i.e. one single day, sitting in my inbox. > These 8 digests represent who knows how many separate emails. ?If that > is not heavy traffic, I really wouldn't know what is! > > As someone who uses numpy heavily (I manage a large numpy-based > software project) but is not a numpy developer, I would very much > appreciate having a separate user list. ?I can't bring myself to > unsubscribe from the current list, for fear of not noticing some > important new features, related packages, or serious issues, but > sorting out those things from the rest of the posts does take > significant work. ?None of my actual developers subscribe any more, as > they found the volume of posts overwhelming, so I've sacrificed myself > so that I can try to notice anything important and bring it to their > attention. ?Anything that would help that would be greatly > appreciated! Maybe a digest is not the best way to screen the messages. In threaded view (in gmail reader or Thunderbird) I have 2 to 5 threads a day in the last half month from the numpy mailing list, so I find it easy to screen threads. I think quite a bit of user traffic for numpy has moved to http://stackoverflow.com/questions/tagged/numpy and to me it looks like the mailing list gets mostly the "heavier" questions. Josef > > Jim > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From asmi.capri at gmail.com Fri Jan 28 10:01:36 2011 From: asmi.capri at gmail.com (Asmi Shah) Date: Fri, 28 Jan 2011 16:01:36 +0100 Subject: [Numpy-discussion] create a numpy array of images Message-ID: Hi guys, I am using python for a while now and I have a requirement of creating a numpy array of microscopic tiff images ( this data is 3d, meaning there are 100 z slices of 512 X 512 pixels.) How can I create an array of images? i then would like to use visvis for visualizing this in 3D. any help is highly appreciated to get me started.. Thanks,.. -- Regards, Asmi Shah -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Fri Jan 28 10:19:55 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 28 Jan 2011 09:19:55 -0600 Subject: [Numpy-discussion] Should we make the master branch backward compatible. In-Reply-To: References: Message-ID: <4D42DE9B.8030403@gmail.com> On 01/27/2011 10:58 PM, David Cournapeau wrote: > On Fri, Jan 28, 2011 at 12:46 PM, Charles R Harris > wrote: >> Hi All, >> >> Mark Wiebe has proposed making the master branch backward compatible with >> 1.5. The argument for doing this is that 1) removing the new bits for new >> releases is a chore as the refactor schedule slips and 2) the new ABI isn't >> settled and keeping the current code in won't help with the merge. Mark >> thinks it is possible to keep the datetime types along with the new half >> types while restoring backward compatibility, and if so we won't lose >> anything by making the change. I'm in favor of this change, but I may have >> overlooked something. Thoughts? > I would be in favor too, but having not being able to code much in > numpy the last few months, my opinion should not carry too much > weight. I don't know how many people install numpy from github > nowadays (which are the first "victims" when ABI breaks) > > cheers, > > David It is important to hear from people like Keith that build upon numpy and those that build numpy binaries for distribution especially Windows and non-gcc stuff like Intel's compilers and MKL. So while I do count less but I am in favor of it provided that scipy can build and run correctly with this new numpy. Bruce From pav at iki.fi Fri Jan 28 10:23:09 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 28 Jan 2011 15:23:09 +0000 (UTC) Subject: [Numpy-discussion] incorrect behavior when complex number with zero imaginary part is multiplied by inf References: Message-ID: Fri, 28 Jan 2011 11:49:34 +0100, Mark Bakker wrote: [clip] > Yet when the imaginary part is zero (and it really is a real number), > the imaginary part is nan: > > In [19]: inf * (1+0j) > Out[19]: (inf+nan*j) > > That is not correct. It should really given (inf+0*j). (I know where it > comes from, inf*0 is not defined, but in this case it is, as 1+0j is > really a real number and inf is by definition real as well). > > If there is consensus I can file a ticket. Both behaviors are accepted by the C99 standard: all combinations where one entry is `+-inf` are equivalent to the complex infinity. gcc itself returns `inf-1j*nan`. gfortran returns `inf+1j*nan`. A good rationale for the present behavior is that there is no way to know that 1+0j is supposed to be real; it could as well be a number too small to represent (eg. result from an underflow in the imaginary part), in which case `nan` is indeed the correct result. -- Pauli Virtanen From xscript at gmx.net Fri Jan 28 10:25:45 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Fri, 28 Jan 2011 16:25:45 +0100 Subject: [Numpy-discussion] Generator arrays In-Reply-To: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> (Travis Oliphant's message of "Thu, 27 Jan 2011 18:01:27 -0600") References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> Message-ID: <87mxmldp8m.fsf@ginnungagap.bsc.es> Travis Oliphant writes: > This concept has as one use-case, the deferred arrays that Mark Wiebe > has proposed. Interesting, I didn't read about that. In fact, I was playing around with a proxy wrapper for ndarrays not long ago, in order to build a tree of deferred operations that can be later optimized through numexpr once __str__ or __repr__ is called on such a deferred object. The idea was to have something like: a = np.array(...) a = defereval(a) # returns a proxy wrapper for known methods of np.ndarray b = 10 + a ** 2 print a # here the tree of deferred operations is flattened # into a string that numpexpr can use I didn't play much with it, but proxying all methods but __str__ and __repr__ (thus iterating on the original a.__dict__) seemed to suffice. The benefits I see of building this into ndarray itself is that ndarray would then be the hourglass waist of the framework. Subclassing ndarray is moderately complex right now, so I think that having a way to move some of these subclasses below the hourglass waist and not having to deal with the overloading of ndarray's UI would be a big step forward towards extension code simplicity. So, having near-zero knowledge on the internals of numpy and all new features that have been discussed here, my naive view of what the stack should contain is: * ndarray subclasses Overload indexing (e.g., data_array's named dimension elements), translating any fancy indexing into ndarray's "native" indexing methods Overload user representation (e.g., show some extra info when printing an array) * ndarray slicing and numeric operations A central point for slicing/indexing (the output should be either views or copies) A central point to control the deferral of operations (both native and extensions - see below -). In fact, I see deferred operations as just a form of copy-on-write/evaluate-on-access views (COW must be used when one of the input operands of a deferred tree of operations is modified after capturing it into such a tree). * numeric operations extensions Numeric operations should be first-class if deferred operation evaluation is to be taken to its highest potential, and thus they should be aware of an "operation evaluation engine" (as well as the other way around). If they are not (and they should be able not to be), two things can happen: - for those based only on first-class operations, it is just the root of a subtree - if more complex operations are performed (explicit looping?), they simply diminish the range of possibilities of optimizing opearation evaluation (actually producing multiple evaluation trees, or maybe simply forcing evaluation). * operation evaluation engine This would take care of evaluating the operation tree, while performing optimizations on it. Fortunately, if a sensible interface is established between this and first-class numeric operations, a first implementation can provide just the naive evaluation, and further optimizations can be provided behind the scenes. Such optimizations would provide things like operation tree simplification/reorganization, blocking (a la numexpr) and parallellization of computations. * storage access extensions Slicing in ndarray should be aware of objects represented by means other than "plain strided memory buffers": e.g., the compressed array case (where decompression could be treated with a sliding window), or deferred operation evaluation itself. In fact, as you pointed of with the MEMORY flag, both storage and operation evaluation can be subject to the common concept of deferral (accessing a compressed array is just another form of accessing computed contents, like accessing elements on a deferred array). I just hope they're all not just obvious observations of what has already been said. Lluis PS: sorry for the unnecessarily long mail -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From friedrichromstedt at gmail.com Fri Jan 28 13:23:15 2011 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Fri, 28 Jan 2011 19:23:15 +0100 Subject: [Numpy-discussion] Strange behaviour of numpy.asarray() in corner case Message-ID: Python 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.__version__ '1.5.1' >>> class X: ... pass ... >>> numpy.asarray([X(), numpy.asarray([1, 1])]).shape (2,) >>> numpy.asarray([numpy.asarray([1, 1]), X()]).shape () >>> I would expect (2,) in the second case too. The following works ok: >>> numpy.asarray([1, X()]).shape (2,) >>> numpy.asarray([X(), 1]).shape (2,) Friedrich From nadavh at visionsense.com Fri Jan 28 13:44:07 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Fri, 28 Jan 2011 10:44:07 -0800 Subject: [Numpy-discussion] Error in tanh for large complex argument In-Reply-To: References: Message-ID: <26FC23E7C398A64083C980D16001012D1AD93B9421@VA3DIAXVS361.RED001.local> A brief history: I wrote the asinh and acosh functions for the math (or was it cmath?) for python 2.0. It fixed some problems of GVR implementation, but still it was far from perfect, and replaced shortly after. My 1/4 cent tip: Do not rush --- find a good code. Nadav ________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Mark Bakker [markbak at gmail.com] Sent: 28 January 2011 12:45 To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] Error in tanh for large complex argument Good point, so we need a better solution that fixes all cases >> I'll file a ticket. >> >> Incidentally, if tanh(z) is simply programmed as >> >> (1.0 - exp(-2.0*z)) / (1.0 + exp(-2.0*z)) >This will overflow as z -> -\infty. The solution is probably to use a >different expression for Re(z) < 0, and to check how other libraries do >this in case the above still misses something. > > Pauli -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Fri Jan 28 13:57:43 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 28 Jan 2011 10:57:43 -0800 Subject: [Numpy-discussion] create a numpy array of images In-Reply-To: References: Message-ID: <4D4311A7.20007@noaa.gov> On 1/28/11 7:01 AM, Asmi Shah wrote: > I am using python for a while now and I have a requirement of creating a > numpy array of microscopic tiff images ( this data is 3d, meaning there are > 100 z slices of 512 X 512 pixels.) How can I create an array of images? It's quite straightforward to create a 3-d array to hold this kind of data: image_block = np.empty((100, 512, 512), dtype=??) now you can load it up by using some lib (PIL, or ???) to load the tif images, and then: for i in images: image_block[i,:,:] = i note that I put dtype to ??? up there. What dtype you want is dependent on what's in the tiff images -- tiff can hold just about anything. So if they are say, 16 bit greyscale, you'd want: dtype=np.uint16 if they are 24 bit rgb, you might want a custom dtype (I don't think there is a 24 bit dtype built in): RGB_type = np.dtype([('r',np.uint8),('g',np.uint8),('b',np.uint8)]) for 32 bit rgba, you can use the same approach, or just a 32 bit integer. The cool thing is that you can make views of this array with different dtypes, depending on what's easiest for the given use case. You can even break out the rgb parts into different axis: image_block = np.empty((100, 512, 512), dtype=RGB_type) image_block_rgb=image_block.view(dtype=np.uint8).reshape((100,512,512,3)) The two arrays now share the same data block, but you can look at them differently. I think this a really cool feature of numpy. > i then would like to use visvis for visualizing this in 3D. you'll have to see what visvis is expecting in terms of data types, etc. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From kanterburiad at gmail.com Fri Jan 28 16:03:17 2011 From: kanterburiad at gmail.com (kant erburiad) Date: Fri, 28 Jan 2011 22:03:17 +0100 Subject: [Numpy-discussion] SWIG examples from the Cookbook Message-ID: Hi, I'd like to ask for your help regarding the use of SWIG with numpy. ** problem description ** While I can compile successfully the examples provided in ./numpy/doc/swig I can't compile the first example provided in the Cookbook. http://www.scipy.org/Cookbook/SWIG_NumPy_examples "A simple ARGOUT_ARRAY1 example" I have this error: ezrange_wrap.c: In function ?SWIG_AsVal_long?: ezrange_wrap.c:3227: error: initializer element is not constant ** hypothesis and ugly workaround ** Comparing the examples from ./numpy/doc/swig and from the Cookbook, I noticed the main difference lies in the fact the first are in C++ while the latter in plain C. I "converted" the example from the Coobook to look like a c++ project. Essentially I renamed the .c file with a .cxx extension and I modified setup.py accordingly. This time it compiles successfully and the module is usable in python. ** questions ** 1/ What should I do to compile the Cookbook example with a C compiler? 2/ If it appears swig is now only compatible with the C++ compiler, do you have a practical workaround to propose? (without renaming the C files with a .cxx extension) ? ** configuration ** debian testing i686 swig version 1.3.40 python-numpy version 1.5.1 (I also tried 1.4.1) python version 2.6 I hope I explained my problem clearly enough. Tell me if you need more details. Regards, KB -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Fri Jan 28 16:18:46 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Fri, 28 Jan 2011 15:18:46 -0600 Subject: [Numpy-discussion] Generator arrays In-Reply-To: <87mxmldp8m.fsf@ginnungagap.bsc.es> References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> <87mxmldp8m.fsf@ginnungagap.bsc.es> Message-ID: <1FD06257-33BD-4F8C-88A8-5BBB9A3B16F1@enthought.com> Thanks for the long email. I think there are a lot of thoughts around some of these ideas and it is good to get as many of them articulated as possible. I learn much from these kinds of discussions. I think others value them as well. I like your ideas about what kind of overloading hooks, subclasses of ndarray's should really be allowed to over-write. One thing I didn't talk about in my previous long email, was the re-organization of calculation functions that needs to happen. I really think that the ufunc concept needs to be broadened so that all function-pointers that are currently attached to data-type objects can be handled under the same calculation super-structure. This re-factoring would go a long way into cementing what kind of API is needed for different "array objects". I am persuaded that improving framework for vectorized calculations which allow for any array-like objects (objects satisfying a certain protocol or API) is a better approach then altering the nice map of ndarray to in-memory data. Then, deferred arrays, masked arrays, computed arrays, and other array-like objects could provide protocols and APIs (and callbacks) that satisfy this general calculation structure. This kind of generalization is probably more useful than changes to the array object itself. But, it's also hard and I'm not entirely sure what that structure should be. I'm looking forward to thoughts in this direction and looking more closely at what Mark has done with writing ufuncs as wrappers around his iterators. I'm concerned that his iterators don't support the generalized ufunc interface that I really like and was hoping would provide the abstraction needed to allow all the functions currently attached to dtypes (searchsorted, etc.) to be incorporated in the generalized calculation structure. -Travis On Jan 28, 2011, at 9:25 AM, Llu?s wrote: > Travis Oliphant writes: > >> This concept has as one use-case, the deferred arrays that Mark Wiebe >> has proposed. > > Interesting, I didn't read about that. > > In fact, I was playing around with a proxy wrapper for ndarrays not long > ago, in order to build a tree of deferred operations that can be later > optimized through numexpr once __str__ or __repr__ is called on such a > deferred object. The idea was to have something like: > > a = np.array(...) > a = defereval(a) # returns a proxy wrapper for known methods of np.ndarray > b = 10 + a ** 2 > print a # here the tree of deferred operations is flattened > # into a string that numpexpr can use > > I didn't play much with it, but proxying all methods but __str__ and > __repr__ (thus iterating on the original a.__dict__) seemed to suffice. > > > The benefits I see of building this into ndarray itself is that ndarray > would then be the hourglass waist of the framework. > > Subclassing ndarray is moderately complex right now, so I think that > having a way to move some of these subclasses below the hourglass waist > and not having to deal with the overloading of ndarray's UI would be a > big step forward towards extension code simplicity. > > So, having near-zero knowledge on the internals of numpy and all new > features that have been discussed here, my naive view of what the stack > should contain is: > > * ndarray subclasses > > Overload indexing (e.g., data_array's named dimension elements), > translating any fancy indexing into ndarray's "native" indexing > methods > > Overload user representation (e.g., show some extra info when printing > an array) > > * ndarray slicing and numeric operations > > A central point for slicing/indexing (the output should be either > views or copies) > > A central point to control the deferral of operations (both native and > extensions - see below -). In fact, I see deferred operations as just > a form of copy-on-write/evaluate-on-access views (COW must be used > when one of the input operands of a deferred tree of operations is > modified after capturing it into such a tree). > > * numeric operations extensions > > Numeric operations should be first-class if deferred operation > evaluation is to be taken to its highest potential, and thus they > should be aware of an "operation evaluation engine" (as well as the > other way around). > > If they are not (and they should be able not to be), two things can > happen: > > - for those based only on first-class operations, it is just the root > of a subtree > > - if more complex operations are performed (explicit looping?), they > simply diminish the range of possibilities of optimizing opearation > evaluation (actually producing multiple evaluation trees, or maybe > simply forcing evaluation). > > * operation evaluation engine > > This would take care of evaluating the operation tree, while > performing optimizations on it. > > Fortunately, if a sensible interface is established between this and > first-class numeric operations, a first implementation can provide > just the naive evaluation, and further optimizations can be provided > behind the scenes. > > Such optimizations would provide things like operation tree > simplification/reorganization, blocking (a la numexpr) and > parallellization of computations. > > * storage access extensions > > Slicing in ndarray should be aware of objects represented by means > other than "plain strided memory buffers": e.g., the compressed array > case (where decompression could be treated with a sliding window), or > deferred operation evaluation itself. > > In fact, as you pointed of with the MEMORY flag, both storage and > operation evaluation can be subject to the common concept of deferral > (accessing a compressed array is just another form of accessing > computed contents, like accessing elements on a deferred array). > > > I just hope they're all not just obvious observations of what has > already been said. > > > Lluis > > PS: sorry for the unnecessarily long mail > > -- > "And it's much the same thing with knowledge, for whenever you learn > something new, the whole world becomes that much richer." > -- The Princess of Pure Reason, as told by Norton Juster in The Phantom > Tollbooth > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From mwwiebe at gmail.com Fri Jan 28 17:40:33 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 28 Jan 2011 14:40:33 -0800 Subject: [Numpy-discussion] Generator arrays In-Reply-To: <1FD06257-33BD-4F8C-88A8-5BBB9A3B16F1@enthought.com> References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> <87mxmldp8m.fsf@ginnungagap.bsc.es> <1FD06257-33BD-4F8C-88A8-5BBB9A3B16F1@enthought.com> Message-ID: On Fri, Jan 28, 2011 at 1:18 PM, Travis Oliphant wrote: > > Just to start the conversation, and to find out who is interested, I would > like to informally propose generator arrays for NumPy 2.0. This concept > has as one use-case, the deferred arrays that Mark Wiebe has proposed. But, > it also allows for "compressed arrays", on-the-fly computed arrays, and > streamed or generated arrays. > I like the idea, it could work very well. It's also a bit risky, in the sense that if the design isn't right in the end it could be overly complicated or perform poorly. The design will need time to bake. > Basically, the modification I would like to make is to have an array flag > (MEMORY) that when set means that the data attribute of a numpy array is a > pointer to the address in memory where the data begins with the strides > attribute pointing to a C-array of integers (in other words, all current > arrays are MEMORY arrays) > > But, when the MEMORY flag is not set, the data attribute instead points to > a length-2 C-array of pointers to functions > > [read(N, output_address, self->index_iter, self->extra), write(N, > input_address, self->index_iter, self->extra)] > > Either of these could then be NULL (i.e. if write is NULL, then the array > must be read-only). > > When the MEMORY flag is not set, the strides member of the ndarray > structure is a pointer to the index_iter object (which could be anything > that the particular read and write methods need it to be). > The details will have to be worked out, but one additional thing the deferred calculation needs is a way to re-enable writing to arrays that are referenced by a deferred calculation. For finite non-MEMORY arrays, a common operation will probably be to convert them in-place into MEMORY arrays. The array structure should also get a member to hold the "extra" argument > (which would hold any state that the array needed to hold on to in order to > correctly perform the read or write operations --- i.e. it could hold an > execution graph for deferred evaluation). > > The index_iter structure is anything that the read and write methods need > to correctly identify *where* to write. Now, clearly, we could combine > index_iter and extra into just one "structure" that holds all needed state > for read and write to work correctly. The reason I propose two slots is > because at least mentally in the use case of having these structures be > calculation graphs, one of these structures is involved in "computing the > location to read/write" and the other is involved in "computing what to > read/write" > There are many ways one may want values to be gotten or put - dense array data, values where a boolean mask has true values, a flat array with values from specified arbitrary coordinates. Some data sources may only support a subset of the possibilities, and others may be able to support very fast arbitrary access. There probably needs to be a hierarchy of access methods, similar to C++ STL iterators. The idea is fairly simple, but with some very interesting potential > features: > > * lazy evaluation (of indexing, ufuncs, etc.) > * fancy indexing as views instead of copies (really just another > example of lazy evaluation) > * compressed arrays > * generated arrays (from computation or streamed data) > * infinite arrays > * computed arrays > * missing-data arrays > * ragged arrays (shape would be the bounding box --- which makes me > think of ragged arrays as examples of masked arrays). > * arrays that view PIL data. > > One could build an array with a (logically) infinite number of elements (we > could use -2 in the shape tuple to indicate that). > The infinite shape reminds me of D's ranges. It's also getting into a territory where specifying a bounding box for the data makes more sense than just a shape. For just 1D data, you have [lower, upper], (-inf, upper], [lower, +inf), and (-inf, inf) cases. One thing I didn't talk about in my previous long email, was the > re-organization of calculation functions that needs to happen. I really > think that the ufunc concept needs to be broadened so that all > function-pointers that are currently attached to data-type objects can be > handled under the same calculation super-structure. > I like this approach since it's going in a more library-oriented direction. The generic style of programming popularized by STL can provide good guidance to design this. Even if C doesn't support templates, designing orthogonal interfaces for calculation and iteration is still possible. The design will also need layering. At it's simplest, one should be able to specify a ufunc with just a "double calc(double, double)" function and one simple object creation function call. At the same time, one should be able to specialize the calculation function to do inner contiguous loops, accumulation loops, or use SSE to get big speed improvements. > > This re-factoring would go a long way into cementing what kind of API is > needed for different "array objects". I am persuaded that improving > framework for vectorized calculations which allow for any array-like objects > (objects satisfying a certain protocol or API) is a better approach then > altering the nice map of ndarray to in-memory data. > The interface to the iterator already is general enough that, for example, a "compressed array" iterator could provide the same interface to client code. This could be generalized further as needed. Francesc noticed this when I modified the numexpr code. Then, deferred arrays, masked arrays, computed arrays, and other array-like > objects could provide protocols and APIs (and callbacks) that satisfy this > general calculation structure. This kind of generalization is probably > more useful than changes to the array object itself. > > But, it's also hard and I'm not entirely sure what that structure should > be. I'm looking forward to thoughts in this direction and looking more > closely at what Mark has done with writing ufuncs as wrappers around his > iterators. I'm concerned that his iterators don't support the generalized > ufunc interface that I really like and was hoping would provide the > abstraction needed to allow all the functions currently attached to dtypes > (searchsorted, etc.) to be incorporated in the generalized calculation > structure. > Think of the iterator as a tool just for accessing some or all of the elements in an array or arrays. The ufunc interfaces for generic functions, reductions, accumulations, and the generalized ufunc simply use the iterator to access the data, the iterator itself doesn't constrain them. Particularly the op_axes parameter turned out to be flexible in a way which made these multitude of different views of the data for each calculation possible. Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jan 28 18:37:20 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 28 Jan 2011 15:37:20 -0800 Subject: [Numpy-discussion] Should we make the master branch backward compatible. In-Reply-To: <4D42DE9B.8030403@gmail.com> References: <4D42DE9B.8030403@gmail.com> Message-ID: Does anyone have any objections to me merging the branch into the numpy trunk right now? Chuck suggested I try to split out the ABI changes, but they're kind of tangled with the other changes. In particular, they involve fixing the type promotion code to be enum order-independent, which depended on some changes done for the iterator buffering code. After the key ABI fixes, there are then changes to a fair bit of code to make both numpy and scipy pass their tests. Anyways, editing that history at all feels like a bit of a quagmire, so I'd sooner just go ahead and do the merge. Cheers, Mark On Fri, Jan 28, 2011 at 7:19 AM, Bruce Southey wrote: > On 01/27/2011 10:58 PM, David Cournapeau wrote: > > On Fri, Jan 28, 2011 at 12:46 PM, Charles R Harris > > wrote: > >> Hi All, > >> > >> Mark Wiebe has proposed making the master branch backward compatible > with > >> 1.5. The argument for doing this is that 1) removing the new bits for > new > >> releases is a chore as the refactor schedule slips and 2) the new ABI > isn't > >> settled and keeping the current code in won't help with the merge. Mark > >> thinks it is possible to keep the datetime types along with the new half > >> types while restoring backward compatibility, and if so we won't lose > >> anything by making the change. I'm in favor of this change, but I may > have > >> overlooked something. Thoughts? > > I would be in favor too, but having not being able to code much in > > numpy the last few months, my opinion should not carry too much > > weight. I don't know how many people install numpy from github > > nowadays (which are the first "victims" when ABI breaks) > > > > cheers, > > > > David > > It is important to hear from people like Keith that build upon numpy and > those that build numpy binaries for distribution especially Windows and > non-gcc stuff like Intel's compilers and MKL. > > So while I do count less but I am in favor of it provided that scipy can > build and run correctly with this new numpy. > > Bruce > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 28 19:14:23 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 28 Jan 2011 17:14:23 -0700 Subject: [Numpy-discussion] Should we make the master branch backward compatible. In-Reply-To: References: <4D42DE9B.8030403@gmail.com> Message-ID: On Fri, Jan 28, 2011 at 4:37 PM, Mark Wiebe wrote: > Does anyone have any objections to me merging the branch into the numpy > trunk right now? > > Chuck suggested I try to split out the ABI changes, but they're kind of > tangled with the other changes. In particular, they involve fixing the type > promotion code to be enum order-independent, which depended on some changes > done for the iterator buffering code. After the key ABI fixes, there are > then changes to a fair bit of code to make both numpy and scipy pass their > tests. Anyways, editing that history at all feels like a bit of a quagmire, > so I'd sooner just go ahead and do the merge. > > Go ahead and merge it in and we'll see how it goes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jan 28 19:46:51 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 28 Jan 2011 16:46:51 -0800 Subject: [Numpy-discussion] Should we make the master branch backward compatible. In-Reply-To: References: <4D42DE9B.8030403@gmail.com> Message-ID: On Fri, Jan 28, 2011 at 4:14 PM, Charles R Harris wrote: > > > Go ahead and merge it in and we'll see how it goes. > > I did the merge, and tried to trigger the buildbot, but it looks like a github svn issue has reared its head: http://support.github.com/discussions/repos/3155-svn-checkout-error-200-ok-error -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From akabaila at pcug.org.au Sat Jan 29 06:40:05 2011 From: akabaila at pcug.org.au (Algis Kabaila) Date: Sat, 29 Jan 2011 22:40:05 +1100 Subject: [Numpy-discussion] Inversion of near singular matrices. Message-ID: <201101292240.05470.akabaila@pcug.org.au> Hi, I am interested in determining if a matrix is singular or "nearly singular" - very ill conditioned. The problem occurs in structural engineering applications. My OS is kubuntu 10.10 (32 bit) Python 2.6.6 numpy and numpy.linalg binaries from ubuntu repositories. The attached tar ball has a little CLI script that generates singular or near singular matrices (because of the inevitable roundoffs) for matrices with elements from sequence 1, 2, 3, 4 etc. The dimension of matrix nn can be passed as command line parameter via sys.argv[1] . If argv[1] does not exist, the 5x5 default matrix is used. for nn = 3 and 4 numpy does not raise an exception for nn = 5 it does raise an exception for nn = 6, 7 np not raises exception for nn = 8 np does raise exception for nn = 9 np does not raise exception for higher nn values np mostly raises the exception, but for nn = 23 and nn=120 it does NOT raise the exception. It is worht noting that in practical problems of engineering analyisis the ill conditioned matrix is not "exact" - there always are approximations and roundoff errors. So my question is: how can one reliably detect singularity (or near singularity) and raise an exception? Many thanks for your attention, Al. -- Algis http://akabaila.pcug.org.au/StructuralAnalysis.pdf -------------- next part -------------- A non-text attachment was scrubbed... Name: inversion.tar.gz Type: application/x-compressed-tar Size: 1089 bytes Desc: not available URL: From sdb at cloud9.net Sat Jan 29 06:47:23 2011 From: sdb at cloud9.net (Stuart Brorson) Date: Sat, 29 Jan 2011 06:47:23 -0500 (EST) Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <201101292240.05470.akabaila@pcug.org.au> References: <201101292240.05470.akabaila@pcug.org.au> Message-ID: > So my question is: how can one reliably detect singularity (or > near singularity) and raise an exception? Matrix condition number: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.cond.html http://en.wikipedia.org/wiki/Condition_number Stuart From scheffer.nicolas at gmail.com Sat Jan 29 16:01:45 2011 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Sat, 29 Jan 2011 13:01:45 -0800 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix Message-ID: Hi all, First email to the list for me, I just want to say how grateful I am to have python+numpy+ipython etc... for my day to day needs. Great combination of software. Anyway, I've been having this bottleneck in one my algorithms that has been bugging me for quite a while. The objective is to speed this part up. I've been doing tons of optimization and parallel processing around that piece of code to get a decent run time. The problem is easy. You want to accumulate in a matrix, a weighted sum of other matrices. Let's call this function scale_and_add: def scale_and_add_re(R,w,Ms): (nb_add,mdim,j)=np.shape(Ms) for i in range(nb_add): R+=w[i]*Ms[i] return R This 'for' loop bugs me since I know this will slow things down. But the dimension of these things are so large that any attempt to vectorize this is slower and takes too much memory. I typically work with 1000 weights and matrices, matrices of dimension of several hundred. My current config is: In [2]: %timeit scale_and_add_re(R,w,Ms) 1 loops, best of 3: 392 ms per loop In [3]: w.shape Out[3]: (2000,) In [4]: Ms.shape Out[4]: (2000, 200, 200) I'd like to be able to double these dimensions. For instance I could use broadcasting by using a dot product %timeit dot(Ms.T,w) 1 loops, best of 3: 1.77 s per loop But this is i) slower ii) takes too much memory (btw, I'd really need an inplace dot-product in numpy to avoid the copy in memory, like the blas call in scipy.linalg. But that's for an other thread...) The matrices are squared and symmetric. I should be able to get something out of this, but I never found anything related to this in Numpy. I also tried a Cython reimplementation %timeit scale_and_add_reg(L1,w,Ms) 1 loops, best of 3: 393 ms per loop It brought nothing in speed. Here's the code @cython.boundscheck(False) def scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): return _scale_and_add_reg(R,w,Ms) @cython.boundscheck(False) cdef int _scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): cdef unsigned int mdim cdef int nb_add (nb_add,mdim,j)=np.shape(Ms) cdef unsigned int i for i from 0 <= i < nb_add: R+=w[i]*Ms[i] #for j in range(mdim): # for k in range(mdim): # R[j][k]+=w[i]*Ms[i][j][k] return 0 So here's my question if someone has time to answer it: Did I try anything possible? Should I move along and deal with this bottleneck? Or is there something else I didn't think of? Thanks for reading, keep up the great work! -n From ben.root at ou.edu Sat Jan 29 16:17:37 2011 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 29 Jan 2011 15:17:37 -0600 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix In-Reply-To: References: Message-ID: On Saturday, January 29, 2011, Nicolas SCHEFFER wrote: > Hi all, > > First email to the list for me, I just want to say how grateful I am > to have python+numpy+ipython etc... for my day to day needs. Great > combination of software. > > Anyway, I've been having this bottleneck in one my algorithms that has > been bugging me for quite a while. > The objective is to speed this part up. I've been doing tons of > optimization and parallel processing around that piece of code to get > a decent run time. > > The problem is easy. You want to accumulate in a matrix, a weighted > sum of other matrices. Let's call this function scale_and_add: > def scale_and_add_re(R,w,Ms): > ? ?(nb_add,mdim,j)=np.shape(Ms) > ? ?for i in range(nb_add): > ? ? ? ?R+=w[i]*Ms[i] > ? ?return R > This 'for' loop bugs me since I know this will slow things down. > > But the dimension of these things are so large that any attempt to > vectorize this is slower and takes too much memory. > I typically work with 1000 weights and matrices, matrices of dimension > of several hundred. > > My current config is: > In [2]: %timeit scale_and_add_re(R,w,Ms) > 1 loops, best of 3: 392 ms per loop > > In [3]: w.shape > Out[3]: (2000,) > > In [4]: Ms.shape > Out[4]: (2000, 200, 200) > I'd like to be able to double these dimensions. > > For instance I could use broadcasting by using a dot product > %timeit dot(Ms.T,w) > 1 loops, best of 3: 1.77 s per loop > But this is i) slower ii) takes too much memory > (btw, I'd really need an inplace dot-product in numpy to avoid the > copy in memory, like the blas call in scipy.linalg. But that's for an > other thread...) > > The matrices are squared and symmetric. I should be able to get > something out of this, but I never found anything related to this in > Numpy. > > I also tried a Cython reimplementation > %timeit scale_and_add_reg(L1,w,Ms) > 1 loops, best of 3: 393 ms per loop > It brought nothing in speed. > > Here's the code > @cython.boundscheck(False) > def scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, > ndim=1] w, np.ndarray[float, ndim=3] Ms): > ? ?return _scale_and_add_reg(R,w,Ms) > > @cython.boundscheck(False) > cdef int _scale_and_add_reg(np.ndarray[float, ndim=2] R, > np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): > ? ?cdef unsigned int mdim > ? ?cdef int nb_add > ? ?(nb_add,mdim,j)=np.shape(Ms) > ? ?cdef unsigned int i > ? ?for i from 0 <= i < nb_add: > ? ? ? ?R+=w[i]*Ms[i] > ? ? ? ?#for j in range(mdim): > ? ? ? ?# ? ?for k in range(mdim): > ? ? ? ?# ? ? ? ?R[j][k]+=w[i]*Ms[i][j][k] > > ? ?return 0 > > So here's my question if someone has time to answer it: Did I try > anything possible? Should I move along and deal with this bottleneck? > Or is there something else I didn't think of? > > Thanks for reading, keep up the great work! > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Have you tried chunking the vectoriized version of the code? By chunking it, you gain the speed-ups of vectorizing, but still stay within manageable memory sizes. You would first pre-allocate you output array. Then you would create an array of indices starting at zero and ending with the size of your array. The values would be spaced so that you could use the index at j as the start of a slice and the index at j + 1 as the end of a slice. I typically use 4096 as my increment and allow the last chunk to be smaller. Your loop would loop through these indices, using your vectorized code in its body. The modification is that you will need to specify the slices that the vectorized code operates on. Let us know if that helps! Ben Root From scheffer.nicolas at gmail.com Sat Jan 29 17:03:43 2011 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Sat, 29 Jan 2011 14:03:43 -0800 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix In-Reply-To: References: Message-ID: Thanks for the prompt reply! I quickly tried that and it actually helps compared to the full vectorized version. Depending on the dimensions, the chunk size has to be tuned (typically 100 or so) But I don't get any improvement w/r to the simple for loop (i can almost match the time though). My guess is that the dot product overhead is still too big for element by element multiplication of matrices. Can't I in anyway leverage the fact that the matrices are symmetric? I'm sure there might some slicing wizardry for such a problem ;) Can the += on the result be optimized by numpy.accumulate or similar? Or is it done in C anyway w/o coming back to the python VM? I also looked into the type of CONTIGUOUS array in memory but not much luck here either. -n >> Hi all, >> >> First email to the list for me, I just want to say how grateful I am >> to have python+numpy+ipython etc... for my day to day needs. Great >> combination of software. >> >> Anyway, I've been having this bottleneck in one my algorithms that has >> been bugging me for quite a while. >> The objective is to speed this part up. I've been doing tons of >> optimization and parallel processing around that piece of code to get >> a decent run time. >> >> The problem is easy. You want to accumulate in a matrix, a weighted >> sum of other matrices. Let's call this function scale_and_add: >> def scale_and_add_re(R,w,Ms): >> ?? ?(nb_add,mdim,j)=np.shape(Ms) >> ?? ?for i in range(nb_add): >> ?? ? ? ?R+=w[i]*Ms[i] >> ?? ?return R >> This 'for' loop bugs me since I know this will slow things down. >> >> But the dimension of these things are so large that any attempt to >> vectorize this is slower and takes too much memory. >> I typically work with 1000 weights and matrices, matrices of dimension >> of several hundred. >> >> My current config is: >> In [2]: %timeit scale_and_add_re(R,w,Ms) >> 1 loops, best of 3: 392 ms per loop >> >> In [3]: w.shape >> Out[3]: (2000,) >> >> In [4]: Ms.shape >> Out[4]: (2000, 200, 200) >> I'd like to be able to double these dimensions. >> >> For instance I could use broadcasting by using a dot product >> %timeit dot(Ms.T,w) >> 1 loops, best of 3: 1.77 s per loop >> But this is i) slower ii) takes too much memory >> (btw, I'd really need an inplace dot-product in numpy to avoid the >> copy in memory, like the blas call in scipy.linalg. But that's for an >> other thread...) >> >> The matrices are squared and symmetric. I should be able to get >> something out of this, but I never found anything related to this in >> Numpy. >> >> I also tried a Cython reimplementation >> %timeit scale_and_add_reg(L1,w,Ms) >> 1 loops, best of 3: 393 ms per loop >> It brought nothing in speed. >> >> Here's the code >> @cython.boundscheck(False) >> def scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, >> ndim=1] w, np.ndarray[float, ndim=3] Ms): >> ?? ?return _scale_and_add_reg(R,w,Ms) >> >> @cython.boundscheck(False) >> cdef int _scale_and_add_reg(np.ndarray[float, ndim=2] R, >> np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): >> ?? ?cdef unsigned int mdim >> ?? ?cdef int nb_add >> ?? ?(nb_add,mdim,j)=np.shape(Ms) >> ?? ?cdef unsigned int i >> ?? ?for i from 0 <= i < nb_add: >> ?? ? ? ?R+=w[i]*Ms[i] >> ?? ? ? ?#for j in range(mdim): >> ?? ? ? ?# ? ?for k in range(mdim): >> ?? ? ? ?# ? ? ? ?R[j][k]+=w[i]*Ms[i][j][k] >> >> ?? ?return 0 >> >> So here's my question if someone has time to answer it: Did I try >> anything possible? Should I move along and deal with this bottleneck? >> Or is there something else I didn't think of? >> >> Thanks for reading, keep up the great work! >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Have you tried chunking the vectoriized version of the code? ?By > chunking it, you gain the speed-ups of vectorizing, but still stay > within manageable memory sizes. > > You would first pre-allocate you output array. ?Then you would create > an array of indices starting at zero and ending with the size of your > array. ?The values would be spaced so that you could use the index at > j as the start of a slice and the index at j + 1 as the end of a > slice. ?I typically use 4096 as my increment and allow the last chunk > to be smaller. > > Your loop would loop through these indices, using your vectorized code > in its body. ?The modification is that you will need to specify the > slices that the vectorized code operates on. > > Let us know if that helps! > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Sat Jan 29 17:10:30 2011 From: sturla at molden.no (Sturla Molden) Date: Sat, 29 Jan 2011 23:10:30 +0100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <201101292240.05470.akabaila@pcug.org.au> References: <201101292240.05470.akabaila@pcug.org.au> Message-ID: <4D449056.6090406@molden.no> Den 29.01.2011 12:40, skrev Algis Kabaila: > So my question is: how can one reliably detect singularity (or > near singularity) and raise an exception? Use an SVD, examine the singular values. One or more small singular values indicate ill-conditioning. (What constitutes a small singular value is a debate I'll avoid.) The SVD will also let you fix the problem when inverting the matrix, simply by truncating them to 0. SVD is slow, but the advantage is that it cannot fail. If SVD is deemed too slow, you might look for a more specialized solution. But always try to understand the problem, don't just use SVD as a universal solution to any ill-conditioned matrix problem, even if it's tempting. In statistics we sometimes see ill-conditioning of covariance matrices. Another way to deal with multicollinearity besides SVD/PCA is regularisation. Simply adding a small bias k*I to the diagonal might fix the problem (cf. ridge regression). In the Levenberg-Marquardt algorithm used to fit non-linear least squares models (cf. scipy.optimize.leastsq), the bias k to the diagonal of the Jacobian is changed adaptively. One might also know in advance if a covariance matrix could be ill-conditioned (the number of samples is small compared to the number of dimensions) or singular (less data than parameters). That is, sometimes we don't even need to look at the matrix to give the correct diagnosis. Another widely used strategy is to use Cholesky factorization on covariance matrices. It is always stable unless there is a singularity, for which it will fail (NumPy will raise a LinAlgError exception). Cholesky is therefore safer to use for inverting covariance matrices than LU (as well as faster). If Cholesky fails one might fallback to SVD or regularisation to correct the problem. Sturla From charlesr.harris at gmail.com Sat Jan 29 17:30:01 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 Jan 2011 15:30:01 -0700 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix In-Reply-To: References: Message-ID: On Sat, Jan 29, 2011 at 2:01 PM, Nicolas SCHEFFER < scheffer.nicolas at gmail.com> wrote: > Hi all, > > First email to the list for me, I just want to say how grateful I am > to have python+numpy+ipython etc... for my day to day needs. Great > combination of software. > > Anyway, I've been having this bottleneck in one my algorithms that has > been bugging me for quite a while. > The objective is to speed this part up. I've been doing tons of > optimization and parallel processing around that piece of code to get > a decent run time. > > The problem is easy. You want to accumulate in a matrix, a weighted > sum of other matrices. Let's call this function scale_and_add: > def scale_and_add_re(R,w,Ms): > (nb_add,mdim,j)=np.shape(Ms) > for i in range(nb_add): > R+=w[i]*Ms[i] > return R > This 'for' loop bugs me since I know this will slow things down. > > I'd put the flattened matrices in a stack, weight in place, sum on the first index, and reshape. Something like In [1]: m = array([eye(3)]*4).reshape(4,-1) In [2]: m Out[2]: array([[ 1., 0., 0., 0., 1., 0., 0., 0., 1.], [ 1., 0., 0., 0., 1., 0., 0., 0., 1.], [ 1., 0., 0., 0., 1., 0., 0., 0., 1.], [ 1., 0., 0., 0., 1., 0., 0., 0., 1.]]) In [3]: w = array([1.,2.,3.,4.]) In [4]: m *= w[:,None] In [5]: r = m.sum(0).reshape(3,3) In [6]: r Out[6]: array([[ 10., 0., 0.], [ 0., 10., 0.], [ 0., 0., 10.]]) This should fit in memory I think, depending of course on how much memory you have. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Jan 29 17:44:29 2011 From: sturla at molden.no (Sturla Molden) Date: Sat, 29 Jan 2011 23:44:29 +0100 Subject: [Numpy-discussion] Wiener filter / decorrelation In-Reply-To: <83b96e9e-6a8f-440d-8f14-45ecffd92d8c@e9g2000vbi.googlegroups.com> References: <4D3F3AEA.9050600@grinta.net> <4D3F401A.3060203@grinta.net> <4D41F613.9080700@molden.no> <83b96e9e-6a8f-440d-8f14-45ecffd92d8c@e9g2000vbi.googlegroups.com> Message-ID: <4D44984D.2080404@molden.no> Didn't I recommend the C++ and Fortran versions of 2nd edition? I particularly like the Fortran 90 edition as NumPy behaves like a vector machine as well. (The algorithms are explained in the Fortran 77 text, so they must be read together.) I'd like to warn against NR in C (all algorithms contain illegal C) and the third edition of NR (bloated and unorganised text, obfuscated C++), as well as known NR problems (e.g. bad PRNGs, unstable SVD code, slow FFTs, license). Another book I recommend is Golub and van Loan's text on matrix computations (1996). It explains the algorithms with array syntax (Matlab inspired), as well as citing LAPACK and BLAS routines relevant for the text. Sturla Den 28.01.2011 15:50, skrev denis: > Sturla, > what books do you like then, on any of NR's topics ? > Examples convince better than attacks. > Sure NR has its faults, but I like its style: de gustabus > cheers > -- denis > > On Jan 27, 11:47 pm, Sturla Molden wrote: > >> NR's third edition is utterly atrocious... >> extensive. It has many valuable details that should have been covered in >> previous versions, but it is presented in a way that makes me barf. From e.antero.tammi at gmail.com Sat Jan 29 17:56:10 2011 From: e.antero.tammi at gmail.com (eat) Date: Sun, 30 Jan 2011 00:56:10 +0200 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix In-Reply-To: References: Message-ID: Hi, On Sat, Jan 29, 2011 at 11:01 PM, Nicolas SCHEFFER < scheffer.nicolas at gmail.com> wrote: > Hi all, > > First email to the list for me, I just want to say how grateful I am > to have python+numpy+ipython etc... for my day to day needs. Great > combination of software. > > Anyway, I've been having this bottleneck in one my algorithms that has > been bugging me for quite a while. > The objective is to speed this part up. I've been doing tons of > optimization and parallel processing around that piece of code to get > a decent run time. > > The problem is easy. You want to accumulate in a matrix, a weighted > sum of other matrices. Let's call this function scale_and_add: > def scale_and_add_re(R,w,Ms): > (nb_add,mdim,j)=np.shape(Ms) > for i in range(nb_add): > R+=w[i]*Ms[i] > return R > This 'for' loop bugs me since I know this will slow things down. > > But the dimension of these things are so large that any attempt to > vectorize this is slower and takes too much memory. > I typically work with 1000 weights and matrices, matrices of dimension > of several hundred. > > My current config is: > In [2]: %timeit scale_and_add_re(R,w,Ms) > 1 loops, best of 3: 392 ms per loop > > In [3]: w.shape > Out[3]: (2000,) > > In [4]: Ms.shape > Out[4]: (2000, 200, 200) > I'd like to be able to double these dimensions. How this array Ms is created? Do you really need to have it in the memory as whole? Assuming it's created by (200, 200) 'chunks' at a time, then you could accumulate that right away to R. It still involves Python looping but that's not so much overhead. My 2 cents eat For instance I could use broadcasting by using a dot product %timeit dot(Ms.T,w) 1 loops, best of 3: 1.77 s per loop But this is i) slower ii) takes too much memory (btw, I'd really need an inplace dot-product in numpy to avoid the copy in memory, like the blas call in scipy.linalg. But that's for an other thread...) The matrices are squared and symmetric. I should be able to get something out of this, but I never found anything related to this in Numpy. I also tried a Cython reimplementation %timeit scale_and_add_reg(L1,w,Ms) 1 loops, best of 3: 393 ms per loop It brought nothing in speed. Here's the code @cython.boundscheck(False) def scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): return _scale_and_add_reg(R,w,Ms) @cython.boundscheck(False) cdef int _scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): cdef unsigned int mdim cdef int nb_add (nb_add,mdim,j)=np.shape(Ms) cdef unsigned int i for i from 0 <= i < nb_add: R+=w[i]*Ms[i] #for j in range(mdim): # for k in range(mdim): # R[j][k]+=w[i]*Ms[i][j][k] return 0 So here's my question if someone has time to answer it: Did I try anything possible? Should I move along and deal with this bottleneck? Or is there something else I didn't think of? Thanks for reading, keep up the great work! -n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Sat Jan 29 20:58:33 2011 From: jason-sage at creativetrax.com (Jason Grout) Date: Sat, 29 Jan 2011 19:58:33 -0600 Subject: [Numpy-discussion] numpy.linalg.svd documentation Message-ID: <4D44C5C9.4060205@creativetrax.com> The SVD documentation seems a bit misleading. It says: Factors the matrix a as u * np.diag(s) * v, where u and v are unitary and s is a 1-d array of a?s singular values. However, that only is true (i.e., you just have to do np.diag(s) to get S) in general if full_matrices is False, which is not the default. Otherwise, you have to something like in the first example in the docstring. I'm not sure what the right fix is here. Changing the default for full_matrices seems too drastic. But then having u*np.diag(s)*v in the first line doesn't work if you have a rectangular matrix. Perhaps the first line could be changed to: Factors the matrix a as u * S * v, where u and v are unitary and S is a matrix with shape (a.shape[0], a.shape[1]) with np.diag(S)=s, where s is a 1-d array of a?s singular values. It sounds more confusing that way, but at least it's correct. Maybe even better would be to add a shape option to np.diag, and then just make the first line of the svd docstring say u*np.diag(s,shape=(a.shape[0],a.shape[1]))*v Jason From josef.pktd at gmail.com Sat Jan 29 21:46:50 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 Jan 2011 21:46:50 -0500 Subject: [Numpy-discussion] numpy.linalg.svd documentation In-Reply-To: <4D44C5C9.4060205@creativetrax.com> References: <4D44C5C9.4060205@creativetrax.com> Message-ID: On Sat, Jan 29, 2011 at 8:58 PM, Jason Grout wrote: > The SVD documentation seems a bit misleading. ?It says: > > Factors the matrix a as u * np.diag(s) * v, where u and v are unitary > and s is a 1-d array of a?s singular values. > > However, that only is true (i.e., you just have to do np.diag(s) to get > S) in general if full_matrices is False, which is not the default. > Otherwise, you have to something like in the first example in the docstring. > > I'm not sure what the right fix is here. ?Changing the default for > full_matrices seems too drastic. ?But then having u*np.diag(s)*v in the > first line doesn't work if you have a rectangular matrix. ?Perhaps the > first line could be changed to: > > Factors the matrix a as u * S * v, where u and v are unitary and S is a > matrix with shape (a.shape[0], a.shape[1]) with np.diag(S)=s, where s is > a 1-d array of a?s singular values. > > It sounds more confusing that way, but at least it's correct. > > Maybe even better would be to add a shape option to np.diag, and then > just make the first line of the svd docstring say > u*np.diag(s,shape=(a.shape[0],a.shape[1]))*v or move scipy's diagsvd to numpy scipy.linalg.diagsvd(s, M, N) I found the difference between full and not full matrices confusing when I tried to figure out how svd (in scipy) works. diagsvd was a big help for me. I think you could just edit it with the documentation editor. Any clarification is better even if it sounds a bit complicated. Josef > > > Jason > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat Jan 29 21:58:54 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 Jan 2011 19:58:54 -0700 Subject: [Numpy-discussion] numpy.linalg.svd documentation In-Reply-To: <4D44C5C9.4060205@creativetrax.com> References: <4D44C5C9.4060205@creativetrax.com> Message-ID: On Sat, Jan 29, 2011 at 6:58 PM, Jason Grout wrote: > The SVD documentation seems a bit misleading. It says: > > Factors the matrix a as u * np.diag(s) * v, where u and v are unitary > and s is a 1-d array of a?s singular values. > > However, that only is true (i.e., you just have to do np.diag(s) to get > S) in general if full_matrices is False, which is not the default. > Otherwise, you have to something like in the first example in the > docstring. > > I'm not sure what the right fix is here. Changing the default for > full_matrices seems too drastic. But then having u*np.diag(s)*v in the > first line doesn't work if you have a rectangular matrix. Perhaps the > first line could be changed to: > > I hate full_matrices as the default, it is almost never what I want and a horrible waste of time and space. Nothing is too drastic when it comes to full matrices. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jan 29 22:05:14 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 Jan 2011 22:05:14 -0500 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix In-Reply-To: References: Message-ID: On Sat, Jan 29, 2011 at 5:30 PM, Charles R Harris wrote: > > > On Sat, Jan 29, 2011 at 2:01 PM, Nicolas SCHEFFER > wrote: >> >> Hi all, >> >> First email to the list for me, I just want to say how grateful I am >> to have python+numpy+ipython etc... for my day to day needs. Great >> combination of software. >> >> Anyway, I've been having this bottleneck in one my algorithms that has >> been bugging me for quite a while. >> The objective is to speed this part up. I've been doing tons of >> optimization and parallel processing around that piece of code to get >> a decent run time. >> >> The problem is easy. You want to accumulate in a matrix, a weighted >> sum of other matrices. Let's call this function scale_and_add: >> def scale_and_add_re(R,w,Ms): >> ? ?(nb_add,mdim,j)=np.shape(Ms) >> ? ?for i in range(nb_add): >> ? ? ? ?R+=w[i]*Ms[i] >> ? ?return R >> This 'for' loop bugs me since I know this will slow things down. >> > > I'd put the flattened matrices in a stack, weight in place, sum on the first > index, and reshape. Something like > > In [1]: m = array([eye(3)]*4).reshape(4,-1) using triu_indices instead of reshape would cut the memory consumption ind1, ind2 = np.triu_indices(?,?) m = m[:, ind1, ind2] I have no idea how expensive the indexing is in this case Josef > > In [2]: m > Out[2]: > array([[ 1.,? 0.,? 0.,? 0.,? 1.,? 0.,? 0.,? 0.,? 1.], > ?????? [ 1.,? 0.,? 0.,? 0.,? 1.,? 0.,? 0.,? 0.,? 1.], > ?????? [ 1.,? 0.,? 0.,? 0.,? 1.,? 0.,? 0.,? 0.,? 1.], > ?????? [ 1.,? 0.,? 0.,? 0.,? 1.,? 0.,? 0.,? 0.,? 1.]]) > > In [3]: w = array([1.,2.,3.,4.]) > > In [4]: m *= w[:,None] > > In [5]: r = m.sum(0).reshape(3,3) > > In [6]: r > Out[6]: > array([[ 10.,?? 0.,?? 0.], > ?????? [? 0.,? 10.,?? 0.], > ?????? [? 0.,?? 0.,? 10.]]) > > This should fit in memory I think, depending of course on how much memory > you have. > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From akabaila at pcug.org.au Sat Jan 29 23:52:18 2011 From: akabaila at pcug.org.au (Algis Kabaila) Date: Sun, 30 Jan 2011 15:52:18 +1100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: References: <201101292240.05470.akabaila@pcug.org.au> Message-ID: <201101301552.18874.akabaila@pcug.org.au> On Saturday 29 January 2011 22:47:23 Stuart Brorson wrote: > > So my question is: how can one reliably detect singularity > > (or near singularity) and raise an exception? > > Matrix condition number: > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.lin > alg.cond.html http://en.wikipedia.org/wiki/Condition_number > > Stuart Stuart, Thank you for the wonderful pointer to good information. When I was writing my PhD in 1966, there was little if anything known about the condition mumbers. Your judiciously chosen references put me on the track to search some information about it. Internet is wonderful - there are some well written material there to keep me quietly reading for a while... Thank you, Al. (aka OldAl) -- Algis http://akabaila.pcug.org.au/StructuralAnalysis.pdf From akabaila at pcug.org.au Sun Jan 30 00:11:23 2011 From: akabaila at pcug.org.au (Algis Kabaila) Date: Sun, 30 Jan 2011 16:11:23 +1100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <4D449056.6090406@molden.no> References: <201101292240.05470.akabaila@pcug.org.au> <4D449056.6090406@molden.no> Message-ID: <201101301611.23488.akabaila@pcug.org.au> On Sunday 30 January 2011 09:10:30 Sturla Molden wrote: > Den 29.01.2011 12:40, skrev Algis Kabaila: > > So my question is: how can one reliably detect singularity > > (or near singularity) and raise an exception? > > Use an SVD, examine the singular values. I gather that SVD is the Singular Value Decomposition, but I have no idea how to perform such decomposition. Would you care to refer me to some simple source material? I have been advised to watch the condition numbers. No doubt, SVD and condition numbers are related. The references about condition numbers are very interesting and I intend to follow them in the first instance. > In statistics we sometimes see ill-conditioning of covariance > matrices. Another way to deal with multicollinearity besides > SVD/PCA is regularisation. Simply adding a small bias k*I to > the diagonal might fix the problem (cf. ridge regression). > In the Levenberg-Marquardt algorithm used to fit non-linear > least squares models (cf. > scipy.optimize.leastsq), the bias k to the diagonal of the > Jacobian is changed adaptively. One might also know in > advance if a covariance matrix could be ill-conditioned (the > number of samples is small compared to the number of > dimensions) or singular (less data than parameters). That > is, sometimes we don't even need to look at the matrix to > give the correct diagnosis. Another widely used strategy is > to use Cholesky factorization on covariance matrices. It is > always stable unless there is a singularity, for which it > will fail (NumPy will raise a LinAlgError exception). > Cholesky is therefore safer to use for inverting covariance > matrices than LU (as well as faster). If Cholesky fails one > might fallback to SVD or regularisation to correct the > problem. > > Sturla > My knowledge of statistics is rather limited, though our son Dr. Paul Kabaila is a specialist in that area. My interests lie in the area of Analysis of Engineering Structures - it saves my brain from falling to a permafrost like sleep :) Thank you for your reply - greatly appreciated. Al. PS: Paul, I thought there is a minuscule chance that this is of some interest to you. Tete. -- Algis http://akabaila.pcug.org.au/StructuralAnalysis.pdf From charlesr.harris at gmail.com Sun Jan 30 00:35:15 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 Jan 2011 22:35:15 -0700 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <201101301611.23488.akabaila@pcug.org.au> References: <201101292240.05470.akabaila@pcug.org.au> <4D449056.6090406@molden.no> <201101301611.23488.akabaila@pcug.org.au> Message-ID: On Sat, Jan 29, 2011 at 10:11 PM, Algis Kabaila wrote: > On Sunday 30 January 2011 09:10:30 Sturla Molden wrote: > > Den 29.01.2011 12:40, skrev Algis Kabaila: > > > So my question is: how can one reliably detect singularity > > > (or near singularity) and raise an exception? > > > > Use an SVD, examine the singular values. > I gather that SVD is the Singular Value Decomposition, but I > have no idea how to perform such decomposition. Would you care > to refer me to some simple source material? I have been advised > to watch the condition numbers. No doubt, SVD and condition > numbers are related. The references about condition numbers are > very interesting and I intend to follow them in the first > instance. > > Use numpy.linalg.svd. The condition number is the ratio of the largest singular value to the smallest. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From akabaila at pcug.org.au Sun Jan 30 01:28:32 2011 From: akabaila at pcug.org.au (Algis Kabaila) Date: Sun, 30 Jan 2011 17:28:32 +1100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: References: <201101292240.05470.akabaila@pcug.org.au> <201101301611.23488.akabaila@pcug.org.au> Message-ID: <201101301728.32665.akabaila@pcug.org.au> On Sunday 30 January 2011 16:35:15 Charles R Harris wrote: > On Sat, Jan 29, 2011 at 10:11 PM, Algis Kabaila wrote: > > On Sunday 30 January 2011 09:10:30 Sturla Molden wrote: > > > Den 29.01.2011 12:40, skrev Algis Kabaila: > > > > So my question is: how can one reliably detect > > > > singularity (or near singularity) and raise an > > > > exception? > > > > > > Use an SVD, examine the singular values. > > > > I gather that SVD is the Singular Value Decomposition, but > > I have no idea how to perform such decomposition. > Use numpy.linalg.svd. The condition number is the ratio of > the largest singular value to the smallest. > > > > Chuck Why not simply numply.linalg.cond? This gives the condition number directly (and presumably performs the inspection of sv's). Or do you think that sv's give more useful information? Thanks for writing - I find it all rather fascinating... Gratefully, Al. -- Algis http://akabaila.pcug.org.au/StructuralAnalysis.pdf From gael.varoquaux at normalesup.org Sun Jan 30 09:28:32 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 30 Jan 2011 15:28:32 +0100 Subject: [Numpy-discussion] [ANN] FEMTEC: Trac on open source scientific software Message-ID: <20110130142832.GJ20649@phare.normalesup.org> Hi list, This is just a note that an extra track at FEMTEC, a conference for computational methods in engineering and science, is open for open source scientific software. The organisers have a taste for Python, so if you want to submit a paper on numerical methods with Python, this is an excellent venue. Abstract submission is open till end of February. To submit you need to create an account and edit you profile. Gael ________________________________________________________________________________ The 3rd International Conference on Finite Element Methods in Engineering and Science (FEMTEC 2011, http://hpfem.org/events/femtec-2011/) will have a track on Open-source projects and Python in scientific computing. FEMTEC 2011 is co-organized by the University of Nevada (Reno), Desert Reseach Institute (Reno), Idaho National Laboratory (Idaho Falls, Idaho), and U.S. Army Engineer Research and Development Center (Vicksburg, Mississippi). The objective of the meeting is to strengthen the interaction between researchers who develop new computational methods, and scientists and engineers from various fields who employ numerical methods in their research. Specific focus areas of FEMTEC 2011 include, but are not limited to, the following: * Computational methods in hydrology, atmospheric modeling, and other earth sciences. * Computational methods in nuclear, mechanical, civil, electrical, and other engineering fields. * Mesh generation and scientific visualization. * Open-source projects and Python in scientific computing. Part of the conference will be a software afternoon featuring open source projects of participants. Proceedings Proceedings of FEMTEC 2011 will appear as a special issue of Journal of Computational and Applied Mathematics (2008 SCI impact factor 1.292), and additional high-impact international journals as needed. From sturla at molden.no Sun Jan 30 10:15:34 2011 From: sturla at molden.no (Sturla Molden) Date: Sun, 30 Jan 2011 16:15:34 +0100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <201101301728.32665.akabaila@pcug.org.au> References: <201101292240.05470.akabaila@pcug.org.au> <201101301611.23488.akabaila@pcug.org.au> <201101301728.32665.akabaila@pcug.org.au> Message-ID: <4D458096.1000808@molden.no> Den 30.01.2011 07:28, skrev Algis Kabaila: > Why not simply numply.linalg.cond? This gives the condition > number directly (and presumably performs the inspection of > sv's). Or do you think that sv's give more useful information? You can use the singular value decomposition to invert the matrix, solve linear systems and solve least squares problems. Looking at the topic you don't just want to compute condition numbers, but invert the ill-conditioned (nearly singular) matrix. Say you want to do: invA = np.linalg.inv(a) With matrix a nearly singular, you can proceed like this: U, s, Vh = np.linalg.svd(a, full_matrices=False) Edit small singular values. This will add a small bias but reduce rounding error: singular = s < threshold invS = 1/s invS[singular] = 0 # truncate inf to 0 actually helps... Et voil?: invA = np.dot(np.dot(U,np.diag(invS)),Vh) I hope this helps :) There is a chapter on SVD in "Numerical Receipes" and almost any linear algebra textbook. Just to verify: >>> a = np.diag([1,2,3]) >>> np.linalg.inv(a) array([[ 1. , 0. , 0. ], [ 0. , 0.5 , 0. ], [ 0. , 0. , 0.33333333]]) >>> U, s, Vh = np.linalg.svd(a, full_matrices=False) >>> np.dot(np.dot(U,np.diag(1/s)),Vh) array([[ 1. , 0. , 0. ], [ 0. , 0.5 , 0. ], [ 0. , 0. , 0.33333333]]) Sturla From sturla at molden.no Sun Jan 30 10:25:42 2011 From: sturla at molden.no (Sturla Molden) Date: Sun, 30 Jan 2011 16:25:42 +0100 Subject: [Numpy-discussion] numpy.linalg.svd documentation In-Reply-To: <4D44C5C9.4060205@creativetrax.com> References: <4D44C5C9.4060205@creativetrax.com> Message-ID: <4D4582F6.1090309@molden.no> Den 30.01.2011 02:58, skrev Jason Grout: > Factors the matrix a as u * S * v, It actually returns the Hermitian of v, as almost any use of SVD will require v.H. And by the way, the documentation does not say that the factorization is u * S * v, but u * np.diag(s) * v.H. Sturla From bsouthey at gmail.com Sun Jan 30 10:40:10 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 30 Jan 2011 09:40:10 -0600 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <4D458096.1000808@molden.no> References: <201101292240.05470.akabaila@pcug.org.au> <201101301611.23488.akabaila@pcug.org.au> <201101301728.32665.akabaila@pcug.org.au> <4D458096.1000808@molden.no> Message-ID: On Sun, Jan 30, 2011 at 9:15 AM, Sturla Molden wrote: > Den 30.01.2011 07:28, skrev Algis Kabaila: >> Why not simply numply.linalg.cond? This gives the condition >> number directly (and presumably performs the inspection of >> sv's). Or do you think that sv's give more useful information? > > You can use the singular value decomposition to invert the matrix, solve > linear systems and solve least squares problems. Looking at the topic > you don't just want to compute condition numbers, but invert the > ill-conditioned (nearly singular) matrix. > > Say you want to do: > > ? ?invA = ?np.linalg.inv(a) > > With matrix a nearly singular, you can proceed like this: > > ? ?U, s, Vh = np.linalg.svd(a, full_matrices=False) > > Edit small singular values. This will add a small bias but reduce > rounding error: > > ? ?singular = s < threshold > ? ?invS = 1/s > ? ?invS[singular] = 0 # truncate inf to 0 actually helps... > > Et voil?: > > ? ?invA = np.dot(np.dot(U,np.diag(invS)),Vh) > > I hope this helps :) > > There is a chapter on SVD in "Numerical Receipes" and almost any linear > algebra textbook. > > Just to verify: > > ?>>> a = np.diag([1,2,3]) > ?>>> np.linalg.inv(a) > array([[ 1. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?], > ? ? ? ?[ 0. ? ? ? ?, ?0.5 ? ? ? , ?0. ? ? ? ?], > ? ? ? ?[ 0. ? ? ? ?, ?0. ? ? ? ?, ?0.33333333]]) > ?>>> U, s, Vh = np.linalg.svd(a, full_matrices=False) > ?>>> np.dot(np.dot(U,np.diag(1/s)),Vh) > array([[ 1. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?], > ? ? ? ?[ 0. ? ? ? ?, ?0.5 ? ? ? , ?0. ? ? ? ?], > ? ? ? ?[ 0. ? ? ? ?, ?0. ? ? ? ?, ?0.33333333]]) > > > Sturla > > If the matrix is not full rank then it is not invertible (http://en.wikipedia.org/wiki/Matrix_inverse) and (matrix) rank (http://en.wikipedia.org/wiki/Matrix_rank) can be computed from the above code. You do have to beware that you can get a generalized inverse (numpy.linalg provides pinv) when your system has an infinite number of solutions. (A generalized inverse is not the problem, the problem is when you expect a unique solution and do not get it.) Bruce From charlesr.harris at gmail.com Sun Jan 30 11:04:08 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 30 Jan 2011 09:04:08 -0700 Subject: [Numpy-discussion] numpy.linalg.svd documentation In-Reply-To: <4D4582F6.1090309@molden.no> References: <4D44C5C9.4060205@creativetrax.com> <4D4582F6.1090309@molden.no> Message-ID: On Sun, Jan 30, 2011 at 8:25 AM, Sturla Molden wrote: > Den 30.01.2011 02:58, skrev Jason Grout: > > Factors the matrix a as u * S * v, > > It actually returns the Hermitian of v, as almost any use of SVD will > require v.H. And by the way, the documentation does not say that the > factorization is u * S * v, but u * np.diag(s) * v.H. > > The v.H is the old, incorrect, version of the documentation. The current documentation is correct. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Sun Jan 30 11:12:50 2011 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Sun, 30 Jan 2011 17:12:50 +0100 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix In-Reply-To: References: Message-ID: <69968386-FCEF-497C-9A4B-39DB368A3906@gmail.com> Am 29.1.2011 um 22:01 schrieb Nicolas SCHEFFER: > Hi all, > > First email to the list for me, I just want to say how grateful I am > to have python+numpy+ipython etc... for my day to day needs. Great > combination of software. > > Anyway, I've been having this bottleneck in one my algorithms that has > been bugging me for quite a while. > The objective is to speed this part up. I've been doing tons of > optimization and parallel processing around that piece of code to get > a decent run time. > > The problem is easy. You want to accumulate in a matrix, a weighted > sum of other matrices. Let's call this function scale_and_add: > def scale_and_add_re(R,w,Ms): > (nb_add,mdim,j)=np.shape(Ms) > for i in range(nb_add): > R+=w[i]*Ms[i] > return R > This 'for' loop bugs me since I know this will slow things down. > > But the dimension of these things are so large that any attempt to > vectorize this is slower and takes too much memory. > I typically work with 1000 weights and matrices, matrices of dimension > of several hundred. > > My current config is: > In [2]: %timeit scale_and_add_re(R,w,Ms) > 1 loops, best of 3: 392 ms per loop > > In [3]: w.shape > Out[3]: (2000,) > > In [4]: Ms.shape > Out[4]: (2000, 200, 200) > I'd like to be able to double these dimensions. > > For instance I could use broadcasting by using a dot product > %timeit dot(Ms.T,w) > 1 loops, best of 3: 1.77 s per loop > But this is i) slower ii) takes too much memory > (btw, I'd really need an inplace dot-product in numpy to avoid the > copy in memory, like the blas call in scipy.linalg. But that's for an > other thread...) If you use a different memory layout for your data, you can improve your performance with dot: MsT = Ms.T.copy() %timeit np.dot(M,w) 10 loops, best of 3: 107 ms per loop for comparison: In [32]: %timeit scale_and_add_re(R,w,Ms) 1 loops, best of 3: 245 ms per loop I don't think you can do much better than this. The above value gives a memory bandwidth for accessing Ms of 6 GB/s, I think the hardware limit for my system is about 10 Gb/s. Gregor From gael.varoquaux at normalesup.org Sun Jan 30 11:21:37 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 30 Jan 2011 17:21:37 +0100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <4D458096.1000808@molden.no> References: <201101292240.05470.akabaila@pcug.org.au> <201101301611.23488.akabaila@pcug.org.au> <201101301728.32665.akabaila@pcug.org.au> <4D458096.1000808@molden.no> Message-ID: <20110130162137.GD14858@phare.normalesup.org> On Sun, Jan 30, 2011 at 04:15:34PM +0100, Sturla Molden wrote: > Den 30.01.2011 07:28, skrev Algis Kabaila: > > Why not simply numply.linalg.cond? This gives the condition > > number directly (and presumably performs the inspection of > > sv's). Or do you think that sv's give more useful information? > You can use the singular value decomposition to invert the matrix, solve > linear systems and solve least squares problems. Looking at the topic > you don't just want to compute condition numbers, but invert the > ill-conditioned (nearly singular) matrix. And if you are trying to solve a least-squares, I think that you should be using a ridge (or Tikhonov) regularisation: http://en.wikipedia.org/wiki/Tikhonov_regularization read in particular the paragraph above the table of content: you are most likely interested in Gamma = alpha identity, where you set alpha to be say 1% (or .1%) of the largest eigenvalue of A^t A. Gael From sturla at molden.no Sun Jan 30 12:35:56 2011 From: sturla at molden.no (Sturla Molden) Date: Sun, 30 Jan 2011 18:35:56 +0100 Subject: [Numpy-discussion] numpy.linalg.svd documentation In-Reply-To: References: <4D44C5C9.4060205@creativetrax.com> <4D4582F6.1090309@molden.no> Message-ID: <4D45A17C.1040007@molden.no> Den 30.01.2011 17:04, skrev Charles R Harris: > > The v.H is the old, incorrect, version of the documentation. The > current documentation is correct. !!! Was it just the documentation that was false, or did SVD return v.H before? Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedrichromstedt at gmail.com Sun Jan 30 14:29:16 2011 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Sun, 30 Jan 2011 20:29:16 +0100 Subject: [Numpy-discussion] create a numpy array of images In-Reply-To: <4D4311A7.20007@noaa.gov> References: <4D4311A7.20007@noaa.gov> Message-ID: 2011/1/28 Christopher Barker : > On 1/28/11 7:01 AM, Asmi Shah wrote: >> I am using python for a while now and I have a requirement of creating a >> numpy array of microscopic tiff images ( this data is 3d, meaning there are >> 100 z slices of 512 X 512 pixels.) How can I create an array of images? > > It's quite straightforward to create a 3-d array to hold this kind of data: > > image_block = np.empty((100, 512, 512), dtype=??) > > now you can load it up by using some lib (PIL, or ???) to load the tif > images, and then: > > for i in images: > ? ? image_block[i,:,:] = i Notice that since PIL 1.1.6, PIL Image objects support the numpy interface: http://effbot.org/zone/pil-changes-116.htm >>> import PIL.Image >>> im = PIL.Image.open('P1010102.JPG') >>> im >>> a = numpy.asarray(im) >>> a.shape (2448, 3264, 3) >>> a.dtype dtype('uint8') You can use the image just as any other ndarray: >>> stack = numpy.empty((5, 2488, 3264, 3)) >>> stack[0] = im and so on for 5 images in a stack, notice that the dtype of the initially empty ndarray is float! It works also vice-versa: >>> im_copy = PIL.Image.fromarray(a) but this seems to require integer-valued ndarrays as input, except when the ndarray is monochrome. This might be even simpler than the dtype proposed by Christopher. For more info on PIL: http://www.pythonware.com/library/pil/handbook/ Friedrich From scheffer.nicolas at gmail.com Sun Jan 30 14:37:07 2011 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Sun, 30 Jan 2011 11:37:07 -0800 Subject: [Numpy-discussion] Help in speeding up accumulation in a matrix In-Reply-To: <69968386-FCEF-497C-9A4B-39DB368A3906@gmail.com> References: <69968386-FCEF-497C-9A4B-39DB368A3906@gmail.com> Message-ID: Hi all, Thanks for all of the answers, it gives me a lot of new ideas and new functions I didn't know of. @Charles: The reshape way is a great idea! It gives a great alternative to the for loop for your code to be vectorized. I tested it I get %timeit scale_and_add_reshape(R,w,Msr) 1 loops, best of 3: 1.35 s per loop Msr being already reshaped. So better than dot but far from the naive way. @eat: Yes I need these matrices in memory. I'm doing this operation tens of thousands of time using different weights (it's an ML algorithm where w is given by each input example). How would not having them in memory improve the speed? @josef So triu_indices is how you take care of symmetric matrices in Numpy? I read on the mailing lists that some of the implementations of these slicings might be slow though. I didn't have the chance to test that yet, but I sure will. @gregor Ok, I feel stupid ;) Thanks Gregor. I really thought I checked the compatibility of the array.flags before the dot product but it seems I didn't. This is the solution, making sure to be C_contiguous before doing the dot. That gives a 4x improvement over the naive implementation In [17]: %timeit scale_and_add_re(L1,w,Ms) 1 loops, best of 3: 392 ms per loop In [18]: %timeit dot(Ms.T,w) 1 loops, best of 3: 1.81 s per loop In [19]: %timeit dot(MsT,w) 10 loops, best of 3: 86.2 ms per loop That's a hell of a boost! I don't think we could do better than that. However it's doing almost twice the work it needs to do (symmetric matrices), so it probably can be done faster. Maybe the underlying BLAS code takes care of that w/o me knowing though. For now, I think I'll be fine with 4x improvement! Thanks much for the help! -n On Sun, Jan 30, 2011 at 8:12 AM, Gregor Thalhammer wrote: > > Am 29.1.2011 um 22:01 schrieb Nicolas SCHEFFER: > >> Hi all, >> >> First email to the list for me, I just want to say how grateful I am >> to have python+numpy+ipython etc... for my day to day needs. Great >> combination of software. >> >> Anyway, I've been having this bottleneck in one my algorithms that has >> been bugging me for quite a while. >> The objective is to speed this part up. I've been doing tons of >> optimization and parallel processing around that piece of code to get >> a decent run time. >> >> The problem is easy. You want to accumulate in a matrix, a weighted >> sum of other matrices. Let's call this function scale_and_add: >> def scale_and_add_re(R,w,Ms): >> ? ?(nb_add,mdim,j)=np.shape(Ms) >> ? ?for i in range(nb_add): >> ? ? ? ?R+=w[i]*Ms[i] >> ? ?return R >> This 'for' loop bugs me since I know this will slow things down. >> >> But the dimension of these things are so large that any attempt to >> vectorize this is slower and takes too much memory. >> I typically work with 1000 weights and matrices, matrices of dimension >> of several hundred. >> >> My current config is: >> In [2]: %timeit scale_and_add_re(R,w,Ms) >> 1 loops, best of 3: 392 ms per loop >> >> In [3]: w.shape >> Out[3]: (2000,) >> >> In [4]: Ms.shape >> Out[4]: (2000, 200, 200) >> I'd like to be able to double these dimensions. >> >> For instance I could use broadcasting by using a dot product >> %timeit dot(Ms.T,w) >> 1 loops, best of 3: 1.77 s per loop >> But this is i) slower ii) takes too much memory >> (btw, I'd really need an inplace dot-product in numpy to avoid the >> copy in memory, like the blas call in scipy.linalg. But that's for an >> other thread...) > > If you use a different memory layout for your data, you can improve your performance with dot: > > MsT = Ms.T.copy() > %timeit np.dot(M,w) > > 10 loops, best of 3: 107 ms per loop > > for comparison: > In [32]: %timeit scale_and_add_re(R,w,Ms) > 1 loops, best of 3: 245 ms per loop > > I don't think you can do much better than this. The above value gives a memory bandwidth for accessing Ms of 6 GB/s, I think the hardware limit for my system is about 10 Gb/s. > > Gregor > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Sun Jan 30 15:11:42 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 30 Jan 2011 20:11:42 +0000 (UTC) Subject: [Numpy-discussion] numpy.linalg.svd documentation References: <4D44C5C9.4060205@creativetrax.com> <4D4582F6.1090309@molden.no> <4D45A17C.1040007@molden.no> Message-ID: On Sun, 30 Jan 2011 18:35:56 +0100, Sturla Molden wrote: > Den 30.01.2011 17:04, skrev Charles R Harris: >> The v.H is the old, incorrect, version of the documentation. The >> current documentation is correct. > > !!! > > Was it just the documentation that was false, or did SVD return v.H > before? The documentation only. Obviously, the behavior has not been changed. From charlesr.harris at gmail.com Sun Jan 30 15:40:15 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 30 Jan 2011 13:40:15 -0700 Subject: [Numpy-discussion] numpy.linalg.svd documentation In-Reply-To: <4D45A17C.1040007@molden.no> References: <4D44C5C9.4060205@creativetrax.com> <4D4582F6.1090309@molden.no> <4D45A17C.1040007@molden.no> Message-ID: On Sun, Jan 30, 2011 at 10:35 AM, Sturla Molden wrote: > Den 30.01.2011 17:04, skrev Charles R Harris: > > > The v.H is the old, incorrect, version of the documentation. The current > documentation is correct. > > > !!! > > Was it just the documentation that was false, or did SVD return v.H before? > > > Well, strictly speaking, both documentations say the same thing, but the old version was somewhat obfuscated. Either svd returns v.H and A = dot(u*d, v.H) or svd returns v and A = dot(u*d,v). I think the second is a clearer statement of the return value and the resulting factorization, but I suppose some may hold a different opinion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From akabaila at pcug.org.au Sun Jan 30 21:05:56 2011 From: akabaila at pcug.org.au (Algis Kabaila) Date: Mon, 31 Jan 2011 13:05:56 +1100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <20110130162137.GD14858@phare.normalesup.org> References: <201101292240.05470.akabaila@pcug.org.au> <4D458096.1000808@molden.no> <20110130162137.GD14858@phare.normalesup.org> Message-ID: <201101311305.56481.akabaila@pcug.org.au> > > And if you are trying to solve a least-squares, I think that > you should be using a ridge (or Tikhonov) regularisation: > http://en.wikipedia.org/wiki/Tikhonov_regularization > read in particular the paragraph above the table of content: > you are most likely interested in Gamma = alpha identity, > where you set alpha to be say 1% (or .1%) of the largest > eigenvalue of A^t A. > > Gael First of all I want to thank all who have contributed to this discussion. It has been nothing less than inspiring! However, it has drifted to areas in which I lack expertese and interest. My interest is in structural analysis of engineering structures. The structure response is generally characterised by a square matrix with real elements. Actually, the structural engineer has no interest in trying to invert a singular matrix. However he/she is interested (or should be interested :) ) when the square response matrix might approach singularity for this would signal instability. He/She knows what the result of instability would be - a disaster! It is my fault not to have stated the problem with adequate clarity and I intend to do that as soon as I can. Thank you again for all your valuable contributions. Al. -- Algis http://akabaila.pcug.org.au/StructuralAnalysis.pdf From totonixsame at gmail.com Mon Jan 31 06:19:57 2011 From: totonixsame at gmail.com (totonixsame at gmail.com) Date: Mon, 31 Jan 2011 09:19:57 -0200 Subject: [Numpy-discussion] create a numpy array of images In-Reply-To: References: <4D4311A7.20007@noaa.gov> Message-ID: I've been done that but with CT and MRI dicom files, and the cool thing is that with numpy I can do something like this: # getting axial slice axial = slices[n,:,:] # getting coronal slice coronal = slices[:, n, :] # getting sagital slice sagital = slices[:,:, n] On Sun, Jan 30, 2011 at 5:29 PM, Friedrich Romstedt wrote: > 2011/1/28 Christopher Barker : >> On 1/28/11 7:01 AM, Asmi Shah wrote: >>> I am using python for a while now and I have a requirement of creating a >>> numpy array of microscopic tiff images ( this data is 3d, meaning there are >>> 100 z slices of 512 X 512 pixels.) How can I create an array of images? >> >> It's quite straightforward to create a 3-d array to hold this kind of data: >> >> image_block = np.empty((100, 512, 512), dtype=??) >> >> now you can load it up by using some lib (PIL, or ???) to load the tif >> images, and then: >> >> for i in images: >> ? ? image_block[i,:,:] = i > > Notice that since PIL 1.1.6, PIL Image objects support the numpy > interface: http://effbot.org/zone/pil-changes-116.htm > >>>> import PIL.Image >>>> im = PIL.Image.open('P1010102.JPG') >>>> im > >>>> a = numpy.asarray(im) >>>> a.shape > (2448, 3264, 3) >>>> a.dtype > dtype('uint8') > > You can use the image just as any other ndarray: > >>>> stack = numpy.empty((5, 2488, 3264, 3)) >>>> stack[0] = im > and so on > > for 5 images in a stack, notice that the dtype of the initially empty > ndarray is float! > > It works also vice-versa: > >>>> im_copy = PIL.Image.fromarray(a) > > but this seems to require integer-valued ndarrays as input, except > when the ndarray is monochrome. > > This might be even simpler than the dtype proposed by Christopher. > > For more info on PIL: http://www.pythonware.com/library/pil/handbook/ > > Friedrich > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ndbecker2 at gmail.com Mon Jan 31 07:02:20 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 31 Jan 2011 07:02:20 -0500 Subject: [Numpy-discussion] Generator arrays References: <82C3B34D-12D1-4EFD-B83E-61568AFCA692@enthought.com> <87mxmldp8m.fsf@ginnungagap.bsc.es> <1FD06257-33BD-4F8C-88A8-5BBB9A3B16F1@enthought.com> Message-ID: I'm not sure how it applies to this discussion, but I'd just like to mention that a lot of interest (in c++ and d communities) has moved away from using iterators as the fundamental interface to containers and to ranges as the interface. From cimrman3 at ntc.zcu.cz Mon Jan 31 07:39:11 2011 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 31 Jan 2011 13:39:11 +0100 (CET) Subject: [Numpy-discussion] using loadtxt() for given number of rows? Message-ID: Hi, I work with text files which contain several arrays separated by a few lines of other information, for example: POINTS 4 float -5.000000e-01 -5.000000e-01 0.000000e+00 5.000000e-01 -5.000000e-01 0.000000e+00 5.000000e-01 5.000000e-01 0.000000e+00 -5.000000e-01 5.000000e-01 0.000000e+00 CELLS 2 8 3 0 1 2 3 2 3 0 (yes, that's the legacy VTK format, but take it just as an example) I have used custom Python code with loops to read similar files, so the speed was not too good. Now I wonder if it would be possible to use the numpy.loadtxt() function for the "array-like" parts. It supports passing an open file object in, which is good, but it wants to read the entire file, which does not work in this case. It seems to me, that an additional parameter to loadtxt(), say "nrows" or "numrows", would do the job, so that the function does not try reading the entire file. Another possibility would be to raise an exception as it is now, but also to return the data succesfully read so far. What do you think? Is this worth a ticket? r. From DParker at chromalloy.com Mon Jan 31 10:15:59 2011 From: DParker at chromalloy.com (DParker at chromalloy.com) Date: Mon, 31 Jan 2011 10:15:59 -0500 Subject: [Numpy-discussion] Vectorize or rewrite function to work with array inputs? Message-ID: I have several functions like the example below that I would like to make compatible with array inputs. The problem is the conditional statements give a ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I can use numpy.vectorize, but if possible I'd prefer to rewrite the function. Does anyone have any advice the best way to modify the code to accept array inputs? Thanks in advance for any assistance. NAN = float('nan') def air_gamma(t, far=0.0): """ Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio """ if far < 0.: return NAN elif far < 0.005: if t < 379. or t > 4731.: return NAN else: air_gamma = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - 0.0000002753524 * t ** 2. + 0.0001684666 * t + 1.368652 elif far < 0.069: if t < 699. or t > 4731.: return NAN else: a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + 3.103507e-21 * far - 3.391308e-22 a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - 5.469399e-17 * far + 6.058125e-18 a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + 3.865306e-13 * far - 4.302534e-14 a3 = -0.00000001700602 * far ** 3. + 0.000000006593809 * far ** 2. - 0.000000001392629 * far + 1.520583e-10 a2 = 0.00003431136 * far ** 3. - 0.00001248285 * far ** 2. + 0.000002688007 * far - 0.0000002651616 a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - 0.002676877 * far + 0.0001580424 a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + 0.3573201 * far + 1.372714 air_gamma = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0 elif far >= 0.069: return NAN else: return NAN return air_gamma David Parker Chromalloy - TDAG -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jan 31 10:24:03 2011 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 31 Jan 2011 16:24:03 +0100 Subject: [Numpy-discussion] Vectorize or rewrite function to work with array inputs? In-Reply-To: References: Message-ID: <1296487443.2529.4.camel@sebastian> Hello, On Mon, 2011-01-31 at 10:15 -0500, DParker at chromalloy.com wrote: > I have several functions like the example below that I would like to > make compatible with array inputs. The problem is the conditional > statements give a ValueError: The truth value of an array with more > than one element is ambiguous. Use a.any() or a.all(). I can use > numpy.vectorize, but if possible I'd prefer to rewrite the function. > Does anyone have any advice the best way to modify the code to accept > array inputs? Thanks in advance for any assistance. > You can use binary indexing instead of the if: condition = far < 0 values[condition] = np.nan # set to NaN wherever far < 0 is True. or if you like I suppose you could put it into cython, add some typing to avoid creating those binary arrays etc. all over to speed things up more. Regards, Sebastian > > NAN = float('nan') > > def air_gamma(t, far=0.0): > """ > Specific heat ratio (gamma) of Air/JP8 > t - static temperature, Rankine > [far] - fuel air ratio [- defaults to 0.0 (dry air)] > air_gamma - specific heat ratio > """ > if far < 0.: > return NAN > elif far < 0.005: > if t < 379. or t > 4731.: > return NAN > else: > air_gamma = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** > 5. - 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - 0.0000002753524 > * t ** 2. + 0.0001684666 * t + 1.368652 > elif far < 0.069: > if t < 699. or t > 4731.: > return NAN > else: > a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + > 3.103507e-21 * far - 3.391308e-22 > a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. > - 5.469399e-17 * far + 6.058125e-18 > a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + > 3.865306e-13 * far - 4.302534e-14 > a3 = -0.00000001700602 * far ** 3. + 0.000000006593809 * > far ** 2. - 0.000000001392629 * far + 1.520583e-10 > a2 = 0.00003431136 * far ** 3. - 0.00001248285 * far ** 2. > + 0.000002688007 * far - 0.0000002651616 > a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - > 0.002676877 * far + 0.0001580424 > a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + > 0.3573201 * far + 1.372714 > air_gamma = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + > a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0 > elif far >= 0.069: > return NAN > else: > return NAN > return air_gamma > > David Parker > Chromalloy - TDAG > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Mon Jan 31 11:27:22 2011 From: sturla at molden.no (Sturla Molden) Date: Mon, 31 Jan 2011 17:27:22 +0100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <201101311305.56481.akabaila@pcug.org.au> References: <201101292240.05470.akabaila@pcug.org.au> <4D458096.1000808@molden.no> <20110130162137.GD14858@phare.normalesup.org> <201101311305.56481.akabaila@pcug.org.au> Message-ID: <4D46E2EA.7080004@molden.no> Den 31.01.2011 03:05, skrev Algis Kabaila: > Actually, the structural engineer > has no interest in trying to invert a singular matrix. However > he/she is interested (or should be interested :) ) when the > square response matrix might approach singularity for this would > signal instability. I am sorry for having confused the issue by mentioning statistics. The mathematics (linear algebra) is of course the same. A singular matrix cannot be inverted by definition. The methods mentioned (SVD, Tikohonov regularization), as well as the transforms mentioned by Paul, will let you avoid numerical instability when matrices "approach singularity" (i.e. are very ill-conditioned). OT: I think I know what structural engineering is. Back in 1994 I had to take a class in "statikk" (not sure what that translates to in English), with a textbook by Fritjof Irgens. From what I remember we did vector calculus to ensure the forces in a construction summed to 0, so that Newton's first law of motion would apply. It's unhealthy to be inside a building otherwise ;-) Sturla Molden From e.antero.tammi at gmail.com Mon Jan 31 11:38:10 2011 From: e.antero.tammi at gmail.com (eat) Date: Mon, 31 Jan 2011 18:38:10 +0200 Subject: [Numpy-discussion] Vectorize or rewrite function to work with array inputs? In-Reply-To: References: Message-ID: Hi, On Mon, Jan 31, 2011 at 5:15 PM, wrote: > I have several functions like the example below that I would like to make > compatible with array inputs. The problem is the conditional statements give > a *ValueError: The truth value of an array with more than one element is > ambiguous. Use a.any() or a.all()*. I can use numpy.vectorize, but if > possible I'd prefer to rewrite the function. Does anyone have any advice the > best way to modify the code to accept array inputs? Thanks in advance for > any assistance. If I understod your question correctly, then air_gamma could be coded as: def air_gamma_0(t, far=0.0): """ Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio """ if far< 0.: return NAN elif far < 0.005: ag= air_gamma_1(t) ag[np.logical_or(t< 379., t> 4731.)]= NAN return ag elif far< 0.069: ag= air_gamma_2(t, far) ag[np.logical_or(t< 699., t> 4731.)]= NAN return ag else: return NAN Rest of the code is in the attachment. My two cents, eat > > > > NAN = float('nan') > > def air_gamma(t, far=0.0): > """ > Specific heat ratio (gamma) of Air/JP8 > t - static temperature, Rankine > [far] - fuel air ratio [- defaults to 0.0 (dry air)] > air_gamma - specific heat ratio > """ > if far < 0.: > return NAN > elif far < 0.005: > if t < 379. or t > 4731.: > return NAN > else: > air_gamma = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - > 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - 0.0000002753524 * t ** 2. > + 0.0001684666 * t + 1.368652 > elif far < 0.069: > if t < 699. or t > 4731.: > return NAN > else: > a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + > 3.103507e-21 * far - 3.391308e-22 > a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - > 5.469399e-17 * far + 6.058125e-18 > a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + > 3.865306e-13 * far - 4.302534e-14 > a3 = -0.00000001700602 * far ** 3. + 0.000000006593809 * far ** > 2. - 0.000000001392629 * far + 1.520583e-10 > a2 = 0.00003431136 * far ** 3. - 0.00001248285 * far ** 2. + > 0.000002688007 * far - 0.0000002651616 > a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - > 0.002676877 * far + 0.0001580424 > a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + 0.3573201 * > far + 1.372714 > air_gamma = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + a3 * t > ** 3. + a2 * t ** 2. + a1 * t + a0 > elif far >= 0.069: > return NAN > else: > return NAN > return air_gamma > > David Parker > Chromalloy - TDAG > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: air_gamma.py Type: application/octet-stream Size: 3676 bytes Desc: not available URL: From sturla at molden.no Mon Jan 31 12:35:11 2011 From: sturla at molden.no (Sturla Molden) Date: Mon, 31 Jan 2011 18:35:11 +0100 Subject: [Numpy-discussion] numpy.linalg.svd documentation In-Reply-To: References: <4D44C5C9.4060205@creativetrax.com> <4D4582F6.1090309@molden.no> <4D45A17C.1040007@molden.no> Message-ID: <4D46F2CF.90101@molden.no> Den 30.01.2011 21:40, skrev Charles R Harris: > Well, strictly speaking, both documentations say the same thing, but > the old version was somewhat obfuscated. Either svd returns v.H and A > = dot(u*d, v.H) or svd returns v and A = dot(u*d,v). I think the > second is a clearer statement of the return value and the resulting > factorization, but I suppose some may hold a different opinion. I agree that it is a clearer statement, since there is a difference between A = dot(u*d, v.H) and A = dot(u*d, vH), and we actually have the latter. Still, the common definition of SVD is u * s * v.H = x and not u * s * v = x This might be a bit confusing for those expecting the conjugate transpose of v in the decomposition. Be aware that Matlab's SVD is [u,s,v] = svd(x) so that u*s*v' = x. Clearly Matlab and NumPy differ on the definition of v here, with Matlab following the common convention. That is why I prefer the old notation u, s, vH = np.linalg.svd(x) v = vH.H as it leaves no room for confusion. As long as the behaviour of SVD has not changed, none of my SVD code will break. That was what worried me most :-) Sturla From akabaila at pcug.org.au Mon Jan 31 15:27:59 2011 From: akabaila at pcug.org.au (Algis Kabaila) Date: Tue, 1 Feb 2011 07:27:59 +1100 Subject: [Numpy-discussion] Inversion of near singular matrices. In-Reply-To: <4D46E2EA.7080004@molden.no> References: <201101292240.05470.akabaila@pcug.org.au> <201101311305.56481.akabaila@pcug.org.au> <4D46E2EA.7080004@molden.no> Message-ID: <201102010728.00282.akabaila@pcug.org.au> On Tuesday 01 February 2011 03:27:22 Sturla Molden wrote: > Den 31.01.2011 03:05, skrev Algis Kabaila: > > Actually, the structural engineer > > has no interest in trying to invert a singular matrix. > > However he/she is interested (or should be interested :) > > ) when the square response matrix might approach > > singularity for this would signal instability. > > I am sorry for having confused the issue by mentioning > statistics. The mathematics (linear algebra) is of course > the same. A singular matrix cannot be inverted by > definition. The methods mentioned (SVD, Tikohonov > regularization), as well as the transforms mentioned by > Paul, will let you avoid numerical instability when matrices > "approach singularity" (i.e. are very ill-conditioned). > > OT: I think I know what structural engineering is. Back in > 1994 I had to take a class in "statikk" (not sure what that > translates to in English), with a textbook by Fritjof > Irgens. From what I remember we did vector calculus to > ensure the forces in a construction summed to 0, so that > Newton's first law of motion would apply. It's unhealthy to > be inside a building otherwise ;-) > > Sturla Molden > I would guess that "statikk" is statics, the subject of conditions of equilibrium. Yes, teaching is not for the faint hearted... Particularly in "foreign" areas. Just to put your mind at ese - it is important to have some idea of statistics even in simplest engineering structures, such as those made up of statically determinate trusses. (A truss is made up of members that are pin jointed, or are imagined to be pin jointed. Because of the pin joints, each member can only be subjected to an axial force. My next code snippet will show the vagaries of analisis of statically determinate trusses). Before I can really ask my next question, I should know what matrix norms are used for the calculation of matrix condition number in numpy.linalg. You see, I tried to compare it with a condition number found in an undergraduate text book and got a totally different number. So if you know that and are able to explain it in simple terms so that even engineers can understand it, it will be greatly appreciated. Al. -- Algis http://akabaila.pcug.org.au/StructuralAnalysis.pdf From zachary.pincus at yale.edu Mon Jan 31 15:55:05 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 31 Jan 2011 12:55:05 -0800 Subject: [Numpy-discussion] create a numpy array of images In-Reply-To: References: <4D4311A7.20007@noaa.gov> Message-ID: <0810160E-E830-4FA0-8129-FF31AF960DED@yale.edu> >>> I am using python for a while now and I have a requirement of >>> creating a >>> numpy array of microscopic tiff images ( this data is 3d, meaning >>> there are >>> 100 z slices of 512 X 512 pixels.) How can I create an array of >>> images? >> >> It's quite straightforward to create a 3-d array to hold this kind >> of data: >> >> image_block = np.empty((100, 512, 512), dtype=??) >> >> now you can load it up by using some lib (PIL, or ???) to load the >> tif >> images, and then: >> >> for i in images: >> image_block[i,:,:] = i > > Notice that since PIL 1.1.6, PIL Image objects support the numpy > interface: http://effbot.org/zone/pil-changes-116.htm For even longer than this, PIL has been somewhat broken with regard to 16-bit images (very common in microscopy); you may run into strange byte-ordering issues that scramble the data on reading or writing. Also, PIL's numpy interface is somewhat broken in similar ways. (Numerous people have provided patches to PIL, but these are never incorporated into any releases, as far as I can tell.) So try PIL, but if the images come out all wrong, you might want to check out the scikits.image package, which has hooks for various other image read/write tools. Zach From Chris.Barker at noaa.gov Mon Jan 31 18:13:31 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 31 Jan 2011 15:13:31 -0800 Subject: [Numpy-discussion] using loadtxt() for given number of rows? In-Reply-To: References: Message-ID: <4D47421B.8070501@noaa.gov> On 1/31/11 4:39 AM, Robert Cimrman wrote: > I work with text files which contain several arrays separated by a few > lines of other information, for example: > > POINTS 4 float > -5.000000e-01 -5.000000e-01 0.000000e+00 > 5.000000e-01 -5.000000e-01 0.000000e+00 > 5.000000e-01 5.000000e-01 0.000000e+00 > -5.000000e-01 5.000000e-01 0.000000e+00 > > CELLS 2 8 > 3 0 1 2 > 3 2 3 0 > I have used custom Python code with loops to read similar files, so the > speed was not too good. Now I wonder if it would be possible to use the > numpy.loadtxt() function for the "array-like" parts. It supports passing > an open file object in, which is good, but it wants to read the entire > file, which does not work in this case. > > It seems to me, that an additional parameter to loadtxt(), say "nrows" or > "numrows", would do the job, I agree that that would be a useful feature. However, I'm not sure it would help performance much -- I think loadtxt is written in python as well. One option in the meantime. If you know how many rows, you presumable know how many items on each row. IN that case, you can use: np.fromfile(open_file, sep=' ', count=num_items_to_read) It'll only work for multi-line text if the separator is whitespace, which it was in your example. But if it does, it should be pretty fast. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From friedrichromstedt at gmail.com Mon Jan 31 21:32:29 2011 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Tue, 1 Feb 2011 03:32:29 +0100 Subject: [Numpy-discussion] Strange behaviour of numpy.asarray() in corner case In-Reply-To: References: Message-ID: 2011/1/28 Friedrich Romstedt : >>>> numpy.asarray([X(), numpy.asarray([1, 1])]).shape > (2,) >>>> numpy.asarray([numpy.asarray([1, 1]), X()]).shape > () Does noone have an opinion about this? Shall I file a ticket? Friedrich From warren.weckesser at enthought.com Mon Jan 31 22:04:48 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 31 Jan 2011 21:04:48 -0600 Subject: [Numpy-discussion] Strange behaviour of numpy.asarray() in corner case In-Reply-To: References: Message-ID: On Mon, Jan 31, 2011 at 8:32 PM, Friedrich Romstedt < friedrichromstedt at gmail.com> wrote: > 2011/1/28 Friedrich Romstedt : > >>>> numpy.asarray([X(), numpy.asarray([1, 1])]).shape > > (2,) > >>>> numpy.asarray([numpy.asarray([1, 1]), X()]).shape > > () > > Does noone have an opinion about this? Looks wrong to me. Shall I file a ticket? > Yes. Warren > > Friedrich > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedrichromstedt at gmail.com Mon Jan 31 23:44:13 2011 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Tue, 1 Feb 2011 05:44:13 +0100 Subject: [Numpy-discussion] Strange behaviour of numpy.asarray() in corner case In-Reply-To: References: Message-ID: 2011/2/1 Warren Weckesser : >> ?Shall I file a ticket? > > Yes. Ok, #1730: http://projects.scipy.org/numpy/ticket/1730. Thanks, Friedrich